[DRBD-user] Split Brain Mode?

Ross S. W. Walker rwalker at medallion.com
Wed Jan 10 01:42:26 CET 2007


> -----Original Message-----
> From: drbd-user-bounces at lists.linbit.com 
> [mailto:drbd-user-bounces at lists.linbit.com] On Behalf Of 
> Clinton Rosander
> Sent: Tuesday, January 09, 2007 7:06 PM
> To: drbd-user at lists.linbit.com
> Subject: [DRBD-user] Split Brain Mode?
> 
> Hi,
> 
> I currently have a DRBD/Heartbeat setup consisting of
> two servers (db01 and db02), of which the primary
> node's (db01) role will be to run an instance of MySQL.
> While doing some maintenance, I powered down the
> primary node (db01), resulting in the secondary
> node (db02) taking over the role of primary (as it
> should).
> 
> The problem arose when I powered back up db01. It came
> back up as a secondary node, but does not recognize
> that the other node indeed exists. The primary (now
> db02) does not seem to see db01 either.
> 
> Can anyone please give me some insight on why this
> happened or what the solution is? I'm new to this
> failover solution and it seems like there should be
> something simple I'm missing here. The physical link
> between the servers is unchanged and the interfaces
> sharing that link can see each other. The following is
> the output that I currently see:
> 
> db02:
> 
>  cat /proc/drbd
> version: 0.7.21 (api:79/proto:74)
> SVN Revision: 2326 build by
> buildcentos at x8664-build.centos.org, 2006-10-07
> 05:47:44
> 0: cs:StandAlone st:Primary/Unknown ld:Consistent
        ^^^^^^^^^^

Maybe you set the resource to go StandAlone when the peer disconnects?

If you run drbdadm connect all on the primary, it should switch to
WFConnection state and allow the peer to connect. (drbdadm wait_connect
only waits for an existing connection attempt; it does not start one.)
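
To check which devices are in the state marked above, the status text
can be scanned programmatically. What follows is only a minimal sketch
in Python, assuming the 0.7-style /proc/drbd layout shown in the quoted
output; the function names are made up for illustration and are not
part of DRBD.

```python
import re

# Matches a DRBD 0.7-style status line such as:
#   0: cs:StandAlone st:Primary/Unknown ld:Consistent
# (layout assumed from the output quoted in this thread)
STATUS_LINE = re.compile(r"^\s*(\d+):\s+cs:(\S+)\s+st:([^/\s]+)/(\S+)")


def parse_drbd_status(text):
    """Return {minor: (cstate, local_role, peer_role)} from /proc/drbd text."""
    status = {}
    for line in text.splitlines():
        match = STATUS_LINE.match(line)
        if match:
            minor = int(match.group(1))
            status[minor] = (match.group(2), match.group(3), match.group(4))
    return status


def standalone_minors(text):
    """List device minors whose connection state is StandAlone."""
    status = parse_drbd_status(text)
    return [minor for minor in sorted(status) if status[minor][0] == "StandAlone"]


# Sample taken from the db02 output in the original message (lines unwrapped).
SAMPLE = """\
version: 0.7.21 (api:79/proto:74)
0: cs:StandAlone st:Primary/Unknown ld:Consistent
    ns:320 nr:1096624 dw:1098160 dr:29056 al:16 bm:867 lo:0 pe:0 ua:0 ap:0
1: cs:StandAlone st:Primary/Unknown ld:Consistent
    ns:128 nr:5192 dw:857945 dr:4180 al:0 bm:235 lo:0 pe:0 ua:0 ap:0
"""

if __name__ == "__main__":
    print(standalone_minors(SAMPLE))
```

On a live node the same function could be fed the contents of
/proc/drbd; any minor it reports is not trying to reach its peer at
all, which is different from WFConnection, where the node is waiting
for the peer to connect.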


>     ns:320 nr:1096624 dw:1098160 dr:29056 al:16 bm:867
> lo:0 pe:0 ua:0 ap:0
> 1: cs:StandAlone st:Primary/Unknown ld:Consistent
>     ns:128 nr:5192 dw:857945 dr:4180 al:0 bm:235 lo:0
> pe:0 ua:0 ap:0
> 
> db01:
> 
> cat /proc/drbd
> version: 0.7.21 (api:79/proto:74)
> SVN Revision: 2326 build by
> buildcentos at x8664-build.centos.org, 2006-10-07
> 05:47:44
> 0: cs:WFConnection st:Secondary/Unknown ld:Consistent
>     ns:0 nr:0 dw:0 dr:0 al:0 bm:0 lo:0 pe:0 ua:0 ap:0
> 1: cs:WFConnection st:Secondary/Unknown ld:Consistent
>     ns:0 nr:0 dw:0 dr:0 al:0 bm:0 lo:0 pe:0 ua:0 ap:0
> 
> 
> The following are logs from db01 during startup:
> 
> Starting DRBD resources:    Jan  9 15:41:06 sdcdb01
> kernel: drbd: initialised. Version: 0.7.21
> (api:79/proto:74)
> Jan  9 15:41:06 sdcdb01 kernel: drbd: SVN Revision:
> 2326 build by buildcentos at x8664-build.centos.org,
> 2006-10-07 05:47:44
> Jan  9 15:41:06 sdcdb01 kernel: drbd: registered as
> block device major 147
> [ d0 Jan  9 15:41:12 sdcdb01 kernel: drbd0: resync
> bitmap: bits=71366747 words=1115106
> Jan  9 15:41:12 sdcdb01 kernel: drbd0: size = 272 GB
> (285466986 KB)
> Jan  9 15:41:12 sdcdb01 kernel: klogd 1.4.1,
> ---------- state change ----------
> d1 s0 s1 n0 n1 ].
> Jan  9 15:41:14 sdcdb01 kernel: drbd0: 156 KB marked
> out-of-sync by on disk bit-map.
> Jan  9 15:41:14 sdcdb01 kernel: drbd0: Found 6
> transactions (276 active extents) in activity log.
> Jan  9 15:41:14 sdcdb01 kernel: drbd0: drbdsetup
> [4774]: cstate Unconfigured --> StandAlone
> Jan  9 15:41:14 sdcdb01 kernel: drbd1: resync bitmap:
> bits=15729636 words=245776
> Jan  9 15:41:14 sdcdb01 kernel: drbd1: size = 60 GB
> (62918541 KB)
> Jan  9 15:41:14 sdcdb01 kernel: drbd1: 32 KB marked
> out-of-sync by on disk bit-map.
> Jan  9 15:41:14 sdcdb01 kernel: drbd1: Found 6
> transactions (273 active extents) in activity log.
> Jan  9 15:41:14 sdcdb01 kernel: drbd1: drbdsetup
> [4778]: cstate Unconfigured --> StandAlone
> Jan  9 15:41:14 sdcdb01 kernel: drbd0: drbdsetup
> [4796]: cstate StandAlone --> Unconnected
> Jan  9 15:41:14 sdcdb01 kernel: drbd0: drbd0_receiver
> [4797]: cstate Unconnected --> WFConnection
> Jan  9 15:41:14 sdcdb01 kernel: drbd1: drbdsetup
> [4804]: cstate StandAlone --> Unconnected
> Jan  9 15:41:14 sdcdb01 kernel: drbd1: drbd1_receiver
> [4805]: cstate Unconnected --> WFConnection
> ..........
> ***************************************************************
> DRBD's startup script waits for the peer node(s) to
> appear.
> - In case this node was already a degraded cluster
> before the
>    reboot the timeout is 60 seconds.
> [degr-wfc-timeout]
> - If the peer was available before the reboot the
> timeout will
>    expire after 0 seconds. [wfc-timeout]
>    (These values are for resource 'cdbdata'; 0 sec ->
> wait forever)
> To abort waiting enter 'yes' [  90]:
> To abort waiting enter 'yes' [  94]:
> To abort waiting enter 'yes' [  94]:
> To abort waiting enter 'yes' [  95]:
> To abort waiting enter 'yes' [  95]:
> To abort waiting enter 'yes' [  95]:
> To abort waiting enter 'yes' [  95]:
> To abort waiting enter 'yes' [ 100]:Jan  9 15:42:58
> sdcdb01 ntpd[3532]: time reset -0.423527 s
> Jan  9 15:42:58 sdcdb01 ntpd[3532]: kernel time sync
> enabled 0001
>                               [ 105]:
> To abort waiting enter 'yes' [ 109]:
> To abort waiting enter 'yes' [ 109]:
> 
> 
> 
> Any input would be greatly appreciated.
> 
> Thanks,
> 
> Clint
> 
