[DRBD-user] Split Brain Mode?

Clinton Rosander clintrosander at yahoo.com
Wed Jan 10 01:05:41 CET 2007


Hi,

I currently have a DRBD/Heartbeat setup consisting of
two servers (db01 and db02). The role of the primary
node (db01) is to run an instance of MySQL.
While doing some maintenance, I powered down the
primary node (db01), resulting in the secondary
node (db02) taking over the role of primary (as it
should).

The problem arose when I powered back up db01. It came
back up as a secondary node, but does not recognize
that the other node indeed exists. The primary (now
db02) does not seem to see db01 either.

Can anyone please give me some insight on why this
happened or what the solution is? I'm new to this
failover solution and it seems like there should be
something simple I'm missing here. The physical link
between the servers is unchanged and the interfaces
sharing that link can see each other. The following is
the output that I currently see:

db02:

cat /proc/drbd
version: 0.7.21 (api:79/proto:74)
SVN Revision: 2326 build by buildcentos at x8664-build.centos.org, 2006-10-07 05:47:44
0: cs:StandAlone st:Primary/Unknown ld:Consistent
    ns:320 nr:1096624 dw:1098160 dr:29056 al:16 bm:867 lo:0 pe:0 ua:0 ap:0
1: cs:StandAlone st:Primary/Unknown ld:Consistent
    ns:128 nr:5192 dw:857945 dr:4180 al:0 bm:235 lo:0 pe:0 ua:0 ap:0

db01:

cat /proc/drbd
version: 0.7.21 (api:79/proto:74)
SVN Revision: 2326 build by buildcentos at x8664-build.centos.org, 2006-10-07 05:47:44
0: cs:WFConnection st:Secondary/Unknown ld:Consistent
    ns:0 nr:0 dw:0 dr:0 al:0 bm:0 lo:0 pe:0 ua:0 ap:0
1: cs:WFConnection st:Secondary/Unknown ld:Consistent
    ns:0 nr:0 dw:0 dr:0 al:0 bm:0 lo:0 pe:0 ua:0 ap:0
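
As an aside, the two status dumps above can be compared mechanically. The
following is only a minimal sketch in Python, assuming the 0.7-style
/proc/drbd layout shown in this message (cs:, st:, ld: fields on each
device line). The names parse_proc_drbd and standalone_minors are
hypothetical helpers written for this illustration and are not part of
DRBD. The sketch only reports which devices are in cs:StandAlone, which,
as far as I understand it, means the device is not attempting to reach
its peer, while cs:WFConnection means it is waiting for the peer.

```python
import re

# 0.7-style device line, e.g.
# "0: cs:StandAlone st:Primary/Unknown ld:Consistent"
# (assumption: this layout, as pasted in the message, is stable)
DEVICE_LINE = re.compile(
    r"^\s*(?P<minor>\d+):\s+cs:(?P<cs>\S+)"
    r"\s+st:(?P<local>[^/\s]+)/(?P<peer>\S+)"
    r"\s+ld:(?P<ld>\S+)"
)

# Sample text copied from the outputs in this message (counters omitted).
DB02 = """\
version: 0.7.21 (api:79/proto:74)
0: cs:StandAlone st:Primary/Unknown ld:Consistent
1: cs:StandAlone st:Primary/Unknown ld:Consistent
"""

DB01 = """\
version: 0.7.21 (api:79/proto:74)
0: cs:WFConnection st:Secondary/Unknown ld:Consistent
1: cs:WFConnection st:Secondary/Unknown ld:Consistent
"""


def parse_proc_drbd(text):
    """Hypothetical helper: map each device minor to its state fields."""
    devices = {}
    for line in text.splitlines():
        match = DEVICE_LINE.match(line)
        if match:
            fields = match.groupdict()
            minor = int(fields.pop("minor"))
            devices[minor] = fields
    return devices


def standalone_minors(devices):
    """Hypothetical helper: sorted minors whose connection state is StandAlone."""
    return sorted(m for m, d in devices.items() if d["cs"] == "StandAlone")


if __name__ == "__main__":
    for name, text in (("db02", DB02), ("db01", DB01)):
        devices = parse_proc_drbd(text)
        print(name, "StandAlone devices:", standalone_minors(devices))
```

On a live node the text would come from reading /proc/drbd instead of the
sample strings. The sketch does not decide which node's data should win;
it only makes the mismatch between the two sides explicit.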


The following are logs from db01 during startup:

Starting DRBD resources:    Jan  9 15:41:06 sdcdb01
kernel: drbd: initialised. Version: 0.7.21
(api:79/proto:74)
Jan  9 15:41:06 sdcdb01 kernel: drbd: SVN Revision:
2326 build by buildcentos at x8664-build.centos.org,
2006-10-07 05:47:44
Jan  9 15:41:06 sdcdb01 kernel: drbd: registered as
block device major 147
[ d0 Jan  9 15:41:12 sdcdb01 kernel: drbd0: resync
bitmap: bits=71366747 words=1115106
Jan  9 15:41:12 sdcdb01 kernel: drbd0: size = 272 GB
(285466986 KB)
Jan  9 15:41:12 sdcdb01 kernel: klogd 1.4.1,
---------- state change ----------
d1 s0 s1 n0 n1 ].
Jan  9 15:41:14 sdcdb01 kernel: drbd0: 156 KB marked
out-of-sync by on disk bit-map.
Jan  9 15:41:14 sdcdb01 kernel: drbd0: Found 6
transactions (276 active extents) in activity log.
Jan  9 15:41:14 sdcdb01 kernel: drbd0: drbdsetup
[4774]: cstate Unconfigured --> StandAlone
Jan  9 15:41:14 sdcdb01 kernel: drbd1: resync bitmap:
bits=15729636 words=245776
Jan  9 15:41:14 sdcdb01 kernel: drbd1: size = 60 GB
(62918541 KB)
Jan  9 15:41:14 sdcdb01 kernel: drbd1: 32 KB marked
out-of-sync by on disk bit-map.
Jan  9 15:41:14 sdcdb01 kernel: drbd1: Found 6
transactions (273 active extents) in activity log.
Jan  9 15:41:14 sdcdb01 kernel: drbd1: drbdsetup
[4778]: cstate Unconfigured --> StandAlone
Jan  9 15:41:14 sdcdb01 kernel: drbd0: drbdsetup
[4796]: cstate StandAlone --> Unconnected
Jan  9 15:41:14 sdcdb01 kernel: drbd0: drbd0_receiver
[4797]: cstate Unconnected --> WFConnection
Jan  9 15:41:14 sdcdb01 kernel: drbd1: drbdsetup
[4804]: cstate StandAlone --> Unconnected
Jan  9 15:41:14 sdcdb01 kernel: drbd1: drbd1_receiver
[4805]: cstate Unconnected --> WFConnection
..........
***************************************************************
DRBD's startup script waits for the peer node(s) to
appear.
- In case this node was already a degraded cluster
before the
   reboot the timeout is 60 seconds.
[degr-wfc-timeout]
- If the peer was available before the reboot the
timeout will
   expire after 0 seconds. [wfc-timeout]
   (These values are for resource 'cdbdata'; 0 sec ->
wait forever)
To abort waiting enter 'yes' [  90]:
To abort waiting enter 'yes' [  94]:
To abort waiting enter 'yes' [  94]:
To abort waiting enter 'yes' [  95]:
To abort waiting enter 'yes' [  95]:
To abort waiting enter 'yes' [  95]:
To abort waiting enter 'yes' [  95]:
To abort waiting enter 'yes' [ 100]:Jan  9 15:42:58
sdcdb01 ntpd[3532]: time reset -0.423527 s
Jan  9 15:42:58 sdcdb01 ntpd[3532]: kernel time sync
enabled 0001
                              [ 105]:
To abort waiting enter 'yes' [ 109]:
To abort waiting enter 'yes' [ 109]:



Any input would be greatly appreciated.

Thanks,

Clint
