[DRBD-user] split brain detected when switching back to the 2node cluster from the DR node

Pierre LEBRECH pierre.lebrech at laposte.net
Tue Aug 4 18:58:15 CEST 2009



Hello,

I always get a split brain when I switch the HA services back to the 2-node cluster from my DR node.

Here are the steps I follow:

- HA services are on the DR node
- I stop these HA services
- I umount the data

The state of DRBD on node3 is as follows:

------------------------------
version: 8.3.2 (api:88/proto:86-90)
GIT-hash: dd7985327f146f33b86d4bff5ca8c94234ce840e build by root at hcns1, 2009-08-04 09:41:09

 1: cs:WFConnection ro:Primary/Unknown ds:UpToDate/DUnknown C r----
    ns:0 nr:34083396 dw:34084032 dr:68168163 al:12 bm:2094 lo:0 pe:0 ua:0 ap:0 ep:1 wo:f oos:208
------------------------------
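
To read the node3 status above field by field, here is a minimal Python sketch. It is not part of DRBD; the helper name parse_fields is my own, and the two lines are copied from the output above. As far as I understand the DRBD documentation, oos is the amount of out-of-sync data.

```python
import re

# Lines copied verbatim from the node3 /proc/drbd output above.
STATUS = " 1: cs:WFConnection ro:Primary/Unknown ds:UpToDate/DUnknown C r----"
COUNTERS = ("    ns:0 nr:34083396 dw:34084032 dr:68168163 al:12 bm:2094 "
            "lo:0 pe:0 ua:0 ap:0 ep:1 wo:f oos:208")


def parse_fields(line):
    """Return the key:value tokens of one /proc/drbd line as a dict of strings.

    Hypothetical helper for illustration only. Tokens without a value
    directly after the colon (such as the leading minor number "1:") are skipped.
    """
    return dict(re.findall(r"(\w+):(\S+)", line))


status = parse_fields(STATUS)
counters = parse_fields(COUNTERS)

# "ro" and "ds" are written as local/peer.
local_role, peer_role = status["ro"].split("/")
local_disk, peer_disk = status["ds"].split("/")

print("connection state:", status["cs"])
print("roles (local/peer):", local_role, peer_role)
print("disk states (local/peer):", local_disk, peer_disk)
print("out-of-sync counter:", int(counters["oos"]))
```

So at this point node3 is still Primary, is waiting for a connection, and reports a non-zero oos value.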

Then, on node1:

- drbdadm primary r0
- I start the HA IP
- drbdadm --stacked up r0-U

At this point, everything is OK. Here is the output of cat /proc/drbd:

------------------------------
version: 8.3.2 (api:88/proto:86-90)
GIT-hash: dd7985327f146f33b86d4bff5ca8c94234ce840e build by root at hans1, 2009-08-04 09:43:39
 0: cs:Connected ro:Primary/Secondary ds:UpToDate/UpToDate C r----
    ns:13706 nr:174 dw:15096 dr:102401147 al:30 bm:2170 lo:0 pe:0 ua:0 ap:0 ep:1 wo:f oos:0
 1: cs:Connected ro:Secondary/Primary ds:UpToDate/UpToDate C r----
    ns:0 nr:244 dw:244 dr:416 al:0 bm:9 lo:0 pe:0 ua:0 ap:0 ep:1 wo:d oos:0
------------------------------
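
By "OK" I mean that every device is Connected and UpToDate on both sides. The sketch below spells out that check in Python; the definition of "OK" and the function names device_states and all_ok are my own assumptions for illustration, not something DRBD provides.

```python
import re

# Status lines copied from the node1 /proc/drbd output above.
NODE1_OUTPUT = """\
 0: cs:Connected ro:Primary/Secondary ds:UpToDate/UpToDate C r----
 1: cs:Connected ro:Secondary/Primary ds:UpToDate/UpToDate C r----
"""

# One match per device line; ro/ds are optional because an unconfigured
# device only shows "cs:Unconfigured".
DEVICE_RE = re.compile(r"^\s*(\d+): cs:(\S+)(?: ro:(\S+) ds:(\S+))?", re.M)


def device_states(text):
    """Map each DRBD minor number to its cs/ro/ds strings (hypothetical helper)."""
    states = {}
    for match in DEVICE_RE.finditer(text):
        minor, cs, ro, ds = match.groups()
        states[int(minor)] = {"cs": cs, "ro": ro, "ds": ds}
    return states


def all_ok(text):
    """Assumed definition of OK: at least one device, all Connected and UpToDate/UpToDate."""
    states = device_states(text)
    return bool(states) and all(
        s["cs"] == "Connected" and s["ds"] == "UpToDate/UpToDate"
        for s in states.values()
    )


print(device_states(NODE1_OUTPUT))
print("all devices OK:", all_ok(NODE1_OUTPUT))
```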

Then I set everything back (reset):

On node1:

- drbdadm --stacked down r0-U
- drbdadm secondary r0
- I stop the HA IP

The state on node1 is as follows:

------------------------------
version: 8.3.2 (api:88/proto:86-90)
GIT-hash: dd7985327f146f33b86d4bff5ca8c94234ce840e build by root at hans1, 2009-08-04 09:43:39
 0: cs:Connected ro:Secondary/Secondary ds:UpToDate/UpToDate C r----
    ns:13707 nr:174 dw:15097 dr:102401147 al:30 bm:2186 lo:0 pe:0 ua:0 ap:0 ep:1 wo:f oos:0
 1: cs:Unconfigured
------------------------------
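
To see what changed on device 0 of node1 between the two outputs above, the counters can be compared. This is only a sketch; the helper name numeric_counters is my own, and the two lines are copied from the outputs above. If I read the DRBD documentation correctly, ns is network send, dw is disk write and bm is bitmap updates.

```python
import re

# Counter lines of device 0 on node1, copied from the two outputs above.
BEFORE = ("ns:13706 nr:174 dw:15096 dr:102401147 al:30 bm:2170 "
          "lo:0 pe:0 ua:0 ap:0 ep:1 wo:f oos:0")
AFTER = ("ns:13707 nr:174 dw:15097 dr:102401147 al:30 bm:2186 "
         "lo:0 pe:0 ua:0 ap:0 ep:1 wo:f oos:0")


def numeric_counters(line):
    """Return the purely numeric key:value tokens as ints (hypothetical helper).

    Non-numeric tokens such as "wo:f" are skipped.
    """
    return {key: int(value)
            for key, value in re.findall(r"(\w+):(\d+)(?!\S)", line)}


before = numeric_counters(BEFORE)
after = numeric_counters(AFTER)

# Keep only the counters whose value differs between the two snapshots.
changed = {key: after[key] - before[key]
           for key in before if after[key] != before[key]}

print(changed)
```

Only ns, dw and bm differ between the two snapshots; oos stays at 0 on node1.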

On node3, I run this command to reset the state:

- drbdadm secondary r0-U

Then, on node1 and node2, I start heartbeat normally.



Well, each time I follow these steps, node3 gets a split brain.

Where is the problem?




Context: 3-node cluster, every node connected, HA services on node1, DRBD version 8.3.2 on Linux 2.6.30.



