[DRBD-user] Problem with split-brain in drbd 0.8pre3

Wed Jul 26 14:23:04 CEST 2006

I have two nodes, both using internal metadata. Simulating a network failure
while one node has a filesystem on the DRBD resource mounted naturally
gives me a split brain. However, I'm not able to recover from the
split brain in an easy manner.
My question is: why won't it sync when the secondary, node2, is outdated?
My command and output history:

[root at node1 ~]# modprobe drbd
[root at node2 ~]# modprobe drbd
[root at node1 ~]# drbdadm up all
[root at node2 ~]# drbdadm up all
[root at node1 ~]# drbdadm primary all
[root at node2 ~]# drbdadm primary all
[root at node1 ~]# cat /proc/drbd
version: 8.0pre3 (api:82/proto:80)
SVN Revision: 2169 build by root at node1.itinerary.com, 2006-06-29 14:42:49
 0: cs:Connected st:Primary/Primary ds:UpToDate/UpToDate r---
    ns:0 nr:0 dw:0 dr:0 al:0 bm:0 lo:0 pe:0 ua:0 ap:0
        resync: used:0/7 hits:0 misses:0 starving:0 dirty:0 changed:0
        act_log: used:0/257 hits:0 misses:0 starving:0 dirty:0 changed:0
 1: cs:Unconfigured
[root at node1 ~]# mount /dev/drbdvg/test-root /mnt/ -text3
[root at node2 ~]# ifconfig eth0 down
[root at node2 ~]#
drbd0: PingAck did not arrive in time.
drbd0: short read expecting header on sock: r=-512
[root at node2 ~]# cat /proc/drbd
version: 8.0pre3 (api:82/proto:80)
SVN Revision: 2169 build by root at node1.itinerary.com, 2006-06-29 14:42:49
 0: cs:WFConnection st:Primary/Unknown ds:UpToDate/DUnknown r---
    ns:0 nr:484 dw:484 dr:416 al:0 bm:0 lo:0 pe:0 ua:0 ap:0
        resync: used:0/7 hits:0 misses:0 starving:0 dirty:0 changed:0
        act_log: used:0/257 hits:0 misses:0 starving:0 dirty:0 changed:0
 1: cs:Unconfigured
[root at node2 ~]# ifconfig eth0 up
[root at node2 ~]#
drbd0: Split-Brain detected, dropping connection!
drbd0: error receiving ReportState, l: 4!
[root at node2 ~]# reboot
(later)
[root at node2 ~]# modprobe drbd
[root at node2 ~]# drbdadm outdate all
[root at node2 ~]# drbdadm up all
drbd0: Backing device has merge_bvec_fn()!
drbd0: No usable activity log found.
[root at node2 ~]# cat /proc/drbd
version: 8.0pre3 (api:82/proto:80)
SVN Revision: 2169 build by root at node1.itinerary.com, 2006-06-29 14:42:49
 0: cs:WFConnection st:Secondary/Unknown ds:Outdated/DUnknown r---
    ns:0 nr:0 dw:0 dr:0 al:0 bm:0 lo:0 pe:0 ua:0 ap:0
        resync: used:0/7 hits:0 misses:0 starving:0 dirty:0 changed:0
        act_log: used:0/257 hits:0 misses:0 starving:0 dirty:0 changed:0
 1: cs:Unconfigured

[root at node1 ~]# drbdadm connect all
[root at node2 ~]#
drbd0: Split-Brain detected, dropping connection!
drbd0: error receiving ReportState, l: 4!
[root at node2 ~]# drbdadm down all
[root at node2 ~]# drbdadm create-md all

Valid meta-data already in place, recreate new?
[need to type 'yes' to confirm] yes

Creating meta data...
initialising activity log
initialising bitmap (3296 KB)
99%New drbd meta data block sucessfully created.

[root at node2 ~]# drbdadm up all
drbd0: No usable activity log found.
[root at node2 ~]# cat /proc/drbd
version: 8.0pre3 (api:82/proto:80)
SVN Revision: 2169 build by root at node1.itinerary.com, 2006-06-29 14:42:49
 0: cs:WFConnection st:Secondary/Unknown ds:Inconsistent/DUnknown r---
    ns:0 nr:0 dw:0 dr:0 al:0 bm:0 lo:0 pe:0 ua:0 ap:0
        resync: used:0/7 hits:0 misses:0 starving:0 dirty:0 changed:0
        act_log: used:0/257 hits:0 misses:0 starving:0 dirty:0 changed:0
 1: cs:Unconfigured
[root at node1 ~]# drbdadm connect all
[root at node2 ~]# cat /proc/drbd
version: 8.0pre3 (api:82/proto:80)
SVN Revision: 2169 build by root at node1.itinerary.com, 2006-06-29 14:42:49
 0: cs:SyncTarget st:Secondary/Primary ds:Inconsistent/UpToDate r---
    ns:0 nr:18193408 dw:18193376 dr:0 al:0 bm:1110 lo:9 pe:1351 ua:12 ap:0
        [===>................] sync'ed: 16.9% (87656/105423)M
        finish: 0:14:15 speed: 104,896 (92,348) K/sec
        resync: used:4/7 hits:1137313 misses:1114 starving:0 dirty:0 changed:1114
        act_log: used:0/257 hits:0 misses:0 starving:0 dirty:0 changed:0
 1: cs:Unconfigured

So is create-md the only solution after a split brain? I can't
invalidate node2 with drbdadm invalidate all since it's not in the Connected
state, and connecting it just reports a split brain and disconnects the nodes.
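
The state check I keep doing by eye in /proc/drbd could be scripted. What follows is only a minimal sketch, assuming the status line format shown in the output above (cs:, st: and ds: fields separated by spaces). The helper name drbd_field is hypothetical and not part of DRBD, and the sample line is copied from node2's output after the outdate.

```shell
#!/bin/sh
# Print one field (cs, st or ds) from a /proc/drbd status line.
# drbd_field is a hypothetical helper name, not a DRBD command.
drbd_field() {
    # $1 = field name, $2 = status line, for example
    # " 0: cs:WFConnection st:Secondary/Unknown ds:Outdated/DUnknown r---"
    # Split the line into one token per line, then keep the value
    # of the token that starts with "<field>:".
    printf '%s\n' "$2" | tr ' ' '\n' | sed -n "s/^$1://p"
}

# Sample status line taken from the transcript above.
line=' 0: cs:WFConnection st:Secondary/Unknown ds:Outdated/DUnknown r---'

cs=$(drbd_field cs "$line")
ds=$(drbd_field ds "$line")
echo "connection=$cs disk=$ds"

# Only consider an invalidate when the peers are actually connected.
if [ "$cs" = "Connected" ]; then
    echo "connected: invalidate could be attempted"
else
    echo "not connected ($cs): skipping invalidate"
fi
```

On a live node the line would come from /proc/drbd instead of the hard-coded sample, for example by selecting the line that starts with the device number.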

If it's of interest, my conf looks like this:
global {
   usage-count yes;
}
common {
   syncer { rate 200M; }
}
resource r0 {
     protocol C;
     handlers {
      pri-on-incon-degr "halt -f";
      pri-lost-after-sb "halt -f";
      outdate-peer "/usr/lib/drbd/outdate-peer.sh on amd 192.168.22.11 192.168.23.11 on alf 192.168.22.12 192.168.23.12";
     }
     startup {
      degr-wfc-timeout 120;    # 2 minutes.
     }
     disk {
      on-io-error   pass_on;
     }
     net {
      allow-two-primaries;
      cram-hmac-alg "sha1";
      shared-secret "password";
      after-sb-0pri disconnect;
      after-sb-1pri disconnect;
      after-sb-2pri disconnect;
     }
     syncer {
      al-extents 257;
     }
     on node1.domain.com {
      device     /dev/drbd0;
      disk       /dev/md0;
      address    10.1.1.26:7788;
      flexible-meta-disk  internal;
     }
     on node2.domain.com {
      device    /dev/drbd0;
      disk      /dev/md0;
      address   10.1.1.30:7788;
      meta-disk internal;
     }
}
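
All three after-sb-* policies in the net section are set to disconnect, which as far as I understand tells DRBD to drop the connection rather than choose a sync direction by itself. To double-check what is actually configured, the policies can be listed with a small filter. This is only a sketch: list_after_sb is a hypothetical helper name, and the sample input is copied from the net section above rather than read from the real config file.

```shell
#!/bin/sh
# List the after-sb-* policies found in drbd.conf-style text on stdin.
# list_after_sb is a hypothetical helper name, not a DRBD tool.
list_after_sb() {
    # Match lines like "  after-sb-0pri disconnect;" and print
    # "option policy" without the trailing semicolon.
    sed -n 's/^[[:space:]]*\(after-sb-[0-9]pri\)[[:space:]]*\([^;]*\);.*$/\1 \2/p'
}

# Sample input copied from the net section of the config above.
conf='net {
  allow-two-primaries;
  after-sb-0pri disconnect;
  after-sb-1pri disconnect;
  after-sb-2pri disconnect;
}'

printf '%s\n' "$conf" | list_after_sb
```

Against a real file the same filter would be fed with the config on stdin, wherever that file lives on the node.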

Best regards,
/Håkan