[DRBD-user] Split brain in a dual primary configuration

Jean-Francois Chevrette jfchevrette at iweb.com
Fri Oct 30 19:43:11 CET 2009



Hello,

we have a two-node Citrix XenServer cluster on which each node has a 
local partition configured as a DRBD resource. The resource is set to 
become primary on both nodes simultaneously. XenServer uses LVM, and it 
is my understanding that it works in such a way that no LV will ever be 
in use on both hosts at the same time, thus ensuring consistency 
between our dual-primary hosts.

For the DRBD connectivity, both nodes are connected directly through a 
cross-over cable.

For testing purposes, we unplugged the network interfaces, which forced 
both nodes into WFConnection and a Primary/Unknown state. VMs on each 
node kept working as usual.

However, after reconnecting the network interfaces, both nodes became 
StandAlone and the logs showed that a split brain had been detected. It 
was my understanding that DRBD would have been able to sync the 
out-of-sync (OOS) blocks from each node to the other properly.

What is supposed to happen when the nodes of a dual-primary 
configuration reconnect to each other?

Our configuration is as follows:

global {
   usage-count no;
}

common {
   protocol C;

   startup {
     become-primary-on both;
   }

   syncer {
     rate 33M;
     verify-alg crc32c;
     al-extents 1801;
   }
   net {
     cram-hmac-alg sha1;
     max-epoch-size 8192;
     max-buffers 8192;
     after-sb-0pri discard-zero-changes;
     after-sb-1pri discard-secondary;
     after-sb-2pri disconnect;
     allow-two-primaries;
   }

   disk {
     on-io-error detach;
     no-disk-flushes;
     no-disk-barrier;
     no-md-flushes;
   }
}

resource drbd0 {
   disk /dev/sda3;
   device /dev/drbd0;
   flexible-meta-disk internal;
   on node1 {
     address 10.10.0.1:7788;
   }
   on node2 {
     address 10.10.0.2:7788;
   }
}
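As an aside, the DRBD documentation describes a split-brain handler that 
can notify an operator when this situation is detected. A sketch of how 
we might add it to our common section (the recipient "root" is just an 
example; notify-split-brain.sh ships with DRBD 8.3):

common {
   handlers {
     # alert an operator as soon as a split brain is detected
     split-brain "/usr/lib/drbd/notify-split-brain.sh root";
   }
}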


Logs from when we reconnected both nodes:
block drbd0: Handshake successful: Agreed network protocol version 91
block drbd0: Peer authenticated using 20 bytes of 'sha1' HMAC
block drbd0: conn( WFConnection -> WFReportParams )
block drbd0: Starting asender thread (from drbd0_receiver [7644])
block drbd0: data-integrity-alg: <not-used>
block drbd0: drbd_sync_handshake:
block drbd0: self 
95BA39C140141F17:ADE0E340AD8230BB:0CAA835AA97548CC:CF72ED70E8F22F57 
bits:160 flags:0
block drbd0: peer 
F83F651106A22A31:ADE0E340AD8230BB:0CAA835AA97548CC:CF72ED70E8F22F57 
bits:51795 flags:0
block drbd0: uuid_compare()=100 by rule 90
block drbd0: Split-Brain detected, dropping connection!
block drbd0: helper command: /sbin/drbdadm split-brain minor-0
block drbd0: helper command: /sbin/drbdadm split-brain minor-0 exit code 
0 (0x0)
block drbd0: conn( WFReportParams -> Disconnecting )
block drbd0: error receiving ReportState, l: 4!
block drbd0: asender terminated
block drbd0: Terminating asender thread
block drbd0: Connection closed
block drbd0: conn( Disconnecting -> StandAlone )
block drbd0: receiver terminated
block drbd0: Terminating receiver thread
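For completeness, the manual recovery procedure described in the DRBD 
User's Guide (untested on our cluster so far; drbd0 is our resource 
name, and choosing which node's changes to discard is up to the 
operator) looks like this:

# on the split-brain victim, whose changes will be discarded
drbdadm secondary drbd0
drbdadm -- --discard-my-data connect drbd0

# on the surviving node (only needed if it is also StandAlone)
drbdadm connect drbd0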



Can anyone tell me why I am not getting the behavior I am expecting?


Regards,
-- 
Jean-François Chevrette [iWeb]




