Hello,

We have a two-node Citrix XenServer cluster in which each node has a local partition configured as a DRBD resource. The resource is set to become primary on both nodes simultaneously. XenServer uses LVM, and my understanding is that it works in such a way that no LV is ever in use on both hosts at the same time, which ensures consistency between our dual-primary hosts. For DRBD connectivity, the two nodes are connected directly through a crossover cable.

For testing purposes, we unplugged the network interfaces, forcing both nodes into WFConnection with a Primary/Unknown state. VMs on each node kept working as usual. However, after we reconnected the network interfaces, both nodes became StandAlone and the logs showed that a split brain had been detected. My understanding was that DRBD would be able to properly sync the out-of-sync (OOS) blocks from each node to the other.

What is supposed to happen when the nodes of a dual-primary configuration reconnect to each other?
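A rough model of the reconnect decision, assuming DRBD 8.3 behaves as its documented after-sb-* options describe: once the handshake flags a split brain, DRBD consults after-sb-0pri, after-sb-1pri or after-sb-2pri depending on how many of the two nodes are Primary at that moment. The dictionary copies the three values from the net section of the configuration in this post; the helper name policy_for is hypothetical and this is not DRBD code.

```python
from __future__ import annotations

# Values copied from the "net" section of the configuration in this post.
AFTER_SB_POLICIES = {
    "after-sb-0pri": "discard-zero-changes",
    "after-sb-1pri": "discard-secondary",
    "after-sb-2pri": "disconnect",
}


def policy_for(local_primary: bool, peer_primary: bool) -> tuple[str, str]:
    """Hypothetical helper: pick the after-sb option by number of Primaries.

    Models only the documented selection rule; it is not DRBD's implementation.
    """
    primaries = int(local_primary) + int(peer_primary)
    option = f"after-sb-{primaries}pri"
    return option, AFTER_SB_POLICIES[option]


if __name__ == "__main__":
    # Both nodes stayed Primary while the cable was unplugged.
    print(policy_for(True, True))  # ('after-sb-2pri', 'disconnect')
```

Under this model, two Primaries select after-sb-2pri, whose configured value here is disconnect, which would be consistent with both nodes ending up StandAlone.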
Our configuration is as follows:

global {
  usage-count no;
}
common {
  protocol C;
  startup {
    become-primary-on both;
  }
  syncer {
    rate 33M;
    verify-alg crc32c;
    al-extents 1801;
  }
  net {
    cram-hmac-alg sha1;
    max-epoch-size 8192;
    max-buffers 8192;
    after-sb-0pri discard-zero-changes;
    after-sb-1pri discard-secondary;
    after-sb-2pri disconnect;
    allow-two-primaries;
  }
  disk {
    on-io-error detach;
    no-disk-flushes;
    no-disk-barrier;
    no-md-flushes;
  }
}
resource drbd0 {
  disk /dev/sda3;
  device /dev/drbd0;
  flexible-meta-disk internal;
  on node1 {
    address 10.10.0.1:7788;
  }
  on node2 {
    address 10.10.0.2:7788;
  }
}

Logs from when we reconnected both nodes:

block drbd0: Handshake successful: Agreed network protocol version 91
block drbd0: Peer authenticated using 20 bytes of 'sha1' HMAC
block drbd0: conn( WFConnection -> WFReportParams )
block drbd0: Starting asender thread (from drbd0_receiver [7644])
block drbd0: data-integrity-alg: <not-used>
block drbd0: drbd_sync_handshake:
block drbd0: self 95BA39C140141F17:ADE0E340AD8230BB:0CAA835AA97548CC:CF72ED70E8F22F57 bits:160 flags:0
block drbd0: peer F83F651106A22A31:ADE0E340AD8230BB:0CAA835AA97548CC:CF72ED70E8F22F57 bits:51795 flags:0
block drbd0: uuid_compare()=100 by rule 90
block drbd0: Split-Brain detected, dropping connection!
block drbd0: helper command: /sbin/drbdadm split-brain minor-0
block drbd0: helper command: /sbin/drbdadm split-brain minor-0 exit code 0 (0x0)
block drbd0: conn( WFReportParams -> Disconnecting )
block drbd0: error receiving ReportState, l: 4!
block drbd0: asender terminated
block drbd0: Terminating asender thread
block drbd0: Connection closed
block drbd0: conn( Disconnecting -> StandAlone )
block drbd0: receiver terminated
block drbd0: Terminating receiver thread

Can anyone tell me why I am not getting the behavior I expect?

Regards,

-- 
Jean-François Chevrette [iWeb]
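To read the two drbd_sync_handshake lines in the log, a simplified sketch may help. It assumes the four colon-separated values are the current, bitmap and two history UUIDs, that "bits" counts set bits in the out-of-sync bitmap, and that the lowest UUID bit is a role flag ignored during comparison. The function names parse_handshake and classify are hypothetical, and classify reproduces only a few comparisons (history-UUID and initial-sync rules are omitted); it is not DRBD's uuid_compare().

```python
import re

# The two drbd_sync_handshake lines quoted in the log in this post.
SELF_LINE = ("block drbd0: self 95BA39C140141F17:ADE0E340AD8230BB:"
             "0CAA835AA97548CC:CF72ED70E8F22F57 bits:160 flags:0")
PEER_LINE = ("block drbd0: peer F83F651106A22A31:ADE0E340AD8230BB:"
             "0CAA835AA97548CC:CF72ED70E8F22F57 bits:51795 flags:0")

UUID_RE = re.compile(r"(?:self|peer) ((?:[0-9A-F]{16}:){3}[0-9A-F]{16}) bits:(\d+)")


def parse_handshake(line: str) -> dict:
    """Split a handshake line into current, bitmap and history UUIDs."""
    m = UUID_RE.search(line)
    if m is None:
        raise ValueError(f"not a handshake UUID line: {line!r}")
    current, bitmap, hist1, hist2 = (int(u, 16) for u in m.group(1).split(":"))
    return {"current": current, "bitmap": bitmap,
            "history": (hist1, hist2), "bits": int(m.group(2))}


def _strip(uuid: int) -> int:
    # Assumption: the lowest bit is a role flag and is ignored when comparing.
    return uuid & ~1


def classify(local: dict, peer: dict) -> str:
    """Simplified subset of a UUID comparison; history rules omitted."""
    if _strip(local["current"]) == _strip(peer["current"]):
        return "no resync needed"
    if _strip(local["current"]) == _strip(peer["bitmap"]):
        return "local becomes sync target"
    if _strip(local["bitmap"]) == _strip(peer["current"]):
        return "local becomes sync source"
    if local["bitmap"] != 0 and _strip(local["bitmap"]) == _strip(peer["bitmap"]):
        # Both sides generated new current UUIDs since their common point.
        return "split brain"
    return "undecided by this sketch"


if __name__ == "__main__":
    print(classify(parse_handshake(SELF_LINE), parse_handshake(PEER_LINE)))
```

For these two lines the current UUIDs differ while the bitmap UUIDs are identical, so the sketch reports "split brain": each node wrote to the device while disconnected, and neither can simply be made the sync source of the other.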