Hello,
we have a Citrix XenServer two-node cluster in which both nodes have a
local partition configured as a DRBD resource. The resource is set to
become primary on both nodes simultaneously. XenServer uses LVM, and it
is my understanding that it works in such a way that no LV will ever be
in use on both hosts at the same time, which ensures consistency
between our dual-primary hosts.
For the DRBD connectivity, both nodes are connected directly through a
cross-over cable.
For testing purposes, we unplugged the network interfaces and thus
forced both nodes into the WFConnection state with Primary/Unknown roles.
VMs on each node kept working as usual.
However, after reconnecting the network interfaces, both nodes became
StandAlone and the logs showed that a split brain had been detected.
It was my understanding that DRBD would have been able to sync the
out-of-sync (OOS) blocks from each node to the other one properly.
What is supposed to happen when the nodes of a dual-primary
configuration reconnect to each other?
Our configuration is as follows:
global {
    usage-count no;
}
common {
    protocol C;
    startup {
        become-primary-on both;
    }
    syncer {
        rate 33M;
        verify-alg crc32c;
        al-extents 1801;
    }
    net {
        cram-hmac-alg sha1;
        max-epoch-size 8192;
        max-buffers 8192;
        after-sb-0pri discard-zero-changes;
        after-sb-1pri discard-secondary;
        after-sb-2pri disconnect;
        allow-two-primaries;
    }
    disk {
        on-io-error detach;
        no-disk-flushes;
        no-disk-barrier;
        no-md-flushes;
    }
}
resource drbd0 {
    disk /dev/sda3;
    device /dev/drbd0;
    flexible-meta-disk internal;
    on node1 {
        address 10.10.0.1:7788;
    }
    on node2 {
        address 10.10.0.2:7788;
    }
}
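A minimal sketch of how the three after-sb-* options above appear to be
selected, assuming the policy is chosen by the number of nodes in the
Primary role at the moment the split brain is detected. The table and
the helper policy_for are purely illustrative and are not DRBD code:

```python
# Hypothetical model, not DRBD code: pick the configured after-sb policy
# by counting how many nodes are Primary when the split brain is detected.
AFTER_SB = {
    0: "discard-zero-changes",  # after-sb-0pri
    1: "discard-secondary",     # after-sb-1pri
    2: "disconnect",            # after-sb-2pri
}

def policy_for(roles):
    """Return the policy for a pair of roles such as ("Primary", "Primary")."""
    primaries = sum(1 for role in roles if role == "Primary")
    return AFTER_SB[primaries]

print(policy_for(("Primary", "Primary")))  # disconnect
```

Under that assumption, two nodes that are both Primary would fall under
after-sb-2pri, which is set to disconnect in this configuration.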
Logs from when we reconnected both nodes:
block drbd0: Handshake successful: Agreed network protocol version 91
block drbd0: Peer authenticated using 20 bytes of 'sha1' HMAC
block drbd0: conn( WFConnection -> WFReportParams )
block drbd0: Starting asender thread (from drbd0_receiver [7644])
block drbd0: data-integrity-alg: <not-used>
block drbd0: drbd_sync_handshake:
block drbd0: self 95BA39C140141F17:ADE0E340AD8230BB:0CAA835AA97548CC:CF72ED70E8F22F57 bits:160 flags:0
block drbd0: peer F83F651106A22A31:ADE0E340AD8230BB:0CAA835AA97548CC:CF72ED70E8F22F57 bits:51795 flags:0
block drbd0: uuid_compare()=100 by rule 90
block drbd0: Split-Brain detected, dropping connection!
block drbd0: helper command: /sbin/drbdadm split-brain minor-0
block drbd0: helper command: /sbin/drbdadm split-brain minor-0 exit code 0 (0x0)
block drbd0: conn( WFReportParams -> Disconnecting )
block drbd0: error receiving ReportState, l: 4!
block drbd0: asender terminated
block drbd0: Terminating asender thread
block drbd0: Connection closed
block drbd0: conn( Disconnecting -> StandAlone )
block drbd0: receiver terminated
block drbd0: Terminating receiver thread
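The two identifier lines from the handshake can be compared field by
field. This is only a sketch, assuming the four colon-separated fields
are the current, bitmap, and two history UUIDs in that order; the slot
names and the helper compare_uuids are illustrative and not part of DRBD:

```python
# Generation identifiers copied from the drbd_sync_handshake log lines above.
SELF = "95BA39C140141F17:ADE0E340AD8230BB:0CAA835AA97548CC:CF72ED70E8F22F57"
PEER = "F83F651106A22A31:ADE0E340AD8230BB:0CAA835AA97548CC:CF72ED70E8F22F57"

# Assumed slot order: current, bitmap, then two history entries.
SLOTS = ("current", "bitmap", "history1", "history2")

def compare_uuids(self_line, peer_line):
    """Map each slot name to True when both nodes report the same value."""
    pairs = zip(self_line.split(":"), peer_line.split(":"))
    return {slot: a == b for slot, (a, b) in zip(SLOTS, pairs)}

result = compare_uuids(SELF, PEER)
for slot in SLOTS:
    print(slot, "match" if result[slot] else "differ")
```

Only the first field differs between the nodes, and both sides report
out-of-sync bits (160 and 51795). That looks consistent with each node
having written on its own while disconnected, although this reading is
an interpretation and not something the log states directly.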
Can anyone tell me why I am not getting the behavior I am expecting?
Regards,
--
Jean-François Chevrette [iWeb]