[DRBD-user] drbd/heartbeat leads to brain-split
rb at megabit.net
Wed Oct 17 16:52:34 CEST 2007
I'm currently trying to set up a drbd/iscsi/heartbeat environment. This seems to work out quite well:
- two boxes with drbd storage
- heartbeat to monitor links and to start iscsi-target etc.
Upon failure of the primary node, heartbeat starts iscsi-target on the secondary and makes it the primary node. This works great even while clients are writing to the filesystem via iscsi - the iscsi initiator simply reconnects to the cluster-IP and continues writing.
But as soon as node1 comes back, both nodes complain about a "split brain" situation and refuse to resync, although nothing has been written to the device on node1 since its disconnect! Shouldn't heartbeat handle this situation? On top of that, I'm not able to resync the devices without playing around with various drbdadm commands (including taking both sides down completely, which would not be an acceptable solution in a production environment).
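For reference, the manual split-brain recovery usually described for DRBD 8.0 can be sketched as below. This is a dry-run helper under stated assumptions, not a tested procedure for this cluster: the function name sb_recover_cmds is hypothetical, the resource name "storage" is taken from the drbddisk::storage entry in the heartbeat configuration, and the choice of which node is the "victim" (the one whose changes are thrown away) has to be made by the administrator. The helper only prints the commands; it does not run them.

```shell
#!/bin/sh
# Hypothetical dry-run helper: prints the drbdadm commands commonly
# documented for manual split-brain recovery on DRBD 8.0. Nothing is executed.
#   $1 = DRBD resource name (assumed "storage" here)
#   $2 = role of the node the commands are meant for: victim | survivor
sb_recover_cmds() {
    res=$1
    role=$2
    case $role in
        victim)
            # Node whose local modifications will be discarded.
            echo "drbdadm secondary $res"
            echo "drbdadm -- --discard-my-data connect $res"
            ;;
        survivor)
            # Node whose data is kept; only needed if it is StandAlone.
            echo "drbdadm connect $res"
            ;;
        *)
            echo "usage: sb_recover_cmds <resource> victim|survivor" >&2
            return 1
            ;;
    esac
}

# Show what would be run on the node that gives up its data.
sb_recover_cmds storage victim
```

Printing instead of executing is deliberate: discarding data on the wrong node is not reversible, so the output is meant to be reviewed before anything is typed on a real system.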
Both sides are identically configured Debian etch systems, using heartbeat v2 packages and drbd8 packages from backports.org:
version: 8.0.4 (api:86/proto:86)
SVN Revision: 2947 build by root at nas02, 2007-10-16 13:43:43
Please note that the two systems do NOT have equal hardware components; it's just a test environment with different storage capacities (~230GB vs. ~60GB).
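The dmesg output further down reports the two backing device sizes in sectors (121720712s and 455185536s). As a quick sanity check, they can be converted to decimal gigabytes. This sketch assumes 512-byte sectors, which the log itself does not state, and the helper name sectors_to_gb is made up for illustration.

```shell
#!/bin/sh
# Assumption: the "s" suffix in the DRBD log means 512-byte sectors.
# Converts a sector count to whole decimal gigabytes (10^9 bytes),
# rounding down because shell arithmetic is integer-only.
sectors_to_gb() {
    echo $(( $1 * 512 / 1000000000 ))
}

sectors_to_gb 121720712   # prints 62
sectors_to_gb 455185536   # prints 233
```

Under that assumption the figures line up with the roughly 60GB and 230GB quoted above.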
Below is a log extract and my heartbeat configuration:
dmesg output from nas02 (after nas01 has been started again):
r8169: eth1: link up
drbd0: conn( WFConnection -> WFReportParams )
drbd0: Handshake successful: DRBD Network Protocol version 86
drbd0: Considerable difference in lower level device sizes: 121720712s vs. 455185536s
drbd0: Split-Brain detected, dropping connection!
drbd0: self C765DF24A2676E31:DDFCF96A854616A7:B34DB3CFA71F57C2:15641BAC6660D448
drbd0: peer E785C3CFDDDC5C05:DDFCF96A854616A7:B34DB3CFA71F57C2:15641BAC6660D448
drbd0: conn( WFReportParams -> Disconnecting )
drbd0: error receiving ReportState, l: 4!
drbd0: meta connection shut down by peer.
drbd0: asender terminated
drbd0: Connection closed
drbd0: conn( Disconnecting -> StandAlone )
drbd0: receiver terminated
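The "self" and "peer" lines in the log above are DRBD's generation identifiers. A simplified way to read them is sketched below. It assumes the field order current:bitmap:history1:history2 and reduces DRBD's real comparison to four string checks, so it is an illustration of the idea rather than the kernel module's actual algorithm. The function names uuid_field and classify are hypothetical.

```shell
#!/bin/sh
# Generation identifiers copied from the dmesg output above.
self="C765DF24A2676E31:DDFCF96A854616A7:B34DB3CFA71F57C2:15641BAC6660D448"
peer="E785C3CFDDDC5C05:DDFCF96A854616A7:B34DB3CFA71F57C2:15641BAC6660D448"

# Assumed field order: 1=current, 2=bitmap, 3=history1, 4=history2.
uuid_field() {
    echo "$1" | cut -d: -f"$2"
}

# Simplified classification (not DRBD's full rule set):
#   same current UUID              -> both sides hold the same data generation
#   my bitmap == peer's current    -> I changed data after the peer left
#   my current == peer's bitmap    -> the peer changed data after I left
#   same non-zero bitmap UUID      -> both started a new generation from the
#                                     same ancestor, i.e. split brain
classify() {
    s_cur=$(uuid_field "$1" 1); s_bm=$(uuid_field "$1" 2)
    p_cur=$(uuid_field "$2" 1); p_bm=$(uuid_field "$2" 2)
    if [ "$s_cur" = "$p_cur" ]; then
        echo "in-sync"
    elif [ "$s_bm" = "$p_cur" ]; then
        echo "self-is-newer"
    elif [ "$s_cur" = "$p_bm" ]; then
        echo "peer-is-newer"
    elif [ "$s_bm" = "$p_bm" ] && [ "$s_bm" != "0000000000000000" ]; then
        echo "split-brain"
    else
        echo "unknown"
    fi
}

classify "$self" "$peer"   # prints split-brain
```

Read this way, the two nodes share the bitmap and history values but each carries its own current UUID, which suggests that both sides started a new data generation while disconnected, whether or not any blocks were actually written on nas01.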
ha.cf (taken from nas02; apart from the ucast settings, the files are identical on both nodes):
ucast eth1 172.16.15.1
ucast eth0 192.168.0.30
nas01 192.168.0.34 drbddisk::storage iscsi-target MailTo::technik at megabit.net
and here is a quick "drawing" of the whole layout:
nas01 ClusterIP 192.168.0.34 nas02
With kind regards
Megabit Informationstechnik GmbH
Karstr.25 41068 Moenchengladbach Tel:02161/30898-0 Fax:-18
AG MG HRB 10141, GF: Dipl.-Ing. Thomas Tillig, Michael Benten