Hi,
I have run into some issues with the following setup:
Two-node cluster with Red Hat Cluster Suite
OS: CentOS 5
/dev/drbd0 on /dev/sda3, primary/primary
physical volume on /dev/drbd0
logical volume on this physical volume
gfs (not gfs2) on the logical volume
I want to run highly available Xen DomUs.
DRBD uses a dedicated Gigabit link over a crossover cable.
Every time I rebooted one node, DRBD went into a split-brain situation.
After some late-night struggling with this, I found the following:
clvmd and GFS were started before DRBD, and it seemed to me that, even
though I had filtered out the /dev/sda3 partition in the LVM configuration,
some writes were attempted before DRBD was started.
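As an illustration of what "filtering out /dev/sda3" means for LVM, here is a small Python model of how an lvm.conf `filter` list is evaluated: entries of the form `a|regex|` (accept) or `r|regex|` (reject) are tried in order, the first matching regex decides, and a device matched by nothing is accepted. This is a simplified sketch that checks only the one path it is given; the filter entries below are hypothetical, not the configuration from this post.

```python
import re


def lvm_filter_accepts(device: str, filters: list[str]) -> bool:
    """Simplified model of lvm.conf filter evaluation for a single path.

    Each entry is "a<d>regex<d>" or "r<d>regex<d>", where <d> is any
    delimiter character. The first entry whose regex matches decides;
    if none matches, the device is accepted.
    """
    for entry in filters:
        action, delim = entry[0], entry[1]
        # Drop the leading action + delimiter and the trailing delimiter.
        pattern = entry[2:entry.rindex(delim)]
        if re.search(pattern, device):
            return action == "a"
    return True


# Hypothetical filter: accept DRBD devices, reject the backing partition.
FILTER = ["a|^/dev/drbd|", "r|^/dev/sda3$|", "a|.*|"]

for dev in ("/dev/drbd0", "/dev/sda3", "/dev/sda1"):
    print(dev, "accepted" if lvm_filter_accepts(dev, FILTER) else "rejected")
```

Rejecting the backing partition keeps LVM from finding the physical-volume signature on /dev/sda3 directly, so the volume group is only ever seen through /dev/drbd0.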
I changed the startup order to drbd -> clvmd -> gfs by editing the
init-script dependencies.
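The ordering constraint itself can be expressed as a small dependency graph. The Python sketch below only models the constraint (it does not read or modify any real init scripts); the dependency mapping is an assumption that mirrors the reasoning in this post: clvmd needs /dev/drbd0, and GFS needs the logical volume managed by clvmd.

```python
from graphlib import TopologicalSorter

# Each service maps to the set of services that must be started before it.
# Illustrative assumption based on the stack described in this post.
DEPENDENCIES = {
    "drbd": set(),
    "clvmd": {"drbd"},  # clustered LVM must see /dev/drbd0 first
    "gfs": {"clvmd"},   # GFS mounts the logical volume managed by clvmd
}


def start_order(deps: dict[str, set[str]]) -> list[str]:
    """Return a valid start order; raises graphlib.CycleError on cycles."""
    return list(TopologicalSorter(deps).static_order())


def respects(order: list[str], deps: dict[str, set[str]]) -> bool:
    """Check that every service starts after all of its prerequisites."""
    pos = {name: i for i, name in enumerate(order)}
    return all(pos[req] < pos[svc] for svc, reqs in deps.items() for req in reqs)


print(start_order(DEPENDENCIES))                         # ['drbd', 'clvmd', 'gfs']
print(respects(["clvmd", "gfs", "drbd"], DEPENDENCIES))  # False: the original order
```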
Now split brain no longer happens on every reboot, but it still happens on
some. I have configured the following policies:
net {
    allow-two-primaries;
    after-sb-0pri discard-younger-primary;
    after-sb-1pri consensus;
    after-sb-2pri violently-as0p;
}
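To make the interplay of these three policies easier to follow, here is a simplified Python model. after-sb-0pri applies when no node is primary at the time split brain is detected, after-sb-1pri when one is, and after-sb-2pri when both are; consensus and violently-as0p both reuse the 0pri result. The fallback from discard-younger-primary to discard-least-changes is the one reported in the logs below. Node names, change counts, and the `tie_loser` tie-break parameter are hypothetical; this is a sketch of the documented behaviour, not DRBD's code.

```python
from typing import Dict, Optional


def resolve_0pri(younger: Optional[str], changed_blocks: Dict[str, int],
                 tie_loser: str) -> str:
    """after-sb-0pri discard-younger-primary, with its fallback.

    Returns the node whose changes are discarded.
    younger:        node that became primary most recently, or None if
                    that cannot be determined (what the logs report).
    changed_blocks: changed-block count per node (exactly two nodes).
    tie_loser:      hypothetical tie-break when both counts are equal.
    """
    if younger is not None:
        return younger  # the younger primary loses its changes
    # Fallback from the log: "Using discard-least-changes instead".
    (node_a, count_a), (node_b, count_b) = changed_blocks.items()
    if count_a < count_b:
        return node_a
    if count_b < count_a:
        return node_b
    return tie_loser


def resolve_1pri_consensus(loser_0pri: str, secondary: str) -> Optional[str]:
    """after-sb-1pri consensus: accept the 0pri result only if it discards
    the secondary's data; None means the nodes disconnect instead."""
    return loser_0pri if loser_0pri == secondary else None


def resolve_2pri_violently(loser_0pri: str) -> str:
    """after-sb-2pri violently-as0p: apply the 0pri result even when it
    discards data on a primary."""
    return loser_0pri


# Hypothetical change counts, not taken from the logs.
counts = {"node1": 120, "node2": 35}
loser = resolve_0pri(None, counts, tie_loser="node1")
print(loser)                                             # node2
print(resolve_1pri_consensus(loser, secondary="node1"))  # None -> disconnect
print(resolve_2pri_violently(loser))                     # node2
```

As far as I recall the drbd.conf man page, violently-as0p is described as useful only with a one-node filesystem (explicitly not OCFS2 or GFS) and as dangerous when a filesystem is mounted on the primary, which is worth keeping in mind for a GFS setup like this one.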
Does this make sense? The last split brain produced these log entries (I
resolved it manually):
node1 (not rebooted):
drbd0: conn( Unconnected -> WFConnection )
drbd0: conn( WFConnection -> WFReportParams )
drbd0: Handshake successful: DRBD Network Protocol version 86
drbd0: Discard younger/older primary did not found a decision
Using discard-least-changes instead
drbd0: Split-Brain detected, manually solved. Sync from this node
drbd0: peer( Unknown -> Secondary ) conn( WFReportParams -> WFBitMapS ) pdsk( DUnknown -> UpToDate )
drbd0: Writing meta data super block now.
drbd0: peer( Secondary -> Primary )
drbd0: conn( WFBitMapS -> SyncSource ) pdsk( UpToDate -> Inconsistent )
drbd0: Began resync as SyncSource (will sync 17548 KB [4387 bits set]).
drbd0: Writing meta data super block now.
drbd0: pdsk( Inconsistent -> UpToDate )
drbd0: Writing meta data super block now.
drbd0: Resync done (total 1 sec; paused 0 sec; 17548 K/sec)
drbd0: conn( SyncSource -> Connected )
drbd0: Writing meta data super block now.
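The resync figures in this log are consistent with DRBD's bitmap granularity of one bit per 4 KiB block: 4387 dirty bits × 4 = 17548 KB, and copying that in about one second gives the reported 17548 K/sec. A short Python check of that arithmetic (the helper names are illustrative):

```python
# Assumption: one DRBD bitmap bit tracks one 4 KiB block.
BLOCK_KIB = 4


def resync_size_kb(bits_set: int) -> int:
    """Amount of data a resync has to copy for a number of dirty bits."""
    return bits_set * BLOCK_KIB


def resync_rate_kb_s(size_kb: int, seconds: int) -> int:
    """Average rate over the resync, in whole K/sec."""
    return size_kb // seconds


size = resync_size_kb(4387)       # "[4387 bits set]"
print(size)                       # 17548, matches "will sync 17548 KB"
print(resync_rate_kb_s(size, 1))  # 17548, matches "17548 K/sec"
```

In other words, the two nodes had diverged by only about 17 MB when the split brain was detected.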
node2 (the one that was rebooted):
drbd0: conn( Unconnected -> WFConnection )
drbd0: conn( WFConnection -> WFReportParams )
drbd0: Handshake successful: DRBD Network Protocol version 86
drbd0: Discard younger/older primary did not found a decision
Using discard-least-changes instead
drbd0: Split-Brain detected, dropping connection!
drbd0: self 001F032C07DCF73D:B2D597D3ADD4C93B:16D238C367CE3B36:760E57ED858F7F09
drbd0: peer EB7952B94F8F01B5:B2D597D3ADD4C93B:16D238C367CE3B36:760E57ED858F7F09
drbd0: conn( WFReportParams -> Disconnecting )
drbd0: error receiving ReportState, l: 4!
drbd0: meta connection shut down by peer.
drbd0: asender terminated
drbd0: tl_clear()
drbd0: Connection closed
drbd0: conn( Disconnecting -> StandAlone )
drbd0: receiver terminated
drbd0: conn( StandAlone -> Unconnected )
drbd0: receiver (re)started
drbd0: conn( Unconnected -> WFConnection )
drbd0: conn( WFConnection -> WFReportParams )
drbd0: Handshake successful: DRBD Network Protocol version 86
drbd0: Discard younger/older primary did not found a decision
Using discard-least-changes instead
drbd0: Split-Brain detected, manually solved. Sync from peer node
drbd0: peer( Unknown -> Primary ) conn( WFReportParams -> WFBitMapT ) pdsk( DUnknown -> UpToDate )
drbd0: Writing meta data super block now.
drbd0: role( Secondary -> Primary )
drbd0: conn( WFBitMapT -> WFSyncUUID )
drbd0: conn( WFSyncUUID -> SyncTarget ) disk( UpToDate -> Inconsistent )
drbd0: Began resync as SyncTarget (will sync 17548 KB [4387 bits set]).
drbd0: Writing meta data super block now.
drbd0: Resync done (total 1 sec; paused 0 sec; 17548 K/sec)
drbd0: conn( SyncTarget -> Connected ) disk( Inconsistent -> UpToDate )
drbd0: Writing meta data super block now.
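The "self" and "peer" lines on node2 are DRBD's data-generation UUIDs (current:bitmap:history1:history2). Here the current UUIDs differ while the bitmap and history UUIDs are identical, which is the signature of both nodes having diverged from the same generation. The Python sketch below is a simplified illustration of that comparison, covering only a few of DRBD's rules; masking the lowest bit (which, as far as I know, DRBD uses as a flag) is an assumption.

```python
# Simplified illustration of comparing DRBD data-generation UUIDs.
# Not DRBD's full rule set; history UUIDs are ignored here.
FLAG_MASK = ~1  # assumption: the lowest bit is a flag, ignored when comparing


def parse_uuids(text: str) -> list[int]:
    """Split 'current:bitmap:history1:history2' into integers."""
    return [int(part, 16) for part in text.split(":")]


def classify(self_uuids: list[int], peer_uuids: list[int]) -> str:
    cur_s, bm_s = (u & FLAG_MASK for u in self_uuids[:2])
    cur_p, bm_p = (u & FLAG_MASK for u in peer_uuids[:2])
    if cur_s == cur_p:
        return "same current generation"
    if bm_s == cur_p:
        return "self is newer: sync from self"
    if bm_p == cur_s:
        return "peer is newer: sync from peer"
    if bm_s == bm_p and bm_s != 0:
        return "split brain: both diverged from the same generation"
    return "unrelated: no common generation found by this simplified check"


SELF = "001F032C07DCF73D:B2D597D3ADD4C93B:16D238C367CE3B36:760E57ED858F7F09"
PEER = "EB7952B94F8F01B5:B2D597D3ADD4C93B:16D238C367CE3B36:760E57ED858F7F09"
print(classify(parse_uuids(SELF), parse_uuids(PEER)))
# split brain: both diverged from the same generation
```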
Does anyone have a working solution for this scenario? Or will I have to
live with split-brain situations and resolve them manually?
Best regards,
Stefan