[DRBD-user] How to avoid split brain with DRBD 8 (primary/primary) + clvmd + GFS?

Stefan Marx stefan at linux-lunatix.org
Thu Oct 18 11:47:12 CEST 2007

Hi,

I have run into some issues with the following setup:

Two-node cluster with Red Hat Cluster Suite
OS: CentOS 5
/dev/drbd0 on /dev/sda3, primary/primary
physical volume on /dev/drbd0
logical volume on this physical volume
GFS (not GFS2) on the logical volume
I want to run highly available Xen DomUs.
DRBD uses a dedicated Gigabit link over a crossover cable.

Whenever I rebooted one node, DRBD went into split brain.
After some late-night struggling with this, I found the following:

clvmd and gfs were started before drbd, and it seemed to me that, even
though I had filtered the /dev/sda3 partition out of LVM's device scan,
some attempts to change data were made before drbd was started.
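For reference, LVM's device filter (the `filter` list in the `devices` section of /etc/lvm/lvm.conf) is an ordered list of accept (`a`) and reject (`r`) regexes. The Python sketch below models the matching rule as current lvm.conf documentation describes it: for each name of a device the first matching pattern decides, a device is rejected only if every one of its names is rejected, and a name that matches nothing is accepted. The filter contents and the by-path alias are hypothetical, and older LVM2 releases such as the one in CentOS 5 may behave differently. It illustrates one possible way a partition can still be scanned although its /dev/sda3 name is filtered out: through another name for the same device.

```python
import re
from typing import Optional


def parse_rule(rule: str) -> tuple[str, str]:
    """Split an lvm.conf-style rule such as "r|^/dev/sda3$|" into its
    action ('a' or 'r') and the regex between the delimiters."""
    action, delim = rule[0], rule[1]
    return action, rule[2:rule.rindex(delim)]


def first_match(path: str, rules: list[str]) -> Optional[str]:
    """Return the action of the first rule whose regex matches path."""
    for rule in rules:
        action, pattern = parse_rule(rule)
        if re.search(pattern, path):
            return action
    return None  # no rule matched this name


def lvm_accepts(names: list[str], rules: list[str]) -> bool:
    """Model: a device known under several names is rejected only if every
    name first-matches an 'r' rule; otherwise it is accepted."""
    return any(first_match(name, rules) != "r" for name in names)


# Hypothetical filter: exclude the DRBD backing partition by its kernel
# name only, accept everything else.
FILTER = ["r|^/dev/sda3$|", "a|.*|"]

# Hypothetical alias: udev usually also exposes a partition under
# /dev/disk/by-*; this exact path is made up for illustration.
sda3_names = ["/dev/sda3", "/dev/disk/by-path/pci-0000:00:1f.2-scsi-0:0:0:0-part3"]

print(lvm_accepts(["/dev/drbd0"], FILTER))  # True: PV is scanned on DRBD
print(lvm_accepts(["/dev/sda3"], FILTER))   # False: kernel name is rejected
print(lvm_accepts(sda3_names, FILTER))      # True: the alias slips through
```

Under this model, a filter that accepts only the DRBD device names and rejects everything else would not have the alias problem.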

I changed the startup order to drbd, clvmd, gfs by editing the
init-script dependencies.
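The ordering constraint behind this change can be written as a small dependency graph. Below is a minimal Python sketch (3.9+, for `graphlib`) that assumes only the three dependencies described here; the real Cluster Suite stack has more services, such as cman, which are left out. It derives the start order and uses the reverse order for shutdown, so that gfs is stopped and clvmd deactivated before drbd goes away. On CentOS 5 the actual order comes from the start/stop priorities in each init script's `# chkconfig:` header; the sketch does not touch those.

```python
from graphlib import TopologicalSorter

# Each service maps to the services that must already be running before it
# starts (assumed model): clvmd needs /dev/drbd0 because the PV lives there,
# and gfs needs the clustered logical volume that clvmd activates.
DEPENDS_ON = {
    "drbd": set(),
    "clvmd": {"drbd"},
    "gfs": {"clvmd"},
}

# static_order() yields every node after all of its dependencies.
start_order = list(TopologicalSorter(DEPENDS_ON).static_order())

# Shut down in reverse, so nothing is still using drbd when it stops.
stop_order = start_order[::-1]

print(start_order)  # ['drbd', 'clvmd', 'gfs']
print(stop_order)   # ['gfs', 'clvmd', 'drbd']
```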

Now split brain no longer happens on every reboot, but it still happens on
some. I have configured the following policies:

  net {
    allow-two-primaries;
    after-sb-0pri discard-younger-primary;
    after-sb-1pri consensus;
    after-sb-2pri violently-as0p;
  }

Does this make sense? The last split brain produced these log entries (I
resolved it manually):

node1, not rebooted:

drbd0: conn( Unconnected -> WFConnection )
drbd0: conn( WFConnection -> WFReportParams )
drbd0: Handshake successful: DRBD Network Protocol version 86
drbd0: Discard younger/older primary did not found a decision
Using discard-least-changes instead
drbd0: Split-Brain detected, manually solved. Sync from this node
drbd0: peer( Unknown -> Secondary ) conn( WFReportParams -> WFBitMapS ) pdsk( DUnknown -> UpToDate )
drbd0: Writing meta data super block now.
drbd0: peer( Secondary -> Primary )
drbd0: conn( WFBitMapS -> SyncSource ) pdsk( UpToDate -> Inconsistent )
drbd0: Began resync as SyncSource (will sync 17548 KB [4387 bits set]).
drbd0: Writing meta data super block now.
drbd0: pdsk( Inconsistent -> UpToDate )
drbd0: Writing meta data super block now.
drbd0: Resync done (total 1 sec; paused 0 sec; 17548 K/sec)
drbd0: conn( SyncSource -> Connected )
drbd0: Writing meta data super block now.


node2, the one that was rebooted:

drbd0: conn( Unconnected -> WFConnection )
drbd0: conn( WFConnection -> WFReportParams )
drbd0: Handshake successful: DRBD Network Protocol version 86
drbd0: Discard younger/older primary did not found a decision
Using discard-least-changes instead
drbd0: Split-Brain detected, dropping connection!
drbd0: self 001F032C07DCF73D:B2D597D3ADD4C93B:16D238C367CE3B36:760E57ED858F7F09
drbd0: peer EB7952B94F8F01B5:B2D597D3ADD4C93B:16D238C367CE3B36:760E57ED858F7F09
drbd0: conn( WFReportParams -> Disconnecting )
drbd0: error receiving ReportState, l: 4!
drbd0: meta connection shut down by peer.
drbd0: asender terminated
drbd0: tl_clear()
drbd0: Connection closed
drbd0: conn( Disconnecting -> StandAlone )
drbd0: receiver terminated
drbd0: conn( StandAlone -> Unconnected )
drbd0: receiver (re)started
drbd0: conn( Unconnected -> WFConnection )
drbd0: conn( WFConnection -> WFReportParams )
drbd0: Handshake successful: DRBD Network Protocol version 86
drbd0: Discard younger/older primary did not found a decision
Using discard-least-changes instead
drbd0: Split-Brain detected, manually solved. Sync from peer node
drbd0: peer( Unknown -> Primary ) conn( WFReportParams -> WFBitMapT ) pdsk( DUnknown -> UpToDate )
drbd0: Writing meta data super block now.
drbd0: role( Secondary -> Primary )
drbd0: conn( WFBitMapT -> WFSyncUUID )
drbd0: conn( WFSyncUUID -> SyncTarget ) disk( UpToDate -> Inconsistent )
drbd0: Began resync as SyncTarget (will sync 17548 KB [4387 bits set]).
drbd0: Writing meta data super block now.
drbd0: Resync done (total 1 sec; paused 0 sec; 17548 K/sec)
drbd0: conn( SyncTarget -> Connected ) disk( Inconsistent -> UpToDate )
drbd0: Writing meta data super block now.
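In the node2 log, the `self` and `peer` lines are the generation identifiers (UUIDs) that DRBD compares during the handshake, printed as current:bitmap:history:history. The Python sketch below is a simplified model of that comparison, written from a reading of DRBD 8's resync rules rather than taken from DRBD's code; in particular, masking the lowest bit as a flag is an assumption. Fed the two lines from the log, it reports split brain: the current UUIDs differ and neither appears in the other side's bitmap or history slots, but both sides carry the same bitmap UUID, meaning both nodes changed data independently since the same last common point.

```python
# UUID sets copied from the node2 log above.
SELF = "001F032C07DCF73D:B2D597D3ADD4C93B:16D238C367CE3B36:760E57ED858F7F09"
PEER = "EB7952B94F8F01B5:B2D597D3ADD4C93B:16D238C367CE3B36:760E57ED858F7F09"


def parse_uuids(text: str) -> list[int]:
    """Parse current:bitmap:history:history, clearing the lowest bit
    (assumed here to be a flag, not part of the identifier)."""
    return [int(part, 16) & ~1 for part in text.split(":")]


def compare(self_text: str, peer_text: str) -> str:
    """Simplified model of DRBD 8's resync decision from two UUID sets."""
    s_cur, s_bm, *s_hist = parse_uuids(self_text)
    p_cur, p_bm, *p_hist = parse_uuids(peer_text)
    if s_cur == p_cur:
        return "in sync: no resync needed"
    if s_cur == p_bm:
        return "peer is newer: SyncTarget (bitmap-based resync)"
    if s_cur in p_hist:
        return "peer is newer: SyncTarget (full resync)"
    if s_bm == p_cur:
        return "we are newer: SyncSource (bitmap-based resync)"
    if p_cur in s_hist:
        return "we are newer: SyncSource (full resync)"
    if s_bm == p_bm and s_bm != 0:
        return "split brain: both sides diverged from the same point"
    return "unrelated data"


print(compare(SELF, PEER))
```

In this model, automatic after-sb policies only come into play in the split-brain branch; every other outcome is an ordinary resync direction.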

Does anyone have a working solution for this scenario? Or will I have to
live with split-brain situations and resolve them manually?

Best regards,

Stefan



