Hi,

I have run into some issues with the following setup:

    Two-node cluster with Red Hat Cluster Suite
    OS: CentOS 5
    /dev/drbd0 on /dev/sda3, primary/primary
    physical volume on /dev/drbd0
    logical volume on this physical volume
    GFS (not GFS2) on the logical volume

I want to run highly available Xen DomUs. DRBD has a dedicated GBit link
over a crossover cable.

Whenever I rebooted one node, DRBD went into a split-brain situation.
After some late-night struggling with this I came across the following:
clvmd and gfs were started before drbd, and it seemed to me that, even
though I had filtered out the /dev/sda3 partition for use with LVM, some
attempts to change data were made before drbd was started.

I changed the start order to drbd - clvmd - gfs by editing the
init-script dependencies. Now I no longer get a split brain on every
reboot, but still on some.

I have configured the following policies:

    net {
        allow-two-primaries;
        after-sb-0pri discard-younger-primary;
        after-sb-1pri consensus;
        after-sb-2pri violently-as0p;
    }

Does this make sense?

The last split brain gave me these log entries (I resolved it manually):

node1, not rebooted:

    drbd0: conn( Unconnected -> WFConnection )
    drbd0: conn( WFConnection -> WFReportParams )
    drbd0: Handshake successful: DRBD Network Protocol version 86
    drbd0: Discard younger/older primary did not found a decision
           Using discard-least-changes instead
    drbd0: Split-Brain detected, manually solved. Sync from this node
    drbd0: peer( Unknown -> Secondary ) conn( WFReportParams -> WFBitMapS ) pdsk( DUnknown -> UpToDate )
    drbd0: Writing meta data super block now.
    drbd0: peer( Secondary -> Primary )
    drbd0: conn( WFBitMapS -> SyncSource ) pdsk( UpToDate -> Inconsistent )
    drbd0: Began resync as SyncSource (will sync 17548 KB [4387 bits set]).
    drbd0: Writing meta data super block now.
    drbd0: pdsk( Inconsistent -> UpToDate )
    drbd0: Writing meta data super block now.
    drbd0: Resync done (total 1 sec; paused 0 sec; 17548 K/sec)
    drbd0: conn( SyncSource -> Connected )
    drbd0: Writing meta data super block now.

node2, this was the one rebooted:

    drbd0: conn( Unconnected -> WFConnection )
    drbd0: conn( WFConnection -> WFReportParams )
    drbd0: Handshake successful: DRBD Network Protocol version 86
    drbd0: Discard younger/older primary did not found a decision
           Using discard-least-changes instead
    drbd0: Split-Brain detected, dropping connection!
    drbd0: self 001F032C07DCF73D:B2D597D3ADD4C93B:16D238C367CE3B36:760E57ED858F7F09
    drbd0: peer EB7952B94F8F01B5:B2D597D3ADD4C93B:16D238C367CE3B36:760E57ED858F7F09
    drbd0: conn( WFReportParams -> Disconnecting )
    drbd0: error receiving ReportState, l: 4!
    drbd0: meta connection shut down by peer.
    drbd0: asender terminated
    drbd0: tl_clear()
    drbd0: Connection closed
    drbd0: conn( Disconnecting -> StandAlone )
    drbd0: receiver terminated
    drbd0: conn( StandAlone -> Unconnected )
    drbd0: receiver (re)started
    drbd0: conn( Unconnected -> WFConnection )
    drbd0: conn( WFConnection -> WFReportParams )
    drbd0: Handshake successful: DRBD Network Protocol version 86
    drbd0: Discard younger/older primary did not found a decision
           Using discard-least-changes instead
    drbd0: Split-Brain detected, manually solved. Sync from peer node
    drbd0: peer( Unknown -> Primary ) conn( WFReportParams -> WFBitMapT ) pdsk( DUnknown -> UpToDate )
    drbd0: Writing meta data super block now.
    drbd0: role( Secondary -> Primary )
    drbd0: conn( WFBitMapT -> WFSyncUUID )
    drbd0: conn( WFSyncUUID -> SyncTarget ) disk( UpToDate -> Inconsistent )
    drbd0: Began resync as SyncTarget (will sync 17548 KB [4387 bits set]).
    drbd0: Writing meta data super block now.
    drbd0: Resync done (total 1 sec; paused 0 sec; 17548 K/sec)
    drbd0: conn( SyncTarget -> Connected ) disk( Inconsistent -> UpToDate )
    drbd0: Writing meta data super block now.

Does anyone have a working solution for this scenario? Or will I have to
live with split-brain situations and resolve them manually?

Best regards,
Stefan
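
For readers comparing their own logs: the "self" and "peer" lines in the node2 log carry the generation identifiers that DRBD compares during the handshake. The following is a minimal sketch, not part of the original post, that pulls those two lines apart and checks the pattern visible here (different first UUID, identical second UUID). The field names (`current`, `bitmap`, `history1`, `history2`) and the simplified check are assumptions based on the usual DRBD 8 log layout; the real decision logic in the kernel module has more cases.

```python
import re

# Assumed field order of the four colon-separated UUIDs in DRBD 8 logs.
FIELDS = ("current", "bitmap", "history1", "history2")

# Matches lines such as "drbd0: self 001F...:B2D5...:16D2...:760E..."
UUID_LINE = re.compile(
    r"drbd\d+: (self|peer) ((?:[0-9A-F]{16}:){3}[0-9A-F]{16})"
)


def parse_uuid_lines(log_text):
    """Return {'self': {...}, 'peer': {...}} for the UUID lines in log_text."""
    result = {}
    for who, uuids in UUID_LINE.findall(log_text):
        result[who] = dict(zip(FIELDS, uuids.split(":")))
    return result


def looks_like_split_brain(uuids):
    """Simplified check: both sides diverged from the same bitmap UUID."""
    me, peer = uuids["self"], uuids["peer"]
    return me["current"] != peer["current"] and me["bitmap"] == peer["bitmap"]


# The two UUID lines from the node2 log above.
log = """\
drbd0: self 001F032C07DCF73D:B2D597D3ADD4C93B:16D238C367CE3B36:760E57ED858F7F09
drbd0: peer EB7952B94F8F01B5:B2D597D3ADD4C93B:16D238C367CE3B36:760E57ED858F7F09
"""

uuids = parse_uuid_lines(log)
print(looks_like_split_brain(uuids))  # prints True for the lines above
```

Under these assumptions, the output is consistent with both nodes having written data independently after they last agreed on a common state, which would fit something touching the device while the peer was unreachable.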