Hello,
I'm new to DRBD and I run into a split brain every week or so.
I have a standard active/passive setup, as described in the
DRBD documentation. This is my config file:
global {
    usage-count no;
}
common {
    syncer { rate 100M; }
    protocol C;
}
resource mailstore {
    startup {
        wfc-timeout 0;
        degr-wfc-timeout 120;
    }
    disk { on-io-error detach; no-disk-barrier; }
    on primary.axigen.cluster {
        device /dev/drbd1;
        disk /dev/axigen/mailstore;
        address 192.168.1.10:7791;
        meta-disk internal;
    }
    on secondary.axigen.cluster {
        device /dev/drbd1;
        disk /dev/axigen/mailstore;
        address 192.168.1.20:7791;
        meta-disk internal;
    }
}
After a few days, the cluster always ends up in the following state:
[root@primary ~]# drbd-overview
1:mailstore StandAlone Primary/Unknown UpToDate/DUnknown r----
/var/opt/axigen ext3 99G 298M 98G 1%
[root@secondary ~]# drbd-overview
1:mailstore StandAlone Secondary/Unknown Outdated/DUnknown r----
I get the following messages in dmesg on my primary node:
[root@primary ~]# dmesg | grep block
drbd: registered as block device major 147
block drbd1: Starting worker thread (from cqueue/1 [195])
block drbd1: disk( Diskless -> Attaching )
block drbd1: Found 4 transactions (23 active extents) in activity log.
block drbd1: Method to ensure write ordering: flush
block drbd1: max_segment_size ( = BIO size ) = 32768
block drbd1: drbd_bm_resize called with capacity == 209708728
block drbd1: resync bitmap: bits=26213591 words=409588
block drbd1: size = 100 GB (104854364 KB)
block drbd1: recounting of set bits took additional 2 jiffies
block drbd1: 184 KB (46 bits) marked out-of-sync by on disk bit-map.
block drbd1: disk( Attaching -> UpToDate )
block drbd1: Barriers not supported on meta data device - disabling
block drbd1: conn( StandAlone -> Unconnected )
block drbd1: Starting receiver thread (from drbd1_worker [3504])
block drbd1: receiver (re)started
block drbd1: conn( Unconnected -> WFConnection )
block drbd1: Handshake successful: Agreed network protocol version 94
block drbd1: conn( WFConnection -> WFReportParams )
block drbd1: Starting asender thread (from drbd1_receiver [3517])
block drbd1: data-integrity-alg: <not-used>
block drbd1: drbd_sync_handshake:
block drbd1: self B0A76171352A5A3C:C23C1E60BAF36299:CDF18AFFD009B9F5:7F938649A9B876DD bits:46 flags:0
block drbd1: peer C23C1E60BAF36298:0000000000000000:CDF18AFFD009B9F4:7F938649A9B876DD bits:0 flags:0
block drbd1: uuid_compare()=1 by rule 70
block drbd1: peer( Unknown -> Secondary ) conn( WFReportParams -> WFBitMapS ) pdsk( DUnknown -> UpToDate )
block drbd1: conn( WFBitMapS -> SyncSource ) pdsk( UpToDate -> Inconsistent )
block drbd1: Began resync as SyncSource (will sync 184 KB [46 bits set]).
block drbd1: Resync done (total 1 sec; paused 0 sec; 184 K/sec)
block drbd1: conn( SyncSource -> Connected ) pdsk( Inconsistent -> UpToDate )
block drbd1: peer( Secondary -> Primary )
block drbd1: peer( Primary -> Secondary )
block drbd1: role( Secondary -> Primary )
block drbd1: sock was shut down by peer
block drbd1: peer( Secondary -> Unknown ) conn( Connected -> BrokenPipe ) pdsk( UpToDate -> DUnknown )
block drbd1: short read expecting header on sock: r=0
block drbd1: asender terminated
block drbd1: Terminating asender thread
block drbd1: Creating new current UUID
block drbd1: Connection closed
block drbd1: conn( BrokenPipe -> Unconnected )
block drbd1: receiver terminated
block drbd1: Restarting receiver thread
block drbd1: receiver (re)started
block drbd1: conn( Unconnected -> WFConnection )
block drbd1: Handshake successful: Agreed network protocol version 94
block drbd1: conn( WFConnection -> WFReportParams )
block drbd1: Starting asender thread (from drbd1_receiver [3517])
block drbd1: data-integrity-alg: <not-used>
block drbd1: drbd_sync_handshake:
block drbd1: self 5A7A31FAC38B4C31:B0A76171352A5A3D:28B6F22789E39014:C23C1E60BAF36299 bits:0 flags:0
block drbd1: peer B0A76171352A5A3C:0000000000000000:28B6F22789E39014:C23C1E60BAF36299 bits:0 flags:0
block drbd1: uuid_compare()=1 by rule 70
block drbd1: peer( Unknown -> Secondary ) conn( WFReportParams -> WFBitMapS ) pdsk( DUnknown -> UpToDate )
block drbd1: conn( WFBitMapS -> SyncSource ) pdsk( UpToDate -> Inconsistent )
block drbd1: Began resync as SyncSource (will sync 0 KB [0 bits set]).
block drbd1: Resync done (total 1 sec; paused 0 sec; 0 K/sec)
block drbd1: conn( SyncSource -> Connected ) pdsk( Inconsistent -> UpToDate )
block drbd1: sock was shut down by peer
block drbd1: peer( Secondary -> Unknown ) conn( Connected -> BrokenPipe ) pdsk( UpToDate -> DUnknown )
block drbd1: short read expecting header on sock: r=0
block drbd1: asender terminated
block drbd1: Terminating asender thread
block drbd1: Creating new current UUID
block drbd1: Connection closed
block drbd1: conn( BrokenPipe -> Unconnected )
block drbd1: receiver terminated
block drbd1: Restarting receiver thread
block drbd1: receiver (re)started
block drbd1: conn( Unconnected -> WFConnection )
block drbd1: State change failed: Need access to UpToDate data
block drbd1: state = { cs:WFConnection ro:Primary/Unknown ds:UpToDate/DUnknown r--- }
block drbd1: wanted = { cs:WFConnection ro:Primary/Unknown ds:Outdated/DUnknown r--- }
block drbd1: role( Primary -> Secondary )
block drbd1: role( Secondary -> Primary )
block drbd1: Handshake successful: Agreed network protocol version 94
block drbd1: conn( WFConnection -> WFReportParams )
block drbd1: Starting asender thread (from drbd1_receiver [3517])
block drbd1: data-integrity-alg: <not-used>
block drbd1: drbd_sync_handshake:
block drbd1: self F601A659931B404D:5A7A31FAC38B4C31:F52C25D8178F32E6:B0A76171352A5A3D bits:2 flags:0
block drbd1: peer CDEEDF66F24D3ECC:5A7A31FAC38B4C30:F52C25D8178F32E6:B0A76171352A5A3D bits:1132 flags:0
block drbd1: uuid_compare()=100 by rule 90
block drbd1: helper command: /sbin/drbdadm initial-split-brain minor-1
block drbd1: helper command: /sbin/drbdadm initial-split-brain minor-1 exit code 0 (0x0)
block drbd1: Split-Brain detected but unresolved, dropping connection!
block drbd1: helper command: /sbin/drbdadm split-brain minor-1
block drbd1: helper command: /sbin/drbdadm split-brain minor-1 exit code 0 (0x0)
block drbd1: conn( WFReportParams -> Disconnecting )
block drbd1: error receiving ReportState, l: 4!
block drbd1: asender terminated
block drbd1: Terminating asender thread
block drbd1: Connection closed
block drbd1: conn( Disconnecting -> StandAlone )
block drbd1: receiver terminated
block drbd1: Terminating receiver thread
[root@primary ~]#
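Since the interesting part of this log is the sequence of role changes, a small filter makes it easier to follow. This is only a minimal sketch: the function name `transitions` and the regular expression are my own, and the sample is an excerpt of the primary log above (with one wrapped line left wrapped on purpose).

```python
import re

# Matches DRBD state-change fragments such as "role( Secondary -> Primary )".
TRANSITION = re.compile(r"\b(role|peer|conn|disk|pdsk)\(\s*(\w+)\s*->\s*(\w+)\s*\)")


def transitions(log_text, fields=("role", "peer")):
    """Return (field, old, new) tuples for the selected fields, in log order."""
    # The mail client wrapped long lines, so collapse all whitespace first.
    flat = " ".join(log_text.split())
    return [m.groups() for m in TRANSITION.finditer(flat) if m.group(1) in fields]


# Excerpt from the primary log; lines in between are omitted.
sample = """\
block drbd1: peer( Secondary -> Unknown ) conn( Connected -> BrokenPipe
) pdsk( UpToDate -> DUnknown )
block drbd1: Creating new current UUID
block drbd1: role( Primary -> Secondary )
block drbd1: role( Secondary -> Primary )
"""

for field, old, new in transitions(sample):
    print(f"{field}: {old} -> {new}")
```

Running the same filter over the full logs of both nodes shows which node held the Primary role at the time each connection was lost.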
And on my secondary node:
[root@secondary ~]# dmesg | grep block
drbd: registered as block device major 147
block drbd1: Starting worker thread (from cqueue/1 [195])
block drbd1: disk( Diskless -> Attaching )
block drbd1: Found 4 transactions (46 active extents) in activity log.
block drbd1: Method to ensure write ordering: flush
block drbd1: max_segment_size ( = BIO size ) = 32768
block drbd1: drbd_bm_resize called with capacity == 209708728
block drbd1: resync bitmap: bits=26213591 words=409588
block drbd1: size = 100 GB (104854364 KB)
block drbd1: recounting of set bits took additional 3 jiffies
block drbd1: 0 KB (0 bits) marked out-of-sync by on disk bit-map.
block drbd1: disk( Attaching -> UpToDate )
block drbd1: Barriers not supported on meta data device - disabling
block drbd1: conn( StandAlone -> Unconnected )
block drbd1: Starting receiver thread (from drbd1_worker [3527])
block drbd1: receiver (re)started
block drbd1: conn( Unconnected -> WFConnection )
block drbd1: Handshake successful: Agreed network protocol version 94
block drbd1: conn( WFConnection -> WFReportParams )
block drbd1: Starting asender thread (from drbd1_receiver [3538])
block drbd1: data-integrity-alg: <not-used>
block drbd1: drbd_sync_handshake:
block drbd1: self C23C1E60BAF36298:0000000000000000:CDF18AFFD009B9F4:7F938649A9B876DD bits:0 flags:0
block drbd1: peer B0A76171352A5A3C:C23C1E60BAF36299:CDF18AFFD009B9F5:7F938649A9B876DD bits:46 flags:0
block drbd1: uuid_compare()=-1 by rule 50
block drbd1: peer( Unknown -> Secondary ) conn( WFReportParams -> WFBitMapT ) pdsk( DUnknown -> UpToDate )
block drbd1: conn( WFBitMapT -> WFSyncUUID )
block drbd1: helper command: /sbin/drbdadm before-resync-target minor-1
block drbd1: helper command: /sbin/drbdadm before-resync-target minor-1 exit code 0 (0x0)
block drbd1: conn( WFSyncUUID -> SyncTarget ) disk( UpToDate -> Inconsistent )
block drbd1: Began resync as SyncTarget (will sync 184 KB [46 bits set]).
block drbd1: Resync done (total 1 sec; paused 0 sec; 184 K/sec)
block drbd1: conn( SyncTarget -> Connected ) disk( Inconsistent -> UpToDate )
block drbd1: helper command: /sbin/drbdadm after-resync-target minor-1
block drbd1: helper command: /sbin/drbdadm after-resync-target minor-1 exit code 0 (0x0)
block drbd1: role( Secondary -> Primary )
block drbd1: role( Primary -> Secondary )
block drbd1: peer( Secondary -> Primary )
block drbd1: PingAck did not arrive in time.
block drbd1: peer( Primary -> Unknown ) conn( Connected -> NetworkFailure ) pdsk( UpToDate -> DUnknown )
block drbd1: asender terminated
block drbd1: Terminating asender thread
block drbd1: short read expecting header on sock: r=-512
block drbd1: Connection closed
block drbd1: conn( NetworkFailure -> Unconnected )
block drbd1: receiver terminated
block drbd1: Restarting receiver thread
block drbd1: receiver (re)started
block drbd1: conn( Unconnected -> WFConnection )
block drbd1: Handshake successful: Agreed network protocol version 94
block drbd1: conn( WFConnection -> WFReportParams )
block drbd1: Starting asender thread (from drbd1_receiver [3538])
block drbd1: data-integrity-alg: <not-used>
block drbd1: drbd_sync_handshake:
block drbd1: self B0A76171352A5A3C:0000000000000000:28B6F22789E39014:C23C1E60BAF36299 bits:0 flags:0
block drbd1: peer 5A7A31FAC38B4C31:B0A76171352A5A3D:28B6F22789E39014:C23C1E60BAF36299 bits:0 flags:0
block drbd1: uuid_compare()=-1 by rule 50
block drbd1: peer( Unknown -> Primary ) conn( WFReportParams -> WFBitMapT ) pdsk( DUnknown -> UpToDate )
block drbd1: conn( WFBitMapT -> WFSyncUUID )
block drbd1: helper command: /sbin/drbdadm before-resync-target minor-1
block drbd1: helper command: /sbin/drbdadm before-resync-target minor-1 exit code 0 (0x0)
block drbd1: conn( WFSyncUUID -> SyncTarget ) disk( UpToDate -> Inconsistent )
block drbd1: Began resync as SyncTarget (will sync 0 KB [0 bits set]).
block drbd1: Resync done (total 1 sec; paused 0 sec; 0 K/sec)
block drbd1: conn( SyncTarget -> Connected ) disk( Inconsistent -> UpToDate )
block drbd1: helper command: /sbin/drbdadm after-resync-target minor-1
block drbd1: helper command: /sbin/drbdadm after-resync-target minor-1 exit code 0 (0x0)
block drbd1: Connected in w_make_resync_request
block drbd1: PingAck did not arrive in time.
block drbd1: peer( Primary -> Unknown ) conn( Connected -> NetworkFailure ) pdsk( UpToDate -> DUnknown )
block drbd1: asender terminated
block drbd1: Terminating asender thread
block drbd1: short read expecting header on sock: r=-512
block drbd1: Connection closed
block drbd1: conn( NetworkFailure -> Unconnected )
block drbd1: receiver terminated
block drbd1: Restarting receiver thread
block drbd1: receiver (re)started
block drbd1: role( Secondary -> Primary )
block drbd1: Creating new current UUID
block drbd1: conn( Unconnected -> WFConnection )
block drbd1: role( Primary -> Secondary )
block drbd1: disk( UpToDate -> Outdated )
block drbd1: Handshake successful: Agreed network protocol version 94
block drbd1: conn( WFConnection -> WFReportParams )
block drbd1: Starting asender thread (from drbd1_receiver [3538])
block drbd1: data-integrity-alg: <not-used>
block drbd1: drbd_sync_handshake:
block drbd1: self CDEEDF66F24D3ECC:5A7A31FAC38B4C30:F52C25D8178F32E6:B0A76171352A5A3D bits:1132 flags:0
block drbd1: peer F601A659931B404D:5A7A31FAC38B4C31:F52C25D8178F32E6:B0A76171352A5A3D bits:2 flags:0
block drbd1: uuid_compare()=100 by rule 90
block drbd1: helper command: /sbin/drbdadm initial-split-brain minor-1
block drbd1: meta connection shut down by peer.
block drbd1: conn( WFReportParams -> NetworkFailure )
block drbd1: asender terminated
block drbd1: Terminating asender thread
block drbd1: helper command: /sbin/drbdadm initial-split-brain minor-1 exit code 0 (0x0)
block drbd1: Split-Brain detected but unresolved, dropping connection!
block drbd1: helper command: /sbin/drbdadm split-brain minor-1
block drbd1: helper command: /sbin/drbdadm split-brain minor-1 exit code 0 (0x0)
block drbd1: conn( NetworkFailure -> Disconnecting )
block drbd1: error receiving ReportState, l: 4!
block drbd1: Connection closed
block drbd1: conn( Disconnecting -> StandAlone )
block drbd1: receiver terminated
block drbd1: Terminating receiver thread
[root@secondary ~]#
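To compare the UUID sets from the handshakes, here is a rough sketch. It rests on two assumptions of mine that are not confirmed anywhere in the logs: that each set is ordered current:bitmap:history:history, and that the lowest bit of each UUID is only a flag and should be ignored when comparing. It is a simplified illustration, not DRBD's actual comparison code, and the function names are made up.

```python
def parse_uuids(field):
    """Parse 'CURRENT:BITMAP:HISTORY1:HISTORY2' into four integers.

    Assumption: the lowest bit is only a flag, so it is cleared before comparing.
    """
    return [int(part, 16) & ~1 for part in field.split(":")]


def looks_like_split_brain(self_field, peer_field):
    """True if the current UUIDs differ while both share a non-zero bitmap UUID."""
    self_cur, self_bm = parse_uuids(self_field)[:2]
    peer_cur, peer_bm = parse_uuids(peer_field)[:2]
    return self_cur != peer_cur and self_bm == peer_bm and self_bm != 0


# First handshake on the primary (logged as uuid_compare()=1 by rule 70).
first = looks_like_split_brain(
    "B0A76171352A5A3C:C23C1E60BAF36299:CDF18AFFD009B9F5:7F938649A9B876DD",
    "C23C1E60BAF36298:0000000000000000:CDF18AFFD009B9F4:7F938649A9B876DD",
)
# Last handshake on the primary (logged as uuid_compare()=100 by rule 90).
last = looks_like_split_brain(
    "F601A659931B404D:5A7A31FAC38B4C31:F52C25D8178F32E6:B0A76171352A5A3D",
    "CDEEDF66F24D3ECC:5A7A31FAC38B4C30:F52C25D8178F32E6:B0A76171352A5A3D",
)
print(first, last)  # prints "False True"
```

Under those assumptions, the last handshake shows both nodes with a different current UUID but the same bitmap UUID (5A7A31FAC38B4C30 after masking), which would mean each node started its own data generation from the same ancestor while the connection was down.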
Could anyone please explain to me what is happening here?
Thanks in advance,
Anton