[DRBD-user] Sporadic splitbrain occurs after a week or more

Felix Frank ff at mpexnet.de
Tue Jan 11 10:11:05 CET 2011


Hi,

A kernel log with timestamps would be a lot more useful here. Without
them there is no way to tell how far apart these events are.

> 
> And on my secondary:
> 

<snip>

> block drbd1: conn( SyncTarget -> Connected ) disk( Inconsistent ->
> UpToDate )

Now you're good.

> block drbd1: helper command: /sbin/drbdadm after-resync-target minor-1
> block drbd1: helper command: /sbin/drbdadm after-resync-target minor-1
> exit code 0 (0x0)
> block drbd1: role( Secondary -> Primary )
> block drbd1: role( Primary -> Secondary )

Did you do that by hand? If not, who or what does this? I don't think a
node is supposed to promote and demote itself like this on its own.

> block drbd1: peer( Secondary -> Primary )
> block drbd1: PingAck did not arrive in time.
> block drbd1: peer( Primary -> Unknown ) conn( Connected ->
> NetworkFailure ) pdsk( UpToDate -> DUnknown )

Network failure. Correlate this with your other logs from around the
same time.
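
To make that correlation easier, here is a minimal sketch that pulls the
connection-loss events out of a syslog file together with their
timestamps. It assumes a classic syslog prefix ("Jan 10 03:14:07 node2
kernel: ...") and unwrapped lines, neither of which is visible in the
excerpt above, so the pattern may need adjusting. The function name is
made up for illustration.

```python
import re

# Assumed line format (not shown in the original post), e.g.:
#   Jan 10 03:14:07 node2 kernel: block drbd1: PingAck did not arrive in time.
# An optional "[12345.678]" kernel uptime stamp after "kernel:" is tolerated.
LINE_RE = re.compile(
    r"^(?P<ts>\w{3}\s+\d+\s+\d\d:\d\d:\d\d)\s+(?P<host>\S+)\s+kernel:\s+"
    r"(?:\[[^\]]*\]\s*)?block (?P<dev>drbd\d+): (?P<msg>.*)$"
)


def network_failures(lines):
    """Yield (timestamp, host, device, message) for connection-loss events."""
    for line in lines:
        m = LINE_RE.match(line.rstrip("\n"))
        if not m:
            continue  # not a DRBD kernel message in the assumed format
        msg = m.group("msg")
        if "PingAck did not arrive in time" in msg or "-> NetworkFailure" in msg:
            yield m.group("ts"), m.group("host"), m.group("dev"), msg
```

Feed it the open log file from each node and compare the timestamps it
reports with whatever else was logged at those moments.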

> block drbd1: asender terminated
> block drbd1: Terminating asender thread
> block drbd1: short read expecting header on sock: r=-512
> block drbd1: Connection closed
> block drbd1: conn( NetworkFailure -> Unconnected )
> block drbd1: receiver terminated
> block drbd1: Restarting receiver thread
> block drbd1: receiver (re)started
> block drbd1: conn( Unconnected -> WFConnection )
> block drbd1: Handshake successful: Agreed network protocol version 94
> block drbd1: conn( WFConnection -> WFReportParams )
> block drbd1: Starting asender thread (from drbd1_receiver [3538])
> block drbd1: data-integrity-alg: <not-used>
> block drbd1: drbd_sync_handshake:
> block drbd1: self
> B0A76171352A5A3C:0000000000000000:28B6F22789E39014:C23C1E60BAF36299
> bits:0 flags:0
> block drbd1: peer
> 5A7A31FAC38B4C31:B0A76171352A5A3D:28B6F22789E39014:C23C1E60BAF36299
> bits:0 flags:0
> block drbd1: uuid_compare()=-1 by rule 50
> block drbd1: peer( Unknown -> Primary ) conn( WFReportParams ->
> WFBitMapT ) pdsk( DUnknown -> UpToDate )
> block drbd1: conn( WFBitMapT -> WFSyncUUID )
> block drbd1: helper command: /sbin/drbdadm before-resync-target minor-1
> block drbd1: helper command: /sbin/drbdadm before-resync-target minor-1
> exit code 0 (0x0)
> block drbd1: conn( WFSyncUUID -> SyncTarget ) disk( UpToDate ->
> Inconsistent )
> block drbd1: Began resync as SyncTarget (will sync 0 KB [0 bits set]).
> block drbd1: Resync done (total 1 sec; paused 0 sec; 0 K/sec)
> block drbd1: conn( SyncTarget -> Connected ) disk( Inconsistent ->
> UpToDate )

You're good once more...

> block drbd1: helper command: /sbin/drbdadm after-resync-target minor-1
> block drbd1: helper command: /sbin/drbdadm after-resync-target minor-1
> exit code 0 (0x0)
> block drbd1: Connected in w_make_resync_request
> block drbd1: PingAck did not arrive in time.
> block drbd1: peer( Primary -> Unknown ) conn( Connected ->
> NetworkFailure ) pdsk( UpToDate -> DUnknown )

...and then the connection fails once again.

> block drbd1: asender terminated
> block drbd1: Terminating asender thread
> block drbd1: short read expecting header on sock: r=-512
> block drbd1: Connection closed
> block drbd1: conn( NetworkFailure -> Unconnected )
> block drbd1: receiver terminated
> block drbd1: Restarting receiver thread
> block drbd1: receiver (re)started
> block drbd1: role( Secondary -> Primary )

And this should not have happened. Your secondary is going primary while
unconnected, and the peer was primary when the connection dropped. Hence
the split brain. Again: when and why does it do that? That is what you
need to find out.
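
If the log is long, a small script can locate every such promotion for
you. The following is only a sketch: it tracks the last conn() state seen
and reports each role( ... -> Primary ) line that occurs while that state
is anything other than Connected. It assumes one unwrapped message per
line (the quoted excerpt above is wrapped by the mail client), and the
function name and the initial_conn default are my own choices.

```python
import re

# State-change fragments as they appear in the DRBD kernel messages above.
CONN_RE = re.compile(r"conn\( (\w+) -> (\w+) \)")
ROLE_RE = re.compile(r"role\( (\w+) -> (\w+) \)")


def promotions_while_disconnected(lines, initial_conn="Connected"):
    """Return (line_number, conn_state, line) for each risky promotion.

    initial_conn is an assumption about the state before the first
    conn() transition appears in the log.
    """
    conn = initial_conn
    hits = []
    for lineno, line in enumerate(lines, start=1):
        m = CONN_RE.search(line)
        if m:
            conn = m.group(2)  # remember the new connection state
        m = ROLE_RE.search(line)
        if m and m.group(2) == "Primary" and conn != "Connected":
            hits.append((lineno, conn, line.strip()))
    return hits
```

Run against a timestamped log, the reported line numbers tell you where
to look for whatever triggered the promotion.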

Cheers,
Felix