Hello,

I'm new to DRBD and have problems with split-brains every week or so. I have a standard Active/Passive setup as given in the DRBD documentation, and this is my config file:

global {
    usage-count no;
}

common {
    syncer { rate 100M; }
    protocol C;
}

resource mailstore {
    startup {
        wfc-timeout 0;
        degr-wfc-timeout 120;
    }
    disk {
        on-io-error detach;
        no-disk-barrier;
    }
    on primary.axigen.cluster {
        device /dev/drbd1;
        disk /dev/axigen/mailstore;
        address 192.168.1.10:7791;
        meta-disk internal;
    }
    on secondary.axigen.cluster {
        device /dev/drbd1;
        disk /dev/axigen/mailstore;
        address 192.168.1.20:7791;
        meta-disk internal;
    }
}

After a few days my cluster always changes its state as follows:

[root@primary ~]# drbd-overview
  1:mailstore  StandAlone Primary/Unknown UpToDate/DUnknown r---- /var/opt/axigen ext3 99G 298M 98G 1%

[root@secondary ~]# drbd-overview
  1:mailstore  StandAlone Secondary/Unknown Outdated/DUnknown r----

I get the following messages in dmesg on my primary node:

[root@primary ~]# dmesg |grep block
drbd: registered as block device major 147
block drbd1: Starting worker thread (from cqueue/1 [195])
block drbd1: disk( Diskless -> Attaching )
block drbd1: Found 4 transactions (23 active extents) in activity log.
block drbd1: Method to ensure write ordering: flush
block drbd1: max_segment_size ( = BIO size ) = 32768
block drbd1: drbd_bm_resize called with capacity == 209708728
block drbd1: resync bitmap: bits=26213591 words=409588
block drbd1: size = 100 GB (104854364 KB)
block drbd1: recounting of set bits took additional 2 jiffies
block drbd1: 184 KB (46 bits) marked out-of-sync by on disk bit-map.
block drbd1: disk( Attaching -> UpToDate )
block drbd1: Barriers not supported on meta data device - disabling
block drbd1: conn( StandAlone -> Unconnected )
block drbd1: Starting receiver thread (from drbd1_worker [3504])
block drbd1: receiver (re)started
block drbd1: conn( Unconnected -> WFConnection )
block drbd1: Handshake successful: Agreed network protocol version 94
block drbd1: conn( WFConnection -> WFReportParams )
block drbd1: Starting asender thread (from drbd1_receiver [3517])
block drbd1: data-integrity-alg: <not-used>
block drbd1: drbd_sync_handshake:
block drbd1: self B0A76171352A5A3C:C23C1E60BAF36299:CDF18AFFD009B9F5:7F938649A9B876DD bits:46 flags:0
block drbd1: peer C23C1E60BAF36298:0000000000000000:CDF18AFFD009B9F4:7F938649A9B876DD bits:0 flags:0
block drbd1: uuid_compare()=1 by rule 70
block drbd1: peer( Unknown -> Secondary ) conn( WFReportParams -> WFBitMapS ) pdsk( DUnknown -> UpToDate )
block drbd1: conn( WFBitMapS -> SyncSource ) pdsk( UpToDate -> Inconsistent )
block drbd1: Began resync as SyncSource (will sync 184 KB [46 bits set]).
block drbd1: Resync done (total 1 sec; paused 0 sec; 184 K/sec)
block drbd1: conn( SyncSource -> Connected ) pdsk( Inconsistent -> UpToDate )
block drbd1: peer( Secondary -> Primary )
block drbd1: peer( Primary -> Secondary )
block drbd1: role( Secondary -> Primary )
block drbd1: sock was shut down by peer
block drbd1: peer( Secondary -> Unknown ) conn( Connected -> BrokenPipe ) pdsk( UpToDate -> DUnknown )
block drbd1: short read expecting header on sock: r=0
block drbd1: asender terminated
block drbd1: Terminating asender thread
block drbd1: Creating new current UUID
block drbd1: Connection closed
block drbd1: conn( BrokenPipe -> Unconnected )
block drbd1: receiver terminated
block drbd1: Restarting receiver thread
block drbd1: receiver (re)started
block drbd1: conn( Unconnected -> WFConnection )
block drbd1: Handshake successful: Agreed network protocol version 94
block drbd1: conn( WFConnection -> WFReportParams )
block drbd1: Starting asender thread (from drbd1_receiver [3517])
block drbd1: data-integrity-alg: <not-used>
block drbd1: drbd_sync_handshake:
block drbd1: self 5A7A31FAC38B4C31:B0A76171352A5A3D:28B6F22789E39014:C23C1E60BAF36299 bits:0 flags:0
block drbd1: peer B0A76171352A5A3C:0000000000000000:28B6F22789E39014:C23C1E60BAF36299 bits:0 flags:0
block drbd1: uuid_compare()=1 by rule 70
block drbd1: peer( Unknown -> Secondary ) conn( WFReportParams -> WFBitMapS ) pdsk( DUnknown -> UpToDate )
block drbd1: conn( WFBitMapS -> SyncSource ) pdsk( UpToDate -> Inconsistent )
block drbd1: Began resync as SyncSource (will sync 0 KB [0 bits set]).
block drbd1: Resync done (total 1 sec; paused 0 sec; 0 K/sec)
block drbd1: conn( SyncSource -> Connected ) pdsk( Inconsistent -> UpToDate )
block drbd1: sock was shut down by peer
block drbd1: peer( Secondary -> Unknown ) conn( Connected -> BrokenPipe ) pdsk( UpToDate -> DUnknown )
block drbd1: short read expecting header on sock: r=0
block drbd1: asender terminated
block drbd1: Terminating asender thread
block drbd1: Creating new current UUID
block drbd1: Connection closed
block drbd1: conn( BrokenPipe -> Unconnected )
block drbd1: receiver terminated
block drbd1: Restarting receiver thread
block drbd1: receiver (re)started
block drbd1: conn( Unconnected -> WFConnection )
block drbd1: State change failed: Need access to UpToDate data
block drbd1: state = { cs:WFConnection ro:Primary/Unknown ds:UpToDate/DUnknown r--- }
block drbd1: wanted = { cs:WFConnection ro:Primary/Unknown ds:Outdated/DUnknown r--- }
block drbd1: role( Primary -> Secondary )
block drbd1: role( Secondary -> Primary )
block drbd1: Handshake successful: Agreed network protocol version 94
block drbd1: conn( WFConnection -> WFReportParams )
block drbd1: Starting asender thread (from drbd1_receiver [3517])
block drbd1: data-integrity-alg: <not-used>
block drbd1: drbd_sync_handshake:
block drbd1: self F601A659931B404D:5A7A31FAC38B4C31:F52C25D8178F32E6:B0A76171352A5A3D bits:2 flags:0
block drbd1: peer CDEEDF66F24D3ECC:5A7A31FAC38B4C30:F52C25D8178F32E6:B0A76171352A5A3D bits:1132 flags:0
block drbd1: uuid_compare()=100 by rule 90
block drbd1: helper command: /sbin/drbdadm initial-split-brain minor-1
block drbd1: helper command: /sbin/drbdadm initial-split-brain minor-1 exit code 0 (0x0)
block drbd1: Split-Brain detected but unresolved, dropping connection!
block drbd1: helper command: /sbin/drbdadm split-brain minor-1
block drbd1: helper command: /sbin/drbdadm split-brain minor-1 exit code 0 (0x0)
block drbd1: conn( WFReportParams -> Disconnecting )
block drbd1: error receiving ReportState, l: 4!
block drbd1: asender terminated
block drbd1: Terminating asender thread
block drbd1: Connection closed
block drbd1: conn( Disconnecting -> StandAlone )
block drbd1: receiver terminated
block drbd1: Terminating receiver thread
[root@primary ~]#

And on my secondary:

[root@secondary ~]# dmesg |grep block
drbd: registered as block device major 147
block drbd1: Starting worker thread (from cqueue/1 [195])
block drbd1: disk( Diskless -> Attaching )
block drbd1: Found 4 transactions (46 active extents) in activity log.
block drbd1: Method to ensure write ordering: flush
block drbd1: max_segment_size ( = BIO size ) = 32768
block drbd1: drbd_bm_resize called with capacity == 209708728
block drbd1: resync bitmap: bits=26213591 words=409588
block drbd1: size = 100 GB (104854364 KB)
block drbd1: recounting of set bits took additional 3 jiffies
block drbd1: 0 KB (0 bits) marked out-of-sync by on disk bit-map.
block drbd1: disk( Attaching -> UpToDate )
block drbd1: Barriers not supported on meta data device - disabling
block drbd1: conn( StandAlone -> Unconnected )
block drbd1: Starting receiver thread (from drbd1_worker [3527])
block drbd1: receiver (re)started
block drbd1: conn( Unconnected -> WFConnection )
block drbd1: Handshake successful: Agreed network protocol version 94
block drbd1: conn( WFConnection -> WFReportParams )
block drbd1: Starting asender thread (from drbd1_receiver [3538])
block drbd1: data-integrity-alg: <not-used>
block drbd1: drbd_sync_handshake:
block drbd1: self C23C1E60BAF36298:0000000000000000:CDF18AFFD009B9F4:7F938649A9B876DD bits:0 flags:0
block drbd1: peer B0A76171352A5A3C:C23C1E60BAF36299:CDF18AFFD009B9F5:7F938649A9B876DD bits:46 flags:0
block drbd1: uuid_compare()=-1 by rule 50
block drbd1: peer( Unknown -> Secondary ) conn( WFReportParams -> WFBitMapT ) pdsk( DUnknown -> UpToDate )
block drbd1: conn( WFBitMapT -> WFSyncUUID )
block drbd1: helper command: /sbin/drbdadm before-resync-target minor-1
block drbd1: helper command: /sbin/drbdadm before-resync-target minor-1 exit code 0 (0x0)
block drbd1: conn( WFSyncUUID -> SyncTarget ) disk( UpToDate -> Inconsistent )
block drbd1: Began resync as SyncTarget (will sync 184 KB [46 bits set]).
block drbd1: Resync done (total 1 sec; paused 0 sec; 184 K/sec)
block drbd1: conn( SyncTarget -> Connected ) disk( Inconsistent -> UpToDate )
block drbd1: helper command: /sbin/drbdadm after-resync-target minor-1
block drbd1: helper command: /sbin/drbdadm after-resync-target minor-1 exit code 0 (0x0)
block drbd1: role( Secondary -> Primary )
block drbd1: role( Primary -> Secondary )
block drbd1: peer( Secondary -> Primary )
block drbd1: PingAck did not arrive in time.
block drbd1: peer( Primary -> Unknown ) conn( Connected -> NetworkFailure ) pdsk( UpToDate -> DUnknown )
block drbd1: asender terminated
block drbd1: Terminating asender thread
block drbd1: short read expecting header on sock: r=-512
block drbd1: Connection closed
block drbd1: conn( NetworkFailure -> Unconnected )
block drbd1: receiver terminated
block drbd1: Restarting receiver thread
block drbd1: receiver (re)started
block drbd1: conn( Unconnected -> WFConnection )
block drbd1: Handshake successful: Agreed network protocol version 94
block drbd1: conn( WFConnection -> WFReportParams )
block drbd1: Starting asender thread (from drbd1_receiver [3538])
block drbd1: data-integrity-alg: <not-used>
block drbd1: drbd_sync_handshake:
block drbd1: self B0A76171352A5A3C:0000000000000000:28B6F22789E39014:C23C1E60BAF36299 bits:0 flags:0
block drbd1: peer 5A7A31FAC38B4C31:B0A76171352A5A3D:28B6F22789E39014:C23C1E60BAF36299 bits:0 flags:0
block drbd1: uuid_compare()=-1 by rule 50
block drbd1: peer( Unknown -> Primary ) conn( WFReportParams -> WFBitMapT ) pdsk( DUnknown -> UpToDate )
block drbd1: conn( WFBitMapT -> WFSyncUUID )
block drbd1: helper command: /sbin/drbdadm before-resync-target minor-1
block drbd1: helper command: /sbin/drbdadm before-resync-target minor-1 exit code 0 (0x0)
block drbd1: conn( WFSyncUUID -> SyncTarget ) disk( UpToDate -> Inconsistent )
block drbd1: Began resync as SyncTarget (will sync 0 KB [0 bits set]).
block drbd1: Resync done (total 1 sec; paused 0 sec; 0 K/sec)
block drbd1: conn( SyncTarget -> Connected ) disk( Inconsistent -> UpToDate )
block drbd1: helper command: /sbin/drbdadm after-resync-target minor-1
block drbd1: helper command: /sbin/drbdadm after-resync-target minor-1 exit code 0 (0x0)
block drbd1: Connected in w_make_resync_request
block drbd1: PingAck did not arrive in time.
block drbd1: peer( Primary -> Unknown ) conn( Connected -> NetworkFailure ) pdsk( UpToDate -> DUnknown )
block drbd1: asender terminated
block drbd1: Terminating asender thread
block drbd1: short read expecting header on sock: r=-512
block drbd1: Connection closed
block drbd1: conn( NetworkFailure -> Unconnected )
block drbd1: receiver terminated
block drbd1: Restarting receiver thread
block drbd1: receiver (re)started
block drbd1: role( Secondary -> Primary )
block drbd1: Creating new current UUID
block drbd1: conn( Unconnected -> WFConnection )
block drbd1: role( Primary -> Secondary )
block drbd1: disk( UpToDate -> Outdated )
block drbd1: Handshake successful: Agreed network protocol version 94
block drbd1: conn( WFConnection -> WFReportParams )
block drbd1: Starting asender thread (from drbd1_receiver [3538])
block drbd1: data-integrity-alg: <not-used>
block drbd1: drbd_sync_handshake:
block drbd1: self CDEEDF66F24D3ECC:5A7A31FAC38B4C30:F52C25D8178F32E6:B0A76171352A5A3D bits:1132 flags:0
block drbd1: peer F601A659931B404D:5A7A31FAC38B4C31:F52C25D8178F32E6:B0A76171352A5A3D bits:2 flags:0
block drbd1: uuid_compare()=100 by rule 90
block drbd1: helper command: /sbin/drbdadm initial-split-brain minor-1
block drbd1: meta connection shut down by peer.
block drbd1: conn( WFReportParams -> NetworkFailure )
block drbd1: asender terminated
block drbd1: Terminating asender thread
block drbd1: helper command: /sbin/drbdadm initial-split-brain minor-1 exit code 0 (0x0)
block drbd1: Split-Brain detected but unresolved, dropping connection!
block drbd1: helper command: /sbin/drbdadm split-brain minor-1
block drbd1: helper command: /sbin/drbdadm split-brain minor-1 exit code 0 (0x0)
block drbd1: conn( NetworkFailure -> Disconnecting )
block drbd1: error receiving ReportState, l: 4!
block drbd1: Connection closed
block drbd1: conn( Disconnecting -> StandAlone )
block drbd1: receiver terminated
block drbd1: Terminating receiver thread
[root@secondary ~]#

Could anyone please explain to me what is happening here?

Thanks in advance,
Anton