[DRBD-user] drbd issue?

Nicolas nicolas at shivaserv.fr
Tue Aug 28 16:43:47 CEST 2018


Hi

I'm running several Debian servers with Ganeti and drbd.

Since I upgraded them to Debian 9 and drbd 8.9.10-2 (from the Debian repository), I have had a lot of issues with my drbd resources: some resources randomly get disconnected, as I can see in dmesg.

Today, for example:

[Tue Aug 28 14:32:38 2018] drbd resource10: peer( Primary -> Unknown ) conn( Connected -> Disconnecting ) pdsk( UpToDate -> DUnknown ) 
[Tue Aug 28 14:32:38 2018] drbd resource10: ack_receiver terminated
[Tue Aug 28 14:32:38 2018] drbd resource10: Terminating drbd_a_resource
[Tue Aug 28 14:32:38 2018] drbd resource10: Connection closed
[Tue Aug 28 14:32:38 2018] drbd resource10: conn( Disconnecting -> StandAlone ) 
[Tue Aug 28 14:32:38 2018] drbd resource10: receiver terminated
[Tue Aug 28 14:32:38 2018] drbd resource10: Terminating drbd_r_resource
[Tue Aug 28 14:32:38 2018] block drbd10: disk( UpToDate -> Failed ) 
[Tue Aug 28 14:32:38 2018] block drbd10: 0 KB (0 bits) marked out-of-sync by on disk bit-map.
[Tue Aug 28 14:32:38 2018] block drbd10: disk( Failed -> Diskless ) 
[Tue Aug 28 14:32:38 2018] drbd resource10: Terminating drbd_w_resource
[Tue Aug 28 14:32:40 2018] drbd resource10: Starting worker thread (from drbdsetup-84 [10222])
[Tue Aug 28 14:32:40 2018] block drbd10: disk( Diskless -> Attaching ) 
[Tue Aug 28 14:32:40 2018] drbd resource10: Method to ensure write ordering: flush
[Tue Aug 28 14:32:40 2018] block drbd10: max BIO size = 262144
[Tue Aug 28 14:32:40 2018] block drbd10: Adjusting my ra_pages to backing device's (32 -> 256)
[Tue Aug 28 14:32:40 2018] block drbd10: drbd_bm_resize called with capacity == 314572800
[Tue Aug 28 14:32:40 2018] block drbd10: resync bitmap: bits=39321600 words=614400 pages=1200
[Tue Aug 28 14:32:40 2018] block drbd10: size = 150 GB (157286400 KB)
[Tue Aug 28 14:32:40 2018] block drbd10: recounting of set bits took additional 0 jiffies
[Tue Aug 28 14:32:40 2018] block drbd10: 0 KB (0 bits) marked out-of-sync by on disk bit-map.
[Tue Aug 28 14:32:40 2018] block drbd10: disk( Attaching -> UpToDate ) 
[Tue Aug 28 14:32:40 2018] block drbd10: attached to UUIDs 0748EE11C429D3B4:0000000000000000:FDAEFCD2E8D9890A:FDADFCD2E8D9890B
[Tue Aug 28 14:32:40 2018] drbd resource10: conn( StandAlone -> Unconnected ) 
[Tue Aug 28 14:32:40 2018] drbd resource10: Starting receiver thread (from drbd_w_resource [10225])
[Tue Aug 28 14:32:40 2018] drbd resource10: receiver (re)started
[Tue Aug 28 14:32:40 2018] drbd resource10: conn( Unconnected -> WFConnection ) 
[Tue Aug 28 14:32:41 2018] drbd resource10: Handshake successful: Agreed network protocol version 101
[Tue Aug 28 14:32:41 2018] drbd resource10: Feature flags enabled on protocol level: 0x7 TRIM THIN_RESYNC WRITE_SAME.
[Tue Aug 28 14:32:41 2018] drbd resource10: Peer authenticated using 16 bytes HMAC
[Tue Aug 28 14:32:41 2018] drbd resource10: conn( WFConnection -> WFReportParams ) 
[Tue Aug 28 14:32:41 2018] drbd resource10: Starting ack_recv thread (from drbd_r_resource [10246])
[Tue Aug 28 14:32:41 2018] block drbd10: drbd_sync_handshake:
[Tue Aug 28 14:32:41 2018] block drbd10: self 0748EE11C429D3B4:0000000000000000:FDAEFCD2E8D9890A:FDADFCD2E8D9890B bits:0 flags:0
[Tue Aug 28 14:32:41 2018] block drbd10: peer 629F1036CD6CA2AF:0748EE11C429D3B5:FDAEFCD2E8D9890B:FDADFCD2E8D9890B bits:0 flags:0
[Tue Aug 28 14:32:41 2018] block drbd10: uuid_compare()=-1 by rule 50
[Tue Aug 28 14:32:41 2018] block drbd10: peer( Unknown -> Primary ) conn( WFReportParams -> WFBitMapT ) disk( UpToDate -> Outdated ) pdsk( DUnknown -> UpToDate ) 
[Tue Aug 28 14:32:41 2018] block drbd10: receive bitmap stats [Bytes(packets)]: plain 0(0), RLE 23(1), total 23; compression: 100.0%
[Tue Aug 28 14:32:41 2018] block drbd10: send bitmap stats [Bytes(packets)]: plain 0(0), RLE 23(1), total 23; compression: 100.0%
[Tue Aug 28 14:32:41 2018] block drbd10: conn( WFBitMapT -> WFSyncUUID ) 
[Tue Aug 28 14:32:41 2018] block drbd10: updated sync uuid 0749EE11C429D3B4:0000000000000000:FDAEFCD2E8D9890A:FDADFCD2E8D9890B
[Tue Aug 28 14:32:41 2018] block drbd10: helper command: /bin/true before-resync-target minor-10
[Tue Aug 28 14:32:41 2018] block drbd10: helper command: /bin/true before-resync-target minor-10 exit code 0 (0x0)
[Tue Aug 28 14:32:41 2018] block drbd10: conn( WFSyncUUID -> SyncTarget ) disk( Outdated -> Inconsistent ) 
[Tue Aug 28 14:32:41 2018] block drbd10: Began resync as SyncTarget (will sync 0 KB [0 bits set]).
[Tue Aug 28 14:32:41 2018] block drbd10: Resync done (total 1 sec; paused 0 sec; 0 K/sec)
[Tue Aug 28 14:32:41 2018] block drbd10: updated UUIDs 629F1036CD6CA2AE:0000000000000000:0749EE11C429D3B4:0748EE11C429D3B5
[Tue Aug 28 14:32:41 2018] block drbd10: conn( SyncTarget -> Connected ) disk( Inconsistent -> UpToDate ) 
[Tue Aug 28 14:32:41 2018] block drbd10: helper command: /bin/true after-resync-target minor-10
[Tue Aug 28 14:32:41 2018] block drbd10: helper command: /bin/true after-resync-target minor-10 exit code 0 (0x0)
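
To line up these events across clusters, the state changes can be pulled out of the kernel log with a small script. This is a minimal sketch, assuming the log lines look exactly like the ones above (the `dmesg -T` timestamp format); the function names are hypothetical and not part of drbd-utils. It lists every `field( Old -> New )` transition and counts `conn( Connected -> Disconnecting )` events per resource.

```python
import re
from collections import Counter

# Hypothetical helper: matches lines such as
# [Tue Aug 28 14:32:38 2018] drbd resource10: conn( Connected -> Disconnecting )
# [Tue Aug 28 14:32:38 2018] block drbd10: disk( UpToDate -> Failed )
LINE_RE = re.compile(
    r"^\[(?P<ts>[^\]]+)\]\s+(?:drbd|block)\s+(?P<dev>\S+):\s+(?P<msg>.*)$"
)
# Matches each "field( Old -> New )" state change inside one message
STATE_RE = re.compile(r"(\w+)\( (\w+) -> (\w+) \)")


def parse_transitions(lines):
    """Yield (timestamp, device, field, old, new) for every DRBD state change."""
    for line in lines:
        m = LINE_RE.match(line.strip())
        if not m:
            continue  # not a drbd/block line in the expected format
        for field, old, new in STATE_RE.findall(m.group("msg")):
            yield m.group("ts"), m.group("dev"), field, old, new


def count_disconnects(lines):
    """Count Connected -> Disconnecting transitions per resource name."""
    counts = Counter()
    for _ts, dev, field, old, new in parse_transitions(lines):
        if field == "conn" and old == "Connected" and new == "Disconnecting":
            counts[dev] += 1
    return counts


def print_transitions(lines):
    """Print one state change per line, in log order."""
    for ts, dev, field, old, new in parse_transitions(lines):
        print(f"{ts}  {dev:12} {field}: {old} -> {new}")
```

For example, after saving `dmesg -T` output to a file, `print_transitions(open("dmesg.txt"))` prints the transitions in order, and `count_disconnects(open("dmesg.txt"))` shows which resources dropped and how often. Note that the same volume appears under two names (`resource10` for connection events, `drbd10` for disk events), so the two need to be matched up by hand.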

I have four different clusters running these same versions, and I see the same problem on all of them.

It's not always the same resource. Any idea what I should check?

Thanks a lot,
Nicolas