[DRBD-user] No resync of oos data in bitmap

Christiaan den Besten chris at prolocation.net
Fri May 4 10:10:48 CEST 2018


Hi!

Question: using DRBD 9.0.14 (latest from git), we can't get a resync to work after a verify. This is a simple 2-node resource, created and configured 8.x style.

A "drbdadm verify" now successfully runs to 100% (thank you so much, Lars, for fixing this!) and it detects the inconsistent data blocks (self-inflicted by dd'ing some zeros onto the secondary node).

We then have:

[149702.915093] drbd r_drbd9.prolocation.net mhxen20.prolocation.net: conn( Unconnected -> Connecting )
[149704.335863] drbd r_drbd9.prolocation.net mhxen20.prolocation.net: Handshake to peer 0 successful: Agreed network protocol version 113
[149704.335866] drbd r_drbd9.prolocation.net mhxen20.prolocation.net: Feature flags enabled on protocol level: 0xf TRIM THIN_RESYNC WRITE_SAME WRITE_ZEROES.
[149704.336280] drbd r_drbd9.prolocation.net mhxen20.prolocation.net: Peer authenticated using 20 bytes HMAC
[149704.336299] drbd r_drbd9.prolocation.net mhxen20.prolocation.net: Starting ack_recv thread (from drbd_r_r_drbd9. [4924])
[149704.391726] drbd r_drbd9.prolocation.net mhxen20.prolocation.net: Preparing remote state change 196805945
[149704.392341] drbd r_drbd9.prolocation.net mhxen20.prolocation.net: Committing remote state change 196805945 (primary_nodes=2)
[149704.392364] drbd r_drbd9.prolocation.net mhxen20.prolocation.net: conn( Connecting -> Connected ) peer( Unknown -> Secondary )
[149704.397800] drbd r_drbd9.prolocation.net/0 drbd11 mhxen20.prolocation.net: drbd_sync_handshake:
[149704.397805] drbd r_drbd9.prolocation.net/0 drbd11 mhxen20.prolocation.net: self 9E1AD7F59E5434FA:0000000000000000:B3BDA5F13EDDFCEA:EE9BDB393791EAAC bits:0 flags:120
[149704.397807] drbd r_drbd9.prolocation.net/0 drbd11 mhxen20.prolocation.net: peer 9E1AD7F59E5434FA:0000000000000000:9E1AD7F59E5434FA:B3BDA5F13EDDFCEA bits:0 flags:120
[149704.397809] drbd r_drbd9.prolocation.net/0 drbd11 mhxen20.prolocation.net: uuid_compare()=0 by rule 38
[149704.397830] drbd r_drbd9.prolocation.net/0 drbd11 mhxen20.prolocation.net: repl( Off -> Established )
[149704.405793] drbd r_drbd9.prolocation.net/1 drbd12 mhxen20.prolocation.net: drbd_sync_handshake:
[149704.405796] drbd r_drbd9.prolocation.net/1 drbd12 mhxen20.prolocation.net: self 686DD0F922994E9C:0000000000000000:AEB10B63BD82F43A:6805740BE5A46E08 bits:1048 flags:120
[149704.405799] drbd r_drbd9.prolocation.net/1 drbd12 mhxen20.prolocation.net: peer 686DD0F922994E9C:0000000000000000:686DD0F922994E9C:AEB10B63BD82F43A bits:1048 flags:120
[149704.405801] drbd r_drbd9.prolocation.net/1 drbd12 mhxen20.prolocation.net: uuid_compare()=0 by rule 38
[149704.405803] drbd r_drbd9.prolocation.net/1 drbd12: No resync, but 1048 bits in bitmap!
[149704.405821] drbd r_drbd9.prolocation.net/1 drbd12 mhxen20.prolocation.net: repl( Off -> Established )

and the same on the other node

[146265.229215] drbd r_drbd9.prolocation.net/1 drbd12 mhxen10.prolocation.net: drbd_sync_handshake:
[146265.229218] drbd r_drbd9.prolocation.net/1 drbd12 mhxen10.prolocation.net: self 686DD0F922994E9C:0000000000000000:686DD0F922994E9C:AEB10B63BD82F43A bits:1048 flags:120
[146265.229221] drbd r_drbd9.prolocation.net/1 drbd12 mhxen10.prolocation.net: peer 686DD0F922994E9C:0000000000000000:AEB10B63BD82F43A:6805740BE5A46E08 bits:1048 flags:120
[146265.229223] drbd r_drbd9.prolocation.net/1 drbd12 mhxen10.prolocation.net: uuid_compare()=0 by rule 38
[146265.229225] drbd r_drbd9.prolocation.net/1 drbd12: No resync, but 1048 bits in bitmap!
[146265.229244] drbd r_drbd9.prolocation.net/1 drbd12 mhxen10.prolocation.net: pdsk( DUnknown -> UpToDate ) repl( Off -> Established )

with

[root@mhxen10 ~]# grep ^ /sys/kernel/debug/drbd/resources/*/connections/*/*/proc_drbd
/sys/kernel/debug/drbd/resources/r_drbd9.prolocation.net/connections/mhxen20.prolocation.net/0/proc_drbd:11: cs:Established ro:Primary/Secondary ds:UpToDate/UpToDate C r-----
/sys/kernel/debug/drbd/resources/r_drbd9.prolocation.net/connections/mhxen20.prolocation.net/0/proc_drbd:    ns:41941724 nr:0 dw:0 dr:167767960 al:0 bm:0 lo:0 pe:[0;0] ua:0 ap:[0;0] ep:1 wo:1 oos:0
/sys/kernel/debug/drbd/resources/r_drbd9.prolocation.net/connections/mhxen20.prolocation.net/0/proc_drbd:       resync: used:0/61 hits:0 misses:0 starving:0 locked:0 changed:0
/sys/kernel/debug/drbd/resources/r_drbd9.prolocation.net/connections/mhxen20.prolocation.net/0/proc_drbd:       act_log: used:0/1237 hits:0 misses:0 starving:0 locked:0 changed:0
/sys/kernel/debug/drbd/resources/r_drbd9.prolocation.net/connections/mhxen20.prolocation.net/0/proc_drbd:       blocked on activity log: 0
/sys/kernel/debug/drbd/resources/r_drbd9.prolocation.net/connections/mhxen20.prolocation.net/1/proc_drbd:12: cs:Established ro:Primary/Secondary ds:UpToDate/UpToDate C r-----
/sys/kernel/debug/drbd/resources/r_drbd9.prolocation.net/connections/mhxen20.prolocation.net/1/proc_drbd:    ns:41943040 nr:0 dw:0 dr:167773196 al:0 bm:0 lo:0 pe:[0;0] ua:0 ap:[0;0] ep:1 wo:1 oos:4192
/sys/kernel/debug/drbd/resources/r_drbd9.prolocation.net/connections/mhxen20.prolocation.net/1/proc_drbd:       resync: used:0/61 hits:0 misses:0 starving:0 locked:0 changed:0
/sys/kernel/debug/drbd/resources/r_drbd9.prolocation.net/connections/mhxen20.prolocation.net/1/proc_drbd:       act_log: used:0/1237 hits:0 misses:0 starving:0 locked:0 changed:0
/sys/kernel/debug/drbd/resources/r_drbd9.prolocation.net/connections/mhxen20.prolocation.net/1/proc_drbd:       blocked on activity log: 0

Notice the oos:4192.
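
The oos:4192 figure and the "1048 bits in bitmap" from the kernel log look like the same quantity in different units, assuming oos is reported in KiB and one bitmap bit covers 4 KiB. A small sketch that pulls the non-zero oos counters out of the debugfs output above and converts them to bits (oos_to_bits is a made-up helper name, not a DRBD tool):

```shell
# List volumes with a non-zero oos counter and convert it to bitmap bits.
# Assumptions: oos is reported in KiB and one bitmap bit covers 4 KiB.
# oos_to_bits is a hypothetical helper name, not part of DRBD.
# Input: lines in the "path/proc_drbd:content" form printed by "grep ^".
oos_to_bits() {
    sed -n 's/^\(.*\)\/proc_drbd:.* oos:\([0-9][0-9]*\).*$/\1 \2/p' |
    while read -r path oos; do
        if [ "$oos" -gt 0 ]; then
            echo "$path oos=${oos}KiB bits=$((oos / 4))"
        fi
    done
}

# Usage (needs debugfs mounted and a loaded DRBD 9 module):
#   grep ^ /sys/kernel/debug/drbd/resources/*/connections/*/*/proc_drbd | oos_to_bits
```

For the output above this should report only volume 1, with 4192 KiB corresponding to 1048 bits.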

Disconnecting and reconnecting one or both ends doesn't trigger a resync. Is this something we misconfigured, or should it have worked?

A "drbdadm invalidate-remote r_drbd9.prolocation.net" on the primary node, which forces a full resync, does get the job done.
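
For reference, the steps described in this post can be written down as a dry-run sketch. It only prints the drbdadm commands in order and runs nothing; print_repair_steps is a made-up helper name, and the resource name is the one from this post.

```shell
# Print (not run) the drbdadm steps described in this post, in order.
# print_repair_steps is a hypothetical helper name, not part of DRBD.
print_repair_steps() {
    res=$1
    echo "drbdadm verify $res"             # online verify; runs in the background
    echo "drbdadm disconnect $res"         # only after the verify has finished
    echo "drbdadm connect $res"            # reconnect; did not trigger a resync here
    echo "drbdadm invalidate-remote $res"  # workaround used here: full resync of the peer
}

print_repair_steps r_drbd9.prolocation.net
```

Since invalidate-remote resyncs the whole device rather than just the 1048 marked blocks, it is a heavy workaround and not a replacement for the partial resync.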

Any advice on this?

-- 
Met vriendelijke groet,
Christiaan den Besten - Prolocation B.V.

T: +31 (0)70 - 326 04 25

