[DRBD-user] No resync of oos data in bitmap

Eddie Chapman eddie at ehuk.net
Fri May 4 12:08:52 CEST 2018


On 04/05/18 09:10, Christiaan den Besten wrote:
> Hi !
> 
> Question: using DRBD 9.0.14 (latest from git), we can't get a resync to work after a verify. We have a simple 2-node resource, created and configured 8.x style.
> 
> A "drbdadm verify" now successfully ends at 100% (thank you so much, Lars, for fixing this!) and it notices inconsistent data blocks (self-inflicted by dd'ing some zeros on the secondary node).
> 
> We then have :
> 
> [149702.915093] drbd r_drbd9.prolocation.net mhxen20.prolocation.net: conn( Unconnected -> Connecting )
> [149704.335863] drbd r_drbd9.prolocation.net mhxen20.prolocation.net: Handshake to peer 0 successful: Agreed network protocol version 113
> [149704.335866] drbd r_drbd9.prolocation.net mhxen20.prolocation.net: Feature flags enabled on protocol level: 0xf TRIM THIN_RESYNC WRITE_SAME WRITE_ZEROES.
> [149704.336280] drbd r_drbd9.prolocation.net mhxen20.prolocation.net: Peer authenticated using 20 bytes HMAC
> [149704.336299] drbd r_drbd9.prolocation.net mhxen20.prolocation.net: Starting ack_recv thread (from drbd_r_r_drbd9. [4924])
> [149704.391726] drbd r_drbd9.prolocation.net mhxen20.prolocation.net: Preparing remote state change 196805945
> [149704.392341] drbd r_drbd9.prolocation.net mhxen20.prolocation.net: Committing remote state change 196805945 (primary_nodes=2)
> [149704.392364] drbd r_drbd9.prolocation.net mhxen20.prolocation.net: conn( Connecting -> Connected ) peer( Unknown -> Secondary )
> [149704.397800] drbd r_drbd9.prolocation.net/0 drbd11 mhxen20.prolocation.net: drbd_sync_handshake:
> [149704.397805] drbd r_drbd9.prolocation.net/0 drbd11 mhxen20.prolocation.net: self 9E1AD7F59E5434FA:0000000000000000:B3BDA5F13EDDFCEA:EE9BDB393791EAAC bits:0 flags:120
> [149704.397807] drbd r_drbd9.prolocation.net/0 drbd11 mhxen20.prolocation.net: peer 9E1AD7F59E5434FA:0000000000000000:9E1AD7F59E5434FA:B3BDA5F13EDDFCEA bits:0 flags:120
> [149704.397809] drbd r_drbd9.prolocation.net/0 drbd11 mhxen20.prolocation.net: uuid_compare()=0 by rule 38
> [149704.397830] drbd r_drbd9.prolocation.net/0 drbd11 mhxen20.prolocation.net: repl( Off -> Established )
> [149704.405793] drbd r_drbd9.prolocation.net/1 drbd12 mhxen20.prolocation.net: drbd_sync_handshake:
> [149704.405796] drbd r_drbd9.prolocation.net/1 drbd12 mhxen20.prolocation.net: self 686DD0F922994E9C:0000000000000000:AEB10B63BD82F43A:6805740BE5A46E08 bits:1048 flags:120
> [149704.405799] drbd r_drbd9.prolocation.net/1 drbd12 mhxen20.prolocation.net: peer 686DD0F922994E9C:0000000000000000:686DD0F922994E9C:AEB10B63BD82F43A bits:1048 flags:120
> [149704.405801] drbd r_drbd9.prolocation.net/1 drbd12 mhxen20.prolocation.net: uuid_compare()=0 by rule 38
> [149704.405803] drbd r_drbd9.prolocation.net/1 drbd12: No resync, but 1048 bits in bitmap!
> [149704.405821] drbd r_drbd9.prolocation.net/1 drbd12 mhxen20.prolocation.net: repl( Off -> Established )
> 
> and the same on the other node
> 
> [146265.229215] drbd r_drbd9.prolocation.net/1 drbd12 mhxen10.prolocation.net: drbd_sync_handshake:
> [146265.229218] drbd r_drbd9.prolocation.net/1 drbd12 mhxen10.prolocation.net: self 686DD0F922994E9C:0000000000000000:686DD0F922994E9C:AEB10B63BD82F43A bits:1048 flags:120
> [146265.229221] drbd r_drbd9.prolocation.net/1 drbd12 mhxen10.prolocation.net: peer 686DD0F922994E9C:0000000000000000:AEB10B63BD82F43A:6805740BE5A46E08 bits:1048 flags:120
> [146265.229223] drbd r_drbd9.prolocation.net/1 drbd12 mhxen10.prolocation.net: uuid_compare()=0 by rule 38
> [146265.229225] drbd r_drbd9.prolocation.net/1 drbd12: No resync, but 1048 bits in bitmap!
> [146265.229244] drbd r_drbd9.prolocation.net/1 drbd12 mhxen10.prolocation.net: pdsk( DUnknown -> UpToDate ) repl( Off -> Established )
> 
> with
> 
> [root@mhxen10 ~]# grep ^ /sys/kernel/debug/drbd/resources/*/connections/*/*/proc_drbd
> /sys/kernel/debug/drbd/resources/r_drbd9.prolocation.net/connections/mhxen20.prolocation.net/0/proc_drbd:11: cs:Established ro:Primary/Secondary ds:UpToDate/UpToDate C r-----
> /sys/kernel/debug/drbd/resources/r_drbd9.prolocation.net/connections/mhxen20.prolocation.net/0/proc_drbd:    ns:41941724 nr:0 dw:0 dr:167767960 al:0 bm:0 lo:0 pe:[0;0] ua:0 ap:[0;0] ep:1 wo:1 oos:0
> /sys/kernel/debug/drbd/resources/r_drbd9.prolocation.net/connections/mhxen20.prolocation.net/0/proc_drbd:       resync: used:0/61 hits:0 misses:0 starving:0 locked:0 changed:0
> /sys/kernel/debug/drbd/resources/r_drbd9.prolocation.net/connections/mhxen20.prolocation.net/0/proc_drbd:       act_log: used:0/1237 hits:0 misses:0 starving:0 locked:0 changed:0
> /sys/kernel/debug/drbd/resources/r_drbd9.prolocation.net/connections/mhxen20.prolocation.net/0/proc_drbd:       blocked on activity log: 0
> /sys/kernel/debug/drbd/resources/r_drbd9.prolocation.net/connections/mhxen20.prolocation.net/1/proc_drbd:12: cs:Established ro:Primary/Secondary ds:UpToDate/UpToDate C r-----
> /sys/kernel/debug/drbd/resources/r_drbd9.prolocation.net/connections/mhxen20.prolocation.net/1/proc_drbd:    ns:41943040 nr:0 dw:0 dr:167773196 al:0 bm:0 lo:0 pe:[0;0] ua:0 ap:[0;0] ep:1 wo:1 oos:4192
> /sys/kernel/debug/drbd/resources/r_drbd9.prolocation.net/connections/mhxen20.prolocation.net/1/proc_drbd:       resync: used:0/61 hits:0 misses:0 starving:0 locked:0 changed:0
> /sys/kernel/debug/drbd/resources/r_drbd9.prolocation.net/connections/mhxen20.prolocation.net/1/proc_drbd:       act_log: used:0/1237 hits:0 misses:0 starving:0 locked:0 changed:0
> /sys/kernel/debug/drbd/resources/r_drbd9.prolocation.net/connections/mhxen20.prolocation.net/1/proc_drbd:       blocked on activity log: 0
> 
> Notice the oos:4192.
> 
> Disconnecting/reconnecting one or both ends won't make it resync. Is this something we misconfigured, or should it have worked ... ?
> 
> A "drbdadm invalidate-remote r_drbd9.prolocation.net" on the primary node, which forces a full resync, does get the job done.
> 
> Any advice on this?


Hi Christiaan,

I frequently come across this on 9.x. The workaround I have used for a
long time is to disconnect and then wait until some writes have come in
(if you have access to the upper-layer filesystem, just touching a file
is enough). Then reconnect, which of course leads to a normal resync, and
I assume the oos blocks are taken care of as part of that. It seems to
work: as far as I remember, I checked in the past and a subsequent verify
pass showed no oos.
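
The steps above can be sketched roughly as follows. This is only an
illustration in Python, not a tested tool: the helper names and the touch
path /mnt/drbd12/resync-nudge are made up for the example, and dry_run
defaults to True so nothing is executed unless you ask for it. The oos
helper just reads the "oos:" field from proc_drbd output like the one you
quoted. Assuming the usual 4 KiB per bitmap bit, the 1048 bits in your
handshake log match the oos:4192 shown there.

```python
import re
import subprocess


def workaround_commands(resource, touch_path):
    """Build the disconnect / small write / reconnect sequence.

    touch_path is a hypothetical file on the filesystem that sits on top
    of the DRBD device; run this on the Primary so the write is tracked.
    """
    return [
        ["drbdadm", "disconnect", resource],
        ["touch", touch_path],  # any small write while disconnected
        ["drbdadm", "connect", resource],
    ]


def out_of_sync_kib(proc_drbd_text):
    """Return every oos:<n> value (in KiB) found in proc_drbd-style text."""
    return [int(n) for n in re.findall(r"\boos:(\d+)", proc_drbd_text)]


def run(commands, dry_run=True):
    """Print the commands, or execute them when dry_run is False."""
    for argv in commands:
        if dry_run:
            print(" ".join(argv))
        else:
            subprocess.run(argv, check=True)


if __name__ == "__main__":
    # Dry run only: shows what would be executed for the resource above.
    run(workaround_commands("r_drbd9.prolocation.net",
                            "/mnt/drbd12/resync-nudge"))
```

After the resync finishes, another "drbdadm verify" and a look at the
oos values is how I would confirm that nothing is left out of sync.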

regards,
Eddie

