[DRBD-user] DRBD 9 diskless Primary, subsequent resync of new block device never completes

Eddie Chapman eddie at ehuk.net
Fri May 19 17:06:48 CEST 2017


Hello,

This has been happening to me often lately on a two-node cluster that 
otherwise works well.

If I have a Primary/Secondary resource, and the Primary becomes 
diskless because I ran drbdadm detach on it, and I then run create-md 
and attach a *new* block device to the Primary, the subsequent resync 
reaches 99% but never completes. Nothing is logged in dmesg at 99%; 
resync disk activity simply stops. If I leave it, the percentage then 
slowly drops to 98%, 97% over time. In the end I have no choice but to 
detach the new block device. If I re-attach it, the same thing happens: 
it starts a completely new resync of the whole bitmap. The resource 
continues working fine throughout.

I have a resource right this minute with this problem:

node1 ~ # drbdadm status YK39N2GA
YK39N2GA role:Primary
   disk:Inconsistent
   node2.mydomain role:Secondary
     replication:SyncTarget peer-disk:UpToDate done:99.97

Is there anything I can query on this resource to give more info on 
what might be wrong? I'll leave it like this for as long as I can. Or 
is there anything specific I can monitor during the resync if I try 
attaching again?
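
One way to monitor this is to sample the done: figure from drbdadm 
status at intervals and flag when it stops advancing. The following is 
only a minimal sketch: the function names parse_done and is_stalled and 
the min_progress threshold are hypothetical, and the sample text is the 
status output shown above.

```python
import re

# Status text copied from the output above; in practice this would come
# from running "drbdadm status YK39N2GA" periodically.
STATUS_SAMPLE = """\
YK39N2GA role:Primary
  disk:Inconsistent
  node2.mydomain role:Secondary
    replication:SyncTarget peer-disk:UpToDate done:99.97
"""


def parse_done(status_text):
    """Return the 'done:' percentage as a float, or None if absent."""
    match = re.search(r"\bdone:(\d+(?:\.\d+)?)", status_text)
    return float(match.group(1)) if match else None


def is_stalled(samples, min_progress=0.01):
    """True if the newest sample has not advanced past the oldest one.

    samples is a list of done percentages in time order; min_progress
    is an arbitrary threshold chosen for illustration.
    """
    return len(samples) >= 2 and samples[-1] - samples[0] < min_progress


print(parse_done(STATUS_SAMPLE))
print(is_stalled([99.97, 99.97, 99.96]))
```

A falling percentage, as described above, also counts as stalled here 
because the difference between newest and oldest sample is negative.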

Both nodes are:

up-to-date Gentoo
Vanilla kernel.org 4.4
kernel module 9.0.7, source tar.gz downloaded from drbd.org
drbd utilities 8.9.11 (same)

The Primary is currently on kernel 4.4.68 and the Secondary on 4.4.59, 
in case that is relevant.

I'm using drbdadm and friends rather than drbdmanage. I've used DRBD 
for many years; I like drbdadm, am familiar with it, and am reluctant 
to change :-)

Below is what was logged when the new block device was attached, in 
case it is any help. As I said, nothing further is logged after the 
initial messages on attaching.

thanks,
Eddie

[15244.562956] drbd YK39N2GA/0 drbd52: disk( Diskless -> Attaching )
[15244.562970] drbd YK39N2GA/0 drbd52: Maximum number of peer devices = 1
[15244.563682] drbd YK39N2GA/0 drbd52: my node_id: 0
[15244.563686] drbd YK39N2GA/0 drbd52: Adjusting my ra_pages to backing device's (768 -> 32)
[15244.563688] drbd YK39N2GA/0 drbd52: my node_id: 0
[15244.563690] drbd YK39N2GA/0 drbd52: drbd_bm_resize called with capacity == 58720256
[15244.564163] drbd YK39N2GA/0 drbd52: resync bitmap: bits=7340032 words=114688 pages=224
[15244.564165] drbd YK39N2GA/0 drbd52: size = 28 GB (29360128 KB)
[15244.591274] drbd YK39N2GA/0 drbd52: Writing the whole bitmap, size changed
[15244.603901] drbd YK39N2GA/0 drbd52: recounting of set bits took additional 0ms
[15244.603923] drbd YK39N2GA: Preparing cluster-wide state change 3918652602 (0->-1 7680/2048)
[15244.604139] drbd YK39N2GA: State change 3918652602: primary_nodes=1, weak_nodes=FFFFFFFFFFFFFFFC
[15244.604141] drbd YK39N2GA: Committing cluster-wide state change 3918652602 (0ms)
[15244.604166] drbd YK39N2GA/0 drbd52: disk( Attaching -> Negotiating )
[15244.604170] drbd YK39N2GA/0 drbd52: attached to current UUID: 0000000000000004
[15244.604371] drbd YK39N2GA/0 drbd52 node2.mydomain: drbd_sync_handshake:
[15244.604374] drbd YK39N2GA/0 drbd52 node2.mydomain: self 0000000000000005:0000000000000000:0000000000000000:0000000000000000 bits:7340032 flags:24
[15244.604376] drbd YK39N2GA/0 drbd52 node2.mydomain: peer A7E214FA09B78A66:543013BD23E0568A:FBFEDABFE5489160:835D869FF28BD86A bits:5236287 flags:100
[15244.604377] drbd YK39N2GA/0 drbd52 node2.mydomain: uuid_compare()=-3 by rule 20
[15244.604379] drbd YK39N2GA/0 drbd52 node2.mydomain: Writing the whole bitmap, full sync required after drbd_sync_handshake.
[15244.620147] drbd YK39N2GA/0 drbd52: disk( Negotiating -> Inconsistent )
[15244.620149] drbd YK39N2GA/0 drbd52 node2.mydomain: repl( Established -> WFBitMapT )
[15244.634881] drbd YK39N2GA/0 drbd52 node2.mydomain: receive bitmap stats [Bytes(packets)]: plain 0(0), RLE 23(1), total 23; compression: 100.0%
[15244.636111] drbd YK39N2GA/0 drbd52 node2.mydomain: send bitmap stats [Bytes(packets)]: plain 0(0), RLE 23(1), total 23; compression: 100.0%
[15244.636118] drbd YK39N2GA/0 drbd52 node2.mydomain: helper command: /sbin/drbdadm before-resync-target
[15244.639180] drbd YK39N2GA/0 drbd52 node2.mydomain: helper command: /sbin/drbdadm before-resync-target exit code 0 (0x0)
[15244.639195] drbd YK39N2GA/0 drbd52 node2.mydomain: repl( WFBitMapT -> SyncTarget )
[15244.639547] drbd YK39N2GA/0 drbd52 node2.mydomain: Began resync as SyncTarget (will sync 29360128 KB [7340032 bits set]).
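
As a sanity check, the figures in the log above are mutually 
consistent, and the done:99.97 status can be turned into an approximate 
amount of data still out of sync. This is only a sketch: it assumes 
512-byte sectors, 4 KiB of data per bitmap bit (which matches the 
bits/KB figures logged), 64-bit bitmap words and 4 KiB pages. The 
variable names are illustrative, and the result is approximate because 
the status output rounds the percentage.

```python
SECTOR_BYTES = 512   # assumption: capacity is reported in 512-byte sectors
KIB_PER_BIT = 4      # assumption: one bitmap bit covers 4 KiB of data

capacity_sectors = 58720256  # from "drbd_bm_resize called with capacity"
bits = 7340032               # from "resync bitmap: bits=..."
done_percent = 99.97         # from "drbdadm status"

size_kib = capacity_sectors * SECTOR_BYTES // 1024
words = bits // 64           # assumption: 64-bit bitmap words
pages = words * 8 // 4096    # assumption: 8 bytes per word, 4 KiB pages

# These match the logged values: 29360128 KB, words=114688, pages=224.
assert size_kib == 29360128
assert bits * KIB_PER_BIT == size_kib
assert (words, pages) == (114688, 224)

# Data still out of sync at done:99.97, in KiB and in bitmap bits.
remaining_kib = size_kib * (100 - done_percent) / 100
remaining_bits = remaining_kib / KIB_PER_BIT

print(f"remaining: about {remaining_kib / 1024:.1f} MiB "
      f"({remaining_bits:.0f} bits)")
```

Under these assumptions the stalled resync has roughly 8.6 MiB left out 
of the 28 GB device.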


