Hello,
This has been happening to me often lately on an otherwise well-working
two-node cluster.
If I have a Primary/Secondary resource, the Primary becomes diskless
because I ran drbdadm detach on it, and I then run create-md and attach
a *new* block device to the Primary, the subsequent resync reaches 99%
but never completes. Nothing is logged in dmesg at 99%; resync disk
activity simply stops. If I leave it, the percentage then slowly drops
to 98%, 97%. In the end I have no choice but to detach the new block
device. If I re-attach it, the same thing happens: it starts a
completely new resync of the whole bitmap. The resource continues
working fine throughout.
I have a resource right this minute with this problem:
node1 ~ # drbdadm status YK39N2GA
YK39N2GA role:Primary
  disk:Inconsistent
  node2.mydomain role:Secondary
    replication:SyncTarget peer-disk:UpToDate done:99.97
Is there anything I can query on this resource to get more info on what
might be wrong? I'll leave it in this state for as long as I can. Or is
there anything specific I can monitor during the resync if I try
attaching again?
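
One way the stall could be watched for is by polling drbdadm status and
tracking the done: field shown above. This is only a minimal sketch,
assuming Python 3.7+ is available on the node; the helper names
(parse_done, is_stalled, watch), the 10-second interval and the
six-sample window are made up for illustration and are not part of DRBD.

```python
import re
import subprocess
import time

# Matches the "done:NN.NN" field that appears in the status output above
# while a resync is running.
DONE_RE = re.compile(r"\bdone:(\d+(?:\.\d+)?)")


def parse_done(status_text):
    """Return the resync percentage found in the status text, or None."""
    match = DONE_RE.search(status_text)
    return float(match.group(1)) if match else None


def is_stalled(samples, min_progress=0.01):
    """True if the newest sample is not ahead of the oldest one.

    min_progress is an arbitrary illustrative threshold (percentage points).
    """
    if len(samples) < 2:
        return False
    return samples[-1] - samples[0] < min_progress


def watch(resource, interval=10, window=6):
    """Poll `drbdadm status <resource>` and flag a stalled or regressing resync."""
    history = []
    while True:
        out = subprocess.run(
            ["drbdadm", "status", resource],
            capture_output=True, text=True, check=True,
        ).stdout
        done = parse_done(out)
        if done is None:
            print("no resync in progress")
            return
        history = (history + [done])[-window:]  # keep the last `window` samples
        flag = "STALLED" if len(history) == window and is_stalled(history) else ""
        print(time.strftime("%H:%M:%S"), done, flag)
        time.sleep(interval)
```

Calling watch("YK39N2GA") as root on the Primary would print one
timestamped sample per interval, which should at least show the exact
moment the percentage stops advancing or starts to fall.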
Both nodes are:
up-to-date Gentoo
Vanilla kernel.org 4.4
kernel module 9.0.7, source tar.gz downloaded from drbd.org
drbd utilities 8.9.11 (same)
The Primary currently runs kernel 4.4.68 and the Secondary 4.4.59, in
case that is relevant.
I'm using drbdadm and friends rather than drbdmanage. I've used DRBD for
many years; I like and am familiar with drbdadm and am reluctant to
change :-)
Below is what was logged when the new block device was attached, in case
it is any help. As I said, nothing further is logged after the initial
messages on attaching.
thanks,
Eddie
[15244.562956] drbd YK39N2GA/0 drbd52: disk( Diskless -> Attaching )
[15244.562970] drbd YK39N2GA/0 drbd52: Maximum number of peer devices = 1
[15244.563682] drbd YK39N2GA/0 drbd52: my node_id: 0
[15244.563686] drbd YK39N2GA/0 drbd52: Adjusting my ra_pages to backing device's (768 -> 32)
[15244.563688] drbd YK39N2GA/0 drbd52: my node_id: 0
[15244.563690] drbd YK39N2GA/0 drbd52: drbd_bm_resize called with capacity == 58720256
[15244.564163] drbd YK39N2GA/0 drbd52: resync bitmap: bits=7340032 words=114688 pages=224
[15244.564165] drbd YK39N2GA/0 drbd52: size = 28 GB (29360128 KB)
[15244.591274] drbd YK39N2GA/0 drbd52: Writing the whole bitmap, size changed
[15244.603901] drbd YK39N2GA/0 drbd52: recounting of set bits took additional 0ms
[15244.603923] drbd YK39N2GA: Preparing cluster-wide state change 3918652602 (0->-1 7680/2048)
[15244.604139] drbd YK39N2GA: State change 3918652602: primary_nodes=1, weak_nodes=FFFFFFFFFFFFFFFC
[15244.604141] drbd YK39N2GA: Committing cluster-wide state change 3918652602 (0ms)
[15244.604166] drbd YK39N2GA/0 drbd52: disk( Attaching -> Negotiating )
[15244.604170] drbd YK39N2GA/0 drbd52: attached to current UUID: 0000000000000004
[15244.604371] drbd YK39N2GA/0 drbd52 node2.mydomain: drbd_sync_handshake:
[15244.604374] drbd YK39N2GA/0 drbd52 node2.mydomain: self 0000000000000005:0000000000000000:0000000000000000:0000000000000000 bits:7340032 flags:24
[15244.604376] drbd YK39N2GA/0 drbd52 node2.mydomain: peer A7E214FA09B78A66:543013BD23E0568A:FBFEDABFE5489160:835D869FF28BD86A bits:5236287 flags:100
[15244.604377] drbd YK39N2GA/0 drbd52 node2.mydomain: uuid_compare()=-3 by rule 20
[15244.604379] drbd YK39N2GA/0 drbd52 node2.mydomain: Writing the whole bitmap, full sync required after drbd_sync_handshake.
[15244.620147] drbd YK39N2GA/0 drbd52: disk( Negotiating -> Inconsistent )
[15244.620149] drbd YK39N2GA/0 drbd52 node2.mydomain: repl( Established -> WFBitMapT )
[15244.634881] drbd YK39N2GA/0 drbd52 node2.mydomain: receive bitmap stats [Bytes(packets)]: plain 0(0), RLE 23(1), total 23; compression: 100.0%
[15244.636111] drbd YK39N2GA/0 drbd52 node2.mydomain: send bitmap stats [Bytes(packets)]: plain 0(0), RLE 23(1), total 23; compression: 100.0%
[15244.636118] drbd YK39N2GA/0 drbd52 node2.mydomain: helper command: /sbin/drbdadm before-resync-target
[15244.639180] drbd YK39N2GA/0 drbd52 node2.mydomain: helper command: /sbin/drbdadm before-resync-target exit code 0 (0x0)
[15244.639195] drbd YK39N2GA/0 drbd52 node2.mydomain: repl( WFBitMapT -> SyncTarget )
[15244.639547] drbd YK39N2GA/0 drbd52 node2.mydomain: Began resync as SyncTarget (will sync 29360128 KB [7340032 bits set]).
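
As a sanity check on the numbers in the log, the bitmap geometry can be
recomputed from the reported capacity. The sketch below is only
illustrative and rests on a few assumptions: capacity is in 512-byte
sectors, one bitmap bit covers 4 KiB, bitmap words are 64 bits and pages
are 4 KiB. Under those assumptions the results agree with the logged
bits, words, pages and KB values. The function names are made up for
this example. The second helper estimates how much data is still out of
sync for a given done: percentage; since that percentage is rounded in
the status output, the result is only approximate.

```python
SECTOR_BYTES = 512    # assumption: capacity is reported in 512-byte sectors
BYTES_PER_BIT = 4096  # assumption: one bitmap bit covers 4 KiB of data
WORD_BITS = 64        # assumption: 64-bit bitmap words
PAGE_BYTES = 4096     # assumption: 4 KiB kernel pages


def bitmap_geometry(capacity_sectors):
    """Derive device size and bitmap dimensions from the capacity in sectors."""
    size_bytes = capacity_sectors * SECTOR_BYTES
    bits = size_bytes // BYTES_PER_BIT
    words = -(-bits // WORD_BITS)                        # ceiling division
    pages = -(-(words * WORD_BITS // 8) // PAGE_BYTES)   # ceiling division
    return {
        "size_kb": size_bytes // 1024,
        "bits": bits,
        "words": words,
        "pages": pages,
    }


def out_of_sync_kb(total_bits, done_percent):
    """Approximate amount of data (in KB) still to sync at done_percent."""
    remaining_bits = total_bits * (100.0 - done_percent) / 100.0
    return remaining_bits * BYTES_PER_BIT / 1024


geometry = bitmap_geometry(58720256)
print(geometry)
print(out_of_sync_kb(geometry["bits"], 99.97))
```

With done:99.97 this works out to somewhere between 8 and 9 MB left
unsynced out of the 28 GB device, so the resync is stopping very close
to the end rather than partway through.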