Hello,

This happens to me often lately on an otherwise well-working two-node cluster.

If I have a Primary/Secondary resource, and the Primary becomes diskless because I ran drbdadm detach on it, and I then run create-md and attach a *new* block device to the Primary, the subsequent resync reaches 99% but never completes. Nothing is logged in dmesg at 99%; resync disk activity simply stops. If I leave it, the figure then slowly drops over time to 98%, 97%.

In the end I have no choice but to detach the new block device. If I re-attach it, the same thing happens: it starts a completely new resync of the whole bitmap. The resource itself continues working fine throughout.

I have a resource right this minute with this problem:

node1 ~ # drbdadm status YK39N2GA
YK39N2GA role:Primary
  disk:Inconsistent
  node2.mydomain role:Secondary
    replication:SyncTarget peer-disk:UpToDate done:99.97

Is there anything I can query on this resource to give some more info on what might be wrong? I'll leave it like this for as long as I can. Or is there anything specific I can monitor during the resync if I try attaching again?

Both nodes are:

  up-to-date Gentoo
  Vanilla kernel.org 4.4 kernel
  module 9.0.7, source tar.gz downloaded from drbd.org
  drbd utilities 8.9.11 (same)

With the kernel, the Primary is currently at 4.4.68 and the Secondary at 4.4.59, if that may be relevant.

I'm using drbdadm and friends rather than drbdmanage. I've used drbd for many years; I like drbdadm and am familiar with it, so I'm reluctant to change :-)

Below is what was logged when the new block device was attached, if it is any help. As I say, nothing further is logged after the initial messages on attaching.
thanks,
Eddie

[15244.562956] drbd YK39N2GA/0 drbd52: disk( Diskless -> Attaching )
[15244.562970] drbd YK39N2GA/0 drbd52: Maximum number of peer devices = 1
[15244.563682] drbd YK39N2GA/0 drbd52: my node_id: 0
[15244.563686] drbd YK39N2GA/0 drbd52: Adjusting my ra_pages to backing device's (768 -> 32)
[15244.563688] drbd YK39N2GA/0 drbd52: my node_id: 0
[15244.563690] drbd YK39N2GA/0 drbd52: drbd_bm_resize called with capacity == 58720256
[15244.564163] drbd YK39N2GA/0 drbd52: resync bitmap: bits=7340032 words=114688 pages=224
[15244.564165] drbd YK39N2GA/0 drbd52: size = 28 GB (29360128 KB)
[15244.591274] drbd YK39N2GA/0 drbd52: Writing the whole bitmap, size changed
[15244.603901] drbd YK39N2GA/0 drbd52: recounting of set bits took additional 0ms
[15244.603923] drbd YK39N2GA: Preparing cluster-wide state change 3918652602 (0->-1 7680/2048)
[15244.604139] drbd YK39N2GA: State change 3918652602: primary_nodes=1, weak_nodes=FFFFFFFFFFFFFFFC
[15244.604141] drbd YK39N2GA: Committing cluster-wide state change 3918652602 (0ms)
[15244.604166] drbd YK39N2GA/0 drbd52: disk( Attaching -> Negotiating )
[15244.604170] drbd YK39N2GA/0 drbd52: attached to current UUID: 0000000000000004
[15244.604371] drbd YK39N2GA/0 drbd52 node2.mydomain: drbd_sync_handshake:
[15244.604374] drbd YK39N2GA/0 drbd52 node2.mydomain: self 0000000000000005:0000000000000000:0000000000000000:0000000000000000 bits:7340032 flags:24
[15244.604376] drbd YK39N2GA/0 drbd52 node2.mydomain: peer A7E214FA09B78A66:543013BD23E0568A:FBFEDABFE5489160:835D869FF28BD86A bits:5236287 flags:100
[15244.604377] drbd YK39N2GA/0 drbd52 node2.mydomain: uuid_compare()=-3 by rule 20
[15244.604379] drbd YK39N2GA/0 drbd52 node2.mydomain: Writing the whole bitmap, full sync required after drbd_sync_handshake.
[15244.620147] drbd YK39N2GA/0 drbd52: disk( Negotiating -> Inconsistent )
[15244.620149] drbd YK39N2GA/0 drbd52 node2.mydomain: repl( Established -> WFBitMapT )
[15244.634881] drbd YK39N2GA/0 drbd52 node2.mydomain: receive bitmap stats [Bytes(packets)]: plain 0(0), RLE 23(1), total 23; compression: 100.0%
[15244.636111] drbd YK39N2GA/0 drbd52 node2.mydomain: send bitmap stats [Bytes(packets)]: plain 0(0), RLE 23(1), total 23; compression: 100.0%
[15244.636118] drbd YK39N2GA/0 drbd52 node2.mydomain: helper command: /sbin/drbdadm before-resync-target
[15244.639180] drbd YK39N2GA/0 drbd52 node2.mydomain: helper command: /sbin/drbdadm before-resync-target exit code 0 (0x0)
[15244.639195] drbd YK39N2GA/0 drbd52 node2.mydomain: repl( WFBitMapT -> SyncTarget )
[15244.639547] drbd YK39N2GA/0 drbd52 node2.mydomain: Began resync as SyncTarget (will sync 29360128 KB [7340032 bits set]).
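
For anyone reading the numbers in the log above, here is a rough sanity check of how they relate to each other and to the done:99.97 figure. It is only a sketch: the constants (512-byte sectors, 4 KB of data per bitmap bit, 64-bit bitmap words, 4096-byte pages) are assumptions that happen to reproduce the logged values, and the function names are made up for illustration. It also assumes done: is a percentage of the total resync volume, rounded to two decimals, so the "remaining" figure is approximate.

```python
import math

# Assumed constants (not stated in the log, but consistent with its figures).
SECTOR_BYTES = 512   # unit of the drbd_bm_resize capacity value
KB_PER_BIT = 4       # data covered by one resync bitmap bit
WORD_BITS = 64       # bits per bitmap word
PAGE_BYTES = 4096    # bytes per bitmap page


def size_kb_from_sectors(sectors: int) -> int:
    """Device size in KB from a capacity given in sectors."""
    return sectors * SECTOR_BYTES // 1024


def size_kb_from_bits(bits: int) -> int:
    """Amount of data in KB represented by a number of bitmap bits."""
    return bits * KB_PER_BIT


def bitmap_words_and_pages(bits: int) -> tuple[int, int]:
    """Number of bitmap words and pages needed to hold the given bits."""
    words = math.ceil(bits / WORD_BITS)
    pages = math.ceil(words * (WORD_BITS // 8) / PAGE_BYTES)
    return words, pages


def remaining_kb(total_kb: int, done_percent: float) -> int:
    """Approximate data still to sync, given a done: percentage."""
    return round(total_kb * (100.0 - done_percent) / 100.0)


if __name__ == "__main__":
    # Values copied from the log and status output.
    print(size_kb_from_sectors(58720256))      # capacity in KB
    print(size_kb_from_bits(7340032))          # full-sync volume in KB
    print(bitmap_words_and_pages(7340032))     # (words, pages)
    print(remaining_kb(29360128, 99.97))       # KB left at done:99.97
```

Under those assumptions the capacity, bit count, word count and page count all agree with the log, and done:99.97 corresponds to only a few megabytes outstanding out of the 28 GB device.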