Note: "permalinks" may not be as permanent as we would like,
direct links of old sources may well be a few messages off.
Hi! I am not sure what I did test yesterday, but today it is definitely not working! Now I am getting "drbd_rs_complete_io(,5576704 [=170]) called, but refcnt is 0!?" and "drbd_rs_complete_io() called, but extent not found" outputs in syslog. I changed the configuration to 8.4 syntax as advices by Igor in reply to another question. global_common.conf: common { disk { # do not rely on default of 1237 al-extents 2000; # Full speed for 1GBit link c-plan-ahead 0; resync-rate 110M; # best performance for RAID with cache and BBU no-disk-barrier; no-disk-flushes; } net { max-buffers 8000; max-epoch-size 8000; sndbuf-size 512k; cram-hmac-alg sha1; shared-secret "xxxx"; verify-alg md5; protocol A; on-congestion pull-ahead; # should be 90% of the TCP send buffer if no DRBD Proxy is used congestion-fill 460k; # should be 90% of the al-extents congestion-extents 1800; after-sb-0pri discard-zero-changes; after-sb-1pri discard-secondary; after-sb-2pri disconnect; } } vm_100_disk_root.res: resource vm-100-disk-root { device /dev/drbd0; meta-disk internal; disk /dev/vg_vm_disks_A/vm_100_disk_root; on A { address 10.0.0.227:7788; } on B { address 10.0.0.226:7788; } } The VM is started on server A. When I now write to the disk I get a lot of warnings on the replicated node B and also on A it looks different now. Node B (secondary): Dec 17 18:44:01 B kernel: [15086.423229] block drbd1: conn( Connected -> Behind ) disk( UpToDate -> Outdated ) Dec 17 18:44:01 B pveproxy[31495]: Clearing outdated entries from certificate cache Dec 17 18:44:02 B kernel: [15087.756755] block drbd1: updated sync uuid 124E20C44E300D26:0000000000000000:066FFC07CEB258B8: 0265DAB48237BD3C Dec 17 18:44:02 B kernel: [15087.762865] block drbd1: helper command: /sbin/drbdadm before-resync-target minor-1 Dec 17 18:44:02 B kernel: [15087.764706] block drbd1: helper command: /sbin/drbdadm before-resync-target minor-1 exit code 0 (0x0) Dec 17 18:44:02 B kernel: [15087.764811] block drbd1: conn( Behind -> SyncTarget ) disk( Outdated -> Inconsistent ) Dec 17 18:44:02 B kernel: [15087.764916] block drbd1: Began resync as SyncTarget (will sync 36792 KB [9198 bits set]). Dec 17 18:44:02 B kernel: [15088.239060] block drbd1: conn( SyncTarget -> Behind ) Dec 17 18:44:04 B kernel: [15089.759160] block drbd1: updated sync uuid 124F20C44E300D26:0000000000000000:066FFC07CEB258B8: 0265DAB48237BD3C Dec 17 18:44:04 B kernel: [15089.779443] block drbd1: helper command: /sbin/drbdadm before-resync-target minor-1 Dec 17 18:44:04 B kernel: [15089.781228] block drbd1: helper command: /sbin/drbdadm before-resync-target minor-1 exit code 0 (0x0) Dec 17 18:44:04 B kernel: [15089.781408] block drbd1: conn( Behind -> SyncTarget ) Dec 17 18:44:04 B kernel: [15089.781477] block drbd1: Began resync as SyncTarget (will sync 104188 KB [26047 bits set]). Dec 17 18:44:04 B kernel: [15090.103001] block drbd1: drbd_rs_complete_io(,5541888 [=169]) called, but refcnt is 0!? Dec 17 18:44:04 B kernel: [15090.112360] block drbd1: drbd_rs_complete_io(,5543936 [=169]) called, but refcnt is 0!? Dec 17 18:44:04 B kernel: [15090.121661] block drbd1: drbd_rs_complete_io(,5545984 [=169]) called, but refcnt is 0!? Dec 17 18:44:06 B kernel: [15092.142175] block drbd1: conn( SyncTarget -> Behind ) Dec 17 18:44:07 B kernel: [15093.263781] block drbd1: updated sync uuid 125020C44E300D26:0000000000000000:066FFC07CEB258B8: 0265DAB48237BD3C Dec 17 18:44:07 B kernel: [15093.271052] block drbd1: helper command: /sbin/drbdadm before-resync-target minor-1 Dec 17 18:44:07 B kernel: [15093.272906] block drbd1: helper command: /sbin/drbdadm before-resync-target minor-1 exit code 0 (0x0) Dec 17 18:44:07 B kernel: [15093.273019] block drbd1: conn( Behind -> SyncTarget ) Dec 17 18:44:07 B kernel: [15093.273092] block drbd1: Began resync as SyncTarget (will sync 146652 KB [36663 bits set]). Dec 17 18:44:07 B kernel: [15093.287939] block drbd1: drbd_rs_complete_io(,5576704 [=170]) called, but refcnt is 0!? Dec 17 18:44:07 B kernel: [15093.288249] block drbd1: drbd_rs_complete_io() called, but extent not found Dec 17 18:44:07 B kernel: [15093.297693] block drbd1: drbd_rs_complete_io() called, but extent not found .... Dec 17 18:44:09 B kernel: [15094.506888] block drbd1: Began resync as SyncTarget (will sync 186016 KB [46504 bits set]). Dec 17 18:44:11 B kernel: [15096.339852] block drbd1: Resync done (total 1 sec; paused 0 sec; 186016 K/sec) Dec 17 18:44:11 B kernel: [15096.339940] block drbd1: updated UUIDs 406A434F37639588:0000000000000000:125120C44E300D26:066FFC07CEB258B8 Dec 17 18:44:11 B kernel: [15096.340034] block drbd1: conn( SyncTarget -> Connected ) disk( Inconsistent -> UpToDate ) Dec 17 18:44:11 B kernel: [15096.387692] block drbd1: helper command: /sbin/drbdadm after-resync-target minor-1 Dec 17 18:44:11 B kernel: [15096.390005] block drbd1: helper command: /sbin/drbdadm after-resync-target minor-1 exit code 0 (0x0) Node A (primary): Dec 17 18:44:01 A kernel: [155462.657020] block drbd1: Congestion-fill threshold reached Dec 17 18:44:01 A kernel: [155462.657094] block drbd1: conn( Connected -> Ahead ) pdsk( UpToDate -> Consistent ) Dec 17 18:44:02 A kernel: [155463.728133] block drbd1: helper command: /sbin/drbdadm before-resync-source minor-1 Dec 17 18:44:02 A kernel: [155463.731161] block drbd1: helper command: /sbin/drbdadm before-resync-source minor-1 exit code 0 (0x0) Dec 17 18:44:02 A kernel: [155463.731275] block drbd1: conn( Ahead -> SyncSource ) pdsk( Consistent -> Inconsistent ) Dec 17 18:44:02 A kernel: [155463.731393] block drbd1: Began resync as SyncSource (will sync 36792 KB [9198 bits set]). Dec 17 18:44:02 A kernel: [155463.807042] block drbd1: updated sync UUID 406A434F37639589:124E20C44E300D27:066FFC07CEB258B8: 066EFC07CEB258B8 Dec 17 18:44:02 A kernel: [155464.373423] block drbd1: Congestion-fill threshold reached Dec 17 18:44:02 A kernel: [155464.373492] block drbd1: conn( SyncSource -> Ahead ) Dec 17 18:44:03 A kernel: [155465.520030] block drbd1: helper command: /sbin/drbdadm before-resync-source minor-1 Dec 17 18:44:03 A kernel: [155465.522488] block drbd1: helper command: /sbin/drbdadm before-resync-source minor-1 exit code 0 (0x0) Dec 17 18:44:03 A kernel: [155465.522601] block drbd1: conn( Ahead -> SyncSource ) Dec 17 18:44:03 A kernel: [155465.522681] block drbd1: Began resync as SyncSource (will sync 104188 KB [26047 bits set]). Dec 17 18:44:03 A kernel: [155465.522789] block drbd1: updated sync UUID 406A434F37639589:124F20C44E300D27:124E20C44E300D27: 066FFC07CEB258B8 Dec 17 18:44:06 A kernel: [155468.133306] block drbd1: Congestion-fill threshold reached Dec 17 18:44:06 A kernel: [155468.133377] block drbd1: conn( SyncSource -> Ahead ) Dec 17 18:44:07 A kernel: [155469.424035] block drbd1: helper command: /sbin/drbdadm before-resync-source minor-1 Dec 17 18:44:07 A kernel: [155469.427250] block drbd1: helper command: /sbin/drbdadm before-resync-source minor-1 exit code 0 (0x0) Dec 17 18:44:07 A kernel: [155469.427384] block drbd1: conn( Ahead -> SyncSource ) Dec 17 18:44:07 A kernel: [155469.427472] block drbd1: Began resync as SyncSource (will sync 146652 KB [36663 bits set]). Dec 17 18:44:07 A kernel: [155469.427594] block drbd1: updated sync UUID 406A434F37639589:125020C44E300D27:124F20C44E300D27: 124E20C44E300D27 Dec 17 18:44:09 A pvestatd[22514]: status update time (6.528 seconds) Dec 17 18:44:11 A kernel: [155472.674189] block drbd1: Resync done (total 1 sec; paused 0 sec; 186016 K/sec) Dec 17 18:44:11 A kernel: [155472.674282] block drbd1: updated UUIDs 406A434F37639589:0000000000000000:125120C44E300D27:125020C44E300D27 Dec 17 18:44:11 A kernel: [155472.674384] block drbd1: conn( SyncSource -> Connected ) pdsk( Inconsistent -> UpToDate ) I found no possibility to get back to the state from yesterday. I tried the configuration from yesterday and did reload the Kernel driver and also reboot the server. Then I re-initalized the meta data from scratch but still the same behaviour. I am doubting, if it ever worked yesterday ... . I did a test with a sndbuf-size 10M with the same result. I also googled the "called, but refcnt is 0!?" and "complete_io() called, but extent not found" but I found only hints for an older DRDB version. I am using the latest 8.4.9-1 Version from Git with the kernel_sendmsg fix. Is this now an intended behaviour or did I trigger a hidded "feature" which I can help to debug? BR, Jasmin