Note: "permalinks" may not be as permanent as we would like,
direct links of old sources may well be a few messages off.
Hi!
I am not sure what I did test yesterday, but today it is definitely not
working!
Now I am getting "drbd_rs_complete_io(,5576704 [=170]) called, but refcnt is
0!?" and "drbd_rs_complete_io() called, but extent not found" outputs in
syslog.
I changed the configuration to 8.4 syntax as advices by Igor in reply to
another question.
global_common.conf:
common {
disk {
# do not rely on default of 1237
al-extents 2000;
# Full speed for 1GBit link
c-plan-ahead 0;
resync-rate 110M;
# best performance for RAID with cache and BBU
no-disk-barrier;
no-disk-flushes;
}
net {
max-buffers 8000;
max-epoch-size 8000;
sndbuf-size 512k;
cram-hmac-alg sha1;
shared-secret "xxxx";
verify-alg md5;
protocol A;
on-congestion pull-ahead;
# should be 90% of the TCP send buffer if no DRBD Proxy is used
congestion-fill 460k;
# should be 90% of the al-extents
congestion-extents 1800;
after-sb-0pri discard-zero-changes;
after-sb-1pri discard-secondary;
after-sb-2pri disconnect;
}
}
vm_100_disk_root.res:
resource vm-100-disk-root {
device /dev/drbd0;
meta-disk internal;
disk /dev/vg_vm_disks_A/vm_100_disk_root;
on A {
address 10.0.0.227:7788;
}
on B {
address 10.0.0.226:7788;
}
}
The VM is started on server A. When I now write to the disk I get a lot of
warnings on the replicated node B and also on A it looks different now.
Node B (secondary):
Dec 17 18:44:01 B kernel: [15086.423229] block drbd1: conn( Connected ->
Behind ) disk( UpToDate -> Outdated )
Dec 17 18:44:01 B pveproxy[31495]: Clearing outdated entries from certificate
cache
Dec 17 18:44:02 B kernel: [15087.756755] block drbd1: updated sync uuid
124E20C44E300D26:0000000000000000:066FFC07CEB258B8: 0265DAB48237BD3C
Dec 17 18:44:02 B kernel: [15087.762865] block drbd1: helper command:
/sbin/drbdadm before-resync-target minor-1
Dec 17 18:44:02 B kernel: [15087.764706] block drbd1: helper command:
/sbin/drbdadm before-resync-target minor-1 exit code 0 (0x0)
Dec 17 18:44:02 B kernel: [15087.764811] block drbd1: conn( Behind ->
SyncTarget ) disk( Outdated -> Inconsistent )
Dec 17 18:44:02 B kernel: [15087.764916] block drbd1: Began resync as
SyncTarget (will sync 36792 KB [9198 bits set]).
Dec 17 18:44:02 B kernel: [15088.239060] block drbd1: conn( SyncTarget ->
Behind )
Dec 17 18:44:04 B kernel: [15089.759160] block drbd1: updated sync uuid
124F20C44E300D26:0000000000000000:066FFC07CEB258B8: 0265DAB48237BD3C
Dec 17 18:44:04 B kernel: [15089.779443] block drbd1: helper command:
/sbin/drbdadm before-resync-target minor-1
Dec 17 18:44:04 B kernel: [15089.781228] block drbd1: helper command:
/sbin/drbdadm before-resync-target minor-1 exit code 0 (0x0)
Dec 17 18:44:04 B kernel: [15089.781408] block drbd1: conn( Behind ->
SyncTarget )
Dec 17 18:44:04 B kernel: [15089.781477] block drbd1: Began resync as
SyncTarget (will sync 104188 KB [26047 bits set]).
Dec 17 18:44:04 B kernel: [15090.103001] block drbd1:
drbd_rs_complete_io(,5541888 [=169]) called, but refcnt is 0!?
Dec 17 18:44:04 B kernel: [15090.112360] block drbd1:
drbd_rs_complete_io(,5543936 [=169]) called, but refcnt is 0!?
Dec 17 18:44:04 B kernel: [15090.121661] block drbd1:
drbd_rs_complete_io(,5545984 [=169]) called, but refcnt is 0!?
Dec 17 18:44:06 B kernel: [15092.142175] block drbd1: conn( SyncTarget ->
Behind )
Dec 17 18:44:07 B kernel: [15093.263781] block drbd1: updated sync uuid
125020C44E300D26:0000000000000000:066FFC07CEB258B8: 0265DAB48237BD3C
Dec 17 18:44:07 B kernel: [15093.271052] block drbd1: helper command:
/sbin/drbdadm before-resync-target minor-1
Dec 17 18:44:07 B kernel: [15093.272906] block drbd1: helper command:
/sbin/drbdadm before-resync-target minor-1 exit code 0 (0x0)
Dec 17 18:44:07 B kernel: [15093.273019] block drbd1: conn( Behind ->
SyncTarget )
Dec 17 18:44:07 B kernel: [15093.273092] block drbd1: Began resync as
SyncTarget (will sync 146652 KB [36663 bits set]).
Dec 17 18:44:07 B kernel: [15093.287939] block drbd1:
drbd_rs_complete_io(,5576704 [=170]) called, but refcnt is 0!?
Dec 17 18:44:07 B kernel: [15093.288249] block drbd1: drbd_rs_complete_io()
called, but extent not found
Dec 17 18:44:07 B kernel: [15093.297693] block drbd1: drbd_rs_complete_io()
called, but extent not found
....
Dec 17 18:44:09 B kernel: [15094.506888] block drbd1: Began resync as
SyncTarget (will sync 186016 KB [46504 bits set]).
Dec 17 18:44:11 B kernel: [15096.339852] block drbd1: Resync done (total 1
sec; paused 0 sec; 186016 K/sec)
Dec 17 18:44:11 B kernel: [15096.339940] block drbd1: updated UUIDs
406A434F37639588:0000000000000000:125120C44E300D26:066FFC07CEB258B8
Dec 17 18:44:11 B kernel: [15096.340034] block drbd1: conn( SyncTarget ->
Connected ) disk( Inconsistent -> UpToDate )
Dec 17 18:44:11 B kernel: [15096.387692] block drbd1: helper command:
/sbin/drbdadm after-resync-target minor-1
Dec 17 18:44:11 B kernel: [15096.390005] block drbd1: helper command:
/sbin/drbdadm after-resync-target minor-1 exit code 0 (0x0)
Node A (primary):
Dec 17 18:44:01 A kernel: [155462.657020] block drbd1: Congestion-fill
threshold reached
Dec 17 18:44:01 A kernel: [155462.657094] block drbd1: conn( Connected ->
Ahead ) pdsk( UpToDate -> Consistent )
Dec 17 18:44:02 A kernel: [155463.728133] block drbd1: helper command:
/sbin/drbdadm before-resync-source minor-1
Dec 17 18:44:02 A kernel: [155463.731161] block drbd1: helper command:
/sbin/drbdadm before-resync-source minor-1 exit code 0 (0x0)
Dec 17 18:44:02 A kernel: [155463.731275] block drbd1: conn( Ahead ->
SyncSource ) pdsk( Consistent -> Inconsistent )
Dec 17 18:44:02 A kernel: [155463.731393] block drbd1: Began resync as
SyncSource (will sync 36792 KB [9198 bits set]).
Dec 17 18:44:02 A kernel: [155463.807042] block drbd1: updated sync UUID
406A434F37639589:124E20C44E300D27:066FFC07CEB258B8: 066EFC07CEB258B8
Dec 17 18:44:02 A kernel: [155464.373423] block drbd1: Congestion-fill
threshold reached
Dec 17 18:44:02 A kernel: [155464.373492] block drbd1: conn( SyncSource ->
Ahead )
Dec 17 18:44:03 A kernel: [155465.520030] block drbd1: helper command:
/sbin/drbdadm before-resync-source minor-1
Dec 17 18:44:03 A kernel: [155465.522488] block drbd1: helper command:
/sbin/drbdadm before-resync-source minor-1 exit code 0 (0x0)
Dec 17 18:44:03 A kernel: [155465.522601] block drbd1: conn( Ahead ->
SyncSource )
Dec 17 18:44:03 A kernel: [155465.522681] block drbd1: Began resync as
SyncSource (will sync 104188 KB [26047 bits set]).
Dec 17 18:44:03 A kernel: [155465.522789] block drbd1: updated sync UUID
406A434F37639589:124F20C44E300D27:124E20C44E300D27: 066FFC07CEB258B8
Dec 17 18:44:06 A kernel: [155468.133306] block drbd1: Congestion-fill
threshold reached
Dec 17 18:44:06 A kernel: [155468.133377] block drbd1: conn( SyncSource ->
Ahead )
Dec 17 18:44:07 A kernel: [155469.424035] block drbd1: helper command:
/sbin/drbdadm before-resync-source minor-1
Dec 17 18:44:07 A kernel: [155469.427250] block drbd1: helper command:
/sbin/drbdadm before-resync-source minor-1 exit code 0 (0x0)
Dec 17 18:44:07 A kernel: [155469.427384] block drbd1: conn( Ahead ->
SyncSource )
Dec 17 18:44:07 A kernel: [155469.427472] block drbd1: Began resync as
SyncSource (will sync 146652 KB [36663 bits set]).
Dec 17 18:44:07 A kernel: [155469.427594] block drbd1: updated sync UUID
406A434F37639589:125020C44E300D27:124F20C44E300D27: 124E20C44E300D27
Dec 17 18:44:09 A pvestatd[22514]: status update time (6.528 seconds)
Dec 17 18:44:11 A kernel: [155472.674189] block drbd1: Resync done (total 1
sec; paused 0 sec; 186016 K/sec)
Dec 17 18:44:11 A kernel: [155472.674282] block drbd1: updated UUIDs
406A434F37639589:0000000000000000:125120C44E300D27:125020C44E300D27
Dec 17 18:44:11 A kernel: [155472.674384] block drbd1: conn( SyncSource ->
Connected ) pdsk( Inconsistent -> UpToDate )
I found no possibility to get back to the state from yesterday. I tried the
configuration from yesterday and did reload the Kernel driver and also reboot
the server.
Then I re-initalized the meta data from scratch but still the same behaviour.
I am doubting, if it ever worked yesterday ... .
I did a test with a sndbuf-size 10M with the same result.
I also googled the "called, but refcnt is 0!?" and "complete_io() called, but
extent not found" but I found only hints for an older DRDB version.
I am using the latest 8.4.9-1 Version from Git with the kernel_sendmsg fix.
Is this now an intended behaviour or did I trigger a hidded "feature" which I
can help to debug?
BR,
Jasmin