Dear all,<br><br>I'm trying to track down an out-of-sync issue and I'm stuck. Can anybody give me a hint about what to check next?<br><br>Configuration:<br>- two Dell PowerEdge R710 nodes (same hardware, same configuration)<br>
- drbd0 master-master (size is 900GiB)<br>- direct connection (two 1Gbit/s ethernet adapters in bonding balance-rr)<br>- data-integrity-alg is crc32c (it has been enabled for testing purposes)<br>- LVM on top of DRBD (LVM volumes are used by virtual machines)<br>
<br>Software:<br>- DRBD module version: 8.3.13<br>- kernel: Linux 2.6.32-19-pve #1 SMP x86_64 GNU/Linux<br><br>Problem:<br>- Each time I run an online verification, it finds some sectors out of sync (usually not many, about 5-15 messages per verification run)<br>
- These sectors really do differ between the two nodes (checked with dd and md5sum)<br>- data-integrity-alg does not log any messages between running "drbdadm connect all" and the point where the verification finds out-of-sync sectors<br>
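<br>A minimal sketch of how the dd/md5sum check could be scripted for every reported range, assuming kernel log lines of the form "block drbdN: Out of sync: start=S, size=C (sectors)" and 512-byte sectors. The helper name oos_to_dd is hypothetical; it only prints the commands and does not run them:<br>

```shell
#!/bin/sh
# Hypothetical helper: read kernel log lines on stdin and print one
# "dd | md5sum" command per DRBD "Out of sync" message.
# Assumed input format:
#   ... kernel: block drbd0: Out of sync: start=718997224, size=8 (sectors)
# Lines that do not match this format are ignored.
oos_to_dd() {
    sed -n 's#.*block \(drbd[0-9][0-9]*\): Out of sync: start=\([0-9][0-9]*\), size=\([0-9][0-9]*\) (sectors).*#dd iflag=direct if=/dev/\1 bs=512 skip=\2 count=\3 | md5sum#p'
}

# Example use (left as a comment so nothing runs on its own):
#   dmesg | oos_to_dd
```

The printed commands would then be run on both nodes and the hashes compared. A block that is being rewritten while it is read can legitimately differ for a moment, so the comparison is most meaningful when the affected volume is idle.<br>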
<br>Questions:<br>- How is that possible?<br>- Why doesn't data-integrity-alg catch the problem?<br>- How can I fix it?<br><br>*** extracts from kernel log ***<br>Mar 24 13:23:38 host1 kernel: block drbd0: conn( Connected -> VerifyS )<br>
Mar 24 13:23:38 host1 kernel: block drbd0: Starting Online Verify from sector 0<br>Mar 24 14:13:17 host1 kernel: block drbd0: Out of sync: start=718996928, size=8 (sectors)<br>Mar 24 14:13:17 host1 kernel: block drbd0: Out of sync: start=718996984, size=8 (sectors)<br>
Mar 24 14:13:17 host1 kernel: block drbd0: Out of sync: start=718997224, size=8 (sectors)<br>*********************************<br><br>*** check with dd and md5sum ***<br># dd iflag=direct if=/dev/drbd0 bs=512 skip=718997224 count=8 | md5sum<br>
host1: 669a5c2ba22fa931aac16cdd2f03e22a<br>host2: ceeac3bd59178ee13f94ce283e3a4de3<br>********************************<br><br>*** drbdsetup /dev/drbd0 show ***<br>disk {<br> size 0s _is_default; # bytes<br>
on-io-error pass_on _is_default;<br> fencing dont-care _is_default;<br> max-bio-bvecs 0 _is_default;<br>}<br>net {<br> timeout 60 _is_default; # 1/10 seconds<br>
max-epoch-size 2048 _is_default;<br> max-buffers 2048 _is_default;<br> unplug-watermark 128 _is_default;<br> connect-int 10 _is_default; # seconds<br> ping-int 10 _is_default; # seconds<br>
sndbuf-size 0 _is_default; # bytes<br> rcvbuf-size 0 _is_default; # bytes<br> ko-count 0 _is_default;<br> allow-two-primaries;<br> cram-hmac-alg "sha1";<br>
shared-secret "XXXXXXXXXXXXXXXXXXX";<br> after-sb-0pri discard-zero-changes;<br> after-sb-1pri discard-secondary;<br> after-sb-2pri disconnect _is_default;<br>
rr-conflict disconnect _is_default;<br> ping-timeout 5 _is_default; # 1/10 seconds<br> data-integrity-alg "crc32c";<br> on-congestion block _is_default;<br>
congestion-fill 0s _is_default; # byte<br> congestion-extents 127 _is_default;<br>}<br>syncer {<br> rate 153600k; # bytes/second<br> after -1 _is_default;<br>
al-extents 127 _is_default;<br> verify-alg "md5";<br> on-no-data-accessible io-error _is_default;<br> c-plan-ahead 0 _is_default; # 1/10 seconds<br>
c-delay-target 10 _is_default; # 1/10 seconds<br> c-fill-target 0s _is_default; # bytes<br> c-max-rate 102400k _is_default; # bytes/second<br> c-min-rate 4096k _is_default; # bytes/second<br>
}<br>protocol C;<br>_this_host {<br> device minor 0;<br> disk "/dev/sda3";<br> meta-disk internal;<br> address ipv4 172.23.10.1:7788;<br>
}<br>_remote_host {<br> address ipv4 172.23.10.2:7788;<br>}<br># (89) unknown tag = (integer) 0 [len: 4]<br># Found unknown tags, you should update your<br>
# userland tools<br>*******************************<br><br>Best regards,<br>Stanislav<br>