[DRBD-user] Uncatchable DRBD out-of-sync issue

Stanislav German-Evtushenko ginermail at gmail.com
Sun Mar 24 11:59:34 CET 2013


Dear all,

I'm trying to track down an out-of-sync issue and I'm stuck so far. Can
anybody give me a hint about what I can check next?

Configuration:
- two Dell PowerEdge R710 nodes (both with the same hardware and the same
configuration)
- drbd0 master-master (size is 900GiB)
- direct connection (two 1Gbit/s ethernet adapters in bonding balance-rr)
- data-integrity-alg is crc32c (it has been enabled for testing purposes)
- LVM on top of DRBD (LVM volumes are used by virtual machines)

Software:
- DRBD module version: 8.3.13
- kernel: Linux 2.6.32-19-pve #1 SMP x86_64 GNU/Linux

Problem:
- Each time I run online verification, it finds some sectors out of sync
(usually not many, about 5-15 messages by the time verification is done)
- These sectors really do differ between the nodes (checked with dd and md5sum)
- data-integrity-alg does not produce any log messages between the time of
"drbdadm connect all" and the time verification finds out-of-sync sectors

Questions:
- How is that possible?
- Why doesn't data-integrity-alg catch the problem?
- How can I fix it?

*** extracts from kernel log ***
Mar 24 13:23:38 host1 kernel: block drbd0: conn( Connected -> VerifyS )
Mar 24 13:23:38 host1 kernel: block drbd0: Starting Online Verify from sector 0
Mar 24 14:13:17 host1 kernel: block drbd0: Out of sync: start=718996928, size=8 (sectors)
Mar 24 14:13:17 host1 kernel: block drbd0: Out of sync: start=718996984, size=8 (sectors)
Mar 24 14:13:17 host1 kernel: block drbd0: Out of sync: start=718997224, size=8 (sectors)
*********************************
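
For anyone who wants to repeat the dd check for every reported range rather
than a single one, here is a minimal Python sketch that turns "Out of sync"
kernel log entries into dd commands of the same form as the one used in this
post. The function names and the sample text are only illustrative, and it
assumes the reported sectors are 512 bytes, matching bs=512 in the dd command.
It also tolerates log entries that were wrapped across lines.

```python
import re

# Assumption: the kernel reports ranges in 512-byte sectors (matches bs=512 in dd).
SECTOR_SIZE = 512

# Matches kernel log entries such as:
#   block drbd0: Out of sync: start=718996928, size=8 (sectors)
OOS_RE = re.compile(
    r"block (drbd\d+): Out of sync: start=(\d+),\s*size=(\d+) \(sectors\)"
)


def parse_out_of_sync(log_text):
    """Return a list of (device, start_sector, size_in_sectors) tuples."""
    # Collapse all whitespace so that entries wrapped across lines still match.
    flat = " ".join(log_text.split())
    return [
        (m.group(1), int(m.group(2)), int(m.group(3)))
        for m in OOS_RE.finditer(flat)
    ]


def dd_command(device, start, size):
    """Build a dd | md5sum command line that reads one reported range."""
    return (
        f"dd iflag=direct if=/dev/{device} bs={SECTOR_SIZE} "
        f"skip={start} count={size} | md5sum"
    )


# Illustrative sample taken from the log extract in this post.
SAMPLE = """\
Mar 24 14:13:17 host1 kernel: block drbd0: Out of sync: start=718996928,
size=8 (sectors)
Mar 24 14:13:17 host1 kernel: block drbd0: Out of sync: start=718997224,
size=8 (sectors)
"""

if __name__ == "__main__":
    for dev, start, size in parse_out_of_sync(SAMPLE):
        print(dd_command(dev, start, size))
```

The printed commands can then be run on both nodes and the checksums compared.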

*** check with dd and md5sum ***
# dd iflag=direct if=/dev/drbd0 bs=512 skip=718997224 count=8 | md5sum
host1: 669a5c2ba22fa931aac16cdd2f03e22a
host2: ceeac3bd59178ee13f94ce283e3a4de3
********************************
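
The md5sum above covers the whole 8-sector range, so it does not show which
of the 8 sectors actually differ. A minimal sketch of one way to narrow this
down: pipe the same dd output into a small Python script that prints one md5
per 512-byte sector, run it on both nodes, and diff the two outputs. The
script name sector_md5.py and the function name are hypothetical.

```python
import hashlib
import sys

# Assumption: 512-byte sectors, matching bs=512 in the dd command.
SECTOR_SIZE = 512


def per_sector_md5(data, first_sector, sector_size=SECTOR_SIZE):
    """Split data into sectors and return [(sector_number, md5_hex), ...]."""
    if len(data) % sector_size:
        raise ValueError("input length is not a multiple of the sector size")
    return [
        (first_sector + i, hashlib.md5(data[off:off + sector_size]).hexdigest())
        for i, off in enumerate(range(0, len(data), sector_size))
    ]


if __name__ == "__main__" and len(sys.argv) == 2:
    # argv[1] is the first sector number, i.e. the value passed to dd as skip=.
    first = int(sys.argv[1])
    for sector, digest in per_sector_md5(sys.stdin.buffer.read(), first):
        print(sector, digest)
```

Possible usage, on each node:

# dd iflag=direct if=/dev/drbd0 bs=512 skip=718997224 count=8 | python3 sector_md5.py 718997224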

*** drbdsetup /dev/drbd0 show ***
disk {
        size                    0s _is_default; # bytes
        on-io-error             pass_on _is_default;
        fencing                 dont-care _is_default;
        max-bio-bvecs           0 _is_default;
}
net {
        timeout                 60 _is_default; # 1/10 seconds
        max-epoch-size          2048 _is_default;
        max-buffers             2048 _is_default;
        unplug-watermark        128 _is_default;
        connect-int             10 _is_default; # seconds
        ping-int                10 _is_default; # seconds
        sndbuf-size             0 _is_default; # bytes
        rcvbuf-size             0 _is_default; # bytes
        ko-count                0 _is_default;
        allow-two-primaries;
        cram-hmac-alg           "sha1";
        shared-secret           "XXXXXXXXXXXXXXXXXXX";
        after-sb-0pri           discard-zero-changes;
        after-sb-1pri           discard-secondary;
        after-sb-2pri           disconnect _is_default;
        rr-conflict             disconnect _is_default;
        ping-timeout            5 _is_default; # 1/10 seconds
        data-integrity-alg      "crc32c";
        on-congestion           block _is_default;
        congestion-fill         0s _is_default; # byte
        congestion-extents      127 _is_default;
}
syncer {
        rate                    153600k; # bytes/second
        after                   -1 _is_default;
        al-extents              127 _is_default;
        verify-alg              "md5";
        on-no-data-accessible   io-error _is_default;
        c-plan-ahead            0 _is_default; # 1/10 seconds
        c-delay-target          10 _is_default; # 1/10 seconds
        c-fill-target           0s _is_default; # bytes
        c-max-rate              102400k _is_default; # bytes/second
        c-min-rate              4096k _is_default; # bytes/second
}
protocol C;
_this_host {
        device                  minor 0;
        disk                    "/dev/sda3";
        meta-disk               internal;
        address                 ipv4 172.23.10.1:7788;
}
_remote_host {
        address                 ipv4 172.23.10.2:7788;
}
# (89)      unknown tag = (integer) 0   [len: 4]
# Found unknown tags, you should update your
# userland tools
*******************************

Best regards,
Stanislav