[DRBD-user] Uncatchable DRBD out-of-sync issue

Dan Barker dbarker at visioncomm.net
Sun Mar 24 15:36:49 CET 2013


Stanislav, my system sends me an email when verify finds an out-of-sync condition. You can use the same handler if you like.

In the handlers section of my global configuration:
out-of-sync      "/usr/lib/drbd/notify-out-of-sync.sh myemailaddress";

Are you resyncing after the error is detected (disconnect/connect the resource)?
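For reference, the disconnect/connect cycle mentioned above could be scripted roughly like this. It is only a sketch: the resource name "r0" is a placeholder, and the DRY_RUN switch is a convenience invented here so the commands can be previewed before anything is run.

```shell
#!/bin/sh
# Sketch: resync the blocks that online verify marked out of sync
# by cycling the connection of one resource.
# "r0" is a placeholder resource name, not taken from the original post.
# DRY_RUN=1 (hypothetical switch) prints the commands instead of running them.
resync_after_verify() {
    res="${1:-r0}"
    for action in disconnect connect; do
        if [ "${DRY_RUN:-0}" = "1" ]; then
            echo "drbdadm $action $res"
        else
            drbdadm "$action" "$res" || return 1
        fi
    done
}
```

Calling `DRY_RUN=1; resync_after_verify r0` shows the two drbdadm commands that would be issued.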

Dan, in Atlanta

From: drbd-user-bounces at lists.linbit.com [mailto:drbd-user-bounces at lists.linbit.com] On Behalf Of Stanislav German-Evtushenko
Sent: Sunday, March 24, 2013 7:00 AM
To: drbd-user at lists.linbit.com
Subject: [DRBD-user] Uncatchable DRBD out-of-sync issue

Dear all,

I'm trying to track down an out-of-sync issue and I'm stuck. Can anybody give me a hint about what to check next?

Configuration:
- two Dell PowerEdge R710 nodes (identical hardware and configuration)
- drbd0 master-master (size is 900GiB)
- direct connection (two 1Gbit/s ethernet adapters in bonding balance-rr)
- data-integrity-alg is crc32c (it has been enabled for testing purposes)
- LVM on top of DRBD (LVM volumes are used by virtual machines)

Software:
- DRBD module version: 8.3.13
- kernel: Linux 2.6.32-19-pve #1 SMP x86_64 GNU/Linux

Problem:
- Each time I run online verification, it finds some sectors out of sync (usually not many, about 5-15 messages per verification run)
- These sectors really do differ between the nodes (checked with dd and md5sum)
- data-integrity-alg logs no messages from the moment the resources are connected (drbdadm connect all) until the verification process finds sectors out of sync

Questions:
- How is that possible?
- Why doesn't data-integrity-alg catch the problem?
- How can I fix it?

*** extracts from kernel log ***
Mar 24 13:23:38 host1 kernel: block drbd0: conn( Connected -> VerifyS )
Mar 24 13:23:38 host1 kernel: block drbd0: Starting Online Verify from sector 0
Mar 24 14:13:17 host1 kernel: block drbd0: Out of sync: start=718996928, size=8 (sectors)
Mar 24 14:13:17 host1 kernel: block drbd0: Out of sync: start=718996984, size=8 (sectors)
Mar 24 14:13:17 host1 kernel: block drbd0: Out of sync: start=718997224, size=8 (sectors)
*********************************
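A log extract like the one above can be condensed into a list of ranges with a small filter. The sketch below assumes the message format shown in these log lines and 512-byte sectors; the function names are made up for illustration.

```shell
#!/bin/sh
# Sketch: pull "start size" pairs out of kernel log lines read from stdin.
# Assumes the "Out of sync: start=N, size=M (sectors)" format shown above.
oos_ranges() {
    sed -n 's/.*Out of sync: start=\([0-9][0-9]*\), size=\([0-9][0-9]*\) (sectors).*/\1 \2/p'
}

# Sketch: total amount of out-of-sync data in bytes (sectors are 512 bytes).
oos_total_bytes() {
    oos_ranges | awk '{ n += $2 } END { print n * 512 }'
}
```

For example, `grep drbd0 /var/log/kern.log | oos_ranges` would print one "start size" pair per reported range; the log path is an assumption and varies by distribution.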

*** check with dd and md5sum ***
# dd iflag=direct if=/dev/drbd0 bs=512 skip=718997224 count=8 | md5sum
host1: 669a5c2ba22fa931aac16cdd2f03e22a
host2: ceeac3bd59178ee13f94ce283e3a4de3
********************************
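The same check can be wrapped in a small helper so that each reported range is hashed identically on both nodes. This is a sketch only: the function name and the DD_FLAGS override are invented here, and it assumes 512-byte sectors as in the dd command above.

```shell
#!/bin/sh
# Sketch: print the md5 of one sector range of a device.
# Arguments: device, start sector, size in sectors (512-byte sectors).
# DD_FLAGS (hypothetical override) defaults to iflag=direct, which bypasses
# the page cache as in the original command; set it to "" for plain files.
range_md5() {
    dev="$1"; start="$2"; size="$3"
    dd ${DD_FLAGS-iflag=direct} if="$dev" bs=512 skip="$start" count="$size" 2>/dev/null \
        | md5sum | cut -d' ' -f1
}
```

Running `range_md5 /dev/drbd0 718997224 8` on each node and comparing the two hashes reproduces the check shown above.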

*** drbdsetup /dev/drbd0 show ***
disk {
        size                    0s _is_default; # bytes
        on-io-error             pass_on _is_default;
        fencing                 dont-care _is_default;
        max-bio-bvecs           0 _is_default;
}
net {
        timeout                 60 _is_default; # 1/10 seconds
        max-epoch-size          2048 _is_default;
        max-buffers             2048 _is_default;
        unplug-watermark        128 _is_default;
        connect-int             10 _is_default; # seconds
        ping-int                10 _is_default; # seconds
        sndbuf-size             0 _is_default; # bytes
        rcvbuf-size             0 _is_default; # bytes
        ko-count                0 _is_default;
        allow-two-primaries;
        cram-hmac-alg           "sha1";
        shared-secret           "XXXXXXXXXXXXXXXXXXX";
        after-sb-0pri           discard-zero-changes;
        after-sb-1pri           discard-secondary;
        after-sb-2pri           disconnect _is_default;
        rr-conflict             disconnect _is_default;
        ping-timeout            5 _is_default; # 1/10 seconds
        data-integrity-alg      "crc32c";
        on-congestion           block _is_default;
        congestion-fill         0s _is_default; # byte
        congestion-extents      127 _is_default;
}
syncer {
        rate                    153600k; # bytes/second
        after                   -1 _is_default;
        al-extents              127 _is_default;
        verify-alg              "md5";
        on-no-data-accessible   io-error _is_default;
        c-plan-ahead            0 _is_default; # 1/10 seconds
        c-delay-target          10 _is_default; # 1/10 seconds
        c-fill-target           0s _is_default; # bytes
        c-max-rate              102400k _is_default; # bytes/second
        c-min-rate              4096k _is_default; # bytes/second
}
protocol C;
_this_host {
        device                  minor 0;
        disk                    "/dev/sda3";
        meta-disk               internal;
        address                 ipv4 172.23.10.1:7788;
}
_remote_host {
        address                 ipv4 172.23.10.2:7788;
}
# (89)      unknown tag = (integer) 0   [len: 4]
# Found unknown tags, you should update your
# userland tools
*******************************

Best regards,
Stanislav