[DRBD-user] Digest integrity check FAILED (was: ASSERT FAILED: drbd_al_read_log)

Walter Haidinger walter.haidinger at gmx.at
Fri Feb 25 12:12:15 CET 2011


I've now replaced the onboard NICs used for the drbd link with PCIe models. The integrity checks still fail every couple of hours.
This is hardly surprising, though, because I was never able to reproduce any transmission errors outside of drbd.

Is it therefore safe to rule out the network hardware?

> kernel logs, config, and meta data dump are more interesting.

All right. Please tell me if anything else would be of interest.
Any hints on how to diagnose this problem are highly appreciated!

Please note that the system is otherwise stable, no problems except the
failed integrity checks of drbd.

/proc/drbd:
version: 8.3.10 (api:88/proto:86-96)
GIT-hash: 5c0b0469666682443d4785d90a2c603378f9017b build by @build.k9, 2011-02-25 09:08:11
 0: cs:Connected ro:Secondary/Primary ds:UpToDate/UpToDate C r-----
    ns:0 nr:9220020 dw:9220016 dr:3065364 al:0 bm:881 lo:1 pe:0 ua:1 ap:0 ep:1 wo:b oos:0
        resync: used:0/61 hits:51 misses:11 starving:0 dirty:0 changed:11
        act_log: used:0/3389 hits:0 misses:0 starving:0 dirty:0 changed:0

drbdadm dump:
# /etc/drbd.conf
global { minor-count 16; }

common {
    net {
        data-integrity-alg md5;
        sndbuf-size       1M;
        rcvbuf-size       1M;
    }
    syncer {
        rate             100M;
        c-plan-ahead      30;
        c-fill-target     4k;
        c-max-rate       120M;
        c-min-rate       1024;
        verify-alg       sha1;
        csums-alg        sha1;
    }
}

# resource md3 on prod1b.k9: not ignored, not stacked
resource md3 {
    protocol               C;
    on prod1a.k9 {
        device           /dev/drbd0 minor 0;
        disk             /dev/md3;
        address          ipv4 192.168.10.1:7788;
        flexible-meta-disk /dev/sys/drbd_meta0;
    }
    on prod1b.k9 {
        device           /dev/drbd0 minor 0;
        disk             /dev/md3;
        address          ipv4 192.168.10.2:7788;
        flexible-meta-disk /dev/sys/drbd_meta0;
    }
    net {
        timeout          100;
        connect-int       10;
        ping-int          10;
        ping-timeout       5;
        max-buffers      4096;
        unplug-watermark 2048;
        max-epoch-size   4096;
        ko-count           5;
        cram-hmac-alg    sha256;
        shared-secret    secret;
        after-sb-0pri    discard-younger-primary;
        after-sb-1pri    consensus;
        after-sb-2pri    disconnect;
        rr-conflict      disconnect;
    }
    disk {
        on-io-error      detach;
        fencing          dont-care;
    }
    syncer {
        al-extents       3389;
    }
    startup {
        wfc-timeout      120;
        degr-wfc-timeout  30;
    }
    handlers {
        pri-on-incon-degr "echo o > /proc/sysrq-trigger ; halt -f";
        pri-lost-after-sb "echo o > /proc/sysrq-trigger ; halt -f";
        local-io-error   "echo o > /proc/sysrq-trigger ; halt -f";
        fence-peer       /usr/sbin/drbd-peer-outdater;
    }
}

kernel dmesg output of an error:
drbd0: Digest integrity check FAILED: 182846680s +4096
drbd0: error receiving Data, l: 4136!
drbd0: peer( Primary -> Unknown ) conn( Connected -> ProtocolError ) pdsk( UpToDate -> DUnknown )
drbd0: asender terminated
drbd0: Terminating asender thread
drbd0: Connection closed
drbd0: conn( ProtocolError -> Unconnected )
drbd0: receiver terminated
drbd0: Restarting receiver thread
drbd0: receiver (re)started
drbd0: conn( Unconnected -> WFConnection )
drbd0: Handshake successful: Agreed network protocol version 96
drbd0: Peer authenticated using 32 bytes of 'sha256' HMAC
drbd0: conn( WFConnection -> WFReportParams )
drbd0: Starting asender thread (from drbd0_receiver [5679])
drbd0: data-integrity-alg: md5
drbd0: max BIO size = 130560
drbd0: drbd_sync_handshake:
drbd0: self 232D95BBCD88356C:0000000000000000:9FD19CF528E7A53A:9FD09CF528E7A53B bits:0 flags:0
drbd0: peer 5C46D84FC9C15C7D:232D95BBCD88356D:9FD19CF528E7A53B:9FD09CF528E7A53B bits:1 flags:0
drbd0: uuid_compare()=-1 by rule 50
drbd0: peer( Unknown -> Primary ) conn( WFReportParams -> WFBitMapT ) disk( UpToDate -> Outdated ) pdsk( DUnknown -> UpToDate )
drbd0: conn( WFBitMapT -> WFSyncUUID )
drbd0: updated sync uuid 232E95BBCD88356C:0000000000000000:9FD19CF528E7A53A:9FD09CF528E7A53B
drbd0: helper command: /sbin/drbdadm before-resync-target minor-0
drbd0: helper command: /sbin/drbdadm before-resync-target minor-0 exit code 0 (0x0)
drbd0: conn( WFSyncUUID -> SyncTarget ) disk( Outdated -> Inconsistent )
drbd0: Began resync as SyncTarget (will sync 736 KB [184 bits set]).
drbd0: Resync done (total 1 sec; paused 0 sec; 736 K/sec)
drbd0: 0 % had equal check sums, eliminated: 0K; transferred 736K total 736K
drbd0: updated UUIDs 5C46D84FC9C15C7C:0000000000000000:232E95BBCD88356C:232D95BBCD88356D
drbd0: conn( SyncTarget -> Connected ) disk( Inconsistent -> UpToDate )
drbd0: helper command: /sbin/drbdadm after-resync-target minor-0
drbd0: helper command: /sbin/drbdadm after-resync-target minor-0 exit code 0 (0x0)
drbd0: bitmap WRITE of 5924 pages took 13 jiffies
drbd0: 0 KB (0 bits) marked out-of-sync by on disk bit-map.
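
To see how often and where the digest check fails, the relevant lines can be pulled out of the kernel log with a small script. The following is only a minimal Python sketch, not part of drbd: the function names are made up for illustration, and it assumes that the "s" suffix denotes 512-byte sectors and that one bitmap bit covers 4096 bytes (the bm-byte-per-bit value in the meta data dump). It also cross-checks the "will sync" line, where 184 bits at 4 KB each should give 736 KB.

```python
import re

# Patterns copied from the wording of the dmesg lines quoted above.
DIGEST_FAIL = re.compile(r"Digest integrity check FAILED: (\d+)s \+(\d+)")
RESYNC = re.compile(r"will sync (\d+) KB \[(\d+) bits set\]")

SECTOR_BYTES = 512    # assumption: drbd reports sectors in 512-byte units
BYTES_PER_BIT = 4096  # assumption: taken from bm-byte-per-bit in the dump


def digest_failures(lines):
    """Return (sector, size_bytes, byte_offset) for each failed digest check."""
    found = []
    for line in lines:
        m = DIGEST_FAIL.search(line)
        if m:
            sector, size = int(m.group(1)), int(m.group(2))
            found.append((sector, size, sector * SECTOR_BYTES))
    return found


def resync_size_matches(line):
    """True if the reported KB equals bits * BYTES_PER_BIT, None if no match."""
    m = RESYNC.search(line)
    if m is None:
        return None
    kb, bits = int(m.group(1)), int(m.group(2))
    return bits * BYTES_PER_BIT // 1024 == kb


sample = [
    "drbd0: Digest integrity check FAILED: 182846680s +4096",
    "drbd0: error receiving Data, l: 4136!",
    "drbd0: Began resync as SyncTarget (will sync 736 KB [184 bits set]).",
]

for sector, size, offset in digest_failures(sample):
    print(f"failed block: sector {sector}, {size} bytes, byte offset {offset}")
print("resync size consistent:", resync_size_matches(sample[2]))
```

Fed with the full dmesg output over several days, the list of sectors would show whether the failures cluster in one region of the device or are spread out.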

#drbdadm dump-md md3
# DRBD meta data dump
# 2011-02-25 11:58:33 +0100 [1298631513]
# prod1b.k9> drbdmeta 0 v08 /dev/sys/drbd_meta0 flex-external dump-md
#

version "v08";

# md_size_sect 139264
# md_offset 0
# al_offset 4096
# bm_offset 36864

uuid {
    0x5C46D84FC9C15C7C; 0x0000000000000000; 0x232E95BBCD88356C; 0x232D95BBCD88356D;
    flags 0x00000011;
}
# al-extents 3389;
la-size-sect 1555043584;
bm-byte-per-bit 4096;
device-uuid 0x95962A5B877A5C33;
# bm-bytes 24297560;
bm {
   # at 0kB
    3037248 times 0x0000000000000000;
}
# bits-set 0;
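
As a sanity check, the bitmap figures in this dump can be recomputed from la-size-sect. This is just a worked calculation in Python under my own assumptions: sectors are 512 bytes, the bitmap is rounded up to whole 64-bit words, and the dumped word count is padded to a full 512-byte sector. The rounding rules are guesses that happen to reproduce the numbers above, not something taken from the drbd source.

```python
LA_SIZE_SECT = 1555043584  # la-size-sect from the dump
BM_BYTE_PER_BIT = 4096     # bm-byte-per-bit from the dump
SECTOR_BYTES = 512         # assumption: 512-byte sectors


def ceil_div(a, b):
    """Integer division rounding up."""
    return -(-a // b)


device_bytes = LA_SIZE_SECT * SECTOR_BYTES
bits = ceil_div(device_bytes, BM_BYTE_PER_BIT)  # one bit per 4096-byte block
words = ceil_div(bits, 64)                      # assumption: whole 64-bit words
bm_bytes = words * 8

# Assumption: the dumped bitmap is padded to a multiple of one 512-byte sector.
padded_words = ceil_div(bm_bytes, SECTOR_BYTES) * SECTOR_BYTES // 8

print("bitmap bits: ", bits)
print("bm-bytes:    ", bm_bytes)       # compare with "# bm-bytes" in the dump
print("dumped words:", padded_words)   # compare with the "times" count in bm {}
```

Both results agree with the dump (bm-bytes 24297560 and 3037248 words), so the meta data at least looks internally consistent.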

Last but not least the system configuration:
Two nodes, identical hardware, running as a simple active/passive heartbeat v1 cluster (no CRM).
OS: CentOS 5.5 x86_64 with vanilla 2.6.35.11 kernel and drbd 8.3.10.
HW: Asus M3A-H mainboard, Phenom X4 965, 8G DDR2-800 ECC (EDAC enabled).
NICs (all Gigabit): Onboard Atheros L1, PCIe Intel 82572EI, PCIe Intel 82574L (used as dedicated drbd link, directly connected, no switch)
Storage: drbd on top of 3-way raid-1 (Linux md software-raid of SATA drives), LVM on top of drbd, all filesystems ext3.

Again, if anything else is interesting (lsmod, lspci?), just tell me.

Regards,
Walter