[DRBD-user] Digest integrity check FAILED (was: ASSERT FAILED: drbd_al_read_log)

Fri Feb 25 12:33:36 CET 2011

On Fri, Feb 25, 2011 at 12:12:15PM +0100, Walter Haidinger wrote:
> I've now replaced the onboard NICs used for the drbd link with PCIe models. The integrity checks still fail every couple of hours.
> This is hardly surprising, though, because I was unable to reproduce any transmission errors other than with drbd.
> 
> Is it therefore safe to rule out the network hardware?
> 
> > kernel logs, config, and meta data dump are more interesting.
> 
> All right. Please tell me if anything else is interesting too.
> Any hints on how to diagnose this problem are highly appreciated!
> 
> Please note that the system is otherwise stable, no problems except the
> failed integrity checks of drbd.

So you no longer have any problems/ASSERTs regarding drbd_al_read_log?

> /proc/drbd:
> version: 8.3.10 (api:88/proto:86-96)
> GIT-hash: 5c0b0469666682443d4785d90a2c603378f9017b build by @build.k9, 2011-02-25 09:08:11
>  0: cs:Connected ro:Secondary/Primary ds:UpToDate/UpToDate C r-----
>     ns:0 nr:9220020 dw:9220016 dr:3065364 al:0 bm:881 lo:1 pe:0 ua:1 ap:0 ep:1 wo:b oos:0
>         resync: used:0/61 hits:51 misses:11 starving:0 dirty:0 changed:11
>         act_log: used:0/3389 hits:0 misses:0 starving:0 dirty:0 changed:0
> 
> drbdadm dump:
> # /etc/drbd.conf
> global { minor-count 16; }
> 
> common {
>     net {
>         data-integrity-alg md5;
>         sndbuf-size       1M;
>         rcvbuf-size       1M;
>     }
>     syncer {
>         rate             100M;
>         c-plan-ahead      30;
>         c-fill-target     4k;
>         c-max-rate       120M;
>         c-min-rate       1024;
>         verify-alg       sha1;
>         csums-alg        sha1;
>     }
> }
> 
> # resource md3 on prod1b.k9: not ignored, not stacked
> resource md3 {
>     protocol               C;
>     on prod1a.k9 {
>         device           /dev/drbd0 minor 0;
>         disk             /dev/md3;
>         address          ipv4 192.168.10.1:7788;
>         flexible-meta-disk /dev/sys/drbd_meta0;
>     }
>     on prod1b.k9 {
>         device           /dev/drbd0 minor 0;
>         disk             /dev/md3;
>         address          ipv4 192.168.10.2:7788;
>         flexible-meta-disk /dev/sys/drbd_meta0;
>     }
>     net {
>         timeout          100;
>         connect-int       10;
>         ping-int          10;
>         ping-timeout       5;
>         max-buffers      4096;
>         unplug-watermark 2048;
>         max-epoch-size   4096;
>         ko-count           5;
>         cram-hmac-alg    sha256;
>         shared-secret    secret;
>         after-sb-0pri    discard-younger-primary;
>         after-sb-1pri    consensus;
>         after-sb-2pri    disconnect;
>         rr-conflict      disconnect;
>     }
>     disk {
>         on-io-error      detach;
>         fencing          dont-care;
>     }
>     syncer {
>         al-extents       3389;
>     }
>     startup {
>         wfc-timeout      120;
>         degr-wfc-timeout  30;
>     }
>     handlers {
>         pri-on-incon-degr "echo o > /proc/sysrq-trigger ; halt -f";
>         pri-lost-after-sb "echo o > /proc/sysrq-trigger ; halt -f";
>         local-io-error   "echo o > /proc/sysrq-trigger ; halt -f";
>         fence-peer       /usr/sbin/drbd-peer-outdater;
>     }
> 
> kernel dmesg output of an error:
> drbd0: Digest integrity check FAILED: 182846680s +4096
> drbd0: error receiving Data, l: 4136!
> drbd0: peer( Primary -> Unknown ) conn( Connected -> ProtocolError ) pdsk( UpToDate -> DUnknown )

Well, what does the other (Primary) side say?

I'd expect it to say
"Digest mismatch, buffer modified by upper layers during write: ..."

If it does not, your link corrupts data.
If it does, well, then that's what happens.
(note: this double check on the sending side
 has only been introduced with 8.3.10)
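
A minimal sketch of that comparison, in case it helps to automate it.
It only looks for the two message strings quoted in this thread. The
function name, the verdict labels and the idea of passing each node's
kernel log as a list of lines are assumptions made for illustration,
not anything DRBD ships.

```python
# Message logged by the receiving (Secondary) node, as in the dmesg above.
RECEIVER_MSG = "Digest integrity check FAILED"
# Message expected on the sending (Primary) node; per the note above,
# this sender-side check only exists from 8.3.10 on.
SENDER_MSG = "Digest mismatch, buffer modified by upper layers during write"


def diagnose(secondary_log, primary_log):
    """Classify an integrity failure from two lists of kernel log lines.

    Returns one of three hypothetical labels:
      "no-failure"        receiver logged no failed digest check
      "modified-in-flight" sender also logged a digest mismatch
      "link-suspect"      receiver failed, sender logged nothing
    """
    receiver_failed = any(RECEIVER_MSG in line for line in secondary_log)
    sender_mismatch = any(SENDER_MSG in line for line in primary_log)
    if not receiver_failed:
        return "no-failure"
    if sender_mismatch:
        return "modified-in-flight"
    return "link-suspect"
```

To use it, feed it the kernel log lines of both nodes from around the
time of the failure, for example read from saved dmesg output. A
"link-suspect" result is only meaningful if the Primary really runs
8.3.10 or later, since an older sender never logs the mismatch.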

-- 
: Lars Ellenberg
: LINBIT | Your Way to High Availability
: DRBD/HA support and consulting http://www.linbit.com

DRBD® and LINBIT® are registered trademarks of LINBIT, Austria.
__
please don't Cc me, but send to list   --   I'm subscribed