Note: "permalinks" may not be as permanent as we would like;
direct links to old sources may well be a few messages off.
On Mon, Feb 21, 2011 at 10:24:13AM +0100, Lars Ellenberg wrote:
> On Mon, Feb 21, 2011 at 10:02:30AM +0100, Raoul Bhatia [IPAX] wrote:
> > hi,
> >
> > after a couple of days, i can tell that i do not see the described
> > problem with
> > drbd 8.3.7 and kernel 2.6.32-bpo.5-amd64
> > (backports from squeeze to debian lenny)
> >
> > > root at c02n01 ~ # cat /proc/drbd
> > > version: 8.3.7 (api:88/proto:86-91)
> > > srcversion: EE47D8BF18AC166BE219757
> >
> >
> > taking a closer look, i also do not see the original error message
> > anymore: (Digest mismatch, buffer modified by upper layers during write:
> > 0s +4096)
>
> we changed the log message, respectively added the ability to
> distinguish between detecting mismatch on the receiving end (previously
> possible already), and detecting mismatch on the sending end as well
> (previously not checked).
>
> > instead, i now see dmesg like:
> > > [197080.750826] block drbd1: Digest integrity check FAILED.
> > > [197080.750871] block drbd1: error receiving Data, l: 4136!
> > > [197080.750905] block drbd1: peer( Primary -> Unknown ) conn( Connected -> ProtocolError ) pdsk( UpToDate -> DUnknown )
> > > [197080.750977] block drbd1: asender terminated
> >
> > however, the devices correctly get back in sync.
> >
> > i'll additionally run a manual verify later on and will report back.
> >
> > lars: were you able to extract the logfiles from my original post?
>
> The logs of your original post are completely boring.

No, wait. They are not ;-)

Feb 16 06:25:03 c02n01 kernel: [3687390.120354] block drbd1: conn( WFBitMapS -> SyncSource ) pdsk( Consistent -> Inconsistent )
Feb 16 06:25:03 c02n01 kernel: [3687390.120362] block drbd1: Began resync as SyncSource (will sync 4 KB [1 bits set]).
Feb 16 06:25:03 c02n01 kernel: [3687390.120797] block drbd1: updated sync UUID 3C1DADF6B38C1AD7:E7E50184F3F3AC0B:E7E40184F3F3AC0B:3CFC3B16AAE1131D
Feb 16 06:25:03 c02n01 kernel: [3687390.131787] block drbd1: Retrying drbd_rs_del_all() later. refcnt=1
Feb 16 06:25:04 c02n01 kernel: [3687390.232237] block drbd1: Resync done (total 1 sec; paused 0 sec; 4 K/sec)
Feb 16 06:25:04 c02n01 kernel: [3687390.232314] block drbd1: updated UUIDs 3C1DADF6B38C1AD7:0000000000000000:E7E50184F3F3AC0B:E7E40184F3F3AC0B
Feb 16 06:25:04 c02n01 kernel: [3687390.232434] block drbd1: conn( SyncSource -> Connected ) pdsk( Inconsistent -> UpToDate )
Feb 16 06:25:04 c02n01 kernel: [3687390.274089] block drbd1: bitmap WRITE of 762 pages took 10 jiffies
Feb 16 06:25:04 c02n01 kernel: [3687390.274154] block drbd1: 0 KB (0 bits) marked out-of-sync by on disk bit-map.
Feb 16 06:25:04 c02n01 kernel: [3687390.947353] block drbd1: helper command: /sbin/drbdadm fence-peer minor-1 exit code 1 (0x100)
Feb 16 06:25:04 c02n01 kernel: [3687390.947487] block drbd1: fence-peer helper broken, returned 1

Fix your fence-peer helper, that may be the cause of trouble there.

Feb 16 06:25:04 c02n01 kernel: [3687390.947555] block drbd1: pdsk( UpToDate -> DUnknown )

This should not have happened, either:
We must not change the pdsk state to DUnknown while keeping conn state
at Connected. That's nonsense.

Feb 16 06:25:04 c02n01 kernel: [3687390.947633] block drbd1: new current UUID 89084B22FE454C03:3C1DADF6B38C1AD7:E7E50184F3F3AC0B:E7E40184F3F3AC0B

-- 
: Lars Ellenberg
: LINBIT | Your Way to High Availability
: DRBD/HA support and consulting http://www.linbit.com

DRBD® and LINBIT® are registered trademarks of LINBIT, Austria.
__
please don't Cc me, but send to list -- I'm subscribed
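
For anyone who wants to scan their own kernel logs for the odd transition pointed out above (pdsk going to DUnknown while conn stays at Connected), here is a minimal sketch in Python. It is not part of DRBD or its tools; the function names `parse_transitions` and `find_suspicious` are made up for illustration, and it assumes state changes are always logged in the `name( Old -> New )` form seen in the messages quoted in this thread.

```python
import re

# Matches fields such as "conn( WFBitMapS -> SyncSource )" as they appear
# in the log lines quoted in this thread (assumed format, not a DRBD API).
FIELD_RE = re.compile(r"(\w+)\( (\w+) -> (\w+) \)")


def parse_transitions(line):
    """Return {field: (old, new)} for every state change on one log line."""
    return {name: (old, new) for name, old, new in FIELD_RE.findall(line)}


def find_suspicious(lines):
    """Yield lines where pdsk becomes DUnknown while conn is still Connected."""
    conn = None  # last connection state seen; None until a transition is logged
    for line in lines:
        changes = parse_transitions(line)
        if "conn" in changes:
            conn = changes["conn"][1]
        new_pdsk = changes.get("pdsk", (None, None))[1]
        if new_pdsk == "DUnknown" and conn == "Connected":
            yield line


# Message bodies copied from the logs in this thread (timestamps dropped).
sample = [
    "block drbd1: conn( WFBitMapS -> SyncSource ) pdsk( Consistent -> Inconsistent )",
    "block drbd1: conn( SyncSource -> Connected ) pdsk( Inconsistent -> UpToDate )",
    "block drbd1: fence-peer helper broken, returned 1",
    "block drbd1: pdsk( UpToDate -> DUnknown )",
    "block drbd1: peer( Primary -> Unknown ) conn( Connected -> ProtocolError ) pdsk( UpToDate -> DUnknown )",
]

flagged = list(find_suspicious(sample))
for line in flagged:
    print("suspicious:", line)
```

The last sample line is not flagged, because there conn leaves Connected in the same message that reports pdsk as DUnknown. The sketch only tracks a single device; logs that mix several drbd minors would need one `conn` value per device.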