[DRBD-user] Digest integrity check FAILED (was: ASSERT FAILED: drbd_al_read_log)

Walter Haidinger walter.haidinger at gmx.at
Sat Feb 26 19:31:03 CET 2011



Hi Lars, thanks for the reply.

> So you no longer have any problems/ASSERTs regarding drbd_al_read_log?

No, those are gone. I ran create-md on the secondary node and did a full resync. I don't know whether that was "the fix", but I suppose so.
 
> Well, what does the other (Primary) side say?
> I'd expect it to say
> "Digest mismatch, buffer modified by upper layers during write: ..."

Yes, it does (see the kernel logs below).

> If it does not, your link corrupts data.
> If it does, well, then that's what happens.
> (note: this double check on the sending side
>  has only been introduced with 8.3.10)

Now where do I go from here?
Is there any way to tell who or what is responsible for the data corruption?
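
To check that I read the two messages correctly, here is a minimal Python sketch of the distinction as I understand it from the explanation above. It is not DRBD's code: the function names and the `modify_in_flight` hook are made up for illustration, and md5 is used only because my logs show `data-integrity-alg: md5`. The sender computes a digest, the buffer may change while the write is in flight, and the sender then re-checks the digest after sending.

```python
import hashlib


def send_with_double_check(buf, modify_in_flight=None):
    """Hypothetical sender: returns (payload, digest_sent, sender_recheck_ok)."""
    # Digest is computed before the data goes out.
    digest_sent = hashlib.md5(buf).digest()
    # Simulate an upper layer rewriting the buffer while the write is in flight.
    if modify_in_flight is not None:
        modify_in_flight(buf)
    # This is what actually ends up on the wire.
    payload = bytes(buf)
    # Sender-side double check: does the buffer still match the digest we sent?
    sender_recheck_ok = hashlib.md5(buf).digest() == digest_sent
    return payload, digest_sent, sender_recheck_ok


def receiver_check(payload, digest_sent):
    """Hypothetical receiver: verify the payload against the digest it was sent."""
    return hashlib.md5(payload).digest() == digest_sent


def scribble(buf):
    # Stand-in for whatever modifies the page during the write.
    buf[0] ^= 0xFF


# Case 1: buffer modified during the write -> both sides notice.
payload, digest, sender_ok = send_with_double_check(bytearray(4096), scribble)
print(sender_ok, receiver_check(payload, digest))   # False False

# Case 2: buffer untouched, but the link flips a byte -> only the receiver notices.
payload, digest, sender_ok = send_with_double_check(bytearray(4096))
corrupted = b"\x01" + payload[1:]
print(sender_ok, receiver_check(corrupted, digest))  # True False
```

In this toy model my logs correspond to case 1: the primary reports the mismatch as well, which would point away from the link and towards something above DRBD changing the buffer.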

Trimmed kernel logs from today's (Feb 26th) corruption:
-- primary node --
14:42:11 Digest mismatch, buffer modified by upper layers during write: 713118272s +4096
14:42:11 sock was shut down by peer
14:42:11 peer( Secondary -> Unknown ) conn( Connected -> BrokenPipe ) pdsk( UpToDate -> DUnknown )
14:42:11 short read expecting header on sock: r=0
14:42:11 new current UUID B9F806E19286A6F7:7015E2DDD8881707:9682DC6D3721EFE1:9681DC6D3721EFE1
14:42:11 meta connection shut down by peer.
14:42:11 asender terminated
14:42:11 Terminating asender thread
14:42:11 Connection closed
14:42:11 conn( BrokenPipe -> Unconnected )
14:42:11 receiver terminated
14:42:11 Restarting receiver thread
14:42:11 receiver (re)started
14:42:11 conn( Unconnected -> WFConnection )
14:42:11 Handshake successful: Agreed network protocol version 96
14:42:11 Peer authenticated using 32 bytes of 'sha256' HMAC
14:42:11 conn( WFConnection -> WFReportParams )
14:42:11 Starting asender thread (from drbd0_receiver [11524])
14:42:11 data-integrity-alg: md5
14:42:11 max BIO size = 130560
14:42:12 drbd_sync_handshake:
14:42:12 self B9F806E19286A6F7:7015E2DDD8881707:9682DC6D3721EFE1:9681DC6D3721EFE1 bits:54 flags:0
14:42:12 peer 7015E2DDD8881706:0000000000000000:9682DC6D3721EFE0:9681DC6D3721EFE1 bits:0 flags:0
14:42:12 uuid_compare()=1 by rule 70
14:42:12 peer( Unknown -> Secondary ) conn( WFReportParams -> WFBitMapS ) pdsk( DUnknown -> Consistent )
14:42:12 helper command: /sbin/drbdadm before-resync-source minor-0
14:42:12 helper command: /sbin/drbdadm before-resync-source minor-0 exit code 0 (0x0)
14:42:12 conn( WFBitMapS -> SyncSource ) pdsk( Consistent -> Inconsistent )
14:42:12 Began resync as SyncSource (will sync 240 KB [60 bits set]).
14:42:12 updated sync UUID B9F806E19286A6F7:7016E2DDD8881707:7015E2DDD8881707:9682DC6D3721EFE1
14:42:12 Resync done (total 1 sec; paused 0 sec; 240 K/sec)
14:42:12 1 % had equal check sums, eliminated: 4K; transferred 236K total 240K
14:42:12 updated UUIDs B9F806E19286A6F7:0000000000000000:7016E2DDD8881707:7015E2DDD8881707
14:42:12 conn( SyncSource -> Connected ) pdsk( Inconsistent -> UpToDate )
14:42:12 bitmap WRITE of 5931 pages took 13 jiffies
14:42:13 0 KB (0 bits) marked out-of-sync by on disk bit-map.

-- secondary node --
14:42:11 Digest integrity check FAILED: 713118272s +4096
14:42:11 error receiving Data, l: 4136!
14:42:11 peer( Primary -> Unknown ) conn( Connected -> ProtocolError ) pdsk( UpToDate -> DUnknown )
14:42:11 asender terminated
14:42:11 Terminating asender thread
14:42:11 Connection closed
14:42:11 conn( ProtocolError -> Unconnected )
14:42:11 receiver terminated
14:42:11 Restarting receiver thread
14:42:11 receiver (re)started
14:42:11 conn( Unconnected -> WFConnection )
14:42:11 Handshake successful: Agreed network protocol version 96
14:42:11 Peer authenticated using 32 bytes of 'sha256' HMAC
14:42:11 conn( WFConnection -> WFReportParams )
14:42:11 Starting asender thread (from drbd0_receiver [9650])
14:42:11 data-integrity-alg: md5
14:42:11 max BIO size = 130560
14:42:11 drbd_sync_handshake:
14:42:11 self 7015E2DDD8881706:0000000000000000:9682DC6D3721EFE0:9681DC6D3721EFE1 bits:0 flags:0
14:42:11 peer B9F806E19286A6F7:7015E2DDD8881707:9682DC6D3721EFE1:9681DC6D3721EFE1 bits:54 flags:0
14:42:11 uuid_compare()=-1 by rule 50
14:42:11 peer( Unknown -> Primary ) conn( WFReportParams -> WFBitMapT ) disk( UpToDate -> Outdated ) pdsk( DUnknown -> UpToDate )
14:42:12 conn( WFBitMapT -> WFSyncUUID )
14:42:12 updated sync uuid 7016E2DDD8881706:0000000000000000:9682DC6D3721EFE0:9681DC6D3721EFE1
14:42:12 helper command: /sbin/drbdadm before-resync-target minor-0
14:42:12 helper command: /sbin/drbdadm before-resync-target minor-0 exit code 0 (0x0)
14:42:12 conn( WFSyncUUID -> SyncTarget ) disk( Outdated -> Inconsistent )
14:42:12 Began resync as SyncTarget (will sync 240 KB [60 bits set]).
14:42:12 Resync done (total 1 sec; paused 0 sec; 240 K/sec)
14:42:12 1 % had equal check sums, eliminated: 4K; transferred 236K total 240K
14:42:12 updated UUIDs B9F806E19286A6F6:0000000000000000:7016E2DDD8881706:7015E2DDD8881707
14:42:12 conn( SyncTarget -> Connected ) disk( Inconsistent -> UpToDate )
14:42:12 helper command: /sbin/drbdadm after-resync-target minor-0
14:42:12 helper command: /sbin/drbdadm after-resync-target minor-0 exit code 0 (0x0)
14:42:13 bitmap WRITE of 5931 pages took 15 jiffies
14:42:13 0 KB (0 bits) marked out-of-sync by on disk bit-map.
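
In case it helps to locate the affected block, the `713118272s +4096` field in both messages can be decoded with a few lines of Python. This is a sketch under two assumptions of mine: that the `s` suffix means 512-byte sectors with the `+4096` being a length in bytes, and that each bitmap bit covers 4 KiB (which would be consistent with "240 KB [60 bits set]" above). The helper names are hypothetical.

```python
import re

SECTOR_SIZE = 512      # assumption: "s" denotes 512-byte sectors
BYTES_PER_BIT = 4096   # assumption: one bitmap bit covers 4 KiB

_EXTENT = re.compile(r"(\d+)s \+(\d+)")


def decode_extent(line):
    """Return (start_byte, end_byte) for a '<sector>s +<length>' field."""
    match = _EXTENT.search(line)
    if match is None:
        raise ValueError("no '<sector>s +<length>' field in line")
    sector, length = int(match.group(1)), int(match.group(2))
    start = sector * SECTOR_SIZE
    return start, start + length


def bits_to_kib(bits):
    """Convert a count of out-of-sync bitmap bits to KiB."""
    return bits * BYTES_PER_BIT // 1024


line = "14:42:11 Digest integrity check FAILED: 713118272s +4096"
print(decode_extent(line))   # (365116555264, 365116559360)
print(bits_to_kib(60))       # 240
```

The byte range is relative to the start of the DRBD device, so mapping it to a file would still need the filesystem's own tools.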





