Hi Lars,

thanks for the reply.

> So you no longer have any problems/ASSERTs regarding drbd_al_read_log?

No, those are gone. I did a create-md on the secondary node and a full resync. I don't know whether that was "the fix", but I suppose so.

> Well, what does the other (Primary) side say?
> I'd expect it to say
> "Digest mismatch, buffer modified by upper layers during write: ..."

Yes, it does (see the kernel logs below).

> If it does not, your link corrupts data.
> If it does, well, then that's what happens.
> (note: this double check on the sending side
> has only been introduced with 8.3.10)

Now where do I go from here? Is there any way to tell who or what is responsible for the data corruption?

Trimmed kernel logs from today's (Feb 26th) corruption:

-- primary node --
14:42:11 Digest mismatch, buffer modified by upper layers during write: 713118272s +4096
14:42:11 sock was shut down by peer
14:42:11 peer( Secondary -> Unknown ) conn( Connected -> BrokenPipe ) pdsk( UpToDate -> DUnknown )
14:42:11 short read expecting header on sock: r=0
14:42:11 new current UUID B9F806E19286A6F7:7015E2DDD8881707:9682DC6D3721EFE1:9681DC6D3721EFE1
14:42:11 meta connection shut down by peer.
14:42:11 asender terminated
14:42:11 Terminating asender thread
14:42:11 Connection closed
14:42:11 conn( BrokenPipe -> Unconnected )
14:42:11 receiver terminated
14:42:11 Restarting receiver thread
14:42:11 receiver (re)started
14:42:11 conn( Unconnected -> WFConnection )
14:42:11 Handshake successful: Agreed network protocol version 96
14:42:11 Peer authenticated using 32 bytes of 'sha256' HMAC
14:42:11 conn( WFConnection -> WFReportParams )
14:42:11 Starting asender thread (from drbd0_receiver [11524])
14:42:11 data-integrity-alg: md5
14:42:11 max BIO size = 130560
14:42:12 drbd_sync_handshake:
14:42:12 self B9F806E19286A6F7:7015E2DDD8881707:9682DC6D3721EFE1:9681DC6D3721EFE1 bits:54 flags:0
14:42:12 peer 7015E2DDD8881706:0000000000000000:9682DC6D3721EFE0:9681DC6D3721EFE1 bits:0 flags:0
14:42:12 uuid_compare()=1 by rule 70
14:42:12 peer( Unknown -> Secondary ) conn( WFReportParams -> WFBitMapS ) pdsk( DUnknown -> Consistent )
14:42:12 helper command: /sbin/drbdadm before-resync-source minor-0
14:42:12 helper command: /sbin/drbdadm before-resync-source minor-0 exit code 0 (0x0)
14:42:12 conn( WFBitMapS -> SyncSource ) pdsk( Consistent -> Inconsistent )
14:42:12 Began resync as SyncSource (will sync 240 KB [60 bits set]).
14:42:12 updated sync UUID B9F806E19286A6F7:7016E2DDD8881707:7015E2DDD8881707:9682DC6D3721EFE1
14:42:12 Resync done (total 1 sec; paused 0 sec; 240 K/sec)
14:42:12 1 % had equal check sums, eliminated: 4K; transferred 236K total 240K
14:42:12 updated UUIDs B9F806E19286A6F7:0000000000000000:7016E2DDD8881707:7015E2DDD8881707
14:42:12 conn( SyncSource -> Connected ) pdsk( Inconsistent -> UpToDate )
14:42:12 bitmap WRITE of 5931 pages took 13 jiffies
14:42:13 0 KB (0 bits) marked out-of-sync by on disk bit-map.

-- secondary node --
14:42:11 Digest integrity check FAILED: 713118272s +4096
14:42:11 error receiving Data, l: 4136!
14:42:11 peer( Primary -> Unknown ) conn( Connected -> ProtocolError ) pdsk( UpToDate -> DUnknown )
14:42:11 asender terminated
14:42:11 Terminating asender thread
14:42:11 Connection closed
14:42:11 conn( ProtocolError -> Unconnected )
14:42:11 receiver terminated
14:42:11 Restarting receiver thread
14:42:11 receiver (re)started
14:42:11 conn( Unconnected -> WFConnection )
14:42:11 Handshake successful: Agreed network protocol version 96
14:42:11 Peer authenticated using 32 bytes of 'sha256' HMAC
14:42:11 conn( WFConnection -> WFReportParams )
14:42:11 Starting asender thread (from drbd0_receiver [9650])
14:42:11 data-integrity-alg: md5
14:42:11 max BIO size = 130560
14:42:11 drbd_sync_handshake:
14:42:11 self 7015E2DDD8881706:0000000000000000:9682DC6D3721EFE0:9681DC6D3721EFE1 bits:0 flags:0
14:42:11 peer B9F806E19286A6F7:7015E2DDD8881707:9682DC6D3721EFE1:9681DC6D3721EFE1 bits:54 flags:0
14:42:11 uuid_compare()=-1 by rule 50
14:42:11 peer( Unknown -> Primary ) conn( WFReportParams -> WFBitMapT ) disk( UpToDate -> Outdated ) pdsk( DUnknown -> UpToDate )
14:42:12 conn( WFBitMapT -> WFSyncUUID )
14:42:12 updated sync uuid 7016E2DDD8881706:0000000000000000:9682DC6D3721EFE0:9681DC6D3721EFE1
14:42:12 helper command: /sbin/drbdadm before-resync-target minor-0
14:42:12 helper command: /sbin/drbdadm before-resync-target minor-0 exit code 0 (0x0)
14:42:12 conn( WFSyncUUID -> SyncTarget ) disk( Outdated -> Inconsistent )
14:42:12 Began resync as SyncTarget (will sync 240 KB [60 bits set]).
14:42:12 Resync done (total 1 sec; paused 0 sec; 240 K/sec)
14:42:12 1 % had equal check sums, eliminated: 4K; transferred 236K total 240K
14:42:12 updated UUIDs B9F806E19286A6F6:0000000000000000:7016E2DDD8881706:7015E2DDD8881707
14:42:12 conn( SyncTarget -> Connected ) disk( Inconsistent -> UpToDate )
14:42:12 helper command: /sbin/drbdadm after-resync-target minor-0
14:42:12 helper command: /sbin/drbdadm after-resync-target minor-0 exit code 0 (0x0)
14:42:13 bitmap WRITE of 5931 pages took 15 jiffies
14:42:13 0 KB (0 bits) marked out-of-sync by on disk bit-map.
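
In case it helps with narrowing this down: both nodes report the same location (713118272s +4096), so one starting point might be to collect these locations over several incidents and see whether they cluster. Below is a minimal Python sketch for pulling them out of the kernel log. It assumes the "s" suffix means 512-byte sectors and that "+4096" is the request length in bytes; the names digest_events and SECTOR_SIZE are made up for this illustration and are not part of DRBD.

```python
import re

# Assumption: DRBD reports the start of the request in 512-byte sectors.
SECTOR_SIZE = 512

# Matches both message variants seen in the logs above, e.g.
#   "Digest mismatch, buffer modified by upper layers during write: 713118272s +4096"
#   "Digest integrity check FAILED: 713118272s +4096"
DIGEST_RE = re.compile(
    r"Digest (?:mismatch|integrity check FAILED).*?:\s*(\d+)s \+(\d+)"
)


def digest_events(lines):
    """Return (sector, byte_offset, length) for every digest error line."""
    events = []
    for line in lines:
        match = DIGEST_RE.search(line)
        if match:
            sector = int(match.group(1))
            length = int(match.group(2))
            events.append((sector, sector * SECTOR_SIZE, length))
    return events


if __name__ == "__main__":
    sample = [
        "14:42:11 Digest mismatch, buffer modified by upper layers during write: 713118272s +4096",
        "14:42:11 sock was shut down by peer",
    ]
    for sector, offset, length in digest_events(sample):
        print(f"sector {sector} = byte offset {offset}, length {length}")
```

For the incident above this gives byte offset 365116555264 on the DRBD device. Mapping that offset back to a file or an application depends on the filesystem and on whatever else sits on top of the device, which the logs alone do not show.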