Note: "permalinks" may not be as permanent as we would like;
direct links to old sources may well be a few messages off.
On Fri, Sep 13, 2013 at 10:47:17AM +0200, Martin Reissner wrote:
> Hello,
>
> for some days now I've been getting these errors in the log every
> couple hours and I have a hard time figuring out where they come from.
> I know this is most likely not a DRBD issue as the setup has been
> running without problems for months and nothing has been changed. I
> don't know what else to try though, can someone on here maybe point me
> in the right direction?
>
> I have a simple active/passive Setup running Mysql on Debian 6.0.7
> (Squeeze), DRBD Version is 8.3.7.
>
> We tried running a manual Online Verify but each time it was aborted by
> the disconnect caused by the "Digest integrity check FAILED". Finally I
> disabled the "data-integrity-alg" Option and then the Verify completed
> without any errors.
>
> I've had the Hardware (RAM,CPU,Disks) checked on both nodes to no avail
> and I also replaced the NICs for the Direct/Crosslink that is used by DRBD.
>
> Following up are corresponding logs from mdb1-ha1 and mdb1-ha2, I will
> gladly provide further info if needed. FWIW, the setup is still running
> live without any issues and unless I turn on the "data-integrity-alg"
> the logs stay clean.

Do these threads help?
http://thread.gmane.org/gmane.linux.network.drbd/21223/focus=21391
http://thread.gmane.org/gmane.linux.network.drbd/22836/focus=22897
http://thread.gmane.org/gmane.linux.network.drbd/19409/focus=19426
And more ...

tl;dr:
 *maybe* you have hardware problems.
 *likely* you just have "normal" behaviour of "misbehaving" (from the
 point of view of the storage subsystem) application/kernel.
 Upgrading the kernel may help. Or not.

We should rename "data integrity" to "calculate and double check message
digests for diagnostic purposes and burn cpu as a side effect".

	Lars

>
> Martin
>
> ha1:
> Sep 5 07:49:10 mdb1-ha1 kernel: [68271.012133] block drbd1: Digest integrity check FAILED.
> Sep 5 07:49:10 mdb1-ha1 kernel: [68271.012167] block drbd1: error receiving Data, l: 4140!
> Sep 5 07:49:10 mdb1-ha1 kernel: [68271.012197] block drbd1: peer( Primary -> Unknown ) conn( Connected -> ProtocolError ) pdsk( UpToDate -> DUnknown )
> Sep 5 07:49:10 mdb1-ha1 kernel: [68271.012212] block drbd1: asender terminated
> Sep 5 07:49:10 mdb1-ha1 kernel: [68271.012215] block drbd1: Terminating drbd1_asender
> Sep 5 07:49:10 mdb1-ha1 kernel: [68271.013179] block drbd1: Connection closed
> Sep 5 07:49:10 mdb1-ha1 kernel: [68271.013182] block drbd1: conn( ProtocolError -> Unconnected )
> Sep 5 07:49:10 mdb1-ha1 kernel: [68271.013185] block drbd1: receiver terminated
> Sep 5 07:49:10 mdb1-ha1 kernel: [68271.013186] block drbd1: Restarting drbd1_receiver
> Sep 5 07:49:10 mdb1-ha1 kernel: [68271.013188] block drbd1: receiver (re)started
> Sep 5 07:49:10 mdb1-ha1 kernel: [68271.013191] block drbd1: conn( Unconnected -> WFConnection )
> Sep 5 07:49:11 mdb1-ha1 kernel: [68272.177560] block drbd1: Handshake successful: Agreed network protocol version 91
> Sep 5 07:49:11 mdb1-ha1 kernel: [68272.177566] block drbd1: conn( WFConnection -> WFReportParams )
> Sep 5 07:49:11 mdb1-ha1 kernel: [68272.177582] block drbd1: Starting asender thread (from drbd1_receiver [2032])
> Sep 5 07:49:11 mdb1-ha1 kernel: [68272.177689] block drbd1: data-integrity-alg: sha1
> Sep 5 07:49:11 mdb1-ha1 kernel: [68272.177753] block drbd1: drbd_sync_handshake:
> Sep 5 07:49:11 mdb1-ha1 kernel: [68272.177757] block drbd1: self 095ABE2754A6CE94:0000000000000000:F0420ACD09464C04:704D31CBB5F812AF bits:0 flags:0
> Sep 5 07:49:11 mdb1-ha1 kernel: [68272.177761] block drbd1: peer 90C3D267D663925D:095ABE2754A6CE95:F0420ACD09464C05:704D31CBB5F812AF bits:61 flags:0
> Sep 5 07:49:11 mdb1-ha1 kernel: [68272.177765] block drbd1: uuid_compare()=-1 by rule 50
> Sep 5 07:49:11 mdb1-ha1 kernel: [68272.177770] block drbd1: peer( Unknown -> Primary ) conn( WFReportParams -> WFBitMapT ) pdsk( DUnknown -> UpToDate )
> Sep 5 07:49:11 mdb1-ha1 kernel: [68272.442588] block drbd1: conn( WFBitMapT -> WFSyncUUID )
> Sep 5 07:49:11 mdb1-ha1 kernel: [68272.445292] block drbd1: helper command: /sbin/drbdadm before-resync-target minor-1
> Sep 5 07:49:11 mdb1-ha1 kernel: [68272.446469] block drbd1: helper command: /sbin/drbdadm before-resync-target minor-1 exit code 0 (0x0)
> Sep 5 07:49:11 mdb1-ha1 kernel: [68272.446472] block drbd1: conn( WFSyncUUID -> SyncTarget ) disk( UpToDate -> Inconsistent )
> Sep 5 07:49:11 mdb1-ha1 kernel: [68272.446476] block drbd1: Began resync as SyncTarget (will sync 244 KB [61 bits set]).
> Sep 5 07:49:11 mdb1-ha1 kernel: [68272.533948] block drbd1: Resync done (total 1 sec; paused 0 sec; 244 K/sec)
> Sep 5 07:49:11 mdb1-ha1 kernel: [68272.533957] block drbd1: conn( SyncTarget -> Connected ) disk( Inconsistent -> UpToDate )
> Sep 5 07:49:11 mdb1-ha1 kernel: [68272.533964] block drbd1: helper command: /sbin/drbdadm after-resync-target minor-1
> Sep 5 07:49:11 mdb1-ha1 kernel: [68272.554497] block drbd1: helper command: /sbin/drbdadm after-resync-target minor-1 exit code 0 (0x0)
>
> ha2:
> Sep 5 07:49:10 mdb1-ha2 kernel: [32102358.587564] block drbd1: sock was shut down by peer
> Sep 5 07:49:10 mdb1-ha2 kernel: [32102358.587567] block drbd1: meta connection shut down by peer.
> Sep 5 07:49:10 mdb1-ha2 kernel: [32102358.587572] block drbd1: peer( Secondary -> Unknown ) conn( Connected -> NetworkFailure ) pdsk( UpToDate -> DUnknown )
> Sep 5 07:49:10 mdb1-ha2 kernel: [32102358.587580] block drbd1: asender terminated
> Sep 5 07:49:10 mdb1-ha2 kernel: [32102358.587581] block drbd1: Terminating drbd1_asender
> Sep 5 07:49:10 mdb1-ha2 kernel: [32102358.587584] block drbd1: Creating new current UUID
> Sep 5 07:49:10 mdb1-ha2 kernel: [32102358.587593] block drbd1: sock_sendmsg returned -32
> Sep 5 07:49:10 mdb1-ha2 kernel: [32102358.587595] block drbd1: short sent ReportUUIDs size=56 sent=0
> Sep 5 07:49:10 mdb1-ha2 kernel: [32102358.587648] block drbd1: short read expecting header on sock: r=0
> Sep 5 07:49:10 mdb1-ha2 kernel: [32102358.587837] block drbd1: Connection closed
> Sep 5 07:49:10 mdb1-ha2 kernel: [32102358.587841] block drbd1: helper command: /sbin/drbdadm fence-peer minor-1
> Sep 5 07:49:11 mdb1-ha2 kernel: [32102359.650659] block drbd1: helper command: /sbin/drbdadm fence-peer minor-1 exit code 4 (0x400)
> Sep 5 07:49:11 mdb1-ha2 kernel: [32102359.650662] block drbd1: fence-peer helper returned 4 (peer was fenced)
> Sep 5 07:49:11 mdb1-ha2 kernel: [32102359.650667] block drbd1: pdsk( DUnknown -> Outdated )
> Sep 5 07:49:11 mdb1-ha2 kernel: [32102359.655320] block drbd1: conn( NetworkFailure -> Unconnected )
> Sep 5 07:49:11 mdb1-ha2 kernel: [32102359.655326] block drbd1: receiver terminated
> Sep 5 07:49:11 mdb1-ha2 kernel: [32102359.655327] block drbd1: Restarting drbd1_receiver
> Sep 5 07:49:11 mdb1-ha2 kernel: [32102359.655329] block drbd1: receiver (re)started
> Sep 5 07:49:11 mdb1-ha2 kernel: [32102359.655333] block drbd1: conn( Unconnected -> WFConnection )
> Sep 5 07:49:11 mdb1-ha2 kernel: [32102359.752623] block drbd1: Handshake successful: Agreed network protocol version 91
> Sep 5 07:49:11 mdb1-ha2 kernel: [32102359.752630] block drbd1: conn( WFConnection -> WFReportParams )
> Sep 5 07:49:11 mdb1-ha2 kernel: [32102359.752644] block drbd1: Starting asender thread (from drbd1_receiver [1758])
> Sep 5 07:49:11 mdb1-ha2 kernel: [32102359.752696] block drbd1: data-integrity-alg: sha1
> Sep 5 07:49:11 mdb1-ha2 kernel: [32102359.757933] block drbd1: drbd_sync_handshake:
> Sep 5 07:49:11 mdb1-ha2 kernel: [32102359.757937] block drbd1: self 90C3D267D663925D:095ABE2754A6CE95:F0420ACD09464C05:704D31CBB5F812AF bits:61 flags:0
> Sep 5 07:49:11 mdb1-ha2 kernel: [32102359.757940] block drbd1: peer 095ABE2754A6CE94:0000000000000000:F0420ACD09464C04:704D31CBB5F812AF bits:0 flags:0
> Sep 5 07:49:11 mdb1-ha2 kernel: [32102359.757942] block drbd1: uuid_compare()=1 by rule 70
> Sep 5 07:49:11 mdb1-ha2 kernel: [32102359.757947] block drbd1: peer( Unknown -> Secondary ) conn( WFReportParams -> WFBitMapS ) pdsk( Outdated -> UpToDate )
> Sep 5 07:49:11 mdb1-ha2 kernel: [32102360.020204] block drbd1: conn( WFBitMapS -> SyncSource ) pdsk( UpToDate -> Inconsistent )
> Sep 5 07:49:11 mdb1-ha2 kernel: [32102360.020212] block drbd1: Began resync as SyncSource (will sync 244 KB [61 bits set]).
> Sep 5 07:49:11 mdb1-ha2 kernel: [32102360.109042] block drbd1: Resync done (total 1 sec; paused 0 sec; 244 K/sec)
> Sep 5 07:49:11 mdb1-ha2 kernel: [32102360.109047] block drbd1: conn( SyncSource -> Connected ) pdsk( Inconsistent -> UpToDate )
>
> --
> Wavecon GmbH | Ludwigstraße 2 | 90763 Fuerth
> HR/HRN: 10780 | GF: Cemil Degirmenci
> Ust-ID: DE251398082| Fon +49 911 120 6581
> Fax: +49 911 212 923 3 | Web: wavecon.de
> Mail + Jabber: mreissner at wavecon.de
> _______________________________________________
> drbd-user mailing list
> drbd-user at lists.linbit.com
> http://lists.linbit.com/mailman/listinfo/drbd-user

--
: Lars Ellenberg
: LINBIT | Your Way to High Availability
: DRBD/HA support and consulting http://www.linbit.com

DRBD® and LINBIT® are registered trademarks of LINBIT, Austria.
__
please don't Cc me, but send to list -- I'm subscribed
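
For anyone who wants to see how often the "Digest integrity check FAILED" message shows up before and after changing "data-integrity-alg", here is a rough sketch in Python that counts those lines per host, device and hour. It is not part of DRBD or of the original mails. The function name count_digest_failures and the log path /var/log/kern.log are assumptions made for illustration; the regular expression is modelled only on the log lines quoted in this thread and may need adjusting for other syslog formats.

```python
import re
from collections import Counter

# Modelled on the line quoted in this thread, e.g.
# "Sep 5 07:49:10 mdb1-ha1 kernel: [68271.012133] block drbd1: Digest integrity check FAILED."
DIGEST_FAIL = re.compile(
    r"(?P<stamp>\w{3}\s+\d+\s+[\d:]{8})\s+(?P<host>\S+)\s+kernel:.*"
    r"block (?P<dev>drbd\d+): Digest integrity check FAILED"
)


def count_digest_failures(lines):
    """Count digest failures, keyed by (host, device, 'Mon D HH')."""
    counts = Counter()
    for line in lines:
        m = DIGEST_FAIL.search(line)
        if m:
            hour = m.group("stamp")[:-6]  # strip the trailing ":MM:SS"
            counts[(m.group("host"), m.group("dev"), hour)] += 1
    return counts


if __name__ == "__main__":
    # Assumed log location; adjust to wherever your kernel messages end up.
    with open("/var/log/kern.log", errors="replace") as fh:
        for key, n in sorted(count_digest_failures(fh).items()):
            print(key, n)
```

A steady trickle of single failures under write load would fit the "likely" case described above; it does not by itself rule hardware in or out.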