Hello,

for some days now I've been getting these errors in the log every couple of hours, and I'm having a hard time figuring out where they come from. I know this is most likely not a DRBD issue, as the setup has been running without problems for months and nothing has been changed. I don't know what else to try, though; can someone here maybe point me in the right direction?

I have a simple active/passive setup running MySQL on Debian 6.0.7 (Squeeze); the DRBD version is 8.3.7. We tried running a manual online verify, but each time it was aborted by the disconnect caused by the "Digest integrity check FAILED" error. Finally I disabled the "data-integrity-alg" option, and then the verify completed without any errors. I've had the hardware (RAM, CPU, disks) checked on both nodes, to no avail, and I also replaced the NICs for the direct/crosslink that is used by DRBD.

The corresponding logs from mdb1-ha1 and mdb1-ha2 follow; I will gladly provide further info if needed. FWIW, the setup is still running live without any issues, and unless I turn on "data-integrity-alg" the logs stay clean.

Martin

ha1:
Sep 5 07:49:10 mdb1-ha1 kernel: [68271.012133] block drbd1: Digest integrity check FAILED.
Sep 5 07:49:10 mdb1-ha1 kernel: [68271.012167] block drbd1: error receiving Data, l: 4140!
Sep 5 07:49:10 mdb1-ha1 kernel: [68271.012197] block drbd1: peer( Primary -> Unknown ) conn( Connected -> ProtocolError ) pdsk( UpToDate -> DUnknown )
Sep 5 07:49:10 mdb1-ha1 kernel: [68271.012212] block drbd1: asender terminated
Sep 5 07:49:10 mdb1-ha1 kernel: [68271.012215] block drbd1: Terminating drbd1_asender
Sep 5 07:49:10 mdb1-ha1 kernel: [68271.013179] block drbd1: Connection closed
Sep 5 07:49:10 mdb1-ha1 kernel: [68271.013182] block drbd1: conn( ProtocolError -> Unconnected )
Sep 5 07:49:10 mdb1-ha1 kernel: [68271.013185] block drbd1: receiver terminated
Sep 5 07:49:10 mdb1-ha1 kernel: [68271.013186] block drbd1: Restarting drbd1_receiver
Sep 5 07:49:10 mdb1-ha1 kernel: [68271.013188] block drbd1: receiver (re)started
Sep 5 07:49:10 mdb1-ha1 kernel: [68271.013191] block drbd1: conn( Unconnected -> WFConnection )
Sep 5 07:49:11 mdb1-ha1 kernel: [68272.177560] block drbd1: Handshake successful: Agreed network protocol version 91
Sep 5 07:49:11 mdb1-ha1 kernel: [68272.177566] block drbd1: conn( WFConnection -> WFReportParams )
Sep 5 07:49:11 mdb1-ha1 kernel: [68272.177582] block drbd1: Starting asender thread (from drbd1_receiver [2032])
Sep 5 07:49:11 mdb1-ha1 kernel: [68272.177689] block drbd1: data-integrity-alg: sha1
Sep 5 07:49:11 mdb1-ha1 kernel: [68272.177753] block drbd1: drbd_sync_handshake:
Sep 5 07:49:11 mdb1-ha1 kernel: [68272.177757] block drbd1: self 095ABE2754A6CE94:0000000000000000:F0420ACD09464C04:704D31CBB5F812AF bits:0 flags:0
Sep 5 07:49:11 mdb1-ha1 kernel: [68272.177761] block drbd1: peer 90C3D267D663925D:095ABE2754A6CE95:F0420ACD09464C05:704D31CBB5F812AF bits:61 flags:0
Sep 5 07:49:11 mdb1-ha1 kernel: [68272.177765] block drbd1: uuid_compare()=-1 by rule 50
Sep 5 07:49:11 mdb1-ha1 kernel: [68272.177770] block drbd1: peer( Unknown -> Primary ) conn( WFReportParams -> WFBitMapT ) pdsk( DUnknown -> UpToDate )
Sep 5 07:49:11 mdb1-ha1 kernel: [68272.442588] block drbd1: conn( WFBitMapT -> WFSyncUUID )
Sep 5 07:49:11 mdb1-ha1 kernel: [68272.445292] block drbd1: helper command: /sbin/drbdadm before-resync-target minor-1
Sep 5 07:49:11 mdb1-ha1 kernel: [68272.446469] block drbd1: helper command: /sbin/drbdadm before-resync-target minor-1 exit code 0 (0x0)
Sep 5 07:49:11 mdb1-ha1 kernel: [68272.446472] block drbd1: conn( WFSyncUUID -> SyncTarget ) disk( UpToDate -> Inconsistent )
Sep 5 07:49:11 mdb1-ha1 kernel: [68272.446476] block drbd1: Began resync as SyncTarget (will sync 244 KB [61 bits set]).
Sep 5 07:49:11 mdb1-ha1 kernel: [68272.533948] block drbd1: Resync done (total 1 sec; paused 0 sec; 244 K/sec)
Sep 5 07:49:11 mdb1-ha1 kernel: [68272.533957] block drbd1: conn( SyncTarget -> Connected ) disk( Inconsistent -> UpToDate )
Sep 5 07:49:11 mdb1-ha1 kernel: [68272.533964] block drbd1: helper command: /sbin/drbdadm after-resync-target minor-1
Sep 5 07:49:11 mdb1-ha1 kernel: [68272.554497] block drbd1: helper command: /sbin/drbdadm after-resync-target minor-1 exit code 0 (0x0)

ha2:
Sep 5 07:49:10 mdb1-ha2 kernel: [32102358.587564] block drbd1: sock was shut down by peer
Sep 5 07:49:10 mdb1-ha2 kernel: [32102358.587567] block drbd1: meta connection shut down by peer.
Sep 5 07:49:10 mdb1-ha2 kernel: [32102358.587572] block drbd1: peer( Secondary -> Unknown ) conn( Connected -> NetworkFailure ) pdsk( UpToDate -> DUnknown )
Sep 5 07:49:10 mdb1-ha2 kernel: [32102358.587580] block drbd1: asender terminated
Sep 5 07:49:10 mdb1-ha2 kernel: [32102358.587581] block drbd1: Terminating drbd1_asender
Sep 5 07:49:10 mdb1-ha2 kernel: [32102358.587584] block drbd1: Creating new current UUID
Sep 5 07:49:10 mdb1-ha2 kernel: [32102358.587593] block drbd1: sock_sendmsg returned -32
Sep 5 07:49:10 mdb1-ha2 kernel: [32102358.587595] block drbd1: short sent ReportUUIDs size=56 sent=0
Sep 5 07:49:10 mdb1-ha2 kernel: [32102358.587648] block drbd1: short read expecting header on sock: r=0
Sep 5 07:49:10 mdb1-ha2 kernel: [32102358.587837] block drbd1: Connection closed
Sep 5 07:49:10 mdb1-ha2 kernel: [32102358.587841] block drbd1: helper command: /sbin/drbdadm fence-peer minor-1
Sep 5 07:49:11 mdb1-ha2 kernel: [32102359.650659] block drbd1: helper command: /sbin/drbdadm fence-peer minor-1 exit code 4 (0x400)
Sep 5 07:49:11 mdb1-ha2 kernel: [32102359.650662] block drbd1: fence-peer helper returned 4 (peer was fenced)
Sep 5 07:49:11 mdb1-ha2 kernel: [32102359.650667] block drbd1: pdsk( DUnknown -> Outdated )
Sep 5 07:49:11 mdb1-ha2 kernel: [32102359.655320] block drbd1: conn( NetworkFailure -> Unconnected )
Sep 5 07:49:11 mdb1-ha2 kernel: [32102359.655326] block drbd1: receiver terminated
Sep 5 07:49:11 mdb1-ha2 kernel: [32102359.655327] block drbd1: Restarting drbd1_receiver
Sep 5 07:49:11 mdb1-ha2 kernel: [32102359.655329] block drbd1: receiver (re)started
Sep 5 07:49:11 mdb1-ha2 kernel: [32102359.655333] block drbd1: conn( Unconnected -> WFConnection )
Sep 5 07:49:11 mdb1-ha2 kernel: [32102359.752623] block drbd1: Handshake successful: Agreed network protocol version 91
Sep 5 07:49:11 mdb1-ha2 kernel: [32102359.752630] block drbd1: conn( WFConnection -> WFReportParams )
Sep 5 07:49:11 mdb1-ha2 kernel: [32102359.752644] block drbd1: Starting asender thread (from drbd1_receiver [1758])
Sep 5 07:49:11 mdb1-ha2 kernel: [32102359.752696] block drbd1: data-integrity-alg: sha1
Sep 5 07:49:11 mdb1-ha2 kernel: [32102359.757933] block drbd1: drbd_sync_handshake:
Sep 5 07:49:11 mdb1-ha2 kernel: [32102359.757937] block drbd1: self 90C3D267D663925D:095ABE2754A6CE95:F0420ACD09464C05:704D31CBB5F812AF bits:61 flags:0
Sep 5 07:49:11 mdb1-ha2 kernel: [32102359.757940] block drbd1: peer 095ABE2754A6CE94:0000000000000000:F0420ACD09464C04:704D31CBB5F812AF bits:0 flags:0
Sep 5 07:49:11 mdb1-ha2 kernel: [32102359.757942] block drbd1: uuid_compare()=1 by rule 70
Sep 5 07:49:11 mdb1-ha2 kernel: [32102359.757947] block drbd1: peer( Unknown -> Secondary ) conn( WFReportParams -> WFBitMapS ) pdsk( Outdated -> UpToDate )
Sep 5 07:49:11 mdb1-ha2 kernel: [32102360.020204] block drbd1: conn( WFBitMapS -> SyncSource ) pdsk( UpToDate -> Inconsistent )
Sep 5 07:49:11 mdb1-ha2 kernel: [32102360.020212] block drbd1: Began resync as SyncSource (will sync 244 KB [61 bits set]).
Sep 5 07:49:11 mdb1-ha2 kernel: [32102360.109042] block drbd1: Resync done (total 1 sec; paused 0 sec; 244 K/sec)
Sep 5 07:49:11 mdb1-ha2 kernel: [32102360.109047] block drbd1: conn( SyncSource -> Connected ) pdsk( Inconsistent -> UpToDate )

--
Wavecon GmbH | Ludwigstraße 2 | 90763 Fuerth HR/HRN: 10780 | GF: Cemil Degirmenci Ust-ID: DE251398082 | Fon +49 911 120 6581 Fax: +49 911 212 923 3 | Web: wavecon.de Mail + Jabber: mreissner at wavecon.de