Hello Lars,

On 09/13/2013 01:52 PM, Lars Ellenberg wrote:
> On Fri, Sep 13, 2013 at 10:47:17AM +0200, Martin Reissner wrote:
>> for some days now I've been getting these errors in the log every
>> couple hours and I have a hard time figuring out where they come from.
>> I know this is most likely not a DRBD issue as the setup has been
>> running without problems for months and nothing has been changed. I
>> don't know what else to try though, can someone on here maybe point me
>> in the right direction?
>>
>> I have a simple active/passive Setup running Mysql on Debian 6.0.7
>> (Squeeze), DRBD Version is 8.3.7.
>>
>> We tried running a manual Online Verify but each time it was aborted by
>> the disconnect caused by the "Digest integrity check FAILED". Finally I
>> disabled the "data-integrity-alg" Option and then the Verify completed
>> without any errors.
>>
>> I've had the Hardware (RAM,CPU,Disks) checked on both nodes to no avail
>> and I also replaced the NICs for the Direct/Crosslink that is used by DRBD.
>>
>> Following up are corresponding logs from mdb1-ha1 and mdb1-ha2, I will
>> gladly provide further info if needed. FWIW, the setup is still running
>> live without any issues and unless I turn on the "data-integrity-alg"
>> the logs stay clean.
>
> Do these threads help?
>
> http://thread.gmane.org/gmane.linux.network.drbd/21223/focus=21391
> http://thread.gmane.org/gmane.linux.network.drbd/22836/focus=22897
> http://thread.gmane.org/gmane.linux.network.drbd/19409/focus=19426
>
> And more ...

I had already read the first two threads and have just done so with the
third; unfortunately I couldn't get much help out of them. I tried a full
resync after reading the threads, but to no avail: the errors still keep
showing.

You wrote that a change in access patterns might cause these errors,
because buffers may be modified in flight.
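A minimal sketch of the in-flight modification effect as I understand it, written in Python rather than DRBD's kernel code: the sender digests a buffer, the buffer is rewritten before the copy that goes over the wire is taken, and the receiver's digest no longer matches. `digest` and `send_block` are hypothetical names, SHA-1 is only an assumed stand-in for whatever `data-integrity-alg` is configured, and the model ignores how DRBD actually transmits pages.

```python
import hashlib


def digest(buf: bytes) -> str:
    # SHA-1 is an assumed stand-in for the configured data-integrity-alg
    return hashlib.sha1(buf).hexdigest()


def send_block(page: bytearray, modify_in_flight: bool) -> bool:
    """Return True if the receiver's digest matches the sender's (hypothetical model)."""
    # Sender computes the digest over the page as it is right now
    sender_digest = digest(bytes(page))
    if modify_in_flight:
        # The application or kernel writes to the same page again
        # before its contents have been copied onto the wire
        page[0] ^= 0xFF
    # The bytes that are actually transmitted reflect the newer contents
    wire_copy = bytes(page)
    # Receiver recomputes the digest over what it received
    receiver_digest = digest(wire_copy)
    return sender_digest == receiver_digest


print(send_block(bytearray(4096), modify_in_flight=False))  # True
print(send_block(bytearray(4096), modify_in_flight=True))   # False
```

In this simplified model a mismatch can occur even though both machines and the link are healthy; the buffer merely changed between hashing and sending.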
Do you think it is possible that some changed or new MySQL queries might
be enough of a change to cause the errors? That is about the only change
that might have occurred.

> *maybe* you have hardware problems.

As nothing has changed server-side, I was almost certain it was a
hardware problem; I just don't know what else to check besides replacing
the whole servers, which would be the next step. I thought I'd ask the
list for further ideas as a last resort before a complete replacement.

> *likely* you just have "normal" behaviour of "misbehaving" (from
> the point of view of the storage subsystem) application/kernel.
>
> Upgrading the kernel may help. Or not.

Of course this would be my preferred explanation, and the successful
online verifies support it, but as I wrote, nothing was changed on the
servers: no updates, no newly installed software. Can you think of a
scenario that would cause the errors even without any changes?

I did (kernel) updates along with the hardware checks, but without any
effect. I will try installing a kernel from backports though; that
shouldn't be much effort.

> We should rename "data integrity" to
> "calculate and double check message digests for diagnostic purposes
> and burn cpu as a side effect"

Thanks,
Martin