[DRBD-user] Digest integrity check FAILED - Help tracking down the cause

Martin Reissner mreissner at wavecon.de
Mon Sep 16 11:01:08 CEST 2013


Hello Lars,

On 09/13/2013 01:52 PM, Lars Ellenberg wrote:
> On Fri, Sep 13, 2013 at 10:47:17AM +0200, Martin Reissner wrote:
>> for some days now I've been getting these errors in the log every
>> couple hours and I have a hard time figuring out where they come from.
>> I know this is most likely not a DRBD issue as the setup has been
>> running without problems for months and nothing has been changed. I
>> don't know what else to try though, can someone on here maybe point me
>> in the right direction?
>>
>> I have a simple active/passive Setup running Mysql on Debian 6.0.7
>> (Squeeze), DRBD Version is 8.3.7.
>>
>> We tried running a manual Online Verify but each time it was aborted by
>> the disconnect caused by the "Digest integrity check FAILED". Finally I
>> disabled the "data-integrity-alg" Option and then the Verify completed
>> without any errors.
>>
>> I've had the Hardware (RAM,CPU,Disks) checked on both nodes to no avail
>> and I also replaced the NICs for the Direct/Crosslink that is used by DRBD.
>>
>> Following up are corresponding logs from mdb1-ha1 and mdb1-ha2, I will
>> gladly provide further info if needed. FWIW, the setup is still running
>> live without any issues and unless I turn on the "data-integrity-alg"
>> the logs stay clean.
> 
> Do these threads help?
> 
> http://thread.gmane.org/gmane.linux.network.drbd/21223/focus=21391
> http://thread.gmane.org/gmane.linux.network.drbd/22836/focus=22897
> http://thread.gmane.org/gmane.linux.network.drbd/19409/focus=19426
> 
> And more ...

I had already read the first two threads and have now read the third as
well, but unfortunately I couldn't get much help out of them. I tried a
full resync after reading the threads, but to no avail; the errors still
keep showing up.
You wrote that a change in access patterns might cause these errors
because buffers might be modified in flight. Do you think it is possible
that some changed or new MySQL queries would be enough of a change to
cause the errors? That is the biggest change that could have occurred.
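
To make sure I understand the in-flight scenario correctly, here is a
minimal sketch of it in Python. This is not DRBD code: the function names
are hypothetical, and SHA-1 is only an assumption for the digest
algorithm. The sender computes a digest over a buffer, the buffer is then
changed before it goes out, and the receiver's check fails even though no
hardware is at fault.

```python
import hashlib


def send_with_digest(buf, mutate_in_flight=False):
    """Simulate a sender that digests a buffer and then transmits it."""
    # The digest is computed first, over the buffer as it is right now.
    digest = hashlib.sha1(bytes(buf)).hexdigest()
    if mutate_in_flight:
        # Stand-in for the application rewriting the page before it is sent.
        buf[0] ^= 0xFF
    # This is what actually goes over the wire.
    payload = bytes(buf)
    return digest, payload


def receiver_check(digest, payload):
    """Receiver recomputes the digest over the bytes it received."""
    return hashlib.sha1(payload).hexdigest() == digest


stable = receiver_check(*send_with_digest(bytearray(b"page contents")))
modified = receiver_check(
    *send_with_digest(bytearray(b"page contents"), mutate_in_flight=True)
)
print(stable, modified)  # prints: True False
```

If this is what happens, it would also fit with the online verify
completing cleanly, since the verify compares the data at rest on both
nodes rather than a buffer that is still being written to.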


> *maybe* you have hardware problems.

As nothing has changed on the servers, I was almost certain it was a
hardware problem. I just don't know what else to check short of replacing
the servers entirely, which would be the next step. I thought I'd ask
the list for further ideas as a last resort before a complete
replacement.


> *likely* you just have "normal" behaviour of "misbehaving" (from
> the point of view of the storage subsystem) application/kernel.
> 
> Upgrading the kernel may help. Or not.

Of course this would be my preferred explanation, and the successful
online verifies support it, but as I wrote, nothing was changed on the
servers: no updates, no newly installed software. Can you think of a
scenario that would cause the errors even without any changes?

I did apply (kernel) updates along with the hardware checks, but without
any effect. I will try installing a kernel from backports though; that
shouldn't be much effort.


> We should rename "data integrity" to
> "calculate and double check message digests for diagnostic purposes
>  and burn cpu as a side effect"

Thanks,

Martin


