Note: "permalinks" may not be as permanent as we would like,
direct links of old sources may well be a few messages off.
On Sat, Feb 26, 2011 at 07:31:03PM +0100, Walter Haidinger wrote:
> Hi Lars, thanks for the reply.
>
> > So you no longer have any problems/ASSERTs regarding drbd_al_read_log?
>
> No, those are gone. I did a create-md on the secondary node and a full
> resync. Don't know if that was "the fix", though, but I suppose so.
>
> > Well, what does the other (Primary) side say?
> > I'd expect it to say
> > "Digest mismatch, buffer modified by upper layers during write: ..."
>
> Yes, it does (see the kernel logs below).
>
> > If it does not, your link corrupts data.
> > If it does, well, then that's what happens.
> > (note: this double check on the sending side
> > has only been introduced with 8.3.10)
>
> Now where do I go from here?
> Any way to tell who or what is responsible for the data corruption?

There is just "buffers modified during writeout".
That is not necessarily the same as data corruption.

Quoting the DRBD User's Guide:

  Notes on data integrity

  There are two independent methods in DRBD to ensure the integrity of
  the mirrored data: the online-verify mechanism and the
  data-integrity-alg of the network section. Both mechanisms might
  deliver false positives if the user of DRBD modifies the data that
  gets written to disk while the transfer goes on. This may happen for
  swap, for certain append-while-global-sync workloads, or for
  truncate/rewrite workloads, and does not necessarily pose a problem
  for the integrity of the data. Usually, when the initiator of the
  data transfer does this, it already knows that that data block will
  not be part of an on-disk data structure, or will be resubmitted with
  correct data soon enough.

  ...

If you don't want to know about that, disable that check.
If the replication link interruptions caused by that check are bad for
your setup (particularly so in dual-primary setups), disable that check
(see the config sketch below).
If you want to use it anyway: that's great, do so, and live with it.

If you want to have DRBD do "end-to-end" data checksums, even if the
data buffers may be modified while in flight, and still want it to be
efficient, sponsor feature development.

The Problem:
http://lwn.net/Articles/429305/
http://thread.gmane.org/gmane.linux.kernel/1103571
http://thread.gmane.org/gmane.linux.scsi/59259
And many, many more older threads on various mailing lists, some of
them misleading, some of them mixing this issue of in-flight
modifications with actual (hardware-caused) data corruption.

Possible Solutions:

 - DRBD first copies every submitted data block to private pages, then
   calculates the checksum over those. As this is now a checksum over
   *private* pages, a mismatch is always a sign of data corruption. It
   is also a significant performance hit (sketched below). Potentially,
   we could optimistically try to get away without copying, and only
   take the performance hit once we see a mismatch, in which case we
   would still need to copy the data anyway and send it again -- if we
   still have it.

 - The generic Linux write-out path is fixed to not allow modifications
   of data during write-out.

 - The generic Linux block integrity framework is fixed in whatever way
   is deemed most useful, and DRBD switches to use that instead, or
   simply forwards integrity information that may already have been
   generated by some layer above DRBD.

The "generic write-out path" people seem to be on it, this time. Not
sure how much it will help with VMs on top of DRBD, though, as those
will run older kernels or different operating systems that do things
differently, potentially screwing things up.
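To make the first item under "Possible Solutions" a bit more concrete,
here is a rough userspace sketch of the "copy to private pages, then
checksum" idea. It is only an illustration: the helper name
send_with_private_digest and the toy checksum are made up, and real
DRBD would do this on bio pages with a digest from the kernel crypto
API rather than anything like the code below.

/* Userspace sketch only -- not DRBD code. The point: compute the
 * digest over a *private* copy of the payload, so that a later
 * mismatch on the peer can only mean real corruption, at the cost of
 * one extra memcpy per write. */
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* trivial stand-in for a real digest such as crc32c or sha1 */
static uint32_t toy_checksum(const unsigned char *buf, size_t len)
{
        uint32_t sum = 0;
        for (size_t i = 0; i < len; i++)
                sum = sum * 31u + buf[i];
        return sum;
}

/* hypothetical send helper implementing "copy first, then checksum" */
int send_with_private_digest(const unsigned char *data, size_t len)
{
        unsigned char *priv = malloc(len);
        uint32_t digest;

        if (!priv)
                return -1;

        memcpy(priv, data, len);            /* the extra copy = the cost   */
        digest = toy_checksum(priv, len);   /* digest over the frozen copy */

        /* ... transmit priv and digest to the peer here; even if the
         * submitter keeps scribbling on 'data', priv and digest agree ... */
        (void)digest;

        free(priv);
        return 0;
}

The memcpy per write is exactly the performance hit mentioned above;
the optimistic variant would skip the copy until a mismatch is
actually seen.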
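And for completeness, if you decide to just turn the check off:
data-integrity-alg lives in the net section. A minimal sketch, assuming
the 8.3.x configuration layout and a placeholder resource name; check
drbd.conf(5) for your version:

resource r0 {           # "r0" is a placeholder resource name
        net {
                # comment this out (or leave it unset) to disable the
                # in-flight data-integrity check discussed above
                data-integrity-alg sha1;
        }
        syncer {
                # online verify is configured separately (8.3.x syntax)
                verify-alg sha1;
        }
}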
-- 
: Lars Ellenberg
: LINBIT | Your Way to High Availability
: DRBD/HA support and consulting http://www.linbit.com