[DRBD-user] BUG: Uncatchable DRBD out-of-sync issue

Tue Feb 4 23:28:38 CET 2014

On Thu, Jan 30, 2014 at 11:26:43AM +0400, Stanislav German-Evtushenko wrote:
> Just to make things clearer. These results are not false positives, they are
> real. False positives also happen, but rarely.

Since you re-opened this after about one year,
allow me to paste my answer from back then as well.

.----------
| On Mon, Mar 25, 2013 at 12:20:20PM +0400, Stanislav German-Evtushenko
| wrote:
| > Further investigations...
| >
| > First verification went well, but then strange things started to
| > happen.
| > Full logs are here: http://pastebin.com/ntbQcaNz
| 
| ... "Digest mismatch, buffer modified by upper layers during write" ...
| 
| You may want to read this (and following; or even the whole thread):
| http://www.gossamer-threads.com/lists/drbd/users/21069#21069
| 
| as well as the links mentioned there
| | The Problem:
| | http://lwn.net/Articles/429305/
| | http://thread.gmane.org/gmane.linux.kernel/1103571
| | http://thread.gmane.org/gmane.linux.scsi/59259
| 
| So you *possibly* have ongoing data corruption
| caused by hardware, or layers above DRBD.
| 
| Or you may just have "normal behaviour",
| and if DRBD was not that paranoid, you'd not even notice, ever.
`---------------

[...]

> Most of the time (99%) I see ERR for the swap space of virtual machines.

If you enable "data-integrity-alg", do you still see those "buffer
modified by upper layers during write" messages?

If so, then that is your problem,
and that problem can *NOT* be fixed with DRBD "config tuning".

What does that mean?

  Upper layer submits write to DRBD.
  DRBD calculates checksum over data buffer.
  DRBD sends that checksum.
  DRBD submits data buffer to "local" backend block device.
      Meanwhile, upper layer changes data buffer.
  DRBD sends data buffer to peer.
  DRBD receives local completion.
  DRBD receives remote ACK.
  DRBD completes this write to upper layer.
      *only now* would the upper layer be "allowed"
      to change that data buffer again.
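
The sequence above can be modelled in a few lines of Python. This is only
a sketch of the race, not DRBD code: replicate() and scribble() are
made-up names, and zlib.crc32 stands in for whatever digest algorithm is
actually configured.

```python
import zlib

def replicate(buf, modify_in_flight=None):
    """Model one replicated write; returns (local_copy, peer_copy, digest_ok)."""
    digest = zlib.crc32(buf)        # checksum is taken over the live buffer
    local_copy = bytes(buf)         # data submitted to the local backend
    if modify_in_flight is not None:
        modify_in_flight(buf)       # upper layer touches the buffer too early
    peer_copy = bytes(buf)          # data as it is sent to the peer
    digest_ok = zlib.crc32(peer_copy) == digest   # check done by the receiver
    return local_copy, peer_copy, digest_ok

def scribble(buf):
    """Hypothetical misbehaving upper layer: flips bits in the first byte."""
    buf[0] ^= 0xFF

well_behaved = replicate(bytearray(b"swap page contents"))
misbehaving = replicate(bytearray(b"swap page contents"), scribble)

print(well_behaved[2], well_behaved[0] == well_behaved[1])   # True True
print(misbehaving[2], misbehaving[0] == misbehaving[1])      # False False
```

In the second case the receiver sees a digest mismatch, and the local and
peer copies differ. Without the digest check, the copies would still
differ; nobody would notice until a later verify.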

  A misbehaving upper layer results in potentially divergent blocks
  on the DRBD peers, just as it would on a local software RAID 1.
  Which is why the mdadm maintenance script in RHEL, "raid-check",
  intended to be run periodically from cron, has this tell-tale chunk:
        mismatch_cnt=`cat /sys/block/$dev/md/mismatch_cnt`
        # Due to the fact that raid1/10 writes in the kernel are unbuffered,
        # a raid1 array can have non-0 mismatch counts even when the
        # array is healthy.  These non-0 counts will only exist in
        # transient data areas where they don't pose a problem.  However,
        # since we can't tell the difference between a non-0 count that
        # is just in transient data or a non-0 count that signifies a
        # real problem, simply don't check the mismatch_cnt on raid1
        # devices as it's providing far too many false positives.  But by
        # leaving the raid1 device in the check list and performing the
        # check, we still catch and correct any bad sectors there might
        # be in the device.
        raid_lvl=`cat /sys/block/$dev/md/level`
        if [ "$raid_lvl" = "raid1" -o "$raid_lvl" = "raid10" ]; then
            continue
        fi

Anyways.
Point being: Either have those upper layers stop modifying buffers
while they are in-flight (keyword: "stable pages").
A kernel upgrade within the VMs may do it.  Changing something in the
"virtual IO path configuration" may do it.  Or not.

Or live with the results, which are
potentially not identical blocks on the DRBD peers.
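
What a later verify pass then reports can be pictured roughly like this.
Again only a sketch under assumptions: out_of_sync_blocks() is a made-up
helper, the 4096-byte block size and SHA-1 digest are chosen for
illustration, and real online verify exchanges digests over the network
rather than comparing two in-memory images.

```python
import hashlib

BLOCK_SIZE = 4096  # assumed comparison granularity, for illustration only

def out_of_sync_blocks(local, peer, block_size=BLOCK_SIZE):
    """Return the indices of blocks whose digests differ between two images."""
    if len(local) != len(peer):
        raise ValueError("images must be the same size")
    mismatched = []
    for offset in range(0, len(local), block_size):
        a = hashlib.sha1(local[offset:offset + block_size]).digest()
        b = hashlib.sha1(peer[offset:offset + block_size]).digest()
        if a != b:
            mismatched.append(offset // block_size)
    return mismatched

# Two four-block images that differ in a single byte of block 2.
local_image = bytearray(4 * BLOCK_SIZE)
peer_image = bytearray(local_image)
peer_image[2 * BLOCK_SIZE + 10] = 0x42

print(out_of_sync_blocks(local_image, peer_image))   # [2]
```

Such a comparison can only say that a block differs. It cannot say which
side is "right", or whether the block holds transient data (such as VM
swap) that nobody will ever read back.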

-- 
: Lars Ellenberg
: LINBIT | Your Way to High Availability
: DRBD/HA support and consulting http://www.linbit.com

DRBD® and LINBIT® are registered trademarks of LINBIT, Austria.
__
please don't Cc me, but send to list   --   I'm subscribed