[DRBD-user] BUG: Uncatchable DRBD out-of-sync issue

lars.ellenberg at linbit.com
Mon Feb 24 17:54:45 CET 2014

Note: "permalinks" may not be as permanent as we would like,
direct links of old sources may well be a few messages off.


On Mon, Feb 24, 2014 at 01:28:58PM +0400, Stanislav German-Evtushenko wrote:
> > > Most of the time (99%) I see ERR for the swap space of virtual machines.
> >
> > If you enable "integrity-alg", do you still see those "buffer modified
> > by upper layers during write" messages?
> >
> > Well, then that is your problem,
> > and that problem can *NOT* be fixed with DRBD "config tuning".
> >
> > What does that mean?
> >
> >   Upper layer submits write to DRBD.
> >   DRBD calculates checksum over data buffer.
> >   DRBD sends that checksum.
> >   DRBD submits data buffer to "local" backend block device.
> >       Meanwhile, upper layer changes data buffer.
> >   DRBD sends data buffer to peer.
> >   DRBD receives local completion.
> >   DRBD receives remote ACK.
> >   DRBD completes this write to upper layer.
> >       *only now* would the upper layer be "allowed"
> >       to change that data buffer again.
> >
> >   A misbehaving upper layer results in potentially divergent blocks
> >   on the DRBD peers, just as it would on a local software RAID 1.
> >   That is why the mdadm maintenance script in RHEL, "raid-check",
> >   which is intended to be run periodically from cron, has this
> >   tell-tale chunk:
> >         mismatch_cnt=`cat /sys/block/$dev/md/mismatch_cnt`
> >         # Due to the fact that raid1/10 writes in the kernel are unbuffered,
> >         # a raid1 array can have non-0 mismatch counts even when the
> >         # array is healthy.  These non-0 counts will only exist in
> >         # transient data areas where they don't pose a problem.  However,
> >         # since we can't tell the difference between a non-0 count that
> >         # is just in transient data or a non-0 count that signifies a
> >         # real problem, simply don't check the mismatch_cnt on raid1
> >         # devices as it's providing far too many false positives.  But by
> >         # leaving the raid1 device in the check list and performing the
> >         # check, we still catch and correct any bad sectors there might
> >         # be in the device.
> >         raid_lvl=`cat /sys/block/$dev/md/level`
> >         if [ "$raid_lvl" = "raid1" -o "$raid_lvl" = "raid10" ]; then
> >             continue
> >         fi
> >
> > Anyway, the point is: either get those upper layers to stop modifying
> > buffers while they are in flight (keyword: "stable pages").
> > A kernel upgrade within the VMs may do it.  Changing something in the
> > "virtual IO path configuration" may do it.  Or not.
> >
> > Or live with the result, which is potentially non-identical blocks
> > on the DRBD peers.
> >
> 
> Hello Lars,
> 
> Thank you for the detailed explanation. I've done some more tests and found
> that "out of sync" sectors also appear with master-slave setups, not only
> with master-master.
> 
> Can you share your thoughts on what could cause upper-layer changes in the
> following stack?
> KVM (usually virtio) -> LVM -> DRBD -> RAID10 -> physical drives,
> with LVM snapshots not in use.

The virtual machine itself is most likely doing "it".
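
For what it is worth, those "buffer modified by upper layers during
write" messages only show up while a data integrity algorithm is
configured, so keeping it enabled on the master-slave setup is the
easiest way to confirm the same thing happens there.  A sketch of the
relevant config fragment; the resource name and the algorithm are
placeholders:

    resource r0 {
      net {
        # have DRBD checksum every data block it replicates and log
        # "buffer modified by upper layers during write" whenever an
        # in-flight buffer was changed underneath it
        data-integrity-alg crc32c;
      }
    }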

> Can LVM cause these OOS?

Very unlikely.
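
If you want to quantify how much actually diverges, regardless of which
layer is responsible, an online verify run will count and log the
out-of-sync blocks.  A rough sketch, assuming the resource is called r0
and a verify-alg is configured; note that verify only flags the blocks,
it does not resync them by itself:

    # start an online verify of resource "r0" (placeholder name)
    drbdadm verify r0

    # while it runs, and once it finishes, the oos: field in
    # /proc/drbd shows how much was found out of sync
    cat /proc/drbd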

> Could it help if we switched to the following stack instead?
> KVM (usually virtio) -> DRBD -> LVM -> RAID10 -> physical drives,
> again without LVM snapshots.
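
Independent of the stacking order, it may be worth checking which layers
actually announce the need for stable pages.  A rough sketch, assuming a
reasonably recent kernel that exposes the bdi attribute; the device
names are placeholders, and the second check would be run inside the
guest:

    # 1 means the device asks upper layers not to touch pages while
    # writes are in flight, 0 means it does not
    cat /sys/block/drbd0/bdi/stable_pages_required

    # same check for the virtio disk as seen from inside the guest
    cat /sys/block/vda/bdi/stable_pages_required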


-- 
: Lars Ellenberg
: LINBIT | Your Way to High Availability
: DRBD/HA support and consulting http://www.linbit.com

DRBD® and LINBIT® are registered trademarks of LINBIT, Austria.
__
please don't Cc me, but send to list   --   I'm subscribed


