[DRBD-user] BUG: Uncatchable DRBD out-of-sync issue

Mon Feb 24 10:28:58 CET 2014

> > Most of the time (99%) I see ERR for the swap space of virtual machines.
>
> If you enable "integrity-alg", do you still see those "buffer modified
> by upper layers during write"?
>
> Well, then that is your problem,
> and that problem can *NOT* be fixed with DRBD "config tuning".
>
> What does that mean?
>
>   Upper layer submits write to DRBD.
>   DRBD calculates checksum over data buffer.
>   DRBD sends that checksum.
>   DRBD submits data buffer to "local" backend block device.
>       Meanwhile, upper layer changes data buffer.
>   DRBD sends data buffer to peer.
>   DRBD receives local completion.
>   DRBD receives remote ACK.
>   DRBD completes this write to upper layer.
>       *only now* would the upper layer be "allowed"
>       to change that data buffer again.
>
>   Misbehaving upper layer results in potentially divergent blocks
>   on the DRBD peers.  Or would result in potentially divergent blocks on
>   a local software RAID 1. Which is why the mdadm maintenance script
>   in rhel, "raid-check", intended to be run periodically from cron,
>   has this tell-tale chunk:
>         mismatch_cnt=`cat /sys/block/$dev/md/mismatch_cnt`
>         # Due to the fact that raid1/10 writes in the kernel are unbuffered,
>         # a raid1 array can have non-0 mismatch counts even when the
>         # array is healthy.  These non-0 counts will only exist in
>         # transient data areas where they don't pose a problem.  However,
>         # since we can't tell the difference between a non-0 count that
>         # is just in transient data or a non-0 count that signifies a
>         # real problem, simply don't check the mismatch_cnt on raid1
>         # devices as it's providing far too many false positives.  But by
>         # leaving the raid1 device in the check list and performing the
>         # check, we still catch and correct any bad sectors there might
>         # be in the device.
>         raid_lvl=`cat /sys/block/$dev/md/level`
>         if [ "$raid_lvl" = "raid1" -o "$raid_lvl" = "raid10" ]; then
>             continue
>         fi
>
> Anyways.
> Point being: Either have those upper layers stop modifying buffers
> while they are in-flight (keyword: "stable pages").
> Kernel upgrade within the VMs may do it.  Changing something in the
> "virtual IO path configuration" may do it.  Or not.
>
> Or live with the results, which are
> potentially not identical blocks on the DRBD peers.
>

Hello Lars,

Thank you for the detailed explanation. I've run some more tests and found
that "out of sync" sectors also appear in a master-slave (Primary/Secondary)
setup, not only in master-master (dual-primary).
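
To check my understanding of the write sequence quoted above, here is a
minimal toy model of it in Python. It is only a sketch, not DRBD code: the
names replicate_write and scribble are made up for illustration, zlib.crc32
stands in for whatever "integrity-alg" is configured, and the local write is
assumed to capture the buffer at submit time (in reality either version may
reach the local disk).

```python
import zlib


def replicate_write(buf, mutate_in_flight=None):
    """Toy model of one replicated write (hypothetical, not DRBD code).

    buf is the upper layer's data buffer (a bytearray).
    mutate_in_flight, if given, models an upper layer that changes the
    buffer while the write is still in flight.
    """
    # Checksum is calculated over the data buffer and sent first.
    checksum = zlib.crc32(bytes(buf))
    # Buffer is submitted to the local backend (assumed snapshot here).
    local_disk = bytes(buf)
    # Meanwhile, the upper layer may change the data buffer.
    if mutate_in_flight is not None:
        mutate_in_flight(buf)
    # Only now is the data buffer sent to the peer.
    peer_received = bytes(buf)
    # The peer compares the received data against the earlier checksum.
    integrity_ok = zlib.crc32(peer_received) == checksum
    return local_disk, peer_received, integrity_ok


def scribble(buf):
    """Hypothetical misbehaving upper layer: flips the first byte."""
    buf[0] ^= 0xFF


good = replicate_write(bytearray(b"swap page contents"))
bad = replicate_write(bytearray(b"swap page contents"), scribble)
print("well-behaved: identical =", good[0] == good[1], "checksum ok =", good[2])
print("misbehaving:  identical =", bad[0] == bad[1], "checksum ok =", bad[2])
```

In this model the well-behaved write leaves both copies identical, while the
misbehaving one leaves the local and peer copies different and fails the
checksum comparison, which would correspond to the "buffer modified by upper
layers during write" message.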

Can you share your thoughts on what could make the upper layers modify
in-flight buffers in the following stack?
KVM (usually virtio) -> LVM -> DRBD -> RAID10 -> Physical drives
(LVM snapshots are not used.)

Can LVM cause these out-of-sync (OOS) blocks? Would it help to switch to the
following stack instead?
KVM (usually virtio) -> DRBD -> LVM -> RAID10 -> Physical drives
(Again, LVM snapshots are not used.)

Stanislav