[DRBD-user] BUG: Uncatchable DRBD out-of-sync issue

Stanislav German-Evtushenko ginermail at gmail.com
Mon Feb 24 10:28:58 CET 2014

Note: "permalinks" may not be as permanent as we would like,
direct links of old sources may well be a few messages off.

> > Most of the time (99%) I see ERR for the swap space of virtual machines.
> If you enable "integrity-alg", do you still see those "buffer modified
> by upper layers during write"?
> Well, then that is your problem,
> and that problem can *NOT* be fixed with DRBD "config tuning".
> What does that mean?
>   Upper layer submits write to DRBD.
>   DRBD calculates checksum over data buffer.
>   DRBD sends that checksum.
>   DRBD submits data buffer to "local" backend block device.
>       Meanwhile, upper layer changes data buffer.
>   DRBD sends data buffer to peer.
>   DRBD receives local completion.
>   DRBD receives remote ACK.
>   DRBD completes this write to upper layer.
>       *only now* would the upper layer be "allowed"
>       to change that data buffer again.
>   Misbehaving upper layer results in potentially divergent blocks
>   on the DRBD peers.  Or would result in potentially divergent blocks on
>   a local software RAID 1. Which is why the mdadm maintenance script
>   in rhel, "raid-check", intended to be run periodically from cron,
>   has this tell-tale chunk:
>         mismatch_cnt=`cat /sys/block/$dev/md/mismatch_cnt`
>         # Due to the fact that raid1/10 writes in the kernel are unbuffered,
>         # a raid1 array can have non-0 mismatch counts even when the
>         # array is healthy.  These non-0 counts will only exist in
>         # transient data areas where they don't pose a problem.  However,
>         # since we can't tell the difference between a non-0 count that
>         # is just in transient data or a non-0 count that signifies a
>         # real problem, simply don't check the mismatch_cnt on raid1
>         # devices as it's providing far too many false positives.  But by
>         # leaving the raid1 device in the check list and performing the
>         # check, we still catch and correct any bad sectors there might
>         # be in the device.
>         raid_lvl=`cat /sys/block/$dev/md/level`
>         if [ "$raid_lvl" = "raid1" -o "$raid_lvl" = "raid10" ]; then
>             continue
>         fi
> Anyways.
> Point being: Either have those upper layers stop modifying buffers
> while they are in-flight (keyword: "stable pages").
> Kernel upgrade within the VMs may do it.  Changing something in the
> "virtual IO path configuration" may do it.  Or not.
> Or live with the results, which are
> potentially not identical blocks on the DRBD peers.

Hello Lars,

Thank you for the detailed explanation. I've done some more tests and found
that "out of sync" sectors appear in master-slave setups as well, not only in
master-master (dual-primary).

Can you share your thoughts on what could cause upper-layer changes in the
following stack (LVM snapshots are not used)?
KVM (usually virtio) -> LVM -> DRBD -> RAID10 -> physical drives
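
For what it is worth, I also tried to check whether the devices in that path ask
for stable pages. If I understand correctly, newer kernels expose this per
backing device (sda below is only an example, and I am not sure every kernel we
run already has the attribute):

    cat /sys/block/sda/bdi/stable_pages_required   # 1 = the device requests stable pages (writers wait for in-flight writeback)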

Can LVM cause these out-of-sync blocks? Could it help if we switched to the
following stack instead (again without LVM snapshots)?
KVM (usually virtio) -> DRBD -> LVM -> RAID10 -> physical drives
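
Also, just to double-check that I follow the sequence you describe, here is a
rough userspace illustration of the effect, using plain files and md5sum instead
of block devices and the DRBD integrity algorithm:

    # "upper layer" prepares a 4 KiB buffer
    dd if=/dev/urandom of=buf.bin bs=4k count=1 2>/dev/null
    # checksum calculated over the buffer (what would be sent to the peer)
    md5sum buf.bin
    # upper layer modifies the buffer while the write is still in flight
    printf 'X' | dd of=buf.bin bs=1 conv=notrunc 2>/dev/null
    # the modified buffer is what actually reaches the local backing device
    cp buf.bin backing.bin
    md5sum backing.bin                             # no longer matches the checksum calculated above

If that picture is right, then, as you say, no DRBD tuning can prevent it.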
