Note: "permalinks" may not be as permanent as we would like,
direct links of old sources may well be a few messages off.
On Mon, Feb 24, 2014 at 01:28:58PM +0400, Stanislav German-Evtushenko wrote:
> > > Most of the time (99%) I see ERR for the swap space of virtual machines.
> >
> > If you enable "integrity-alg", do you still see those "buffer modified
> > by upper layers during write"?
> >
> > Well, then that is your problem,
> > and that problem can *NOT* be fixed with DRBD "config tuning".
> >
> > What does that mean?
> >
> > Upper layer submits write to DRBD.
> > DRBD calculates checksum over data buffer.
> > DRBD sends that checksum.
> > DRBD submits data buffer to "local" backend block device.
> > Meanwhile, upper layer changes data buffer.
> > DRBD sends data buffer to peer.
> > DRBD receives local completion.
> > DRBD receives remote ACK.
> > DRBD completes this write to upper layer.
> > *only now* would the upper layer be "allowed"
> > to change that data buffer again.
> >
> > Misbehaving upper layer results in potentially divergent blocks
> > on the DRBD peers. Or would result in potentially divergent blocks on
> > a local software RAID 1. Which is why the mdadm maintenance script
> > in rhel, "raid-check", intended to be run periodically from cron,
> > has this tell-tale chunk:
> >
> >     mismatch_cnt=`cat /sys/block/$dev/md/mismatch_cnt`
> >     # Due to the fact that raid1/10 writes in the kernel are unbuffered,
> >     # a raid1 array can have non-0 mismatch counts even when the
> >     # array is healthy. These non-0 counts will only exist in
> >     # transient data areas where they don't pose a problem. However,
> >     # since we can't tell the difference between a non-0 count that
> >     # is just in transient data or a non-0 count that signifies a
> >     # real problem, simply don't check the mismatch_cnt on raid1
> >     # devices as it's providing far too many false positives. But by
> >     # leaving the raid1 device in the check list and performing the
> >     # check, we still catch and correct any bad sectors there might
> >     # be in the device.
> >     raid_lvl=`cat /sys/block/$dev/md/level`
> >     if [ "$raid_lvl" = "raid1" -o "$raid_lvl" = "raid10" ]; then
> >         continue
> >     fi
> >
> > Anyways.
> > Point being: Either have those upper layers stop modifying buffers
> > while they are in-flight (keyword: "stable pages").
> > Kernel upgrade within the VMs may do it. Changing something in the
> > "virtual IO path configuration" may do it. Or not.
> >
> > Or live with the results, which are
> > potentially not identical blocks on the DRBD peers.
>
> Hello Lars,
>
> Thank you for the detailed explanation. I've done some more tests and found
> that "out of sync" sectors appear for master-slave also, not only for
> master-master.
>
> Can you share your thoughts about what can cause upper layer changes in the
> following schema?
> KVM (usually virtio) -> LVM -> DRBD -> RAID10 -> Physical drives, while LVM
> snapshots are not used.

The virtual machine itself is most likely doing "it".

> Can LVM cause these OOS?

Very unlikely.

> Could it help if we replace by the following schema?
> KVM (usually virtio) -> DRBD -> LVM -> RAID10 -> Physical drives,
> while LVM snapshots are not used.

--
: Lars Ellenberg
: LINBIT | Your Way to High Availability
: DRBD/HA support and consulting http://www.linbit.com

DRBD® and LINBIT® are registered trademarks of LINBIT, Austria.
__
please don't Cc me, but send to list -- I'm subscribed
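
To make the write-path race described above concrete, here is a minimal shell
sketch. It is illustrative only, not DRBD code: /tmp/inflight is a hypothetical
stand-in for the in-flight data buffer, and md5sum stands in for whatever
integrity algorithm is configured.

    # Stand-in for the data buffer the upper layer handed down for writing.
    printf 'original page contents' > /tmp/inflight

    # The checksum is calculated over the buffer and sent ahead of the data.
    sum_at_submit=$(md5sum < /tmp/inflight)

    # Meanwhile the upper layer (e.g. a guest swapping) rewrites the buffer,
    # even though the write has not completed yet.
    printf 'rewritten while in flight' > /tmp/inflight

    # The peer checksums the data it actually received -- the modified
    # buffer -- and compares against the checksum that was sent earlier.
    sum_at_receive=$(md5sum < /tmp/inflight)

    if [ "$sum_at_submit" != "$sum_at_receive" ]; then
        echo "integrity check failed: buffer was modified during the write"
    fi

The same mismatch is harmless for data that is about to be overwritten anyway
(such as swap), but it still leaves the two replicas with different bits in
those blocks, which is exactly what the out-of-sync counters report.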
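On the "stable pages" and "integrity-alg" points, a hedged sketch of where to
look; the device name sdX and the resource file path are assumptions, adjust
them to your setup.

    # On reasonably recent kernels a backing device advertises whether writers
    # must leave pages untouched while they are under I/O:
    # 1 = stable pages required, 0 = pages may be modified in flight.
    cat /sys/block/sdX/bdi/stable_pages_required

    # To detect (not fix) in-flight modifications, a data integrity algorithm
    # can be set in the net section of the DRBD resource, for example in
    # /etc/drbd.d/r0.res:
    #
    #   net {
    #       data-integrity-alg sha1;
    #   }

Enabling the integrity algorithm costs CPU and causes affected writes to be
retransmitted, so it is mainly useful for confirming that the upper layer is
the one modifying the buffers.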