[DRBD-user] Out-of-sync woes

Jan Schermer jan at schermer.cz
Fri Aug 4 10:31:02 CEST 2017


AFAIK this should not affect data integrity at rest (which is what “verify-alg” checks) but only data in flight (csum-alg), and even then at most a few blocks (the ones that are in flight) should be affected? (btw, shouldn’t stable_pages_required be enabled?)

I think it’s more likely he’s hitting one of a number of bugs that are getting fixed in DRBD, where it would simply not resync data while appearing Consistent/UpToDate etc. I urge you to run drbdsetup status --verbose --statistics $resource and look for an out-of-sync counter >0.
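
To make that check scriptable, here is a minimal Python sketch that sums every out-of-sync counter found in the status text. It assumes the counters appear as "out-of-sync:<number>" tokens; the helper name out_of_sync_total and the SAMPLE text are made up for illustration and are not real output from any cluster.

```python
import re

# Assumption: counters are printed as "out-of-sync:<number>" tokens.
OOS_FIELD = re.compile(r"\bout-of-sync:(\d+)")


def out_of_sync_total(status_text: str) -> int:
    """Sum every out-of-sync counter found in the given status text."""
    return sum(int(value) for value in OOS_FIELD.findall(status_text))


# Hypothetical status text, shaped only roughly like the real output.
SAMPLE = """\
vm1 role:Primary
  volume:0 disk:UpToDate
  peer role:Secondary
    volume:0 replication:Established peer-disk:UpToDate
        received:0 sent:1024 out-of-sync:128 pending:0 unacked:0
"""

if __name__ == "__main__":
    total = out_of_sync_total(SAMPLE)
    if total > 0:
        print(f"out-of-sync counter is {total}, resync needed")
```

In practice the text would come from the command itself, for example subprocess.run(["drbdsetup", "status", "--verbose", "--statistics", resource], capture_output=True, text=True).stdout, on versions that provide that subcommand.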

We used cache=none with qemu and switched to cache=writeback with no corruption - you just need to take care that the resource is Primary on only one node at a time (it works with live migrations if you know what you’re doing, though).
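
For reference, the cache mode is a per-disk setting on the qemu command line. A minimal Python sketch of building the value for qemu's -drive option, assuming a raw DRBD block device attached via virtio; the helper name drive_option and the path /dev/drbd0 are only examples, not taken from anyone's setup here.

```python
# Cache modes accepted by qemu's -drive option.
QEMU_CACHE_MODES = {"none", "writethrough", "writeback", "directsync", "unsafe"}


def drive_option(device: str, cache: str) -> str:
    """Return the value for qemu's -drive option for one block-backed disk."""
    if cache not in QEMU_CACHE_MODES:
        raise ValueError(f"unknown qemu cache mode: {cache!r}")
    # format=raw and if=virtio are assumptions for this example.
    return f"file={device},format=raw,if=virtio,cache={cache}"


if __name__ == "__main__":
    # /dev/drbd0 is a hypothetical device path.
    print("-drive", drive_option("/dev/drbd0", "writeback"))
```

With libvirt the same setting lives in the cache attribute of the disk's driver element instead of on the command line.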

Jan


> On 4 Aug 2017, at 09:55, Veit Wahlich <cru.lists at zodia.de> wrote:
> 
> Hi Luke,
> 
> I assume you are experiencing the results of data inconsistency by
> in-flight writes. This means that a process (here your VM's qemu) can
> change a block that already waits to be written to disk.
> Whether this happens (undetected) or not depends on how the data is
> accessed for writing and synced to disk.
> 
> For qemu, you have to consider two factors; the guest OS' file systems'
> configuration and qemu's disk caching configuration:
> On Linux guests, this usually only happens with file systems that are
> NOT mounted either sync or with barriers, and with block-backed
> swap.
> On Windows guests it always happens.
> For qemu it depends on how the disk caching strategy is configured and
> thus whether it allows in-flight writes or not.
> 
> The common position is to configure qemu for writethrough caching for
> all disks and leave your guests' OS unchanged. You will also have to
> ignore/override libvirt's warning about unsafe migration with this cache
> setting, as it only applies to file-backed VM disks, not
> blockdev-backed.
> I use this for hundreds of both Linux and Windows VMs backed by DRBD
> block devices and have no inconsistency problems at all since this
> change.
> 
> Changing qemu's caching strategy might affect performance.
> For performance reasons you are advised to use a hardware RAID
> controller with battery-backed write-back cache.
> 
> For consistency reasons you are advised to use real hardware RAID, too,
> as the in-flight block changing problem described above might also
> affect mdraid, dmraid/FakeRAID, LVM mirroring, etc. (depending on
> configuration).
> 
> Best regards,
> // Veit
> 
> 
> Am Freitag, den 04.08.2017, 11:11 +1200 schrieb Luke Pascoe:
>> Hello everyone.
>> 
>> I have a fairly simple 2-node CentOS 7 setup running KVM virtual
>> machines, with DRBD 8.4.9 between them.
>> 
>> There is one DRBD resource per VM, with at least 1 volume each,
>> totalling 47 volumes.
>> 
>> There's no clustering or heartbeat or other complexity. DRBD has its
>> own Gig-E interface to sync over.
>> 
>> I recently migrated a host between nodes and it crashed. During
>> diagnostics I did a verification on the drbd volume for the host and
>> found that it had _a lot_ of out of sync blocks.
>> 
>> This led me to run a verification on all volumes, and while I didn't
>> find any other volumes with large numbers of out of sync blocks, there
>> were several with a few. I have disconnected and reconnected all these
>> volumes, to force them to resync.
>> 
>> I have now set up a nightly cron job which will verify as many volumes
>> as it can in a 2-hour window; this means I get through the whole lot
>> in about a week.
>> 
>> Almost every night, it reports at least 1 volume which is out-of-sync,
>> and I'm trying to understand why that would be.
>> 
>> I did some research and the only likely candidate I could find was
>> related to TCP checksum offloading on the NICs, which I have now
>> disabled, but it has made no difference.
>> 
>> Any suggestions what might be going on here?
>> 
>> Thanks.
>> 
>> Luke Pascoe
>> _______________________________________________
>> drbd-user mailing list
>> drbd-user at lists.linbit.com
>> http://lists.linbit.com/mailman/listinfo/drbd-user
> 
> 



