[DRBD-user] Out-of-sync woes

Luke Pascoe luke at osnz.co.nz
Fri Aug 4 01:11:13 CEST 2017



Hello everyone.

I have a fairly simple 2-node CentOS 7 setup running KVM virtual
machines, with DRBD 8.4.9 between them.

There is one DRBD resource per VM, with at least 1 volume each,
totalling 47 volumes.

There's no clustering or heartbeat or other complexity. DRBD has its
own Gig-E interface to sync over.

I recently migrated a host between nodes and it crashed. During
diagnostics I ran a verification on the DRBD volume for the host and
found that it had _a lot_ of out-of-sync blocks.

This led me to run a verification on all volumes, and while I didn't
find any other volumes with large numbers of out of sync blocks, there
were several with a few. I have disconnected and reconnected all these
volumes, to force them to resync.

I have now set up a nightly cron job which verifies as many volumes as
it can in a 2-hour window. This means I get through the whole lot in
about a week.
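
For anyone wanting to reproduce the nightly job, here is a minimal
sketch of the idea in Python, not the actual script. The names
verify_within_window and drbd_verify_blocking are made up for
illustration. It assumes that "drbdadm verify <resource>" starts an
online verify and that "drbdadm cstate <resource>" reports a state
containing "Verify" (VerifyS/VerifyT) while it is running; please check
both against your DRBD version.

```python
import subprocess
import time


def drbd_verify_blocking(resource, poll_seconds=30):
    """Start an online verify and wait until it is no longer running.

    Assumption: the connection state contains "Verify" while the
    verification is in progress and returns to another state afterwards.
    """
    subprocess.run(["drbdadm", "verify", resource], check=True)
    while True:
        out = subprocess.run(
            ["drbdadm", "cstate", resource],
            check=True, capture_output=True, text=True,
        ).stdout
        if "Verify" not in out:
            return
        time.sleep(poll_seconds)


def verify_within_window(resources, start, window_seconds,
                         run_verify=drbd_verify_blocking,
                         clock=time.monotonic):
    """Verify resources in rotation, beginning at index `start`.

    A new verify is only started while the window is still open, and
    each resource is verified at most once per call. Returns the index
    to resume from on the next night.
    """
    deadline = clock() + window_seconds
    total = len(resources)
    done = 0
    while done < total and clock() < deadline:
        run_verify(resources[(start + done) % total])
        done += 1
    return (start + done) % total if total else 0
```

The returned index would have to be stored somewhere (a small state
file, for example) so that the next night resumes where this one
stopped. Note that a verify which is already running when the window
closes is allowed to finish, so the window can overrun.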

Almost every night, it reports at least one volume that is out of sync,
and I'm trying to understand why that would be.
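
One way to collect the out-of-sync figures after a verify is to read
the "oos:" counters from /proc/drbd. The sketch below is only an
illustration: the function name out_of_sync_kib is made up, and it
assumes the DRBD 8.4 /proc/drbd layout, where each device starts with a
"<minor>: cs:..." line and a following line carries "oos:<n>" in KiB.

```python
import re

# Device header line, e.g. " 0: cs:Connected ro:Primary/Secondary ..."
MINOR_RE = re.compile(r"^\s*(\d+): cs:")
# Out-of-sync counter on the statistics line, e.g. "... wo:f oos:0"
OOS_RE = re.compile(r"\boos:(\d+)")


def out_of_sync_kib(proc_drbd_text):
    """Map each DRBD minor number to its out-of-sync counter (KiB)."""
    result = {}
    minor = None
    for line in proc_drbd_text.splitlines():
        header = MINOR_RE.match(line)
        if header:
            minor = int(header.group(1))
            continue
        counter = OOS_RE.search(line)
        if counter and minor is not None:
            result[minor] = int(counter.group(1))
    return result


def report_out_of_sync(path="/proc/drbd"):
    """Return only the minors whose out-of-sync counter is non-zero."""
    with open(path) as handle:
        counters = out_of_sync_kib(handle.read())
    return {minor: kib for minor, kib in counters.items() if kib > 0}
```

Logging the result of report_out_of_sync() after each nightly run would
at least show whether the same volumes keep turning up.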

I did some research and the only likely candidate I could find was
related to TCP checksum offloading on the NICs, which I have now
disabled, but it has made no difference.

Any suggestions as to what might be going on here?

Thanks.

Luke Pascoe


