[DRBD-user] Out-of-sync woes

Tue Aug 1 03:22:36 CEST 2017

Hello everyone.

I have a fairly simple 2-node CentOS 7 setup running KVM virtual machines,
with DRBD 8.4.9 between them.

There is one DRBD resource per VM, with at least 1 volume each, totalling
47 volumes.

There's no clustering, heartbeat or other complexity. DRBD has its own
Gig-E interface to sync over.

I recently migrated a VM between nodes and it crashed. During diagnostics
I ran a verify on the DRBD volume for that VM and found that it had
_a lot_ of out-of-sync blocks.

This led me to run a verify on all volumes. While I didn't find any other
volumes with large numbers of out-of-sync blocks, several had a few. I have
disconnected and reconnected all of these volumes to force them to resync.
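For anyone reproducing the disconnect/reconnect step: online verify in DRBD 8.4 only marks differing blocks as out of sync, and they are resynced the next time the resource reconnects. The sketch below wraps the two `drbdadm` calls for one resource. The function names, the `dry_run` default and the resource name are hypothetical illustrations, not part of the setup described above.

```python
import subprocess
from typing import List


def resync_commands(resource: str) -> List[List[str]]:
    """drbdadm calls that make DRBD resync blocks a verify run marked out of sync."""
    return [
        ["drbdadm", "disconnect", resource],
        ["drbdadm", "connect", resource],
    ]


def force_resync(resource: str, dry_run: bool = True) -> List[str]:
    """Run the disconnect/connect pair, or only print it when dry_run is True.

    Returns the command lines in the order they were (or would be) run.
    """
    executed: List[str] = []
    for cmd in resync_commands(resource):
        line = " ".join(cmd)
        if dry_run:
            print("would run:", line)
        else:
            # check=True raises CalledProcessError if drbdadm fails
            subprocess.run(cmd, check=True)
        executed.append(line)
    return executed
```

Usage, with a hypothetical resource name: `force_resync("vm-example")` prints the two commands; pass `dry_run=False` to actually run them on one node.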

I have now set up a nightly cron job that verifies as many volumes as it
can in a 2-hour window, which means I get through the whole lot in about a
week.
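The post does not say how the cron job decides which volumes fit into the 2-hour window. One possible approach is sketched below. It keeps a cursor between nights, wraps around the list of volumes, and estimates each verify's duration from volume size and an assumed throughput. The `Volume` type, `plan_night` and the throughput figure are all hypothetical. Real verify speed depends on the resource's sync-rate settings and disk load.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Volume:
    name: str        # hypothetical label, e.g. "vm01/0" (resource/volume)
    size_gib: float  # volume size in GiB


def plan_night(volumes: List[Volume], cursor: int, window_s: float,
               rate_mib_s: float) -> Tuple[List[str], int]:
    """Pick volumes starting at `cursor` (wrapping round) whose estimated
    verify time fits in `window_s` seconds at an assumed `rate_mib_s`.

    Returns (names to verify tonight, cursor to store for tomorrow).
    At least one volume is always picked so a very large volume can't
    stall the rotation forever.
    """
    if not volumes:
        return [], 0
    picked: List[str] = []
    used = 0.0
    n = len(volumes)
    for step in range(n):
        vol = volumes[(cursor + step) % n]
        est = vol.size_gib * 1024 / rate_mib_s  # seconds, under the assumed rate
        if picked and used + est > window_s:
            break
        picked.append(vol.name)
        used += est
    return picked, (cursor + len(picked)) % n
```

With ten 100 GiB volumes at an assumed 100 MiB/s, each verify is estimated at 1024 s, so a 7200 s window takes seven volumes a night and the cursor advances accordingly.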

Almost every night it reports at least one volume that is out of sync, and
I'm trying to understand why that would be.
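A nightly check like this can read the result of each verify from `/proc/drbd`. In DRBD 8.4, each device line (` 0: cs:...`) is followed by a counters line whose `oos:` field is the amount of storage currently out of sync, in KiB. The parser below is a sketch that maps device minor numbers to that value. The sample text is illustrative and does not come from the poster's system. The function name is hypothetical.

```python
import re
from typing import Dict, Optional

# Device line in DRBD 8.4 /proc/drbd, e.g. " 0: cs:Connected ro:Primary/Secondary ..."
_DEVICE = re.compile(r"^\s*(\d+):\s+cs:")
# Counters line contains e.g. "... ep:1 wo:f oos:36" (oos is in KiB)
_OOS = re.compile(r"\boos:(\d+)")


def out_of_sync_kib(proc_text: str) -> Dict[int, int]:
    """Map each DRBD minor number to its out-of-sync amount in KiB."""
    result: Dict[int, int] = {}
    minor: Optional[int] = None
    for line in proc_text.splitlines():
        dev = _DEVICE.match(line)
        if dev:
            minor = int(dev.group(1))
            continue
        oos = _OOS.search(line)
        if oos and minor is not None:
            result[minor] = int(oos.group(1))
            minor = None  # only the first counters line after a device line counts
    return result


SAMPLE = """version: 8.4.9-1 (api:1/proto:86-101)
 0: cs:Connected ro:Primary/Secondary ds:UpToDate/UpToDate C r-----
    ns:1024 nr:0 dw:2048 dr:4096 al:8 bm:0 lo:0 pe:0 ua:0 ap:0 ep:1 wo:f oos:0
 1: cs:Connected ro:Primary/Secondary ds:UpToDate/UpToDate C r-----
    ns:0 nr:0 dw:0 dr:0 al:0 bm:0 lo:0 pe:0 ua:0 ap:0 ep:1 wo:f oos:36
"""
```

On a live node you would pass `open("/proc/drbd").read()` instead of `SAMPLE`. Any minor with a non-zero value after a verify has blocks waiting to be resynced.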

I did some research, and the only likely candidate I could find was TCP
checksum offloading on the NICs. I have now disabled it, but it has made
no difference.

Any suggestions as to what might be going on here?

Thanks.

Luke Pascoe

*E* luke at osnz.co.nz
*P* +64 (9) 296 2961
*M* +64 (27) 426 6649
*W* www.osnz.co.nz

24 Wellington St
Papakura
Auckland, 2110
New Zealand