[DRBD-user] What causes nodes to become out-of-sync?

Lars Ellenberg lars.ellenberg at linbit.com
Tue Jul 22 11:39:50 CEST 2008



On Mon, Jul 21, 2008 at 09:53:09AM -0700, Jeffrey Froman wrote:
> Hello,
> 
> > what are the sorts of things that might be causing 
> > blocks to become out-of-sync on this resource?
> 
> A bit of follow-up: verification errors have been hitting us this week 
> with increased frequency. We are using Protocol C for replication of 
> a 135GB resource, and the secondary device reports no disk errors of 
> any kind.
> 
> Yet verification continues to fail -- now up to about 3 times per week 
> (checked daily). We now feel rather unconfident that we can switch 
> roles on the two nodes reliably at any time, since we have no idea 
> when or why synchronization is breaking.
> 
> Is it possible that we are experiencing a race condition between 
> verification and replication, and that the blocks which verification 
> finds out-of-sync have actually just been updated by normal 
> replication between nodes during the checksum comparison?

anything is possible.
though there should not be any race condition.
you are using 8.2.6, aren't you?

which file system?

> If so, is there any option that will force retries of failed checksum 
> comparisons for each block that is compared during verification?

no.
did you try to compare the reported blocks "by hand"?
like 
on both# rsn=$reported_sector_number; rsc=$reported_sector_count
on both# dd if=/dev/whatever iflag=direct bs=512 \
	skip=$rsn count=$rsc \
	of=tmp.$HOSTNAME.$rsn.$rsc
then (with hosts named "left" and "right")
on left# scp right:tmp.right.$rsn.$rsc .
       # diff -U0 <(xxd tmp.left.$rsn.$rsc) <(xxd tmp.right.$rsn.$rsc)
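
if the dumps are large, a small helper can narrow the mismatch down to
individual sectors. this is only a sketch and not part of DRBD: the
function name differing_sectors is made up, and it assumes two dump
files taken with bs=512 as in the dd commands above. it prints the
index, relative to the start of the dump, of every 512-byte sector that
differs; add the reported start sector to get the position on the device.

```shell
#!/bin/sh
# differing_sectors FILE_A FILE_B  (hypothetical helper, not a DRBD tool)
# Prints the 0-based index of each 512-byte sector in which the files differ.
differing_sectors() {
    # cmp -l lists every differing byte as: <1-based offset> <octal> <octal>.
    # cmp exits non-zero when the files differ, so its status is ignored.
    { cmp -l "$1" "$2" || true; } |
        awk '{ s = int(($1 - 1) / 512); if (!(s in seen)) { seen[s] = 1; print s } }'
}

# Example (file names follow the dd commands above):
#   differing_sectors tmp.left.$rsn.$rsc tmp.right.$rsn.$rsc
```

no output means the two dumps are byte-for-byte identical.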


> Is there anything else I can add to my configuration to help determine 
> exactly when these blocks are falling out-of-sync?

do you use tcp checksum offloading?
any other tcp offloading?
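
to see which offloads are active, "ethtool -k" on the replication
interface lists them. the filter below is only a convenience sketch: the
function name list_enabled_offloads is made up, and eth0 in the example
is an assumption, use whatever interface carries your DRBD traffic. it
reads "ethtool -k" output on stdin and prints the features reported as "on".

```shell
#!/bin/sh
# list_enabled_offloads  (hypothetical helper)
# Reads `ethtool -k <iface>` output on stdin, prints each feature that is "on".
list_enabled_offloads() {
    # Lines look like "tx-checksumming: on"; sub-features may be indented.
    awk -F': *' 'NF >= 2 && $2 ~ /^on/ { sub(/^[ \t]+/, "", $1); print $1 }'
}

# Example (eth0 is an assumed interface name):
#   ethtool -k eth0 | list_enabled_offloads
```

for a test, offloads can be switched off with something like
"ethtool -K eth0 rx off tx off tso off"; which features can actually be
toggled depends on the NIC driver.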

did you try to enable "data-integrity-alg"?
(go for crc32c, should be good enough)
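
as a sketch, the option goes into the net section of the resource in
drbd.conf. the resource name r0 is a placeholder, and the rest of the
resource definition is left out here:

```
resource r0 {
  net {
    # checksum every data block sent over the replication link;
    # the receiving side verifies it before writing
    data-integrity-alg crc32c;
  }
}
```

the setting has to be the same on both nodes. after changing it,
"drbdadm adjust r0" on each node should apply it; expect the connection
to be re-established when it does.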

-- 
: Lars Ellenberg                           http://www.linbit.com :
: DRBD/HA support and consulting             sales at linbit.com :
: LINBIT Information Technologies GmbH      Tel +43-1-8178292-0  :
: Vivenotgasse 48, A-1120 Vienna/Europe     Fax +43-1-8178292-82 :
__
please don't Cc me, but send to list -- I'm subscribed


