[DRBD-user] What causes nodes to become out-of-sync?

Lars Ellenberg lars.ellenberg at linbit.com
Tue Jul 22 19:54:32 CEST 2008



On Tue, Jul 22, 2008 at 09:34:31AM -0700, Jeffrey Froman wrote:
> On Tuesday 22 July 2008 02:39:50 am Lars Ellenberg wrote:
> > you are using 8.2.6, aren't you?
> >
> > which file system?
> 
> We are currently using drbd-8.2.5 under an ext3 filesystem, on 
> CentOS-4.
> 
> > did you try to compare the reported blocks "by hand"?
> 
> I have not, but will do so the next time a verification failure is 
> reported. Thanks for the recipe.
> 
> > do you use tcp checksum offloading?
> > any other tcp offloading?
> 
> Not explicitly, but the nodes are connected via Gigabit NICs, so if I 
> understand correctly, it's possible that checksumming is being 
> offloaded to the NIC.

you can check which offloads are enabled with:

  ethtool -k ethX

> > did you try to enable "data-integrity-alg"?
> > (go for crc32c, should be good enough)
> 
> Ah, will do. This looks like it will solve the problem if in fact it's 
> the NICs that are interfering.

no. not necessarily the nics.

sender:   (MAIN RAM 1) ->->->-> (bus 1) ->->- (NIC 1) ->-> (network)
receiver: (MAIN RAM 2) <-<-<-<- (bus 2) <-<-- (NIC 2) <-<-<-' 

so if (bus 1) flips bits, and (NIC 1) calculates the checksum [offloading],
and (network) does not further mangle the data, the checksum will match
on (NIC 2) and the data is assumed to be correct.

if the checksum is calculated in (MAIN RAM 1) [no offloading],
and (bus 1) then flips some bits, the checksum will likely not match
on the other side.

similarly, when (bus 2) flips bits, that may go undetected if the tcp
checksum was already verified in (NIC 2).

the better "end-to-end" your checksums are,
the more likely they will detect transfer errors.
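
to make the difference concrete, here is a minimal python sketch of the
sender side of the diagram above. it is a toy model, not how the tcp stack
or any nic actually works: zlib.crc32 stands in for the tcp checksum, and
flip_bit() and transfer() are hypothetical helpers made up for illustration.
the network is assumed not to mangle anything.

```python
import zlib


def flip_bit(data: bytes, bit: int) -> bytes:
    """Return a copy of data with one bit inverted (models a bus error)."""
    buf = bytearray(data)
    buf[bit // 8] ^= 1 << (bit % 8)
    return bytes(buf)


def transfer(payload: bytes, offload: bool, bus1_flips: bool):
    """Return (checksum_ok, data_intact) as seen by the receiver.

    offload=True : checksum is computed on (NIC 1), after (bus 1).
    offload=False: checksum is computed in (MAIN RAM 1), before (bus 1).
    """
    # what arrives on (NIC 1) after crossing (bus 1)
    on_nic = flip_bit(payload, 3) if bus1_flips else payload

    # zlib.crc32 is only a stand-in for the real tcp checksum
    checksum = zlib.crc32(on_nic) if offload else zlib.crc32(payload)

    received = on_nic  # assumption: the network itself is clean
    return zlib.crc32(received) == checksum, received == payload


block = b"some replicated block"
print(transfer(block, offload=True, bus1_flips=True))
print(transfer(block, offload=False, bus1_flips=True))
```

with offloading, the first call reports a matching checksum on data that is
no longer what the sender had in ram; without offloading, the second call
reports a mismatch, so the corruption is at least noticed.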

so we have that data-integrity-alg in drbd, which puts a drbd-to-drbd
checksum on the data (and only the data payload, which is a shortcoming;
we should additionally checksum the drbd packet header).
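
a rough python sketch of why covering only the payload is a shortcoming.
the packet layout here (an 8-byte sector number, a 4-byte digest, then the
payload) is invented for illustration and is not the drbd wire format;
zlib.crc32 stands in for crc32c, and all function names are hypothetical.

```python
import struct
import zlib


def make_packet(sector: int, payload: bytes, cover_header: bool) -> bytes:
    """Build header + digest + payload (made-up layout, not DRBD's)."""
    header = struct.pack("!Q", sector)
    covered = header + payload if cover_header else payload
    digest = struct.pack("!I", zlib.crc32(covered))
    return header + digest + payload


def verify(packet: bytes, cover_header: bool) -> bool:
    """Recompute the digest on the receiving side and compare."""
    header, digest, payload = packet[:8], packet[8:12], packet[12:]
    covered = header + payload if cover_header else payload
    return struct.pack("!I", zlib.crc32(covered)) == digest


def corrupt_header(packet: bytes) -> bytes:
    """Flip one bit in the sector number, leaving the payload untouched."""
    buf = bytearray(packet)
    buf[7] ^= 0x01
    return bytes(buf)


data = b"block contents"
# payload-only digest: the damaged sector number goes unnoticed
print(verify(corrupt_header(make_packet(42, data, False)), False))
# digest over header and payload: the damage is detected
print(verify(corrupt_header(make_packet(42, data, True)), True))
```

in the payload-only case the data verifies fine, but it would be applied
to the wrong sector.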

-- 
: Lars Ellenberg                           http://www.linbit.com :
: DRBD/HA support and consulting             sales at linbit.com :
: LINBIT Information Technologies GmbH      Tel +43-1-8178292-0  :
: Vivenotgasse 48, A-1120 Vienna/Europe     Fax +43-1-8178292-82 :
__
please don't Cc me, but send to list -- I'm subscribed


