[DRBD-user] Tracking down sources of corruption examined by drbdadm verify

Philipp Reisner philipp.reisner at linbit.com
Wed May 7 10:39:02 CEST 2008

On Monday, 5 May 2008 21:41:03, Szeróvay Gergely wrote:
> I have done the testing; my experiences are the following:

Hi Szeróvay,

Thank you very much for running these tests:

> I tried the test process with ReiserFS, XFS and EXT3. Results:
> - ReiserFS: 12 runs, found oos blocks after every run
> - XFS: 11 runs, no oos blocks
> - EXT3: 15 runs, no oos blocks
> - create an LVM2 VG on drbd0, then use a LV with ReiserFS: 2 runs,
> found oos blocks after every run

I would say the results speak for themselves. ReiserFS clearly modifies
data pages that are under IO.

  Put another way, ReiserFS does not care which version of the data
  actually makes it to disk.
  This need not be a bug in ReiserFS; it can be on purpose. For example,
  a write that was submitted later but finished earlier than the
  modified-in-flight block may have removed that block from some on-disk
  data structure. It is then valid for ReiserFS to modify the block
  while it is under IO, since it is no longer part of the on-disk data
  structure.

  We already know that the swap code behaves similarly. If a page gets
  touched while it is under write-out IO, the swap code allows the
  modification even though the block layer still has the page under IO.
  The swap code therefore does not know which version actually got
  written to disk, but it does not care, since it knows that it has
  the up-to-date version in core memory.

That is, if you put swap space on DRBD you will see the same
out-of-sync blocks, but that is okay.
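
The effect described above can be illustrated with a toy model. This is
only a sketch and not DRBD's actual code: the function names are
hypothetical, and it assumes that the local write and the network send
read the same in-memory page at two different moments.

```python
def replicate_page(page: bytearray, modify_in_flight: bool):
    """Toy model: one in-memory page is copied to a local and a peer 'disk'.

    Assumption: the local write samples the page first and the network
    send samples it later. Real IO ordering may differ.
    """
    local_disk = bytes(page)      # local write reads the page now
    if modify_in_flight:
        page[0:4] = b"NEW!"       # upper layer touches the page while under IO
    peer_disk = bytes(page)       # network send reads the page later
    return local_disk, peer_disk


def out_of_sync(local_disk: bytes, peer_disk: bytes) -> bool:
    """A verify pass would flag the block if the two copies differ."""
    return local_disk != peer_disk


if __name__ == "__main__":
    page = bytearray(b"OLD!" + bytes(4092))
    print(out_of_sync(*replicate_page(page, modify_in_flight=True)))
```

In this model neither copy is "wrong" from the upper layer's point of
view; the two nodes simply ended up with different versions of a block
that the upper layer no longer relies on.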

We need to add this to the documentation, and we will also add a
paragraph regarding ReiserFS, thanks to your work!

> Another interesting experience: I used the "data-integrity-alg md5;"
> setting during the first 6 runs. In one case there was no problem, but
> in 5 of the 6 runs there was a disconnect/reconnect, usually every
> minute (drbd0: Digest integrity check FAILED. Broken NICs?). The
> problem showed up with every fs. Then I tested the network and disabled
> the data-integrity-alg feature. In the case of EXT3 & XFS I found no
> oos blocks with data integrity checking disabled. I experienced this
> problem earlier in our production system too, but I thought it was a
> firewall or other network problem.

When a layer above us modifies blocks while we have them under IO, we
might compute our md5 sum over the old version and then send the new
version over the network. The digest check on the receiving side then
fails.

So, ReiserFS's behavior also explains this.
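
A minimal sketch of that race, assuming the sender computes the digest
first and copies the payload afterwards. The function names are
hypothetical and this is not how DRBD is implemented internally; it only
shows why the receiver's comparison fails even though the network is
fine.

```python
import hashlib


def send_block(page: bytearray, modify_in_flight: bool):
    """Toy sender: digest is taken first, payload is copied afterwards."""
    digest = hashlib.md5(page).digest()   # md5 over the old version
    if modify_in_flight:
        page[0] ^= 0xFF                   # upper layer changes the page under IO
    payload = bytes(page)                 # the (possibly new) version goes on the wire
    return digest, payload


def receiver_accepts(digest: bytes, payload: bytes) -> bool:
    """Toy receiver: recompute md5 over what arrived and compare."""
    return hashlib.md5(payload).digest() == digest


if __name__ == "__main__":
    digest, payload = send_block(bytearray(4096), modify_in_flight=True)
    print(receiver_accepts(digest, payload))
```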

> My plan is to move away from ReiserFS because of these problems and
> its unclear future. To your knowledge, can you advise me which fs is
> the best choice with DRBD? I ask because you mentioned similar
> problems with EXT3 in your first reply.

To my knowledge this was a bug in EXT3 that has been fixed in the meantime
in all relevant branches (sucker and upstream).

We like XFS very much for various reasons, but EXT3 is also an excellent
choice for sure.

: Dipl-Ing Philipp Reisner                      Tel +43-1-8178292-50 :
: LINBIT Information Technologies GmbH          Fax +43-1-8178292-82 :
: Vivenotgasse 48, 1120 Vienna, Austria        http://www.linbit.com :
