[DRBD-user] Tracking down sources of corruption examined by drbdadm verify

Szeróvay Gergely sg at eplanet.hu
Mon May 5 21:41:03 CEST 2008



I have done the testing; my findings are the following:

Our "most problematic volume" contains a MySQL 5 database. I made a
snapshot of it, then collected the write SQL queries into the binary
log for 3 days. So I have the MySQL database in binary form (120MB) and
an SQL file (90MB), and I can replay the SQL file against the database
with MySQL.
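
For reference, such a replay file can be produced from the collected
binlogs with mysqlbinlog, roughly like this (the file names below are
only examples, not my actual paths):

  # concatenate the collected binlogs into one SQL file
  mysqlbinlog /var/lib/mysql/mysql-bin.000001 \
              /var/lib/mysql/mysql-bin.000002 > replay.sql
  # later, replay it against the restored snapshot
  mysql --user=root --password < replay.sql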

I ran the following test process on my test nodes (drbd0 = 2GB volume,
Primary/Secondary, UpToDate/UpToDate, protocol C):
1. mkfs on drbd0
2. mount drbd0 on /mnt/test
3. extract the snapshot (120MB) to /mnt/test
4. start the MySQL daemon with datadir = /mnt/test
5. replay the SQL file (90MB) with the MySQL client
6. stop the MySQL daemon
7. umount /mnt/test
8. drbdadm verify drbd0, wait until it finishes
9. collect the new entries from syslog
10. disconnect/connect drbd0 to clear the oos blocks, if any

One run takes about 20 mins.
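
One cycle as a rough script (drbd0 is the real resource name; the
paths, the snapshot archive name and the mkfs command are examples and
changed per file system):

  mkfs.ext3 /dev/drbd0                        # or mkreiserfs / mkfs.xfs
  mount /dev/drbd0 /mnt/test
  tar -xzf /root/mysql-snapshot.tar.gz -C /mnt/test
  mysqld_safe --datadir=/mnt/test &
  mysql < /root/replay.sql                    # replay the 90MB SQL file
  mysqladmin shutdown
  umount /mnt/test
  drbdadm verify drbd0                        # runs in the background; watch /proc/drbd
  grep oos: /proc/drbd                        # check the oos counter
  grep 'Out of sync' /var/log/syslog          # collect the new log entries
  drbdadm disconnect drbd0 && drbdadm connect drbd0   # clear oos blocks, if any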

I tried the test process with ReiserFS, XFS and EXT3. Results:
- ReiserFS: 12 runs, found oos blocks after every run
- XFS: 11 runs, no oos blocks
- EXT3: 15 runs, no oos blocks
- LVM2 VG created on drbd0, then an LV with ReiserFS on top: 2 runs,
found oos blocks after every run
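
The LVM variant can be set up along these lines (the VG/LV names and
the LV size are only examples):

  pvcreate /dev/drbd0
  vgcreate testvg /dev/drbd0
  lvcreate -L 1800M -n testlv testvg
  mkreiserfs /dev/testvg/testlv
  mount /dev/testvg/testlv /mnt/test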

I usually ran the test in the following order: ReiserFS, XFS, EXT3,
ReiserFS, XFS, EXT3 ...

So ReiserFS produced oos blocks in every case. I tried it with
different journal options and with protocols A, B and C, but the oos
blocks were always generated.
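
ReiserFS journal behaviour is selected with mount options, for example
(just to illustrate the kind of options I mean, not an exhaustive list):

  mount -t reiserfs -o data=ordered   /dev/drbd0 /mnt/test
  mount -t reiserfs -o data=journal   /dev/drbd0 /mnt/test
  mount -t reiserfs -o data=writeback,notail /dev/drbd0 /mnt/test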

Another interesting observation: I used the "data-integrity-alg md5;"
setting during the first 6 runs. In one case there was no problem, but
in 5 of the 6 runs there were disconnects/reconnects, roughly every
minute ("drbd0: Digest integrity check FAILED. Broken NICs?"). The
problem showed up with every file system. I then tested the network and
disabled the data-integrity-alg feature; with EXT3 and XFS I found no
oos blocks once data integrity checking was disabled. I had experienced
this problem earlier on our production system too, but I thought it was
a firewall or other network problem.
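
For reference, the relevant part of the resource configuration looked
something like this (a sketch that only shows where these options live
in drbd.conf; everything else is omitted):

  resource drbd0 {
    protocol C;                 # A and B were also tried in the ReiserFS runs
    net {
      data-integrity-alg md5;   # removed again for the later runs
    }
    syncer {
      verify-alg md5;           # needed for "drbdadm verify"
    }
    # disks, addresses etc. omitted
  }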

My plan is to move away from ReiserFS because of these problems and
its uncertain future. Based on your experience, can you give me advice
on which file system is the best choice with DRBD? I ask because you
mentioned similar problems with EXT3 in your first reply.


On Tue, Apr 22, 2008 at 8:09 PM, Lars Ellenberg
<lars.ellenberg at linbit.com> wrote:
>
> On Tue, Apr 22, 2008 at 04:20:21PM +0200, Szeróvay Gergely wrote:
>  > On Fri, Apr 18, 2008 at 11:29 AM, Lars Ellenberg
>  > <lars.ellenberg at linbit.com> wrote:
>  > > On Thu, Apr 17, 2008 at 07:19:06PM +0200, Szeróvay Gergely wrote:
>  > >  > >  > Any idea would help.
>  > >  > >
>  > >  > >  what file systems?
>  > >  > >  what kernel version?
>  > >  > >  what drbd protocol?
>  > >  > >
>  > >  > >  it is possible (I got this suspicion earlier, but could not prove it
>  > >  > >  during local testing) that something submits a buffer to the block
>  > >  > >  device stack, but then modifies this buffer while it is still in flight.
>  > >  > >
>  > >  > >  these snippets you show look suspiciously like block maps.  if the block
>  > >  > >  offset also confirms that this is within some filesystem block map, then
>  > >  > >  this is my working theory of what happens:
>  > >  > >
>  > >  > >  ext3 submits block to drbd
>  > >  > >   drbd writes to local storage
>  > >  > >   ext3 modifies the page, even though the bio is not yet completed
>  > >  > >   drbd sends the (now modified) page over network
>  > >  > >   drbd is notified of local completion
>  > >  > >   drbd receives acknowledgement of remote completion
>  > >  > >  original request completed.
>  > >  > >
>  > >  > >  i ran into these things while testing the "data integrity" thing,
>  > >  > >  i.e. "data-integrity-alg md5sum", where every now and then
>  > >  > >  an ext3 on top of drbd would produce "wrong checksums",
>  > >  > >  and the hexdump of the corresponding data payload always
>  > >  > >  looked like a block map, and was different in just one 64bit "pointer".
>  > >
>  > >
>  > > > DRBD 8.2.5 with protocol „C"
>  > >  >
>  > >  > Kernel versions (kernels from kernel.org with Vserver patch):
>  > >  > node „immortal": 2.6.21.6-vs2.2.0.3 32bit smp
>  > >  > node „endless": 2.6.22.18-vs2.2.0.6 32bit smp (with new e1000 driver)
>  > >  > node „infinity": 2.6.22.18-vs2.2.0.6 32bit smp (with new e1000 driver)
>  > >  >
>  > >  > I usually use ReiserFS with group quotas enabled. The DRBD device is
>  > >  > on top of LVM2 (and on software RAID1 in some cases).
>  > >  >
>  > >  > My system often has heavy load, but I cannot find a connection between
>  > >  > the oos blocks and the load. My most problematic volume contains a
>  > >  > Mysql5 database. I tried to stress it by moving big files to the
>  > >  > volume, but the oos blocks were not generated more frequently.
>  > >  >
>  > >  > I tried the crc32 data-integrity-alg on one of the most problematic
>  > >  > volumes; it detected some errors per day, but I think it's not a
>  > >  > network error, because the network passes the tests cleanly and the
>  > >  > full resyncs caused no corruption.
>  > >
>  > >  right. so my working hypothesis is that somehow reiserfs
>  > >  modifies its buffers even while they are in flight.
>  > >
>  > >  because submission to local disk and tcp send over network happen at
>  > >  different times, local disk and remote system see different data.
>  > >
>  > >  to verify that, we could
>  > >   * memcopy the data which is submitted
>  > >   * memcmp it just before it is completed
>  > >  I could provide a patch to do so sometime next week.
>  > >
>  > >  if it is an option, change the filesystem of your "most problematic"
>  > >  volumes to xfs, and see how it behaves then.
>  >
>  >
>
> > I have an idea about how I could reproduce the oos blocks in our test
>  > environment.  I hope I can try it this week.  If I can reproduce it,
>  > I can do the tests with the patch and try the XFS file system.
>
>  doing both at the same time would not prove much.
>  the thing is, I have the suspicion that "something"
>  is modifying in-flight io pages.
>  further, the suspicion is some specific "user" (file system) does this.
>
>  so to gather more data points, collecting "circumstantial evidence",
>  you can try and reproduce the effect with a different "user" (e.g. xfs).
>
>  if the "oos" effect does not show up with xfs,
>  but is reproducible with the other file system,
>  I'd say suspicion confirmed.
>
>  you can do that now.
>
>  the other way is to try to _prove_ that someone does modify in-flight io
>  pages. that would be possible by instrumenting the device driver,
>  i.e. patch drbd, adding additional page allocation,
>  doing copy on submit and verify before completion.
>
>  I could code up such a crude verification code, but it would be
>  debugging code only, just to be clear on that. And since this additional
>  copy/verification step would heavily alter the timing and caching
>  behaviour, adding this debug code may even make the observed effect
>  vanish.
>  we'll see.
>
>
>  > What do you think: if the secondary volume has oos blocks, is the file
>  > system on it damaged?
>
>  difficult to say.
>
>  --
>  : commercial DRBD/HA support and consulting: sales at linbit.com :
>  : Lars Ellenberg                            Tel +43-1-8178292-0  :
>  : LINBIT Information Technologies GmbH      Fax +43-1-8178292-82 :
>  : Vivenotgasse 48, A-1120 Vienna/Europe    http://www.linbit.com :
>  __
>  please use the "List-Reply" function of your email client.
>
>


