[DRBD-user] Tracking down sources of corruption examined by drbdadm verify

Szeróvay Gergely sg at eplanet.hu
Thu Apr 17 19:19:06 CEST 2008



On Thu, Apr 17, 2008 at 4:56 PM, Lars Ellenberg
<lars.ellenberg at linbit.com> wrote:
>
> On Thu, Apr 17, 2008 at 01:15:21PM +0200, Szeróvay Gergely wrote:
>  > Hello All,
>  >
>  > I have a 3-node system with 25 DRBD-mirrored partitions; their
>  > total size is about 250GB. The 3 nodes are:
>  > - immortal: Intel 82573L Gigabit Ethernet NIC (kernel 2.6.21.6,
>  > driver: e1000, version: 7.3.20-k2-NAPI, firmware-version: 0.5-7)
>  > - endless: Intel 82566DM-2 Gigabit Ethernet NIC (kernel 2.6.22.18,
>  > driver: e1000, version: 7.6.15.4,
>  > firmware-version: 1.3-0)
>  > - infinity: Intel 82573E Gigabit Ethernet NIC (kernel 2.6.22.18,
>  > driver: e1000, version: 7.6.15.4, firmware-version: 3.1-7)
>  >
>  > One month ago I switched to DRBD 8.2.5 from 7.x. Before that I used
>  > the 7.x series without problems. I had no problems during the update;
>  > both sides of the mirrors connected and synced cleanly.
>  >
>  > After updating I started to verify the DRBD volumes:
>  > - most of them usually have no out-of-sync (oos) blocks
>  > - one gets 2-3 new oos blocks almost every day
>  > - a few of them get a new oos block about every week
>  >
>  > I am trying to track down the source of the oos blocks. I read
>  > through the drbd-user list archives, and in the "Tracking down
>  > sources of corruption (possibly) detected by drbdadm verify" thread
>  > I found very useful hints.
>  >
>  > I checked the network connections between every pair of nodes, in
>  > every direction, with this test:
>  >
>  > host1:~ # md5sum /tmp/file_with_1GB_random_data
>  > host2:~ # netcat -l -p 4999 | md5sum
>  > host1:~ # netcat -q0 192.168.x.x 4999 < /tmp/file_with_1GB_random_data
>  >
>  > The test always gives the same md5sums on the two tested nodes; the
>  > transfer speed is about 100MB/sec when the file is cached.
>  >
>  > I repeated this test between every node pair many times and found no
>  > md5 mismatch.
>  >
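A looped version of the same test is useful for catching intermittent
errors; this is only a sketch, the address, port and test file are
placeholders as above, and every checksum printed on host2 should match
the one printed on host1:

host2:~ # while true; do netcat -l -p 4999 | md5sum; done
host1:~ # md5sum /tmp/file_with_1GB_random_data
host1:~ # for i in $(seq 1 50); do netcat -q0 192.168.x.x 4999 < /tmp/file_with_1GB_random_data; sleep 2; done
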
>  > I saved the oos blocks from the underlying device, using commands
>  > like this:
>  >
>  > host:~ # dd iflag=direct bs=512 skip=11993992 count=8 \
>  >     if=/dev/immortal0/65data2 | xxd -a > ./primary_4k_dump
>  >
>  > when the syslog message was
>  >
>  > "Apr 17 11:14:09 immortal kernel: drbd6: Out of sync: start=11993992,
>  > size=8 (sectors)"
>  >
>  > and the primary underlying device was /dev/immortal0/65data2.
>  >
>  > I compared the problematic blocks from the two nodes with diff:
>  > host:~ # diff ./primary_4k_dump ./secondary_4k_dump
>  >
>  > I usually found a 1-2 byte difference between the blocks on the two
>  > nodes, but one time I found that the last 1336 bytes of the block
>  > were zeroed out (on the other node it had "random" data). Two examples:
>  >
>  > One 4k oos block:
>  > c2
>  > < 0000010: 0000 0000 1500 0000 0000 01ff 0000 0000  ................
>  > ---
>  > > 0000010: 0000 0000 1500 0000 0001 01ff 0000 0000  ................
>  >
>  > Another 4k oos block:
>  > 22c22
>  > < 00001f0: 0b85 0000 0000 0000 1800 0000 0000 0000  ................
>  > ---
>  > > 00001f0: 2d79 0000 0000 0000 1800 0000 0000 0000  -y..............
>  >
>  > Any idea would help.
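
The dump-and-compare steps above can also be wrapped into one small
helper; this is only a sketch, it assumes passwordless ssh between the
nodes, and since the peer's backing device path usually differs from
the local one (different volume group names) it is passed separately:

#!/bin/sh
# dump_oos.sh: dump an out-of-sync region reported by drbd
# ("Out of sync: start=S, size=N (sectors)") from the local backing
# device and from the peer's, then diff the two hexdumps.
# usage: ./dump_oos.sh <local-dev> <peer-host> <peer-dev> <start> <count>
LOCAL_DEV=$1   # e.g. /dev/immortal0/65data2
PEER=$2        # e.g. endless
PEER_DEV=$3    # backing device path on the peer node
START=$4       # "start=" value from the syslog line, in 512-byte sectors
COUNT=$5       # "size=" value from the syslog line, in 512-byte sectors

dd iflag=direct bs=512 skip=$START count=$COUNT if=$LOCAL_DEV 2>/dev/null \
  | xxd -a > ./local_dump
ssh $PEER "dd iflag=direct bs=512 skip=$START count=$COUNT if=$PEER_DEV 2>/dev/null" \
  | xxd -a > ./peer_dump
diff ./local_dump ./peer_dump
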
>
>  what file systems?
>  what kernel version?
>  what drbd protocol?
>
>  it is possible (I got this suspicion earlier, but could not prove it
>  during local testing) that something submits a buffer to the block
>  device stack, but then modifies this buffer while it is still in flight.
>
>  these snippets you show look suspiciously like block maps.  if the block
>  offset also confirms that this is within some filesystem block map, then
>  this is my working theory of what happens:
>
>  ext3 submits block to drbd
>   drbd writes to local storage
>   ext3 modifies the page, even though the bio is not yet completed
>   drbd sends the (now modified) page over network
>   drbd is notified of local completion
>   drbd receives acknowledgement of remote completion
>  original request completed.
>
>  i ran into these things while testing the "data integrity" thing,
>  i.e. "data-integrity-alg md5sum", where every now and then
>  an ext3 on top of drbd would produce "wrong checksums",
>  and the hexdump of the corresponding data payload always
>  looked like a block map, and was different in just one 64bit "pointer".
>
>  --
>  : Lars Ellenberg                           http://www.linbit.com :
>  : DRBD/HA support and consulting             sales at linbit.com :
>  : LINBIT Information Technologies GmbH      Tel +43-1-8178292-0  :
>  : Vivenotgasse 48, A-1120 Vienna/Europe     Fax +43-1-8178292-82 :
>  __
>  please don't Cc me, but send to list -- I'm subscribed
>

DRBD 8.2.5 with protocol "C".

Kernel versions (kernels from kernel.org with Vserver patch):
node "immortal": 2.6.21.6-vs2.2.0.3 32bit smp
node "endless": 2.6.22.18-vs2.2.0.6 32bit smp (with new e1000 driver)
node "infinity": 2.6.22.18-vs2.2.0.6 32bit smp (with new e1000 driver)

I usually use ReiserFS with group quotas enabled. The DRBD devices are
on top of LVM2 (and on software RAID1 in some cases).

My system is often under heavy load, but I cannot find a connection
between the oos blocks and the load. My most problematic volume
contains a MySQL 5 database. I tried to stress it by moving big files
onto the volume, but that did not make oos blocks appear more frequently.

I tried the crc32 data-integrity-alg on the most problematic volume; it
detected a few errors per day, but I do not think these are network
errors, because the network passes the tests cleanly and full resyncs
produce no corruption.
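
For reference, the option goes into the net section of the resource in
drbd.conf; a minimal sketch (the resource name below is only a
placeholder for one of my volumes, crc32 is what I tried and md5sum is
the algorithm Lars mentioned):

resource data2 {
  net {
    # checksum every data block sent over the wire, so corruption in
    # transit (or buffers modified while in flight) gets detected
    data-integrity-alg crc32;   # or md5sum
  }
}

After editing the config, "drbdadm adjust data2" should apply the new
net options to the running resource.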

Thank you: Gergely


