[DRBD-user] Warning: Data Corruption Issue Discovered in DRBD 8.4 and 9.0

Eric Robinson eric.robinson at psmnv.com
Fri Oct 13 20:34:32 CEST 2017

Note: "permalinks" may not be as permanent as we would like,
direct links of old sources may well be a few messages off.


Correction...

Mounted /dev/vg_without_drbd/lv_without_drbd on /volume1

...should read...

Mounted /dev/vg_without_drbd/lv_without_drbd on /volume2



> -----Original Message-----
> From: drbd-user-bounces at lists.linbit.com [mailto:drbd-user-
> bounces at lists.linbit.com] On Behalf Of Eric Robinson
> Sent: Friday, October 13, 2017 11:31 AM
> To: Lars Ellenberg <lars.ellenberg at linbit.com>; drbd-user at lists.linbit.com
> Subject: Re: [DRBD-user] Warning: Data Corruption Issue Discovered in DRBD
> 8.4 and 9.0
> 
> > First, to "all of you",
> > if someone has some spare hardware and is willing to run the test as
> > suggested by Eric, please do so.
> > Both "no corruption reported after X iterations" and "corruption
> > reported after X iterations" is important feedback.
> > (State the platform and hardware and storage subsystem configuration
> > and other potentially relevant info)
> >
> > Also, interesting question: did you run your non-DRBD tests on the
> > exact same backend (LV, partition, lun, slice, whatever), or on some
> > other "LV" or "partition" on the "same"/"similar" hardware?
> 
> Same hardware. Procedure was as follows:
> 
> 6 x SSD drives in system.
> 
> Created 2 x volume groups:
> 	vgcreate vg_under_drbd0 /dev/sda5 /dev/sdb5 /dev/sdc5 /dev/sdd5 /dev/sde5 /dev/sdf5
> 	vgcreate vg_without_drbd /dev/sda6 /dev/sdb6 /dev/sdc6 /dev/sdd6 /dev/sde6 /dev/sdf6
> 
> Created 2 x LVM arrays:
> 	lvcreate -i6 -I4 -l 100%FREE -nlv_under_drbd0 vg_under_drbd0
> 	lvcreate -i6 -I4 -l 100%FREE -nlv_without_drbd vg_without_drbd
> 
> Started drbd
> 
> Created an ext4 filesystem on /dev/drbd0
> Created an ext4 filesystem on /dev/vg_without_drbd/lv_without_drbd
> 
> Mounted /dev/drbd0 on /volume1
> Mounted /dev/vg_without_drbd/lv_without_drbd on /volume1
> 
> Ran TrimTester on /volume1. It failed after writing 700-900 GB on multiple
> test iterations.
> Ran TrimTester on /volume2. No failure after 20 TB written.
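> 
> In command form, the filesystem and mount steps above were roughly as
> follows (the exact mkfs options were not recorded, so defaults are shown
> here as an assumption):
> 	mkfs.ext4 /dev/drbd0
> 	mkfs.ext4 /dev/vg_without_drbd/lv_without_drbd
> 	mount /dev/drbd0 /volume1
> 	mount /dev/vg_without_drbd/lv_without_drbd /volume2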
> 
> >
> > Now,
> > "something" is different between test run with or without DRBD.
> >
> > First suspect was something "strange" happening with TRIM, but you
> > think you can rule that out, because you ran the test without trim as well.
> >
> > The file system itself may cause discards (explicitly via the mount option
> > "discard", or implicitly via mount options set in the superblock); it does
> > not have to be the "fstrim".
> 
> The discard option was not explicitly set. I'm not sure about implicitly.
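> One way to check for an implicit discard would be something along these
> lines (device path and mount point taken from the setup above):
> 	findmnt -o TARGET,OPTIONS /volume1
> 	tune2fs -l /dev/drbd0 | grep -i "mount options"
> If "discard" shows up in either output, the filesystem is issuing discards
> even without the option being set explicitly at mount time.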
> 
> >
> > Or maybe you still had the fstrim loop running in the background from
> > a previous test, or maybe something else does an fstrim.
> >
> > So we should double check that, to really rule out TRIM as a suspect.
> >
> 
> Good thought, but I was careful to ensure that the shell script which
> performs the trim was not running.
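> 
> For anyone repeating the test, a quick sanity check that nothing is trimming
> in the background might look like this (assuming the script invokes fstrim;
> adjust the pattern to the actual script name, and the systemd timer only
> exists on distributions that ship it):
> 	pgrep -af fstrim
> 	systemctl status fstrim.timer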
> 
> 
> > You can disable all trim functionality in linux by
> > echo 0 > /sys/devices/pci0000:00/0000:00:01.1/ata2/host1/target1:0:0/1:0:0:0/block/sr0/queue/discard_max_bytes
> > (or similar nodes)
> >
> > something like this, maybe:
> > echo 0 | tee /sys/devices/*/*/*/*/*/*/block/*/queue/discard_max_bytes
> >
> > To have that take effect for "higher level" or "logical" devices,
> > you'd have to "stop and start" those, so deactivate DRBD, deactivate
> > volume group, deactivate md raid, then reactivate all of it.
> >
> > double check with "lsblk -D" if the discards now are really disabled.
> >
> > then re-run the tests.
> >
> 
> Okay, I will try that.
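> 
> Something like the following sequence, I assume (the resource name "r0" is a
> guess, adjust to the actual DRBD resource; there is no md raid in this stack,
> so that step is skipped):
> 	echo 0 | tee /sys/devices/*/*/*/*/*/*/block/*/queue/discard_max_bytes
> 	drbdadm down r0
> 	vgchange -an vg_under_drbd0
> 	vgchange -ay vg_under_drbd0
> 	drbdadm up r0
> 	lsblk -D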
> 
> >
> > If corruption is still reported even when we are "certain" that discard is
> > out of the picture, that is an important data point as well.
> >
> > What changes when DRBD is in the IO stack?
> > Timing (when does the backend device see which request) may be changed.
> > Maximum request size may be changed.
> > Maximum *discard* request size *will* be changed, which may result in
> > differently split discard requests on the backend stack.
> >
> > Also, we have additional memory allocations for DRBD meta data and
> > housekeeping, so possibly different memory pressure.
> >
> > End of brain-dump.
> >
> >
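> On the request-size point, the limits DRBD advertises versus those of the
> backend can be compared directly in sysfs, e.g. (device names here are
> assumptions; the striped LV will appear as some dm-N):
> 	grep . /sys/block/drbd0/queue/discard_max_bytes /sys/block/dm-*/queue/discard_max_bytes
> 	grep . /sys/block/drbd0/queue/max_sectors_kb /sys/block/dm-*/queue/max_sectors_kb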
> 
> In the meantime, I tried a different kind of test, as follows:
> 
> ha11a:~ # badblocks -b 4096 -c 4096 -s /dev/drbd0 -w
> Testing with pattern 0xaa: done
> Reading and comparing: done
> Testing with pattern 0x55: done
> Reading and comparing: done
> Testing with pattern 0xff: done
> Reading and comparing: done
> Testing with pattern 0x00: done
> Reading and comparing: done
> 
> Of course, /dev/drbd0 was unmounted at the time.
> 
> It ran for 16 hours and reported NO bad blocks. I'm not sure if this provides
> any useful clues.
> 
> -Eric
> 
> 
> _______________________________________________
> drbd-user mailing list
> drbd-user at lists.linbit.com
> http://lists.linbit.com/mailman/listinfo/drbd-user


