[DRBD-user] Warning: Data Corruption Issue Discovered in DRBD 8.4 and 9.0

Eric Robinson eric.robinson at psmnv.com
Fri Oct 13 20:31:23 CEST 2017

Note: "permalinks" may not be as permanent as we would like,
direct links of old sources may well be a few messages off.

> First, to "all of you",
> if someone has some spare hardware and is willing to run the test as
> suggested by Eric, please do so.
> Both "no corruption reported after X iterations" and "corruption reported
> after X iterations" is important feedback.
> (State the platform and hardware and storage subsystem configuration and
> other potentially relevant info)
> Also, interesting question: did you run your non-DRBD tests on the exact
> same backend (LV, partition, lun, slice, whatever), or on some other "LV" or
> "partition" on the "same"/"similar" hardware?

Same hardware. Procedure was as follows:

6 x SSD drives in system.

Created 2 x volume groups:
	vgcreate vg_under_drbd0 /dev/sda5 /dev/sdb5 /dev/sdc5 /dev/sdd5 /dev/sde5 /dev/sdf5
	vgcreate vg_without_drbd /dev/sda6 /dev/sdb6 /dev/sdc6 /dev/sdd6 /dev/sde6 /dev/sdf6

Created 2 x striped logical volumes:
	lvcreate -i6 -I4 -l 100%FREE -nlv_under_drbd0 vg_under_drbd0
	lvcreate -i6 -I4 -l 100%FREE -nlv_without_drbd vg_without_drbd

Started drbd
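(Roughly; the resource name r0 below is illustrative:)
	drbdadm up r0
	drbdadm primary r0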

Created an ext4 filesystem on /dev/drbd0
Created an ext4 filesystem on /dev/vg_without_drbd/lv_without_drbd
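(I.e., something like:)
	mkfs.ext4 /dev/drbd0
	mkfs.ext4 /dev/vg_without_drbd/lv_without_drbd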

Mounted /dev/drbd0 on /volume1
Mounted /dev/vg_without_drbd/lv_without_drbd on /volume2
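(That is:)
	mount /dev/drbd0 /volume1
	mount /dev/vg_without_drbd/lv_without_drbd /volume2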

Ran TrimTester on /volume1. On multiple test iterations, it failed after writing 700-900 GB.
Ran TrimTester on /volume2. No failure after 20 TB written.

> Now,
> "something" is different between test run with or without DRBD.
> First suspect was something "strange" happening with TRIM, but you think
> you can rule that out, because you ran the test without trim as well.
> The file system itself may cause discards (explicit mount option "discard",
> implicit potentially via mount options set in the superblock), it does not have
> to be the "fstrim".

The discard option was not explicitly set. I'm not sure whether it was set implicitly.
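(I suppose I can check the superblock defaults and the live mount options with something like:)
	tune2fs -l /dev/drbd0 | grep -i "mount options"
	findmnt -no OPTIONS /volume1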

> Or maybe you still had the fstrim loop running in the background from a
> previous test, or maybe something else does an fstrim.
> So we should double check that, to really rule out TRIM as a suspect.

Good thought, but I was careful to ensure that the shell script which performs the trim was not running.
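(Easy enough to double-check with something like:)
	pgrep -af fstrim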

> You can disable all trim functionality in Linux by echo 0 >
> /sys/devices/pci0000:00/0000:00:01.1/ata2/host1/target1:0:0/1:0:0:0/block/sr0/queue/discard_max_bytes
> (or similar nodes)
> something like this, maybe:
> echo 0 | tee  /sys/devices/*/*/*/*/*/*/block/*/queue/discard_max_bytes
> To have that take effect for "higher level" or "logical" devices, you'd have to
> "stop and start" those, so deactivate DRBD, deactivate volume group,
> deactivate md raid, then reactivate all of it.
> Double-check with "lsblk -D" that the discards are now really disabled,
> then re-run the tests.

Okay, I will try that. 
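(Before re-running, I can verify that the knobs really read zero on every device with something like the command below; "lsblk -D" should then show 0B in the DISC-MAX column:)
	grep . /sys/block/*/queue/discard_max_bytes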

> In case "corruption reported" even if we are "certain" that discard is out of
> the picture, that is an important data point as well.
> What changes when DRBD is in the IO stack?
> Timing (when does the backend device see which request) may be changed.
> Maximum request size may be changed.
> Maximum *discard* request size *will* be changed, which may result in
> differently split discard requests on the backend stack.
> Also, we have additional memory allocations for DRBD meta data and
> housekeeping, so possibly different memory pressure.
> End of brain-dump.

In the meantime, I tried a different kind of test, as follows:

ha11a:~ # badblocks -b 4096 -c 4096 -s /dev/drbd0 -w
Testing with pattern 0xaa: done
Reading and comparing: done
Testing with pattern 0x55: done
Reading and comparing: done
Testing with pattern 0xff: done
Reading and comparing: done
Testing with pattern 0x00: done
Reading and comparing: done

Of course, /dev/drbd0 was unmounted at the time. 
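(For the record: -w runs the destructive write-mode test, writing the patterns 0xaa, 0x55, 0xff, and 0x00 over the whole device and reading each back for comparison; -b 4096 sets the block size, -c 4096 the number of blocks tested at a time, and -s shows progress.)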

It ran for 16 hours and reported NO bad blocks. I'm not sure if this provides any useful clues.  

