[DRBD-user] Warning: Data Corruption Issue Discovered in DRBD 8.4 and 9.0

Eric Robinson eric.robinson at psmnv.com
Wed Oct 11 23:30:11 CEST 2017


Hi Lars -

I'm finally back from my trip and eager to get rolling on this. 

>Actually, alarming.

Glad we agree on that!

> Which *exact* DRBD module versions, identified by their git commit ids?

Does this answer your question?

ha11a:~ # modinfo drbd
filename:       /lib/modules/4.4.74-92.29-default/updates/drbd.ko
alias:          block-major-147-*
license:        GPL
version:        8.4.10-1
description:    drbd - Distributed Replicated Block Device v8.4.10-1
author:         Philipp Reisner <phil at linbit.com>, Lars Ellenberg <lars at linbit.com>
srcversion:     611DC432097FDFEB703FF9F
depends:        libcrc32c
vermagic:       4.4.74-92.29-default SMP mod_unload modversions

> "to make sure SSD TRIM was not a factor":
> how exactly did you try to do that?

The TrimTester program consists of three parts. The main executable (TrimTester) just writes loads of data to the drive and tests for file corruption. My C++ consultant says, "It writes sequential numbers wrapped at 256, spanning multiple files. It checks previously written files, and if the file data is all zeroes, it is considered to be corrupted."

The other two parts of the tool are shell scripts. One script periodically calls fstrim; the other periodically drops the caches. I simply ran the TrimTester executable without the scripts, so the fstrim command never got called during the test.
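To make the check concrete, here is a rough Python sketch of the behavior my consultant described. It is not the actual TrimTester source (which is C++); the function names, file names, and sizes are all made up for illustration, and the real tool's file layout may differ.

```python
import os


def write_pattern(path, size, start=0):
    """Write `size` bytes of sequential numbers wrapped at 256.

    Returns the counter value the next file should start from, so the
    sequence spans multiple files.
    """
    data = bytes((start + i) % 256 for i in range(size))
    with open(path, "wb") as f:
        f.write(data)
        f.flush()
        os.fsync(f.fileno())  # push the data down to the block device
    return (start + size) % 256


def is_all_zeroes(path, chunk_size=1 << 20):
    """Return True if the file is non-empty and every byte is zero."""
    seen_data = False
    with open(path, "rb") as f:
        while True:
            chunk = f.read(chunk_size)
            if not chunk:
                break
            seen_data = True
            if chunk.count(0) != len(chunk):
                return False
    return seen_data


def run_demo(directory, n_files=4, size=4096):
    """Write a few files, re-checking all earlier files after each write.

    Hypothetical driver: files of 2 bytes or more always contain a
    non-zero byte when written correctly, so an all-zero file is
    treated as corrupted.
    """
    counter = 0
    paths = []
    corrupted = []
    for n in range(n_files):
        path = os.path.join(directory, f"trimtest_{n}.bin")
        counter = write_pattern(path, size, counter)
        paths.append(path)
        corrupted = [p for p in paths if is_all_zeroes(p)]
    return paths, corrupted
```

Calling run_demo() on a directory under the mount point and looking at the second return value would list any files that read back as all zeroes.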

> What are the ext4 mount options,
> explicit or implicit?
> (as reported by tune2fs and /proc/mounts)

ha11a:~ # cat /proc/mounts|grep ha
/dev/drbd0 /ha01_mysql ext4 rw,relatime,stripe=6,data=ordered 0 0
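
For anyone who wants to check this mechanically, a small Python sketch that splits a /proc/mounts line into its options and looks for ext4's `discard` option (online TRIM). The helper name mount_options is hypothetical, and this only covers what /proc/mounts reports; it does not look at the defaults stored in the superblock that tune2fs shows.

```python
def mount_options(mounts_text, device):
    """Return the option list for `device` from /proc/mounts-style text.

    Each line has the fields: device, mount point, fs type, options,
    dump, pass. Returns None if the device is not listed.
    """
    for line in mounts_text.splitlines():
        fields = line.split()
        if len(fields) >= 4 and fields[0] == device:
            return fields[3].split(",")
    return None


# The line reported above, copied verbatim.
sample = "/dev/drbd0 /ha01_mysql ext4 rw,relatime,stripe=6,data=ordered 0 0\n"

opts = mount_options(sample, "/dev/drbd0")
print(opts)               # ['rw', 'relatime', 'stripe=6', 'data=ordered']
print("discard" in opts)  # False
```

On a live system the text would come from reading /proc/mounts instead of the sample string.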

> To gather a few more data points,
> does the behavior on DRBD change, if you
>   disk { disable-write-same; } # introduced only with drbd 8.4.10
> or if you set
>   disk { al-updates no; } # affects timing, among other things

8.4.1 did not recognize the 'disable-write-same' option, but I'm testing right now with 'al-updates no' and I'll report the results!

> Can you reproduce with other backend devices?

I don't have any other backend devices to test with. All I know is that the problem does not occur when writing directly to the devices (bypassing the drbd layer).

