[DRBD-user] Warning: Data Corruption Issue Discovered in DRBD 8.4 and 9.0

Eric Robinson eric.robinson at psmnv.com
Thu Oct 12 02:58:20 CEST 2017


> > To gather a few more data points,
> > does the behavior on DRBD change, if you
> >   disk { disable-write-same; } # introduced only with drbd 8.4.10
> > or if you set
> >   disk { al-updates no; } # affects timing, among other things

Yes, the behavior is the same with 'al-updates no'. The program detected a corrupt file after writing 823 GB. 

Plus, see my answers below...


> -----Original Message-----
> From: Eric Robinson
> Sent: Wednesday, October 11, 2017 2:30 PM
> To: Eric Robinson <eric.robinson at psmnv.com>; Lars Ellenberg
> <lars.ellenberg at linbit.com>; drbd-user at lists.linbit.com
> Subject: RE: [DRBD-user] Warning: Data Corruption Issue Discovered in DRBD
> 8.4 and 9.0
> 
> Hi Lars -
> 
> I'm finally back from my trip and eager to get rolling on this.
> 
> >Interesting.
> >Actually, alarming.
> 
> Glad we agree on that!
> 
> > Which *exact* DRBD module versions, identified by their git commit ids?
> 
> Does this answer your question?
> 
> ha11a:~ # modinfo drbd
> filename:       /lib/modules/4.4.74-92.29-default/updates/drbd.ko
> alias:          block-major-147-*
> license:        GPL
> version:        8.4.10-1
> description:    drbd - Distributed Replicated Block Device v8.4.10-1
> author:         Philipp Reisner <phil at linbit.com>, Lars Ellenberg
> <lars at linbit.com>
> srcversion:     611DC432097FDFEB703FF9F
> depends:        libcrc32c
> vermagic:       4.4.74-92.29-default SMP mod_unload modversions
> 
> > "to make sure SSD TRIM was not a factor":
> > how exactly did you try to do that?
> 
> The TrimTester program consists of three parts. The main executable
> (TrimTester) just writes loads of data to the drive and tests for file corruption.
> My C++ consultant says, "It writes sequential numbers wrapped at 256,
> spanning multiple files. It checks previously written files, and if the file data is
> all zeroes, it is considered to be corrupted."
> 
> The other two parts of the tool are shell scripts. One script periodically calls
> fstrim, the other periodically drops the caches. I simply ran the TrimTester
> executable without the scripts so the fstrim command never got called during
> the test.
> 
> > What are the ext4 mount options,
> > explicit or implicit?
> > (as reported by tune2fs and /proc/mounts)
> 
> ha11a:~ # cat /proc/mounts|grep ha
> /dev/drbd0 /ha01_mysql ext4 rw,relatime,stripe=6,data=ordered 0 0
> 
> > To gather a few more data points,
> > does the behavior on DRBD change, if you
> >   disk { disable-write-same; } # introduced only with drbd 8.4.10
> > or if you set
> >   disk { al-updates no; } # affects timing, among other things
> 
> 
> 8.4.10 did not recognize the 'disable-write-same' option, but I'm testing right
> now with 'al-updates no' and I'll report the results!
> 
> > Can you reproduce with other backend devices?
> 
> I don't have any other backend devices to test with. All I know is that the
> problem does not occur when writing directly to the devices (bypassing the
> drbd layer).
> 
> --Eric
> 
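
For anyone who has not looked at TrimTester, the check described in the quoted message can be sketched roughly as below. This is only an illustration of the idea (sequential byte values wrapped at 256, continuing across files, with earlier files re-read and flagged if they come back as all zeroes). It is not the actual TrimTester source, which is C++; the function names, file naming, and sizes are made up for the example.

```python
import os


def write_pattern(path, size, start=0):
    """Write `size` bytes of sequential values wrapped at 256.

    Returns the counter value the next file should start from, so the
    sequence continues across files.
    """
    data = bytes((start + i) % 256 for i in range(size))
    with open(path, "wb") as f:
        f.write(data)
        f.flush()
        os.fsync(f.fileno())  # push the data down to the block device
    return (start + size) % 256


def is_all_zeroes(path, chunk=1 << 20):
    """Return True if the file is non-empty and every byte is zero."""
    seen_data = False
    with open(path, "rb") as f:
        while True:
            block = f.read(chunk)
            if not block:
                break
            seen_data = True
            if block.count(0) != len(block):
                return False
    return seen_data


def run(directory, n_files=4, file_size=1000):
    """Write files one after another and re-check all earlier ones.

    Returns the list of files found to be all zeroes (empty if none).
    """
    counter = 0
    written = []
    for i in range(n_files):
        path = os.path.join(directory, f"data_{i:04d}.bin")  # hypothetical naming
        counter = write_pattern(path, file_size, counter)
        written.append(path)
        corrupt = [p for p in written if is_all_zeroes(p)]
        if corrupt:
            return corrupt
    return []
```

Note that a check like this only catches files that read back as entirely zero. It would not notice other kinds of damage, and a re-read served from the page cache may not reflect what is on disk, which is presumably why the original tool ships with a script that drops the caches.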