> > To gather a few more data points,
> > does the behavior on DRBD change, if you
> >   disk { disable-write-same; } # introduced only with drbd 8.4.10
> > or if you set
> >   disk { al-updates no; } # affects timing, among other things

Yes, the behavior is the same with 'al-updates no'. The program detected a corrupt file after writing 823 GB. Plus, see my answers below...

> -----Original Message-----
> From: Eric Robinson
> Sent: Wednesday, October 11, 2017 2:30 PM
> To: Eric Robinson <eric.robinson at psmnv.com>; Lars Ellenberg <lars.ellenberg at linbit.com>; drbd-user at lists.linbit.com
> Subject: RE: [DRBD-user] Warning: Data Corruption Issue Discovered in DRBD 8.4 and 9.0
>
> Hi Lars -
>
> I'm finally back from my trip and eager to get rolling on this.
>
> >Interesting.
> >Actually, alarming.
>
> Glad we agree on that!
>
> > Which *exact* DRBD module versions, identified by their git commit ids?
>
> Does this answer your question?
>
> ha11a:~ # modinfo drbd
> filename: /lib/modules/4.4.74-92.29-default/updates/drbd.ko
> alias: block-major-147-*
> license: GPL
> version: 8.4.10-1
> description: drbd - Distributed Replicated Block Device v8.4.10-1
> author: Philipp Reisner <phil at linbit.com>, Lars Ellenberg <lars at linbit.com>
> srcversion: 611DC432097FDFEB703FF9F
> depends: libcrc32c
> vermagic: 4.4.74-92.29-default SMP mod_unload modversions
>
> > "to make sure SSD TRIM was not a factor":
> > how exactly did you try to do that?
>
> The TrimTester program consists of three parts. The main executable
> (TrimTester) just writes loads of data to the drive and tests for file corruption.
> My C++ consultant says, "It writes sequential numbers wrapped at 256,
> spanning multiple files. It checks previously written files, and if the file data is
> all zeroes, it is considered to be corrupted."
>
> The other two parts of the tool are shell scripts. One script periodically calls
> fstrim, the other periodically drops the caches. I simply ran the TrimTester
> executable without the scripts so the fstrim command never got called during
> the test.
>
> > What are the ext4 mount options,
> > explicit or implicit?
> > (as reported by tune2fs and /proc/mounts)
>
> ha11a:~ # cat /proc/mounts|grep ha
> /dev/drbd0 /ha01_mysql ext4 rw,relatime,stripe=6,data=ordered 0 0
>
> > To gather a few more data points,
> > does the behavior on DRBD change, if you
> >   disk { disable-write-same; } # introduced only with drbd 8.4.10
> > or if you set
> >   disk { al-updates no; } # affects timing, among other things
>
> 8.4.1 did not recognize the 'disable-write-same' option, but I'm testing right
> now with 'al-updates no' and I'll report the results!
>
> > Can you reproduce with other backend devices?
>
> I don't have any other backend devices to test with. All I know is that the
> problem does not occur when writing directly to the devices (bypassing the
> drbd layer).
>
> --Eric
>
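For readers trying to picture the corruption check quoted in the TrimTester description (sequential numbers wrapped at 256, spanning multiple files, with an all-zero file counted as corrupted), here is a minimal Python sketch of that idea. It is not the actual TrimTester source. The file names, file size, file count, and the use of fsync are hypothetical choices made only for illustration.

```python
import os
import tempfile
from typing import List

# Hypothetical sizes: the real TrimTester's file sizes and file count are not stated.
FILE_SIZE = 4096
NUM_FILES = 4


def write_files(directory: str, num_files: int = NUM_FILES,
                file_size: int = FILE_SIZE) -> List[str]:
    """Write one running byte counter (0..255, wrapping) across several files."""
    paths = []
    counter = 0  # the sequence continues from one file into the next
    for i in range(num_files):
        data = bytes((counter + k) % 256 for k in range(file_size))
        counter += file_size
        path = os.path.join(directory, f"data_{i:04d}.bin")  # hypothetical naming
        with open(path, "wb") as f:
            f.write(data)
            f.flush()
            os.fsync(f.fileno())  # ask the kernel to flush the file to stable storage
        paths.append(path)
    return paths


def is_corrupted(path: str) -> bool:
    """Treat a non-empty file whose bytes are all zero as corrupted."""
    with open(path, "rb") as f:
        data = f.read()
    return len(data) > 0 and not any(data)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as d:
        paths = write_files(d)
        bad = [p for p in paths if is_corrupted(p)]
        print(f"{len(bad)} of {len(paths)} files corrupted")
```

To resemble the test discussed in this thread, the target directory would sit on the DRBD-backed filesystem (the /ha01_mysql mount shown above) and files would be re-checked later rather than right after writing. Like the quoted description, the sketch only flags all-zero files; a file overwritten with other wrong data would pass this check.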