> > To gather a few more data points,
> > does the behavior on DRBD change, if you
> >   disk { disable-write-same; } # introduced only with drbd 8.4.10
> > or if you set
> >   disk { al-updates no; } # affects timing, among other things
Yes, the behavior is the same with 'al-updates no'. The program detected a corrupt file after writing 823 GB.
Plus, see my answers below...
> -----Original Message-----
> From: Eric Robinson
> Sent: Wednesday, October 11, 2017 2:30 PM
> To: Eric Robinson <eric.robinson at psmnv.com>; Lars Ellenberg
> <lars.ellenberg at linbit.com>; drbd-user at lists.linbit.com
> Subject: RE: [DRBD-user] Warning: Data Corruption Issue Discovered in DRBD
> 8.4 and 9.0
>
> Hi Lars -
>
> I'm finally back from my trip and eager to get rolling on this.
>
> >Interesting.
> >Actually, alarming.
>
> Glad we agree on that!
>
> > Which *exact* DRBD module versions, identified by their git commit ids?
>
> Does this answer your question?
>
> ha11a:~ # modinfo drbd
> filename: /lib/modules/4.4.74-92.29-default/updates/drbd.ko
> alias: block-major-147-*
> license: GPL
> version: 8.4.10-1
> description: drbd - Distributed Replicated Block Device v8.4.10-1
> author: Philipp Reisner <phil at linbit.com>, Lars Ellenberg
> <lars at linbit.com>
> srcversion: 611DC432097FDFEB703FF9F
> depends: libcrc32c
> vermagic: 4.4.74-92.29-default SMP mod_unload modversions
>
> > "to make sure SSD TRIM was not a factor":
> > how exactly did you try to do that?
>
> The TrimTester program consists of three parts. The main executable
> (TrimTester) just writes loads of data to the drive and tests for file corruption.
> My C++ consultant says, "It writes sequential numbers wrapped at 256,
> spanning multiple files. It checks previously written files, and if the file data is
> all zeroes, it is considered to be corrupted."
>
> The other two parts of the tool are shell scripts. One script periodically calls
> fstrim, the other periodically drops the caches. I simply ran the TrimTester
> executable without the scripts so the fstrim command never got called during
> the test.
>
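For readers who have not seen TrimTester, the write-and-verify idea described above can be sketched roughly as follows. This is not the TrimTester source (which is C++); the function names, file naming scheme, and sizes are hypothetical, and only the behavior quoted from the consultant is modeled: sequential numbers wrapped at 256 spanning multiple files, with a previously written file counted as corrupt when its data is all zeroes.

```python
import os


def write_pattern_file(path, size, start):
    """Write `size` bytes counting up from `start`, wrapped at 256.

    Returns the counter value the next file should start with, so the
    sequence spans multiple files.
    """
    data = bytes((start + i) % 256 for i in range(size))
    with open(path, "wb") as f:
        f.write(data)
        f.flush()
        os.fsync(f.fileno())  # push the data down to the block device
    return (start + size) % 256


def is_all_zeroes(path):
    """Return True if the file is non-empty and every byte is zero."""
    with open(path, "rb") as f:
        data = f.read()
    return len(data) > 0 and data.count(0) == len(data)


def run_sketch(directory, file_count=4, file_size=4096):
    """Write `file_count` files, re-checking the older ones before each write.

    Returns (paths written, sorted list of paths found to be all zeroes).
    """
    counter = 0
    written = []
    corrupt = set()
    for n in range(file_count):
        # Check previously written files, as the quoted description says.
        for old in written:
            if is_all_zeroes(old):
                corrupt.add(old)
        path = os.path.join(directory, "data_%06d.bin" % n)  # hypothetical name
        counter = write_pattern_file(path, file_size, counter)
        written.append(path)
    return written, sorted(corrupt)
```

Pointing `run_sketch` at a directory on the DRBD-backed mount and at one on the bare backing device would mirror the comparison described in this thread. The sketch does not call fstrim or drop caches, which matches the run without the two helper scripts.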
> > What are the ext4 mount options,
> > explicit or implicit?
> > (as reported by tune2fs and /proc/mounts)
>
> ha11a:~ # cat /proc/mounts|grep ha
> /dev/drbd0 /ha01_mysql ext4 rw,relatime,stripe=6,data=ordered 0 0
>
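Since the question is whether TRIM could be involved, one thing worth reading off the mount line is whether the ext4 `discard` option (online TRIM) is set. A minimal sketch for doing that check in a script, assuming the usual whitespace-separated /proc/mounts layout; the helper names are hypothetical:

```python
def parse_mounts_line(line):
    """Split one /proc/mounts line into its first four fields."""
    device, mount_point, fs_type, options = line.split()[:4]
    return {
        "device": device,
        "mount_point": mount_point,
        "fs_type": fs_type,
        "options": options.split(","),
    }


def has_online_discard(line):
    """Return True if the mount options include 'discard'."""
    return "discard" in parse_mounts_line(line)["options"]


# The mount line reported above.
line = "/dev/drbd0 /ha01_mysql ext4 rw,relatime,stripe=6,data=ordered 0 0"
entry = parse_mounts_line(line)
```

For the line above, `has_online_discard(line)` is false: the options are rw, relatime, stripe=6 and data=ordered. This only covers what /proc/mounts shows; defaults stored in the superblock would still need tune2fs, as asked.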
> > To gather a few more data points,
> > does the behavior on DRBD change, if you
> >   disk { disable-write-same; } # introduced only with drbd 8.4.10
> > or if you set
> >   disk { al-updates no; } # affects timing, among other things
>
>
> 8.4.1 did not recognize the 'disable-write-same' option, but I'm testing right
> now with 'al-updates no' and I'll report the results!
>
> > Can you reproduce with other backend devices?
>
> I don't have any other backend devices to test with. All I know is that the
> problem does not occur when writing directly to the devices (bypassing the
> drbd layer).
>
> --Eric
>