[DRBD-user] Warning: Data Corruption Issue Discovered in DRBD 8.4 and 9.0

Eric Robinson eric.robinson at psmnv.com
Tue Oct 17 17:52:46 CEST 2017



> On Tue, Oct 17, 2017 at 02:46:37PM +0000, Eric Robinson wrote:
> > Guys, I think we have an important new development. Last night I put
> > both nodes into standalone mode and promoted the secondary to primary.
> > This effectively gave me two identical test platforms, running
> > disconnected, both writing through DRBD to the local media on each
> > server. I ran the full TrimTester tool, including the scripts which do
> > periodic TRIM while the test is in progress. Everything went a lot
> > faster because of no network replication. Both servers filled their
> > volumes 24 times without any errors. That's 24TB of disk writing
> > without a single file corruption error.
> >
> > Lars, et. al., this would seem to indicate that the "file corruption"
> > error only occurs when the DRBD nodes are sync'd and replicating.
> 
> Which is when DRBD (due to limited bandwidth on the network) throttles your
> writer threads, leaving the backend enough spare IOPS to serve the reader
> more quickly, letting it make progress up to the 4 GB files.
> 
> > This morning I have put the cluster back into Primary/Secondary mode
> > and I'm doing a full resync. After that, I'm going to run the
> > TrimTester tool and see if the corruption happens again. If it does,
> > what does that tell us?
> 
> That tells us that eight sequential writer threads completely starve out a
> single mmap reader thread that explicitly asked for no read-ahead and
> continually has the cache dropped on it.
> 
> You could change the trimtester to ask for readahead and stop dropping
> caches. That makes it completely useless for its original purpose, but it
> would find the spurious 4 GB wrap-around "corruption" much faster.
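For illustration, a reader can request (or decline) kernel readahead with posix_fadvise. This is only a sketch using Python's os.posix_fadvise (Linux-specific), not TrimTester's actual code, and open_for_check is a made-up helper name:

```python
import os

def open_for_check(path, want_readahead):
    """Open a file read-only and advise the kernel about our access pattern.

    want_readahead=True asks for aggressive sequential readahead;
    False asks the kernel to skip readahead (random-access hint).
    """
    fd = os.open(path, os.O_RDONLY)
    # os.posix_fadvise is only available on some platforms (Linux).
    if hasattr(os, "posix_fadvise"):
        advice = (os.POSIX_FADV_SEQUENTIAL if want_readahead
                  else os.POSIX_FADV_RANDOM)
        os.posix_fadvise(fd, 0, 0, advice)  # offset 0, length 0 = whole file
    return fd
```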
> 
> You could change the trimtester to start with the "big" files, and then slowly
> reduce the size, instead of starting with the "smaller" files and slowly
> increasing the size, or keep the "increasing sizes" direction, but simply skip the
> smaller sizes.
> The checker would start with checking the large files, and immediately hit its
> 32bit wrap-around bug.
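A minimal sketch of the wrap-around described above, assuming the checker keeps its file offset in an unsigned 32-bit counter (a guess; TrimTester's actual code may differ). Once a file passes 4 GiB, such an offset wraps back to 0, so the checker compares against the wrong region of the file and reports false "corruption":

```python
def wrap32(offset):
    """Truncate an offset to 32 bits, as an unsigned 32-bit counter would."""
    return offset & 0xFFFFFFFF

FOUR_GIB = 1 << 32

# Offsets below 4 GiB are unaffected...
assert wrap32(123) == 123
# ...but at exactly 4 GiB the counter wraps to 0 and keeps going from there.
assert wrap32(FOUR_GIB) == 0
assert wrap32(FOUR_GIB + 100) == 100
```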
> 
> You could tune your backend or the IO schedulers to prefer reads, or throttle
> writes.
> 
> You could change the trimtester code to "sleep" occasionally in the writer
> threads, to give the checker some more room to breathe.
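As a sketch (not TrimTester's code), an occasional sleep in a writer loop could look like the following; throttled_writer, pause_every, and pause are made-up names for illustration:

```python
import time

def throttled_writer(write_block, blocks, pause_every=64, pause=0.01):
    """Write `blocks` blocks, sleeping briefly every `pause_every` blocks
    so a concurrent checker thread gets a chance to make progress."""
    for i in range(blocks):
        write_block(i)
        if (i + 1) % pause_every == 0:
            time.sleep(pause)  # yield some I/O bandwidth to the checker
```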
> 
> You could change the trimtester to wait for all files to be checked before
> exiting completely due to "disk full". Or change it to have a mode that *only*
> does checking, and run that before the "rm -rf test/".
> 
> Whatever.
> 
> Most importantly: once the trimtester (or *any* "corruption detecting"
> tool) claims that a certain corruption is found, you look at what supposedly is
> corrupt, and double check if it in fact is.
> 
> Before doing anything else.
> 

I did that, but I don't know what a "good" file is supposed to look like, so I can't tell whether it is really corrupted. The last time TrimTester reported a corrupt file, I checked it manually and it looked fine to me, but I don't know what I'm looking for.

As far as I know, all the TrimTester tool does is check if the file contents are all zeroes. I don't know how that helps me. Can you tell from looking at the TrimTester code what I should look for to manually check the file?
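If the tool's criterion really is "contents are all zeroes" (an assumption based on the description above, not on TrimTester's source), one manual check is simply to look for the first non-zero byte in the flagged file; a sketch:

```python
def first_nonzero_offset(path, chunk=1 << 20):
    """Return the offset of the first non-zero byte in the file,
    or -1 if the file is entirely zeroes (reads in 1 MiB chunks)."""
    offset = 0
    with open(path, "rb") as f:
        while True:
            buf = f.read(chunk)
            if not buf:
                return -1  # reached EOF without finding a non-zero byte
            for i, b in enumerate(buf):
                if b:
                    return offset + i
            offset += len(buf)
```

A file reported "corrupt" that returns an offset early in the file (i.e. contains real data) would suggest the tool's claim deserves the scrutiny Lars describes.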

Or if you're aware of another tool that does high-load read/write tests on a mounted filesystem and tests for file corruption, I'd be happy to use that instead of TrimTester. Of course, I'd still run the scripts that do periodic trim and drop the caches, since the whole point is to verify that everything is working properly with TRIM enabled. 

> Double check if the tool would still claim corruption exists, even if you cannot
> see that corruption with other tools.
> 
> If so, find out why that tool does that, because that'd be clearly a bug in that
> tool.
> 
> --
