[DRBD-user] Warning: Data Corruption Issue Discovered in DRBD 8.4 and 9.0

Lars Ellenberg lars.ellenberg at linbit.com
Tue Oct 17 17:24:03 CEST 2017

Note: "permalinks" may not be as permanent as we would like,
direct links of old sources may well be a few messages off.


On Tue, Oct 17, 2017 at 02:46:37PM +0000, Eric Robinson wrote:
> Guys, I think we have an important new development. Last night I put
> both nodes into standalone mode and promoted the secondary to primary.
> This effectively gave me two identical test platforms, running
> disconnected, both writing through DRBD to the local media on each
> server. I ran the full TrimTester tool, including the scripts which do
> periodic TRIM while the test is in progress. Everything went a lot
> faster because of no network replication. Both servers filled their
> volumes 24 times without any errors. That's 24TB of disk writing
> without a single file corruption error. 
> 
> Lars, et. al., this would seem to indicate that the "file corruption"
> error only occurs when the DRBD nodes are sync'd and replicating.

Which is when DRBD (due to limited bandwidth on the network) throttles
your writer threads, and the backend has enough IOPS left to serve the
reader more quickly, letting it make progress up to the 4 GiB files.

> This morning I have put the cluster back into Primary/Secondary mode
> and I'm doing a full resync. After that, I'm going to run the
> TrimTester tool and see if the corruption happens again. If it does,
> what does that tell us? 

That does tell us that 8 sequential writer threads
completely starve out a single mmap reader thread
that even explicitly asked for no read-ahead,
while continually having the cache dropped on it.
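
For illustration, the checker's read side then looks roughly like this
(my sketch of the pattern described above, not the tool's actual
source; error handling trimmed):

    #include <fcntl.h>
    #include <sys/mman.h>
    #include <sys/stat.h>
    #include <unistd.h>

    int check_file(const char *path)
    {
        int fd = open(path, O_RDONLY);
        if (fd < 0)
            return -1;

        struct stat st;
        fstat(fd, &st);

        void *p = mmap(NULL, st.st_size, PROT_READ, MAP_SHARED, fd, 0);
        if (p == MAP_FAILED) {
            close(fd);
            return -1;
        }

        /* no read-ahead: every page fault becomes one small synchronous
           read, which loses badly against eight streaming writers */
        madvise(p, st.st_size, MADV_RANDOM);

        /* ... walk the mapping and verify the expected pattern ... */

        munmap(p, st.st_size);
        /* drop the cached pages, so the next pass hits the disk again */
        posix_fadvise(fd, 0, 0, POSIX_FADV_DONTNEED);
        close(fd);
        return 0;
    }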

You could change the trimtester to ask for read-ahead and stop dropping
caches, which makes it completely useless for its original purpose, but
would find the spurious 4 GiB wrap-around "corruption" much faster.
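
Roughly this, as a sketch (hint_readahead() is my name, not the
tool's):

    #include <sys/mman.h>

    /* the "fast but pointless" variant: aggressive read-ahead, and the
       pages stay cached because posix_fadvise(fd, 0, 0,
       POSIX_FADV_DONTNEED) is simply never called afterwards */
    static void hint_readahead(void *p, size_t len)
    {
        madvise(p, len, MADV_SEQUENTIAL);   /* instead of MADV_RANDOM */
    }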

You could change the trimtester to start with the "big" files, and then
slowly reduce the size, instead of starting with the "smaller" files and
slowly increasing the size, or keep the "increasing sizes" direction,
but simply skip the smaller sizes.
The checker would then start by checking the large files, and
immediately hit its 32-bit wrap-around bug.
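
What I mean by that bug, as a sketch (not the tool's actual source;
the pattern function is made up):

    #include <stdint.h>

    /* the writer derives its pattern from the true 64-bit file offset,
       the checker from a 32-bit copy that wraps at 4 GiB, so bytes
       past 4 GiB are flagged "corrupt" even though they are fine */
    static uint8_t pattern(uint64_t off)
    {
        return (uint8_t)(off * 2654435761u);   /* made-up pattern */
    }

    static int byte_ok(uint64_t file_off, uint8_t byte_on_disk)
    {
        uint32_t off = (uint32_t)file_off;    /* the bug: wraps at 4 GiB */
        return byte_on_disk == pattern(off);  /* false mismatches beyond */
    }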

You could tune your backend or the IO schedulers to prefer reads,
or throttle writes.
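
For example, with the (mq-)deadline scheduler, something like this
(device name "sdX" and the values are placeholders; these tunables
only exist while deadline is the active scheduler, and echoing into
the sysfs files does the same thing):

    #include <stdio.h>

    static void set_tunable(const char *path, const char *val)
    {
        FILE *f = fopen(path, "w");
        if (f) {
            fputs(val, f);
            fclose(f);
        }
    }

    int main(void)
    {
        /* serve more read batches before a write batch is forced */
        set_tunable("/sys/block/sdX/queue/iosched/writes_starved", "4");
        /* let queued writes age longer before they must go out */
        set_tunable("/sys/block/sdX/queue/iosched/write_expire", "10000");
        return 0;
    }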

You could change the trimtester code to
"sleep" occasionally in the writer threads,
to give the checker some more room to breathe.
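
Something like this in the writer's inner loop (the every-64-chunks
and the 10 ms are guesses; tune to taste):

    #include <stddef.h>
    #include <unistd.h>

    /* writer inner loop that periodically yields some bandwidth */
    static void write_loop(int fd, const char *buf,
                           size_t chunk, size_t nchunks)
    {
        for (size_t i = 0; i < nchunks; i++) {
            if (write(fd, buf, chunk) < 0)
                return;
            if (i % 64 == 63)
                usleep(10 * 1000);   /* let the reader fault pages in */
        }
    }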

You could change the trimtester to wait for all files to be checked
before it exits on "disk full". Or give it a mode that *only* does
checking, and run that before the "rm -rf test/".
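
A check-only pass could be as simple as this sketch (flat directory
only, no recursion; check_file() is the checker sketched above):

    #include <dirent.h>
    #include <stdio.h>

    int check_file(const char *path);   /* the checker from further up */

    /* check-only mode: verify every file already on disk, with all
       writer threads stopped; run this before the "rm -rf test/" */
    int check_all(const char *dir)
    {
        DIR *d = opendir(dir);
        if (!d)
            return -1;

        int bad = 0;
        struct dirent *e;
        while ((e = readdir(d)) != NULL) {
            if (e->d_name[0] == '.')
                continue;
            char path[4096];
            snprintf(path, sizeof(path), "%s/%s", dir, e->d_name);
            if (check_file(path) != 0)
                bad++;
        }
        closedir(d);
        return bad;
    }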

Whatever.

Most importantly: once the trimtester (or *any* "corruption detecting"
tool) claims that a certain corruption is found, you look at what
supposedly is corrupt, and double check if it in fact is.

Before doing anything else.
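
For instance, re-read the allegedly corrupt file with a small program
that shares no code with the tester: plain 64-bit pread(), no mmap.
A sketch, where expected_byte() stands in for whatever pattern the
tester wrote:

    #include <fcntl.h>
    #include <inttypes.h>
    #include <stdint.h>
    #include <stdio.h>
    #include <unistd.h>

    /* whatever pattern the tester wrote -- a stand-in, not real code */
    uint8_t expected_byte(uint64_t file_off);

    int recheck(const char *path, uint64_t size)
    {
        int fd = open(path, O_RDONLY);
        if (fd < 0)
            return -1;

        unsigned char buf[1 << 16];
        for (uint64_t off = 0; off < size; off += sizeof(buf)) {
            ssize_t n = pread(fd, buf, sizeof(buf), (off_t)off);
            if (n <= 0)
                break;
            for (ssize_t i = 0; i < n; i++) {
                if (buf[i] != expected_byte(off + (uint64_t)i)) {
                    printf("mismatch at %" PRIu64 "\n",
                           off + (uint64_t)i);
                    close(fd);
                    return 1;
                }
            }
        }
        close(fd);
        return 0;
    }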

Double check if the tool would still claim corruption exists,
even if you cannot see that corruption with other tools.

If so, find out why the tool does that,
because that would clearly be a bug in the tool itself.

-- 
: Lars Ellenberg
: LINBIT | Keeping the Digital World Running
: DRBD -- Heartbeat -- Corosync -- Pacemaker

DRBD® and LINBIT® are registered trademarks of LINBIT
__
please don't Cc me, but send to list -- I'm subscribed


