On Tue, Oct 17, 2017 at 02:46:37PM +0000, Eric Robinson wrote:
> Guys, I think we have an important new development. Last night I put
> both nodes into standalone mode and promoted the secondary to primary.
> This effectively gave me two identical test platforms, running
> disconnected, both writing through DRBD to the local media on each
> server. I ran the full TrimTester tool, including the scripts which do
> periodic TRIM while the test is in progress. Everything went a lot
> faster because of no network replication. Both servers filled their
> volumes 24 times without any errors. That's 24TB of disk writing
> without a single file corruption error.
>
> Lars, et al., this would seem to indicate that the "file corruption"
> error only occurs when the DRBD nodes are sync'd and replicating.

Which is when DRBD (due to limited bandwidth on the network) throttles
your writer threads, and the backend has enough IOPS left to serve the
reader more quickly, letting it make progress up to the 4g files.

> This morning I have put the cluster back into Primary/Secondary mode
> and I'm doing a full resync. After that, I'm going to run the
> TrimTester tool and see if the corruption happens again. If it does,
> what does that tell us?

That does tell us that 8 sequential writer threads completely starve out
a single mmap reader thread that even explicitly asked for no read-ahead,
while continually having the cache dropped on it.

You could change the trimtester to ask for readahead and stop dropping
caches. That makes it completely useless for its original purpose,
but it would find the spurious 4g wrap-around "corruption" much faster.

You could change the trimtester to start with the "big" files and then
slowly reduce the size, instead of starting with the "smaller" files and
slowly increasing it. Or keep the "increasing sizes" direction, but
simply skip the smaller sizes. Either way, the checker would start by
checking the large files, and immediately hit its 32-bit wrap-around bug.
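To make the "4g wrap-around" concrete: the trimtester source is not quoted in this thread, so the following is only a minimal Python sketch of that class of bug, not the tool's actual code. It assumes a hypothetical test pattern in which each 8-byte word stores its own byte offset, and a checker that keeps its position in a 32-bit unsigned counter (emulated by masking with 0xFFFFFFFF). All names (`written_word`, `expected_word_32bit`, `checker_complains`, ...) are made up for illustration. Whether the real bug sits in an offset, a length, or an mmap size is not stated here.

```python
GIB = 1 << 30
WORD = 8  # hypothetical: the test pattern is laid out in 8-byte words


def written_word(offset: int) -> int:
    """Hypothetical writer pattern: the word at byte `offset` stores `offset`."""
    return offset


def expected_word_32bit(offset: int) -> int:
    """Buggy checker (hypothetical): its position counter is a 32-bit
    unsigned integer, emulated here by reducing the offset modulo 2**32."""
    return offset & 0xFFFFFFFF


def expected_word_64bit(offset: int) -> int:
    """Fixed checker (hypothetical): keeps the full 64-bit offset."""
    return offset & 0xFFFFFFFFFFFFFFFF


def checker_complains(offset: int, expected) -> bool:
    """True if the checker would report the (intact) word at `offset` as corrupt."""
    return written_word(offset) != expected(offset)


if __name__ == "__main__":
    for size_gib in range(1, 6):
        last_word = size_gib * GIB - WORD
        flagged = checker_complains(last_word, expected_word_32bit)
        print(f"{size_gib} GiB file flagged by 32-bit checker: {flagged}")
    # Only the 5 GiB file is flagged; files up to exactly 4 GiB pass.
```

Nothing on disk is wrong in this model; the "corruption" exists only in the checker's arithmetic. It also shows why checking the large files first would surface the problem immediately: any file larger than 4 GiB trips it, regardless of DRBD, TRIM, or replication.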
You could tune your backend or the I/O schedulers to prefer reads,
or throttle writes.

You could change the trimtester code to "sleep" occasionally in the
writer threads, to give the checker some more room to breathe.

You could change the trimtester to wait for all files to be checked
before exiting completely due to "disk full". Or change it to have
a mode that *only* does checking, and run that before the "rm -rf test/".

Whatever.

Most importantly: once the trimtester (or *any* "corruption detecting"
tool) claims that a certain corruption is found, you look at what
supposedly is corrupt, and double-check whether it in fact is.
Before doing anything else.

Double-check whether the tool would still claim corruption exists, even
if you cannot see that corruption with other tools.

If so, find out why that tool does that, because that would clearly be
a bug in that tool.

-- 
: Lars Ellenberg
: LINBIT | Keeping the Digital World Running
: DRBD -- Heartbeat -- Corosync -- Pacemaker

DRBD® and LINBIT® are registered trademarks of LINBIT
__
please don't Cc me, but send to list -- I'm subscribed