[DRBD-user] Warning: Data Corruption Issue Discovered in DRBD 8.4 and 9.0

Eric Robinson eric.robinson at psmnv.com
Tue Oct 17 16:46:37 CEST 2017



Guys, I think we have an important new development. Last night I put both nodes into standalone mode and promoted the secondary to primary. This effectively gave me two identical test platforms running disconnected, each writing through DRBD to its own local storage. I ran the full TrimTester tool, including the scripts that run periodic TRIM while the test is in progress. Everything ran much faster without network replication. Both servers filled their volumes 24 times without any errors. That's 24TB of disk writes without a single file corruption error.

Lars et al., this would seem to indicate that the "file corruption" error only occurs when the DRBD nodes are connected and replicating. This morning I put the cluster back into Primary/Secondary mode and started a full resync. After that, I'm going to run the TrimTester tool again and see whether the corruption recurs. If it does, what does that tell us?

-Eric 

> -----Original Message-----
> From: drbd-user-bounces at lists.linbit.com [mailto:drbd-user-
> bounces at lists.linbit.com] On Behalf Of Eric Robinson
> Sent: Monday, October 16, 2017 4:13 PM
> To: jan.bakuwel at gmail.com; drbd-user at lists.linbit.com
> Subject: Re: [DRBD-user] Warning: Data Corruption Issue Discovered in DRBD
> 8.4 and 9.0
> 
> > > Well, damn. As the program was supposedly reviewed by Samsung
> > engineers as part of their efforts to diagnose the root cause of TRIM
> > errors, it never occurred to me that it was that buggy. I can't thank
> > you enough for finding that! The rollout of some new DRBD clusters has
> > been on hold for 2+ months while we've been trying to track this down.
> >
> > Eric, thanks for the time you took to demonstrate and reproduce an
> > issue you thought you found. These things happen. Samsung engineers
> > are people too :-) although it seems they overlooked a rather elementary
> > problem.
> >
> > And many, many thanks to Lars for taking the time to review this issue,
> > finding the cause and putting our minds to rest.
> >
> > kind regards,
> > Jan
> >
> 
> I haven't quite made up my mind. As Lars suggested, I'm going to try to make it fail a
> couple more times and see if the supposedly "corrupt" files are indeed >= 4GiB.
> I'll update everyone when I have more information. Thanks all of you for paying
> attention to this.
> 
> --Eric
> _______________________________________________
> drbd-user mailing list
> drbd-user at lists.linbit.com
> http://lists.linbit.com/mailman/listinfo/drbd-user


