[DRBD-user] Warning: Data Corruption Issue Discovered in DRBD 8.4 and 9.0

Tue Oct 3 09:43:05 CEST 2017

On Mon, Sep 25, 2017 at 09:02:57PM +0000, Eric Robinson wrote:
> Problem:
> 
> Under high write load, DRBD exhibits data corruption. In repeated
> tests over a month-long period, file corruption occurred after 700-900
> GB of data had been written to the DRBD volume.

Interesting.
Actually, alarming.

Can anyone else reproduce these findings?
In a similar or different environment?

> Testing Platform:
> 
> 2 x Dell PowerEdge R610 servers
> 32GB RAM
> 6 x Samsung SSD 840 Pro 512GB (latest firmware)
> Dell H200 JBOD Controller
> SUSE Linux Enterprise Server 12 SP2 (kernel 4.4.74-92.32)
> Gigabit network, 900 Mbps throughput, < 1ms latency, 0 packet loss
> 
> Initial Setup:
> 
>     Create 2 RAID-0 software arrays using either mdadm or LVM
>     On Array 1: sda5 through sdf5, create DRBD replicated volume (drbd0) with an ext4 filesystem
>     On Array 2: sda6 through sdf6, create LVM logical volume with an ext4 filesystem
> 
> Procedure:
> 
>     Download and build the TrimTester SSD burn-in and TRIM verification tool from Algolia (https://github.com/algolia/trimtester).
>     Run TrimTester against the filesystem on drbd0, wait for corruption to occur
>     Run TrimTester against the non-drbd backed filesystem, wait for corruption to occur
> 
> Results:
> 
> In multiple tests over a period of a month, TrimTester would report
> file corruption when run against the DRBD volume after 700-900 GB of
> data had been written. The error would usually appear within an hour
> or two. However, when running it against the non-DRBD volume on the
> same physical drives, no corruption would occur. We could let the
> burn-in run for 15+ hours and write 20+ TB of data without a problem.
> Results were the same with DRBD 8.4 and 9.0.

Which *exact* DRBD module versions, identified by their git commit ids?
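
One way to capture that, as a minimal sketch: the loaded module reports its
version and build hash on the "version:" and "GIT-hash:" lines of /proc/drbd,
so those can be pulled out and pasted into a report. The function name and the
sample text (including the hash) are made up for illustration; they are not
output from the systems discussed here.

```python
import os
import re


def drbd_module_version(proc_drbd_text):
    """Extract (version, git_hash) from /proc/drbd-style text.

    Either element is None if the corresponding line is missing.
    """
    version = re.search(r"^version:\s*(\S+)", proc_drbd_text, re.MULTILINE)
    git_hash = re.search(r"^GIT-hash:\s*([0-9a-f]+)", proc_drbd_text, re.MULTILINE)
    return (
        version.group(1) if version else None,
        git_hash.group(1) if git_hash else None,
    )


# Hypothetical sample, only to show the expected line layout.
SAMPLE_PROC_DRBD = (
    "version: 8.4.10-1 (api:1/proto:86-101)\n"
    "GIT-hash: 0123456789abcdef0123456789abcdef01234567 build by root@node1, "
    "2017-09-01 12:00:00\n"
)

if __name__ == "__main__":
    # Read the real file when the module is loaded, else fall back to the sample.
    if os.path.exists("/proc/drbd"):
        with open("/proc/drbd") as f:
            text = f.read()
    else:
        text = SAMPLE_PROC_DRBD
    print(drbd_module_version(text))
```

Run this on both nodes; the two peers are not guaranteed to have the same
build loaded.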

> We also tried disabling
> the TRIM-testing part of TrimTester and using it as a simple burn-in
> tool, just to make sure that SSD TRIM was not a factor.

"to make sure SSD TRIM was not a factor":
how exactly did you try to do that?
What are the ext4 mount options,
explicit or implicit?
(as reported by tune2fs and /proc/mounts)
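
For the /proc/mounts half of that question, a small sketch along these lines
would list the options in effect for a device and flag online discard. The
helper names and the default device path /dev/drbd0 are assumptions for
illustration. It does not cover defaults stored in the superblock; for those,
check the "Default mount options" line of tune2fs -l.

```python
import os


def mount_options(mounts_text, device):
    """Map each mount point of `device` to its option list.

    `mounts_text` is expected in /proc/mounts layout:
    device, mount point, fs type, comma-separated options, dump, pass.
    """
    result = {}
    for line in mounts_text.splitlines():
        fields = line.split()
        if len(fields) < 4 or fields[0] != device:
            continue
        result[fields[1]] = fields[3].split(",")
    return result


def uses_online_discard(options):
    """True if the filesystem is mounted with the 'discard' option (online TRIM)."""
    return "discard" in options


if __name__ == "__main__":
    # /dev/drbd0 is the device name from the test setup; adjust as needed.
    if os.path.exists("/proc/mounts"):
        with open("/proc/mounts") as f:
            text = f.read()
        for mount_point, options in mount_options(text, "/dev/drbd0").items():
            print(mount_point, ",".join(options),
                  "online discard:", uses_online_discard(options))
```

If "discard" is absent here and no fstrim job runs during the test, the
filesystem should not be issuing TRIM at all, which is a stronger statement
than disabling the TRIM check inside the test tool.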

> Conclusion:
> 
> We are aware of some controversy surrounding the Samsung SSD 8XX
> series drives; however, the issues related to that controversy were
> resolved and no longer exist as of kernel 4.2. The 840 Pro drives are
> confirmed to support RZAT. Also, the data corruption would only occur
> when writing through the DRBD layer. It never occurred when bypassing
> the DRBD layer and writing directly to the drives, so we must conclude
> that DRBD has a data corruption bug under high write load.

Or that DRBD changes the timing / IO pattern seen by the backend 
sufficiently to expose a bug elsewhere.

> However, we would be more than happy to be proved wrong.

To gather a few more data points:
does the behavior on DRBD change if you set
 disk { disable-write-same; } # introduced only with drbd 8.4.10
or if you set
 disk { al-updates no; } # affects timing, among other things

Can you reproduce with other backend devices?

-- 
: Lars Ellenberg
: LINBIT | Keeping the Digital World Running
: DRBD -- Heartbeat -- Corosync -- Pacemaker

DRBD® and LINBIT® are registered trademarks of LINBIT
__
please don't Cc me, but send to list -- I'm subscribed