[DRBD-user] Problems with oos Sectors after verify

Lars Ellenberg lars.ellenberg at linbit.com
Thu Mar 18 11:50:10 CET 2010

Note: "permalinks" may not be as permanent as we would like,
direct links of old sources may well be a few messages off.


On Wed, Mar 17, 2010 at 11:35:22AM +0100, Lars Ellenberg wrote:
> On Wed, Mar 17, 2010 at 07:08:20AM +0000, Henning Bitsch wrote:
> > I have a problem running drbd 8.3.7-1 on Debian Lenny (2.6.26-AMD64-Xen).
> > I have six drbd devices with a total of 3 TB. Both nodes are Supermicro AMD
> > Opteron boxes (one 12-core, one 4-core) with a dedicated 1 GBit connection for
> > DRBD and Adaptec 5800 RAID controllers. One side has an NVIDIA forcedeth NIC,
> > the other side an Intel e1000. Protocol is C. The dom0 has 2 GByte of RAM.
> > 
> > Basically two symptoms can be observed but I am not sure if they are related:
> > 
> > 1. Data Integrity errors
> > I get occasional data integrity errors (checksummed with crc32c) on both nodes 
> > in the cluster. 
> > 
> > [ 8961.266879] block drbd3: Digest integrity check FAILED.
> > [22846.253694] block drbd3: Digest integrity check FAILED.
> > [23557.272471] block drbd3: Digest integrity check FAILED.
> > 
> > As recommended before, I went through the standard procedures (disabling
> > offloading, memtest, replacing cables, replacing one of the boxes), but without success.
> 
> Then your hardware is broken.
> No more to say.

Though, as an afterthought,
of course, if your system/application habitually modifies in-flight
buffers, it could lead to the same symptoms as well.

Ah the wonders of the not-my-problem short-circuitry ;-)

Anyways, nothing you can "tune away" in DRBD.
Data _is_ changing.

If it is changing when it is already on the local disk, but not yet on the wire
 -> Digest integrity check FAILED, disconnect, reconnect, short resync.
If it is changing when it has already reached the secondary,
  but is not yet on its local disk,
  or happens to fool the CRC,
  or happens after submission on the secondary:
 -> "silent" data divergence, detected on the next verify run.

Now, whether the data is changing on the wire, on the PCI bus,
or in the in-flight buffers by the application,
is not easily determined.
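
To illustrate the two cases above, a minimal sketch in plain Python (this is
not DRBD code; zlib.crc32 merely stands in for the crc32c digest that
data-integrity-alg would use):

import zlib

def primary_send(buf: bytearray):
    # Primary side: the digest is computed over the write buffer when the data
    # packet is prepared; the (zero-copy) buffer itself may only hit the wire later.
    digest = zlib.crc32(buf)
    return buf, digest

def secondary_receive(data: bytes, digest: int) -> bool:
    # Secondary side: recompute the digest over what actually arrived.
    return zlib.crc32(data) == digest

# Failure mode 1: the buffer is scribbled into after the digest was computed
# but before the data left the host.  The secondary receives data that no
# longer matches the digest -> "Digest integrity check FAILED", disconnect,
# reconnect, short resync.
block = bytearray(4096)
buf, digest = primary_send(block)
buf[100] ^= 0xFF                                   # in-flight modification
assert secondary_receive(bytes(buf), digest) is False

# Failure mode 2: the change happens only after the data (and digest) were
# transmitted intact, e.g. only one copy is affected, or the change happens
# to collide with the CRC.  Nothing mismatches on the wire; the replicas
# silently diverge, and only a later verify run notices.
block = bytearray(4096)
buf, digest = primary_send(block)
data_on_wire = bytes(buf)                          # left the host before the change
buf[100] ^= 0xFF                                   # local copy changes afterwards
assert secondary_receive(data_on_wire, digest) is True   # no error, but copies differ

The point being: only the first case can even be noticed on the wire; the
second one is exactly what your verify runs keep finding.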

> > The errors are only reported for devices which the respective node is
> > secondary for.

This is expected: the "digest" is calculated over the data packets,
which naturally flow from primary to secondary.

> > 2. oos after verify
> > I always get a few oos sectors after verifying any device which has been used
> > previously. These are not false positives; the sectors are in fact different:
> > 
> > 2,5c2,5
> > < 0000010: 0000 0000 0800 0000 0000 00ff 0000 0000  ................
> > < 0000020: 0000 0000 0000 0000 0000 0000 0000 0000  ................
> > < 0000030: 0000 0000 ffff ffff ffff ffff 0000 0000  ................
> > < 0000040: 0000 0400 0000 0000 0000 0000 0000 0000  ................
> > ---
> > > 0000010: 0000 0000 0800 0000 0000 19ff 0000 0000  ................
> > > 0000020: 0000 002b 0000 0000 0000 0000 0000 0000  ...+............
> > > 0000030: 0000 002b ffff ffff ffff ffff 0000 0000  ...+............
> > > 0000040: 0000 0400 0000 0000 0002 8668 0000 0000  ...........h....
> > 8c8
> > < 0000070: 0000 0000 0000 0000 0000 0000 0000 0000  ................
> > ---
> > > 0000070: 0000 0f03 0000 0000 0000 0001 0000 0000  ................

Not block- but byte-level changes, random bit patterns,
no obvious pattern or grouping.

> > After disconnecting/reconnecting/resyncing the device, they are identical again.
> > This happens with random sectors on basically every verify.
> > 
> > Here is my relevant global config for DRBD:
> > 
> >        startup {
> >                 wfc-timeout 60;
> >                 degr-wfc-timeout 300;
> >         }
> > 
> >         disk {
> >                 on-io-error detach;
> >         }
> > 
> >         net {
> >                 cram-hmac-alg sha1;
> >                 after-sb-0pri disconnect;
> >                 after-sb-1pri disconnect;
> >                 after-sb-2pri disconnect;
> >                 data-integrity-alg crc32c;
> >                 max-buffers 3000;
> >                 max-epoch-size 8000;
> >         }
> > 
> >         syncer {
> >                 rate 25M;
> >                 verify-alg crc32c;

To detect things that fooled the data-integrity check, you should use a
different alg for verify.  To detect things that fooled ("collided with")
the csums alg, you should ideally have integrity, verify, and csums all
different; see the example further down.

> >                 csums-alg crc32c;
> >                 al-extents 257;
> >         }
> > 
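For illustration, a sketch of such a combination (only a sketch; the
particular algorithm names are examples, anything your kernel's crypto API
provides will work):

        net {
                data-integrity-alg crc32c;
        }

        syncer {
                verify-alg sha1;
                csums-alg md5;
        }

That way a corruption that happens to collide with one digest still has a
good chance of being caught by one of the others.
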
> > I tweaked the TCP settings using sysctl:

Nothing to do with this.  There is, hopefully, no sysctl like "reduce
the rate of random data corruption" ;-)

> > net.ipv4.tcp_rmem = 131072  131072  16777216
> > net.ipv4.tcp_wmem = 131072  131072  16777216
> > net.core.rmem_max = 10485760 
> > net.core.wmem_max = 10485760 
> > net.ipv4.tcp_mem = 96000 128000 256000
> > 
> > 
> > I am not sure in which direction to search next and would be happy about any 
> > suggestions.


What is the usage pattern?
All Xen domUs?
If that's all in the _swap_ of the Xen domUs, this may even be "legal":
swap pages may be modified or dropped while the write-out is still in flight,
and the stale contents are never read back, so the resulting divergence is
harmless.

-- 
: Lars Ellenberg
: LINBIT | Your Way to High Availability
: DRBD/HA support and consulting http://www.linbit.com

DRBD® and LINBIT® are registered trademarks of LINBIT, Austria.
__
please don't Cc me, but send to list   --   I'm subscribed


