[DRBD-user] Problems with oos Sectors after verify

Lars Ellenberg lars.ellenberg at linbit.com
Wed Mar 17 11:35:22 CET 2010

Note: "permalinks" may not be as permanent as we would like,
direct links of old sources may well be a few messages off.


On Wed, Mar 17, 2010 at 07:08:20AM +0000, Henning Bitsch wrote:
> Hi,
> 
> I have a problem running drbd 8.3.7-1 on Debian Lenny (2.6.26-AMD64-Xen).
> I have six drbd devices with a total of 3 TB. Both nodes are Supermicro AMD
> Opteron boxes (one 12-core, one 4-core) with a dedicated 1 GBit connection
> for DRBD and Adaptec 5800 RAID controllers. One side uses an NVIDIA
> forcedeth NIC, the other an Intel e1000. Protocol is C. The dom0 has 2 GB
> of RAM.
> 
> Basically, two symptoms can be observed, but I am not sure whether they are related:
> 
> 1. Data Integrity errors
> I get occasional data integrity errors (checksummed with crc32c) on both nodes 
> in the cluster. 
> 
> [ 8961.266879] block drbd3: Digest integrity check FAILED.
> [22846.253694] block drbd3: Digest integrity check FAILED.
> [23557.272471] block drbd3: Digest integrity check FAILED.
> 
> As recommended in earlier threads, I went through the standard procedures
> (disabling offloading, running memtest, replacing cables, replacing one of
> the boxes), but without success.
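> 
> For reference, offloading was disabled on the replication interface
> roughly as sketched below (eth1 is just a placeholder for the actual
> DRBD NIC):
> 
> # Turn off checksum and segmentation offloading on the DRBD NIC
> # (eth1 is a placeholder; adjust to the real interface name)
> ethtool -K eth1 rx off tx off sg off tso off gso off gro off
> # Confirm the resulting offload settings
> ethtool -k eth1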

Then your hardware is broken.
No more to say.

> The errors are only reported for devices for which the respective node is
> secondary.
>
> 2. oos after verify
> I always get a few oos sectors after verifying any device that has been
> used previously. These are not false positives; the sectors are in fact
> different:
> 
> 2,5c2,5
> < 0000010: 0000 0000 0800 0000 0000 00ff 0000 0000  ................
> < 0000020: 0000 0000 0000 0000 0000 0000 0000 0000  ................
> < 0000030: 0000 0000 ffff ffff ffff ffff 0000 0000  ................
> < 0000040: 0000 0400 0000 0000 0000 0000 0000 0000  ................
> ---
> > 0000010: 0000 0000 0800 0000 0000 19ff 0000 0000  ................
> > 0000020: 0000 002b 0000 0000 0000 0000 0000 0000  ...+............
> > 0000030: 0000 002b ffff ffff ffff ffff 0000 0000  ...+............
> > 0000040: 0000 0400 0000 0000 0002 8668 0000 0000  ...........h....
> 8c8
> < 0000070: 0000 0000 0000 0000 0000 0000 0000 0000  ................
> ---
> > 0000070: 0000 0f03 0000 0000 0000 0001 0000 0000  ................
> 
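> (The hex dumps above come from the backing devices on both nodes,
> roughly as sketched here; the device name and the sector offset are
> only placeholders for the block reported as out of sync:)
> 
> # Dump one 4 KiB block from the backing device on each node and diff
> # the xxd output; /dev/sdb1 and the sector offset are placeholders.
> dd if=/dev/sdb1 bs=512 skip=123456 count=8 2>/dev/null | xxd > node-a.hex
> # ... same command on the peer node, saved as node-b.hex, then:
> diff node-a.hex node-b.hex
> 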
> After disconnecting, reconnecting, and resyncing the device, they are
> identical again. This happens with random sectors on basically every
> verify.
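> 
> The sequence that makes them identical again is essentially the usual
> drbdadm cycle (the resource name below is a placeholder):
> 
> # verify marks the differing blocks as out of sync; a subsequent
> # disconnect/connect resynchronizes the blocks marked out of sync
> drbdadm verify r3
> drbdadm disconnect r3
> drbdadm connect r3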
> 
> Here is the relevant part of my global DRBD config:
> 
>        startup {
>                 wfc-timeout 60;
>                 degr-wfc-timeout 300;
>         }
> 
>         disk {
>                 on-io-error detach;
>         }
> 
>         net {
>                 cram-hmac-alg sha1;
>                 after-sb-0pri disconnect;
>                 after-sb-1pri disconnect;
>                 after-sb-2pri disconnect;
>                 data-integrity-alg crc32c;
>                 max-buffers 3000;
>                 max-epoch-size 8000;
>         }
> 
>         syncer {
>                 rate 25M;
>                 verify-alg crc32c;
>                 csums-alg crc32c;
>                 al-extents 257;
>         }
> 
> I tweaked the TCP settings using sysctl:
> 
> net.ipv4.tcp_rmem = 131072  131072  16777216
> net.ipv4.tcp_wmem = 131072  131072  16777216
> net.core.rmem_max = 10485760 
> net.core.wmem_max = 10485760 
> net.ipv4.tcp_mem = 96000 128000 256000
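> 
> (These were applied at runtime with sysctl -w and kept in
> /etc/sysctl.conf so that they survive a reboot, e.g.:)
> 
> # Example: apply one value at runtime; the full set lives in
> # /etc/sysctl.conf and can be reloaded with "sysctl -p"
> sysctl -w net.core.rmem_max=10485760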
> 
> 
> I am not sure in which direction to search next and would appreciate any
> suggestions.
> 
> Thanks.
> 
> Regards,
> Henning
> COM+ IT Consulting
> 

-- 
: Lars Ellenberg
: LINBIT | Your Way to High Availability
: DRBD/HA support and consulting http://www.linbit.com

DRBD® and LINBIT® are registered trademarks of LINBIT, Austria.
__
please don't Cc me, but send to list   --   I'm subscribed


