On Wed, Mar 17, 2010 at 07:08:20AM +0000, Henning Bitsch wrote:
> Hi,
>
> I have a problem running drbd 8.3.7-1 on Debian Lenny (2.6.26-AMD64-Xen).
> I have six drbd devices with a total of 3 TB. Both nodes are Supermicro AMD
> Opteron boxes (one 12 core, one 4 core) with a dedicated 1 GBit connection for
> DRBD and Adaptec 5800 Raid controllers. One side is an NVIDIA forcedeth NIC,
> the other side an Intel e1000. Protocol is C. The dom0 has 2 GByte of RAM.
>
> Basically two symptoms can be observed but I am not sure if they are related:
>
> 1. Data Integrity errors
> I get occasional data integrity errors (checksummed with crc32c) on both nodes
> in the cluster.
>
> [ 8961.266879] block drbd3: Digest integrity check FAILED.
> [22846.253694] block drbd3: Digest integrity check FAILED.
> [23557.272471] block drbd3: Digest integrity check FAILED.
>
> As recommended before, I did the standard procedures (disable offloading,
> memtest, replacing cables, replacing one of the boxes) but without success.

Then your hardware is broken.
No more to say.

> The errors are only reported for devices which the respective node is
> secondary for.
>
> 2. oos after verify
> I always get a few oos sectors after verifying any device which has been used
> previously. These are no false positives, the sectors are in fact different:
>
> 2,5c2,5
> < 0000010: 0000 0000 0800 0000 0000 00ff 0000 0000 ................
> < 0000020: 0000 0000 0000 0000 0000 0000 0000 0000 ................
> < 0000030: 0000 0000 ffff ffff ffff ffff 0000 0000 ................
> < 0000040: 0000 0400 0000 0000 0000 0000 0000 0000 ................
> ---
> > 0000010: 0000 0000 0800 0000 0000 19ff 0000 0000 ................
> > 0000020: 0000 002b 0000 0000 0000 0000 0000 0000 ...+............
> > 0000030: 0000 002b ffff ffff ffff ffff 0000 0000 ...+............
> > 0000040: 0000 0400 0000 0000 0002 8668 0000 0000 ...........h....
> 8c8
> < 0000070: 0000 0000 0000 0000 0000 0000 0000 0000 ................
> ---
> > 0000070: 0000 0f03 0000 0000 0000 0001 0000 0000 ................
>
> After dis/reconnect/resyncing the device, they are identical again. This
> happens with random sectors and basically every verify.
>
> Here is my relevant global config for drbd.
>
> startup {
>     wfc-timeout 60;
>     degr-wfc-timeout 300;
> }
>
> disk {
>     on-io-error detach;
> }
>
> net {
>     cram-hmac-alg sha1;
>     after-sb-0pri disconnect;
>     after-sb-1pri disconnect;
>     after-sb-2pri disconnect;
>     data-integrity-alg crc32c;
>     max-buffers 3000;
>     max-epoch-size 8000;
> }
>
> syncer {
>     rate 25M;
>     verify-alg crc32c;
>     csums-alg crc32c;
>     al-extents 257;
> }
>
> I tweaked the tcp settings using sysctl
>
> net.ipv4.tcp_rmem = 131072 131072 16777216
> net.ipv4.tcp_wmem = 131072 131072 16777216
> net.core.rmem_max = 10485760
> net.core.wmem_max = 10485760
> net.ipv4.tcp_mem = 96000 128000 256000
>
> I am not sure in which direction to search next and would be happy about any
> suggestions.
>
> Thanks.
>
> Regards,
> Henning
> COM+ IT Consulting

-- 
: Lars Ellenberg
: LINBIT | Your Way to High Availability
: DRBD/HA support and consulting http://www.linbit.com

DRBD® and LINBIT® are registered trademarks of LINBIT, Austria.
__
please don't Cc me, but send to list -- I'm subscribed
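
The comparison described in the quoted message (dumping the same region on both nodes and checking which sectors actually differ) can be sketched as below. This is only an illustration of that manual check, not how DRBD's online verify works internally. The function name `differing_sectors` is hypothetical, the 512-byte sector size is an assumption, and the dumps are stand-in byte strings rather than real device reads.

```python
from typing import List

SECTOR_SIZE = 512  # assumption: 512-byte sectors


def differing_sectors(a: bytes, b: bytes,
                      sector_size: int = SECTOR_SIZE) -> List[int]:
    """Return the indices of sectors whose contents differ between two dumps.

    Hypothetical helper: `a` and `b` stand for dumps of the same region
    taken from the backing device on each node.
    """
    if len(a) != len(b):
        raise ValueError("dumps must have the same length")
    diffs = []
    for offset in range(0, len(a), sector_size):
        # Compare one sector-sized slice at a time.
        if a[offset:offset + sector_size] != b[offset:offset + sector_size]:
            diffs.append(offset // sector_size)
    return diffs


if __name__ == "__main__":
    # Stand-in data: four zeroed sectors, with one byte changed in sector 2.
    primary = bytes(4 * SECTOR_SIZE)
    secondary = bytearray(primary)
    secondary[2 * SECTOR_SIZE + 0x1A] = 0x19
    print(differing_sectors(primary, bytes(secondary)))
```

Reporting sector indices rather than byte offsets makes it easier to see whether repeated verify runs hit the same sectors or, as reported above, random ones.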