Hi,

I have a problem running drbd 8.3.7-1 on Debian Lenny (2.6.26-AMD64-Xen). I have six drbd devices with a total of 3 TB. Both nodes are Supermicro AMD Opteron boxes (one 12 core, one 4 core) with a dedicated 1 GBit connection for DRBD and Adaptec 5800 RAID controllers. One side is an NVIDIA forcedeth NIC, the other side an Intel e1000. Protocol is C. The dom0 has 2 GByte of RAM.

Basically two symptoms can be observed, but I am not sure if they are related:

1. Data integrity errors

I get occasional data integrity errors (checksummed with crc32c) on both nodes in the cluster.

[ 8961.266879] block drbd3: Digest integrity check FAILED.
[22846.253694] block drbd3: Digest integrity check FAILED.
[23557.272471] block drbd3: Digest integrity check FAILED.

As recommended before, I went through the standard procedures (disabling offloading, memtest, replacing cables, replacing one of the boxes), but without success. The errors are only reported for devices for which the respective node is secondary.

2. oos after verify

I always get a few out-of-sync (oos) sectors after verifying any device which has been used previously. These are not false positives; the sectors are in fact different:

2,5c2,5
< 0000010: 0000 0000 0800 0000 0000 00ff 0000 0000 ................
< 0000020: 0000 0000 0000 0000 0000 0000 0000 0000 ................
< 0000030: 0000 0000 ffff ffff ffff ffff 0000 0000 ................
< 0000040: 0000 0400 0000 0000 0000 0000 0000 0000 ................
---
> 0000010: 0000 0000 0800 0000 0000 19ff 0000 0000 ................
> 0000020: 0000 002b 0000 0000 0000 0000 0000 0000 ...+............
> 0000030: 0000 002b ffff ffff ffff ffff 0000 0000 ...+............
> 0000040: 0000 0400 0000 0000 0002 8668 0000 0000 ...........h....
8c8
< 0000070: 0000 0000 0000 0000 0000 0000 0000 0000 ................
---
> 0000070: 0000 0f03 0000 0000 0000 0001 0000 0000 ................

After disconnecting, reconnecting and resyncing the device, they are identical again. This happens with random sectors and on basically every verify.
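For anyone wanting to reproduce the sector comparison, it could be scripted roughly as follows. This is only a minimal sketch, assuming the relevant regions of the backing devices on both nodes have already been dumped to ordinary files (for example with dd) and copied to one machine. The function name differing_sectors, the file names, and the 512-byte sector size are illustrative assumptions and not part of DRBD.

```python
SECTOR_SIZE = 512  # assumed sector size in bytes; adjust if the devices differ


def differing_sectors(path_a, path_b, sector_size=SECTOR_SIZE):
    """Return the indices of sectors whose contents differ between two dumps.

    Both files are read in lockstep, one sector at a time. If one dump is
    shorter than the other, the missing sectors are reported as different.
    """
    diffs = []
    with open(path_a, "rb") as fa, open(path_b, "rb") as fb:
        index = 0
        while True:
            a = fa.read(sector_size)
            b = fb.read(sector_size)
            if not a and not b:
                break  # both dumps exhausted
            if a != b:
                diffs.append(index)
            index += 1
    return diffs


if __name__ == "__main__":
    # Hypothetical file names for dumps taken on each node.
    for sector in differing_sectors("node1.img", "node2.img"):
        print("sector %d differs" % sector)
```

The reported sector indices can then be checked against the ranges DRBD marked as out of sync during the verify run.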
Here is my relevant global config for drbd:

startup {
    wfc-timeout 60;
    degr-wfc-timeout 300;
}

disk {
    on-io-error detach;
}

net {
    cram-hmac-alg sha1;
    after-sb-0pri disconnect;
    after-sb-1pri disconnect;
    after-sb-2pri disconnect;
    data-integrity-alg crc32c;
    max-buffers 3000;
    max-epoch-size 8000;
}

syncer {
    rate 25M;
    verify-alg crc32c;
    csums-alg crc32c;
    al-extents 257;
}

I tweaked the TCP settings using sysctl:

net.ipv4.tcp_rmem = 131072 131072 16777216
net.ipv4.tcp_wmem = 131072 131072 16777216
net.core.rmem_max = 10485760
net.core.wmem_max = 10485760
net.ipv4.tcp_mem = 96000 128000 256000

I am not sure in which direction to search next and would be happy about any suggestions.

Thanks.

Regards,
Henning
COM+ IT Consulting