Hello All,

I have a 3-node system with 25 DRBD-mirrored partitions; their total size is about 250GB. The three nodes are:

- immortal: Intel 82573L Gigabit Ethernet NIC (kernel 2.6.21.6, driver: e1000, version: 7.3.20-k2-NAPI, firmware-version: 0.5-7)
- endless: Intel 82566DM-2 Gigabit Ethernet NIC (kernel 2.6.22.18, driver: e1000, version: 7.6.15.4, firmware-version: 1.3-0)
- infinity: Intel 82573E Gigabit Ethernet NIC (kernel 2.6.22.18, driver: e1000, version: 7.6.15.4, firmware-version: 3.1-7)

One month ago I switched from DRBD 7.x to 8.2.5. I had used the 7.x series without problems. The update itself went smoothly: the two halves of each mirror connected and synced cleanly.

After the update I started to verify the DRBD volumes:

- most of them usually have no out-of-sync (oos) blocks
- one gets 2-3 new oos blocks almost every day
- a few of them get a new oos block about once a week

I am trying to track down the source of the oos blocks. I read through the drbd-user forums and found very useful hints in the "Tracking down sources of corruption (possibly) detected by drbdadm verify" thread.

I checked the network connections between every pair of nodes, in every direction, with this test:

    host1:~ # md5sum /tmp/file_with_1GB_random_data
    host2:~ # netcat -l -p 4999 | md5sum
    host1:~ # netcat -q0 192.168.x.x 4999 < /tmp/file_with_1GB_random_data

The test always gives the same md5sum on both tested nodes, and the transfer speed is about 100MB/sec when the file is cached. I repeated this test many times between every pair of nodes and found no md5 mismatch.

I saved the oos blocks from the underlying device with commands like this:

    host:~ # dd iflag=direct bs=512 skip=11993992 count=8 if=/dev/immortal0/65data2 | xxd -a > ./primary_4k_dump

when the syslog message was

    Apr 17 11:14:09 immortal kernel: drbd6: Out of sync: start=11993992, size=8 (sectors)

and the primary underlying device was /dev/immortal0/65data2.
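The mapping from the syslog message to the dd parameters can be sketched as a small shell helper. This is only an illustration: the function name oos_to_dd is hypothetical, and it assumes the "Out of sync: start=N, size=M (sectors)" message format shown above, with start and size both given in 512-byte sectors. It only prints the dd command, so it can be checked before it is run.

```shell
#!/bin/sh
# Hypothetical helper (not part of DRBD): turn an "Out of sync" syslog line
# into the matching dd command. Assumes the message format quoted in this
# mail; other DRBD versions may log it differently.
oos_to_dd() {
    line=$1      # full syslog line
    backing=$2   # backing device of the DRBD volume

    # Pattern capturing the start sector (\1) and the length in sectors (\2).
    pat='.*Out of sync: start=\([0-9][0-9]*\), size=\([0-9][0-9]*\).*'
    start=$(printf '%s\n' "$line" | sed -n "s/$pat/\1/p")
    size=$(printf '%s\n' "$line" | sed -n "s/$pat/\2/p")

    # Refuse to print a command if the line did not match.
    if [ -z "$start" ] || [ -z "$size" ]; then
        echo "no 'Out of sync' record found" >&2
        return 1
    fi

    # bs=512 matches the sector unit used in the log message.
    echo "dd iflag=direct bs=512 skip=$start count=$size if=$backing"
}
```

Called with the syslog line above and /dev/immortal0/65data2 as arguments, it prints the same dd invocation that was used for the primary dump; the output can then be piped through xxd -a as before.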
I compared the problematic blocks from the two nodes with diff:

    host:~ # diff ./primary_4k_dump ./secondary_4k_dump

I usually found a difference of 1-2 bytes between the blocks on the two nodes, but once I found that the last 1336 bytes of the block were zeroed out (on the other node it had "random" data). Two examples:

One 4k oos block:

    c2
    < 0000010: 0000 0000 1500 0000 0000 01ff 0000 0000  ................
    ---
    > 0000010: 0000 0000 1500 0000 0001 01ff 0000 0000  ................

Another 4k oos block:

    22c22
    < 00001f0: 0b85 0000 0000 0000 1800 0000 0000 0000  ................
    ---
    > 00001f0: 2d79 0000 0000 0000 1800 0000 0000 0000  -y..............

Any ideas would help.

Thank you all for your time,
Gergely Szerovay