Well, I've updated both nodes to kernel 2.6.35.11 and DRBD 8.3.10, running CentOS 5.5.

The good news: I did a create-md and a full resync on the secondary node. No more ASSERTs logged. :-)

The bad news: The integrity checks still fail. Just the logs for today:

Feb 15 02:49:35 drbd0: Digest integrity check FAILED: 713210072s +4096
Feb 15 06:16:33 drbd0: Digest integrity check FAILED: 713142808s +4096
Feb 15 07:10:39 drbd0: Digest integrity check FAILED: 713049088s +4096
Feb 15 08:47:22 drbd0: Digest integrity check FAILED: 713119656s +4096
Feb 15 09:15:24 drbd0: Digest integrity check FAILED: 713215448s +4096
Feb 15 10:11:01 drbd0: Digest integrity check FAILED: 713232072s +4096
Feb 15 11:12:44 drbd0: Digest integrity check FAILED: 713239944s +4096
Feb 15 11:30:40 drbd0: Digest integrity check FAILED: 713106328s +4096
Feb 15 11:36:22 drbd0: Digest integrity check FAILED: 713151800s +4096
Feb 15 11:40:22 drbd0: Digest integrity check FAILED: 713166384s +4096
Feb 15 13:55:41 drbd0: Digest integrity check FAILED: 713138680s +4096
Feb 15 15:11:14 drbd0: Digest integrity check FAILED: 713189472s +4096

Searching the list shows the possible reasons:
http://lists.linbit.com/pipermail/drbd-user/2008-January/008343.html

- bit flip (in either sha1 or data) on the way from main memory to NIC (which would go undetected by tcp checksum when you have offloading enabled)
- bit flip on the way from NIC to main memory (the same)
- any form of corruption due to a race condition or bug in NIC firmware or driver
- bit flip/random corruption by some reassembling network component along the way (not in your case, as I understand you use a direct passive link)
- the application (when using direct-io), respectively the file system, re-using (modifying) the write buffer while it is in flight, without waiting for the write to complete first (unlikely, but we start to believe that we may have evidence this does indeed happen under certain circumstances)
- bug in drbd miscalculating stuff (would show up more often)

Now, I can probably rule out any NIC problems after transferring a couple of TiB using nc (over plain TCP, just like DRBD) in both directions. Every sha1 checksum of the 100 GiB test file was ok... The NICs are directly connected by a crossover cable, so no switch is involved.

This leaves just the last two possibilities, right? How can I test or debug them further?

If you need any information regarding my setup, please let me know.

Regards,
Walter
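
One cheap first step for telling the last two possibilities apart might be to check whether the failing sectors cluster in one region of the device, since a narrow region could hint at a single file or application rewriting its buffers. The following is only a minimal sketch: it assumes the "s" suffix in the log lines counts 512-byte sectors and that the number after "+" is the request size in bytes. The function names `parse_failures` and `summarize` are made up for illustration and are not part of DRBD.

```python
import re

# Assumption: DRBD reports the start position in 512-byte sectors ("...s").
SECTOR_BYTES = 512

# Matches e.g. "Digest integrity check FAILED: 713210072s +4096"
LINE_RE = re.compile(r"Digest integrity check FAILED: (\d+)s \+(\d+)")


def parse_failures(log_text):
    """Return a list of (start_sector, length_bytes), one per failure line."""
    return [(int(sector), int(length))
            for sector, length in LINE_RE.findall(log_text)]


def summarize(failures):
    """Summarize where on the device the failures fall.

    Returns None when no failures were found.
    """
    if not failures:
        return None
    sectors = [sector for sector, _ in failures]
    lo, hi = min(sectors), max(sectors)
    return {
        "count": len(failures),
        "min_sector": lo,
        "max_sector": hi,
        # Byte offset of the lowest failing request on the DRBD device.
        "min_byte_offset": lo * SECTOR_BYTES,
        # Distance between the lowest and highest failing start positions.
        "span_bytes": (hi - lo) * SECTOR_BYTES,
    }
```

To use it, pass the relevant part of the kernel log as a string, for example `summarize(parse_failures(open(path).read()))`, where `path` is wherever syslog writes kernel messages on the node (not stated in this thread). A small `span_bytes` relative to the device size would suggest looking at what lives in that region of the file system.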