On 25.09.2012 11:28, Lars Ellenberg wrote:
> On Sun, Sep 23, 2012 at 12:18:57PM +0200, Markus Müller wrote:
>> Hello DRBD Users,
>>
>> I have a two-node DRBD setup running, and was alarmed by the LINBIT
>> mail about sync problems with newer kernels. So I updated to 8.4.2
>> and tried to make sure everything is fine now.
>>
>> Even though the LINBIT mail says that no action is required after
>> upgrading, I tried the "drbdadm verify" feature. And it found "oos",
>> meaning blocks not in sync. I thought, okay, good that I checked
>> that, and tried to fix this as described in the LINBIT mail by doing
>> "drbdadm disconnect/connect". It resynced the "oos:" blocks it had
>> found and I thought everything was fine, so I reran "drbdadm verify"
>> just to be sure. And I saw... it had found more "oos:"! I did another
>> "drbdadm disconnect/connect", but there were still more "oos:" after
>> the next "drbdadm verify". I repeated this loop several times and saw
>> that it does not fix the problem at all!
>
> If this is while the device was idle,
> it is an indication that your hardware flips bits.
>
> If it happens while the device is in use,
> certain usage patterns can cause blocks to be different,
> search for "digest integrity explained" in the list archives.

DRBD had been stopped by demoting the primary to secondary, and then I ran "drbdadm down" on both nodes. Then I flushed the kernel cache (echo 3 > /proc/sys/vm/drop_caches) on both sides and started a new nbd server and a new nbd client. I've tested this hardware very thoroughly: it HAD and HAS no problems without the drbd module! I don't think it's a good idea to reject verifiable bugs by insinuating buggy hardware!!!
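The diff.pl script itself is not included in the message, so the following is only a hypothetical Python sketch of what such an offline comparison might look like: with DRBD down on both nodes and caches dropped, read the two backing devices (one of them reached through the nbd client) in fixed-size chunks and print the offset of every chunk that differs, in the same "N bad X GB" style. The chunk size, the decimal-GB unit (10^9 bytes), and the names `diff_chunks` and `format_gb` are assumptions, not taken from the original script.

```python
import sys
from typing import BinaryIO, Iterator, Optional

CHUNK = 1 << 20  # assumed comparison granularity: 1 MiB


def diff_chunks(fa: BinaryIO, fb: BinaryIO, chunk: int = CHUNK,
                length: Optional[int] = None) -> Iterator[int]:
    """Yield the byte offset of every chunk whose contents differ.

    If `length` is given, stop after that many bytes, e.g. to leave out
    DRBD internal meta-data, which lives at the end of the backing device.
    """
    offset = 0
    while length is None or offset < length:
        n = chunk if length is None else min(chunk, length - offset)
        a = fa.read(n)
        b = fb.read(n)
        if not a and not b:
            break  # both inputs exhausted
        if a != b:
            yield offset  # also fires if one input ends before the other
        offset += n


def format_gb(offset: int) -> str:
    """Render a byte offset as decimal gigabytes, rounded to 3 places."""
    return str(round(offset / 1e9, 3))


def main(path_a: str, path_b: str) -> None:
    # Both paths are opened read-only; nothing is written to the devices.
    with open(path_a, "rb") as fa, open(path_b, "rb") as fb:
        for i, off in enumerate(diff_chunks(fa, fb), start=1):
            print(f"{i} bad {format_gb(off)} GB")


if __name__ == "__main__" and len(sys.argv) == 3:
    main(sys.argv[1], sys.argv[2])
```

Usage would be along the lines of `python3 diff_sketch.py <local backing device> <nbd device>`, run only while DRBD is down on both nodes, so that neither side changes during the read.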
LINBIT has already found bugs with new kernels; it just seems that there are more than thought. And I have bad news for you: I reactivated the array yesterday, used it, and deactivated it again today, and there is NEW INCONSISTENCY:

Yesterday's run:

root@as1:~# perl /root/diff.pl
1 bad 101.656 GB
2 bad 101.657 GB
3 bad 102.018 GB
4 bad 102.019 GB
5 bad 107.151 GB
6 bad 107.152 GB
7 bad 111.034 GB
8 bad 111.035 GB
9 bad 131.833 GB
10 bad 131.834 GB
11 bad 132.559 GB
12 bad 132.56 GB
13 bad 137.735 GB
14 bad 137.736 GB
15 bad 140.642 GB
16 bad 140.643 GB
17 bad 141.094 GB
18 bad 141.095 GB
19 bad 535.806 GB
20 bad 535.807 GB
21 bad 556.083 GB
22 bad 566.681 GB
23 bad 599.43 GB
24 bad 619.899 GB
root@as1:~#

Today's run:

root@as1:~# perl diff.pl
1 bad 66.044 GB
2 bad 79.641 GB
3 bad 82.567 GB
4 bad 82.57 GB
5 bad 82.578 GB
6 bad 82.593 GB
7 bad 111.034 GB
8 bad 111.035 GB
9 bad 123.787 GB
10 bad 123.788 GB
11 bad 131.833 GB
12 bad 131.834 GB
13 bad 132.559 GB
14 bad 132.56 GB
15 bad 139.435 GB
16 bad 139.436 GB
17 bad 140.664 GB
18 bad 140.665 GB
19 bad 149.93 GB
20 bad 149.938 GB
21 bad 198.326 GB
22 bad 217.039 GB
23 bad 217.042 GB
24 bad 217.044 GB
25 bad 217.045 GB
26 bad 217.049 GB
27 bad 249.926 GB
28 bad 265.254 GB
29 bad 265.255 GB
30 bad 284.159 GB
31 bad 284.164 GB
32 bad 284.17 GB
33 bad 284.172 GB
34 bad 295.717 GB
35 bad 295.718 GB
36 bad 378.504 GB
37 bad 378.506 GB
38 bad 378.508 GB
39 bad 399.445 GB
40 bad 416.755 GB
41 bad 528.304 GB
42 bad 528.311 GB
43 bad 528.312 GB
44 bad 528.313 GB
45 bad 528.314 GB
46 bad 528.315 GB
47 bad 528.321 GB
48 bad 528.322 GB
49 bad 528.335 GB
root@as1:~#

It seems that I now have different and more inconsistencies! This is absolutely unacceptable.

-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.linbit.com/pipermail/drbd-user/attachments/20120926/78103a8e/attachment.htm>
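To make "different and more inconsistencies" concrete, the two diff.pl runs quoted above can be compared mechanically. This is a minimal Python sketch that copies the offsets exactly as printed; it assumes both runs used the same script and units. Because diff.pl prints offsets rounded to three decimals, an offset appearing in both runs suggests, but does not prove, that the same region is affected. The name `compare_runs` is hypothetical.

```python
# Offsets (in GB) printed by the two diff.pl runs, copied verbatim.
YESTERDAY = [
    101.656, 101.657, 102.018, 102.019, 107.151, 107.152, 111.034, 111.035,
    131.833, 131.834, 132.559, 132.56, 137.735, 137.736, 140.642, 140.643,
    141.094, 141.095, 535.806, 535.807, 556.083, 566.681, 599.43, 619.899,
]
TODAY = [
    66.044, 79.641, 82.567, 82.57, 82.578, 82.593, 111.034, 111.035,
    123.787, 123.788, 131.833, 131.834, 132.559, 132.56, 139.435, 139.436,
    140.664, 140.665, 149.93, 149.938, 198.326, 217.039, 217.042, 217.044,
    217.045, 217.049, 249.926, 265.254, 265.255, 284.159, 284.164, 284.17,
    284.172, 295.717, 295.718, 378.504, 378.506, 378.508, 399.445, 416.755,
    528.304, 528.311, 528.312, 528.313, 528.314, 528.315, 528.321, 528.322,
    528.335,
]


def compare_runs(old, new):
    """Return (in both runs, only in new run, only in old run), sorted."""
    a, b = set(old), set(new)
    return sorted(a & b), sorted(b - a), sorted(a - b)


persistent, appeared, vanished = compare_runs(YESTERDAY, TODAY)
print(f"{len(persistent)} in both runs: {persistent}")
# → 6 in both runs: [111.034, 111.035, 131.833, 131.834, 132.559, 132.56]
print(f"{len(appeared)} only in today's run, {len(vanished)} only in yesterday's")
# → 43 only in today's run, 18 only in yesterday's
```

On these numbers, six offsets recur in both runs, while most of today's 49 are new and 18 of yesterday's 24 no longer show up.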