[DRBD-user] DRBD 8.4.2: "drbdadm verify" just do not work

Markus Müller drbd2012 at priv.de
Wed Sep 26 00:09:17 CEST 2012

Note: "permalinks" may not be as permanent as we would like,
direct links of old sources may well be a few messages off.


On 25.09.2012 11:28, Lars Ellenberg wrote:
> On Sun, Sep 23, 2012 at 12:18:57PM +0200, Markus Müller wrote:
>> Hello DRBD Users,
>>
>> I have a two-node DRBD setup running, and I was alarmed by the LINBIT
>> mail about sync problems with newer kernels. So I updated to 8.4.2
>> and tried to make sure everything is fine now.
>>
>> Even though the LINBIT mail says that no action is required after
>> upgrading, I tried the "drbdadm verify" feature, and it found "oos",
>> i.e. blocks that are not in sync. I thought, okay, good that this was
>> anticipated, and tried to fix it as described in the LINBIT mail by
>> doing "drbdadm disconnect/connect". That resynced the reported "oos:"
>> and I thought everything was fine, so I reran "drbdadm verify" just
>> to be sure. And it found even more "oos:"! I did another
>> "drbdadm disconnect/connect", but there was still more "oos:" after
>> the next "drbdadm verify". I repeated this several times and saw that
>> this does not fix the problem at all!
> If this is while the device was idle,
> it is an indication that your hardware flips bits.
>
> If it happens while the device is in use,
> certain usage patterns can cause blocks to be different,
> search for "digest integrity explained" in the list archives.
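
For reference, the verify/resync cycle I described in my quoted mail above was essentially the following (the resource name "r0" is just a placeholder for my actual resource):

  drbdadm verify r0        # online verify; differences show up as "oos:" in /proc/drbd
  grep oos /proc/drbd      # check how much is reported out of sync
  drbdadm disconnect r0    # disconnect and reconnect so the blocks found
  drbdadm connect r0       #   by the verify get resynced
  drbdadm verify r0        # verify again -- and it reports fresh "oos:"
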
To your question whether the device was idle: DRBD was stopped completely. I demoted the primary to secondary, then ran "drbdadm down" on both nodes. After that I flushed the kernel caches (echo 3 > /proc/sys/vm/drop_caches) on both sides and set up a fresh nbd server and nbd client so I could compare the backing devices directly.
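
Roughly, the whole sequence looked like this (resource name, hostname and device paths below are placeholders, not my exact configuration):

  drbdadm secondary r0                 # demote the former primary
  drbdadm down r0                      # take the resource down (run on both nodes)
  echo 3 > /proc/sys/vm/drop_caches    # drop the caches (run on both nodes)

  # export one backing device over nbd so both copies can be compared from one host:
  nbd-server 10809 /dev/sdb1           # on the peer node
  nbd-client as2 10809 /dev/nbd0       # on this node; "as2" stands for the peer's hostname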

I have tested this hardware very thoroughly -> it HAD and HAS no problems 
without the drbd module!

I don't think it is a good idea to dismiss verifiable bugs by insinuating 
buggy hardware!

LINBIT already found bugs with newer kernels; it just seems that there are 
more of them than previously thought. And I have bad news for you: I 
reactivated the array yesterday, used it, deactivated it again today, and 
there is NEW INCONSISTENCY:

Yesterday's run:

root@as1:~# perl /root/diff.pl
1 bad   101.656 GB
2 bad   101.657 GB
3 bad   102.018 GB
4 bad   102.019 GB
5 bad   107.151 GB
6 bad   107.152 GB
7 bad   111.034 GB
8 bad   111.035 GB
9 bad   131.833 GB
10 bad   131.834 GB
11 bad   132.559 GB
12 bad   132.56 GB
13 bad   137.735 GB
14 bad   137.736 GB
15 bad   140.642 GB
16 bad   140.643 GB
17 bad   141.094 GB
18 bad   141.095 GB
19 bad   535.806 GB
20 bad   535.807 GB
21 bad   556.083 GB
22 bad   566.681 GB
23 bad   599.43 GB
24 bad   619.899 GB
root@as1:~#

Today's run:

root@as1:~# perl diff.pl
1 bad   66.044 GB
2 bad   79.641 GB
3 bad   82.567 GB
4 bad   82.57 GB
5 bad   82.578 GB
6 bad   82.593 GB
7 bad   111.034 GB
8 bad   111.035 GB
9 bad   123.787 GB
10 bad   123.788 GB
11 bad   131.833 GB
12 bad   131.834 GB
13 bad   132.559 GB
14 bad   132.56 GB
15 bad   139.435 GB
16 bad   139.436 GB
17 bad   140.664 GB
18 bad   140.665 GB
19 bad   149.93 GB
20 bad   149.938 GB
21 bad   198.326 GB
22 bad   217.039 GB
23 bad   217.042 GB
24 bad   217.044 GB
25 bad   217.045 GB
26 bad   217.049 GB
27 bad   249.926 GB
28 bad   265.254 GB
29 bad   265.255 GB
30 bad   284.159 GB
31 bad   284.164 GB
32 bad   284.17 GB
33 bad   284.172 GB
34 bad   295.717 GB
35 bad   295.718 GB
36 bad   378.504 GB
37 bad   378.506 GB
38 bad   378.508 GB
39 bad   399.445 GB
40 bad   416.755 GB
41 bad   528.304 GB
42 bad   528.311 GB
43 bad   528.312 GB
44 bad   528.313 GB
45 bad   528.314 GB
46 bad   528.315 GB
47 bad   528.321 GB
48 bad   528.322 GB
49 bad   528.335 GB
root@as1:~#

It seems that I now have different and even more inconsistencies! This is 
absolutely unacceptable.
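
For context: diff.pl essentially just compares the local backing device with the peer's copy (imported via nbd) and prints the position of every difference in GB, as shown above. A simplified shell equivalent (chunk size and device paths here are placeholders, not my exact script) would be:

  #!/bin/bash
  LOCAL=/dev/sdb1       # local backing device (placeholder)
  REMOTE=/dev/nbd0      # peer's backing device imported via nbd (placeholder)
  CHUNK=$((1000*1000))  # 1 MB chunks, matching the 0.001 GB resolution above
  SIZE=$(blockdev --getsize64 "$LOCAL")
  n=0
  for ((off=0; off < SIZE; off += CHUNK)); do
      # compare one chunk from each device; report the offset if they differ
      if ! cmp -s <(dd if="$LOCAL"  bs=$CHUNK skip=$((off/CHUNK)) count=1 2>/dev/null) \
                  <(dd if="$REMOTE" bs=$CHUNK skip=$((off/CHUNK)) count=1 2>/dev/null); then
          n=$((n+1))
          awk -v n="$n" -v off="$off" 'BEGIN { printf "%d bad   %.3f GB\n", n, off/1e9 }'
      fi
  done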
