[DRBD-user] Digest mismatch resulting in "split brain" after (!) automatic reconnect

Raoul Bhatia [IPAX] r.bhatia at ipax.at
Fri Mar 11 19:18:06 CET 2011


On 02/21/2011 02:18 PM, Lars Ellenberg wrote:
>>> Feb 16 06:25:04 c02n01 kernel: [3687390.947555] block drbd1: pdsk( UpToDate -> DUnknown )
>>>
>>> This should not have happened, either:
>>> We must not change the pdsk state to DUnknown while keeping conn state at Connected.
>>> That's nonsense.
>>>
>>> Feb 16 06:25:04 c02n01 kernel: [3687390.947633] block drbd1: new current UUID 89084B22FE454C03:3C1DADF6B38C1AD7:E7E50184F3F3AC0B:E7E40184F3F3AC0B 
>>
>> please let me know if you need any further input from my side.
> 
> Only if it is easily reproducible, and if so, how.
> Sorry, if you wrote that somewhere already, I missed it.
> Just write it again.

i tried to reproduce the problem by rapidly dropping and restoring a
469MB SQL dump file.

unfortunately, this did not reproduce the problem.
i'll retry with a bigger dump file in the next few days.




one other thing: i downgraded most of our servers to drbd 8.3.7 and
linux 2.6.32-bpo.5-amd64 (debian bpo). with this setup, i still see
some "Digest integrity check FAILED." messages, but the resync works
without any problem.




however, i have one production cluster where i have not yet managed to
downgrade both nodes. my current setup is:

wc01: master with drbd 8.3.10, kernel 2.6.27.57+ipax (self compiled)
wc02: slave with drbd 8.3.7, kernel 2.6.32-bpo.5-amd64 (debian bpo)


in this setup, i still see the described issue:
> root@wc01 ~ # ssh wc01c cat /proc/drbd ; ssh wc02c cat /proc/drbd

wc01:
> version: 8.3.10 (api:88/proto:86-96)
> GIT-hash: 5c0b0469666682443d4785d90a2c603378f9017b build by root@k000866c.ipax.at, 2011-02-03 14:58:22
>  0: cs:Connected ro:Primary/Secondary ds:UpToDate/DUnknown C r-----
>     ns:59832072 nr:0 dw:410069420 dr:1031415173 al:3623746 bm:11501 lo:16 pe:0 ua:0 ap:16 ep:1 wo:b oos:11453780
>  1: cs:Connected ro:Primary/Secondary ds:UpToDate/UpToDate C r-----
>     ns:15565136 nr:8918376 dw:144977376 dr:62777630 al:157147 bm:746 lo:0 pe:0 ua:0 ap:0 ep:1 wo:d oos:0
please note the DUnknown peer disk state and the non-zero oos value on drbd0.

wc02:
> version: 8.3.7 (api:88/proto:86-91)
> srcversion: EE47D8BF18AC166BE219757 
>  0: cs:Connected ro:Secondary/Primary ds:UpToDate/UpToDate C r----
>     ns:0 nr:31434060 dw:31434060 dr:0 al:0 bm:25 lo:0 pe:0 ua:0 ap:0 ep:1 wo:b oos:0
>  1: cs:Connected ro:Secondary/Primary ds:UpToDate/UpToDate C r----
>     ns:0 nr:15485480 dw:15485480 dr:0 al:0 bm:24 lo:0 pe:0 ua:0 ap:0 ep:1 wo:b oos:0
wc02 thinks that everything is fine.
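
a minimal sketch of how this kind of mismatch could be spotted
automatically, assuming the /proc/drbd layout shown in the two outputs
above. the function names parse_proc_drbd and suspicious are made up
for illustration and are not part of drbd or its tools; the check
itself (conn state Connected while a disk state is DUnknown or oos is
non-zero) is only my reading of the symptom, not an official rule.

```python
import re

# device status line, e.g.
# " 0: cs:Connected ro:Primary/Secondary ds:UpToDate/DUnknown C r-----"
DEVICE_RE = re.compile(r"^\s*(\d+):\s+cs:(\S+)\s+ro:(\S+)\s+ds:(\S+)")
# counter line following a device line, e.g. "... wo:b oos:11453780"
OOS_RE = re.compile(r"\boos:(\d+)")


def parse_proc_drbd(text):
    """Return {minor: {"cs", "ro", "ds", "oos"}} parsed from /proc/drbd text."""
    devices = {}
    current = None
    for raw in text.splitlines():
        line = raw.lstrip("> ")  # tolerate mail quoting in pasted output
        match = DEVICE_RE.match(line)
        if match:
            current = int(match.group(1))
            devices[current] = {
                "cs": match.group(2),
                "ro": match.group(3),
                "ds": match.group(4),
                "oos": None,
            }
            continue
        match = OOS_RE.search(line)
        if match and current is not None:
            devices[current]["oos"] = int(match.group(1))
    return devices


def suspicious(devices):
    """List minors that report Connected but have DUnknown or oos > 0."""
    flagged = []
    for minor, dev in sorted(devices.items()):
        if dev["cs"] != "Connected":
            continue
        disk_unknown = "DUnknown" in dev["ds"].split("/")
        out_of_sync = (dev["oos"] or 0) > 0
        if disk_unknown or out_of_sync:
            flagged.append(minor)
    return flagged


if __name__ == "__main__":
    with open("/proc/drbd") as handle:
        print(suspicious(parse_proc_drbd(handle.read())))
```

run on each node, this would flag drbd0 on wc01 and nothing on wc02,
which is exactly the disagreement described here.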



i don't know whether this is of any help to you; if it does not
matter, please just ignore it.

i can provide the logfiles too.

thanks,
raoul
-- 
____________________________________________________________________
DI (FH) Raoul Bhatia M.Sc.          email.          r.bhatia at ipax.at
Technischer Leiter

IPAX - Aloy Bhatia Hava OG          web.          http://www.ipax.at
Barawitzkagasse 10/2/2/11           email.            office at ipax.at
1190 Wien                           tel.               +43 1 3670030
FN 277995t HG Wien                  fax.            +43 1 3670030 15
____________________________________________________________________


