[DRBD-user] Recovering from erroneous sync state

Lars Ellenberg lars.ellenberg at linbit.com
Wed May 23 22:45:19 CEST 2012



On Wed, May 23, 2012 at 03:34:27PM -0500, Zev Weiss wrote:
> 
> On May 23, 2012, at 3:22 PM, Florian Haas wrote:
> 
> > On Wed, May 23, 2012 at 10:14 PM, Zev Weiss <zweiss at scout.wisc.edu> wrote:
> >> Hi,
> >> 
> >> I'm running DRBD 8.3.12, and recently hit what looks to me like a bug that was listed as fixed in 8.3.13 -- getting into a state where both nodes are in SyncSource (it's just stuck like that, going nowhere).  Luckily this happened on a test resource and not a live one, so it's not a big problem, but I was wondering if there were any known ways of recovering it without doing anything disruptive to the other resources (e.g. rebooting or unloading the kernel module).
> >> 
> >> I've tried 'drbdadm down', but it just hangs -- anyone have any other suggestions?  It doesn't really matter to me if it wipes the resource or anything, I'd just like to have my test device back in a working state without disturbing anything else.
> > 
> > Can you post /proc/drbd contents from both nodes here?
> > 
> 
> Sure -- here's one node:
> 
> version: 8.3.12 (api:88/proto:86-96)
> GIT-hash: e2a8ef4656be026bbae540305fcb998a5991090f build by zweiss at mydomain, 2012-03-14 19:52:38
> 
> <snip other resources>
>  9: cs:SyncSource ro:Secondary/Primary ds:UpToDate/Inconsistent C r-----
>     ns:0 nr:0 dw:0 dr:0 al:0 bm:0 lo:0 pe:0 ua:0 ap:0 ep:1 wo:d oos:65536
>         [>...................] sync'ed:  5.9% (65536/65536)K
>         finish: 19046:04:53 speed: 0 (0 -- 0) K/sec (stalled)
>           0% sector pos: 0/10698352
>         resync: used:0/61 hits:0 misses:0 starving:0 dirty:0 changed:0
>         act_log: used:0/3389 hits:0 misses:0 starving:0 dirty:0 changed:0

drbdsetup 9 disconnect --force
may work,
provided you did not already try a non-forced disconnect or similar,
that is to say, provided the drbd worker thread is not blocked yet.

You can always cut the tcp connection using iptables,
which should at least get the worker into a responsive state again.
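A minimal sketch of the iptables route, assuming you know the peer's address and this resource's replication port from the resource's "address" lines in drbd.conf (192.168.1.2 and 7789 below are placeholders, not values from this thread). Since every DRBD resource uses its own port, rules scoped to that port should leave the other resources' connections alone. The helper drbd_cut_cmds is hypothetical, not part of drbd-utils; it only prints the rules so they can be checked before being run as root.

```shell
#!/bin/sh
# Hypothetical helper (not part of drbd-utils): print iptables commands
# that drop all TCP traffic between this node and the DRBD peer on one
# resource's replication port. Nothing is executed here; pipe the output
# to "sh" as root to apply it.
#   $1 = peer IP address (placeholder, take it from drbd.conf)
#   $2 = replication port of the stuck resource (placeholder)
#   $3 = "-I" to insert the rules (default) or "-D" to delete them again
drbd_cut_cmds() {
    peer_ip=$1
    port=$2
    action=${3:--I}
    # The replication port shows up as either the source or the destination
    # port, depending on which node opened a given connection, so match both.
    for dir in sport dport; do
        echo "iptables $action INPUT -p tcp -s $peer_ip --$dir $port -j DROP"
        echo "iptables $action OUTPUT -p tcp -d $peer_ip --$dir $port -j DROP"
    done
}

# Example with placeholder values: print the rules for review.
drbd_cut_cmds 192.168.1.2 7789
# drbd_cut_cmds 192.168.1.2 7789 | sh       # apply (as root)
# drbd_cut_cmds 192.168.1.2 7789 -D | sh    # remove the rules again
```

Printing instead of executing keeps the sketch safe to try, and the -D form deletes exactly the rules that -I inserted.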

> And here's the other:
> 
> version: 8.3.12 (api:88/proto:86-96)
> GIT-hash: e2a8ef4656be026bbae540305fcb998a5991090f build by zweiss at fromage.scout.wisc.edu, 2012-03-14 19:52:38
> 
> <snip other resources>
>  9: cs:SyncSource ro:Secondary/Secondary ds:UpToDate/Inconsistent C r-----
>     ns:0 nr:0 dw:0 dr:664 al:0 bm:0 lo:0 pe:0 ua:0 ap:0 ep:1 wo:d oos:65536
>         [>...................] sync'ed:  5.9% (65536/65536)K
>         finish: 18987:55:05 speed: 0 (0 -- 0) K/sec (stalled)
>           0% sector pos: 0/10698352
>         resync: used:0/61 hits:0 misses:0 starving:0 dirty:0 changed:0
>         act_log: used:0/3389 hits:0 misses:0 starving:0 dirty:0 changed:0
> 
> 
> > Also, if it still returns a meaningful result, you could add "drbdadm
> > get-gi <resource>" for the affected resource.
> 
> That just hangs, unfortunately (on both nodes).
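To check the connection state from a script rather than by eye, here is a rough sketch that extracts the cs:, ro: and ds: fields for one minor from /proc/drbd. It assumes the DRBD 8.3 status-line layout quoted above, and the function name drbd_state is hypothetical. Running it for minor 9 on both nodes and getting SyncSource from each would flag the stuck state discussed here.

```shell
#!/bin/sh
# Hypothetical helper: print "cs ro ds" for one DRBD minor, parsed from
# /proc/drbd-style text (DRBD 8.3 status-line format assumed).
#   $1 = minor number
#   $2 = file to read (default /proc/drbd)
# Prints nothing if the minor is not listed.
drbd_state() {
    minor=$1
    file=${2:-/proc/drbd}
    awk -v m="$minor:" '
        # The status line starts with the minor, e.g. " 9: cs:SyncSource ..."
        $1 == m {
            for (i = 2; i <= NF; i++) {
                # Split "key:value" fields such as "ro:Secondary/Primary".
                split($i, kv, ":")
                if (kv[1] == "cs" || kv[1] == "ro" || kv[1] == "ds")
                    val[kv[1]] = kv[2]
            }
            print val["cs"], val["ro"], val["ds"]
            exit
        }
    ' "$file"
}

# Usage (on each node): drbd_state 9
```

For the first node's output above, drbd_state 9 would print "SyncSource Secondary/Primary UpToDate/Inconsistent".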

-- 
: Lars Ellenberg
: LINBIT | Your Way to High Availability
: DRBD/HA support and consulting http://www.linbit.com

DRBD® and LINBIT® are registered trademarks of LINBIT, Austria.
__
please don't Cc me, but send to list   --   I'm subscribed
