[Drbd-dev] DRBD8: Stuck in WFBitMapS state even across reboot.

Montrose, Ernest Ernest.Montrose at stratus.com
Thu Nov 30 21:12:13 CET 2006


Lars,
Interesting... Actually, I am currently investigating a situation where,
from an initial creation state, two of my 4 devices will sync while the
other two get stuck in the Inconsistent/Inconsistent disk state and
never sync.
Your analysis might hold for that case too. It only happens when my net
configuration is:
rr-conflict violently;
after-sb-0pri discard-zero-changes;
after-sb-1pri violently-as0p;
after-sb-2pri violently-as0p;

This happens automatically the first time we install and enable drbd.
The configuration above works great if you install without these
options, sync, and then add them later and reboot.

EM--
-----Original Message-----
From: drbd-dev-bounces at linbit.com [mailto:drbd-dev-bounces at linbit.com]
On Behalf Of Lars Ellenberg
Sent: Thursday, November 30, 2006 3:00 PM
To: drbd-dev at linbit.com
Subject: Re: [Drbd-dev] DRBD8: Stuck in WFBitMapS state even across
reboot.

/ 2006-11-30 10:38:33 -0500
\ Montrose, Ernest:
> Phil,
> This involves Xen Vm's.  I would create one vm, I would then put an i/o
> load on there (Something that keeps reading and writing).  I would then
> go to the host and do an ifdown on the heartbeat interface in an attempt
> to force a split brain situation. I would then do an ifup. And every now
> and then this would happen (not all the time).  When it happens, it
> survives a reboot.
> I actually have not figured out how to get out of it.
> 
> I will try to find a more automatic way to reproduce it.

drbd_receiver.c, drbd_asb_recover_0p

|       ch_peer = mdev->p_uuid[UUID_SIZE];
|       ch_self = drbd_bm_total_weight(mdev); ### <==

this ch_self may be different
from the one we communicated before, right?

|       switch ( mdev->net_conf->after_sb_0p ) {
|       ...
|
|       case DiscardZeroChg:

so, if we communicated ch_self == 0, but now ch_self is > 0,
and ch_peer is 0 (inactive peer sees this reversed), then

|                if( ch_peer == 0 && ch_self == 0) {

inactive peer does this, and may decide he is the source;

|                        rv=test_bit(DISCARD_CONCURRENT,&mdev->flags) ? -1 : 1;
|                        break;
|                } else {

active peer does this branch,
and decides he is the source.

|                        if ( ch_peer == 0 ) { rv =  1; break; }
|                        if ( ch_self == 0 ) { rv = -1; break; }
|                }
|                if( mdev->net_conf->after_sb_0p == DiscardZeroChg ) break;

doh. have to think about that...

-- 
: Lars Ellenberg                                  Tel +43-1-8178292-55 :
: LINBIT Information Technologies GmbH            Fax +43-1-8178292-82 :
: Schoenbrunner Str. 244, A-1120 Vienna/Europe   http://www.linbit.com :