[Drbd-dev] DRBD8: Stuck in WFBitMapS state even across reboot.

Lars Ellenberg Lars.Ellenberg at linbit.com
Thu Nov 30 20:59:53 CET 2006


/ 2006-11-30 10:38:33 -0500
\ Montrose, Ernest:
> Phil,
> This involves Xen VMs. I would create one VM and then put an I/O
> load on it (something that keeps reading and writing). I would then
> go to the host and do an ifdown on the heartbeat interface in an attempt
> to force a split-brain situation. Then I would do an ifup. Every now
> and then (not every time) this would happen. When it happens, it
> survives a reboot. I actually have not figured out how to get out of it.
> 
> I will try to find a more automatic way to reproduce it.

In drbd_receiver.c, drbd_asb_recover_0p():

| 	ch_peer = mdev->p_uuid[UUID_SIZE];
|       ch_self = drbd_bm_total_weight(mdev); ### <==

This ch_self may differ from the value
we communicated to the peer earlier, right?

|       switch ( mdev->net_conf->after_sb_0p ) {
|       ...
|
|       case DiscardZeroChg:

So if we communicated ch_self == 0, but ch_self is now > 0,
and ch_peer is 0 (the inactive peer sees this reversed), then

|                if( ch_peer == 0 && ch_self == 0) {

the inactive peer takes this branch and may decide that it is the source;

|                        rv=test_bit(DISCARD_CONCURRENT,&mdev->flags) ? -1 : 1;
|                        break;
|                } else {

the active peer takes this branch
and decides that it is the source, too.

|                        if ( ch_peer == 0 ) { rv =  1; break; }
|                        if ( ch_self == 0 ) { rv = -1; break; }
|                }
|                if( mdev->net_conf->after_sb_0p == DiscardZeroChg ) break;

D'oh. I have to think about that...

-- 
: Lars Ellenberg                                  Tel +43-1-8178292-55 :
: LINBIT Information Technologies GmbH            Fax +43-1-8178292-82 :
: Schoenbrunner Str. 244, A-1120 Vienna/Europe   http://www.linbit.com :

