[Drbd-dev] Re: drbd_panic() in drbd_receiver.c

Wed Jul 5 10:25:42 CEST 2006

On Tuesday, 4 July 2006 23:35, Graham, Simon wrote:
> I'm now trying to work through the "internal dependencies and state
> changes that need to be adjusted" and it's proving tricky!
>

Hi Simon,

Please note that the way we do state changes has changed dramatically
from 0.7 to 8. In 8 we finally do it in a sane way.

> First things first though -- I'm assuming that in the case of a failed
> resync like this, we really want to end up back in Connected state (but
> still inconsistent) rather than simply staying in SyncTarget and
> continually trying to resync the affected block; do you agree with this
> as a goal?
>

Have a look at pre_state_checks() in drbd-8. Currently it probably
does not allow that state.

I have to add that there is a graceful way of changing state
[ request_state() ] and a forceful way [ force_state() ].

request_state() is usually used by actions that are initiated by
an operator, while force_state() is used if something fails...

So, if the disk fails during resync you could use force_state()
to go into Connected/Inconsistent, although this is not a valid
state as expressed by the constraints of pre_state_checks().
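
To illustrate the difference between the two paths, here is a minimal
sketch in plain C. It is not the drbd-8 code: the types, the rule inside
state_is_valid() and the *_sketch function names are all hypothetical
stand-ins, chosen only to show a checked transition next to an
unchecked one.

```c
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical, simplified state model; not the drbd-8 definitions. */
enum conn_state { STANDALONE, CONNECTED, SYNC_TARGET };
enum disk_state { DISKLESS, INCONSISTENT, UP_TO_DATE };

struct dev_state {
    enum conn_state conn;
    enum disk_state disk;
};

/* Stand-in for the constraint check. The single rule here is an
 * assumption for illustration: Connected with an Inconsistent local
 * disk is treated as invalid. */
static bool state_is_valid(struct dev_state s)
{
    if (s.conn == CONNECTED && s.disk == INCONSISTENT)
        return false;
    return true;
}

/* Graceful path: refuse the change if the new state fails the check. */
static int request_state_sketch(struct dev_state *cur, struct dev_state ns)
{
    if (!state_is_valid(ns))
        return -1;
    *cur = ns;
    return 0;
}

/* Forceful path: apply the change even if the check would reject it,
 * and only log the violation. */
static int force_state_sketch(struct dev_state *cur, struct dev_state ns)
{
    if (!state_is_valid(ns))
        fprintf(stderr, "forcing a state that fails the checks\n");
    *cur = ns;
    return 0;
}
```

With these stand-ins, a request to go from SyncTarget/Inconsistent to
Connected/Inconsistent is refused, while the forced variant applies it.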

We need to check that there are no local requests issued to
the not-yet-synced areas. As far as I recall off the top of
my head, drbd-8 drbd_req.c already checks the local disk status
instead of the connection status, but we need to verify this.
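
As a rough picture of what "checking the local disk status" could mean
for a read, here is a small sketch. It is an assumption about the shape
of such a check, not a copy of drbd_req.c: the enum, the per-block
out_of_sync array and can_read_locally() are hypothetical names.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical disk states for this sketch only. */
enum disk_state_sk { SK_DISKLESS, SK_INCONSISTENT, SK_UP_TO_DATE };

/* Decide from the local disk state (not the connection state) whether
 * a read of `block` may be served from the local disk.
 * out_of_sync[i] == true means block i has not been resynced yet. */
static bool can_read_locally(enum disk_state_sk disk,
                             const bool *out_of_sync,
                             size_t nblocks, size_t block)
{
    if (block >= nblocks)
        return false;               /* out of range: never local */
    if (disk == SK_UP_TO_DATE)
        return true;                /* every block is good */
    if (disk == SK_INCONSISTENT)
        return !out_of_sync[block]; /* only already-synced blocks */
    return false;                   /* diskless: nothing to read */
}
```

The point of the sketch is that an Inconsistent disk stays safe to use
for synced blocks regardless of whether the connection state says
SyncTarget or Connected.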

> Assuming that is the case, here's my problem (remember this is based on
> 0.7 at the moment) --

Hmm, oops, ok.

> right now, the check for end-of-resync is done in 
> w_update_odbm based on the current weight of the bitmap; what's more,
> this worker routine is only scheduled from drbd_try_to_clean_on_disk_bm
> IF a complete extent is zeroed (and, of course, this routine is only
> called from drbd_set_in_sync) -- so simply modifying w_update_odbm to
> check if the weight is <= the number of failed blocks will miss a couple
> of important cases:
> 1. If the failure is in the very last block and
> 2. If the failure is somewhere in the last extent of the on-disk bitmap

I see the issue here. Have to think about it.
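
To make the two missed cases concrete, here is a toy model in plain C.
It is not the 0.7 code: the struct, the tiny extent size and the
function names are hypothetical, and the unconditional check at the end
is only one possible direction, not something decided in this thread.

```c
#include <stdbool.h>
#include <stddef.h>

#define BLOCKS_PER_EXTENT 4 /* hypothetical; real extents are far larger */
#define NBLOCKS 8

struct resync_sketch {
    bool   out_of_sync[NBLOCKS]; /* true = block still dirty */
    size_t weight;               /* number of dirty blocks */
    size_t failed;               /* blocks whose resync failed */
};

/* True if no block in the extent containing `block` is dirty. */
static bool extent_is_clean(const struct resync_sketch *r, size_t block)
{
    size_t start = block - block % BLOCKS_PER_EXTENT;
    for (size_t i = start; i < start + BLOCKS_PER_EXTENT && i < NBLOCKS; i++)
        if (r->out_of_sync[i])
            return false;
    return true;
}

/* Models the scheme described in the quoted mail: the end-of-resync
 * test runs only when setting a block in sync zeroes a whole extent. */
static bool set_in_sync_extent_triggered(struct resync_sketch *r, size_t block)
{
    if (r->out_of_sync[block]) {
        r->out_of_sync[block] = false;
        r->weight--;
    }
    if (!extent_is_clean(r, block))
        return false;             /* check is never reached */
    return r->weight <= r->failed;
}

/* A failed block stays dirty, so it never zeroes its extent. */
static void record_failure(struct resync_sketch *r)
{
    r->failed++;
}

/* Possible alternative: a test that does not depend on an extent
 * being zeroed, callable after every completion or failure. */
static bool resync_done_unconditional(const struct resync_sketch *r)
{
    return r->weight <= r->failed;
}
```

In this model, if block 7 fails and blocks 0 to 6 are then set in sync,
the last extent is never clean, so the extent-triggered test never
fires even though the weight has dropped to the number of failed blocks.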

> Apologies for the detail below, but I want to make sure I'm going about
> this the right way - Here's what I'm thinking as a way to fix this --
> please comment; you know this code so much better than I do!
>

I will try to answer that part of the mail later today; currently
I am running out of time.

-Philipp
-- 
: Dipl-Ing Philipp Reisner                      Tel +43-1-8178292-50 :
: LINBIT Information Technologies GmbH          Fax +43-1-8178292-82 :
: Schönbrunnerstr 244, 1120 Vienna, Austria    http://www.linbit.com :