[Drbd-dev] [RFC] (CRM and) DRBD (0.8) states and transitions,
recovery strategies
Philipp Reisner
philipp.reisner at linbit.com
Mon Sep 27 16:52:10 CEST 2004
On Friday, 24 September 2004 16:29, Lars Ellenberg wrote:
[...]
> Currently this covers only the states, and outlines the transitions. It
> should help to define the actions to be taken on every possible "input"
> to the DRBD internal "state machine".
>
While reading through this giant e-mail I lost my confidence that a
"central" state-switching function in DRBD would be a good idea, but of
course I will see what this discussion yields...
We have a huge space of possible combinations of these attributes, but
a lot of them are impossible/invalid... etc. Currently these constraints
are expressed implicitly by the code ...
The question is, what is easier to read/understand/code/get right.
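To make the trade-off concrete, here is a minimal sketch of what a central
state-switching function could look like. It assumes a much-simplified,
hypothetical node state (role, connection state, an up-to-date flag); none
of these names are actual DRBD identifiers. Every change goes through
change_state(), which validates the complete new state in one place before
committing it. Only one rule from Lars's list (an up-to-date active node
must not become sync target) is encoded here.

```c
/* Hypothetical, simplified node state; not DRBD's real data structures. */
enum role { SECONDARY, PRIMARY };
enum conn { STANDALONE, CONNECTED, SYNC_SOURCE, SYNC_TARGET };

struct node_state {
	enum role role;
	enum conn conn;
	int disk_uptodate;	/* 1 if the local disk holds up-to-date data */
};

enum sc_result { SC_OK = 0, SC_REFUSED_SYNC_TARGET = -1 };

/* All constraints live in one place instead of being spread over the code. */
static enum sc_result validate(const struct node_state *old,
			       const struct node_state *new)
{
	/* A running, up-to-date active node must not become sync target. */
	if (new->conn == SYNC_TARGET &&
	    old->role == PRIMARY && old->disk_uptodate)
		return SC_REFUSED_SYNC_TARGET;

	/* ... further rules would be added here ... */
	return SC_OK;
}

/* The single entry point through which every state change must pass. */
static enum sc_result change_state(struct node_state *cur,
				   struct node_state new)
{
	enum sc_result rv = validate(cur, &new);

	if (rv == SC_OK)
		*cur = new;	/* commit only states that passed validation */
	return rv;
}
```

A refused change leaves the current state untouched, so nobody ever sees a
half-applied transition. The price is that validate() grows with every
rule, which is exactly the readability question above.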
[...]
>
> Allowed node state transition "inputs" or "reactions" are
>
> * up or down the node
>
> * add/remove the disk (by administrative request or in response to io
> error)
>
> if it was the last accessible good data, should this result in
> suicide, or block all further io, or just fail all further io?
>
> if this lost the meta-data storage at the same time (meta-data
> internal), do we handle this differently?
I guess this is a question we cannot answer here for all of our users;
someone might want this, others that... etc. If it is a question
you cannot answer, it probably needs to be configurable.
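If it does become configurable, the decision could look roughly like the
following sketch. The policy names and on_io_request() are hypothetical; it
only shows how the three options from the question (suicide, block all
further io, fail all further io) would map to an action once the last
accessible good data is gone.

```c
/* Hypothetical per-resource setting; not an existing DRBD option. */
enum lost_data_policy {
	POLICY_SUICIDE,		/* last resort: take the node down */
	POLICY_BLOCK_IO,	/* queue io until an administrator intervenes */
	POLICY_FAIL_IO,		/* complete every further request with an error */
};

enum io_action { IO_SUBMIT, IO_QUEUE, IO_FAIL, IO_SUICIDE };

/* Decide what to do with a new io request. */
static enum io_action on_io_request(enum lost_data_policy policy,
				    int have_good_data)
{
	if (have_good_data)
		return IO_SUBMIT;	/* normal operation, policy irrelevant */

	switch (policy) {
	case POLICY_BLOCK_IO:
		return IO_QUEUE;
	case POLICY_FAIL_IO:
		return IO_FAIL;
	case POLICY_SUICIDE:
	default:
		return IO_SUICIDE;
	}
}
```

Treating an unknown setting like POLICY_SUICIDE is just one conservative
choice for the sketch; the internal meta-data case would need its own input.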
> * fail meta-data storage
>
> should result in suicide.
>
> * establish or lose the connection; quit/start retrying to establish
> a connection.
>
> * promote to active / demote to non-active
>
> To promote an unconnected inconsistent non-active node you need
> brute force. The same holds if it believes it is outdated.
>
> Promoting an unconnected diskless node is not possible. But those
> should have been mapped to a "down" node anyway.
>
Hmm?
I just had a look at what we are currently doing. We should probably
drop the DISKLESS bit and replace it with an enum dstate:
  inconsistent,
  outdated   (known to be outdated; this happens via "drbdadm outdate",
              or when the data was consistent but the negotiation
              concluded that this is old data and the sync is paused),
  consistent (this reflects the meta-data meaning of consistent,
              i.e. it might be outdated),
  na         (= diskless),
  uptodate
and display it in /proc/drbd as "ld:".
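A sketch of that enum, the "ld:" text, and the promotion rules quoted above
expressed against it. The identifiers and the exact /proc/drbd strings are
assumptions for illustration. may_promote_unconnected() only covers the
unconnected case the quoted text describes; states the text does not
restrict (consistent, uptodate) are simply allowed.

```c
/* Disk states as proposed above; the identifiers are illustrative. */
enum dstate {
	D_INCONSISTENT,
	D_OUTDATED,	/* known to be outdated */
	D_CONSISTENT,	/* meta-data "consistent": might still be outdated */
	D_NA,		/* diskless */
	D_UPTODATE,
};

/* Text for the "ld:" field in /proc/drbd (exact strings assumed). */
static const char *dstate_name(enum dstate d)
{
	switch (d) {
	case D_INCONSISTENT:	return "Inconsistent";
	case D_OUTDATED:	return "Outdated";
	case D_CONSISTENT:	return "Consistent";
	case D_NA:		return "NA";
	case D_UPTODATE:	return "UpToDate";
	}
	return "Unknown";
}

enum promote_result { PROMOTE_OK, PROMOTE_NEEDS_FORCE, PROMOTE_IMPOSSIBLE };

/* Promotion rules for an unconnected node, taken from the quoted text. */
static enum promote_result may_promote_unconnected(enum dstate d, int force)
{
	switch (d) {
	case D_NA:
		return PROMOTE_IMPOSSIBLE;	/* no data to serve at all */
	case D_INCONSISTENT:
	case D_OUTDATED:
		return force ? PROMOTE_OK : PROMOTE_NEEDS_FORCE;
	default:
		return PROMOTE_OK;	/* not restricted by the quoted rules */
	}
}
```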
> * start/finish synchronization
>
> One must not request a running and up-to-date active node to become
> target of synchronization.
>
> * block/unblock all io requests
>
> This is in response to drbdadm suspend/resume, or a result of an
> "exception handler".
>
> * commit suicide
>
> This is our last resort emergency handler. It should not be
> implemented as "panic", though currently it is.
>
> Again, this is important, please double check: Did I miss something?
>
I think everything is there... (and reading it is quite inspiring)
-Philipp