[DRBD-user] Difference Between wfc-timeout and degr-wfc-timeout

Mon Oct 2 23:40:58 CEST 2006

/ 2006-10-02 09:40:48 -0700
\ Robinson, Eric:
> 
> > from drbd changelog:
> > 0.7.16 (api:77/proto:74)
> > -----
> > * There was a bug related to the degr-wcf-timeout config option, it was
> >   never used in recent DRBD releases. Fixed.
> >
> >maybe upgrading helps?
> 
> From the changelog, it appears that the problem was fixed in 0.7.16. We
> are using 0.7.18. Do you have any other possible ideas?

maybe there is another issue with getting the meta data flags on-disk,
and upgrading still helps.

maybe it is just a misunderstanding of those parameters.

I'll quote the code, to clarify the intention for when
"degr-wfc-timeout" is used:
 /* If I am currently not Primary,
  * but meta data primary indicator is set,
  * I just now recover from a hard crash,
  * and have been Primary before that crash.
  *
  * Now, if I had no connection before that crash
  * (have been degraded Primary), chances are that
  * I won't find my peer now either.
  *
  * In that case, and _only_ in that case,
  * we use the degr-wfc-timeout instead of the default,
  * so we can automatically recover from a crash of a
  * degraded but active "cluster" after a certain timeout.
  */
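
the condition in that comment can be sketched roughly like this. this is
only an illustration in Python, not the actual drbd code; the names
MetaDataFlags, was_primary, was_connected and pick_timeout_option are
made up for the example.

```python
from dataclasses import dataclass


@dataclass
class MetaDataFlags:
    # Hypothetical names for the on-disk indicators the comment refers to.
    was_primary: bool    # "meta data primary indicator is set"
    was_connected: bool  # node had a connection to its peer before the crash


def pick_timeout_option(currently_primary: bool, flags: MetaDataFlags) -> str:
    """Return the name of the config option that governs the wait."""
    # Not Primary now, but flagged Primary on disk: recovering from a hard crash.
    recovering_as_former_primary = (not currently_primary) and flags.was_primary
    if recovering_as_former_primary and not flags.was_connected:
        # Crashed while degraded Primary: the peer is probably still gone.
        return "degr-wfc-timeout"
    # In every other case the default applies.
    return "wfc-timeout"
```

so only the combination "was Primary, was not connected, is not Primary
now" selects degr-wfc-timeout; everything else falls through to
wfc-timeout.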

to put it another way:

a)
  cluster fine, connected.
  cluster crash.
  only one node reboots.
    --- wfc-timeout is used ---
    if wfc-timeout is 0 (the default), it will block the boot "forever",
    or until operator intervention.

    the latter means:
    if you can log in on the console, you can type "yes" at that
    infamous drbd-stops-boot-process "do you really want to" prompt.
    or, you can log in via ssh (if you start your sshd early in the boot
    process, which you definitely should), run
     ps ax | grep drbdsetup | grep wait_connect
    and kill those processes.

b)
  only one node up and primary.
  crashes.
  --- degr-wfc-timeout is used ---
  may time out, and continue on its own.

c)
  only one node up and primary.
  gets rebooted, i.e. switched to secondary, shut down cleanly.
  [ to drbd, this looks exactly like:
    only one node up and secondary, then crash ]
  --- wfc-timeout is used ---

  well, you did a clean shutdown.
  obviously you are on that box anyway, so you can intervene yourself.

at least that would be the intended behaviour.
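
the three cases can also be written down as a small lookup, to see what
a given configuration would mean at boot time. again just a sketch in
Python: the mapping is copied from cases a) to c), wfc-timeout 0 is the
default mentioned above, the value 120 for degr-wfc-timeout is a made-up
example, and treating 0 as "no limit" for both options is an assumption
of this sketch.

```python
from typing import Optional

# Which option applies in each scenario, taken from cases a) to c).
SCENARIO_OPTION = {
    "a": "wfc-timeout",       # connected cluster crashes, only one node reboots
    "b": "degr-wfc-timeout",  # lone Primary crashes
    "c": "wfc-timeout",       # lone Primary is shut down cleanly
}


def boot_wait_limit(scenario: str, config: dict) -> Optional[int]:
    """Seconds to wait for the peer; None means wait until an operator steps in."""
    seconds = config[SCENARIO_OPTION[scenario]]
    # Assumption of this sketch: 0 means "no limit" for both options.
    return None if seconds == 0 else seconds


# wfc-timeout 0 is the default; 120 is only an example value.
config = {"wfc-timeout": 0, "degr-wfc-timeout": 120}
for name in ("a", "b", "c"):
    limit = boot_wait_limit(name, config)
    print(name, "waits forever" if limit is None else f"gives up after {limit}s")
```

with these values only case b) continues on its own; a) and c) wait
until someone answers the prompt or kills the waiting drbdsetup.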

-- 
: Lars Ellenberg                                  Tel +43-1-8178292-0  :
: LINBIT Information Technologies GmbH            Fax +43-1-8178292-82 :
: Schoenbrunner Str. 244, A-1120 Vienna/Europe   http://www.linbit.com :