[DRBD-user] DRBD Failover Not Working after Cold Shutdown of Primary

Lars Ellenberg lars.ellenberg at linbit.com
Tue Jan 8 11:56:00 CET 2008



On Mon, Jan 07, 2008 at 01:16:16PM -0800, Art Age Software wrote:
> Hi all,
> 
> I've asked this question before and have still not figured it out.
> 
> Either the degr-wfc-timeout setting is not working as documented, or
> I just don't understand how it is supposed to work.
> 
> Here's the scenario:
> 
> 1) Both primary and secondary nodes (servers) are running. DRBD is
> primary/connected/uptodate on Node1 and secondary/connected/uptodate
> on Node2.
> 
> 2) Shut down Node2. This takes DRBD on Node1 into primary/disconnected state.
> 
> 3) Reboot Node1. (Do **not** start up Node2. It remains shut down.)
> 
> According to my understanding, what I now have is a "degraded
> cluster." However, when Node1 reboots, the init script waits forever,
> ignoring the degr-wfc-timeout setting. It is as if DRBD does not think
> the cluster is degraded.
> 
> Another DRBD user on the list has confirmed seeing this behavior as
> well in his setup.
> 
> So, is this a DRBD bug? Or am I misunderstanding the use of the
> degr-wfc-timeout setting?

If I am currently not Primary,
but the meta data primary indicator is set,
then I am just now recovering from a hard crash,
and I was Primary before that crash.
         
Now, if I had no connection before that crash
(i.e. I was a degraded Primary), chances are that
I won't find my peer now either.

In that case, and _only_ in that case,
we use the degr-wfc-timeout instead of the default wfc-timeout,
so we can automatically recover from a crash of a
degraded but active "cluster" once that timeout expires.
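
The decision above can be sketched as a small Python function. This is a
minimal model, not DRBD's actual code: the flag names, the function name and
the example values (wfc-timeout 0, degr-wfc-timeout 120) are illustrative
assumptions.

```python
from dataclasses import dataclass


@dataclass
class MetaDataFlags:
    # Hypothetical model of the two on-disk indicators described above.
    primary_indicator: bool    # "I was Primary" flag in the meta data
    connected_indicator: bool  # "I had a connection to my peer" flag


def choose_wfc_timeout(currently_primary: bool, md: MetaDataFlags,
                       wfc_timeout: int, degr_wfc_timeout: int) -> int:
    """Pick the wait-for-connection timeout in seconds (0 means wait forever)."""
    # Not Primary now, yet the meta data says Primary: we survived a hard crash.
    recovering_from_crash = not currently_primary and md.primary_indicator
    # ...and there was no peer before the crash: we were a degraded Primary.
    if recovering_from_crash and not md.connected_indicator:
        return degr_wfc_timeout
    # Every other case uses the normal timeout. This includes a clean reboot,
    # assuming the orderly shutdown cleared the primary indicator.
    return wfc_timeout


# Degraded Primary that crashed: the degr-wfc-timeout applies.
print(choose_wfc_timeout(False, MetaDataFlags(True, False), 0, 120))  # 120
# Connected Primary that crashed: the normal wfc-timeout applies.
print(choose_wfc_timeout(False, MetaDataFlags(True, True), 0, 120))   # 0
```

Note that a wfc-timeout of 0 means "wait forever", which matches the
behaviour reported in the question.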
         
Which means that if you _reboot_ a degraded node,
the startup will not use the "degr-wfc-timeout", but the normal wfc-timeout.

The idea is:
if you intentionally reboot it, you are apparently "logged in" anyway
(well, the reboot will kick you off, but you can immediately log in again).
Maybe you fixed some hardware problem, and the reboot is supposed to
pick that up. If not, since you are sitting in front of the console
anyway, you can confirm or abort the wait-for-connection (wfc) prompt if necessary.

If it crashes while being a degraded Primary, and then later boots up again,
it will use the degr-wfc-timeout.
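
To contrast the two paths (intentional reboot vs. crash of a degraded
Primary), here is a toy Python simulation. It assumes, without checking
DRBD's source, that an orderly shutdown demotes the resource and thereby
clears the primary indicator, while a hard crash leaves the on-disk flags
untouched. The Node class and all names in it are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Node:
    # Hypothetical, simplified model of one DRBD node.
    primary_indicator: bool = False    # persistent "was Primary" flag
    connected_indicator: bool = False  # persistent "had a peer" flag
    role: str = "Secondary"            # in-memory role, lost on reboot

    def promote(self) -> None:
        self.role = "Primary"
        self.primary_indicator = True

    def peer_lost(self) -> None:
        # Peer shut down: we are now a degraded (disconnected) Primary.
        self.connected_indicator = False

    def clean_reboot(self) -> None:
        # Assumption: an orderly shutdown demotes first, clearing the flag.
        self.role = "Secondary"
        self.primary_indicator = False

    def hard_crash(self) -> None:
        # Power loss: the role is gone, but the on-disk flags stay as they were.
        self.role = "Secondary"


def boot_timeout(node: Node, wfc_timeout: int, degr_wfc_timeout: int) -> int:
    # Degraded-crash case only; everything else gets the normal wfc-timeout.
    if (node.role != "Primary" and node.primary_indicator
            and not node.connected_indicator):
        return degr_wfc_timeout
    return wfc_timeout


def degraded_primary() -> Node:
    # Steps 1 and 2 of the scenario: connected Primary, then the peer goes away.
    node = Node(connected_indicator=True)
    node.promote()
    node.peer_lost()
    return node


rebooted = degraded_primary()
rebooted.clean_reboot()
crashed = degraded_primary()
crashed.hard_crash()
print(boot_timeout(rebooted, 0, 120))  # 0: waits forever for the peer
print(boot_timeout(crashed, 0, 120))   # 120: gives up after degr-wfc-timeout
```

In this model the rebooted node falls back to the normal wfc-timeout (here 0,
i.e. wait forever), which is the behaviour described in the original question.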

-- 
: Lars Ellenberg                           http://www.linbit.com :
: DRBD/HA support and consulting             sales at linbit.com :
: LINBIT Information Technologies GmbH      Tel +43-1-8178292-0  :
: Vivenotgasse 48, A-1120 Vienna/Europe     Fax +43-1-8178292-82 :
__
please use the "List-Reply" function of your email client.
