[DRBD-user] info on degr-wfc-timeout

Lars Ellenberg lars.ellenberg at linbit.com
Sat Aug 30 00:58:29 CEST 2008

On Fri, Aug 29, 2008 at 01:37:43PM +0200, Gianluca Cecchi wrote:
> My system is a 2.1.4 heartbeat cluster with rh el 5.2.
> I have 2.x config enabled (with crm = on) on it.
> I have installed kmod-drbd82-8.2.6-1.2.6.18_92.el5 and
> drbd82-8.2.6-1.el5.centos
> drbd module is started itself before heartbeat and I use drbdisk resource
> script in heartbeat to manage it.
> If primary node is up and I shutdown the second, on the primary I get some
> change status steps with
> connection: Connected -> NetworkFailure -> Unconnected -> WFConnection
> state of peer: Secondary -> Unknown
> peer disk: DUnknown -> Outdated
> (because outdate-peer helper returned 5 (peer is unreachable, assumed to be
> dead)
> 
> so that the final status on the primary is
> 
> [root at nfsnode1 ~]# service drbd status
> drbd driver loaded OK; device status:
> version: 8.2.6 (api:88/proto:86-88)
> GIT-hash: 3e69822d3bb4920a8c1bfdf7d647169eba7d2eb4 build by
> buildsvn at c5-i386-build, 2008-06-21 08:29:11
> m:res              cs            st               ds                 p
> mounted        fstype
> 0:drbd-resource-0  WFConnection  Primary/Unknown  UpToDate/Outdated  C
> /drbd0         ext3
> 
> When I restart nfsnode1 (keeping nfsnode2 powered off) I would expect during
> drbd startup that degr-wfc-timeout will take place.
> Instead it seems that wfc-timeout is the parameter followed: with
> wfc-timeout set to 0 no start at all, with it put to 30 seconds, after 30
> seconds drbd starts, and then heartbeat correctly.
>
> So the question is: when drbd thinks it is in degraded mode?
> or does it depends on the fact that heartbeat stop during shutdown puts it
> in Secondary mode?

this has come up before:

  Date: Tue, 8 Jan 2008 11:56:00 +0100
  From: Lars Ellenberg <lars.ellenberg at linbit.com>
  To: drbd-user at lists.linbit.com
  Subject: Re: [DRBD-user] DRBD Failover Not Working after Cold Shutdown of Primary

On Mon, Jan 07, 2008 at 01:16:16PM -0800, Art Age Software wrote:
> Hi all,
>
> I've asked this question before and have still not figured it out.
>
> Either  the degr-wfc-timeout setting is not working as documented, or
> I just don't understand how it is supposed to work.
>
> Here's the scenario:
>
> 1) Both primary and secondary nodes (servers) are running. DRBD is
> primary/connected/uptodate on Node1 and secondary/connected/uptodate
> on Node2.
>
> 2) Shut down Node2. This takes DRBD on Node1 into primary/disconnected state.
>
> 3) Reboot Node1. (Do **not** start up Node2. It remains shut down.)
>
> According to my understanding, what I now have is a "degraded
> cluster." However, when Node1 reboots, the init script waits forever,
> ignoring the degr-wfc-timeout setting. It is as if DRBD does not think
> the cluster is degraded.
>
> Another DRBD user on the list has confirmed seeing this behavior as
> well in his setup.
>
> So, is this a DRBD bug? Or am I misunderstanding the use of the
> degr-wfc-timeout setting?

If I am currently not Primary,
but the meta data primary indicator is set,
then I am just now recovering from a hard crash,
and was Primary before that crash.

Now, if I had no connection before that crash
(I was a degraded Primary), chances are that
I won't find my peer now either.

In that case, and _only_ in that case,
we use the degr-wfc-timeout instead of the regular wfc-timeout,
so we can automatically recover from a crash of a
degraded but active "cluster" after a certain timeout.

Which means that if you _reboot_ a degraded node,
it will not use the degr-wfc-timeout; the regular wfc-timeout applies.

The idea is:
if you intentionally reboot it, you are apparently "logged in" anyway
(well, the reboot will kick you off, but you can immediately log in again).
Maybe you fixed some hardware thing, and the reboot is supposed to
pick that up. If not, because you are sitting in front of the console
anyway, you can confirm or kill that wfc wait if necessary.

If it crashed while being a degraded Primary, and then later boots up again,
it will use degr-wfc-timeout.
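
To make the rule above concrete, here is a minimal sketch of the decision in Python. It is not DRBD code: the function name and its parameters are hypothetical, chosen only to mirror the conditions described in this mail, and the timeout values are arbitrary examples.

```python
def pick_startup_timeout(currently_primary, md_primary_indicator,
                         md_was_connected, wfc_timeout, degr_wfc_timeout):
    """Return which timeout the startup wait would use (illustrative only).

    All parameter names are hypothetical; they stand for the state
    described in the mail, not for real DRBD identifiers.
    """
    # Not Primary now, but the meta data says Primary:
    # we are recovering from a hard crash of a former Primary.
    crashed_primary = (not currently_primary) and md_primary_indicator
    # No connection before the crash: the node was a degraded Primary.
    was_degraded = not md_was_connected
    if crashed_primary and was_degraded:
        return degr_wfc_timeout
    # Every other case, including a clean reboot, uses the regular timeout.
    return wfc_timeout


# Hard crash of a degraded Primary: degr-wfc-timeout is used.
print(pick_startup_timeout(False, True, False, 30, 120))
# Clean reboot of a degraded node (assumption: the primary indicator is
# cleared when the node is demoted during shutdown): wfc-timeout is used.
print(pick_startup_timeout(False, False, False, 30, 120))
```

With the example values, the first call picks 120 and the second picks 30, which matches the behaviour Gianluca observed after a clean reboot.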



-- 
: Lars Ellenberg                
: LINBIT HA-Solutions GmbH
: DRBD®/HA support and consulting    http://www.linbit.com

DRBD® and LINBIT® are registered trademarks
of LINBIT Information Technologies GmbH
__
please don't Cc me, but send to list   --   I'm subscribed


