Note: "permalinks" may not be as permanent as we would like,
direct links of old sources may well be a few messages off.
On Thu, 02 Dec 2010 13:36:25 -0600, J. Ryan Earl wrote:
> If you "gracefully" stop DRBD on one
> node, it's not "degraded." Degraded is from like a non-graceful
> separation due to a crash, power-outage, network issue, etc where one
> end detects the other is gone instead of being told to gracefully close
> connection between nodes.
I issued a "stop" (a graceful shutdown) only after I broke the DRBD
connection by blocking the relevant packets. So before the stop, the
cluster was in a degraded state:
1: cs:WFConnection ro:Primary/Unknown ds:UpToDate/DUnknown C r----
ns:0 nr:0 dw:0 dr:0 al:0 bm:0 lo:0 pe:0 ua:0 ap:0 ep:1 wo:b oos:0
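For reference, "blocking the relevant packets" means iptables rules along
these lines (this assumes the stock DRBD port of 7788; adjust for whatever
port the resource actually uses):

  # drop DRBD replication traffic in both directions
  # (assumes the default DRBD port 7788 for this resource)
  iptables -A INPUT  -p tcp --dport 7788 -j DROP
  iptables -A OUTPUT -p tcp --dport 7788 -j DROP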
Using "stop" still causes a clean shutdown which then avoids degr-wfc-
timeout?
Is there any way that a network issue, or anything else short of a crash
of the system, can invoke degr-wfc-timeout? I've even tried 'kill -9' on
the drbd processes, but they seem immune to it.
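For what it's worth, the kill attempt looked roughly like this (the
drbd1_* thread names assume device minor 1, as in the status above, so
they may differ elsewhere):

  # the DRBD "processes" show up as kernel threads
  ps ax | grep '\[drbd'
  # e.g. drbd1_worker and drbd1_receiver; substitute the real PIDs
  kill -9 <pid-of-drbd1_worker> <pid-of-drbd1_receiver>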
I can force a system crash if I have to, but that's something of a pain
in the neck, so I'd prefer another option if one is available.
Or have I misunderstood? I've been assuming that degr-wfc-timeout
applies only to the WFC at startup (because the timeout value is in the
startup block of the configuration file). Is this controlling some other
WFC period?
When I break the connection (with no extra fencing logic specified), I
see that both nodes go into a WFC state. But this lasts well beyond the
60 seconds I have defined in degr-wfc-timeout.
Thanks...
Andrew