[DRBD-user] DRBD State Confusion

Lars Ellenberg Lars.Ellenberg at linbit.com
Wed Sep 20 14:04:51 CEST 2006


/ 2006-09-16 20:13:52 +0100
\ Tim Jackson:
> Usman Ahmad wrote:
> 
> Hi, it would be easier if you didn't top-post. Thanks.
> 
> >No, there is no firewall setup, but same question again, it may be stupid. How long does a drbd node wait for 
> >changing the other node to unknown state ( i guess this is done through different timeout parameters etc.), and if 
> >the other server comes back up again, is it changed normally or not?
> 
> First, just in case it wasn't clear, DRBD doesn't change the status of the *local* node; only an external program 
> (e.g. Heartbeat) does that. As for what it shows for the status of the *remote* node, well, I haven't read the code 
> but it normally seems to me to be pretty fast. Again, read the logs: if the local node is primary (and writes are 
> taking place, I think), then you should start seeing "ko" warnings quickly, and how long before the other node is 
> declared dead depends on the "ko-count" configuration option. You will see all kinds of warnings from DRBD when the 
> nodes can't communicate. See also my bug report on this list the other day though, about DRBD not reporting a 
> transition of the remote node to "Unknown" status in the logs.

Thanks, Tim!

almost correct...
"ko-count" is for the case where the peer still seems to be alive
(it answers drbd ping packets in time) but no longer accepts data
packets. this is to detect a half-broken secondary, where the local io
subsystem is broken. such a half-broken secondary would otherwise block
the primary as well, which is obviously not desirable in an "HA" setup.

if the peer is _dead_ (or the network is broken; "does not answer at
all"), drbd detects this, worst case, within timeout + ping-int
(caution: timeout is given in tenths of a second, ping-int in seconds).
with the defaults that is 60 tenths of a second + 10 secs == 16 seconds;
both parameters are configured within the net {} section.
typically "dead peer" detection is faster.
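
The worst-case figure can be checked with a few lines of Python. This is only a sketch of the arithmetic: the function name is made up for illustration, and it assumes timeout is given in tenths of a second (which is what makes the defaults add up to 16 seconds) and ping-int in seconds.

```python
# Worst-case dead-peer detection time: timeout + ping-int.
# Assumption: "timeout" is in tenths of a second, "ping-int" in seconds.

def worst_case_detection_secs(timeout_tenths=60, ping_int_secs=10):
    """Return the worst-case detection time in seconds."""
    return timeout_tenths / 10 + ping_int_secs

print(worst_case_detection_secs())  # 16.0
```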

the heartbeat deadtime has to be larger than this worst-case detection
time.

the heartbeat init-deadtime should be much larger than the drbd
connect-int.
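
The two rules above could be turned into a small sanity check, sketched here in Python. Everything in it is illustrative: the function and parameter names are hypothetical, the default of 10 seconds for connect-int is an assumption, and since "much larger" is not quantified above, the check only tests for strictly larger. Values would be taken by hand from ha.cf and drbd.conf.

```python
def check_heartbeat_timing(deadtime_secs, initdead_secs,
                           timeout_tenths=60, ping_int_secs=10,
                           connect_int_secs=10):
    """Return a list of problems; an empty list means both rules hold."""
    problems = []

    # drbd worst-case dead-peer detection: timeout (tenths of a second) + ping-int.
    drbd_worst_secs = timeout_tenths / 10 + ping_int_secs
    if deadtime_secs <= drbd_worst_secs:
        problems.append(
            f"deadtime {deadtime_secs}s is not larger than "
            f"drbd worst-case detection {drbd_worst_secs}s")

    # "much larger" is not quantified; only strictly larger is tested here.
    if initdead_secs <= connect_int_secs:
        problems.append(
            f"init-deadtime {initdead_secs}s is not larger than "
            f"connect-int {connect_int_secs}s")

    return problems

print(check_heartbeat_timing(deadtime_secs=30, initdead_secs=120))  # []
```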

-- 
: Lars Ellenberg                                  Tel +43-1-8178292-0  :
: LINBIT Information Technologies GmbH            Fax +43-1-8178292-82 :
: Schoenbrunner Str. 244, A-1120 Vienna/Europe   http://www.linbit.com :
__
please use the "List-Reply" function of your email client.


