[DRBD-user] speed of fail-over..

Mon Oct 20 10:23:03 CEST 2008

Little, Kevin wrote:
> I searched the archives for this, but found nothing.

Well, if you search _this_ list's archives, that's not exactly
surprising, as what you are after is more of a Heartbeat issue than a
DRBD one. Searching the linux-ha archives is likely to yield better
results.

> I’ve seen mentioned
> (http://fghaas.wordpress.com/2007/06/26/when-not-to-use-drbd/) that the
> fail-over time for DRBD+Heartbeat is on the order of 20 seconds.

Well, it's configurable (via the deadtime config entry in
/etc/ha.d/ha.cf), but people usually pick a value between 15 and 30
seconds.

> First, is this accurate? 

Yes. But don't take my word for it; I wrote that blog post. :-)

> Second, what portion of the 20 seconds is a
> function of Heartbeat, what portion is a function of DRBD? 

Those 20 seconds (if you use a deadtime value of 20s) are enforced by
Heartbeat, irrespective of DRBD.

Also, your application may need to recover _after_ failover has
completed (filesystem journal replay, database recovery, etc.), which is
outside both Heartbeat's and DRBD's realm entirely.

> Third, what
> options are there to speed up DRBD failover (alternate cluster manager,
> etc.)?

Set deadtime to a lower value, plain and simple.

But:
1. Do not set deadtime so low that a network hiccup causes Heartbeat to
pronounce a node dead.
2. Do not set deadtime lower than the keepalive interval (default 1s,
unless explicitly specified in ha.cf).
3. Do not set deadtime lower than the sum of the timeout and
ping-timeout parameters (defaulting to 6 and 5 seconds, respectively, so
a deadtime of <=11s is generally a bad idea).
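
As a rough sketch of the rules above, the relevant lines in
/etc/ha.d/ha.cf might look like the fragment below. The values are
illustrative assumptions only, not recommendations; test what your own
network tolerates before lowering deadtime.

```
# /etc/ha.d/ha.cf (fragment; values are examples, not recommendations)

# Interval between heartbeat packets, in seconds.
keepalive 1

# Time without heartbeats before a node is declared dead, in seconds.
# Kept well above keepalive and above the 11s floor from rule 3.
deadtime 15
```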

Hope this helps.
Cheers,
Florian

-- 
: Florian G. Haas
: LINBIT Information Technologies GmbH
: Vivenotgasse 48, A-1120 Vienna, Austria
