[DRBD-user] Re: Apps cannot write for 15 seconds during failover

Michael Toler mike.toler at prodeasystems.com
Thu Jan 3 17:55:09 CET 2008

Lars Ellenberg <lars.ellenberg at ...> writes:

> your problem is most likely not the switchover time of drbd
> (that should be virtually instantaneous in such a setup),
> nor that of the nfs server.
> but either an ip switchover side effect,
> something like arp table update timeout on the clients,
> or "mac spoofing protection" on the switch.
> or due to some nfs retry timeout magic.
> which then again depends on the nfs versions in use.

(New to DRBD, so I'll try to be more careful about distinguishing
 switchover from failover.  So far all of my testing has
 been SWITCHOVER only, by stopping the heartbeat service.)

On the Servers, DRBD switches over pretty much instantaneously
when I do a "service heartbeat stop" to trigger it.  Maybe 2 
or 3 seconds total before I can access the directories on the 
secondary (now Primary) server.

It's just the NFS clients that take 15 seconds to notice
that the switchover has occurred.
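
One way to test the ARP theory on a client is to watch when the
cached MAC address for the service IP actually changes.  The sketch
below is only an illustration: it assumes a Linux client where
/proc/net/arp is readable, and the function names are made up for
this example.

```python
import time


def parse_arp_table(text):
    """Return {ip: mac} from text in the /proc/net/arp layout."""
    entries = {}
    for line in text.splitlines()[1:]:  # first line is the column header
        fields = line.split()
        if len(fields) >= 6:
            # columns: IP address, HW type, Flags, HW address, Mask, Device
            entries[fields[0]] = fields[3].lower()
    return entries


def watch_mac(ip, interval=0.5, path="/proc/net/arp"):
    """Print a timestamped line whenever the cached MAC for ip changes.

    Runs until interrupted (Ctrl-C).
    """
    last = None
    while True:
        with open(path) as f:
            mac = parse_arp_table(f.read()).get(ip)
        if mac != last:
            print(f"{time.strftime('%H:%M:%S')} {ip} -> {mac}")
            last = mac
        time.sleep(interval)
```

Running watch_mac("192.168.0.100") on a client (the address is a
placeholder for the floating service IP) while doing
"service heartbeat stop" on the primary should show how long the old
MAC stays in the client's cache after the switchover.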

> btw, "soft" nfs mounts is a _really_ bad idea, imho.

I changed to "soft" mounts because when I had it set to "hard" it took
1 to 2 minutes for an NFS client to notice that the switchover
had occurred.  After changing to "soft" it dropped to 15 seconds.

Today, of course, both "soft" and "hard" mounts are only taking 15
seconds to do the switchover.  <sigh>
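
Since "soft" and "hard" now behave the same, it may be worth
confirming which options the clients are really mounted with.  A
minimal sketch, assuming a Linux client where /proc/mounts lists the
effective options; the function name is invented for this example.

```python
def nfs_mount_options(mounts_text):
    """Map each NFS mount point to its soft/hard mode, timeo and retrans.

    mounts_text is expected in the /proc/mounts layout:
    device, mount point, fs type, options, dump, pass.
    """
    result = {}
    for line in mounts_text.splitlines():
        fields = line.split()
        if len(fields) < 4 or fields[2] not in ("nfs", "nfs4"):
            continue
        opts = fields[3].split(",")
        # "hard" is the default, so anything without "soft" counts as hard
        info = {"mode": "soft" if "soft" in opts else "hard"}
        for opt in opts:
            key, _, value = opt.partition("=")
            if key in ("timeo", "retrans") and value.isdigit():
                info[key] = int(value)  # timeo is in tenths of a second
        result[fields[1]] = info
    return result
```

Calling nfs_mount_options(open("/proc/mounts").read()) on a client
shows whether a remount really took effect and which timeo and
retrans values are in use.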

> for "normal nfs failover" btw, 15 seconds is not that bad.

While I agree that 15 seconds doesn't seem that bad, all of our
server processes hang while trying to write to the NFS mount.  
So effectively, that 15 seconds is a total outage of our system, 
even though a true failover would be a RARE event.
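
To put a number on the stall as the applications see it, a small
probe can write to the NFS mount in a loop and report the longest gap
between completed writes.  This is a rough sketch, not what our
server processes do; the function names and the file path in the note
below are placeholders.

```python
import os
import time


def longest_gap(timestamps):
    """Return the largest interval between consecutive timestamps."""
    return max((b - a for a, b in zip(timestamps, timestamps[1:])), default=0.0)


def probe_writes(path, duration=60.0, interval=0.1):
    """Append a small record to path repeatedly and fsync each one.

    Returns the monotonic-clock time at which each write completed.
    """
    done = []
    end = time.monotonic() + duration
    with open(path, "a") as f:
        while time.monotonic() < end:
            f.write(f"{time.time()}\n")
            f.flush()
            os.fsync(f.fileno())  # push the data out to the server
            done.append(time.monotonic())
            time.sleep(interval)
    return done
```

Starting probe_writes("/mnt/nfs/probe.log") on a client, stopping
heartbeat on the primary, and then passing the result to
longest_gap() gives the outage in seconds.  On a "soft" mount the
write may fail with an I/O error instead of blocking, so the loop
might need a try/except around it.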

> with proper configuration and hardware, readily achievable *switchover*
> times with nfs on drbd should be <= 5 seconds. depending on actual io
> load, vfs tuning, amount of dirty pages at the exact time of the
> switchover (influencing time to umount), and possibly other tuning,
> maybe even less.

Guess I need to start tuning.  :-P

> <more great info>

Thanks, Lars.  You've been very helpful.

