On Wed, Jan 02, 2008 at 03:41:27PM +0000, Michael Toler wrote:
> (I buried this question in another post before the holidays and it
> didn't seem to get any response, so I'm trying again.)
>
> System:
>   2 IBM Xeon blades as NFS/DRBD log servers, set up as Primary/Secondary.
>   20+ blades running various packages (JBoss, Jabber, Postgres, etc.),
>   logging to the NFS log server.
>   All blades running RH 4 ES.
>
> Right now I have all of my processes connecting to the Primary side of
> my DRBD setup. I'm using this setup as a log server for processes on
> multiple blades on a blade server.
>
> When I do the HA switchover, my processes lock up for about 15 seconds
> (when using soft-mounted NFS directories) before continuing to write to
> their log files on the secondary server. I had thought that since I
> moved the /var/lib/nfs directory to my DRBD shared directory, the
> failover would be pretty much instantaneous.
>
> Do I have a problem in my setup somewhere, or is 15 seconds for
> failover fairly standard?

your problem is most likely not the switchover time of drbd (that should
be virtually instantaneous in such a setup), nor that of the nfs server,
but either an ip switchover side effect (something like an arp table
update timeout on the clients, or "mac spoofing protection" on the
switch), or some nfs retry timeout magic, which in turn depends on the
nfs versions in use.

btw, "soft" nfs mounts are a _really_ bad idea, imho. nice summary
descriptions of the reasons can be found at
http://nfs.sourceforge.net/#faq_e4 and
http://unix.derkeiler.com/Newsgroups/comp.unix.solaris/2005-10/1566.html

I suggest using hard,intr (if you really want the intr, that is), and
playing with the timeo and retrans parameters. also see if switching
between tcp and udp helps to meet your expectations.

for "normal nfs failover", btw, 15 seconds is not that bad. with proper
configuration and hardware, readily achievable *switchover* times with
nfs on drbd should be <= 5 seconds; depending on actual io load, vfs
tuning, the amount of dirty pages at the exact time of the switchover
(which influences the time to umount), and possibly other tuning, maybe
even less.

*FAILOVER* is a different matter, since a failover is only triggered
after your cluster manager decides that a node failure has happened,
which normally means cluster heartbeats have ceased to be answered for
a certain deadtime timeout, typically in the range of 10 to 30 seconds.
failover actions -- potentially stonith, starting of services, necessary
service data recovery like journal replays... -- are only taken after
this deadtime, so any *failover* time will be noticeably larger than any
*switchover* time, where services are intentionally and cleanly
stopped/switched/started.
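to check for the arp side effect mentioned above, you can force a
gratuitous arp from the new primary right after the ip takeover.
interface name and address below are made-up examples for a setup
like yours:

    # update the clients' arp caches for the service ip
    # (iputils arping: -U = unsolicited/gratuitous, -c 3 = send 3 packets)
    arping -U -c 3 -I eth0 10.0.0.42

heartbeat's IPaddr resource agent should already send gratuitous arps
on takeover, so if the hang persists, tcpdump the arp traffic during a
switchover and see whether the switch or the clients are ignoring it.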
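and to make the mount option suggestion concrete: a hard mount with
explicit timeouts could look like this in /etc/fstab on the client
blades (server name, export path, and the exact timeo/retrans values
are only illustrative; tune them for your environment):

    # hard,intr: retry forever (but interruptibly) instead of failing i/o
    # timeo is in tenths of a second; retrans is the number of retries
    # before the client logs "server not responding" and keeps retrying
    logserver:/shared/log  /var/log/shared  nfs  hard,intr,timeo=14,retrans=3,udp  0 0

with hard mounts the worst case during a switchover is a pause in the
writers, not silently dropped or corrupted log writes as with soft.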
switchover time:

    time to stop services:
        unconfig ip     (instantaneous)
        unconfig nfs    (instantaneous (?))
        umount fs       (a few ms to some seconds)
        drbd secondary  (normally almost instantaneous, under certain
                         circumstances a few seconds)
  + time to start services:
        drbd primary    (instantaneous in this scenario)
        mount fs        (a few ms in this scenario)
        config nfs      (instantaneous)
        config ip
  + time for clients to notice the ip change and "reconnect"

(in heartbeat v1 terms this whole sequence is just the haresources
line; a sketch follows below.)

failover time:

    time to notice the node failure (typically several seconds)
  + optionally, time for any fencing operations
  + time to start services, the same as above
  + recovery time during mount, which is very hard to predict, but
    depends on fs type and size

so (failover time) - (switchover time) is:

  - time to stop services, which should be negligible in this case
  + time to notice the node failure, which is configurable, but
    several seconds
  + time to do fencing
  + time to do data recovery
    (the latter two durations are difficult to guess)

so my guess is that for *failover*, you have to look into your cluster
manager's deadtime configuration (see the ha.cf and drbd.conf sketch
below), and also tune the various other involved timeouts in your
cluster manager and in drbd to be as short as possible but as long as
necessary. but don't fall into the trap of configuring timeouts as
short as you would wish them to be! :) or you will see spurious "node
dead, trying to take over" events, which is very annoying at best.

-- 
: Lars Ellenberg                           http://www.linbit.com :
: DRBD/HA support and consulting            sales at linbit.com  :
: LINBIT Information Technologies GmbH     Tel +43-1-8178292-0   :
: Vivenotgasse 48, A-1120 Vienna/Europe    Fax +43-1-8178292-82  :

__
please use the "List-Reply" function of your email client.
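p.s.: to make the stop/start sequence above concrete: it maps one to
one onto a heartbeat v1 haresources line. the node name, drbd resource
name, device, and mount point below are invented for this example;
heartbeat starts resources left to right and stops them right to left,
which gives exactly the ordering described:

    # /etc/ha.d/haresources (all on one line; names are illustrative;
    # drbddisk ships with drbd, "nfs" is the RH init script)
    node1 drbddisk::r0 Filesystem::/dev/drbd0::/shared/log::ext3 nfs IPaddr::10.0.0.42/24/eth0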
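and as for the timeouts: the heartbeat side of the deadtime tuning
lives in ha.cf, the drbd side in drbd.conf. the values below are only a
starting point, not a recommendation:

    # /etc/ha.d/ha.cf (timing directives only; values illustrative)
    keepalive 1      # heartbeat interval, in seconds
    warntime 5       # warn about late heartbeats well before deadtime
    deadtime 10      # declare the peer dead after this many seconds
    initdead 30      # extra allowance at startup, >= 2 * deadtime

    # drbd.conf, net section (drbd 8 syntax; values illustrative)
    net {
      timeout     60;   # peer considered dead after 6s (unit is 0.1s)
      connect-int 10;   # seconds between connection attempts
      ping-int    10;   # seconds between keep-alive packets
    }

watch the logs for warntime messages for a while before lowering
deadtime any further: if heartbeats are regularly reported late, your
deadtime is already as short as it can safely be.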