OK, I think I understand. Just to be clear: if I am running a degraded
cluster (say the secondary server is being replaced and will be
unavailable for several days), there are two possibilities when
restarting the primary:

1) The primary crashes and reboots. In this case degr-wfc-timeout is
   honored.

2) The primary is **manually** rebooted (cleanly). In this case
   degr-wfc-timeout is **not** honored.

Is this correct? And if so, what is the intention behind
degr-wfc-timeout exactly? Why would I want to control it separately
from wfc-timeout?

I think it would be very handy to have a config setting that says
"This node is a one-node cluster until further notice. So, don't
bother waiting for the peer - don't worry about split-brain - just
start up."

In the scenario I mentioned (having the secondary out for
maintenance), it would be nice if it were not so easy to get into a
situation where you think the server has come up - but your services
are not all there. I can see this happening in a managed hosting
situation where the hosting service reboots the machine for some
reason but is unaware of the DRBD aspect of things.

Thanks,

Sam

On Jan 8, 2008 2:56 AM, Lars Ellenberg <lars.ellenberg at linbit.com> wrote:
>
> On Mon, Jan 07, 2008 at 01:16:16PM -0800, Art Age Software wrote:
> > Hi all,
> >
> > I've asked this question before and have still not figured it out.
> >
> > Either the degr-wfc-timeout setting is not working as documented, or
> > I just don't understand how it is supposed to work.
> >
> > Here's the scenario:
> >
> > 1) Both primary and secondary nodes (servers) are running. DRBD is
> >    primary/connected/uptodate on Node1 and secondary/connected/uptodate
> >    on Node2.
> >
> > 2) Shut down Node2. This takes DRBD on Node1 into primary/disconnected
> >    state.
> >
> > 3) Reboot Node1. (Do **not** start up Node2. It remains shut down.)
> >
> > According to my understanding, what I now have is a "degraded
> > cluster."
> > However, when Node1 reboots, the init script waits forever,
> > ignoring the degr-wfc-timeout setting. It is as if DRBD does not think
> > the cluster is degraded.
> >
> > Another DRBD user on the list has confirmed seeing this behavior as
> > well in his setup.
> >
> > So, is this a DRBD bug? Or am I misunderstanding the use of the
> > degr-wfc-timeout setting?
>
> If I am currently not Primary,
> but the meta data primary indicator is set,
> I am just now recovering from a hard crash,
> and I have been Primary before that crash.
>
> Now, if I had no connection before that crash
> (I had been a degraded Primary), chances are that
> I won't find my peer now either.
>
> In that case, and _only_ in that case,
> we use the degr-wfc-timeout instead of the default,
> so we can automatically recover from a crash of a
> degraded but active "cluster" after a certain timeout.
>
> Which means that if you _reboot_ a degraded node,
> this will not use the "degr-wfc-timeout".
>
> The idea is:
> if you intentionally reboot it, you apparently "logged in" anyway
> (well, the reboot will kick you off, but you can immediately log in
> again). Maybe you fixed some hardware thing, and the reboot is
> supposed to pick that up. If not, since you are sitting in front of
> the console anyway, you can confirm or kill that wfc wait if necessary.
>
> If it crashed while being Primary, and then later boots up again,
> it will use degr-wfc-timeout.
>
> --
> : Lars Ellenberg                            http://www.linbit.com :
> : DRBD/HA support and consulting             sales at linbit.com  :
> : LINBIT Information Technologies GmbH       Tel +43-1-8178292-0  :
> : Vivenotgasse 48, A-1120 Vienna/Europe      Fax +43-1-8178292-82 :
> __
> please use the "List-Reply" function of your email client.
> _______________________________________________
> drbd-user mailing list
> drbd-user at lists.linbit.com
> http://lists.linbit.com/mailman/listinfo/drbd-user
>
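[Editor's note] The selection rule Lars describes can be summarized in a small
sketch. This is a hypothetical helper, not DRBD source code; the function and
parameter names are illustrative. It only encodes the stated rule: the
degr-wfc-timeout applies when a node that was a *degraded Primary* comes back
from a hard crash, while a clean reboot of the same degraded node falls back to
the ordinary wfc-timeout (which defaults to waiting forever).

```python
def pick_wfc_timeout(was_primary, was_degraded, crashed,
                     wfc_timeout, degr_wfc_timeout):
    """Return the wait-for-connection timeout the init script honors.

    A timeout of 0 means "wait forever" (the wfc-timeout default).
    """
    # degr-wfc-timeout is used only when recovering from a hard crash
    # of a node that was Primary *and* degraded (no peer) at crash time.
    if crashed and was_primary and was_degraded:
        return degr_wfc_timeout
    return wfc_timeout


# Sam's scenario: degraded Primary, clean manual reboot
# -> degr-wfc-timeout is NOT honored; the script waits wfc-timeout.
print(pick_wfc_timeout(True, True, crashed=False,
                       wfc_timeout=0, degr_wfc_timeout=120))   # 0

# Same degraded Primary, but it crashed -> degr-wfc-timeout applies.
print(pick_wfc_timeout(True, True, crashed=True,
                       wfc_timeout=0, degr_wfc_timeout=120))   # 120
```

This matches the observed behavior in the thread: a clean reboot of the
degraded Node1 waits forever, because only the crash path selects
degr-wfc-timeout.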