Note: "permalinks" may not be as permanent as we would like,
direct links of old sources may well be a few messages off.
--- On Tue, 3/10/09, Lars Ellenberg <lars.ellenberg at linbit.com> wrote:

> > However, if you keep track of your peer's failure,
> > this restriction is potentially removed.
>
> We keep track of the peer being "outdated",
> if it is.

Cool!

> > If node B suffers an HD failure and you are replacing its
> > drive, do you want your cluster to require manual
> > boot intervention if node A happens to go down
> > for a minute?
>
> if it "goes down" _unexpectedly_,
> it will be a crashed primary,
> and use the "degr-wfc-timeout".
> which is finite by default.
> no manual intervention needed.

But is that not a risky suggestion? Will both node A and node B in the
above scenario start on their own (without being able to connect to
their peer) after "degr-wfc-timeout"? If so, that would be safe for
node A, but not for node B, since it may already be outdated, causing
a split brain, no?!

> if it is _shut down_ cleanly, explicitly,
> then well, you had manual intervention anyways.
>
> that is currently the reasoning behind how we deal
> with wfc-timeout, and degr-wfc-timeout.

OK, but neither of those situations currently allows a single node to
start safely on its own, automatically, does it?

> > Seems unnecessary to me, node A
> > should be smart enough to be able to reboot and
> > continue on its own?
>
> but yes, we do consider to not wait at all _iff_ we find
> the "peer is outdated or worse" flag in the meta data.
> the flag is already there.

I think that would be a very valuable option, making a cluster much
more HA, especially with low-end commodity hardware and
non-professional setups, where it might not be uncommon for machines
to go down.

> > Well, it certainly can be handled on the cluster level
> > (and I plan on doing so), but why would drbd not want
> > to store extra important information if possible?
>
> it already does.

Cool, but how can a cluster manager get at this info? I tried using
drbdadm dstate and could not see a difference in this case; am I
missing something?

> it just does not (yet) use it to skip the wait-for-connection
> completely. this can probably be changed. this has some more
> implications though, which we are discussing.

I am not surprised, but I could not think of any myself. I am curious
what you think they are; could you elaborate?

> > Even if drbd does not use this info, why not store the
> > fact that you are positive that your peer is outdated
>
> we already do.

Again, cool, but how do I get that info from a script?

> > (or that you are in split brain)?!
>
> hm. we probably could. but what would we then do with
> that information? just display it?

Yes, for starters; that would make better/smarter integration with
heartbeat possible. This data could become a node attribute, which
could then become a constraint that allows a node to be promoted to
master even if it cannot connect to its peer.
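Something along these lines is what I have in mind; just a rough
sketch, where the resource name "r0" and the attribute name
"drbd_peer" are made-up examples, and which runs into exactly the gap
I mentioned: while disconnected, dstate only ever shows DUnknown for
the peer, not the persisted "outdated" flag.

  #!/bin/sh
  # Sketch: publish drbd's view of the peer as a CRM node attribute,
  # so a promotion constraint could act on it.
  RES=r0
  PEER=$(drbdadm dstate "$RES" | cut -d/ -f2)  # e.g. "UpToDate/Outdated" -> "Outdated"
  CONN=$(drbdadm cstate "$RES")                # e.g. Connected, WFConnection, StandAlone
  # Caveat: while disconnected, PEER here will only ever be "DUnknown".
  crm_attribute -t nodes -U "$(uname -n)" -n drbd_peer -v "${PEER}_${CONN}"

If drbd exposed the on-disk peer flag through dstate (or some other
drbdadm query), a script like this could feed it straight into a
constraint that permits promotion without a connected peer.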
Thanks for your consideration on this. I run drbd for my home
servers; nothing that really needs HA, I just like the ability to
fail over when I want to maintain a machine. Since my data is
important to me but HA is a secondary goal, it is not uncommon for me
to operate for a while with a node down. This means that I am
probably much more prone to failure scenarios than the average
"serious" setup. That being the case, I am more aware of the
drawbacks of the current failure handling. This one has burned me
before.

Another thing that makes my setup more prone to encountering this
problem is the asymmetry of my cluster. Only one of my nodes is on a
UPS, which means that if power goes out, my usual secondary (node B)
drops out right away. But since the UPS has a limited backup time,
the primary (node A) will eventually also go down if the outage is
long. Now, when power comes back on, you would think I would be fine,
but another "feature" of much commodity hardware is soft power
switches, which do not always work right. Despite a BIOS setting that
supposedly lets my backup computer (node B) power on by itself, it
will not actually come up without manual intervention. So, after a
long enough power outage, node A returns on its own, unattended,
while node B does not. This leaves me with a cluster that is down but
could be up.

I know that I describe a lot of things above that no one in their
right mind would do if they were serious about HA. However, I also
believe that those who are serious about HA are less likely to
deliberately make their clusters fail in various ways just for
testing. This means that they too might have hidden scenarios which
could cause more downtime than they anticipate. I hope that my (and
others') soft-HA attempts will expose more corner cases that drbd
could eventually handle better, becoming more robust than other HA
solutions!

Thanks for listening to my blabbing...

-Martin