Note: "permalinks" may not be as permanent as we would like,
direct links of old sources may well be a few messages off.
On Mon, Mar 09, 2009 at 10:59:46AM -0700, Martin Fick wrote:

> However, if you keep track of your peer's failure,
> this restriction is potentially removed.

We keep track of the peer being "outdated", if it is.
We do not keep track of the "peer's failure", because we do not
know about that.

> If node B suffers an HD failure and you are replacing its
> drive, do you want your cluster to require manual
> boot intervention if node A happens to go down
> for a minute?

If it "goes down" _unexpectedly_, it will be a crashed primary,
and use the "degr-wfc-timeout", which is finite by default
(see the config sketch after this message).
No manual intervention needed.

If it is _shut down_ cleanly, explicitly, then, well, you had
manual intervention anyway.

That is currently the reasoning behind how we deal with
wfc-timeout and degr-wfc-timeout.

> Seems unnecessary to me, node A
> should be smart enough to be able to reboot and
> continue on its own?

But yes, we do consider not waiting at all _iff_ we find the
"peer is outdated or worse" flag in the meta data (see the
command sketch after this message). The flag is already there.

> Well, it certainly can be handled on the cluster level
> (and I plan on doing so), but why would drbd not want
> to store extra important information if possible?

It already does. It just does not (yet) use it to skip the
wait-for-connection completely. That can probably be changed,
though it has some more implications, which we are discussing.

> Even if drbd does not use this info, why not store the
> fact that you are positive that your peer is outdated

We already do.

> (or that you are in split brain)?!

Hm. We probably could. But what would we then do with that
information? Just display it?

--
: Lars Ellenberg
: LINBIT | Your Way to High Availability
: DRBD/HA support and consulting http://www.linbit.com

DRBD® and LINBIT® are registered trademarks of LINBIT, Austria.
__
please don't Cc me, but send to list -- I'm subscribed
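
For reference, the two timeouts discussed above live in the startup
section of drbd.conf. A minimal sketch, assuming a resource named
"r0"; the values are illustrative, not recommendations, and the
shipped defaults may differ by release:

    # /etc/drbd.conf (excerpt)
    resource r0 {
      startup {
        # normal boot: how long to wait for the peer before
        # giving up; 0 means wait forever
        wfc-timeout       0;
        # boot after having run degraded (peer already gone
        # before we went down): the finite timeout referred
        # to in the message above
        degr-wfc-timeout  60;
      }
      # ... disk/net/device sections omitted ...
    }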
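
Likewise, a sketch of how the "outdated" mark shows up from the
shell, again assuming resource "r0". In a real cluster the
fence-peer handler (e.g. dopd) issues the outdate, not the admin:

    # on the node whose data is now stale:
    # mark the local disk Outdated in the meta data
    drbdadm outdate r0

    # on the surviving node: the recorded disk states are
    # printed as local/peer, e.g. "UpToDate/Outdated"
    drbdadm dstate r0

For what it's worth, DRBD's outdated-wfc-timeout startup option,
where available, covers exactly the "peer is known outdated, so
don't wait forever" case debated above.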