Sorry if my questions seemed too simplistic; I was asking for confirmation of my understanding before making my suggestion for improvement below:

> The thing that people usually realise too late is that the default
> for the wait-for-connection setting is 0, which means wait for the
> peer connection forever, or until manual intervention. Such a setting
> will block your boot process if the DRBD service is started at boot
> time.

OK, so in the case of 0 (the default), my scenario descriptions were correct, no? What happens with the drbd8 master/slave OCF resource agents, since they do not use the boot script? Do they honor the wait-for-connection setting?

> Re 2A) No, if you set a reasonable wait-for-connection timeout
> interval.

But what is reasonable? Any value you set risks split brain if it happens to be scenario 2B, as you pointed out, right? This means that the default is currently the safe solution and anything else is extremely risky, right?

> Only a node that has UpToDate data can become Primary.

But the first node that comes up in either 2A or 2B will have its data marked "UpToDate", right? So in 2A that will be accurate, but in 2B it means split brain, if "UpToDate" is trusted (hence the wait-for-connection, I suppose)?

> So when one node goes down immediately, there is no way to set its
> data as Outdated.

This is what I was trying to verify, since I do not like this part. Again, I assume that is why the default wfc-timeout is 0? Setting it to any other value seems like asking for split brain. I was wondering whether any other solutions to this problem have been sought. Would it not be possible, after node B goes down, to record on node A that node B is outdated (just because node B is unreachable does not mean that we have no valuable information about the cluster status)?
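For reference, the setting I mean is the one configured in the startup section of drbd.conf; a minimal sketch, where the resource name and layout are illustrative assumptions on my part, not something from this thread:

```
resource r0 {
  startup {
    # 0 (the default) = at boot, wait for the peer connection
    # forever, blocking the init script until the peer shows up
    # or an operator intervenes; a nonzero value is a timeout
    # in seconds after which the script gives up waiting.
    wfc-timeout 0;
  }
  # ... device/disk/net sections omitted ...
}
```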
This way, in scenario 2B nothing would change: node B would continue to wait for node A before either node is promoted. But at least in scenario 2A, node A could take note that node B was previously down (and should therefore be considered outdated), and node A could be allowed to be promoted right away (or after a newly defined timer expires) without waiting for node B to come up. If the cluster was already degraded when node A went down, it should be able to continue to operate degraded safely when node A comes back up, right? Is there anything wrong with this logic? Are there currently any mechanisms to do this?

Thanks,
-Martin
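As an aside: if I am reading the drbd.conf documentation correctly, there seems to be a separate startup timeout that applies in the degraded case I describe, i.e. when a node knows the cluster was already degraded before it went down. A sketch, with an illustrative timeout value I picked myself:

```
resource r0 {
  startup {
    wfc-timeout 0;         # normal boot: wait for the peer forever
    # Used instead of wfc-timeout when this node recorded that the
    # cluster was degraded before it went down -- here, wait only
    # 120 seconds before proceeding without the (presumed outdated)
    # peer.
    degr-wfc-timeout 120;
  }
}
```

If that understanding is right, it would at least cover part of the 2A case, though it does not by itself mark the absent peer's data as Outdated.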