Sorry if my questions seemed too simplistic; I was
asking for confirmation of my understanding before
making my suggestion for improvement below:
> The thing that people usually realise too late is
> that the default for the wait-for-connection setting
> is 0, which means: wait for the peer connection
> forever, or until manual intervention. Such a setting
> will block your boot-up process if the DRBD service
> is started at boot time.
OK, so in the case of 0 (the default), my scenario
descriptions were correct, no? And what happens with
the drbd8 M/S OCFs, since they do not use the boot
script? Do they honor the wait-for-connection setting?
> Re 2A) No, if you set reasonable wait for connection
> timeout interval.
But what is reasonable? Any timeout you set risks
split brain if it happens to be scenario 2B, as you
pointed out, right? That means the default is currently
the safe choice, and anything else is extremely risky,
right?
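For context, the timeout we are discussing lives in the
startup section of drbd.conf. A minimal sketch (the
resource name r0 is just an illustrative assumption):

```
# /etc/drbd.conf -- illustrative fragment only
resource r0 {
  startup {
    # 0 = wait forever for the peer before continuing
    # boot (the default discussed above); any finite
    # value lets boot proceed after that many seconds
    # even if the peer never shows up.
    wfc-timeout 0;
  }
}
```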
> Only node that has UpToDate data set can become
> Primary
But the first node that comes up in either 2A or 2B
will have its data marked "UpToDate", right? So in 2A
that is accurate, but in 2B it means split brain if
"UpToDate" is trusted (hence the wait-for-connection,
I suppose)?
> So when one node goes down immediately, there
> is no way to set its data as outdated.
This is what I was trying to verify, since I do not
like this part. Again, I assume that is why the default
wfc timeout is 0? Setting it to any other value seems
like asking for split brain. I was wondering whether
any other solutions to this problem have been sought?
Would it not be possible, after node B goes down, to
record on node A that node B is outdated (just because
node B is unreachable does not mean that we have no
valuable information about the cluster status)? This
way, nothing would change in scenario 2B): node B would
continue to wait for node A before either node is
promoted. But at least in scenario 2A), node A could
take note that node B was previously down (and should
therefore be considered outdated) and be allowed to
promote right away (or after a newly defined timer
expires), without waiting for node B to come up?
If the cluster was already degraded when node A went
down, it should be able to continue to operate degraded
safely when node A comes back up, right? Is there
anything wrong with this logic? Are there currently
any mechanisms to do this?
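As a sketch of where such a "the peer was already down
before we crashed" timeout could live: if I read the
DRBD docs correctly, the startup section has a separate
degr-wfc-timeout that applies only when this node was
part of a degraded cluster before it went down. The
values below are purely illustrative:

```
# /etc/drbd.conf -- illustrative fragment only
resource r0 {
  startup {
    # normal boot: wait for the peer forever
    wfc-timeout      0;
    # the cluster was already degraded when this node
    # went down, i.e. the peer was dead before, so stop
    # waiting for it after 120 seconds
    degr-wfc-timeout 120;
  }
}
```

If that option does what I think it does, it would
cover the 2A) half of my suggestion above, though it
still relies on the node's own record rather than the
peer being explicitly marked outdated.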
Thanks,
-Martin