Note: "permalinks" may not be as permanent as we would like;
direct links to old sources may well be a few messages off.

On Wed, 2004-08-18 at 23:13, Lars Ellenberg wrote:
> / 2004-08-18 12:38:01 +0200
> \ Alex Ongena:
> > Lars,
> >
> > I can simulate it and I think it's related to getting a:
> > "PingAck did not arrive in time" on the slave, during
> > a 'proper or not so proper' reboot of the master.
> > => both systems are consistent, but the HA software
> > puts the slave into primary (because the master is dead)
> > and when the master comes back, they both think they
> > are consistent although they are different.
> > Maybe the 'longest a-live' should be considered as
> > real-consistent....
>
> consistent is not a synonym for up-to-date.
>
> and something with your init script order is not working properly.
> it first cuts the network, while drbd is still up and running.
> you need to *first* make a possible Primary drbd Secondary,
> or completely stop drbd, and only *then* shut down the network.

I know, but I tried to figure out how robust the driver was against
'less than normal' shutdowns, possibly caused by faulty HDs where the
proper shutdown scripts are corrupted...
I am _not_ simulating the _normal_ situations, but those that I have
seen can happen.

Nevertheless, I think it should be possible to recover from a
split-brain situation by just using drbdsetup commands. Right now, the
only way to recover is to manually corrupt the drbd-meta storage.

> just out of curiosity,
> what are you using as cluster manager?

heartbeat

> why don't you use heartbeat?

I am

> > PS: let me know if I can help with further testing to make
> > 0.7.x rock-solid.
>
> for now, just use it the right way :)

I do, but I also like drbd to be robust when used the wrong way
(caused by script failures, errors, bad HDs, ....)

Regards
alex

>
> Lars Ellenberg