[DRBD-user] 0.7.2 Split-Brain, unrecoverable - update-reconstruction [user error]

Alex Ongena Alex.Ongena at able.be
Thu Aug 19 09:26:05 CEST 2004

On Wed, 2004-08-18 at 23:13, Lars Ellenberg wrote:
> / 2004-08-18 12:38:01 +0200
> \ Alex Ongena:
> > Lars,
> > 
> > I can simulate it and I think it's related to getting a:
> > "PingAck did not arrive in time" on the slave, during
> > a 'proper or not so proper' reboot of the master.
> > => both systems are consistent, but the HA software
> > puts the slave into primary (because the master is dead)
> > and when the master comes back, they both think they
> > are consistent although they are different.
> > Maybe the 'longest alive' should be considered the really
> > consistent one....
> consistent is not a synonym for up-to-date.
> and something with your init script order is not working properly.
> it first cuts the network, while drbd is still up and running.
> you need to *first* make a possible Primary drbd Secondary,
> or completely stop drbd, and only *then* shutdown the network.

I know, but I was trying to figure out how robust the driver is
against 'less than normal' shutdowns, possibly caused by faulty HDs
where the proper shutdown scripts are corrupted...

I am _not_ simulating the _normal_ situations, but those that I
have seen happen.

Nevertheless, I think it should be possible to recover from a
split-brain situation by just using drbdsetup commands.
Right now, the only way to recover is to manually corrupt the
drbd meta-data storage.
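
To make the scenario concrete, here is a toy model of it in Python.
It is only a sketch under my own assumptions: the Node class, the
wrote_alone flag and reconnect() are made up for illustration and
are not how drbd stores or compares its meta-data. It just shows why
"both consistent" does not tell you which side is up-to-date.

```python
from dataclasses import dataclass


@dataclass
class Node:
    """Hypothetical stand-in for one side of a mirrored pair."""
    name: str
    consistent: bool = True    # local data is internally intact
    wrote_alone: bool = False  # took writes as Primary while the peer was unreachable


def reconnect(a: Node, b: Node) -> str:
    """Decide what a reconnect would need to do in this toy model."""
    if a.wrote_alone and b.wrote_alone:
        # Both sides changed independently: neither is a superset of the other.
        return "split-brain"
    if a.wrote_alone:
        return f"resync from {a.name}"
    if b.wrote_alone:
        return f"resync from {b.name}"
    return "in sync"


# Network is cut while the master is still Primary, then the HA software
# promotes the slave: both end up having written on their own.
master = Node("master", wrote_alone=True)
slave = Node("slave", wrote_alone=True)
bad_shutdown = reconnect(master, slave)

# Master is made Secondary (or drbd is stopped) before the network goes away:
# only the promoted slave has written on its own.
clean_master = Node("master", wrote_alone=False)
clean_shutdown = reconnect(clean_master, slave)
```

In the first case both nodes still have consistent == True, yet
there is no obvious sync source. In the second case the promoted
slave is the clear source.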

> just out of curiosity,
> what are you using as cluster manager?

> why don't you use heartbeat?
I am

> > PS: let me know if I can help with further testing to make
> > 0.7.x rock-solid.
> for now, just use it the right way :)
I do, but I would also like drbd to be robust when used the wrong way
(caused by script failures, errors, bad HDs, ...)

> 	Lars Ellenberg

