[DRBD-user] 0.7.2 Split-Brain, unrecoverable - update -reconstruction [user error]

Lars Ellenberg Lars.Ellenberg at linbit.com
Wed Aug 18 23:13:26 CEST 2004



/ 2004-08-18 12:38:01 +0200
\ Alex Ongena:
> Lars,
> 
> I can simulate it and I think it's related to getting a:
> "PingAck did not arrive in time" on the slave, during
> a 'proper or not so proper' reboot of the master.
> => both systems are consistent, but the HA software
> puts the slave into primary (because the master is dead)
> and when the master comes back, they both think they
> are consistent although they are different.
> Maybe the 'longest alive' should be considered the really
> consistent one....

consistent is not a synonym for up-to-date.
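A toy model of why that matters. This is hypothetical and is not how drbd 0.7 tracks its metadata: each node is just a "consistent" flag plus the set of write ids it has applied. After the link drops, the old master keeps writing locally while the promoted slave takes new writes too, so both disks are consistent yet each holds writes the other never saw. Picking the "longest alive" node as sync source then silently discards the other side's writes.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    # Hypothetical toy model, NOT drbd's real on-disk metadata.
    name: str
    consistent: bool = True                    # no half-finished resync on this disk
    writes: set = field(default_factory=set)   # ids of writes applied locally


def diverged(a: Node, b: Node) -> bool:
    """True if each side holds writes the other never saw (split brain)."""
    return bool(a.writes - b.writes) and bool(b.writes - a.writes)


def lost_if_winner(winner: Node, loser: Node) -> set:
    """Writes thrown away if `winner` is used as the sync source."""
    return loser.writes - winner.writes


shared = {1, 2, 3}
master = Node("master", writes=shared | {4})     # kept writing after the network was cut
slave = Node("slave", writes=shared | {5, 6, 7})  # promoted by the HA software, then wrote 5-7

print(master.consistent and slave.consistent)  # True
print(diverged(master, slave))                 # True
print(sorted(lost_if_winner(slave, master)))   # [4]
```

Both flags say "consistent", but neither node is up-to-date with respect to the other, so no automatic "longest alive" rule can recover write 4.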

and something with your init script order is not working properly.
it first cuts the network, while drbd is still up and running.
you need to *first* make a possible Primary drbd Secondary,
or completely stop drbd, and only *then* shutdown the network.

in fact, on reboot or shutdown, you should give up HA resources a
_long time_ before you cut the comm links.
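One way to sanity-check that ordering, as a sketch assuming a SysV-style init where shutdown runs the `Kxx<name>` links in rc0.d/rc6.d in ascending order. The service names `heartbeat`, `drbd` and `network` are assumptions; the actual script names and numbers differ between distributions, and it also assumes the drbd stop script demotes or stops the devices.

```python
import re


def kill_order(script_names):
    """Map service name -> K sequence number, e.g. 'K35drbd' -> {'drbd': 35}.
    Assumes SysV rc0.d/rc6.d naming; S* (start) links are ignored."""
    order = {}
    for name in script_names:
        m = re.fullmatch(r"K(\d{2})(.+)", name)
        if m:
            order[m.group(2)] = int(m.group(1))
    return order


def check_shutdown_order(script_names, sequence=("heartbeat", "drbd", "network")):
    """Return a list of problems; each service in `sequence` must stop
    strictly before the next one (hypothetical service names)."""
    order = kill_order(script_names)
    problems = []
    for earlier, later in zip(sequence, sequence[1:]):
        if earlier in order and later in order and order[earlier] >= order[later]:
            problems.append(
                f"{earlier} (K{order[earlier]:02d}) must stop before "
                f"{later} (K{order[later]:02d})"
            )
    return problems


print(check_shutdown_order(["K05heartbeat", "K90network", "K95drbd"]))
print(check_shutdown_order(["K05heartbeat", "K08drbd", "K90network"]))  # []
```

The first list is the broken case described above: the network goes down while drbd is still Primary.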

go fix that, or you will run into this always.

just out of curiosity,
what are you using as cluster manager?
why don't you use heartbeat?


> PS: let me know if I can help with further testing to make
> 0.7.x rock-solid.

for now, just use it the right way :)

	Lars Ellenberg

-- 
please use the "List-Reply" function of your email client.


