[DRBD-user] Concurrent state changes detected and aborted

Fri Jun 19 13:44:20 CEST 2009

>> What exactly happened there and how can I avoid it?
> 
> I have no idea.
> possibly both have been told to "down" at the exact same time.
> 
> there are a few "cluster wide state changes", and while one of those is
> pending, no other cluster wide state change is allowed.
> 
> apparently virt-1 attempted to detach (become Diskless) while virt-2
> attempted (and finally succeeded) to disconnect.
> 
> this probably cannot be avoided, though it should be very rare.
> it may be worked around in the resource agent by some retry logic.
> 
> I said it should be rare, so it should not be easy to reproduce.

Right. And just for the record: I could not reproduce it in that very
cluster, despite running the mentioned stop operations numerous times.

So I'm back to thinking it must have been the rare case where both state
changes were requested at the exact same time.
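
As a rough illustration of the retry workaround suggested above, here is a
minimal Python sketch. It is not taken from the actual resource agent: the
function name run_with_retry, the resource name "r0", and the attempt count
and delay are all assumptions made for the example. It simply reruns a
command a few times when it exits non-zero, on the theory that a competing
cluster wide state change will have finished by the next attempt.

```python
import subprocess
import sys
import time


def run_with_retry(cmd, attempts=5, delay=1.0,
                   runner=subprocess.call, sleep=time.sleep):
    """Run cmd until it exits 0 or the attempts are used up.

    Returns the exit code of the last attempt. `runner` and `sleep`
    are injectable so the logic can be exercised without DRBD.
    """
    rc = 1  # returned unchanged only if attempts < 1
    for attempt in range(1, attempts + 1):
        rc = runner(cmd)
        if rc == 0:
            return rc
        # Any non-zero exit is treated as "maybe a concurrent state
        # change"; a real agent may want to be more selective.
        if attempt < attempts:
            sleep(delay)
    return rc


if __name__ == "__main__":
    # "r0" is a hypothetical resource name used for illustration.
    sys.exit(run_with_retry(["drbdadm", "down", "r0"]))
```

Retrying on every non-zero exit is the bluntest possible choice; it keeps
the sketch short, but it also delays the report of genuine failures by
attempts * delay seconds.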

Regards
Dominik

> if you find a (simple) procedure to reproduce it, let me know,
> we will find out why it happens, and make it behave better.
> if you cannot easily reproduce it: => WONTFIX.
> 
> no, "enable debug" in drbd would not help to understand what exactly
> happened. though it is possible to use the tracing framework.
> documentation of drbd tracing: see drbd source code.
>