>> What exactly happened there and how can I avoid it?
>
> I have no idea.
> possibly both have been told to "down" at the exact same time.
>
> there are a few "cluster wide state changes", and while one of those is
> pending, no other cluster wide state change is allowed.
>
> apparently virt-1 attempted to detach (become Diskless) while virt-2
> attempted (and finally succeeded) to disconnect.
>
> this probably cannot be avoided, though it should be very rare.
> it may be worked around in the resource agent by some retry logic.
>
> I said it should be rare, so it should not be easy to reproduce.

Right. And just for the record, I could not reproduce it in that very
cluster, even after numerous repetitions of the mentioned stop operations.
So I'm back to thinking it must have been the rare case where both state
changes happen at exactly the same time.

Regards
Dominik

> if you find a (simple) procedure to reproduce it, let me know,
> we will find out why it happens, and make it behave better.
> if you cannot easily reproduce it: => WONTFIX.
>
> no, "enable debug" in drbd would not help to understand what exactly
> happened. though it is possible to use the tracing framework.
> documentation of drbd tracing: see drbd source code.
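The thread suggests the collision "may be worked around in the resource agent by some retry logic". A minimal sketch of that idea, written in Python for illustration only, assuming a refused cluster-wide state change surfaces as a failed command (non-zero exit of `drbdadm down`). The helper names `retry_state_change` and `drbd_down`, the resource name `r0`, and the backoff parameters are hypothetical, not part of DRBD or any existing resource agent. Random jitter is added because the failure came from both nodes acting at the same instant; identical fixed delays on both nodes could simply make them collide again.

```python
import random
import subprocess
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def retry_state_change(
    op: Callable[[], T],
    attempts: int = 5,
    base_delay: float = 0.5,
    is_transient: Callable[[Exception], bool] = lambda exc: True,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Run op(), retrying with jittered exponential backoff (hypothetical helper).

    Assumes a conflicting cluster-wide state change shows up as an exception
    that is_transient() recognizes; anything else is re-raised immediately.
    """
    if attempts < 1:
        raise ValueError("attempts must be >= 1")
    for attempt in range(attempts):
        try:
            return op()
        except Exception as exc:
            # Give up on the last attempt, or on errors we do not consider transient.
            if attempt == attempts - 1 or not is_transient(exc):
                raise
            # Exponential backoff with jitter, so two nodes retrying at once
            # are unlikely to pick the same moment again.
            sleep(base_delay * (2 ** attempt) * random.uniform(0.5, 1.5))
    raise AssertionError("unreachable")


def drbd_down(resource: str) -> None:
    """Hypothetical wrapper: check=True raises CalledProcessError on non-zero exit."""
    subprocess.run(["drbdadm", "down", resource], check=True)


# Example (not executed here): treat any failed "drbdadm down" as retryable.
# retry_state_change(
#     lambda: drbd_down("r0"),
#     is_transient=lambda e: isinstance(e, subprocess.CalledProcessError),
# )
```

Treating every non-zero exit as transient is a simplification. A real agent would want to distinguish "another state change is pending" from genuine configuration errors, and should keep the total retry time well below the cluster manager's stop timeout.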