[DRBD-user] drbdmanage commands take a long time to complete on the leader if a node in the cluster is down

Mon Jan 23 09:20:53 CET 2017

On Sun, Jan 22, 2017 at 09:18:45AM +1300, Brady, Mike wrote:
> I am doing some testing with drbd9 and drbdmanage and am seeing some
> behaviour that I do not understand.
> 
> I have three nodes in a cluster.  Node names are kvm09, kvm10 and
> kvm11.  kvm09 is the leader.  All three systems are up-to-date CentOS
> 7.3 with drbd 9.0.6, drbd-utils 8.9.10 and drbdmanage 0.98.2.
> 
> If I shut down a node, drbdmanage commands executed on the leader now
> take a "long time" to complete.
> 

Yes, that is correct. So far, the leader "pings" the satellites
relatively often (not ICMP, but a drbdmanage protocol ping). That is
done so that satellite nodes know who their leader is (they don't have
that information otherwise). Overly long TCP timeouts and a clumsy
"always ping them even if you know the node is gone" policy lead to the
behavior you describe.

That was changed in a commit named "ping service: Don't hang if nodes
leave", which should be in the next release. That release is expected
this week.
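
For illustration only, here is a minimal sketch of the general idea:
use a short connect timeout and do not keep pinging a node that was
just found to be unreachable. This is not the actual drbdmanage code
and not the content of that commit. All names (ping_node, ping_round,
retry_after) and the timeout values are assumptions made up for this
example.

```python
import socket
import time


def ping_node(host, port, timeout=2.0):
    """Return True if a TCP connection to (host, port) succeeds in time.

    Hypothetical stand-in for a protocol-level ping; the short timeout
    keeps one unreachable node from stalling the caller for long.
    """
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, unreachable hosts and timeouts.
        return False


def ping_round(nodes, down_since, retry_after=30.0,
               ping=ping_node, now=time.monotonic):
    """Ping every node once, skipping nodes recently seen as down.

    nodes:      dict mapping node name -> (host, port)
    down_since: dict mapping node name -> time of last failed ping;
                updated in place
    Returns a dict mapping node name -> True (reachable),
    False (unreachable) or None (skipped in this round).
    """
    results = {}
    for name, (host, port) in nodes.items():
        failed_at = down_since.get(name)
        if failed_at is not None and now() - failed_at < retry_after:
            # Known to be gone: do not wait for another timeout yet.
            results[name] = None
            continue
        ok = ping(host, port)
        results[name] = ok
        if ok:
            down_since.pop(name, None)
        else:
            down_since[name] = now()
    return results
```

With something like this, a node that is down costs at most one short
timeout per retry_after interval, instead of one long TCP timeout on
every ping round.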

Regards, rck