[DRBD-user] drbdmanage hangs frequently

Tue Apr 17 09:43:23 CEST 2018

On Mon, Apr 16, 2018 at 12:44:22PM -0400, dehacked wrote:
> Greetings,
> 
> I have a small cluster used for OpenStack (Newton on CentOS 7 nodes). I have
> 2 main storage nodes, 1 OpenStack controller node and 5 'diskless'
> hypervisors. It's configured with the hypervisors as satellite nodes and the
> 3 remaining servers as management nodes with the management volume, though
> only the 2 storage nodes actually hold the rest of the user data.
> 
> I'm finding that drbdmanage hangs frequently trying to communicate with the
> service. Even 'drbdmanage ping' will time out. Examining the service process,
> I see it apparently busy connecting to another host which is itself hung.
> 
> Any ideas what's wrong or what troubleshooting steps I should be taking here?

Usually this is a sign that at least one node is busy retrying the same
operation (e.g., creating or deleting a resource) over and over again.
Normally that stops once a fail-count is reached. But if the operation
takes longer than the TCP timeout we set, the node may never get to
report back that it failed, and then the retry keeps looping.
There have been fixes in that regard, and the latest version has a
configurable TCP timeout.
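
To make the failure mode concrete, here is a toy model of it in Python.
This is only a sketch and not drbdmanage code: the function name, the
parameters and the round cap are all made up for illustration. It assumes
a failure only counts if the report arrives within the TCP timeout.

```python
def simulate(op_seconds, tcp_timeout, fail_limit, max_rounds=20):
    """Toy retry loop (hypothetical, not drbdmanage internals).

    Returns (rounds_used, gave_up).
    """
    fail_count = 0
    for round_no in range(1, max_rounds + 1):
        if op_seconds <= tcp_timeout:
            # The failure report arrives in time, so it is counted.
            fail_count += 1
            if fail_count >= fail_limit:
                return round_no, True
        # Otherwise the connection timed out before the node could
        # report anything, so nothing is counted and we just retry.
    return max_rounds, False


if __name__ == "__main__":
    print(simulate(op_seconds=2, tcp_timeout=5, fail_limit=3))
    print(simulate(op_seconds=10, tcp_timeout=5, fail_limit=3))
```

In the first call the fail-count is reached and the loop stops; in the
second the operation outlasts the timeout, the count never moves, and
only the artificial round cap ends the loop.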

Enable debugging and check whether you can spot such a "busy loop" in the
syslogs.
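
If the logs are large, a small script can help spot repetition. The
following is a minimal sketch, not an official tool: it assumes the
traditional syslog line prefix ("Mon DD HH:MM:SS host ..."), assumes the
relevant lines contain the string "drbdmanage", and uses an arbitrary
threshold. Adjust all three to your setup.

```python
import re
import sys
from collections import Counter

# Assumed traditional syslog prefix: "Apr 17 09:43:23 host "
PREFIX = re.compile(r"^\w{3}\s+\d+\s+\d{2}:\d{2}:\d{2}\s+\S+\s+")


def normalize(line):
    """Strip timestamp and host, and mask PIDs, so repeats compare equal."""
    msg = PREFIX.sub("", line.strip())
    return re.sub(r"\[\d+\]", "[N]", msg)


def find_repeats(lines, needle="drbdmanage", threshold=5):
    """Return (message, count) pairs seen at least `threshold` times."""
    counts = Counter(normalize(l) for l in lines if needle in l)
    return [(m, n) for m, n in counts.most_common() if n >= threshold]


if __name__ == "__main__":
    # Usage: python find_repeats.py <path-to-syslog-file>
    with open(sys.argv[1], errors="replace") as f:
        for message, count in find_repeats(f):
            print(f"{count:6d}  {message}")
```

A message that shows up many times with only the timestamp changing is a
good candidate for the loop described above.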

> Thanks
> 
> drbdmanage version 0.99.14
> kernel driver version 9.0.9
> drbd-utils version 9.1.1
> all built from source tarballs

Every single one of them is outdated. At least try the latest drbdmanage.

Regards, rck