On 14/03/2017 10:47, Shafeek Sumser wrote:
> Hi Yannis,
>
> Thanks for the information you provided.
>
> On pve1, I have initiated the cluster and added the node pve2. When
> drbdctrl is primary on pve1 (secondary on pve2) and I shut down pve2,
> the drbd storage stays available. I can do any manipulation and even
> the VM keeps working. But the other way round, if I shut down pve1
> (where drbdctrl is primary), the drbd storage is not available on
> pve2. Moreover, no drbdmanage command (list-nodes, volumes, etc.)
> works on pve2. It says:
>
> root at pve2:~# drbdmanage list-nodes
> Waiting for server: ...............
> No nodes defined

Being a two-node storage, I guess that shutting down the node where the
drbd control volume (.drbdctrl) is primary leaves the system with no
chance but to detect a split brain. If there were a third node, one of
the two remaining ones would be elected primary. The following log
excerpt seems to confirm this:

> The log goes as follows:
> Mar 14 13:39:39 pve2 drbdmanaged[20776]: INFO Leader election by
> wait for connections
> Mar 14 13:39:39 pve2 drbdmanaged[20776]: INFO DrbdAdm: Running
> external command: drbdsetup wait-connect-resource --wait-after-sb=yes
> --wfc-timeout=2 .drbdctrl
> Mar 14 13:39:41 pve2 drbdmanaged[20776]: ERROR DrbdAdm: External
> command 'drbdsetup': Exit code 5
> Mar 14 13:39:41 pve2 drbdmanaged[20776]: ERROR drbdsetup/stderr:
> degr-wfc-timeout has to be shorter than wfc-timeout
> Mar 14 13:39:41 pve2 drbdmanaged[20776]: ERROR drbdsetup/stderr:
> degr-wfc-timeout implicitly set to wfc-timeout (2s)
> Mar 14 13:39:41 pve2 drbdmanaged[20776]: ERROR drbdsetup/stderr:
> outdated-wfc-timeout has to be shorter than degr-wfc-timeout
> Mar 14 13:39:41 pve2 drbdmanaged[20776]: ERROR drbdsetup/stderr:
> outdated-wfc-timeout implicitly set to degr-wfc-timeout (2s)
> Mar 14 13:39:41 pve2 drbdmanaged[20776]: WARNING Resource
> '.drbdctrl': wait-connect-resource not finished within 2 seconds

> Regarding the Split Brain issue, I can't find in the log that a split
> brain situation has been detected on the surviving node, i.e. pve2

See:

> external command: drbdsetup wait-connect-resource --wait-after-sb=yes
> --wfc-timeout=2 .drbdctrl

> for the moment. I have done a drbdmanage primary drbdctrl but still
> the drbd storage is not available. How can I resolve the split brain
> manually, so that the drbd storage continues to work even if pve1
> (the primary) is down?

I guess this should be left to the drbdmanaged daemon; the control
volume should not be managed directly. Maybe someone from the DRBD
folks could give more details/advice.

> I will try to test the scenario by adding a third drbd node (pve3) to
> the cluster (drbdmanage add-node command) on pve1 and I will let you
> know.

With three nodes you should avoid the split brain. By the way, even
with a two-node pve cluster you would be in a degraded situation: no
quorum -> most operations locked for safety. The same holds for the
drbd storage cluster.

For what it's worth, my three-node cluster performs as expected, the
only difference between mine and yours being that I have three nodes
for drbd storage as well.

Bye,
rob
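For a regular DRBD data resource (not the .drbdctrl control volume, which is better left to the drbdmanaged daemon), the manual split-brain recovery documented in the DRBD User's Guide goes roughly like this. The resource name r0 is a placeholder, and you have to choose which node's changes to throw away:

```shell
# On the node whose modifications will be discarded
# (the split-brain "victim"):
drbdadm disconnect r0
drbdadm secondary r0
drbdadm connect --discard-my-data r0

# On the node whose data survives, if it does not
# reconnect on its own:
drbdadm connect r0
```

Be aware that this discards everything written on the victim node since the split, so pick the victim carefully.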
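As for adding the third node, the drbdmanage side is a one-liner; just a sketch, the node name and address below are placeholders for your setup:

```shell
# On a node that is already part of the drbdmanage cluster (e.g. pve1):
drbdmanage add-node pve3 192.168.1.3   # hypothetical IP for pve3

# drbdmanage then prints a join command to be run on the new node
# itself; execute it there to complete the join.
```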