On 14/03/2017 10:47, Shafeek Sumser wrote:
> Hi Yannis,
>
> Thanks for the information you provided.
>
> On pve1, I have initiated the cluster and added the node pve2. When
> drbdctrl is primary on pve1 (secondary on pve2) and I shut down pve2,
> the drbd storage stays available. I can do any manipulation and even
> the VM keeps working. But the other way round, if I shut down pve1
> (where drbdctrl is primary), the drbd storage is not available on
> pve2. Moreover, no drbdmanage command (list-nodes, volumes, etc.)
> works on pve2. It says:
>
> root at pve2:~# drbdmanage list-nodes
> Waiting for server: ...............
> No nodes defined

Being a two-node storage, I guess that shutting down the node where the
drbd control volume (.drbdctrl) is primary leaves the system with no
chance but to detect a split brain. If there were a third node, one of
the two remaining ones would be elected primary. The following log
excerpt seems to confirm this:

> The log goes as follows:
> Mar 14 13:39:39 pve2 drbdmanaged[20776]: INFO Leader election by
> wait for connections
> Mar 14 13:39:39 pve2 drbdmanaged[20776]: INFO DrbdAdm: Running
> external command: drbdsetup wait-connect-resource --wait-after-sb=yes
> --wfc-timeout=2 .drbdctrl
> Mar 14 13:39:41 pve2 drbdmanaged[20776]: ERROR DrbdAdm: External
> command 'drbdsetup': Exit code 5
> Mar 14 13:39:41 pve2 drbdmanaged[20776]: ERROR drbdsetup/stderr:
> degr-wfc-timeout has to be shorter than wfc-timeout
> Mar 14 13:39:41 pve2 drbdmanaged[20776]: ERROR drbdsetup/stderr:
> degr-wfc-timeout implicitly set to wfc-timeout (2s)
> Mar 14 13:39:41 pve2 drbdmanaged[20776]: ERROR drbdsetup/stderr:
> outdated-wfc-timeout has to be shorter than degr-wfc-timeout
> Mar 14 13:39:41 pve2 drbdmanaged[20776]: ERROR drbdsetup/stderr:
> outdated-wfc-timeout implicitly set to degr-wfc-timeout (2s)
> Mar 14 13:39:41 pve2 drbdmanaged[20776]: WARNING Resource
> '.drbdctrl': wait-connect-resource not finished within 2 seconds

> Regarding the Split Brain issue, I can't find in the log that a split
> brain situation has been detected on the surviving node, i.e. pve2

See:

> external command: drbdsetup wait-connect-resource --wait-after-sb=yes
> --wfc-timeout=2 .drbdctrl

> for the moment. I have done a drbdmanage primary drbdctrl but still
> the drbd storage is not available. How can I resolve the split brain
> manually, so that the drbd storage continues to work even if pve1
> (the primary) is down?

I guess this should be left to the drbdmanaged daemon; the control
volume should not be managed directly. Maybe someone from the DRBD
folks could give more details/advice.

> I will try to test the scenario by adding a third drbd node (pve3) to
> the cluster (drbdmanage add-node command) on pve1 and I will let you
> know.

With three nodes you should avoid the split brain. By the way, even
with a two-node pve cluster you would be in a degraded situation: no
quorum -> most operations locked for safety. The same holds for the
drbd storage cluster.

For what it's worth, my three-node cluster performs as expected, the
only difference between mine and yours being that I have three nodes
for drbd storage as well.

Bye,
rob
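For a regular DRBD data resource (not the .drbdctrl control volume, which is better left to the drbdmanaged daemon), the manual split-brain recovery documented in the DRBD User's Guide goes roughly like this. The resource name r0 is a placeholder, and you have to choose which node's changes to throw away:

```shell
# On the node whose modifications will be discarded
# (the split-brain "victim"):
drbdadm disconnect r0
drbdadm secondary r0
drbdadm connect --discard-my-data r0

# On the node whose data survives, if it does not
# reconnect on its own:
drbdadm connect r0
```

Be aware that this discards everything written on the victim node since the split, so pick the victim carefully.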
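As for adding the third node, the drbdmanage side is a one-liner; just a sketch, the node name and address below are placeholders for your setup:

```shell
# On a node that is already part of the drbdmanage cluster (e.g. pve1):
drbdmanage add-node pve3 192.168.1.3   # hypothetical IP for pve3

# drbdmanage then prints a join command to be run on the new node
# itself; execute it there to complete the join.
```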