[DRBD-user] Failover problem with LINSTORPlugin.pm for Proxmox

Roland Kammerer roland.kammerer at linbit.com
Tue Feb 19 10:25:48 CET 2019


On Fri, Feb 08, 2019 at 10:40:43AM +0100, Wanja Schonecke wrote:
> Hi there,
> 
> I hope this is the right place for my question.
> 
> I'm running a Proxmox cluster with LINSTOR and just two real nodes.
> There is a third node (a Raspberry Pi) that only runs corosync so that
> quorum works.
> 
> Everything is working fine, except for failover when one of the two real
> nodes fails.
> The remaining node is not able to start any VM.
> Proxmox HA tries to start the VM on the remaining node and runs into
> a timeout after 60 seconds.
> I traced it back to line 479 of the LINSTOR plugin, which reads:
> wait_connect_resource($volname);
> 
> Which does:
> run_command(
>     [ 'drbdsetup', 'wait-connect-resource', $resource ],
>     errmsg => "Could not wait until replication established for ($resource)",
>     timeout => 60 # could use --wfc-timeout, but hey when we already do it proxmoxy...
> );
> 
> I'm fairly new to DRBD, but I read:
> wait-connect-resource - Wait until all connections are established.
> 
> That sounds like this command will wait for a connection to the peer
> node, which isn't available at this time as it just failed.
> As the resource is available on this node in secondary mode at that
> point, I wonder if the plugin shouldn't just switch it to primary
> instead.
> If I comment out line 479 in LINSTORPlugin.pm, Proxmox HA is able to
> start all my VMs on the remaining node when the peer node fails.
> 
> So my cluster would work just fine in all the cases I tested if I just
> leave line 479 commented out.
> But I'm not sure whether I will run into other problems if I leave it
> like that, and I have no clue what this line is for.
> So it would be good to know what this command is doing there and
> whether there is any other solution for my problem.

Hi Wanja,

Sorry, I overlooked this mail. You are right, in a two-node cluster this
might cause issues. The call was introduced to really make sure the
device is usable: stat()ing it, or opening it read-only, is in general
not good enough. Being connected is.
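
For illustration only, here is a minimal sketch of one way the wait could
be made non-fatal. It is not what the plugin does and not a committed fix.
It is written in Python rather than the plugin's Perl so that it is
self-contained. The function name wait_connect_or_degrade and the
injectable "run" parameter are hypothetical; only the drbdsetup
wait-connect-resource invocation comes from the code quoted above.

```python
import subprocess


def wait_connect_or_degrade(resource, timeout=60, run=subprocess.run):
    """Wait for DRBD connections, but report a timeout instead of failing.

    Returns True if drbdsetup reported that all connections are
    established, False if the wait timed out or the command failed.
    Hypothetical helper: the name and the 'run' hook are assumptions.
    """
    # Same command the plugin issues at the line discussed above.
    cmd = ["drbdsetup", "wait-connect-resource", resource]
    try:
        result = run(cmd, timeout=timeout, check=False)
    except subprocess.TimeoutExpired:
        # Peer unreachable (for example, the other node just failed).
        return False
    return result.returncode == 0
```

A caller could treat a False result as "degraded" rather than as an error.
It would still have to decide on its own whether the local data is good
enough to go primary; the sketch does not check that.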

I will think about it. Thanks for reporting!

Regards, rck

