[DRBD-user] linstor-proxmox-2.9.0

Roberto Resoli roberto at resolutions.it
Thu Aug 30 16:06:56 CEST 2018


I'm happy to share with the lists my latest findings on making 
linstor-controller HA on PVE, for reference:


On 27/08/2018 11:11, Roberto Resoli wrote:

> I still have to investigate the conditions under which the drbd storage 
> became unavailable to pve, causing all VMs to stop. Hopefully I will have 
> a chance to give you some more details after examining the logs.

I found after several trials that quorum on my cluster was too unstable 
to support HA.

First of all, following Yannis' steps, I moved the controller VM 
resource out of the linstor-managed ones (to avoid its definition being 
deleted at linstor-satellite startup, see 
https://lists.gt.net/drbd/users/30049#30049 ).

This fixed the "resource unavailable" issue, but once I had put the 
controller VM under HA, nodes started to reboot at random, often right 
after I had deliberately rebooted one of them.
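
A note on why unstable quorum can show up as reboots: with HA enabled, a 
Proxmox node that loses quorum is fenced by its watchdog, i.e. it resets 
itself. A minimal sketch of the majority rule behind this, assuming one 
vote per node and a hypothetical three-node cluster (neither is stated 
in this thread):

```python
def has_quorum(visible_votes: int, total_votes: int) -> bool:
    """Return True if this partition holds a strict majority of all votes."""
    return visible_votes > total_votes // 2


# Hypothetical three-node cluster with one vote per node (an assumption).
TOTAL_VOTES = 3

for visible in range(1, TOTAL_VOTES + 1):
    state = "quorate" if has_quorum(visible, TOTAL_VOTES) else "NOT quorate"
    print(f"node sees {visible}/{TOTAL_VOTES} votes: {state}")
```

In such a cluster a node that sees only its own vote is not quorate, so 
even short interruptions in cluster traffic can be enough to trigger 
fencing.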

After searching the Proxmox forum, I found that this behaviour is often 
related to a bad multicast setup. In particular, my suspicion fell on 
the switch after I read this sentence on the Proxmox wiki:

"This uncovers problems where IGMP snooping is activated on the network 
but no multicast querier is active"

This was exactly my case: my switch had IGMP snooping enabled and there 
was no querier on the network. After I disabled IGMP snooping (my 
network is so small that configuring a querier, which would be the 
proper fix, doesn't make much sense), quorum became much more stable.

I suggest that all Proxmox cluster users carefully read the 
documentation on multicast configuration and testing.
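
As a starting point, multicast between cluster nodes is commonly tested 
with omping, started on all nodes at about the same time and left 
running for several minutes so that an IGMP snooping timeout has a 
chance to show up. A small sketch that builds such a command line; the 
node names are placeholders, and the flags (-c count, -i interval in 
seconds, -q quiet) are given as I recall them, so please check them 
against the omping man page on your system:

```python
import shutil
import subprocess


def build_omping_cmd(nodes, count=600, interval=1):
    """Build an omping command probing all nodes for about count * interval seconds."""
    if len(nodes) < 2:
        raise ValueError("need at least two nodes to test multicast")
    return ["omping", "-c", str(count), "-i", str(interval), "-q", *nodes]


def run_multicast_test(nodes):
    """Run omping if it is installed; start it on every node at about the same time."""
    cmd = build_omping_cmd(nodes)
    if shutil.which("omping") is None:
        print("omping not found; would run:", " ".join(cmd))
        return None
    return subprocess.run(cmd, check=False).returncode


# Placeholder host names; replace them with your own cluster nodes.
print(" ".join(build_omping_cmd(["pve1", "pve2", "pve3"])))
```

With the defaults above the test runs for roughly ten minutes; heavy 
multicast loss that only appears after a few minutes points at the 
snooping/querier problem described above.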


> At the moment I can report only a bunch of these messages in syslog:
> Aug 25 22:49:04 pve3 pvestatd[2598]: malformed JSON string, neither tag, 
> array, object, number, string or atom, at character offset 0 (before 
> "(end of string)") at 
> /usr/share/perl5/PVE/Storage/Custom/LINSTORPlugin.pm line 321.

These are generated when Proxmox queries the linstor plugin for the 
current status and expects a response in JSON format, but the 
configured controller is not responding.
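
The wording of the Perl error (offset 0, before "(end of string)") is 
what a JSON decoder reports for an empty string: there is simply nothing 
to parse. The same situation in Python, as an analogy only; the plugin 
itself is Perl, and parse_status is a hypothetical helper, not part of 
it:

```python
import json


def parse_status(raw: str):
    """Decode a status reply, returning None when it is empty or not valid JSON."""
    try:
        return json.loads(raw)
    except json.JSONDecodeError as err:
        # An empty reply fails at position 0, like the syslog message above.
        print(f"no usable reply (error at offset {err.pos}): {err.msg}")
        return None


parse_status("")               # unreachable controller: empty reply, returns None
parse_status('{"nodes": []}')  # made-up well-formed reply, returns a dict
```

So the messages are a symptom of the controller being unreachable, not 
of a malformed reply.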

Finally: at the moment the linstor controller HA is working quite well. 
In particular, I find the ability to live-migrate it elsewhere handy 
when a node needs maintenance.

