roberto at resolutions.it
Thu Aug 30 16:06:56 CEST 2018
I'm happy to share with the list, for reference, my latest findings on
making linstor-controller HA on PVE:
On 27/08/2018 11:11, Roberto Resoli wrote:
> I still have to investigate the condition under which drbd storage became
> unavailable to pve, causing all vms to stop. Hopefully I will have a
> chance to give you some more details after examining the logs.
I found after several trials that quorum on my cluster was too unstable
to support HA.
First of all, following Yannis' steps, I migrated the controller VM
resource out of the LINSTOR-managed ones (to avoid its definition being
deleted at linstor-satellite startup, see
This fixed the resource-unavailable issue, but once I had put the
controller VM under HA, the nodes started to reboot at random, often
right after I had deliberately rebooted one of them.
After searching the Proxmox forum, I found that this behaviour is often
related to a bad multicast setup. In particular, my suspicion fell on
the switch after I read this sentence on the Proxmox wiki:
"This uncovers problems where IGMP snooping is activated on the network
but no multicast querier is active"
This was exactly my case: my switch had IGMP snooping enabled and there
was no querier on the network. After I disabled IGMP snooping
(configuring a querier would be the correct fix, but my network is so
small that it doesn't make much sense), quorum became much more stable.
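
The failure mode described above (multicast that is delivered at first
and then silently stops) can be watched for with a small probe. What
follows is only a minimal sketch in Python, not a replacement for the
testing procedure in the Proxmox documentation. The group address, the
port and all function names are hypothetical choices made for
illustration. One node runs send() while another runs listen() for
several minutes; any reported gap means multicast packets stopped
arriving for longer than max_gap seconds.

```python
import socket
import struct
import time

GROUP = "239.192.0.1"  # hypothetical test group, not taken from the cluster config
PORT = 50000           # hypothetical port, chosen to stay clear of corosync


def membership_request(group, iface="0.0.0.0"):
    """Build the ip_mreq structure expected by IP_ADD_MEMBERSHIP."""
    return struct.pack("4s4s", socket.inet_aton(group), socket.inet_aton(iface))


def find_gaps(times, max_gap):
    """Return (start, end) pairs of consecutive timestamps more than max_gap apart."""
    return [(a, b) for a, b in zip(times, times[1:]) if b - a > max_gap]


def send(group=GROUP, port=PORT, duration=600.0, interval=1.0):
    """Send one small multicast datagram every `interval` seconds."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM, socket.IPPROTO_UDP)
    sock.setsockopt(socket.IPPROTO_IP, socket.IP_MULTICAST_TTL, 1)  # stay on the local segment
    deadline = time.monotonic() + duration
    try:
        while time.monotonic() < deadline:
            sock.sendto(b"ping", (group, port))
            time.sleep(interval)
    finally:
        sock.close()


def listen(group=GROUP, port=PORT, duration=600.0, max_gap=5.0):
    """Join the group, record packet arrival times, and return the silent gaps."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM, socket.IPPROTO_UDP)
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
    sock.bind(("", port))
    sock.setsockopt(socket.IPPROTO_IP, socket.IP_ADD_MEMBERSHIP, membership_request(group))
    sock.settimeout(1.0)
    start = time.monotonic()
    deadline = start + duration
    arrivals = []
    try:
        while time.monotonic() < deadline:
            try:
                sock.recvfrom(1024)
            except socket.timeout:
                continue
            arrivals.append(time.monotonic())
    finally:
        sock.close()
    # Include the start and end of the run so leading and trailing silence is reported too.
    return find_gaps([start] + arrivals + [time.monotonic()], max_gap)
```

The run should last several minutes, because the wiki sentence quoted
above refers to a problem that only shows up after a while.
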
I suggest that all Proxmox cluster users carefully read the
documentation on multicast configuration and testing:
> At the moment I can report only a bunch of these messages in syslog:
> Aug 25 22:49:04 pve3 pvestatd: malformed JSON string, neither tag,
> array, object, number, string or atom, at character offset 0 (before
> "(end of string)") at
> /usr/share/perl5/PVE/Storage/Custom/LINSTORPlugin.pm line 321.
These are generated when Proxmox queries the LINSTOR plugin for the
current status and expects a response in JSON format, but the configured
controller is not responding.
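
To illustrate why an unreachable controller shows up as a parse error
at "character offset 0": an empty reply is not valid JSON, so the
decoder fails on the very first character. The sketch below reproduces
this in Python. It is not the plugin's code (the plugin is written in
Perl), and the function names are hypothetical.

```python
import json


def failure_offset(raw):
    """Return the character offset at which JSON decoding fails, or None if it succeeds."""
    try:
        json.loads(raw)
    except json.JSONDecodeError as err:
        return err.pos
    return None


def parse_status(raw):
    """Decode a status reply, treating an empty or invalid reply as 'no answer'."""
    if not raw or not raw.strip():
        # Nothing came back: most likely the controller is not reachable.
        return None
    try:
        return json.loads(raw)
    except json.JSONDecodeError:
        return None
```

With an empty string, failure_offset("") returns 0, matching the offset
in the syslog message, while parse_status("") returns None instead of
raising.
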
Finally, at the moment the linstor controller HA is working quite well.
In particular, I find it handy to be able to live-migrate it elsewhere
when a node needs maintenance.