[DRBD-user] linstor-proxmox-2.9.0

Roberto Resoli roberto at resolutions.it
Mon Aug 27 11:11:30 CEST 2018

On 27/08/2018 09:33, Roland Kammerer wrote:
> On Mon, Aug 27, 2018 at 09:12:12AM +0200, Roberto Resoli wrote:
>> On 24/08/2018 10:54, Roland Kammerer wrote:
>>> This feature is documented here:
>>> https://docs.linbit.com/docs/users-guide-9.0/#s-proxmox-ls-HA
>> Hello,
>> I read and tried the described procedure, but my findings were not positive.
>> In two cases the entire drbd storage froze and all VMs were stopped.
> Could you describe these scenarios in more detail? How many nodes, where
> was the controller VM started, what did you do exactly, autoboot of
> other VMs? Sorry, as always, "does not work" is not good enough to debug
> things.

Yes, of course. The scenario is:

=== general ===

3 pve nodes, each one with a 2TB disk dedicated to drbd. I recently 
migrated from drbdmanage; all 3 nodes are of COMBINED type, and each 
resource is replicated on every node.
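
To double-check that every resource really has a replica in the storage
pool of all three nodes, I just use the plain listings from the linstor
client, e.g. (output omitted, nothing special beyond the node names above):

linstor storage-pool list -p
linstor resource list -p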

=== networking ===

The drbd storage uses a dedicated network mesh, with direct connections 
between the nodes (no switch). On each node there is a bridge used as the 
drbd interface; STP is disabled, and broadcast storms are blocked with 
ebtables rules. Here is the relevant part of the /etc/network/interfaces 
file for node pve3:

auto vmbr2
iface vmbr2 inet static
        bridge_ports eth2 eth3
        bridge_stp off
        bridge_ageing 30
        bridge_fd 5
        # Only with stp on
        # pve1 and pve2 are preferred
        #bridge_bridgeprio 32768
        # Only with stp off
        pre-up ifconfig eth2 mtu 9000 && ifconfig eth3 mtu 9000
        up   ebtables -I FORWARD -i eth2 -j DROP
        up   ebtables -I FORWARD -i eth3 -j DROP
        up   ebtables -I FORWARD -o tap200i1 -j ACCEPT
        down ebtables -D FORWARD -i eth2 -j DROP
        down ebtables -D FORWARD -i eth3 -j DROP
        down ebtables -D FORWARD -o tap200i1 -j ACCEPT

The "tap200i1" interface corresponds to th controller vm, the resulting 
ebtables rules are:

# ebtables -L
Bridge table: filter

Bridge chain: INPUT, entries: 0, policy: ACCEPT

Bridge chain: FORWARD, entries: 3, policy: ACCEPT
-o tap200i1 -j ACCEPT
-i eth3 -j DROP
-i eth2 -j DROP

Bridge chain: OUTPUT, entries: 0, policy: ACCEPT

=== drbd resources ===

Resource definitions are as follows:

# linstor rd l -p
| ResourceName  | Port | State |
| vm-100-disk-1 | 7009 | ok    |
| vm-101-disk-1 | 7001 | ok    |
| vm-101-disk-2 | 7002 | ok    |
| vm-102-disk-1 | 7003 | ok    |
| vm-103-disk-1 | 7000 | ok    |
| vm-103-disk-2 | 7004 | ok    |
| vm-104-disk-1 | 7008 | ok    |
| vm-104-disk-2 | 7005 | ok    |
| vm-105-disk-1 | 7006 | ok    |
| vm-106-disk-1 | 7013 | ok    |
| vm-120-disk-1 | 7010 | ok    |
| vm-120-disk-2 | 7015 | ok    |
| vm-121-disk-1 | 7007 | ok    |
| vm-122-disk-1 | 7011 | ok    |
| vm-123-disk-1 | 7012 | ok    |
| vm-200-disk-1 | 7014 | ok    |
| vm-999-disk-1 | 7016 | ok    |
| vm-999-disk-2 | 7017 | ok    |

=== drbd nodes before controller virtualization ===

| Node  | NodeType   | IPs               | State   |
| pve1  | COMBINED   |   | Online  |
| pve2  | COMBINED   |   | Online  |
| pve3  | COMBINED   |   | Online  |

With the controller installed on pve1, this basic setup was working.

I installed a minimal Debian stretch VM with id 200, connected with 
distinct interfaces to both the drbd storage network and the regular 
network segments. I installed the LINBIT controller on it and gave it the 
hostname "drbdc". I stopped and disabled the controller on pve1 and 
migrated the service to drbdc. Then I added drbdc to the drbd cluster:

=== drbd nodes after controller virtualization ===

| Node  | NodeType   | IPs               | State   |
| drbdc | CONTROLLER | | OFFLINE |
| pve1  | COMBINED   |   | Online  |
| pve2  | COMBINED   |   | Online  |
| pve3  | COMBINED   |   | Online  |
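
For reference, the controller move itself was done roughly like this (from 
memory; the storage-network IP of drbdc is omitted, package names are the 
ones from the LINBIT repository):

# on pve1: stop and disable the old controller
systemctl disable --now linstor-controller

# on drbdc (vm 200): install and enable the controller
apt install linstor-controller linstor-client
systemctl enable --now linstor-controller

# register drbdc as a pure controller node
linstor node create drbdc <storage-ip> --node-type Controller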

After that I enabled and started the drbd service on all three nodes.

Then I changed the drbd Proxmox storage configuration as follows:

drbd: drbdthin
	content rootdir,images
	redundancy 3
	controllervm 200

Then I enabled HA for VM 200.
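
For completeness, HA for the controller VM was enabled with the standard 
Proxmox tooling, i.e. something equivalent to:

ha-manager add vm:200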


At this point everything worked quite well, but when I tried to shut down 
the node on which the controller VM resided, the resources on it did not 
come up again, not even that of the controller VM. In one case HA did its 
job and moved the controller VM to another node, but since in my setup the 
quorum is often lost temporarily even when restarting just one node, the 
controller only restarted after the quorum was re-established.

I still have to investigate the conditions under which the drbd storage 
became unavailable to pve, causing all VMs to stop. Hopefully I will have 
a chance to give you some more details after examining the logs.
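
When I do, I plan to pull the relevant time window from the journal on 
each node, roughly like:

journalctl -u linstor-satellite -u pvestatd --since "2018-08-25 22:00" --until "2018-08-25 23:30"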

At the moment I can report only a bunch of these messages in syslog:

Aug 25 22:49:04 pve3 pvestatd[2598]: malformed JSON string, neither tag, 
array, object, number, string or atom, at character offset 0 (before 
"(end of string)") at 
/usr/share/perl5/PVE/Storage/Custom/LINSTORPlugin.pm line 321.

They may have been generated when I switched to the "controllervm" configuration.

>> The main problem, if I understand correctly, is that even if the proxmox plugin
>> does not manage the controller VM anymore, the controller VM's storage depends on
>> the controller itself.
> What do you mean by that? The DRBD resource (== storage for the
> controller VM) is brought up by the drbd.service and can then be
> auto-promoted. 

I noted that the resource definition files inside /var/lib/linstor.d are 
recreated each time the node is started, so I guess the drbd service 
cannot bring them up on its own.
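
For what it's worth, this is what I look at on a node after a restart (the 
controller VM's resource is vm-200-disk-1, as listed above):

ls -l /var/lib/linstor.d/
drbdsetup status vm-200-disk-1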

> The plugin code ignores that VM. The Proxmox HA service
> should do its job and start the VM on one node. So again, where exactly
> does that chain break in your setup?

See above.

>> If it is stopped, or cannot be contacted, no corresponding resource will be
>> created, resulting in a deadlock.
> That is true. But how would that happen? The Proxmox HA feature should
> make sure that the VM is always running (that is its job, isn't it). The
> storage should be accessible, because up'ed by drbd.service.

Is the drbd service dependent on the definition files? If so, I guess that 
if the controller is unavailable, the definitions of the linstor-managed 
resources are not provided to the satellite, which then cannot create the 
definition files. It's only a guess; I'm not familiar with the 
linstor/drbd internals.
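
To verify this guess, next time I will simply check whether the satellite 
is running on the node and whether the controller still sees it, something 
like:

systemctl status linstor-satellite
linstor node list -p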

> Did Proxmox try to start another "autoboot" VM before the HA service
> kicked in to start the controller VM? That would at least explain it.

Maybe; I have another two VMs with the autoboot feature enabled, but I had 
set them to start after the controller VM, and with minutes of delay.
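
Concretely, the boot behaviour of those two VMs was set with qm, roughly 
like this (the vmid below is a placeholder, the real ids don't matter here):

# autoboot, started after the controller VM and with a delay of a few minutes
qm set <vmid> --onboot 1 --startup order=2,up=180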

> In
> that case we have to document that "autoboot" has to be disabled if one
> goes that route.

OK, I will retry and hopefully give you some more feedback.


> Thanks, rck
> _______________________________________________
> drbd-user mailing list
> drbd-user at lists.linbit.com
> http://lists.linbit.com/mailman/listinfo/drbd-user
