[DRBD-user] linstor failing to shutdown non-existent resource

Andreas Pflug pgadmin at pse-consulting.de
Wed Aug 25 15:47:32 CEST 2021


While upgrading a Proxmox cluster from 6.4 to 7, I encountered a
linstor failure that prevents me from proceeding.

First, I upgraded all nodes to the latest linstor 1.14.0, and made sure
that linstor node list showed all nodes Online.

Next, I upgraded all nodes to the latest 6.4 pve-manager (6.4-13), then
evacuated the first diskless node.

Then I upgraded the first node to pve 7.0-11. After dealing with bonding
problems (https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=990428), the
machine was a full member of the cluster again.

At this point, the new machine has
   linstor-proxmox 5.2.1
   drbd 9.1.3
   drbd-utils 9.18.2
while all others still have
   linstor-proxmox 5.1.6
   drbd 9.1.2
   drbd-utils 9.18.0

I started draining the next node. Migrating 7 VMs online with 75G total
to the fresh node worked flawlessly, but the next migration failed:

Proxmox reports
"Resource did not became ready on node 'TARGET' within reasonable time,
check Satellite for errors."

At that very moment, no error was logged on the satellite, but a minute
later an ErrorReport was created:
"external command for stopping the DRBD resource failed", generated at
deleteDrbd.

linstor resource list shows that resource as Unconnected(...)
DELETING, while drbdadm on that satellite says "not in your config". I
still see /dev/drbd/... for that resource, drbd_r and drbd_s processes,
and 15 "drbdsetup down" processes for that resource that cannot be
killed, even with kill -9.
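
One plausible reason a process ignores kill -9 is uninterruptible sleep
(state D in ps output); that is an assumption, not something confirmed
above. A minimal Python sketch for listing the "drbdsetup down"
processes together with their state could look like this. The sample
text, the PIDs and the resource name vm-100-disk-1 are made up for
illustration; on a real node the input would come from
`ps -eo pid,stat,args`.

```python
def parse_ps(ps_output, needle="drbdsetup down"):
    """Return (pid, stat, args) for every process whose command line
    contains `needle`. Expects the output of `ps -eo pid,stat,args`."""
    matches = []
    for line in ps_output.splitlines()[1:]:  # skip the header line
        parts = line.split(None, 2)          # pid, stat, rest of the line
        if len(parts) < 3:
            continue
        pid, stat, args = parts
        if needle in args:
            matches.append((int(pid), stat, args))
    return matches


def uninterruptible(procs):
    """Keep only processes whose state starts with D (uninterruptible sleep)."""
    return [p for p in procs if p[1].startswith("D")]


# Hypothetical sample input; PIDs and resource name are invented.
SAMPLE = """\
  PID STAT COMMAND
 1201 Ss   /usr/sbin/sshd -D
 4310 D    drbdsetup down vm-100-disk-1
 4388 D    drbdsetup down vm-100-disk-1
 4402 S    drbdsetup status
"""

stuck = uninterruptible(parse_ps(SAMPLE))
for pid, stat, args in stuck:
    print(pid, stat, args)
```

If every "drbdsetup down" process shows up in state D, the processes are
waiting inside the kernel and no signal will remove them; that would
only describe the symptom, not the underlying cause.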

Restarting linstor-satellite creates several more "external command for
stopping..." ErrorReports, and no further migration to that machine is
possible.

How can I resolve this situation, preferably without a reboot (because
the situation might pop up again)?

Regards,
Andreas


