[DRBD-user] [DRBD-9.0.15-0rc1] Resource "stuck" during live migration

Yannis Milios yannis.milios at gmail.com
Wed Jul 25 21:49:02 CEST 2018


Hello,

Currently testing 9.0.15-0rc1 on a 3 node PVE cluster.

Pkg versions:
------------------
cat /proc/drbd
version: 9.0.15-0rc1 (api:2/proto:86-114)
GIT-hash: fc844fc366933c60f7303694ca1dea734dcb39bb build by root at pve1, 2018-07-23 18:47:08
Transports (api:16): tcp (9.0.15-0rc1)
ii  python-drbdmanage             0.99.18-1
ii  drbdmanage-proxmox            2.2-1
ii  drbd-utils                    9.5.0-1
---------------------
Resource=vm-122-disk-1
Replica count=3
PVE nodes=pve1,pve2,pve3
Resource is active on pve2 (Primary); the other two nodes (pve1, pve3) are
Secondary.

I tried to live migrate the VM from pve2 to pve3, and the process got stuck
just before starting. Inspecting dmesg on both nodes (pve2, pve3) shows the
following crash:


pve2 (Primary) node:
https://privatebin.net/?fb5435a42b431af2#4xZpd9D5bYnB000+H3K0noZmkX20fTwGSziv5oO/Zlg=

pve3 (Secondary) node:
https://privatebin.net/?d3b1638fecb6728f#2StXbwDPT0JlFUKf686RJiR+4hl52jEmmij2UTtnSjs=

I cancelled the migration, but now it's impossible to change the state of
the DRBD resource (vm-122-disk-1) in any way (switch from Primary to
Secondary, disconnect, bring down the resource, etc.) on either pve3 or pve2.

root at pve3:~# drbdadm down vm-122-disk-1
vm-122-disk-1: State change failed: (-12) Device is held open by someone
additional info from kernel:
failed to demote
Command 'drbdsetup down vm-122-disk-1' terminated with exit code 11

I can't find any process holding the resource open on pve3 using
lsof.
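lsof only reports userspace file descriptors. A kernel-side user stacked on top of the DRBD device (for example a device-mapper/LVM mapping) also keeps it open and blocks demotion, but it is listed under /sys/block/<dev>/holders rather than by lsof. Below is a minimal diagnostic sketch that checks both places. It assumes the drbd-utils udev rules created the /dev/drbd/by-res/<resource>/<volume> symlink (otherwise pass /dev/drbdNNNN directly), and it must run as root to read other processes' fd tables. If the open count was left elevated by the kernel crash itself, this will find nothing.

```python
import os
import sys


def resolve_device(path):
    """Follow udev symlinks (e.g. /dev/drbd/by-res/<res>/0) to the real device node."""
    return os.path.realpath(path)


def find_openers(device, proc_root="/proc"):
    """Return (pid, comm) pairs whose open file descriptors point at `device`."""
    target = resolve_device(device)
    openers = []
    for entry in os.listdir(proc_root):
        if not entry.isdigit():
            continue  # only numeric /proc entries are processes
        fd_dir = os.path.join(proc_root, entry, "fd")
        try:
            fds = os.listdir(fd_dir)
        except OSError:
            continue  # process exited, or no permission to inspect it
        for fd in fds:
            try:
                link = os.readlink(os.path.join(fd_dir, fd))
            except OSError:
                continue
            if link == target:
                try:
                    with open(os.path.join(proc_root, entry, "comm")) as f:
                        comm = f.read().strip()
                except OSError:
                    comm = "?"
                openers.append((int(entry), comm))
                break  # one match per process is enough
    return openers


def find_holders(device, sys_root="/sys"):
    """Return in-kernel holders (e.g. dm-N) from sysfs; lsof cannot see these."""
    name = os.path.basename(resolve_device(device))
    holders_dir = os.path.join(sys_root, "block", name, "holders")
    try:
        return sorted(os.listdir(holders_dir))
    except OSError:
        return []


if __name__ == "__main__":
    # Assumed default path: udev by-res symlink for volume 0 of the resource.
    dev = sys.argv[1] if len(sys.argv) > 1 else "/dev/drbd/by-res/vm-122-disk-1/0"
    print("resolved:", resolve_device(dev))
    for pid, comm in find_openers(dev):
        print(f"pid {pid} ({comm}) has it open")
    for holder in find_holders(dev):
        print("kernel holder:", holder)
```

If a dm-N holder shows up, `dmsetup info` on it should identify which mapping needs to be removed before `drbdadm secondary`/`down` can succeed.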

Is there a way to recover from this without rebooting each node?

Thanks