[DRBD-user] Proxmox with Linstor: Online migration / disk move problem

Łukasz Wąsikowski lukasz at wasikowski.net
Mon Nov 22 15:28:10 CET 2021


I'm trying to migrate VM storage to Linstor SDS and am running into some 
odd problems. All nodes are running Proxmox VE 7.1:

pve-manager/7.1-5/6fe299a0 (running kernel: 5.13.19-1-pve)

Linstor storage is, for now, on one host only. Creating a new VM on 
Linstor works. Migrating a VM from another host (and another storage) 
to Linstor fails:

2021-11-22 13:06:53 starting migration of VM 116 to node 'proxmox-ve3'
2021-11-22 13:06:53 found local disk 'local-lvm:vm-116-disk-0' (in current VM config)
2021-11-22 13:06:53 starting VM 116 on remote node 'proxmox-ve3'
2021-11-22 13:07:01 volume 'local-lvm:vm-116-disk-0' is 'linstor-local:vm-116-disk-1' on the target
2021-11-22 13:07:01 start remote tunnel
2021-11-22 13:07:03 ssh tunnel ver 1
2021-11-22 13:07:03 starting storage migration
2021-11-22 13:07:03 scsi1: start migration to 
drive mirror is starting for drive-scsi1 with bandwidth limit: 51200 KB/s
drive-scsi1: Cancelling block job
drive-scsi1: Done.
2021-11-22 13:07:03 ERROR: online migrate failure - block job (mirror) error: drive-scsi1: 'mirror' has been cancelled
2021-11-22 13:07:03 aborting phase 2 - cleanup resources
2021-11-22 13:07:03 migrate_cancel
2021-11-22 13:07:08 ERROR: migration finished with problems (duration 
TASK ERROR: migration problems

The Linstor volumes are created during the migration, and there are no 
errors in its logs. I don't know why Proxmox is cancelling this job.

When I try to move a disk from NFS to Linstor (online), it fails as well:

create full clone of drive scsi0 (nfs-backup:129/vm-129-disk-0.qcow2)

Trying to create diskful resource (vm-129-disk-1) on (proxmox-ve3).
drive mirror is starting for drive-scsi0 with bandwidth limit: 51200 KB/s
drive-scsi0: Cancelling block job
drive-scsi0: Done.
TASK ERROR: storage migration failed: block job (mirror) error: drive-scsi0: 'mirror' has been cancelled

To move storage to Linstor I first have to move it to NFS (online), turn 
off the VM, and then move the VM storage offline to Linstor. The bizarre 
thing is that once I have done this, I can move this particular VM's 
storage from Linstor to NFS online and from NFS to Linstor online. I can 
also migrate the VM online, from Linstor, directly to another node and 
another storage without problems.
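
The workaround above could be sketched roughly as the commands below. This 
is only an illustration, not an exact transcript of what was run: VM ID 
129 and disk scsi0 are taken from the disk-move log, the storage names come 
from the storage configuration, and the move_via_nfs helper and the RUN 
variable are hypothetical names introduced here. RUN defaults to "echo", so 
the commands are only printed unless RUN is set to an empty string.

```shell
#!/bin/sh
# Sketch of the three-step workaround: online move to NFS, shut the VM
# down, then offline move to Linstor. Dry run by default (RUN=echo).
RUN="${RUN-echo}"

# move_via_nfs VMID DISK  (hypothetical helper, for illustration only)
move_via_nfs() {
    vmid="$1"
    disk="$2"
    # Step 1: move the disk to NFS while the VM is running.
    $RUN qm move_disk "$vmid" "$disk" nfs-backup
    # Step 2: shut the VM down so the next move happens offline.
    $RUN qm shutdown "$vmid"
    # Step 3: move the disk from NFS to Linstor while the VM is off.
    $RUN qm move_disk "$vmid" "$disk" linstor-local
    # Start the VM again from the Linstor-backed disk.
    $RUN qm start "$vmid"
}

move_via_nfs 129 scsi0
```

With the default dry run this only prints the four qm commands in order. 
Running it with RUN="" on a Proxmox node would execute them, leaving the 
old volumes behind as unused disks because --delete is not passed.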

I've set up a test cluster to reproduce this problem and couldn't: online 
migration to Linstor storage just worked. I don't know why it's not 
working on the main cluster. Any hints on how to debug it?


Storage configuration (/etc/pve/storage.cfg):

dir: local
    path /var/lib/vz
    content vztmpl,iso,backup

lvmthin: local-lvm
    thinpool data
    vgname pve
    content rootdir,images
    nodes proxmox-ve5,proxmox-ve4

drbd: linstor-local
    content images,rootdir
    resourcegroup linstor-local
    preferlocal yes
    nodes proxmox-ve3

zfspool: local-zfs
    pool rpool/data
    content images,rootdir
    nodes proxmox-ve0,proxmox-ve1,proxmox-ve2
    sparse 1

nfs: nfs-backup
    export /data/nfs
    path /mnt/pve/nfs-backup
    server backup2
    content rootdir,backup,images,iso,vztmpl
    options vers=3

Best regards,
Łukasz Wąsikowski

