[DRBD-user] Proxmox with Linstor: Online migration / disk move problem
Łukasz Wąsikowski
lukasz at wasikowski.net
Tue Nov 23 13:11:55 CET 2021
Hi Roland,
On 2021-11-23 at 08:41, Roland Kammerer wrote:
> I have heard of that once before, but never experienced it myself and so
> far no customers complained so I did not dive into it.
>
> If you can reproduce it, that would be highly appreciated. To me it
> looks like the plugin and LINSTOR basically did their job, but then
> something else happens. These are just random thoughts that might be
> complete nonsense:
>
> - maybe some size rounding error and the resulting DRBD device is just a
> tiny bit too small. If you can reproduce it, I would check sizes of
>   source/destination. If it starts writing and only fails at the end,
>   some data should have been written. So does it take some time till it
>   fails? Do
> you see that some data was written at the beginning of the DRBD block
> device that matches the source? But maybe there is already a size
> check at the beginning and it fails fast, who knows. Maybe try with a
> VM that has exactly the same size as the failing one in production.
It fails at the start of the migration. I have two identical (in terms
of size) VMs - both have a 32 GB disk. The one that was not migrated to
Linstor looks like this:
scsi0: local-lvm:vm-131-disk-0,cache=writeback,size=32G
The one that was migrated to Linstor (offline, via NFS), and which
originally had size=32G, now looks like this:
scsi0: linstor-local:vm-125-disk-1,cache=writeback,size=33555416K
33555416 KiB is 32.0009384 GiB, slightly larger than 32 GiB.
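For what it's worth, here is a minimal Python sketch of the
source/destination size check you suggest; the device paths are
hypothetical and have to be adjusted to the actual thin LV and DRBD
device on the nodes:

import fcntl
import struct

BLKGETSIZE64 = 0x80081272  # ioctl from <linux/fs.h>: device size in bytes

def device_size_bytes(path):
    """Return the size of a block device in bytes (needs root)."""
    with open(path, 'rb') as dev:
        buf = bytearray(8)
        fcntl.ioctl(dev.fileno(), BLKGETSIZE64, buf)
        return struct.unpack('Q', buf)[0]

# Hypothetical paths -- adjust to the real devices on source/target node.
src = device_size_bytes('/dev/pve/vm-131-disk-0')            # thin LV
dst = device_size_bytes('/dev/drbd/by-res/vm-125-disk-1/0')  # DRBD device

print(f"source:      {src} bytes ({src / 2**30:.7f} GiB)")
print(f"destination: {dst} bytes ({dst / 2**30:.7f} GiB)")
print(f"difference:  {dst - src} bytes")

# The config sizes above: 33555416 KiB - 32 GiB = 984 KiB extra.
assert 33555416 * 1024 - 32 * 2**30 == 984 * 1024

My guess is the extra 984 KiB comes from LINSTOR rounding the volume up
to make room for DRBD's internal metadata, but that is just an
assumption on my part.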
> - some race and the DRBD device isn't actually ready before the
> migration wants to write data. Maybe there is more time before a
> disk gets used when a VM is created vs. when existing data is written
> to a freshly created device + migration.
I don't think so (but I may be wrong). This is what "linstor volume
list" looks like when I start a live migration:
https://pastebin.com/FWqbq6uK
The resource shows "InUse" at some point during the migration.
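To narrow down the race theory, one could log the local DRBD role next
to the migration. A rough sketch, assuming a resource name of
vm-125-disk-1 (hypothetical) and the DRBD 9 "drbdadm status" output
format:

import subprocess
import time

RESOURCE = "vm-125-disk-1"  # hypothetical resource name

def local_drbd_role(resource):
    """Parse this node's role from `drbdadm status <resource>`."""
    out = subprocess.run(["drbdadm", "status", resource],
                         capture_output=True, text=True, check=True).stdout
    # First line looks like: "vm-125-disk-1 role:Secondary"; the first
    # "role:" token is the local node, later ones are the peers.
    for token in out.split():
        if token.startswith("role:"):
            return token.split(":", 1)[1]
    return "Unknown"

# Run this on the target node while the migration starts; stop with Ctrl-C.
while True:
    print(time.strftime("%H:%M:%S"), local_drbd_role(RESOURCE))
    time.sleep(0.5)

If the first write errors show up before the role ever flips to Primary,
that would point at the race; if it is already Primary when the
migration fails, the size theory looks more likely.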
> - check dmesg to see what happened on DRBD level
Here is the dmesg from a failed migration of a VM between nodes, from
thin LVM to Linstor (captured on the target node):
https://pastebin.com/rN5ZQ8vN
This VM has two disks:
scsi0: local-lvm:vm-132-disk-0,cache=writeback,size=16M
scsi1: local-lvm:vm-132-disk-1,cache=writeback,size=10244M
This VM was on Linstor at some point, because its size is 10244M
instead of the original 10G.
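The same rounding shows up here; a quick check of the numbers (assuming,
again, that the extra space was added while the volume lived on
Linstor):

# 10244 MiB vs. the original 10 GiB (= 10240 MiB):
print(10244 * 2**20 - 10 * 2**30)  # 4194304 bytes, i.e. 4 MiB extra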
--
Best regards,
Łukasz Wąsikowski