[DRBD-user] drbd9.2 resync stuck with drbd_set_in_sync: sector=<...>s size=<...> nonsense!

Nils Juergens nils at muon.de
Tue Oct 25 15:49:02 CEST 2022


Dear DRBD-users,

We are currently upgrading from Proxmox VE 6 to VE 7 on a 
three-node linstor/drbd cluster. (Only two nodes are storage+compute 
nodes / satellites; the third is the linstor-controller+quorum node.)

This is a testing environment that we built in preparation for the 
upgrade of the live cluster.

Before starting the upgrade we were on linstor 1.11, drbd-dkms 9.0.27 
and pve 6.3. Our upgrade route was to first upgrade linstor to 1.20, 
then upgrade all nodes to pve 6.4 and drbd-9.2 (9.0.27-1 -> 9.2.0-1).

After a fresh boot of all nodes we were in a good state: healthy cluster, 
pve6to7 happy, drbd in sync and all packages up-to-date.

We then upgraded the first node to pve-7, which seemed to go well, and 
rebooted it into pve-7.2-11. As we have three active VMs with three 
disk resources, this triggered a drbd resync.

Two resources came out fine:

drbd1000 Testserver1: Resync done (total 2 sec; paused 0 sec; 104448 K/sec)
drbd1002 Testserver1: Resync done (total 55 sec; paused 0 sec; 92120 K/sec)

The third resource, however, synced about 65% of the outdated data and 
then stalled (no more sync traffic, no progress in drbdmon).

The kernel message that seems to be relevant here is this:

drbd vm-101-disk-1/0 drbd1001: drbd_set_in_sync: sector=73703424s size=134479872 nonsense!
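
To put the two numbers in that message into readable units, here is a small Python sketch. It assumes the sector value counts 512-byte sectors (as the trailing "s" suggests) and that the size is given in bytes. Both are assumptions on my part, not something confirmed from the DRBD source.

```python
import re

# Kernel message as quoted above, joined onto one line.
MESSAGE = ("drbd vm-101-disk-1/0 drbd1001: drbd_set_in_sync: "
           "sector=73703424s size=134479872 nonsense!")

SECTOR_BYTES = 512  # assumption: the "s" suffix means 512-byte sectors


def decode(message):
    """Extract sector and size from the message and convert them to byte units."""
    match = re.search(r"sector=(\d+)s\s+size=(\d+)", message)
    if match is None:
        raise ValueError("no sector/size pair found in message")
    sector, size = int(match.group(1)), int(match.group(2))
    return {
        "offset_bytes": sector * SECTOR_BYTES,
        "offset_gib": sector * SECTOR_BYTES / 2**30,
        "size_bytes": size,  # assumption: size is reported in bytes
        "size_mib": size / 2**20,
        "size_is_sector_aligned": size % SECTOR_BYTES == 0,
    }


info = decode(MESSAGE)
print(f"offset: {info['offset_gib']:.2f} GiB, size: {info['size_mib']:.2f} MiB")
```

Under those assumptions the reported range starts roughly 35.14 GiB into the device and covers 128.25 MiB, and the size is a whole number of 512-byte sectors.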

More kernel logs from the pve7 node can be found here:
https://pastebin.com/aGjy7Sgp

So far we have tried rebooting the pve7 node, but it always gets 
stuck in inconsistent/synctarget (no percentage of progress shown) and 
prints the kernel error message "drbd_set_in_sync: sector=73703424s 
size=134479872 nonsense".

The linstor resources are backed by lvm_thin which is backed by a 
MegaRAID in RAID1 with SSD drives.

I don't know if this is relevant, but the VM in question has at some 
point in its lifetime been rolled back to a snapshot. (All snapshots 
were removed prior to the upgrades.)

At that time the rollback worked OK, but we noticed a huge increase in 
the allocated space on the backing device (IIRC it was equal to the 
virtual disk size). We set "discard=on" in proxmox and ran "fstrim" 
in the VM, which cut down the space usage, but it is not equal on both 
nodes:

root@Testserver3:~# linstor resource list-volumes
╭─────────────────────────────────────────────────────────────────────────────────────────────────────────────────╮
┊ Node        ┊ Resource      ┊ StoragePool ┊ VolNr ┊ MinorNr ┊ DeviceName    ┊ Allocated ┊ InUse  ┊        State ┊
╞═════════════════════════════════════════════════════════════════════════════════════════════════════════════════╡
┊ Testserver1 ┊ vm-100-disk-1 ┊ ssd_thin    ┊     0 ┊    1000 ┊ /dev/drbd1000 ┊  2.28 GiB ┊ InUse  ┊     UpToDate ┊
┊ Testserver2 ┊ vm-100-disk-1 ┊ ssd_thin    ┊     0 ┊    1000 ┊ /dev/drbd1000 ┊  2.50 GiB ┊ Unused ┊     UpToDate ┊
┊ Testserver1 ┊ vm-101-disk-1 ┊ ssd_thin    ┊     0 ┊    1001 ┊ /dev/drbd1001 ┊ 35.38 GiB ┊ InUse  ┊     UpToDate ┊
┊ Testserver2 ┊ vm-101-disk-1 ┊ ssd_thin    ┊     0 ┊    1001 ┊ /dev/drbd1001 ┊ 31.05 GiB ┊ Unused ┊ Inconsistent ┊
┊ Testserver1 ┊ vm-102-disk-1 ┊ ssd_thin    ┊     0 ┊    1002 ┊ /dev/drbd1002 ┊  7.04 GiB ┊ InUse  ┊     UpToDate ┊
┊ Testserver2 ┊ vm-102-disk-1 ┊ ssd_thin    ┊     0 ┊    1002 ┊ /dev/drbd1002 ┊  7.04 GiB ┊ Unused ┊     UpToDate ┊
╰─────────────────────────────────────────────────────────────────────────────────────────────────────────────────╯
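
To make the per-node difference in that listing easier to see, here is a rough Python sketch that takes the data rows (copied from the output above, with each wrapped row joined onto one line) and reports how far apart the "Allocated" values are for each resource. The function name and the nine-column layout are my own assumptions for illustration; this is not a linstor API.

```python
from collections import defaultdict

# Data rows copied from the listing above, one volume per line.
TABLE = """\
┊ Testserver1 ┊ vm-100-disk-1 ┊ ssd_thin ┊ 0 ┊ 1000 ┊ /dev/drbd1000 ┊  2.28 GiB ┊ InUse  ┊ UpToDate ┊
┊ Testserver2 ┊ vm-100-disk-1 ┊ ssd_thin ┊ 0 ┊ 1000 ┊ /dev/drbd1000 ┊  2.50 GiB ┊ Unused ┊ UpToDate ┊
┊ Testserver1 ┊ vm-101-disk-1 ┊ ssd_thin ┊ 0 ┊ 1001 ┊ /dev/drbd1001 ┊ 35.38 GiB ┊ InUse  ┊ UpToDate ┊
┊ Testserver2 ┊ vm-101-disk-1 ┊ ssd_thin ┊ 0 ┊ 1001 ┊ /dev/drbd1001 ┊ 31.05 GiB ┊ Unused ┊ Inconsistent ┊
┊ Testserver1 ┊ vm-102-disk-1 ┊ ssd_thin ┊ 0 ┊ 1002 ┊ /dev/drbd1002 ┊  7.04 GiB ┊ InUse  ┊ UpToDate ┊
┊ Testserver2 ┊ vm-102-disk-1 ┊ ssd_thin ┊ 0 ┊ 1002 ┊ /dev/drbd1002 ┊  7.04 GiB ┊ Unused ┊ UpToDate ┊
"""


def allocation_spread(table):
    """Return {resource: max - min allocated GiB across nodes}."""
    per_resource = defaultdict(dict)
    for line in table.splitlines():
        # Split on the column separator and drop the empty outer fields.
        cells = [cell.strip() for cell in line.split("┊")[1:-1]]
        if len(cells) != 9:
            continue  # skip borders or anything that is not a data row
        node, resource, allocated = cells[0], cells[1], cells[6]
        value, unit = allocated.split()
        if unit != "GiB":
            raise ValueError(f"unexpected unit {unit!r}")
        per_resource[resource][node] = float(value)
    return {
        resource: round(max(nodes.values()) - min(nodes.values()), 2)
        for resource, nodes in per_resource.items()
    }


spread = allocation_spread(TABLE)
for resource, gib in spread.items():
    print(f"{resource}: {gib:.2f} GiB difference between nodes")
```

The largest gap is on vm-101-disk-1, which is also the resource that has not finished resyncing, so part of that gap may simply be data that has not arrived yet.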

The linstor-created resource looks like this:
https://pastebin.com/syLADBdC

Relevant version numbers:

drbd-dkms: 9.2.0-1
linstor-(controller|satellite): 1.20.0-1
linstor-proxmox: 6.1.0-1
proxmox-ve versions: 6.4-1 (two nodes) and 7.2-1 (one node)
kernel: 5.4.203-1-pve (two nodes) and 5.15.64-1-pve (one node)

Any insight on this would be most welcome. I'll provide more details if 
you feel something is missing.

Thanks and kind regards,
Nils

