[DRBD-user] Diskless resource stops working alternately if attached to a qemu VM

Chirag Anand anand.chirag at gmail.com
Mon Mar 1 09:38:14 CET 2021


Hello list, we have encountered strange behaviour, possibly a bug, in DRBD
while using a resource with qemu.

When we create a persistent raw disk on OpenNebula using the Linstor addon
and attach it to a running VM, everything works fine until the disk is
detached from the VM and attached again. On the second attach, the diskless
resource stops working even though it still exists: for example,
`ls /dev/drbd/by-res/OpenNebula-Image-575/0` works, but `dd` doesn't copy
any bytes:

# dd if=/dev/drbd/by-res/OpenNebula-Image-575/0 of=/dev/null bs=1M count=10
0+0 records in
0+0 records out
0 bytes (0 B) copied, 0.000108231 s, 0.0 kB/s
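A `dd` run that reports `0+0 records in` without any error means the first
read on the device returned end-of-file, which for a block device usually
means the kernel currently reports its size as 0 bytes. A minimal sketch for
checking that on VMHost, assuming util-linux `blockdev` is installed there;
the script and the `classify_size` function are hypothetical:

```bash
#!/usr/bin/env bash
# Hypothetical helper: print the size the kernel reports for a DRBD device.
# Run on VMHost, e.g.: ./drbd-size.sh /dev/drbd/by-res/test-disk2/0
set -u

classify_size() {
  # A size of 0 would explain dd reading "0+0 records" with no error
  if [ "$1" -eq 0 ]; then
    echo "zero-size"
  else
    echo "ok"
  fi
}

if [ "$#" -ge 1 ]; then
  # blockdev --getsize64 prints the device size in bytes
  size=$(blockdev --getsize64 "$1")
  echo "$1: $size bytes ($(classify_size "$size"))"
fi
```

Comparing this output, together with `drbdadm status` on VMHost, between a
working and a failing attempt may help narrow down where the size is lost.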

Strangely, the problem occurs on alternate attempts: the first time you
attach a raw disk to a running VM, things work; the second time (detach and
attach again) they don't; the third time they work again, and so on.

Setup:
1. A node running OpenNebula and the Linstor server.
2. 3 x Linstor satellite nodes (diskful)
3. 1 x VM node (diskless only), referred to as 'VMHost' below
4. Place count: 3
5. DRBD runs on top of an LVM thin pool on the Linstor satellites.

Steps to reproduce are given below:

**Attempt 1 (works)**
1. `linstor rd c test-disk2`
2. `linstor vd c test-disk2 10G`
3. `linstor r c linstor-drbd1 linstor-drbd2 linstor-drbd3 test-disk2 --storage-pool=data`
4. `linstor r c --drbd-diskless VMHost test-disk2 -s DfltDisklessStorPool`
5. Attach the disk to qemu VM: `ssh VMHost 'virsh attach-disk 15 /dev/drbd/by-res/test-disk2/0 vdb'`
6. Test the resource:
    # ssh VMHost 'dd if=/dev/drbd/by-res/test-disk2/0 of=/dev/null bs=1M count=10'
    10+0 records in
    10+0 records out
    10485760 bytes (10 MB) copied, 0.317778 s, 33.0 MB/s

7. Detach from VM: `ssh VMHost 'virsh detach-disk 15 vdb'`
8. Delete diskless resource: `linstor r d VMHost test-disk2`

**Attempt 2 (doesn't work)**
1. Create the diskless resource: `linstor r c --drbd-diskless VMHost test-disk2 -s DfltDisklessStorPool`
2. There is no need to attach it to qemu; the diskless resource simply
doesn't work.
3. Test the resource:
    # ssh VMHost 'dd if=/dev/drbd/by-res/test-disk2/0 of=/dev/null bs=1M count=10'
    0+0 records in
    0+0 records out
    0 bytes (0 B) copied, 0.000108231 s, 0.0 kB/s

4. Delete diskless resource: `linstor r d VMHost test-disk2`

**Attempt 3 (works)**
1. `linstor r c --drbd-diskless VMHost test-disk2 -s DfltDisklessStorPool`
2. There is no need to attach it to qemu; the resource will work this time.
3. Test the resource:
    # ssh VMHost 'dd if=/dev/drbd/by-res/test-disk2/0 of=/dev/null bs=1M count=10'
    10+0 records in
    10+0 records out
    10485760 bytes (10 MB) copied, 0.0338463 s, 310 MB/s
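The create/test/delete cycle of Attempts 2 and 3 can be scripted to show the
alternating pattern over several rounds. A minimal sketch, assuming Attempt 1
(attach to and detach from qemu) has already been done, that `linstor` is on
the PATH and that `ssh VMHost` works without a password; the script name,
the `bytes_copied` and `run_round` functions and the short `sleep` are
hypothetical:

```bash
#!/usr/bin/env bash
# Hypothetical repro loop: create the diskless resource, read 10 MiB with
# dd, delete the resource, and print the bytes read in each round.
# Usage: ./repro.sh test-disk2 [VMHost] [rounds]
set -u

bytes_copied() {
  # Extract the byte count from dd's summary line on stdin, e.g.
  # "10485760 bytes (10 MB) copied, ..." -> 10485760
  sed -n 's/^\([0-9][0-9]*\) bytes.*/\1/p' | tail -n 1
}

run_round() {
  local res=$1 host=$2
  linstor r c --drbd-diskless "$host" "$res" -s DfltDisklessStorPool >/dev/null
  # Assumption: give udev a moment to create the /dev/drbd/by-res link
  sleep 2
  # dd prints its summary on stderr, so merge it into the pipe
  ssh "$host" "dd if=/dev/drbd/by-res/$res/0 of=/dev/null bs=1M count=10" 2>&1 \
    | bytes_copied
  linstor r d "$host" "$res" >/dev/null
}

if [ "$#" -ge 1 ]; then
  res=$1
  host=${2:-VMHost}
  rounds=${3:-4}
  for i in $(seq 1 "$rounds"); do
    echo "round $i: $(run_round "$res" "$host") bytes read"
  done
fi
```

Round 1 of the script corresponds to Attempt 2 above. If the bug reproduces,
the rounds would be expected to alternate between 0 and 10485760 bytes read.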

We have narrowed the problem down to the interaction between DRBD and qemu.
If the resource was attached to qemu at some point, it shows this behaviour;
otherwise it works fine, and every subsequent deletion and re-creation of
the diskless resource results in bytes being read from and written to the
resource.

Manually cycling the connection of the resource on VMHost with
`drbdadm disconnect` and `drbdadm connect` also makes it work in the
attempts that would otherwise fail (the even-numbered ones), so, at least
for reproducing this bug, there is no need to delete and re-create the
diskless resource through Linstor.
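The connection-cycling workaround can be sketched as below. This assumes the
intended order is `disconnect` followed by `connect`, and that a fixed pause
is long enough for the peers to reconnect; the script and the
`reconnect_cmd` function are hypothetical:

```bash
#!/usr/bin/env bash
# Hypothetical sketch: cycle the DRBD connection of the diskless resource
# on VMHost instead of deleting and re-creating it via linstor.
# Usage: ./reconnect.sh test-disk2 [VMHost]
set -u

reconnect_cmd() {
  # Build the command line to run on the diskless node
  printf 'drbdadm disconnect %s && drbdadm connect %s' "$1" "$1"
}

if [ "$#" -ge 1 ]; then
  res=$1
  host=${2:-VMHost}
  ssh "$host" "$(reconnect_cmd "$res")"
  # Assumption: a short pause is enough for the peers to reconnect
  sleep 5
  # Re-test the resource the same way as in the attempts above
  ssh "$host" "dd if=/dev/drbd/by-res/$res/0 of=/dev/null bs=1M count=10"
fi
```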

We are able to reproduce the bug on two different setups with different
hardware but the same software versions.

Version numbers being used:

# drbdadm --version
DRBDADM_BUILDTAG=GIT-hash:\ fa9b9d3823b6e1792919e711fcf6164cac629290\
build\ by\ mockbuild@\,\ 2021-01-09\ 17:01:51
DRBDADM_API_VERSION=2
DRBD_KERNEL_VERSION_CODE=0x090019
DRBD_KERNEL_VERSION=9.0.25
DRBDADM_VERSION_CODE=0x090f01
DRBDADM_VERSION=9.15.1

# /usr/libexec/qemu-kvm --version
QEMU emulator version 2.12.0 (qemu-kvm-ev-2.12.0-44.1.el7_8.1)
Copyright (c) 2003-2017 Fabrice Bellard and the QEMU Project developers

# virsh --version
5.0.0

I hope this describes the issue in enough detail. Please let me know if I can
provide additional information (strace output, logs, etc.); I will be happy to.

PS: The bug is reproducible even if you set the resource as primary on the
diskless node before attaching it to qemu or reading any bytes from it.

Thank you,
Chirag Anand
http://atvariance.in/chiraganand