[DRBD-user] Putting resource in secondary role fails under heavy load

Vladislav Bogdanov bubble at hoster-ok.com
Wed Jul 22 12:21:17 CEST 2015



Hi,

It could be LVM (actually DM) via udev rules, which asynchronously call 
blkid and other utilities that open block devices. That is done for 
all devices, independently of the 'filter' settings in lvm.conf. I saw 
the same when calling 'drbdadm secondary' right after 'drbdadm primary' 
in some circumstances (high load is one of them).

The best way I found to overcome that is to put the demotion attempts 
into a retry loop.
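
A minimal sketch of such a retry loop, assuming the resource is demoted 
with 'drbdadm secondary <resource>' and that a non-zero exit code means 
the device was still held open. The function name, the attempt count and 
the delay are illustrative assumptions, not values from this thread; 
tune them for your load.

```python
import subprocess
import time


def demote_with_retry(resource, attempts=30, delay=1.0,
                      run=subprocess.run, sleep=time.sleep):
    """Retry 'drbdadm secondary <resource>' until it succeeds.

    Returns the number of the attempt that succeeded and raises
    RuntimeError if every attempt fails. 'run' and 'sleep' are
    injectable (hypothetical parameters) so the loop can be tried
    out without a real DRBD setup.
    """
    cmd = ["drbdadm", "secondary", resource]
    for attempt in range(1, attempts + 1):
        result = run(cmd)
        if result.returncode == 0:
            return attempt
        # Demotion failed, e.g. because a short-lived opener such as
        # blkid still holds the device; wait before trying again.
        if attempt < attempts:
            sleep(delay)
    raise RuntimeError(
        "could not demote %s after %d attempts" % (resource, attempts)
    )
```

Calling demote_with_retry("r0") (with "r0" standing in for your resource 
name) would then replace the single demotion call in step 4 of the 
migration flow.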

Best,
Vladislav

22.07.2015 12:29, Bram Klein Gunnewiek wrote:
> We are using DRBD to provide HA storage for our QEMU instances. We use
> DRBD on top of logical volumes, the QEMU instances use the /dev/drbdX
> devices directly as hard disks. We are running into a strange problem on
> live migrations. Our live migration flow looks like this:
>
> 1) Put drbd resource in dual primary
> 2) Start QEMU live migration
> 3) If migration is done, stop QEMU instance on source node
> 4) Put drbd resource on source node in secondary role (drbdsetup
> secondary /dev/drbd2)
>
> Under normal conditions this works flawlessly. However, when multiple
> QEMU instances running on the source node cause heavy I/O load, the
> last step fails with the error message "/dev/drbd2: State
> change failed: (-12) Device is held open by someone".
>
> We can't figure out what process is holding the device open. The QEMU
> process that was previously using the device is shut down and not
> running any more. We don't know of any other processes holding it open,
> so we suspect that this is something in DRBD itself. This only happens
> under heavy load. We retry the command until it eventually succeeds, but
> this can take a couple of minutes (depending on the load of the source
> node). If we shut down the QEMU instance that causes the heavy load, the
> command succeeds right away. 'lsof' doesn't show any pointers either:
>
> drbd2_sub 4725            root  cwd       DIR 8,2        4096          2 /
> drbd2_sub 4725            root  rtd       DIR 8,2        4096          2 /
> drbd2_sub 4725            root  txt   unknown                /proc/4725/exe
>
> We tried this with the default drbd module shipped with ubuntu 14.04
> (version: 8.4.3 (api:1/proto:86-101), srcversion:
> 6551AD2C98F533733BE558C) and the 8.4.6 release from git (version: 8.4.6
> (api:1/proto:86-101), GIT-hash:
> 833d830e0152d1e457fa7856e71e11248ccf3f70), both versions have the problem.
>
> Is this something we can fix ourselves? Is this considered a bug, or is
> it expected behaviour and something that won't change?
>



