[DRBD-user] Putting resource in secondary role fails under heavy load

Bram Klein Gunnewiek bram at shockmedia.nl
Wed Jul 22 11:29:07 CEST 2015


We are using DRBD to provide HA storage for our QEMU instances. DRBD 
runs on top of logical volumes, and the QEMU instances use the 
/dev/drbdX devices directly as hard disks. We are running into a strange 
problem during live migrations. Our live migration flow looks like this:

1) Put drbd resource in dual primary
2) Start QEMU live migration
3) Once the migration is done, stop the QEMU instance on the source node
4) Put the DRBD resource on the source node in secondary role (drbdsetup 
secondary /dev/drbd2)

Under normal conditions this works flawlessly. However, when multiple 
QEMU instances on the source node are causing heavy I/O load, the last 
step fails with the error message "/dev/drbd2: State change failed: 
(-12) Device is held open by someone".
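
One way to look for user-space openers at the moment this error appears 
is to scan /proc for file descriptors that point at the device. This is 
only a minimal sketch, assuming a Linux /proc and enough privileges to 
read other processes' fd directories. The function name list_openers is 
made up for illustration, and an opener inside the kernel has no file 
descriptor, so it will not show up this way.

```shell
#!/bin/sh
# list_openers PATH
# Print the PIDs of all processes that hold PATH open via a file descriptor.
# Hypothetical helper: it only sees user-space openers visible in /proc.
list_openers() {
    # Resolve symlinks so the comparison uses the real device path.
    target=$(readlink -f "$1") || return 1
    for fd in /proc/[0-9]*/fd/*; do
        # Each entry in /proc/PID/fd is a symlink to the opened file.
        if [ "$(readlink "$fd" 2>/dev/null)" = "$target" ]; then
            pid=${fd#/proc/}      # "PID/fd/N"
            echo "${pid%%/*}"     # keep only "PID"
        fi
    done | sort -u
}

# Example (run as root right after the failed state change):
#   list_openers /dev/drbd2
```

An empty result while the state change still fails would suggest that 
the opener is short-lived or not a regular user-space process.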

We can't figure out which process is holding the device open. The QEMU 
process that was previously using the device has been shut down and is 
no longer running. As far as we know, no other process has the device 
open, so we suspect that this is something in DRBD itself. It only 
happens under heavy load. We retry the command until it eventually 
succeeds, but this can take a couple of minutes (depending on the load 
on the source node). If we shut down the QEMU instance that causes the 
heavy load, the command succeeds right away. 'lsof' doesn't give any 
pointers either:

drbd2_sub 4725            root  cwd       DIR 8,2        4096          2 /
drbd2_sub 4725            root  rtd       DIR 8,2        4096          2 /
drbd2_sub 4725            root  txt   unknown                /proc/4725/exe
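
The retry mentioned above can be sketched roughly as follows. This is an 
illustration and not our actual tooling: the function name retry_until, 
the 300 second limit and the 2 second interval are all assumptions. Only 
the drbdsetup command in the usage comment is the one we really run.

```shell
#!/bin/sh
# retry_until MAX_SECONDS INTERVAL CMD [ARGS...]
# Run CMD repeatedly until it succeeds or MAX_SECONDS have elapsed.
# Returns 0 on success and 1 if the deadline passed without success.
retry_until() {
    max=$1
    interval=$2
    shift 2
    start=$(date +%s)
    while ! "$@"; do
        now=$(date +%s)
        # Give up once the deadline is reached instead of looping forever.
        if [ $((now - start)) -ge "$max" ]; then
            return 1
        fi
        sleep "$interval"
    done
    return 0
}

# Example: try for up to 5 minutes, pausing 2 seconds between attempts.
#   retry_until 300 2 drbdsetup secondary /dev/drbd2
```

A fixed deadline at least turns "a couple of minutes" into a bounded 
wait, after which the migration tooling can report the failure.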

We tried this with the default DRBD module shipped with Ubuntu 14.04 
(version: 8.4.3 (api:1/proto:86-101), srcversion: 
6551AD2C98F533733BE558C) and with the 8.4.6 release from git (version: 
8.4.6 (api:1/proto:86-101), GIT-hash: 
833d830e0152d1e457fa7856e71e11248ccf3f70). Both versions show the problem.

Is this something we can fix ourselves? Is this considered a bug, or is 
it expected behaviour that won't change?

-- 
Met vriendelijke groet / Kind regards,
Bram Klein Gunnewiek | Shock Media B.V.

Tel: +31 (0)546 - 714360
Fax: +31 (0)546 - 714361
Web: https://www.shockmedia.nl/



