[DRBD-user] Testing local-io-error handler -- blkid hangs and ties up drbd device

Lars Ellenberg lars.ellenberg at linbit.com
Thu Apr 12 14:36:24 CEST 2012


On Thu, Apr 12, 2012 at 08:21:38AM -0400, Chris Dickson wrote:
> Hello,
> 
> Someone please chime in if my method of simulating io-errors is too
> complicated and there is an easier way.
> 
> I've been trying to simulate IO errors with drbd 8.4.1 by creating a device
> mapper with dmsetup. I create the device mapper from a 1GB LVM volume that
> was initialized with internal meta data and synced:
> 
> dmsetup create bad_disk << EOF
> 0 1000 linear /dev/vg0/vol575 0
> 1000 1 error
> 1001 2096151 linear /dev/vg0/vol575 1001
> EOF
> 
> I can now successfully start the drbd device backed by the bad_disk device
> mapper and it shows Connected and UpToDate. When I change its role to
> Primary, dmesg shows my IO error that I set at block 1000 and my custom
> local-io-error script is called successfully. The drbd device is also set
> to a disk state of Diskless.
> 
> It's at this moment that all other operations attempted on the device will
> hang. Somewhere during or shortly after the io-error handler something ties
> up the device and nothing I can do can free it... the first dmesg problem I
> can see is this:
> 
> INFO: task blkid:1945 blocked for more than 120 seconds.
> 
> It might not be drbd, LVM is involved and also my manually created device
> mapper on top of it. I wanted to throw this out there if anyone has tried
> the same thing and encountered the error or if I'm doing something overtly
> wrong.

What is your io error handler trying to do?

It is run synchronously from a drbd kernel thread, and that same thread
may be needed to process further IO requests on that drbd device.

If you trigger synchronous IO on that drbd from the handler,
you deadlock on yourself.

You may not even be aware of it: if you run any lvm commands,
they will scan all devices not excluded by the lvm filter,
and by doing so may trigger IO on the drbd device.

If you do not expect to use DRBD as a PV,
please reject drbd devices in the filter in your lvm.conf.
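
As a rough illustration, a filter along these lines in the devices
section of lvm.conf would make lvm tools skip drbd devices. This is a
sketch only: the pattern assumes the usual /dev/drbd* device naming, and
any filter entries you already have need to be merged with it, not
replaced by it.

```
devices {
    # Reject anything under /dev/drbd* (assumed naming), accept the rest.
    # The first matching pattern wins, so the reject entry must come first.
    filter = [ "r|^/dev/drbd.*|", "a|.*|" ]
}
```

With the reject entry in place, lvm scans started from the handler (or
from anywhere else) should no longer open the drbd device.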

If that does not help already, and you try to do anything "interesting"
from that handler, consider backgrounding it.
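
A minimal sketch of what backgrounding could look like, assuming a
plain sh handler script. The log path HANDLER_LOG and the function name
followup are made up for illustration, and DRBD_RESOURCE is assumed to
be set in the handler's environment by drbdadm (the sketch falls back
to "unknown" if it is not).

```shell
#!/bin/sh
# Sketch of a local-io-error handler that returns immediately.
# HANDLER_LOG is a hypothetical path chosen for this example.
HANDLER_LOG="${HANDLER_LOG:-/tmp/drbd-local-io-error.log}"

followup() {
    # Put anything slow or "interesting" here (notifications, lvm calls, ...).
    # DRBD_RESOURCE is assumed to come from drbdadm's environment.
    echo "$(date) local IO error on ${DRBD_RESOURCE:-unknown}" >> "$HANDLER_LOG"
}

# Run the follow-up work in a detached subshell with stdio redirected,
# so the handler itself finishes at once and the drbd kernel thread
# that invoked it is not kept waiting.
( followup ) </dev/null >/dev/null 2>&1 &
```

The handler process exits as soon as the subshell is started; whatever
the follow-up work does then no longer blocks the thread that called it.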

Better yet, tell us what you actually want to achieve,
and we may be able to suggest a solution.

-- 
: Lars Ellenberg
: LINBIT | Your Way to High Availability
: DRBD/HA support and consulting http://www.linbit.com


