[DRBD-user] DRBD 8.2.5 on LVM device: "local disk flush failed with status -5"

Anders Henke anders.henke at 1und1.de
Fri Feb 22 13:20:45 CET 2008

Note: "permalinks" may not be as permanent as we would like,
direct links of old sources may well be a few messages off.


Hi,

I'm using kernel 2.6.24.2 with DRBD 8.2.5
(9faf052fdae5ef0c61b4d03890e2d2eab550610c) on top of an LVM2 logical volume (LV):

  device    /dev/drbd0;
  disk      /dev/vg/drbd;
  meta-disk internal;

... which leads to "flooding" the kernel logs on the secondary:

kernel: [  167.434201] drbd0: local disk flush failed with status -5
kernel: [  167.964981] drbd0: local disk flush failed with status -5
kernel: [  168.250102] drbd0: local disk flush failed with status -5
kernel: [  168.345999] drbd0: local disk flush failed with status -5
kernel: [  168.522441] drbd0: local disk flush failed with status -5
kernel: [  168.666767] drbd0: local disk flush failed with status -5
kernel: [  168.731338] drbd0: local disk flush failed with status -5

After moving the lower device from /dev/vg/drbd to /dev/sda5,
the messages disappeared completely.

DRBD flushes its metadata with a write barrier, but an LVM LV doesn't
support write barriers, which produces this message. However, DRBD
does correctly check for EOPNOTSUPP to detect this situation:

drbd-8.2.5/drbd/drbd_receiver.c:
[...]
        /* BarrierAck may imply that the corresponding extent is dropped from
         * the activity log, which means it would not be resynced in case the
         * Primary crashes now.
         * Just waiting for write_completion is not enough,
         * better flush to make sure it is all on stable storage. */
        if (!test_bit(LL_DEV_NO_FLUSH, &mdev->flags) && inc_local(mdev)) {
                rv = blkdev_issue_flush(mdev->bc->backing_bdev, NULL);
                dec_local(mdev);
                if (rv == -EOPNOTSUPP) /* don't try again */
                        set_bit(LL_DEV_NO_FLUSH, &mdev->flags);
                if (rv)
                        ERR("local disk flush failed with status %d\n", rv);
        }
[...]

XFS users know non-barrier-capable devices from the message
"Disabling barriers, not supported by the underlying device"
that appears when mounting such a device:

[ 2724.092649] Filesystem "drbd0": Disabling barriers, not supported by the underlying device

However, for XFS this message usually occurs only once (during mount), and if
you don't care about write barriers, you can also disable write barrier
support entirely by supplying the mount option "nobarrier" to XFS (which is
also recommended e.g. for RAID devices with battery-backed write caches).

In my setup, the underlying device (/dev/sda) is driven by a 3w-9xxx RAID
controller which does support write barriers (at least XFS uses write
barriers on my box and doesn't complain when mounting a filesystem).

Right now, device-mapper devices (like dmraid or LVM2) and multipath-enabled
devices don't support write barriers. MD devices only support write barriers
when RAID1 (mirroring) is used and all underlying devices have write barrier
support as well. Some other drivers, like ide-disk, don't seem to support
write barriers either.

According to dm_request() from linux-2.6.24.2/drivers/md/dm.c, barrier
requests aren't forwarded at all and should complete with -EOPNOTSUPP
(which DRBD would catch and consequently set LL_DEV_NO_FLUSH):

static int dm_request(struct request_queue *q, struct bio *bio)
{
        int r = -EIO;
        int rw = bio_data_dir(bio);
        struct mapped_device *md = q->queuedata;

        /*
         * There is no use in forwarding any barrier request since we can't
         * guarantee it is (or can be) handled by the targets correctly.
         */
        if (unlikely(bio_barrier(bio))) {
                bio_endio(bio, -EOPNOTSUPP);
                return 0;
        }
[...]
(Note the -EIO default value: -EIO is -5, i.e. the 'status -5' from DRBD's message.)

I'm still a little puzzled why XFS does notice that the LVM device isn't
capable of barriers (by checking for QUEUE_ORDERED_NONE queueing), while DRBD
(which correctly checks for EOPNOTSUPP after blkdev_issue_flush()) doesn't
detect this as well.
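
One possible explanation (this is only my reading of the 2.6.24 block layer
and may be wrong): blkdev_issue_flush() only looks at the BIO_UPTODATE flag
of the completed barrier bio and maps any failure to -EIO, so the
-EOPNOTSUPP that dm_request() passes to bio_endio() never reaches DRBD.
Roughly like this (a sketch of the end of blkdev_issue_flush(), not a
verbatim quote):

        /* ... empty barrier bio has been submitted and has completed ... */
        ret = 0;
        if (!bio_flagged(bio, BIO_UPTODATE))
                ret = -EIO;     /* any completion error -- including the
                                 * -EOPNOTSUPP set by dm_request() -- is
                                 * reported to the caller as -EIO */
        bio_put(bio);
        return ret;

If that's correct, then checking the return value for -EOPNOTSUPP in
drbd_receiver.c can never trigger on a device-mapper lower device, and the
check would have to happen earlier.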

My suggestion is to add a barrier-support check to DRBD similar to the one
the XFS folks use and to set LL_DEV_NO_FLUSH (and maybe also MD_NO_BARRIER)
accordingly; or to find out why DRBD doesn't catch an EOPNOTSUPP that would
disable the barrier flushes.
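
To illustrate the first option, here is a rough sketch (the helper name and
its placement are my invention, drbd_check_barrier_support() does not exist
in DRBD; the queue check mimics what XFS does at mount time on 2.6.24). The
idea is to look at the backing device's request queue once at attach time
instead of waiting for the first failed flush:

        /* Hypothetical helper, sketch only -- not existing DRBD code.
         * Called once after attaching the lower-level device; pre-sets the
         * "no flush" / "no barrier" flags if the backing queue advertises
         * no ordering support (as device-mapper queues currently do). */
        static void drbd_check_barrier_support(struct drbd_conf *mdev)
        {
                struct request_queue *q = bdev_get_queue(mdev->bc->backing_bdev);

                if (!q || q->ordered == QUEUE_ORDERED_NONE) {
                        set_bit(LL_DEV_NO_FLUSH, &mdev->flags);
                        set_bit(MD_NO_BARRIER, &mdev->flags);
                        /* optionally log this once, the way XFS does at mount */
                }
        }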



Anders
-- 
1&1 Internet AG              System Architect
Brauerstrasse 48             v://49.721.91374.50
D-76135 Karlsruhe            f://49.721.91374.225

Amtsgericht Montabaur HRB 6484
Vorstand: Henning Ahlert, Ralph Dommermuth, Matthias Ehrlich, Andreas Gauger,
Thomas Gottschlich, Matthias Greve, Robert Hoffmann, Markus Huhn, Achim Weiss
Aufsichtsratsvorsitzender: Michael Scheeren


