[DRBD-user] recovering from "Local IO failed. Detaching..."

Lars Ellenberg lars.ellenberg at linbit.com
Thu Sep 10 19:11:02 CEST 2009



On Thu, Sep 10, 2009 at 06:47:24PM +0200, Lars Ellenberg wrote:
> then something is wrong with your hardware, or your setup.
> or your kernel.
> or, of course, maybe only something is wrong with drbd (in your setup on
> your hardware ;-])
> 
> care to try
>         no-disk-flushes;
>         no-md-flushes;
>         no-disk-barrier;
> ?

hmmm.
"interesting"

I think adding only "no-md-flushes" should already help.
if it does,
please switch to drbd-8.3.3rc2,
apply the patch below,
and __leave out__ the no-md-flushes option again,
so we can confirm that the fallback and retry without barriers
finally works as expected.
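
for reference, a minimal sketch of where that option would go in
drbd.conf, assuming 8.3-style syntax. the resource name "r0" is only a
placeholder, and the rest of the resource definition is left out:

```
resource r0 {
  disk {
    no-md-flushes;
  }
}
```

for the second test run with the patch applied, remove that line again.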

thanks.

usually, non-working barriers are detected early by some other means,
but if the timing on your box is unlucky, drbd may end up in this
function (_drbd_md_sync_page_io, see the patch below) before the other
code path has determined that barriers don't work.
and the fallback error path in there apparently has been broken for a
long time :(

diff --git a/drbd/drbd_actlog.c b/drbd/drbd_actlog.c
index 708b689..cb2aa43 100644
--- a/drbd/drbd_actlog.c
+++ b/drbd/drbd_actlog.c
@@ -80,8 +80,6 @@ STATIC int _drbd_md_sync_page_io(struct drbd_conf *mdev,
 	int ok;
 
 	md_io.mdev = mdev;
-	init_completion(&md_io.event);
-	md_io.error = 0;
 
 	if (rw == WRITE && !test_bit(MD_NO_BARRIER, &mdev->flags))
 		rw |= (1<<BIO_RW_BARRIER);
@@ -107,6 +105,10 @@ STATIC int _drbd_md_sync_page_io(struct drbd_conf *mdev,
 
 	trace_drbd_bio(mdev, "Md", bio, 0, NULL);
 
+	/* on retry, this is re-init */
+	init_completion(&md_io.event);
+	md_io.error = 0;
+
 	if (FAULT_ACTIVE(mdev, (rw & WRITE) ? DRBD_FAULT_MD_WR : DRBD_FAULT_MD_RD))
 		bio_endio(bio, -EIO);
 	else
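
to illustrate why the placement of the re-init matters, here is a toy
user-space model of a "retry without barrier" loop. it is only a sketch:
fake_completion, fake_md_io, fake_submit and sync_page_io are made-up
stand-ins, not drbd or kernel code, and it assumes that a failed attempt
leaves state behind which a later successful attempt does not overwrite.
the real failure mechanism in the kernel may well differ in detail.

```c
#include <assert.h>
#include <stdbool.h>

/* hypothetical stand-ins for struct completion and drbd's md_io */
struct fake_completion { bool done; };
struct fake_md_io {
	struct fake_completion event;
	int error;
};

/* pretend lower level device: rejects barrier requests, accepts plain
 * writes. assumption of this model: only a failure writes io->error. */
static void fake_submit(struct fake_md_io *io, bool barrier)
{
	if (barrier)
		io->error = -95;	/* models EOPNOTSUPP */
	io->event.done = true;
}

/* returns 1 if the io finally succeeded, 0 otherwise */
static int sync_page_io(bool reinit_on_retry)
{
	struct fake_md_io io;
	bool barrier = true;
	int attempts = 0;

	if (!reinit_on_retry) {
		/* old placement: initialized once, before the retry label */
		io.event.done = false;
		io.error = 0;
	}
retry:
	if (reinit_on_retry) {
		/* new placement: re-initialized for every attempt */
		io.event.done = false;
		io.error = 0;
	}

	fake_submit(&io, barrier);
	attempts++;
	assert(attempts <= 2);	/* at most one retry in this model */

	if (io.error && barrier) {
		/* try again with no barrier */
		barrier = false;
		goto retry;
	}
	return io.error == 0;
}
```

in this model, sync_page_io(false) still reports failure after the
barrier-less retry, because the error from the first attempt is never
cleared, while sync_page_io(true) succeeds on the second attempt.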


> if that does not help:
> 8.3.2?
> 8.3.3rc2?
> various other drbd versions? kernels?
> different lower level device? (not cciss? other cciss drive/partition?)
> etc.
> 
> if all else fails: contact linbit, we do sell support.
> 
> we even sell "drbd health checks", which somewhat boils down to a
> one-time engagement - though for those you may need to wait for a
> suitable (for linbit) time-slot.

-- 
: Lars Ellenberg
: LINBIT | Your Way to High Availability
: DRBD/HA support and consulting http://www.linbit.com

DRBD® and LINBIT® are registered trademarks of LINBIT, Austria.
__
please don't Cc me, but send to list   --   I'm subscribed


