[DRBD-user] DRBD 0.7.23 and MD corruption

Lars Ellenberg lars.ellenberg at linbit.com
Mon May 21 17:20:40 CEST 2007


On Fri, May 18, 2007 at 11:56:33AM -0400, Ryan Steele wrote:
> I saw someone else post something similar to this a few weeks ago, but 
> didn't see any response to it.  I've just set up DRBD 0.7.23 with 
> Heartbeat2 on two future database servers.  However, DRBD seems to have 
> corrupted my multi-disk RAID1.  I booted a Knoppix CD on the affected 
> machines, removed the DRBD rc.d scripts, and rebooted and things were 
> fine.  To verify, I ran update-rc.d to recreate the symbolic links, and 
> rebooted again to find that it again would not boot.  Moreover, even 
> removing the rc.d links did not help - the array is, I fear, irreparably 
> damaged.
> 
> Is there any acknowledgement of this bug, or are there any suggestions 
> as to how one might go about fixing it?  I can't even boot into the 
> machine to run mdadm and repair the array, though maybe I can do that 
> from the Knoppix CD...

I think...
the issue is that, for certain [1] kernel versions, md raid5 may
fail a READA (read-ahead) request without actually returning an
error: it merely clears the BIO_UPTODATE flag.

[1] (no longer the case in current git. whether this affects all
    prior 2.6 kernels, or only a few of them, I have not verified yet.)

so if the file system runs directly on top of it, it sees the
!uptodate... but any virtual block device with its own bio_endio
function that relies on the lower layer to *report an error* when
something goes wrong is deceived, and random (read: corrupt) data
finds its way into the buffer/page cache, and on into userland.
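
to illustrate (a hypothetical sketch, not actual drbd code): an endio
callback of a stacking block driver of that era (pre-2.6.24,
three-argument form), written to trust "error" alone, and therefore
blind to the cleared BIO_UPTODATE flag:

#include <linux/bio.h>

/* bi_private is assumed to hold the upper layer's original bio */
static int naive_stacked_read_endio(struct bio *bio,
				    unsigned int bytes_done, int error)
{
	struct bio *master = bio->bi_private;

	if (bio->bi_size)	/* partial completion; wait for the rest */
		return 1;

	/* only error != 0 is treated as failure.  a READA that raid5
	 * "failed" by merely clearing BIO_UPTODATE arrives here with
	 * error == 0, so stale data is completed as if it were good. */
	bio_endio(master, master->bi_size, error);
	bio_put(bio);
	return 0;
}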

I think the "real fix" belongs in the kernel's drivers/md/raid5.c,
something like this:

diff --git a/drivers/md/raid5.c b/drivers/md/raid5.c
index 4500660..5e7611d 100644
--- a/drivers/md/raid5.c
+++ b/drivers/md/raid5.c
@@ -2725,7 +2725,9 @@ static int make_request(request_queue_t 
 		if ( rw == WRITE )
 			md_write_end(mddev);
 		bi->bi_size = 0;
-		bi->bi_end_io(bi, bytes, 0);
+		bi->bi_end_io(bi, bytes,
+			      test_bit(BIO_UPTODATE, &bi->bi_flags)
+			        ? 0 : -EIO);
 	}
 	return 0;
 }

but this may not be enough; there may be more places that need the
same treatment.

we could possibly work around this in drbd by
 * submitting all READA as READ (like the 2.4 kernel did;
   see the sketch after the patch below)
 * failing all READA on the spot
both are not very attractive; but there is one more:
 * double checking for plausibility in our endio function, like this:

Index: drbd/drbd_worker.c
===================================================================
--- drbd/drbd_worker.c	(revision 2898)
+++ drbd/drbd_worker.c	(working copy)
@@ -311,7 +311,13 @@
 
 	/* READAs may fail.
 	 * upper layers need to be able to handle that themselves */
-	if (bio_rw(bio) == READA) goto pass_on;
+	if (bio_rw(bio) == READA) {
+		/* md RAID5 is misbehaving,
+		 * clears the up-to-date flag, but does not return an error! */
+		ERR_IF(!bio_flagged(bio, BIO_UPTODATE) && !error)
+			error = -EIO;
+		goto pass_on;
+	}
 	if (error) {
 		drbd_chk_io_error(mdev,error); // handle panic and detach.
 		if(mdev->on_io_error == PassOn) goto pass_on;
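
for illustration only, a minimal sketch of the first workaround
(downgrading READA to READ before submitting to the backing device);
this is not actual drbd code, and the function name is made up:

#include <linux/bio.h>
#include <linux/blkdev.h>
#include <linux/fs.h>

static void submit_to_backing_dev(struct bio *bio)
{
	/* turn read-ahead into a plain READ, as 2.4 effectively did;
	 * the lower level then has to report a real error on failure */
	if (bio_rw(bio) == READA)
		bio->bi_rw &= ~(1UL << BIO_RW_AHEAD);

	generic_make_request(bio);
}

the obvious downside: read-ahead then always hits the disk, even where
the lower level would have been allowed to simply drop it.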


if you can confirm that the endio patch above fixes it for you
(you may need an "echo 3 > /proc/sys/vm/drop_caches",
 or even a reboot, to get a clean state first; the cache will be
 poisoned with random data; you may even need to recreate/rebuild
 the raid5, or fsck/mkfs...),

then you can change the "ERR_IF" to a plain "if"; otherwise it will
spam your kernel logs with failed asserts for as long as you use this
(arguably broken version of the) raid5 driver.
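
with a plain "if", that hunk of drbd_worker.c would then read
something like this:

	/* READAs may fail.
	 * upper layers need to be able to handle that themselves */
	if (bio_rw(bio) == READA) {
		/* md RAID5 is misbehaving,
		 * clears the up-to-date flag, but does not return an error! */
		if (!bio_flagged(bio, BIO_UPTODATE) && !error)
			error = -EIO;
		goto pass_on;
	}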

cheers,

-- 
: Lars Ellenberg                            Tel +43-1-8178292-0  :
: LINBIT Information Technologies GmbH      Fax +43-1-8178292-82 :
: Vivenotgasse 48, A-1120 Vienna/Europe    http://www.linbit.com :
__
please use the "List-Reply" function of your email client.


