On Fri, May 18, 2007 at 11:56:33AM -0400, Ryan Steele wrote:
> I saw someone else post something similar to this a few weeks ago, but
> didn't see any response to it.  I've just set up DRBD 0.7.23 with
> Heartbeat2 on two future database servers.  However, DRBD seems to have
> corrupted my multi-disk RAID1.  I booted a Knoppix CD on the affected
> machines, removed the DRBD rc.d scripts, and rebooted, and things were
> fine.  To verify, I ran update-rc.d to recreate the symbolic links, and
> rebooted again to find that it again would not boot.  Moreover, even
> removing the rc.d links did not help - the array is, I fear, irreparably
> damaged.
>
> Is there any acknowledgement of this bug, or are there any suggestions
> as to how one might go about fixing it?  I can't even boot into the
> machine to run mdadm and repair the array, though maybe I can do that
> from the Knoppix CD... I think...

the issue is that md raid5, for certain [1] kernel versions, may fail a
READA request without actually returning an error: it clears the
UPTODATE flag, but completes the bio with an error code of 0.

[1] (no longer in current git. whether this affects all prior 2.6
kernels, or only a few of them, I did not verify. yet.)

so if the file system runs directly on it, it sees the !uptodate and can
cope... but any virtual block device with its own bio_endio function,
which relies on the lower layer to *report an error* if something goes
wrong, is deceived, and random (read: corrupt) data finds its way into
the buffer/page cache, and from there into userland.

I think the "real fix" belongs in the kernel, in drivers/md/raid5.c,
something like

diff --git a/drivers/md/raid5.c b/drivers/md/raid5.c
index 4500660..5e7611d 100644
--- a/drivers/md/raid5.c
+++ b/drivers/md/raid5.c
@@ -2725,7 +2725,9 @@ static int make_request(request_queue_t
 		if ( rw == WRITE )
 			md_write_end(mddev);
 		bi->bi_size = 0;
-		bi->bi_end_io(bi, bytes, 0);
+		bi->bi_end_io(bi, bytes,
+			test_bit(BIO_UPTODATE, &bi->bi_flags)
+			? 0 : -EIO);
 	}
 	return 0;
 }

but this alone may not be enough; there may be more places like it.

we can possibly work around it in drbd by
 * submitting all READA as READ (like the 2.4 kernels did)
   (a rough sketch of this idea is appended at the very end of this mail)
 * failing all READA on the spot
both are not very attractive; but there is one more option:
 * double-checking for plausibility in our endio function, like this:

Index: drbd/drbd_worker.c
===================================================================
--- drbd/drbd_worker.c	(revision 2898)
+++ drbd/drbd_worker.c	(working copy)
@@ -311,7 +311,13 @@
 
 	/* READAs may fail.
 	 * upper layers need to be able to handle that themselves */
-	if (bio_rw(bio) == READA) goto pass_on;
+	if (bio_rw(bio) == READA) {
+		/* md RAID5 is misbehaving,
+		 * clears the up-to-date flag, but does not return an error! */
+		ERR_IF(!bio_flagged(bio, BIO_UPTODATE) && !error)
+			error = -EIO;
+		goto pass_on;
+	}
 	if (error) {
 		drbd_chk_io_error(mdev,error); // handle panic and detach.
 		if(mdev->on_io_error == PassOn) goto pass_on;

if you can confirm that this fixes it for you
(you may need an "echo 3 > /proc/sys/vm/drop_caches", or even a reboot,
to get a clean state first, since the cache will be poisoned with random
data; maybe you even need to recreate/rebuild the raid5, or fsck/mkfs...),
you can change the "ERR_IF" to a plain "if"; otherwise it spams your
kernel logs with failed asserts for as long as you use this (arguably
broken version of the) raid5 driver.

cheers,

--
: Lars Ellenberg                            Tel +43-1-8178292-0  :
: LINBIT Information Technologies GmbH      Fax +43-1-8178292-82 :
: Vivenotgasse 48, A-1120 Vienna/Europe    http://www.linbit.com :
__
please use the "List-Reply" function of your email client.
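
for completeness, the appended snippet is a rough, untested sketch of the
"submit all READA as READ" idea mentioned above.  it only illustrates the
approach; the helper name is made up (this is not actual drbd code), and
the exact bi_rw flag constants depend on the kernel version you build
against:

#include <linux/bio.h>

/* hypothetical helper, for illustration only:
 * strip the read-ahead hint from a bio before passing it down, so the
 * lower level treats it as an ordinary READ and has to report errors
 * the normal way (roughly what the 2.4 kernels ended up doing) */
static inline void demote_reada_to_read(struct bio *bio)
{
	if (bio_rw(bio) == READA)
		bio->bi_rw &= ~(1UL << BIO_RW_AHEAD);	/* clear the read-ahead bit */
}

a make_request function could call this on every incoming bio before
handing it on to the lower-level device; the downside is that read-ahead
then loses its "may be dropped under load" semantics for everything
stacked below us.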