[Drbd-dev] Re: drbd_panic() in drbd_receiver.c

Graham, Simon Simon.Graham at stratus.com
Thu Jul 6 22:06:25 CEST 2006


I agree completely with the idea of having a local failure cause a retry to the peer. That is actually the subject of the second phase of the work I want to do to handle disk errors: if a read fails locally because of a bad block, retry the read remotely AND, when the data comes back, WRITE it locally as well as returning the result. This will fix the error in a lot of cases, as the disk will remap the block.

Of course, there are some tricky timing windows to watch out for here (such as the app performing an explicit write to the same block in the meantime).
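To make the read-repair idea and the timing window concrete, here is a rough user-space model in Python. It is only a sketch, not DRBD code: LocalDisk, BadBlock, read_with_repair, peer_read and the per-block write counter are all made-up names, and the counter is just one possible way to notice that the application wrote the block while the remote read was in flight.

```python
class BadBlock(Exception):
    """Raised by the toy local disk when a block cannot be read."""


class LocalDisk:
    """Toy model of the local backing device (hypothetical, not DRBD)."""

    def __init__(self, blocks, bad=()):
        self.blocks = dict(blocks)
        self.bad = set(bad)   # block numbers that currently fail on read
        self.gen = {}         # per-block count of completed writes

    def read(self, n):
        if n in self.bad:
            raise BadBlock(n)
        return self.blocks[n]

    def write(self, n, data):
        self.blocks[n] = data
        self.bad.discard(n)   # model the drive remapping the block on write
        self.gen[n] = self.gen.get(n, 0) + 1


def read_with_repair(disk, peer_read, n):
    """Read block n locally; on a bad block, fetch it from the peer and
    write it back locally unless a newer local write arrived meanwhile."""
    try:
        return disk.read(n)
    except BadBlock:
        gen_before = disk.gen.get(n, 0)
        data = peer_read(n)   # assumed to raise if the peer cannot help
        # Only repair if nobody wrote the block while we were waiting;
        # otherwise we would overwrite newer data with the peer's older copy.
        if disk.gen.get(n, 0) == gen_before:
            disk.write(n, data)
        return data
```

In this model the caller still gets the peer's copy even when the write-back is skipped; whether that is acceptable depends on request ordering rules that the sketch does not address.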

/simgr

-----Original Message-----
From: drbd-dev-bounces at linbit.com [mailto:drbd-dev-bounces at linbit.com] On Behalf Of Philipp Reisner
Sent: Thursday, July 06, 2006 10:39 AM
To: drbd-dev at linbit.com
Subject: Re: [Drbd-dev] Re: drbd_panic() in drbd_receiver.c

On Wednesday, 5 July 2006 19:49, Graham, Simon wrote:
> Thanks again for the review; I understand and agree with your comments on
> doing this in 8 versus 7 - anything I do on 7 will just be for prototyping
> (because as I said it's much easier for me to test with 7 right now).
>
> I will take the approach of continuing with as much as possible of the
> resync (although as I suspected, it's MUCH easier to simply abort the
> resync as soon as any error is reported).

Ok. 

> One question -- you said that DRBD disconnects from the disk on the first
> (local) error -- I think this is only true if you set on-io-error to
> "Detach" -- we actually run with the default value of PassOn in which case
> drbd_io_error does nothing; I think this is actually the best way to run
> since it keeps the disk accessible for those blocks that are OK and returns
> errors for those that are not.

Hmmm. The current semantic is:

on-io-error = passOn

If there is a local read error, DRBD will pass the IO error on to the 
filesystem without retrying on the peer node.

Maybe we should have one more on-io-error handler: one that
retries on the peer node if there is one and, if there is no peer,
returns the IO error without detaching from the disk.
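As a rough illustration of the difference between the two behaviours, here is a small Python model. It is a sketch under assumptions, not DRBD code: the OnIoError enum, the name RETRY_ON_PEER, and handle_local_read_error are invented for this example, and a missing peer is modelled as peer_read=None.

```python
from enum import Enum


class OnIoError(Enum):
    PASS_ON = "pass_on"
    RETRY_ON_PEER = "retry_on_peer"   # hypothetical name for the proposed handler


def handle_local_read_error(policy, block, err, peer_read=None):
    """Decide what to do after a local read of `block` failed with `err`.

    Returns the block data if the peer could supply it; otherwise re-raises
    the original error so it reaches the filesystem. The disk stays attached
    in both cases.
    """
    if policy is OnIoError.RETRY_ON_PEER and peer_read is not None:
        return peer_read(block)
    # PASS_ON, or no peer available: pass the IO error on without retrying.
    raise err
```

With PASS_ON the error is always passed up, even when a peer is connected; with the proposed policy the error only surfaces when there is no peer to ask.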

Good point. 

I added that thought to my ROADMAP file...

-Philipp
-- 
: Dipl-Ing Philipp Reisner                      Tel +43-1-8178292-50 :
: LINBIT Information Technologies GmbH          Fax +43-1-8178292-82 :
: Schönbrunnerstr 244, 1120 Vienna, Austria    http://www.linbit.com :
_______________________________________________
drbd-dev mailing list
drbd-dev at lists.linbit.com
http://lists.linbit.com/mailman/listinfo/drbd-dev

