[Drbd-dev] DRBD-8 - handling data write errors

Graham, Simon Simon.Graham at stratus.com
Thu Jan 11 05:00:53 CET 2007


> I'm not really sure how to fix this at the moment, but I'm considering
> the following:
> 
> 1. The side that gets the error marks the block as out of sync AND
>    marks the local disk as inconsistent.
> 2. Receipt of a NegAck causes the block to be marked as out of sync
>    AND the peer disk is made inconsistent
>    (not sure if I need this step since step 1 should cause this fact
>    to be broadcast but it seems safer).
> 

So - I've found there is some existing code in place already. For
example, set_out_of_sync is done in req_may_be_done if either the local
or the remote write fails. However, this is not sufficient, for a couple
of reasons:

1. We need to get the failing disk set to Inconsistent so that
   subsequent reads do not attempt to use the local block.

2. It seems to me that the current code doesn't really handle blocks
   being marked via set_out_of_sync whilst a resync is in progress
   (i.e. if a write error occurs on an application write during
   resync).
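
To make the intended behaviour concrete, here is a toy user-space model of
the two steps from my earlier mail plus the read-path consequence in point 1.
This is only a sketch and not DRBD code: every name in it (toy_dev,
toy_write_done, toy_pick_read_source and so on) is made up for illustration,
and it ignores the resync interaction from point 2 entirely.

```c
#include <stdbool.h>
#include <stddef.h>

#define TOY_BLOCKS 8 /* hypothetical tiny device: one flag per block */

enum toy_disk_state { TOY_INCONSISTENT, TOY_UP_TO_DATE };
enum toy_read_source { TOY_READ_LOCAL, TOY_READ_PEER, TOY_READ_FAIL };

/* Hypothetical per-device state; not the real DRBD structures. */
struct toy_dev {
    enum toy_disk_state local_disk;
    enum toy_disk_state peer_disk;
    bool out_of_sync[TOY_BLOCKS];
};

static void toy_init(struct toy_dev *d)
{
    d->local_disk = TOY_UP_TO_DATE;
    d->peer_disk = TOY_UP_TO_DATE;
    for (size_t i = 0; i < TOY_BLOCKS; i++)
        d->out_of_sync[i] = false;
}

/* Called when both halves of a mirrored write have completed. */
static void toy_write_done(struct toy_dev *d, size_t block,
                           bool local_ok, bool peer_ok)
{
    if (local_ok && peer_ok)
        return;
    /* Either side failing leaves the two copies different. */
    d->out_of_sync[block] = true;
    /* Step 1: a local error makes the local disk Inconsistent. */
    if (!local_ok)
        d->local_disk = TOY_INCONSISTENT;
    /* Step 2: a negative ack makes the peer disk Inconsistent. */
    if (!peer_ok)
        d->peer_disk = TOY_INCONSISTENT;
}

/* Reads must not be served from a disk that is Inconsistent. */
static enum toy_read_source toy_pick_read_source(const struct toy_dev *d)
{
    if (d->local_disk == TOY_UP_TO_DATE)
        return TOY_READ_LOCAL;
    if (d->peer_disk == TOY_UP_TO_DATE)
        return TOY_READ_PEER;
    return TOY_READ_FAIL;
}
```

In this model a failed local write on block 3 sets that block's flag, flips
local_disk to TOY_INCONSISTENT, and makes later reads go to the peer, which
is the behaviour point 1 asks for.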

I've also coded something that sends the Inconsistent state to the other
side, which will trigger a resync immediately - perhaps I shouldn't do
this? A resync is not really going to be able to fix the problem if the
error is persistent (although it might be worth trying if the error was
transient)...

I wonder if we shouldn't instead simply always detach on error (i.e.
stop using PassOn at all) to get the best behavior... this would
certainly make things simpler, and we could remove the forcible detach
on meta-data error that I added earlier. If you want to be able to
handle errors then never use PassOn!
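
As a rough illustration of why always detaching is simpler, the sketch below
contrasts the two policies after a data write error. It is an assumption-laden
toy, not the driver's logic: pol_policy, pol_outcome and
pol_after_write_error are hypothetical names, and the "must_track_errors"
flag just stands for the extra Inconsistent/out-of-sync bookkeeping discussed
above.

```c
#include <stdbool.h>

/* Hypothetical stand-ins for the PassOn and Detach policies. */
enum pol_policy { POL_PASS_ON, POL_DETACH };

struct pol_outcome {
    bool has_local_disk;    /* is the backing device still attached? */
    bool must_track_errors; /* is extra error bookkeeping still needed? */
};

/* What the node is left with after a data write error (toy model). */
static struct pol_outcome pol_after_write_error(enum pol_policy policy)
{
    struct pol_outcome out;

    if (policy == POL_DETACH) {
        /* Diskless: all I/O goes to the peer, nothing local to track. */
        out.has_local_disk = false;
        out.must_track_errors = false;
    } else {
        /* PassOn: the disk stays attached, so the failed block and the
         * disk state have to be tracked to keep reads away from it. */
        out.has_local_disk = true;
        out.must_track_errors = true;
    }
    return out;
}
```

With the detach policy the node simply has no local disk any more, so none of
the per-block or per-disk error state from the PassOn branch is required.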

Simon

