[Drbd-dev] DRBD-8 - handling data write errors

Graham, Simon Simon.Graham at stratus.com
Wed Jan 10 21:50:19 CET 2007


A few months ago, I added support to handle write errors for the
meta-data portions of the disk (by forcibly detaching the disk when this
occurs).

Now I'm looking at handling write errors in the data portion and
wondering what the best approach would be - the current behavior is
definitely wrong because we end up with the two plexes having
inconsistent data with no record of the fact in the bitmap or other disk
state. This is true no matter what flavor of error handling is used.

I'm not really sure how to fix this at the moment, but I'm considering
the following:

1. The side that gets the error marks the block as out of sync AND marks
   the local disk as inconsistent.
2. Receipt of a NegAck causes the block to be marked as out of sync AND
   the peer disk to be marked as inconsistent (I'm not sure this step is
   needed, since step 1 should cause this fact to be broadcast, but it
   seems safer).
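
To make the two steps concrete, here is a minimal sketch in C of the
bookkeeping they imply. It is a toy model only: the toy_* names, the
fixed block count and the two-state disk enum are assumptions made for
illustration, not DRBD's actual data structures or functions.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Toy model only: every name here is hypothetical, not DRBD's API. */
#define TOY_NBLOCKS 64

enum toy_disk_state { TOY_UP_TO_DATE, TOY_INCONSISTENT };

struct toy_dev {
    unsigned char out_of_sync[TOY_NBLOCKS]; /* 1 = block differs between plexes */
    enum toy_disk_state local_disk;
    enum toy_disk_state peer_disk;
};

/* Start with a clean bitmap and both disks up to date. */
static void toy_init(struct toy_dev *d)
{
    memset(d->out_of_sync, 0, sizeof d->out_of_sync);
    d->local_disk = TOY_UP_TO_DATE;
    d->peer_disk = TOY_UP_TO_DATE;
}

/* Step 1: a local data write failed, so record the block in the
 * bitmap and mark the local disk as inconsistent. */
static void toy_local_write_error(struct toy_dev *d, size_t block)
{
    assert(block < TOY_NBLOCKS);
    d->out_of_sync[block] = 1;
    d->local_disk = TOY_INCONSISTENT;
}

/* Step 2: the peer sent a NegAck for this block, so record the block
 * in the bitmap and mark the peer disk as inconsistent. */
static void toy_neg_ack(struct toy_dev *d, size_t block)
{
    assert(block < TOY_NBLOCKS);
    d->out_of_sync[block] = 1;
    d->peer_disk = TOY_INCONSISTENT;
}

/* Count the blocks a later resync would have to copy. */
static size_t toy_count_out_of_sync(const struct toy_dev *d)
{
    size_t i, n = 0;

    for (i = 0; i < TOY_NBLOCKS; i++)
        n += d->out_of_sync[i];
    return n;
}
```

In this model the failed block is always recorded in the bitmap before
the disk state changes, so the disk state never claims more consistency
than the bitmap does.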

I'm also not sure whether I need to bump the UUID info here to ensure
that resync happens correctly in the future.

I also considered forcibly detaching the disk in this case but rejected
that as it makes the rest of the disk unavailable when the error might
only be in one block.

One last problem to consider is what happens when we get write errors in
different blocks on both disks. They can't both be inconsistent (how
would we know which way to resync?). I'm not sure what the right answer
is here. Does anyone have any suggestions?
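
The dilemma can be shown with a small sketch in C. It assumes, purely
for illustration, that the resync direction is chosen from the two disk
states alone; the sketch_* names are hypothetical and this is not how
DRBD actually decides. With only "up to date" and "inconsistent" to go
on, the case where both disks are inconsistent leaves no valid source.

```c
#include <assert.h>

/* Hypothetical names for illustration; not DRBD's real state machine. */
enum sketch_disk { SKETCH_UP_TO_DATE, SKETCH_INCONSISTENT };

enum sketch_resync {
    SKETCH_NO_RESYNC_NEEDED,
    SKETCH_LOCAL_IS_SOURCE,
    SKETCH_PEER_IS_SOURCE,
    SKETCH_NO_VALID_SOURCE   /* both sides inconsistent: direction unknown */
};

/* Pick the resync source from the two disk states alone. */
static enum sketch_resync sketch_pick_direction(enum sketch_disk local,
                                                enum sketch_disk peer)
{
    assert(local == SKETCH_UP_TO_DATE || local == SKETCH_INCONSISTENT);
    assert(peer == SKETCH_UP_TO_DATE || peer == SKETCH_INCONSISTENT);

    if (local == SKETCH_UP_TO_DATE && peer == SKETCH_UP_TO_DATE)
        return SKETCH_NO_RESYNC_NEEDED;
    if (local == SKETCH_UP_TO_DATE)
        return SKETCH_LOCAL_IS_SOURCE;
    if (peer == SKETCH_UP_TO_DATE)
        return SKETCH_PEER_IS_SOURCE;
    return SKETCH_NO_VALID_SOURCE;
}
```

Under these assumptions, whole-disk states cannot describe write errors
in different blocks on both sides, which is the case the question above
is about.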

Thanks,
Simon

