[Drbd-dev] DRBD-8: BUG when disk write errors occur during
Simon.Graham at stratus.com
Thu Jan 11 21:39:10 CET 2007
> Simon, this is an excellent description of what is going on. I also
> have gone
> through it as well, and think that moving dec_local() is the correct
> I have just committed it
It turns out that this fix, whilst necessary I think, is not sufficient
-- specifically, it does not cover the case where the local request
fails and then later on the network request is ACK'd...
. When the local request fails, we run through
  req_mod(write_completed_with_error) and at the end do the dec_local().
. If some other thread was attempting to set the local disk Diskless, it
  will now see local_cnt==0 and run, releasing the act_log and resync
  caches.
. Now the network request is acked and we run
  req_mod(write_acked_by_peer) -- now that both local and remote are
  done, req_may_be_done does its thing and ends up calling
  drbd_al_complete_io, which crashes because act_log is now NULL.
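
To make the ordering concrete, here is a toy, single-threaded replay of the three steps above. It is only a sketch: the toy_* names and structures are made up for illustration and are not the real DRBD code; the only things taken from the description are the local_cnt counter and the act_log pointer.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy stand-in for the device state; NOT the real DRBD structures. */
struct toy_dev {
    int   local_cnt;  /* references held on the local disk */
    void *act_log;    /* activity log, released when the disk goes Diskless */
};

/* Hypothetical detach path: only proceeds once local_cnt has reached 0. */
static bool toy_try_detach(struct toy_dev *d)
{
    if (d->local_cnt != 0)
        return false;
    d->act_log = NULL;            /* models releasing the act_log */
    return true;
}

/*
 * Replays the sequence with the reference dropped as soon as the local
 * write fails. Returns true if the activity log is already gone by the
 * time the peer ACK would be processed.
 */
static bool toy_replay_early_dec(void)
{
    static char log_storage[16];
    struct toy_dev d = { .local_cnt = 1, .act_log = log_storage };

    d.local_cnt--;                /* step 1: local write fails, dec_local() */
    assert(d.local_cnt == 0);
    toy_try_detach(&d);           /* step 2: other thread sees local_cnt==0 */
    return d.act_log == NULL;     /* step 3: peer ACK finds no act_log */
}
```

In this model toy_replay_early_dec() returns true, i.e. the completion path that runs on the peer ACK would be handed a NULL act_log.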
Now - one fix would be to check for act_log being NULL in
drbd_al_complete_io. However, I wonder if it might be more correct to
delay doing the dec_local() until we are definitely done with the
This would mean moving it out of req_mod() completely and instead doing
it in req_may_be_done() when the request actually is complete on both
sides... (and if RQ_LOCAL_COMPLETED flag is set I think)
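
A rough sketch of what I mean, again as a toy model rather than a patch: the toy_* names, the flag values and the struct layout are all assumptions for illustration. The point is only that the reference is dropped in one place, after the activity log has been touched for the last time.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical flag bits; the real request flags differ. */
enum { TOY_LOCAL_COMPLETED = 1 << 0, TOY_NET_DONE = 1 << 1 };

struct toy_dev { int local_cnt; void *act_log; };

struct toy_req {
    struct toy_dev *dev;
    unsigned        flags;
    bool            al_was_valid;  /* was act_log present at completion? */
    bool            finished;
};

/* Hypothetical detach path: refuses while a reference is still held. */
static bool toy_try_detach(struct toy_dev *d)
{
    if (d->local_cnt != 0)
        return false;
    d->act_log = NULL;
    return true;
}

/* Drop the local reference only when both sides are finished. */
static void toy_req_may_be_done(struct toy_req *r)
{
    const unsigned both = TOY_LOCAL_COMPLETED | TOY_NET_DONE;

    if (r->finished || (r->flags & both) != both)
        return;
    /* This is where the activity log would be updated. */
    r->al_was_valid = (r->dev->act_log != NULL);
    r->dev->local_cnt--;           /* the delayed dec_local() */
    r->finished = true;
}

/* Replays: local failure, detach attempt, peer ACK, detach attempt. */
static bool toy_replay_late_dec(void)
{
    static char log_storage[16];
    struct toy_dev d = { .local_cnt = 1, .act_log = log_storage };
    struct toy_req r = { .dev = &d };

    r.flags |= TOY_LOCAL_COMPLETED;           /* local write fails */
    toy_req_may_be_done(&r);                  /* not done yet: no dec */
    bool detached_early = toy_try_detach(&d); /* refused, local_cnt == 1 */

    r.flags |= TOY_NET_DONE;                  /* peer ACK arrives */
    toy_req_may_be_done(&r);                  /* completes, then dec */
    bool detached_late = toy_try_detach(&d);  /* now allowed */

    return !detached_early && r.al_was_valid && detached_late;
}
```

With the decrement delayed like this, the detach attempt between the local failure and the peer ACK is refused in the model, and it only goes through once the request has completed on both sides.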