[Drbd-dev] DRBD-8: BUG when disk write errors occur during heavyI/O

Fri Jan 12 00:18:44 CET 2007

/ 2007-01-11 15:39:10 -0500
\ Graham, Simon:
> > 
> > Simon, this is an excellent description of what is going on. I also
> > have gone
> > through it as well, and think that moving dec_local() is the correct
> > solution.
> > 
> > Just have just committed it
> > http://lists.linbit.com/pipermail/drbd-cvs/2007-January/001421.html
> > 
> 
> Phil,
> 
> It turns out that this fix, whilst necessary I think, is not sufficient
> -- specifically, it does not cover the case where the local request
> fails and then later on the network request is ACK'd...
> 
> . When the local request fails, we run through
> req_mod(write_completed_with_error) and at the end
>   do the dec_local().
> . If some other thread was attempting to set the local disk Diskless, it
> will now see local_cnt==0
>   and run, releasing the act_log and resync caches.
> . Now the network request is acked and we run
> req_mod(write_acked_by_peer) -- now that both local and
>   remote are done, req_may_be_done does it's thing and ends up calling
> drbd_al_complete_io which crashes
>   because act_log is now NULL.
> 
> Now - one fix would be to check for act_log being NULL in
> drbd_al_complete_io. However, I wonder if it might be more correct to
> delay doing the dec_local() until we are definitely done with the
> request?
> 
> This would mean moving it out of req_mod() completely and instead doing
> it in req_may_be_done() when the request actually is complete on both
> sides... (and if RQ_LOCAL_COMPLETED flag is set I think)

so what you are saying is:
inc_local is done when the request is submitted.
dec_local should be done only when the request is in fact "done".
like below.

| Index: drbd_req.c
| ===================================================================
| --- drbd_req.c	(revision 2685)
| +++ drbd_req.c	(working copy)
| @@ -281,6 +281,7 @@
|  		} else {
|  			drbd_req_free(req);
|  		}
| +		dec_local(mdev);
|  	}
|  	/* else: network part and not DONE yet. that is
|  	 * protocol A or B, barrier ack still pending... */
| @@ -442,7 +443,6 @@
|  		req->rq_state &= ~RQ_LOCAL_PENDING;
|  
|  		_req_may_be_done(req,error);
| -		dec_local(mdev);
|  		break;
|  
|  	case write_completed_with_error:
| @@ -457,7 +457,6 @@
|  		 * FIXME see comment below in read_completed_with_error */
|  		__drbd_chk_io_error(mdev,FALSE);
|  		_req_may_be_done(req,error);
| -		dec_local(mdev);
|  		break;
|  
|  	case read_completed_with_error:
| @@ -469,7 +468,6 @@
|  
|  		bio_put(req->private_bio);
|  		req->private_bio = NULL;
| -		dec_local(mdev);
|  		if (bio_rw(req->master_bio) == READA) {
|  			/* it is legal to fail READA */
|  			_req_may_be_done(req,error);

hm. but that would make the "local" reference count be a network
reference count, too, it may even keep the reference for a long time
if you configure it to freeze io on connection loss...

not good.

so maybe we rather want to brace drbd_al_complete_io
with inc_local dec_local ?

currently we only call it in three places,
early bail out on conflicting writes in drbd_make_request
[which holds local ref], drbd_endio_write_sec when the
EE_CALL_AL_COMPLETE_IO flag is set [also holds a local ref]
and this code in question.

so how about this (sorry for the long lines):

Index: drbd_req.c
===================================================================

--- drbd_req.c	(revision 2685)
+++ drbd_req.c	(working copy)
@@ -257,7 +257,13 @@
 			 * we would forget to resync the corresponding extent.
 			 */
 			if (s & RQ_LOCAL_MASK) {
-				drbd_al_complete_io(mdev, req->sector);
+				if (inc_local_if_state(mdev,Failed)) {
+					drbd_al_complete_io(mdev, req->sector);
+					dec_local(mdev);
+				} else {
+					WARN("Should have called drbd_al_complete_io(, %llu), "
+					     "but my Disk seems to have faile :(\n", req->sector);
+				}
 			}
 		}
 

-- 
: Lars Ellenberg                            Tel +43-1-8178292-55 :
: LINBIT Information Technologies GmbH      Fax +43-1-8178292-82 :
: Vivenotgasse 48, A-1120 Vienna/Europe    http://www.linbit.com :