[Drbd-dev] Panic in _drbd_send_page() again.

Lars Ellenberg lars.ellenberg at linbit.com
Fri May 4 18:00:24 CEST 2007


On Fri, May 04, 2007 at 10:37:32AM -0400, Graham, Simon wrote:
> > 
> > all pieces of information we have about this seem to indicate that the
> > xen block device code builds up its own bios, tries to be smart
> > there...
> > 
> > and possibly outsmarts itself.
> > 
> 
> It's of course possible and we're looking at it but it's actually a
> pretty standard piece of code that builds the bio and I don't see any
> trickiness in it.
> 
> I also have a theory on the cause of this -- it's another tiny timing
> window I think similar to ones we fixed earlier where the ack for a
> packet would be received whilst we were still processing inside
> drbd_send_zc_bio -- here's my hypothesis:
> 
> 1. We're in drbd_send_zc_bio, we've sent the last segment but have not
>    yet looped back to the top of the __bio_for_each_segment loop.
> 2. Ack arrives for last segment - clears RQ_NET_PENDING
> 3. Local IO completes, clears RQ_LOCAL_PENDING and calls req_may_be_done
>    ==> completes bio because both RQ_NET_PENDING and RQ_LOCAL_PENDING
>    are clear.
> 
> NOW we come back to the thread running drbd_send_zc_bio and the bio has
> been freed... KABLOOIE!
> 
> I realize this is a very small window but, as the saying goes, where
> there's a window there's a bug...

hm. this does make sense, actually :)

> Seems to me that req_may_be_done should not complete the master bio
> unless RQ_NET_SENT is set... maybe the completed_ok: case in req_mod
> should test this similar to what is done in recv_acked_by_peer:...
> although it seems to me that this test should actually be buried in
> req_may_be_done since if this flag is not set, the request is not done!


so what you suggest is:

Index: drbd_req.c
===================================================================
--- drbd_req.c	(revision 2864)
+++ drbd_req.c	(working copy)
@@ -255,6 +255,16 @@
 	print_rq_state(req, "_req_may_be_done");
 	MUST_HOLD(&mdev->req_lock)
 
+	/* we must not complete the master bio, while it is
+	 *	still being processed by _drbd_send_zc_bio (drbd_send_dblock)
+	 *	not yet acknowledged by the peer
+	 *	not yet completed by the local io subsystem
+	 * these flags may get cleared in any order by
+	 *	the worker,
+	 *	the receiver,
+	 *	the bio_endio completion callbacks.
+	 */
+	if (s & RQ_NET_QUEUED) return;
 	if (s & RQ_NET_PENDING) return;
 	if (s & RQ_LOCAL_PENDING) return;
 

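to see why the extra test closes the window, here is a small stand-alone
model of the check (a sketch only, not drbd code). the bit values and the
helper names may_complete_old, may_complete_new and state_inside_window are
made up for illustration, and it assumes RQ_NET_QUEUED stays set until the
sending path has finished touching the bio.

```c
#include <stdbool.h>

/* Hypothetical bit values; the real definitions live in the drbd headers. */
#define RQ_NET_QUEUED    (1u << 0)  /* assumed: sender still works on the bio */
#define RQ_NET_PENDING   (1u << 1)  /* waiting for the ack from the peer */
#define RQ_LOCAL_PENDING (1u << 2)  /* waiting for local io completion */

/* Check as it was before the patch: only the two PENDING flags count. */
static bool may_complete_old(unsigned int s)
{
	if (s & RQ_NET_PENDING) return false;
	if (s & RQ_LOCAL_PENDING) return false;
	return true;
}

/* Check with the proposed additional test on RQ_NET_QUEUED. */
static bool may_complete_new(unsigned int s)
{
	if (s & RQ_NET_QUEUED) return false;
	return may_complete_old(s);
}

/* Replays steps 2 and 3 of the hypothesis while the sender is still
 * inside its loop, and returns the resulting request state. */
static unsigned int state_inside_window(void)
{
	unsigned int s = RQ_NET_QUEUED | RQ_NET_PENDING | RQ_LOCAL_PENDING;

	s &= ~RQ_NET_PENDING;    /* step 2: ack for the last segment arrives */
	s &= ~RQ_LOCAL_PENDING;  /* step 3: local io completes */
	return s;
}
```

in this model may_complete_old(state_inside_window()) says the master bio
may be completed although the sender is not finished with it, while
may_complete_new() holds it back until RQ_NET_QUEUED is cleared as well.
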
-- 
: Lars Ellenberg                                  Tel +43-1-8178292-0  :
: LINBIT Information Technologies GmbH            Fax +43-1-8178292-82 :
: Schoenbrunner Str. 244, A-1120 Vienna/Europe   http://www.linbit.com :


