[Drbd-dev] Panic in _drbd_send_page() again.

Graham, Simon Simon.Graham at stratus.com
Fri May 4 16:37:32 CEST 2007


> 
> all pieces of information we have about this seem to indicate that the
> xen block device code builds up its own bios, tries to be smart
> there...
> 
> and possibly outsmarts itself.
> 

It's possible, of course, and we're looking into it, but it's actually a
fairly standard piece of code that builds the bio, and I don't see any
trickiness in it.

I also have a theory about the cause. I think it's another tiny timing
window, similar to the ones we fixed earlier, where the ack for a
packet is received while we are still processing inside
drbd_send_zc_bio. Here's my hypothesis:

1. We're in drbd_send_zc_bio. We've sent the last segment but have not
   yet looped back to the top of the __bio_for_each_segment loop.
2. The ack for the last segment arrives and clears RQ_NET_PENDING.
3. Local IO completes, clears RQ_LOCAL_PENDING and calls
   req_may_be_done, which completes the master bio because both
   RQ_NET_PENDING and RQ_LOCAL_PENDING are now clear.
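
The three steps above can be replayed deterministically in a small
userspace model. This is a sketch, not DRBD code: struct model_req,
model_req_may_be_done() and replay_race_window() are hypothetical, and
the flag bit values are made up; only the flag names come from this
thread. The completion rule encodes "done once neither side is
pending", as described in step 3.

```c
#include <stdbool.h>

/* Simplified userspace model, not DRBD code: the flag names come from
 * the discussion, but these bit values are hypothetical. */
#define RQ_LOCAL_PENDING (1u << 0)
#define RQ_NET_PENDING   (1u << 1)

struct model_req {
	unsigned int rq_state;
	bool master_bio_completed; /* once true, the bio may be freed */
};

/* Current rule as described in step 3: complete the master bio as soon
 * as neither the local nor the network side is still pending. */
static void model_req_may_be_done(struct model_req *req)
{
	if (!(req->rq_state & (RQ_LOCAL_PENDING | RQ_NET_PENDING)))
		req->master_bio_completed = true;
}

/* Replays steps 1-3 in order. Returns true if the master bio was
 * completed while the sender was still inside its segment loop. */
bool replay_race_window(void)
{
	struct model_req req = { RQ_LOCAL_PENDING | RQ_NET_PENDING, false };
	bool sender_in_loop = true;

	/* 1. Last segment sent; the sender has not looped back yet. */

	/* 2. The ack for the last segment clears RQ_NET_PENDING. */
	req.rq_state &= ~RQ_NET_PENDING;
	model_req_may_be_done(&req); /* local side still pending: no-op */

	/* 3. Local IO completes and clears RQ_LOCAL_PENDING. */
	req.rq_state &= ~RQ_LOCAL_PENDING;
	model_req_may_be_done(&req); /* both clear: bio completed */

	/* The sender would now resume at the top of the segment loop. */
	return sender_in_loop && req.master_bio_completed;
}
```

In this model replay_race_window() returns true: nothing in the
completion rule knows that the sender is still walking the bio's
segments.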

NOW we come back to the thread running drbd_send_zc_bio, which goes on
to walk a bio that has already been freed... KABLOOIE!

I realize this is a very small window but, as the saying goes, where
there's a window there's a bug...

It seems to me that req_may_be_done should not complete the master bio
unless RQ_NET_SENT is set. Maybe the completed_ok: case in req_mod
should test for this, similar to what is done in recv_acked_by_peer:.
But I think the test really belongs in req_may_be_done itself, since
if this flag is not set, the request is not done!
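
A minimal sketch of the proposed rule, using the same kind of
hypothetical userspace model (the struct, the functions and the bit
values are invented for illustration; only the flag names are from this
thread). It assumes the sender sets RQ_NET_SENT only after leaving its
segment loop, and that it then calls the completion check itself.
Without that re-check, a request whose ack and local completion both
arrived early would never be completed.

```c
#include <stdbool.h>

/* Hypothetical bit values for a userspace model; only the names are
 * taken from the discussion. */
#define RQ_LOCAL_PENDING (1u << 0)
#define RQ_NET_PENDING   (1u << 1)
#define RQ_NET_SENT      (1u << 2)

struct model_req {
	unsigned int rq_state;
	bool master_bio_completed;
};

/* Proposed rule: a request is not done until the sender has marked it
 * RQ_NET_SENT, regardless of acks and local completion. */
static void model_req_may_be_done(struct model_req *req)
{
	if (!(req->rq_state & RQ_NET_SENT))
		return;
	if (!(req->rq_state & (RQ_LOCAL_PENDING | RQ_NET_PENDING)))
		req->master_bio_completed = true;
}

struct replay_result {
	bool completed_while_sending; /* the use-after-free window */
	bool completed_eventually;    /* the request must not hang */
};

/* Same interleaving as steps 1-3, followed by the sender finishing
 * its loop, setting RQ_NET_SENT and re-checking the request itself. */
struct replay_result replay_with_net_sent_check(void)
{
	struct model_req req = { RQ_LOCAL_PENDING | RQ_NET_PENDING, false };
	struct replay_result res;

	req.rq_state &= ~RQ_NET_PENDING;   /* ack arrives early */
	model_req_may_be_done(&req);
	req.rq_state &= ~RQ_LOCAL_PENDING; /* local IO completes */
	model_req_may_be_done(&req);
	res.completed_while_sending = req.master_bio_completed;

	/* Sender leaves the segment loop, then marks the request sent. */
	req.rq_state |= RQ_NET_SENT;
	model_req_may_be_done(&req);
	res.completed_eventually = req.master_bio_completed;

	return res;
}
```

The model is single-threaded on purpose. In the driver, setting
RQ_NET_SENT and re-checking would presumably have to happen under the
same lock that the ack and local-completion paths take, so that exactly
one of them sees all three conditions satisfied and completes the bio.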

Simon

