[Drbd-dev] Panic in _drbd_send_page() again.
Graham, Simon
Simon.Graham at stratus.com
Fri May 4 16:37:32 CEST 2007
>
> all pieces of information we have about this seem to indicate that the
> xen block device code builds up its own bios, tries to be smart
> there...
>
> and possibly outsmarts itself.
>
It's of course possible and we're looking at it but it's actually a
pretty standard piece of code that builds the bio and I don't see any
trickiness in it.
I also have a theory on the cause of this. I think it's another tiny
timing window, similar to the ones we fixed earlier, where the ack for a
packet is received whilst we are still processing inside
drbd_send_zc_bio. Here's my hypothesis:
1. We're in drbd_send_zc_bio, we've sent the last segment but have not
   yet looped back to the top of the __bio_for_each_segment loop.
2. Ack arrives for last segment - clears RQ_NET_PENDING.
3. Local IO completes, clears RQ_LOCAL_PENDING and calls req_may_be_done
   ==> completes bio because both RQ_NET_PENDING and RQ_LOCAL_PENDING
   are clear.
NOW we come back to the thread running drbd_send_zc_bio and the bio has
been freed... KABLOOIE!
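To make the ordering concrete, here is a minimal user-space sketch in C
that replays the three steps one after another. It is only a model, not
DRBD code: the struct, the model_ function names and the flag bit values
are made up for illustration, and master_bio_completed simply stands in
for the bio having been completed and freed.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model of the request flags; the bit values are arbitrary. */
enum {
    RQ_LOCAL_PENDING = 1 << 0,
    RQ_NET_PENDING   = 1 << 1,
};

struct model_req {
    unsigned flags;
    bool master_bio_completed;  /* stands in for "bio completed and freed" */
};

/* Completion check as described in the hypothesis: it looks only at the
 * two pending flags and knows nothing about the sender's progress. */
void model_req_may_be_done(struct model_req *req)
{
    if (!(req->flags & (RQ_LOCAL_PENDING | RQ_NET_PENDING)))
        req->master_bio_completed = true;
}

/* Replays steps 1-3 in order. Returns true if the bio has already been
 * completed by the time the sender gets back to its segment loop. */
bool model_replay_window(void)
{
    struct model_req req = { RQ_LOCAL_PENDING | RQ_NET_PENDING, false };

    /* Step 1: the last segment is on the wire, but the sender has not
     * yet gone back to the top of its loop. No flag changes here. */

    /* Step 2: the ack arrives and clears RQ_NET_PENDING. Running the
     * check now completes nothing, because local IO is still pending. */
    req.flags &= ~RQ_NET_PENDING;
    model_req_may_be_done(&req);
    assert(!req.master_bio_completed);

    /* Step 3: local IO completes, clears RQ_LOCAL_PENDING and runs the
     * completion check, which now sees both flags clear. */
    req.flags &= ~RQ_LOCAL_PENDING;
    model_req_may_be_done(&req);

    /* Back in the sender: this is the state it sees when it next
     * touches the bio. */
    return req.master_bio_completed;
}
```

In this model model_replay_window() returns true, i.e. the sender comes
back to a bio that has already been completed underneath it.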
I realize this is a very small window but, as the saying goes, where
there's a window there's a bug...
Seems to me that req_may_be_done should not complete the master bio
unless RQ_NET_SENT is set. Maybe the completed_ok: case in req_mod
should test this, similar to what is done in recv_acked_by_peer:,
although it seems to me that this test should actually be buried in
req_may_be_done, since if this flag is not set, the request is not done!
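As a rough illustration of what I mean, here is a user-space sketch in C
with the RQ_NET_SENT test buried in the completion check. This is not a
patch against DRBD: the struct, the guarded_ function names and the flag
bit values are assumptions for the sake of the example, and it only
covers requests that are actually sent to the peer. It also assumes the
sender re-runs the check once it has set RQ_NET_SENT, otherwise the
request would never complete.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model of the request flags; the bit values are arbitrary. */
enum {
    RQ_LOCAL_PENDING = 1 << 0,
    RQ_NET_PENDING   = 1 << 1,
    RQ_NET_SENT      = 1 << 2,
};

struct guarded_req {
    unsigned flags;
    bool master_bio_completed;  /* stands in for "bio completed and freed" */
};

/* Completion check with the extra guard: the master bio is completed
 * only when nothing is pending AND the sender has marked the request as
 * fully sent. Returns true if this call completed the bio. */
bool guarded_req_may_be_done(struct guarded_req *req)
{
    assert(req);
    if (req->flags & (RQ_LOCAL_PENDING | RQ_NET_PENDING))
        return false;           /* something is still outstanding */
    if (!(req->flags & RQ_NET_SENT))
        return false;           /* sender may still be walking the bio */
    req->master_bio_completed = true;
    return true;
}

/* Assumed sender epilogue: once it is done with the bio it sets
 * RQ_NET_SENT and re-runs the check, so a request whose ack and local
 * completion both arrived early is still completed exactly once. */
bool guarded_sender_finished(struct guarded_req *req)
{
    req->flags |= RQ_NET_SENT;
    return guarded_req_may_be_done(req);
}
```

With the same ordering as in the hypothesis (ack first, then local
completion, then the sender returning), the check refuses to complete
the bio until guarded_sender_finished() has run.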
Simon