[Drbd-dev] DRBD-8 - crash due to NULL page* in drbd_send_page

Philipp Reisner philipp.reisner at linbit.com
Wed Aug 16 10:44:31 CEST 2006


On Tuesday, 15 August 2006 21:46, Graham, Simon wrote:
> Have now traced the network and I am very confused -- I'm still
> convinced that the problem is that we are still in drbd_send_zc_bio when
> the Ack for the write is received BUT the data is correctly and
> completely sent on the wire to the peer who turns around and sends a
> WriteAck to it.
>
> I suppose it's theoretically possible that sending the final portion of
> the data from drbd_send_zc_bio might end up being pended; maybe the pipe
> is full when we go to send it which causes the worker thread to get
> suspended. That being the case, it's possible that this thread doesn't
> get rescheduled until waaaaay later - specifically, AFTER the Ack has
> been received and the bio completed and freed -- now we return to the
> worker thread and attempt to continue to loop through the (now free) bio
> with __bio_for_each_segment -- does this seem feasible?
>
> Assuming for the minute that this IS the cause, what would a suitable
> solution be? We really need to delay processing the Ack until the
> send-dblock/send-block has finished -- i.e. we should wait until the
> RQ_DRBD_ON_WIRE flag is set in the request -- is there something
> suitable we could issue a wait_event_interruptible() on in
> got_BlockAck() to wait for this?
>

Simon, 

I think a suitable solution would be to complete the request only
after all three of the following have happened:
1) it was written locally,
2) the ack was received, and
3) we finished sending it [new].
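
A minimal sketch of the three-condition rule above, written as plain
userspace C11 rather than kernel code. It is not the attached patch:
the flag names, struct request, and request_event() are hypothetical,
and the "completed" counter merely stands in for completing the bio.
The idea is that each event sets one bit, and only the caller whose
bit fills the set performs the completion, whatever order the events
arrive in.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical event bits; these are illustrative names, not DRBD's. */
enum {
    REQ_LOCAL_DONE   = 1u << 0, /* 1) written to the local disk      */
    REQ_ACK_RECEIVED = 1u << 1, /* 2) the peer's ack has arrived     */
    REQ_SEND_DONE    = 1u << 2, /* 3) the sender is done with it [new] */
    REQ_ALL_DONE     = REQ_LOCAL_DONE | REQ_ACK_RECEIVED | REQ_SEND_DONE
};

struct request {
    atomic_uint done; /* bitmask of the events seen so far */
    int completed;    /* stand-in for completing and freeing the bio */
};

/*
 * Record one event on the request. Returns true only for the single
 * caller whose event makes the set complete; that caller is the one
 * that finishes the request.
 */
static bool request_event(struct request *req, unsigned int event)
{
    /* Atomically set our bit and learn which bits were set before. */
    unsigned int before = atomic_fetch_or(&req->done, event);

    assert((before & event) == 0); /* each event is reported once */

    if ((before | event) != REQ_ALL_DONE)
        return false; /* some other event is still outstanding */

    req->completed++; /* safe: nobody else touches the request now */
    return true;
}
```

With this shape, an ack that arrives while the sending thread is still
walking the bio only sets REQ_ACK_RECEIVED; the completion happens
later, when the sender reports REQ_SEND_DONE.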

I have attached the patch; note that it is completely untested.
I assume you will rerun your tests with it.

I take from Lars' mail yesterday that he could not reproduce this
problem on our main test cluster here, so it is up to you
to verify it.

-philipp
-- 
: Dipl-Ing Philipp Reisner                      Tel +43-1-8178292-50 :
: LINBIT Information Technologies GmbH          Fax +43-1-8178292-82 :
: Schönbrunnerstr 244, 1120 Vienna, Austria    http://www.linbit.com :
-------------- next part --------------
A non-text attachment was scrubbed...
Name: for_simon.diff
Type: text/x-diff
Size: 1275 bytes
Desc: not available
Url : http://lists.linbit.com/pipermail/drbd-dev/attachments/20060816/08fe5807/for_simon-0001.bin

