[Drbd-dev] DRBD-8 - system hangs when NegDReply received

Lars Ellenberg Lars.Ellenberg at linbit.com
Thu Sep 7 11:28:50 CEST 2006


/ 2006-09-06 10:09:31 +0200
\ Lars Ellenberg:
> / 2006-09-05 21:41:36 -0400
> \ Graham, Simon:
> > I'd still like to understand why simply completing the original request
> > with an error similar to what is done in receive_DataReply leads to a
> > hang - all suggestions gratefully received - this is what the NegDReply
> > code looks like now:
> > 
> > STATIC int got_NegDReply(drbd_dev *mdev, Drbd_Header* h)
> > {
> > 	drbd_request_t *req;
> > 	Drbd_BlockAck_Packet *p = (Drbd_BlockAck_Packet*)h;
> > 	sector_t sector = be64_to_cpu(p->sector);
> > 
> > 	req = (drbd_request_t *)(unsigned long)p->block_id;
> > 	if(unlikely(!drbd_pr_verify(mdev,req,sector))) {
> > 		ERR("Got a corrupt block_id/sector pair(3).\n");
> > 		return FALSE;
> > 	}
> > 
> > 	ERR("Got NegDReply; Sector %llx, len %x; Fail original request.\n",
> > 	    (unsigned long long)sector,be32_to_cpu(p->blksize));
> > 
> > 	spin_lock(&mdev->pr_lock);
> > 	hlist_del(&req->colision);
> > 	spin_unlock(&mdev->pr_lock);
> > 
> > 	/* Complete original request with error */
> > 	drbd_bio_endio(req->master_bio,0 /* failed */);
> 
> I am still working on a monster patch to consolidate all the
> request functionality in one place, so it is more obvious what should
> and should not happen.
> I may be wrong here, but you cannot simply end the master request and
> free the req because you get a NegDReply. the local part (submit_bio)
> may still be on the fly.
> you have to use drbd_end_req with appropriate flags...

nonsense on my part. a NegDReply is the answer to a read request, so there
is no local request pending for that one...
sorry, I have been too deep in other areas of the code...

so this should just work as you coded it.
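
To illustrate the distinction, here is a toy model of it. It is only a sketch: every name in it (model_req, LEG_LOCAL, LEG_NET, model_leg_done, ...) is hypothetical and none of it is DRBD's real request code. It assumes a write has a local and a network part, while a read that was sent to the peer has only the network part, so a negative reply leaves nothing in flight and the master request may be completed with an error right away.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model, NOT DRBD's real data structures. */
enum model_leg {
    LEG_LOCAL = 1 << 0,   /* local submit_bio part */
    LEG_NET   = 1 << 1    /* part sent to the peer */
};

struct model_req {
    unsigned pending;     /* bitmask of legs still in flight */
    bool     failed;      /* set once any leg reports an error */
    bool     completed;   /* master request already ended? */
};

/* A write goes to the local disk and to the peer. */
static struct model_req model_write(void)
{
    return (struct model_req){ LEG_LOCAL | LEG_NET, false, false };
}

/* A read served by the peer has no local part (assumption of this model). */
static struct model_req model_remote_read(void)
{
    return (struct model_req){ LEG_NET, false, false };
}

/*
 * One leg has finished, successfully (ok) or not.
 * Returns true exactly once: when the last pending leg is gone and the
 * master request may be completed (with an error if r->failed is set).
 */
static bool model_leg_done(struct model_req *r, enum model_leg which, bool ok)
{
    assert(r->pending & (unsigned)which);   /* leg must have been in flight */

    r->pending &= ~(unsigned)which;
    if (!ok)
        r->failed = true;

    if (r->pending == 0 && !r->completed) {
        r->completed = true;
        return true;
    }
    return false;
}
```

In this model, model_leg_done(&rd, LEG_NET, false) on a request from model_remote_read() returns true immediately, which corresponds to ending the master bio with an error directly in the NegDReply handler. The same call on a request from model_write() returns false, because the local part is still outstanding.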

> 
> 
> > 
> > 	dec_ap_bio(mdev);
> > 	dec_ap_pending(mdev);
> > 
> > 	drbd_req_free(req);
> > 
> > 	drbd_khelper(mdev,"pri-on-incon-degr");

well, what is your pri-on-incon-degr handler?
if that happens to be "halt -f", it would pretty much explain the "hang",
right?

> > 
> > 	return TRUE;
> > }
> 

-- 
: Lars Ellenberg                                  Tel +43-1-8178292-55 :
: LINBIT Information Technologies GmbH            Fax +43-1-8178292-82 :
: Schoenbrunner Str. 244, A-1120 Vienna/Europe   http://www.linbit.com :

