[Drbd-dev] DRBD-8 - system hangs when NegDReply received

Lars Ellenberg Lars.Ellenberg at linbit.com
Wed Sep 6 10:09:31 CEST 2006


/ 2006-09-05 21:41:36 -0400
\ Graham, Simon:
> I'd still like to understand why simply completing the original request
> with an error similar to what is done in receive_DataReply leads to a
> hang - all suggestions gratefully received - this is what the NegDReply
> code looks like now:
> 
> STATIC int got_NegDReply(drbd_dev *mdev, Drbd_Header* h)
> {
> 	drbd_request_t *req;
> 	Drbd_BlockAck_Packet *p = (Drbd_BlockAck_Packet*)h;
> 	sector_t sector = be64_to_cpu(p->sector);
> 
> 	req = (drbd_request_t *)(unsigned long)p->block_id;
> 	if(unlikely(!drbd_pr_verify(mdev,req,sector))) {
> 		ERR("Got a corrupt block_id/sector pair(3).\n");
> 		return FALSE;
> 	}
> 
> 	ERR("Got NegDReply; Sector %llx, len %x; Fail original request.\n",
> 	    (unsigned long long)sector,be32_to_cpu(p->blksize));
> 
> 	spin_lock(&mdev->pr_lock);
> 	hlist_del(&req->colision);
> 	spin_unlock(&mdev->pr_lock);
> 
> 	/* Complete original request with error */
> 	drbd_bio_endio(req->master_bio,0 /* failed */);

I am still working on a monster patch to consolidate all the
request functionality in one place, so it is more obvious what should
and should not happen.
I may be wrong here, but you cannot simply end the master request and
free the req just because you got a NegDReply: the local part (submit_bio)
may still be in flight.
You have to use drbd_end_req with the appropriate flags...


> 
> 	dec_ap_bio(mdev);
> 	dec_ap_pending(mdev);
> 
> 	drbd_req_free(req);
> 
> 	drbd_khelper(mdev,"pri-on-incon-degr");
> 
> 	return TRUE;
> }

-- 
: Lars Ellenberg                                  Tel +43-1-8178292-55 :
: LINBIT Information Technologies GmbH            Fax +43-1-8178292-82 :
: Schoenbrunner Str. 244, A-1120 Vienna/Europe   http://www.linbit.com :

