[DRBD-cvs] svn commit by lars - r2290 - branches/drbd-0.7/drbd - in drbd_worker, the semaphore count and list item count

drbd-cvs at lists.linbit.com
Thu Jul 20 14:17:05 CEST 2006


Author: lars
Date: 2006-07-20 14:16:40 +0200 (Thu, 20 Jul 2006)
New Revision: 2290

Modified:
   branches/drbd-0.7/drbd/drbd_int.h
   branches/drbd-0.7/drbd/drbd_worker.c
Log:
In drbd_worker, the semaphore count and the work list item count could get
out of sync; fixed that by doing the up() inside the spinlock.

Also made the worker more robust, in case the semaphore count and the list
items somehow get out of sync again.


Modified: branches/drbd-0.7/drbd/drbd_int.h
===================================================================
--- branches/drbd-0.7/drbd/drbd_int.h	2006-07-20 12:12:42 UTC (rev 2289)
+++ branches/drbd-0.7/drbd/drbd_int.h	2006-07-20 12:16:40 UTC (rev 2290)
@@ -1244,8 +1244,9 @@
 	unsigned long flags;
 	spin_lock_irqsave(&mdev->req_lock,flags);
 	list_add(&w->list,&q->q);
+	up(&q->s); /* within the spinlock,
+		      see comment near end of drbd_worker() */
 	spin_unlock_irqrestore(&mdev->req_lock,flags);
-	up(&q->s);
 }
 
 static inline void
@@ -1255,8 +1256,9 @@
 	unsigned long flags;
 	spin_lock_irqsave(&mdev->req_lock,flags);
 	list_add_tail(&w->list,&q->q);
+	up(&q->s); /* within the spinlock,
+		      see comment near end of drbd_worker() */
 	spin_unlock_irqrestore(&mdev->req_lock,flags);
-	up(&q->s);
 }
 
 static inline void wake_asender(drbd_dev *mdev) {

Modified: branches/drbd-0.7/drbd/drbd_worker.c
===================================================================
--- branches/drbd-0.7/drbd/drbd_worker.c	2006-07-20 12:12:42 UTC (rev 2289)
+++ branches/drbd-0.7/drbd/drbd_worker.c	2006-07-20 12:16:40 UTC (rev 2290)
@@ -956,7 +956,21 @@
 
 		w = 0;
 		spin_lock_irq(&mdev->req_lock);
-		D_ASSERT(!list_empty(&mdev->data.work.q));
+		ERR_IF(list_empty(&mdev->data.work.q)) {
+			/* something terribly wrong in our logic.
+			 * we were able to down() the semaphore,
+			 * but the list is empty... doh.
+			 *
+			 * what is the best thing to do now?
+			 * try again from scratch, restarting the receiver,
+			 * asender, whatnot? could break even more ugly,
+			 * e.g. when we are primary, but no good local data.
+			 *
+			 * I'll try to get away just starting over this loop.
+			 */
+			spin_unlock_irq(&mdev->req_lock);
+			continue;
+		}
 		w = list_entry(mdev->data.work.q.next,struct drbd_work,list);
 		list_del_init(&w->list);
 		spin_unlock_irq(&mdev->req_lock);
@@ -1020,6 +1034,11 @@
 	ERR_IF(!list_empty(&mdev->data.work.q))
 		goto again;
 	sema_init(&mdev->data.work.s,0);
+	/* DANGEROUS race: if someone did queue his work within the spinlock,
+	 * but up() ed outside the spinlock, we could get an up() on the
+	 * semaphore without corresponding list entry.
+	 * So don't do that.
+	 */
 	spin_unlock_irq(&mdev->req_lock);
 
 	INFO("worker terminated\n");
