[DRBD-cvs] svn commit by phil - r2737 - trunk/drbd - All the kudos to Enesto & Simon for finding this.
drbd-cvs at lists.linbit.com
Wed Jan 31 18:23:48 CET 2007
Author: phil
Date: 2007-01-31 18:23:47 +0100 (Wed, 31 Jan 2007)
New Revision: 2737
Modified:
trunk/drbd/drbd_receiver.c
trunk/drbd/drbd_worker.c
Log:
All the kudos to Enesto & Simon for finding this.
All the blame to me for the patch.
When an IO error happened during resync, our drbd_rs_complete_io() function
might trigger a kernel BUG when mdev->resync is already NULL.
We need to protect calls to drbd_rs_begin_io() and drbd_rs_complete_io()
by holding a reference with inc_local()/dec_local().
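In sketch form, the pattern applied at every affected call site is the
following (a minimal sketch of the patch below, not a quote of it):

	/* unsafe during resync teardown: mdev->resync may already be NULL */
	drbd_rs_complete_io(mdev, sector);

	/* safe: pin the local disk first. inc_local_if_state(mdev,Failed)
	 * still succeeds while the disk is merely Failed, so error cleanup
	 * can run; it fails once the disk is gone entirely. */
	if (inc_local_if_state(mdev, Failed)) {
		drbd_rs_complete_io(mdev, sector);
		dec_local(mdev);
	}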
Here are Simon's comments, edited by me to match my patch:
1. got_NegRSDReply -- add inc_local_if_state()/dec_local() here as well.
Make sure dec_rs_pending() is still called unconditionally; only
drbd_rs_complete_io() and drbd_rs_failed_io() should be protected by the
inc_local_if_state(Failed) guard (a sketch of the guard semantics
follows this list).
2. w_make_resync_request -- Added inc_local() and dec_local() in order
to protect both drbd_rs_begin_io() and drbd_rs_complete_io().
3. w_e_end_rsdata_req -- this runs as a worker item at the end of
processing a resync data request. The bio completion routine is
drbd_endio_read_sec(), which decrements the local count on the mdev
before the work item runs. Since drbd_endio_read_sec() is also used for
reads on behalf of a diskless peer, I decided not to modify it.
Instead I grab a reference directly in w_e_end_rsdata_req().
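For reference, a minimal sketch of what the two guards are assumed to do
here (field and type names are shorthand; the real inline functions in
the drbd headers differ in detail):

	/* Sketch only, not the drbd source.  Take a reference on the
	 * local disk; succeed only if the disk state is at least 'mins'.
	 * With mins == Failed, cleanup like drbd_rs_complete_io() may
	 * still run against a failed disk.  Every successful call must
	 * be paired with dec_local(mdev). */
	static inline int inc_local_if_state(drbd_dev *mdev, int mins)
	{
		atomic_inc(&mdev->local_cnt);
		if (mdev->state.disk >= mins)
			return 1;	/* reference held */
		dec_local(mdev);	/* disk already gone */
		return 0;
	}

inc_local(mdev) is the same idea with a healthy-disk minimum, which is
why w_make_resync_request below simply aborts resync when it fails.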
Modified: trunk/drbd/drbd_receiver.c
===================================================================
--- trunk/drbd/drbd_receiver.c 2007-01-31 14:06:28 UTC (rev 2736)
+++ trunk/drbd/drbd_receiver.c 2007-01-31 17:23:47 UTC (rev 2737)
@@ -3254,10 +3254,12 @@
 	dec_rs_pending(mdev);
 
-	drbd_rs_complete_io(mdev,sector);
-	drbd_rs_failed_io(mdev, sector, size);
-
+	if(inc_local_if_state(mdev,Failed)) {
+		drbd_rs_complete_io(mdev,sector);
+		drbd_rs_failed_io(mdev, sector, size);
+		dec_local(mdev);
+	}
 	return TRUE;
 }
Modified: trunk/drbd/drbd_worker.c
===================================================================
--- trunk/drbd/drbd_worker.c 2007-01-31 14:06:28 UTC (rev 2736)
+++ trunk/drbd/drbd_worker.c 2007-01-31 17:23:47 UTC (rev 2737)
@@ -287,6 +287,16 @@
 	}
 
 	number -= atomic_read(&mdev->rs_pending_cnt);
 
+	if(!inc_local(mdev)) {
+		/* Since we only need to access mdev->resync an
+		   inc_local_if_state(mdev,Failed) would be sufficient, but
+		   to continue resync with a broken disk makes no sense at
+		   all */
+		ERR("Disk broke down during resync!\n");
+		mdev->resync_work.cb = w_resync_inactive;
+		return 1;
+	}
+
 	for(i=0;i<number;i++) {
 	next_sector:
@@ -299,6 +309,7 @@
 		 * or make this the _only_ place that is allowed
 		 * to assign w_resync_inactive! */
 		mdev->resync_work.cb = w_resync_inactive;
+		dec_local(mdev);
 		return 1;
 	}
 
@@ -365,6 +376,7 @@
 				       sector,size,ID_SYNCER)) {
 			ERR("drbd_send_drequest() failed, aborting...\n");
 			dec_rs_pending(mdev);
+			dec_local(mdev);
 			return 0;
 		}
 	}
@@ -377,11 +389,13 @@
 		 * until then resync "work" is "inactive" ...
 		 */
 		mdev->resync_work.cb = w_resync_inactive;
+		dec_local(mdev);
 		return 1;
 	}
 
 requeue:
 	mod_timer(&mdev->resync_timer, jiffies + SLEEP_TIME);
+	dec_local(mdev);
 	return 1;
 }
@@ -549,7 +563,10 @@
 		return 1;
 	}
 
-	drbd_rs_complete_io(mdev,e->sector);
+	if(inc_local_if_state(mdev,Failed)) {
+		drbd_rs_complete_io(mdev,e->sector);
+		dec_local(mdev);
+	}
 
 	if(likely(drbd_bio_uptodate(e->private_bio))) {
 		if (likely( mdev->state.pdsk >= Inconsistent )) {