Note: "permalinks" may not be as permanent as we would like,
direct links of old sources may well be a few messages off.
Sorry this post spent so much time in some moderation queue, apparently you don't post from your subscription address, or you are not subscribed. Anyways, see below. On Tue, Nov 09, 2010 at 05:40:02PM +0100, Thomas Vögtle wrote: > Lars Ellenberg wrote: > > > > If your kernel source looks like mine, then this would indicate something in > > between spin_lock_irqsave and spin_unlock_irqrestore above would enable > > spinlocks again, where is must not. > > > My kernel source is 2.6.32.25 (vanilla). > > > > If that something is some part of DRBD, then that would be a serious bug. > > > > If you run with spin lock debug enabled, that may provide some more insight. > > We'll try to reproduce here anyways. > > Switched on: > CONFIG_DEBUG_SPINLOCK=y > CONFIG_DEBUG_SPINLOCK_SLEEP=y > > Hope this helps: No it does not... no new information than in your previous post. Anyways, we do it wrong... commit 7f9c6c210158d212cc2c7be6d6b4d289078ab735 Author: Lars Ellenberg <lars.ellenberg at linbit.com> Date: Wed Nov 10 10:33:21 2010 +0100 drbd: use irqsave in bio endio callback We used spin_lock_irq, spin_unlock_irq. The later may re-enable irq too early if we have been called with irq disabled, opening up a window for all sorts of problems. diff --git a/drbd/drbd_req.h b/drbd/drbd_req.h index 2260e4f..f759b05 100644 --- a/drbd/drbd_req.h +++ b/drbd/drbd_req.h @@ -338,18 +338,21 @@ static inline int _req_mod(struct drbd_request *req, enum drbd_req_event what) return rv; } -/* completion of master bio is outside of spinlock. - * If you need it irqsave, do it your self! */ +/* completion of master bio is outside of our spinlock. + * We still may or may not be inside some irqs disabled section + * of the lower level driver completion callback, so we need to + * spin_lock_irqsave here. */ static inline int req_mod(struct drbd_request *req, enum drbd_req_event what) { + unsigned long flags; struct drbd_conf *mdev = req->mdev; struct bio_and_error m; int rv; - spin_lock_irq(&mdev->req_lock); + spin_lock_irqsave(&mdev->req_lock, flags); rv = __req_mod(req, what, &m); - spin_unlock_irq(&mdev->req_lock); + spin_unlock_irqrestore(&mdev->req_lock, flags); if (m.bio) complete_master_bio(mdev, &m); only, we do it wrong for a long time already. So I don't really see, why it would only show up in 8.3.9... Hm. Wait. No, we used to do it correct. My bad. in commit 9b7f76dc37919ea36caa9680a3f765e5b19b25fb Author: Lars Ellenberg <lars.ellenberg at linbit.com> Date: Wed Aug 11 23:40:24 2010 +0200 drbd: new configuration parameter c-min-rate We now track the data rate of locally submitted resync related requests, and can thus detect non-resync activity on the lower level device. If the current sync rate is above c-min-rate, and the lower level device appears to be busy, we throttle the resyncer. a bad chunk slipped through, replacing the correct spin_lock_irqsave;__req_mod; etc.. with a plain req_mod(), which only does spin_lock_irq. Sorry. I'll revert req_mod to plain spin_lock_irq, and revert the endio callback to use the spin_lock_irqsave. I think that should do it. Thanks. -- : Lars Ellenberg : LINBIT | Your Way to High Availability : DRBD/HA support and consulting http://www.linbit.com DRBD® and LINBIT® are registered trademarks of LINBIT, Austria. __ please don't Cc me, but send to list -- I'm subscribed