[ comment by Lars Ellenberg: ]
[ Repost on request of Diego, so it appears in the archives. ]
[ This post seems to have been lost somehow. The problem described ]
[ has been solved since, with release of drbd 0.7.14... ]

> On Mon, Oct 3 at 16:04, Phil wrote:
> > On Fri, 2005-09-30 at 18:58, Diego Liziero wrote:
> > > Today we tried to switch to drbd 0.7.11 and everything worked fine.
> > >
> > > We used the same kernel that repeatedly froze with drbd 0.7.13
> > > but with Spinlock debugging and nmi_watchdog enabled.
> > >
> > > As regards our environment we can say that drbd 0.7.13 has some
> > > instability issues that disappeared (by now) using the
> > > earlier 0.7.11 version with the same kernel with Spinlock debugging and
> > > nmi_watchdog enabled.
> > >
> > > Sorry if we can't give more information,
> > > but our cluster is in production and we couldn't repeat today
> > > the kernel panic of drbd 0.7.13 with the spinlock debug option
> > > switched on.
> > >
> > > Regards,
> > > Diego.
> >
> > Hi Diego,
>
> Hello Phil, sorry for not quoting the previous mailing-list
> thread discussion.
>
> > What do you mean by freeze:
> > * Does it respond to key-strokes ?
> > * Does it respond to pings ?
> > * Does it respond to the "Num-Lock" key with toggling the "num-Lock" led ?
> > * Is the screen blank, or has it the same content as before the freeze ?
>
> As previously reported to Lars, the system is a multiprocessor
> (4 Xeon) SMP cluster, in production for little more than a year.
> We have been using drbd 0.6.12 without any problem up to the point
> where we decided to upgrade to the 0.7.x branch.
>
> The lock we had in 0.7.13 happened repeatedly during the first
> resync of the last and biggest partition while already primary and in use.
> At that time we didn't have nmi_watchdog and spinlock debugging,
> sorry about that.
>
> Keyboard was not responding (no led of caps-lock/num-lock was
> changing state), ping was working (but I know that
> sometimes it works even after kernel panics), the screen was blank
> (maybe the console blank screen?) apart from one time where
> the end of an Oops message was readable
> (I remember something about tasker, irq and smp)
> but I couldn't read more because the keyboard
> shift-pageup wasn't working.
>
> > [..]
> > You enabled nmi_watchdog and spinlock debugging for use with 0.7.11 ?
>
> Yes, that's what Lars told me to do, we didn't try 0.7.13 again
> because the cluster is in production and we couldn't
> test again the version that caused the freeze.
>
> Each 0.7.13 freeze has occurred after a noticeable slowdown of the
> resync process (at about the 10% of the normal speed).
>
> Actually even with 0.7.11 we had initially a slowdown during
> the resync of the biggest partition (the same that never
> went over the 5% of the resync process before freezing
> with drbd 0.7.13), with a high load average (over 150),
> but without any lock.
> Then we decided to invalidate the secondary box because
> after the 0.7.13 we went back to 0.6.12 before the new
> upgrade to 0.7.11, and we thought that maybe the metadata was wrong.
> After this forced resync (that we did before the trouble
> partition completed the resync), no slowdown occurred and the load
> stayed below 5.
>
> Regards,
> Diego.