On Mon, Jun 30, 2008 at 04:13:41PM +0200, Stefan Löfgren wrote:
> Oops... Forgot to add that I was starting "drbdadm verify r0" during weekends
> (manually)... ;)
>
> Yes, I know. I have to reproduce the error to be able to fix it. I can
> reproduce it, but not right now. I'll get a call very early tomorrow morning
> if I do ;) I have to wait for a service-stop and/or install 2 more machines
> when I do find the time to do that (by the way, DELL PowerEdge R200 quad core)...
>
> This is not a "bug-report", just an "experience-report". Online verify completes
> the task and it took up to 24 hours (maybe more) before I saw any problems.
> The problem that I saw was that the system was "degrading" in functionality.

duh.

> 1) "Working fine". Everything worked fine. Even verify was working.
> 2) "Hmm. Now, what? Timeout? Ahh, there it is!". The system suddenly stopped
> responding to ping (for example) for a while, or some services were just dead.
> 3) "Hmm... Refusing connections? But I'm logged in already on another SSH!".
> Ping was gone again. Completely gone. Refused new SSH. But I was logged in over
> an existing SSH connection, so the network was there. The mouse might work,
> but not the keyboard. X could have gone dead. Lots of strange things, in short.
> When I saw this I never logged out of my SSH connection.
> 4) "Oops. That's not good! Reboot!". My SSH connection died. Keyboard
> lockouts. Blank screens (even when using X-Windows). Sometimes a kernel panic.
>
> This strange behaviour disappeared when I stopped using verify during
> weekends.

"strange" is a very friendly word, here.

> That's the only difference. I know it sounds lame and strange, and I
> know that it looks like I'm a complete newbie. But after replacing a lot of
> hardware, changing/reconfiguring/recompiling kernels and hours and hours of
> thinking, the only thing left to try was to stop using verify. It's been
> working for over a month now (before, it crashed once a week, Sunday or Monday).
>
> I'll get back when I've installed two more or can reproduce the error...

yes, _please_.

because, if those "strange things" were caused by drbd verify,
that would mean we are doing something very bad.
memory leak, or a serious "random memory corruption with slow side effects"
(never seen such a thing yet; usually, if you corrupt memory, it goes
either boom immediately, or goes unnoticed until someone recalculates
some obviously wrong results using pencil and paper).

basically, I have no idea what could cause such behaviour,
in or outside drbd.

-- 
: Lars Ellenberg                            http://www.linbit.com :
: DRBD/HA support and consulting             sales at linbit.com :
: LINBIT Information Technologies GmbH      Tel +43-1-8178292-0  :
: Vivenotgasse 48, A-1120 Vienna/Europe     Fax +43-1-8178292-82 :
__
please don't Cc me, but send to list -- I'm subscribed