[DRBD-user] drbdadm verify all seems to produce false positives on ext3 and crash the server
lars.ellenberg at linbit.com
Mon Jun 30 17:47:16 CEST 2008
On Mon, Jun 30, 2008 at 04:13:41PM +0200, Stefan Löfgren wrote:
> Oops... Forgot to add that I was starting "drbdadm verify r0" during weekends
> (manually)... ;)
> Yes, I know. I have to reproduce the error to be able to fix it. I can
> reproduce it, but not right now. I'll get a call very early tomorrow morning
> if I do ;) I have to wait for a service-stop and/or install 2 more machines
> when I do find the time to do that (by the way, DELL PowerEdge R200 quad core)...
> This is not a "bug-report", just an "experience-report". Online verify completes
> the task and it took up to 24 hours (maybe more) before I saw any problems.
> The problem I saw was that the system was "degrading" in functionality.
> 1) "Working fine". Everything worked fine. Even verify was working.
> 2) "Hmm. Now, what? Timeout? Ahh, there it is!". The system suddenly stopped
> responding to ping (for example) for a while, or some services just died.
> 3) "Hmm... Refusing connections? But I'm logged in already on another SSH!".
> Ping was gone again. Completely gone. Refused new SSH. But I was logged in over
> an existing SSH-connection, so the network was there. The mouse might work,
> but not the keyboard. X could have gone dead. Lots of strange things in short
> words. When I saw this I never logged out my SSH connection.
> 4) "Oops. That's not good! Reboot!". My SSH-connection died. Keyboard
> lockouts. Blank screens (even when using X-Windows). Sometimes a kernel-panic.
> This strange behaviour disappeared when I stopped using verify during
"strange" is a very friendly word, here.
> That's the only difference. I know it sounds lame and strange, and I
> know that it looks like I'm a complete newbie. But after replacing a lot of
> hardware, changing/reconfiguring/recompiling kernels and hours and hours of
> thinking, the only thing left to try was to stop using verify. It's been
> working for over a month now (before that, it crashed once a week, Sunday or Monday).
> I'll get back when I've installed two more or can reproduce the error...
because if those "strange things" were caused by drbd verify,
it would mean we are doing something very bad:
a memory leak, or a serious "random memory corruption with slow side
effects" (never seen such a thing yet; usually, if you corrupt memory,
it either goes boom immediately, or goes unnoticed until someone
recalculates some obviously wrong results using pencil and paper).
basically, I have no idea what could cause such behaviour,
in or outside drbd.
: Lars Ellenberg http://www.linbit.com :
: DRBD/HA support and consulting sales at linbit.com :
: LINBIT Information Technologies GmbH Tel +43-1-8178292-0 :
: Vivenotgasse 48, A-1120 Vienna/Europe Fax +43-1-8178292-82 :
please don't Cc me, but send to list -- I'm subscribed