[DRBD-user] drbdadm verify all seems to produce false positives on ext3 and crash the server

Lars Ellenberg lars.ellenberg at linbit.com
Mon Jun 30 17:47:16 CEST 2008



On Mon, Jun 30, 2008 at 04:13:41PM +0200, Stefan Löfgren wrote:
> Oops... Forgot to add that I was starting "drbdadm verify r0" during weekends
> (manually)... ;)
> 
> Yes, I know. I have to reproduce the error to be able to fix it. I can
> reproduce it, but not right now. I'll get a call very early tomorrow morning
> if I do ;) I have to wait for a service stop and/or install 2 more machines
> when I find the time to do that (by the way, DELL PowerEdge R200 quad core)...
> 
> This is not a "bug report", just an "experience report". Online verify completes
> the task, and it took up to 24 hours (maybe more) before I saw any problems.
> The problem I saw was that the system was "degrading" in functionality.

duh.

> 1) "Working fine". Everything worked fine. Even verify was working.
> 2) "Hmm. Now, what? Timeout? Ahh, there it is!". The system suddenly stopped
> responding to ping (for example) for a while, or some services just died.
> 3) "Hmm... Refusing connections? But I'm logged in already on another SSH!".
> Ping was gone again. Completely gone. Refused new SSH. But I was logged in over
> an existing SSH connection, so the network was there. The mouse might work,
> but not the keyboard. X could have gone dead. Lots of strange things, in
> short. When I saw this I never logged out of my SSH connection.
> 4) "Oops. That's not good! Reboot!". My SSH-connection died. Keyboard
> lockouts. Blank screens (even when using X-Windows). Sometimes a kernel-panic.
> 
> This strange behaviour disappeared when I stopped using verify during
> weekends.

"strange" is a very friendly word, here.

> That's the only difference. I know it sounds lame and strange, and I
> know that it looks like I'm a complete newbie. But after replacing a lot of
> hardware, changing/reconfiguring/recompiling kernels and hours and hours of
> thinking, the only thing left to try was to stop using verify. It's been
> working for over a month now (before, it crashed once a week, Sunday or Monday).
> 
> I'll get back when I've installed two more or can reproduce the error...

yes, _please_.

because, if those "strange things" were caused by drbd verify,
that would mean we are doing something very bad:
a memory leak, or a serious "random memory corruption with slow side
effects" (never seen such a thing yet; usually, if you corrupt memory,
it either goes boom immediately, or goes unnoticed until someone
recalculates some obviously wrong results using pencil and paper).

basically, I have no idea what could cause such behaviour,
in or outside drbd.

-- 
: Lars Ellenberg                           http://www.linbit.com :
: DRBD/HA support and consulting             sales at linbit.com :
: LINBIT Information Technologies GmbH      Tel +43-1-8178292-0  :
: Vivenotgasse 48, A-1120 Vienna/Europe     Fax +43-1-8178292-82 :
__
please don't Cc me, but send to list -- I'm subscribed


