[DRBD-user] Unidentified strange DRBD behaviors...

Sun Jun 12 14:43:41 CEST 2005

On Tuesday 07 June 2005 16:11, Lars Ellenberg wrote:
> Note that all processes waiting for disk io are counted as runnable!
> Therefore, if a lot of processes wait for disk io, the "load average" goes
> straight up, though the system actually may be almost idle cpu-wise ...
>
> E.g. crash your nfs server

It may well have been NFS-related, though I didn't know that and couldn't 
figure it out at the time.  I stopped everything that was running and the 
load still sat at 4, which was worrisome.
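
For reference, a rough way to check whether a high load figure comes from 
processes stuck waiting on io rather than from cpu work is to count process 
states under /proc.  The sketch below is only an illustration and assumes a 
Linux /proc layout; the function names (parse_state, count_states, report) 
are made up for this example and are not part of DRBD or any other tool.

```python
import glob


def parse_state(stat_line):
    """Return the one-letter process state from a /proc/<pid>/stat line."""
    # The command name sits in parentheses and may itself contain spaces
    # or parentheses, so split after the LAST closing parenthesis.
    after_comm = stat_line.rsplit(")", 1)[1]
    return after_comm.split()[0]


def count_states(stat_lines):
    """Count running (R) and uninterruptible-sleep (D) processes."""
    counts = {"R": 0, "D": 0}
    for line in stat_lines:
        state = parse_state(line)
        if state in counts:
            counts[state] += 1
    return counts


def read_stat_lines():
    """Collect the stat line of every current process (Linux only)."""
    lines = []
    for path in glob.glob("/proc/[0-9]*/stat"):
        try:
            with open(path) as f:
                lines.append(f.read())
        except OSError:
            # The process exited between listing and reading; skip it.
            continue
    return lines


def report():
    """Print the load average next to the R and D process counts."""
    with open("/proc/loadavg") as f:
        print("loadavg:", f.read().strip())
    print("states: ", count_states(read_stat_lines()))
```

Calling report() on an otherwise idle box that still shows a load of 4 
would be expected to list several processes in state D if hung io (for 
example a dead NFS mount) is the cause; that is an assumption to verify, 
not a guarantee.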

> bad ram? motherboard? southbridge or whatever?

Motherboard most likely.  Not RAM.

>   drbdadm reconnect all
> should have worked (on the one that was StandAlone ).

Thanks, I'll remember that one.
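
To tell which node is the StandAlone one before reconnecting, the connection 
state can be read from /proc/drbd.  The following is a minimal sketch, 
assuming each device line there starts with the minor number followed by a 
"cs:" field (as in " 0: cs:Connected ..."); the helper names are invented 
for this example, and the exact line layout should be checked against the 
DRBD version in use.

```python
import re


def connection_states(proc_drbd_text):
    """Map device minor number -> connection state from /proc/drbd text."""
    states = {}
    for line in proc_drbd_text.splitlines():
        # Assumed layout: optional spaces, minor number, colon, "cs:<state>".
        # Version banners and statistics lines do not match and are skipped.
        m = re.match(r"\s*(\d+):\s+cs:(\S+)", line)
        if m:
            states[int(m.group(1))] = m.group(2)
    return states


def standalone_devices(proc_drbd_text):
    """Return the sorted minor numbers whose state is StandAlone."""
    states = connection_states(proc_drbd_text)
    return sorted(minor for minor, cs in states.items() if cs == "StandAlone")


def read_proc_drbd(path="/proc/drbd"):
    """Read the status file; only works on a host with DRBD loaded."""
    with open(path) as f:
        return f.read()
```

On a DRBD host, standalone_devices(read_proc_drbd()) would list the devices 
that need the reconnect; an empty list means none are StandAlone.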

> but you can then
>   drbdadm invalidate
> on the one with the bad data (probably the current Secondary),

Okay.  Where is proper documentation on this?  I haven't been able to find 
much besides quick guides for setting things up, nothing really 
comprehensive.

> well. you should have done so before, and more importantly set the
> known BAD server to "inconsistent", so it will receive a full sync...

How?

> maybe some oddities in imap/maildir/symlink/header cache or some such?

No.  The individual mail files were not present on the system when we first 
brought up services. We looked in several maildir directories individually, 
and the files were simply not there.

When the files started re-appearing, they didn't come all at once, but just 
started showing up here and there.  One of our clients had called complaining 
about missing mail, and we told him that we would check our backups in case 
they were more recent than the old DRBD data (they weren't).  He later called 
back to say thanks, and that he saw his missing mail showing up in his mailbox 
one by one.

Looking at the filesystem, the files that had been there, and had then simply 
gone missing, were simply there again.

> or maybe it was just a meteorite shower, cosmic rays, you know :->

I'm pretty certain that somehow or other DRBD ended up working out the 
problem, because the files had *never been stored* on this drive, since it 
was out of sync, and the initial recovery sync did not copy these files over.

> btw, you are sure the hardware is ok (again) ?

We're sure that the *current* hardware is ok.  As described, the other server 
is bad.  Currently there is only one machine (we're installing a replacement 
for the secondary tonight).

Cheers,
-- 
Casey Allen Shobe | http://casey.shobe.info
cshobe at seattleserver.com | cell 425-443-4653
AIM & Yahoo:  SomeLinuxGuy | ICQ:  1494523
SeattleServer.com, Inc. | http://www.seattleserver.com