[DRBD-user] Re: filesystem corruptions

Lars Ellenberg Lars.Ellenberg at linbit.com
Sun Oct 9 14:40:03 CEST 2005

Note: "permalinks" may not be as permanent as we would like,
direct links of old sources may well be a few messages off.


/ 2005-10-09 15:43:06 +0400
\ Eugene Crosser:
> Hello people,
> 

Hi Eugene,
your bug reports have always been valuable...

> We have been running several HA NFS servers based on DRBD for more than a
> year, very successfully.  But recently we set up a new server on
> different hardware, and we ran into exactly the same problem as the
> 'bro' guy here.  That is, eventually we get an EXT3-fs error, and the
> system becomes r/o or panics, depending on configuration.
> 
> Lars Ellenberg writes:
> 
> >> two suggestions:
> >> 1. I once had weird problems with SATA drives that went away when I
> >> upgraded the system's BIOS. Sometimes drives also have upgradable
> >> firmware.  So look out for BIOS and drive firmware updates.
> >> 
> >> 2. I would try to eliminate DRBD as a cause of your problems: Create a
> >> filesystem directly on your SATA drives, without DRBD. Then do a
> >> stress test on it, e.g. use bonnie. If errors show up, you know it is
> >> not DRBD.
> > 
> > but if no errors show up, you still do not know for sure it is drbd:
> > problems may just be in another storage area, or
> > problems may only occur if you have simultaneous local I/O and network load,
> 
> I think that neither of these can be confirmed.
> If we mount the filesystem on /dev/md3 and export it over NFS, everything
> works like a charm.  No problems whatsoever.  All under full NFS load.
> (But without traffic over the Gigabit crossover link, I must admit.)
> If we mount the same filesystem on /dev/drbd0, we very soon get this error:
> 
> Oct  9 04:58:11 snfs1 kernel: EXT3-fs error (device drbd0):
> ext3_readdir: bad entry in directory #102023348: rec_len % 4 != 0 -
> offset=0, inode=33261, rec_len=4011, name_len=0
> 
> fsck does not notice any errors.
> [not 100% sure but apparently] if you mount it from /dev/md3, it works OK.
> (note: yes, the size of the filesystem is correct. It equals the drbd
> device size, which is 128 MB smaller than the md device.)
> 
> Looks like the filesystem error is in the in-memory structures, but not
> on the disk.  Yet if you umount and mount it again, you get the same
> error when you try to access the same data on the filesystem.  It is
> reproducible without any external load; you just need to try to read
> some particular directories to stumble over it.
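
To put that log message in context: while reading a directory, ext3
sanity-checks every directory entry and complains exactly like this
when a record length is not a multiple of four.  Below is a simplified,
stand-alone sketch of that kind of check (not the kernel's actual
ext3_check_dir_entry(), and the struct layout is abridged):

  /*
   * Simplified sketch of the validity check ext3 applies to each
   * directory entry while reading a directory.  Illustration only.
   */
  #include <stdio.h>

  struct dir_entry {
          unsigned int   inode;     /* inode number, 0 means unused */
          unsigned short rec_len;   /* length of this record, bytes */
          unsigned char  name_len;  /* length of the file name      */
  };

  static const char *check_dir_entry(const struct dir_entry *de,
                                     unsigned int block_size,
                                     unsigned int offset_in_block)
  {
          /* 8 bytes of fixed header plus name, rounded up to 4 bytes */
          unsigned int min_len = 8 + ((de->name_len + 3u) & ~3u);

          if (de->rec_len % 4 != 0)
                  return "rec_len % 4 != 0";   /* the error seen above */
          if (de->rec_len < min_len)
                  return "rec_len is too small for name_len";
          if (offset_in_block + de->rec_len > block_size)
                  return "directory entry across blocks";
          return NULL;   /* entry looks sane */
  }

  int main(void)
  {
          /* the values from the log message quoted above */
          struct dir_entry de = { .inode = 33261, .rec_len = 4011,
                                  .name_len = 0 };
          const char *err = check_dir_entry(&de, 4096, 0);

          printf("%s\n", err ? err : "ok");   /* rec_len % 4 != 0 */
          return 0;
  }

Note that 4011 is not divisible by 4 (4011 = 4 * 1002 + 3), so the very
first test fires; a single wrong byte, whether on disk or in the page
cache, is enough for that.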
> 
> Now, what is different on this system as compared to the working ones?
> First, "protocol A".  On working systems, we have "protocol C".
> Second, the underlying hardware (and consequently the driver).  On the
> working systems, it is Dell's Megaraid with SCSI disks (configured as
> RAID0).  On the non-working one, there is an on-board 8-port Marvell SATA
> controller with 6 disks, joined into software RAID5.  To be exact:
> "Marvell MV88SX6081 8-port SATA II PCI-X Controller (rev 03)".
> 
> I would like to hear from people:
> - is there anybody who successfully uses "protocol A" at all?

just as a side note: as long as no failover was involved to reproduce
the error, the drbd protocol is irrelevant.  The protocols only differ
in when a write is acknowledged (A: once it hit the local disk and the
tcp send buffer; C: only once the peer confirms it is on its disk);
reads are always served from the local disk, so the replica is never
even read from unless you actually fail over.

> - is there anybody who successfully uses on-board SATA controllers?
> Marvell in particular?  Which models?  Which driver?
> - What SATA controller does bro use? What driver?
> - Any other ideas?

We recently had a server which kept crashing on us.
We suspected, then confirmed, problems related to "ide dma"
transfers.  We reduced the "dma mode" of the disks (actually they
are SATA disks), but that only increased the time needed to confirm the
corruption.  We finally fell back to PIO mode, but not even that helped.
Now we have the very same disks behind some old controller (iirc
Promise) we had lying around, and they appear as SCSI devices...
and whoops, it all works.
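
For reference, stepping the modes down was done with the usual
IDE-layer knobs, roughly like this (device name is an example, and
this only works while the old ide driver handles the disks, not once
they show up as SCSI/libata devices):
 # hdparm -X34 /dev/hda   (force multiword DMA mode 2)
 # hdparm -d0 /dev/hda    (disable DMA, i.e. fall back to PIO)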

I then googled a little, and it seems that several people are seeing
strange problems with some "recent" [whatever that means] changes in
the kernel ide code.

Sorry that I cannot be more specific yet.  Still investigating.
Note, I am not saying that it is not DRBD's fault.
I am just saying we have too little evidence one way or the other...

You could run some file-level integrity checks, like debsums or rpm -V,
or try the "wbtest" thing from the drbd tgz, testing/CTH/wbtest/
 # cd somewhere; mkdir data
 # wbtest -v -p 0 -r 1000 -c 5 -m 16384 -M 1024000 -d data -l /root/wbtest.log
which will continuously create files filled with "random" data,
and after some time start to compare them against their expected
content... you'll need to adjust the parameters so that the test is
demanding on the performance and memory of your system...
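
If building wbtest is inconvenient, the core idea fits in a page of C.
The sketch below is NOT wbtest (file names, counts and sizes here are
invented for illustration): it fills files from a PRNG seeded per file,
then re-reads them and compares every byte against the regenerated
stream.  Make the data set larger than your RAM, or the verify phase
will be served from the page cache instead of the disk:

  /*
   * Minimal write-then-verify sketch in the spirit of wbtest.
   * Phase 1 fills NFILES files from a PRNG seeded with the file
   * index; phase 2 re-reads them and checks every byte against
   * the regenerated stream.
   */
  #include <stdio.h>
  #include <stdlib.h>

  #define NFILES   100
  #define FILESIZE (1 << 20)   /* 1 MB each; raise well above RAM size */

  int main(void)
  {
          unsigned char buf[4096];
          char name[64];
          size_t off, j, n;
          int i;
          FILE *f;

          for (i = 0; i < NFILES; i++) {          /* phase 1: write */
                  snprintf(name, sizeof name, "data/f%04d", i);
                  if (!(f = fopen(name, "wb"))) { perror(name); return 1; }
                  srand(i);                        /* per-file seed */
                  for (off = 0; off < FILESIZE; off += sizeof buf) {
                          for (j = 0; j < sizeof buf; j++)
                                  buf[j] = rand() & 0xff;
                          fwrite(buf, 1, sizeof buf, f);
                  }
                  fclose(f);
          }
          for (i = 0; i < NFILES; i++) {          /* phase 2: verify */
                  snprintf(name, sizeof name, "data/f%04d", i);
                  if (!(f = fopen(name, "rb"))) { perror(name); return 1; }
                  srand(i);                        /* same seed again */
                  for (off = 0; off < FILESIZE; off += n) {
                          n = fread(buf, 1, sizeof buf, f);
                          if (n == 0) { printf("%s: short file\n", name); break; }
                          for (j = 0; j < n; j++) {
                                  if (buf[j] == (unsigned char)(rand() & 0xff))
                                          continue;
                                  printf("%s: mismatch at byte %lu\n",
                                         name, (unsigned long)(off + j));
                                  fclose(f);
                                  return 1;
                          }
                  }
                  fclose(f);
          }
          puts("all files verified ok");
          return 0;
  }

Run it in the "data" directory from the example above, once on a
filesystem on /dev/drbd0 and once on /dev/md3, and compare.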

-- 
: Lars Ellenberg                                  Tel +43-1-8178292-0  :
: LINBIT Information Technologies GmbH            Fax +43-1-8178292-82 :
: Schoenbrunner Str. 244, A-1120 Vienna/Europe   http://www.linbit.com :
__
please use the "List-Reply" function of your email client.


