[DRBD-user] DRBD+LVM+Ext3 online resize

Lars Ellenberg lars.ellenberg at linbit.com
Mon Dec 22 13:20:18 CET 2008

Note: "permalinks" may not be as permanent as we would like,
direct links of old sources may well be a few messages off.


On Fri, Dec 19, 2008 at 01:15:56PM -0700, Matt Graham wrote:
> From: Lars Ellenberg <lars.ellenberg at linbit.com>
> >> So when building the initial LVM and mkfs.ext3, what are some
> >> options that I should be using on an array that will start at
> >> 2TB and grow to about 10TB as I add more disks?
> > I may be wrong, but I think on intel/amd ext3 is still limited
> > to < 8TiB ?
> 
> Docs say the max filesystem size for ext2/3 on x86 is 16T with
> 4K blocks.

then either the docs you are referring to
or wikipedia seems to be wrong or outdated?
http://en.wikipedia.org/wiki/Ext3
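
fwiw, growing such a drbd-on-lvm-with-ext3 stack online usually goes
roughly like the sketch below. volume group, resource and device names
are made up; check against your own setup and the drbd docs first.

  # at mkfs time, reserve enough resize-inode space to later grow
  # online up to ~10TiB (units are 4k blocks: 10TiB = 2684354560 blocks)
  mkfs.ext3 -b 4096 -E resize=2684354560 /dev/drbd0

  # later, to grow: extend the backing LV on *both* nodes first
  lvextend -L +1T /dev/vg0/r0
  # then let drbd grow into the new space (on the primary)
  drbdadm resize r0
  # finally grow ext3 online, while it is mounted
  resize2fs /dev/drbd0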

>  I don't know how much this has been tested or used in
> the real world, since arrays that large are still out of my price
> range.
>  
> > ever seen e2fsck on a multi terabyte device?
> 
> It takes a while, which is why he's using ext3.  You'd run into 
> the same problems with any filesystem when things go south, 
> really.  Don't worry, it'll all be fixed in ext4 (currently in
> alpha).

oh please.  no one is suggesting ext2. it just so happens that the
binary is still called e2fsck.

experience shows that at some point in the life of a file system, no
matter how advanced or proven, it will show strange symptoms and need
to be fsck'ed.
neither replication nor journalling will fix that.

I have seen both xfs and ext3 go bad (for various reasons,
mostly bad hardware). both do recover fine when fsck'ed.

if you ever have to fsck a file system in the multi-TB range,
you'd rather it be a file system that has been designed for such large
file systems, not one "hacked up" to be able to support them.

> > why not go for XFS? seems to be a better choice for device
> > sizes in that range.
>  
> Maybe he wants his data to still be there if there's a power
> failure :-)

again: oh please.
if your application(s) don't fsync, then replication with drbd and
failover clustering will not help you either.

On Fri, Dec 19, 2008 at 02:34:40PM -0600, Artur (eBoundHost) wrote:
> This is the only reason I'm looking into replication with DRBD.  We
> absolutely cannot have downtime for fsck.

if you ever do need to fsck, you still need a downtime.
with drbd, you may be able to keep one side read-only,
and xfs_repair/fsck the other, then switch to the fsck'ed side,
and let drbd resync.
but you will not be able to have it read-write during an eventual fsck.
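
roughly, the idea looks like this (resource name, device and mount
point are made up; adapt to your configuration):

  # on node A, the current primary: unmount and demote
  # (write downtime starts here)
  umount /mnt/data
  drbdadm secondary r0

  # on node B: take over, then disconnect, so that the repair
  # cannot touch node A's copy
  drbdadm primary r0
  drbdadm disconnect r0
  e2fsck -f /dev/drbd0          # or xfs_repair /dev/drbd0, for xfs
  mount /dev/drbd0 /mnt/data    # service resumes on node B

  # once you trust the result: reconnect, and drbd resyncs node A
  drbdadm connect r0            # on both nodes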

On Mon, Dec 22, 2008 at 10:10:31AM +0000, Igor Neves wrote:
> And there is another point: recently I tested Areca cards with RAID5 and LVM.
> I put DRBD on top of the LVM and then XFS.
> With XFS and high I/O load the machines give me kernel panics; I tested it
> for about a week to understand where the problem was.
> 
> - XFS directly on LVM gives no problems at all.
> - Ext3 directly on LVM gives no problems at all.
> - XFS on top of drbd, and drbd on top LVM, gives kernel panic with drbd 8.0.X
> and with 8.2.X.
> - Ext3 on top of drbd, and drbd on top LVM, gives no problems at all.
> 
> Sorry I can't show you any kernel panic, but when I took XFS out of
> the stack, everything worked. But I know it's not a problem with XFS
> itself, since XFS works fine directly on top of LVM.
> 
> Everything I tested was on top of CentOS 5, with the latest updates.

and with 4kB kernel stack size, I suppose,
as that is apparently the default for RHEL?

make sure you use 8kB kernel stack size
in those multiple-layers-of-virtual-block-devices setups,
or you get a kernel stack overflow.
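
assuming a distro kernel that ships its config in /boot, you can check
what you are running like this:

  # does the running 32bit kernel use 4kB stacks?
  grep CONFIG_4KSTACKS /boot/config-$(uname -r)
  # CONFIG_4KSTACKS=y means 4kB stacks; unset means 8kB.
  # the option only exists on i386, x86_64 kernels use 8kB stacks anyway.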

a 2.6.18 kernel may possibly not be the best platform for xfs
(even if, or maybe because, it has been frankenstein'ed by redhat).

> XFS is incredibly faster than ext3, but ext3 is widely tested and
> proven so far, so for these stranger setups with drbd in the game,
> don't use XFS.

I'm a big fan of ext3.
for /boot, for /root,
and for everything up to a few hundred GB.
it has lots of knobs to tune, if you need it.
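
for example (device name made up; values just for illustration):

  # reserve 1% for root instead of the default 5%,
  # which is a lot of space on a big file system
  tune2fs -m 1 /dev/vg0/lv_data
  # switch off the periodic forced fsck on mount
  tune2fs -c 0 -i 0 /dev/vg0/lv_data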

but ext3 is not a good choice for huge file systems, IMO.

I'm a big fan of XFS, as well.
for everything that scales up to a few TiB,
for lots of small files,
for very large files,
for many hardlinks/inodes,
for a lot of things.
it also fsck's (ok, xfs_repair's; btw, the latest xfsprogs is 2.10.2,
which improves further on speed and memory consumption)
much faster per TB of storage / number of inodes than ext3.
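
if you want to see for yourself (device name made up; xfs_repair wants
the file system unmounted):

  # dry run first: report what would be fixed, change nothing
  xfs_repair -n /dev/drbd0
  # then the actual repair
  xfs_repair /dev/drbd0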

but some general recommendations for a multi-terabyte storage server:
use a 64bit OS,
put in as much RAM as you can conveniently afford,
use a battery backed (thus non-volatile) write cache on your controller,
and switch off the (volatile) write cache on your disks (in case your
controller does not take care of that already).
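
for the disk caches, something like this (device names made up):

  # switch off the volatile write cache on an (S)ATA disk ...
  hdparm -W0 /dev/sda
  # ... and verify the setting
  hdparm -W /dev/sda
  # for SCSI disks, sdparm can clear the write cache enable (WCE) bit
  sdparm --clear=WCE /dev/sdb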

but, of course, the decision is yours.
if your tests show that it does not work for you,
or even if you just feel uncomfortable with xfs,
stick with what you know.

-- 
: Lars Ellenberg
: LINBIT | Your Way to High Availability
: DRBD/HA support and consulting http://www.linbit.com

DRBD® and LINBIT® are registered trademarks of LINBIT, Austria.
__
please don't Cc me, but send to list   --   I'm subscribed


