Note: "permalinks" may not be as permanent as we would like,
direct links of old sources may well be a few messages off.
----- rb at megabit.net wrote:
> ----- "Lars Ellenberg" <lars.ellenberg at linbit.com> wrote:
>
> > you need to subscribe to get your posts through!
> Sorry, my mail client mixed the reply addresses - this one should be
> subscribed to the list though.
>
> >
> > On Wed, Dec 17, 2008 at 11:38:08AM +0000, Rudolph Bott wrote:
> > > > On Tue, Dec 16, 2008 at 08:23:39PM +0000, Rudolph Bott wrote:
> > > > > Hi List,
> > > > >
> > > > > I was wondering if anyone might be able to share some performance
> > > > > information about his/her DRBD setup. Ours comes along with the
> > > > > following hardware:
> > > > >
> > > > > Hardware: Xeon QuadCore CPU, 2GB RAM, Intel mainboard with 2
> > > > > onboard e1000 NICs and one additional NIC plugged into a regular
> > > > > PCI slot, 3ware 9650SE (PCI-Express) with 4 S-ATA disks in a
> > > > > RAID-10 array
> > > > >
> > > > > Software: Ubuntu Hardy LTS with DRBD 8.0.11 (from the Ubuntu
> > > > > repository), kernel 2.6.24
> > > > >
> > > > > One NIC acts as "management interface", one as the DRBD link, one
> > > > > as the heartbeat interface. On top of DRBD runs LVM to allow the
> > > > > creation of volumes (which are in turn exported via iSCSI).
> > > > > Everything seems to run smoothly - but I'm not quite satisfied
> > > > > with the write speed available on the DRBD device (locally; I
> > > > > don't care about the iSCSI part yet).
> > > > >
> > > > > All tests were done with dd (either copying from /dev/zero or to
> > > > > /dev/null with 1, 2 or 4GB sized files). Reading gives me speeds
> > > > > of around 390MB/sec, which is way more than enough - but writing
> > > > > does not exceed 39MB/sec. Direct writes to the RAID controller
> > > > > (without DRBD) are at around 95MB/sec, which is still below the
> > > > > limit of Gig-Ethernet. I spent the whole day tweaking various
> > > > > aspects (block-device tuning, TCP-offload settings, DRBD
> > > > > net-settings etc.) and managed to raise the write speed from
> > > > > initially 25MB/sec to 39MB/sec that way.
> > > > >
> > > > > Any suggestions what happens to the missing ~50-60MB/sec that the
> > > > > 3ware controller is able to handle? Do you think the PCI bus is
> > > > > "overtasked"? Would it be enough to simply replace the onboard
> > > > > NICs with an additional PCI-Express card, or do you think the
> > > > > limit is elsewhere (DRBD settings, options set in the default
> > > > > distro kernel etc.)?
> > > >
> > > > drbdadm dump all
> > >
> > > common {
> > >     syncer {
> > >         rate 100M;
> > >     }
> > > }
> > >
> > > resource storage {
> > >     protocol C;
> > >     on nas03 {
> > >         device    /dev/drbd0;
> > >         disk      /dev/sda3;
> > >         address   172.16.15.3:7788;
> > >         meta-disk internal;
> > >     }
> > >     on nas04 {
> > >         device    /dev/drbd0;
> > >         disk      /dev/sda3;
> > >         address   172.16.15.4:7788;
> > >         meta-disk internal;
> > >     }
> > >     net {
> > >         unplug-watermark 1024;
> > >         after-sb-0pri    disconnect;
> > >         after-sb-1pri    disconnect;
> > >         after-sb-2pri    disconnect;
> > >         rr-conflict      disconnect;
> >
> > _any_ thread about drbd tuning mentions
> > at least sndbuf-size, max-buffers, max-epoch-size, ...
> >
> > >     }
> > >     disk {
> > >         on-io-error detach;
> >
> > do you have a battery backed write cache on the controller?
> > if not, get one, and read about no-disk-flushes and no-md-flushes.
> >
> > >     }
> > >     syncer {
> > >         rate 100M;
> > >         al-extents 257;
> >
> > did you understand the al-extents setting,
> > and its tradeoff?
> >
> > >     }
> > >     startup {
> > >         wfc-timeout 20;
> > >         degr-wfc-timeout 120;
> > >     }
> > >     handlers {
> > >         pri-on-incon-degr "echo o > /proc/sysrq-trigger ; halt -f";
> > >         pri-lost-after-sb "echo o > /proc/sysrq-trigger ; halt -f";
> > >         local-io-error    "echo o > /proc/sysrq-trigger ; halt -f";
> > >     }
> > > }
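For reference, the tuning knobs Lars mentions above (sndbuf-size, max-buffers,
max-epoch-size, the flush settings and al-extents) live in the net, disk and
syncer sections. A minimal sketch of what that could look like - illustrative
values only, and only disable flushes once the controller really has a
battery-backed write cache:

net {
    sndbuf-size    512k;
    max-buffers    8000;
    max-epoch-size 8000;
}
disk {
    on-io-error detach;
    no-disk-flushes;   # only safe with a battery-backed write cache
    no-md-flushes;     # same caveat
}
syncer {
    rate       100M;
    al-extents 1801;   # bigger = fewer metadata updates, longer resync after a primary crash
}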
> >
> > > > what exactly does your micro benchmark look like?
> > > dd if=/dev/zero of=/mnt/testfile bs=1M count=2048
> > > dd if=/mnt/testfile of=/dev/null
> >
> > the write test does not fsync.
> > add "conv=fsync".
Apparently enabling the write cache (we were running the 3ware controllers without a BBU until now) fixed the blocksize issue - this is what we get now:
ext3 on LVM on DRBD (StandAlone):
root at nas03:/mnt# dd if=/dev/zero of=/mnt/testfile bs=1024K count=2048 conv=fsync
2048+0 records in
2048+0 records out
2147483648 bytes (2,1 GB) copied, 13,0608 s, 164 MB/s
ext3 on LVM on DRBD (Connected):
root at nas03:/mnt# dd if=/dev/zero of=/mnt/testfile bs=1024K count=2048 conv=fsync
2048+0 records in
2048+0 records out
2147483648 bytes (2,1 GB) copied, 35,6191 s, 60,3 MB/s
ext3 directly on RAID device (root fs):
root at nas03:/mnt# dd if=/dev/zero of=/tmp/testfile bs=1024K count=2048 conv=fsync
2048+0 records in
2048+0 records out
2147483648 bytes (2,1 GB) copied, 12,6701 s, 169 MB/s
Writing to ext3 directly on the RAID device is almost as fast as writing to LVM on top of DRBD in StandAlone mode. "Connected" still gives 60MB/sec - I guess there's still some room for DRBD/network tuning (sync rate was between 50 and 90MB/sec).
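As a side note, to rule out the replication link itself it can help to measure
raw TCP throughput between the two nodes, e.g. with iperf (assuming it is
installed on both nodes; addresses taken from the config above):

iperf -s                      # on nas04
iperf -c 172.16.15.4 -t 30    # on nas03, ~30 second run

Gig-Ethernet should come out somewhere near 940 Mbit/s; anything much lower
points at the network rather than DRBD.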
> >
> > or use oflag=direct, and a much higher bs and less count.
> >
> > or both.
> >
> > the read test reads from page cache, unless you drop caches before
> > each
> > run. ok, so it is a large file, about the size of your RAM. but
> > still.
> > use iflag=direct, a much larger bs, and a smaller count if you are
> > interested in streaming read performance from storage.
> > if you want to benchmark page cache, fine...
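Putting those suggestions together, a write/read micro-benchmark could look
roughly like this (file name and sizes are just examples):

# write test, flushing to disk before dd reports a rate
dd if=/dev/zero of=/mnt/testfile bs=1M count=2048 conv=fsync

# or bypass the page cache entirely
dd if=/dev/zero of=/mnt/testfile bs=64M count=32 oflag=direct

# read test: drop caches first, or read with direct I/O
echo 3 > /proc/sys/vm/drop_caches
dd if=/mnt/testfile of=/dev/null bs=64M iflag=direct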
> >
> > > hmm...when I take the information above into account I would
> > > say...maybe LVM is the bottleneck? The speed comparison to local
> > > writes (achieving ~95MB/sec) was done on the root fs, which is
> > > directly on the sda device, not on top of LVM.
> >
> > well, you could easily verify with a non-drbd lv.
> >
> > I'd say you should read up on the al-extents.
> >
> > and get a battery backed cache.
> >
> > cheers,
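To verify the non-DRBD LV idea from above, one could carve a test LV out of a
VG that does not sit on DRBD and run the same dd against it - the partition,
VG and LV names below are placeholders:

pvcreate /dev/sda4                    # placeholder: any spare partition not under DRBD
vgcreate testvg /dev/sda4
lvcreate -L 10G -n plainlv testvg
dd if=/dev/zero of=/dev/testvg/plainlv bs=1M count=2048 oflag=direct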
>
> Thanks for your help so far - I will look into the above-mentioned
> things. But I think I found the bottleneck elsewhere: LVM
>
> take a look at: http://lkml.org/lkml/2003/12/30/81
>
> Apparently LVM sets the blocksize of the physical volume to 512 bytes
> instead of the regular 4K. Since the 3ware driver seems to be
> optimised for 4K, it chokes on other blocksizes. Sorry for bothering
> you with this problem - I'll try to get rid of the blocksize problem
> first; after that I'll take a look at the DRBD layer again.
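The block size the kernel is currently using for each layer can be checked
with blockdev - the first two paths are from this setup, the LV path is a
placeholder:

blockdev --getbsz /dev/sda3             # backing partition
blockdev --getbsz /dev/drbd0            # DRBD device
blockdev --getbsz /dev/storage/somelv   # placeholder LV path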
>
> >
> > --
> > : Lars Ellenberg
> > : LINBIT | Your Way to High Availability
> > : DRBD/HA support and consulting http://www.linbit.com
> >
> > DRBD® and LINBIT® are registered trademarks of LINBIT, Austria.
> > __
> > please don't Cc me, but send to list -- I'm subscribed
> _______________________________________________
> drbd-user mailing list
> drbd-user at lists.linbit.com
> http://lists.linbit.com/mailman/listinfo/drbd-user