----- rb at megabit.net wrote:
> ----- "Lars Ellenberg" <lars.ellenberg at linbit.com> wrote:
> 
> > you need to subscribe to get your posts through!
> Sorry, my mail client mixed the reply addresses - this one should be
> subscribed to the list though.
> 
> > 
> > On Wed, Dec 17, 2008 at 11:38:08AM +0000, Rudolph Bott wrote:
> > > > On Tue, Dec 16, 2008 at 08:23:39PM +0000, Rudolph Bott wrote:
> > > > > Hi List,
> > > > > 
> > > > > I was wondering if anyone might be able to share some performance
> > > > > information about his/her DRBD setup. Ours comes along with the
> > > > > following Hardware:
> > > > > 
> > > > > Hardware: Xeon QuadCore CPU, 2GB RAM, Intel Mainboard with 2 Onboard
> > > > > e1000 NICs and one additional plugged into a regular PCI slot, 3ware
> > > > > 9650SE (PCI-Express) with 4 S-ATA Disks in a RAID-10 array
> > > > > 
> > > > > Software: Ubuntu Hardy LTS with DRBD 8.0.11 (from the ubuntu
> > > > > repository), Kernel 2.6.24
> > > > > 
> > > > > one NIC acts as "management interface", one as the DRBD Link, one as
> > > > > the heartbeat interface. On top of DRBD runs LVM to allow the creation
> > > > > of volumes (which are in turn exported via iSCSI). Everything seems to
> > > > > run smoothly - but I'm not quite satisfied with the write speed
> > > > > available on the DRBD device (locally, I don't care about the iSCSI
> > > > > part yet).
> > > > > 
> > > > > All tests were done with dd (either copying from /dev/zero or to
> > > > > /dev/null with 1, 2 or 4GB sized files). Reading gives me speeds at
> > > > > around 390MB/sec which is way more than enough - but writing does not
> > > > > exceed 39MB/sec. Direct writes to the raid controller (without DRBD)
> > > > > are at around 95MB/sec which is still below the limit of Gig-Ethernet.
> > > > > I spent the whole day tweaking various aspects (Block-Device tuning,
> > > > > TCP-offload-settings, DRBD net-settings etc.) and managed to raise the
> > > > > write speed from initially 25MB/sec to 39MB/sec that way.
> > > > > 
> > > > > Any suggestions what happens to the missing ~60-50MB/sec that the
> > > > > 3ware controller is able to handle? Do you think the PCI bus is
> > > > > "overtasked"? Would it be enough to simply replace the onboard NICs
> > > > > with an additional PCI-Express Card or do you think the limit is
> > > > > elsewhere? (DRBD settings, Options set in the default Distro Kernel
> > > > > etc.).
> > > > 
> > > > drbdadm dump all
> > > 
> > > common {
> > >     syncer {
> > >         rate             100M;
> > >     }
> > > }
> > > 
> > > resource storage {
> > >     protocol               C;
> > >     on nas03 {
> > >         device           /dev/drbd0;
> > >         disk             /dev/sda3;
> > >         address          172.16.15.3:7788;
> > >         meta-disk        internal;
> > >     }
> > >     on nas04 {
> > >         device           /dev/drbd0;
> > >         disk             /dev/sda3;
> > >         address          172.16.15.4:7788;
> > >         meta-disk        internal;
> > >     }
> > >     net {
> > >         unplug-watermark 1024;
> > >         after-sb-0pri    disconnect;
> > >         after-sb-1pri    disconnect;
> > >         after-sb-2pri    disconnect;
> > >         rr-conflict      disconnect;
> > 
> > _any_ thread about drbd tuning mentions
> > at least sndbuf-size, max-buffers, max-epoch-size, ...
> > 
> > >     }
> > >     disk {
> > >         on-io-error      detach;
> > 
> > do you have a battery backed write cache on the controller?
> > if not, get one, and read about no-disk-flushes and no-md-flushes.
> > 
> > >     }
> > >     syncer {
> > >         rate             100M;
> > >         al-extents       257;
> > 
> > did you understand the al-extents setting,
> > and its tradeoff?
> > 
> > >     }
> > >     startup {
> > >         wfc-timeout       20;
> > >         degr-wfc-timeout 120;
> > >     }
> > >     handlers {
> > >         pri-on-incon-degr "echo o > /proc/sysrq-trigger ; halt -f";
> > >         pri-lost-after-sb "echo o > /proc/sysrq-trigger ; halt -f";
> > >         local-io-error   "echo o > /proc/sysrq-trigger ; halt -f";
> > >     }
> > > }
> > 
> > > > what exactly does your micro benchmark look like?
> > > dd if=/dev/zero of=/mnt/testfile bs=1M count=2048
> > > dd if=/mnt/testfile of=/dev/null
> > 
> > the write test does not fsync.
> > add "conv=fsync".

Apparently enabling the write cache (we were running the 3ware controllers without a BBU until now) fixed the blocksize issue - this is what we get now:

ext3 on LVM on DRBD (StandAlone):
root@nas03:/mnt# dd if=/dev/zero of=/mnt/testfile bs=1024K count=2048 conv=fsync
2048+0 records in
2048+0 records out
2147483648 bytes (2,1 GB) copied, 13,0608 s, 164 MB/s

ext3 on LVM on DRBD (Connected):
root@nas03:/mnt# dd if=/dev/zero of=/mnt/testfile bs=1024K count=2048 conv=fsync
2048+0 records in
2048+0 records out
2147483648 bytes (2,1 GB) copied, 35,6191 s, 60,3 MB/s

ext3 directly on RAID device (root fs):
root@nas03:/mnt# dd if=/dev/zero of=/tmp/testfile bs=1024K count=2048 conv=fsync
2048+0 records in
2048+0 records out
2147483648 bytes (2,1 GB) copied, 12,6701 s, 169 MB/s

Writing to LVM on top of DRBD in StandAlone mode is almost as fast as writing to ext3 directly on the RAID device. "Connected" still only reaches 60MB/sec - I guess there's still some room for DRBD/network tuning (sync rate was between 50 and 90MB/sec).

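As a sanity check on the dd figures above, the rates can be recomputed from the byte count and elapsed time that dd prints. A minimal sketch, assuming awk is available; the helper name mb_per_s is made up for illustration, and the seconds are typed with a decimal point instead of the comma that dd prints in this locale:

```shell
# mb_per_s BYTES SECONDS
# Prints the throughput in MB/s, with 1 MB = 10^6 bytes (the unit dd uses).
# mb_per_s is a hypothetical helper name, not part of any existing tool.
mb_per_s() {
    awk -v b="$1" -v s="$2" 'BEGIN { printf "%.1f\n", b / s / 1000000 }'
}

mb_per_s 2147483648 13.0608   # ext3 on LVM on DRBD (StandAlone)
mb_per_s 2147483648 35.6191   # ext3 on LVM on DRBD (Connected)
mb_per_s 2147483648 12.6701   # ext3 directly on the RAID device
```

For the three runs this prints 164.4, 60.3 and 169.5, which agrees with what dd reported. For comparison, the raw line rate of Gigabit Ethernet is 1000 Mbit/s / 8 = 125 MB/s, so the Connected result sits at roughly half of what the link could carry in theory.
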
> > 
> > or use oflag=direct, and a much higher bs and less count.
> > 
> > or both.
> > 
> > the read test reads from page cache, unless you drop caches before
> > each run.  ok, so it is a large file, about the size of your RAM.
> > but still.
> > use iflag=direct, a much larger bs, and a smaller count if you are
> > interested in streaming read performance from storage.
> > if you want to benchmark page cache, fine...
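
The dd suggestions quoted above could be put together roughly like this. It is only a sketch: the function names and the TESTFILE/BS/COUNT defaults are made-up placeholders, dropping the page cache needs root, and oflag=direct/iflag=direct only work on a filesystem that supports O_DIRECT:

```shell
#!/bin/sh
# Placeholder defaults (assumptions): 32 x 64M = 2 GiB, i.e. a larger
# block size and a smaller count than bs=1M count=2048.
TESTFILE=${TESTFILE:-/mnt/testfile}
BS=${BS:-64M}
COUNT=${COUNT:-32}

# Buffered write; conv=fsync makes dd flush the file to disk before it
# reports the rate, so the page cache does not inflate the result.
write_test() {
    dd if=/dev/zero of="$1" bs="$BS" count="$COUNT" conv=fsync
}

# Write with O_DIRECT, bypassing the page cache altogether.
write_test_direct() {
    dd if=/dev/zero of="$1" bs="$BS" count="$COUNT" oflag=direct
}

# Streaming read: drop the page cache first if we are allowed to
# (root only), then read the file back with O_DIRECT.
read_test_direct() {
    sync
    if [ -w /proc/sys/vm/drop_caches ]; then
        echo 3 > /proc/sys/vm/drop_caches
    fi
    dd if="$1" of=/dev/null bs="$BS" iflag=direct
}
```

Typical use would be to source the file and then run write_test "$TESTFILE" followed by read_test_direct "$TESTFILE", once in StandAlone and once in Connected state, so the two sets of numbers are comparable.
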
> > 
> > > hmm...when I take the information above into account I would
> > > say...maybe LVM is the bottleneck? The speed comparison to local
> > > writes (achieving ~95MB/sec) was done on the root fs, which is
> > > directly on the sda device, not on top of LVM.
> > 
> > well, you could easily verify with a non-drbd lv.
> > 
> > I'd say you should read up on the al-extents.
> > 
> > and get a battery backed cache.
> > 
> > cheers,
> 
> Thanks for your help so far - I will look into the above mentioned
> things. But I think I found the bottleneck elsewhere: LVM
> 
> take a look at: http://lkml.org/lkml/2003/12/30/81
> 
> Apparently LVM sets the blocksize of the physical volume to 512 bytes,
> instead of the regular 4K. Since the 3ware driver seems to be
> optimised for 4K, it chokes on other blocksizes. Sorry for bothering you
> with this problem, I'll try to get rid of this blocksize problem first
> - after that I'll take a look at the DRBD layer again.
> 
> > 
> > -- 
> > : Lars Ellenberg
> > : LINBIT | Your Way to High Availability
> > : DRBD/HA support and consulting http://www.linbit.com
> > 
> > DRBD® and LINBIT® are registered trademarks of LINBIT, Austria.
> > __
> > please don't Cc me, but send to list   --   I'm subscribed
> > _______________________________________________
> > drbd-user mailing list
> > drbd-user at lists.linbit.com
> > http://lists.linbit.com/mailman/listinfo/drbd-user