Note: "permalinks" may not be as permanent as we would like,
direct links of old sources may well be a few messages off.
----- rb at megabit.net wrote:

> ----- "Lars Ellenberg" <lars.ellenberg at linbit.com> wrote:
>
> > you need to subscribe to get your posts through!
>
> Sorry, my mail client mixed up the reply addresses - this one should be
> subscribed to the list, though.
>
> > On Wed, Dec 17, 2008 at 11:38:08AM +0000, Rudolph Bott wrote:
> > > > On Tue, Dec 16, 2008 at 08:23:39PM +0000, Rudolph Bott wrote:
> > > > > Hi List,
> > > > >
> > > > > I was wondering if anyone might be able to share some performance
> > > > > information about his/her DRBD setup. Ours comes along with the
> > > > > following hardware:
> > > > >
> > > > > Hardware: Xeon QuadCore CPU, 2GB RAM, Intel mainboard with 2
> > > > > onboard e1000 NICs and one additional NIC plugged into a regular
> > > > > PCI slot, 3ware 9650SE (PCI-Express) with 4 S-ATA disks in a
> > > > > RAID-10 array
> > > > >
> > > > > Software: Ubuntu Hardy LTS with DRBD 8.0.11 (from the Ubuntu
> > > > > repository), kernel 2.6.24
> > > > >
> > > > > One NIC acts as the "management interface", one as the DRBD link,
> > > > > one as the heartbeat interface. On top of DRBD runs LVM to allow
> > > > > the creation of volumes (which are in turn exported via iSCSI).
> > > > > Everything seems to run smoothly - but I'm not quite satisfied
> > > > > with the write speed available on the DRBD device (locally, I
> > > > > don't care about the iSCSI part yet).
> > > > >
> > > > > All tests were done with dd (either copying from /dev/zero or to
> > > > > /dev/null with 1, 2 or 4GB sized files). Reading gives me speeds
> > > > > of around 390MB/sec, which is way more than enough - but writing
> > > > > does not exceed 39MB/sec. Direct writes to the RAID controller
> > > > > (without DRBD) run at around 95MB/sec, which is still below the
> > > > > limit of Gig-Ethernet. I spent the whole day tweaking various
> > > > > aspects (block-device tuning, TCP-offload settings, DRBD net
> > > > > settings etc.) and managed to raise the write speed from
> > > > > initially 25MB/sec to 39MB/sec that way.
> > > > >
> > > > > Any suggestions what happens to the missing ~50-60MB/sec that the
> > > > > 3ware controller is able to handle? Do you think the PCI bus is
> > > > > "overtasked"? Would it be enough to simply replace the onboard
> > > > > NICs with an additional PCI-Express card, or do you think the
> > > > > limit is elsewhere? (DRBD settings, options set in the default
> > > > > distro kernel, etc.)
> > > >
> > > > drbdadm dump all
> > >
> > > common {
> > >     syncer {
> > >         rate 100M;
> > >     }
> > > }
> > >
> > > resource storage {
> > >     protocol C;
> > >     on nas03 {
> > >         device    /dev/drbd0;
> > >         disk      /dev/sda3;
> > >         address   172.16.15.3:7788;
> > >         meta-disk internal;
> > >     }
> > >     on nas04 {
> > >         device    /dev/drbd0;
> > >         disk      /dev/sda3;
> > >         address   172.16.15.4:7788;
> > >         meta-disk internal;
> > >     }
> > >     net {
> > >         unplug-watermark 1024;
> > >         after-sb-0pri disconnect;
> > >         after-sb-1pri disconnect;
> > >         after-sb-2pri disconnect;
> > >         rr-conflict disconnect;
> >
> > _any_ thread about drbd tuning mentions
> > at least sndbuf-size, max-buffers, max-epoch-size, ...
> >
> > >     }
> > >     disk {
> > >         on-io-error detach;
> >
> > do you have a battery backed write cache on the controller?
> > if not, get one, and read about no-disk-flushes and no-md-flushes.
> >
> > >     }
> > >     syncer {
> > >         rate 100M;
> > >         al-extents 257;
> >
> > did you understand the al-extents setting,
> > and its tradeoff?
> >
> > >     }
> > >     startup {
> > >         wfc-timeout 20;
> > >         degr-wfc-timeout 120;
> > >     }
> > >     handlers {
> > >         pri-on-incon-degr "echo o > /proc/sysrq-trigger ; halt -f";
> > >         pri-lost-after-sb "echo o > /proc/sysrq-trigger ; halt -f";
> > >         local-io-error "echo o > /proc/sysrq-trigger ; halt -f";
> > >     }
> > > }
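As a rough sketch only: the knobs mentioned above could be folded into this
resource along the following lines. The values are illustrative assumptions,
not recommendations from this thread, whether each option is available depends
on the exact DRBD 8.0.x release in use, and no-disk-flushes/no-md-flushes are
only appropriate once the controller really has a battery-backed write cache:

    resource storage {
        protocol C;
        # "on nas03" / "on nas04" sections as in the dump above
        net {
            # assumed example values for GigE replication -
            # measure before and after changing them
            sndbuf-size    512k;
            max-buffers    8000;
            max-epoch-size 8000;
        }
        disk {
            on-io-error detach;
            # only with a battery-backed write cache on the controller
            no-disk-flushes;
            no-md-flushes;
        }
        syncer {
            rate 100M;
            # a larger value than the 257 above means fewer meta-data
            # updates for scattered writes, at the cost of a longer
            # resync after a primary crash
            al-extents 1021;
        }
    }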
> > > > what exactly does your micro benchmark look like?
> > >
> > > dd if=/dev/zero of=/mnt/testfile bs=1M count=2048
> > > dd if=/mnt/testfile of=/dev/null
> >
> > the write test does not fsync.
> > add "conv=fsync".

Apparently enabling the write cache (we were running the 3ware controllers
without a BBU until now) fixed the blocksize issue - this is what we get now:

ext3 on LVM on DRBD (StandAlone):

root at nas03:/mnt# dd if=/dev/zero of=/mnt/testfile bs=1024K count=2048 conv=fsync
2048+0 records in
2048+0 records out
2147483648 bytes (2,1 GB) copied, 13,0608 s, 164 MB/s

ext3 on LVM on DRBD (Connected):

root at nas03:/mnt# dd if=/dev/zero of=/mnt/testfile bs=1024K count=2048 conv=fsync
2048+0 records in
2048+0 records out
2147483648 bytes (2,1 GB) copied, 35,6191 s, 60,3 MB/s

ext3 directly on the RAID device (root fs):

root at nas03:/mnt# dd if=/dev/zero of=/tmp/testfile bs=1024K count=2048 conv=fsync
2048+0 records in
2048+0 records out
2147483648 bytes (2,1 GB) copied, 12,6701 s, 169 MB/s

Writing to ext3 directly on the RAID device is almost as fast as writing to
LVM on top of DRBD in StandAlone mode. "Connected" still gives 60MB/sec - I
guess there is still some room for DRBD/network tuning (the sync rate was
between 50 and 90MB/sec).

> > or use oflag=direct, and a much higher bs and less count.
> > or both.
> >
> > the read test reads from page cache, unless you drop caches before
> > each run. ok, so it is a large file, about the size of your RAM. but
> > still. use iflag=direct, a much larger bs, and a smaller count if you
> > are interested in streaming read performance from storage.
> > if you want to benchmark page cache, fine...
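Spelled out, the benchmark variants suggested above could look roughly like
this - the block sizes and counts are only illustrative assumptions, and
dropping the caches requires root:

    # write test: make dd include an fsync in the measured time
    dd if=/dev/zero of=/mnt/testfile bs=1M count=2048 conv=fsync

    # or bypass the page cache with direct I/O, a larger bs and a smaller count
    dd if=/dev/zero of=/mnt/testfile bs=64M count=32 oflag=direct

    # read test with direct I/O, so the page cache is not what gets measured
    dd if=/mnt/testfile of=/dev/null bs=64M iflag=direct

    # alternatively, drop the page cache before a buffered read test
    echo 3 > /proc/sys/vm/drop_caches
    dd if=/mnt/testfile of=/dev/null bs=1M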
> > > hmm...when I take the information above into account I would
> > > say...maybe LVM is the bottleneck? The speed comparison to local
> > > writes (achieving ~95MB/sec) was done on the root fs, which sits
> > > directly on the sda device, not on top of LVM.
> >
> > well, you could easily verify with a non-drbd lv.
> >
> > I'd say you should read up on the al-extents.
> >
> > and get a battery backed cache.
> >
> > cheers,
>
> Thanks for your help so far - I will look into the things mentioned
> above. But I think I found the bottleneck elsewhere: LVM
>
> take a look at: http://lkml.org/lkml/2003/12/30/81
>
> Apparently LVM sets the blocksize of the physical volume to 512 bytes
> instead of the regular 4K. Since the 3ware driver seems to be optimised
> for 4K, it chokes on other blocksizes. Sorry for bothering you with this
> problem - I'll try to get rid of the blocksize problem first; after that
> I'll take a look at the DRBD layer again.

> > --
> > : Lars Ellenberg
> > : LINBIT | Your Way to High Availability
> > : DRBD/HA support and consulting http://www.linbit.com
> >
> > DRBD® and LINBIT® are registered trademarks of LINBIT, Austria.
> > __
> > please don't Cc me, but send to list -- I'm subscribed
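For anyone wanting to check the block size in question, a quick sketch using
blockdev from util-linux and the device names from this thread (run as root;
this only shows the soft block size currently set on each block device):

    blockdev --getbsz /dev/sda3
    blockdev --getbsz /dev/drbd0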