Note: "permalinks" may not be as permanent as we would like,
direct links of old sources may well be a few messages off.
/ 2006-08-09 17:50:26 -0400
\ Vampire D:
> I was looking for something like this, but super easy query or command.
> Like Monty said, since it is block level most transactions may not "ramp"
> up enough to really show real throughput, but some way of "testing" what
> the devices are able to keep up with would go a long way for benchmarking,
> scaling, sizing, and making sure everything keeps running "up to snuff".

in the drbd tarball, in the benchmark subdir, we have a very simple tool
called dm (I don't remember why those letters). you can also get it from
http://svn.drbd.org/drbd/trunk/benchmark/dm.c

compile it:
  gcc -O2 -o dm dm.c

it is basically a variation on the "dd" tool, but you can switch on
"progress" and "throughput" output, and you can switch on fsync() before
close. it just does sequential io.

to benchmark WRITE throughput, you use it like this:
  ./dm -a 0 -b 1M -s 500M -y -m -p -o $out_file_or_device

this will print lots of 'R's (requested by "-m"; 500 of them, to be exact,
one for each "block" (-b) up to the requested "size" (-s)). the first of
those Rs will print very fast; if you request several gig, you will see it
"hang" for a short time every few "R"s, and at the end it will hang for
quite a while (that's the fsync requested by -y). finally it will tell you
the overall throughput.

if you leave off the fsync (-y), you will get very fast writes, as long as
they fit into some of the involved caches... this would be the "virtual"
throughput seen by most processes which don't use fsync, but these numbers
are not very useful for figuring out bottlenecks in the drbd configuration
and general setup.

you can tell dm where to write its data: use "-l 378G", and it will (try
to) seek 378G into the device (on a regular file this would probably just
create a sparse file, which is not of particular interest). so if you have
a 400G disk with a single partition using all of it, you could benchmark
the "inner" 10G and the "outer" 10G by using different offsets here. you
will notice that the throughput differs significantly between the inner
and outer cylinders of your disks.

example run with "-b 1M -s 2M":
  RR 10.48 MB/sec (2097152 B / 00:00.190802)

if you don't like the "R"s, leave off the -m ...

to measure local io bandwidth, you can use it directly on the lower level
device (or an equivalent dummy partition).
!! this is destructive !!
!! you will have to recreate a file system on that thing !!
  ./dm -a 0 -b 1M -s 500M -y -m -p -o /dev/vg00/dummy

to measure local io bandwidth including file system overhead:
  ./dm -a 0 -b 1M -s 500M -y -m -p -o /mnt/dummy/dummy-out

to measure drbd performance in disconnected mode:
  drbdadm disconnect dummy
  ./dm -a 0 -b 1M -s 500M -y -m -p -o /dev/drbd9
(be prepared for some additional latency: drbd housekeeping now has to
remember which blocks are dirty...)

... in connected mode:
  drbdadm connect dummy
  ./dm -a 0 -b 1M -s 500M -y -m -p -o /dev/drbd9
still, the first write may be considerably slower than successive runs of
the same command, since the activity log will be "hot" after the first one
(as long as the size fits into the activity log completely).

... with a file system:
  mkfs.xfs /dev/drbd9 ; mount /dev/drbd9 /mnt/drbd9-mount-point
  ./dm -a 0 -b 1M -s 500M -y -m -p -o /mnt/drbd9-mount-point/dummy-out

if you want to see the effect on power usage when writing 0xff instead of
0x00, use "-a 0xff" :)

if you want to see the effect of the drbd activity log, use a size
considerably larger than what you configured as al-extents.
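to compare the inner and outer regions mentioned above in one go, a
throwaway wrapper around the write benchmark could look something like
this (a sketch only: it assumes dm was compiled into the current directory
as above, that DEV is a scratch device you are allowed to destroy, and the
offsets are made-up examples for a ~400G disk):

  #!/bin/sh
  # DESTRUCTIVE: writes 500M of zeros at several offsets of $DEV
  # and lets dm report the throughput for each region.
  DEV=/dev/vg00/dummy                  # scratch device -- example name only
  for offset in 1G 100G 200G 390G; do  # arbitrary offsets for a ~400G disk
      echo "=== write benchmark at offset $offset ==="
      # -l seeks that far into the output device before writing,
      # -y forces an fsync before close so caches don't flatter the numbers
      ./dm -a 0 -b 1M -s 500M -y -m -p -l $offset -o $DEV
  done

as with the single runs, repeating each offset a few times and comparing
the numbers makes the result more trustworthy.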
maybe you want to use
  watch -n1 cat /proc/drbd
at the same time, so you can see the figures move: the pe go up, the lo go
up sometimes, the ap go up, the dw and ns increase all the time, the al
increasing not too often, and finally the pe, lo, and ap falling back to
zero...

if you like, you could use
  watch -n1 "cat /proc/drbd ; netstat -tn | grep -e ^Proto -e ':7788\>'"
which would also show you the drbd socket buffer usage, in case 7788 is
your drbd port. if you are curious, you should run this on both nodes.

to see the effect of resync on all this, you could invalidate one node
(causing a full sync) and benchmark again. then play with the sync rate
parameter.

to be somewhat more reliable, you should repeat each command several times.

to benchmark READ throughput, you use
  ./dm -o /dev/null -b 1M -s 500M -m -p -i /dev/sdx
  ./dm -o /dev/null -b 1M -s 500M -m -p -i /dev/drbd9

be careful: you need to use a size _considerably_ larger than your RAM, or
you will see the linux caching effects on the second run. of course, you
could also "shrink" the caches first. to do so, since 2.6.16, you can
  echo 3 > /proc/sys/vm/drop_caches
to get clean read throughput results. before that, you can allocate and
use huge amounts of memory, like this:
  perl -e '$x = "X" x (1024*1024*500)'
  # this would allocate and use about 1 GB; it uses about twice as much as
  # you say in those brackets... use as much as you have RAM (as long as
  # you have some swap available) and the caches will shrink :)

or, even easier: you can just seek into the input device, to some area
that is unlikely to have been read before:
  ./dm -o /dev/null -b 1M -s 500M -m -p -k 7G -i /dev/drbd9
"-k 7G" makes it seek 7 gig into the given input "file". you will notice
here, too, that read performance varies considerably between the "inner"
and "outer" cylinders; the difference can be as large as 50MB/sec inner
vs. 30MB/sec outer.

you can also benchmark network throughput with dm, if you combine it with
netcat. e.g.,
  me@x# nc -l -p 54321 -q0 >/dev/null
  me@y# dm -a 0 -b 1M -s 500M -m -p -y | nc x 54321 -q0
run two of them in opposite directions to see whether your full duplex
GigE does what you think it should ...

... you get the idea.

at least this is what we use to track down problems at customer clusters.
maybe sometime we will script something around it, but most of the time we
like the flexibility of using the tool directly. actually, most of the
time we rather use a "data set size" of 800M to 6G... but be prepared for
a slight degradation once you cross the size of the activity log
(al-extents parameter), as then drbd has to do synchronous updates to its
meta data area for every additional 4M.

-- 
: Lars Ellenberg                                Tel +43-1-8178292-0  :
: LINBIT Information Technologies GmbH          Fax +43-1-8178292-82 :
: Schoenbrunner Str. 244, A-1120 Vienna/Europe  http://www.linbit.com :

__
please use the "List-Reply" function of your email client.
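a similar throwaway sketch for the READ throughput procedure above: it
drops the page cache before each run (kernel 2.6.16 or later, run as root)
and repeats the measurement a few times; the device name and the repeat
count are example values only:

  #!/bin/sh
  # non-destructive: reads 500M sequentially from $DEV a few times,
  # dropping the page cache first so the numbers are not flattered
  # by previously cached data.
  DEV=/dev/drbd9                        # example device name
  for run in 1 2 3; do                  # example repeat count
      sync
      echo 3 > /proc/sys/vm/drop_caches # needs kernel 2.6.16+ and root
      echo "=== read benchmark run $run ==="
      ./dm -o /dev/null -b 1M -s 500M -m -p -i $DEV
  done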