wow, this is great info, thanks!
<div><span class="gmail_quote">On 8/9/06, <b class="gmail_sendername">Lars Ellenberg</b> <<a href="mailto:Lars.Ellenberg@linbit.com">Lars.Ellenberg@linbit.com</a>> wrote:</span>
<blockquote class="gmail_quote" style="PADDING-LEFT: 1ex; MARGIN: 0px 0px 0px 0.8ex; BORDER-LEFT: #ccc 1px solid">/ 2006-08-09 17:50:26 -0400<br>\ Vampire D:<br>> I was looking for something like this, but super easy query or command.
<br>> Like Monty said, since it is block level most transactions may not "ramp" up<br>> enough to really show real throughput, but some way of "testing" what the<br>> devices are able to keep up would go a long way for benchmarking, scaling,
<br>> sizing, and making sure everything keeps running "up to snuff".<br><br>in the drbd tarball in the benchmark subdir,<br>we have a very simple tool called dm<br>(I don't remember why those letters).<br>you can also get it from
<br><a href="http://svn.drbd.org/drbd/trunk/benchmark/dm.c">http://svn.drbd.org/drbd/trunk/benchmark/dm.c</a><br>compile it: gcc -O2 -o dm dm.c<br><br>it is basically some variation on the "dd" tool,<br>but you can switch on "progress" and "throughput" output,

it is basically a variation on the "dd" tool, but you can switch on
"progress" and "throughput" output, and you can switch on fsync() before
close. it just does sequential io.

to benchmark WRITE throughput, you use it like this:
./dm -a 0 -b 1M -s 500M -y -m -p -o $out_file_or_device

this will print lots of 'R's (requested by "-m"; 500 of them, to be
exact, one for each "block" (-b) up to the requested "size" (-s)).
the first of those Rs will print very fast; if you request several gig,
you will see it "hang" for a short time every few "R"s, and finally it
will hang for quite a while (that's the fsync requested by -y).
finally it will tell you the overall throughput.
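
if you only have plain dd around, a rough stand-in for that write test
(assuming a GNU coreutils dd recent enough to know conv=fsync) would be:

dd if=/dev/zero of=$out_file_or_device bs=1M count=500 conv=fsync

conv=fsync makes dd fsync() before it exits, like dm's -y; you just
don't get the progress 'R's.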
<br>"virtual" throughput seen by most processes which don't use fsync.<br>but these are not very useful to figure out bottlenecks in the drbd<br>configuration and general setup.<br><br>you can tell dm where to write its data: use "-l 378G", and it will (try
<br>to) seek 378G into the device (file would probably result in a sparse<br>file, which is not of particular interest). so if you have one disk of<br>400G, and have one partition on it using 400G, you could benchmark the
<br>"inner" 10G, and the "outer" 10G by using different offsets here.<br><br>you will notice that the throughput differs significantly when using<br>inner or outer cylinders of your disks.<br><br>example run with "-b 1M -s 2M":

example run with "-b 1M -s 2M":
 RR
 10.48 MB/sec (2097152 B / 00:00.190802)

if you don't like the "R"s, leave off the -m ...

to measure local io bandwidth, you can use it directly on the lower
level device (or an equivalent dummy partition).
 !!this is destructive!!
 !!you will have to recreate a file system on that thing!!
./dm -a 0 -b 1M -s 500M -y -m -p -o /dev/vg00/dummy
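
if you have no spare partition handy, a scratch LV does the job just as
well (plain LVM2 syntax, assuming vg00 has 1G free):

lvcreate -L 1G -n dummy vg00

that is where a /dev/vg00/dummy as in the example above would come from.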

to measure local io bandwidth including file system overhead:
./dm -a 0 -b 1M -s 500M -y -m -p -o /mnt/dummy/dummy-out

to measure drbd performance in disconnected mode:
drbdadm disconnect dummy
./dm -a 0 -b 1M -s 500M -y -m -p -o /dev/drbd9

(be prepared for some additional latency: drbd housekeeping has to
remember which blocks are dirty now...)

... in connected mode:
drbdadm connect dummy
./dm -a 0 -b 1M -s 500M -y -m -p -o /dev/drbd9

still, the first write may be considerably slower than successive runs
of the same command, since the activity log will be "hot" after the
first one (as long as the size fits into the activity log completely).

... with a file system:
mkfs.xfs /dev/drbd9 ; mount /dev/drbd9 /mnt/drbd9-mount-point
./dm -a 0 -b 1M -s 500M -y -m -p -o /mnt/drbd9-mount-point/dummy-out

if you want to see the effect on power usage when writing 0xff instead
of 0x00, use "-a 0xff" :)

if you want to see the effect of the drbd activity log, use a size
considerably larger than what you configured as al-extents.
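
for scale: each activity log extent covers 4M, so the "hot" area is
al-extents * 4M. in drbd.conf that would look something like this
(drbd 0.7/8 style syncer section; 257 is only an example value):

resource dummy {
  syncer {
    al-extents 257;   # 257 * 4M, about 1G of "hot" area
  }
}

with that setting, a test size well above 1G makes the activity log
updates visible.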

maybe you want to run "watch -n1 cat /proc/drbd" at the same time, so
you can see the figures move: the pe go up, the lo go up sometimes, the
ap go up, the dw and ns increase all the time, the al increasing not too
often, and finally the pe, lo, and ap fall back to zero...

if you like, you could use
watch -n1 "cat /proc/drbd ; netstat -tn | grep -e ^Proto -e ':7788\>'"
which also shows you the drbd socket buffer usage, in case 7788 is your
drbd port. if you are curious, run this on both nodes.
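
(as a reading aid, my shorthand for those /proc/drbd counters: ns = net
data sent, dw = disk writes, al = activity log updates, pe = requests
sent to the peer but not yet answered, lo = requests handed to the local
disk but not yet completed, ap = application requests not yet answered.)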

to see the effect of resync on all that, you could invalidate one node
(causing a full sync) and benchmark again. then play with the sync rate
parameter.
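
concretely (resource name "dummy" as above; the rate value is only an
example):

drbdadm invalidate dummy   # declares local data inconsistent, forces a full resync

and in drbd.conf:

  syncer { rate 30M; }

then "drbdadm adjust dummy" to make the new rate take effect.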

to be somewhat more reliable, you should repeat each command several
times.

to benchmark READ throughput, you use:
./dm -o /dev/null -b 1M -s 500M -m -p -i /dev/sdx
./dm -o /dev/null -b 1M -s 500M -m -p -i /dev/drbd9
be careful: you need to use a size _considerably_ larger than your RAM,
or on the second run you will see the linux caching effects.

of course, you could also "shrink" the caches first. since 2.6.16, you
can do
echo 3 > /proc/sys/vm/drop_caches
to get clean read throughput results. before that, you can allocate and
use huge amounts of memory, like this:
perl -e '$x = "X" x (1024*1024*500)'
# this allocates and uses about 1 GB; it uses roughly twice as much as
# the number in the brackets... use as much as you have RAM (as long as
# you have some swap available) and the caches will shrink :)
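
one detail about drop_caches: it only drops clean pages, so flush dirty
data first, e.g.

sync; echo 3 > /proc/sys/vm/drop_caches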

or, even easier: you can just seek into the input device to some area
which is unlikely to have been read before:
./dm -o /dev/null -b 1M -s 500M -m -p -k 17G -i /dev/drbd9
"-k 17G" makes it seek 17 gig into the given input "file".

you will notice here, too, that read performance varies considerably
between the "inner" and "outer" cylinders.
this can be as gross as 50MB/sec at one end and 30MB/sec at the other.

you can also benchmark network throughput with dm, if you utilize
netcat. e.g.,
me@x# nc -l -p 54321 -q0 >/dev/null
me@y# dm -a 0 -b 1M -s 500M -m -p -y | nc x 54321 -q0
run two of them in reverse directions to see if your full duplex GigE
does what you think it should ...
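
the reverse direction would then be (the second port number is
arbitrary, just keep the two streams apart):

me@y# nc -l -p 54322 -q0 >/dev/null
me@x# dm -a 0 -b 1M -s 500M -m -p -y | nc y 54322 -q0

run both pairs at the same time to load both directions at once.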

...

you get the idea.

at least this is what we use to track down problems at customer
clusters. maybe sometime we will script something around it, but most of
the time we like the flexibility of using the tool directly. actually,
most of the time we rather use a "data set size" of 800M to 6G...

but be prepared for a slight degradation once you cross the size of the
activity log (al-extents parameter), as drbd then has to do synchronous
updates to its meta data area for every additional 4M.

--
: Lars Ellenberg                            Tel +43-1-8178292-0  :
: LINBIT Information Technologies GmbH      Fax +43-1-8178292-82 :
: Schoenbrunner Str. 244, A-1120 Vienna/Europe  http://www.linbit.com :
__
please use the "List-Reply" function of your email client.
_______________________________________________
drbd-user mailing list
drbd-user@lists.linbit.com
http://lists.linbit.com/mailman/listinfo/drbd-user

--
"Do the actors on Unsolved Mysteries ever get arrested because they look
just like the criminal they are playing?"

Christopher