Note: "permalinks" may not be as permanent as we would like,
direct links of old sources may well be a few messages off.
/ 2006-08-09 17:50:26 -0400
\ Vampire D:
> I was looking for something like this, but super easy query or command.
> Like Monty said, since it is block level most transactions may not "ramp"
> up enough to really show real throughput, but some way of "testing" what
> the devices are able to keep up with would go a long way for benchmarking,
> scaling, sizing, and making sure everything keeps running "up to snuff".

in the drbd tarball, in the benchmark subdir, we have a very simple tool
called dm (I don't remember why those letters). you can also get it from
http://svn.drbd.org/drbd/trunk/benchmark/dm.c

compile it:
  gcc -O2 -o dm dm.c

it is basically a variation on the "dd" tool, but you can switch on
"progress" and "throughput" output, and you can switch on fsync() before
close. it just does sequential io.

to benchmark WRITE throughput, you use it like this:
  ./dm -a 0 -b 1M -s 500M -y -m -p -o $out_file_or_device

this will print lots of 'R's (requested by "-m"; 500 of them, to be exact,
one for each "block" (-b) up to the requested "size" (-s)). the first of
those Rs will print very fast; if you request several gig, you will see it
"hang" for a short time every few "R"s, and at the end it will hang for
quite a while (that's the fsync requested by -y). finally it will tell you
the overall throughput.

if you leave off the fsync (-y), you will get very fast writes, as long as
they fit into some of the involved caches... this would be the "virtual"
throughput seen by most processes which don't use fsync, but these numbers
are not very useful for figuring out bottlenecks in the drbd configuration
and general setup.

you can tell dm where to write its data: use "-l 378G", and it will (try
to) seek 378G into the device (on a regular file this would probably just
create a sparse file, which is not of particular interest). so if you have
a 400G disk with a single partition using all of it, you could benchmark
the "inner" 10G and the "outer" 10G by using different offsets here. you
will notice that the throughput differs significantly between the inner
and outer cylinders of your disks.

example run with "-b 1M -s 2M":
  RR 10.48 MB/sec (2097152 B / 00:00.190802)

if you don't like the "R"s, leave off the -m ...

to measure local io bandwidth, you can use it directly on the lower level
device (or an equivalent dummy partition).
!! this is destructive !!
!! you will have to recreate a file system on that thing !!
  ./dm -a 0 -b 1M -s 500M -y -m -p -o /dev/vg00/dummy

to measure local io bandwidth including file system overhead:
  ./dm -a 0 -b 1M -s 500M -y -m -p -o /mnt/dummy/dummy-out

to measure drbd performance in disconnected mode:
  drbdadm disconnect dummy
  ./dm -a 0 -b 1M -s 500M -y -m -p -o /dev/drbd9
(be prepared for some additional latency: drbd housekeeping now has to
remember which blocks are dirty...)

... in connected mode:
  drbdadm connect dummy
  ./dm -a 0 -b 1M -s 500M -y -m -p -o /dev/drbd9
still, the first write may be considerably slower than successive runs of
the same command, since the activity log will be "hot" after the first one
(as long as the size fits into the activity log completely).

... with a file system:
  mkfs.xfs /dev/drbd9 ; mount /dev/drbd9 /mnt/drbd9-mount-point
  ./dm -a 0 -b 1M -s 500M -y -m -p -o /mnt/drbd9-mount-point/dummy-out

if you want to see the effect on power usage when writing 0xff instead of
0x00, use "-a 0xff" :)

if you want to see the effect of the drbd activity log, use a size
considerably larger than what you configured as al-extents.
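to compare the inner and outer regions mentioned above in one go, a
throwaway wrapper around the write benchmark could look something like
this (a sketch only: it assumes dm was compiled into the current directory
as above, that DEV is a scratch device you are allowed to destroy, and the
offsets are made-up examples for a ~400G disk):

  #!/bin/sh
  # DESTRUCTIVE: writes 500M of zeros at several offsets of $DEV
  # and lets dm report the throughput for each region.
  DEV=/dev/vg00/dummy                  # scratch device -- example name only
  for offset in 1G 100G 200G 390G; do  # arbitrary offsets for a ~400G disk
      echo "=== write benchmark at offset $offset ==="
      # -l seeks that far into the output device before writing,
      # -y forces an fsync before close so caches don't flatter the numbers
      ./dm -a 0 -b 1M -s 500M -y -m -p -l $offset -o $DEV
  done

as with the single runs, repeating each offset a few times and comparing
the numbers makes the result more trustworthy.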
maybe you want to use
  watch -n1 cat /proc/drbd
at the same time, so you can see the figures move: the pe go up, the lo go
up sometimes, the ap go up, the dw and ns increase all the time, the al
increasing not too often, and finally the pe, lo, and ap falling back to
zero...

if you like, you could use
  watch -n1 "cat /proc/drbd ; netstat -tn | grep -e ^Proto -e ':7788\>'"
which would also show you the drbd socket buffer usage, in case 7788 is
your drbd port. if you are curious, you should run this on both nodes.

to see the effect of resync on all this, you could invalidate one node
(causing a full sync) and benchmark again. then play with the sync rate
parameter.

to be somewhat more reliable, you should repeat each command several times.

to benchmark READ throughput, you use
  ./dm -o /dev/null -b 1M -s 500M -m -p -i /dev/sdx
  ./dm -o /dev/null -b 1M -s 500M -m -p -i /dev/drbd9

be careful: you need to use a size _considerably_ larger than your RAM, or
you will see the linux caching effects on the second run. of course, you
could also "shrink" the caches first. to do so, since 2.6.16, you can
  echo 3 > /proc/sys/vm/drop_caches
to get clean read throughput results. before that, you can allocate and
use huge amounts of memory, like this:
  perl -e '$x = "X" x (1024*1024*500)'
  # this would allocate and use about 1 GB; it uses about twice as much as
  # you say in those brackets... use as much as you have RAM (as long as
  # you have some swap available) and the caches will shrink :)

or, even easier: you can just seek into the input device, to some area
that is unlikely to have been read before:
  ./dm -o /dev/null -b 1M -s 500M -m -p -k 7G -i /dev/drbd9
"-k 7G" makes it seek 7 gig into the given input "file". you will notice
here, too, that read performance varies considerably between the "inner"
and "outer" cylinders; the difference can be as large as 50MB/sec inner
vs. 30MB/sec outer.

you can also benchmark network throughput with dm, if you combine it with
netcat. e.g.,
  me@x# nc -l -p 54321 -q0 >/dev/null
  me@y# dm -a 0 -b 1M -s 500M -m -p -y | nc x 54321 -q0
run two of them in opposite directions to see whether your full duplex
GigE does what you think it should ...

... you get the idea.

at least this is what we use to track down problems at customer clusters.
maybe sometime we will script something around it, but most of the time we
like the flexibility of using the tool directly. actually, most of the
time we rather use a "data set size" of 800M to 6G... but be prepared for
a slight degradation once you cross the size of the activity log
(al-extents parameter), as then drbd has to do synchronous updates to its
meta data area for every additional 4M.

-- 
: Lars Ellenberg                                Tel +43-1-8178292-0  :
: LINBIT Information Technologies GmbH          Fax +43-1-8178292-82 :
: Schoenbrunner Str. 244, A-1120 Vienna/Europe  http://www.linbit.com :

__
please use the "List-Reply" function of your email client.
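a similar throwaway sketch for the READ throughput procedure above: it
drops the page cache before each run (kernel 2.6.16 or later, run as root)
and repeats the measurement a few times; the device name and the repeat
count are example values only:

  #!/bin/sh
  # non-destructive: reads 500M sequentially from $DEV a few times,
  # dropping the page cache first so the numbers are not flattered
  # by previously cached data.
  DEV=/dev/drbd9                        # example device name
  for run in 1 2 3; do                  # example repeat count
      sync
      echo 3 > /proc/sys/vm/drop_caches # needs kernel 2.6.16+ and root
      echo "=== read benchmark run $run ==="
      ./dm -o /dev/null -b 1M -s 500M -m -p -i $DEV
  done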