[DRBD-user] Full write speed on local disks and network but slow write-speed on drbd

Mon Oct 29 15:19:20 CET 2012

Hi Sebastian,

Sorry for getting back to you only now. I ran the commands suggested in the link on both the local partition and the DRBD partition.

The output of blktrace / blkparse of the local partition:
tom at hydra04 [1609]:~/blkparse_dm-4_local$ sudo blktrace /dev/vg0/test-io -b 4096 &
[1] 8373
tom at hydra04 [1610]:~/blkparse_dm-4_local$ pid=$!
tom at hydra04 [1611]:~/blkparse_dm-4_local$ sudo dd if=/dev/zero of=/dev/vg0/test-io bs=1M 
dd: writing `/dev/vg0/test-io': No space left on device
20481+0 records in
20480+0 records out
21474836480 bytes (21 GB) copied, 29.1139 s, 738 MB/s
tom at hydra04 [1612]:~/blkparse_dm-4_local$ sudo kill -2 $pid
tom at hydra04 [1613]:~/blkparse_dm-4_local$ === dm-4 ===
  CPU  0:              3941517 events,   184759 KiB data
  CPU  1:               300485 events,    14086 KiB data
  CPU  2:                63195 events,     2963 KiB data
  CPU  3:                19456 events,      912 KiB data
  CPU  4:                31830 events,     1493 KiB data
  CPU  5:                    0 events,        0 KiB data
  CPU  6:              2900676 events,   135970 KiB data
  CPU  7:               263447 events,    12350 KiB data
  CPU  8:                88587 events,     4153 KiB data
  CPU  9:                16105 events,      755 KiB data
  CPU 10:                 8225 events,      386 KiB data
  CPU 11:                 1856 events,       87 KiB data
  CPU 12:               169875 events,     7963 KiB data
  CPU 13:                71106 events,     3334 KiB data
  CPU 14:                    0 events,        0 KiB data
  CPU 15:                    0 events,        0 KiB data
  CPU 16:                 9416 events,      442 KiB data
  CPU 17:                 4507 events,      212 KiB data
  CPU 18:                85902 events,     4027 KiB data
  CPU 19:                 7458 events,      350 KiB data
  CPU 20:                    0 events,        0 KiB data
  CPU 21:                    0 events,        0 KiB data
  CPU 22:                    0 events,        0 KiB data
  CPU 23:                    0 events,        0 KiB data
  Total:               7983643 events (dropped 0),   374234 KiB data
tom at hydra04 [1613]:~/blkparse_dm-4_local$ blkparse dm-4.blktrace.0 | head
254,4   13        1     0.000000000  8462  Q   W 0 + 8 [flush-254:4]
254,4   13        2     0.000027002  8462  Q   W 8 + 8 [flush-254:4]
254,4   13        3     0.000032502  8462  Q   W 16 + 8 [flush-254:4]
254,4   13        4     0.000035940  8462  Q   W 24 + 8 [flush-254:4]
254,4   13        5     0.000039489  8462  Q   W 32 + 8 [flush-254:4]
254,4   13        6     0.000042720  8462  Q   W 40 + 8 [flush-254:4]
254,4   13        7     0.000046493  8462  Q   W 48 + 8 [flush-254:4]
254,4   13        8     0.000050135  8462  Q   W 56 + 8 [flush-254:4]
254,4   13        9     0.000053015  8462  Q   W 64 + 8 [flush-254:4]
254,4   13       10     0.000055812  8462  Q   W 72 + 8 [flush-254:4]
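
Since the question was whether the IOs are too small, here is a rough sketch for tallying the request sizes of the queue (Q) events in blkparse output. The function name q_sizes is made up for illustration, and the field positions assume the default blkparse output format shown above, with sizes given in 512-byte sectors.

```shell
# Count Q (queue) events per request size.
# Assumed default blkparse columns:
#   dev cpu seq timestamp pid action rwbs sector + size [process]
q_sizes() {
  awk '$6 == "Q" && $9 == "+" { n[$10]++ }
       END {
         for (s in n)
           printf "%d sectors (%.1f KiB): %d requests\n", s, s * 512 / 1024, n[s]
       }' | sort -n
}

# Usage (file name as produced by the run above):
#   blkparse dm-4.blktrace.0 | q_sizes
```

In the ten lines shown, every request is "+ 8", i.e. 8 sectors or 4 KiB.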

The output of blktrace / blkparse on DRBD (which uses the logical volume from above as the backing device):

tom at hydra04 [1616]:~/blkparse_drbd5_drbd$ sudo blktrace /dev/drbd/by-res/test-io -b 4096 &
[1] 10529
tom at hydra04 [1617]:~/blkparse_drbd5_drbd$ pid=$!
tom at hydra04 [1618]:~/blkparse_drbd5_drbd$ sudo dd if=/dev/zero of=/dev/drbd/by-res/test-io bs=1M
dd: writing `/dev/drbd/by-res/test-io': No space left on device
20480+0 records in
20479+0 records out
21474144256 bytes (21 GB) copied, 201.519 s, 107 MB/s
tom at hydra04 [1619]:~/blkparse_drbd5_drbd$ sudo kill -2 $pid
tom at hydra04 [1620]:~/blkparse_drbd5_drbd$ === drbd5 ===
  CPU  0:                60048 events,     2815 KiB data
  CPU  1:               699195 events,    32775 KiB data
  CPU  2:               642751 events,    30129 KiB data
  CPU  3:               637215 events,    29870 KiB data
  CPU  4:               483730 events,    22675 KiB data
  CPU  5:               367882 events,    17245 KiB data
  CPU  6:               533295 events,    24999 KiB data
  CPU  7:               106050 events,     4972 KiB data
  CPU  8:                20642 events,      968 KiB data
  CPU  9:                 7326 events,      344 KiB data
  CPU 10:                 1034 events,       49 KiB data
  CPU 11:                 1254 events,       59 KiB data
  CPU 12:                 7606 events,      357 KiB data
  CPU 13:                 9830 events,      461 KiB data
  CPU 14:                 2369 events,      112 KiB data
  CPU 15:                    0 events,        0 KiB data
  CPU 16:                 7022 events,      330 KiB data
  CPU 17:                    0 events,        0 KiB data
  CPU 18:                89473 events,     4195 KiB data
  CPU 19:                77342 events,     3626 KiB data
  CPU 20:                    0 events,        0 KiB data
  CPU 21:                    0 events,        0 KiB data
  CPU 22:                    0 events,        0 KiB data
  CPU 23:                    0 events,        0 KiB data
  Total:               3754064 events (dropped 0),   175972 KiB data
tom at hydra04 [1620]:~/blkparse_drbd5_drbd$ blkparse drbd5.blktrace.0 | head
147,5   13        1     0.000000000 10639  Q   W 0 + 8 [flush-147:5]
147,5   13        2     0.099943986 10639  Q   W 8 + 8 [flush-147:5]
147,5   13        3     0.099954085 10639  Q   W 16 + 8 [flush-147:5]
147,5   13        4     0.099959347 10639  Q   W 24 + 8 [flush-147:5]
147,5   13        5     0.099963852 10639  Q   W 32 + 8 [flush-147:5]
147,5   13        6     0.099969408 10639  Q   W 40 + 8 [flush-147:5]
147,5   13        7     0.099974688 10639  Q   W 48 + 8 [flush-147:5]
147,5   13        8     0.099980950 10639  Q   W 56 + 8 [flush-147:5]
147,5   13        9     0.099986854 10639  Q   W 64 + 8 [flush-147:5]
147,5   13       10     0.099992100 10639  Q   W 72 + 8 [flush-147:5]
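
To compare the timing of the two traces, the gap between consecutive queue (Q) events can be computed from the timestamp column. This is only a sketch: the function name q_gaps is hypothetical, and it assumes the default blkparse output format with the timestamp in the fourth field and the action in the sixth.

```shell
# Print the time gap in seconds between consecutive Q (queue) events.
# Assumed default blkparse columns:
#   dev cpu seq timestamp pid action rwbs sector + size [process]
q_gaps() {
  awk '$6 == "Q" {
         if (seen) printf "%.9f\n", $4 - prev   # gap to the previous Q event
         prev = $4
         seen = 1
       }'
}

# Usage (file name as produced by the run above):
#   blkparse drbd5.blktrace.0 | q_gaps | head
```

For the lines shown above, the first gap on DRBD is about 0.1 s, against about 0.000027 s on the local logical volume.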

There were 24 files produced by the blktrace command. As you can see, I ran blkparse on the first file only. The RAID controller has read and write caching enabled; I'm not sure whether this makes a difference.
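
For completeness, blkparse can also read all per-CPU files of a device in one pass when given the base name with -i instead of a single file. A minimal sketch, where the wrapper name parse_all_cpus is only an illustration and the base name is assumed to match the files blktrace wrote (drbd5.blktrace.0 and so on):

```shell
# Parse every per-CPU trace file of one device in a single pass.
# $1 is the base name blktrace used for its output files, e.g. "drbd5"
# for drbd5.blktrace.0, drbd5.blktrace.1, ...
parse_all_cpus() {
  blkparse -i "$1"
}

# Usage, from the directory holding the trace files:
#   parse_all_cpus drbd5 | head
```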

Thanks and warm regards,

Tom

-------- Original Message --------
> Date: Tue, 23 Oct 2012 17:09:25 +0200
> From: Sebastian Riemer <sebastian.riemer at profitbricks.com>
> To: Tom Fernandes <anyaddress at gmx.net>
> CC: drbd-user at lists.linbit.com
> Subject: Re: [DRBD-user] Full write speed on local disks and network but slow write-speed on drbd

> Hi Tom,
> 
> there was already a similar discussion. It would be a big help if you
> trace your IOs in the blkio layer.
> 
> http://www.mail-archive.com/drbd-user@lists.linbit.com/msg06705.html
> 
> Use the tool "blktrace" for that. If IOs are too small, then this has
> the biggest impact on performance. E.g. software bugs can cause IOs to
> be too small.
> 
> Cheers,
> Sebastian
> 
> -- 
> Sebastian Riemer
> Linux Kernel Developer - Storage
> 
> We are looking for (SENIOR) LINUX KERNEL DEVELOPERS!
> 
> ProfitBricks GmbH • Greifswalder Str. 207 • 10405 Berlin, Germany
> www.profitbricks.com • sebastian.riemer at profitbricks.com
> 
> Registered office: Berlin
> Court of registration: Amtsgericht Charlottenburg, HRB 125506 B
> Managing directors: Andreas Gauger, Achim Weiss
> 
> 
> On 23.10.2012 12:12, Tom Fernandes wrote:
> > I'm experiencing a strange and interesting issue. I have a two-node-cluster
> > with write-speed on a local partition of ~500MB/s and network throughput of
> > ~220MB/s. But when writing to the drbd-partition I get only ~60MB/s. Another
> > interesting thing is, that the initial-sync-speed reported by "watch cat
> > /proc/drbd" is roughly 215MB/s. Also the seeks/s are a lot slower on the
> > drbd-partition - which makes me worry as this cluster is also meant to run a
> > database.
> 
>