[DRBD-user] Secondary node saturates RAID array

Joris van Rooij jorrizza at wasda.nl
Thu Apr 10 17:22:36 CEST 2008

Note: "permalinks" may not be as permanent as we would like,
direct links of old sources may well be a few messages off.


On Wednesday 09 April 2008 18:27:24 Florian Haas wrote:
> About your dd tests: while I admire your efforts, all of them are slightly
> misled. Let me explain.
>
> You're testing your memory and page cache here, not your I/O subsystem.

I must admit I've blatantly copied them from earlier posts on this list. 
Thanks for the heads-up.

> This is better (oflag=dsync), however by the block size and count you
> selected, you're mixing up a throughput and latency measurement. Can you
> re-run with bs=1G and count=1, then repeat that 3 times to get some
> reasonable average?
>
> Sadly, you reduplicated these errors for all your other test runs, so I'm
> afraid you'll have to re-run those as well.
> > syncer {
> >         rate                    51200k; # bytes/second
> >         after                   -1 _is_default;
> >         al-extents              257;
>
> This is _extremely_ low for your I/O subsystem. Try 1801, or even 3389.
>
> Please re-run your tests considering the suggestions I made above, and
> we'll go from there.

Here are the results using the suggestions (al-extents is 3389):
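
For completeness, the syncer section now reads (nothing else changed from the
config quoted above):

syncer {
        rate                    51200k; # bytes/second
        after                   -1 _is_default;
        al-extents              3389;
}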

Local filesystem:
# for try in 1 2 3; do echo $try; dd if=/dev/zero of=/tmp/testfile bs=1G \
  count=1 oflag=dsync; sleep 10; done
1
1+0 records in
1+0 records out
1073741824 bytes (1.1 GB) copied, 7.28003 s, 147 MB/s
2
1+0 records in
1+0 records out
1073741824 bytes (1.1 GB) copied, 5.68388 s, 189 MB/s
3
1+0 records in
1+0 records out
1073741824 bytes (1.1 GB) copied, 7.28915 s, 147 MB/s

DRBD disconnected:
# for try in 1 2 3; do echo $try; dd if=/dev/zero of=/mnt/test/testfile bs=1G \
  count=1 oflag=dsync; sleep 10; done
1
1+0 records in
1+0 records out
1073741824 bytes (1.1 GB) copied, 7.00449 s, 153 MB/s
2
1+0 records in
1+0 records out
1073741824 bytes (1.1 GB) copied, 6.10559 s, 176 MB/s
3
1+0 records in
1+0 records out
1073741824 bytes (1.1 GB) copied, 7.6466 s, 140 MB/s

DRBD connected:
# for try in 1 2 3; do echo $try; dd if=/dev/zero of=/mnt/test/testfile bs=1G \
  count=1 oflag=dsync; sleep 10; done
1
1+0 records in
1+0 records out
1073741824 bytes (1.1 GB) copied, 71.8806 s, 14.9 MB/s
2
1+0 records in
1+0 records out
1073741824 bytes (1.1 GB) copied, 70.4868 s, 15.2 MB/s
3
1+0 records in
1+0 records out
1073741824 bytes (1.1 GB) copied, 25.8466 s, 41.5 MB/s

Because of that last peak I ran it again:
# for try in 1 2 3; do echo $try; dd if=/dev/zero of=/mnt/test/testfile bs=1G \
  count=1 oflag=dsync; sleep 10; done
1
1+0 records in
1+0 records out
1073741824 bytes (1.1 GB) copied, 24.642 s, 43.6 MB/s
2
1+0 records in
1+0 records out
1073741824 bytes (1.1 GB) copied, 25.9565 s, 41.4 MB/s
3
1+0 records in
1+0 records out
1073741824 bytes (1.1 GB) copied, 23.8683 s, 45.0 MB/s

So it needs some time to get going. I've set the unplug-watermark to 16, which
had no significant effect. Increasing it in steps all the way up to 16000
didn't change much. Neither did tasksetting the drbdN_* processes. I have to
reboot the entire machine to get the slow startup result back. I guess it's
safe to say it's the Sun STK RAID device doing its black voodoo magic.
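
For reference, those experiments were roughly along these lines (DRBD 8.x
syntax; I'm assuming unplug-watermark belongs in the net section, and CPU 0
and device 0 are only examples -- the second command pins the drbd0_* kernel
threads to a single CPU):

net {
        unplug-watermark        16;     # also tried steps up to 16000
}

# for pid in $(pgrep drbd0_); do taskset -pc 0 $pid; done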

The throughput is quite reasonable now, but the problem still exists, I'm
afraid. MySQL has its InnoDB storage on top of one of the DRBD devices. SELECT
statements are fast as f*ck, but INSERTs start locking up when executed in
quick bursts. Simple serial INSERT queries take ~0.2 seconds apiece on an
otherwise idle system. Again, the secondary node gets saturated quickly. I've
already mounted the partition with the noatime flag. This increased the speed,
but not as dramatically as I had hoped.
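
For reference, the remount and the kind of serial INSERT test I'm talking
about look roughly like this (mount point, database and table names are just
placeholders; each mysql call here also includes client startup time):

# mount -o remount,noatime /mnt/test
# for i in $(seq 1 10); do time mysql test \
    -e "INSERT INTO t (v) VALUES ($i)"; done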

Where should I look for the solution? Is it DRBD, or MySQL/InnoDB's locking
behaviour? It's not so much the throughput as the latency that's causing the
slowdown.

It's not the network, that's for sure... I think.

Thanks for the help so far.

-- 
Greetings,
Joris


