[DRBD-user] Secondary node saturates RAID array

Florian Haas florian.haas at linbit.com
Wed Apr 9 18:27:24 CEST 2008

On Wednesday 09 April 2008 17:02:51 Joris van Rooij wrote:
> Hi,
>
> I'm using two identical machines, both packing 32GB of RAM, 16 Xeon
> cores and a battery-backed hardware RAID-6 setup (SUN STK using Adaptec
> AAC). Both machines are running Debian, Linux 2.6.24.3 and DRBD 8.0.11.
> The two machines are connected using a dedicated gigabit ethernet link
> (using an Intel 82571EB) with MTU set to 9000.

Nice setup. :-)

About your dd tests: while I admire your efforts, all of them are slightly 
misguided. Let me explain.

> Streamed dd to the local filesystem:
>
> # dd if=/dev/zero of=/tmp/testfile bs=4096 count=10000
> 10000+0 records in
> 10000+0 records out
> 40960000 bytes (41 MB) copied, 0.0920567 s, 445 MB/s

You're testing your memory and page cache here, not your I/O subsystem.
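
A minimal sketch of a variant whose timing at least includes the flush to 
disk, assuming GNU dd (conv=fdatasync is a GNU coreutils option). The 
function name is hypothetical and only there for illustration:

```shell
#!/bin/sh
# Hypothetical helper: write COUNT blocks of BS bytes to TARGET and have dd
# call fdatasync() on the output file before it exits, so the elapsed time
# it reports includes flushing the page cache to the device.
buffered_write_with_flush() {
    target=$1; bs=$2; count=$3
    dd if=/dev/zero of="$target" bs="$bs" count="$count" conv=fdatasync
}
```

For example, "buffered_write_with_flush /tmp/testfile 4096 10000" repeats the 
test above with the final flush counted in the result.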

> Synced dd to the local filesystem:
>
> # dd if=/dev/zero of=/tmp/testfile bs=4096 count=10000 oflag=dsync
> 10000+0 records in
> 10000+0 records out
> 40960000 bytes (41 MB) copied, 5.39007 s, 7.6 MB/s

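This is better (oflag=dsync). However, with the block size and count you 
selected, you're mixing up a throughput and a latency measurement: each 4 KiB 
write has to complete before the next one is issued, so 10000 writes in 
5.39 s mostly tells you that a single synchronous write takes about 0.54 ms. 
Can you re-run with bs=1G and count=1, then repeat that 3 times to get a 
reasonable average?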
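
A rough sketch of how the repeated run and the averaging could be scripted, 
assuming GNU dd and awk. The function name is hypothetical, and the parsing 
relies on dd's usual "... copied, N s, M MB/s" summary line:

```shell
#!/bin/sh
# Hypothetical helper: run a synchronous dd write RUNS times and print the
# mean throughput in MB/s, computed from the byte count and the elapsed
# seconds on dd's own summary line.
dd_sync_average() {
    target=$1; bs=$2; count=$3; runs=$4
    i=0
    while [ "$i" -lt "$runs" ]; do
        # dd prints its statistics on stderr; keep only the summary line.
        LC_ALL=C dd if=/dev/zero of="$target" bs="$bs" count="$count" \
            oflag=dsync 2>&1 | grep 'copied'
        i=$((i + 1))
    done | awk -F', ' '
        { bytes = $1 + 0            # "1073741824 bytes (...)" -> number
          secs  = $(NF - 1) + 0     # "2.5 s" -> 2.5
          if (secs > 0) { sum += bytes / secs / 1e6; n++ } }
        END { if (n > 0) printf "%.1f MB/s average over %d runs\n", sum / n, n }'
}
```

The run suggested above would then be "dd_sync_average /tmp/testfile 1G 1 3".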

Sadly, you repeated these errors in all your other test runs, so I'm 
afraid you'll have to re-run those as well.

> Some snippets of configuration (from drbdsetup show):
> disk {
>         size                    0s _is_default; # bytes
>         on-io-error             detach;
>         fencing                 resource-only;
> }
> net {
>         timeout                 60 _is_default; # 1/10 seconds
>         max-epoch-size          16000;
>         max-buffers             16000;
>         unplug-watermark        128 _is_default;
>         connect-int             10 _is_default; # seconds
>         ping-int                10 _is_default; # seconds
>         sndbuf-size             524288; # bytes
>         ko-count                0 _is_default;
>         cram-hmac-alg           "sha1";
>         shared-secret           :);
>         after-sb-0pri           disconnect _is_default;
>         after-sb-1pri           disconnect _is_default;
>         after-sb-2pri           disconnect _is_default;
>         rr-conflict             disconnect _is_default;
>         ping-timeout            5 _is_default; # 1/10 seconds
> }
> syncer {
>         rate                    51200k; # bytes/second
>         after                   -1 _is_default;
>         al-extents              257;

This is _extremely_ low for your I/O subsystem. Try 1801, or even 3389.
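
To put those numbers in perspective, here is a small sketch of how much 
backing storage each setting keeps "hot" in the activity log. It assumes 
that one activity-log extent covers 4 MiB, as described in the DRBD 8 
documentation; the function name is hypothetical:

```shell
#!/bin/sh
# Assumption: each activity-log extent covers 4 MiB of backing storage.
al_hot_area_mib() {
    echo $(( $1 * 4 ))
}

# Compare the current setting with the two suggested values.
for n in 257 1801 3389; do
    echo "al-extents $n -> $(al_hot_area_mib "$n") MiB"
done
```

Under that assumption, 257 extents cover 1028 MiB, 1801 cover 7204 MiB, and 
3389 cover 13556 MiB.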

Please re-run your tests considering the suggestions I made above, and we'll 
go from there.

Cheers,
Florian


-- 
: Florian G. Haas
: LINBIT Information Technologies GmbH
: Vivenotgasse 48, A-1120 Vienna, Austria

When replying, there is no need to CC my personal address.
I monitor the list on a daily basis. Thank you.
