Note: "permalinks" may not be as permanent as we would like,
direct links of old sources may well be a few messages off.
Brian,
I'm in the middle of tuning according to the users guide and to Florian's blog post: http://fghaas.wordpress.com/2007/06/22/performance-tuning-drbd-setups/
To speed things up, could you post the settings you used for:
1. al-extents
2. sndbuf-size
3. unplug-watermark
4. max-buffers
5. max-epoch-size
Also, what MTU are you using, the standard 1500, or 9000, or something else?
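
For context, here is the shape of what I'm experimenting with. The values
below are only placeholder starting points lifted from the tuning docs, not
anything I'd claim is right for this hardware, and eth1 just stands in for
whatever the replication interface actually is:

resource r0 {
  syncer {
    al-extents 3389;
  }
  net {
    sndbuf-size 512k;
    max-buffers 8000;
    max-epoch-size 8000;
    unplug-watermark 16;
  }
}

# on both nodes, and only if the switch supports jumbo frames:
ip link set dev eth1 mtu 9000
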
Thank you SOOOO much in advance!
~Noah
On Jun 21, 2011, at 10:31 AM, Brian R. Hellman wrote:
> Just for the record, I'm currently working on a system that achieves
> 1.2 GB/s, maxing out the 10 GbE connection, so DRBD can perform that
> well. You just have to tune it to do so.
>
> Check out this section of the users guide:
> http://www.drbd.org/users-guide/ch-throughput.html
>
> Good luck,
> Brian
>
> On 06/21/2011 05:51 AM, Noah Mehl wrote:
>> Yes,
>>
>> But I was getting the same performance with the nodes in
>> Standalone/Primary. Also, if the lower-level physical device and the
>> network link both perform at 3x that rate, then what's the bottleneck?
>> Is this the kind of performance loss I should expect from DRBD?
>>
>> ~Noah
>>
>> On Jun 21, 2011, at 2:29 AM, Robert.Koeppl at knapp.com wrote:
>>
>>>
>>> Hi!
>>> You are getting about 4 Gbit/s of actual throughput (roughly 460 MB/s
>>> x 8 bits/byte), which is not that bad, but could be better. 1.25 GB/s
>>> would be the theoretical maximum of your interlink, before any
>>> protocol overhead or latency.
>>> Best Regards
>>>
>>> Robert Köppl
>>>
>>> System Administration
>>> KNAPP Systemintegration GmbH
>>> Waltenbachstraße 9
>>> 8700 Leoben, Austria
>>> Phone: +43 3842 805-910
>>> Fax: +43 3842 82930-500
>>> robert.koeppl at knapp.com
>>> www.KNAPP.com
>>>
>>> Commercial register number: FN 138870x
>>> Commercial register court: Leoben
>>>
>>> From: Noah Mehl <noah at tritonlimited.com>
>>> Sent by: drbd-user-bounces at lists.linbit.com
>>> Date: 21.06.2011 03:30
>>> To: "drbd-user at lists.linbit.com" <drbd-user at lists.linbit.com>
>>> Subject: Re: [DRBD-user] Poor DRBD performance, HELP!
>>>
>>> On Jun 20, 2011, at 6:06 AM, Cristian Mammoli - Apra Sistemi wrote:
>>>
>>>> On 06/20/2011 07:16 AM, Noah Mehl wrote:
>>>>>
>>>>>
>>>>> On Jun 20, 2011, at 12:39 AM, Noah Mehl wrote:
>>>>>
>>>>>> On Jun 18, 2011, at 2:27 PM, Florian Haas wrote:
>>>>>>
>>>>>>> On 06/17/2011 05:04 PM, Noah Mehl wrote:
>>>>>>>> Below is the script I ran to do the performance testing. I
>>>>>>>> basically took the script from the user guide and removed the
>>>>>>>> oflag=direct,
>>>>>>>
>>>>>>> ... which means that dd wrote to your page cache (read: RAM). At
>>>>>>> this point, you started kidding yourself about your performance.
>>>>>>
>>>>>> I do have a question here: the total size of the dd write was
>>>>>> 64 GB, twice the amount of system RAM; does this still apply?
>>>>>>
>>>>>>>
>>>>>>>> because when it was in there, it brought the performance down
>>>>>>>> to 26 MB/s (not really my focus here, but maybe related?).
>>>>>>>
>>>>>>> "Related" doesn't begin to describe it.
>>>>>>>
>>>>>>> Rerun the tests with oflag=direct and then repost them.
>>>>>>
>>>>>> Florian,
>>>>>>
>>>>>> I apologize for posting again without seeing your reply. I took
>>>>>> the script directly from the user guide:
>>>>>>
>>>>>> #!/bin/bash
>>>>>> TEST_RESOURCE=r0
>>>>>> TEST_DEVICE=$(drbdadm sh-dev $TEST_RESOURCE)
>>>>>> TEST_LL_DEVICE=$(drbdadm sh-ll-dev $TEST_RESOURCE)
>>>>>> drbdadm primary $TEST_RESOURCE
>>>>>> for i in $(seq 5); do
>>>>>> dd if=/dev/zero of=$TEST_DEVICE bs=512M count=1 oflag=direct
>>>>>> done
>>>>>> drbdadm down $TEST_RESOURCE
>>>>>> for i in $(seq 5); do
>>>>>> dd if=/dev/zero of=$TEST_LL_DEVICE bs=512M count=1 oflag=direct
>>>>>> done
>>>>>>
>>>>>> Here are the results:
>>>>>>
>>>>>> 1+0 records in
>>>>>> 1+0 records out
>>>>>> 536870912 bytes (537 MB) copied, 0.911252 s, 589 MB/s
>>>> [...]
>>>>
>>>> If your controller has a BBU, change the write policy to writeback
>>>> and disable flushes in your drbd.conf.
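>>>>
>>>> In drbd.conf terms, disabling the flushes would be something like the
>>>> following (these are the DRBD 8.3 option names; only do this with a
>>>> working BBU):
>>>>
>>>> disk {
>>>>   no-disk-flushes;
>>>>   no-md-flushes;
>>>> }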
>>>>
>>>> HTH
>>>>
>>>> --
>>>> Cristian Mammoli
>>>> APRA SISTEMI srl
>>>> Via Brodolini,6 Jesi (AN)
>>>> tel dir. +390731719822
>>>>
>>>> Web: www.apra.it
>>>> E-mail: c.mammoli at apra.it
>>>
>>> After taking many users' suggestions into account, here's where I am
>>> now. I've run iperf between the machines:
>>>
>>> [root at storageb ~]# iperf -c 10.0.100.241
>>> ------------------------------------------------------------
>>> Client connecting to 10.0.100.241, TCP port 5001
>>> TCP window size: 27.8 KByte (default)
>>> ------------------------------------------------------------
>>> [ 3] local 10.0.100.242 port 57982 connected with 10.0.100.241 port 5001
>>> [ ID] Interval Transfer Bandwidth
>>> [ 3] 0.0-10.0 sec 11.5 GBytes 9.86 Gbits/sec
>>>
>>> As you can see, the network connection between the machines should
>>> not be a bottleneck, unless I'm running the wrong test or running it
>>> the wrong way. Comments are definitely welcome here.
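>>>
>>> If it would help, I can also re-run it with a larger TCP window and
>>> parallel streams to rule out a single-stream limit, something like
>>> this (the 512k window is just an arbitrary example):
>>>
>>> iperf -s -w 512k                          # on 10.0.100.241
>>> iperf -c 10.0.100.241 -w 512k -P 4 -t 30  # on 10.0.100.242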
>>>
>>> I updated my resource config to remove flushes, because my controller
>>> is set to writeback:
>>>
>>> # begin resource drbd0
>>> resource r0 {
>>>   protocol C;
>>>
>>>   disk {
>>>     no-disk-flushes;
>>>     no-md-flushes;
>>>   }
>>>
>>>   startup {
>>>     wfc-timeout 15;
>>>     degr-wfc-timeout 60;
>>>   }
>>>
>>>   net {
>>>     allow-two-primaries;
>>>     after-sb-0pri discard-zero-changes;
>>>     after-sb-1pri discard-secondary;
>>>     after-sb-2pri disconnect;
>>>   }
>>>   syncer {
>>>   }
>>>   on storagea {
>>>     device /dev/drbd0;
>>>     disk /dev/sda1;
>>>     address 10.0.100.241:7788;
>>>     meta-disk internal;
>>>   }
>>>   on storageb {
>>>     device /dev/drbd0;
>>>     disk /dev/sda1;
>>>     address 10.0.100.242:7788;
>>>     meta-disk internal;
>>>   }
>>> }
>>>
>>> I've connected and synced the other node:
>>>
>>> version: 8.3.8.1 (api:88/proto:86-94)
>>> GIT-hash: 0d8589fcc32c874df57c930ca1691399b55ec893 build by gardner@,
>>> 2011-05-21 19:18:16
>>> 0: cs:Connected ro:Primary/Secondary ds:UpToDate/UpToDate C r----
>>> ns:1460706824 nr:0 dw:671088640 dr:2114869272 al:163840 bm:210874
>>> lo:0 pe:0 ua:0 ap:0 ep:1 wo:b oos:0
>>>
>>> I've updated the test script to include oflag=direct in dd. I also
>>> expanded each test write to 64 GB, which is twice the system RAM and
>>> 64 times the controller RAM:
>>>
>>> #!/bin/bash
>>> TEST_RESOURCE=r0
>>> TEST_DEVICE=$(drbdadm sh-dev $TEST_RESOURCE)
>>> TEST_LL_DEVICE=$(drbdadm sh-ll-dev $TEST_RESOURCE)
>>> drbdadm primary $TEST_RESOURCE
>>> # 5 runs of 64 GB each through the DRBD device, bypassing the page cache
>>> for i in $(seq 5); do
>>>   dd if=/dev/zero of=$TEST_DEVICE bs=1G count=64 oflag=direct
>>> done
>>> # take the resource down, then repeat against the backing device
>>> drbdadm down $TEST_RESOURCE
>>> for i in $(seq 5); do
>>>   dd if=/dev/zero of=$TEST_LL_DEVICE bs=1G count=64 oflag=direct
>>> done
>>>
>>> And this is the result:
>>>
>>> 64+0 records in
>>> 64+0 records out
>>> 68719476736 bytes (69 GB) copied, 152.376 s, 451 MB/s
>>> 64+0 records in
>>> 64+0 records out
>>> 68719476736 bytes (69 GB) copied, 148.863 s, 462 MB/s
>>> 64+0 records in
>>> 64+0 records out
>>> 68719476736 bytes (69 GB) copied, 152.587 s, 450 MB/s
>>> 64+0 records in
>>> 64+0 records out
>>> 68719476736 bytes (69 GB) copied, 152.661 s, 450 MB/s
>>> 64+0 records in
>>> 64+0 records out
>>> 68719476736 bytes (69 GB) copied, 148.099 s, 464 MB/s
>>> 0: State change failed: (-12) Device is held open by someone
>>> Command 'drbdsetup 0 down' terminated with exit code 11
>>> 64+0 records in
>>> 64+0 records out
>>> 68719476736 bytes (69 GB) copied, 52.5957 s, 1.3 GB/s
>>> 64+0 records in
>>> 64+0 records out
>>> 68719476736 bytes (69 GB) copied, 56.9315 s, 1.2 GB/s
>>> 64+0 records in
>>> 64+0 records out
>>> 68719476736 bytes (69 GB) copied, 57.5803 s, 1.2 GB/s
>>> 64+0 records in
>>> 64+0 records out
>>> 68719476736 bytes (69 GB) copied, 52.4276 s, 1.3 GB/s
>>> 64+0 records in
>>> 64+0 records out
>>> 68719476736 bytes (69 GB) copied, 52.8235 s, 1.3 GB/s
>>>
>>> I'm getting a huge performance difference between the drbd resource
>>> and the lower level device. Is this what I should expect?
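>>>
>>> An aside: the "Device is held open by someone" error above presumably
>>> means something still had /dev/drbd0 open when the script tried to
>>> take the resource down. I haven't confirmed what it was, but something
>>> like this should show the holder:
>>>
>>> fuser -v /dev/drbd0
>>> lsof /dev/drbd0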
>>>
>>> ~Noah
>>>
>>
> _______________________________________________
> drbd-user mailing list
> drbd-user at lists.linbit.com
> http://lists.linbit.com/mailman/listinfo/drbd-user