[DRBD-user] Antwort: Re: Poor DRBD performance, HELP!

Brian R. Hellman brian at linbit.com
Tue Jun 21 16:31:56 CEST 2011

Note: "permalinks" may not be as permanent as we would like,
direct links of old sources may well be a few messages off.


Just for the record, I'm currently working on a system that achieves
1.2 GB/s, maxing out its 10GbE connection, so DRBD can perform that
well.  You just have to tune it to do so.

Check out this section of the users guide:
http://www.drbd.org/users-guide/ch-throughput.html
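
To give a concrete feel for what that chapter covers, here is a minimal
tuning sketch (these are DRBD 8.3 net/syncer options; the values are
illustrative assumptions, not recommendations, and need to be sized to
your hardware):

resource <resource> {
  net {
    max-buffers     8000;   # allow more in-flight requests than the default
    max-epoch-size  8000;   # allow larger write bursts between barriers
    sndbuf-size     512k;   # larger TCP send buffer for fast links
  }
  syncer {
    al-extents 3389;        # bigger activity log, fewer metadata updates
  }
}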

Good luck,
Brian

On 06/21/2011 05:51 AM, Noah Mehl wrote:
> Yes,
>
> But I was getting the same performance with the nodes in
> Standalone/Primary.  Also, if the lower-level physical device and
> the network link both perform at 3x that rate, then what's the
> bottleneck?  Is this the kind of performance loss I should expect
> from DRBD?
>
> ~Noah
>
> On Jun 21, 2011, at 2:29 AM, Robert.Koeppl at knapp.com wrote:
>
>>
>> Hi!
>> You are getting about 4 Gbit/s of actual throughput (460 MB/s x 8
>> bits per byte is roughly 3.7 Gbit/s), which is not that bad, but it
>> could be better.  1.25 GByte/s (10 Gbit/s divided by 8) would be the
>> theoretical maximum of your interlink, without any protocol overhead
>> or latency.
>> Best Regards
>>
>> Robert Köppl
>>
>> Systemadministration
>> KNAPP Systemintegration GmbH
>> Waltenbachstraße 9
>> 8700 Leoben, Austria
>> Phone: +43 3842 805-910
>> Fax: +43 3842 82930-500
>> robert.koeppl at knapp.com
>> www.KNAPP.com
>>
>> Noah Mehl <noah at tritonlimited.com>
>> Sent by: drbd-user-bounces at lists.linbit.com
>> Date: 21.06.2011 03:30
>> To: "drbd-user at lists.linbit.com" <drbd-user at lists.linbit.com>
>> Subject: Re: [DRBD-user] Poor DRBD performance, HELP!
>>
>> On Jun 20, 2011, at 6:06 AM, Cristian Mammoli - Apra Sistemi wrote:
>>
>> > On 06/20/2011 07:16 AM, Noah Mehl wrote:
>> >>
>> >>
>> >> On Jun 20, 2011, at 12:39 AM, Noah Mehl wrote:
>> >>
>> >>> On Jun 18, 2011, at 2:27 PM, Florian Haas wrote:
>> >>>
>> >>>> On 06/17/2011 05:04 PM, Noah Mehl wrote:
>> >>>>> Below is the script I ran to do the performance testing.  I
>> >>>>> basically took the script from the user guide and removed the
>> >>>>> oflag=direct,
>> >>>>
>> >>>> ... which means that dd wrote to your page cache (read: RAM).  At
>> >>>> this point, you started kidding yourself about your performance.
>> >>>
>> >>> I do have a question here: the total size of the dd write was
>> >>> 64GB, twice the amount of system RAM; does this still apply?
>> >>>
>> >>>>
>> >>>>> because when it was in there, it brought the performance down
>> >>>>> to 26MB/s (not really my focus here, but maybe related?).
>> >>>>
>> >>>> "Related" doesn't begin to describe it.
>> >>>>
>> >>>> Rerun the tests with oflag=direct and then repost them.
>> >>>
>> >>> Florian,
>> >>>
>> >>> I apologize for posting again without seeing your reply.  I took
>> >>> the script directly from the user guide:
>> >>>
>> >>> #!/bin/bash
>> >>> TEST_RESOURCE=r0
>> >>> TEST_DEVICE=$(drbdadm sh-dev $TEST_RESOURCE)
>> >>> TEST_LL_DEVICE=$(drbdadm sh-ll-dev $TEST_RESOURCE)
>> >>> drbdadm primary $TEST_RESOURCE
>> >>> for i in $(seq 5); do
>> >>>  dd if=/dev/zero of=$TEST_DEVICE bs=512M count=1 oflag=direct
>> >>> done
>> >>> drbdadm down $TEST_RESOURCE
>> >>> for i in $(seq 5); do
>> >>>  dd if=/dev/zero of=$TEST_LL_DEVICE bs=512M count=1 oflag=direct
>> >>> done
>> >>>
>> >>> Here are the results:
>> >>>
>> >>> 1+0 records in
>> >>> 1+0 records out
>> >>> 536870912 bytes (537 MB) copied, 0.911252 s, 589 MB/s
>> > [...]
>> >
>> > If your controller has a BBU, change the write policy to write-back
>> > and disable flushes in your drbd.conf.
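>> >
>> > In drbd.conf terms that means something like the following sketch
>> > (no-disk-flushes and no-md-flushes are the relevant disk options in
>> > DRBD 8.3; only do this when the controller cache really is
>> > battery-backed):
>> >
>> > disk {
>> >   no-disk-flushes;   # skip flushes for the data device
>> >   no-md-flushes;     # skip flushes for the metadata
>> > }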
>> >
>> > HTH
>> >
>> > --
>> > Cristian Mammoli
>> > APRA SISTEMI srl
>> > Via Brodolini,6 Jesi (AN)
>> > tel dir. +390731719822
>> >
>> > Web   www.apra.it
>> > e-mail  c.mammoli at apra.it
>>
>> After taking many users' suggestions into account, here's where I am
>> now.  I've run iperf between the machines:
>>
>> [root at storageb ~]# iperf -c 10.0.100.241
>> ------------------------------------------------------------
>> Client connecting to 10.0.100.241, TCP port 5001
>> TCP window size: 27.8 KByte (default)
>> ------------------------------------------------------------
>> [  3] local 10.0.100.242 port 57982 connected with 10.0.100.241 port 5001
>> [ ID] Interval       Transfer     Bandwidth
>> [  3]  0.0-10.0 sec  11.5 GBytes  9.86 Gbits/sec
>>
>> As you can see, the network connection between the machines should
>> not be a bottleneck, unless I'm running the wrong test or running it
>> in the wrong way.  Comments are definitely welcome here.
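>>
>> (For completeness: a client run like the one above needs the matching
>> server side listening on the peer, which is just
>>
>> iperf -s
>>
>> on 10.0.100.241; the numbers measure raw TCP throughput, with no disk
>> involved on either end.)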
>>
>> I updated my resource config to remove flushes, because my controller
>> is set to write-back:
>>
>> # begin resource drbd0
>> resource r0 {
>>     protocol C;
>>
>>          disk {
>>                no-disk-flushes;
>>                no-md-flushes;
>>                }
>>
>>          startup {
>>                wfc-timeout 15;
>>                degr-wfc-timeout 60;
>>                }
>>
>>          net {
>>                allow-two-primaries;
>>                after-sb-0pri discard-zero-changes;
>>                after-sb-1pri discard-secondary;
>>                after-sb-2pri disconnect;
>>                }
>>          syncer {
>>          }
>>     on storagea {
>>               device /dev/drbd0;
>>               disk /dev/sda1;
>>               address 10.0.100.241:7788;
>>               meta-disk internal;
>>     }
>>     on storageb {
>>                device /dev/drbd0;
>>                disk /dev/sda1;
>>                address 10.0.100.242:7788;
>>                meta-disk internal;
>>     }
>> }
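>>
>> To push the changed options into a running resource and double-check
>> what DRBD parsed from the configuration file, the usual commands (as
>> far as I know) are:
>>
>> drbdadm adjust r0   # apply config changes to the live resource
>> drbdadm dump r0     # print the parsed configuration for r0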
>>
>> I've connected and synced the other node:
>>
>> version: 8.3.8.1 (api:88/proto:86-94)
>> GIT-hash: 0d8589fcc32c874df57c930ca1691399b55ec893 build by gardner@,
>> 2011-05-21 19:18:16
>> 0: cs:Connected ro:Primary/Secondary ds:UpToDate/UpToDate C r----
>>    ns:1460706824 nr:0 dw:671088640 dr:2114869272 al:163840 bm:210874
>> lo:0 pe:0 ua:0 ap:0 ep:1 wo:b oos:0
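>>
>> (That status comes from cat /proc/drbd; drbdadm cstate r0 and drbdadm
>> dstate r0 report the connection and disk states individually.)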
>>
>> I've updated the test script to include oflag=direct in dd.  I also
>> expanded the test writes to 64GB, twice the system RAM and 64 times
>> the controller RAM:
>>
>> #!/bin/bash
>> TEST_RESOURCE=r0
>> TEST_DEVICE=$(drbdadm sh-dev $TEST_RESOURCE)
>> TEST_LL_DEVICE=$(drbdadm sh-ll-dev $TEST_RESOURCE)
>> drbdadm primary $TEST_RESOURCE
>> for i in $(seq 5); do
>>  dd if=/dev/zero of=$TEST_DEVICE bs=1G count=64 oflag=direct
>> done
>> drbdadm down $TEST_RESOURCE
>> for i in $(seq 5); do
>>  dd if=/dev/zero of=$TEST_LL_DEVICE bs=1G count=64 oflag=direct
>> done
>>
>> And this is the result:
>>
>> 64+0 records in
>> 64+0 records out
>> 68719476736 bytes (69 GB) copied, 152.376 s, 451 MB/s
>> 64+0 records in
>> 64+0 records out
>> 68719476736 bytes (69 GB) copied, 148.863 s, 462 MB/s
>> 64+0 records in
>> 64+0 records out
>> 68719476736 bytes (69 GB) copied, 152.587 s, 450 MB/s
>> 64+0 records in
>> 64+0 records out
>> 68719476736 bytes (69 GB) copied, 152.661 s, 450 MB/s
>> 64+0 records in
>> 64+0 records out
>> 68719476736 bytes (69 GB) copied, 148.099 s, 464 MB/s
>> 0: State change failed: (-12) Device is held open by someone
>> Command 'drbdsetup 0 down' terminated with exit code 11
>> 64+0 records in
>> 64+0 records out
>> 68719476736 bytes (69 GB) copied, 52.5957 s, 1.3 GB/s
>> 64+0 records in
>> 64+0 records out
>> 68719476736 bytes (69 GB) copied, 56.9315 s, 1.2 GB/s
>> 64+0 records in
>> 64+0 records out
>> 68719476736 bytes (69 GB) copied, 57.5803 s, 1.2 GB/s
>> 64+0 records in
>> 64+0 records out
>> 68719476736 bytes (69 GB) copied, 52.4276 s, 1.3 GB/s
>> 64+0 records in
>> 64+0 records out
>> 68719476736 bytes (69 GB) copied, 52.8235 s, 1.3 GB/s
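>>
>> (One thing I notice above: drbdadm down failed with "Device is held
>> open by someone", so the resource was never actually taken down and
>> the second set of runs wrote to /dev/sda1 underneath a still-attached
>> DRBD.  My guess is that something, possibly udev rescanning the device
>> right after dd closed it, still held /dev/drbd0 open.  A teardown that
>> demotes first might avoid that:
>>
>> drbdadm secondary $TEST_RESOURCE   # demote before taking the resource down
>> drbdadm down $TEST_RESOURCE        # down fails while the device is held open
>>
>> so treat the second batch of numbers with that in mind.)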
>>
>> I'm getting a huge performance difference between the DRBD resource
>> and the lower-level device.  Is this what I should expect?
>>
>> ~Noah
>>
>>
>>
>>
>
>
>
>


