Note: "permalinks" may not be as permanent as we would like,
direct links of old sources may well be a few messages off.
Brian,

I'm in the middle of tuning according to the user guide, and to Florian's
blog post:
http://fghaas.wordpress.com/2007/06/22/performance-tuning-drbd-setups/

To speed up my process, can you post what settings you used for:

1. al-extents
2. sndbuf-size
3. unplug-watermark
4. max-buffers
5. max-epoch-size

Also, what MTU are you using: the standard 1500, 9000, or something else?

Thank you SOOOO much in advance!

~Noah

On Jun 21, 2011, at 10:31 AM, Brian R. Hellman wrote:

> Just for the record, I'm currently working on a system that achieves
> 1.2GB/sec, maxing out the 10GbE connection, so DRBD can perform that
> well. You just have to tune it to do so.
>
> Check out this section of the users guide:
> http://www.drbd.org/users-guide/ch-throughput.html
>
> Good luck,
> Brian
>
> On 06/21/2011 05:51 AM, Noah Mehl wrote:
>> Yes,
>>
>> But I was getting the same performance with the nodes in
>> Standalone/Primary. Also, if the lower-level physical device and the
>> network link perform at 3x that rate, then what's the bottleneck?
>> Is this the kind of performance loss I should expect from DRBD?
>>
>> ~Noah
>>
>> On Jun 21, 2011, at 2:29 AM, Robert.Koeppl at knapp.com wrote:
>>
>>> Hi!
>>> You are getting about 4 Gbit/s actual throughput, which is not that
>>> bad, but could be better. 1.25 GByte/s would be the theoretical
>>> maximum of your interlink without any overhead or latency.
>>>
>>> Mit freundlichen Grüßen / Best Regards
>>>
>>> Robert Köppl
>>>
>>> Systemadministration
>>> KNAPP Systemintegration GmbH
>>> Waltenbachstraße 9
>>> 8700 Leoben, Austria
>>> Phone: +43 3842 805-910
>>> Fax: +43 3842 82930-500
>>> robert.koeppl at knapp.com
>>> www.KNAPP.com
>>>
>>> Commercial register number: FN 138870x
>>> Commercial register court: Leoben
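Note: the five knobs asked about above all live in drbd.conf (on DRBD 8.3,
al-extents goes in the syncer section and the other four in the net section),
while the MTU is a property of the replication NIC, not of DRBD. The snippet
below is only a sketch of where the options go, using starting values commonly
quoted in the throughput-tuning chapter and in Florian's post; it is not
Brian's actual configuration, and the right numbers depend on the hardware.

  resource r0 {
    syncer {
      al-extents       3389;  # larger activity log, fewer metadata updates
    }
    net {
      sndbuf-size      512k;  # 0 lets recent 8.3 releases auto-tune the send buffer
      max-buffers      8000;  # more buffers and epoch entries help sequential writes
      max-epoch-size   8000;
      unplug-watermark 16;    # controller-dependent; some setups do better with higher values
    }
  }

Jumbo frames (the MTU question) would be configured on the interfaces of both
nodes, with a switch that passes 9000-byte frames; see the verification
commands further down the thread.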
>>> Noah Mehl <noah at tritonlimited.com>
>>> Sent by: drbd-user-bounces at lists.linbit.com
>>> 21.06.2011 03:30
>>>
>>> To: "drbd-user at lists.linbit.com" <drbd-user at lists.linbit.com>
>>> Cc:
>>> Subject: Re: [DRBD-user] Poor DRBD performance, HELP!
>>>
>>> On Jun 20, 2011, at 6:06 AM, Cristian Mammoli - Apra Sistemi wrote:
>>>
>>>> On 06/20/2011 07:16 AM, Noah Mehl wrote:
>>>>>
>>>>> On Jun 20, 2011, at 12:39 AM, Noah Mehl wrote:
>>>>>
>>>>>> On Jun 18, 2011, at 2:27 PM, Florian Haas wrote:
>>>>>>
>>>>>>> On 06/17/2011 05:04 PM, Noah Mehl wrote:
>>>>>>>> Below is the script I ran to do the performance testing. I
>>>>>>>> basically took the script from the user guide and removed the
>>>>>>>> oflag=direct,
>>>>>>>
>>>>>>> ... which means that dd wrote to your page cache (read: RAM). At
>>>>>>> this point, you started kidding yourself about your performance.
>>>>>>
>>>>>> I do have a question here: the total size of the dd write was
>>>>>> 64GB, twice the amount of system RAM. Does this still apply?
>>>>>>
>>>>>>>> because when it was in there, it brought the performance down
>>>>>>>> to 26MB/s (not really my focus here, but maybe related?).
>>>>>>>
>>>>>>> "Related" doesn't begin to describe it.
>>>>>>>
>>>>>>> Rerun the tests with oflag=direct and then repost them.
>>>>>>
>>>>>> Florian,
>>>>>>
>>>>>> I apologize for posting again without seeing your reply. I took
>>>>>> the script directly from the user guide:
>>>>>>
>>>>>> #!/bin/bash
>>>>>> TEST_RESOURCE=r0
>>>>>> TEST_DEVICE=$(drbdadm sh-dev $TEST_RESOURCE)
>>>>>> TEST_LL_DEVICE=$(drbdadm sh-ll-dev $TEST_RESOURCE)
>>>>>> drbdadm primary $TEST_RESOURCE
>>>>>> for i in $(seq 5); do
>>>>>>   dd if=/dev/zero of=$TEST_DEVICE bs=512M count=1 oflag=direct
>>>>>> done
>>>>>> drbdadm down $TEST_RESOURCE
>>>>>> for i in $(seq 5); do
>>>>>>   dd if=/dev/zero of=$TEST_LL_DEVICE bs=512M count=1 oflag=direct
>>>>>> done
>>>>>>
>>>>>> Here are the results:
>>>>>>
>>>>>> 1+0 records in
>>>>>> 1+0 records out
>>>>>> 536870912 bytes (537 MB) copied, 0.911252 s, 589 MB/s
>>>> [...]
>>>>
>>>> If your controller has a BBU, change the write policy to writeback and
>>>> disable flushes in your drbd.conf.
>>>>
>>>> HTH
>>>>
>>>> --
>>>> Cristian Mammoli
>>>> APRA SISTEMI srl
>>>> Via Brodolini, 6 Jesi (AN)
>>>> tel dir. +390731719822
>>>>
>>>> Web www.apra.it
>>>> e-mail c.mammoli at apra.it
>>>
>>> After taking many users' suggestions into account, here's where I am now.
>>> I've run iperf between the machines:
>>>
>>> [root at storageb ~]# iperf -c 10.0.100.241
>>> ------------------------------------------------------------
>>> Client connecting to 10.0.100.241, TCP port 5001
>>> TCP window size: 27.8 KByte (default)
>>> ------------------------------------------------------------
>>> [  3] local 10.0.100.242 port 57982 connected with 10.0.100.241 port 5001
>>> [ ID] Interval       Transfer     Bandwidth
>>> [  3]  0.0-10.0 sec  11.5 GBytes  9.86 Gbits/sec
>>>
>>> As you can see, the network connectivity between the machines should
>>> not be a bottleneck, unless I'm running the wrong test or running it
>>> in the wrong way. Comments are definitely welcome here.
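Note: a single default-window iperf stream already shows 9.86 Gbit/s here, so
the raw network is an unlikely bottleneck. For anyone who wants to rule it out
more thoroughly, and to test the jumbo-frame question raised at the top of the
thread, something along these lines is a reasonable sanity check. The addresses
are the ones used in the thread; the interface name eth1 is an assumption.

  # On the receiving node (10.0.100.241):
  iperf -s

  # On the sending node: a longer run, a larger window, a few parallel streams:
  iperf -c 10.0.100.241 -t 30 -w 512k -P 4

  # If the replication NICs are switched to MTU 9000, confirm that jumbo
  # frames actually pass end to end (8972 = 9000 minus 28 bytes of IP/ICMP
  # headers); repeat from the other node as well:
  ip link set dev eth1 mtu 9000
  ping -M do -s 8972 -c 3 10.0.100.241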
>>> I updated my resource config to remove flushes, because my controller
>>> is set to writeback:
>>>
>>> # begin resource drbd0
>>> resource r0 {
>>>   protocol C;
>>>
>>>   disk {
>>>     no-disk-flushes;
>>>     no-md-flushes;
>>>   }
>>>
>>>   startup {
>>>     wfc-timeout 15;
>>>     degr-wfc-timeout 60;
>>>   }
>>>
>>>   net {
>>>     allow-two-primaries;
>>>     after-sb-0pri discard-zero-changes;
>>>     after-sb-1pri discard-secondary;
>>>     after-sb-2pri disconnect;
>>>   }
>>>
>>>   syncer {
>>>   }
>>>
>>>   on storagea {
>>>     device    /dev/drbd0;
>>>     disk      /dev/sda1;
>>>     address   10.0.100.241:7788;
>>>     meta-disk internal;
>>>   }
>>>
>>>   on storageb {
>>>     device    /dev/drbd0;
>>>     disk      /dev/sda1;
>>>     address   10.0.100.242:7788;
>>>     meta-disk internal;
>>>   }
>>> }
>>>
>>> I've connected and synced the other node:
>>>
>>> version: 8.3.8.1 (api:88/proto:86-94)
>>> GIT-hash: 0d8589fcc32c874df57c930ca1691399b55ec893 build by gardner@, 2011-05-21 19:18:16
>>>  0: cs:Connected ro:Primary/Secondary ds:UpToDate/UpToDate C r----
>>>     ns:1460706824 nr:0 dw:671088640 dr:2114869272 al:163840 bm:210874
>>>     lo:0 pe:0 ua:0 ap:0 ep:1 wo:b oos:0
>>>
>>> I've updated the test script to include oflag=direct in dd. I also
>>> expanded the test writes to 64GB: twice the system RAM, and 64 times
>>> the controller RAM:
>>>
>>> #!/bin/bash
>>> TEST_RESOURCE=r0
>>> TEST_DEVICE=$(drbdadm sh-dev $TEST_RESOURCE)
>>> TEST_LL_DEVICE=$(drbdadm sh-ll-dev $TEST_RESOURCE)
>>> drbdadm primary $TEST_RESOURCE
>>> for i in $(seq 5); do
>>>   dd if=/dev/zero of=$TEST_DEVICE bs=1G count=64 oflag=direct
>>> done
>>> drbdadm down $TEST_RESOURCE
>>> for i in $(seq 5); do
>>>   dd if=/dev/zero of=$TEST_LL_DEVICE bs=1G count=64 oflag=direct
>>> done
>>>
>>> And this is the result:
>>>
>>> 64+0 records in
>>> 64+0 records out
>>> 68719476736 bytes (69 GB) copied, 152.376 s, 451 MB/s
>>> 64+0 records in
>>> 64+0 records out
>>> 68719476736 bytes (69 GB) copied, 148.863 s, 462 MB/s
>>> 64+0 records in
>>> 64+0 records out
>>> 68719476736 bytes (69 GB) copied, 152.587 s, 450 MB/s
>>> 64+0 records in
>>> 64+0 records out
>>> 68719476736 bytes (69 GB) copied, 152.661 s, 450 MB/s
>>> 64+0 records in
>>> 64+0 records out
>>> 68719476736 bytes (69 GB) copied, 148.099 s, 464 MB/s
>>> 0: State change failed: (-12) Device is held open by someone
>>> Command 'drbdsetup 0 down' terminated with exit code 11
>>> 64+0 records in
>>> 64+0 records out
>>> 68719476736 bytes (69 GB) copied, 52.5957 s, 1.3 GB/s
>>> 64+0 records in
>>> 64+0 records out
>>> 68719476736 bytes (69 GB) copied, 56.9315 s, 1.2 GB/s
>>> 64+0 records in
>>> 64+0 records out
>>> 68719476736 bytes (69 GB) copied, 57.5803 s, 1.2 GB/s
>>> 64+0 records in
>>> 64+0 records out
>>> 68719476736 bytes (69 GB) copied, 52.4276 s, 1.3 GB/s
>>> 64+0 records in
>>> 64+0 records out
>>> 68719476736 bytes (69 GB) copied, 52.8235 s, 1.3 GB/s
>>>
>>> I'm getting a huge performance difference between the DRBD resource
>>> and the lower-level device. Is this what I should expect?
>>>
>>> ~Noah
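Note: in the output above, the "drbdadm down r0" step failed ("Device is held
open by someone", drbdsetup exit code 11), so the resource was still configured
while the lower-level passes ran. To reproduce the users-guide comparison as
intended, the resource should come down cleanly before dd touches /dev/sda1.
A sketch of how one might find and release whatever is holding the device,
using the resource and device names from the thread:

  # See which processes still have the DRBD device open
  fuser -v /dev/drbd0
  lsof /dev/drbd0

  # Release it: unmount if needed, drop to Secondary, then take it down
  umount /dev/drbd0        # only if a filesystem on it is mounted somewhere
  drbdadm secondary r0
  drbdadm down r0

  # Confirm the resource is really gone before benchmarking /dev/sda1 directly
  cat /proc/drbd

As for the 450 MB/s versus 1.2-1.3 GB/s gap itself: with protocol C a write
completes only once the peer has confirmed it, so replicated throughput is
bounded by the network path and the tuning parameters discussed earlier in
the thread, not just by the local array.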
_______________________________________________
drbd-user mailing list
drbd-user at lists.linbit.com
http://lists.linbit.com/mailman/listinfo/drbd-user