Note: "permalinks" may not be as permanent as we would like,
direct links of old sources may well be a few messages off.
On Jun 20, 2011, at 6:06 AM, Cristian Mammoli - Apra Sistemi wrote:

> On 06/20/2011 07:16 AM, Noah Mehl wrote:
>>
>>
>> On Jun 20, 2011, at 12:39 AM, Noah Mehl wrote:
>>
>>> On Jun 18, 2011, at 2:27 PM, Florian Haas wrote:
>>>
>>>> On 06/17/2011 05:04 PM, Noah Mehl wrote:
>>>>> Below is the script I ran to do the performance testing. I basically took the script from the user guide and removed the oflag=direct,
>>>>
>>>> ... which means that dd wrote to your page cache (read: RAM). At this
>>>> point, you started kidding yourself about your performance.
>>>
>>> I do have a question here: the total size of the dd write was 64GB, twice the amount of system RAM, does this still apply?
>>>
>>>>
>>>>> because when it was in there, it brought the performance down to 26MB/s (not really my focus here, but maybe related?).
>>>>
>>>> "Related" doesn't begin to describe it.
>>>>
>>>> Rerun the tests with oflag=direct and then repost them.
>>>
>>> Florian,
>>>
>>> I apologize for posting again without seeing your reply. I took the script directly from the user guide:
>>>
>>> #!/bin/bash
>>> TEST_RESOURCE=r0
>>> TEST_DEVICE=$(drbdadm sh-dev $TEST_RESOURCE)
>>> TEST_LL_DEVICE=$(drbdadm sh-ll-dev $TEST_RESOURCE)
>>> drbdadm primary $TEST_RESOURCE
>>> for i in $(seq 5); do
>>>   dd if=/dev/zero of=$TEST_DEVICE bs=512M count=1 oflag=direct
>>> done
>>> drbdadm down $TEST_RESOURCE
>>> for i in $(seq 5); do
>>>   dd if=/dev/zero of=$TEST_LL_DEVICE bs=512M count=1 oflag=direct
>>> done
>>>
>>> Here are the results:
>>>
>>> 1+0 records in
>>> 1+0 records out
>>> 536870912 bytes (537 MB) copied, 0.911252 s, 589 MB/s
> [...]
>
> If your controller has a BBU change the write policy to writeback and
> disable flushes in your drbd.conf
>
> HTH
>
> --
> Cristian Mammoli
> APRA SISTEMI srl
> Via Brodolini, 6 Jesi (AN)
> tel dir. +390731719822
>
> Web    www.apra.it
> e-mail c.mammoli at apra.it
> _______________________________________________
> drbd-user mailing list
> drbd-user at lists.linbit.com
> http://lists.linbit.com/mailman/listinfo/drbd-user

After taking many users' suggestions into account, here's where I am now.

I've run iperf between the machines:

[root at storageb ~]# iperf -c 10.0.100.241
------------------------------------------------------------
Client connecting to 10.0.100.241, TCP port 5001
TCP window size: 27.8 KByte (default)
------------------------------------------------------------
[  3] local 10.0.100.242 port 57982 connected with 10.0.100.241 port 5001
[ ID] Interval       Transfer     Bandwidth
[  3]  0.0-10.0 sec  11.5 GBytes  9.86 Gbits/sec

As you can see, the network connectivity between the machines should not be a bottleneck, unless I'm running the wrong test or running it the wrong way. Comments are definitely welcome here.
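As a quick sanity check on that number (a back-of-the-envelope sketch I'm adding here, not part of the test run): 9.86 Gbit/s works out to roughly 1.2 GB/s, which is the most the replication link can carry, so every protocol C write has to fit inside that budget. The snippet below is just that arithmetic; IPERF_GBITS is simply the figure iperf printed above.

#!/bin/bash
# Back-of-the-envelope: convert the iperf figure (Gbit/s) into an
# approximate byte-rate ceiling for the replication link.
IPERF_GBITS=9.86   # the bandwidth reported by the iperf run above
awk -v g="$IPERF_GBITS" \
    'BEGIN { printf "link ceiling ~ %.0f MB/s (~%.2f GB/s)\n", g * 1000 / 8, g / 8 }'
# should print a ceiling of roughly 1230 MB/s (about 1.23 GB/s)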
I updated my resource config to disable flushes, since my controller is set to writeback:

# begin resource drbd0
resource r0 {
  protocol C;

  disk {
    no-disk-flushes;
    no-md-flushes;
  }

  startup {
    wfc-timeout 15;
    degr-wfc-timeout 60;
  }

  net {
    allow-two-primaries;
    after-sb-0pri discard-zero-changes;
    after-sb-1pri discard-secondary;
    after-sb-2pri disconnect;
  }

  syncer {
  }

  on storagea {
    device /dev/drbd0;
    disk /dev/sda1;
    address 10.0.100.241:7788;
    meta-disk internal;
  }

  on storageb {
    device /dev/drbd0;
    disk /dev/sda1;
    address 10.0.100.242:7788;
    meta-disk internal;
  }
}

I've connected and synced the other node:

version: 8.3.8.1 (api:88/proto:86-94)
GIT-hash: 0d8589fcc32c874df57c930ca1691399b55ec893 build by gardner@, 2011-05-21 19:18:16
 0: cs:Connected ro:Primary/Secondary ds:UpToDate/UpToDate C r----
    ns:1460706824 nr:0 dw:671088640 dr:2114869272 al:163840 bm:210874 lo:0 pe:0 ua:0 ap:0 ep:1 wo:b oos:0

I've updated the test script to include oflag=direct in the dd calls. I also expanded the test writes to 64 GB: twice the system RAM and 64 times the controller RAM:

#!/bin/bash
TEST_RESOURCE=r0
TEST_DEVICE=$(drbdadm sh-dev $TEST_RESOURCE)
TEST_LL_DEVICE=$(drbdadm sh-ll-dev $TEST_RESOURCE)
drbdadm primary $TEST_RESOURCE
for i in $(seq 5); do
  dd if=/dev/zero of=$TEST_DEVICE bs=1G count=64 oflag=direct
done
drbdadm down $TEST_RESOURCE
for i in $(seq 5); do
  dd if=/dev/zero of=$TEST_LL_DEVICE bs=1G count=64 oflag=direct
done

And this is the result:

64+0 records in
64+0 records out
68719476736 bytes (69 GB) copied, 152.376 s, 451 MB/s
64+0 records in
64+0 records out
68719476736 bytes (69 GB) copied, 148.863 s, 462 MB/s
64+0 records in
64+0 records out
68719476736 bytes (69 GB) copied, 152.587 s, 450 MB/s
64+0 records in
64+0 records out
68719476736 bytes (69 GB) copied, 152.661 s, 450 MB/s
64+0 records in
64+0 records out
68719476736 bytes (69 GB) copied, 148.099 s, 464 MB/s
 0: State change failed: (-12) Device is held open by someone
Command 'drbdsetup 0 down' terminated with exit code 11
64+0 records in
64+0 records out
68719476736 bytes (69 GB) copied, 52.5957 s, 1.3 GB/s
64+0 records in
64+0 records out
68719476736 bytes (69 GB) copied, 56.9315 s, 1.2 GB/s
64+0 records in
64+0 records out
68719476736 bytes (69 GB) copied, 57.5803 s, 1.2 GB/s
64+0 records in
64+0 records out
68719476736 bytes (69 GB) copied, 52.4276 s, 1.3 GB/s
64+0 records in
64+0 records out
68719476736 bytes (69 GB) copied, 52.8235 s, 1.3 GB/s

I'm getting a huge performance difference between the DRBD resource and the lower-level device. Is this what I should expect?

~Noah
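One thing I noticed in the output above: the drbdadm down between the two loops failed with "Device is held open by someone", so the second loop wrote to the lower-level device while the resource was still configured. A generic way to check what is holding the device before retrying (a sketch, not something from this run) would be:

#!/bin/bash
# Sketch: find out what still holds the DRBD device open before
# retrying "drbdadm down r0". Run on the node where the down failed.
cat /proc/drbd            # confirm the resource state and role first
fuser -v /dev/drbd0       # list processes that have /dev/drbd0 open
lsof /dev/drbd0           # alternative view of the same information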
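On the performance gap itself: the syncer {} section in my config above is still empty. For what it's worth, here is a sketch of the throughput-tuning knobs I've seen suggested for DRBD 8.3 (al-extents in the syncer section, buffer and epoch sizes in the net section). The values are illustrative starting points I have not benchmarked, not recommendations:

  syncer {
    al-extents 3389;       # larger activity log; fewer metadata updates during long sequential writes
  }
  net {
    max-buffers     8000;  # more buffers/requests in flight between the peers
    max-epoch-size  8000;  # allow larger write epochs
    sndbuf-size     512k;  # bigger TCP send buffer for the 10GbE replication link
  }

The idea behind a larger al-extents is to cut down on activity-log metadata writes to the internal meta-disk during big sequential writes; the net options mainly matter on fast links like this one.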