On Tue, Dec 18, 2007 at 11:26:55AM -0800, Art Age Software wrote:
> On Dec 18, 2007 10:13 AM, Lars Ellenberg <lars.ellenberg at linbit.com> wrote:
> > please do
> > one-node# ping -w 10 -f -s 4100 replication-link-ip-of-other-node
> > and show me the output.
>
> Lars,
>
> Thanks for your help on this. Here is the output of the ping test.
>
> [node1 ~]$ ping -w 10 -f -s 4100 node2
> --- ping statistics ---
> 46900 packets transmitted, 46899 received, 0% packet loss, time 10000ms
> rtt min/avg/max/mdev = 0.159/0.184/20.047/0.154 ms, pipe 2, ipg/ewma
> 0.213/0.193 ms
>
> [node2 ~]$ ping -w 10 -f -s 4100 node1
> --- ping statistics ---
> 48061 packets transmitted, 48060 received, 0% packet loss, time 10001ms
> rtt min/avg/max/mdev = 0.154/0.180/20.333/0.172 ms, pipe 2, ipg/ewma
> 0.208/0.183 ms

You have a very interesting maximum (over 20 ms) and a huge deviation
there. But let's use the 0.180 ms average rtt of the 4k packets.

Averages from the dd commands below are:
  drbd disconnected: 0.310 ms per 4k request
  drbd connected:    1.170 ms per 4k request
  non-drbd:          0.300 ms per 4k request

I've already seen non-drbd be slower than drbd-unconnected on the same
hardware before; there are funny effects in play. But those two are
within about 3% of each other, which is expected.

However, your drbd-connected number looks bad. From the ping rtt and
the non-drbd number we'd expect the latency of drbd connected to be
~0.480 ms (0.300 ms disk write + 0.180 ms network round trip). Your
measurement is worse than this expectation by a factor of almost 2.5
(1.170 / 0.480 is about 2.44).

In all setups I have tuned so far, the actual (measured) latency of
drbd and the rough estimate derived from these ping and dd commands
were very close. So I suspect the io subsystem of your secondary
("node2") is slower. Please verify.

Other than that, pinning the drbd-related threads to one CPU,
preferably the same one you pinned the NIC driver irq to, could help
to reduce latency.
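The latency arithmetic above can be reproduced from the raw dd run times quoted below. Each run wrote count=10000 blocks of 4096 bytes with oflag=dsync, so seconds / 10000 is the average latency of one synchronous 4k write. This is a minimal sketch assuming a POSIX shell with awk; `avg_ms` is a hypothetical helper, not part of dd or drbd, and the "disk write plus one round trip" estimate assumes the peer's disk is about as fast as the local one.

```shell
#!/bin/sh
# Reproduce the per-request latency figures from the dd run times
# quoted in this thread (count=10000, bs=4096, oflag=dsync).

# Hypothetical helper, not part of dd or drbd.
# Usage: avg_ms REQUESTS_PER_RUN SECONDS...
# Prints the mean per-request latency over all runs, in ms.
avg_ms() {
    n=$1
    shift
    printf '%s\n' "$@" | awk -v n="$n" '
        { sum += $1; runs++ }
        END { printf "%.3f\n", sum / runs * 1000 / n }'
}

# Average rtt from "ping -w 10 -f -s 4100 node1" run on node2.
rtt_ms=0.180

disc=$(avg_ms 10000 3.14315 3.05737 3.08115 3.17052 3.0727)
conn=$(avg_ms 10000 11.8043 11.9506 12.2863 11.203 11.212)
nodrbd=$(avg_ms 10000 3.14307 2.98458 2.95751 2.90936 3.04481)

# Rough estimate: non-drbd write latency plus one network round trip.
# Assumption: the peer's disk write costs about the same as the local one.
expected=$(awk -v l="$nodrbd" -v r="$rtt_ms" 'BEGIN { printf "%.3f\n", l + r }')
factor=$(awk -v m="$conn" -v e="$expected" 'BEGIN { printf "%.2f\n", m / e }')

echo "drbd disconnected:  $disc ms per 4k request"
echo "drbd connected:     $conn ms per 4k request"
echo "non-drbd:           $nodrbd ms per 4k request"
echo "expected connected: $expected ms (measured is ${factor}x that)"
```

Running it prints 0.310, 1.169 and 0.301 ms for the three setups, an estimate of 0.481 ms, and a factor of 2.43; the small differences from the rounded figures above come only from averaging all five runs.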
On Tue, Dec 18, 2007 at 11:47:37AM -0800, Art Age Software wrote:
> On Dec 18, 2007 10:41 AM, Lars Ellenberg <lars.ellenberg at linbit.com> wrote:
> > On Tue, Dec 18, 2007 at 07:13:05PM +0100, Lars Ellenberg wrote:
> > > please do
> > > one-node# ping -w 10 -f -s 4100 replication-link-ip-of-other-node
> > > and show me the output.
> >
> > also,
> >
> > 1)
> > drbdadm disconnect your-resource-name
> > drbd now StandAlone Primary/Unknown
> > dd if=/dev/zero bs=4096 count=10000 of=/some/file/on/your/drbd oflag=dsync
> >
> > 2)
> > drbdadm adjust all
> > wait for the resync
> > drbd now Connected Primary/Secondary
> > dd if=/dev/zero bs=4096 count=10000 of=/some/file/on/your/drbd oflag=dsync
> >
> > 3)
> > dd if=/dev/zero bs=4096 count=10000 of=/some/file/NOT/on/your/drbd oflag=dsync
> >
> > do each dd command several times,
> > do this when nothing else happens on the box.
> >
> > the important part here is the "dsync" flag.
> > if your dd does not know about that, upgrade.
> >
> > the dd bs=4096 count=10000 oflag=dsync
> > (many small requests, each single one of them synchronous by itself)
> > is to give an idea for the average latency of one single write request,
> > * with drbd disconnected
> > * with drbd connected
> > * without drbd (should be as close as possible to the lower level
> >   device of drbd, preferably on the same hardware)
>
> OK, here are the results:
>
> Test 1: DRBD Disconnected
>
> [node1 ~]$ dd if=/dev/zero bs=4096 count=10000 of=/drbd/tmp/testfile oflag=dsync
> 10000+0 records in
> 10000+0 records out
> 40960000 bytes (41 MB) copied, 3.14315 seconds, 13.0 MB/s
> [node1 ~]$ dd if=/dev/zero bs=4096 count=10000 of=/drbd/tmp/testfile oflag=dsync
> 10000+0 records in
> 10000+0 records out
> 40960000 bytes (41 MB) copied, 3.05737 seconds, 13.4 MB/s
> [node1 ~]$ dd if=/dev/zero bs=4096 count=10000 of=/drbd/tmp/testfile oflag=dsync
> 10000+0 records in
> 10000+0 records out
> 40960000 bytes (41 MB) copied, 3.08115 seconds, 13.3 MB/s
> [node1 ~]$ dd if=/dev/zero bs=4096 count=10000 of=/drbd/tmp/testfile oflag=dsync
> 10000+0 records in
> 10000+0 records out
> 40960000 bytes (41 MB) copied, 3.17052 seconds, 12.9 MB/s
> [node1 ~]$ dd if=/dev/zero bs=4096 count=10000 of=/drbd/tmp/testfile oflag=dsync
> 10000+0 records in
> 10000+0 records out
> 40960000 bytes (41 MB) copied, 3.0727 seconds, 13.3 MB/s
>
>
> Test 2: DRBD Connected
>
> [node1 ~]$ dd if=/dev/zero bs=4096 count=10000 of=/drbd/tmp/testfile oflag=dsync
> 10000+0 records in
> 10000+0 records out
> 40960000 bytes (41 MB) copied, 11.8043 seconds, 3.5 MB/s
> [node1 ~]$ dd if=/dev/zero bs=4096 count=10000 of=/drbd/tmp/testfile oflag=dsync
> 10000+0 records in
> 10000+0 records out
> 40960000 bytes (41 MB) copied, 11.9506 seconds, 3.4 MB/s
> [node1 ~]$ dd if=/dev/zero bs=4096 count=10000 of=/drbd/tmp/testfile oflag=dsync
> 10000+0 records in
> 10000+0 records out
> 40960000 bytes (41 MB) copied, 12.2863 seconds, 3.3 MB/s
> [node1 ~]$ dd if=/dev/zero bs=4096 count=10000 of=/drbd/tmp/testfile oflag=dsync
> 10000+0 records in
> 10000+0 records out
> 40960000 bytes (41 MB) copied, 11.203 seconds, 3.7 MB/s
> [node1 ~]$ dd if=/dev/zero bs=4096 count=10000 of=/drbd/tmp/testfile oflag=dsync
> 10000+0 records in
> 10000+0 records out
> 40960000 bytes (41 MB) copied, 11.212 seconds, 3.7 MB/s
>
>
> Test 3: Non-DRBD
>
> [node1 ~]$ dd if=/dev/zero bs=4096 count=10000 of=/tmp/testfile oflag=dsync
> 10000+0 records in
> 10000+0 records out
> 40960000 bytes (41 MB) copied, 3.14307 seconds, 13.0 MB/s
> [node1 ~]$ dd if=/dev/zero bs=4096 count=10000 of=/tmp/testfile oflag=dsync
> 10000+0 records in
> 10000+0 records out
> 40960000 bytes (41 MB) copied, 2.98458 seconds, 13.7 MB/s
> [node1 ~]$ dd if=/dev/zero bs=4096 count=10000 of=/tmp/testfile oflag=dsync
> 10000+0 records in
> 10000+0 records out
> 40960000 bytes (41 MB) copied, 2.95751 seconds, 13.8 MB/s
> [node1 ~]$ dd if=/dev/zero bs=4096 count=10000 of=/tmp/testfile oflag=dsync
> 10000+0 records in
> 10000+0 records out
> 40960000 bytes (41 MB) copied, 2.90936 seconds, 14.1 MB/s
> [node1 ~]$ dd if=/dev/zero bs=4096 count=10000 of=/tmp/testfile oflag=dsync
> 10000+0 records in
> 10000+0 records out
> 40960000 bytes (41 MB) copied, 3.04481 seconds, 13.5 MB/s
> _______________________________________________
> drbd-user mailing list
> drbd-user at lists.linbit.com
> http://lists.linbit.com/mailman/listinfo/drbd-user

--
: Lars Ellenberg                            http://www.linbit.com :
: DRBD/HA support and consulting             sales at linbit.com :
: LINBIT Information Technologies GmbH      Tel +43-1-8178292-0  :
: Vivenotgasse 48, A-1120 Vienna/Europe     Fax +43-1-8178292-82 :
__
please use the "List-Reply" function of your email client.
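For anyone repeating the three-way dd comparison quoted above (drbd disconnected, drbd connected, non-drbd, each run several times), the repeated runs can be scripted roughly as follows. This is a sketch under stated assumptions: GNU dd with `oflag=dsync`, a POSIX shell with sed and awk, and hypothetical helper names `dd_latency_ms` and `run_series`. The `drbdadm disconnect` / `drbdadm adjust all` steps and waiting for the resync are still done by hand between series, and the target file is overwritten.

```shell
#!/bin/sh
# Repeat the synchronous 4k dd write test and print the average
# per-request latency of each run, in ms.
# dd_latency_ms and run_series are hypothetical helpers.
# TARGET is overwritten: point it at a scratch file.

# Usage: dd_latency_ms TARGET COUNT
dd_latency_ms() {
    target=$1
    count=$2
    # dd prints its summary ("... copied, N seconds, ..." on older
    # coreutils, "... copied, N s, ..." on newer ones) on stderr:
    # send stderr into the pipe and discard stdout. LC_ALL=C keeps the
    # message in English with a '.' decimal point.
    secs=$(LC_ALL=C dd if=/dev/zero of="$target" bs=4096 count="$count" \
               oflag=dsync 2>&1 >/dev/null |
           sed -n 's/.* copied, \([0-9.e+-]*\) s.*/\1/p')
    [ -n "$secs" ] || return 1
    awk -v s="$secs" -v n="$count" 'BEGIN { printf "%.3f\n", s * 1000 / n }'
}

# Usage: run_series TARGET COUNT RUNS
run_series() {
    i=1
    while [ "$i" -le "$3" ]; do
        dd_latency_ms "$1" "$2" || return 1
        i=$((i + 1))
    done
}

# Example, on the primary while nothing else is happening
# (paths as in the results above):
#   run_series /drbd/tmp/testfile 10000 5   # drbd connected or disconnected
#   run_series /tmp/testfile 10000 5        # non-drbd
```

Each output line is one run's seconds / count in milliseconds, which is directly comparable to the per-4k-request averages discussed above.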