Note: "permalinks" may not be as permanent as we would like,
direct links of old sources may well be a few messages off.
Hi again,
Problem solved. There is a setting called storsave for newer 3ware RAID
controllers. It was set to "balanced", which, among other things, provides
a write journal for the disk cache to prevent data loss in case of power
failure. Setting it to "perform" boosted write performance from ~55MB/s
to ~155MB/s. Looking carefully at the output of atop and comparing the
writes/s on the backing device with the writes/s on the physical disks
suggested that it had something to do with RAID/disk-caching settings.
atop is a wonderful tool!
The manual http://www.3ware.com/support/UserDocs/UsrGuide-9.5.2.pdf has
more details on this setting.
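For anyone hitting the same issue: on our controller the policy can be
checked and changed with tw_cli, along these lines (the controller and
unit IDs are just examples):

  tw_cli /c0/u0 show storsave          # display the current storsave policy
  tw_cli /c0/u0 set storsave=perform   # trade cache safety for throughput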
I do wonder, though, why this performance bottleneck (storsave=balance)
only shows up for writes on the DRBD device. Writes directly to the local
backing device are fast (~300MB/s). Are different syncs or write calls
used when writing locally compared to when DRBD writes to the disk of the
secondary node? Or is this due to the network latency?
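If it is the flushes: Florian's earlier hint about no-disk-flushes and
no-md-flushes would presumably look like this in drbd.conf (untested here,
and probably only sane with a battery- or flash-backed write cache):

  common {
      disk {
          no-disk-flushes;   # don't pass cache flushes down to the backing device
          no-md-flushes;     # likewise for meta-data writes
      }
  }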
Warm regards and thanks for the good work!
Tom
On Wednesday 02 01 2013 17:51:22 Tom Fernandes wrote:
> Hi Florian,
>
> Thanks for your reply. I was out of the office for some time, so here
> are my observations...
>
> On Wednesday 02 01 2013 16:29:17 you wrote:
> > On Tue, Dec 18, 2012 at 10:58 AM, Tom Fernandes <anyaddress at gmx.net>
> > wrote:
> > > ------------------------------- DRBD -----------------------------------------
> > > tom@hydra04 [1526]:~$ sudo drbdadm dump
> > > # /etc/drbd.conf
> > > common {
> > >     protocol C;
> > >     syncer {
> > >         rate 150M;
> > >     }
> > > }
> > >
> > > # resource leela on hydra04: not ignored, not stacked
> > > resource leela {
> > >     on hydra04 {
> > >         device    minor 0;
> > >         disk      /dev/vg0/leela;
> > >         address   ipv4 10.0.0.1:7788;
> > >         meta-disk internal;
> > >     }
> > >     on hydra05 {
> > >         device    minor 0;
> > >         disk      /dev/vg0/leela;
> > >         address   ipv4 10.0.0.2:7788;
> > >         meta-disk internal;
> > >     }
> > > }
> >
> > If that configuration is indeed "similar" to the one on the other
> > cluster (the one where you're apparently writing to DRBD at 200
> > MB/s), I'd be duly surprised. Indeed I'd consider it quite unlikely
> > for _any_ DRBD 8.3 cluster to hit that throughput unless you tweaked
> > at least al-extents, max-buffers and max-epoch-size, and possibly
> > also sndbuf-size and rcvbuf-size, and set no-disk-flushes and no-md-
> > flushes (assuming you run on flash or battery backed write cache).
>
> I compared the DRBD-configuration of the fast and the slow cluster
> again with drbdadm dump. They are the same. Both configurations have
> just the defaults. No modifications of the parameters you mentioned
> above.
> To be on the safe side I re-ran the benchmarks with a 2048MB dd file
> (as we have big RAID caches). On the fast cluster I have 1024MB of
> flash-backed cache, on the slow cluster it's 512MB (without BBU). When
> doing the tests on the fast cluster I watched nr and dw in /proc/drbd
> on the secondary node to be sure that the data is really getting
> replicated. The fast cluster consists of HP servers. The slow cluster
> is different hardware (it's rented from our provider and may be no-name
> hardware), but both have the same amount of RAM, the same number of CPU
> threads, SAS drives, and a RAID6.
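> For reference, the benchmark was a dd of roughly this form (the target
> path and flags are only illustrative), with "watch -n1 cat /proc/drbd"
> running on the secondary:
>
>   dd if=/dev/zero of=/mnt/leela/ddtest bs=1M count=2048 conv=fsync  # path is an example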
>
> The fast cluster gives ~176MB/s write performance (not 200MB/s as I
> mentioned before - I wasn't accurate when I wrote that - sorry). The
> slow cluster gives ~55MB/s write performance. The speed on the slow
> cluster stays roughly the same, whether I use protocol C or A. On the
> fast cluster the speed increases from ~176MB/s to 187MB/s when
> switching from protocol C to protocol A.
>
> >
> > So I'd suggest that you refer back to your "fast" cluster and see if
> > perhaps you forgot to copy over your /etc/drbd.d/global_common.conf.
>
> I checked. Both configs are the same.
>
> >
> > You may also need to switch your I/O scheduler from cfq to deadline
> > on your backing devices, if you haven't already done so.
>
> I switched from cfq to deadline on the slow cluster. There was a
> performance increase from ~55MB/s to ~58MB/s.
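> For reference, the switch was done per device with something like
> (sdX standing for the actual backing disks):
>
>   echo deadline > /sys/block/sdX/queue/scheduler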
>
> > And finally, for
> > a round-robin bonded network link, upping the
> > net.ipv4.tcp_reordering sysctl to perhaps 30 or so would also be
> > wise.
>
> I tried setting it to 30 on the slow cluster, but performance didn't
> really change.
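> For reference, that was set on the fly with roughly:
>
>   sysctl -w net.ipv4.tcp_reordering=30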
>
> I didn't feel it made sense to tweak the DRBD configuration on the slow
> cluster, as the fast cluster has the same DRBD configuration but gives
> more than three times the performance.
>
> I'll try with 8.4 tomorrow. Let's see if that makes a difference.
>
> Is there any more information I can provide?
>
>
> warm regards,
>
>
> Tom Fernandes