[DRBD-user] DRBD terrible sync performance on 10GigE

Ben RUBSON ben.rubson at gmail.com
Thu Dec 3 08:52:09 CET 2015

Note: "permalinks" may not be as permanent as we would like,
direct links of old sources may well be a few messages off.


Hi Emmanuel,

Your /proc/drbd shows "want: 102,400 K/sec", i.e. the resync is capped at 100M. That suggests the dynamic resync controller (active when c-plan-ahead > 0) is in charge; with the controller active, a plain resync-rate is largely ignored and the default c-max-rate of 100M limits the sync. Try the following settings in global_common.conf to disable the controller and force a fixed rate:

common {
    disk {
        c-plan-ahead 0;
        resync-rate 800M;
    }
}
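
If you want to try it without editing and reloading the config, something like the following should also work at runtime (just a sketch, assuming the resource is still named rd0 as in your cluster.res; alternatively, "drbdadm adjust all" after editing the file should pick up the change):

    drbdadm disk-options --c-plan-ahead=0 --resync-rate=800M rd0

The "want:" figure in /proc/drbd should then reflect the new rate.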

Regards,

Ben

> On 2 Dec 2015, at 17:48, Emmanuel Florac <eflorac at intellique.com> wrote:
> 
> 
> 
> I've set up a pair of identical servers with RAID arrays (8 cores, 16GB
> RAM, 12x2 TB RAID6), 3 10GigE interfaces, to host some highly available
> services.
> 
> The systems are currently running Debian 7.9 Wheezy oldstable (because
> corosync/pacemaker are not available in 8.x stable or testing).
> However, I've tried with Jessie too, no dice.
> 
> Local disk performance is about 900 MB/s write, 1600 MB/s read. Network
> throughput between the machines is over 700 MB/s. Through iSCSI, each
> machine can write to the other's storage at more than 700 MB/s.
> 
> However, no matter how I configure DRBD, the throughput is limited
> to 100 MB/s. It really looks like some hardcoded limit. I can reliably
> lower performance by tweaking the settings, but it never goes over
> 1 Gbit (122 MB/s is reached for a couple of seconds at a time). I'm
> really pulling my hair out over this one.
> 
>    plain vanilla kernel 3.18.24 amd64
>    drbd 8.9.2~rc1-1~bpo70+1
> 
> The configuration is split into two files: global-common.conf:
> 
> global {
>        usage-count no;
> }
> 
> common {
>        handlers {
>        }
> 
>        startup {
>        }
> 
>        disk {
>                on-io-error             detach;
>         #       no-disk-flushes ;
>        }
>        net {
>                max-epoch-size          8192;     # tried some values,
>                max-buffers             8192;     # no real difference
>                sndbuf-size             2097152;  # or lower perf.
>        }
>        syncer {
>                rate                    4194304k; # changing this does nothing much
>                al-extents              6433;
>        }
> }
> 
> and cluster.res:
> 
> resource rd0 {
>        protocol C;
>        on cl1 {
>                device /dev/drbd0;
>                disk /dev/sda4;
>                address 192.168.42.1:7788;
>                meta-disk internal;
>        }
> 
>        on cl2 {
>                device /dev/drbd0;
>                disk /dev/sda4;
>                address 192.168.42.2:7788;
>                meta-disk internal;
>        }
> }
> 
> Output from cat /proc/drbd on the slave:
> 
> version: 8.4.5 (api:1/proto:86-101)
> srcversion: EDE19BAA3D4D4A0BEFD8CDE 
> 0: cs:SyncTarget ro:Secondary/Secondary ds:Inconsistent/UpToDate C r-----
>     ns:0 nr:4462592 dw:4462592 dr:0 al:0 bm:0 lo:0 pe:0 ua:0 ap:0 ep:1 wo:f oos:16489499884
>         [>....................] sync'ed:  0.1% (16103024/16107384)M
>         finish: 49:20:03 speed: 92,828 (92,968) want: 102,400 K/sec
> 
> Output from vmstat 2 on master (both machines are almost completely
> idle):
> 
> procs -----------memory---------- ---swap-- -----io---- -system-- ----cpu----
> r  b   swpd   free   buff  cache   si   so    bi    bo   in   cs us sy id wa
> 0  0      0 14952768 108712 446108    0    0   213   254   16    9  0  0 100  0
> 0  0      0 14952484 108712 446136    0    0     0     4 10063 1361  0  0 99  0
> 0  0      0 14952608 108712 446136    0    0     0     4 10057 1356  0  0 99  0
> 0  0      0 14952608 108720 446128    0    0     0    10 10063 1352  0  1 99  0
> 0  0      0 14951616 108720 446136    0    0     0     6 10175 1417  0  1 99  0
> 0  0      0 14951748 108720 446136    0    0     0     4 10172 1426  0  1 99  0
> 
> 
> Output from iperf between the two servers:
> 
> ------------------------------------------------------------
> Client connecting to cl2, TCP port 5001
> TCP window size:  325 KByte (default)
> ------------------------------------------------------------
> [  3] local 192.168.42.1 port 47900 connected with 192.168.42.2 port 5001
> [ ID] Interval       Transfer     Bandwidth
> [  3]  0.0-10.0 sec  6.87 GBytes  5.90 Gbits/sec
> 
> Apparently initial synchronisation is supposed to be somewhat slow, but
> not this slow... Furthermore, it doesn't really react to any attempt to
> change the sync rate, such as:
> 
> drbdadm disk-options --resync-rate=800M all
> 
> I've tried setting up an md mirror of the two volumes over iSCSI on these
> machines (cl1 using cl2 as a target), and it works just fine (the mirror
> synchronizes in about 6 hours, performance is 80% of local). So there's
> obviously nothing wrong with the network and RAID stacks (even if the
> network throughput is somewhat low for some reason).
> 
> -- 
> ------------------------------------------------------------------------
> Emmanuel Florac     |   Direction technique
>                    |   Intellique
>                    |	<eflorac at intellique.com>
>                    |   +33 1 78 94 84 02
> ------------------------------------------------------------------------



