Note: "permalinks" may not be as permanent as we would like,
direct links of old sources may well be a few messages off.
I've set up a pair of identical servers with RAID arrays (8 cores, 16 GB RAM, 12x2 TB RAID6) and 3 10GigE interfaces, to host some highly available services. The systems are currently running Debian 7.9 Wheezy oldstable (because corosync/pacemaker are not available on 8.x stable nor testing); however, I've tried with Jessie too, no dice.

Local disk performance is about 900 MB/s write, 1600 MB/s read. Network throughput between the machines is over 700 MB/s. Through iSCSI, each machine can write to the other's storage at more than 700 MB/s.

However, no matter how I configure DRBD, the throughput is limited to 100 MB/s. It really looks like some hardcoded limit. I can reliably lower performance by tweaking the settings, but it never goes over 1 Gbit (122 MB/s is reached for a couple of seconds at a time). I'm really pulling my hair out on this one.

plain vanilla kernel 3.18.24 amd64
drbd 8.9.2~rc1-1~bpo70+1

The configuration is split in two files:

global-common.conf:

global {
        usage-count no;
}
common {
        handlers {
        }
        startup {
        }
        disk {
                on-io-error detach;
                # no-disk-flushes;
        }
        net {
                max-epoch-size  8192;     # tried some values,
                max-buffers     8192;     # no real difference
                sndbuf-size     2097152;  # or lower perf.
        }
        syncer {
                rate            4194304k; # changing this does nothing much
                al-extents      6433;
        }
}

and cluster.res:

resource rd0 {
        protocol C;
        on cl1 {
                device /dev/drbd0;
                disk /dev/sda4;
                address 192.168.42.1:7788;
                meta-disk internal;
        }
        on cl2 {
                device /dev/drbd0;
                disk /dev/sda4;
                address 192.168.42.2:7788;
                meta-disk internal;
        }
}

Output from cat /proc/drbd on the slave:

version: 8.4.5 (api:1/proto:86-101)
srcversion: EDE19BAA3D4D4A0BEFD8CDE
 0: cs:SyncTarget ro:Secondary/Secondary ds:Inconsistent/UpToDate C r-----
    ns:0 nr:4462592 dw:4462592 dr:0 al:0 bm:0 lo:0 pe:0 ua:0 ap:0 ep:1 wo:f oos:16489499884
        [>....................] sync'ed:  0.1% (16103024/16107384)M
        finish: 49:20:03 speed: 92,828 (92,968) want: 102,400 K/sec

Output from vmstat 2 on the master (both machines are almost completely idle):

procs -----------memory---------- ---swap-- -----io---- -system-- ----cpu----
 r  b   swpd     free   buff  cache   si   so    bi    bo    in    cs us sy id wa
 0  0      0 14952768 108712 446108    0    0   213   254    16     9  0  0 100  0
 0  0      0 14952484 108712 446136    0    0     0     4 10063  1361  0  0  99  0
 0  0      0 14952608 108712 446136    0    0     0     4 10057  1356  0  0  99  0
 0  0      0 14952608 108720 446128    0    0     0    10 10063  1352  0  1  99  0
 0  0      0 14951616 108720 446136    0    0     0     6 10175  1417  0  1  99  0
 0  0      0 14951748 108720 446136    0    0     0     4 10172  1426  0  1  99  0

Output from iperf between the two servers:

------------------------------------------------------------
Client connecting to cl2, TCP port 5001
TCP window size:  325 KByte (default)
------------------------------------------------------------
[  3] local 192.168.42.1 port 47900 connected with 192.168.42.2 port 5001
[ ID] Interval       Transfer     Bandwidth
[  3]  0.0-10.0 sec  6.87 GBytes  5.90 Gbits/sec

Apparently the initial synchronisation is supposed to be somewhat slow, but not this slow... Furthermore, it doesn't really react to any attempt to throttle the sync rate, like:

drbdadm disk-options --resync-rate=800M all

I've tried setting up an md mirror of the 2 volumes over iSCSI on these machines (cl1 using cl2 as a target, roughly as sketched below), and it works just fine (the mirror synchronizes in about 6 hours, performance is 80% of local). So there's obviously nothing wrong with the network and RAID stacks (even if the network throughput is somewhat low for some reason).
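For reference, the md-over-iSCSI test was along these lines (a rough sketch only; the target IQN and the /dev/sdb name that the iSCSI disk gets on cl1 are placeholders, and the drbd resource was taken down first):

# on cl1: log in to the iSCSI target exported by cl2
iscsiadm -m node -T <target-iqn> -p 192.168.42.2 --login
# mirror the local volume with the iSCSI-attached one
mdadm --create /dev/md0 --level=1 --raid-devices=2 /dev/sda4 /dev/sdb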
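Incidentally, the "want: 102,400 K/sec" in /proc/drbd is exactly 100 MiB/s, which makes me wonder whether the 8.4 dynamic resync controller (c-plan-ahead, c-fill-target, c-min-rate, c-max-rate) is what actually decides the rate here, rather than the old syncer rate. A sketch of the kind of disk section I mean (the values are guesses sized for a ~700 MB/s link, nothing I can vouch for):

disk {
        on-io-error     detach;
        # 8.4 resync controller knobs; values below are guesses
        c-plan-ahead    20;
        c-fill-target   2M;
        c-min-rate      100M;
        c-max-rate      700M;
}

or at runtime:

drbdadm disk-options --c-plan-ahead=20 --c-fill-target=2M --c-min-rate=100M --c-max-rate=700M rd0

I haven't confirmed that this is really what is capping it, though.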
-- 
------------------------------------------------------------------------
Emmanuel Florac               |   Direction technique
                              |   Intellique
                              |   <eflorac at intellique.com>
                              |   +33 1 78 94 84 02
------------------------------------------------------------------------