[DRBD-user] Slow write performance

Ulrich Leodolter <ulrich.leodolter@obvsg.at>
Sat Jul 2 10:02:51 CEST 2011

Note: "permalinks" may not be as permanent as we would like,
direct links of old sources may well be a few messages off.


Hi,

On Fri, 2011-07-01 at 16:44 -0500, Zev Weiss wrote:
>
> (Re-sending since my prior attempt via gmane doesn't appear to have
> worked; apologies if this is duplicated.)
> 
> Hi,
> 
> I'm seeing similar problems with massive underperformance on writes on
> my system.  I'm running locally-compiled DRBD [version: 8.3.7
> (api:88/proto:86-91), GIT-hash: ea9e28dbff98e331a62bcbcc63a6135808fe2917]
> on RHEL 5.  Read performance is fine, but I'm getting write throughput
> of about 4MB/s, with latency around 13ms (as measured with a script
> similar to the one in the user's guide).
> 
> I haven't yet tried tuning various parameters in drbd.conf as described
> in the available performance-optimization guides, but it's so slow I
> have to think there must be some more fundamental problem at play here
> than a lack of optimization (i.e. the defaults shouldn't be *that* bad).
> 
> And for what it's worth, I've ruled out the network link as a possible
> bottleneck -- it's giving me 1.97Gbps throughput in both directions
> according to iperf (a bonded pair of back-to-back GbE ports, 9k MTU).
> 
> Anyone have any suggestions or advice?
> 

We had similar problems, but recently I found a setup that gives good
write performance.  The problem is that I don't know exactly what was
wrong with my initial setup, or which config change gave the
performance boost.

We have ext4 cluster resources on clustered LVM, and KVM virtual
machines running directly on a DRBD device (qemu-kvm ..,cache=none).
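
For reference, here is a minimal sketch of how such a guest can be
started with its disk directly on the DRBD device, assuming /dev/drbd2
as in the cmail resource further down; the exact flags depend on your
qemu-kvm version and on whether libvirt generates the command line for
you:

qemu-kvm -m 2048 -smp 2 \
    -drive file=/dev/drbd2,if=virtio,cache=none \
    -net nic,model=virtio -net tap

cache=none bypasses the host page cache, so guest writes hit DRBD (and
therefore the peer) directly instead of sitting in host memory.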

The dd runs below were done inside a virtual machine (2 GB RAM and 2
virtual cores), on the ext4 filesystem mounted at /var/spool/imap.  As
soon as I fire up the dd, replication runs at maximum network speed
(about 119 MB/s).

[root@cmail imap]# dd if=/dev/zero of=test.img bs=1024k count=10k
10240+0 records in
10240+0 records out
10737418240 bytes (11 GB) copied, 82.1332 seconds, 131 MB/s

[root@cmail imap]# dd if=/dev/zero of=test.img bs=1024k count=1k oflag=direct
1024+0 records in
1024+0 records out
1073741824 bytes (1.1 GB) copied, 10.9007 seconds, 98.5 MB/s

The tests below were run on an ext4 filesystem on top of a clustered LVM volume:
/dev/mapper/vg_cdata-data on /data type ext4 (rw,noatime,nodiratime)

[root@cnode1 data]# dd if=/dev/zero of=test.img bs=1024k count=1k oflag=direct
1024+0 records in
1024+0 records out
1073741824 bytes (1.1 GB) copied, 10.3417 seconds, 104 MB/s

[root@cnode1 data]# dd if=/dev/zero of=test.img bs=64k count=10k oflag=direct
10240+0 records in
10240+0 records out
671088640 bytes (671 MB) copied, 9.44331 seconds, 71.1 MB/s

As you can see, the 1 Gb interconnect is pretty busy while such a dd is
running.  I expect write performance of about 300 MB/s or more once we
get a pair of 10 Gb cards dedicated to DRBD.
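
If you want to double-check that the interconnect (and not DRBD itself)
is the limit, a rough sketch is to compare a raw iperf run between the
replication addresses with the DRBD counters while the dd is running
(10.0.0.161/162 are the addresses from the cmail resource below):

# on cnode2
iperf -s
# on cnode1
iperf -c 10.0.0.162 -t 30
# on either node, while the dd is running
watch -n1 cat /proc/drbd

The ns: (network send) and dw: (disk write) counters in /proc/drbd show
how much data goes to the peer versus to the local disk.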

Below you can see the current DRBD and sysctl settings.  Compared to
our initial setup we changed the hardware RAID from RAID5 to RAID50,
which doubled the raw write performance.

We also changed sndbuf-size from a fixed 512k to 0 (auto-tuning) and
adjusted the sysctl TCP tuning settings; I can't remember the exact
initial values.

We also had one hardware problem: the RAID controller battery had
failed, so the controller's write cache was disabled.  Raw RAID50
write performance was only about 175 MB/s on that node; after the
battery was replaced it is about 450 MB/s.
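
On these HP boxes a quick way to spot exactly that situation is the
Smart Array CLI, assuming HP's hpacucli utility is installed (controller
names and slots will differ on other hardware):

hpacucli ctrl all show status

The status output includes cache and battery/capacitor state, so a
failed battery (and the resulting disabled write cache) shows up right
away.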

This is my first DRBD cluster installation; we started it 2 weeks ago.
I am sure DRBD needs a few more settings to handle failure cases, but
performance seems good.

br
ulrich




cluster hardware:
-----------------
  pair of HP DL380 G7
  single Xeon E5649 (6-core CPU)
  12 GB RAM
  6x 300 GB RAID50 (about 450 MB/s write performance)
  1 Gb cluster interconnect (will be replaced by 10 Gb)

cluster software:
------------------
  CentOS 5.6 x86_64
  drbd83 from the CentOS extras repository (drbd83-8.3.8-1.el5.centos), install sketch below
  RHEL Cluster Suite
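
For completeness, a sketch of installing that build on CentOS 5; drbd83
is the package name mentioned above, and kmod-drbd83 should be the
matching kernel module package (double-check the names with
"yum search drbd" if in doubt):

yum install drbd83 kmod-drbd83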


drbd performance settings:
--------------------------

/etc/drbd.d/global_common.conf
...
common {
...
        disk {
                no-disk-barrier;
                no-disk-flushes;
                on-io-error detach;
        }
        net {
                max-buffers 8000;
                max-epoch-size 8000;
                sndbuf-size 0;
        }
        syncer {
                rate      50M;
                al-extents 3389;
                verify-alg md5;
        }
...
}
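
After editing these files, a minimal way to sanity-check and apply the
changes on a running cluster (run on both nodes):

drbdadm dump          # parse the configuration and show what DRBD will actually use
drbdadm adjust all    # apply the new settings to the running resources

With verify-alg set, drbdadm verify <resource> can also be run
periodically to check that both replicas are still identical.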

drbd device which runs kvm virtual machine (/etc/drbd.d/cmail.res):
-------------------------------------------------------------------
resource cmail {
  device    /dev/drbd2;
  meta-disk internal;
  on cnode1.obvsg.at {
    disk      /dev/vg_cnode1/cmail;
    address   10.0.0.161:7791;
  }
  on cnode2.obvsg.at {
    disk      /dev/vg_cnode2/cmail;
    address   10.0.0.162:7791;
  }
  net {
    allow-two-primaries;
  }
  startup {
    become-primary-on both;
  }
}


cat >> /etc/sysctl.conf <<EOF

# http://fasterdata.es.net/fasterdata/host-tuning/linux/
# increase TCP max buffer size setable using setsockopt()
# 16 MB with a few parallel streams is recommended for most 10G paths
# 32 MB might be needed for some very long end-to-end 10G or 40G paths
net.core.rmem_max = 16777216 
net.core.wmem_max = 16777216 
# increase Linux autotuning TCP buffer limits 
# min, default, and max number of bytes to use
# (only change the 3rd value, and make it 16 MB or more)
net.ipv4.tcp_rmem = 4096 87380 16777216
net.ipv4.tcp_wmem = 4096 65536 16777216
# recommended to increase this for 10G NICS
net.core.netdev_max_backlog = 30000

# Disable netfilter on bridges.
net.bridge.bridge-nf-call-ip6tables = 0
net.bridge.bridge-nf-call-iptables = 0
net.bridge.bridge-nf-call-arptables = 0

# reduce watermarks to start background (and foreground)
# writeback early.  Reduces the chance of resource starvation.
vm.dirty_ratio = 10
vm.dirty_background_ratio = 3
EOF
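
To make the new values take effect without a reboot, reload them on
both nodes (this is the standard sysctl reload, nothing DRBD-specific):

sysctl -p /etc/sysctl.conf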

-- 
Ulrich Leodolter <ulrich.leodolter@obvsg.at>



