Note: "permalinks" may not be as permanent as we would like,
direct links of old sources may well be a few messages off.
Hey everyone. We're having a problem with DRBD 8.4.1 in our test environment.
Buffered reads through DRBD run roughly 14x slower than unbuffered (direct)
reads (~60 MB/s vs ~900 MB/s), while both buffered and direct reads run at
full speed when done against the RAID controller directly, without DRBD.
This situation is nearly identical to a previous posting to the list:
http://lists.linbit.com/pipermail/drbd-user/2012-January/017634.html
We have two high-performance servers, each with two BBU-backed RAID
controllers and 24 disks.
Big Picture + Dirty Stats
Host_A Host_B
--------------------------------------------------------------------------
12disks sda drbd_a (primary) <10gigE> drbd_a (secondary) sda 12disks
12disks sdb drbd_b (secondary) <10gigE> drbd_b (primary) sdb 12disks
Notes
a) we've done these tests with and without the partner node connected; the
results below are from unconnected runs (see the commands sketched after
these notes)
b) recreating the DRBD device does not resolve the problem
Hardware
2x LSI Logic / Symbios Logic MegaRAID SAS 2108 (MegaRAID SAS 9260-16i)
root@host_a:~>uname -a
Linux host_a 2.6.35.14 #1 SMP Mon Jan 23 22:12:58 UTC 2012 x86_64 GNU/Linux
root@host_a:~>cat /proc/drbd
version: 8.4.1 (api:1/proto:86-100)
GIT-hash: 91b4c048c1a0e06777b5f65d312b38d47abaea80
root@host_a:~>drbdadm -V
DRBDADM_BUILDTAG=GIT-hash:\ 91b4c048c1a0e06777b5f65d312b38d47abaea80\ build\ by\ root@host_a\,\
2012-01-31\ 22:40:37
DRBDADM_API_VERSION=1
DRBD_KERNEL_VERSION_CODE=0x080401
DRBDADM_VERSION_CODE=0x080401
DRBDADM_VERSION=8.4.1
Clear Cache (run before every 'dd' test)
root@host_a:~>echo 3 > /proc/sys/vm/drop_caches
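To keep the runs comparable, something like the following sketch can be used
to drop the page cache before each dd invocation; the device path is a
placeholder and a root shell is assumed:

#!/bin/sh
# Sketch: read 1 GiB from a device, once buffered and once with O_DIRECT,
# dropping the page cache before each run.
DEV=${1:?usage: $0 /dev/some_device}
for FLAGS in "" "iflag=direct"; do
    sync
    echo 3 > /proc/sys/vm/drop_caches
    dd if="$DEV" of=/dev/null bs=1M count=1024 $FLAGS
done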
Buffered
--------
1. Host A, A-side resource (drbd_a)
slow:
root@host_a:~>dd if=/dev/drbd_a of=/dev/null bs=1M count=1024
1073741824 bytes (1.1 GB) copied, 16.4465 s, 65.3 MB/s
fast:
root@host_a:~>dd if=/dev/sda of=/dev/null bs=1M count=1024
1073741824 bytes (1.1 GB) copied, 1.20017 s, 895 MB/s
2. Host A, B-side resource (drbd_b, after promoting it to Primary)
slow:
root@host_a:~>dd if=/dev/drbd_b of=/dev/null bs=1M count=1024
1073741824 bytes (1.1 GB) copied, 17.6546 s, 60.8 MB/s
3. Host B, B-side resource (drbd_b)
slow:
root@host_b:~>dd if=/dev/drbd_b of=/dev/null bs=1M count=1024
1073741824 bytes (1.1 GB) copied, 16.8178 s, 63.8 MB/s
fast:
root@host_b:~>dd if=/dev/sdb of=/dev/null bs=1M count=1024
1073741824 bytes (1.1 GB) copied, 1.21691 s, 882 MB/s
Direct Unbuffered
-----------------
1. Host A, A-side resource (drbd_a)
fast:
root@host_a:~>dd if=/dev/drbd_a of=/dev/null bs=1M count=1024 iflag=direct
1073741824 bytes (1.1 GB) copied, 1.09098 s, 984 MB/s
fast:
root@host_a:~>dd if=/dev/sda of=/dev/null bs=1M count=1024 iflag=direct
1073741824 bytes (1.1 GB) copied, 1.07937 s, 995 MB/s
2. Host B, B-side resource (drbd_b)
fast:
root@host_b:~>dd if=/dev/sdb of=/dev/null bs=1M count=1024 iflag=direct
1073741824 bytes (1.1 GB) copied, 1.23987 s, 866 MB/s
fast:
root@host_b:~>dd if=/dev/drbd_b of=/dev/null bs=1M count=1024 iflag=direct
1073741824 bytes (1.1 GB) copied, 1.33096 s, 807 MB/s
As you can see, reading through DRBD with buffered I/O gives terrible
performance, roughly 14x slower.
If we read from the RAID device directly, or force direct I/O through DRBD
(iflag=direct), performance is fast and as expected.
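Not something we have tested yet, just a diagnostic idea for the buffered
path: buffered dd reads go through the page cache and the kernel readahead
mechanism, while iflag=direct bypasses both, so one thing worth comparing is
the readahead configured on the DRBD devices versus the backing devices.
A minimal sketch (device names from this setup; the 4096-sector value is only
an example to experiment with):

blockdev --getra /dev/sda            # readahead of the backing device, in 512-byte sectors
blockdev --getra /dev/drbd_a         # readahead of the DRBD device
blockdev --setra 4096 /dev/drbd_a    # experiment: 4096 sectors = 2 MiB readahead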
DRBD Config
global {
    usage-count no ;
}

common {
    protocol C ;

    syncer {
        rate 4194304K ;
    }

    net {
        sndbuf-size 0 ;
        timeout 120 ;
        connect-int 20 ;
        ping-int 20 ;
        max-buffers 8192 ;
        max-epoch-size 8192 ;
        ko-count 30 ;
        cram-hmac-alg "sha1" ;
        shared-secret "some-password" ;
    }

    disk {
        disk-barrier no ;
        disk-flushes no ;
        md-flushes no ;
        al-extents 509 ;
    }
}

resource drbd_a {
    device /dev/drbd_a minor 0 ;
    disk /dev/sda ;
    meta-disk internal ;

    startup {
        become-primary-on host_a ;
    }

    on host_a {
        address 10.255.1.1:7788 ;
    }

    on host_b {
        address 10.255.1.2:7788 ;
    }
}

resource drbd_b {
    device /dev/drbd_b minor 1 ;
    disk /dev/sdb ;
    meta-disk internal ;

    startup {
        become-primary-on host_b ;
    }

    on host_a {
        address 10.255.2.1:7788 ;
    }

    on host_b {
        address 10.255.2.2:7788 ;
    }
}
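For completeness, the configuration as drbdadm actually parses it can be
double-checked with the standard dump command (output omitted here):

drbdadm dump drbd_a    # resource drbd_a as parsed from the config file
drbdadm dump drbd_b    # resource drbd_b as parsed from the config file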