Note: "permalinks" may not be as permanent as we would like,
direct links of old sources may well be a few messages off.
Hey everyone. We're having a problem with DRBD 8.4.1 in our test environment.
Buffered reads through DRBD are roughly 14x slower than direct reads
(~60 MB/s vs ~900 MB/s), while both buffered and direct reads run at full
speed when done against the RAID controller directly, without DRBD.

This situation is nearly identical to a previous posting to the list:
http://lists.linbit.com/pipermail/drbd-user/2012-January/017634.html

We have two high-performance servers, each with 2x BBU RAID controllers and
24 disks.

Big Picture + Dirty Stats

Host_A                                                Host_B
--------------------------------------------------------------------------
12 disks  sda  drbd_a (primary)    <10gigE>  drbd_a (secondary)  sda  12 disks
12 disks  sdb  drbd_b (secondary)  <10gigE>  drbd_b (primary)    sdb  12 disks

Notes
a) we've done these tests with and without the partner node connected;
   the results below are from the unconnected case
b) recreating the drbd device does not resolve the problem

Hardware
2x LSI Logic / Symbios Logic MegaRAID SAS 2108 (MegaRAID SAS 9260-16i)

root@host_a:~>uname -a
Linux host_a 2.6.35.14 #1 SMP Mon Jan 23 22:12:58 UTC 2012 x86_64 GNU/Linux

root@host_a:~>cat /proc/drbd
version: 8.4.1 (api:1/proto:86-100)
GIT-hash: 91b4c048c1a0e06777b5f65d312b38d47abaea80

root@host_a:~>drbdadm -V
DRBDADM_BUILDTAG=GIT-hash:\ 91b4c048c1a0e06777b5f65d312b38d47abaea80\ build\ by\ root@host_a\,\ 2012-01-31\ 22:40:37
DRBDADM_API_VERSION=1
DRBD_KERNEL_VERSION_CODE=0x080401
DRBDADM_VERSION_CODE=0x080401
DRBDADM_VERSION=8.4.1

Clear Cache (run before every 'dd' test)
root@host_a:~>echo 3 > /proc/sys/vm/drop_caches

Buffered
--------

1. A Side, A Resource

slow: root@host_a:~>dd if=/dev/drbd_a of=/dev/null bs=1M count=1024
      1073741824 bytes (1.1 GB) copied, 16.4465 s, 65.3 MB/s

fast: root@host_a:~>dd if=/dev/sda of=/dev/null bs=1M count=1024
      1073741824 bytes (1.1 GB) copied, 1.20017 s, 895 MB/s

2. A Side, B Resource (after going Primary)

slow: root@host_a:~>dd if=/dev/drbd_b of=/dev/null bs=1M count=1024
      1073741824 bytes (1.1 GB) copied, 17.6546 s, 60.8 MB/s

3. B Side, B Resource

slow: root@host_b:~>dd if=/dev/drbd_b of=/dev/null bs=1M count=1024
      1073741824 bytes (1.1 GB) copied, 16.8178 s, 63.8 MB/s

fast: root@host_b:~>dd if=/dev/sdb of=/dev/null bs=1M count=1024
      1073741824 bytes (1.1 GB) copied, 1.21691 s, 882 MB/s

Direct (Unbuffered)
-------------------

1. A Side, A Resource

fast: root@host_a:~>dd if=/dev/drbd_a of=/dev/null bs=1M count=1024 iflag=direct
      1073741824 bytes (1.1 GB) copied, 1.09098 s, 984 MB/s

fast: root@host_a:~>dd if=/dev/sda of=/dev/null bs=1M count=1024 iflag=direct
      1073741824 bytes (1.1 GB) copied, 1.07937 s, 995 MB/s

2. B Side, B Resource

fast: root@host_b:~>dd if=/dev/sdb of=/dev/null bs=1M count=1024 iflag=direct
      1073741824 bytes (1.1 GB) copied, 1.23987 s, 866 MB/s

fast: root@host_b:~>dd if=/dev/drbd_b of=/dev/null bs=1M count=1024 iflag=direct
      1073741824 bytes (1.1 GB) copied, 1.33096 s, 807 MB/s

As you can see, if we access the data through DRBD with buffered reads, we get
terrible performance, some ~14x slower. If we access the RAID device directly,
or force direct I/O through DRBD, performance is fast and as expected.
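Since buffered reads go through the page cache and kernel readahead while
iflag=direct bypasses both, one quick check that may be worth doing (just a
guess on our side, not something we've confirmed as the cause; device names
below match our setup) is comparing the read-ahead setting on the drbd device
with that of its backing disk, and re-running the buffered test after bumping
it, for example:

# read-ahead is reported in 512-byte sectors
root@host_a:~>blockdev --getra /dev/drbd_a
root@host_a:~>blockdev --getra /dev/sda

# if the drbd device is lower, raise it to match the backing disk,
# e.g. 8192 sectors (4 MB), then repeat the buffered dd test
root@host_a:~>blockdev --setra 8192 /dev/drbd_a
root@host_a:~>echo 3 > /proc/sys/vm/drop_caches
root@host_a:~>dd if=/dev/drbd_a of=/dev/null bs=1M count=1024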
DRBD Config
-----------

global {
    usage-count no;
}

common {
    protocol C;

    syncer {
        rate 4194304K;
    }

    net {
        sndbuf-size     0;
        timeout         120;
        connect-int     20;
        ping-int        20;
        max-buffers     8192;
        max-epoch-size  8192;
        ko-count        30;
        cram-hmac-alg   "sha1";
        shared-secret   "some-password";
    }

    disk {
        disk-barrier    no;
        disk-flushes    no;
        md-flushes      no;
        al-extents      509;
    }
}

resource drbd_a {
    device     /dev/drbd_a minor 0;
    disk       /dev/sda;
    meta-disk  internal;

    startup {
        become-primary-on host_a;
    }

    on host_a {
        address 10.255.1.1:7788;
    }
    on host_b {
        address 10.255.1.2:7788;
    }
}

resource drbd_b {
    device     /dev/drbd_b minor 1;
    disk       /dev/sdb;
    meta-disk  internal;

    startup {
        become-primary-on host_b;
    }

    on host_a {
        address 10.255.2.1:7788;
    }
    on host_b {
        address 10.255.2.2:7788;
    }
}
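For completeness, when we tweak the net/disk options above between runs we
re-apply the config without a full restart and repeat the same buffered read
test, roughly like this (sketch only, using the drbd_a resource from the
config above):

root@host_a:~>drbdadm adjust drbd_a
root@host_a:~>echo 3 > /proc/sys/vm/drop_caches
root@host_a:~>dd if=/dev/drbd_a of=/dev/null bs=1M count=1024

The numbers posted above are all from the unmodified config shown here.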