[DRBD-user] DRBD performance problems!

Joeri Casteels joeri.casteels at intec.ugent.be
Tue Feb 11 09:42:04 CET 2014

Note: "permalinks" may not be as permanent as we would like,
direct links of old sources may well be a few messages off.


Hi All,

I’m having serieus problems setting up a new DRBD cluster. At the moment i have 2 clusters in production without any problems for around 2 years now.
Debian Squeeze 6 + drbd 8.3.11 running at a nice speed of 800MB/s with 16x SATA 1TB disks HW raid 6 :-) and 10G NIC’s 
No partitions on the DRBD device it’s raw block device.

Bought some new hardware: 
2x Xeon 6core CPU’s with HT - 16GB ram - Areca 1882i - 24x 600GB SAS 10k HDD’s - 10G NIC's

Started of with Debian Wheezy and DRBD 8.3.13 but performance was really bad +- 400MB/s with DRBD primary/secondary while local raw speed of /dev/sdb = 1.5GB/s
Then went on with Ubuntu 13.10 and DRBD 8.4.3 same deal. Then installed with the latest stable DRBD 8.4.4 but no difference.

I can cranck up the initial sync to 750MB/s when playing with the c-rate’s (still not a good performance) while after the initial sync it drops back to 400MB/s 

I tested all the different performance settings on everything! I mean local block tuning, NIC tuning, DRBD tuning still i can’t get it to perform well. It’s worse then my production DRBD with SATA drives on 7200rpm!

Current config:

root at vstore7:~# drbdsetup /dev/drbd0 show
disk {
	size            	0s _is_default; # bytes
	on-io-error     	detach;
	fencing         	dont-care _is_default;
	no-disk-flushes ;
	max-bio-bvecs   	0 _is_default;
}
net {
	timeout         	60 _is_default; # 1/10 seconds
	max-epoch-size  	8000;
	max-buffers     	8000;
	unplug-watermark	128 _is_default;
	connect-int     	10 _is_default; # seconds
	ping-int        	10 _is_default; # seconds
	sndbuf-size     	2097152; # bytes
	rcvbuf-size     	0 _is_default; # bytes
	ko-count        	0 _is_default;
	after-sb-0pri   	disconnect _is_default;
	after-sb-1pri   	disconnect _is_default;
	after-sb-2pri   	disconnect _is_default;
	rr-conflict     	disconnect _is_default;
	ping-timeout    	5 _is_default; # 1/10 seconds
	on-congestion   	block _is_default;
	congestion-fill 	0s _is_default; # byte
	congestion-extents	127 _is_default;
}
syncer {
	rate            	1228800k; # bytes/second
	after           	-1 _is_default;
	al-extents      	3389;
	on-no-data-accessible	io-error _is_default;
	c-plan-ahead    	0 _is_default; # 1/10 seconds
	c-delay-target  	10 _is_default; # 1/10 seconds
	c-fill-target   	0s _is_default; # bytes
	c-max-rate      	102400k _is_default; # bytes/second
	c-min-rate      	4096k _is_default; # bytes/second
}
protocol C;
_this_host {
	device			minor 0;
	disk			"/dev/sdb";
	meta-disk		internal;
	address			ipv4 172.16.252.7:7789;
}
_remote_host {
	address			ipv4 172.16.252.8:7789;
}


Here are some test results of iperf and dd so you see my local speeds and network speeds aren’t the problem. The final setup will also be a bond of 10G cards but i took that out for troubleshooting.

root at vstore7:~# iperf -c 172.16.252.8 -t 200 -i 2
------------------------------------------------------------
Client connecting to 172.16.252.8, TCP port 5001
TCP window size: 96.7 KByte (default)
------------------------------------------------------------
[  3] local 172.16.252.7 port 35149 connected with 172.16.252.8 port 5001
[ ID] Interval       Transfer     Bandwidth
[  3]  0.0- 2.0 sec  2.32 GBytes  9.96 Gbits/sec
[  3]  2.0- 4.0 sec  2.31 GBytes  9.93 Gbits/sec
[  3]  4.0- 6.0 sec  2.31 GBytes  9.93 Gbits/sec
----total-cpu-usage---- -dsk/total- -net/total- ---paging-- ---system--
usr sys idl wai hiq siq| read  writ| recv  send|  in   out | int   csw 
  0   0  99   0   0   0|  96k  246k|   0     0 |   0     0 |9227   186 
  0   3  96   0   0   1|   0     0 |3692k 1190M|   0     0 |  93k  252 
  0   3  96   0   0   1|   0     0 |3695k 1189M|   0     0 |  94k  252 
  0   3  96   0   0   1|   0     0 |3697k 1189M|   0     0 |  94k  268 

root at vstore7:~# dd if=/dev/zero of=/dev/sdb bs=1M count=100000 
100000+0 records in
100000+0 records out
104857600000 bytes (105 GB) copied, 98.921 s, 1.1 GB/s
root at vstore8:~# dd if=/dev/zero of=/dev/sdb bs=1M count=100000 
100000+0 records in
100000+0 records out
104857600000 bytes (105 GB) copied, 98.3511 s, 1.1 GB/s

root at vstore7:~# dd if=/dev/zero of=/dev/sdb bs=1M count=100000 oflag=direct
100000+0 records in
100000+0 records out
104857600000 bytes (105 GB) copied, 59.5348 s, 1.8 GB/s
root at vstore8:~# dd if=/dev/zero of=/dev/sdb bs=1M count=100000 oflag=direct
100000+0 records in
100000+0 records out
104857600000 bytes (105 GB) copied, 59.3226 s, 1.8 GB/s
-------------- next part --------------
A non-text attachment was scrubbed...
Name: drbd_fsio_test.rtf
Type: text/rtf
Size: 7249 bytes
Desc: not available
URL: <http://lists.linbit.com/pipermail/drbd-user/attachments/20140211/c8c017c1/attachment.bin>
-------------- next part --------------




PLEASE SOMEONE TELL ME WHAT I MISS!

Tested DRBD settings:
On DISKS: 
	block/sdb/queue/scheduler = deadline
	block/sdb/queue/iosched/front_merges = 0
	block/sdb/queue/iosched/read_expire = 150
	block/sdb/queue/iosched/write_expire = 1500
On NIC:
	net.core.rmem_max = 33554432
	net.core.wmem_max = 33554432
	net.ipv4.tcp_rmem = 4096 87380 33554432
	net.ipv4.tcp_wmem = 4096 65536 33554432
	net.core.netdev_max_backlog = 30000
	net.ipv4.tcp_window_scaling = 1
on DRBD:
global {
        usage-count no;
        # minor-count dialog-refresh disable-ip-verification
}

common {
        handlers {
                pri-on-incon-degr "echo o > /proc/sysrq-trigger ; halt -f";
                pri-lost-after-sb "echo o > /proc/sysrq-trigger ; halt -f";
                local-io-error "echo o > /proc/sysrq-trigger ; halt -f";
        }

        startup {
                wfc-timeout 240;
                degr-wfc-timeout 120;
		outdated-wfc-timeout 120;
        }

        disk {
                on-io-error   detach;
		disk-barrier no;
                disk-flushes no;
	        md-flushes no;
		c-plan-ahead 20; tested different ranges 100 and 200 as well
                 c-fill-target 2M; tested different settings
                 c-min-rate 1M; tested different settings
		c-max-rate 1500M; tested different settings
		resync-rate 1500M; doesn’t do anything when c-rate disabled
		al-extents 3389; 
        }

        net {
		protocol C;
                 sndbuf-size 2M; tested different settings 
		# rcvbuf-size 2M; tested different settings
                # ping-int 4;
                # timeout 30;
		max-buffers 8000; tested between and 131072
		max-epoch-size 8000;
                 # unplug-watermark 8000; tested from 16 and higher
		use-rle;
		# verify-alg md5;
        }
}



Mvg,

Joeri
-- 
Joeri Casteels

Department of Information Technology
Internet Based Communication Networks and Services (IBCN)
Ghent University - iMinds
Gaston Crommenlaan 8 (Bus 201), B-9050 Gent, Belgium
T: +32 9 33 14964
T Secretariaat: +32 9 33 14900
F: +32 9 33 14899
E: joeri.casteels at intec.UGent.be
W : www.ibcn.intec.UGent.be



More information about the drbd-user mailing list