Note: "permalinks" may not be as permanent as we would like,
direct links of old sources may well be a few messages off.
Hi!
I have a very strange performance issue with DRBD. I have two DRBD
nodes - nasnode0 and nasnode1. They are running two RAID10 arrays with
4 SATA disks each, and one RAID10 with 6 SATA disks. All volumes are
exported via iSCSI (ietd) to client servers, which run Xen VMs.
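(The export side is roughly one ietd target per DRBD device on the
active node; the snippet below is only a sketch of that layout - the
IQN and the Lun options are illustrative, not my exact ietd.conf:)

# /etc/ietd.conf (sketch)
Target iqn.2009-12.lan.company:controller0a
    Lun 0 Path=/dev/drbd0,Type=blockio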
Now, my problem is that when I run bonnie++ inside a Xen domU (virtual
machine), I can see that the primary DRBD node sits at around 10% I/O
load, while the secondary is at 100%. That is slowing things down. I
tried switching the primary, and the same issue occurred. I tried
Protocol C, Protocol B and Protocol A.
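(The 10% vs. 100% figures come from watching the disks on both
nasnodes while bonnie++ runs in the domU, roughly like this:)

# run in parallel on nasnode0 and nasnode1, compare %util of the backing disks
iostat -x 5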
The network is not the problem, because I have 4 Intel gigabit NICs in
bonding mode 0 (round-robin), and I can see that the nasnodes are
utilizing all the interfaces, but the load on them is only around 1-5%.
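(The bonding is set up the standard CentOS 5 way, roughly like this on
nasnode0; miimon and the netmask are illustrative, and it is assumed
here that the 10.16.16.x replication address lives on the bond:)

# /etc/modprobe.conf
alias bond0 bonding
options bond0 mode=0 miimon=100

# /etc/sysconfig/network-scripts/ifcfg-bond0
DEVICE=bond0
IPADDR=10.16.16.84
NETMASK=255.255.255.0
BOOTPROTO=none
ONBOOT=yes

# /etc/sysconfig/network-scripts/ifcfg-eth0 (repeated for all four slave NICs)
DEVICE=eth0
MASTER=bond0
SLAVE=yes
BOOTPROTO=none
ONBOOT=yes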
That is very, very disappointing. When I run drbdadm disconnect all and
try the same tests, they are 3-4 times faster. Both nasnodes are
exactly the same.
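(The disconnected test is done on the primary roughly like this:)

drbdadm disconnect all   # peers drop the connection, writes are no longer replicated
# ... rerun the same bonnie++ test in the domU ...
drbdadm connect all      # reconnect; changed blocks are resynced from the quick-sync bitmap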
Here are the bonnie++ 1.03 results from the CentOS domU, with Protocol C:
Version 1.03       ------Sequential Output------ --Sequential Input- --Random-
                    -Per Chr- --Block-- -Rewrite- -Per Chr- --Block-- --Seeks--
Machine        Size K/sec %CP K/sec %CP K/sec %CP K/sec %CP K/sec %CP  /sec %CP
www              4G  9785  15 10497   3  4864   0 13761   8 19501   0 140.1   0
                    ------Sequential Create------ --------Random Create--------
                    -Create-- --Read--- -Delete-- -Create-- --Read--- -Delete--
files:max:min        /sec %CP  /sec %CP  /sec %CP  /sec %CP  /sec %CP  /sec %CP
200:50:10/30         5656  58 73790 100   919   2  6894  98 71268  99   622   1
www,4G,9785,15,10497,3,4864,0,13761,8,19501,0,140.1,0,200:50:10/30,5656,58,73790,100,919,2,6894,98,71268,99,622,1
real 48m58.698s
user 1m18.661s
sys 1m33.550s
And here is the same test with the secondary disconnected:
Version 1.03       ------Sequential Output------ --Sequential Input- --Random-
                    -Per Chr- --Block-- -Rewrite- -Per Chr- --Block-- --Seeks--
Machine        Size K/sec %CP K/sec %CP K/sec %CP K/sec %CP K/sec %CP  /sec %CP
www              4G 22342  34 20496   6  9473   1 13272   7 19206   0 125.0   0
                    ------Sequential Create------ --------Random Create--------
                    -Create-- --Read--- -Delete-- -Create-- --Read--- -Delete--
files:max:min        /sec %CP  /sec %CP  /sec %CP  /sec %CP  /sec %CP  /sec %CP
200:50:10/30         7732  79 79734  98  8864  19  9447  91 67291  97  3148   8
www,4G,22342,34,20496,6,9473,1,13272,7,19206,0,125.0,0,200:50:10/30,7732,79,79734,98,8864,19,9447,91,67291,97,3148,8
real 26m45.466s
user 1m15.281s
sys 1m28.466s
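(Both runs were timed with an invocation along these lines; the target
directory and user are examples, the size and file-count parameters
are read back from the output above:)

time bonnie++ -d /mnt/test -s 4g -n 200:50:10:30 -u root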
Now, I'm really puzzled... This is abnormally bad.
I'm on CentOS 5.4, with drbd:
version: 8.3.2 (api:88/proto:86-90)
GIT-hash: dd7985327f146f33b86d4bff5ca8c94234ce840e build by mockbuild@v20z-x86-64.home.local, 2009-08-29 14:07:55
I've tried very large al-extents values, with no improvement. Protocol A
and Protocol B don't perform any better than Protocol C.
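(A change like that gets applied roughly like this; the value is only
an example:)

# drbd.conf, common/syncer section:
#     al-extents 3833;
drbdadm adjust all   # push the changed activity-log size to the running resources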
Here is my drbd.conf:
global {
    usage-count yes;
}

common {
    protocol A;

    syncer {
        rate 100M;
        verify-alg md5;
        al-extents 2377;
    }

    startup {
        wfc-timeout 0;
        degr-wfc-timeout 120;
    }

    disk {
        on-io-error detach;
        # call the outdate-peer handler if primary and we lose the
        # connection to the secondary
        #fencing resource-only;
    }

    net {
        after-sb-0pri discard-zero-changes;
        after-sb-1pri consensus;
        after-sb-2pri violently-as0p;
        rr-conflict disconnect;
    }

    handlers {
        #pri-lost-after-sb
        #outdate-peer "/usr/lib/drbd/outdate-peer.sh";
    }
}
#
# RESOURCES
#

# resource for right raid controller (n[01]c0)
resource controller0a {
    device    /dev/drbd0;
    disk      /dev/sdb;
    meta-disk internal;
    on nasnode1.company.lan {
        address 10.16.16.85:7790;
    }
    on nasnode0.company.lan {
        address 10.16.16.84:7790;
    }
}

# resource for left raid controller (n[01]c1), first raid10 over 4 disks
resource controller1a {
    device    /dev/drbd1;
    disk      /dev/sdc;
    meta-disk internal;
    on nasnode1.company.lan {
        address 10.16.16.85:7791;
    }
    on nasnode0.company.lan {
        address 10.16.16.84:7791;
    }
}

# resource for left raid controller (n[01]c1), second raid10 over 4 disks
resource controller1b {
    device    /dev/drbd2;
    disk      /dev/sdd;
    meta-disk internal;
    on nasnode1.company.lan {
        address 10.16.16.85:7792;
    }
    on nasnode0.company.lan {
        address 10.16.16.84:7792;
    }
}

# resource for systemdisks, RAID1
resource systemraid {
    device    /dev/drbd3;
    disk      /dev/sda4;
    meta-disk internal;
    on nasnode1.company.lan {
        address 10.16.16.85:7793;
    }
    on nasnode0.company.lan {
        address 10.16.16.84:7793;
    }
}
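(For reference, the running state on both nodes can be checked with:)

drbdadm dump all   # the configuration exactly as DRBD parses it
cat /proc/drbd     # per-resource connection state, roles and replication protocol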
Where am I making a mistake? :-/
--
| Jakov Sosic | ICQ: 28410271 | PGP: 0x965CAE2D |