Note: "permalinks" may not be as permanent as we would like,
direct links of old sources may well be a few messages off.
Hi! I have a very strange performance issue with DRBD. I have two DRBD nodes - nasnode0 and nasnode1. They run two RAID 10 arrays of 4 SATA disks each, and one RAID 10 of 6 SATA disks. All volumes are exported via iSCSI (ietd) to client servers, and the client servers run Xen VMs.

My problem is that when I run bonnie++ inside a Xen domU (virtual machine), I can see that the primary DRBD node is only around 10% I/O loaded, while the secondary is at 100%. That is slowing things down. I tried switching the primary, and the same issue occurred. I tried protocol C, protocol B and protocol A. The network is not the problem, because I have 4 Intel gigabit NICs in bonding mode 0 (round-robin), and I can see that the nasnodes are utilizing all the interfaces, but the load on them is only around 1-5%.

That is very, very disappointing. When I run drbdadm disconnect all and repeat the same tests, they are 3-4 times faster. Both nasnodes are exactly the same hardware.

Here is bonnie++ 1.03 from the CentOS domU, with protocol C:

Version  1.03       ------Sequential Output------ --Sequential Input- --Random-
                    -Per Chr- --Block-- -Rewrite- -Per Chr- --Block-- --Seeks--
Machine        Size K/sec %CP K/sec %CP K/sec %CP K/sec %CP K/sec %CP  /sec %CP
www              4G  9785  15 10497   3  4864   0 13761   8 19501   0 140.1   0
                    ------Sequential Create------ --------Random Create--------
                    -Create-- --Read--- -Delete-- -Create-- --Read--- -Delete--
files:max:min        /sec %CP  /sec %CP  /sec %CP  /sec %CP  /sec %CP  /sec %CP
     200:50:10/30   5656  58 73790 100   919   2  6894  98 71268  99   622   1
www,4G,9785,15,10497,3,4864,0,13761,8,19501,0,140.1,0,200:50:10/30,5656,58,73790,100,919,2,6894,98,71268,99,622,1

real    48m58.698s
user    1m18.661s
sys     1m33.550s

And here is the same test with the secondary disconnected:

Version  1.03       ------Sequential Output------ --Sequential Input- --Random-
                    -Per Chr- --Block-- -Rewrite- -Per Chr- --Block-- --Seeks--
Machine        Size K/sec %CP K/sec %CP K/sec %CP K/sec %CP K/sec %CP  /sec %CP
www              4G 22342  34 20496   6  9473   1 13272   7 19206   0 125.0   0
                    ------Sequential Create------ --------Random Create--------
                    -Create-- --Read--- -Delete-- -Create-- --Read--- -Delete--
files:max:min        /sec %CP  /sec %CP  /sec %CP  /sec %CP  /sec %CP  /sec %CP
     200:50:10/30   7732  79 79734  98  8864  19  9447  91 67291  97  3148   8
www,4G,22342,34,20496,6,9473,1,13272,7,19206,0,125.0,0,200:50:10/30,7732,79,79734,98,8864,19,9447,91,67291,97,3148,8

real    26m45.466s
user    1m15.281s
sys     1m28.466s

Now I'm really puzzled... This is abnormally bad. I'm on CentOS 5.4, with DRBD:

version: 8.3.2 (api:88/proto:86-90)
GIT-hash: dd7985327f146f33b86d4bff5ca8c94234ce840e build by mockbuild at
v20z-x86-64.home.local, 2009-08-29 14:07:55

I've tried with very large al-extents, and no progress. Protocol A and protocol B don't perform any better than protocol C.
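For reference, the comparison above was done roughly like this. This is only a rough sketch: the mount point inside the domU and the exact bonnie++ flags (reconstructed from the 4G size and the 200:50:10/30 report line) are approximations, not the literal commands.

# inside the CentOS domU -- replicated run (both DRBD nodes connected):
time bonnie++ -d /mnt/test -s 4096 -n 200:50:10:30 -u nobody

# on the current primary nasnode -- take the secondary out of the picture:
drbdadm disconnect all

# inside the domU again -- standalone run:
time bonnie++ -d /mnt/test -s 4096 -n 200:50:10:30 -u nobody

# on the primary afterwards -- reconnect so the secondary resyncs:
drbdadm connect all

# meanwhile, on both nasnodes, the per-disk I/O load was compared with
# something like:
iostat -x 5

The real/user/sys lines in the results above are the output of the time wrapper around the two bonnie++ runs.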
Here is my drbd.conf:

global {
    usage-count yes;
}

common {
    protocol A;

    syncer {
        rate 100M;
        verify-alg md5;
        al-extents 2377;
    }

    startup {
        wfc-timeout 0;
        degr-wfc-timeout 120;
    }

    disk {
        on-io-error detach;
        # call the outdate-peer handler if primary and we lose the
        # connection to the secondary
        #fencing resource-only;
    }

    net {
        after-sb-0pri discard-zero-changes;
        after-sb-1pri consensus;
        after-sb-2pri violently-as0p;
        rr-conflict disconnect;
    }

    handlers {
        #pri-lost-after-sb
        #outdate-peer "/usr/lib/drbd/outdate-peer.sh";
    }
}

#
# RESOURCES
#

# resource for right raid controller (n[01]c0)
resource controller0a {
    device    /dev/drbd0;
    disk      /dev/sdb;
    meta-disk internal;

    on nasnode1.company.lan {
        address 10.16.16.85:7790;
    }
    on nasnode0.company.lan {
        address 10.16.16.84:7790;
    }
}

# resource for left raid controller (n[01]c1), first raid10 over 4 disks
resource controller1a {
    device    /dev/drbd1;
    disk      /dev/sdc;
    meta-disk internal;

    on nasnode1.company.lan {
        address 10.16.16.85:7791;
    }
    on nasnode0.company.lan {
        address 10.16.16.84:7791;
    }
}

# resource for left raid controller (n[01]c1), second raid10 over 4 disks
resource controller1b {
    device    /dev/drbd2;
    disk      /dev/sdd;
    meta-disk internal;

    on nasnode1.company.lan {
        address 10.16.16.85:7792;
    }
    on nasnode0.company.lan {
        address 10.16.16.84:7792;
    }
}

# resource for system disks, RAID1
resource systemraid {
    device    /dev/drbd3;
    disk      /dev/sda4;
    meta-disk internal;

    on nasnode1.company.lan {
        address 10.16.16.85:7793;
    }
    on nasnode0.company.lan {
        address 10.16.16.84:7793;
    }
}

Where am I making a mistake? :-/

--
| Jakov Sosic | ICQ: 28410271 | PGP: 0x965CAE2D |
=================================================================