Note: "permalinks" may not be as permanent as we would like,
direct links of old sources may well be a few messages off.
>> On Thursday 02 September 2004 21:36, Christian Garling wrote:
>>> Hello people,
>>>
>>> I have a problem with synchronisation. I have two servers with Escalade
>>> 7506-4P RAID controllers and three Maxtor Barracuda 160 GB hard disks.
>>> They are connected through two Intel EEPro 1000 gigabit ethernet cards
>>> (bonding mode 0). When I comment out the rate setting in drbd.conf, the
>>> initial sync runs at the default value of 250 KB/s, but when I use the
>>> rate setting it only runs at about 12 KB/s. I tested the connection with
>>> the iptraf monitor and everything seems to be OK. Here is my current
>>> configuration; it is very basic at the moment.
>>>
>>> resource r0 {
>>>   protocol C;
>>>   incon-degr-cmd "halt -f";
>>>
>>>   startup {
>>>     wfc-timeout 0;
>>>     degr-wfc-timeout 120;
>>>   }
>>>
>>>   disk {
>>>     on-io-error panic;
>>>   }
>>>
>>>   net {
>>>     timeout 60;        # 6 seconds (unit = 0.1 seconds)
>>>     connect-int 10;    # 10 seconds (unit = 1 second)
>>>     ping-int 10;       # 10 seconds (unit = 1 second)
>>>     ko-count 5;
>>>     on-disconnect reconnect;
>>>   }
>>>
>>>   syncer {
>>>     rate 10M;
>>>   }
>>>
>>>   on node01 {
>>>     device    /dev/drbd0;
>>>     disk      /dev/sda1;
>>>     address   10.0.0.10:7788;
>>>     meta-disk internal;
>>>   }
>>>
>>>   on node02 {
>>>     device    /dev/drbd0;
>>>     disk      /dev/sda1;
>>>     address   10.0.0.20:7788;
>>>     meta-disk internal;
>>>   }
>>> }
>>>
>>
>> I would really enjoy seeing such a cluster in real life (I mean one
>> that does only 12 KB/s).
>>
>> I installed a csync2/DRBD/heartbeat cluster yesterday. The two boxes
>> had 3Ware Escalade 9xxxx controllers.
>>
>> We did some primary crash simulation tests and saw 20 MB/s resync
>> simultaneously on two DRBD resources resyncing in parallel.
>> (rate was set to 20M; probably these machines would do even more.)
>>
>> Here are the usual questions:
>> Have you tested the bandwidth of your network link? How? What numbers
>> do you get?
>> Have you tested the bandwidth of your disks? How? What numbers
>> do you get?
>> Which kernel? Which DRBD release? Hardware?
>> Have you tested without bonding?
>> Are you using jumbo frames? MTU?
>>
>> -Philipp
>> --
>> : Dipl-Ing Philipp Reisner                    Tel +43-1-8178292-50 :
>> : LINBIT Information Technologies GmbH        Fax +43-1-8178292-82 :
>> : Schönbrunnerstr 244, 1120 Vienna, Austria   http://www.linbit.com :
>> _______________________________________________
>> drbd-user mailing list
>> drbd-user at lists.linbit.com
>> http://lists.linbit.com/mailman/listinfo/drbd-user
>
> Hello,
>
> Hardware: Tyan Thunder i7501 Pro, 512 MB RAM, Intel Xeon 2400, 3Ware
> Escalade 7506-4P RAID controller, 3x Seagate Barracuda 160 GB (RAID 5),
> 2x Intel EtherExpress 1000 Mbit (no jumbo frames, MTU at its default value)
>
> Software: Debian GNU/Linux Woody, kernel 2.4.26, DRBD 0.7.3
>
> Network link tested with iptraf. I sent a 190 MB package over the two
> gigabit cards and measured a speed of 12 MB/s, so I think the network
> cards are alright. I tried bonding modes 0 and 1. I can't test without
> bonding because the RAID on the second node is rebuilding at the moment.
>
> I haven't tested the hard disks.
>
> Greetings,
>
> Christian Garling
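Philipp's question about raw disk bandwidth is the one Christian has not
answered yet, and it matters: the resync can go no faster than the slower of
the two arrays. Below is a minimal, hypothetical C sketch of a
sequential-write measurement; the scratch file name and the 512 MiB test
size are arbitrary assumptions rather than anything from this thread, and dd
or bonnie would serve just as well.

/*
 * seqwrite.c - hypothetical sketch of a raw sequential-write test for the
 * array that backs the DRBD device.  The file name and the 512 MiB test
 * size are assumptions made for illustration.
 */
#include <stdio.h>
#include <string.h>
#include <fcntl.h>
#include <unistd.h>
#include <sys/time.h>

#define CHUNK (1 << 20)        /* write in 1 MiB pieces */
#define TOTAL (512L << 20)     /* 512 MiB in total      */

int main(void)
{
    static char buf[CHUNK];
    struct timeval t0, t1;
    long written = 0;
    double secs;
    int fd;

    memset(buf, 0xAB, sizeof(buf));

    fd = open("seqwrite.tmp", O_WRONLY | O_CREAT | O_TRUNC, 0600);
    if (fd < 0) { perror("open"); return 1; }

    gettimeofday(&t0, NULL);
    while (written < TOTAL) {
        ssize_t n = write(fd, buf, sizeof(buf));
        if (n < 0) { perror("write"); return 1; }
        written += n;
    }
    fsync(fd);                 /* flush the cache before stopping the clock */
    gettimeofday(&t1, NULL);

    close(fd);
    unlink("seqwrite.tmp");

    secs = (t1.tv_sec - t0.tv_sec) + (t1.tv_usec - t0.tv_usec) / 1e6;
    printf("%.1f MB/s sequential write\n", written / secs / 1e6);
    return 0;
}

Built with a plain "gcc -O2 seqwrite.c -o seqwrite" and run on the partition
that will back the DRBD device, the number it prints is a rough upper bound
on any syncer rate setting that can actually be sustained.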
I'm seeing similar problems (though not as slow as 12 KB/s) in an almost
identical setup: two almost identical machines with Intel D865GBF boards,
512 MB RAM, P4 2400, a 3Ware 7506-4P RAID controller, 4x 160 GB Samsung (or
WD) hard disks, Intel 1000 Mbit ethernet onboard (eth0), and an SMC 9452TX
as second ethernet controller (eth1), attached to a 100 Mbit hub to simulate
the LAN, while the eth0 interfaces are connected with a cross-over cable
(eth0 is reserved for DRBD traffic). The system is SUSE 9.1 with SUSE kernel
2.6.5-7 and DRBD 0.7.2 (0.7.3 failed to compile, BTW). The config file is
almost the same as the one provided by Christian. I'm running a 320 GB
RAID-5 on three disks (with one hotspare).

Network throughput is 115000 KByte/s on eth0 and 11500 KByte/s on eth1
(measured with netio using UDP packets). The sync rate I get is 5056 KByte/s
on the eth0 interfaces, which is only 5% of what I had expected after
"limiting" the sync rate to 100M.

I suspect the problem is related to the 3Ware 7506-4P controllers and the
RAID-5 setup. The following values were recorded for a normal (not DRBD)
partition on the RAID array:

Bonnie 1.4: File './Bonnie.5465', size: 536870912, volumes: 1
Writing with putc()...    done:  20664 kB/s  67.6 %CPU
Rewriting...              done:   8425 kB/s   2.7 %CPU
Writing intelligently...  done:  61052 kB/s  13.4 %CPU
Reading with getc()...    done:  17078 kB/s  46.7 %CPU
Reading intelligently...  done:  44779 kB/s   6.9 %CPU

The throughput is extremely low on "rewrite". Here, each chunk (16384 bytes)
of the file is read with read(2), dirtied, and rewritten with write(2),
which requires an lseek(2) in between. I'm just guessing, but I think the
read-lseek-write cycle (in combination with the overhead of the RAID-5
parity computation for 64 KB blocks) is to blame for these values. I do not
know whether the operations during the initial sync are comparable to this
scenario or not, but it seems to me that this behaviour (together with the
internal meta-disk) could be the cause of the low sync rates.

It is well known that the bonnie benchmark is a bit simplistic and tries to
identify disk bottlenecks rather than simulate the actual usage pattern of a
production system. However, I think it is possible that the initial DRBD
sync is one of the few cases where the I/O performance in a real-world
situation is comparable to the values reported by a benchmark program.

Has anybody seen better results with 3Ware 7506 RAID controllers? I would
appreciate any comments and suggestions.

Regards,

Heinz-Detlev

--
Heinz-Detlev Koch, E-Mail: koch at epc.de
EPC, Breslauer Str. 33, 68775 Ketsch, Germany
Tel.: +49 6202 690685, Fax: +49 6202 690686
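For readers who have not looked at what bonnie does during its "Rewriting"
phase, the hypothetical C sketch below illustrates the read-lseek-write
cycle Heinz-Detlev describes. The scratch file name is invented for the
example, and bonnie's real code does more bookkeeping; the point is the I/O
pattern of 16384-byte chunks rewritten in place, which is exactly the kind
of partial-stripe write that tends to be expensive on a RAID-5 array with
64 KB blocks.

/*
 * rewrite.c - minimal sketch of the read-lseek-write cycle in bonnie's
 * "Rewriting" phase: read a 16384-byte chunk, dirty it, seek back, and
 * write it in place.  "Bonnie.scratch" is an invented file name and is
 * assumed to exist already.
 */
#include <stdio.h>
#include <fcntl.h>
#include <unistd.h>

#define CHUNK 16384

int main(void)
{
    char buf[CHUNK];
    ssize_t n;
    int fd = open("Bonnie.scratch", O_RDWR);

    if (fd < 0) { perror("open"); return 1; }

    while ((n = read(fd, buf, CHUNK)) > 0) {
        buf[0] ^= 1;                        /* dirty the chunk              */
        if (lseek(fd, -n, SEEK_CUR) < 0) {  /* seek back over what was read */
            perror("lseek");
            return 1;
        }
        /* Rewrite the chunk in place.  On a RAID-5 array a partial-stripe
         * write like this typically costs an extra parity read-modify-write,
         * which is presumably why the rewrite figure above is so low. */
        if (write(fd, buf, n) != n) {
            perror("write");
            return 1;
        }
    }
    close(fd);
    return 0;
}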