Note: "permalinks" may not be as permanent as we would like,
direct links of old sources may well be a few messages off.
>> On Thursday 02 September 2004 21:36, Christian Garling wrote:
>>> Hello people,
>>>
>>> I have a problem with synchronisation. I have two servers with Escalade
>>> 7506-4P RAID controllers and three Maxtor Barracuda 160 GB hard disks.
>>> They are connected through two Intel EEPro 1000 gigabit ethernet cards
>>> (bonding mode 0). When I comment out the rate setting in drbd.conf, the
>>> initial sync runs at the default value of 250 KB/s, but when I use the
>>> rate setting it only runs at about 12 KB/s. I tested the connection with
>>> the iptraf monitor and everything seems to be OK. Here is my current
>>> configuration; it is very basic at the moment.
>>>
>>> resource r0 {
>>>   protocol C;
>>>   incon-degr-cmd "halt -f";
>>>
>>>   startup {
>>>     wfc-timeout 0;
>>>     degr-wfc-timeout 120;
>>>   }
>>>
>>>   disk {
>>>     on-io-error panic;
>>>   }
>>>
>>>   net {
>>>     timeout 60;        # 6 seconds (unit = 0.1 seconds)
>>>     connect-int 10;    # 10 seconds (unit = 1 second)
>>>     ping-int 10;       # 10 seconds (unit = 1 second)
>>>     ko-count 5;
>>>     on-disconnect reconnect;
>>>   }
>>>
>>>   syncer {
>>>     rate 10M;
>>>   }
>>>
>>>   on node01 {
>>>     device    /dev/drbd0;
>>>     disk      /dev/sda1;
>>>     address   10.0.0.10:7788;
>>>     meta-disk internal;
>>>   }
>>>
>>>   on node02 {
>>>     device    /dev/drbd0;
>>>     disk      /dev/sda1;
>>>     address   10.0.0.20:7788;
>>>     meta-disk internal;
>>>   }
>>> }
>>>
>>
>> I would really enjoy seeing such a cluster in real life (I mean one
>> that does only 12 KB/s).
>>
>> I installed a csync2/DRBD/heartbeat cluster yesterday. The two boxes
>> had 3Ware Escalade 9xxxx controllers.
>>
>> We did some primary crash simulation tests and saw 20 MB/s resync
>> simultaneously on two DRBD resources resyncing in parallel.
>> (rate was set to 20M; probably these machines would do even more.)
>>
>> Here are the usual questions:
>> Have you tested the bandwidth of your network link? How? What numbers
>> do you get?
>> Have you tested the bandwidth of your disks? How? What numbers
>> do you get?
>> Which kernel? Which DRBD release? Hardware?
>> Have you tested without bonding?
>> Are you using jumbo frames? MTU?
>>
>> -Philipp
>> --
>> : Dipl-Ing Philipp Reisner                    Tel +43-1-8178292-50 :
>> : LINBIT Information Technologies GmbH        Fax +43-1-8178292-82 :
>> : Schönbrunnerstr 244, 1120 Vienna, Austria   http://www.linbit.com :
>> _______________________________________________
>> drbd-user mailing list
>> drbd-user at lists.linbit.com
>> http://lists.linbit.com/mailman/listinfo/drbd-user
>
> Hello,
>
> Hardware: Tyan Thunder i7501 Pro, 512 MB RAM, Intel Xeon 2400, 3Ware
> Escalade 7506-4P RAID controller, 3x Seagate Barracuda 160 GB (RAID 5),
> 2x Intel EtherExpress 1000 Mbit (no jumbo frames, MTU at its default value)
>
> Software: Debian GNU/Linux Woody, kernel 2.4.26, DRBD 0.7.3
>
> Network link tested with iptraf. I sent a 190 MB package over the two
> gigabit cards and measured a speed of 12 MB/s, so I think the network
> cards are alright. I tried bonding modes 0 and 1. I can't test without
> bonding because the RAID on the second node is rebuilding at the moment.
>
> I haven't tested the hard disks.
>
> Greetings,
>
> Christian Garling
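Philipp's question about raw disk bandwidth is the one Christian has not
answered yet, and it matters: the resync can go no faster than the slower of
the two arrays. Below is a minimal, hypothetical C sketch of a
sequential-write measurement; the scratch file name and the 512 MiB test
size are arbitrary assumptions rather than anything from this thread, and dd
or bonnie would serve just as well.

/*
 * seqwrite.c - hypothetical sketch of a raw sequential-write test for the
 * array that backs the DRBD device.  The file name and the 512 MiB test
 * size are assumptions made for illustration.
 */
#include <stdio.h>
#include <string.h>
#include <fcntl.h>
#include <unistd.h>
#include <sys/time.h>

#define CHUNK (1 << 20)        /* write in 1 MiB pieces */
#define TOTAL (512L << 20)     /* 512 MiB in total      */

int main(void)
{
    static char buf[CHUNK];
    struct timeval t0, t1;
    long written = 0;
    double secs;
    int fd;

    memset(buf, 0xAB, sizeof(buf));

    fd = open("seqwrite.tmp", O_WRONLY | O_CREAT | O_TRUNC, 0600);
    if (fd < 0) { perror("open"); return 1; }

    gettimeofday(&t0, NULL);
    while (written < TOTAL) {
        ssize_t n = write(fd, buf, sizeof(buf));
        if (n < 0) { perror("write"); return 1; }
        written += n;
    }
    fsync(fd);                 /* flush the cache before stopping the clock */
    gettimeofday(&t1, NULL);

    close(fd);
    unlink("seqwrite.tmp");

    secs = (t1.tv_sec - t0.tv_sec) + (t1.tv_usec - t0.tv_usec) / 1e6;
    printf("%.1f MB/s sequential write\n", written / secs / 1e6);
    return 0;
}

Built with a plain "gcc -O2 seqwrite.c -o seqwrite" and run on the partition
that will back the DRBD device, the number it prints is a rough upper bound
on any syncer rate setting that can actually be sustained.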
I'm seeing similar problems (though not as slow as 12 KB/s) in an almost
identical setup: two almost identical machines with Intel D865GBF boards,
512 MB RAM, P4 2400, a 3Ware 7506-4P RAID controller, 4x 160 GB Samsung (or
WD) hard disks, Intel 1000 Mbit ethernet onboard (eth0), and an SMC 9452TX
as second ethernet controller (eth1), attached to a 100 Mbit hub to simulate
the LAN, while the eth0 interfaces are connected with a cross-over cable
(eth0 is reserved for DRBD traffic). The system is SUSE 9.1 with SUSE kernel
2.6.5-7 and DRBD 0.7.2 (0.7.3 failed to compile, BTW). The config file is
almost the same as the one provided by Christian. I'm running a 320 GB
RAID-5 on three disks (with one hotspare).

Network throughput is 115000 KByte/s on eth0 and 11500 KByte/s on eth1
(measured with netio using UDP packets). The sync rate I get is 5056 KByte/s
on the eth0 interfaces, which is only 5% of what I had expected after
"limiting" the sync rate to 100M.

I suspect the problem is related to the 3Ware 7506-4P controllers and the
RAID-5 setup. The following values were recorded for a normal (not DRBD)
partition on the RAID array:

Bonnie 1.4: File './Bonnie.5465', size: 536870912, volumes: 1
Writing with putc()...    done:  20664 kB/s  67.6 %CPU
Rewriting...              done:   8425 kB/s   2.7 %CPU
Writing intelligently...  done:  61052 kB/s  13.4 %CPU
Reading with getc()...    done:  17078 kB/s  46.7 %CPU
Reading intelligently...  done:  44779 kB/s   6.9 %CPU

The throughput is extremely low on "rewrite". Here, each chunk (16384 bytes)
of the file is read with read(2), dirtied, and rewritten with write(2),
which requires an lseek(2) in between. I'm just guessing, but I think the
read-lseek-write cycle (in combination with the overhead of the RAID-5
parity computation for 64 KB blocks) is to blame for these values. I do not
know whether the operations during the initial sync are comparable to this
scenario or not, but it seems to me that this behaviour (together with the
internal meta-disk) could be the cause of the low sync rates.

It is well known that the bonnie benchmark is a bit simplistic and tries to
identify disk bottlenecks rather than simulate the actual usage pattern of a
production system. However, I think it is possible that the initial DRBD
sync is one of the few cases where the I/O performance in a real-world
situation is comparable to the values reported by a benchmark program.

Has anybody seen better results with 3Ware 7506 RAID controllers? I would
appreciate any comments and suggestions.

Regards,

Heinz-Detlev

--
Heinz-Detlev Koch, E-Mail: koch at epc.de
EPC, Breslauer Str. 33, 68775 Ketsch, Germany
Tel.: +49 6202 690685, Fax: +49 6202 690686
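For readers who have not looked at what bonnie does during its "Rewriting"
phase, the hypothetical C sketch below illustrates the read-lseek-write
cycle Heinz-Detlev describes. The scratch file name is invented for the
example, and bonnie's real code does more bookkeeping; the point is the I/O
pattern of 16384-byte chunks rewritten in place, which is exactly the kind
of partial-stripe write that tends to be expensive on a RAID-5 array with
64 KB blocks.

/*
 * rewrite.c - minimal sketch of the read-lseek-write cycle in bonnie's
 * "Rewriting" phase: read a 16384-byte chunk, dirty it, seek back, and
 * write it in place.  "Bonnie.scratch" is an invented file name and is
 * assumed to exist already.
 */
#include <stdio.h>
#include <fcntl.h>
#include <unistd.h>

#define CHUNK 16384

int main(void)
{
    char buf[CHUNK];
    ssize_t n;
    int fd = open("Bonnie.scratch", O_RDWR);

    if (fd < 0) { perror("open"); return 1; }

    while ((n = read(fd, buf, CHUNK)) > 0) {
        buf[0] ^= 1;                        /* dirty the chunk              */
        if (lseek(fd, -n, SEEK_CUR) < 0) {  /* seek back over what was read */
            perror("lseek");
            return 1;
        }
        /* Rewrite the chunk in place.  On a RAID-5 array a partial-stripe
         * write like this typically costs an extra parity read-modify-write,
         * which is presumably why the rewrite figure above is so low. */
        if (write(fd, buf, n) != n) {
            perror("write");
            return 1;
        }
    }
    close(fd);
    return 0;
}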