[DRBD-user] DRBD 8.2.6 - reason for full resync?!

Andrei Neagoe anne at imc.nl
Fri Jun 27 09:14:25 CEST 2008


Hi,

No real improvement, just maybe a couple of MB/s. Also, what's strange 
is that the same test performed over and over again produces quite 
different results:

[root at leviathan ftp]# dd if=/dev/zero of=zero.dat bs=1G  count=1 oflag=dsync
1+0 records in
1+0 records out
1073741824 bytes (1.1 GB) copied, 39.0488 seconds, 27.5 MB/s
[root at leviathan ftp]# rm zero.dat
[root at leviathan ftp]# rm -f zero.dat
[root at leviathan ftp]# dd if=/dev/zero of=zero.dat bs=1G  count=1 oflag=dsync
1+0 records in
1+0 records out
1073741824 bytes (1.1 GB) copied, 37.3662 seconds, 28.7 MB/s
[root at leviathan ftp]# rm -f zero.dat
[root at leviathan ftp]# dd if=/dev/zero of=zero.dat bs=1G  count=1 oflag=dsync
1+0 records in
1+0 records out
1073741824 bytes (1.1 GB) copied, 19.4406 seconds, 55.2 MB/s
[root at leviathan ftp]# rm -f zero.dat
[root at leviathan ftp]# dd if=/dev/zero of=zero.dat bs=1G  count=1 oflag=dsync
1+0 records in
1+0 records out
1073741824 bytes (1.1 GB) copied, 20.3066 seconds, 52.9 MB/s


All of the above tests were run less than a minute apart.
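Much of that run-to-run variance is typically page-cache and controller-cache state. As a generic sketch (nothing from this thread; /tmp and the 64 MB size are arbitrary choices), flushing dirty pages and, where permitted, dropping the page cache between runs makes the numbers more comparable:

```shell
#!/bin/sh
# Run the same dd write test several times, flushing dirty pages between
# runs so that caching effects distort the per-run numbers less.
TESTFILE=/tmp/zero.dat            # assumption: /tmp has enough free space
for run in 1 2 3; do
    sync                          # flush dirty pages before timing
    # also drop the page cache, if we are allowed to (needs root, 2.6.16+)
    [ -w /proc/sys/vm/drop_caches ] && echo 3 > /proc/sys/vm/drop_caches
    dd if=/dev/zero of="$TESTFILE" bs=1M count=64 conv=fdatasync 2>&1 | tail -n 1
    rm -f "$TESTFILE"
done
```

Note that conv=fdatasync times a single fdatasync at the end of the copy, while oflag=dsync syncs every block; the two measure different things, so it is worth being consistent about which one is used across runs.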

Thanks,

Andrei.

Marcelo Azevedo wrote:
> Put:
>   no-disk-flushes;
>   no-md-flushes;
> under the disk { } section.
> Tell me if it makes a difference ...
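> A minimal sketch of what that might look like in drbd.conf (the resource
> name r0 and the surrounding layout are illustrative, not taken from this
> thread; skipping flushes is generally only safe when the controller has a
> battery-backed write cache):

```
resource r0 {
  disk {
    no-disk-flushes;   # do not force flushes to the backing device
    no-md-flushes;     # do not force flushes for meta-data updates
  }
  # ... other sections (net, syncer, on <host> { ... }) unchanged
}
```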
> On Tue, Jun 24, 2008 at 2:12 PM, Andrei Neagoe <anne at imc.nl> wrote:
>
>     Thanks a lot for the clarification. That was exactly the case...
>     from my understanding of the docs I thought it was just necessary
>     to run drbdadm adjust all on each node, regardless of the node
>     state (primary or secondary). Right now it's pretty clear how I
>     must proceed with the testing.
>     What still puzzles me is the fact that only one resource needed to
>     be fully resynchronized, because as I said, I'm running lvm2 over
>     them (having drbd0 and drbd1 as physical volumes).
>     Another thing is the speed, which at the moment is, let's say,
>     satisfactory, but I found a thread in the LINBIT archive where a
>     user with a very similar setup and testing scheme was getting
>     ~37 MB/s over a fiber link between 2 datacenters, and almost
>     80 MB/s when connected via crossover cable. You can view the
>     thread here:
>     http://archives.free.net.ph/message/20080523.225430.9ba8ceac.en.html
>     Testing both network and writing to the external storage box
>     directly reveals that these are not the limitations:
>
>         ------------------------------------------------------------
>         Client connecting to 10.0.0.20, TCP port 5001
>         TCP window size: 0.02 MByte (default)
>         ------------------------------------------------------------
>         [  3] local 10.0.0.10 port 39353 connected with 10.0.0.20 port 5001
>         [ ID] Interval       Transfer     Bandwidth
>         [  3]  0.0-10.0 sec  1125 MBytes    113 MBytes/sec
>         [ ID] Interval       Transfer     Bandwidth
>         [  3]  0.0-10.0 sec  1125 MBytes    112 MBytes/sec
>         -----------------------------------------------------------
>
>         [root at erebus testing]# dd if=/dev/zero of=test.dat bs=1G
>         count=1 oflag=dsync
>         1+0 records in
>         1+0 records out
>         1073741824 bytes (1.1 GB) copied, 10.321 seconds, 104 MB/s
>
>
>     Note that in the above test a different device is mounted in
>     /testing (just another logical drive on the storage box). As
>     additional information, the storage box is an IBM DS3200 connected
>     to the machine using 2 SAS HBAs (just for redundancy, no load
>     balancing).
>
>     So at the moment I'm also pretty stuck with performance tuning as
>     I don't know what else I could try.
>
>     Thanks,
>     Andrei Neagoe.
>
>
>     Lars Ellenberg wrote:
>>     On Tue, Jun 24, 2008 at 12:27:30PM +0200, Andrei Neagoe wrote:
>>       
>>>     Hi,
>>>
>>>     I was trying today to play with drbd's settings and benchmark the results in
>>>     order to obtain the best performance.
>>>     Here is my test setup:
>>>     2 identical machines with SAS storage boxes. Each machine has two 2TB devices
>>>     (in my case /dev/sdb and /dev/sdc) that I mirror over DRBD, with LVM set up
>>>     on top of them. The nodes share a gbit link dedicated to DRBD traffic.
>>>     After the initial sync, which took around 20 hours to finish, I created the
>>>     LVM volume and formatted it with the ext3 FS. Then I started to play around
>>>     with params like al-extents, unplug-watermark, max-buffers and max-epoch-size
>>>     by changing the values and doing a drbdadm adjust all on each node (of course
>>>     after copying the config file accordingly). In the beginning it went pretty
>>>     well; the maximum value attained by the dd test over DRBD was 28.9 MB/s:
>>>
>>>     [root at erebus testing]# dd if=/dev/zero of=test.dat bs=1G count=1 oflag=dsync
>>>     1+0 records in
>>>     1+0 records out
>>>     1073741824 bytes (1.1 GB) copied, 37.1114 seconds, 28.9 MB/s
>>>
>>>     The configuration used is described at the end. After a couple more tests, I
>>>     noticed a big impact on performance, getting around 19-20 MB/s, so I checked
>>>     /proc/drbd to see what was going on. Surprisingly, it was doing a full resync
>>>     on one of the disks. The problem is, I don't understand why, as normally it
>>>     should only resync discrepancies.
>>>         
>>     if you change anything in the config file that changes "disk"
>>     parameters (like on-io-error, size, fencing, use-bmbv, ...),
>>     which causes drbdadm adjust to think it needs to detach/attach, and you
>>     do that while being primary, you get a full sync.
>>
>>     this is unfortunate, and there should probably
>>     be a dialog to warn you about it.
>>
>>     if you detach a Primary, then reattach, it will receive a full sync.
>>     you need to make it secondary first, if you want to avoid that.
>>     detaching, then reattaching a secondary will only receive an
>>     "incremental" resync, which typically is a few KB or nothing at all,
>>     depending on the timing.
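
>>     As an illustration, that safer sequence might look like this
>>     (the resource name r0 is a placeholder, and any services using
>>     the device would have to be moved off the node first):

```
# demote first, so the later reattach only needs an incremental resync
drbdadm secondary r0
# apply the changed disk options (this may detach/attach internally)
drbdadm adjust r0
# promote again once the new options are active
drbdadm primary r0
```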
>>
>>     if this is not what happened for you, read the kernel log,
>>     typically drbd tells you why a resync was necessary.
>>
>>
>>     --
>>     : Lars Ellenberg                           http://www.linbit.com :
>>     : DRBD/HA support and consulting             sales at linbit.com :
>>     : LINBIT Information Technologies GmbH      Tel +43-1-8178292-0  :
>>     : Vivenotgasse 48, A-1120 Vienna/Europe     Fax +43-1-8178292-82 :
>>     __
>>     please don't Cc me, but send to list -- I'm subscribed
>>     _______________________________________________
>>     drbd-user mailing list
>>     drbd-user at lists.linbit.com
>>     http://lists.linbit.com/mailman/listinfo/drbd-user
>>       
>
