[DRBD-user] Re: Huge max latency

Weilin Gong wgong at alcatel-lucent.com
Tue Jan 9 06:39:37 CET 2007



Hi Lars,

Thanks so much for your help.

> * try again with a higher numruns, to get better statistics
>
> * try again with 0.7.22
> * try again with drbd 8 (well, rcX, or svn trunk)
> * try again with large "al-extents"
>   and then again in a second run...
>   
I have not been able to run 0.7.22 or 8 yet because of constraints on
our system. I did try a large "al-extents" value (2570) with numruns 5
(it was 3 before), which at least gave better max latency figures:

┌─────────────────┬─────┬─────┬────┬─────┬───────┬───────┬─────────┬───────┬───────┬─────┐
│Identifier       │File │Blk  │Num │Rate │Maximum│Avg    │Max      │Lat% > │Lat% > │CPU  │
│                 │Size │Size │Thr │     │CPU%   │Latency│Latency  │2s     │10s    │Eff  │
├─────────────────┼─────┼─────┼────┼─────┼───────┼───────┼─────────┼───────┼───────┼─────┤
│Raw-device       │5000 │4096 │4   │32.19│28.27% │0.347  │5956.10  │0.00244│0.00000│114  │
├─────────────────┼─────┼─────┼────┼─────┼───────┼───────┼─────────┼───────┼───────┼─────┤
│Drbd-disconnected│5000 │4096 │4   │31.52│29.85% │0.365  │5054.72  │0.00080│0.00000│106  │
├─────────────────┼─────┼─────┼────┼─────┼───────┼───────┼─────────┼───────┼───────┼─────┤
│Drbd-connected   │5000 │4096 │4   │28.28│40.48% │0.394  │16223.82 │0.00122│0.00003│70   │
└─────────────────┴─────┴─────┴────┴─────┴───────┴───────┴─────────┴───────┴───────┴─────┘
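
As a sanity check on the two tables in this thread, the sketch below
recomputes the "CPU Eff" column and the change in max latency between the
earlier run and this one. The figures are copied from the tables; the
variable and function names are only illustrative, and dividing by CPU% as
a fraction is an assumption chosen because it reproduces the printed column.

```python
# Figures copied from the two tiobench tables in this thread.
# Tuple layout: (rate in MB/s, max CPU%, max latency in ms).
earlier = {
    "Raw-device":        (31.24, 26.80, 5635.06),
    "Drbd-disconnected": (33.18, 26.29, 110394.58),
    "Drbd-connected":    (20.88, 50.18, 135712.09),
}
al_extents_2570 = {
    "Raw-device":        (32.19, 28.27, 5956.10),
    "Drbd-disconnected": (31.52, 29.85, 5054.72),
    "Drbd-connected":    (28.28, 40.48, 16223.82),
}


def cpu_eff(rate_mb_s, cpu_percent):
    """Rate divided by CPU load, with CPU% taken as a fraction (assumption)."""
    return rate_mb_s / (cpu_percent / 100.0)


for name in earlier:
    _, _, old_max = earlier[name]
    rate, cpu, new_max = al_extents_2570[name]
    print(f"{name:18s} cpu_eff={cpu_eff(rate, cpu):6.1f}  "
          f"max latency {old_max:10.2f} ms -> {new_max:9.2f} ms  "
          f"(old/new ratio {old_max / new_max:.1f})")
```

Rounded to whole numbers, the computed efficiency values match the CPU Eff
column, and the old/new ratio shows how much the max latency changed for
each identifier.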


According to the comments in drbd.conf, increasing "al-extents" leads to
a longer resync time if the primary crashes. Here are my questions:

1) Is this the only side effect of a large "al-extents" value?
2) Does the activity log work like ext3 file system journaling, where
each transaction is written to the log until it is full? Could you
provide some background information on this?
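
To put the resync trade-off in numbers, here is a rough sketch. It assumes,
as the drbd.conf documentation states, that each activity-log extent covers
4 MiB of the backing device, and that after a primary crash at most the
area covered by the activity log has to be resynced. The value 127 is
assumed as the small default for comparison, the 5.7GB partition is read as
GiB, and the helper names are made up for illustration.

```python
AL_EXTENT_MIB = 4  # assumption: one activity-log extent covers 4 MiB


def hot_area_mib(al_extents, device_mib):
    """Upper bound on the area covered by the activity log, capped at device size."""
    return min(al_extents * AL_EXTENT_MIB, device_mib)


def resync_seconds(area_mib, rate_mib_s):
    """Best-case time to resync that area at the configured sync rate."""
    return area_mib / rate_mib_s


device_mib = 5.7 * 1024  # the 5.7GB test partition, read as GiB (assumption)
sync_rate = 100          # "rate=100M" from the lab settings, taken as MiB/s

for extents in (127, 2570):  # 127: assumed default, 2570: value tried above
    area = hot_area_mib(extents, device_mib)
    print(f"al-extents={extents:4d}: up to {area:7.1f} MiB to resync, "
          f"at least {resync_seconds(area, sync_rate):5.1f} s at {sync_rate} MiB/s")
```

Under these assumptions 2570 extents would cover more than the whole test
partition, so the worst case after a primary crash approaches a full
resync. The real time also depends on how fast the disk can go, so this is
only a lower bound.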

> * what is the real storage device?  sdX ? mdX ? lvm something?
> * what is the meta data storage device?

Our storage device is the SCSI disk sda. LVM is being considered. The
meta data resides on the same storage device (internal).

One more thing: when running "tiobench" with DRBD in connected mode, I
noticed that throughput increases by ~4-6% if the "jnettop" tool is
running to monitor the interface carrying the traffic. Do you know why
that happens?

Thanks again!

Weilin





Lars Ellenberg wrote:
> / 2007-01-05 15:48:15 -0600
> \ Weilin Gong:
>   
>> Forgot to mention, DRBD version is 0.7.11.
>>     
>
>   
>> Hi,
>>
>> I ran some drbd performance tests with 'tiobench':
>>
>> tiobench.pl --identifier drbd-connected --size 5000 --numruns 1 --dir /secroot --block 4096 --threads 4
>>
>> noticed the huge max latency numbers with Sequential Writes:
>>     
>
> let me help you with the formatting...
>
>   
>> Unit information
>> ================
>> File size = megabytes
>> Blk Size  = bytes
>> Rate      = megabytes per second
>> CPU%      = percentage of CPU used during the test
>> Latency   = milliseconds
>> Lat%      = percent of requests that took longer than X seconds
>> CPU Eff   = Rate divided by CPU% - throughput per cpu load
>>
>> ┌─────────────────┬─────┬─────┬────┬─────┬───────┬───────┬─────────┬───────┬───────┬─────┐
>> │Identifier       │File │Blk  │Num │Rate │Maximum│Avg    │Max      │Lat% > │Lat% > │CPU  │
>> │                 │Size │Size │Thr │     │CPU%   │Latency│Latency  │2s     │10s    │Eff  │
>> ├─────────────────┼─────┼─────┼────┼─────┼───────┼───────┼─────────┼───────┼───────┼─────┤
>> │Raw-device       │5000 │4096 │4   │31.24│26.80% │0.364  │5635.06  │0.00242│0.00000│117  │
>> ├─────────────────┼─────┼─────┼────┼─────┼───────┼───────┼─────────┼───────┼───────┼─────┤
>> │Drbd-disconnected│5000 │4096 │4   │33.18│26.29% │0.319  │110394.58│0.00190│0.00016│126  │
>> ├─────────────────┼─────┼─────┼────┼─────┼───────┼───────┼─────────┼───────┼───────┼─────┤
>> │Drbd-connected   │5000 │4096 │4   │20.88│50.18% │0.457  │135712.09│0.00110│0.00008│42   │
>> └─────────────────┴─────┴─────┴────┴─────┴───────┴───────┴─────────┴───────┴───────┴─────┘
>>
>> My lab setting:
>>   ● Two Linux nodes on 2.6.10_mvlcge401-pc_target-x86_pentium3-P3SMP.
>>   ● Dual-core Intel Xeon 2GHz CPU, 4GB Memory, 1Gbits/s network interface.
>>   ● 5.7GB partition on SCSI disk, ext2 file system with the "-T largefile4"
>>     option.
>>   ● DRBD rate=100M;  protocol C; sndbuf-size 1024k.
>> I don't believe the disk (~50MB/s) or the network is the limiting factor
>> here. You can see the big jump in the numbers even in DRBD disconnected mode.
>>     
>
> of course, drbd housekeeping adds latency.
> _especially_ in disconnected mode (activity-log and dirty-bitmap updates).
>
> but I also see the _average_ latency _drop_ and the rate increase
> for drbd-disconnected...
>
> so, how do you like me to explain that?
>
> the much higher average latency for drbd connected may be due to network latency.
> overall throughput is minimum,
> overall latency is _sum_ of the respective component values.
>
>   
>> Any help on analyzing/identifying the problem will be greatly appreciated.
>>
>> Weilin
>>     
>
> * try again with a higher numruns, to get better statistics
>
> * try again with 0.7.22
> * try again with drbd 8 (well, rcX, or svn trunk)
> * try again with large "al-extents"
>   and then again in a second run...
>
> * what is the real storage device?  sdX ? mdX ? lvm something?
> * what is the meta data storage device?
>
>   
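
The rule of thumb in the quoted message (overall throughput is the minimum,
overall latency is the sum of the respective component values) can be
sketched as follows. The disk rate and link speed come from the lab
description; the per-request latencies are hypothetical numbers for
illustration only, and the function names are made up.

```python
def overall_throughput(component_rates):
    """A chain of components is only as fast as its slowest member."""
    return min(component_rates)


def overall_latency(component_latencies):
    """Each component a request passes through adds its own delay."""
    return sum(component_latencies)


disk_mb_s = 50.0     # "~50MB/s" from the lab description
net_mb_s = 1000 / 8  # 1 Gbit/s link, theoretical ceiling in MB/s

# Hypothetical per-request latencies in milliseconds (not measured values).
disk_ms, network_ms, housekeeping_ms = 0.35, 0.05, 0.02

print("throughput bound:", overall_throughput([disk_mb_s, net_mb_s]), "MB/s")
print("latency estimate:",
      round(overall_latency([disk_ms, network_ms, housekeeping_ms]), 3), "ms")
```

With these inputs the disk, not the network, sets the throughput bound,
while every component contributes to the latency.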



