[DRBD-user] Secondary node saturates RAID array

Matteo Tescione matteo at rmnet.it
Fri Apr 11 15:39:22 CEST 2008

Note: "permalinks" may not be as permanent as we would like,
direct links of old sources may well be a few messages off.


Hello Folks,


On 11-04-2008 15:19, "Florian Haas" <florian.haas at linbit.com> wrote:

>> On Thursday 10 April 2008 17:32:42 Florian Haas wrote:
>> 
>>> No, it's DRBD doing its grey voodoo magic. :-) You simply witnessed the
>>> effects of "cold" vs. "hot" activity log.
>> 
>> Cool! (pun intended).
>> 
>>> May I guess you are using the CFQ I/O scheduler?
>>> What's /sys/block/sda/queue/scheduler say?
>> 
>> I'm currently using noop because of the hardware RAID underneath[1]. I've
>> also tried the deadline scheduler, since it's better for database loads
>> according to the Linux docs. This didn't improve anything, although noop's
>> only a bit faster. Noop's results are consistent while deadline fluctuates
>> a bit more.
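
For reference, a small Python sketch of that check/switch (a plain cat and
echo on the same sysfs file do the job just as well; "sda" is taken from
Florian's question above, and the write needs root):

    SCHED = "/sys/block/sda/queue/scheduler"

    # Show the available schedulers; the active one is marked in [brackets].
    with open(SCHED) as f:
        print(f.read().strip())

    # Switch to deadline by writing the name back into the same file.
    with open(SCHED, "w") as f:
        f.write("deadline")
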
>> 
>> The way I've tested this is keeping a 0.5s watch firing a SELECT COUNT(*)
>> query into the database while a serial INSERT script is running in the
>> background. All of this is running while switching/tuning schedulers. The
>> setup causing the fastest increment in the COUNT(*) result wins.
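
In case anyone wants to reproduce that test, here is a rough Python sketch of
the same idea, with sqlite3 standing in for the real database and
/mnt/drbd/test.db as a made-up path on the DRBD-backed filesystem (stop it
with Ctrl-C):

    import sqlite3, threading, time

    DB = "/mnt/drbd/test.db"   # assumption: a file on the DRBD device

    con = sqlite3.connect(DB)
    con.execute("CREATE TABLE IF NOT EXISTS t (x INTEGER)")
    con.commit()

    def writer():
        # the "serial INSERT script": one committed row at a time
        w = sqlite3.connect(DB)
        while True:
            w.execute("INSERT INTO t VALUES (1)")
            w.commit()

    threading.Thread(target=writer, daemon=True).start()

    # the 0.5s "watch": the faster COUNT(*) grows, the faster the setup
    while True:
        print(con.execute("SELECT COUNT(*) FROM t").fetchone()[0])
        time.sleep(0.5)
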
>> 
>> I haven't tested CFQ that well, but without tuning CFQ its performance is
>> worse, which is to be expected. Is this scheduler a potential winner when
>> tuned correctly?
> 
> No, not really. I was actually asking because most people tend to use CFQ
> these days since it's the default in recent kernels. I was going to
> suggest noop or deadline, but you've tried that already.
> 
> And, I assume you do have your write cache enabled and set to write back.
> 
>> The weird thing is, when I disconnect the secondary DRBD node the increment
>> becomes a few hundred times faster. When the second node reconnects after a
>> few minutes its sync is _very_ fast (a few seconds). The performance drops
>> back again after the reconnect.
> 
> Um, this is just a wild guess, but I do remember having observed similar
> symptoms after enabling Jumbo frames on one of my test systems. I never
> found a reasonable explanation for this -- if someone else has, please
> share -- but latency dropped for a few writes, then surged dramatically
> and never improved. Can you duplicate your tests with a standard-issue MTU
> of 1500?
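
Side note, for anyone retrying that: the MTU can be dropped back to 1500
through sysfs as well, for example from Python ("eth1" is only a placeholder
for the replication interface, and plain ifconfig does the same job):

    # eth1 is a placeholder for the replication interface; needs root.
    with open("/sys/class/net/eth1/mtu", "w") as f:
        f.write("1500")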

I had the same behaviour using RAID5/RAID6 with internal metadata.
We already discussed this here a few months ago, and I think Lars explained it
as a "bitmap sync writes problem with the RAID parity calculation" (as far as
I understand it, every small synchronous metadata write on RAID5/6 turns into
a read-modify-write of the parity stripe).
Try changing your RAID level to 0 or 10, or move the internal metadata
somewhere else.
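
As a rough illustration only (host and device names are invented, and the
exact syntax should be checked against the drbd.conf man page for your
version), external metadata just means pointing meta-disk away from the
parity array:

    resource r0 {
      on node1 {
        device    /dev/drbd0;
        disk      /dev/md0;        # the RAID5/6 array holding the data
        meta-disk /dev/sdc1[0];    # metadata on a separate, non-parity device
        address   10.0.0.1:7788;
      }
      # ... the second node's section changes in the same way ...
    }
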
For those interested, I'm going to try an i-RAM for the metadata. Any
thoughts?
--
matteo




