[DRBD-user] Secondary node saturises RAID array
Joris van Rooij
jorrizza at wasda.nl
Fri Apr 11 17:08:18 CEST 2008
On Friday 11 April 2008 15:19:29 Florian Haas wrote:
> No, not really. I was actually asking because most people tend to use CFQ
> these days since it's the default in recent kernels. I was going to
> suggest noop or deadline, but you've tried that already.
> And, I assume you do have your write cache enabled and set to write back.
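For the archives, this is roughly how I check and change those settings. The device name sda is just an example here; substitute the actual array device, and note the scheduler change does not persist across reboots:

```sh
# Show the available I/O schedulers; the active one is in brackets
cat /sys/block/sda/queue/scheduler

# Switch to the deadline scheduler (takes effect immediately)
echo deadline > /sys/block/sda/queue/scheduler

# Enable the drive's write cache with hdparm (-W0 would disable it)
hdparm -W1 /dev/sda
```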
> > The weird thing is, when I disconnect the secondary DRBD node the
> > increment becomes a few hundred times faster. When the second node
> > reconnects after a few minutes, its sync is _very_ fast (a few
> > seconds). The performance drops back again after the reconnect.
> Um, this is just a wild guess, but I do remember having observed similar
> symptoms after enabling Jumbo frames on one of my test systems. I never
> found a reasonable explanation for this -- if someone else has, please
> share -- but latency dropped for a few writes, then surged dramatically
> and never improved. Can you duplicate your tests with a standard-issue MTU
> of 1500?
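For anyone following along, dropping back to the standard MTU is a one-liner. Here eth1 is just a stand-in for whatever interface DRBD replicates over:

```sh
# Reset the replication NIC to the standard 1500-byte MTU
# (eth1 is an example; use your actual replication interface)
ip link set dev eth1 mtu 1500
```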
Meh, exactly the same results. The network seems fine, NFS loves it.
The _very_ fast sync I wrote about is the DRBD sync, not the sync system call
by the way. The secondary node was back in business in no time after being
reconnected in a degraded state. So the bandwidth and/or latency to the
storage device isn't the problem.
And as for your latency problem, it sounds like a failing switch or NIC. I've
had the same problem with an old D-link gigabit switch. It seems to do OK for
a few frames but starts sending garbage after a while.
I've run the same tests using protocol A instead of C. sync() still takes
quite long, but it's faster. Sync takes half a second after a 1M write
compared to almost two seconds (sometimes) using protocol C.
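For reference, the protocol is set per resource in drbd.conf. A minimal sketch of what I tested; the resource name, hostnames, devices and addresses are examples from my setup, not a recommendation:

```
resource r0 {
  # Protocol A: a write is considered complete once it reaches the local
  # disk and the TCP send buffer, so it doesn't wait on the peer.
  protocol A;

  on alpha {
    device    /dev/drbd0;
    disk      /dev/sda7;
    address   10.0.0.1:7788;
    meta-disk internal;
  }
  on bravo {
    device    /dev/drbd0;
    disk      /dev/sda7;
    address   10.0.0.2:7788;
    meta-disk internal;
  }
}
```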
When DRBD is connected, every call to sync() takes this long, even when nothing has changed.
> I had the same behaviour using RAID5/RAID6 with internal metadata.
> We already discussed this here a few months ago, and I think Lars
> explained it as a "bitmap sync writes problem with the raid parity
> calculation".
> Try to change your raid level to 0/10 or move the internal meta-data
> somewhere else.
I'm going to try that tomorrow, it makes sense in a way. Too bad, I was quite
happy with this RAID's performance.
Thanks, and I'll report my findings back to this list.
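For the archives, moving the metadata external is a one-line change per host in drbd.conf. A sketch; /dev/sdb1 and the index 0 are examples for a device that sits outside the RAID5 array:

```
# Internal metadata (what I'm running now), stored at the end of the
# backing device itself:
meta-disk internal;

# External metadata on a separate device outside the parity RAID;
# [0] selects metadata slot 0 on that device:
meta-disk /dev/sdb1[0];
```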