[DRBD-user] very poor write performance with flush/barriers turned on

Mon Jan 23 22:20:43 CET 2012

> >
> > I've tried all the other configuration tweaks I can think of; the only
> > thing I can think of is that flushing is having an effect on the secondary node???
> >
> > Can anyone clarify the situation for me?
> 
> See if that helps to understand what we are doing, and why:
>  From: Lars Ellenberg
>  Subject: Re: massive latency increases from the slave with barrier or flush
> enabled
>  Date: 2011-07-03 08:50:53 GMT
> 
> http://article.gmane.org/gmane.linux.network.drbd/22056
> 

Hmmm... I googled for _ages_ and never came across that one!

The behaviour you describe seems to contradict the documented behaviour of protocols A and B, though. With flushes enabled, A and B act more like C as soon as a barrier/flush comes along (if I understand correctly). I can now understand why it is done this way, and it seems obvious in hindsight, but putting something about it in the docs would be really useful, e.g. "In all protocols, a barrier/flush (if those options are enabled) will still cause data to be synced to disk before it is considered complete".

That said, should I really expect my performance to be 10x worse? My setup is this:

(1) iSCSI initiator
|
| <- (a) multipath across a pair of 1Gb links
|
(2) drbd primary
|
| <- (b) bonded pair of 1Gb links in rr mode
|
(3) drbd secondary

If my dd write from (1) to (2) takes 0.5s with (3) disconnected, and about the same with (3) connected but barrier/flush disabled, why does it jump to 5s as soon as I turn on barrier and flush?

This is protocol B, so I assume that my link (b) is working just fine and that it's the flush and barrier that slow things right down.
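
As a rough sanity check on that assumption, here is a toy model of how a per-barrier wait could add up. It is only a sketch: it assumes (if I have understood the linked post correctly) that each barrier waits for a round trip plus a flush on the secondary. The function name, the barrier count, the round-trip time and the flush time are all hypothetical placeholders, not measurements from my setup; only the 0.5s baseline comes from the numbers above.

```python
# Toy model only: every value except the 0.5 s baseline is an assumed placeholder.

def write_time(base_s, n_barriers, rtt_s, remote_flush_s):
    """Total time if each barrier adds one round trip plus one flush on the peer."""
    return base_s + n_barriers * (rtt_s + remote_flush_s)

base = 0.5  # seconds: the dd run with barrier/flush disabled

# Hypothetical: 450 barriers, 0.2 ms round trip, 9.8 ms flush on the secondary.
with_flush = write_time(base, n_barriers=450, rtt_s=0.0002, remote_flush_s=0.0098)

print(f"{with_flush:.1f} s, {with_flush / base:.0f}x slower")
```

With numbers like these the network round trip is a tiny share of the added time and the flush on the secondary dominates, which would fit with link (b) being fine. Whether my dd run really issues that many barriers is something I have not measured.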

One thing I haven't tried is different combinations of flush/md-flush/barrier... is that worth doing, or am I not really gaining any data integrity unless all are enabled? The problem is that I can't seem to use adjust to change those settings without forcing a resync and/or a crash of either node, so I need to take the cluster down first.

Thanks again!

James