Did anyone have an answer for the issue below? If it's a bug I'll go
looking in the code for why, but if it's a 'feature' then I guess I'm
stuck with it. It just doesn't make sense that turning on barrier/flush
would make the latency 10x worse. It seems that a few other threads I've
read just trailed off at this point too...

James

> > > I've tried all the other configuration tweaks I can think of; the
> > > only thing I can think of is that flushing is having an effect on
> > > the secondary node???
> > >
> > > Can anyone clarify the situation for me?
> >
> > See if that helps to understand what we are doing, and why:
> >
> > From: Lars Ellenberg
> > Subject: Re: massive latency increases from the slave with barrier
> > or flush enabled
> > Date: 2011-07-03 08:50:53 GMT
> >
> > http://article.gmane.org/gmane.linux.network.drbd/22056
>
> Hmmm... I googled for _ages_ and never came across that one!
>
> The behaviour you describe seems to contradict the documented
> behaviour of protocols A and B, though. With flushes enabled, A and B
> act more like C as soon as a barrier/flush comes along (if I
> understand correctly). I can now understand why it is done this way,
> and it seems obvious now, but putting something about it in the docs
> would be really useful, e.g. "In all protocols, a barrier/flush (if
> those options are enabled) will still cause data to be synced to disk
> before it is considered complete".
>
> That said, should I really expect my performance to be 10x worse? My
> setup is this:
>
> (1) iscsi initiator
>  |
>  | <- (a) multipath across a pair of 1Gb links
>  |
> (2) drbd primary
>  |
>  | <- (b) bonded pair of 1Gb links in rr mode
>  |
> (3) drbd secondary
>
> If my dd write performance from (1) to (2) with (3) disconnected can
> be 0.5s, and (1) to (2) with (3) connected but with barrier/flush
> disabled can be about the same, why does it jump to 5s as soon as I
> turn on barrier and flush?
> This is protocol B, so I assume that my link (b) is working just fine
> and it's the flush and barrier that slow things right down.
>
> One thing I haven't tried is different combinations of
> flush/md-flush/barrier... is that worth doing, or am I not really
> gaining any data integrity unless all are enabled? The problem is
> that I can't seem to do an adjust to change those without forcing a
> resync and/or a crash of either node, so I need to down the cluster
> first.
>
> Thanks again!
>
> James
>
> _______________________________________________
> drbd-user mailing list
> drbd-user at lists.linbit.com
> http://lists.linbit.com/mailman/listinfo/drbd-user
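[Editor's note: for readers wondering which knobs the thread is talking
about, the flush/md-flush/barrier combinations live in the disk section
of drbd.conf. This is a sketch from the 8.3-era syntax, not taken from
the poster's configuration; check drbd.conf(5) for your version.]

```
resource r0 {
  disk {
    # Disabling any of these trades integrity on the lower-level
    # device for latency; all three relate to the slowdown above.
    no-disk-barrier;   # don't use write barriers to order requests
    no-disk-flushes;   # don't flush the data device's volatile cache
    no-md-flushes;     # don't flush after meta-data updates
  }
}
```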
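[Editor's note: the dd comparison described above can be approximated
with something like the sketch below. The file path and sizes are
placeholders, not the poster's setup; dd prints its own timing on
stderr, suppressed here. With `conv=fsync`, dd issues an fsync() before
exiting, so the run only completes once the data has been flushed to
stable storage, which is exactly the path that barrier/flush makes
expensive on a connected DRBD pair.]

```shell
#!/bin/sh
# Placeholder target file; on a real test this would live on the
# DRBD-backed filesystem exported to the iSCSI initiator.
TESTFILE=./dd-testfile

# Buffered write: returns as soon as the data is in the page cache,
# so with protocol A/B and flushes disabled this looks fast.
dd if=/dev/zero of="$TESTFILE" bs=1M count=16 2>/dev/null
buffered_rc=$?

# conv=fsync adds an fsync() at the end, so the elapsed time includes
# the flush to stable storage -- with DRBD flushes enabled, that flush
# must also complete on the peer before dd returns.
dd if=/dev/zero of="$TESTFILE" bs=1M count=16 conv=fsync 2>/dev/null
fsync_rc=$?

rm -f "$TESTFILE"
```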