[DRBD-user] How long does it take OOS to clear

Mon Oct 21 22:36:39 CEST 2019

Is there anything that will force DRBD to push the blocks that are out of sync (OOS) to the peer?
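
For anyone following along, here is a minimal sketch of reading the oos counter so its trend can be watched over time. It assumes the DRBD 8.4 /proc/drbd layout, where each device has a status line followed by a counter line ending in "oos:<KiB>". The sample text, its numbers, and the function name parse_oos_kib are made up for illustration and are not taken from the real system.

```python
import re

# Sample text in the DRBD 8.4 /proc/drbd layout.
# The counter values are invented for illustration only.
SAMPLE_PROC_DRBD = """\
version: 8.4.10 (api:1/proto:86-101)
 0: cs:Connected ro:Primary/Secondary ds:UpToDate/UpToDate A r-----
    ns:1048576 nr:0 dw:2097152 dr:4096 al:12 bm:0 lo:0 pe:3 ua:0 ap:0 ep:1 wo:f oos:524288
"""


def parse_oos_kib(proc_text):
    """Return {minor_number: oos_in_KiB} for each device found in the text."""
    result = {}
    minor = None
    for line in proc_text.splitlines():
        # A device status line looks like " 0: cs:Connected ro:...".
        header = re.match(r"\s*(\d+):\s+cs:", line)
        if header:
            minor = int(header.group(1))
            continue
        # The counter line that follows carries the "oos:<n>" field.
        counter = re.search(r"\boos:(\d+)", line)
        if counter and minor is not None:
            result[minor] = int(counter.group(1))
    return result


if __name__ == "__main__":
    for dev, oos in parse_oos_kib(SAMPLE_PROC_DRBD).items():
        print(f"minor {dev}: {oos} KiB out of sync")
```

On an 8.4 node the same function could be fed the contents of /proc/drbd instead of the sample string. Sampling it every few seconds shows whether oos is trending down or just holding steady.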

On Mon, Oct 21, 2019 at 11:00 AM Digimer <lists at alteeve.ca> wrote:

> Tuning is quite instance-specific. I would always suggest starting by
> commenting out all tuning, see how it behaves, then tune. Premature
> optimization never is.
>
> digimer
>
> On 2019-10-21 1:31 p.m., G C wrote:
> > Would changing any of these values help, or does the actual speed
> > between the two nodes need to be increased?
> >
> >     disk {
> >         on-io-error detach;
> >         c-plan-ahead 10;
> >         c-fill-target 24M;
> >         c-min-rate 80M;
> >         c-max-rate 720M;
> >     }
> >     net {
> >         protocol A;
> >         max-buffers 36k;
> >         sndbuf-size 1024k;
> >         rcvbuf-size 2048k;
> >     }
> >
> > Thank you
> >
> >
> >
> > On Mon, Oct 21, 2019 at 10:10 AM Digimer <lists at alteeve.ca> wrote:
> >
> >     I assumed it wasn't paused, but that confirms it.
> >
> >     Protocol A allows for out of sync to grow. It says "when the data is
> >     in the network buffer to send to the peer, consider the write
> >     complete". As such, data that hasn't made it over to the peer causes
> >     oos to climb. If you have a steady write rate that is faster than
> >     your transmit bandwidth, then seeing fairly steady OOS makes sense.
> >
> >     To "fix" it, you need to increase the connection speed to the peer
> >     node. Or, less likely, if the peer's disk is slower than the
> >     bandwidth connecting it, speed up the disk write speed.
> >
> >     In either case, what you are seeing is not a surprise, and it's not a
> >     problem with DRBD. The only other option is to use protocol C, so
> >     that a write isn't complete until it reaches the peer, but that will
> >     slow down the write performance of the primary node to be whatever
> >     speed you have to the peer. That's likely unacceptable.
> >
> >     In short, you have a hardware/resource issue.
> >
> >     digimer
> >
> >     On 2019-10-21 12:19 p.m., G C wrote:
> >     > version: 8.4.10
> >     > Ran the resume-sync all and received:
> >     > 0: Failure: (135) Sync-pause flag is already cleared
> >     > Command 'drbdsetup-84 resume-sync 0' terminated with exit code 10
> >     >
> >     > Protocol used is 'A', our systems are running on a cloud
> >     > environment.
> >     >
> >     >
> >     >
> >     >
> >     > On Mon, Oct 21, 2019 at 9:09 AM Digimer <lists at alteeve.ca> wrote:
> >     >
> >     >     8.9.2 is the utils version, what is the kernel module version?
> >     >     (8.3.x/8.4.x/9.0.x)?
> >     >
> >     >     It's possible something paused sync, but I doubt it. You can try
> >     >     'drbdadm resume-sync all'. The oos number should change
> >     >     constantly, any time a block changes it should go up and every
> >     >     time a block syncs it should go down.
> >     >
> >     >     What protocol are you using? A, B or C?
> >     >
> >     >     digimer
> >
> >
> >     --
> >     Digimer
> >     Papers and Projects: https://alteeve.com/w/
> >     "I am, somehow, less interested in the weight and convolutions of
> >     Einstein’s brain than in the near certainty that people of equal
> >     talent have lived and died in cotton fields and sweatshops." -
> >     Stephen Jay Gould
> >
>
>
>
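
To put rough numbers on the explanation quoted above (with protocol A, oos only shrinks when the link can carry more than is being written), here is a back-of-the-envelope sketch. It is a deliberately simplified model, not how DRBD computes anything: it assumes a constant write rate and a constant usable link rate, and it ignores the peer's disk speed and the resync controller settings. The function name and the example rates are hypothetical.

```python
def seconds_to_clear(oos_mib, write_mib_s, link_mib_s):
    """Estimate how long a backlog of oos_mib takes to drain.

    Simplified model: the backlog shrinks at (link rate - write rate).
    Returns None when the link cannot outpace the writes, because the
    backlog then holds steady or grows instead of clearing.
    """
    if oos_mib <= 0:
        return 0.0
    drain_rate = link_mib_s - write_mib_s
    if drain_rate <= 0:
        return None
    return oos_mib / drain_rate


if __name__ == "__main__":
    # Hypothetical rates: 512 MiB behind, 30 MiB/s of writes, 40 MiB/s link.
    print(seconds_to_clear(512, 30, 40))
    # Writes match the link rate, so the backlog never clears in this model.
    print(seconds_to_clear(512, 40, 40))
```

Under these assumptions, tuning buffer sizes does not change the outcome when the sustained write rate is at or above the link rate. Only more bandwidth or fewer writes makes the drain rate positive.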