<div dir="ltr">Would changing any of these values help, or is it the actual speed between the two nodes that needs to be increased?<div><br></div><div>disk {<br>        on-io-error detach;<br>        c-plan-ahead 10;<br>        c-fill-target 24M;<br>        c-min-rate 80M;<br>        c-max-rate 720M;<br>    }<br>    net {<br>        protocol A;<br>        max-buffers 36k;<br>        sndbuf-size 1024k;<br>        rcvbuf-size 2048k;</div><div>    }<br></div><div><br></div><div>Thank you</div><div><br></div><div><br></div></div><br><div class="gmail_quote"><div dir="ltr" class="gmail_attr">On Mon, Oct 21, 2019 at 10:10 AM Digimer &lt;<a href="mailto:lists@alteeve.ca">lists@alteeve.ca</a>&gt; wrote:<br></div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">I assumed it wasn&#39;t paused, but that confirms it.<br>

<br>

Protocol A allows the out-of-sync (oos) count to grow. It says &quot;once the<br>
local write is done and the data is in the network buffer to be sent to the<br>
peer, consider the write complete&quot;. As such, data that hasn&#39;t made it<br>
over to the peer yet causes oos to climb. If you have a steady write rate<br>
that is faster than your transmit bandwidth, then seeing a fairly steady<br>
oos value makes sense.<br>
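A toy model may help show the arithmetic in the paragraph above. It is hypothetical throughout: the function name, the MiB/s figures, and the idea of a fixed buffer cap after which new writes stall are simplifying assumptions, not DRBD internals. It tracks data written locally but not yet delivered to the peer; when writes outpace the link, that backlog climbs and then levels off once the buffer is full.<br>

```python
def backlog_over_time(write_rate, link_rate, buffer_cap, seconds):
    """Toy model of data written on the primary but not yet delivered to the peer.

    write_rate and link_rate are in MiB/s, buffer_cap is in MiB. All of these
    are hypothetical parameters for illustration, not DRBD settings.
    """
    backlog = 0.0
    history = []
    for _ in range(seconds):
        # The writer can only queue as much as the buffer has room for;
        # once it is full, new writes stall (the primary is throttled).
        accepted = min(write_rate, buffer_cap - backlog)
        backlog += accepted
        # The link drains at most link_rate MiB during this second.
        backlog -= min(backlog, link_rate)
        history.append(backlog)
    return history


# Hypothetical numbers: 120 MiB/s of writes over a 100 MiB/s link.
print(backlog_over_time(120, 100, 150, 5))  # [20.0, 40.0, 50.0, 50.0, 50.0]
# Writes slower than the link never leave anything behind.
print(backlog_over_time(80, 100, 150, 3))   # [0.0, 0.0, 0.0]
```

In this model the backlog settles at buffer_cap minus link_rate, which is one way a write rate above the link speed can show up as a roughly constant oos figure rather than an ever-growing one.<br>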

<br>

To &quot;fix&quot; it, you need to increase the connection speed to the peer node.<br>
Or, less likely, if the peer&#39;s disk is slower than the link connecting<br>
it, speed up the peer&#39;s disk writes.<br>
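Whichever of the two is slower, the link or the peer's disk, sets how fast data can drain to the peer. A minimal sketch for telling which one is the limit, assuming you have measured both rates yourself in MiB/s; the function names are hypothetical:<br>

```python
def replication_bottleneck(link_rate, peer_disk_rate):
    """Return (drain_rate, limiting_component) for data headed to the peer.

    Rates are in MiB/s; both inputs are measurements you would take yourself
    (for example with a network benchmark and a disk benchmark).
    """
    rates = {"network link": link_rate, "peer disk": peer_disk_rate}
    # Data reaches stable storage on the peer no faster than the slower stage.
    limiter = min(rates, key=rates.get)
    return rates[limiter], limiter


def backlog_grows(write_rate, link_rate, peer_disk_rate):
    """True if a steady write_rate outpaces what the peer path can absorb."""
    drain, _ = replication_bottleneck(link_rate, peer_disk_rate)
    return write_rate > drain


print(replication_bottleneck(100, 300))   # (100, 'network link')
print(replication_bottleneck(1000, 150))  # (150, 'peer disk')
print(backlog_grows(120, 100, 300))       # True
```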

<br>

In either case, what you are seeing is not a surprise, and it&#39;s not a<br>
problem with DRBD. The only other option is to use protocol C, so that a<br>
write isn&#39;t complete until it has been written to the peer&#39;s disk, but<br>
that will limit the write performance of the primary node to whatever<br>
speed you have to the peer. That&#39;s likely unacceptable.<br>
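The cost of protocol C shows up as per-write latency. This is a rough toy model, not how DRBD actually schedules I/O: it assumes the local write and the network send start together, that a write completes when the slowest required step finishes, and the microsecond figures are invented for illustration. Protocol B, which waits for the peer to receive the data but not to write it, is included for comparison.<br>

```python
def write_latency_us(protocol, local_us, rtt_us, transfer_us, peer_disk_us):
    """Toy per-write completion latency in microseconds for protocols A/B/C.

    Assumes the local write and the network send start at the same time and
    that a write completes when the slowest step its protocol waits for is done.
    """
    if protocol == "A":
        # Complete once the local write is done and the data is queued to send.
        return local_us
    if protocol == "B":
        # Also wait for the peer to acknowledge receipt (not its disk write).
        return max(local_us, transfer_us + rtt_us)
    if protocol == "C":
        # Also wait for the peer to write the data to its disk.
        return max(local_us, transfer_us + peer_disk_us + rtt_us)
    raise ValueError(f"unknown protocol: {protocol}")


# Hypothetical numbers for one small write over a cloud network.
for proto in "ABC":
    lat = write_latency_us(proto, local_us=500, rtt_us=2000,
                           transfer_us=100, peer_disk_us=500)
    # With one write outstanding at a time, writes/s is bounded by 1 s / latency.
    print(proto, lat, 1_000_000 // lat)
# A 500 2000
# B 2100 476
# C 2600 384
```

With a single write in flight, these made-up numbers cap protocol C at roughly a fifth of the writes per second that protocol A allows; workloads that keep many writes in flight can hide some of that latency.<br>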

<br>

In short, you have a hardware/resource issue.<br>

<br>

digimer<br>

<br>

On 2019-10-21 12:19 p.m., G C wrote:<br>

&gt; version: 8.4.10<br>

&gt; Ran the resume-sync all and received:<br>

&gt; 0: Failure: (135) Sync-pause flag is already cleared<br>

&gt; Command &#39;drbdsetup-84 resume-sync 0&#39; terminated with exit code 10<br>

&gt; <br>

&gt; The protocol used is &#39;A&#39;; our systems are running in a cloud environment.<br>

&gt; <br>


&gt; On Mon, Oct 21, 2019 at 9:09 AM Digimer &lt;<a href="mailto:lists@alteeve.ca" target="_blank">lists@alteeve.ca</a>&gt; wrote:<br>

&gt; <br>

&gt;     8.9.2 is the utils version; what is the kernel module version<br>
&gt;     (8.3.x/8.4.x/9.0.x)?<br>

&gt; <br>

&gt;     It&#39;s possible something paused sync, but I doubt it. You can try<br>
&gt;     &#39;drbdadm resume-sync all&#39;. The oos number should change constantly:<br>
&gt;     it should go up any time a block changes and go down every time a<br>
&gt;     block syncs.<br>

&gt; <br>

&gt;     What protocol are you using? A, B or C?<br>

&gt; <br>

&gt;     digimer<br>
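To see whether oos is climbing, holding steady, or draining, it can help to sample it twice and compare. A minimal sketch, assuming DRBD 8.4, where /proc/drbd reports oos in KiB; the sample line is made up and the function names are hypothetical:<br>

```python
import re

# Matches the "oos:" field of DRBD 8.4-style /proc/drbd output (value in KiB).
OOS_RE = re.compile(r"\boos:(\d+)")


def parse_oos_kib(status_text):
    """Return every oos: value found in the text, in order (one per resource)."""
    return [int(value) for value in OOS_RE.findall(status_text)]


def oos_rate_kib_s(before_kib, after_kib, interval_s):
    """Average change in out-of-sync data between two samples, in KiB/s."""
    return (after_kib - before_kib) / interval_s


# Illustrative counters line in the 8.4 /proc/drbd layout; the numbers are made up.
sample = "    ns:1048576 nr:0 dw:1048576 dr:4096 al:12 bm:0 lo:0 pe:3 ua:0 ap:3 ep:1 wo:f oos:2048\n"
print(parse_oos_kib(sample))            # [2048]
print(oos_rate_kib_s(2048, 4096, 2.0))  # 1024.0
```

On a live 8.4 node you would pass in the text of /proc/drbd read a few seconds apart; a positive rate means blocks are being changed faster than they reach the peer.<br>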

<br>

<br>

-- <br>

Digimer<br>

Papers and Projects: <a href="https://alteeve.com/w/" rel="noreferrer" target="_blank">https://alteeve.com/w/</a><br>

&quot;I am, somehow, less interested in the weight and convolutions of<br>

Einstein’s brain than in the near certainty that people of equal talent<br>

have lived and died in cotton fields and sweatshops.&quot; - Stephen Jay Gould<br>

</blockquote></div>
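On the question at the top of the thread: the c-plan-ahead, c-fill-target, c-min-rate and c-max-rate options tune DRBD's dynamic resync-rate controller (see the drbd.conf man page), so they set targets for resync traffic and cannot move data faster than the link itself. The sketch below converts the posted values, assuming the M suffix means MiB/s and that c-plan-ahead is given in tenths of a second; the 1 Gbit/s link is a hypothetical figure, not the poster's measured speed.<br>

```python
MIB = 1024 * 1024  # assumes drbd.conf's "M" suffix means MiB


def mib_s_to_gbit_s(mib_per_s):
    """Convert a rate in MiB/s to decimal Gbit/s, the unit links are sold in."""
    return mib_per_s * MIB * 8 / 1e9


def plan_ahead_seconds(c_plan_ahead):
    """Assumes c-plan-ahead is expressed in tenths of a second."""
    return c_plan_ahead / 10


def resync_ceiling_gbit(c_max_rate_mib, link_gbit):
    """Resync can never run faster than the link, whatever c-max-rate says."""
    return min(mib_s_to_gbit_s(c_max_rate_mib), link_gbit)


# Values from the posted config; the 1 Gbit/s link speed is hypothetical.
print(round(mib_s_to_gbit_s(720), 2))  # 6.04  (c-max-rate 720M)
print(round(mib_s_to_gbit_s(80), 2))   # 0.67  (c-min-rate 80M)
print(plan_ahead_seconds(10))          # 1.0   (c-plan-ahead 10)
print(resync_ceiling_gbit(720, 1.0))   # 1.0
```

If the real link is slower than these figures, raising c-max-rate further changes nothing; only a faster link (or faster peer disk) raises the ceiling.<br>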