<html>

  <head>

    <meta http-equiv="Content-Type" content="text/html; charset=UTF-8">

  </head>

  <body>

    <div class="moz-cite-prefix">On 2021-08-10 3:16 p.m., Janusz

      Jaskiewicz wrote:<br>

    </div>

    <blockquote type="cite"

cite="mid:CAGf4UHBwuE_w21RxbR8aq-hC_4H01bD_1nO4UvtbdbNJ+Eb0ig@mail.gmail.com">


      <div dir="ltr">Hi,

        <div><br>

        </div>

        <div>Thanks for your answers.</div>

        <div><br>

        </div>

        <div>Answering your questions:</div>

        <div>DRBD_KERNEL_VERSION=9.0.25<br>

        </div>

        <div><br>

        </div>

        <div>Linux kernel:</div>

        <div>4.18.0-305.3.1.el8.x86_6<br>

        </div>

        <div><br>

        </div>

        <div>File system type: </div>

        <div>XFS.</div>

        <div><br>

        </div>

        <div>So the file system is not cluster-aware, but as far as I
          understand, in an active/passive (single-primary) setup like
          mine that should be OK.</div>

        <div>I just checked the documentation, which seems to confirm that.</div>

        <div><br>

        </div>

        <div>I think the problem may come from the way I'm testing it.</div>

        <div>I came up with this testing scenario, that I described in

          my first post, because I didn't have an easy way to abruptly

          restart the server.</div>

        <div>When I do a hard reset of the primary server, it works as
          expected (at least I can find a logical explanation).</div>

        <div><br>

        </div>

        <div>I think what happened in my previous scenario was:</div>

        <div>Service is writing to the disk, and some portion of the

          written data is in a disk cache. As the picture <a

href="https://linbit.com/wp-content/uploads/drbd/drbd-guide-9_0-en/images/drbd-in-kernel.png"

            moz-do-not-send="true">https://linbit.com/wp-content/uploads/drbd/drbd-guide-9_0-en/images/drbd-in-kernel.png</a>

          shows, the cache is above the DRBD module.</div>

        <div>Then I kill the service and the network, but some data is

          still in the cache.</div>

        <div>At some point the cache is flushed and the data gets

          written to the disk.</div>

        <div>DRBD probably reports some error at this point, as it can't

          send that data to the secondary node (DRBD thinks the other

          node has left the cluster).</div>

        <div><br>

        </div>

        <div>When I check the files at this point I see more data on the
          primary because it also contains the data from the cache,
          which was not replicated because the network was down when
          the data reached DRBD.</div>

        <div><br>

        </div>

        <div>When I do the hard restart of the server, data in the cache

          is lost, so we don't observe the result as above.</div>

        <div><br>

        </div>

        <div>Does it make sense?</div>

        <div><br>

        </div>

        <div>Regards,</div>

        <div>Janusz.</div>

      </div>

    </blockquote>

    <p>OK, it sounded from your first post like you had the FS mounted
      on both nodes at the same time, which would be a problem. If it's
      only mounted in one place at a time, then it's OK.</p>

    <p>As for caching: in protocol C, DRBD on the Secondary only tells
      the Primary "write complete" once it has been told that its own
      disk write is complete. So if the cache is _above_ DRBD's kernel
      module, then that's probably not the problem, because the
      Secondary won't tell the Primary it's done until it has received
      and written the data. If there is a caching issue _below_ DRBD on
      the Secondary, then it's _possible_ that's the problem, but I
      doubt it. The reason is that whatever is managing the cache below
      DRBD on the Secondary should know that a given block hasn't been
      flushed yet and, on a read request, serve it from cache rather
      than from disk. This is a guess on my part.</p>
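    <p>One way to take the page cache above DRBD out of the test is to
      have the test writer flush every record before counting it as
      written. The following is only a minimal sketch of that idea: the
      function name durable_append and the file name drbd-test.log are
      made up for illustration, and the temporary directory stands in
      for the DRBD-backed mount point. It relies on os.fsync, which
      asks the kernel to push the file's dirty pages down to the block
      device.</p>

```python
import os
import tempfile


def durable_append(path, payload):
    """Append payload and return only after the kernel has flushed it."""
    # O_APPEND keeps every write at the end of the file.
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_APPEND, 0o644)
    try:
        view = memoryview(payload)
        while view:
            # os.write may write fewer bytes than requested, so loop.
            written = os.write(fd, view)
            view = view[written:]
        # Push this file's dirty pages out of the page cache and down to
        # the block device. On a DRBD-backed mount that is the DRBD device.
        os.fsync(fd)
    finally:
        os.close(fd)
    return len(payload)


if __name__ == "__main__":
    # Hypothetical stand-in path: a real test would write under the
    # DRBD-backed mount point instead of a temporary directory.
    with tempfile.TemporaryDirectory() as tmp:
        target = os.path.join(tmp, "drbd-test.log")
        durable_append(target, b"record 1\n")
        durable_append(target, b"record 2\n")
        print(os.path.getsize(target))
```

    <p>If the test service writes this way, then any record it reports
      as written has already been handed to the block layer, so data
      sitting in the page cache at the moment the service and network
      are killed should no longer explain a difference between the
      nodes.</p>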

    <p>What are 'disk-flushes [yes|no];' and 'md-flushes [yes|no];' set
      to in your 'disk { ... }' section?</p>
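    <p>If it helps to answer that quickly, here is a rough sketch that
      pulls the two values out of configuration text, for example the
      contents of the resource file or the output of 'drbdadm dump'.
      The function name flush_settings, the resource name r0 and the
      sample values are made up for illustration and are not taken
      from your setup. It only looks for explicitly set values; None
      means the option was not found, in which case DRBD's default
      applies.</p>

```python
import re

# Hypothetical sample, not taken from the poster's configuration.
SAMPLE = """
resource r0 {
    disk {
        disk-flushes no;   # illustrative value only
        md-flushes yes;
    }
}
"""


def flush_settings(config_text):
    """Return explicitly set disk-flushes / md-flushes values, else None."""
    # Drop '#' comments so a commented-out option is not picked up.
    text = re.sub(r"#.*", "", config_text)
    settings = {"disk-flushes": None, "md-flushes": None}
    for name in settings:
        # Match e.g. "disk-flushes no;" preceded by whitespace, '{' or ';'.
        pattern = r"(?:^|[\s{;])" + re.escape(name) + r"\s+(yes|no)\s*;"
        match = re.search(pattern, text)
        if match:
            settings[name] = match.group(1)
    return settings


if __name__ == "__main__":
    print(flush_settings(SAMPLE))
```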

    <pre class="moz-signature" cols="72">-- 

Digimer

Papers and Projects: <a class="moz-txt-link-freetext" href="https://alteeve.com/w/">https://alteeve.com/w/</a>

"I am, somehow, less interested in the weight and convolutions of Einstein’s brain than in the near certainty that people of equal talent have lived and died in cotton fields and sweatshops." - Stephen Jay Gould</pre>

  </body>

</html>