<div dir="ltr">Hi, <div><br></div><div>Thank you for your reply.</div><div>I&#39;m not using a virtual machine (I only used a VM to check whether I got the same issue, then went back to the physical server).</div><div>Are you suggesting that I should disable swap on my server? I&#39;m using ext4.</div><div>How can I check online, with a command like fsck, whether my primary and my secondary are synchronized?</div><div>I found explanations about data being modified &quot;in flight&quot;, but no workaround or anything on how I can avoid getting out-of-sync blocks.</div><div><br></div><div><br></div></div><div class="gmail_extra"><br><div class="gmail_quote">2014-10-10 18:44 GMT+11:00 Lionel Sausin <span dir="ltr">&lt;<a href="mailto:ls@numerigraphe.com" target="_blank">ls@numerigraphe.com</a>&gt;</span>:<br><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">

  <div text="#000000" bgcolor="#FFFFFF">

    <div>&quot;buffer modified by upper layers during

      write&quot; means whatever sits on top of drbd changes data &quot;in

      flight&quot;.<br>

      Please search the list archives, this is a FAQ.<br>

      Swap and some file systems do that - usually it&#39;s some kind of

      optimization. I suspect VMWare VMs hosted in ext4 do that too.<br>

      There&#39;s probably nothing wrong, but DRBD can&#39;t know. You should do

      your data integrity checking on some higher level (fsck for

      example)<br>

      Lionel.<br>

      <br>

      On 10/10/2014 01:45, aurelien panizza wrote:<br>

    </div>

    <blockquote type="cite"><div><div class="h5">

      <div dir="ltr">Hi all,

        <div><br>

        </div>

        <div>I&#39;ve got a problem in my environment.</div>

        <div>I set up my primary server (pacemaker + drbd) which ran

          alone for a while, and then I added the second server

          (currently only DRBD).</div>

        <div>Both servers can see each other and /proc/drbd reports

          &quot;uptodate/uptodate&quot;.</div>
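        <div>A minimal sketch of reading that status from a script, assuming the usual DRBD 8.4 /proc/drbd layout. The function name drbd_status and the SAMPLE text are illustrative and not taken from this thread. Note that UpToDate/UpToDate with oos:0 only reflects what DRBD has tracked so far; an online verify is what actually compares the blocks.</div>

```python
import re

# Illustrative /proc/drbd content in the DRBD 8.4 layout (not copied from the servers in this thread).
SAMPLE = """version: 8.4.4 (api:1/proto:86-101)
 0: cs:Connected ro:Primary/Secondary ds:UpToDate/UpToDate C r-----
    ns:0 nr:0 dw:0 dr:664 al:0 bm:0 lo:0 pe:0 ua:0 ap:0 ep:1 wo:f oos:0
"""


def drbd_status(proc_text):
    """Return {minor: {"cs", "ro", "ds", "oos_kib"}} parsed from /proc/drbd text."""
    status = {}
    minor = None
    for line in proc_text.splitlines():
        # Device header line, e.g. " 0: cs:Connected ro:... ds:..."
        head = re.match(r"\s*(\d+): cs:(\S+) ro:(\S+) ds:(\S+)", line)
        if head:
            minor = int(head.group(1))
            status[minor] = {
                "cs": head.group(2),
                "ro": head.group(3),
                "ds": head.group(4),
                "oos_kib": None,
            }
            continue
        # Counter line that follows the header; oos is reported in KiB.
        oos = re.search(r"\boos:(\d+)", line)
        if oos and minor is not None:
            status[minor]["oos_kib"] = int(oos.group(1))
    return status


if __name__ == "__main__":
    for dev, info in drbd_status(SAMPLE).items():
        print(dev, info["ds"], info["oos_kib"])
```

        <div>On a real node the text would come from reading /proc/drbd instead of SAMPLE.</div>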

        <div>If I run a verify on that resource (right after the full

          resync), it reports some blocks out of sync (generally from

          100 to 1500 on my 80 GB LVM partition).</div>

        <div>So I disconnect and reconnect the slave, and oos reports 0 blocks.</div>

        <div>I run a verify again and some blocks are still out of sync.

          What I&#39;ve noticed is that it seems to be almost always the

          same blocks that are out of sync.</div>
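        <div>One way to check the "almost always the same blocks" impression is to tally the out-of-sync ranges logged by each verify run. The sketch below assumes the kernel logs one line per range in the form "Out of sync: start=N, size=M (sectors)"; that format, the function names and the sample lines are assumptions, not something shown in this thread.</div>

```python
import re
from collections import Counter

# Assumed kernel log format for verify results; adjust the pattern if your logs differ.
OOS_LINE = re.compile(r"Out of sync: start=(\d+), size=(\d+) \(sectors\)")


def oos_ranges(log_text):
    """Return the set of (start_sector, size_in_sectors) pairs found in one verify run's log."""
    return {(int(m.group(1)), int(m.group(2))) for m in OOS_LINE.finditer(log_text)}


def recurring_ranges(runs):
    """Count in how many runs each range appears and keep those seen in more than one run."""
    counts = Counter()
    for log_text in runs:
        counts.update(oos_ranges(log_text))
    return {rng: n for rng, n in counts.items() if n > 1}


if __name__ == "__main__":
    # Hypothetical log excerpts from two separate verify runs.
    run1 = (
        "block drbd0: Out of sync: start=167134312, size=8 (sectors)\n"
        "block drbd0: Out of sync: start=1024, size=8 (sectors)\n"
    )
    run2 = "block drbd0: Out of sync: start=167134312, size=8 (sectors)\n"
    print(recurring_ranges([run1, run2]))
```

        <div>Ranges that recur across runs point at a region that is rewritten often (for example swap or a hot file), which fits the "modified in flight" explanation better than random corruption would.</div>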

        <div>I tried to do a full resync multiple times but had the same

          issue.</div>

        <div>I also tried to replace the physical secondary server by a

          virtual machine (in order to check if the issue came from the

          secondary server) but had the same issue.</div>

        <div><br>

        </div>

        <div>I then activated &quot;data-integrity-alg crc32c&quot; and got a

          couple of &quot;Digest mismatch, buffer modified by upper layers

          during write: 167134312s +4096&quot; in the primary log.</div>
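        <div>For reference, the location in that message can be decoded as a start sector (the number followed by "s") and a length in bytes (the number after "+"). A small sketch, assuming 512-byte sectors; the helper names are illustrative.</div>

```python
import re

SECTOR_BYTES = 512  # assumption: positions are counted in 512-byte sectors


def parse_digest_mismatch(message):
    """Extract (start_sector, length_bytes) from a '... 167134312s +4096' log message."""
    m = re.search(r"(\d+)s \+(\d+)", message)
    if m is None:
        raise ValueError("no 'SECTORs +BYTES' location found")
    return int(m.group(1)), int(m.group(2))


def describe(message):
    """Convert the logged location into a byte offset and a sector count."""
    start_sector, length_bytes = parse_digest_mismatch(message)
    byte_offset = start_sector * SECTOR_BYTES
    return {
        "start_sector": start_sector,
        "byte_offset": byte_offset,
        "sectors": length_bytes // SECTOR_BYTES,
        "offset_gib": byte_offset / 2**30,
    }


if __name__ == "__main__":
    msg = ("Digest mismatch, buffer modified by upper layers "
           "during write: 167134312s +4096")
    print(describe(msg))
```

        <div>The resulting byte offset can be compared with the layout of the backing device to see which file system area or file sits at that position.</div>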

        <div><br>

        </div>

        <div>I tried on a different network card but got the same

          errors.</div>

        <div><br>

        </div>

        <div>My full configuration file:</div>

        <div><br>

        </div>

        <div>  protocol C;</div>

        <div>  meta-disk internal;</div>

        <div>  device /dev/drbd0;</div>

        <div>  disk /dev/sysvg/drbd;</div>

        <div><br>

        </div>

        <div>  handlers {</div>

        <div>         split-brain &quot;/usr/lib/drbd/notify-split-brain.sh

          xxx@xxx&quot;;</div>

        <div>         out-of-sync &quot;/usr/lib/drbd/notify-out-of-sync.sh

          xxx@xxx&quot;;</div>

        <div>         fence-peer &quot;/usr/lib/drbd/crm-fence-peer.sh&quot;;</div>

        <div>         after-resync-target

          &quot;/usr/lib/drbd/crm-unfence-peer.sh&quot;;</div>

        <div>  }</div>

        <div><br>

        </div>

        <div>  net {</div>

        <div>         cram-hmac-alg &quot;sha1&quot;;</div>

        <div>         shared-secret &quot;drbd&quot;;</div>

        <div>         sndbuf-size 512k;</div>

        <div>         max-buffers 8000;</div>

        <div>         max-epoch-size 8000;</div>

        <div>         verify-alg md5;</div>

        <div>         after-sb-0pri disconnect;</div>

        <div>         after-sb-1pri disconnect;</div>

        <div>         after-sb-2pri disconnect;</div>

        <div>         data-integrity-alg crc32c;</div>

        <div>  }</div>

        <div><br>

        </div>

        <div>  disk {</div>

        <div>        al-extents 3389;</div>

        <div>        fencing resource-only;</div>

        <div>  }</div>

        <div><br>

        </div>

        <div>  syncer {</div>

        <div>        rate 90M;</div>

        <div>  }</div>

        <div>  on host1 {</div>

        <div>        address 10.110.1.71:7799;</div>

        <div>  }</div>

        <div>  on host2 {</div>

        <div>        address 10.110.1.72:7799;</div>

        <div>  }</div>

        <div>}</div>

        <div><br>

        </div>

        <div>My OS : Redhat6 2.6.32-431.20.3.el6.x86_64</div>

        <div>DRBD version : drbd84-8.4.4-1</div>

        <div><br>

        </div>

        <div>

          <div>ethtool -k eth0</div>

          <div>Features for eth0:</div>

          <div>rx-checksumming: on</div>

          <div>tx-checksumming: on</div>

          <div>scatter-gather: on</div>

          <div>tcp-segmentation-offload: on</div>

          <div>udp-fragmentation-offload: off</div>

          <div>generic-segmentation-offload: on</div>

          <div>generic-receive-offload: off</div>

          <div>large-receive-offload: off</div>

          <div>ntuple-filters: off</div>

          <div>receive-hashing: off</div>

        </div>

        <div><br>

        </div>

        <div><br>

        </div>

        <div>The secondary server is currently not in the HA cluster (pacemaker), but

          I don&#39;t think this is the problem.</div>

        <div>I have another HA pair on 2 physical hosts with the exact

          same configuration and drbd/os version (but not the same server

          model) and everything&#39;s OK.</div>

        <div><br>

        </div>

        <div>As the primary server is in production, I can&#39;t stop the

          application (database) to check whether the alerts are false

          positives.</div>

        <div><br>

        </div>

        <div>Would you have any advice?</div>

        <div>Could it be that the primary server has corrupted blocks

          or wrong metadata?</div>

        <div><br>

        </div>

        <div>Regards,</div>

        <div><br>

        </div>

      </div>

      <br>

      <fieldset></fieldset>

      <br>

      </div></div><pre>_______________________________________________

drbd-user mailing list

<a href="mailto:drbd-user@lists.linbit.com" target="_blank">drbd-user@lists.linbit.com</a>

<a href="http://lists.linbit.com/mailman/listinfo/drbd-user" target="_blank">http://lists.linbit.com/mailman/listinfo/drbd-user</a>

</pre>

    </blockquote>

    <br>

  </div>


<br></blockquote></div><br></div>