>> Most of the time (99%) I see ERR for the swap space of virtual machines.
>
> If you enable "integrity-alg", do you still see those "buffer modified
> by upper layers during write"?
>
> Well, then that is your problem,
> and that problem can *NOT* be fixed with DRBD "config tuning".
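
For context, the option being referred to is "data-integrity-alg" in the
resource's net section (DRBD 8.x syntax; the resource name and digest below
are only examples, adjust to your setup):

    resource r0 {
        net {
            # checksum every data block sent over the wire; any digest
            # the kernel crypto API offers (md5, sha1, crc32c, ...) works
            data-integrity-alg md5;
        }
        # ... device, disk, and host sections as usual ...
    }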
>
> What does that mean?
>
> Upper layer submits write to DRBD.
> DRBD calculates checksum over data buffer.
> DRBD sends that checksum.
> DRBD submits data buffer to "local" backend block device.
> Meanwhile, upper layer changes data buffer.
> DRBD sends data buffer to peer.
> DRBD receives local completion.
> DRBD receives remote ACK.
> DRBD completes this write to upper layer.
> *only now* would the upper layer be "allowed"
> to change that data buffer again.
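
A crude userspace analogy of that race, in shell (this is not DRBD code, it
only mimics "checksum taken, then the buffer keeps changing underneath"):

    buf=$(mktemp)                                   # stands in for the in-flight write buffer
    dd if=/dev/urandom of="$buf" bs=4k count=1 2>/dev/null
    sum_wire=$(md5sum "$buf" | awk '{print $1}')    # checksum DRBD would put on the wire
    # "upper layer" rewrites the buffer while the write is still in flight
    dd if=/dev/zero of="$buf" bs=4k count=1 conv=notrunc 2>/dev/null
    sum_sent=$(md5sum "$buf" | awk '{print $1}')    # checksum of the data actually shipped
    [ "$sum_wire" = "$sum_sent" ] || echo "mismatch -- what integrity-alg complains about"
    rm -f "$buf"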
>
> Misbehaving upper layer results in potentially divergent blocks
> on the DRBD peers. Or would result in potentially divergent blocks on
> a local software RAID 1. Which is why the mdadm maintenance script
> in rhel, "raid-check", intended to be run periodically from cron,
> has this tell-tale chunk:
>
>     mismatch_cnt=`cat /sys/block/$dev/md/mismatch_cnt`
>     # Due to the fact that raid1/10 writes in the kernel are unbuffered,
>     # a raid1 array can have non-0 mismatch counts even when the
>     # array is healthy. These non-0 counts will only exist in
>     # transient data areas where they don't pose a problem. However,
>     # since we can't tell the difference between a non-0 count that
>     # is just in transient data or a non-0 count that signifies a
>     # real problem, simply don't check the mismatch_cnt on raid1
>     # devices as it's providing far too many false positives. But by
>     # leaving the raid1 device in the check list and performing the
>     # check, we still catch and correct any bad sectors there might
>     # be in the device.
>     raid_lvl=`cat /sys/block/$dev/md/level`
>     if [ "$raid_lvl" = "raid1" -o "$raid_lvl" = "raid10" ]; then
>         continue
>     fi
>
> Anyways.
> Point being: Either have those upper layers stop modifying buffers
> while they are in-flight (keyword: "stable pages").
> Kernel upgrade within the VMs may do it. Changing something in the
> "virtual IO path configuration" may do it. Or not.
>
> Or live with the results, which are
> potentially not identical blocks on the DRBD peers.
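
To at least measure how divergent the peers are, DRBD's online verify can
count the out-of-sync blocks; it needs "verify-alg" set in the net section,
then (resource name again a placeholder):

    drbdadm verify r0    # compares blocks between the peers in the background
    # progress and the resulting out-of-sync count show up in /proc/drbd
    # ("oos:") and in the kernel log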
<span class=""><font color="#888888"></font></span></blockquote><div><br></div><div>Hello Lars,<br><br></div><div>Thank you for the detailed explanation. I've done some more tests and found that "out of sync" sectors appear for master-slave also, not only for master-master.<br>
<br>Can you share your thoughts about what can cause upper layer changes in the following schema?<br></div><div>KVM (usually virtio) -> LVM -> DRBD -> RAID10 -> Physical drives, while LVM snapshots are not used.<br>
<br></div><div>Can LVM cause these OOS? Could it help if we replace by the following schema?<br>KVM (usually virtio) -> DRBD -> LVM -> RAID10 -> Physical drives, while LVM snapshots are not used.<br><br></div>
<div>Stanislav<br></div></div></div></div>