<div dir="ltr">Hi,<div><br></div><div>Thank you for your reply.</div><div>I'm not using a virtual machine (I used a VM only to check whether I got the same issue, and then went back to the physical server).</div><div>Do you suggest I should disable swap on my server? I'm using ext4.</div><div>How can I check online whether my primary and secondary are in sync, with something like fsck?</div><div>I found explanations of "data getting modified in flight", but no workaround or anything on how to avoid getting out-of-sync blocks.</div><div><br></div><div><br></div></div><div class="gmail_extra"><br><div class="gmail_quote">2014-10-10 18:44 GMT+11:00 Lionel Sausin <span dir="ltr">&lt;<a href="mailto:ls@numerigraphe.com" target="_blank">ls@numerigraphe.com</a>&gt;</span>:<br><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">
<div text="#000000" bgcolor="#FFFFFF">
<div>"buffer modified by upper layers during
write" means that whatever sits on top of DRBD changes the data "in
flight".<br>
Please search the list archives; this is a FAQ.<br>
Swap and some file systems do that, usually as some kind of
optimization. I suspect VMware VMs hosted on ext4 do that too.<br>
There's probably nothing wrong, but DRBD can't know. You should do
your data integrity checking at a higher level (fsck, for
example).<br>
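<div>To make the "modified in flight" idea concrete, here is a minimal Python sketch (an illustration, not DRBD's code). It assumes, consistent with the warning showing up in the primary's log, that the sending side takes a digest of the page before the write goes out and recomputes it afterwards; if an upper layer rewrote the page in between, the two digests differ, and the peer may have received different bytes than what ends up on the local disk. The names checksum, write_with_integrity_check and scribble are hypothetical, and zlib.crc32 is plain CRC-32 rather than the crc32c configured in data-integrity-alg.</div>

```python
import zlib

# Illustration only: zlib.crc32 is plain CRC-32, not the crc32c (Castagnoli)
# variant selected by data-integrity-alg, but the mechanism is the same.


def checksum(buf):
    """Digest of the buffer's current contents."""
    return zlib.crc32(bytes(buf))


def write_with_integrity_check(page, modify_in_flight=None):
    """Simulate the sending side of one replicated write (hypothetical model).

    Returns (wire_bytes, buffer_changed); buffer_changed is True when the
    digest recomputed after sending differs from the one taken before.
    """
    digest_before = checksum(page)    # taken when the write is submitted
    wire_bytes = bytes(page)          # snapshot of what goes to the peer
    if modify_in_flight is not None:
        modify_in_flight(page)        # upper layer rewrites the page meanwhile
    digest_after = checksum(page)     # recomputed once the send has completed
    return wire_bytes, digest_before != digest_after


def scribble(page):
    """Hypothetical upper-layer update (filesystem, swap, VM) to the page."""
    page[100] = ord("B")


page = bytearray(b"A" * 4096)         # one 4 KiB page
_, changed = write_with_integrity_check(page)
print(changed)                        # False: nobody touched the buffer

page = bytearray(b"A" * 4096)
wire, changed = write_with_integrity_check(page, scribble)
print(changed)                        # True: "buffer modified by upper layers during write"
print(wire == bytes(page))            # False: the peer got the old bytes
```

<div>Nothing here is actually corrupt from the upper layer's point of view; the page simply changed after DRBD looked at it, which is why DRBD can only warn.</div>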
Lionel.<br>
<br>
On 10/10/2014 01:45, aurelien panizza wrote:<br>
</div>
<blockquote type="cite"><div><div class="h5">
<div dir="ltr">Hi all,
<div><br>
</div>
<div>I've got a problem in my environment.</div>
<div>I set up my primary server (Pacemaker + DRBD), which ran
alone for a while, and then I added the second server
(currently DRBD only).</div>
<div>Both servers can see each other and /proc/drbd reports
"UpToDate/UpToDate".</div>
<div>If I run a verify on that resource (right after the full
resync), it reports some blocks out of sync (generally from
100 to 1500 on my 80 GB LVM partition).</div>
<div>So I disconnect and reconnect the slave, and oos then reports 0 blocks.</div>
<div>I run a verify again and some blocks are still out of sync.
What I've noticed is that it is almost always the
same blocks that are out of sync.</div>
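<div>The oos counter mentioned above can be read from /proc/drbd. Below is a minimal Python sketch that extracts it per minor device; the sample text is made up in the DRBD 8.4 /proc/drbd layout, and the function name out_of_sync_kib is hypothetical. As far as I know, oos is reported in KiB, so with DRBD's 4 KiB bitmap granularity it corresponds to oos / 4 blocks.</div>

```python
import re

# Made-up sample in the DRBD 8.4 /proc/drbd layout (values are hypothetical).
SAMPLE = """version: 8.4.4 (api:1/proto:86-101)
 0: cs:Connected ro:Primary/Secondary ds:UpToDate/UpToDate C r-----
    ns:1024 nr:0 dw:2048 dr:4096 al:10 bm:0 lo:0 pe:0 ua:0 ap:0 ep:1 wo:f oos:1500
"""


def out_of_sync_kib(proc_text):
    """Map each DRBD minor number to its 'oos' value (KiB) from /proc/drbd text."""
    result = {}
    minor = None
    for line in proc_text.splitlines():
        header = re.match(r"\s*(\d+): cs:", line)   # e.g. " 0: cs:Connected ..."
        if header:
            minor = int(header.group(1))
        counter = re.search(r"\boos:(\d+)", line)   # e.g. "... oos:1500"
        if counter and minor is not None:
            result[minor] = int(counter.group(1))
    return result


print(out_of_sync_kib(SAMPLE))   # {0: 1500}
```

<div>On a live node you would pass it the real file, e.g. out_of_sync_kib(open("/proc/drbd").read()), before and after a verify run.</div>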
<div>I tried to do a full resync multiple times but had the same
issue.</div>
<div>I also tried replacing the physical secondary server with a
virtual machine (to check whether the issue came from the
secondary server) but had the same issue.</div>
<div><br>
</div>
<div>I then activated "data-integrity-alg crc32c" and got a
couple of "Digest mismatch, buffer modified by upper layers
during write: 167134312s +4096" in the primary log.</div>
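<div>Since the same blocks seem to come back every time, it may help to tally the locations in those warnings. A minimal Python sketch, assuming "167134312s +4096" means a start sector in 512-byte units plus a length in bytes, and grouping by 4 KiB block (the granularity of DRBD's out-of-sync bitmap); mismatched_blocks is a hypothetical helper, not a DRBD tool.</div>

```python
import re
from collections import Counter

SECTOR_BYTES = 512   # assumption: the "s" suffix counts 512-byte sectors
BLOCK_BYTES = 4096   # assumption: group by 4 KiB blocks, like DRBD's bitmap

PATTERN = re.compile(r"Digest mismatch, buffer modified by upper layers "
                     r"during write: (\d+)s \+(\d+)")


def mismatched_blocks(log_lines):
    """Count how often each 4 KiB block appears in digest-mismatch warnings."""
    counts = Counter()
    for line in log_lines:
        m = PATTERN.search(line)
        if not m:
            continue                               # unrelated log line
        start = int(m.group(1)) * SECTOR_BYTES     # byte offset on the device
        size = int(m.group(2))                     # length of the write in bytes
        first = start // BLOCK_BYTES
        last = (start + size - 1) // BLOCK_BYTES
        for block in range(first, last + 1):
            counts[block] += 1
    return counts


lines = ["Digest mismatch, buffer modified by upper layers during write: 167134312s +4096"]
print(mismatched_blocks(lines))   # Counter({20891789: 1})
```

<div>Fed with the primary's kernel log lines, it would show whether the same block numbers recur, which would fit the "same blocks every time" observation.</div>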
<div><br>
</div>
<div>I tried a different network card but got the same
errors.</div>
<div><br>
</div>
<div>My full configuration file:</div>
<div><br>
</div>
<pre>
  protocol C;
  meta-disk internal;
  device /dev/drbd0;
  disk /dev/sysvg/drbd;

  handlers {
    split-brain "/usr/lib/drbd/notify-split-brain.sh xxx@xxx";
    out-of-sync "/usr/lib/drbd/notify-out-of-sync.sh xxx@xxx";
    fence-peer "/usr/lib/drbd/crm-fence-peer.sh";
    after-resync-target "/usr/lib/drbd/crm-unfence-peer.sh";
  }

  net {
    cram-hmac-alg "sha1";
    shared-secret "drbd";
    sndbuf-size 512k;
    max-buffers 8000;
    max-epoch-size 8000;
    verify-alg md5;
    after-sb-0pri disconnect;
    after-sb-1pri disconnect;
    after-sb-2pri disconnect;
    data-integrity-alg crc32c;
  }

  disk {
    al-extents 3389;
    fencing resource-only;
  }

  syncer {
    rate 90M;
  }
  on host1 {
    address 10.110.1.71:7799;
  }
  on host2 {
    address 10.110.1.72:7799;
  }
}
</pre>
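<div>The verify-alg md5 line selects the digest used by online verify (drbdadm verify). Conceptually, as I understand it, each block is digested on both nodes and the digests are compared, so only digests, not data, cross the network and the device can stay in use. The Python sketch below simulates that comparison on two in-memory images; it is not DRBD's implementation, the 4 KiB block size is an assumption, and block_digests / out_of_sync_blocks are hypothetical names.</div>

```python
import hashlib

BLOCK = 4096  # assumed block size for the comparison (4 KiB)


def block_digests(image):
    """MD5 digest of each BLOCK-sized chunk, as with verify-alg md5."""
    return [hashlib.md5(image[i:i + BLOCK]).digest()
            for i in range(0, len(image), BLOCK)]


def out_of_sync_blocks(local, peer):
    """Indices of blocks whose digests differ between the two replicas."""
    pairs = zip(block_digests(local), block_digests(peer))
    return [i for i, (a, b) in enumerate(pairs) if a != b]


primary = bytes(BLOCK * 4)             # four zero-filled blocks
secondary = bytearray(primary)
secondary[BLOCK * 2 + 10] = 0xFF       # one byte differs, inside block 2
print(out_of_sync_blocks(primary, bytes(secondary)))   # [2]
```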
<div><br>
</div>
<div>My OS: Red Hat 6 (kernel 2.6.32-431.20.3.el6.x86_64)</div>
<div>DRBD version: drbd84-8.4.4-1</div>
<div><br>
</div>
<pre>
ethtool -k eth0
Features for eth0:
rx-checksumming: on
tx-checksumming: on
scatter-gather: on
tcp-segmentation-offload: on
udp-fragmentation-offload: off
generic-segmentation-offload: on
generic-receive-offload: off
large-receive-offload: off
ntuple-filters: off
receive-hashing: off
</pre>
<div><br>
</div>
<div><br>
</div>
<div>The secondary server is currently not part of the HA cluster (Pacemaker), but
I don't think this is the problem.</div>
<div>I have another HA cluster on two physical hosts with exactly the
same configuration and DRBD/OS versions (but a different server
model), and everything is OK there.</div>
<div><br>
</div>
<div>As the primary server is in production, I can't stop the
application (a database) to check whether the alerts are false
positives.</div>
<div><br>
</div>
<div>Do you have any advice?</div>
<div>Could it be that the primary server has corrupted blocks
or wrong metadata?</div>
<div><br>
</div>
<div>Regards,</div>
<div><br>
</div>
</div>
<br>
<fieldset></fieldset>
<br>
</div></div><pre>_______________________________________________
drbd-user mailing list
<a href="mailto:drbd-user@lists.linbit.com" target="_blank">drbd-user@lists.linbit.com</a>
<a href="http://lists.linbit.com/mailman/listinfo/drbd-user" target="_blank">http://lists.linbit.com/mailman/listinfo/drbd-user</a>
</pre>
</blockquote>
<br>
</div>
<br></blockquote></div><br></div>