<br><br>

<div class="gmail_quote">2009/7/7 Mike Sweetser - Adhost <span dir="ltr">&lt;<a href="mailto:mikesw@adhost.com">mikesw@adhost.com</a>&gt;</span><br>

<blockquote style="BORDER-LEFT: #ccc 1px solid; MARGIN: 0px 0px 0px 0.8ex; PADDING-LEFT: 1ex" class="gmail_quote">

<div>

<p><font size="2">Hello:<br><br>We have two DRBD machines running RHEL 5.3 with DRBD 8.3.0.  Recently, we had an outage that took the primary server in the cluster down, leaving it to fail over using DRBD and Heartbeat.  This was done with no issues.  <br>

<br>When the other server came back online, we initiated a manual resync as follows:<br><br>drbdadm secondary RESOURCE<br>drbdadm -- --discard-my-data connect RESOURCE<br><br>Then from the live server, we did drbdadm connect RESOURCE, and it connected and resynced.<br>

<br>Assuming all this was done right, we ran into other issues - some people have complained that their files have &quot;reverted&quot; to a previous state.  We don&#39;t show any errors occurring in the synchronization of the files, and never saw any &quot;oos&quot; in the DRBD status. <br>

<br>So how could this have happened?  What can be done, outside of regular &quot;drbdadm verify&quot;s, to combat this problem?  And honestly, why is it necessary to do manual verification when file integrity of this nature should be a fundamental part of any file system duplication of this nature?<br>

</font></p></div></blockquote>

<div> </div>

<div>Because DRBD replicates data blocks and knows nothing about the filesystem on top of it. Without a cluster-aware filesystem, it is not filesystem duplication.</div>

<div>If you are using Xen on top of DRBD, some writes may not be propagated to the standby node. Search the list archives for threads with details.</div>
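<div>Since the original report mentions never seeing any &quot;oos&quot; in the DRBD status, here is a minimal sketch of how that counter could be watched from a script. It assumes the 8.3-style /proc/drbd layout, where each device stanza ends with an oos: field (out-of-sync data in KiB). The function name out_of_sync_kib and the SAMPLE text are hypothetical and written by hand for illustration; they are not output from the poster&#39;s cluster. Note that oos only counts blocks DRBD already knows are out of sync; silent divergence shows up there only after a &quot;drbdadm verify&quot; run has marked the blocks.</div>

```python
import re

# Start of a device stanza in /proc/drbd, e.g. " 0: cs:Connected ro:..."
DEVICE_RE = re.compile(r"^\s*(\d+):\s+cs:(\S+)")
# The out-of-sync counter (KiB) on the statistics line of a stanza.
OOS_RE = re.compile(r"\boos:(\d+)")


def out_of_sync_kib(proc_drbd_text):
    """Map each DRBD minor number to (connection_state, oos_kib).

    Hypothetical helper: assumes the 8.3-style /proc/drbd format.
    """
    result = {}
    minor = None
    state = None
    for line in proc_drbd_text.splitlines():
        match = DEVICE_RE.match(line)
        if match:
            # Remember which device the following statistics line belongs to.
            minor, state = int(match.group(1)), match.group(2)
            continue
        match = OOS_RE.search(line)
        if match and minor is not None:
            result[minor] = (state, int(match.group(1)))
    return result


# Hand-written sample in the assumed format (not a real log).
SAMPLE = """\
 0: cs:Connected ro:Primary/Secondary ds:UpToDate/UpToDate C r---
    ns:1024 nr:0 dw:1024 dr:2048 al:5 bm:0 lo:0 pe:0 ua:0 ap:0 ep:1 wo:b oos:0
 1: cs:Connected ro:Secondary/Primary ds:UpToDate/UpToDate C r---
    ns:0 nr:512 dw:512 dr:0 al:0 bm:3 lo:0 pe:0 ua:0 ap:0 ep:1 wo:b oos:128
"""
```

<div>On a live node the same function could be fed the contents of /proc/drbd, for example from a cron job that alerts whenever any device reports a non-zero value.</div>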

<blockquote style="BORDER-LEFT: #ccc 1px solid; MARGIN: 0px 0px 0px 0.8ex; PADDING-LEFT: 1ex" class="gmail_quote">

<div>

<p><font size="2"><span></span><br>I&#39;ve attached my drbd.conf here - feel free to mention if I&#39;ve done something stupid.<br><br>resource r1 {<br>  protocol C;<br>  handlers {<br>    pri-on-incon-degr &quot;echo &#39;DRBD: primary requested but inconsistent!&#39; | wall; /etc/init.d/heartbeat stop&quot;; #&quot;halt -f&quot;;<br>

    pri-lost-after-sb &quot;echo &#39;DRBD: primary requested but lost!&#39; | wall; /etc/init.d/heartbeat stop&quot;; #&quot;halt -f&quot;;<br>  }<br><br>  startup {<br>    degr-wfc-timeout 120;    # 2 minutes.<br>    wfc-timeout 120;    # 2 minutes.<br>

  }<br><br>  disk {<br>    no-disk-flushes;</font></p></div></blockquote>

<div>This looks interesting. Are you sure you have a battery-backed write cache?</div>

<div> </div>

<div>Tino</div></div>