<br><font size=2><tt>We confirmed that, for speed reasons, the circular

buffers do </tt></font>

<br><font size=2><tt>not call fsync(). Losing this log data in a crash is not a concern.</tt></font>

<br>

<br><font size=2><tt>On subsequent verify runs we have also confirmed that

all of the blocks </tt></font>

<br><font size=2><tt>marked out-of-sync belong to these logs.</tt></font>
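
<br><font size=2><tt>For illustration only, here is a minimal Python sketch of a circular log that overwrites one fixed-size file in place, with fsync() as an optional step. The class name CircularLog, its parameters, and the record handling are hypothetical; this is not the actual poller or alarm daemon code. With use_fsync=False, written data can stay in the page cache until the kernel flushes it on its own schedule.</tt></font>

```python
import os


class CircularLog:
    """Fixed-size log file; new records wrap around and overwrite old data."""

    def __init__(self, path, capacity, use_fsync=False):
        self.capacity = capacity      # total file size in bytes
        self.use_fsync = use_fsync    # hypothetical switch, off for speed
        self.offset = 0               # next write position inside the file
        self.fd = os.open(path, os.O_RDWR | os.O_CREAT, 0o644)
        os.ftruncate(self.fd, capacity)

    def _write_at(self, data, pos):
        os.lseek(self.fd, pos, os.SEEK_SET)
        os.write(self.fd, data)

    def append(self, record):
        if len(record) not in range(self.capacity + 1):
            raise ValueError("record is larger than the log")
        # Split the record at the end of the file and wrap the rest to the start.
        room = self.capacity - self.offset
        head, tail = record[:room], record[room:]
        self._write_at(head, self.offset)
        if tail:
            self._write_at(tail, 0)
        self.offset = (self.offset + len(record)) % self.capacity
        if self.use_fsync:
            # Ask the kernel to push the data to stable storage before returning.
            os.fsync(self.fd)

    def close(self):
        os.close(self.fd)
```

<br><font size=2><tt>Because every append rewrites blocks of the same file, a log like this keeps changing the same disk regions for as long as the daemon runs.</tt></font>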

<br>

<br><font size=2><tt>Thanks<br>

</tt></font>

<br><font size=2><tt>&gt; &gt; On Thu, May 21, 2009 at 09:33:43AM -0600,

David.Livingstone@cn.ca wrote:<br>

&gt; &gt; &gt; It seems like the &quot;out-of-sync&quot; log messages are output because the data<br>

&gt; &gt; &gt; in those sectors was changing during the verification.<br>

&gt; &gt; &gt;<br>

&gt; &gt; &gt; Here's how I came up with that...<br>

&gt; &gt; &gt;<br>

&gt; &gt; &gt; For a /var/log/messages statement like this (notice the

time stamp):<br>

&gt; &gt; &gt; May 20 14:30:19 wimpas2 kernel: drbd0: Out of sync: start=137739248,<br>

&gt; &gt; &gt; size=8 (sectors)<br>

&gt; &gt; &gt;<br>

&gt; &gt; &gt; We can run a &quot;dd&quot; command to peek at what data

it's talking about on<br>

&gt; &gt; &gt; both servers:<br>

&gt; &gt; &gt; (on wimpas1): &nbsp;sudo dd if=/dev/mapper/VolGroup01-LogVol00

iflag=direct<br>

&gt; &gt; &gt; bs=512 skip=137739248 count=8 of=/tmp/wimpas1-drbd-oos<br>

&gt; &gt; &gt;<br>

&gt; &gt; &gt; (on wimpas2): &nbsp;sudo dd if=/dev/mapper/VolGroup01-LogVol00

iflag=direct<br>

&gt; &gt; &gt; bs=512 skip=137739248 count=8 of=/tmp/wimpas2-drbd-oos<br>

&gt; &gt; &gt;<br>

&gt; &gt; &gt; Comparing the two output files using &quot;diff&quot; showed

they were the same,<br>

&gt; &gt; &gt; so that indicates replication worked properly.<br>

&gt; &gt; &gt;<br>

&gt; &gt; &gt; Looking inside the files showed they were polling logs with

timestamps<br>

&gt; &gt; &gt; from the same time that the /var/log/messages statement

was output:<br>

&gt; &gt; &gt;<br>

&gt; &gt; &gt; eg) (snipped for brevity, notice the time stamps 20th day,

14:30:16 -<br>

&gt; &gt; &gt; 14:30:22)<br>

&gt; &gt; &gt; time:20143016 REC fd:21 ff1216060100ef57000000000000f78f<br>

&gt; &gt; &gt; time:20143016 TRA fd:21 12ff14000100e0a6 size:8 dur:0 OK<br>

&gt; &gt; ...<br>

&gt; &gt; &gt; time:20143022 REC fd:21 ff0f160601002763000000000000f78f<br>

&gt; &gt; &gt; time:20143022 TRA fd:21 0fff1400010097c9 size:8 dur:0 OK<br>

&gt; &gt; &gt;<br>

&gt; &gt; &gt;<br>

&gt; &gt; &gt; So, the theory right now is that the &quot;out-of-sync&quot; messages were logged because<br>

&gt; &gt; &gt; the data in those sectors was changing during the verification, and the<br>

&gt; &gt; &gt; &quot;0 KB (0 bits) marked out-of-sync&quot; means DRBD realized that.<br>

</tt></font>
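
<br><font size=2><tt>As a rough sketch of the check described in the quoted message, the Python below parses an &quot;Out of sync&quot; log line, builds the matching dd command, and lists which 512-byte sectors differ between two dumps. The function names are hypothetical, and the regular expression assumes the log format shown in the quote.</tt></font>

```python
import re

SECTOR_SIZE = 512  # bytes; matches bs=512 in the quoted dd commands

# Assumed format, taken from the /var/log/messages line quoted above.
OOS_RE = re.compile(r"Out of sync: start=(\d+),\s*size=(\d+) \(sectors\)")


def parse_out_of_sync(line):
    """Return (start_sector, sector_count) from a log line, or None."""
    match = OOS_RE.search(line)
    if match is None:
        return None
    return int(match.group(1)), int(match.group(2))


def dd_command(device, start, count, out_path):
    """Build the dd invocation that dumps the reported sectors."""
    return (f"dd if={device} iflag=direct bs={SECTOR_SIZE} "
            f"skip={start} count={count} of={out_path}")


def differing_sectors(dump_a, dump_b, start):
    """List absolute sector numbers at which the two dumps differ."""
    length = max(len(dump_a), len(dump_b))
    result = []
    for pos in range(0, length, SECTOR_SIZE):
        if dump_a[pos:pos + SECTOR_SIZE] != dump_b[pos:pos + SECTOR_SIZE]:
            result.append(start + pos // SECTOR_SIZE)
    return result
```

<br><font size=2><tt>An empty list from differing_sectors() corresponds to the result in the quote, where diff showed the two dumps were the same.</tt></font>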

<br><font size=2><tt>&gt; &gt; please also see:<br>

</tt></font>

<br><font size=2><tt>&gt; &gt; http://thread.gmane.org/gmane.linux.kernel.drbd.devel/790<br>

&gt; &gt; http://thread.gmane.org/gmane.linux.network.drbd/14850<br>

</tt></font>

<br><font size=2><tt>&gt; Lars,<br>

</tt></font>

<br><font size=2><tt>&gt; Thanks for the reply.<br>

</tt></font>

<br><font size=2><tt>&gt; I've reviewed the links above (head is now spinning :).

With respect<br>

&gt; to &quot;crash safe&quot; applications, the out-of-sync disk portions

that<br>

&gt; we looked at were poller and alarm daemon log files. &nbsp;They use

circular</tt></font>

<br><font size=2><tt>&gt; logs, so they<br>

&gt; would be overwriting a file they've created. &nbsp;We're currently

checking<br>

&gt; whether or not they use fsync().</tt></font>

<br><font size=2><tt>&gt; <br>

&gt; As shown in the initial post we are using ext3.<br>

</tt></font>

<br><font size=2><tt>&gt; Is there anything else we could be checking?<br>

</tt></font>

<br><font size=2><tt>&gt; Thanks<br>

</tt></font>

<br><font size=2><tt>&gt; &gt;<br>

&gt; &gt; I'd suggest that &quot;something&quot; modified in-flight buffers,<br>

&gt; &gt; then re-submitted them.<br>

</tt></font>

<br><font size=2><tt>&gt; &gt; the drbd online-verify (as well as the syncer)

is supposed to &quot;lock&quot; the<br>

&gt; &gt; regions it currently compares against application IO, so it should

do<br>

&gt; &gt; the compare when no application IO is in-flight (on that region).<br>

&gt; &gt; but it may hit such a &quot;transient&quot; not-in-sync thingy.<br>

</tt></font>

<br><font size=2><tt>&gt; &gt; iirc, a few &quot;modify in-flight buffer&quot;

things have been tackled in the<br>

&gt; &gt; upstream kernel during the &quot;bio integrity&quot; work in

recent kernels.<br>

</tt></font>

<br><font size=2><tt>&gt; &gt; --<br>

&gt; &gt; : Lars Ellenberg<br>

&gt; &gt; : LINBIT | Your Way to High Availability<br>

&gt; &gt; : DRBD/HA support and consulting http://www.linbit.com<br>

</tt></font>

<br><font size=2><tt>&gt; &gt; DRBD&reg; and LINBIT&reg; are registered trademarks

of LINBIT, Austria.<br>

&gt; &gt; __<br>

&gt; &gt; please don't Cc me, but send to list &nbsp; -- &nbsp; I'm subscribed<br>

</tt></font>