[DRBD-user] On-line verification problem

David.Livingstone at cn.ca
Thu May 28 22:31:59 CEST 2009


We confirmed that, for speed, the daemons writing the circular logs do
not call fsync(). Losing log data in a crash is not a concern for us.
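
For anyone reading along, here is a rough sketch of what "circular log
without fsync()" means in practice. This is not our daemons' code; the
function name, parameters, and record layout are made up for illustration.
With durable=False the write only reaches the page cache, and the kernel
flushes it to the block device (and so to DRBD) whenever it chooses.

```python
import os

def write_record(fd, offset, record, file_size, durable=False):
    """Write one record into a fixed-size circular log; return the next offset.

    Hypothetical helper: fd is an open file descriptor, offset is where the
    next record goes, file_size is the fixed size of the circular log.
    """
    if len(record) > file_size:
        raise ValueError("record larger than the log file")
    if offset + len(record) > file_size:
        offset = 0  # wrap around and overwrite the oldest data
    os.pwrite(fd, record, offset)
    if durable:
        os.fsync(fd)  # force the data to disk; slower, which is why it is skipped
    return offset + len(record)
```

Because the same file regions are rewritten over and over, the same disk
sectors keep being dirtied, which fits what we saw in the verify output.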

We have also confirmed on subsequent verifies that the marked 
out-of-sync blocks are all from these logs.
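
In case it helps someone repeat the check, below is a minimal sketch of how
the "Out of sync" lines could be pulled out of /var/log/messages and turned
into the dd commands quoted further down. The function names and the regular
expression are my own assumptions, based only on the single message format
shown in this thread; other DRBD versions may log differently.

```python
import re

# Matches lines such as:
#   ... kernel: drbd0: Out of sync: start=137739248, size=8 (sectors)
OOS_RE = re.compile(
    r"(drbd\d+): Out of sync: start=(\d+),\s*size=(\d+) \(sectors\)"
)

def parse_out_of_sync(log_text):
    """Return (device, start_sector, size_sectors) for each message found."""
    return [(m.group(1), int(m.group(2)), int(m.group(3)))
            for m in OOS_RE.finditer(log_text)]

def dd_command(backing_dev, start, size, out_path):
    """Build the dd command used in this thread (bs=512, one block per sector)."""
    return (f"dd if={backing_dev} iflag=direct bs=512 "
            f"skip={start} count={size} of={out_path}")
```

Running parse_out_of_sync() over the log and then dd_command() for each hit
gives one dump per reported extent, which can be inspected on both nodes.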

Thanks

> > On Thu, May 21, 2009 at 09:33:43AM -0600, David.Livingstone at cn.ca wrote:
> > > It seems like the "out-of-sync" log messages are output because the data
> > > in those sectors was changing during the verification.
> > >
> > > Here's how I came up with that...
> > >
> > > For a /var/log/messages statement like this (notice the time stamp):
> > > May 20 14:30:19 wimpas2 kernel: drbd0: Out of sync: start=137739248, size=8 (sectors)
> > >
> > > We can run a "dd" command to peek at what data it's talking about on
> > > both servers:
> > > (on wimpas1):  sudo dd if=/dev/mapper/VolGroup01-LogVol00 iflag=direct bs=512 skip=137739248 count=8 of=/tmp/wimpas1-drbd-oos
> > >
> > > (on wimpas2):  sudo dd if=/dev/mapper/VolGroup01-LogVol00 iflag=direct bs=512 skip=137739248 count=8 of=/tmp/wimpas2-drbd-oos
> > >
> > > Comparing the two output files using "diff" showed they were the same,
> > > so that indicates replication worked properly.
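
(Inline note: "diff" only says whether the two dumps differ at all. If they
ever do differ, a sector-by-sector comparison narrows it down. The sketch
below is an illustration only; the function name is hypothetical and it
assumes both dumps were taken with the same bs=512 and count.)

```python
SECTOR_BYTES = 512  # same block size as the dd commands above

def differing_sectors(path_a, path_b, sector_bytes=SECTOR_BYTES):
    """Return the indexes of the sectors that differ between two dump files."""
    with open(path_a, "rb") as fa, open(path_b, "rb") as fb:
        a, b = fa.read(), fb.read()
    if len(a) != len(b):
        raise ValueError("dumps have different lengths")
    return [i // sector_bytes
            for i in range(0, len(a), sector_bytes)
            if a[i:i + sector_bytes] != b[i:i + sector_bytes]]
```

(An empty list corresponds to the "diff" result reported here. A returned
index n refers to sector start + n on the backing device.)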
> > >
> > > Looking inside the files showed they were polling logs with timestamps
> > > from the same time that the /var/log/messages statement was output:
> > >
> > > e.g. (snipped for brevity; notice the time stamps: 20th day,
> > > 14:30:16 - 14:30:22)
> > > time:20143016 REC fd:21 ff1216060100ef57000000000000f78f
> > > time:20143016 TRA fd:21 12ff14000100e0a6 size:8 dur:0 OK
> > ...
> > > time:20143022 REC fd:21 ff0f160601002763000000000000f78f
> > > time:20143022 TRA fd:21 0fff1400010097c9 size:8 dur:0 OK
> > >
> > >
> > > So, the theory right now is that the "out-of-sync" messages were logged
> > > because the data in those sectors was changing during the verification,
> > > and the "0 KB (0 bits) marked out-of-sync" means DRBD realized that.

> > please also see:

> > http://thread.gmane.org/gmane.linux.kernel.drbd.devel/790
> > http://thread.gmane.org/gmane.linux.network.drbd/14850

> Lars,

> Thanks for the reply.

> I've reviewed the links above (head is now spinning :). With respect
> to "crash safe" applications, the out-of-sync disk portions that
> we looked at were poller and alarm daemon log files. They use circular
> logs, so they would be overwriting a file they've already created.
> We're currently checking whether or not they use fsync().
> 
> As shown in the initial post we are using ext3.

> Is there anything else we could be checking?

> Thanks

> >
> > I'd suggest that "something" modified in-flight buffers,
> > then re-submitted them.

> > the drbd online-verify (as well as the syncer) is supposed to "lock" the
> > regions it currently compares against application IO, so it should do
> > the compare when no application IO is in-flight (on that region).
> > but it may hit such a "transient" not-in-sync thingy.

> > iirc, a few "modify in-flight buffer" things have been tackled in the
> > upstream kernel during the "bio integrity" work in recent kernels.

> > --
> > : Lars Ellenberg
> > : LINBIT | Your Way to High Availability
> > : DRBD/HA support and consulting http://www.linbit.com

> > DRBD® and LINBIT® are registered trademarks of LINBIT, Austria.
> > __
> > please don't Cc me, but send to list   --   I'm subscribed