[DRBD-user] On-line verification problem

Lars Ellenberg lars.ellenberg at linbit.com
Fri May 22 10:45:17 CEST 2009


On Thu, May 21, 2009 at 09:33:43AM -0600, David.Livingstone at cn.ca wrote:
> It seems like the "out-of-sync" log messages are output because the data
> in those sectors was changing during the verification.
> 
> Here's how I came up with that...
> 
> For a /var/log/messages statement like this (notice the time stamp):
> May 20 14:30:19 wimpas2 kernel: drbd0: Out of sync: start=137739248,
> size=8 (sectors)
> 
> We can run a "dd" command to peek at what data it's talking about on
> both servers:
> (on wimpas1):  sudo dd if=/dev/mapper/VolGroup01-LogVol00 iflag=direct
> bs=512 skip=137739248 count=8 of=/tmp/wimpas1-drbd-oos
> 
> (on wimpas2):  sudo dd if=/dev/mapper/VolGroup01-LogVol00 iflag=direct
> bs=512 skip=137739248 count=8 of=/tmp/wimpas2-drbd-oos
> 
> Comparing the two output files using "diff" showed they were the same,
> so that indicates replication worked properly.
> 
> Looking inside the files showed they were polling logs with timestamps
> from the same time that the /var/log/messages statement was output:
> 
> eg) (snipped for brevity, notice the time stamps 20th day, 14:30:16 -
> 14:30:22)
> time:20143016 REC fd:21 ff1216060100ef57000000000000f78f
> time:20143016 TRA fd:21 12ff14000100e0a6 size:8 dur:0 OK
> ...
> time:20143022 REC fd:21 ff0f160601002763000000000000f78f
> time:20143022 TRA fd:21 0fff1400010097c9 size:8 dur:0 OK
> 
> 
> So, the theory right now is that the "out-of-sync" messages were because
> the data in those sectors was changing during the verification and the
> "0 KB (0 bits) marked out-of-sync" means DRBD realized that. 

please also see:

http://thread.gmane.org/gmane.linux.kernel.drbd.devel/790
http://thread.gmane.org/gmane.linux.network.drbd/14850


I'd suggest that "something" modified in-flight buffers,
then re-submitted them.

the drbd online-verify (as well as the syncer) is supposed to "lock" the
region it is currently comparing against application IO, so the compare
should only happen while no application IO is in flight on that region.
but if a buffer gets modified while in flight, verify may still hit such
a "transient" not-in-sync block.

iirc, a few "modify in-flight buffer" issues have been tackled upstream
during the "bio integrity" work in recent kernels.
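
for re-checking a reported region by hand, as done in the quoted mail, here
is a minimal sketch that turns an "Out of sync" log line into the matching
dd command. oos_dd_cmd is a hypothetical helper name, not part of DRBD; it
assumes the log line has been joined onto a single line and that start= and
size= are in 512-byte sectors, as in the dd commands quoted above. it only
prints the command, it does not run it.

```shell
#!/bin/sh
# Hypothetical helper (not part of DRBD): build the dd command that dumps
# the region named in a DRBD "Out of sync" kernel log line.
# Assumption: start= and size= are counted in 512-byte sectors.
oos_dd_cmd() {
    line=$1   # the log line, joined onto a single line
    dev=$2    # backing device to read from
    out=$3    # file to write the dumped sectors to

    # Pull the numeric values out of "start=N, size=M (sectors)".
    start=$(printf '%s\n' "$line" | sed -n 's/.*start=\([0-9][0-9]*\).*/\1/p')
    size=$(printf '%s\n' "$line" | sed -n 's/.*size=\([0-9][0-9]*\).*/\1/p')

    if [ -z "$start" ] || [ -z "$size" ]; then
        echo "no start=/size= found in log line" >&2
        return 1
    fi

    # Print the command only; running it is left to the admin.
    printf 'dd if=%s iflag=direct bs=512 skip=%s count=%s of=%s\n' \
        "$dev" "$start" "$size" "$out"
}
```

run the printed command on both nodes and compare the two output files
(cmp is a better fit than diff for binary data). identical files after the
fact are consistent with a transient difference, but do not prove it.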

-- 
: Lars Ellenberg
: LINBIT | Your Way to High Availability
: DRBD/HA support and consulting http://www.linbit.com

DRBD® and LINBIT® are registered trademarks of LINBIT, Austria.
__
please don't Cc me, but send to list   --   I'm subscribed


