[DRBD-user] endless device verification and strange checksums

Lars Ellenberg lars.ellenberg at linbit.com
Thu Aug 7 14:29:52 CEST 2008


On Wed, Aug 06, 2008 at 02:22:07PM +0200, Schmidt, Florian wrote:
> Hello everyone again,
> 
> I just added verify-alg crc32c; to my drbd.conf and ran
> drbdadm verify all.
> 
> After that I saw lots of out-of-sync sectors in dmesg.
> I thought: OK, let's sync and then everything should be alright.
> 
> But after the sync (drbdadm disconnect and connect all on primary one),
> a re-run of drbdadm verify all still found out-of-sync sectors.
> 
> So I googled around and saw that other people have had this
> problem before.
> 
> I found the following command and ran it on a sector range reported as
> out of sync:
> 
> drbd3: Out of sync: start=192896, size=24 (sectors)
> <lots of lines of them>
> 
> [root@saprouter1 ~]# dd if=/dev/sda9 skip=192896 bs=512 count=24 iflag=direct | openssl md5
> 24+0 records in
> 24+0 records out
> 12288 bytes (12 kB) copied, 6.06248 seconds, 2.0 kB/s
> 6d79f038d772cac9ce34af477574ef7d
> 
> [root@saprouter2 ~]#  dd if=/dev/sda9 skip=192896 bs=512 count=24 iflag=direct | openssl md5
> 24+0 records in
> 24+0 records out
> 12288 bytes (12 kB) copied, 1.93692 seconds, 6.3 kB/s
> 30704f13388d5e23d644e44996cc2b62
> 
> Uh, just found something very strange after executing the command again:
> 
> [root@saprouter2 ~]#  dd if=/dev/sda9 skip=192896 bs=512 count=24 iflag=direct | openssl md5
> 30704f13388d5e23d644e44996cc2b62
> [root@saprouter2 ~]#  dd if=/dev/sda9 skip=192896 bs=512 count=24 iflag=direct | openssl md5
> a2915f04e3533f1f329ac9eab8a875f9
> [root@saprouter2 ~]#  dd if=/dev/sda9 skip=192896 bs=512 count=24 iflag=direct | openssl md5
> b94b4be0da055b290d4ae4d421cf17f0
> [root@saprouter2 ~]#  dd if=/dev/sda9 skip=192896 bs=512 count=24 iflag=direct | openssl md5
> b94b4be0da055b290d4ae4d421cf17f0
> [root@saprouter2 ~]#  dd if=/dev/sda9 skip=192896 bs=512 count=24 iflag=direct | openssl md5
> a2915f04e3533f1f329ac9eab8a875f9
> [root@saprouter2 ~]#  dd if=/dev/sda9 skip=192896 bs=512 count=24 iflag=direct | openssl md5
> 6d79f038d772cac9ce34af477574ef7d
> 
> How can the checksum differ from one read to the next? O_o
> Could it be a problem with the RAID driver?

So what you are saying is
 you don't have any activity on the device
 yet, if you look at the same location, the data changes.

if you can reproduce this behaviour,
please "tee" the raw data somewhere,
so you can diff the hexdumps, and see if it is
 * bit flips
 * some words changed
 * totally unrelated data
 * anything else that may hint at the cause.
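
A minimal sketch of that capture step, assuming a POSIX shell with dd, tee, md5sum and od available. The function names capture_reads and compare_reads, the output directory and the number of runs are made up for illustration, and md5sum stands in for the "openssl md5" call used in the original post.

```shell
#!/bin/sh
# Hypothetical helpers (not part of DRBD): read the same region several
# times, keep every raw copy, then diff the hexdumps of consecutive reads.

# capture_reads DEV START COUNT RUNS OUTDIR [IFLAG]
# START and COUNT are in 512-byte sectors, as in the "Out of sync" message.
# IFLAG defaults to "direct"; pass "" where O_DIRECT is not supported.
capture_reads() {
    dev=$1 start=$2 count=$3 runs=$4 out=$5
    flags=${6-direct}
    mkdir -p "$out" || return 1
    i=1
    while [ "$i" -le "$runs" ]; do
        # tee saves the raw bytes while the checksum is still printed
        dd if="$dev" skip="$start" bs=512 count="$count" \
            ${flags:+iflag="$flags"} \
            | tee "$out/read.$i.raw" | md5sum
        i=$((i + 1))
    done
}

# compare_reads OUTDIR RUNS
# Prints the differing hexdump lines; returns 1 if any two consecutive
# reads differ, 0 if all reads are identical.
compare_reads() {
    out=$1 runs=$2 status=0
    i=2
    while [ "$i" -le "$runs" ]; do
        prev=$((i - 1))
        for n in "$prev" "$i"; do
            # -v disables the "*" collapsing of repeated lines,
            # so both dumps stay aligned line by line
            od -A x -t x1 -v "$out/read.$n.raw" > "$out/read.$n.hex"
        done
        diff "$out/read.$prev.hex" "$out/read.$i.hex" || status=1
        i=$((i + 1))
    done
    return "$status"
}
```

For the region from the original report this could be called as "capture_reads /dev/sda9 192896 24 6 /tmp/oos-reads" followed by "compare_reads /tmp/oos-reads 6". The diff output then shows whether single bits, whole words or entire blocks changed between reads.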

but yes,
if nothing is writing to the device,
and you read it several times,
and you get different data back
and it is not supposed to spontaneously change its content
(i.e. it is not a random generator),
that device, or the communication path to it,
or its driver, is broken.

is that a RAID5? RAID10?
did you rebuild/resilver/consistency check it lately?
are there any diagnostic tools that report its own view of its health status
and the state of the world?

-- 
: Lars Ellenberg                           http://www.linbit.com :
: DRBD/HA support and consulting             sales at linbit.com :
: LINBIT Information Technologies GmbH      Tel +43-1-8178292-0  :
: Vivenotgasse 48, A-1120 Vienna/Europe     Fax +43-1-8178292-82 :
__
please don't Cc me, but send to list -- I'm subscribed


