[DRBD-user] proto c - corrupt files - directories missing

Tue Jan 7 16:51:06 CET 2014

Hello

On Tue, 7 Jan 2014 16:47:04 +0100
Stefan Bauer <stefan.bauer at cubewerk.de> wrote:

> -----Original Message-----
> From:	Christian Hammers <chammers at netcologne.de>
> Sent:	Tue 07.01.2014 15:48
> Subject:	Re: [DRBD-user] proto c - corrupt files - directories missing
> To:	Stefan Bauer <stefan.bauer at cubewerk.de>; 
> CC:	drbd-user at lists.linbit.com; 
> > Hello
> > 
> > Have you tried "drbdadm verify clusterdb_res" to check if the secondary is
> > really identical to the primary? 
> > 
> > I would assume that DRBD only detects corrupted data using checksums when
> > reading, and out-of-date data when comparing those checksums on write requests,
> > but it cannot detect that the data on your secondary has accidentally become
> > out-of-date.
> 
> Hi Christian,
> 
> Thank you for your time.
> 
> Now it gets strange! I just started a resync after the second node had been offline.
> 
> [438614.558716] block drbd0: updated sync UUID A712D7A357B968B7:5410F28F1CEC98E8:540FF28F1CEC98E8:736AAB121F6173C0
> [439240.761231] block drbd0: Resync done (total 626 sec; paused 0 sec; 111204 K/sec)
> [439240.761244] block drbd0: updated UUIDs A712D7A357B968B7:0000000000000000:5410F28F1CEC98E8:540FF28F1CEC98E8
> [439240.761255] block drbd0: conn( SyncSource -> Connected ) pdsk( Inconsistent -> UpToDate )
> [439240.854011] block drbd0: bitmap WRITE of 8933 pages took 23 jiffies
> [439240.854023] block drbd0: 0 KB (0 bits) marked out-of-sync by on disk bit-map.
> 
> After this I ran a verify and a bunch of out-of-sync blocks were detected:

If your secondary was only offline for a short time, it only catches
up on the changes that were made during this time. It can therefore resync
quite fast, but it won't detect out-of-sync blocks that already existed
before it went offline.

The following messages explain why the filesystem on your secondary node
looks strange :)

> [439694.710861] block drbd0: Out of sync: start=73992, size=8 (sectors)
> [439695.086765] block drbd0: Out of sync: start=270448, size=8 (sectors)
> [439695.087157] block drbd0: Out of sync: start=270768, size=8 (sectors)
> [439695.087293] block drbd0: Out of sync: start=270824, size=8 (sectors)
...

> and so on. Am I right that, after the whole verify process has
> finished, my data should be in "real" sync? :)

No, according to the manpage "drbdadm verify" only marks blocks as
out-of-sync but does not repair them. I found that unexpected, too.

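Try "drbdadm invalidate clusterdb_res" on your *secondary* node.
This will start a complete resync from the primary node. As far as
I know it copies every block, unless checksum-based resync (csums-alg)
is configured, in which case only blocks whose checksum mismatches
are transferred. Can take some hours, though.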
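
If you want a rough idea of how much data the verify run flagged before
you start the resync, the "Out of sync" kernel messages quoted above can
be added up. The following is only a small sketch, not part of DRBD: the
function name summarize_out_of_sync is made up for this mail, and I am
assuming the sizes are given in 512-byte sectors and that the log lines
look exactly like the ones you posted.

```python
import re

# Matches DRBD kernel log lines such as:
#   block drbd0: Out of sync: start=73992, size=8 (sectors)
OUT_OF_SYNC = re.compile(
    r"block (?P<dev>drbd\d+): Out of sync: "
    r"start=(?P<start>\d+), size=(?P<size>\d+) \(sectors\)"
)

SECTOR_BYTES = 512  # assumption: ranges are reported in 512-byte sectors


def summarize_out_of_sync(lines):
    """Return {device: (range_count, total_sectors)} for the given log lines.

    Hypothetical helper for illustration; lines that do not match the
    "Out of sync" pattern are ignored.
    """
    summary = {}
    for line in lines:
        match = OUT_OF_SYNC.search(line)
        if match is None:
            continue
        count, sectors = summary.get(match["dev"], (0, 0))
        summary[match["dev"]] = (count + 1, sectors + int(match["size"]))
    return summary


# The four lines from the quoted log excerpt.
sample = [
    "[439694.710861] block drbd0: Out of sync: start=73992, size=8 (sectors)",
    "[439695.086765] block drbd0: Out of sync: start=270448, size=8 (sectors)",
    "[439695.087157] block drbd0: Out of sync: start=270768, size=8 (sectors)",
    "[439695.087293] block drbd0: Out of sync: start=270824, size=8 (sectors)",
]

for dev, (count, sectors) in summarize_out_of_sync(sample).items():
    kib = sectors * SECTOR_BYTES // 1024
    print(f"{dev}: {count} ranges, {kib} KiB out of sync")
```

In practice you would feed it the full kernel log (for example the saved
output of dmesg) instead of the four sample lines, since your excerpt was
cut off after the first few messages.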

bye,

-christian-