[DRBD-user] very large out-of-sync (oos) value yet drbd-overview claims UpToDate/UpToDate

Thu Nov 1 21:37:45 CET 2012

I don't know anything about drbd-overview; I just cat /proc/drbd.

But I bet it's echoing the same information.

drbd keeps in sync every block that it knows about, and that is all UpToDate tells you. Differences it doesn't know about are found by verify, which marks them out of sync. A Disconnect/Connect then syncs them back up.
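
For what it's worth, the Disconnect/Connect cycle can be scripted. Here is a minimal sketch in Python, assuming a resource named r0 (a placeholder; substitute the name from your drbd.conf), drbdadm on the PATH, and root privileges. The helper names resync_commands and resync are made up for illustration and are not part of DRBD:

```python
import subprocess


def resync_commands(resource):
    """Build the two drbdadm calls that make up a disconnect/connect cycle."""
    return [
        ["drbdadm", "disconnect", resource],
        ["drbdadm", "connect", resource],
    ]


def resync(resource, dry_run=True):
    """Run (or, with dry_run=True, only print) the disconnect/connect cycle."""
    for cmd in resync_commands(resource):
        if dry_run:
            print(" ".join(cmd))
        else:
            # check=True raises CalledProcessError if drbdadm exits non-zero.
            subprocess.run(cmd, check=True)


if __name__ == "__main__":
    resync("r0")  # "r0" is a hypothetical resource name
```

It defaults to a dry run so that nothing gets disconnected by accident; pass dry_run=False once the printed commands look right.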

If you start with dirty disks, set up drbd and do not sync them, and mkfs a file system on the primary, the disks will be absolutely UpToDate in the blocks that matter for the file system, and horribly out of sync in the blocks that don't matter to anybody at all. Verify will find the oos blocks and mark them for syncing, but the hypothetical file system is still consistent.

Just do the Disconnect/Connect and you'll have oos zero AND UpToDate.
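
If you want to check this from a script instead of parsing dmesg, one option is to read the oos: counter out of /proc/drbd. Below is a rough sketch, assuming the 8.3-style layout (a device line starting with the minor number, followed by a counters line that ends in oos:, reported in KiB as far as I know). The SAMPLE text is hand-written for illustration: the oos value is the 9651098 4k blocks from this thread times 4, and every other counter is a placeholder. The function name parse_oos is hypothetical as well:

```python
import re

# Hand-written sample in the /proc/drbd layout of DRBD 8.3 (an assumption).
# oos is 9651098 blocks * 4 KiB; the remaining counters are placeholders.
SAMPLE = """\
version: 8.3.11 (api:88/proto:86-96)
 0: cs:Connected ro:Secondary/Primary ds:UpToDate/UpToDate C r-----
    ns:0 nr:0 dw:0 dr:0 al:0 bm:0 lo:0 pe:0 ua:0 ap:0 ep:1 wo:b oos:38604392
"""


def parse_oos(proc_drbd_text):
    """Return {minor: oos_kib} parsed from /proc/drbd-style text."""
    result = {}
    minor = None
    for line in proc_drbd_text.splitlines():
        # Device line, e.g. " 0: cs:Connected ro:..."
        m = re.match(r"\s*(\d+):\s+cs:", line)
        if m:
            minor = int(m.group(1))
            continue
        # Counters line belonging to the most recent device line.
        m = re.search(r"\boos:(\d+)", line)
        if m and minor is not None:
            result[minor] = int(m.group(1))
    return result


if __name__ == "__main__":
    # On a real node, pass open("/proc/drbd").read() instead of SAMPLE.
    for minor, kib in parse_oos(SAMPLE).items():
        status = "in sync" if kib == 0 else "OUT OF SYNC"
        print(f"drbd{minor}: oos={kib} KiB ({status})")
```

A non-zero value for any minor means there are blocks marked out of sync, even while the disk state reads UpToDate/UpToDate.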

Dan

-----Original Message-----
From: Lonni J Friedman [mailto:netllama at gmail.com] 
Sent: Thursday, November 01, 2012 4:31 PM
To: Dan Barker
Cc: drbd-user at lists.linbit.com
Subject: Re: [DRBD-user] very large out-of-sync (oos) value yet drbd-overview claims UpToDate/UpToDate

Thanks, that answers my 2nd question, but not my 1st question.
Shouldn't drbd-overview be treating this as a not UpToDate scenario?

On Thu, Nov 1, 2012 at 6:08 AM, Dan Barker <dbarker at visioncomm.net> wrote:
> There is an on-error event handler. Mine sends me email if verify 
> fails (runs weekly, one resource each of M, Tu, W, Th nights).
>
> Dan
>
> In my Global "handlers" section:
>
> out-of-sync      "/usr/lib/drbd/notify-out-of-sync.sh <myemail>";
>
>
>
> -----Original Message-----
> From: drbd-user-bounces at lists.linbit.com
> [mailto:drbd-user-bounces at lists.linbit.com] On Behalf Of Lonni J 
> Friedman
> Sent: Wednesday, October 31, 2012 6:02 PM
> To: drbd-user at lists.linbit.com
> Subject: [DRBD-user] very large out-of-sync (oos) value yet 
> drbd-overview claims UpToDate/UpToDate
>
> I've got a drbd setup with 8.3.11.  I ran a manual verify, and once it 
> completed it reported:
>
> [23479.620066] block drbd0: Online verify  done (total 23136 sec; paused 0 sec; 73748 K/sec)
> [23479.702176] block drbd0: Online verify found 9651098 4k block out of sync!
> [23479.745988] block drbd0: conn( VerifyT -> Connected )
> [23479.788996] block drbd0: helper command: /sbin/drbdadm out-of-sync minor-0
> [23479.839348] block drbd0: helper command: /sbin/drbdadm out-of-sync minor-0 exit code 0 (0x0)
> [23479.961245] block drbd0: bitmap WRITE of 2763 pages took 34 jiffies
> [23480.006527] block drbd0: 37 GB (9651098 bits) marked out-of-sync by on disk bit-map.
>
> This isn't entirely surprising, as the secondary node was down for a 
> long time due to hardware problems.  However, what is surprising is 
> that drbd-overview still reports that everything is UpToDate:
> $ drbd-overview
>   0:sdb  Connected Secondary/Primary UpToDate/UpToDate C r-----
>
> Shouldn't this huge number of out of sync bits cause drbd-overview to
> report something other than UpToDate for the Secondary node?   If not,
> then how does one actually programmatically detect that a verification 
> has failed?  Parsing dmesg is going to be a huge kludge, and not 
> likely to be reliable.
>
> thanks