[DRBD-user] Behavior question

Fri Sep 7 13:44:04 CEST 2007

On Thu, Sep 06, 2007 at 11:19:42PM +0200, Uli Schellhaas wrote:
> Hello,
> 
> something unfunny happened today and i would like to know what i could 
> have done. v0.7.24
> 
> I was physically moving the nodes and had the primary working well, the 
> secondary was just booting up after being disconnected for 30 minutes, and i 
> triggered issues with the raid on the primary by hot-plugging 
> a new drive, so it did what i configured:
> on-io-error   detach;
> 
> drbd0: Sorry, I have no access to good data anymore.
> EXT3-fs error (device drbd0): ext3_get_inode_loc: unable to read inode block
> 
> I got the raid up again, but i believe drbd marked the data as corrupt or 
> something? Even after a drbd restart it wouldn't work well.
> I shut down any service on the primary before the drbd restart and 
> needed to restart services as early as possible.

current situation:
outdated, but consistent, data on the Secondary.
"invalid" data (because of the previous disk error) on the Primary.
that data is probably not invalid anymore; it is in fact ok and the best
data you've got, but it was marked as invalid because of the disk error.

[*]

> So, being afraid of invalidating the secondary, i connected drbd
> again and the primary's data was overwritten.

from DRBD's point of view, you connected Invalid (or Inconsistent)
with Consistent data.  the only way to sync was Consistent -> Invalid,
that is, from the Secondary onto the former Primary.

> I would have liked newer data on a maybe corrupt filesystem more than 
> older data on a healthy filesystem. Would an invalidate or discard data 
> command on the secondary before connecting have achieved that, or would i 
> have ended up with no good data anywhere if i had done this?

if you let DRBD connect Inconsistent with Inconsistent, you end up with
no usable data at all.
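
to make the two cases above concrete, here is a toy model in shell.
it is not DRBD code; the function name sync_direction and the output
strings are made up for illustration, and only the state combinations
discussed in this thread are covered.

```shell
# toy model of the sync decision described above; NOT how DRBD is implemented.
# arguments: state of the local data, state of the peer's data.
sync_direction() {
    mine=$1
    peer=$2
    case "$mine/$peer" in
        Consistent/Inconsistent)   echo "local -> peer" ;;
        Inconsistent/Consistent)   echo "peer -> local" ;;
        Inconsistent/Inconsistent) echo "no usable data" ;;
        # two Consistent sides are not discussed here, so the sketch stops.
        Consistent/Consistent)     echo "not covered by this sketch" ;;
        *)                         echo "unknown state" ;;
    esac
}
```

seen from the used-to-be-primary in your case, "sync_direction
Inconsistent Consistent" prints "peer -> local", which is the
overwrite you saw.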

> In case the data was marked corrupt - could i have forced drbd to
> remove the flag ?

since you knew better than drbd in this case,
you should have helped it.

best way to recover from [*] (one side used to be Primary, but now has
its data marked as inconsistent; other side is out-of-date) would have
been:
  on used-to-be-primary with the supposedly most recent data:
    stop everything; cat /proc/drbd should show "Unconfigured" now.
    drbdadm attach all; (StandAlone Secondary/Unknown Inconsistent)
    drbdadm -- --do-what-I-say primary all
     (this would be "--discard-data-of-peer" in drbd 8.
      basically it forcefully sets the local data status to Consistent
      (or Up2Date in 8.0), so you may now access it.
      then it goes active with that.)
    now: "StandAlone Primary/Unknown Consistent" (Up2Date in 8.0)
    then, fsck or whatever, make sure your data is ok.
    only then connect to the outdated stuff of the other node,
    and just to be sure, probably force a full sync.
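
the drbd 0.7 part of the steps above could be scripted roughly like this.
it is only a sketch: DRY_RUN, run and recover_former_primary are names
made up for this example, and by default the commands are printed, not
executed. stopping your services and drbd beforehand, and the fsck
afterwards, stay manual.

```shell
#!/bin/sh
# sketch for DRBD 0.7, to be used on the used-to-be-primary only.
# DRY_RUN=1 (the default) prints each command instead of running it.
DRY_RUN=${DRY_RUN:-1}

run() {
    if [ "$DRY_RUN" = 1 ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

recover_former_primary() {
    # everything is stopped: this should show "Unconfigured"
    run cat /proc/drbd
    # attach the local disk only; do not connect to the peer yet
    run drbdadm attach all
    # force the local data to Consistent and go Primary (0.7 syntax)
    run drbdadm -- --do-what-I-say primary all
    # expect "StandAlone Primary/Unknown Consistent" here
    run cat /proc/drbd
}

recover_former_primary
```

check the printed plan first, then run it again with DRY_RUN=0.
connecting to the peer is deliberately left out, so that it only
happens after you have verified the data by hand.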

-- 
: Lars Ellenberg                            Tel +43-1-8178292-0  :
: LINBIT Information Technologies GmbH      Fax +43-1-8178292-82 :
: Vivenotgasse 48, A-1120 Vienna/Europe    http://www.linbit.com :
__
please use the "List-Reply" function of your email client.