[DRBD-user] drbd turned on me, and bit my hand this morn

Tue Dec 6 21:06:32 CET 2005

/ 2005-12-06 14:39:20 -0500
\ Brad Barnett:
> I use a meta disk though, because I wanted to be able to move to
> ext3 without worries.  This device was the primary, so.. what
> could it be doing during startup?  The primary should already
> have all the data on the partition as it sits, shouldn't it?
> 
> As for storage size, I'm not sure what you mean here by
> initializing.  The drive was already initialized and prepped..
> I was just trying to get the drbd volume readable after a
> reboot.

drbd uses meta data, the amount of which is roughly proportional to
the actual storage size. regardless of where this meta data is
stored, it has to be read (and updated) at "drbdadm up" time.
this is the "drbd device initialization".
it takes some time, and more time for bigger storage.
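
as a rough illustration of that proportionality, here is a small python
sketch. it assumes the dirty bitmap tracks one bit per 4 KiB of storage
(an assumption for this example, check your drbd version), and it ignores
the rest of the meta data. the helper name bitmap_bytes is made up for
this example.

```python
# rough estimate only: assumes one bitmap bit per 4 KiB of storage
BLOCK_SIZE = 4096  # bytes covered by one bitmap bit (assumption)


def bitmap_bytes(storage_bytes: int) -> int:
    """Bytes needed for a bitmap with one bit per BLOCK_SIZE bytes."""
    blocks = -(-storage_bytes // BLOCK_SIZE)  # ceiling division
    return -(-blocks // 8)                    # 8 bits per byte, rounded up


TIB = 1 << 40
for size_tib in (1, 4, 16):
    mib = bitmap_bytes(size_tib * TIB) / (1 << 20)
    print(f"{size_tib:>2} TiB storage -> ~{mib:.0f} MiB bitmap")
```

under that assumption the bitmap grows linearly with the device, so a
device four times as big has four times as much to read at attach time.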

it may appear to take "forever" on a drbd with several TB of storage when
the lower level raid is rebuilding at the same time, and the
io-scheduler/driver/controller combination optimizes for high
throughput, and thus kills performance when you need low latency
instead.

> This is only a suggestion, as I know you have about 1000 other
> things to do.  However, it would be nice if /proc/drbd showed
> status information during this setup stage.  

we'll have a look at this.

> Anyhow, aside from all of this, I am worried about this:
> 
> "these timeouts are all not active, yet."

as I tried to point out: all configurable timeouts are for
drbd _network_ connection/dialog things.

your setup seemed to be stuck in the drbd _disk_ attach/initialization
phase, and there are no timeouts to tune there.
we need to find out where and why exactly it gets stuck, and fix that.

my guess is that it does not really get stuck but just initializes
very slowly, for whatever reason. if we can confirm this, and find the
reason, we probably can fix it there. and this is most likely not a
generic problem with drbd, but a special one for your setup.
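
one way to tell "slow" from "stuck" is to watch whether the lower level
device is still being read while drbd attaches. the python sketch below
samples the "sectors read" counter from /proc/diskstats twice and reports
the rate. it assumes a 2.6 kernel that provides /proc/diskstats with the
full per-disk field layout; the function names and the device name "sda"
are placeholders, not anything drbd ships.

```python
import time


def sectors_read(diskstats_text: str, device: str) -> int:
    """Return the 'sectors read' counter for device from /proc/diskstats text."""
    for line in diskstats_text.splitlines():
        fields = line.split()
        # full layout: major minor name + at least 11 counters;
        # shorter lines (old-style partition entries) are skipped
        if len(fields) >= 14 and fields[2] == device:
            return int(fields[5])  # third counter is sectors read
    raise KeyError(device)


def read_rate(device: str, interval: float = 5.0) -> float:
    """Sectors read per second on device, sampled over interval seconds."""
    with open("/proc/diskstats") as f:
        before = sectors_read(f.read(), device)
    time.sleep(interval)
    with open("/proc/diskstats") as f:
        after = sectors_read(f.read(), device)
    return (after - before) / interval


# usage, while "drbdadm up" is running (device name is a placeholder):
#   print(read_rate("sda"))
```

a rate that stays above zero suggests the attach is progressing, just
slowly. a rate of zero over several samples points more towards a real
hang.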

> This is because there will definitely be times when the
> secondary will go away for a long period of time.  I won't be
> able to have the primary down, just waiting for the secondary to
> come back.  Does wfc-timeout work?  It didn't seem to. :/

as pointed out before: you currently don't even get to the "I am ready
to talk to my peer, let's see if I get a timeout trying to do so" stage.

anyway, if the primary runs, and you take the secondary down, the
primary happily hums along further.  typically.

> Without the wfc-timeout option, drbd may become unusable for me.
> The whole reason we went with it, is so that the secondary can
> disappear for a while (upgrades, etc), then the primary
> (upgrades, etc).. and during this time we would have full access
> to our nfs partition.
>
> Now it seems like one power failure / issue could cause the box
> to reboot, and wait forever.. leaving me with no recourse but to
> degrade to a pure ext3 device mount...

that is how we do it all the time. though, we tend to NOT use
wfc-timeout at all, because you really risk data integrity that way.

you always should be able to answer "yes" at that drbdadm prompt.
of course, at remote installations you should have a terminal server,
power switch, tell grub about the terminal switch, make sure that you
are able to ssh into the box asap during the boot phase etc.

-- 
: Lars Ellenberg                                  Tel +43-1-8178292-0  :
: LINBIT Information Technologies GmbH            Fax +43-1-8178292-82 :
: Schoenbrunner Str. 244, A-1120 Vienna/Europe   http://www.linbit.com :