[DRBD-user] drbd turned on me, and bit my hand this morn

Brad Barnett lists at l8r.net
Tue Dec 6 21:22:08 CET 2005



On Tue, 6 Dec 2005 21:06:32 +0100
Lars Ellenberg <Lars.Ellenberg at linbit.com> wrote:

> / 2005-12-06 14:39:20 -0500
> \ Brad Barnett:
> > I use a meta disk though, because I wanted to be able to move to
> > ext3 without worries.  This device was the primary, so.. what
> > could it be doing during startup?  The primary should already
> > have all the data on the partition as it sits, shouldn't it?
> > 
> > As for storage size, I'm not sure what you mean here by
> > initializing.  The drive was already initialized and prepped..
> > I was just trying to get the drbd volume readable after a
> > reboot.
> 
> drbd uses meta data, the amount of which is roughly proportional to the
> actual storage size. independent of where this meta data is
> stored, it has to be read (and updated) at "drbdadm up" time.
> this is "drbd device initialization".
> it takes some time, and more time for bigger storage.
> 
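To put a rough number on "roughly proportional": the sketch below sizes only a dirty-block bitmap. It assumes one bit per 4 KiB block, which is the granularity commonly documented for DRBD's sync bitmap; treat that as an assumption. It ignores the activity log and the other meta data fields, and `bitmap_bytes` is a hypothetical helper, not part of drbd.

```python
KIB = 1024
MIB = 1024 * KIB
TIB = 1024 ** 4


def bitmap_bytes(storage_bytes: int, block_size: int = 4 * KIB) -> int:
    """Bytes needed for a bitmap with one bit per block_size bytes.

    Assumption: one bit per 4 KiB block by default; only the bitmap is
    counted, not the rest of the drbd meta data.
    """
    blocks = -(-storage_bytes // block_size)  # ceiling division
    return -(-blocks // 8)  # 8 bits per byte, rounded up


if __name__ == "__main__":
    for tib in (1, 2, 4):
        print(f"{tib} TiB of storage -> {bitmap_bytes(tib * TIB) // MIB} MiB of bitmap")
```

Under that assumption a 1 TiB device carries a 32 MiB bitmap, and that bitmap has to be read at attach time. This is quick on an idle disk but can crawl when the same spindles are busy with a raid rebuild.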

Ok, I see what you're on about here...

> it may appear to take "forever" on a drbd device with several TB of
> storage when the lower-level raid is rebuilding at the same time and
> the io-scheduler/driver/controller combination optimizes for high
> throughput, and thus kills performance when you need low latency
> instead.
> 
> > This is only a suggestion, as I know you have about 1000 other
> > things to do.  However, it would be nice if /proc/drbd showed
> > status information during this setup stage.  
> 
> we'll have a look at this.
> 

Thanks

> > Anyhow, aside from all of this, I am worried about this:
> >
> > "these timeouts are all not active, yet."
> 
> as I tried to point out: all configurable timeouts are for
> drbd _network_ connection/dialog things.
> 
> your setup seemed to be stuck in the drbd _disk_ attach/initialization
> phase, and there are no timeouts to tune there.
> we need to find out where and why exactly it gets stuck, and fix that.
> 
> my suggestion is that it does not really get stuck but just initializes
> very slowly, for whatever reason. if we can confirm this, and find the
> reason, we probably can fix it there. and this is most likely not a
> generic problem with drbd, but a special one for your setup.
> 
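The distinction Lars draws can be modeled as two phases. First the disk is attached and its meta data read, with no timeout. Only after that does the wait for the peer start, bounded by wfc-timeout, where 0 means wait forever. The following is a simplified, hypothetical model, not drbd's actual code. `attach_disk`, `is_connected` and both function names are made up for illustration.

```python
import time
from typing import Callable


def wait_for_connection(
    is_connected: Callable[[], bool],
    wfc_timeout: float,
    poll: float = 0.5,
    clock: Callable[[], float] = time.monotonic,
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    """Poll until the peer is connected (hypothetical model).

    Returns True once connected, False if wfc_timeout seconds pass first.
    A wfc_timeout of 0 (or less) means wait forever.
    """
    deadline = None if wfc_timeout <= 0 else clock() + wfc_timeout
    while not is_connected():
        if deadline is not None and clock() >= deadline:
            return False
        sleep(poll)
    return True


def start_resource(
    attach_disk: Callable[[], None],
    is_connected: Callable[[], bool],
    wfc_timeout: float,
) -> bool:
    # Phase 1: attach the lower-level disk and read its meta data.
    # No timeout applies here; a slow attach simply blocks.
    attach_disk()
    # Phase 2: only now does the connection wait, and wfc-timeout, apply.
    return wait_for_connection(is_connected, wfc_timeout)
```

In this model, a slow `attach_disk` means `start_resource` never reaches the timed phase. From the outside, that looks exactly like wfc-timeout "not working".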

Ok, well, it never started slowly before now... and I can think of no
reason for the slowdown. The raid was not in degraded mode, and nothing
is running on that box at this point; everything is waiting for drbd to
start up.

As I said, at one point I waited 15 minutes, and several other times
more than 5 minutes. Nothing seemed to happen.

I guess I'll have to do more testing once I get my full drbd running
again.  


> > This is because there will definitely be times when the
> > secondary will go away for a long period of time.  I won't be
> > able to have the primary down, just waiting for the secondary to
> > come back.  Does wfc-timeout work?  It didn't seem to. :/
> 
> as pointed out before: you currently don't even get to the "I am ready
> to talk to my peer, let's see if I get a timeout trying to do so" stage.
> 

Ok, thanks.. it's good to know what's happening at this stage.  I was
trying to see what tcpdump was up to, and so on, at this point...

> anyways, if the primary runs, and you take the secondary down, the
> primary happily hums along further.  typically.
> 
> > Without the wfc-timeout option, drbd may become unusable for me.
> > The whole reason we went with it is so that the secondary can
> > disappear for a while (upgrades, etc.), then the primary
> > (upgrades, etc.), and during this time we would have full access
> > to our nfs partition.
> >
> > Now it seems like one power failure / issue could cause the box
> > to reboot, and wait forever.. leaving me with no recourse but to
> > degrade to a pure ext3 device mount...
> 
> that is how we do it all the time. though, we tend to NOT use
> wfc-timeout at all, because you really risk data integrity that way.

As long as I can actually ssh in, and tell the box to override the wait,
I'm happy with that.  I can understand why waiting in such a case might be
a good idea...

> 
> you always should be able to answer "yes" at that drbdadm prompt.

Does the prompt work over stdout too? I ask because even if I can't get
a serial console going, for whatever reason, at least I could then ssh in
and set things up manually...

> of course, at remote installations you should have a terminal server,
> power switch, tell grub about the terminal switch, make sure that you
> are able to ssh into the box asap during the boot phase etc.
> 

We can stonith things, at least... so try not to make too much fun of my
paltry setup ;)))