AW: [DRBD-user] Some weird behaviour

Lars Ellenberg Lars.Ellenberg at linbit.com
Thu May 13 23:03:56 CEST 2004


/ 2004-05-13 20:51:52 +0100
\ Nuno Tavares:
> On Thu, 13 May 2004 08:17:25 +0200, Martin Bene wrote:
> 
> > Hi,
> > 
> >> resource drbd0 {
> >>   protocol = C
> >>   fsckcmd  = /bin/true
> >>   inittimeout=-10
> > 
> > Eep, what are you doing here? With this inittimeout you don't give drbd
> > sufficient time to connect with the 2nd node before trying to continue
> > startup on its own.
> > 
> > If you use inittimeout, it should be as a last resort kind of thing with
> > a timeout of -600 or something similar - this will still allow your
> > system to start if just one system comes up after a power fail.
> 
> For testing purposes, -600 seems a lot to me!
> Anyway, what are the alternatives? "load-only"? 
> As a matter of fact, I've considered and commented out inittimeout, and
> uncommented load-only, since my DRBD is heartbeat-managed.
> Is this the right way to do it?

the point is: heartbeat does not know where the "most recent" data
lives. it only knows whether a node is up or not.
it will happily tell a node with outdated data to become primary.
thus you risk data corruption.
so only do this if you care more about availability than data integrity,
i.e. it is more important to be online with *some* data,
than to be online with the most *up-to-date* transactions.

this situation only happens after total failure anyway, so you should
give your nodes enough time to boot, and fsck, and whatnot, and give DRBD 
time to connect and start to sync in the right direction.
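
to illustrate the point, here is a minimal sketch in Python. it is a toy
model, not DRBD or heartbeat code: the Node class, the single integer
"generation" field (a stand-in for DRBD's generation counters) and both
pick_* functions are hypothetical names made up for this example. it only
shows why "is up" and "has the newest data" are different questions.

```python
from dataclasses import dataclass


@dataclass
class Node:
    name: str
    up: bool
    generation: int  # hypothetical stand-in for DRBD's generation counters


def pick_by_liveness(nodes):
    """Liveness-only choice: return the first node that is up."""
    for node in nodes:
        if node.up:
            return node
    return None


def pick_by_generation(nodes):
    """Return the node with the newest data, or None if a peer is unreachable."""
    if not all(node.up for node in nodes):
        return None  # counters cannot be compared yet; waiting is the safe choice
    return max(nodes, key=lambda n: n.generation)


# After a total failure the old secondary happens to boot first.
old_primary = Node("alpha", up=False, generation=42)
old_secondary = Node("beta", up=True, generation=41)
nodes = [old_primary, old_secondary]

early = pick_by_liveness(nodes)   # beta: it is up, but its data is older
old_primary.up = True             # the old primary finishes booting
late = pick_by_generation(nodes)  # alpha: both are visible, newest data wins

print(early.name, late.name)      # prints: beta alpha
```

in this model a longer timeout simply means the comparison in
pick_by_generation gets a chance to run before anyone is promoted.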

> I'm just afraid that DRBD's algorithm contradicts what heartbeat thinks,
> so I didn't really care about DRBD choosing WHO should be Primary.
> Heartbeat will decide.

but heartbeat does not know.
you have been warned.

> > At a guess this short inittimeout is also the cause of the
> > "predetermined states are in contradiction to GC's" messages: the box
> > that used to be secondary decided to start on its own because of
> > inittimeout and subsequently was made primary by heartbeat, thus causing
> > the message when the former primary connected.
> 
> Ok, it's just a warning then. The old data is still consistent,
> right?

you "just" lost the most recent transactions ...
whether this matters to you depends on the kind of data,
and on your personal "risk factor" :)

	Lars Ellenberg


