AW: [DRBD-user] Re: Double Failure

Martin Bene martin.bene at icomedias.com
Thu Apr 22 16:49:48 CEST 2004


Hi Tim,

> After reading the docs, and the example config file, my understanding
> is as follows, from the point of view of data integrity (ignoring the
> availability): (I'm only considering inittimeout and load-only here)
> 
> - inittimeout = 0 : SAFE (will wait for operator)
> - inittimeout = positive : UNSAFE (will force primary after timeout)
> - inittimeout = negative : SAFE (will start, but in secondary mode)
> - load-only : SAFE (will start, but in secondary mode)
> 
> However, this seems at odds with your statement "don't use them". My
> understanding was that only setting inittimeout to a positive 
> value could compromise the data.

I'd say: wrong. The only safe setting is "0".
Assumption: node A is in the startup process and node B is down, as per
your example.

With a negative inittimeout, drbd startup will finish without operator
intervention. The init process will then start heartbeat, which will
proceed to acquire resources after its initdead setting expires.

Net result: if you set inittimeout to anything other than 0, the system
will start up when the other node is unreachable and proceed on the
assumption that it's got good data.
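
To put the cases side by side, here is a rough Python sketch of the
behaviour as I described it above. It is only a model of this discussion,
not drbd or heartbeat code: the names StartupOutcome and startup_outcome
are made up for illustration, and the "peer reachable" branch assumes
that drbd can then compare both sides before anything is made active.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class StartupOutcome:
    waits_for_operator: bool    # drbd startup blocks until someone intervenes
    init_continues: bool        # boot goes on, so heartbeat gets started
    may_serve_stale_data: bool  # node can become active without seeing its peer


def startup_outcome(inittimeout: int, peer_reachable: bool) -> StartupOutcome:
    """Hypothetical model of the startup cases discussed in this thread."""
    if peer_reachable:
        # Assumption: with both nodes connected, drbd can work out which
        # side has the most current data, so no stale activation happens.
        return StartupOutcome(False, True, False)
    if inittimeout == 0:
        # Wait for the peer or for the operator; nothing starts on its own.
        return StartupOutcome(True, False, False)
    # Positive or negative: drbd startup finishes unattended, heartbeat is
    # started and acquires resources after initdead expires.
    return StartupOutcome(False, True, True)


if __name__ == "__main__":
    for value in (0, 60, -60):
        print(value, startup_outcome(value, peer_reachable=False))
```

The timeout values 60 and -60 are arbitrary examples; only the sign
matters in this model.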

> In particular, I was thinking of using load-only and letting heartbeat
> decide (via datadisk) what to do. Is this safe? As I understand it,
> executing "datadisk start" will never cause a node to become 
> primary if the state of the other node is "Unknown", therefore this 
> should be safe - is that correct, or am I missing something?

As far as I know, datadisk will happily make the device primary if the
other side is unreachable. Besides, heartbeat doesn't have any
information that would enable it to make a good decision about which
side has the most current data; drbd does.

So: if data integrity is important to you, set inittimeout to 0. You
won't lose data by activating a node whose data is not up to date, but
you also won't be able to start your cluster automatically unless both
systems are available.

On the other hand, for some of my systems availability of the services
is more important than guaranteeing up-to-date data; starting up with a
webserver that might miss the last few updates can be preferable to not
starting up at all without user intervention.

Bye, Martin
********************************************************
Martin Bene,                 CTO
icomedias GmbH,              A-8020 Graz, Entenplatz 1b
t +43 (316) 721671-14,       f +43 (316) 721671-26
e martin.bene at icomedias.com, i http://www.icomedias.com
******************************************************** 


