On Tue, Feb 05, 2008 at 07:24:29PM -0700, Alex Dean wrote:
> Martin Gombac wrote:
> >Hi.
> >
> >I have two consistent nodes and want to bring one down to add another disk.
> >In the meantime the other node will take over all resources.
> >When the first node comes back, it will have outdated resource(s).
> >On this node drbd will be started first, then heartbeat.
> >Heartbeat will want to take over the resources immediately, while the
> >drbd devices are still syncing, and make the SyncSource primary.
> >
> >Would this pose a problem?
> >Should heartbeat be started only after drbd synchronization finishes?
> >The latter is how I used to do it, but I don't think it's necessary.
>
> That's what I've always done. I only start heartbeat on a node which
> could legitimately become primary. A SyncTarget, or any other node with
> outdated/inconsistent data, should never be primary. So, I figure it
> shouldn't have heartbeat running.

Right. Even though you technically can make it primary while it has a
connection to good data, that would usually be bad practice.

Normally, you should have "auto_failback off", or its equivalent: a
"default resource stickiness" in the order of 200 to 1000. So just
because the node is back, heartbeat should NOT initiate an immediate
failback -- any unnecessary service disruption should be avoided.

But if it's CRM (soon: Pacemaker [the name is to the point, btw, if
slightly invidious]), you can put one node into "standby" mode, which
will survive reboot. Once it is all healthy, you switch it back to
"online".

If it is heartbeat "legacy 1" mode, you could/should have a maintenance
runlevel without heartbeat, and switch to the "HA" heartbeat runlevel
once maintenance is over.

Btw, to wait in a script for the sync to finish, you can loop around
"drbdsetup /dev/drbdX wait-sync".

> You may be able to write some heartbeat CRM resource constraints to only
> allow a node to start resources if drbd is in a consistent state, but
> I'm not sure how to do that at the moment.
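For illustration, the "loop around wait-sync, then start heartbeat" idea
could be sketched roughly like this. The resource name "r0", device
"/dev/drbd0", the sleep interval, and the init script path are all
assumptions for the example, not anything from the thread:

```shell
#!/bin/sh
# Sketch: wait for DRBD resync to finish before starting heartbeat.
# Assumptions: resource "r0" on /dev/drbd0, heartbeat init script path.

# is_uptodate: succeed if a disk-state string (as printed by
# "drbdadm dstate r0", e.g. "UpToDate/UpToDate") shows the local
# disk as UpToDate.
is_uptodate() {
    case "$1" in
        UpToDate/*) return 0 ;;
        *)          return 1 ;;
    esac
}

# In real use, something like (commented out here, needs drbd installed):
#
# until is_uptodate "$(drbdadm dstate r0)"; do
#     drbdsetup /dev/drbd0 wait-sync   # blocks while resync is running
#     sleep 5                          # re-check in case it returned early
# done
# /etc/init.d/heartbeat start
```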
Seems to be rather difficult compared with the other options. But
theoretically one should be able to get the "OCF" multi-state
multi-instance master-slave etc. agent to refuse to become primary
when its local disk is not UpToDate.

But, again: all of this is not strictly necessary. It just feels "less
right" to put a node without good local data into the Primary role.
(It also has a performance penalty: reads have to be served over the
network, too.)

--
: Lars Ellenberg                           http://www.linbit.com :
: DRBD/HA support and consulting            sales at linbit.com  :
: LINBIT Information Technologies GmbH      Tel +43-1-8178292-0  :
: Vivenotgasse 48, A-1120 Vienna/Europe     Fax +43-1-8178292-82 :
__
please use the "List-Reply" function of your email client.
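[Editorial note appended to the archive: the promote-time refusal Lars
describes could, in outline, look like the check below. This is a
hypothetical sketch, not the actual agent: the function name, the exit
codes (which follow the general OCF resource-agent convention of 0 for
success and nonzero for failure), and the use of "drbdadm dstate"
output are assumptions.]

```shell
#!/bin/sh
# Sketch of the guard an OCF master/slave agent could apply on promote:
# refuse to become Primary unless the local disk is UpToDate.
OCF_SUCCESS=0        # OCF convention: operation succeeded
OCF_ERR_GENERIC=1    # OCF convention: generic failure

# $1 is a disk-state string of the form "local/peer", as printed by
# "drbdadm dstate", e.g. "UpToDate/UpToDate" or "Inconsistent/UpToDate"
drbd_promote_allowed() {
    case "$1" in
        UpToDate/*) return $OCF_SUCCESS ;;
        *)  echo "refusing promotion: local disk is ${1%%/*}" >&2
            return $OCF_ERR_GENERIC ;;
    esac
}
```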