On Tue, Feb 05, 2008 at 07:24:29PM -0700, Alex Dean wrote:
> Martin Gombac wrote:
> >Hi.
> >
> >I have two consistent nodes and want to bring one down to add another disk.
> >In the meantime the other node will take over all resources.
> >When the first node comes back, it will have outdated resource(s).
> >On this node drbd will be started first, then heartbeat.
> >Heartbeat will want to take over the resources immediately, while the
> >drbd devices are still syncing, and make the SyncSource primary.
> >
> >Would this pose a problem?
> >Should heartbeat be started only after drbd synchronization finishes?
> >The latter is how I used to do it, but I don't think it's necessary.
>
> That's what I've always done. I only start heartbeat on a node which
> could legitimately become primary. A SyncTarget, or any other node with
> outdated/inconsistent data, should never be primary. So, I figure it
> shouldn't have heartbeat running.

Right. Even though you technically can make it primary while it has a
connection to good data, that would usually be bad practice.

Normally, you should have "auto_failback off", or its equivalent: a
"default resource stickiness" in the order of 200 to 1000. So just
because the node is back, heartbeat should NOT initiate an immediate
failback -- any unnecessary service disruption should be avoided.

But if it's CRM (soon: Pacemaker [the name is to the point, btw, if
slightly invidious]), you can put one node into "standby" mode, which
will survive reboot. Once it is all healthy, you switch it back to
"online".

If it is heartbeat "legacy 1" mode, you could/should have a maintenance
runlevel without heartbeat, and switch to the "HA" heartbeat runlevel
once maintenance is over.

Btw, to wait in a script for the sync to finish, you can loop around
"drbdsetup /dev/drbdX wait-sync".

> You may be able to write some heartbeat CRM resource constraints to only
> allow a node to start resources if drbd is in a consistent state, but
> I'm not sure how to do that at the moment.
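For illustration, the "loop around wait-sync, then start heartbeat" idea
could be sketched roughly like this. The resource name "r0", device
"/dev/drbd0", the sleep interval, and the init script path are all
assumptions for the example, not anything from the thread:

```shell
#!/bin/sh
# Sketch: wait for DRBD resync to finish before starting heartbeat.
# Assumptions: resource "r0" on /dev/drbd0, heartbeat init script path.

# is_uptodate: succeed if a disk-state string (as printed by
# "drbdadm dstate r0", e.g. "UpToDate/UpToDate") shows the local
# disk as UpToDate.
is_uptodate() {
    case "$1" in
        UpToDate/*) return 0 ;;
        *)          return 1 ;;
    esac
}

# In real use, something like (commented out here, needs drbd installed):
#
# until is_uptodate "$(drbdadm dstate r0)"; do
#     drbdsetup /dev/drbd0 wait-sync   # blocks while resync is running
#     sleep 5                          # re-check in case it returned early
# done
# /etc/init.d/heartbeat start
```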
Seems to be rather difficult compared with the other options. But
theoretically one should be able to get the "OCF" multi-state
multi-instance master-slave etc. agent to refuse to become primary
when its local disk is not UpToDate.

But, again: all of this is not strictly necessary. It just feels "less
right" to put a node without good local data into the Primary role.
(It also has a performance penalty: reads have to be served over the
network, too.)

--
: Lars Ellenberg                           http://www.linbit.com :
: DRBD/HA support and consulting            sales at linbit.com  :
: LINBIT Information Technologies GmbH      Tel +43-1-8178292-0  :
: Vivenotgasse 48, A-1120 Vienna/Europe     Fax +43-1-8178292-82 :
__
please use the "List-Reply" function of your email client.
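[Editorial note appended to the archive: the promote-time refusal Lars
describes could, in outline, look like the check below. This is a
hypothetical sketch, not the actual agent: the function name, the exit
codes (which follow the general OCF resource-agent convention of 0 for
success and nonzero for failure), and the use of "drbdadm dstate"
output are assumptions.]

```shell
#!/bin/sh
# Sketch of the guard an OCF master/slave agent could apply on promote:
# refuse to become Primary unless the local disk is UpToDate.
OCF_SUCCESS=0        # OCF convention: operation succeeded
OCF_ERR_GENERIC=1    # OCF convention: generic failure

# $1 is a disk-state string of the form "local/peer", as printed by
# "drbdadm dstate", e.g. "UpToDate/UpToDate" or "Inconsistent/UpToDate"
drbd_promote_allowed() {
    case "$1" in
        UpToDate/*) return $OCF_SUCCESS ;;
        *)  echo "refusing promotion: local disk is ${1%%/*}" >&2
            return $OCF_ERR_GENERIC ;;
    esac
}
```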