[DRBD-user] what's the correct procedure for promoting secondary to primary (when primary is down)?

Fri Sep 21 16:17:08 CEST 2012

On Fri, Sep 21, 2012 at 4:27 AM, Felix Frank <ff at mpexnet.de> wrote:
> Hi,
>
> On 09/21/2012 01:01 AM, Lonni J Friedman wrote:
>> I'm running version 8.3.11 on a two node Fedora16-x86_64 setup.  I've
>> read all of the official documentation, but I'm unclear what the
>> correct procedure is for (manually) promoting the secondary to become
>> primary when the original primary is down.  I thought it would be as
>> simple as "drbdadm -f primary r0", but that keeps failing:
>> 0: State change failed: (-2) Need access to UpToDate data
>> Command 'drbdsetup 0 primary' terminated with exit code 17
>>
>> I've read a few scattered threads saying that I need to first
>> invalidate the peer (original primary), but that also fails, because
>> it's (still) down.  Surely I'm missing something simple here?
>
> You are.
>
> Invalidating the peer has nothing to do with this.
>
> The above error means that your secondary machine is in an unclean
> state, e.g. diskless or inconsistent.
>
> Have a hard look at /proc/drbd. Any HA setup with DRBD must (imho)
> include monitoring of the DRBD health, since (obviously) nothing will
> break immediately if you lose your redundancy (because that would mean
> lousy availability), but if you don't take steps to restore redundancy,
> well, failover is no option then.
> I suspect that's what happened to you at some point and was never fixed.

I guess I should have elaborated that this isn't a production setup,
and I'm not at the stage of trying to accomplish HA.  My goal was to
understand how to manually convert the secondary to primary, such that
I could later implement the HA setup which would handle it in an
automated fashion.

I'm trying to do a proof of concept in which I simulate the failure
of the primary.  I did this by bringing down the network interface on
the primary.  Prior to doing
that, drbd-overview reported everything as being in sync
(UpToDate/UpToDate).
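
To make the state check concrete, here is a minimal sketch of how the
disk state could be read from a /proc/drbd device line before
attempting promotion.  The sample line is illustrative (written in the
8.3-style "cs:/ro:/ds:" layout, not captured from my nodes), and the
function names parse_status and can_promote are hypothetical.  It
assumes only what the error message states: promotion needs access to
UpToDate data.

```python
import re

# Illustrative device line in the style of /proc/drbd on DRBD 8.3.
# This is an assumed example, not output captured from a real node.
SAMPLE = " 0: cs:WFConnection ro:Secondary/Unknown ds:UpToDate/DUnknown C r-----"

# cs = connection state, ro = local/peer role, ds = local/peer disk state
STATUS_RE = re.compile(
    r"cs:(\S+)\s+ro:([^/\s]+)/([^/\s]+)\s+ds:([^/\s]+)/([^/\s]+)"
)


def parse_status(line):
    """Return a dict of states for one device line, or None if no match."""
    m = STATUS_RE.search(line)
    if m is None:
        return None
    keys = ("cs", "local_role", "peer_role", "local_disk", "peer_disk")
    return dict(zip(keys, m.groups()))


def can_promote(status):
    """True only if the local disk state is UpToDate."""
    return status is not None and status["local_disk"] == "UpToDate"


if __name__ == "__main__":
    print(parse_status(SAMPLE))
    print(can_promote(parse_status(SAMPLE)))
```

If the local side shows anything other than UpToDate in the "ds:"
field (for example Inconsistent or Diskless), that would be consistent
with the "Need access to UpToDate data" error above.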

So I'm still completely confused about how to handle this scenario.  The
only documented mechanism that I can find for promoting a secondary to
primary is when the primary is still up & healthy.  Other than for
some scheduled downtime maintenance of the primary, I'm not sure when
that scenario would ever be useful.

What is the correct procedure for forcing a secondary to become
primary if the primary is down/gone?  If it's documented somewhere,
please point me there, and I'll gladly follow it.  I just can't find
it anywhere.

thanks!