[DRBD-user] Failover Behavior in Server-Crash Scenario

Thu Dec 6 23:36:32 CET 2012

On Thu, Dec 6, 2012 at 5:38 PM, Robinson, Eric <eric.robinson at psmnv.com> wrote:
> With Pacemaker 1.1.8 and drbd 8.4.2, we are observing that when the primary
> node is put into standby mode ('crm node standby') the drbd resource on the
> secondary node refuses to be promoted because it is in a WFConnection state.
> Is this normal and by design?

That it is in the WFConnection state is perfectly normal, and has
always been that way. If you put one node in standby, Pacemaker stops
the DRBD resource there, which is the equivalent of a manual "drbdadm
down": it tears down the network connection and detaches from the
backing disk. After that, the peer keeps "Waiting For [a] Connection"
until the standby node comes back.
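The point here is that WFConnection describes only the connection, not whether the surviving node can be promoted. To separate the connection state from the role and disk state when checking the survivor, here is a minimal Python sketch. It assumes the DRBD 8.4 `/proc/drbd` status-line format (`cs:`, `ro:`, `ds:` fields). The names `parse_proc_drbd`, `DrbdStatus`, `peer_is_gone` and `locally_promotable` are hypothetical. The promotability check is a deliberate simplification, not DRBD's actual promotion logic.

```python
import re
from dataclasses import dataclass
from typing import List

# Assumed DRBD 8.4 /proc/drbd resource line, e.g.:
#  0: cs:WFConnection ro:Secondary/Unknown ds:UpToDate/DUnknown C r-----
STATUS_RE = re.compile(
    r"^\s*(?P<minor>\d+):\s+cs:(?P<cs>\S+)"
    r"\s+ro:(?P<local_role>\w+)/(?P<peer_role>\w+)"
    r"\s+ds:(?P<local_disk>\w+)/(?P<peer_disk>\w+)"
)


@dataclass
class DrbdStatus:
    minor: int
    connection_state: str  # e.g. Connected, WFConnection, StandAlone
    local_role: str        # Primary / Secondary
    peer_role: str         # Unknown while no peer is connected
    local_disk: str        # e.g. UpToDate, Outdated, Diskless
    peer_disk: str         # DUnknown while no peer is connected

    def peer_is_gone(self) -> bool:
        # No peer connection: waiting for one, or not even trying.
        return (self.connection_state in ("WFConnection", "StandAlone")
                and self.peer_role == "Unknown")

    def locally_promotable(self) -> bool:
        # Hypothetical simplification: an UpToDate local disk is treated
        # as the key precondition. Real DRBD also weighs fencing policy.
        return self.local_role == "Secondary" and self.local_disk == "UpToDate"


def parse_proc_drbd(text: str) -> List[DrbdStatus]:
    """Return one DrbdStatus per resource line; other lines are skipped."""
    statuses = []
    for line in text.splitlines():
        m = STATUS_RE.match(line)
        if m:
            statuses.append(DrbdStatus(
                int(m["minor"]), m["cs"], m["local_role"],
                m["peer_role"], m["local_disk"], m["peer_disk"]))
    return statuses


if __name__ == "__main__":
    with open("/proc/drbd") as f:
        for s in parse_proc_drbd(f.read()):
            print(s.minor, s.connection_state,
                  "peer gone" if s.peer_is_gone() else "peer connected",
                  "promotable" if s.locally_promotable() else "check logs")
```

On a survivor whose peer was put in standby cleanly, you would expect `cs:WFConnection` together with an UpToDate local disk. If that is what you see and promotion still does not happen, the cause is more likely in Pacemaker's decisions or in DRBD's own refusal as recorded in the logs, not in the connection state itself.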

That, though, should not keep the remaining online node from
promoting. Have you checked your Pacemaker and DRBD logs to determine
whether (a) Pacemaker is not even _attempting_ a promotion, or (b)
Pacemaker does attempt it but it fails at the DRBD level?

> I don't recall seeing this behavior on our
> older clusters. What is supposed to happen when the primary node is put in
> standby or suddenly crashes?

Absent any constraints that restrict this, Pacemaker should promote the
surviving node to the Master role and start dependent resources
there. That is, I believe, exactly what you are rightly expecting.

Cheers,
Florian
