[DRBD-user] Pacemaker + Dual Primary, handlers and fail-back issues

Thu Mar 1 16:10:56 CET 2012

On Wed, Feb 29, 2012 at 04:08:59PM -0300, Daniel Grunblatt wrote:
> Hi,
> 
> I have a 2 node cluster with sles11sp1, with the latest patches.
> Configured Pacemaker, dual primary drbd and xen.
> 
> Here's the configuration:
...

> Now, according to this page: http://www.drbd.org/users-guide-8.3/s-pacemaker-fencing.html
> 
> the last paragraph says:
> Thus, if the DRBD replication link becomes disconnected, the
> crm-fence-peer.sh script contacts the cluster manager, determines
> the Pacemaker Master/Slave resource associated with this DRBD
> resource, and ensures that the Master/Slave resource no longer gets
> promoted on any node other than the currently active one.
> Conversely, when the connection is re-established and DRBD completes
> its synchronization process, then that constraint is removed and the
> cluster manager is free to promote the resource on any node again.
> 
> Unfortunately, that is not happening in my configuration and I don't
> understand why.

As Andreas already pointed out, you most likely have the DRBD init
script enabled. It configures your DRBD during boot, and the resync is
done so quickly that it finishes before the Pacemaker stack has
established membership.

Thus the after-resync script runs before cluster communication is
established, and does not find a cluster to ask to remove that
constraint.
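
To check whether this is what happened, something along these lines may
help. It is only a sketch: the helper name list_fence_constraints is made
up for illustration, and it assumes that crm-fence-peer.sh names its
location constraints with the prefix "drbd-fence-by-handler-" (verify
that against your own "crm configure show" output). The chkconfig and crm
commands appear in comments only, since they must be run on a cluster
node.

```shell
#!/bin/sh
# Read crm configuration text on stdin and print the IDs of location
# constraints that look like they were placed by crm-fence-peer.sh.
# Assumption: those IDs start with "drbd-fence-by-handler-".
list_fence_constraints() {
    awk '$1 == "location" && $2 ~ /^drbd-fence-by-handler-/ { print $2 }'
}

# Intended use on a cluster node (not executed here):
#   chkconfig drbd off        # let Pacemaker, not the init script, start DRBD
#   crm configure show | list_fence_constraints
#   crm configure delete <id> # per leftover ID, only once DRBD is UpToDate/UpToDate
```

With the init script disabled, DRBD should only be configured once
Pacemaker is running, so the after-resync handler has a cluster to talk to.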

> Here's what I'm doing:
> 1) rcnetwork stop on XM01
> 2) XM02 stonith's XM01 (so far, so good)
> 3) the VM migrates to XM02 (1 minute downtime which is more than fine)
> 4) XM01 comes back
> 5) The DRBD resources appear as Master/Slave (on dual primary!)
> 6) I can see some constraints generated by the handler in drbd.conf
> 7) xm02:~ # rcdrbd status
> drbd driver loaded OK; device status:
> version: 8.3.11 (api:88/proto:86-96)
> GIT-hash: 0de839cee13a4160eed6037c4bddd066645e23c5 build by phil at fat-tyre, 2011-06-29 11:37:11
> m:res       cs         ro                 ds                 p  mounted  fstype
> 0:vmsvn     Connected  Secondary/Primary  UpToDate/UpToDate  C
> 1:srvsvn1   Connected  Secondary/Primary  UpToDate/UpToDate  C
> 2:srvsvn2   Connected  Secondary/Primary  UpToDate/UpToDate  C
> 3:vmconfig  Connected  Secondary/Primary  UpToDate/UpToDate  C
> 
> They are all UPTODATE!
> 8) The constraints generated by the handler are still there. Waited
> a lifetime, still there...
> 9) Manually remove the constraints, the VM goes down for a little
> while and the DRBD resources are back as Master/Master.
> 
> Is there anything wrong in my configuration? How can both nodes
> become Master on a fail back?
> 
> Thanks!
> Daniel
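
The state described in step 7 above (connected and UpToDate on both
sides, yet not Primary on both) can be spotted mechanically. The
following is a rough sketch only; the function name not_dual_primary is
hypothetical, and it assumes the column layout shown in the quoted
"rcdrbd status" output (res, cs, ro, ds as the first four fields).

```shell
#!/bin/sh
# Read "rcdrbd status"-style text on stdin and print the names of
# resources that are Connected and UpToDate/UpToDate but whose roles
# are not Primary/Primary.
not_dual_primary() {
    awk '$1 ~ /^[0-9]+:/ && $2 == "Connected" &&
         $4 == "UpToDate/UpToDate" && $3 != "Primary/Primary" {
             sub(/^[0-9]+:/, "", $1)   # strip the minor number prefix
             print $1
         }'
}

# Intended use on a cluster node (not executed here):
#   rcdrbd status | not_dual_primary
```

Any resource it lists is fully synced but still held back from
promotion, which is consistent with a leftover fence constraint.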

-- 
: Lars Ellenberg
: LINBIT | Your Way to High Availability
: DRBD/HA support and consulting http://www.linbit.com