[DRBD-user] Dual-Primary DRBD with OCFS2 on SLES 11 SP1

Darren.Mansell at opengi.co.uk Darren.Mansell at opengi.co.uk
Fri Sep 30 11:22:42 CEST 2011

Note: "permalinks" may not be as permanent as we would like,
direct links of old sources may well be a few messages off.


Thanks Lars, that makes perfect sense.

I now need to find a STONITH agent for VMware test machines that have
no hardware STONITH device.
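
(One candidate on SLES is the external/vcenter plugin from cluster-glue,
which fences a node by resetting the guest through vCenter. A rough
sketch only; the vCenter address, credential store path and guest-name
mapping below are placeholders, not from this thread:

primitive stonith-vcenter stonith:external/vcenter \
        params VI_SERVER="vcenter.example.com" \
               VI_CREDSTORE="/root/.vmware/vicredentials.xml" \
               HOSTLIST="test-odp-01=test-odp-01;test-odp-02=test-odp-02" \
               RESETPOWERON="0" \
        op monitor interval="60s"

If I remember correctly the plugin also needs the VMware vSphere SDK
for Perl installed on both nodes.)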

Regards,
Darren

-----Original Message-----
From: drbd-user-bounces at lists.linbit.com
[mailto:drbd-user-bounces at lists.linbit.com] On Behalf Of Lars Ellenberg
Sent: 30 September 2011 08:56
To: drbd-user at lists.linbit.com
Subject: Re: [DRBD-user] Dual-Primary DRBD with OCFS2 on SLES 11 SP1

On Wed, Sep 28, 2011 at 11:45:43AM +0100, Darren.Mansell at opengi.co.uk
wrote:
> Hello all.
> 
>  
> 
> I'm implementing a 2-node cluster using Corosync/Pacemaker/DRBD/OCFS2 
> for dual-primary shared FS.

You need STONITH enabled, and reliably working, to successfully use
cluster file systems, regardless of whether they sit on top of DRBD or
something else.

The cluster file system blocks until it gets confirmation that the
other node has been fenced.
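
(For the configuration quoted below that means at least dropping
stonith-enabled="false"; as a one-line sketch with the crm shell:

  crm configure property stonith-enabled="true"

The property alone is not enough, though: a working stonith resource
also has to be configured, otherwise fencing can never be confirmed and
the file system stays blocked.)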

> I've followed the instructions on the DRBD applications site and it 
> works really well.
> 
>  
> 
> However, if I 'pull the plug' on a node, the other node continues to 
> operate the clones, but the filesystem is locked and inaccessible (the
> monitor op works for the filesystem, but fails for the OCFS2 
> resource.)
> 
>  
> 
> If I reboot one node, there are no problems and I can continue to
> access the OCFS2 FS.
> 
>  
> 
> After I pull the plug:
> 
>  
> 
> Online: [ test-odp-02 ]
> OFFLINE: [ test-odp-01 ]
> 
> Resource Group: Load-Balancing
>      Virtual-IP-ODP     (ocf::heartbeat:IPaddr2):       Started test-odp-02
>      Virtual-IP-ODPWS   (ocf::heartbeat:IPaddr2):       Started test-odp-02
>      ldirectord (ocf::heartbeat:ldirectord):    Started test-odp-02
> Master/Slave Set: ms_drbd_ocfs2 [p_drbd_ocfs2]
>      Masters: [ test-odp-02 ]
>      Stopped: [ p_drbd_ocfs2:1 ]
> Clone Set: cl-odp [odp]
>      Started: [ test-odp-02 ]
>      Stopped: [ odp:1 ]
> Clone Set: cl-odpws [odpws]
>      Started: [ test-odp-02 ]
>      Stopped: [ odpws:1 ]
> Clone Set: cl_fs_ocfs2 [p_fs_ocfs2]
>      Started: [ test-odp-02 ]
>      Stopped: [ p_fs_ocfs2:1 ]
> Clone Set: cl_ocfs2mgmt [g_ocfs2mgmt]
>      Started: [ test-odp-02 ]
>      Stopped: [ g_ocfs2mgmt:1 ]
> 
> Failed actions:
>     p_o2cb:0_monitor_10000 (node=test-odp-02, call=19, rc=-2,
>         status=Timed Out): unknown exec error
> 
>  
> 
>  
> 
> test-odp-02:~ # mount
> /dev/drbd0 on /opt/odp type ocfs2 (rw,_netdev,noatime,cluster_stack=pcmk)
> 
> test-odp-02:~ # ls /opt/odp
> ...just hangs forever...
> 
>  
> 
> If I then power test-odp-01 back on, everything fails back fine and 
> the ls command suddenly completes.
> 
>  
> 
> It seems to me that OCFS2 is trying to talk to the node that has 
> disappeared and doesn't time out. Does anyone have any ideas? 
> (attached CRM and DRBD configs)
> 
>  
> 
> Many thanks.
> 
>  
> 
> Darren Mansell
> 
> 
> 
>  
> 


> node test-odp-01
> node test-odp-02 \
>         attributes standby="off"
> primitive Virtual-IP-ODP ocf:heartbeat:IPaddr2 \
>         params lvs_support="true" ip="2.21.15.100" cidr_netmask="8"
broadcast="2.255.255.255" \
>         op monitor interval="1m" timeout="10s" \
>         meta migration-threshold="10" failure-timeout="600"
> primitive Virtual-IP-ODPWS ocf:heartbeat:IPaddr2 \
>         params lvs_support="true" ip="2.21.15.103" cidr_netmask="8"
broadcast="2.255.255.255" \
>         op monitor interval="1m" timeout="10s" \
>         meta migration-threshold="10" failure-timeout="600"
> primitive ldirectord ocf:heartbeat:ldirectord \
>         params configfile="/etc/ha.d/ldirectord.cf" \
>         op monitor interval="2m" timeout="20s" \
>         meta migration-threshold="10" failure-timeout="600"
> primitive odp lsb:odp \
>         op monitor interval="10s" enabled="true" timeout="10s" \
>         meta migration-threshold="10" failure-timeout="600"
> primitive odpwebservice lsb:odpws \
>         op monitor interval="10s" enabled="true" timeout="10s" \
>         meta migration-threshold="10" failure-timeout="600"
> primitive p_controld ocf:pacemaker:controld \
>         op monitor interval="10s" enabled="true" timeout="10s" \
>         meta migration-threshold="10" failure-timeout="600"
> primitive p_drbd_ocfs2 ocf:linbit:drbd \
>         params drbd_resource="r0" \
>         op monitor interval="10s" enabled="true" timeout="10s" \
>         meta migration-threshold="10" failure-timeout="600"
> primitive p_fs_ocfs2 ocf:heartbeat:Filesystem \
>         params device="/dev/drbd/by-res/r0" directory="/opt/odp"
fstype="ocfs2" options="rw,noatime" \
>         op monitor interval="10s" enabled="true" timeout="10s" \
>         meta migration-threshold="10" failure-timeout="600"
> primitive p_o2cb ocf:ocfs2:o2cb \
>         op monitor interval="10s" enabled="true" timeout="10s" \
>         meta migration-threshold="10" failure-timeout="600"
> group Load-Balancing Virtual-IP-ODP Virtual-IP-ODPWS ldirectord
> group g_ocfs2mgmt p_controld p_o2cb
> ms ms_drbd_ocfs2 p_drbd_ocfs2 \
>         meta master-max="2" clone-max="2" notify="true"
> clone cl-odp odp
> clone cl-odpws odpws
> clone cl_fs_ocfs2 p_fs_ocfs2 \
>         meta target-role="Started"
> clone cl_ocfs2mgmt g_ocfs2mgmt \
>         meta interleave="true"
> location Prefer-Node1 ldirectord \
>         rule $id="prefer-node1-rule" 100: #uname eq test-odp-01
> order o_ocfs2 inf: ms_drbd_ocfs2:promote cl_ocfs2mgmt:start cl_fs_ocfs2:start
> order tomcatlast1 inf: cl_fs_ocfs2 cl-odp
> order tomcatlast2 inf: cl_fs_ocfs2 cl-odpws
> property $id="cib-bootstrap-options" \
>         dc-version="1.1.5-5bd2b9154d7d9f86d7f56fe0a74072a5a6590c60" \
>         cluster-infrastructure="openais" \
>         expected-quorum-votes="2" \
>         no-quorum-policy="ignore" \
>         start-failure-is-fatal="false" \
>         stonith-action="reboot" \
>         stonith-enabled="false" \
>         last-lrm-refresh="1317207361"
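
(For completeness, since the DRBD config was attached rather than quoted:
in a dual-primary setup the DRBD side of the fencing story usually looks
something like the sketch below, written here for the r0 resource named
above. Exact sections and handler paths depend on the DRBD version; this
is not the poster's actual config.

resource r0 {
  net {
    allow-two-primaries;
  }
  disk {
    fencing resource-and-stonith;
  }
  handlers {
    fence-peer          "/usr/lib/drbd/crm-fence-peer.sh";
    after-resync-target "/usr/lib/drbd/crm-unfence-peer.sh";
  }
  # devices, disks and addresses as in the attached config
}
)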



--
: Lars Ellenberg
: LINBIT | Your Way to High Availability
: DRBD/HA support and consulting http://www.linbit.com

DRBD(r) and LINBIT(r) are registered trademarks of LINBIT, Austria.
__
please don't Cc me, but send to list   --   I'm subscribed
_______________________________________________
drbd-user mailing list
drbd-user at lists.linbit.com
http://lists.linbit.com/mailman/listinfo/drbd-user


