[DRBD-user] Dual-Primary DRBD with OCFS2 on SLES 11 SP1

Darren.Mansell at opengi.co.uk
Wed Sep 28 12:45:43 CEST 2011

Hello all.

I'm implementing a 2-node cluster using Corosync/Pacemaker/DRBD/OCFS2
to provide a dual-primary shared filesystem.

I've followed the instructions on the DRBD applications site and it
works really well.

However, if I 'pull the plug' on a node, the other node continues to
run the clones, but the filesystem is locked and inaccessible (the
monitor op succeeds for the filesystem resource, but times out for the
o2cb resource).

If I reboot one node, there are no problems and I can continue to
access the OCFS2 filesystem.

After I pull the plug:

Online: [ test-odp-02 ]
OFFLINE: [ test-odp-01 ]

Resource Group: Load-Balancing
     Virtual-IP-ODP     (ocf::heartbeat:IPaddr2):       Started test-odp-02
     Virtual-IP-ODPWS   (ocf::heartbeat:IPaddr2):       Started test-odp-02
     ldirectord (ocf::heartbeat:ldirectord):    Started test-odp-02
Master/Slave Set: ms_drbd_ocfs2 [p_drbd_ocfs2]
     Masters: [ test-odp-02 ]
     Stopped: [ p_drbd_ocfs2:1 ]
Clone Set: cl-odp [odp]
     Started: [ test-odp-02 ]
     Stopped: [ odp:1 ]
Clone Set: cl-odpws [odpws]
     Started: [ test-odp-02 ]
     Stopped: [ odpws:1 ]
Clone Set: cl_fs_ocfs2 [p_fs_ocfs2]
     Started: [ test-odp-02 ]
     Stopped: [ p_fs_ocfs2:1 ]
Clone Set: cl_ocfs2mgmt [g_ocfs2mgmt]
     Started: [ test-odp-02 ]
     Stopped: [ g_ocfs2mgmt:1 ]

Failed actions:
    p_o2cb:0_monitor_10000 (node=test-odp-02, call=19, rc=-2, status=Timed Out): unknown exec error

test-odp-02:~ # mount
/dev/drbd0 on /opt/odp type ocfs2 (rw,_netdev,noatime,cluster_stack=pcmk)

test-odp-02:~ # ls /opt/odp
...just hangs forever...

If I then power test-odp-01 back on, everything fails back fine and the
hung ls command completes immediately.
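
The hang described above can be checked without tying up the terminal. The following is only a minimal sketch, not part of the original setup: the function name `probe` and the time limits are hypothetical. It starts the command in the background and polls for its exit instead of waiting on it, on the assumption that a process blocked inside the filesystem may not react to signals.

```shell
#!/bin/sh
# probe LIMIT CMD [ARGS...]
# Hypothetical helper: runs CMD in the background and prints "responsive"
# if it exits within LIMIT seconds, otherwise "blocked".
# The command is never waited on, so a process stuck in I/O cannot
# block the caller. A blocked command is left running.
probe() {
    limit=$1
    shift
    "$@" >/dev/null 2>&1 &
    pid=$!
    elapsed=0
    while [ "$elapsed" -lt "$limit" ]; do
        sleep 1
        elapsed=$((elapsed + 1))
        # kill -0 sends no signal; it only tests whether the PID still exists
        if ! kill -0 "$pid" 2>/dev/null; then
            echo "responsive"
            return 0
        fi
    done
    echo "blocked"
    return 1
}
```

For the mount point in this report, the call would be `probe 5 ls /opt/odp`. The function polls once per second, so even a command that returns immediately is reported after about one second.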

It seems to me that OCFS2 is trying to talk to the node that has
disappeared and never times out. Does anyone have any ideas? (CRM
and DRBD configs attached.)

Many thanks.

Darren Mansell

-------------- next part --------------
A non-text attachment was scrubbed...
Name: drbd.conf
Type: application/octet-stream
Size: 1193 bytes
Desc: drbd.conf
URL: <http://lists.linbit.com/pipermail/drbd-user/attachments/20110928/ecc8c415/attachment.obj>
-------------- next part --------------
An embedded and charset-unspecified text was scrubbed...
Name: crm.txt
URL: <http://lists.linbit.com/pipermail/drbd-user/attachments/20110928/ecc8c415/attachment.txt>