[DRBD-user] Problem with Failover with DRBD, Heartbeat, and Pacemaker

Jared Harvey jharvey at pioneershospital.org
Tue Apr 2 21:31:42 CEST 2013

Note: "permalinks" may not be as permanent as we would like,
direct links of old sources may well be a few messages off.


Hi,

I have managed to get this setup working in a virtual environment, but I am running into problems on physical hardware. My current setup is the latest Debian with the distribution's packages for DRBD, Heartbeat, and Pacemaker.
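
For reference, the resource layout corresponds roughly to the following crm configuration. This is a simplified sketch rather than a verbatim dump: the volume group name and cluster IP are placeholders, the DRBD primitives are assumed to use the ocf:linbit:drbd agent, and timeouts/monitor intervals are omitted.

primitive drbd_meta ocf:linbit:drbd params drbd_resource="meta"
primitive drbd_data ocf:linbit:drbd params drbd_resource="data"
group g_drbd drbd_meta drbd_data
ms ms_g_drbd g_drbd meta master-max="1" clone-max="2" notify="true"
primitive MetaFS ocf:heartbeat:Filesystem \
        params device="/dev/drbd0" directory="/meta" fstype="ext3"
primitive lvmdata ocf:heartbeat:LVM params volgrpname="<VG>"
primitive ClusterIP ocf:heartbeat:IPaddr2 params ip="<cluster IP>"
primitive iscsi lsb:iscsitarget
group g_services MetaFS lvmdata ClusterIP iscsi
colocation c_services_on_master inf: g_services ms_g_drbd:Master
order o_drbd_before_services inf: ms_g_drbd:promote g_services:start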

This is my cluster status while everything is running normally:

root@HPSATA01:~# crm_mon -1
============
Last updated: Tue Apr  2 11:14:32 2013
Stack: Heartbeat
Current DC: hpsata02 (9e01243d-9232-437e-81ad-e00aabffef8f) - partition with quorum
Version: 1.0.9-74392a28b7f31d7ddc86689598bd23114f58978b
2 Nodes configured, unknown expected votes
2 Resources configured.
============

Online: [ hpsata01 hpsata02 ]

Master/Slave Set: ms_g_drbd
     Masters: [ hpsata01 ]
     Slaves: [ hpsata02 ]
Resource Group: g_services
     MetaFS     (ocf::heartbeat:Filesystem):    Started hpsata01
     lvmdata    (ocf::heartbeat:LVM):   Started hpsata01
     ClusterIP  (ocf::heartbeat:IPaddr2):       Started hpsata01
     iscsi      (lsb:iscsitarget):      Started hpsata01

Failed actions:
    iscsi_monitor_0 (node=hpsata01, call=7, rc=1, status=complete): unknown error
    iscsi_monitor_0 (node=hpsata02, call=105, rc=1, status=complete): unknown error
root@HPSATA01:~#
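
The two failed iscsi_monitor_0 entries come from the initial probe operations; for a clean retest they can be cleared with:

root@HPSATA01:~# crm resource cleanup iscsi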

And my DRBD status:

root@HPSATA01:~# service drbd status
drbd driver loaded OK; device status:
version: 8.3.7 (api:88/proto:86-91)
srcversion: EE47D8BF18AC166BE219757
m:res   cs         ro                 ds                 p  mounted  fstype
0:meta  Connected  Primary/Secondary  UpToDate/UpToDate  C  /meta    ext3
1:data  Connected  Primary/Secondary  UpToDate/UpToDate  C
root@HPSATA01:~#

So far, so good. The problem arises when I put the primary node into standby; the same thing happens when I reboot it, power it off, or pull the plug.
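
For the standby case, the usual crm shell commands are used to toggle the node:

root@HPSATA01:~# crm node standby hpsata01
root@HPSATA01:~# crm node online hpsata01

After the standby, the cluster looks like this: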

root@HPSATA01:~# crm_mon -1
============
Last updated: Tue Apr  2 12:54:56 2013
Stack: Heartbeat
Current DC: hpsata02 (9e01243d-9232-437e-81ad-e00aabffef8f) - partition with quorum
Version: 1.0.9-74392a28b7f31d7ddc86689598bd23114f58978b
2 Nodes configured, unknown expected votes
2 Resources configured.
============

Node hpsata01 (d60c3fcb-55e4-44a1-9e7a-2b03efb501e8): standby
Online: [ hpsata02 ]

Master/Slave Set: ms_g_drbd
     Slaves: [ hpsata02 ]
     Stopped: [ g_drbd:0 ]

On node 2, this is the DRBD status:

root@HPSATA02:~# service drbd status
drbd driver loaded OK; device status:
version: 8.3.7 (api:88/proto:86-91)
srcversion: EE47D8BF18AC166BE219757
m:res   cs            ro                 ds                 p  mounted  fstype
0:meta  WFConnection  Secondary/Unknown  Outdated/DUnknown  C
1:data  WFConnection  Secondary/Unknown  Outdated/DUnknown  C
root@HPSATA02:~#
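
The same connection and disk states can also be queried directly with drbdadm, e.g. for the data resource:

drbdadm cstate data   # connection state: WFConnection
drbdadm dstate data   # disk state: Outdated/DUnknown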

Here is the relevant portion of my ha-debug file:

Apr 02 12:53:33 HPSATA02 pengine: [1892]: notice: unpack_config: On loss of CCM Quorum: Ignore
Apr 02 12:53:33 HPSATA02 pengine: [1892]: info: unpack_config: Node scores: 'red' = -INFINITY, 'yellow' = 0, 'green' = 0
Apr 02 12:53:33 HPSATA02 pengine: [1892]: info: unpack_status: Node hpsata01 is in standby-mode
Apr 02 12:53:33 HPSATA02 pengine: [1892]: info: determine_online_status: Node hpsata01 is standby
Apr 02 12:53:33 HPSATA02 pengine: [1892]: WARN: unpack_rsc_op: Processing failed op iscsi_monitor_0 on hpsata01: unknown error (1)
Apr 02 12:53:33 HPSATA02 pengine: [1892]: info: determine_online_status: Node hpsata02 is online
Apr 02 12:53:33 HPSATA02 pengine: [1892]: WARN: unpack_rsc_op: Processing failed op iscsi_monitor_0 on hpsata02: unknown error (1)
Apr 02 12:53:33 HPSATA02 pengine: [1892]: notice: clone_print:  Master/Slave Set: ms_g_drbd
Apr 02 12:53:33 HPSATA02 pengine: [1892]: notice: short_print:      Slaves: [ hpsata02 ]
Apr 02 12:53:33 HPSATA02 pengine: [1892]: notice: short_print:      Stopped: [ g_drbd:0 ]
Apr 02 12:53:33 HPSATA02 pengine: [1892]: notice: group_print:  Resource Group: g_services
Apr 02 12:53:33 HPSATA02 pengine: [1892]: notice: native_print:      MetaFS (ocf::heartbeat:Filesystem):       Stopped
Apr 02 12:53:33 HPSATA02 pengine: [1892]: notice: native_print:      lvmdata      (ocf::heartbeat:LVM):       Stopped
Apr 02 12:53:33 HPSATA02 pengine: [1892]: notice: native_print:      ClusterIP       (ocf::heartbeat:IPaddr2):  Stopped
Apr 02 12:53:33 HPSATA02 pengine: [1892]: notice: native_print:      iscsi (lsb:iscsitarget):   Stopped
Apr 02 12:53:33 HPSATA02 pengine: [1892]: info: native_merge_weights: drbd_meta:0: Rolling back scores from drbd_data:0
Apr 02 12:53:33 HPSATA02 pengine: [1892]: info: native_color: Resource drbd_meta:0 cannot run anywhere
Apr 02 12:53:33 HPSATA02 pengine: [1892]: info: native_color: Resource drbd_data:0 cannot run anywhere
Apr 02 12:53:33 HPSATA02 pengine: [1892]: info: master_color: Promoting g_drbd:1 (Slave hpsata02)
Apr 02 12:53:33 HPSATA02 pengine: [1892]: info: master_color: ms_g_drbd: Promoted 1 instances of a possible 1 to master
Apr 02 12:53:33 HPSATA02 pengine: [1892]: info: master_color: Promoting g_drbd:1 (Slave hpsata02)
Apr 02 12:53:33 HPSATA02 pengine: [1892]: info: master_color: ms_g_drbd: Promoted 1 instances of a possible 1 to master
Apr 02 12:53:33 HPSATA02 pengine: [1892]: ERROR: create_notification_boundaries: Creating boundaries for ms_g_drbd
Apr 02 12:53:33 HPSATA02 pengine: [1892]: ERROR: create_notification_boundaries: Creating boundaries for ms_g_drbd
Apr 02 12:53:33 HPSATA02 pengine: [1892]: ERROR: create_notification_boundaries: Creating boundaries for ms_g_drbd
Apr 02 12:53:33 HPSATA02 pengine: [1892]: ERROR: create_notification_boundaries: Creating boundaries for ms_g_drbd
Apr 02 12:53:33 HPSATA02 pengine: [1892]: notice: RecurringOp:  Start recurring monitor (30s) for ClusterIP on hpsata02
Apr 02 12:53:33 HPSATA02 pengine: [1892]: notice: LogActions: Leave resource drbd_meta:0 (Stopped)
Apr 02 12:53:33 HPSATA02 pengine: [1892]: notice: LogActions: Leave resource drbd_data:0 (Stopped)
Apr 02 12:53:33 HPSATA02 pengine: [1892]: notice: LogActions: Promote drbd_meta:1 (Slave -> Master hpsata02)
Apr 02 12:53:33 HPSATA02 pengine: [1892]: notice: LogActions: Promote drbd_data:1 (Slave -> Master hpsata02)
Apr 02 12:53:33 HPSATA02 pengine: [1892]: notice: LogActions: Start MetaFS (hpsata02)
Apr 02 12:53:33 HPSATA02 pengine: [1892]: notice: LogActions: Start lvmdata (hpsata02)
Apr 02 12:53:33 HPSATA02 pengine: [1892]: notice: LogActions: Start ClusterIP     (hpsata02)
Apr 02 12:53:33 HPSATA02 pengine: [1892]: notice: LogActions: Start iscsi  (hpsata02)
Apr 02 12:53:33 HPSATA02 crmd: [1886]: info: do_state_transition: State transition S_POLICY_ENGINE -> S_TRANSITION_ENGINE [ input=I_PE_SUCCESS cause=C_IPC_MESSAGE origin=handle_response ]
Apr 02 12:53:33 HPSATA02 crmd: [1886]: info: unpack_graph: Unpacked transition 51: 21 actions in 21 synapses
Apr 02 12:53:33 HPSATA02 crmd: [1886]: info: do_te_invoke: Processing graph 51 (ref=pe_calc-dc-1364928813-381) derived from /var/lib/pengine/pe-input-1669.bz2
Apr 02 12:53:33 HPSATA02 crmd: [1886]: info: te_pseudo_action: Pseudo action 43 fired and confirmed
Apr 02 12:53:33 HPSATA02 crmd: [1886]: info: te_rsc_command: Initiating action 84: notify drbd_meta:1_pre_notify_promote_0 on hpsata02 (local)
Apr 02 12:53:33 HPSATA02 crmd: [1886]: info: do_lrm_rsc_op: Performing key=84:51:0:d21de83f-ed1b-428e-845e-937210405b1c op=drbd_meta:1_notify_0 )
Apr 02 12:53:33 HPSATA02 lrmd: [1883]: info: rsc:drbd_meta:1:143: notify
Apr 02 12:53:33 HPSATA02 crmd: [1886]: info: te_rsc_command: Initiating action 86: notify drbd_data:1_pre_notify_promote_0 on hpsata02 (local)
Apr 02 12:53:33 HPSATA02 crmd: [1886]: info: do_lrm_rsc_op: Performing key=86:51:0:d21de83f-ed1b-428e-845e-937210405b1c op=drbd_data:1_notify_0 )
Apr 02 12:53:33 HPSATA02 lrmd: [1883]: info: rsc:drbd_data:1:144: notify
Apr 02 12:53:33 HPSATA02 pengine: [1892]: info: process_pe_message: Transition 51: PEngine Input stored in: /var/lib/pengine/pe-input-1669.bz2
Apr 02 12:53:34 HPSATA02 crmd: [1886]: info: process_lrm_event: LRM operation drbd_meta:1_notify_0 (call=143, rc=0, cib-update=254, confirmed=true) ok
Apr 02 12:53:34 HPSATA02 crmd: [1886]: info: process_lrm_event: LRM operation drbd_data:1_notify_0 (call=144, rc=0, cib-update=255, confirmed=true) ok
Apr 02 12:53:34 HPSATA02 crmd: [1886]: info: match_graph_event: Action drbd_meta:1_pre_notify_promote_0 (84) confirmed on hpsata02 (rc=0)
Apr 02 12:53:34 HPSATA02 crmd: [1886]: info: match_graph_event: Action drbd_data:1_pre_notify_promote_0 (86) confirmed on hpsata02 (rc=0)
Apr 02 12:53:34 HPSATA02 crmd: [1886]: info: te_pseudo_action: Pseudo action 44 fired and confirmed
Apr 02 12:53:34 HPSATA02 crmd: [1886]: info: te_pseudo_action: Pseudo action 41 fired and confirmed
Apr 02 12:53:34 HPSATA02 crmd: [1886]: info: te_pseudo_action: Pseudo action 27 fired and confirmed
Apr 02 12:53:34 HPSATA02 crmd: [1886]: info: te_rsc_command: Initiating action 16: promote drbd_meta:1_promote_0 on hpsata02 (local)
Apr 02 12:53:34 HPSATA02 crmd: [1886]: info: do_lrm_rsc_op: Performing key=16:51:0:d21de83f-ed1b-428e-845e-937210405b1c op=drbd_meta:1_promote_0 )
Apr 02 12:53:34 HPSATA02 lrmd: [1883]: info: rsc:drbd_meta:1:145: promote
Apr 02 12:53:34 HPSATA02 lrmd: [1883]: info: RA output: (drbd_meta:1:promote:stdout)

Apr 02 12:53:34 HPSATA02 crmd: [1886]: info: process_lrm_event: LRM operation drbd_meta:1_promote_0 (call=145, rc=0, cib-update=256, confirmed=true) ok
Apr 02 12:53:34 HPSATA02 crmd: [1886]: info: match_graph_event: Action drbd_meta:1_promote_0 (16) confirmed on hpsata02 (rc=0)
Apr 02 12:53:34 HPSATA02 crmd: [1886]: info: te_rsc_command: Initiating action 20: promote drbd_data:1_promote_0 on hpsata02 (local)
Apr 02 12:53:34 HPSATA02 crmd: [1886]: info: do_lrm_rsc_op: Performing key=20:51:0:d21de83f-ed1b-428e-845e-937210405b1c op=drbd_data:1_promote_0 )
Apr 02 12:53:34 HPSATA02 lrmd: [1883]: info: rsc:drbd_data:1:146: promote
Apr 02 12:53:34 HPSATA02 lrmd: [1883]: info: RA output: (drbd_data:1:promote:stdout)

Apr 02 12:53:34 HPSATA02 crmd: [1886]: info: process_lrm_event: LRM operation drbd_data:1_promote_0 (call=146, rc=0, cib-update=257, confirmed=true) ok
Apr 02 12:53:34 HPSATA02 crmd: [1886]: info: match_graph_event: Action drbd_data:1_promote_0 (20) confirmed on hpsata02 (rc=0)
Apr 02 12:53:34 HPSATA02 crmd: [1886]: info: te_pseudo_action: Pseudo action 28 fired and confirmed
Apr 02 12:53:34 HPSATA02 crmd: [1886]: info: te_pseudo_action: Pseudo action 42 fired and confirmed
Apr 02 12:53:34 HPSATA02 crmd: [1886]: info: te_pseudo_action: Pseudo action 45 fired and confirmed
Apr 02 12:53:34 HPSATA02 crmd: [1886]: info: te_rsc_command: Initiating action 85: notify drbd_meta:1_post_notify_promote_0 on hpsata02 (local)
Apr 02 12:53:34 HPSATA02 crmd: [1886]: info: do_lrm_rsc_op: Performing key=85:51:0:d21de83f-ed1b-428e-845e-937210405b1c op=drbd_meta:1_notify_0 )
Apr 02 12:53:34 HPSATA02 lrmd: [1883]: info: rsc:drbd_meta:1:147: notify
Apr 02 12:53:34 HPSATA02 crmd: [1886]: info: te_rsc_command: Initiating action 87: notify drbd_data:1_post_notify_promote_0 on hpsata02 (local)
Apr 02 12:53:34 HPSATA02 crmd: [1886]: info: do_lrm_rsc_op: Performing key=87:51:0:d21de83f-ed1b-428e-845e-937210405b1c op=drbd_data:1_notify_0 )
Apr 02 12:53:34 HPSATA02 lrmd: [1883]: info: rsc:drbd_data:1:148: notify
Apr 02 12:53:34 HPSATA02 lrmd: [1883]: info: RA output: (drbd_meta:1:notify:stdout)

Apr 02 12:53:34 HPSATA02 lrmd: [1883]: info: RA output: (drbd_data:1:notify:stdout)

drbd[2542]:   2013/04/02_12:53:34 ERROR: meta: Called drbdadm -c /etc/drbd.conf outdate meta
drbd[2543]:   2013/04/02_12:53:34 ERROR: data: Called drbdadm -c /etc/drbd.conf outdate data
drbd[2542]:   2013/04/02_12:53:34 ERROR: meta: Exit code 17
drbd[2543]:   2013/04/02_12:53:34 ERROR: data: Exit code 17
drbd[2542]:   2013/04/02_12:53:34 ERROR: meta: Command output:
drbd[2543]:   2013/04/02_12:53:34 ERROR: data: Command output:
Apr 02 12:53:34 HPSATA02 lrmd: [1883]: info: RA output: (drbd_meta:1:notify:stdout)

Apr 02 12:53:34 HPSATA02 lrmd: [1883]: info: RA output: (drbd_data:1:notify:stdout)

Apr 02 12:53:34 HPSATA02 attrd: [1885]: info: attrd_trigger_update: Sending flush op to all hosts for: master-drbd_meta:1 (-INFINITY)
Apr 02 12:53:34 HPSATA02 attrd: [1885]: info: attrd_perform_update: Sent update 137: master-drbd_meta:1=-INFINITY
Apr 02 12:53:34 HPSATA02 attrd: [1885]: info: attrd_trigger_update: Sending flush op to all hosts for: master-drbd_data:1 (-INFINITY)
Apr 02 12:53:34 HPSATA02 lrmd: [1883]: info: RA output: (drbd_meta:1:notify:stdout)

Apr 02 12:53:34 HPSATA02 lrmd: [1883]: info: RA output: (drbd_data:1:notify:stdout)

Apr 02 12:53:34 HPSATA02 crmd: [1886]: info: process_lrm_event: LRM operation drbd_meta:1_notify_0 (call=147, rc=0, cib-update=258, confirmed=true) ok
Apr 02 12:53:34 HPSATA02 crmd: [1886]: info: process_lrm_event: LRM operation drbd_data:1_notify_0 (call=148, rc=0, cib-update=259, confirmed=true) ok
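
Reading the log, the promote operations themselves return rc=0, but during the post-promote notify the resource agent calls "drbdadm outdate" on both resources (exit code 17), and the master scores for drbd_meta:1 and drbd_data:1 are pushed to -INFINITY right afterwards, which presumably is why g_services never starts on node 2. While node 1 is down, the scores can be inspected, and, in case DRBD's crm-fence-peer.sh handler is configured in drbd.conf (I am not sure that applies here), any leftover fencing constraint can be checked for:

root@HPSATA02:~# crm_attribute -N hpsata02 -n master-drbd_data:1 -l reboot -G
root@HPSATA02:~# crm configure show | grep drbd-fence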

Once I bring node 1 back up, everything works fine again.

If anyone has a clue as to what is going on, I would appreciate the help.

Thanks,

Jared.