Note: "permalinks" may not be as permanent as we would like,
direct links of old sources may well be a few messages off.
I sorted it out. I thought it would work like 'obliterate-peer.sh' or 'rhcs_fence' on cman, where cman just had to be running for the fence-handler script to work. Digging into crm-fence-peer.sh, I saw that it parses cib.xml; I hadn't configured DRBD inside pacemaker, so the fence handler had nothing to find and was failing. (A sketch of the missing configuration follows, for the archives.) Thanks for letting me rubber-ducky debug on the list. :)

digimer
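For the archives, the gist: crm-fence-peer.sh never talks to the fence agents itself. It queries the CIB, looks for the Master/Slave resource managing the DRBD resource it was invoked for, and places a location constraint against the peer to block promotion until the fence is confirmed. With no such resource in the CIB there is no master id, hence the "could not determine the master id" warning and exit code 1. A minimal sketch of the pacemaker side that was missing, assuming the stock ocf:linbit:drbd agent; the resource names 'drbd_r0' and 'drbd_r0_Clone' are made up for illustration, and the crm-unfence-peer.sh line is the customary companion to the fence-peer handler I already had, not something shown elsewhere in this thread:

====
# DRBD side (what I already had, plus the usual unfence handler):
#   disk { fencing resource-and-stonith; }
#   handlers {
#       fence-peer          "/usr/lib/drbd/crm-fence-peer.sh";
#       after-resync-target "/usr/lib/drbd/crm-unfence-peer.sh";
#   }

# Pacemaker side (the missing piece); stage the change in a CIB file,
# define the drbd resource and its master/slave wrapper (master-max=2
# because this is dual-primary), then push it:
pcs cluster cib drbd_cfg
pcs -f drbd_cfg resource create drbd_r0 ocf:linbit:drbd \
    drbd_resource=r0 op monitor interval=30s
pcs -f drbd_cfg resource master drbd_r0_Clone drbd_r0 \
    master-max=2 master-node-max=1 clone-max=2 clone-node-max=1 \
    notify=true
pcs cluster cib-push drbd_cfg
====

Once pacemaker manages DRBD this way (and the init script is disabled so DRBD isn't started twice), the handler can resolve the master id, and a successful fence shows up in the CIB as a drbd-fence-by-handler-* location constraint against the lost peer.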
On 29/09/13 21:38, Digimer wrote:
> I fixed my pacemaker fencing issue (upgraded from pacemaker 1.1.9 to 1.1.10 and changed my fencing from fence_xvm to fence_virsh). So now I can repeatedly crash a node and it will be fenced successfully. However, drbd still complains that the fence is broken;
>
> When I start DRBD on the crashed node (still using 'echo c > /proc/sysrq-trigger'), it drops complaining of a split-brain. Not sure how that's even possible because, as I understand it, 'echo c' is a pretty instant crash...
>
> System logs from the surviving peer:
>
> ====
> Sep 29 21:28:40 pcmk1 corosync[616]: [TOTEM ] A processor failed, forming new configuration.
> Sep 29 21:28:41 pcmk1 corosync[616]: [TOTEM ] A new membership (192.168.122.201:168) was formed. Members left: 2
> Sep 29 21:28:41 pcmk1 crmd[634]: warning: match_down_event: No match for shutdown action on 2
> Sep 29 21:28:41 pcmk1 crmd[634]: notice: peer_update_callback: Stonith/shutdown of pcmk2.alteeve.ca not matched
> Sep 29 21:28:41 pcmk1 crmd[634]: notice: do_state_transition: State transition S_IDLE -> S_POLICY_ENGINE [ input=I_PE_CALC cause=C_FSA_INTERNAL origin=abort_transition_graph ]
> Sep 29 21:28:41 pcmk1 pacemakerd[628]: notice: pcmk_quorum_notification: Membership 168: quorum lost (1)
> Sep 29 21:28:41 pcmk1 pacemakerd[628]: notice: crm_update_peer_state: pcmk_quorum_notification: Node pcmk2.alteeve.ca[2] - state is now lost (was member)
> Sep 29 21:28:41 pcmk1 crmd[634]: notice: pcmk_quorum_notification: Membership 168: quorum lost (1)
> Sep 29 21:28:41 pcmk1 crmd[634]: notice: crm_update_peer_state: pcmk_quorum_notification: Node pcmk2.alteeve.ca[2] - state is now lost (was member)
> Sep 29 21:28:41 pcmk1 crmd[634]: warning: match_down_event: No match for shutdown action on 2
> Sep 29 21:28:41 pcmk1 crmd[634]: notice: peer_update_callback: Stonith/shutdown of pcmk2.alteeve.ca not matched
> Sep 29 21:28:41 pcmk1 pengine[633]: notice: unpack_config: On loss of CCM Quorum: Ignore
> Sep 29 21:28:41 pcmk1 pengine[633]: warning: pe_fence_node: Node pcmk2.alteeve.ca will be fenced because our peer process is no longer available
> Sep 29 21:28:41 pcmk1 pengine[633]: warning: determine_online_status: Node pcmk2.alteeve.ca is unclean
> Sep 29 21:28:41 pcmk1 pengine[633]: warning: stage6: Scheduling Node pcmk2.alteeve.ca for STONITH
> Sep 29 21:28:41 pcmk1 pengine[633]: notice: LogActions: Move fence_pcmk2_virsh#011(Started pcmk2.alteeve.ca -> pcmk1.alteeve.ca)
> Sep 29 21:28:41 pcmk1 corosync[616]: [QUORUM] This node is within the non-primary component and will NOT provide any services.
> Sep 29 21:28:41 pcmk1 corosync[616]: [QUORUM] Members[1]: 1
> Sep 29 21:28:41 pcmk1 corosync[616]: [MAIN ] Completed service synchronization, ready to provide service.
> Sep 29 21:28:42 pcmk1 pengine[633]: notice: unpack_config: On loss of CCM Quorum: Ignore
> Sep 29 21:28:42 pcmk1 pengine[633]: warning: pe_fence_node: Node pcmk2.alteeve.ca will be fenced because the node is no longer part of the cluster
> Sep 29 21:28:42 pcmk1 pengine[633]: warning: determine_online_status: Node pcmk2.alteeve.ca is unclean
> Sep 29 21:28:42 pcmk1 pengine[633]: warning: custom_action: Action fence_pcmk2_virsh_stop_0 on pcmk2.alteeve.ca is unrunnable (offline)
> Sep 29 21:28:42 pcmk1 pengine[633]: warning: stage6: Scheduling Node pcmk2.alteeve.ca for STONITH
> Sep 29 21:28:42 pcmk1 pengine[633]: notice: LogActions: Move fence_pcmk2_virsh#011(Started pcmk2.alteeve.ca -> pcmk1.alteeve.ca)
> Sep 29 21:28:42 pcmk1 crmd[634]: notice: te_fence_node: Executing reboot fencing operation (9) on pcmk2.alteeve.ca (timeout=60000)
> Sep 29 21:28:42 pcmk1 pengine[633]: warning: process_pe_message: Calculated Transition 18: /var/lib/pacemaker/pengine/pe-warn-16.bz2
> Sep 29 21:28:42 pcmk1 stonith-ng[630]: notice: handle_request: Client crmd.634.84ac8196 wants to fence (reboot) 'pcmk2.alteeve.ca' with device '(any)'
> Sep 29 21:28:42 pcmk1 stonith-ng[630]: notice: initiate_remote_stonith_op: Initiating remote operation reboot for pcmk2.alteeve.ca: 0d1addd2-586b-4137-a0e2-ae5b06dddbfe (0)
> Sep 29 21:28:42 pcmk1 stonith-ng[630]: notice: can_fence_host_with_device: fence_pcmk1_virsh can not fence pcmk2.alteeve.ca: static-list
> Sep 29 21:28:42 pcmk1 stonith-ng[630]: notice: can_fence_host_with_device: fence_pcmk2_virsh can fence pcmk2.alteeve.ca: static-list
> Sep 29 21:28:42 pcmk1 stonith-ng[630]: notice: can_fence_host_with_device: fence_pcmk1_virsh can not fence pcmk2.alteeve.ca: static-list
> Sep 29 21:28:42 pcmk1 stonith-ng[630]: notice: can_fence_host_with_device: fence_pcmk2_virsh can fence pcmk2.alteeve.ca: static-list
> Sep 29 21:28:42 pcmk1 fence_virsh: Parse error: Ignoring unknown option 'nodename=pcmk2.alteeve.ca
> Sep 29 21:28:44 pcmk1 stonith-ng[630]: notice: log_operation: Operation 'reboot' [27363] (call 4 from crmd.634) for host 'pcmk2.alteeve.ca' with device 'fence_pcmk2_virsh' returned: 0 (OK)
> Sep 29 21:28:44 pcmk1 stonith-ng[630]: notice: remote_op_done: Operation reboot of pcmk2.alteeve.ca by pcmk1.alteeve.ca for crmd.634@pcmk1.alteeve.ca.0d1addd2: OK
> Sep 29 21:28:44 pcmk1 crmd[634]: notice: tengine_stonith_callback: Stonith operation 4/9:18:0:01494e04-df34-4075-84a1-545d598bfc3e: OK (0)
> Sep 29 21:28:44 pcmk1 crmd[634]: notice: run_graph: Transition 18 (Complete=1, Pending=0, Fired=0, Skipped=4, Incomplete=0, Source=/var/lib/pacemaker/pengine/pe-warn-16.bz2): Stopped
> Sep 29 21:28:44 pcmk1 crmd[634]: notice: tengine_stonith_notify: Peer pcmk2.alteeve.ca was terminated (reboot) by pcmk1.alteeve.ca for pcmk1.alteeve.ca: OK (ref=0d1addd2-586b-4137-a0e2-ae5b06dddbfe) by client crmd.634
> Sep 29 21:28:44 pcmk1 pengine[633]: notice: unpack_config: On loss of CCM Quorum: Ignore
> Sep 29 21:28:44 pcmk1 pengine[633]: notice: LogActions: Start fence_pcmk2_virsh#011(pcmk1.alteeve.ca)
> Sep 29 21:28:44 pcmk1 pengine[633]: notice: process_pe_message: Calculated Transition 19: /var/lib/pacemaker/pengine/pe-input-42.bz2
> Sep 29 21:28:44 pcmk1 crmd[634]: notice: te_rsc_command: Initiating action 6: start fence_pcmk2_virsh_start_0 on pcmk1.alteeve.ca (local)
> Sep 29 21:28:44 pcmk1 stonith-ng[630]: notice: stonith_device_register: Device 'fence_pcmk2_virsh' already existed in device list (2 active devices)
> Sep 29 21:28:46 pcmk1 crmd[634]: notice: process_lrm_event: LRM operation fence_pcmk2_virsh_start_0 (call=61, rc=0, cib-update=143, confirmed=true) ok
> Sep 29 21:28:46 pcmk1 crmd[634]: notice: run_graph: Transition 19 (Complete=1, Pending=0, Fired=0, Skipped=0, Incomplete=0, Source=/var/lib/pacemaker/pengine/pe-input-42.bz2): Complete
> Sep 29 21:28:46 pcmk1 crmd[634]: notice: do_state_transition: State transition S_TRANSITION_ENGINE -> S_IDLE [ input=I_TE_SUCCESS cause=C_FSA_INTERNAL origin=notify_crmd ]
> Sep 29 21:28:48 pcmk1 kernel: [ 4460.680039] d-con r0: PingAck did not arrive in time.
> Sep 29 21:28:48 pcmk1 kernel: [ 4460.680078] d-con r0: peer( Primary -> Unknown ) conn( Connected -> NetworkFailure ) pdsk( UpToDate -> DUnknown ) susp( 0 -> 1 )
> Sep 29 21:28:48 pcmk1 kernel: [ 4460.680275] d-con r0: asender terminated
> Sep 29 21:28:48 pcmk1 kernel: [ 4460.680278] d-con r0: Terminating drbd_a_r0
> Sep 29 21:28:48 pcmk1 kernel: [ 4460.680344] d-con r0: Connection closed
> Sep 29 21:28:48 pcmk1 kernel: [ 4460.680453] d-con r0: conn( NetworkFailure -> Unconnected )
> Sep 29 21:28:48 pcmk1 kernel: [ 4460.680455] d-con r0: receiver terminated
> Sep 29 21:28:48 pcmk1 kernel: [ 4460.680456] d-con r0: Restarting receiver thread
> Sep 29 21:28:48 pcmk1 kernel: [ 4460.680465] d-con r0: receiver (re)started
> Sep 29 21:28:48 pcmk1 kernel: [ 4460.680471] d-con r0: conn( Unconnected -> WFConnection )
> Sep 29 21:28:48 pcmk1 kernel: [ 4460.680539] d-con r0: helper command: /sbin/drbdadm fence-peer r0
> Sep 29 21:28:48 pcmk1 crm-fence-peer.sh[27388]: invoked for r0
> Sep 29 21:28:48 pcmk1 crm-fence-peer.sh[27388]: WARNING drbd-fencing could not determine the master id of drbd resource r0
> Sep 29 21:28:48 pcmk1 kernel: [ 4460.702907] d-con r0: helper command: /sbin/drbdadm fence-peer r0 exit code 1 (0x100)
> Sep 29 21:28:48 pcmk1 kernel: [ 4460.702909] d-con r0: fence-peer helper broken, returned 1
> ====
>
> And from when I start DRBD back up;
>
> Survivor's logs;
>
> ====
> Sep 29 21:33:36 pcmk1 kernel: [ 4747.844224] d-con r0: Handshake successful: Agreed network protocol version 101
> Sep 29 21:33:36 pcmk1 kernel: [ 4747.844268] d-con r0: conn( WFConnection -> WFReportParams )
> Sep 29 21:33:36 pcmk1 kernel: [ 4747.844271] d-con r0: Starting asender thread (from drbd_r_r0 [27347])
> Sep 29 21:33:36 pcmk1 kernel: [ 4747.853184] block drbd0: drbd_sync_handshake:
> Sep 29 21:33:36 pcmk1 kernel: [ 4747.853190] block drbd0: self BFBB0CC0CCCEA511:0000000000000000:2E6EDFAE818A295C:A750A07C42A73C35 bits:0 flags:0
> Sep 29 21:33:36 pcmk1 kernel: [ 4747.853193] block drbd0: peer BFBB0CC0CCCEA510:0000000000000000:2E6EDFAE818A295C:A750A07C42A73C35 bits:0 flags:2
> Sep 29 21:33:36 pcmk1 kernel: [ 4747.853195] block drbd0: uuid_compare()=-1 by rule 40
> Sep 29 21:33:36 pcmk1 kernel: [ 4747.853197] block drbd0: I shall become SyncTarget, but I am primary!
> Sep 29 21:33:36 pcmk1 kernel: [ 4747.853299] d-con r0: conn( WFReportParams -> Disconnecting )
> Sep 29 21:33:36 pcmk1 kernel: [ 4747.853303] d-con r0: error receiving ReportState, e: -5 l: 0!
> Sep 29 21:33:36 pcmk1 kernel: [ 4747.853392] d-con r0: asender terminated
> Sep 29 21:33:36 pcmk1 kernel: [ 4747.853400] d-con r0: Terminating drbd_a_r0
> Sep 29 21:33:36 pcmk1 kernel: [ 4747.853520] d-con r0: Connection closed
> Sep 29 21:33:36 pcmk1 kernel: [ 4747.853729] d-con r0: conn( Disconnecting -> StandAlone )
> Sep 29 21:33:36 pcmk1 kernel: [ 4747.853733] d-con r0: receiver terminated
> Sep 29 21:33:36 pcmk1 kernel: [ 4747.853735] d-con r0: Terminating drbd_r_r0
> Sep 29 21:33:36 pcmk1 kernel: [ 4747.853744] d-con r0: helper command: /sbin/drbdadm fence-peer r0
> Sep 29 21:33:36 pcmk1 crm-fence-peer.sh[27420]: invoked for r0
> Sep 29 21:33:36 pcmk1 crm-fence-peer.sh[27420]: WARNING drbd-fencing could not determine the master id of drbd resource r0
> Sep 29 21:33:36 pcmk1 kernel: [ 4747.871111] d-con r0: helper command: /sbin/drbdadm fence-peer r0 exit code 1 (0x100)
> Sep 29 21:33:36 pcmk1 kernel: [ 4747.871114] d-con r0: fence-peer helper broken, returned 1
> ====
>
> Recovered peer's logs;
>
> ====
> Sep 29 21:33:35 pcmk2 kernel: [ 285.613761] events: mcg drbd: 3
> Sep 29 21:33:35 pcmk2 kernel: [ 285.616368] drbd: initialized. Version: 8.4.3 (api:1/proto:86-101)
> Sep 29 21:33:35 pcmk2 kernel: [ 285.616371] drbd: srcversion: 5CF35A4122BF8D21CC12AE2
> Sep 29 21:33:35 pcmk2 kernel: [ 285.616372] drbd: registered as block device major 147
> Sep 29 21:33:35 pcmk2 drbd[454]: Starting DRBD resources: [
> Sep 29 21:33:35 pcmk2 drbd[454]: create res: r0
> Sep 29 21:33:35 pcmk2 drbd[454]: prepare disk: r0
> Sep 29 21:33:35 pcmk2 kernel: [ 285.632794] d-con r0: Starting worker thread (from drbdsetup [469])
> Sep 29 21:33:35 pcmk2 kernel: [ 285.632958] block drbd0: disk( Diskless -> Attaching )
> Sep 29 21:33:35 pcmk2 kernel: [ 285.633062] d-con r0: Method to ensure write ordering: drain
> Sep 29 21:33:35 pcmk2 kernel: [ 285.633065] block drbd0: max BIO size = 1048576
> Sep 29 21:33:35 pcmk2 kernel: [ 285.633069] block drbd0: drbd_bm_resize called with capacity == 25556136
> Sep 29 21:33:35 pcmk2 kernel: [ 285.633111] block drbd0: resync bitmap: bits=3194517 words=49915 pages=98
> Sep 29 21:33:35 pcmk2 kernel: [ 285.633113] block drbd0: size = 12 GB (12778068 KB)
> Sep 29 21:33:35 pcmk2 drbd[454]: adjust disk: r0
> Sep 29 21:33:35 pcmk2 kernel: [ 285.633485] block drbd0: bitmap READ of 98 pages took 0 jiffies
> Sep 29 21:33:35 pcmk2 kernel: [ 285.633541] block drbd0: recounting of set bits took additional 0 jiffies
> Sep 29 21:33:35 pcmk2 kernel: [ 285.633542] block drbd0: 0 KB (0 bits) marked out-of-sync by on disk bit-map.
> Sep 29 21:33:35 pcmk2 kernel: [ 285.633546] block drbd0: disk( Attaching -> Consistent )
> Sep 29 21:33:35 pcmk2 kernel: [ 285.633548] block drbd0: attached to UUIDs BFBB0CC0CCCEA511:0000000000000000:2E6EDFAE818A295C:A750A07C42A73C35
> Sep 29 21:33:35 pcmk2 drbd[454]: adjust net: r0
> Sep 29 21:33:35 pcmk2 drbd[454]: ]
> Sep 29 21:33:35 pcmk2 kernel: [ 285.636054] d-con r0: conn( StandAlone -> Unconnected )
> Sep 29 21:33:35 pcmk2 kernel: [ 285.636118] d-con r0: Starting receiver thread (from drbd_w_r0 [470])
> Sep 29 21:33:35 pcmk2 kernel: [ 285.636842] d-con r0: receiver (re)started
> Sep 29 21:33:35 pcmk2 kernel: [ 285.636852] d-con r0: conn( Unconnected -> WFConnection )
> Sep 29 21:33:36 pcmk2 kernel: [ 286.137522] d-con r0: Handshake successful: Agreed network protocol version 101
> Sep 29 21:33:36 pcmk2 kernel: [ 286.137605] d-con r0: conn( WFConnection -> WFReportParams )
> Sep 29 21:33:36 pcmk2 kernel: [ 286.137611] d-con r0: Starting asender thread (from drbd_r_r0 [473])
> Sep 29 21:33:36 pcmk2 kernel: [ 286.146707] d-con r0: meta connection shut down by peer.
> Sep 29 21:33:36 pcmk2 kernel: [ 286.146823] d-con r0: conn( WFReportParams -> NetworkFailure )
> Sep 29 21:33:36 pcmk2 kernel: [ 286.146831] d-con r0: asender terminated
> Sep 29 21:33:36 pcmk2 kernel: [ 286.146834] d-con r0: Terminating drbd_a_r0
> Sep 29 21:33:36 pcmk2 kernel: [ 286.147369] d-con r0: Connection closed
> Sep 29 21:33:36 pcmk2 kernel: [ 286.147397] d-con r0: conn( NetworkFailure -> Unconnected )
> Sep 29 21:33:36 pcmk2 kernel: [ 286.147400] d-con r0: receiver terminated
> Sep 29 21:33:36 pcmk2 kernel: [ 286.147401] d-con r0: Restarting receiver thread
> Sep 29 21:33:36 pcmk2 kernel: [ 286.147403] d-con r0: receiver (re)started
> Sep 29 21:33:36 pcmk2 kernel: [ 286.147413] d-con r0: conn( Unconnected -> WFConnection )
> ====
>
> Any idea?
>
> digimer
>
> On 29/09/13 16:06, Digimer wrote:
>> Hi all,
>>
>> I'm back to learning pacemaker and DRBD 8.4. I've got a couple VMs with fence_xvm / fence_virtd working (I can crash a node with 'echo c > /proc/sysrq-trigger' and it reboots).
>>
>> However, when I start DRBD (not added to pacemaker, just running with 'handlers { fence-peer "/usr/lib/drbd/crm-fence-peer.sh"; }' and 'disk { fencing resource-and-stonith; }') and crash the node, DRBD says the fence handler is broken and the resource is split-brained when it reboots.
>>
>> Below are all the details, but the main line, I think, is;
>>
>> ====
>> Sep 29 15:56:50 pcmk1 crm-fence-peer.sh[539]: invoked for r0
>> Sep 29 15:56:50 pcmk1 crm-fence-peer.sh[539]: WARNING drbd-fencing could not determine the master id of drbd resource r0
>> Sep 29 15:56:50 pcmk1 kernel: [ 2351.280209] d-con r0: helper command: /sbin/drbdadm fence-peer r0 exit code 1 (0x100)
>> Sep 29 15:56:50 pcmk1 kernel: [ 2351.280213] d-con r0: fence-peer helper broken, returned 1
>> ====
>>
>> Here are the extended details. Note that, for reasons that may be related, the fence action from pacemaker works the first time but fails the second time. However, if I start both nodes clean, start DRBD and crash the node, the fence works but the same DRBD fence error is shown.
>> digimer
>>
>> [root@pcmk1 ~]# pcs config show
>> ====
>> Cluster Name: an-pcmk-01
>> Corosync Nodes:
>> pcmk1.alteeve.ca pcmk2.alteeve.ca
>> Pacemaker Nodes:
>> pcmk1.alteeve.ca pcmk2.alteeve.ca
>>
>> Resources:
>>
>> Location Constraints:
>> Ordering Constraints:
>> Colocation Constraints:
>>
>> Cluster Properties:
>> cluster-infrastructure: corosync
>> dc-version: 1.1.9-3.fc19-781a388
>> no-quorum-policy: ignore
>> stonith-enabled: true
>> ====
>>
>> [root@pcmk1 ~]# pcs status
>> ====
>> Cluster name: an-pcmk-01
>> Last updated: Sun Sep 29 15:26:15 2013
>> Last change: Sat Sep 28 15:30:12 2013 via cibadmin on pcmk1.alteeve.ca
>> Stack: corosync
>> Current DC: pcmk1.alteeve.ca (1) - partition with quorum
>> Version: 1.1.9-3.fc19-781a388
>> 2 Nodes configured, unknown expected votes
>> 2 Resources configured.
>>
>> Online: [ pcmk1.alteeve.ca pcmk2.alteeve.ca ]
>>
>> Full list of resources:
>>
>> fence_pcmk1_xvm (stonith:fence_xvm): Started pcmk1.alteeve.ca
>> fence_pcmk2_xvm (stonith:fence_xvm): Started pcmk2.alteeve.ca
>> ====
>>
>> DRBD is not running, crashing 'pcmk2';
>>
>> ====
>> [root@pcmk2 ~]# echo c > /proc/sysrq-trigger
>> ====
>>
>> Node 1's system logs;
>> ====
>> Sep 29 15:27:16 pcmk1 corosync[404]: [TOTEM ] A processor failed, forming new configuration.
>> Sep 29 15:27:17 pcmk1 corosync[404]: [TOTEM ] A new membership (192.168.122.201:52) was formed. Members left: 2
>> Sep 29 15:27:17 pcmk1 crmd[422]: warning: match_down_event: No match for shutdown action on 2
>> Sep 29 15:27:17 pcmk1 crmd[422]: notice: peer_update_callback: Stonith/shutdown of pcmk2.alteeve.ca not matched
>> Sep 29 15:27:17 pcmk1 crmd[422]: notice: do_state_transition: State transition S_IDLE -> S_POLICY_ENGINE [ input=I_PE_CALC cause=C_FSA_INTERNAL origin=abort_transition_graph ]
>> Sep 29 15:27:17 pcmk1 pengine[421]: notice: unpack_config: On loss of CCM Quorum: Ignore
>> Sep 29 15:27:17 pcmk1 pengine[421]: warning: pe_fence_node: Node pcmk2.alteeve.ca will be fenced because our peer process is no longer available
>> Sep 29 15:27:17 pcmk1 pengine[421]: warning: determine_online_status: Node pcmk2.alteeve.ca is unclean
>> Sep 29 15:27:17 pcmk1 pengine[421]: warning: stage6: Scheduling Node pcmk2.alteeve.ca for STONITH
>> Sep 29 15:27:17 pcmk1 pengine[421]: notice: LogActions: Move fence_pcmk2_xvm#011(Started pcmk2.alteeve.ca -> pcmk1.alteeve.ca)
>> Sep 29 15:27:17 pcmk1 crmd[422]: notice: pcmk_quorum_notification: Membership 52: quorum lost (1)
>> Sep 29 15:27:17 pcmk1 corosync[404]: [QUORUM] This node is within the non-primary component and will NOT provide any services.
>> Sep 29 15:27:17 pcmk1 corosync[404]: [QUORUM] Members[1]: 1
>> Sep 29 15:27:17 pcmk1 crmd[422]: notice: crm_update_peer_state: pcmk_quorum_notification: Node pcmk2.alteeve.ca[2] - state is now lost (was member)
>> Sep 29 15:27:17 pcmk1 crmd[422]: warning: match_down_event: No match for shutdown action on 2
>> Sep 29 15:27:17 pcmk1 crmd[422]: notice: peer_update_callback: Stonith/shutdown of pcmk2.alteeve.ca not matched
>> Sep 29 15:27:17 pcmk1 corosync[404]: [MAIN ] Completed service synchronization, ready to provide service.
>> Sep 29 15:27:17 pcmk1 pengine[421]: warning: process_pe_message: Calculated Transition 1: /var/lib/pacemaker/pengine/pe-warn-1.bz2
>> Sep 29 15:27:18 pcmk1 pengine[421]: notice: unpack_config: On loss of CCM Quorum: Ignore
>> Sep 29 15:27:18 pcmk1 pengine[421]: warning: pe_fence_node: Node pcmk2.alteeve.ca will be fenced because the node is no longer part of the cluster
>> Sep 29 15:27:18 pcmk1 pengine[421]: warning: determine_online_status: Node pcmk2.alteeve.ca is unclean
>> Sep 29 15:27:18 pcmk1 pengine[421]: warning: custom_action: Action fence_pcmk2_xvm_stop_0 on pcmk2.alteeve.ca is unrunnable (offline)
>> Sep 29 15:27:18 pcmk1 pengine[421]: warning: stage6: Scheduling Node pcmk2.alteeve.ca for STONITH
>> Sep 29 15:27:18 pcmk1 pengine[421]: notice: LogActions: Move fence_pcmk2_xvm#011(Started pcmk2.alteeve.ca -> pcmk1.alteeve.ca)
>> Sep 29 15:27:18 pcmk1 pengine[421]: warning: process_pe_message: Calculated Transition 2: /var/lib/pacemaker/pengine/pe-warn-2.bz2
>> Sep 29 15:27:18 pcmk1 crmd[422]: notice: te_fence_node: Executing reboot fencing operation (9) on pcmk2.alteeve.ca (timeout=60000)
>> Sep 29 15:27:18 pcmk1 stonith-ng[418]: notice: handle_request: Client crmd.422.e5752159 wants to fence (reboot) 'pcmk2.alteeve.ca' with device '(any)'
>> Sep 29 15:27:18 pcmk1 stonith-ng[418]: notice: initiate_remote_stonith_op: Initiating remote operation reboot for pcmk2.alteeve.ca: d9e46c20-0f15-417c-bbb2-20b10eed8b4d (0)
>> Sep 29 15:27:20 pcmk1 stonith-ng[418]: notice: log_operation: Operation 'reboot' [461] (call 2 from crmd.422) for host 'pcmk2.alteeve.ca' with device 'fence_pcmk2_xvm' returned: 0 (OK)
>> Sep 29 15:27:20 pcmk1 stonith-ng[418]: notice: remote_op_done: Operation reboot of pcmk2.alteeve.ca by pcmk1.alteeve.ca for crmd.422@pcmk1.alteeve.ca.d9e46c20: OK
>> Sep 29 15:27:20 pcmk1 crmd[422]: notice: tengine_stonith_callback: Stonith operation 2/9:2:0:b91ef09d-b521-4e52-a3fb-bbcf1a0f6a37: OK (0)
>> Sep 29 15:27:20 pcmk1 crmd[422]: notice: tengine_stonith_notify: Peer pcmk2.alteeve.ca was terminated (reboot) by pcmk1.alteeve.ca for pcmk1.alteeve.ca: OK (ref=d9e46c20-0f15-417c-bbb2-20b10eed8b4d) by client crmd.422
>> Sep 29 15:27:20 pcmk1 crmd[422]: notice: te_rsc_command: Initiating action 7: start fence_pcmk2_xvm_start_0 on pcmk1.alteeve.ca (local)
>> Sep 29 15:27:20 pcmk1 stonith-ng[418]: notice: stonith_device_register: Added 'fence_pcmk2_xvm' to the device list (2 active devices)
>> Sep 29 15:27:21 pcmk1 crmd[422]: notice: process_lrm_event: LRM operation fence_pcmk2_xvm_start_0 (call=16, rc=0, cib-update=43, confirmed=true) ok
>> Sep 29 15:27:21 pcmk1 crmd[422]: notice: run_graph: Transition 2 (Complete=5, Pending=0, Fired=0, Skipped=0, Incomplete=0, Source=/var/lib/pacemaker/pengine/pe-warn-2.bz2): Complete
>> Sep 29 15:27:21 pcmk1 pengine[421]: notice: unpack_config: On loss of CCM Quorum: Ignore
>> Sep 29 15:27:21 pcmk1 pengine[421]: notice: process_pe_message: Calculated Transition 3: /var/lib/pacemaker/pengine/pe-input-16.bz2
>> Sep 29 15:27:21 pcmk1 crmd[422]: notice: run_graph: Transition 3 (Complete=0, Pending=0, Fired=0, Skipped=0, Incomplete=0, Source=/var/lib/pacemaker/pengine/pe-input-16.bz2): Complete
>> Sep 29 15:27:21 pcmk1 crmd[422]: notice: do_state_transition: State transition S_TRANSITION_ENGINE -> S_IDLE [ input=I_TE_SUCCESS cause=C_FSA_INTERNAL origin=notify_crmd ]
>> ====
>>
>> [root@pcmk1 ~]# pcs status
>> ====
>> Cluster name: an-pcmk-01
>> Last updated: Sun Sep 29 15:28:06 2013
>> Last change: Sat Sep 28 15:30:12 2013 via cibadmin on pcmk1.alteeve.ca
>> Stack: corosync
>> Current DC: pcmk1.alteeve.ca (1) - partition WITHOUT quorum
>> Version: 1.1.9-3.fc19-781a388
>> 2 Nodes configured, unknown expected votes
>> 2 Resources configured.
>>
>> Online: [ pcmk1.alteeve.ca ]
>> OFFLINE: [ pcmk2.alteeve.ca ]
>>
>> Full list of resources:
>>
>> fence_pcmk1_xvm (stonith:fence_xvm): Started pcmk1.alteeve.ca
>> fence_pcmk2_xvm (stonith:fence_xvm): Started pcmk1.alteeve.ca
>> ====
>>
>> So it seems that fencing by pacemaker works. I re-add the pcmk2 node to the cluster and start DRBD;
>>
>> [root@pcmk1 ~]# pcs status
>> ====
>> Cluster name: an-pcmk-01
>> Last updated: Sun Sep 29 15:32:29 2013
>> Last change: Sat Sep 28 15:30:12 2013 via cibadmin on pcmk1.alteeve.ca
>> Stack: corosync
>> Current DC: pcmk1.alteeve.ca (1) - partition with quorum
>> Version: 1.1.9-3.fc19-781a388
>> 2 Nodes configured, unknown expected votes
>> 2 Resources configured.
>>
>> Online: [ pcmk1.alteeve.ca pcmk2.alteeve.ca ]
>>
>> Full list of resources:
>>
>> fence_pcmk1_xvm (stonith:fence_xvm): Started pcmk1.alteeve.ca
>> fence_pcmk2_xvm (stonith:fence_xvm): Started pcmk2.alteeve.ca
>> ====
>>
>> Starting DRBD on pcmk1;
>> ====
>> Sep 29 15:33:09 pcmk1 systemd[1]: Starting LSB: Control drbd resources....
>> Sep 29 15:33:09 pcmk1 kernel: [ 930.145896] events: mcg drbd: 3
>> Sep 29 15:33:09 pcmk1 kernel: [ 930.148651] drbd: initialized. Version: 8.4.3 (api:1/proto:86-101)
>> Sep 29 15:33:09 pcmk1 kernel: [ 930.148657] drbd: srcversion: 5CF35A4122BF8D21CC12AE2
>> Sep 29 15:33:09 pcmk1 kernel: [ 930.148658] drbd: registered as block device major 147
>> Sep 29 15:33:09 pcmk1 drbd[492]: Starting DRBD resources: [
>> Sep 29 15:33:09 pcmk1 drbd[492]: create res: r0
>> Sep 29 15:33:09 pcmk1 drbd[492]: prepare disk: r0
>> Sep 29 15:33:09 pcmk1 kernel: [ 930.165573] d-con r0: Starting worker thread (from drbdsetup [506])
>> Sep 29 15:33:09 pcmk1 kernel: [ 930.165686] block drbd0: disk( Diskless -> Attaching )
>> Sep 29 15:33:09 pcmk1 kernel: [ 930.165757] d-con r0: Method to ensure write ordering: drain
>> Sep 29 15:33:09 pcmk1 kernel: [ 930.165759] block drbd0: max BIO size = 1048576
>> Sep 29 15:33:09 pcmk1 kernel: [ 930.165762] block drbd0: drbd_bm_resize called with capacity == 25556136
>> Sep 29 15:33:09 pcmk1 kernel: [ 930.165801] block drbd0: resync bitmap: bits=3194517 words=49915 pages=98
>> Sep 29 15:33:09 pcmk1 kernel: [ 930.165803] block drbd0: size = 12 GB (12778068 KB)
>> Sep 29 15:33:09 pcmk1 kernel: [ 930.166971] block drbd0: bitmap READ of 98 pages took 1 jiffies
>> Sep 29 15:33:09 pcmk1 kernel: [ 930.167177] block drbd0: recounting of set bits took additional 1 jiffies
>> Sep 29 15:33:09 pcmk1 kernel: [ 930.167179] block drbd0: 0 KB (0 bits) marked out-of-sync by on disk bit-map.
>> Sep 29 15:33:09 pcmk1 kernel: [ 930.167182] block drbd0: disk( Attaching -> Outdated )
>> Sep 29 15:33:09 pcmk1 kernel: [ 930.167184] block drbd0: attached to UUIDs A74FA07C42A73C34:0000000000000000:D307B78549F98F3B:D306B78549F98F3B
>> Sep 29 15:33:09 pcmk1 drbd[492]: adjust disk: r0
>> Sep 29 15:33:09 pcmk1 drbd[492]: adjust net: r0
>> Sep 29 15:33:09 pcmk1 kernel: [ 930.169649] d-con r0: conn( StandAlone -> Unconnected )
>> Sep 29 15:33:09 pcmk1 kernel: [ 930.169661] d-con r0: Starting receiver thread (from drbd_w_r0 [507])
>> Sep 29 15:33:09 pcmk1 kernel: [ 930.169682] d-con r0: receiver (re)started
>> Sep 29 15:33:09 pcmk1 kernel: [ 930.169689] d-con r0: conn( Unconnected -> WFConnection )
>> Sep 29 15:33:09 pcmk1 drbd[492]: ]
>> Sep 29 15:33:09 pcmk1 kernel: [ 930.670525] d-con r0: Handshake successful: Agreed network protocol version 101
>> Sep 29 15:33:09 pcmk1 kernel: [ 930.670593] d-con r0: conn( WFConnection -> WFReportParams )
>> Sep 29 15:33:09 pcmk1 kernel: [ 930.670598] d-con r0: Starting asender thread (from drbd_r_r0 [510])
>> Sep 29 15:33:09 pcmk1 kernel: [ 930.680155] block drbd0: drbd_sync_handshake:
>> Sep 29 15:33:09 pcmk1 kernel: [ 930.680164] block drbd0: self A74FA07C42A73C34:0000000000000000:D307B78549F98F3B:D306B78549F98F3B bits:0 flags:0
>> Sep 29 15:33:09 pcmk1 kernel: [ 930.680171] block drbd0: peer BFBB0CC0CCCEA510:A74FA07C42A73C35:D307B78549F98F3B:D306B78549F98F3B bits:0 flags:0
>> Sep 29 15:33:09 pcmk1 kernel: [ 930.680178] block drbd0: uuid_compare()=-1 by rule 50
>> Sep 29 15:33:09 pcmk1 kernel: [ 930.680189] block drbd0: peer( Unknown -> Secondary ) conn( WFReportParams -> WFBitMapT ) pdsk( DUnknown -> UpToDate )
>> Sep 29 15:33:09 pcmk1 kernel: [ 930.680609] block drbd0: receive bitmap stats [Bytes(packets)]: plain 0(0), RLE 23(1), total 23; compression: 100.0%
>> Sep 29 15:33:09 pcmk1 kernel: [ 930.680704] block drbd0: send bitmap stats [Bytes(packets)]: plain 0(0), RLE 23(1), total 23; compression: 100.0%
>> Sep 29 15:33:09 pcmk1 kernel: [ 930.680710] block drbd0: conn( WFBitMapT -> WFSyncUUID )
>> Sep 29 15:33:09 pcmk1 kernel: [ 930.682605] block drbd0: updated sync uuid A750A07C42A73C34:0000000000000000:D307B78549F98F3B:D306B78549F98F3B
>> Sep 29 15:33:09 pcmk1 kernel: [ 930.682727] block drbd0: helper command: /sbin/drbdadm before-resync-target minor-0
>> Sep 29 15:33:09 pcmk1 kernel: [ 930.683957] block drbd0: role( Secondary -> Primary )
>> Sep 29 15:33:09 pcmk1 kernel: [ 930.684148] block drbd0: helper command: /sbin/drbdadm before-resync-target minor-0 exit code 0 (0x0)
>> Sep 29 15:33:09 pcmk1 kernel: [ 930.684761] block drbd0: conn( WFSyncUUID -> SyncTarget ) disk( Outdated -> Inconsistent )
>> Sep 29 15:33:09 pcmk1 kernel: [ 930.684780] block drbd0: Began resync as SyncTarget (will sync 0 KB [0 bits set]).
>> Sep 29 15:33:09 pcmk1 kernel: [ 930.684873] block drbd0: peer( Secondary -> Primary )
>> Sep 29 15:33:09 pcmk1 drbd[492]: WARN: stdin/stdout is not a TTY; using /dev/console.
>> Sep 29 15:33:09 pcmk1 systemd[1]: Started LSB: Control drbd resources..
>> Sep 29 15:33:09 pcmk1 kernel: [ 930.685167] block drbd0: Resync done (total 1 sec; paused 0 sec; 0 K/sec)
>> Sep 29 15:33:09 pcmk1 kernel: [ 930.685171] block drbd0: updated UUIDs BFBB0CC0CCCEA511:0000000000000000:A750A07C42A73C35:A74FA07C42A73C35
>> Sep 29 15:33:09 pcmk1 kernel: [ 930.685175] block drbd0: conn( SyncTarget -> Connected ) disk( Inconsistent -> UpToDate )
>> Sep 29 15:33:09 pcmk1 kernel: [ 930.686606] block drbd0: helper command: /sbin/drbdadm after-resync-target minor-0
>> Sep 29 15:33:09 pcmk1 kernel: [ 930.688369] block drbd0: helper command: /sbin/drbdadm after-resync-target minor-0 exit code 0 (0x0)
>> ====
>>
>> Starting DRBD on pcmk2;
>> ====
>> Sep 29 15:33:09 pcmk2 systemd[1]: Starting LSB: Control drbd resources....
>> Sep 29 15:33:09 pcmk2 kernel: [ 342.307173] events: mcg drbd: 3
>> Sep 29 15:33:09 pcmk2 kernel: [ 342.309495] drbd: initialized. Version: 8.4.3 (api:1/proto:86-101)
>> Sep 29 15:33:09 pcmk2 kernel: [ 342.309497] drbd: srcversion: 5CF35A4122BF8D21CC12AE2
>> Sep 29 15:33:09 pcmk2 kernel: [ 342.309498] drbd: registered as block device major 147
>> Sep 29 15:33:09 pcmk2 drbd[433]: Starting DRBD resources: [
>> Sep 29 15:33:09 pcmk2 drbd[433]: create res: r0
>> Sep 29 15:33:09 pcmk2 drbd[433]: prepare disk: r0
>> Sep 29 15:33:09 pcmk2 kernel: [ 342.325521] d-con r0: Starting worker thread (from drbdsetup [447])
>> Sep 29 15:33:09 pcmk2 kernel: [ 342.325646] block drbd0: disk( Diskless -> Attaching )
>> Sep 29 15:33:09 pcmk2 kernel: [ 342.325722] d-con r0: Method to ensure write ordering: drain
>> Sep 29 15:33:09 pcmk2 kernel: [ 342.325724] block drbd0: max BIO size = 1048576
>> Sep 29 15:33:09 pcmk2 kernel: [ 342.325728] block drbd0: drbd_bm_resize called with capacity == 25556136
>> Sep 29 15:33:09 pcmk2 kernel: [ 342.325770] block drbd0: resync bitmap: bits=3194517 words=49915 pages=98
>> Sep 29 15:33:09 pcmk2 kernel: [ 342.325772] block drbd0: size = 12 GB (12778068 KB)
>> Sep 29 15:33:09 pcmk2 kernel: [ 342.326865] block drbd0: bitmap READ of 98 pages took 1 jiffies
>> Sep 29 15:33:09 pcmk2 kernel: [ 342.326916] block drbd0: recounting of set bits took additional 0 jiffies
>> Sep 29 15:33:09 pcmk2 kernel: [ 342.326917] block drbd0: 0 KB (0 bits) marked out-of-sync by on disk bit-map.
>> Sep 29 15:33:09 pcmk2 kernel: [ 342.326921] block drbd0: disk( Attaching -> UpToDate ) pdsk( DUnknown -> Outdated )
>> Sep 29 15:33:09 pcmk2 kernel: [ 342.326923] block drbd0: attached to UUIDs BFBB0CC0CCCEA511:A74FA07C42A73C35:D307B78549F98F3B:D306B78549F98F3B
>> Sep 29 15:33:09 pcmk2 drbd[433]: adjust disk: r0
>> Sep 29 15:33:09 pcmk2 kernel: [ 342.328489] d-con r0: conn( StandAlone -> Unconnected )
>> Sep 29 15:33:09 pcmk2 kernel: [ 342.328502] d-con r0: Starting receiver thread (from drbd_w_r0 [448])
>> Sep 29 15:33:09 pcmk2 kernel: [ 342.328567] d-con r0: receiver (re)started
>> Sep 29 15:33:09 pcmk2 kernel: [ 342.328574] d-con r0: conn( Unconnected -> WFConnection )
>> Sep 29 15:33:09 pcmk2 drbd[433]: adjust net: r0
>> Sep 29 15:33:09 pcmk2 drbd[433]: ]
>> Sep 29 15:33:10 pcmk2 kernel: [ 343.731229] d-con r0: Handshake successful: Agreed network protocol version 101
>> Sep 29 15:33:10 pcmk2 kernel: [ 343.731268] d-con r0: conn( WFConnection -> WFReportParams )
>> Sep 29 15:33:10 pcmk2 kernel: [ 343.731271] d-con r0: Starting asender thread (from drbd_r_r0 [451])
>> Sep 29 15:33:10 pcmk2 kernel: [ 343.741130] block drbd0: drbd_sync_handshake:
>> Sep 29 15:33:10 pcmk2 kernel: [ 343.741140] block drbd0: self BFBB0CC0CCCEA510:A74FA07C42A73C35:D307B78549F98F3B:D306B78549F98F3B bits:0 flags:0
>> Sep 29 15:33:10 pcmk2 kernel: [ 343.741146] block drbd0: peer A74FA07C42A73C34:0000000000000000:D307B78549F98F3B:D306B78549F98F3B bits:0 flags:0
>> Sep 29 15:33:10 pcmk2 kernel: [ 343.741152] block drbd0: uuid_compare()=1 by rule 70
>> Sep 29 15:33:10 pcmk2 kernel: [ 343.741163] block drbd0: peer( Unknown -> Secondary ) conn( WFReportParams -> WFBitMapS ) pdsk( Outdated -> Consistent )
>> Sep 29 15:33:10 pcmk2 kernel: [ 343.741315] block drbd0: send bitmap stats [Bytes(packets)]: plain 0(0), RLE 23(1), total 23; compression: 100.0%
>> Sep 29 15:33:10 pcmk2 kernel: [ 343.741845] block drbd0: receive bitmap stats [Bytes(packets)]: plain 0(0), RLE 23(1), total 23; compression: 100.0%
>> Sep 29 15:33:10 pcmk2 kernel: [ 343.741852] block drbd0: helper command: /sbin/drbdadm before-resync-source minor-0
>> Sep 29 15:33:10 pcmk2 kernel: [ 343.743229] block drbd0: helper command: /sbin/drbdadm before-resync-source minor-0 exit code 0 (0x0)
>> Sep 29 15:33:10 pcmk2 kernel: [ 343.743245] block drbd0: conn( WFBitMapS -> SyncSource ) pdsk( Consistent -> Inconsistent )
>> Sep 29 15:33:10 pcmk2 kernel: [ 343.743253] block drbd0: Began resync as SyncSource (will sync 0 KB [0 bits set]).
>> Sep 29 15:33:10 pcmk2 kernel: [ 343.743288] block drbd0: updated sync UUID BFBB0CC0CCCEA510:A750A07C42A73C35:A74FA07C42A73C35:D307B78549F98F3B
>> Sep 29 15:33:10 pcmk2 kernel: [ 343.744829] block drbd0: peer( Secondary -> Primary )
>> Sep 29 15:33:10 pcmk2 kernel: [ 343.745923] block drbd0: role( Secondary -> Primary )
>> Sep 29 15:33:10 pcmk2 drbd[433]: WARN: stdin/stdout is not a TTY; using /dev/console.
>> Sep 29 15:33:10 pcmk2 systemd[1]: Started LSB: Control drbd resources..
>> Sep 29 15:33:10 pcmk2 kernel: [ 343.750704] block drbd0: Resync done (total 1 sec; paused 0 sec; 0 K/sec)
>> Sep 29 15:33:10 pcmk2 kernel: [ 343.750710] block drbd0: updated UUIDs BFBB0CC0CCCEA511:0000000000000000:A750A07C42A73C35:A74FA07C42A73C35
>> Sep 29 15:33:10 pcmk2 kernel: [ 343.750716] block drbd0: conn( SyncSource -> Connected ) pdsk( Inconsistent -> UpToDate )
>> ====
>>
>> Showing /proc/drbd;
>>
>> [root@pcmk1 ~]# cat /proc/drbd
>> ====
>> version: 8.4.3 (api:1/proto:86-101)
>> srcversion: 5CF35A4122BF8D21CC12AE2
>> 0: cs:Connected ro:Primary/Primary ds:UpToDate/UpToDate C r-----
>> ns:0 nr:0 dw:0 dr:152 al:0 bm:0 lo:0 pe:0 ua:0 ap:0 ep:1 wo:d oos:0
>> ====
>>
>> Now when I crash pcmk2 again, it gets fenced but DRBD thinks the fence fails.
>>
>> ====
>> Sep 29 15:56:50 pcmk1 corosync[404]: [TOTEM ] A processor failed, forming new configuration.
>> Sep 29 15:56:50 pcmk1 kernel: [ 2351.189045] d-con r0: PingAck did not arrive in time.
>> Sep 29 15:56:50 pcmk1 kernel: [ 2351.189086] d-con r0: peer( Primary -> Unknown ) conn( Connected -> NetworkFailure ) pdsk( UpToDate -> DUnknown ) susp( 0 -> 1 )
>> Sep 29 15:56:50 pcmk1 kernel: [ 2351.189251] d-con r0: asender terminated
>> Sep 29 15:56:50 pcmk1 kernel: [ 2351.189253] d-con r0: Terminating drbd_a_r0
>> Sep 29 15:56:50 pcmk1 kernel: [ 2351.189317] d-con r0: Connection closed
>> Sep 29 15:56:50 pcmk1 kernel: [ 2351.189499] d-con r0: conn( NetworkFailure -> Unconnected )
>> Sep 29 15:56:50 pcmk1 kernel: [ 2351.189501] d-con r0: receiver terminated
>> Sep 29 15:56:50 pcmk1 kernel: [ 2351.189502] d-con r0: Restarting receiver thread
>> Sep 29 15:56:50 pcmk1 kernel: [ 2351.189503] d-con r0: receiver (re)started
>> Sep 29 15:56:50 pcmk1 kernel: [ 2351.189508] d-con r0: conn( Unconnected -> WFConnection )
>> Sep 29 15:56:50 pcmk1 kernel: [ 2351.189523] d-con r0: helper command: /sbin/drbdadm fence-peer r0
>> Sep 29 15:56:50 pcmk1 crm-fence-peer.sh[539]: invoked for r0
>> Sep 29 15:56:50 pcmk1 crm-fence-peer.sh[539]: WARNING drbd-fencing could not determine the master id of drbd resource r0
>> Sep 29 15:56:50 pcmk1 kernel: [ 2351.280209] d-con r0: helper command: /sbin/drbdadm fence-peer r0 exit code 1 (0x100)
>> Sep 29 15:56:50 pcmk1 kernel: [ 2351.280213] d-con r0: fence-peer helper broken, returned 1
>> Sep 29 15:56:51 pcmk1 corosync[404]: [TOTEM ] A new membership (192.168.122.201:60) was formed. Members left: 2
>> Sep 29 15:56:51 pcmk1 crmd[422]: warning: match_down_event: No match for shutdown action on 2
>> Sep 29 15:56:51 pcmk1 crmd[422]: notice: peer_update_callback: Stonith/shutdown of pcmk2.alteeve.ca not matched
>> Sep 29 15:56:51 pcmk1 crmd[422]: notice: do_state_transition: State transition S_IDLE -> S_POLICY_ENGINE [ input=I_PE_CALC cause=C_FSA_INTERNAL origin=abort_transition_graph ]
>> Sep 29 15:56:51 pcmk1 corosync[404]: [QUORUM] This node is within the non-primary component and will NOT provide any services.
>> Sep 29 15:56:51 pcmk1 corosync[404]: [QUORUM] Members[1]: 1
>> Sep 29 15:56:51 pcmk1 pengine[421]: notice: unpack_config: On loss of CCM Quorum: Ignore
>> Sep 29 15:56:51 pcmk1 pengine[421]: warning: pe_fence_node: Node pcmk2.alteeve.ca will be fenced because our peer process is no longer available
>> Sep 29 15:56:51 pcmk1 pengine[421]: warning: determine_online_status: Node pcmk2.alteeve.ca is unclean
>> Sep 29 15:56:51 pcmk1 pengine[421]: warning: stage6: Scheduling Node pcmk2.alteeve.ca for STONITH
>> Sep 29 15:56:51 pcmk1 pengine[421]: notice: LogActions: Move fence_pcmk2_xvm#011(Started pcmk2.alteeve.ca -> pcmk1.alteeve.ca)
>> Sep 29 15:56:51 pcmk1 pengine[421]: warning: process_pe_message: Calculated Transition 6: /var/lib/pacemaker/pengine/pe-warn-3.bz2
>> Sep 29 15:56:51 pcmk1 corosync[404]: [MAIN ] Completed service synchronization, ready to provide service.
>> Sep 29 15:56:51 pcmk1 crmd[422]: notice: pcmk_quorum_notification: Membership 60: quorum lost (1)
>> Sep 29 15:56:51 pcmk1 crmd[422]: notice: crm_update_peer_state: pcmk_quorum_notification: Node pcmk2.alteeve.ca[2] - state is now lost (was member)
>> Sep 29 15:56:51 pcmk1 crmd[422]: warning: match_down_event: No match for shutdown action on 2
>> Sep 29 15:56:51 pcmk1 crmd[422]: notice: peer_update_callback: Stonith/shutdown of pcmk2.alteeve.ca not matched
>> Sep 29 15:56:52 pcmk1 pengine[421]: notice: unpack_config: On loss of CCM Quorum: Ignore
>> Sep 29 15:56:52 pcmk1 pengine[421]: warning: pe_fence_node: Node pcmk2.alteeve.ca will be fenced because the node is no longer part of the cluster
>> Sep 29 15:56:52 pcmk1 pengine[421]: warning: determine_online_status: Node pcmk2.alteeve.ca is unclean
>> Sep 29 15:56:52 pcmk1 pengine[421]: warning: custom_action: Action fence_pcmk2_xvm_stop_0 on pcmk2.alteeve.ca is unrunnable (offline)
>> Sep 29 15:56:52 pcmk1 pengine[421]: warning: stage6: Scheduling Node pcmk2.alteeve.ca for STONITH
>> Sep 29 15:56:52 pcmk1 pengine[421]: notice: LogActions: Move fence_pcmk2_xvm#011(Started pcmk2.alteeve.ca -> pcmk1.alteeve.ca)
>> Sep 29 15:56:52 pcmk1 pengine[421]: warning: process_pe_message: Calculated Transition 7: /var/lib/pacemaker/pengine/pe-warn-4.bz2
>> Sep 29 15:56:52 pcmk1 crmd[422]: notice: te_fence_node: Executing reboot fencing operation (9) on pcmk2.alteeve.ca (timeout=60000)
>> Sep 29 15:56:52 pcmk1 stonith-ng[418]: notice: handle_request: Client crmd.422.e5752159 wants to fence (reboot) 'pcmk2.alteeve.ca' with device '(any)'
>> Sep 29 15:56:52 pcmk1 stonith-ng[418]: notice: initiate_remote_stonith_op: Initiating remote operation reboot for pcmk2.alteeve.ca: f09f32c4-e726-464b-9065-56346f7c047c (0)
>> Sep 29 15:56:52 pcmk1 stonith-ng[418]: error: remote_op_done: Operation reboot of pcmk2.alteeve.ca by pcmk1.alteeve.ca for crmd.422@pcmk1.alteeve.ca.f09f32c4: No such device
>> Sep 29 15:56:52 pcmk1 crmd[422]: notice: tengine_stonith_callback: Stonith operation 3/9:7:0:b91ef09d-b521-4e52-a3fb-bbcf1a0f6a37: No such device (-19)
>> Sep 29 15:56:52 pcmk1 crmd[422]: notice: tengine_stonith_callback: Stonith operation 3 for pcmk2.alteeve.ca failed (No such device): aborting transition.
>> Sep 29 15:56:52 pcmk1 crmd[422]: notice: tengine_stonith_notify: Peer pcmk2.alteeve.ca was not terminated (reboot) by pcmk1.alteeve.ca for pcmk1.alteeve.ca: No such device (ref=f09f32c4-e726-464b-9065-56346f7c047c) by client crmd.422
>> Sep 29 15:56:52 pcmk1 crmd[422]: notice: run_graph: Transition 7 (Complete=1, Pending=0, Fired=0, Skipped=4, Incomplete=0, Source=/var/lib/pacemaker/pengine/pe-warn-4.bz2): Stopped
>> Sep 29 15:56:52 pcmk1 crmd[422]: notice: too_many_st_failures: No devices found in cluster to fence pcmk2.alteeve.ca, giving up
>> Sep 29 15:56:52 pcmk1 crmd[422]: notice: do_state_transition: State transition S_TRANSITION_ENGINE -> S_IDLE [ input=I_TE_SUCCESS cause=C_FSA_INTERNAL origin=notify_crmd ]
>> ====
>>
>

--
Digimer
Papers and Projects: https://alteeve.ca/w/
What if the cure for cancer is trapped in the mind of a person without access to education?