[DRBD-user] Problem with crm fencing

Digimer <lists@alteeve.ca>
Mon Sep 30 07:25:12 CEST 2013

Note: "permalinks" may not be as permanent as we would like,
direct links of old sources may well be a few messages off.


I sorted it out.

I thought it would be like 'obliterate-peer.sh' or 'rhcs_fence' under
cman, where cman just had to be running for the fence-handler script to
work. Digging into crm-fence-peer.sh, I saw that it parses cib.xml for
the DRBD master/slave resource. I hadn't configured DRBD inside
pacemaker, so the fence handler had nothing to find and failed.
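
For the archives: crm-fence-peer.sh works by finding the master/slave
resource for the DRBD minor in the CIB and placing a location
constraint so the lost peer can't promote. So the minimal fix is to
make DRBD a pacemaker resource, roughly like this (a sketch in pcs
0.9 syntax; resource names, the monitor interval and master-max=2 for
dual-primary are illustrative, not copied from my config):

====
pcs cluster cib drbd_cfg
pcs -f drbd_cfg resource create drbd_r0 ocf:linbit:drbd \
    drbd_resource=r0 op monitor interval=30s
pcs -f drbd_cfg resource master ms_drbd_r0 drbd_r0 \
    master-max=2 master-node-max=1 clone-max=2 clone-node-max=1 \
    notify=true
pcs cluster cib-push drbd_cfg
====

With that in place the handler can resolve the master id and set its
drbd-fence-by-handler constraint instead of exiting 1.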

Thanks for letting me rubber-ducky debug on the list. :)

digimer

On 29/09/13 21:38, Digimer wrote:
> I fixed my pacemaker fencing issue (upgraded from pacemaker 1.1.9 to
> 1.1.10 and changed my fencing from fence_xvm to fence_virsh). So now I
> can repeatedly crash a node and it will be fenced successfully. However,
> drbd still complains that the fence is broken;
> 
> When I start DRBD on the crashed node (still using 'echo c >
> /proc/sysrq-trigger'), it drops complaining of a split-brain. Not sure
> how that's even possible because, as I understand it, 'echo c' is a
> pretty instant crash...
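>
> (For reference, the switch to fence_virsh means stonith devices along
> these lines; the host IP, credentials and VM names here are
> illustrative, not my actual values:
>
> ====
> pcs stonith create fence_pcmk2_virsh fence_virsh \
>     ipaddr="192.168.122.1" login="root" passwd="secret" \
>     port="pcmk2" pcmk_host_list="pcmk2.alteeve.ca"
> ====
>
> ...plus a matching fence_pcmk1_virsh device.)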
> 
> System logs from the surviving peer:
> 
> ====
> Sep 29 21:28:40 pcmk1 corosync[616]:  [TOTEM ] A processor failed,
> forming new configuration.
> Sep 29 21:28:41 pcmk1 corosync[616]:  [TOTEM ] A new membership
> (192.168.122.201:168) was formed. Members left: 2
> Sep 29 21:28:41 pcmk1 crmd[634]:  warning: match_down_event: No match
> for shutdown action on 2
> Sep 29 21:28:41 pcmk1 crmd[634]:   notice: peer_update_callback:
> Stonith/shutdown of pcmk2.alteeve.ca not matched
> Sep 29 21:28:41 pcmk1 crmd[634]:   notice: do_state_transition: State
> transition S_IDLE -> S_POLICY_ENGINE [ input=I_PE_CALC
> cause=C_FSA_INTERNAL origin=abort_transition_graph ]
> Sep 29 21:28:41 pcmk1 pacemakerd[628]:   notice:
> pcmk_quorum_notification: Membership 168: quorum lost (1)
> Sep 29 21:28:41 pcmk1 pacemakerd[628]:   notice: crm_update_peer_state:
> pcmk_quorum_notification: Node pcmk2.alteeve.ca[2] - state is now lost
> (was member)
> Sep 29 21:28:41 pcmk1 crmd[634]:   notice: pcmk_quorum_notification:
> Membership 168: quorum lost (1)
> Sep 29 21:28:41 pcmk1 crmd[634]:   notice: crm_update_peer_state:
> pcmk_quorum_notification: Node pcmk2.alteeve.ca[2] - state is now lost
> (was member)
> Sep 29 21:28:41 pcmk1 crmd[634]:  warning: match_down_event: No match
> for shutdown action on 2
> Sep 29 21:28:41 pcmk1 crmd[634]:   notice: peer_update_callback:
> Stonith/shutdown of pcmk2.alteeve.ca not matched
> Sep 29 21:28:41 pcmk1 pengine[633]:   notice: unpack_config: On loss of
> CCM Quorum: Ignore
> Sep 29 21:28:41 pcmk1 pengine[633]:  warning: pe_fence_node: Node
> pcmk2.alteeve.ca will be fenced because our peer process is no longer
> available
> Sep 29 21:28:41 pcmk1 pengine[633]:  warning: determine_online_status:
> Node pcmk2.alteeve.ca is unclean
> Sep 29 21:28:41 pcmk1 pengine[633]:  warning: stage6: Scheduling Node
> pcmk2.alteeve.ca for STONITH
> Sep 29 21:28:41 pcmk1 pengine[633]:   notice: LogActions: Move
> fence_pcmk2_virsh#011(Started pcmk2.alteeve.ca -> pcmk1.alteeve.ca)
> Sep 29 21:28:41 pcmk1 corosync[616]:  [QUORUM] This node is within the
> non-primary component and will NOT provide any services.
> Sep 29 21:28:41 pcmk1 corosync[616]:  [QUORUM] Members[1]: 1
> Sep 29 21:28:41 pcmk1 corosync[616]:  [MAIN  ] Completed service
> synchronization, ready to provide service.
> Sep 29 21:28:42 pcmk1 pengine[633]:   notice: unpack_config: On loss of
> CCM Quorum: Ignore
> Sep 29 21:28:42 pcmk1 pengine[633]:  warning: pe_fence_node: Node
> pcmk2.alteeve.ca will be fenced because the node is no longer part of
> the cluster
> Sep 29 21:28:42 pcmk1 pengine[633]:  warning: determine_online_status:
> Node pcmk2.alteeve.ca is unclean
> Sep 29 21:28:42 pcmk1 pengine[633]:  warning: custom_action: Action
> fence_pcmk2_virsh_stop_0 on pcmk2.alteeve.ca is unrunnable (offline)
> Sep 29 21:28:42 pcmk1 pengine[633]:  warning: stage6: Scheduling Node
> pcmk2.alteeve.ca for STONITH
> Sep 29 21:28:42 pcmk1 pengine[633]:   notice: LogActions: Move
> fence_pcmk2_virsh#011(Started pcmk2.alteeve.ca -> pcmk1.alteeve.ca)
> Sep 29 21:28:42 pcmk1 crmd[634]:   notice: te_fence_node: Executing
> reboot fencing operation (9) on pcmk2.alteeve.ca (timeout=60000)
> Sep 29 21:28:42 pcmk1 pengine[633]:  warning: process_pe_message:
> Calculated Transition 18: /var/lib/pacemaker/pengine/pe-warn-16.bz2
> Sep 29 21:28:42 pcmk1 stonith-ng[630]:   notice: handle_request: Client
> crmd.634.84ac8196 wants to fence (reboot) 'pcmk2.alteeve.ca' with device
> '(any)'
> Sep 29 21:28:42 pcmk1 stonith-ng[630]:   notice:
> initiate_remote_stonith_op: Initiating remote operation reboot for
> pcmk2.alteeve.ca: 0d1addd2-586b-4137-a0e2-ae5b06dddbfe (0)
> Sep 29 21:28:42 pcmk1 stonith-ng[630]:   notice:
> can_fence_host_with_device: fence_pcmk1_virsh can not fence
> pcmk2.alteeve.ca: static-list
> Sep 29 21:28:42 pcmk1 stonith-ng[630]:   notice:
> can_fence_host_with_device: fence_pcmk2_virsh can fence
> pcmk2.alteeve.ca: static-list
> Sep 29 21:28:42 pcmk1 stonith-ng[630]:   notice:
> can_fence_host_with_device: fence_pcmk1_virsh can not fence
> pcmk2.alteeve.ca: static-list
> Sep 29 21:28:42 pcmk1 stonith-ng[630]:   notice:
> can_fence_host_with_device: fence_pcmk2_virsh can fence
> pcmk2.alteeve.ca: static-list
> Sep 29 21:28:42 pcmk1 fence_virsh: Parse error: Ignoring unknown option
> 'nodename=pcmk2.alteeve.ca
> Sep 29 21:28:44 pcmk1 stonith-ng[630]:   notice: log_operation:
> Operation 'reboot' [27363] (call 4 from crmd.634) for host
> 'pcmk2.alteeve.ca' with device 'fence_pcmk2_virsh' returned: 0 (OK)
> Sep 29 21:28:44 pcmk1 stonith-ng[630]:   notice: remote_op_done:
> Operation reboot of pcmk2.alteeve.ca by pcmk1.alteeve.ca for
> crmd.634@pcmk1.alteeve.ca.0d1addd2: OK
> Sep 29 21:28:44 pcmk1 crmd[634]:   notice: tengine_stonith_callback:
> Stonith operation 4/9:18:0:01494e04-df34-4075-84a1-545d598bfc3e: OK (0)
> Sep 29 21:28:44 pcmk1 crmd[634]:   notice: run_graph: Transition 18
> (Complete=1, Pending=0, Fired=0, Skipped=4, Incomplete=0,
> Source=/var/lib/pacemaker/pengine/pe-warn-16.bz2): Stopped
> Sep 29 21:28:44 pcmk1 crmd[634]:   notice: tengine_stonith_notify: Peer
> pcmk2.alteeve.ca was terminated (reboot) by pcmk1.alteeve.ca for
> pcmk1.alteeve.ca: OK (ref=0d1addd2-586b-4137-a0e2-ae5b06dddbfe) by
> client crmd.634
> Sep 29 21:28:44 pcmk1 pengine[633]:   notice: unpack_config: On loss of
> CCM Quorum: Ignore
> Sep 29 21:28:44 pcmk1 pengine[633]:   notice: LogActions: Start
> fence_pcmk2_virsh#011(pcmk1.alteeve.ca)
> Sep 29 21:28:44 pcmk1 pengine[633]:   notice: process_pe_message:
> Calculated Transition 19: /var/lib/pacemaker/pengine/pe-input-42.bz2
> Sep 29 21:28:44 pcmk1 crmd[634]:   notice: te_rsc_command: Initiating
> action 6: start fence_pcmk2_virsh_start_0 on pcmk1.alteeve.ca (local)
> Sep 29 21:28:44 pcmk1 stonith-ng[630]:   notice:
> stonith_device_register: Device 'fence_pcmk2_virsh' already existed in
> device list (2 active devices)
> Sep 29 21:28:46 pcmk1 crmd[634]:   notice: process_lrm_event: LRM
> operation fence_pcmk2_virsh_start_0 (call=61, rc=0, cib-update=143,
> confirmed=true) ok
> Sep 29 21:28:46 pcmk1 crmd[634]:   notice: run_graph: Transition 19
> (Complete=1, Pending=0, Fired=0, Skipped=0, Incomplete=0,
> Source=/var/lib/pacemaker/pengine/pe-input-42.bz2): Complete
> Sep 29 21:28:46 pcmk1 crmd[634]:   notice: do_state_transition: State
> transition S_TRANSITION_ENGINE -> S_IDLE [ input=I_TE_SUCCESS
> cause=C_FSA_INTERNAL origin=notify_crmd ]
> Sep 29 21:28:48 pcmk1 kernel: [ 4460.680039] d-con r0: PingAck did not
> arrive in time.
> Sep 29 21:28:48 pcmk1 kernel: [ 4460.680078] d-con r0: peer( Primary ->
> Unknown ) conn( Connected -> NetworkFailure ) pdsk( UpToDate -> DUnknown
> ) susp( 0 -> 1 )
> Sep 29 21:28:48 pcmk1 kernel: [ 4460.680275] d-con r0: asender terminated
> Sep 29 21:28:48 pcmk1 kernel: [ 4460.680278] d-con r0: Terminating drbd_a_r0
> Sep 29 21:28:48 pcmk1 kernel: [ 4460.680344] d-con r0: Connection closed
> Sep 29 21:28:48 pcmk1 kernel: [ 4460.680453] d-con r0: conn(
> NetworkFailure -> Unconnected )
> Sep 29 21:28:48 pcmk1 kernel: [ 4460.680455] d-con r0: receiver terminated
> Sep 29 21:28:48 pcmk1 kernel: [ 4460.680456] d-con r0: Restarting
> receiver thread
> Sep 29 21:28:48 pcmk1 kernel: [ 4460.680465] d-con r0: receiver (re)started
> Sep 29 21:28:48 pcmk1 kernel: [ 4460.680471] d-con r0: conn( Unconnected
> -> WFConnection )
> Sep 29 21:28:48 pcmk1 kernel: [ 4460.680539] d-con r0: helper command:
> /sbin/drbdadm fence-peer r0
> Sep 29 21:28:48 pcmk1 crm-fence-peer.sh[27388]: invoked for r0
> Sep 29 21:28:48 pcmk1 crm-fence-peer.sh[27388]: WARNING drbd-fencing
> could not determine the master id of drbd resource r0
> Sep 29 21:28:48 pcmk1 kernel: [ 4460.702907] d-con r0: helper command:
> /sbin/drbdadm fence-peer r0 exit code 1 (0x100)
> Sep 29 21:28:48 pcmk1 kernel: [ 4460.702909] d-con r0: fence-peer helper
> broken, returned 1
> ====
> 
> And from when I start DRBD back up;
> 
> Survivor's logs;
> 
> ====
> Sep 29 21:33:36 pcmk1 kernel: [ 4747.844224] d-con r0: Handshake
> successful: Agreed network protocol version 101
> Sep 29 21:33:36 pcmk1 kernel: [ 4747.844268] d-con r0: conn(
> WFConnection -> WFReportParams )
> Sep 29 21:33:36 pcmk1 kernel: [ 4747.844271] d-con r0: Starting asender
> thread (from drbd_r_r0 [27347])
> Sep 29 21:33:36 pcmk1 kernel: [ 4747.853184] block drbd0:
> drbd_sync_handshake:
> Sep 29 21:33:36 pcmk1 kernel: [ 4747.853190] block drbd0: self
> BFBB0CC0CCCEA511:0000000000000000:2E6EDFAE818A295C:A750A07C42A73C35
> bits:0 flags:0
> Sep 29 21:33:36 pcmk1 kernel: [ 4747.853193] block drbd0: peer
> BFBB0CC0CCCEA510:0000000000000000:2E6EDFAE818A295C:A750A07C42A73C35
> bits:0 flags:2
> Sep 29 21:33:36 pcmk1 kernel: [ 4747.853195] block drbd0:
> uuid_compare()=-1 by rule 40
> Sep 29 21:33:36 pcmk1 kernel: [ 4747.853197] block drbd0: I shall become
> SyncTarget, but I am primary!
> Sep 29 21:33:36 pcmk1 kernel: [ 4747.853299] d-con r0: conn(
> WFReportParams -> Disconnecting )
> Sep 29 21:33:36 pcmk1 kernel: [ 4747.853303] d-con r0: error receiving
> ReportState, e: -5 l: 0!
> Sep 29 21:33:36 pcmk1 kernel: [ 4747.853392] d-con r0: asender terminated
> Sep 29 21:33:36 pcmk1 kernel: [ 4747.853400] d-con r0: Terminating drbd_a_r0
> Sep 29 21:33:36 pcmk1 kernel: [ 4747.853520] d-con r0: Connection closed
> Sep 29 21:33:36 pcmk1 kernel: [ 4747.853729] d-con r0: conn(
> Disconnecting -> StandAlone )
> Sep 29 21:33:36 pcmk1 kernel: [ 4747.853733] d-con r0: receiver terminated
> Sep 29 21:33:36 pcmk1 kernel: [ 4747.853735] d-con r0: Terminating drbd_r_r0
> Sep 29 21:33:36 pcmk1 kernel: [ 4747.853744] d-con r0: helper command:
> /sbin/drbdadm fence-peer r0
> Sep 29 21:33:36 pcmk1 crm-fence-peer.sh[27420]: invoked for r0
> Sep 29 21:33:36 pcmk1 crm-fence-peer.sh[27420]: WARNING drbd-fencing
> could not determine the master id of drbd resource r0
> Sep 29 21:33:36 pcmk1 kernel: [ 4747.871111] d-con r0: helper command:
> /sbin/drbdadm fence-peer r0 exit code 1 (0x100)
> Sep 29 21:33:36 pcmk1 kernel: [ 4747.871114] d-con r0: fence-peer helper
> broken, returned 1
> ====
> 
> Recovered peer's logs;
> 
> ====
> Sep 29 21:33:35 pcmk2 kernel: [  285.613761] events: mcg drbd: 3
> Sep 29 21:33:35 pcmk2 kernel: [  285.616368] drbd: initialized. Version:
> 8.4.3 (api:1/proto:86-101)
> Sep 29 21:33:35 pcmk2 kernel: [  285.616371] drbd: srcversion:
> 5CF35A4122BF8D21CC12AE2
> Sep 29 21:33:35 pcmk2 kernel: [  285.616372] drbd: registered as block
> device major 147
> Sep 29 21:33:35 pcmk2 drbd[454]: Starting DRBD resources: [
> Sep 29 21:33:35 pcmk2 drbd[454]: create res: r0
> Sep 29 21:33:35 pcmk2 drbd[454]: prepare disk: r0
> Sep 29 21:33:35 pcmk2 kernel: [  285.632794] d-con r0: Starting worker
> thread (from drbdsetup [469])
> Sep 29 21:33:35 pcmk2 kernel: [  285.632958] block drbd0: disk( Diskless
> -> Attaching )
> Sep 29 21:33:35 pcmk2 kernel: [  285.633062] d-con r0: Method to ensure
> write ordering: drain
> Sep 29 21:33:35 pcmk2 kernel: [  285.633065] block drbd0: max BIO size =
> 1048576
> Sep 29 21:33:35 pcmk2 kernel: [  285.633069] block drbd0: drbd_bm_resize
> called with capacity == 25556136
> Sep 29 21:33:35 pcmk2 kernel: [  285.633111] block drbd0: resync bitmap:
> bits=3194517 words=49915 pages=98
> Sep 29 21:33:35 pcmk2 kernel: [  285.633113] block drbd0: size = 12 GB
> (12778068 KB)
> Sep 29 21:33:35 pcmk2 drbd[454]: adjust disk: r0
> Sep 29 21:33:35 pcmk2 kernel: [  285.633485] block drbd0: bitmap READ of
> 98 pages took 0 jiffies
> Sep 29 21:33:35 pcmk2 kernel: [  285.633541] block drbd0: recounting of
> set bits took additional 0 jiffies
> Sep 29 21:33:35 pcmk2 kernel: [  285.633542] block drbd0: 0 KB (0 bits)
> marked out-of-sync by on disk bit-map.
> Sep 29 21:33:35 pcmk2 kernel: [  285.633546] block drbd0: disk(
> Attaching -> Consistent )
> Sep 29 21:33:35 pcmk2 kernel: [  285.633548] block drbd0: attached to
> UUIDs BFBB0CC0CCCEA511:0000000000000000:2E6EDFAE818A295C:A750A07C42A73C35
> Sep 29 21:33:35 pcmk2 drbd[454]: adjust net: r0
> Sep 29 21:33:35 pcmk2 drbd[454]: ]
> Sep 29 21:33:35 pcmk2 kernel: [  285.636054] d-con r0: conn( StandAlone
> -> Unconnected )
> Sep 29 21:33:35 pcmk2 kernel: [  285.636118] d-con r0: Starting receiver
> thread (from drbd_w_r0 [470])
> Sep 29 21:33:35 pcmk2 kernel: [  285.636842] d-con r0: receiver (re)started
> Sep 29 21:33:35 pcmk2 kernel: [  285.636852] d-con r0: conn( Unconnected
> -> WFConnection )
> Sep 29 21:33:36 pcmk2 kernel: [  286.137522] d-con r0: Handshake
> successful: Agreed network protocol version 101
> Sep 29 21:33:36 pcmk2 kernel: [  286.137605] d-con r0: conn(
> WFConnection -> WFReportParams )
> Sep 29 21:33:36 pcmk2 kernel: [  286.137611] d-con r0: Starting asender
> thread (from drbd_r_r0 [473])
> Sep 29 21:33:36 pcmk2 kernel: [  286.146707] d-con r0: meta connection
> shut down by peer.
> Sep 29 21:33:36 pcmk2 kernel: [  286.146823] d-con r0: conn(
> WFReportParams -> NetworkFailure )
> Sep 29 21:33:36 pcmk2 kernel: [  286.146831] d-con r0: asender terminated
> Sep 29 21:33:36 pcmk2 kernel: [  286.146834] d-con r0: Terminating drbd_a_r0
> Sep 29 21:33:36 pcmk2 kernel: [  286.147369] d-con r0: Connection closed
> Sep 29 21:33:36 pcmk2 kernel: [  286.147397] d-con r0: conn(
> NetworkFailure -> Unconnected )
> Sep 29 21:33:36 pcmk2 kernel: [  286.147400] d-con r0: receiver terminated
> Sep 29 21:33:36 pcmk2 kernel: [  286.147401] d-con r0: Restarting
> receiver thread
> Sep 29 21:33:36 pcmk2 kernel: [  286.147403] d-con r0: receiver (re)started
> Sep 29 21:33:36 pcmk2 kernel: [  286.147413] d-con r0: conn( Unconnected
> -> WFConnection )
> ====
> 
> Any idea?
> 
> digimer
> 
> On 29/09/13 16:06, Digimer wrote:
>> Hi all,
>>
>>   I'm back to learning pacemaker and DRBD 8.4. I've got a couple VMs
>> with fence_xvm / fence_virtd working (I can crash a node with 'echo c >
>> /proc/sysrq-trigger' and it reboots).
>>
>>   However, when I start DRBD (not added to pacemaker, just running with
>> 'handlers { fence-peer "/usr/lib/drbd/crm-fence-peer.sh"; }' and 'disk {
>> fencing resource-and-stonith; }') and crash the node, DRBD says the
>> fence handler is broken and the resource is split-brained when it reboots.
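>>
>>   For context, those two options sit in the DRBD resource file like
>> this (a sketch; device, disk and address lines are omitted, and the
>> handler pairing shown is the stock one, not my exact file):
>>
>> ====
>> resource r0 {
>>     disk {
>>         fencing resource-and-stonith;
>>     }
>>     handlers {
>>         fence-peer "/usr/lib/drbd/crm-fence-peer.sh";
>>         # stock companion that clears the constraint after resync:
>>         after-resync-target "/usr/lib/drbd/crm-unfence-peer.sh";
>>     }
>> }
>> ====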
>>
>>   Below are all the details, but the main line, I think, is;
>>
>> ====
>> Sep 29 15:56:50 pcmk1 crm-fence-peer.sh[539]: invoked for r0
>> Sep 29 15:56:50 pcmk1 crm-fence-peer.sh[539]: WARNING drbd-fencing could
>> not determine the master id of drbd resource r0
>> Sep 29 15:56:50 pcmk1 kernel: [ 2351.280209] d-con r0: helper command:
>> /sbin/drbdadm fence-peer r0 exit code 1 (0x100)
>> Sep 29 15:56:50 pcmk1 kernel: [ 2351.280213] d-con r0: fence-peer helper
>> broken, returned 1
>> ====
>>
>>   Here are the extended details. Note that, for reasons that may be
>> related, the fence action from pacemaker works the first time but fails
>> the second time. However, if I start both nodes clean, start DRBD and
>> crash the node, the fence works but the same DRBD fence error is shown.
>>
>> digimer
>>
>> [root@pcmk1 ~]# pcs config show
>> ====
>> Cluster Name: an-pcmk-01
>> Corosync Nodes:
>>  pcmk1.alteeve.ca pcmk2.alteeve.ca
>> Pacemaker Nodes:
>>  pcmk1.alteeve.ca pcmk2.alteeve.ca
>>
>> Resources:
>>
>> Location Constraints:
>> Ordering Constraints:
>> Colocation Constraints:
>>
>> Cluster Properties:
>>  cluster-infrastructure: corosync
>>  dc-version: 1.1.9-3.fc19-781a388
>>  no-quorum-policy: ignore
>>  stonith-enabled: true
>> ====
>>
>> [root@pcmk1 ~]# pcs status
>> ====
>> Cluster name: an-pcmk-01
>> Last updated: Sun Sep 29 15:26:15 2013
>> Last change: Sat Sep 28 15:30:12 2013 via cibadmin on pcmk1.alteeve.ca
>> Stack: corosync
>> Current DC: pcmk1.alteeve.ca (1) - partition with quorum
>> Version: 1.1.9-3.fc19-781a388
>> 2 Nodes configured, unknown expected votes
>> 2 Resources configured.
>>
>>
>> Online: [ pcmk1.alteeve.ca pcmk2.alteeve.ca ]
>>
>> Full list of resources:
>>
>>  fence_pcmk1_xvm	(stonith:fence_xvm):	Started pcmk1.alteeve.ca
>>  fence_pcmk2_xvm	(stonith:fence_xvm):	Started pcmk2.alteeve.ca
>> ====
>>
>> DRBD is not running, crashing 'pcmk2';
>>
>> ====
>> [root@pcmk2 ~]# echo c > /proc/sysrq-trigger
>> ====
>>
>> Node 1's system logs;
>> ====
>> Sep 29 15:27:16 pcmk1 corosync[404]:  [TOTEM ] A processor failed,
>> forming new configuration.
>> Sep 29 15:27:17 pcmk1 corosync[404]:  [TOTEM ] A new membership
>> (192.168.122.201:52) was formed. Members left: 2
>> Sep 29 15:27:17 pcmk1 crmd[422]:  warning: match_down_event: No match
>> for shutdown action on 2
>> Sep 29 15:27:17 pcmk1 crmd[422]:   notice: peer_update_callback:
>> Stonith/shutdown of pcmk2.alteeve.ca not matched
>> Sep 29 15:27:17 pcmk1 crmd[422]:   notice: do_state_transition: State
>> transition S_IDLE -> S_POLICY_ENGINE [ input=I_PE_CALC
>> cause=C_FSA_INTERNAL origin=abort_transition_graph ]
>> Sep 29 15:27:17 pcmk1 pengine[421]:   notice: unpack_config: On loss of
>> CCM Quorum: Ignore
>> Sep 29 15:27:17 pcmk1 pengine[421]:  warning: pe_fence_node: Node
>> pcmk2.alteeve.ca will be fenced because our peer process is no longer
>> available
>> Sep 29 15:27:17 pcmk1 pengine[421]:  warning: determine_online_status:
>> Node pcmk2.alteeve.ca is unclean
>> Sep 29 15:27:17 pcmk1 pengine[421]:  warning: stage6: Scheduling Node
>> pcmk2.alteeve.ca for STONITH
>> Sep 29 15:27:17 pcmk1 pengine[421]:   notice: LogActions: Move
>> fence_pcmk2_xvm#011(Started pcmk2.alteeve.ca -> pcmk1.alteeve.ca)
>> Sep 29 15:27:17 pcmk1 crmd[422]:   notice: pcmk_quorum_notification:
>> Membership 52: quorum lost (1)
>> Sep 29 15:27:17 pcmk1 corosync[404]:  [QUORUM] This node is within the
>> non-primary component and will NOT provide any services.
>> Sep 29 15:27:17 pcmk1 corosync[404]:  [QUORUM] Members[1]: 1
>> Sep 29 15:27:17 pcmk1 crmd[422]:   notice: crm_update_peer_state:
>> pcmk_quorum_notification: Node pcmk2.alteeve.ca[2] - state is now lost
>> (was member)
>> Sep 29 15:27:17 pcmk1 crmd[422]:  warning: match_down_event: No match
>> for shutdown action on 2
>> Sep 29 15:27:17 pcmk1 crmd[422]:   notice: peer_update_callback:
>> Stonith/shutdown of pcmk2.alteeve.ca not matched
>> Sep 29 15:27:17 pcmk1 corosync[404]:  [MAIN  ] Completed service
>> synchronization, ready to provide service.
>> Sep 29 15:27:17 pcmk1 pengine[421]:  warning: process_pe_message:
>> Calculated Transition 1: /var/lib/pacemaker/pengine/pe-warn-1.bz2
>> Sep 29 15:27:18 pcmk1 pengine[421]:   notice: unpack_config: On loss of
>> CCM Quorum: Ignore
>> Sep 29 15:27:18 pcmk1 pengine[421]:  warning: pe_fence_node: Node
>> pcmk2.alteeve.ca will be fenced because the node is no longer part of
>> the cluster
>> Sep 29 15:27:18 pcmk1 pengine[421]:  warning: determine_online_status:
>> Node pcmk2.alteeve.ca is unclean
>> Sep 29 15:27:18 pcmk1 pengine[421]:  warning: custom_action: Action
>> fence_pcmk2_xvm_stop_0 on pcmk2.alteeve.ca is unrunnable (offline)
>> Sep 29 15:27:18 pcmk1 pengine[421]:  warning: stage6: Scheduling Node
>> pcmk2.alteeve.ca for STONITH
>> Sep 29 15:27:18 pcmk1 pengine[421]:   notice: LogActions: Move
>> fence_pcmk2_xvm#011(Started pcmk2.alteeve.ca -> pcmk1.alteeve.ca)
>> Sep 29 15:27:18 pcmk1 pengine[421]:  warning: process_pe_message:
>> Calculated Transition 2: /var/lib/pacemaker/pengine/pe-warn-2.bz2
>> Sep 29 15:27:18 pcmk1 crmd[422]:   notice: te_fence_node: Executing
>> reboot fencing operation (9) on pcmk2.alteeve.ca (timeout=60000)
>> Sep 29 15:27:18 pcmk1 stonith-ng[418]:   notice: handle_request: Client
>> crmd.422.e5752159 wants to fence (reboot) 'pcmk2.alteeve.ca' with device
>> '(any)'
>> Sep 29 15:27:18 pcmk1 stonith-ng[418]:   notice:
>> initiate_remote_stonith_op: Initiating remote operation reboot for
>> pcmk2.alteeve.ca: d9e46c20-0f15-417c-bbb2-20b10eed8b4d (0)
>> Sep 29 15:27:20 pcmk1 stonith-ng[418]:   notice: log_operation:
>> Operation 'reboot' [461] (call 2 from crmd.422) for host
>> 'pcmk2.alteeve.ca' with device 'fence_pcmk2_xvm' returned: 0 (OK)
>> Sep 29 15:27:20 pcmk1 stonith-ng[418]:   notice: remote_op_done:
>> Operation reboot of pcmk2.alteeve.ca by pcmk1.alteeve.ca for
>> crmd.422@pcmk1.alteeve.ca.d9e46c20: OK
>> Sep 29 15:27:20 pcmk1 crmd[422]:   notice: tengine_stonith_callback:
>> Stonith operation 2/9:2:0:b91ef09d-b521-4e52-a3fb-bbcf1a0f6a37: OK (0)
>> Sep 29 15:27:20 pcmk1 crmd[422]:   notice: tengine_stonith_notify: Peer
>> pcmk2.alteeve.ca was terminated (reboot) by pcmk1.alteeve.ca for
>> pcmk1.alteeve.ca: OK (ref=d9e46c20-0f15-417c-bbb2-20b10eed8b4d) by
>> client crmd.422
>> Sep 29 15:27:20 pcmk1 crmd[422]:   notice: te_rsc_command: Initiating
>> action 7: start fence_pcmk2_xvm_start_0 on pcmk1.alteeve.ca (local)
>> Sep 29 15:27:20 pcmk1 stonith-ng[418]:   notice:
>> stonith_device_register: Added 'fence_pcmk2_xvm' to the device list (2
>> active devices)
>> Sep 29 15:27:21 pcmk1 crmd[422]:   notice: process_lrm_event: LRM
>> operation fence_pcmk2_xvm_start_0 (call=16, rc=0, cib-update=43,
>> confirmed=true) ok
>> Sep 29 15:27:21 pcmk1 crmd[422]:   notice: run_graph: Transition 2
>> (Complete=5, Pending=0, Fired=0, Skipped=0, Incomplete=0,
>> Source=/var/lib/pacemaker/pengine/pe-warn-2.bz2): Complete
>> Sep 29 15:27:21 pcmk1 pengine[421]:   notice: unpack_config: On loss of
>> CCM Quorum: Ignore
>> Sep 29 15:27:21 pcmk1 pengine[421]:   notice: process_pe_message:
>> Calculated Transition 3: /var/lib/pacemaker/pengine/pe-input-16.bz2
>> Sep 29 15:27:21 pcmk1 crmd[422]:   notice: run_graph: Transition 3
>> (Complete=0, Pending=0, Fired=0, Skipped=0, Incomplete=0,
>> Source=/var/lib/pacemaker/pengine/pe-input-16.bz2): Complete
>> Sep 29 15:27:21 pcmk1 crmd[422]:   notice: do_state_transition: State
>> transition S_TRANSITION_ENGINE -> S_IDLE [ input=I_TE_SUCCESS
>> cause=C_FSA_INTERNAL origin=notify_crmd ]
>> ====
>>
>> [root@pcmk1 ~]# pcs status
>> ====
>> Cluster name: an-pcmk-01
>> Last updated: Sun Sep 29 15:28:06 2013
>> Last change: Sat Sep 28 15:30:12 2013 via cibadmin on pcmk1.alteeve.ca
>> Stack: corosync
>> Current DC: pcmk1.alteeve.ca (1) - partition WITHOUT quorum
>> Version: 1.1.9-3.fc19-781a388
>> 2 Nodes configured, unknown expected votes
>> 2 Resources configured.
>>
>>
>> Online: [ pcmk1.alteeve.ca ]
>> OFFLINE: [ pcmk2.alteeve.ca ]
>>
>> Full list of resources:
>>
>>  fence_pcmk1_xvm	(stonith:fence_xvm):	Started pcmk1.alteeve.ca
>>  fence_pcmk2_xvm	(stonith:fence_xvm):	Started pcmk1.alteeve.ca
>> ====
>>
>> So it seems that fencing by pacemaker works. I re-add the pcmk2 node to
>> the cluster and start DRBD;
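>>
>>   Roughly, on pcmk2 (exact commands from memory, so a sketch):
>>
>> ====
>> pcs cluster start
>> service drbd start
>> ====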
>>
>> [root@pcmk1 ~]# pcs status
>> ====
>> Cluster name: an-pcmk-01
>> Last updated: Sun Sep 29 15:32:29 2013
>> Last change: Sat Sep 28 15:30:12 2013 via cibadmin on pcmk1.alteeve.ca
>> Stack: corosync
>> Current DC: pcmk1.alteeve.ca (1) - partition with quorum
>> Version: 1.1.9-3.fc19-781a388
>> 2 Nodes configured, unknown expected votes
>> 2 Resources configured.
>>
>>
>> Online: [ pcmk1.alteeve.ca pcmk2.alteeve.ca ]
>>
>> Full list of resources:
>>
>>  fence_pcmk1_xvm	(stonith:fence_xvm):	Started pcmk1.alteeve.ca
>>  fence_pcmk2_xvm	(stonith:fence_xvm):	Started pcmk2.alteeve.ca
>> ====
>>
>> Starting DRBD on pcmk1;
>> ====
>> Sep 29 15:33:09 pcmk1 systemd[1]: Starting LSB: Control drbd resources....
>> Sep 29 15:33:09 pcmk1 kernel: [  930.145896] events: mcg drbd: 3
>> Sep 29 15:33:09 pcmk1 kernel: [  930.148651] drbd: initialized. Version:
>> 8.4.3 (api:1/proto:86-101)
>> Sep 29 15:33:09 pcmk1 kernel: [  930.148657] drbd: srcversion:
>> 5CF35A4122BF8D21CC12AE2
>> Sep 29 15:33:09 pcmk1 kernel: [  930.148658] drbd: registered as block
>> device major 147
>> Sep 29 15:33:09 pcmk1 drbd[492]: Starting DRBD resources: [
>> Sep 29 15:33:09 pcmk1 drbd[492]: create res: r0
>> Sep 29 15:33:09 pcmk1 drbd[492]: prepare disk: r0
>> Sep 29 15:33:09 pcmk1 kernel: [  930.165573] d-con r0: Starting worker
>> thread (from drbdsetup [506])
>> Sep 29 15:33:09 pcmk1 kernel: [  930.165686] block drbd0: disk( Diskless
>> -> Attaching )
>> Sep 29 15:33:09 pcmk1 kernel: [  930.165757] d-con r0: Method to ensure
>> write ordering: drain
>> Sep 29 15:33:09 pcmk1 kernel: [  930.165759] block drbd0: max BIO size =
>> 1048576
>> Sep 29 15:33:09 pcmk1 kernel: [  930.165762] block drbd0: drbd_bm_resize
>> called with capacity == 25556136
>> Sep 29 15:33:09 pcmk1 kernel: [  930.165801] block drbd0: resync bitmap:
>> bits=3194517 words=49915 pages=98
>> Sep 29 15:33:09 pcmk1 kernel: [  930.165803] block drbd0: size = 12 GB
>> (12778068 KB)
>> Sep 29 15:33:09 pcmk1 kernel: [  930.166971] block drbd0: bitmap READ of
>> 98 pages took 1 jiffies
>> Sep 29 15:33:09 pcmk1 kernel: [  930.167177] block drbd0: recounting of
>> set bits took additional 1 jiffies
>> Sep 29 15:33:09 pcmk1 kernel: [  930.167179] block drbd0: 0 KB (0 bits)
>> marked out-of-sync by on disk bit-map.
>> Sep 29 15:33:09 pcmk1 kernel: [  930.167182] block drbd0: disk(
>> Attaching -> Outdated )
>> Sep 29 15:33:09 pcmk1 kernel: [  930.167184] block drbd0: attached to
>> UUIDs A74FA07C42A73C34:0000000000000000:D307B78549F98F3B:D306B78549F98F3B
>> Sep 29 15:33:09 pcmk1 drbd[492]: adjust disk: r0
>> Sep 29 15:33:09 pcmk1 drbd[492]: adjust net: r0
>> Sep 29 15:33:09 pcmk1 kernel: [  930.169649] d-con r0: conn( StandAlone
>> -> Unconnected )
>> Sep 29 15:33:09 pcmk1 kernel: [  930.169661] d-con r0: Starting receiver
>> thread (from drbd_w_r0 [507])
>> Sep 29 15:33:09 pcmk1 kernel: [  930.169682] d-con r0: receiver (re)started
>> Sep 29 15:33:09 pcmk1 kernel: [  930.169689] d-con r0: conn( Unconnected
>> -> WFConnection )
>> Sep 29 15:33:09 pcmk1 drbd[492]: ]
>> Sep 29 15:33:09 pcmk1 kernel: [  930.670525] d-con r0: Handshake
>> successful: Agreed network protocol version 101
>> Sep 29 15:33:09 pcmk1 kernel: [  930.670593] d-con r0: conn(
>> WFConnection -> WFReportParams )
>> Sep 29 15:33:09 pcmk1 kernel: [  930.670598] d-con r0: Starting asender
>> thread (from drbd_r_r0 [510])
>> Sep 29 15:33:09 pcmk1 kernel: [  930.680155] block drbd0:
>> drbd_sync_handshake:
>> Sep 29 15:33:09 pcmk1 kernel: [  930.680164] block drbd0: self
>> A74FA07C42A73C34:0000000000000000:D307B78549F98F3B:D306B78549F98F3B
>> bits:0 flags:0
>> Sep 29 15:33:09 pcmk1 kernel: [  930.680171] block drbd0: peer
>> BFBB0CC0CCCEA510:A74FA07C42A73C35:D307B78549F98F3B:D306B78549F98F3B
>> bits:0 flags:0
>> Sep 29 15:33:09 pcmk1 kernel: [  930.680178] block drbd0:
>> uuid_compare()=-1 by rule 50
>> Sep 29 15:33:09 pcmk1 kernel: [  930.680189] block drbd0: peer( Unknown
>> -> Secondary ) conn( WFReportParams -> WFBitMapT ) pdsk( DUnknown ->
>> UpToDate )
>> Sep 29 15:33:09 pcmk1 kernel: [  930.680609] block drbd0: receive bitmap
>> stats [Bytes(packets)]: plain 0(0), RLE 23(1), total 23; compression: 100.0%
>> Sep 29 15:33:09 pcmk1 kernel: [  930.680704] block drbd0: send bitmap
>> stats [Bytes(packets)]: plain 0(0), RLE 23(1), total 23; compression: 100.0%
>> Sep 29 15:33:09 pcmk1 kernel: [  930.680710] block drbd0: conn(
>> WFBitMapT -> WFSyncUUID )
>> Sep 29 15:33:09 pcmk1 kernel: [  930.682605] block drbd0: updated sync
>> uuid A750A07C42A73C34:0000000000000000:D307B78549F98F3B:D306B78549F98F3B
>> Sep 29 15:33:09 pcmk1 kernel: [  930.682727] block drbd0: helper
>> command: /sbin/drbdadm before-resync-target minor-0
>> Sep 29 15:33:09 pcmk1 kernel: [  930.683957] block drbd0: role(
>> Secondary -> Primary )
>> Sep 29 15:33:09 pcmk1 kernel: [  930.684148] block drbd0: helper
>> command: /sbin/drbdadm before-resync-target minor-0 exit code 0 (0x0)
>> Sep 29 15:33:09 pcmk1 kernel: [  930.684761] block drbd0: conn(
>> WFSyncUUID -> SyncTarget ) disk( Outdated -> Inconsistent )
>> Sep 29 15:33:09 pcmk1 kernel: [  930.684780] block drbd0: Began resync
>> as SyncTarget (will sync 0 KB [0 bits set]).
>> Sep 29 15:33:09 pcmk1 kernel: [  930.684873] block drbd0: peer(
>> Secondary -> Primary )
>> Sep 29 15:33:09 pcmk1 drbd[492]: WARN: stdin/stdout is not a TTY; using
>> /dev/console.
>> Sep 29 15:33:09 pcmk1 systemd[1]: Started LSB: Control drbd resources..
>> Sep 29 15:33:09 pcmk1 kernel: [  930.685167] block drbd0: Resync done
>> (total 1 sec; paused 0 sec; 0 K/sec)
>> Sep 29 15:33:09 pcmk1 kernel: [  930.685171] block drbd0: updated UUIDs
>> BFBB0CC0CCCEA511:0000000000000000:A750A07C42A73C35:A74FA07C42A73C35
>> Sep 29 15:33:09 pcmk1 kernel: [  930.685175] block drbd0: conn(
>> SyncTarget -> Connected ) disk( Inconsistent -> UpToDate )
>> Sep 29 15:33:09 pcmk1 kernel: [  930.686606] block drbd0: helper
>> command: /sbin/drbdadm after-resync-target minor-0
>> Sep 29 15:33:09 pcmk1 kernel: [  930.688369] block drbd0: helper
>> command: /sbin/drbdadm after-resync-target minor-0 exit code 0 (0x0)
>> ====
>>
>> Starting DRBD on pcmk2;
>> ====
>> Sep 29 15:33:09 pcmk2 systemd[1]: Starting LSB: Control drbd resources....
>> Sep 29 15:33:09 pcmk2 kernel: [  342.307173] events: mcg drbd: 3
>> Sep 29 15:33:09 pcmk2 kernel: [  342.309495] drbd: initialized. Version:
>> 8.4.3 (api:1/proto:86-101)
>> Sep 29 15:33:09 pcmk2 kernel: [  342.309497] drbd: srcversion:
>> 5CF35A4122BF8D21CC12AE2
>> Sep 29 15:33:09 pcmk2 kernel: [  342.309498] drbd: registered as block
>> device major 147
>> Sep 29 15:33:09 pcmk2 drbd[433]: Starting DRBD resources: [
>> Sep 29 15:33:09 pcmk2 drbd[433]: create res: r0
>> Sep 29 15:33:09 pcmk2 drbd[433]: prepare disk: r0
>> Sep 29 15:33:09 pcmk2 kernel: [  342.325521] d-con r0: Starting worker
>> thread (from drbdsetup [447])
>> Sep 29 15:33:09 pcmk2 kernel: [  342.325646] block drbd0: disk( Diskless
>> -> Attaching )
>> Sep 29 15:33:09 pcmk2 kernel: [  342.325722] d-con r0: Method to ensure
>> write ordering: drain
>> Sep 29 15:33:09 pcmk2 kernel: [  342.325724] block drbd0: max BIO size =
>> 1048576
>> Sep 29 15:33:09 pcmk2 kernel: [  342.325728] block drbd0: drbd_bm_resize
>> called with capacity == 25556136
>> Sep 29 15:33:09 pcmk2 kernel: [  342.325770] block drbd0: resync bitmap:
>> bits=3194517 words=49915 pages=98
>> Sep 29 15:33:09 pcmk2 kernel: [  342.325772] block drbd0: size = 12 GB
>> (12778068 KB)
>> Sep 29 15:33:09 pcmk2 kernel: [  342.326865] block drbd0: bitmap READ of
>> 98 pages took 1 jiffies
>> Sep 29 15:33:09 pcmk2 kernel: [  342.326916] block drbd0: recounting of
>> set bits took additional 0 jiffies
>> Sep 29 15:33:09 pcmk2 kernel: [  342.326917] block drbd0: 0 KB (0 bits)
>> marked out-of-sync by on disk bit-map.
>> Sep 29 15:33:09 pcmk2 kernel: [  342.326921] block drbd0: disk(
>> Attaching -> UpToDate ) pdsk( DUnknown -> Outdated )
>> Sep 29 15:33:09 pcmk2 kernel: [  342.326923] block drbd0: attached to
>> UUIDs BFBB0CC0CCCEA511:A74FA07C42A73C35:D307B78549F98F3B:D306B78549F98F3B
>> Sep 29 15:33:09 pcmk2 drbd[433]: adjust disk: r0
>> Sep 29 15:33:09 pcmk2 kernel: [  342.328489] d-con r0: conn( StandAlone
>> -> Unconnected )
>> Sep 29 15:33:09 pcmk2 kernel: [  342.328502] d-con r0: Starting receiver
>> thread (from drbd_w_r0 [448])
>> Sep 29 15:33:09 pcmk2 kernel: [  342.328567] d-con r0: receiver (re)started
>> Sep 29 15:33:09 pcmk2 kernel: [  342.328574] d-con r0: conn( Unconnected
>> -> WFConnection )
>> Sep 29 15:33:09 pcmk2 drbd[433]: adjust net: r0
>> Sep 29 15:33:09 pcmk2 drbd[433]: ]
>> Sep 29 15:33:10 pcmk2 kernel: [  343.731229] d-con r0: Handshake
>> successful: Agreed network protocol version 101
>> Sep 29 15:33:10 pcmk2 kernel: [  343.731268] d-con r0: conn(
>> WFConnection -> WFReportParams )
>> Sep 29 15:33:10 pcmk2 kernel: [  343.731271] d-con r0: Starting asender
>> thread (from drbd_r_r0 [451])
>> Sep 29 15:33:10 pcmk2 kernel: [  343.741130] block drbd0:
>> drbd_sync_handshake:
>> Sep 29 15:33:10 pcmk2 kernel: [  343.741140] block drbd0: self
>> BFBB0CC0CCCEA510:A74FA07C42A73C35:D307B78549F98F3B:D306B78549F98F3B
>> bits:0 flags:0
>> Sep 29 15:33:10 pcmk2 kernel: [  343.741146] block drbd0: peer
>> A74FA07C42A73C34:0000000000000000:D307B78549F98F3B:D306B78549F98F3B
>> bits:0 flags:0
>> Sep 29 15:33:10 pcmk2 kernel: [  343.741152] block drbd0:
>> uuid_compare()=1 by rule 70
>> Sep 29 15:33:10 pcmk2 kernel: [  343.741163] block drbd0: peer( Unknown
>> -> Secondary ) conn( WFReportParams -> WFBitMapS ) pdsk( Outdated ->
>> Consistent )
>> Sep 29 15:33:10 pcmk2 kernel: [  343.741315] block drbd0: send bitmap
>> stats [Bytes(packets)]: plain 0(0), RLE 23(1), total 23; compression: 100.0%
>> Sep 29 15:33:10 pcmk2 kernel: [  343.741845] block drbd0: receive bitmap
>> stats [Bytes(packets)]: plain 0(0), RLE 23(1), total 23; compression: 100.0%
>> Sep 29 15:33:10 pcmk2 kernel: [  343.741852] block drbd0: helper
>> command: /sbin/drbdadm before-resync-source minor-0
>> Sep 29 15:33:10 pcmk2 kernel: [  343.743229] block drbd0: helper
>> command: /sbin/drbdadm before-resync-source minor-0 exit code 0 (0x0)
>> Sep 29 15:33:10 pcmk2 kernel: [  343.743245] block drbd0: conn(
>> WFBitMapS -> SyncSource ) pdsk( Consistent -> Inconsistent )
>> Sep 29 15:33:10 pcmk2 kernel: [  343.743253] block drbd0: Began resync
>> as SyncSource (will sync 0 KB [0 bits set]).
>> Sep 29 15:33:10 pcmk2 kernel: [  343.743288] block drbd0: updated sync
>> UUID BFBB0CC0CCCEA510:A750A07C42A73C35:A74FA07C42A73C35:D307B78549F98F3B
>> Sep 29 15:33:10 pcmk2 kernel: [  343.744829] block drbd0: peer(
>> Secondary -> Primary )
>> Sep 29 15:33:10 pcmk2 kernel: [  343.745923] block drbd0: role(
>> Secondary -> Primary )
>> Sep 29 15:33:10 pcmk2 drbd[433]: WARN: stdin/stdout is not a TTY; using
>> /dev/console.
>> Sep 29 15:33:10 pcmk2 systemd[1]: Started LSB: Control drbd resources..
>> Sep 29 15:33:10 pcmk2 kernel: [  343.750704] block drbd0: Resync done
>> (total 1 sec; paused 0 sec; 0 K/sec)
>> Sep 29 15:33:10 pcmk2 kernel: [  343.750710] block drbd0: updated UUIDs
>> BFBB0CC0CCCEA511:0000000000000000:A750A07C42A73C35:A74FA07C42A73C35
>> Sep 29 15:33:10 pcmk2 kernel: [  343.750716] block drbd0: conn(
>> SyncSource -> Connected ) pdsk( Inconsistent -> UpToDate )
>> ====
>>
>> Showing /proc/drbd;
>>
>> [root@pcmk1 ~]# cat /proc/drbd
>> ====
>> version: 8.4.3 (api:1/proto:86-101)
>> srcversion: 5CF35A4122BF8D21CC12AE2
>>  0: cs:Connected ro:Primary/Primary ds:UpToDate/UpToDate C r-----
>>     ns:0 nr:0 dw:0 dr:152 al:0 bm:0 lo:0 pe:0 ua:0 ap:0 ep:1 wo:d oos:0
>> ====
>>
>> Now when I crash pcmk2 again, it gets fenced but DRBD thinks the fence
>> fails.
>>
>> ====
>> Sep 29 15:56:50 pcmk1 corosync[404]:  [TOTEM ] A processor failed,
>> forming new configuration.
>> Sep 29 15:56:50 pcmk1 kernel: [ 2351.189045] d-con r0: PingAck did not
>> arrive in time.
>> Sep 29 15:56:50 pcmk1 kernel: [ 2351.189086] d-con r0: peer( Primary ->
>> Unknown ) conn( Connected -> NetworkFailure ) pdsk( UpToDate -> DUnknown
>> ) susp( 0 -> 1 )
>> Sep 29 15:56:50 pcmk1 kernel: [ 2351.189251] d-con r0: asender terminated
>> Sep 29 15:56:50 pcmk1 kernel: [ 2351.189253] d-con r0: Terminating drbd_a_r0
>> Sep 29 15:56:50 pcmk1 kernel: [ 2351.189317] d-con r0: Connection closed
>> Sep 29 15:56:50 pcmk1 kernel: [ 2351.189499] d-con r0: conn(
>> NetworkFailure -> Unconnected )
>> Sep 29 15:56:50 pcmk1 kernel: [ 2351.189501] d-con r0: receiver terminated
>> Sep 29 15:56:50 pcmk1 kernel: [ 2351.189502] d-con r0: Restarting
>> receiver thread
>> Sep 29 15:56:50 pcmk1 kernel: [ 2351.189503] d-con r0: receiver (re)started
>> Sep 29 15:56:50 pcmk1 kernel: [ 2351.189508] d-con r0: conn( Unconnected
>> -> WFConnection )
>> Sep 29 15:56:50 pcmk1 kernel: [ 2351.189523] d-con r0: helper command:
>> /sbin/drbdadm fence-peer r0
>> Sep 29 15:56:50 pcmk1 crm-fence-peer.sh[539]: invoked for r0
>> Sep 29 15:56:50 pcmk1 crm-fence-peer.sh[539]: WARNING drbd-fencing could
>> not determine the master id of drbd resource r0
>> Sep 29 15:56:50 pcmk1 kernel: [ 2351.280209] d-con r0: helper command:
>> /sbin/drbdadm fence-peer r0 exit code 1 (0x100)
>> Sep 29 15:56:50 pcmk1 kernel: [ 2351.280213] d-con r0: fence-peer helper
>> broken, returned 1
>> Sep 29 15:56:51 pcmk1 corosync[404]:  [TOTEM ] A new membership
>> (192.168.122.201:60) was formed. Members left: 2
>> Sep 29 15:56:51 pcmk1 crmd[422]:  warning: match_down_event: No match
>> for shutdown action on 2
>> Sep 29 15:56:51 pcmk1 crmd[422]:   notice: peer_update_callback:
>> Stonith/shutdown of pcmk2.alteeve.ca not matched
>> Sep 29 15:56:51 pcmk1 crmd[422]:   notice: do_state_transition: State
>> transition S_IDLE -> S_POLICY_ENGINE [ input=I_PE_CALC
>> cause=C_FSA_INTERNAL origin=abort_transition_graph ]
>> Sep 29 15:56:51 pcmk1 corosync[404]:  [QUORUM] This node is within the
>> non-primary component and will NOT provide any services.
>> Sep 29 15:56:51 pcmk1 corosync[404]:  [QUORUM] Members[1]: 1
>> Sep 29 15:56:51 pcmk1 pengine[421]:   notice: unpack_config: On loss of
>> CCM Quorum: Ignore
>> Sep 29 15:56:51 pcmk1 pengine[421]:  warning: pe_fence_node: Node
>> pcmk2.alteeve.ca will be fenced because our peer process is no longer
>> available
>> Sep 29 15:56:51 pcmk1 pengine[421]:  warning: determine_online_status:
>> Node pcmk2.alteeve.ca is unclean
>> Sep 29 15:56:51 pcmk1 pengine[421]:  warning: stage6: Scheduling Node
>> pcmk2.alteeve.ca for STONITH
>> Sep 29 15:56:51 pcmk1 pengine[421]:   notice: LogActions: Move
>> fence_pcmk2_xvm#011(Started pcmk2.alteeve.ca -> pcmk1.alteeve.ca)
>> Sep 29 15:56:51 pcmk1 pengine[421]:  warning: process_pe_message:
>> Calculated Transition 6: /var/lib/pacemaker/pengine/pe-warn-3.bz2
>> Sep 29 15:56:51 pcmk1 corosync[404]:  [MAIN  ] Completed service
>> synchronization, ready to provide service.
>> Sep 29 15:56:51 pcmk1 crmd[422]:   notice: pcmk_quorum_notification:
>> Membership 60: quorum lost (1)
>> Sep 29 15:56:51 pcmk1 crmd[422]:   notice: crm_update_peer_state:
>> pcmk_quorum_notification: Node pcmk2.alteeve.ca[2] - state is now lost
>> (was member)
>> Sep 29 15:56:51 pcmk1 crmd[422]:  warning: match_down_event: No match
>> for shutdown action on 2
>> Sep 29 15:56:51 pcmk1 crmd[422]:   notice: peer_update_callback:
>> Stonith/shutdown of pcmk2.alteeve.ca not matched
>> Sep 29 15:56:52 pcmk1 pengine[421]:   notice: unpack_config: On loss of
>> CCM Quorum: Ignore
>> Sep 29 15:56:52 pcmk1 pengine[421]:  warning: pe_fence_node: Node
>> pcmk2.alteeve.ca will be fenced because the node is no longer part of
>> the cluster
>> Sep 29 15:56:52 pcmk1 pengine[421]:  warning: determine_online_status:
>> Node pcmk2.alteeve.ca is unclean
>> Sep 29 15:56:52 pcmk1 pengine[421]:  warning: custom_action: Action
>> fence_pcmk2_xvm_stop_0 on pcmk2.alteeve.ca is unrunnable (offline)
>> Sep 29 15:56:52 pcmk1 pengine[421]:  warning: stage6: Scheduling Node
>> pcmk2.alteeve.ca for STONITH
>> Sep 29 15:56:52 pcmk1 pengine[421]:   notice: LogActions: Move
>> fence_pcmk2_xvm#011(Started pcmk2.alteeve.ca -> pcmk1.alteeve.ca)
>> Sep 29 15:56:52 pcmk1 pengine[421]:  warning: process_pe_message:
>> Calculated Transition 7: /var/lib/pacemaker/pengine/pe-warn-4.bz2
>> Sep 29 15:56:52 pcmk1 crmd[422]:   notice: te_fence_node: Executing
>> reboot fencing operation (9) on pcmk2.alteeve.ca (timeout=60000)
>> Sep 29 15:56:52 pcmk1 stonith-ng[418]:   notice: handle_request: Client
>> crmd.422.e5752159 wants to fence (reboot) 'pcmk2.alteeve.ca' with device
>> '(any)'
>> Sep 29 15:56:52 pcmk1 stonith-ng[418]:   notice:
>> initiate_remote_stonith_op: Initiating remote operation reboot for
>> pcmk2.alteeve.ca: f09f32c4-e726-464b-9065-56346f7c047c (0)
>> Sep 29 15:56:52 pcmk1 stonith-ng[418]:    error: remote_op_done:
>> Operation reboot of pcmk2.alteeve.ca by pcmk1.alteeve.ca for
>> crmd.422@pcmk1.alteeve.ca.f09f32c4: No such device
>> Sep 29 15:56:52 pcmk1 crmd[422]:   notice: tengine_stonith_callback:
>> Stonith operation 3/9:7:0:b91ef09d-b521-4e52-a3fb-bbcf1a0f6a37: No such
>> device (-19)
>> Sep 29 15:56:52 pcmk1 crmd[422]:   notice: tengine_stonith_callback:
>> Stonith operation 3 for pcmk2.alteeve.ca failed (No such device):
>> aborting transition.
>> Sep 29 15:56:52 pcmk1 crmd[422]:   notice: tengine_stonith_notify: Peer
>> pcmk2.alteeve.ca was not terminated (reboot) by pcmk1.alteeve.ca for
>> pcmk1.alteeve.ca: No such device
>> (ref=f09f32c4-e726-464b-9065-56346f7c047c) by client crmd.422
>> Sep 29 15:56:52 pcmk1 crmd[422]:   notice: run_graph: Transition 7
>> (Complete=1, Pending=0, Fired=0, Skipped=4, Incomplete=0,
>> Source=/var/lib/pacemaker/pengine/pe-warn-4.bz2): Stopped
>> Sep 29 15:56:52 pcmk1 crmd[422]:   notice: too_many_st_failures: No
>> devices found in cluster to fence pcmk2.alteeve.ca, giving up
>> Sep 29 15:56:52 pcmk1 crmd[422]:   notice: do_state_transition: State
>> transition S_TRANSITION_ENGINE -> S_IDLE [ input=I_TE_SUCCESS
>> cause=C_FSA_INTERNAL origin=notify_crmd ]
>> ====
>>
> 
> 


-- 
Digimer
Papers and Projects: https://alteeve.ca/w/
What if the cure for cancer is trapped in the mind of a person without
access to education?


