Hi All,

I have a setup where two nodes, Server1 and Server2, are synchronized by DRBD and managed by Corosync, with DRBD running in Primary/Primary mode. When I shut down Server1, Server2 waits for some time and then restarts by itself. The other way around, when Server2 is shut down while Server1 is online, nothing happens to Server1. Please find the log and the versions used attached. I understand that fencing is causing the issue, but is there any way to disable it?

--
Regards,
Kamal Kishore B V
-------------- next part --------------
root@server2:~# cat /etc/issue && dpkg -l | egrep "corosync|pacemaker|drbd|xen|ocfs2"
Ubuntu 12.04 LTS \n \l
ii  corosync                  1.4.2-2ubuntu0.1        Standards-based cluster framework (daemon and modules)
hi  drbd8-utils               2:8.3.11-0ubuntu1       RAID 1 over tcp/ip for Linux utilities
ii  libxen-4.1                4.1.5-0ubuntu0.12.04.3  Public libs for Xen
ii  libxenstore3.0            4.1.5-0ubuntu0.12.04.3  Xenstore communications library for Xen
ii  ocfs2-tools               1.6.3-4ubuntu1          tools for managing OCFS2 cluster filesystems
ii  pacemaker                 1.1.6-2ubuntu3.2        HA cluster resource manager
ii  xen-hypervisor-4.1-amd64  4.1.5-0ubuntu0.12.04.3  Xen Hypervisor on AMD64
ii  xen-utils-4.1             4.1.5-0ubuntu0.12.04.3  XEN administrative tools
ii  xen-utils-common          4.1.2-1ubuntu1          XEN administrative tools - common files
ii  xenstore-utils            4.1.5-0ubuntu0.12.04.3  Xenstore utilities for Xen

cat /etc/issue && dpkg -l | egrep "bridge-utils"
bridge-utils  1.5-2ubuntu7  Utilities for configuring the Linux Ethernet bridge

cat /etc/issue && dpkg -l | egrep "ssh"
Ubuntu 12.04 LTS \n \l
ii  libssh-4           0.5.2-1             tiny C SSH library
ii  openssh-client     1:5.9p1-5ubuntu1.4  secure shell (SSH) client, for secure access to remote machines
ii  openssh-server     1:5.9p1-5ubuntu1.4  secure shell (SSH) server, for secure access from remote machines
ii  ssh-askpass-gnome  1:5.9p1-5ubuntu1    interactive X program to prompt users for a passphrase for ssh-add
ii  ssh-import-id      2.10-0ubuntu1       securely retrieve an SSH public key and install it locally

cat /etc/issue && dpkg -l | egrep "vncviewer"
Ubuntu 12.04 LTS \n \l
ii  xtightvncviewer  1.3.9-6.2ubuntu2  virtual network computing client software for X
-------------- next part --------------
Apr 18 02:49:03 server2 crmd: [1181]: WARN: check_dead_member: Our DC node (server1) left the cluster
Apr 18 02:49:03 server2 crmd: [1181]: info: do_state_transition: State transition S_NOT_DC -> S_ELECTION [ input=I_ELECTION cause=C_FSA_INTERNAL origin=check_dead_member ]
Apr 18 02:49:03 server2 crmd: [1181]: info: update_dc: Unset DC server1
Apr 18 02:49:03 server2 crmd: [1181]: info: do_state_transition: State transition S_ELECTION -> S_INTEGRATION [ input=I_ELECTION_DC cause=C_FSA_INTERNAL origin=do_election_check ]
Apr 18 02:49:03 server2 crmd: [1181]: info: do_te_control: Registering TE UUID: 11058a2e-0c23-4f0d-a911-65a5eb1b74a0
Apr 18 02:49:03 server2 crmd: [1181]: info: set_graph_functions: Setting custom graph functions
Apr 18 02:49:03 server2 crmd: [1181]: info: unpack_graph: Unpacked transition -1: 0 actions in 0 synapses
Apr 18 02:49:03 server2 crmd: [1181]: info: do_dc_takeover: Taking over DC status for this partition
Apr 18 02:49:03 server2 cib: [1177]: info: cib_process_readwrite: We are now in R/W mode
Apr 18 02:49:03 server2 cib: [1177]: info: cib_process_request: Operation complete: op cib_master for section 'all' (origin=local/crmd/17, version=0.116.8): ok (rc=0)
Apr 18 02:49:03 server2 cib: [1177]: info: cib_process_request: Operation complete: op cib_modify for section cib (origin=local/crmd/18, version=0.116.9): ok (rc=0)
Apr 18 02:49:03 server2 cib: [1177]: info: cib_process_request: Operation complete: op cib_modify for section crm_config (origin=local/crmd/20, version=0.116.10): ok (rc=0)
Apr 18 02:49:03 server2 crmd: [1181]: info: join_make_offer: Making join offers based on membership 156200
Apr 18 02:49:03 server2 cib: [1177]: info: cib_process_request: Operation complete: op cib_modify for section crm_config (origin=local/crmd/22, version=0.116.11): ok (rc=0)
Apr 18 02:49:03 server2 crmd: [1181]: info: do_dc_join_offer_all: join-1: Waiting on 1 outstanding join acks
Apr 18 02:49:03 server2 crmd: [1181]: info: ais_dispatch_message: Membership 156200: quorum still lost
Apr 18 02:49:03 server2 crmd: [1181]: info: crmd_ais_dispatch: Setting expected votes to 2
Apr 18 02:49:03 server2 cib: [1177]: info: cib_process_request: Operation complete: op cib_modify for section crm_config (origin=local/crmd/25, version=0.116.12): ok (rc=0)
Apr 18 02:49:03 server2 crmd: [1181]: info: update_dc: Set DC to server2 (3.0.5)
Apr 18 02:49:03 server2 crmd: [1181]: info: config_query_callback: Shutdown escalation occurs after: 1200000ms
Apr 18 02:49:03 server2 crmd: [1181]: info: config_query_callback: Checking for expired actions every 900000ms
Apr 18 02:49:03 server2 crmd: [1181]: info: config_query_callback: Sending expected-votes=2 to corosync
Apr 18 02:49:03 server2 crmd: [1181]: info: ais_dispatch_message: Membership 156200: quorum still lost
Apr 18 02:49:03 server2 crmd: [1181]: info: crmd_ais_dispatch: Setting expected votes to 2
Apr 18 02:49:03 server2 cib: [1177]: info: cib_process_request: Operation complete: op cib_modify for section crm_config (origin=local/crmd/28, version=0.116.13): ok (rc=0)
Apr 18 02:49:03 server2 crmd: [1181]: info: do_state_transition: State transition S_INTEGRATION -> S_FINALIZE_JOIN [ input=I_INTEGRATED cause=C_FSA_INTERNAL origin=check_join_state ]
Apr 18 02:49:03 server2 crmd: [1181]: info: do_state_transition: All 1 cluster nodes responded to the join offer.
Apr 18 02:49:03 server2 crmd: [1181]: info: do_dc_join_finalize: join-1: Syncing the CIB from server2 to the rest of the cluster
Apr 18 02:49:03 server2 cib: [1177]: info: cib_process_request: Operation complete: op cib_sync for section 'all' (origin=local/crmd/29, version=0.116.13): ok (rc=0)
Apr 18 02:49:03 server2 cib: [1177]: info: cib_process_request: Operation complete: op cib_modify for section nodes (origin=local/crmd/30, version=0.116.14): ok (rc=0)
Apr 18 02:49:03 server2 crmd: [1181]: info: do_dc_join_ack: join-1: Updating node state to member for server2
Apr 18 02:49:03 server2 cib: [1177]: info: cib_process_request: Operation complete: op cib_delete for section //node_state[@uname='server2']/lrm (origin=local/crmd/31, version=0.116.15): ok (rc=0)
Apr 18 02:49:03 server2 crmd: [1181]: info: erase_xpath_callback: Deletion of "//node_state[@uname='server2']/lrm": ok (rc=0)
Apr 18 02:49:03 server2 crmd: [1181]: info: do_state_transition: State transition S_FINALIZE_JOIN -> S_POLICY_ENGINE [ input=I_FINALIZED cause=C_FSA_INTERNAL origin=check_join_state ]
Apr 18 02:49:03 server2 crmd: [1181]: info: do_state_transition: All 1 cluster nodes are eligible to run resources.
Apr 18 02:49:03 server2 crmd: [1181]: info: do_dc_join_final: Ensuring DC, quorum and node attributes are up-to-date
Apr 18 02:49:03 server2 crmd: [1181]: info: crm_update_quorum: Updating quorum status to false (call=35)
Apr 18 02:49:03 server2 crmd: [1181]: info: abort_transition_graph: do_te_invoke:167 - Triggered transition abort (complete=1) : Peer Cancelled
Apr 18 02:49:03 server2 crmd: [1181]: info: do_pe_invoke: Query 36: Requesting the current CIB: S_POLICY_ENGINE
Apr 18 02:49:03 server2 attrd: [1179]: notice: attrd_local_callback: Sending full refresh (origin=crmd)
Apr 18 02:49:03 server2 attrd: [1179]: notice: attrd_trigger_update: Sending flush op to all hosts for: probe_complete (true)
Apr 18 02:49:03 server2 cib: [1177]: info: cib_process_request: Operation complete: op cib_modify for section nodes (origin=local/crmd/33, version=0.116.17): ok (rc=0)
Apr 18 02:49:03 server2 crmd: [1181]: WARN: match_down_event: No match for shutdown action on server1
Apr 18 02:49:03 server2 crmd: [1181]: info: te_update_diff: Stonith/shutdown of server1 not matched
Apr 18 02:49:03 server2 cib: [1177]: info: cib_process_request: Operation complete: op cib_modify for section cib (origin=local/crmd/35, version=0.116.19): ok (rc=0)
Apr 18 02:49:03 server2 crmd: [1181]: info: abort_transition_graph: te_update_diff:215 - Triggered transition abort (complete=1, tag=node_state, id=server1, magic=NA, cib=0.116.18) : Node failure
Apr 18 02:49:03 server2 crmd: [1181]: info: do_pe_invoke: Query 37: Requesting the current CIB: S_POLICY_ENGINE
Apr 18 02:49:03 server2 crmd: [1181]: info: do_pe_invoke_callback: Invoking the PE: query=37, ref=pe_calc-dc-1429305543-16, seq=156200, quorate=0
Apr 18 02:49:03 server2 attrd: [1179]: notice: attrd_trigger_update: Sending flush op to all hosts for: master-resDRBDr1:0 (10000)
Apr 18 02:49:03 server2 pengine: [1180]: notice: unpack_config: On loss of CCM Quorum: Ignore
Apr 18 02:49:03 server2 pengine: [1180]: notice: RecurringOp: Start recurring monitor (20s) for resXen1 on server2
Apr 18 02:49:03 server2 pengine: [1180]: notice: LogActions: Start resXen1#011(server2)
Apr 18 02:49:03 server2 pengine: [1180]: notice: LogActions: Leave resDRBDr1:0#011(Master server2)
Apr 18 02:49:03 server2 pengine: [1180]: notice: LogActions: Leave resDRBDr1:1#011(Stopped)
Apr 18 02:49:03 server2 pengine: [1180]: notice: LogActions: Leave resOCFS2r1:0#011(Started server2)
Apr 18 02:49:03 server2 pengine: [1180]: notice: LogActions: Leave resOCFS2r1:1#011(Stopped)
Apr 18 02:49:03 server2 crmd: [1181]: info: do_state_transition: State transition S_POLICY_ENGINE -> S_TRANSITION_ENGINE [ input=I_PE_SUCCESS cause=C_IPC_MESSAGE origin=handle_response ]
Apr 18 02:49:03 server2 crmd: [1181]: info: unpack_graph: Unpacked transition 0: 2 actions in 2 synapses
Apr 18 02:49:03 server2 crmd: [1181]: info: do_te_invoke: Processing graph 0 (ref=pe_calc-dc-1429305543-16) derived from /var/lib/pengine/pe-input-1545.bz2
Apr 18 02:49:03 server2 crmd: [1181]: info: te_rsc_command: Initiating action 6: start resXen1_start_0 on server2 (local)
Apr 18 02:49:03 server2 crmd: [1181]: info: do_lrm_rsc_op: Performing key=6:0:0:11058a2e-0c23-4f0d-a911-65a5eb1b74a0 op=resXen1_start_0 )
Apr 18 02:49:03 server2 lrmd: [1178]: info: rsc:resXen1 start[14] (pid 5929)
Apr 18 02:49:03 server2 pengine: [1180]: notice: process_pe_message: Transition 0: PEngine Input stored in: /var/lib/pengine/pe-input-1545.bz2
Apr 18 02:49:05 server2 kernel: [ 839.612059] o2net: Connection to node server1 (num 0) at 192.168.0.91:7777 has been idle for 10.12 secs, shutting it down.
Apr 18 02:49:05 server2 kernel: [ 839.612093] o2net: No longer connected to node server1 (num 0) at 192.168.0.91:7777
Apr 18 02:49:05 server2 kernel: [ 839.612126] (xend,5992,1):dlm_send_remote_convert_request:395 ERROR: Error -112 when sending message 504 (key 0x2d50ec47) to node 0
Apr 18 02:49:05 server2 kernel: [ 839.612134] o2dlm: Waiting on the death of node 0 in domain 89D0A7DA3B9B43EDB575466F176F7A0C
Apr 18 02:49:15 server2 kernel: [ 849.628073] o2net: No connection established with node 0 after 10.0 seconds, giving up.
Apr 18 02:49:16 server2 kernel: [ 851.344088] block drbd0: PingAck did not arrive in time.
Apr 18 02:49:16 server2 kernel: [ 851.344101] block drbd0: peer( Primary -> Unknown ) conn( Connected -> NetworkFailure ) pdsk( UpToDate -> DUnknown ) susp( 0 -> 1 )
Apr 18 02:49:16 server2 kernel: [ 851.352284] block drbd0: asender terminated
Apr 18 02:49:16 server2 kernel: [ 851.352292] block drbd0: Terminating drbd0_asender
Apr 18 02:49:16 server2 kernel: [ 851.352361] block drbd0: Connection closed
Apr 18 02:49:16 server2 kernel: [ 851.352405] block drbd0: conn( NetworkFailure -> Unconnected )
Apr 18 02:49:16 server2 kernel: [ 851.352409] block drbd0: receiver terminated
Apr 18 02:49:16 server2 kernel: [ 851.352411] block drbd0: Restarting drbd0_receiver
Apr 18 02:49:16 server2 kernel: [ 851.352413] block drbd0: receiver (re)started
Apr 18 02:49:16 server2 kernel: [ 851.352418] block drbd0: conn( Unconnected -> WFConnection )
Apr 18 02:49:16 server2 kernel: [ 851.352496] block drbd0: helper command: /sbin/drbdadm fence-peer minor-0
Apr 18 02:49:16 server2 crm-fence-peer.sh[6078]: invoked for r0
Apr 18 02:49:17 server2 cib: [1177]: info: cib:diff: - <cib admin_epoch="0" epoch="116" num_updates="21" />
Apr 18 02:49:17 server2 cib: [1177]: info: cib:diff: + <cib epoch="117" num_updates="1" admin_epoch="0" validate-with="pacemaker-1.2" crm_feature_set="3.0.5" update-origin="server2" update-client="cibadmin" cib-last-written="Sat Apr 18 02:37:19 2015" have-quorum="0" dc-uuid="server2" >
Apr 18 02:49:17 server2 cib: [1177]: info: cib:diff: +   <configuration >
Apr 18 02:49:17 server2 cib: [1177]: info: cib:diff: +     <constraints >
Apr 18 02:49:17 server2 cib: [1177]: info: cib:diff: +       <rsc_location rsc="msDRBDr1" id="drbd-fence-by-handler-msDRBDr1" __crm_diff_marker__="added:top" >
Apr 18 02:49:17 server2 cib: [1177]: info: cib:diff: +         <rule role="Master" score="-INFINITY" id="drbd-fence-by-handler-rule-msDRBDr1" >
Apr 18 02:49:17 server2 cib: [1177]: info: cib:diff: +           <expression attribute="#uname" operation="ne" value="server2" id="drbd-fence-by-handler-expr-msDRBDr1" />
Apr 18 02:49:17 server2 cib: [1177]: info: cib:diff: +         </rule>
Apr 18 02:49:17 server2 cib: [1177]: info: cib:diff: +       </rsc_location>
Apr 18 02:49:17 server2 cib: [1177]: info: cib:diff: +     </constraints>
Apr 18 02:49:17 server2 cib: [1177]: info: cib:diff: +   </configuration>
Apr 18 02:49:17 server2 cib: [1177]: info: cib:diff: + </cib>
Apr 18 02:49:17 server2 crmd: [1181]: info: abort_transition_graph: te_update_diff:124 - Triggered transition abort (complete=0, tag=diff, id=(null), magic=NA, cib=0.117.1) : Non-status change
Apr 18 02:49:17 server2 cib: [1177]: info: cib_process_request: Operation complete: op cib_create for section constraints (origin=local/cibadmin/2, version=0.117.1): ok (rc=0)
Apr 18 02:49:17 server2 crmd: [1181]: info: update_abort_priority: Abort priority upgraded from 0 to 1000000
Apr 18 02:49:17 server2 crmd: [1181]: info: update_abort_priority: Abort action done superceeded by restart
Apr 18 02:49:17 server2 crm-fence-peer.sh[6078]: INFO peer is reachable, my disk is UpToDate: placed constraint 'drbd-fence-by-handler-msDRBDr1'
Apr 18 02:49:17 server2 kernel: [ 852.446543] block drbd0: helper command: /sbin/drbdadm fence-peer minor-0 exit code 4 (0x400)
Apr 18 02:49:17 server2 kernel: [ 852.446547] block drbd0: fence-peer helper returned 4 (peer was fenced)
Apr 18 02:49:17 server2 kernel: [ 852.446553] block drbd0: pdsk( DUnknown -> Outdated )
Apr 18 02:49:17 server2 kernel: [ 852.446586] block drbd0: new current UUID 1F5EB0210CC2AB1B:A158E0E3B5B108B9:EFB663F37532A93A:EFB563F37532A93B
Apr 18 02:49:17 server2 kernel: [ 852.492924] block drbd0: susp( 1 -> 0 )
Apr 18 02:49:25 server2 kernel: [ 859.644076] o2net: No connection established with node 0 after 10.0 seconds, giving up.
Apr 18 02:49:25 server2 kernel: [ 859.644112] (xend,5992,0):dlm_send_remote_convert_request:395 ERROR: Error -107 when sending message 504 (key 0x2d50ec47) to node 0
Apr 18 02:49:25 server2 kernel: [ 859.644120] o2dlm: Waiting on the death of node 0 in domain 89D0A7DA3B9B43EDB575466F176F7A0C
Apr 18 02:49:30 server2 kernel: [ 865.452111] o2net: Connection to node server1 (num 0) at 192.168.0.91:7777 shutdown, state 7
Apr 18 02:49:33 server2 kernel: [ 868.452122] o2net: Connection to node server1 (num 0) at 192.168.0.91:7777 shutdown, state 7
Apr 18 02:49:35 server2 kernel: [ 869.660076] o2net: No connection established with node 0 after 10.0 seconds, giving up.
Apr 18 02:49:35 server2 kernel: [ 869.660116] (xend,5992,0):dlm_send_remote_convert_request:395 ERROR: Error -107 when sending message 504 (key 0x2d50ec47) to node 0
Apr 18 02:49:35 server2 kernel: [ 869.660123] o2dlm: Waiting on the death of node 0 in domain 89D0A7DA3B9B43EDB575466F176F7A0C
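-------------- next part --------------
[Editor's note] On the "is there any way to disable fencing" question: the log shows DRBD invoking its fence-peer handler (crm-fence-peer.sh), which places the drbd-fence-by-handler-msDRBDr1 location constraint in the Pacemaker CIB, and the susp( 0 -> 1 ) transition suggests a resource-and-stonith fencing policy. The sketch below is a hypothetical DRBD 8.3 resource excerpt, not the poster's actual configuration; it only illustrates where this behaviour is configured and what the alternatives are. Be aware that weakening fencing in a Primary/Primary setup invites split brain.

```
# Hypothetical /etc/drbd.d/r0.res excerpt (illustration only)
resource r0 {
  disk {
    # DRBD 8.3 fencing policies:
    #   dont-care            - never call the fence-peer handler (the default;
    #                          effectively disables fencing, but is dangerous
    #                          in dual-primary mode)
    #   resource-only        - call the handler without suspending I/O
    #   resource-and-stonith - suspend I/O until the handler returns, matching
    #                          the susp( 0 -> 1 ) ... susp( 1 -> 0 ) lines in
    #                          the log above
    fencing resource-and-stonith;
  }
  handlers {
    # Places a -INFINITY master location constraint in the CIB, as seen
    # in the cib:diff entries above.
    fence-peer "/usr/lib/drbd/crm-fence-peer.sh";
    # Removes that constraint again once the peer has resynchronized.
    after-resync-target "/usr/lib/drbd/crm-unfence-peer.sh";
  }
}
```

If a stale drbd-fence-by-handler-msDRBDr1 constraint is left behind (for example because no unfence handler is configured), it can be removed by hand, once both disks are UpToDate again, with: crm configure delete drbd-fence-by-handler-msDRBDr1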