Hi,

I've updated all drbd packages to the latest versions:

MDA1PFP-S01 11:52:35 2551 0 ~ # yum list "*drbd*"
Loaded plugins: langpacks, product-id, search-disabled-repos, subscription-manager
Installed Packages
drbd.x86_64                  8.9.8-1.el7                  @/drbd-8.9.8-1.el7.x86_64
drbd-bash-completion.x86_64  8.9.8-1.el7                  @/drbd-bash-completion-8.9.8-1.el7.x86_64
drbd-heartbeat.x86_64        8.9.8-1.el7                  @/drbd-heartbeat-8.9.8-1.el7.x86_64
drbd-pacemaker.x86_64        8.9.8-1.el7                  @/drbd-pacemaker-8.9.8-1.el7.x86_64
drbd-udev.x86_64             8.9.8-1.el7                  @/drbd-udev-8.9.8-1.el7.x86_64
drbd-utils.x86_64            8.9.8-1.el7                  installed
drbd-xen.x86_64              8.9.8-1.el7                  @/drbd-xen-8.9.8-1.el7.x86_64
kmod-drbd.x86_64             9.0.4_3.10.0_327.28.3-1.el7  @/kmod-drbd-9.0.4_3.10.0_327.28.3-1.el7.x86_64

but this did not fix the problem. The cluster starts fine, but when I stop the node running the DRBD master, the resource is not promoted on the other node. Here is the test I am conducting:

1. Start the cluster:

MDA1PFP-S01 12:07:00 2566 0 ~ # pcs status
Cluster name: MDA1PFP
Last updated: Tue Sep 20 12:07:24 2016
Last change: Tue Sep 20 12:06:49 2016 by root via cibadmin on MDA1PFP-PCS02
Stack: corosync
Current DC: MDA1PFP-PCS01 (version 1.1.13-10.el7-44eb2dd) - partition with quorum
2 nodes and 6 resources configured

Online: [ MDA1PFP-PCS01 MDA1PFP-PCS02 ]

Full list of resources:

 mda-ip (ocf::heartbeat:IPaddr2): Started MDA1PFP-PCS01
 Clone Set: ping-clone [ping]
     Started: [ MDA1PFP-PCS01 MDA1PFP-PCS02 ]
 ACTIVE (ocf::heartbeat:Dummy): Started MDA1PFP-PCS01
 Master/Slave Set: drbd1_sync [drbd1]
     Masters: [ MDA1PFP-PCS01 ]
     Slaves: [ MDA1PFP-PCS02 ]

PCSD Status:
  MDA1PFP-PCS01: Online
  MDA1PFP-PCS02: Online

Daemon Status:
  corosync: active/disabled
  pacemaker: active/disabled
  pcsd: active/enabled

MDA1PFP-S01 12:06:31 2565 0 ~ # drbd-overview
 1:shared_fs/0 Connected Primary/Secondary UpToDate/UpToDate

2. Stop the active cluster node:

MDA1PFP-S02 12:08:00 1295 0 ~ # pcs status
Cluster name: MDA1PFP
Last updated: Tue Sep 20 12:08:17 2016
Last change: Tue Sep 20 12:08:04 2016 by root via cibadmin on MDA1PFP-PCS02
Stack: corosync
Current DC: MDA1PFP-PCS02 (version 1.1.13-10.el7-44eb2dd) - partition with quorum
2 nodes and 6 resources configured

Online: [ MDA1PFP-PCS02 ]
OFFLINE: [ MDA1PFP-PCS01 ]

Full list of resources:

 mda-ip (ocf::heartbeat:IPaddr2): Started MDA1PFP-PCS02
 Clone Set: ping-clone [ping]
     Started: [ MDA1PFP-PCS02 ]
     Stopped: [ MDA1PFP-PCS01 ]
 ACTIVE (ocf::heartbeat:Dummy): Started MDA1PFP-PCS02
 Master/Slave Set: drbd1_sync [drbd1]
     Slaves: [ MDA1PFP-PCS02 ]
     Stopped: [ MDA1PFP-PCS01 ]

PCSD Status:
  MDA1PFP-PCS01: Online
  MDA1PFP-PCS02: Online

Daemon Status:
  corosync: active/disabled
  pacemaker: active/disabled
  pcsd: active/enabled

In the log files I can see that the node actually gets promoted to master but is then demoted immediately, and I don't see the reason for this:

Sep 20 12:08:00 MDA1PFP-S02 rsyslogd: [origin software="rsyslogd" swVersion="7.4.7" x-pid="3224" x-info="http://www.rsyslog.com"] start
Sep 20 12:08:00 MDA1PFP-S02 rsyslogd-2221: module 'imuxsock' already in this config, cannot be added [try http://www.rsyslog.com/e/2221 ]
Sep 20 12:08:00 MDA1PFP-S02 systemd: Stopping System Logging Service...
Sep 20 12:08:00 MDA1PFP-S02 systemd: Starting System Logging Service...
Sep 20 12:08:00 MDA1PFP-S02 systemd: Started System Logging Service.
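For context, the fence-peer activity that shows up further down in the log comes from DRBD's Pacemaker fencing hooks. They are typically wired into the resource configuration roughly like this (a sketch based on the stock scripts shipped by drbd-utils; only the resource name is taken from the log, the rest is assumed and may differ from my actual config):

```
resource shared_fs {
  net {
    # invoke the fence-peer handler when the replication link is lost
    fencing resource-only;
  }
  handlers {
    # places a -INFINITY location constraint on the Master role in the CIB
    fence-peer "/usr/lib/drbd/crm-fence-peer.sh";
    # removes that constraint again once the peer has resynced
    after-resync-target "/usr/lib/drbd/crm-unfence-peer.sh";
  }
}
```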
Sep 20 12:08:03 MDA1PFP-S02 crmd[2354]: notice: Operation ACTIVE_start_0: ok (node=MDA1PFP-PCS02, call=29, rc=0, cib-update=21, confirmed=true)
Sep 20 12:08:03 MDA1PFP-S02 crmd[2354]: notice: Operation drbd1_notify_0: ok (node=MDA1PFP-PCS02, call=28, rc=0, cib-update=0, confirmed=true)
Sep 20 12:08:04 MDA1PFP-S02 kernel: block drbd1: peer( Primary -> Secondary )
Sep 20 12:08:04 MDA1PFP-S02 IPaddr2(mda-ip)[3528]: INFO: Adding inet address 192.168.120.20/32 with broadcast address 192.168.120.255 to device bond0
Sep 20 12:08:04 MDA1PFP-S02 avahi-daemon[1084]: Registering new address record for 192.168.120.20 on bond0.IPv4.
Sep 20 12:08:04 MDA1PFP-S02 IPaddr2(mda-ip)[3528]: INFO: Bringing device bond0 up
Sep 20 12:08:04 MDA1PFP-S02 IPaddr2(mda-ip)[3528]: INFO: /usr/libexec/heartbeat/send_arp -i 200 -r 5 -p /var/run/resource-agents/send_arp-192.168.120.20 bond0 192.168.120.20 auto not_used not_used
Sep 20 12:08:04 MDA1PFP-S02 crmd[2354]: notice: Operation mda-ip_start_0: ok (node=MDA1PFP-PCS02, call=31, rc=0, cib-update=23, confirmed=true)
Sep 20 12:08:04 MDA1PFP-S02 crmd[2354]: notice: Operation drbd1_notify_0: ok (node=MDA1PFP-PCS02, call=32, rc=0, cib-update=0, confirmed=true)
Sep 20 12:08:04 MDA1PFP-S02 crmd[2354]: notice: Operation drbd1_notify_0: ok (node=MDA1PFP-PCS02, call=34, rc=0, cib-update=0, confirmed=true)
Sep 20 12:08:04 MDA1PFP-S02 kernel: drbd shared_fs: peer( Secondary -> Unknown ) conn( Connected -> TearDown ) pdsk( UpToDate -> DUnknown )
Sep 20 12:08:04 MDA1PFP-S02 kernel: drbd shared_fs: ack_receiver terminated
Sep 20 12:08:04 MDA1PFP-S02 kernel: drbd shared_fs: Terminating drbd_a_shared_f
Sep 20 12:08:04 MDA1PFP-S02 kernel: drbd shared_fs: Connection closed
Sep 20 12:08:04 MDA1PFP-S02 kernel: drbd shared_fs: conn( TearDown -> Unconnected )
Sep 20 12:08:04 MDA1PFP-S02 kernel: drbd shared_fs: receiver terminated
Sep 20 12:08:04 MDA1PFP-S02 kernel: drbd shared_fs: Restarting receiver thread
Sep 20 12:08:04 MDA1PFP-S02 kernel: drbd shared_fs: receiver (re)started
Sep 20 12:08:04 MDA1PFP-S02 kernel: drbd shared_fs: conn( Unconnected -> WFConnection )
Sep 20 12:08:04 MDA1PFP-S02 crmd[2354]: notice: Operation drbd1_notify_0: ok (node=MDA1PFP-PCS02, call=35, rc=0, cib-update=0, confirmed=true)
Sep 20 12:08:04 MDA1PFP-S02 crmd[2354]: notice: Operation drbd1_notify_0: ok (node=MDA1PFP-PCS02, call=36, rc=0, cib-update=0, confirmed=true)
Sep 20 12:08:04 MDA1PFP-S02 kernel: drbd shared_fs: helper command: /sbin/drbdadm fence-peer shared_fs
Sep 20 12:08:04 MDA1PFP-S02 crm-fence-peer.sh[3779]: invoked for shared_fs
Sep 20 12:08:04 MDA1PFP-S02 crm-fence-peer.sh[3779]: INFO peer is not reachable, my disk is UpToDate: placed constraint 'drbd-fence-by-handler-shared_fs-drbd1_sync'
Sep 20 12:08:04 MDA1PFP-S02 kernel: drbd shared_fs: helper command: /sbin/drbdadm fence-peer shared_fs exit code 5 (0x500)
Sep 20 12:08:04 MDA1PFP-S02 kernel: drbd shared_fs: fence-peer helper returned 5 (peer is unreachable, assumed to be dead)
Sep 20 12:08:04 MDA1PFP-S02 kernel: drbd shared_fs: pdsk( DUnknown -> Outdated )
Sep 20 12:08:04 MDA1PFP-S02 kernel: block drbd1: role( Secondary -> Primary )
Sep 20 12:08:04 MDA1PFP-S02 kernel: block drbd1: new current UUID 098EF9936C4F4D27:5157BB476E60F5AA:6BC19D97CF96E5D2:6BC09D97CF96E5D2
Sep 20 12:08:04 MDA1PFP-S02 crmd[2354]: error: pcmkRegisterNode: Triggered assert at xml.c:594 : node->type == XML_ELEMENT_NODE
Sep 20 12:08:04 MDA1PFP-S02 crmd[2354]: notice: Operation drbd1_promote_0: ok (node=MDA1PFP-PCS02, call=37, rc=0, cib-update=25, confirmed=true)
Sep 20 12:08:04 MDA1PFP-S02 crmd[2354]: notice: Operation drbd1_notify_0: ok (node=MDA1PFP-PCS02, call=38, rc=0, cib-update=0, confirmed=true)
Sep 20 12:08:04 MDA1PFP-S02 crmd[2354]: notice: Our peer on the DC (MDA1PFP-PCS01) is dead
Sep 20 12:08:04 MDA1PFP-S02 crmd[2354]: notice: State transition S_NOT_DC -> S_ELECTION [ input=I_ELECTION cause=C_CRMD_STATUS_CALLBACK origin=peer_update_callback ]
Sep 20 12:08:04 MDA1PFP-S02 crmd[2354]: notice: State transition S_ELECTION -> S_INTEGRATION [ input=I_ELECTION_DC cause=C_TIMER_POPPED origin=election_timeout_popped ]
Sep 20 12:08:04 MDA1PFP-S02 attrd[2351]: notice: crm_update_peer_proc: Node MDA1PFP-PCS01[1] - state is now lost (was member)
Sep 20 12:08:04 MDA1PFP-S02 attrd[2351]: notice: Removing all MDA1PFP-PCS01 attributes for attrd_peer_change_cb
Sep 20 12:08:04 MDA1PFP-S02 attrd[2351]: notice: Lost attribute writer MDA1PFP-PCS01
Sep 20 12:08:04 MDA1PFP-S02 attrd[2351]: notice: Removing MDA1PFP-PCS01/1 from the membership list
Sep 20 12:08:04 MDA1PFP-S02 attrd[2351]: notice: Purged 1 peers with id=1 and/or uname=MDA1PFP-PCS01 from the membership cache
Sep 20 12:08:04 MDA1PFP-S02 stonith-ng[2349]: notice: crm_update_peer_proc: Node MDA1PFP-PCS01[1] - state is now lost (was member)
Sep 20 12:08:04 MDA1PFP-S02 stonith-ng[2349]: notice: Removing MDA1PFP-PCS01/1 from the membership list
Sep 20 12:08:04 MDA1PFP-S02 stonith-ng[2349]: notice: Purged 1 peers with id=1 and/or uname=MDA1PFP-PCS01 from the membership cache
Sep 20 12:08:04 MDA1PFP-S02 cib[2348]: notice: crm_update_peer_proc: Node MDA1PFP-PCS01[1] - state is now lost (was member)
Sep 20 12:08:04 MDA1PFP-S02 cib[2348]: notice: Removing MDA1PFP-PCS01/1 from the membership list
Sep 20 12:08:04 MDA1PFP-S02 cib[2348]: notice: Purged 1 peers with id=1 and/or uname=MDA1PFP-PCS01 from the membership cache
Sep 20 12:08:04 MDA1PFP-S02 crmd[2354]: warning: FSA: Input I_ELECTION_DC from do_election_check() received in state S_INTEGRATION
Sep 20 12:08:04 MDA1PFP-S02 crmd[2354]: notice: Notifications disabled
Sep 20 12:08:04 MDA1PFP-S02 crmd[2354]: error: pcmkRegisterNode: Triggered assert at xml.c:594 : node->type == XML_ELEMENT_NODE
Sep 20 12:08:04 MDA1PFP-S02 pengine[2353]: notice: On loss of CCM Quorum: Ignore
Sep 20 12:08:04 MDA1PFP-S02 pengine[2353]: notice: Demote drbd1:0 (Master -> Slave MDA1PFP-PCS02)
Sep 20 12:08:04 MDA1PFP-S02 pengine[2353]: notice: Calculated Transition 0: /var/lib/pacemaker/pengine/pe-input-1813.bz2
Sep 20 12:08:04 MDA1PFP-S02 crmd[2354]: notice: Initiating action 55: notify drbd1_pre_notify_demote_0 on MDA1PFP-PCS02 (local)
Sep 20 12:08:04 MDA1PFP-S02 crmd[2354]: notice: Operation drbd1_notify_0: ok (node=MDA1PFP-PCS02, call=39, rc=0, cib-update=0, confirmed=true)
Sep 20 12:08:04 MDA1PFP-S02 crmd[2354]: notice: Initiating action 18: demote drbd1_demote_0 on MDA1PFP-PCS02 (local)
Sep 20 12:08:04 MDA1PFP-S02 kernel: block drbd1: role( Primary -> Secondary )
Sep 20 12:08:04 MDA1PFP-S02 kernel: block drbd1: bitmap WRITE of 0 pages took 0 jiffies
Sep 20 12:08:04 MDA1PFP-S02 kernel: block drbd1: 0 KB (0 bits) marked out-of-sync by on disk bit-map.
Sep 20 12:08:04 MDA1PFP-S02 systemd-udevd: error: /dev/drbd1: Wrong medium type
Sep 20 12:08:04 MDA1PFP-S02 systemd-udevd: error: /dev/drbd1: Wrong medium type
Sep 20 12:08:04 MDA1PFP-S02 crmd[2354]: error: pcmkRegisterNode: Triggered assert at xml.c:594 : node->type == XML_ELEMENT_NODE
Sep 20 12:08:04 MDA1PFP-S02 crmd[2354]: notice: Operation drbd1_demote_0: ok (node=MDA1PFP-PCS02, call=40, rc=0, cib-update=48, confirmed=true)
Sep 20 12:08:04 MDA1PFP-S02 crmd[2354]: notice: Initiating action 56: notify drbd1_post_notify_demote_0 on MDA1PFP-PCS02 (local)
Sep 20 12:08:04 MDA1PFP-S02 crmd[2354]: notice: Operation drbd1_notify_0: ok (node=MDA1PFP-PCS02, call=41, rc=0, cib-update=0, confirmed=true)
Sep 20 12:08:04 MDA1PFP-S02 crmd[2354]: notice: Initiating action 20: monitor drbd1_monitor_60000 on MDA1PFP-PCS02 (local)
Sep 20 12:08:04 MDA1PFP-S02 crmd[2354]: error: pcmkRegisterNode: Triggered assert at xml.c:594 : node->type == XML_ELEMENT_NODE
Sep 20 12:08:04 MDA1PFP-S02 crmd[2354]: notice: Transition 0 (Complete=10, Pending=0, Fired=0, Skipped=0, Incomplete=0, Source=/var/lib/pacemaker/pengine/pe-input-1813.bz2): Complete
Sep 20 12:08:04 MDA1PFP-S02 crmd[2354]: notice: State transition S_TRANSITION_ENGINE -> S_IDLE [ input=I_TE_SUCCESS cause=C_FSA_INTERNAL origin=notify_crmd ]
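For reference, the constraint that crm-fence-peer.sh reports placing in the log above normally bans the Master role on every node except the one that won the fencing race. The sketch below writes out what I understand that constraint to look like in the CIB (the ids are taken from the log message; the rule body and the node value are assumptions based on the DRBD user guide, not dumped from my cluster):

```shell
# Hypothetical reconstruction of the location constraint added by
# crm-fence-peer.sh: a -INFINITY score on the Master role for every node
# whose #uname differs from the surviving node's cluster node name.
cat > fence-constraint.xml <<'EOF'
<rsc_location id="drbd-fence-by-handler-shared_fs-drbd1_sync" rsc="drbd1_sync">
  <rule role="Master" score="-INFINITY"
        id="drbd-fence-by-handler-shared_fs-rule-drbd1_sync">
    <expression attribute="#uname" operation="ne" value="MDA1PFP-PCS02"
                id="drbd-fence-by-handler-shared_fs-expr-drbd1_sync"/>
  </rule>
</rsc_location>
EOF

# Show which node the rule still allows to be Master:
grep -o 'value="[^"]*"' fence-constraint.xml   # -> value="MDA1PFP-PCS02"
```

If this shape is right, the things to check would be whether the live constraint (visible via `pcs constraint location --full` or `cibadmin -Q`) carries a value matching the node name the surviving node is actually known by, and why pengine nevertheless chose the demote, e.g. by replaying the recorded transition with `crm_simulate -S -x /var/lib/pacemaker/pengine/pe-input-1813.bz2`. A stale constraint can be removed with `pcs constraint remove drbd-fence-by-handler-shared_fs-drbd1_sync`.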
Sep 20 12:08:05 MDA1PFP-S02 crmd[2354]: notice: crm_reap_unseen_nodes: Node MDA1PFP-PCS01[1] - state is now lost (was member)
Sep 20 12:08:05 MDA1PFP-S02 pacemakerd[2335]: notice: crm_reap_unseen_nodes: Node MDA1PFP-PCS01[1] - state is now lost (was member)
Sep 20 12:08:05 MDA1PFP-S02 crmd[2354]: warning: No match for shutdown action on 1
Sep 20 12:08:05 MDA1PFP-S02 crmd[2354]: notice: Stonith/shutdown of MDA1PFP-PCS01 not matched
Sep 20 12:08:05 MDA1PFP-S02 crmd[2354]: notice: State transition S_IDLE -> S_POLICY_ENGINE [ input=I_PE_CALC cause=C_FSA_INTERNAL origin=abort_transition_graph ]
Sep 20 12:08:05 MDA1PFP-S02 corosync[2244]: [TOTEM ] A new membership (192.168.121.11:1452) was formed. Members left: 1

Best wishes,
  Jens

--
Jens Auer | CGI | Software Engineer
CGI (Germany) GmbH & Co. KG
Rheinstraße 95 | 64295 Darmstadt | Germany
T: +49 6151 36860 154
jens.auer at cgi.com