Hi all,

I'm trying to get DRBD dual-primary working on pacemaker 1.1.10 on RHEL 7 (beta 1). It's mostly working, except for a really strange problem. When I start pacemaker/corosync, DRBD starts and promotes to Primary on both nodes quickly and without issue. After that, if I disable the DRBD resource, both nodes stop DRBD just fine.

The problem comes when I try to re-enable the DRBD resource: one of the nodes invokes crm-fence-peer.sh, which in turn adds a constraint blocking DRBD from becoming Primary on one of the nodes (which node gets blocked seems to be random; it has done this to both). This, of course, leads to the resource entering a FAILED state on that node.

I tried adding:

    handlers {
        after-resync-target "/usr/lib/drbd/crm-unfence-peer.sh";
    }

With this in place, crm-unfence-peer.sh was eventually called (about 60 seconds later) and the constraint was removed. However, by then the resource had already entered a failed state.

Here is the current config:

====
[root@an-c03n01 ~]# drbdadm dump
# /etc/drbd.conf
global {
    usage-count yes;
}

common {
    net {
        protocol            C;
        allow-two-primaries yes;
        after-sb-0pri       discard-zero-changes;
        after-sb-1pri       discard-secondary;
        after-sb-2pri       disconnect;
    }
    disk {
        fencing             resource-and-stonith;
    }
    handlers {
        fence-peer          /usr/lib/drbd/crm-fence-peer.sh;
        after-resync-target /usr/lib/drbd/crm-unfence-peer.sh;
    }
}

# resource r0 on an-c03n01.alteeve.ca: not ignored, not stacked
# defined at /etc/drbd.d/r0.res:3
resource r0 {
    on an-c03n01.alteeve.ca {
        volume 0 {
            device      /dev/drbd0 minor 0;
            disk        /dev/vdb1;
            meta-disk   internal;
        }
        address         ipv4 10.10.30.1:7788;
    }
    on an-c03n02.alteeve.ca {
        volume 0 {
            device      /dev/drbd0 minor 0;
            disk        /dev/vdb1;
            meta-disk   internal;
        }
        address         ipv4 10.10.30.2:7788;
    }
    net {
        verify-alg          md5;
        data-integrity-alg  md5;
    }
    disk {
        disk-flushes        no;
        md-flushes          no;
    }
}
====

I'll walk through the steps, showing the logs from both nodes as I go. First, I start the cluster:

====
[root@an-c03n01 ~]# pcs cluster start --all
an-c03n01.alteeve.ca: Starting Cluster...
an-c03n02.alteeve.ca: Starting Cluster...
====
[root@an-c03n02 ~]# pcs status
Cluster name: an-cluster-03
Last updated: Mon Jan 27 20:26:38 2014
Last change: Mon Jan 27 20:25:06 2014 via crmd on an-c03n01.alteeve.ca
Stack: corosync
Current DC: an-c03n02.alteeve.ca (2) - partition with quorum
Version: 1.1.10-19.el7-368c726
2 Nodes configured
4 Resources configured

Online: [ an-c03n01.alteeve.ca an-c03n02.alteeve.ca ]

Full list of resources:

 fence_n01_virsh (stonith:fence_virsh): Started an-c03n01.alteeve.ca
 fence_n02_virsh (stonith:fence_virsh): Started an-c03n02.alteeve.ca
 Master/Slave Set: drbd_r0_Clone [drbd_r0]
     Masters: [ an-c03n01.alteeve.ca an-c03n02.alteeve.ca ]

PCSD Status:
an-c03n01.alteeve.ca:
  an-c03n01.alteeve.ca: Online
an-c03n02.alteeve.ca:
  an-c03n02.alteeve.ca: Online

Daemon Status:
  corosync: active/disabled
  pacemaker: active/disabled
  pcsd: active/enabled
====
[root@an-c03n02 ~]# cat /proc/drbd
version: 8.4.4 (api:1/proto:86-101)
GIT-hash: 74402fecf24da8e5438171ee8c19e28627e1c98a build by root@an-c03n02.alteeve.ca, 2014-01-26 16:48:51
 0: cs:Connected ro:Primary/Primary ds:UpToDate/UpToDate C r-----
    ns:0 nr:0 dw:0 dr:152 al:0 bm:0 lo:0 pe:0 ua:0 ap:0 ep:1 wo:d oos:0
====

Startup logs from an-c03n01:

====
Jan 27 20:26:09 an-c03n01 systemd: Starting Corosync Cluster Engine...
Jan 27 20:26:09 an-c03n01 corosync[823]: [MAIN ] Corosync Cluster Engine ('2.3.2'): started and ready to provide service.
Jan 27 20:26:09 an-c03n01 corosync[823]: [MAIN ] Corosync built-in features: dbus systemd xmlconf snmp pie relro bindnow Jan 27 20:26:09 an-c03n01 corosync[824]: [TOTEM ] Initializing transport (UDP/IP Unicast). Jan 27 20:26:09 an-c03n01 corosync[824]: [TOTEM ] Initializing transmit/receive security (NSS) crypto: none hash: none Jan 27 20:26:09 an-c03n01 corosync[824]: [TOTEM ] The network interface [10.20.30.1] is now up. Jan 27 20:26:09 an-c03n01 corosync[824]: [SERV ] Service engine loaded: corosync configuration map access [0] Jan 27 20:26:09 an-c03n01 corosync[824]: [QB ] server name: cmap Jan 27 20:26:09 an-c03n01 corosync[824]: [SERV ] Service engine loaded: corosync configuration service [1] Jan 27 20:26:09 an-c03n01 corosync[824]: [QB ] server name: cfg Jan 27 20:26:09 an-c03n01 corosync[824]: [SERV ] Service engine loaded: corosync cluster closed process group service v1.01 [2] Jan 27 20:26:09 an-c03n01 corosync[824]: [QB ] server name: cpg Jan 27 20:26:09 an-c03n01 corosync[824]: [SERV ] Service engine loaded: corosync profile loading service [4] Jan 27 20:26:09 an-c03n01 corosync[824]: [QUORUM] Using quorum provider corosync_votequorum Jan 27 20:26:09 an-c03n01 corosync[824]: [QUORUM] Waiting for all cluster members. Current votes: 1 expected_votes: 2 Jan 27 20:26:09 an-c03n01 corosync[824]: [SERV ] Service engine loaded: corosync vote quorum service v1.0 [5] Jan 27 20:26:09 an-c03n01 corosync[824]: [QB ] server name: votequorum Jan 27 20:26:09 an-c03n01 corosync[824]: [SERV ] Service engine loaded: corosync cluster quorum service v0.1 [3] Jan 27 20:26:09 an-c03n01 corosync[824]: [QB ] server name: quorum Jan 27 20:26:09 an-c03n01 corosync[824]: [TOTEM ] adding new UDPU member {10.20.30.1} Jan 27 20:26:09 an-c03n01 corosync[824]: [TOTEM ] adding new UDPU member {10.20.30.2} Jan 27 20:26:09 an-c03n01 corosync[824]: [TOTEM ] A new membership (10.20.30.1:200) was formed. Members joined: 1 Jan 27 20:26:09 an-c03n01 corosync[824]: [QUORUM] Waiting for all cluster members. Current votes: 1 expected_votes: 2 Jan 27 20:26:09 an-c03n01 corosync[824]: [QUORUM] Waiting for all cluster members. Current votes: 1 expected_votes: 2 Jan 27 20:26:09 an-c03n01 corosync[824]: [QUORUM] Waiting for all cluster members. Current votes: 1 expected_votes: 2 Jan 27 20:26:09 an-c03n01 corosync[824]: [QUORUM] Members[1]: 1 Jan 27 20:26:09 an-c03n01 corosync[824]: [MAIN ] Completed service synchronization, ready to provide service. Jan 27 20:26:10 an-c03n01 corosync[824]: [TOTEM ] A new membership (10.20.30.1:208) was formed. Members joined: 2 Jan 27 20:26:10 an-c03n01 corosync[824]: [QUORUM] Waiting for all cluster members. Current votes: 1 expected_votes: 2 Jan 27 20:26:10 an-c03n01 corosync[824]: [QUORUM] This node is within the primary component and will provide service. Jan 27 20:26:10 an-c03n01 corosync[824]: [QUORUM] Members[2]: 1 2 Jan 27 20:26:10 an-c03n01 corosync[824]: [MAIN ] Completed service synchronization, ready to provide service. Jan 27 20:26:10 an-c03n01 corosync: Starting Corosync Cluster Engine (corosync): [ OK ] Jan 27 20:26:10 an-c03n01 systemd: Started Corosync Cluster Engine. Jan 27 20:26:10 an-c03n01 systemd: Starting Pacemaker High Availability Cluster Manager... Jan 27 20:26:10 an-c03n01 systemd: Started Pacemaker High Availability Cluster Manager. 
Jan 27 20:26:10 an-c03n01 pacemakerd: Could not establish pacemakerd connection: Connection refused (111) Jan 27 20:26:10 an-c03n01 pacemakerd[839]: notice: mcp_read_config: Configured corosync to accept connections from group 189: OK (1) Jan 27 20:26:10 an-c03n01 pacemakerd[839]: notice: main: Starting Pacemaker 1.1.10-19.el7 (Build: 368c726): generated-manpages agent-manpages ascii-docs publican-docs ncurses libqb-logging libqb-ipc upstart systemd nagios corosync-native Jan 27 20:26:10 an-c03n01 pacemakerd[839]: notice: cluster_connect_quorum: Quorum acquired Jan 27 20:26:10 an-c03n01 pacemakerd[839]: notice: crm_update_peer_state: pcmk_quorum_notification: Node an-c03n01.alteeve.ca[1] - state is now member (was (null)) Jan 27 20:26:10 an-c03n01 pacemakerd[839]: notice: crm_update_peer_state: pcmk_quorum_notification: Node an-c03n02.alteeve.ca[2] - state is now member (was (null)) Jan 27 20:26:10 an-c03n01 attrd[843]: notice: crm_cluster_connect: Connecting to cluster infrastructure: corosync Jan 27 20:26:10 an-c03n01 crmd[845]: notice: main: CRM Git Version: 368c726 Jan 27 20:26:10 an-c03n01 stonith-ng[841]: notice: crm_cluster_connect: Connecting to cluster infrastructure: corosync Jan 27 20:26:10 an-c03n01 attrd[843]: notice: main: Starting mainloop... Jan 27 20:26:10 an-c03n01 cib[840]: notice: crm_cluster_connect: Connecting to cluster infrastructure: corosync Jan 27 20:26:11 an-c03n01 crmd[845]: notice: crm_cluster_connect: Connecting to cluster infrastructure: corosync Jan 27 20:26:11 an-c03n01 crmd[845]: notice: cluster_connect_quorum: Quorum acquired Jan 27 20:26:11 an-c03n01 stonith-ng[841]: notice: setup_cib: Watching for stonith topology changes Jan 27 20:26:11 an-c03n01 stonith-ng[841]: notice: unpack_config: On loss of CCM Quorum: Ignore Jan 27 20:26:11 an-c03n01 crmd[845]: notice: crm_update_peer_state: pcmk_quorum_notification: Node an-c03n01.alteeve.ca[1] - state is now member (was (null)) Jan 27 20:26:11 an-c03n01 crmd[845]: notice: crm_update_peer_state: pcmk_quorum_notification: Node an-c03n02.alteeve.ca[2] - state is now member (was (null)) Jan 27 20:26:11 an-c03n01 crmd[845]: notice: do_started: The local CRM is operational Jan 27 20:26:11 an-c03n01 crmd[845]: notice: do_state_transition: State transition S_STARTING -> S_PENDING [ input=I_PENDING cause=C_FSA_INTERNAL origin=do_started ] Jan 27 20:26:12 an-c03n01 stonith-ng[841]: notice: stonith_device_register: Added 'fence_n01_virsh' to the device list (1 active devices) Jan 27 20:26:13 an-c03n01 stonith-ng[841]: notice: stonith_device_register: Added 'fence_n02_virsh' to the device list (2 active devices) Jan 27 20:26:32 an-c03n01 crmd[845]: warning: do_log: FSA: Input I_DC_TIMEOUT from crm_timer_popped() received in state S_PENDING Jan 27 20:26:32 an-c03n01 crmd[845]: notice: do_state_transition: State transition S_ELECTION -> S_PENDING [ input=I_PENDING cause=C_FSA_INTERNAL origin=do_election_count_vote ] Jan 27 20:26:32 an-c03n01 crmd[845]: notice: do_state_transition: State transition S_PENDING -> S_NOT_DC [ input=I_NOT_DC cause=C_HA_MESSAGE origin=do_cl_join_finalize_respond ] Jan 27 20:26:32 an-c03n01 attrd[843]: notice: attrd_local_callback: Sending full refresh (origin=crmd) Jan 27 20:26:33 an-c03n01 crmd[845]: notice: process_lrm_event: LRM operation drbd_r0_monitor_0 (call=14, rc=7, cib-update=11, confirmed=true) not running Jan 27 20:26:33 an-c03n01 attrd[843]: notice: attrd_trigger_update: Sending flush op to all hosts for: probe_complete (true) Jan 27 20:26:33 an-c03n01 attrd[843]: notice: 
attrd_perform_update: Sent update 4: probe_complete=true Jan 27 20:26:33 an-c03n01 attrd[843]: notice: attrd_perform_update: Sent update 7: probe_complete=true Jan 27 20:26:34 an-c03n01 stonith-ng[841]: notice: stonith_device_register: Device 'fence_n01_virsh' already existed in device list (2 active devices) Jan 27 20:26:34 an-c03n01 kernel: [19496.418912] drbd r0: Starting worker thread (from drbdsetup [946]) Jan 27 20:26:34 an-c03n01 kernel: [19496.419207] block drbd0: disk( Diskless -> Attaching ) Jan 27 20:26:34 an-c03n01 kernel: [19496.419268] drbd r0: Method to ensure write ordering: drain Jan 27 20:26:34 an-c03n01 kernel: [19496.419270] block drbd0: max BIO size = 1048576 Jan 27 20:26:34 an-c03n01 kernel: [19496.419273] block drbd0: Adjusting my ra_pages to backing device's (32 -> 1024) Jan 27 20:26:34 an-c03n01 kernel: [19496.419275] block drbd0: drbd_bm_resize called with capacity == 41937592 Jan 27 20:26:34 an-c03n01 kernel: [19496.419346] block drbd0: resync bitmap: bits=5242199 words=81910 pages=160 Jan 27 20:26:34 an-c03n01 kernel: [19496.419348] block drbd0: size = 20 GB (20968796 KB) Jan 27 20:26:34 an-c03n01 kernel: [19496.420788] block drbd0: bitmap READ of 160 pages took 1 jiffies Jan 27 20:26:34 an-c03n01 kernel: [19496.420892] block drbd0: recounting of set bits took additional 0 jiffies Jan 27 20:26:34 an-c03n01 kernel: [19496.420895] block drbd0: 0 KB (0 bits) marked out-of-sync by on disk bit-map. Jan 27 20:26:34 an-c03n01 kernel: [19496.420900] block drbd0: disk( Attaching -> Consistent ) Jan 27 20:26:34 an-c03n01 kernel: [19496.420904] block drbd0: attached to UUIDs AA966D5345E69DAA:0000000000000000:4F366962CD263E3D:4F356962CD263E3D Jan 27 20:26:34 an-c03n01 kernel: [19496.428933] drbd r0: conn( StandAlone -> Unconnected ) Jan 27 20:26:34 an-c03n01 kernel: [19496.428949] drbd r0: Starting receiver thread (from drbd_w_r0 [947]) Jan 27 20:26:34 an-c03n01 kernel: [19496.428970] drbd r0: receiver (re)started Jan 27 20:26:34 an-c03n01 kernel: [19496.428978] drbd r0: conn( Unconnected -> WFConnection ) Jan 27 20:26:34 an-c03n01 attrd[843]: notice: attrd_trigger_update: Sending flush op to all hosts for: master-drbd_r0 (5) Jan 27 20:26:34 an-c03n01 attrd[843]: notice: attrd_perform_update: Sent update 11: master-drbd_r0=5 Jan 27 20:26:34 an-c03n01 crmd[845]: notice: process_lrm_event: LRM operation drbd_r0_start_0 (call=16, rc=0, cib-update=12, confirmed=true) ok Jan 27 20:26:34 an-c03n01 crmd[845]: notice: process_lrm_event: LRM operation drbd_r0_notify_0 (call=17, rc=0, cib-update=0, confirmed=true) ok Jan 27 20:26:35 an-c03n01 kernel: [19496.930042] drbd r0: Handshake successful: Agreed network protocol version 101 Jan 27 20:26:35 an-c03n01 kernel: [19496.930046] drbd r0: Agreed to support TRIM on protocol level Jan 27 20:26:35 an-c03n01 kernel: [19496.930093] drbd r0: conn( WFConnection -> WFReportParams ) Jan 27 20:26:35 an-c03n01 kernel: [19496.930095] drbd r0: Starting asender thread (from drbd_r_r0 [956]) Jan 27 20:26:35 an-c03n01 kernel: [19496.937081] block drbd0: drbd_sync_handshake: Jan 27 20:26:35 an-c03n01 kernel: [19496.937086] block drbd0: self AA966D5345E69DAA:0000000000000000:4F366962CD263E3D:4F356962CD263E3D bits:0 flags:0 Jan 27 20:26:35 an-c03n01 kernel: [19496.937088] block drbd0: peer AA966D5345E69DAA:0000000000000000:4F366962CD263E3C:4F356962CD263E3D bits:0 flags:0 Jan 27 20:26:35 an-c03n01 kernel: [19496.937091] block drbd0: uuid_compare()=0 by rule 40 Jan 27 20:26:35 an-c03n01 kernel: [19496.937098] block drbd0: peer( Unknown -> Secondary ) 
conn( WFReportParams -> Connected ) disk( Consistent -> UpToDate ) pdsk( DUnknown -> UpToDate ) Jan 27 20:26:35 an-c03n01 crmd[845]: notice: process_lrm_event: LRM operation fence_n01_virsh_start_0 (call=15, rc=0, cib-update=13, confirmed=true) ok Jan 27 20:26:35 an-c03n01 crmd[845]: notice: process_lrm_event: LRM operation drbd_r0_notify_0 (call=19, rc=0, cib-update=0, confirmed=true) ok Jan 27 20:26:35 an-c03n01 kernel: [19497.258935] block drbd0: peer( Secondary -> Primary ) Jan 27 20:26:35 an-c03n01 kernel: [19497.262592] block drbd0: role( Secondary -> Primary ) Jan 27 20:26:35 an-c03n01 crmd[845]: notice: process_lrm_event: LRM operation drbd_r0_promote_0 (call=20, rc=0, cib-update=14, confirmed=true) ok Jan 27 20:26:35 an-c03n01 attrd[843]: notice: attrd_trigger_update: Sending flush op to all hosts for: master-drbd_r0 (10000) Jan 27 20:26:35 an-c03n01 attrd[843]: notice: attrd_perform_update: Sent update 13: master-drbd_r0=10000 Jan 27 20:26:35 an-c03n01 crmd[845]: notice: process_lrm_event: LRM operation drbd_r0_notify_0 (call=21, rc=0, cib-update=0, confirmed=true) ok Jan 27 20:26:35 an-c03n01 attrd[843]: notice: attrd_perform_update: Sent update 15: master-drbd_r0=10000 Jan 27 20:26:36 an-c03n01 crmd[845]: notice: process_lrm_event: LRM operation fence_n01_virsh_monitor_60000 (call=18, rc=0, cib-update=15, confirmed=false) ok ==== Startup logs from an-c03n02: ==== Jan 27 20:26:09 an-c03n02 systemd: Starting Corosync Cluster Engine... Jan 27 20:26:09 an-c03n02 corosync[21111]: [MAIN ] Corosync Cluster Engine ('2.3.2'): started and ready to provide service. Jan 27 20:26:09 an-c03n02 corosync[21111]: [MAIN ] Corosync built-in features: dbus systemd xmlconf snmp pie relro bindnow Jan 27 20:26:09 an-c03n02 corosync[21112]: [TOTEM ] Initializing transport (UDP/IP Unicast). Jan 27 20:26:09 an-c03n02 corosync[21112]: [TOTEM ] Initializing transmit/receive security (NSS) crypto: none hash: none Jan 27 20:26:09 an-c03n02 corosync[21112]: [TOTEM ] The network interface [10.20.30.2] is now up. Jan 27 20:26:09 an-c03n02 corosync[21112]: [SERV ] Service engine loaded: corosync configuration map access [0] Jan 27 20:26:09 an-c03n02 corosync[21112]: [QB ] server name: cmap Jan 27 20:26:09 an-c03n02 corosync[21112]: [SERV ] Service engine loaded: corosync configuration service [1] Jan 27 20:26:09 an-c03n02 corosync[21112]: [QB ] server name: cfg Jan 27 20:26:09 an-c03n02 corosync[21112]: [SERV ] Service engine loaded: corosync cluster closed process group service v1.01 [2] Jan 27 20:26:09 an-c03n02 corosync[21112]: [QB ] server name: cpg Jan 27 20:26:09 an-c03n02 corosync[21112]: [SERV ] Service engine loaded: corosync profile loading service [4] Jan 27 20:26:09 an-c03n02 corosync[21112]: [QUORUM] Using quorum provider corosync_votequorum Jan 27 20:26:09 an-c03n02 corosync[21112]: [QUORUM] Waiting for all cluster members. Current votes: 1 expected_votes: 2 Jan 27 20:26:09 an-c03n02 corosync[21112]: [SERV ] Service engine loaded: corosync vote quorum service v1.0 [5] Jan 27 20:26:09 an-c03n02 corosync[21112]: [QB ] server name: votequorum Jan 27 20:26:09 an-c03n02 corosync[21112]: [SERV ] Service engine loaded: corosync cluster quorum service v0.1 [3] Jan 27 20:26:09 an-c03n02 corosync[21112]: [QB ] server name: quorum Jan 27 20:26:09 an-c03n02 corosync[21112]: [TOTEM ] adding new UDPU member {10.20.30.1} Jan 27 20:26:09 an-c03n02 corosync[21112]: [TOTEM ] adding new UDPU member {10.20.30.2} Jan 27 20:26:09 an-c03n02 corosync[21112]: [TOTEM ] A new membership (10.20.30.2:204) was formed. 
Members joined: 2 Jan 27 20:26:09 an-c03n02 corosync[21112]: [QUORUM] Waiting for all cluster members. Current votes: 1 expected_votes: 2 Jan 27 20:26:09 an-c03n02 corosync[21112]: [QUORUM] Waiting for all cluster members. Current votes: 1 expected_votes: 2 Jan 27 20:26:09 an-c03n02 corosync[21112]: [QUORUM] Waiting for all cluster members. Current votes: 1 expected_votes: 2 Jan 27 20:26:09 an-c03n02 corosync[21112]: [QUORUM] Members[1]: 2 Jan 27 20:26:09 an-c03n02 corosync[21112]: [MAIN ] Completed service synchronization, ready to provide service. Jan 27 20:26:10 an-c03n02 corosync[21112]: [TOTEM ] A new membership (10.20.30.1:208) was formed. Members joined: 1 Jan 27 20:26:10 an-c03n02 corosync[21112]: [QUORUM] This node is within the primary component and will provide service. Jan 27 20:26:10 an-c03n02 corosync[21112]: [QUORUM] Members[2]: 1 2 Jan 27 20:26:10 an-c03n02 corosync[21112]: [MAIN ] Completed service synchronization, ready to provide service. Jan 27 20:26:10 an-c03n02 corosync: Starting Corosync Cluster Engine (corosync): [ OK ] Jan 27 20:26:10 an-c03n02 systemd: Started Corosync Cluster Engine. Jan 27 20:26:10 an-c03n02 systemd: Starting Pacemaker High Availability Cluster Manager... Jan 27 20:26:10 an-c03n02 systemd: Started Pacemaker High Availability Cluster Manager. Jan 27 20:26:10 an-c03n02 pacemakerd: Could not establish pacemakerd connection: Connection refused (111) Jan 27 20:26:10 an-c03n02 pacemakerd[21127]: notice: mcp_read_config: Configured corosync to accept connections from group 189: OK (1) Jan 27 20:26:10 an-c03n02 pacemakerd[21127]: notice: main: Starting Pacemaker 1.1.10-19.el7 (Build: 368c726): generated-manpages agent-manpages ascii-docs publican-docs ncurses libqb-logging libqb-ipc upstart systemd nagios corosync-native Jan 27 20:26:10 an-c03n02 pacemakerd[21127]: notice: cluster_connect_quorum: Quorum acquired Jan 27 20:26:10 an-c03n02 pacemakerd[21127]: notice: crm_update_peer_state: pcmk_quorum_notification: Node an-c03n01.alteeve.ca[1] - state is now member (was (null)) Jan 27 20:26:10 an-c03n02 pacemakerd[21127]: notice: crm_update_peer_state: pcmk_quorum_notification: Node an-c03n02.alteeve.ca[2] - state is now member (was (null)) Jan 27 20:26:10 an-c03n02 stonith-ng[21129]: notice: crm_cluster_connect: Connecting to cluster infrastructure: corosync Jan 27 20:26:10 an-c03n02 cib[21128]: notice: crm_cluster_connect: Connecting to cluster infrastructure: corosync Jan 27 20:26:10 an-c03n02 crmd[21133]: notice: main: CRM Git Version: 368c726 Jan 27 20:26:10 an-c03n02 attrd[21131]: notice: crm_cluster_connect: Connecting to cluster infrastructure: corosync Jan 27 20:26:10 an-c03n02 attrd[21131]: notice: main: Starting mainloop... 
Jan 27 20:26:11 an-c03n02 stonith-ng[21129]: notice: setup_cib: Watching for stonith topology changes Jan 27 20:26:11 an-c03n02 crmd[21133]: notice: crm_cluster_connect: Connecting to cluster infrastructure: corosync Jan 27 20:26:11 an-c03n02 stonith-ng[21129]: notice: unpack_config: On loss of CCM Quorum: Ignore Jan 27 20:26:11 an-c03n02 crmd[21133]: notice: cluster_connect_quorum: Quorum acquired Jan 27 20:26:11 an-c03n02 crmd[21133]: notice: crm_update_peer_state: pcmk_quorum_notification: Node an-c03n01.alteeve.ca[1] - state is now member (was (null)) Jan 27 20:26:11 an-c03n02 crmd[21133]: notice: crm_update_peer_state: pcmk_quorum_notification: Node an-c03n02.alteeve.ca[2] - state is now member (was (null)) Jan 27 20:26:11 an-c03n02 crmd[21133]: notice: do_started: The local CRM is operational Jan 27 20:26:11 an-c03n02 crmd[21133]: notice: do_state_transition: State transition S_STARTING -> S_PENDING [ input=I_PENDING cause=C_FSA_INTERNAL origin=do_started ] Jan 27 20:26:12 an-c03n02 stonith-ng[21129]: notice: stonith_device_register: Added 'fence_n01_virsh' to the device list (1 active devices) Jan 27 20:26:13 an-c03n02 stonith-ng[21129]: notice: stonith_device_register: Added 'fence_n02_virsh' to the device list (2 active devices) Jan 27 20:26:32 an-c03n02 crmd[21133]: notice: do_state_transition: State transition S_ELECTION -> S_INTEGRATION [ input=I_ELECTION_DC cause=C_FSA_INTERNAL origin=do_election_check ] Jan 27 20:26:32 an-c03n02 attrd[21131]: notice: attrd_local_callback: Sending full refresh (origin=crmd) Jan 27 20:26:32 an-c03n02 pengine[21132]: notice: unpack_config: On loss of CCM Quorum: Ignore Jan 27 20:26:32 an-c03n02 pengine[21132]: notice: LogActions: Start fence_n01_virsh (an-c03n01.alteeve.ca) Jan 27 20:26:32 an-c03n02 pengine[21132]: notice: LogActions: Start fence_n02_virsh (an-c03n02.alteeve.ca) Jan 27 20:26:32 an-c03n02 pengine[21132]: notice: LogActions: Start drbd_r0:0 (an-c03n01.alteeve.ca) Jan 27 20:26:32 an-c03n02 pengine[21132]: notice: LogActions: Start drbd_r0:1 (an-c03n02.alteeve.ca) Jan 27 20:26:32 an-c03n02 pengine[21132]: notice: process_pe_message: Calculated Transition 0: /var/lib/pacemaker/pengine/pe-input-164.bz2 Jan 27 20:26:32 an-c03n02 crmd[21133]: notice: te_rsc_command: Initiating action 8: monitor fence_n01_virsh_monitor_0 on an-c03n02.alteeve.ca (local) Jan 27 20:26:32 an-c03n02 crmd[21133]: notice: te_rsc_command: Initiating action 4: monitor fence_n01_virsh_monitor_0 on an-c03n01.alteeve.ca Jan 27 20:26:32 an-c03n02 crmd[21133]: notice: te_rsc_command: Initiating action 9: monitor fence_n02_virsh_monitor_0 on an-c03n02.alteeve.ca (local) Jan 27 20:26:32 an-c03n02 crmd[21133]: notice: te_rsc_command: Initiating action 5: monitor fence_n02_virsh_monitor_0 on an-c03n01.alteeve.ca Jan 27 20:26:32 an-c03n02 crmd[21133]: notice: te_rsc_command: Initiating action 6: monitor drbd_r0:0_monitor_0 on an-c03n01.alteeve.ca Jan 27 20:26:32 an-c03n02 crmd[21133]: notice: te_rsc_command: Initiating action 10: monitor drbd_r0:1_monitor_0 on an-c03n02.alteeve.ca (local) Jan 27 20:26:33 an-c03n02 crmd[21133]: notice: process_lrm_event: LRM operation drbd_r0_monitor_0 (call=14, rc=7, cib-update=28, confirmed=true) not running Jan 27 20:26:33 an-c03n02 crmd[21133]: notice: te_rsc_command: Initiating action 7: probe_complete probe_complete on an-c03n02.alteeve.ca (local) - no waiting Jan 27 20:26:33 an-c03n02 attrd[21131]: notice: attrd_trigger_update: Sending flush op to all hosts for: probe_complete (true) Jan 27 20:26:33 an-c03n02 attrd[21131]: notice: 
attrd_perform_update: Sent update 4: probe_complete=true Jan 27 20:26:33 an-c03n02 crmd[21133]: notice: te_rsc_command: Initiating action 3: probe_complete probe_complete on an-c03n01.alteeve.ca - no waiting Jan 27 20:26:33 an-c03n02 crmd[21133]: notice: te_rsc_command: Initiating action 11: start fence_n01_virsh_start_0 on an-c03n01.alteeve.ca Jan 27 20:26:33 an-c03n02 crmd[21133]: notice: te_rsc_command: Initiating action 13: start fence_n02_virsh_start_0 on an-c03n02.alteeve.ca (local) Jan 27 20:26:33 an-c03n02 crmd[21133]: notice: te_rsc_command: Initiating action 15: start drbd_r0:0_start_0 on an-c03n01.alteeve.ca Jan 27 20:26:33 an-c03n02 crmd[21133]: notice: te_rsc_command: Initiating action 17: start drbd_r0:1_start_0 on an-c03n02.alteeve.ca (local) Jan 27 20:26:34 an-c03n02 stonith-ng[21129]: notice: stonith_device_register: Device 'fence_n02_virsh' already existed in device list (2 active devices) Jan 27 20:26:34 an-c03n02 kernel: [ 4904.724683] drbd r0: Starting worker thread (from drbdsetup [21238]) Jan 27 20:26:34 an-c03n02 kernel: [ 4904.724970] block drbd0: disk( Diskless -> Attaching ) Jan 27 20:26:34 an-c03n02 kernel: [ 4904.725081] drbd r0: Method to ensure write ordering: drain Jan 27 20:26:34 an-c03n02 kernel: [ 4904.725084] block drbd0: max BIO size = 1048576 Jan 27 20:26:34 an-c03n02 kernel: [ 4904.725087] block drbd0: Adjusting my ra_pages to backing device's (32 -> 1024) Jan 27 20:26:34 an-c03n02 kernel: [ 4904.725090] block drbd0: drbd_bm_resize called with capacity == 41937592 Jan 27 20:26:34 an-c03n02 kernel: [ 4904.725180] block drbd0: resync bitmap: bits=5242199 words=81910 pages=160 Jan 27 20:26:34 an-c03n02 kernel: [ 4904.725183] block drbd0: size = 20 GB (20968796 KB) Jan 27 20:26:34 an-c03n02 kernel: [ 4904.727769] block drbd0: bitmap READ of 160 pages took 2 jiffies Jan 27 20:26:34 an-c03n02 kernel: [ 4904.727981] block drbd0: recounting of set bits took additional 0 jiffies Jan 27 20:26:34 an-c03n02 kernel: [ 4904.727985] block drbd0: 0 KB (0 bits) marked out-of-sync by on disk bit-map. 
Jan 27 20:26:34 an-c03n02 kernel: [ 4904.728001] block drbd0: disk( Attaching -> Consistent ) Jan 27 20:26:34 an-c03n02 kernel: [ 4904.728013] block drbd0: attached to UUIDs AA966D5345E69DAA:0000000000000000:4F366962CD263E3C:4F356962CD263E3D Jan 27 20:26:34 an-c03n02 kernel: [ 4904.738601] drbd r0: conn( StandAlone -> Unconnected ) Jan 27 20:26:34 an-c03n02 kernel: [ 4904.738688] drbd r0: Starting receiver thread (from drbd_w_r0 [21239]) Jan 27 20:26:34 an-c03n02 kernel: [ 4904.738709] drbd r0: receiver (re)started Jan 27 20:26:34 an-c03n02 kernel: [ 4904.738721] drbd r0: conn( Unconnected -> WFConnection ) Jan 27 20:26:34 an-c03n02 attrd[21131]: notice: attrd_trigger_update: Sending flush op to all hosts for: master-drbd_r0 (5) Jan 27 20:26:34 an-c03n02 attrd[21131]: notice: attrd_perform_update: Sent update 9: master-drbd_r0=5 Jan 27 20:26:34 an-c03n02 crmd[21133]: notice: process_lrm_event: LRM operation drbd_r0_start_0 (call=16, rc=0, cib-update=29, confirmed=true) ok Jan 27 20:26:34 an-c03n02 crmd[21133]: notice: te_rsc_command: Initiating action 48: notify drbd_r0:0_post_notify_start_0 on an-c03n01.alteeve.ca Jan 27 20:26:34 an-c03n02 crmd[21133]: notice: te_rsc_command: Initiating action 49: notify drbd_r0:1_post_notify_start_0 on an-c03n02.alteeve.ca (local) Jan 27 20:26:34 an-c03n02 crmd[21133]: notice: process_lrm_event: LRM operation drbd_r0_notify_0 (call=17, rc=0, cib-update=0, confirmed=true) ok Jan 27 20:26:35 an-c03n02 kernel: [ 4905.294095] drbd r0: Handshake successful: Agreed network protocol version 101 Jan 27 20:26:35 an-c03n02 kernel: [ 4905.294099] drbd r0: Agreed to support TRIM on protocol level Jan 27 20:26:35 an-c03n02 kernel: [ 4905.294132] drbd r0: conn( WFConnection -> WFReportParams ) Jan 27 20:26:35 an-c03n02 kernel: [ 4905.294134] drbd r0: Starting asender thread (from drbd_r_r0 [21248]) Jan 27 20:26:35 an-c03n02 kernel: [ 4905.303108] block drbd0: drbd_sync_handshake: Jan 27 20:26:35 an-c03n02 kernel: [ 4905.303112] block drbd0: self AA966D5345E69DAA:0000000000000000:4F366962CD263E3C:4F356962CD263E3D bits:0 flags:0 Jan 27 20:26:35 an-c03n02 kernel: [ 4905.303114] block drbd0: peer AA966D5345E69DAA:0000000000000000:4F366962CD263E3D:4F356962CD263E3D bits:0 flags:0 Jan 27 20:26:35 an-c03n02 kernel: [ 4905.303115] block drbd0: uuid_compare()=0 by rule 40 Jan 27 20:26:35 an-c03n02 kernel: [ 4905.303120] block drbd0: peer( Unknown -> Secondary ) conn( WFReportParams -> Connected ) disk( Consistent -> UpToDate ) pdsk( DUnknown -> UpToDate ) Jan 27 20:26:35 an-c03n02 crmd[21133]: notice: process_lrm_event: LRM operation fence_n02_virsh_start_0 (call=15, rc=0, cib-update=30, confirmed=true) ok Jan 27 20:26:35 an-c03n02 crmd[21133]: notice: run_graph: Transition 0 (Complete=21, Pending=0, Fired=0, Skipped=4, Incomplete=0, Source=/var/lib/pacemaker/pengine/pe-input-164.bz2): Stopped Jan 27 20:26:35 an-c03n02 pengine[21132]: notice: unpack_config: On loss of CCM Quorum: Ignore Jan 27 20:26:35 an-c03n02 pengine[21132]: notice: LogActions: Promote drbd_r0:0 (Slave -> Master an-c03n02.alteeve.ca) Jan 27 20:26:35 an-c03n02 pengine[21132]: notice: LogActions: Promote drbd_r0:1 (Slave -> Master an-c03n01.alteeve.ca) Jan 27 20:26:35 an-c03n02 pengine[21132]: notice: process_pe_message: Calculated Transition 1: /var/lib/pacemaker/pengine/pe-input-165.bz2 Jan 27 20:26:35 an-c03n02 crmd[21133]: notice: te_rsc_command: Initiating action 7: monitor fence_n01_virsh_monitor_60000 on an-c03n01.alteeve.ca Jan 27 20:26:35 an-c03n02 crmd[21133]: notice: te_rsc_command: Initiating 
action 10: monitor fence_n02_virsh_monitor_60000 on an-c03n02.alteeve.ca (local) Jan 27 20:26:35 an-c03n02 crmd[21133]: notice: te_rsc_command: Initiating action 52: notify drbd_r0_pre_notify_promote_0 on an-c03n02.alteeve.ca (local) Jan 27 20:26:35 an-c03n02 crmd[21133]: notice: te_rsc_command: Initiating action 54: notify drbd_r0_pre_notify_promote_0 on an-c03n01.alteeve.ca Jan 27 20:26:35 an-c03n02 crmd[21133]: notice: process_lrm_event: LRM operation drbd_r0_notify_0 (call=19, rc=0, cib-update=0, confirmed=true) ok Jan 27 20:26:35 an-c03n02 crmd[21133]: notice: te_rsc_command: Initiating action 13: promote drbd_r0_promote_0 on an-c03n02.alteeve.ca (local) Jan 27 20:26:35 an-c03n02 crmd[21133]: notice: te_rsc_command: Initiating action 16: promote drbd_r0_promote_0 on an-c03n01.alteeve.ca Jan 27 20:26:35 an-c03n02 kernel: [ 4905.623345] block drbd0: role( Secondary -> Primary ) Jan 27 20:26:35 an-c03n02 kernel: [ 4905.626560] block drbd0: peer( Secondary -> Primary ) Jan 27 20:26:35 an-c03n02 crmd[21133]: notice: process_lrm_event: LRM operation drbd_r0_promote_0 (call=20, rc=0, cib-update=32, confirmed=true) ok Jan 27 20:26:35 an-c03n02 crmd[21133]: notice: te_rsc_command: Initiating action 53: notify drbd_r0_post_notify_promote_0 on an-c03n02.alteeve.ca (local) Jan 27 20:26:35 an-c03n02 crmd[21133]: notice: te_rsc_command: Initiating action 55: notify drbd_r0_post_notify_promote_0 on an-c03n01.alteeve.ca Jan 27 20:26:35 an-c03n02 attrd[21131]: notice: attrd_trigger_update: Sending flush op to all hosts for: master-drbd_r0 (10000) Jan 27 20:26:35 an-c03n02 attrd[21131]: notice: attrd_perform_update: Sent update 13: master-drbd_r0=10000 Jan 27 20:26:35 an-c03n02 crmd[21133]: notice: process_lrm_event: LRM operation drbd_r0_notify_0 (call=21, rc=0, cib-update=0, confirmed=true) ok Jan 27 20:26:35 an-c03n02 attrd[21131]: notice: attrd_perform_update: Sent update 15: master-drbd_r0=10000 Jan 27 20:26:36 an-c03n02 crmd[21133]: notice: process_lrm_event: LRM operation fence_n02_virsh_monitor_60000 (call=18, rc=0, cib-update=33, confirmed=false) ok Jan 27 20:26:36 an-c03n02 crmd[21133]: notice: run_graph: Transition 1 (Complete=14, Pending=0, Fired=0, Skipped=0, Incomplete=0, Source=/var/lib/pacemaker/pengine/pe-input-165.bz2): Complete Jan 27 20:26:36 an-c03n02 pengine[21132]: notice: unpack_config: On loss of CCM Quorum: Ignore Jan 27 20:26:36 an-c03n02 pengine[21132]: notice: process_pe_message: Calculated Transition 2: /var/lib/pacemaker/pengine/pe-input-166.bz2 Jan 27 20:26:36 an-c03n02 crmd[21133]: notice: run_graph: Transition 2 (Complete=0, Pending=0, Fired=0, Skipped=0, Incomplete=0, Source=/var/lib/pacemaker/pengine/pe-input-166.bz2): Complete Jan 27 20:26:36 an-c03n02 crmd[21133]: notice: do_state_transition: State transition S_TRANSITION_ENGINE -> S_IDLE [ input=I_TE_SUCCESS cause=C_FSA_INTERNAL origin=notify_crmd ] ==== So everything looks good. 
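
For completeness, since the pacemaker resource definitions aren't shown above: the drbd_r0 / drbd_r0_Clone resources were created with pcs commands along these lines. This is a reconstruction from the status output and logs (the resource names, the 60-second monitor and master-max=2 match what pacemaker reports), not a verbatim copy of the CIB, so treat the exact options as approximate:

====
# Reconstructed sketch of the DRBD resource definition; options are
# approximate, not copied from the live CIB.
pcs resource create drbd_r0 ocf:linbit:drbd drbd_resource=r0 \
    op monitor interval=60s
pcs resource master drbd_r0_Clone drbd_r0 \
    master-max=2 master-node-max=1 clone-max=2 clone-node-max=1 notify=true
====

(master-max=2 on the pacemaker side, together with allow-two-primaries in drbd.conf, is what allows both nodes to be promoted at once.)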
So now I'll disable the DRBD resource:

====
[root@an-c03n01 ~]# pcs resource disable drbd_r0_Clone
[root@an-c03n01 ~]# pcs constraint
Location Constraints:
Ordering Constraints:
Colocation Constraints:
====
[root@an-c03n02 ~]# pcs status
Cluster name: an-cluster-03
Last updated: Mon Jan 27 20:29:23 2014
Last change: Mon Jan 27 20:29:10 2014 via crm_resource on an-c03n01.alteeve.ca
Stack: corosync
Current DC: an-c03n02.alteeve.ca (2) - partition with quorum
Version: 1.1.10-19.el7-368c726
2 Nodes configured
4 Resources configured

Online: [ an-c03n01.alteeve.ca an-c03n02.alteeve.ca ]

Full list of resources:

 fence_n01_virsh (stonith:fence_virsh): Started an-c03n01.alteeve.ca
 fence_n02_virsh (stonith:fence_virsh): Started an-c03n02.alteeve.ca
 Master/Slave Set: drbd_r0_Clone [drbd_r0]
     Stopped: [ an-c03n01.alteeve.ca an-c03n02.alteeve.ca ]

PCSD Status:
an-c03n01.alteeve.ca:
  an-c03n01.alteeve.ca: Online
an-c03n02.alteeve.ca:
  an-c03n02.alteeve.ca: Online

Daemon Status:
  corosync: active/disabled
  pacemaker: active/disabled
  pcsd: active/enabled
====

Disable logs from an-c03n01:

====
Jan 27 20:29:10 an-c03n01 crmd[845]: notice: process_lrm_event: LRM operation drbd_r0_notify_0 (call=22, rc=0, cib-update=0, confirmed=true) ok
Jan 27 20:29:10 an-c03n01 kernel: [19652.354342] block drbd0: role( Primary -> Secondary )
Jan 27 20:29:10 an-c03n01 kernel: [19652.354362] block drbd0: bitmap WRITE of 0 pages took 0 jiffies
Jan 27 20:29:10 an-c03n01 kernel: [19652.354364] block drbd0: 0 KB (0 bits) marked out-of-sync by on disk bit-map.
Jan 27 20:29:10 an-c03n01 crmd[845]: notice: process_lrm_event: LRM operation drbd_r0_demote_0 (call=23, rc=0, cib-update=16, confirmed=true) ok
Jan 27 20:29:10 an-c03n01 kernel: [19652.363096] block drbd0: peer( Primary -> Secondary )
Jan 27 20:29:10 an-c03n01 crmd[845]: notice: process_lrm_event: LRM operation drbd_r0_notify_0 (call=24, rc=0, cib-update=0, confirmed=true) ok
Jan 27 20:29:10 an-c03n01 crmd[845]: notice: process_lrm_event: LRM operation drbd_r0_notify_0 (call=25, rc=0, cib-update=0, confirmed=true) ok
Jan 27 20:29:10 an-c03n01 kernel: [19652.471517] drbd r0: peer( Secondary -> Unknown ) conn( Connected -> TearDown ) pdsk( UpToDate -> DUnknown )
Jan 27 20:29:10 an-c03n01 kernel: [19652.471539] drbd r0: asender terminated
Jan 27 20:29:10 an-c03n01 kernel: [19652.471542] drbd r0: Terminating drbd_a_r0
Jan 27 20:29:10 an-c03n01 kernel: [19652.472011] drbd r0: conn( TearDown -> Disconnecting )
Jan 27 20:29:10 an-c03n01 kernel: [19652.472332] drbd r0: Connection closed
Jan 27 20:29:10 an-c03n01 kernel: [19652.472339] drbd r0: conn( Disconnecting -> StandAlone )
Jan 27 20:29:10 an-c03n01 kernel: [19652.472340] drbd r0: receiver terminated
Jan 27 20:29:10 an-c03n01 kernel: [19652.472351] drbd r0: Terminating drbd_r_r0
Jan 27 20:29:10 an-c03n01 kernel: [19652.472377] block drbd0: disk( UpToDate -> Failed )
Jan 27 20:29:10 an-c03n01 kernel: [19652.482181] block drbd0: bitmap WRITE of 0 pages took 0 jiffies
Jan 27 20:29:10 an-c03n01 kernel: [19652.482186] block drbd0: 0 KB (0 bits) marked out-of-sync by on disk bit-map.
Jan 27 20:29:10 an-c03n01 kernel: [19652.482208] block drbd0: disk( Failed -> Diskless ) Jan 27 20:29:10 an-c03n01 kernel: [19652.482288] block drbd0: drbd_bm_resize called with capacity == 0 Jan 27 20:29:10 an-c03n01 kernel: [19652.482327] drbd r0: Terminating drbd_w_r0 Jan 27 20:29:10 an-c03n01 attrd[843]: notice: attrd_trigger_update: Sending flush op to all hosts for: master-drbd_r0 (<null>) Jan 27 20:29:10 an-c03n01 attrd[843]: notice: attrd_perform_update: Sent delete 17: node=1, attr=master-drbd_r0, id=<n/a>, set=(null), section=status Jan 27 20:29:10 an-c03n01 crmd[845]: notice: process_lrm_event: LRM operation drbd_r0_stop_0 (call=26, rc=0, cib-update=17, confirmed=true) ok Jan 27 20:29:10 an-c03n01 attrd[843]: notice: attrd_perform_update: Sent delete 19: node=1, attr=master-drbd_r0, id=<n/a>, set=(null), section=status ==== Disable logs from an-c03n02: ==== Jan 27 20:29:10 an-c03n02 cib[21128]: notice: cib:diff: Diff: --- 0.139.23 Jan 27 20:29:10 an-c03n02 cib[21128]: notice: cib:diff: Diff: +++ 0.140.1 ae30c6348ea7b6da2cce70635f3b0a29 Jan 27 20:29:10 an-c03n02 cib[21128]: notice: cib:diff: -- <cib admin_epoch="0" epoch="139" num_updates="23"/> Jan 27 20:29:10 an-c03n02 cib[21128]: notice: cib:diff: ++ <nvpair id="drbd_r0_Clone-meta_attributes-target-role" name="target-role" value="Stopped"/> Jan 27 20:29:10 an-c03n02 crmd[21133]: notice: do_state_transition: State transition S_IDLE -> S_POLICY_ENGINE [ input=I_PE_CALC cause=C_FSA_INTERNAL origin=abort_transition_graph ] Jan 27 20:29:10 an-c03n02 pengine[21132]: notice: unpack_config: On loss of CCM Quorum: Ignore Jan 27 20:29:10 an-c03n02 pengine[21132]: notice: LogActions: Demote drbd_r0:0 (Master -> Stopped an-c03n02.alteeve.ca) Jan 27 20:29:10 an-c03n02 pengine[21132]: notice: LogActions: Demote drbd_r0:1 (Master -> Stopped an-c03n01.alteeve.ca) Jan 27 20:29:10 an-c03n02 pengine[21132]: notice: process_pe_message: Calculated Transition 3: /var/lib/pacemaker/pengine/pe-input-167.bz2 Jan 27 20:29:10 an-c03n02 crmd[21133]: notice: te_rsc_command: Initiating action 46: notify drbd_r0_pre_notify_demote_0 on an-c03n02.alteeve.ca (local) Jan 27 20:29:10 an-c03n02 crmd[21133]: notice: te_rsc_command: Initiating action 48: notify drbd_r0_pre_notify_demote_0 on an-c03n01.alteeve.ca Jan 27 20:29:10 an-c03n02 crmd[21133]: notice: process_lrm_event: LRM operation drbd_r0_notify_0 (call=22, rc=0, cib-update=0, confirmed=true) ok Jan 27 20:29:10 an-c03n02 crmd[21133]: notice: te_rsc_command: Initiating action 11: demote drbd_r0_demote_0 on an-c03n02.alteeve.ca (local) Jan 27 20:29:10 an-c03n02 crmd[21133]: notice: te_rsc_command: Initiating action 13: demote drbd_r0_demote_0 on an-c03n01.alteeve.ca Jan 27 20:29:10 an-c03n02 kernel: [ 5060.718998] block drbd0: role( Primary -> Secondary ) Jan 27 20:29:10 an-c03n02 kernel: [ 5060.719041] block drbd0: bitmap WRITE of 0 pages took 0 jiffies Jan 27 20:29:10 an-c03n02 kernel: [ 5060.719043] block drbd0: 0 KB (0 bits) marked out-of-sync by on disk bit-map. 
Jan 27 20:29:10 an-c03n02 kernel: [ 5060.727041] block drbd0: peer( Primary -> Secondary ) Jan 27 20:29:10 an-c03n02 crmd[21133]: notice: process_lrm_event: LRM operation drbd_r0_demote_0 (call=23, rc=0, cib-update=36, confirmed=true) ok Jan 27 20:29:10 an-c03n02 crmd[21133]: notice: te_rsc_command: Initiating action 47: notify drbd_r0_post_notify_demote_0 on an-c03n02.alteeve.ca (local) Jan 27 20:29:10 an-c03n02 crmd[21133]: notice: te_rsc_command: Initiating action 49: notify drbd_r0_post_notify_demote_0 on an-c03n01.alteeve.ca Jan 27 20:29:10 an-c03n02 crmd[21133]: notice: process_lrm_event: LRM operation drbd_r0_notify_0 (call=24, rc=0, cib-update=0, confirmed=true) ok Jan 27 20:29:10 an-c03n02 crmd[21133]: notice: te_rsc_command: Initiating action 44: notify drbd_r0_pre_notify_stop_0 on an-c03n02.alteeve.ca (local) Jan 27 20:29:10 an-c03n02 crmd[21133]: notice: te_rsc_command: Initiating action 45: notify drbd_r0_pre_notify_stop_0 on an-c03n01.alteeve.ca Jan 27 20:29:10 an-c03n02 crmd[21133]: notice: process_lrm_event: LRM operation drbd_r0_notify_0 (call=25, rc=0, cib-update=0, confirmed=true) ok Jan 27 20:29:10 an-c03n02 crmd[21133]: notice: te_rsc_command: Initiating action 12: stop drbd_r0_stop_0 on an-c03n02.alteeve.ca (local) Jan 27 20:29:10 an-c03n02 crmd[21133]: notice: te_rsc_command: Initiating action 14: stop drbd_r0_stop_0 on an-c03n01.alteeve.ca Jan 27 20:29:10 an-c03n02 kernel: [ 5060.835968] drbd r0: peer( Secondary -> Unknown ) conn( Connected -> Disconnecting ) pdsk( UpToDate -> DUnknown ) Jan 27 20:29:10 an-c03n02 kernel: [ 5060.835976] drbd r0: asender terminated Jan 27 20:29:10 an-c03n02 kernel: [ 5060.835977] drbd r0: Terminating drbd_a_r0 Jan 27 20:29:10 an-c03n02 kernel: [ 5060.836358] drbd r0: Connection closed Jan 27 20:29:10 an-c03n02 kernel: [ 5060.836368] drbd r0: conn( Disconnecting -> StandAlone ) Jan 27 20:29:10 an-c03n02 kernel: [ 5060.836369] drbd r0: receiver terminated Jan 27 20:29:10 an-c03n02 kernel: [ 5060.836371] drbd r0: Terminating drbd_r_r0 Jan 27 20:29:10 an-c03n02 kernel: [ 5060.836435] block drbd0: disk( UpToDate -> Failed ) Jan 27 20:29:10 an-c03n02 kernel: [ 5060.846158] block drbd0: bitmap WRITE of 0 pages took 0 jiffies Jan 27 20:29:10 an-c03n02 kernel: [ 5060.846161] block drbd0: 0 KB (0 bits) marked out-of-sync by on disk bit-map. 
Jan 27 20:29:10 an-c03n02 kernel: [ 5060.846165] block drbd0: disk( Failed -> Diskless )
Jan 27 20:29:10 an-c03n02 kernel: [ 5060.846249] block drbd0: drbd_bm_resize called with capacity == 0
Jan 27 20:29:10 an-c03n02 kernel: [ 5060.846269] drbd r0: Terminating drbd_w_r0
Jan 27 20:29:10 an-c03n02 attrd[21131]: notice: attrd_trigger_update: Sending flush op to all hosts for: master-drbd_r0 (<null>)
Jan 27 20:29:10 an-c03n02 attrd[21131]: notice: attrd_perform_update: Sent delete 19: node=2, attr=master-drbd_r0, id=<n/a>, set=(null), section=status
Jan 27 20:29:10 an-c03n02 crmd[21133]: notice: process_lrm_event: LRM operation drbd_r0_stop_0 (call=26, rc=0, cib-update=37, confirmed=true) ok
Jan 27 20:29:10 an-c03n02 crmd[21133]: notice: run_graph: Transition 3 (Complete=22, Pending=0, Fired=0, Skipped=1, Incomplete=0, Source=/var/lib/pacemaker/pengine/pe-input-167.bz2): Stopped
Jan 27 20:29:10 an-c03n02 pengine[21132]: notice: unpack_config: On loss of CCM Quorum: Ignore
Jan 27 20:29:10 an-c03n02 pengine[21132]: notice: process_pe_message: Calculated Transition 4: /var/lib/pacemaker/pengine/pe-input-168.bz2
Jan 27 20:29:10 an-c03n02 crmd[21133]: notice: run_graph: Transition 4 (Complete=0, Pending=0, Fired=0, Skipped=0, Incomplete=0, Source=/var/lib/pacemaker/pengine/pe-input-168.bz2): Complete
Jan 27 20:29:10 an-c03n02 crmd[21133]: notice: do_state_transition: State transition S_TRANSITION_ENGINE -> S_IDLE [ input=I_TE_SUCCESS cause=C_FSA_INTERNAL origin=notify_crmd ]
====

Still looking good. Now here is where things go sideways...

====
[root@an-c03n01 ~]# pcs resource enable drbd_r0_Clone
====
[root@an-c03n02 ~]# pcs status
Cluster name: an-cluster-03
Last updated: Mon Jan 27 20:32:52 2014
Last change: Mon Jan 27 20:32:05 2014 via cibadmin on an-c03n01.alteeve.ca
Stack: corosync
Current DC: an-c03n02.alteeve.ca (2) - partition with quorum
Version: 1.1.10-19.el7-368c726
2 Nodes configured
4 Resources configured

Online: [ an-c03n01.alteeve.ca an-c03n02.alteeve.ca ]

Full list of resources:

 fence_n01_virsh (stonith:fence_virsh): Started an-c03n01.alteeve.ca
 fence_n02_virsh (stonith:fence_virsh): Started an-c03n02.alteeve.ca
 Master/Slave Set: drbd_r0_Clone [drbd_r0]
     Masters: [ an-c03n02.alteeve.ca ]
     Slaves: [ an-c03n01.alteeve.ca ]

Failed actions:
    drbd_r0_promote_0 on an-c03n01.alteeve.ca 'unknown error' (1): call=30, status=complete, last-rc-change='Mon Jan 27 20:32:05 2014', queued=15187ms, exec=0ms

PCSD Status:
an-c03n01.alteeve.ca:
  an-c03n01.alteeve.ca: Online
an-c03n02.alteeve.ca:
  an-c03n02.alteeve.ca: Online

Daemon Status:
  corosync: active/disabled
  pacemaker: active/disabled
  pcsd: active/enabled
====

Enable logs from an-c03n01:

====
Jan 27 20:32:05 an-c03n01 kernel: [19827.078454] drbd r0: Starting worker thread (from drbdsetup [1337])
Jan 27 20:32:05 an-c03n01 kernel: [19827.078587] block drbd0: disk( Diskless -> Attaching )
Jan 27 20:32:05 an-c03n01 kernel: [19827.078655] drbd r0: Method to ensure write ordering: drain
Jan 27 20:32:05 an-c03n01 kernel: [19827.078657] block drbd0: max BIO size = 1048576
Jan 27 20:32:05 an-c03n01 kernel: [19827.078661] block drbd0: Adjusting my ra_pages to backing device's (32 -> 1024)
Jan 27 20:32:05 an-c03n01 kernel: [19827.078664] block drbd0: drbd_bm_resize called with capacity == 41937592
Jan 27 20:32:05 an-c03n01 kernel: [19827.078732] block drbd0: resync bitmap: bits=5242199 words=81910 pages=160
Jan 27 20:32:05 an-c03n01 kernel: [19827.078734] block drbd0: size = 20 GB (20968796 KB)
Jan 27 20:32:05 an-c03n01 kernel: [19827.080475] block
drbd0: bitmap READ of 160 pages took 2 jiffies Jan 27 20:32:05 an-c03n01 kernel: [19827.080566] block drbd0: recounting of set bits took additional 0 jiffies Jan 27 20:32:05 an-c03n01 kernel: [19827.080568] block drbd0: 0 KB (0 bits) marked out-of-sync by on disk bit-map. Jan 27 20:32:05 an-c03n01 kernel: [19827.080575] block drbd0: disk( Attaching -> Consistent ) Jan 27 20:32:05 an-c03n01 kernel: [19827.080577] block drbd0: attached to UUIDs AA966D5345E69DAA:0000000000000000:4F366962CD263E3D:4F356962CD263E3D Jan 27 20:32:05 an-c03n01 kernel: [19827.086606] drbd r0: conn( StandAlone -> Unconnected ) Jan 27 20:32:05 an-c03n01 kernel: [19827.086663] drbd r0: Starting receiver thread (from drbd_w_r0 [1338]) Jan 27 20:32:05 an-c03n01 kernel: [19827.086677] drbd r0: receiver (re)started Jan 27 20:32:05 an-c03n01 kernel: [19827.086682] drbd r0: conn( Unconnected -> WFConnection ) Jan 27 20:32:05 an-c03n01 attrd[843]: notice: attrd_trigger_update: Sending flush op to all hosts for: master-drbd_r0 (5) Jan 27 20:32:05 an-c03n01 attrd[843]: notice: attrd_perform_update: Sent update 23: master-drbd_r0=5 Jan 27 20:32:05 an-c03n01 crmd[845]: notice: process_lrm_event: LRM operation drbd_r0_start_0 (call=27, rc=0, cib-update=18, confirmed=true) ok Jan 27 20:32:05 an-c03n01 crmd[845]: notice: process_lrm_event: LRM operation drbd_r0_notify_0 (call=28, rc=0, cib-update=0, confirmed=true) ok Jan 27 20:32:05 an-c03n01 crmd[845]: notice: process_lrm_event: LRM operation drbd_r0_notify_0 (call=29, rc=0, cib-update=0, confirmed=true) ok Jan 27 20:32:05 an-c03n01 kernel: [19827.235110] drbd r0: helper command: /sbin/drbdadm fence-peer r0 Jan 27 20:32:05 an-c03n01 crm-fence-peer.sh[1437]: invoked for r0 Jan 27 20:32:05 an-c03n01 crmd[845]: notice: handle_request: Current ping state: S_NOT_DC Jan 27 20:32:05 an-c03n01 cibadmin[1469]: notice: crm_log_args: Invoked: cibadmin -C -o constraints -X <rsc_location rsc="drbd_r0_Clone" id="drbd-fence-by-handler-r0-drbd_r0_Clone"> <rule role="Master" score="-INFINITY" id="drbd-fence-by-handler-r0-rule-drbd_r0_Clone"> <expression attribute="#uname" operation="ne" value="an-c03n01.alteeve.ca" id="drbd-fence-by-handler-r0-expr-drbd_r0_Clone"/> </rule> </rsc_location> Jan 27 20:32:05 an-c03n01 stonith-ng[841]: notice: unpack_config: On loss of CCM Quorum: Ignore Jan 27 20:32:05 an-c03n01 crm-fence-peer.sh[1437]: Call cib_create failed (-76): Name not unique on network Jan 27 20:32:05 an-c03n01 crm-fence-peer.sh[1437]: <failed> Jan 27 20:32:05 an-c03n01 crm-fence-peer.sh[1437]: <failed_update id="drbd-fence-by-handler-r0-drbd_r0_Clone" object_type="rsc_location" operation="cib_create" reason="Name not unique on network"> Jan 27 20:32:05 an-c03n01 crm-fence-peer.sh[1437]: <rsc_location rsc="drbd_r0_Clone" id="drbd-fence-by-handler-r0-drbd_r0_Clone"> Jan 27 20:32:05 an-c03n01 crm-fence-peer.sh[1437]: <rule role="Master" score="-INFINITY" id="drbd-fence-by-handler-r0-rule-drbd_r0_Clone"> Jan 27 20:32:05 an-c03n01 crm-fence-peer.sh[1437]: <expression attribute="#uname" operation="ne" value="an-c03n01.alteeve.ca" id="drbd-fence-by-handler-r0-expr-drbd_r0_Clone"/> Jan 27 20:32:05 an-c03n01 crm-fence-peer.sh[1437]: </rule> Jan 27 20:32:05 an-c03n01 crm-fence-peer.sh[1437]: </rsc_location> Jan 27 20:32:05 an-c03n01 crm-fence-peer.sh[1437]: </failed_update> Jan 27 20:32:05 an-c03n01 crm-fence-peer.sh[1437]: </failed> Jan 27 20:32:05 an-c03n01 kernel: [19827.302587] drbd r0: helper command: /sbin/drbdadm fence-peer r0 exit code 1 (0x100) Jan 27 20:32:05 an-c03n01 kernel: [19827.302590] 
drbd r0: fence-peer helper broken, returned 1 Jan 27 20:32:05 an-c03n01 kernel: [19827.302607] drbd r0: helper command: /sbin/drbdadm fence-peer r0 Jan 27 20:32:05 an-c03n01 crm-fence-peer.sh[1437]: WARNING DATA INTEGRITY at RISK: could not place the fencing constraint! Jan 27 20:32:05 an-c03n01 crm-fence-peer.sh[1484]: invoked for r0 Jan 27 20:32:05 an-c03n01 stonith-ng[841]: notice: stonith_device_register: Device 'fence_n01_virsh' already existed in device list (2 active devices) Jan 27 20:32:05 an-c03n01 kernel: [19827.328528] drbd r0: helper command: /sbin/drbdadm fence-peer r0 exit code 1 (0x100) Jan 27 20:32:05 an-c03n01 kernel: [19827.328532] drbd r0: fence-peer helper broken, returned 1 Jan 27 20:32:05 an-c03n01 kernel: [19827.328553] drbd r0: helper command: /sbin/drbdadm fence-peer r0 Jan 27 20:32:05 an-c03n01 crm-fence-peer.sh[1513]: invoked for r0 Jan 27 20:32:05 an-c03n01 crm-fence-peer.sh[1484]: WARNING constraint <expression attribute="#uname" <expression operation="ne" <expression value="an-c03n02.alteeve.ca" <rsc_location rsc="drbd_r0_Clone" <rule role="Master" <rule score="-INFINITY" already exists Jan 27 20:32:05 an-c03n01 crm-fence-peer.sh[1484]: WARNING DATA INTEGRITY at RISK: could not place the fencing constraint! Jan 27 20:32:05 an-c03n01 kernel: [19827.359166] drbd r0: helper command: /sbin/drbdadm fence-peer r0 exit code 1 (0x100) Jan 27 20:32:05 an-c03n01 kernel: [19827.359170] drbd r0: fence-peer helper broken, returned 1 Jan 27 20:32:05 an-c03n01 kernel: [19827.359193] drbd r0: helper command: /sbin/drbdadm fence-peer r0 Jan 27 20:32:05 an-c03n01 crm-fence-peer.sh[1513]: WARNING constraint <expression attribute="#uname" <expression operation="ne" <expression value="an-c03n02.alteeve.ca" <rsc_location rsc="drbd_r0_Clone" <rule role="Master" <rule score="-INFINITY" already exists Jan 27 20:32:05 an-c03n01 crm-fence-peer.sh[1513]: WARNING DATA INTEGRITY at RISK: could not place the fencing constraint! Jan 27 20:32:05 an-c03n01 stonith-ng[841]: notice: stonith_device_register: Added 'fence_n02_virsh' to the device list (2 active devices) Jan 27 20:32:05 an-c03n01 crm-fence-peer.sh[1541]: invoked for r0 Jan 27 20:32:05 an-c03n01 kernel: [19827.379932] drbd r0: helper command: /sbin/drbdadm fence-peer r0 exit code 1 (0x100) Jan 27 20:32:05 an-c03n01 kernel: [19827.379935] drbd r0: fence-peer helper broken, returned 1 Jan 27 20:32:05 an-c03n01 crm-fence-peer.sh[1541]: WARNING constraint <expression attribute="#uname" <expression operation="ne" <expression value="an-c03n02.alteeve.ca" <rsc_location rsc="drbd_r0_Clone" <rule role="Master" <rule score="-INFINITY" already exists Jan 27 20:32:05 an-c03n01 crm-fence-peer.sh[1541]: WARNING DATA INTEGRITY at RISK: could not place the fencing constraint! 
Jan 27 20:32:05 an-c03n01 drbd(drbd_r0)[1408]: ERROR: r0: Called drbdadm -c /etc/drbd.conf primary r0 Jan 27 20:32:05 an-c03n01 drbd(drbd_r0)[1408]: ERROR: r0: Exit code 17 Jan 27 20:32:05 an-c03n01 drbd(drbd_r0)[1408]: ERROR: r0: Command output: Jan 27 20:32:05 an-c03n01 drbd(drbd_r0)[1408]: CRIT: Refusing to be promoted to Primary without UpToDate data Jan 27 20:32:05 an-c03n01 drbd(drbd_r0)[1408]: WARNING: promotion failed; sleep 15 # to prevent tight recovery loop Jan 27 20:32:05 an-c03n01 kernel: [19827.597081] drbd r0: Handshake successful: Agreed network protocol version 101 Jan 27 20:32:05 an-c03n01 kernel: [19827.597084] drbd r0: Agreed to support TRIM on protocol level Jan 27 20:32:05 an-c03n01 kernel: [19827.597142] drbd r0: conn( WFConnection -> WFReportParams ) Jan 27 20:32:05 an-c03n01 kernel: [19827.597145] drbd r0: Starting asender thread (from drbd_r_r0 [1347]) Jan 27 20:32:05 an-c03n01 kernel: [19827.606053] block drbd0: drbd_sync_handshake: Jan 27 20:32:05 an-c03n01 kernel: [19827.606057] block drbd0: self AA966D5345E69DAA:0000000000000000:4F366962CD263E3D:4F356962CD263E3D bits:0 flags:0 Jan 27 20:32:05 an-c03n01 kernel: [19827.606058] block drbd0: peer 853E72BBF0C9260D:AA966D5345E69DAA:4F366962CD263E3C:4F356962CD263E3D bits:0 flags:0 Jan 27 20:32:05 an-c03n01 kernel: [19827.606060] block drbd0: uuid_compare()=-1 by rule 50 Jan 27 20:32:05 an-c03n01 kernel: [19827.606065] block drbd0: peer( Unknown -> Primary ) conn( WFReportParams -> WFBitMapT ) disk( Consistent -> Outdated ) pdsk( DUnknown -> UpToDate ) Jan 27 20:32:05 an-c03n01 kernel: [19827.606296] block drbd0: receive bitmap stats [Bytes(packets)]: plain 0(0), RLE 23(1), total 23; compression: 100.0% Jan 27 20:32:05 an-c03n01 kernel: [19827.606388] block drbd0: send bitmap stats [Bytes(packets)]: plain 0(0), RLE 23(1), total 23; compression: 100.0% Jan 27 20:32:05 an-c03n01 kernel: [19827.606391] block drbd0: conn( WFBitMapT -> WFSyncUUID ) Jan 27 20:32:05 an-c03n01 kernel: [19827.607961] block drbd0: updated sync uuid AA976D5345E69DAA:0000000000000000:4F366962CD263E3D:4F356962CD263E3D Jan 27 20:32:05 an-c03n01 kernel: [19827.608137] block drbd0: helper command: /sbin/drbdadm before-resync-target minor-0 Jan 27 20:32:05 an-c03n01 kernel: [19827.609229] block drbd0: helper command: /sbin/drbdadm before-resync-target minor-0 exit code 0 (0x0) Jan 27 20:32:05 an-c03n01 kernel: [19827.609243] block drbd0: conn( WFSyncUUID -> SyncTarget ) disk( Outdated -> Inconsistent ) Jan 27 20:32:05 an-c03n01 kernel: [19827.609251] block drbd0: Began resync as SyncTarget (will sync 0 KB [0 bits set]). 
Jan 27 20:32:05 an-c03n01 kernel: [19827.610184] block drbd0: Resync done (total 1 sec; paused 0 sec; 0 K/sec)
Jan 27 20:32:05 an-c03n01 kernel: [19827.610188] block drbd0: updated UUIDs 853E72BBF0C9260C:0000000000000000:AA976D5345E69DAA:AA966D5345E69DAA
Jan 27 20:32:05 an-c03n01 kernel: [19827.610191] block drbd0: conn( SyncTarget -> Connected ) disk( Inconsistent -> UpToDate )
Jan 27 20:32:05 an-c03n01 kernel: [19827.610627] block drbd0: helper command: /sbin/drbdadm after-resync-target minor-0
Jan 27 20:32:05 an-c03n01 crm-unfence-peer.sh[1589]: invoked for r0
Jan 27 20:32:05 an-c03n01 cibadmin[1603]: notice: crm_log_args: Invoked: cibadmin -D -X <rsc_location rsc="drbd_r0_Clone" id="drbd-fence-by-handler-r0-drbd_r0_Clone"/>
Jan 27 20:32:05 an-c03n01 stonith-ng[841]: notice: unpack_config: On loss of CCM Quorum: Ignore
Jan 27 20:32:05 an-c03n01 kernel: [19827.637304] block drbd0: helper command: /sbin/drbdadm after-resync-target minor-0 exit code 0 (0x0)
Jan 27 20:32:05 an-c03n01 stonith-ng[841]: notice: stonith_device_register: Device 'fence_n01_virsh' already existed in device list (2 active devices)
Jan 27 20:32:05 an-c03n01 stonith-ng[841]: notice: stonith_device_register: Added 'fence_n02_virsh' to the device list (2 active devices)
Jan 27 20:32:20 an-c03n01 lrmd[842]: notice: operation_finished: drbd_r0_promote_0:1408:stderr [ 0: State change failed: (-2) Need access to UpToDate data ]
Jan 27 20:32:20 an-c03n01 lrmd[842]: notice: operation_finished: drbd_r0_promote_0:1408:stderr [ Command 'drbdsetup primary 0' terminated with exit code 17 ]
Jan 27 20:32:20 an-c03n01 crmd[845]: notice: process_lrm_event: LRM operation drbd_r0_promote_0 (call=30, rc=1, cib-update=19, confirmed=true) unknown error
Jan 27 20:32:20 an-c03n01 crmd[845]: notice: process_lrm_event: an-c03n01.alteeve.ca-drbd_r0_promote_0:30 [ \n ]
Jan 27 20:32:20 an-c03n01 attrd[843]: notice: attrd_cs_dispatch: Update relayed from an-c03n02.alteeve.ca
Jan 27 20:32:20 an-c03n01 attrd[843]: notice: attrd_trigger_update: Sending flush op to all hosts for: fail-count-drbd_r0 (1)
Jan 27 20:32:20 an-c03n01 attrd[843]: notice: attrd_perform_update: Sent update 28: fail-count-drbd_r0=1
Jan 27 20:32:20 an-c03n01 attrd[843]: notice: attrd_cs_dispatch: Update relayed from an-c03n02.alteeve.ca
Jan 27 20:32:20 an-c03n01 attrd[843]: notice: attrd_trigger_update: Sending flush op to all hosts for: last-failure-drbd_r0 (1390872740)
Jan 27 20:32:20 an-c03n01 attrd[843]: notice: attrd_perform_update: Sent update 31: last-failure-drbd_r0=1390872740
Jan 27 20:32:20 an-c03n01 attrd[843]: notice: attrd_cs_dispatch: Update relayed from an-c03n02.alteeve.ca
Jan 27 20:32:20 an-c03n01 attrd[843]: notice: attrd_trigger_update: Sending flush op to all hosts for: fail-count-drbd_r0 (2)
Jan 27 20:32:20 an-c03n01 attrd[843]: notice: attrd_perform_update: Sent update 34: fail-count-drbd_r0=2
Jan 27 20:32:20 an-c03n01 attrd[843]: notice: attrd_cs_dispatch: Update relayed from an-c03n02.alteeve.ca
Jan 27 20:32:20 an-c03n01 attrd[843]: notice: attrd_trigger_update: Sending flush op to all hosts for: last-failure-drbd_r0 (1390872740)
Jan 27 20:32:20 an-c03n01 attrd[843]: notice: attrd_perform_update: Sent update 37: last-failure-drbd_r0=1390872740
Jan 27 20:32:20 an-c03n01 attrd[843]: notice: attrd_trigger_update: Sending flush op to all hosts for: master-drbd_r0 (10000)
Jan 27 20:32:20 an-c03n01 attrd[843]: notice: attrd_perform_update: Sent update 39: master-drbd_r0=10000
Jan 27 20:32:20 an-c03n01 crmd[845]: notice: process_lrm_event: LRM operation drbd_r0_notify_0 (call=31, rc=0, cib-update=0, confirmed=true) ok
Jan 27 20:32:20 an-c03n01 crmd[845]: notice: process_lrm_event: LRM operation drbd_r0_notify_0 (call=32, rc=0, cib-update=0, confirmed=true) ok
Jan 27 20:32:20 an-c03n01 crmd[845]: notice: process_lrm_event: LRM operation drbd_r0_demote_0 (call=33, rc=0, cib-update=20, confirmed=true) ok
Jan 27 20:32:20 an-c03n01 crmd[845]: notice: process_lrm_event: LRM operation drbd_r0_notify_0 (call=34, rc=0, cib-update=0, confirmed=true) ok
Jan 27 20:32:20 an-c03n01 crmd[845]: notice: process_lrm_event: LRM operation drbd_r0_notify_0 (call=35, rc=0, cib-update=0, confirmed=true) ok
Jan 27 20:32:20 an-c03n01 kernel: [19842.604453] drbd r0: Requested state change failed by peer: Refusing to be Primary while peer is not outdated (-7)
Jan 27 20:32:20 an-c03n01 kernel: [19842.605419] drbd r0: peer( Primary -> Unknown ) conn( Connected -> Disconnecting ) disk( UpToDate -> Outdated ) pdsk( UpToDate -> DUnknown )
Jan 27 20:32:20 an-c03n01 kernel: [19842.605458] drbd r0: asender terminated
Jan 27 20:32:20 an-c03n01 kernel: [19842.605460] drbd r0: Terminating drbd_a_r0
Jan 27 20:32:20 an-c03n01 kernel: [19842.605841] drbd r0: Connection closed
Jan 27 20:32:20 an-c03n01 kernel: [19842.605849] drbd r0: conn( Disconnecting -> StandAlone )
Jan 27 20:32:20 an-c03n01 kernel: [19842.605850] drbd r0: receiver terminated
Jan 27 20:32:20 an-c03n01 kernel: [19842.605860] drbd r0: Terminating drbd_r_r0
Jan 27 20:32:20 an-c03n01 kernel: [19842.605885] block drbd0: disk( Outdated -> Failed )
Jan 27 20:32:20 an-c03n01 kernel: [19842.617080] block drbd0: bitmap WRITE of 0 pages took 0 jiffies
Jan 27 20:32:20 an-c03n01 kernel: [19842.617085] block drbd0: 0 KB (0 bits) marked out-of-sync by on disk bit-map.
Jan 27 20:32:20 an-c03n01 kernel: [19842.617103] block drbd0: disk( Failed -> Diskless )
Jan 27 20:32:20 an-c03n01 kernel: [19842.617174] block drbd0: drbd_bm_resize called with capacity == 0
Jan 27 20:32:20 an-c03n01 kernel: [19842.617202] drbd r0: Terminating drbd_w_r0
Jan 27 20:32:20 an-c03n01 attrd[843]: notice: attrd_trigger_update: Sending flush op to all hosts for: master-drbd_r0 (<null>)
Jan 27 20:32:20 an-c03n01 crmd[845]: notice: process_lrm_event: LRM operation drbd_r0_stop_0 (call=36, rc=0, cib-update=21, confirmed=true) ok
Jan 27 20:32:20 an-c03n01 attrd[843]: notice: attrd_perform_update: Sent delete 43: node=1, attr=master-drbd_r0, id=<n/a>, set=(null), section=status
Jan 27 20:32:20 an-c03n01 attrd[843]: notice: attrd_perform_update: Sent delete 45: node=1, attr=master-drbd_r0, id=<n/a>, set=(null), section=status
Jan 27 20:32:21 an-c03n01 kernel: [19842.840388] drbd r0: Starting worker thread (from drbdsetup [1818])
Jan 27 20:32:21 an-c03n01 kernel: [19842.840614] block drbd0: disk( Diskless -> Attaching )
Jan 27 20:32:21 an-c03n01 kernel: [19842.840687] drbd r0: Method to ensure write ordering: drain
Jan 27 20:32:21 an-c03n01 kernel: [19842.840689] block drbd0: max BIO size = 1048576
Jan 27 20:32:21 an-c03n01 kernel: [19842.840692] block drbd0: Adjusting my ra_pages to backing device's (32 -> 1024)
Jan 27 20:32:21 an-c03n01 kernel: [19842.840694] block drbd0: drbd_bm_resize called with capacity == 41937592
Jan 27 20:32:21 an-c03n01 kernel: [19842.840770] block drbd0: resync bitmap: bits=5242199 words=81910 pages=160
Jan 27 20:32:21 an-c03n01 kernel: [19842.840772] block drbd0: size = 20 GB (20968796 KB)
Jan 27 20:32:21 an-c03n01 kernel: [19842.850197] block drbd0: bitmap READ of 160 pages took 10 jiffies
Jan 27 20:32:21 an-c03n01 kernel: [19842.850288] block drbd0: recounting of set bits took additional 0 jiffies
Jan 27 20:32:21 an-c03n01 kernel: [19842.850290] block drbd0: 0 KB (0 bits) marked out-of-sync by on disk bit-map.
Jan 27 20:32:21 an-c03n01 kernel: [19842.850295] block drbd0: disk( Attaching -> Outdated )
Jan 27 20:32:21 an-c03n01 kernel: [19842.850297] block drbd0: attached to UUIDs 853E72BBF0C9260C:0000000000000000:AA976D5345E69DAA:AA966D5345E69DAA
Jan 27 20:32:21 an-c03n01 kernel: [19842.856274] drbd r0: conn( StandAlone -> Unconnected )
Jan 27 20:32:21 an-c03n01 kernel: [19842.856311] drbd r0: Starting receiver thread (from drbd_w_r0 [1819])
Jan 27 20:32:21 an-c03n01 kernel: [19842.856332] drbd r0: receiver (re)started
Jan 27 20:32:21 an-c03n01 kernel: [19842.856340] drbd r0: conn( Unconnected -> WFConnection )
Jan 27 20:32:21 an-c03n01 crmd[845]: notice: process_lrm_event: LRM operation drbd_r0_start_0 (call=37, rc=0, cib-update=22, confirmed=true) ok
Jan 27 20:32:21 an-c03n01 crmd[845]: notice: process_lrm_event: LRM operation drbd_r0_notify_0 (call=38, rc=0, cib-update=0, confirmed=true) ok
Jan 27 20:32:21 an-c03n01 crmd[845]: notice: process_lrm_event: LRM operation drbd_r0_monitor_60000 (call=39, rc=0, cib-update=23, confirmed=false) ok
Jan 27 20:32:21 an-c03n01 kernel: [19843.356430] drbd r0: Handshake successful: Agreed network protocol version 101
Jan 27 20:32:21 an-c03n01 kernel: [19843.356432] drbd r0: Agreed to support TRIM on protocol level
Jan 27 20:32:21 an-c03n01 kernel: [19843.356473] drbd r0: conn( WFConnection -> WFReportParams )
Jan 27 20:32:21 an-c03n01 kernel: [19843.356475] drbd r0: Starting asender thread (from drbd_r_r0 [1829])
Jan 27 20:32:21 an-c03n01 kernel: [19843.362052] block drbd0: drbd_sync_handshake:
Jan 27 20:32:21 an-c03n01 kernel: [19843.362056] block drbd0: self 853E72BBF0C9260C:0000000000000000:AA976D5345E69DAA:AA966D5345E69DAA bits:0 flags:0
Jan 27 20:32:21 an-c03n01 kernel: [19843.362057] block drbd0: peer FD6969A6E17CBA41:853E72BBF0C9260D:AA976D5345E69DAA:AA966D5345E69DAA bits:0 flags:0
Jan 27 20:32:21 an-c03n01 kernel: [19843.362059] block drbd0: uuid_compare()=-1 by rule 50
Jan 27 20:32:21 an-c03n01 kernel: [19843.362063] block drbd0: peer( Unknown -> Primary ) conn( WFReportParams -> WFBitMapT ) pdsk( DUnknown -> UpToDate )
Jan 27 20:32:21 an-c03n01 kernel: [19843.365473] block drbd0: receive bitmap stats [Bytes(packets)]: plain 0(0), RLE 23(1), total 23; compression: 100.0%
Jan 27 20:32:21 an-c03n01 kernel: [19843.365579] block drbd0: send bitmap stats [Bytes(packets)]: plain 0(0), RLE 23(1), total 23; compression: 100.0%
Jan 27 20:32:21 an-c03n01 kernel: [19843.365583] block drbd0: conn( WFBitMapT -> WFSyncUUID )
Jan 27 20:32:21 an-c03n01 kernel: [19843.367483] block drbd0: updated sync uuid 853F72BBF0C9260C:0000000000000000:AA976D5345E69DAA:AA966D5345E69DAA
Jan 27 20:32:21 an-c03n01 kernel: [19843.367693] block drbd0: helper command: /sbin/drbdadm before-resync-target minor-0
Jan 27 20:32:21 an-c03n01 kernel: [19843.368877] block drbd0: helper command: /sbin/drbdadm before-resync-target minor-0 exit code 0 (0x0)
Jan 27 20:32:21 an-c03n01 kernel: [19843.368892] block drbd0: conn( WFSyncUUID -> SyncTarget ) disk( Outdated -> Inconsistent )
Jan 27 20:32:21 an-c03n01 kernel: [19843.368899] block drbd0: Began resync as SyncTarget (will sync 0 KB [0 bits set]).
Jan 27 20:32:21 an-c03n01 kernel: [19843.369304] block drbd0: Resync done (total 1 sec; paused 0 sec; 0 K/sec)
Jan 27 20:32:21 an-c03n01 kernel: [19843.369309] block drbd0: updated UUIDs FD6969A6E17CBA40:0000000000000000:853F72BBF0C9260C:853E72BBF0C9260D
Jan 27 20:32:21 an-c03n01 kernel: [19843.369313] block drbd0: conn( SyncTarget -> Connected ) disk( Inconsistent -> UpToDate )
Jan 27 20:32:21 an-c03n01 kernel: [19843.369433] block drbd0: helper command: /sbin/drbdadm after-resync-target minor-0
Jan 27 20:32:21 an-c03n01 crm-unfence-peer.sh[1900]: invoked for r0
Jan 27 20:32:21 an-c03n01 kernel: [19843.384987] block drbd0: helper command: /sbin/drbdadm after-resync-target minor-0 exit code 0 (0x0)
====

Enable logs from an-c03n02:

====
Jan 27 20:32:04 an-c03n02 cib[21128]: notice: cib:diff: Diff: --- 0.140.7
Jan 27 20:32:04 an-c03n02 cib[21128]: notice: cib:diff: Diff: +++ 0.141.1 fcc6dc293b799186774cfb583055eb9f
Jan 27 20:32:04 an-c03n02 cib[21128]: notice: cib:diff: -- <nvpair id="drbd_r0_Clone-meta_attributes-target-role" name="target-role" value="Stopped"/>
Jan 27 20:32:04 an-c03n02 cib[21128]: notice: cib:diff: ++ <cib admin_epoch="0" cib-last-written="Mon Jan 27 20:32:04 2014" crm_feature_set="3.0.7" epoch="141" have-quorum="1" num_updates="1" update-client="crm_resource" update-origin="an-c03n01.alteeve.ca" validate-with="pacemaker-1.2" dc-uuid="2"/>
Jan 27 20:32:04 an-c03n02 crmd[21133]: notice: do_state_transition: State transition S_IDLE -> S_POLICY_ENGINE [ input=I_PE_CALC cause=C_FSA_INTERNAL origin=abort_transition_graph ]
Jan 27 20:32:04 an-c03n02 pengine[21132]: notice: unpack_config: On loss of CCM Quorum: Ignore
Jan 27 20:32:04 an-c03n02 pengine[21132]: notice: LogActions: Start drbd_r0:0 (an-c03n01.alteeve.ca)
Jan 27 20:32:04 an-c03n02 pengine[21132]: notice: LogActions: Start drbd_r0:1 (an-c03n02.alteeve.ca)
Jan 27 20:32:04 an-c03n02 pengine[21132]: notice: process_pe_message: Calculated Transition 5: /var/lib/pacemaker/pengine/pe-input-169.bz2
Jan 27 20:32:04 an-c03n02 crmd[21133]: notice: te_rsc_command: Initiating action 11: start drbd_r0_start_0 on an-c03n01.alteeve.ca
Jan 27 20:32:04 an-c03n02 crmd[21133]: notice: te_rsc_command: Initiating action 13: start drbd_r0:1_start_0 on an-c03n02.alteeve.ca (local)
Jan 27 20:32:05 an-c03n02 kernel: [ 5235.451554] drbd r0: Starting worker thread (from drbdsetup [21714])
Jan 27 20:32:05 an-c03n02 kernel: [ 5235.452326] block drbd0: disk( Diskless -> Attaching )
Jan 27 20:32:05 an-c03n02 kernel: [ 5235.452402] drbd r0: Method to ensure write ordering: drain
Jan 27 20:32:05 an-c03n02 kernel: [ 5235.452404] block drbd0: max BIO size = 1048576
Jan 27 20:32:05 an-c03n02 kernel: [ 5235.452407] block drbd0: Adjusting my ra_pages to backing device's (32 -> 1024)
Jan 27 20:32:05 an-c03n02 kernel: [ 5235.452409] block drbd0: drbd_bm_resize called with capacity == 41937592
Jan 27 20:32:05 an-c03n02 kernel: [ 5235.452467] block drbd0: resync bitmap: bits=5242199 words=81910 pages=160
Jan 27 20:32:05 an-c03n02 kernel: [ 5235.452469] block drbd0: size = 20 GB (20968796 KB)
Jan 27 20:32:05 an-c03n02 kernel: [ 5235.453954] block drbd0: bitmap READ of 160 pages took 1 jiffies
Jan 27 20:32:05 an-c03n02 kernel: [ 5235.454067] block drbd0: recounting of set bits took additional 1 jiffies
Jan 27 20:32:05 an-c03n02 kernel: [ 5235.454069] block drbd0: 0 KB (0 bits) marked out-of-sync by on disk bit-map.
Jan 27 20:32:05 an-c03n02 kernel: [ 5235.454073] block drbd0: disk( Attaching -> Consistent )
Jan 27 20:32:05 an-c03n02 kernel: [ 5235.454076] block drbd0: attached to UUIDs AA966D5345E69DAA:0000000000000000:4F366962CD263E3C:4F356962CD263E3D
Jan 27 20:32:05 an-c03n02 kernel: [ 5235.460539] drbd r0: conn( StandAlone -> Unconnected )
Jan 27 20:32:05 an-c03n02 kernel: [ 5235.460598] drbd r0: Starting receiver thread (from drbd_w_r0 [21715])
Jan 27 20:32:05 an-c03n02 kernel: [ 5235.461937] drbd r0: receiver (re)started
Jan 27 20:32:05 an-c03n02 kernel: [ 5235.461957] drbd r0: conn( Unconnected -> WFConnection )
Jan 27 20:32:05 an-c03n02 attrd[21131]: notice: attrd_trigger_update: Sending flush op to all hosts for: master-drbd_r0 (5)
Jan 27 20:32:05 an-c03n02 attrd[21131]: notice: attrd_perform_update: Sent update 24: master-drbd_r0=5
Jan 27 20:32:05 an-c03n02 crmd[21133]: notice: process_lrm_event: LRM operation drbd_r0_start_0 (call=27, rc=0, cib-update=40, confirmed=true) ok
Jan 27 20:32:05 an-c03n02 crmd[21133]: notice: te_rsc_command: Initiating action 44: notify drbd_r0_post_notify_start_0 on an-c03n01.alteeve.ca
Jan 27 20:32:05 an-c03n02 crmd[21133]: notice: te_rsc_command: Initiating action 45: notify drbd_r0:1_post_notify_start_0 on an-c03n02.alteeve.ca (local)
Jan 27 20:32:05 an-c03n02 crmd[21133]: notice: process_lrm_event: LRM operation drbd_r0_notify_0 (call=28, rc=0, cib-update=0, confirmed=true) ok
Jan 27 20:32:05 an-c03n02 crmd[21133]: notice: run_graph: Transition 5 (Complete=10, Pending=0, Fired=0, Skipped=2, Incomplete=0, Source=/var/lib/pacemaker/pengine/pe-input-169.bz2): Stopped
Jan 27 20:32:05 an-c03n02 pengine[21132]: notice: unpack_config: On loss of CCM Quorum: Ignore
Jan 27 20:32:05 an-c03n02 pengine[21132]: notice: LogActions: Promote drbd_r0:0 (Slave -> Master an-c03n02.alteeve.ca)
Jan 27 20:32:05 an-c03n02 pengine[21132]: notice: LogActions: Promote drbd_r0:1 (Slave -> Master an-c03n01.alteeve.ca)
Jan 27 20:32:05 an-c03n02 pengine[21132]: notice: process_pe_message: Calculated Transition 6: /var/lib/pacemaker/pengine/pe-input-170.bz2
Jan 27 20:32:05 an-c03n02 crmd[21133]: notice: te_rsc_command: Initiating action 52: notify drbd_r0_pre_notify_promote_0 on an-c03n02.alteeve.ca (local)
Jan 27 20:32:05 an-c03n02 crmd[21133]: notice: te_rsc_command: Initiating action 54: notify drbd_r0_pre_notify_promote_0 on an-c03n01.alteeve.ca
Jan 27 20:32:05 an-c03n02 crmd[21133]: notice: process_lrm_event: LRM operation drbd_r0_notify_0 (call=29, rc=0, cib-update=0, confirmed=true) ok
Jan 27 20:32:05 an-c03n02 crmd[21133]: notice: te_rsc_command: Initiating action 13: promote drbd_r0_promote_0 on an-c03n02.alteeve.ca (local)
Jan 27 20:32:05 an-c03n02 crmd[21133]: notice: te_rsc_command: Initiating action 16: promote drbd_r0_promote_0 on an-c03n01.alteeve.ca
Jan 27 20:32:05 an-c03n02 kernel: [ 5235.599706] drbd r0: helper command: /sbin/drbdadm fence-peer r0
Jan 27 20:32:05 an-c03n02 crm-fence-peer.sh[21814]: invoked for r0
Jan 27 20:32:05 an-c03n02 crmd[21133]: notice: handle_request: Current ping state: S_TRANSITION_ENGINE
Jan 27 20:32:05 an-c03n02 cibadmin[21846]: notice: crm_log_args: Invoked: cibadmin -C -o constraints -X <rsc_location rsc="drbd_r0_Clone" id="drbd-fence-by-handler-r0-drbd_r0_Clone"> <rule role="Master" score="-INFINITY" id="drbd-fence-by-handler-r0-rule-drbd_r0_Clone"> <expression attribute="#uname" operation="ne" value="an-c03n02.alteeve.ca" id="drbd-fence-by-handler-r0-expr-drbd_r0_Clone"/> </rule> </rsc_location>
Jan 27 20:32:05 an-c03n02 cib[21128]: notice: cib:diff: Diff: --- 0.141.5
Jan 27 20:32:05 an-c03n02 cib[21128]: notice: cib:diff: Diff: +++ 0.142.1 c0646876db9897523b58236bb6890452
Jan 27 20:32:05 an-c03n02 cib[21128]: notice: cib:diff: -- <cib admin_epoch="0" epoch="141" num_updates="5"/>
Jan 27 20:32:05 an-c03n02 cib[21128]: notice: cib:diff: ++ <rsc_location rsc="drbd_r0_Clone" id="drbd-fence-by-handler-r0-drbd_r0_Clone">
Jan 27 20:32:05 an-c03n02 cib[21128]: notice: cib:diff: ++ <rule role="Master" score="-INFINITY" id="drbd-fence-by-handler-r0-rule-drbd_r0_Clone">
Jan 27 20:32:05 an-c03n02 cib[21128]: notice: cib:diff: ++ <expression attribute="#uname" operation="ne" value="an-c03n02.alteeve.ca" id="drbd-fence-by-handler-r0-expr-drbd_r0_Clone"/>
Jan 27 20:32:05 an-c03n02 cib[21128]: notice: cib:diff: ++ </rule>
Jan 27 20:32:05 an-c03n02 cib[21128]: notice: cib:diff: ++ </rsc_location>
Jan 27 20:32:05 an-c03n02 stonith-ng[21129]: notice: unpack_config: On loss of CCM Quorum: Ignore
Jan 27 20:32:05 an-c03n02 crm-fence-peer.sh[21814]: INFO peer is reachable, my disk is Consistent: placed constraint 'drbd-fence-by-handler-r0-drbd_r0_Clone'
Jan 27 20:32:05 an-c03n02 cib[21128]: warning: update_results: Action cib_create failed: Name not unique on network (cde=-76)
Jan 27 20:32:05 an-c03n02 cib[21128]: error: cib_process_create: CIB Update failures <failed>
Jan 27 20:32:05 an-c03n02 cib[21128]: error: cib_process_create: CIB Update failures <failed_update id="drbd-fence-by-handler-r0-drbd_r0_Clone" object_type="rsc_location" operation="cib_create" reason="Name not unique on network">
Jan 27 20:32:05 an-c03n02 cib[21128]: error: cib_process_create: CIB Update failures <rsc_location rsc="drbd_r0_Clone" id="drbd-fence-by-handler-r0-drbd_r0_Clone">
Jan 27 20:32:05 an-c03n02 cib[21128]: error: cib_process_create: CIB Update failures <rule role="Master" score="-INFINITY" id="drbd-fence-by-handler-r0-rule-drbd_r0_Clone">
Jan 27 20:32:05 an-c03n02 cib[21128]: error: cib_process_create: CIB Update failures <expression attribute="#uname" operation="ne" value="an-c03n01.alteeve.ca" id="drbd-fence-by-handler-r0-expr-drbd_r0_Clone"/>
Jan 27 20:32:05 an-c03n02 cib[21128]: error: cib_process_create: CIB Update failures </rule>
Jan 27 20:32:05 an-c03n02 cib[21128]: error: cib_process_create: CIB Update failures </rsc_location>
Jan 27 20:32:05 an-c03n02 cib[21128]: error: cib_process_create: CIB Update failures </failed_update>
Jan 27 20:32:05 an-c03n02 cib[21128]: error: cib_process_create: CIB Update failures </failed>
Jan 27 20:32:05 an-c03n02 cib[21128]: warning: cib_process_request: Completed cib_create operation for section constraints: Name not unique on network (rc=-76, origin=an-c03n01.alteeve.ca/cibadmin/2, version=0.142.1)
Jan 27 20:32:05 an-c03n02 kernel: [ 5235.651646] drbd r0: helper command: /sbin/drbdadm fence-peer r0 exit code 4 (0x400)
Jan 27 20:32:05 an-c03n02 kernel: [ 5235.651650] drbd r0: fence-peer helper returned 4 (peer was fenced)
Jan 27 20:32:05 an-c03n02 kernel: [ 5235.651660] drbd r0: pdsk( DUnknown -> Outdated )
Jan 27 20:32:05 an-c03n02 kernel: [ 5235.651666] block drbd0: role( Secondary -> Primary ) disk( Consistent -> UpToDate )
Jan 27 20:32:05 an-c03n02 kernel: [ 5235.651876] block drbd0: new current UUID 853E72BBF0C9260D:AA966D5345E69DAA:4F366962CD263E3C:4F356962CD263E3D
Jan 27 20:32:05 an-c03n02 crmd[21133]: notice: process_lrm_event: LRM operation drbd_r0_promote_0 (call=30, rc=0, cib-update=42, confirmed=true) ok
Jan 27 20:32:05 an-c03n02 stonith-ng[21129]: notice: stonith_device_register: Added 'fence_n01_virsh' to the device list (2 active devices)
Jan 27 20:32:05 an-c03n02 stonith-ng[21129]: notice: stonith_device_register: Device 'fence_n02_virsh' already existed in device list (2 active devices)
Jan 27 20:32:05 an-c03n02 kernel: [ 5235.962021] drbd r0: Handshake successful: Agreed network protocol version 101
Jan 27 20:32:05 an-c03n02 kernel: [ 5235.962023] drbd r0: Agreed to support TRIM on protocol level
Jan 27 20:32:05 an-c03n02 kernel: [ 5235.962069] drbd r0: conn( WFConnection -> WFReportParams )
Jan 27 20:32:05 an-c03n02 kernel: [ 5235.962072] drbd r0: Starting asender thread (from drbd_r_r0 [21724])
Jan 27 20:32:05 an-c03n02 kernel: [ 5235.968085] block drbd0: drbd_sync_handshake:
Jan 27 20:32:05 an-c03n02 kernel: [ 5235.968090] block drbd0: self 853E72BBF0C9260D:AA966D5345E69DAA:4F366962CD263E3C:4F356962CD263E3D bits:0 flags:0
Jan 27 20:32:05 an-c03n02 kernel: [ 5235.968092] block drbd0: peer AA966D5345E69DAA:0000000000000000:4F366962CD263E3D:4F356962CD263E3D bits:0 flags:0
Jan 27 20:32:05 an-c03n02 kernel: [ 5235.968094] block drbd0: uuid_compare()=1 by rule 70
Jan 27 20:32:05 an-c03n02 kernel: [ 5235.968100] block drbd0: peer( Unknown -> Secondary ) conn( WFReportParams -> WFBitMapS ) pdsk( Outdated -> Consistent )
Jan 27 20:32:05 an-c03n02 kernel: [ 5235.968256] block drbd0: send bitmap stats [Bytes(packets)]: plain 0(0), RLE 23(1), total 23; compression: 100.0%
Jan 27 20:32:05 an-c03n02 kernel: [ 5235.971293] block drbd0: receive bitmap stats [Bytes(packets)]: plain 0(0), RLE 23(1), total 23; compression: 100.0%
Jan 27 20:32:05 an-c03n02 kernel: [ 5235.971299] block drbd0: helper command: /sbin/drbdadm before-resync-source minor-0
Jan 27 20:32:05 an-c03n02 kernel: [ 5235.972381] block drbd0: helper command: /sbin/drbdadm before-resync-source minor-0 exit code 0 (0x0)
Jan 27 20:32:05 an-c03n02 kernel: [ 5235.972395] block drbd0: conn( WFBitMapS -> SyncSource ) pdsk( Consistent -> Inconsistent )
Jan 27 20:32:05 an-c03n02 kernel: [ 5235.972402] block drbd0: Began resync as SyncSource (will sync 0 KB [0 bits set]).
Jan 27 20:32:05 an-c03n02 kernel: [ 5235.972433] block drbd0: updated sync UUID 853E72BBF0C9260D:AA976D5345E69DAA:AA966D5345E69DAA:4F366962CD263E3C
Jan 27 20:32:05 an-c03n02 cib[21128]: notice: cib:diff: Diff: --- 0.142.2
Jan 27 20:32:05 an-c03n02 cib[21128]: notice: cib:diff: Diff: +++ 0.143.1 fbd603d69e81ccfe94726267b74d5322
Jan 27 20:32:05 an-c03n02 cib[21128]: notice: cib:diff: -- <rsc_location rsc="drbd_r0_Clone" id="drbd-fence-by-handler-r0-drbd_r0_Clone">
Jan 27 20:32:05 an-c03n02 cib[21128]: notice: cib:diff: -- <rule role="Master" score="-INFINITY" id="drbd-fence-by-handler-r0-rule-drbd_r0_Clone">
Jan 27 20:32:05 an-c03n02 cib[21128]: notice: cib:diff: -- <expression attribute="#uname" operation="ne" value="an-c03n02.alteeve.ca" id="drbd-fence-by-handler-r0-expr-drbd_r0_Clone"/>
Jan 27 20:32:05 an-c03n02 cib[21128]: notice: cib:diff: -- </rule>
Jan 27 20:32:05 an-c03n02 cib[21128]: notice: cib:diff: -- </rsc_location>
Jan 27 20:32:05 an-c03n02 cib[21128]: notice: cib:diff: ++ <cib admin_epoch="0" cib-last-written="Mon Jan 27 20:32:05 2014" crm_feature_set="3.0.7" epoch="143" have-quorum="1" num_updates="1" update-client="cibadmin" update-origin="an-c03n01.alteeve.ca" validate-with="pacemaker-1.2" dc-uuid="2"/>
Jan 27 20:32:05 an-c03n02 stonith-ng[21129]: notice: unpack_config: On loss of CCM Quorum: Ignore
Jan 27 20:32:05 an-c03n02 kernel: [ 5236.007605] block drbd0: Resync done (total 1 sec; paused 0 sec; 0 K/sec)
Jan 27 20:32:05 an-c03n02 kernel: [ 5236.007612] block drbd0: updated UUIDs 853E72BBF0C9260D:0000000000000000:AA976D5345E69DAA:AA966D5345E69DAA
Jan 27 20:32:05 an-c03n02 kernel: [ 5236.007618] block drbd0: conn( SyncSource -> Connected ) pdsk( Inconsistent -> UpToDate )
Jan 27 20:32:05 an-c03n02 stonith-ng[21129]: notice: stonith_device_register: Added 'fence_n01_virsh' to the device list (2 active devices)
Jan 27 20:32:05 an-c03n02 stonith-ng[21129]: notice: stonith_device_register: Device 'fence_n02_virsh' already existed in device list (2 active devices)
Jan 27 20:32:20 an-c03n02 crmd[21133]: warning: status_from_rc: Action 16 (drbd_r0_promote_0) on an-c03n01.alteeve.ca failed (target: 0 vs. rc: 1): Error
Jan 27 20:32:20 an-c03n02 crmd[21133]: warning: update_failcount: Updating failcount for drbd_r0 on an-c03n01.alteeve.ca after failed promote: rc=1 (update=value++, time=1390872740)
Jan 27 20:32:20 an-c03n02 crmd[21133]: warning: update_failcount: Updating failcount for drbd_r0 on an-c03n01.alteeve.ca after failed promote: rc=1 (update=value++, time=1390872740)
Jan 27 20:32:20 an-c03n02 crmd[21133]: notice: te_rsc_command: Initiating action 53: notify drbd_r0_post_notify_promote_0 on an-c03n02.alteeve.ca (local)
Jan 27 20:32:20 an-c03n02 crmd[21133]: notice: te_rsc_command: Initiating action 55: notify drbd_r0_post_notify_promote_0 on an-c03n01.alteeve.ca
Jan 27 20:32:20 an-c03n02 attrd[21131]: notice: attrd_trigger_update: Sending flush op to all hosts for: master-drbd_r0 (10000)
Jan 27 20:32:20 an-c03n02 attrd[21131]: notice: attrd_perform_update: Sent update 32: master-drbd_r0=10000
Jan 27 20:32:20 an-c03n02 crmd[21133]: notice: process_lrm_event: LRM operation drbd_r0_notify_0 (call=31, rc=0, cib-update=0, confirmed=true) ok
Jan 27 20:32:20 an-c03n02 crmd[21133]: notice: run_graph: Transition 6 (Complete=12, Pending=0, Fired=0, Skipped=0, Incomplete=0, Source=/var/lib/pacemaker/pengine/pe-input-170.bz2): Complete
Jan 27 20:32:20 an-c03n02 pengine[21132]: notice: unpack_config: On loss of CCM Quorum: Ignore
Jan 27 20:32:20 an-c03n02 pengine[21132]: warning: unpack_rsc_op: Processing failed op promote for drbd_r0:1 on an-c03n01.alteeve.ca: unknown error (1)
Jan 27 20:32:20 an-c03n02 pengine[21132]: notice: LogActions: Demote drbd_r0:1 (Master -> Slave an-c03n01.alteeve.ca)
Jan 27 20:32:20 an-c03n02 pengine[21132]: notice: LogActions: Recover drbd_r0:1 (Master an-c03n01.alteeve.ca)
Jan 27 20:32:20 an-c03n02 pengine[21132]: notice: process_pe_message: Calculated Transition 7: /var/lib/pacemaker/pengine/pe-input-171.bz2
Jan 27 20:32:20 an-c03n02 crmd[21133]: notice: te_rsc_command: Initiating action 55: notify drbd_r0_pre_notify_demote_0 on an-c03n02.alteeve.ca (local)
Jan 27 20:32:20 an-c03n02 crmd[21133]: notice: te_rsc_command: Initiating action 57: notify drbd_r0_pre_notify_demote_0 on an-c03n01.alteeve.ca
Jan 27 20:32:20 an-c03n02 crmd[21133]: notice: process_lrm_event: LRM operation drbd_r0_notify_0 (call=32, rc=0, cib-update=0, confirmed=true) ok
Jan 27 20:32:20 an-c03n02 crmd[21133]: notice: te_rsc_command: Initiating action 16: demote drbd_r0_demote_0 on an-c03n01.alteeve.ca
Jan 27 20:32:20 an-c03n02 crmd[21133]: notice: te_rsc_command: Initiating action 56: notify drbd_r0_post_notify_demote_0 on an-c03n02.alteeve.ca (local)
Jan 27 20:32:20 an-c03n02 crmd[21133]: notice: te_rsc_command: Initiating action 58: notify drbd_r0_post_notify_demote_0 on an-c03n01.alteeve.ca
Jan 27 20:32:20 an-c03n02 crmd[21133]: notice: process_lrm_event: LRM operation drbd_r0_notify_0 (call=33, rc=0, cib-update=0, confirmed=true) ok
Jan 27 20:32:20 an-c03n02 crmd[21133]: notice: te_rsc_command: Initiating action 48: notify drbd_r0_pre_notify_stop_0 on an-c03n02.alteeve.ca (local)
Jan 27 20:32:20 an-c03n02 crmd[21133]: notice: te_rsc_command: Initiating action 50: notify drbd_r0_pre_notify_stop_0 on an-c03n01.alteeve.ca
Jan 27 20:32:20 an-c03n02 crmd[21133]: notice: process_lrm_event: LRM operation drbd_r0_notify_0 (call=34, rc=0, cib-update=0, confirmed=true) ok
Jan 27 20:32:20 an-c03n02 crmd[21133]: notice: te_rsc_command: Initiating action 3: stop drbd_r0_stop_0 on an-c03n01.alteeve.ca
Jan 27 20:32:20 an-c03n02 kernel: [ 5250.969170] block drbd0: State change failed: Refusing to be Primary while peer is not outdated
Jan 27 20:32:20 an-c03n02 kernel: [ 5250.969190] block drbd0: state = { cs:Connected ro:Primary/Secondary ds:UpToDate/UpToDate r----- }
Jan 27 20:32:20 an-c03n02 kernel: [ 5250.969196] block drbd0: wanted = { cs:TearDown ro:Primary/Unknown ds:UpToDate/DUnknown s---F- }
Jan 27 20:32:20 an-c03n02 kernel: [ 5250.969201] drbd r0: State change failed: Refusing to be Primary while peer is not outdated
Jan 27 20:32:20 an-c03n02 kernel: [ 5250.969205] drbd r0: mask = 0x1f0 val = 0x70
Jan 27 20:32:20 an-c03n02 kernel: [ 5250.969218] drbd r0: old_conn:WFReportParams wanted_conn:TearDown
Jan 27 20:32:20 an-c03n02 kernel: [ 5250.969396] drbd r0: peer( Secondary -> Unknown ) conn( Connected -> TearDown ) pdsk( UpToDate -> Outdated )
Jan 27 20:32:20 an-c03n02 kernel: [ 5250.969407] drbd r0: asender terminated
Jan 27 20:32:20 an-c03n02 kernel: [ 5250.969408] drbd r0: Terminating drbd_a_r0
Jan 27 20:32:20 an-c03n02 kernel: [ 5250.969457] block drbd0: new current UUID FD6969A6E17CBA41:853E72BBF0C9260D:AA976D5345E69DAA:AA966D5345E69DAA
Jan 27 20:32:20 an-c03n02 kernel: [ 5250.969708] drbd r0: Connection closed
Jan 27 20:32:20 an-c03n02 kernel: [ 5250.969717] drbd r0: conn( TearDown -> Unconnected )
Jan 27 20:32:20 an-c03n02 kernel: [ 5250.969718] drbd r0: receiver terminated
Jan 27 20:32:20 an-c03n02 kernel: [ 5250.969719] drbd r0: Restarting receiver thread
Jan 27 20:32:20 an-c03n02 kernel: [ 5250.969720] drbd r0: receiver (re)started
Jan 27 20:32:20 an-c03n02 kernel: [ 5250.969725] drbd r0: conn( Unconnected -> WFConnection )
Jan 27 20:32:20 an-c03n02 crmd[21133]: notice: te_rsc_command: Initiating action 49: notify drbd_r0_post_notify_stop_0 on an-c03n02.alteeve.ca (local)
Jan 27 20:32:20 an-c03n02 crmd[21133]: notice: process_lrm_event: LRM operation drbd_r0_notify_0 (call=35, rc=0, cib-update=0, confirmed=true) ok
Jan 27 20:32:20 an-c03n02 crmd[21133]: notice: run_graph: Transition 7 (Complete=21, Pending=0, Fired=0, Skipped=7, Incomplete=5, Source=/var/lib/pacemaker/pengine/pe-input-171.bz2): Stopped
Jan 27 20:32:20 an-c03n02 pengine[21132]: notice: unpack_config: On loss of CCM Quorum: Ignore
Jan 27 20:32:20 an-c03n02 pengine[21132]: warning: unpack_rsc_op: Processing failed op promote for drbd_r0:1 on an-c03n01.alteeve.ca: unknown error (1)
Jan 27 20:32:20 an-c03n02 pengine[21132]: notice: LogActions: Start drbd_r0:1 (an-c03n01.alteeve.ca)
Jan 27 20:32:20 an-c03n02 pengine[21132]: notice: process_pe_message: Calculated Transition 8: /var/lib/pacemaker/pengine/pe-input-172.bz2
Jan 27 20:32:20 an-c03n02 crmd[21133]: notice: te_rsc_command: Initiating action 44: notify drbd_r0_pre_notify_start_0 on an-c03n02.alteeve.ca (local)
Jan 27 20:32:20 an-c03n02 crmd[21133]: notice: process_lrm_event: LRM operation drbd_r0_notify_0 (call=36, rc=0, cib-update=0, confirmed=true) ok
Jan 27 20:32:20 an-c03n02 crmd[21133]: notice: te_rsc_command: Initiating action 15: start drbd_r0_start_0 on an-c03n01.alteeve.ca
Jan 27 20:32:21 an-c03n02 crmd[21133]: notice: te_rsc_command: Initiating action 45: notify drbd_r0_post_notify_start_0 on an-c03n02.alteeve.ca (local)
Jan 27 20:32:21 an-c03n02 crmd[21133]: notice: te_rsc_command: Initiating action 46: notify drbd_r0_post_notify_start_0 on an-c03n01.alteeve.ca
Jan 27 20:32:21 an-c03n02 crmd[21133]: notice: process_lrm_event: LRM operation drbd_r0_notify_0 (call=37, rc=0, cib-update=0, confirmed=true) ok
Jan 27 20:32:21 an-c03n02 crmd[21133]: notice: te_rsc_command: Initiating action 16: monitor drbd_r0_monitor_60000 on an-c03n01.alteeve.ca
Jan 27 20:32:21 an-c03n02 crmd[21133]: notice: run_graph: Transition 8 (Complete=11, Pending=0, Fired=0, Skipped=0, Incomplete=0, Source=/var/lib/pacemaker/pengine/pe-input-172.bz2): Complete
Jan 27 20:32:21 an-c03n02 crmd[21133]: notice: do_state_transition: State transition S_TRANSITION_ENGINE -> S_IDLE [ input=I_TE_SUCCESS cause=C_FSA_INTERNAL origin=notify_crmd ]
Jan 27 20:32:21 an-c03n02 kernel: [ 5251.721118] drbd r0: Handshake successful: Agreed network protocol version 101
Jan 27 20:32:21 an-c03n02 kernel: [ 5251.721120] drbd r0: Agreed to support TRIM on protocol level
Jan 27 20:32:21 an-c03n02 kernel: [ 5251.721145] drbd r0: conn( WFConnection -> WFReportParams )
Jan 27 20:32:21 an-c03n02 kernel: [ 5251.721146] drbd r0: Starting asender thread (from drbd_r_r0 [21724])
Jan 27 20:32:21 an-c03n02 kernel: [ 5251.730101] block drbd0: drbd_sync_handshake:
Jan 27 20:32:21 an-c03n02 kernel: [ 5251.730104] block drbd0: self FD6969A6E17CBA41:853E72BBF0C9260D:AA976D5345E69DAA:AA966D5345E69DAA bits:0 flags:0
Jan 27 20:32:21 an-c03n02 kernel: [ 5251.730106] block drbd0: peer 853E72BBF0C9260C:0000000000000000:AA976D5345E69DAA:AA966D5345E69DAA bits:0 flags:0
Jan 27 20:32:21 an-c03n02 kernel: [ 5251.730107] block drbd0: uuid_compare()=1 by rule 70
Jan 27 20:32:21 an-c03n02 kernel: [ 5251.730111] block drbd0: peer( Unknown -> Secondary ) conn( WFReportParams -> WFBitMapS ) pdsk( Outdated -> Consistent )
Jan 27 20:32:21 an-c03n02 kernel: [ 5251.730229] block drbd0: send bitmap stats [Bytes(packets)]: plain 0(0), RLE 23(1), total 23; compression: 100.0%
Jan 27 20:32:21 an-c03n02 kernel: [ 5251.730496] block drbd0: receive bitmap stats [Bytes(packets)]: plain 0(0), RLE 23(1), total 23; compression: 100.0%
Jan 27 20:32:21 an-c03n02 kernel: [ 5251.730499] block drbd0: helper command: /sbin/drbdadm before-resync-source minor-0
Jan 27 20:32:21 an-c03n02 kernel: [ 5251.731835] block drbd0: helper command: /sbin/drbdadm before-resync-source minor-0 exit code 0 (0x0)
Jan 27 20:32:21 an-c03n02 kernel: [ 5251.731848] block drbd0: conn( WFBitMapS -> SyncSource ) pdsk( Consistent -> Inconsistent )
Jan 27 20:32:21 an-c03n02 kernel: [ 5251.731861] block drbd0: Began resync as SyncSource (will sync 0 KB [0 bits set]).
Jan 27 20:32:21 an-c03n02 kernel: [ 5251.731888] block drbd0: updated sync UUID FD6969A6E17CBA41:853F72BBF0C9260D:853E72BBF0C9260D:AA976D5345E69DAA
Jan 27 20:32:21 an-c03n02 kernel: [ 5251.750241] block drbd0: Resync done (total 1 sec; paused 0 sec; 0 K/sec)
Jan 27 20:32:21 an-c03n02 kernel: [ 5251.750248] block drbd0: updated UUIDs FD6969A6E17CBA41:0000000000000000:853F72BBF0C9260D:853E72BBF0C9260D
Jan 27 20:32:21 an-c03n02 kernel: [ 5251.750253] block drbd0: conn( SyncSource -> Connected ) pdsk( Inconsistent -> UpToDate )
====

What?
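In case it helps anyone reproduce or poke at this, the fencing constraint can be inspected and removed by hand with the same sort of cibadmin calls the handler scripts log above (the constraint ID is the one from those logs; the exact pcs syntax may differ on other versions, so treat this as a rough sketch rather than a recipe):

====
# Look for the constraint crm-fence-peer.sh placed (ID taken from the logs above)
cibadmin -Q -o constraints | grep drbd-fence-by-handler

# Delete it by hand, the same call crm-unfence-peer.sh makes
cibadmin -D -X '<rsc_location rsc="drbd_r0_Clone" id="drbd-fence-by-handler-r0-drbd_r0_Clone"/>'

# Then clear the failed promote so pacemaker will try again
pcs resource cleanup drbd_r0_Clone
====

None of that explains why the constraint gets placed on a freshly re-enabled resource in the first place, though.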
After about a minute, things sort of clear up:

====
[root@an-c03n02 ~]# pcs status
Cluster name: an-cluster-03
Last updated: Mon Jan 27 20:34:37 2014
Last change: Mon Jan 27 20:32:05 2014 via cibadmin on an-c03n01.alteeve.ca
Stack: corosync
Current DC: an-c03n02.alteeve.ca (2) - partition with quorum
Version: 1.1.10-19.el7-368c726
2 Nodes configured
4 Resources configured

Online: [ an-c03n01.alteeve.ca an-c03n02.alteeve.ca ]

Full list of resources:

 fence_n01_virsh    (stonith:fence_virsh):  Started an-c03n01.alteeve.ca
 fence_n02_virsh    (stonith:fence_virsh):  Started an-c03n02.alteeve.ca
 Master/Slave Set: drbd_r0_Clone [drbd_r0]
     Masters: [ an-c03n01.alteeve.ca an-c03n02.alteeve.ca ]

Failed actions:
    drbd_r0_promote_0 on an-c03n01.alteeve.ca 'unknown error' (1): call=30, status=complete, last-rc-change='Mon Jan 27 20:32:05 2014', queued=15187ms, exec=0ms

PCSD Status:
  an-c03n01.alteeve.ca:
    an-c03n01.alteeve.ca: Online
  an-c03n02.alteeve.ca:
    an-c03n02.alteeve.ca: Online

Daemon Status:
  corosync: active/disabled
  pacemaker: active/disabled
  pcsd: active/enabled
====

Post-enable logs from an-c03n01:

====
Jan 27 20:33:21 an-c03n01 attrd[843]: notice: attrd_trigger_update: Sending flush op to all hosts for: master-drbd_r0 (10000)
Jan 27 20:33:21 an-c03n01 attrd[843]: notice: attrd_perform_update: Sent update 48: master-drbd_r0=10000
Jan 27 20:33:21 an-c03n01 crmd[845]: notice: process_lrm_event: LRM operation drbd_r0_notify_0 (call=41, rc=0, cib-update=0, confirmed=true) ok
Jan 27 20:33:21 an-c03n01 crmd[845]: notice: process_lrm_event: LRM operation drbd_r0_notify_0 (call=42, rc=0, cib-update=0, confirmed=true) ok
Jan 27 20:33:21 an-c03n01 kernel: [19903.079190] block drbd0: role( Secondary -> Primary )
Jan 27 20:33:21 an-c03n01 crmd[845]: notice: process_lrm_event: LRM operation drbd_r0_promote_0 (call=43, rc=0, cib-update=25, confirmed=true) ok
Jan 27 20:33:21 an-c03n01 crmd[845]: notice: process_lrm_event: LRM operation drbd_r0_notify_0 (call=44, rc=0, cib-update=0, confirmed=true) ok
====

Post-enable logs from an-c03n02:

====
Jan 27 20:33:21 an-c03n02 crmd[21133]: notice: do_state_transition: State transition S_IDLE -> S_POLICY_ENGINE [ input=I_PE_CALC cause=C_FSA_INTERNAL origin=abort_transition_graph ]
Jan 27 20:33:21 an-c03n02 pengine[21132]: notice: unpack_config: On loss of CCM Quorum: Ignore
Jan 27 20:33:21 an-c03n02 pengine[21132]: warning: unpack_rsc_op: Processing failed op promote for drbd_r0:1 on an-c03n01.alteeve.ca: unknown error (1)
Jan 27 20:33:21 an-c03n02 pengine[21132]: notice: LogActions: Promote drbd_r0:1 (Slave -> Master an-c03n01.alteeve.ca)
Jan 27 20:33:21 an-c03n02 pengine[21132]: notice: process_pe_message: Calculated Transition 9: /var/lib/pacemaker/pengine/pe-input-173.bz2
Jan 27 20:33:21 an-c03n02 crmd[21133]: notice: te_rsc_command: Initiating action 3: cancel drbd_r0_cancel_60000 on an-c03n01.alteeve.ca
Jan 27 20:33:21 an-c03n02 crmd[21133]: notice: te_rsc_command: Initiating action 53: notify drbd_r0_pre_notify_promote_0 on an-c03n02.alteeve.ca (local)
Jan 27 20:33:21 an-c03n02 crmd[21133]: notice: te_rsc_command: Initiating action 55: notify drbd_r0_pre_notify_promote_0 on an-c03n01.alteeve.ca
Jan 27 20:33:21 an-c03n02 crmd[21133]: notice: process_lrm_event: LRM operation drbd_r0_notify_0 (call=38, rc=0, cib-update=0, confirmed=true) ok
Jan 27 20:33:21 an-c03n02 crmd[21133]: notice: run_graph: Transition 9 (Complete=4, Pending=0, Fired=0, Skipped=3, Incomplete=5, Source=/var/lib/pacemaker/pengine/pe-input-173.bz2): Stopped
Jan 27 20:33:21 an-c03n02 pengine[21132]: notice: unpack_config: On loss of CCM Quorum: Ignore
Jan 27 20:33:21 an-c03n02 pengine[21132]: warning: unpack_rsc_op: Processing failed op promote for drbd_r0:1 on an-c03n01.alteeve.ca: unknown error (1)
Jan 27 20:33:21 an-c03n02 pengine[21132]: notice: LogActions: Promote drbd_r0:1 (Slave -> Master an-c03n01.alteeve.ca)
Jan 27 20:33:21 an-c03n02 pengine[21132]: notice: process_pe_message: Calculated Transition 10: /var/lib/pacemaker/pengine/pe-input-174.bz2
Jan 27 20:33:21 an-c03n02 crmd[21133]: notice: te_rsc_command: Initiating action 52: notify drbd_r0_pre_notify_promote_0 on an-c03n02.alteeve.ca (local)
Jan 27 20:33:21 an-c03n02 crmd[21133]: notice: te_rsc_command: Initiating action 54: notify drbd_r0_pre_notify_promote_0 on an-c03n01.alteeve.ca
Jan 27 20:33:21 an-c03n02 crmd[21133]: notice: process_lrm_event: LRM operation drbd_r0_notify_0 (call=39, rc=0, cib-update=0, confirmed=true) ok
Jan 27 20:33:21 an-c03n02 crmd[21133]: notice: te_rsc_command: Initiating action 17: promote drbd_r0_promote_0 on an-c03n01.alteeve.ca
Jan 27 20:33:21 an-c03n02 kernel: [ 5311.444071] block drbd0: peer( Secondary -> Primary )
Jan 27 20:33:21 an-c03n02 crmd[21133]: notice: te_rsc_command: Initiating action 53: notify drbd_r0_post_notify_promote_0 on an-c03n02.alteeve.ca (local)
Jan 27 20:33:21 an-c03n02 crmd[21133]: notice: te_rsc_command: Initiating action 55: notify drbd_r0_post_notify_promote_0 on an-c03n01.alteeve.ca
Jan 27 20:33:21 an-c03n02 crmd[21133]: notice: process_lrm_event: LRM operation drbd_r0_notify_0 (call=40, rc=0, cib-update=0, confirmed=true) ok
Jan 27 20:33:21 an-c03n02 crmd[21133]: notice: run_graph: Transition 10 (Complete=11, Pending=0, Fired=0, Skipped=0, Incomplete=0, Source=/var/lib/pacemaker/pengine/pe-input-174.bz2): Complete
Jan 27 20:33:21 an-c03n02 crmd[21133]: notice: do_state_transition: State transition S_TRANSITION_ENGINE -> S_IDLE [ input=I_TE_SUCCESS cause=C_FSA_INTERNAL origin=notify_crmd ]
====

I have no idea what's going wrong here... I'd be grateful for any insight or help.

-- 
Digimer
Papers and Projects: https://alteeve.ca/w/
What if the cure for cancer is trapped in the mind of a person without access to education?