[DRBD-user] Need help setting fence handler and pacemaker 1.1.10 with DRBD 8.3.16 config

Digimer lists at alteeve.ca
Thu Jun 19 06:54:29 CEST 2014

Hi all,

   This is something of a repeat of my questions on the Pacemaker 
mailing list, but I think I am dealing with a DRBD issue, so allow me to 
ask again here.

   I have DRBD configured thus:

====
# /etc/drbd.conf
common {
     protocol               C;
     net {
         allow-two-primaries;
         after-sb-0pri    discard-zero-changes;
         after-sb-1pri    discard-secondary;
         after-sb-2pri    disconnect;
     }
     disk {
         fencing          resource-and-stonith;
     }
     syncer {
         rate             40M;
     }
     handlers {
         fence-peer       /usr/lib/drbd/crm-fence-peer.sh;
     }
}

# resource r0 on an-a04n01.alteeve.ca: not ignored, not stacked
resource r0 {
     on an-a04n01.alteeve.ca {
         device           /dev/drbd0 minor 0;
         disk             /dev/sda5;
         address          ipv4 10.10.40.1:7788;
         meta-disk        internal;
     }
     on an-a04n02.alteeve.ca {
         device           /dev/drbd0 minor 0;
         disk             /dev/sda5;
         address          ipv4 10.10.40.2:7788;
         meta-disk        internal;
     }
}

# resource r1 on an-a04n01.alteeve.ca: not ignored, not stacked
resource r1 {
     on an-a04n01.alteeve.ca {
         device           /dev/drbd1 minor 1;
         disk             /dev/sda6;
         address          ipv4 10.10.40.1:7789;
         meta-disk        internal;
     }
     on an-a04n02.alteeve.ca {
         device           /dev/drbd1 minor 1;
         disk             /dev/sda6;
         address          ipv4 10.10.40.2:7789;
         meta-disk        internal;
     }
}
====
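
For reference, the DRBD 8.3 User's Guide pairs the fence-peer handler 
with an after-resync-target handler, so that the constraint placed by 
crm-fence-peer.sh is removed again automatically once a resync 
completes. A sketch of that pairing (paths as shipped with the LINBIT 
packages):

====
# Sketch only: the documented handler pairing for Pacemaker fencing.
# crm-fence-peer.sh places the -INFINITY constraint on fence-peer;
# crm-unfence-peer.sh removes it after a successful resync.
common {
     disk {
         fencing          resource-and-stonith;
     }
     handlers {
         fence-peer           /usr/lib/drbd/crm-fence-peer.sh;
         after-resync-target  /usr/lib/drbd/crm-unfence-peer.sh;
     }
}
====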

I've set up Pacemaker thus:

====
Cluster Name: an-anvil-04
Corosync Nodes:

Pacemaker Nodes:
  an-a04n01.alteeve.ca an-a04n02.alteeve.ca

Resources:
  Master: drbd_r0_Clone
   Meta Attrs: master-max=2 master-node-max=1 clone-max=2 
clone-node-max=1 notify=true
   Resource: drbd_r0 (class=ocf provider=linbit type=drbd)
    Attributes: drbd_resource=r0
    Operations: monitor interval=30s (drbd_r0-monitor-interval-30s)
  Master: lvm_n01_vg0_Clone
   Meta Attrs: master-max=2 master-node-max=1 clone-max=2 
clone-node-max=1 notify=true
   Resource: lvm_n01_vg0 (class=ocf provider=heartbeat type=LVM)
    Attributes: volgrpname=an-a04n01_vg0
    Operations: monitor interval=30s (lvm_n01_vg0-monitor-interval-30s)

Stonith Devices:
  Resource: fence_n01_ipmi (class=stonith type=fence_ipmilan)
   Attributes: pcmk_host_list=an-a04n01.alteeve.ca ipaddr=an-a04n01.ipmi 
action=reboot login=admin passwd=Initial1 delay=15
   Operations: monitor interval=60s (fence_n01_ipmi-monitor-interval-60s)
  Resource: fence_n02_ipmi (class=stonith type=fence_ipmilan)
   Attributes: pcmk_host_list=an-a04n02.alteeve.ca ipaddr=an-a04n02.ipmi 
action=reboot login=admin passwd=Initial1
   Operations: monitor interval=60s (fence_n02_ipmi-monitor-interval-60s)
Fencing Levels:

Location Constraints:
   Resource: drbd_r0_Clone
     Constraint: drbd-fence-by-handler-r0-drbd_r0_Clone
       Rule: score=-INFINITY role=Master 
(id:drbd-fence-by-handler-r0-rule-drbd_r0_Clone)
         Expression: #uname ne an-a04n01.alteeve.ca 
(id:drbd-fence-by-handler-r0-expr-drbd_r0_Clone)
Ordering Constraints:
   promote drbd_r0_Clone then start lvm_n01_vg0_Clone (Mandatory) 
(id:order-drbd_r0_Clone-lvm_n01_vg0_Clone-mandatory)
Colocation Constraints:

Cluster Properties:
  cluster-infrastructure: cman
  dc-version: 1.1.10-14.el6_5.3-368c726
  last-lrm-refresh: 1403147476
  no-quorum-policy: ignore
  stonith-enabled: true
====
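
For reference, a master/slave DRBD resource like the one above would 
typically be built with pcs commands along these lines (a sketch only; 
pcs 0.9.x syntax as found on RHEL 6.5, resource and attribute names 
taken from the config above):

====
# Sketch: pcs commands that should yield the drbd_r0 master resource
# shown above. Verify against your pcs version before running.
pcs resource create drbd_r0 ocf:linbit:drbd drbd_resource=r0 \
    op monitor interval=30s
pcs resource master drbd_r0_Clone drbd_r0 \
    master-max=2 master-node-max=1 clone-max=2 clone-node-max=1 \
    notify=true
====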

Note the -INFINITY rule: I didn't add that; crm-fence-peer.sh did on 
start. This brings me to the question:

When Pacemaker starts the DRBD resource, the peer is immediately 
resource-fenced. Note that only r0 is configured at this time, to 
simplify debugging this issue.
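
For anyone trying to reproduce this, the fence race can be watched live 
with something like the following (assuming the pcs stack described 
above):

====
# Watch the DRBD connection/role/disk states while Pacemaker starts:
watch -n1 'cat /proc/drbd'
# List all constraints with their ids; the injected rule shows up as
# drbd-fence-by-handler-r0-drbd_r0_Clone:
pcs constraint --full
====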

Here are the DRBD-related /var/log/messages entries on node 1 
(an-a04n01) from when I start Pacemaker:

====
Jun 19 00:14:22 an-a04n01 crmd[16895]:   notice: do_state_transition: 
State transition S_ELECTION -> S_INTEGRATION [ input=I_ELECTION_DC 
cause=C_FSA_INTERNAL origin=do_election_check ]
Jun 19 00:14:22 an-a04n01 attrd[16893]:   notice: attrd_local_callback: 
Sending full refresh (origin=crmd)
Jun 19 00:14:22 an-a04n01 pengine[16894]:   notice: unpack_config: On 
loss of CCM Quorum: Ignore
Jun 19 00:14:22 an-a04n01 pengine[16894]:   notice: LogActions: Start 
fence_n01_ipmi#011(an-a04n01.alteeve.ca)
Jun 19 00:14:22 an-a04n01 pengine[16894]:   notice: LogActions: Start 
fence_n02_ipmi#011(an-a04n02.alteeve.ca)
Jun 19 00:14:22 an-a04n01 pengine[16894]:   notice: LogActions: Start 
drbd_r0:0#011(an-a04n01.alteeve.ca)
Jun 19 00:14:22 an-a04n01 pengine[16894]:   notice: LogActions: Start 
drbd_r0:1#011(an-a04n02.alteeve.ca)
Jun 19 00:14:22 an-a04n01 pengine[16894]:   notice: process_pe_message: 
Calculated Transition 0: /var/lib/pacemaker/pengine/pe-input-230.bz2
Jun 19 00:14:22 an-a04n01 crmd[16895]:   notice: te_rsc_command: 
Initiating action 8: monitor fence_n01_ipmi_monitor_0 on 
an-a04n02.alteeve.ca
Jun 19 00:14:22 an-a04n01 crmd[16895]:   notice: te_rsc_command: 
Initiating action 4: monitor fence_n01_ipmi_monitor_0 on 
an-a04n01.alteeve.ca (local)
Jun 19 00:14:22 an-a04n01 crmd[16895]:   notice: te_rsc_command: 
Initiating action 9: monitor fence_n02_ipmi_monitor_0 on 
an-a04n02.alteeve.ca
Jun 19 00:14:22 an-a04n01 crmd[16895]:   notice: te_rsc_command: 
Initiating action 5: monitor fence_n02_ipmi_monitor_0 on 
an-a04n01.alteeve.ca (local)
Jun 19 00:14:22 an-a04n01 crmd[16895]:   notice: te_rsc_command: 
Initiating action 6: monitor drbd_r0:0_monitor_0 on an-a04n01.alteeve.ca 
(local)
Jun 19 00:14:22 an-a04n01 crmd[16895]:   notice: te_rsc_command: 
Initiating action 10: monitor drbd_r0:1_monitor_0 on an-a04n02.alteeve.ca
Jun 19 00:14:23 an-a04n01 crmd[16895]:   notice: process_lrm_event: LRM 
operation drbd_r0_monitor_0 (call=14, rc=7, cib-update=28, 
confirmed=true) not running
Jun 19 00:14:23 an-a04n01 crmd[16895]:   notice: process_lrm_event: 
an-a04n01.alteeve.ca-drbd_r0_monitor_0:14 [ \n ]
Jun 19 00:14:23 an-a04n01 crmd[16895]:   notice: te_rsc_command: 
Initiating action 3: probe_complete probe_complete on 
an-a04n01.alteeve.ca (local) - no waiting
Jun 19 00:14:23 an-a04n01 attrd[16893]:   notice: attrd_trigger_update: 
Sending flush op to all hosts for: probe_complete (true)
Jun 19 00:14:23 an-a04n01 attrd[16893]:   notice: attrd_perform_update: 
Sent update 4: probe_complete=true
Jun 19 00:14:23 an-a04n01 crmd[16895]:   notice: te_rsc_command: 
Initiating action 7: probe_complete probe_complete on 
an-a04n02.alteeve.ca - no waiting
Jun 19 00:14:23 an-a04n01 crmd[16895]:   notice: te_rsc_command: 
Initiating action 11: start fence_n01_ipmi_start_0 on 
an-a04n01.alteeve.ca (local)
Jun 19 00:14:23 an-a04n01 crmd[16895]:   notice: te_rsc_command: 
Initiating action 13: start fence_n02_ipmi_start_0 on an-a04n02.alteeve.ca
Jun 19 00:14:23 an-a04n01 crmd[16895]:   notice: te_rsc_command: 
Initiating action 15: start drbd_r0:0_start_0 on an-a04n01.alteeve.ca 
(local)
Jun 19 00:14:24 an-a04n01 stonith-ng[16891]:   notice: 
stonith_device_register: Device 'fence_n01_ipmi' already existed in 
device list (2 active devices)
Jun 19 00:14:24 an-a04n01 crmd[16895]:   notice: te_rsc_command: 
Initiating action 17: start drbd_r0:1_start_0 on an-a04n02.alteeve.ca
Jun 19 00:14:24 an-a04n01 crmd[16895]:   notice: process_lrm_event: LRM 
operation fence_n01_ipmi_start_0 (call=19, rc=0, cib-update=29, 
confirmed=true) ok
Jun 19 00:14:24 an-a04n01 crmd[16895]:   notice: te_rsc_command: 
Initiating action 12: monitor fence_n01_ipmi_monitor_60000 on 
an-a04n01.alteeve.ca (local)
Jun 19 00:14:24 an-a04n01 crmd[16895]:   notice: te_rsc_command: 
Initiating action 14: monitor fence_n02_ipmi_monitor_60000 on 
an-a04n02.alteeve.ca
Jun 19 00:14:24 an-a04n01 crmd[16895]:   notice: process_lrm_event: LRM 
operation fence_n01_ipmi_monitor_60000 (call=24, rc=0, cib-update=30, 
confirmed=false) ok
Jun 19 00:14:24 an-a04n01 kernel: block drbd0: Starting worker thread 
(from cqueue [3265])
Jun 19 00:14:24 an-a04n01 kernel: block drbd0: disk( Diskless -> 
Attaching )
Jun 19 00:14:24 an-a04n01 kernel: block drbd0: Found 4 transactions (126 
active extents) in activity log.
Jun 19 00:14:24 an-a04n01 kernel: block drbd0: Method to ensure write 
ordering: flush
Jun 19 00:14:24 an-a04n01 kernel: block drbd0: drbd_bm_resize called 
with capacity == 909525832
Jun 19 00:14:24 an-a04n01 kernel: block drbd0: resync bitmap: 
bits=113690729 words=1776418 pages=3470
Jun 19 00:14:24 an-a04n01 kernel: block drbd0: size = 434 GB (454762916 KB)
Jun 19 00:14:24 an-a04n01 kernel: block drbd0: bitmap READ of 3470 pages 
took 8 jiffies
Jun 19 00:14:24 an-a04n01 kernel: block drbd0: recounting of set bits 
took additional 16 jiffies
Jun 19 00:14:24 an-a04n01 kernel: block drbd0: 0 KB (0 bits) marked 
out-of-sync by on disk bit-map.
Jun 19 00:14:24 an-a04n01 kernel: block drbd0: disk( Attaching -> 
Consistent )
Jun 19 00:14:24 an-a04n01 kernel: block drbd0: attached to UUIDs 
561F3328043888C0:0000000000000000:052A1A6B59936EC5:05291A6B59936EC5
Jun 19 00:14:24 an-a04n01 kernel: block drbd0: conn( StandAlone -> 
Unconnected )
Jun 19 00:14:24 an-a04n01 kernel: block drbd0: Starting receiver thread 
(from drbd0_worker [17045])
Jun 19 00:14:24 an-a04n01 kernel: block drbd0: receiver (re)started
Jun 19 00:14:24 an-a04n01 kernel: block drbd0: conn( Unconnected -> 
WFConnection )
Jun 19 00:14:24 an-a04n01 attrd[16893]:   notice: attrd_trigger_update: 
Sending flush op to all hosts for: master-drbd_r0 (5)
Jun 19 00:14:24 an-a04n01 crmd[16895]:   notice: process_lrm_event: LRM 
operation drbd_r0_start_0 (call=21, rc=0, cib-update=31, confirmed=true) ok
Jun 19 00:14:24 an-a04n01 crmd[16895]:   notice: te_rsc_command: 
Initiating action 48: notify drbd_r0:0_post_notify_start_0 on 
an-a04n01.alteeve.ca (local)
Jun 19 00:14:24 an-a04n01 attrd[16893]:   notice: attrd_perform_update: 
Sent update 9: master-drbd_r0=5
Jun 19 00:14:24 an-a04n01 crmd[16895]:   notice: te_rsc_command: 
Initiating action 49: notify drbd_r0:1_post_notify_start_0 on 
an-a04n02.alteeve.ca
Jun 19 00:14:24 an-a04n01 attrd[16893]:   notice: attrd_perform_update: 
Sent update 11: master-drbd_r0=5
Jun 19 00:14:24 an-a04n01 crmd[16895]:   notice: process_lrm_event: LRM 
operation drbd_r0_notify_0 (call=28, rc=0, cib-update=0, confirmed=true) ok
Jun 19 00:14:24 an-a04n01 crmd[16895]:   notice: run_graph: Transition 0 
(Complete=23, Pending=0, Fired=0, Skipped=2, Incomplete=0, 
Source=/var/lib/pacemaker/pengine/pe-input-230.bz2): Stopped
Jun 19 00:14:24 an-a04n01 pengine[16894]:   notice: unpack_config: On 
loss of CCM Quorum: Ignore
Jun 19 00:14:24 an-a04n01 pengine[16894]:   notice: LogActions: Promote 
drbd_r0:0#011(Slave -> Master an-a04n01.alteeve.ca)
Jun 19 00:14:24 an-a04n01 pengine[16894]:   notice: LogActions: Promote 
drbd_r0:1#011(Slave -> Master an-a04n02.alteeve.ca)
Jun 19 00:14:24 an-a04n01 pengine[16894]:   notice: process_pe_message: 
Calculated Transition 1: /var/lib/pacemaker/pengine/pe-input-231.bz2
Jun 19 00:14:24 an-a04n01 crmd[16895]:   notice: te_rsc_command: 
Initiating action 52: notify drbd_r0_pre_notify_promote_0 on 
an-a04n01.alteeve.ca (local)
Jun 19 00:14:24 an-a04n01 crmd[16895]:   notice: te_rsc_command: 
Initiating action 54: notify drbd_r0_pre_notify_promote_0 on 
an-a04n02.alteeve.ca
Jun 19 00:14:24 an-a04n01 crmd[16895]:   notice: process_lrm_event: LRM 
operation drbd_r0_notify_0 (call=31, rc=0, cib-update=0, confirmed=true) ok
Jun 19 00:14:24 an-a04n01 crmd[16895]:   notice: te_rsc_command: 
Initiating action 13: promote drbd_r0_promote_0 on an-a04n01.alteeve.ca 
(local)
Jun 19 00:14:24 an-a04n01 crmd[16895]:   notice: te_rsc_command: 
Initiating action 16: promote drbd_r0_promote_0 on an-a04n02.alteeve.ca
Jun 19 00:14:24 an-a04n01 kernel: block drbd0: helper command: 
/sbin/drbdadm fence-peer minor-0
Jun 19 00:14:25 an-a04n01 kernel: block drbd0: Handshake successful: 
Agreed network protocol version 97
Jun 19 00:14:25 an-a04n01 crm-fence-peer.sh[17156]: invoked for r0
Jun 19 00:14:25 an-a04n01 cibadmin[17188]:   notice: crm_log_args: 
Invoked: cibadmin -C -o constraints -X <rsc_location rsc="drbd_r0_Clone" 
id="drbd-fence-by-handler-r0-drbd_r0_Clone">#012  <rule role="Master" 
score="-INFINITY" id="drbd-fence-by-handler-r0-rule-drbd_r0_Clone">#012 
    <expression attribute="#uname" operation="ne" 
value="an-a04n01.alteeve.ca" 
id="drbd-fence-by-handler-r0-expr-drbd_r0_Clone"/>#012 
</rule>#012</rsc_location>
Jun 19 00:14:25 an-a04n01 crmd[16895]:   notice: handle_request: Current 
ping state: S_TRANSITION_ENGINE
Jun 19 00:14:25 an-a04n01 cib[16890]:   notice: cib:diff: Diff: --- 0.94.19
Jun 19 00:14:25 an-a04n01 cib[16890]:   notice: cib:diff: Diff: +++ 
0.95.1 4f095b8add6dcbb173de1254bf02fcf6
Jun 19 00:14:25 an-a04n01 cib[16890]:   notice: cib:diff: -- <cib 
admin_epoch="0" epoch="94" num_updates="19"/>
Jun 19 00:14:25 an-a04n01 cib[16890]:   notice: cib:diff: ++ 
<rsc_location rsc="drbd_r0_Clone" 
id="drbd-fence-by-handler-r0-drbd_r0_Clone">
Jun 19 00:14:25 an-a04n01 cib[16890]:   notice: cib:diff: ++ 
<rule role="Master" score="-INFINITY" 
id="drbd-fence-by-handler-r0-rule-drbd_r0_Clone">
Jun 19 00:14:25 an-a04n01 cib[16890]:   notice: cib:diff: ++ 
<expression attribute="#uname" operation="ne" 
value="an-a04n01.alteeve.ca" 
id="drbd-fence-by-handler-r0-expr-drbd_r0_Clone"/>
Jun 19 00:14:25 an-a04n01 cib[16890]:   notice: cib:diff: ++         </rule>
Jun 19 00:14:25 an-a04n01 cib[16890]:   notice: cib:diff: ++ 
</rsc_location>
Jun 19 00:14:25 an-a04n01 stonith-ng[16891]:   notice: unpack_config: On 
loss of CCM Quorum: Ignore
Jun 19 00:14:25 an-a04n01 crm-fence-peer.sh[17156]: INFO peer is 
reachable, my disk is Consistent: placed constraint 
'drbd-fence-by-handler-r0-drbd_r0_Clone'
Jun 19 00:14:25 an-a04n01 kernel: block drbd0: helper command: 
/sbin/drbdadm fence-peer minor-0 exit code 4 (0x400)
Jun 19 00:14:25 an-a04n01 kernel: block drbd0: fence-peer helper 
returned 4 (peer was fenced)
Jun 19 00:14:25 an-a04n01 kernel: block drbd0: role( Secondary -> 
Primary ) disk( Consistent -> UpToDate ) pdsk( DUnknown -> Outdated )
Jun 19 00:14:25 an-a04n01 kernel: block drbd0: new current UUID 
25DF173CF8D89023:561F3328043888C0:052A1A6B59936EC5:05291A6B59936EC5
Jun 19 00:14:25 an-a04n01 kernel: block drbd0: conn( WFConnection -> 
WFReportParams )
Jun 19 00:14:25 an-a04n01 kernel: block drbd0: Starting asender thread 
(from drbd0_receiver [17062])
Jun 19 00:14:25 an-a04n01 kernel: block drbd0: data-integrity-alg: 
<not-used>
Jun 19 00:14:25 an-a04n01 stonith-ng[16891]:   notice: 
stonith_device_register: Device 'fence_n01_ipmi' already existed in 
device list (2 active devices)
Jun 19 00:14:25 an-a04n01 cib[16890]:  warning: update_results: Action 
cib_create failed: Name not unique on network (cde=-76)
Jun 19 00:14:25 an-a04n01 cib[16890]:    error: cib_process_create: CIB 
Update failures   <failed>
Jun 19 00:14:25 an-a04n01 cib[16890]:    error: cib_process_create: CIB 
Update failures     <failed_update 
id="drbd-fence-by-handler-r0-drbd_r0_Clone" object_type="rsc_location" 
operation="cib_create" reason="Name not unique on network">
Jun 19 00:14:25 an-a04n01 cib[16890]:    error: cib_process_create: CIB 
Update failures       <rsc_location rsc="drbd_r0_Clone" 
id="drbd-fence-by-handler-r0-drbd_r0_Clone">
Jun 19 00:14:25 an-a04n01 cib[16890]:    error: cib_process_create: CIB 
Update failures         <rule role="Master" score="-INFINITY" 
id="drbd-fence-by-handler-r0-rule-drbd_r0_Clone">
Jun 19 00:14:25 an-a04n01 cib[16890]:    error: cib_process_create: CIB 
Update failures           <expression attribute="#uname" operation="ne" 
value="an-a04n02.alteeve.ca" 
id="drbd-fence-by-handler-r0-expr-drbd_r0_Clone"/>
Jun 19 00:14:25 an-a04n01 cib[16890]:    error: cib_process_create: CIB 
Update failures         </rule>
Jun 19 00:14:25 an-a04n01 cib[16890]:    error: cib_process_create: CIB 
Update failures       </rsc_location>
Jun 19 00:14:25 an-a04n01 cib[16890]:    error: cib_process_create: CIB 
Update failures     </failed_update>
Jun 19 00:14:25 an-a04n01 cib[16890]:    error: cib_process_create: CIB 
Update failures   </failed>
Jun 19 00:14:25 an-a04n01 cib[16890]:  warning: cib_process_request: 
Completed cib_create operation for section constraints: Name not unique 
on network (rc=-76, origin=an-a04n02.alteeve.ca/cibadmin/2, version=0.95.1)
Jun 19 00:14:25 an-a04n01 stonith-ng[16891]:   notice: 
stonith_device_register: Added 'fence_n02_ipmi' to the device list (2 
active devices)
Jun 19 00:14:25 an-a04n01 kernel: block drbd0: drbd_sync_handshake:
Jun 19 00:14:25 an-a04n01 kernel: block drbd0: self 
25DF173CF8D89023:561F3328043888C0:052A1A6B59936EC5:05291A6B59936EC5 
bits:0 flags:0
Jun 19 00:14:25 an-a04n01 kernel: block drbd0: peer 
561F3328043888C0:0000000000000000:052A1A6B59936EC4:05291A6B59936EC5 
bits:0 flags:0
Jun 19 00:14:25 an-a04n01 kernel: block drbd0: uuid_compare()=1 by rule 70
Jun 19 00:14:25 an-a04n01 kernel: block drbd0: peer( Unknown -> 
Secondary ) conn( WFReportParams -> WFBitMapS ) pdsk( Outdated -> 
Consistent )
Jun 19 00:14:25 an-a04n01 crmd[16895]:   notice: process_lrm_event: LRM 
operation drbd_r0_promote_0 (call=34, rc=0, cib-update=33, 
confirmed=true) ok
Jun 19 00:14:26 an-a04n01 kernel: block drbd0: helper command: 
/sbin/drbdadm before-resync-source minor-0
Jun 19 00:14:26 an-a04n01 kernel: block drbd0: helper command: 
/sbin/drbdadm before-resync-source minor-0 exit code 0 (0x0)
Jun 19 00:14:26 an-a04n01 kernel: block drbd0: conn( WFBitMapS -> 
SyncSource ) pdsk( Consistent -> Inconsistent )
Jun 19 00:14:26 an-a04n01 kernel: block drbd0: Began resync as 
SyncSource (will sync 0 KB [0 bits set]).
Jun 19 00:14:26 an-a04n01 kernel: block drbd0: updated sync UUID 
25DF173CF8D89023:56203328043888C0:561F3328043888C0:052A1A6B59936EC5
Jun 19 00:14:26 an-a04n01 cib[16890]:   notice: cib:diff: Diff: --- 0.95.2
Jun 19 00:14:26 an-a04n01 cib[16890]:   notice: cib:diff: Diff: +++ 
0.96.1 86f147e11a7e9934f7b2a686715dcca6
Jun 19 00:14:26 an-a04n01 cib[16890]:   notice: cib:diff: -- 
<rsc_location rsc="drbd_r0_Clone" 
id="drbd-fence-by-handler-r0-drbd_r0_Clone">
Jun 19 00:14:26 an-a04n01 cib[16890]:   notice: cib:diff: -- 
<rule role="Master" score="-INFINITY" 
id="drbd-fence-by-handler-r0-rule-drbd_r0_Clone">
Jun 19 00:14:26 an-a04n01 cib[16890]:   notice: cib:diff: -- 
<expression attribute="#uname" operation="ne" 
value="an-a04n01.alteeve.ca" 
id="drbd-fence-by-handler-r0-expr-drbd_r0_Clone"/>
Jun 19 00:14:26 an-a04n01 cib[16890]:   notice: cib:diff: --         </rule>
Jun 19 00:14:26 an-a04n01 cib[16890]:   notice: cib:diff: -- 
</rsc_location>
Jun 19 00:14:26 an-a04n01 cib[16890]:   notice: cib:diff: ++ <cib 
admin_epoch="0" cib-last-written="Thu Jun 19 00:14:26 2014" 
crm_feature_set="3.0.7" epoch="96" have-quorum="1" num_updates="1" 
update-client="cibadmin" update-origin="an-a04n02.alteeve.ca" 
validate-with="pacemaker-1.2" dc-uuid="an-a04n01.alteeve.ca"/>
Jun 19 00:14:26 an-a04n01 stonith-ng[16891]:   notice: unpack_config: On 
loss of CCM Quorum: Ignore
Jun 19 00:14:26 an-a04n01 stonith-ng[16891]:   notice: 
stonith_device_register: Device 'fence_n01_ipmi' already existed in 
device list (2 active devices)
Jun 19 00:14:26 an-a04n01 stonith-ng[16891]:   notice: 
stonith_device_register: Added 'fence_n02_ipmi' to the device list (2 
active devices)
Jun 19 00:14:26 an-a04n01 kernel: block drbd0: Resync done (total 1 sec; 
paused 0 sec; 0 K/sec)
Jun 19 00:14:26 an-a04n01 kernel: block drbd0: updated UUIDs 
25DF173CF8D89023:0000000000000000:56203328043888C0:561F3328043888C0
Jun 19 00:14:26 an-a04n01 kernel: block drbd0: conn( SyncSource -> 
Connected ) pdsk( Inconsistent -> UpToDate )
Jun 19 00:14:26 an-a04n01 kernel: block drbd0: bitmap WRITE of 3470 
pages took 9 jiffies
Jun 19 00:14:26 an-a04n01 kernel: block drbd0: 0 KB (0 bits) marked 
out-of-sync by on disk bit-map.
====
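
If I read the log right, both nodes' fence handlers raced to create a 
constraint with the same id during the simultaneous promote: node 1's 
cibadmin -C succeeded, and node 2's create then failed with "Name not 
unique on network". When a fence constraint is left behind, it can be 
inspected and cleared by hand with something like this (a sketch; the 
id is taken from the log above, double-check before removing):

====
# List constraints with ids, then remove the leftover fence rule:
pcs constraint --full
pcs constraint remove drbd-fence-by-handler-r0-drbd_r0_Clone
====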

This causes Pacemaker to declare the resource failed. Eventually 
crm-unfence-peer.sh is called, clearing the -INFINITY constraint, and 
node 2 (an-a04n02) finally promotes to Primary. However, the drbd_r0 
resource remains listed as failed.
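
For what it's worth, the lingering failed state can at least be cleared 
by hand once the constraint is gone (a sketch; resource name from the 
config above):

====
# Reset the recorded failure/failcount for the DRBD resource:
pcs resource cleanup drbd_r0_Clone
====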

I have no idea why this is happening, and I am really stumped. Any help 
would be much appreciated.

digimer

PS - RHEL 6.5 fully updated, DRBD 8.3.15, Pacemaker 1.1.10

-- 
Digimer
Papers and Projects: https://alteeve.ca/w/
What if the cure for cancer is trapped in the mind of a person without 
access to education?


