<html>
<head>
<style><!--
.hmmessage P
{
margin:0px;
padding:0px
}
body.hmmessage
{
font-size: 10pt;
font-family:Tahoma
}
--></style></head>
<body class='hmmessage'><div dir='ltr'>
Hi everyone,<br>First, I would like to say how pleased I am with DRBD!<br>Here is my situation:<br> <br>Two-node setup using cman and pacemaker; quorum is not a concern, no stonith<BR>Master/slave DRBD resource<br>Fencing: resource-only<br>I noticed that under certain conditions (after powering the nodes on and off enough times) the secondary node may never be promoted when the primary is shut down.<br>Here is a sample log (attached):<br> <br>Jan 18 08:34:52 NODE-1 crmd: [2054]: info: do_lrm_rsc_op: Performing key=7:89911:0:aac20e27-939f-439c-b461-e668262718b3 op=drbd_fsroot:0_promote_0 )<br>Jan 18 08:34:52 NODE-1 lrmd: [2051]: info: rsc:drbd_fsroot:0:299768: promote<br>Jan 18 08:34:52 NODE-1 kernel: block drbd0: helper command: /sbin/drbdadm fence-peer minor-0<br>Jan 18 08:34:52 NODE-1 corosync[1759]: [TOTEM ] Automatically recovered ring 1<br>Jan 18 08:34:53 NODE-1 crm-fence-peer.sh[24325]: invoked for fsroot<br>Jan 18 08:34:53 NODE-1 corosync[1759]: [TOTEM ] Automatically recovered ring 1<br>Jan 18 08:34:53 NODE-1 crm-fence-peer.sh[24325]: WARNING peer is unreachable, my disk is Consistent: did not place the constraint!<br>Jan 18 08:34:53 NODE-1 kernel: block drbd0: helper command: /sbin/drbdadm fence-peer minor-0 exit code 5 (0x500)<br>Jan 18 08:34:53 NODE-1 kernel: block drbd0: fence-peer helper returned 5 (peer unreachable, doing nothing since disk != UpToDate)<br>Jan 18 08:34:53 NODE-1 kernel: block drbd0: State change failed: Need access to UpToDate data<br>Jan 18 08:34:53 NODE-1 kernel: block drbd0: state = { cs:WFConnection ro:Secondary/Unknown ds:Consistent/DUnknown r--- }<br>Jan 18 08:34:53 NODE-1 kernel: block drbd0: wanted = { cs:WFConnection ro:Primary/Unknown ds:Consistent/DUnknown r--- }<br>Jan 18 08:34:53 NODE-1 lrmd: [2051]: info: RA output: (drbd_fsroot:0:promote:stderr) 0: State change failed: (-2) Need access to UpToDate data<br>Jan 18 08:34:53 NODE-1 lrmd: [2051]: info: RA output: (drbd_fsroot:0:promote:stderr) Command 'drbdsetup 0 primary' terminated with 
exit code 17<br>Jan 18 08:34:53 NODE-1 drbd[24286]: ERROR: fsroot: Called drbdadm -c /etc/drbd.conf primary fsroot<br>Jan 18 08:34:53 NODE-1 drbd[24286]: ERROR: fsroot: Exit code 17<br>Jan 18 08:34:53 NODE-1 drbd[24286]: ERROR: fsroot: Command output:<br>Jan 18 08:34:53 NODE-1 lrmd: [2051]: info: RA output: (drbd_fsroot:0:promote:stdout)<br>Jan 18 08:34:53 NODE-1 drbd[24286]: CRIT: Refusing to be promoted to Primary without UpToDate data<br>Jan 18 08:34:53 NODE-1 lrmd: [2051]: WARN: Managed drbd_fsroot:0:promote process 24286 exited with return code 1.<br>Jan 18 08:34:53 NODE-1 crmd: [2054]: info: process_lrm_event: LRM operation drbd_fsroot:0_promote_0 (call=299768, rc=1, cib-update=209843, confirmed=true) unknown error<br>Jan 18 08:34:53 NODE-1 crmd: [2054]: WARN: status_from_rc: Action 7 (drbd_fsroot:0_promote_0) on NODE-1 failed (target: 0 vs. rc: 1): Error<br>Jan 18 08:34:53 NODE-1 crmd: [2054]: WARN: update_failcount: Updating failcount for drbd_fsroot:0 on NODE-1 after failed promote: rc=1 (update=value++, time=1326893693)<br>Jan 18 08:34:53 NODE-1 attrd: [2052]: info: attrd_local_callback: Expanded fail-count-drbd_fsroot:0=value++ to 29977<br>Jan 18 08:34:53 NODE-1 attrd: [2052]: notice: attrd_trigger_update: Sending flush op to all hosts for: fail-count-drbd_fsroot:0 (29977)<br>Jan 18 08:34:53 NODE-1 crmd: [2054]: info: abort_transition_graph: match_graph_event:277 - Triggered transition abort (complete=0, tag=lrm_rsc_op, id=drbd_fsroot:0_last_failure_0, magic=0:1;7:89911:0:aac20e27-939f-439c-b461-e668262718b3, cib=0.6.263577) : Event failed<BR> <BR>It seems to me that refusing to promote (and not placing the fencing constraint) is the worse alternative when the other node really is shut down and no stonith is configured.<BR><br>As a workaround, changing the following line in /usr/lib/drbd/crm-fence-peer.sh solves this:<br>... 
<br>try_place_constraint()<BR>...<br>- unreachable/Consistent/outdated)<br>+ unreachable/Consistent/outdated|\<br>+ unreachable/Consistent/unknown)<BR><br>What say you?<BR> <BR>I use <br>Linux 2.6.32-220.2.1.el6.i686 #1 SMP Thu Dec 22 18:50:52 GMT 2011 i686 i686 i386 GNU/Linux<BR> <BR>kmod-drbd83-8.3.8-1.el6.i686<br>drbd83-8.3.8-1.el6.i686<BR> <BR>corosync-1.4.1-4.el6.i686<br>corosynclib-1.4.1-4.el6.i686<br>pacemaker-1.1.6-3.el6.i686<br>pacemaker-libs-1.1.6-3.el6.i686<br>pacemaker-cluster-libs-1.1.6-3.el6.i686<br>pacemaker-cli-1.1.6-3.el6.i686<br>cman-3.0.12.1-23.el6.i686<BR> <BR>Best,<BR>Oren<BR> <BR>                                            </div></body>
</html>