<html>
<head>
<style><!--
.hmmessage P
{
margin:0px;
padding:0px
}
body.hmmessage
{
font-size: 10pt;
font-family:Tahoma
}
--></style></head>
<body class='hmmessage'><div dir='ltr'>It seems to me that adding a configuration timeout indicating how long to wait before allowing promotion is required, possibly defaulting to indefinite.<br>I understand why you might want to wait for either the primary to come up again or for manual recovery.<br>However, in an active/standby two-node setup where the system is required to be up ALL the time, there is another approach: promote the old secondary after a timeout.<br>If the old primary was down for a long time, we are up quickly and the old primary should sync - fine.<br>If the old primary was down only briefly but beyond the timeout, the split-brain handlers should recover, possibly with manual intervention.<br>That is acceptable, since we couldn't wait forever.<br><br>What say you?<br>Oren<br><br>> Date: Thu, 19 Jan 2012 23:15:00 +0100<br>> From: lars.ellenberg@linbit.com<br>> To: drbd-user@lists.linbit.com<br>> Subject: Re: [DRBD-user] Promote fails in state = { cs:WFConnection ro:Secondary/Unknown ds:Consistent/DUnknown r--- }<br>> <br>> On Thu, Jan 19, 2012 at 11:52:03AM +0000, Oren Nechushtan wrote:<br>> > <br>> > Hi everyone,<br>> > First, I would like to express my pleasure using DRBD!<br>> > Here is my situation:<br>> > <br>> > Two-node setup, using cman and pacemaker, don't care about quorum, no stonith<br>> > Master-Slave DRBD resource<br>> > Fence resource only<br>> > I noticed that under certain settings (powering on/off nodes enough times) the secondary node may never become promoted when the primary is shut down. 
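As far as I know, the promote-after-timeout knob Oren proposes does not exist as a DRBD option; the closest existing knobs are the timeouts accepted by crm-fence-peer.sh itself, which bound how long the handler waits for the cluster before giving up, not how long to wait before promoting anyway. A minimal sketch, assuming DRBD 8.3-era configuration syntax (the resource name r0 and the timeout values are illustrative):

```
# Sketch only: resource name and values are illustrative.
resource r0 {
  disk {
    # call the fence-peer handler before promoting against an unreachable peer
    fencing resource-only;
  }
  handlers {
    # --timeout / --dc-timeout bound how long crm-fence-peer.sh waits for the
    # CIB before giving up; they do NOT implement promote-after-timeout.
    fence-peer "/usr/lib/drbd/crm-fence-peer.sh --timeout 60 --dc-timeout 120";
    # remove the fencing constraint once the peer has resynced
    after-resync-target "/usr/lib/drbd/crm-unfence-peer.sh";
  }
}
```

With this in place, the handler's failure mode is still what the log below shows: when the peer is unreachable and the local disk is only Consistent, the constraint is not placed and promotion is refused.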
<br>> <br>> I *think* that is intentional, and preventing potential data divergence,<br>> in the following scenario:<br>> <br>>  * all good, Primary --- connected --- Secondary<br>>  * Kill Secondary, Primary continues.<br>>  * Powerdown Primary.<br>>  * Bring up Secondary only.<br>> <br>> What use is fencing, if a fencing loop would cause data divergence anyway.<br>> <br>> > Here is a sample log (attached)<br>> > <br>> > Jan 18 08:34:52 NODE-1 crmd: [2054]: info: do_lrm_rsc_op: Performing key=7:89911:0:aac20e27-939f-439c-b461-e668262718b3 op=drbd_fsroot:0_promote_0 )<br>> > Jan 18 08:34:52 NODE-1 lrmd: [2051]: info: rsc:drbd_fsroot:0:299768: promote<br>> > Jan 18 08:34:52 NODE-1 kernel: block drbd0: helper command: /sbin/drbdadm fence-peer minor-0<br>> > Jan 18 08:34:52 NODE-1 corosync[1759]:   [TOTEM ] Automatically recovered ring 1<br>> > Jan 18 08:34:53 NODE-1 crm-fence-peer.sh[24325]: invoked for fsroot<br>> > Jan 18 08:34:53 NODE-1 corosync[1759]:   [TOTEM ] Automatically recovered ring 1<br>> <br>> > Jan 18 08:34:53 NODE-1 crm-fence-peer.sh[24325]: WARNING peer is unreachable, my disk is Consistent: did not place the constraint!<br>> <br>> This is it.<br>> <br>> -- <br>> : Lars Ellenberg<br>> : LINBIT | Your Way to High Availability<br>> : DRBD/HA support and consulting http://www.linbit.com<br>> _______________________________________________<br>> drbd-user mailing list<br>> drbd-user@lists.linbit.com<br>> http://lists.linbit.com/mailman/listinfo/drbd-user<br></div></body>
</html>