<html>
<head>
<style><!--
.hmmessage P
{
margin:0px;
padding:0px
}
body.hmmessage
{
font-size: 10pt;
font-family:Tahoma
}
--></style></head>
<body class='hmmessage'><div dir='ltr'>It seems to me that adding a configuration timeout indicating how long to wait before allowing promotion is required, possibly indefinite by default.<br>I understand why you might want to wait either for the primary to come up again or for manual recovery.<br>However, in an active-standby two-node setup where the system is required to be up ALL the time, there is another approach:<br>Promote the old secondary after a timeout.<br>If the old primary was down for a long time - we are up quickly, and the old primary should sync when it returns - fine.<br>If the old primary was down only briefly but beyond the timeout, the split-brain handlers should recover, possibly with manual intervention.<br>Acceptable, since we couldn't wait forever.<br><br>What say you?<br>Oren<br><br>> Date: Thu, 19 Jan 2012 23:15:00 +0100<br>> From: lars.ellenberg@linbit.com<br>> To: drbd-user@lists.linbit.com<br>> Subject: Re: [DRBD-user] Promote fails in state = { cs:WFConnection ro:Secondary/Unknown ds:Consistent/DUnknown r--- }<br>> <br>> On Thu, Jan 19, 2012 at 11:52:03AM +0000, Oren Nechushtan wrote:<br>> > <br>> > Hi everyone,<br>> > First, I would like to express my pleasure in using DRBD!<br>> > Here is my situation:<br>> > <br>> > Two-node setup, using cman and pacemaker; don't care about quorum; no stonith<br>> > Master-Slave DRBD resource<br>> > Fence resource only<br>> > I noticed that under certain settings (powering nodes on and off enough times) the secondary node may never become promoted when the primary is shut down.
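For concreteness, here is a rough sketch of how the timeout idea above might be wired up with the stock Pacemaker fence handler. The resource name matches the thread (fsroot); the --timeout/--dc-timeout flags are assumptions that should be checked against the crm-fence-peer.sh shipped with your drbd-utils version, since they only bound how long the handler itself waits on the cluster, not a full "promote anyway after N seconds" policy:

```conf
# Sketch only, not a tested configuration.
# --timeout / --dc-timeout bound how long crm-fence-peer.sh waits for the
# CIB update / the DC to answer; verify these option names against the
# script installed on your nodes before relying on them.
resource fsroot {
  disk {
    fencing resource-only;
  }
  handlers {
    fence-peer "/usr/lib/drbd/crm-fence-peer.sh --timeout 60 --dc-timeout 90";
    after-resync-target "/usr/lib/drbd/crm-unfence-peer.sh";
  }
}
```

Note this only limits how long the fence handler blocks; promoting a merely Consistent disk after the timeout, as proposed above, would still need a wrapper around the handler or a manual step (e.g. drbdadm outdate on the absent peer's resource), because the stock script deliberately refuses to place the constraint in exactly the situation shown in the log below.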
<br>> <br>> I *think* that is intentional, preventing potential data divergence<br>> in the following scenario:<br>> <br>> * all good, Primary --- connected --- Secondary<br>> * Kill Secondary, Primary continues.<br>> * Powerdown Primary.<br>> * Bring up Secondary only.<br>> <br>> What use is fencing if a fencing loop would cause data divergence anyway?<br>> <br>> > Here is a sample log (attached)<br>> > <br>> > Jan 18 08:34:52 NODE-1 crmd: [2054]: info: do_lrm_rsc_op: Performing key=7:89911:0:aac20e27-939f-439c-b461-e668262718b3 op=drbd_fsroot:0_promote_0 )<br>> > Jan 18 08:34:52 NODE-1 lrmd: [2051]: info: rsc:drbd_fsroot:0:299768: promote<br>> > Jan 18 08:34:52 NODE-1 kernel: block drbd0: helper command: /sbin/drbdadm fence-peer minor-0<br>> > Jan 18 08:34:52 NODE-1 corosync[1759]: [TOTEM ] Automatically recovered ring 1<br>> > Jan 18 08:34:53 NODE-1 crm-fence-peer.sh[24325]: invoked for fsroot<br>> > Jan 18 08:34:53 NODE-1 corosync[1759]: [TOTEM ] Automatically recovered ring 1<br>> <br>> > Jan 18 08:34:53 NODE-1 crm-fence-peer.sh[24325]: WARNING peer is unreachable, my disk is Consistent: did not place the constraint!<br>> <br>> This is it.<br>> <br>> -- <br>> : Lars Ellenberg<br>> : LINBIT | Your Way to High Availability<br>> : DRBD/HA support and consulting http://www.linbit.com<br>> _______________________________________________<br>> drbd-user mailing list<br>> drbd-user@lists.linbit.com<br>> http://lists.linbit.com/mailman/listinfo/drbd-user<br></div></body>
</html>