[DRBD-user] Promote fails in state = { cs:WFConnection ro:Secondary/Unknown ds:Consistent/DUnknown r--- }

Fri Jan 20 07:55:21 CET 2012

It seems to me that a configuration timeout is required, indicating how long to wait before allowing promotion (possibly waiting indefinitely by default).
I understand why you might want to wait either for the primary to come up again or for manual recovery.
However, in an active/standby two-node setup where the system is required to be up ALL the time, there is another approach:
promote the old secondary after a timeout.
If the old primary was down for a long time, we are up quickly and the old primary should resync when it returns. Fine.
If the old primary was down only briefly, but longer than the timeout, the split-brain (SB) handlers should recover, possibly with manual recovery.
That is acceptable, since we couldn't wait forever.
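A rough sketch of the decision I have in mind, in Python, for a Secondary whose peer is unreachable. The names decide_promotion and promote_timeout_s are hypothetical; this is not an existing DRBD or Pacemaker option. The UpToDate branch is my reading of the crm-fence-peer.sh message in the log quoted below, not a description of what the script actually does.

```python
from enum import Enum
from typing import Optional


class Action(Enum):
    PROMOTE = "promote"
    WAIT = "wait"


def decide_promotion(local_disk: str,
                     waited_s: float,
                     promote_timeout_s: Optional[float] = None) -> Action:
    """Hypothetical policy for a Secondary whose peer is unreachable.

    promote_timeout_s=None models today's behaviour: wait indefinitely.
    """
    if local_disk == "UpToDate":
        # Assumption: data known to be current, so fencing can be placed
        # and promotion proceeds.
        return Action.PROMOTE
    if local_disk != "Consistent":
        # Outdated/Inconsistent data is never promoted automatically.
        return Action.WAIT
    # Consistent but not known UpToDate: the old primary may hold newer data.
    if promote_timeout_s is None or waited_s < promote_timeout_s:
        return Action.WAIT
    # Timeout expired: promote anyway and leave any divergence to the
    # split-brain handlers (or manual recovery) once the peer returns.
    return Action.PROMOTE


if __name__ == "__main__":
    # Current behaviour (no timeout): keep waiting for the old primary.
    print(decide_promotion("Consistent", waited_s=3600))  # prints Action.WAIT
    # Proposed behaviour: promote once a 300 s timeout has expired.
    print(decide_promotion("Consistent", waited_s=3600,
                           promote_timeout_s=300))  # prints Action.PROMOTE
```

Keeping None as the default means nothing changes for anyone who does not opt in.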

What say you?
Oren

> Date: Thu, 19 Jan 2012 23:15:00 +0100
> From: lars.ellenberg at linbit.com
> To: drbd-user at lists.linbit.com
> Subject: Re: [DRBD-user] Promote fails in state = { cs:WFConnection ro:Secondary/Unknown ds:Consistent/DUnknown r--- }
> 
> On Thu, Jan 19, 2012 at 11:52:03AM +0000, Oren Nechushtan wrote:
> > Hi everyone,
> > First, I would like to express my pleasure using DRBD!
> > Here is my situation:
> >  
> > Two-node setup, using cman and pacemaker, don't care about quorum, no stonith
> > Master-Slave DRBD resource
> > Fence resource only
> > I noticed that under certain conditions (powering nodes on and off enough times) the secondary node may never be promoted when the primary is shut down.
> 
> I *think* that is intentional, and preventing potential data divergence,
> in the following scenario:
> 
>  * all good, Primary --- connected --- Secondary
>  * Kill Secondary, Primary continues.
>  * Powerdown Primary.
>  * Bring up Secondary only.
> 
> What use is fencing if a fencing loop would cause data divergence anyway?
> 
> > Here is a sample log (attached)
> >  
> > Jan 18 08:34:52 NODE-1 crmd: [2054]: info: do_lrm_rsc_op: Performing key=7:89911:0:aac20e27-939f-439c-b461-e668262718b3 op=drbd_fsroot:0_promote_0 )
> > Jan 18 08:34:52 NODE-1 lrmd: [2051]: info: rsc:drbd_fsroot:0:299768: promote
> > Jan 18 08:34:52 NODE-1 kernel: block drbd0: helper command: /sbin/drbdadm fence-peer minor-0
> > Jan 18 08:34:52 NODE-1 corosync[1759]:   [TOTEM ] Automatically recovered ring 1
> > Jan 18 08:34:53 NODE-1 crm-fence-peer.sh[24325]: invoked for fsroot
> > Jan 18 08:34:53 NODE-1 corosync[1759]:   [TOTEM ] Automatically recovered ring 1
> 
> > Jan 18 08:34:53 NODE-1 crm-fence-peer.sh[24325]: WARNING peer is unreachable, my disk is Consistent: did not place the constraint!
> 
> This is it.
> 
> -- 
> : Lars Ellenberg
> : LINBIT | Your Way to High Availability
> : DRBD/HA support and consulting http://www.linbit.com
> _______________________________________________
> drbd-user mailing list
> drbd-user at lists.linbit.com
> http://lists.linbit.com/mailman/listinfo/drbd-user
