[DRBD-user] Corosync and DRBD fencing: one or both?

Jake Smith jsmith at argotec.com
Tue Aug 30 18:32:57 CEST 2011

----- Original Message -----
> From: "Digimer" <linux at alteeve.com>
> To: "William Seligman" <seligman at nevis.columbia.edu>
> Cc: drbd-user at lists.linbit.com
> Sent: Tuesday, August 30, 2011 12:24:16 PM
> Subject: Re: [DRBD-user] Corosync and DRBD fencing: one or both?
> On 08/30/2011 11:25 AM, William Seligman wrote:
> > On 8/29/11 4:42 PM, Digimer wrote:
> >> On 08/29/2011 03:36 PM, William Seligman wrote:
> >>> A general question: I have a Corosync+Pacemaker with DRBD setup
> >>> on Linux; I'll
> >>> give the details if it's relevant. Corosync+Pacemaker controls
> >>> DRBD start, stop,
> >>> and promotion. I've implemented fencing via STONITH as Corosync
> >>> resources.
> >>>
> >>> I have not put fencing in the drbd.conf file; I was under the
> >>> impression that
> >>> Corosync+Pacemaker would take care of STONITHing a node if there's a
> >>> DRBD problem. Is
> >>> this correct? Or should I have fencing/STONITH configured in both
> >>> Corosync and
> >>> drbd.conf?
> >>>
> >>> Does the answer change between a primary/secondary versus
> >>> dual-primary setup?
> >>
> >> You still want to configure fencing, but you can use the
> >> 'crm-fence-peer.sh' handler. Using this with
> >> 'resource-and-stonith' will
> >> tell DRBD to block I/O until the fence succeeds, preventing it
> >> from
> >> going dual-primary (even if just for the brief moment between
> >> fault and
> >> fence).
> > 
> > I may be dense, but I find the answer ambiguous; perhaps I didn't
> > ask the
> > question the right way.
> > 
> > Let me ask in a different way: If I have fencing set up in corosync,
> > and corosync
> > controls drbd, do I also need fencing in drbd.conf?
> Yes you do.
> There is the potential for a period of time between the fault and
> its
> detection by Pacemaker. During this time, if DRBD is not
> appropriately
> configured, both sides could go StandAlone/Primary. Once that
> happens,
> you've got a split-brain.
> The 'crm-fence-peer.sh' is used in drbd.conf to let DRBD block IO and
> call a fence via pacemaker. This will result in two fence calls,
> which
> is obviously overkill, but that isn't what we're after. The
> corresponding "resource-and-stonith" argument is what matters. That
> is
> what will block IO at the DRBD level until the fence call succeeds.

My 2 cents:

If you have more than just DRBD under Pacemaker, you *may* not want to fence a node just because the DRBD connection has failed while other services are still functioning properly... *but* you would still want to prevent DRBD on each node from thinking all is well and the two copies going their own separate ways.

So I personally let Pacemaker handle STONITH: if comms between the nodes fail, then STONITH is necessary. At the DRBD level, however, I use crm-fence-peer.sh with "resource-only" fencing. This way, if replication/comms within DRBD breaks, one node is fenced at the resource level, which prevents it from promoting the DRBD resource, but the node itself is not STONITH'd.
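
For illustration, the DRBD side of that setup might look roughly like the fragment below. It is only a sketch: it assumes DRBD 8.3-style syntax, a placeholder resource name "r0", and handler scripts installed under /usr/lib/drbd/ (the path may differ on your distribution). The crm-unfence-peer.sh handler is not mentioned in this thread; it is the usual companion script that lifts the restriction after a resync.

```
resource r0 {            # "r0" is a placeholder resource name
  disk {
    # resource-only: call the fence-peer handler, but do not freeze I/O.
    # The stricter option from the quoted reply is resource-and-stonith,
    # which also blocks I/O until the handler returns.
    fencing resource-only;
  }
  handlers {
    # Asks Pacemaker to keep the peer from being promoted (assumed path).
    fence-peer "/usr/lib/drbd/crm-fence-peer.sh";
    # Lifts that restriction once the peer has resynced (assumed path).
    after-resync-target "/usr/lib/drbd/crm-unfence-peer.sh";
  }
  # ... rest of the resource definition (devices, addresses) omitted ...
}
```

Switching the fencing line to resource-and-stonith gives the behaviour Digimer describes; everything else stays the same.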

I personally don't want all my services and other DRBD resources migrating just because something may have gone haywire with a single DRBD resource, as opposed to the whole node.


