[DRBD-user] Best Practice with DRBD RHCS and GFS2?

J. Ryan Earl oss at jryanearl.us
Mon Oct 18 18:29:32 CEST 2010

Note: "permalinks" may not be as permanent as we would like,
direct links of old sources may well be a few messages off.


Hi Colin,

Inline reply below:

On Fri, Oct 15, 2010 at 2:01 PM, Colin Simpson <Colin.Simpson at iongeo.com> wrote:

> Hi
>
> I have a working test cluster RH Cluster Suite with various GFS2 file
> systems on top of a DRBD Primary/Primary device.
>
> I have the recommended GFS setup in drbd.conf, i.e.
>
>   allow-two-primaries;
>   after-sb-0pri discard-zero-changes;
>   after-sb-1pri discard-secondary;
>   after-sb-2pri disconnect;
>
> Now I have been trying to think of the danger scenarios that might arise
> with my setup.
>
> So I have a few questions (maybe quite a few):
>
> 1/ When one node is brought back up after being down, it starts to sync
> up to the "newer" copy (I'm hoping).
>
> I presume GFS shouldn't be mounted at this point on the node that was just
> brought up (as data will not be consistent between the two GFS mounts and
> the block device will be changing underneath it)?
>

The drbd service should start before the clvmd service.  The syncing node
will resync in the background and be immediately ready for use when clvmd
comes up.  I do the following to ensure this is the case:

# run this from the directory holding the stock init scripts (path assumed
# for RHEL; the "cd -" further down returns to wherever you started)
cd /etc/init.d

# make clvmd require drbd at start and stop
/usr/bin/patch <<EOF
--- clvmd.orig  2010-09-13 17:15:17.000000000 -0500
+++ clvmd       2010-09-13 17:36:46.000000000 -0500
@@ -7,6 +7,8 @@
 #
 ### BEGIN INIT INFO
 # Provides: clvmd
+# Required-Start: drbd
+# Required-Stop: drbd
 # Short-Description: Clustered LVM Daemon
 ### END INIT INFO
EOF

# make drbd start before (and stop after) clvmd as well as the cluster stack
/usr/bin/patch <<EOF
--- drbd.orig   2010-09-13 17:15:17.000000000 -0500
+++ drbd        2010-09-13 17:39:46.000000000 -0500
@@ -15,8 +15,8 @@
 # Should-Stop:    sshd multipathd
 # Default-Start:  2 3 4 5
 # Default-Stop:   0 1 6
-# X-Start-Before: heartbeat corosync
-# X-Stop-After:   heartbeat corosync
+# X-Start-Before: heartbeat corosync clvmd
+# X-Stop-After:   heartbeat corosync clvmd
 # Short-Description:    Control drbd resources.
 ### END INIT INFO
EOF
cd -

# setup proper order and make sure it sticks
for X in drbd clvmd ; do
  /sbin/chkconfig $X resetpriorities
done
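
To sanity-check the result, the S numbers in the runlevel directory should
now put drbd before clvmd (RHEL 5 path assumed):

ls /etc/rc.d/rc3.d/ | egrep 'drbd|clvmd'
# after resetpriorities, drbd's S number should be lower than clvmd's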


> I mean, does it, or is there any way of running drbd so it ignores the
> out-of-date primary's data (on the node just brought up) and passes all
> the requests through to the "good" primary (until it is sync'd)?
>

That's what it does, from my observation: while the local disk is still
Inconsistent, reads of blocks that haven't been resynced yet are served
from the UpToDate peer.


>
> Should I have my own start up script to only start cman and clvmd when I
> finally see
>
>  1: cs:Connected st:Primary/Primary ds:UpToDate/UpToDate
>
> and not
>
>  1: cs:SyncTarget st:Primary/Primary ds:Inconsistent/UpToDate
>
> What is recommended (or what do people do)? Or is there some way of
> achieving this already?
>

Nah.  Just make sure drbd starts before clvmd.
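
If you really wanted the stricter gate you describe, a wrapper along these
lines would do it (untested sketch; it just polls /proc/drbd for the "good"
state you quoted before letting cman and clvmd start, and the field names
vary a little between DRBD versions):

#!/bin/sh
# rough sketch: block until every DRBD device reports ds:UpToDate/UpToDate,
# then return so the caller can go on to start cman and clvmd
while grep ' ds:' /proc/drbd | grep -qv 'ds:UpToDate/UpToDate'; do
    sleep 5
done

But again, with the init ordering fixed, I don't see the need.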


>
> Just starting up cman still seems to try to start services that then
> fail out even before clvmd is running (including services that are
> children of FS's in the cluster.conf file):
>
> <clusterfs fstype="gfs" ref="datahome">
>        <nfsexport ref="tcluexports">
>                <nfsclient name=" " ref="NFSdatahomeclnt"/>
>        </nfsexport>
> </clusterfs>
>
> So I'm presuming I need to delay starting cman and clvmd and not just
> clvmd?
>

clvmd should be dependent on drbd; that is all.


>
> I'd like automatic cluster recovery.
>
> 2/ Is discard-older-primary not better in a Primary/Primary? Or is it
> inappropriate in dual Primary?
>

With the split-brain settings you mentioned further up, you have automatic
recovery for the safe cases.  Depending on your data,
"discard-least-changes" may be a policy worth looking at.  For the non-safe
cases, I personally prefer human intervention.
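
For what it's worth, discard-older-primary (like discard-zero-changes and
discard-least-changes) is an after-sb-0pri policy; with dual primaries the
split brain is normally caught by after-sb-2pri, where disconnect plus a
notification handler is the sane choice.  A minimal sketch of the net and
handlers sections, assuming DRBD 8.3 syntax and a made-up resource name r0:

resource r0 {
  net {
    allow-two-primaries;
    after-sb-0pri discard-zero-changes;  # or discard-least-changes, depending on your data
    after-sb-1pri discard-secondary;
    after-sb-2pri disconnect;            # both primary at split brain: stop and let a human decide
  }
  handlers {
    # optional: mail root when a split brain is detected; the helper script
    # ships with DRBD 8.3, though the path may differ on your distribution
    split-brain "/usr/lib/drbd/notify-split-brain.sh root";
  }
}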


>
> 3/ Is there any merit in always stopping one node first, so you know at
> start up which one has the most up-to-date data (say if there is a
> start-up PSU failure)? Will a shut-down DRBD node with a stopped GFS and
> drbd still have a consistent (though out-of-date) file system?
>

DRBD's metadata (its generation identifiers) tracks which node is most
up-to-date, so there is no need for a fixed shutdown order.  A cleanly
stopped node's copy is still consistent, just out of date.
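
If you want to see what DRBD knows, a couple of quick checks (resource name
r0 assumed):

drbdadm cstate r0    # connection state, e.g. Connected or WFConnection
drbdadm dstate r0    # disk states, e.g. UpToDate/UpToDate or Outdated/DUnknown
drbdadm show-gi r0   # the generation identifiers used to decide who is newest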


>
> 4/ I was thinking of the bad (hopefully unlikely) scenario where you bring
> up an out-of-date node A (older than B's data), and it hopefully comes up
> clean (if the above question allows). It starts working; some time later
> you bring up node B, which originally had a later set of data before A and
> B went down.
>

That should be prevented by something like:

  startup {
    wfc-timeout 0;        # Wait forever for initial connection
    degr-wfc-timeout 60;  # Wait only 60 seconds if this node was a degraded cluster
  }

"A" would wait indefinitely for "B" to start.  Only if you manually
goto the console and type "yes" to abort the wfc-timeout will "A" come
up inconsistent.


> Based on the recommended config, will B now take all of A's data?
>

Nope.  You have to resolve it manually.
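
For completeness, the manual resolution is the standard DRBD split-brain
recovery, roughly as follows (DRBD 8.3 syntax, resource name r0 assumed);
be very sure which node's changes you are throwing away:

# on the node whose changes are to be discarded (the split-brain "victim"):
drbdadm secondary r0
drbdadm -- --discard-my-data connect r0

# on the node whose data survives (only needed if it went StandAlone):
drbdadm connect r0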


>
> 5/ Is it good practice (or even possible) to use the same private
> interface for RH Cluster comms, clvmd etc (OpenAIS) that drbd uses? RHCS
> seems to make it hard to use an internal interface for cluster comms and
> have the services presented on a different interface.
>

That's a performance issue and depends on how fast your interconnect is.
If your backing storage can saturate the link DRBD runs over, you'll want
to run the totem protocol over a different interconnect.  If you're using
something like InfiniBand or 10GbE, it likely will not be a problem unless
you have some wicked-fast solid-state backing storage.
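
If you do end up splitting them, the DRBD half of it is just the per-host
addresses in the resource definition, e.g. pointing at a dedicated
back-to-back replication link (host names, devices and addresses below are
made up):

resource r0 {
  on nodeA {
    device    /dev/drbd1;
    disk      /dev/sdb1;
    address   192.168.100.1:7789;   # dedicated replication interface
    meta-disk internal;
  }
  on nodeB {
    device    /dev/drbd1;
    disk      /dev/sdb1;
    address   192.168.100.2:7789;
    meta-disk internal;
  }
}

As I understand it, OpenAIS/cman binds to whichever interface the cluster
node names resolve to, so keeping the two kinds of traffic on separate
subnets is mostly a matter of naming and addressing.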

Cheers,
-JR