[DRBD-user] Best Practice with DRBD RHCS and GFS2?
J. Ryan Earl
oss at jryanearl.us
Mon Oct 18 18:29:32 CEST 2010
Inline reply below:
On Fri, Oct 15, 2010 at 2:01 PM, Colin Simpson <Colin.Simpson at iongeo.com> wrote:
> I have a working test cluster RH Cluster Suite with various GFS2 file
> systems on top of a DRBD Primary/Primary device.
> I have the recommended GFS setup in drbd.conf, i.e.:
> after-sb-0pri discard-zero-changes;
> after-sb-1pri discard-secondary;
> after-sb-2pri disconnect;
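For context, those settings live in the resource's net section. A sketch of what that looks like for dual-primary GFS2 (DRBD 8.3 syntax; the resource name "r0" and the allow-two-primaries line are assumptions, not taken from Colin's actual config):

```
resource r0 {
  net {
    allow-two-primaries;               # required for Primary/Primary
    after-sb-0pri discard-zero-changes;
    after-sb-1pri discard-secondary;
    after-sb-2pri disconnect;
  }
}
```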
> Now I have been trying to think of the danger scenarios that might arise
> with my setup.
> So I have a few questions (maybe quite a few):
> 1/ When one node is brought back up after being down it starts to sync
> up to the "newer" copy (I'm hoping).
> I presume GFS shouldn't be mounted at this point on the just brought up
> node (as data will not be consistent between the two GFS mounts and the
> block device will be changing underneath it)?
The drbd service should start before the clvmd service. The syncing node
will sync and/or be immediately ready for use when clvmd comes up. I do
this to ensure that is the case:
--- clvmd.orig 2010-09-13 17:15:17.000000000 -0500
+++ clvmd 2010-09-13 17:36:46.000000000 -0500
@@ -7,6 +7,8 @@
### BEGIN INIT INFO
# Provides: clvmd
+# Required-Start: drbd
+# Required-Stop: drbd
# Short-Description: Clustered LVM Daemon
### END INIT INFO
--- drbd.orig 2010-09-13 17:15:17.000000000 -0500
+++ drbd 2010-09-13 17:39:46.000000000 -0500
@@ -15,8 +15,8 @@
# Should-Stop: sshd multipathd
# Default-Start: 2 3 4 5
# Default-Stop: 0 1 6
-# X-Start-Before: heartbeat corosync
-# X-Stop-After: heartbeat corosync
+# X-Start-Before: heartbeat corosync clvmd
+# X-Stop-After: heartbeat corosync clvmd
# Short-Description: Control drbd resources.
### END INIT INFO
# setup proper order and make sure it sticks
for X in drbd clvmd ; do
    /sbin/chkconfig $X resetpriorities
done
> I mean, does it, or is there any way of running drbd so it ignores the
> out of date primary's data (on the node just brought up) and passes all
> the requests through to the "good" primary (until it is sync'd)?
That's what it does from my observation.
> Should I have my own start up script to only start cman and clvmd when I
> finally see
> 1: cs:Connected st:Primary/Primary ds:UpToDate/UpToDate
> and not
> 1: cs:SyncTarget st:Primary/Primary ds:Inconsistent/UpToDate
> , what is recommended (or what do people do)? Or is there some way of
> achieving this already?
Nah. Just make sure drbd starts before clvmd.
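If you did want an explicit check rather than relying on init ordering, a minimal sketch (not from the thread; the resource name "r0" is an assumption) is to gate on the connection state reported by `drbdadm cstate`. Note that SyncTarget counts as ready, because DRBD serves reads from the UpToDate peer while the local disk is still syncing:

```shell
# Decide whether it is safe for clvmd to proceed, given a DRBD
# connection state string such as the output of `drbdadm cstate r0`.
drbd_ready() {
    case "$1" in
        Connected|SyncSource|SyncTarget) return 0 ;;
        *) return 1 ;;
    esac
}

# In an init-script wrapper one might poll:
#   until drbd_ready "$(drbdadm cstate r0)"; do sleep 2; done
drbd_ready Connected  && echo "Connected: ready"
drbd_ready SyncTarget && echo "SyncTarget: ready"
drbd_ready StandAlone || echo "StandAlone: not ready"
```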
> Just starting up cman still seems to try to start services that then
> fail out even before clvmd is running (including services that are
> children of FS's in the cluster.conf file):
> <clusterfs fstype="gfs" ref="datahome">
>     <nfsexport ref="tcluexports">
>         <nfsclient name=" " ref="NFSdatahomeclnt"/>
>     </nfsexport>
> </clusterfs>
> So I'm presuming I need to delay starting cman and clvmd and not just
clvmd should be dependent on drbd, that is all.
> I'd like automatic cluster recovery.
> 2/ Is discard-older-primary not better in a Primary/Primary? Or is it
> inappropriate in dual Primary?
With the split-brain settings you mentioned further up, you have automatic
recovery for the safe cases. Depending on your data,
"discard-least-changes" may be a policy you can look at. For the non-safe
cases, I prefer human intervention personally.
> 3/ Is there any merit in always stopping one node first, so you know at
> start-up which one has the most up-to-date data (say, if there is a
> start-up PSU failure)? Will a shut-down DRBD node with a stopped GFS and
> drbd still have a consistent (though out-of-date) file system?
DRBD metadata tracks which one is most up-to-date.
> 4/ I was thinking of the bad (hopefully unlikely) scenario where you
> bring up an out-of-date node A (older than B's data); it hopefully comes
> up clean (if the question above allows). It starts working; some time
> later you bring up node B, which originally had a later set of data
> before A and B went down.
That should be prevented by something like:
wfc-timeout 0;       # Wait forever for the initial connection
degr-wfc-timeout 60; # Wait only 60 seconds if this node was a degraded cluster
"A" would wait indefinitely for "B" to start. Only if you manually go to
the console and type "yes" to abort the wait will "A" come up on its own.
> Based on the recommended config. Will B now take all A's data ?
Nope. You have to manually resolve.
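For reference, the usual manual resolution in this era of DRBD (8.3 syntax; the resource name "r0" is an assumption) is to pick a split-brain victim and discard its changes:

```
# On the node whose changes are to be discarded (the split-brain victim):
drbdadm secondary r0
drbdadm -- --discard-my-data connect r0

# On the surviving node (only needed if it has dropped to StandAlone):
drbdadm connect r0
```

The victim then resyncs from the survivor; anything written only on the victim during the split-brain is lost, which is why choosing the victim is a human decision.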
> 5/ Is it good practice (or even possible) to use the same private
> interface for RH Cluster comms, clvmd etc (OpenAIS) that drbd uses? RHCS
> seems to make this hard, use an internal interface for cluster comms and
> have the services presented on a different interface.
That's a performance issue and depends on how fast your interconnect is. If
your backing-storage can saturate the link DRBD is over, you'll want to run
the totem protocol over a different interconnect. If you're using something
like InfiniBand or 10GbE, it likely will not be a problem unless you have
some wicked-fast solid-state backing storage.
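A back-of-envelope check makes the point (the numbers below are illustrative assumptions, not measurements from this thread): compare aggregate backing-storage throughput against the usable payload of the replication link.

```shell
# Rough saturation check: 8 spindles at ~120 MB/s streaming each,
# vs. GigE (~117 MB/s usable payload) and 10GbE (~1170 MB/s usable).
SPINDLES=8
MB_PER_SPINDLE=120
STORAGE=$((SPINDLES * MB_PER_SPINDLE))   # 960 MB/s aggregate
GIGE=117
TENGBE=1170
echo "storage: ${STORAGE} MB/s"
[ "$STORAGE" -gt "$GIGE" ]   && echo "GigE: saturated, run totem on a separate link"
[ "$STORAGE" -lt "$TENGBE" ] && echo "10GbE: headroom remains"
```

With GigE saturated by resync or write traffic, totem heartbeats sharing that link can be delayed enough to trigger fencing, which is the failure mode to design around.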