Note: "permalinks" may not be as permanent as we would like,
direct links of old sources may well be a few messages off.
Hi Colin,
Inline reply below:
On Fri, Oct 15, 2010 at 2:01 PM, Colin Simpson <Colin.Simpson at iongeo.com> wrote:
> Hi
>
> I have a working RH Cluster Suite test cluster with various GFS2 file
> systems on top of a DRBD Primary/Primary device.
>
> I have the recommended GFS setup in drbd.conf, i.e.
>
> allow-two-primaries;
> after-sb-0pri discard-zero-changes;
> after-sb-1pri discard-secondary;
> after-sb-2pri disconnect;
>
> Now I have been trying to think of the danger scenarios that might arise
> with my setup.
>
> So I have a few questions (maybe quite a few):
>
> 1/ When one node is brought back up after being down, it starts to sync
> up to the "newer" copy (I'm hoping).
>
> I presume GFS shouldn't be mounted at this point on the just-brought-up
> node (as the data will not be consistent between the two GFS mounts and
> the block device will be changing underneath it)?
>
The drbd service should start before the clvmd service. The syncing node
will sync and/or be immediately ready for use when clvmd comes up. I do
this to ensure that is the case:
# run from the directory that holds the clvmd and drbd init scripts,
# so patch can find the files named in the diff headers
/usr/bin/patch <<EOF
--- clvmd.orig 2010-09-13 17:15:17.000000000 -0500
+++ clvmd 2010-09-13 17:36:46.000000000 -0500
@@ -7,6 +7,8 @@
#
### BEGIN INIT INFO
# Provides: clvmd
+# Required-Start: drbd
+# Required-Stop: drbd
# Short-Description: Clustered LVM Daemon
### END INIT INFO
EOF
/usr/bin/patch <<EOF
--- drbd.orig 2010-09-13 17:15:17.000000000 -0500
+++ drbd 2010-09-13 17:39:46.000000000 -0500
@@ -15,8 +15,8 @@
# Should-Stop: sshd multipathd
# Default-Start: 2 3 4 5
# Default-Stop: 0 1 6
-# X-Start-Before: heartbeat corosync
-# X-Stop-After: heartbeat corosync
+# X-Start-Before: heartbeat corosync clvmd
+# X-Stop-After: heartbeat corosync clvmd
# Short-Description: Control drbd resources.
### END INIT INFO
EOF
cd -
# set up the proper order and make sure it sticks
for X in drbd clvmd ; do
/sbin/chkconfig $X resetpriorities
done
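To double-check that the ordering stuck, something like this should show it
(just an illustrative check on my part; run levels and paths as on stock RHEL):
for X in drbd clvmd ; do
    /sbin/chkconfig --list $X
done
# after resetpriorities, drbd's S number in the runlevel dirs
# should be lower than clvmd's (i.e. drbd starts earlier)
ls /etc/rc3.d/ | grep -E 'S[0-9]+(drbd|clvmd)'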
> I mean, does it, or is there any way of running drbd so that it ignores the
> out-of-date primary's data (on the node just brought up) and passes all
> the requests through to the "good" primary (until it is sync'd)?
>
That's what it does from my observation.
>
> Should I have my own startup script to only start cman and clvmd when I
> finally see
>
> 1: cs:Connected st:Primary/Primary ds:UpToDate/UpToDate
>
> and not
>
> 1: cs:SyncTarget st:Primary/Primary ds:Inconsistent/UpToDate
>
> What is recommended (or what do people do)? Or is there some way of
> achieving this already?
>
Nah. Just make sure drbd starts before clvmd.
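That said, if you really wanted to gate on sync completion, a rough and
untested sketch that just polls /proc/drbd before bringing up the stack
might look like this:
# wait until the resource reports Connected and UpToDate on both sides
# (only checks that some resource line matches; adjust for multiple resources)
until grep -q 'cs:Connected.*ds:UpToDate/UpToDate' /proc/drbd ; do
    sleep 5
done
service cman start
service clvmd start
I don't find it necessary, though.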
>
> Just starting up cman still seems to try to start services that then
> fail out even before clvmd is running (including services that are
> children of FS's in the cluster.conf file):
>
> <clusterfs fstype="gfs" ref="datahome">
> <nfsexport ref="tcluexports">
> <nfsclient name=" " ref="NFSdatahomeclnt"/>
> </nfsexport>
> </clusterfs>
>
> So I'm presuming I need to delay starting cman and clvmd and not just
> clvmd?
>
clvmd should be dependent on drbd, that is all.
>
> I'd like automatic cluster recovery.
>
> 2/ Is discard-older-primary not better in a Primary/Primary? Or is it
> inappropriate in dual Primary?
>
With the split-brain settings you mentioned further up, you have automatic
recovery for the safe cases. Depending on your data,
"discard-least-changes" may be a policy you can look at. For the non-safe
cases, I prefer human intervention personally.
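If you wanted to try that, it's a one-line change to the net options you
already have (shown here only as an illustration):
net {
    allow-two-primaries;
    after-sb-0pri discard-least-changes;   # instead of discard-zero-changes
    after-sb-1pri discard-secondary;
    after-sb-2pri disconnect;
}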
>
> 3/ Is there any merit in always stopping one node first so you know at
> start-up which one has the most up-to-date data (say if there is a
> start-up PSU failure)? Will a shut-down DRBD node with a stopped GFS and
> drbd still have a consistent (though out-of-date) file system?
>
DRBD metadata tracks which one is most up-to-date.
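If you want to see what it bases that on, drbd 8.x should let you dump the
generation identifiers it keeps per resource ("r0" is just an example name):
drbdadm get-gi r0    # compact list of the data-generation UUIDs and flags
drbdadm show-gi r0   # same information with explanatory text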
>
> 4/ I was thinking of the bad (hopefully unlikely) scenario where you bring
> up an out-of-date node A (older than B's data), and it hopefully comes up
> clean (if the above question allows). It starts working, and some time
> later you bring up node B, which had the later set of data before A and B
> originally went down.
>
That should be prevented by something like:
startup {
    wfc-timeout 0;       # Wait forever for initial connection
    degr-wfc-timeout 60; # Wait only 60 seconds if this node was a degraded cluster
}
"A" would wait indefinitely for "B" to start. Only if you manually
goto the console and type "yes" to abort the wfc-timeout will "A" come
up inconsistent.
> Based on the recommended config, will B now take all of A's data?
>
Nope. You have to resolve it manually.
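The usual manual resolution, per the DRBD docs, is to pick a victim, throw
its changes away and reconnect, roughly like this ("r0" is just an example
name, and with GFS2 you'd stop the cluster stack / unmount on the victim
first):
# on the node whose data you are discarding (the split-brain victim)
drbdadm secondary r0
drbdadm -- --discard-my-data connect r0
# on the surviving node (only needed if it dropped to StandAlone)
drbdadm connect r0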
>
> 5/ Is it good practice (or even possible) to use the same private
> interface for RH Cluster comms, clvmd etc (OpenAIS) that drbd uses? RHCS
> seems to make this hard: using an internal interface for cluster comms
> while presenting the services on a different interface.
>
That's a performance question and depends on how fast your interconnect is.
If your backing storage can saturate the link DRBD runs over, you'll want to
run the totem protocol over a different interconnect. If you're using
something like InfiniBand or 10GbE it likely won't be a problem unless you
have some wicked-fast solid-state backing storage.
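Just as an illustration (addresses and names made up), splitting the traffic
only means pointing the resource's address at the dedicated replication NIC
while totem/cman stay on the cluster network:
resource r0 {
    on node-a {
        device    /dev/drbd1;
        disk      /dev/sdb1;
        address   192.168.100.1:7789;   # replication-only interface
        meta-disk internal;
    }
    on node-b {
        device    /dev/drbd1;
        disk      /dev/sdb1;
        address   192.168.100.2:7789;
        meta-disk internal;
    }
}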
Cheers,
-JR