Hi Colin,

Inline reply below:

On Fri, Oct 15, 2010 at 2:01 PM, Colin Simpson <Colin.Simpson@iongeo.com> wrote:
<blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex;">Hi<br>
<br>
I have a working test cluster RH Cluster Suite with various GFS2 file<br>
systems on top of a DRBD Primary/Primary device.<br>
<br>
I have the recommended GFS setup in drbd.conf i.e<br>
<br>
allow-two-primaries;<br>
after-sb-0pri discard-zero-changes;<br>
after-sb-1pri discard-secondary;<br>
after-sb-2pri disconnect;<br>
<br>
Now I have been trying to think of the danger scenarios that might arise<br>
with my setup.<br>
<br>
So I have a few questions (maybe quite a few):<br>
<br>
1/ When one node is brought back up after being down it starts to sync<br>
up to the "newer" copy (I'm hoping).<br>
<br>
I presume GFS shouldn't be mounted at this point on the just brought up<br>
node (as data will not be consistent between the two GFS mounts and the<br>
> block device will be changing underneath it)?

The drbd service should start before the clvmd service. The syncing node
will sync and/or be immediately ready for use when clvmd comes up. I do
this to ensure that's the case:
<div><br></div><div><span class="Apple-style-span" style="font-family: 'Times New Roman'; font-size: medium; "><pre style="word-wrap: break-word; white-space: pre-wrap; ">/usr/bin/patch <<EOF
--- clvmd.orig 2010-09-13 17:15:17.000000000 -0500
+++ clvmd 2010-09-13 17:36:46.000000000 -0500
@@ -7,6 +7,8 @@
#
### BEGIN INIT INFO
# Provides: clvmd
+# Required-Start: drbd
+# Required-Stop: drbd
# Short-Description: Clustered LVM Daemon
### END INIT INFO
EOF
/usr/bin/patch <<EOF
--- drbd.orig 2010-09-13 17:15:17.000000000 -0500
+++ drbd 2010-09-13 17:39:46.000000000 -0500
@@ -15,8 +15,8 @@
# Should-Stop: sshd multipathd
# Default-Start: 2 3 4 5
# Default-Stop: 0 1 6
-# X-Start-Before: heartbeat corosync
-# X-Stop-After: heartbeat corosync
+# X-Start-Before: heartbeat corosync clvmd
+# X-Stop-After: heartbeat corosync clvmd
# Short-Description: Control drbd resources.
### END INIT INFO
EOF
cd -
# setup proper order and make sure it sticks
for X in drbd clvmd ; do
/sbin/chkconfig $X resetpriorities
done</pre><pre style="word-wrap: break-word; white-space: pre-wrap; "><br></pre></span></div><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex;">
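
You can sanity-check afterwards that the ordering stuck; runlevel 3 below is
just an example:

# drbd's S## symlink should now sort before clvmd's
ls /etc/rc3.d/ | egrep 'drbd|clvmd'
/sbin/chkconfig --list | egrep 'drbd|clvmd'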

> I mean, does it or is there any way of running drbd so it ignores the
> out of date primary's data (on the node just brought up) and passes all
> the requests through to the "good" primary (until it is sync'd)?

That's what it does, from my observation: while the node is a SyncTarget,
reads of blocks that haven't been resynced yet are serviced by the UpToDate
peer.

> Should I have my own start up script to only start cman and clvmd when I
> finally see
>
> 1: cs:Connected st:Primary/Primary ds:UpToDate/UpToDate
>
> and not
>
> 1: cs:SyncTarget st:Primary/Primary ds:Inconsistent/UpToDate
>
> , what is recommended (or what do people do)? Or is there some way of
> achieving this already?

Nah. Just make sure drbd starts before clvmd.

> Just starting up cman still seems to try to start services that then
> fail out even before clvmd is running (including services that are
> children of FS's in the cluster.conf file):
>
> <clusterfs fstype="gfs" ref="datahome">
>     <nfsexport ref="tcluexports">
>         <nfsclient name=" " ref="NFSdatahomeclnt"/>
>     </nfsexport>
> </clusterfs>
>
> So I'm presuming I need to delay starting cman and clvmd and not just
> clvmd?

clvmd should be dependent on drbd, that is all.

> I'd like automatic cluster recovery.
>
> 2/ Is discard-older-primary not better in a Primary/Primary? Or is it
> inappropriate in dual Primary?

With the split-brain settings you mentioned further up, you have automatic
recovery for the safe cases. Depending on your data, "discard-least-changes"
may be a policy you can look at. For the non-safe cases, I prefer human
intervention personally.
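
If you wanted to experiment with that, it's just a different after-sb-0pri
choice in the same net section; a sketch, not something I run myself:

net {
    allow-two-primaries;
    after-sb-0pri discard-least-changes;  # instead of discard-zero-changes
    after-sb-1pri discard-secondary;
    after-sb-2pri disconnect;
}

Bear in mind that if both nodes were Primary when the split brain happened
(the usual case in dual-primary), it's the after-sb-2pri policy that applies,
and "disconnect" there means manual resolution anyway.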

> 3/ Is there any merit in always stopping one node first so you know for
> start up which one has the most up to date data (say if there is a start
> up PSU failure)? Will a shutdown DRBD node with a stopped GFS and drbd
> still have a consistent (though out of date) file system?

The DRBD metadata tracks which one is most up-to-date.
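
If you want to see what it knows, the data generation identifiers are
visible from the command line ("r0" below is just a placeholder resource
name):

# connection and disk state right now
cat /proc/drbd

# generation UUIDs stored in the DRBD metadata
drbdadm get-gi r0
drbdadm show-gi r0   # same values, with a short explanation of each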

> 4/ I was thinking the bad (hopefully unlikely) scenario where you bring
> up an out of date node A (older than B's data), it maybe hopefully comes
> up clean (if the above question allows). It starts working, some time
> later you bring up node B which originally had a later set of data
> before A and B went down originally.

That should be prevented by something like:

startup {
    wfc-timeout 0;        # wait forever for the initial connection
    degr-wfc-timeout 60;  # wait only 60 seconds if this node was a degraded cluster
}

"A" would wait indefinitely for "B" to start. Only if you manually go to the
console and type "yes" to abort the wait will "A" come up on its own with
its out-of-date data.

> Based on the recommended config. Will B now take all A's data?

Nope. You have to resolve it manually.
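
For reference, the manual resolution is roughly the following ("r0" again
being a placeholder, and GFS must be unmounted on the node you sacrifice):

# on the split-brain victim (the node whose changes get thrown away)
drbdadm secondary r0
drbdadm -- --discard-my-data connect r0

# on the surviving node, if it has also dropped the connection
drbdadm connect r0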

> 4/ Is it good practice (or even possible) to use the same private
> interface for RH Cluster comms, clvmd etc (OpenAIS) that drbd uses? RHCS
> seems to make this hard: use an internal interface for cluster comms and
> have the services presented on a different interface.

That's a performance question and depends on how fast your interconnect is.
If your backing storage can saturate the link DRBD replicates over, you'll
want to run the totem protocol over a different interconnect. If you're
using something like InfiniBand or 10GbE it likely will not be a problem
unless you have some wicked-fast solid-state backing storage.
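
Mechanically it is not hard to keep them apart: DRBD replicates over
whatever addresses you put in the resource section, and as far as I recall
cman/OpenAIS binds totem to the interface the cluster node names resolve to.
A sketch, with placeholder host names and a back-to-back replication link:

resource r0 {
    on nodeA {
        device    /dev/drbd1;
        disk      /dev/sdb1;
        address   192.168.100.1:7789;   # dedicated replication NIC
        meta-disk internal;
    }
    on nodeB {
        device    /dev/drbd1;
        disk      /dev/sdb1;
        address   192.168.100.2:7789;
        meta-disk internal;
    }
}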

Cheers,
-JR