Hi Colin,

Inline reply below:

On Fri, Oct 15, 2010 at 2:01 PM, Colin Simpson <Colin.Simpson@iongeo.com> wrote:
<blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex;">Hi<br>
<br>
I have a working test cluster RH Cluster Suite with various GFS2 file<br>
systems on top of a DRBD Primary/Primary device.<br>
<br>
I have the recommended GFS setup in drbd.conf i.e<br>
<br>
allow-two-primaries;<br>
after-sb-0pri discard-zero-changes;<br>
after-sb-1pri discard-secondary;<br>
after-sb-2pri disconnect;<br>
<br>
Now I have been trying to think of the danger scenarios that might arise<br>
with my setup.<br>
<br>
So I have a few questions (maybe quite a few):<br>
<br>
1/ When one node is brought back up after being down it starts to sync<br>
up to the "newer" copy (I'm hoping).<br>
<br>
I presume GFS shouldn't be mounted at this point on the just brought up<br>
node (as data will not be consistent between the two GFS mounts and the<br>
> block device will be changing underneath it)?

The drbd service should start before the clvmd service. The syncing node
will sync and/or be immediately ready for use when clvmd comes up. I do
this to ensure that's the case:
<div><br></div><div><span class="Apple-style-span" style="font-family: 'Times New Roman'; font-size: medium; "><pre style="word-wrap: break-word; white-space: pre-wrap; ">/usr/bin/patch <<EOF
--- clvmd.orig 2010-09-13 17:15:17.000000000 -0500
+++ clvmd 2010-09-13 17:36:46.000000000 -0500
@@ -7,6 +7,8 @@
#
### BEGIN INIT INFO
# Provides: clvmd
+# Required-Start: drbd
+# Required-Stop: drbd
# Short-Description: Clustered LVM Daemon
### END INIT INFO
EOF
/usr/bin/patch <<EOF
--- drbd.orig 2010-09-13 17:15:17.000000000 -0500
+++ drbd 2010-09-13 17:39:46.000000000 -0500
@@ -15,8 +15,8 @@
# Should-Stop: sshd multipathd
# Default-Start: 2 3 4 5
# Default-Stop: 0 1 6
-# X-Start-Before: heartbeat corosync
-# X-Stop-After: heartbeat corosync
+# X-Start-Before: heartbeat corosync clvmd
+# X-Stop-After: heartbeat corosync clvmd
# Short-Description: Control drbd resources.
### END INIT INFO
EOF
cd -
# setup proper order and make sure it sticks
for X in drbd clvmd ; do
/sbin/chkconfig $X resetpriorities
done</pre><pre style="word-wrap: break-word; white-space: pre-wrap; "><br></pre></span></div><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex;">
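
You can sanity-check afterwards that the ordering stuck; runlevel 3 below is
just an example:

# drbd's S## symlink should now sort before clvmd's
ls /etc/rc3.d/ | egrep 'drbd|clvmd'
/sbin/chkconfig --list | egrep 'drbd|clvmd'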

> I mean, does it or is there any way of running drbd so it ignores the
> out of date primary's data (on the node just brought up) and passes all
> the requests through to the "good" primary (until it is sync'd)?

That's what it does, from my observation: while the node is a SyncTarget,
reads of blocks that haven't been resynced yet are serviced by the UpToDate
peer.

> Should I have my own start up script to only start cman and clvmd when I
> finally see
>
> 1: cs:Connected st:Primary/Primary ds:UpToDate/UpToDate
>
> and not
>
> 1: cs:SyncTarget st:Primary/Primary ds:Inconsistent/UpToDate
>
> , what is recommended (or what do people do)? Or is there some way of
> achieving this already?

Nah. Just make sure drbd starts before clvmd.

> Just starting up cman still seems to try to start services that then
> fail out even before clvmd is running (including services that are
> children of FS's in the cluster.conf file):
>
> <clusterfs fstype="gfs" ref="datahome">
>     <nfsexport ref="tcluexports">
>         <nfsclient name=" " ref="NFSdatahomeclnt"/>
>     </nfsexport>
> </clusterfs>
>
> So I'm presuming I need to delay starting cman and clvmd and not just
> clvmd?

clvmd should be dependent on drbd, that is all.

> I'd like automatic cluster recovery.
>
> 2/ Is discard-older-primary not better in a Primary/Primary? Or is it
> inappropriate in dual Primary?

With the split-brain settings you mentioned further up, you have automatic
recovery for the safe cases. Depending on your data, "discard-least-changes"
may be a policy you can look at. For the non-safe cases, I prefer human
intervention personally.
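
If you wanted to experiment with that, it's just a different after-sb-0pri
choice in the same net section; a sketch, not something I run myself:

net {
    allow-two-primaries;
    after-sb-0pri discard-least-changes;  # instead of discard-zero-changes
    after-sb-1pri discard-secondary;
    after-sb-2pri disconnect;
}

Bear in mind that if both nodes were Primary when the split brain happened
(the usual case in dual-primary), it's the after-sb-2pri policy that applies,
and "disconnect" there means manual resolution anyway.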

> 3/ Is there any merit in always stopping one node first so you know for
> start up which one has the most up to date data (say if there is a start
> up PSU failure)? Will a shutdown DRBD node with a stopped GFS and drbd
> still have a consistent (though out of date) file system?

The DRBD metadata tracks which one is most up-to-date.
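
If you want to see what it knows, the data generation identifiers are
visible from the command line ("r0" below is just a placeholder resource
name):

# connection and disk state right now
cat /proc/drbd

# generation UUIDs stored in the DRBD metadata
drbdadm get-gi r0
drbdadm show-gi r0   # same values, with a short explanation of each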

> 4/ I was thinking the bad (hopefully unlikely) scenario where you bring
> up an out of date node A (older than B's data), it maybe hopefully comes
> up clean (if the above question allows). It starts working, some time
> later you bring up node B which originally had a later set of data
> before A and B went down originally.

That should be prevented by something like:

startup {
    wfc-timeout 0;        # wait forever for the initial connection
    degr-wfc-timeout 60;  # wait only 60 seconds if this node was a degraded cluster
}

"A" would wait indefinitely for "B" to start. Only if you manually go to the
console and type "yes" to abort the wait will "A" come up on its own with
its out-of-date data.

> Based on the recommended config. Will B now take all A's data?

Nope. You have to resolve it manually.
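
For reference, the manual resolution is roughly the following ("r0" again
being a placeholder, and GFS must be unmounted on the node you sacrifice):

# on the split-brain victim (the node whose changes get thrown away)
drbdadm secondary r0
drbdadm -- --discard-my-data connect r0

# on the surviving node, if it has also dropped the connection
drbdadm connect r0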

> 4/ Is it good practice (or even possible) to use the same private
> interface for RH Cluster comms, clvmd etc (OpenAIS) that drbd uses? RHCS
> seems to make this hard: use an internal interface for cluster comms and
> have the services presented on a different interface.

That's a performance question and depends on how fast your interconnect is.
If your backing storage can saturate the link DRBD replicates over, you'll
want to run the totem protocol over a different interconnect. If you're
using something like InfiniBand or 10GbE it likely will not be a problem
unless you have some wicked-fast solid-state backing storage.
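
Mechanically it is not hard to keep them apart: DRBD replicates over
whatever addresses you put in the resource section, and as far as I recall
cman/OpenAIS binds totem to the interface the cluster node names resolve to.
A sketch, with placeholder host names and a back-to-back replication link:

resource r0 {
    on nodeA {
        device    /dev/drbd1;
        disk      /dev/sdb1;
        address   192.168.100.1:7789;   # dedicated replication NIC
        meta-disk internal;
    }
    on nodeB {
        device    /dev/drbd1;
        disk      /dev/sdb1;
        address   192.168.100.2:7789;
        meta-disk internal;
    }
}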

Cheers,
-JR