Note: "permalinks" may not be as permanent as we would like,
direct links of old sources may well be a few messages off.
Hi,

I have a working test cluster running RH Cluster Suite with various GFS2 file systems on top of a DRBD Primary/Primary device. I have the recommended GFS setup in drbd.conf, i.e.

    allow-two-primaries;
    after-sb-0pri discard-zero-changes;
    after-sb-1pri discard-secondary;
    after-sb-2pri disconnect;

Now I have been trying to think through the danger scenarios that might arise with my setup, so I have a few questions (maybe quite a few):

1/ When one node is brought back up after being down, it starts to sync up to the "newer" copy (I'm hoping). I presume GFS shouldn't be mounted at this point on the just-brought-up node, as data will not be consistent between the two GFS mounts and the block device will be changing underneath it? It seems to have caused oopses in the GFS kernel modules when I have tried it before. Is there any way of running DRBD so that it ignores the out-of-date primary's data (on the node just brought up) and passes all requests through to the "good" primary until it is synced? Or should I have my own start-up script that only starts cman and clvmd once I finally see

    1: cs:Connected st:Primary/Primary ds:UpToDate/UpToDate

and not

    1: cs:SyncTarget st:Primary/Primary ds:Inconsistent/UpToDate

What is recommended (or what do people do)? Or is there some way of achieving this already? (I've put a rough sketch of the sort of script I mean below, after the questions.) Just starting cman still seems to try to start services that then fail, even before clvmd is running, including services that are children of file systems in the cluster.conf file:

    <clusterfs fstype="gfs" ref="datahome">
        <nfsexport ref="tcluexports">
            <nfsclient name=" " ref="NFSdatahomeclnt"/>
        </nfsexport>
    </clusterfs>

So I'm presuming I need to delay starting both cman and clvmd, not just clvmd? I'd like automatic cluster recovery.

2/ Would discard-older-primary not be better in a Primary/Primary setup, or is it inappropriate with dual primaries?

3/ Is there any merit in always stopping one node first, so that at start-up you know which one has the most up-to-date data (say if there is a start-up PSU failure)? Will a shut-down DRBD node, with GFS and drbd stopped, still have a consistent (though out-of-date) file system?

4/ I was thinking of the bad (hopefully unlikely) scenario where you bring up an out-of-date node A (older than B's data) and it hopefully comes up clean (if question 1 allows). It starts working, and some time later you bring up node B, which had a later set of data before A and B originally went down. Based on the recommended config, will B now take all of A's data? Will you end up with a mishmash of A's and B's data at the block level (upsetting GFS)? Or will A take B's data? B taking all of A's data seems best (least worst) to me, as things may well have moved on quite a bit, and we'd hope B wasn't too far behind when it went down.

5/ Is it good practice (or even possible) to use the same private interface for RH Cluster comms, clvmd etc. (OpenAIS) that drbd uses? RHCS seems to make it hard to use an internal interface for cluster comms while presenting the services on a different interface.

For reference, my drbd.conf test version is below. Hopefully this is pretty clear, though I'm not convinced I have been....
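In case it makes question 1 clearer, this is a minimal sketch of the kind of start-up delay script I have in mind (not something I'm running yet; the device number, the timeout and the init script names are just assumptions for illustration):

    #!/bin/sh
    # Sketch: wait until drbd device 1 reports Connected and
    # UpToDate/UpToDate before starting the cluster stack, so GFS is
    # never mounted on an Inconsistent device.
    DEV=1
    TIMEOUT=600        # give up after 10 minutes and leave it to manual recovery
    waited=0
    while ! grep -q "^ *$DEV: cs:Connected .*ds:UpToDate/UpToDate" /proc/drbd
    do
        sleep 5
        waited=$((waited + 5))
        if [ "$waited" -ge "$TIMEOUT" ]; then
            echo "drbd$DEV still not UpToDate after ${TIMEOUT}s, not starting cluster" >&2
            exit 1
        fi
    done
    # assumed RHEL-style init scripts
    service cman start
    service clvmd start
    service gfs start
    service rgmanager start

It deliberately only checks cs: and ds:, since the roles should already end up Primary/Primary via become-primary-on both.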
Thanks,
Colin

    global { usage-count yes; }

    common { protocol C; }

    resource r0 {
        syncer {
            verify-alg md5;
            rate 70M;
        }
        startup {
            become-primary-on both;
        }
        on edi1tcn1 {
            device    /dev/drbd1;
            disk      /dev/sda3;
            address   192.168.9.61:7789;
            meta-disk internal;
        }
        on edi1tcn2 {
            device    /dev/drbd1;
            disk      /dev/sda3;
            address   192.168.9.62:7789;
            meta-disk internal;
        }
        net {
            allow-two-primaries;
            after-sb-0pri discard-zero-changes;
            after-sb-1pri discard-secondary;
            after-sb-2pri disconnect;
        }
    }
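PS, relating to questions 2 and 4: for what it's worth, my current understanding is that if the after-sb policies can't resolve a split brain automatically, recovery is manual, roughly along these lines (this assumes the DRBD 8 drbdadm syntax and that GFS has already been unmounted on the node whose changes are to be thrown away; please correct me if this is wrong for dual primary):

    # on the node whose data we are discarding (GFS already unmounted)
    drbdadm secondary r0
    drbdadm -- --discard-my-data connect r0

    # on the surviving node, if it has dropped to StandAlone
    drbdadm connect r0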