[DRBD-user] Best Practice with DRBD RHCS and GFS2?

Colin Simpson Colin.Simpson at iongeo.com
Wed Oct 20 17:41:18 CEST 2010


Thanks for your reply. Sadly, that's not quite what I'm seeing.

If I just boot the system with cman, drbd, then clvmd coming up in
that order, the GFS2 mounts hang (until fully synced) and I get a
nasty kernel error (this is CentOS 5.5 as a test before moving up to
RHEL for production):

Oct 20 15:47:44 testnode2 kernel: "echo 0
> /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Oct 20 15:47:44 testnode2 kernel: gfs2_quotad   D 00000114  2812  3942
11                3941 (L-TLB)
Oct 20 15:47:44 testnode2 kernel:        f237aec0 00000046 fd146610
00000114 00000000 f21f1800 f237ae80 00000006 
Oct 20 15:47:44 testnode2 kernel:        f3101550 fd14e179 00000114
00007b69 00000001 f310165c c28197c4 c2953580 
Oct 20 15:47:44 testnode2 kernel:        f8f0e21c c281a164 f27a8b70
00000000 f1d1c6c0 00000018 f27a8b50 ffffffff 
Oct 20 15:47:44 testnode2 kernel: Call Trace:
Oct 20 15:47:44 testnode2 kernel:  [<f8f0e21c>] gdlm_bast+0x0/0x78
[lock_dlm]
Oct 20 15:47:44 testnode2 kernel:  [<f901210e>] just_schedule+0x5/0x8
[gfs2]

It all works cleanly if I wait for DRBD to be fully in sync before
clvmd is started.
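For now my workaround is a small wait loop along these lines, a rough
sketch only (the function name, the five-second poll and defaulting to
/proc/drbd are my own choices, not anything from the DRBD docs):

```shell
#!/bin/sh
# Rough sketch: block until the DRBD disk state reports
# UpToDate/UpToDate.  Parameterised on the status file so it can be
# tried against a saved copy; on a live node this is /proc/drbd.
wait_for_sync() {
    status_file=${1:-/proc/drbd}
    until grep -q 'ds:UpToDate/UpToDate' "$status_file"; do
        sleep 5
    done
}
```

Called before "service clvmd start" this would hold clvmd back until
the resync finishes.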

Any thoughts?

Do I need a syncer { verify-alg } at all, if I'm always just taking the
newer data? 
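For context, what I mean is something like this in drbd.conf (sha1
here is just an arbitrary example; as I understand it, verify-alg is
only consulted by "drbdadm verify" runs):

```
syncer {
  verify-alg sha1;   # checksum used by "drbdadm verify" runs
}
```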

Did my config file look OK apart from the startup options you
recommended?

Thanks again

Colin

On Mon, 2010-10-18 at 17:29 +0100, J. Ryan Earl wrote:
> Hi Colin,
> 
> 
> Inline reply below:
> 
> On Fri, Oct 15, 2010 at 2:01 PM, Colin Simpson
> <Colin.Simpson at iongeo.com> wrote:
>         Hi
>         
>         I have a working test cluster RH Cluster Suite with various
>         GFS2 file
>         systems on top of a DRBD Primary/Primary device.
>         
>         I have the recommended GFS setup in drbd.conf, i.e.
>         
>         allow-two-primaries;
>         after-sb-0pri discard-zero-changes;
>         after-sb-1pri discard-secondary;
>         after-sb-2pri disconnect;
>         
>         Now I have been trying to think of the danger scenarios that
>         might arise
>         with my setup.
>         
>         So I have a few questions (maybe quite a few):
>         
>         1/ When one node is brought back up after being down it starts
>         to sync
>         up to the "newer" copy (I'm hoping).
>         
>         I presume GFS shouldn't be mounted at this point on the just
>         brought up
>         node (as data will not be consistent between the two GFS
>         mounts and the
>         block device will be changing underneath it)?
> 
> 
> The drbd service should start before the clvmd service.  The syncing
> node will sync and/or be immediately ready for use when clvmd comes
> up.  I do this to ensure that is the case:
> 
> 
> /usr/bin/patch <<EOF
> --- clvmd.orig  2010-09-13 17:15:17.000000000 -0500
> +++ clvmd       2010-09-13 17:36:46.000000000 -0500
> @@ -7,6 +7,8 @@
>  #
>  ### BEGIN INIT INFO
>  # Provides: clvmd
> +# Required-Start: drbd
> +# Required-Stop: drbd
>  # Short-Description: Clustered LVM Daemon
>  ### END INIT INFO
> EOF
> 
> /usr/bin/patch <<EOF
> --- drbd.orig   2010-09-13 17:15:17.000000000 -0500
> +++ drbd        2010-09-13 17:39:46.000000000 -0500
> @@ -15,8 +15,8 @@
>  # Should-Stop:    sshd multipathd
>  # Default-Start:  2 3 4 5
>  # Default-Stop:   0 1 6
> -# X-Start-Before: heartbeat corosync
> -# X-Stop-After:   heartbeat corosync
> +# X-Start-Before: heartbeat corosync clvmd
> +# X-Stop-After:   heartbeat corosync clvmd
>  # Short-Description:    Control drbd resources.
>  ### END INIT INFO
> EOF
> cd -
> 
> # setup proper order and make sure it sticks
> for X in drbd clvmd ; do 
>   /sbin/chkconfig $X resetpriorities
> done
> 
>         I mean, does it or is there any way of running drbd so it
>         ignores the
>         out of date primary's data (on the node just brought up) and
>         passes all
>         the requests through to the "good" primary (until it is
>         sync'd)?
> 
> 
> That's what it does from my observation.
>  
>         
>         Should I have my own start up script to only start cman and
>         clvmd when I
>         finally see
>         
>          1: cs:Connected st:Primary/Primary ds:UpToDate/UpToDate
>         
>         and not
>         
>          1: cs:SyncTarget st:Primary/Primary ds:Inconsistent/UpToDate
>         
>         What is recommended (or what do people do)? Or is there
>         some way of achieving this already?
> 
> 
> Nah.  Just make sure drbd starts before clvmd.
>  
>         
>         Just starting up cman still seems to try to start services
>         that then
>         fail out even before clvmd is running (including services that
>         are
>         children of FS's in the cluster.conf file):
>         
>         <clusterfs fstype="gfs" ref="datahome">
>                <nfsexport ref="tcluexports">
>                        <nfsclient name=" " ref="NFSdatahomeclnt"/>
>                </nfsexport>
>         </clusterfs>
>         
>         So I'm presuming I need to delay starting cman and clvmd and
>         not just
>         clvmd?
> 
> 
> clvmd should be dependent on drbd, that is all.
>  
>         
>         I'd like automatic cluster recovery.
>         
>         2/ Is discard-older-primary not better in a Primary/Primary?
>         Or is it
>         inappropriate in dual Primary?
> 
> 
> With the split-brain settings you mentioned further up, you have
> automatic recovery for the safe cases.  Depending on your data,
> "discard-least-changes" may be a policy you can look at.  For the
> non-safe cases, I prefer human intervention personally.
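> For reference, swapping that policy in would look something like
> this (a sketch only, following the same section layout as the
> settings quoted above):

```
net {
  allow-two-primaries;
  after-sb-0pri discard-least-changes;
  after-sb-1pri discard-secondary;
  after-sb-2pri disconnect;
}
```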
>  
>         
>         3/ Is there any merit in always stopping one node first, so
>         you know at start up which one has the most up to date data
>         (say if there is a start-up PSU failure)? Will a shut down
>         DRBD node, with GFS and drbd stopped, still have a
>         consistent (though out of date) file system?
> 
> 
> DRBD metadata tracks which one is most up-to-date.
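> A quick way to see that from the shell is "drbdadm dstate <resource>",
> or the ds: field in /proc/drbd; a small helper sketch (the sample
> line and resource naming are made up for illustration):

```shell
# Sketch: pull the disk-state field (ds:Local/Peer) out of a
# /proc/drbd status line.  On a live node "drbdadm dstate <res>"
# prints the same Local/Peer pair directly.
drbd_dstate() {
    printf '%s\n' "$1" | sed -n 's/.*ds:\([^ ]*\).*/\1/p'
}
```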
>  
>         
>         4/ I was thinking of the bad (hopefully unlikely) scenario
>         where you bring up an out of date node A (older than B's
>         data), and it hopefully comes up clean (if the above
>         question allows). It starts working; some time later you
>         bring up node B, which originally had the later set of data
>         before A and B went down.
> 
> 
> That should be prevented by something like:
> startup {
>     wfc-timeout 0 ;       # Wait forever for initial connection
>     degr-wfc-timeout 60;  # Wait only 60 seconds if this node was a degraded cluster
>   }
> "A" would wait indefinitely for "B" to start.  Only if you manually go to the console and type "yes" to abort the wfc-timeout will "A" come up inconsistent.
>         
>         Based on the recommended config, will B now take all of A's
>         data?
> 
> 
> Nope.  You have to manually resolve.
>  
>         
>         5/ Is it good practice (or even possible) to use the same
>         private interface for RH Cluster comms, clvmd etc (OpenAIS)
>         that drbd uses? RHCS seems to make this hard: use an
>         internal interface for cluster comms and have the services
>         presented on a different interface.
> 
> 
> That's a performance issue and depends on how fast your interconnect
> is.  If your backing storage can saturate the link DRBD runs over,
> you'll want to run the totem protocol over a different interconnect.
> If you're using something like InfiniBand or 10GbE it likely will not
> be a problem unless you have some wicked-fast solid-state backing
> storage.
>  
> Cheers,
> -JR
> 





