[DRBD-user] DRBD Recovery actions without Pacemaker

Adam Goryachev mailinglists at websitemanagers.com.au
Fri Jul 8 02:02:25 CEST 2016

Note: "permalinks" may not be as permanent as we would like,
direct links of old sources may well be a few messages off.


Please don't top-post; it makes the thread harder to read (i.e., harder to 
help you).

On 08/07/16 02:01, James Ault wrote:
> On Thu, Jul 7, 2016 at 10:48 AM, Lars Ellenberg 
> <lars.ellenberg at linbit.com> wrote:
>
>     On Thu, Jul 07, 2016 at 07:16:51AM -0400, James Ault wrote:
>     > Here is a scenario:
>     >
>     > Two identical servers running RHEL 6.7,
>     > Three RAID5 targets, with one Logical volume group and one logical
>     > volume defined on top of each target.
>     > A DRBD device defined on top of each logical volume, and then an
>     > XFS file system defined on top of each DRBD device.
>     >
>     > The two identical servers are right on top of one another in the
>     > rack, and connected by a single ethernet cable for a private network.
>     >
>     > The configuration works as far as synchronization between DRBD
>     > devices.
>     >
>     > We do NOT have pacemaker as part of this configuration at
>     > management's request.
>     >
>     > We have the XFS file system mounted on server1, and this file
>     > system is exported via NFS.
>     >
>     > The difficulty lies in performing failover actions without pacemaker
>     > automation.
>     >
>     > The file system is mounted, and those status flags on the file
>     > system are successfully mirrored to server2.
>     >
>     > If I disconnected all wires from server1 to simulate system
>     > failure, and promoted server2 to primary on one of these file
>     > systems, and attempted to mount it, the error displayed is "file
>     > system already mounted".
>     >
>     > I have searched the xfs_admin and mount man pages thoroughly to
>     > find an option that would help me overcome this state.
>     >
>     > Our purpose of replication is to preserve and recover data in
>     > case of failure, but we are unable to recover or use the secondary
>     > copy in our current configuration.
>     >
>     > How can I recover and use this data without introducing
>     > pacemaker to our configuration?
>
>     If you want to do manual failover (I believe we have that also
>     documented in the User's Guide), all you do is
>
>     drbdadm primary $res
>     mount /dev/drbdX /some/where
>
>     That's also exactly what pacemaker would do.
>
>     If that does not work,
>     you have it either "auto-mounted" already by something,
>     or you have some file system UUID conflict,
>     or something else is very wrong.
>
>
> I see the Manual Failover section of the DRBD 8.4.x manual, and I see 
> that it requires that the file system be umounted before attempting to 
> promote and mount the file system on the secondary.
>
Assume san1 is currently primary and san2 is secondary.

If you want to do a "nice" failover:
a) on san1 stop whatever processes are "using" the filesystem (eg, NFS, 
samba, etc...)
b) on san1 umount the filesystem
c) on san1 change DRBD resource to secondary
d) on san2 change DRBD resource to primary
e) on san2 mount the filesystem
f) on san2 start whatever processes to export the filesystem (eg NFS, 
samba, etc)
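
In command terms that is roughly the following (just a sketch; I'm assuming 
a resource named r0 on /dev/drbd0, a mount point of /srv/export, and NFS as 
the service, so substitute your own names):

on san1:
    service nfs stop             # stop whatever is using the filesystem
    umount /srv/export
    drbdadm secondary r0

on san2:
    drbdadm primary r0
    mount /dev/drbd0 /srv/export
    service nfs start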

> What I meant by "those status flags" in my first message is that when 
> a node mounts a file system, that file system is marked as mounted 
> somewhere on that device.   The "mounted" status flag is what I'm 
> trying to describe, and I'm not sure if I have the correct name for it.

Me neither, and I'm not familiar with XFS at all. However, the unclean 
failover looks like this (again, a command sketch follows the list):

a) san1 crashes; san2 sees that its peer is missing and the DRBD resource 
changes to a disconnected state
b) on san2 change DRBD resource to primary
c) on san2 mount the filesystem
d) on san2 start whatever processes to export the filesystem (eg NFS, 
samba, etc)
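
In command terms (same hypothetical names as above), the san2 side is roughly:

    drbdadm primary r0           # add --force only if DRBD refuses and you are certain san1 is really dead
    mount /dev/drbd0 /srv/export
    service nfs start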

As far as step (c) goes, this is identical to what you would do if you were 
not using DRBD at all: the machine "crashed", you rebooted it, and you are 
now trying to mount the FS, i.e. it's just a standard unclean mount. Maybe 
you need to run a fsck first, maybe there is some other step, but with most 
filesystems I've used you simply mount it and it will either "clean up" (if 
it is a journal-based FS) or continue as normal until it encounters some 
corruption/error.
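
For XFS specifically, as far as I understand it (an untested sketch, same 
hypothetical names as above): the journal is replayed automatically at mount 
time, and xfs_repair only comes into it if the mount itself fails:

    mount /dev/drbd0 /srv/export   # normally enough; XFS replays its log during the mount
    # only if the mount fails complaining about the log or corruption:
    xfs_repair /dev/drbd0          # run while unmounted; -L zeroes the log as a last resort (loses recent changes)
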
> Does pacemaker or manual failover handle the case where a file server 
> experiences a hard failure where the umount operation is impossible? 
>    How can the secondary copy of the file system be mounted if the 
> umount operation never occurred and cannot occur on server1?

Yes, pacemaker simply automates the above process, so that the decision to 
fail over and the failover itself happen more quickly (hopefully before your 
clients/services notice any interruption).

BTW, have you actually tried it yet? You should definitely test a number of 
scenarios; if one of them gives you a specific problem, please describe what 
you did, what commands you ran, and the output of those commands so we can 
give better advice.
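
If you do hit a problem, the DRBD state on both nodes is usually the most 
useful thing to include. For example (standard DRBD 8.4 commands, nothing 
specific to your setup):

    cat /proc/drbd          # roles, connection state and disk state for every resource
    drbdadm role all        # Primary/Secondary as seen from this node
    drbdadm cstate all      # Connected / WFConnection / StandAlone, ...
    drbdadm dstate all      # UpToDate/UpToDate is what you want before failing over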

Hope that helps...

-- 
Adam Goryachev
Website Managers
www.websitemanagers.com.au