[DRBD-user] Question about using DRBD to do snapshots on AWS EBS volumes

Thu Feb 19 12:53:23 CET 2015

On Fri, Feb 06, 2015 at 07:45:49PM +0000, Giles Thomas wrote:
> Hi there,
> 
> We have DRBD set up to create live backups of important data on a
> file server on Amazon EC2.  It's a simple setup, with a DRBD primary
> and a secondary, both with a bunch of storage made up of EBS volumes
> bound together via RAID, with LVM on top of that, and DRBD on top of
> the LVM.   Or, more succinctly, EBS -> RAID -> LVM -> DRBD.
> 
> This setup works well to help us handle the risks of a disk failure,
> and also has made it easy for us to move between AWS availability
> zones.  But we'd like to use it to do proper snapshots for backups
> to protect from user errors (deleting an important file etc.)   We
> have a couple of questions about the best way to do this.
> 
> We've read this blog post
> <http://blogs.linbit.com/p/277/use-backup-with-drbd/> about doing
> backups of DRBD volumes using LVM, which says:
> 
>    Essentially one would disconnect the Secondary, [use LVM to]
>    snapshot the backing device, mount the snapshot, perform the
>    backups, umount the snapshot, reconnect the Secondary.
> 
> We've tried this (the mickey-mouse test script is below), and it
> appears to work fine.
> 
> But: as far as we understand, the reason LVM is useful in this
> process is that it allows you to mount and access the backing
> device, as this would normally be inaccessible even when the server
> is disconnected, because DRBD is keeping it locked.  For example, in
> a test environment, on the secondary, with
> /dev/important_volume/real_data as the disk to which the DRBD
> resource r0 is syncing:
> 
>    root@giles-devbackup1:~# drbdadm disconnect r0
>    root@giles-devbackup1:~# mkdir /mnt/foo
>    root@giles-devbackup1:~# mount /dev/important_volume/real_data /mnt/foo
>    mount: /dev/mapper/important_volume-real_data already mounted or
>    /mnt/foo busy
> 
> Are we right in thinking that the purpose of using LVM is purely to
> get a mountable device that you can run the backup from?  After all,

*AND* it keeps you from introducing data divergence aka
split-brain, which you would later need to clean up.

> if the server you're running on has been disconnected from the
> primary, the volume won't be being written to, so it presumably
> isn't to avoid "blurring" as the disk changes underneath your feet
> while you're backing up.
> 
> If we're right in that belief, we think we don't need to use LVM
> snapshots for our specific environment.  We're using Amazon EBS, and
> that has its own snapshot functionality.   It doesn't work very well
> if data changes under its feet, but if we use "drbdadm disconnect"
> before it starts and reconnect when it's done, perhaps it will work?
> So we're wondering if instead we can use this, perhaps simpler,
> backup methodology:
> 
>  * drbdadm disconnect the resource on the secondary
>  * Use the AWS API to do their kind of snapshot on the underlying
>    disks.  Wait for these to complete.
>  * drbdadm connect the resource on the secondary
>  * Make sure you wait for the primary to have synced fully before doing
>    this again!

Should just work.

You only use DRBD "disconnect" to get "stable data",
which you then can snapshot by other means...
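
A rough sketch of that sequence, in the same shell style as the script
quoted further down. Assumptions that are mine and not from this thread:
the AWS CLI is installed and configured, the volume IDs are placeholders,
"drbdadm dstate" prints "UpToDate/UpToDate" once the resync has finished
(DRBD 8.4-style output), and the DRY_RUN switch and function names are
made up for illustration. By default it only prints what it would do.

```shell
#!/bin/sh
# Sketch only: snapshot the EBS volumes behind a disconnected secondary.

RESOURCE="${RESOURCE:-r0}"                       # resource name from the thread
VOLUMES="${VOLUMES:-vol-11111111 vol-22222222}"  # hypothetical EBS volume IDs
DRY_RUN="${DRY_RUN:-1}"                          # 1 = only print commands

# Print the command in dry-run mode, execute it otherwise.
run() {
    if [ "$DRY_RUN" = 1 ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

# True if a dstate string says both sides are UpToDate (assumed format).
is_synced() {
    [ "$1" = "UpToDate/UpToDate" ]
}

# Do not start a new backup while the previous resync is still running.
wait_for_sync() {
    if [ "$DRY_RUN" = 1 ]; then
        echo "+ drbdadm dstate $RESOURCE"
        return 0
    fi
    until is_synced "$(drbdadm dstate "$RESOURCE")"; do
        sleep 10
    done
}

snapshot_secondary() {
    wait_for_sync || return 1
    # Stop replication so the backing device holds stable data.
    run drbdadm disconnect "$RESOURCE" || return 1

    status=0
    snaps=""
    for vol in $VOLUMES; do
        if [ "$DRY_RUN" = 1 ]; then
            echo "+ aws ec2 create-snapshot --volume-id $vol"
            continue
        fi
        id=$(aws ec2 create-snapshot --volume-id "$vol" \
                 --description "DRBD $RESOURCE backup" \
                 --query SnapshotId --output text) || { status=1; break; }
        snaps="$snaps $id"
    done

    # Wait for completion, as in the list above. The CLI waiter gives up
    # after a limited number of polls, so big volumes may need a retry.
    if [ -n "$snaps" ]; then
        # $snaps is intentionally unquoted: one argument per snapshot ID.
        aws ec2 wait snapshot-completed --snapshot-ids $snaps || status=1
    fi

    # Reconnect even if a snapshot failed, so replication resumes.
    run drbdadm connect "$RESOURCE" || status=1
    return $status
}
```

Calling snapshot_secondary with DRY_RUN=0 would run the real commands.
All volumes of the RAID set are snapshotted inside the same disconnect
window, so they should describe the same point in time.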

Don't forget to exercise *restore* as well.

If you use DRBD internal meta data, it would be backed up,
and restored, as well. Which means a restore would completely confuse
DRBD, to the point where it may require a full resync, or even refuse to
talk to the peer at all ("unrelated data").

If you use DRBD external meta data, it's even more confusing,
as the meta data would not reflect what is in the data area,
and there would not be a resync of relevant areas,
causing unexpected data divergence.

As long as you do file level backup and restore,
there should not be an issue.
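
A minimal restore drill at file level could look something like the
following. The function name and the directory arguments are made up
for illustration; it restores a backup tree into a scratch directory
and checks that what came back matches what was backed up. It does not
compare against the live data, which will have changed since the backup
was taken.

```shell
#!/bin/sh
# Sketch only: file-level restore into a scratch directory, then verify.
# Usage: restore_and_verify BACKUP_DIR SCRATCH_DIR

restore_and_verify() {
    backup_dir=$1
    scratch_dir=$2

    mkdir -p "$scratch_dir" || return 1

    # Copy file by file, preserving modes and timestamps.
    # "dir/." also picks up dotfiles, which "dir/*" would skip.
    cp -pr "$backup_dir"/. "$scratch_dir"/ || return 1

    # diff -r exits non-zero if any file differs or is missing.
    diff -r "$backup_dir" "$scratch_dir" >/dev/null
}
```

Because only files are copied, no DRBD meta data is backed up or
restored, and a restore onto the primary is replicated like any other
write.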

> Again, we've done this in a testing environment and it looked like
> it worked -- but we're not sure if it will work in production.
> 
> Is anyone out there doing backups on Amazon like that?  Any thoughts?
> 
> 
> All the best,
> 
> Giles
> 
> PS here's our mickey-mouse script to do backups using LVM snapshot:
> 
>    echo Start off with a clean backup disk
>    mkfs.xfs -f /dev/xvdf
>    mount /dev/xvdf /mnt/backup
> 
>    echo Disconnect us - the secondary
>    drbdadm disconnect r0
> 
>    echo Snapshot the backing device
>    SNAPSHOT_ID=`date +%Y%m%dT%H%M`
>    lvcreate -L100M --snapshot --name $SNAPSHOT_ID \
>        /dev/important_volume/real_data
> 
>    echo Mount the snapshot
>    mount /dev/important_volume/$SNAPSHOT_ID /mnt/snapshot/
> 
>    echo Perform the backup-- primitive version
>    cp -pr /mnt/snapshot/* /mnt/backup/
> 
>    echo Unmount the snapshot
>    umount /mnt/snapshot
> 
>    echo Remove the snapshot
>    lvremove -f /dev/important_volume/$SNAPSHOT_ID
> 
>    echo Reconnect us -- the secondary
>    drbdadm connect r0
> 
>    echo Umount the backup disk
>    umount /mnt/backup

For this, technically you would not even need to disconnect.
Though of course, depending on your snapshot performance,
it may impact performance on the primary quite heavily.
Which, I guess, is why the disconnect is in the original script.
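
For illustration, the quoted script without the disconnect/connect
steps might look roughly like this. Volume group, LV name, snapshot
size and mount points are taken from the script above; the "snap-"
prefix, the DRY_RUN switch and the function names are my own
assumptions, and the backup disk is assumed to be mounted on
/mnt/backup already. By default it only prints the commands.

```shell
#!/bin/sh
# Sketch only: LVM snapshot backup while DRBD stays connected.

VG="${VG:-important_volume}"
LV="${LV:-real_data}"
SNAP_SIZE="${SNAP_SIZE:-100M}"   # must absorb all writes during the backup
DRY_RUN="${DRY_RUN:-1}"          # 1 = only print commands

# Print the command in dry-run mode, execute it otherwise.
run() {
    if [ "$DRY_RUN" = 1 ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

backup_from_snapshot() {
    snap="snap-$(date +%Y%m%dT%H%M)"

    # The snapshot freezes a point-in-time view; replication continues
    # to write to the origin LV underneath.
    run lvcreate -L "$SNAP_SIZE" --snapshot --name "$snap" "/dev/$VG/$LV" \
        || return 1

    status=0
    if run mount "/dev/$VG/$snap" /mnt/snapshot; then
        run cp -pr /mnt/snapshot/. /mnt/backup/ || status=1
        run umount /mnt/snapshot || status=1
    else
        status=1
    fi

    # Always drop the snapshot, otherwise it keeps slowing down writes
    # and eventually fills up.
    run lvremove -f "/dev/$VG/$snap" || status=1
    return $status
}
```

While the snapshot exists, every replicated write on the secondary
causes copy-on-write work, which is where the performance impact on
the primary comes from.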

-- 
: Lars Ellenberg
: http://www.LINBIT.com | Your Way to High Availability
: DRBD, Linux-HA  and  Pacemaker support and consulting

DRBD® and LINBIT® are registered trademarks of LINBIT, Austria.
__
please don't Cc me, but send to list   --   I'm subscribed