[DRBD-user] Question about using DRBD to do snapshots on AWS EBS volumes

Fri Feb 27 17:16:00 CET 2015

Hi Lars,

On 19/02/15 11:53, Lars Ellenberg wrote:
> On Fri, Feb 06, 2015 at 07:45:49PM +0000, Giles Thomas wrote:
>> Are we right in thinking that the purpose of using LVM [on the secondary during backups] is purely to
>> get a mountable device that you can run the backup from?  After all,
> *AND* it keeps you from introducing data divergence aka
> split-brain, which you would later need to clean up.

So just to make sure I understand that correctly -- you're saying it 
stops us from accidentally writing to the secondary while it's 
disconnected from the primary.  Makes sense, that would obviously be a 
Bad Thing.

[Out of interest, would it be technically feasible for a future version 
of DRBD to make the disk available on the secondary in a read-only 
state?  I assume it wouldn't be easy (as otherwise it would probably 
have been done), and I'm not necessarily advocating it as a new feature 
-- but I am wondering if there are any deep technical barriers that 
would make it completely impossible.]

>
>>   * drbdadm disconnect the resource on the secondary
>>   * Use the AWS API to do their kind of snapshot on the underlying
>>     disks.  Wait for these to complete.
>>   * drbdadm connect the resource on the secondary
>>   * Make sure you wait for the primary to have synced fully before doing
>>     this again!
> Should just work.
>
> You only use DRBD "disconnect" to get "stable data",
> which you then can snapshot by other means...

An update on this: it looks like it works.  As it turns out, AWS's EBS
snapshots don't even need to complete before we reconnect, because they
capture the state of the data at the point they're started.  So a process
that *seems to be* working well for us is:

  * drbdadm disconnect the resource on the secondary
  * Use the AWS API to do their kind of snapshot on the underlying
    disks.  Wait for these to *start* -- that is, show up as "pending" on
    the AWS console.
  * drbdadm connect the resource on the secondary
  * Make sure you wait for the primary to have synced fully before doing
    this again.
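
For anyone wanting to script this, here is a rough sketch of the steps above in Python.  It is only an illustration, not what we actually run: the function names, the resource name and the volume IDs are made up, it assumes the aws command-line tool is installed and configured on the secondary, and it assumes that "drbdadm cstate" and "drbdadm dstate" print "Connected" and "UpToDate/UpToDate" once the resync has finished.  The aws ec2 create-snapshot call returns as soon as the snapshot has been started, which is the "pending" point we wait for.

```python
import json
import subprocess
import time


def run_cmd(args):
    """Run a command and return its stdout as text (raises on non-zero exit)."""
    return subprocess.run(args, check=True, capture_output=True, text=True).stdout


def is_fully_synced(resource, run=run_cmd):
    """True when the resource is connected and both sides report UpToDate."""
    cstate = run(["drbdadm", "cstate", resource]).strip()
    dstate = run(["drbdadm", "dstate", resource]).strip()
    return cstate == "Connected" and dstate == "UpToDate/UpToDate"


def snapshot_secondary(resource, volume_ids, run=run_cmd,
                       poll_seconds=30, sleep=time.sleep):
    """Disconnect, start one EBS snapshot per volume, reconnect, wait for resync."""
    # Never start a new round while the previous resync is still running.
    if not is_fully_synced(resource, run):
        raise RuntimeError("resource %s is not fully synced" % resource)

    snapshot_ids = []
    run(["drbdadm", "disconnect", resource])
    try:
        for volume_id in volume_ids:
            # create-snapshot returns once the snapshot exists in "pending" state.
            out = run(["aws", "ec2", "create-snapshot",
                       "--volume-id", volume_id,
                       "--description", "DRBD secondary backup of " + resource,
                       "--output", "json"])
            snapshot_ids.append(json.loads(out)["SnapshotId"])
    finally:
        # Always reconnect, even if starting a snapshot failed.
        run(["drbdadm", "connect", resource])

    # Block until the secondary has caught up with the primary again.
    while not is_fully_synced(resource, run):
        sleep(poll_seconds)
    return snapshot_ids
```

Something like snapshot_secondary("r0", ["vol-aaaa", "vol-bbbb"]) would be run on the secondary; "r0" and the volume IDs are placeholders.  The run and sleep parameters are only there so the logic can be exercised without touching a real DRBD resource.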

The reason I say this "seems to be" working is that we're having one 
problem with the snapshots we take this way.  The filesystem is XFS, and 
when we reconnect the volumes, they cannot be mounted with the project 
quota flag set.  They can be mounted without that flag, and all of the 
files are there, but if the flag is set, the mount hangs, and running
xfs_repair just crashes.  There are a couple of possibilities there, but
it certainly seems to be an XFS thing.  Its quota settings aren't 
journalled, and we suspect that the disconnect means that the filesystem 
is in an inconsistent state quota-wise when the snapshot is taken.

We're considering moving to ext4, as that journals its quotas, which
might make it a better choice for our use case (and in general for
backing up volumes while they're live).

> Don't forget to exercise *restore* as well.

Absolutely! :-)

> If you use DRBD internal meta data, it would be backed up,
> and restored, as well. Which means a restore would completely confuse
> DRBD, to the point where it may require a full resync, or even refuse to
> talk to the peer at all ("unrelated data").
>
> If you use DRBD external meta data, it's even more confusing,
> as the meta data would not reflect what is in the data area,
> and there would not be a resync of relevant areas,
> causing unexpected data divergence.

Thanks, that's useful to know.  The underlying setup we have is a bunch 
of EBS volumes RAID-0-ed together into one large volume using mdadm, 
then LVM on top of that to split it into one smallish metadata volume 
for DRBD and one large one for the data.  So the EBS backup snapshots 
both the data and the metadata.
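
To make that layering concrete, here is a small sketch that just builds the list of commands for such a stack, without running anything.  The device names, the volume group name, the logical volume names and the 1G metadata size are all invented for illustration and are not our real values.

```python
def build_stack_commands(ebs_devices, md_device="/dev/md0",
                         vg_name="vg_drbd", meta_size="1G"):
    """Return the commands (as argument lists) that build the storage stack."""
    if len(ebs_devices) < 2:
        raise ValueError("RAID-0 striping needs at least two devices")
    return [
        # Stripe the EBS volumes into a single md device.
        ["mdadm", "--create", md_device, "--level=0",
         "--raid-devices=%d" % len(ebs_devices)] + list(ebs_devices),
        # Put LVM on top of the array.
        ["pvcreate", md_device],
        ["vgcreate", vg_name, md_device],
        # One smallish volume for DRBD's metadata, the rest for the data.
        ["lvcreate", "-L", meta_size, "-n", "drbd_meta", vg_name],
        ["lvcreate", "-l", "100%FREE", "-n", "drbd_data", vg_name],
    ]
```

Because every EBS volume under the array gets snapshotted, both logical volumes (and so both the DRBD data and its metadata) end up in the backup.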

I suspect in the kind of catastrophic failure we'd need these backups to
recover from, we'd wind up restoring the disks from the snapshots and
reassembling the RAID array and LVM, but then wiping the metadata
volume, connecting the result to a new DRBD primary, and then hooking it
up to a secondary which had completely fresh disks (albeit with the
whole RAID/LVM thing already set up).  It takes about 10 hours to do a
full sync, and while there would be a risk of a disk failure blowing
everything up during that time window, that risk would probably be
outweighed by the value of DRBD having a completely fresh set of
metadata.
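
Roughly, the order of operations I have in mind on the restored machine looks like the sketch below.  This is untested and only meant to show the sequence: the volume group name is invented, "primary --force" is the DRBD 8.4 spelling as far as I know, and wipe-md and create-md may ask for confirmation when run interactively.

```python
def build_restore_plan(resource, vg_name="vg_drbd"):
    """Return (description, command) pairs for rebuilding a primary from snapshots."""
    return [
        ("assemble the RAID-0 array from the restored EBS volumes",
         ["mdadm", "--assemble", "--scan"]),
        ("activate the LVM volume group",
         ["vgchange", "-ay", vg_name]),
        ("throw away the restored DRBD metadata",
         ["drbdadm", "wipe-md", resource]),
        ("write completely fresh metadata",
         ["drbdadm", "create-md", resource]),
        ("bring the resource up",
         ["drbdadm", "up", resource]),
        ("promote it; a freshly initialised secondary then needs a full sync",
         ["drbdadm", "primary", "--force", resource]),
    ]
```

Printing the plan (for description, command in build_restore_plan("r0")) gives a checklist to walk through by hand, which is probably safer than automating something we would only do in an emergency.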

> As long as you do file level backup and restore,
> there should not be an issue.

Brilliant, many thanks!

All the best,

Giles

-- 
Giles Thomas <giles at pythonanywhere.com>

PythonAnywhere: Develop and host Python from your browser
<https://www.pythonanywhere.com/>