[DRBD-user] Guide to growing DRBD on raw partitions under clustered LVM

Lars Ellenberg lars.ellenberg at linbit.com
Fri Aug 8 16:21:41 CEST 2014

Note: "permalinks" may not be as permanent as we would like,
direct links of old sources may well be a few messages off.


On Tue, Aug 05, 2014 at 09:13:41AM -0400, Digimer wrote:
> On 05/08/14 05:11 AM, Lars Ellenberg wrote:
> >On Mon, Aug 04, 2014 at 10:06:01PM -0400, Digimer wrote:
> >>Hi all,
> >>
> >>   Your friendly neighbourhood documenter decided to work out how to
> >>grow storage in HA clusters using DRBD.
> >>
> >>https://alteeve.ca/w/Anvil!_Tutorial_2_-_Growing_Storage
> >>
> >>   It's based around the main Anvil! tutorial, but it should be easy
> >>to adapt to pretty much any situation that is using DRBD on raw
> >>partitions. In my case, I do this because it gets complicated mixing
> >>LVM under DRBD and clustered LVM over DRBD.
> >>
> >>   Spelling/grammar/logic mistakes, please let me know. :)
> >
> >Many people find your tutorials.
> >Because, well, you know ;-)
> >
> >So my guess is, people will find this tutorial soon,
> >when they search for DRBD and resize,
> >right next to the DRBD User's Guide.
> >
> >We will update that one, if we have not already.
> >
> >But it won't hurt to add a warning note to your tutorial
> >about resizing as well:
> >I seriously broke certain scenarios of resize,
> >leading to data loss/corruption.
> >
> >I paste the commit that fixed it below.
> >The issue is with *offline* resize with *internal* meta data,
> >while intending to keep the data in place.
> >
> >For this scenario,
> >drbdmeta up to and including 8.4.2 is ok afaik,
> >drbdmeta in 8.4.3 and 8.4.4 is broken,
> >drbdmeta from drbd-utils 8.9.0 and above is ok again.
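
(For reference: drbdmeta ships with the userland package, i.e. with
drbd proper up to 8.4.x and with drbd-utils from 8.9.0 on. A quick way
to check what you are running:

    drbdadm --version    # userland (drbdadm/drbdmeta) version
    cat /proc/drbd       # kernel module version, once the module is loaded

The exact output format varies a bit between versions.)
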
> >
> >(No comment on your tutorial itself,
> >I did not really read through it yet.)
> >
> >commit 4413afe8ed9819d45e7575c635e062ce2202cbbe
> >Author: Lars Ellenberg <lars.ellenberg at linbit.com>
> >Date:   Fri Jun 6 18:11:55 2014 +0200
> >
> >     drbdmeta: fix data corruption during offline resize
> >
> >     Regression introduced in 8.4.3, still present in 8.4.4.
> >
> >     When offline resizing internal meta data, drbdmeta forgot to
> >     properly re-initialize the new meta data offsets in time,
> >     and would move the old meta data into the existing data area,
> >     starting with offset 0 instead, thereby corrupting the first part
> >     of the data, up to the size of the old bitmap area.
> >     (embedded disk image partition table, file system super block,
> >     top level directory, all gone!)
> 
> Hiya,
> 
> I've got a pretty prominent warning to a) test first on a test
> system and b) ensure you have good backups. :)
> 
> As for the resize risk: I decided to approach this a little differently,
> and I would be very interested in your opinion of it:
> 
> In the scenario I used, I added an extra 300 GB to the underlying
> disk (/dev/sda), and I have two partitions backing two DRBD resources
> (sda5 -> drbd0 and sda6 -> drbd1). I wanted to split the new space
> evenly between the two resources, thus growing each DRBD resource by
> 150 GB.
> 
> Originally, I had planned to move sda6 down the disk by 150 GB, but
> parted's 'move' command is deprecated in EL6 and gone in EL7. Even
> then, the move command refused to move sda6 because it didn't have a
> recognized FS on it.
> 
> So instead, I decided to migrate services to one node and withdraw
> the other, stopping DRBD entirely on that withdrawn node (the first
> one to be resized). I wiped the MD, deleted the partitions, recreated
> the partitions with the new geometry and rebooted.
> 
> When back up, I zeroed out the start of the new partitions and then
> ran 'create-md' on the new partitions. This would, as I understand
> it, create the new MD at the new end position on each node, correct?
> I then attached -> invalidated (paranoid, I am) -> connected and
> waited for the two resources to do a full resync. Once synced, I
> migrated services over and repeated the process on the second node.
> 
> Once the second node was done, a quick 'pvresize' was enough to
> recognize the new space.
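
Spelled out, per node, that is roughly the following. This is a sketch
only: I am assuming resource names r0/r1, that /dev/drbd0 and /dev/drbd1
are your clustered LVM PVs, and that the cluster is not managing DRBD on
the node you are working on. Double check device names before wiping
anything.

    drbdadm down r0                   # likewise r1; DRBD is now stopped on this node
    drbdadm wipe-md r0                # discard the old internal meta data
    # delete sda5/sda6 and recreate them with the new geometry (parted/fdisk), reboot
    dd if=/dev/zero of=/dev/sda5 bs=1M count=128   # likewise sda6; clear stale signatures, size arbitrary
    drbdadm create-md r0              # new meta data at the new end of the partition
    drbdadm attach r0                 # or simply 'drbdadm up r0', depending on version
    drbdadm invalidate r0             # force a full resync from the peer
    drbdadm connect r0
    # wait for the resync to finish, migrate services to this node, repeat on the peer;
    # once both nodes are done:
    pvresize /dev/drbd0               # and /dev/drbd1
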
> 
> This approach, if I am correct, avoids the bug you mentioned because
> I didn't use DRBD's resize tool. If I am correct, the main downside
> to my approach is the time needed for a full resync,

Actually, *two* full syncs for every DRBD minor.

> and the understanding that there is no redundancy until the resyncs
> complete.


> Comments?

LVM below DRBD makes things much more convenient in this case.
You don't have to move data at all.
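
For the archives, with an LV below DRBD the whole grow is roughly this
(a sketch; assuming an LV "r0" in VG "vg0" backs resource r0 with
internal meta data on both nodes, and /dev/drbd0 is the PV on top):

    lvextend -L +150G /dev/vg0/r0    # on both nodes
    drbdadm resize r0                # on one node, while connected; the new area gets synced
    pvresize /dev/drbd0              # then grow whatever sits on top of the DRBD device
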

If you need to move data below DRBD, and cannot do that while DRBD is
online, you are degraded for that time in any case.

Parted move: you would not need that anyways.
ddrescue will do the job just fine ;-)
You just have to carefully calculate the offsets and lengths of the
partitions before and after, operate on the whole device, and, for
overlapping source and target regions, do it iteratively in the right
direction with the right chunk size...
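
Purely for illustration, the kind of arithmetic that means (all the
concrete numbers below are made up):

    SS=$(blockdev --getss /dev/sda)                  # logical sector size, usually 512
    OLD_START=206848                                 # first sector of the old sda6 (made up)
    LEN=629145600                                    # length of sda6 in sectors    (made up)
    NEW_START=$(( OLD_START + 150 * 1024 * 1024 * 1024 / SS ))   # shifted 150 GiB further down
    echo "copy $(( LEN * SS )) bytes from byte $(( OLD_START * SS )) to byte $(( NEW_START * SS ))"
    # If the old and new byte ranges overlap, copy chunk-wise starting from the far
    # end, or you overwrite parts of the source before you have read them.
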

Anyways, if you need to move all the data, your procedure may even be
faster, as it reads from one IO subsystem and writes to the other (via
the network), instead of reading and writing from/to the same one.

Heavily depends on network and IO subsystems, of course,
and where the bottleneck is.

-- 
: Lars Ellenberg
: LINBIT | Your Way to High Availability
: DRBD/HA support and consulting http://www.linbit.com


