[DRBD-user] Resyncing over a slow link

Wed Nov 18 14:25:00 CET 2009

On Tue, Nov 17, 2009 at 08:27:37AM -0800, Brian Marshall wrote:
> Hello,
>
> Please pardon any mistakes, as I am relatively new to DRBD.
> I've recently started to use DRBD to synchronize a 2TB volume over a
> low-speed (10-14 megabit) wireless link to an offsite location. I
> initialized the original volume using ocfs2 and drbd and copied to the
> second volume using dd, so the data sets should have started out identical.
> I installed the disks at the remote location and brought up the array in
> active-active synchronous mode. Writes on either volume would show up
> instantly and everything seemed to be alright, but I wanted to verify
> that the data sets were actually in sync, so I did an invalidate-remote

you should say _verify_ if you mean verify.
don't say invalidate, it really does _in_validate,
i.e. assumes everything to be out of sync.

> from the local node and watched in horror as it tried to re-sync the
> entire array. The operation was set to take 2-3 weeks, and every time one
> or the other node went down the operation started again from 0%!

2 TiByte at 10 MBit/s:
(2<<40) Byte / (((10/8) << 20) Byte/s)

= 1.6 * (1<<20) seconds, about 19.4 days ;)

right.
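
For anyone who wants to redo the estimate for their own link, here is a minimal sketch in Python. The function name and constants are made up for illustration, it uses the same binary megabit as the shorthand above, and it ignores protocol overhead and concurrent application writes.

```python
def full_sync_seconds(volume_bytes: int, link_bits_per_second: float) -> float:
    """Seconds needed to push volume_bytes through the link, ignoring overhead."""
    return volume_bytes * 8 / link_bits_per_second


TIB = 1 << 40   # bytes in one TiByte
MBIT = 1 << 20  # bits in one (binary) MBit, as in the shorthand above

seconds = full_sync_seconds(2 * TIB, 10 * MBIT)
days = seconds / 86400  # 86400 seconds per day

print(f"{seconds:.0f} s = {days:.1f} days")
```

Putting 14 * MBIT in for the upper end of the quoted link speed gives the lower end of the "2-3 weeks".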

don't let the "starting from 0%" worry you: the syncer stats in
/proc/drbd will _always_ count from 0% to 100% of the currently
running synchronisation. blocks that were already synced stay in sync,
so the total amount to be synced in each run will slowly decrease.
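
A toy model of that behaviour, as a sketch only: it is not how DRBD computes its stats, and the function name and block counts are invented. Each interrupted run reports progress relative to what was still out of sync when that run started.

```python
def simulate_runs(out_of_sync_blocks: int, synced_per_run: list[int]) -> list[tuple[int, float]]:
    """Return (blocks to sync in this run, percent reached before interruption)."""
    history = []
    remaining = out_of_sync_blocks
    for synced in synced_per_run:
        if remaining == 0:
            break  # nothing left, no further run is started
        synced = min(synced, remaining)
        # the percentage is relative to this run's total, so it restarts at 0%
        history.append((remaining, 100.0 * synced / remaining))
        remaining -= synced
    return history


# hypothetical: 1000 blocks out of sync, link drops after 300 and again after 300
runs = simulate_runs(1000, [300, 300, 400])
for total, percent in runs:
    print(f"run total: {total:4d} blocks, reached {percent:5.1f}%")
```

The per-run totals shrink (1000, 700, 400) even though every run starts its counter at 0%.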

> Reading the mailing lists, it appeared that there was an assumption that
> the disk was always the performance bottleneck, and the case of slow
> links was not really considered by the developers.

Come again?

Developers did not consider the case where you want
a write rate of 1GBit over a 10 Mbit connection?

Of course not.

Developers did not consider the case where someone voluntarily
asks DRBD to perform a full sync of 2TiB over a 10Mbit link,
and then complains that this takes 20 days?

Of course not.

> Seeing as I contribute to reasonably large OSS projects, I am interested
> in trying to add some functionality to DRBD to help people out in
> situations similar to my own (as well as help with my own dilemma).
> My question is this: is the code functionally separated to the point
> that it would be possible to add a second code path for the
> synchronization algorithm? I would be interested in adding support for
> synchronizing devices using a binary diffing library like librsync to
> trade CPU cycles for bandwidth.

Maybe you should read the User's Guide
before complaining about the lack of a Developer's Guide.

There is checksum based resync.

There is drbd-proxy, which can do compression.

You may also want to read about "truck based replication"
in the DRBD User's Guide.
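
To illustrate the general idea behind a checksum based resync (trading CPU cycles for bandwidth), here is a conceptual sketch in Python. It is not DRBD's implementation: the block size, the choice of SHA-1 and all function names are assumptions. Only the digests are compared, and only blocks whose digests differ would need to cross the link.

```python
import hashlib

BLOCK_SIZE = 4096  # assumed block size, for illustration only


def block_digests(data: bytes, block_size: int = BLOCK_SIZE) -> list[bytes]:
    """One SHA-1 digest per block of the device image."""
    return [hashlib.sha1(data[i:i + block_size]).digest()
            for i in range(0, len(data), block_size)]


def blocks_to_send(source: bytes, target: bytes, block_size: int = BLOCK_SIZE) -> list[int]:
    """Indices of source blocks whose digest differs from the target's."""
    src = block_digests(source, block_size)
    dst = block_digests(target, block_size)
    differing = [i for i, (a, b) in enumerate(zip(src, dst)) if a != b]
    # blocks the target does not have at all must be sent as well
    differing.extend(range(len(dst), len(src)))
    return differing


# two four-block images that differ in a single byte inside block 1
source = bytes(4 * BLOCK_SIZE)
target = bytearray(source)
target[BLOCK_SIZE + 10] = 0xFF

changed = blocks_to_send(source, bytes(target))
print(changed)
```

In this sketch a 20 byte digest per 4096 byte block is exchanged instead of the block itself, so on mostly identical data sets the link carries a small fraction of the full volume.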

> Please let me know if this is theoretically possible given the current
> architecture of DRBD, and if so, any starting tips you might be able to
> think of. I couldn't find a developers guide, so I'll be jumping into
> this cold.

> Also let me know if I'm being terribly naive in thinking that
> there is some way I'll be able to implement something workable in less
> than the 3 weeks it will take for the array to sync naturally.

Absolutely ;)

-- 
: Lars Ellenberg
: LINBIT | Your Way to High Availability
: DRBD/HA support and consulting http://www.linbit.com

DRBD® and LINBIT® are registered trademarks of LINBIT, Austria.
__
please don't Cc me, but send to list   --   I'm subscribed