[DRBD-user] DRBD over the Internet

Lars Ellenberg Lars.Ellenberg at linbit.com
Wed Jun 28 11:05:26 CEST 2006

/ 2006-06-28 09:38:57 +0100
\ Mark Olliver:
> I currently have DRBD set up with one server in a colo in Ireland
> and the other in a colo in the UK. They are running over a VPN which
> restricts the throughput to around 1-2 Mbit/s due to hardware (I
> might swap this).
> They are both set to Protocol A (although C works too).
> What I would like to know is: what are the best settings for
> sndbuf-size, timeouts, etc.? The reason I ask is that usually
> very little data at a time is written to these machines,
> as they are mostly read from. So in those cases I could get away
> with Protocol C. However, I have switched to A, as sometimes I have
> to copy data onto the disks for tests or upgrades (about once
> or twice a week). When I copy the data this could be up to 8 GB's
> worth. My issue is that this naturally takes a long time to sync,
> but I don't want it to slow the primary server down, as other people
> still need to read/write their little amounts without too much
> effect.
> When this is happening I could just detach the secondary device
> from DRBD and then reconnect it afterwards; this does get
> around the blocking, but I was wondering if there is a better
> way.

during resync after a disconnect, the sync target is inconsistent.
that means that during the resync of those 8 GByte over the 1 MBit/s
link (which takes about 18 hours), you won't be able to switch to
primary on the sync-target site, and probably won't even be able to
mount it after forcing drbd up and fsck'ing the device. so you are in
degraded mode for a day.
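the 18-hour figure follows directly from the numbers; a quick
back-of-the-envelope check (taking 8 GB as 8 * 10^9 bytes and
ignoring protocol overhead; with 8 GiB it comes out nearer 19 hours):

```python
# rough resync-time estimate: data volume / link throughput
data_bytes = 8 * 10**9               # ~8 GB to resync
link_bytes_per_sec = 1_000_000 / 8   # 1 Mbit/s expressed in bytes/s
hours = data_bytes / link_bytes_per_sec / 3600
print(round(hours, 1))               # -> 17.8, i.e. roughly 18 hours
```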

drbd does not do "asynchronous" or "time-shifted/delayed" synchronization,
which would make it possible to slowly, but still consistently, apply
the modifications on the "master" site to the "backup" site, with less
impact on the throughput of the "master" site during bursts of changes,
as long as the medium-term average rate of changes stays below the
replication link bandwidth.
we might implement such functionality in our commercial variant, though.
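to illustrate what "delayed but consistent" replication would mean
(this is a conceptual sketch only, NOT drbd code -- as said above,
drbd does not implement it): writes are logged in order on the master
and drained to the backup as the link permits, so the backup always
holds a consistent, if older, point-in-time image.

```python
from collections import deque

class DelayedReplicator:
    """Conceptual sketch only -- not DRBD code. Writes are queued in
    submission order and applied to the backup as bandwidth permits,
    so the backup is always a consistent (if stale) image."""

    def __init__(self):
        self.queue = deque()   # ordered log of pending (offset, data) writes
        self.backup = {}       # stand-in for the backup device

    def write(self, offset, data):
        # the master returns immediately; replication lags behind
        self.queue.append((offset, data))

    def drain(self, n=1):
        # apply up to n queued writes, strictly in original order,
        # preserving write ordering (and thus consistency) on the backup
        for _ in range(min(n, len(self.queue))):
            offset, data = self.queue.popleft()
            self.backup[offset] = data

r = DelayedReplicator()
r.write(0, b"a")
r.write(4, b"b")
r.drain(1)
print(r.backup)   # -> {0: b'a'}  (older, but a consistent state)
```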

there is always a trade-off: obviously, in the case of "master site down"
you'd lose a considerable amount of data/transactions/changes/writes...
but you'd have at least a consistent data set on the remote site at all
times, even if it is "slightly" out of date during bursts of changes.

so even with protocol "A" you only decrease the initial latency for a
few medium-sized writes; you cannot increase the drbd write bandwidth.
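as for the settings asked about: a larger sndbuf-size lets more
unacknowledged protocol-A data sit in the TCP send buffer, which is
what absorbs those first few bursty writes. a minimal sketch of the
relevant net section (the values here are illustrative, not tuned
recommendations; check the drbd.conf man page of your version):

```
resource r0 {
  protocol A;              # local write + handed to send buffer = done
  net {
    sndbuf-size 512k;      # illustrative: how much in-flight data the
                           # send buffer may hold before writes block
    timeout      60;       # unit is 0.1 s, i.e. 6 seconds
    ko-count      4;       # give up on the peer after 4 missed timeouts
  }
}
```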

write bandwidth on a connected drbd is the minimum of local io bandwidth
and network bandwidth, so with your 1 MBit link you will get a write rate
of roughly 125-150 KByte/sec (that's about 1x CD-ROM speed, iirc)...
during resync this will be even less, since some of the network
bandwidth is used up by the syncer.
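that min() relationship is easy to see in numbers (the disk figure
below is made up for illustration; the link figure is the 1 Mbit/s
from above):

```python
def connected_write_rate(local_io_bps, network_bps):
    """Sustained write rate of a connected DRBD: bounded by the
    slower of local disk I/O and the replication link."""
    return min(local_io_bps, network_bps)

link = 1_000_000 // 8        # 1 Mbit/s VPN link = 125 000 bytes/s
disk = 40 * 1024 * 1024      # assume a disk doing ~40 MiB/s locally
print(connected_write_rate(disk, link))   # -> 125000, the link dominates
```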

since you seem to be happy with (well, can work with) that write rate,
probably your best bet is indeed to disconnect, deploy your 8 GB update,
maybe take a snapshot/backup on the remote site (so you have some
consistent state to return to when disaster strikes), reconnect,
and live with a degraded cluster for a day.

: Lars Ellenberg                                  Tel +43-1-8178292-0  :
: LINBIT Information Technologies GmbH            Fax +43-1-8178292-82 :
: Schoenbrunner Str. 244, A-1120 Vienna/Europe   http://www.linbit.com :
please use the "List-Reply" function of your email client.
