[DRBD-user] DRBD over long distances

Philipp Reisner philipp.reisner at linbit.com
Fri Oct 12 14:26:09 CEST 2007


On Friday 12 October 2007 12:13:14 Jens Beyer wrote:
> Hi,
>
> we have been using DRBD very successfully for some time now,
> mirroring several TB of data. (At this point let me thank you
> for this really great product :)).
>
> Now we seek to establish disaster redundancy for some of our data
> by mirroring it between two different datacenters, preferably using
> DRBD. I am currently testing a cluster with nodes separated by a
> few km (ping times around 1 ms), using protocol A/B/C with drbd
> 8.0.6, successfully.
>
> To simulate larger distances with higher latencies, we tried running
> drbd through TCP proxies on a third machine that forwards between the
> real drbd hosts (for example ssh forwarders or other userspace relay
> utilities). With protocol C this usually works very well, but as
> latency rises, write access gets slower and slower (up to a factor
> of 25 with a 10 ms delay for each TCP packet through the proxy).
>
> To overcome this we tried protocol A, as it is more important to us
> to have some data than to be fully synchronous. On a low-latency
> network (LAN), protocol A works without problems. On higher-latency
> setups the initial connection setup (such as the bitmap exchange)
> usually works, although connecting seems to be harder, but we get a
> lot of 'BrokenPipe' states while data is being transferred (often
> after just a few bytes). The latter usually only gets fixed by
> disconnecting drbd on one host (starting the cycle all over again).
>
> Sometimes we get
> [6214.280107] drbd0: BUG! md_sync_timer expired! Worker calls
> drbd_md_sync().
>
> while usually drbd just keeps trying to reconnect:
> [6220.255205] drbd0: sock was shut down by peer
> [6220.255217] drbd0: conn( WFReportParams -> BrokenPipe )
> [6220.255230] drbd0: short read expecting header on sock: r=0
> [6220.255236] drbd0: My msock connect got accepted onto peer's sock!
> [6226.251812] drbd0: tl_clear()
> [6226.251818] drbd0: Connection closed
> [6226.251829] drbd0: conn( BrokenPipe -> Unconnected )
> [6226.251841] drbd0: conn( Unconnected -> WFConnection )
> [6226.295766] drbd0: conn( WFConnection -> WFReportParams )
> [6226.296156] drbd0: sock was shut down by peer
> [6226.296164] drbd0: conn( WFReportParams -> BrokenPipe )
> [6226.296172] drbd0: short read expecting header on sock: r=0
> [6226.296178] drbd0: My msock connect got accepted onto peer's sock!
> [6232.288608] drbd0: tl_clear()
> (( repeating until drbd disconnect r0 ))
>
> I was wondering if someone has experience running DRBD over long
> distances or high-latency links with ping times up to tens of ms,
> possibly with protocol A?
>
> Any hints or tips for tuning such setups are welcome.
>

Hi Jens,

Interesting...
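
A side note on the factor-25 slowdown you measured: with protocol C a
write only completes once the peer has acknowledged it, so the proxy's
one-way delay is paid twice per write. A rough back-of-the-envelope
model (the ~0.8 ms LAN baseline below is an assumed figure for
illustration, not a measurement) lands in the same ballpark:

```python
# Rough latency model for synchronous (protocol C) replication:
# a write is reported complete only after the peer acknowledges it,
# so the proxy's one-way delay is paid twice per write.

def slowdown(base_ms, one_way_delay_ms):
    """Per-write completion time with the proxy, relative to without."""
    rtt_ms = 2 * one_way_delay_ms        # data out + acknowledgement back
    return (base_ms + rtt_ms) / base_ms

# Assumed ~0.8 ms local write completion, plus 10 ms extra each way:
print(round(slowdown(0.8, 10)))  # -> 26
```

This is also why protocol A is attractive here: the acknowledgement is
taken off the write path, so the RTT no longer bounds per-write latency.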

We (in the sense of LINBIT) have just developed a DRBD-proxy tailored
especially for this purpose. We just have not told anybody about it yet.
Maybe you want to contact us in that regard...
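
For reference, the kind of userspace relay used in your tests boils
down to splicing two TCP streams together. A minimal sketch (addresses
are placeholders; this only illustrates the forwarding principle of an
ssh-style relay, not the DRBD-proxy mentioned above):

```python
# Minimal one-shot userspace TCP relay: accept a single client and
# splice its byte stream to a fixed upstream address, in both
# directions, until either side closes.
import socket
import threading

def pipe(src, dst):
    """Copy bytes from src to dst until src hits EOF, then half-close dst."""
    try:
        while True:
            data = src.recv(65536)
            if not data:
                break
            dst.sendall(data)
    finally:
        try:
            dst.shutdown(socket.SHUT_WR)   # propagate EOF downstream
        except OSError:
            pass

def relay_once(listen_addr, upstream_addr):
    """Accept one client on listen_addr and splice it to upstream_addr."""
    srv = socket.socket()
    srv.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
    srv.bind(listen_addr)
    srv.listen(1)
    client, _ = srv.accept()
    srv.close()
    upstream = socket.create_connection(upstream_addr)
    # One direction in a helper thread, the other in this thread.
    t = threading.Thread(target=pipe, args=(upstream, client))
    t.start()
    pipe(client, upstream)
    t.join()
```

Every hop like this adds its own queuing and forwarding delay, which is
exactly what drove up the write latency in your protocol C tests.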

-Phil
-- 
: Dipl-Ing Philipp Reisner                      Tel +43-1-8178292-50 :
: LINBIT Information Technologies GmbH          Fax +43-1-8178292-82 :
: Vivenotgasse 48, 1120 Vienna, Austria        http://www.linbit.com :


