Note: "permalinks" may not be as permanent as we would like,
direct links of old sources may well be a few messages off.
Hi,

we have been using DRBD very successfully for some time now, mirroring several TB of data. (At this point let me thank you for this really great product :)). We now want to establish disaster redundancy for some of that data by sharing it between two different datacenters, preferably using DRBD.

I am currently testing a cluster whose nodes are separated by a few km (ping times around 1 ms), running drbd 8.0.6 with protocol A/B/C, successfully. To test larger distances with higher latencies, we ran drbd through TCP proxies on a third machine that forwards traffic between the real drbd hosts (for example ssh forwarders or other userspace relay utilities).

With protocol C this usually works very well, but as latency rises, write access gets slower and slower (up to a factor of 25 with a 10 ms delay per TCP packet through the proxy). To overcome this we tried protocol A, since having somewhat older data matters less to us than being synchronous. On a low-latency network (LAN), protocol A works without problems. On higher-latency setups the initial exchange (like the bitmap exchange) usually works, although establishing the connection seems to be harder; but while data is being transferred we get a lot of 'BrokenPipe' states (often after only a few bytes). The latter is usually only fixed by disconnecting drbd on one host, which starts the cycle all over again.

Sometimes we get

  [6214.280107] drbd0: BUG! md_sync_timer expired! Worker calls drbd_md_sync().

while usually drbd just constantly tries to reconnect:

  [6220.255205] drbd0: sock was shut down by peer
  [6220.255217] drbd0: conn( WFReportParams -> BrokenPipe )
  [6220.255230] drbd0: short read expecting header on sock: r=0
  [6220.255236] drbd0: My msock connect got accepted onto peer's sock!
  [6226.251812] drbd0: tl_clear()
  [6226.251818] drbd0: Connection closed
  [6226.251829] drbd0: conn( BrokenPipe -> Unconnected )
  [6226.251841] drbd0: conn( Unconnected -> WFConnection )
  [6226.295766] drbd0: conn( WFConnection -> WFReportParams )
  [6226.296156] drbd0: sock was shut down by peer
  [6226.296164] drbd0: conn( WFReportParams -> BrokenPipe )
  [6226.296172] drbd0: short read expecting header on sock: r=0
  [6226.296178] drbd0: My msock connect got accepted onto peer's sock!
  [6232.288608] drbd0: tl_clear()

  (( repeating until drbdadm disconnect r0 ))

I was wondering whether someone has experience running DRBD over long distances or high-latency links, with ping times up to tens of ms, possibly with protocol A? Any hints or tips for tuning such setups are welcome. For reference, I have appended a rough sketch of our proxy test setup and of the resource configuration below.

Jens

--
jens.beyer at 1und1.de
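The proxy setup looks roughly like the following. Host names, port numbers and the interface are placeholders, and the netem delay is just one way of modelling the extra latency of a long-distance link, not part of the real path:

  # on the relay host: userspace forwarder between the two drbd nodes
  # (node-a's drbd.conf points at relay:7788 instead of node-b:7788)
  socat TCP-LISTEN:7788,fork,reuseaddr TCP:node-b.example.com:7788 &

  # add 10 ms of artificial delay on the relay's outgoing interface
  tc qdisc add dev eth0 root netem delay 10ms

  # alternatively, an ssh forwarder started on node-a through the relay
  ssh -f -N -L 7788:node-b.example.com:7788 user@relay.example.com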
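And this is essentially the resource configuration we are testing with; device names, addresses and all tuning values are illustrative, not recommendations:

  resource r0 {
    protocol A;            # asynchronous: a write completes once it is
                           # on local disk and in the TCP send buffer

    net {
      sndbuf-size  512k;   # larger send buffer for the high-latency path
      timeout      60;     # 6 seconds (unit is 0.1 s)
      connect-int  10;     # seconds between connect attempts
      ping-int     10;     # seconds between keep-alive pings
      ko-count     0;      # never kick out a peer just for being slow
    }

    syncer {
      rate 5M;             # cap resync bandwidth below the link capacity
    }

    on node-a {
      device    /dev/drbd0;
      disk      /dev/sdb1;
      address   10.0.0.1:7788;
      meta-disk internal;
    }
    on node-b {
      device    /dev/drbd0;
      disk      /dev/sdb1;
      address   10.0.0.2:7788;
      meta-disk internal;
    }
  }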