[DRBD-user] Drbd hang on write

Mon Jul 31 17:12:29 CEST 2006

/ 2006-07-31 10:20:06 -0400
\ Claude Pelletier:
> Hi All,
> 
> 
> We had a few discussions about this issue.
> 
> After all the exchanges on the subject,
> I think I can say I understand how drbd works on a high latency network.

this time, it is not the latency,
but the bandwidth that is your problem.

> Just to make sure I really understand.
> 
> So since I'm working with a high latency network
> I have decided to use protocol A
> 
> Now this protocol has two requirements:
> 1) the data has to be written to the primary machine's disk
> 2) the data must be written to the tcp/ip buffer.
> 
> So the problem I get is that when I copy a large file (350MB and more)
> to the primary machine, the drbd partition seems to stop accepting
> writes on its primary partition until the network starts emptying the
> tcp/ip buffer. The 10mb line just seems to be too slow for such a large
> file copy, so drbd momentarily stops the writing on the primary
> partition.
> 
> So with this said my question is :
> 
> I went on to the drbd web page from linbit to read about DRBD+
> 
> I was wondering whether I would get some improvement if I worked with
> this version (instead of the free version).
> I see on the webpage a few features that are optimized to get better
> performance.

yes, DRBD+ does perform better in various areas.
but your problem is the bandwidth of the 10MBit line.
using DRBD+ won't make your 10MBit line something else.
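
to put a rough number on it, here is a back-of-the-envelope sketch.
it assumes decimal megabytes and a link that delivers its full nominal
rate with no protocol overhead, so the real figure will be worse.
the function name is only for illustration.

```python
def transfer_seconds(size_mb: float, link_mbit: float) -> float:
    """Best-case seconds to push size_mb megabytes over a link_mbit Mbit/s link.

    Assumption: 1 MB = 8 Mbit, no TCP/IP or DRBD overhead, link fully available.
    """
    return size_mb * 8 / link_mbit


if __name__ == "__main__":
    # the 350MB file from the question over the 10MBit line
    print(transfer_seconds(350, 10))  # 280.0 seconds, i.e. well over four minutes
```

so even in the best case, a single 350MB copy keeps the link busy for
several minutes, whatever the replication software does.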

there is really very little any software can do here.  even if we
could have a cache of "350MB and more" on the primary side, either
in ram or in a temporary on-disk buffer,
[ yes, someday we'll do that, but more to be able to keep the sync
  target consistent during sync, not to make a 10MBit line appear
  as something else ]
that cache will eventually become full, too: it can only be
depleted with the bandwidth of the replication link.
and in case of a primary failure, the secondary would have no idea
about all the changes that only went into that cache, in this case
several hundred MB worth of data. this does not make sense.
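
a minimal sketch of that argument, not of anything drbd actually
implements: a hypothetical cache fills at the local write rate and
drains at the link rate. the 375MB cache size and the 20MB/s write rate
are made-up illustration values, and the function name is an assumption.

```python
from typing import Optional


def seconds_until_full(cache_mb: float, write_mb_s: float,
                       link_mbit: float) -> Optional[float]:
    """Seconds until a write-behind cache of cache_mb megabytes is full.

    The cache fills at write_mb_s (MB/s) and drains at the link rate
    (link_mbit Mbit/s, converted to MB/s). Returns None if the link keeps up
    and the cache never fills.
    """
    drain_mb_s = link_mbit / 8           # 10 Mbit/s -> 1.25 MB/s
    net_fill_mb_s = write_mb_s - drain_mb_s
    if net_fill_mb_s <= 0:
        return None                      # link is fast enough, nothing piles up
    return cache_mb / net_fill_mb_s


if __name__ == "__main__":
    # hypothetical numbers: 375MB cache, 20MB/s sustained writes, 10MBit link
    print(seconds_until_full(375, 20, 10))  # 20.0
    # a write rate below the link rate never fills the cache
    print(seconds_until_full(375, 1, 10))   # None
```

with these numbers the cache only buys about twenty seconds before
writes stall again, and whatever sits in it at that moment is exactly
the data the secondary would be missing after a primary failure.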

either you get down your average write rate to something more in the
range the replication link can handle, or you get more bandwidth for the
replication link, or you don't use drbd, but e.g. nightly lvm snapshots
and rsync to get a consistent though out-of-date "image" of the data to
your secondary site.

-- 
: Lars Ellenberg                                  Tel +43-1-8178292-0  :
: LINBIT Information Technologies GmbH            Fax +43-1-8178292-82 :
: Schoenbrunner Str. 244, A-1120 Vienna/Europe   http://www.linbit.com :
__
please use the "List-Reply" function of your email client.