[DRBD-user] Drbd hang on write

Thu Jul 13 10:23:18 CEST 2006

/ 2006-07-12 14:39:33 -0700
\ Tim Johnson:
> Hi Lars and all,
> 
> Back to looking at this issue... :)
> 
> Just to try and clarify a few points..  As Claude mentions below, there
> is no system degradation except for the drbd partition on which the
> write is being performed (reads, writes on other partitions, etc.).  The
> file system lock (even ls hangs) only occurs for large files on the
> order of 500 MB.  There is apparently no problem for smaller files.
> They were previously on a LAN environment with a network bandwidth of
> 100 MB/s and had no problem with the large files, but did after moving
> to a WAN with a bandwidth of 10 MB/s.  They are also using protocol A.
> 
> We did find this interesting performance related link at
> http://www.drbd.org/performance.html .  This was tested with version
> 0.6.6 and so may no longer be relevant, but it is interesting to note
> that the author did not get the expected performance with protocol A or
> B, and said they were "unusable" (which was fine for what was needed
> then).
> 
> Further specific points addressed embedded below...
> 
> > 
> > 
> > I have 2 IBM I series servers using Linux
> > 
> > The drbd 0.7.19 version is installed on them.
> > The 2 servers are running on 10mb lines in a wan configuration
> > 
> > The thing I would like to do is explain in detail what's happening
> > and the way I see it.
> > 
> > 1 - The all_extents, protocol and snd_buffer parameters have been changed
> > 2 - Take drbd down and back up on both sides (to make sure the changes
> > have taken effect)
> > 3 - Start a copy of a 300MB file on the partition /dev/drbd2
> > 4 - The copy goes all the way
> > 5 - About 30 sec to 1 minute after the copy finishes we can't
> > access the /dev/drbd2 partition (through win samba or just doing an ls
> > of the partition)
> >      all the other drbd partitions and the system itself show no
> > degradation.
> > 6 - We see in cat /proc/drbd the bytes of this partition going
> > from primary to secondary
> > 7 - When the copy from primary to secondary is done, the partition
> > /dev/drbd2 becomes available again and performance is
> >      back to normal on this partition (no other part of the linux
> > system is affected by this)
> > 
> > 
> > So what I see in all this.
> > 
> > It looks like drbd doesn't really do its copy from primary to secondary
> > in the background.
> 
> Lars wrote: there is no "background" or "foreground".
> drbd is synchronous.
> 
> I think there may be a misunderstanding here.  I believe Claude is
> trying to say here that our understanding is that using Protocol A, drbd
> just sends out the data and does not wait for an ACK from the other
> side, but just gets on with its work after the data has been flushed to
> disk (and the local tcp send buffer, which is, I suspect, at the
> root of this problem).  In this sense, drbd is, to my understanding,
> asynchronous.  In fact, when the backup node is not running drbd, there
> is no problem with file system access, so the problem does not appear to
> be directly related with disk I/O speed.  Perhaps there is something I
> am missing?

so once the tcp send buffer is full, it will only be drained at the
rate of the replication link bandwidth. there you are.

> > My impression was that drbd would complete its copy in the background
> > without slowing down the access to the fs on the primary machine.
> > I really hope this is not a concept issue.
> 
> Lars wrote: maybe a misconception on your side.
> obviously drbd cannot write faster than either of your io subsystems,
> nor the replication network bandwidth.
> 
> 
> Lars wrote: what write rate do you observe? [*]
> what is your raw network bandwidth?
> 
> [*] write rate: _including_ fsync. "time cp hugefile somewhere" does not
> count, since cp does not do fsync (afaik). there are plenty of benchmark
> tools out there, as a rough estimate something like "sync;sync; time dd
> if=/dev/zero bs=1M  count=1024 of=blob ; time sync; time sync;"
> could do...
> 
> ---Still waiting for this...  Tim
> 
> > If I copy, let's say, a 500MB file, the same thing happens, except it happens
> > even before the copy to the primary finishes, and it can even abort the
> > copy.
> 
> Lars wrote: well, smaller than that might fit in your local cache, and
> sometime later the file system decides to flush it.  larger than this,
> and it needs to flush it to disk even during operation.
> 
> > I'm really surprised this thing didn't pop up before in the drbd forum.
> > To me it's basic.
> 
> Lars wrote: tell me your network bandwidth, disk throughput and observed
> write rates with connected drbd, and we'll see what is basic.
> maybe we can tune something. alas without knowing the hard limits and
> the currently achieved figures, this is hard to tell.
> 
> Still waiting for this...
> 
> > I really hope there is a parameter somewhere that would fix this.
> > 
> > The way I see it, it's really a drbd problem, because the system itself
> > still responds very well.
> > 
> > It's really the partition that hangs until it completes a copy from
> > primary to secondary.
> > 
> > All the other partitions under drbd don't hang.
> > 
> > To make all partitions hang I would just copy 3 files to each partition
> > and I would be able to hang all the drbd partitions at once.
> 
> Lars wrote: or change to a 2.4 kernel :)
> 
> Hope this helps clarify the thinking here..just for reference, the
> configuration file looks like:

> I suspect the sndbuf-size should be increased, but web sites I've seen
> have had warnings about taking it to 1MB.   

right. and still, it would only "speed up" the first MB that fits into
the buffer, not the GB of data of the large file that comes after it.
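to illustrate why a bigger send buffer barely matters for a large file,
here is a toy model in python. it is only a sketch: the function name is
made up for illustration, it assumes a net link rate of 1 MB/s, and it
ignores the local disk and page cache entirely.

```python
def writer_blocked_seconds(file_mb, sndbuf_mb, link_mb_per_s):
    """Seconds the writer is throttled by the link in a toy model.

    Assumption: the send buffer absorbs the first sndbuf_mb at once,
    and everything after that is accepted only as fast as the
    replication link drains the buffer.
    """
    overflow_mb = max(0.0, file_mb - sndbuf_mb)
    return overflow_mb / link_mb_per_s


if __name__ == "__main__":
    # Compare a small and a 1 MB send buffer for a 500 MB file.
    for sndbuf_mb in (0.125, 1.0):
        blocked = writer_blocked_seconds(500, sndbuf_mb, 1.0)
        print(f"sndbuf {sndbuf_mb} MB: writer throttled for {blocked} s")
```

in this model, going from a 128 KB to a 1 MB buffer saves less than one
second out of roughly 500.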

> Are there any further insights you or someone might provide given the
> info that we've got?

as long as you don't provide the data point we are still waiting for,
I cannot say whether there is room for tuning or not.

with a 10 Mega_Bit_ per second replication link,
you get about _one_ mega _byte_ per second net bandwidth.
so copying a 500 MB file will take ~500 seconds.
this is more than eight minutes.
this is expected.
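
the bit/byte arithmetic for a 10 Mbit/s link and a 500 MB file can be
sketched in python as follows. the function name is hypothetical, and the
0.8 efficiency factor is an assumption chosen to match "about one
megabyte per second", not a measured value.

```python
def transfer_seconds(size_mb, link_mbit_per_s, efficiency=0.8):
    """Estimate the seconds needed to push size_mb megabytes over the link.

    efficiency is an assumed fraction of the raw bandwidth left after
    protocol overhead; 0.8 turns 10 Mbit/s into about 1 MB/s.
    """
    # 8 bits per byte, then subtract the assumed overhead.
    net_mb_per_s = link_mbit_per_s / 8 * efficiency
    return size_mb / net_mb_per_s


if __name__ == "__main__":
    secs = transfer_seconds(500, 10)
    print(f"{secs:.0f} s, about {secs / 60:.1f} min")
```

the same file over a 100 Mbit/s LAN would need a tenth of that time,
which fits the report that the problem only showed up after the move to
the WAN.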

drbd has not much influence on whether read requests will be
served in time. you could play with the io scheduler of the lower
devices (grep .  /sys/block/*/queue/scheduler). if it happens to
be "anticipatory", this will not do you any good in this
situation, you'd be better off setting it to "deadline" (echo
deadline > /sys/...).  this won't speed up the writes, but may
improve the latency of "concurrent" reads.
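
for checking many devices at once, a small python sketch like the one
below may help. it relies on the kernel marking the active scheduler with
square brackets in that sysfs file; the helper names are made up for
illustration, and it only reads the setting, it does not change it.

```python
import glob


def active_scheduler(line):
    """Return the scheduler shown in [brackets], or None if none is marked."""
    for word in line.split():
        if word.startswith("[") and word.endswith("]"):
            return word[1:-1]
    return None


def report(pattern="/sys/block/*/queue/scheduler"):
    """Print the active io scheduler for every block device found."""
    for path in sorted(glob.glob(pattern)):
        try:
            with open(path) as f:
                print(path, active_scheduler(f.read()))
        except OSError as err:
            # Device may have vanished or be unreadable; report and go on.
            print(path, "unreadable:", err)


if __name__ == "__main__":
    report()
```

any device reported as "anticipatory" is a candidate for the switch to
"deadline" described above.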

I really think this whole thing is not a problem of drbd,
but a problem with wrong expectations.

-- 
: Lars Ellenberg                                  Tel +43-1-8178292-0  :
: LINBIT Information Technologies GmbH            Fax +43-1-8178292-82 :
: Schoenbrunner Str. 244, A-1120 Vienna/Europe   http://www.linbit.com :