Note: "permalinks" may not be as permanent as we would like,
direct links of old sources may well be a few messages off.
Hi all,

Finally looking yet again at this issue, and I think I have an update.
We managed to provision two machines in a WAN environment in our lab and
reproduced the issue by copying over a 723 MB file. Sure enough, the
partition froze. I then tried strace ls to find exactly where it was
hanging, and it turned out to be blocked in getdents64, whose second
argument is a pointer to a structure containing inode info, etc. It
looks like something put a lock on the inodes.

Since this seemed like a file system issue, I checked that, and we (and
the customer as well) were using reiserfs (best suited for large file
systems with lots of small files, as is my understanding). I also
noticed earlier reports of problems with reiserfs and drbd on the net
(http://www.gossamer-threads.com/lists/drbd/users/9331,
http://marc.theaimsgroup.com/?l=linux-ha&m=95814523220399&w=2, and a few
other misc. links). Also, not just during copying, but sometimes at
random intervals (during quiescent periods), ls would hang at the same
spot for a long time (sometimes 20 minutes or longer). However, it never
hung if drbd was not running on the backup (which is also what the
customer observed). BTW, the reiserfs version was 3.6-something.

So, I changed the file system to ext3 (yes, xfs may have been a better
option, but I'm rushing through this :) ) and repeated the tests. No
problems at all. So, it looks like the problem lies in some coupling
between what the file system is doing (rebalancing the tree, perhaps?)
and drbd mirroring to the other side. I guess at this stage, I leave it
to Philipp Reisner and Hans Reiser to fight it out :) .

Any other thoughts on this issue?

Cheers,
Tim

Tim Johnson
Senior Software Engineer
Vision Solutions, Inc.
17911 Von Karman Ave, 5th Floor
Irvine, CA 92614
UNITED STATES
Tel: +1 (949) 253-6528
Fax: +1 (949) 225-0287
Email: tjohnson at visionsolutions.com
http://www.visionsolutions.com/

-----Original Message-----
From: drbd-user-bounces at lists.linbit.com
[mailto:drbd-user-bounces at lists.linbit.com] On Behalf Of Lars Ellenberg
Sent: Thursday, July 13, 2006 1:23 AM
To: drbd-user at lists.linbit.com
Subject: Re: [DRBD-user] Drbd hang on write

/ 2006-07-12 14:39:33 -0700 \ Tim Johnson:
> Hi Lars and all,
>
> Back to looking at this issue... :)
>
> Just to try and clarify a few points. As Claude mentions below, there
> is no system degradation except for the drbd partition on which the
> write is being performed (reads and writes on other partitions, etc.,
> are unaffected). The file system lock (even ls hangs) only occurs for
> large files, on the order of 500 MB. There is apparently no problem
> for smaller files.
> They were previously in a LAN environment with a network bandwidth of
> 100 Mbit/s and had no problem with the large files, but did after
> moving to a WAN with a bandwidth of 10 Mbit/s. They are also using
> protocol A.
>
> We did find this interesting performance-related link at
> http://www.drbd.org/performance.html .
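(To make the protocol A setup just mentioned concrete: in drbd 0.7's
drbd.conf, the relevant options sit roughly as in the sketch below. The
resource name, hostnames, disks, and addresses are invented for
illustration, and sndbuf-size is shown only because it comes up later in
this thread; check the drbd.conf man page for the authoritative syntax.)

    resource r2 {
      protocol A;            # a write counts as done once it has hit the
                             # local disk and the local TCP send buffer
      net {
        sndbuf-size 512k;    # TCP send buffer; see discussion below
      }
      on alpha {             # hostnames, disks, and IPs are made up
        device    /dev/drbd2;
        disk      /dev/sda5;
        address   10.0.0.1:7791;
        meta-disk internal;
      }
      on beta {
        device    /dev/drbd2;
        disk      /dev/sda5;
        address   10.0.0.2:7791;
        meta-disk internal;
      }
    }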
> This was tested with version 0.6.6 and so may no longer be relevant,
> but it is interesting to note that the author did not get the expected
> performance with protocol A or B, and said they were "unusable" (which
> was fine for what was needed then).
>
> Further specific points are addressed inline below...
>
> > I have 2 IBM iSeries servers using Linux.
> >
> > drbd version 0.7.19 is installed on them.
> > The 2 servers are running on a 10 Mbit line in a WAN configuration.
> >
> > The thing I would like to do is explain in detail what's happening
> > and the way I see it:
> >
> > 1 - The al-extents, protocol, and sndbuf-size parameters have been
> >     changed
> > 2 - Take drbd down and back up on both sides (to make sure the
> >     changes have taken effect)
> > 3 - Start a copy of a 300 MB file onto the partition /dev/drbd2
> > 4 - The copy goes all the way through
> > 5 - About 30 sec to 1 minute after the copy finishes, we can't get
> >     access to the /dev/drbd2 partition (through Win/Samba or just
> >     doing an ls of the partition); all the other drbd partitions and
> >     the system itself show no degradation.
> > 6 - We see in cat /proc/drbd the bytes of this partition going from
> >     primary to secondary
> > 7 - When the copy from primary to secondary is done, the partition
> >     /dev/drbd2 becomes available again and performance is back to
> >     normal on this partition (no other part of the Linux system is
> >     affected by this)
> >
> > So what I see in all this:
> >
> > It looks like drbd doesn't really do its copy from primary to
> > secondary in the background.
>
> Lars wrote: there is no "background" or "foreground".
> drbd is synchronous.
>
> I think there may be a misunderstanding here. I believe Claude is
> trying to say that our understanding is that, using protocol A, drbd
> just sends out the data and does not wait for an ACK from the other
> side, but just gets on with its work after the data has been flushed
> to disk (and to the local tcp send buffer, which is, I suspect, at the
> root of this problem). In this sense, drbd is, to my understanding,
> asynchronous. In fact, when the backup node is not running drbd, there
> is no problem with file system access, so the problem does not appear
> to be directly related to disk I/O speed. Perhaps there is something I
> am missing?

so once the tcp send buffer is full, it will only be depleted at the
rate of the replication link bandwidth. there you are.

> > My impression was that drbd would complete its copy in the
> > background without slowing down access to the fs on the primary
> > machine. I really hope this is not a concept issue.
>
> Lars wrote: maybe a misconception on your side.
> obviously drbd cannot write faster than either of your io subsystems,
> nor the replication network bandwidth.
>
> Lars wrote: what write rate do you observe? [*] what is your raw
> network bandwidth?
>
> [*] write rate: _including_ fsync. "time cp hugefile somewhere" does
> not count, since cp does not do fsync (afaik). there are plenty of
> benchmark tools out there; as a rough estimate, something like
> "sync; sync; time dd if=/dev/zero bs=1M count=1024 of=blob ;
> time sync; time sync" could do...
>
> --- Still waiting for this... Tim
>
> > If I copy, let's say, a 500 MB file, the same thing happens, except
> > it happens even before the copy to the primary finishes, and it can
> > even abort the copy.
>
> Lars wrote: well, smaller than that might fit in your local cache, and
> at some point later the file system decides to flush it.
> larger than this, and it needs to flush it to disk even during
> operation.
>
> > I'm really surprised this thing didn't pop up before in the drbd
> > forum. To me it's basic.
>
> Lars wrote: tell me your network bandwidth, disk throughput, and
> observed write rates with connected drbd, and we'll see what is basic.
> maybe we can tune something. alas, without knowing the hard limits and
> the currently achieved figures, this is hard to tell.
>
> Still waiting for this...
>
> > I really hope there is a parameter somewhere that would fix this.
> >
> > The way I see it, it's really a drbd problem, because the system
> > itself still responds very well.
> >
> > It's really the partition that hangs until it completes a copy from
> > primary to secondary.
> >
> > None of the other partitions under drbd hang.
> >
> > To make all the partitions hang, I would just copy 3 files, one to
> > each partition, and I would be able to hang all the drbd partitions
> > at once.
>
> Lars wrote: or change to a 2.4 kernel :)
>
> Hope this helps clarify the thinking here. Just for reference, the
> configuration file looks like:
>
> I suspect the sndbuf-size should be increased, but web sites I've seen
> have had warnings about taking it above 1 MB.

right. and still, it would only "speed up" the first MB that fits into
the buffer, not the GB of data of the large file that comes after it.

> Are there any further insights you or someone else might provide,
> given the info that we've got?

as long as you don't provide the data point we are still waiting for, I
cannot say whether there is room for tuning or not.

with a 10 Mega_Bit_ per second replication link, you get about _one_
mega_byte_ per second of net bandwidth. so copying a 500 MB file will
take ~500 seconds. that is more than eight minutes. this is expected.

whether read requests will be served in time, drbd has not much
influence on. you could play with the io scheduler of the lower devices
(grep . /sys/block/*/queue/scheduler). if it happens to be
"anticipatory", this will not do you any good in this situation; you'd
be better off setting it to "deadline" (echo deadline > /sys/...). this
won't speed up the writes, but may improve the latency of "concurrent"
reads.

I really think this whole thing is not a problem of drbd, but a problem
of wrong expectations.

--
: Lars Ellenberg                                Tel +43-1-8178292-0  :
: LINBIT Information Technologies GmbH          Fax +43-1-8178292-82 :
: Schoenbrunner Str. 244, A-1120 Vienna/Europe  http://www.linbit.com :
__
please use the "List-Reply" function of your email client.
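(For anyone who wants to produce the write-rate number Lars keeps asking
for, his one-liner expands to something like the sketch below. The
/mnt/drbd2 mount point is a hypothetical stand-in for wherever the drbd2
device is mounted, and the test only gives the figure he wants while
drbd is connected to the peer.)

    #!/bin/sh
    # Rough write-rate estimate _including_ fsync, per Lars's suggestion:
    # flush caches first, time the 1 GB write, then time the syncs that
    # push the remaining dirty pages out to disk (and across the link).
    sync; sync
    time dd if=/dev/zero bs=1M count=1024 of=/mnt/drbd2/blob
    time sync; time sync
    rm -f /mnt/drbd2/blob   # clean up the 1 GB test file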
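(Likewise, the io scheduler change Lars suggests would look something
like the following on a 2.6 kernel. "sda" here is only a placeholder for
whatever lower device actually backs drbd2 on the box; adjust the path
to match.)

    # show the active scheduler (the one in brackets) for every device
    grep . /sys/block/*/queue/scheduler
    # switch the hypothetical lower device sda from "anticipatory" to
    # "deadline"; this won't speed up writes, but may improve read
    # latency while a large replication write is draining
    echo deadline > /sys/block/sda/queue/scheduler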