[DRBD-user] [OT] rsync issues [Was Re: Read performance?]

David Masover ninja at slaphack.com
Tue Jun 5 15:23:11 CEST 2007

Note: "permalinks" may not be as permanent as we would like,
direct links of old sources may well be a few messages off.


On Tuesday 05 June 2007 04:12:12 Lars Ellenberg wrote:
> On Mon, Jun 04, 2007 at 11:58:30AM -0500, David Masover wrote:
> > On Monday 04 June 2007 08:56:41 Lars Ellenberg wrote:
> > > you talk about DRBD + OCFS2 on a low bandwidth high latency link,
> > > and you want to improve on the "sluggish" behaviour by making
> > > drbd replication asynchronous, buffering writes for minutes.
>
> [ and here I explained why this does not make sense ... ]
>
> > Except I gave up OCFS2 ...
>
> and there you tell us that you gave up on OCFS2 anyways.
> so why did you suggest it at all.

I only mentioned it as background information. I very clearly said that I was 
not still using it.

> > dd if=/dev/zero of=/dev/drbd0 bs=1M count=1
> >
> > The above can hang as long as it takes to get across the network. And if,
> > at the same time, I'm trying to do this:
> >
> > dd if=/dev/drbd0 of=/dev/null bs=1M count=1
> >
> > Then sure, let that one hang, too.
>
> no, it should not necessarily hang.
> if you tune your io scheduler for that purpose,
> it may finish long before the writing dd.
> and it may or may not return "stale" data, obviously.

There's actually a bit of a race condition here. If I write one block (of 
whatever size DRBD replicates in) and then attempt to read that same block, 
the read could hang until the block has been synchronized to the peer.

Therefore, you could have a situation where the "reader" dd gets ahead of 
the "writer" dd, but if that is a problem for you, it would happen on 
ordinary block devices, too.
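
To make that race concrete, something like this should exercise it (the 
direct-I/O flags are just my way of keeping the page cache out of the 
picture, not anything anyone suggested here):

  dd if=/dev/zero of=/dev/drbd0 bs=1M count=1 oflag=direct &
  dd if=/dev/drbd0 of=/dev/null bs=1M count=1 iflag=direct
  wait

Whether the second dd returns old or new data depends on which request the 
scheduler services first, which is exactly the point.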

> so yes, if you have dependencies on previous writes,
> you need to wait for them to be completed first.

This kind of locking should already be handled by the filesystem, I would 
think. That is, if it knows that a particular read depends on a particular 
write, it can block that read itself, or block the whole FS until the write 
is synced.
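
At the command level you can force that kind of dependency by hand if you 
need to (just a sketch -- conv=fsync is a GNU dd option, and with protocol C 
a completed write implies it has reached the peer):

  dd if=/dev/zero of=/dev/drbd0 bs=1M count=1 conv=fsync
  dd if=/dev/drbd0 of=/dev/null bs=1M count=1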

It looks like I was misunderstanding: there is no real parallel access here, 
and it is the IO scheduler that is choosing to service the whole run of 
writes before attempting the read. Does that make sense?

> > What I don't get is this:
> >
> > dd if=/dev/drbd0 of=/dev/null bs=1M count=1 skip=1
> >
> > That one should be instantaneous,
>
> no, since "skip" for dd literally means skip, it does not mean seek.
> it means read /dev/drbd0 from the start (offset 0), throw away the first
> megabyte, then output the second megabyte.

That is not true (as Luciano Rocha clarified). And by the way, if it were, it 
would mean that dd has no way to seek on the input file, since dd's "seek" is 
on output.
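
For reference, the two options act on different ends of the pipe (the file 
names here are only examples):

  # skip= positions within the input, seek= positions within the output;
  # both count in units of bs
  dd if=/dev/drbd0 of=/dev/null bs=1M count=1 skip=1   # read only the 2nd MiB
  dd if=/dev/zero  of=/tmp/out  bs=1M count=1 seek=1   # write into the 2nd MiB

On a seekable input, GNU dd lseek()s past the skipped blocks rather than 
reading and discarding them, which is why the skip=1 read should come back 
quickly.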

> > even while the other two are blocking. This behavior may well not work
> > for a clustered filesystem, like OCFS2. However, it would work very
> > well for a single-machine filesystem, like ReiserFS -- and it seems
> > like this should be possible whenever allow-two-primaries is disabled.
>
> again, if you want short read latency, tune your system that way.
> reads in drbd do not block on writes.

In other words, you could say they are asynchronous (non-blocking) with 
respect to writes. Is my language here really so difficult to understand?

> if it appears to you that way, I'm sorry. but they don't.

> tune the io scheduler(s).
> get the queue length down.

Thanks. That's what I needed to know, I think. It looks like you're telling me 
that DRBD is behaving the way I want it to, and it is the block device or IO 
scheduler which is causing this behavior.
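
Concretely, I assume "tune the io scheduler, get the queue length down" means 
something along these lines on the backing device (sdX and the numbers are 
placeholders, not anything Lars suggested specifically):

  cat /sys/block/sdX/queue/scheduler               # show the current elevator
  echo deadline > /sys/block/sdX/queue/scheduler   # deadline bounds read latency
  echo 32 > /sys/block/sdX/queue/nr_requests       # shorter queue, less read starvation

That should keep a long run of queued writes from holding a single read back 
for seconds at a time.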