On Sun, Jun 03, 2007 at 04:26:10PM -0500, David Masover wrote:
> On Sunday 03 June 2007 10:18:30 Lars Ellenberg wrote:
> > > For example, DRBD+OCFS2 on a pair of load-balancing, redundant webservers,
> > > connected over relatively low bandwidth / high latency / the Internet -- now
> > > try unpacking some large-ish new web package onto the shared partition.
> > > Suddenly, every website running off of these would be DOWN until the transfer
> > > was complete.
> > >
> > > That's a purely hypothetical situation, but really, how hard would it be to at
> > > least have a reader thread and a writer thread? (Don't you already have that,
> > > anyway? Maybe better locking or something?)
> >
> > it just _technically_ makes no sense at all.
>
> Would you like to explain why not?

you talk about DRBD + OCFS2 on a low bandwidth, high latency link,
and you want to improve on the "sluggish" behaviour by making drbd
replication asynchronous, buffering writes for minutes.

well, you cannot run a cluster file system that expects shared storage,
and does its best to ensure cache coherency using a lock manager,
on an asynchronously replicated block device. sorry.

to use any cluster file system on top of drbd, you need to run drbd
protocol C. and if you have high latency, that will not perform well.
even then, the real source of latency and bad performance will probably
be the lock manager in the cluster file system.
bouncing locks over a high latency link is no fun.

so it does not make sense.

> > and that may be due to some misunderstandings about vocabulary used,
> > and the technical implications those things have, and about what
> > jobs the different components of a system have.
>
> Given my understanding of the system, I expected a certain, specific result
> from my dd experiments. I got a different result.
>
> The behavior I want would make DRBD over poor connectivity (high latency, low
> bandwidth) much more livable, so I'd like to think it is the correct
> behavior.
> If so, there is a bug in DRBD, either in design or implementation.
>
> If not, I would like someone to explain to me why it is correct behavior for
> it to take five minutes to read one megabyte from a fully synced and up to
> date DRBD device.

I did not refer to your original post about your bad read performance,
even though I very much think that you are doing something very wrong there.
I was directly referring to your cluster file system on a high latency link
with asynchronously replicated storage.

> I think I did, hence the dd statistics.
>
> Alright, if it makes you feel better, here is the problem, with no
> implementation details: When the DRBD device is busy over a high-latency,
> low-bandwidth network (VPN), even when the local device is consistent and up
> to date, local reads can block for five or ten MINUTES at a time.

well. then use a file system that does not hang for minutes in getdentry,
even on local storage.

ok, now, that was only an educated guess on my part.
but I suspect that the request queue is congested, and that you are
probably using reiserfs, which tends to write even on reads just to
"rebalance something". (well, at least it did that once. we don't use
reiserfs much.)

so try a different fs, try different mount options, try different io
schedulers, try tuning the respective io scheduler.
monitor the io subsystem as well as drbd, and find out what actually
happens during your linear reads of the device. find out what else is
accessing those spindles at the same time, causing the delay.

> I can see no reason whatsoever why, in this state, a read (which is only
> allowed on the primary machine) should ever take longer than it would take to
> read from the physical disk.

well, it normally does not. normally its performance is very close to
what a dm-linear mapping would have. so there is something very wrong
in your system. And I am very confident it is not drbd. I know my benchmarks.
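A minimal sketch of the kind of monitoring suggested above, assuming a Linux host: it parses /proc/diskstats (field layout as described in the kernel's iostats documentation) and computes the average time per completed read between two snapshots, so the drbd device can be compared with its backing disk. The helper names are made up for illustration, device names such as drbd0 and sda are only examples, and it assumes your kernel and drbd version actually report statistics for the drbd device there.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass
class DiskStat:
    """Subset of one /proc/diskstats line (hypothetical helper type)."""
    name: str
    reads_completed: int
    ms_reading: int
    writes_completed: int
    ms_writing: int
    in_flight: int


def parse_diskstats(text: str) -> Dict[str, DiskStat]:
    """Parse /proc/diskstats content into a dict keyed by device name."""
    stats: Dict[str, DiskStat] = {}
    for line in text.splitlines():
        f = line.split()
        # major, minor, name, then at least 11 counters; newer kernels add more
        if len(f) < 14:
            continue
        stats[f[2]] = DiskStat(
            name=f[2],
            reads_completed=int(f[3]),
            ms_reading=int(f[6]),
            writes_completed=int(f[7]),
            ms_writing=int(f[10]),
            in_flight=int(f[11]),
        )
    return stats


def snapshot(path: str = "/proc/diskstats") -> Dict[str, DiskStat]:
    """Read the current counters from the kernel (Linux only)."""
    with open(path) as fh:
        return parse_diskstats(fh.read())


def avg_read_latency_ms(before: DiskStat, after: DiskStat) -> float:
    """Average milliseconds spent per read completed between two snapshots."""
    reads = after.reads_completed - before.reads_completed
    if reads == 0:
        return 0.0
    return (after.ms_reading - before.ms_reading) / reads
```

Usage: take one snapshot(), wait a second or so, take another, and compare avg_read_latency_ms() for the drbd device and for the disk underneath it. If the backing disk is just as slow, that suggests the delay is below drbd; a persistently high in_flight count hints at requests piling up in the queue.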
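The earlier point, that a cluster file system cannot sit on an asynchronously replicated block device, can be illustrated with a toy model. This is plain Python, not DRBD or OCFS2 code, and the class names are hypothetical: node A updates a block while holding the cluster lock and releases it, then node B takes the lock and reads its own local replica. With synchronous, protocol-C-like replication B sees the update; with an asynchronous backlog it reads stale data even though the lock protocol was followed.

```python
from collections import deque
from typing import Deque, Dict, Optional, Tuple


class ReplicatedPair:
    """Toy two-node replicated block device (not DRBD's implementation).

    Each node reads only its own local replica.
    """

    def __init__(self, synchronous: bool) -> None:
        self.synchronous = synchronous
        self.replica: Dict[str, Dict[int, str]] = {"A": {}, "B": {}}
        # writes that have completed locally but not yet reached the peer
        self.backlog: Deque[Tuple[str, int, str]] = deque()

    def write(self, node: str, block: int, data: str) -> None:
        peer = "B" if node == "A" else "A"
        self.replica[node][block] = data
        if self.synchronous:
            # protocol-C-like: completion waits until the peer has the data
            self.replica[peer][block] = data
        else:
            # asynchronous: completes now, the peer catches up later
            self.backlog.append((peer, block, data))

    def read(self, node: str, block: int) -> Optional[str]:
        return self.replica[node].get(block)

    def drain(self) -> None:
        """Apply every queued write to its peer (replication catches up)."""
        while self.backlog:
            peer, block, data = self.backlog.popleft()
            self.replica[peer][block] = data


class LockManager:
    """Toy cluster lock: at most one holder at a time."""

    def __init__(self) -> None:
        self.holder: Optional[str] = None

    def acquire(self, node: str) -> None:
        assert self.holder is None, "lock already held"
        self.holder = node

    def release(self, node: str) -> None:
        assert self.holder == node, "releasing a lock we do not hold"
        self.holder = None


def update_then_read(synchronous: bool) -> Optional[str]:
    """Node A updates block 7 under the lock, then node B reads it under the lock."""
    dev, locks = ReplicatedPair(synchronous), LockManager()
    dev.write("A", 7, "old")
    dev.drain()  # start with both replicas in sync
    locks.acquire("A")
    dev.write("A", 7, "new")
    locks.release("A")
    locks.acquire("B")  # the lock manager now tells B its view is coherent
    seen = dev.read("B", 7)
    locks.release("B")
    return seen
```

update_then_read(True) returns "new" and update_then_read(False) returns "old". The synchronous variant is the only correct one, but on a real link each such write, and each lock handoff between nodes, costs at least one network round trip, which is why protocol C over a high latency link performs badly.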
in any case, we are developing drbd "addons", or let's say a protocol
extension of drbd, to actually allow asynchronous replication with a
tunable backlog of minutes/hours. but you obviously won't be able to
use this for a cluster file system like ocfs2/gfs.

cheers.

-- 
: Lars Ellenberg                                Tel +43-1-8178292-0  :
: LINBIT Information Technologies GmbH          Fax +43-1-8178292-82 :
: Schoenbrunner Str. 244, A-1120 Vienna/Europe   http://www.linbit.com :