On Monday 04 June 2007 08:56:41 Lars Ellenberg wrote:
> you talk about DRBD + OCFS2 on a low bandwidth high latency link,
> and you want to improve on the "sluggish" behaviour by making
> drbd replication asynchronous, buffering writes for minutes.

Except I gave up OCFS2 when I discovered that older versions like to
kernel panic when they lose a connection. Newer versions are nicer --
they only reboot. But since one end is on a server which runs more than
just a DRBD cluster, I cannot afford to bring down the ENTIRE machine
just because OCFS2 is lagging.

Also, not entirely. I want writes to be asynchronous with respect to
reads, not with respect to other writes. In other words:

dd if=/dev/zero of=/dev/drbd0 bs=1M count=1

The above can hang as long as it takes to get across the network. And
if, at the same time, I'm trying to do this:

dd if=/dev/drbd0 of=/dev/null bs=1M count=1

Then sure, let that one hang, too. What I don't get is this:

dd if=/dev/drbd0 of=/dev/null bs=1M count=1 skip=1

That one should be instantaneous, even while the other two are blocking.

This behavior may well not work for a clustered filesystem, like OCFS2.
However, it would work very well for a single-machine filesystem, like
ReiserFS -- and it seems like this should be possible whenever
allow-two-primaries is disabled.

> > I think I did, hence the dd statistics.
> >
> > Alright, if it makes you feel better, here is the problem, with no
> > implementation details: When the DRBD device is busy over a
> > high-latency, low-bandwidth network (VPN), even when the local device
> > is consistent and up to date, local reads can block for five or ten
> > MINUTES at a time.
>
> well.
> use a file system then, that does not hang minutes in getdentry,
> even on local storage.
> ok, now, that was only an educated guess on my part.

It does not, when the devices are synced and the FS is not otherwise busy.
> but, I suggest that the request queue is probably congested, probably you
> try to use reiserfs which tends to write even on reads just to "rebalance
> something". (well, at least it did that once. we don't use reiserfs too
> much).

I'm guessing the "write on reads" is atimes, which I have disabled anyway.

I am using ReiserFS, but that is irrelevant, because of said dd
benchmarks. I can pretty much guarantee that when told to read, dd does
not also write (at least, I hope not!)

I suppose one possibility is that ReiserFS is trying to write to that
initial one megabyte that I was trying to read, but that seems a bit
coincidental. I think when I get the chance, I might try shrinking the
filesystem so that I have a bit of space at the end which I know is not
being touched by it.

> in any case, we are developing drbd "addons", or lets say a protocol
> extension of drbd, to actually allow asynchronous replication with some
> tunable backlog of minutes/hours.

That would be perfect for this application -- if the writes can stay in
the same order. The reason being, if DRBD re-orders these writes and the
primary node crashes while trying to replicate, the FS is corrupt. If
the writes are in their original order, however, and the primary node
crashes (or loses power, or blows up), the secondary node can simply
replay the journal (on a journalling filesystem).

That's actually why I didn't do one obvious solution -- I could have the
backup script force a disconnect of the primary node, and reconnect when
the backup is done.
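For what it's worth, the disconnect-during-backup approach I decided
against would look roughly like this. This is only a sketch: the
resource name "r0" and the paths are placeholders, not my actual setup;
it assumes drbdadm is the management tool and cannot run without a live
DRBD resource:

```shell
#!/bin/sh
# Sketch (untested) of the rejected "disconnect for the backup window"
# approach. Resource name r0 and the paths are made-up placeholders.
set -e
RES=r0

drbdadm disconnect "$RES"            # stop replicating to the peer
trap 'drbdadm connect "$RES"' EXIT   # reconnect even if the backup fails

tar czf /backup/data.tgz /mnt/data   # peer falls behind during this;
                                     # DRBD resyncs the changed blocks
                                     # after the reconnect
```

The trap guards against leaving the node permanently disconnected. The
drawback is exactly the ordering problem above: during the resync the
changed blocks are copied in no particular order, so the secondary's
copy is not crash-consistent until the resync completes.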