[Drbd-dev] New Features
moreyroof at gmail.com
Sun Sep 21 01:04:44 CEST 2008
This is pretty much what I was thinking. For the btree, generations,
and ref-counting, a good example to look at is btrfs (the other project
I have started to mess with). Its design is very efficient, and I think
we could use something very close to it for our setup. I haven't read
the paper you sent yet, but I will get to it today.
Let me know how you would like to start, and I can start working on a
proof of concept; we can see how to go from there.
Lars Ellenberg wrote:
> On Sat, Sep 20, 2008 at 03:18:21PM +0200, Lars Ellenberg wrote:
> "Write-Back" cache:
> some things to think of when introducing a write back cache,
> * need to do some cache coherency protocol
> * need to track which block is where, so we can read the correct
>   version in case it has not yet been committed to final location
> * if using a ram buffer as log disk, we need to track the latest
>   position for overwrites.
> * if we have stages, i.e. ram buffer first, then log disk, then real
>   storage, we are the most flexible.
>   if peers are sufficiently close, we can send_page from the ram
>   buffer (and calculate checksums there, for data integrity).
>   if we use it as ring buffer, we'd not have to worry about
>   inconsistencies resulting from changes to in-flight buffers,
>   as they are all private.
> * if we can use some efficient combination of digital tree,
>   btree and hash table to track which block is where,
>   we might be able to track a large, staged log device
>   as a sort of log-structured block device, making snapshots after the
>   fact for data generations still covered by the log very easy.
> * we need a good refcount scheme on the ram buffers.
> of course we can start out "simple", and just provide a static cache, no
> ring buffer or anything.
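As a rough illustration of the bookkeeping listed above (which block is
where, and the latest log position for overwrites), here is a minimal
sketch in Python. It is a toy user-space model, not kernel code: the
class name, the dict standing in for the real storage, and the sequence
numbers used as log positions are all assumptions made for this example.

```python
class StagedWriteBackCache:
    """Toy model: a bounded RAM log in front of a backing store."""

    def __init__(self, backing, capacity):
        if capacity < 1:
            raise ValueError("capacity must be at least 1")
        self.backing = backing    # block number -> data (the "real storage")
        self.capacity = capacity  # maximum number of log entries kept in RAM
        self.log = {}             # log position -> (block, data)
        self.latest = {}          # block -> log position of its newest version
        self.head = 0             # oldest log position not yet committed
        self.next_seq = 0         # next free log position

    def write(self, block, data):
        # Make room by committing the oldest entry when the log is full.
        if len(self.log) >= self.capacity:
            self.commit_oldest()
        self.log[self.next_seq] = (block, data)
        self.latest[block] = self.next_seq  # an overwrite moves the pointer
        self.next_seq += 1

    def read(self, block):
        # Serve the newest version from the log if it is not committed yet.
        seq = self.latest.get(block)
        if seq is not None:
            return self.log[seq][1]
        return self.backing.get(block)

    def commit_oldest(self):
        seq = self.head
        block, data = self.log.pop(seq)
        self.head += 1
        self.backing[block] = data  # commits happen in write order
        # Drop the index entry only if no newer version is still logged.
        if self.latest.get(block) == seq:
            del self.latest[block]

    def flush(self):
        while self.log:
            commit_oldest = self.commit_oldest
            commit_oldest()
```

Writing the same block twice leaves two log entries, but `latest` always
points at the newer one, so a read never returns a stale version even
after the older entry has been committed to the backing store.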
> this should probably be implemented as a generic device-mapper target,
> which also makes testing much easier,
> and would make it possible to even add it to the current drbd
> by just stacking it in front of the "lower-level device".
> for the "write-back" to "write-through" change,
> we only need a minimal change in the current drbd module, which we can
> enable based on the type of the device directly below us.
> we could detect whether it's a device-mapper target,
> if so, which one, and access its special methods if any.
> this still sounds a little quirky, so I'd suggest to introduce a special
> BIO_RW_WRITE_THROUGH (to be defined) bit for bi_rw.
> when the bit is not set, the request is handled as write back.
> when it is set, it would trigger a flush of any pending requests,
> and a direct remapping to the lower level device.
> BIO_RW_BARRIER requests would still need to trigger a flush as well,
> and to go straight through.
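The flag semantics described above can be modelled roughly as follows.
This is only a sketch under assumptions: WRITE_THROUGH and BARRIER are
hypothetical stand-ins for the proposed BIO_RW_WRITE_THROUGH bit and for
BIO_RW_BARRIER, the lower-level device is a plain dict, and the class and
method names are made up for the example.

```python
WRITE_THROUGH = 1 << 0  # hypothetical stand-in for BIO_RW_WRITE_THROUGH
BARRIER = 1 << 1        # stand-in for BIO_RW_BARRIER


class WriteBackTarget:
    """Toy model of a write-back target stacked on a lower-level device."""

    def __init__(self, lower):
        self.lower = lower  # block number -> data (the lower-level device)
        self.pending = []   # cached (block, data) writes in arrival order

    def flush(self):
        # Push all cached writes down, preserving their original order.
        for block, data in self.pending:
            self.lower[block] = data
        self.pending.clear()

    def submit_write(self, block, data, flags=0):
        if flags & (WRITE_THROUGH | BARRIER):
            # Flush everything queued so far, then go straight through.
            self.flush()
            self.lower[block] = data
            return "through"
        # Default behaviour: write back, i.e. queue the write and return.
        self.pending.append((block, data))
        return "cached"
```

A caller that leaves the flags at 0 gets write-back behaviour; setting
either bit forces the pending writes out first, so the flagged request
never overtakes older data on the lower device.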
> in the new architecture, where "drbd" probably becomes just a special
> implementation and collection of device-mapper targets, communicating
> with other device-mapper targets becomes easier (I hope).
> does that make sense?