[Drbd-dev] New Features

Sun Sep 21 01:04:44 CEST 2008

This is pretty much what I was thinking.  For the btree, generations, 
and ref-counting, a good example to look at is how btrfs works (this is 
the other project I have started to mess with).  The design is very 
efficient, and I think something very close to it would fit our setup.  I 
haven't read the paper you sent yet, but will get to that today.

Let me know how you would like to start, and I can begin working on a proof 
of concept; we can see where to go from there.

-Morey

Lars Ellenberg wrote:
> On Sat, Sep 20, 2008 at 03:18:21PM +0200, Lars Ellenberg wrote:
>
> "Write-Back" cache:
>
> some things to think about when introducing a write-back cache:
>  * need a cache coherency protocol
>  * need to track which block is where, so we can read the correct
>    version in case it has not yet been committed to final location
>  * if using a RAM buffer as the log disk, we need to track the latest
>    position of each block, to handle overwrites.
>  * if we have stages, i.e. RAM buffer first, then log disk, then real
>    storage, we have the most flexibility.
>    if peers are sufficiently close, we can send_page from the RAM
>    buffer (and calculate checksums there, for data integrity).
>    if we use it as a ring buffer, we would not have to worry about
>    inconsistencies resulting from changes to in-flight buffers,
>    as they are all private.
>  * if we can use some efficient combination of digital tree,
>    btree and hash table to track which block is where,
>    we might be able to treat a large, staged log device
>    as a sort of log-structured block device, which would make
>    after-the-fact snapshots very easy for any data generation
>    still covered by the log.
>  * we need a good refcount scheme for the RAM buffers.
>
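The ring-buffer log, the "which block is where" index, overwrite tracking and refcounting from the list above can be sketched together in userspace C. This is a minimal sketch under loud assumptions: a fixed-size ring of private block copies, a chained hash table standing in for the digital tree/btree/hash combination, and a simple per-slot refcount where the log holds one reference and each in-flight user (e.g. a send to a peer) pins another. All names (wb_log, wb_write, wb_lookup, wb_get, wb_put, wb_commit_oldest) and sizes are hypothetical and not part of drbd or device-mapper.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

#define BLOCK_SIZE   16   /* hypothetical, tiny for illustration */
#define LOG_SLOTS    8    /* ring buffer capacity, in blocks */
#define HASH_BUCKETS 16
#define NIL          (-1)

struct log_slot {
    uint64_t block;         /* final location on the backing device */
    char data[BLOCK_SIZE];  /* private copy, so callers may reuse their buffer */
    int refcount;           /* 1 for the log itself, +1 per pinned user */
    int hash_next;          /* next slot in the same bucket, newest first */
};

struct wb_log {
    struct log_slot slot[LOG_SLOTS];
    int bucket[HASH_BUCKETS];  /* newest slot per bucket, or NIL */
    unsigned head, tail, used; /* append at head, commit from tail */
};

static unsigned hash_block(uint64_t block) { return (unsigned)(block % HASH_BUCKETS); }

void wb_init(struct wb_log *wl)
{
    memset(wl, 0, sizeof(*wl));
    for (int i = 0; i < HASH_BUCKETS; i++)
        wl->bucket[i] = NIL;
}

/* Newest slot holding `block`, or NIL. Chains are ordered newest first,
 * so the first match shadows older versions still in the log. */
static int find_slot(const struct wb_log *wl, uint64_t block)
{
    for (int s = wl->bucket[hash_block(block)]; s != NIL; s = wl->slot[s].hash_next)
        if (wl->slot[s].block == block)
            return s;
    return NIL;
}

/* Append a write; returns false when the ring is full (commit first). */
bool wb_write(struct wb_log *wl, uint64_t block, const char *data)
{
    if (wl->used == LOG_SLOTS)
        return false;
    int s = (int)wl->head;
    struct log_slot *ls = &wl->slot[s];
    ls->block = block;
    memcpy(ls->data, data, BLOCK_SIZE);
    ls->refcount = 1;
    ls->hash_next = wl->bucket[hash_block(block)];
    wl->bucket[hash_block(block)] = s;
    wl->head = (wl->head + 1) % LOG_SLOTS;
    wl->used++;
    return true;
}

/* Read path: newest logged copy, or NULL to read from the backing device. */
const char *wb_lookup(const struct wb_log *wl, uint64_t block)
{
    int s = find_slot(wl, block);
    return s == NIL ? NULL : wl->slot[s].data;
}

/* Pin the newest copy of a block, e.g. while it is being sent to a peer. */
int wb_get(struct wb_log *wl, uint64_t block)
{
    int s = find_slot(wl, block);
    if (s != NIL)
        wl->slot[s].refcount++;
    return s;
}

void wb_put(struct wb_log *wl, int s)
{
    assert(wl->slot[s].refcount > 1); /* the log's own reference must remain */
    wl->slot[s].refcount--;
}

/* Retire the oldest entry to its final location; fails while it is pinned. */
bool wb_commit_oldest(struct wb_log *wl, uint64_t *block, char *data)
{
    if (wl->used == 0)
        return false;
    int s = (int)wl->tail;
    struct log_slot *ls = &wl->slot[s];
    if (ls->refcount > 1)
        return false;
    *block = ls->block;
    memcpy(data, ls->data, BLOCK_SIZE);
    int *link = &wl->bucket[hash_block(ls->block)];
    while (*link != s)                 /* every live slot is in its chain */
        link = &wl->slot[*link].hash_next;
    *link = ls->hash_next;
    ls->refcount = 0;
    wl->tail = (wl->tail + 1) % LOG_SLOTS;
    wl->used--;
    return true;
}
```

Because entries are committed strictly in log order, an older copy of an overwritten block may still be written to its final location, but the newer copy always follows it, and reads never see the older copy while the newer one is in the log. A pinned slot simply holds back the tail until its user drops the reference.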
> of course we can start out "simple" and just provide a static cache,
> with no ring buffer or anything.
>
> this should probably be implemented as a generic device-mapper target,
> which also makes testing much easier.
>
> that would even make it possible to add it to the current drbd
> by just stacking it in front of the "lower-level device".
>
> for the "write-back" to "write-through" change,
> we only need a minimal change in the current drbd module, which we can
> enable based on the type of the device directly below us.
> we could detect whether it's a device-mapper target and,
> if so, which one, and then access its special methods, if any.
>
> this still sounds a little quirky, so I'd suggest introducing a special
> BIO_RW_WRITE_THROUGH (to be defined) bit in bi_flags.
>
> without it, a request is handled as write-back.
> with it, the request would trigger a flush of any pending requests
> and be remapped directly to the lower-level device.
>
> BIO_RW_BARRIER requests would still need to trigger a flush as well,
> and go straight through.
>
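The proposed write-back/write-through switch can be sketched as the map decision of a caching target. This is a userspace model under stated assumptions: REQ_WRITE_THROUGH and REQ_BARRIER are hypothetical stand-ins for the proposed BIO_RW_WRITE_THROUGH bit and for BIO_RW_BARRIER, and struct request, cache_map and submit_lower are illustrative names, not kernel or device-mapper API. Requests without either flag are buffered (write-back); flagged requests first flush everything pending, then go straight to the lower device.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical flag bits modelling the proposed BIO_RW_WRITE_THROUGH
 * and BIO_RW_BARRIER; these are not the kernel's definitions. */
#define REQ_WRITE_THROUGH (1u << 0)
#define REQ_BARRIER       (1u << 1)
#define MAX_PENDING       32

struct request {
    unsigned long sector;
    unsigned flags;
};

struct cache_target {
    struct request pending[MAX_PENDING]; /* write-back queue, arrival order */
    size_t npending;
    size_t lower_writes;       /* requests issued to the lower device */
    unsigned long last_sector; /* sector of the most recent lower write */
};

/* Stand-in for remapping a request to the lower-level device. */
static void submit_lower(struct cache_target *t, const struct request *rq)
{
    t->lower_writes++;
    t->last_sector = rq->sector;
}

static void flush_pending(struct cache_target *t)
{
    for (size_t i = 0; i < t->npending; i++)
        submit_lower(t, &t->pending[i]);
    t->npending = 0;
}

void cache_map(struct cache_target *t, const struct request *rq)
{
    if (rq->flags & (REQ_WRITE_THROUGH | REQ_BARRIER)) {
        /* issue older buffered writes first; a real target would
         * also have to wait for their completion */
        flush_pending(t);
        submit_lower(t, rq); /* then pass this request straight through */
        return;
    }
    if (t->npending == MAX_PENDING)
        flush_pending(t);    /* simplistic back-pressure when the queue is full */
    assert(t->npending < MAX_PENDING);
    t->pending[t->npending++] = *rq; /* default: write-back, just buffer it */
}
```

Treating both flags the same way keeps the lower device's view ordered: nothing buffered before a write-through or barrier request can be issued after it.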
> in the new architecture, where "drbd" probably becomes just a special
> implementation and collection of device-mapper targets, communicating
> with other device-mapper targets becomes easier (I hope).
>
> does that make sense?
>
>