[DRBD-user] Re: [DRBD-announce] drbd-0.7.0.tar.gz

Fri Jul 16 20:07:00 CEST 2004

> hi,
> 
> I assume it will compile and run on Kernel 2.6.

DRBD 0.7 will compile and run on recent 2.4 and 2.6 kernels.

linux kernel 2.6 is our main development target now,
and that is what we do our internal tests on.

the main differences for you as DRBD users are:

  * we can do a typical FULL SYNC within TWO MINUTES.

  * we need extra block device storage for meta-data

  * we will be able to do rolling upgrades of drbd when we need to,
    i.e. if we change some drbd-protocol internals, drbd will be able to
    talk to its predecessor, and therefore to its successor, too.

details:

  for each configured DRBD, we now require 128MB reserved storage on
  some block device for our meta-data.
  this may be the same block device which is the "lower level device" or
  "backing storage" or whatever; we call this "internal meta-data".
  or this may be some other, dedicated, block device, where you can put
  the meta-data for several DRBD; we call this "external meta-data", and
  you have to explicitly specify an index for it (so DRBD knows which
  128MB chunk is meant).
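
  a minimal sketch of how such an index could map to a location on an
  external meta-data device. it assumes the 128MB chunks simply sit back
  to back starting at offset 0, which the text above does not spell out;
  the names meta_data_offset and devices_that_fit are made up for
  illustration and are not DRBD functions.

```python
# illustration only: assumes the 128MB chunks on an external meta-data
# device are laid out back to back, starting at byte offset 0.
MD_CHUNK_BYTES = 128 * 1024 * 1024  # 128MB reserved per DRBD (from the text)


def meta_data_offset(index: int) -> int:
    """Byte offset of the meta-data chunk for a given index (hypothetical layout)."""
    if index < 0:
        raise ValueError("index must be non-negative")
    return index * MD_CHUNK_BYTES


def devices_that_fit(md_device_bytes: int) -> int:
    """How many DRBDs could keep their meta-data on a device of this size."""
    return md_device_bytes // MD_CHUNK_BYTES
```

  under that assumption, a 1GB meta-data device would have room for
  eight DRBDs, using indexes 0 through 7.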

  we store there:

    generation counters
      used to determine which node has the best (i.e. consistent, most
      recent, and up-to-date) data.

    activity log
      a nice implementation of a list of least recently used "extents",
      where an extent is a 4MB chunk of the storage. any extent that
      is written to will first be added to this log.
      so when we crash, we still need a "full sync", but we *know*
      that we can skip all extents which are not in the log.
      so by limiting the log size to 257 extents, you have to sync only
      about 1GB of data, regardless of actual device size, which may be
      1TB or larger.

    persistent bitmap
      so what I just wrote is not exactly true, since after a crash,
      the other node will take over, and start modifying data, changing
      its activity log. but we remember in our bitmap (with granularity
      of 4KB blocks per bit) which blocks we modified, and before we
      change the log, we write out the bitmap to persistent storage.
      so still, we later can skip those blocks of which we know that
      they are still untouched.
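
  to make the activity log and the bitmap a bit more concrete, here is a
  toy model in python. it is a sketch of the idea only, not DRBD's actual
  data structures: the class names ActivityLog and DirtyBitmap, the
  OrderedDict-based LRU, and the use of a set instead of a real bit array
  are all assumptions made for illustration. only the 4MB extent size,
  the 4KB bitmap granularity and the 257 extent limit come from the text.

```python
from collections import OrderedDict

EXTENT_BYTES = 4 * 1024 * 1024   # 4MB per activity-log extent (from the text)
BLOCK_BYTES = 4 * 1024           # 4KB per bitmap bit (from the text)


class ActivityLog:
    """Toy LRU set of recently written extents; not DRBD's real structure."""

    def __init__(self, max_extents=257):
        self.max_extents = max_extents
        self.extents = OrderedDict()   # extent number -> None, coldest first

    def before_write(self, byte_offset):
        """Record the extent a write touches; return the evicted extent, if any."""
        extent = byte_offset // EXTENT_BYTES
        evicted = None
        if extent in self.extents:
            self.extents.move_to_end(extent)   # now the most recently used
        else:
            if len(self.extents) >= self.max_extents:
                evicted, _ = self.extents.popitem(last=False)  # drop the coldest
            self.extents[extent] = None
            # a real implementation would have to persist this change
            # before letting the write itself proceed
        return evicted

    def resync_bytes_after_crash(self):
        """Upper bound on data to resync: only extents still in the log."""
        return len(self.extents) * EXTENT_BYTES


class DirtyBitmap:
    """One entry per modified 4KB block; a set stands in for the bit array."""

    def __init__(self):
        self.dirty = set()

    def mark(self, byte_offset, length):
        """Mark every 4KB block touched by a write of the given length."""
        if length <= 0:
            return
        first = byte_offset // BLOCK_BYTES
        last = (byte_offset + length - 1) // BLOCK_BYTES
        self.dirty.update(range(first, last + 1))

    def blocks_to_sync(self):
        """Block numbers that have to be copied; all others can be skipped."""
        return sorted(self.dirty)
```

  with the limit of 257 extents, a full log bounds the resync at
  257 * 4MB = 1028MB, which is the "about 1GB" mentioned above.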

  a 128MB bitmap (minus some KB for the gen-counts and act-log) allows
  for ca. 4TB (4294967296 kB) maximum storage per DRBD.
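
  the arithmetic behind that figure, as a small python check. it treats
  the full 128MB as bitmap, so the result is an upper bound; the function
  name max_storage_bytes is made up for illustration.

```python
BITMAP_BYTES = 128 * 1024 * 1024   # upper bound: a few KB go to gen-counts and act-log
BLOCK_BYTES = 4 * 1024             # each bitmap bit covers one 4KB block


def max_storage_bytes(bitmap_bytes=BITMAP_BYTES, block_bytes=BLOCK_BYTES):
    """Largest device a bitmap of this size can describe, one bit per block."""
    bits = bitmap_bytes * 8
    return bits * block_bytes


# 2^30 bits * 2^12 bytes per bit = 2^42 bytes, i.e. 4TB or 4294967296 kB
max_kb = max_storage_bytes() // 1024
```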

  maximum size of the "activity log" is currently 3843 extents, which
  allows for an active set size of ~15GB (that will be synced after a crash).

  if we assume for now an average sync throughput of 10MB/sec (with good
  hardware you should be able to get 60-90 MB/sec, or even more),
  syncing 1GB (~ 257 active extents) will take ca. 100 seconds.

    ****
    *** That means that a "typical full sync"
    *** will take less than two minutes now!
    ****
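
  the numbers above can be reproduced with a few lines of python. this is
  plain arithmetic on the figures from the text (4MB extents, 257 or 3843
  extents, 10MB/sec); the function names are made up for illustration.

```python
EXTENT_MB = 4   # size of one activity-log extent (from the text)


def active_set_mb(extents):
    """Amount of data covered by the activity log, in MB."""
    return extents * EXTENT_MB


def sync_seconds(extents, throughput_mb_per_sec):
    """Time to resync the whole active set at a given average throughput."""
    if throughput_mb_per_sec <= 0:
        raise ValueError("throughput must be positive")
    return active_set_mb(extents) / throughput_mb_per_sec


typical = sync_seconds(257, 10)     # 257 extents at 10MB/sec
worst_case = sync_seconds(3843, 10) # maximum log size at the same rate
```

  257 extents come to 1028MB, so roughly 103 seconds at 10MB/sec. the
  maximum of 3843 extents comes to 15372MB, which at the same rate would
  take about 26 minutes, so the log size is a trade-off between resync
  time and how often the log itself has to change.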

  a _real_ full sync (i.e. of the entire storage) is only necessary
  if you have to replace the physical storage device in one of the
  nodes, or the complete node, or if you run one node StandAlone for a
  decent amount of time, and it changed all storage blocks.

  it will typically take place anyway when you first set up the devices.
  (though one can avoid it...
   details upon request or in some online guide yet to be written...)

it does *NOT* yet support concurrent access.
to access the device, the node still has to be in Primary state for it.

we probably still have some races with state changes, when DRBD
internally decides to change state (because it detects network failure
or something) and there is an operator (drbdadm/drbdsetup) request
*at the very same time*... these are not easy to reproduce, and
difficult to get rid of. but this is currently the only known (to me)
issue we have with drbd 0.7.

 ***
 ** so, go ahead, get it, try it, beat it as hard as you can.
 ** and tell us how it behaves, and when it breaks, (if it breaks).
 ***

ah, and LinBit of course are happy when you check out some support
contract... or want to do some donation to the project... or want to
contribute in one way or another.

	Lars Ellenberg