[DRBD-user] DRBD 8.4.2: "drbdadm verify" just do not work

Tue Sep 25 12:52:32 CEST 2012

On Mon, Sep 24, 2012 at 04:46:45PM +0200, Markus Müller wrote:
> Hi Florian,
> >>Does anybody see real benefits of drbd over a solution with nbd
> >>and raid1?
> >
> >From Florian's blog from back in 2009: [...]
> thanks for those hints. I think I have some arguments for this; see
> the end of this mail.
> >I personally would rather see any possible issues fixed than
> >cobble together something like that. As such, we've been avoiding
> >using verify at all, because in the 8.3.x trees, using it often
> >caused DRBD to hang and require a full system reboot because the
> >kernel module would become completely unresponsive. I haven't had
> >any problems with it in 8.4 yet, but I'm still a little skittish
> >around the command in general.
> >
> >I've been invalidating hosts I know are bad. I too, would like to
> >see verify become reliable and useful long term.
> >
> If you have a bug and fix it, that is understandable and okay.
> Production-ready software for data replication should not often have
> problems or bugs that lead to inconsistent data, though that is still
> tolerable for free software. But I just cannot use data replication
> software whose verification of data integrity is buggy! This is not
> a question of whether it is paid software or not; it is about whether
> it is usable at all. If an operator gets wrong information about the
> integrity of his data when he explicitly tests for it, then this
> disqualifies the software for nearly any use case! You cannot rely
> on software that may tell you everything is okay when that is not
> certain!
> 
> I think I will really switch to nbd and raid; reasons:
> 
> Comparable features:
> - The file system suffers the same damage with drbd (mode A or B) as
> with nbd+raid; if the primary crashes completely or loses power
> immediately, you have missing data! -> No benefit for drbd

With protocol B, you do not have missing data on primary crash.
With asynchronous replication, you may, that is right.
But you explicitly asked for it.

There is no equivalent to DRBD protocol B in MD.

The write-mostly + write-behind feature of MD may go in the same
direction as DRBD protocol A.
But as far as I can see, there is no equivalent to the DRBD
write-ordering, so it seems to suffer from the same "lower layers may
potentially reorder stuff" issue.

MD limits the impact of this potential reorder issue by limiting the
number of "behind" requests to 256 (by default).  But this does not
avoid the issue; it only makes it less likely to trigger.

Whether or not it will trigger is very much dependent on the actual
IO subsystem behind that "nbd" (or iscsi or whatever) device,
and what the relevant layers may do.

DRBD gets the ordering right, by analysing the requests, and inserting
"DRBD protocol barriers" between reorder domains whenever it detects
a potential write-after-write dependency.
OK, so DRBD had this bug with its "barrier" (wo:b) mode on kernels
where this had to be mapped to drain + FLUSH/FUA: it assumed no
drain would be necessary. But that is fixed again now.
When barriers still mapped to barriers, or for the drbd "flush" and
"drain" modes (wo:f, wo:d), it got it right all along.

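To illustrate the idea of "reorder domains": the following is a toy
model in Python, not DRBD's actual code. It assumes a simplified rule:
once any write of the current epoch has been completed to the upper
layers, the next submitted write might depend on it, so it has to start
a new epoch (that is where a protocol barrier would go on the wire).
The event format and the function name are made up for this sketch.

```python
from typing import Iterable, List, Tuple

# ("submit" | "complete", request id); a hypothetical event format
Event = Tuple[str, int]


def split_into_epochs(events: Iterable[Event]) -> List[List[int]]:
    """Group submitted writes into epochs (reorder domains).

    Writes inside one epoch may be reordered freely by lower layers.
    A new epoch starts as soon as a write is submitted after some write
    of the current epoch has completed, because the submitter may have
    waited for that completion (write-after-write dependency).
    """
    epochs: List[List[int]] = [[]]
    completed_in_current = False
    for kind, req in events:
        if kind == "complete":
            # Completions of requests from older epochs are harmless:
            # those epochs are already ordered before the current one.
            if req in epochs[-1]:
                completed_in_current = True
        elif kind == "submit":
            if completed_in_current:
                epochs.append([])  # a barrier would be inserted here
                completed_in_current = False
            epochs[-1].append(req)
        else:
            raise ValueError(f"unknown event kind: {kind!r}")
    return [epoch for epoch in epochs if epoch]
```

With submit 1, submit 2, complete 1, submit 3, complete 2, submit 4
this yields two epochs: writes 1 and 2 may be reordered against each
other, but both must reach stable storage before 3 and 4.
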
> - Reading only from the local disk and writing to both sides is also
> available on md-raid -> No benefit for drbd
> - Reading from the secondary node if the local node becomes faulty
> is also available on md-raid -> No benefit for drbd
> 
> Features only from drbd:
> - Auto resync / management features / split brain: This is a real
> benefit, but it comes with a bad feeling, as I am not able to verify it.

So you have bad feelings. Cannot argue with that.

But the rest of this paragraph is nonsense:

> You always have to
> do a full resync if there might be any problems, if you want
> to be sure about your data. You don't have a "high availability
> cluster" if you have to shut down everything and use a custom tool to
> verify that it is in sync, because the built-in tool is completely
> broken. (Mode C might give you this HA resync benefit today, but it
> is unusable in most use cases because of its bad performance (2-3
> MByte/sec))

I see 100 times that claimed "bad" performance, and beyond.

> Features only from nbd+raid:
> - Nbd+raid can verify whether it is in sync
> (echo check > /sys/devices/virtual/block/mdX/md/sync_action).

I maintain that DRBD verify does work, though...
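
For reference, the MD check quoted above boils down to writing "check"
to the array's sync_action file and reading mismatch_cnt once
sync_action is back to "idle". A minimal Python sketch of that; the
function names are made up here, and the directory is passed in, so
nothing about the device name is assumed:

```python
from pathlib import Path


def start_md_check(md_dir: Path) -> None:
    """Ask MD to start a check run (same as: echo check > sync_action)."""
    (md_dir / "sync_action").write_text("check\n")


def md_check_finished(md_dir: Path) -> bool:
    """True once sync_action reads "idle" again."""
    return (md_dir / "sync_action").read_text().strip() == "idle"


def read_mismatch_count(md_dir: Path) -> int:
    """Return the mismatch counter MD reports after a check."""
    return int((md_dir / "mismatch_cnt").read_text().strip())
```

On a real system md_dir would be something like
Path("/sys/block/md0/md"), with md0 standing in for your array; writing
to sync_action needs root.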

> - Nbd+raid has much better performance than drbd.

This is a very broad statement, which I think won't hold true in general.

In my experience, on identical hardware, with comparable configuration,
you'd get comparable performance.
By "comparable configuration" I mean with the write-intent bitmap
enabled on MD.

Note that MD bitmap updates are not forced to stable storage, afaik, so
you must have a battery backed write cache (or all volatile buffers and
caches disabled).  The equivalent DRBD setting would be to disable all
but "drain" mode.
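
To see which write-ordering method a device actually uses, one can look
at the wo: field in /proc/drbd (the same flag as in the wo:b, wo:f,
wo:d shorthand used earlier); the oos: field on the same line counts
out-of-sync data. A minimal Python sketch, assuming the 8.x style
status line; the helper name is hypothetical, and the sample line in
the usage note is made up for illustration:

```python
import re
from typing import Optional, Tuple

# Letters as used in the wo:b / wo:f / wo:d shorthand
WO_NAMES = {"b": "barrier", "f": "flush", "d": "drain"}


def write_ordering_and_oos(status: str) -> Tuple[Optional[str], Optional[int]]:
    """Extract the write-ordering method and the oos counter.

    Returns (None, None) for fields that are not present. Unknown wo:
    letters are returned as-is instead of being guessed at.
    """
    wo = re.search(r"\bwo:(\w)", status)
    oos = re.search(r"\boos:(\d+)", status)
    method = WO_NAMES.get(wo.group(1), wo.group(1)) if wo else None
    out_of_sync = int(oos.group(1)) if oos else None
    return method, out_of_sync
```

For example, write_ordering_and_oos("ap:0 ep:1 wo:d oos:0") gives
("drain", 0).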

> - Nbd+raid is much easier to configure; you often do MD setup anyway,
> now you just add a parameter so that the nbd device is only written
> to, and add the remote device to your raid.

It is always easier to do something you are already used to.

Some goodies of MD not mentioned yet:
- MD can mirror to an arbitrary number of "disks".
  (but good luck in reassembling the bits and pieces "correctly"
   if this breaks in various ways)
  (drbd is still working on the N>2 node case, and making progress)
- recent MD keeps a "bad blocks" list, so for a (limited) number of
  blocks it can deal with IO errors without immediately kicking the
  array member.
  (not sure if that is a feature that should be implemented in a
  replicating solution such as DRBD at all...)

Anyways.

MD's intention is RAID.
DRBD's intention is replication.
Though those two are similar in many ways,
they are not the same thing.

MD is a powerful tool.
Used correctly it can do many fine things for you,
including replication.

As I said several times, if it works for you, use it.

-- 
: Lars Ellenberg
: LINBIT | Your Way to High Availability
: DRBD/HA support and consulting http://www.linbit.com