Fwd: Re: [DRBD-user] lvm2, striping, and drbd

Mon Aug 27 22:30:19 CEST 2007

On Mon, Aug 27, 2007 at 07:58:47AM -0700, kbyrd-drbd at memcpy.com wrote:
> On Mon, 27 Aug 2007 07:37:42 +0200, Lars Ellenberg
> <lars.ellenberg at linbit.com> wrote:
> > On Sun, Aug 26, 2007 at 06:00:16PM -0700, kbyrd-drbd wrote:
> >>
> >> I haven't used drbd yet, I'm getting ready to test and deploy it.
> >> I've searched the archives and seen various posts about md and
> >> lvm2, and I know some of this has been covered. But I'm confused
> >> about the current state of things with 0.8.
> >>
> >> I'd like something that feels like clustered RAID1+0 (that's
> >> striping on top of drbd). My ideal plan: lvm2 striping on top of
> >> four active/active drbd pairs. I'd run GFS on top of this.
> >> Do I need cLVM instead of LVM2 for this? Does LVM striping even
> >> work with drbd?
> >
> > don't.
> >
> > DRBD does not (yet) support consistency bundling of several drbd devices.
> > so whenever you have a connection loss, your four devices will
> > disconnect in a slightly different order.
> > consistency of your stripe set cannot be guaranteed.
> >
> >
> 
> 
> Thanks for trudging through this again. Is this on "network urrp,
> followed by full box failure"? I saw your earlier post in the
> archives about disconnection at slightly different times and
> somehow I read into it more than two nodes (drbd+ or something).
> You mean some writes will complete on /dev/drbd3 but not on
> /dev/drbd1 (as an example)?

yes.

even on switchover, no failures involved,
drbd could not guarantee consistency over several devices,
currently you just cannot switch several drbds from/to
Primary/Secondary at exactly the same time.

it would only work if all components are cooperative.
it would work because you umounted the filesystem first,
and nothing will change the block device(s) afterwards.

at this point, I recommend keeping the logical things
you put on one drbd self-contained.
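
[Editor's sketch] The disconnect-order problem described above can be
illustrated with a toy model. This is not DRBD code: the function names,
the payload labels, and the per-device disconnect counts are all made up
for illustration. It only shows that when stripe members stop replicating
at different points, the peer can end up holding writes that reached some
members but not others.

```python
# Toy model, not DRBD code: a stripe set over four replicated devices.
# All names and numbers here are hypothetical.

def replicate(writes, disconnect_at):
    """Return what the peer holds for each device after connection loss.

    writes: list of logical writes; each write is a list of
            (device_index, payload) chunks, one per stripe member.
    disconnect_at: per-device number of chunks replicated before that
            device lost its connection.
    """
    peer = [[] for _ in disconnect_at]
    sent = [0] * len(disconnect_at)
    for write in writes:
        for dev, payload in write:
            if sent[dev] < disconnect_at[dev]:
                peer[dev].append(payload)
                sent[dev] += 1
    return peer


def torn_writes(writes, peer):
    """Indices of logical writes that reached some, but not all, members."""
    torn = []
    for i, write in enumerate(writes):
        arrived = [payload in peer[dev] for dev, payload in write]
        if any(arrived) and not all(arrived):
            torn.append(i)
    return torn


# Three logical writes, each striped over four devices.
writes = [[(dev, f"w{i}c{dev}") for dev in range(4)] for i in range(3)]

# Devices disconnect at slightly different points.
peer = replicate(writes, disconnect_at=[3, 2, 3, 1])
print(torn_writes(writes, peer))
```

If every device stopped at the same point (for example
disconnect_at=[2, 2, 2, 2]), no write would be torn. That corresponds to
the coordinated behaviour the reply says drbd cannot currently provide
across several devices.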

> Does protocol C help this at all?

no.

not without also implementing io-freeze,
and coordinating the different devices.

> Based on my own instinct and readings in this archive, RAID1+0
> (striping on top of drbd) just seems more "right", because it could
> withstand more single drive failures (as long as no single drbd pair
> went out) and drbd re-syncing from failure would be faster. That is,
> when I replace a down drive, drbd only has to sync from the pair vs.
> re-syncing the entire stripe set from the other side. I thought I
> read that "md" would have problems but that LVM2 or cLVM would be
> ok. It appears I just wanted it to be so, but it is not.

try it.  it may even work well.
I see some race conditions where things could go wrong.
but I am slightly paranoid in that regard :)

> > I also think you'd get better performance out of drbd on top of raid.
> >
> 
> I suppose that's because we're sending fewer "writes" across the wire?
> That is, drbd is sending one 16K write instead of four 4K stripe chunks
> with all their acks?

drbd is currently fine with up to 32k bios.
we'll probably increase that sooner or later.
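
[Editor's sketch] The request counts in the question can be checked with
a rough calculation. It assumes the 4K stripe chunk size from the
question and the 32k bio limit mentioned in the reply, and it models
splitting as cutting at aligned size boundaries, which is a
simplification. split_request is a hypothetical helper, not part of drbd,
md, or LVM.

```python
KIB = 1024

def split_request(offset, length, max_size):
    """Split the byte range [offset, offset + length) at multiples of max_size.

    Returns a list of (offset, length) pieces. Hypothetical helper that
    models both stripe chunking and a maximum request size.
    """
    pieces = []
    end = offset + length
    while offset < end:
        boundary = (offset // max_size + 1) * max_size
        step = min(boundary, end) - offset
        pieces.append((offset, step))
        offset += step
    return pieces


# Striping above drbd: a 16K write is cut into 4K chunks first,
# so each chunk is replicated as a separate request.
striped = split_request(0, 16 * KIB, 4 * KIB)

# drbd above the stripe set: the same write fits within one 32k request.
single = split_request(0, 16 * KIB, 32 * KIB)

print(len(striped), len(single))
```

Under these assumptions the striped layout sends four requests where the
other layout sends one; real request sizes depend on the filesystem and
the block layer.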

> So RAID0+1 then. Any difference using LVM striping vs md? I suppose
> there's no problem with running md on each node because as far as
> each md knows, there's just data coming from drbd. It's ignorant of
> the other side? Am I right in saying that in this situation if a
> drive goes down, the whole local stripe set goes down and I'm doing
> all I/O across the network? Then when I replace one drive, doesn't
> the whole stripe set (all 4 drives worth of data) have to be synced
> across the network? I guess I'd like to hear from people that run
> md or lvm underneath drbd. When I searched the archives, I filtered
> these out, looking specifically for the other case.

I'd always use a local raid level with redundancy.

but yes, if your raid passes io errors up to drbd, drbd normally would
detach from that "disk", and when you later attach it again, it would
assume you replaced the "whole" disk, and do a full resync.

with deep understanding of drbd meta data and resync process, as well as
the exact layout of the underlying raid, one could trick something.

thanks for the feature request :)
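
[Editor's sketch] The resync cost discussed above comes down to simple
arithmetic. The sketch assumes a full resync of whatever device drbd
sits on, as described in the reply, and a hypothetical drive size of
500 GB; the layout names and the function are invented for illustration.

```python
GB = 10**9  # decimal gigabyte; the drive size below is a made-up example

def resync_bytes(layout, drives, drive_size):
    """Bytes to resync after one drive is replaced.

    layout: "stripe_on_drbd" (one drbd per drive, striping above) or
            "drbd_on_stripe" (one drbd above a stripe of all drives).
    drives: number of drives under the drbd device(s) in question.
    """
    if layout == "stripe_on_drbd":
        # Only the drbd pair that lost its drive resyncs.
        return drive_size
    if layout == "drbd_on_stripe":
        # drbd treats the stripe set as one disk and resyncs all of it.
        return drives * drive_size
    raise ValueError(f"unknown layout: {layout}")


size = 500 * GB
print(resync_bytes("stripe_on_drbd", 4, size) // GB)  # one pair
print(resync_bytes("drbd_on_stripe", 4, size) // GB)  # whole 4-drive stripe
print(resync_bytes("drbd_on_stripe", 2, size) // GB)  # one of two 2-drive sets
```

The last line models splitting the data across two separate drbd sets of
two drives each, which halves the resync volume compared to a single
four-drive stripe.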

> In this RAID0+1 case, I don't see how LVM buys me anything
> underneath drbd. Wouldn't adding capacity later underneath drbd
> cause problems?

no, why?

> On the re-sync issue, I guess I could run two separate drbd sets,
> dividing my data across the two so that any resync only had 2 drives
> to deal with.

-- 
: Lars Ellenberg                            Tel +43-1-8178292-0  :
: LINBIT Information Technologies GmbH      Fax +43-1-8178292-82 :
: Vivenotgasse 48, A-1120 Vienna/Europe    http://www.linbit.com :