[DRBD-user] 3-node active/active/active config?

Tue Nov 17 18:50:54 CET 2009

On Tue, Nov 17, 2009 at 10:10:13AM -0500, Jiann-Ming Su wrote:
> On Tue, Nov 17, 2009 at 4:42 AM, Lars Ellenberg
> <lars.ellenberg at linbit.com> wrote:
> > On Sun, Nov 15, 2009 at 01:08:30AM -0500, Jiann-Ming Su wrote:
> >> Don't know if this has been discussed before, but it seems like with
> >> 8.3's dual primary support, it's possible to set up a 3-node, all
> >> active drbd cluster.
> >>
> >> Each node would be dual primary with each other node.  Node A would
> >> form drbd0 with node B and drbd1 with node C.  Node B would have drbd0
> >> with A and drbd2 with C.  And, C would have drbd1 with A and drbd2
> >> with B.
> >>
> >> Then you can use software raid to create md0 on each node with its
> >> respective drbd devices.
> >>
> >> Node A:  md0 (drbd0, drbd1)
> >> Node B:  md0 (drbd0, drbd2)
> >> Node C:  md0 (drbd1, drbd2)
> >>
> >> Then format md0 on each node with a cluster filesystem like ocfs2 with
> >> each node as a member of the cluster.
> >>
> >> Do the drbd experts here have any thoughts on how this idea would go
> >> horribly wrong?  Thanks for any input and feedback.
> >
> >
> >  A-MD(A-DRBD0(A-sdx), A-DRBD1(A-sdy))
> >  B-MD(B-DRBD0(B-sdx),                 B-DRBD2(B-sdy))
> >  C-MD(                C-DRBD1(C-sdx), C-DRBD2(C-sdy))
> >
> >
> > So you write block N on A-MD.
> > It will hit A-DRBD0, and A-DRBD1,
> >  which will hit local disks A-sdx, A-sdy,
> >  and via DRBD replication B-sdx, and C-sdx.
> >
> > Still with me?
> > Great.
> >
> > Now, you want to read that block N on B,
> > which decides in its read balancing path to fetch
> > block N from B-sdy.
> >
> > It will surely get _some_ data from there,
> > but certainly not the data you just wrote on A-MD.
> >
> > To sum it up in one word:
> > DON'T.
> >
> 
> Thanks for the insight!  So there's no easy way to verify a write has
> sync'd across all the nodes?  For the application I want to use this
> type of 3-node config on, I think I'm willing to sacrifice the
> performance for the data replication.

No. You did not understand.

It is not a question of performance.
Or whether a write reached all *nodes*.

In your setup, it is technically *impossible*
for a write to reach all lower level *disks*.

A sure way to data corruption.
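
To make that concrete, here is a minimal sketch in Python that models
the proposed layout as plain dictionaries and computes which backing
disks a write on one node's md0 can reach. It is a toy model only, not
DRBD or MD code: the names DRBD, MD, disks_reached and stale_disks are
made up for illustration, and the disk names simply follow the diagram
quoted above.

```python
# Toy model of the proposed 3-node layout (illustrative names only).
# Each DRBD device replicates between exactly two backing disks,
# one on each of its two peers.
DRBD = {
    "drbd0": {"A": "A-sdx", "B": "B-sdx"},
    "drbd1": {"A": "A-sdy", "C": "C-sdx"},
    "drbd2": {"B": "B-sdy", "C": "C-sdy"},
}

# The MD mirror on each node is built from that node's two DRBD devices.
MD = {
    "A": ["drbd0", "drbd1"],
    "B": ["drbd0", "drbd2"],
    "C": ["drbd1", "drbd2"],
}

ALL_DISKS = sorted(disk for legs in DRBD.values() for disk in legs.values())


def disks_reached(node):
    """Backing disks that receive a write issued on `node`'s md0."""
    reached = set()
    for dev in MD[node]:
        # MD writes to both local DRBD devices; each DRBD device
        # writes its local disk and replicates to its single peer.
        reached.update(DRBD[dev].values())
    return reached


def stale_disks(node):
    """Backing disks that never see a write issued on `node`'s md0."""
    return sorted(set(ALL_DISKS) - disks_reached(node))


for node in MD:
    print(node, "write misses:", stale_disks(node))
```

In this model a write on A never reaches B-sdy or C-sdy, and the same
holds by symmetry for writes on B and C: two of the six disks are
always left with old data, and the other nodes' MD may read from them.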

Got me this time?

 ;)

So by all means: use one iSCSI-on-DRBD cluster,
and have any number of ocfs2 clients access it via iSCSI.
Or double-check whether NFS can do the trick for you.

Again: your fancy, cool, whatever setup won't work.
DO NOT DO THIS.

-- 
: Lars Ellenberg
: LINBIT | Your Way to High Availability
: DRBD/HA support and consulting http://www.linbit.com

DRBD® and LINBIT® are registered trademarks of LINBIT, Austria.
__
please don't Cc me, but send to list   --   I'm subscribed