[DRBD-user] Cluster filesystem question

Thu Dec 1 20:31:47 CET 2011

On Thu, Dec 01, 2011 at 02:18:42PM -0500, Digimer wrote:
> On 12/01/2011 02:13 PM, Lars Ellenberg wrote:
> > On Thu, Dec 01, 2011 at 01:58:15PM -0500, Kushnir, Michael (NIH/NLM/LHC) [C] wrote:
> >> Hi Lars,
> >>
> >> I'm a bit confused by this discussion. Can you please clarify the difference?
> >>
> >> What I think you are saying is:
> >>
> >> OK:
> >> Dual-primary DRBD -> cluster aware something (OCFS, GFS, clvmd, etc...) -> exported via iSCSI on both nodes -> multipathed on the client
> > 
> > No.
> > 
> > OK:
> > Dual-primary DRBD (done right) -> cluster aware something (OCFS, GFS, clvmd, etc...)
> > 
> > NOT OK:
> > -> exported via iSCSI on both nodes -> multipathed on the client
> > 
> > NOT OK:
> > anything non-cluster-aware using it "concurrently" on both nodes.
> 
> What I've done in the past, and perhaps it isn't the wisest (Lars,
> Florian?), is to create a Dual-primary DRBD (with fencing!), then export
> it as-is to my nodes using a floating/virtual IP address managed by a
> simple cluster.
> 
> Then on the clients (all of whom are in the same cluster), I mount the
> iSCSI target and set it up as a clustered LVM PV/VG/LVs. If you need a
> normal FS, then format one or more of the LVs using a cluster-aware FS.
> 
> When the primary node (the one with the floating IP) fails, all the
> cluster has to do is move the IP down to the backup node and it's ready
> to go.

And that's where you made it "OK" again: you arbitrate which side you
talk to by having the IP available on one node only.
The targets are not used "concurrently".

> I suppose you could just as easily do Primary/Secondary and
> include the promotion of the backup to primary as part of the failover,
> too.

Yes, and that would be the recommended approach, obviously.

Depending on how you configure your iSCSI targets, the way you do (did?)
it could even run into cache inconsistencies: if the targets go through
the page cache/buffer cache, some layer has to be responsible
for cache coherence between the nodes, and this setup has none.
(In ietd speak: only blockio is allowed; similar for other targets.)
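
To make the cache problem concrete, here is a toy model in Python. It is
only a sketch: ReplicatedDisk, Target and stale_read are made-up names for
illustration, not the API of DRBD or of any iSCSI target, and the "cache"
is a plain dict standing in for the page cache on each node.

```python
# Toy model only: all names are hypothetical, nothing here is a real
# DRBD or iSCSI target interface.

class ReplicatedDisk:
    """Stands in for a dual-primary DRBD device: both nodes see the same blocks."""

    def __init__(self):
        self.blocks = {}

    def read(self, lba):
        return self.blocks.get(lba, b"old")

    def write(self, lba, data):
        self.blocks[lba] = data


class Target:
    """One target node. use_cache=True mimics I/O through a node-local cache."""

    def __init__(self, disk, use_cache):
        self.disk = disk
        self.use_cache = use_cache
        self.cache = {}  # private to this node, the peer never invalidates it

    def read(self, lba):
        if not self.use_cache:
            return self.disk.read(lba)
        if lba not in self.cache:
            self.cache[lba] = self.disk.read(lba)
        return self.cache[lba]

    def write(self, lba, data):
        self.disk.write(lba, data)
        if self.use_cache:
            self.cache[lba] = data


def stale_read(use_cache):
    """Return True if node B serves outdated data after a write via node A."""
    disk = ReplicatedDisk()
    node_a = Target(disk, use_cache)
    node_b = Target(disk, use_cache)
    node_b.read(7)            # node B now holds the old content of block 7
    node_a.write(7, b"new")   # the write arrives through node A
    return node_b.read(7) != b"new"


print("cached targets serve stale data:  ", stale_read(use_cache=True))
print("uncached targets serve stale data:", stale_read(use_cache=False))
```

In this model the cached variant returns the old block from node B, because
nothing tells node B that its cached copy is outdated. The uncached variant
always reads the replicated device directly, which is the behaviour the
blockio remark above is asking for.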

> In my case, knowing I had fencing in place already, I went for the
> "simpler" cluster config of managing an IP only.
> 
> Caveat - I did not read the thread before now. If this is totally out to
> left field, my apologies. :)

The original question was somehow cluster file system related.
Someone suggested that dual-primary DRBD + two independent iSCSI
targets + multipath or MC/S on the initiator side might be an option.

What we are trying to explain here,
and apparently failing to explain well enough,
is that an initiator, regardless of multipath or MC/S,
assumes (and relies upon) talking to *ONE AND THE SAME* target
(via multiple paths). In this setup it in fact talks to two different,
independent targets that do not know about each other.

And that cannot work.
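
As an illustration of "do not know about each other", here is a toy model
in Python of one kind of state a target keeps in its own memory. I picked
a reservation as the example; that choice, and the names Target and
second_initiator_can_write, are assumptions made for this sketch. It is not
the SCSI protocol and not any target's real interface.

```python
# Toy model only: names are hypothetical, and this is not real SCSI.

class Target:
    """A target that tracks who holds a reservation, in its own memory only."""

    def __init__(self, name):
        self.name = name
        self.reserved_by = None

    def reserve(self, initiator):
        # Refuse if somebody else already holds the reservation here.
        if self.reserved_by not in (None, initiator):
            return False
        self.reserved_by = initiator
        return True

    def write_allowed(self, initiator):
        return self.reserved_by in (None, initiator)


def second_initiator_can_write(path_a, path_b):
    """host1 reserves via path_a; may host2 then write via path_b?"""
    if not path_a.reserve("host1"):
        raise RuntimeError("initial reservation failed")
    return path_b.write_allowed("host2")


# Two paths that lead to one and the same target: state is shared.
single = Target("single")
same_target = second_initiator_can_write(single, single)

# Two paths that lead to two independent targets: state is not shared.
two_targets = second_initiator_can_write(Target("A"), Target("B"))

print("one target, host2 may write: ", same_target)
print("two targets, host2 may write:", two_targets)
```

With one target behind both paths, the second host is refused. With two
independent targets, target B has never heard of the reservation made on
target A and lets the write through, even though both sit on the same
replicated data.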

> -- 
> Digimer
> E-Mail:              digimer at alteeve.com
> Freenode handle:     digimer
> Papers and Projects: http://alteeve.com
> Node Assassin:       http://nodeassassin.org
> "omg my singularity battery is dead again.
> stupid hawking radiation." - epitron

-- 
: Lars Ellenberg
: LINBIT | Your Way to High Availability
: DRBD/HA support and consulting http://www.linbit.com