[DRBD-user] Cluster filesystem question

Florian Haas florian at hastexo.com
Wed Nov 30 09:31:32 CET 2011

Note: "permalinks" may not be as permanent as we would like,
direct links of old sources may well be a few messages off.


On Wed, Nov 30, 2011 at 2:08 AM, John Lauro <john.lauro at covenanteyes.com> wrote:
> Hmm, I just reread that article...  it sounds like it's just paranoid about split brain.
>
> Wouldn't protocol C cause the I/O (or at least write I/O) to fail if the
> link went down? Or at least have that as an option of how to configure
> drbd? From the manual for protocol C: "Synchronous replication protocol.
> Local write operations on the primary node are considered completed only
> after both the local and the remote disk write have been confirmed." So
> split brain shouldn't be possible if there is network loss between the
> two active nodes, or else protocol C is being violated. You should have
> to manually tell drbd that the other node is down.

For a very, very, very odd definition of high availability. Node dies,
and now we should have to intervene for the other node to take over?

Protocol C replication is synchronous, _unless disconnected_. Please
reread the documentation about disconnected mode:

http://www.drbd.org/users-guide-8.3/s-node-failure.html
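
To be clear, protocol C is nothing more than the replication protocol
selection in drbd.conf; it says nothing about what happens once the
peers lose sight of each other. A minimal 8.3-style sketch (resource
name, hostnames and devices are made up here):

  resource r0 {
    protocol C;             # writes complete only after both the local
                            # and the remote disk have confirmed them
    on alice {
      device    /dev/drbd0;
      disk      /dev/sdb1;
      address   10.0.0.1:7788;
      meta-disk internal;
    }
    on bob {
      device    /dev/drbd0;
      disk      /dev/sdb1;
      address   10.0.0.2:7788;
      meta-disk internal;
    }
  }

The moment the connection drops, DRBD falls back to disconnected mode
and the Primary keeps completing writes locally -- so "synchronous"
only holds for as long as the peers are actually connected.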

Note that in a shared-nothing environment, and that's what DRBD is,
any one node has no way to tell network failure from remote node
failure -- all it can tell is that the peer no longer responds.

Now, in dual-Primary mode disconnection _automatically_ and by
definition means DRBD split brain. You get data divergence, your data
sets are no longer synchronous, or in Lars' words, "it blows up in
your face". Unless you freeze, at the moment of disconnection, all I/O
on the device, and then evict one node from the cluster by a forced
power-down. This is what the "fencing resource-and-stonith" option is
intended for; combined with a fence-peer handler that _immediately_
kicks the peer out of the cluster, it prevents this split brain from
happening. Of course, it also amounts to a cluster
shoot-out every time you have the slightest hiccup on your replication
network.
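
For reference, the relevant drbd.conf bits look roughly like this; the
handler paths assume the Pacemaker integration scripts that ship with
DRBD, so adjust them to whatever your distribution actually installs:

  resource r0 {
    disk {
      fencing resource-and-stonith;   # freeze I/O on disconnect until
                                      # the peer has been fenced
    }
    handlers {
      # invoked when the peer becomes unreachable; must ensure the peer
      # is fenced before I/O on the device is resumed
      fence-peer "/usr/lib/drbd/crm-fence-peer.sh";
      # undoes the fencing once the peer has fully resynced
      after-resync-target "/usr/lib/drbd/crm-unfence-peer.sh";
    }
  }

Note that those handlers can only do their job because there is a
cluster manager to talk to -- which is exactly the problem with the
"skip the cluster manager" approach discussed below.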

And that only applies to the DRBD side of things, but as Lars said,
you'd also need a cluster-aware iSCSI target to make the "multipath
iSCSI target on dual-Primary DRBD" scenario work.

Insanely, some people do this dual-Primary-with-iSCSI-target dance to
_avoid_ what they think is the complexity of a cluster manager, and
forgo the cluster configuration altogether. You can't do that: if you
did, nothing would be there to coordinate your fencing.

Cheers,
Florian

-- 
Need help with DRBD?
http://www.hastexo.com/knowledge/drbd


