[DRBD-user] Down sync

Fri Jul 24 14:07:43 CEST 2020

On 7/24/20 12:22 PM, Juan Sevilla wrote:
> Hi,
>
> Thanks for your response. I need to use primary/primary so that the
> storage blocks can be used by a clustered filesystem.

The question is whether you actually need a clustered filesystem in the
first place. That is why I asked about the use-case and applications
running on those systems.

> In general this configuration runs fine, even under intensive use of
> the replicated local disks.

With a resource that is replicated between more than two nodes while two
nodes are in the Primary role, data could be corrupted, perhaps not
during normal replication, but possibly when a node has an outage or
when data is resynced later. This is not a supported configuration.

> My problem is that, occasionally, the nodes disconnect from each other.

That should normally lead to an immediate power-off of the node that was
lost. If it doesn't, then the cluster is misconfigured, and the result
is at least a split-brain situation, apart from the potential data
corruption due to what I wrote above.

E.g., let's assume node A is Primary, node B is Primary, node C is
Secondary.
Now A disconnects from B, but both are connected to C, and applications
still read and write data on A and B.
A is missing the data that's being written on B, and read requests on A
read old data after an update of that same data on node B.
The same is true the other way around. The result is that you have
two different, unrelated data sets on A and B, and the state of any
applications that rely on the cluster filesystem may be corrupted.
But then, node C is even more interesting, because it receives updates
from both node A and node B, which have diverged. So the data on node C
could be a completely corrupted mix of unrelated updates from A and B,
which may even corrupt the filesystem's data structures, thereby making
the filesystem unreadable.
Upon reconnect, what is supposed to happen?
Node A and node B are split-brained and cannot resync automatically.
Even if you resolve that manually and sync those two, node C's data
cannot be recovered, so you would have to full-sync it for the data on
it to make any sense again.
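
The A/B/C scenario above can be sketched as a toy model in Python. This
is only an illustration under simplifying assumptions, not how DRBD works
internally: each node's disk is a plain dict of block number to data, and
the write() helper, the links set and the block contents are all made up
for the example.

```python
# Toy model: one dict per node, mapping block number -> data.
# Hypothetical helper names; this does not reflect DRBD's implementation.

def write(origin, block, data, disks, links):
    """Apply a write locally on `origin`, then replicate it over every live link."""
    disks[origin][block] = data
    for peer in disks:
        if peer != origin and frozenset((origin, peer)) in links:
            disks[peer][block] = data


disks = {"A": {}, "B": {}, "C": {}}

# A and B have lost their connection to each other; both still reach C.
links = {frozenset(("A", "C")), frozenset(("B", "C"))}

# Applications on both Primaries keep writing to the same blocks.
write("A", 0, "A-v1", disks, links)
write("B", 0, "B-v1", disks, links)
write("B", 1, "B-v1", disks, links)
write("A", 1, "A-v1", disks, links)

print("A:", disks["A"])  # contains only A's writes
print("B:", disks["B"])  # contains only B's writes
print("C:", disks["C"])  # per block, whichever write arrived last
```

In this model A and B each end up with only their own version of every
block, while C holds block 0 from B and block 1 from A, a combination
that never existed on either Primary.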

And that's just the tip of the iceberg with regard to the background
story on why dual-Primary multi-node clusters open Pandora's box in
many interesting ways...

>
> I don't know what you mean by multi-resource active/active. Is it an
> alternative to dual primary (primary/primary)?

Multiple resources/volumes. Some Primary on node A, others Primary on
node B. Normally grouped with applications that are independent.
E.g. two different database instances that can run on different nodes.
Instead of keeping those on the same filesystem, each DB instance gets a
separate mountpoint for its data, and each mountpoint is backed by a
separate DRBD resource.
Resource db1 is Primary with database instance 1 running on node A,
Secondary on nodes B and C.
Resource db2 is Primary with database instance 2 running on node B,
Secondary on nodes A and C.
That's a multi-resource active/active setup.
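
As a rough sketch, the db1/db2 layout can be written down as a role map
and checked in Python. The layout dict and the helper names are
hypothetical and only mirror the example above; they are not part of any
DRBD tool or API.

```python
# Hypothetical role map mirroring the db1/db2 example: resource -> node -> role.
layout = {
    "db1": {"A": "Primary", "B": "Secondary", "C": "Secondary"},
    "db2": {"A": "Secondary", "B": "Primary", "C": "Secondary"},
}


def primaries(roles):
    """Return the nodes that hold the Primary role for one resource."""
    return sorted(node for node, role in roles.items() if role == "Primary")


def resources_without_single_primary(layout):
    """Return the resources that do not have exactly one Primary."""
    return [name for name, roles in layout.items() if len(primaries(roles)) != 1]


def active_nodes(layout):
    """Return the nodes that are Primary for at least one resource."""
    return sorted({node for roles in layout.values() for node in primaries(roles)})


print(resources_without_single_primary(layout))  # empty when every resource is single-Primary
print(active_nodes(layout))                      # nodes that actively run a workload
```

Every resource has exactly one Primary, yet both A and B carry a
workload, which is what makes the setup active/active without any
resource being dual-Primary.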

br,
Robert