>> Yes, thanks, I've said for many years that HA is not a replacement for disaster recovery. Still, it is better to avoid downtime than to recover from it, and one of the main ways to achieve that is through redundancy, preferably a shared-nothing approach. If I have a cool 5-node cluster and the whole thing goes down because the filesystem gets corrupted, I can restore from backup, but management is going to wonder why a 5-node cluster could not provide availability. So the question remains: how to eliminate the filesystem as the SPOF?
> Then eliminate the shared filesystem and replicate data on application
> level.
> - MySQL has Galera
> - Dovecot has dsync
Even this approach just moves the SPOF up from the FS to the SQL engine.

The problem here is that you're still confusing redundancy with data
integrity. To avoid data corruption, you need a layer that understands
your data at a sufficient level to know what corruption looks like. Data
integrity is yet another topic, and still separate from HA.

DRBD and other HA tools don't analyze the data, nor should they
(imagine the security and privacy concerns that would open up). If the
HA layer is given data to replicate, its job is to faithfully and
accurately replicate that data, whether or not the data is good.
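
To make the distinction concrete, here is a rough Python sketch. Nothing
in it is DRBD's actual API: replicate(), seal() and is_intact() are
hypothetical names I made up for illustration. The replicator copies
whatever bytes it is handed, and only a layer that knows the record
format (here, an assumed SHA-256 digest prefix) can tell that a record
is bad.

```python
import hashlib

DIGEST_LEN = 32  # size of a SHA-256 digest in bytes


def replicate(block: bytes) -> bytes:
    """Hypothetical stand-in for a block-level replicator.

    It copies the bytes verbatim and never inspects them.
    """
    return bytes(block)


def seal(payload: bytes) -> bytes:
    """Application layer: prefix the payload with its SHA-256 digest."""
    return hashlib.sha256(payload).digest() + payload


def is_intact(record: bytes) -> bool:
    """Application layer: recompute the digest and compare it to the prefix."""
    digest, payload = record[:DIGEST_LEN], record[DIGEST_LEN:]
    return hashlib.sha256(payload).digest() == digest


good = seal(b"balance=100")
# Simulate corruption on the primary before replication: flip the last byte.
corrupted = good[:-1] + b"9"

replica_good = replicate(good)
replica_bad = replicate(corrupted)

# The replica is a faithful copy of the corrupted record.
print(replica_bad == corrupted)
# Only the application-level check can tell the two records apart.
print(is_intact(replica_good), is_intact(replica_bad))
```

The replicator did its job in both cases. Catching the bad record took a
check that understands the record layout, and that check lives above the
HA layer.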

I think the real solution is not technical; it's expectations
management. Your managers need to understand what each part of their
infrastructure does and does not do. This way, if the concerns around
data corruption are sufficient, they can invest in tools to protect the
data integrity at the logical layer.

HA protects against component failure. That's its job, and it does it
well, when well implemented.

