[DRBD-user] The Problem of File System Corruption w/DRBD

Thu Jun 3 21:35:30 CEST 2021

> -----Original Message-----
> From: drbd-user-bounces at lists.linbit.com <drbd-user-
> bounces at lists.linbit.com> On Behalf Of Digimer
> Sent: Thursday, June 3, 2021 11:43 AM
> To: Robert Sander <r.sander at heinlein-support.de>; drbd-
> user at lists.linbit.com
> Subject: Re: [DRBD-user] The Problem of File System Corruption w/DRBD
>
> On 2021-06-03 11:09 a.m., Robert Sander wrote:
> > Hi,
> >
> > Am 03.06.21 um 14:50 schrieb Eric Robinson:
> >
> >> Yes, thanks, I've said for many years that HA is not a replacement for
> >> disaster recovery. Still, it is better to avoid downtime than to recover from it,
> >> and one of the main ways to achieve that is through redundancy, preferably
> >> a shared-nothing approach. If I have a cool 5-node cluster and the whole
> >> thing goes down because the filesystem gets corrupted, I can restore from
> >> backup, but management is going to wonder why a 5-node cluster could not
> >> provide availability. So the question remains: how to eliminate the filesystem
> >> as the SPOF?
> >
> > Then eliminate the shared filesystem and replicate data on application
> > level.
> >
> > - MySQL has Galera
> > - Dovecot has dsync
> >
> > Regards
>
> Even this approach just moves the SPOF up from the FS to the SQL engine.
>
> The problem here is that you're still confusing redundancy with data
> integrity. To avoid data corruption, you need a layer that understands your
> data at a sufficient level to know what corruption looks like. Data integrity is
> yet another topic, and still separate from HA.
>

> DRBD, and other HA tools, don't analyze the data, and nor should they
> (imagine the security and privacy concerns that would open up). If the HA
> layer is given data to replicate, its job is to faithfully and accurately replicate
> the data.
>
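
To make the quoted point concrete, here is a minimal Python sketch of the distinction. It assumes nothing about DRBD's internals: `replicate` is a hypothetical stand-in for any block-level replicator that copies bytes verbatim, and `seal`/`is_intact` are a hypothetical application-level record format that prepends a SHA-256 digest. The idea is only that a faithful replicator reproduces corruption on every node, while a layer that understands the data can at least detect it.

```python
import hashlib


def replicate(block: bytes, replica_count: int) -> list[bytes]:
    """Hypothetical block-level replicator: copies the bytes verbatim."""
    return [bytes(block) for _ in range(replica_count)]


def seal(payload: bytes) -> bytes:
    """Hypothetical record format: 32-byte SHA-256 digest, then the payload."""
    return hashlib.sha256(payload).digest() + payload


def is_intact(record: bytes) -> bool:
    """Application-level check: does the payload still match its digest?"""
    digest, payload = record[:32], record[32:]
    return hashlib.sha256(payload).digest() == digest


good = seal(b"account=42;balance=100")

# Simulate corruption on the primary before replication: flip one payload bit.
damaged = bytearray(good)
damaged[-1] ^= 0x01
corrupt = bytes(damaged)

replicas = replicate(corrupt, 3)

# Every replica is a perfect copy of the corrupt record ...
all_identical = all(r == corrupt for r in replicas)
# ... so redundancy alone leaves no intact copy to fail over to.
any_intact = any(is_intact(r) for r in replicas)

print(all_identical, any_intact)
```

The replicator did its job perfectly here; only the layer that knows what a valid record looks like can tell that something is wrong.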

It seems like the two are sometimes intertwined. Is GFS2, for example, about integrity or redundancy? But I'm not really asking how to prevent filesystem corruption. I'm asking (perhaps stupidly) for the best/easiest way to make a filesystem redundant.

> I think the real solution is not technical, it's expectations management. Your
> managers need to understand what each part of their infrastructure does
> and does not do. This way, if the concerns around data corruption are
> sufficient, they can invest in tools to protect the data integrity at the logical
> layer.
>
> HA protects against component failure. That's its job, and it does it well,
> when well implemented.
>

The filesystem is not a hardware component, but it is a cluster resource. The other cluster resources are redundant, with that sole exception. I'm just looking for a way around that problem. If there isn't one, then there isn't.

> --
> Digimer
> Papers and Projects: https://alteeve.com/w/ "I am, somehow, less
> interested in the weight and convolutions of Einstein’s brain than in the near
> certainty that people of equal talent have lived and died in cotton fields and
> sweatshops." - Stephen Jay Gould