[DRBD-user] The Problem of File System Corruption w/DRBD

Mon Jun 7 17:07:32 CEST 2021

> -----Original Message-----
> From: Gionatan Danti <g.danti at assyoma.it>
> Sent: Sunday, June 6, 2021 11:02 AM
> To: Eric Robinson <eric.robinson at psmnv.com>
> Cc: Robert Altnoeder <robert.altnoeder at linbit.com>; drbd-
> user at lists.linbit.com
> Subject: Re: [DRBD-user] The Problem of File System Corruption w/DRBD
>
> Il 2021-06-04 15:08 Eric Robinson ha scritto:
> > Those are all good points. Since the three legs of the information
> > security triad are confidentiality, integrity, and availability, this
> > is ultimately a security issue. We all know that information security
> > is not about eliminating all possible risks, as that is an
> > unattainable goal. It is about mitigating risks to acceptable levels.
> > So I guess it boils down to how each person evaluates the risks in
> > their own environment. Over my 38-year career, and especially the past
> > 15 years of using Linux HA, I've seen more filesystem-type issues than
> > the other possible issues you mentioned, so that one tends to feature
> > more prominently on my risk radar.
>
> For the very limited goal of protecting against filesystem corruption, you
> can use a snapshot/CoW layer such as thin LVM. Keep multiple rolling
> snapshots and you can recover from sudden filesystem corruption. However,
> this simply moves the SPOF down to the CoW layer (thin LVM, which is quite
> complex by itself and can be considered a stripped-down
> filesystem/allocator) or up to the application layer (where corruption is
> relatively common).
>
> That said, nowadays a mature filesystem such as EXT4 or XFS can be corrupted
> (barring obscure bugs) only by:
> - a double mount from different machines;
> - a direct write to the underlying raw disks;
> - a serious hardware issue.
>
> For what it is worth, I am now accustomed to ZFS's strong data integrity
> guarantees, but I fully realize that this does *not* protect from every
> corruption scenario by itself, not even on
> XFS-over-ZVOL-over-DRBD-over-ZFS setups. If anything, a more complex
> filesystem (and I/O setup) has a *greater* chance of exposing uncommon bugs.
>
> So: I strongly advise placing your filesystem over a snapshot layer, but do
> not expect this to shield you from every storage-related issue.
> Regards.
>

That would require a model where DRBD is sandwiched between two LVM layers. First, the DRBD backing device is an LVM logical volume. Then we create a second LVM logical volume on top of DRBD and create our filesystem on top of that. I've tried that approach before and had very poor success with cluster failover: the LVM resource agent did not work as expected, volumes went inactive when they should have been active, and so on. Maybe it's just too complex for my brain. 😊
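
To spell out why failover gets fiddly with that sandwich, here is a small Python sketch of the layer ordering as I understand it. It is only an illustration: the volume group, resource, and mount point names are hypothetical, and it does not talk to LVM, DRBD, or any cluster manager. The point is simply that each layer has to come up bottom-up and go down top-down.

```python
# Layers listed bottom-up. All names are hypothetical placeholders.
STACK = [
    ("backing LV", "vg_lower/drbd_backing"),   # LVM volume under DRBD
    ("DRBD resource", "r0"),                    # replicated block device
    ("upper VG", "vg_upper"),                   # LVM on top of the DRBD device
    ("filesystem", "/mnt/data"),                # mounted from the upper LV
]


def start_order(stack=STACK):
    """Layers in the order they must be brought up (bottom-up)."""
    return [layer for layer, _ in stack]


def stop_order(stack=STACK):
    """Layers in the order they must be torn down (top-down)."""
    return [layer for layer, _ in reversed(stack)]


if __name__ == "__main__":
    print("start:", " -> ".join(start_order()))
    print("stop: ", " -> ".join(stop_order()))
```

Every arrow in that chain is a place where the cluster manager needs a correct ordering constraint, which is roughly where my previous attempts fell apart.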

If rolling snapshots are an acceptable solution, why not just periodically snapshot the whole DRBD volume? Then, in the unlikely event that filesystem corruption occurs, fall back to the snapshot taken before the corruption happened. I assume that would require a full DRBD resync from primary to secondary, but that's probably easier than restoring from backup media.
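
For illustration, the retention side of such a rolling-snapshot scheme could look something like the Python sketch below. It rests on several assumptions: the DRBD backing device is a classic (non-thin) LVM logical volume, and the names vg0, drbd_backing, the snap- prefix, the 10G CoW size, and the keep count of 4 are all made up. It only builds lvcreate/lvremove command lines and does not execute anything; rolling back to a snapshot and the subsequent resync are not covered.

```python
from datetime import datetime

VG = "vg0"                # hypothetical volume group
ORIGIN = "drbd_backing"   # hypothetical LV used as the DRBD backing device
PREFIX = "snap-"          # hypothetical naming prefix for rolling snapshots
KEEP = 4                  # hypothetical number of snapshots to retain


def snapshot_name(now):
    """Timestamped name, so that lexical order equals chronological order."""
    return PREFIX + now.strftime("%Y%m%d-%H%M%S")


def create_command(now, cow_size="10G"):
    """Command line for a new classic LVM snapshot of the backing LV."""
    return ["lvcreate", "--snapshot", "--name", snapshot_name(now),
            "--size", cow_size, f"{VG}/{ORIGIN}"]


def prune_commands(existing, keep=KEEP):
    """Command lines that remove all but the newest `keep` snapshots."""
    snaps = sorted(n for n in existing if n.startswith(PREFIX))
    expired = snaps[:-keep] if keep > 0 else snaps
    return [["lvremove", "--force", f"{VG}/{name}"] for name in expired]


if __name__ == "__main__":
    now = datetime.now()
    print(" ".join(create_command(now)))
    for cmd in prune_commands([snapshot_name(now)]):
        print(" ".join(cmd))
```

Printing the commands rather than running them is deliberate: on a cluster node I would want to review what is about to be created or removed before handing it to cron.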