[DRBD-user] The Problem of File System Corruption w/DRBD

Thu Jun 3 20:11:16 CEST 2021

On 03/06/2021 13:50, Eric Robinson wrote:
>> -----Original Message-----
>> From: Digimer <lists at alteeve.ca>
>> Sent: Wednesday, June 2, 2021 7:23 PM
>> To: Eric Robinson <eric.robinson at psmnv.com>; drbd-user at lists.linbit.com
>> Subject: Re: [DRBD-user] The Problem of File System Corruption w/DRBD
>>
>> On 2021-06-02 5:17 p.m., Eric Robinson wrote:
>>> Since DRBD lives below the filesystem, if the filesystem gets
>>> corrupted, then DRBD faithfully replicates the corruption to the other
>>> node. Thus the filesystem is the SPOF in an otherwise shared-nothing
>>> architecture.
>>> What is the recommended way (if there is one) to avoid the filesystem
>>> SPOF problem when clusters are based on DRBD?
>>>
>>> -Eric
>>
>> To start, HA, like RAID, is not a replacement for backups. That is the answer
>> to a situation like this... HA (and other availability systems like RAID) protect
>> against component failure. If a node fails, the peer recovers automatically
>> and your services stay online. That's what DRBD and other HA solutions strive
>> to provide: uptime.
>>
>> If you want to protect against corruption (accidental or intentional, à la
>> cryptolockers), you need a robust backup system to _complement_ your HA
>> solution.
>>
> 
> Yes, thanks, I've said for many years that HA is not a replacement for disaster recovery. Still, it is better to avoid downtime than to recover from it, and one of the main ways to achieve that is through redundancy, preferably a shared-nothing approach. If I have a cool 5-node cluster and the whole thing goes down because the filesystem gets corrupted, I can restore from backup, but management is going to wonder why a 5-node cluster could not provide availability. So the question remains: how to eliminate the filesystem as the SPOF?
> 

Some of the things being discussed here have nothing to do with drbd. 
drbd provides a raw block-level device. It neither knows nor cares 
what layers you place above it, whether they are filesystems or some 
other block layer such as LVM or bcache.

It does a very specific job: ensuring that the blocks you write to a drbd 
device get replicated and stored in real time on one or more other 
distributed hosts. If you write a 512-byte block of random garbage 
to a drbd device, it will (and should) write the exact same garbage to 
the other distributed hosts too, so that if you read that same 512-byte 
block back from any one of those individual hosts, you'll get the exact 
same garbage back.
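
To make that concrete, here is a rough toy model in Python. It is only a 
sketch of the idea and has nothing to do with how drbd is actually 
implemented or configured: the class ToyReplicatedDevice, its methods, 
and the demo() helper are all hypothetical names invented for this 
illustration, and the "hosts" are just in-memory byte arrays.

```python
import os

BLOCK_SIZE = 512  # bytes, matching the block size used in the example above


class ToyReplicatedDevice:
    """Hypothetical in-memory stand-in for a replicated block device.

    This is NOT drbd's API; it only models the contract described above:
    every block written is stored byte-for-byte on every host.
    """

    def __init__(self, num_hosts, num_blocks):
        # One independent, zero-filled backing store per host.
        self.hosts = [bytearray(num_blocks * BLOCK_SIZE) for _ in range(num_hosts)]

    def write_block(self, index, data):
        """Store one block on every host without inspecting or altering it."""
        if len(data) != BLOCK_SIZE:
            raise ValueError("data must be exactly one block")
        offset = index * BLOCK_SIZE
        for store in self.hosts:
            store[offset:offset + BLOCK_SIZE] = data

    def read_block(self, host, index):
        """Read one block back from a single, specific host."""
        offset = index * BLOCK_SIZE
        return bytes(self.hosts[host][offset:offset + BLOCK_SIZE])


def demo():
    """Write random garbage once, then read it back from each host."""
    dev = ToyReplicatedDevice(num_hosts=3, num_blocks=8)
    garbage = os.urandom(BLOCK_SIZE)
    dev.write_block(5, garbage)
    copies = [dev.read_block(host, 5) for host in range(3)]
    return garbage, copies


if __name__ == "__main__":
    garbage, copies = demo()
    print(all(copy == garbage for copy in copies))
```

Note that write_block() never looks at what the bytes mean. Whether they 
are a valid filesystem structure or random noise makes no difference at 
this layer, which is the whole point.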

The OP stated "if the filesystem gets corrupted, then DRBD faithfully 
replicates the corruption to the other node." Good! That's exactly what 
we want it to do. What we definitely do NOT want is for drbd to 
manipulate the block data given to it in any way whatsoever; we want it 
to replicate that data faithfully.