[DRBD-user] Real live risk of data loss w/o flush

Wed Sep 8 16:13:04 CEST 2010

On Mon, Aug 09, 2010 at 11:08:22AM +0200, Sebastian Hetze wrote:
> Hi *,
> 
> I would like to get more info about the real-life risk of data loss
> with DRBD using no-disk-barrier and no-disk-flushes on RAID
> controllers without BBU.
> 
> If I understand things correctly, DRBD adds barriers into the data
> stream from primary to secondary (at least) on each flush of the
> underlying primary device.

No, DRBD adds a barrier after any IO completion visible to upper layers.

> Without barrier support it flushes the
> secondary on each flush of the primary.  This is done to make sure that
> subsequent operations that rely on the data being committed to disk
> find the same state on the secondary in case of a failover.
> 
> If I use a RAID controller with BBU, it makes sure that all data that
> has reached the controller cache survives (some) crashes or power
> failures.
> 
> But what are the scenarios where I really suffer data loss without
> BBU?  And is my risk of data loss higher with DRBD than it would be
> without?
> 
> The primary use case for DRBD as I see it is failure of one node in
> the cluster that leads to a failover to the secondary. In this case we
> have one survivor and this survivor has plenty of time to flush all
> data from the cache buffer to its disk before the failover proceeds.
> And reads would give me the cached data meanwhile.
> The benefit I get from the BBU in this situation is this flush time.
> After that time, the data on disk is exactly the same, so there is no
> additional protection against data corruption that might arise from
> faulty data sent by the primary during the crash. As soon as this data
> is in secondary cache it will be written to disk sooner or later.
> 
> If this is correct so far,

I think so.

> the remaining risk is simultaneous
> (power-)failure of both nodes.

Not quite.
There is also the single node crash.

If the secondary crashes, then later comes back,
we need to resync everything that _may_ be different.

That includes anything that has changed since we lost the Secondary.
It also includes everything that has been in flight to the Secondary.
AND it includes everything that may have been lost due to volatile
caches.
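
As a rough illustration of the three categories above (a toy model, not
DRBD code; the function name and the block numbers are made up for this
example), the set of blocks to resync is simply their union:

```python
def blocks_to_resync(changed_since_disconnect, in_flight, maybe_lost_in_cache):
    """Toy model: everything that _may_ differ has to be resynced."""
    return (set(changed_since_disconnect)
            | set(in_flight)
            | set(maybe_lost_in_cache))


# Hypothetical block numbers, chosen only for illustration.
changed = {10, 11}      # written on the Primary after the Secondary was lost
in_flight = {11, 12}    # sent to the Secondary, completion never confirmed
maybe_lost = {7, 8}     # acknowledged, but possibly only in a volatile cache

resync = blocks_to_resync(changed, in_flight, maybe_lost)
print(sorted(resync))
```

The third set is the problematic one: without flushes, nothing outside
the disk or controller knows which blocks belong in it.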

Similarly for a Primary crash.  Data that may have been lost due to
volatile caches, but is no longer covered by the activity log, may fail
to be resynced, as DRBD has no other way to track it.
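
A deliberately simplified sketch of that gap, assuming the activity log
behaves like a small LRU set of recently written extents (the class, the
function and the sizes below are hypothetical and do not reflect the
real on-disk format or eviction policy):

```python
from collections import OrderedDict


class ToyActivityLog:
    """Tracks only the most recently written extents, LRU style."""

    def __init__(self, size):
        self.size = size
        self.extents = OrderedDict()

    def touch(self, extent):
        # Move the extent to the "most recent" end, evict the oldest if full.
        self.extents.pop(extent, None)
        self.extents[extent] = True
        while len(self.extents) > self.size:
            self.extents.popitem(last=False)

    def covered(self):
        return set(self.extents)


def missed_after_crash(writes, al_size):
    """Extents lost in a never-flushed cache that the log no longer covers."""
    al = ToyActivityLog(al_size)
    volatile = set()
    for extent in writes:
        al.touch(extent)
        volatile.add(extent)  # assumption: no flush, so nothing is stable yet
    return volatile - al.covered()


missed = missed_after_crash([1, 2, 3, 4, 5], al_size=2)
print(sorted(missed))
```

In this model, extents 1 to 3 have dropped out of the log while their
data was still sitting in the cache, so a resync based on the log alone
would skip them.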

> If this happens, there are several
> causes of trouble.  I suffer real service downtime although I have
> spent so much money for high availability. I might get asked why I did
> not spend the little extra money on independent UPS for both nodes.
> Data on the secondary might have been written out of order leading to
> an inconsistent state. On the primary, without BBU a queued flush
> might have succeeded or not, but the write order is correct.

With volatile caches involved, and without cache flushes at appropriate
times, you can forget about write ordering. It can no longer be
controlled by the OS.
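
To make that concrete, here is a small model that enumerates what can be
on disk after a power failure.  It assumes that within one unflushed
epoch any subset of the writes may have reached stable storage, and that
a flush makes everything before it durable.  The names "data" and
"commit" are placeholders for, say, a journal payload and its commit
record:

```python
from itertools import combinations


def subsets(items):
    """Yield every subset of items as a frozenset."""
    for r in range(len(items) + 1):
        for combo in combinations(items, r):
            yield frozenset(combo)


def possible_disk_states(epochs):
    """epochs: lists of writes, with a cache flush between consecutive lists."""
    states = set()
    durable = frozenset()
    for epoch in epochs:
        # Crash during this epoch: earlier epochs are durable,
        # and any subset of the current epoch may have hit the disk.
        for partial in subsets(epoch):
            states.add(durable | partial)
        durable = durable | frozenset(epoch)
    states.add(durable)
    return states


with_flush = possible_disk_states([["data"], ["commit"]])
without_flush = possible_disk_states([["data", "commit"]])

commit_only = frozenset({"commit"})
print(commit_only in with_flush, commit_only in without_flush)
```

Only the run without a flush can end up with the commit record on disk
but not the data it refers to, which is exactly the ordering violation
the upper layers cannot protect themselves against.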

> I will likely suffer data loss in this scenario, but there is no
> additional risk by using DRBD.
>
> On boot after (power-)recovery the
> primary needs a file system check to clean up possible damage but this is
> exactly the same risk as in the standalone case.  Even with BBU (on the
> primary) in this scenario I would rely on the primary data more than on
> the secondary. So the only case where I would really get extra
> reliability from barriers and in-order flushes on the secondary would be
> if only my secondary has a BBU and the primary does not.

Except that with volatile caches and no flushes, there is now the
potential for DRBD to "forget" to resync parts of the disk that would
need to be resynced, because the corresponding data has been lost in
volatile caches.  So you get potential for data divergence without DRBD
having any chance to know about it.

> What is your opinion and possibly your experience with using
> no-disk-barrier and no-disk-flushes without BBU RAID?  The reason for
> me asking is the huge latency I suffer using flushes in my setup
> where I run several virtual KVM instances in DRBD containers without
> BBU RAID. These virtual systems frequently flush disks and these
> operations occasionally queue up to a substantial epoch of 100 or even
> higher.

Try disabling all barriers and flushes,
but put the disks in write-through mode instead
(and hope that they don't forget about that setting).

-- 
: Lars Ellenberg
: LINBIT | Your Way to High Availability
: DRBD/HA support and consulting http://www.linbit.com

DRBD® and LINBIT® are registered trademarks of LINBIT, Austria.
__
please don't Cc me, but send to list   --   I'm subscribed