[DRBD-user] Clarification on barriers vs flushes

Lars Ellenberg lars.ellenberg at linbit.com
Tue Oct 3 10:18:28 CEST 2017


On Tue, Sep 26, 2017 at 11:01:33PM +0200, Gionatan Danti wrote:
> Hi list,
> I would like to have a clarification how barriers and flushes work to
> preserve write ordering.

That is a bit hard to clarify,
because it has changed a few times in the Linux kernel.

"Today", the Linux block layer "contract" on write-after-write
dependencies (write ordering, write order fidelity) is this:

If you want to have something ("A", transaction information maybe) 
written to block storage before you write something else ("B", commit 
block, maybe), you have to wait for the completion of A before you even 
submit B.
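To make the contract above concrete, here is a toy model (invented for illustration; the `ToyDisk` class and its method names are not DRBD or kernel code): a device is free to apply requests that are in flight at the same time in any order, so the only way to order A before B is to wait for A's completion before submitting B.

```python
import random

class ToyDisk:
    """Invented toy device: requests that are in flight at the same
    time may be applied to the media in any order."""
    def __init__(self):
        self.applied = []     # order in which writes actually hit the media
        self.in_flight = []   # submitted but not yet completed

    def submit(self, name):
        self.in_flight.append(name)

    def wait_for_completion(self):
        # The device drains its queue in an arbitrary order.
        random.shuffle(self.in_flight)
        self.applied.extend(self.in_flight)
        self.in_flight = []

# Correct write-after-write ordering: A completes before B is submitted.
disk = ToyDisk()
disk.submit("A")              # "A": transaction information
disk.wait_for_completion()    # MUST wait here ...
disk.submit("B")              # ... before submitting the commit block
disk.wait_for_completion()
assert disk.applied == ["A", "B"]   # order is guaranteed

# Submitting both without waiting gives no ordering guarantee:
racy = ToyDisk()
racy.submit("A")
racy.submit("B")              # both in flight at once
racy.wait_for_completion()    # may apply as A,B *or* B,A
```

The racy variant at the end is exactly the case the contract rules out: with both requests in flight, the device may commit B before A.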

If you not only care for it to be on "block storage"
(below the operating system) but for "stable storage"
(below any volatile caching layer that may lose data on power loss),
in addition to waiting for the completion of A, you then have to do a
"cache flush" and wait for its completion before you submit B,
AND add another cache flush after the completion of B.
The cache flush before B can be done implicitly by adding a "FLUSH
CACHE" flag on B; the cache flush after B can be omitted by
adding a "FUA" flag on B.

If these were in fact only two requests, A and B, you could also just
add FUA on both (and hope the drivers will sort it out for you).
But if "A" was several requests, which in themselves do not have 
ordering dependencies, but only need to be all on stable storage
before B hits, then you don't want to take the penalty of adding FUA
to all of them, so you submit them all, and flush once, later.
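The cache, flush, and FUA behavior can be sketched with another invented toy model (again, not DRBD or kernel code; the class and names are made up for illustration). It shows the "submit them all, flush once" pattern: no per-request FUA penalty on the A writes, one flush to make them all stable, then B with FUA standing in for the trailing flush.

```python
class ToyCachedDisk:
    """Invented toy device with a volatile write-back cache: completed
    writes sit in the cache and are lost on power failure, unless a
    cache flush moved them to the media, or the write carried FUA."""
    def __init__(self):
        self.stable = set()   # survives power loss
        self.cache = set()    # volatile

    def write(self, block, fua=False):
        # In this toy model the request also "completes" right here.
        (self.stable if fua else self.cache).add(block)

    def flush(self):
        self.stable |= self.cache
        self.cache.clear()

    def power_loss(self):
        self.cache.clear()

# Several independent "A" writes, one flush, then the dependent B:
d = ToyCachedDisk()
for blk in ("A1", "A2", "A3"):
    d.write(blk)              # no per-request FUA penalty
d.flush()                     # one flush makes all of them stable
d.write("B", fua=True)        # FUA stands in for the flush after B
d.power_loss()

# The invariant this buys: B is never stable without all of A.
assert "B" not in d.stable or {"A1", "A2", "A3"} <= d.stable
assert d.stable == {"A1", "A2", "A3", "B"}   # here nothing was lost
```

If instead each A write carried FUA, the result would be the same, but every single request would pay the write-through cost; flushing once after the batch is the cheaper equivalent.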

That is from the high-level view of the Linux block device API.
Drivers may choose to convert a FLUSH/FUA into a FLUSH; SUBMIT; FLUSH;
or other things, as long as the simple semantics described above still hold.

"Before", Linux tried to make use of NCQ or TCQ, and then used the
concept of "barriers" (the BIO_RW_BARRIER flag), where, in theory, you
could keep submitting, and just add these flags to indicate your
ordering requirements to lower layers.

Apparently that turned out not to be beneficial enough, especially since
all users had to anticipate failures, carry around fallback code, and
usually behave as if the fallback code was active anyway, just in case.

DRBD still tries to be compatible with 2.6.5 (ok, RHEL 5, which is not
quite the same). Also, once introduced, configuration keywords are hard
to remove, which is why the "no barrier" option is still there,
but is a no-op with today's kernels.

> From my understanding, two main methods exist to preserve write order and
> to force data to stable storage:
> 1) write barriers, which are expanded into buffer-draining cache flush ->
> write -> cache flush operations;
> 2) FUAs, which can *possibly* (depending on the underlying disk *and* kernel
> version/bootparams) be expanded into non-draining cache flush -> write w/FUA
> bit operations.
> However, from drbd.conf man page I read the following:
> barrier:
> The first requires that the driver of the backing storage device support
> barriers (called 'tagged command queuing' in SCSI and 'native command
> queuing' in SATA speak). The use of this method can be enabled by setting
> the disk-barrier options to yes.
> flush:
> The second requires that the backing device support disk flushes (called
> 'force unit access' in the drive vendors speak). The use of this method can
> be disabled setting disk-flushes to no.
> This seems confusing: write barriers do not depend on TCQ/NCQ/FUAs, rather
> on the CACHE_FLUSH ATA/SCSI command. On the other hand, "flush" seems to
> really refer to FUAs which, in turn, depend on NCQ (which are only mentioned
> in the "barrier" paragraph).
> So, is the man page outdated, or am I missing something? Which is the safest
> method? Moreover, as many kernels disable FUAs on SATA disks (transforming
> FUAs into drain + cache flushes), do the two settings really make any
> difference?

As I said, "barriers", in the sense that the Linux kernel's high-level
block device API used the term "back then" (BIO_RW_BARRIER), no
longer exist in today's Linux kernels. That, however, does not mean
we can drop the config keyword, nor that we can drop the functionality
there yet, until all "old" kernels are really dead.

: Lars Ellenberg
: LINBIT | Keeping the Digital World Running
: DRBD -- Heartbeat -- Corosync -- Pacemaker

DRBD® and LINBIT® are registered trademarks of LINBIT
please don't Cc me, but send to list -- I'm subscribed
