[DRBD-user] Concurrent writes

Lars Ellenberg lars.ellenberg at linbit.com
Mon Apr 20 12:13:58 CEST 2009

Note: "permalinks" may not be as permanent as we would like,
direct links of old sources may well be a few messages off.


On Fri, Apr 17, 2009 at 06:52:00PM -0400, Gennadiy Nerubayev wrote:
> On Thu, Apr 16, 2009 at 3:31 AM, Lars Ellenberg
> <lars.ellenberg at linbit.com> wrote:
> 
> > On Wed, Apr 15, 2009 at 03:06:39PM -0400, Gennadiy Nerubayev wrote:
> > > I've been seeing "Concurrent local write" messages under certain
> > > workloads and environments - in particular this has been observed
> > > with a SCST target running on top of DRBD, and benchmarks running on
> > > both bare windows initiators and windows virtuals through an ESX
> > > initiator. What does this warning actually entail, and why would it
> > > happen?
> >
> > if there is a write request A in-flight (submitted, but not yet
> > completed) to offset a, with size x, and while it is still not
> > completed, another write request B is submitted to offset b with
> > size y, and these requests overlap,
> >
> > that is a "concurrent local write".
> >
> > layers below DRBD are free to reorder in-flight writes, so a
> > workload submitting overlapping concurrent writes is relying on an
> > ordering guarantee that simply does not exist.
> >
> > problem:
> > as DRBD replicates the requests, these writes might get reordered on
> > the other node as well, so they may reach the lower level device
> > there in a different order.
> >
> > as they overlap, the resulting data on the two replicas
> > may end up being different.
> >
> > submitting a new write request overlapping with an in-flight write
> > request is bad practice on any IO subsystem, as it may violate write
> > ordering, and the result in general is undefined.
> >
> > with DRBD in particular, it may even cause data divergence of the
> > replicas, _if_ the layers below DRBD on both nodes decide to reorder
> > these two requests. The likelihood of that is difficult to guess.
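
a toy illustration of such divergence, plain userland C,
nothing DRBD specific:

    #include <stdio.h>
    #include <string.h>

    int main(void)
    {
        char node1[8], node2[8];
        const char A[8] = "AAAAAAAA"; /* write A: offset 0, size 8 */
        const char B[4] = "BBBB";     /* write B: offset 4, size 4, overlaps A */

        /* node 1 happens to apply A, then B */
        memcpy(node1, A, 8);
        memcpy(node1 + 4, B, 4);

        /* node 2 reorders: B, then A */
        memcpy(node2 + 4, B, 4);
        memcpy(node2, A, 8);

        /* prints node1: AAAABBBB / node2: AAAAAAAA */
        printf("node1: %.8s\nnode2: %.8s\n", node1, node2);
        return memcmp(node1, node2, 8) != 0;
    }

same two writes, different order, bitwise different replicas.
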
> >
> > in short: DRBD detects that the layer using it is broken.
> >
> > most likely it is simply the windows IO stack that is broken,
> > as the initiators and targets involved simply forward the requests
> > issued by the windows file system and block device layer.
> >
> > DRBD cannot help you with that. It simply is the only IO stack
> > paranoid enough to actually _check_ for that condition and report it,
> > because it promises to create _exact_, bitwise identical replicas,
> > and this condition in general may cause data divergence.
> 
> 
> Thanks for the detailed information. I've been able to confirm that it only
> happens when blockio mode is used,

when using fileio, these writes pass through the linux page cache first,
which will "filter out" such "double writes" to the same area.

> with both SCST and IET, on either ESX
> (windows or linux virtuals) or Windows as initiator. I've also observed
> that it apparently only happens during random IO workloads, and not
> sequential. What I'm still trying to understand is the following:
> 
> 1. Whether the write concurrency (obviously without warnings) would still
> happen to the bare disk if you removed the DRBD layer - which seems
> unlikely, given how many usage cases involve blockio targets, both with
> and without DRBD

I'm pretty much certain that these do happen without DRBD,
and simply go unnoticed.

> 2. What's the worst case scenario (lost write? corrupt data? unknown
> consistency?) that can result from concurrent writes?

DRBD currently _drops_ the later write.
if a write is detected as a concurrent local write,
it is neither submitted nor sent, just "completed",
pretending that it had been successfully written.

we considered _failing_ such writes with EIO,
but decided to rather complain loudly and pretend success.

in short: if DRBD detects a "concurrent local write",
that write is lost.
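
roughly, in code, a much simplified sketch (helper names made up,
not the actual DRBD source):

    #include <stdio.h>

    struct write_req { unsigned long long offset, size; };

    /* stand-in for DRBD's bookkeeping of in-flight writes */
    static int overlaps_in_flight_write(const struct write_req *req)
    {
        (void)req;
        return 1; /* pretend we found an overlapping in-flight write */
    }

    static void handle_write(struct write_req *req)
    {
        if (overlaps_in_flight_write(req)) {
            /* complain loudly, but pretend success: the request is
             * completed back to the submitter without ever being
             * written locally or sent to the peer. it is lost. */
            fprintf(stderr, "Concurrent local write detected!\n");
            return;
        }
        /* otherwise: submit locally and replicate to the peer */
    }

    int main(void)
    {
        struct write_req req = { 4096, 512 };
        handle_write(&req);
        return 0;
    }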

> 3. How would one be able to verify that #2 happened?

a low-level compare with known good data.
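
e.g. cmp(1) against a reference image, or a trivial chunk-wise
compare like this sketch:

    #include <stdio.h>

    /* usage: ./blkcmp /dev/drbdX reference.img
     * reports the first differing byte offset, if any */
    int main(int argc, char **argv)
    {
        if (argc != 3) {
            fprintf(stderr, "usage: %s <device> <reference>\n", argv[0]);
            return 2;
        }
        FILE *a = fopen(argv[1], "rb");
        FILE *b = fopen(argv[2], "rb");
        if (!a || !b) { perror("fopen"); return 2; }

        char ba[65536], bb[65536];
        unsigned long long off = 0;
        for (;;) {
            size_t na = fread(ba, 1, sizeof ba, a);
            size_t nb = fread(bb, 1, sizeof bb, b);
            size_t n = na < nb ? na : nb;
            if (n == 0) break;
            for (size_t i = 0; i < n; i++) {
                if (ba[i] != bb[i]) {
                    printf("first difference at offset %llu\n", off + i);
                    return 1;
                }
            }
            off += n;
        }
        puts("no difference found");
        return 0;
    }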

> 4. I'd think others would report concurrency warnings as well due to the
> relatively common usage scenario (and google does show a few hits), but
> people have yet to report an actual problem..


-- 
: Lars Ellenberg
: LINBIT | Your Way to High Availability
: DRBD/HA support and consulting http://www.linbit.com

DRBD® and LINBIT® are registered trademarks of LINBIT, Austria.
__
please don't Cc me, but send to list   --   I'm subscribed


