[DRBD-user] Concurrent writes

Gennadiy Nerubayev parakie at gmail.com
Sat Apr 18 00:52:00 CEST 2009

Note: "permalinks" may not be as permanent as we would like,
direct links of old sources may well be a few messages off.


On Thu, Apr 16, 2009 at 3:31 AM, Lars Ellenberg
<lars.ellenberg at linbit.com> wrote:

> On Wed, Apr 15, 2009 at 03:06:39PM -0400, Gennadiy Nerubayev wrote:
> > I've been seeing "Concurrent local write" messages under certain
> > workloads and environments - in particular this has been observed with
> > a SCST target running on top of DRBD, and benchmarks running on both
> > bare Windows initiators and Windows virtuals through the ESX initiator.
> > What does this warning actually entail, and why would it happen?
>
> if there is a write request A in flight (submitted, but not yet completed)
> to offset a with size x, and while A is still in flight
> another write request B is submitted to offset b with size y,
> and these two requests overlap,
>
> that is a "concurrent local write".
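
To make the overlap test concrete, here is a minimal Python sketch (purely
illustrative, not DRBD's actual code): two byte ranges overlap exactly when
each one starts before the other one ends, and a layer can detect the
condition by checking every new submission against the writes still in flight.

    # Illustration only -- not how DRBD implements the check.
    def writes_overlap(offset_a, size_a, offset_b, size_b):
        """True if [offset_a, offset_a+size_a) intersects [offset_b, offset_b+size_b)."""
        return offset_a < offset_b + size_b and offset_b < offset_a + size_a

    in_flight = set()  # (offset, size) of submitted but not yet completed writes

    def submit_write(offset, size):
        for (o, s) in in_flight:
            if writes_overlap(offset, size, o, s):
                print("concurrent local write: [%d,%d) overlaps [%d,%d)"
                      % (offset, offset + size, o, o + s))
        in_flight.add((offset, size))

    def complete_write(offset, size):
        in_flight.discard((offset, size))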
>
> layers below DRBD may reorder writes.
>
> which means these workloads violate write ordering constraints.
>
> problem:
> as DRBD replicates the requests, these writes might get reordered on the
> other node as well.  so they may end up on the lower level device in a
> different order.
>
> as they do overlap, the resulting data on both replicas
> may end up being different.
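
To see how divergence can arise, here is a small Python sketch (again purely
illustrative): the same two overlapping writes are applied to two replicas,
but in a different order on each node, and the overlapping bytes end up
different.

    # Illustration only: two overlapping writes, reordered on one replica.
    def apply_write(disk, offset, data):
        disk[offset:offset + len(data)] = data

    node1 = bytearray(8)
    node2 = bytearray(8)

    write_a = (2, b"AAAA")   # covers offsets 2..5
    write_b = (4, b"BBBB")   # covers offsets 4..7, overlaps A at 4..5

    apply_write(node1, *write_a); apply_write(node1, *write_b)  # node 1: A then B
    apply_write(node2, *write_b); apply_write(node2, *write_a)  # node 2: B then A

    print(node1)  # bytearray(b'\x00\x00AABBBB')
    print(node2)  # bytearray(b'\x00\x00AAAABB')  -> replicas differ at offsets 4..5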
>
> submitting a new write request overlapping with an in flight write
> request is bad practice on any IO subsystem, as it may violate write
> ordering, and the result in general is undefined.
>
> with DRBD in particular, it may even cause data divergence of the
> replicas, _if_ the layers below DRBD on the two nodes happen to reorder
> these two requests differently. The likelihood of that is difficult to guess.
>
> in short: DRBD detects that the layer using it is broken.
>
> most likely it is simply the Windows IO stack that is broken,
> as the initiators and targets involved simply forward the requests
> issued by the Windows file system and block device layer.
>
> DRBD cannot help you with that. It is simply the only IO stack paranoid
> enough to actually _check_ for that condition and report it,
> because DRBD promises to create _exact_, bitwise identical replicas,
> and this condition in general may cause data divergence.


Thanks for the detailed information. I've been able to confirm that it only
happens when blockio mode is used, with both SCST and IET, with either ESX
(Windows or Linux virtuals) or Windows as the initiator. I've also observed
that it apparently only happens during random IO workloads, and not
sequential ones. What I'm still trying to understand is the following:

1. Whether the write concurrency (obviously without the warnings) would still
happen to the bare disk if the DRBD layer were removed, which seems unlikely
given how many usage cases involve blockio targets, both with and without
DRBD.
2. What's the worst-case scenario (lost write? corrupt data? unknown
consistency?) that can result from concurrent writes?
3. How would one be able to verify that #2 happened? (One possible check is
sketched after this list.)
4. I'd think others would report concurrency warnings as well, given the
relatively common usage scenario (and Google does show a few hits), but
nobody has actually reported a resulting problem yet.
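
Regarding question 3: if your DRBD version includes online verification
(a configured verify-alg plus "drbdadm verify"), that is the intended way to
compare the replicas. Purely as an illustration of the idea, and with a
hypothetical backing-device path, a block-by-block checksum comparison could
look like the sketch below; run it on each node against the local backing
device while the resource is quiescent, then diff the outputs.

    # Illustration only; /dev/backing_device is a placeholder path.
    import hashlib

    def block_digests(path, block_size=4096):
        with open(path, "rb") as dev:
            index = 0
            while True:
                block = dev.read(block_size)
                if not block:
                    break
                yield index, hashlib.md5(block).hexdigest()
                index += 1

    for index, digest in block_digests("/dev/backing_device"):
        print(index, digest)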

Thanks,

-Gennadiy