[DRBD-user] Concurrent writes

Lars Ellenberg lars.ellenberg at linbit.com
Thu Apr 16 09:31:34 CEST 2009


On Wed, Apr 15, 2009 at 03:06:39PM -0400, Gennadiy Nerubayev wrote:
> I've been seeing "Concurrent local write" messages under certain workloads
> and environments - in particular this has been observed with a SCST target
> running on top of DRBD, and benchmarks running on both bare windows
> initiators and windows virtuals through ESX initiator. What does this
> warning actually entail, and why would it happen?

if there is a write request A in flight (submitted, but not yet completed)
to offset a, with size x, and, before A has completed,
another write request B is submitted to offset b with size y,
and these requests overlap,

that is a "concurrent local write".
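
as an illustration of the overlap condition in the definition above, here is
a minimal sketch. it is not DRBD's actual code: the function name
writes_overlap is hypothetical, and it assumes offsets and sizes are in the
same unit (e.g. bytes), with sizes greater than zero.

```python
def writes_overlap(a, x, b, y):
    """Return True if write A (offset a, size x) and write B (offset b, size y)
    touch at least one common unit.

    Ranges are treated as half-open: A covers [a, a + x), B covers [b, b + y).
    Assumes x > 0 and y > 0.
    """
    return a < b + y and b < a + x


# B starts in the middle of A: overlapping
print(writes_overlap(0, 4096, 2048, 4096))
# B starts exactly where A ends: adjacent, not overlapping
print(writes_overlap(0, 4096, 4096, 4096))
```

if such a check is true while request A is still in flight, the pair would
count as a "concurrent local write" in the sense described here.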

layers below DRBD may reorder writes that are in flight at the same time.

which means workloads that submit overlapping writes concurrently
violate write ordering constraints.

problem:
as DRBD replicates the requests, these writes might get reordered on the
other node as well.  so they may end up on the lower level device in a
different order.

as they do overlap, the resulting data on both replicas
may end up being different.
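
to make the divergence concrete, here is a toy simulation. it is only a
sketch under simple assumptions: each "device" is modelled as a small byte
array, the helper apply_writes is hypothetical, and the two nodes differ
only in the order in which their lower layers apply the same two overlapping
requests.

```python
def apply_writes(size, writes):
    """Apply (offset, data) writes, in the given order, to a zero-filled image."""
    disk = bytearray(size)
    for offset, data in writes:
        disk[offset:offset + len(data)] = data
    return bytes(disk)


A = (0, b"AAAA")  # request A: offset 0, size 4
B = (2, b"BBBB")  # request B: offset 2, size 4 (overlaps A in bytes 2 and 3)

node1 = apply_writes(8, [A, B])  # lower layer keeps submission order
node2 = apply_writes(8, [B, A])  # lower layer reorders the two requests

print(node1)           # b'AABBBB\x00\x00'
print(node2)           # b'AAAABB\x00\x00'
print(node1 == node2)  # False
```

both nodes received exactly the same requests, yet bytes 2 and 3 differ,
because the write that was applied last wins in the overlapping region.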

submitting a new write request overlapping with an in flight write
request is bad practice on any IO subsystem, as it may violate write
ordering, and the result in general is undefined.

with DRBD in particular, it may even cause data divergence of the
replicas, _if_ the layers below DRBD on both nodes decide to reorder
these two requests. The likelihood of that is difficult to guess.

in short: DRBD detects that the layer using it is broken.

most likely it is simply the windows io stack that is broken,
as the initiators and targets involved simply forward the requests
issued by the windows file system and block device layer.

DRBD cannot help you with that. It simply is the only IO stack paranoid
enough to actually _check_ for that condition, and report it.
because DRBD promises to create _exact_, bitwise identical, replicas.
and this condition in general may cause data divergence.

-- 
: Lars Ellenberg                
: LINBIT HA-Solutions GmbH
: DRBD®/HA support and consulting    http://www.linbit.com

DRBD® and LINBIT® are registered trademarks of LINBIT, Austria.
__
please don't Cc me, but send to list   --   I'm subscribed


