On Fri, Apr 17, 2009 at 06:52:00PM -0400, Gennadiy Nerubayev wrote:
> On Thu, Apr 16, 2009 at 3:31 AM, Lars Ellenberg
> <lars.ellenberg at linbit.com> wrote:
>
> > On Wed, Apr 15, 2009 at 03:06:39PM -0400, Gennadiy Nerubayev wrote:
> > > I've been seeing "Concurrent local write" messages under certain workloads
> > > and environments - in particular this has been observed with a SCST target
> > > running on top of DRBD, and benchmarks running on both bare windows
> > > initiators and windows virtuals through ESX initiator. What does this
> > > warning actually entail, and why would it happen?
> >
> > if there is a write request A in flight (submitted, but not yet completed)
> > to offset a, with size x, and while it is still not completed
> > another write request B is submitted to offset b with size y,
> > and these requests do overlap,
> >
> > that is a "concurrent local write".
> >
> > layers below DRBD may reorder writes.
> >
> > which means these workloads violate write ordering constraints.
> >
> > problem:
> > as DRBD replicates the requests, these writes might get reordered on the
> > other node as well. so they may end up on the lower level device in
> > a different order.
> >
> > as they do overlap, the resulting data on both replicas
> > may end up being different.
> >
> > submitting a new write request overlapping with an in flight write
> > request is bad practice on any IO subsystem, as it may violate write
> > ordering, and the result in general is undefined.
> >
> > with DRBD in particular, it may even cause data divergence of the
> > replicas, _if_ the layers below DRBD on both nodes decide to reorder
> > these two requests. How likely that is, is difficult to guess.
> >
> > in short: DRBD detects that the layer using it is broken.
> >
> > most likely it is simply the windows io stack that is broken,
> > as the initiators and targets involved simply forward the requests
> > issued by the windows file system and block device layer.
> >
> > DRBD cannot help you with that. It simply is the only IO stack paranoid
> > enough to actually _check_ for that condition, and report it,
> > because DRBD promises to create _exact_, bitwise identical, replicas,
> > and this condition in general may cause data divergence.
>
> Thanks for the detailed information. I've been able to confirm that it only
> happens when blockio mode is used,

when using fileio, these writes pass through the linux page cache first,
which will "filter out" such "double writes" to the same area.

> with both SCST and IET, on either ESX
> (windows or linux virtuals) or Windows as initiator. I've also observed that
> it only apparently happens during random io workloads, and not sequential.
> What I'm still trying to understand is the following:
>
> 1. Whether the write concurrency (obviously without warnings) would still
> happen to the bare disk if you remove the DRBD layer, which seems unlikely
> due to so many usage cases involving blockio targets, both with and without
> DRBD

I'm pretty much certain that these do happen without DRBD, and go unnoticed.

> 2. What's the worst case scenario (lost write? corrupt data? unknown
> consistency?) that can result from concurrent writes?

DRBD currently _drops_ the later write.
if a write is detected as a concurrent local write,
it is never submitted nor sent, and just "completed",
pretending that it had been successfully written.

we considered _failing_ such writes with EIO,
but decided to rather complain loudly, but pretend success.

in short:
if DRBD detects a "concurrent local write", that write is lost.

> 3. How would one be able to verify that #2 happened?

a low level compare against known good data.

> 4. I'd think others would report concurrency warnings as well due to the
> relatively common usage scenario (and google does show a few hits), but
> people have yet to actually report an actual problem..
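A minimal Python sketch of the "concurrent local write" condition, under stated assumptions: WriteRequest, overlaps and replay are hypothetical names, sizes are in bytes rather than sectors, and this is a toy model, not DRBD's code. It shows the overlap test between an in-flight write A and a newly submitted write B, and how applying the same two overlapping writes in different orders on two nodes leaves different bytes on each replica.

```python
from dataclasses import dataclass


# Toy model (hypothetical names, not DRBD code): offsets and sizes in bytes.
@dataclass(frozen=True)
class WriteRequest:
    offset: int  # start of the write on the device
    data: bytes  # payload; the size of the write is len(data)

    @property
    def end(self) -> int:
        return self.offset + len(self.data)


def overlaps(a: WriteRequest, b: WriteRequest) -> bool:
    """True if the half-open ranges [a.offset, a.end) and [b.offset, b.end) intersect."""
    return a.offset < b.end and b.offset < a.end


def replay(requests, size: int = 16) -> bytes:
    """Apply writes to a zeroed device in the given order (one node's lower layer)."""
    device = bytearray(size)
    for req in requests:
        device[req.offset:req.end] = req.data
    return bytes(device)


# A is still in flight when B is submitted, and the two ranges overlap.
A = WriteRequest(offset=0, data=b"A" * 8)
B = WriteRequest(offset=4, data=b"B" * 8)
assert overlaps(A, B)  # the "concurrent local write" condition

# If the lower layers on the two nodes complete them in different orders,
# the replicas differ in the overlapping range [4, 8).
primary = replay([A, B])
secondary = replay([B, A])
print(primary)    # b'AAAABBBBBBBB\x00\x00\x00\x00'
print(secondary)  # b'AAAAAAAABBBB\x00\x00\x00\x00'
```

The half-open interval test is the usual way to check range overlap; two writes that merely touch (one ending where the other starts) do not count as overlapping.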
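A similarly hedged sketch of the policy described in the answer to question 2: drop the later overlapping write, complain loudly, but report success. Tracker, submit and complete are made-up names for illustration only; DRBD's real bookkeeping, completion path and log text are different, and here a write's data lands immediately while its range stays "in flight" until complete() is called.

```python
import logging
from dataclasses import dataclass, field

log = logging.getLogger("concurrent-write-model")


@dataclass
class Tracker:
    """Hypothetical model: tracks in-flight write ranges on one device."""
    device: bytearray
    in_flight: list[tuple[int, int]] = field(default_factory=list)  # [start, end) ranges
    dropped: int = 0

    def submit(self, offset: int, data: bytes) -> bool:
        end = offset + len(data)
        for start, stop in self.in_flight:
            if offset < stop and start < end:
                # Complain loudly, never write or replicate, but report success.
                log.warning("concurrent local write: [%d, %d) overlaps in-flight [%d, %d)",
                            offset, end, start, stop)
                self.dropped += 1
                return True
        self.in_flight.append((offset, end))
        self.device[offset:end] = data
        return True

    def complete(self, offset: int, length: int) -> None:
        """Mark a previously submitted write as finished."""
        self.in_flight.remove((offset, offset + length))


dev = bytearray(16)
t = Tracker(dev)
t.submit(0, b"A" * 8)  # A in flight
t.submit(4, b"B" * 8)  # overlaps A: dropped, yet reported as success
print(bytes(dev))      # B's data never reached the device
```

After the second submit returns True, bytes [8, 12) are still zero: the caller believes B was written, but it was lost, which matches "if DRBD detects a concurrent local write, that write is lost".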
-- 
: Lars Ellenberg
: LINBIT | Your Way to High Availability
: DRBD/HA support and consulting http://www.linbit.com

DRBD® and LINBIT® are registered trademarks of LINBIT, Austria.
__
please don't Cc me, but send to list -- I'm subscribed