Note: "permalinks" may not be as permanent as we would like;
direct links to old sources may well be a few messages off.
On Mon, Apr 20, 2009 at 12:13:58PM +0200, Lars Ellenberg wrote:
> > > in short: DRBD detects that the layer using it is broken.
> > >
> > > most likely it is simply the Windows IO stack that is broken,
> > > as the initiators and targets involved simply forward the requests
> > > issued by the Windows file system and block device layer.
> > >
> > > DRBD cannot help you with that. It simply is the only IO stack paranoid
> > > enough to actually _check_ for that condition, and report it,
> > > because DRBD promises to create _exact_, bitwise identical, replicas,
> > > and this condition in general may cause data divergence.
> >
> > Thanks for the detailed information. I've been able to confirm that it
> > only happens when blockio mode is used,

when using fileio, these writes pass through the Linux page cache first,
which will "filter out" such "double writes" to the same area.

> > with both SCST and IET, on either ESX
> > (Windows or Linux virtuals) or Windows as initiator. I've also observed
> > that it apparently only happens during random IO workloads, and not
> > sequential. What I'm still trying to understand is the following:
> >
> > 1. Whether the write concurrency (obviously without warnings) would still
> > happen to the bare disk if you remove the DRBD layer, which seems unlikely
> > due to so many usage cases involving blockio targets, both with and
> > without DRBD

I'm pretty much certain that these do happen without DRBD,
and go unnoticed.

> > 2. What's the worst case scenario (lost write? corrupt data? unknown
> > consistency?) that can result from concurrent writes?

DRBD currently _drops_ the later write.
If a write is detected as a concurrent local write,
it is never submitted nor sent, just "completed",
pretending that it had been successfully written.

We considered _failing_ such writes with EIO,
but decided to rather complain loudly, but pretend success.

btw.
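As a rough user-space sketch of the behavior described above (this is not
DRBD's actual kernel code; all names here are illustrative), detecting and
dropping a "concurrent local write" amounts to an interval-overlap check of
each incoming write against writes still in flight:

```python
# Illustrative sketch only: a later write that overlaps a still-pending
# write is never submitted -- it is "completed" immediately, pretending
# success, as described above.

def overlaps(a_start, a_len, b_start, b_len):
    """True if two sector ranges [start, start+len) intersect."""
    return a_start < b_start + b_len and b_start < a_start + a_len

class Device:
    def __init__(self):
        self.pending = []   # (sector, nr_sectors) of writes in flight
        self.dropped = []   # later writes silently discarded

    def submit_write(self, sector, nr_sectors):
        for p_sector, p_len in self.pending:
            if overlaps(sector, nr_sectors, p_sector, p_len):
                # concurrent local write: complain loudly, pretend success
                self.dropped.append((sector, nr_sectors))
                return "completed-without-write"
        self.pending.append((sector, nr_sectors))
        return "submitted"

dev = Device()
assert dev.submit_write(0, 8) == "submitted"
assert dev.submit_write(4, 8) == "completed-without-write"  # overlaps pending
assert dev.submit_write(8, 8) == "submitted"  # adjacent, not overlapping
```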
please post a few of the original log lines; they should read something like

  <comm>[pid] Concurrent local write detected! [DISCARD L]
  new: <sector offset>s +<size in bytes>; pending: <sector offset>s +<size in bytes>

I'm curious as to what the actual overlap is,
and if there is any correlation between the offsets.

in short: if DRBD detects a "concurrent local write",
that write is lost.

> > 3. How would one be able to verify that #2 happened?

low level compare with known good data.

> > 4. I'd think others would report concurrency warnings as well due to the
> > relatively common usage scenario (and google does show a few hits), but
> > people have yet to actually report an actual problem..

-- 
: Lars Ellenberg
: LINBIT | Your Way to High Availability
: DRBD/HA support and consulting http://www.linbit.com

DRBD® and LINBIT® are registered trademarks of LINBIT, Austria.

__
please don't Cc me, but send to list -- I'm subscribed
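To look at the overlap correlation the log lines are being asked for above,
something like the following would do. This is a hedged sketch: the sample
line and process name are made up for illustration, and it assumes the
quoted message format with 512-byte sectors:

```python
import re

# Parse "new: <sector>s +<bytes>; pending: <sector>s +<bytes>" from a
# "Concurrent local write detected!" log line and compute how many bytes
# the two writes actually overlap (sector offsets * 512 bytes/sector).
LINE_RE = re.compile(r"new: (\d+)s \+(\d+); pending: (\d+)s \+(\d+)")

SECTOR = 512

def overlap_bytes(line):
    m = LINE_RE.search(line)
    if not m:
        return None
    new_s, new_b, pend_s, pend_b = map(int, m.groups())
    start = max(new_s * SECTOR, pend_s * SECTOR)
    end = min(new_s * SECTOR + new_b, pend_s * SECTOR + pend_b)
    return max(0, end - start)

# Fabricated example line, for illustration only:
sample = ("iscsi_trx[1234] Concurrent local write detected! [DISCARD L] "
          "new: 1000s +4096; pending: 1004s +4096")
# sectors 1000..1008 vs 1004..1012 -> 4 sectors = 2048 bytes overlap
assert overlap_bytes(sample) == 2048
```

Running this over all the warning lines and looking at the distribution of
overlap sizes and offset deltas would show whether the double writes are
correlated or random.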