<div class="gmail_quote">On Mon, Apr 20, 2009 at 6:28 AM, Lars Ellenberg <span dir="ltr">&lt;<a href="mailto:lars.ellenberg@linbit.com">lars.ellenberg@linbit.com</a>&gt;</span> wrote:<br><blockquote class="gmail_quote" style="border-left: 1px solid rgb(204, 204, 204); margin: 0pt 0pt 0pt 0.8ex; padding-left: 1ex;">
<div><div></div><div class="h5">On Mon, Apr 20, 2009 at 12:13:58PM +0200, Lars Ellenberg wrote:<br>
><br>
> > 2. What's the worst-case scenario (lost write? corrupt data? unknown<br>
> > consistency?) that can result from concurrent writes?<br>
><br>
> DRBD currently _drops_ the later write.<br>
> If a write is detected as a concurrent local write,<br>
> it is neither submitted nor sent; it is just "completed",<br>
> as if it had been written successfully.<br>
><br>
> We considered _failing_ such writes with EIO,<br>
> but decided instead to complain loudly and pretend success.</div></div></blockquote><div><br>So, to clarify: normally an initiator should not issue a write request while an earlier request for the same (or an overlapping) block has not yet been completed successfully by DRBD, yet that is what is happening here? Is it at all possible that DRBD returns success earlier than it should (I'm using protocol C, of course)?<br>
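The discard policy Lars describes can be pictured with a toy model. This is only an illustrative sketch, not DRBD code: it assumes 512-byte sectors and a flat list of pending requests, and the names `WriteRequest`, `Device`, `overlaps`, `write` and `complete` are hypothetical. A new write that overlaps any still-pending local write is logged, never submitted or sent, and reported to the caller as successful.

```python
from dataclasses import dataclass, field

SECTOR_SIZE = 512  # assumption: offsets are counted in 512-byte sectors


@dataclass
class WriteRequest:
    sector: int  # start offset, in sectors
    size: int    # length, in bytes

    @property
    def end(self) -> int:
        """First sector past the end of the request (half-open range)."""
        return self.sector + (self.size + SECTOR_SIZE - 1) // SECTOR_SIZE


def overlaps(a: WriteRequest, b: WriteRequest) -> bool:
    # Two half-open ranges [start, end) overlap iff each starts before the other ends.
    return a.sector < b.end and b.sector < a.end


@dataclass
class Device:
    """Hypothetical toy model of the described policy, not DRBD's implementation."""
    pending: list = field(default_factory=list)    # submitted but not yet completed
    submitted: list = field(default_factory=list)  # every write passed on to disk and peer
    log: list = field(default_factory=list)        # "Concurrent local write" complaints

    def write(self, req: WriteRequest) -> str:
        for p in self.pending:
            if overlaps(req, p):
                # Complain loudly, never submit or send the new write,
                # and report success to the caller anyway.
                self.log.append(
                    f"[DISCARD L] new: {req.sector}s +{req.size}; "
                    f"pending: {p.sector}s +{p.size}"
                )
                return "success"
        self.pending.append(req)
        self.submitted.append(req)
        return "queued"

    def complete(self, req: WriteRequest) -> None:
        # Called when the write finishes (with protocol C: on both nodes).
        self.pending.remove(req)
```

With two identical 8k requests at sector 162328976, the second `write` call returns `"success"` without the request ever reaching `submitted`, which is the "pretend success" behaviour; once the first request completes, the same write is accepted normally.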
<br></div><blockquote class="gmail_quote" style="border-left: 1px solid rgb(204, 204, 204); margin: 0pt 0pt 0pt 0.8ex; padding-left: 1ex;">
Please post a few of the original log lines;<br>
they should read something like<br>
&lt;comm&gt;[pid] Concurrent local write detected!<br>
[DISCARD L] new: &lt;sector offset&gt;s +&lt;size in bytes&gt;;<br>
pending: &lt;sector offset&gt;s +&lt;size in bytes&gt;<br>
<br>
I'm curious what the actual overlap is,<br>
and whether there is any correlation between offsets.</blockquote><div><br>Here are some examples from 8k random writes:<br> <br></div></div>Apr 14 12:24:35 srpt1 kernel: drbd0: scsi_tgt1[17272] Concurrent local write detected! [DISCARD L] new: 162328976s +8192; pending: 162328976s +8192<br>
Apr 14 12:24:38 srpt1 kernel: drbd0: scsi_tgt0[17271] Concurrent local write detected! [DISCARD L] new: 161385248s +8192; pending: 161385248s +8192<br>Apr 14 12:24:39 srpt1 kernel: drbd0: scsi_tgt1[17272] Concurrent local write detected! [DISCARD L] new: 157655888s +8192; pending: 157655888s +8192<br>
Apr 14 12:24:39 srpt1 kernel: drbd0: scsi_tgt1[17272] Concurrent local write detected! [DISCARD L] new: 165753872s +8192; pending: 165753872s +8192<br>Apr 14 12:24:39 srpt1 kernel: drbd0: scsi_tgt0[17271] Concurrent local write detected! [DISCARD L] new: 166654816s +8192; pending: 166654816s +8192<br>
Apr 14 12:24:40 srpt1 kernel: drbd0: scsi_tgt1[17272] Concurrent local write detected! [DISCARD L] new: 158260592s +8192; pending: 158260592s +8192<br>Apr 14 12:24:40 srpt1 kernel: drbd0: scsi_tgt1[17272] Concurrent local write detected! [DISCARD L] new: 163944704s +8192; pending: 163944704s +8192<br>
Apr 14 12:24:49 srpt1 kernel: drbd0: scsi_tgt1[17272] Concurrent local write detected! [DISCARD L] new: 169511744s +8192; pending: 169511744s +8192<br>Apr 14 12:24:51 srpt1 kernel: drbd0: scsi_tgt0[17271] Concurrent local write detected! [DISCARD L] new: 170614416s +8192; pending: 170614416s +8192<br>
Apr 14 12:24:52 srpt1 kernel: drbd0: scsi_tgt1[17272] Concurrent local write detected! [DISCARD L] new: 158642368s +8192; pending: 158642368s +8192<br><br>128k:<br><br>Apr 17 15:09:23 srpt1 kernel: drbd0: scsi_tgt0[21193] Concurrent local write detected! [DISCARD L] new: 562689092s +28672; pending: 562689144s +4096<br>
Apr 17 15:09:23 srpt1 kernel: drbd0: scsi_tgt1[21194] Concurrent local write detected! [DISCARD L] new: 562689172s +20480; pending: 562689152s +32768<br>Apr 17 15:09:23 srpt1 kernel: drbd0: scsi_tgt0[21193] Concurrent local write detected! [DISCARD L] new: 562689148s +2048; pending: 562689144s +4096<br>
Apr 17 15:09:23 srpt1 kernel: drbd0: scsi_tgt1[21194] Concurrent local write detected! [DISCARD L] new: 562689212s +2048; pending: 562689152s +32768<br>Apr 17 15:09:23 srpt1 kernel: drbd0: scsi_tgt0[21193] Concurrent local write detected! [DISCARD L] new: 562689152s +2048; pending: 562689152s +32768<br>
Apr 17 15:09:23 srpt1 kernel: drbd0: scsi_tgt1[21194] Concurrent local write detected! [DISCARD L] new: 562689216s +2048; pending: 562689216s +24576<br>Apr 17 15:09:23 srpt1 kernel: drbd0: scsi_tgt0[21193] Concurrent local write detected! [DISCARD L] new: 562689156s +8192; pending: 562689152s +32768<br>
Apr 17 15:09:23 srpt1 kernel: drbd0: scsi_tgt1[21194] Concurrent local write detected! [DISCARD L] new: 562689220s +28672; pending: 562689264s +8192<br>Apr 17 15:09:23 srpt1 kernel: drbd0: scsi_tgt0[21193] Concurrent local write detected! [DISCARD L] new: 562689292s +8192; pending: 562689280s +32768<br>
Apr 17 15:09:23 srpt1 kernel: drbd0: scsi_tgt1[21194] Concurrent local write detected! [DISCARD L] new: 562689276s +2048; pending: 562689264s +8192<br>Apr 17 15:09:23 srpt1 kernel: drbd0: scsi_tgt1[21194] Concurrent local write detected! [DISCARD L] new: 562689280s +2048; pending: 562689280s +32768<br>
Apr 17 15:09:23 srpt1 kernel: drbd0: scsi_tgt1[21194] Concurrent local write detected! [DISCARD L] new: 562689284s +4096; pending: 562689280s +32768<br><br>I know the log shows two SCST threads, but the same thing happens when I disable SCST threading (and it also happens with IET).<br>
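Lars asked what the actual overlap is. The new and pending fields of each line can be turned into half-open sector ranges and intersected. The following is a minimal sketch that assumes 512-byte sectors and the exact "new: NNNs +BYTES; pending: NNNs +BYTES" wording of the lines above; the regex `LOG_RE` and the helpers `sector_range` and `overlap` are hypothetical, not part of DRBD.

```python
import re

SECTOR_SIZE = 512  # assumption: offsets in the log are 512-byte sectors

# Matches the "new: <sector>s +<bytes>; pending: <sector>s +<bytes>" part of a line.
LOG_RE = re.compile(r"new: (\d+)s \+(\d+); pending: (\d+)s \+(\d+)")


def sector_range(sector: int, size: int) -> tuple[int, int]:
    """Half-open [start, end) range in sectors for a request of `size` bytes."""
    return sector, sector + (size + SECTOR_SIZE - 1) // SECTOR_SIZE


def overlap(line: str) -> dict:
    m = LOG_RE.search(line)
    if m is None:
        raise ValueError("not a concurrent-write log line")
    new_s, new_b, pend_s, pend_b = map(int, m.groups())
    new = sector_range(new_s, new_b)
    pend = sector_range(pend_s, pend_b)
    # Intersection of the two ranges; empty if they do not touch.
    start, end = max(new[0], pend[0]), min(new[1], pend[1])
    sectors = max(0, end - start)
    return {
        "new": new,
        "pending": pend,
        "overlap_sectors": sectors,
        "overlap_bytes": sectors * SECTOR_SIZE,
        "identical": new == pend,
        "new_inside_pending": pend[0] <= new[0] and new[1] <= pend[1],
    }
```

Applied to the lines above, every 8k conflict is an exact duplicate (same start sector and size), while in the 128k run the first line overlaps its pending request by only 4 sectors (2048 bytes) and the second lies entirely inside its pending request.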
<br>
As I was about to send this email, I made another discovery: concurrent local writes do not occur while the DRBD device is disconnected. As soon as I reconnect, they reappear, and this is, as mentioned above, with protocol C. Do we need a protocol D now? :p<br>
<br>Thanks,<br><br>-Gennadiy<br>