Note: "permalinks" may not be as permanent as we would like;
direct links to old sources may well be a few messages off.
On Mon, Apr 20, 2009 at 6:28 AM, Lars Ellenberg <lars.ellenberg at linbit.com> wrote:

> On Mon, Apr 20, 2009 at 12:13:58PM +0200, Lars Ellenberg wrote:
> >
> > > 2. What's the worst case scenario (lost write? corrupt data? unknown
> > > consistency?) that can result from concurrent writes?
> >
> > DRBD currently _drops_ the later write.
> > if a write is detected as a concurrent local write,
> > this one is never submitted nor send, and just "completed",
> > pretending that it had been successfully written.
> >
> > we considered to _fail_ such writes with EIO,
> > but decided to rather complain loudly, but pretend success.
>

So to clarify, in a typical scenario an initiator should not be issuing
a write request while one for the same (or overlapping) block is not yet
returned by DRBD as successful, however that's what happens? Is it at
all possible that DRBD returns success earlier than it should have
(obviously I'm using protocol C)?

> please post a few of the original log lines,
> they should read something like
> <comm>[pid] Concurrent local write detected!
> [DISCARD L] new: <sector offset>s +<size in bytes>;
> pending: <sector offset>s +<size in bytes>
>
> I'm curious as to what the actual overlap is,
> and in if there is any correlation between offsets.

Here's an example for 8k random writes:

Apr 14 12:24:35 srpt1 kernel: drbd0: scsi_tgt1[17272] Concurrent local write detected! [DISCARD L] new: 162328976s +8192; pending: 162328976s +8192
Apr 14 12:24:38 srpt1 kernel: drbd0: scsi_tgt0[17271] Concurrent local write detected! [DISCARD L] new: 161385248s +8192; pending: 161385248s +8192
Apr 14 12:24:39 srpt1 kernel: drbd0: scsi_tgt1[17272] Concurrent local write detected! [DISCARD L] new: 157655888s +8192; pending: 157655888s +8192
Apr 14 12:24:39 srpt1 kernel: drbd0: scsi_tgt1[17272] Concurrent local write detected! [DISCARD L] new: 165753872s +8192; pending: 165753872s +8192
Apr 14 12:24:39 srpt1 kernel: drbd0: scsi_tgt0[17271] Concurrent local write detected! [DISCARD L] new: 166654816s +8192; pending: 166654816s +8192
Apr 14 12:24:40 srpt1 kernel: drbd0: scsi_tgt1[17272] Concurrent local write detected! [DISCARD L] new: 158260592s +8192; pending: 158260592s +8192
Apr 14 12:24:40 srpt1 kernel: drbd0: scsi_tgt1[17272] Concurrent local write detected! [DISCARD L] new: 163944704s +8192; pending: 163944704s +8192
Apr 14 12:24:49 srpt1 kernel: drbd0: scsi_tgt1[17272] Concurrent local write detected! [DISCARD L] new: 169511744s +8192; pending: 169511744s +8192
Apr 14 12:24:51 srpt1 kernel: drbd0: scsi_tgt0[17271] Concurrent local write detected! [DISCARD L] new: 170614416s +8192; pending: 170614416s +8192
Apr 14 12:24:52 srpt1 kernel: drbd0: scsi_tgt1[17272] Concurrent local write detected! [DISCARD L] new: 158642368s +8192; pending: 158642368s +8192

128k:

Apr 17 15:09:23 srpt1 kernel: drbd0: scsi_tgt0[21193] Concurrent local write detected! [DISCARD L] new: 562689092s +28672; pending: 562689144s +4096
Apr 17 15:09:23 srpt1 kernel: drbd0: scsi_tgt1[21194] Concurrent local write detected! [DISCARD L] new: 562689172s +20480; pending: 562689152s +32768
Apr 17 15:09:23 srpt1 kernel: drbd0: scsi_tgt0[21193] Concurrent local write detected! [DISCARD L] new: 562689148s +2048; pending: 562689144s +4096
Apr 17 15:09:23 srpt1 kernel: drbd0: scsi_tgt1[21194] Concurrent local write detected! [DISCARD L] new: 562689212s +2048; pending: 562689152s +32768
Apr 17 15:09:23 srpt1 kernel: drbd0: scsi_tgt0[21193] Concurrent local write detected! [DISCARD L] new: 562689152s +2048; pending: 562689152s +32768
Apr 17 15:09:23 srpt1 kernel: drbd0: scsi_tgt1[21194] Concurrent local write detected! [DISCARD L] new: 562689216s +2048; pending: 562689216s +24576
Apr 17 15:09:23 srpt1 kernel: drbd0: scsi_tgt0[21193] Concurrent local write detected! [DISCARD L] new: 562689156s +8192; pending: 562689152s +32768
Apr 17 15:09:23 srpt1 kernel: drbd0: scsi_tgt1[21194] Concurrent local write detected! [DISCARD L] new: 562689220s +28672; pending: 562689264s +8192
Apr 17 15:09:23 srpt1 kernel: drbd0: scsi_tgt0[21193] Concurrent local write detected! [DISCARD L] new: 562689292s +8192; pending: 562689280s +32768
Apr 17 15:09:23 srpt1 kernel: drbd0: scsi_tgt1[21194] Concurrent local write detected! [DISCARD L] new: 562689276s +2048; pending: 562689264s +8192
Apr 17 15:09:23 srpt1 kernel: drbd0: scsi_tgt1[21194] Concurrent local write detected! [DISCARD L] new: 562689280s +2048; pending: 562689280s +32768
Apr 17 15:09:23 srpt1 kernel: drbd0: scsi_tgt1[21194] Concurrent local write detected! [DISCARD L] new: 562689284s +4096; pending: 562689280s +32768

I know it shows two SCST threads, but it's the same thing even if I
disable SCST threading (not to mention having the same thing happen
with IET).

As I was about to send this email, I got another discovery: Concurrent
local writes do not happen when the DRBD device is disconnected. As soon
as I reconnect, they reappear, and this is as mentioned above using
protocol C. Do we need a protocol D now? :p

Thanks,
-Gennadiy
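
To answer the "what is the actual overlap" question for log lines like the ones above, here is a minimal Python sketch that parses the "new: ...; pending: ..." part of a message and computes how many bytes the two requests share. It assumes the "s" offsets are 512-byte sectors and the "+" values are bytes, as the quoted format description suggests; the names SECTOR_BYTES, LINE_RE and overlap_bytes are made up for this illustration and are not part of DRBD.

```python
import re

# Assumption: offsets in the log are 512-byte sectors, sizes are bytes.
SECTOR_BYTES = 512

# Matches the "new: <sector>s +<bytes>; pending: <sector>s +<bytes>" tail.
LINE_RE = re.compile(r"new: (\d+)s \+(\d+); pending: (\d+)s \+(\d+)")


def overlap_bytes(line):
    """Return the number of bytes shared by the new and pending writes.

    Returns None if the line does not contain the expected pattern.
    """
    m = LINE_RE.search(line)
    if m is None:
        return None
    new_sector, new_size, pend_sector, pend_size = map(int, m.groups())

    # Convert both requests to half-open byte ranges [begin, end).
    new_begin = new_sector * SECTOR_BYTES
    new_end = new_begin + new_size
    pend_begin = pend_sector * SECTOR_BYTES
    pend_end = pend_begin + pend_size

    # The intersection of two ranges; empty intersections count as 0.
    begin = max(new_begin, pend_begin)
    end = min(new_end, pend_end)
    return max(0, end - begin)


if __name__ == "__main__":
    sample = ("drbd0: scsi_tgt0[21193] Concurrent local write detected! "
              "[DISCARD L] new: 562689092s +28672; pending: 562689144s +4096")
    print(overlap_bytes(sample))
```

Under those assumptions, the 8k entries overlap completely (8192 bytes each), while the first 128k entry overlaps only in the last 2048 bytes of the new request.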