[DRBD-user] Concurrent writes

Gennadiy Nerubayev parakie at gmail.com
Tue Apr 21 01:05:31 CEST 2009



On Mon, Apr 20, 2009 at 6:28 AM, Lars Ellenberg
<lars.ellenberg at linbit.com> wrote:

> On Mon, Apr 20, 2009 at 12:13:58PM +0200, Lars Ellenberg wrote:
> >
> > > 2. What's the worst case scenario (lost write? corrupt data? unknown
> > > consistency?) that can result from concurrent writes?
> >
> > DRBD currently _drops_ the later write.
> > if a write is detected as a concurrent local write,
> > this one is never submitted nor send, and just "completed",
> > pretending that it had been successfully written.
> >
> > we considered to _fail_ such writes with EIO,
> > but decided to rather complain loudly, but pretend success.
>

So to clarify: in a typical scenario an initiator should not issue a write
request while an earlier write to the same (or an overlapping) block has not
yet been acknowledged by DRBD as successful, yet that is exactly what happens
here? Is it at all possible that DRBD returns success earlier than it should
(I'm using protocol C, obviously)?

> please post a few of the original log lines,
> they should read something like
> <comm>[pid] Concurrent local write detected!
> [DISCARD L] new: <sector offset>s +<size in bytes>;
>        pending: <sector offset>s +<size in bytes>
>
> I'm curious as to what the actual overlap is,
> and if there is any correlation between offsets.


Here's an example for 8k random writes:

Apr 14 12:24:35 srpt1 kernel: drbd0: scsi_tgt1[17272] Concurrent local write
detected! [DISCARD L] new: 162328976s +8192; pending: 162328976s +8192
Apr 14 12:24:38 srpt1 kernel: drbd0: scsi_tgt0[17271] Concurrent local write
detected! [DISCARD L] new: 161385248s +8192; pending: 161385248s +8192
Apr 14 12:24:39 srpt1 kernel: drbd0: scsi_tgt1[17272] Concurrent local write
detected! [DISCARD L] new: 157655888s +8192; pending: 157655888s +8192
Apr 14 12:24:39 srpt1 kernel: drbd0: scsi_tgt1[17272] Concurrent local write
detected! [DISCARD L] new: 165753872s +8192; pending: 165753872s +8192
Apr 14 12:24:39 srpt1 kernel: drbd0: scsi_tgt0[17271] Concurrent local write
detected! [DISCARD L] new: 166654816s +8192; pending: 166654816s +8192
Apr 14 12:24:40 srpt1 kernel: drbd0: scsi_tgt1[17272] Concurrent local write
detected! [DISCARD L] new: 158260592s +8192; pending: 158260592s +8192
Apr 14 12:24:40 srpt1 kernel: drbd0: scsi_tgt1[17272] Concurrent local write
detected! [DISCARD L] new: 163944704s +8192; pending: 163944704s +8192
Apr 14 12:24:49 srpt1 kernel: drbd0: scsi_tgt1[17272] Concurrent local write
detected! [DISCARD L] new: 169511744s +8192; pending: 169511744s +8192
Apr 14 12:24:51 srpt1 kernel: drbd0: scsi_tgt0[17271] Concurrent local write
detected! [DISCARD L] new: 170614416s +8192; pending: 170614416s +8192
Apr 14 12:24:52 srpt1 kernel: drbd0: scsi_tgt1[17272] Concurrent local write
detected! [DISCARD L] new: 158642368s +8192; pending: 158642368s +8192

128k:

Apr 17 15:09:23 srpt1 kernel: drbd0: scsi_tgt0[21193] Concurrent local write
detected! [DISCARD L] new: 562689092s +28672; pending: 562689144s +4096
Apr 17 15:09:23 srpt1 kernel: drbd0: scsi_tgt1[21194] Concurrent local write
detected! [DISCARD L] new: 562689172s +20480; pending: 562689152s +32768
Apr 17 15:09:23 srpt1 kernel: drbd0: scsi_tgt0[21193] Concurrent local write
detected! [DISCARD L] new: 562689148s +2048; pending: 562689144s +4096
Apr 17 15:09:23 srpt1 kernel: drbd0: scsi_tgt1[21194] Concurrent local write
detected! [DISCARD L] new: 562689212s +2048; pending: 562689152s +32768
Apr 17 15:09:23 srpt1 kernel: drbd0: scsi_tgt0[21193] Concurrent local write
detected! [DISCARD L] new: 562689152s +2048; pending: 562689152s +32768
Apr 17 15:09:23 srpt1 kernel: drbd0: scsi_tgt1[21194] Concurrent local write
detected! [DISCARD L] new: 562689216s +2048; pending: 562689216s +24576
Apr 17 15:09:23 srpt1 kernel: drbd0: scsi_tgt0[21193] Concurrent local write
detected! [DISCARD L] new: 562689156s +8192; pending: 562689152s +32768
Apr 17 15:09:23 srpt1 kernel: drbd0: scsi_tgt1[21194] Concurrent local write
detected! [DISCARD L] new: 562689220s +28672; pending: 562689264s +8192
Apr 17 15:09:23 srpt1 kernel: drbd0: scsi_tgt0[21193] Concurrent local write
detected! [DISCARD L] new: 562689292s +8192; pending: 562689280s +32768
Apr 17 15:09:23 srpt1 kernel: drbd0: scsi_tgt1[21194] Concurrent local write
detected! [DISCARD L] new: 562689276s +2048; pending: 562689264s +8192
Apr 17 15:09:23 srpt1 kernel: drbd0: scsi_tgt1[21194] Concurrent local write
detected! [DISCARD L] new: 562689280s +2048; pending: 562689280s +32768
Apr 17 15:09:23 srpt1 kernel: drbd0: scsi_tgt1[21194] Concurrent local write
detected! [DISCARD L] new: 562689284s +4096; pending: 562689280s +32768
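
To make the actual overlap easier to read off these lines, here is a minimal
Python sketch that extracts the "new" and "pending" ranges and computes their
intersection. It assumes the offsets marked "s" are 512-byte sectors and the
"+" values are sizes in bytes, as in the format quoted above. The function
name overlap_sectors is hypothetical and is not part of DRBD or any of its
tools. The pattern only needs the "new: ...; pending: ..." part, so it also
works on the second half of a wrapped log line.

```python
import re

SECTOR_BYTES = 512  # assumption: "s" offsets are 512-byte sectors

# Matches the tail of a "Concurrent local write detected!" message.
RANGE_RE = re.compile(r"new: (\d+)s \+(\d+); pending: (\d+)s \+(\d+)")


def overlap_sectors(message):
    """Return (first_sector, sector_count) of the overlap, or None."""
    m = RANGE_RE.search(message)
    if m is None:
        return None
    new_start, new_bytes, pend_start, pend_bytes = map(int, m.groups())
    # Convert byte sizes into exclusive end sectors.
    new_end = new_start + new_bytes // SECTOR_BYTES
    pend_end = pend_start + pend_bytes // SECTOR_BYTES
    start = max(new_start, pend_start)
    end = min(new_end, pend_end)
    if end <= start:
        return None  # ranges do not intersect
    return start, end - start


if __name__ == "__main__":
    samples = [
        "detected! [DISCARD L] new: 162328976s +8192; pending: 162328976s +8192",
        "detected! [DISCARD L] new: 562689092s +28672; pending: 562689144s +4096",
    ]
    for line in samples:
        print(overlap_sectors(line))
    # prints (162328976, 16) and then (562689144, 4)
```

Under that assumption, the 8k lines are complete overlaps (same offset, same
16 sectors), while the first 128k line overlaps only in its last 4 sectors.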

I know it shows two SCST threads, but it's the same thing even if I disable
SCST threading (not to mention having the same thing happen with IET).

As I was about to send this email, I made another discovery: concurrent local
writes do not happen while the DRBD device is disconnected. As soon as I
reconnect, they reappear, and this is, as mentioned above, with protocol C.
Do we need a protocol D now? :p

Thanks,

-Gennadiy