[DRBD-user] how about read a block that not return to upper application in protocol C?

Jan Schermer jan at schermer.cz
Mon Sep 5 12:09:31 CEST 2016



> On 05 Sep 2016, at 11:50, Lars Ellenberg <lars.ellenberg at linbit.com> wrote:
> 
> On Mon, Sep 05, 2016 at 01:16:21AM +0800, Mia Lueng wrote:
>> Hi All:
>> In protocol C, a bio is returned to the upper application (bi_endio()
>> is executed) once the local bio has completed and the data ack packet
>> has been received from the peer. But suppose a write request to block N
>> has been submitted and written to the local disk, the data ack from the
>> peer has not yet been received, and a read request for the same block N
>> comes in. The read request will get the data of block N that has not
>> yet been acknowledged to the upper application.
>> 
>> Will this cause a logical error in the application (e.g. Oracle)?
> 
> If you have dependencies between IO requests,
> you must not issue the second request,
> before the first has completed.
> 
> Think of local disk only.
> 
> You issue a WRITE to block X.
> Then, before that completed,
> you issue a READ to block X.
> (actual, direct IO requests to the backend device,
> not caught by some intermediate caching layer)
> 
> The result of the READ is undefined.
> It may return old data, it may return new data,
> it may even return partially updated data.
> 
> Undefined.
> 

Actually, I'm not sure this is true. It depends, of course, on what you mean by "before that completed": not completed, or just not flushed? On a local disk, even a buffered write should cause subsequent reads to reflect the new contents. The corner case is direct IO (O_DIRECT) on the write but not on the read, which is undefined. I'd expect the same to hold with protocol C even in a multi-node setup, but I'm not sure what, e.g., shared filesystems expect in this case.

Re: the original question: it depends on how Oracle writes the data. If it writes the data synchronously, the write blocks until the data is written everywhere and subsequent reads return the new data; that's how ACID-compliant software should do it. If it doesn't use synchronous IO but a "weaker" variant like O_DIRECT, that could open up a race condition: O_DIRECT is not guaranteed to be unbuffered, it just works like that most of the time. And while some care is taken to accommodate applications that treat it like synchronous IO, I'd be wary of depending on it when more layers are involved that like to buffer stuff, or if you simply have more than one application touching the same data.
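
To make the "write synchronously, then read" ordering concrete, here is a minimal sketch in Python. It is only an illustration of the pattern, not how Oracle or DRBD actually do it: the file name, the 4 KiB block size and the helper names (write_then_read, demo) are my own assumptions, and it runs against an ordinary temporary file rather than a DRBD device. The point is simply that the read is issued only after the O_SYNC write has returned, so there is never a dependent read in flight while the write is still pending.

```python
import os
import tempfile

BLOCK_SIZE = 4096  # assumed block size, for illustration only


def write_then_read(path, block_no, payload):
    """Write one block with O_SYNC, then read it back.

    The read is issued only after pwrite() has returned, i.e. after the
    write has completed from the application's point of view.
    """
    if len(payload) != BLOCK_SIZE:
        raise ValueError("payload must be exactly one block")
    offset = block_no * BLOCK_SIZE
    # O_SYNC: the write call does not return until the layers below
    # report the data as written.
    fd = os.open(path, os.O_RDWR | os.O_CREAT | os.O_SYNC, 0o600)
    try:
        written = os.pwrite(fd, payload, offset)
        if written != BLOCK_SIZE:
            raise OSError("short write")
        # Dependent read, issued strictly after the write completed.
        return os.pread(fd, BLOCK_SIZE, offset)
    finally:
        os.close(fd)


def demo():
    """Run the pattern against a scratch file (hypothetical path)."""
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "block.img")
        payload = b"N" * BLOCK_SIZE
        return write_then_read(path, 3, payload) == payload


if __name__ == "__main__":
    print(demo())
```

On a DRBD device in protocol C, the same pattern would mean the write call returns only once the local write and the peer's ack are both in, which is exactly the case where the original question doesn't arise.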

Having said that, I expect DRBD to be doing the right thing; people use it for this (and I used it for this). But since enterprisey software is almost always dependent on how things worked in the 80s, it's something you should always test for yourself on a modern system :-)


> -- 
> : Lars Ellenberg
> : LINBIT | Keeping the Digital World Running
> : DRBD -- Heartbeat -- Corosync -- Pacemaker



