[DRBD-user] DRBD fsync() seems to return before writing to disk

Phil Frost phil at macprofessionals.com
Fri Jun 22 19:03:09 CEST 2012



On 06/19/2012 11:55 AM, Phil Frost wrote:
> I want to guarantee that fsync() doesn't return until writes have made 
> it to physical storage. In particular, I care about PostgreSQL 
> database integrity.

Well, this is proving very frustrating. I still don't know whether I'm 
chasing behavior that simply isn't implemented or behavior that isn't 
working in my environment, but I'm very sure something is wrong here. 
I dug around in the source code (3.2.0 kernel from Debian 
squeeze-backports) a bit, and I'm CCing drbd-dev since I don't imagine 
too many users read the code. I have pretty much no experience with 
block device programming, but I did find some good documentation in the 
kernel [1] that gave me some good grep victims, specifically REQ_FLUSH 
and REQ_FUA. I found evidence that DRBD supports these, in 
drbd_main.c:

static u32 bio_flags_to_wire(struct drbd_conf *mdev, unsigned long bi_rw)
{
         if (mdev->agreed_pro_version >= 95)
                 return  (bi_rw & REQ_SYNC ? DP_RW_SYNC : 0) |
                         (bi_rw & REQ_FUA ? DP_FUA : 0) |
                         (bi_rw & REQ_FLUSH ? DP_FLUSH : 0) |
                         (bi_rw & REQ_DISCARD ? DP_DISCARD : 0);
         else
                 return bi_rw & REQ_SYNC ? DP_RW_SYNC : 0;
}

This appears to be responsible for encoding the block request flags into 
a network format for the peer, and there is an inverse function in 
drbd_receiver.c. However, [1] also says that block device drivers must 
call blk_queue_flush to advertise support for REQ_FUA and REQ_FLUSH. 
(Strictly, it says this of "request_fn based" drivers; I don't know what 
that means, but I think it applies here.) grep tells me DRBD doesn't do 
this anywhere, but I do see it in other drivers I recognize: MD, loop, 
xen-blkfront, etc.

So, my hypothesis is that DRBD has the code to pass REQ_FUA and 
REQ_FLUSH through to the underlying device, but it never sees those 
flags because it doesn't claim to support them. The block IO system 
therefore strips them off, figuring the best it can do is drain the 
queue, which is clearly the Wrong Thing.
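
To make the hypothesis concrete, here is a small userspace model of what 
I think is happening. It is not kernel code: the MODEL_* flag values, 
struct model_queue and model_submit() are all made up for illustration, 
and the masking rule is my assumption about what the block layer does 
with flags a queue hasn't advertised.

```c
#include <assert.h>
#include <stddef.h>

/* Stand-in flag bits for illustration; NOT the kernel's real values. */
#define MODEL_REQ_SYNC  (1u << 0)
#define MODEL_REQ_FUA   (1u << 1)
#define MODEL_REQ_FLUSH (1u << 2)

/* Hypothetical stand-in for a request queue: it only records which
 * cache-control flags the driver advertised (the role I believe
 * blk_queue_flush plays for a real driver). */
struct model_queue {
        unsigned int flush_flags;
};

/* Return the flags the driver would actually see for a request
 * submitted with bi_rw. Assumption: cache-control flags the queue did
 * not advertise are silently dropped; all other flags pass through. */
unsigned int model_submit(const struct model_queue *q, unsigned int bi_rw)
{
        const unsigned int cache_flags = MODEL_REQ_FLUSH | MODEL_REQ_FUA;

        assert(q != NULL);
        return bi_rw & ~(cache_flags & ~q->flush_flags);
}
```

In this model, a queue with flush_flags == 0 turns a SYNC|FLUSH|FUA 
request into a plain SYNC request, so a function like bio_flags_to_wire 
would have nothing to forward to the peer. A queue that advertised both 
flags would see the request unchanged.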

Unfortunately, I don't feel very qualified in this area, so can anyone 
tell me if I'm totally off base here? Any suggestions on how I might 
test this?
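
One rough test I can think of is to time fsync() directly. The sketch 
below rewrites the same 4 KiB block repeatedly and reports the mean 
fsync() latency. The function name average_fsync_ms and the block size 
are my own choices, not anything from DRBD. My assumption is that on a 
single rotating disk with no battery-backed write cache, a flush that 
really reaches the platter should take on the order of a rotation 
(about 8 ms at 7200 rpm), so an average far below that would suggest 
the flush is not getting through. With a controller that has a 
battery-backed cache, fast results would be legitimate and this test 
would prove nothing.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <fcntl.h>
#include <string.h>
#include <sys/types.h>
#include <time.h>
#include <unistd.h>

/* Rewrite one 4 KiB block at offset 0 of `path` `count` times, calling
 * fsync() after each write and timing only the fsync() call.
 * Returns 0 and stores the mean fsync() latency in milliseconds in
 * *avg_ms, or returns -1 on error. */
int average_fsync_ms(const char *path, int count, double *avg_ms)
{
        char buf[4096];
        struct timespec start, end;
        double total_ms = 0.0;
        int fd, i;

        assert(path != NULL && avg_ms != NULL);
        if (count <= 0)
                return -1;

        fd = open(path, O_WRONLY | O_CREAT, 0600);
        if (fd < 0)
                return -1;

        for (i = 0; i < count; i++) {
                /* Change the data each pass so every write is real work. */
                memset(buf, 'a' + (i % 26), sizeof buf);
                if (pwrite(fd, buf, sizeof buf, 0) != (ssize_t)sizeof buf) {
                        close(fd);
                        return -1;
                }

                clock_gettime(CLOCK_MONOTONIC, &start);
                if (fsync(fd) != 0) {
                        close(fd);
                        return -1;
                }
                clock_gettime(CLOCK_MONOTONIC, &end);

                total_ms += (end.tv_sec - start.tv_sec) * 1000.0 +
                            (end.tv_nsec - start.tv_nsec) / 1e6;
        }

        close(fd);
        *avg_ms = total_ms / count;
        return 0;
}
```

Calling this from a small main() with a file on the DRBD-backed 
filesystem, and again with a file on the bare backing disk, should give 
two averages to compare. On older glibc this may need -lrt for 
clock_gettime.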

[1] 
http://www.mjmwired.net/kernel/Documentation/block/writeback_cache_control.txt



