[DRBD-user] DRBD fsync() seems to return before writing to disk

Phillip Frost phil at macprofessionals.com
Thu Jun 21 13:10:42 CEST 2012


On Jun 20, 2012, at 6:19 PM, Shaun Thomas wrote:

> I ran some tests on our (crappy) RAID0, comprised of two 300GB SAS drives. Here's what I got for varying block sizes:
> 
> DRBD connected:
> 8K - 26MB/s
> [...]
> DRBD disconnected:
> 8K - 57MB/s
> [...]
> Those disconnected numbers look not too far off from raw disk performance in a simple 2-disk RAID0.
> [...]
> And we could do all that because we have capacitor-backed RAID controllers.
> 
> We're seeing pretty much *exactly* what you should expect. This is with the 3.2.0 kernel as well. To make sure this wasn't fake, I monitored iostat and watched Dirty and Writeback from /proc/sys/meminfo. These are legit numbers obtained directly from dd and oflag=sync for all tests. 
> 
> So, I'm not sure about your own setup, but I can confirm that DRBD does honor sync in our case.

I think this demonstrates that O_SYNC causes writes to happen immediately rather than accumulating in the pagecache (I assume you observed that Dirty stayed very low), but I don't think it demonstrates anything about DRBD issuing a sync to the underlying device when it receives one itself. We already know O_SYNC or fsync() will flush the pagecache; this is a function of the Linux VFS and is not visible from DRBD's perspective. What's important is that when DRBD receives a sync operation, it passes it through to the lower layer. But as long as the BBU remains enabled, your RAID controller will treat all sync operations as no-ops, because there's no volatile cache on the RAID device to be synced. So whether or not DRBD passes the sync down, there's no change in behavior that we could observe.

To test my hypothesis, you'd have to disable the write-back cache. What you should see is a drop in performance of one or two orders of magnitude, going from your measured 7296 IOPS (57MB/s*1024/8k), an impossibility for any spinning media on this planet, to something limited by the rotation speed. If you had, for example, a 10000 RPM drive, the platter completes 10000/60 or about 166 revolutions per second, which is 6 ms per revolution. Re-writing the same block between each sync means waiting for that block to come around again, so anything faster than 166 IOPS means the IO syscalls must not be blocking until the data has reached nonvolatile storage (as requested by O_SYNC). You might as much as double the IOPS with RAID-0 if you are performing small sequential writes rather than re-writing the same block between each sync, but even so this is an order of magnitude slower than what you just measured.
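To make the arithmetic above easy to re-run with other numbers, here is a small Python sketch. The function names are mine, and it assumes the same simplifications as the text: 1 MB = 1024 KB, and at most one synchronous re-write of the same block per platter revolution.

```python
def throughput_to_iops(mb_per_s, block_kb):
    """Convert a dd-style throughput figure into IO operations per second."""
    return mb_per_s * 1024 / block_kb


def rotation_limited_iops(rpm):
    """Upper bound for re-writing one block with a sync after every write:
    at most one write per platter revolution."""
    return rpm / 60


def ms_per_revolution(rpm):
    """Time for one full platter revolution, in milliseconds."""
    return 60_000 / rpm


measured = throughput_to_iops(57, 8)       # 8K blocks at 57MB/s, DRBD disconnected
ceiling = rotation_limited_iops(10000)     # example 10000 RPM drive
print(f"measured:     {measured:.0f} IOPS")
print(f"single drive: {ceiling:.1f} IOPS ({ms_per_revolution(10000):.0f} ms per IO)")
print(f"ratio:        {measured / ceiling:.1f}x")
```

With the numbers from this thread the measured rate comes out roughly 44 times the single-drive ceiling, or about 22 times if you grant RAID-0 a doubling.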

If you don't see a huge drop in performance after disabling the battery-backed write-back cache, then we can conclude that DRBD or something else is eating the sync operations between userland (fsync(), open(O_SYNC), etc.) and the underlying device. That's not a problem if you have money to spend on battery-backed cache and can tolerate the added risk of losing power while the battery has failed, is reconditioning, or is being replaced, or of an outage that lasts longer than the battery hold time. For everyone else, it's a big problem.
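If you want something more direct than dd for this test, a rough Python sketch along these lines should do. It re-writes the same 8K block and calls fsync() after every write, then reports IOPS. The function name, file name, and counts are placeholders of mine, and the demo at the bottom writes to the default temp directory, which is probably not on your DRBD device, so point it at a filesystem that is. It needs a Unix-like system for os.pwrite().

```python
import os
import tempfile
import time


def sync_write_iops(path, block_size=8192, count=100):
    """Re-write one block `count` times, calling fsync() after each write,
    and return the achieved writes per second."""
    buf = b"\0" * block_size
    fd = os.open(path, os.O_WRONLY | os.O_CREAT, 0o600)
    try:
        start = time.perf_counter()
        for _ in range(count):
            os.pwrite(fd, buf, 0)  # same offset every time
            os.fsync(fd)           # should block until the data is on stable storage
        elapsed = time.perf_counter() - start
    finally:
        os.close(fd)
    return count / elapsed


# Demo against a throwaway directory; replace it with a directory on the
# device under test to get a meaningful number.
with tempfile.TemporaryDirectory() as d:
    iops = sync_write_iops(os.path.join(d, "synctest.bin"))

print(f"{iops:.0f} fsync'd writes per second")
```

On a single spinning disk with no nonvolatile cache in the path, a result far above RPM/60 suggests that something is acknowledging the sync before the data has reached the platter.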


