Note: these "permalinks" may not be as permanent as we would like; direct links to old sources may well be a few messages off.
I'm attempting to set up an NFS cluster with a block device stack that looks like this:

- ext4
- LVM
- DRBD
- LVM
- MD RAID
- 4 SATA drives on each node

I want to guarantee that fsync() doesn't return until writes have reached physical storage. In particular, I care about PostgreSQL database integrity. Postgres includes a simple test program, test_fsync, which times fsync() operations [1]. I don't have any demanding performance requirements here, so I'm not using any battery-backed caches. Consequently, if fsync() is working, I shouldn't be able to complete more than one fsync() per revolution of the platters. For the 7200 RPM drives I'm using, that's about 8 ms.

I'm running Debian Squeeze with Linux 3.2.0-0.bpo.2-amd64 from squeeze-backports, which is necessary to get write barrier support on MD RAID devices. drbdadm status reports DRBD version 8.3.11.

I've verified that with ext4 on the first three layers of my storage stack (LVM on MD RAID on SATA drives) I get a working fsync(): running test_fsync [1] gives latencies of just over 8 ms. However, performing the same test on the full stack, including DRBD, gives much lower latencies (1.2 - 2.0 ms), which indicates that fsync() must not be working, because it's physically impossible for my 7200 RPM SATA drives to complete synchronous writes that fast. At least, that's how I understand it.

Searching for previous answers, I found [2], which says that DRBD doesn't pass write barriers through to upper layers. I also found [3], which suggests to me that fsync() should work as I expect. Both of those posts are fairly old. Lastly, I found [4], which argues that fsync() on Linux is badly broken and doesn't really guarantee anything anyhow. To be honest, I'm a little unclear on the relationship between fsync() and write barriers. Regardless, DRBD is clearly having some effect, since it makes fsync() about seven times faster, which can't be good.

So I'm wondering: is this expected behavior?
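For reference, the measurement I'm doing is roughly the following: repeatedly write a small block and fsync() it, then report the mean latency per call. This is a minimal sketch in Python, not the actual test_fsync program from [1]; the file name and iteration count are arbitrary. On a drive with no volatile write cache in the path, the mean should be near one platter revolution (~8 ms at 7200 RPM); sub-millisecond numbers suggest something is acknowledging writes from cache.

```python
import os
import time

def fsync_latency(path="fsync_probe.dat", iterations=100):
    """Return the mean fsync() latency in milliseconds.

    Rewrites the same 8 KB block each iteration and fsyncs it,
    mimicking a WAL-style synchronous write pattern.
    """
    fd = os.open(path, os.O_WRONLY | os.O_CREAT, 0o600)
    data = b"\0" * 8192
    try:
        start = time.monotonic()
        for _ in range(iterations):
            os.lseek(fd, 0, os.SEEK_SET)
            os.write(fd, data)
            os.fsync(fd)  # should block until the data is on stable storage
        elapsed = time.monotonic() - start
    finally:
        os.close(fd)
        os.unlink(path)
    return elapsed / iterations * 1000.0

if __name__ == "__main__":
    print("mean fsync latency: %.2f ms" % fsync_latency())
```

Run on the ext4-on-LVM-on-MD stack this reports a bit over 8 ms per call; on the full stack with DRBD it reports 1.2 - 2.0 ms.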
Is my data integrity at risk? How can an application writing to a DRBD device be sure its data has reached nonvolatile storage, whether via fsync() or some other mechanism? Is this something that would change if I moved to DRBD 8.4?

[1] http://www.westnet.com/~gsmith/content/postgresql/TuningPGWAL.htm
[2] http://lists.linbit.com/pipermail/drbd-user/2008-September/010306.html
[3] http://lists.linbit.com/pipermail/drbd-user/2006-December/006105.html
[4] http://milek.blogspot.com/2010/12/linux-osync-and-write-barriers.html