Note: these "permalinks" may not be as permanent as we would like; direct links to old sources may well be a few messages off.
I'm attempting to set up an NFS cluster with a block device stack that looks like this:

- ext4
- LVM
- DRBD
- LVM
- MD RAID
- 4 SATA drives on each node

I want to guarantee that fsync() doesn't return until writes have reached physical storage. In particular, I care about PostgreSQL database integrity. Postgres includes a simple test program, test_fsync, which times fsync() operations [1]. I don't have any demanding performance requirements here, so I'm not using any battery-backed caches. Consequently, if fsync() is working, I shouldn't be able to complete more than one fsync() per revolution of the platters. For the 7200 RPM drives I'm using, that's about 8 ms.

I'm running Debian Squeeze with Linux 3.2.0-0.bpo.2-amd64 from squeeze-backports, which is necessary to get write barrier support on MD RAID devices. drbdadm status reports DRBD version 8.3.11.

I've verified that with ext4 on the first three layers of my storage stack (LVM on MD RAID on SATA drives) I get a working fsync(): running test_fsync [1] gives latencies of just over 8 ms. However, performing the same test on the full stack, including DRBD, gives much lower latencies (1.2 - 2.0 ms), which indicates that fsync() must not be working, because it's physically impossible for my 7200 RPM SATA drives to complete synchronous writes that fast. At least, that's how I understand it.

Searching for previous answers, I found [2], which says that DRBD doesn't pass write barriers through to upper layers. I also found [3], which suggests to me that fsync() should work as I expect. Both of those posts are fairly old. Lastly, I found [4], which argues that fsync() on Linux is badly broken and doesn't really guarantee anything anyhow. To be honest, I'm a little unclear on the relationship between fsync() and write barriers. Regardless, DRBD is clearly having some effect, since it makes fsync() about seven times faster, which can't be good.

So I'm wondering: is this expected behavior?
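For reference, the measurement I'm doing is roughly the following: repeatedly write a small block and fsync() it, then report the mean latency per call. This is a minimal sketch in Python, not the actual test_fsync program from [1]; the file name and iteration count are arbitrary. On a drive with no volatile write cache in the path, the mean should be near one platter revolution (~8 ms at 7200 RPM); sub-millisecond numbers suggest something is acknowledging writes from cache.

```python
import os
import time

def fsync_latency(path="fsync_probe.dat", iterations=100):
    """Return the mean fsync() latency in milliseconds.

    Rewrites the same 8 KB block each iteration and fsyncs it,
    mimicking a WAL-style synchronous write pattern.
    """
    fd = os.open(path, os.O_WRONLY | os.O_CREAT, 0o600)
    data = b"\0" * 8192
    try:
        start = time.monotonic()
        for _ in range(iterations):
            os.lseek(fd, 0, os.SEEK_SET)
            os.write(fd, data)
            os.fsync(fd)  # should block until the data is on stable storage
        elapsed = time.monotonic() - start
    finally:
        os.close(fd)
        os.unlink(path)
    return elapsed / iterations * 1000.0

if __name__ == "__main__":
    print("mean fsync latency: %.2f ms" % fsync_latency())
```

Run on the ext4-on-LVM-on-MD stack this reports a bit over 8 ms per call; on the full stack with DRBD it reports 1.2 - 2.0 ms.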
Is my data integrity at risk? How can an application writing to a DRBD device be sure its data has reached nonvolatile storage, whether via fsync() or some other mechanism? Is this something that would change if I moved to DRBD 8.4?

[1] http://www.westnet.com/~gsmith/content/postgresql/TuningPGWAL.htm
[2] http://lists.linbit.com/pipermail/drbd-user/2008-September/010306.html
[3] http://lists.linbit.com/pipermail/drbd-user/2006-December/006105.html
[4] http://milek.blogspot.com/2010/12/linux-osync-and-write-barriers.html