On 23/06/11 10:49, Phil Stoneman wrote:
> I need a bit of help understanding an issue we're seeing with our DRBD
> setup. We have two Debian systems, with DRBD 8.3.8. We have a resource
> configured with protocol A, which backs onto a single SATA disk on each
> side.
>
> I am running the latency test from the DRBD documentation, multiple
> times:
>
>     while true; do
>         dd if=/dev/zero of=$TEST_DEVICE bs=512 count=1000 oflag=direct
>         sync
>     done
>
> If I stop DRBD and run it against the backing device on each side, it
> repeats again and again very quickly. If I start DRBD and run it
> against the DRBD device on the primary side, it runs quickly for about
> 4 or 5 repetitions, then slows right down. Investigation shows pe:
> climbing in /proc/drbd, and iostat -mxd 1 on the secondary node shows
> 100% disk usage for the backing device. Note that this repeats in the
> other direction if I swap the primary/secondary roles. It's only the
> secondary role that sees 100% disk usage, not the primary.
>
> When I use no-disk-barrier and no-disk-flushes, the problem goes away
> entirely - but I'm reluctant to enable this permanently, as they're
> just normal SATA drives without any battery backup or anything, and
> there are scary warnings about doing that in the documentation :-)

After a bunch more testing, it looks like DRBD on the secondary side
only is not using (or is regularly flushing) the write cache of the
underlying storage. It's not doing this on the primary side, and I can
actually see a reference to this in the manual[1]: "DRBD uses disk
flushes for write operations both to its replicated data set and to its
meta data."

I must be honest, I don't completely understand the rationale behind
utilising the write cache on the primary side but not utilising it on
the secondary side - it really hurts performance in some use cases!
Still, now that I know what's going on, I'm a little more comfortable
using no-disk-barrier and no-disk-flushes.
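For reference, in DRBD 8.3 those two options live in the disk section of the resource definition. A minimal sketch (the resource name r0, hostnames, device paths and addresses here are all made up; hostnames must match `uname -n` on each node):

```
resource r0 {
  protocol A;

  disk {
    # Trade crash safety of the drive's write cache for latency:
    no-disk-barrier;
    no-disk-flushes;
  }

  on alpha {
    device    /dev/drbd0;
    disk      /dev/sda1;
    address   10.0.0.1:7789;
    meta-disk internal;
  }
  on bravo {
    device    /dev/drbd0;
    disk      /dev/sdb1;
    address   10.0.0.2:7789;
    meta-disk internal;
  }
}
```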
It means that I might lose data written into the drive's write cache,
but that's no worse a situation than a normal system using the hard
drives natively. I'm still interested to hear the reason why DRBD works
that way, though...

Phil

[1] http://www.drbd.org/users-guide/s-disk-flush-support.html
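Incidentally, the latency loop from the documentation can be approximated against an ordinary file to eyeball the flush cost without touching DRBD or a raw device. A rough sketch, with a mktemp scratch file standing in for $TEST_DEVICE and conv=fsync standing in for oflag=direct (O_DIRECT is not supported on every filesystem):

```shell
#!/bin/sh
# Rough stand-in for the documentation's latency test: 1000 x 512-byte
# writes, forced out to stable storage at the end with conv=fsync.
TMPFILE=$(mktemp)
dd if=/dev/zero of="$TMPFILE" bs=512 count=1000 conv=fsync 2>/dev/null
# All 1000 sectors should have landed: 512 * 1000 = 512000 bytes.
wc -c < "$TMPFILE"
rm -f "$TMPFILE"
```

Timing this with and without conv=fsync gives a feel for how much of the write cost is the flush itself.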