[DRBD-user] massive latency increases from the slave with barrier or flush enabled

Thu Jun 23 11:49:52 CEST 2011

Hi folks

I need a bit of help understanding an issue we're seeing with our DRBD 
setup. We have two Debian systems running DRBD 8.3.8. We have a resource 
configured with protocol A, backed by a single SATA disk on each 
side.

I am running the latency test from the DRBD documentation in a loop:

while true; do dd if=/dev/zero of=$TEST_DEVICE bs=512 count=1000 oflag=direct; sync; done

If I stop DRBD and run the test against the backing device on each side, 
it repeats again and again very quickly. If I start DRBD and run it 
against the DRBD device on the primary side, it runs quickly for about 4 
or 5 repetitions, then slows right down. Investigation shows that the 
pe: (pending) counter in /proc/drbd is climbing, and iostat -mxd 1 on the 
secondary node shows 100% disk utilisation for the backing device. The 
same thing happens in the other direction if I swap the primary and 
secondary roles: only the node in the secondary role sees 100% disk 
utilisation, never the primary.
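
For anyone wanting to watch the same counter, here is a minimal sketch of 
a helper that pulls the pe: values out of /proc/drbd. The function name 
drbd_pe is made up for illustration, and it assumes the DRBD 8.3 layout 
where counters appear as whitespace-separated name:value tokens:

```shell
#!/bin/sh
# Print the value of every "pe:" (pending) field found on stdin.
# Assumes the DRBD 8.3 /proc/drbd layout, where counters appear as
# whitespace-separated name:value tokens such as "pe:12".
# drbd_pe is a hypothetical helper name, not part of DRBD.
drbd_pe() {
    awk '{
        for (i = 1; i <= NF; i++)
            if ($i ~ /^pe:[0-9]+$/) {
                sub(/^pe:/, "", $i)   # strip the field name, keep the number
                print $i
            }
    }'
}

# Example use on a DRBD node (left as a comment so the file only defines
# the helper when sourced):
#   while sleep 1; do drbd_pe < /proc/drbd; done
```

Running the commented loop on the primary while the dd test is going 
should print one value per configured device each second, which makes 
the climb easy to see next to the iostat output from the secondary.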

When I use no-disk-barrier and no-disk-flushes, the problem goes away 
entirely - but I'm reluctant to leave these options set permanently, as 
the disks are just normal SATA drives without any battery backup or 
anything, and there are scary warnings about doing that in the 
documentation :-)
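
For reference, in DRBD 8.3 configuration syntax these two options go in 
the disk section of the resource. A minimal sketch, where the resource 
name r0 is a placeholder and not the actual name used here, and the rest 
of the resource definition is left out:

```
resource r0 {
  protocol A;
  disk {
    # Both options turn off write-ordering safeguards on the backing
    # device; this is the setting the documentation warns about for
    # disks without a battery-backed write cache.
    no-disk-barrier;
    no-disk-flushes;
  }
  # (device, disk, address and meta-disk settings omitted)
}
```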

Can someone recommend the appropriate action I should take to improve my 
performance please? Let me know if more information or debugging would 
be helpful.

Thanks!

Phil