eBoundHost: Artur
Thu Sep 25 02:05:21 CEST 2008

Hi All, 
| Putting together a system to compete with netapp and bluearc. I say 90% of it can be done with drbd and supermicro + LVM + ext3. 
| Problem: slow write when server1 and server2 are connected. 
| Here are my benchmarks: 
| dd if=/dev/zero bs=4096 count=10000 oflag=dsync of=/data/file1 
| DRBD Active and connected: 
| 40960000 bytes (41 MB) copied, 24.8387 seconds, 1.6 MB/s 
| DRBD Active and NOT connected: 
| 40960000 bytes (41 MB) copied, 10.8259 seconds, 3.8 MB/s 
| File system partition, NON DRBD, same disk: 
| 40960000 bytes (41 MB) copied, 10.3686 seconds, 4.0 MB/s 
| These are very consistent and repeatable. 
I am in no way surprised by these results. First, note that you are writing out 4K blocks and forcing a sync after each one. The non-DRBD results already show that this is a slow operation: each time you write 4K, you have to wait for it to sync before the next 4K can be written. 

The problem with forcing a sync on DRBD is that you introduce network latency into the 'sync' pipeline. It writes 4K, sends a sync, and then you have to wait for that to go over the network, get synced on the peer, and come back before it can move on to the next 4K block. 
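To put rough numbers on that, here is a small sketch that turns the dd timings quoted above into a per-write latency. The variable names are my own, and it assumes the whole difference between the connected and disconnected runs comes from the replication round trip, which is a simplification.

```python
# Timings taken from the dd runs quoted above (10000 writes of 4096 bytes each).
BLOCKS = 10000
BLOCK_SIZE = 4096

runs_seconds = {
    "drbd_connected": 24.8387,
    "drbd_disconnected": 10.8259,
    "plain_partition": 10.3686,
}


def per_write_ms(total_seconds, blocks=BLOCKS):
    """Average time for one synced write, in milliseconds."""
    return total_seconds / blocks * 1000.0


latency_ms = {name: per_write_ms(sec) for name, sec in runs_seconds.items()}

# Assumption: the extra time in the connected case is the network round trip
# plus the sync on the peer; nothing else changed between the two runs.
replication_overhead_ms = (
    latency_ms["drbd_connected"] - latency_ms["drbd_disconnected"]
)

for name, ms in latency_ms.items():
    mb_per_s = BLOCK_SIZE / (ms / 1000.0) / 1e6
    print(f"{name}: {ms:.2f} ms per write, {mb_per_s:.1f} MB/s")
print(f"extra per write when connected: {replication_overhead_ms:.2f} ms")
```

So the local disk already costs about 1 ms per synced 4K write, and being connected adds roughly another 1.4 ms on top of each one.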

There isn't a whole lot you can do about this. That said, I'm not sure there is a 'fantastic' benchmark for real workload situations anyway. 
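If you want a slightly broader picture than a single 4K run, one option is to repeat the same synced-write test at several block sizes and see how the throughput changes. This is only a sketch: `synced_write_rate` is a hypothetical helper of mine, not part of DRBD or dd. It opens the file with `O_DSYNC` (what dd's `oflag=dsync` uses), falling back to an `fsync` per write where that flag is missing, and the target path has to sit on the device you actually want to measure.

```python
import os
import tempfile
import time

# O_DSYNC is not defined on every platform; fall back to fsync() per write.
HAS_DSYNC = hasattr(os, "O_DSYNC")


def synced_write_rate(path, block_size, total_bytes):
    """Write total_bytes in block_size chunks, syncing each one; return MB/s."""
    flags = os.O_WRONLY | os.O_CREAT | os.O_TRUNC
    if HAS_DSYNC:
        flags |= os.O_DSYNC
    buf = bytes(block_size)  # zero-filled, like reading from /dev/zero
    fd = os.open(path, flags, 0o600)
    written = 0
    start = time.perf_counter()
    try:
        while written < total_bytes:
            written += os.write(fd, buf)
            if not HAS_DSYNC:
                os.fsync(fd)
    finally:
        os.close(fd)
    elapsed = max(time.perf_counter() - start, 1e-9)
    return written / elapsed / 1e6


if __name__ == "__main__":
    # Assumption: a temp directory stands in for the real mount point here.
    # Point target_dir at the DRBD-backed filesystem for a meaningful result.
    target_dir = tempfile.mkdtemp()
    target = os.path.join(target_dir, "synctest.bin")
    for size in (4096, 65536, 1048576):
        rate = synced_write_rate(target, size, 1024 * 1024)
        print(f"bs={size}: {rate:.1f} MB/s")
    os.remove(target)
    os.rmdir(target_dir)
```

If the per-sync round trip is what dominates, the larger block sizes should show much better throughput, since each sync then covers more data.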

You could try some crazy things to reduce your latency: remove switches, try better drivers and so on (does that really work!?). Ultimately, though, I think you will need to stop doing 4K synced writes, or get some low-latency interconnect gear, which I suspect is rather expensive: 

Something about this seem silly/wrong to you? 

I don't know if it's silly; it seems to be a case of the faster the better! Once I have a better handle on this, we're going to be bonding 3 or 4 interfaces to remove a bottleneck. Who knows, a rack full of hard drives may be just the thing for Dolphin. I think this is very cool and necessary to take DRBD to the enterprise. 

To get back to what you're saying about the latency, take a look at this: 

To my understanding we should be lagging by some percentage, but not by such a margin. 

