On Dec 20, 2007 1:01 PM, Lars Ellenberg <lars.ellenberg at linbit.com> wrote:
>
> On Thu, Dec 20, 2007 at 11:08:56AM -0800, Art Age Software wrote:
> > On Dec 20, 2007 3:05 AM, Lars Ellenberg <lars.ellenberg at linbit.com> wrote:
> > > On Wed, Dec 19, 2007 at 04:41:37PM -0800, Art Age Software wrote:
> > > > I have run some additional tests:
> > > >
> > > > 1) Disabled bonding on the network interfaces (both nodes). No
> > > > significant change.
> > > >
> > > > 2) Changed the DRBD communication interface. Was using a direct
> > > > crossover connection between the on-board NICs of the servers. I
> > > > switched to Intel Gigabit NIC cards in both machines, connecting
> > > > through a Gigabit switch. No significant change.
> > > >
> > > > 3) Ran a file copy from node1 to node2 via scp. Even with the
> > > > additional overhead of scp, I get a solid 65 MB/sec. throughput.
> > >
> > > this is streaming.
> > > completely different than what we measured below.
> > >
> > > > So, at this stage I have seemingly ruled out:
> > > >
> > > > 1) Slow IO subsystem (both machines measured and check out fine).
> > > >
> > > > 2) Bonding driver (additional latency)
> > > >
> > > > 3) On-board NICs (hardware/firmware problem)
> > > >
> > > > 4) Network copy speed.
> > > >
> > > > What's left? I'm stumped as to why DRBD can only do about 3.5 MB/sec.
> > > > on this very fast hardware.
> > >
> > > doing one-by-one synchronous 4k writes, which are latency bound.
> > > if you do streaming writes, it probably gets up to your 65 MB/sec again.
> >
> > Ok, but we have tested that with and without DRBD by the dd command,
> > right? So at this point, by all tests performed so far, it looks like
> > DRBD is the bottleneck. What other tests can I perform that can say
> > otherwise?
>
> sure.
> but comparing 3.5 (with drbd) against 13.5 (without drbd) is bad enough,
> no need to now compare it with some streaming number (65) to make it
> look _really_ bad ;-)

Sorry, my intent was not to make DRBD look bad. I think DRBD is
**fantastic** and I just want to get it working properly.

My point in trying the streaming test was simply to make sure that
there was nothing totally broken on the network side. I suppose I
should also try a streaming test to the DRBD device and compare that
to the raw streaming number.

And, back to my last question: what other tests can I perform at this
point to narrow down the source of the (latency?) problem?
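One way to separate the latency-bound case from the streaming case from userspace is a small timing script. This is a sketch under stated assumptions, not something proposed in the thread: it is Python, the file name drbd-latency-test.bin is hypothetical, and calling fsync after every 4 KiB write only approximates the "one-by-one synchronous 4k writes" pattern described above. The idea would be to run it once against a directory on a filesystem backed by the DRBD device and once against one on the raw backing disk (never against a raw device that holds data), then compare the two pairs of numbers.

```python
import os
import sys
import tempfile
import time

BLOCK = 4096  # 4 KiB, the write size discussed in this thread


def _open_for_write(path):
    # Create or truncate the (hypothetical) test file, owner read/write only.
    return os.open(path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o600)


def sync_write_throughput(path, count=256, block=BLOCK):
    """Bytes/s when each block must reach stable storage before the next
    one is issued, so throughput is bounded by per-write latency."""
    buf = b"\0" * block
    fd = _open_for_write(path)
    try:
        start = time.perf_counter()
        for _ in range(count):
            os.write(fd, buf)
            os.fsync(fd)  # wait for this block before writing the next one
        elapsed = time.perf_counter() - start
    finally:
        os.close(fd)
    return count * block / max(elapsed, 1e-9)


def streaming_write_throughput(path, count=256, block=BLOCK):
    """Bytes/s when the same data is written back to back and synced once."""
    buf = b"\0" * block
    fd = _open_for_write(path)
    try:
        start = time.perf_counter()
        for _ in range(count):
            os.write(fd, buf)
        os.fsync(fd)  # a single flush at the end
        elapsed = time.perf_counter() - start
    finally:
        os.close(fd)
    return count * block / max(elapsed, 1e-9)


def per_write_latency_ms(bytes_per_s, block=BLOCK):
    """Implied milliseconds per write if writes are strictly one at a time."""
    return block / bytes_per_s * 1000.0


if __name__ == "__main__":
    # Hypothetical usage: pass a directory on the filesystem under test;
    # defaults to the system temp directory.
    target = sys.argv[1] if len(sys.argv) > 1 else tempfile.gettempdir()
    path = os.path.join(target, "drbd-latency-test.bin")  # hypothetical name
    try:
        sync_bps = sync_write_throughput(path)
        stream_bps = streaming_write_throughput(path)
    finally:
        if os.path.exists(path):
            os.remove(path)
    print(f"sync 4k:   {sync_bps / 1e6:.2f} MB/s, "
          f"~{per_write_latency_ms(sync_bps):.3f} ms per write")
    print(f"streaming: {stream_bps / 1e6:.2f} MB/s")
```

The helper per_write_latency_ms turns a throughput figure into the implied time per write. Assuming decimal megabytes and strictly serialized 4 KiB writes, per_write_latency_ms(3.5e6) is about 1.17 ms and per_write_latency_ms(13.5e6) about 0.30 ms for the two numbers quoted above. If the synchronous run on DRBD is much slower than on the raw disk while the two streaming runs are close, the extra time most likely sits in the per-write round trip rather than in bandwidth.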