[DRBD-user] help? 8.x slow... even in standalone ?

Tom Brown wc-linbit.com at vmail.baremetal.com
Sat May 10 19:59:07 CEST 2008



It seems I MUST be doing something absolutely idiotic, but I'm
darned if I know what.

I've been running some very crude performance tests of drbd on some of our
centos 5.1 boxes, since I got a couple of rude shocks...

The short of the story is that the 8.0 and 8.2 modules are SLOW,
and I was getting crashes that seemed to implicate the 0.7.x
module on stock centos kernels (so I can't use 0.7 there).

There clearly seems to be something amiss, as my tests seem to
show 8.x being slower in standalone mode than it is when
connected?!


gt7 -> gt2 testing...  (i386 -> x86_64, centos kernels on both ends)

the primary/source node with the mounted filesystem is a current centos 5.1
box...

The command I was using for rough estimate of write throughput
was:
   sync; time sh -c "dd if=/dev/zero of=BIG bs=4096 count=256000; sync"

and then record the "real" (wall clock) time. Basically I'm
timing how long it takes to dump almost a gig of data to disk...
The backing storage on both ends is a raid-0 stripe, which should
be capable of something close to 120 MByte/s, and with 0.7.25 in
standalone we do get approx. 100 MByte/s, and 68 MByte/s over the
wire... which is fine, but a home-compiled 0.7 was implicated in
kernel crashes (stock centos xen kernel) after only a few hours
of running... and the 8.x modules seem to cause a performance
problem...

Here are the rough numbers (wall-clock seconds) for the various
drbd versions and network protocols (A/B/C), all for dumping the
gig to the target through an ext3 filesystem:

drbd version: 8.2.5
  A      B        C    standalone
27.034  26.360 27.981  32.110
27.106  27.504 27.486  30.797
28.251  26.750 27.648  30.974
         26.049
         26.772

drbd version: 0.7.25
  A      B        C    standalone
15.426  15.545 15.697  10.872
15.649  15.511 15.055  10.681
15.735  15.334 15.153  10.157
         15.407

drbd version: 8.0.12
  A      B        C    standalone
28.489  27.956 26.684  30.695
28.166  27.382 27.465  30.480
                27.105  29.861

OK, let's try replacing the local (raid0) LVM volume with a raw disk:

8.0.12
  A      B        C    standalone  BareMetal
                40.784 45.999       18.106
                41.078 46.058       17.052
                                    17.030

So it doesn't seem to be some strange interaction with LVM, and
the node used for testing in this case is NOT the xen-enabled one
that I originally had the trouble with...

Anyone have any ideas? Why would standalone be worse than
connected? I used the exact same config files for the various
modules, just changing the replication protocol (A/B/C), but
that didn't make much difference...
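One sanity check worth spelling out (a suggestion, not something from the
runs above): before timing a "standalone" run, make sure the connection
state in /proc/drbd really reads StandAlone. The state line below is
sample text in the drbd 8.x format, not output from these boxes:

```shell
# Hypothetical /proc/drbd state line; on a real box this would come
# from `cat /proc/drbd` after `drbdadm disconnect drbd-test`.
line=" 0: cs:StandAlone st:Primary/Unknown ds:UpToDate/DUnknown   r---"
case "$line" in
  *cs:StandAlone*) state=standalone ;;
  *cs:Connected*)  state=connected ;;
  *)               state=other ;;
esac
echo "$state"                                  # prints "standalone"
```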

Obviously there are tweaks that can be done (moving the metadata
around, etc.) to speed things up, but why would the 8.x versions
be SO much slower than the 0.7.x versions?
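For the record, the sort of tweaks meant here might look like this in
drbd.conf (a sketch only, untested on these boxes; the /dev/sdd device
and the al-extents value are illustrative):

```
resource drbd-test {
    syncer {
        al-extents       1021;          # bigger activity log, fewer metadata updates
    }
    on gt7.baremetal.com {
        meta-disk        /dev/sdd[0];   # external metadata on a separate spindle
    }
}
```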


    [root at gt7 mnt]# uname -a
    Linux gt7.baremetal.com 2.6.18-53.1.19.el5 #1 SMP Wed May 7
    08:20:19 EDT 2008 i686 i686 i386 GNU/Linux

    [root at gt7 mnt]# more /etc/drbd.conf
    global {
       minor-count 5;
    }

    resource drbd-test {
        protocol               C;
        on gt7.baremetal.com {
 	   device           /dev/drbd0;
 	   #disk             /dev/vg1/drbd-test;
 	   disk             /dev/sdc;
 	   address          192.168.90.17:7790;
 	   meta-disk        internal;
        }
        on gt2.baremetal.com {
 	   device           /dev/drbd0;
 	   disk             /dev/vg1/drbd-test;
 	   address          192.168.90.15:7790;
 	   meta-disk        internal;
        }
        disk {
 	   on-io-error      detach;
        }
        syncer {
 	   rate             40M;
 	   al-extents       257;
        }
        startup {
 	   degr-wfc-timeout 120;
        }
    }



Help/thoughts? If I hadn't had very good experiences with 0.7.x
I'd just throw up my hands and walk away, but I've been running
0.7.23 on at least one node for ages and it's been great.

Oh, this is sort of funny: I got suspicious that my transfer
speeds were close to that syncer rate, so I cranked the rate up
to 100M. It didn't change the filesystem write performance, but
when I disconnected and then reconnected the secondary and
primary, I was seeing approximately 90 MByte/s being written to
the disk of the secondary... which reminds me: these nodes are
connected via a gigabit ethernet switch (cisco WS-C2960G-24TC-L).
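That behavior matches my understanding of the syncer rate: it caps
background resync traffic only and doesn't throttle normal replicated
writes, which would explain why raising it changed the resync speed but
not the filesystem numbers. Restating the relevant knob (same resource
as the config above):

```
syncer {
    rate             100M;   # caps background resync bandwidth only;
                             # normal replicated writes are not limited by this
}
```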

Hmm, it doesn't seem to be an ext3 issue either...

    [root at gt7 /]# sync; time sh -c "dd if=/dev/zero of=/dev/drbd0
       bs=4096 count=256000; sync"

    real    0m25.849s

(8.0.12, protocol C, so ext3 would have been approx. 27 seconds,
still nowhere near as fast as the 15 seconds of 0.7.25)

Oh, except that if you run it twice, you get radically different
results...

    real    0m13.537s
    real    0m13.721s

...


-Tom



