It seems I MUST be doing something absolutely idiotic, but I'm darned if I know what. I've been running some very crude performance tests of drbd on some of our CentOS 5.1 kernels, and I got a couple of rude shocks... The short of the story is that the 8.0 and 8.2 modules are SLOW, and I was getting crashes that seemed to implicate the 0.7.x module on stock CentOS kernels (so I can't use that). There clearly seems to be something amiss, as my tests seem to show 8.x as being slower in standalone mode than it is when connected?!

gt7 -> gt2 testing (i386 -> x86_64, CentOS kernels on both ends); the primary/source node with the mounted filesystem is a current CentOS 5.1 box. The command I was using for a rough estimate of write throughput was:

  sync; time sh -c "dd if=/dev/zero of=BIG bs=4096 count=128000; sync"

and then I record the "real" (i.e. wall clock) time. Basically I'm timing how long it takes to dump about half a gig of data (4096 * 128000 = ~500 MB) to disk.

The backing storage on both ends is a RAID-0 stripe, which should be capable of handling something close to 120 MByte/s, and with 0.7.25 we do get approx 100 MByte/s in standalone and 68 MByte/s over the wire... which is fine, but a home-compiled 0.7 was implicated in kernel crashes (CentOS stock xen kernel) after only a few hours running... and the 8.x modules seem to cause a performance problem.

Here are the rough numbers for the various drbd versions and network protocols, all for dumping the data to the target (through an ext3 filesystem). Times are in seconds (wall clock), one row per run:
drbd version: 8.2.5

     A        B        C      standalone
  27.034   26.360   27.981    32.110
  27.106   27.504   27.486    30.797
  28.251   26.750   27.648    30.974
  26.049   26.772

drbd version: 0.7.25

     A        B        C      standalone
  15.426   15.545   15.697    10.872
  15.649   15.511   15.055    10.681
  15.735   15.334   15.153    10.157
  15.407

drbd version: 8.0.12

     A        B        C      standalone
  28.489   27.956   26.684    30.695
  28.166   27.382   27.465    30.480
  27.105   29.861

OK, let's try replacing the local (raid0) LVM volume with a raw disk:

drbd version: 8.0.12

     A      standalone   BareMetal
  40.784    45.999       18.106
  41.078    46.058       17.052
                         17.030

So it doesn't seem to be some strange interaction with LVM, and the node used for testing in this case is NOT the Xen-enabled one that I originally had the trouble with.

Anyone have any ideas? Why would standalone be worse than connected? I used the exact same config files for the various different modules, just changing the protocol versions, but that didn't make much difference. Obviously there are tweaks that can be done (moving the metadata around, etc.) to speed things up, but why would the 8.x versions be SO much slower than the 0.7.x versions?

[root@gt7 mnt]# uname -a
Linux gt7.baremetal.com 2.6.18-53.1.19.el5 #1 SMP Wed May 7 08:20:19 EDT 2008 i686 i686 i386 GNU/Linux

[root@gt7 mnt]# more /etc/drbd.conf
global { minor-count 5; }
resource drbd-test {
    protocol C;
    on gt7.baremetal.com {
        device    /dev/drbd0;
        #disk     /dev/vg1/drbd-test;
        disk      /dev/sdc;
        address   192.168.90.17:7790;
        meta-disk internal;
    }
    on gt2.baremetal.com {
        device    /dev/drbd0;
        disk      /dev/vg1/drbd-test;
        address   192.168.90.15:7790;
        meta-disk internal;
    }
    disk    { on-io-error detach; }
    syncer  { rate 40M; al-extents 257; }
    startup { degr-wfc-timeout 120; }
}

Help/thoughts? If I hadn't had very good experiences with 0.7.x I'd just throw up my hands and walk away, but I've been running 0.7.23 on at least one node for ages and it's been great.
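[Editor's note] The wall-clock times above can be converted into MByte/s with a small wrapper around the same dd recipe. A sketch under assumptions: the target path and the small default write size are placeholders, not from the original post, and GNU date/awk are assumed.

```shell
#!/bin/sh
# Sketch of the write-throughput test above, reporting MByte/s instead of
# the raw "real" time.  For a real run, point the target at a file on the
# mounted drbd filesystem (e.g. /mnt/BIG) and use the post's byte counts.
write_mbps() {
    target=$1
    bs=$2
    count=$3
    sync
    start=$(date +%s.%N)
    dd if=/dev/zero of="$target" bs="$bs" count="$count" 2>/dev/null
    sync
    end=$(date +%s.%N)
    # bytes / wall-clock seconds / 1e6 -> MByte/s (same arithmetic as
    # timing `sh -c "dd ...; sync"` and dividing by the "real" time)
    awk -v b=$((bs * count)) -v s="$start" -v e="$end" \
        'BEGIN { d = e - s; if (d <= 0) d = 0.001; printf "%.1f MByte/s\n", b / d / 1e6 }'
}

# The ~500 MB ext3 test from the post would be:
#   write_mbps /mnt/BIG 4096 128000
# The default below is a tiny 10 MB write to a scratch file so the sketch
# runs anywhere; the path is a placeholder.
write_mbps "${TARGET:-/tmp/drbd-write-probe}" 4096 2500
rm -f "${TARGET:-/tmp/drbd-write-probe}"
```

For reference, the 0.7.25 standalone rows (~10.5 s) and the 8.x rows (~27-32 s) differ by roughly a factor of three on the same arithmetic.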
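[Editor's note] Since the `syncer { rate 40M; }` line in the config above caps resync traffic, it can be raised for experiments without a restart. A sketch, assuming the drbd 8.x drbdsetup/drbdadm syntax; the 100M figure is just an example, not from the original post.

```shell
# Temporarily raise the resync rate cap on minor 0 (drbd 8.x syntax);
# this governs resynchronization traffic, not normal replication writes.
drbdsetup /dev/drbd0 syncer -r 100M

# Or edit drbd.conf (syncer { rate 100M; ... }) and have drbd re-read it
# for the resource defined above:
drbdadm adjust drbd-test
```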
Oh, this is sort of funny: I got suspicious that my transfer speeds were close to that sync rate, so I cranked the sync rate up to 100M. It didn't change the filesystem write performance, but when I disconnected and then reconnected the secondary and primary, I was getting approximately 90 MByte/s being written to the disk of the secondary... which reminds me, these nodes are connected via a gigabit ethernet switch (Cisco WS-C2960G-24TC-L).

Hmm, it doesn't seem to be an ext3 issue either:

  [root@gt7 /]# sync; time sh -c "dd if=/dev/zero of=/dev/drbd0 bs=4096 count=256000; sync"
  real 0m25.849s

(That's 8.0.12, protocol C, so through ext3 it would have been approx 27 seconds, still nowhere near as fast as the 15 seconds of 0.7.25.) Oh, except that if you run it twice, you get radically different results:

  real 0m13.537s
  real 0m13.721s

...

-Tom
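[Editor's note] The run-to-run swing in those last numbers looks like a page-cache effect. A sketch of one way to make the variance visible (my framing, not from the original post): repeat the dd test several times, dropping the page cache between runs when possible so each pass starts cold. The scratch-file default and paths are placeholders; /proc/sys/vm/drop_caches needs root and a 2.6.16+ kernel, so it also works on the CentOS 5.1 2.6.18 kernel above.

```shell
#!/bin/sh
# Repeat the dd-plus-sync test `runs` times against `target`, printing the
# wall-clock time of each pass so outliers stand out.
repeat_probe() {
    target=$1
    count=$2
    runs=$3
    i=1
    while [ "$i" -le "$runs" ]; do
        sync
        # Best effort: silently skip the cache drop if we lack permission.
        { echo 3 > /proc/sys/vm/drop_caches; } 2>/dev/null || :
        start=$(date +%s.%N)
        dd if=/dev/zero of="$target" bs=4096 count="$count" 2>/dev/null
        sync
        end=$(date +%s.%N)
        awk -v s="$start" -v e="$end" -v n="$i" \
            'BEGIN { printf "run %d: %.3f s\n", n, e - s }'
        i=$((i + 1))
    done
}

# The raw-device test from the post would be:
#   repeat_probe /dev/drbd0 256000 3
# The default below writes a tiny scratch file so the sketch runs anywhere.
repeat_probe "${TARGET:-/tmp/drbd-repeat-probe}" 2500 2
rm -f "${TARGET:-/tmp/drbd-repeat-probe}"
```

If the times converge once caches are dropped, the 25.8 s vs 13.5 s gap above was the cache, not drbd.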