Note: "permalinks" may not be as permanent as we would like,
direct links of old sources may well be a few messages off.
Hello,
I have a mailstore that is about 1.2TB in size that I need to migrate from one
set of storage to another. To avoid a long period of downtime while the data
copies, I am planning to use DRBD to replicate the bulk of the data while the
mail system stays online. I believe I have the process for this down, but I am
concerned about the performance loss I am seeing when DRBD is enabled.
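For context, the rough sequence I have in mind is below. The exact drbdadm
invocations may differ for the DRBD version I end up running, so please read
this as a sketch rather than the final runbook:
====================
# bring the resource up on both nodes
$ drbdadm up r0

# promote the node that currently serves mail; DRBD then syncs the full
# 1.2TB to the new storage in the background while mail keeps running
serverA$ drbdadm primary r0

# mount the DRBD device where the mailstore lives (/repl in my test setup)
serverA$ mount /dev/drbd0 /repl

# watch connection state and resync progress
$ cat /proc/drbd
====================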
My test setup is two Dell PE2950 servers, each with a fully populated MD1000
array attached: fifteen 300GB 10K drives in a RAID10 configuration. The servers
are on a 1Gbit network and plugged into the same HP switch, which is not being
used for anything else at the moment.
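Before pointing the finger at DRBD I want to confirm that the link itself can
carry close to line rate between the two boxes; this is the check I have in
mind (that iperf is installed on both servers is an assumption on my part):
====================
# on serverB, start the listener
serverB$ iperf -s

# on serverA, run a 30-second TCP test against serverB's replication address
serverA$ iperf -c 10.103.5.151 -t 30
====================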
I am using bonnie++ to test performance with the following command; -s0 -f
skips the sequential block and per-character I/O tests, so what is measured is
small-file create/read/delete (256*1024 files of 1k to 10k bytes spread over
128 directories), which is close to a mailstore workload:
$ bonnie++ -u root -d /repl/ -s0 -n 256:10k:1k:128 -f
When run on each of the servers, here is what I get:
===== SERVER A =====
Version 1.03      ------Sequential Create------ --------Random Create--------
serverA           -Create-- --Read--- -Delete-- -Create-- --Read--- -Delete--
   files:max       /sec %CP  /sec %CP  /sec %CP  /sec %CP  /sec %CP  /sec %CP
    256:10:0      22818  67 35094  39 22575  55 19946  57 36000  38 15274  42
====================
===== SERVER B =====
Version 1.03      ------Sequential Create------ --------Random Create--------
serverB           -Create-- --Read--- -Delete-- -Create-- --Read--- -Delete--
   files:max       /sec %CP  /sec %CP  /sec %CP  /sec %CP  /sec %CP  /sec %CP
    256:10:0      22733  65 35465  40 22735  54 20955  60 36114  39 15237  42
====================
When set up for replication from server A to server B, here is what I get:
====================
Version 1.03      ------Sequential Create------ --------Random Create--------
serverA           -Create-- --Read--- -Delete-- -Create-- --Read--- -Delete--
   files:max       /sec %CP  /sec %CP  /sec %CP  /sec %CP  /sec %CP  /sec %CP
    256:10:0      19270  57 18322  34 13527  43 18790  56 18523  31  8223  34
====================
My drbd.conf file looks like this:
====================
resource r0 {
  protocol C;
  incon-degr-cmd "echo 'DRBD: pri on incon-degr' | wall ; sleep 10";

  startup { wfc-timeout 0; degr-wfc-timeout 120; }
  disk    { on-io-error detach; }

  net {
    # max-buffers    20480;
    # max-epoch-size 16384;
    sndbuf-size 512K;
  }

  syncer {
    rate 512M;
    group 1;
    al-extents 1024;
  }

  on serverA {
    device    /dev/drbd0;
    disk      /dev/vg1/repl;
    address   10.103.5.150:7788;
    meta-disk /dev/vg0/drbd-meta[0];
  }

  on serverB {
    device    /dev/drbd0;
    disk      /dev/vg1/repl;
    address   10.103.5.151:7788;
    meta-disk /dev/vg0/drbd-meta[0];
  }
}
====================
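While bonnie++ runs against the replicated device I also plan to watch the
disks and the DRBD connection on both nodes; iostat here assumes the sysstat
package is installed:
====================
# extended per-device stats every 5 seconds (watch utilization and await
# on the backing devices on both nodes)
$ iostat -x 5

# DRBD connection state, pending/unacked counters and resync activity
$ cat /proc/drbd
====================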
On the DRBD side I have tried the different protocols as well as various other
settings, but nothing seems to have much impact. So I am curious what the best
next steps would be for finding the bottleneck. Thanks,
robert