Note: "permalinks" may not be as permanent as we would like,
direct links of old sources may well be a few messages off.
Hello,
I have a mailstore that is about 1.2TB in size that I need to migrate from one
set of storage to another. To avoid a long period of downtime while the data
copies, I am planning to use DRBD to replicate the bulk of the data while the
mail system stays online. I believe I have the process for this down, but I am
concerned about the performance loss I am seeing when DRBD is enabled.
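For context, the rough sequence I have in mind is below. The exact drbdadm
invocations may differ for the DRBD version I end up running, so please read
this as a sketch rather than the final runbook:
====================
# bring the resource up on both nodes
$ drbdadm up r0

# promote the node that currently serves mail; DRBD then syncs the full
# 1.2TB to the new storage in the background while mail keeps running
serverA$ drbdadm primary r0

# mount the DRBD device where the mailstore lives (/repl in my test setup)
serverA$ mount /dev/drbd0 /repl

# watch connection state and resync progress
$ cat /proc/drbd
====================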
My test setup is two Dell PE2950 servers, each with a fully populated MD1000
array attached: fifteen 300GB 10K drives in a RAID10 configuration. The servers
are on a 1Gbit network and plugged into the same HP switch, which is not being
used for anything else at the moment.
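Before pointing the finger at DRBD I want to confirm that the link itself can
carry close to line rate between the two boxes; this is the check I have in
mind (that iperf is installed on both servers is an assumption on my part):
====================
# on serverB, start the listener
serverB$ iperf -s

# on serverA, run a 30-second TCP test against serverB's replication address
serverA$ iperf -c 10.103.5.151 -t 30
====================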
I am using bonnie++ to test performance with the following command; -s0 -f
skips the sequential block and per-character I/O tests, so what is measured is
small-file create/read/delete (256*1024 files of 1k to 10k bytes spread over
128 directories), which is close to a mailstore workload:
$ bonnie++ -u root -d /repl/ -s0 -n 256:10k:1k:128 -f
When run on each of the servers, here is what I get:
===== SERVER A =====
Version 1.03      ------Sequential Create------ --------Random Create--------
serverA           -Create-- --Read--- -Delete-- -Create-- --Read--- -Delete--
   files:max       /sec %CP  /sec %CP  /sec %CP  /sec %CP  /sec %CP  /sec %CP
    256:10:0      22818  67 35094  39 22575  55 19946  57 36000  38 15274  42
====================
===== SERVER B =====
Version 1.03      ------Sequential Create------ --------Random Create--------
serverB           -Create-- --Read--- -Delete-- -Create-- --Read--- -Delete--
   files:max       /sec %CP  /sec %CP  /sec %CP  /sec %CP  /sec %CP  /sec %CP
    256:10:0      22733  65 35465  40 22735  54 20955  60 36114  39 15237  42
====================
When set up for replication from server A to server B, here is what I get:
====================
Version 1.03      ------Sequential Create------ --------Random Create--------
serverA           -Create-- --Read--- -Delete-- -Create-- --Read--- -Delete--
   files:max       /sec %CP  /sec %CP  /sec %CP  /sec %CP  /sec %CP  /sec %CP
    256:10:0      19270  57 18322  34 13527  43 18790  56 18523  31  8223  34
====================
My drbd.conf file looks like this:
====================
resource r0 {
  protocol C;
  incon-degr-cmd "echo 'DRBD: pri on incon-degr' | wall ; sleep 10";

  startup { wfc-timeout 0; degr-wfc-timeout 120; }
  disk    { on-io-error detach; }

  net {
    # max-buffers    20480;
    # max-epoch-size 16384;
    sndbuf-size 512K;
  }

  syncer {
    rate 512M;
    group 1;
    al-extents 1024;
  }

  on serverA {
    device    /dev/drbd0;
    disk      /dev/vg1/repl;
    address   10.103.5.150:7788;
    meta-disk /dev/vg0/drbd-meta[0];
  }

  on serverB {
    device    /dev/drbd0;
    disk      /dev/vg1/repl;
    address   10.103.5.151:7788;
    meta-disk /dev/vg0/drbd-meta[0];
  }
}
====================
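While bonnie++ runs against the replicated device I also plan to watch the
disks and the DRBD connection on both nodes; iostat here assumes the sysstat
package is installed:
====================
# extended per-device stats every 5 seconds (watch utilization and await
# on the backing devices on both nodes)
$ iostat -x 5

# DRBD connection state, pending/unacked counters and resync activity
$ cat /proc/drbd
====================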
On the DRBD side I have tried the different protocols as well as various other
settings, but nothing seems to have much impact. So I am curious what the best
next steps would be for finding the bottleneck. Thanks,
robert