Hi,
I'm using two identical machines, each with 32 GB of RAM, 16 Xeon
cores and a battery-backed hardware RAID-6 setup (Sun STK with an
Adaptec AAC controller). Both machines run Debian with Linux 2.6.24.3
and DRBD 8.0.11. They are connected through a dedicated Gigabit
Ethernet link (Intel 82571EB) with the MTU set to 9000.
The problem is this: the secondary node hits 100% disk utilisation
almost immediately when writing to the DRBD block device synchronously
(oflag=dsync). The result is a factor ~1000(!) performance drop
compared to streamed writes.
I've already tried several things, all with little to no effect: I've
enabled jumbo frames, pinned the DRBD threads to a single core and to
several cores within one CPU, increased and decreased the
unplug-watermark, and played around with some of the net settings.
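For completeness, here's roughly what those attempts looked like. This is a sketch, not a recommendation: the interface name, core number and thread names are from my setup (DRBD 8.0 names its kernel threads per device), and the config values shown are just examples of what I varied.

```shell
# Jumbo frames on the dedicated replication link (interface name assumed):
ip link set dev eth1 mtu 9000

# Pin the per-device DRBD kernel threads to one core (core number assumed;
# thread names as they appear in my process list for /dev/drbd0):
for t in drbd0_worker drbd0_receiver drbd0_asender; do
  taskset -pc 2 "$(pgrep -x "$t")"
done

# drbd.conf net-section values I varied between runs (examples):
#   net {
#     unplug-watermark 1024;   # tried values above and below the 128 default
#     sndbuf-size      524288;
#   }
```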
Here are some useful results from the tests I ran.
Ping both ways (the second node had the same result):
# ping -w 10 -f -s 4100 192.168.1.100
PING 192.168.1.100 (192.168.1.100) 4100(4128) bytes of data.
.
--- 192.168.1.100 ping statistics ---
21429 packets transmitted, 21428 received, 0% packet loss, time 10000ms
rtt min/avg/max/mdev = 0.231/0.444/0.700/0.090 ms, ipg/ewma 0.466/0.404 ms
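To rule out the network up front: with protocol C every synchronous write has to wait for at least one round trip to the peer, so the measured RTT alone puts a ceiling on fully serialized 4 KiB writes. A back-of-the-envelope check, using the average RTT from the ping output above:

```shell
# 0.444 ms average RTT -> max serialized round trips per second,
# and the resulting throughput ceiling for 4 KiB synchronous writes.
awk 'BEGIN {
  rtt_s = 0.444e-3
  printf "%.0f round trips/s -> %.1f MB/s ceiling at 4 KiB writes\n",
         1/rtt_s, 4096/rtt_s/1e6
}'
```

So even fully serialized, the link should sustain somewhere around 9 MB/s of 4 KiB synchronous writes, which is far above what I actually see on the connected device below. Network latency alone can't explain it.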
Streamed dd to the local filesystem:
# dd if=/dev/zero of=/tmp/testfile bs=4096 count=10000
10000+0 records in
10000+0 records out
40960000 bytes (41 MB) copied, 0.0920567 s, 445 MB/s
Synced dd to the local filesystem:
# dd if=/dev/zero of=/tmp/testfile bs=4096 count=10000 oflag=dsync
10000+0 records in
10000+0 records out
40960000 bytes (41 MB) copied, 5.39007 s, 7.6 MB/s
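For reference, that local dsync figure works out to a perfectly reasonable per-write latency (computed from the run above):

```shell
# 10000 synchronous 4 KiB writes in 5.39007 s:
awk 'BEGIN {
  n = 10000; s = 5.39007
  printf "%.0f writes/s, %.2f ms per write\n", n/s, 1000*s/n
}'
```

About half a millisecond per 4 KiB write is what I'd expect with the battery-backed controller cache absorbing the flushes, so the local I/O path looks healthy.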
Streamed dd to the disconnected DRBD block device:
# dd if=/dev/zero of=/mnt/test/testfile bs=4096 count=10000
10000+0 records in
10000+0 records out
40960000 bytes (41 MB) copied, 0.0769395 s, 532 MB/s
Synced dd to the disconnected DRBD block device:
# dd if=/dev/zero of=/mnt/test/testfile bs=4096 count=10000 oflag=dsync
10000+0 records in
10000+0 records out
40960000 bytes (41 MB) copied, 3.84082 s, 10.7 MB/s
Streamed dd to the connected DRBD block device:
# dd if=/dev/zero of=/mnt/test/testfile bs=4096 count=10000
10000+0 records in
10000+0 records out
40960000 bytes (41 MB) copied, 0.070894 s, 578 MB/s
Synced dd to the connected DRBD block device:
# dd if=/dev/zero of=/mnt/test/testfile bs=4096 count=10000 oflag=dsync
28+0 records in
28+0 records out
114688 bytes (115 kB) copied, 15.2326 s, 7.5 kB/s
I ^C'd it. As you can see, performance grinds to a halt when writing
to the block device with the dsync flag.
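To put a number on "grinds to a halt", from the interrupted run above:

```shell
# 28 synchronous 4 KiB writes in 15.2326 s, against a 0.444 ms average RTT:
awk 'BEGIN {
  n = 28; s = 15.2326; rtt = 0.444
  ms = 1000*s/n
  printf "%.1f writes/s, %.0f ms per write (~%.0fx the 0.444 ms network RTT)\n",
         n/s, ms, ms/rtt
}'
```

Each synchronous write on the connected device costs over half a second, more than a thousand times the network round trip, which is where the factor ~1000 I mentioned comes from.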
What worries me is the iostat output during the synced dd on the
secondary node. The primary node seems fine.
avg-cpu:  %user   %nice %system %iowait  %steal   %idle
           0.00    0.00    0.00    0.00    0.00  100.00

Device: rrqm/s wrqm/s  r/s  w/s rMB/s wMB/s avgrq-sz avgqu-sz await  svctm  %util
sda       0.00   6.00 5.00 5.00  0.00  0.04     8.80     1.00 92.00 100.00 100.00
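What that iostat line implies, numerically: each request is taking ~100 ms of device time on the secondary's sda (svctm), versus ~0.5 ms for the same kind of dsync write locally, so the disk saturates at a handful of IOPS:

```shell
# At 100 ms of service time per request, the device can complete at most
# 1000/100 = 10 requests/s -- and iostat shows exactly r/s + w/s = 10.
awk 'BEGIN {
  svctm = 100
  printf "at %.0f ms svctm the device completes at most %.0f requests/s; observed r/s + w/s = %.0f\n",
         svctm, 1000/svctm, 5+5
}'
```

In other words, sda on the secondary behaves as if every replicated write costs a full ~100 ms operation; at that service time, 100% utilisation at ~10 requests/s is exactly what the numbers show.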
Completely saturated, while achieving only ~7.5 kB/s of throughput.
When I switch roles (primary becomes secondary and vice versa) the
problem simply moves to the other node.
Some snippets of the configuration (from drbdsetup show):
disk {
    size 0s _is_default; # bytes
    on-io-error detach;
    fencing resource-only;
}
net {
    timeout 60 _is_default; # 1/10 seconds
    max-epoch-size 16000;
    max-buffers 16000;
    unplug-watermark 128 _is_default;
    connect-int 10 _is_default; # seconds
    ping-int 10 _is_default; # seconds
    sndbuf-size 524288; # bytes
    ko-count 0 _is_default;
    cram-hmac-alg "sha1";
    shared-secret :);
    after-sb-0pri disconnect _is_default;
    after-sb-1pri disconnect _is_default;
    after-sb-2pri disconnect _is_default;
    rr-conflict disconnect _is_default;
    ping-timeout 5 _is_default; # 1/10 seconds
}
syncer {
    rate 51200k; # bytes/second
    after -1 _is_default;
    al-extents 257;
}
protocol C;
_this_host {
    device "/dev/drbd0";
    disk "/dev/sda5";
    meta-disk internal;
    address 192.168.1.101:7788;
}
_remote_host {
    address 192.168.1.100:7788;
}
I've configured three DRBD devices in total, all using the same
parameters.
I hope you guys can help. I'm sorry for the lengthy post.
Thanks in advance,
Joris