Hi,
I'm using two identical machines, each with 32 GB of RAM, 16 Xeon
cores and a battery-backed hardware RAID-6 setup (Sun STK with an
Adaptec AAC controller). Both machines run Debian with Linux 2.6.24.3
and DRBD 8.0.11. They are connected through a dedicated Gigabit
Ethernet link (Intel 82571EB) with the MTU set to 9000.
The problem is this: the secondary node hits 100% disk utilisation
almost immediately when writing to the DRBD block device synchronously
(oflag=dsync). The result is a factor ~1000(!) performance drop
compared to streamed writes.
I've already tried several things, all with little to no effect: I've
enabled jumbo frames, pinned the DRBD threads to a single core and to
several cores within one CPU, increased and decreased the
unplug-watermark, and played around with some of the net settings.
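For completeness, here's roughly what those attempts looked like. This is a sketch, not a recommendation: the interface name, core number and thread names are from my setup (DRBD 8.0 names its kernel threads per device), and the config values shown are just examples of what I varied.

```shell
# Jumbo frames on the dedicated replication link (interface name assumed):
ip link set dev eth1 mtu 9000

# Pin the per-device DRBD kernel threads to one core (core number assumed;
# thread names as they appear in my process list for /dev/drbd0):
for t in drbd0_worker drbd0_receiver drbd0_asender; do
  taskset -pc 2 "$(pgrep -x "$t")"
done

# drbd.conf net-section values I varied between runs (examples):
#   net {
#     unplug-watermark 1024;   # tried values above and below the 128 default
#     sndbuf-size      524288;
#   }
```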
Here are some useful results from the tests I ran.
Ping both ways (the second node had the same result):
# ping -w 10 -f -s 4100 192.168.1.100
PING 192.168.1.100 (192.168.1.100) 4100(4128) bytes of data.
.
--- 192.168.1.100 ping statistics ---
21429 packets transmitted, 21428 received, 0% packet loss, time 10000ms
rtt min/avg/max/mdev = 0.231/0.444/0.700/0.090 ms, ipg/ewma 0.466/0.404 ms
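To rule out the network up front: with protocol C every synchronous write has to wait for at least one round trip to the peer, so the measured RTT alone puts a ceiling on fully serialized 4 KiB writes. A back-of-the-envelope check, using the average RTT from the ping output above:

```shell
# 0.444 ms average RTT -> max serialized round trips per second,
# and the resulting throughput ceiling for 4 KiB synchronous writes.
awk 'BEGIN {
  rtt_s = 0.444e-3
  printf "%.0f round trips/s -> %.1f MB/s ceiling at 4 KiB writes\n",
         1/rtt_s, 4096/rtt_s/1e6
}'
```

So even fully serialized, the link should sustain somewhere around 9 MB/s of 4 KiB synchronous writes, which is far above what I actually see on the connected device below. Network latency alone can't explain it.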
Streamed dd to the local filesystem:
# dd if=/dev/zero of=/tmp/testfile bs=4096 count=10000
10000+0 records in
10000+0 records out
40960000 bytes (41 MB) copied, 0.0920567 s, 445 MB/s
Synced dd to the local filesystem:
# dd if=/dev/zero of=/tmp/testfile bs=4096 count=10000 oflag=dsync
10000+0 records in
10000+0 records out
40960000 bytes (41 MB) copied, 5.39007 s, 7.6 MB/s
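For reference, that local dsync figure works out to a perfectly reasonable per-write latency (computed from the run above):

```shell
# 10000 synchronous 4 KiB writes in 5.39007 s:
awk 'BEGIN {
  n = 10000; s = 5.39007
  printf "%.0f writes/s, %.2f ms per write\n", n/s, 1000*s/n
}'
```

About half a millisecond per 4 KiB write is what I'd expect with the battery-backed controller cache absorbing the flushes, so the local I/O path looks healthy.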
Streamed dd to the disconnected DRBD block device:
# dd if=/dev/zero of=/mnt/test/testfile bs=4096 count=10000
10000+0 records in
10000+0 records out
40960000 bytes (41 MB) copied, 0.0769395 s, 532 MB/s
Synced dd to the disconnected DRBD block device:
# dd if=/dev/zero of=/mnt/test/testfile bs=4096 count=10000 oflag=dsync
10000+0 records in
10000+0 records out
40960000 bytes (41 MB) copied, 3.84082 s, 10.7 MB/s
Streamed dd to the connected DRBD block device:
# dd if=/dev/zero of=/mnt/test/testfile bs=4096 count=10000
10000+0 records in
10000+0 records out
40960000 bytes (41 MB) copied, 0.070894 s, 578 MB/s
Synced dd to the connected DRBD block device:
# dd if=/dev/zero of=/mnt/test/testfile bs=4096 count=10000 oflag=dsync
28+0 records in
28+0 records out
114688 bytes (115 kB) copied, 15.2326 s, 7.5 kB/s
I ^C'd it. As you can see, performance grinds to a halt when writing
to the block device with the dsync flag.
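To put a number on "grinds to a halt", from the interrupted run above:

```shell
# 28 synchronous 4 KiB writes in 15.2326 s, against a 0.444 ms average RTT:
awk 'BEGIN {
  n = 28; s = 15.2326; rtt = 0.444
  ms = 1000*s/n
  printf "%.1f writes/s, %.0f ms per write (~%.0fx the 0.444 ms network RTT)\n",
         n/s, ms, ms/rtt
}'
```

Each synchronous write on the connected device costs over half a second, more than a thousand times the network round trip, which is where the factor ~1000 I mentioned comes from.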
What worries me is the iostat output during the synced dd on the
secondary node. The primary node seems fine.
avg-cpu:  %user   %nice %system %iowait  %steal   %idle
           0.00    0.00    0.00    0.00    0.00  100.00

Device: rrqm/s wrqm/s  r/s  w/s rMB/s wMB/s avgrq-sz avgqu-sz await  svctm  %util
sda       0.00   6.00 5.00 5.00  0.00  0.04     8.80     1.00 92.00 100.00 100.00
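What that iostat line implies, numerically: each request is taking ~100 ms of device time on the secondary's sda (svctm), versus ~0.5 ms for the same kind of dsync write locally, so the disk saturates at a handful of IOPS:

```shell
# At 100 ms of service time per request, the device can complete at most
# 1000/100 = 10 requests/s -- and iostat shows exactly r/s + w/s = 10.
awk 'BEGIN {
  svctm = 100
  printf "at %.0f ms svctm the device completes at most %.0f requests/s; observed r/s + w/s = %.0f\n",
         svctm, 1000/svctm, 5+5
}'
```

In other words, sda on the secondary behaves as if every replicated write costs a full ~100 ms operation; at that service time, 100% utilisation at ~10 requests/s is exactly what the numbers show.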
Completely saturated, while achieving only ~7.5 kB/s of throughput.
When I switch roles (primary becomes secondary and vice versa) the
problem simply moves to the other node.
Some snippets of the configuration (from drbdsetup show):
disk {
    size 0s _is_default; # bytes
    on-io-error detach;
    fencing resource-only;
}
net {
    timeout 60 _is_default; # 1/10 seconds
    max-epoch-size 16000;
    max-buffers 16000;
    unplug-watermark 128 _is_default;
    connect-int 10 _is_default; # seconds
    ping-int 10 _is_default; # seconds
    sndbuf-size 524288; # bytes
    ko-count 0 _is_default;
    cram-hmac-alg "sha1";
    shared-secret :);
    after-sb-0pri disconnect _is_default;
    after-sb-1pri disconnect _is_default;
    after-sb-2pri disconnect _is_default;
    rr-conflict disconnect _is_default;
    ping-timeout 5 _is_default; # 1/10 seconds
}
syncer {
    rate 51200k; # bytes/second
    after -1 _is_default;
    al-extents 257;
}
protocol C;
_this_host {
    device "/dev/drbd0";
    disk "/dev/sda5";
    meta-disk internal;
    address 192.168.1.101:7788;
}
_remote_host {
    address 192.168.1.100:7788;
}
I've configured three DRBD devices in total, all using the same
parameters.
I hope you guys can help. I'm sorry for the lengthy post.
Thanks in advance,
Joris