Hi,

I'm using two identical machines, both with 32 GB of RAM, 16 Xeon cores and a battery-backed hardware RAID-6 setup (SUN STK, using Adaptec AAC). Both machines run Debian, Linux 2.6.24.3 and DRBD 8.0.11, and are connected by a dedicated gigabit Ethernet link (Intel 82571EB) with the MTU set to 9000.

The following problem occurs: the secondary node reaches 100% disk utilisation very quickly when writing to the DRBD block device with synchronous writes. This results in a performance decrease of a factor of ~1000(!) compared to streamed writes.

I've already tried several things, all with little to no effect: I've enabled jumbo frames, pinned the DRBD threads to a single core and to several cores within the CPU, increased and decreased the unplug-watermark, and played around with some of the net settings.

Here are some useful results from the tests I ran.

Ping both ways (the second node had the same result):

# ping -w 10 -f -s 4100 192.168.1.100
PING 192.168.1.100 (192.168.1.100) 4100(4128) bytes of data.
.
--- 192.168.1.100 ping statistics ---
21429 packets transmitted, 21428 received, 0% packet loss, time 10000ms
rtt min/avg/max/mdev = 0.231/0.444/0.700/0.090 ms, ipg/ewma 0.466/0.404 ms

Streamed dd to the local filesystem:

# dd if=/dev/zero of=/tmp/testfile bs=4096 count=10000
10000+0 records in
10000+0 records out
40960000 bytes (41 MB) copied, 0.0920567 s, 445 MB/s

Synced dd to the local filesystem:

# dd if=/dev/zero of=/tmp/testfile bs=4096 count=10000 oflag=dsync
10000+0 records in
10000+0 records out
40960000 bytes (41 MB) copied, 5.39007 s, 7.6 MB/s

Streamed dd to the disconnected DRBD block device:

# dd if=/dev/zero of=/mnt/test/testfile bs=4096 count=10000
10000+0 records in
10000+0 records out
40960000 bytes (41 MB) copied, 0.0769395 s, 532 MB/s

Synced dd to the disconnected DRBD block device:

# dd if=/dev/zero of=/mnt/test/testfile bs=4096 count=10000 oflag=dsync
10000+0 records in
10000+0 records out
40960000 bytes (41 MB) copied, 3.84082 s, 10.7 MB/s

Streamed dd to the connected DRBD block device:

# dd if=/dev/zero of=/mnt/test/testfile bs=4096 count=10000
10000+0 records in
10000+0 records out
40960000 bytes (41 MB) copied, 0.070894 s, 578 MB/s

Synced dd to the connected DRBD block device (I ^C'd it):

# dd if=/dev/zero of=/mnt/test/testfile bs=4096 count=10000 oflag=dsync
28+0 records in
28+0 records out
114688 bytes (115 kB) copied, 15.2326 s, 7.5 kB/s

As you can see, performance grinds to a halt when writing to the block device with the dsync flag. What worries me is the iostat output during the synced dd on the secondary node; the primary node seems fine:

avg-cpu:  %user  %nice %system %iowait  %steal   %idle
           0.00   0.00    0.00    0.00    0.00  100.00

Device: rrqm/s wrqm/s  r/s  w/s rMB/s wMB/s avgrq-sz avgqu-sz  await  svctm  %util
sda       0.00   6.00 5.00 5.00  0.00  0.04     8.80     1.00  92.00 100.00 100.00

The disk is completely saturated while achieving only 7.5 kB/s of throughput. When I switch roles (primary becomes secondary and vice versa), the problem simply moves to the other node.
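In case anyone wants to reproduce the comparison: the dd runs above can be wrapped in a small script. This is only a sketch; the target directory is an assumption, defaulting to /tmp, and you would point it at the DRBD mount (e.g. /mnt/test) instead:

```shell
#!/bin/sh
# Compare streamed vs per-block-synced write throughput, mirroring the
# dd tests above. TARGET is an assumption: it defaults to /tmp; pass
# the DRBD mount point (e.g. /mnt/test) as the first argument.
TARGET="${1:-/tmp}"
FILE="$TARGET/dd_synctest.$$"

echo "== streamed write =="
dd if=/dev/zero of="$FILE" bs=4096 count=10000 2>&1 | tail -n 1

echo "== synced write (O_DSYNC: each 4 KiB block waits for write-out) =="
# Far fewer blocks here: with oflag=dsync every single write has to hit
# stable storage (and, with DRBD protocol C, the peer) before dd continues.
dd if=/dev/zero of="$FILE" bs=4096 count=100 oflag=dsync 2>&1 | tail -n 1

rm -f "$FILE"
```

Running it once against the local filesystem and once against the DRBD mount makes the factor-~1000 gap directly comparable between nodes.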
Some snippets of the configuration (from drbdsetup show):

disk {
    size             0s _is_default; # bytes
    on-io-error      detach;
    fencing          resource-only;
}
net {
    timeout          60 _is_default; # 1/10 seconds
    max-epoch-size   16000;
    max-buffers      16000;
    unplug-watermark 128 _is_default;
    connect-int      10 _is_default; # seconds
    ping-int         10 _is_default; # seconds
    sndbuf-size      524288; # bytes
    ko-count         0 _is_default;
    cram-hmac-alg    "sha1";
    shared-secret    :);
    after-sb-0pri    disconnect _is_default;
    after-sb-1pri    disconnect _is_default;
    after-sb-2pri    disconnect _is_default;
    rr-conflict      disconnect _is_default;
    ping-timeout     5 _is_default; # 1/10 seconds
}
syncer {
    rate             51200k; # bytes/second
    after            -1 _is_default;
    al-extents       257;
}
protocol C;
_this_host {
    device    "/dev/drbd0";
    disk      "/dev/sda5";
    meta-disk internal;
    address   192.168.1.101:7788;
}
_remote_host {
    address   192.168.1.100:7788;
}

I've configured three DRBD devices in total, all using the same parameters.

I hope you guys can help. Sorry for the lengthy post.

Thanks in advance,
Joris
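P.S. For anyone trying to reproduce the setup from the runtime dump above: the non-default values correspond to a drbd.conf resource section roughly like this. The resource name r0 and the host names nodeA/nodeB are placeholders, and the shared secret is redacted; everything else is taken from the drbdsetup output:

```
resource r0 {                    # "r0" is a placeholder name
    protocol C;
    disk {
        on-io-error detach;
        fencing     resource-only;
    }
    net {
        max-epoch-size 16000;
        max-buffers    16000;
        sndbuf-size    524288;
        cram-hmac-alg  "sha1";
        shared-secret  "...";    # redacted
    }
    syncer {
        rate       51200k;
        al-extents 257;
    }
    on nodeA {                   # placeholder hostname
        device    /dev/drbd0;
        disk      /dev/sda5;
        address   192.168.1.101:7788;
        meta-disk internal;
    }
    on nodeB {                   # placeholder hostname
        device    /dev/drbd0;
        disk      /dev/sda5;
        address   192.168.1.100:7788;
        meta-disk internal;
    }
}
```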