Hello,
I encountered this while doing performance and stability testing with
iozone on a two-node DRBD setup and stripped it down to the test case below.
It reproduces on two different Linux distributions:
- RHEL 5.5
Linux wu-wien.ac.at 2.6.18-194.11.3.el5 #1 SMP Mon Aug 23 15:51:38 EDT 2010 x86_64 x86_64 x86_64 GNU/Linux
- GRML64 2010.04 (debian derived; 64bit)
Linux wu-wien.ac.at 2.6.33-grml64 #1 SMP PREEMPT Fri Apr 2 10:19:25 UTC 2010 x86_64 GNU/Linux
- GRML 2010.04 (32bit)
Linux wu-wien.ac.at 2.6.33-grml #1 SMP PREEMPT Fri Apr 2 10:16:25 UTC 2010 i686 GNU/Linux
DRBD Version: 8.3.8.1
Reproduction Steps:
-------------------
1. Load drbd module # on both nodes
2. drbdadm up r0 # on both nodes
3. drbdadm primary r0 # on node 1
4. drbdadm detach r0 # on node 1
=> The disk state of the primary is now Diskless
5. dd if=/dev/drbd0 of=/dev/null iflag=direct bs=9M count=50 # on node 1
Result: Transfer speed < 5 MByte/sec !!!
Expected: Transfer speed > 80 MByte/sec (near link bandwidth between nodes)
The actual transfer speed depends on the ping-int parameter, because
communication stops completely until a ping from the primary kicks it off again
(see the tcpdump log and netstat output below).
The threshold for the direct I/O starvation equals the max-buffers setting
(default 2048 buffers, i.e. 8M with 4 KiB pages).
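
The relation between max-buffers and the 8M threshold can be checked with a little arithmetic. This is only a sketch: it assumes that max-buffers is counted in pages and that the page size is 4096 bytes (`getconf PAGE_SIZE` reports the real value on a given system). The function name `buffers_to_bytes` is made up for this example.

```shell
#!/bin/sh
# Sketch: convert a max-buffers value (assumed to be counted in pages)
# into bytes and compare it against the dd block sizes used in this report.
buffers_to_bytes() {
    # $1 = max-buffers, $2 = page size in bytes
    echo $(( $1 * $2 ))
}

max_buffers=2048   # default shown by "drbdsetup 0 show"
page_size=4096     # assumption; verify with: getconf PAGE_SIZE
limit=$(buffers_to_bytes "$max_buffers" "$page_size")
echo "max-buffers=$max_buffers -> $limit bytes ($(( limit / 1048576 )) MiB)"

for mib in 7 8 9; do
    req=$(( mib * 1048576 ))
    if [ "$req" -gt "$limit" ]; then
        echo "bs=${mib}M exceeds the limit"
    else
        echo "bs=${mib}M fits within the limit"
    fi
done
```

Under these assumptions only the 9M request is larger than the buffer limit, which matches the block size at which the slowdown shows up.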
# cat /proc/drbd
version: 8.3.8.1 (api:88/proto:86-94)
GIT-hash: 0d8589fcc32c874df57c930ca1691399b55ec893 build by root at grml, 2010-09-02 14:00:17
0: cs:Connected ro:Primary/Secondary ds:Diskless/UpToDate C r----
ns:0 nr:240896 dw:0 dr:732 al:0 bm:0 lo:0 pe:0 ua:0 ap:0 ep:1 wo:b oos:0
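
To confirm from a script that the primary really is Diskless before starting the read test, the local disk state can be pulled out of the status line. A minimal sketch, assuming the `ds:<local>/<peer>` field layout shown above; the helper name `local_disk_state` is hypothetical.

```shell
#!/bin/sh
# Sketch: extract the local disk state from a /proc/drbd status line.
# Assumes the "ds:<local>/<peer>" field format seen in the output above.
local_disk_state() {
    # Reads status lines on stdin and prints the part between "ds:" and "/".
    sed -n 's/.* ds:\([^/]*\)\/.*/\1/p'
}

# Example with the status line from this report:
echo "0: cs:Connected ro:Primary/Secondary ds:Diskless/UpToDate C r----" \
    | local_disk_state
```

On a live node the same function could be fed with `local_disk_state < /proc/drbd`.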
# cat /etc/drbd.conf
resource r0 {
    protocol C;
    on hostA {
        device /dev/drbd0;
        disk /dev/vgsys/drbd_test_01;
        address 10.0.0.1:7789;
        meta-disk internal;
    }
    on HostB {
        device /dev/drbd0;
        disk /dev/vgsys/drbd_test_01;
        address 10.0.0.2:7789;
        meta-disk internal;
    }
}
# drbdsetup 0 show
net {
    timeout 60 _is_default; # 1/10 seconds
    max-epoch-size 2048 _is_default;
    max-buffers 2048 _is_default;
    unplug-watermark 128 _is_default;
    connect-int 10 _is_default; # seconds
    ping-int 10 _is_default; # seconds
    sndbuf-size 0 _is_default; # bytes
    rcvbuf-size 0 _is_default; # bytes
    ko-count 0 _is_default;
    allow-two-primaries;
    after-sb-0pri disconnect _is_default;
    after-sb-1pri disconnect _is_default;
    after-sb-2pri disconnect _is_default;
    rr-conflict disconnect _is_default;
    ping-timeout 5 _is_default; # 1/10 seconds
}
syncer {
    rate 81920k; # bytes/second
    after -1 _is_default;
    al-extents 127 _is_default;
}
protocol C;
_this_host {
    device minor 0;
    address ipv4 10.0.0.1:7789;
}
_remote_host {
    address ipv4 10.0.0.2:7789;
}
# dd if=/dev/drbd0 of=/dev/null iflag=direct bs=7M count=50
50+0 records in
50+0 records out
367001600 bytes (367 MB) copied, 4.35694 seconds, 84.2 MB/s
# dd if=/dev/drbd0 of=/dev/null iflag=direct bs=8M count=50
50+0 records in
50+0 records out
419430400 bytes (419 MB) copied, 4.47541 seconds, 93.7 MB/s
# dd if=/dev/drbd0 of=/dev/null iflag=direct bs=9M count=50
50+0 records in
50+0 records out
471859200 bytes (472 MB) copied, 112.375 seconds, 4.2 MB/s
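
The three dd runs above can be repeated as a small sweep, and the reported rates can be recomputed from the byte counts and times. This is a sketch only: the function names `mbps` and `sweep` are made up here, the device path is taken from the configuration above, and dd is assumed to report MB as 10^6 bytes.

```shell
#!/bin/sh
# Sketch: recompute dd's MB/s figure and repeat the read test per block size.
mbps() {
    # $1 = bytes copied, $2 = elapsed seconds; prints MB/s (10^6 bytes)
    awk -v b="$1" -v s="$2" 'BEGIN { printf "%.1f\n", b / s / 1000000 }'
}

sweep() {
    # $1 = device to read from, e.g. /dev/drbd0 (run on the Diskless primary)
    for mib in 7 8 9; do
        echo "== bs=${mib}M =="
        dd if="$1" of=/dev/null iflag=direct bs="${mib}M" count=50
    done
}

# Figures taken from the dd output in this report:
mbps 367001600 4.35694   # bs=7M
mbps 419430400 4.47541   # bs=8M
mbps 471859200 112.375   # bs=9M
```

Calling `sweep /dev/drbd0` on node 1 would rerun the measurement; the function is deliberately not invoked here so that the script can be read or sourced without touching the device.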
# netstat -tnp
Active Internet connections (w/o servers)
Proto Recv-Q Send-Q Local Address Foreign Address State PID/Program name
tcp 0 0 10.0.0.2:59085 10.0.0.1:7789 ESTABLISHED -
tcp 1704 0 10.0.0.2:7789 10.0.0.1:50675 ESTABLISHED -
# tcpdump -n port 7789
...
13:00:00.041803 IP 10.0.0.1.50675 > 10.0.0.2.7789: . ack 10817264 win 12279 <nop,nop,timestamp 487395 603261295>
13:00:00.041808 IP 10.0.0.1.50675 > 10.0.0.2.7789: . ack 10820160 win 12280 <nop,nop,timestamp 487395 603261295>
13:00:00.041844 IP 10.0.0.1.50675 > 10.0.0.2.7789: . ack 10823056 win 12287 <nop,nop,timestamp 487395 603261295>
13:00:00.079967 IP 10.0.0.1.50675 > 10.0.0.2.7789: . ack 10824000 win 12287 <nop,nop,timestamp 487407 603261295>
13:00:09.890176 IP 10.0.0.1.7789 > 10.0.0.2.59085: P 17:25(8) ack 16 win 6 <nop,nop,timestamp 490350 603261152> <<= 10 sec ping-int
13:00:09.890195 IP 10.0.0.2.59085 > 10.0.0.1.7789: P 16:24(8) ack 25 win 46 <nop,nop,timestamp 603271152 490350>
13:00:09.890368 IP 10.0.0.1.7789 > 10.0.0.2.59085: . ack 24 win 6 <nop,nop,timestamp 490350 603271152>
13:00:09.903786 IP 10.0.0.2.7789 > 10.0.0.1.50675: . 10824000:10826896(2896) ack 10249 win 1856 <nop,nop,timestamp 603271165 487407>
13:00:09.903792 IP 10.0.0.2.7789 > 10.0.0.1.50675: . 10826896:10828344(1448) ack 10249 win 1856 <nop,nop,timestamp 603271165 487407>
...
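
The length of the stall in the trace above can be measured by subtracting the timestamp of the last ack before the pause from that of the first data segment after it. A minimal sketch; `gap_seconds` is a hypothetical helper that expects two HH:MM:SS.usec timestamps from the same day.

```shell
#!/bin/sh
# Sketch: time difference in seconds between two tcpdump timestamps.
gap_seconds() {
    # $1 = earlier timestamp, $2 = later timestamp (HH:MM:SS.usec, same day)
    awk -v a="$1" -v b="$2" '
        function secs(t,   p) {
            split(t, p, ":")
            return p[1] * 3600 + p[2] * 60 + p[3]
        }
        BEGIN { printf "%.3f\n", secs(b) - secs(a) }'
}

# Last ack before the pause vs. first data segment after the ping:
gap_seconds 13:00:00.079967 13:00:09.903786
```

For the two timestamps from the trace this gives a gap of a little under 10 seconds, in line with the ping-int of 10 seconds.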
Kind Regards,
Roland
--
Roland.Friedwagner at wu.ac.at Phone: +43 1 31336 5377
IT Services - WU (Vienna University of Economics and Business)