Hello,

I encountered this while doing performance and stability testing with
iozone on a two-node DRBD setup, and stripped it down to this test case.

It reproduces on two different Linux distributions:

 - RHEL 5.5
   Linux wu-wien.ac.at 2.6.18-194.11.3.el5 #1 SMP Mon Aug 23 15:51:38 EDT 2010 x86_64 x86_64 x86_64 GNU/Linux
 - GRML64 2010.04 (Debian derived; 64bit)
   Linux wu-wien.ac.at 2.6.33-grml64 #1 SMP PREEMPT Fri Apr 2 10:19:25 UTC 2010 x86_64 GNU/Linux
 - GRML 2010.04 (32bit)
   Linux wu-wien.ac.at 2.6.33-grml #1 SMP PREEMPT Fri Apr 2 10:16:25 UTC 2010 i686 GNU/Linux

DRBD Version: 8.3.8.1

Reproduction Steps:
-------------------
 1. Load drbd module    # on both nodes
 2. drbdadm up r0       # on both nodes
 3. drbdadm primary r0  # on node 1
 4. drbdadm detach r0   # on node 1
    => Diskstate of primary is now Diskless
 5. dd if=/dev/drbd0 of=/dev/null iflag=direct bs=9M count=50  # on node 1

Result:   Transfer speed < 5 MByte/sec !!!
Expected: Transfer speed > 80 MByte/sec (near link bandwidth between nodes)

The actual transfer speed depends on the ping-int parameter, because
communication stops completely until a ping from the primary kicks it
off again (see tcpdump log and netstat output). The threshold for
direct I/O starvation equals the max-buffers size (default 8M): reads
with a block size above it stall.
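
As a rough cross-check of the 8M figure: the sketch below assumes that each DRBD buffer is one 4 KiB page. That page size is an assumption and is not stated in this report. The max-buffers value of 2048 is the one reported by drbdsetup for this setup, and the three block sizes are the ones used in the dd runs.

```shell
#!/bin/sh
# Assumption (not stated in the report): one DRBD buffer = one 4 KiB page.
max_buffers=2048   # max-buffers as reported by "drbdsetup 0 show"
page_size=4096     # assumed page size in bytes

threshold_bytes=$((max_buffers * page_size))
threshold_mib=$((threshold_bytes / 1024 / 1024))
echo "max-buffers covers ${threshold_bytes} bytes (${threshold_mib} MiB)"

# Compare the dd block sizes from the report against that threshold.
for bs_mib in 7 8 9; do
    if [ "$bs_mib" -gt "$threshold_mib" ]; then
        echo "bs=${bs_mib}M exceeds max-buffers"
    else
        echo "bs=${bs_mib}M fits within max-buffers"
    fi
done
```

Under that assumption only the 9M read is larger than what max-buffers covers, which matches the dd timings in this report.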

# cat /proc/drbd
version: 8.3.8.1 (api:88/proto:86-94)
GIT-hash: 0d8589fcc32c874df57c930ca1691399b55ec893 build by root at grml, 2010-09-02 14:00:17
 0: cs:Connected ro:Primary/Secondary ds:Diskless/UpToDate C r----
    ns:0 nr:240896 dw:0 dr:732 al:0 bm:0 lo:0 pe:0 ua:0 ap:0 ep:1 wo:b oos:0

# cat /etc/drbd.conf
resource r0 {
    protocol C;
    on hostA {
        device    /dev/drbd0;
        disk      /dev/vgsys/drbd_test_01;
        address   10.0.0.1:7789;
        meta-disk internal;
    }
    on HostB {
        device    /dev/drbd0;
        disk      /dev/vgsys/drbd_test_01;
        address   10.0.0.2:7789;
        meta-disk internal;
    }
}

# drbdsetup 0 show
net {
    timeout           60 _is_default; # 1/10 seconds
    max-epoch-size    2048 _is_default;
    max-buffers       2048 _is_default;
    unplug-watermark  128 _is_default;
    connect-int       10 _is_default; # seconds
    ping-int          10 _is_default; # seconds
    sndbuf-size       0 _is_default; # bytes
    rcvbuf-size       0 _is_default; # bytes
    ko-count          0 _is_default;
    allow-two-primaries;
    after-sb-0pri     disconnect _is_default;
    after-sb-1pri     disconnect _is_default;
    after-sb-2pri     disconnect _is_default;
    rr-conflict       disconnect _is_default;
    ping-timeout      5 _is_default; # 1/10 seconds
}
syncer {
    rate              81920k; # bytes/second
    after             -1 _is_default;
    al-extents        127 _is_default;
}
protocol C;
_this_host {
    device            minor 0;
    address           ipv4 10.0.0.1:7789;
}
_remote_host {
    address           ipv4 10.0.0.2:7789;
}

# dd if=/dev/drbd0 of=/dev/null iflag=direct bs=7M count=50
50+0 records in
50+0 records out
367001600 bytes (367 MB) copied, 4.35694 seconds, 84.2 MB/s

# dd if=/dev/drbd0 of=/dev/null iflag=direct bs=8M count=50
50+0 records in
50+0 records out
419430400 bytes (419 MB) copied, 4.47541 seconds, 93.7 MB/s

# dd if=/dev/drbd0 of=/dev/null iflag=direct bs=9M count=50
50+0 records in
50+0 records out
471859200 bytes (472 MB) copied, 112.375 seconds, 4.2 MB/s

# netstat -tnp
Active Internet connections (w/o servers)
Proto Recv-Q Send-Q Local Address       Foreign Address     State       PID/Program name
tcp        0      0 10.0.0.2:59085      10.0.0.1:7789       ESTABLISHED -
tcp     1704      0 10.0.0.2:7789       10.0.0.1:50675      ESTABLISHED -

# tcpdump -n port 7789
...
13:00:00.041803 IP 10.0.0.1.50675 > 10.0.0.2.7789: . ack 10817264 win 12279 <nop,nop,timestamp 487395 603261295>
13:00:00.041808 IP 10.0.0.1.50675 > 10.0.0.2.7789: . ack 10820160 win 12280 <nop,nop,timestamp 487395 603261295>
13:00:00.041844 IP 10.0.0.1.50675 > 10.0.0.2.7789: . ack 10823056 win 12287 <nop,nop,timestamp 487395 603261295>
13:00:00.079967 IP 10.0.0.1.50675 > 10.0.0.2.7789: . ack 10824000 win 12287 <nop,nop,timestamp 487407 603261295>
13:00:09.890176 IP 10.0.0.1.7789 > 10.0.0.2.59085: P 17:25(8) ack 16 win 6 <nop,nop,timestamp 490350 603261152>   <<= 10 sec ping-int
13:00:09.890195 IP 10.0.0.2.59085 > 10.0.0.1.7789: P 16:24(8) ack 25 win 46 <nop,nop,timestamp 603271152 490350>
13:00:09.890368 IP 10.0.0.1.7789 > 10.0.0.2.59085: . ack 24 win 6 <nop,nop,timestamp 490350 603271152>
13:00:09.903786 IP 10.0.0.2.7789 > 10.0.0.1.50675: . 10824000:10826896(2896) ack 10249 win 1856 <nop,nop,timestamp 603271165 487407>
13:00:09.903792 IP 10.0.0.2.7789 > 10.0.0.1.50675: . 10826896:10828344(1448) ack 10249 win 1856 <nop,nop,timestamp 603271165 487407>
...

Kind Regards,
Roland

--
Roland.Friedwagner at wu.ac.at  Phone: +43 1 31336 5377
IT Services - WU (Vienna University of Economics and Business)