[DRBD-user] Massive starvation in diskless state doing direct IO reads

Roland Friedwagner roland.friedwagner at wu.ac.at
Thu Sep 2 16:40:20 CEST 2010


Hello,

I ran into this while doing performance and stability testing with
iozone on a two-node DRBD setup, and stripped it down to the test case below.
It reproduces on two different Linux distributions (three kernel builds):

 - RHEL 5.5
   Linux wu-wien.ac.at 2.6.18-194.11.3.el5 #1 SMP Mon Aug 23 15:51:38 EDT 2010 x86_64 x86_64 x86_64 GNU/Linux
 - GRML64 2010.04 (debian derived; 64bit)
   Linux wu-wien.ac.at 2.6.33-grml64 #1 SMP PREEMPT Fri Apr 2 10:19:25 UTC 2010 x86_64 GNU/Linux
 - GRML 2010.04 (32bit)
   Linux wu-wien.ac.at 2.6.33-grml #1 SMP PREEMPT Fri Apr 2 10:16:25 UTC 2010 i686 GNU/Linux


DRBD Version: 8.3.8.1

Reproduction Steps:
-------------------
1. Load drbd module                                           # on both nodes
2. drbdadm up r0                                              # on both nodes
3. drbdadm primary r0                                         # on node 1
4. drbdadm detach r0                                          # on node 1
=> Disk state of the primary is now Diskless
5. dd if=/dev/drbd0 of=/dev/null iflag=direct bs=9M count=50  # on node 1

Result:   Transfer speed < 5 MByte/sec !!!
Expected: Transfer speed > 80 MByte/sec (near link bandwidth between nodes)

The actual transfer speed depends on the ping-int parameter, because
communication stops completely until a ping from the primary kicks it off again
(see the tcpdump log and netstat output below).
The threshold for the direct I/O starvation equals the max-buffers size
(default 8M): reads with bs=7M and bs=8M run at full speed, bs=9M starves.
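
A rough sketch of where the 8M boundary comes from, assuming max-buffers is
counted in pages and the page size is 4 KiB (an assumption for these x86 hosts;
`getconf PAGESIZE` would report the real value). The helper name
max_buffers_bytes is made up for illustration:

```shell
# Size in bytes covered by max-buffers, assuming it is counted in pages.
# Arguments: number of buffers, page size in bytes (both plain integers).
max_buffers_bytes() {
    echo $(( $1 * $2 ))
}

# 2048 is the value shown by "drbdsetup 0 show"; 4096 is the assumed page size.
limit=$(max_buffers_bytes 2048 4096)
echo "max-buffers covers $limit bytes ($(( limit / 1048576 )) MiB)"

# Compare the three dd block sizes from the test against that limit.
for mib in 7 8 9; do
    bs=$(( mib * 1048576 ))
    if [ "$bs" -gt "$limit" ]; then
        echo "bs=${mib}M exceeds max-buffers"
    else
        echo "bs=${mib}M fits within max-buffers"
    fi
done
```

Under these assumptions only the bs=9M request is larger than what max-buffers
covers, which matches the block size at which the slowdown shows up.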


# cat /proc/drbd
version: 8.3.8.1 (api:88/proto:86-94)
GIT-hash: 0d8589fcc32c874df57c930ca1691399b55ec893 build by root at grml, 2010-09-02 14:00:17
 0: cs:Connected ro:Primary/Secondary ds:Diskless/UpToDate C r----
    ns:0 nr:240896 dw:0 dr:732 al:0 bm:0 lo:0 pe:0 ua:0 ap:0 ep:1 wo:b oos:0

# cat /etc/drbd.conf
resource r0 {
  protocol C;
  on hostA {
    device    /dev/drbd0;
    disk      /dev/vgsys/drbd_test_01;
    address   10.0.0.1:7789;
    meta-disk internal;
  }
  on HostB {
    device    /dev/drbd0;
    disk      /dev/vgsys/drbd_test_01;
    address   10.0.0.2:7789;
    meta-disk internal;
  }
}

# drbdsetup 0 show
net {
        timeout                 60 _is_default; # 1/10 seconds
        max-epoch-size          2048 _is_default;
        max-buffers             2048 _is_default;
        unplug-watermark        128 _is_default;
        connect-int             10 _is_default; # seconds
        ping-int                10 _is_default; # seconds
        sndbuf-size             0 _is_default; # bytes
        rcvbuf-size             0 _is_default; # bytes
        ko-count                0 _is_default;
        allow-two-primaries;
        after-sb-0pri           disconnect _is_default;
        after-sb-1pri           disconnect _is_default;
        after-sb-2pri           disconnect _is_default;
        rr-conflict             disconnect _is_default;
        ping-timeout            5 _is_default; # 1/10 seconds
}
syncer {
        rate                    81920k; # bytes/second
        after                   -1 _is_default;
        al-extents              127 _is_default;
}
protocol C;
_this_host {
        device                  minor 0;
        address                 ipv4 10.0.0.1:7789;
}
_remote_host {
        address                 ipv4 10.0.0.2:7789;
}

# dd if=/dev/drbd0 of=/dev/null iflag=direct bs=7M count=50
50+0 records in
50+0 records out
367001600 bytes (367 MB) copied, 4.35694 seconds, 84.2 MB/s
# dd if=/dev/drbd0 of=/dev/null iflag=direct bs=8M count=50
50+0 records in
50+0 records out
419430400 bytes (419 MB) copied, 4.47541 seconds, 93.7 MB/s
# dd if=/dev/drbd0 of=/dev/null iflag=direct bs=9M count=50
50+0 records in
50+0 records out
471859200 bytes (472 MB) copied, 112.375 seconds, 4.2 MB/s
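
The rates dd prints can be cross-checked from the byte counts and elapsed
times above. A small sketch, assuming dd reports decimal megabytes
(10^6 bytes); the helper name throughput_mbs is hypothetical:

```shell
# Throughput in decimal MB/s (10^6 bytes per second), one decimal place.
# Arguments: bytes copied, elapsed seconds.
throughput_mbs() {
    LC_ALL=C awk -v bytes="$1" -v secs="$2" \
        'BEGIN { printf "%.1f\n", bytes / secs / 1000000 }'
}

# Byte counts and timings taken from the three dd runs above.
throughput_mbs 367001600 4.35694    # bs=7M
throughput_mbs 419430400 4.47541    # bs=8M
throughput_mbs 471859200 112.375    # bs=9M
```

The point of the comparison is that going from bs=8M to bs=9M moves only
about 12% more data but takes roughly 25 times as long.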

# netstat -tnp
Active Internet connections (w/o servers)
Proto Recv-Q Send-Q Local Address               Foreign Address             State       PID/Program name
tcp        0      0 10.0.0.2:59085              10.0.0.1:7789               ESTABLISHED -
tcp     1704      0 10.0.0.2:7789               10.0.0.1:50675              ESTABLISHED -

# tcpdump -n port 7789
...
13:00:00.041803 IP 10.0.0.1.50675 > 10.0.0.2.7789: . ack 10817264 win 12279 <nop,nop,timestamp 487395 603261295>
13:00:00.041808 IP 10.0.0.1.50675 > 10.0.0.2.7789: . ack 10820160 win 12280 <nop,nop,timestamp 487395 603261295>
13:00:00.041844 IP 10.0.0.1.50675 > 10.0.0.2.7789: . ack 10823056 win 12287 <nop,nop,timestamp 487395 603261295>
13:00:00.079967 IP 10.0.0.1.50675 > 10.0.0.2.7789: . ack 10824000 win 12287 <nop,nop,timestamp 487407 603261295>
13:00:09.890176 IP 10.0.0.1.7789 > 10.0.0.2.59085: P 17:25(8) ack 16 win 6 <nop,nop,timestamp 490350 603261152>      <<= 10 sec ping-int
13:00:09.890195 IP 10.0.0.2.59085 > 10.0.0.1.7789: P 16:24(8) ack 25 win 46 <nop,nop,timestamp 603271152 490350>
13:00:09.890368 IP 10.0.0.1.7789 > 10.0.0.2.59085: . ack 24 win 6 <nop,nop,timestamp 490350 603271152>
13:00:09.903786 IP 10.0.0.2.7789 > 10.0.0.1.50675: . 10824000:10826896(2896) ack 10249 win 1856 <nop,nop,timestamp 603271165 487407>
13:00:09.903792 IP 10.0.0.2.7789 > 10.0.0.1.50675: . 10826896:10828344(1448) ack 10249 win 1856 <nop,nop,timestamp 603271165 487407>
...
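
To put a number on the stall in the trace, here is a small sketch that
subtracts two tcpdump timestamps. It assumes both fall on the same day and
use the HH:MM:SS.micros format shown above; the helper name ts_gap is made up
for illustration:

```shell
# Seconds elapsed between two tcpdump timestamps (HH:MM:SS.micros).
# Assumes both timestamps are from the same day, first argument earlier.
ts_gap() {
    LC_ALL=C awk -v a="$1" -v b="$2" 'BEGIN {
        split(a, x, ":"); split(b, y, ":")
        ta = x[1] * 3600 + x[2] * 60 + x[3]
        tb = y[1] * 3600 + y[2] * 60 + y[3]
        printf "%.6f\n", tb - ta
    }'
}

# Last ACK from 10.0.0.1 before the stall, then the ping on the other socket.
ts_gap 13:00:00.079967 13:00:09.890176
```

The gap comes out just under the 10 second ping-int, and the data stream on
port 7789 resumes about 13 ms after the ping exchange.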

Kind Regards,
Roland

-- 
Roland.Friedwagner at wu.ac.at            Phone: +43 1 31336 5377
IT Services - WU (Vienna University of Economics and Business) 
