[DRBD-user] Problem using DRBD 8.3.8.1/Intel 10Gb/x64/2.6.36

Sun Nov 21 14:30:02 CET 2010

Hi,

I migrated an active/passive mail server from 32-bit Debian
(2.6.22 / Intel Gb / DRBD 8.???) to 64-bit Debian Lenny (2.6.36 / Intel
10Gb / DRBD 8.3.8.1).

More often than not, the directories located on the DRBD device (on the
master) become inaccessible.
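When a directory "becomes inaccessible", it helps to know exactly when it starts hanging, for example to correlate it with the DRBD counters. The following is a minimal Python sketch, not part of the original setup: the function name `directory_responds` and the 5-second timeout are illustrative assumptions. It lists a directory in a child process, so the probe itself cannot get stuck if the filesystem call blocks.

```python
import subprocess
import sys


def directory_responds(path: str, timeout_s: float = 5.0) -> bool:
    """Return True if listing `path` succeeds within `timeout_s` seconds.

    The listing runs in a child process so that a hung filesystem call
    (e.g. a task stuck in uninterruptible "D" state) cannot block the
    caller. On timeout we send SIGKILL but deliberately do not wait for
    the child, because a task blocked in the kernel may not exit until
    its I/O completes.
    """
    child = subprocess.Popen(
        [sys.executable, "-c", "import os, sys; os.listdir(sys.argv[1])", path],
        stdout=subprocess.DEVNULL,
        stderr=subprocess.DEVNULL,
    )
    try:
        # Exit code 0 means os.listdir() completed without raising.
        return child.wait(timeout=timeout_s) == 0
    except subprocess.TimeoutExpired:
        child.kill()  # delivery may stay pending until the blocked I/O returns
        return False


if __name__ == "__main__":
    # /data/mail is the XFS mount point shown in the drbd-overview output.
    target = sys.argv[1] if len(sys.argv) > 1 else "/data/mail"
    print(f"{target}: {'OK' if directory_responds(target) else 'HUNG or ERROR'}")
```

Running it periodically (e.g. from cron) and logging the timestamp of the first "HUNG or ERROR" result would give a point in time to compare against dmesg and /proc/drbd.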

****
$ cat /proc/drbd
version: 8.3.8.1 (api:88/proto:86-94)
srcversion: 70A36576ED3A03229A2AC68 
 0: cs:Connected ro:Primary/Secondary ds:UpToDate/UpToDate C r----
    ns:73567 nr:1091182 dw:1164749 dr:235348 al:745 bm:379 lo:0 pe:0 ua:0 ap:0 ep:1 wo:d oos:0
 1: cs:Connected ro:Secondary/Primary ds:UpToDate/UpToDate C r----
    ns:0 nr:213 dw:213 dr:0 al:0 bm:3 lo:0 pe:0 ua:0 ap:0 ep:1 wo:d oos:0

The Intel device is:
17:00.0 Ethernet controller: Intel Corporation 82598EB 10-Gigabit AT Network Connection (rev 01)

17:00.0 0200: 8086:10c8 (rev 01)
        Subsystem: 8086:a11c
        Flags: bus master, fast devsel, latency 0, IRQ 19
        Memory at fdfe0000 (32-bit, non-prefetchable) [size=128K]
        Memory at fdf80000 (32-bit, non-prefetchable) [size=256K]
        I/O ports at 6000 [size=32]
        Memory at fdf70000 (32-bit, non-prefetchable) [size=16K]
        Capabilities: [40] Power Management version 3
        Capabilities: [50] Message Signalled Interrupts: Mask- 64bit+ Queue=0/0 Enable-
        Capabilities: [60] MSI-X: Enable- Mask- TabSize=18
        Capabilities: [a0] Express Endpoint, MSI 00
        Capabilities: [100] Advanced Error Reporting <?>
        Capabilities: [140] Device Serial Number 64-45-49-ff-ff-21-1b-00
        Kernel driver in use: ixgbe

$ ifconfig eth6
eth6      Link encap:Ethernet  HWaddr 00:1b:21:49:45:64  
          inet addr:172.16.10.10  Bcast:172.16.10.255  Mask:255.255.255.0
          inet6 addr: fe80::21b:21ff:fe49:4564/64 Scope:Link
          UP BROADCAST RUNNING MULTICAST  MTU:9000  Metric:1
          RX packets:162398 errors:0 dropped:0 overruns:0 frame:0
          TX packets:79214 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:1000 
          RX bytes:1181374735 (1.1 GiB)  TX bytes:154836134 (147.6 MiB)

$ drbd-overview
  0:mail  Connected Primary/Secondary UpToDate/UpToDate C r---- /data/mail xfs 1.4T 424G 918G 32% 
  1:web   Connected Secondary/Primary UpToDate/UpToDate C r---- 

$ cat /etc/drbd.conf
resource mail {

  protocol C;
  handlers {
  pri-on-incon-degr "echo '!DRBD! pri on incon-degr' | wall ; sleep 60 ; halt -f";
  }

  startup {
    degr-wfc-timeout 120;    # 2 minutes.
  }

  disk {
    on-io-error   detach;
  }

  net {
    sndbuf-size 0;
    max-epoch-size  4096;

  }

  syncer {
    rate 500M;
    al-extents 257;
  }

  on s1 {
    device     /dev/drbd0;
    disk       /dev/md2;
    address    172.16.10.10:7788;
    meta-disk  internal;

  }

  on s2 {
    device    /dev/drbd0;
    disk      /dev/md2;
    address   172.16.10.20:7788;
    meta-disk internal;
  }
}

The strace output of repquota -a is available at:
http://pastebin.com/kELxfRJN

iostat output (sysstat):
Linux 2.6.36-dl380-x64 (s1) 	21.11.2010 	_x86_64_

avg-cpu:  %user   %nice %system %iowait  %steal   %idle
           0,82    0,15    0,45    0,30    0,00   98,27

Device:            tps   Blk_read/s   Blk_wrtn/s   Blk_read   Blk_wrtn
drbd0            35,79       446,82       150,41   36788951   12383864
****
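To check whether requests are stuck inside DRBD while the directories hang, the per-device counters in /proc/drbd can be sampled a few times. According to the DRBD 8.3 user's guide, lo counts requests open to the local disk, pe counts requests sent to the peer but not yet answered, ua counts requests received from the peer but not yet answered, and ap counts application I/O requests DRBD has not yet completed; all four normally drain back to 0. The Python sketch below is an illustration only: the helper names, the three samples and the 2-second interval are assumptions. If the hang coincides with these counters staying above zero, that would suggest I/O stuck in DRBD or the replication link rather than in XFS, but this is a heuristic, not a diagnosis.

```python
import os
import re
import time
from typing import Dict, List

# A per-device line in /proc/drbd, e.g. " 0: cs:Connected ro:Primary/Secondary ...".
DEVICE_LINE = re.compile(r"^\s*(\d+):\s+(.*)$")
# Any "key:value" token, e.g. "cs:Connected" or "ns:73567".
TOKEN = re.compile(r"(\w+):(\S+)")
# Counters for requests still in flight; they return to 0 when DRBD is idle.
IN_FLIGHT = ("lo", "pe", "ua", "ap")


def parse_proc_drbd(text: str) -> Dict[int, Dict[str, str]]:
    """Map each DRBD minor number to the key:value fields listed for it."""
    devices: Dict[int, Dict[str, str]] = {}
    current = None
    for line in text.splitlines():
        match = DEVICE_LINE.match(line)
        if match:
            current = int(match.group(1))
            devices[current] = dict(TOKEN.findall(match.group(2)))
        elif current is not None and line.startswith(" "):
            # Indented continuation line holding the counters (ns, nr, dw, ...).
            devices[current].update(TOKEN.findall(line))
    return devices


def stuck_requests(samples: List[Dict[int, Dict[str, str]]]) -> Dict[int, Dict[str, int]]:
    """Return in-flight counters that stayed above 0 in every sample."""
    stuck: Dict[int, Dict[str, int]] = {}
    for minor in samples[0]:
        for key in IN_FLIGHT:
            values = [int(s.get(minor, {}).get(key, "0")) for s in samples]
            if all(v > 0 for v in values):
                stuck.setdefault(minor, {})[key] = values[-1]
    return stuck


if __name__ == "__main__" and os.path.exists("/proc/drbd"):
    snapshots = []
    for i in range(3):  # three samples, two seconds apart (arbitrary choice)
        if i:
            time.sleep(2)
        with open("/proc/drbd") as f:
            snapshots.append(parse_proc_drbd(f.read()))
    for minor, fields in sorted(snapshots[-1].items()):
        print(f"drbd{minor}: cs={fields.get('cs')} ro={fields.get('ro')} ds={fields.get('ds')}")
    print("in-flight counters stuck above 0:", stuck_requests(snapshots) or "none")
```

In the /proc/drbd output captured above, lo, pe, ua and ap are all 0 on both devices, so that snapshot was presumably taken while I/O was flowing normally.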

Is this a known issue?

Apart from this, my 10Gb cards seem to work fine.

Should I try replicating over the 1Gb links instead?

Thanks

Laurent