[DRBD-user] problem with connection dropping

Tom Brown brown at esteem.com
Mon Aug 6 18:08:46 CEST 2007


On Saturday 04 August 2007 02:05, Jérôme Augé wrote:
> 2007/8/3, Tom Brown <brown at esteem.com>:
> > I have a problem with network cards failing for resource 0 (r0). I
> > thought it was the cheap network cards in both nodes. So, I replaced them
> > with Intel Pro/1000 Gb cards. The connection worked at first and the sync
> > finished without a problem. Then, after a few days, the connection went
> > back to Primary/Unknown. I can't get a ping through on that interface
> > either. When I replaced the network cards I moved things around so the
> > network cards for r0 were in a different PCI slot. Any ideas on what may
> > be going on here? Is this a hardware issue? If so, any suggestions on a PCI
> > network card to use?
>
> Hi,
>
> First, how are your machines connected: with a crossover cable or
> through a switch?
A crossover cable.

> When the problem happens, do you see incoming traffic on your network
> interface (tcpdump -nli <ethX>)?
All I see are ARP requests on zan (primary). jayna (secondary) has some odd
traffic on eth1.

zan:~# tcpdump -nli eth1
tcpdump: verbose output suppressed, use -v or -vv for full protocol decode
listening on eth1, link-type EN10MB (Ethernet), capture size 96 bytes
08:17:43.882316 arp who-has 192.168.1.4 tell 192.168.1.3
08:17:44.882350 arp who-has 192.168.1.4 tell 192.168.1.3
08:17:45.882419 arp who-has 192.168.1.4 tell 192.168.1.3
08:17:46.986483 arp who-has 192.168.1.4 tell 192.168.1.3
08:17:47.986556 arp who-has 192.168.1.4 tell 192.168.1.3
...

jayna:~# tcpdump -nli eth1
tcpdump: verbose output suppressed, use -v or -vv for full protocol decode
listening on eth1, link-type EN10MB (Ethernet), capture size 96 bytes
08:41:35.274218 IP 192.168.1.4.2356 > 192.168.1.3.7788: S 1832587988:1832587988(0) win 5840 <mss 1460,sackOK,timestamp 185904360 0,nop,wscale 3>
08:41:35.274272 00:1b:21:01:7f:6b > 01:80:c2:00:00:01, ethertype Unknown (0x8808), length 60:
        0x0000:  0001 0680 0000 0000 0000 0000 0000 0000  ................
        0x0010:  0000 0000 0000 0000 0000 0000 0000 0000  ................
        0x0020:  0000 0000 0000 0000 0000 0000 0000       ..............
08:41:40.274423 arp who-has 192.168.1.3 tell 192.168.1.4
08:41:40.274525 00:1b:21:01:7f:6b > 01:80:c2:00:00:01, ethertype Unknown (0x8808), length 60:
        0x0000:  0001 0680 0000 0000 0000 0000 0000 0000  ................
        0x0010:  0000 0000 0000 0000 0000 0000 0000 0000  ................
        0x0020:  0000 0000 0000 0000 0000 0000 0000       ..............
08:41:41.274467 arp who-has 192.168.1.3 tell 192.168.1.4
08:41:41.274494 IP 192.168.1.4.2356 > 192.168.1.3.7788: S 1832587988:1832587988(0) win 5840 <mss 1460,sackOK,timestamp 185905860 0,nop,wscale 3>
08:41:41.274580 00:1b:21:01:7f:6b > 01:80:c2:00:00:01, ethertype Unknown (0x8808), length 60:
        0x0000:  0001 0680 0000 0000 0000 0000 0000 0000  ................
        0x0010:  0000 0000 0000 0000 0000 0000 0000 0000  ................
        0x0020:  0000 0000 0000 0000 0000 0000 0000       ..............
...
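
A note on the "Unknown" frames in the jayna capture: EtherType 0x8808 with
destination 01:80:c2:00:00:01 is what IEEE 802.3 uses for MAC Control frames,
and opcode 0x0001 is a flow-control PAUSE request. The source address,
00:1b:21:01:7f:6b, is zan's eth1, which may suggest that zan's NIC is asking
jayna to stop sending. The following is only a minimal sketch that decodes the
first four payload bytes, assuming the hex dump starts right after the 14-byte
Ethernet header. The function names are made up for illustration, and the
duration figure assumes a 1000 Mb/s link.

```python
import struct

MAC_CONTROL_ETHERTYPE = 0x8808  # IEEE 802.3 MAC Control
PAUSE_OPCODE = 0x0001           # IEEE 802.3x PAUSE
BITS_PER_QUANTUM = 512          # one pause quantum is 512 bit times

# First bytes of the payload as shown in the tcpdump hex dump: "0001 0680",
# followed by zero padding up to the 46-byte minimum payload.
payload = bytes.fromhex("00010680") + bytes(42)


def decode_mac_control(data: bytes):
    """Return (opcode, pause_quanta) from a MAC Control payload."""
    opcode, quanta = struct.unpack("!HH", data[:4])
    return opcode, quanta


def pause_seconds(quanta: int, link_bps: int) -> float:
    """Requested pause duration in seconds for a given link speed."""
    return quanta * BITS_PER_QUANTUM / link_bps


opcode, quanta = decode_mac_control(payload)
is_pause = opcode == PAUSE_OPCODE
# Assumption: the link really runs at 1000 Mb/s, as ethtool reports.
duration = pause_seconds(quanta, 1_000_000_000)
print(f"opcode=0x{opcode:04x} pause={is_pause} quanta={quanta} "
      f"duration={duration * 1e6:.1f} us")
```

If these really are PAUSE frames, `ethtool -a eth1` on each node would show
whether flow control is enabled on the interface.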

> Do you have RX/TX
> drops/errors/overruns in the output of ifconfig on your ethX?
zan:
eth1      Link encap:Ethernet  HWaddr 00:1B:21:01:7F:6B
          inet addr:192.168.1.3  Bcast:192.168.1.255  Mask:255.255.255.0
          inet6 addr: fe80::21b:21ff:fe01:7f6b/64 Scope:Link
          UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
          RX packets:36440925 errors:0 dropped:272703 overruns:0 frame:0
          TX packets:63201611 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:1000
          RX bytes:2928021626 (2.7 GiB)  TX bytes:1072467626 (1022.7 MiB)
          Base address:0xa400 Memory:f1800000-f1820000

jayna:
eth1      Link encap:Ethernet  HWaddr 00:1B:21:01:7E:71
          inet addr:192.168.1.4  Bcast:192.168.1.255  Mask:255.255.255.0
          inet6 addr: fe80::21b:21ff:fe01:7e71/64 Scope:Link
          UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
          RX packets:63200415 errors:0 dropped:0 overruns:0 frame:0
          TX packets:36713697 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:1000
          RX bytes:1072442662 (1022.7 MiB)  TX bytes:2946550726 (2.7 GiB)
          Base address:0xa400 Memory:f1800000-f1820000
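
The only non-zero problem counter in the two outputs is "dropped" on zan's RX
side. To put that number in proportion, here is a rough sketch that parses the
RX line copied from zan's ifconfig output and computes dropped relative to
received packets. The helper name is hypothetical, and treating
dropped/packets as the drop ratio is a simplification.

```python
import re

# RX counter line copied from zan's ifconfig output above.
rx_line = "RX packets:36440925 errors:0 dropped:272703 overruns:0 frame:0"


def parse_counters(line: str) -> dict:
    """Turn 'name:value' pairs from an ifconfig counter line into a dict."""
    return {name: int(value) for name, value in re.findall(r"(\w+):(\d+)", line)}


counters = parse_counters(rx_line)
drop_ratio = counters["dropped"] / counters["packets"]
print(f"dropped {counters['dropped']} of {counters['packets']} "
      f"({drop_ratio:.2%})")
```

That works out to a little under 1% of received packets on zan, while jayna
reports no drops at all.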


> Are the
> link status and negotiated speed and duplex mode correct (ethtool
> <ethX>)? If you are connected to a switch, you might need to check
> the speed/mode negotiation and link status from the switch side too.
zan:~# ethtool eth1
Settings for eth1:
        Supported ports: [ TP ]
        Supported link modes:   10baseT/Half 10baseT/Full
                                100baseT/Half 100baseT/Full
                                1000baseT/Full
        Supports auto-negotiation: Yes
        Advertised link modes:  10baseT/Half 10baseT/Full
                                100baseT/Half 100baseT/Full
                                1000baseT/Full
        Advertised auto-negotiation: Yes
        Speed: 1000Mb/s
        Duplex: Full
        Port: Twisted Pair
        PHYAD: 0
        Transceiver: internal
        Auto-negotiation: on
        Supports Wake-on: umbg
        Wake-on: g
        Current message level: 0x00000007 (7)
        Link detected: yes

jayna:~# ethtool eth1
Settings for eth1:
        Supported ports: [ TP ]
        Supported link modes:   10baseT/Half 10baseT/Full
                                100baseT/Half 100baseT/Full
                                1000baseT/Full
        Supports auto-negotiation: Yes
        Advertised link modes:  10baseT/Half 10baseT/Full
                                100baseT/Half 100baseT/Full
                                1000baseT/Full
        Advertised auto-negotiation: Yes
        Speed: 1000Mb/s
        Duplex: Full
        Port: Twisted Pair
        PHYAD: 0
        Transceiver: internal
        Auto-negotiation: on
        Supports Wake-on: umbg
        Wake-on: g
        Current message level: 0x00000007 (7)
        Link detected: yes

> You might also want to check for networking related kernel messages in
> dmesg/syslog (things like "ethX down", "NETDEV transmit timeout",
> "e1000 error", etc.)

Well, the dmesg logs on the two nodes don't match. zan shows a 100 Mbps
link while jayna shows a 1000 Mbps link, even though ethtool reports 1000Mb/s
on both nodes and the lights on both network cards indicate a 1000 Mbps
connection. Any ideas on this?

zan:~# grep e1000 /var/log/*
/var/log/dmesg:e1000: 0000:00:0d.0: e1000_probe: (PCI:33MHz:32-bit) 00:1b:21:01:7f:6b
/var/log/dmesg:e1000: eth0: e1000_probe: Intel(R) PRO/1000 Network Connection
/var/log/dmesg:e1000: eth1: e1000_watchdog: NIC Link is Up 100 Mbps Full Duplex
/var/log/dmesg.0:e1000: 0000:00:0d.0: e1000_probe: (PCI:33MHz:32-bit) 00:1b:21:01:7f:6b
/var/log/dmesg.0:e1000: eth1: e1000_probe: Intel(R) PRO/1000 Network Connection
/var/log/dmesg.0:e1000: eth1: e1000_watchdog: NIC Link is Up 100 Mbps Full Duplex

jayna:~# grep e1000 /var/log/*
/var/log/dmesg:e1000: 0000:00:0d.0: e1000_probe: (PCI:33MHz:32-bit) 00:1b:21:01:7e:71
/var/log/dmesg:e1000: eth0: e1000_probe: Intel(R) PRO/1000 Network Connection
/var/log/dmesg:e1000: eth1: e1000_watchdog: NIC Link is Up 1000 Mbps Full Duplex
/var/log/dmesg.0:e1000: 0000:00:0d.0: e1000_probe: (PCI:33MHz:32-bit) 00:1b:21:01:7e:71
/var/log/dmesg.0:e1000: eth1: e1000_probe: Intel(R) PRO/1000 Network Connection
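
To make the mismatch easier to spot, the link-up messages can be pulled out of
the logs and compared with the speed ethtool reports. This is only a sketch:
the sample strings are the e1000_watchdog lines from the grep output above,
the 1000 Mb/s figure is the ethtool "Speed" value from both nodes, and the
function name is hypothetical.

```python
import re

# Matches e1000 link-up messages; \s+ also tolerates a wrapped line
# between "Full" and "Duplex".
LINK_UP = re.compile(
    r"e1000: (eth\d+): e1000_watchdog: NIC Link is Up (\d+) Mbps (\w+)\s+Duplex"
)

# Watchdog lines taken from the grep output above.
zan_log = "/var/log/dmesg:e1000: eth1: e1000_watchdog: NIC Link is Up 100 Mbps Full Duplex\n"
jayna_log = "/var/log/dmesg:e1000: eth1: e1000_watchdog: NIC Link is Up 1000 Mbps Full Duplex\n"

ETHTOOL_SPEED_MBPS = 1000  # "Speed: 1000Mb/s" reported by ethtool on both nodes


def link_speeds(log_text: str) -> list:
    """Return (interface, speed in Mbps) for every link-up message found."""
    return [(iface, int(speed)) for iface, speed, _duplex in LINK_UP.findall(log_text)]


for node, log in (("zan", zan_log), ("jayna", jayna_log)):
    for iface, speed in link_speeds(log):
        status = "matches" if speed == ETHTOOL_SPEED_MBPS else "differs from"
        print(f"{node} {iface}: dmesg says {speed} Mbps, {status} ethtool")
```

Keep in mind that /var/log/dmesg is written at boot, so it shows the speed
negotiated at that time, not necessarily the current one.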

I'm using the in-kernel e1000 driver for these cards. Should I be using the
latest driver from Intel instead?

Thanks,
Tom


