On Saturday 04 August 2007 02:05, Jérôme Augé wrote:
> 2007/8/3, Tom Brown <brown at esteem.com>:
> > I have a problem with network cards failing for resource 0 (r0). I
> > thought it was the cheap network cards in both nodes. So, I replaced them
> > with Intel Pro/1000 Gb cards. The connection worked at first and the sync
> > finished without a problem. Then, after a few days, the connection went
> > back to Primary/Unknown. I can't get ping through on that interface
> > either. When I replaced the network cards I moved things around so the
> > network cards for r0 were in a different PCI slot. Any ideas on what may
> > be going on here? Is this a hardware issue? If so, any suggestions on a PCI
> > network card to use?
>
> Hi,
>
> First, how are your machines connected: with a crossover cable or
> with a switch?
A crossover cable.
> When your problem happens, do you see incoming traffic on your network
> interface (tcpdump -nli <ethX>)?
All I see on zan (primary) are ARP requests. jayna (secondary) has some odd
traffic on eth1.
zan:~# tcpdump -nli eth1
tcpdump: verbose output suppressed, use -v or -vv for full protocol decode
listening on eth1, link-type EN10MB (Ethernet), capture size 96 bytes
08:17:43.882316 arp who-has 192.168.1.4 tell 192.168.1.3
08:17:44.882350 arp who-has 192.168.1.4 tell 192.168.1.3
08:17:45.882419 arp who-has 192.168.1.4 tell 192.168.1.3
08:17:46.986483 arp who-has 192.168.1.4 tell 192.168.1.3
08:17:47.986556 arp who-has 192.168.1.4 tell 192.168.1.3
...
jayna:~# tcpdump -nli eth1
tcpdump: verbose output suppressed, use -v or -vv for full protocol decode
listening on eth1, link-type EN10MB (Ethernet), capture size 96 bytes
08:41:35.274218 IP 192.168.1.4.2356 > 192.168.1.3.7788: S 1832587988:1832587988(0) win 5840 <mss 1460,sackOK,timestamp 185904360 0,nop,wscale 3>
08:41:35.274272 00:1b:21:01:7f:6b > 01:80:c2:00:00:01, ethertype Unknown (0x8808), length 60:
        0x0000:  0001 0680 0000 0000 0000 0000 0000 0000  ................
        0x0010:  0000 0000 0000 0000 0000 0000 0000 0000  ................
        0x0020:  0000 0000 0000 0000 0000 0000 0000       ..............
08:41:40.274423 arp who-has 192.168.1.3 tell 192.168.1.4
08:41:40.274525 00:1b:21:01:7f:6b > 01:80:c2:00:00:01, ethertype Unknown (0x8808), length 60:
        0x0000:  0001 0680 0000 0000 0000 0000 0000 0000  ................
        0x0010:  0000 0000 0000 0000 0000 0000 0000 0000  ................
        0x0020:  0000 0000 0000 0000 0000 0000 0000       ..............
08:41:41.274467 arp who-has 192.168.1.3 tell 192.168.1.4
08:41:41.274494 IP 192.168.1.4.2356 > 192.168.1.3.7788: S 1832587988:1832587988(0) win 5840 <mss 1460,sackOK,timestamp 185905860 0,nop,wscale 3>
08:41:41.274580 00:1b:21:01:7f:6b > 01:80:c2:00:00:01, ethertype Unknown (0x8808), length 60:
        0x0000:  0001 0680 0000 0000 0000 0000 0000 0000  ................
        0x0010:  0000 0000 0000 0000 0000 0000 0000 0000  ................
        0x0020:  0000 0000 0000 0000 0000 0000 0000       ..............
...
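A note on the "ethertype Unknown (0x8808)" frames in the capture above: EtherType 0x8808 is the IEEE 802.3 MAC Control type, and 01:80:c2:00:00:01 is the reserved destination address for flow-control PAUSE frames. The sketch below decodes the first four payload bytes of the hex dump under that reading. The function name `decode_mac_control` and the link speeds passed to it are assumptions made for illustration; they are not from the thread.

```python
import struct

# Payload as printed in the hex dump above (tcpdump shows the bytes that
# follow the 14-byte Ethernet header): 0001 0680, then zero padding.
PAYLOAD = bytes.fromhex("00010680") + bytes(42)

PAUSE_OPCODE = 0x0001     # MAC Control opcode for PAUSE
BITS_PER_QUANTUM = 512    # one pause quantum is 512 bit times


def decode_mac_control(payload, link_mbps):
    """Decode opcode and pause time; link_mbps is an assumed link speed."""
    opcode, quanta = struct.unpack("!HH", payload[:4])
    # At link_mbps megabits per second, one bit lasts 1000 / link_mbps ns.
    pause_ns = quanta * BITS_PER_QUANTUM * 1000 // link_mbps
    return {
        "is_pause": opcode == PAUSE_OPCODE,
        "quanta": quanta,
        "pause_ns": pause_ns,
    }


if __name__ == "__main__":
    # Both speeds seen in this thread are tried, since the logs disagree.
    for speed in (1000, 100):
        print(speed, "Mb/s:", decode_mac_control(PAYLOAD, speed))
```

Read this way, each frame asks the receiver to stop sending for 1664 quanta, which is roughly 0.85 ms at 1000 Mb/s. The source address on these frames, 00:1b:21:01:7f:6b, matches the HWaddr that ifconfig reports for eth1 on zan, so they appear to come from zan's card.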
> Do you have RX/TX drops/errors/overruns in the output of ifconfig on your
> ethX?
zan:
eth1      Link encap:Ethernet  HWaddr 00:1B:21:01:7F:6B
          inet addr:192.168.1.3  Bcast:192.168.1.255  Mask:255.255.255.0
          inet6 addr: fe80::21b:21ff:fe01:7f6b/64 Scope:Link
          UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
          RX packets:36440925 errors:0 dropped:272703 overruns:0 frame:0
          TX packets:63201611 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:1000
          RX bytes:2928021626 (2.7 GiB)  TX bytes:1072467626 (1022.7 MiB)
          Base address:0xa400 Memory:f1800000-f1820000
jayna:
eth1      Link encap:Ethernet  HWaddr 00:1B:21:01:7E:71
          inet addr:192.168.1.4  Bcast:192.168.1.255  Mask:255.255.255.0
          inet6 addr: fe80::21b:21ff:fe01:7e71/64 Scope:Link
          UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
          RX packets:63200415 errors:0 dropped:0 overruns:0 frame:0
          TX packets:36713697 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:1000
          RX bytes:1072442662 (1022.7 MiB)  TX bytes:2946550726 (2.7 GiB)
          Base address:0xa400 Memory:f1800000-f1820000
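The only non-zero error counter in the two ifconfig outputs above is `dropped` on zan's RX side. As a rough way to put that number in proportion, here is a minimal sketch that parses the counter lines and compares them. The helper names `parse_counters` and `drop_ratio` are hypothetical, and dividing `dropped` by `packets` is just one reasonable way to express the ratio.

```python
import re

# Counter lines copied from the ifconfig output above.
ZAN_RX = "RX packets:36440925 errors:0 dropped:272703 overruns:0 frame:0"
JAYNA_TX = "TX packets:36713697 errors:0 dropped:0 overruns:0 carrier:0"

COUNTER_RE = re.compile(r"(\w+):(\d+)")


def parse_counters(line):
    """Return the name:value pairs on an ifconfig RX or TX line as ints."""
    return {name: int(value) for name, value in COUNTER_RE.findall(line)}


def drop_ratio(rx_line):
    """Dropped packets relative to the packets counter on the same line."""
    counters = parse_counters(rx_line)
    return counters["dropped"] / counters["packets"]


if __name__ == "__main__":
    rx = parse_counters(ZAN_RX)
    tx = parse_counters(JAYNA_TX)
    print("zan RX drop ratio: {:.3%}".format(drop_ratio(ZAN_RX)))
    # Packets jayna says it sent minus packets zan says it received.
    print("jayna TX - zan RX:", tx["packets"] - rx["packets"])
```

The drop ratio comes out at about 0.75%. The difference between jayna's TX count and zan's RX count is 272772, close to the 272703 drops zan reports. The counters were not read at the same instant, so this is only suggestive.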
> Are the link status and negotiated speed and duplex mode correct (ethtool
> <ethX>)? If you are connected to a switch, you might need to check
> the speed/mode negotiation and link status from the switch side too.
zan:~# ethtool eth1
Settings for eth1:
    Supported ports: [ TP ]
    Supported link modes:   10baseT/Half 10baseT/Full
                            100baseT/Half 100baseT/Full
                            1000baseT/Full
    Supports auto-negotiation: Yes
    Advertised link modes:  10baseT/Half 10baseT/Full
                            100baseT/Half 100baseT/Full
                            1000baseT/Full
    Advertised auto-negotiation: Yes
    Speed: 1000Mb/s
    Duplex: Full
    Port: Twisted Pair
    PHYAD: 0
    Transceiver: internal
    Auto-negotiation: on
    Supports Wake-on: umbg
    Wake-on: g
    Current message level: 0x00000007 (7)
    Link detected: yes
jayna:~# ethtool eth1
Settings for eth1:
    Supported ports: [ TP ]
    Supported link modes:   10baseT/Half 10baseT/Full
                            100baseT/Half 100baseT/Full
                            1000baseT/Full
    Supports auto-negotiation: Yes
    Advertised link modes:  10baseT/Half 10baseT/Full
                            100baseT/Half 100baseT/Full
                            1000baseT/Full
    Advertised auto-negotiation: Yes
    Speed: 1000Mb/s
    Duplex: Full
    Port: Twisted Pair
    PHYAD: 0
    Transceiver: internal
    Auto-negotiation: on
    Supports Wake-on: umbg
    Wake-on: g
    Current message level: 0x00000007 (7)
    Link detected: yes
> You might also want to check for networking related kernel messages in
> dmesg/syslog (things like "ethX down", "NETDEV transmit timeout",
> "e1000 error", etc.)
Well, the dmesg logs on the two nodes don't match. zan is showing a 100 Mbps
connection while jayna is showing a 1000 Mbps connection, even though ethtool
reports 1000Mb/s on both and the lights on both network cards indicate a
1000 Mbps connection. Any ideas on this?
zan:~# grep e1000 /var/log/*
/var/log/dmesg:e1000: 0000:00:0d.0: e1000_probe: (PCI:33MHz:32-bit) 00:1b:21:01:7f:6b
/var/log/dmesg:e1000: eth0: e1000_probe: Intel(R) PRO/1000 Network Connection
/var/log/dmesg:e1000: eth1: e1000_watchdog: NIC Link is Up 100 Mbps Full Duplex
/var/log/dmesg.0:e1000: 0000:00:0d.0: e1000_probe: (PCI:33MHz:32-bit) 00:1b:21:01:7f:6b
/var/log/dmesg.0:e1000: eth1: e1000_probe: Intel(R) PRO/1000 Network Connection
/var/log/dmesg.0:e1000: eth1: e1000_watchdog: NIC Link is Up 100 Mbps Full Duplex
jayna:~# grep e1000 /var/log/*
/var/log/dmesg:e1000: 0000:00:0d.0: e1000_probe: (PCI:33MHz:32-bit) 00:1b:21:01:7e:71
/var/log/dmesg:e1000: eth0: e1000_probe: Intel(R) PRO/1000 Network Connection
/var/log/dmesg:e1000: eth1: e1000_watchdog: NIC Link is Up 1000 Mbps Full Duplex
/var/log/dmesg.0:e1000: 0000:00:0d.0: e1000_probe: (PCI:33MHz:32-bit) 00:1b:21:01:7e:71
/var/log/dmesg.0:e1000: eth1: e1000_probe: Intel(R) PRO/1000 Network Connection
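The comparison of the e1000_watchdog lines above was done by eye. A minimal sketch of the same check in script form follows. The `LOG_LINES` dictionary and the helper names are hypothetical, and the regular expression is written only against the message format visible in this output. One caveat, stated as an assumption: on Debian-style systems /var/log/dmesg is normally a snapshot taken at boot, so these lines may describe the link as first negotiated rather than the current state that ethtool reports.

```python
import re

# e1000_watchdog lines from the grep output above, with the wrapped
# "Duplex" rejoined onto the same line.
LOG_LINES = {
    "zan": "/var/log/dmesg:e1000: eth1: e1000_watchdog: "
           "NIC Link is Up 100 Mbps Full Duplex",
    "jayna": "/var/log/dmesg:e1000: eth1: e1000_watchdog: "
             "NIC Link is Up 1000 Mbps Full Duplex",
}

LINK_RE = re.compile(
    r"e1000: (?P<iface>\w+): e1000_watchdog: NIC Link is Up "
    r"(?P<mbps>\d+) Mbps (?P<duplex>Full|Half) Duplex"
)


def parse_link(line):
    """Return (interface, speed in Mbps, duplex) or None if no match."""
    match = LINK_RE.search(line)
    if match is None:
        return None
    return match.group("iface"), int(match.group("mbps")), match.group("duplex")


def speeds_by_host(lines):
    """Map each host name to the link speed found in its log line."""
    speeds = {}
    for host, line in lines.items():
        parsed = parse_link(line)
        if parsed is not None:
            speeds[host] = parsed[1]
    return speeds


if __name__ == "__main__":
    speeds = speeds_by_host(LOG_LINES)
    print(speeds)
    if len(set(speeds.values())) > 1:
        print("link speed differs between hosts")
```

Run against the lines above, this flags the 100 Mbps versus 1000 Mbps mismatch between zan and jayna.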
I'm using the kernel driver for these cards. Should I be using the latest
Intel driver?
Thanks,
Tom