On Saturday 04 August 2007 02:05, Jérôme Augé wrote:
> 2007/8/3, Tom Brown <brown at esteem.com>:
> > I have a problem with network cards failing for resource 0 (r0). I
> > thought it was the cheap network cards in both nodes. So, I replaced them
> > with Intel Pro/1000 Gb cards. The connection worked at first and the sync
> > finished without a problem. Then, after a few days, the connection went
> > back to Primary/Unknown. I can't get ping through on that interface
> > either. When I replaced the network cards I moved things around so the
> > network cards for r0 were in a different pci slot. Any ideas on what may
> > be going on here? Is this hardware issue? If so, any suggestions on a pci
> > network card to use?
>
> Hi,
>
> First, how are your machines connected : with a crossover cable or
> with a switch ?

A crossover cable.

> When your problem happen, do you see incoming traffic on your network
> interface (tcpdump -nli <ethX>) ?

All I see are arps on zan (primary). jayna (secondary) has some odd traffic
on eth1.

zan:~# tcpdump -nli eth1
tcpdump: verbose output suppressed, use -v or -vv for full protocol decode
listening on eth1, link-type EN10MB (Ethernet), capture size 96 bytes
08:17:43.882316 arp who-has 192.168.1.4 tell 192.168.1.3
08:17:44.882350 arp who-has 192.168.1.4 tell 192.168.1.3
08:17:45.882419 arp who-has 192.168.1.4 tell 192.168.1.3
08:17:46.986483 arp who-has 192.168.1.4 tell 192.168.1.3
08:17:47.986556 arp who-has 192.168.1.4 tell 192.168.1.3
...

jayna:~# tcpdump -nli eth1
tcpdump: verbose output suppressed, use -v or -vv for full protocol decode
listening on eth1, link-type EN10MB (Ethernet), capture size 96 bytes
08:41:35.274218 IP 192.168.1.4.2356 > 192.168.1.3.7788: S 1832587988:1832587988(0) win 5840 <mss 1460,sackOK,timestamp 185904360 0,nop,wscale 3>
08:41:35.274272 00:1b:21:01:7f:6b > 01:80:c2:00:00:01, ethertype Unknown (0x8808), length 60:
        0x0000:  0001 0680 0000 0000 0000 0000 0000 0000  ................
        0x0010:  0000 0000 0000 0000 0000 0000 0000 0000  ................
        0x0020:  0000 0000 0000 0000 0000 0000 0000       ..............
08:41:40.274423 arp who-has 192.168.1.3 tell 192.168.1.4
08:41:40.274525 00:1b:21:01:7f:6b > 01:80:c2:00:00:01, ethertype Unknown (0x8808), length 60:
        0x0000:  0001 0680 0000 0000 0000 0000 0000 0000  ................
        0x0010:  0000 0000 0000 0000 0000 0000 0000 0000  ................
        0x0020:  0000 0000 0000 0000 0000 0000 0000       ..............
08:41:41.274467 arp who-has 192.168.1.3 tell 192.168.1.4
08:41:41.274494 IP 192.168.1.4.2356 > 192.168.1.3.7788: S 1832587988:1832587988(0) win 5840 <mss 1460,sackOK,timestamp 185905860 0,nop,wscale 3>
08:41:41.274580 00:1b:21:01:7f:6b > 01:80:c2:00:00:01, ethertype Unknown (0x8808), length 60:
        0x0000:  0001 0680 0000 0000 0000 0000 0000 0000  ................
        0x0010:  0000 0000 0000 0000 0000 0000 0000 0000  ................
        0x0020:  0000 0000 0000 0000 0000 0000 0000       ..............
...

> do you have RX/TX
> drops/errors/overruns in the output of ifconfig on your ethX ?

zan:
eth1      Link encap:Ethernet  HWaddr 00:1B:21:01:7F:6B
          inet addr:192.168.1.3  Bcast:192.168.1.255  Mask:255.255.255.0
          inet6 addr: fe80::21b:21ff:fe01:7f6b/64 Scope:Link
          UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
          RX packets:36440925 errors:0 dropped:272703 overruns:0 frame:0
          TX packets:63201611 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:1000
          RX bytes:2928021626 (2.7 GiB)  TX bytes:1072467626 (1022.7 MiB)
          Base address:0xa400 Memory:f1800000-f1820000

jayna:
eth1      Link encap:Ethernet  HWaddr 00:1B:21:01:7E:71
          inet addr:192.168.1.4  Bcast:192.168.1.255  Mask:255.255.255.0
          inet6 addr: fe80::21b:21ff:fe01:7e71/64 Scope:Link
          UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
          RX packets:63200415 errors:0 dropped:0 overruns:0 frame:0
          TX packets:36713697 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:1000
          RX bytes:1072442662 (1022.7 MiB)  TX bytes:2946550726 (2.7 GiB)
          Base address:0xa400 Memory:f1800000-f1820000

> are the
> link status and negotiated speed and duplex mode correct (ethtool
> <ethX>) ? If you are connected to a switch, you might need to check
> the speed/mode negotiation and link status from the switch side too.

zan:~# ethtool eth1
Settings for eth1:
        Supported ports: [ TP ]
        Supported link modes:   10baseT/Half 10baseT/Full
                                100baseT/Half 100baseT/Full
                                1000baseT/Full
        Supports auto-negotiation: Yes
        Advertised link modes:  10baseT/Half 10baseT/Full
                                100baseT/Half 100baseT/Full
                                1000baseT/Full
        Advertised auto-negotiation: Yes
        Speed: 1000Mb/s
        Duplex: Full
        Port: Twisted Pair
        PHYAD: 0
        Transceiver: internal
        Auto-negotiation: on
        Supports Wake-on: umbg
        Wake-on: g
        Current message level: 0x00000007 (7)
        Link detected: yes

jayna:~# ethtool eth1
Settings for eth1:
        Supported ports: [ TP ]
        Supported link modes:   10baseT/Half 10baseT/Full
                                100baseT/Half 100baseT/Full
                                1000baseT/Full
        Supports auto-negotiation: Yes
        Advertised link modes:  10baseT/Half 10baseT/Full
                                100baseT/Half 100baseT/Full
                                1000baseT/Full
        Advertised auto-negotiation: Yes
        Speed: 1000Mb/s
        Duplex: Full
        Port: Twisted Pair
        PHYAD: 0
        Transceiver: internal
        Auto-negotiation: on
        Supports Wake-on: umbg
        Wake-on: g
        Current message level: 0x00000007 (7)
        Link detected: yes

> You might also want to check for networking related kernel messages in
> dmesg/syslog (things like "ethX down", "NETDEV transmit timeout",
> "e1000 error", etc.)

Well, the dmesg log doesn't match on both nodes. zan is showing a 100 Mbps
connection while jayna is showing a 1000 Mbps connection. The lights on the
network cards both indicate a 1000 Mbps connection. Any ideas on this?

zan:~# grep e1000 /var/log/*
/var/log/dmesg:e1000: 0000:00:0d.0: e1000_probe: (PCI:33MHz:32-bit) 00:1b:21:01:7f:6b
/var/log/dmesg:e1000: eth0: e1000_probe: Intel(R) PRO/1000 Network Connection
/var/log/dmesg:e1000: eth1: e1000_watchdog: NIC Link is Up 100 Mbps Full Duplex
/var/log/dmesg.0:e1000: 0000:00:0d.0: e1000_probe: (PCI:33MHz:32-bit) 00:1b:21:01:7f:6b
/var/log/dmesg.0:e1000: eth1: e1000_probe: Intel(R) PRO/1000 Network Connection
/var/log/dmesg.0:e1000: eth1: e1000_watchdog: NIC Link is Up 100 Mbps Full Duplex

jayna:~# grep e1000 /var/log/*
/var/log/dmesg:e1000: 0000:00:0d.0: e1000_probe: (PCI:33MHz:32-bit) 00:1b:21:01:7e:71
/var/log/dmesg:e1000: eth0: e1000_probe: Intel(R) PRO/1000 Network Connection
/var/log/dmesg:e1000: eth1: e1000_watchdog: NIC Link is Up 1000 Mbps Full Duplex
/var/log/dmesg.0:e1000: 0000:00:0d.0: e1000_probe: (PCI:33MHz:32-bit) 00:1b:21:01:7e:71
/var/log/dmesg.0:e1000: eth1: e1000_probe: Intel(R) PRO/1000 Network Connection

I'm using the kernel driver for these cards. Should I be using the latest
Intel driver?

Thanks,
Tom
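
For reference, the frames that tcpdump labels "ethertype Unknown (0x8808)" in
the jayna capture can be decoded by hand. EtherType 0x8808 is IEEE 802.3 MAC
Control, 01:80:c2:00:00:01 is the reserved multicast address for flow-control
PAUSE frames, and a payload starting with opcode 0x0001 carries a 16-bit pause
time counted in quanta of 512 bit times. The sketch below is only an
illustration of that decoding: the function name decode_mac_control and its
return layout are made up for this example and are not part of tcpdump or the
e1000 driver. The source address on those frames, 00:1b:21:01:7f:6b, matches
the HWaddr of eth1 on zan in the ifconfig output, so the frames may be zan
asking jayna to pause transmission. That is a reading of the capture, not a
confirmed diagnosis.

```python
import struct

MAC_CONTROL_ETHERTYPE = 0x8808      # IEEE 802.3 MAC Control
PAUSE_OPCODE = 0x0001               # 802.3x flow-control PAUSE
PAUSE_DEST = "01:80:c2:00:00:01"    # reserved multicast address for PAUSE


def decode_mac_control(dst, ethertype, payload_hex):
    """Decode the start of a MAC Control payload (hypothetical helper).

    dst         -- destination MAC as printed by tcpdump
    ethertype   -- EtherType as an integer
    payload_hex -- hex dump of the payload, spaces allowed
    Returns None when the frame is not a MAC Control frame.
    """
    if ethertype != MAC_CONTROL_ETHERTYPE:
        return None
    payload = bytes.fromhex(payload_hex.replace(" ", ""))
    # First two 16-bit big-endian fields: opcode, then its first parameter.
    opcode, param = struct.unpack("!HH", payload[:4])
    is_pause = opcode == PAUSE_OPCODE and dst.lower() == PAUSE_DEST
    return {
        "opcode": opcode,
        "is_pause": is_pause,
        # The parameter is a pause time only when the opcode is PAUSE.
        "pause_quanta": param if is_pause else None,
        "pause_bit_times": param * 512 if is_pause else None,
    }


if __name__ == "__main__":
    # First payload row of the 0x8808 frames in the jayna capture.
    row = "0001 0680 0000 0000 0000 0000 0000 0000"
    print(decode_mac_control("01:80:c2:00:00:01", 0x8808, row))
```

If the frames are indeed PAUSE frames, `ethtool -a eth1` on each node shows
the current flow-control settings, which could be compared against the RX
"dropped" counter on zan.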