One last thing:

http://sourceforge.net/tracker/index.php?func=detail&aid=3486658&group_id=42302&atid=447449

Looks like a solution may be in the next release of the driver?

-M

On Mon, Mar 12, 2012 at 2:05 PM, Mark Deneen <mdeneen at gmail.com> wrote:
> Marcelo,
>
> I experienced difficulties using ubuntu-server 10.04 with the e1000e
> driver resetting the adapter. I updated to 1.6.3 at first, and things
> have been fairly smooth since then.
>
> However, I just looked through the logs and saw this:
>
> [4542575.651842] e1000e 0000:01:00.0: eth1: Detected Hardware Unit Hang:
> [4542575.651845]   TDH                  <29>
> [4542575.651847]   TDT                  <2a>
> [4542575.651848]   next_to_use          <2a>
> [4542575.651850]   next_to_clean        <28>
> [4542575.651851] buffer_info[next_to_clean]:
> [4542575.651852]   time_stamp           <11b2455d4>
> [4542575.651854]   next_to_watch        <29>
> [4542575.651855]   jiffies              <11b245642>
> [4542575.651858]   next_to_watch.status <0>
> [4542575.651860] MAC Status             <80383>
> [4542575.651862] PHY Status             <792d>
> [4542575.651864] PHY 1000BASE-T Status  <3800>
> [4542575.651866] PHY Extended Status    <3000>
> [4542575.651869] PCI Status             <10>
>
> Now, this was half of a bonded pair, so services remained up, but I
> was kind of bummed to see that it still happened.
>
> With that being said, the adapter _recovered_, which is something that
> it did not do with the in-tree driver. I'd have to unplug the cable,
> plug it in again, and reconfigure the interface to get things working
> again in that situation.
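For anyone comparing notes with this thread, a quick way to confirm which e1000e revision is actually loaded and whether the hang has recurred (a sketch only; the interface name eth1 matches the dmesg excerpt above, but adjust for your system):

```shell
# Show the driver and version currently bound to the interface
# (eth1 is the interface named in the hang log above)
ethtool -i eth1

# Version of the e1000e module installed on disk; this can differ
# from the loaded one until the module is reloaded or the box reboots
modinfo e1000e | grep -i '^version'

# Count how many hardware unit hangs the kernel has logged so far
dmesg | grep -c 'Detected Hardware Unit Hang'
```

If the in-tree and out-of-tree versions disagree, that is usually the first thing worth fixing before blaming cables or switches.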
>
> I have the following adapters in this server pair:
>
> 00:19.0 Ethernet controller: Intel Corporation 82578DC Gigabit Network Connection (rev 06)
> 01:00.0 Ethernet controller: Intel Corporation 82571EB Gigabit Ethernet Controller (rev 06)
> 01:00.1 Ethernet controller: Intel Corporation 82571EB Gigabit Ethernet Controller (rev 06)
>
> ...and I'm running kernel 3.0.4 vanilla, with scst 2.2 zero-copy
> patches and drbd 8.3.12.
>
> Looking through older logs, it appears as if there have been a handful
> of hangs, always with eth1.
>
> So, while I would like to be more helpful, it appears as if I'm in a
> similar boat as you are. :) The only thing that is saving me right
> now is the fact that the connection between the servers is bonded.
>
> -M
>
> On Mon, Mar 12, 2012 at 12:06 PM, Marcelo Pereira <marcelops at gmail.com> wrote:
>> Hi Mark,
>>
>> Please, tell me more about this.
>>
>> I have been struggling with my e1000e NIC, and the cluster isn't syncing at
>> all. It gets to 2.8% (out of 16 TB) and the connection simply drops.
>> That is why I want to use another NIC, one with a chipset other than this
>> e1000e.
>>
>> I have already updated the driver to 1.9.5, and it's exactly what you said:
>> "random NIC resets". I thought it was (1) the cables, (2) the switch ports,
>> and finally (3) the switch itself. I have replaced all of that hardware and
>> still can't sync.
>>
>> Tell me more about this issue with the Intel e1000e, please!
>>
>> Thanks,
>> Marcelo
>>
>> On Mon, Mar 12, 2012 at 10:41 AM, Mark Deneen <mdeneen at gmail.com> wrote:
>>>
>>> Also, please check your e1000e driver version and update to the latest
>>> stable version (out of tree). With the in-tree version, we suffered
>>> from random NIC resets under certain conditions, which really gave
>>> DRBD headaches.
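The bonding Mark describes is what kept his services up through the hang. A minimal active-backup bond on an ifupdown-based system such as ubuntu-server 10.04 might look like the fragment below; this is a sketch under assumptions, not his actual setup (the interface names eth0/eth1, the bond name, and the address are illustrative):

```
# /etc/network/interfaces fragment (illustrative; names and address are assumptions)
auto bond0
iface bond0 inet static
    address 192.168.69.1        # replication address for this node
    netmask 255.255.255.0
    bond-slaves eth0 eth1       # the two NICs to pair
    bond-mode active-backup     # pure failover, no load balancing
    bond-miimon 100             # link-check interval in ms
```

This needs the ifenslave package installed; once the bond is up, /proc/net/bonding/bond0 shows which slave is currently active and how many link failures each has seen.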
>>>
>>> -M
>>>
>>> On Fri, Mar 9, 2012 at 11:37 AM, Marcelo Pereira <marcelops at gmail.com> wrote:
>>> > Hello guys,
>>> >
>>> > I have been facing a problem with a server that can't sync at all due to
>>> > network issues. The NICs (two Intel e1000e) are showing several errors
>>> > and are being dropped, sometimes at the primary node, sometimes at the
>>> > secondary. It will not sync.
>>> >
>>> > I have an unused NIC on both servers, and I would like to use those
>>> > instead. What should I do to move the configuration from one set of IPs
>>> > to another?
>>> >
>>> > This is my current /etc/drbd.conf:
>>> >
>>> > global { usage-count no; }
>>> > resource pdc0 {
>>> >   protocol C;
>>> >   startup { wfc-timeout 0; degr-wfc-timeout 120; }
>>> >   disk { on-io-error detach; } # ACK!
>>> >   net { cram-hmac-alg "sha1"; shared-secret "xxxxxxx"; }
>>> >   syncer { rate 10M; verify-alg sha1; }
>>> >   on node0 {
>>> >     device    /dev/drbd0;
>>> >     disk      /dev/sdb;
>>> >     address   192.168.69.1:7788;
>>> >     meta-disk internal;
>>> >   }
>>> >   on node1 {
>>> >     device    /dev/drbd0;
>>> >     disk      /dev/sdc;
>>> >     address   192.168.69.2:7788;
>>> >     meta-disk internal;
>>> >   }
>>> > }
>>> >
>>> > And this is what it should be afterwards:
>>> >
>>> > global { usage-count no; }
>>> > resource pdc0 {
>>> >   protocol C;
>>> >   startup { wfc-timeout 0; degr-wfc-timeout 120; }
>>> >   disk { on-io-error detach; } # ACK!
>>> >   net { cram-hmac-alg "sha1"; shared-secret "xxxxxxx"; }
>>> >   syncer { rate 10M; verify-alg sha1; }
>>> >   on node0 {
>>> >     device    /dev/drbd0;
>>> >     disk      /dev/sdb;
>>> >     address   10.0.0.1:7788;
>>> >     meta-disk internal;
>>> >   }
>>> >   on node1 {
>>> >     device    /dev/drbd0;
>>> >     disk      /dev/sdc;
>>> >     address   10.0.0.2:7788;
>>> >     meta-disk internal;
>>> >   }
>>> > }
>>> >
>>> > Is it enough to just change /etc/drbd.conf on both servers? What is the
>>> > exact procedure? I'm using DRBD v8.2.6.
>>> >
>>> > Thanks,
>>> > Marcelo
>>> >
>>> > _______________________________________________
>>> > drbd-user mailing list
>>> > drbd-user at lists.linbit.com
>>> > http://lists.linbit.com/mailman/listinfo/drbd-user
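To close the loop on Marcelo's question: the usual approach is to make the same edit on both nodes and then ask DRBD to apply the difference, rather than restarting the whole stack. A sketch, assuming the resource name pdc0 from the config above and that the 10.0.0.x addresses are already configured and reachable on the new NICs:

```shell
# Run on BOTH nodes, after editing /etc/drbd.conf identically on each:

# Parse and echo back the edited resource to catch syntax errors first
drbdadm dump pdc0

# Re-read the configuration and reconnect using the new addresses
drbdadm adjust pdc0

# Watch the resource reconnect and resume syncing
cat /proc/drbd
```

`drbdadm adjust` only changes what differs from the running configuration, so the disks stay attached and no resync is triggered by the address change itself.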