Marcelo,

I experienced difficulties using ubuntu-server 10.04 with the e1000e
driver resetting the adapter. I updated to 1.6.3 at first, and things
have been fairly smooth since then. However, I just looked through the
logs and saw this:

[4542575.651842] e1000e 0000:01:00.0: eth1: Detected Hardware Unit Hang:
[4542575.651845]   TDH <29>
[4542575.651847]   TDT <2a>
[4542575.651848]   next_to_use <2a>
[4542575.651850]   next_to_clean <28>
[4542575.651851] buffer_info[next_to_clean]:
[4542575.651852]   time_stamp <11b2455d4>
[4542575.651854]   next_to_watch <29>
[4542575.651855]   jiffies <11b245642>
[4542575.651858]   next_to_watch.status <0>
[4542575.651860] MAC Status <80383>
[4542575.651862] PHY Status <792d>
[4542575.651864] PHY 1000BASE-T Status <3800>
[4542575.651866] PHY Extended Status <3000>
[4542575.651869] PCI Status <10>

Now, this was half of a bonded pair, so services remained up, but I
was kind of bummed to see that it still happened. With that being
said, the adapter _recovered_, which is something that it did not do
with the in-tree driver. I'd have to unplug the cable, plug it in
again and reconfigure the interface to get things working again in
that situation.

I have the following adapters in this server pair:

00:19.0 Ethernet controller: Intel Corporation 82578DC Gigabit Network Connection (rev 06)
01:00.0 Ethernet controller: Intel Corporation 82571EB Gigabit Ethernet Controller (rev 06)
01:00.1 Ethernet controller: Intel Corporation 82571EB Gigabit Ethernet Controller (rev 06)

...and I'm running kernel 3.0.4 vanilla, with scst 2.2 zero-copy
patches and drbd 8.3.12.

Looking through older logs, it appears as if there have been a handful
of hangs, always with eth1.

So, while I would like to be more helpful, it appears as if I'm in a
similar boat as you are. :) The only thing that is saving me right now
is the fact that the connection between servers is bonded.
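
If you want to check whether your hangs are confined to one interface
as well, a rough Python sketch along these lines could tally the
"Detected Hardware Unit Hang" reports per interface. It assumes the
kernel log line format shown above; the name count_hangs and the
sample lines are only illustrative, not part of any tool.

```python
import re
from collections import Counter

# Matches kernel log lines such as:
# [4542575.651842] e1000e 0000:01:00.0: eth1: Detected Hardware Unit Hang:
# (format assumed from the log excerpt in this message)
HANG_RE = re.compile(
    r"e1000e\s+(?P<pci>[0-9a-f:.]+):\s+(?P<iface>\w+):\s+"
    r"Detected Hardware Unit Hang"
)


def count_hangs(lines):
    """Return a Counter mapping interface name to number of hang reports."""
    counts = Counter()
    for line in lines:
        match = HANG_RE.search(line)
        if match:
            counts[match.group("iface")] += 1
    return counts


# Illustrative input: two lines copied from the excerpt above.
sample = [
    "[4542575.651842] e1000e 0000:01:00.0: eth1: Detected Hardware Unit Hang:",
    "[4542575.651845]   TDH <29>",
]
print(count_hangs(sample))
```

To run it against a real log, pass an open file object instead of the
sample list; the log path depends on the distribution.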

-M

On Mon, Mar 12, 2012 at 12:06 PM, Marcelo Pereira <marcelops at gmail.com> wrote:
> Hi Mark,
>
> Please, tell me more about this.
>
> I have been struggling with my e1000e NIC, and the cluster isn't synching at
> all. It goes until 2.8% (out of 16Tb) and the connection simply drops off.
> That is why I want to use another NIC, which uses a chipset other than this
> e1000e.
>
> I have already updated the driver to 1.9.5, and it's exactly what you said:
> "random nic resets". I thought it was (1) the cables, (2) the switch ports
> and finally (3) the switch itself. I have changed all of this hardware, and
> still can't sync.
>
> Tell more about this issue with the Intel e1000e, please!
>
> Thanks,
> Marcelo
>
> On Mon, Mar 12, 2012 at 10:41 AM, Mark Deneen <mdeneen at gmail.com> wrote:
>>
>> Also, please check your e1000e driver version and update to the latest
>> stable version (out of tree). With the in-tree revision, we suffered
>> from random nic resets under certain conditions, which really gave
>> DRBD headaches.
>>
>> -M
>>
>> On Fri, Mar 9, 2012 at 11:37 AM, Marcelo Pereira <marcelops at gmail.com>
>> wrote:
>> > Hello guys,
>> >
>> > I have been facing a problem with a server that can't sync at all due to
>> > network issues. The NIC (two Intel e1000e) is showing several errors and
>> > is
>> > being dropped off. Sometimes at the primary node, sometimes at the
>> > secondary. It will not sync.
>> >
>> > I have an unused NIC on both servers, and I would like to use them
>> > instead.
>> > What should I do to move the configuration from one set of IPs to
>> > another?
>> >
>> > So, that is my current /etc/drbd.conf:
>> >
>> > global { usage-count no; }
>> > resource pdc0 {
>> >   protocol C;
>> >   startup { wfc-timeout 0; degr-wfc-timeout 120; }
>> >   disk { on-io-error detach; } # ACK!
>> >   net { cram-hmac-alg "sha1"; shared-secret "xxxxxxx"; }
>> >   syncer { rate 10M; verify-alg sha1; }
>> >   on node0 {
>> >     device /dev/drbd0;
>> >     disk /dev/sdb;
>> >     address 192.168.69.1:7788;
>> >     meta-disk internal;
>> >   }
>> >   on node1 {
>> >     device /dev/drbd0;
>> >     disk /dev/sdc;
>> >     address 192.168.69.2:7788;
>> >     meta-disk internal;
>> >   }
>> > }
>> >
>> > And that is what it should be after:
>> >
>> > global { usage-count no; }
>> > resource pdc0 {
>> >   protocol C;
>> >   startup { wfc-timeout 0; degr-wfc-timeout 120; }
>> >   disk { on-io-error detach; } # ACK!
>> >   net { cram-hmac-alg "sha1"; shared-secret "xxxxxxx"; }
>> >   syncer { rate 10M; verify-alg sha1; }
>> >   on node0 {
>> >     device /dev/drbd0;
>> >     disk /dev/sdb;
>> >     address 10.0.0.1:7788;
>> >     meta-disk internal;
>> >   }
>> >   on node1 {
>> >     device /dev/drbd0;
>> >     disk /dev/sdc;
>> >     address 10.0.0.2:7788;
>> >     meta-disk internal;
>> >   }
>> > }
>> >
>> > Is it enough to just change the /etc/drbd.conf on both servers? What is
>> > the
>> > exact procedure? I'm using DRBD v8.2.6.
>> >
>> > Thanks,
>> > Marcelo
>> >
>> > _______________________________________________
>> > drbd-user mailing list
>> > drbd-user at lists.linbit.com
>> > http://lists.linbit.com/mailman/listinfo/drbd-user
>> >
>
>