One last thing:

http://sourceforge.net/tracker/index.php?func=detail&aid=3486658&group_id=42302&atid=447449

Looks like a solution may be in the next release of the driver?

-M

On Mon, Mar 12, 2012 at 2:05 PM, Mark Deneen <mdeneen at gmail.com> wrote:
> Marcelo,
>
> I experienced difficulties using ubuntu-server 10.04 with the e1000e
> driver resetting the adapter. I updated to 1.6.3 at first, and things
> have been fairly smooth since then.
>
> However, I just looked through the logs and saw this:
>
> [4542575.651842] e1000e 0000:01:00.0: eth1: Detected Hardware Unit Hang:
> [4542575.651845]   TDH                  <29>
> [4542575.651847]   TDT                  <2a>
> [4542575.651848]   next_to_use          <2a>
> [4542575.651850]   next_to_clean        <28>
> [4542575.651851] buffer_info[next_to_clean]:
> [4542575.651852]   time_stamp           <11b2455d4>
> [4542575.651854]   next_to_watch        <29>
> [4542575.651855]   jiffies              <11b245642>
> [4542575.651858]   next_to_watch.status <0>
> [4542575.651860] MAC Status             <80383>
> [4542575.651862] PHY Status             <792d>
> [4542575.651864] PHY 1000BASE-T Status  <3800>
> [4542575.651866] PHY Extended Status    <3000>
> [4542575.651869] PCI Status             <10>
>
> Now, this was half of a bonded pair, so services remained up, but I
> was kind of bummed to see that it still happened.
>
> With that being said, the adapter _recovered_, which is something that
> it did not do with the in-tree driver. I'd have to unplug the cable,
> plug it in again, and reconfigure the interface to get things working
> again in that situation.
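For anyone comparing notes with this thread, a quick way to confirm which e1000e revision is actually loaded and whether the hang has recurred (a sketch only; the interface name eth1 matches the dmesg excerpt above, but adjust for your system):

```shell
# Show the driver and version currently bound to the interface
# (eth1 is the interface named in the hang log above)
ethtool -i eth1

# Version of the e1000e module installed on disk; this can differ
# from the loaded one until the module is reloaded or the box reboots
modinfo e1000e | grep -i '^version'

# Count how many hardware unit hangs the kernel has logged so far
dmesg | grep -c 'Detected Hardware Unit Hang'
```

If the in-tree and out-of-tree versions disagree, that is usually the first thing worth fixing before blaming cables or switches.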
>
> I have the following adapters in this server pair:
>
> 00:19.0 Ethernet controller: Intel Corporation 82578DC Gigabit Network Connection (rev 06)
> 01:00.0 Ethernet controller: Intel Corporation 82571EB Gigabit Ethernet Controller (rev 06)
> 01:00.1 Ethernet controller: Intel Corporation 82571EB Gigabit Ethernet Controller (rev 06)
>
> ...and I'm running kernel 3.0.4 vanilla, with scst 2.2 zero-copy
> patches and drbd 8.3.12.
>
> Looking through older logs, it appears as if there have been a handful
> of hangs, always with eth1.
>
> So, while I would like to be more helpful, it appears as if I'm in a
> similar boat as you are. :) The only thing that is saving me right
> now is the fact that the connection between the servers is bonded.
>
> -M
>
> On Mon, Mar 12, 2012 at 12:06 PM, Marcelo Pereira <marcelops at gmail.com> wrote:
>> Hi Mark,
>>
>> Please, tell me more about this.
>>
>> I have been struggling with my e1000e NIC, and the cluster isn't syncing at
>> all. It gets to 2.8% (out of 16 TB) and the connection simply drops.
>> That is why I want to use another NIC, one with a chipset other than this
>> e1000e.
>>
>> I have already updated the driver to 1.9.5, and it's exactly what you said:
>> "random NIC resets". I thought it was (1) the cables, (2) the switch ports,
>> and finally (3) the switch itself. I have replaced all of that hardware and
>> still can't sync.
>>
>> Tell me more about this issue with the Intel e1000e, please!
>>
>> Thanks,
>> Marcelo
>>
>> On Mon, Mar 12, 2012 at 10:41 AM, Mark Deneen <mdeneen at gmail.com> wrote:
>>>
>>> Also, please check your e1000e driver version and update to the latest
>>> stable version (out of tree). With the in-tree version, we suffered
>>> from random NIC resets under certain conditions, which really gave
>>> DRBD headaches.
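The bonding Mark describes is what kept his services up through the hang. A minimal active-backup bond on an ifupdown-based system such as ubuntu-server 10.04 might look like the fragment below; this is a sketch under assumptions, not his actual setup (the interface names eth0/eth1, the bond name, and the address are illustrative):

```
# /etc/network/interfaces fragment (illustrative; names and address are assumptions)
auto bond0
iface bond0 inet static
    address 192.168.69.1        # replication address for this node
    netmask 255.255.255.0
    bond-slaves eth0 eth1       # the two NICs to pair
    bond-mode active-backup     # pure failover, no load balancing
    bond-miimon 100             # link-check interval in ms
```

This needs the ifenslave package installed; once the bond is up, /proc/net/bonding/bond0 shows which slave is currently active and how many link failures each has seen.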
>>>
>>> -M
>>>
>>> On Fri, Mar 9, 2012 at 11:37 AM, Marcelo Pereira <marcelops at gmail.com> wrote:
>>> > Hello guys,
>>> >
>>> > I have been facing a problem with a server that can't sync at all due to
>>> > network issues. The NICs (two Intel e1000e) are showing several errors
>>> > and are being dropped, sometimes at the primary node, sometimes at the
>>> > secondary. It will not sync.
>>> >
>>> > I have an unused NIC on both servers, and I would like to use those
>>> > instead. What should I do to move the configuration from one set of IPs
>>> > to another?
>>> >
>>> > This is my current /etc/drbd.conf:
>>> >
>>> > global { usage-count no; }
>>> > resource pdc0 {
>>> >   protocol C;
>>> >   startup { wfc-timeout 0; degr-wfc-timeout 120; }
>>> >   disk { on-io-error detach; } # ACK!
>>> >   net { cram-hmac-alg "sha1"; shared-secret "xxxxxxx"; }
>>> >   syncer { rate 10M; verify-alg sha1; }
>>> >   on node0 {
>>> >     device    /dev/drbd0;
>>> >     disk      /dev/sdb;
>>> >     address   192.168.69.1:7788;
>>> >     meta-disk internal;
>>> >   }
>>> >   on node1 {
>>> >     device    /dev/drbd0;
>>> >     disk      /dev/sdc;
>>> >     address   192.168.69.2:7788;
>>> >     meta-disk internal;
>>> >   }
>>> > }
>>> >
>>> > And this is what it should be afterwards:
>>> >
>>> > global { usage-count no; }
>>> > resource pdc0 {
>>> >   protocol C;
>>> >   startup { wfc-timeout 0; degr-wfc-timeout 120; }
>>> >   disk { on-io-error detach; } # ACK!
>>> >   net { cram-hmac-alg "sha1"; shared-secret "xxxxxxx"; }
>>> >   syncer { rate 10M; verify-alg sha1; }
>>> >   on node0 {
>>> >     device    /dev/drbd0;
>>> >     disk      /dev/sdb;
>>> >     address   10.0.0.1:7788;
>>> >     meta-disk internal;
>>> >   }
>>> >   on node1 {
>>> >     device    /dev/drbd0;
>>> >     disk      /dev/sdc;
>>> >     address   10.0.0.2:7788;
>>> >     meta-disk internal;
>>> >   }
>>> > }
>>> >
>>> > Is it enough to just change /etc/drbd.conf on both servers? What is the
>>> > exact procedure? I'm using DRBD v8.2.6.
>>> >
>>> > Thanks,
>>> > Marcelo
>>> >
>>> > _______________________________________________
>>> > drbd-user mailing list
>>> > drbd-user at lists.linbit.com
>>> > http://lists.linbit.com/mailman/listinfo/drbd-user
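To close the loop on Marcelo's question: the usual approach is to make the same edit on both nodes and then ask DRBD to apply the difference, rather than restarting the whole stack. A sketch, assuming the resource name pdc0 from the config above and that the 10.0.0.x addresses are already configured and reachable on the new NICs:

```shell
# Run on BOTH nodes, after editing /etc/drbd.conf identically on each:

# Parse and echo back the edited resource to catch syntax errors first
drbdadm dump pdc0

# Re-read the configuration and reconnect using the new addresses
drbdadm adjust pdc0

# Watch the resource reconnect and resume syncing
cat /proc/drbd
```

`drbdadm adjust` only changes what differs from the running configuration, so the disks stay attached and no resync is triggered by the address change itself.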