[DRBD-user] "PingAck timeout" in a dual active/passive configuration

Thu Mar 6 14:59:19 CET 2014

Hi Alexandr,

Thank you for the response. I checked our bonding setup and didn't see any issues (see below for details). We use the "broadcast" mode over cross cables with no switches in between: a direct connection between the two servers, sitting side by side, with 2 NICs on one node cabled to the other node's NICs. Is broadcast mode the right choice in this configuration? I don't understand the MAC address reference in this context. Does DRBD check this information for Acks? That is, if it sends on one NIC and receives on the other, would it drop the packet?
Also, given that DRBD uses TCP with built-in retransmits over these cross cables, I really don't see how we could lose packets within the 6-second window. Please note that we monitor this network and report any issues (we use Pacemaker). We haven't seen any issues with this network so far.

As you can see, I'm a bit lost here :)

Thank you,

Youssef

PS. Here is our bond setup for this HA network.
--

Ethernet Channel Bonding Driver: v3.7.1 (April 27, 2011)

Bonding Mode: fault-tolerance (broadcast)
MII Status: up
MII Polling Interval (ms): 100
Up Delay (ms): 0
Down Delay (ms): 0

Slave Interface: eth0
MII Status: up
Speed: 1000 Mbps
Duplex: full
Link Failure Count: 0
Permanent HW addr: c8:0a:a9:f1:a9:82
Slave queue ID: 0

Slave Interface: eth4
MII Status: up
Speed: 1000 Mbps
Duplex: full
Link Failure Count: 0
Permanent HW addr: c8:0a:a9:f1:a9:84
Slave queue ID: 0
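
To check this status on both nodes in a repeatable way, a minimal Python sketch follows. It assumes the bond device is named bond0 (the name is not stated in this thread), and the function names are made up for illustration. It only reads the status file shown above and flags slaves whose MII status is not "up" or whose link failure count is nonzero; it does not detect packet loss by itself.

```python
def parse_bonding_status(text):
    """Parse the text of a /proc/net/bonding/<bond> file into a dict."""
    status = {"mode": None, "slaves": []}
    current = None  # the slave section currently being read, if any
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        if not sep:
            continue  # blank line or a line without "key: value"
        key, value = key.strip(), value.strip()
        if key == "Bonding Mode":
            status["mode"] = value
        elif key == "Slave Interface":
            current = {"name": value, "mii_status": None, "link_failures": None}
            status["slaves"].append(current)
        elif current is not None and key == "MII Status":
            current["mii_status"] = value
        elif current is not None and key == "Link Failure Count":
            current["link_failures"] = int(value)
    return status


def find_problems(status):
    """Return human-readable warnings for slaves that look unhealthy."""
    problems = []
    for slave in status["slaves"]:
        if slave["mii_status"] != "up":
            problems.append(f"{slave['name']}: MII status is {slave['mii_status']}")
        if slave["link_failures"]:
            problems.append(
                f"{slave['name']}: {slave['link_failures']} link failure(s) recorded"
            )
    return problems


def check_bond(path="/proc/net/bonding/bond0"):
    """Read a bonding status file (bond0 is an assumed name) and print findings."""
    with open(path) as handle:
        status = parse_bonding_status(handle.read())
    print("Bonding mode:", status["mode"])
    for problem in find_problems(status) or ["no link problems reported"]:
        print(" ", problem)
```

Calling `check_bond()` on each node prints the bonding mode and any flagged slave. For the output above it would report no link problems, since both eth0 and eth4 are up with a link failure count of 0.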

Youssef,

Check your bonding mode!

It appears that you are losing packets; this can happen because the bonding mode is
wrong or the MAC addresses are wrong.

Best regards,

Alexandr A. Alexandrov

2014-03-06 0:38 GMT+04:00 Latrous, Youssef <YLatrous at broadviewnet.com>:

> Hello,
>
> We are currently experiencing a weird "PingAck" timeout on a system with
> two nodes in an active/passive configuration. The two nodes use a
> cross-cabled connection over two bonded Gigabit NICs. This network never
> goes down and is used only for DRBD and CRM cluster data exchange. It's
> barely used (very light load). We are running SLES 11 SP2, DRBD release
> 8.4.2, and Pacemaker 1.1.7.
>
> We couldn't find a DRBD configuration option to set the number of
> retries before giving up.
>
> Our concern is that we do not understand how a PingAck can time out over
> such a reliable medium. Any insight into this would be much appreciated.
>
> On the same note, are there any guards against it? Any best practices
> (setups) we could use to avoid this situation?
>
> Thanks for any help,
>
> Youssef


--

Best regards, AAA.
