[DRBD-user] "PingAck timeout" in a dual active/passive configuration

Саша Александров shurrman at gmail.com
Fri Mar 7 10:59:44 CET 2014

Note: "permalinks" may not be as permanent as we would like,
direct links of old sources may well be a few messages off.


Hi Youssef!

Well, I did not dig into the details here, since I was involved in some other
activities. However, we use DRBD+bonding in several deployments, and not long
ago we twice had EXACTLY the same situation that you describe.
We use mode 0 (balance-rr) on two 1Gb cross-connected interfaces; testing with
iperf gives a transfer rate of about 200 MB/s (as expected for two bonded gigabit links).
Both situations described below had the same symptoms: PingAck timeouts,
disconnects/reconnects, and a sync rate below 1000 KB/s (!).
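
For reference, this is roughly how such a bond is defined and tested on SLES 11.
The ifcfg keys are real, but the interface names, addresses and iperf invocation
below are only an illustrative sketch, not a copy of our actual configuration:

    # /etc/sysconfig/network/ifcfg-bond0  (illustrative values)
    STARTMODE='auto'
    BOOTPROTO='static'
    IPADDR='10.0.0.1/24'
    BONDING_MASTER='yes'
    BONDING_MODULE_OPTS='mode=balance-rr miimon=100'   # mode 0, check links every 100 ms
    BONDING_SLAVE0='eth0'
    BONDING_SLAVE1='eth1'

    # throughput check over the bond (run "iperf -s" on the peer node first)
    iperf -c 10.0.0.2 -t 30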

Situation 1: by mistake, the two servers had the same MAC address on eth0 (we
have software whose license is bound to the MAC of the eth0 interface, so we
have to deal with that when moving the resource).
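
Such a duplicate is easy to spot by comparing the burned-in addresses on both
nodes. A minimal check, with interface and bond names assumed; note that inside
a bond the slaves' active MACs get rewritten, so look at the permanent address:

    # run on each node and compare the output
    ethtool -P eth0                                           # permanent (burned-in) MAC
    grep -E 'Slave Interface|Permanent HW addr' /proc/net/bonding/bond0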

Situation 2: eth0 was connected correctly on both servers, but eth1 on the 1st
server had link up while being cabled to some other server rather than to the
2nd server, and the 2nd server had link down on eth1 (with eth0+eth1 bonded).
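
A mis-cabled slave like that shows up when you compare the per-slave status on
both nodes. A quick sketch, with device and bond names assumed:

    # per-slave state as the bonding driver sees it
    grep -E 'Slave Interface|MII Status|Link Failure Count' /proc/net/bonding/bond0
    # physical link state reported by the NIC itself
    ethtool eth1 | grep 'Link detected'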

Both situations led to the symptoms you describe.

In situation 1 we simply corrected the MACs and, of course, everything was fine again.
In situation 2 we brought eth1 down on both servers (ifdown eth1) until the
cabling issues were resolved, so the bond interface had only one active slave;
this also resolved the issue (of course limiting the speed to 1 Gb).
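
If ifdown is not convenient in your environment, a slave can also be detached
from the bond at runtime through sysfs. A minimal sketch, with bond and
interface names assumed:

    # drop eth1 from bond0, leaving eth0 to carry all traffic
    echo -eth1 > /sys/class/net/bond0/bonding/slaves
    # put it back once the cabling is fixed
    echo +eth1 > /sys/class/net/bond0/bonding/slaves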

So you might want to try disabling the first or the second NIC pair, respectively.
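
And regarding the retry/timeout knob you could not find (see your original mail
quoted below): in DRBD 8.4 these settings live in the net section of the
resource configuration. The snippet is only a sketch around the default values,
with a placeholder resource name, not a recommendation for your setup:

    resource r0 {
      net {
        ping-int      10;   # seconds of idle time before DRBD sends a keep-alive ping
        ping-timeout   5;   # tenths of a second to wait for the PingAck (5 = 0.5 s)
        timeout       60;   # tenths of a second to wait for responses (60 = the 6 s window)
        ko-count       7;   # peer is given up on after this many exceeded timeouts
      }
    }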

Hope this helps

Best regards,
Alexandr A. Alexandrov


2014-03-06 17:59 GMT+04:00 Latrous, Youssef <YLatrous at broadviewnet.com>:

>  Hi Alexandr,
>
>
>
> Thank you for the response. I checked our bonding setup and I didn’t see
> any issues (see below for details). We use the “broadcast” mode over cross
> cables, with no switches in between - direct connection between the two
> servers, sitting side by side, connecting 2 NICs from one node to the other
> node’s NIC cards. Is the broadcast mode the right choice in this
> configuration? I don’t understand the MAC address reference in this
> context. Does DRBD check this info for Acks? That is if it sends on one NIC
> and receives on the other NIC it would drop the packet?
>
> Also, given that DRBD uses TCP with built-in retransmits, over these cross
> cables, I really don’t see how we could lose packets within the 6-second
> window? Please note that we monitor this network and report any issues (we
> use pacemaker). We didn’t see any issues so far with this network.
>
>
>
> As you can notice, I’m a bit lost here :)
>
>
>
> Thank you,
>
>
>
> Youssef
>
>
>
> PS. Here is our bond setup for this HA network.
>
> --
>
>
>
> Ethernet Channel Bonding Driver: v3.7.1 (April 27, 2011)
>
> Bonding Mode: fault-tolerance (broadcast)
> MII Status: up
> MII Polling Interval (ms): 100
> Up Delay (ms): 0
> Down Delay (ms): 0
>
> Slave Interface: eth0
> MII Status: up
> Speed: 1000 Mbps
> Duplex: full
> Link Failure Count: 0
> Permanent HW addr: c8:0a:a9:f1:a9:82
> Slave queue ID: 0
>
> Slave Interface: eth4
> MII Status: up
> Speed: 1000 Mbps
> Duplex: full
> Link Failure Count: 0
> Permanent HW addr: c8:0a:a9:f1:a9:84
> Slave queue ID: 0
>
>
> Youssef,
>
>
>
> Check your bonding mode!
>
> It appears that you are losing packets; this can be because the bonding mode
> is wrong or the MAC addresses are wrong.
>
>
>
> Best regards,
>
> Alexandr A. Alexandrov
>
>
>
>
>
> 2014-03-06 0:38 GMT+04:00 Latrous, Youssef <YLatrous at broadviewnet.com>:
>
>
>
> > Hello,
> >
> > We are currently experiencing a weird “PingAck” timeout on a system with
> > two nodes in an active/passive configuration. The two nodes are using a
> > cross-cabled connection over two bonded gigabit NICs. This network never
> > goes down and is used only for DRBD and CRM cluster data exchange. It’s
> > barely used (very light load). We are running SLES 11 SP2, DRBD release
> > 8.4.2, and pacemaker 1.1.7.
> >
> > We couldn’t find a DRBD configuration option to set the number of
> > retries before giving up.
> >
> > Our concern is that we do not understand how a PingAck can time out over
> > such a reliable medium. Any insight into this would be much appreciated.
> >
> > On the same note, are there any guards against it? Any best practices
> > (setups) we could use to avoid this situation?
> >
> > Thanks for any help,
> >
> > Youssef
>
>
> --
>
> Best regards, AAA.
>
> _______________________________________________
> drbd-user mailing list
> drbd-user at lists.linbit.com
> http://lists.linbit.com/mailman/listinfo/drbd-user
>
>


-- 
Best regards, AAA.

