[DRBD-user] "PingAck timeout" in a dual active/passive configuration

Fri Mar 7 16:45:54 CET 2014

Hello,

We had the same problem; upgrading both the kernel module and the
userland tools to 8.4.4 resolved it.
---
Best regards,
Eugene Istomin

On 06/03/2014 13:59, Latrous, Youssef wrote:

Hi Alexandr, 

Thank you for the response. I checked our bonding setup and didn’t see
any issues (see below for details). We use the “broadcast” mode over
cross cables, with no switches in between: a direct connection between
the two servers, which sit side by side, linking 2 NICs on one node to
the other node’s NICs. Is broadcast mode the right choice in this
configuration? I don’t understand the MAC address reference in this
context. Does DRBD check this info for ACKs? That is, if it sends on one
NIC and receives on the other, would it drop the packet?

Why are you using broadcast mode? We have the same configuration with
balance-rr and 3 NICs, and it works great.

Also, given that DRBD uses TCP, with its built-in retransmissions, over
these cross cables, I really don’t see how we could lose packets within
the 6-second window. Please note that we monitor this network and report
any issues (we use Pacemaker). We haven’t seen any issues with this
network so far.
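To put the 6-second window in perspective, here is a rough sketch of how many times TCP can retransmit one lost segment inside it. It assumes the retransmission timeout (RTO) starts at the Linux 200 ms floor and doubles after every retransmission. It ignores the RTT-based RTO calculation and other kernel tunables, so it is a back-of-the-envelope model, not DRBD or kernel code.

```python
from typing import List

# Assumed model: RTO starts at Linux's 200 ms minimum and doubles after each
# retransmission of the same segment (exponential backoff).
LINUX_RTO_MIN_S = 0.2


def retransmit_times(window_s: float, initial_rto_s: float = LINUX_RTO_MIN_S) -> List[float]:
    """Return the times (seconds after the original send) at which successive
    retransmissions of one lost segment would fire within window_s."""
    times: List[float] = []
    elapsed = 0.0
    rto = initial_rto_s
    while elapsed + rto <= window_s:
        elapsed += rto
        times.append(round(elapsed, 3))
        rto *= 2  # back off: each timeout is twice the previous one
    return times


if __name__ == "__main__":
    attempts = retransmit_times(6.0)
    print(f"{len(attempts)} retransmissions within 6 s, at {attempts}")
```

Under these assumptions only four retransmissions fit in 6 seconds. So "TCP retransmits" does not guarantee delivery within a fixed window if frames keep getting lost or mangled.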

As you can see, I’m a bit lost here :)

Thank you, 

Youssef 

PS. Here is our bond setup for this HA network:

Ethernet Channel Bonding Driver: v3.7.1 (April 27, 2011) 

Bonding Mode: fault-tolerance (broadcast) 
MII Status: up 
MII Polling Interval (ms): 100 
Up Delay (ms): 0 
Down Delay (ms): 0 

Slave Interface: eth0 
MII Status: up 
Speed: 1000 Mbps 
Duplex: full 
Link Failure Count: 0 
Permanent HW addr: c8:0a:a9:f1:a9:82 
Slave queue ID: 0 

Slave Interface: eth4 
MII Status: up 
Speed: 1000 Mbps 
Duplex: full 
Link Failure Count: 0 
Permanent HW addr: c8:0a:a9:f1:a9:84 
Slave queue ID: 0 
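Alexandr's advice below is to check the bonding mode and MAC addresses, so here is a minimal sketch that parses bonding status text like the dump above and flags a few things worth a second look. `/proc/net/bonding/<bond>` is where the kernel exposes this status, and the sample is copied from the output above. The specific warnings are illustrative choices, not a diagnosis.

In broadcast mode the bond transmits every frame on all slaves. On a back-to-back link the peer therefore receives each frame once per slave, which is why the mode is flagged.

```python
from typing import Dict, List, Optional, Tuple

# Excerpt of the /proc/net/bonding/<bond> output quoted in this thread.
SAMPLE = """\
Bonding Mode: fault-tolerance (broadcast)
MII Status: up
MII Polling Interval (ms): 100

Slave Interface: eth0
MII Status: up
Speed: 1000 Mbps
Link Failure Count: 0
Permanent HW addr: c8:0a:a9:f1:a9:82

Slave Interface: eth4
MII Status: up
Speed: 1000 Mbps
Link Failure Count: 0
Permanent HW addr: c8:0a:a9:f1:a9:84
"""


def parse_bonding(text: str) -> Tuple[Optional[str], List[Dict[str, str]]]:
    """Split bonding status text into the bond mode and one dict per slave."""
    mode: Optional[str] = None
    slaves: List[Dict[str, str]] = []
    current: Optional[Dict[str, str]] = None
    for raw in text.splitlines():
        line = raw.strip()
        if ":" not in line:
            continue  # skip blank lines
        # Split on the first colon only, so MAC addresses stay intact.
        key, value = (part.strip() for part in line.split(":", 1))
        if key == "Bonding Mode":
            mode = value
        elif key == "Slave Interface":
            current = {"name": value}
            slaves.append(current)
        elif current is not None:
            current[key] = value  # per-slave field
    return mode, slaves


def check_bond(mode: Optional[str], slaves: List[Dict[str, str]]) -> List[str]:
    """Return human-readable warnings; the checks are illustrative only."""
    warnings: List[str] = []
    if mode is not None and "broadcast" in mode:
        warnings.append("broadcast mode: every frame is transmitted on all slaves")
    for slave in slaves:
        name = slave["name"]
        if slave.get("MII Status") != "up":
            warnings.append(f"{name}: MII status is not up")
        if int(slave.get("Link Failure Count", "0")) > 0:
            warnings.append(f"{name}: link failures recorded")
    return warnings


if __name__ == "__main__":
    # On a live node, read /proc/net/bonding/<bond> instead of SAMPLE.
    for warning in check_bond(*parse_bonding(SAMPLE)):
        print(warning)
```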

Youssef, 

Check your bonding mode!
It appears that you are losing packets; this can happen because the mode
is wrong or the MAC addresses are wrong.

Best regards, 
Alexandr A. Alexandrov 

2014-03-06 0:38 GMT+04:00 Latrous, Youssef <YLatrous at broadviewnet.com>:

> Hello,
>
> We are currently experiencing a weird “PingAck” timeout on a system with
> two nodes in an active/passive configuration. The two nodes are
> connected by a cross-cabled link over two bonded Gigabit NICs. This
> network never goes down and is used only for DRBD and CRM cluster data
> exchange. It is barely used (very light load). We are running SLES 11
> SP2, DRBD release 8.4.2, and Pacemaker 1.1.7.
>
> We couldn’t find a DRBD configuration option to set the number of
> retries before giving up.
>
> Our concern is that we do not understand how a PingAck can time out
> over such a reliable medium. Any insight into this would be much
> appreciated.
>
> On the same note, are there any guards against it? Any best practices
> (setups) we could use to avoid this situation?
>
> Thanks for any help,
>
> Youssef
>
> _______________________________________________
> drbd-user mailing list
> drbd-user at lists.linbit.com
> http://lists.linbit.com/mailman/listinfo/drbd-user

--
Best regards, AAA.
-------------- next part -------------- 
An HTML attachment was scrubbed... 
URL: <http://lists.linbit.com/pipermail/drbd-user/attachments/20140306/2544fc77/attachment.htm>
