Note: "permalinks" may not be as permanent as we would like,
direct links of old sources may well be a few messages off.
On 06/03/2014 13:59, Latrous, Youssef wrote:
> Hi Alexandr,
>
> Thank you for the response. I checked our bonding setup and I didn't
> see any issues (see below for details). We use the "broadcast" mode
> over cross cables, with no switches in between - a direct connection
> between the two servers, sitting side by side, connecting 2 NICs from
> one node to the other node's NIC cards. Is the broadcast mode the
> right choice in this configuration? I don't understand the MAC address
> reference in this context. Does DRBD check this info for Acks? That is,
> if it sends on one NIC and receives on the other NIC, would it drop the
> packet?

Why are you using broadcast mode? We have the same configuration with
balance-rr and 3 NICs, which works great.

> Also, given that DRBD uses TCP with built-in retransmits over these
> cross cables, I really don't see how we could lose packets within the
> 6-second window. Please note that we monitor this network and report
> any issues (we use Pacemaker). We haven't seen any issues so far with
> this network.
>
> As you can notice, I'm a bit lost here :)
>
> Thank you,
>
> Youssef
>
> PS. Here is our bond setup for this HA network.
>
>     Ethernet Channel Bonding Driver: v3.7.1 (April 27, 2011)
>
>     Bonding Mode: fault-tolerance (broadcast)
>     MII Status: up
>     MII Polling Interval (ms): 100
>     Up Delay (ms): 0
>     Down Delay (ms): 0
>
>     Slave Interface: eth0
>     MII Status: up
>     Speed: 1000 Mbps
>     Duplex: full
>     Link Failure Count: 0
>     Permanent HW addr: c8:0a:a9:f1:a9:82
>     Slave queue ID: 0
>
>     Slave Interface: eth4
>     MII Status: up
>     Speed: 1000 Mbps
>     Duplex: full
>     Link Failure Count: 0
>     Permanent HW addr: c8:0a:a9:f1:a9:84
>     Slave queue ID: 0
>
> Youssef,
>
> Check your bonding mode!
> It appears that you lose packets; this can be because the mode is wrong or
> the MAC addresses are wrong.
>
> Best regards,
> Alexandr A. Alexandrov
>
>
> 2014-03-06 0:38 GMT+04:00 Latrous, Youssef <YLatrous at broadviewnet.com>:
>
>> Hello,
>>
>> We are currently experiencing a weird "PingAck" timeout on a system with
>> two nodes in an active/passive configuration. The two nodes are using a
>> cross-cabled connection over a bond of two Gigabit NICs. This network never
>> goes down and is used only for DRBD and CRM cluster data exchange. It's
>> barely used (very light load). We are running SLES 11 SP2, DRBD release
>> 8.4.2, and Pacemaker 1.1.7.
>>
>> We couldn't find a DRBD configuration option to set the number of
>> retries before giving up.
>>
>> Our concern is that we do not understand how a PingAck can time out over
>> such a reliable medium. Any insight into this would be much appreciated.
>>
>> On the same note, are there any guards against it? Any best practices
>> (setups) we could use to avoid this situation?
>>
>> Thanks for any help,
>>
>> Youssef
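For what it's worth, the balance-rr suggestion above is a bonding-driver change, not a DRBD
one. Below is a minimal sketch of what such a bond could look like on SLES 11; the file name
/etc/sysconfig/network/ifcfg-bond0 and the IP address are assumptions for illustration, and
only the slave interface names eth0/eth4 are taken from the bond output quoted above:

    # /etc/sysconfig/network/ifcfg-bond0   (hypothetical example)
    STARTMODE='auto'
    BOOTPROTO='static'
    IPADDR='192.168.100.1/24'            # replication address: assumption
    BONDING_MASTER='yes'
    # round-robin striping across both crossover links, link state checked every 100 ms
    BONDING_MODULE_OPTS='mode=balance-rr miimon=100'
    BONDING_SLAVE0='eth0'
    BONDING_SLAVE1='eth4'

After a change like this, /proc/net/bonding/bond0 should report
"Bonding Mode: load balancing (round-robin)" instead of "fault-tolerance (broadcast)".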
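On the "number of retries before giving up" question in the quoted post: in DRBD 8.4 the
relevant knobs live in the net section of the resource configuration. The sketch below only
restates the documented defaults as I understand them; the resource name r0 is made up, and
the exact defaults and units should be verified against man drbd.conf on the installed 8.4.2:

    resource r0 {          # hypothetical resource name
      net {
        ping-int     10;   # send a keep-alive ping after 10 s of idle link
        ping-timeout  5;   # wait 0.5 s (unit: 1/10 s) for the PingAck before the
                           # peer is declared dead ("PingAck did not arrive in time")
        timeout      60;   # wait 6 s (unit: 1/10 s) for replies to requests;
                           # this matches the "6 seconds window" mentioned above
        ko-count      7;   # after this many consecutive timed-out requests the
                           # connection to the peer is dropped
      }
    }

Raising ping-timeout or timeout can mask short stalls on the bond, but if PingAcks go
missing on an otherwise idle, direct crossover link, the bonding mode is the more likely
culprit, as suggested above.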