[DRBD-user] "PingAck timeout" on system with multiple resources

Cédric Dufour - Idiap Research Institute cedric.dufour at idiap.ch
Tue Feb 4 11:25:40 CET 2014


Hello again,

After comparing the DRBD 8.3 and 8.4 source code, I see that conditional TCP_CORK-ing remains to be done in 8.4. Could this be the reason why we experience PingAck problems on idle resources?
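
To make clear what I mean by "conditional" corking, here is a rough userspace sketch in Python. It is only an illustration of the general idea as I understand it (cork the socket only when several packets are queued, so that a lone small packet such as a ping is not held back), not DRBD's kernel code. The names `send_packets`, `set_cork` and `loopback_pair` are made up for this example, and TCP_CORK is Linux-only, so the sketch simply skips corking where the option is missing.

```python
import socket

# TCP_CORK is a Linux-specific socket option; None means "not available here".
TCP_CORK = getattr(socket, "TCP_CORK", None)


def set_cork(sock, on):
    """Enable or disable TCP_CORK on sock (no-op where unsupported)."""
    if TCP_CORK is not None:
        sock.setsockopt(socket.IPPROTO_TCP, TCP_CORK, 1 if on else 0)


def send_packets(sock, packets):
    """Send queued packets, corking only when more than one is pending.

    Returns True if the batch path (cork/uncork) was taken, False otherwise.
    """
    batch = len(packets) > 1
    if batch:
        set_cork(sock, True)       # let the kernel coalesce the writes
    try:
        for packet in packets:
            sock.sendall(packet)
    finally:
        if batch:
            set_cork(sock, False)  # uncork: push out any partial frame now
    return batch


def loopback_pair():
    """Hypothetical helper: a connected TCP socket pair on 127.0.0.1."""
    listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    listener.bind(("127.0.0.1", 0))
    listener.listen(1)
    client = socket.create_connection(listener.getsockname())
    server, _ = listener.accept()
    listener.close()
    return client, server
```

With this, a single queued packet (say, a ping) goes out uncorked, while a header followed by its payload is sent as one corked batch.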

PS: our cluster was running DRBD 8.3 beforehand and we had no such problem... but we were also using Infiniband SDP instead of IPoIB (so we cannot know whether the problem really lies with DRBD).
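
For anyone checking the timing figures quoted below: "ping-timeout" is given in tenths of a second, while "ping-int" is in seconds. The small Python sketch below just does that arithmetic. The function names are invented for this example, and the "worst case" figure assumes, as a simplification, that a ping is sent once the link has been idle for ping-int and that the peer is declared dead if no PingAck arrives within ping-timeout after that.

```python
def ping_timeout_seconds(ping_timeout_tenths):
    """Convert a ping-timeout value (tenths of a second) to seconds."""
    return ping_timeout_tenths / 10.0


def worst_case_detection_seconds(ping_int_s, ping_timeout_tenths):
    """Idle time before a ping is sent, plus the wait for its PingAck.

    Simplified model (an assumption, see the text above).
    """
    return ping_int_s + ping_timeout_seconds(ping_timeout_tenths)


if __name__ == "__main__":
    # Values from our configuration: ping-int = 10s, ping-timeout = 25
    print(ping_timeout_seconds(25))
    print(worst_case_detection_seconds(10, 25))
```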

Thanks for your insights,

Cédric

On 02/02/14 21:29, Cédric Dufour - Idiap Research Institute wrote:
> Hello,
>
> We are experiencing "PingAck timeout" on a system where multiple DRBD resources are configured (more precisely, a pair of active/active Lustre MDS servers):
>
> A --- drbd0 --- B  [nfs-data] idle
> A --- drbd1 --- B  [nfs-apps] idle
> A --- drbd2 --- B  [nfs-tmp] idle
> A --> drbd3 --> B  [mdt1] heavy load
> A <-- drbd4 <-- B  [mdt2] heavy load
> A --- drbd5 --- B  [mgs] idle
>
> Our environment is DRBD 8.4.4, with "ping-int = 10s" and "ping-timeout = 25" (2.5s)
>
> The link between the two servers is 20Gb/s Infiniband (configured in datagram mode).
>
> Strangely, the timeout occurs on an idle resource (e.g. drbd1) while two of the other resources ('mdt1' and 'mdt2') are heavily loaded (and show no connection or timeout problem whatsoever).
>
> Looking at the source code, I believe that DRBD cannot know that the link is potentially "congested" by the heavily loaded resources ('mdt1' and 'mdt2'), nor anticipate the PingAck timeout this may cause on an otherwise idle resource (e.g. 'drbd1'). Am I right?
>
> Is there a way to circumvent this problem?
>
> Thanks and best,
>
> Cédric Dufour
> -- 
>
> Cédric Dufour @ Idiap Research Institute