On Tue, Mar 17, 2009 at 10:15:36AM +0100, Lars Ellenberg wrote:
> On Tue, Mar 17, 2009 at 09:57:48AM +0100, Iustin Pop wrote:
> > Hi,
> >
> > I've searched the docs (and slightly the sources) but I can't find a
> > definitive answer about the following. Note that I'm using DRBD 8.0.12,
> > not the latest version.
> >
> > The man page for drbdsetup makes it sound like both ends of a drbd pair
> > will ping the other in case of no activity. However, in practice on an
> > unused device, only one of the two TCP connections that are used by the
> > DRBD pair sees pings; the other one sees no traffic at all.
>
> right.
> "DRBD Pings" are only sent via the "meta connection",
> not the "data connection".

Aha, now I understand why there are two TCP connections (and the reason
why the meta connections are torn down nicely, but not the data ones).
Thanks.

> > (This would not usually be a problem, except that if one has iptables
> > and conntrack enabled on the machine, and the device is not used for
> > a long enough time, the second TCP connection will be forgotten by the
> > conntrack module)
>
> Ouch.
>
> hm. are there tcp keepalive packets?

There would be, if only DRBD enabled them; but a quick "git grep
SO_KEEPALIVE" returns nothing.

> or should we do our own in-DRBD-protocol keepalive as well?

If it were possible to enable pings on the data connection as well,
that would be better I think, since DRBD can detect whether the actual
remote DRBD thread is responding correctly (and not just its TCP stack).
I also don't know how DRBD would react to its connection being shut down
by the TCP stack while idle (as opposed to a send error during activity).

But the TCP keepalive is indeed a quicker and smaller change (just
setsockopt(..., SO_KEEPALIVE)) than changing the DRBD data protocol.

Up to you whether you think this could be a common enough occurrence
that it needs handling by DRBD. Anyway, for us I now know how to proceed
(since this is the expected behaviour).

thanks!
iustin
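
For readers following the thread, here is a minimal userspace sketch of what
"just setsockopt(..., SO_KEEPALIVE)" amounts to. It is not DRBD code (DRBD
sets up its sockets inside the kernel); the helper names and the idle,
interval and count values are assumptions chosen for illustration only. The
idea is that the keepalive idle time should stay below whatever timeout the
conntrack module applies to established connections, so the idle data
connection keeps being seen.

```python
import socket


def enable_keepalive(sock, idle=60, interval=10, count=5):
    """Enable TCP keepalive probes on an otherwise idle connection.

    idle/interval/count are illustrative values, not DRBD defaults.
    """
    # Portable part: ask the TCP stack to probe the peer when idle.
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_KEEPALIVE, 1)

    # Per-socket tuning is platform-specific (available on Linux),
    # so only set each option when this platform exposes it.
    if hasattr(socket, "TCP_KEEPIDLE"):
        # Seconds of idleness before the first probe is sent.
        sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_KEEPIDLE, idle)
    if hasattr(socket, "TCP_KEEPINTVL"):
        # Seconds between unanswered probes.
        sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_KEEPINTVL, interval)
    if hasattr(socket, "TCP_KEEPCNT"):
        # Unanswered probes before the connection is declared dead.
        sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_KEEPCNT, count)
    return sock


def keepalive_enabled(sock):
    """Report whether SO_KEEPALIVE is currently set on the socket."""
    return sock.getsockopt(socket.SOL_SOCKET, socket.SO_KEEPALIVE) != 0
```

Note that this only proves the peer's TCP stack is alive, which is exactly
the limitation raised above: a ping inside the DRBD protocol would also show
that the remote DRBD thread is responding.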