[DRBD-user] Question about the 'ping' mechanism

Lars Ellenberg lars.ellenberg at linbit.com
Tue Mar 17 10:39:15 CET 2009

On Tue, Mar 17, 2009 at 10:27:21AM +0100, Iustin Pop wrote:
> > > The man page for drbdsetup makes it sound like both ends of a drbd pair
> > > will ping the other in case of no activity. However, in practice on an
> > > unused device, only one of the two TCP connections that are used by the
> > > DRBD pair sees pings; the other one sees no traffic at all.
> > 
> > right.
> > "DRBD Pings" are only sent via the "meta connection",
> > not the "data connection".
> 
> Aha, now I understand why there are two TCP connections (and the reasons
> why the meta connections are torn down nicely, but not the data ones).
> Thanks.
> 
> > > (This would not usually be a problem, except that if one has iptables
> > > and conntrack enabled on the machine, and the device is not used for
> > > a long enough time, the second TCP connection will be forgotten by the
> > > conntrack module)
> > 
> > Ouch.
> > 
> > hm. are there tcp keepalive packets?
> 
> There would be, if only DRBD would enable them; but a quick “git grep
> SO_KEEPALIVE” returns nothing.

no, we do not use that option (yet).
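
for anyone who wants to see how long an idle connection may sit
around before something times it out, a sketch that only prints the
relevant sysctls. assumptions: the conntrack knob is called
net.netfilter.nf_conntrack_tcp_timeout_established (older kernels
expose it under a different name), and show_timeouts with its optional
base directory argument is a made-up helper, not an existing tool.
the tcp_keepalive_* values only apply to sockets that set
SO_KEEPALIVE, which DRBD does not do.

```shell
#!/bin/sh
# show_timeouts [BASE] -- hypothetical helper; BASE defaults to /proc/sys
# and is only a parameter so the function can be tried on a fake tree.
show_timeouts() {
    base=${1:-/proc/sys}
    for f in net/ipv4/tcp_keepalive_time \
             net/ipv4/tcp_keepalive_intvl \
             net/ipv4/tcp_keepalive_probes \
             net/netfilter/nf_conntrack_tcp_timeout_established
    do
        if [ -r "$base/$f" ]; then
            echo "$f = $(cat "$base/$f")"
        else
            # e.g. the conntrack module is not loaded on this machine
            echo "$f = (not available)"
        fi
    done
}
```

call it as plain "show_timeouts" on the node in question; all values
are in seconds, except tcp_keepalive_probes, which is a count.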

> > or should we do our own in-DRBD-protocol keepalive as well?
> 
> If it would be possible to enable pings on the data connection as well,
> that would be better I think, since DRBD can detect if the actual remote
> DRBD thread is responding correctly (and not just its TCP stack). I also
> don't know how DRBD would react to its connection being shut down by the
> TCP stack when not having activity (as opposed to a send error during
> activity).

it probably won't even notice.
but if you think along those lines,
rather implement a "ping to disk".
have one file / block on the drbd, and write to that every few seconds.
then DRBD can detect if the actual remote _disk_ is responding,
and not just the kernel thread ;-)

and it's just a matter of
 cd /into/mounted/drbd
 while sleep 10; do
	dd conv=fsync if=/dev/zero of=__probe__ bs=1b count=1
 done
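
a slightly more defensive variant, as a sketch only: wrap the write in
timeout(1), so a probe that hangs gets reported instead of silently
blocking the loop. probe_once, the 5 second default and the ok/FAILED
messages are made up for illustration. conv=fsync needs GNU dd, and a
dd stuck in uninterruptible I/O may not react to the signal at all, so
treat this as a hint, not a guarantee.

```shell
#!/bin/sh
# probe_once DIR [SECONDS] -- hypothetical helper, not part of DRBD.
# Writes one 512-byte block to DIR/__probe__ and fsyncs it;
# prints "ok" on success, "FAILED" (and returns 1) on error or timeout.
probe_once() {
    dir=$1
    limit=${2:-5}    # assumed default: give up after 5 seconds
    if timeout "$limit" \
        dd conv=fsync if=/dev/zero of="$dir/__probe__" bs=512 count=1 \
        2>/dev/null
    then
        echo ok
    else
        echo FAILED
        return 1
    fi
}
```

used in the same loop as above, for example:
 while sleep 10; do probe_once /into/mounted/drbd; done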

> But the TCP keepalive is indeed a quicker and smaller change (just
> setsockopt(..., SO_KEEPALIVE)) than changing the DRBD data protocol. Up
> to you if you think it could be a common enough occurrence that it needs
> handling by DRBD.
> 
> Anyway, for us I now know how to proceed (since this is the expected
> behaviour).

cheers,

-- 
: Lars Ellenberg                
: LINBIT HA-Solutions GmbH
: DRBD®/HA support and consulting    http://www.linbit.com

DRBD® and LINBIT® are registered trademarks of LINBIT, Austria.
__
please don't Cc me, but send to list   --   I'm subscribed


