[Csync2] SSH mode, push vs. pull

Mon Apr 8 23:34:34 CEST 2013

I read some of the list archives discussing an SSH mode, for example:
http://lists.linbit.com/pipermail/csync2/2012-September/000911.html
I might be able to work on this if we can agree on desired features.

Obviously, I'd particularly love feedback from Lars, as gatekeeper. :)

I have a potential use case where I have constraints that I don't control.
These constraints would get in the way of using csync2 as it stands, but
so far it seems to me that csync2 is the closest match for what I need;
it implements the core functionality and connectivity is the only real
issue. I'll describe the constraints first, and then suggest what looks
to me like a solution.

I have a set of systems that are geographically distributed and generally
network-isolated, containing millions of files in several tens of thousands
(perhaps growing to hundreds of thousands) of directories that I need to
keep in sync, including honoring deletes, across all the systems.

The content needs to be eventually consistent, but temporary
inconsistency is OK.  I have a central location from which I can SSH
out to the distributed nodes, but not SSH in from the distributed nodes
to the central location; it is a hub-and-spoke architecture. I need
changes to flow into the hub from the distributed nodes, and out of the
hub to the distributed nodes. (Fortunately, "younger" conflict resolution
is always appropriate in my use case.)

Furthermore, the distributed nodes cannot SSH to each other.  All the
SSH connections go through NAT in pools, so $SSH_CLIENT is neither
reliable (recognizable from the initiator's standpoint) nor stable (it
might change from invocation to invocation). I can't open up new ports.

Finally, because these systems are parts of precisely replicated
cluster deployments, each system actually has the same
hostname (they need to appear identical internally no matter
which geographically distributed cluster a user is assigned to).

(My current thought is to use lsyncd to add entries to the hint table on
each system, but I'm still investigating that.)

If it weren't for the incompatibility between the system constraints and
csync2's policy and transport implementation, I would implement this by
having each distributed node have a configuration that lists only the hub
node and itself as hosts, and having the hub node list all the nodes, then
from the hub, orchestrate invoking csync2 alternately on each of the
distributed nodes in turn, and on the hub node.
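
To make the ordering concrete, here is a rough sketch of one round of
that orchestration as seen from the hub. It only builds the command
lines; it does not run anything. The node names are made up, and it
assumes the hub can reach each node with a plain "ssh <node>" and that
"csync2 -x" is the appropriate invocation on both sides.

```python
from typing import List


def build_schedule(spokes: List[str]) -> List[List[str]]:
    """Return the commands the hub would run for one full round.

    For each distributed node: run csync2 on that node (over ssh from
    the hub), then run csync2 locally on the hub, so changes flow into
    the hub and back out again.
    """
    schedule: List[List[str]] = []
    for spoke in spokes:
        # Assumption: the node is reachable from the hub as "ssh <spoke>".
        schedule.append(["ssh", spoke, "csync2", "-x"])
        # Then a local run on the hub itself.
        schedule.append(["csync2", "-x"])
    return schedule


if __name__ == "__main__":
    # Hypothetical node names, for illustration only.
    for cmd in build_schedule(["node-a.example", "node-b.example"]):
        print(" ".join(cmd))
```

A real driver would also need to decide what to do when one node is
unreachable; the sketch ignores that.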

Neither the transport limitations (bidirectional connectivity) nor the policy
limitations (e.g. reverse lookups required) seem central to csync2's design,
though.

What I'd be interested in is:

* The ability to specify hosts using syntax like:
  host host@ssh://hostname
  It's not clear to me how much of the URL syntax to honor.  But at
  least using the URL-style prefix would be unlikely to clash with
  real hostnames. Saying that it will use URL syntax at least
  reserves the space to add things like port specification via URL
  syntax at some point if desired.

* I think I understand from the previous list discussion that SSH should
  not do SSL because SSH will have been configured to do whatever
  authentication and encryption desired. That's clearly what I would
  want. That is, SSH should imply nossl always.

* Two SSH modes, separating "which end is pushing changes" from
  "which node initiated the network connection":

  - One mode in which it invokes csync2 -i on the remote system, where
    it acts very much like the current network port access.

  - Another mode ("pull mode" or "reverse mode" though I don't know what
    command line switch to use for it) that invokes csync2 on the remote
    with the same command line options (except the pull mode switch itself)
    and runs MODE_INETD locally, so csync2 functions as if the remote
    end had initiated the connection.

* An option to tell the server what the client wants the server to
  consider its hostname to be. I would see this as a new protocol
  command HOSTNAME that overrides myhostname, turned on
  by a new configuration item, and off by default. (This would solve
  both my "identical hostname" and "can't use $SSH_CLIENT"
  issues.)
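
As an illustration of the first two points, here is a small sketch of
how a host entry of the form name@ssh://hostname might be interpreted.
It is written in Python purely for discussion and is not csync2 code;
HostSpec, parse_host_spec and default_ssl are hypothetical names, and
the choice to honor only the host and port parts of the URL is just one
possible answer to the "how much URL syntax" question.

```python
from typing import NamedTuple, Optional
from urllib.parse import urlsplit


class HostSpec(NamedTuple):
    name: str            # the name used in the configuration
    transport: str       # "tcp" (existing behaviour) or "ssh"
    address: str         # what to actually connect to
    port: Optional[int]  # only set if the URL carried a port
    use_ssl: bool


def parse_host_spec(spec: str, default_ssl: bool = True) -> HostSpec:
    """Parse 'name', 'name@address' or 'name@ssh://hostname[:port]'."""
    name, sep, target = spec.partition("@")
    if not sep:
        # Plain host name: connect to the name itself.
        return HostSpec(name, "tcp", name, None, default_ssl)
    if "://" not in target:
        # Existing name@address form, no URL prefix.
        return HostSpec(name, "tcp", target, None, default_ssl)
    url = urlsplit(target)
    if url.scheme != "ssh" or not url.hostname:
        raise ValueError(f"unsupported host URL: {target!r}")
    # SSH already handles authentication and encryption, so it
    # always implies nossl, whatever the default is.
    return HostSpec(name, "ssh", url.hostname, url.port, False)


if __name__ == "__main__":
    print(parse_host_spec("node1@ssh://gateway.example:2222"))
```

Because a real hostname cannot contain "://", the URL-style prefix keeps
the new form from clashing with existing entries.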

Lars, what do you think?  Worth more investigation on my part?

(I'm not asking you to do work other than merge changes, and that
only if you agree. I'm just not promising to deliver before being certain
that this will work...)

Thanks...