[Csync2] SSH mode, push vs. pull
Michael Johnson
mikjo.sas at gmail.com
Mon Apr 8 23:34:34 CEST 2013
I read some of the list archives discussing an SSH mode, for example:
http://lists.linbit.com/pipermail/csync2/2012-September/000911.html
I might be able to work on this if we can agree on desired features.
Obviously, I'd particularly love feedback from Lars, as gatekeeper. :)
I have a potential use case where I have constraints that I don't control.
These constraints would get in the way of using csync2 as it stands, but
so far it seems to me that csync2 is the closest match for what I need;
it implements the core functionality and connectivity is the only real
issue. I'll describe the constraints first, and then suggest what looks
to me like a solution.
I have a set of systems that are geographically distributed and generally
network-isolated, containing millions of files in several tens of thousands
(perhaps growing to hundreds of thousands) of directories that I need to
keep in sync, including honoring deletes, across all the systems.
The content needs to be eventually consistent, but temporary
inconsistency is OK. I have a central location from which I can SSH
out to the distributed nodes, but not SSH in from the distributed nodes
to the central location; it is a hub-and-spoke architecture. I need
changes to flow into the hub from the distributed nodes, and out of the
hub to the distributed nodes. (Fortunately, "younger" conflict resolution
is always appropriate in my use case.)
Furthermore, the distributed nodes cannot SSH to each other. All the
SSH connections go through NAT in pools, so $SSH_CLIENT is neither
reliable (recognizable from the initiator's standpoint) nor stable (it
might change from invocation to invocation). I can't open up new ports.
Finally, because these systems are parts of precisely replicated
cluster deployments, each system actually has the same
hostname (they need to appear identical internally no matter
which geographically distributed cluster a user is assigned to).
(My current thought is to use lsyncd to add entries to the hint table on
each system, but I'm still investigating that.)
If it weren't for the incompatibility between the system constraints and
csync2's policy and transport implementation, I would implement this by
having each distributed node have a configuration that lists only the hub
node and itself as hosts, and having the hub node list all the nodes, then
from the hub, orchestrate invoking csync2 alternately on each of the
distributed nodes in turn, and on the hub node.
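
A minimal sketch of that hub-driven orchestration, as it would look if the constraints did not get in the way. The spoke names, the `plan_round` and `run_round` helpers, and the choice of `csync2 -x` as the command to run are assumptions made for illustration, and "alternately" is read here as "one spoke, then the hub, then the next spoke". It only builds and (optionally) runs command lines from the hub:

```python
import shlex
import subprocess


def plan_round(spokes, csync2_argv=("csync2", "-x")):
    """Return the argv lists for one sync round, driven from the hub.

    Assumption: each spoke is reachable from the hub as `ssh <spoke>`,
    and the same csync2 invocation is used on spokes and on the hub.
    """
    local = list(csync2_argv)
    # ssh hands a single command string to the remote shell, so quote it once.
    remote = shlex.join(csync2_argv)
    plan = []
    for spoke in spokes:
        plan.append(["ssh", spoke, remote])  # run csync2 on the spoke
        plan.append(local)                   # then run csync2 on the hub
    return plan


def run_round(spokes, dry_run=True):
    """Print the planned commands, or execute them when dry_run is False."""
    for argv in plan_round(spokes):
        if dry_run:
            print(shlex.join(argv))
        else:
            # Do not abort the whole round if one spoke is unreachable.
            subprocess.run(argv, check=False)


if __name__ == "__main__":
    # Hypothetical spoke names, used only to show the ordering.
    run_round(["spoke-a", "spoke-b"], dry_run=True)
```

With the stock transport, the spoke-side invocation would still need to connect back to the hub, which is exactly what the constraints above rule out.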
Neither the transport limitations (bidirectional connectivity) nor the policy
limitations (e.g. reverse lookups required) seem central to csync2's design,
though.
What I'd be interested in is:
* The ability to specify hosts using syntax like:
      host host@ssh://hostname
  It's not clear to me how much of the URL syntax to honor. But at
  least using the URL-style prefix would be unlikely to clash with
  real hostnames. Saying that it will use URL syntax at least
  reserves the space to add things like port specification via URL
  syntax at some point if desired.
* I think I understand from the previous list discussion that SSH should
  not do SSL because SSH will have been configured to do whatever
  authentication and encryption is desired. That's clearly what I would
  want. That is, SSH should imply nossl always.
* Two ssh modes, separating "which end is pushing changes" from
  "which node initiated the network connection":
  - One mode in which it invokes csync2 -i on the remote system, where
    it acts very much like the current network port access.
  - Another mode ("pull mode" or "reverse mode", though I don't know what
    command line switch to use for it) that invokes csync2 on the remote
    with the same command line options (except the pull mode switch itself)
    and runs MODE_INETD locally, so csync2 functions as if the remote
    end had initiated the connection.
* An option to tell the server what the client wants the server to
  consider its hostname to be. I would see this as a new protocol
  command HOSTNAME that overrides myhostname, turned on
  by a new configuration item, and off by default. (This would solve
  both my "identical hostname" and "can't use $SSH_CLIENT"
  issues.)
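
To make the host-syntax idea concrete, here is a rough sketch of how such an entry might be parsed, assuming it has the form `name@ssh://address` with an optional port. The `HostSpec` type, the `parse_host_spec` function, and the "tcp" label for today's transport are hypothetical names for illustration only; none of this is existing csync2 code:

```python
from dataclasses import dataclass
from typing import Optional
from urllib.parse import urlsplit


@dataclass
class HostSpec:
    name: str            # the name csync2 uses for the peer in its config
    transport: str       # "tcp" (current behaviour) or "ssh" (proposed)
    address: str         # what to actually connect to
    port: Optional[int]  # only set if the URL carries one
    nossl: bool          # True when SSL must be skipped


def parse_host_spec(spec: str) -> HostSpec:
    """Parse 'name', 'name@address' or 'name@ssh://address[:port]'."""
    name, sep, target = spec.partition("@")
    if not sep:
        # Plain host name: nothing changes, SSL is left to the config.
        return HostSpec(name, "tcp", name, None, False)
    if "://" not in target:
        # Existing name@address form without a URL prefix.
        return HostSpec(name, "tcp", target, None, False)
    url = urlsplit(target)
    if url.scheme != "ssh" or not url.hostname:
        raise ValueError(f"unsupported host URL: {target!r}")
    # SSH already handles authentication and encryption, so imply nossl.
    return HostSpec(name, "ssh", url.hostname, url.port, True)
```

For example, `parse_host_spec("hub@ssh://hub.example.net:2222")` would yield the `ssh` transport with port 2222 and `nossl` set, while a bare `parse_host_spec("node1")` is left on the current transport. Reserving the URL form this way keeps room for other URL components later without clashing with real hostnames.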
Lars, what do you think? Worth more investigation on my part?
(I'm not asking you to do work other than merge changes, and that
only if you agree. I'm just not promising to deliver before being certain
that this will work...)
Thanks...