[Csync2] SSH mode, push vs. pull

Tue Apr 9 14:38:34 CEST 2013

Hi Michael,
  I really like your proposal, and I'd like to share some thoughts of my own
that fit your ideas.  I'm just putting them on the table, not criticising
yours.

  I have a similar but simpler use case here, and a few thoughts on the two
main limitations.  The first, as you mentioned, concerns connectivity; the
second, as you also pointed out, is host identification.  I have not used
csync2 enough to know whether any other limitations exist (i.e. concerning
filesystem, data, etc.)...  I assume not!  ;)

  My use case is simple:  I have a number of machines on my LAN and one
server (VPS) online.  The only way my VPS can communicate with any node
inside the network is through a number of port forwardings on my router.
This means that every node on the csync2 network, except for the one online,
is reached via two different addresses: one from inside the LAN and one from
outside.  The alternative would be some more universal black-magic network
trickery, such as making every node think it is outside.

  Note that I use two similar words with very different meanings:  A host is a
physical machine having one address, on one network.  A node is a csync2
virtual entity that corresponds in most cases to a host.  A host could be
the home of several nodes (think of groups) and hosts may be identical in
many ways (clusters).  A node is strictly always unique, and cloning a node
should be considered (in my view) most "illegal".
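
To make the host/node distinction concrete, here is a minimal Python sketch.
It is only an illustration: the names Host, Node and NodeRegistry are
hypothetical and do not correspond to anything in csync2 itself.  It shows
several nodes living on one host, and a registry that refuses a cloned node.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Host:
    """A physical machine: one address, on one network."""
    address: str
    network: str


@dataclass
class Node:
    """A sync-level entity; usually, but not necessarily, one per host."""
    node_id: str
    host: Host


class NodeRegistry:
    """Hypothetical registry that enforces node uniqueness."""

    def __init__(self):
        self._nodes = {}

    def add(self, node):
        # A node must be unique: a second node with the same id is a clone.
        if node.node_id in self._nodes:
            raise ValueError(f"duplicate node id: {node.node_id}")
        self._nodes[node.node_id] = node

    def nodes_on(self, host):
        # One host may be home to several nodes.
        return [n for n in self._nodes.values() if n.host == host]
```

With this model, adding "alpha" and "beta" on the same Host is fine, while
adding a second "alpha" anywhere raises ValueError.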

  It is my opinion that the means to connect to the node are irrelevant so
long as a connection with the desired node is established.  I like the idea
of using URLs to specify protocol, host address and port, but I would also
like the addition of aliases (think of a laptop being at home, at work or at
a client's facilities: same node, different connections, so it appears as a
different host).  The rest of the URL could be used for identification of
the desired "resource" or the remote node (i.e. node A connects physically
to node B with "intent" to contact node C or to contact the ABC group).
Also, I believe this may complicate things quite a lot more, but I see
some good utility in being able to create a network of nodes not limited to
one-on-one connections, as long as each node is connected to at least one
node which is itself connected to the rest of the network (a node isolated
with one dialup connection would never finish even "connecting" to a large
group).  The way I see it is to allow and accept "chaos" in the
connectivity: allowing different routes, addresses, allowing the remote end
to "prefer" or suggest another route (think dialup here again).  But this
chaotic freedom is nothing different from how the internet works:  If node
A cannot establish a direct connection to D, it will ask B and C for it.
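
A rough Python sketch of these two ideas follows.  It assumes a made-up URL
layout (scheme://host:port/target) and a plain dictionary of who can reach
whom; neither is an existing csync2 feature, and the function names are
hypothetical.  parse_alias splits one connection alias into its parts, and
find_route does a breadth-first search for a chain of relays when no direct
connection exists.

```python
from collections import deque
from urllib.parse import urlsplit


def parse_alias(url):
    """Split a connection URL into (scheme, host, port, target).

    The path part is treated as the node or group the caller intends to
    reach; that interpretation is an assumption of this sketch.
    """
    parts = urlsplit(url)
    return parts.scheme, parts.hostname, parts.port, parts.path.lstrip("/")


def find_route(links, start, goal):
    """Return the shortest list of nodes leading from start to goal.

    links maps a node name to the nodes it can connect to directly.
    Returns None when the goal cannot be reached at all.
    """
    queue = deque([[start]])
    seen = {start}
    while queue:
        path = queue.popleft()
        if path[-1] == goal:
            return path
        for nxt in links.get(path[-1], ()):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(path + [nxt])
    return None
```

For example, with links = {"A": ["B", "C"], "B": ["D"], "C": ["D"]}, node A
has no direct link to D, and find_route(links, "A", "D") goes through B.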

  Next, I find the current way of identifying nodes both tedious to maintain
and somewhat insecure.  Lack of security here is not in the sense of
exploits, but more in terms of accident prevention...  I do not
think that an IP address or a hostname is sufficient to identify a node.
They may change voluntarily (update of the network, topology, laptop
location, etc.) or accidentally (as is the case for my home connection with
my ISP), and controlling for these changes is part of what I consider
"tedious".  What would be most useful is to establish a node's identity via
the use of public and private SSL keys generated at the moment a new node
is added to the group.  The public keys could be signed by the
administrator's authoritative key.  These keys would be unique for each
machine (virtual or physical) regardless of whether the hostname, IP or anything
else is the same (if you are maintaining mirrors, then you would just need
those keys to remain different).  If two nodes were found with the
same key, that key should be blocked (do not tolerate ambiguity in
identification).  The keys would be used to sign or decrypt some handshake
token, and the local host would then verify that the remote host signed the token
with the private key corresponding to that node's public key (an md5 of the key could be
put in place of the node's name, along with a human-readable alias).  This
is what would allow a local host to connect to any remote host it can reach,
regardless of "who" it is.  Once a connection is established, identification
follows and data exchange may commence.  This would separate networking
from identification totally.  It would then be easier to implement any kind
of networking means (such as proxy, tunneling, peer-to-peer, fifo-pipes,
offline-disk-based, etc).  And indeed, I would pile as much networking
flexibility as possible.
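
As a partial illustration only, the Python sketch below covers the
bookkeeping side of this idea: naming a node by a digest of its public key,
keeping a human-readable alias next to it, and blocking a key that shows up
twice.  The IdentityTable class is hypothetical, SHA-256 is used here where
the text above mentions md5, and the actual signed-token handshake is left
out because it would need a real cryptography library.

```python
import hashlib


def fingerprint(public_key_bytes):
    """Digest of the public key, used in place of a hostname."""
    return hashlib.sha256(public_key_bytes).hexdigest()


class IdentityTable:
    """Hypothetical table mapping key fingerprints to node aliases."""

    def __init__(self):
        self._alias_by_fp = {}
        self._blocked = set()

    def enroll(self, alias, public_key_bytes):
        """Register a node; return its fingerprint, or None if refused."""
        fp = fingerprint(public_key_bytes)
        if fp in self._blocked:
            return None
        if fp in self._alias_by_fp:
            # The same key was presented twice: the identity is ambiguous,
            # so drop the earlier entry and block the key from now on.
            del self._alias_by_fp[fp]
            self._blocked.add(fp)
            return None
        self._alias_by_fp[fp] = alias
        return fp

    def lookup(self, public_key_bytes):
        """Return the alias for a key, or None if unknown or blocked."""
        return self._alias_by_fp.get(fingerprint(public_key_bytes))
```

Note that in this sketch even re-enrolling the same node with the same key
counts as a duplicate; a real design would have to decide how key renewal
and re-registration are handled.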

  These are the thoughts I had while trying to use csync2.  I'm still using
unison for my "production" work though.  My ideas may not be realistic, but
remember that science fiction does indeed drive science, not the other way
around (and if I recall correctly, Isaac Asimov did invent the word "robotics")!
;)

Simon

On Mon, Apr 8, 2013 at 5:34 PM, Michael Johnson <mikjo.sas at gmail.com> wrote:

> I read some of the list archives discussing an SSH mode, for example:
> http://lists.linbit.com/pipermail/csync2/2012-September/000911.html
> I might be able to work on this if we can agree on desired features.
>
> Obviously, I'd particularly love feedback from Lars, as gatekeeper. :)
>
> I have a potential use case where I have constraints that I don't control.
> These constraints would get in the way of using csync2 as it stands, but
> so far it seems to me that csync2 is the closest match for what I need;
> it implements the core functionality and connectivity is the only real
> issue. I'll describe the constraints first, and then suggest what looks
> to me like a solution.
>
> I have a set of systems that are geographically distributed and generally
> network-isolated, containing millions of files in several tens of thousands
> (perhaps growing to hundreds of thousands) of directories that I need to
> keep in sync, including honoring deletes, across all the systems.
>
> The content needs to be eventually consistent, but temporary
> inconsistency is OK.  I have a central location from which I can SSH
> out to the distributed nodes, but not SSH in from the distributed nodes
> to the central location; it is a hub-and-spoke architecture. I need
> changes to flow into the hub from the distributed nodes, and out of the
> hub to the distributed nodes. (Fortunately, "younger" conflict resolution
> is always appropriate in my use case.)
>
> Furthermore, the distributed nodes cannot SSH to each other.  All the
> SSH connections go through NAT in pools, so $SSH_CLIENT is neither
> reliable (recognizable from the initiator's standpoint) nor stable (it
> might change from invocation to invocation). I can't open up new ports.
>
> Finally, because these systems are parts of precisely replicated
> cluster deployments, each system actually has the same
> hostname (they need to appear identical internally no matter
> which geographically distributed cluster a user is assigned to).
>
> (My current thought is to use lsyncd to add entries to the hint table on
> each system, but I'm still investigating that.)
>
> If it weren't for the incompatibility between the system constraints and
> csync2's policy and transport implementation, I would implement this by
> having each distributed node have a configuration that lists only the hub
> node and itself as hosts, and having the hub node list all the nodes, then
> from the hub, orchestrate invoking csync2 alternately on each of the
> distributed nodes in turn, and on the hub node.
>
> Neither the transport limitations (bidirectional connectivity) nor the
> policy limitations (e.g. reverse lookups required) seem central to
> csync2's design, though.
>
>
> What I'd be interested in is:
>
> * The ability to specify hosts using syntax like:
>   host host at ssh://hostname
>   It's not clear to me how much of the URL syntax to honor.  But at
>   least using the URL-style prefix would be unlikely to clash with
>   real hostnames. Saying that it will use URL syntax at least
>   reserves the space to add things like port specification via URL
>   syntax at some point if desired.
>
> * I think I understand from the previous list discussion that SSH should
>   not do SSL because SSH will have been configured to do whatever
>   authentication and encryption desired. That's clearly what I would
>   want. That is, SSH should imply nossl always.
>
> * Two ssh modes, separating "which end is pushing changes" from
>   "which node initiated the network connection"
>
>   - One mode in which it invokes csync2 -i on the remote system, where
>     it acts very much like the current network port access.
>
>   - Another mode ("pull mode" or "reverse mode" though I don't know what
>     command line switch to use for it) that invokes csync2 on the remote
>     with the same command line options (except the pull mode switch itself)
>     and runs MODE_INETD locally, so csync2 functions as if the remote
>     end had initiated the connection.
>
> * An option to tell the server what the client wants the server to
>   consider its hostname to be. I would see this as a new protocol
>   command HOSTNAME that overrides myhostname, turned on
>   by a new configuration item, and off by default. (This would solve
>   both my "identical hostname" and "can't use $SSH_CLIENT"
>   issues.)
>
> Lars, what do you think?  Worth more investigation on my part?
>
> (I'm not asking you to do work other than merge changes, and that
> only if you agree. I'm just not promising to deliver before being certain
> that this will work...)
>
> Thanks...
> _______________________________________________
> Csync2 mailing list
> Csync2 at lists.linbit.com
> http://lists.linbit.com/mailman/listinfo/csync2
>