<div dir="ltr"><div>Hi Michael,<br></div><div>  I really like your proposal, and I&#39;d like to share some thoughts I&#39;ve had that fit with your ideas.  I&#39;m just putting them on the table, not criticising yours.</div><div>

<br></div><div>  I have a similar but simpler use case here and a few thoughts on the two main limitations.  The first, as you mentioned, concerns connectivity; the second, as you also pointed out, is host identification.  I have not used csync2 enough to know whether any other limitations exist (i.e. concerning filesystem, data, etc.)...  I assume not!  ;)</div>

<div><br></div><div>  My use case is simple:  I have a number of machines on my LAN and one server (a VPS) online.  The only way my VPS can communicate with any node inside the network is through a number of port forwardings on my router.  This means that every node on the csync2 network, except for the one online, is reached via two different addresses: one from inside the LAN and one from outside.  The alternative is some more universal black-magic network trickery, such as making every node think it is outside.</div>

<div><br></div><div>  Note that I use two similar words with very different meanings:  A host is a physical machine having one address, on one network.  A node is a csync2 virtual entity that corresponds in most cases to a host.  A host could be the home of several nodes (think of groups) and hosts may be identical in many ways (clusters).  A node is always strictly unique, and cloning a node should be considered (in my view) most &quot;illegal&quot;.</div>

<div><br></div><div>  It is my opinion that the means of connecting to a node are irrelevant as long as a connection with the desired node is established.  I like the idea of using URLs to specify protocol, host address and port, but I would also like the addition of aliases (think of a laptop being at home, at work or at a client&#39;s facilities: same node, different connections, so it appears as a different host).  The rest of the URL could be used to identify the desired &quot;resource&quot; or the remote node (i.e. node A connects physically to node B with the &quot;intent&quot; to contact node C or the ABC group).  Also, although I believe this may complicate things quite a lot, I see good utility in being able to create a network of nodes not limited to one-on-one connections, as long as each node is connected to at least one node which is itself connected to the rest of the network (a node isolated behind one dialup connection would never even finish &quot;connecting&quot; to a large group).  The way I see it, we should allow and accept &quot;chaos&quot; in the connectivity: allowing different routes and addresses, and allowing the remote end to &quot;prefer&quot; or suggest another route (think dialup here again).  But this chaotic freedom is no different from how the internet works:  if node A cannot establish a direct connection to D, it will ask B and C for it.</div>
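<div><br></div><div>  To make the alias idea a bit more concrete, here is a rough Python sketch of how several URLs could map onto one node.  It assumes a spec of the form node@scheme://host[:port][/target], loosely modelled on the syntax in your proposal; the names Endpoint, Node, parse_host_spec, load_nodes and connect are made up for illustration and are not part of csync2.</div>

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, Iterable, List, Optional, Tuple
from urllib.parse import urlsplit


@dataclass
class Endpoint:
    """One way of reaching a node (all field names are hypothetical)."""
    scheme: str            # transport, e.g. "ssh"
    host: str              # address as seen from the connecting side
    port: Optional[int]    # None means "use the transport's default"
    target: str            # rest of the URL: intended node or group, may be empty


@dataclass
class Node:
    """A unique csync2-style node that may be reachable via several endpoints."""
    name: str
    endpoints: List[Endpoint] = field(default_factory=list)


def parse_host_spec(spec: str) -> Tuple[str, Endpoint]:
    """Split 'node@scheme://host[:port][/target]' into (node name, endpoint)."""
    name, sep, url = spec.partition("@")
    if not sep or not name:
        raise ValueError(f"expected 'node@url', got {spec!r}")
    parts = urlsplit(url)
    if not parts.scheme or not parts.hostname:
        raise ValueError(f"not a usable URL: {url!r}")
    return name, Endpoint(parts.scheme, parts.hostname, parts.port,
                          parts.path.lstrip("/"))


def load_nodes(specs: Iterable[str]) -> Dict[str, Node]:
    """Group specs by node name, so repeated names become aliases of one node."""
    nodes: Dict[str, Node] = {}
    for spec in specs:
        name, endpoint = parse_host_spec(spec)
        nodes.setdefault(name, Node(name)).endpoints.append(endpoint)
    return nodes


def connect(node: Node,
            try_endpoint: Callable[[Endpoint], bool]) -> Optional[Endpoint]:
    """Try each alias in order; return the first that works, or None."""
    for endpoint in node.endpoints:
        if try_endpoint(endpoint):
            return endpoint
    return None
```

<div>  With this, a laptop listed once with its LAN address and once with the router&#39;s forwarded port stays a single node, and the caller simply takes whichever endpoint answers first.  Relaying through a third node is not covered here.</div>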

<div><br></div><div>  Next, I find the way nodes are identified both tedious to maintain and somewhat insecure.  The lack of security here is not in the sense of exploits, but more in terms of accident prevention...  I do not think that an IP address or a hostname is sufficient to identify a node.  They may change voluntarily (update of the network, topology, laptop location, etc.) or accidentally (as is the case for my home connection with my ISP), and controlling for these changes is part of what I consider &quot;tedious&quot;.  What would be most useful is to establish a node&#39;s identity via public and private SSL keys generated at the moment a new node is added to the group.  The public keys could be signed by the administrator&#39;s authoritative key.  These keys would be unique for each machine (virtual or physical) regardless of whether the hostname, IP or anything else is the same (if you are maintaining mirrors, then you would just need those keys to remain different).  If two nodes were found with the same key, that key should be blocked (do not tolerate ambiguity in identification).  The keys would be used to sign or decrypt some handshake token, and the local host would then verify that the remote host signed the token with the private key corresponding to that node&#39;s public key (an md5 of the key could be put in place of the node&#39;s name along with a human-readable alias).  This is what would allow a local host to connect to any remote host it can reach, regardless of &quot;who&quot; they are.  Once the connection is established, identification follows and data exchange may commence.  This would separate networking from identification totally.  It would then be easier to implement any kind of networking means (such as proxy, tunneling, peer-to-peer, fifo-pipes, offline-disk-based, etc.).  And indeed, I would pile on as much networking flexibility as possible.</div>
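<div><br></div><div>  As a rough illustration of identifying nodes by key rather than by hostname, here is a small Python sketch.  It only covers the bookkeeping part: deriving a short identifier from the public key and blocking a key that turns up under two different aliases.  It uses SHA-256 where I mentioned md5 above, it does not verify any signature (a real handshake would still have to check a signed token), and the names fingerprint and NodeRegistry are invented for the example.</div>

```python
import hashlib
from typing import Dict, Optional, Set


def fingerprint(public_key: bytes) -> str:
    """Short identifier derived from the raw public key bytes (SHA-256 here)."""
    return hashlib.sha256(public_key).hexdigest()[:16]


class NodeRegistry:
    """Maps key fingerprints to human-readable aliases (hypothetical helper)."""

    def __init__(self) -> None:
        self.alias_by_fp: Dict[str, str] = {}
        self.blocked: Set[str] = set()

    def enroll(self, alias: str, public_key: bytes) -> str:
        """Register a node; a key seen under two aliases gets blocked."""
        fp = fingerprint(public_key)
        if fp in self.blocked:
            raise ValueError(f"key {fp} is blocked")
        known = self.alias_by_fp.get(fp)
        if known is not None and known != alias:
            # Same key claimed by two nodes: do not tolerate the ambiguity.
            del self.alias_by_fp[fp]
            self.blocked.add(fp)
            raise ValueError(f"key {fp} claimed by {known!r} and {alias!r}")
        self.alias_by_fp[fp] = alias
        return fp

    def identify(self, public_key: bytes) -> Optional[str]:
        """Return the alias for a key, or None if unknown or blocked."""
        fp = fingerprint(public_key)
        if fp in self.blocked:
            return None
        return self.alias_by_fp.get(fp)
```

<div>  Nothing in this lookup depends on the hostname or IP address, so two mirrors with identical hostnames remain distinct nodes as long as their keys differ.</div>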

<div><br></div><div>  These are the thoughts I had while trying to use csync2.  I&#39;m still using unison for my &quot;production&quot; work though.  My ideas may not be realistic, but remember that science fiction does indeed drive science, not the other way around (and if I recall right, Isaac Asimov did invent the word &quot;robotics&quot;)!  ;)</div>

<div><br></div><div>Simon</div></div><div class="gmail_extra"><br><br><div class="gmail_quote">On Mon, Apr 8, 2013 at 5:34 PM, Michael Johnson <span dir="ltr">&lt;<a href="mailto:mikjo.sas@gmail.com" target="_blank">mikjo.sas@gmail.com</a>&gt;</span> wrote:<br>

<blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">I read some of the list archives discussing an SSH mode, for example:<br>

<a href="http://lists.linbit.com/pipermail/csync2/2012-September/000911.html" target="_blank">http://lists.linbit.com/pipermail/csync2/2012-September/000911.html</a><br>

I might be able to work on this if we can agree on desired features.<br>

<br>

Obviously, I&#39;d particularly love feedback from Lars, as gatekeeper. :)<br>

<br>

I have a potential use case where I have constraints that I don&#39;t control.<br>

These constraints would get in the way of using csync2 as it stands, but<br>

so far it seems to me that csync2 is the closest match for what I need;<br>

it implements the core functionality and connectivity is the only real<br>

issue. I&#39;ll describe the constraints first, and then suggest what looks<br>

to me like a solution.<br>

<br>

I have a set of systems that are geographically distributed and generally<br>

network-isolated, containing millions of files in several tens of thousands<br>

(perhaps growing to hundreds of thousands) of directories that I need to<br>

keep in sync, including honoring deletes, across all the systems.<br>

<br>

The content needs to be eventually consistent, but temporary<br>

inconsistency is OK.  I have a central location from which I can SSH<br>

out to the distributed nodes, but not SSH in from the distributed nodes<br>

to the central location; it is a hub-and-spoke architecture. I need<br>

changes to flow into the hub from the distributed nodes, and out of the<br>

hub to the distributed nodes. (Fortunately, &quot;younger&quot; conflict resolution<br>

is always appropriate in my use case.)<br>

<br>

Furthermore, the distributed nodes cannot SSH to each other.  All the<br>

SSH connections go through NAT in pools, so $SSH_CLIENT is neither<br>

reliable (recognizable from the initiator&#39;s standpoint) nor stable (it<br>

might change from invocation to invocation). I can&#39;t open up new ports.<br>

<br>

Finally, because these systems are parts of precisely replicated<br>

cluster deployments, each system actually has the same<br>

hostname (they need to appear identical internally no matter<br>

which geographically distributed cluster a user is assigned to).<br>

<br>

(My current thought is to use lsyncd to add entries to the hint table on<br>

each system, but I&#39;m still investigating that.)<br>

<br>

If it weren&#39;t for the incompatibility between the system constraints and<br>

csync2&#39;s policy and transport implementation, I would implement this by<br>

having each distributed node have a configuration that lists only the hub<br>

node and itself as hosts, and having the hub node list all the nodes, then<br>

from the hub, orchestrate invoking csync2 alternately on each of the<br>

distributed nodes in turn, and on the hub node.<br>

<br>

Neither the transport limitations (bidirectional connectivity) nor the policy<br>

limitations (e.g. reverse lookups required) seem central to csync2&#39;s design,<br>

though.<br>

<br>

<br>

What I&#39;d be interested in is:<br>

<br>

* The ability to specify hosts using syntax like:<br>

  host host@ssh://hostname<br>

  It&#39;s not clear to me how much of the URL syntax to honor.  But at<br>

  least using the URL-style prefix would be unlikely to clash with<br>

  real hostnames. Saying that it will use URL syntax at least<br>

  reserves the space to add things like port specification via URL<br>

  syntax at some point if desired.<br>

<br>

* I think I understand from the previous list discussion that SSH should<br>

  not do SSL because SSH will have been configured to do whatever<br>

  authentication and encryption desired. That&#39;s clearly what I would<br>

  want. That is, SSH should imply nossl always.<br>

<br>

* Two ssh modes, separating &quot;which end is pushing changes&quot; from<br>

  &quot;which node initiated the network connection&quot;<br>

<br>

  - One mode in which it invokes csync2 -i on the remote system, where<br>

    it acts very much like the current network port access.<br>

<br>

  - Another mode (&quot;pull mode&quot; or &quot;reverse mode&quot; though I don&#39;t know what<br>

    command line switch to use for it) that invokes csync2 on the remote<br>

    with the same command line options (except the pull mode switch itself)<br>

    and runs MODE_INETD locally, so csync2 functions as if the remote<br>

    end had initiated the connection.<br>

<br>

* An option to tell the server what the client wants the server to<br>

  consider its hostname to be. I would see this as a new protocol<br>

  command HOSTNAME that overrides myhostname, turned on<br>

  by a new configuration item, and off by default. (This would solve<br>

  both my &quot;identical hostname&quot; and &quot;can&#39;t use $SSH_CLIENT&quot;<br>

  issues.)<br>

<br>

Lars, what do you think?  Worth more investigation on my part?<br>

<br>

(I&#39;m not asking you to do work other than merge changes, and that<br>

only if you agree. I&#39;m just not promising to deliver before being certain<br>

that this will work...)<br>

<br>

Thanks...<br>


</blockquote></div><br></div>