[Csync2] csync2 and large datasets
Art -kwaak- van Breemen
ard+csync2 at telegraafnet.nl
Tue Oct 21 19:50:23 CEST 2008
For other devs who might want to get their hands dirty:
csync2 and large datasets are really a pain...
The biggest problem is not the sqlite database itself, but the
design of csync2, which wants to do everything in one big go.
Even with csync2id, you might resync your entire dataset once in
a while. Doing so is usually a pain for the server, and an even
bigger pain for the administrator.
So I wanted it done differently (1):
Say we start off with a csync2 -hr /, which should result in a
csync2 -cr /.
But instead of going depth-first and checking everything that's
there, I suggest checking *only* the files in the current
directory, and inserting a new recursive hint for each directory
found there.
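A minimal Python sketch of approach (1), for illustration only: csync2 itself is C, and this does not touch its real hint table. The hypothetical run_hints() keeps a queue of pending recursive hints, checks only the files directly inside each hinted directory, and turns every subdirectory into a new hint. The check_file callback and the max_hints limit are assumptions standing in for csync2's actual check and for a batch size.

```python
import os
import tempfile
from collections import deque


def expand_hint(directory, hints, check_file):
    """Handle one recursive hint without descending into subdirectories."""
    with os.scandir(directory) as entries:
        for entry in sorted(entries, key=lambda e: e.name):
            if entry.is_dir(follow_symlinks=False):
                hints.append(entry.path)  # new recursive hint, handled later
            else:
                check_file(entry.path)    # only this directory's own files


def run_hints(root, check_file, max_hints=None):
    """Drain hints breadth-first, stopping after max_hints directories.

    Returns the hints that are still pending, for a later run.
    """
    hints = deque([root])
    processed = 0
    while hints and (max_hints is None or processed < max_hints):
        expand_hint(hints.popleft(), hints, check_file)
        processed += 1
    return list(hints)


def demo():
    """Build a small tree in a temporary directory and run two hints on it."""
    with tempfile.TemporaryDirectory() as root:
        os.makedirs(os.path.join(root, "a", "deep"))
        os.makedirs(os.path.join(root, "b"))
        open(os.path.join(root, "top.txt"), "w").close()
        open(os.path.join(root, "a", "inner.txt"), "w").close()
        checked = []
        pending = run_hints(
            root, lambda p: checked.append(os.path.relpath(p, root)), max_hints=2
        )
        return checked, [os.path.relpath(p, root) for p in pending]
```

On a POSIX system demo() returns (['top.txt', 'a/inner.txt'], ['b', 'a/deep']): two directories were handled, and the two hints still pending are what a later run would pick up, so one huge tree never has to be walked in a single go.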
Handling removals in this setup is a pain, since you must check
the files in one specific directory while the database is ordered
depth-first. Still, putting limits on the queries and aborting or
resuming them should keep it fast.
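To see why removals are awkward, here is a Python sketch that pages through the database entries of one directory with LIMIT queries. It assumes a table file(filename TEXT) of absolute paths, loosely modeled on "select * from file"; the real csync2 schema may differ. Because paths sort depth-first, the direct children of /d are interleaved with everything below /d/a and so on, so the sketch has to skip deeper rows while it pages. removed_in_directory() and its exists parameter are hypothetical.

```python
import os
import sqlite3


def removed_in_directory(db: sqlite3.Connection, directory, page_size=100,
                         exists=os.path.lexists):
    """Yield entries directly inside `directory` that are gone from disk.

    Assumes a table file(filename TEXT) of absolute paths. Rows come back in
    filename order, page_size at a time, each page resuming after the last
    row of the previous one.
    """
    prefix = directory.rstrip("/") + "/"
    upper = prefix[:-1] + "0"  # '0' sorts right after '/': end of the prefix range
    last = prefix
    while True:
        rows = db.execute(
            "SELECT filename FROM file WHERE filename > ? AND filename < ? "
            "ORDER BY filename LIMIT ?",
            (last, upper, page_size),
        ).fetchall()
        if not rows:
            return
        for (name,) in rows:
            if "/" in name[len(prefix):]:
                continue  # deeper entry, belongs to a subdirectory's own pass
            if not exists(name):
                yield name
        last = rows[-1][0]
```

Paging on filename > last instead of an OFFSET lets a run be aborted after any page and resumed later from the last name seen.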
So I wanted it done differently (2):
We already know that we want to check files on a regular basis,
just in case...
So we make an extra table that holds the status of a
maintenance check mode.
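The extra table for (2) could look like the following sketch. The table name maint_status, its columns and the helper functions are made up for illustration; the row is keyed on the group name because the state is meant to be kept per group. An empty string serves as the starting position because it sorts before every path.

```python
import sqlite3

# Hypothetical schema: one row of maintenance state per csync2 group.
STATUS_DDL = """
CREATE TABLE IF NOT EXISTS maint_status (
    groupname     TEXT PRIMARY KEY,
    last_filename TEXT NOT NULL DEFAULT ''
)
"""


def ensure_status_table(db: sqlite3.Connection) -> None:
    """Create the status table only if it does not exist yet."""
    db.execute(STATUS_DDL)


def load_position(db: sqlite3.Connection, group: str) -> str:
    """Return the last filename checked for `group`, or '' to start over."""
    row = db.execute(
        "SELECT last_filename FROM maint_status WHERE groupname = ?", (group,)
    ).fetchone()
    return row[0] if row else ""


def save_position(db: sqlite3.Connection, group: str, filename: str) -> None:
    """Remember where the current maintenance run stopped for `group`."""
    db.execute(
        "INSERT OR REPLACE INTO maint_status (groupname, last_filename) "
        "VALUES (?, ?)",
        (group, filename),
    )
    db.commit()
```

Since only a new table is created, existing databases keep working; this is what makes (2) cheap to add.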
In maintenance check mode we do a select * from file, limited to
MAXTESTS rows. For each directory entry we check every file in
that directory, but only for changes or new files.
Once we have done more than MAXTESTS checks this way (counting
both the rows from the select and the checks inside a directory),
we save our current position in the status table and quit.
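One bounded maintenance pass might look like this Python sketch. It again assumes a file(filename TEXT) table of absolute paths; MAXTESTS is the name used above but its value here is arbitrary, and maintenance_batch(), check_entry, is_dir and list_dir are hypothetical (the last two default to the os module so they can be swapped out for testing). The return value is the position to store in the status table, or None once the whole table has been walked.

```python
import os
import sqlite3

MAXTESTS = 1000  # per-run budget; the name is from the post, the value is arbitrary


def maintenance_batch(db: sqlite3.Connection, start_after, check_entry,
                      maxtests=MAXTESTS, is_dir=os.path.isdir,
                      list_dir=os.listdir):
    """Run one bounded maintenance pass over the file table.

    Returns the filename to store as the new position, or None when the end
    of the table was reached (the next run can start again from '').
    """
    rows = db.execute(
        "SELECT filename FROM file WHERE filename > ? ORDER BY filename LIMIT ?",
        (start_after, maxtests),
    ).fetchall()
    tests = 0
    for (name,) in rows:
        tests += 1
        check_entry(name)  # every known entry: changed or deleted?
        if is_dir(name):
            for child in sorted(list_dir(name)):
                tests += 1
                check_entry(os.path.join(name, child))  # changed or new files
        if tests >= maxtests:
            return name  # budget used up: save this position and quit
    return None  # fewer rows than the budget: the table is done
```

A file inside a directory may be checked twice here, once as a child of its directory and once as its own row; the sketch makes no attempt to avoid that.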
If we find new directories inside a directory, we insert them
and refresh our list of entries to check.
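Inserting newly found directories can be sketched like this, with the same assumed file(filename TEXT) table and a hypothetical insert_new_subdirs() helper. The refresh falls out of the sort order: a new /d/new sorts after /d, so re-running a select with filename > the current position and ORDER BY filename picks it up without extra bookkeeping.

```python
import os
import sqlite3


def insert_new_subdirs(db: sqlite3.Connection, directory,
                       list_dir=os.listdir, is_dir=os.path.isdir):
    """Insert subdirectories of `directory` that the file table lacks.

    Returns the newly inserted paths. list_dir and is_dir default to the
    os module and can be replaced for testing.
    """
    added = []
    for child in sorted(list_dir(directory)):
        path = os.path.join(directory, child)
        if not is_dir(path):
            continue  # plain files are left to the normal check
        known = db.execute(
            "SELECT 1 FROM file WHERE filename = ?", (path,)
        ).fetchone()
        if known is None:
            db.execute("INSERT INTO file (filename) VALUES (?)", (path,))
            added.append(path)
    return added
```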
Deletions are handled because every item in the database gets
checked.
The next run does the same, starting from the entry after the
last one checked.
This maintenance status should be kept per csync2 group.
(1) is really useful, since you can give priority to whatever
you like, and every batch has a limited run length.
(2) is "easy" to do, since it touches nothing except database
creation.
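The preference part of (1) could be as simple as a priority queue of hints drained a fixed number of entries at a time. This HintQueue class is a hypothetical Python illustration built on heapq, not anything csync2 provides; higher priority numbers are served first, and equal priorities keep insertion order.

```python
import heapq
import itertools


class HintQueue:
    """Pending recursive hints, highest priority first (hypothetical)."""

    def __init__(self):
        self._heap = []
        self._order = itertools.count()  # FIFO tie-break among equal priorities

    def add(self, path, priority=0):
        # heapq is a min-heap, so the priority is negated.
        heapq.heappush(self._heap, (-priority, next(self._order), path))

    def take_batch(self, limit):
        """Pop at most `limit` hints, so every batch has a bounded length."""
        batch = []
        while self._heap and len(batch) < limit:
            batch.append(heapq.heappop(self._heap)[2])
        return batch

    def __len__(self):
        return len(self._heap)
```

For example, after add("/var/log"), add("/etc", priority=10) and add("/home"), take_batch(2) returns ["/etc", "/var/log"] and leaves "/home" for the next batch.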
--
A: Because it messes up the order in which people normally read text.
Q: Why is top-posting such a bad thing?
A: Top-posting.
Q: What is the most annoying thing in email?