[DRBD-user] [OT] rsync issues [Was Re: Read performance?]

Greg Freemyer greg.freemyer at gmail.com
Wed May 30 19:29:50 CEST 2007


On 5/30/07, David Masover <ninja at slaphack.com> wrote:
> On Wednesday 30 May 2007 10:04:07 Greg Freemyer wrote:
> > > Failing any of that, does anyone know a simple filesystem which can do
> > > what drbd does (raid1 over a network), but at the FS level? If not, I'm
> > > thinking of writing one using FUSE or something...
> >
> > If you don't need realtime, rdiff-backup is designed to do nightly
> > backups of large data sets across a WAN.  It keeps a full copy of your
> > data offsite, plus deltas to get you back to older versions.
> That's essentially what I'm doing. I'm using something called BackupPC, which
> has a number of methods for backups -- right now I'm using rsync for network
> and tar for local, although it does have support for Samba and other
> oddities. It compresses the files (using bzip2 right now, because the CPU is
> not even close to being the bottleneck, but it can do gzip, probably even
> lzo) and pools them -- that is, it keeps exactly one copy of each (identical)
> file (using a checksum), so it's kind of like having a delta, and really
> about as efficient for lots of small, binary-ish files. (This is mostly
> people's office documents -- Word and such -- so that makes sense.)
> We backup everything to one onsite server, which is nice, because it means
> that with BackupPC's web interface, if people do something stupid and delete
> a file, they can go browse the backup archives and restore it with no help
> from me.
> The first thing we tried was BackupPC (well, first, my own homebrew system,
> and then BackupPC when we found it was mostly the same thing), and a small
> rsync script to backup exactly one copy of the BackupPC archive offsite. The
> theory is, if the backup box in the office dies, we can always physically
> carry in the offsite backup box for awhile, and then build another offsite
> box.
> Except that rsync doesn't do so well with large numbers of small files. It's
> been awhile since we tried this, but basically, it built a file list at the
> beginning, and ate up more and more RAM as it tried to do that, and we were
> actually running out of RAM and swap when we left it running for a few days.
> Since we didn't really need realtime, one thing we talked about was rsync'ing
> the partition. But even though rsync can sync a file in-place, it refuses to
> operate on a block device, and this appears to be hard-coded somewhere.
> I'm guessing one alternative is to use DRBD that way -- to have the backup
> script disconnect DRBD before each backup, and reconnect after. The problem
> with that, as with the rsync-ing-the-whole-block-device approach, is what if
> the building blows up during this process? If that happens, we've now got a
> half-sync'd and completely useless offsite backup. (This probably also
> applies to rsync'ing the whole backup repository.)
> So, DRBD in synchronous mode is about the only thing I can think of, other
> than a few things like Lustre and Coda which have never worked for me (and
> Lustre is commercial and 2.4-only anyway).
> > I'm currently backing up a 70GB dataset each night.  On days with no
> > activity it is taking less than an hour and I have a convoluted setup.
> >  (I first backup locally to encfs encrypted fs, then rsync the whole
> > backup repository to an offsite location.)
> Either things have changed, or you don't have nearly as much stuff to backup
> as we do. Probably the biggest worry is the mailserver -- easily tens or
> hundreds of thousands of emails, and each is its own file.
David,  Hopefully someone else will help with your original drbd
question.  I changed the subject to rsync issues.

Last night I did 3 rsyncs to my offsite location.  From the one with
the most files (users' home dirs):

Number of files: 283194
Number of files transferred: 28
Total file size: 10.51G bytes
Total transferred file size: 69.30M bytes
Literal data: 16.06M bytes
Matched data: 53.25M bytes
File list size: 23800093
File list generation time: 140.253 seconds
File list transfer time: 0.000 seconds
Total bytes sent: 39.89M
Total bytes received: 73.48K

sent 39.89M bytes  received 73.48K bytes  49.74K bytes/sec
total size is 10.51G  speedup is 263.08

You can see it took 2 min 20 sec to create the file list of 283,000
files.  You must have many millions of files if you're running out of
RAM on that step.

I don't think rdiff-backup builds a full file list up front, so you
may want to give it a shot as your offsite tool directly.  My
rdiff-backup run from last night (the one that produced the data
rsync'd above) took only 1 min 34 sec, but it had far fewer files to
verify because of the delta files in the repository.
Backup completed successfully
       Direction: /home to /backup/home-rdiff
       Backup started: Wed May 30 01:16:18 2007
       Source files: 55435
       Source files size: 6747905761 (6.28 GB)
       New files size: 0 (0 bytes)
       Deleted files size: 0 (0 bytes)
       Destination size change: 2.11 MB
       Errors: 0

FYI: All the above is with a several year old P4 2.8GHz with 1GB of RAM.

FYI2: In theory rsync can survive a crash in the middle if you have
the right parameters.  I use:
rsync -avh --stats --links --partial-dir=/remote_backup_transfer_dir \
      --timeout=1800 /local_backup_dir login@server:remote_backup_dir/

The partial-dir and the timeout took some tweaking for me to figure out.

The --partial-dir option tells rsync to leave failed transfers in the
transfer dir for use by a future rsync call.  The timeout had to be
long because I had some multi-gigabyte files fail in the middle, and
it was taking rsync a long time to restart their transfer on the next
invocation.  I assume it was running checksums to verify the partial
file it had from the previous run.

I then wrap my rsync in a retry loop (bash logic):
START=`date +%s`
MAX_TIME='14400'    # 4 hours

NOW=$START
while [ $((NOW - START)) -lt $MAX_TIME ]; do
        rsync -avh --stats --links \
                --partial-dir=/remote_backup_transfer_dir --timeout=1800 \
                /local_backup_dir login@server:remote_backup_dir/
        if [ $? != 30 ]; then   # rsync exit 30 = timeout in data send/receive
                break           # success or a non-retryable error: stop
        fi
        NOW=`date +%s`
done

I had to do this because we had a period of unreliable transfers a
couple of months ago.  Since then I have not seen the retry logic
used, but it's nice to know it is there.

Hope that helps
Greg Freemyer
The Norcross Group
Forensics for the 21st Century
