On 5/30/07, David Masover <ninja at slaphack.com> wrote:
> On Wednesday 30 May 2007 10:04:07 Greg Freemyer wrote:
>
> > > Failing any of that, does anyone know a simple filesystem which can do
> > > what drbd does (raid1 over a network), but at the FS level? If not, I'm
> > > thinking of writing one using FUSE or something...
> >
> > If you don't need realtime, rdiff-backup is designed to do nightly
> > backups of large data sets across a WAN. It keeps a full copy of your
> > data offsite, plus deltas to get you back to older versions.
>
> That's essentially what I'm doing. I'm using something called BackupPC,
> which has a number of methods for backups -- right now I'm using rsync for
> network and tar for local, although it does have support for Samba and
> other oddities. It compresses the files (using bzip2 right now, because
> the CPU is not even close to being the bottleneck, but it can do gzip,
> probably even lzo) and pools them -- that is, it keeps exactly one copy of
> each (identical) file (using a checksum), so it's kind of like having a
> delta, and really about as efficient for lots of small, binary-ish files.
> (This is mostly people's office documents -- Word and such -- so that
> makes sense.)
>
> We backup everything to one onsite server, which is nice, because it means
> that with BackupPC's web interface, if people do something stupid and
> delete a file, they can go browse the backup archives and restore it with
> no help from me.
>
> The first thing we tried was BackupPC (well, first, my own homebrew
> system, and then BackupPC when we found it was mostly the same thing), and
> a small rsync script to backup exactly one copy of the BackupPC archive
> offsite. The theory is, if the backup box in the office dies, we can
> always physically carry in the offsite backup box for awhile, and then
> build another offsite box.
>
> Except that rsync doesn't do so well with large numbers of small files.
> It's been awhile since we tried this, but basically, it built a file list
> at the beginning, and ate up more and more RAM as it tried to do that, and
> we were actually running out of RAM and swap when we left it running for a
> few days.
>
> Since we didn't really need realtime, one thing we talked about was
> rsync'ing the partition. But even though rsync can sync a file in-place,
> it refuses to operate on a block device, and this appears to be hard-coded
> somewhere.
>
> I'm guessing one alternative is to use DRBD that way -- to have the backup
> script disconnect DRBD before each backup, and reconnect after. The
> problem with that, as with the rsync-ing-the-whole-block-device approach,
> is what if the building blows up during this process? If that happens,
> we've now got a half-sync'd and completely useless offsite backup. (This
> probably also applies to rsync'ing the whole backup repository.)
>
> So, DRBD in synchronous mode is about the only thing I can think of, other
> than a few things like Lustre and Coda which have never worked for me (and
> Lustre is commercial and 2.4-only anyway).
>
> > I'm currently backing up a 70GB dataset each night. On days with no
> > activity it is taking less than an hour and I have a convoluted setup.
> > (I first backup locally to encfs encrypted fs, then rsync the whole
> > backup repository to an offsite location.)
>
> Either things have changed, or you don't have nearly as much stuff to
> backup as we do. Probably the biggest worry is the mailserver -- easily
> tens or hundreds of thousands of emails, and each is its own file.

David,

Hopefully someone else will help with your original drbd question. I
changed the subject to rsync issues.

===

Last night I did 3 rsyncs to my offsite location.
From the one with the most files (users' home dirs):

===
Number of files: 283194
Number of files transferred: 28
Total file size: 10.51G bytes
Total transferred file size: 69.30M bytes
Literal data: 16.06M bytes
Matched data: 53.25M bytes
File list size: 23800093
File list generation time: 140.253 seconds
File list transfer time: 0.000 seconds
Total bytes sent: 39.89M
Total bytes received: 73.48K

sent 39.89M bytes  received 73.48K bytes  49.74K bytes/sec
total size is 10.51G  speedup is 263.08
===

You can see it took 2 min 20 sec to create the file list of 283,000 files.
You must have many millions of files if you're running out of RAM on that
step.

I don't think rdiff-backup builds the whole list up front, so you may want
to give it a shot as your offsite tool directly. My rdiff-backup run from
last night that created the above only took 1 min 34 sec, but it had a lot
fewer files to verify because of the delta files in the repository.

===
Backup completed successfully
Direction: /home to /backup/home-rdiff
Backup started: Wed May 30 01:16:18 2007
Source files: 55435
Source files size: 6747905761 (6.28 GB)
New files size: 0 (0 bytes)
Deleted files size: 0 (0 bytes)
Destination size change: 2.11 MB
Errors: 0
===

FYI: All the above is with a several-year-old P4 2.8GHz with 1GB of RAM.

FYI2: In theory rsync can survive a crash in the middle if you have the
right parameters. I use:

  rsync -avh --stats --links --partial-dir=/remote_backup_transfer_dir \
      --timeout=1800 /local_backup_dir login at server:remote_backup_dir/

The --partial-dir and the --timeout took some tweaking for me to figure
out. --partial-dir says to leave failed transfers in the transfer dir for
use on a future rsync call. The timeout had to be long because I had some
multi-gigabyte files fail in the middle, and it was taking rsync a long
time to restart their transfer on the next invocation. I assume it was
running checksums to verify the partial file it had from the previous run.
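That checksum-the-partial-file idea can be sketched in plain bash. This is
only a conceptual illustration of resuming from a verified prefix -- it is
not rsync's actual delta algorithm, and the function name and file names
are made up for the demo:

```shell
#!/bin/bash
# Conceptual sketch: decide whether a partial transfer can be resumed by
# checksumming the prefix of the source that corresponds to the bytes
# already received. NOT rsync's real algorithm -- illustration only.
set -eu

resume_offset() {
    local src=$1 partial=$2
    local got
    got=$(wc -c < "$partial")           # bytes already transferred
    # compare checksums of the same-length prefix of both files
    if [ "$(head -c "$got" "$src" | md5sum)" = "$(md5sum < "$partial")" ]
    then
        echo "$got"                     # prefix matches: resume here
    else
        echo 0                          # mismatch: restart from scratch
    fi
}

# demo with throwaway files
tmp=$(mktemp -d)
printf 'abcdefghij' > "$tmp/src"
printf 'abcde'      > "$tmp/partial"    # first half made it across
resume_offset "$tmp/src" "$tmp/partial" # prints 5: resume after byte 5
printf 'XXXXX'      > "$tmp/partial"    # corrupted partial file
resume_offset "$tmp/src" "$tmp/partial" # prints 0: prefix differs, restart
rm -rf "$tmp"
```

A restart from offset 0 is exactly the slow case I was seeing on the
multi-gigabyte files, which is why the long --timeout mattered.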
I then wrap my rsync in a retry loop (bash logic):

  START=`date +%s`
  MAX_TIME='14400'   # 4 hours

  for (( DELTA=0 ; DELTA < MAX_TIME ; DELTA = NOW - START ))
  do
      rsync -avh --stats --links --partial-dir=/remote_backup_transfer_dir \
          --timeout=1800 /local_backup_dir login at server:remote_backup_dir/
      if [ $? != 30 ]
      then
          break
      fi
      NOW=`date +%s`
  done

(rsync exits with code 30 on a timeout in data send/receive, so anything
else -- success or a different failure -- stops the retries.)

I had to do this because we had a period of unreliable transfers a couple
of months ago. Since then I have not seen the retry logic used, but it's
nice to know it is there.

Hope that helps

Greg
--
Greg Freemyer
The Norcross Group
Forensics for the 21st Century