[DRBD-user] Read performance?

David Masover ninja at slaphack.com
Wed May 30 17:46:42 CEST 2007

Note: "permalinks" may not be as permanent as we would like,
direct links of old sources may well be a few messages off.


On Wednesday 30 May 2007 10:04:07 Greg Freemyer wrote:
> > Failing any of that, does anyone know a simple filesystem which can do
> > what drbd does (raid1 over a network), but at the FS level? If not, I'm
> > thinking of writing one using FUSE or something...
>
> If you don't need realtime, rdiff-backup is designed to do nightly
> backups of large data sets across a WAN.  It keeps a full copy of your
> data offsite, plus deltas to get you back to older versions.
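
(For anyone who hasn't used it, rdiff-backup's invocation is roughly the
following -- the hostname and paths here are invented, not Greg's actual
setup:)

    # Mirror a tree to a remote host; rdiff-backup keeps the newest copy
    # as plain files, plus reverse deltas to reach older versions.
    rdiff-backup /var/lib/data backup.example.com::/backups/data

    # Pull back a file as it existed three days ago.
    rdiff-backup -r 3D backup.example.com::/backups/data/report.doc report.doc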

That's essentially what I'm doing. I'm using something called BackupPC, which 
has a number of methods for backups -- right now I'm using rsync for network 
and tar for local, although it does have support for Samba and other 
oddities. It compresses the files (bzip2 right now, since the CPU is nowhere 
near being the bottleneck, though it can do gzip, probably even lzo) and pools 
them -- that is, it keeps exactly one copy of each identical file, matched by 
checksum. That's kind of like having deltas, and about as space-efficient for 
lots of small, binary-ish files. (This is mostly people's office documents -- 
Word and such -- so that makes sense.)
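
If it helps to picture the pooling, here's a toy sketch of the idea in shell. 
This is not BackupPC's actual code; the pool path and the choice of md5 are 
just placeholders:

    # Keep one pooled copy per unique file content, and hardlink every
    # duplicate to it.
    pool_file() {
        f="$1"
        sum=$(md5sum "$f" | awk '{print $1}')
        pooled="/var/lib/pool/$sum"
        if [ -e "$pooled" ]; then
            ln -f "$pooled" "$f"   # seen before: relink to the pooled copy
        else
            ln "$f" "$pooled"      # new content: this becomes the pooled copy
        fi
    }

(Assumes everything lives on one filesystem, since hardlinks can't cross 
filesystems -- which is also true of BackupPC's real pool.)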

We back up everything to one onsite server, which is nice, because it means 
that with BackupPC's web interface, if people do something stupid and delete 
a file, they can go browse the backup archives and restore it with no help 
from me.

The first thing we tried was BackupPC (well, first, my own homebrew system, 
and then BackupPC when we found it was mostly the same thing), and a small 
rsync script to back up exactly one copy of the BackupPC archive offsite. The 
theory is, if the backup box in the office dies, we can always physically 
carry in the offsite backup box for a while, and then build another offsite 
box.
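
The script itself was nothing fancy -- something on the order of this, with 
the hostname and paths made up here:

    # Offsite mirror of the BackupPC tree. -H matters: the pool is all
    # hardlinks, and without it the copy balloons to many times the size.
    # Tracking those hardlinks is also a big part of rsync's memory use.
    rsync -aH --delete /var/lib/backuppc/ offsite.example.com:/var/lib/backuppc/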

Except that rsync doesn't do so well with large numbers of small files. It's 
been a while since we tried this, but basically, it built a file list up 
front, eating more and more RAM as it went, and when we left it running for a 
few days it actually exhausted both RAM and swap.

Since we didn't really need realtime, one thing we talked about was rsync'ing 
the partition. But even though rsync can sync a file in-place, it refuses to 
operate on a block device, and this appears to be hard-coded somewhere.
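
The crude fallback would be to stream the whole device, something like this 
(device names assumed):

    # No delta transfer at all: push the entire partition, compressed,
    # on every run. Works where rsync refuses, but moves every block
    # whether it changed or not.
    dd if=/dev/sdb1 bs=1M | gzip -c \
        | ssh offsite.example.com 'gzip -dc | dd of=/dev/sdb1 bs=1M'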

I'm guessing one alternative is to use DRBD that way -- to have the backup 
script disconnect DRBD before each backup, and reconnect after. The problem 
with that, as with the rsync-ing-the-whole-block-device approach, is what if 
the building blows up during this process? If that happens, we've now got a 
half-sync'd and completely useless offsite backup. (This probably also 
applies to rsync'ing the whole backup repository.)
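
In outline, that backup script would be something like this -- the resource 
name and the backup command are placeholders:

    # Freeze the offsite copy at a consistent point, back up, reconnect.
    drbdadm disconnect r0
    /usr/local/bin/run-backup    # hypothetical job writing to the DRBD device
    drbdadm connect r0           # DRBD now resyncs the changed blocks;
                                 # lose the office mid-resync and the
                                 # offsite disk is that half-sync'd mess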

So, DRBD in synchronous mode is about the only thing I can think of, other 
than a few things like Lustre and Coda which have never worked for me (and 
Lustre is commercial and 2.4-only anyway).
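
For completeness, synchronous mode is just protocol C in the config. A minimal 
sketch, with the resource name, hostnames, devices, and addresses all invented:

    resource r0 {
        protocol C;    # synchronous: a write completes only once the
                       # peer has it safely on disk too
        on office {
            device     /dev/drbd0;
            disk       /dev/sdb1;
            address    10.0.0.1:7788;
            meta-disk  internal;
        }
        on offsite {
            device     /dev/drbd0;
            disk       /dev/sdb1;
            address    10.0.0.2:7788;
            meta-disk  internal;
        }
    }

The catch, of course, is that every single write then waits on the WAN 
round-trip.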

> I'm currently backing up a 70GB dataset each night.  On days with no
> activity it is taking less than an hour and I have a convoluted setup.
>  (I first backup locally to encfs encrypted fs, then rsync the whole
> backup repository to an offsite location.)

Either things have changed, or you don't have nearly as much stuff to back up 
as we do. Probably the biggest worry is the mail server -- easily tens or 
hundreds of thousands of emails, each its own file.