[DRBD-user] Tape backups (yes, again)

Thu Jun 11 23:34:04 CEST 2009

On Thu, Jun 11, 2009 at 02:32:02PM -0400, Ken Dechick wrote:
> Hello all,
> 
>   I know this has been discussed before, but I am still trying to
> "sell" the whole DRBD/Heartbeat system to the higher-ups within my
> company and I can't find a solid answer on this here in the mailing
> list. So I will ask again.
> 
> I NEED to have tape backups - we are in the medical software business
> and having a tape to fall back on is crucial to our business model (if
> a client's office burns to the ground and both the primary and
> secondary servers in the 2-node cluster are gone for good, then a tape
> backup no more than a day old stored offsite is the only solution left
> - no doctor will tolerate losing much more data than this).
> 
> So let's forget the whole mounting secondary as read-only mess for
> now. What I am thinking is this:
> 
>   -at backup time (2AM?):

step missing: quiesce application and file system on the Primary.

>     -stop drbd and heartbeat on the secondary

no need. just "disconnect" drbd.

>     -bring down the dedicated eth1 connection to the primary (leaving eth0 still up so I can get in if need be)

no need.

>     -mount the sda4 partition (NOT the drbd0 device as drbd will be stopped) to its normal position

BAD idea. you do not want to bypass drbd.
even a read-only mount does change data: file system journal replay,
superblock, etc.
to get it in sync again, you'd need to trigger a full sync (as drbd is
not aware of the changes).

really: bad idea.

better:
just disconnect drbd,
make it primary (yes, on the node you run the backup to tape).
this is to have it track the changes.
mount _drbd_ (read only, preferably)
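
As a rough sketch, the steps above could look like this on the node that
writes the tape. The resource name r0 and the mount point /mnt/backup are
assumptions, not taken from this thread; /dev/drbd0 is the device mentioned
above. By default the script only prints the commands instead of running
them.

```shell
#!/bin/sh
# Sketch only. RES and MNT are assumed names, adjust them to your setup.
RES=r0                    # hypothetical DRBD resource name
DEV=/dev/drbd0            # the DRBD device, never the backing partition
MNT=/mnt/backup           # hypothetical mount point
DRY_RUN="${DRY_RUN:-1}"   # 1 = print the commands, 0 = execute them

run() {
    # Print the command in dry-run mode, execute it otherwise.
    if [ "$DRY_RUN" = "1" ]; then
        printf '%s\n' "$*"
    else
        "$@"
    fi
}

prepare_backup() {
    # Stop replication to the peer for the duration of the backup.
    run drbdadm disconnect "$RES" || return 1
    # Promote this node so that DRBD tracks whatever changes locally
    # (for example the journal replay triggered by the mount).
    run drbdadm primary "$RES" || return 1
    # Mount the DRBD device read-only for the tape run.
    run mount -o ro "$DEV" "$MNT"
}

prepare_backup
```

Set DRY_RUN=0 and run it as root once the printed commands look right.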

>     -run my usual tape backup routine

right.

>     -unmount sda4 again
>     -bring eth1 back up
>     -start drbd and heartbeat again

scratch those, replace with
umount drbd, make secondary, tell it to "drbdadm -- --discard-my-data connect".
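
A minimal sketch of that cleanup, again on the node that wrote the tape.
The resource name r0 and the mount point /mnt/backup are assumptions, not
taken from this thread. By default the script only prints the commands
instead of running them.

```shell
#!/bin/sh
# Sketch only. RES and MNT are assumed names, adjust them to your setup.
RES=r0                    # hypothetical DRBD resource name
MNT=/mnt/backup           # hypothetical mount point
DRY_RUN="${DRY_RUN:-1}"   # 1 = print the commands, 0 = execute them

run() {
    # Print the command in dry-run mode, execute it otherwise.
    if [ "$DRY_RUN" = "1" ]; then
        printf '%s\n' "$*"
    else
        "$@"
    fi
}

finish_backup() {
    # Release the file system before changing the DRBD role.
    run umount "$MNT" || return 1
    # Demote this node again.
    run drbdadm secondary "$RES" || return 1
    # Reconnect and throw away the local changes made during the backup;
    # this node is then resynced from the peer.
    run drbdadm -- --discard-my-data connect "$RES"
}

finish_backup
```

Only the blocks that changed on either side while disconnected should need
to be resynced, because DRBD tracked them.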

> I am thinking that in this way my users will still see NO downtime of
> the primary resource (unless of course there is a hardware failure
> during the tape backup while the secondary is offline!), and I still
> get a tape backup that is quite current. Once the secondary comes back
> up again anything that may have changed during the backup will
> replicate leaving me with only a tiny window of time to be without the
> secondary (an hour or two tops for my tape backup to run).
> 
> Could it really be this simple?

as just outlined: even simpler.

> We don't use lvm at all, just plain old ext3 file systems, so I
> believe this negates the whole lvm snapshot and then back that up
> discussion I have seen here in the lists.

maybe you should reconsider that.
it makes things even easier.
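
For illustration, a snapshot-based backup might look roughly like this.
It assumes the DRBD backing device is a logical volume with internal DRBD
meta data, here called /dev/vg0/data; the volume group, the snapshot name
and size, the mount point, and the tar call standing in for the usual tape
routine are all hypothetical. DRBD can stay connected in this variant, as
only the snapshot is mounted. By default the script only prints the
commands instead of running them.

```shell
#!/bin/sh
# Sketch only. All names below are assumptions, adjust them to your setup.
VG=vg0                    # hypothetical volume group
LV=data                   # hypothetical LV used as DRBD backing device
SNAP=backup-snap          # hypothetical snapshot name
SNAP_SIZE=5G              # room for changes made while the backup runs
MNT=/mnt/backup           # hypothetical mount point
TAPE=/dev/st0             # hypothetical tape device
DRY_RUN="${DRY_RUN:-1}"   # 1 = print the commands, 0 = execute them

run() {
    # Print the command in dry-run mode, execute it otherwise.
    if [ "$DRY_RUN" = "1" ]; then
        printf '%s\n' "$*"
    else
        "$@"
    fi
}

snapshot_backup() {
    # Freeze a point-in-time view of the backing device.
    run lvcreate --snapshot --size "$SNAP_SIZE" --name "$SNAP" "/dev/$VG/$LV" || return 1
    if run mount -o ro "/dev/$VG/$SNAP" "$MNT"; then
        # Stand-in for the usual tape backup routine.
        run tar -cf "$TAPE" -C "$MNT" .
        status=$?
        run umount "$MNT"
    else
        status=1
    fi
    # Always drop the snapshot, even if the backup failed.
    run lvremove -f "/dev/$VG/$SNAP"
    return "$status"
}

snapshot_backup
```

Unless the application and file system are quiesced first, the snapshot is
only as consistent as the disk would be after a crash.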

> What are your thoughts? Currently we implement what we call a
> view-only backup server at some clients where a second server is up
> and running and sync'ed (using rsync) only once a night from the main,
> then a tape backup runs once the sync is done. In this way our
> application is only offline during the time it takes to complete the
> rsync. I am thinking that there is no need to do this at all if I have
> a DRBD/Heartbeat 2 node cluster. (I certainly don't need a 3rd machine
> and keep doing the rsync then tape like we do now, do I?)

you could add a 3rd DRBD node, and do the backup from there.
this node could even be off-site itself, using DRBD protocol A,
possibly using DRBD Proxy.

read up on the "stacked three node drbd setup" in the users guide.

you do not lose redundancy on the main cluster while the 3rd node is
disconnected, and while it is connected, you even have a consistent
off-site copy for disaster recovery that is typically at most a few
seconds old.
and you don't even need to ship the tapes ;)

does that sound sensible?

-- 
: Lars Ellenberg                
: LINBIT HA-Solutions GmbH
: DRBD®/HA support and consulting    http://www.linbit.com

DRBD® and LINBIT® are registered trademarks of LINBIT, Austria.
__
please don't Cc me, but send to list   --   I'm subscribed