[DRBD-user] Tape backups (yes, again)

Greg Freemyer greg.freemyer at gmail.com
Thu Jun 11 20:46:41 CEST 2009

On Thu, Jun 11, 2009 at 2:32 PM, Ken Dechick<kend at medent.com> wrote:
> Hello all,
>   I know this has been discussed before, but I am still trying to "sell" the
> whole DRBD/Heartbeat system to the higher-ups within my company and I can't
> find a solid answer on this here in the mailing list. So I will ask again.
> I NEED to have tape backups - we are in the medical software business and
> having a tape to fall back on is crucial to our business model (if a
> client's office burns to the ground and both the primary and secondary
> servers in the 2-node cluster are gone for good, then a tape backup no more
> than a day old stored offsite is the only solution left - no doctor will
> tolerate losing much more data than this).
> So let's forget the whole mounting secondary as read-only mess for now. What
> I am thinking is this:
>   -at backup time (2AM?):
>     -stop drbd and heartbeat on the secondary
>     -bring down the dedicated eth1 connection to the primary (leaving eth0
> still up so I can get in if need be)
>     -mount the sda4 partition (NOT the drbd0 device as drbd will be stopped)
> to its normal position
>     -run my usual tape backup routine
>     -unmount sda4 again
>     -bring eth1 back up
>     -start drbd and heartbeat again
> I am thinking that in this way my users will still see NO downtime of the
> primary resource (unless of course there is a hardware failure during the
> tape backup while the secondary is offline!), and I still get a tape backup
> that is quite current. Once the secondary comes back up again anything that
> may have changed during the backup will replicate leaving me with only a
> tiny window of time to be without the secondary (an hour or two tops for my
> tape backup to run).
> Could it really be this simple? We don't use lvm at all, just plain old ext3
> file systems, so I believe this negates the whole lvm snapshot and then back
> that up discussion I have seen here in the lists. What are your thoughts?
> Currently we implement what we call a view-only backup server at some
> clients where a second server is up and running and sync'ed (using rsync)
> only once a night from the main, then a tape backup runs once the sync is
> done. In this way our application is only offline during the time it takes
> to complete the rsync. I am thinking that there is no need to do this at all
> if I have a DRBD/Heartbeat 2 node cluster. (I certainly don't need a 3rd
> machine and keep doing the rsync then tape like we do now, do I?)
> Thanks in advance for any thoughts on this you can share!
> -Ken
> Kenneth M DeChick

I see two issues.

The easy one: you should quiesce your primary as step one.  First
quiesce any apps you have, then the filesystem.  I assume ext3 has a
userspace call to quiesce it; I know xfs does (xfs_freeze).  After you
break the mirror, you can release the primary to perform i/o again.
Hopefully that is just a short period of downtime.
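
As a rough sketch of that ordering (freeze, break the mirror, thaw),
here is a small Python wrapper.  It assumes the util-linux fsfreeze
tool is available and works on your filesystem, and that the split is
done with "drbdadm disconnect" on the primary.  The mount point /data
and the resource name r0 are placeholders, not something from the
original plan.  The example only does a dry run and prints the
commands it would issue.

```python
import subprocess


def frozen_split(mountpoint, resource, run=subprocess.check_call):
    """Freeze the primary's filesystem, break the DRBD link, then thaw.

    `run` is injectable so the sequence can be dry-run without root.
    """
    # Assumption: fsfreeze (util-linux) supports this filesystem and kernel.
    run(["fsfreeze", "--freeze", mountpoint])
    try:
        # Break the mirror while no writes are in flight.
        run(["drbdadm", "disconnect", resource])
    finally:
        # Always thaw, even if the disconnect failed, so the primary
        # is never left blocked.
        run(["fsfreeze", "--unfreeze", mountpoint])


# Dry run: collect the commands instead of executing them.
# "/data" and "r0" are hypothetical names.
issued = []
frozen_split("/data", "r0", run=issued.append)
for cmd in issued:
    print(" ".join(cmd))
```

The try/finally is the important part: the window in which the
primary is frozen should contain only the disconnect, and the thaw has
to happen whether or not the disconnect succeeds.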

The harder one: you say to just mount the backup r/w.  Because you
mount sda4 directly with drbd stopped, drbd has no record of which
blocks change, so you get an uncontrolled divergence between the
primary and the backup.  The only way to sync back up is then a full
100% sync.  That seems crazy for a couple of reasons.
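
To put rough numbers on the cost, here is a back-of-the-envelope
calculation in Python.  The 500 GiB device size, the 30 MiB/s sync
rate, and the 2 GiB of nightly changes are all made-up figures for
illustration, so substitute your own.  It compares copying the whole
device with copying only the blocks that changed, which is what drbd
can do when the changes went through the drbd device and were
tracked.

```python
def sync_hours(size_gib, rate_mib_per_s):
    """Hours needed to copy size_gib at a sustained rate_mib_per_s."""
    seconds = size_gib * 1024 / rate_mib_per_s
    return seconds / 3600


# All three figures below are hypothetical.
DEVICE_GIB = 500      # size of the replicated partition
CHANGED_GIB = 2       # data written on the primary during the backup
RATE_MIB_S = 30       # sustained resync rate over the dedicated link

full = sync_hours(DEVICE_GIB, RATE_MIB_S)
delta = sync_hours(CHANGED_GIB, RATE_MIB_S)

print(f"full resync:         {full:.1f} hours")
print(f"changed blocks only: {delta * 60:.1f} minutes")
```

Keep in mind that the sync target is inconsistent until a full sync
finishes, so for that whole time there is no usable second copy.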

Much better is to mount the backup as truly readonly and then do your
backup.  By truly readonly I mean even in-flight journals/logs should
not be applied on mount.  Again, I don't know if ext3 can do that, but
xfs can (the norecovery mount option).
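
A minimal sketch of such a mount, written as a Python helper that
just builds the command line.  It relies on the ext3 "noload" and xfs
"norecovery" mount options, both of which skip journal/log replay;
check the mount man page for your kernel before relying on them.  The
/dev/sda4 device is the one from the original plan; the /mnt/backup
mount point is a placeholder.

```python
# Mount options that give a read-only mount WITHOUT replaying the journal/log.
RO_NO_REPLAY = {
    "ext3": "ro,noload",
    "xfs": "ro,norecovery",
}


def readonly_mount_command(fstype, device, mountpoint):
    """Build a mount command that never writes to `device`."""
    if fstype not in RO_NO_REPLAY:
        raise ValueError(f"no known no-replay option for {fstype!r}")
    return ["mount", "-t", fstype, "-o", RO_NO_REPLAY[fstype], device, mountpoint]


# "/mnt/backup" is a hypothetical mount point.
cmd = readonly_mount_command("ext3", "/dev/sda4", "/mnt/backup")
print(" ".join(cmd))
```

Skipping the journal only gives a consistent view of the data if the
primary was quiesced before the mirror was broken, which is why the
quiesce step has to come first.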

fyi: if you allow logs to be applied on the secondary, you again have
an uncontrolled divergence and a full 100% sync will be required to
bring things back together.

I'm sure there are other details you need to refine, but the above are
high-level steps you need to get in your plan from the start.

Greg Freemyer
Head of EDD Tape Extraction and Processing team
Litigation Triage Solutions Specialist
First 99 Days Litigation White Paper -

The Norcross Group
The Intersection of Evidence & Technology
