[DRBD-user] Tape backups (yes, again)
greg.freemyer at gmail.com
Thu Jun 11 20:46:41 CEST 2009
On Thu, Jun 11, 2009 at 2:32 PM, Ken Dechick <kend at medent.com> wrote:
> Hello all,
> I know this has been discussed before, but I am still trying to "sell" the
> whole DRBD/Heartbeat system to the higher-ups within my company and I can't
> find a solid answer on this here in the mailing list. So I will ask again.
> I NEED to have tape backups - we are in the medical software business and
> having a tape to fall back on is crucial to our business model (if a
> client's office burns to the ground and both the primary and secondary
> servers in the 2-node cluster are gone for good, then a tape backup no more
> than a day old stored offsite is the only solution left - no doctor will
> tolerate losing much more data than this).
> So let's forget the whole mounting secondary as read-only mess for now. What
> I am thinking is this:
> -at backup time (2AM?):
> -stop drbd and heartbeat on the secondary
> -bring down the dedicated eth1 connection to the primary (leaving eth0
> still up so I can get in if need be)
> -mount the sda4 partition (NOT the drbd0 device as drbd will be stopped)
> to its normal position
> -run my usual tape backup routine
> -unmount sda4 again
> -bring eth1 back up
> -start drbd and heartbeat again
> I am thinking that in this way my users will still see NO downtime of the
> primary resource (unless of course there is a hardware failure during the
> tape backup while the secondary is offline!), and I still get a tape backup
> that is quite current. Once the secondary comes back up again anything that
> may have changed during the backup will replicate leaving me with only a
> tiny window of time to be without the secondary (an hour or two tops for my
> tape backup to run).
> Could it really be this simple? We don't use LVM at all, just plain old ext3
> file systems, so I believe this rules out the whole "take an LVM snapshot and
> back that up" discussion I have seen here on the list. What are your thoughts?
> Currently we implement what we call a view-only backup server at some
> clients where a second server is up and running and sync'ed (using rsync)
> only once a night from the main, then a tape backup runs once the sync is
> done. In this way our application is only offline during the time it takes
> to complete the rsync. I am thinking that there is no need to do this at all
> if I have a DRBD/Heartbeat 2-node cluster. (I certainly don't need a third
> machine to keep doing the rsync-then-tape routine like we do now, do I?)
> Thanks in advance for any thoughts on this you can share!
> Kenneth M DeChick
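For reference, here is Ken's 2AM sequence written out as a small Python dry-run sketch. The device `/dev/sda4` and interface names come from the post; the mount point `/data`, the init-script paths and the tape script `/usr/local/bin/tape-backup.sh` are assumed placeholders. By default it only prints the commands; it executes them only when `dry_run=False`.

```python
import shlex
import subprocess

# Assumed placeholders, not taken from the original post.
BACKING_DEVICE = "/dev/sda4"
MOUNT_POINT = "/data"
TAPE_JOB = "/usr/local/bin/tape-backup.sh"  # hypothetical tape backup script


def backup_plan(device=BACKING_DEVICE, mount_point=MOUNT_POINT, tape_job=TAPE_JOB):
    """Ken's proposed 2AM sequence for the secondary node, in order."""
    return [
        ["/etc/init.d/heartbeat", "stop"],
        ["/etc/init.d/drbd", "stop"],
        ["ifdown", "eth1"],              # eth0 stays up for remote access
        ["mount", device, mount_point],  # plain mount of the backing device: read-write
        [tape_job, mount_point],
        ["umount", mount_point],
        ["ifup", "eth1"],
        ["/etc/init.d/drbd", "start"],
        ["/etc/init.d/heartbeat", "start"],
    ]


def run(plan, dry_run=True):
    """Print each command; execute it only when dry_run is False."""
    for cmd in plan:
        print("+", " ".join(shlex.quote(part) for part in cmd))
        if not dry_run:
            subprocess.run(cmd, check=True)  # stop at the first failing step


if __name__ == "__main__":
    run(backup_plan())
```

Note that the `mount` step is a plain read-write mount of the backing device, which is the step the reply objects to.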
I see two issues.
The easy one is that you should quiesce your primary as step one. So first
quiesce any apps you have, then quiesce the filesystem itself. I assume ext3
has a userspace call to do that; I know XFS does. After you break the
mirror, you can release the primary back to performing I/O again.
Hopefully that is just a short period of downtime.
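A minimal sketch of that ordering (pause apps, freeze the filesystem, break the mirror, then release), assuming a Linux host. `xfs_freeze -f`/`-u` is the XFS tool; `fsfreeze -f`/`-u` from util-linux uses the kernel's generic freeze ioctl and so also covers ext3/ext4 on reasonably recent kernels and util-linux versions. The app pause/resume commands, the mount point `/data` and the resource name `r0` are hypothetical, and `drbdadm disconnect` is used here as just one way to break the mirror.

```python
import subprocess
from contextlib import contextmanager


def freeze_commands(mount_point, fstype):
    """Return (freeze, thaw) commands for the filesystem at mount_point."""
    tool = "xfs_freeze" if fstype == "xfs" else "fsfreeze"
    return [tool, "-f", mount_point], [tool, "-u", mount_point]


@contextmanager
def quiesced(mount_point, fstype, runner, app_pause=None, app_resume=None):
    """Pause apps, freeze the fs, yield, then thaw and resume in reverse order."""
    freeze, thaw = freeze_commands(mount_point, fstype)
    if app_pause:
        runner(app_pause)          # hypothetical app-specific pause command
    try:
        runner(freeze)
        try:
            yield
        finally:
            runner(thaw)           # always release the primary, even if the split failed
    finally:
        if app_resume:
            runner(app_resume)     # hypothetical app-specific resume command


def real_runner(cmd):
    """Execute a command for real; raises CalledProcessError on failure."""
    subprocess.run(cmd, check=True)


if __name__ == "__main__":
    log = []  # record commands instead of running them
    with quiesced("/data", "ext3", log.append):
        log.append(["drbdadm", "disconnect", "r0"])  # break the mirror while frozen
    for cmd in log:
        print(" ".join(cmd))
```

Passing `real_runner` instead of `log.append` would execute the commands; the recording runner keeps the sketch safe to try.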
The harder one: you say to just mount the backup r/w. That means you
will get an uncontrolled divergence between the primary and backup.
Because DRBD never sees writes made directly to sda4, it cannot tell
which blocks differ, so the only way to sync back up is to do a full
100% sync. That seems crazy for a couple of reasons.
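Why a full sync is needed can be shown with a toy model. A quick DRBD resync after a disconnect is driven by a dirty-block bitmap that only records writes that went through DRBD. The sketch below uses plain Python sets, not DRBD's actual data structures, to show that a bitmap-driven resync misses blocks written directly to the secondary's backing disk, so the only safe answer is to copy everything (in DRBD terms, for example, `drbdadm invalidate` on the secondary).

```python
def resync_sets(n_blocks, primary_dirty, secondary_untracked):
    """Toy model of the resync after a split.

    primary_dirty: blocks written on the primary through DRBD while
        disconnected (what the dirty bitmap knows about).
    secondary_untracked: blocks written on the secondary's backing disk
        outside DRBD. In reality nobody knows this set; the model does
        only so it can show what gets missed.
    Returns (bitmap_copies, missed, safe_copies).
    """
    bitmap_copies = set(primary_dirty)
    # Stray secondary writes that a bitmap-driven resync never overwrites.
    missed = set(secondary_untracked) - bitmap_copies
    # Any untracked write makes every block suspect, so copy all of them.
    safe_copies = set(range(n_blocks)) if secondary_untracked else bitmap_copies
    return bitmap_copies, missed, safe_copies


if __name__ == "__main__":
    copied, missed, safe = resync_sets(8, {1, 2}, {2, 5})
    print(sorted(copied), sorted(missed), len(safe))  # [1, 2] [5] 8
```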
Much better is to mount the backup as truly read-only and then do your
backup. By truly read-only I mean that even in-flight journals/logs should
not be applied on mount. Again, I don't know if ext3 can do that, but
FYI: if you allow logs to be applied on the secondary, you again have
an uncontrolled divergence, and a full 100% sync will be required to
bring things back together.
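One way to get such a mount, sketched as command construction only: mark the block device read-only with `blockdev --setro`, then mount with an option that skips journal replay. ext3 and ext4 document a `noload` mount option and XFS documents `norecovery`; both are paired with `ro` here. The device and mount point in the example are placeholders.

```python
# Mount options that skip journal/log replay, per filesystem type.
NO_REPLAY_OPTIONS = {
    "ext3": "ro,noload",
    "ext4": "ro,noload",
    "xfs": "ro,norecovery",
}


def readonly_mount_commands(device, mount_point, fstype):
    """Commands to mount device without writing to it or replaying its journal."""
    if fstype not in NO_REPLAY_OPTIONS:
        raise ValueError(f"no known no-replay mount option for {fstype!r}")
    return [
        ["blockdev", "--setro", device],  # refuse writes at the block layer too
        ["mount", "-t", fstype, "-o", NO_REPLAY_OPTIONS[fstype], device, mount_point],
    ]


if __name__ == "__main__":
    for cmd in readonly_mount_commands("/dev/sda4", "/mnt/backup", "ext3"):
        print(" ".join(cmd))
```

After unmounting, `blockdev --setrw` would be needed before DRBD uses the device again. Skipping replay on a filesystem that was not cleanly quiesced can expose inconsistent data, which is why the quiesce step comes first.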
I'm sure there are other details you need to refine, but the above are
high-level steps you need to get in your plan from the start.
Head of EDD Tape Extraction and Processing team
Litigation Triage Solutions Specialist
First 99 Days Litigation White Paper -
The Norcross Group
The Intersection of Evidence & Technology