[DRBD-user] Replication for disaster recovery

Lars Ellenberg lars.ellenberg at linbit.com
Sat Apr 26 13:05:52 CEST 2014

Note: "permalinks" may not be as permanent as we would like,
direct links of old sources may well be a few messages off.


On Fri, Apr 25, 2014 at 11:47:45PM +0200, Velko wrote:
> Hi Digimer,
> 
> On 2014-Apr-25 10:22, Digimer wrote:
> > >I'm planning to use DRBD for DR and since I've never used it, I have a
> > >few questions.
> > >
> > >1. Has anyone used it for replication to a different data center (both
> > >on the east coast of the USA)? I was thinking of using protocol A
> > >without DRBD proxy (maybe protocol B). How does it work in real life?

Protocol B would still cost the network latency,
and only shave off the remote disk latency. So yes:

> > You would want Protocol A, or else your storage performance would
> > effectively drop to the same speed as the network. Of course, the
> > trade-off is that the remote side will often be a little behind
> > production (or a lot behind, depending on how much data is being
> > written compared to the network speed).
> 
> That is something I will have to live with.
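
(For reference, the protocol choice is a single keyword in the resource
definition. A minimal 8.4-style sketch; host names, addresses and devices
here are made up, and in older 8.3-style configs the protocol keyword sits
directly at resource level instead of inside the net section:)

    resource r0 {
      net {
        # asynchronous replication: a write is considered complete once it
        # is on the local disk and handed to the TCP send buffer
        protocol A;
      }
      on alpha {
        device    /dev/drbd0;
        disk      /dev/sdb1;
        address   10.1.1.1:7789;
        meta-disk internal;
      }
      on bravo {
        device    /dev/drbd0;
        disk      /dev/sdb1;
        address   10.2.2.2:7789;
        meta-disk internal;
      }
    }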
> 
> > >2. There are a lot of 2-3 MB files, roughly 1 GB per day, that are not
> > >that important in case I need to switch to the secondary site. Would it
> > >be possible to put those files on a separate partition (DRBD device)
> > >and configure it to have lower priority? Maybe only periodic
> > >resynchronization? I understood that would be much more efficient. The
> > >partition with the less frequently changing files and the InnoDB
> > >tables would have high priority.
> > 
> > Not easily. I suppose you could create a separate resource which
> > would use a different port and then try QoS, but that could get
> > tricky (in general, not DRBD specific, QoS can be a pita).
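
(If you ever do revisit the QoS idea: each DRBD resource listens on its own
TCP port, so you can classify replication traffic by port. A rough, untested
sketch with tc's prio qdisc, assuming eth1 is the WAN-facing interface and
7790 is the port of the low-priority resource:)

    # three-band strict-priority qdisc on the replication interface
    tc qdisc add dev eth1 root handle 1: prio
    # push the unimportant resource (port 7790) into the lowest band;
    # everything else keeps the default priomap bands
    tc filter add dev eth1 parent 1: protocol ip u32 \
        match ip dport 7790 0xffff flowid 1:3
    tc filter add dev eth1 parent 1: protocol ip u32 \
        match ip sport 7790 0xffff flowid 1:3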
> 
> In that case, I'll just use rsync and cron for those files.

Consider csync2.
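
A minimal csync2 group as a starting point; the host names, key path and
directories below are placeholders, and you would trigger it periodically
from cron with "csync2 -x":

    group bulkfiles {
      host db-east db-west;
      key  /etc/csync2.key_bulkfiles;
      include /srv/bulk;
      exclude *.tmp;
      # on conflict, keep the newer copy
      auto younger;
    }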

> > >3. Would having separate NIC for replications improve performances in
> > >this case?
> > 
> > Depends entirely on where your bottle-neck is. I suspect that will
> > be your WAN connection, which you didn't talk about. ;)
> 
> I just thought that the main problem will be latency, not bandwidth. A
> 100 Mbit link can't be saturated in my use case, but that's just theory
> and practice can be different.
> 
> > >4. Is it possible to configure DRBD to automatically demote in case of
> > >losing the internet connection?
> > 
> > Not safely, no. There is no reliable way to use fencing, and
> > without reliable fencing, only a human can ensure the state of the
> > lost node. To try and automate it would very likely lead to
> > split-brains.
> 
> You're talking about hardware fencing? That's possible only when the
> servers are in the same DC. I'd like to automate demoting the primary
> server to avoid split-brains. Promoting the secondary to primary would
> be done manually. Imagine this situation: the DC where the primary
> server sits loses network connectivity only partially. Because of it,
> I'm unable to access the server (and it can't see its peer either) and I
> think the server is offline, although it is still visible from some
> other locations. I promote the secondary server to primary, change the
> DNS records, and now, until the DNS changes have propagated to the whole
> world, there will be two primary servers. That's why I thought it would
> be good to make the primary server demote itself in cases like this.

We cannot easily "demote" ourselves.
We can however freeze IO on connection loss,
which is exactly what happens if you configure
  fencing resource-and-stonith;

It will then freeze IO,
call the "fence-peer" handler (a script you can define in drbd.conf)
and unfreeze IO only once that handler returns an expected exit code,
the connection is re-established,
or the admin explicitly resumes IO.
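
In drbd.conf terms that is roughly the following (a sketch, 8.x syntax;
the handler paths are placeholders for whatever script you end up writing,
since the crm-fence-peer.sh shipped with DRBD assumes a Pacemaker cluster):

    resource r0 {
      disk {
        # freeze IO and call the fence-peer handler on connection loss
        fencing resource-and-stonith;
      }
      handlers {
        # placeholder scripts, not shipped with DRBD
        fence-peer          "/usr/local/sbin/my-fence-peer.sh";
        after-resync-target "/usr/local/sbin/my-unfence-peer.sh";
      }
    }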

In theory you could try to 
   force disconnect,
   then force detach,
   then resume

And all file systems/applications on top of it will get IO errors,
and may then panic ;-)

Maybe easier to just have it freeze,
and start a timer... if it is still frozen
after $timeout, hard-reset ;-)
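
Something like this, as a very rough sketch only; the resource name, the
timeout and the decision to hard-reset are all placeholders, and proper
handling of the fence-peer exit code conventions (see drbd.conf(5)) is
left out:

    #!/bin/sh
    # Hypothetical fence-peer handler: wait a while, and if the peer is
    # still unreachable, assume we are the isolated side and hard-reset
    # rather than keep serving possibly stale data.
    RES=r0
    TIMEOUT=60

    sleep "$TIMEOUT"

    if [ "$(drbdadm cstate "$RES")" = "Connected" ]; then
        # peer came back; DRBD unfreezes IO on reconnect anyway
        exit 0
    fi

    # still cut off after the timeout: take this node down hard,
    # so it cannot keep running while the other site gets promoted
    echo 1 > /proc/sys/kernel/sysrq
    echo b > /proc/sysrq-trigger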

You could talk with LINBIT,
we can make it work the way you need it.

[if that is technically possible;
 which is why I did not say "the way you want it" ...
 experience shows that people "want" protocol P,
 the predictive mode... ;-)]

Cheers,
	Lars


-- 
: Lars Ellenberg
: LINBIT | Your Way to High Availability
: DRBD/HA support and consulting http://www.linbit.com


