[DRBD-user] Replication for disaster recovery

Sat Apr 26 19:31:47 CEST 2014

On 25/04/14 02:47 PM, Velko wrote:
> Hi Digimer,
>
> On 2014-Apr-25 10:22, Digimer wrote:
>>> I'm planning to use DRBD for DR and since I've never used it, I have a
>>> few questions.
>>>
>>> 1. Has anyone used it for replication to a different data center (both
>>> east coast of USA)? I was thinking of using protocol A without DRBD
>>> proxy (maybe protocol B). How does it work in real life?
>>
>> You would want Protocol A, or else your storage performance would
>> effectively drop to the same speed as the network. Of course, the
>> trade-off is that the remote side will often be a little behind
>> production (or a lot behind, depending on how much data is being
>> written compared to the network speed).
>
> That is something I will have to live with.
>
>>> 2. Since there are a lot of 2-3 MB files (about 1 GB daily) that are
>>> not that important in case I need to switch to secondary site, would it
>>> be possible to put those files on a separate partition (DRBD device) and
>>> configure it to have lower priority? Maybe only periodic
>>> resynchronization? I understood that it is much more efficient.
>>> The partition with less frequently changing files and InnoDB tables
>>> would have high priority.
>>
>> Not easily. I suppose you could create a separate resource which
>> would use a different port and then try QoS, but that could get
>> tricky (in general, not DRBD specific, QoS can be a pita).
>
> In that case, I'll just use rsync and cron for those files.
>
>>> 3. Would having a separate NIC for replication improve performance in
>>> this case?
>>
>> Depends entirely on where your bottleneck is. I suspect that will
>> be your WAN connection, which you didn't talk about. ;)
>
> I just thought that the main problem will be latency, not bandwidth. 100Mb
> can't be saturated in my use case, but that's just theory and practice
> can be different.
>
>>> 4. Is it possible to configure DRBD to automatically demote in case of
>>> losing internet connection?
>>
>> Not safely, no. There is no reliable way to use fencing, and
>> without reliable fencing, only a human can ensure the state of the
>> lost node. To try and automate it would very likely lead to
>> split-brains.
>
> You're talking about hardware fencing? That's possible only when servers
> are in the same DC.

Yes and no. If the WAN link is working, you can fence over distances.
The problem is that your fencing will fail if the WAN fails, which means
your cluster will fail to fence and stay blocked (as Lars explained in
his reply).

> I'd like to automate demoting on primary server to
> avoid split-brains. Promoting secondary to primary would be done
> manually.

I'm not sure I understand the benefit here. If the WAN link drops, 
fencing will fail and the nodes will be blocked. Forcing a demotion 
doesn't really help, as a human will need to step in anyway. Let that 
human decide which node takes which role.

> Imagine this situation: the DC where the primary server is loses
> network only partially. Because of it, I'm unable to access the server
> (and it can't see peer either) and I think that that server is offline

"I think" is not safe; you need to know for sure the state of the peer. 
This is what fencing does, normally. If fencing fails, a human *must* 
confirm the state of the peer before recovering. Anything less will lead 
to a split-brain.

> although that server is still visible from some other locations. I

If the other node is visible, its fencing should also be accessible. 
Fence the node and recover.

> promote secondary server to primary, change DNS records and now, until
> the DNS changes have propagated to the whole world, there will be two
> primary servers. That's why I thought that it would be good to make the
> primary server demote itself in cases like this.

Two stand-alone primaries is a split-brain. How do you recover from
that? One node's data will need to be discarded.

To restate: both nodes should block until their peer is put into a known
state, be it by human or fencing action. If the nodes are blocked, then
their role as Primary or Secondary shouldn't matter.
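
The rule above can be sketched as a tiny decision function. This is only
an illustration of the reasoning, not DRBD or cluster-manager code; every
name in it (PeerState, resolve_peer, may_resume_io) is hypothetical.

```python
from enum import Enum


class PeerState(Enum):
    # Hypothetical labels for what a node knows about its peer.
    CONNECTED = "connected"            # replication link is up
    FENCED = "fenced"                  # a fence action succeeded
    CONFIRMED_DOWN = "confirmed_down"  # a human verified the peer is down
    UNKNOWN = "unknown"                # link lost and nothing confirmed


def resolve_peer(link_up: bool, fence_ok: bool, human_confirmed: bool) -> PeerState:
    """Map the available evidence to what is actually known about the peer."""
    if link_up:
        return PeerState.CONNECTED
    if fence_ok:
        return PeerState.FENCED
    if human_confirmed:
        return PeerState.CONFIRMED_DOWN
    # "I think it is offline" lands here: no evidence, so the state is unknown.
    return PeerState.UNKNOWN


def may_resume_io(peer: PeerState) -> bool:
    """A node stays blocked until its peer is in a known state."""
    return peer is not PeerState.UNKNOWN


# WAN down, fencing failed over the dead WAN, nobody has checked: stay blocked.
print(may_resume_io(resolve_peer(False, False, False)))
# Same outage, but a human has confirmed the peer is really down: recover.
print(may_resume_io(resolve_peer(False, False, True)))
```

Note that the node's own role never appears as an input, which is the
point: while blocked, Primary versus Secondary makes no difference.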

-- 
Digimer
Papers and Projects: https://alteeve.ca/w/
What if the cure for cancer is trapped in the mind of a person without 
access to education?