[DRBD-user] DRBD (XFS) + Pacemaker + Corosync with 2 node and arbiter (virtual node) for no split brain: Stonith, Quorum needed?

Digimer lists at alteeve.ca
Fri Sep 19 10:30:31 CEST 2014

Note: "permalinks" may not be as permanent as we would like,
direct links of old sources may well be a few messages off.


Hi aTTi,

   Comments in-line;

On 18/09/14 02:22 PM, aTTi wrote:
> Hi Digimer!
>
> Thanks for your answer. I have a lot of questions, and not just for Digimer - for everyone.
>
> So, if I have just 2 nodes with quorum disabled and I use fencing (aka
> STONITH) + pacemaker, will it be safe for production use? (Any other
> recommended non-default settings? Any howto?)

"Production ready" requires many things. Fencing is one of those things, 
of course, but there are others.

Details are hard to give without a better idea of your environment... 
What operating system? What versions of corosync, pacemaker and DRBD? etc.

With 2-node clusters, you need to put a delay on one node, and you need 
to be careful to avoid fence loops. That is to say, either don't let the 
cluster stack start on boot (always my recommendation), or at least use 
wait_for_all if you have corosync v2+.

See:

https://alteeve.ca/w/AN!Cluster_Tutorial_2#Giving_Nodes_More_Time_to_Start_and_Avoiding_.22Fence_Loops.22
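
For example, with corosync 2.x the relevant knobs live in the quorum
section of /etc/corosync/corosync.conf. A minimal sketch, not a
complete config:

  quorum {
          provider: corosync_votequorum
          # Tell votequorum this is a two-node cluster.
          two_node: 1
          # Don't become quorate until both nodes have been seen at
          # least once after boot; this helps avoid fence loops.
          wait_for_all: 1
  }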

> If STONITH kills the slower node, doesn't that cause data loss on the
> slower server? Is it a remote shutdown, or a power off / reset? Or is
> it the same as if I start a shutdown as root?

With DRBD, both nodes stop writing when the connection is lost. This way, 
when the slower node is powered off, no data is lost. If your OS itself 
uses a journaled file system and you're not doing something silly like 
using hardware RAID with write-back caching and no BBU, then the OS 
should be safe as well.

When the fenced server boots back up, DRBD on the surviving node will 
know just which blocks changed when the peer was gone, so it only has to 
copy that data to bring the peer back up to full sync state.
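
If you want to watch that catch-up resync happen, on DRBD 8.x the
progress is visible in /proc/drbd:

  watch -n1 cat /proc/drbd
  # or, if the drbd-overview helper is installed, a condensed view:
  drbd-overview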

> So, if communication breaks, it will be like a western movie: the
> faster one kills the slower one and only 1 stays alive. Can it happen
> that both nodes die?

It can happen that both nodes die in some cases. This can be avoided 
with a few precautions: disable acpid if you use IPMI fencing, and set a 
delay against one node.

Please read the section immediately below the example config file here:

https://alteeve.ca/w/AN!Cluster_Tutorial_2#Using_the_Fence_Devices
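
As a rough illustration of the delay idea, using pacemaker's pcs shell
and IPMI fencing (hostnames, addresses and credentials here are made
up): the device that fences node1 gets delay=15, so in a dual-fence
race node1 survives and node2 is powered off first.

  pcs stonith create fence_node1 fence_ipmilan \
          pcmk_host_list="node1" ipaddr="10.20.0.1" lanplus="1" \
          login="admin" passwd="secret" delay="15" \
          op monitor interval="60s"
  pcs stonith create fence_node2 fence_ipmilan \
          pcmk_host_list="node2" ipaddr="10.20.0.2" lanplus="1" \
          login="admin" passwd="secret" \
          op monitor interval="60s"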

> With a good setup and no hardware errors, what are the most common
> problems with DRBD? How can I prove that?

With good fencing, there are no problems. I have used it in production 
since 2009 on dozens of 2-node clusters all over North America. The 
trick is good fencing.

> Where can I find documentation about DRBD test cases? Or recommended
> configurations and an installation manual for 2 nodes with CentOS 7?

I don't know how much documentation exists for CentOS 7; it is very new. 
However, the concepts from CentOS 6 are very similar.

You can read a lot about the logic and concepts behind how we use 
DRBD in our 2-node clusters here:

https://alteeve.ca/w/AN!Cluster_Tutorial_2#Hooking_DRBD_into_the_Cluster.27s_Fencing
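
The short version of that hook, for a pacemaker cluster, is a DRBD
resource stanza along these lines (the resource name is a placeholder;
on DRBD 8.4 the fencing policy sits in the disk section, so check the
drbd.conf man page for your version):

  resource r0 {
          disk {
                  # Suspend I/O and call the fence handler when the
                  # peer is lost, instead of carrying on blindly.
                  fencing resource-and-stonith;
          }
          handlers {
                  # Helper scripts shipped with DRBD that place/remove
                  # a pacemaker constraint against the lost peer.
                  fence-peer "/usr/lib/drbd/crm-fence-peer.sh";
                  after-resync-target "/usr/lib/drbd/crm-unfence-peer.sh";
          }
  }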

> Example situation:
> server 1 = DRBD active node with running services, server 2 = DRBD passive node
> server 1 has a hardware error and goes offline, so server 2 becomes the active node
> server 2 sets the virtual IP needed for the active role, then starts the services
> after server 1's hardware is repaired, server 1 comes back online
> How can I switch back, in the safest way with STONITH installed, so
> that server 1 is the active and server 2 is the passive node again? Do
> I need a script? Or just a few commands?

As soon as there is a problem, both nodes block and call a fence. The 
faster node powers off the slower node, gets confirmation that it is 
off, and *then* begins recovery. Maybe the fenced node will boot back 
up, or maybe it's a pile of rubble and will never power on again... it 
doesn't matter to the cluster.

Once the node is gone, the surviving machine will review the pacemaker 
configuration, determine what has to be done to recover your services, 
and then do that. What "that" means will depend entirely on your 
configuration.

An example might be to:

1. promote DRBD to primary
2. mount the file system on drbd
3. start a service like httpd or postgresql that uses the DRBD data
4. take over the virtual IP address

This is just an example though.
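
In pcs terms (CentOS 7), that recovery chain could be expressed roughly
like this. Resource names, the mount point and the IP are invented for
the sketch; adjust everything to your setup.

  # DRBD itself, as a master/slave (Primary/Secondary) resource
  pcs resource create drbd_r0 ocf:linbit:drbd drbd_resource=r0 \
          op monitor interval=30s
  pcs resource master ms_drbd_r0 drbd_r0 \
          master-max=1 master-node-max=1 clone-max=2 clone-node-max=1 \
          notify=true

  # File system, virtual IP and service, grouped so they move together
  pcs resource create fs_r0 ocf:heartbeat:Filesystem \
          device=/dev/drbd0 directory=/srv/data fstype=xfs
  pcs resource create vip ocf:heartbeat:IPaddr2 \
          ip=192.168.1.100 cidr_netmask=24
  pcs resource create web systemd:httpd
  pcs resource group add grp_web fs_r0 vip web

  # The group may only run where DRBD is Primary, and only after promotion
  pcs constraint colocation add grp_web with master ms_drbd_r0 INFINITY
  pcs constraint order promote ms_drbd_r0 then start grp_web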

> Other situation:
> Any real-life experience with periodically (weekly, monthly) swapping
> the active and passive nodes? Like in the last example: server 1
> active, server 2 passive, then each month I make the other node
> active. In January server 1 is the active node, in February server 2
> is active, in March server 1 is active again... so that both servers
> wear evenly.

Migration of services can be controlled however you want, but time-based 
migration is not something I have seen. Nothing stops you from moving 
the services manually, though, if you want. Generally, though, services 
migrate in reaction to a specific event, like a component failure.
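
If you do want to flip things by hand, pcs can do it in two commands
(the group name is from the earlier sketch):

  # Push the service group to the other node...
  pcs resource move grp_web node2
  # ...then remove the location constraint that "move" leaves behind,
  # so the cluster is free to place the group normally again.
  pcs resource clear grp_web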

> Do you recommend using a 3rd node as a backup node or not? And in what
> way would I use the third node? As a stacked node? Or iSCSI sync? Or a
> normal passive node? (I don't want it; I want my DRBD solution to be
> simple and safe.)

A cluster does _NOT_ replace backups. You still need backups, always. 
Generally, I have a dedicated machine, in another building, that 
periodically rsync's the production data into a date-coded directory. 
This way, I can go back in time to retrieve deleted or corrupted files.
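
As a trivial sketch of that kind of date-coded copy (host and paths are
made up), run from the backup machine out of cron:

  rsync -a node1:/srv/data/ /backups/$(date +%F)/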

How you set up your backups, though, is entirely up to you. Backup is 
very different from HA.

> Can I combine DRBD server pairs? Like servers 1+2 are DRBD1 nodes 1+2,
> and servers 3+4 are DRBD2 nodes 1+2. Then add server 3 or 4 to DRBD1
> as a 3rd node, and add server 1 or 2 to DRBD2 as a 3rd node? Is there
> any point to this? Or, to make it stranger: adding DRBD1 node 3's
> storage space to DRBD2's disk space?
> I think it's not a good idea, I just want to know. I also have the
> disk space for it; I'm just asking theoretically.

I don't know if it is possible, but I think it would be.

> If DRBD is really safe with 2 nodes, I don't want to use more nodes. I
> will make automatic backups of the data; I just want HA with no
> service interruption and no data loss if a server fails. I know DRBD
> is just one part of an HA solution, but it's an important part.

As I said, I have used DRBD in 2-node clusters only for several years 
without any issue.

> Do you recommend using at least 2 rings with corosync? Ring 1 =
> crossover cable, ring 2 = switch connection. Any disadvantages to
> that?

It's up to you. I use active/passive bonding with the network links 
spanning two switches for full network redundancy. Redundant rings are 
good, too. I go with bonding only because it protects all traffic, 
including DRBD traffic.
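
On CentOS, an active/passive (active-backup) bond for the cluster links
looks roughly like this; interface names and the address are
placeholders:

  # /etc/sysconfig/network-scripts/ifcfg-bond0
  DEVICE=bond0
  BONDING_OPTS="mode=active-backup miimon=100"
  IPADDR=10.10.0.1
  PREFIX=24
  ONBOOT=yes
  BOOTPROTO=none

  # /etc/sysconfig/network-scripts/ifcfg-eth0 (repeat for eth1)
  DEVICE=eth0
  MASTER=bond0
  SLAVE=yes
  ONBOOT=yes
  BOOTPROTO=none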

> Thank you again for your help.
> aTTi

Always happy to help.

PS - Please keep replies on the mailing list. Conversations like this 
can help others in the future when they are in the archives.

-- 
Digimer
Papers and Projects: https://alteeve.ca/w/
What if the cure for cancer is trapped in the mind of a person without 
access to education?


