On Thu, 1 May 2008, Doug "superdug" Smith wrote:

> What I would like to know, is anyone using GFS on top of DRBD?

Yes. I do for one.

> If so are you using the cluster tools inside of RHCS?

What do you mean by that? If you mean for configuring, I usually
create/edit cluster.conf manually.

> Also, how do you mitigate a failure situation, more in reference to
> regaining Consistency and Fencing of a node?

You set up failover domains and migrate the floating IP to one of the
remaining servers. You can specify priorities to control the order of
preference for which service(s) should fail over to which server(s).

If your fencing works correctly, failure will be handled transparently.
Just make sure that you set up DRBD to fence the other node when it
detects a failure - otherwise you can end up in a situation where DRBD
disconnected but the cluster didn't fence, which results in split
brain, and the data between the two copies will diverge. (You will
need to use the stonith fencing option in drbd.conf and point it at
the RHCS fencing script.)

> I have GFS running on DRBD without issue right now, but I am having
> trouble recovering from a network disconnect or reboot of a node in
> a mock failure situation.
>
> Is there a way with the GFS setup to keep one node online after a
> failure?

What exactly is the problem? When a node fails, it gets powered off.
When it powers back up, DRBD will automatically resync (assuming you
have it set to start automatically), and when the node rejoins the
cluster/fencing domain, the resource manager will try to move the
migrated services back to the local node. All this time, the remaining
node(s) will continue working.

If you are seeing the whole cluster just hang when a node fails, that
means you didn't configure the fencing correctly. Check syslog for
related messages.

Gordan
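P.S. The drbd.conf side of the fencing setup looks roughly like the
sketch below (DRBD 8.x syntax assumed; the resource name "r0", the
device/disk paths, and the handler script path are placeholders - the
handler should be whatever wrapper you use to invoke the RHCS fence
agent for the peer node, e.g. a script that calls fence_node):

```
# /etc/drbd.conf - fencing-related parts only (sketch, not a full config)
resource r0 {
        protocol C;

        disk {
                # Freeze I/O and call the fence-peer handler when the
                # replication link goes down, instead of silently
                # continuing with a disconnected peer.
                fencing resource-and-stonith;
        }

        handlers {
                # Hand fencing off to the cluster stack. The script
                # name here is an example wrapper that asks RHCS to
                # power off the peer (e.g. via fence_node <peer>).
                outdate-peer "/usr/local/sbin/drbd-fence-peer.sh";
        }

        on node1 {
                device    /dev/drbd0;
                disk      /dev/sda3;
                address   192.168.1.1:7788;
                meta-disk internal;
        }
        on node2 {
                device    /dev/drbd0;
                disk      /dev/sda3;
                address   192.168.1.2:7788;
                meta-disk internal;
        }
}
```

With "resource-and-stonith", writes block until the handler reports
the peer fenced, which is what prevents the two halves from diverging.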
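P.P.S. The failover domain and floating IP mentioned above live in the
rgmanager (<rm>) section of cluster.conf; a minimal sketch, with node
names, service name, and the IP address all made up for illustration:

```
<!-- fragment of /etc/cluster/cluster.conf (rgmanager section) -->
<rm>
        <failoverdomains>
                <!-- ordered="1": use priorities; lower number = preferred.
                     restricted="1": only run on the listed nodes. -->
                <failoverdomain name="prefer_node1" ordered="1" restricted="1">
                        <failoverdomainnode name="node1" priority="1"/>
                        <failoverdomainnode name="node2" priority="2"/>
                </failoverdomain>
        </failoverdomains>
        <!-- Floating IP follows the service to whichever node runs it -->
        <service name="myservice" domain="prefer_node1" autostart="1">
                <ip address="192.168.1.100" monitor_link="1"/>
        </service>
</rm>
```

With ordered="1" and those priorities, the service prefers node1 and
fails over to node2, then migrates back when node1 rejoins.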