On Thu, 1 May 2008, Doug "superdug" Smith wrote:

> What I would like to know, is anyone using GFS on top of DRBD?

Yes. I do for one.

> If so are you using the cluster tools inside of RHCS?

What do you mean by that? If you mean for configuring, I usually
create/edit cluster.conf manually.

> Also, how do you mitigate a failure situation, more in reference to
> regaining Consistency and Fencing of a node?

You set up failover domains and migrate the floating IP to one of the
remaining servers. You can specify priorities to control the order of
preference for which service(s) should fail over to which server(s).

If your fencing works correctly, failure will be handled transparently.
Just make sure that you set up DRBD to fence the other node when it
detects a failure - otherwise you can end up in a situation where DRBD
disconnected but the cluster didn't fence, which results in split
brain, and the data between the two copies will diverge. (You will
need to use the stonith fencing option in drbd.conf and point it at
the RHCS fencing script.)

> I have GFS running on DRBD without issue right now, but I am having
> trouble recovering from a network disconnect or reboot of a node in
> a mock failure situation.
>
> Is there a way with the GFS setup to keep one node online after a
> failure?

What exactly is the problem? When a node fails, it gets powered off.
When it powers back up, DRBD will automatically resync (assuming you
have it set to start automatically), and when the node rejoins the
cluster/fencing domain, the resource manager will try to move the
migrated services back to the local node. All this time, the remaining
node(s) will continue working.

If you are seeing the whole cluster just hang when a node fails, that
means you didn't configure the fencing correctly. Check syslog for
related messages.

Gordan
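P.S. The drbd.conf side of the fencing setup looks roughly like the
sketch below (DRBD 8.x syntax assumed; the resource name "r0", the
device/disk paths, and the handler script path are placeholders - the
handler should be whatever wrapper you use to invoke the RHCS fence
agent for the peer node, e.g. a script that calls fence_node):

```
# /etc/drbd.conf - fencing-related parts only (sketch, not a full config)
resource r0 {
        protocol C;

        disk {
                # Freeze I/O and call the fence-peer handler when the
                # replication link goes down, instead of silently
                # continuing with a disconnected peer.
                fencing resource-and-stonith;
        }

        handlers {
                # Hand fencing off to the cluster stack. The script
                # name here is an example wrapper that asks RHCS to
                # power off the peer (e.g. via fence_node <peer>).
                outdate-peer "/usr/local/sbin/drbd-fence-peer.sh";
        }

        on node1 {
                device    /dev/drbd0;
                disk      /dev/sda3;
                address   192.168.1.1:7788;
                meta-disk internal;
        }
        on node2 {
                device    /dev/drbd0;
                disk      /dev/sda3;
                address   192.168.1.2:7788;
                meta-disk internal;
        }
}
```

With "resource-and-stonith", writes block until the handler reports
the peer fenced, which is what prevents the two halves from diverging.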
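P.P.S. The failover domain and floating IP mentioned above live in the
rgmanager (<rm>) section of cluster.conf; a minimal sketch, with node
names, service name, and the IP address all made up for illustration:

```
<!-- fragment of /etc/cluster/cluster.conf (rgmanager section) -->
<rm>
        <failoverdomains>
                <!-- ordered="1": use priorities; lower number = preferred.
                     restricted="1": only run on the listed nodes. -->
                <failoverdomain name="prefer_node1" ordered="1" restricted="1">
                        <failoverdomainnode name="node1" priority="1"/>
                        <failoverdomainnode name="node2" priority="2"/>
                </failoverdomain>
        </failoverdomains>
        <!-- Floating IP follows the service to whichever node runs it -->
        <service name="myservice" domain="prefer_node1" autostart="1">
                <ip address="192.168.1.100" monitor_link="1"/>
        </service>
</rm>
```

With ordered="1" and those priorities, the service prefers node1 and
fails over to node2, then migrates back when node1 rejoins.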