I'm new to setting up clustering and DRBD. We're setting up an HA NFS cluster between two nodes, each running CentOS 5.1. I have DRBD working between them, and RHCS is exporting an NFS export backed by GFS. The GFS volume sits on drbd0.

When everything is running, everything is good. However, I simulated a failure by shutting down the node that held the NFS service while a client was writing to it. Here's what happened:

1. Node 1 went down.
2. The NFS client paused during a write.
3. The NFS service came up on Node 2.
4. The NFS client finished its write.
5. The NFS client issued an ls.
6. The NFS client's ls hung; other processes on the client were fine.
7. An ls issued on Node 2 against the GFS filesystem also hung. All other processes on Node 2 were fine.

So what happened is that DRBD hung without affecting any other operation. When Node 1 was brought back online, communication with Node 2 did not resume and a split-brain situation occurred.

Looking into this, I believe the problem is that the DRBD process/service is not properly fenced, or possibly not fenced at all. I see a lot of documentation on fencing DRBD with Heartbeat, but I am already using RHCS for other services that depend on DRBD working. Is it possible to use RHCS to handle the fencing and heartbeat processes? How would I go about configuring it?

Thank you in advance.