[DRBD-user] halt after split brain on Red Hat Cluster 5

Chris Harms chris at cmiware.com
Tue Jul 3 17:14:03 CEST 2007


I do have fencing set up via a Dell DRAC card, which uses the system NICs
as its connection.  The first time I tried this test, I thought RHCM was
the culprit, but they assured me that both nodes should not be fenced
when the connection was re-established.  After that, I found the halt
commands in drbd.conf, which fit the symptoms perfectly.

I actually performed this test twice, once with each node.  The first
time it behaved as expected, with the offending node being rebooted by
the cluster.  When testing the second node, both systems halted.  I plan
to do more testing and try to rule out one of the subsystems in this mix.

Sadly, manual fencing is busted in CS 5, at least when it is set up with
Conga.  I haven't found useful info on configuring it by manual editing.
I think Conga omits a nodename attribute in cluster.conf.

Florian, do you have recommendations for drbd.conf settings for after 
split-brain events if RHCS is going to do the fencing?

Many thanks,
Chris

Florian G. Haas wrote:
> Chris,
>
> since you're on RHCM, are you sure it's DRBD that's causing your node lockup? 
> When RHCM loses the connection to the peer node, AFAIK it will assume it's in 
> split brain until it is certain that the peer has been properly fenced. 
> Assuming you don't have fencing in place, now would probably be a good time 
> to implement it. For testing purposes, you may use the "manual" fence device, 
> which you must acknowledge using fence_ack_manual.
>
> I hope this is applicable to your setup. Let us know if it helps. Mind that my 
> suggestions stem from experience gathered using RHEL 4 U4 and GFS, but the 
> scenario you describe sounds all too familiar. :-) 
>
> Cheers,
> Florian
>
> On Tuesday 03 July 2007 03:01:20 Chris Harms wrote:
>   
>> Hi All,
>>
>> I'm having a problem after simulating a network failure (unplugging the
>> cables) and reconnecting.  Upon reconnecting the cables, both nodes get
>> halted by the system and do not log anything.  I have removed the
>> default settings for Split Brain scenarios in drbd.conf and replaced
>> them with what I thought were innocuous commands:
>>
>> Is there an unlisted default setting in DRBD that might issue a halt to
>> the system?  Also, if I want the cluster manager to do fencing, what
>> would be good settings for the after split brain handlers?
>> [...]



