[DRBD-user] How to enhance heartbeat in checking failures?

Nicola Ranaldo ranaldo at unina.it
Thu Feb 12 18:07:52 CET 2004


 I installed heartbeat v1.1.5 on the same two HP ProLiant DL380 servers.
 The node names are cl1 and cl2, and cl1 is the active (and preferred) member.

 I powered off the disks on cl1 to simulate a disaster, and after a few seconds I
 got a serious kernel error (of course) about the update process.
 After 2 (!) minutes I got a "low level" disk error, the system locked up, and
 cl2 became the new active member.
 This is OK, but it happened only the first time!
 All the other tests produced the update error but not the "low level" error.
 So cl1 stays up and answers all eth* and serial requests, with no failover!
 This is a simple but critical condition. Another could be a process killed
 by signal 11 (SIGSEGV) due to a RAM problem: the service is down, yet
 everything seems to be fine.

 The question is: how can I check that the server, or better the "services", are
 "really" up and running, and inform heartbeat?
 Is there a good respawn tool?

 Also, can I write a monitor plugin at the application level and plug it into
 heartbeat? (For example, a Perl script checking e-mail, web, and so on.)
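
 To make the idea concrete, here is a rough sketch of the kind of
 application-level check I have in mind. It is written in Python rather than
 Perl, and it is only an illustration under assumptions: the service list, the
 probed directory and all function names are hypothetical, and nothing here is
 part of heartbeat itself. It only tests that each TCP port accepts a
 connection and that a directory on the shared disk can be written and synced.

```python
import os
import socket
import sys
import tempfile

# Hypothetical service list: (name, host, port). Adjust to the real services.
SERVICES = [
    ("smtp", "127.0.0.1", 25),
    ("http", "127.0.0.1", 80),
]

# Hypothetical directory on the disk that the services depend on.
DATA_DIR = "/var/tmp"


def port_is_open(host, port, timeout=5.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def dir_is_writable(path):
    """Return True if a small file can be created, written and synced in path."""
    try:
        with tempfile.NamedTemporaryFile(dir=path) as tmp:
            tmp.write(b"probe")
            tmp.flush()
            os.fsync(tmp.fileno())  # force the write down to the device
        return True
    except OSError:
        return False


def failed_services(services, timeout=5.0):
    """Return the names of services whose port does not accept connections."""
    return [name for name, host, port in services
            if not port_is_open(host, port, timeout)]


def main():
    """Report failures on stderr; return 0 if all checks pass, 1 otherwise."""
    failed = failed_services(SERVICES)
    if not dir_is_writable(DATA_DIR):
        failed.append("disk:" + DATA_DIR)
    for name in failed:
        print("check failed: %s" % name, file=sys.stderr)
    return 1 if failed else 0
```

 Ending the script with sys.exit(main()) would give a wrapper an exit status to
 act on. How that status should be passed to heartbeat is exactly what I am
 asking. Note also that with dead disks the fsync might hang instead of failing,
 so a real check would probably need its own timeout.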

 Sorry for my bad English.

 Thank you

     Nicola Ranaldo




