To test heartbeat's stonith, I have been doing kill -9 on the
heartbeat processes on the active server.  After heartbeat's timeout
period, the standby server uses stonith to cut power to the active
server and immediately tries to bring up its
drbd as primary.  That fails, I presume because its drbd still thinks
the other side is primary.  I don't think heartbeat passes on any drbd
error messages to /var/log/messages, so this is just a guess.  But
failover works if I pull the power plug on the whole active server, or
kill the heartbeat and drbd processes at the same time, so that looks
like the problem.
Wouldn't it make sense for drbd, when told to become primary, to do a
quick check, maybe one or two queries with one-second timeouts, to see
if its peer is still alive and if not then go ahead and become primary?
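
To make the idea concrete, here is a rough sketch of the kind of check I
mean, written in Python purely for illustration.  It is not drbd code and
does not speak drbd's protocol: it only tries a plain TCP connect to the
peer, and the names peer_alive and may_become_primary, along with the
host and port arguments, are made up for this example.

```python
import socket


def peer_alive(host, port, attempts=2, timeout=1.0):
    """Return True if a TCP connect to the peer succeeds.

    Tries up to `attempts` times, each bounded by `timeout` seconds.
    A plain TCP connect is only a stand-in for a real liveness query.
    """
    for _ in range(attempts):
        try:
            with socket.create_connection((host, port), timeout=timeout):
                return True
        except OSError:
            # Refused, unreachable, or timed out: try again.
            continue
    return False


def may_become_primary(host, port):
    """Hypothetical policy: allow promotion only if the peer looks dead."""
    return not peer_alive(host, port)
```

With the defaults this returns quickly when the peer's address actively
refuses connections, and takes at most about two seconds when the peer
is powered off and the connects simply time out.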

- Dave

