To test heartbeat's stonith, I have been doing kill -9 on the heartbeat processes on the active server. After heartbeat's timeout period, the standby server uses stonith to cut power to the active server and immediately tries to bring up its drbd as primary. That fails, I presume because its drbd still thinks the other side is primary.

I don't think heartbeat passes any drbd error messages on to /var/log/messages, so this is just a guess. But failover works if I pull the power plug on the whole active server, or kill the heartbeat and drbd processes at the same time, so that must be the problem.

Wouldn't it make sense for drbd, when told to become primary, to do a quick check (maybe one or two queries with one-second timeouts) to see whether its peer is still alive, and if not, go ahead and become primary?

- Dave
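
To make the suggestion concrete, here is a rough sketch in Python of the kind of check I mean. It is not how drbd actually works, and nothing in it is drbd's own API: the function names, the use of a plain TCP connect as the "query", and the host and port arguments are all assumptions made up for illustration.

```python
import socket


def peer_is_alive(host, port, attempts=2, timeout=1.0,
                  connect=socket.create_connection):
    """Return True if any of `attempts` TCP connects to the peer succeeds.

    Hypothetical helper: a plain TCP connect stands in for whatever
    liveness query drbd would really send. `connect` is injectable so the
    logic can be exercised without a network.
    """
    for _ in range(attempts):
        try:
            conn = connect((host, port), timeout=timeout)
        except OSError:
            # Timed out, refused, or unreachable: try the next attempt.
            continue
        conn.close()
        return True
    return False


def may_become_primary(host, port, **kwargs):
    """Allow promotion only when the peer did not answer any query."""
    return not peer_is_alive(host, port, **kwargs)
```

With the defaults this costs at most about two seconds before promotion goes ahead. A peer that has just been powered off by stonith would fail both connects, so may_become_primary() would return True; a peer whose drbd is still running would answer, and promotion would be refused as it is today.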