[DRBD-user] Recovery if active heartbeat dies before drbd

Dave Dykstra dwdha at drdykstra.us
Thu May 19 16:14:17 CEST 2005


In order to test heartbeat's stonith, I have been doing kill -9 on the
heartbeat processes on the active server.  After heartbeat's timeout
period, the standby server uses stonith to cut the power to the active
server and immediately tries to bring up its drbd as primary.  That
fails, I presume because its drbd still thinks the other side is
primary.  I don't think heartbeat passes any drbd error messages on to
/var/log/messages, so this is just a guess.  But failover works if I
pull the power plug on the whole active server, or kill the heartbeat
and drbd processes at the same time, so that must be the problem.

Wouldn't it make sense for drbd, when told to become primary, to do a
quick check, maybe one or two queries with one-second timeouts, to see
if its peer is still alive, and if it is not, go ahead and become
primary?
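
The check proposed above could look roughly like the sketch below.  It
is Python for illustration only, not drbd code: the function names
peer_alive and may_become_primary, and the use of a plain TCP connect as
the "query", are assumptions made for the example.  A failed connect
cannot tell a dead peer from a broken network link, so this only makes
sense when stonith has already fenced the other node, as in the test
described here.

```python
import socket


def peer_alive(host, port, attempts=2, timeout=1.0):
    """Return True if a TCP connect to host:port succeeds.

    Tries up to `attempts` times, waiting at most `timeout` seconds
    per try, mirroring "one or two queries with one-second timeouts".
    """
    for _ in range(attempts):
        try:
            # A successful connect means something on the peer answered.
            with socket.create_connection((host, port), timeout=timeout):
                return True
        except OSError:
            # Refused, unreachable, or timed out: try again.
            continue
    return False


def may_become_primary(host, port):
    """Hypothetical policy: promote only if the peer did not answer."""
    return not peer_alive(host, port)
```

With the default arguments the check gives up after about two seconds
in the worst case, which is short compared to a typical heartbeat
timeout.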

- Dave


