On Tue, Aug 30, 2005 at 11:24:29AM +0200, Lars Ellenberg wrote:
> / 2005-08-29 13:39:20 -0500
> \ Dave Dykstra:
> > (someone asked about why to use stonith if DRBD prevents corruption)
> > Drbd will prevent data corruption on its own, but stonith with drbd can
> > give you increased uptime because there are cases when a standby drbd or
> > heartbeat will refuse to take over until the formerly active one has been
> > proven to be shut down.
>
> which are: ... ?
>
> btw:
> we at LINBIT make sure that heartbeat has as many communication
> channels as possible, but try to avoid stonith in most deployments:
> we had cases where heartbeat would reboot one node, and might have
> stonithed the other at the same event -- not exactly heartbeats fault,
> more "misbehaving resource agents", but still very annoying.
>
> we feel better if we automatise as less as possible,
> though obviously as much as necessary or convenient.
>
> as far as I can see, stonith with drbd does not really buy you anything.

You know better than I do, Lars, about the states that DRBD can get
into, but I know that heartbeat tries very hard to avoid split brain and
doesn't distinguish between whether it's using DRBD or not.

I initially tried to get by without stonith but eventually came to the
conclusion that I needed it because failovers sometimes didn't happen
properly. Come to think of it, that may be because the takeover by
heartbeat fails if heartbeat dies on the active side but DRBD doesn't,
and I had assumed that a stonith would clean that up.

As it turns out, DRBD still won't take over immediately after a stonith,
not until it times out. That continues to be a thorny issue that I've
raised on both mailing lists and do not yet have an answer for.

Alan, can you comment?

- Dave