stonith won't help with what we're trying to test.<br><br>it seems that using both communication channels won't solve this issue either.<br><br>we want heartbeat to fail drbd over if the main (or "public") interface is down.
<br>the machine may still be operational while the network is down, for any number of reasons.<br><br>if we monitor both communication channels as you suggest, it'll never fail over, because the cross-over cable<br>for the drbd data never fails.
<br><br>we currently have heartbeat monitoring the "public" ip of the other server.<br>we can't put the drbd data on the same interface, because that could be too much traffic<br>and would leave us too little bandwidth to do any real work.
<br><br>monitoring the cross-over (drbd) link is equally pointless; drbd handles that quite well itself.<br>and if we stop using heartbeat for drbd, we lose the ability to unmount and remount the partitions when a server fails.
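The question here is really two separate questions. heartbeat's comm channels only tell a node whether its peer is still alive. Ping nodes (suggested in Lars's quoted reply below) tell it which node can still reach the outside network. The Python sketch below is a simplified model of that decision. It is roughly the idea behind heartbeat's ipfail helper, but it is not that code: the function names, the channel names and the single ping node are hypothetical.

```python
from typing import Dict


def peer_alive(channels: Dict[str, bool]) -> bool:
    """The peer counts as alive while heartbeats arrive on ANY comm channel."""
    return any(channels.values())


def decide(channels: Dict[str, bool], active_pings: int, standby_pings: int) -> str:
    """What the standby node should do (simplified, hypothetical model).

    channels:      comm channel name -> heartbeats still arriving from the active node
    active_pings:  number of ping nodes the active node can reach
    standby_pings: number of ping nodes the standby node can reach
    """
    if not peer_alive(channels):
        # No heartbeats on any channel: the peer may be dead, or only cut off.
        # This is the case where stonith is needed before taking over.
        return "takeover-after-stonith"
    if standby_pings > active_pings:
        # The peer is alive but has worse outside connectivity: it should release
        # its resources (stop the ip, unmount, drbdadm secondary) so we can take over.
        return "peer-releases-then-takeover"
    # The peer is alive and at least as well connected: leave things alone.
    return "stay"


if __name__ == "__main__":
    # Public NIC on test1 unplugged, cross-over still up,
    # one ping node (e.g. the upstream router) reachable only from test2.
    print(decide({"public": False, "crossover": True}, active_pings=0, standby_pings=1))
    # Heartbeat on the public link only, same unplugged cable.
    print(decide({"public": False}, active_pings=0, standby_pings=1))
    # Both links healthy, equal connectivity.
    print(decide({"public": True, "crossover": True}, active_pings=1, standby_pings=1))
```

In the first case the cross-over link keeps test1 visibly alive, so test2 does not try to promote drbd behind test1's back. The handover happens only because test1 has lost outside connectivity. In the second case test2 cannot tell a dead test1 from an unreachable one, which is the situation described in the quoted message.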
<br><br>it seems that in this instance heartbeat can only be used for a full server failure (lost power).<br><br>Dan.<br><br><div><span class="gmail_quote">On 6/18/07, <b class="gmail_sendername">Lars Ellenberg</b> <<a href="mailto:lars.ellenberg@linbit.com">
lars.ellenberg@linbit.com</a>> wrote:</span><blockquote class="gmail_quote" style="border-left: 1px solid rgb(204, 204, 204); margin: 0pt 0pt 0pt 0.8ex; padding-left: 1ex;">On Thu, Jun 14, 2007 at 01:54:58PM -0400, Dan Gahlinger wrote:
<br>> Lars,<br>><br>> we found the "fly in the ointment". we know why it fails, but have no idea how to<br>> fix it.<br>> Take this basic setup:<br>><br>> test1 and test2 are running drbd and heartbeat.
<br>> drbd is on a cross-over cable between the two.<br>> heartbeat is on the public interface.<br>> test1 is primary (for the sake of argument)<br>><br><br>for heartbeat to do its heartbeats you should use every communication
<br>channel available. you should definitely use the drbd replication link<br>as a heartbeat comm channel, too.<br><br>> unplug the public ethernet interface from test1.<br>> Nothing changes.<br>><br>> test2 cannot become primary. it is impossible.
<br>> test1 is already primary.<br>> drbd connection is active.<br>><br>> heartbeat attempts to run a drbddisk r0 start on test2<br>> which is physically impossible, because drbd is already running.<br>> test2 never gets the virtual ip resource (though I'm not sure why).
<br>> the debug log says "success" but it doesn't actually do it.<br>> running the command manually for the virtual ip works ok though.<br>><br>> heartbeat would need to do the following for this to work properly:
<br>> 1. don't attempt to start drbd - this will never work<br>> 2. do an unmount of the drbd filesystems on test1<br>> 3. do a drbdadm secondary on test1<br>> 4. then do a drbdadm primary all on test2<br>
><br>> I'm not even sure this is possible.<br><br>there are several options:<br> * stonith<br> when heartbeat detects one box to be dead,<br> it would switch it off using a power switch,<br> just to be sure -- because it might not be dead, after all,
<br> it may be "only" a complete loss of communications...<br><br> * when you have multiple comm channels, and want to trigger a<br> switchover of services when the outside connectivity on the active<br> node breaks, the concept of "ping nodes" or groups thereof helps.
<br><br> choose your ping nodes (and timeouts!) wisely to match your situation<br> (network and outside connectivity). ping nodes should be highly<br> available themselves (choose the upstream router/switch combo,<br>
choose the first hop of the provider network, something like that).<br> otherwise you could get spurious failover/failback.<br><br>--<br>: Lars Ellenberg Tel +43-1-8178292-0 :<br>: LINBIT Information Technologies GmbH Fax +43-1-8178292-82 :
<br>: Vivenotgasse 48, A-1120 Vienna/Europe <a href="http://www.linbit.com">http://www.linbit.com</a> :<br>__<br>please use the "List-Reply" function of your email client.<br>_______________________________________________
<br>drbd-user mailing list<br><a href="mailto:drbd-user@lists.linbit.com">drbd-user@lists.linbit.com</a><br><a href="http://lists.linbit.com/mailman/listinfo/drbd-user">http://lists.linbit.com/mailman/listinfo/drbd-user</a>
<br></blockquote></div><br>
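The four manual steps listed in the quoted message depend on their order. The filesystem must be unmounted before test1 can be demoted, and test2 can only be promoted once test1 is secondary. Below is a minimal Python sketch of that ordering. It assumes a mount point `/data` with a matching `noauto` entry in `/etc/fstab`, and passwordless root ssh between test1 and test2. The helper names `switchover_plan` and `run` are hypothetical. In a typical heartbeat v1 setup the same order comes from listing `drbddisk` before `Filesystem` in haresources, because resources are released in reverse order.

```python
import shlex
import subprocess
from typing import List, Tuple

# One step = (host to run on, command argv).
Step = Tuple[str, List[str]]


def switchover_plan(resource: str, mountpoints: List[str],
                    old: str, new: str) -> List[Step]:
    """Ordered steps to move a DRBD resource from `old` to `new` (hypothetical helper)."""
    steps: List[Step] = []
    # Unmount every filesystem on the DRBD device; a mounted device cannot be demoted.
    for mp in mountpoints:
        steps.append((old, ["umount", mp]))
    # Demote the old primary.
    steps.append((old, ["drbdadm", "secondary", resource]))
    # Promote the new primary; this only succeeds once the peer is secondary.
    steps.append((new, ["drbdadm", "primary", resource]))
    # Remount on the new primary (assumes matching noauto entries in /etc/fstab).
    for mp in mountpoints:
        steps.append((new, ["mount", mp]))
    return steps


def run(steps: List[Step], dry_run: bool = True) -> None:
    """Print (dry run) or execute the steps over ssh, stopping at the first failure."""
    for host, argv in steps:
        cmd = ["ssh", host, shlex.join(argv)]  # assumes passwordless root ssh
        if dry_run:
            print(shlex.join(cmd))
        else:
            # check=True raises CalledProcessError, so a failed umount aborts
            # before anything is demoted or promoted.
            subprocess.run(cmd, check=True)


if __name__ == "__main__":
    run(switchover_plan("r0", ["/data"], old="test1", new="test2"))
```

With `dry_run=True` (the default) nothing is executed. The script only prints the four ssh commands so the order can be reviewed first. `shlex.join` needs Python 3.8 or later.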