[DRBD-user] DRBD: failover when sync connection dies?

Lars Ellenberg lars.ellenberg at linbit.com
Tue Dec 18 19:28:51 CET 2007



On Tue, Dec 18, 2007 at 11:26:16AM -0500, Greg Haase wrote:
> >> >>>My question is this:
> >> >>>How can i make one node take over all resources if local crossover
> >
> > you don't want to.
> >
> > if your lan connection dies,
> > and your lan connection was your replication link,
> > then you don't have replication anymore,
> > and so you would go online with non-current data.
> >
> > if currently your LAN connection is a direct "crossover cable",
> > why would you think any clients would benefit from failing over?
> >
> > if you change to a switched LAN, and add a ping node,
> > why do you think any clients would benefit from that?
> >
> > how can you be sure what component failed,
> >  local NIC, cables, remote NIC, switch, driver, ...?
> >
> > what problem are you trying to solve?
> >   I mean not "failing over when the LAN link dies".
> >   please zoom out a little.
> >
> > from my point of view, it makes no sense to trigger a failover
> > because the replication link dies. it would even be harmful.
> > so don't do that.
> 
> This speaks really to the question I posted earlier. I agree that you
> wouldn't want to fail-over, but... When your sync connection dies, how do
> you handle it?
> 
> How do you prevent the other node from trying to come up and creating a
> split brain situation?

use drbd's outdate-peer handler and configure dopd.
yes, it has some issues as well, I know. we fixed some of those only
last week. as long as you don't use too many drbd resources, it should
work reliably enough with heartbeat 2.1.2.
make sure you configure a timeout: the default timeout is 60 seconds,
which is longer than several other timeouts and causes cascading timeout
trouble. in short:
        outdate-peer "/usr/lib/heartbeat/drbd-peer-outdater -t 5"; 
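
for context, a sketch of where that line usually lives. this assumes
DRBD 8.0-style syntax; the resource name r0 is hypothetical, and the
dopd path and the hacluster/haclient names are the common heartbeat
defaults, which may differ on your distribution (e.g. /usr/lib64).
check the man pages for your versions before copying it.

```
# /etc/drbd.conf (fragment; "r0" is a placeholder resource name)
resource r0 {
  disk {
    # on loss of the replication link, call the outdate-peer handler
    # instead of carrying on silently
    fencing resource-only;
  }
  handlers {
    outdate-peer "/usr/lib/heartbeat/drbd-peer-outdater -t 5";
  }
}

# /etc/ha.d/ha.cf (fragment; lets heartbeat start dopd and talk to it)
respawn hacluster /usr/lib/heartbeat/dopd
apiauth dopd gid=haclient uid=hacluster
```

with this in place, the node that loses the link asks dopd, over the
remaining heartbeat paths, to mark the peer's data as outdated, so the
peer refuses to become primary on stale data.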


> How do you get alerted that the sync is broken?

nagios pages you?
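
if you have no such check yet, here is a minimal sketch of a
Nagios-style plugin. it assumes the device lines in /proc/drbd look
like " 0: cs:Connected ...", and it treats every connection state
other than Connected as critical, which also fires during a resync;
adjust to taste. the names check() and main() are made up for this
example.

```python
import re

# Standard Nagios plugin exit codes.
OK, CRITICAL, UNKNOWN = 0, 2, 3

# Assumed device line format: " 0: cs:Connected st:... ds:... C r---"
DEVICE_LINE = re.compile(r"^\s*(\d+):\s+cs:(\S+)")


def check(proc_text):
    """Return (exit_code, message) for the given /proc/drbd contents."""
    states = {}
    for line in proc_text.splitlines():
        match = DEVICE_LINE.match(line)
        if match:
            states[int(match.group(1))] = match.group(2)
    if not states:
        return UNKNOWN, "UNKNOWN: no DRBD devices found"
    # Simplification: anything but "Connected" counts as broken.
    broken = sorted((minor, cs) for minor, cs in states.items()
                    if cs != "Connected")
    if broken:
        detail = ", ".join("drbd%d cs:%s" % item for item in broken)
        return CRITICAL, "CRITICAL: " + detail
    return OK, "OK: %d device(s) connected" % len(states)


def main(path="/proc/drbd"):
    """Read the status file, print one status line, return the exit code."""
    try:
        with open(path) as status_file:
            text = status_file.read()
    except OSError as exc:
        print("UNKNOWN: cannot read %s: %s" % (path, exc))
        return UNKNOWN
    code, message = check(text)
    print(message)
    return code

# To use this as a plugin, end the file with:
#     import sys; sys.exit(main())
```

nagios only looks at the exit code and the first line of output, so
check() stays separate from the file access and can be tried against
pasted /proc/drbd text.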

> How do you recover?

fix the replication link.
reconnect drbd, if it does not do so by itself.

-- 
: Lars Ellenberg                           http://www.linbit.com :
: DRBD/HA support and consulting             sales at linbit.com :
: LINBIT Information Technologies GmbH      Tel +43-1-8178292-0  :
: Vivenotgasse 48, A-1120 Vienna/Europe     Fax +43-1-8178292-82 :
__
please use the "List-Reply" function of your email client.


