[DRBD-user] DRBD: failover when sync connection dies?

Martin Gombac martin at isg.si
Wed Dec 19 16:05:04 CET 2007



On 2007.12.18, at 19:28, Lars Ellenberg wrote:

> On Tue, Dec 18, 2007 at 11:26:16AM -0500, Greg Haase wrote:
>>>>>>> My question is this:
>>>>>>> How can I make one node take over all resources if local crossover
>>>
>>> you don't want to.
>>>
>>> if your lan connection dies,
>>> and your lan connection was your replication link,
>>> then you don't have replication anymore,
>>> and so you would go online with non-current data.
>>>
>>> if currently your LAN connection is a direct "crossover cable",
>>> why would you think any clients would benefit from failing over?
>>>
>>> if you change to a switched LAN, and add a ping node,
>>> why do you think any clients would benefit from that?
>>>
>>> how can you be sure what component failed,
>>>  local NIC, cables, remote NIC, switch, driver, ...?
>>>
>>> what problem are you trying to solve?
>>>   I mean not "failing over when the LAN link dies".
>>>   please zoom out a little.
>>>
>>> from my point of view, it makes no sense to trigger a failover
>>> because the replication link dies. it would even be harmful.
>>> so don't do that.
>>
>> This speaks really to the question I posted earlier. I agree that you
>> wouldn't want to fail-over, but... When your sync connection dies, how do
>> you handle it?
>>
>> How do you prevent the other node from trying to come up and create a
>> split-brain situation?
>
> use the drbd-outdate-peer handler and configure dopd.
> yes, it has some issues as well, I know. we fixed some of those only
> last week. as long as you don't use too many drbd, it should work
> reliably enough with heartbeat 2.1.2.
> make sure you configure a timeout (the default timeout is 60 seconds,
> which is longer than several other timeouts and causes cascading timeout
> trouble), in short:
>         outdate-peer "/usr/lib/heartbeat/drbd-peer-outdater -t 5";
... and have downtime for half of your services when you take the
problematic node offline.

>
>
>> How do you get alerted that the sync is broken?
>
> nagios pages you?
Nagios is cool, I use it, but it probably won't help you with a crossover
link. There is nagios-nrpe, though, which with custom plugins would
probably allow you to monitor it. If you do write your own plugin
for this, forward it to me. ;-)
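For illustration, here is a minimal sketch of such a custom NRPE plugin. It is not an existing or official plugin; it assumes a DRBD 8.x-style /proc/drbd where each device line carries a "cs:<ConnectionState>" field, and the grouping of states into OK/WARNING/CRITICAL is my own assumption. Exit codes follow the standard Nagios plugin convention (0 OK, 1 WARNING, 2 CRITICAL, 3 UNKNOWN). A disconnected replication link (StandAlone, WFConnection, ...) shows up as CRITICAL, so Nagios would page you even though nothing fails over.

```python
"""Hypothetical Nagios/NRPE check for the DRBD replication link (a sketch).

Assumes DRBD 8.x /proc/drbd device lines such as:
 0: cs:Connected st:Primary/Secondary ds:UpToDate/UpToDate C r---
"""
import re
import sys

# Standard Nagios plugin exit codes.
OK, WARNING, CRITICAL, UNKNOWN = 0, 1, 2, 3

# Assumed state groupings; adjust for your DRBD version.
HEALTHY = {"Connected"}
SYNCING = {"SyncSource", "SyncTarget", "PausedSyncS", "PausedSyncT"}
IGNORED = {"Unconfigured"}  # unused minors are not an alarm

# Matches "<minor>: cs:<state>" at the start of a device line.
DEVICE_RE = re.compile(r"^\s*(\d+):\s+cs:(\S+)")


def check(proc_text):
    """Return (exit_code, message) for the given /proc/drbd contents."""
    states = {}
    for line in proc_text.splitlines():
        m = DEVICE_RE.match(line)
        if m and m.group(2) not in IGNORED:
            states[int(m.group(1))] = m.group(2)
    if not states:
        return UNKNOWN, "DRBD UNKNOWN - no configured devices found"
    summary = ", ".join(f"drbd{d}={s}" for d, s in sorted(states.items()))
    # Anything neither connected nor resyncing means replication is down.
    if any(s not in HEALTHY | SYNCING for s in states.values()):
        return CRITICAL, f"DRBD CRITICAL - replication down: {summary}"
    if any(s in SYNCING for s in states.values()):
        return WARNING, f"DRBD WARNING - resync in progress: {summary}"
    return OK, f"DRBD OK - {summary}"


def main(path="/proc/drbd"):
    """Print the plugin output line and return the Nagios exit code."""
    try:
        with open(path) as f:
            text = f.read()
    except OSError as e:
        print(f"DRBD UNKNOWN - cannot read {path}: {e}")
        return UNKNOWN
    code, msg = check(text)
    print(msg)
    return code
```

To hook it into NRPE you would run it as a script ending in `sys.exit(main())`, so that NRPE sees the exit code, and define a command for it in nrpe.cfg on each node.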

>
>> How do you recover?
>
> fix the replication link.
> reconnect drbd, if it does not do so by itself.
Take the server offline, along with the services running on it. Fix it.
Bring it back. Be quick about it, though. Then try to explain to the
customer why the other node can't take over the resources even though
you sold them a fail-over clustered install.


>
> -- 
> : Lars Ellenberg                           http://www.linbit.com :
> : DRBD/HA support and consulting             sales at linbit.com :
> : LINBIT Information Technologies GmbH      Tel +43-1-8178292-0  :
> : Vivenotgasse 48, A-1120 Vienna/Europe     Fax +43-1-8178292-82 :
> __
> please use the "List-Reply" function of your email client.



