On 2007.12.18, at 19:28, Lars Ellenberg wrote:

> On Tue, Dec 18, 2007 at 11:26:16AM -0500, Greg Haase wrote:
>>>>>>> My question is this:
>>>>>>> How can I make one node take over all resources if local
>>>>>>> crossover
>>>
>>> you don't want to.
>>>
>>> if your lan connection dies,
>>> and your lan connection was your replication link,
>>> then you don't have replication anymore,
>>> and so you would go online with non-current data.
>>>
>>> if currently your LAN connection is a direct "crossover cable",
>>> why would you think any clients would benefit from failing over?
>>>
>>> if you change to a switched LAN, and add a ping node,
>>> why do you think any clients would benefit from that?
>>>
>>> how can you be sure what component failed,
>>> local NIC, cables, remote NIC, switch, driver, ...?
>>>
>>> what problem are you trying to solve?
>>> I mean not "failing over when the LAN link dies".
>>> please zoom out a little.
>>>
>>> from my point of view, it makes no sense to trigger a failover
>>> because the replication link dies. it would even be harmful.
>>> so don't do that.
>>
>> This speaks really to the question I posted earlier. I agree that you
>> wouldn't want to fail-over, but... When your sync connection dies,
>> how do you handle it?
>>
>> How do you prevent the other node from trying to come up and
>> creating a split brain situation?
>
> use the drbd-outdate-peer handler and configure dopd.
> yes, it has some issues as well, I know. we fixed some of those only
> last week. as long as you don't use too many drbd, it should work
> reliably enough with heartbeat 2.1.2.
> make sure you configure a timeout (the default timeout is 60 seconds,
> which is longer than several other timeouts and causes cascading
> timeout trouble), in short:
> outdate-peer "/usr/lib/heartbeat/drbd-peer-outdater -t 5";

.... and have downtime for half of your services when you take the
problematic node offline.

>
>
>> How do you get alerted that the sync is broken?
>
> nagios pages you?

Nagios is cool, I use it, but it probably won't help you with a
crossover link. There is nagios-nrpe, though, which with custom
plugins would probably allow you to monitor it. In case you do write
your own plugin for this, forward it to me. ;-)

>
>> How do you recover?
>
> fix the replication link.
> reconnect drbd, if it does not do so by itself.

Take the server offline, and the services that are on it with it.
Fix it. Bring it back. Be quick about it, though. Try to explain to
the customer why the other node can't take over the resources even
though you sold them a fail-over clustered install.

>
> --
> : Lars Ellenberg http://www.linbit.com :
> : DRBD/HA support and consulting sales at linbit.com :
> : LINBIT Information Technologies GmbH Tel +43-1-8178292-0 :
> : Vivenotgasse 48, A-1120 Vienna/Europe Fax +43-1-8178292-82 :
> __
> please use the "List-Reply" function of your email client.
> _______________________________________________
> drbd-user mailing list
> drbd-user at lists.linbit.com
> http://lists.linbit.com/mailman/listinfo/drbd-user
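For the "how do you get alerted" question, here is a minimal sketch of the kind of custom check discussed above. It is written under stated assumptions and is not an existing plugin. It assumes a DRBD 8.x node that reports one line per device with a `cs:` (connection state) field in `/proc/drbd`. It uses the usual Nagios plugin exit codes: 0 OK, 1 WARNING, 2 CRITICAL, 3 UNKNOWN. The function names `check` and `main` and the mapping of connection states to severities are hypothetical choices made for illustration.

```python
import re
import sys

# Conventional Nagios plugin exit codes.
OK, WARNING, CRITICAL, UNKNOWN = 0, 1, 2, 3

# One match per device line, e.g.
#   " 0: cs:Connected st:Primary/Secondary ds:UpToDate/UpToDate C r---"
# Assumes the DRBD 8.x /proc/drbd layout; other versions may differ.
DEVICE_RE = re.compile(r"^\s*(\d+):\s+cs:(\S+)", re.MULTILINE)

# Hypothetical severity policy: adjust to your own setup.
HEALTHY = {"Connected"}
RESYNCING = {"SyncSource", "SyncTarget", "PausedSyncS", "PausedSyncT"}
IGNORED = {"Unconfigured"}  # minors that exist but carry no resource


def check(proc_text):
    """Map /proc/drbd contents to a Nagios (exit_code, message) pair."""
    devices = [(minor, cs) for minor, cs in DEVICE_RE.findall(proc_text)
               if cs not in IGNORED]
    if not devices:
        return UNKNOWN, "DRBD UNKNOWN - no configured devices found"
    broken = [f"drbd{m}={cs}" for m, cs in devices
              if cs not in HEALTHY and cs not in RESYNCING]
    syncing = [f"drbd{m}={cs}" for m, cs in devices if cs in RESYNCING]
    if broken:
        # e.g. WFConnection or StandAlone: the replication link is down.
        return CRITICAL, "DRBD CRITICAL - " + ", ".join(broken)
    if syncing:
        # Link is up but data is still being resynchronized.
        return WARNING, "DRBD WARNING - resyncing: " + ", ".join(syncing)
    return OK, f"DRBD OK - {len(devices)} device(s) Connected"


def main(argv=None):
    """Read /proc/drbd (or a path given as first argument), print, return code."""
    argv = sys.argv if argv is None else argv
    path = argv[1] if len(argv) > 1 else "/proc/drbd"
    try:
        with open(path) as f:
            text = f.read()
    except OSError as exc:
        print(f"DRBD UNKNOWN - cannot read {path}: {exc}")
        return UNKNOWN
    code, message = check(text)
    print(message)
    return code

# Installed as a Nagios/NRPE plugin, the script would end with:
#     if __name__ == "__main__":
#         sys.exit(main())
```

Because it reads the local `/proc/drbd`, each cluster node can only report its own view of the replication link. That is why a check like this would run on the nodes through NRPE rather than on the Nagios server itself. A device stuck in `WFConnection` or `StandAlone` after the crossover link dies shows up as CRITICAL, which is the signal to go and fix the link and reconnect.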