[DRBD-user] determine host aliveness on disconnect

Wed Jun 13 12:27:01 CEST 2007

David Masover wrote:
> On Saturday 09 June 2007 12:59:08 Joost van den Broek wrote:
>> Hi,
>>
>> I'm currently building a system with two primary nodes with ocfs2 on top
>> of drbd.
> [...]
>> E.g. the cable on eth1 gets disconnected, both hosts will still be
>> available through eth0, thus load-balancing continues to happen. Of
>> course, this is very bad behaviour and both hosts will get inconsistent
>> data almost immediately.
>>
>> I would think that there should be some ping check through the other
>> interface to ensure the other host has died completely, and if it's
>> still reachable, one host should get the inconsistent status (or even
>> panic). Or are there other ways to do what I want?
> 
> ocfs2 will "fence" a host if it loses its connection to that host or to the 
> storage. It does this in earlier versions by panicking, and I believe later
> versions contain the option of rebooting instead. The downside is, both hosts 
> will probably panic. Upside is, if it's a temporary problem, rebooting might 
> solve it -- the boot scripts will wait for both hosts to come up.
> 
> It looks like drbd itself likes to rely on heartbeat to handle this kind of 
> situation. There are plenty of options for how to proceed when connectivity 
> is restored, even to the point of panicking one host, but I don't see any
> options for what happens at the disconnect.
> 
> It does look like heartbeat could be scripted to do what you want, though, 
> assuming ocfs2 doesn't just panic everything. I would hardcode one host to 
> automatically assume it's the primary (and take over), and the other to 
> automatically die, assuming they can still find each other. On the secondary, 
> you'd do:
> 
> <insert commands to kill -9 apache or whatever. Also may want to kill anything 
> you find holding the device open (fuser -m).>
> ifdown eth0   # (on Debian-like systems.)
> umount /dev/drbd0
> drbdadm secondary r0
> drbdadm invalidate r0
> 
> Because you've brought down one interface on purpose, and the other is down 
> anyway, the primary node should figure out that it's alone, and could be 
> configured to take over the secondary's IP address. If that does happen, you 
> may not even have to reconfigure your load balancing, it'll just "load 
> balance" over the same box.
> 
> Then, when connectivity is restored (via eth1), you'd just:
> 
> drbdadm primary r0
> mount /dev/drbd0
> ifup eth0
> <insert commands to start apache or whatever>
> 
> Disclaimer: I've never used heartbeat, just a poor-man's hack with ping and 
> cron. I may have no clue what I'm talking about. However, docs look pretty 
> thorough over at http://www.linux-ha.org/

Thanks for your response. I know such things can be done with heartbeat,
but I can't figure out how to make it react to just one interface going
down instead of the whole network connection. It's possible to ucast or
mcast on multiple interfaces with heartbeat, but afaik there is no such
thing as "if u/mcast on eth1 does not respond, but u/mcast on eth0 does,
then things need to be done". That's why I was wondering if this
couldn't be handled internally by drbd; that would make things much easier.
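
To make the condition concrete, here is a rough sketch of the kind of
check I mean. It is only an illustration: the function names
(decide_action, probe) and the PEER_ETH0/PEER_ETH1 variables are made up,
and it assumes eth1 carries the drbd traffic while eth0 is the public,
load-balanced link. Nothing like this ships with heartbeat or drbd as far
as I know.

```shell
#!/bin/sh
# Hypothetical sketch, not part of heartbeat or drbd.

# Decide what to do from the peer's reachability on each link.
#   $1 = "up"/"down": peer reachable via eth0 (assumed public link)
#   $2 = "up"/"down": peer reachable via eth1 (assumed drbd link)
decide_action() {
    public=$1
    replication=$2
    if [ "$replication" = "up" ]; then
        # Replication link is fine, nothing to do.
        echo "ok"
    elif [ "$public" = "up" ]; then
        # Peer is alive but replication is cut: one node has to step down.
        echo "demote-one-node"
    else
        # Peer unreachable on both links: treat it as dead.
        echo "peer-dead"
    fi
}

# Send a single ping out of a given interface and report "up" or "down".
#   $1 = interface name, $2 = peer address on that network
probe() {
    if ping -c 1 -W 2 -I "$1" "$2" >/dev/null 2>&1; then
        echo "up"
    else
        echo "down"
    fi
}

# Example call (PEER_ETH0 and PEER_ETH1 are placeholder addresses):
#   decide_action "$(probe eth0 "$PEER_ETH0")" "$(probe eth1 "$PEER_ETH1")"
```

The "demote-one-node" case is where the secondary/invalidate commands
quoted above would be run on whichever host is chosen to step down.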

Joost