[DRBD-user] determine host aliveness on disconnect

Sun Jun 10 00:02:33 CEST 2007

On Saturday 09 June 2007 12:59:08 Joost van den Broek wrote:
> Hi,
>
> I'm currently building a system with two primary nodes with ocfs2 on top
> of drbd.
[...]
> E.g. the cable on eth1 gets disconnected, both hosts will still be
> available through eth0, thus load-balancing continues to happen. Of
> course, this is very bad behaviour and both hosts will get
> inconsistent data almost immediately.
>
> I would think that there should be some ping check through the other
> interface to ensure the other host has died completely, and if it's
> still reachable, one host should get the inconsistent status (or even
> panic). Or are there other ways to do what I want?

ocfs2 will "fence" a host if it loses its connection to the other host or to
the storage. Earlier versions do this by panicking, and I believe later
versions contain the option of rebooting instead. The downside is that both
hosts will probably panic. The upside is that, if it's a temporary problem,
rebooting might solve it -- the boot scripts will wait for both hosts to come
up.

It looks like drbd itself likes to rely on heartbeat to handle this kind of
situation. There are plenty of options for how to proceed when connectivity
is restored, even to the point of panicking one host, but I don't see any
options for what happens at the moment of disconnect.

It does look like heartbeat could be scripted to do what you want, though, 
assuming ocfs2 doesn't just panic everything. I would hardcode one host to 
automatically assume it's the primary (and take over), and the other to 
automatically die, assuming they can still find each other. On the secondary, 
you'd do:

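# <insert commands to kill -9 apache or whatever. You may also want to kill
# anything you find holding the device open (fuser -m).>
ifdown eth0            # take the public interface down (Debian-like systems)
umount /dev/drbd0      # release the filesystem on the drbd device
drbdadm secondary r0   # demote this node
drbdadm invalidate r0  # mark local data out of sync so it resyncs from the peer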

Because you've brought down one interface on purpose, and the other is down
anyway, the primary node should figure out that it's alone, and could be
configured to take over the secondary's IP address. If that does happen, you
may not even have to reconfigure your load balancing; it'll just "load
balance" over the same box.

Then, when connectivity is restored (via eth1), you'd just:

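drbdadm primary r0     # promote this node again
mount /dev/drbd0       # assumes an fstab entry supplies the mount point
ifup eth0              # bring the public interface back up
# <insert commands to start apache or whatever>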
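
A minimal sketch of the ping check that could decide when to run the two
command sequences above, in the spirit of a ping-and-cron hack rather than a
real heartbeat setup. Everything named here is an assumption made for
illustration: the two peer addresses, the ROLE variable, and the function
names are hypothetical and are not part of heartbeat, drbd, or ocfs2. The
ping flags are those of Linux iputils ping.

```shell
#!/bin/sh
# Sketch only: addresses, ROLE, and function names are assumptions.
PEER_REPL=${PEER_REPL:-10.0.0.2}         # assumed peer address on eth1 (drbd link)
PEER_PUBLIC=${PEER_PUBLIC:-192.168.0.2}  # assumed peer address on eth0
ROLE=${ROLE:-secondary}                  # hardcoded per host: primary or secondary

# Print "up" if the address answers one ping within 2 seconds, else "down".
link_state() {
    if ping -c 1 -W 2 "$1" >/dev/null 2>&1; then
        echo up
    else
        echo down
    fi
}

# decide REPL_STATE PUBLIC_STATE ROLE: print the action for this host.
decide() {
    repl=$1
    public=$2
    role=$3
    if [ "$repl" = up ]; then
        echo none             # replication link is fine, nothing to do
    elif [ "$public" = down ]; then
        echo continue-alone   # peer unreachable on both links
    elif [ "$role" = secondary ]; then
        echo step-down        # peer alive, drbd link lost: this host gives up
    else
        echo stay-primary     # peer alive, drbd link lost: this host keeps serving
    fi
}

# Meant to be called from cron or a similar hook; only prints the decision.
check_once() {
    decide "$(link_state "$PEER_REPL")" "$(link_state "$PEER_PUBLIC")" "$ROLE"
}
```

A wrapper would run the step-down commands when check_once prints
"step-down", remember that it did so, and run the recovery commands once the
replication address answers again. The "continue-alone" case is the risky
one: a host that has lost both links cannot tell a dead peer from its own
isolation.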

Disclaimer: I've never used heartbeat, just a poor-man's hack with ping and
cron. I may have no clue what I'm talking about. However, the docs look
pretty thorough over at http://www.linux-ha.org/