[DRBD-user] determine host aliveness on disconnect
David Masover
ninja at slaphack.com
Sun Jun 10 00:02:33 CEST 2007
On Saturday 09 June 2007 12:59:08 Joost van den Broek wrote:
> Hi,
>
> I'm currently building a system with two primary nodes with ocfs2 on top
> of drbd.
[...]
> E.g. the cable on eth1 gets disconnected, both hosts will still be
> available through eth0, thus load-balancing continues to happen. Of
> course, this is very bad behaviour and both hosts will get almost
> inconsistent data immediately.
>
> I would think that there should be some ping check through the other
> interface to ensure the other host has died completely, and if it's
> still reachable, one host should get the inconsistent status (or even
> panic). Or are there other ways to do what I want?
ocfs2 will "fence" a host if it loses its connection to the other host or to
the storage. Earlier versions do this by panicking, and I believe later
versions have the option of rebooting instead. The downside is that both hosts
will probably panic. The upside is that if it's a temporary problem, rebooting
might solve it -- the boot scripts will wait for both hosts to come up.
It looks like drbd itself likes to rely on heartbeat to handle this kind of
situation. There are plenty of options for how to proceed when connectivity
is restored, even to the point of panicking one host, but I don't see any
options for what happens at the disconnect.
It does look like heartbeat could be scripted to do what you want, though,
assuming ocfs2 doesn't just panic everything. I would hardcode one host to
automatically assume it's the primary (and take over), and the other to
automatically die, assuming they can still find each other. On the secondary,
you'd do:
<insert commands to kill -9 apache or whatever. Also may want to kill anything
you find holding the device open (fuser -m).>
ifdown eth0 # (on Debian-like systems.)
umount /dev/drbd0
drbdadm secondary r0
drbdadm invalidate r0
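If you wanted to hand that sequence to heartbeat or cron as one script, a rough sketch might look like the following. The names RESOURCE, DEVICE, PUBLIC_IF, DRY_RUN, run and step_down are made up for illustration and are not part of DRBD or heartbeat. It defaults to only printing the commands, and stopping your services is still left to you.

```shell
#!/bin/sh
# Sketch of the secondary-side "step down" sequence from this post.
# Hypothetical names: RESOURCE, DEVICE, PUBLIC_IF, DRY_RUN, run, step_down.

RESOURCE="${RESOURCE:-r0}"
DEVICE="${DEVICE:-/dev/drbd0}"
PUBLIC_IF="${PUBLIC_IF:-eth0}"
DRY_RUN="${DRY_RUN:-1}"   # 1 = only print what would be run

# Print the command in dry-run mode, otherwise execute it.
run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "would run: $*"
    else
        "$@"
    fi
}

step_down() {
    # Stop apache or whatever before calling this (site-specific).
    # fuser exits non-zero when nothing holds the device open, so its
    # exit status is ignored.
    run fuser -k -m "$DEVICE" || true
    run ifdown "$PUBLIC_IF" || return 1      # Debian-like systems
    run umount "$DEVICE" || return 1
    run drbdadm secondary "$RESOURCE" || return 1
    run drbdadm invalidate "$RESOURCE" || return 1
}
```

Calling step_down with DRY_RUN=1 just lists the five commands in order; set DRY_RUN=0 only once you're happy with what it prints.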
Because you've brought down one interface on purpose, and the other is down
anyway, the primary node should figure out that it's alone, and could be
configured to take over the secondary's IP address. If that does happen, you
may not even have to reconfigure your load balancing, it'll just "load
balance" over the same box.
Then, when connectivity is restored (via eth1), you'd just:
drbdadm primary r0
mount /dev/drbd0
ifup eth0
<insert commands to start apache or whatever>
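A script doing this should probably check that DRBD has actually reconnected before promoting anything. Here is a minimal sketch, assuming your drbdadm has the cstate subcommand and that promoting while the state is Connected or SyncTarget is acceptable for your DRBD version (I haven't verified that). can_rejoin, rejoin, run and the variables are hypothetical names.

```shell
#!/bin/sh
# Sketch of the "come back" sequence, gated on DRBD's connection state.
# Hypothetical names: RESOURCE, DEVICE, PUBLIC_IF, DRY_RUN, run,
# can_rejoin, rejoin.

RESOURCE="${RESOURCE:-r0}"
DEVICE="${DEVICE:-/dev/drbd0}"
PUBLIC_IF="${PUBLIC_IF:-eth0}"
DRY_RUN="${DRY_RUN:-1}"   # 1 = only print what would be run

# Print the command in dry-run mode, otherwise execute it.
run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "would run: $*"
    else
        "$@"
    fi
}

# Succeeds if the given connection state means the peer link is back.
# Assumption: these two states are safe to promote in.
can_rejoin() {
    case "$1" in
        Connected|SyncTarget) return 0 ;;
        *) return 1 ;;
    esac
}

# $1 (optional): a connection state to use instead of asking drbdadm.
rejoin() {
    cstate="${1:-$(drbdadm cstate "$RESOURCE" 2>/dev/null)}"
    if ! can_rejoin "$cstate"; then
        echo "not rejoining: connection state is '${cstate:-unknown}'"
        return 1
    fi
    run drbdadm primary "$RESOURCE" || return 1
    run mount "$DEVICE" || return 1
    run ifup "$PUBLIC_IF" || return 1
    # Start apache or whatever after this (site-specific).
}
```

Passing the state as an argument (for example, rejoin WFConnection) lets you try the logic on a box without DRBD installed.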
Disclaimer: I've never used heartbeat, just a poor-man's hack with ping and
cron. I may have no clue what I'm talking about. However, docs look pretty
thorough over at http://www.linux-ha.org/
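For what it's worth, the ping-and-cron approach could be sketched roughly like this, run on the node that is hardcoded to step down. The addresses are placeholders, the names PEER_REPL_IP, PEER_PUBLIC_IP, peer_answers, decide and check_once are invented for this example, and the ping flags assume the Linux iputils version.

```shell
#!/bin/sh
# Poor-man's check: is the peer gone, or is only the replication link cut?
# Hypothetical names and placeholder addresses throughout.

PEER_REPL_IP="${PEER_REPL_IP:-10.0.0.1}"          # peer's address on eth1
PEER_PUBLIC_IP="${PEER_PUBLIC_IP:-192.168.0.1}"   # peer's address on eth0

# Succeeds if the peer answers one ping within 2 seconds.
peer_answers() {
    ping -c 1 -W 2 "$1" >/dev/null 2>&1
}

# Pure decision logic.
# $1 = replication link (up|down), $2 = public link (up|down).
decide() {
    case "$1:$2" in
        up:*)      echo "ok" ;;          # DRBD can still replicate
        down:up)   echo "step-down" ;;   # peer alive but replication is cut
        down:down) echo "alone" ;;       # peer (or this node) is cut off
        *)         echo "unknown"; return 1 ;;
    esac
}

# Probe both links once and print the resulting decision.
check_once() {
    repl=down
    pub=down
    if peer_answers "$PEER_REPL_IP"; then repl=up; fi
    if peer_answers "$PEER_PUBLIC_IP"; then pub=up; fi
    decide "$repl" "$pub"
}
```

A cron job would call check_once and run the step-down commands only when it prints "step-down". Keeping decide separate from the pings means you can test the logic without unplugging any cables.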