<!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN">
<html>
<head>
</head>
<body bgcolor="#ffffff" text="#000000">
Thank you for the answer, Lars, but yesterday I solved it without the
outdate-peer handler, by adding the "after-sb-1pri
discard-secondary;" directive to my drbd.conf.<br>
This is my drbd.conf:<br>
<br>
<i>resource ovHA {<br>
protocol C;<br>
<br>
startup { wfc-timeout 60; degr-wfc-timeout 120; }<br>
disk { on-io-error detach;<br>
}<br>
net {<br>
ko-count 4;<br>
timeout 80; # unit: 0.1 seconds<br>
connect-int 10; # unit: seconds<br>
ping-int 10; # unit: seconds<br>
max-buffers 4096;<br>
max-epoch-size 2048;<br>
after-sb-0pri discard-older-primary;<br>
<b>after-sb-1pri discard-secondary;</b><br>
}<br>
<br>
syncer {<br>
rate 100M;<br>
}<br>
<br>
on OV-HA1 {<br>
device /dev/drbd0;<br>
disk /dev/hda2;<br>
address 192.168.0.58:8000;<br>
meta-disk internal;<br>
}<br>
<br>
on OV-HA2 {<br>
device /dev/drbd0;<br>
disk /dev/hda2;<br>
address 192.168.0.59:8000;<br>
meta-disk internal;<br>
}<br>
}<br>
<br>
</i>This setup is for testing only; in production I will obviously
have two Ethernet links :)<br>
Cheers,<br>
Matteo.<br>
<br>
Lars Ellenberg wrote:
<blockquote cite="mid:20071015181304.GB6091@barkeeper1.linbit"
type="cite">
<pre wrap="">On Mon, Oct 15, 2007 at 04:00:17PM +0200, Matteo Campana wrote:
</pre>
<blockquote type="cite">
<pre wrap="">Hi all,
following the example in Florian's post (<a
class="moz-txt-link-freetext" href="http://fghaas.wordpress.com/2007/10/01/an-underrated-cluster-admins-companion-dopd/">http://fghaas.wordpress.com/2007/10/01/an-underrated-cluster-admins-companion-dopd/</a>),
I'm testing the outdate-peer plugin.
My scenario: two Debian machines (OV-HA1 primary, OV-HA2 secondary),
heartbeat+drbd, 1 ethernet + 1 serial cable (the ethernet is used both for drbd
replication and to expose services).
I know that a dedicated ethernet connection between the two nodes is
recommended for drbd data synchronization, but for testing this is the
scenario :).
Heartbeat is configured with ipfail, so when the ethernet connection goes
down, heartbeat migrates the services to the other node.
Obviously, in this configuration the trouble appears when I unplug the OV-HA1
(primary) link. I'm testing the outdate-peer daemon described in your post
because without this plugin the secondary becomes primary (and this is OK),
but when I reconnect the ethernet the 2 nodes are "standalone" and do not
re-synchronize their drbd partitions (this is the "drbd split brain" case).
Now with your post's configuration:
• in OV-HA2's ha-log I see this warning WARN: check_drbd_peer: drbd peer
OV-HA1 was not found;
• however the plugin seems to work, because my OV-HA2 is now outdated;
• after the log message above, I see in OV-HA2's ha-log:
ResourceManager[6217]: 2007/10/15_14:54:47 ERROR: Return code 20 from /etc/ha.d/resource.d/drbddisk
ResourceManager[6217]: 2007/10/15_14:54:47 CRIT: Giving up resources due to failure of drbddisk::ovHA
• investigating the syslog I see that OV-HA2 fails to become primary
Oct 15 14:54:47 localhost kernel: drbd0: State change failed: Refusing to be Primary without at least one UpToDate disk
Oct 15 14:54:47 localhost kernel: drbd0: state = { cs:WFConnection st:Secondary/Unknown ds:Outdated/DUnknown r--- }
Oct 15 14:54:47 localhost kernel: drbd0: wanted = { cs:WFConnection st:Primary/Unknown ds:Outdated/DUnknown r--- }
Oct 15 14:54:47 localhost kernel: ttyS0: 1 input overrun(s)
Oct 15 14:54:47 localhost ResourceManager[6217]: debug: /etc/ha.d/resource.d/drbddisk ovHA start done. RC=20
Oct 15 14:54:47 localhost ResourceManager[6217]: ERROR: Return code 20 from /etc/ha.d/resource.d/drbddisk
Oct 15 14:54:47 localhost ResourceManager[6217]: CRIT: Giving up resources due to failure of drbddisk::ovHA
So, if I understand correctly, in my scenario now:
• the plugin outdates the secondary when the ethernet fails;
• the secondary fails to become primary because it is now marked as
"outdated" :)
Is there a solution?
</pre>
</blockquote>
<pre wrap=""><!---->
very specific for exactly your scenario as I understand it:
it is called "suicide".
implementations of that can be found in e.g. OCFS2.
when you lose outside connectivity, your setup implies you lost
data-replication as well.
so you can safely commit suicide.
in the drbd outdate peer handler,
instead of trying to outdate the peer,
shoot yourself in the head.
you could also try to let heartbeat do the suicide for you,
it already has a few scenarios where it does it (e.g. repeated failed stops).
something like
"echo 1 > /proc/sys/kernel/sysrq; echo o > /proc/sysrq-trigger;"
should do the trick.
but I really recommend to fix the deployment instead.
:)
</pre>
</blockquote>
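<br>
For the archives, here is a minimal sketch of the "suicide" handler Lars describes above, built on the same two sysrq writes he quotes. The function name power_off_now, the SYSRQ_ENABLE and SYSRQ_TRIGGER override variables, and the --really guard are my own assumptions, added only so the script can be dry-run against scratch files. I have not tested it as a real outdate-peer handler.<br>
<pre wrap="">
```shell
#!/bin/sh
# Sketch of a "suicide" handler: power this node off instead of
# trying to outdate the peer.
# ASSUMPTION: SYSRQ_ENABLE and SYSRQ_TRIGGER are hypothetical overrides
# for dry runs; by default the real /proc files from the one-liner
# quoted above are used.

power_off_now() {
    sysrq_enable="${SYSRQ_ENABLE:-/proc/sys/kernel/sysrq}"
    sysrq_trigger="${SYSRQ_TRIGGER:-/proc/sysrq-trigger}"

    # Enable the magic SysRq interface; give up if that is not possible.
    echo 1 > "$sysrq_enable" || return 1
    # 'o' asks the kernel to power the machine off immediately,
    # without a clean shutdown.
    echo o > "$sysrq_trigger"
}

# ASSUMPTION: the --really guard is hypothetical; it keeps the script
# from powering off a machine when it is run without arguments.
if [ "${1:-}" = "--really" ]; then
    power_off_now
fi
```
</pre>
If this were saved as, say, /usr/local/sbin/drbd-suicide.sh (a hypothetical path), it could be named as the outdate-peer handler in the handlers section of drbd.conf and called with --really. As far as I understand, the handler is only invoked when the disk section also sets a fencing policy such as "fencing resource-only;". Pointing the two variables at scratch files lets you check what would be written without switching the node off.<br>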
<br>
</body>
</html>