<!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN">

<html>

<head>

</head>

<body bgcolor="#ffffff" text="#000000">

Thank you for the answer, Lars. Yesterday I solved the problem without the

outdate-peer handler, by adding the "after-sb-1pri discard-secondary;"

directive to my drbd.conf.<br>

This is my drbd.conf:<br>

<br>

<i>resource ovHA {<br>

    protocol      C;<br>

<br>

    startup { wfc-timeout 60; degr-wfc-timeout 120; }<br>

    disk { on-io-error detach;<br>

    }<br>

    net {<br>

    ko-count 4;<br>

    timeout     80;    # unit: 0.1 seconds<br>

    connect-int  10;    # unit: seconds<br>

    ping-int     10;    # unit: seconds<br>

    max-buffers 4096;<br>

    max-epoch-size 2048;<br>

    after-sb-0pri discard-older-primary;<br>

    <b>after-sb-1pri discard-secondary;</b><br>

    }<br>

<br>

    syncer {<br>

    rate 100M;<br>

      }<br>

<br>

     on OV-HA1 {<br>

        device      /dev/drbd0;<br>

        disk        /dev/hda2;<br>

        address     192.168.0.58:8000;<br>

        meta-disk   internal;<br>

        }<br>

<br>

      on OV-HA2 {<br>

          device      /dev/drbd0;<br>

        disk        /dev/hda2;<br>

        address     192.168.0.59:8000;<br>

        meta-disk   internal;<br>

        }<br>

    }<br>

<br>

</i>This setup is only for testing; in production I will obviously

have two Ethernet links :)<br>
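<br>
As a rough summary of the split-brain policies involved, here is a sketch of the relevant part of the net section. The comments reflect my understanding of the options, and the after-sb-2pri line is an assumption that is not in my configuration (as far as I know, "disconnect" is the default):<br>
<pre wrap="">net {
    # no node is Primary when the split brain is detected:
    # resync from the node that became Primary last
    after-sb-0pri discard-older-primary;

    # exactly one node is Primary: discard the Secondary's changes
    after-sb-1pri discard-secondary;

    # both nodes are Primary: no automatic recovery, just disconnect
    # (assumption, shown only for completeness)
    after-sb-2pri disconnect;
}
</pre>
In my test, heartbeat has already moved the services to OV-HA2 by the time the link comes back, so OV-HA1 should be the Secondary whose changes get discarded.<br>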

Cheers,<br>

Matteo.<br>

<br>

Lars Ellenberg wrote:

<blockquote cite="mid:20071015181304.GB6091@barkeeper1.linbit"

 type="cite">

  <pre wrap="">On Mon, Oct 15, 2007 at 04:00:17PM +0200, Matteo Campana wrote:

  </pre>

  <blockquote type="cite">

    <pre wrap="">Hi all,

following the example in Florian's post (<a

 class="moz-txt-link-freetext" href="http://fghaas.wordpress.com/2007/10/01/an-underrated-cluster-admins-companion-dopd/">http://fghaas.wordpress.com/2007/10/01/an-underrated-cluster-admins-companion-dopd/</a>),

I'm testing the outdate-peer plugin.

My scenario: two Debian machines (OV-HA1 primary, OV-HA2 secondary),

heartbeat+drbd, 1 ethernet + 1 serial cable (the ethernet link is used both for

drbd replication and to expose services).

I know that a dedicated ethernet connection between the two nodes is

recommended for drbd data synchronization, but this is only a test

scenario :).

Heartbeat is configured with ipfail, so when the ethernet connection goes

down, heartbeat migrates the services to the other node.

Obviously, in this configuration the trouble appears when I unplug the OV-HA1

(primary) link. I'm testing the outdate-peer daemon described in your post

because without this plugin the secondary becomes primary (and this is OK),

but when I reconnect the ethernet the 2 nodes are "standalone" and do not

resynchronize their drbd partitions (this is the "drbd split brain" case).

Now with your post's configuration:

  &#8226; in OV-HA2's ha-log I see this warning: WARN: check_drbd_peer: drbd peer

    OV-HA1 was not found;

  &#8226; however the plugin seems to work, because my OV-HA2 is now outdated;

  &#8226; after the log message above, I see in OV-HA2's ha-log:

    ResourceManager[6217]:  2007/10/15_14:54:47 ERROR: Return code 20 from /etc/ha.d/resource.d/drbddisk

    ResourceManager[6217]:  2007/10/15_14:54:47 CRIT: Giving up resources due to failure of drbddisk::ovHA

  &#8226; investigating the syslog I see that OV-HA2 fails to become primary:

    Oct 15 14:54:47 localhost kernel: drbd0: State change failed: Refusing to be Primary without at least one UpToDate disk

    Oct 15 14:54:47 localhost kernel: drbd0:   state = { cs:WFConnection st:Secondary/Unknown ds:Outdated/DUnknown r--- }

    Oct 15 14:54:47 localhost kernel: drbd0:  wanted = { cs:WFConnection st:Primary/Unknown ds:Outdated/DUnknown r--- }

    Oct 15 14:54:47 localhost kernel: ttyS0: 1 input overrun(s)

    Oct 15 14:54:47 localhost ResourceManager[6217]: debug: /etc/ha.d/resource.d/drbddisk ovHA start done. RC=20

    Oct 15 14:54:47 localhost ResourceManager[6217]: ERROR: Return code 20 from /etc/ha.d/resource.d/drbddisk

    Oct 15 14:54:47 localhost ResourceManager[6217]: CRIT: Giving up resources due to failure of drbddisk::ovHA

Is it correct that, in my scenario:

  &#8226; the plugin outdates the secondary when the ethernet link fails;

  &#8226; the secondary then fails to become primary because it is now marked as

    "outdated"? :)

Is there a solution?

    </pre>

  </blockquote>

  <pre wrap=""><!---->

very specific for exactly your scenario as I understand it:

it is called "suicide".

implementations of that can be found in e.g. OCFS2.

when you lose outside connectivity, your setup implies you lost

data-replication as well.

so you can safely commit suicide.

in the drbd outdate peer handler,

instead of trying to outdate the peer,

shoot yourself in the head.

you could also try to let heartbeat do the suicide for you,

it already has a few scenarios where it does it (e.g. repeated failed stops).

something like

 "echo 1 &gt; /proc/sys/kernel/sysrq; echo o &gt; /proc/sysrq-trigger;"

should do the trick.

but I really recommend fixing the deployment instead.

  :)

  </pre>

</blockquote>
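<br>
A minimal sketch of how the suggestion above might be wired into drbd.conf, assuming DRBD 8.0-style syntax. The <tt>fencing</tt> option and the <tt>handlers</tt> section are not part of my configuration above, and I have not tested this:<br>
<pre wrap="">resource ovHA {
    disk {
        on-io-error detach;
        # assumption: with this option DRBD calls the outdate-peer handler
        fencing     resource-only;
    }
    handlers {
        # assumption: power the node off instead of trying to outdate the peer
        outdate-peer "echo 1 &gt; /proc/sys/kernel/sysrq; echo o &gt; /proc/sysrq-trigger";
    }
}
</pre>
Note that DRBD would presumably also call this handler on the surviving node when it is promoted while disconnected, so a real handler would need to check outside connectivity before powering off.<br>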


</body>

</html>