[DRBD-user] drbd peer outdated plugin

Lars Ellenberg lars.ellenberg at linbit.com
Mon Oct 15 20:13:04 CEST 2007


On Mon, Oct 15, 2007 at 04:00:17PM +0200, Matteo Campana wrote:
> Hi all,
> 
> following the example in Florian's post
> (http://fghaas.wordpress.com/2007/10/01/an-underrated-cluster-admins-companion-dopd/),
> I'm testing the outdate-peer plugin.
> 
> My scenario: two Debian machines (OV-HA1 primary, OV-HA2 secondary),
> heartbeat+drbd, 1 ethernet + 1 serial cable (the ethernet is used both for drbd
> replication and to expose services).
> I also know that a dedicated ethernet connection between the two nodes is
> recommended for drbd data synchronization, but for testing purposes this is the
> scenario :).
> Heartbeat is configured with ipfail, so when the ethernet connection goes
> down, heartbeat migrates the services to the other node.
> 
> Obviously, in this configuration the trouble appears when I unplug the OV-HA1
> (primary) link. I'm testing the outdate-peer daemon as described in your post,
> because without this plugin the secondary becomes primary (and this is OK),
> but when I reconnect the ethernet the 2 nodes are "standalone" and do not
> re-synchronize their drbd partitions (this is the case of "drbd split brain").
> Now with your post's configuration:
> 
>   • in OV-HA2's ha-log I see this warning:
>     WARN: check_drbd_peer: drbd peer OV-HA1 was not found;
>   • however, the plugin seems to work, because my OV-HA2 is now outdated;
>   • after the log message above, I see in OV-HA2's ha-log:
>     ResourceManager[6217]:  2007/10/15_14:54:47 ERROR: Return code 20 from /etc/ha.d/resource.d/drbddisk
>     ResourceManager[6217]:  2007/10/15_14:54:47 CRIT: Giving up resources due to failure of drbddisk::ovHA
>   • investigating the syslog, I see that OV-HA2 fails to become primary:
>     Oct 15 14:54:47 localhost kernel: drbd0: State change failed: Refusing to be Primary without at least one UpToDate disk
>     Oct 15 14:54:47 localhost kernel: drbd0:   state = { cs:WFConnection st:Secondary/Unknown ds:Outdated/DUnknown r--- }
>     Oct 15 14:54:47 localhost kernel: drbd0:  wanted = { cs:WFConnection st:Primary/Unknown ds:Outdated/DUnknown r--- }
>     Oct 15 14:54:47 localhost kernel: ttyS0: 1 input overrun(s)
>     Oct 15 14:54:47 localhost ResourceManager[6217]: debug: /etc/ha.d/resource.d/drbddisk ovHA start done. RC=20
>     Oct 15 14:54:47 localhost ResourceManager[6217]: ERROR: Return code 20 from /etc/ha.d/resource.d/drbddisk
>     Oct 15 14:54:47 localhost ResourceManager[6217]: CRIT: Giving up resources due to failure of drbddisk::ovHA
> 
> It is correct that now, in my scenario:
> 
>   • the plugin outdates the secondary when the ethernet fails;
>   • the secondary fails to become primary because it is now marked as
>     "outdated" :)
> 
> 
> Is there a solution?


very specific to exactly your scenario as I understand it:
it is called "suicide".
implementations of that can be found in e.g. OCFS2.
when you lose outside connectivity, your setup implies you have lost
data replication as well,
so you can safely commit suicide.

in the drbd outdate-peer handler,
instead of trying to outdate the peer,
shoot yourself in the head.

you could also try to let heartbeat do the suicide for you,
it already has a few scenarios where it does it (e.g. repeated failed stops).

something like
 "echo 1 > /proc/sys/kernel/sysrq; echo o > /proc/sysrq-trigger;"
should do the trick.
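A minimal sketch of such a handler as a shell script, combining the handler idea with the SysRq one-liner above. Everything beyond those two echo lines is an assumption, not something from this thread: the script name drbd-outdate-or-suicide, the PING_TARGET address (meant to be something outside the cluster, such as the default gateway) and the OUTDATER path for the regular dopd helper are all hypothetical. Also, unlike the plain "shoot yourself" advice, this variant only powers the node off when that outside address is unreachable, and otherwise falls back to normal outdating.

```shell
#!/bin/sh
# Hypothetical DRBD outdate-peer handler: power this node off when it has
# lost outside connectivity, otherwise hand over to the regular dopd helper.
# PING_TARGET, OUTDATER and the script name are assumptions, not from the post.

PING_TARGET=${PING_TARGET:-192.168.1.1}                        # assumed: address outside the cluster
OUTDATER=${OUTDATER:-/usr/lib/heartbeat/drbd-peer-outdater}    # assumed: dopd helper path

# Returns ping's exit status: success means PING_TARGET replied.
# -c 3 sends three probes, -W 2 waits up to 2 seconds per reply (iputils ping).
have_outside_link() {
    ping -c 3 -W 2 "$PING_TARGET" >/dev/null 2>&1
}

# Prints "suicide" when the outside link is gone (with one shared NIC,
# replication is gone too), "outdate" otherwise.
choose_action() {
    if have_outside_link; then
        echo outdate
    else
        echo suicide
    fi
}

# Immediate power-off via magic SysRq, as in the post.
# No sync, no clean unmount: the node simply goes away.
power_off_now() {
    echo 1 > /proc/sys/kernel/sysrq
    echo o > /proc/sysrq-trigger
}

main() {
    case "$(choose_action)" in
        suicide) power_off_now ;;
        outdate) exec "$OUTDATER" "$@" ;;
    esac
}

# Act only when invoked under the (hypothetical) installed name, so the
# functions above can be sourced or tested without side effects.
case "${0##*/}" in
    drbd-outdate-or-suicide) main "$@" ;;
esac
```

If installed under that name and referenced from the outdate-peer handler entry in drbd.conf in place of the dopd helper, a primary that loses its only link would power itself off instead of outdating the secondary, so the secondary should not end up Outdated and heartbeat should be able to promote it. Keep in mind that SysRq "o" powers off at once without syncing or unmounting, and that a few lost pings are a crude signal; treat this as a starting point, not a tested fencing setup.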


but I really recommend fixing the deployment instead.

  :)

-- 
: Lars Ellenberg                           http://www.linbit.com :
: DRBD/HA support and consulting             sales at linbit.com :
: LINBIT Information Technologies GmbH      Tel +43-1-8178292-0  :
: Vivenotgasse 48, A-1120 Vienna/Europe     Fax +43-1-8178292-82 :
__
please use the "List-Reply" function of your email client.


