[DRBD-user] DOPD problem and Heartbeat

Wed Apr 18 21:00:43 CEST 2012

On Wed, Apr 18, 2012 at 07:55:32PM +0200, aluno3 at poczta.onet.pl wrote:
> Hello
> 
> We are testing the DOPD mechanism and reviewing the source of dopd.c
> (http://hg.linux-ha.org/lha-2.1/file/1d5b54f0a2e0/contrib/drbd-outdate-peer/dopd.c).

Would you please use heartbeat 3
(and pacemaker, unless you use the haresources mode of heartbeat).

When using pacemaker, use the drbd crm-fence-peer.sh.
It covers all the cases dopd would cover, and in fact even a couple more
corner cases in multiple failure scenarios.
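
For reference, a minimal drbd.conf sketch of that setup. The resource
name r0 is only a placeholder, and the handler paths are where the
scripts are usually installed; they may differ on your distribution.

```
resource r0 {
  disk {
    # resource-level fencing; no node-level fencing (STONITH) implied
    fencing resource-only;
  }
  handlers {
    # place a constraint in the CIB when the peer becomes unreachable
    fence-peer "/usr/lib/drbd/crm-fence-peer.sh";
    # remove that constraint again once the resync has finished
    after-resync-target "/usr/lib/drbd/crm-unfence-peer.sh";
  }
}
```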

> 
> In check_drbd_peer, the loop first checks the status of each node,
> and if a node is dead the function returns FALSE, even if that node
> is only a ping node. Is that intended? The check whether the node is
> a 'normal' node comes only afterwards, but by then it is too late.

Then I guess we have to fix that.
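
Roughly, the node walk should look at the node type before it looks at
the node status. The following is only a sketch of that ordering, not
the actual dopd.c code: struct node_info and peer_is_reachable() are
made-up names standing in for what the heartbeat client API reports,
and the type/status strings ("normal", "ping", "dead") are assumed to
match what heartbeat uses.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for what heartbeat reports about one node. */
struct node_info {
    const char *name;
    const char *type;   /* assumed: "normal" or "ping" */
    const char *status; /* assumed: "active", "dead", ... */
};

/*
 * Return true only if `peer` is a normal cluster node that is not dead.
 * Ping nodes are skipped before their status is ever looked at, so a
 * dead ping node cannot end the walk early.
 */
static bool peer_is_reachable(const struct node_info *nodes, size_t n,
                              const char *peer)
{
    assert(nodes != NULL && peer != NULL);

    for (size_t i = 0; i < n; i++) {
        /* Type first: anything that is not a normal node is ignored. */
        if (strcmp(nodes[i].type, "normal") != 0)
            continue;
        /* Only the status of the DRBD peer itself matters. */
        if (strcmp(nodes[i].name, peer) != 0)
            continue;
        return strcmp(nodes[i].status, "dead") != 0;
    }
    return false; /* peer is not a known normal node */
}
```

With a node list of { dead ping node, active peer }, this returns true
for the peer, so dopd would still try to outdate it over the remaining
path.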

> Consider a setup with:
> - a configured ping node,
> - timeouts: ping-int 10, deadping 10, deadtime 30
> 
> When both the replication link and the ping node go down, dopd starts
> working. check_drbd_peer finds a node whose status is dead (the ping
> node is dead, the remote normal node is fine), returns FALSE, and so
> never marks the remote volumes as outdated over the remaining
> auxiliary path. Unfortunately, we hit exactly this problem in testing.
> 
> We know that the DRBD timeouts have to be lower than the heartbeat
> timeouts, but when dopd has to mark a lot of remote resources, it
> cannot do that in time. It is easy to run into a race.

-- 
: Lars Ellenberg
: LINBIT | Your Way to High Availability
: DRBD/HA support and consulting http://www.linbit.com

DRBD® and LINBIT® are registered trademarks of LINBIT, Austria.
__
please don't Cc me, but send to list   --   I'm subscribed