On Wed, Apr 18, 2012 at 07:55:32PM +0200, aluno3 at poczta.onet.pl wrote:
> Hello
>
> We are testing DOPD mechanism and reviewing source of the dopd file
> (http://hg.linux-ha.org/lha-2.1/file/1d5b54f0a2e0/contrib/drbd-outdate-peer/dopd.c).

Would you please use heartbeat 3
(and pacemaker, unless you use the haresources mode of heartbeat).

When using pacemaker, use the drbd crm-fence-peer.sh.
It covers all the cases dopd would cover, and in fact even a couple more
corner cases in multiple failure scenarios.

>
> Is it ok that in function check_drbd_peer, during loop, at the
> beginning is checking status of the node and in case if node is dead
> then function is finishing with returning FALSE even if node is ping
> node? Next part of the code checks if node is 'normal' node, but it
> is too late.

Then I guess we have to fix that.

> In case when you have:
> -configured ping node,
> -timeouts: ping-int 10, deadping 10, deadtime 30
>
> and link from replication, ping node down, dopd starts working. Function
> check_drbd_peer checks if status of the node is dead (ping node is
> dead, remote/normal node is ok) and if yes, ends with returning
> FALSE and does not mark remote volumes as outdated with using other
> auxiliary path. Unfortunately during test such problem occurred.
>
> We know that DRBD timeouts have to be lower than heartbeat timeouts, but
> in case when dopd has to mark a lot of remote resources, it cannot do
> that in time.

It is easy to race.

-- 
: Lars Ellenberg
: LINBIT | Your Way to High Availability
: DRBD/HA support and consulting http://www.linbit.com

DRBD® and LINBIT® are registered trademarks of LINBIT, Austria.
__
please don't Cc me, but send to list -- I'm subscribed
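
For readers following the check_drbd_peer question, here is a minimal Python model of the ordering problem as it is described in the report. It is not the code from dopd.c: the `Node` class, the "normal"/"ping" and "active"/"dead" labels, and both function names are assumptions made up for illustration only.

```python
from dataclasses import dataclass


@dataclass
class Node:
    name: str
    node_type: str  # hypothetical labels: "normal" or "ping"
    status: str     # hypothetical labels: "active" or "dead"


def peer_reachable_status_first(nodes, local_name):
    """Ordering as described in the report: status is tested before type."""
    for node in nodes:
        if node.status == "dead":
            # A dead ping node also ends up here, so the loop gives up early.
            return False
        if node.node_type != "normal" or node.name == local_name:
            continue
        return True
    return False


def peer_reachable_type_first(nodes, local_name):
    """Possible reordering: skip ping nodes and ourselves before testing status."""
    for node in nodes:
        if node.node_type != "normal" or node.name == local_name:
            continue
        return node.status != "dead"
    return False


# Scenario from the report: ping node down, remote normal node still alive.
nodes = [
    Node("gateway", "ping", "dead"),
    Node("alice", "normal", "active"),
    Node("bob", "normal", "active"),
]
```

With this node list and `"alice"` as the local node, the status-first variant returns `False` because it meets the dead ping node first, while the type-first variant skips the ping node and returns `True` for the live peer `"bob"`.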