[DRBD-user] DOPD problem and Heartbeat

aluno3 at poczta.onet.pl
Wed Apr 18 19:55:32 CEST 2012



Hello

We are testing the DOPD mechanism and reviewing the source of dopd
(http://hg.linux-ha.org/lha-2.1/file/1d5b54f0a2e0/contrib/drbd-outdate-peer/dopd.c).

Is it correct that in the function check_drbd_peer, inside the loop, the
node's status is checked first, and if the node is dead the function
returns FALSE immediately, even when that node is a ping node? The next
part of the code checks whether the node is a 'normal' node, but by then
it is too late.

Consider a setup with:
- a configured ping node,
- timeouts: ping-int 10, deadping 10, deadtime 30.

When both the replication link and the ping node go down, dopd starts
working. check_drbd_peer sees that a node is dead (the ping node is dead,
while the remote 'normal' node is OK), returns FALSE, and therefore does
not mark the remote volumes as outdated over the other, auxiliary path.
Unfortunately, this is exactly the problem we hit during our tests.

We know that the DRBD timeouts have to be lower than the Heartbeat
timeouts, but when dopd has to mark a lot of remote resources as
outdated, it cannot finish in time. This race is easy to hit.


