Hello,

We are testing the DOPD mechanism and reviewing the source of dopd (http://hg.linux-ha.org/lha-2.1/file/1d5b54f0a2e0/contrib/drbd-outdate-peer/dopd.c).

Is it intended that the loop in check_drbd_peer checks the status of each node first and, if the node is dead, returns FALSE immediately, even when that node is a ping node? The next part of the code checks whether the node is a 'normal' node, but by then it is too late.

Consider this setup:
- a configured ping node
- timeouts: ping-int 10, deadping 10, deadtime 30

When the replication link and the ping node both go down, dopd starts working. check_drbd_peer sees a node whose status is dead (the ping node is dead, the remote 'normal' node is fine), returns FALSE, and does not mark the remote volumes as outdated over the other, auxiliary path. Unfortunately, we hit exactly this problem during testing.

We know that the DRBD timeouts have to be lower than the heartbeat timeouts, but when dopd has to mark a lot of remote resources, it cannot do that in time. It is easy to hit this race.
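
To make the ordering issue concrete, here is a minimal, self-contained sketch. It is not the actual dopd.c code: struct node_info, both function names and the node names are hypothetical, and the node walk is reduced to a plain array. It only shows how testing the status before the node type lets a dead ping node end the walk, and how a type-first order would avoid that.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for what heartbeat reports about one node. */
struct node_info {
    const char *name;
    const char *type;   /* "normal" or "ping" */
    const char *status; /* e.g. "active" or "dead" */
};

/* Order as described in this post: status is tested before type,
 * so ANY dead node (including a ping node) ends the walk with false. */
bool peer_reachable_status_first(const struct node_info *nodes, size_t n,
                                 const char *peer)
{
    assert(nodes != NULL && peer != NULL);
    for (size_t i = 0; i < n; i++) {
        if (strcmp(nodes[i].status, "dead") == 0)
            return false; /* also fires for ping nodes */
        if (strcmp(nodes[i].type, "normal") == 0 &&
            strcmp(nodes[i].name, peer) == 0)
            return true;
    }
    return false;
}

/* Assumed alternative: skip every node that is not the 'normal' peer,
 * and only then look at the status of the peer itself. */
bool peer_reachable_type_first(const struct node_info *nodes, size_t n,
                               const char *peer)
{
    assert(nodes != NULL && peer != NULL);
    for (size_t i = 0; i < n; i++) {
        if (strcmp(nodes[i].type, "normal") != 0 ||
            strcmp(nodes[i].name, peer) != 0)
            continue; /* ping nodes and other members are ignored */
        return strcmp(nodes[i].status, "dead") != 0;
    }
    return false; /* peer not found in the node list */
}
```

With a node list in which a dead ping node comes before an active 'normal' peer, the first function reports the peer as unreachable while the second one still finds it. Whether the real node walk returns nodes in that order is an assumption here.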