> On Wed, Apr 18, 2012 at 07:55:32PM +0200, aluno3 at poczta.onet.pl wrote:
> > Hello
> >
> > We are testing DOPD mechanism and reviewing source of the dopd file
> > (http://hg.linux-ha.org/lha-2.1/file/1d5b54f0a2e0/contrib/drbd-outdate-peer/dopd.c).
>
> Would you please use heartbeat 3
> (and pacemaker, unless you use the haresource mode of heartbeat)
>
> When using pacemaker, use the drbd crm-fence-peer.sh.
> It covers all the cases dopd would cover, and in fact even a couple more
> corner cases in multiple failure scenarios.

We would like to use heartbeat 3 with newer crm but our front end is not
adapted yet...

> > Is it ok that in function check_drbd_peer, during loop, at the
> > beginning is checking status of the node and in case if node is dead
> > then function is finishing with returning FALSE even if node is ping
> > node? Next part of the code checks if node is 'normal' node, but it
> > is too late.
>
> Then I guess we have to fix that.

Maybe fix should look like:

--- ./heartbeat/contrib/drbd-outdate-peer/dopd.c	2008-08-18 14:32:19.000000000 +0200
+++ ./heartbeat-dopdfix/contrib/drbd-outdate-peer/dopd.c	2012-04-18 20:10:41.000000000 +0200
@@ -226,7 +226,7 @@ check_drbd_peer(const char *drbd_peer)
 	}
 	while((node = dopd_cluster_conn->llc_ops->nextnode(dopd_cluster_conn)) != NULL) {
 		const char *status = dopd_cluster_conn->llc_ops->node_status(dopd_cluster_conn, node);
-		if (!strcmp(status, "dead")) {
+		if (!strcmp(status, "dead") && !strcmp("normal", dopd_cluster_conn->llc_ops->node_type(dopd_cluster_conn, node))) {
 			cl_log(LOG_WARNING, "Cluster node: %s: status: %s", node, status);
 			return FALSE;

> > In case when you have:
> > -configured ping node,
> > -timeouts: ping-int 10, deadping 10, deadtime 30
> >
> > and link from replication, ping node down, dopd starts working. Function
> > check_drbd_peer checks if status of the node is dead (ping node is
> > dead, remote/normal node is ok) and if yes, ends with returning
> > FALSE and does not mark remote volumes as outdated with using other
> > auxiliary path. Unfortunately during test such problem occurred.
> >
> > We know that DRBD timeouts have to be lower than heartbeat timeouts, but
> > in case when dopd has to mark a lot of remote resources, it cannot do
> > that in time. It is easy to race.
>
> --
> : Lars Ellenberg
> : LINBIT | Your Way to High Availability
> : DRBD/HA support and consulting http://www.linbit.com
>
> DRBD® and LINBIT® are registered trademarks of LINBIT, Austria.
> __
> please don't Cc me, but send to list -- I'm subscribed
> _______________________________________________
> drbd-user mailing list
> drbd-user at lists.linbit.com
> http://lists.linbit.com/mailman/listinfo/drbd-user
>
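
To illustrate the effect of the proposed one-line change without a running
cluster, here is a minimal stand-alone sketch of the loop. It is not the
dopd source: struct mock_node and both function names are hypothetical, the
node list replaces the heartbeat client API calls, and the peer lookup after
the status check is assumed from the description above ("next part of the
code checks if node is 'normal' node").

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for one node as reported by the cluster layer. */
struct mock_node {
    const char *name;
    const char *type;   /* "normal" or "ping" */
    const char *status; /* e.g. "active" or "dead" */
};

/* Unpatched behaviour as described in the thread: the first dead node of
 * any type, including a ping node, ends the search with false. */
bool peer_reachable_unpatched(const struct mock_node *nodes, size_t n,
                              const char *peer)
{
    assert(nodes != NULL && peer != NULL);
    for (size_t i = 0; i < n; i++) {
        if (strcmp(nodes[i].status, "dead") == 0)
            return false;
        /* Assumed remainder of the loop: look for the peer among normal nodes. */
        if (strcmp(nodes[i].type, "normal") == 0 &&
            strcmp(nodes[i].name, peer) == 0)
            return true;
    }
    return false;
}

/* Patched behaviour: only a dead "normal" node ends the search, so a dead
 * ping node is skipped and the peer can still be found. */
bool peer_reachable_patched(const struct mock_node *nodes, size_t n,
                            const char *peer)
{
    assert(nodes != NULL && peer != NULL);
    for (size_t i = 0; i < n; i++) {
        bool is_normal = strcmp(nodes[i].type, "normal") == 0;
        if (strcmp(nodes[i].status, "dead") == 0 && is_normal)
            return false;
        if (is_normal && strcmp(nodes[i].name, peer) == 0)
            return true;
    }
    return false;
}
```

With a node list in which a dead ping node is listed before a live peer, the
unpatched variant gives up on the ping node while the patched variant goes
on to find the peer. A dead peer is still rejected by both variants.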