On Wed, Apr 18, 2012 at 09:41:53PM +0200, aluno3 wrote:
> > On Wed, Apr 18, 2012 at 07:55:32PM +0200, aluno3 at poczta.onet.pl wrote:
> > > Hello
> > >
> > > We are testing the DOPD mechanism and reviewing the source of the dopd file
> > > (http://hg.linux-ha.org/lha-2.1/file/1d5b54f0a2e0/contrib/drbd-outdate-peer/dopd.c).
> >
> > Would you please use heartbeat 3
> > (and pacemaker, unless you use the haresource mode of heartbeat)
> >
> > When using pacemaker, use the drbd crm-fence-peer.sh.
> > It covers all the cases dopd would cover, and in fact even a couple more
> > corner cases in multiple failure scenarios.
>
> We would like to use heartbeat 3 with the newer crm, but our front end is not adapted yet...
>
> >
> > >
> > > Is it OK that the loop in function check_drbd_peer first checks the
> > > status of the node and, if the node is dead, returns FALSE even if
> > > that node is a ping node? The next part of the code checks whether
> > > the node is a 'normal' node, but by then it is too late.
> >
> > Then I guess we have to fix that.
>
> Maybe the fix should look like:
>
> --- ./heartbeat/contrib/drbd-outdate-peer/dopd.c 2008-08-18 14:32:19.000000000 +0200
> +++ ./heartbeat-dopdfix/contrib/drbd-outdate-peer/dopd.c 2012-04-18 20:10:41.000000000 +0200
> @@ -226,7 +226,7 @@ check_drbd_peer(const char *drbd_peer)
>      }
>      while((node = dopd_cluster_conn->llc_ops->nextnode(dopd_cluster_conn)) != NULL) {
>          const char *status = dopd_cluster_conn->llc_ops->node_status(dopd_cluster_conn, node);
> -        if (!strcmp(status, "dead")) {
> +        if (!strcmp(status, "dead") && !strcmp("normal", dopd_cluster_conn->llc_ops->node_type(dopd_cluster_conn, node))) {
>              cl_log(LOG_WARNING, "Cluster node: %s: status: %s",
>                  node, status);
>              return FALSE;

I'd say, it should rather look like
(against heartbeat 3 source, so it may or may not directly apply on your tree;
probably best to just copy over all of contrib/drbd-outdate-peer from 3):

diff --git a/contrib/drbd-outdate-peer/dopd.c b/contrib/drbd-outdate-peer/dopd.c
--- a/contrib/drbd-outdate-peer/dopd.c
+++ b/contrib/drbd-outdate-peer/dopd.c
@@ -226,19 +226,26 @@ check_drbd_peer(const char *drbd_peer)
     }
     while((node = dopd_cluster_conn->llc_ops->nextnode(dopd_cluster_conn)) != NULL) {
         const char *status = dopd_cluster_conn->llc_ops->node_status(dopd_cluster_conn, node);
+
+        /* Look for the peer */
+        if (strcasecmp(node, drbd_peer))
+            continue;
+
+        if (strcmp("normal", dopd_cluster_conn->llc_ops->node_type(dopd_cluster_conn, node))) {
+            cl_log(LOG_WARNING, "Cluster node: %s: status: %s is not a normal node",
+                node, status);
+            break;
+        }
+
         if (!strcmp(status, "dead")) {
             cl_log(LOG_WARNING, "Cluster node: %s: status: %s",
                 node, status);
-            return FALSE;
+            break;
         }
 
-        /* Look for the peer */
-        if (!strcmp("normal", dopd_cluster_conn->llc_ops->node_type(dopd_cluster_conn, node))
-                && !strcasecmp(node, drbd_peer)) {
-            cl_log(LOG_DEBUG, "node %s found\n", node);
-            found = TRUE;
-            break;
-        }
+        cl_log(LOG_DEBUG, "node %s found with status %s\n", node, status);
+        found = TRUE;
+        break;
     }
     if (dopd_cluster_conn->llc_ops->end_nodewalk(dopd_cluster_conn) != HA_OK) {
         cl_log(LOG_INFO, "Cannot end node walk");

Not even compile tested, but I think this is what it should look like.

> > > In case when you have:
> > > -configured ping node,
> > > -timeouts: ping-int 10, deadping 10, deadtime 30
> > >
> > > and the replication link and the ping node go down, dopd starts working.
> > > Function check_drbd_peer checks whether the status of a node is dead
> > > (the ping node is dead, the remote/normal node is OK) and, if so,
> > > returns FALSE and does not mark the remote volumes as outdated using
> > > the other auxiliary path. Unfortunately, this problem occurred during
> > > our tests.
> > >
> > > We know that DRBD timeouts have to be lower than heartbeat timeouts, but
> > > when dopd has to mark a lot of remote resources, it cannot do
> > > that in time. It is easy to race.

-- 
: Lars Ellenberg
: LINBIT | Your Way to High Availability
: DRBD/HA support and consulting http://www.linbit.com

DRBD® and LINBIT® are registered trademarks of LINBIT, Austria.
__
please don't Cc me, but send to list   --   I'm subscribed
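
For readers who want to see why the order of the checks matters, here is a
minimal standalone sketch. It does not use the heartbeat client API at all:
struct mock_node, peer_usable_old() and peer_usable_new() are hypothetical
names invented for this illustration, and the node names and the "active"
status string are assumed example values. It only models the decision made
inside the node walk, not logging or end_nodewalk handling.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>
#include <strings.h> /* strcasecmp (POSIX) */

/* Hypothetical stand-in for one entry returned by the node walk. */
struct mock_node {
    const char *name;
    const char *type;   /* "normal" or "ping" */
    const char *status; /* e.g. "active" or "dead" (illustrative values) */
};

/* Old order of checks: ANY dead node seen before the peer aborts the walk,
 * even if that node is only a ping node. */
bool peer_usable_old(const struct mock_node *nodes, size_t n, const char *peer)
{
    assert(nodes != NULL && peer != NULL);
    for (size_t i = 0; i < n; i++) {
        if (!strcmp(nodes[i].status, "dead"))
            return false;
        if (!strcmp(nodes[i].type, "normal") && !strcasecmp(nodes[i].name, peer))
            return true;
    }
    return false;
}

/* Reordered checks: skip every node that is not the peer, then judge
 * only the peer's own type and status. */
bool peer_usable_new(const struct mock_node *nodes, size_t n, const char *peer)
{
    assert(nodes != NULL && peer != NULL);
    for (size_t i = 0; i < n; i++) {
        if (strcasecmp(nodes[i].name, peer))
            continue;           /* not the peer, ignore its status */
        if (strcmp(nodes[i].type, "normal"))
            break;              /* peer name matches a non-normal node */
        if (!strcmp(nodes[i].status, "dead"))
            break;              /* the peer itself is dead */
        return true;            /* peer found and not dead */
    }
    return false;
}
```

With a node table that lists a dead ping node ahead of a live peer, the old
order reports the peer as unusable, while the reordered checks find it. That
is the scenario described in the original report.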