<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.0 Transitional//EN">
<HTML><HEAD><TITLE></TITLE>
<META http-equiv=Content-Type content="text/html; charset=us-ascii">
<META content="MSHTML 6.00.2900.3059" name=GENERATOR></HEAD>
<BODY><!-- Converted from text/plain format -->
<P><FONT size=2><FONT face="Trebuchet MS">On Tue, Apr 24, 2007 at 12:10:37PM
+0200, Lukasz Engel wrote:<BR>&gt;&gt; I have 2 machines running drbd 0.7.23
(self compiled) with 5 configured<BR>&gt;&gt; drbdX resources (and heartbeat
running above); drbd uses a direct<BR>&gt;&gt; cross-over cable for
synchronization. Kernel 2.6.19.2 (vendor kernel -<BR>&gt;&gt; trustix 3)
UP.<BR>&gt;&gt;<BR>&gt;&gt; Today I disconnected and reconnected the direct cable, and
after that 2 of the 5<BR>&gt;&gt; drbds were failing to reconnect:<BR>&gt;&gt;
drbd0,2,4 successfully connected<BR>&gt;&gt; drbd1 on the secondary was stuck in the
NetworkFailure state (WFConnection on the<BR>&gt;&gt; primary)<BR>&gt;&gt; drbd3 kept
retrying to reconnect but never succeeded (it always went to<BR>&gt;&gt;
BrokenPipe after WFReportParams)<BR><BR>&gt;this should not happen.<BR>&gt;it is
known to happen sometimes anyway.<BR>&gt;it is some sort of race
condition.<BR>&gt;the scheme to avoid it is heavily dependent on
timeouts.</FONT></FONT></P>
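<P><FONT size=2><FONT face="Trebuchet MS">The timeouts in question live in the net section of drbd.conf; a minimal sketch follows (the resource name r1 and the values are illustrative assumptions, not recommendations &mdash; check the drbd.conf man page for your 0.7.x build):</FONT></FONT></P>
<PRE>
resource r1 {
  net {
    # tenths of a second until the peer is considered dead (value illustrative)
    timeout     60;
    # seconds between connection attempts while in WFConnection
    connect-int 10;
    # seconds between keep-alive packets on an otherwise idle link
    ping-int    10;
  }
}
</PRE>
<P><FONT size=2><FONT face="Trebuchet MS">After changing these, something like drbdadm adjust r1 should apply them to a running resource; the down/up workaround quoted below would be drbdadm down r1 followed by drbdadm up r1.</FONT></FONT></P>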
<P><FONT size=2><FONT face="Trebuchet MS">[<FONT color=#008080>Parag] Lars, we
are also facing a similar issue. Can you please explain what kind of race
condition causes this, and which timeouts we need to tune to avoid
this problem? We cannot use the workaround mentioned below, since it requires
unmounting the DRBD partition.<BR></FONT><BR>&gt;&gt; drbdadm down/up for both
failed devices helped<BR><BR>&gt;that is the recommended workaround for
this behaviour.<BR></FONT></FONT></P></BODY></HTML>