<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.0 Transitional//EN">
<HTML><HEAD><TITLE></TITLE>
<META http-equiv=Content-Type content="text/html; charset=us-ascii">
<META content="MSHTML 6.00.2900.3059" name=GENERATOR></HEAD>
<BODY><!-- Converted from text/plain format -->
<P><FONT size=2><FONT face="Trebuchet MS">On Tue, Apr 24, 2007 at 12:10:37PM
+0200, Lukasz Engel wrote:<BR>&gt;&gt; I have 2 machines running drbd 0.7.23
(self compiled) with 5 configured<BR>&gt;&gt; drbdX resources (and heartbeat
running above); drbd uses a direct<BR>&gt;&gt; cross-over cable for
synchronization. Kernel 2.6.19.2 (vendor kernel -<BR>&gt;&gt; trustix 3)
UP.<BR>&gt;&gt;<BR>&gt;&gt; Today I disconnected and reconnected the direct cable, and
after that 2 of the 5<BR>&gt;&gt; drbds were failing to reconnect:<BR>&gt;&gt;
drbd0,2,4 successfully connected<BR>&gt;&gt; drbd1 on the secondary was stuck in the
NetworkFailure state (WFConnection on the<BR>&gt;&gt; primary)<BR>&gt;&gt; drbd3 kept
retrying to reconnect but never succeeded (it always went to<BR>&gt;&gt;
BrokenPipe after WFReportParams)<BR><BR>&gt;this should not happen.<BR>&gt;it is
known to happen sometimes anyway.<BR>&gt;it is some sort of race
condition.<BR>&gt;the scheme to avoid it is heavily dependent on
timeouts.</FONT></FONT></P>
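<P><FONT size=2><FONT face="Trebuchet MS">The timeouts in question live in the net section of drbd.conf; a minimal sketch follows (the resource name r1 and the values are illustrative assumptions, not recommendations &mdash; check the drbd.conf man page for your 0.7.x build):</FONT></FONT></P>
<PRE>
resource r1 {
  net {
    # tenths of a second until the peer is considered dead (value illustrative)
    timeout     60;
    # seconds between connection attempts while in WFConnection
    connect-int 10;
    # seconds between keep-alive packets on an otherwise idle link
    ping-int    10;
  }
}
</PRE>
<P><FONT size=2><FONT face="Trebuchet MS">After changing these, something like drbdadm adjust r1 should apply them to a running resource; the down/up workaround quoted below would be drbdadm down r1 followed by drbdadm up r1.</FONT></FONT></P>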
<P><FONT size=2><FONT face="Trebuchet MS">[<FONT color=#008080>Parag] Lars, we
are also facing a similar issue. Can you please explain what kind of race
condition causes this, and which timeouts we need to tune to avoid
this problem? We cannot use the workaround mentioned below, since it requires
unmounting the DRBD partition.<BR></FONT><BR>&gt;&gt; drbdadm down/up for both
failed devices helped<BR><BR>&gt;that is the recommended workaround for
this behaviour.<BR></FONT></FONT></P></BODY></HTML>