[DRBD-user] [0.7.23] reconnect problem after link loss

Lars Ellenberg lars.ellenberg at linbit.com
Tue Apr 24 14:21:13 CEST 2007



On Tue, Apr 24, 2007 at 02:14:58PM +0200, Lukasz Engel wrote:
> >
> >On Tue, Apr 24, 2007 at 12:10:37PM +0200, Lukasz Engel wrote:
> >  
> >>I have 2 machines running drbd 0.7.23 (self compiled) with 5 configured
> >>drbdX resources (and heartbeat running on top);
> >>drbd uses a direct cross-over cable for synchronization. Kernel 2.6.19.2
> >>(vendor kernel, Trustix 3), UP.
> >>
> >>Today I disconnected and reconnected the direct cable, and after that 2 of
> >>the 5 drbds failed to reconnect:
> >>drbd0, 2, 4 connected successfully
> >>drbd1 on the secondary was stuck in NetworkFailure state (WFConnection on
> >>the primary)
> >>drbd3 kept retrying to reconnect but never succeeded (it always went to
> >>BrokenPipe after WFReportParams)
> >>    
> >
> >this should not happen.
> >it is known to happen sometimes anyway.
> >it is some sort of race condition.
> >  
> >the scheme to avoid it is heavily dependent on timeouts.
> >  
> Any chance of a fix?
> (If it helps, I should be able to disconnect my drbd link occasionally to
> run some tests...)

I remember similar symptoms from a long time ago,
when we spent a long time debugging this.
We thought we had fixed it.
Now you are seeing the same symptoms again.
It may be a different problem, or our "fix" back then
may only have made it less likely to occur.

Since I cannot reproduce it, I cannot debug it.
If you can track down _why_ it happens, great.
I'm happy to fix it then.
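
For anyone trying to catch this in a repeatable test, here is a minimal sketch of a helper (not part of DRBD; the script and all of its function names are hypothetical) that polls /proc/drbd after the cable is plugged back in and reports every resource whose cs: field has not returned to a "link up" state within a timeout. It assumes the 0.7-style /proc/drbd layout, where each resource line begins with `N: cs:State`, and it assumes Connected, SyncSource and SyncTarget count as "link up"; adjust both if your version differs.

```python
#!/usr/bin/env python3
"""Hypothetical helper: report DRBD resources that did not reconnect.

Assumes the 0.7-style /proc/drbd layout, where each resource line looks like
" 1: cs:NetworkFailure st:Secondary/Unknown ld:Consistent".
"""
import os
import re
import time
from typing import Dict

# Captures the minor number and the connection state (cs: field) of a resource line.
_RESOURCE_LINE = re.compile(r"^\s*(\d+):\s+cs:(\w+)")

# Assumption: these states mean the two peers are talking to each other again.
LINK_UP_STATES = {"Connected", "SyncSource", "SyncTarget"}


def connection_states(proc_text: str) -> Dict[int, str]:
    """Map each DRBD minor number to its current connection state."""
    states: Dict[int, str] = {}
    for line in proc_text.splitlines():
        match = _RESOURCE_LINE.match(line)
        if match:
            states[int(match.group(1))] = match.group(2)
    return states


def stuck_resources(proc_text: str) -> Dict[int, str]:
    """Return only the resources whose state is not in LINK_UP_STATES."""
    return {
        minor: cs
        for minor, cs in connection_states(proc_text).items()
        if cs not in LINK_UP_STATES
    }


def wait_for_reconnect(timeout_s: float = 60.0, poll_s: float = 2.0,
                       path: str = "/proc/drbd") -> Dict[int, str]:
    """Poll `path` until every resource is up or `timeout_s` elapses.

    Returns the resources still not up at the end (empty dict on success).
    """
    deadline = time.monotonic() + timeout_s
    while True:
        with open(path) as f:
            stuck = stuck_resources(f.read())
        if not stuck or time.monotonic() >= deadline:
            return stuck
        time.sleep(poll_s)


if __name__ == "__main__":
    # Run on each node right after plugging the cross-over cable back in.
    if not os.path.exists("/proc/drbd"):
        print("/proc/drbd not found; is the drbd module loaded?")
    else:
        remaining = wait_for_reconnect()
        for minor, cs in sorted(remaining.items()):
            print(f"drbd{minor}: still {cs} after timeout")
        if not remaining:
            print("all resources reconnected")
```

Running it on both nodes is worthwhile, since in the report above the two sides disagree (NetworkFailure on the secondary, WFConnection on the primary); lining up when each node got stuck against the drbd messages in the kernel log may help narrow down the race.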

-- 
: Lars Ellenberg                            Tel +43-1-8178292-0  :
: LINBIT Information Technologies GmbH      Fax +43-1-8178292-82 :
: Vivenotgasse 48, A-1120 Vienna/Europe    http://www.linbit.com :
__
please use the "List-Reply" function of your email client.


