On Tue, Apr 24, 2007 at 02:14:58PM +0200, Lukasz Engel wrote:
>
> >On Tue, Apr 24, 2007 at 12:10:37PM +0200, Lukasz Engel wrote:
> >
> >>I have 2 machines running drbd 0.7.23 (self compiled) with 5 configured
> >>drbdX resources (and heartbeat running above),
> >>drbd uses a direct cross-over cable for synchronization. Kernel 2.6.19.2
> >>(vendor kernel - trustix 3) UP.
> >>
> >>Today I disconnected and reconnected the direct cable, and after that 2 of 5
> >>drbds were failing to reconnect:
> >>drbd0,2,4 successfully connected
> >>drbd1 on secondary blocked in NetworkFailure state (WFConnection on
> >>primary)
> >>drbd3 was retrying to reconnect, but could not succeed (always went to
> >>BrokenPipe after WFReportParams)
> >>
> >
> >this should not happen.
> >it is known to happen sometimes anyway.
> >it is some sort of race condition.
> >
> >the scheme to avoid it is heavily dependent on timeouts.
> >
>
> Any chance of a fix?
> (If it helps, I should be able to disconnect my drbd link sometimes to
> run some tests...)

I remember similar symptoms from a long time ago, when we spent a long
time debugging this. We thought we had fixed it. Now you see the same
symptoms again. It may be a different problem, or our "fix" back then may
only have made it less likely to occur.

Since I cannot reproduce it, I cannot debug it.
If you can track down _why_ it happens, great.
I'm happy to fix it then.

-- 
: Lars Ellenberg                            Tel +43-1-8178292-0  :
: LINBIT Information Technologies GmbH      Fax +43-1-8178292-82 :
: Vivenotgasse 48, A-1120 Vienna/Europe    http://www.linbit.com :
__
please use the "List-Reply" function of your email client.
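The reporter offers to pull the replication link now and then to help reproduce the race. A minimal sketch for checking the outcome of such a test run follows. It assumes the DRBD 0.7 style of /proc/drbd status lines ("<minor>: cs:<state> st:... ld:...") and relies only on the leading "<minor>: cs:<state>" part. After the cable is reconnected and the resources have had time to settle, it lists every resource whose connection state is not Connected. The function names are hypothetical, and the sample text is illustrative, not taken from the reporter's machines.

```python
import re
from typing import Dict

# Assumption: DRBD 0.7 style /proc/drbd lines, e.g.
#   " 1: cs:NetworkFailure st:Secondary/Unknown ld:Consistent"
# Only the leading "<minor>: cs:<state>" part is parsed; other fields are ignored.
_RESOURCE_LINE = re.compile(r"^\s*(\d+):\s+cs:(\S+)")


def connection_states(proc_text: str) -> Dict[int, str]:
    """Map each DRBD minor number to its connection state (the cs: field)."""
    states: Dict[int, str] = {}
    for line in proc_text.splitlines():
        match = _RESOURCE_LINE.match(line)
        if match:  # header lines such as "version: ..." do not match
            states[int(match.group(1))] = match.group(2)
    return states


def not_connected(proc_text: str) -> Dict[int, str]:
    """Return only the resources whose connection state is not 'Connected'."""
    return {
        minor: state
        for minor, state in connection_states(proc_text).items()
        if state != "Connected"
    }


if __name__ == "__main__":
    # Hypothetical sample shaped like the failure described in the thread.
    sample = (
        "version: 0.7.23\n"
        " 0: cs:Connected st:Primary/Secondary ld:Consistent\n"
        " 1: cs:NetworkFailure st:Secondary/Unknown ld:Consistent\n"
        " 3: cs:BrokenPipe st:Secondary/Unknown ld:Consistent\n"
    )
    print(not_connected(sample))  # {1: 'NetworkFailure', 3: 'BrokenPipe'}
```

Transient states such as WFReportParams are reported too, so a single check right after plugging the cable back in proves little. A resource that stays in NetworkFailure, or keeps cycling through BrokenPipe across several checks spaced some seconds apart, would match the case described above.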