Note: "permalinks" may not be as permanent as we would like;
direct links to old sources may well be a few messages off.
On Tue, Apr 24, 2007 at 12:10:37PM +0200, Lukasz Engel wrote:
>> I have 2 machines running drbd 0.7.23 (self-compiled) with 5
>> configured drbdX resources (and heartbeat running on top of them);
>> drbd uses a direct cross-over cable for synchronization. Kernel
>> 2.6.19.2 (vendor kernel - trustix 3), UP.
>>
>> Today I disconnected and reconnected the direct cable, and after that
>> 2 of the 5 drbds failed to reconnect:
>> drbd0,2,4 successfully connected
>> drbd1 on the secondary blocked in NetworkFailure state (WFConnection
>> on the primary)
>> drbd3 kept retrying to reconnect but could not succeed (it always
>> went to BrokenPipe after WFReportParams)

> this should not happen.
> it is known to happen sometimes anyway.
> it is some sort of race condition.
> the scheme to avoid it is heavily dependent on timeouts.

[Parag] Lars, we are also facing a similar issue. Can you please explain
what kind of race condition causes this, and which timeouts we need to
tune to avoid the problem? We cannot use the workaround mentioned below,
since it requires unmounting the DRBD partition.

>> drbdadm down/up for both failed devices helped

> that is the recommended workaround for this behaviour.
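Since the thread points at timeout tuning without naming specific settings, here is a minimal sketch of the relevant knobs in the `net` section of drbd.conf for the 0.7 series. The resource name `r1` and all values are illustrative assumptions, not recommendations from this thread; consult the drbd.conf man page for your version before changing them.

```
resource r1 {
  net {
    # timeout is in tenths of a second: how long to wait for a
    # reply from the peer before declaring the link dead.
    timeout     60;   # 6 seconds (illustrative value)

    # connect-int: seconds between connection attempts while in
    # WFConnection; ping-int: seconds between keep-alive packets.
    connect-int 10;   # illustrative value
    ping-int    10;   # illustrative value
  }
}
```

The workaround discussed above would then be `drbdadm down r1 && drbdadm up r1` on the affected resource, which only works if the device is unused (e.g. the filesystem is unmounted) on that node.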