[Drbd-dev] DRBD8: disconnecting while already disconnecting can hang the receiver

Philipp Reisner philipp.reisner at linbit.com
Tue Nov 27 11:36:12 CET 2007


On Monday 19 November 2007 00:11:36 Montrose, Ernest wrote:
> Hi all,
> There is a problem that manifests itself this way:
>
> Consider 2 nodes A and B,  "A" issues a disconnect to r2, B gets into
> drbd_receiver.c: drbd_disconnect().  While B is disconnecting, it gets a
> "disconnect" request for r2.  This hangs the receiver.
>
> I am thinking that we should just not allow the state transition to
> "disconnecting" if we are already doing so. We could redefine "Standalone"
> to mean less than or equal to "TearDown" in some cases.  I include a patch
> to show this.
>

Hi Ernest,

I tried hard to reproduce and understand this. I tried with various
instrumentation, but I cannot reproduce it.

I assumed that it "hangs" in the drbd_state_lock() function, but
I could not find it, neither by experiment nor by drawing timing diagrams.

Could you provide some logs of this event?

Thanks!

The best I get:

Node1:
[42951592.560000] drbd0: state_locked
[42951592.560000] drbd0: state_unlocked
[42951592.560000] drbd0: peer( Secondary -> Unknown ) conn( Connected -> Disconnecting ) pdsk( UpToDate -> DUnknown )
[42951592.560000] drbd0: state_locked
[42951592.560000] drbd0: state_unlocked
[42951592.560000] drbd0: Writing meta data super block now.
[42951592.560000] drbd0: sock was shut down by peer
[42951592.560000] drbd0: short read expecting header on sock: r=0
[42951592.560000] drbd0: sock_recvmsg returned -104
[42951592.560000] drbd0: asender terminated
[42951592.560000] drbd0: tl_clear()
[42951592.560000] drbd0: Connection closed
[42951592.560000] drbd0: conn( Disconnecting -> StandAlone )
[42951592.560000] drbd0: receiver terminated

Node2:
[42951603.570000] drbd0: state_locked
[42951603.570000] drbd0: peer( Secondary -> Unknown ) conn( Connected -> TearDown ) pdsk( UpToDate -> DUnknown )
[42951603.570000] drbd0: Writing meta data super block now.
[42951603.570000] drbd0: state_unlocked
[42951603.570000] drbd0: conn( TearDown -> Disconnecting )
[42951603.570000] drbd0: asender terminated
[42951603.570000] drbd0: tl_clear()
[42951603.570000] drbd0: Connection closed
[42951603.570000] drbd0: conn( Disconnecting -> StandAlone )
[42951603.570000] drbd0: receiver terminated

Of course the state transition TearDown -> Disconnecting is not right, but
I cannot reproduce a hang of the receiver...
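
To make the check under discussion concrete, here is a minimal sketch in C.
It is not DRBD code: the enum, its ordering, and both helper functions are
hypothetical names made up for this illustration. It only uses the
connection states that show up in the logs above, and it assumes they are
ordered so that "less than or equal to TearDown" means "already going down".

```c
#include <stdbool.h>

/* Hypothetical, simplified connection states. Only the names seen in the
 * logs above are modeled; the numeric ordering is an assumption. */
enum conn_state {
	C_STANDALONE,
	C_DISCONNECTING,
	C_TEARDOWN,
	C_CONNECTED,
};

/* Assumed meaning of "less than or equal to TearDown": the connection is
 * already down or on its way down. */
static inline bool conn_is_going_down(enum conn_state cs)
{
	return cs <= C_TEARDOWN;
}

/* Hypothetical guard: refuse a new transition to Disconnecting while the
 * connection is already going down. All other transitions are left alone,
 * so Disconnecting -> StandAlone still passes. */
static inline bool conn_transition_allowed(enum conn_state from,
					   enum conn_state to)
{
	if (to == C_DISCONNECTING && conn_is_going_down(from))
		return false;
	return true;
}
```

With such a guard, Connected -> Disconnecting is accepted, while a second
disconnect request in Disconnecting, or the TearDown -> Disconnecting step
seen on Node2, would be refused. The sketch does not say which path Node2
should take to StandAlone instead.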

-phil
-- 
: Dipl-Ing Philipp Reisner                      Tel +43-1-8178292-50 :
: LINBIT Information Technologies GmbH          Fax +43-1-8178292-82 :
: Vivenotgasse 48, 1120 Vienna, Austria        http://www.linbit.com :

