[Drbd-dev] DRBD8: disconnecting while already disconnecting can hang the receiver

Montrose, Ernest Ernest.Montrose at stratus.com
Tue Nov 27 16:06:25 CET 2007


Phil,
Interesting... With a delay at the end of drbd_disconnect() it happens
every time for me.  I delay for 30 seconds and quickly issue the
disconnect within that window.
I added this at the very end of drbd_disconnect():
 if (os.conn == TearDown && ns.conn == Unconnected && mdev->minor == 11)
 {
  INFO("drbd_disconnect: ##5# EM-- Done but waiting 30 seconds######\n");
  set_current_state(TASK_INTERRUPTIBLE);
  schedule_timeout(HZ * 30);
  INFO("drbd_disconnect: ##5# EM-- Done ##### waiting 30 seconds######\n");
 }

Notice the mdev->minor == 11 check; you can change the 11 to the minor
of whichever device you are doing the disconnect on.  Once you see the
message "done waiting" then you'd issue the local disconnect.  Put the
instrumented driver on one side (the side that will do the last
disconnect).

BTW, I agree that your spin on the patch is less intrusive.  I will test
that and let you know.

EM--

-----Original Message-----
From: Philipp Reisner [mailto:philipp.reisner at linbit.com] 
Sent: Tuesday, November 27, 2007 9:53 AM
To: drbd-dev at linbit.com
Cc: Montrose, Ernest
Subject: Re: [Drbd-dev] DRBD8: disconnecting while already disconnecting
can hang the receiver

On Tuesday 27 November 2007 14:06:46 Montrose, Ernest wrote:
> Phil,
> I looked at my notes... To reproduce this you can fake the condition
> this way:
> * Issue a disconnect on node0 for r5.
> * Locally on node1 we will get into drbd_receiver.c:drbd_disconnect()
> and while there in drbd_disconnect() (Put a small delay there or
> something); issue a "drbdsetup /dev/drbd5 disconnect".
>
> This last drbdsetup will time out with "No response from the DRBD
> driver! Is the module loaded?"
> But the driver will be waiting forever in
> drbd_nl.c:drbd_nl_disconnect().
>

Yes. This is what I tested. I had a delay in drbd_disconnect().
I did not manage to get it into trouble.

BTW, while looking at the patch, I would have done it like this:

@@ -589,7 +589,8 @@ STATIC int is_valid_state_transition(drbd_dev* mdev,drbd_state_t ns,drbd_state_t
        if( (ns.conn == StartingSyncT || ns.conn == StartingSyncS ) &&
            os.conn > Connected) rv=SS_ResyncRunning;

-       if( ns.conn == Disconnecting && os.conn == StandAlone)
+       if ( ns.conn == Disconnecting &&
+            ( os.conn == StandAlone || os.conn == TearDown ) )
                rv=SS_AlreadyStandAlone;

        if( ns.disk > Attaching && os.disk == Diskless)
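
To make the effect of that one-line change concrete, here is a minimal
userspace sketch of just this check, before and after. It is not the
driver code: the state names are the ones used in this thread, but the
trimmed-down enums, their numeric order, and the helper names check_old()
and check_new() are assumptions made for illustration only.

```c
/* Userspace sketch only. The enums are a reduced stand-in for DRBD's
 * state definitions; their values and order are assumptions. */
enum conn_state { StandAlone, Disconnecting, Unconnected, TearDown, Connected };
enum set_state_rv { SS_Success, SS_AlreadyStandAlone };

/* Hypothetical helper: the check as it was before the change.
 * Only a StandAlone -> Disconnecting request is refused. */
static enum set_state_rv check_old(enum conn_state os, enum conn_state ns)
{
	if (ns == Disconnecting && os == StandAlone)
		return SS_AlreadyStandAlone;
	return SS_Success;
}

/* Hypothetical helper: the check with the proposed change.
 * A Disconnecting request is also refused while the old state is
 * TearDown, i.e. while a disconnect is already in progress. */
static enum set_state_rv check_new(enum conn_state os, enum conn_state ns)
{
	if (ns == Disconnecting &&
	    (os == StandAlone || os == TearDown))
		return SS_AlreadyStandAlone;
	return SS_Success;
}
```

With these stand-ins, check_old(TearDown, Disconnecting) lets the request
through, while check_new(TearDown, Disconnecting) returns
SS_AlreadyStandAlone, so a second disconnect issued during teardown would
presumably be rejected up front instead of being left to wait.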

-Phil
-- 
: Dipl-Ing Philipp Reisner                      Tel +43-1-8178292-50 :
: LINBIT Information Technologies GmbH          Fax +43-1-8178292-82 :
: Vivenotgasse 48, 1120 Vienna, Austria        http://www.linbit.com :

