<div dir="ltr"><div><font face="Courier New, Courier, monospace">Hi List,<br>

          <br>

        </font></div>

      <font face="Courier New, Courier, monospace">I am using DRBD 8.3.6
        with Linux kernel 2.6.32. In my environment, an iSCSI device (ext3)
        on my secondary serves as the backup device. When I run a test case
        that performs synchronous writes to the partition mounted on the
        primary (ext3), and the network on the iSCSI host goes down at the
        same time, the primary hangs for about 120 seconds.<br>

        <br>

      </font>

    <font face="Courier New, Courier, monospace">Testcase on Primary:</font><br>

    <pre><font face="Courier New, Courier, monospace">&quot;
while true; do date | tee -a /mnt/drbd1/c.dat; echo -n A ; sync ; echo -n B ;
sleep 1 ; echo C ; done
&quot;</font></pre>
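    <font face="Courier New, Courier, monospace">A variant of this loop that times each write-and-sync cycle can make the length of the hang explicit. This is only a minimal sketch: the function name timed_sync_loop, its arguments, and the 5-second default threshold are assumptions for illustration, not part of the original test.</font><br>
    <pre>
```shell
# Sketch only: timed_sync_loop FILE COUNT [STALL_SECS]
# Appends a timestamp to FILE, calls sync, and reports how long each cycle took.
# The name, arguments, and default threshold are illustrative assumptions.
timed_sync_loop() {
    tsl_file=$1
    tsl_count=$2
    tsl_stall=${3:-5}          # cycles at or above this many seconds are flagged
    tsl_i=0
    while [ "$tsl_i" -lt "$tsl_count" ]; do
        tsl_start=$(date +%s)
        date >> "$tsl_file"    # append a timestamp, as in the original test
        sync                   # flush cached writes; the call expected to block during the hang
        tsl_end=$(date +%s)
        tsl_elapsed=$((tsl_end - tsl_start))
        if [ "$tsl_elapsed" -ge "$tsl_stall" ]; then
            echo "STALL ${tsl_elapsed}s"
        else
            echo "ok ${tsl_elapsed}s"
        fi
        sleep 1
        tsl_i=$((tsl_i + 1))
    done
}
```
</pre>
    <font face="Courier New, Courier, monospace">For example, "timed_sync_loop /mnt/drbd1/c.dat 300" would run 300 cycles against the same file as the test above and print a STALL line for any cycle that blocks for 5 seconds or longer.</font><br>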

    <font face="Courier New, Courier, monospace">Initial analysis
      pointed to the ext3 layer, where we observed the hang. The call
      sequence is:<br>
      <br>
      journal_commit_transaction -&gt; wait_for_iobuf -&gt;
      wait_on_buffer &lt; <b>gets stuck here</b> &gt;<br>
      wait_on_buffer -&gt; buffer locked -&gt; wait_on_bit -&gt;
      sync_buffer -&gt; io_schedule<br>

    </font><br>

    <div><font face="Courier New, Courier, monospace">Debugging further,
        we found that ext3 was waiting for a completion callback from the
        DRBD driver:<br>

      </font><br>

      <pre><font face="Courier New, Courier, monospace">submit_bh:
callback for bh = journal_end_buffer_io_sync
</font></pre>
      <pre><font face="Courier New, Courier, monospace">callback for bio = end_bio_bh_io_sync (calls journal_end_buffer_io_sync)
submit_bh -&gt; registers end_bio_bh_io_sync as the bio (buffer I/O) callback -&gt;
submit_bio -&gt; generic_make_request -&gt; __generic_make_request -&gt;
q-&gt;make_request_fn -&gt; the handler registered for DRBD, drbd_make_request_26, is called</font></pre>

      <br>

      <font face="Courier New, Courier, monospace">Debugging further in
        the DRBD and iSCSI drivers, we found that when the network is
        down, the iSCSI layer enters a blocked state for the duration of
        the session recovery timeout, which defaults to 120 seconds. On
        the secondary, the path from &lt;scsi_io_completion&gt; to
        &lt;asender, woken through wake_asender in drbd_endio_write_sec&gt;
        does not run while iSCSI is in the blocked state. As a result, the
        completion callback to the ext3 layer is not invoked on the
        primary, which waits on a wait queue for a P_RECV_ACK from the
        secondary. The complete call trace is attached for reference.<br>

      </font><br>
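      <font face="Courier New, Courier, monospace">For reference, on Linux the iSCSI transport class exposes the session recovery timeout in sysfs as recovery_tmo. The sketch below prints that value for each session. The default path assumes that sysfs layout, and the function name show_recovery_tmo is illustrative only.</font><br>
      <pre>
```shell
# Sketch only: print the recovery timeout of every iSCSI session under a directory.
# The default path assumes the Linux iSCSI transport class sysfs layout;
# the function name is an illustrative assumption.
show_recovery_tmo() {
    srt_base=${1:-/sys/class/iscsi_session}
    for srt_f in "$srt_base"/session*/recovery_tmo; do
        [ -r "$srt_f" ] || continue    # no sessions present, or attribute unreadable
        printf '%s: %s\n' "$(basename "$(dirname "$srt_f")")" "$(cat "$srt_f")"
    done
}
```
</pre>
      <font face="Courier New, Courier, monospace">Running "show_recovery_tmo" with no argument on the secondary would list one line per active session, which makes it easy to confirm whether the value matches the length of the observed hang.</font><br>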

      <font face="Courier New, Courier, monospace">I back-ported a set
        of patches from the 8.3 branch. The major ones are listed below;
        the complete list of back-ported patches is in the attached text
        file.<br>

      </font><br>

      <font face="Courier New, Courier, monospace">All patches listed
        for "drbd: detach from frozen backing device"</font><br>
      <font face="Courier New, Courier, monospace">and<br>
        "drbd: Implemented real timeout checking for request processing
        time"<br>

      </font><font face="Courier New, Courier, monospace"><br>

      </font></div>

    <div><font face="Courier New, Courier, monospace">The issue persists
        with the back-ported patches, so we changed the DRBD driver so
        that, if there is no response from the peer, it tries to trigger a
        timeout and subsequently a state change. I have attached the
        patch for reference. Can anyone please advise whether the attached
        patch is the right way to resolve the issue?<br>

      </font></div>

    <font face="Courier New, Courier, monospace"><br>

        Thanks &amp; Regards,<br>

        Mukunda</font></div>