<div dir="ltr"><div>Dear Philipp,<br></div><br>Please see my previous question, CASE-14 (&quot;[DRBD-user] [CASE-14] primary node hang by VM-net-disconnect during big file copy&quot;).<div>As that case describes, DRBD on Linux may deadlock.<div><div>On the Windows side, on the other hand, there is no deadlock, but the transfer_log list is sometimes corrupted in the _tl_restart function.</div><div><br></div><div>So we are trying to modify the source code as follows:</div><div><div><br></div><div>1. Modifications</div><div><br></div>1) in drbd_send_and_submit()</div><div><br></div><div><div><span style="white-space:pre-wrap">        </span>if (likely(req-&gt;i.size != 0)) {</div><div><span style="white-space:pre-wrap">                </span>if (rw == WRITE) {</div><div><span style="white-space:pre-wrap">                        </span>struct drbd_request *req2;</div><div><span style="white-space:pre-wrap">                        </span>resource-&gt;current_tle_writes++;</div><div>#if 0 // WIN32 ### ignore tail_recursion ###</div><div><span style="white-space:pre-wrap">                        </span>list_for_each_entry_reverse(req2, &amp;resource-&gt;transfer_log, tl_requests) {</div><div><span style="white-space:pre-wrap">                                </span>if (req2-&gt;rq_state[0] &amp; RQ_WRITE) {</div><div><span style="white-space:pre-wrap">                                        </span>/* Make the new write request depend on</div><div><span style="white-space:pre-wrap">                                        </span> * the previous one. 
*/</div><div><span style="white-space:pre-wrap">                                        </span>kref_get(&amp;req-&gt;kref);</div><div><span style="white-space:pre-wrap">                                        </span>break;</div><div><span style="white-space:pre-wrap">                                </span>}</div><div><span style="white-space:pre-wrap">                        </span>}</div><div>#endif</div><div><span style="white-space:pre-wrap">                </span>}</div><div><br></div><div><span style="white-space:pre-wrap">                </span>list_add_tail(&amp;req-&gt;tl_requests, &amp;resource-&gt;transfer_log);</div><div><span style="white-space:pre-wrap">        </span>}</div><div><br></div><div><br></div><div>2) in drbd_req_destroy()</div><div><br></div><div><div><span style="white-space:pre-wrap">        </span>if (s &amp; RQ_WRITE &amp;&amp; req_size) {</div><div><span style="white-space:pre-wrap">                </span>list_for_each_entry(req, &amp;device-&gt;resource-&gt;transfer_log, tl_requests) {</div><div><span style="white-space:pre-wrap">                        </span>if (req-&gt;rq_state[0] &amp; RQ_WRITE) {</div><div><span style="white-space:pre-wrap">                                </span>/*</div><div><span style="white-space:pre-wrap">                                </span> * Do the equivalent of:</div><div><span style="white-space:pre-wrap">                                </span> *   kref_put(&amp;req-&gt;kref, drbd_req_destroy)</div><div><span style="white-space:pre-wrap">                                </span> * without recursing into the destructor.</div><div><span style="white-space:pre-wrap">                                </span> */</div><div>#if 0  // WIN32 ### ignore tail_recursion ###</div><div><span style="white-space:pre-wrap">                                </span>if (atomic_dec_and_test(&amp;req-&gt;kref.refcount))</div><div><span style="white-space:pre-wrap">                                        </span>goto 
tail_recursion;</div><div>#endif</div><div><span style="white-space:pre-wrap">                                </span>break;</div><div><span style="white-space:pre-wrap">                        </span>}</div><div><span style="white-space:pre-wrap">                </span>}</div><div><span style="white-space:pre-wrap">        </span>}</div></div><div><br></div><div><br></div><div>2. Questions<br></div><div><br></div><div>1) The &quot;tail_recursion&quot; part is a new design in version 9.</div><div>     Is this operation essential?</div><div>     That is, what do you think about disabling the tail_recursion part as a temporary workaround?</div><div>   </div><div>2) What is the reason for calling &quot;kref_get(&amp;req-&gt;kref);&quot; in drbd_send_and_submit() and then handling that reference through the recursion in drbd_req_destroy() later?</div><div><br></div><div><div>3) On the Windows side, we disable this part (see the source code marked &quot;#if 0 // WIN32 ### ignore tail_recursion ###&quot;).</div><div>     With it disabled, the Windows DRBD engine has worked well so far. Is there any problem with this approach?</div></div><div><br></div><div><br></div><div>On the Linux side, you may not be able to see this list corruption, because the CASE-14 test may end in a deadlock first.</div><div>Please check the CASE-14 deadlock case first and then check this CASE-20.</div><div><br></div><div>Thanks.</div><div><br></div><div><br></div>
</div></div></div></div>