<div dir="ltr"><div>Dear Philipp,<br></div><br>Please see my previous question, CASE-14 ("[DRBD-user] [CASE-14] primary node hang by VM-net-disconnect during big file copy").<div>According to that case, a deadlock may occur in DRBD on Linux.<div><div>On Windows, on the other hand, there is no deadlock, but the transfer_log list is sometimes corrupted in the _tl_restart function.</div><div><br></div><div>So we are trying to modify the source code as follows:</div><div><div><br></div><div>1. Modifications</div><div><br></div>1) in drbd_send_and_submit()</div><div><br></div><div><div><span style="white-space:pre-wrap">        </span>if (likely(req->i.size != 0)) {</div><div><span style="white-space:pre-wrap">                </span>if (rw == WRITE) {</div><div><span style="white-space:pre-wrap">                        </span>struct drbd_request *req2;</div><div><span style="white-space:pre-wrap">                        </span>resource->current_tle_writes++;</div><div>#if 0 // WIN32 ### ignore tail_recursion ###</div><div><span style="white-space:pre-wrap">                        </span>list_for_each_entry_reverse(req2, &resource->transfer_log, tl_requests) {</div><div><span style="white-space:pre-wrap">                                </span>if (req2->rq_state[0] & RQ_WRITE) {</div><div><span style="white-space:pre-wrap">                                        </span>/* Make the new write request depend on</div><div><span style="white-space:pre-wrap">                                        </span> * the previous one. 
*/</div><div><span style="white-space:pre-wrap">                                        </span>kref_get(&req->kref);</div><div><span style="white-space:pre-wrap">                                        </span>break;</div><div><span style="white-space:pre-wrap">                                </span>}</div><div><span style="white-space:pre-wrap">                        </span>}</div><div>#endif</div><div><span style="white-space:pre-wrap">                </span>}</div><div><br></div><div><span style="white-space:pre-wrap">                </span>list_add_tail(&req->tl_requests, &resource->transfer_log);</div><div><span style="white-space:pre-wrap">        </span>}</div><div><br></div><div><br></div><div>2) in drbd_req_destroy()</div><div><br></div><div><div><span style="white-space:pre-wrap">        </span>if (s & RQ_WRITE && req_size) {</div><div><span style="white-space:pre-wrap">                </span>list_for_each_entry(req, &device->resource->transfer_log, tl_requests) {</div><div><span style="white-space:pre-wrap">                        </span>if (req->rq_state[0] & RQ_WRITE) {</div><div><span style="white-space:pre-wrap">                                </span>/*</div><div><span style="white-space:pre-wrap">                                </span> * Do the equivalent of:</div><div><span style="white-space:pre-wrap">                                </span> * kref_put(&req->kref, drbd_req_destroy)</div><div><span style="white-space:pre-wrap">                                </span> * without recursing into the destructor.</div><div><span style="white-space:pre-wrap">                                </span> */</div><div>#if 0 // WIN32 ### ignore tail_recursion ###</div><div><span style="white-space:pre-wrap">                                </span>if (atomic_dec_and_test(&req->kref.refcount))</div><div><span style="white-space:pre-wrap">                                        </span>goto tail_recursion;</div><div>#endif</div><div><span style="white-space:pre-wrap">    
                             </span>break;</div><div><span style="white-space:pre-wrap">                        </span>}</div><div><span style="white-space:pre-wrap">                </span>}</div><div><span style="white-space:pre-wrap">        </span>}</div></div><div><br></div><div><br></div><div>2. Questions<br></div><div><br></div><div>1) The "tail_recursion" part is new in the version 9 design.</div><div> Is this operation essential?</div><div> That is, what do you think about disabling the tail_recursion part as a temporary workaround?</div><div> </div><div>2) What is the reason for taking the reference with "kref_get(&req->kref);" in drbd_send_and_submit() and then handling it by tail recursion later in drbd_req_destroy()?</div><div><br></div><div><div>3) On the Windows side, we disable this part (see the code marked "#if 0 // WIN32 ### ignore tail_recursion ###").</div><div> With this part disabled, the Windows DRBD engine has worked well so far. Could this cause any problems?</div></div><div><br></div><div><br></div><div>On the Linux side, you cannot see this list crash, because the CASE-14 test probably ends in the deadlock first.</div><div>Please check the CASE-14 deadlock case first, and then check this CASE-20.</div><div><br></div><div>Thanks.</div><div><br></div><div><br></div>
</div></div></div></div>