On Sun, Feb 14, 2016 at 07:39:36PM +0900, 김재헌 wrote:
> Dear Philipp,
>
> Please check my previous question of CASE-14 ("[DRBD-user] [CASE-14] primary
> node hang by VM-net-disconnect during big file copy").
> According to this case, Linux drbd deadlock may occur.
> On the other hand, on the Windows side there is no deadlock, but sometimes
> the transfer_log list is broken in the _tl_restart function.
>
> So, we are trying to modify the source code as follows:
>
> 1. Modifications
>
> 1) in drbd_send_and_submit()
>
>     if (likely(req->i.size != 0)) {
>         if (rw == WRITE) {
>             struct drbd_request *req2;
>
>             resource->current_tle_writes++;
> #if 0 // WIN32 ### ignore tail_recursion ###
>             list_for_each_entry_reverse(req2, &resource->transfer_log, tl_requests) {
>                 if (req2->rq_state[0] & RQ_WRITE) {
>                     /* Make the new write request depend on
>                      * the previous one. */
>                     kref_get(&req->kref);
>                     break;
>                 }
>             }
> #endif
>         }
>
>         list_add_tail(&req->tl_requests, &resource->transfer_log);
>     }
>
>
> 2) in drbd_req_destroy()
>
>     if (s & RQ_WRITE && req_size) {
>         list_for_each_entry(req, &device->resource->transfer_log, tl_requests) {
>             if (req->rq_state[0] & RQ_WRITE) {
>                 /*
>                  * Do the equivalent of:
>                  *   kref_put(&req->kref, drbd_req_destroy)
>                  * without recursing into the destructor.
>                  */
> #if 0 // WIN32 ### ignore tail_recursion ###
>                 if (atomic_dec_and_test(&req->kref.refcount))
>                     goto tail_recursion;
> #endif
>                 break;
>             }
>         }
>     }
>
>
> 2. Questions
>
> 1) This "tail_recursion" part is a new design in version 9.
>    Is this an essential operation?
>    I mean, what do you think about my ignoring the tail_recursion part as a
>    temporary workaround?

I cannot explain all the implications within two lines of text,
but we want the "destruction" of drbd_requests to happen in the order
they have been put on the transfer log.
That is important in some multi-peer scenarios.
If there is no explicit dependency (here implemented via kref),
they could be destroyed out-of-order, and that could lead to bad
decisions elsewhere, or potentially not enough being resynced in
some scenarios.

> 2) And what is the reason for the marking of "kref_get(&req->kref);" in
>    drbd_send_and_submit and processing with recursion in drbd_req_destroy
>    later?

see above.

> 3) On the Windows side, we ignore this part (see the source code at "#if 0 // WIN32
>    ### ignore tail_recursion ###").
>    Anyway, after ignoring it, the Windows drbd engine works well, till now.
>    Is there any problem?

see above.

> On the Linux side, you cannot see this list-crash case because the CASE-14 test
> may be ended by the deadlock first.
> Please check the CASE-14 deadlock case first and then check this CASE-20.

Cheers,

    Lars
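
To illustrate the ordering dependency described in the reply, here is a
minimal, self-contained sketch in C. It is not the DRBD code: struct request,
submit(), put() and demo() are hypothetical names, a plain integer stands in
for the kref, and a singly linked list stands in for the transfer log. It only
shows the idea that each new write holds back its own destruction until the
previous write has been destroyed, and that the hand-off is done in a loop
instead of by recursing into the destructor.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_REQ 8

/* Hypothetical stand-in for struct drbd_request; not the DRBD definition. */
struct request {
    int id;
    int refs;               /* plain counter standing in for a kref */
    struct request *next;   /* next write on the (singly linked) log */
};

static struct request *log_tail;   /* most recently submitted write */
static int destroyed[MAX_REQ];     /* ids, in the order they were destroyed */
static int n_destroyed;

/* Submit a write. It starts with one reference of its own. If an older
 * write is still on the log, the new request gets one extra reference,
 * which is dropped only when that older write is destroyed. */
static void submit(struct request *req, int id)
{
    req->id = id;
    req->refs = 1;
    req->next = NULL;
    if (log_tail) {
        req->refs++;            /* dependency on the previous write */
        log_tail->next = req;
    }
    log_tail = req;
}

/* Drop one reference. When the count reaches zero, record the destruction
 * and then drop the extra reference held on the successor. This is done in
 * a loop, so a long chain does not recurse. */
static void put(struct request *req)
{
    while (req) {
        assert(req->refs > 0);
        if (--req->refs > 0)
            return;
        destroyed[n_destroyed++] = req->id;
        if (log_tail == req)
            log_tail = NULL;
        req = req->next;        /* continue with the next write on the log */
    }
}

/* Three writes are submitted in order 0, 1, 2, but their own references
 * are dropped in the reverse order 2, 1, 0. */
static int demo(void)
{
    static struct request r[3];

    n_destroyed = 0;
    log_tail = NULL;
    submit(&r[0], 0);
    submit(&r[1], 1);
    submit(&r[2], 2);

    put(&r[2]);   /* still held by the dependency on request 1 */
    put(&r[1]);   /* still held by the dependency on request 0 */
    put(&r[0]);   /* destroys 0, then 1, then 2 */
    return n_destroyed;
}
```

In this sketch, calling demo() leaves destroyed[] holding 0, 1, 2, that is,
log order, even though the completions arrived in reverse. If both the extra
reference in submit() and the hand-off loop in put() were left out, the same
sequence would record 2, 1, 0 instead.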