Dear Lars,

Thank you for your kind reply. Frankly speaking, I don't fully understand your comments yet. I will check our modified code more thoroughly later.

In any case, I think you may have run into this situation if you ran the CASE-14 test. Could you please test "CASE-14 primary node hang by VM-net-disconnect during big file copy"? I think it is a critical problem.

I look forward to your good news. Thanks.

2016-02-15 22:54 GMT+09:00 Lars Ellenberg <lars.ellenberg at linbit.com>:
> On Sun, Feb 14, 2016 at 07:39:36PM +0900, 김재헌 wrote:
> > Dear Philipp,
> >
> > Please check my previous question about CASE-14 ("[DRBD-user] [CASE-14]
> > primary node hang by VM-net-disconnect during big file copy").
> > In that case, a deadlock may occur in Linux DRBD.
> > On the Windows side, on the other hand, there is no deadlock, but the
> > transfer_log list is sometimes corrupted in the _tl_restart function.
> >
> > So we are trying to modify the source code as follows:
> >
> > 1. Modifications
> >
> > 1) in drbd_send_and_submit()
> >
> > if (likely(req->i.size != 0)) {
> >     if (rw == WRITE) {
> >         struct drbd_request *req2;
> >         resource->current_tle_writes++;
> > #if 0 // WIN32 ### ignore tail_recursion ###
> >         list_for_each_entry_reverse(req2, &resource->transfer_log, tl_requests) {
> >             if (req2->rq_state[0] & RQ_WRITE) {
> >                 /* Make the new write request depend on
> >                  * the previous one. */
> >                 kref_get(&req->kref);
> >                 break;
> >             }
> >         }
> > #endif
> >     }
> >
> >     list_add_tail(&req->tl_requests, &resource->transfer_log);
> > }
> >
> >
> > 2) in drbd_req_destroy()
> >
> > if (s & RQ_WRITE && req_size) {
> >     list_for_each_entry(req, &device->resource->transfer_log, tl_requests) {
> >         if (req->rq_state[0] & RQ_WRITE) {
> >             /*
> >              * Do the equivalent of:
> >              *   kref_put(&req->kref, drbd_req_destroy)
> >              * without recursing into the destructor.
> >              */
> > #if 0 // WIN32 ### ignore tail_recursion ###
> >             if (atomic_dec_and_test(&req->kref.refcount))
> >                 goto tail_recursion;
> > #endif
> >             break;
> >         }
> >     }
> > }
> >
> >
> > 2. Questions
> >
> > 1) This "tail_recursion" part is a new design in version 9.
> > Is this operation essential?
> > I mean, what do you think about my ignoring the tail_recursion part as a
> > temporary workaround?
>
> I cannot explain all the implications within two lines of text, but we
> want the "destruction" of drbd_requests to happen in the order they have
> been put on the transfer log.
> That is important in some multi-peer scenarios.
>
> If there is no explicit dependency (here implemented via kref), they
> could be destroyed out-of-order, and that could lead to bad decisions
> elsewhere, or potentially not enough being resynced in some scenarios.
>
> > 2) And what is the reason for taking the extra reference with
> > "kref_get(&req->kref);" in drbd_send_and_submit() and then releasing it
> > recursively in drbd_req_destroy() later?
>
> see above.
>
> > 3) On the Windows side, we ignore this part (see the source code under
> > "#if 0 // WIN32 ### ignore tail_recursion ###").
> > After ignoring it, the Windows DRBD engine has worked well so far. Is
> > there any problem?
>
> see above.
>
> > On the Linux side, you cannot see this list corruption because the
> > CASE-14 test probably runs into the deadlock first.
> > Please check the CASE-14 deadlock case first and then check this CASE-20.
>
> Cheers,
>
> Lars
>
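To make the ordering Lars describes concrete, here is a minimal userspace sketch of the idea, assuming a heavily simplified, single-threaded model: a fixed array stands in for the transfer_log list, a plain int stands in for the atomic kref counter, and every name (struct req, tl_add, req_put, req_destroy, tl_reset) is hypothetical, not DRBD's real code or API. It mirrors the two quoted fragments: on submit, a new write takes one extra reference if an older write is still on the log; when a write is destroyed, the oldest remaining write drops that extra reference, and if it hits zero the destructor loops (the role of the tail_recursion label) instead of calling itself.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model, NOT DRBD's data structures or API.
 * Single-threaded, so a plain int stands in for the atomic kref counter. */
#define MAX_REQS 16

struct req {
    int id;
    bool is_write;
    int refcount;
};

static struct req *tl[MAX_REQS]; /* "transfer log", oldest request first */
static int tl_len;
static int destroyed[MAX_REQS];  /* request ids, in destruction order */
static int n_destroyed;

static void tl_reset(void)
{
    tl_len = 0;
    n_destroyed = 0;
}

/* Unlink r from the log, keeping the remaining entries in order. */
static void tl_remove(struct req *r)
{
    for (int i = 0; i < tl_len; i++) {
        if (tl[i] == r) {
            for (int j = i; j < tl_len - 1; j++)
                tl[j] = tl[j + 1];
            tl_len--;
            return;
        }
    }
}

/* Submit path: one reference for completion, plus one extra reference
 * if an older write is still on the log (the role of kref_get above). */
static void tl_add(struct req *r, int id, bool is_write)
{
    assert(tl_len < MAX_REQS);
    r->id = id;
    r->is_write = is_write;
    r->refcount = 1;
    if (is_write) {
        for (int i = tl_len - 1; i >= 0; i--) {
            if (tl[i]->is_write) {
                r->refcount++; /* new write now depends on the previous one */
                break;
            }
        }
    }
    tl[tl_len++] = r;
}

/* Destructor: after a write goes away, the oldest remaining write drops
 * its extra reference; if that was its last one, loop instead of recursing. */
static void req_destroy(struct req *r)
{
    for (;;) {
        tl_remove(r);
        destroyed[n_destroyed++] = r->id;
        if (!r->is_write)
            return; /* reads do not take part in the ordering */
        struct req *next = NULL;
        for (int i = 0; i < tl_len; i++) {
            if (tl[i]->is_write) {
                next = tl[i];
                break;
            }
        }
        if (next == NULL || --next->refcount != 0)
            return;
        r = next; /* "tail recursion" done as a loop iteration */
    }
}

/* Counterpart of kref_put(&req->kref, destructor) in this model. */
static void req_put(struct req *r)
{
    if (--r->refcount == 0)
        req_destroy(r);
}
```

For example, if writes 1, 2 and 4 and read 3 are submitted in that order and then completed in reverse (4, 3, 2, 1), this model destroys them in the order 3, 1, 2, 4: the read goes away at once, while writes 4 and 2 wait for write 1 and then follow in log order. Removing both halves, as the #if 0 workaround does, keeps the counts balanced, but in this model each write would then be destroyed as soon as it completes (4, 3, 2, 1), which is the out-of-order destruction Lars warns about.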