Dear Philipp,

Please check my previous question, CASE-14 ("[DRBD-user] [CASE-14] primary node hang by VM-net-disconnect during big file copy"). According to that case, a deadlock may occur in DRBD on Linux. On the Windows side, on the other hand, there is no deadlock, but the transfer_log list is sometimes broken in the _tl_restart function.

So we are trying to modify the source code as follows:

1. Modifications

1) in drbd_send_and_submit()

    if (likely(req->i.size != 0)) {
        if (rw == WRITE) {
            struct drbd_request *req2;

            resource->current_tle_writes++;
    #if 0 // WIN32 ### ignore tail_recursion ###
            list_for_each_entry_reverse(req2, &resource->transfer_log, tl_requests) {
                if (req2->rq_state[0] & RQ_WRITE) {
                    /* Make the new write request depend on
                     * the previous one. */
                    kref_get(&req->kref);
                    break;
                }
            }
    #endif
        }
        list_add_tail(&req->tl_requests, &resource->transfer_log);
    }

2) in drbd_req_destroy()

    if (s & RQ_WRITE && req_size) {
        list_for_each_entry(req, &device->resource->transfer_log, tl_requests) {
            if (req->rq_state[0] & RQ_WRITE) {
                /*
                 * Do the equivalent of:
                 *   kref_put(&req->kref, drbd_req_destroy)
                 * without recursing into the destructor.
                 */
    #if 0 // WIN32 ### ignore tail_recursion ###
                if (atomic_dec_and_test(&req->kref.refcount))
                    goto tail_recursion;
    #endif
                break;
            }
        }
    }

2. Questions

1) This "tail_recursion" part is a new design in version 9. Is it an essential operation? In other words, what do you think about ignoring the tail_recursion part as a temporary workaround?

2) What is the reason for taking the extra reference with "kref_get(&req->kref);" in drbd_send_and_submit() and then handling it with the tail recursion in drbd_req_destroy() later?

3) On the Windows side, we ignore this part (see the code under "#if 0 // WIN32 ### ignore tail_recursion ###"). With this part ignored, the Windows DRBD engine has worked well so far. Is there any problem with this?

On the Linux side, you may not see this list corruption, because the CASE-14 test may hit the deadlock first. Please check the CASE-14 deadlock case first and then check this CASE-20.

Thanks.
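To make question 2) concrete, the dependency described in the two quoted comments can be modelled in user space. The sketch below is only an illustration under stated assumptions, not the DRBD implementation: struct request, struct tlog, submit(), put() and destroy() are hypothetical names, the reference count is a plain int instead of a kref, and there is no locking.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, simplified stand-in for struct drbd_request. */
struct request {
    int id;
    int is_write;                 /* models the RQ_WRITE flag */
    int refcount;                 /* models req->kref */
    struct request *prev, *next;  /* models the tl_requests linkage */
};

/* Hypothetical stand-in for the transfer log of a resource. */
struct tlog {
    struct request *head, *tail;
    int destroyed[16];            /* ids in destruction order, for inspection */
    int n_destroyed;
};

/* Append a request to the log. A write takes one extra reference on
 * itself if an earlier write is still in the log, which models the
 * kref_get(&req->kref) in the quoted drbd_send_and_submit() code. */
static void submit(struct tlog *tl, struct request *req, int id, int is_write)
{
    struct request *r;

    req->id = id;
    req->is_write = is_write;
    req->refcount = 1;
    if (is_write) {
        for (r = tl->tail; r; r = r->prev) {
            if (r->is_write) {
                req->refcount++;  /* new write depends on the previous one */
                break;
            }
        }
    }
    req->prev = tl->tail;
    req->next = NULL;
    if (tl->tail)
        tl->tail->next = req;
    else
        tl->head = req;
    tl->tail = req;
}

/* Remove a request from the log and record it as destroyed. After a
 * write is destroyed, drop the dependency reference of the first write
 * still in the log. If that count reaches zero, destroy that request
 * as well, by looping instead of calling destroy() again. */
static void destroy(struct tlog *tl, struct request *req)
{
    struct request *r;

tail_recursion:
    if (req->prev) req->prev->next = req->next; else tl->head = req->next;
    if (req->next) req->next->prev = req->prev; else tl->tail = req->prev;
    req->prev = req->next = NULL;
    if (tl->n_destroyed < 16)
        tl->destroyed[tl->n_destroyed++] = req->id;

    if (!req->is_write)
        return;
    for (r = tl->head; r; r = r->next) {
        if (r->is_write) {
            if (--r->refcount == 0) {
                req = r;
                goto tail_recursion;
            }
            break;
        }
    }
}

/* Drop one reference; destroy the request when the count reaches zero. */
static void put(struct tlog *tl, struct request *req)
{
    assert(req->refcount > 0);
    if (--req->refcount == 0)
        destroy(tl, req);
}
```

In this model, three writes submitted in a row and released in reverse order are still destroyed in submission order, because each later write keeps one reference until its predecessor is gone. The model also shows that the two quoted halves belong together: the extra count taken in submit() is only ever released by the loop in destroy(), so compiling out one half without the other would unbalance the count, while compiling out both (as in the Windows patch) keeps the counts balanced but removes the ordering. Whether the real code relies on that ordering elsewhere is exactly the open question.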