[DRBD-user] [CASE-20] What is the tail_recursion operation during drbd_req_destroy?

김재헌 jhkim at mantech.co.kr
Sun Feb 14 11:39:36 CET 2016

Dear Philipp,

Please check my previous question, CASE-14 ("[DRBD-user] [CASE-14] primary
node hang by VM-net-disconnect during big file copy").
According to that case, a deadlock may occur in DRBD on Linux.
On the Windows side, on the other hand, there is no deadlock, but the
transfer_log list is sometimes corrupted in the _tl_restart function.

So we are trying to modify the source code as follows:

1. Modifications

1) in drbd_send_and_submit()

if (likely(req->i.size != 0)) {
    if (rw == WRITE) {
        struct drbd_request *req2;
#if 0 // WIN32 ### ignore tail_recursion ###
        list_for_each_entry_reverse(req2, &resource->transfer_log, tl_requests) {
            if (req2->rq_state[0] & RQ_WRITE) {
                /* Make the new write request depend on
                 * the previous one. */

    list_add_tail(&req->tl_requests, &resource->transfer_log);

2) in drbd_req_destroy()

if (s & RQ_WRITE && req_size) {
    list_for_each_entry(req, &device->resource->transfer_log, tl_requests) {
        if (req->rq_state[0] & RQ_WRITE) {
             * Do the equivalent of:
             *   kref_put(&req->kref, drbd_req_destroy)
             * without recursing into the destructor.
#if 0  // WIN32 ### ignore tail_recursion ###
            if (atomic_dec_and_test(&req->kref.refcount))
                goto tail_recursion;
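
For reference, our understanding of the pattern in question can be sketched
in plain userspace C. This is only an illustration under assumptions, not the
DRBD code: struct request, request_chain, request_put, request_destroy and
demo_chain are hypothetical names, and a plain int stands in for the kernel's
atomic kref. We assume that each write holds a reference on the write that
follows it, and that the destructor loops over the chain instead of calling
itself.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical stand-in for struct drbd_request; NOT the DRBD type. */
struct request {
    int refcount;               /* plain int; the kernel uses an atomic kref */
    struct request *next_write; /* successor this request holds a reference on */
};

static int destroyed_count;     /* demonstration only: counts freed requests */

static struct request *request_new(void)
{
    struct request *req = calloc(1, sizeof(*req));
    if (!req)
        abort();
    req->refcount = 1;          /* the creator's reference */
    return req;
}

/* Assumed submit-time step: prev takes a reference on req (cf. kref_get). */
static void request_chain(struct request *prev, struct request *req)
{
    prev->next_write = req;
    req->refcount++;
}

/* Destructor, entered only once req's refcount has reached zero. */
static void request_destroy(struct request *req)
{
    while (req) {
        struct request *next = req->next_write;

        free(req);
        destroyed_count++;
        /* Equivalent of request_put(next), but continuing the loop
         * instead of recursing into the destructor. */
        if (next && --next->refcount == 0)
            req = next;
        else
            req = NULL;
    }
}

static void request_put(struct request *req)
{
    assert(req->refcount > 0);
    if (--req->refcount == 0)
        request_destroy(req);
}

/* Build a chain of n requests, drop the creator references newest first,
 * and return how many requests were destroyed. */
static int demo_chain(int n)
{
    struct request **reqs = malloc((size_t)n * sizeof(*reqs));
    int i;

    if (!reqs)
        abort();
    destroyed_count = 0;
    for (i = 0; i < n; i++) {
        reqs[i] = request_new();
        if (i > 0)
            request_chain(reqs[i - 1], reqs[i]);
    }
    /* Every request except the oldest survives its own put, because its
     * predecessor still holds a reference on it. */
    for (i = n - 1; i >= 0; i--)
        request_put(reqs[i]);
    free(reqs);
    return destroyed_count;
}
```

In this sketch, dropping the last reference on the oldest request frees the
whole chain in one loop, so the stack depth stays constant however long the
chain is. If the sketch matches the real design, removing only the "put" side
while keeping the "get" side would leave the later requests with a reference
that is never dropped.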

2. Questions

1) This "tail_recursion" part is a new design in version 9.
     Is it an essential operation?
     That is, what do you think about disabling the tail_recursion part as a
temporary workaround?

2) What is the reason for the "kref_get(&req->kref);" call in
drbd_send_and_submit and for handling it with tail recursion in
drbd_req_destroy?

3) On the Windows side, we disabled this part (see the source code marked
"#if 0 // WIN32 ### ignore tail_recursion ###").
     After disabling it, the Windows DRBD engine has worked well so far.
Could this cause any problem?

On the Linux side, you may not see this list corruption, because the CASE-14
test probably ends in the deadlock first.
Please check the CASE-14 deadlock first and then this CASE-20.
