[I apologize if this is a double post]<br>Hi all,<br>I have recently started hacking on the DRBD kernel code and I am a bit of a newbie to the concept of "bio"s.<br><br>My question (all concerning IO on the secondary, for Protocol A/B):<br>
In drbd_receiver.c, specifically in the function receive_Data(..),<br>
the secondary disconnects from the primary when the drbd_submit_ee(..) call fails.<br>The comment there says:<br> /* drbd_submit_ee currently fails for one reason only:<br><div id=":1hu"><div>
* not being able to allocate enough bios. <br>
* Is dropping the connection going to help? */<br><br>So, the code just finishes the activity log IO, releases the ee and returns false,<br>which causes the main loop to disconnect from the primary.<br><br>Why was this choice made?<br>
Please correct me if I am wrong:<br>Isn't failure to allocate a bio a temporary issue? I mean, the kernel ran out of<br>bios to allocate from its slabs (or is currently short of memory), and thus<br>retrying after a while might work.<br>
<br>I understand that for Protocol C, one cannot buffer the IO on the secondary. But for Protocol A/B, the writes<br>can certainly be buffered and retried. Isn't that better than just disconnecting from the primary and causing<br>reconnects?<br>
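To make the "buffer and retry" idea concrete, here is a minimal user-space sketch of what I have in mind. It is not DRBD code: struct pending_write, struct retry_queue, receive_write, retry_queue_drain and fake_submit are all hypothetical names, and fake_submit only stands in for a drbd_submit_ee call that fails while no bios are available.<br>

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for an epoch entry; not the real DRBD structure. */
struct pending_write {
    unsigned long sector;
    struct pending_write *next;
};

/* FIFO of writes whose submission failed and should be retried later. */
struct retry_queue {
    struct pending_write *head;
    struct pending_write *tail;
    size_t len;
};

/* Returns 0 on success, nonzero on a (presumably transient) failure. */
typedef int (*submit_fn)(struct pending_write *w, void *ctx);

static void retry_queue_push(struct retry_queue *q, struct pending_write *w)
{
    w->next = NULL;
    if (q->tail)
        q->tail->next = w;
    else
        q->head = w;
    q->tail = w;
    q->len++;
}

/*
 * Receive path: try to submit; on failure park the write instead of
 * dropping the connection. If anything is already parked, park behind
 * it so that write ordering is preserved.
 * Returns 1 if submitted immediately, 0 if deferred.
 */
static int receive_write(struct retry_queue *q, struct pending_write *w,
                         submit_fn submit, void *ctx)
{
    if (q->len == 0 && submit(w, ctx) == 0)
        return 1;
    retry_queue_push(q, w);
    return 0;
}

/*
 * Called later (e.g. from a timer or worker): resubmit parked writes in
 * order and stop at the first failure. Returns how many were submitted.
 */
static size_t retry_queue_drain(struct retry_queue *q, submit_fn submit,
                                void *ctx)
{
    size_t done = 0;

    while (q->head) {
        struct pending_write *w = q->head;
        struct pending_write *next = w->next; /* read before handing w off */

        if (submit(w, ctx) != 0)
            break;
        assert(q->len > 0);
        q->head = next;
        if (!q->head)
            q->tail = NULL;
        q->len--;
        done++;
    }
    return done;
}

/* Models bio allocation: succeeds only while *ctx (free bios) is > 0. */
static int fake_submit(struct pending_write *w, void *ctx)
{
    int *free_bios = ctx;

    (void)w;
    if (*free_bios <= 0)
        return -1; /* would be -ENOMEM in the kernel */
    (*free_bios)--;
    return 0;
}
```

In this model a failed submit only defers the write; the connection stays up, and the parked writes go out in their original order once allocation succeeds again. A real version would of course also need a bound on how much can be parked.<br>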
==========<br>On the same note,<br>the "w_e_reissue" callback is used to resubmit a failed IO if the IO had the REQ_HARDBARRIER flag.<br>Looking at this function, it tries to reissue the IO and<br> (a) when drbd_submit_ee fails,<br>
it installs itself as the callback handler and requeues the work. This contradicts the receive_Data(..)<br>error handling, where a drbd_submit_ee failure leads to connection termination.<br><br> Also, this could loop (possibly forever) if the drbd_submit_ee call keeps failing due to ENOMEM.<br>
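To make the looping concern concrete, here is a small user-space model (not DRBD code) of a work item that requeues itself whenever the submit fails, with an attempt cap added. MAX_SUBMIT_ATTEMPTS, struct work_item, reissue_once, run_worker and the two fake submit functions are hypothetical names used only for illustration.<br>

```c
#include <assert.h>
#include <stddef.h>

#define MAX_SUBMIT_ATTEMPTS 3 /* hypothetical cap, chosen arbitrarily */

enum reissue_result { REISSUE_DONE, REISSUE_REQUEUED, REISSUE_GAVE_UP };

/* Hypothetical work item carrying a count of failed submit attempts. */
struct work_item {
    unsigned int num_attempts;
};

/* Returns 0 on success, nonzero on failure (think -ENOMEM). */
typedef int (*reissue_submit_fn)(void *ctx);

/* One run of the callback: submit, and on failure requeue or give up. */
static enum reissue_result reissue_once(struct work_item *w,
                                        reissue_submit_fn submit, void *ctx)
{
    /* An item that already gave up must not be reissued again. */
    assert(w->num_attempts < MAX_SUBMIT_ATTEMPTS);

    if (submit(ctx) == 0)
        return REISSUE_DONE;
    if (++w->num_attempts >= MAX_SUBMIT_ATTEMPTS)
        return REISSUE_GAVE_UP; /* caller could now drop the connection */
    return REISSUE_REQUEUED;
}

/*
 * Model of the worker: keeps running the callback for as long as it
 * requeues itself. Returns the number of callback invocations.
 */
static unsigned int run_worker(struct work_item *w, reissue_submit_fn submit,
                               void *ctx, enum reissue_result *out)
{
    unsigned int calls = 0;
    enum reissue_result r;

    do {
        r = reissue_once(w, submit, ctx);
        calls++;
    } while (r == REISSUE_REQUEUED);

    *out = r;
    return calls;
}

/* Submit that never succeeds: without the cap, run_worker would spin. */
static int always_fail(void *ctx)
{
    (void)ctx;
    return -1;
}

/* Submit that fails *ctx times and then succeeds. */
static int fail_n_times(void *ctx)
{
    int *remaining = ctx;

    if (*remaining > 0) {
        (*remaining)--;
        return -1;
    }
    return 0;
}
```

With always_fail, the model stops after MAX_SUBMIT_ATTEMPTS calls instead of requeueing forever; with a submit that recovers after a couple of failures, the IO still completes.<br>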
Shouldn't there be some sort of "num_attempts" counter that limits the number of IO retries?<br><br>The comment on this function says<br>"@cancel The connection will be closed anyways (unused in this callback)"<br>
but I cannot find a control path that causes a connection close before reaching this function. On the contrary, the path is<br> drbd_endio_sec --> drbd_endio_sec_final,<br> where this ee is simply requeued with its callback changed to w_e_reissue, which always returns 1<br>
(unlike e_end_block, which returns 0, causing the worker thread to force the connection down).<br>=========<br><font color="#888888"><font color="#888888"><br>shriram</font></font></div></div><br clear="all"><br>-- <br>
perception is but an offspring of its own self<br>