[Drbd-dev] Protocol A,B & submit ee failure

Shriram Rajagopalan rshriram at gmail.com
Wed Nov 17 20:00:03 CET 2010


[I apologize if this is a double post]
Hi all,
I have recently started hacking on the DRBD kernel code, and I am a bit of a
newbie to the concept of "bio"s.

My question (all concerning IO at the secondary, for Protocol A/B):
In drbd_receiver.c, specifically in the function receive_Data(..),
the backup disconnects from the primary when the drbd_submit_ee(..) call fails.
The comments indicate:
        /* drbd_submit_ee currently fails for one reason only:
         * not being able to allocate enough bios.
         * Is dropping the connection going to help? */

So the code just finishes the activity log IO, releases the ee, and returns
false, which causes the main loop to disconnect from the primary.

Why was this choice made?
Please correct me if I am wrong:
Isn't failure to allocate a bio a temporary issue? I mean, the kernel ran out
of bios to allocate from its slabs (or is currently short of memory), and thus
retrying after a while might work.

I understand that for Protocol C, one cannot buffer the IO on the secondary.
But for Protocol A/B, the IO can certainly be buffered and retried. Isn't that
better than just disconnecting from the primary and causing reconnects?
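
To make the buffer-and-retry idea concrete, here is a minimal userspace sketch
in plain C. It is not DRBD code: struct fake_ee, fake_submit_ee(),
submit_or_defer(), retry_pending() and the bios_available counter are all
hypothetical stand-ins that I made up for illustration. A real version would
also have to respect write ordering and barriers, and bound the memory held by
parked requests.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define PENDING_MAX 8   /* assumed queue depth, for illustration only */

/* Hypothetical stand-in for an epoch entry; not the DRBD struct. */
struct fake_ee {
    int id;
};

/* Simulated bio pool: submission fails while no bios are available. */
static int bios_available;

/* Hypothetical stand-in for drbd_submit_ee():
 * returns 0 on success, -1 on an ENOMEM-like failure. */
static int fake_submit_ee(struct fake_ee *e)
{
    (void)e;
    if (bios_available <= 0)
        return -1;
    bios_available--;
    return 0;
}

/* Fixed-size FIFO of entries whose submission has to be retried. */
static struct fake_ee *pending[PENDING_MAX];
static size_t pending_head, pending_count;

/* Submit now, or park the entry for a later retry.
 * Returns false only when the retry queue itself is full; that is the
 * point where giving up (e.g. disconnecting) would still be needed. */
static bool submit_or_defer(struct fake_ee *e)
{
    assert(e != NULL);
    /* Preserve ordering: never overtake entries that are already waiting. */
    if (pending_count == 0 && fake_submit_ee(e) == 0)
        return true;
    if (pending_count == PENDING_MAX)
        return false;
    pending[(pending_head + pending_count) % PENDING_MAX] = e;
    pending_count++;
    return true;
}

/* Retry parked entries in FIFO order and stop at the first one that
 * still fails. Returns the number of entries submitted in this pass. */
static size_t retry_pending(void)
{
    size_t done = 0;

    while (pending_count > 0) {
        if (fake_submit_ee(pending[pending_head]) != 0)
            break;
        pending_head = (pending_head + 1) % PENDING_MAX;
        pending_count--;
        done++;
    }
    return done;
}
```

In this sketch the receiver would call submit_or_defer() instead of failing
outright, and something like a worker or timer would call retry_pending()
once memory pressure has eased.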
==========
On the same note,
the callback function "w_e_reissue" is used to resubmit a failed IO if the IO
had the REQ_HARDBARRIER flag.
Looking at this function, it tries to reissue the IO and
 (a) when drbd_submit_ee fails,
    it installs itself as the callback handler and requeues the work. This
    contradicts the error handling in receive_Data(..), where a failed
    drbd_submit_ee call leads to connection termination.

   Also, this could cause looping (possibly infinite) when the drbd_submit_ee
   call keeps failing due to ENOMEM.
   Shouldn't there be some sort of "num_attempts" counter that limits the
   number of IO retries?
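
As a rough illustration of what I mean by a "num_attempts" counter, here is a
small userspace sketch in plain C. Nothing in it is DRBD code: struct
retry_work, sim_submit(), reissue_bounded(), the failures_left counter and the
MAX_ATTEMPTS value are hypothetical names and numbers chosen only for this
example.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_ATTEMPTS 3  /* assumed limit, chosen only for illustration */

enum reissue_result {
    REISSUE_OK,        /* submission succeeded */
    REISSUE_REQUEUED,  /* failed, caller should queue the work again */
    REISSUE_GAVE_UP    /* failed too often, caller should stop retrying */
};

/* Hypothetical work item that carries its own retry counter. */
struct retry_work {
    int num_attempts;
};

/* Simulated allocator state: how many upcoming submissions will fail. */
static int failures_left;

/* Hypothetical stand-in for drbd_submit_ee():
 * returns 0 on success, -1 on an ENOMEM-like failure. */
static int sim_submit(void)
{
    if (failures_left > 0) {
        failures_left--;
        return -1;
    }
    return 0;
}

/* Try to reissue once. On failure, count the attempt and tell the caller
 * whether to requeue or to give up once MAX_ATTEMPTS failures are reached. */
static enum reissue_result reissue_bounded(struct retry_work *w)
{
    assert(w != NULL);
    if (sim_submit() == 0)
        return REISSUE_OK;
    w->num_attempts++;
    if (w->num_attempts >= MAX_ATTEMPTS)
        return REISSUE_GAVE_UP;
    return REISSUE_REQUEUED;
}
```

With a bound like this, a persistent ENOMEM ends in a definite error path
after a fixed number of attempts instead of requeueing forever.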

The comment in this function says
"@cancel  The connection will be closed anyways (unused in this callback)"
but I cannot find a control path that causes a connection close before
reaching this function. On the other hand, the path I do see is
 drbd_endio_sec --> drbd_endio_sec_final
   where this ee is simply requeued, with its callback changed to
   w_e_reissue, which always returns 1
   (unlike e_end_block, which returns 0, causing the worker thread to force
   the connection to go down).
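
The return-value convention I am describing can be modelled with a tiny
userspace sketch in plain C. This is only my reading of the behaviour, not the
actual worker code: struct sim_work, sim_worker_run() and the two callbacks
are hypothetical names made up for this example.

```c
#include <assert.h>
#include <stddef.h>

struct sim_work;
typedef int (*sim_cb)(struct sim_work *w);

/* Hypothetical work item: just a callback, as in a simple work queue. */
struct sim_work {
    sim_cb cb;
};

/* Models a callback that always reports success,
 * as described for w_e_reissue in this mail. */
static int cb_always_one(struct sim_work *w)
{
    (void)w;
    return 1;
}

/* Models a callback that reports failure by returning 0,
 * as described for e_end_block in this mail. */
static int cb_returns_zero(struct sim_work *w)
{
    (void)w;
    return 0;
}

/* Run the work items in order. Returns the index of the first item whose
 * callback returned 0 (where the connection would be forced down in this
 * model), or -1 if every callback returned non-zero. */
static int sim_worker_run(struct sim_work *items, size_t n)
{
    for (size_t i = 0; i < n; i++) {
        assert(items[i].cb != NULL);
        if (!items[i].cb(&items[i]))
            return (int)i;
    }
    return -1;
}
```

In this model a queue that only ever contains always-return-1 callbacks can
never trigger the disconnect branch, which is why I do not see how the
connection gets closed on this path.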
=========

shriram


-- 
perception is but an offspring of its own self