transferring bvecs over the network in drbd
Lars Ellenberg
lars.ellenberg at linbit.com
Thu May 8 10:39:56 CEST 2025
On Wed, May 07, 2025 at 11:45:50PM -0700, Christoph Hellwig wrote:
> Hi all,
>
> I recently went over code that directly access the bio_vec bv_page/
> bv_offset members and the code in _drbd_send_bio/_drbd_send_zc_bio
> came to my attention.
>
> It iterates the bio to kmap all segments, and then either does a
> sock_sendmsg on a newly created kvec iter, or one on a new bvec iter
> for each segment. The former can't work on highmem systems and both
> versions are rather inefficient.
>
> What is preventing drbd from doing a single sock_sendmsg with the
> bvec payload? nvme-tcp does exactly that (nvme_tcp_init_iter is a
> good example), as does the sunrpc svcsock code using its local bvec
> list (svc_tcp_sendmsg).
For async replication, we want to actually copy the data into a send
buffer: we cannot have the network stack hold a reference to a page for
which we have already signalled I/O completion.
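Roughly, the copying variant amounts to something like the following
(a sketch for illustration only, not the actual drbd code; the function
name and the preallocated buffer are made up):

	#include <linux/bio.h>
	#include <linux/bvec.h>

	/*
	 * Sketch only: copy a bio's payload into a preallocated send
	 * buffer ("buf", assumed >= bio->bi_iter.bi_size bytes), so the
	 * bio can be completed immediately while the network stack
	 * drains the buffer at its own pace.
	 */
	static void copy_bio_to_send_buffer(struct bio *bio, char *buf)
	{
		struct bio_vec bvec;
		struct bvec_iter iter;

		bio_for_each_segment(bvec, bio, iter) {
			/* memcpy_from_bvec() kmaps highmem pages as needed */
			memcpy_from_bvec(buf, &bvec);
			buf += bvec.bv_len;
		}
	}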
For sync replication, we want to avoid that additional data copy if
possible, so we try to use "zero copy sendpage".
That's why we have two variants of what looks to be the same thing.
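What Christoph suggests would, I think, look roughly like this for the
zero-copy path (again just a sketch with a made-up function name; it
assumes the bio's iterator has not been advanced, and MSG_SPLICE_PAGES
needs a recent kernel):

	#include <linux/bio.h>
	#include <linux/net.h>
	#include <linux/socket.h>
	#include <linux/uio.h>

	/*
	 * Sketch only: send the whole bio payload with a single
	 * sock_sendmsg() on a bvec iter, the way nvme-tcp and sunrpc
	 * do, instead of one call per segment.  Assumes bi_idx == 0
	 * and bi_bvec_done == 0; a real implementation would start
	 * from the current iterator position, as nvme_tcp_init_iter()
	 * does.
	 */
	static int send_bio_one_sendmsg(struct socket *sock, struct bio *bio)
	{
		struct msghdr msg = {
			/* modern replacement for the old sendpage path */
			.msg_flags = MSG_SPLICE_PAGES,
		};

		iov_iter_bvec(&msg.msg_iter, ITER_SOURCE, bio->bi_io_vec,
			      bio->bi_vcnt, bio->bi_iter.bi_size);
		return sock_sendmsg(sock, &msg);
	}

A real implementation would also have to handle short sends by
advancing msg.msg_iter and retrying.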
As for why we do it the way we do: most likely a better infrastructure
was not available when we wrote that part, or we were not aware of it.
Thanks for the pointers, we'll look into them.
Using more efficient infrastructure for this sounds good.
Lars