[PATCH 08/11] drbd_transport_rdma: fix a race between dtr_connect and drbd_thread_stop

Philipp Reisner philipp.reisner at linbit.com
Fri Jun 28 14:36:04 CEST 2024


Hello Dongsheng,

I am repeating your description in my own words so that you can verify
I got it right:

CPU 0 executes dtr_connect() and has not yet reached
wait_for_completion_interruptible().
CPU 1 executes send_sig() in drbd_thread_stop().

Then you conclude that wait_for_completion_interruptible() will not
abort, because the signal was raised before CPU 0 reached
wait_for_completion_interruptible().

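If that is your description, then it is wrong.
This is not how signals and the wait_event() macros work: a signal
sent with send_sig() stays pending until it is dequeued or flushed,
and the interruptible wait checks for a pending signal before it puts
the task to sleep, so it returns -ERESTARTSYS right away.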

best regards,
 Philipp

On Mon, Jun 24, 2024 at 9:27 AM zhengbing.huang
<zhengbing.huang at easystack.cn> wrote:
>
> From: Dongsheng Yang <dongsheng.yang at easystack.cn>
>
> If send_sig() in drbd_thread_stop() is called before
> wait_for_completion_interruptible() in dtr_connect(), dtr_connect()
> cannot return when the network fails.
>
> So replace wait_for_completion_interruptible() with
> wait_for_completion_interruptible_timeout(), and let dtr_connect()
> check the abort status itself.
>
> This is similar to the behavior of the TCP transport.
>
> Signed-off-by: Dongsheng Yang <dongsheng.yang at easystack.cn>
> ---
>  drbd/drbd_transport_rdma.c | 15 ++++++++++++---
>  1 file changed, 12 insertions(+), 3 deletions(-)
>
> diff --git a/drbd/drbd_transport_rdma.c b/drbd/drbd_transport_rdma.c
> index 77ff0055e..c47b344f8 100644
> --- a/drbd/drbd_transport_rdma.c
> +++ b/drbd/drbd_transport_rdma.c
> @@ -2996,12 +2996,21 @@ static int dtr_connect(struct drbd_transport *transport)
>  {
>         struct dtr_transport *rdma_transport =
>                 container_of(transport, struct dtr_transport, transport);
> -       int i, err = -ENOMEM;
> +       int i, err;
>
> -       err = wait_for_completion_interruptible(&rdma_transport->connected);
> -       if (err) {
> +again:
> +       if (drbd_should_abort_listening(transport)) {
> +               err = -EAGAIN;
> +               goto abort;
> +       }
> +
> +       err = wait_for_completion_interruptible_timeout(&rdma_transport->connected, HZ);
> +       if (err < 0) {
>                 flush_signals(current);
>                 goto abort;
> +       } else if (err == 0) {
> +               /* timed out */
> +               goto again;
>         }
>
>         err = atomic_read(&rdma_transport->first_path_connect_err);
> --
> 2.27.0
>
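
The retry loop that the patch adds to dtr_connect() can be sketched in userspace. This is a hypothetical model, not DRBD or kernel code: struct conn_state, deadline_after_ms(), model_connect_loop() and model_abort_after_delay() are invented for illustration, a POSIX condition variable with pthread_cond_timedwait() stands in for the completion, and signals are left out. In the kernel, wait_for_completion_interruptible_timeout() returns -ERESTARTSYS when interrupted, 0 on timeout and the remaining jiffies (at least 1) on completion; in the model, ETIMEDOUT plays the role of the 0 case.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <pthread.h>
#include <stdbool.h>
#include <time.h>

/* Hypothetical stand-in for rdma_transport->connected plus the abort check. */
struct conn_state {
    pthread_mutex_t lock;
    pthread_cond_t cond;
    bool connected;     /* models complete(&rdma_transport->connected) having run */
    bool should_abort;  /* models drbd_should_abort_listening() returning true */
};

/* Absolute CLOCK_REALTIME deadline (a condvar's default clock) timeout_ms from now. */
static struct timespec deadline_after_ms(long timeout_ms)
{
    struct timespec ts;

    assert(timeout_ms > 0);
    clock_gettime(CLOCK_REALTIME, &ts);
    ts.tv_sec += timeout_ms / 1000;
    ts.tv_nsec += (timeout_ms % 1000) * 1000000L;
    if (ts.tv_nsec >= 1000000000L) {
        ts.tv_sec += 1;
        ts.tv_nsec -= 1000000000L;
    }
    return ts;
}

/*
 * Same shape as the patched dtr_connect(): check the abort condition,
 * wait with a timeout, and go around again when the wait times out.
 * Returns 0 once connected, -EAGAIN when asked to abort.
 */
static int model_connect_loop(struct conn_state *s, long timeout_ms)
{
    int ret = 0;

    pthread_mutex_lock(&s->lock);
    for (;;) {
        if (s->should_abort) {
            ret = -EAGAIN;
            break;
        }
        if (s->connected)
            break;
        struct timespec ts = deadline_after_ms(timeout_ms);
        /* ETIMEDOUT corresponds to the kernel's 0 return: loop and re-check. */
        pthread_cond_timedwait(&s->cond, &s->lock, &ts);
    }
    pthread_mutex_unlock(&s->lock);
    return ret;
}

/* Thread entry: requests an abort after 30 ms without signalling the condvar. */
static void *model_abort_after_delay(void *arg)
{
    struct conn_state *s = arg;
    struct timespec delay = { .tv_sec = 0, .tv_nsec = 30 * 1000000L };

    nanosleep(&delay, NULL);
    pthread_mutex_lock(&s->lock);
    s->should_abort = true;  /* nothing wakes the waiter; the timed wait must notice this */
    pthread_mutex_unlock(&s->lock);
    return NULL;
}
```

Because model_abort_after_delay() never signals the condition variable, model_connect_loop() returns -EAGAIN only because it re-checks should_abort after each timeout. That is the trade-off of this design: an abort request is noticed within one timeout period even if nothing wakes the waiter, at the cost of one wakeup per period (HZ jiffies, that is one second, in the patch).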

