[PATCH 08/11] drbd_transport_rdma: fix a race between dtr_connect and drbd_thread_stop

Dongsheng Yang dongsheng.yang at easystack.cn
Mon Jul 1 04:30:16 CEST 2024



On Friday, 2024/6/28 at 8:36 PM, Philipp Reisner wrote:
> Hello Dongsheng,
> 
> I am repeating your description in my own words so that you can verify
> I got it right:
> 
> CPU 0 executes dtr_connect() and is still before the
> wait_for_completion_interruptible().
> CPU 1 executes send_sig() in drbd_thread_stop().
> 
> Then you conclude that wait_for_completion_interruptible() will not
> abort, because the signal
> was raised before CPU 0 reached wait_for_completion_interruptible().

The problem is that dtr_prepare_connect() calls flush_signals(), so the
signal sent by drbd_thread_stop() can be flushed by dtr_prepare_connect()
before dtr_connect() reaches wait_for_completion_interruptible().
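
To make the sequence concrete, here is a minimal user-space sketch of it
(not kernel code). All names in it (model_thread, model_thread_stop(),
model_flush_signals(), model_wait_untimed(), model_wait_polling()) are
hypothetical, and it assumes that the stop request is also recorded in
some state that an explicit check, such as the drbd_should_abort_listening()
call in the patch, can observe after the signal itself is gone.

```c
#include <stdbool.h>

/* Hypothetical user-space model; none of these names are DRBD or kernel APIs. */
struct model_thread {
	bool signal_pending;  /* stands in for a pending signal on the task */
	bool stop_requested;  /* assumed: stop state an explicit check can see */
	bool connected;       /* stands in for the "connected" completion */
};

/* Models the stopping side: record the request, then raise the signal. */
static void model_thread_stop(struct model_thread *t)
{
	t->stop_requested = true;
	t->signal_pending = true;
}

/* Models the prepare step discarding whatever signal is pending. */
static void model_flush_signals(struct model_thread *t)
{
	t->signal_pending = false;
}

/*
 * Models an untimed interruptible wait, bounded by max_ticks so that the
 * model itself terminates. Returns 0 if connected, -1 if interrupted by a
 * pending signal, 1 if it would still be blocked after max_ticks.
 */
static int model_wait_untimed(const struct model_thread *t, int max_ticks)
{
	for (int tick = 0; tick < max_ticks; tick++) {
		if (t->connected)
			return 0;
		if (t->signal_pending)
			return -1;
	}
	return 1;
}

/*
 * Models the loop from the patch: re-check the stop condition before each
 * short wait. Returns 0 if connected, -1 if aborted or interrupted, 1 if
 * it would still be waiting after max_ticks.
 */
static int model_wait_polling(const struct model_thread *t, int max_ticks)
{
	for (int tick = 0; tick < max_ticks; tick++) {
		if (t->stop_requested)
			return -1;
		if (t->connected)
			return 0;
		if (t->signal_pending)
			return -1;
	}
	return 1;
}
```

With the order "stop, flush, wait", the untimed wait in this model never
sees the signal and stays blocked, while the polling variant notices the
stop request on its next pass through the loop.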
> 
> If that is your description, then it is wrong.
> This is not how signals and the wait_event() macros work.
> 
> best regards,
>   Philipp
> 
> On Mon, Jun 24, 2024 at 9:27 AM zhengbing.huang
> <zhengbing.huang at easystack.cn> wrote:
>>
>> From: Dongsheng Yang <dongsheng.yang at easystack.cn>
>>
>> If the send_sig() in drbd_thread_stop() happens before the
>> wait_for_completion_interruptible() in dtr_connect(), dtr_connect()
>> cannot return on a network failure.
>>
>> So replace wait_for_completion_interruptible() with
>> wait_for_completion_interruptible_timeout(), and let dtr_connect()
>> check the status itself.
>>
>> This behavior is similar to the TCP transport.
>>
>> Signed-off-by: Dongsheng Yang <dongsheng.yang at easystack.cn>
>> ---
>>   drbd/drbd_transport_rdma.c | 15 ++++++++++++---
>>   1 file changed, 12 insertions(+), 3 deletions(-)
>>
>> diff --git a/drbd/drbd_transport_rdma.c b/drbd/drbd_transport_rdma.c
>> index 77ff0055e..c47b344f8 100644
>> --- a/drbd/drbd_transport_rdma.c
>> +++ b/drbd/drbd_transport_rdma.c
>> @@ -2996,12 +2996,21 @@ static int dtr_connect(struct drbd_transport *transport)
>>   {
>>          struct dtr_transport *rdma_transport =
>>                  container_of(transport, struct dtr_transport, transport);
>> -       int i, err = -ENOMEM;
>> +       int i, err;
>>
>> -       err = wait_for_completion_interruptible(&rdma_transport->connected);
>> -       if (err) {
>> +again:
>> +       if (drbd_should_abort_listening(transport)) {
>> +               err = -EAGAIN;
>> +               goto abort;
>> +       }
>> +
>> +       err = wait_for_completion_interruptible_timeout(&rdma_transport->connected, HZ);
>> +       if (err < 0) {
>>                  flush_signals(current);
>>                  goto abort;
>> +       } else if (err == 0) {
>> +               /* timed out */
>> +               goto again;
>>          }
>>
>>          err = atomic_read(&rdma_transport->first_path_connect_err);
>> --
>> 2.27.0
>>

