[PATCH 01/11] drbd_nl: dont allow detating to be inttrupted in waiting D_DETACHING to DISKLESS

Tue Jul 2 03:45:59 CEST 2024

在 2024/7/1 星期一 下午 6:00, Philipp Reisner 写道:
> Hi Dongsheng,
> 
> Thanks for all this information! That is already nearly a perfect
> commit message.
> Still, I am looking for a better approach to solving this problem.
> Instead of making the detach uninterruptible, I suggest finding a way
> to still schedule the ldev_destroy_work in this corner case.

Sounds good, we are willing to do review and test for your patch when 
it's ready.
> 
> PS: I prefer changes with 100 lines of commit message that touches 3
> lines of code over 3 lines of commit message for 100 lines of code
> changes.

Totally understood, that's important for discussion and code maintaining.

Thanx
> 
> best regards,
>   Philipp
> 
> On Mon, Jul 1, 2024 at 4:02 AM Dongsheng Yang
> <dongsheng.yang at easystack.cn> wrote:
>>
>>
>>
>> 在 2024/6/28 星期五 下午 5:10, Philipp Reisner 写道:
>>> Hello Dongsheng,
>>>
>>> First of all, thanks for contributing patches to us.
>>> Please find my reply on the patch below the quote:
>>>
>>> On Mon, Jun 24, 2024 at 7:52 AM zhengbing.huang
>>> <zhengbing.huang at easystack.cn> wrote:
>>>>
>>>> From: Dongsheng Yang <dongsheng.yang at easystack.cn>
>>>>
>>>> In our network failure and drbd down testing, we found warning in dmesg and drbd down process into D state:
>>>>
>>>> "kernel: drbd /unregistered/ramtest3/0 drbd103: ASSERTION device->disk_state[NOW] == D_FAILED || device->disk_state[NOW] == D_DETACHING FAILED in go_diskless"
>>>>
>>>> the problem is the wait_event is inttruptable, it could be intrupted by signal and call drbd_cleanup_device before go_diskless()
>>>>
>>>
>>> In this case, I suggest improving the expression in the assertion.
>>> Improving an assertion can also mean removing that assertion.
>>
>> Hi Philipp,
>>          This patchset is fixing the problems found by a network failure test
>> script[1].
>>          The [1/11] is not about just a WARNING, it will result a process with D
>> state in wait_event(device->misc_wait, !test_bit(GOING_DISKLESS,
>> &device->flags)); in adm_del_minor().
>>
>> let's think about this sequence:
>>
>> a) drbd_adm_down -> adm_detach -> change_disk_state(device, D_DETACHING...
>>
>> b) it will call put_ldev(), set GOING_DISKLESS and post a work for
>> GO_DISKLESS
>>
>> c) adm_detach() start wait_event_interruptible(device->misc_wait,
>>                          get_disk_state(device) != D_DETACHING);
>> but it can be intrrupted, then call drbd_cleanup_device() to set
>> device->disk_state[NOW] = D_DISKLESS;
>>
>> after that, it will go to adm_del_minor() and
>> wait_event(device->misc_wait, !test_bit(GOING_DISKLESS,
>> &device->flags)); which expects drbd_ldev_destroy to clear GOING_DISKLESS.
>>
>> d) on the other hand, go_diskless work start and warn on the message in
>> commit message. it will do change_disk_state(device, D_DISKLESS,
>> CS_HARD, "go-diskless", NULL); But the disk_state[NOW] is already
>> D_DISKLESS. So it will not schedule &device->ldev_destroy_work.
>>
>> As a result, the wait_event in c) will never return.
>>
>>
>> [1]:
>> check_drbd_process() {
>>       ps aux | grep " D"|grep drbd
>> }
>>
>> check_node_2_drbd_process() {
>>       ssh node-2 'ps aux' | grep " D"|grep drbd
>> }
>>
>> wait_for_no_drbd_d_state() {
>>       count=0
>>       while true; do
>>           if check_drbd_process; then
>>               echo "Found drbd process in D state, sleeping for ${count}
>> second..."
>>               sleep 1
>>               count=$((count + 1))
>>           else
>>               echo "No drbd process in D state."
>>               break
>>           fi
>>       done
>>       while true; do
>>           if check_node_2_drbd_process; then
>>               echo "Found drbd process in D state, sleeping for ${count}
>> second..."
>>               sleep 1
>>               count=$((count + 1))
>>           else
>>               echo "No drbd process in D state."
>>               break
>>           fi
>>       done
>> }
>>
>> random_sleep=$((RANDOM % 100))
>>
>> ssh node-2 "ifup Bond2-roce.1469"
>> ifup Bond2-roce.1469
>>
>> sleep 5
>>
>> for i in `seq 0 9`; do
>>           drbdadm up ramtest${i}
>>           ssh node-2 "drbdadm up ramtest${i}"
>> done
>>
>> sleep ${random_sleep}
>>
>> ssh node-2 "ifdown Bond2-roce.1469"
>>
>> random_sleep=$((RANDOM % 10))
>>
>> for i in `seq 0 9`; do
>>           drbdsetup fail-io ramtest${i} &
>>           drbdadm down ramtest${i} &
>> done
>>
>> sleep 10
>>
>> wait_for_no_drbd_d_state
>>>
>>> The wait_event_interruptible() is there for a reason. Think of a
>>> backing disk that behaves like a tar pit—a backing device that no
>>> longer finishes IO requests. You want a way to interrupt the drbdsetup
>>> waiting in detach.
>>>
>>> PS: A bit more elaborative commit messages are welcome.
>>>
>>> best regards,
>>>    Philipp
>>>