[Drbd-dev] [PATCH] drbd: when change susp_uuid[NEW] to true, make sure susp_uuid[OLD] is false

Philipp Reisner philipp.reisner at linbit.com
Thu Nov 23 07:06:30 CET 2023


Hello Zhengbing,

Thank you for pointing this out. I created a little bit different patch.
First, the sanitize_state() function should clean up the new state,
but I think it should never change the old state.
It can not rewrite history.
Second, we do not want to create two new current-uuids while I/O is frozen.

    drbd: fix susp_uuid clearing

   I introduced susp_uuid for keeping a resource longer suspended for
   writing a new current-uuid.
   But in the following scenario, it failed to clear the susp_uuid bit:

    1. The network connection between a primary and secondary node
    fails. The primary has the setting quorum=2. IO freezes on the
    primary node.

    2. The user changes the quorum setting from 2 to one. The primary
    node generates a new current-uuid and clears the susp_uuid bit in a
    second state change.

    3. If, during the second state change, another condition for creating
    a new current UUID evaluates to true, the code fails to clear the
    suspen_uuid bit.

   The second attempt to create a new current-uuid gets ignored since it
   is pointless to create two current-uuids while I/O is frozen.

   Fixes: d47f7456ab ("drbd: create new UUID before resuming IO upon
   regaining quorum")

   Co-developed-by: zhengbing.huang <zhengbing.huang at easystack.cn>

diff --git a/drbd/drbd_state.c b/drbd/drbd_state.c
index e35150340..316d3fd39 100644
--- a/drbd/drbd_state.c
+++ b/drbd/drbd_state.c
@@ -4189,7 +4189,7 @@ static int w_after_state_change(struct drbd_work
*w, int unused)
                       still_connected = true;
       }

-       if (!susp_uuid[OLD] && susp_uuid[NEW]) {
+       if (susp_uuid[NEW]) {
               unsigned long irq_flags;

               begin_state_change(resource, &irq_flags, CS_VERBOSE);


I give this to internal review for merging.

best regards,
 Philipp


On Wed, Nov 22, 2023 at 4:25 AM zhengbing.huang
<zhengbing.huang at easystack.cn> wrote:
>
> The problem scenario is as follows:
>
> 1. drbd is built on two nodes, role is primary and secondary, quorum is 2.
>    then drbd's network fails. IO will be suspended.
> 2. primary modify quorum to 1, during this state change,
>    drbd will set susp_uuid[NEW] to true and generate a new UUID.
> 3. then in w_after_state_change, start the second state change,
>    set susp_uuid[NEW] to false. but during the second state change,
>    it's possible to find NEW_CUR_UUID flag was set by others.
>    then sanitize_state() will set susp_uuid[NEW] to true.
>
> Finally susp_uuid value is {true, true}, IO is frozen.
> And there is no way to set susp_uuid to false after that.
>
> So, while susp_uuid[NEW] is set to true, we want susp_uuid[OLD] to be false.
>
> Fixes: d47f7456ab ("drbd: create new UUID before resuming IO upon regaining quorum")
> Signed-off-by: zhengbing.huang <zhengbing.huang at easystack.cn>
> ---
>  drbd/drbd_state.c | 1 +
>  1 file changed, 1 insertion(+)
>
> diff --git a/drbd/drbd_state.c b/drbd/drbd_state.c
> index e35150340..0dedd2dae 100644
> --- a/drbd/drbd_state.c
> +++ b/drbd/drbd_state.c
> @@ -2356,6 +2356,7 @@ static void sanitize_state(struct drbd_resource *resource)
>         if (resource_is_suspended(resource, OLD) && !resource_is_suspended(resource, NEW)) {
>                 idr_for_each_entry(&resource->devices, device, vnr) {
>                         if (test_bit(NEW_CUR_UUID, &device->flags)) {
> +                               resource->susp_uuid[OLD] = false;
>                                 resource->susp_uuid[NEW] = true;
>                                 break;
>                         }
> --
> 2.17.1
>


More information about the drbd-dev mailing list