<div style="line-height:1.7;color:#000000;font-size:14px;font-family:Arial"><pre>Hi Philipp,</pre><br>The third point about modification is that it may not cover all scenarios.<br><br>In w_after_state_change(), between generating the new current_uuid and<br>clearing the susp_uuid, it is possible that another thread set the __NEW_CUR_UUID flag.<br><br>The problem scenario is as follows:<br><br>T1:<br>begin_state_change()<br>end_state_change()<br> __end_state_change()<br> ___end_state_change()<br> # susp_uuid value is {True, True}<br> T2: w_after_state_change()<br><div> drbd_uuid_new_current()</div><div><br></div> T3:<br> begin_state_change()<br> # set __NEW_CUR_UUID flag<br> end_state_change()<br><br> begin_state_change()<br> # set susp_uuid[NEW] = False<br> end_state_change()<br><br>In the end, the NEW_CUR_UUID flag bit is generated and not cleaned up.<br><br>Therefore, the third point about modification can be change to this:<br><br>diff --git a/drbd/drbd_state.c b/drbd/drbd_state.c<br>index e35150340..2101c0ebc 100644<br>--- a/drbd/drbd_state.c<br>+++ b/drbd/drbd_state.c<br>@@ -2553,6 +2553,7 @@ static bool should_try_become_up_to_date(struct drbd_device *device, enum drbd_d<br> static void finish_state_change(struct drbd_resource *resource, const char *tag)<br> {<br> enum drbd_role *role = resource->role;<br>+ bool *susp_uuid = resource->susp_uuid;<br> struct drbd_device *device;<br> struct drbd_connection *connection;<br> bool starting_resync = false;<br>@@ -2828,7 +2829,7 @@ static void finish_state_change(struct drbd_resource *resource, const char *tag)<br> if (role[OLD] == R_SECONDARY && role[NEW] == R_PRIMARY)<br> create_new_uuid = true;<br> <br>- if (create_new_uuid)<br>+ if (create_new_uuid && !susp_uuid[OLD])<br> set_bit(__NEW_CUR_UUID, &device->flags);<br> <br> if (disk_state[NEW] != D_NEGOTIATING && get_ldev_if_state(device, D_DETACHING)) {<br><br><div>best regards,</div><div>Zhengbing<br></div><div><br></div><br><br><div style="position:relative;zoom:1"></div><br><pre><br>From: Philipp Reisner <philipp.reisner@linbit.com>
Date: 2023-11-25 15:47:16
To: "黄正兵" <zhengbing.huang@easystack.cn>
Cc: dongsheng.yang@easystack.cn,drbd-dev@lists.linbit.com
Subject: Re: Re: [PATCH] drbd: when change susp_uuid[NEW] to true, make sure susp_uuid[OLD] is false>Hi Zhengbing,
>
>Lars and I came up with this:
>
>commit e31d63966fac73fc000b584ccb2580016eccda17
>Author: Philipp Reisner <philipp.reisner@linbit.com>
>Date: Thu Nov 23 06:31:28 2023 +0100
>
> drbd: fix susp_uuid clearing
>
> I introduced susp_uuid for keeping a resource longer suspended for
> writing a new current-uuid.
> But in the following scenario, it failed to clear the susp_uuid bit:
>
> 1. The network connection between a primary and secondary node
> fails. The primary has the setting quorum=2. IO freezes on the
> primary node.
>
> 2. The user changes the quorum setting from 2 to one. The primary
> node generates a new current-uuid and clears the susp_uuid bit in
> a second state change.
>
> 3. If, during the second state change, another condition for creating
> a new current UUID evaluates to true, the code fails to clear the
> suspen_uuid bit.
>
> Fixing this in three places:
> 1. in sanitize_state(): Only set susp_uuid[NEW] if susp_uuid[OLD]
> was not set. That avoids it happening a second time after
> one iteration.
>
> 2. in w_after_state_chage(): Always end the cycle by setting clearing
> susp_uuid[NEW] if it is set. (Independently of susp_uuid[OLD].
>
> 3. in finish_sate_change(): Do not set NEW_CUR_UUID again if we are
> in this operation.
>
> Strictly speaking, (2) is unnecessary after (1), but I also do it
> since it shortens the code. Also, (3) alone would fix the issue.
>
> Fixes: d47f7456ab ("drbd: create new UUID before resuming IO upon
> regaining quorum")
>
> Co-developed-by: zhengbing.huang <zhengbing.huang@easystack.cn>
> Co-developed-by: Lars Ellenberg <lars.ellenberg@linbit.com>
>
>diff --git a/drbd/drbd_state.c b/drbd/drbd_state.c
>index e35150340..b884bb622 100644
>--- a/drbd/drbd_state.c
>+++ b/drbd/drbd_state.c
>@@ -2353,7 +2353,8 @@ static void sanitize_state(struct drbd_resource *resource)
> resource->susp_quorum[NEW] =
> resource->res_opts.on_no_quorum == ONQ_SUSPEND_IO ?
>!resource_has_quorum : false;
>
>- if (resource_is_suspended(resource, OLD) &&
>!resource_is_suspended(resource, NEW)) {
>+ if (!resource->susp_uuid[OLD] &&
>+ resource_is_suspended(resource, OLD) &&
>!resource_is_suspended(resource, NEW)) {
> idr_for_each_entry(&resource->devices, device, vnr) {
> if (test_bit(NEW_CUR_UUID, &device->flags)) {
> resource->susp_uuid[NEW] = true;
>@@ -2553,6 +2554,7 @@ static bool should_try_become_up_to_date(struct
>drbd_device *device, enum drbd_d
>static void finish_state_change(struct drbd_resource *resource, const char *tag)
>{
> enum drbd_role *role = resource->role;
>+ bool *susp_uuid = resource->susp_uuid;
> struct drbd_device *device;
> struct drbd_connection *connection;
> bool starting_resync = false;
>@@ -2828,7 +2830,10 @@ static void finish_state_change(struct
>drbd_resource *resource, const char *tag)
> if (role[OLD] == R_SECONDARY && role[NEW] == R_PRIMARY)
> create_new_uuid = true;
>
>- if (create_new_uuid)
>+ /* When susp_uuid goes from true to false, we just created a new
>+ * current-uuid, it is pointless to do this one more time
>+ */
>+ if (create_new_uuid && !(susp_uuid[OLD] && !susp_uuid[NEW]))
> set_bit(__NEW_CUR_UUID, &device->flags);
>
> if (disk_state[NEW] != D_NEGOTIATING &&
>get_ldev_if_state(device, D_DETACHING)) {
>@@ -4189,7 +4194,7 @@ static int w_after_state_change(struct drbd_work
>*w, int unused)
> still_connected = true;
> }
>
>- if (!susp_uuid[OLD] && susp_uuid[NEW]) {
>+ if (susp_uuid[NEW]) {
> unsigned long irq_flags;
>
> begin_state_change(resource, &irq_flags, CS_VERBOSE);
>
>On Thu, Nov 23, 2023 at 9:48 AM 黄正兵 <zhengbing.huang@easystack.cn> wrote:
>>
>> Hi Philipp,
>>
>> Such modifications may cause new problems.
>>
>> If, during the second state change, the NEW_CUR_UUID flag was not cleared.
>> clears the susp_uuid bit in a thirdly state change.
>>
>> Eventually, it becomes an endless cycle.
>>
>> So, is it possible to add the following modifications:
>>
>> diff --git a/drbd/drbd_state.c b/drbd/drbd_state.c
>> index 2fbd0b1..6bd6431 100644
>> --- a/drbd/drbd_state.c
>> +++ b/drbd/drbd_state.c
>> @@ -3938,9 +3938,13 @@ static int w_after_state_change(struct drbd_work *w, int unused)
>> drbd_maybe_khelper(device, connection, "disconnected");
>> }
>>
>> - if (!susp_uuid[OLD] && susp_uuid[NEW] &&
>> - test_and_clear_bit(NEW_CUR_UUID, &device->flags))
>> - new_current_uuid = true;
>> + if (susp_uuid[NEW]) {
>> + if (susp_uuid[OLD])
>> + clear_bit(NEW_CUR_UUID, &device->flags);
>> + else if (test_and_clear_bit(NEW_CUR_UUID, &device->flags))) {
>> + new_curr_uuid = true;
>> + }
>> + }
>>
>> if (new_current_uuid)
>> drbd_uuid_new_current(device, false);
>>
>> best regards,
>> Zhengbing
>>
>>
>>
>>
>>
>> From: Philipp Reisner <philipp.reisner@linbit.com>
>> Date: 2023-11-23 14:06:30
>> To: "zhengbing.huang" <zhengbing.huang@easystack.cn>
>> Cc: drbd-dev@lists.linbit.com,joel.colledge@linbit.com
>> Subject: Re: [PATCH] drbd: when change susp_uuid[NEW] to true, make sure susp_uuid[OLD] is false>Hello Zhengbing,
>> >
>> >Thank you for pointing this out. I created a little bit different patch.
>> >First, the sanitize_state() function should clean up the new state,
>> >but I think it should never change the old state.
>> >It can not rewrite history.
>> >Second, we do not want to create two new current-uuids while I/O is frozen.
>> >
>> > drbd: fix susp_uuid clearing
>> >
>> > I introduced susp_uuid for keeping a resource longer suspended for
>> > writing a new current-uuid.
>> > But in the following scenario, it failed to clear the susp_uuid bit:
>> >
>> > 1. The network connection between a primary and secondary node
>> > fails. The primary has the setting quorum=2. IO freezes on the
>> > primary node.
>> >
>> > 2. The user changes the quorum setting from 2 to one. The primary
>> > node generates a new current-uuid and clears the susp_uuid bit in a
>> > second state change.
>> >
>> > 3. If, during the second state change, another condition for creating
>> > a new current UUID evaluates to true, the code fails to clear the
>> > suspen_uuid bit.
>> >
>> > The second attempt to create a new current-uuid gets ignored since it
>> > is pointless to create two current-uuids while I/O is frozen.
>> >
>> > Fixes: d47f7456ab ("drbd: create new UUID before resuming IO upon
>> > regaining quorum")
>> >
>> > Co-developed-by: zhengbing.huang <zhengbing.huang@easystack.cn>
>> >
>> >diff --git a/drbd/drbd_state.c b/drbd/drbd_state.c
>> >index e35150340..316d3fd39 100644
>> >--- a/drbd/drbd_state.c
>> >+++ b/drbd/drbd_state.c
>> >@@ -4189,7 +4189,7 @@ static int w_after_state_change(struct drbd_work
>> >*w, int unused)
>> > still_connected = true;
>> > }
>> >
>> >- if (!susp_uuid[OLD] && susp_uuid[NEW]) {
>> >+ if (susp_uuid[NEW]) {
>> > unsigned long irq_flags;
>> >
>> > begin_state_change(resource, &irq_flags, CS_VERBOSE);
>> >
>> >
>> >I give this to internal review for merging.
>> >
>> >best regards,
>> > Philipp
>> >
>> >
>> >On Wed, Nov 22, 2023 at 4:25 AM zhengbing.huang
>> ><zhengbing.huang@easystack.cn> wrote:
>> >>
>> >> The problem scenario is as follows:
>> >>
>> >> 1. drbd is built on two nodes, role is primary and secondary, quorum is 2.
>> >> then drbd's network fails. IO will be suspended.
>> >> 2. primary modify quorum to 1, during this state change,
>> >> drbd will set susp_uuid[NEW] to true and generate a new UUID.
>> >> 3. then in w_after_state_change, start the second state change,
>> >> set susp_uuid[NEW] to false. but during the second state change,
>> >> it's possible to find NEW_CUR_UUID flag was set by others.
>> >> then sanitize_state() will set susp_uuid[NEW] to true.
>> >>
>> >> Finally susp_uuid value is {true, true}, IO is frozen.
>> >> And there is no way to set susp_uuid to false after that.
>> >>
>> >> So, while susp_uuid[NEW] is set to true, we want susp_uuid[OLD] to be false.
>> >>
>> >> Fixes: d47f7456ab ("drbd: create new UUID before resuming IO upon regaining quorum")
>> >> Signed-off-by: zhengbing.huang <zhengbing.huang@easystack.cn>
>> >> ---
>> >> drbd/drbd_state.c | 1 +
>> >> 1 file changed, 1 insertion(+)
>> >>
>> >> diff --git a/drbd/drbd_state.c b/drbd/drbd_state.c
>> >> index e35150340..0dedd2dae 100644
>> >> --- a/drbd/drbd_state.c
>> >> +++ b/drbd/drbd_state.c
>> >> @@ -2356,6 +2356,7 @@ static void sanitize_state(struct drbd_resource *resource)
>> >> if (resource_is_suspended(resource, OLD) && !resource_is_suspended(resource, NEW)) {
>> >> idr_for_each_entry(&resource->devices, device, vnr) {
>> >> if (test_bit(NEW_CUR_UUID, &device->flags)) {
>> >> + resource->susp_uuid[OLD] = false;
>> >> resource->susp_uuid[NEW] = true;
>> >> break;
>> >> }
>> >> --
>> >> 2.17.1
>> >>
>>
>>
</pre></div><br>