[Drbd-dev] Fwd: Bug Report: meet an unexpected WFBitMapS status after restarting the primary
Dongsheng Yang
dongsheng081251 at gmail.com
Thu Mar 5 14:14:00 CET 2020
Adding the linux-block mailing list...
---------- Forwarded message ---------
From: Dongsheng Yang <dongsheng081251 at gmail.com>
Date: Thu, Feb 6, 2020, 9:44 AM
Subject: Fwd: Bug Report: meet an unexpected WFBitMapS status after
restarting the primary
To: <lars.ellenberg at linbit.com>, <philipp.reisner at linbit.com>,
<linux-block at vger.kernel.org>, <joel.colledge at linbit.com>,
<drbd-dev at lists.linbit.com>
Cc: <duan.zhang at easystack.cn>
Hi Philipp and Lars,
Any suggestions?
Thanx
---------- Forwarded message ---------
From: Dongsheng Yang <dongsheng081251 at gmail.com>
Date: Wed, Feb 5, 2020, 7:06 PM
Subject: Bug Report: meet an unexpected WFBitMapS status after
restarting the primary
To: <joel.colledge at linbit.com>
Cc: <drbd-dev at lists.linbit.com>, <duan.zhang at easystack.cn>
Hi guys,
Version: drbd-9.0.21-1
Layout: drbd.res spanning 3 nodes -- node-1 (Secondary), node-2 (Primary),
node-3 (Secondary)
Description:
a. Reboot node-2 while the cluster is working.
b. Re-up drbd.res on node-2 after it has restarted.
c. An expected resync from node-3 to node-2 happens. When the resync is
done, however, node-1 enters an unexpected WFBitMapS replication status
and cannot recover to normal.
Status output:

node-1: drbdadm status
drbd6 role:Secondary
  disk:UpToDate
  hotspare connection:Connecting
  node-2 role:Primary
    replication:WFBitMapS peer-disk:Consistent
  node-3 role:Secondary
    peer-disk:UpToDate

node-2: drbdadm status
drbd6 role:Primary
  disk:UpToDate
  hotspare connection:Connecting
  node-1 role:Secondary
    peer-disk:UpToDate
  node-3 role:Secondary
    peer-disk:UpToDate
Based on my reading of the source for this version, I assume the
following sequence of events:

1. node-2 restarts with CRASHED_PRIMARY set.
2. A resync starts, with node-2 as target and node-3 as source.
3. The resync between node-2 and node-3 completes on both sides.
4. node-2 runs w_after_state_change, iterating over its peers:
   - iteration 1, against node-1: (a) node-2 sends its uuids, with
     UUID_FLAG_GOT_STABLE and CRASHED_PRIMARY, to node-1.
   - iteration 2, against node-3: (b) node-2 clears CRASHED_PRIMARY.
5. node-1, in receive_uuids10, receives node-2's uuids with
   CRASHED_PRIMARY still set, and replies with its own uuids, with
   UUID_FLAG_RESYNC, to node-2.
6. node-2 receives node-1's uuids (receive_uuids10); its sync_handshake
   resolves to NO_SYNC.
7. node-1's sync_handshake resolves to SYNC_SOURCE_IF_BOTH_FAILED, so
   node-1 changes its repl state toward node-2 to WFBitMapS.
The key problem is the ordering of steps (a) and (b): node-2 sends
CRASHED_PRIMARY to node-1 even though, having just resynced with node-3,
it is no longer actually a crashed primary.
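To make the ordering concrete, here is a minimal Python sketch -- my own
illustration, not DRBD code; the names (Node, notify_peers) are invented --
of how a per-peer notification loop that clears a node-wide flag only in a
later iteration can leak the stale flag to a peer visited earlier:

```python
# Illustrative sketch of the suspected ordering bug, not actual DRBD
# internals. All names here are hypothetical.

CRASHED_PRIMARY = 0x1

class Node:
    def __init__(self, name, flags=0):
        self.name = name
        self.flags = flags
        self.inbox = {}  # peer name -> flags received from that peer

    def notify_peers(self, peers, resynced_peer):
        # Models the suspected w_after_state_change loop on node-2:
        # iterate over peers in order, sending our current flags to each.
        for peer in peers:
            if peer is resynced_peer:
                # (b) the flag is cleared only when we reach the peer we
                # resynced with -- too late for peers visited earlier.
                self.flags &= ~CRASHED_PRIMARY
            # (a) earlier iterations send the still-set flag.
            peer.inbox[self.name] = self.flags

node1 = Node("node-1")
node3 = Node("node-3")
node2 = Node("node-2", flags=CRASHED_PRIMARY)  # rebooted primary

# node-2 has just finished resyncing from node-3; it notifies its peers
# in order [node-1, node-3].
node2.notify_peers([node1, node3], resynced_peer=node3)

print(node1.inbox["node-2"] & CRASHED_PRIMARY)  # stale flag leaked to node-1
print(node3.inbox["node-2"] & CRASHED_PRIMARY)  # node-3 sees the flag cleared
```

If this sketch matches the real control flow, node-1 bases its handshake
decision on a CRASHED_PRIMARY bit that is already obsolete by the time it
acts on it.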
So I have the following questions:
a. Is this really a bug, or the expected result?
b. Is there already a fix for this in the newest version?
c. Is there a workaround for this kind of unexpected status? I have run
into quite a few other problems like this :(