[Drbd-dev] [CASE-29] After re-connect, WFBitMapS-WFBitMapT status has sustained continuously and copy command hangs
Lars Ellenberg
lars.ellenberg at linbit.com
Thu Apr 21 15:57:36 CEST 2016
On Thu, Apr 21, 2016 at 10:48:11PM +0900, Jaeheon Kim wrote:
> Hi,
>
> We wrote a temporary workaround to avoid the file copy hang.
> We inserted a wake_up() call on the sender_work queue after _req_mod(req,
> QUEUE_FOR_SEND_OOS, peer_device).
> Please check the following code in the drbd_process_write_request() function.
Please try to send unified diffs,
maybe even git diff output.
>
> drbd_process_write_request()
> {
>     ........
>
>     } else if (drbd_set_out_of_sync(peer_device, req->i.sector, req->i.size))
> #ifdef _WIN32_V9 // Windows DRBD
>     {
>         _req_mod(req, QUEUE_FOR_SEND_OOS, peer_device);
>         if (peer_device->repl_state[NOW] == L_WF_BITMAP_S)
>         {
>             wake_up(&peer_device->connection->sender_work.q_wait);
>         }
>     }
> #else
>         _req_mod(req, QUEUE_FOR_SEND_OOS, peer_device); // Linux Org
> #endif
>
> }
>
> What do you think about this idea?
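You are correct:
if all established replication links are "ahead",
so that not a single link actually gets the data,
we may miss the wake-up of the sender.
drbd_send_and_submit() only calls wake_all_senders()
when drbd_process_write_request() returns true,
so the queued out-of-sync work is never picked up.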
A better fix is probably:
diff --git a/drbd/drbd_req.c b/drbd/drbd_req.c
index 3159de8..ae4bbd6 100644
--- a/drbd/drbd_req.c
+++ b/drbd/drbd_req.c
@@ -1666,8 +1666,7 @@ static void drbd_send_and_submit(struct drbd_device *device, struct drbd_request
 		}
 		if (!drbd_process_write_request(req))
 			no_remote = true;
-		else
-			wake_all_senders(resource);
+		wake_all_senders(resource);
 	} else {
 		if (peer_device) {
 			_req_mod(req, TO_BE_SENT, peer_device);
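To make the missed wake-up concrete, here is a minimal, single-threaded sketch. It is not DRBD code: struct sender, run_sender(), process_write_request(), send_and_submit() and stuck_items() are hypothetical stand-ins that only loosely mirror the names above. It assumes each peer's sender sleeps until woken, that every write queues one work item per peer (data, or an out-of-sync notice for an "ahead" peer), and that only non-ahead peers count as receiving the data. The always_wake flag switches between the old conditional wake and the unconditional wake_all_senders() from the diff.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical, heavily simplified model of the lost sender wake-up.
 * Names loosely mirror DRBD 9, but none of this is DRBD code. */

#define MAX_PEERS 4

struct sender {
    int queued;   /* items sitting on this connection's sender work list */
    bool woken;   /* set by wake_sender(), cleared when the sender runs */
    int sent;     /* items the sender has actually processed */
};

static void wake_sender(struct sender *s) { s->woken = true; }

/* The sender sleeps until woken; only then does it drain its queue. */
static void run_sender(struct sender *s)
{
    if (!s->woken)
        return;
    s->sent += s->queued;
    s->queued = 0;
    s->woken = false;
}

/* Every peer gets a work item (data, or an out-of-sync notice for a peer
 * that is "ahead"), but only non-ahead peers count as receiving data. */
static bool process_write_request(struct sender *peers, const bool *ahead, int n)
{
    int remote = 0;
    for (int i = 0; i < n; i++) {
        peers[i].queued++;
        if (!ahead[i])
            remote++;
    }
    return remote > 0;
}

/* always_wake == false models the old code (wake only if some peer got
 * the data); always_wake == true models the unconditional wake. */
static void send_and_submit(struct sender *peers, const bool *ahead, int n,
                            bool always_wake)
{
    bool remote = process_write_request(peers, ahead, n);
    if (remote || always_wake)
        for (int i = 0; i < n; i++)
            wake_sender(&peers[i]);
}

/* Returns how many queued items are still stuck after each sender runs once. */
static int stuck_items(const bool *ahead, int n, bool always_wake)
{
    assert(n > 0 && n <= MAX_PEERS);
    struct sender peers[MAX_PEERS] = {0};
    send_and_submit(peers, ahead, n, always_wake);
    int stuck = 0;
    for (int i = 0; i < n; i++) {
        run_sender(&peers[i]);
        stuck += peers[i].queued;
    }
    return stuck;
}
```

With two peers that are both "ahead", stuck_items(ahead, 2, false) leaves both out-of-sync items queued, while stuck_items(ahead, 2, true) drains them. If at least one peer is not ahead, both variants wake every sender, which is why the bug only shows up when no link gets the data.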
--
: Lars Ellenberg
: LINBIT | Keeping the Digital World Running
: DRBD -- Heartbeat -- Corosync -- Pacemaker
: R&D, Integration, Ops, Consulting, Support
DRBD® and LINBIT® are registered trademarks of LINBIT