Hi,

I use DRBD 8.3.15 with "on-congestion pull-ahead" in the net section and
"protocol C". The connection state goes to "Ahead" and never resyncs again
(an "Ahead stuck" problem).

Fortunately, I can reproduce this problem within 3 minutes by running two
"fio" I/O generators on virtual machines on VMFS5 (ESXi Server v5.0u1).

So I analyzed the Ahead -> SyncSource mechanism by running the code.
The function got_BarrierAck (drbd_receiver.c) tries to start the resync again.
But when mdev->ap_in_flight has a negative value, the "== 0" check never
matches and the resync cannot start again.

I modified drbd_receiver.c, and the problem no longer occurs with this
I/O pattern.

I think the value mdev->ap_in_flight is used only for the "congestion-fill"
check. Is this right?

Please check the difference below.

==[[Difference]]=================================
# diff -c drbd_receiver.c.old drbd_receiver.c
*** drbd_receiver.c.old
--- drbd_receiver.c
***************
*** 4815,4831 ****
  	return true;
  }
  
  STATIC int got_BarrierAck(struct drbd_conf *mdev, struct p_header80 *h)
  {
  	struct p_barrier_ack *p = (struct p_barrier_ack *)h;
  
  	tl_release(mdev, p->barrier, be32_to_cpu(p->set_size));
  
  	if (mdev->state.conn == C_AHEAD &&
! 	    atomic_read(&mdev->ap_in_flight) == 0 &&
  	    !drbd_test_and_set_flag(mdev, AHEAD_TO_SYNC_SOURCE)) {
  		mdev->start_resync_timer.expires = jiffies + HZ;
  		add_timer(&mdev->start_resync_timer);
  	}
  
  	return true;
--- 4815,4846 ----
  	return true;
  }
  
+ /* Fix Ahead Stuck Problem */
  STATIC int got_BarrierAck(struct drbd_conf *mdev, struct p_header80 *h)
  {
  	struct p_barrier_ack *p = (struct p_barrier_ack *)h;
  
  	tl_release(mdev, p->barrier, be32_to_cpu(p->set_size));
  
+ 	/* [Debug] Output detail */
+ 	dev_info(DEV, "got_BarrierACK(state.conn=%d,ap_in_flight=%d)\n",
+ 		 mdev->state.conn, atomic_read(&mdev->ap_in_flight));
+
  	if (mdev->state.conn == C_AHEAD &&
! 	    /* ap_in_flight is sometimes less than zero */
! 	    /* atomic_read(&mdev->ap_in_flight) == 0 && */
! 	    atomic_read(&mdev->ap_in_flight) <= 0 &&
  	    !drbd_test_and_set_flag(mdev, AHEAD_TO_SYNC_SOURCE)) {
+
+ 		/* Reset ap_in_flight to zero */
+ 		atomic_set(&mdev->ap_in_flight, 0);
+
  		mdev->start_resync_timer.expires = jiffies + HZ;
  		add_timer(&mdev->start_resync_timer);
+
+ 		/* [Debug] resync timer start */
+ 		dev_info(DEV, "start resync timer!!\n");
+
  	}
  
  	return true;

Regards
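
P.S. The difference between the two checks can be illustrated with a minimal
user-space sketch. This is not DRBD code: the enum and the function names
(conn_state, resync_allowed_strict, resync_allowed_relaxed, try_start_resync)
are hypothetical stand-ins, and a plain int is assumed in place of the kernel
atomic_t. It only shows why a counter that has gone negative can never satisfy
"== 0", and it says nothing about why the counter goes negative in the first
place.

```c
#include <stdbool.h>

/* Hypothetical stand-ins for the connection states; not the real DRBD enum. */
enum conn_state { CONN_CONNECTED, CONN_AHEAD };

/* Original check: resync is allowed only when the counter is exactly zero. */
static bool resync_allowed_strict(enum conn_state conn, int ap_in_flight)
{
	return conn == CONN_AHEAD && ap_in_flight == 0;
}

/* Patched check: a negative counter is treated as "nothing in flight". */
static bool resync_allowed_relaxed(enum conn_state conn, int ap_in_flight)
{
	return conn == CONN_AHEAD && ap_in_flight <= 0;
}

/*
 * Mirrors the shape of the patched branch: if the relaxed check passes,
 * reset the counter to zero and report that the resync timer would be armed.
 */
static bool try_start_resync(enum conn_state conn, int *ap_in_flight)
{
	if (!resync_allowed_relaxed(conn, *ap_in_flight))
		return false;
	*ap_in_flight = 0;
	return true;
}
```

With a counter of -2 in the Ahead state, resync_allowed_strict returns false
on every call, which matches the stuck behaviour, while try_start_resync
returns true and leaves the counter at 0.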