On Sat, Mar 15, 2014 at 10:25:51AM +0900, 谷萩毅之 wrote:
> Hi,
>
>
> I use DRBD version 8.3.15 with the configuration
> "net - on-congestion = pull-ahead"
> "protocol C"
> The connection state becomes "Ahead" and never resyncs again.
> This is the "Ahead Stuck" problem.
>
> Fortunately, I can reproduce this problem within 3 minutes using two
> "fio" I/O generators on virtual machines on
> VMFS5 (ESXi Server v5.0u1).
>
>
> So I analyzed the Ahead->SourceSync mechanism by running the code.
>
> The function got_BarrierAck (drbd_receiver.c) is where DRBD
> tries to resync again.
>
> But when mdev->ap_in_flight is negative, it cannot resync again.
>
> I modified the source "drbd_receiver.c" and the problem goes away
> with this I/O pattern.
>
>
> The value "mdev->ap_in_flight" is used only for
> "congestion fill", I think. Is this right?
>
>
> Please check the difference below!
Well, thank you.
Your workaround makes the symptom go away.
The fix should be to figure out why
ap_in_flight can become negative.
That should not be possible.
Would you try to reproduce with current drbd 8.4 git head,
so we know whether this is still an issue?
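One way to narrow down where the counter goes negative is to check the result of every decrement. The following is only a minimal userspace sketch in C11, not DRBD code: model_ap_in_flight, underflow_count, model_add_in_flight and model_sub_in_flight are hypothetical names, and kernel code would use atomic_t and its helpers instead of <stdatomic.h>.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical stand-in for mdev->ap_in_flight; not the DRBD data structure. */
static atomic_int model_ap_in_flight;
/* Counts decrements that pushed the counter below zero (illustration only). */
static atomic_int underflow_count;

/* Account for n units of newly submitted application data. */
static void model_add_in_flight(int n)
{
    atomic_fetch_add(&model_ap_in_flight, n);
}

/*
 * Release n units. Returns false if the counter dropped below zero,
 * meaning this caller released more than had been accounted for.
 */
static bool model_sub_in_flight(int n)
{
    /* atomic_fetch_sub returns the value held before the subtraction. */
    int old = atomic_fetch_sub(&model_ap_in_flight, n);

    if (old - n < 0) {
        atomic_fetch_add(&underflow_count, 1);
        return false;
    }
    return true;
}
```

A balanced add/sub pair leaves the counter at zero. A second release of the same amount returns false and leaves the counter negative, which is the state reported above.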
> ==[[Difference]]=================================
Also, please always use "unified" diffs (diff -u)
or better yet, git diff...
> # diff -c drbd_receiver.c.old drbd_receiver.c
> + /* Fix Ahead Stuck Problem */
>   STATIC int got_BarrierAck(struct drbd_conf *mdev, struct p_header80 *h)
>   {
>       struct p_barrier_ack *p = (struct p_barrier_ack *)h;
>
>       tl_release(mdev, p->barrier, be32_to_cpu(p->set_size));
>
> +     /*[Debug] Output Detail */
> +     dev_info( DEV, "got_BarrierACK(state.conn=%d,ap_in_flight=%d)\n",
> +         mdev->state.conn, atomic_read(&mdev->ap_in_flight) );
> +
>       if (mdev->state.conn == C_AHEAD &&
> !         /* ap_in_flight is sometime less than zero */
> !         /* atomic_read(&mdev->ap_in_flight) == 0 && */
> !         atomic_read(&mdev->ap_in_flight) <= 0 &&
>           !drbd_test_and_set_flag(mdev, AHEAD_TO_SYNC_SOURCE)) {
> +
> +         /* Reset ap_in_flight into zero */
> +         atomic_set(&mdev->ap_in_flight, 0);
> +
>           mdev->start_resync_timer.expires = jiffies + HZ;
>           add_timer(&mdev->start_resync_timer);
> +
> +         /*[Debug] resync timer start*/
> +         dev_info(DEV,"start resync timer!!\n" );
> +
>       }
>
>       return true;
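To illustrate why the "== 0" test never fires once the counter is negative, and what the "<= 0" workaround changes, here is a small userspace model in C11. It is a sketch under assumptions, not DRBD code: struct model_dev and the model_* functions are hypothetical, and the two booleans only stand in for state.conn == C_AHEAD and the AHEAD_TO_SYNC_SOURCE flag.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical model of the fields involved; not struct drbd_conf. */
struct model_dev {
    atomic_int ap_in_flight;   /* models mdev->ap_in_flight */
    bool ahead;                /* models state.conn == C_AHEAD */
    bool resync_scheduled;     /* models the AHEAD_TO_SYNC_SOURCE flag */
};

static void model_init(struct model_dev *d, int in_flight)
{
    atomic_init(&d->ap_in_flight, in_flight);
    d->ahead = true;
    d->resync_scheduled = false;
}

/* Original condition: only an exact zero schedules the resync. */
static bool model_barrier_ack_strict(struct model_dev *d)
{
    if (d->ahead && atomic_load(&d->ap_in_flight) == 0 &&
        !d->resync_scheduled) {
        d->resync_scheduled = true;
        return true;
    }
    return false;
}

/* Workaround condition: accept <= 0 and clamp the counter back to zero. */
static bool model_barrier_ack_clamped(struct model_dev *d)
{
    if (d->ahead && atomic_load(&d->ap_in_flight) <= 0 &&
        !d->resync_scheduled) {
        atomic_store(&d->ap_in_flight, 0);
        d->resync_scheduled = true;
        return true;
    }
    return false;
}
```

With the counter at -1, the strict variant never schedules a resync, while the clamped variant does and resets the counter to zero. That hides the symptom; the unbalanced decrement that produced the negative value is still there.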
--
: Lars Ellenberg
: LINBIT | Your Way to High Availability
: DRBD/HA support and consulting http://www.linbit.com
DRBD® and LINBIT® are registered trademarks of LINBIT, Austria.
__
please don't Cc me, but send to list -- I'm subscribed