[DRBD-user] Ahead stuck problem

Lars Ellenberg lars.ellenberg at linbit.com
Sun Mar 16 15:48:52 CET 2014



On Sat, Mar 15, 2014 at 10:25:51AM +0900, 谷萩毅之 wrote:
> Hi,
> 
> 
> I use DRBD version 8.3.15 with these settings:
>   "net - on-congestion = pull-ahead"
>   "protocol C"
> The connection state goes to "Ahead" and never resyncs again.
> This is the "Ahead stuck" problem.
> 
> Fortunately, I can reproduce this problem within 3 minutes using two
> "fio" I/O generators on virtual machines on
> VMFS5 (ESXi Server v5.0u1).
> 
> 
> So, I analyzed the Ahead -> SyncSource mechanism by running the code.
> 
> The function got_BarrierAck() (drbd_receiver.c) is where
> DRBD tries to start the resync again.
> 
> But when mdev->ap_in_flight has a negative value, the resync is never started.
> 
> I modified drbd_receiver.c, and with that change the problem
> no longer occurs under this I/O pattern.
> 
> 
> I think the value "mdev->ap_in_flight" is used only for the
> "congestion-fill" check. Is this right?
> 
> 
> Please check the diff below!

Well, thank you.
Your workaround makes the symptom go away.
The fix should be to figure out why
ap_in_flight can become negative.
That should not be possible.

Would you try to reproduce with current drbd 8.4 git head,
so we know whether this is still an issue?

> ==[[Difference]]=================================

Also, please always use "unified" diffs (diff -u)
or better yet, git diff...

> # diff -c drbd_receiver.c.old drbd_receiver.c
> + /* Fix Ahead Stuck Problem */
>   STATIC int got_BarrierAck(struct drbd_conf *mdev, struct p_header80 *h)
>   {
>         struct p_barrier_ack *p = (struct p_barrier_ack *)h;
> 
>         tl_release(mdev, p->barrier, be32_to_cpu(p->set_size));
> 
> + /*[Debug] Output Detail */
> +         dev_info( DEV, "got_BarrierACK(state.conn=%d,ap_in_flight=%d)\n",
> +                   mdev->state.conn, atomic_read(&mdev->ap_in_flight) );
> +
>         if (mdev->state.conn == C_AHEAD &&
> ! /* ap_in_flight is sometimes less than zero */
> ! /*        atomic_read(&mdev->ap_in_flight) == 0 && */
> !           atomic_read(&mdev->ap_in_flight) <= 0 &&
>             !drbd_test_and_set_flag(mdev, AHEAD_TO_SYNC_SOURCE)) {
> +
> + /* Reset ap_in_flight into zero */
> +               atomic_set(&mdev->ap_in_flight, 0);
> +
>                 mdev->start_resync_timer.expires = jiffies + HZ;
>                 add_timer(&mdev->start_resync_timer);
> +
> + /*[Debug] resync timer start*/
> +               dev_info(DEV,"start resync timer!!\n" );
> +
>         }
> 
>         return true;


-- 
: Lars Ellenberg
: LINBIT | Your Way to High Availability
: DRBD/HA support and consulting http://www.linbit.com

DRBD® and LINBIT® are registered trademarks of LINBIT, Austria.
__
please don't Cc me, but send to list   --   I'm subscribed
