On Sat, Mar 15, 2014 at 10:25:51AM +0900, 谷萩毅之 wrote:
> Hi,
>
> I use DRBD ver 8.3.15 with the configurations
> "net - on-congestion = pull-ahead" and
> "protocol C".
> The connection state stays "Ahead" and never resyncs again:
> the "Ahead Stuck" problem.
>
> Fortunately, I can reproduce this problem within 3 minutes using two
> "fio" I/O generators on virtual machines on
> VMFS5 (ESXi Server v5.0u1).
>
> So, I analyzed the Ahead->SyncSource mechanism by running the code.
>
> The function got_BarrierAck (drbd_receiver.c) is what
> tries to resync again.
>
> But when mdev->ap_in_flight has a negative value, it cannot resync again.
>
> I modified the source "drbd_receiver.c", and with that I/O pattern
> the problem went away.
>
> The value "mdev->ap_in_flight" is used only for
> "congestion fill", I think. Is this right?
>
> Please check the difference below!!

Well, thank you.
Your workaround makes the symptom go away.

The fix should be to figure out why ap_in_flight can become negative.
That should not be possible.

Would you try to reproduce with current drbd 8.4 git head,
so we know whether this is still an issue?

> ==[[Difference]]=================================

Also, please always use "unified" diffs (diff -u),
or better yet, git diff...

> # diff -c drbd_receiver.c.old drbd_receiver.c
> + /* Fix Ahead Stuck Problem */
>   STATIC int got_BarrierAck(struct drbd_conf *mdev, struct p_header80 *h)
>   {
>       struct p_barrier_ack *p = (struct p_barrier_ack *)h;
>
>       tl_release(mdev, p->barrier, be32_to_cpu(p->set_size));
>
> +     /*[Debug] Output Detail */
> +     dev_info( DEV, "got_BarrierACK(state.conn=%d,ap_in_flight=%d)\n",
> +         mdev->state.conn, atomic_read(&mdev->ap_in_flight) );
> +
>       if (mdev->state.conn == C_AHEAD &&
> !         /* ap_in_flight is sometime less than zero */
> !         /* atomic_read(&mdev->ap_in_flight) == 0 && */
> !         atomic_read(&mdev->ap_in_flight) <= 0 &&
>           !drbd_test_and_set_flag(mdev, AHEAD_TO_SYNC_SOURCE)) {
> +
> +         /* Reset ap_in_flight into zero */
> +         atomic_set(&mdev->ap_in_flight, 0);
> +
>           mdev->start_resync_timer.expires = jiffies + HZ;
>           add_timer(&mdev->start_resync_timer);
> +
> +         /*[Debug] resync timer start*/
> +         dev_info(DEV,"start resync timer!!\n" );
> +
>       }
>
>       return true;

-- 
: Lars Ellenberg
: LINBIT | Your Way to High Availability
: DRBD/HA support and consulting http://www.linbit.com

DRBD® and LINBIT® are registered trademarks of LINBIT, Austria.
__
please don't Cc me, but send to list -- I'm subscribed
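Why a negative ap_in_flight points at a bug rather than at something to clamp: assuming, as its use for "congestion fill" suggests, that the counter is raised when a write is sent to the peer and lowered once when that write is no longer in flight, it can only go below zero if some request is subtracted twice or never added. Forcing it back to zero in got_BarrierAck hides that. The following is a minimal userspace sketch of such a counter with an underflow check, using C11 atomics instead of the kernel's atomic_t; inflight_counter, inflight_add, inflight_sub and inflight_drained are hypothetical names for illustration, not DRBD code.

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical stand-in for mdev->ap_in_flight (not DRBD's real type). */
struct inflight_counter {
    atomic_int value;      /* amount of data currently "in flight" */
    atomic_int underflows; /* how often a subtraction went below zero */
};

static void inflight_init(struct inflight_counter *c)
{
    atomic_init(&c->value, 0);
    atomic_init(&c->underflows, 0);
}

/* Account for a request that was sent to the peer. */
static void inflight_add(struct inflight_counter *c, int size)
{
    atomic_fetch_add(&c->value, size);
}

/* Account for a request that is no longer in flight.
 * Returns false, and records the event, if the counter went negative:
 * that means some request was subtracted twice or never added. */
static bool inflight_sub(struct inflight_counter *c, int size)
{
    int old = atomic_fetch_sub(&c->value, size); /* returns previous value */
    if (old - size < 0) {
        atomic_fetch_add(&c->underflows, 1);
        fprintf(stderr, "in-flight underflow: %d - %d\n", old, size);
        return false;
    }
    return true;
}

/* The original "== 0" condition checked before leaving Ahead. */
static bool inflight_drained(struct inflight_counter *c)
{
    return atomic_load(&c->value) == 0;
}
```

With a balanced add/sub pair the counter drains to zero; a second subtraction of the same request is reported at the moment it happens and leaves the counter negative, so the "== 0" check never becomes true again, which matches the stuck-in-Ahead symptom. Reporting at the subtraction site, rather than resetting where the value is read, is what would lead to the responsible code path.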