[DRBD-user] Ahead stuck problem

谷萩毅之 yahagi.tgi at gmail.com
Sat Mar 15 02:25:51 CET 2014



Hi,


I use DRBD 8.3.15 with the following configuration:

  net { on-congestion pull-ahead; }

  protocol C;

The connection state goes to "Ahead" and never resyncs again.

I call this the "Ahead stuck" problem.


Fortunately, I can reproduce this problem within 3 minutes, using two
"fio" I/O generators on virtual machines on VMFS5 (ESXi Server v5.0u1).


So I analyzed the Ahead -> SyncSource transition by running the code.


The function got_BarrierAck() in drbd_receiver.c is where DRBD tries to
start the resync again.

But it only does so when mdev->ap_in_flight is exactly 0, so when
mdev->ap_in_flight has a negative value, the resync can never start again.
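To see why a negative counter blocks the resync for good, here is a minimal user-space model of the condition in got_BarrierAck(), comparing the original `== 0` test with the `<= 0` test plus reset used in the patch below. It is a sketch, not DRBD code: struct model_dev, model_init() and the two model_barrier_ack_*() functions are hypothetical names, C11 atomics stand in for the kernel's atomic_t (which wraps a signed int, so it can go below zero), and incrementing timers_armed stands in for add_timer().

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical model of the resync trigger in got_BarrierAck().
 * None of these names are DRBD's; they stand in for mdev->state.conn,
 * mdev->ap_in_flight, the AHEAD_TO_SYNC_SOURCE flag and add_timer(). */
enum model_conn { MODEL_CONNECTED, MODEL_AHEAD };

struct model_dev {
    enum model_conn conn;
    atomic_int ap_in_flight;   /* in-flight application I/O; should never go below 0 */
    atomic_flag ahead_to_sync; /* one-shot guard, like AHEAD_TO_SYNC_SOURCE */
    int timers_armed;          /* how often the simulated resync timer was armed */
};

static void model_init(struct model_dev *d, int in_flight)
{
    d->conn = MODEL_AHEAD;
    atomic_init(&d->ap_in_flight, in_flight);
    atomic_flag_clear(&d->ahead_to_sync);
    d->timers_armed = 0;
}

/* Original 8.3.15 condition: only an exact zero lets the resync start.
 * Because && short-circuits, the one-shot flag is not even tested
 * while the counter is nonzero. */
static void model_barrier_ack_original(struct model_dev *d)
{
    if (d->conn == MODEL_AHEAD &&
        atomic_load(&d->ap_in_flight) == 0 &&
        !atomic_flag_test_and_set(&d->ahead_to_sync))
        d->timers_armed++; /* stands in for add_timer(&start_resync_timer) */
}

/* Condition from the proposed patch: a negative counter also counts as
 * drained, and the counter is reset to zero before the timer is armed. */
static void model_barrier_ack_patched(struct model_dev *d)
{
    if (d->conn == MODEL_AHEAD &&
        atomic_load(&d->ap_in_flight) <= 0 &&
        !atomic_flag_test_and_set(&d->ahead_to_sync)) {
        atomic_store(&d->ap_in_flight, 0);
        d->timers_armed++;
    }
}
```

With the counter stuck at -8, the original condition never arms the timer no matter how many barrier acks arrive; the patched condition arms it exactly once, and later acks are ignored because of the one-shot flag. The model only shows why the check stalls; it does not explain why ap_in_flight goes negative in the first place.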


I modified drbd_receiver.c, and with that change the problem no longer
occurs under the same I/O pattern.


As far as I can tell, the value mdev->ap_in_flight is used only for
"congestion-fill". Is this right?

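If ap_in_flight really only feeds the congestion-fill decision, a drifted counter would mainly shift the point at which pull-ahead triggers. A rough sketch of such a threshold test follows, assuming ap_in_flight and the congestion-fill setting use the same unit and that a congestion-fill of 0 disables the check; congestion_fill_reached() and data_until_pull_ahead() are hypothetical helpers, not DRBD functions.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical congestion-fill test. cong_fill mirrors the congestion-fill
 * option in the net section (assumed: 0 means the check is disabled).
 * Both arguments are assumed to use the same unit, e.g. sectors. */
static bool congestion_fill_reached(int ap_in_flight, int cong_fill)
{
    return cong_fill > 0 && ap_in_flight >= cong_fill;
}

/* How much new in-flight data is needed before pull-ahead triggers,
 * starting from the given counter value. In this model, a counter that
 * has drifted below zero raises the effective threshold by the drift. */
static int data_until_pull_ahead(int ap_in_flight, int cong_fill)
{
    if (cong_fill <= 0)
        return -1;                  /* check disabled: never pulls ahead */
    int needed = cong_fill - ap_in_flight;
    return needed > 0 ? needed : 0; /* 0: already at or over the threshold */
}
```

In this model, a counter at -8 needs 136 units of new in-flight data to reach a threshold of 128, instead of 128, so pull-ahead triggers later than configured.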

Please check the diff below!


==[[Difference]]=================================

# diff -c drbd_receiver.c.old drbd_receiver.c
*** drbd_receiver.c.old
--- drbd_receiver.c
***************
*** 4815,4831 ****
        return true;
  }

  STATIC int got_BarrierAck(struct drbd_conf *mdev, struct p_header80 *h)
  {
        struct p_barrier_ack *p = (struct p_barrier_ack *)h;

        tl_release(mdev, p->barrier, be32_to_cpu(p->set_size));

        if (mdev->state.conn == C_AHEAD &&
!           atomic_read(&mdev->ap_in_flight) == 0 &&
            !drbd_test_and_set_flag(mdev, AHEAD_TO_SYNC_SOURCE)) {
                mdev->start_resync_timer.expires = jiffies + HZ;
                add_timer(&mdev->start_resync_timer);
        }

        return true;
--- 4815,4846 ----
        return true;
  }

+ /* Fix Ahead Stuck Problem */
  STATIC int got_BarrierAck(struct drbd_conf *mdev, struct p_header80 *h)
  {
        struct p_barrier_ack *p = (struct p_barrier_ack *)h;

        tl_release(mdev, p->barrier, be32_to_cpu(p->set_size));

+ /*[Debug] Output Detail */
+         dev_info( DEV, "got_BarrierACK(state.conn=%d,ap_in_flight=%d)\n",
+                   mdev->state.conn, atomic_read(&mdev->ap_in_flight) );
+
        if (mdev->state.conn == C_AHEAD &&
! /* ap_in_flight is sometimes less than zero */
! /*        atomic_read(&mdev->ap_in_flight) == 0 && */
!           atomic_read(&mdev->ap_in_flight) <= 0 &&
            !drbd_test_and_set_flag(mdev, AHEAD_TO_SYNC_SOURCE)) {
+
+ /* Reset ap_in_flight into zero */
+               atomic_set(&mdev->ap_in_flight, 0);
+
                mdev->start_resync_timer.expires = jiffies + HZ;
                add_timer(&mdev->start_resync_timer);
+
+ /*[Debug] resync timer start*/
+               dev_info(DEV,"start resync timer!!\n" );
+
        }

        return true;



Regards