[Drbd-dev] [CASE-41] After reconnecting, the primary does not start resynchronization or stays in Ahead mode, even though OOS remains.

Jaeheon Kim jhkim at mantech.co.kr
Sun Apr 10 15:47:07 CEST 2016


Dear Phil,

We are verifying your patch for CASE-17;

     - http://git.drbd.org/drbd-9.0.git/commitdiff/db0534012b16cb6708aeb57b8cca29816c4e5b52

It resolves the "sync progress percentage unchanged at 100.00" problem,
but another problem has appeared.


1. Test Env.

 - Version: latest  [86e4439]
 - 2 nodes (drbd.conf is configured for multiple nodes, but we tested with only 2)
 - protocol A; sndbuf-size 10M; on-congestion pull-ahead; congestion-fill 5M;


2. Test-steps

 1) make primary/secondary
 2) copy big file
 3) drbdadm disconnect at the beginning of copying
 4) drbdadm connect r0
 5) Sometimes the result is UpToDate/UpToDate with no resync, even though
    OOS (out-of-sync) data remains.
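
The steps above could be scripted roughly as follows. This is only a sketch
under assumptions: "make primary" is taken to mean `drbdadm primary`, the
disconnect is applied to the same resource r0 as the connect, the copy is a
plain `cp` onto a filesystem that is already mounted on the DRBD device, and
the function names, paths, and 2-second delay are hypothetical.

```python
import subprocess
import time


def build_commands(resource, src, dst):
    """Return the command line for each test step (nothing is executed here)."""
    return {
        "primary": ["drbdadm", "primary", resource],
        "copy": ["cp", src, dst],
        "disconnect": ["drbdadm", "disconnect", resource],
        "connect": ["drbdadm", "connect", resource],
        "status": ["drbdsetup", "status", "--verbose", "--statistics"],
    }


def reproduce(resource, src, dst, delay=2.0):
    """Run the steps on the primary node and return the final status text."""
    cmds = build_commands(resource, src, dst)
    subprocess.run(cmds["primary"], check=True)
    # Start the copy in the background so the disconnect lands early in it.
    copy = subprocess.Popen(cmds["copy"])
    time.sleep(delay)  # assumed delay; "at the beginning of copying"
    subprocess.run(cmds["disconnect"], check=True)
    subprocess.run(cmds["connect"], check=True)
    copy.wait()
    result = subprocess.run(cmds["status"], check=True,
                            capture_output=True, text=True)
    return result.stdout
```

On the primary this would be called as, for example,
reproduce("r0", "/tmp/big.img", "/mnt/drbd/big.img"), where both paths are
placeholders. Since the problem only shows up sometimes, several runs may be
needed.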


3. primary status

[root at drbd9-01 바탕화면]# drbdsetup status --verbose --statistics
r0 node-id:0 role:Primary suspended:no
    write-ordering:flush
  volume:0 minor:1 disk:UpToDate
      size:4998328 read:6662461 written:3878932 al-writes:250 bm-writes:0
      upper-pending:0 lower-pending:0 al-suspended:no blocked:no
  drbd9-02 node-id:1 connection:Connected role:Secondary congested:no
    volume:0 replication:Established peer-disk:UpToDate
        resync-suspended:no
        received:0 sent:6661324 out-of-sync:1931560 pending:0 unacked:0
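
The inconsistency in the status above (replication:Established and
peer-disk:UpToDate, yet out-of-sync is non-zero) can be detected
automatically. The following is a minimal sketch that parses the text
printed by `drbdsetup status --verbose --statistics`; the function name is
hypothetical and the parsing assumes the key:value layout shown above.

```python
import re


def find_stale_peers(status_text):
    """Return (peer, volume, out_of_sync) for peers that look in sync but still have OOS."""
    findings = []
    peer = volume = replication = peer_disk = None
    for line in status_text.splitlines():
        fields = dict(re.findall(r"([\w-]+):(\S+)", line))
        if "connection" in fields:
            # Connection line: the first token is the peer's host name.
            peer = line.split()[0]
            volume = replication = peer_disk = None
        if "replication" in fields:
            volume = fields.get("volume")
            replication = fields["replication"]
            peer_disk = fields.get("peer-disk")
        if "out-of-sync" in fields and peer is not None:
            oos = int(fields["out-of-sync"])
            if replication == "Established" and peer_disk == "UpToDate" and oos > 0:
                findings.append((peer, volume, oos))
    return findings


# Status text copied from the report above.
SAMPLE = """\
r0 node-id:0 role:Primary suspended:no
    write-ordering:flush
  volume:0 minor:1 disk:UpToDate
      size:4998328 read:6662461 written:3878932 al-writes:250 bm-writes:0
      upper-pending:0 lower-pending:0 al-suspended:no blocked:no
  drbd9-02 node-id:1 connection:Connected role:Secondary congested:no
    volume:0 replication:Established peer-disk:UpToDate
        resync-suspended:no
        received:0 sent:6661324 out-of-sync:1931560 pending:0 unacked:0
"""

print(find_stale_peers(SAMPLE))
```

For the status shown above, the function reports peer drbd9-02, volume 0.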


4. Log
  - primary log :
      -  http://pastebin.com/LKt0bmR5

  - Please check my comments in the attached log files.


5. Questions

 1) Why does the resync stop even though OOS remains?

 2) Sometimes the connection stays in Ahead mode. Why?

 3) On Windows DRBD, the following messages sometimes appear. I think the
same happens on Linux as well.

Mon Apr  4 19:15:05.555 2016 (UTC + 9:00):
Mon Apr  4 19:15:32.595 2016 (UTC + 9:00): WDRBD_INFO: [receive_state] drbd r0/0 drbd2 2008r2-x64-pas1: SyncSource still sees bits set!! FIXME
Mon Apr  4 19:15:32.595 2016 (UTC + 9:00):
Mon Apr  4 19:15:32.612 2016 (UTC + 9:00): WDRBD_INFO: [drbd_resync_finished] drbd r0/0 drbd2 2008r2-x64-pas1: Resync done (total 34 sec; paused 0 sec; 52524 K/sec)
Mon Apr  4 19:15:32.612 2016 (UTC + 9:00):
Mon Apr  4 19:15:32.626 2016 (UTC + 9:00): _WIN32_v9_CHECK: n_oos=396551 rs_failed=0. Ignore assert


 3-1)  In receive_state(), is the "FIXME" warning harmless?

         /* TODO: Since DRBD9 we experience that SyncSource still
            has  bits set... NEED TO UNDERSTAND AND FIX! */
         drbd_warn(peer_device, "SyncSource still sees bits set!! FIXME\n");


 3-2)  In drbd_resync_finished(), why does this assertion fail?

         D_ASSERT(peer_device, (n_oos - peer_device->rs_failed) == 0);
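
To make the failing condition concrete, here is a small model of the check
(not the DRBD code itself) applied to the values from the log above. The
function names are hypothetical; only the expression n_oos - rs_failed == 0
comes from the D_ASSERT line.

```python
def resync_leftover(n_oos, rs_failed):
    """Out-of-sync count not explained by failed resync requests."""
    return n_oos - rs_failed


def resync_finished_cleanly(n_oos, rs_failed):
    """Mirror of the asserted condition: (n_oos - rs_failed) == 0."""
    return resync_leftover(n_oos, rs_failed) == 0


# Values from the _WIN32_v9_CHECK log line: n_oos=396551 rs_failed=0.
print(resync_leftover(396551, 0))
print(resync_finished_cleanly(396551, 0))
```

With n_oos=396551 and rs_failed=0 the difference is 396551, so the asserted
condition does not hold: the resync is reported as done while out-of-sync
data remains that no failed request accounts for.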


Thanks.