Dear Philipp,

As of today, the "Unstable Outdated" problem has not occurred again. However, a problem similar to CASE-16 occurred on Windows DRBD (patched with your latest version [86e4439]).

1. Version
   - 86e4439

2. Steps to reproduce
   1) Force the "Outdated" state on each node with the "drbdsetup outdate" command.
   2) Promote one node to primary: "drbdadm primary --force r0"
   3) Copy a file.
   4) Result:
      (1) The file copy hangs right at the beginning.
      (2) The primary stays in the WFBitMapS state and never leaves it.

3. Questions

There are two places where CONSIDER_RESYNC is set. We think the one in drbd_set_role() may have a problem. If this function sets CONSIDER_RESYNC, a bitmap exchange should follow. Once that bitmap exchange starts, the hang occurs intermittently. So we tried to disable the bitmap exchange like this:

receive_bitmap()
{
	...
	} else if (peer_device->repl_state[NOW] != L_WF_BITMAP_S) {
		/* admin may have requested C_DISCONNECTING,
		 * other threads may have noticed network errors */
		drbd_info(peer_device, "unexpected repl_state (%s) in receive_bitmap\n",
			  drbd_repl_str(peer_device->repl_state[NOW]));
#ifdef _WIN32_V9
		err = -EIO;
		goto out;
#endif
	}
	...
}

   1) What do you think about our workaround?
   2) When does the "Outdated/Outdated" state occur?
   3) Could you explain the meaning and purpose of CONSIDER_RESYNC in more detail?

4. Logs
   - Here are two logs, one from Windows DRBD and one from Linux DRBD.
   - Both show a very similar pattern.
   - Both appear to be waiting for a bitmap response in the WFBitMapS state.
   - Both finally end in an I/O hang.

   1) [CASE-16] Windows DRBD log - http://pastebin.com/Sm7pJyTa
   2) [CASE-16] Linux DRBD log - http://pastebin.com/VhGyBAwT

Thanks.

2016-02-10 14:58 GMT+09:00 김재헌 <jhkim at mantech.co.kr>:

> Dear Philipp,
>
> 1. Test version
> - CentOS-7 Linux 3.10.0-229.7.2.el7.x86_64
> - Engine: V9.0.1-1
> -- GIT-hash: f57acfc22d29a95697e683fb6bbacd9a1ad4348e build by root@drbd9-02, 2016-02-09 09:46:20
>
> 2. Test scenario
>
> 1) status
>
> [root@drbd9-01 ~]# drbdadm status r0
> r0 role:Secondary
>   disk:Outdated
>   drbd9-02 role:Secondary
>     peer-disk:Outdated
>
> 2) try mount
>
> ...........
>
> *Feb 9 14:19:31 drbd9-01 kernel: drbd r0/0 drbd1 drbd9-02: pdsk( Outdated -> Consistent ) repl( Established -> WFBitMapS )*
> Feb 9 14:19:31 drbd9-01 kernel: drbd r0/0 drbd1 drbd9-02: send bitmap stats [Bytes(packets)]: plain 0(0), RLE 21(1), total 21; compression: 100.0%
> Feb 9 14:19:31 drbd9-01 kernel: drbd r0/0 drbd1 drbd9-02: unexpected
> Feb 9 14:19:31 drbd9-01 kernel: drbd r0/0 drbd1 drbd9-02: In UUIDs from node 1 found equal UUID (3D7DF30CF04727A4) for nodes 2 3
> Feb 9 14:19:31 drbd9-01 kernel: drbd r0/0 drbd1 drbd9-02: I have C223F7CE3C9D7358 for node_id=2
> Feb 9 14:19:31 drbd9-01 kernel: drbd r0/0 drbd1 drbd9-02: I have C223F7CE3C9D7358 for node_id=3
>
> Feb 9 14:19:35 drbd9-01 kernel: EXT4-fs (drbd1): mounting ext3 file system using the ext4 subsystem
> Feb 9 14:20:01 drbd9-01 systemd: Starting Session 13 of user root.
> Feb 9 14:20:01 drbd9-01 systemd: Started Session 13 of user root.
>
> Feb 9 14:21:49 drbd9-01 kernel: INFO: task mount:4758 blocked for more than 120 seconds.
> Feb 9 14:21:49 drbd9-01 kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
> Feb 9 14:21:49 drbd9-01 kernel: mount D ffff88003d613680 0 4758 4685 0x00000080
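
P.S. To make the intent of the receive_bitmap() workaround in this mail easier to discuss, here is a small standalone sketch of the control flow only. It is not DRBD code: the enum values and check_bitmap_state() are hypothetical stand-ins, and it assumes that the unpatched branch only logs the unexpected state and then falls through to a successful return, which the elided parts of the snippet do not show.

```c
#include <errno.h>
#include <stdio.h>

/* Hypothetical stand-ins for the replication states named in the mail. */
enum repl_state { L_ESTABLISHED, L_WF_BITMAP_S, L_WF_BITMAP_T };

/*
 * Toy model of the "unexpected repl_state" branch.
 * fail_on_unexpected == 0: assumed unpatched behaviour, log and return 0.
 * fail_on_unexpected == 1: the workaround, log and return -EIO.
 */
static int check_bitmap_state(enum repl_state state, int fail_on_unexpected)
{
	if (state != L_WF_BITMAP_S) {
		fprintf(stderr, "unexpected repl_state (%d) in receive_bitmap\n",
			(int)state);
		if (fail_on_unexpected)
			return -EIO; /* caller would abort instead of continuing */
	}
	return 0;
}
```

With this model, check_bitmap_state(L_ESTABLISHED, 1) returns -EIO while check_bitmap_state(L_ESTABLISHED, 0) returns 0, which is the only behavioural difference the workaround is meant to introduce.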