Note: "permalinks" may not be as permanent as we would like,
direct links of old sources may well be a few messages off.
Hi, This problem is reproduced on your latest 9.0.1-1 version also. 1. Test version - CentOS-7 Linux 3.10.0-229.7.2.el7.x86_64 - [root at drbd9-02 etc]# cat /proc/drbd version: 9.0.1-1 (api:2/proto:86-111) GIT-hash: f57acfc22d29a95697e683fb6bbacd9a1ad4348e build by root at drbd9-02, 2016-02-09 09:46:20 Transports (api:14): tcp (1.0.0) 2. Test scenario 1) setup primary-seconary 2) set Async replication mode(sndbuf=1M, congestion-fill=256KB) 3) copy big file(4GB) 4) disconnect by disable vmware's network icon while copying 5) Result: - VM hang sometimes!!!! 3. Log (/var/log/messages) analysys: 1) promote to primary Attached:Feb 9 12:25:13 drbd9-01 kernel: drbd r0: role( Secondary -> Primary ) Feb 9 12:25:13 drbd9-01 kernel: drbd r0/0 drbd1 drbd9-02: unexpected Feb 9 12:25:13 drbd9-01 kernel: drbd r0/0 drbd1 drbd9-02: In UUIDs from node 1 found equal UUID (3D7DF30CF04727A4) for nodes 2 3 Feb 9 12:25:13 drbd9-01 kernel: drbd r0/0 drbd1 drbd9-02: I have EBD0DBB6FFD73C44 for node_id=2 Feb 9 12:25:13 drbd9-01 kernel: drbd r0/0 drbd1 drbd9-02: I have EBD0DBB6FFD73C44 for node_id=3 2) mount volume to /mnt Feb 9 12:25:22 drbd9-01 kernel: EXT4-fs (drbd1): mounting ext3 file system using the ext4 subsystem Feb 9 12:25:22 drbd9-01 kernel: EXT4-fs (drbd1): mounted filesystem with ordered data mode. Opts: (null) 3) copy big file(3GB) to data volume (/mnt) 4) congestion occured Feb 9 12:25:53 drbd9-01 kernel: drbd r0/0 drbd1: Congestion-fill threshold reached Feb 9 12:25:53 drbd9-01 kernel: drbd r0/0 drbd1 drbd9-02: pdsk( UpToDate -> Consistent ) repl( Established -> Ahead ) Feb 9 12:25:55 drbd9-01 kernel: drbd r0/0 drbd1 drbd9-02: pdsk( Consistent -> Outdated ) Feb 9 12:25:55 drbd9-01 kernel: drbd r0/0 drbd1: new current UUID: C223F7CE3C9D7359 weak: FFFFFFFFFFFFFFFE Feb 9 12:25:56 drbd9-01 kernel: drbd r0/0 drbd1 drbd9-02: helper command: /sbin/drbdadm before-resync-source Feb 9 12:25:56 drbd9-01 drbdadm[2817]: Don't know which config file belongs to resource r0, trying default ones... Feb 9 12:25:56 drbd9-01 kernel: drbd r0/0 drbd1 drbd9-02: helper command: /sbin/drbdadm before-resync-source exit code 0 (0x0) Feb 9 12:25:56 drbd9-01 kernel: drbd r0/0 drbd1 drbd9-02: pdsk( Outdated -> Inconsistent ) repl( Ahead -> SyncSource ) Feb 9 12:25:56 drbd9-01 kernel: drbd r0/0 drbd1 drbd9-02: Began resync as SyncSource (will sync 295372 KB [73843 bits set]). Feb 9 12:25:56 drbd9-01 kernel: drbd r0/0 drbd1: Congestion-fill threshold reached Feb 9 12:25:56 drbd9-01 kernel: drbd r0/0 drbd1 drbd9-02: repl( SyncSource -> Ahead ) 5) Disable Vmware network icon Feb 9 12:25:56 drbd9-01 kernel: vmxnet3 0000:04:00.0 eth1: NIC Link is Down 6) send timeout occured Feb 9 12:26:08 drbd9-01 kernel: drbd r0 drbd9-02: [drbd_s_r0/2726] sending time expired, ko = 4294967295 7) ping timeout occured Feb 9 12:26:09 drbd9-01 kernel: drbd r0 drbd9-02: PingAck did not arrive in time. Feb 9 12:26:09 drbd9-01 kernel: drbd r0 drbd9-02: conn( Connected -> NetworkFailure ) peer( Secondary -> Unknown ) Feb 9 12:26:09 drbd9-01 kernel: drbd r0/0 drbd1 drbd9-02: repl( Ahead -> Off ) Feb 9 12:26:09 drbd9-01 kernel: drbd r0 drbd9-02: ack_receiver terminated Feb 9 12:26:09 drbd9-01 kernel: drbd r0 drbd9-02: Terminating ack_recv thread 8) "cp 4GB.file /mnt" application hang 9) hang status be monitored by hung_task_timeout_secs. 10) worker thread deadlock Feb 9 12:28:53 drbd9-01 kernel: INFO: task drbd_w_r0:2716 blocked for more than 120 seconds. Feb 9 12:28:53 drbd9-01 kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. Feb 9 12:28:53 drbd9-01 kernel: drbd_w_r0 D ffff88003d613680 0 2716 2 0x00000080 Feb 9 12:28:53 drbd9-01 kernel: ffff8800116ffc10 0000000000000046 ffff880019505b00 ffff8800116fffd8 Feb 9 12:28:53 drbd9-01 kernel: ffff8800116fffd8 ffff8800116fffd8 ffff880019505b00 ffff88002e8dc0e8 Feb 9 12:28:53 drbd9-01 kernel: ffff88002e8dc0ec ffff880019505b00 00000000ffffffff ffff88002e8dc0f0 Feb 9 12:28:53 drbd9-01 kernel: Call Trace: Feb 9 12:28:53 drbd9-01 kernel: [<ffffffff8160a899>] schedule_preempt_disabled+0x29/0x70 Feb 9 12:28:53 drbd9-01 kernel: [<ffffffff816085e5>] __mutex_lock_slowpath+0xc5/0x1c0 Feb 9 12:28:53 drbd9-01 kernel: [<ffffffff81607a4f>] mutex_lock+0x1f/0x2f Feb 9 12:28:53 drbd9-01 kernel: [<ffffffffa040ef43>] conn_prepare_command+0x33/0x80 [drbd] Feb 9 12:28:53 drbd9-01 kernel: [<ffffffffa040efb2>] send_state+0x22/0xb0 [drbd] Feb 9 12:28:53 drbd9-01 kernel: [<ffffffffa0410c5b>] drbd_send_state+0x1b/0x20 [drbd] Feb 9 12:28:53 drbd9-01 kernel: [<ffffffffa042e668>] w_after_state_change+0x9d8/0x1f30 [drbd] Feb 9 12:28:53 drbd9-01 kernel: [<ffffffff81609101>] ? __schedule+0x241/0x7c0 Feb 9 12:28:53 drbd9-01 kernel: [<ffffffffa03ee94d>] drbd_worker+0x10d/0x530 [drbd] Feb 9 12:28:53 drbd9-01 kernel: [<ffffffff81098230>] ? wake_up_bit+0x30/0x30 Feb 9 12:28:53 drbd9-01 kernel: [<ffffffffa040c4c0>] ? w_complete+0x20/0x20 [drbd] Feb 9 12:28:53 drbd9-01 kernel: [<ffffffffa040c521>] drbd_thread_setup+0x61/0x120 [drbd] Feb 9 12:28:53 drbd9-01 kernel: [<ffffffffa040c4c0>] ? w_complete+0x20/0x20 [drbd] Feb 9 12:28:53 drbd9-01 kernel: [<ffffffff8109726f>] kthread+0xcf/0xe0 Feb 9 12:28:53 drbd9-01 kernel: [<ffffffff810971a0>] ? kthread_create_on_node+0x140/0x140 Feb 9 12:28:53 drbd9-01 kernel: [<ffffffff81614158>] ret_from_fork+0x58/0x90 Feb 9 12:28:53 drbd9-01 kernel: [<ffffffff810971a0>] ? kthread_create_on_node+0x140/0x140 11) receiver thread deadlock Feb 9 12:28:53 drbd9-01 kernel: INFO: task drbd_r_r0:2730 blocked for more than 120 seconds. Feb 9 12:28:53 drbd9-01 kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. Feb 9 12:28:53 drbd9-01 kernel: drbd_r_r0 D ffff88003d613680 0 2730 2 0x00000084 Feb 9 12:28:53 drbd9-01 kernel: ffff88001748bd20 0000000000000046 ffff88001b3016c0 ffff88001748bfd8 Feb 9 12:28:53 drbd9-01 kernel: ffff88001748bfd8 ffff88001748bfd8 ffff88001b3016c0 ffff88002e8dc0e8 Feb 9 12:28:53 drbd9-01 kernel: ffff88002e8dc0ec ffff88001b3016c0 00000000ffffffff ffff88002e8dc0f0 Feb 9 12:28:53 drbd9-01 kernel: Call Trace: Feb 9 12:28:53 drbd9-01 kernel: [<ffffffff8160a899>] schedule_preempt_disabled+0x29/0x70 Feb 9 12:28:53 drbd9-01 kernel: [<ffffffff816085e5>] __mutex_lock_slowpath+0xc5/0x1c0 Feb 9 12:28:53 drbd9-01 kernel: [<ffffffff81607a4f>] mutex_lock+0x1f/0x2f Feb 9 12:28:53 drbd9-01 kernel: [<ffffffffa0412bdc>] drbd_transport_shutdown+0x2c/0x80 [drbd] Feb 9 12:28:53 drbd9-01 kernel: [<ffffffffa03ffed5>] conn_disconnect+0xb5/0x2f0 [drbd] Feb 9 12:28:53 drbd9-01 kernel: [<ffffffffa0431356>] ? change_cstate+0x86/0xc0 [drbd] Feb 9 12:28:53 drbd9-01 kernel: [<ffffffffa0400e18>] drbd_receiver+0x188/0x630 [drbd] Feb 9 12:28:53 drbd9-01 kernel: [<ffffffffa040c4c0>] ? w_complete+0x20/0x20 [drbd] Feb 9 12:28:53 drbd9-01 kernel: [<ffffffffa040c521>] drbd_thread_setup+0x61/0x120 [drbd] Feb 9 12:28:53 drbd9-01 kernel: [<ffffffffa040c4c0>] ? w_complete+0x20/0x20 [drbd] Feb 9 12:28:53 drbd9-01 kernel: [<ffffffff8109726f>] kthread+0xcf/0xe0 Feb 9 12:28:53 drbd9-01 kernel: [<ffffffff810971a0>] ? kthread_create_on_node+0x140/0x140 Feb 9 12:28:53 drbd9-01 kernel: [<ffffffff81614158>] ret_from_fork+0x58/0x90 Feb 9 12:28:53 drbd9-01 kernel: [<ffffffff810971a0>] ? kthread_create_on_node+0x140/0x140 12) copy app deadlock maybe. Feb 9 12:28:53 drbd9-01 kernel: INFO: task jbd2/drbd1-8:2806 blocked for more than 120 seconds. Feb 9 12:28:53 drbd9-01 kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. Feb 9 12:28:53 drbd9-01 kernel: jbd2/drbd1-8 D ffff88003d613680 0 2806 2 0x00000080 Feb 9 12:28:53 drbd9-01 kernel: ffff880011623af8 0000000000000046 ffff880011764440 ffff880011623fd8 Feb 9 12:28:53 drbd9-01 kernel: ffff880011623fd8 ffff880011623fd8 ffff880011764440 ffff88003d613f48 Feb 9 12:28:53 drbd9-01 kernel: ffff88003ffac1e8 0000000000000002 ffffffff81155fa0 ffff880011623b70 Feb 9 12:28:53 drbd9-01 kernel: Call Trace: Feb 9 12:28:53 drbd9-01 kernel: [<ffffffff81155fa0>] ? wait_on_page_read+0x60/0x60 Feb 9 12:28:53 drbd9-01 kernel: [<ffffffff816099ad>] io_schedule+0x9d/0x130 Feb 9 12:28:53 drbd9-01 kernel: [<ffffffff81155fae>] sleep_on_page+0xe/0x20 Feb 9 12:28:53 drbd9-01 kernel: [<ffffffff81607780>] __wait_on_bit+0x60/0x90 Feb 9 12:28:53 drbd9-01 kernel: [<ffffffff81155d36>] wait_on_page_bit+0x86/0xb0 Feb 9 12:28:53 drbd9-01 kernel: [<ffffffff81098270>] ? autoremove_wake_function+0x40/0x40 Feb 9 12:28:53 drbd9-01 kernel: [<ffffffff81155e71>] filemap_fdatawait_range+0x111/0x1b0 Feb 9 12:28:53 drbd9-01 kernel: [<ffffffff81155f37>] filemap_fdatawait+0x27/0x30 Feb 9 12:28:53 drbd9-01 kernel: [<ffffffffa04d0a9f>] jbd2_journal_commit_transaction+0xa8f/0x19a0 [jbd2] Feb 9 12:28:53 drbd9-01 kernel: [<ffffffff810ad756>] ? __dequeue_entity+0x26/0x40 Feb 9 12:28:53 drbd9-01 kernel: [<ffffffff810125c6>] ? __switch_to+0x136/0x4a0 Feb 9 12:28:53 drbd9-01 kernel: [<ffffffffa04d5da9>] kjournald2+0xc9/0x260 [jbd2] Feb 9 12:28:53 drbd9-01 kernel: [<ffffffff81098230>] ? wake_up_bit+0x30/0x30 Feb 9 12:28:53 drbd9-01 kernel: [<ffffffffa04d5ce0>] ? commit_timeout+0x10/0x10 [jbd2] Feb 9 12:28:53 drbd9-01 kernel: [<ffffffff8109726f>] kthread+0xcf/0xe0 Feb 9 12:28:53 drbd9-01 kernel: [<ffffffff810971a0>] ? kthread_create_on_node+0x140/0x140 Feb 9 12:28:53 drbd9-01 kernel: [<ffffffff81614158>] ret_from_fork+0x58/0x90 Feb 9 12:28:53 drbd9-01 kernel: [<ffffffff810971a0>] ? kthread_create_on_node+0x140/0x140 13) copy app deadlock maybe. Feb 9 12:28:53 drbd9-01 kernel: INFO: task cp:2815 blocked for more than 120 seconds. Feb 9 12:28:53 drbd9-01 kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. Feb 9 12:28:53 drbd9-01 kernel: cp D ffff88003d613680 0 2815 2626 0x00000080 Feb 9 12:28:53 drbd9-01 kernel: ffff88001950b9a8 0000000000000082 ffff880011764fa0 ffff88001950bfd8 Feb 9 12:28:53 drbd9-01 kernel: ffff88001950bfd8 ffff88001950bfd8 ffff880011764fa0 ffff88003d613f48 Feb 9 12:28:53 drbd9-01 kernel: ffff88003ffa9f50 0000000000000002 ffffffffa04cd720 ffff88001950ba20 Feb 9 12:28:53 drbd9-01 kernel: Call Trace: Feb 9 12:28:53 drbd9-01 kernel: [<ffffffffa04cd720>] ? start_this_handle+0x5d0/0x5d0 [jbd2] Feb 9 12:28:53 drbd9-01 kernel: [<ffffffff816099ad>] io_schedule+0x9d/0x130 Feb 9 12:28:53 drbd9-01 kernel: [<ffffffffa04cd72e>] sleep_on_shadow_bh+0xe/0x20 [jbd2] Feb 9 12:28:53 drbd9-01 kernel: [<ffffffff81607780>] __wait_on_bit+0x60/0x90 Feb 9 12:28:53 drbd9-01 kernel: [<ffffffffa04cd720>] ? start_this_handle+0x5d0/0x5d0 [jbd2] Feb 9 12:28:53 drbd9-01 kernel: [<ffffffff81607837>] out_of_line_wait_on_bit+0x87/0xb0 Feb 9 12:28:53 drbd9-01 kernel: [<ffffffff81098270>] ? autoremove_wake_function+0x40/0x40 Feb 9 12:28:53 drbd9-01 kernel: [<ffffffffa04cec0d>] do_get_write_access+0x2ed/0x500 [jbd2] Feb 9 12:28:53 drbd9-01 kernel: [<ffffffff811f9eed>] ? __getblk+0x2d/0x2e0 Feb 9 12:28:53 drbd9-01 kernel: [<ffffffffa04cee47>] jbd2_journal_get_write_access+0x27/0x40 [jbd2] Feb 9 12:28:53 drbd9-01 kernel: [<ffffffffa05304cb>] __ext4_journal_get_write_access+0x3b/0x80 [ext4] Feb 9 12:28:53 drbd9-01 kernel: [<ffffffffa05029a0>] ext4_reserve_inode_write+0x70/0xa0 [ext4] Feb 9 12:28:53 drbd9-01 kernel: [<ffffffffa0505e30>] ? ext4_dirty_inode+0x40/0x60 [ext4] Feb 9 12:28:53 drbd9-01 kernel: [<ffffffffa0502a23>] ext4_mark_inode_dirty+0x53/0x210 [ext4] Feb 9 12:28:53 drbd9-01 kernel: [<ffffffffa0505e30>] ext4_dirty_inode+0x40/0x60 [ext4] Feb 9 12:28:53 drbd9-01 kernel: [<ffffffff811f069a>] __mark_inode_dirty+0xca/0x290 Feb 9 12:28:53 drbd9-01 kernel: [<ffffffff811e1721>] update_time+0x81/0xd0 Feb 9 12:28:53 drbd9-01 kernel: [<ffffffff811e1970>] file_update_time+0xa0/0xf0 Feb 9 12:28:53 drbd9-01 kernel: [<ffffffff81158648>] __generic_file_aio_write+0x198/0x3e0 Feb 9 12:28:53 drbd9-01 kernel: [<ffffffff811588ed>] generic_file_aio_write+0x5d/0xc0 Feb 9 12:28:53 drbd9-01 kernel: [<ffffffffa04f9b75>] ext4_file_write+0xb5/0x460 [ext4] Feb 9 12:28:53 drbd9-01 kernel: [<ffffffffa026f7ac>] ? xfs_iunlock+0xdc/0xf0 [xfs] Feb 9 12:28:53 drbd9-01 kernel: [<ffffffffa02293a9>] ? xfs_file_aio_read+0x189/0x310 [xfs] Feb 9 12:28:53 drbd9-01 kernel: [<ffffffff811c5ebd>] do_sync_write+0x8d/0xd0 Feb 9 12:28:53 drbd9-01 kernel: [<ffffffff811c665d>] vfs_write+0xbd/0x1e0 Feb 9 12:28:53 drbd9-01 kernel: [<ffffffff811c70a8>] SyS_write+0x58/0xb0 Feb 9 12:28:53 drbd9-01 kernel: [<ffffffff81614209>] system_call_fastpath+0x16/0x1b 14) once again after 2 minutes Feb 9 12:30:53 drbd9-01 kernel: INFO: task drbd_w_r0:2716 blocked for more than 120 seconds. Feb 9 12:30:53 drbd9-01 kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. Feb 9 12:30:53 drbd9-01 kernel: drbd_w_r0 D ffff88003d613680 0 2716 2 0x00000080 Feb 9 12:30:53 drbd9-01 kernel: ffff8800116ffc10 0000000000000046 ffff880019505b00 ffff8800116fffd8 Feb 9 12:30:53 drbd9-01 kernel: ffff8800116fffd8 ffff8800116fffd8 ffff880019505b00 ffff88002e8dc0e8 Feb 9 12:30:53 drbd9-01 kernel: ffff88002e8dc0ec ffff880019505b00 00000000ffffffff ffff88002e8dc0f0 Feb 9 12:30:53 drbd9-01 kernel: Call Trace: Feb 9 12:30:53 drbd9-01 kernel: [<ffffffff8160a899>] schedule_preempt_disabled+0x29/0x70 Feb 9 12:30:53 drbd9-01 kernel: [<ffffffff816085e5>] __mutex_lock_slowpath+0xc5/0x1c0 Feb 9 12:30:53 drbd9-01 kernel: [<ffffffff81607a4f>] mutex_lock+0x1f/0x2f Feb 9 12:30:53 drbd9-01 kernel: [<ffffffffa040ef43>] conn_prepare_command+0x33/0x80 [drbd] Feb 9 12:30:53 drbd9-01 kernel: [<ffffffffa040efb2>] send_state+0x22/0xb0 [drbd] Feb 9 12:30:53 drbd9-01 kernel: [<ffffffffa0410c5b>] drbd_send_state+0x1b/0x20 [drbd] Feb 9 12:30:53 drbd9-01 kernel: [<ffffffffa042e668>] w_after_state_change+0x9d8/0x1f30 [drbd] Feb 9 12:30:53 drbd9-01 kernel: [<ffffffff81609101>] ? __schedule+0x241/0x7c0 Feb 9 12:30:53 drbd9-01 kernel: [<ffffffffa03ee94d>] drbd_worker+0x10d/0x530 [drbd] Feb 9 12:30:53 drbd9-01 kernel: [<ffffffff81098230>] ? wake_up_bit+0x30/0x30 Feb 9 12:30:53 drbd9-01 kernel: [<ffffffffa040c4c0>] ? w_complete+0x20/0x20 [drbd] Feb 9 12:30:53 drbd9-01 kernel: [<ffffffffa040c521>] drbd_thread_setup+0x61/0x120 [drbd] Feb 9 12:30:53 drbd9-01 kernel: [<ffffffffa040c4c0>] ? w_complete+0x20/0x20 [drbd] Feb 9 12:30:53 drbd9-01 kernel: [<ffffffff8109726f>] kthread+0xcf/0xe0 Feb 9 12:30:53 drbd9-01 kernel: [<ffffffff810971a0>] ? kthread_create_on_node+0x140/0x140 Feb 9 12:30:53 drbd9-01 kernel: [<ffffffff81614158>] ret_from_fork+0x58/0x90 Feb 9 12:30:53 drbd9-01 kernel: [<ffffffff810971a0>] ? kthread_create_on_node+0x140/0x140 Feb 9 12:30:53 drbd9-01 kernel: INFO: task drbd_r_r0:2730 blocked for more than 120 seconds. Feb 9 12:30:53 drbd9-01 kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. Feb 9 12:30:53 drbd9-01 kernel: drbd_r_r0 D ffff88003d613680 0 2730 2 0x00000084 Feb 9 12:30:53 drbd9-01 kernel: ffff88001748bd20 0000000000000046 ffff88001b3016c0 ffff88001748bfd8 Feb 9 12:30:53 drbd9-01 kernel: ffff88001748bfd8 ffff88001748bfd8 ffff88001b3016c0 ffff88002e8dc0e8 Feb 9 12:30:53 drbd9-01 kernel: ffff88002e8dc0ec ffff88001b3016c0 00000000ffffffff ffff88002e8dc0f0 Feb 9 12:30:53 drbd9-01 kernel: Call Trace: Feb 9 12:30:53 drbd9-01 kernel: [<ffffffff8160a899>] schedule_preempt_disabled+0x29/0x70 Feb 9 12:30:53 drbd9-01 kernel: [<ffffffff816085e5>] __mutex_lock_slowpath+0xc5/0x1c0 Feb 9 12:30:53 drbd9-01 kernel: [<ffffffff81607a4f>] mutex_lock+0x1f/0x2f Feb 9 12:30:53 drbd9-01 kernel: [<ffffffffa0412bdc>] drbd_transport_shutdown+0x2c/0x80 [drbd] Feb 9 12:30:53 drbd9-01 kernel: [<ffffffffa03ffed5>] conn_disconnect+0xb5/0x2f0 [drbd] Feb 9 12:30:53 drbd9-01 kernel: [<ffffffffa0431356>] ? change_cstate+0x86/0xc0 [drbd] Feb 9 12:30:53 drbd9-01 kernel: [<ffffffffa0400e18>] drbd_receiver+0x188/0x630 [drbd] Feb 9 12:30:53 drbd9-01 kernel: [<ffffffffa040c4c0>] ? w_complete+0x20/0x20 [drbd] Feb 9 12:30:53 drbd9-01 kernel: [<ffffffffa040c521>] drbd_thread_setup+0x61/0x120 [drbd] Feb 9 12:30:53 drbd9-01 kernel: [<ffffffffa040c4c0>] ? w_complete+0x20/0x20 [drbd] Feb 9 12:30:53 drbd9-01 kernel: [<ffffffff8109726f>] kthread+0xcf/0xe0 Feb 9 12:30:53 drbd9-01 kernel: [<ffffffff810971a0>] ? kthread_create_on_node+0x140/0x140 Feb 9 12:30:53 drbd9-01 kernel: [<ffffffff81614158>] ret_from_fork+0x58/0x90 Feb 9 12:30:53 drbd9-01 kernel: [<ffffffff810971a0>] ? kthread_create_on_node+0x140/0x140 Feb 9 12:30:53 drbd9-01 kernel: INFO: task jbd2/drbd1-8:2806 blocked for more than 120 seconds. Feb 9 12:30:53 drbd9-01 kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. Feb 9 12:30:53 drbd9-01 kernel: jbd2/drbd1-8 D ffff88003d613680 0 2806 2 0x00000080 Feb 9 12:30:53 drbd9-01 kernel: ffff880011623af8 0000000000000046 ffff880011764440 ffff880011623fd8 Feb 9 12:30:53 drbd9-01 kernel: ffff880011623fd8 ffff880011623fd8 ffff880011764440 ffff88003d613f48 Feb 9 12:30:53 drbd9-01 kernel: ffff88003ffac1e8 0000000000000002 ffffffff81155fa0 ffff880011623b70 Feb 9 12:30:53 drbd9-01 kernel: Call Trace: Feb 9 12:30:53 drbd9-01 kernel: [<ffffffff81155fa0>] ? wait_on_page_read+0x60/0x60 Feb 9 12:30:53 drbd9-01 kernel: [<ffffffff816099ad>] io_schedule+0x9d/0x130 Feb 9 12:30:53 drbd9-01 kernel: [<ffffffff81155fae>] sleep_on_page+0xe/0x20 Feb 9 12:30:53 drbd9-01 kernel: [<ffffffff81607780>] __wait_on_bit+0x60/0x90 Feb 9 12:30:53 drbd9-01 kernel: [<ffffffff81155d36>] wait_on_page_bit+0x86/0xb0 Feb 9 12:30:53 drbd9-01 kernel: [<ffffffff81098270>] ? autoremove_wake_function+0x40/0x40 Feb 9 12:30:53 drbd9-01 kernel: [<ffffffff81155e71>] filemap_fdatawait_range+0x111/0x1b0 Feb 9 12:30:53 drbd9-01 kernel: [<ffffffff81155f37>] filemap_fdatawait+0x27/0x30 Feb 9 12:30:53 drbd9-01 kernel: [<ffffffffa04d0a9f>] jbd2_journal_commit_transaction+0xa8f/0x19a0 [jbd2] Feb 9 12:30:53 drbd9-01 kernel: [<ffffffff810ad756>] ? __dequeue_entity+0x26/0x40 Feb 9 12:30:53 drbd9-01 kernel: [<ffffffff810125c6>] ? __switch_to+0x136/0x4a0 Feb 9 12:30:53 drbd9-01 kernel: [<ffffffffa04d5da9>] kjournald2+0xc9/0x260 [jbd2] Feb 9 12:30:53 drbd9-01 kernel: [<ffffffff81098230>] ? wake_up_bit+0x30/0x30 Feb 9 12:30:53 drbd9-01 kernel: [<ffffffffa04d5ce0>] ? commit_timeout+0x10/0x10 [jbd2] Feb 9 12:30:53 drbd9-01 kernel: [<ffffffff8109726f>] kthread+0xcf/0xe0 Feb 9 12:30:53 drbd9-01 kernel: [<ffffffff810971a0>] ? kthread_create_on_node+0x140/0x140 Feb 9 12:30:53 drbd9-01 kernel: [<ffffffff81614158>] ret_from_fork+0x58/0x90 Feb 9 12:30:53 drbd9-01 kernel: [<ffffffff810971a0>] ? kthread_create_on_node+0x140/0x140 Feb 9 12:30:53 drbd9-01 kernel: INFO: task cp:2815 blocked for more than 120 seconds. Feb 9 12:30:53 drbd9-01 kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. Feb 9 12:30:53 drbd9-01 kernel: cp D ffff88003d613680 0 2815 2626 0x00000080 Feb 9 12:30:53 drbd9-01 kernel: ffff88001950b9a8 0000000000000082 ffff880011764fa0 ffff88001950bfd8 Feb 9 12:30:53 drbd9-01 kernel: ffff88001950bfd8 ffff88001950bfd8 ffff880011764fa0 ffff88003d613f48 Feb 9 12:30:53 drbd9-01 kernel: ffff88003ffa9f50 0000000000000002 ffffffffa04cd720 ffff88001950ba20 Feb 9 12:30:53 drbd9-01 kernel: Call Trace: Feb 9 12:30:53 drbd9-01 kernel: [<ffffffffa04cd720>] ? start_this_handle+0x5d0/0x5d0 [jbd2] Feb 9 12:30:53 drbd9-01 kernel: [<ffffffff816099ad>] io_schedule+0x9d/0x130 Feb 9 12:30:53 drbd9-01 kernel: [<ffffffffa04cd72e>] sleep_on_shadow_bh+0xe/0x20 [jbd2] Feb 9 12:30:53 drbd9-01 kernel: [<ffffffff81607780>] __wait_on_bit+0x60/0x90 Feb 9 12:30:53 drbd9-01 kernel: [<ffffffffa04cd720>] ? start_this_handle+0x5d0/0x5d0 [jbd2] Feb 9 12:30:53 drbd9-01 kernel: [<ffffffff81607837>] out_of_line_wait_on_bit+0x87/0xb0 Feb 9 12:30:53 drbd9-01 kernel: [<ffffffff81098270>] ? autoremove_wake_function+0x40/0x40 Feb 9 12:30:53 drbd9-01 kernel: [<ffffffffa04cec0d>] do_get_write_access+0x2ed/0x500 [jbd2] Feb 9 12:30:53 drbd9-01 kernel: [<ffffffff811f9eed>] ? __getblk+0x2d/0x2e0 Feb 9 12:30:53 drbd9-01 kernel: [<ffffffffa04cee47>] jbd2_journal_get_write_access+0x27/0x40 [jbd2] Feb 9 12:30:53 drbd9-01 kernel: [<ffffffffa05304cb>] __ext4_journal_get_write_access+0x3b/0x80 [ext4] Feb 9 12:30:53 drbd9-01 kernel: [<ffffffffa05029a0>] ext4_reserve_inode_write+0x70/0xa0 [ext4] Feb 9 12:30:53 drbd9-01 kernel: [<ffffffffa0505e30>] ? ext4_dirty_inode+0x40/0x60 [ext4] Feb 9 12:30:53 drbd9-01 kernel: [<ffffffffa0502a23>] ext4_mark_inode_dirty+0x53/0x210 [ext4] Feb 9 12:30:53 drbd9-01 kernel: [<ffffffffa0505e30>] ext4_dirty_inode+0x40/0x60 [ext4] Feb 9 12:30:53 drbd9-01 kernel: [<ffffffff811f069a>] __mark_inode_dirty+0xca/0x290 Feb 9 12:30:53 drbd9-01 kernel: [<ffffffff811e1721>] update_time+0x81/0xd0 Feb 9 12:30:53 drbd9-01 kernel: [<ffffffff811e1970>] file_update_time+0xa0/0xf0 Feb 9 12:30:53 drbd9-01 kernel: [<ffffffff81158648>] __generic_file_aio_write+0x198/0x3e0 Feb 9 12:30:53 drbd9-01 kernel: [<ffffffff811588ed>] generic_file_aio_write+0x5d/0xc0 Feb 9 12:30:53 drbd9-01 kernel: [<ffffffffa04f9b75>] ext4_file_write+0xb5/0x460 [ext4] Feb 9 12:30:53 drbd9-01 kernel: [<ffffffffa026f7ac>] ? xfs_iunlock+0xdc/0xf0 [xfs] Feb 9 12:30:53 drbd9-01 kernel: [<ffffffffa02293a9>] ? xfs_file_aio_read+0x189/0x310 [xfs] Feb 9 12:30:53 drbd9-01 kernel: [<ffffffff811c5ebd>] do_sync_write+0x8d/0xd0 Feb 9 12:30:53 drbd9-01 kernel: [<ffffffff811c665d>] vfs_write+0xbd/0x1e0 Feb 9 12:30:53 drbd9-01 kernel: [<ffffffff811c70a8>] SyS_write+0x58/0xb0 Feb 9 12:30:53 drbd9-01 kernel: [<ffffffff81614209>] system_call_fastpath+0x16/0x1b 15) reset VM Feb 9 12:35:03 drbd9-01 rsyslogd: [origin software="rsyslogd" swVersion="7.4.7" x-pid="902" x-info="http://www.rsyslog.com"] start Feb 9 12:34:55 drbd9-01 journal: Runtime journal is using 6.1M (max 48.9M, leaving 73.4M of free 483.6M, current limit 48.9M). Feb 9 12:34:55 drbd9-01 kernel: Initializing cgroup subsys cpuset Feb 9 12:34:55 drbd9-01 kernel: Initializing cgroup subsys cpu Feb 9 12:34:55 drbd9-01 kernel: Initializing cgroup subsys cpuacct Feb 9 12:34:55 drbd9-01 kernel: Linux version 3.10.0-229.7.2.el7.x86_64 ( builder at kbuilder.dev.centos.org) (gcc version 4.8.3 20140911 (Red Hat 4.8.3-9) (GCC) ) #1 SMP Tue Jun 23 22:06:11 UTC 2015 Feb 9 12:34:55 drbd9-01 kernel: Command line: BOOT_IMAGE=/vmlinuz-3.10.0-229.7.2.el7.x86_64 root=/dev/mapper/centos-root rord.lvm.lv=centos/root rd.lvm.lv=centos/swap net.ifnames=0 rhgb quiet systemd.debug LANG=ko_KR.UTF-8 4. Question - Please see above 10), 11) points, I think It's deadlock. - the worker and receiver threads may be deadlocked by race condition for data socket. Thanks -------------- next part -------------- An HTML attachment was scrubbed... URL: <http://lists.linbit.com/pipermail/drbd-user/attachments/20160212/7787fd19/attachment.htm>