[DRBD-user] [CASE-14] primary node hang by VM-net-disconnect during big file copy

김재헌 jhkim at mantech.co.kr
Fri Feb 12 08:02:33 CET 2016

Note: "permalinks" may not be as permanent as we would like,
direct links of old sources may well be a few messages off.


Hi,

This problem is reproduced on your latest 9.0.1-1 version also.

1. Test version
 - CentOS-7 Linux 3.10.0-229.7.2.el7.x86_64
- [root at drbd9-02 etc]# cat /proc/drbd
      version: 9.0.1-1 (api:2/proto:86-111)
      GIT-hash: f57acfc22d29a95697e683fb6bbacd9a1ad4348e build by
root at drbd9-02, 2016-02-09 09:46:20
      Transports (api:14): tcp (1.0.0)


2. Test scenario
  1) setup primary-seconary
  2) set Async replication mode(sndbuf=1M, congestion-fill=256KB)
  3) copy big file(4GB)
  4) disconnect by disable vmware's network icon while copying
  5) Result:
        - VM hang sometimes!!!!


3. Log (/var/log/messages) analysys:
1) promote to primary

Attached:Feb  9 12:25:13 drbd9-01 kernel: drbd r0: role( Secondary ->
Primary )
Feb  9 12:25:13 drbd9-01 kernel: drbd r0/0 drbd1 drbd9-02: unexpected
Feb  9 12:25:13 drbd9-01 kernel: drbd r0/0 drbd1 drbd9-02: In UUIDs from
node 1 found equal UUID (3D7DF30CF04727A4) for nodes 2 3
Feb  9 12:25:13 drbd9-01 kernel: drbd r0/0 drbd1 drbd9-02: I have
EBD0DBB6FFD73C44 for node_id=2
Feb  9 12:25:13 drbd9-01 kernel: drbd r0/0 drbd1 drbd9-02: I have
EBD0DBB6FFD73C44 for node_id=3

2) mount volume to /mnt

Feb  9 12:25:22 drbd9-01 kernel: EXT4-fs (drbd1): mounting ext3 file system
using the ext4 subsystem
Feb  9 12:25:22 drbd9-01 kernel: EXT4-fs (drbd1): mounted filesystem with
ordered data mode. Opts: (null)

3) copy big file(3GB) to data volume (/mnt)

4) congestion occured

Feb  9 12:25:53 drbd9-01 kernel: drbd r0/0 drbd1: Congestion-fill threshold
reached
Feb  9 12:25:53 drbd9-01 kernel: drbd r0/0 drbd1 drbd9-02: pdsk( UpToDate
-> Consistent ) repl( Established -> Ahead )
Feb  9 12:25:55 drbd9-01 kernel: drbd r0/0 drbd1 drbd9-02: pdsk( Consistent
-> Outdated )
Feb  9 12:25:55 drbd9-01 kernel: drbd r0/0 drbd1: new current UUID:
C223F7CE3C9D7359 weak: FFFFFFFFFFFFFFFE
Feb  9 12:25:56 drbd9-01 kernel: drbd r0/0 drbd1 drbd9-02: helper command:
/sbin/drbdadm before-resync-source
Feb  9 12:25:56 drbd9-01 drbdadm[2817]: Don't know which config file
belongs to resource r0, trying default ones...
Feb  9 12:25:56 drbd9-01 kernel: drbd r0/0 drbd1 drbd9-02: helper command:
/sbin/drbdadm before-resync-source exit code 0 (0x0)
Feb  9 12:25:56 drbd9-01 kernel: drbd r0/0 drbd1 drbd9-02: pdsk( Outdated
-> Inconsistent ) repl( Ahead -> SyncSource )
Feb  9 12:25:56 drbd9-01 kernel: drbd r0/0 drbd1 drbd9-02: Began resync as
SyncSource (will sync 295372 KB [73843 bits set]).
Feb  9 12:25:56 drbd9-01 kernel: drbd r0/0 drbd1: Congestion-fill threshold
reached
Feb  9 12:25:56 drbd9-01 kernel: drbd r0/0 drbd1 drbd9-02: repl( SyncSource
-> Ahead )

5) Disable Vmware network icon

Feb  9 12:25:56 drbd9-01 kernel: vmxnet3 0000:04:00.0 eth1: NIC Link is Down

6) send timeout occured

Feb  9 12:26:08 drbd9-01 kernel: drbd r0 drbd9-02: [drbd_s_r0/2726] sending
time expired, ko = 4294967295

7) ping timeout occured

Feb  9 12:26:09 drbd9-01 kernel: drbd r0 drbd9-02: PingAck did not arrive
in time.
Feb  9 12:26:09 drbd9-01 kernel: drbd r0 drbd9-02: conn( Connected ->
NetworkFailure ) peer( Secondary -> Unknown )
Feb  9 12:26:09 drbd9-01 kernel: drbd r0/0 drbd1 drbd9-02: repl( Ahead ->
Off )
Feb  9 12:26:09 drbd9-01 kernel: drbd r0 drbd9-02: ack_receiver terminated
Feb  9 12:26:09 drbd9-01 kernel: drbd r0 drbd9-02: Terminating ack_recv
thread

8)  "cp 4GB.file /mnt" application hang

9) hang status be monitored by hung_task_timeout_secs.

10) worker thread deadlock

Feb  9 12:28:53 drbd9-01 kernel: INFO: task drbd_w_r0:2716 blocked for more
than 120 seconds.
Feb  9 12:28:53 drbd9-01 kernel: "echo 0 >
/proc/sys/kernel/hung_task_timeout_secs" disables this message.
Feb  9 12:28:53 drbd9-01 kernel: drbd_w_r0       D ffff88003d613680     0
 2716      2 0x00000080
Feb  9 12:28:53 drbd9-01 kernel: ffff8800116ffc10 0000000000000046
ffff880019505b00 ffff8800116fffd8
Feb  9 12:28:53 drbd9-01 kernel: ffff8800116fffd8 ffff8800116fffd8
ffff880019505b00 ffff88002e8dc0e8
Feb  9 12:28:53 drbd9-01 kernel: ffff88002e8dc0ec ffff880019505b00
00000000ffffffff ffff88002e8dc0f0
Feb  9 12:28:53 drbd9-01 kernel: Call Trace:
Feb  9 12:28:53 drbd9-01 kernel: [<ffffffff8160a899>]
schedule_preempt_disabled+0x29/0x70
Feb  9 12:28:53 drbd9-01 kernel: [<ffffffff816085e5>]
__mutex_lock_slowpath+0xc5/0x1c0
Feb  9 12:28:53 drbd9-01 kernel: [<ffffffff81607a4f>] mutex_lock+0x1f/0x2f
Feb  9 12:28:53 drbd9-01 kernel: [<ffffffffa040ef43>]
conn_prepare_command+0x33/0x80 [drbd]
Feb  9 12:28:53 drbd9-01 kernel: [<ffffffffa040efb2>] send_state+0x22/0xb0
[drbd]
Feb  9 12:28:53 drbd9-01 kernel: [<ffffffffa0410c5b>]
drbd_send_state+0x1b/0x20 [drbd]
Feb  9 12:28:53 drbd9-01 kernel: [<ffffffffa042e668>]
w_after_state_change+0x9d8/0x1f30 [drbd]
Feb  9 12:28:53 drbd9-01 kernel: [<ffffffff81609101>] ?
__schedule+0x241/0x7c0
Feb  9 12:28:53 drbd9-01 kernel: [<ffffffffa03ee94d>]
drbd_worker+0x10d/0x530 [drbd]
Feb  9 12:28:53 drbd9-01 kernel: [<ffffffff81098230>] ?
wake_up_bit+0x30/0x30
Feb  9 12:28:53 drbd9-01 kernel: [<ffffffffa040c4c0>] ?
w_complete+0x20/0x20 [drbd]
Feb  9 12:28:53 drbd9-01 kernel: [<ffffffffa040c521>]
drbd_thread_setup+0x61/0x120 [drbd]
Feb  9 12:28:53 drbd9-01 kernel: [<ffffffffa040c4c0>] ?
w_complete+0x20/0x20 [drbd]
Feb  9 12:28:53 drbd9-01 kernel: [<ffffffff8109726f>] kthread+0xcf/0xe0
Feb  9 12:28:53 drbd9-01 kernel: [<ffffffff810971a0>] ?
kthread_create_on_node+0x140/0x140
Feb  9 12:28:53 drbd9-01 kernel: [<ffffffff81614158>]
ret_from_fork+0x58/0x90
Feb  9 12:28:53 drbd9-01 kernel: [<ffffffff810971a0>] ?
kthread_create_on_node+0x140/0x140

11) receiver thread deadlock

Feb  9 12:28:53 drbd9-01 kernel: INFO: task drbd_r_r0:2730 blocked for more
than 120 seconds.
Feb  9 12:28:53 drbd9-01 kernel: "echo 0 >
/proc/sys/kernel/hung_task_timeout_secs" disables this message.
Feb  9 12:28:53 drbd9-01 kernel: drbd_r_r0       D ffff88003d613680     0
 2730      2 0x00000084
Feb  9 12:28:53 drbd9-01 kernel: ffff88001748bd20 0000000000000046
ffff88001b3016c0 ffff88001748bfd8
Feb  9 12:28:53 drbd9-01 kernel: ffff88001748bfd8 ffff88001748bfd8
ffff88001b3016c0 ffff88002e8dc0e8
Feb  9 12:28:53 drbd9-01 kernel: ffff88002e8dc0ec ffff88001b3016c0
00000000ffffffff ffff88002e8dc0f0
Feb  9 12:28:53 drbd9-01 kernel: Call Trace:
Feb  9 12:28:53 drbd9-01 kernel: [<ffffffff8160a899>]
schedule_preempt_disabled+0x29/0x70
Feb  9 12:28:53 drbd9-01 kernel: [<ffffffff816085e5>]
__mutex_lock_slowpath+0xc5/0x1c0
Feb  9 12:28:53 drbd9-01 kernel: [<ffffffff81607a4f>] mutex_lock+0x1f/0x2f
Feb  9 12:28:53 drbd9-01 kernel: [<ffffffffa0412bdc>]
drbd_transport_shutdown+0x2c/0x80 [drbd]
Feb  9 12:28:53 drbd9-01 kernel: [<ffffffffa03ffed5>]
conn_disconnect+0xb5/0x2f0 [drbd]
Feb  9 12:28:53 drbd9-01 kernel: [<ffffffffa0431356>] ?
change_cstate+0x86/0xc0 [drbd]
Feb  9 12:28:53 drbd9-01 kernel: [<ffffffffa0400e18>]
drbd_receiver+0x188/0x630 [drbd]
Feb  9 12:28:53 drbd9-01 kernel: [<ffffffffa040c4c0>] ?
w_complete+0x20/0x20 [drbd]
Feb  9 12:28:53 drbd9-01 kernel: [<ffffffffa040c521>]
drbd_thread_setup+0x61/0x120 [drbd]
Feb  9 12:28:53 drbd9-01 kernel: [<ffffffffa040c4c0>] ?
w_complete+0x20/0x20 [drbd]
Feb  9 12:28:53 drbd9-01 kernel: [<ffffffff8109726f>] kthread+0xcf/0xe0
Feb  9 12:28:53 drbd9-01 kernel: [<ffffffff810971a0>] ?
kthread_create_on_node+0x140/0x140
Feb  9 12:28:53 drbd9-01 kernel: [<ffffffff81614158>]
ret_from_fork+0x58/0x90
Feb  9 12:28:53 drbd9-01 kernel: [<ffffffff810971a0>] ?
kthread_create_on_node+0x140/0x140

12) copy app deadlock maybe.

Feb  9 12:28:53 drbd9-01 kernel: INFO: task jbd2/drbd1-8:2806 blocked for
more than 120 seconds.
Feb  9 12:28:53 drbd9-01 kernel: "echo 0 >
/proc/sys/kernel/hung_task_timeout_secs" disables this message.
Feb  9 12:28:53 drbd9-01 kernel: jbd2/drbd1-8    D ffff88003d613680     0
 2806      2 0x00000080
Feb  9 12:28:53 drbd9-01 kernel: ffff880011623af8 0000000000000046
ffff880011764440 ffff880011623fd8
Feb  9 12:28:53 drbd9-01 kernel: ffff880011623fd8 ffff880011623fd8
ffff880011764440 ffff88003d613f48
Feb  9 12:28:53 drbd9-01 kernel: ffff88003ffac1e8 0000000000000002
ffffffff81155fa0 ffff880011623b70
Feb  9 12:28:53 drbd9-01 kernel: Call Trace:
Feb  9 12:28:53 drbd9-01 kernel: [<ffffffff81155fa0>] ?
wait_on_page_read+0x60/0x60
Feb  9 12:28:53 drbd9-01 kernel: [<ffffffff816099ad>] io_schedule+0x9d/0x130
Feb  9 12:28:53 drbd9-01 kernel: [<ffffffff81155fae>] sleep_on_page+0xe/0x20
Feb  9 12:28:53 drbd9-01 kernel: [<ffffffff81607780>]
__wait_on_bit+0x60/0x90
Feb  9 12:28:53 drbd9-01 kernel: [<ffffffff81155d36>]
wait_on_page_bit+0x86/0xb0
Feb  9 12:28:53 drbd9-01 kernel: [<ffffffff81098270>] ?
autoremove_wake_function+0x40/0x40
Feb  9 12:28:53 drbd9-01 kernel: [<ffffffff81155e71>]
filemap_fdatawait_range+0x111/0x1b0
Feb  9 12:28:53 drbd9-01 kernel: [<ffffffff81155f37>]
filemap_fdatawait+0x27/0x30
Feb  9 12:28:53 drbd9-01 kernel: [<ffffffffa04d0a9f>]
jbd2_journal_commit_transaction+0xa8f/0x19a0 [jbd2]
Feb  9 12:28:53 drbd9-01 kernel: [<ffffffff810ad756>] ?
__dequeue_entity+0x26/0x40
Feb  9 12:28:53 drbd9-01 kernel: [<ffffffff810125c6>] ?
__switch_to+0x136/0x4a0
Feb  9 12:28:53 drbd9-01 kernel: [<ffffffffa04d5da9>] kjournald2+0xc9/0x260
[jbd2]
Feb  9 12:28:53 drbd9-01 kernel: [<ffffffff81098230>] ?
wake_up_bit+0x30/0x30
Feb  9 12:28:53 drbd9-01 kernel: [<ffffffffa04d5ce0>] ?
commit_timeout+0x10/0x10 [jbd2]
Feb  9 12:28:53 drbd9-01 kernel: [<ffffffff8109726f>] kthread+0xcf/0xe0
Feb  9 12:28:53 drbd9-01 kernel: [<ffffffff810971a0>] ?
kthread_create_on_node+0x140/0x140
Feb  9 12:28:53 drbd9-01 kernel: [<ffffffff81614158>]
ret_from_fork+0x58/0x90
Feb  9 12:28:53 drbd9-01 kernel: [<ffffffff810971a0>] ?
kthread_create_on_node+0x140/0x140

13) copy app deadlock maybe.

Feb  9 12:28:53 drbd9-01 kernel: INFO: task cp:2815 blocked for more than
120 seconds.
Feb  9 12:28:53 drbd9-01 kernel: "echo 0 >
/proc/sys/kernel/hung_task_timeout_secs" disables this message.
Feb  9 12:28:53 drbd9-01 kernel: cp              D ffff88003d613680     0
 2815   2626 0x00000080
Feb  9 12:28:53 drbd9-01 kernel: ffff88001950b9a8 0000000000000082
ffff880011764fa0 ffff88001950bfd8
Feb  9 12:28:53 drbd9-01 kernel: ffff88001950bfd8 ffff88001950bfd8
ffff880011764fa0 ffff88003d613f48
Feb  9 12:28:53 drbd9-01 kernel: ffff88003ffa9f50 0000000000000002
ffffffffa04cd720 ffff88001950ba20
Feb  9 12:28:53 drbd9-01 kernel: Call Trace:
Feb  9 12:28:53 drbd9-01 kernel: [<ffffffffa04cd720>] ?
start_this_handle+0x5d0/0x5d0 [jbd2]
Feb  9 12:28:53 drbd9-01 kernel: [<ffffffff816099ad>] io_schedule+0x9d/0x130
Feb  9 12:28:53 drbd9-01 kernel: [<ffffffffa04cd72e>]
sleep_on_shadow_bh+0xe/0x20 [jbd2]
Feb  9 12:28:53 drbd9-01 kernel: [<ffffffff81607780>]
__wait_on_bit+0x60/0x90
Feb  9 12:28:53 drbd9-01 kernel: [<ffffffffa04cd720>] ?
start_this_handle+0x5d0/0x5d0 [jbd2]
Feb  9 12:28:53 drbd9-01 kernel: [<ffffffff81607837>]
out_of_line_wait_on_bit+0x87/0xb0
Feb  9 12:28:53 drbd9-01 kernel: [<ffffffff81098270>] ?
autoremove_wake_function+0x40/0x40
Feb  9 12:28:53 drbd9-01 kernel: [<ffffffffa04cec0d>]
do_get_write_access+0x2ed/0x500 [jbd2]
Feb  9 12:28:53 drbd9-01 kernel: [<ffffffff811f9eed>] ? __getblk+0x2d/0x2e0
Feb  9 12:28:53 drbd9-01 kernel: [<ffffffffa04cee47>]
jbd2_journal_get_write_access+0x27/0x40 [jbd2]
Feb  9 12:28:53 drbd9-01 kernel: [<ffffffffa05304cb>]
__ext4_journal_get_write_access+0x3b/0x80 [ext4]
Feb  9 12:28:53 drbd9-01 kernel: [<ffffffffa05029a0>]
ext4_reserve_inode_write+0x70/0xa0 [ext4]
Feb  9 12:28:53 drbd9-01 kernel: [<ffffffffa0505e30>] ?
ext4_dirty_inode+0x40/0x60 [ext4]
Feb  9 12:28:53 drbd9-01 kernel: [<ffffffffa0502a23>]
ext4_mark_inode_dirty+0x53/0x210 [ext4]
Feb  9 12:28:53 drbd9-01 kernel: [<ffffffffa0505e30>]
ext4_dirty_inode+0x40/0x60 [ext4]
Feb  9 12:28:53 drbd9-01 kernel: [<ffffffff811f069a>]
__mark_inode_dirty+0xca/0x290
Feb  9 12:28:53 drbd9-01 kernel: [<ffffffff811e1721>] update_time+0x81/0xd0
Feb  9 12:28:53 drbd9-01 kernel: [<ffffffff811e1970>]
file_update_time+0xa0/0xf0
Feb  9 12:28:53 drbd9-01 kernel: [<ffffffff81158648>]
__generic_file_aio_write+0x198/0x3e0
Feb  9 12:28:53 drbd9-01 kernel: [<ffffffff811588ed>]
generic_file_aio_write+0x5d/0xc0
Feb  9 12:28:53 drbd9-01 kernel: [<ffffffffa04f9b75>]
ext4_file_write+0xb5/0x460 [ext4]
Feb  9 12:28:53 drbd9-01 kernel: [<ffffffffa026f7ac>] ?
xfs_iunlock+0xdc/0xf0 [xfs]
Feb  9 12:28:53 drbd9-01 kernel: [<ffffffffa02293a9>] ?
xfs_file_aio_read+0x189/0x310 [xfs]
Feb  9 12:28:53 drbd9-01 kernel: [<ffffffff811c5ebd>]
do_sync_write+0x8d/0xd0
Feb  9 12:28:53 drbd9-01 kernel: [<ffffffff811c665d>] vfs_write+0xbd/0x1e0
Feb  9 12:28:53 drbd9-01 kernel: [<ffffffff811c70a8>] SyS_write+0x58/0xb0
Feb  9 12:28:53 drbd9-01 kernel: [<ffffffff81614209>]
system_call_fastpath+0x16/0x1b

14) once again after 2 minutes

Feb  9 12:30:53 drbd9-01 kernel: INFO: task drbd_w_r0:2716 blocked for more
than 120 seconds.
Feb  9 12:30:53 drbd9-01 kernel: "echo 0 >
/proc/sys/kernel/hung_task_timeout_secs" disables this message.
Feb  9 12:30:53 drbd9-01 kernel: drbd_w_r0       D ffff88003d613680     0
 2716      2 0x00000080
Feb  9 12:30:53 drbd9-01 kernel: ffff8800116ffc10 0000000000000046
ffff880019505b00 ffff8800116fffd8
Feb  9 12:30:53 drbd9-01 kernel: ffff8800116fffd8 ffff8800116fffd8
ffff880019505b00 ffff88002e8dc0e8
Feb  9 12:30:53 drbd9-01 kernel: ffff88002e8dc0ec ffff880019505b00
00000000ffffffff ffff88002e8dc0f0
Feb  9 12:30:53 drbd9-01 kernel: Call Trace:
Feb  9 12:30:53 drbd9-01 kernel: [<ffffffff8160a899>]
schedule_preempt_disabled+0x29/0x70
Feb  9 12:30:53 drbd9-01 kernel: [<ffffffff816085e5>]
__mutex_lock_slowpath+0xc5/0x1c0
Feb  9 12:30:53 drbd9-01 kernel: [<ffffffff81607a4f>] mutex_lock+0x1f/0x2f
Feb  9 12:30:53 drbd9-01 kernel: [<ffffffffa040ef43>]
conn_prepare_command+0x33/0x80 [drbd]
Feb  9 12:30:53 drbd9-01 kernel: [<ffffffffa040efb2>] send_state+0x22/0xb0
[drbd]
Feb  9 12:30:53 drbd9-01 kernel: [<ffffffffa0410c5b>]
drbd_send_state+0x1b/0x20 [drbd]
Feb  9 12:30:53 drbd9-01 kernel: [<ffffffffa042e668>]
w_after_state_change+0x9d8/0x1f30 [drbd]
Feb  9 12:30:53 drbd9-01 kernel: [<ffffffff81609101>] ?
__schedule+0x241/0x7c0
Feb  9 12:30:53 drbd9-01 kernel: [<ffffffffa03ee94d>]
drbd_worker+0x10d/0x530 [drbd]
Feb  9 12:30:53 drbd9-01 kernel: [<ffffffff81098230>] ?
wake_up_bit+0x30/0x30
Feb  9 12:30:53 drbd9-01 kernel: [<ffffffffa040c4c0>] ?
w_complete+0x20/0x20 [drbd]
Feb  9 12:30:53 drbd9-01 kernel: [<ffffffffa040c521>]
drbd_thread_setup+0x61/0x120 [drbd]
Feb  9 12:30:53 drbd9-01 kernel: [<ffffffffa040c4c0>] ?
w_complete+0x20/0x20 [drbd]
Feb  9 12:30:53 drbd9-01 kernel: [<ffffffff8109726f>] kthread+0xcf/0xe0
Feb  9 12:30:53 drbd9-01 kernel: [<ffffffff810971a0>] ?
kthread_create_on_node+0x140/0x140
Feb  9 12:30:53 drbd9-01 kernel: [<ffffffff81614158>]
ret_from_fork+0x58/0x90
Feb  9 12:30:53 drbd9-01 kernel: [<ffffffff810971a0>] ?
kthread_create_on_node+0x140/0x140
Feb  9 12:30:53 drbd9-01 kernel: INFO: task drbd_r_r0:2730 blocked for more
than 120 seconds.
Feb  9 12:30:53 drbd9-01 kernel: "echo 0 >
/proc/sys/kernel/hung_task_timeout_secs" disables this message.
Feb  9 12:30:53 drbd9-01 kernel: drbd_r_r0       D ffff88003d613680     0
 2730      2 0x00000084
Feb  9 12:30:53 drbd9-01 kernel: ffff88001748bd20 0000000000000046
ffff88001b3016c0 ffff88001748bfd8
Feb  9 12:30:53 drbd9-01 kernel: ffff88001748bfd8 ffff88001748bfd8
ffff88001b3016c0 ffff88002e8dc0e8
Feb  9 12:30:53 drbd9-01 kernel: ffff88002e8dc0ec ffff88001b3016c0
00000000ffffffff ffff88002e8dc0f0
Feb  9 12:30:53 drbd9-01 kernel: Call Trace:
Feb  9 12:30:53 drbd9-01 kernel: [<ffffffff8160a899>]
schedule_preempt_disabled+0x29/0x70
Feb  9 12:30:53 drbd9-01 kernel: [<ffffffff816085e5>]
__mutex_lock_slowpath+0xc5/0x1c0
Feb  9 12:30:53 drbd9-01 kernel: [<ffffffff81607a4f>] mutex_lock+0x1f/0x2f
Feb  9 12:30:53 drbd9-01 kernel: [<ffffffffa0412bdc>]
drbd_transport_shutdown+0x2c/0x80 [drbd]
Feb  9 12:30:53 drbd9-01 kernel: [<ffffffffa03ffed5>]
conn_disconnect+0xb5/0x2f0 [drbd]
Feb  9 12:30:53 drbd9-01 kernel: [<ffffffffa0431356>] ?
change_cstate+0x86/0xc0 [drbd]
Feb  9 12:30:53 drbd9-01 kernel: [<ffffffffa0400e18>]
drbd_receiver+0x188/0x630 [drbd]
Feb  9 12:30:53 drbd9-01 kernel: [<ffffffffa040c4c0>] ?
w_complete+0x20/0x20 [drbd]
Feb  9 12:30:53 drbd9-01 kernel: [<ffffffffa040c521>]
drbd_thread_setup+0x61/0x120 [drbd]
Feb  9 12:30:53 drbd9-01 kernel: [<ffffffffa040c4c0>] ?
w_complete+0x20/0x20 [drbd]
Feb  9 12:30:53 drbd9-01 kernel: [<ffffffff8109726f>] kthread+0xcf/0xe0
Feb  9 12:30:53 drbd9-01 kernel: [<ffffffff810971a0>] ?
kthread_create_on_node+0x140/0x140
Feb  9 12:30:53 drbd9-01 kernel: [<ffffffff81614158>]
ret_from_fork+0x58/0x90
Feb  9 12:30:53 drbd9-01 kernel: [<ffffffff810971a0>] ?
kthread_create_on_node+0x140/0x140

Feb  9 12:30:53 drbd9-01 kernel: INFO: task jbd2/drbd1-8:2806 blocked for
more than 120 seconds.
Feb  9 12:30:53 drbd9-01 kernel: "echo 0 >
/proc/sys/kernel/hung_task_timeout_secs" disables this message.
Feb  9 12:30:53 drbd9-01 kernel: jbd2/drbd1-8    D ffff88003d613680     0
 2806      2 0x00000080
Feb  9 12:30:53 drbd9-01 kernel: ffff880011623af8 0000000000000046
ffff880011764440 ffff880011623fd8
Feb  9 12:30:53 drbd9-01 kernel: ffff880011623fd8 ffff880011623fd8
ffff880011764440 ffff88003d613f48
Feb  9 12:30:53 drbd9-01 kernel: ffff88003ffac1e8 0000000000000002
ffffffff81155fa0 ffff880011623b70
Feb  9 12:30:53 drbd9-01 kernel: Call Trace:
Feb  9 12:30:53 drbd9-01 kernel: [<ffffffff81155fa0>] ?
wait_on_page_read+0x60/0x60
Feb  9 12:30:53 drbd9-01 kernel: [<ffffffff816099ad>] io_schedule+0x9d/0x130
Feb  9 12:30:53 drbd9-01 kernel: [<ffffffff81155fae>] sleep_on_page+0xe/0x20
Feb  9 12:30:53 drbd9-01 kernel: [<ffffffff81607780>]
__wait_on_bit+0x60/0x90
Feb  9 12:30:53 drbd9-01 kernel: [<ffffffff81155d36>]
wait_on_page_bit+0x86/0xb0
Feb  9 12:30:53 drbd9-01 kernel: [<ffffffff81098270>] ?
autoremove_wake_function+0x40/0x40
Feb  9 12:30:53 drbd9-01 kernel: [<ffffffff81155e71>]
filemap_fdatawait_range+0x111/0x1b0
Feb  9 12:30:53 drbd9-01 kernel: [<ffffffff81155f37>]
filemap_fdatawait+0x27/0x30
Feb  9 12:30:53 drbd9-01 kernel: [<ffffffffa04d0a9f>]
jbd2_journal_commit_transaction+0xa8f/0x19a0 [jbd2]
Feb  9 12:30:53 drbd9-01 kernel: [<ffffffff810ad756>] ?
__dequeue_entity+0x26/0x40
Feb  9 12:30:53 drbd9-01 kernel: [<ffffffff810125c6>] ?
__switch_to+0x136/0x4a0
Feb  9 12:30:53 drbd9-01 kernel: [<ffffffffa04d5da9>] kjournald2+0xc9/0x260
[jbd2]
Feb  9 12:30:53 drbd9-01 kernel: [<ffffffff81098230>] ?
wake_up_bit+0x30/0x30
Feb  9 12:30:53 drbd9-01 kernel: [<ffffffffa04d5ce0>] ?
commit_timeout+0x10/0x10 [jbd2]
Feb  9 12:30:53 drbd9-01 kernel: [<ffffffff8109726f>] kthread+0xcf/0xe0
Feb  9 12:30:53 drbd9-01 kernel: [<ffffffff810971a0>] ?
kthread_create_on_node+0x140/0x140
Feb  9 12:30:53 drbd9-01 kernel: [<ffffffff81614158>]
ret_from_fork+0x58/0x90
Feb  9 12:30:53 drbd9-01 kernel: [<ffffffff810971a0>] ?
kthread_create_on_node+0x140/0x140
Feb  9 12:30:53 drbd9-01 kernel: INFO: task cp:2815 blocked for more than
120 seconds.
Feb  9 12:30:53 drbd9-01 kernel: "echo 0 >
/proc/sys/kernel/hung_task_timeout_secs" disables this message.
Feb  9 12:30:53 drbd9-01 kernel: cp              D ffff88003d613680     0
 2815   2626 0x00000080
Feb  9 12:30:53 drbd9-01 kernel: ffff88001950b9a8 0000000000000082
ffff880011764fa0 ffff88001950bfd8
Feb  9 12:30:53 drbd9-01 kernel: ffff88001950bfd8 ffff88001950bfd8
ffff880011764fa0 ffff88003d613f48
Feb  9 12:30:53 drbd9-01 kernel: ffff88003ffa9f50 0000000000000002
ffffffffa04cd720 ffff88001950ba20
Feb  9 12:30:53 drbd9-01 kernel: Call Trace:
Feb  9 12:30:53 drbd9-01 kernel: [<ffffffffa04cd720>] ?
start_this_handle+0x5d0/0x5d0 [jbd2]
Feb  9 12:30:53 drbd9-01 kernel: [<ffffffff816099ad>] io_schedule+0x9d/0x130
Feb  9 12:30:53 drbd9-01 kernel: [<ffffffffa04cd72e>]
sleep_on_shadow_bh+0xe/0x20 [jbd2]
Feb  9 12:30:53 drbd9-01 kernel: [<ffffffff81607780>]
__wait_on_bit+0x60/0x90
Feb  9 12:30:53 drbd9-01 kernel: [<ffffffffa04cd720>] ?
start_this_handle+0x5d0/0x5d0 [jbd2]
Feb  9 12:30:53 drbd9-01 kernel: [<ffffffff81607837>]
out_of_line_wait_on_bit+0x87/0xb0
Feb  9 12:30:53 drbd9-01 kernel: [<ffffffff81098270>] ?
autoremove_wake_function+0x40/0x40
Feb  9 12:30:53 drbd9-01 kernel: [<ffffffffa04cec0d>]
do_get_write_access+0x2ed/0x500 [jbd2]
Feb  9 12:30:53 drbd9-01 kernel: [<ffffffff811f9eed>] ? __getblk+0x2d/0x2e0
Feb  9 12:30:53 drbd9-01 kernel: [<ffffffffa04cee47>]
jbd2_journal_get_write_access+0x27/0x40 [jbd2]
Feb  9 12:30:53 drbd9-01 kernel: [<ffffffffa05304cb>]
__ext4_journal_get_write_access+0x3b/0x80 [ext4]
Feb  9 12:30:53 drbd9-01 kernel: [<ffffffffa05029a0>]
ext4_reserve_inode_write+0x70/0xa0 [ext4]
Feb  9 12:30:53 drbd9-01 kernel: [<ffffffffa0505e30>] ?
ext4_dirty_inode+0x40/0x60 [ext4]
Feb  9 12:30:53 drbd9-01 kernel: [<ffffffffa0502a23>]
ext4_mark_inode_dirty+0x53/0x210 [ext4]
Feb  9 12:30:53 drbd9-01 kernel: [<ffffffffa0505e30>]
ext4_dirty_inode+0x40/0x60 [ext4]
Feb  9 12:30:53 drbd9-01 kernel: [<ffffffff811f069a>]
__mark_inode_dirty+0xca/0x290
Feb  9 12:30:53 drbd9-01 kernel: [<ffffffff811e1721>] update_time+0x81/0xd0
Feb  9 12:30:53 drbd9-01 kernel: [<ffffffff811e1970>]
file_update_time+0xa0/0xf0
Feb  9 12:30:53 drbd9-01 kernel: [<ffffffff81158648>]
__generic_file_aio_write+0x198/0x3e0
Feb  9 12:30:53 drbd9-01 kernel: [<ffffffff811588ed>]
generic_file_aio_write+0x5d/0xc0
Feb  9 12:30:53 drbd9-01 kernel: [<ffffffffa04f9b75>]
ext4_file_write+0xb5/0x460 [ext4]
Feb  9 12:30:53 drbd9-01 kernel: [<ffffffffa026f7ac>] ?
xfs_iunlock+0xdc/0xf0 [xfs]
Feb  9 12:30:53 drbd9-01 kernel: [<ffffffffa02293a9>] ?
xfs_file_aio_read+0x189/0x310 [xfs]
Feb  9 12:30:53 drbd9-01 kernel: [<ffffffff811c5ebd>]
do_sync_write+0x8d/0xd0
Feb  9 12:30:53 drbd9-01 kernel: [<ffffffff811c665d>] vfs_write+0xbd/0x1e0
Feb  9 12:30:53 drbd9-01 kernel: [<ffffffff811c70a8>] SyS_write+0x58/0xb0
Feb  9 12:30:53 drbd9-01 kernel: [<ffffffff81614209>]
system_call_fastpath+0x16/0x1b

15) reset VM

Feb  9 12:35:03 drbd9-01 rsyslogd: [origin software="rsyslogd"
swVersion="7.4.7" x-pid="902" x-info="http://www.rsyslog.com"] start
Feb  9 12:34:55 drbd9-01 journal: Runtime journal is using 6.1M (max 48.9M,
leaving 73.4M of free 483.6M, current limit 48.9M).
Feb  9 12:34:55 drbd9-01 kernel: Initializing cgroup subsys cpuset
Feb  9 12:34:55 drbd9-01 kernel: Initializing cgroup subsys cpu
Feb  9 12:34:55 drbd9-01 kernel: Initializing cgroup subsys cpuacct
Feb  9 12:34:55 drbd9-01 kernel: Linux version 3.10.0-229.7.2.el7.x86_64 (
builder at kbuilder.dev.centos.org) (gcc version 4.8.3 20140911 (Red Hat
4.8.3-9) (GCC) ) #1 SMP Tue Jun 23 22:06:11 UTC 2015
Feb  9 12:34:55 drbd9-01 kernel: Command line:
BOOT_IMAGE=/vmlinuz-3.10.0-229.7.2.el7.x86_64 root=/dev/mapper/centos-root
rord.lvm.lv=centos/root rd.lvm.lv=centos/swap net.ifnames=0 rhgb quiet
systemd.debug LANG=ko_KR.UTF-8


4. Question
 - Please see above 10), 11) points, I think It's deadlock.
 - the worker and receiver threads may be deadlocked by race condition for
data socket.

Thanks
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.linbit.com/pipermail/drbd-user/attachments/20160212/7787fd19/attachment.htm>


More information about the drbd-user mailing list