No subject


Tue Jan 23 13:55:44 CET 2018


    [Sun Mar 10 05:46:55 2019] INFO: task drbd_r_csi-9400:21586 blocked for more than 120 seconds.
    [Sun Mar 10 05:46:55 2019]       Tainted: P           O     4.15.18-7-pve #1
    [Sun Mar 10 05:46:55 2019] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
    [Sun Mar 10 05:46:55 2019] drbd_r_csi-9400 D    0 21586      2 0x80000000
    [Sun Mar 10 05:46:55 2019] Call Trace:
    [Sun Mar 10 05:46:55 2019]  __schedule+0x3e0/0x870
    [Sun Mar 10 05:46:55 2019]  schedule+0x36/0x80
    [Sun Mar 10 05:46:55 2019]  receive_peer_dagtag+0x162/0x2f0 [drbd]
    [Sun Mar 10 05:46:55 2019]  ? wait_woken+0x80/0x80
    [Sun Mar 10 05:46:55 2019]  ? got_twopc_reply+0x1f0/0x1f0 [drbd]
    [Sun Mar 10 05:46:55 2019]  drbd_receiver+0x4ed/0x670 [drbd]
    [Sun Mar 10 05:46:55 2019]  drbd_thread_setup+0x76/0x180 [drbd]
    [Sun Mar 10 05:46:55 2019]  kthread+0x105/0x140
    [Sun Mar 10 05:46:55 2019]  ? __drbd_next_peer_device_ref+0x150/0x150 [drbd]
    [Sun Mar 10 05:46:55 2019]  ? kthread_create_worker_on_cpu+0x70/0x70
    [Sun Mar 10 05:46:55 2019]  ? do_syscall_64+0x73/0x130
    [Sun Mar 10 05:46:55 2019]  ? SyS_exit_group+0x14/0x20
    [Sun Mar 10 05:46:55 2019]  ret_from_fork+0x35/0x40
    [Sun Mar 10 05:46:55 2019] INFO: task drbd_r_csi-9400:18504 blocked for more than 120 seconds.
    [Sun Mar 10 05:46:55 2019]       Tainted: P           O     4.15.18-7-pve #1
    [Sun Mar 10 05:46:55 2019] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
    [Sun Mar 10 05:46:55 2019] drbd_r_csi-9400 D    0 18504      2 0x80000004
    [Sun Mar 10 05:46:55 2019] Call Trace:
    [Sun Mar 10 05:46:55 2019]  __schedule+0x3e0/0x870
    [Sun Mar 10 05:46:55 2019]  schedule+0x36/0x80
    [Sun Mar 10 05:46:55 2019]  conn_wait_ee_empty+0x72/0x9f [drbd]
    [Sun Mar 10 05:46:55 2019]  ? wait_woken+0x80/0x80
    [Sun Mar 10 05:46:55 2019]  conn_disconnect+0x1c0/0x780 [drbd]
    [Sun Mar 10 05:46:55 2019]  ? w_flush+0x50/0x50 [drbd]
    [Sun Mar 10 05:46:55 2019]  drbd_receiver+0x2a2/0x670 [drbd]
    [Sun Mar 10 05:46:55 2019]  drbd_thread_setup+0x76/0x180 [drbd]
    [Sun Mar 10 05:46:55 2019]  kthread+0x105/0x140
    [Sun Mar 10 05:46:55 2019]  ? __drbd_next_peer_device_ref+0x150/0x150 [drbd]
    [Sun Mar 10 05:46:55 2019]  ? kthread_create_worker_on_cpu+0x70/0x70
    [Sun Mar 10 05:46:55 2019]  ret_from_fork+0x35/0x40
    [Sun Mar 10 05:46:55 2019] INFO: task kworker/u16:0:18946 blocked for more than 120 seconds.
    [Sun Mar 10 05:46:55 2019]       Tainted: P           O     4.15.18-7-pve #1
    [Sun Mar 10 05:46:55 2019] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
    [Sun Mar 10 05:46:55 2019] kworker/u16:0   D    0 18946      2 0x80000000
    [Sun Mar 10 05:46:55 2019] Workqueue: drbd1104_submit do_submit [drbd]
    [Sun Mar 10 05:46:55 2019] Call Trace:
    [Sun Mar 10 05:46:55 2019]  __schedule+0x3e0/0x870
    [Sun Mar 10 05:46:55 2019]  schedule+0x36/0x80
    [Sun Mar 10 05:46:55 2019]  do_submit+0x325/0x5f0 [drbd]
    [Sun Mar 10 05:46:55 2019]  ? __switch_to_asm+0x34/0x70
    [Sun Mar 10 05:46:55 2019]  ? __switch_to_asm+0x40/0x70
    [Sun Mar 10 05:46:55 2019]  ? __switch_to_asm+0x34/0x70
    [Sun Mar 10 05:46:55 2019]  ? __switch_to_asm+0x40/0x70
    [Sun Mar 10 05:46:55 2019]  ? wait_woken+0x80/0x80
    [Sun Mar 10 05:46:55 2019]  ? __switch_to_asm+0x34/0x70
    [Sun Mar 10 05:46:55 2019]  process_one_work+0x1e0/0x400
    [Sun Mar 10 05:46:55 2019]  ? __drbd_make_request+0x4e0/0x4e0 [drbd]
    [Sun Mar 10 05:46:55 2019]  ? process_one_work+0x1e0/0x400
    [Sun Mar 10 05:46:55 2019]  worker_thread+0x4b/0x420
    [Sun Mar 10 05:46:55 2019]  kthread+0x105/0x140
    [Sun Mar 10 05:46:55 2019]  ? process_one_work+0x400/0x400
    [Sun Mar 10 05:46:55 2019]  ? kthread_create_worker_on_cpu+0x70/0x70
    [Sun Mar 10 05:46:55 2019]  ? do_syscall_64+0x73/0x130
    [Sun Mar 10 05:46:55 2019]  ? SyS_exit_group+0x14/0x20
    [Sun Mar 10 05:46:55 2019]  ret_from_fork+0x35/0x40

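In case it helps with reading the trace, here is a rough Python sketch that condenses a dump like the one above into one line per blocked task: task name, PID, and the first definite (not `?`-prefixed) frame inside the `drbd` module. The function name `summarize_hung_tasks` and the regular expressions are made up for illustration and only assume the log layout shown here, including the line wrapping added by the mail client.

```python
import re

# Start of a hung-task report, e.g.
# "INFO: task drbd_r_csi-9400:21586 blocked for more than 120 seconds."
TASK_RE = re.compile(
    r"INFO: task (\S+):(\d+)\s+blocked\s+for\s+more\s+than\s+(\d+)\s+seconds"
)
# A definite stack frame that belongs to the drbd module.
# Frames marked with "?" do not match because "?" is not a word character.
FRAME_RE = re.compile(r"\]\s+(\w+)\+0x[0-9a-f]+/0x[0-9a-f]+ \[drbd\]")


def summarize_hung_tasks(log):
    """Return a list of (task, pid, first drbd frame) tuples."""
    # Re-join wrapped lines: a continuation line does not start
    # with the "[timestamp]" prefix.
    joined = re.sub(r"\n(?!\s*\[)", " ", log)
    results = []
    current = None
    for line in joined.splitlines():
        task = TASK_RE.search(line)
        if task:
            current = [task.group(1), int(task.group(2)), None]
            results.append(current)
            continue
        frame = FRAME_RE.search(line)
        if frame and current is not None and current[2] is None:
            current[2] = frame.group(1)
    return [tuple(r) for r in results]


# Short excerpt of the second report above, wrapped as in the mail.
SAMPLE = """\
[Sun Mar 10 05:46:55 2019] INFO: task drbd_r_csi-9400:18504
blocked for more than 120 seconds.
[Sun Mar 10 05:46:55 2019] Call Trace:
[Sun Mar 10 05:46:55 2019]  __schedule+0x3e0/0x870
[Sun Mar 10 05:46:55 2019]  schedule+0x36/0x80
[Sun Mar 10 05:46:55 2019]  conn_wait_ee_empty+0x72/0x9f [drbd]
[Sun Mar 10 05:46:55 2019]  ? wait_woken+0x80/0x80
[Sun Mar 10 05:46:55 2019]  conn_disconnect+0x1c0/0x780 [drbd]
"""

if __name__ == "__main__":
    for name, pid, frame in summarize_hung_tasks(SAMPLE):
        print(name, pid, frame)
```

Applied to the full trace, this should list the three blocked tasks waiting in `receive_peer_dagtag`, `conn_wait_ee_empty` and `do_submit` respectively.
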
`drbdadm status` shows `NetworkFailure` on the diskless node:

    csi-940049b5-d5aa-46f0-9a73-fe601c3fc696 role:Secondary
      disk:Consistent
      m7c14 role:Secondary
        peer-disk:Consistent
      m8c25 connection:NetworkFailure
      m8c29 role:Secondary
        peer-disk:Diskless

`linstor r l` shows `Consistent` for the diskless node.

    | csi-940049b5-d5aa-46f0-9a73-fe601c3fc696 | m12c4 | 7158 | Unused | Consistent |
    | csi-940049b5-d5aa-46f0-9a73-fe601c3fc696 | m7c14 | 7158 | Unused |   UpToDate |
    | csi-940049b5-d5aa-46f0-9a73-fe601c3fc696 | m8c25 | 7158 | Unused |   Diskless |
    | csi-940049b5-d5aa-46f0-9a73-fe601c3fc696 | m8c29 | 7158 | Unused |   Diskless |

Any `drbdadm down` or `drbdadm disconnect` operation hangs on the data
nodes and never completes. Only a reboot helps.

We first hit this issue with one volume. We then created another volume,
copied all the files from the old one to the new one, and ran the workload
on the new one. The problem occurred with the new volume too, even though
it used different nodes with different hardware.

Environment:

backend: lvm
filesystem: ext4
linstor version: v0.7.5
kernel version: 4.15.18-7-pve
drbd version:

    version: 9.0.14-1 (api:2/proto:86-113)
    GIT-hash: 62f906cf44ef02a30ce0c148fec223b40c51c533 build by @gitlab-runner-docker1-0, 2018-10-09 16:33:22
    Transports (api:16): tcp (9.0.14-1)

Could you please explain where exactly the problem might be and where to dig further?
Thanks in advance!

- kvaps

