[DRBD-user] New 3-way drbd setup does not seem to take i/o

Remolina, Diego J dijuremo at aerospace.gatech.edu
Thu May 3 13:45:14 CEST 2018


A bit of progress, but still call traces being dumped in the logs. I waited for the full initial sync to finish, then I created the file system from a different node, ae-fs02, instead of ae-fs01. Initially, the command hung for a while, but it eventually succeeded. However, the following call traces were dumped:


[ 5687.457691] drbd test: role( Secondary -> Primary )
[ 5882.661739] INFO: task mkfs.xfs:80231 blocked for more than 120 seconds.
[ 5882.661770] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[ 5882.661796] mkfs.xfs        D ffff9df559b1cf10     0 80231   8839 0x00000080
[ 5882.661800] Call Trace:
[ 5882.661807]  [<ffffffffb7d12f49>] schedule+0x29/0x70
[ 5882.661809]  [<ffffffffb7d108b9>] schedule_timeout+0x239/0x2c0
[ 5882.661819]  [<ffffffffc08a1f1b>] ? drbd_make_request+0x23b/0x360 [drbd]
[ 5882.661824]  [<ffffffffb76f76e2>] ? ktime_get_ts64+0x52/0xf0
[ 5882.661826]  [<ffffffffb7d1245d>] io_schedule_timeout+0xad/0x130
[ 5882.661828]  [<ffffffffb7d1357d>] wait_for_completion_io+0xfd/0x140
[ 5882.661833]  [<ffffffffb76cee80>] ? wake_up_state+0x20/0x20
[ 5882.661837]  [<ffffffffb792308c>] blkdev_issue_discard+0x2ac/0x2d0
[ 5882.661843]  [<ffffffffb792c141>] blk_ioctl_discard+0xd1/0x120
[ 5882.661845]  [<ffffffffb792cc12>] blkdev_ioctl+0x5e2/0x9b0
[ 5882.661849]  [<ffffffffb7859691>] block_ioctl+0x41/0x50
[ 5882.661854]  [<ffffffffb782fb90>] do_vfs_ioctl+0x350/0x560
[ 5882.661857]  [<ffffffffb77ccc77>] ? do_munmap+0x317/0x470
[ 5882.661859]  [<ffffffffb782fe41>] SyS_ioctl+0xa1/0xc0
[ 5882.661862]  [<ffffffffb7d1f7d5>] system_call_fastpath+0x1c/0x21
[ 6002.650486] INFO: task mkfs.xfs:80231 blocked for more than 120 seconds.
[ 6002.650514] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[ 6002.650539] mkfs.xfs        D ffff9df559b1cf10     0 80231   8839 0x00000080
[ 6002.650543] Call Trace:
[ 6002.650550]  [<ffffffffb7d12f49>] schedule+0x29/0x70
[ 6002.650552]  [<ffffffffb7d108b9>] schedule_timeout+0x239/0x2c0
[ 6002.650563]  [<ffffffffc08a1f1b>] ? drbd_make_request+0x23b/0x360 [drbd]
[ 6002.650567]  [<ffffffffb76f76e2>] ? ktime_get_ts64+0x52/0xf0
[ 6002.650569]  [<ffffffffb7d1245d>] io_schedule_timeout+0xad/0x130
[ 6002.650571]  [<ffffffffb7d1357d>] wait_for_completion_io+0xfd/0x140
[ 6002.650575]  [<ffffffffb76cee80>] ? wake_up_state+0x20/0x20
[ 6002.650579]  [<ffffffffb792308c>] blkdev_issue_discard+0x2ac/0x2d0
[ 6002.650582]  [<ffffffffb792c141>] blk_ioctl_discard+0xd1/0x120
[ 6002.650585]  [<ffffffffb792cc12>] blkdev_ioctl+0x5e2/0x9b0
[ 6002.650588]  [<ffffffffb7859691>] block_ioctl+0x41/0x50
[ 6002.650591]  [<ffffffffb782fb90>] do_vfs_ioctl+0x350/0x560
[ 6002.650594]  [<ffffffffb77ccc77>] ? do_munmap+0x317/0x470
[ 6002.650596]  [<ffffffffb782fe41>] SyS_ioctl+0xa1/0xc0
[ 6002.650599]  [<ffffffffb7d1f7d5>] system_call_fastpath+0x1c/0x21
[ 6122.639403] INFO: task mkfs.xfs:80231 blocked for more than 120 seconds.
[ 6122.639426] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[ 6122.639451] mkfs.xfs        D ffff9df559b1cf10     0 80231   8839 0x00000080
[ 6122.639455] Call Trace:
[ 6122.639463]  [<ffffffffb7d12f49>] schedule+0x29/0x70
[ 6122.639465]  [<ffffffffb7d108b9>] schedule_timeout+0x239/0x2c0
[ 6122.639476]  [<ffffffffc08a1f1b>] ? drbd_make_request+0x23b/0x360 [drbd]
[ 6122.639480]  [<ffffffffb76f76e2>] ? ktime_get_ts64+0x52/0xf0
[ 6122.639482]  [<ffffffffb7d1245d>] io_schedule_timeout+0xad/0x130
[ 6122.639484]  [<ffffffffb7d1357d>] wait_for_completion_io+0xfd/0x140
[ 6122.639489]  [<ffffffffb76cee80>] ? wake_up_state+0x20/0x20
[ 6122.639493]  [<ffffffffb792308c>] blkdev_issue_discard+0x2ac/0x2d0
[ 6122.639496]  [<ffffffffb792c141>] blk_ioctl_discard+0xd1/0x120
[ 6122.639499]  [<ffffffffb792cc12>] blkdev_ioctl+0x5e2/0x9b0
[ 6122.639501]  [<ffffffffb7859691>] block_ioctl+0x41/0x50
[ 6122.639504]  [<ffffffffb782fb90>] do_vfs_ioctl+0x350/0x560
[ 6122.639507]  [<ffffffffb77ccc77>] ? do_munmap+0x317/0x470
[ 6122.639509]  [<ffffffffb782fe41>] SyS_ioctl+0xa1/0xc0
[ 6122.639512]  [<ffffffffb7d1f7d5>] system_call_fastpath+0x1c/0x21



I would think this is not normal. Do you think this is a RHEL 7.5-specific issue?


# cat /etc/redhat-release
Red Hat Enterprise Linux Server release 7.5 (Maipo)
# uname -a
Linux ae-fs02 3.10.0-862.el7.x86_64 #1 SMP Wed Mar 21 18:14:51 EDT 2018 x86_64 x86_64 x86_64 GNU/Linux
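
All three traces are stuck in blkdev_issue_discard, so the stall seems to happen while mkfs.xfs issues its initial device-wide discard (which DRBD then has to pass down to the zvol backends) rather than during ordinary writes. One possible way to narrow it down, just a diagnostic sketch assuming the volume still has minor 100 (so the device is /dev/drbd100) and contains no data you care about:

# mkfs.xfs -K /dev/drbd100                          (-K skips the discard pass at mkfs time)
# blkdiscard --offset 0 --length 16M /dev/drbd100   (small bounded discard, to see if discard alone stalls; destroys data in that range)

If the -K run formats quickly while a plain mkfs.xfs keeps hanging, the problem is in the discard/TRIM path rather than in normal write I/O.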

Diego


________________________________
From: Remolina, Diego J
Sent: Wednesday, May 2, 2018 12:33:44 PM
To: Roland Kammerer; drbd-user at lists.linbit.com
Subject: Re: [DRBD-user] New 3-way drbd setup does not seem to take i/o


Dear Roland,


I cleared the current cluster configuration with drbdmanage uninit on all nodes and started fresh, after also manually clearing the zvol from the ZFS pool and rebooting the servers.


Once again there is a hang when I try to create an XFS filesystem on top of the DRBD device. I do see some panics in the logs (scroll all the way to the end):


http://termbin.com/b5u3


I am running this on RHEL 7.5 on kernel: 3.10.0-862.el7.x86_64


Am I hitting a bug? Is it possible the problem is that I am not waiting for the initial sync to finish? I have already upgraded the kernel module to the latest 9.0.14 announced today.


# rpm -qa |grep kmod-drbd

kmod-drbd-9.0.14_3.10.0_862-1.el7.x86_64
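
(Side note: rpm -qa only shows the installed package; which module is actually running can be double-checked, e.g.:

# cat /proc/drbd              (version of the currently loaded module)
# modinfo -F version drbd     (version of the module file on disk)

assuming the module was reloaded or the node rebooted after the upgrade, both should report 9.0.14.)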


[root at ae-fs01 tmp]# drbdmanage list-nodes
+------------------------------------------------------------------------------------------------------------+
| Name    | Pool Size | Pool Free |                                                                  | State |
|------------------------------------------------------------------------------------------------------------|
| ae-fs01 |  13237248 |  12730090 |                                                                  |    ok |
| ae-fs02 |  13237248 |  12730095 |                                                                  |    ok |
| ae-fs03 |  13237248 |  12730089 |                                                                  |    ok |
+------------------------------------------------------------------------------------------------------------+
[root at ae-fs01 tmp]# drbdmanage list-volumes
+------------------------------------------------------------------------------------------------------------+
| Name | Vol ID |       Size | Minor |                                                               | State |
|------------------------------------------------------------------------------------------------------------|
| test |      0 | 465.66 GiB |   100 |                                                               |    ok |
+------------------------------------------------------------------------------------------------------------+
[root at ae-fs01 tmp]# drbdadm status
.drbdctrl role:Primary
 volume:0 disk:UpToDate
 volume:1 disk:UpToDate
 ae-fs02 role:Secondary
   volume:0 peer-disk:UpToDate
   volume:1 peer-disk:UpToDate
 ae-fs03 role:Secondary
   volume:0 peer-disk:UpToDate
   volume:1 peer-disk:UpToDate

test role:Primary
 disk:UpToDate
 ae-fs02 role:Secondary
   replication:SyncSource peer-disk:Inconsistent done:5.12
 ae-fs03 role:Secondary
   replication:SyncSource peer-disk:Inconsistent done:5.15
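
(Both peers above are still Inconsistent at around 5%, so one way to rule out the initial sync as the cause, just a sketch, is to wait until neither peer shows Inconsistent any more before attempting mkfs again, e.g.:

# while drbdadm status test | grep -q Inconsistent; do sleep 10; done

or simply watching drbdadm status until both peers report UpToDate.)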


Thanks,


Diego

________________________________
From: drbd-user-bounces at lists.linbit.com <drbd-user-bounces at lists.linbit.com> on behalf of Roland Kammerer <roland.kammerer at linbit.com>
Sent: Wednesday, May 2, 2018 2:30:54 AM
To: drbd-user at lists.linbit.com
Subject: Re: [DRBD-user] New 3-way drbd setup does not seem to take i/o

On Tue, May 01, 2018 at 04:14:52PM +0000, Remolina, Diego J wrote:
> Hi, I was wondering if you could guide me as to what the issue could be here. I configured 3 servers with drbdmanage-0.99.16-1 and drbd-9.3.1-1 and related packages.
>
>
> I created a ZFS pool, then used the zvol2.Zvol2 plugin and created a
> resource. All seems fine, up to the point where I want to test the
> resource and create a file system on it. At that point, if I try to
> create, say, an XFS filesystem, things freeze. If I create a ZFS pool
> on the DRBD device, the creation succeeds, but then I cannot write to
> or read from it.
>
>
> # zfs list
> NAME                 USED  AVAIL  REFER  MOUNTPOINT
> mainpool            11.6T  1.02T    24K  none
> mainpool/export_00  11.6T  12.6T  7.25G  -
>
> The plugin configuration:
> [GLOBAL]
>
> [Node:ae-fs01]
> storage-plugin = drbdmanage.storage.zvol2.Zvol2
>
> [Plugin:Zvol2]
> volume-group = mainpool
>
>
> # drbdmanage list-nodes
> +------------------------------------------------------------------------------------------------------------+
> | Name    | Pool Size | Pool Free |                                                                  | State |
> |------------------------------------------------------------------------------------------------------------|
> | ae-fs01 |  13237248 |   1065678 |                                                                  |    ok |
> | ae-fs02 |  13237248 |   1065683 |                                                                  |    ok |
> | ae-fs03 |  13237248 |   1065672 |                                                                  |    ok |
> +------------------------------------------------------------------------------------------------------------+
>
>
> # drbdmanage list-volumes
> +------------------------------------------------------------------------------------------------------------+
> | Name   | Vol ID |      Size | Minor |                                                              | State |
> |------------------------------------------------------------------------------------------------------------|
> | export |      0 | 10.91 TiB |   106 |                                                              |    ok |
> +------------------------------------------------------------------------------------------------------------+
>
> But making one node primary and creating a file system on it, whether
> a new ZFS pool for data or an XFS file system, fails.
>
>
> # drbdadm primary export
> # drbdadm status
> .drbdctrl role:Secondary
>  volume:0 disk:UpToDate
>  volume:1 disk:UpToDate
>  ae-fs02 role:Primary
>    volume:0 peer-disk:UpToDate
>    volume:1 peer-disk:UpToDate
>  ae-fs03 role:Secondary
>    volume:0 peer-disk:UpToDate
>    volume:1 peer-disk:UpToDate
>
> export role:Primary
>  disk:UpToDate
>  ae-fs02 role:Secondary
>    peer-disk:UpToDate
>  ae-fs03 role:Secondary
>    peer-disk:UpToDate
>
> # zpool create export /dev/drbd106
> # zfs set compression=lz4 export
> # ls /export
> ls: reading directory /export: Not a directory
>
> If I destroy the pool and try to format /dev/drbd106 as XFS, it just
> hangs forever. Any ideas as to what is happening?

Carving out zvols which are then used by DRBD should work. Putting
another zfs/zpool on top might have its quirks, especially with
auto-promote. And maybe the failed XFS was then a follow-up problem.

So start with something easier, then: create a small (like 10M)
resource with DM and then try to create the XFS on it (without the
additional ZFS steps).
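
A minimal sketch of that test, assuming this drbdmanage version accepts the add-volume/--deploy form and that the minor number is read back from list-volumes (the resource name and minor below are placeholders):

# drbdmanage add-volume tiny 10MB --deploy 3
# drbdmanage list-volumes                  (note the assigned minor, e.g. 101)
# drbdadm primary tiny
# mkfs.xfs /dev/drbd101                    (device name assumes minor 101)

If that small mkfs.xfs also hangs in blkdev_issue_discard, the zvol layering is probably not the trigger.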

Regards, rck