[DRBD-user] New 3-way drbd setup does not seem to take i/o

Remolina, Diego J dijuremo at aerospace.gatech.edu
Mon May 7 18:24:37 CEST 2018


Hi,


Yep, I did go smaller, but 10MB seemed too basic, so I did a 500GB volume.


[root@ae-fs01 /]# time mkfs.xfs /dev/drbd100
meta-data=/dev/drbd100           isize=512    agcount=4, agsize=30517578 blks
        =                       sectsz=4096  attr=2, projid32bit=1
        =                       crc=1        finobt=0, sparse=0
data     =                       bsize=4096   blocks=122070312, imaxpct=25
        =                       sunit=0      swidth=0 blks
naming   =version 2              bsize=4096   ascii-ci=0 ftype=1
log      =internal log           bsize=4096   blocks=59604, version=2
        =                       sectsz=4096  sunit=1 blks, lazy-count=1
realtime =none                   extsz=4096   blocks=0, rtextents=0

real    3m23.066s
user    0m0.001s
sys     0m0.097s


I spoke to the folks in the XFS IRC channel, who were able to suggest the solution almost immediately: use "-K" so mkfs does not attempt to discard blocks at mkfs time. That took formatting the ~12TB device from nearly 4 hours down to about a minute. I assume that since drbd supports discard/TRIM for SSD backends, mkfs.xfs saw the "discard" support and tried to discard the whole device before creating the file system, even though I have no SSDs at the moment; the drives are all 10K rpm 2.5" SAS.
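For anyone hitting the same thing, here is a quick sanity check (a sketch; the device name /dev/drbd100 is the one from this thread and needs adjusting) to see whether a block device advertises discard support before running mkfs:

```shell
# Show the discard granularity and maximum discard size the kernel
# reports for the device. Non-zero DISC-GRAN/DISC-MAX means mkfs.xfs
# will try to discard the whole device unless -K is passed.
lsblk --discard /dev/drbd100

# The same information via sysfs; a value of 0 means the device does
# not support discard.
cat /sys/block/drbd100/queue/discard_max_bytes
```

If discard_max_bytes is non-zero on rotational backing storage, -K is the quick way to skip the (useless there) discard pass.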


Without using -K


# time mkfs.xfs /dev/drbd100
meta-data=/dev/drbd100           isize=512    agcount=12, agsize=268435455 blks
        =                       sectsz=4096  attr=2, projid32bit=1
        =                       crc=1        finobt=0, sparse=0
data     =                       bsize=4096   blocks=3093299200, imaxpct=5
        =                       sunit=0      swidth=0 blks
naming   =version 2              bsize=4096   ascii-ci=0 ftype=1
log      =internal log           bsize=4096   blocks=521728, version=2
        =                       sectsz=4096  sunit=1 blks, lazy-count=1
realtime =none                   extsz=4096   blocks=0, rtextents=0


real    230m43.596s
user    0m0.020s
sys     0m13.172s

Using -K


# time mkfs.xfs -K /dev/drbd100
meta-data=/dev/drbd100           isize=512    agcount=12, agsize=268435455 blks
        =                       sectsz=4096  attr=2, projid32bit=1
        =                       crc=1        finobt=0, sparse=0
data     =                       bsize=4096   blocks=3093299200, imaxpct=5
        =                       sunit=0      swidth=0 blks
naming   =version 2              bsize=4096   ascii-ci=0 ftype=1
log      =internal log           bsize=4096   blocks=521728, version=2
        =                       sectsz=4096  sunit=1 blks, lazy-count=1
realtime =none                   extsz=4096   blocks=0, rtextents=0


real    1m3.523s
user    0m0.009s
sys     0m0.463s


My drbd device is working well now and I have the large single XFS partition on it.


Thanks,


Diego

________________________________
From: drbd-user-bounces at lists.linbit.com <drbd-user-bounces at lists.linbit.com> on behalf of Lars Ellenberg <lars.ellenberg at linbit.com>
Sent: Thursday, May 3, 2018 10:13:07 AM
To: drbd-user at lists.linbit.com
Subject: Re: [DRBD-user] New 3-way drbd setup does not seem to take i/o

On Thu, May 03, 2018 at 11:45:14AM +0000, Remolina, Diego J wrote:
> A bit of progress, but still call traces being dumped in the logs. I
> waited for the full initial sync to finish, then I created the file
> system from a different node, ae-fs02, instead of ae-fs01. Initially,
> the command hung for a while, but it eventually succeeded. However, the
> following call traces were dumped:
>
>
> [ 5687.457691] drbd test: role( Secondary -> Primary )
> [ 5882.661739] INFO: task mkfs.xfs:80231 blocked for more than 120 seconds.
> [ 5882.661770] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
> [ 5882.661796] mkfs.xfs        D ffff9df559b1cf10     0 80231   8839 0x00000080
> [ 5882.661800] Call Trace:
> [ 5882.661807]  [<ffffffffb7d12f49>] schedule+0x29/0x70
> [ 5882.661809]  [<ffffffffb7d108b9>] schedule_timeout+0x239/0x2c0
> [ 5882.661819]  [<ffffffffc08a1f1b>] ? drbd_make_request+0x23b/0x360 [drbd]
> [ 5882.661824]  [<ffffffffb76f76e2>] ? ktime_get_ts64+0x52/0xf0
> [ 5882.661826]  [<ffffffffb7d1245d>] io_schedule_timeout+0xad/0x130
> [ 5882.661828]  [<ffffffffb7d1357d>] wait_for_completion_io+0xfd/0x140
> [ 5882.661833]  [<ffffffffb76cee80>] ? wake_up_state+0x20/0x20
> [ 5882.661837]  [<ffffffffb792308c>] blkdev_issue_discard+0x2ac/0x2d0
> [ 5882.661843]  [<ffffffffb792c141>] blk_ioctl_discard+0xd1/0x120


> I would think this is not normal. Do you think this is a RHEL 7.5 specific issue?
>
> # cat /etc/redhat-release
> Red Hat Enterprise Linux Server release 7.5 (Maipo)
> # uname -a
> Linux ae-fs02 3.10.0-862.el7.x86_64 #1 SMP Wed Mar 21 18:14:51 EDT 2018 x86_64 x86_64 x86_64 GNU/Linux

The suggestion was to:
> start with something easier first:
> create a small (like 10M) resource with DM and then try to create the
> XFS on it (without the additional zfs steps).

Notice the "small" and "easy" in there?
Once that works, go bigger, and more complex.

You have a ~ 11 TiB volume, which currently is being completely
discarded by mkfs.  This may take some time.  Yes, for larger devices,
this may take "more than 120 seconds".
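As the kernel's own hint in the trace above says, the 120-second threshold is only a warning and is tunable. A sketch, if you would rather raise the threshold than disable it outright:

```shell
# Show the current hung-task warning threshold (120 seconds by default).
sysctl kernel.hung_task_timeout_secs

# Warn only after 10 minutes instead; setting it to 0 disables the
# warning entirely. Note this only silences the message -- the discard
# still takes as long as it takes.
sysctl -w kernel.hung_task_timeout_secs=600
```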

As a side note, most of the time when you
think you want large DRBD volumes to then
"carve out smaller pieces on top of that",
you are mistaken. For reasons.

Anyways, as long as the "stats" still make progress,
that's just "situation normal, all fucked up".
And, as you said, "it eventually succeeded".
So there :-)

There was an upstream kernel patch for that in 2014, adding a
"cond_resched()" in the submit path of blkdev_issue_discard(),
https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/diff/?id=c8123f8c9cb5&context=6&ignorews=0&dt=0

But it's not only the submission that can take a long time,
it is also (and especially) the wait_for_completion_io().

We could make the warnings go away by accepting only some (arbitrarily
small) number of discard requests at a time, and then blocking in
submit_bio() until at least one of the pending ones completes.
But that would be only cosmetic, I think,
and could make things take even longer.


--
: Lars Ellenberg
: LINBIT | Keeping the Digital World Running
: DRBD -- Heartbeat -- Corosync -- Pacemaker

DRBD® and LINBIT® are registered trademarks of LINBIT
__
please don't Cc me, but send to list -- I'm subscribed
_______________________________________________
drbd-user mailing list
drbd-user at lists.linbit.com
http://lists.linbit.com/mailman/listinfo/drbd-user
