[DRBD-user] New 3-way drbd setup does not seem to take i/o

Lars Ellenberg lars.ellenberg at linbit.com
Thu May 3 16:13:07 CEST 2018


On Thu, May 03, 2018 at 11:45:14AM +0000, Remolina, Diego J wrote:
> A bit of progress, but still call traces being dumped in the logs. I
> waited for the full initial sync to finish, then I created the file
> system from a different node, ae-fs02, instead of ae-fs01. Initially,
> the command hung for a while, but it eventually succeeded. However, the
> following call traces were dumped:
> 
> 
> [ 5687.457691] drbd test: role( Secondary -> Primary )
> [ 5882.661739] INFO: task mkfs.xfs:80231 blocked for more than 120 seconds.
> [ 5882.661770] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
> [ 5882.661796] mkfs.xfs        D ffff9df559b1cf10     0 80231   8839 0x00000080
> [ 5882.661800] Call Trace:
> [ 5882.661807]  [<ffffffffb7d12f49>] schedule+0x29/0x70
> [ 5882.661809]  [<ffffffffb7d108b9>] schedule_timeout+0x239/0x2c0
> [ 5882.661819]  [<ffffffffc08a1f1b>] ? drbd_make_request+0x23b/0x360 [drbd]
> [ 5882.661824]  [<ffffffffb76f76e2>] ? ktime_get_ts64+0x52/0xf0
> [ 5882.661826]  [<ffffffffb7d1245d>] io_schedule_timeout+0xad/0x130
> [ 5882.661828]  [<ffffffffb7d1357d>] wait_for_completion_io+0xfd/0x140
> [ 5882.661833]  [<ffffffffb76cee80>] ? wake_up_state+0x20/0x20
> [ 5882.661837]  [<ffffffffb792308c>] blkdev_issue_discard+0x2ac/0x2d0
> [ 5882.661843]  [<ffffffffb792c141>] blk_ioctl_discard+0xd1/0x120


> I would think this is not normal. Do you think this is a RHEL 7.5 specific issue?
> 
> # cat /etc/redhat-release
> Red Hat Enterprise Linux Server release 7.5 (Maipo)
> # uname -a
> Linux ae-fs02 3.10.0-862.el7.x86_64 #1 SMP Wed Mar 21 18:14:51 EDT 2018 x86_64 x86_64 x86_64 GNU/Linux

The suggestion was to:
> start with something easier:
> create a small (like 10M) resource with DM and then try to create the
> XFS on it (without the additional ZFS steps).

Notice the "small" and "easy" in there?
Once that works, go bigger and more complex.

You have a ~11 TiB volume, which is currently being completely
discarded by mkfs.  This may take some time.  Yes, for larger devices,
it may well take "more than 120 seconds".

As a side note, most of the time when you
think you want large DRBD volumes to then
"carve out smaller pieces on top of that",
you are mistaken. For reasons.

Anyways, as long as the "stats" still make progress,
that's just "situation normal, all fucked up".
And, as you said, "it eventually succeeded".
So there :-)

There was an upstream kernel patch for that in 2014, adding a
"cond_resched()" in the submit path of blkdev_issue_discard(),
https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/diff/?id=c8123f8c9cb5&context=6&ignorews=0&dt=0

But it's not only the submission that can take a long time,
it is also (and especially) the wait_for_completion_io().

We could "make the warnings" go away by accepting only (arbitrary small
number) of discard requests at a time, and then blocking in
submit_bio(), until at least one of the pending ones completes.
But that'd be only cosmetic I think,
and potentially make things take even longer.


-- 
: Lars Ellenberg
: LINBIT | Keeping the Digital World Running
: DRBD -- Heartbeat -- Corosync -- Pacemaker

DRBD® and LINBIT® are registered trademarks of LINBIT
__
please don't Cc me, but send to list -- I'm subscribed

