[Drbd-dev] request_timer continuous loop if there is disk-timeout

Lars Ellenberg lars.ellenberg at linbit.com
Fri Mar 11 11:02:35 CET 2016


On Fri, Mar 11, 2016 at 04:30:27PM +0900, 박경민 wrote:
> Hello.
> I'm a software engineer in Mantech.
> 
> While testing the disk-timeout property, I found that setting it to a
> non-default value leads to a continuous loop.
> 
> in request_timer_fn()
> ...
> if (device->disk_state[NOW] > D_FAILED) {
> et = min_not_zero(et, dt);
> next_trigger_time = time_min_in_future(now,
> next_trigger_time, oldest_submit_jif + dt);
> restart_timer = true;
> }
> ...
> I think that, if there is no request, next_trigger_time should be
> calculated as below:
> next_trigger_time = time_min_in_future(now,
> next_trigger_time + *dt*, oldest_submit_jif + dt);
> 
> However, I can't be sure.

"dt" : disk timeout
"et" : effective timeout
"ent" : effective network timeout
"now" : well, now.
"next_trigger_time" : when to trigger the next timer

next_trigger_time is initialized to "now".

it gets adjusted using "time_min_in_future()",
which is this helper:
static unsigned long time_min_in_future(unsigned long now,
                unsigned long t1, unsigned long t2)
{
        t1 = time_after(now, t1) ? now : t1;
        t2 = time_after(now, t2) ? now : t2;
        return time_after(t1, t2) ? t2 : t1;
}

time_after(a, b) is ((long)((b) - (a)) < 0), NOT <=.

next_trigger_time will become larger than now,
or stay at its initial value, which is now.

function ends with

        if (restart_timer) {
                next_trigger_time = time_min_in_future(now, next_trigger_time, now + et);
                mod_timer(&device->request_timer, next_trigger_time);
        }

so in case next_trigger_time will still be equal to now at the end of the
function, it will be set to "now + et" before it is passed to mod_timer.

et can only be zero if both network and disk timeout were zero,
in which case the whole thing would not even be used,
because that would mean timeouts are disabled.


Besides that,
I would be surprised if disk timeout in 9 worked properly yet.

Also, disk timeout is evil in any case, and NOT TO BE USED
(not even in 8.4, where it *does* work properly ("as designed"), afaik)

Why? Because if it triggers, and the IO subsystem ("disk") decides to
still process the submitted request some time later,
you'd get stuff RDMA'd to some random memory page which may well be meanwhile
re-used for unrelated things. In which case we intentionally panic().

But you knew that already.

-- 
: Lars Ellenberg
: LINBIT | Keeping the Digital World Running
: DRBD -- Heartbeat -- Corosync -- Pacemaker
: R&D, Integration, Ops, Consulting, Support

DRBD® and LINBIT® are registered trademarks of LINBIT

