[Drbd-dev] request_timer continuous loop if there is disk-timeout
Lars Ellenberg
lars.ellenberg at linbit.com
Fri Mar 11 11:02:35 CET 2016
On Fri, Mar 11, 2016 at 04:30:27PM +0900, 박경민 wrote:
> Hello.
> I'm a software engineer in Mantech.
>
> While testing the disk-timeout property, I found that setting it
> to a non-default value leads to a continuous loop.
>
> in request_timer_fn()
> ...
> if (device->disk_state[NOW] > D_FAILED) {
> 	et = min_not_zero(et, dt);
> 	next_trigger_time = time_min_in_future(now,
> 			next_trigger_time, oldest_submit_jif + dt);
> 	restart_timer = true;
> }
> ...
> I think that, if there is no request, next_trigger_time should be
> calculated as follows:
> 	next_trigger_time = time_min_in_future(now,
> 			next_trigger_time + *dt*, oldest_submit_jif + dt);
>
> However, I can't be sure.
"dt" : disk timeout
"et" : effective timeout
"ent" : effective network timeout
"now" : well, now.
"next_trigger_time" : when to trigger the next timer
next_trigger_time is initialized to "now".
it gets adjusted using "time_min_in_future()",
which is this helper:
static unsigned long time_min_in_future(unsigned long now,
		unsigned long t1, unsigned long t2)
{
	t1 = time_after(now, t1) ? now : t1;
	t2 = time_after(now, t2) ? now : t2;
	return time_after(t1, t2) ? t2 : t1;
}
time_after(a, b) is ((long)((b) - (a)) < 0), NOT <=.
next_trigger_time will become larger than now,
or stay at its initial value, which is now.
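To try the helper by hand, here is a minimal user-space sketch of it.
The model_ names are hypothetical and exist only for this illustration;
model_time_after() leaves out the typecheck() part of the kernel macro and
assumes the usual two's-complement conversion from unsigned long to long.

```c
#include <stdbool.h>

/* Model of time_after(a, b): true only if a is strictly later than b.
 * The signed difference keeps this correct across a counter wrap. */
static bool model_time_after(unsigned long a, unsigned long b)
{
	return (long)(b - a) < 0;
}

/* Same logic as the helper quoted above, built on the model comparison. */
static unsigned long model_time_min_in_future(unsigned long now,
		unsigned long t1, unsigned long t2)
{
	t1 = model_time_after(now, t1) ? now : t1;	/* t1 in the past: use now */
	t2 = model_time_after(now, t2) ? now : t2;	/* t2 in the past: use now */
	return model_time_after(t1, t2) ? t2 : t1;	/* earlier of the two */
}
```

In this model, (now=100, t1=90, t2=150) gives 100, (100, 120, 150) gives 120,
and (100, 100, 150) gives 100: both arguments are clamped to "not before now",
and the earlier of the two clamped values is returned.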
function ends with
if (restart_timer) {
	next_trigger_time = time_min_in_future(now, next_trigger_time, now + et);
	mod_timer(&device->request_timer, next_trigger_time);
}
so in case next_trigger_time will still be equal to now at the end of the
function, it will be set to "now + et" before it is passed to mod_timer.
et can only be zero if both network and disk timeout were zero,
in which case the whole thing would not even be used,
because that would mean timeouts are disabled.
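The "only zero if both are zero" property follows from how min_not_zero()
combines two values. A small user-space sketch of that behaviour, assuming et
is built with min_not_zero() as in the quoted snippet (model_min_not_zero is
a hypothetical name for illustration, not the kernel macro itself):

```c
/* Model of min_not_zero(): the smaller of the two values, ignoring
 * zeros; the result is zero only when both inputs are zero. */
static unsigned long model_min_not_zero(unsigned long x, unsigned long y)
{
	if (x == 0)
		return y;
	if (y == 0)
		return x;
	return x < y ? x : y;
}
```

So a zero ("disabled") network timeout combined with a non-zero disk timeout
still yields a non-zero et, and the other way around.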
Besides that,
I would be surprised if disk timeout in 9 worked properly yet.
Also, disk timeout is evil in any case, and NOT TO BE USED
(not even in 8.4, where it *does* work properly ("as designed"), afaik)
Why? Because if it triggers, and the IO subsystem ("disk") decides to
still process the submitted request some time later,
you'd get stuff RDMA'd to some random memory page which may well be meanwhile
re-used for unrelated things. In which case we intentionally panic().
But you knew that already.
--
: Lars Ellenberg
: LINBIT | Keeping the Digital World Running
: DRBD -- Heartbeat -- Corosync -- Pacemaker
: R&D, Integration, Ops, Consulting, Support
DRBD® and LINBIT® are registered trademarks of LINBIT