LKML Archive mirror
* rt scheduler may calculate wrong rt_time
@ 2011-04-21 12:55 Thomas Giesel
  2011-04-22  8:21 ` Mike Galbraith
  0 siblings, 1 reply; 6+ messages in thread
From: Thomas Giesel @ 2011-04-21 12:55 UTC (permalink / raw)
  To: linux-kernel

Friends of the scheduler,

I found that the current (well, at least 2.6.38) scheduler calculates an
incorrect rt_time for realtime tasks in certain situations.

Example scenario:
- HZ = 1000, rt_runtime = 95 ms, rt_period = 100 ms (similar with other
  setups, but that's what I did)
- a high priority rt task (A) gets packets from Ethernet about every 10
  ms
- a low priority rt task (B) unfortunately runs for a longer time
  (here: endlessly :)
- no other tasks running (i.e. about 5 ms idle left per period)
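
For reference, a setup like this corresponds to the global rt bandwidth
sysctls below (values in microseconds). The snippet is just an
illustration of the values involved, not a copy of what I actually ran:

/* Illustrative only: set rt_period = 100 ms, rt_runtime = 95 ms via the
 * global sysctls (both take microseconds). Needs root; error handling
 * kept minimal on purpose. */
#include <stdio.h>

static int write_val(const char *path, const char *val)
{
	FILE *f = fopen(path, "w");

	if (!f)
		return -1;
	fprintf(f, "%s\n", val);
	return fclose(f);
}

int main(void)
{
	/* Write runtime first so runtime <= period holds at every step;
	 * the kernel may reject an intermediate state otherwise. */
	write_val("/proc/sys/kernel/sched_rt_runtime_us", "95000");
	write_val("/proc/sys/kernel/sched_rt_period_us", "100000");
	return 0;
}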

When the runtime of the realtime tasks is exceeded (e.g. by (B)), they
are throttled. During this time idle is scheduled. When in idle,
tick_nohz_stop_sched_tick() will stop the scheduler tick, which causes
update_rq_clock() _not_ to be called for a while. When a realtime task
is woken up during this time (e.g. (A) by network traffic),
update_rq_clock() is called from enqueue_task(). The task is not picked
yet, because it is still throttled. After a while
sched_rt_period_timer() unthrottles the realtime tasks and cpu_idle
will call schedule().
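
To make this concrete, here is roughly what happens in one period with
the numbers from above (times are illustrative, not measured):

  t =   0 ms : period starts; (B) runs, interrupted every ~10 ms by (A)
  t =  95 ms : 95 ms of rt_runtime used up -> rt_rq throttled, idle runs,
               tick_nohz_stop_sched_tick() stops the tick
  t =  97 ms : packet arrives, (A) is enqueued; update_rq_clock() runs
               from enqueue_task(), so rq->clock_task = 97 ms; (A) stays
               throttled
  t = 100 ms : sched_rt_period_timer() unthrottles; schedule() picks (A)
               and sets exec_start = rq->clock_task = 97 ms (stale)
  t = 101 ms : (A) blocks again; delta_exec = 101 ms - 97 ms = 4 ms is
               charged to rt_time although (A) only ran for about 1 ms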

schedule() picks (A), which was woken up a while ago.
_pick_next_task_rt() sets exec_start to rq->clock_task. But that clock
was last updated when the task was woken up, which could have been up
to 5 ms earlier in my example. So exec_start contains a time _before_
the task actually started running. As a result, rt_time is calculated
too large, which causes the rt tasks to be throttled even earlier in
the next period. This error may even grow from interval to interval,
because the throttle window (initially 5 ms) also grows.
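
To put hypothetical numbers on that: a task woken right at the start of
the 5 ms window gets overcharged by almost 5 ms when it finally runs at
the start of the next period. That period's 95 ms budget is then used up
after only ~90 ms of real rt execution, the throttle window grows to
~10 ms, the possible overcharge grows with it, and so on.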

IMHO the best place to update clock_task would be to call a function
from tick_nohz_restart_sched_tick(), but currently I don't see a
suitable interface to the scheduler for that. As a workaround I call
update_rq_clock(rq) just before put_prev_task() in schedule(). This
solves the issue and keeps the rt_runtime budget enforced quite
accurately. (The same result can be had by removing the "if (...)" in
put_prev_task().)
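
For illustration, the workaround looks roughly like this in schedule()
(sketch only, against 2.6.38; surrounding context written from memory,
so it may differ slightly):

--- a/kernel/sched.c
+++ b/kernel/sched.c
@@ ... @@ asmlinkage void __sched schedule(void)
 	if (unlikely(!rq->nr_running))
 		idle_balance(cpu, rq);
 
+	/* refresh rq->clock_task so exec_start is not stale (workaround) */
+	update_rq_clock(rq);
 	put_prev_task(rq, prev);
 	next = pick_next_task(rq);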

What do you think is the best way to solve this issue?

Thomas



* Re: rt scheduler may calculate wrong rt_time
  2011-04-21 12:55 rt scheduler may calculate wrong rt_time Thomas Giesel
@ 2011-04-22  8:21 ` Mike Galbraith
  2011-04-22 20:52   ` Thomas Giesel
  2011-04-27 17:51   ` Thomas Giesel
  0 siblings, 2 replies; 6+ messages in thread
From: Mike Galbraith @ 2011-04-22  8:21 UTC (permalink / raw)
  To: Thomas Giesel; +Cc: linux-kernel, Peter Zijlstra

On Thu, 2011-04-21 at 14:55 +0200, Thomas Giesel wrote:
> Friends of the scheduler,
> 
> I found that the current (well, at least 2.6.38) scheduler calculates an
> incorrect rt_time for realtime tasks in certain situations.
> 
> Example scenario:
> - HZ = 1000, rt_runtime = 95 ms, rt_period = 100 ms (similar with other
>   setups, but that's what I did)
> - a high priority rt task (A) gets packets from Ethernet about every 10
>   ms
> - a low priority rt task (B) unfortunately runs for a longer time
>   (here: endlessly :)
> - no other tasks running (i.e. about 5 ms idle left per period)
> 
> When the runtime of the realtime tasks is exceeded (e.g. by (B)), they
> are throttled. During this time idle is scheduled. When in idle,
> tick_nohz_stop_sched_tick() will stop the scheduler tick, which causes
> update_rq_clock() _not_ to be called for a while. When a realtime task
> is woken up during this time (e.g. (A) by network traffic),
> update_rq_clock() is called from enqueue_task(). The task is not picked
> yet, because it is still throttled. After a while
> sched_rt_period_timer() unthrottles the realtime tasks and cpu_idle
> will call schedule().
> 
> schedule() picks (A), which was woken up a while ago.
> _pick_next_task_rt() sets exec_start to rq->clock_task. But that clock
> was last updated when the task was woken up, which could have been up
> to 5 ms earlier in my example. So exec_start contains a time _before_
> the task actually started running. As a result, rt_time is calculated
> too large, which causes the rt tasks to be throttled even earlier in
> the next period. This error may even grow from interval to interval,
> because the throttle window (initially 5 ms) also grows.
> 
> IMHO the best place to update clock_task would be to call a function
> from tick_nohz_restart_sched_tick(), but currently I don't see a
> suitable interface to the scheduler for that. As a workaround I call
> update_rq_clock(rq) just before put_prev_task() in schedule(). This
> solves the issue and keeps the rt_runtime budget enforced quite
> accurately. (The same result can be had by removing the "if (...)" in
> put_prev_task().)

Hm.  Does forcing a clock update if we're idle when we release the
throttle do the trick?
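
(The idea below: make skip_clock_update signed, keep > 0 meaning "skip
the next update", and use -1 to tag a forced update that put_prev_task()
honors when we're coming out of idle.)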

diff --git a/kernel/sched.c b/kernel/sched.c
index 8cb0a57..97245ef 100644
--- a/kernel/sched.c
+++ b/kernel/sched.c
@@ -464,7 +464,7 @@ struct rq {
 	u64 nohz_stamp;
 	unsigned char nohz_balance_kick;
 #endif
-	unsigned int skip_clock_update;
+	int skip_clock_update;
 
 	/* capture load from *all* tasks on this cpu: */
 	struct load_weight load;
@@ -650,7 +650,7 @@ static void update_rq_clock(struct rq *rq)
 {
 	s64 delta;
 
-	if (rq->skip_clock_update)
+	if (rq->skip_clock_update > 0)
 		return;
 
 	delta = sched_clock_cpu(cpu_of(rq)) - rq->clock;
@@ -4131,7 +4131,7 @@ static inline void schedule_debug(struct task_struct *prev)
 
 static void put_prev_task(struct rq *rq, struct task_struct *prev)
 {
-	if (prev->on_rq)
+	if (prev->on_rq || rq->skip_clock_update < 0)
 		update_rq_clock(rq);
 	prev->sched_class->put_prev_task(rq, prev);
 }
diff --git a/kernel/sched_rt.c b/kernel/sched_rt.c
index 19ecb31..3276b94 100644
--- a/kernel/sched_rt.c
+++ b/kernel/sched_rt.c
@@ -572,8 +572,15 @@ static int do_sched_rt_period_timer(struct rt_bandwidth *rt_b, int overrun)
 				enqueue = 1;
 		}
 
-		if (enqueue)
+		if (enqueue) {
+			/*
+			 * Tag a forced clock update if we're coming out of idle
+			 * so rq->clock_task will be updated when we schedule().
+			 */
+			if (rq->curr == rq->idle)
+				rq->skip_clock_update = -1;
 			sched_rt_rq_enqueue(rt_rq);
+		}
 		raw_spin_unlock(&rq->lock);
 	}
 





* Re: rt scheduler may calculate wrong rt_time
  2011-04-22  8:21 ` Mike Galbraith
@ 2011-04-22 20:52   ` Thomas Giesel
  2011-04-27 17:51   ` Thomas Giesel
  1 sibling, 0 replies; 6+ messages in thread
From: Thomas Giesel @ 2011-04-22 20:52 UTC (permalink / raw)
  To: Mike Galbraith; +Cc: linux-kernel, Peter Zijlstra

Mike,

> Hm.  Does forcing a clock update if we're idle when we release the
> throttle do the trick?

Thanks. I'm on holiday at the moment; I'll check it next week when I'm
back. But I think it should work this way.

Thomas



* Re: rt scheduler may calculate wrong rt_time
  2011-04-22  8:21 ` Mike Galbraith
  2011-04-22 20:52   ` Thomas Giesel
@ 2011-04-27 17:51   ` Thomas Giesel
  2011-04-29  6:36     ` [patch] " Mike Galbraith
  1 sibling, 1 reply; 6+ messages in thread
From: Thomas Giesel @ 2011-04-27 17:51 UTC (permalink / raw)
  To: Mike Galbraith; +Cc: linux-kernel, Peter Zijlstra


> Hm.  Does forcing a clock update if we're idle when we release the
> throttle do the trick?

It does. I tested it today and it works as expected. Even with ftrace I
couldn't see any suspicious behaviour anymore.

Mike: Can you send the patch to the right people to get it into the
kernel or should I do it? Or is Peter the right one already?

Thanks for your help.

Thomas




* [patch] Re: rt scheduler may calculate wrong rt_time
  2011-04-27 17:51   ` Thomas Giesel
@ 2011-04-29  6:36     ` Mike Galbraith
  2011-05-16 10:37       ` [tip:sched/core] sched, rt: Update rq clock when unthrottling of an otherwise idle CPU tip-bot for Mike Galbraith
  0 siblings, 1 reply; 6+ messages in thread
From: Mike Galbraith @ 2011-04-29  6:36 UTC (permalink / raw)
  To: Thomas Giesel; +Cc: linux-kernel, Peter Zijlstra

On Wed, 2011-04-27 at 19:51 +0200, Thomas Giesel wrote:
> > Hm.  Does forcing a clock update if we're idle when we release the
> > throttle do the trick?
> 
> It does. I tested it today and it works as expected. Even with ftrace I
> couldn't see any suspicious behaviour anymore.
> 
> Mike: Can you send the patch to the right people to get it into the
> kernel or should I do it? Or is Peter the right one already?

Peter is the right one.  Below is an ever so slightly different version.

sched, rt: update rq clock when unthrottling of an otherwise idle CPU

If an RT task is awakened while its rt_rq is throttled, the time between
wakeup/enqueue and unthrottle/selection may be accounted as rt_time
if the CPU is idle.  Set rq->skip_clock_update negative upon throttle
release to tell put_prev_task() that we need a clock update.

Signed-off-by: Mike Galbraith <efault@gmx.de>
Reported-by: Thomas Giesel <skoe@directbox.com>

---
 kernel/sched.c    |    6 +++---
 kernel/sched_rt.c |    7 +++++++
 2 files changed, 10 insertions(+), 3 deletions(-)

Index: linux-2.6/kernel/sched.c
===================================================================
--- linux-2.6.orig/kernel/sched.c
+++ linux-2.6/kernel/sched.c
@@ -464,7 +464,7 @@ struct rq {
 	u64 nohz_stamp;
 	unsigned char nohz_balance_kick;
 #endif
-	unsigned int skip_clock_update;
+	int skip_clock_update;
 
 	/* capture load from *all* tasks on this cpu: */
 	struct load_weight load;
@@ -650,7 +650,7 @@ static void update_rq_clock(struct rq *r
 {
 	s64 delta;
 
-	if (rq->skip_clock_update)
+	if (rq->skip_clock_update > 0)
 		return;
 
 	delta = sched_clock_cpu(cpu_of(rq)) - rq->clock;
@@ -4125,7 +4125,7 @@ static inline void schedule_debug(struct
 
 static void put_prev_task(struct rq *rq, struct task_struct *prev)
 {
-	if (prev->on_rq)
+	if (prev->on_rq || rq->skip_clock_update < 0)
 		update_rq_clock(rq);
 	prev->sched_class->put_prev_task(rq, prev);
 }
Index: linux-2.6/kernel/sched_rt.c
===================================================================
--- linux-2.6.orig/kernel/sched_rt.c
+++ linux-2.6/kernel/sched_rt.c
@@ -562,6 +562,13 @@ static int do_sched_rt_period_timer(stru
 			if (rt_rq->rt_throttled && rt_rq->rt_time < runtime) {
 				rt_rq->rt_throttled = 0;
 				enqueue = 1;
+
+				/*
+				 * Force a clock update if the CPU was idle,
+				 * lest wakeup -> unthrottle time accumulate.
+				 */
+				if (rt_rq->rt_nr_running && rq->curr == rq->idle)
+					rq->skip_clock_update = -1;
 			}
 			if (rt_rq->rt_time || rt_rq->rt_nr_running)
 				idle = 0;




* [tip:sched/core] sched, rt: Update rq clock when unthrottling of an otherwise idle CPU
  2011-04-29  6:36     ` [patch] " Mike Galbraith
@ 2011-05-16 10:37       ` tip-bot for Mike Galbraith
  0 siblings, 0 replies; 6+ messages in thread
From: tip-bot for Mike Galbraith @ 2011-05-16 10:37 UTC (permalink / raw)
  To: linux-tip-commits
  Cc: linux-kernel, skoe, hpa, mingo, a.p.zijlstra, efault, tglx, mingo

Commit-ID:  61eadef6a9bde9ea62fda724a9cb501ce9bc925a
Gitweb:     http://git.kernel.org/tip/61eadef6a9bde9ea62fda724a9cb501ce9bc925a
Author:     Mike Galbraith <efault@gmx.de>
AuthorDate: Fri, 29 Apr 2011 08:36:50 +0200
Committer:  Ingo Molnar <mingo@elte.hu>
CommitDate: Mon, 16 May 2011 11:01:17 +0200

sched, rt: Update rq clock when unthrottling of an otherwise idle CPU

If an RT task is awakened while its rt_rq is throttled, the time between
wakeup/enqueue and unthrottle/selection may be accounted as rt_time
if the CPU is idle.  Set rq->skip_clock_update negative upon throttle
release to tell put_prev_task() that we need a clock update.

Reported-by: Thomas Giesel <skoe@directbox.com>
Signed-off-by: Mike Galbraith <efault@gmx.de>
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Link: http://lkml.kernel.org/r/1304059010.7472.1.camel@marge.simson.net
Signed-off-by: Ingo Molnar <mingo@elte.hu>
---
 kernel/sched.c    |    6 +++---
 kernel/sched_rt.c |    7 +++++++
 2 files changed, 10 insertions(+), 3 deletions(-)

diff --git a/kernel/sched.c b/kernel/sched.c
index f9778c0..b8b9a7d 100644
--- a/kernel/sched.c
+++ b/kernel/sched.c
@@ -466,7 +466,7 @@ struct rq {
 	u64 nohz_stamp;
 	unsigned char nohz_balance_kick;
 #endif
-	unsigned int skip_clock_update;
+	int skip_clock_update;
 
 	/* capture load from *all* tasks on this cpu: */
 	struct load_weight load;
@@ -652,7 +652,7 @@ static void update_rq_clock(struct rq *rq)
 {
 	s64 delta;
 
-	if (rq->skip_clock_update)
+	if (rq->skip_clock_update > 0)
 		return;
 
 	delta = sched_clock_cpu(cpu_of(rq)) - rq->clock;
@@ -4127,7 +4127,7 @@ static inline void schedule_debug(struct task_struct *prev)
 
 static void put_prev_task(struct rq *rq, struct task_struct *prev)
 {
-	if (prev->on_rq)
+	if (prev->on_rq || rq->skip_clock_update < 0)
 		update_rq_clock(rq);
 	prev->sched_class->put_prev_task(rq, prev);
 }
diff --git a/kernel/sched_rt.c b/kernel/sched_rt.c
index 19ecb31..0943ed7 100644
--- a/kernel/sched_rt.c
+++ b/kernel/sched_rt.c
@@ -562,6 +562,13 @@ static int do_sched_rt_period_timer(struct rt_bandwidth *rt_b, int overrun)
 			if (rt_rq->rt_throttled && rt_rq->rt_time < runtime) {
 				rt_rq->rt_throttled = 0;
 				enqueue = 1;
+
+				/*
+				 * Force a clock update if the CPU was idle,
+				 * lest wakeup -> unthrottle time accumulate.
+				 */
+				if (rt_rq->rt_nr_running && rq->curr == rq->idle)
+					rq->skip_clock_update = -1;
 			}
 			if (rt_rq->rt_time || rt_rq->rt_nr_running)
 				idle = 0;


end of thread, other threads: [~2011-05-16 10:37 UTC | newest]

Thread overview: 6+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2011-04-21 12:55 rt scheduler may calculate wrong rt_time Thomas Giesel
2011-04-22  8:21 ` Mike Galbraith
2011-04-22 20:52   ` Thomas Giesel
2011-04-27 17:51   ` Thomas Giesel
2011-04-29  6:36     ` [patch] " Mike Galbraith
2011-05-16 10:37       ` [tip:sched/core] sched, rt: Update rq clock when unthrottling of an otherwise idle CPU tip-bot for Mike Galbraith
