* [PATCH] Make sure timers have migrated before killing migration_thread
@ 2010-05-19 9:05 Amit K. Arora
2010-05-19 9:31 ` Peter Zijlstra
0 siblings, 1 reply; 19+ messages in thread
From: Amit K. Arora @ 2010-05-19 9:05 UTC
To: Peter Zijlstra, Ingo Molnar
Cc: Srivatsa Vaddagiri, Gautham R Shenoy, Darren Hart, Brian King,
linux-kernel
Problem : In a stress test where some heavy tests were running along with
regular CPU offlining and onlining, a hang was observed. The system seems to
be hung at a point where migration_call() tries to kill the migration_thread
of the dying CPU, which just got moved to the current CPU. This migration
thread does not get a chance to run (and die) since rt_throttled is set to 1
on current, and it doesn't get cleared as the hrtimer which is supposed to
reset the rt bandwidth (sched_rt_period_timer) is tied to the CPU being
offlined.
Solution : This patch pushes the killing of migration thread to "CPU_POST_DEAD"
event. By then all the timers (including sched_rt_period_timer) should have got
migrated (along with other callbacks).
Alternate Solution considered : Another option considered was to
increase the priority of the hrtimer cpu offline notifier, such that it
gets to run before scheduler's migration cpu offline notifier. In this
way we are sure that the timers will get migrated before migration_call
tries to kill migration_thread. But, this can have some non-obvious
implications, suggested Srivatsa.
Testing : Without the patch the stress tests didn't last for even 12
hours. And yes, the problem was reproducible. With the patch applied the
tests ran successfully for more than 48 hours.
Thanks!
--
Regards,
Amit Arora
Signed-off-by: Amit Arora <aarora@in.ibm.com>
Signed-off-by: Gautham R Shenoy <ego@in.ibm.com>
--
diff -Nuarp linux-2.6.34.org/kernel/sched.c linux-2.6.34/kernel/sched.c
--- linux-2.6.34.org/kernel/sched.c 2010-05-18 22:56:21.000000000 -0700
+++ linux-2.6.34/kernel/sched.c 2010-05-18 22:58:31.000000000 -0700
@@ -5942,14 +5942,26 @@ migration_call(struct notifier_block *nf
cpu_rq(cpu)->migration_thread = NULL;
break;
+ case CPU_POST_DEAD:
+ /*
+ Bring the migration thread down in CPU_POST_DEAD event,
+ since the timers should have got migrated by now and thus
+ we should not see a deadlock between trying to kill the
+ migration thread and the sched_rt_period_timer.
+ */
+ cpuset_lock();
+ rq = cpu_rq(cpu);
+ kthread_stop(rq->migration_thread);
+ put_task_struct(rq->migration_thread);
+ rq->migration_thread = NULL;
+ cpuset_unlock();
+ break;
+
case CPU_DEAD:
case CPU_DEAD_FROZEN:
cpuset_lock(); /* around calls to cpuset_cpus_allowed_lock() */
migrate_live_tasks(cpu);
rq = cpu_rq(cpu);
- kthread_stop(rq->migration_thread);
- put_task_struct(rq->migration_thread);
- rq->migration_thread = NULL;
/* Idle task back to normal (off runqueue, low prio) */
raw_spin_lock_irq(&rq->lock);
update_rq_clock(rq);
^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: [PATCH] Make sure timers have migrated before killing migration_thread
2010-05-19 9:05 [PATCH] Make sure timers have migrated before killing migration_thread Amit K. Arora
@ 2010-05-19 9:31 ` Peter Zijlstra
2010-05-19 12:13 ` [PATCH v2] " Amit K. Arora
2010-05-24 9:59 ` [PATCH] " Amit K. Arora
0 siblings, 2 replies; 19+ messages in thread
From: Peter Zijlstra @ 2010-05-19 9:31 UTC
To: Amit K. Arora
Cc: Ingo Molnar, Srivatsa Vaddagiri, Gautham R Shenoy, Darren Hart,
Brian King, linux-kernel
On Wed, 2010-05-19 at 14:35 +0530, Amit K. Arora wrote:
> Problem : In a stress test where some heavy tests were running along with
> regular CPU offlining and onlining, a hang was observed. The system seems to
> be hung at a point where migration_call() tries to kill the migration_thread
> of the dying CPU, which just got moved to the current CPU. This migration
> thread does not get a chance to run (and die) since rt_throttled is set to 1
> on current, and it doesn't get cleared as the hrtimer which is supposed to
> reset the rt bandwidth (sched_rt_period_timer) is tied to the CPU being
> offlined.
>
> Solution : This patch pushes the killing of migration thread to "CPU_POST_DEAD"
> event. By then all the timers (including sched_rt_period_timer) should have got
> migrated (along with other callbacks).
>
> Alternate Solution considered : Another option considered was to
> increase the priority of the hrtimer cpu offline notifier, such that it
> gets to run before scheduler's migration cpu offline notifier. In this
> way we are sure that the timers will get migrated before migration_call
> tries to kill migration_thread. But, this can have some non-obvious
> implications, suggested Srivatsa.
>
> Testing : Without the patch the stress tests didn't last for even 12
> hours. And yes, the problem was reproducible. With the patch applied the
> tests ran successfully for more than 48 hours.
>
> Thanks!
> --
> Regards,
> Amit Arora
>
> Signed-off-by: Amit Arora <aarora@in.ibm.com>
> Signed-off-by: Gautham R Shenoy <ego@in.ibm.com>
> --
> diff -Nuarp linux-2.6.34.org/kernel/sched.c linux-2.6.34/kernel/sched.c
> --- linux-2.6.34.org/kernel/sched.c 2010-05-18 22:56:21.000000000 -0700
> +++ linux-2.6.34/kernel/sched.c 2010-05-18 22:58:31.000000000 -0700
> @@ -5942,14 +5942,26 @@ migration_call(struct notifier_block *nf
> cpu_rq(cpu)->migration_thread = NULL;
> break;
>
> + case CPU_POST_DEAD:
> + /*
> + Bring the migration thread down in CPU_POST_DEAD event,
> + since the timers should have got migrated by now and thus
> + we should not see a deadlock between trying to kill the
> + migration thread and the sched_rt_period_timer.
> + */
Faulty comment style that, please use:
/*
* text
* goes
* here
*/
> + cpuset_lock();
> + rq = cpu_rq(cpu);
> + kthread_stop(rq->migration_thread);
> + put_task_struct(rq->migration_thread);
> + rq->migration_thread = NULL;
> + cpuset_unlock();
> + break;
> +
The other problem is more urgent though, CPU_POST_DEAD runs outside of
the hotplug lock and thus the above becomes a race where we could
possibly kill off the migration thread of a newly brought up cpu:
cpu0 - down 2
cpu1 - up 2 (allocs a new migration thread, and leaks the old one)
cpu0 - post_down 2 - frees the migration thread -- oops!
* [PATCH v2] Make sure timers have migrated before killing migration_thread
2010-05-19 9:31 ` Peter Zijlstra
@ 2010-05-19 12:13 ` Amit K. Arora
2010-05-20 7:28 ` Peter Zijlstra
2010-05-24 9:59 ` [PATCH] " Amit K. Arora
1 sibling, 1 reply; 19+ messages in thread
From: Amit K. Arora @ 2010-05-19 12:13 UTC
To: Peter Zijlstra
Cc: Ingo Molnar, Srivatsa Vaddagiri, Gautham R Shenoy, Darren Hart,
Brian King, linux-kernel
On Wed, May 19, 2010 at 11:31:55AM +0200, Peter Zijlstra wrote:
> On Wed, 2010-05-19 at 14:35 +0530, Amit K. Arora wrote:
Hi Peter,
Thanks for the review!
> > diff -Nuarp linux-2.6.34.org/kernel/sched.c linux-2.6.34/kernel/sched.c
> > --- linux-2.6.34.org/kernel/sched.c 2010-05-18 22:56:21.000000000 -0700
> > +++ linux-2.6.34/kernel/sched.c 2010-05-18 22:58:31.000000000 -0700
> > @@ -5942,14 +5942,26 @@ migration_call(struct notifier_block *nf
> > cpu_rq(cpu)->migration_thread = NULL;
> > break;
> >
> > + case CPU_POST_DEAD:
> > + /*
> > + Bring the migration thread down in CPU_POST_DEAD event,
> > + since the timers should have got migrated by now and thus
> > + we should not see a deadlock between trying to kill the
> > + migration thread and the sched_rt_period_timer.
> > + */
>
> Faulty comment style that, please use:
>
> /*
> * text
> * goes
> * here
> */
Sure.
> > + cpuset_lock();
> > + rq = cpu_rq(cpu);
> > + kthread_stop(rq->migration_thread);
> > + put_task_struct(rq->migration_thread);
> > + rq->migration_thread = NULL;
> > + cpuset_unlock();
> > + break;
> > +
>
> The other problem is more urgent though, CPU_POST_DEAD runs outside of
> the hotplug lock and thus the above becomes a race where we could
> possible kill off the migration thread of a newly brought up cpu:
>
> cpu0 - down 2
> cpu1 - up 2 (allocs a new migration thread, and leaks the old one)
> cpu0 - post_down 2 - frees the migration thread -- oops!
Ok. So, how about adding a check in CPU_UP_PREPARE event handling too ?
The cpuset_lock will synchronize, and thus avoid race between killing of
migration_thread in up_prepare and post_dead events.
Here is the updated patch. If you don't like this one too, do you mind
suggesting an alternate approach to tackle the problem ? Thanks !
--
Regards,
Amit Arora
Signed-off-by: Amit Arora <aarora@in.ibm.com>
Signed-off-by: Gautham R Shenoy <ego@in.ibm.com>
--
diff -Nuarp linux-2.6.34.org/kernel/sched.c linux-2.6.34/kernel/sched.c
--- linux-2.6.34.org/kernel/sched.c 2010-05-18 22:56:21.000000000 -0700
+++ linux-2.6.34/kernel/sched.c 2010-05-19 04:47:49.000000000 -0700
@@ -5900,6 +5900,19 @@ migration_call(struct notifier_block *nf
case CPU_UP_PREPARE:
case CPU_UP_PREPARE_FROZEN:
+ cpuset_lock();
+ rq = cpu_rq(cpu);
+ /*
+ * Since we now kill migration_thread in CPU_POST_DEAD event,
+ * there may be a race here. So, lets cleanup the old
+ * migration_thread on the rq, if any.
+ */
+ if (unlikely(rq->migration_thread)) {
+ kthread_stop(rq->migration_thread);
+ put_task_struct(rq->migration_thread);
+ rq->migration_thread = NULL;
+ }
+ cpuset_unlock();
p = kthread_create(migration_thread, hcpu, "migration/%d", cpu);
if (IS_ERR(p))
return NOTIFY_BAD;
@@ -5942,14 +5955,34 @@ migration_call(struct notifier_block *nf
cpu_rq(cpu)->migration_thread = NULL;
break;
+ case CPU_POST_DEAD:
+ /*
+ * Bring the migration thread down in CPU_POST_DEAD event,
+ * since the timers should have got migrated by now and thus
+ * we should not see a deadlock between trying to kill the
+ * migration thread and the sched_rt_period_timer.
+ */
+ cpuset_lock();
+ rq = cpu_rq(cpu);
+ if (likely(rq->migration_thread)) {
+ /*
+ * It's possible that this CPU was onlined (from a
+ * different CPU) before we reached here and
+ * migration_thread was cleaned-up in the
+ * CPU_UP_PREPARE event handling.
+ */
+ kthread_stop(rq->migration_thread);
+ put_task_struct(rq->migration_thread);
+ rq->migration_thread = NULL;
+ }
+ cpuset_unlock();
+ break;
+
case CPU_DEAD:
case CPU_DEAD_FROZEN:
cpuset_lock(); /* around calls to cpuset_cpus_allowed_lock() */
migrate_live_tasks(cpu);
rq = cpu_rq(cpu);
- kthread_stop(rq->migration_thread);
- put_task_struct(rq->migration_thread);
- rq->migration_thread = NULL;
/* Idle task back to normal (off runqueue, low prio) */
raw_spin_lock_irq(&rq->lock);
update_rq_clock(rq);
* Re: [PATCH v2] Make sure timers have migrated before killing migration_thread
2010-05-19 12:13 ` [PATCH v2] " Amit K. Arora
@ 2010-05-20 7:28 ` Peter Zijlstra
2010-05-23 9:07 ` Mike Galbraith
` (2 more replies)
0 siblings, 3 replies; 19+ messages in thread
From: Peter Zijlstra @ 2010-05-20 7:28 UTC
To: Amit K. Arora
Cc: Ingo Molnar, Srivatsa Vaddagiri, Gautham R Shenoy, Darren Hart,
Brian King, linux-kernel, Thomas Gleixner
On Wed, 2010-05-19 at 17:43 +0530, Amit K. Arora wrote:
> Alternate Solution considered : Another option considered was to
> increase the priority of the hrtimer cpu offline notifier, such that it
> gets to run before scheduler's migration cpu offline notifier. In this
> way we are sure that the timers will get migrated before migration_call
> tries to kill migration_thread. But, this can have some non-obvious
> implications, suggested Srivatsa.
> On Wed, May 19, 2010 at 11:31:55AM +0200, Peter Zijlstra wrote:
> > The other problem is more urgent though, CPU_POST_DEAD runs outside of
> > the hotplug lock and thus the above becomes a race where we could
> > possible kill off the migration thread of a newly brought up cpu:
> >
> > cpu0 - down 2
> > cpu1 - up 2 (allocs a new migration thread, and leaks the old one)
> > cpu0 - post_down 2 - frees the migration thread -- oops!
>
> Ok. So, how about adding a check in CPU_UP_PREPARE event handling too ?
> The cpuset_lock will synchronize, and thus avoid race between killing of
> migration_thread in up_prepare and post_dead events.
>
> Here is the updated patch. If you don't like this one too, do you mind
> suggesting an alternate approach to tackle the problem ? Thanks !
Right, so this isn't pretty at all..
Ingo, the comment near the migration_notifier says that migration_call
should happen before all else, but can you see anything that would break
if we let the timer migration happen first?
Thomas?
> Signed-off-by: Amit Arora <aarora@in.ibm.com>
> Signed-off-by: Gautham R Shenoy <ego@in.ibm.com>
> --
For everybody who reads this, _please_ use three (3) dashes '-' to
separate the changelog from the patch, and left-align the changelog
(including all tags).
I seem to get more and more people sending patches with 2 dashes and
daft changelogs with whitespace stuffing which break my scripts.
> diff -Nuarp linux-2.6.34.org/kernel/sched.c linux-2.6.34/kernel/sched.c
> --- linux-2.6.34.org/kernel/sched.c 2010-05-18 22:56:21.000000000 -0700
> +++ linux-2.6.34/kernel/sched.c 2010-05-19 04:47:49.000000000 -0700
> @@ -5900,6 +5900,19 @@ migration_call(struct notifier_block *nf
>
> case CPU_UP_PREPARE:
> case CPU_UP_PREPARE_FROZEN:
> + cpuset_lock();
> + rq = cpu_rq(cpu);
> + /*
> + * Since we now kill migration_thread in CPU_POST_DEAD event,
> + * there may be a race here. So, lets cleanup the old
> + * migration_thread on the rq, if any.
> + */
> + if (unlikely(rq->migration_thread)) {
> + kthread_stop(rq->migration_thread);
> + put_task_struct(rq->migration_thread);
> + rq->migration_thread = NULL;
> + }
> + cpuset_unlock();
> p = kthread_create(migration_thread, hcpu, "migration/%d", cpu);
> if (IS_ERR(p))
> return NOTIFY_BAD;
> @@ -5942,14 +5955,34 @@ migration_call(struct notifier_block *nf
> cpu_rq(cpu)->migration_thread = NULL;
> break;
>
> + case CPU_POST_DEAD:
> + /*
> + * Bring the migration thread down in CPU_POST_DEAD event,
> + * since the timers should have got migrated by now and thus
> + * we should not see a deadlock between trying to kill the
> + * migration thread and the sched_rt_period_timer.
> + */
> + cpuset_lock();
> + rq = cpu_rq(cpu);
> + if (likely(rq->migration_thread)) {
> + /*
> + * Its possible that this CPU was onlined (from a
> + * different CPU) before we reached here and
> + * migration_thread was cleaned-up in the
> + * CPU_UP_PREPARE event handling.
> + */
> + kthread_stop(rq->migration_thread);
> + put_task_struct(rq->migration_thread);
> + rq->migration_thread = NULL;
> + }
> + cpuset_unlock();
> + break;
> +
> case CPU_DEAD:
> case CPU_DEAD_FROZEN:
> cpuset_lock(); /* around calls to cpuset_cpus_allowed_lock() */
> migrate_live_tasks(cpu);
> rq = cpu_rq(cpu);
> - kthread_stop(rq->migration_thread);
> - put_task_struct(rq->migration_thread);
> - rq->migration_thread = NULL;
> /* Idle task back to normal (off runqueue, low prio) */
> raw_spin_lock_irq(&rq->lock);
> update_rq_clock(rq);
* Re: [PATCH v2] Make sure timers have migrated before killing migration_thread
2010-05-20 7:28 ` Peter Zijlstra
@ 2010-05-23 9:07 ` Mike Galbraith
2010-05-23 9:13 ` Peter Zijlstra
2010-05-24 6:43 ` Amit K. Arora
2010-05-25 20:19 ` Thomas Gleixner
2 siblings, 1 reply; 19+ messages in thread
From: Mike Galbraith @ 2010-05-23 9:07 UTC
To: Peter Zijlstra
Cc: Amit K. Arora, Ingo Molnar, Srivatsa Vaddagiri, Gautham R Shenoy,
Darren Hart, Brian King, linux-kernel, Thomas Gleixner
On Thu, 2010-05-20 at 09:28 +0200, Peter Zijlstra wrote:
> On Wed, 2010-05-19 at 17:43 +0530, Amit K. Arora wrote:
> > Alternate Solution considered : Another option considered was to
> > increase the priority of the hrtimer cpu offline notifier, such that it
> > gets to run before scheduler's migration cpu offline notifier. In this
> > way we are sure that the timers will get migrated before migration_call
> > tries to kill migration_thread. But, this can have some non-obvious
> > implications, suggested Srivatsa.
>
>
> > On Wed, May 19, 2010 at 11:31:55AM +0200, Peter Zijlstra wrote:
> > > The other problem is more urgent though, CPU_POST_DEAD runs outside of
> > > the hotplug lock and thus the above becomes a race where we could
> > > possible kill off the migration thread of a newly brought up cpu:
> > >
> > > cpu0 - down 2
> > > cpu1 - up 2 (allocs a new migration thread, and leaks the old one)
> > > cpu0 - post_down 2 - frees the migration thread -- oops!
> >
> > Ok. So, how about adding a check in CPU_UP_PREPARE event handling too ?
> > The cpuset_lock will synchronize, and thus avoid race between killing of
> > migration_thread in up_prepare and post_dead events.
> >
> > Here is the updated patch. If you don't like this one too, do you mind
> > suggesting an alternate approach to tackle the problem ? Thanks !
>
> Right, so this isn't pretty at all..
Since the problem seems to stem from interfering with a critical thread,
how about creating a SCHED_SYSTEM_CRITICAL flag ala SCHED_RESET_ON_FORK?
Not particularly beautiful, and completely untested (well, it compiles).
diff --git a/include/linux/sched.h b/include/linux/sched.h
index 6cc43e0..23da2cf 100644
--- a/include/linux/sched.h
+++ b/include/linux/sched.h
@@ -40,6 +40,8 @@
#define SCHED_IDLE 5
/* Can be ORed in to make sure the process is reverted back to SCHED_NORMAL on fork */
#define SCHED_RESET_ON_FORK 0x40000000
+/* Can be ORed in to flag a thread as being system critical. Not inherited. */
+#define SCHED_SYSTEM_CRITICAL 0x20000000
#ifdef __KERNEL__
@@ -1240,6 +1242,9 @@ struct task_struct {
/* Revert to default priority/policy when forking */
unsigned sched_reset_on_fork:1;
+ /* System critical thread. Cleared on fork. */
+ unsigned sched_system_critical:1;
+
pid_t pid;
pid_t tgid;
diff --git a/kernel/sched.c b/kernel/sched.c
index 2d17e3b..caf8b95 100644
--- a/kernel/sched.c
+++ b/kernel/sched.c
@@ -2511,6 +2511,11 @@ void sched_fork(struct task_struct *p, int clone_flags)
*/
p->prio = current->normal_prio;
+ /*
+ * System critical policy flag is not inherited.
+ */
+ p->sched_system_critical = 0;
+
if (!rt_prio(p->prio))
p->sched_class = &fair_sched_class;
@@ -4486,7 +4491,7 @@ static int __sched_setscheduler(struct task_struct *p, int policy,
unsigned long flags;
const struct sched_class *prev_class;
struct rq *rq;
- int reset_on_fork;
+ int reset_on_fork, system_critical;
/* may grab non-irq protected spin_locks */
BUG_ON(in_interrupt());
@@ -4494,10 +4499,12 @@ recheck:
/* double check policy once rq lock held */
if (policy < 0) {
reset_on_fork = p->sched_reset_on_fork;
+ system_critical = p->sched_system_critical;
policy = oldpolicy = p->policy;
} else {
reset_on_fork = !!(policy & SCHED_RESET_ON_FORK);
- policy &= ~SCHED_RESET_ON_FORK;
+ system_critical = !!(policy & SCHED_SYSTEM_CRITICAL);
+ policy &= ~(SCHED_RESET_ON_FORK|SCHED_SYSTEM_CRITICAL);
if (policy != SCHED_FIFO && policy != SCHED_RR &&
policy != SCHED_NORMAL && policy != SCHED_BATCH &&
@@ -4552,6 +4559,10 @@ recheck:
/* Normal users shall not reset the sched_reset_on_fork flag */
if (p->sched_reset_on_fork && !reset_on_fork)
return -EPERM;
+
+ /* Normal users shall not reset the sched_system_critical flag */
+ if (p->sched_system_critical && !system_critical)
+ return -EPERM;
}
if (user) {
@@ -4560,7 +4571,7 @@ recheck:
* Do not allow realtime tasks into groups that have no runtime
* assigned.
*/
- if (rt_bandwidth_enabled() && rt_policy(policy) &&
+ if (rt_bandwidth_enabled() && rt_policy(policy) && !system_critical &&
task_group(p)->rt_bandwidth.rt_runtime == 0)
return -EPERM;
#endif
@@ -4595,6 +4606,7 @@ recheck:
p->sched_class->put_prev_task(rq, p);
p->sched_reset_on_fork = reset_on_fork;
+ p->sched_system_critical = system_critical;
oldprio = p->prio;
prev_class = p->sched_class;
@@ -4712,9 +4724,13 @@ SYSCALL_DEFINE1(sched_getscheduler, pid_t, pid)
p = find_process_by_pid(pid);
if (p) {
retval = security_task_getscheduler(p);
- if (!retval)
- retval = p->policy
- | (p->sched_reset_on_fork ? SCHED_RESET_ON_FORK : 0);
+ if (!retval) {
+ retval = p->policy;
+ if (p->sched_reset_on_fork)
+ retval |= SCHED_RESET_ON_FORK;
+ if (p->sched_system_critical)
+ retval |= SCHED_SYSTEM_CRITICAL;
+ }
}
rcu_read_unlock();
return retval;
diff --git a/kernel/sched_rt.c b/kernel/sched_rt.c
index 8afb953..4fb5ced 100644
--- a/kernel/sched_rt.c
+++ b/kernel/sched_rt.c
@@ -605,6 +605,7 @@ static void update_curr_rt(struct rq *rq)
struct sched_rt_entity *rt_se = &curr->rt;
struct rt_rq *rt_rq = rt_rq_of_se(rt_se);
u64 delta_exec;
+ int system_critical = curr->sched_system_critical;
if (!task_has_rt_policy(curr))
return;
@@ -621,9 +622,13 @@ static void update_curr_rt(struct rq *rq)
curr->se.exec_start = rq->clock;
cpuacct_charge(curr, delta_exec);
- sched_rt_avg_update(rq, delta_exec);
+ /*
+ * System critical tasks do not contribute to bandwidth consumption,
+ * nor are they evicted when runtime is exceeded.
+ */
+ sched_rt_avg_update(rq, system_critical ? 0 : delta_exec);
- if (!rt_bandwidth_enabled())
+ if (!rt_bandwidth_enabled() || system_critical)
return;
for_each_sched_rt_entity(rt_se) {
diff --git a/kernel/stop_machine.c b/kernel/stop_machine.c
index b4e7431..6cbea9a 100644
--- a/kernel/stop_machine.c
+++ b/kernel/stop_machine.c
@@ -304,7 +304,7 @@ static int __cpuinit cpu_stop_cpu_callback(struct notifier_block *nfb,
cpu);
if (IS_ERR(p))
return NOTIFY_BAD;
- sched_setscheduler_nocheck(p, SCHED_FIFO, &param);
+ sched_setscheduler_nocheck(p, SCHED_FIFO|SCHED_SYSTEM_CRITICAL, &param);
get_task_struct(p);
stopper->thread = p;
break;
* Re: [PATCH v2] Make sure timers have migrated before killing migration_thread
2010-05-23 9:07 ` Mike Galbraith
@ 2010-05-23 9:13 ` Peter Zijlstra
0 siblings, 0 replies; 19+ messages in thread
From: Peter Zijlstra @ 2010-05-23 9:13 UTC
To: Mike Galbraith
Cc: Amit K. Arora, Ingo Molnar, Srivatsa Vaddagiri, Gautham R Shenoy,
Darren Hart, Brian King, linux-kernel, Thomas Gleixner
On Sun, 2010-05-23 at 11:07 +0200, Mike Galbraith wrote:
> On Thu, 2010-05-20 at 09:28 +0200, Peter Zijlstra wrote:
> > On Wed, 2010-05-19 at 17:43 +0530, Amit K. Arora wrote:
> > > Alternate Solution considered : Another option considered was to
> > > increase the priority of the hrtimer cpu offline notifier, such that it
> > > gets to run before scheduler's migration cpu offline notifier. In this
> > > way we are sure that the timers will get migrated before migration_call
> > > tries to kill migration_thread. But, this can have some non-obvious
> > > implications, suggested Srivatsa.
> >
> >
> > > On Wed, May 19, 2010 at 11:31:55AM +0200, Peter Zijlstra wrote:
> > > > The other problem is more urgent though, CPU_POST_DEAD runs outside of
> > > > the hotplug lock and thus the above becomes a race where we could
> > > > possible kill off the migration thread of a newly brought up cpu:
> > > >
> > > > cpu0 - down 2
> > > > cpu1 - up 2 (allocs a new migration thread, and leaks the old one)
> > > > cpu0 - post_down 2 - frees the migration thread -- oops!
> > >
> > > Ok. So, how about adding a check in CPU_UP_PREPARE event handling too ?
> > > The cpuset_lock will synchronize, and thus avoid race between killing of
> > > migration_thread in up_prepare and post_dead events.
> > >
> > > Here is the updated patch. If you don't like this one too, do you mind
> > > suggesting an alternate approach to tackle the problem ? Thanks !
> >
> > Right, so this isn't pretty at all..
>
> Since the problem seems to stem from interfering with a critical thread,
> how about create a SCHED_SYSTEM_CRITICAL flag ala SCHED_RESET_ON_FORK?
>
> Not particularly beautiful, and completely untested (well, it compiles).
Nah, I'd rather we pull the migration thread out of SCHED_FIFO and
either schedule it explicit or add a sched_class for it. We need to do
that anyway once we go play with SCHED_DEADLINE.
But it would be very nice if we could simply order the timer and task
migration bits so that the whole problem doesn't exist in the first
place.
* Re: [PATCH v2] Make sure timers have migrated before killing migration_thread
2010-05-20 7:28 ` Peter Zijlstra
2010-05-23 9:07 ` Mike Galbraith
@ 2010-05-24 6:43 ` Amit K. Arora
2010-05-25 20:19 ` Thomas Gleixner
2 siblings, 0 replies; 19+ messages in thread
From: Amit K. Arora @ 2010-05-24 6:43 UTC
To: Peter Zijlstra, Ingo Molnar, Thomas Gleixner
Cc: Srivatsa Vaddagiri, Gautham R Shenoy, Darren Hart, Brian King,
linux-kernel
On Thu, May 20, 2010 at 09:28:03AM +0200, Peter Zijlstra wrote:
> On Wed, 2010-05-19 at 17:43 +0530, Amit K. Arora wrote:
> > Alternate Solution considered : Another option considered was to
> > increase the priority of the hrtimer cpu offline notifier, such that it
> > gets to run before scheduler's migration cpu offline notifier. In this
> > way we are sure that the timers will get migrated before migration_call
> > tries to kill migration_thread. But, this can have some non-obvious
> > implications, suggested Srivatsa.
>
>
> > On Wed, May 19, 2010 at 11:31:55AM +0200, Peter Zijlstra wrote:
> > > The other problem is more urgent though, CPU_POST_DEAD runs outside of
> > > the hotplug lock and thus the above becomes a race where we could
> > > possible kill off the migration thread of a newly brought up cpu:
> > >
> > > cpu0 - down 2
> > > cpu1 - up 2 (allocs a new migration thread, and leaks the old one)
> > > cpu0 - post_down 2 - frees the migration thread -- oops!
> >
> > Ok. So, how about adding a check in CPU_UP_PREPARE event handling too ?
> > The cpuset_lock will synchronize, and thus avoid race between killing of
> > migration_thread in up_prepare and post_dead events.
> >
> > Here is the updated patch. If you don't like this one too, do you mind
> > suggesting an alternate approach to tackle the problem ? Thanks !
>
> Right, so this isn't pretty at all..
>
> Ingo, the comment near the migration_notifier says that migration_call
> should happen before all else, but can you see anything that would break
> if we let the timer migration happen first?
>
> Thomas?
Hello Ingo, Thomas,
Do you see any potential problems with migrating the timers before
the migration_call ?
Thanks!
--
Regards,
Amit Arora
* Re: [PATCH] Make sure timers have migrated before killing migration_thread
2010-05-19 9:31 ` Peter Zijlstra
2010-05-19 12:13 ` [PATCH v2] " Amit K. Arora
@ 2010-05-24 9:59 ` Amit K. Arora
2010-05-24 13:28 ` Peter Zijlstra
2010-05-25 11:31 ` Peter Zijlstra
1 sibling, 2 replies; 19+ messages in thread
From: Amit K. Arora @ 2010-05-24 9:59 UTC
To: Peter Zijlstra
Cc: tj, Ingo Molnar, Srivatsa Vaddagiri, Gautham R Shenoy,
Darren Hart, Brian King, linux-kernel
On Wed, May 19, 2010 at 11:31:55AM +0200, Peter Zijlstra wrote:
> On Wed, 2010-05-19 at 14:35 +0530, Amit K. Arora wrote:
> > + cpuset_lock();
> > + rq = cpu_rq(cpu);
> > + kthread_stop(rq->migration_thread);
> > + put_task_struct(rq->migration_thread);
> > + rq->migration_thread = NULL;
> > + cpuset_unlock();
> > + break;
> > +
>
> The other problem is more urgent though, CPU_POST_DEAD runs outside of
> the hotplug lock and thus the above becomes a race where we could
> possible kill off the migration thread of a newly brought up cpu:
>
> cpu0 - down 2
> cpu1 - up 2 (allocs a new migration thread, and leaks the old one)
> cpu0 - post_down 2 - frees the migration thread -- oops!
<Adding Tejun Heo to CC list .. >
Hi Peter,
In an offline discussion with Tejun, he suggested that the above race
can not happen, since _cpu_up() and _cpu_down() can never run in
parallel, because of cpu_add_remove_lock. Looking at the code we can see
that cpu_up() and cpu_down() call "_" variants with cpu_add_remove_lock
mutex held (using cpu_maps_update_begin()).
Here is exactly what he had to say:
"I don't think that's possible. There are two locks involved here.
cpu_add_remove_lock and cpu_hotplug.lock. The former wraps around the
second and already provides full exclusion between all cpu hotplug/unplug
operations.
The latter is there for reader/writer type exclusion via
get/put_online_cpus().
CPU_POST_DEAD is outside of cpu_hotplug.lock allowing get_online_cpus()
to proceed in parallel but it's still inside cpu_add_remove_lock so other
cpu up/down operations cannot begin before it finishes. "
Thus, since above race can never happen, is there any other issue with
this patch ?
Thanks!
--
Regards,
Amit Arora
Signed-off-by: Amit Arora <aarora@in.ibm.com>
Signed-off-by: Gautham R Shenoy <ego@in.ibm.com>
---
diff -Nuarp linux-2.6.34.org/kernel/sched.c linux-2.6.34/kernel/sched.c
--- linux-2.6.34.org/kernel/sched.c 2010-05-18 22:56:21.000000000 -0700
+++ linux-2.6.34/kernel/sched.c 2010-05-18 22:58:31.000000000 -0700
@@ -5942,14 +5942,26 @@ migration_call(struct notifier_block *nf
cpu_rq(cpu)->migration_thread = NULL;
break;
+ case CPU_POST_DEAD:
+ /*
+ * Bring the migration thread down in CPU_POST_DEAD event,
+ * since the timers should have got migrated by now and thus
+ * we should not see a deadlock between trying to kill the
+ * migration thread and the sched_rt_period_timer.
+ */
+ cpuset_lock();
+ rq = cpu_rq(cpu);
+ kthread_stop(rq->migration_thread);
+ put_task_struct(rq->migration_thread);
+ rq->migration_thread = NULL;
+ cpuset_unlock();
+ break;
+
case CPU_DEAD:
case CPU_DEAD_FROZEN:
cpuset_lock(); /* around calls to cpuset_cpus_allowed_lock() */
migrate_live_tasks(cpu);
rq = cpu_rq(cpu);
- kthread_stop(rq->migration_thread);
- put_task_struct(rq->migration_thread);
- rq->migration_thread = NULL;
/* Idle task back to normal (off runqueue, low prio) */
raw_spin_lock_irq(&rq->lock);
update_rq_clock(rq);
* Re: [PATCH] Make sure timers have migrated before killing migration_thread
2010-05-24 9:59 ` [PATCH] " Amit K. Arora
@ 2010-05-24 13:28 ` Peter Zijlstra
2010-05-24 15:16 ` Srivatsa Vaddagiri
2010-05-25 11:31 ` Peter Zijlstra
1 sibling, 1 reply; 19+ messages in thread
From: Peter Zijlstra @ 2010-05-24 13:28 UTC
To: Amit K. Arora
Cc: tj, Ingo Molnar, Srivatsa Vaddagiri, Gautham R Shenoy,
Darren Hart, Brian King, linux-kernel
On Mon, 2010-05-24 at 15:29 +0530, Amit K. Arora wrote:
> since _cpu_up() and _cpu_down() can never run in
> parallel, because of cpu_add_remove_lock.
Ah indeed. I guess your initial patch works then.
* Re: [PATCH] Make sure timers have migrated before killing migration_thread
2010-05-24 13:28 ` Peter Zijlstra
@ 2010-05-24 15:16 ` Srivatsa Vaddagiri
2010-05-24 15:55 ` Peter Zijlstra
0 siblings, 1 reply; 19+ messages in thread
From: Srivatsa Vaddagiri @ 2010-05-24 15:16 UTC
To: Peter Zijlstra
Cc: Amit K. Arora, tj, Ingo Molnar, Gautham R Shenoy, Darren Hart,
Brian King, linux-kernel
On Mon, May 24, 2010 at 03:28:45PM +0200, Peter Zijlstra wrote:
> On Mon, 2010-05-24 at 15:29 +0530, Amit K. Arora wrote:
> > since _cpu_up() and _cpu_down() can never run in
> > parallel, because of cpu_add_remove_lock.
>
> Ah indeed. I guess your initial patch works then.
One thing I found surprising was that a cpu's rt-bandwidth renewal could be
dependent on another cpu's (rt-bandwidth) timer firing on time. In this case, we
had migration/23 pulled over to CPU0 and we hung later waiting for migration/23
to exit. migration/23 was not exiting because it could not run on CPU0 (as
CPU0's rt-bandwidth had expired). This situation remained forever. I would have
expected CPU0's bandwidth to have been renewed independent of some timer on
CPU23 to fire - maybe I am missing something not obvious in the code?
- vatsa
* Re: [PATCH] Make sure timers have migrated before killing migration_thread
2010-05-24 15:16 ` Srivatsa Vaddagiri
@ 2010-05-24 15:55 ` Peter Zijlstra
0 siblings, 0 replies; 19+ messages in thread
From: Peter Zijlstra @ 2010-05-24 15:55 UTC
To: vatsa
Cc: Amit K. Arora, tj, Ingo Molnar, Gautham R Shenoy, Darren Hart,
Brian King, linux-kernel
On Mon, 2010-05-24 at 20:46 +0530, Srivatsa Vaddagiri wrote:
> On Mon, May 24, 2010 at 03:28:45PM +0200, Peter Zijlstra wrote:
> > On Mon, 2010-05-24 at 15:29 +0530, Amit K. Arora wrote:
> > > since _cpu_up() and _cpu_down() can never run in
> > > parallel, because of cpu_add_remove_lock.
> >
> > Ah indeed. I guess your initial patch works then.
>
> One thing I found surprising was that a cpu's rt-bandwidth renewal could be
> dependent on another cpu's (rt-bandwidth) timer firing on time. In this case, we
> had migration/23 pulled over to CPU0 and we hung later waiting for migration/23
> to exit. migration/23 was not exiting because it could not run on CPU0 (as
> CPU0's rt-bandwidth had expired). This situation persisted forever. I would have
> expected CPU0's bandwidth to have been renewed independently of some timer on
> CPU23 firing - maybe I am missing something not obvious in the code?
The bandwidth constraint is per cgroup, and cgroups span cpus.
* Re: [PATCH] Make sure timers have migrated before killing migration_thread
2010-05-24 9:59 ` [PATCH] " Amit K. Arora
2010-05-24 13:28 ` Peter Zijlstra
@ 2010-05-25 11:31 ` Peter Zijlstra
2010-05-25 12:10 ` Amit K. Arora
2010-05-25 13:23 ` [PATCH v3] " Amit K. Arora
1 sibling, 2 replies; 19+ messages in thread
From: Peter Zijlstra @ 2010-05-25 11:31 UTC
To: Amit K. Arora
Cc: tj, Ingo Molnar, Srivatsa Vaddagiri, Gautham R Shenoy,
Darren Hart, Brian King, linux-kernel
On Mon, 2010-05-24 at 15:29 +0530, Amit K. Arora wrote:
>
> Thus, since the above race can never happen, is there any other issue with
> this patch?
It doesn't seem to apply nicely...
>
> Signed-off-by: Amit Arora <aarora@in.ibm.com>
> Signed-off-by: Gautham R Shenoy <ego@in.ibm.com>
> ---
> diff -Nuarp linux-2.6.34.org/kernel/sched.c linux-2.6.34/kernel/sched.c
> --- linux-2.6.34.org/kernel/sched.c 2010-05-18 22:56:21.000000000 -0700
> +++ linux-2.6.34/kernel/sched.c 2010-05-18 22:58:31.000000000 -0700
> @@ -5942,14 +5942,26 @@ migration_call(struct notifier_block *nf
> cpu_rq(cpu)->migration_thread = NULL;
> break;
>
> + case CPU_POST_DEAD:
> + /*
> + * Bring the migration thread down in CPU_POST_DEAD event,
> + * since the timers should have got migrated by now and thus
> + * we should not see a deadlock between trying to kill the
> + * migration thread and the sched_rt_period_timer.
> + */
> + cpuset_lock();
> + rq = cpu_rq(cpu);
> + kthread_stop(rq->migration_thread);
> + put_task_struct(rq->migration_thread);
> + rq->migration_thread = NULL;
> + cpuset_unlock();
> + break;
> +
> case CPU_DEAD:
> case CPU_DEAD_FROZEN:
> cpuset_lock(); /* around calls to cpuset_cpus_allowed_lock() */
> migrate_live_tasks(cpu);
> rq = cpu_rq(cpu);
> - kthread_stop(rq->migration_thread);
> - put_task_struct(rq->migration_thread);
> - rq->migration_thread = NULL;
> /* Idle task back to normal (off runqueue, low prio) */
> raw_spin_lock_irq(&rq->lock);
> update_rq_clock(rq);
>
>
* Re: [PATCH] Make sure timers have migrated before killing migration_thread
2010-05-25 11:31 ` Peter Zijlstra
@ 2010-05-25 12:10 ` Amit K. Arora
2010-05-25 13:23 ` [PATCH v3] " Amit K. Arora
1 sibling, 0 replies; 19+ messages in thread
From: Amit K. Arora @ 2010-05-25 12:10 UTC
To: Peter Zijlstra
Cc: tj, Ingo Molnar, Srivatsa Vaddagiri, Gautham R Shenoy,
Darren Hart, Brian King, linux-kernel
On Tue, May 25, 2010 at 01:31:35PM +0200, Peter Zijlstra wrote:
> On Mon, 2010-05-24 at 15:29 +0530, Amit K. Arora wrote:
> >
> > Thus, since the above race can never happen, is there any other issue with
> > this patch?
>
> It doesn't seem to apply nicely...
Hi Peter,
This patch was built on top of 2.6.34, where it applies cleanly. I
checked the git tree and the cpu_stop patchset from Tejun has changed
this code considerably (commit 969c79215a35b06e5e3efe69b9412f858df7856c).
The change is now required in cpu_stop_cpu_callback() in stop_machine.c.
Will post it soon. Thanks!
--
Regards,
Amit Arora
* [PATCH v3] Make sure timers have migrated before killing migration_thread
2010-05-25 11:31 ` Peter Zijlstra
2010-05-25 12:10 ` Amit K. Arora
@ 2010-05-25 13:23 ` Amit K. Arora
2010-05-25 14:22 ` Peter Zijlstra
` (2 more replies)
1 sibling, 3 replies; 19+ messages in thread
From: Amit K. Arora @ 2010-05-25 13:23 UTC
To: Peter Zijlstra, Ingo Molnar
Cc: tj, Srivatsa Vaddagiri, Gautham R Shenoy, Darren Hart, Brian King,
linux-kernel
On Tue, May 25, 2010 at 01:31:35PM +0200, Peter Zijlstra wrote:
> On Mon, 2010-05-24 at 15:29 +0530, Amit K. Arora wrote:
> >
> > Thus, since the above race can never happen, is there any other issue with
> > this patch?
>
> It doesn't seem to apply nicely...
Here is the new patch.
Problem : In a stress test where some heavy tests were running along with
regular CPU offlining and onlining, a hang was observed. The system seems to
be hung at a point where migration_call() tries to kill the migration_thread
of the dying CPU, which just got moved to the current CPU. This migration
thread does not get a chance to run (and die) since rt_throttled is set to 1
on current, and it doesn't get cleared as the hrtimer which is supposed to
reset the rt bandwidth (sched_rt_period_timer) is tied to the CPU which we just
marked dead!
Solution : This patch pushes the killing of migration thread to "CPU_POST_DEAD"
event. By then all the timers (including sched_rt_period_timer) should have got
migrated (along with other callbacks).
Thanks!
Regards,
Amit Arora
Signed-off-by: Amit Arora <aarora@in.ibm.com>
Signed-off-by: Gautham R Shenoy <ego@in.ibm.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Tejun Heo <tj@kernel.org>
---
diff -Nuarp linux-2.6-next-20100525.org/kernel/stop_machine.c linux-2.6-next-20100525/kernel/stop_machine.c
--- linux-2.6-next-20100525.org/kernel/stop_machine.c 2010-05-25 14:27:51.000000000 -0400
+++ linux-2.6-next-20100525/kernel/stop_machine.c 2010-05-25 14:30:09.000000000 -0400
@@ -321,7 +321,7 @@ static int __cpuinit cpu_stop_cpu_callba
#ifdef CONFIG_HOTPLUG_CPU
case CPU_UP_CANCELED:
- case CPU_DEAD:
+ case CPU_POST_DEAD:
{
struct cpu_stop_work *work;
* Re: [PATCH v3] Make sure timers have migrated before killing migration_thread
2010-05-25 13:23 ` [PATCH v3] " Amit K. Arora
@ 2010-05-25 14:22 ` Peter Zijlstra
2010-05-25 16:27 ` Tejun Heo
2010-05-31 7:18 ` [tip:sched/urgent] sched: Make sure timers have migrated before killing the migration_thread tip-bot for Amit K. Arora
2 siblings, 0 replies; 19+ messages in thread
From: Peter Zijlstra @ 2010-05-25 14:22 UTC
To: Amit K. Arora
Cc: Ingo Molnar, tj, Srivatsa Vaddagiri, Gautham R Shenoy,
Darren Hart, Brian King, linux-kernel
On Tue, 2010-05-25 at 18:53 +0530, Amit K. Arora wrote:
nice :-)
Thanks!
> Signed-off-by: Amit Arora <aarora@in.ibm.com>
> Signed-off-by: Gautham R Shenoy <ego@in.ibm.com>
> Cc: Ingo Molnar <mingo@elte.hu>
> Cc: Peter Zijlstra <peterz@infradead.org>
> Cc: Tejun Heo <tj@kernel.org>
> ---
> diff -Nuarp linux-2.6-next-20100525.org/kernel/stop_machine.c linux-2.6-next-20100525/kernel/stop_machine.c
> --- linux-2.6-next-20100525.org/kernel/stop_machine.c 2010-05-25 14:27:51.000000000 -0400
> +++ linux-2.6-next-20100525/kernel/stop_machine.c 2010-05-25 14:30:09.000000000 -0400
> @@ -321,7 +321,7 @@ static int __cpuinit cpu_stop_cpu_callba
>
> #ifdef CONFIG_HOTPLUG_CPU
> case CPU_UP_CANCELED:
> - case CPU_DEAD:
> + case CPU_POST_DEAD:
> {
> struct cpu_stop_work *work;
* Re: [PATCH v3] Make sure timers have migrated before killing migration_thread
2010-05-25 13:23 ` [PATCH v3] " Amit K. Arora
2010-05-25 14:22 ` Peter Zijlstra
@ 2010-05-25 16:27 ` Tejun Heo
2010-05-31 7:18 ` [tip:sched/urgent] sched: Make sure timers have migrated before killing the migration_thread tip-bot for Amit K. Arora
2 siblings, 0 replies; 19+ messages in thread
From: Tejun Heo @ 2010-05-25 16:27 UTC
To: Amit K. Arora
Cc: Peter Zijlstra, Ingo Molnar, Srivatsa Vaddagiri, Gautham R Shenoy,
Darren Hart, Brian King, linux-kernel
Acked-by: Tejun Heo <tj@kernel.org>
Thanks.
--
tejun
* Re: [PATCH v2] Make sure timers have migrated before killing migration_thread
2010-05-20 7:28 ` Peter Zijlstra
2010-05-23 9:07 ` Mike Galbraith
2010-05-24 6:43 ` Amit K. Arora
@ 2010-05-25 20:19 ` Thomas Gleixner
2010-05-26 6:43 ` Peter Zijlstra
2 siblings, 1 reply; 19+ messages in thread
From: Thomas Gleixner @ 2010-05-25 20:19 UTC
To: Peter Zijlstra
Cc: Amit K. Arora, Ingo Molnar, Srivatsa Vaddagiri, Gautham R Shenoy,
Darren Hart, Brian King, linux-kernel
On Thu, 20 May 2010, Peter Zijlstra wrote:
> On Wed, 2010-05-19 at 17:43 +0530, Amit K. Arora wrote:
> > Alternate Solution considered : Another option considered was to
> > increase the priority of the hrtimer cpu offline notifier, such that it
> > gets to run before scheduler's migration cpu offline notifier. In this
> > way we are sure that the timers will get migrated before migration_call
> > tries to kill migration_thread. But, this can have some non-obvious
> > implications, as Srivatsa suggested.
>
>
> > On Wed, May 19, 2010 at 11:31:55AM +0200, Peter Zijlstra wrote:
> > > The other problem is more urgent though, CPU_POST_DEAD runs outside of
> > > the hotplug lock and thus the above becomes a race where we could
> > > possibly kill off the migration thread of a newly brought up cpu:
> > >
> > > cpu0 - down 2
> > > cpu1 - up 2 (allocs a new migration thread, and leaks the old one)
> > > cpu0 - post_down 2 - frees the migration thread -- oops!
> >
> > Ok. So, how about adding a check in CPU_UP_PREPARE event handling too?
> > The cpuset_lock will synchronize, and thus avoid the race between killing of
> > migration_thread in the up_prepare and post_dead events.
> >
> > Here is the updated patch. If you don't like this one too, do you mind
> > suggesting an alternate approach to tackle the problem ? Thanks !
>
> Right, so this isn't pretty at all..
>
> Ingo, the comment near the migration_notifier says that migration_call
> should happen before all else, but can you see anything that would break
> if we let the timer migration happen first?
>
> Thomas?
That should work, though what about killing the scheduler's per-rq hrtimers
_before_ we migrate stuff? We don't want to migrate them, right?
Thanks,
tglx
* Re: [PATCH v2] Make sure timers have migrated before killing migration_thread
2010-05-25 20:19 ` Thomas Gleixner
@ 2010-05-26 6:43 ` Peter Zijlstra
0 siblings, 0 replies; 19+ messages in thread
From: Peter Zijlstra @ 2010-05-26 6:43 UTC
To: Thomas Gleixner
Cc: Amit K. Arora, Ingo Molnar, Srivatsa Vaddagiri, Gautham R Shenoy,
Darren Hart, Brian King, linux-kernel
On Tue, 2010-05-25 at 22:19 +0200, Thomas Gleixner wrote:
> On Thu, 20 May 2010, Peter Zijlstra wrote:
>
> > On Wed, 2010-05-19 at 17:43 +0530, Amit K. Arora wrote:
> > > Alternate Solution considered : Another option considered was to
> > > increase the priority of the hrtimer cpu offline notifier, such that it
> > > gets to run before scheduler's migration cpu offline notifier. In this
> > > way we are sure that the timers will get migrated before migration_call
> > > tries to kill migration_thread. But, this can have some non-obvious
> > > implications, as Srivatsa suggested.
> >
> >
> > > On Wed, May 19, 2010 at 11:31:55AM +0200, Peter Zijlstra wrote:
> > > > The other problem is more urgent though, CPU_POST_DEAD runs outside of
> > > > the hotplug lock and thus the above becomes a race where we could
> > > > possibly kill off the migration thread of a newly brought up cpu:
> > > >
> > > > cpu0 - down 2
> > > > cpu1 - up 2 (allocs a new migration thread, and leaks the old one)
> > > > cpu0 - post_down 2 - frees the migration thread -- oops!
> > >
> > > Ok. So, how about adding a check in CPU_UP_PREPARE event handling too?
> > > The cpuset_lock will synchronize, and thus avoid the race between killing of
> > > migration_thread in the up_prepare and post_dead events.
> > >
> > > Here is the updated patch. If you don't like this one too, do you mind
> > > suggesting an alternate approach to tackle the problem ? Thanks !
> >
> > Right, so this isn't pretty at all..
> >
> > Ingo, the comment near the migration_notifier says that migration_call
> > should happen before all else, but can you see anything that would break
> > if we let the timer migration happen first?
> >
> > Thomas?
>
> That should work, though what about killing the scheduler's per-rq hrtimers
> _before_ we migrate stuff? We don't want to migrate them, right?
They're not rq timers, they're the 'cgroup' bandwidth timers and those
are free to migrate.
What I think happens is that the timer ends up being on the cpu that
goes down, then we disable IRQs on it and run out of bandwidth and get
stuck.
Anyway, we solved it with a one-liner in a different way.
Eventually I'll rip the whole migration thread thingy out of SCHED_FIFO,
which should also solve the issue, I think.
* [tip:sched/urgent] sched: Make sure timers have migrated before killing the migration_thread
2010-05-25 13:23 ` [PATCH v3] " Amit K. Arora
2010-05-25 14:22 ` Peter Zijlstra
2010-05-25 16:27 ` Tejun Heo
@ 2010-05-31 7:18 ` tip-bot for Amit K. Arora
2 siblings, 0 replies; 19+ messages in thread
From: tip-bot for Amit K. Arora @ 2010-05-31 7:18 UTC
To: linux-tip-commits
Cc: linux-kernel, ego, aarora, hpa, mingo, a.p.zijlstra, tj, tglx,
aarora, mingo
Commit-ID: 54e88fad223c4e1d94289611a90c7fe3ebe5631b
Gitweb: http://git.kernel.org/tip/54e88fad223c4e1d94289611a90c7fe3ebe5631b
Author: Amit K. Arora <aarora@linux.vnet.ibm.com>
AuthorDate: Tue, 25 May 2010 18:53:46 +0530
Committer: Ingo Molnar <mingo@elte.hu>
CommitDate: Mon, 31 May 2010 08:37:44 +0200
sched: Make sure timers have migrated before killing the migration_thread
Problem: In a stress test where some heavy tests were running along with
regular CPU offlining and onlining, a hang was observed. The system seems
to be hung at a point where migration_call() tries to kill the
migration_thread of the dying CPU, which just got moved to the current
CPU. This migration thread does not get a chance to run (and die) since
rt_throttled is set to 1 on current, and it doesn't get cleared as the
hrtimer which is supposed to reset the rt bandwidth
(sched_rt_period_timer) is tied to the CPU which we just marked dead!
Solution: This patch pushes the killing of migration thread to
"CPU_POST_DEAD" event. By then all the timers (including
sched_rt_period_timer) should have got migrated (along with other
callbacks).
Signed-off-by: Amit Arora <aarora@in.ibm.com>
Signed-off-by: Gautham R Shenoy <ego@in.ibm.com>
Acked-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Thomas Gleixner <tglx@linutronix.de>
LKML-Reference: <20100525132346.GA14986@amitarora.in.ibm.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
---
kernel/stop_machine.c | 2 +-
1 files changed, 1 insertions(+), 1 deletions(-)
diff --git a/kernel/stop_machine.c b/kernel/stop_machine.c
index b4e7431..70f8d90 100644
--- a/kernel/stop_machine.c
+++ b/kernel/stop_machine.c
@@ -321,7 +321,7 @@ static int __cpuinit cpu_stop_cpu_callback(struct notifier_block *nfb,
#ifdef CONFIG_HOTPLUG_CPU
case CPU_UP_CANCELED:
- case CPU_DEAD:
+ case CPU_POST_DEAD:
{
struct cpu_stop_work *work;
end of thread, other threads:[~2010-05-31 7:20 UTC | newest]
Thread overview: 19+ messages
2010-05-19 9:05 [PATCH] Make sure timers have migrated before killing migration_thread Amit K. Arora
2010-05-19 9:31 ` Peter Zijlstra
2010-05-19 12:13 ` [PATCH v2] " Amit K. Arora
2010-05-20 7:28 ` Peter Zijlstra
2010-05-23 9:07 ` Mike Galbraith
2010-05-23 9:13 ` Peter Zijlstra
2010-05-24 6:43 ` Amit K. Arora
2010-05-25 20:19 ` Thomas Gleixner
2010-05-26 6:43 ` Peter Zijlstra
2010-05-24 9:59 ` [PATCH] " Amit K. Arora
2010-05-24 13:28 ` Peter Zijlstra
2010-05-24 15:16 ` Srivatsa Vaddagiri
2010-05-24 15:55 ` Peter Zijlstra
2010-05-25 11:31 ` Peter Zijlstra
2010-05-25 12:10 ` Amit K. Arora
2010-05-25 13:23 ` [PATCH v3] " Amit K. Arora
2010-05-25 14:22 ` Peter Zijlstra
2010-05-25 16:27 ` Tejun Heo
2010-05-31 7:18 ` [tip:sched/urgent] sched: Make sure timers have migrated before killing the migration_thread tip-bot for Amit K. Arora