* [PATCH] Make sure timers have migrated before killing migration_thread
@ 2010-05-19  9:05 Amit K. Arora
  2010-05-19  9:31 ` Peter Zijlstra
  0 siblings, 1 reply; 19+ messages in thread
From: Amit K. Arora @ 2010-05-19  9:05 UTC
  To: Peter Zijlstra, Ingo Molnar
  Cc: Srivatsa Vaddagiri, Gautham R Shenoy, Darren Hart, Brian King,
	linux-kernel

Problem : In a stress test where some heavy tests were running along with
regular CPU offlining and onlining, a hang was observed. The system seems to
be hung at a point where migration_call() tries to kill the migration_thread
of the dying CPU, which just got moved to the current CPU. This migration
thread does not get a chance to run (and die) since rt_throttled is set to 1
on current, and it doesn't get cleared as the hrtimer which is supposed to
reset the rt bandwidth (sched_rt_period_timer) is tied to the CPU being
offlined.

Solution : This patch pushes the killing of migration thread to "CPU_POST_DEAD"
event. By then all the timers (including sched_rt_period_timer) should have got
migrated (along with other callbacks).

Alternate Solution considered : Another option considered was to
increase the priority of the hrtimer cpu offline notifier, such that it
gets to run before the scheduler's migration cpu offline notifier. That
way we would be sure that the timers get migrated before migration_call
tries to kill migration_thread. But, as Srivatsa pointed out, this can
have some non-obvious implications.
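
For illustration only -- this is not part of the patch, and the priority
value below is made up (hrtimer_cpu_notify and migration_notifier are the
2.6.34-era names as I remember them). The alternate approach would amount
to registering the hrtimer hotplug callback with a priority above the
scheduler's migration notifier, roughly:

        /*
         * Illustrative sketch only: bump the hrtimer hotplug notifier
         * above migration_notifier (which registers with a raised
         * priority so that migration_call() runs first on CPU_DEAD).
         */
        static struct notifier_block __cpuinitdata hrtimers_nb = {
                .notifier_call  = hrtimer_cpu_notify,
                .priority       = 11,   /* assumed: > migration_notifier */
        };

With that ordering the hrtimer CPU_DEAD handler would migrate the pending
timers before migration_call() tries to stop migration_thread, but changing
the notifier order is exactly what may have the non-obvious side effects
mentioned above.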

Testing : Without the patch the stress tests didn't last for even 12
hours. And yes, the problem was reproducible. With the patch applied the
tests ran successfully for more than 48 hours.

Thanks!
--
Regards,
Amit Arora

 Signed-off-by: Amit Arora <aarora@in.ibm.com>
 Signed-off-by: Gautham R Shenoy <ego@in.ibm.com>
--
diff -Nuarp linux-2.6.34.org/kernel/sched.c linux-2.6.34/kernel/sched.c
--- linux-2.6.34.org/kernel/sched.c	2010-05-18 22:56:21.000000000 -0700
+++ linux-2.6.34/kernel/sched.c	2010-05-18 22:58:31.000000000 -0700
@@ -5942,14 +5942,26 @@ migration_call(struct notifier_block *nf
 		cpu_rq(cpu)->migration_thread = NULL;
 		break;
 
+	case CPU_POST_DEAD:
+		/*
+		  Bring the migration thread down in CPU_POST_DEAD event,
+		  since the timers should have got migrated by now and thus
+		  we should not see a deadlock between trying to kill the
+		  migration thread and the sched_rt_period_timer.
+		*/
+		cpuset_lock();
+		rq = cpu_rq(cpu);
+		kthread_stop(rq->migration_thread);
+		put_task_struct(rq->migration_thread);
+		rq->migration_thread = NULL;
+		cpuset_unlock();
+		break;
+
 	case CPU_DEAD:
 	case CPU_DEAD_FROZEN:
 		cpuset_lock(); /* around calls to cpuset_cpus_allowed_lock() */
 		migrate_live_tasks(cpu);
 		rq = cpu_rq(cpu);
-		kthread_stop(rq->migration_thread);
-		put_task_struct(rq->migration_thread);
-		rq->migration_thread = NULL;
 		/* Idle task back to normal (off runqueue, low prio) */
 		raw_spin_lock_irq(&rq->lock);
 		update_rq_clock(rq);


* Re: [PATCH] Make sure timers have migrated before killing migration_thread
  2010-05-19  9:05 [PATCH] Make sure timers have migrated before killing migration_thread Amit K. Arora
@ 2010-05-19  9:31 ` Peter Zijlstra
  2010-05-19 12:13   ` [PATCH v2] " Amit K. Arora
  2010-05-24  9:59   ` [PATCH] " Amit K. Arora
  0 siblings, 2 replies; 19+ messages in thread
From: Peter Zijlstra @ 2010-05-19  9:31 UTC
  To: Amit K. Arora
  Cc: Ingo Molnar, Srivatsa Vaddagiri, Gautham R Shenoy, Darren Hart,
	Brian King, linux-kernel

On Wed, 2010-05-19 at 14:35 +0530, Amit K. Arora wrote:
> Problem : In a stress test where some heavy tests were running along with
> regular CPU offlining and onlining, a hang was observed. The system seems to
> be hung at a point where migration_call() tries to kill the migration_thread
> of the dying CPU, which just got moved to the current CPU. This migration
> thread does not get a chance to run (and die) since rt_throttled is set to 1
> on current, and it doesn't get cleared as the hrtimer which is supposed to
> reset the rt bandwidth (sched_rt_period_timer) is tied to the CPU being
> offlined.
> 
> Solution : This patch pushes the killing of migration thread to "CPU_POST_DEAD"
> event. By then all the timers (including sched_rt_period_timer) should have got
> migrated (along with other callbacks).
> 
> Alternate Solution considered : Another option considered was to
> increase the priority of the hrtimer cpu offline notifier, such that it
> gets to run before scheduler's migration cpu offline notifier. In this
> way we are sure that the timers will get migrated before migration_call
> tries to kill migration_thread. But, this can have some non-obvious
> implications, suggested Srivatsa.
> 
> Testing : Without the patch the stress tests didn't last for even 12
> hours. And yes, the problem was reproducible. With the patch applied the
> tests ran successfully for more than 48 hours.
> 
> Thanks!
> --
> Regards,
> Amit Arora
> 
>  Signed-off-by: Amit Arora <aarora@in.ibm.com>
>  Signed-off-by: Gautham R Shenoy <ego@in.ibm.com>
> --
> diff -Nuarp linux-2.6.34.org/kernel/sched.c linux-2.6.34/kernel/sched.c
> --- linux-2.6.34.org/kernel/sched.c	2010-05-18 22:56:21.000000000 -0700
> +++ linux-2.6.34/kernel/sched.c	2010-05-18 22:58:31.000000000 -0700
> @@ -5942,14 +5942,26 @@ migration_call(struct notifier_block *nf
>  		cpu_rq(cpu)->migration_thread = NULL;
>  		break;
>  
> +	case CPU_POST_DEAD:
> +		/*
> +		  Bring the migration thread down in CPU_POST_DEAD event,
> +		  since the timers should have got migrated by now and thus
> +		  we should not see a deadlock between trying to kill the
> +		  migration thread and the sched_rt_period_timer.
> +		*/

Faulty comment style that, please use:

 /*
  * text
  *  goes
  *   here
  */

> +		cpuset_lock();
> +		rq = cpu_rq(cpu);
> +		kthread_stop(rq->migration_thread);
> +		put_task_struct(rq->migration_thread);
> +		rq->migration_thread = NULL;
> +		cpuset_unlock();
> +		break;
> +

The other problem is more urgent though, CPU_POST_DEAD runs outside of
the hotplug lock and thus the above becomes a race where we could
possibly kill off the migration thread of a newly brought up cpu:

 cpu0 - down 2
 cpu1 - up 2 (allocs a new migration thread, and leaks the old one)
 cpu0 - post_down 2 - frees the migration thread -- oops!




* [PATCH v2] Make sure timers have migrated before killing migration_thread
  2010-05-19  9:31 ` Peter Zijlstra
@ 2010-05-19 12:13   ` Amit K. Arora
  2010-05-20  7:28     ` Peter Zijlstra
  2010-05-24  9:59   ` [PATCH] " Amit K. Arora
  1 sibling, 1 reply; 19+ messages in thread
From: Amit K. Arora @ 2010-05-19 12:13 UTC
  To: Peter Zijlstra
  Cc: Ingo Molnar, Srivatsa Vaddagiri, Gautham R Shenoy, Darren Hart,
	Brian King, linux-kernel

On Wed, May 19, 2010 at 11:31:55AM +0200, Peter Zijlstra wrote:
> On Wed, 2010-05-19 at 14:35 +0530, Amit K. Arora wrote:

Hi Peter,

Thanks for the review!

> > diff -Nuarp linux-2.6.34.org/kernel/sched.c linux-2.6.34/kernel/sched.c
> > --- linux-2.6.34.org/kernel/sched.c	2010-05-18 22:56:21.000000000 -0700
> > +++ linux-2.6.34/kernel/sched.c	2010-05-18 22:58:31.000000000 -0700
> > @@ -5942,14 +5942,26 @@ migration_call(struct notifier_block *nf
> >  		cpu_rq(cpu)->migration_thread = NULL;
> >  		break;
> >  
> > +	case CPU_POST_DEAD:
> > +		/*
> > +		  Bring the migration thread down in CPU_POST_DEAD event,
> > +		  since the timers should have got migrated by now and thus
> > +		  we should not see a deadlock between trying to kill the
> > +		  migration thread and the sched_rt_period_timer.
> > +		*/
> 
> Faulty comment style that, please use:
> 
>  /*
>   * text
>   *  goes
>   *   here
>   */

Sure.
 
> > +		cpuset_lock();
> > +		rq = cpu_rq(cpu);
> > +		kthread_stop(rq->migration_thread);
> > +		put_task_struct(rq->migration_thread);
> > +		rq->migration_thread = NULL;
> > +		cpuset_unlock();
> > +		break;
> > +
> 
> The other problem is more urgent though, CPU_POST_DEAD runs outside of
> the hotplug lock and thus the above becomes a race where we could
> possible kill off the migration thread of a newly brought up cpu:
> 
>  cpu0 - down 2
>  cpu1 - up 2 (allocs a new migration thread, and leaks the old one)
>  cpu0 - post_down 2 - frees the migration thread -- oops!

Ok. So, how about adding a check in the CPU_UP_PREPARE event handling too?
The cpuset_lock will synchronize, and thus avoid the race between killing of
migration_thread in the up_prepare and post_dead events.

Here is the updated patch. If you don't like this one either, do you mind
suggesting an alternate approach to tackle the problem? Thanks!

--
Regards,
Amit Arora
 

 Signed-off-by: Amit Arora <aarora@in.ibm.com>
 Signed-off-by: Gautham R Shenoy <ego@in.ibm.com>
--
diff -Nuarp linux-2.6.34.org/kernel/sched.c linux-2.6.34/kernel/sched.c
--- linux-2.6.34.org/kernel/sched.c	2010-05-18 22:56:21.000000000 -0700
+++ linux-2.6.34/kernel/sched.c	2010-05-19 04:47:49.000000000 -0700
@@ -5900,6 +5900,19 @@ migration_call(struct notifier_block *nf
 
 	case CPU_UP_PREPARE:
 	case CPU_UP_PREPARE_FROZEN:
+		cpuset_lock();
+		rq = cpu_rq(cpu);
+		/*
+		 * Since we now kill migration_thread in CPU_POST_DEAD event,
+		 * there may be a race here. So, lets cleanup the old
+		 * migration_thread on the rq, if any.
+		 */
+		if (unlikely(rq->migration_thread)) {
+			kthread_stop(rq->migration_thread);
+			put_task_struct(rq->migration_thread);
+			rq->migration_thread = NULL;
+		}
+		cpuset_unlock();
 		p = kthread_create(migration_thread, hcpu, "migration/%d", cpu);
 		if (IS_ERR(p))
 			return NOTIFY_BAD;
@@ -5942,14 +5955,34 @@ migration_call(struct notifier_block *nf
 		cpu_rq(cpu)->migration_thread = NULL;
 		break;
 
+	case CPU_POST_DEAD:
+		/*
+		 * Bring the migration thread down in CPU_POST_DEAD event,
+		 * since the timers should have got migrated by now and thus
+		 * we should not see a deadlock between trying to kill the
+		 * migration thread and the sched_rt_period_timer.
+		 */
+		cpuset_lock();
+		rq = cpu_rq(cpu);
+		if (likely(rq->migration_thread)) {
+			/*
+			 * Its possible that this CPU was onlined (from a
+			 * different CPU) before we reached here and
+			 * migration_thread was cleaned-up in the
+			 * CPU_UP_PREPARE event handling.
+			 */
+			kthread_stop(rq->migration_thread);
+			put_task_struct(rq->migration_thread);
+			rq->migration_thread = NULL;
+		}
+		cpuset_unlock();
+		break;
+
 	case CPU_DEAD:
 	case CPU_DEAD_FROZEN:
 		cpuset_lock(); /* around calls to cpuset_cpus_allowed_lock() */
 		migrate_live_tasks(cpu);
 		rq = cpu_rq(cpu);
-		kthread_stop(rq->migration_thread);
-		put_task_struct(rq->migration_thread);
-		rq->migration_thread = NULL;
 		/* Idle task back to normal (off runqueue, low prio) */
 		raw_spin_lock_irq(&rq->lock);
 		update_rq_clock(rq);


* Re: [PATCH v2] Make sure timers have migrated before killing migration_thread
  2010-05-19 12:13   ` [PATCH v2] " Amit K. Arora
@ 2010-05-20  7:28     ` Peter Zijlstra
  2010-05-23  9:07       ` Mike Galbraith
                         ` (2 more replies)
  0 siblings, 3 replies; 19+ messages in thread
From: Peter Zijlstra @ 2010-05-20  7:28 UTC
  To: Amit K. Arora
  Cc: Ingo Molnar, Srivatsa Vaddagiri, Gautham R Shenoy, Darren Hart,
	Brian King, linux-kernel, Thomas Gleixner

On Wed, 2010-05-19 at 17:43 +0530, Amit K. Arora wrote:
> Alternate Solution considered : Another option considered was to
> increase the priority of the hrtimer cpu offline notifier, such that it
> gets to run before scheduler's migration cpu offline notifier. In this
> way we are sure that the timers will get migrated before migration_call
> tries to kill migration_thread. But, this can have some non-obvious
> implications, suggested Srivatsa.


> On Wed, May 19, 2010 at 11:31:55AM +0200, Peter Zijlstra wrote:
> > The other problem is more urgent though, CPU_POST_DEAD runs outside of
> > the hotplug lock and thus the above becomes a race where we could
> > possible kill off the migration thread of a newly brought up cpu:
> > 
> >  cpu0 - down 2
> >  cpu1 - up 2 (allocs a new migration thread, and leaks the old one)
> >  cpu0 - post_down 2 - frees the migration thread -- oops!
> 
> Ok. So, how about adding a check in CPU_UP_PREPARE event handling too ?
> The cpuset_lock will synchronize, and thus avoid race between killing of
> migration_thread in up_prepare and post_dead events. 
> 
> Here is the updated patch. If you don't like this one too, do you mind
> suggesting an alternate approach to tackle the problem ? Thanks !

Right, so this isn't pretty at all..

Ingo, the comment near the migration_notifier says that migration_call
should happen before all else, but can you see anything that would break
if we let the timer migration happen first?

Thomas?

>  Signed-off-by: Amit Arora <aarora@in.ibm.com>
>  Signed-off-by: Gautham R Shenoy <ego@in.ibm.com>
> --

For everybody who reads this, _please_ use three (3) dashes '-' to
separate the changelog from the patch, and left-align the changelog
(including all tags).

I seem to get more and more people sending patches with 2 dashes and
daft changelogs with whitespace stuffing which break my scripts.

> diff -Nuarp linux-2.6.34.org/kernel/sched.c linux-2.6.34/kernel/sched.c
> --- linux-2.6.34.org/kernel/sched.c	2010-05-18 22:56:21.000000000 -0700
> +++ linux-2.6.34/kernel/sched.c	2010-05-19 04:47:49.000000000 -0700
> @@ -5900,6 +5900,19 @@ migration_call(struct notifier_block *nf
>  
>  	case CPU_UP_PREPARE:
>  	case CPU_UP_PREPARE_FROZEN:
> +		cpuset_lock();
> +		rq = cpu_rq(cpu);
> +		/*
> +		 * Since we now kill migration_thread in CPU_POST_DEAD event,
> +		 * there may be a race here. So, lets cleanup the old
> +		 * migration_thread on the rq, if any.
> +		 */
> +		if (unlikely(rq->migration_thread)) {
> +			kthread_stop(rq->migration_thread);
> +			put_task_struct(rq->migration_thread);
> +			rq->migration_thread = NULL;
> +		}
> +		cpuset_unlock();
>  		p = kthread_create(migration_thread, hcpu, "migration/%d", cpu);
>  		if (IS_ERR(p))
>  			return NOTIFY_BAD;
> @@ -5942,14 +5955,34 @@ migration_call(struct notifier_block *nf
>  		cpu_rq(cpu)->migration_thread = NULL;
>  		break;
>  
> +	case CPU_POST_DEAD:
> +		/*
> +		 * Bring the migration thread down in CPU_POST_DEAD event,
> +		 * since the timers should have got migrated by now and thus
> +		 * we should not see a deadlock between trying to kill the
> +		 * migration thread and the sched_rt_period_timer.
> +		 */
> +		cpuset_lock();
> +		rq = cpu_rq(cpu);
> +		if (likely(rq->migration_thread)) {
> +			/*
> +			 * Its possible that this CPU was onlined (from a
> +			 * different CPU) before we reached here and
> +			 * migration_thread was cleaned-up in the
> +			 * CPU_UP_PREPARE event handling.
> +			 */
> +			kthread_stop(rq->migration_thread);
> +			put_task_struct(rq->migration_thread);
> +			rq->migration_thread = NULL;
> +		}
> +		cpuset_unlock();
> +		break;
> +
>  	case CPU_DEAD:
>  	case CPU_DEAD_FROZEN:
>  		cpuset_lock(); /* around calls to cpuset_cpus_allowed_lock() */
>  		migrate_live_tasks(cpu);
>  		rq = cpu_rq(cpu);
> -		kthread_stop(rq->migration_thread);
> -		put_task_struct(rq->migration_thread);
> -		rq->migration_thread = NULL;
>  		/* Idle task back to normal (off runqueue, low prio) */
>  		raw_spin_lock_irq(&rq->lock);
>  		update_rq_clock(rq);



* Re: [PATCH v2] Make sure timers have migrated before killing migration_thread
  2010-05-20  7:28     ` Peter Zijlstra
@ 2010-05-23  9:07       ` Mike Galbraith
  2010-05-23  9:13         ` Peter Zijlstra
  2010-05-24  6:43       ` Amit K. Arora
  2010-05-25 20:19       ` Thomas Gleixner
  2 siblings, 1 reply; 19+ messages in thread
From: Mike Galbraith @ 2010-05-23  9:07 UTC
  To: Peter Zijlstra
  Cc: Amit K. Arora, Ingo Molnar, Srivatsa Vaddagiri, Gautham R Shenoy,
	Darren Hart, Brian King, linux-kernel, Thomas Gleixner

On Thu, 2010-05-20 at 09:28 +0200, Peter Zijlstra wrote:
> On Wed, 2010-05-19 at 17:43 +0530, Amit K. Arora wrote:
> > Alternate Solution considered : Another option considered was to
> > increase the priority of the hrtimer cpu offline notifier, such that it
> > gets to run before scheduler's migration cpu offline notifier. In this
> > way we are sure that the timers will get migrated before migration_call
> > tries to kill migration_thread. But, this can have some non-obvious
> > implications, suggested Srivatsa.
> 
> 
> > On Wed, May 19, 2010 at 11:31:55AM +0200, Peter Zijlstra wrote:
> > > The other problem is more urgent though, CPU_POST_DEAD runs outside of
> > > the hotplug lock and thus the above becomes a race where we could
> > > possible kill off the migration thread of a newly brought up cpu:
> > > 
> > >  cpu0 - down 2
> > >  cpu1 - up 2 (allocs a new migration thread, and leaks the old one)
> > >  cpu0 - post_down 2 - frees the migration thread -- oops!
> > 
> > Ok. So, how about adding a check in CPU_UP_PREPARE event handling too ?
> > The cpuset_lock will synchronize, and thus avoid race between killing of
> > migration_thread in up_prepare and post_dead events. 
> > 
> > Here is the updated patch. If you don't like this one too, do you mind
> > suggesting an alternate approach to tackle the problem ? Thanks !
> 
> Right, so this isn't pretty at all..

Since the problem seems to stem from interfering with a critical thread,
how about creating a SCHED_SYSTEM_CRITICAL flag a la SCHED_RESET_ON_FORK?

Not particularly beautiful, and completely untested (well, it compiles).

diff --git a/include/linux/sched.h b/include/linux/sched.h
index 6cc43e0..23da2cf 100644
--- a/include/linux/sched.h
+++ b/include/linux/sched.h
@@ -40,6 +40,8 @@
 #define SCHED_IDLE		5
 /* Can be ORed in to make sure the process is reverted back to SCHED_NORMAL on fork */
 #define SCHED_RESET_ON_FORK     0x40000000
+/* Can be ORed in to flag a thread as being system critical.  Not inherited. */
+#define SCHED_SYSTEM_CRITICAL	0x20000000
 
 #ifdef __KERNEL__
 
@@ -1240,6 +1242,9 @@ struct task_struct {
 	/* Revert to default priority/policy when forking */
 	unsigned sched_reset_on_fork:1;
 
+	/* System critical thread.  Cleared on fork. */
+	unsigned sched_system_critical:1;
+
 	pid_t pid;
 	pid_t tgid;
 
diff --git a/kernel/sched.c b/kernel/sched.c
index 2d17e3b..caf8b95 100644
--- a/kernel/sched.c
+++ b/kernel/sched.c
@@ -2511,6 +2511,11 @@ void sched_fork(struct task_struct *p, int clone_flags)
 	 */
 	p->prio = current->normal_prio;
 
+	/*
+	 * System critical policy flag is not inherited.
+	 */
+	p->sched_system_critical = 0;
+
 	if (!rt_prio(p->prio))
 		p->sched_class = &fair_sched_class;
 
@@ -4486,7 +4491,7 @@ static int __sched_setscheduler(struct task_struct *p, int policy,
 	unsigned long flags;
 	const struct sched_class *prev_class;
 	struct rq *rq;
-	int reset_on_fork;
+	int reset_on_fork, system_critical;
 
 	/* may grab non-irq protected spin_locks */
 	BUG_ON(in_interrupt());
@@ -4494,10 +4499,12 @@ recheck:
 	/* double check policy once rq lock held */
 	if (policy < 0) {
 		reset_on_fork = p->sched_reset_on_fork;
+		system_critical = p->sched_system_critical;
 		policy = oldpolicy = p->policy;
 	} else {
 		reset_on_fork = !!(policy & SCHED_RESET_ON_FORK);
-		policy &= ~SCHED_RESET_ON_FORK;
+		system_critical = !!(policy & SCHED_SYSTEM_CRITICAL);
+		policy &= ~(SCHED_RESET_ON_FORK|SCHED_SYSTEM_CRITICAL);
 
 		if (policy != SCHED_FIFO && policy != SCHED_RR &&
 				policy != SCHED_NORMAL && policy != SCHED_BATCH &&
@@ -4552,6 +4559,10 @@ recheck:
 		/* Normal users shall not reset the sched_reset_on_fork flag */
 		if (p->sched_reset_on_fork && !reset_on_fork)
 			return -EPERM;
+
+		/* Normal users shall not reset the sched_system_critical flag */
+		if (p->sched_system_critical && !system_critical)
+			return -EPERM;
 	}
 
 	if (user) {
@@ -4560,7 +4571,7 @@ recheck:
 		 * Do not allow realtime tasks into groups that have no runtime
 		 * assigned.
 		 */
-		if (rt_bandwidth_enabled() && rt_policy(policy) &&
+		if (rt_bandwidth_enabled() && rt_policy(policy) && !system_critical &&
 				task_group(p)->rt_bandwidth.rt_runtime == 0)
 			return -EPERM;
 #endif
@@ -4595,6 +4606,7 @@ recheck:
 		p->sched_class->put_prev_task(rq, p);
 
 	p->sched_reset_on_fork = reset_on_fork;
+	p->sched_system_critical = system_critical;
 
 	oldprio = p->prio;
 	prev_class = p->sched_class;
@@ -4712,9 +4724,13 @@ SYSCALL_DEFINE1(sched_getscheduler, pid_t, pid)
 	p = find_process_by_pid(pid);
 	if (p) {
 		retval = security_task_getscheduler(p);
-		if (!retval)
-			retval = p->policy
-				| (p->sched_reset_on_fork ? SCHED_RESET_ON_FORK : 0);
+		if (!retval) {
+			retval = p->policy;
+			if (p->sched_reset_on_fork)
+				retval |= SCHED_RESET_ON_FORK;
+			if (p->sched_system_critical)
+				retval |= SCHED_SYSTEM_CRITICAL;
+		}
 	}
 	rcu_read_unlock();
 	return retval;
diff --git a/kernel/sched_rt.c b/kernel/sched_rt.c
index 8afb953..4fb5ced 100644
--- a/kernel/sched_rt.c
+++ b/kernel/sched_rt.c
@@ -605,6 +605,7 @@ static void update_curr_rt(struct rq *rq)
 	struct sched_rt_entity *rt_se = &curr->rt;
 	struct rt_rq *rt_rq = rt_rq_of_se(rt_se);
 	u64 delta_exec;
+	int system_critical = curr->sched_system_critical;
 
 	if (!task_has_rt_policy(curr))
 		return;
@@ -621,9 +622,13 @@ static void update_curr_rt(struct rq *rq)
 	curr->se.exec_start = rq->clock;
 	cpuacct_charge(curr, delta_exec);
 
-	sched_rt_avg_update(rq, delta_exec);
+	/*
+	 * System critical tasks do not contribute to bandwidth consumption,
+	 * nor are they evicted when runtime is exceeded.
+	 */
+	sched_rt_avg_update(rq, system_critical ? 0 : delta_exec);
 
-	if (!rt_bandwidth_enabled())
+	if (!rt_bandwidth_enabled() || system_critical)
 		return;
 
 	for_each_sched_rt_entity(rt_se) {
diff --git a/kernel/stop_machine.c b/kernel/stop_machine.c
index b4e7431..6cbea9a 100644
--- a/kernel/stop_machine.c
+++ b/kernel/stop_machine.c
@@ -304,7 +304,7 @@ static int __cpuinit cpu_stop_cpu_callback(struct notifier_block *nfb,
 				   cpu);
 		if (IS_ERR(p))
 			return NOTIFY_BAD;
-		sched_setscheduler_nocheck(p, SCHED_FIFO, &param);
+		sched_setscheduler_nocheck(p, SCHED_FIFO|SCHED_SYSTEM_CRITICAL, &param);
 		get_task_struct(p);
 		stopper->thread = p;
 		break;




* Re: [PATCH v2] Make sure timers have migrated before killing migration_thread
  2010-05-23  9:07       ` Mike Galbraith
@ 2010-05-23  9:13         ` Peter Zijlstra
  0 siblings, 0 replies; 19+ messages in thread
From: Peter Zijlstra @ 2010-05-23  9:13 UTC
  To: Mike Galbraith
  Cc: Amit K. Arora, Ingo Molnar, Srivatsa Vaddagiri, Gautham R Shenoy,
	Darren Hart, Brian King, linux-kernel, Thomas Gleixner

On Sun, 2010-05-23 at 11:07 +0200, Mike Galbraith wrote:
> On Thu, 2010-05-20 at 09:28 +0200, Peter Zijlstra wrote:
> > On Wed, 2010-05-19 at 17:43 +0530, Amit K. Arora wrote:
> > > Alternate Solution considered : Another option considered was to
> > > increase the priority of the hrtimer cpu offline notifier, such that it
> > > gets to run before scheduler's migration cpu offline notifier. In this
> > > way we are sure that the timers will get migrated before migration_call
> > > tries to kill migration_thread. But, this can have some non-obvious
> > > implications, suggested Srivatsa.
> > 
> > 
> > > On Wed, May 19, 2010 at 11:31:55AM +0200, Peter Zijlstra wrote:
> > > > The other problem is more urgent though, CPU_POST_DEAD runs outside of
> > > > the hotplug lock and thus the above becomes a race where we could
> > > > possible kill off the migration thread of a newly brought up cpu:
> > > > 
> > > >  cpu0 - down 2
> > > >  cpu1 - up 2 (allocs a new migration thread, and leaks the old one)
> > > >  cpu0 - post_down 2 - frees the migration thread -- oops!
> > > 
> > > Ok. So, how about adding a check in CPU_UP_PREPARE event handling too ?
> > > The cpuset_lock will synchronize, and thus avoid race between killing of
> > > migration_thread in up_prepare and post_dead events. 
> > > 
> > > Here is the updated patch. If you don't like this one too, do you mind
> > > suggesting an alternate approach to tackle the problem ? Thanks !
> > 
> > Right, so this isn't pretty at all..
> 
> Since the problem seems to stem from interfering with a critical thread,
> how about create a SCHED_SYSTEM_CRITICAL flag ala SCHED_RESET_ON_FORK?
> 
> Not particularly beautiful, and completely untested (well, it compiles).

Nah, I'd rather we pull the migration thread out of SCHED_FIFO and
either schedule it explicitly or add a sched_class for it. We need to do
that anyway once we go play with SCHED_DEADLINE.

But it would be very nice if we could simply order the timer and task
migration bits so that the whole problem doesn't exist in the first
place.



* Re: [PATCH v2] Make sure timers have migrated before killing migration_thread
  2010-05-20  7:28     ` Peter Zijlstra
  2010-05-23  9:07       ` Mike Galbraith
@ 2010-05-24  6:43       ` Amit K. Arora
  2010-05-25 20:19       ` Thomas Gleixner
  2 siblings, 0 replies; 19+ messages in thread
From: Amit K. Arora @ 2010-05-24  6:43 UTC
  To: Peter Zijlstra, Ingo Molnar, Thomas Gleixner
  Cc: Srivatsa Vaddagiri, Gautham R Shenoy, Darren Hart, Brian King,
	linux-kernel

On Thu, May 20, 2010 at 09:28:03AM +0200, Peter Zijlstra wrote:
> On Wed, 2010-05-19 at 17:43 +0530, Amit K. Arora wrote:
> > Alternate Solution considered : Another option considered was to
> > increase the priority of the hrtimer cpu offline notifier, such that it
> > gets to run before scheduler's migration cpu offline notifier. In this
> > way we are sure that the timers will get migrated before migration_call
> > tries to kill migration_thread. But, this can have some non-obvious
> > implications, suggested Srivatsa.
> 
> 
> > On Wed, May 19, 2010 at 11:31:55AM +0200, Peter Zijlstra wrote:
> > > The other problem is more urgent though, CPU_POST_DEAD runs outside of
> > > the hotplug lock and thus the above becomes a race where we could
> > > possible kill off the migration thread of a newly brought up cpu:
> > > 
> > >  cpu0 - down 2
> > >  cpu1 - up 2 (allocs a new migration thread, and leaks the old one)
> > >  cpu0 - post_down 2 - frees the migration thread -- oops!
> > 
> > Ok. So, how about adding a check in CPU_UP_PREPARE event handling too ?
> > The cpuset_lock will synchronize, and thus avoid race between killing of
> > migration_thread in up_prepare and post_dead events. 
> > 
> > Here is the updated patch. If you don't like this one too, do you mind
> > suggesting an alternate approach to tackle the problem ? Thanks !
> 
> Right, so this isn't pretty at all..
> 
> Ingo, the comment near the migration_notifier says that migration_call
> should happen before all else, but can you see anything that would break
> if we let the timer migration happen first?
> 
> Thomas?

Hello Ingo, Thomas,

Do you see any potential problems with migrating the timers before
the migration_call ?

Thanks!
--
Regards,
Amit Arora


* Re: [PATCH] Make sure timers have migrated before killing migration_thread
  2010-05-19  9:31 ` Peter Zijlstra
  2010-05-19 12:13   ` [PATCH v2] " Amit K. Arora
@ 2010-05-24  9:59   ` Amit K. Arora
  2010-05-24 13:28     ` Peter Zijlstra
  2010-05-25 11:31     ` Peter Zijlstra
  1 sibling, 2 replies; 19+ messages in thread
From: Amit K. Arora @ 2010-05-24  9:59 UTC
  To: Peter Zijlstra
  Cc: tj, Ingo Molnar, Srivatsa Vaddagiri, Gautham R Shenoy,
	Darren Hart, Brian King, linux-kernel

On Wed, May 19, 2010 at 11:31:55AM +0200, Peter Zijlstra wrote:
> On Wed, 2010-05-19 at 14:35 +0530, Amit K. Arora wrote:
> > +		cpuset_lock();
> > +		rq = cpu_rq(cpu);
> > +		kthread_stop(rq->migration_thread);
> > +		put_task_struct(rq->migration_thread);
> > +		rq->migration_thread = NULL;
> > +		cpuset_unlock();
> > +		break;
> > +
> 
> The other problem is more urgent though, CPU_POST_DEAD runs outside of
> the hotplug lock and thus the above becomes a race where we could
> possible kill off the migration thread of a newly brought up cpu:
> 
>  cpu0 - down 2
>  cpu1 - up 2 (allocs a new migration thread, and leaks the old one)
>  cpu0 - post_down 2 - frees the migration thread -- oops!

<Adding Tejun Heo to CC list .. >

Hi Peter,

In an offline discussion with Tejun, he suggested that the above race
cannot happen, since _cpu_up() and _cpu_down() can never run in
parallel because of cpu_add_remove_lock. Looking at the code we can see
that cpu_up() and cpu_down() call the "_" variants with the
cpu_add_remove_lock mutex held (using cpu_maps_update_begin()).

Here is exactly what he had to say:

"I don't think that's possible.  There are two locks involved here. 
cpu_add_remove_lock and cpu_hotplug.lock.  The former wraps around the
second and already provides full exclusion between all cpu hotplug/unplug
operations. 
The latter is there for reader/writer type exclusion via
get/put_online_cpus().

CPU_POST_DEAD is outside of cpu_hotplug.lock allowing get_online_cpus()
to proceed in parallel but it's still inside cpu_add_remove_lock so other
cpu up/down operations cannot begin before it finishes. "
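
To make the nesting concrete, here is a simplified paraphrase of the
2.6.34-era kernel/cpu.c flow (from memory; error handling and details
elided, so treat it as a sketch rather than the verbatim code):

        int cpu_down(unsigned int cpu)
        {
                int err;

                cpu_maps_update_begin();  /* mutex_lock(&cpu_add_remove_lock) */
                err = _cpu_down(cpu, 0);  /* takes/drops cpu_hotplug.lock and
                                           * sends CPU_DOWN_PREPARE, CPU_DEAD;
                                           * CPU_POST_DEAD is sent after
                                           * cpu_hotplug_done(), still in here */
                cpu_maps_update_done();   /* mutex_unlock(&cpu_add_remove_lock) */
                return err;
        }

        int cpu_up(unsigned int cpu)
        {
                int err;

                cpu_maps_update_begin();  /* same outer mutex as cpu_down() */
                err = _cpu_up(cpu, 0);    /* sends CPU_UP_PREPARE, CPU_ONLINE */
                cpu_maps_update_done();
                return err;
        }

So although CPU_POST_DEAD runs with cpu_hotplug.lock already released, it is
still under cpu_add_remove_lock, and a concurrent cpu_up() cannot start until
cpu_maps_update_done() has run.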


Thus, since the above race can never happen, is there any other issue with
this patch?

Thanks!
--
Regards,
Amit Arora


Signed-off-by: Amit Arora <aarora@in.ibm.com>
Signed-off-by: Gautham R Shenoy <ego@in.ibm.com>
---
diff -Nuarp linux-2.6.34.org/kernel/sched.c linux-2.6.34/kernel/sched.c
--- linux-2.6.34.org/kernel/sched.c	2010-05-18 22:56:21.000000000 -0700
+++ linux-2.6.34/kernel/sched.c	2010-05-18 22:58:31.000000000 -0700
@@ -5942,14 +5942,26 @@ migration_call(struct notifier_block *nf
 		cpu_rq(cpu)->migration_thread = NULL;
 		break;
 
+	case CPU_POST_DEAD:
+		/*
+		 * Bring the migration thread down in CPU_POST_DEAD event,
+		 * since the timers should have got migrated by now and thus
+		 * we should not see a deadlock between trying to kill the
+		 * migration thread and the sched_rt_period_timer.
+		 */
+		cpuset_lock();
+		rq = cpu_rq(cpu);
+		kthread_stop(rq->migration_thread);
+		put_task_struct(rq->migration_thread);
+		rq->migration_thread = NULL;
+		cpuset_unlock();
+		break;
+
 	case CPU_DEAD:
 	case CPU_DEAD_FROZEN:
 		cpuset_lock(); /* around calls to cpuset_cpus_allowed_lock() */
 		migrate_live_tasks(cpu);
 		rq = cpu_rq(cpu);
-		kthread_stop(rq->migration_thread);
-		put_task_struct(rq->migration_thread);
-		rq->migration_thread = NULL;
 		/* Idle task back to normal (off runqueue, low prio) */
 		raw_spin_lock_irq(&rq->lock);
 		update_rq_clock(rq);


* Re: [PATCH] Make sure timers have migrated before killing migration_thread
  2010-05-24  9:59   ` [PATCH] " Amit K. Arora
@ 2010-05-24 13:28     ` Peter Zijlstra
  2010-05-24 15:16       ` Srivatsa Vaddagiri
  2010-05-25 11:31     ` Peter Zijlstra
  1 sibling, 1 reply; 19+ messages in thread
From: Peter Zijlstra @ 2010-05-24 13:28 UTC
  To: Amit K. Arora
  Cc: tj, Ingo Molnar, Srivatsa Vaddagiri, Gautham R Shenoy,
	Darren Hart, Brian King, linux-kernel

On Mon, 2010-05-24 at 15:29 +0530, Amit K. Arora wrote:
> since _cpu_up() and _cpu_down() can never run in
> parallel, because of cpu_add_remove_lock. 

Ah indeed. I guess your initial patch works then. 




* Re: [PATCH] Make sure timers have migrated before killing migration_thread
  2010-05-24 13:28     ` Peter Zijlstra
@ 2010-05-24 15:16       ` Srivatsa Vaddagiri
  2010-05-24 15:55         ` Peter Zijlstra
  0 siblings, 1 reply; 19+ messages in thread
From: Srivatsa Vaddagiri @ 2010-05-24 15:16 UTC
  To: Peter Zijlstra
  Cc: Amit K. Arora, tj, Ingo Molnar, Gautham R Shenoy, Darren Hart,
	Brian King, linux-kernel

On Mon, May 24, 2010 at 03:28:45PM +0200, Peter Zijlstra wrote:
> On Mon, 2010-05-24 at 15:29 +0530, Amit K. Arora wrote:
> > since _cpu_up() and _cpu_down() can never run in
> > parallel, because of cpu_add_remove_lock. 
> 
> Ah indeed. I guess your initial patch works then. 

One thing I found surprising was that a cpu's rt-bandwidth renewal could be
dependent on another cpu's (rt-bandwidth) timer firing on time. In this case, we
had migration/23 pulled over to CPU0 and we hung later waiting for migration/23
to exit. migration/23 was not exiting because it could not run on CPU0 (as
CPU0's rt-bandwidth had expired). This situation persisted forever. I would have
expected CPU0's bandwidth to have been renewed independently of some timer on
CPU23 firing - maybe I am missing something not obvious in the code?

- vatsa


* Re: [PATCH] Make sure timers have migrated before killing migration_thread
  2010-05-24 15:16       ` Srivatsa Vaddagiri
@ 2010-05-24 15:55         ` Peter Zijlstra
  0 siblings, 0 replies; 19+ messages in thread
From: Peter Zijlstra @ 2010-05-24 15:55 UTC
  To: vatsa
  Cc: Amit K. Arora, tj, Ingo Molnar, Gautham R Shenoy, Darren Hart,
	Brian King, linux-kernel

On Mon, 2010-05-24 at 20:46 +0530, Srivatsa Vaddagiri wrote:
> On Mon, May 24, 2010 at 03:28:45PM +0200, Peter Zijlstra wrote:
> > On Mon, 2010-05-24 at 15:29 +0530, Amit K. Arora wrote:
> > > since _cpu_up() and _cpu_down() can never run in
> > > parallel, because of cpu_add_remove_lock. 
> > 
> > Ah indeed. I guess your initial patch works then. 
> 
> One thing I found surprising was that a cpu's rt-bandwidth renewal could be
> dependant on another cpu's (rt-bandwidth) timer firing ontime. In this case, we
> had migration/23 pulled over to CPU0 and we hung later waiting for migration/23
> to exit. migration/23 was not exiting because it could not run on CPU0 (as
> CPU0's rt-bandwidth had expired). This situation remained forever. I would have
> expected CPU0's bandwidth to have been renewed independent of some timer on
> CPU23 to fire - maybe I am missing something not obvious in the code?

The bandwidth constraint is per cgroup, and cgroups span cpus.


* Re: [PATCH] Make sure timers have migrated before killing migration_thread
  2010-05-24  9:59   ` [PATCH] " Amit K. Arora
  2010-05-24 13:28     ` Peter Zijlstra
@ 2010-05-25 11:31     ` Peter Zijlstra
  2010-05-25 12:10       ` Amit K. Arora
  2010-05-25 13:23       ` [PATCH v3] " Amit K. Arora
  1 sibling, 2 replies; 19+ messages in thread
From: Peter Zijlstra @ 2010-05-25 11:31 UTC
  To: Amit K. Arora
  Cc: tj, Ingo Molnar, Srivatsa Vaddagiri, Gautham R Shenoy,
	Darren Hart, Brian King, linux-kernel

On Mon, 2010-05-24 at 15:29 +0530, Amit K. Arora wrote:
> 
> Thus, since above race can never happen, is there any other issue with
> this patch ?

It doesn't seem to apply nicely...

> 
> Signed-off-by: Amit Arora <aarora@in.ibm.com>
> Signed-off-by: Gautham R Shenoy <ego@in.ibm.com>
> ---
> diff -Nuarp linux-2.6.34.org/kernel/sched.c linux-2.6.34/kernel/sched.c
> --- linux-2.6.34.org/kernel/sched.c     2010-05-18 22:56:21.000000000 -0700
> +++ linux-2.6.34/kernel/sched.c 2010-05-18 22:58:31.000000000 -0700
> @@ -5942,14 +5942,26 @@ migration_call(struct notifier_block *nf
>                 cpu_rq(cpu)->migration_thread = NULL;
>                 break;
>  
> +       case CPU_POST_DEAD:
> +               /*
> +                * Bring the migration thread down in CPU_POST_DEAD event,
> +                * since the timers should have got migrated by now and thus
> +                * we should not see a deadlock between trying to kill the
> +                * migration thread and the sched_rt_period_timer.
> +                */
> +               cpuset_lock();
> +               rq = cpu_rq(cpu);
> +               kthread_stop(rq->migration_thread);
> +               put_task_struct(rq->migration_thread);
> +               rq->migration_thread = NULL;
> +               cpuset_unlock();
> +               break;
> +
>         case CPU_DEAD:
>         case CPU_DEAD_FROZEN:
>                 cpuset_lock(); /* around calls to cpuset_cpus_allowed_lock() */
>                 migrate_live_tasks(cpu);
>                 rq = cpu_rq(cpu);
> -               kthread_stop(rq->migration_thread);
> -               put_task_struct(rq->migration_thread);
> -               rq->migration_thread = NULL;
>                 /* Idle task back to normal (off runqueue, low prio) */
>                 raw_spin_lock_irq(&rq->lock);
>                 update_rq_clock(rq);
> 
> 


* Re: [PATCH] Make sure timers have migrated before killing migration_thread
  2010-05-25 11:31     ` Peter Zijlstra
@ 2010-05-25 12:10       ` Amit K. Arora
  2010-05-25 13:23       ` [PATCH v3] " Amit K. Arora
  1 sibling, 0 replies; 19+ messages in thread
From: Amit K. Arora @ 2010-05-25 12:10 UTC
  To: Peter Zijlstra
  Cc: tj, Ingo Molnar, Srivatsa Vaddagiri, Gautham R Shenoy,
	Darren Hart, Brian King, linux-kernel

On Tue, May 25, 2010 at 01:31:35PM +0200, Peter Zijlstra wrote:
> On Mon, 2010-05-24 at 15:29 +0530, Amit K. Arora wrote:
> > 
> > Thus, since above race can never happen, is there any other issue with
> > this patch ?
> 
> It doesn't seem to apply nicely...

Hi Peter,

This patch was built on top of 2.6.34, where it applies cleanly. I
checked the git tree, and the cpu_stop patchset from Tejun has changed
this code quite a bit (commit 969c79215a35b06e5e3efe69b9412f858df7856c).

The change is now required in cpu_stop_cpu_callback() in stop_machine.c.
Will post it soon. Thanks!

--
Regards,
Amit Arora


* [PATCH v3] Make sure timers have migrated before killing migration_thread
  2010-05-25 11:31     ` Peter Zijlstra
  2010-05-25 12:10       ` Amit K. Arora
@ 2010-05-25 13:23       ` Amit K. Arora
  2010-05-25 14:22         ` Peter Zijlstra
                           ` (2 more replies)
  1 sibling, 3 replies; 19+ messages in thread
From: Amit K. Arora @ 2010-05-25 13:23 UTC
  To: Peter Zijlstra, Ingo Molnar
  Cc: tj, Srivatsa Vaddagiri, Gautham R Shenoy, Darren Hart, Brian King,
	linux-kernel

On Tue, May 25, 2010 at 01:31:35PM +0200, Peter Zijlstra wrote:
> On Mon, 2010-05-24 at 15:29 +0530, Amit K. Arora wrote:
> > 
> > Thus, since above race can never happen, is there any other issue with
> > this patch ?
> 
> It doesn't seem to apply nicely...

Here is the new patch. 

Problem : In a stress test where some heavy tests were running along with
regular CPU offlining and onlining, a hang was observed. The system seems to
be hung at a point where migration_call() tries to kill the migration_thread
of the dying CPU, which just got moved to the current CPU. This migration
thread does not get a chance to run (and die) since rt_throttled is set to 1
on current, and it doesn't get cleared as the hrtimer which is supposed to
reset the rt bandwidth (sched_rt_period_timer) is tied to the CPU which we just
marked dead!

Solution : This patch pushes the killing of migration thread to "CPU_POST_DEAD"
event. By then all the timers (including sched_rt_period_timer) should have got
migrated (along with other callbacks).

Thanks!
Regards,
Amit Arora

Signed-off-by: Amit Arora <aarora@in.ibm.com>
Signed-off-by: Gautham R Shenoy <ego@in.ibm.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Tejun Heo <tj@kernel.org>
---
diff -Nuarp linux-2.6-next-20100525.org/kernel/stop_machine.c linux-2.6-next-20100525/kernel/stop_machine.c
--- linux-2.6-next-20100525.org/kernel/stop_machine.c	2010-05-25 14:27:51.000000000 -0400
+++ linux-2.6-next-20100525/kernel/stop_machine.c	2010-05-25 14:30:09.000000000 -0400
@@ -321,7 +321,7 @@ static int __cpuinit cpu_stop_cpu_callba
 
 #ifdef CONFIG_HOTPLUG_CPU
 	case CPU_UP_CANCELED:
-	case CPU_DEAD:
+	case CPU_POST_DEAD:
 	{
 		struct cpu_stop_work *work;
 


* Re: [PATCH v3] Make sure timers have migrated before killing migration_thread
  2010-05-25 13:23       ` [PATCH v3] " Amit K. Arora
@ 2010-05-25 14:22         ` Peter Zijlstra
  2010-05-25 16:27         ` Tejun Heo
  2010-05-31  7:18         ` [tip:sched/urgent] sched: Make sure timers have migrated before killing the migration_thread tip-bot for Amit K. Arora
  2 siblings, 0 replies; 19+ messages in thread
From: Peter Zijlstra @ 2010-05-25 14:22 UTC
  To: Amit K. Arora
  Cc: Ingo Molnar, tj, Srivatsa Vaddagiri, Gautham R Shenoy,
	Darren Hart, Brian King, linux-kernel

On Tue, 2010-05-25 at 18:53 +0530, Amit K. Arora wrote:

nice :-)

Thanks!

> Signed-off-by: Amit Arora <aarora@in.ibm.com>
> Signed-off-by: Gautham R Shenoy <ego@in.ibm.com>
> Cc: Ingo Molnar <mingo@elte.hu>
> Cc: Peter Zijlstra <peterz@infradead.org>
> Cc: Tejun Heo <tj@kernel.org>
> ---
> diff -Nuarp linux-2.6-next-20100525.org/kernel/stop_machine.c linux-2.6-next-20100525/kernel/stop_machine.c
> --- linux-2.6-next-20100525.org/kernel/stop_machine.c   2010-05-25 14:27:51.000000000 -0400
> +++ linux-2.6-next-20100525/kernel/stop_machine.c       2010-05-25 14:30:09.000000000 -0400
> @@ -321,7 +321,7 @@ static int __cpuinit cpu_stop_cpu_callba
>  
>  #ifdef CONFIG_HOTPLUG_CPU
>         case CPU_UP_CANCELED:
> -       case CPU_DEAD:
> +       case CPU_POST_DEAD:
>         {
>                 struct cpu_stop_work *work; 


* Re: [PATCH v3] Make sure timers have migrated before killing migration_thread
  2010-05-25 13:23       ` [PATCH v3] " Amit K. Arora
  2010-05-25 14:22         ` Peter Zijlstra
@ 2010-05-25 16:27         ` Tejun Heo
  2010-05-31  7:18         ` [tip:sched/urgent] sched: Make sure timers have migrated before killing the migration_thread tip-bot for Amit K. Arora
  2 siblings, 0 replies; 19+ messages in thread
From: Tejun Heo @ 2010-05-25 16:27 UTC
  To: Amit K. Arora
  Cc: Peter Zijlstra, Ingo Molnar, Srivatsa Vaddagiri, Gautham R Shenoy,
	Darren Hart, Brian King, linux-kernel


Acked-by: Tejun Heo <tj@kernel.org>

Thanks.

-- 
tejun


* Re: [PATCH v2] Make sure timers have migrated before killing migration_thread
  2010-05-20  7:28     ` Peter Zijlstra
  2010-05-23  9:07       ` Mike Galbraith
  2010-05-24  6:43       ` Amit K. Arora
@ 2010-05-25 20:19       ` Thomas Gleixner
  2010-05-26  6:43         ` Peter Zijlstra
  2 siblings, 1 reply; 19+ messages in thread
From: Thomas Gleixner @ 2010-05-25 20:19 UTC
  To: Peter Zijlstra
  Cc: Amit K. Arora, Ingo Molnar, Srivatsa Vaddagiri, Gautham R Shenoy,
	Darren Hart, Brian King, linux-kernel

On Thu, 20 May 2010, Peter Zijlstra wrote:

> On Wed, 2010-05-19 at 17:43 +0530, Amit K. Arora wrote:
> > Alternate Solution considered : Another option considered was to
> > increase the priority of the hrtimer cpu offline notifier, such that it
> > gets to run before scheduler's migration cpu offline notifier. In this
> > way we are sure that the timers will get migrated before migration_call
> > tries to kill migration_thread. But, this can have some non-obvious
> > implications, suggested Srivatsa.
> 
> 
> > On Wed, May 19, 2010 at 11:31:55AM +0200, Peter Zijlstra wrote:
> > > The other problem is more urgent though, CPU_POST_DEAD runs outside of
> > > the hotplug lock and thus the above becomes a race where we could
> > > possible kill off the migration thread of a newly brought up cpu:
> > > 
> > >  cpu0 - down 2
> > >  cpu1 - up 2 (allocs a new migration thread, and leaks the old one)
> > >  cpu0 - post_down 2 - frees the migration thread -- oops!
> > 
> > Ok. So, how about adding a check in CPU_UP_PREPARE event handling too ?
> > The cpuset_lock will synchronize, and thus avoid race between killing of
> > migration_thread in up_prepare and post_dead events. 
> > 
> > Here is the updated patch. If you don't like this one too, do you mind
> > suggesting an alternate approach to tackle the problem ? Thanks !
> 
> Right, so this isn't pretty at all..
> 
> Ingo, the comment near the migration_notifier says that migration_call
> should happen before all else, but can you see anything that would break
> if we let the timer migration happen first?
> 
> Thomas?

That should work, though what is killing the scheduler's per-rq hrtimers
_before_ we migrate stuff? We don't want to migrate them, right?

Thanks,

	tglx


* Re: [PATCH v2] Make sure timers have migrated before killing migration_thread
  2010-05-25 20:19       ` Thomas Gleixner
@ 2010-05-26  6:43         ` Peter Zijlstra
  0 siblings, 0 replies; 19+ messages in thread
From: Peter Zijlstra @ 2010-05-26  6:43 UTC
  To: Thomas Gleixner
  Cc: Amit K. Arora, Ingo Molnar, Srivatsa Vaddagiri, Gautham R Shenoy,
	Darren Hart, Brian King, linux-kernel

On Tue, 2010-05-25 at 22:19 +0200, Thomas Gleixner wrote:
> On Thu, 20 May 2010, Peter Zijlstra wrote:
> 
> > On Wed, 2010-05-19 at 17:43 +0530, Amit K. Arora wrote:
> > > Alternate Solution considered : Another option considered was to
> > > increase the priority of the hrtimer cpu offline notifier, such that it
> > > gets to run before scheduler's migration cpu offline notifier. In this
> > > way we are sure that the timers will get migrated before migration_call
> > > tries to kill migration_thread. But, this can have some non-obvious
> > > implications, suggested Srivatsa.
> > 
> > 
> > > On Wed, May 19, 2010 at 11:31:55AM +0200, Peter Zijlstra wrote:
> > > > The other problem is more urgent though, CPU_POST_DEAD runs outside of
> > > > the hotplug lock and thus the above becomes a race where we could
> > > > possible kill off the migration thread of a newly brought up cpu:
> > > > 
> > > >  cpu0 - down 2
> > > >  cpu1 - up 2 (allocs a new migration thread, and leaks the old one)
> > > >  cpu0 - post_down 2 - frees the migration thread -- oops!
> > > 
> > > Ok. So, how about adding a check in CPU_UP_PREPARE event handling too ?
> > > The cpuset_lock will synchronize, and thus avoid race between killing of
> > > migration_thread in up_prepare and post_dead events. 
> > > 
> > > Here is the updated patch. If you don't like this one too, do you mind
> > > suggesting an alternate approach to tackle the problem ? Thanks !
> > 
> > Right, so this isn't pretty at all..
> > 
> > Ingo, the comment near the migration_notifier says that migration_call
> > should happen before all else, but can you see anything that would break
> > if we let the timer migration happen first?
> > 
> > Thomas?
> 
> That should work, though what is killing the scheduler per rq hrtimers
> _before_ we migrate stuff ? We don't want to migrate them, right ?

They're not rq timers, they're the 'cgroup' bandwidth timers and those
are free to migrate.

What I think happens is that the timer ends up being on the cpu that
goes down, then we disable IRQs on it and run out of bandwidth and get
stuck.

Anyway, we solved it with a one-liner in a different way.

Eventually I'll rip the whole migration thread thingy out of SCHED_FIFO,
which should also solve the issue, I think.




* [tip:sched/urgent] sched: Make sure timers have migrated before killing the migration_thread
  2010-05-25 13:23       ` [PATCH v3] " Amit K. Arora
  2010-05-25 14:22         ` Peter Zijlstra
  2010-05-25 16:27         ` Tejun Heo
@ 2010-05-31  7:18         ` tip-bot for Amit K. Arora
  2 siblings, 0 replies; 19+ messages in thread
From: tip-bot for Amit K. Arora @ 2010-05-31  7:18 UTC
  To: linux-tip-commits
  Cc: linux-kernel, ego, aarora, hpa, mingo, a.p.zijlstra, tj, tglx,
	aarora, mingo

Commit-ID:  54e88fad223c4e1d94289611a90c7fe3ebe5631b
Gitweb:     http://git.kernel.org/tip/54e88fad223c4e1d94289611a90c7fe3ebe5631b
Author:     Amit K. Arora <aarora@linux.vnet.ibm.com>
AuthorDate: Tue, 25 May 2010 18:53:46 +0530
Committer:  Ingo Molnar <mingo@elte.hu>
CommitDate: Mon, 31 May 2010 08:37:44 +0200

sched: Make sure timers have migrated before killing the migration_thread

Problem: In a stress test where some heavy tests were running along with
regular CPU offlining and onlining, a hang was observed. The system seems
to be hung at a point where migration_call() tries to kill the
migration_thread of the dying CPU, which just got moved to the current
CPU. This migration thread does not get a chance to run (and die) since
rt_throttled is set to 1 on current, and it doesn't get cleared as the
hrtimer which is supposed to reset the rt bandwidth
(sched_rt_period_timer) is tied to the CPU which we just marked dead!

Solution: This patch pushes the killing of migration thread to
"CPU_POST_DEAD" event. By then all the timers (including
sched_rt_period_timer) should have got migrated (along with other
callbacks).

Signed-off-by: Amit Arora <aarora@in.ibm.com>
Signed-off-by: Gautham R Shenoy <ego@in.ibm.com>
Acked-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Thomas Gleixner <tglx@linutronix.de>
LKML-Reference: <20100525132346.GA14986@amitarora.in.ibm.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
---
 kernel/stop_machine.c |    2 +-
 1 files changed, 1 insertions(+), 1 deletions(-)

diff --git a/kernel/stop_machine.c b/kernel/stop_machine.c
index b4e7431..70f8d90 100644
--- a/kernel/stop_machine.c
+++ b/kernel/stop_machine.c
@@ -321,7 +321,7 @@ static int __cpuinit cpu_stop_cpu_callback(struct notifier_block *nfb,
 
 #ifdef CONFIG_HOTPLUG_CPU
 	case CPU_UP_CANCELED:
-	case CPU_DEAD:
+	case CPU_POST_DEAD:
 	{
 		struct cpu_stop_work *work;
 

