LKML Archive mirror
* [PATCH v3] workqueue: Fix WARN_ON_ONCE() triggers in worker_enter_idle()
@ 2023-05-24  3:53 Zqiang
  2023-05-24 13:53 ` Naresh Kamboju
  2023-05-24 22:02 ` Tejun Heo
  0 siblings, 2 replies; 4+ messages in thread
From: Zqiang @ 2023-05-24  3:53 UTC
  To: tj, jiangshanlai, naresh.kamboju, qiang.zhang1211; +Cc: linux-kernel

Currently, nr_running can be modified from the timer tick, which means
the timer tick can run inside a critical section that is not irq-protected
and modify nr_running. Consider the following scenario:

CPU0
kworker/0:2 (events)
   worker_clr_flags(worker, WORKER_PREP | WORKER_REBOUND);
   ->pool->nr_running++;  (1)

   process_one_work()
   ->worker->current_func(work);
     ->schedule()
       ->wq_worker_sleeping()
         ->worker->sleeping = 1;
         ->pool->nr_running--;  (0)
           ....
       ->wq_worker_running()
               ....
               CPU0 by interrupt:
               wq_worker_tick()
               ->worker_set_flags(worker, WORKER_CPU_INTENSIVE);
                 ->pool->nr_running--;  (-1)
                 ->worker->flags |= WORKER_CPU_INTENSIVE;
               ....
         ->if (!(worker->flags & WORKER_NOT_RUNNING))
           ->pool->nr_running++;    (will not execute)
         ->worker->sleeping = 0;
         ....
    ->worker_clr_flags(worker, WORKER_CPU_INTENSIVE);
      ->pool->nr_running++;  (0)
    ....
    worker_set_flags(worker, WORKER_PREP);
    ->pool->nr_running--;   (-1)
    ....
    worker_enter_idle()
    ->WARN_ON_ONCE(pool->nr_workers == pool->nr_idle && pool->nr_running);

If nr_workers equals nr_idle at that point, nr_running is non-zero (-1),
so the WARN_ON_ONCE() triggers.

[    2.460602] WARNING: CPU: 0 PID: 63 at kernel/workqueue.c:1999 worker_enter_idle+0xb2/0xc0
[    2.462163] Modules linked in:
[    2.463401] CPU: 0 PID: 63 Comm: kworker/0:2 Not tainted 6.4.0-rc2-next-20230519 #1
[    2.463771] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.14.0-2 04/01/2014
[    2.465127] Workqueue:  0x0 (events)
[    2.465678] RIP: 0010:worker_enter_idle+0xb2/0xc0
...
[    2.472614] Call Trace:
[    2.473152]  <TASK>
[    2.474182]  worker_thread+0x71/0x430
[    2.474992]  ? _raw_spin_unlock_irqrestore+0x28/0x50
[    2.475263]  kthread+0x103/0x120
[    2.475493]  ? __pfx_worker_thread+0x10/0x10
[    2.476355]  ? __pfx_kthread+0x10/0x10
[    2.476635]  ret_from_fork+0x2c/0x50
[    2.477051]  </TASK>

This commit therefore adds a check of worker->sleeping to wq_worker_tick():
if worker->sleeping is non-zero, return immediately.

Reported-by: Naresh Kamboju <naresh.kamboju@linaro.org>
Closes: https://qa-reports.linaro.org/lkft/linux-next-master/build/next-20230519/testrun/17078554/suite/boot/test/clang-nightly-lkftconfig/log
Signed-off-by: Zqiang <qiang.zhang1211@gmail.com>
---
 kernel/workqueue.c | 15 ++++++++++-----
 1 file changed, 10 insertions(+), 5 deletions(-)

diff --git a/kernel/workqueue.c b/kernel/workqueue.c
index 9c5c1cfa478f..a028b851333e 100644
--- a/kernel/workqueue.c
+++ b/kernel/workqueue.c
@@ -1051,7 +1051,7 @@ void wq_worker_running(struct task_struct *task)
 {
 	struct worker *worker = kthread_data(task);
 
-	if (!worker->sleeping)
+	if (!READ_ONCE(worker->sleeping))
 		return;
 
 	/*
@@ -1071,7 +1071,7 @@ void wq_worker_running(struct task_struct *task)
 	 */
 	worker->current_at = worker->task->se.sum_exec_runtime;
 
-	worker->sleeping = 0;
+	WRITE_ONCE(worker->sleeping, 0);
 }
 
 /**
@@ -1097,10 +1097,10 @@ void wq_worker_sleeping(struct task_struct *task)
 	pool = worker->pool;
 
 	/* Return if preempted before wq_worker_running() was reached */
-	if (worker->sleeping)
+	if (READ_ONCE(worker->sleeping))
 		return;
 
-	worker->sleeping = 1;
+	WRITE_ONCE(worker->sleeping, 1);
 	raw_spin_lock_irq(&pool->lock);
 
 	/*
@@ -1143,8 +1143,13 @@ void wq_worker_tick(struct task_struct *task)
 	 * If the current worker is concurrency managed and hogged the CPU for
 	 * longer than wq_cpu_intensive_thresh_us, it's automatically marked
 	 * CPU_INTENSIVE to avoid stalling other concurrency-managed work items.
+	 *
+	 * worker->sleeping being set means that the worker is either doing a
+	 * voluntary switch and will not hog the CPU, or is running again but
+	 * has not yet reset worker->sleeping because it is still executing
+	 * wq_worker_running().
 	 */
-	if ((worker->flags & WORKER_NOT_RUNNING) ||
+	if ((worker->flags & WORKER_NOT_RUNNING) || READ_ONCE(worker->sleeping) ||
 	    worker->task->se.sum_exec_runtime - worker->current_at <
 	    wq_cpu_intensive_thresh_us * NSEC_PER_USEC)
 		return;
-- 
2.17.1
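
The nr_running trace in the commit message above can be replayed in a small
user-space model. This is only a sketch under stated assumptions: the struct,
the flag bit values and every model_*() helper are hypothetical stand-ins
rather than the code in kernel/workqueue.c, locking is left out, and the tick
is assumed to have already passed the wq_cpu_intensive_thresh_us runtime test.

```c
#include <stdbool.h>

/* Illustrative bit values; not the kernel's actual flag assignments. */
#define WORKER_PREP          (1u << 0)
#define WORKER_CPU_INTENSIVE (1u << 1)
#define WORKER_NOT_RUNNING   (WORKER_PREP | WORKER_CPU_INTENSIVE)

/* Hypothetical model: one worker plus the pool counter it contributes to. */
struct worker_model {
	int nr_running;      /* stands in for pool->nr_running */
	unsigned int flags;  /* stands in for worker->flags */
	int sleeping;        /* stands in for worker->sleeping */
};

/* Setting the first NOT_RUNNING flag removes the worker from nr_running. */
static void model_set_flags(struct worker_model *w, unsigned int flags)
{
	if ((flags & WORKER_NOT_RUNNING) && !(w->flags & WORKER_NOT_RUNNING))
		w->nr_running--;
	w->flags |= flags;
}

/* Clearing the last NOT_RUNNING flag adds the worker back to nr_running. */
static void model_clr_flags(struct worker_model *w, unsigned int flags)
{
	unsigned int oflags = w->flags;

	w->flags &= ~flags;
	if ((flags & WORKER_NOT_RUNNING) && (oflags & WORKER_NOT_RUNNING) &&
	    !(w->flags & WORKER_NOT_RUNNING))
		w->nr_running++;
}

/* Tick handler; check_sleeping selects the behaviour with the patch. */
static void model_tick(struct worker_model *w, bool check_sleeping)
{
	if (w->flags & WORKER_NOT_RUNNING)
		return;
	if (check_sleeping && w->sleeping)
		return;
	model_set_flags(w, WORKER_CPU_INTENSIVE);
}

/* Replays the scenario and returns nr_running on entry to idle. */
static int replay_scenario(bool check_sleeping)
{
	struct worker_model w = { 0, WORKER_PREP, 0 };

	model_clr_flags(&w, WORKER_PREP);           /* nr_running: 1 */

	/* wq_worker_sleeping() */
	w.sleeping = 1;
	w.nr_running--;                             /* nr_running: 0 */

	/* wq_worker_running() starts, then the tick interrupts it */
	model_tick(&w, check_sleeping);
	if (!(w.flags & WORKER_NOT_RUNNING))
		w.nr_running++;                     /* skipped without the check */
	w.sleeping = 0;

	/* work item finishes, worker prepares to go idle */
	model_clr_flags(&w, WORKER_CPU_INTENSIVE);
	model_set_flags(&w, WORKER_PREP);

	return w.nr_running;
}
```

In this model replay_scenario(false) ends with nr_running at -1, matching the
trace, while replay_scenario(true) ends at 0, which is the state that
worker_enter_idle() expects when all workers are idle.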



* Re: [PATCH v3] workqueue: Fix WARN_ON_ONCE() triggers in worker_enter_idle()
  2023-05-24  3:53 [PATCH v3] workqueue: Fix WARN_ON_ONCE() triggers in worker_enter_idle() Zqiang
@ 2023-05-24 13:53 ` Naresh Kamboju
  2023-05-24 22:03   ` Tejun Heo
  2023-05-24 22:02 ` Tejun Heo
  1 sibling, 1 reply; 4+ messages in thread
From: Naresh Kamboju @ 2023-05-24 13:53 UTC
  To: Zqiang; +Cc: tj, jiangshanlai, linux-kernel, Anders Roxell, lkft-triage

+ Anders, LKFT

On Wed, 24 May 2023 at 09:23, Zqiang <qiang.zhang1211@gmail.com> wrote:
>
> Currently, nr_running can be modified from the timer tick, which means
> the timer tick can run inside a critical section that is not irq-protected
> and modify nr_running. Consider the following scenario:
>
> CPU0
> kworker/0:2 (events)
>    worker_clr_flags(worker, WORKER_PREP | WORKER_REBOUND);
>    ->pool->nr_running++;  (1)
>
>    process_one_work()
>    ->worker->current_func(work);
>      ->schedule()
>        ->wq_worker_sleeping()
>          ->worker->sleeping = 1;
>          ->pool->nr_running--;  (0)
>            ....
>        ->wq_worker_running()
>                ....
>                CPU0 by interrupt:
>                wq_worker_tick()
>                ->worker_set_flags(worker, WORKER_CPU_INTENSIVE);
>                  ->pool->nr_running--;  (-1)
>                  ->worker->flags |= WORKER_CPU_INTENSIVE;
>                ....
>          ->if (!(worker->flags & WORKER_NOT_RUNNING))
>            ->pool->nr_running++;    (will not execute)
>          ->worker->sleeping = 0;
>          ....
>     ->worker_clr_flags(worker, WORKER_CPU_INTENSIVE);
>       ->pool->nr_running++;  (0)
>     ....
>     worker_set_flags(worker, WORKER_PREP);
>     ->pool->nr_running--;   (-1)
>     ....
>     worker_enter_idle()
>     ->WARN_ON_ONCE(pool->nr_workers == pool->nr_idle && pool->nr_running);
>
> If nr_workers equals nr_idle at that point, nr_running is non-zero (-1),
> so the WARN_ON_ONCE() triggers.
>
> [    2.460602] WARNING: CPU: 0 PID: 63 at kernel/workqueue.c:1999 worker_enter_idle+0xb2/0xc0
> [    2.462163] Modules linked in:
> [    2.463401] CPU: 0 PID: 63 Comm: kworker/0:2 Not tainted 6.4.0-rc2-next-20230519 #1
> [    2.463771] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.14.0-2 04/01/2014
> [    2.465127] Workqueue:  0x0 (events)
> [    2.465678] RIP: 0010:worker_enter_idle+0xb2/0xc0
> ...
> [    2.472614] Call Trace:
> [    2.473152]  <TASK>
> [    2.474182]  worker_thread+0x71/0x430
> [    2.474992]  ? _raw_spin_unlock_irqrestore+0x28/0x50
> [    2.475263]  kthread+0x103/0x120
> [    2.475493]  ? __pfx_worker_thread+0x10/0x10
> [    2.476355]  ? __pfx_kthread+0x10/0x10
> [    2.476635]  ret_from_fork+0x2c/0x50
> [    2.477051]  </TASK>
>
> This commit therefore adds a check of worker->sleeping to wq_worker_tick():
> if worker->sleeping is non-zero, return immediately.
>
> Reported-by: Naresh Kamboju <naresh.kamboju@linaro.org>
> Closes: https://qa-reports.linaro.org/lkft/linux-next-master/build/next-20230519/testrun/17078554/suite/boot/test/clang-nightly-lkftconfig/log
> Signed-off-by: Zqiang <qiang.zhang1211@gmail.com>

Reported-by: Linux Kernel Functional Testing <lkft@linaro.org>
Tested-by: Anders Roxell <anders.roxell@linaro.org>

Since this problem occurs in only 3% of boots, Anders applied this
patch on top of linux-next and ran 500 boot tests; all of them
looked good.
Thanks, Anders.

- Naresh

> ---
>  kernel/workqueue.c | 15 ++++++++++-----
>  1 file changed, 10 insertions(+), 5 deletions(-)
>
> diff --git a/kernel/workqueue.c b/kernel/workqueue.c
> index 9c5c1cfa478f..a028b851333e 100644
> --- a/kernel/workqueue.c
> +++ b/kernel/workqueue.c
> @@ -1051,7 +1051,7 @@ void wq_worker_running(struct task_struct *task)
>  {
>         struct worker *worker = kthread_data(task);
>
> -       if (!worker->sleeping)
> +       if (!READ_ONCE(worker->sleeping))
>                 return;
>
>         /*
> @@ -1071,7 +1071,7 @@ void wq_worker_running(struct task_struct *task)
>          */
>         worker->current_at = worker->task->se.sum_exec_runtime;
>
> -       worker->sleeping = 0;
> +       WRITE_ONCE(worker->sleeping, 0);
>  }
>
>  /**
> @@ -1097,10 +1097,10 @@ void wq_worker_sleeping(struct task_struct *task)
>         pool = worker->pool;
>
>         /* Return if preempted before wq_worker_running() was reached */
> -       if (worker->sleeping)
> +       if (READ_ONCE(worker->sleeping))
>                 return;
>
> -       worker->sleeping = 1;
> +       WRITE_ONCE(worker->sleeping, 1);
>         raw_spin_lock_irq(&pool->lock);
>
>         /*
> @@ -1143,8 +1143,13 @@ void wq_worker_tick(struct task_struct *task)
>          * If the current worker is concurrency managed and hogged the CPU for
>          * longer than wq_cpu_intensive_thresh_us, it's automatically marked
>          * CPU_INTENSIVE to avoid stalling other concurrency-managed work items.
> +        *
> +        * worker->sleeping being set means that the worker is either doing a
> +        * voluntary switch and will not hog the CPU, or is running again but
> +        * has not yet reset worker->sleeping because it is still executing
> +        * wq_worker_running().
>          */
> -       if ((worker->flags & WORKER_NOT_RUNNING) ||
> +       if ((worker->flags & WORKER_NOT_RUNNING) || READ_ONCE(worker->sleeping) ||
>             worker->task->se.sum_exec_runtime - worker->current_at <
>             wq_cpu_intensive_thresh_us * NSEC_PER_USEC)
>                 return;
> --
> 2.17.1
>
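
The 3% figure and the 500 clean boots reported in this message can be put
into rough numbers. This is a back-of-the-envelope sketch that assumes each
boot is an independent trial and that 3% is the per-boot failure rate of an
unpatched kernel; prob_all_clean() is a hypothetical helper, not part of any
test suite.

```c
/*
 * Probability that n independent trials all avoid an event that occurs
 * with probability p per trial, i.e. (1 - p)^n, computed by repeated
 * multiplication so that no math library is needed.
 */
static double prob_all_clean(double p, int n)
{
	double q = 1.0;

	for (int i = 0; i < n; i++)
		q *= 1.0 - p;
	return q;
}
```

Under those assumptions prob_all_clean(0.03, 500) is on the order of 1e-7,
so 500 clean boots would be very unlikely if the warning could still
trigger at the original rate.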


* Re: [PATCH v3] workqueue: Fix WARN_ON_ONCE() triggers in worker_enter_idle()
  2023-05-24  3:53 [PATCH v3] workqueue: Fix WARN_ON_ONCE() triggers in worker_enter_idle() Zqiang
  2023-05-24 13:53 ` Naresh Kamboju
@ 2023-05-24 22:02 ` Tejun Heo
  1 sibling, 0 replies; 4+ messages in thread
From: Tejun Heo @ 2023-05-24 22:02 UTC
  To: Zqiang; +Cc: jiangshanlai, naresh.kamboju, linux-kernel

Hello,

I updated the comment and description and applied the patch to wq/for-6.5.

Thanks.

From c8f6219be2e58d7f676935ae90b64abef5d0966a Mon Sep 17 00:00:00 2001
From: Zqiang <qiang.zhang1211@gmail.com>
Date: Wed, 24 May 2023 11:53:39 +0800
Subject: [PATCH] workqueue: Fix WARN_ON_ONCE() triggers in worker_enter_idle()

Currently, pool->nr_running can be modified from the timer tick, which means
the timer tick can run nested inside a not-irq-protected section that's in
the process of modifying nr_running. Consider the following scenario:

CPU0
kworker/0:2 (events)
   worker_clr_flags(worker, WORKER_PREP | WORKER_REBOUND);
   ->pool->nr_running++;  (1)

   process_one_work()
   ->worker->current_func(work);
     ->schedule()
       ->wq_worker_sleeping()
         ->worker->sleeping = 1;
         ->pool->nr_running--;  (0)
           ....
       ->wq_worker_running()
               ....
               CPU0 by interrupt:
               wq_worker_tick()
               ->worker_set_flags(worker, WORKER_CPU_INTENSIVE);
                 ->pool->nr_running--;  (-1)
                 ->worker->flags |= WORKER_CPU_INTENSIVE;
               ....
         ->if (!(worker->flags & WORKER_NOT_RUNNING))
           ->pool->nr_running++;    (will not execute)
         ->worker->sleeping = 0;
         ....
    ->worker_clr_flags(worker, WORKER_CPU_INTENSIVE);
      ->pool->nr_running++;  (0)
    ....
    worker_set_flags(worker, WORKER_PREP);
    ->pool->nr_running--;   (-1)
    ....
    worker_enter_idle()
    ->WARN_ON_ONCE(pool->nr_workers == pool->nr_idle && pool->nr_running);

If nr_workers equals nr_idle at that point, nr_running is non-zero (-1),
so the WARN_ON_ONCE() triggers.

[    2.460602] WARNING: CPU: 0 PID: 63 at kernel/workqueue.c:1999 worker_enter_idle+0xb2/0xc0
[    2.462163] Modules linked in:
[    2.463401] CPU: 0 PID: 63 Comm: kworker/0:2 Not tainted 6.4.0-rc2-next-20230519 #1
[    2.463771] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.14.0-2 04/01/2014
[    2.465127] Workqueue:  0x0 (events)
[    2.465678] RIP: 0010:worker_enter_idle+0xb2/0xc0
...
[    2.472614] Call Trace:
[    2.473152]  <TASK>
[    2.474182]  worker_thread+0x71/0x430
[    2.474992]  ? _raw_spin_unlock_irqrestore+0x28/0x50
[    2.475263]  kthread+0x103/0x120
[    2.475493]  ? __pfx_worker_thread+0x10/0x10
[    2.476355]  ? __pfx_kthread+0x10/0x10
[    2.476635]  ret_from_fork+0x2c/0x50
[    2.477051]  </TASK>

This commit therefore adds a check of worker->sleeping to wq_worker_tick():
if worker->sleeping is non-zero, return immediately.

tj: Updated comment and description.

Reported-by: Naresh Kamboju <naresh.kamboju@linaro.org>
Reported-by: Linux Kernel Functional Testing <lkft@linaro.org>
Tested-by: Anders Roxell <anders.roxell@linaro.org>
Closes: https://qa-reports.linaro.org/lkft/linux-next-master/build/next-20230519/testrun/17078554/suite/boot/test/clang-nightly-lkftconfig/log
Signed-off-by: Zqiang <qiang.zhang1211@gmail.com>
Signed-off-by: Tejun Heo <tj@kernel.org>
---
 kernel/workqueue.c | 17 ++++++++++++-----
 1 file changed, 12 insertions(+), 5 deletions(-)

diff --git a/kernel/workqueue.c b/kernel/workqueue.c
index ee16ddb0647c..3ad6806c7161 100644
--- a/kernel/workqueue.c
+++ b/kernel/workqueue.c
@@ -1051,7 +1051,7 @@ void wq_worker_running(struct task_struct *task)
 {
 	struct worker *worker = kthread_data(task);
 
-	if (!worker->sleeping)
+	if (!READ_ONCE(worker->sleeping))
 		return;
 
 	/*
@@ -1071,7 +1071,7 @@ void wq_worker_running(struct task_struct *task)
 	 */
 	worker->current_at = worker->task->se.sum_exec_runtime;
 
-	worker->sleeping = 0;
+	WRITE_ONCE(worker->sleeping, 0);
 }
 
 /**
@@ -1097,10 +1097,10 @@ void wq_worker_sleeping(struct task_struct *task)
 	pool = worker->pool;
 
 	/* Return if preempted before wq_worker_running() was reached */
-	if (worker->sleeping)
+	if (READ_ONCE(worker->sleeping))
 		return;
 
-	worker->sleeping = 1;
+	WRITE_ONCE(worker->sleeping, 1);
 	raw_spin_lock_irq(&pool->lock);
 
 	/*
@@ -1143,8 +1143,15 @@ void wq_worker_tick(struct task_struct *task)
 	 * If the current worker is concurrency managed and hogged the CPU for
 	 * longer than wq_cpu_intensive_thresh_us, it's automatically marked
 	 * CPU_INTENSIVE to avoid stalling other concurrency-managed work items.
+	 *
+	 * Set @worker->sleeping means that @worker is in the process of
+	 * switching out voluntarily and won't be contributing to
+	 * @pool->nr_running until it wakes up. As wq_worker_sleeping() also
+	 * decrements ->nr_running, setting CPU_INTENSIVE here can lead to
+	 * double decrements. The task is releasing the CPU anyway. Let's skip.
+	 * We probably want to make this prettier in the future.
 	 */
-	if ((worker->flags & WORKER_NOT_RUNNING) ||
+	if ((worker->flags & WORKER_NOT_RUNNING) || READ_ONCE(worker->sleeping) ||
 	    worker->task->se.sum_exec_runtime - worker->current_at <
 	    wq_cpu_intensive_thresh_us * NSEC_PER_USEC)
 		return;
-- 
2.40.1
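
The patch above reads and writes worker->sleeping through READ_ONCE() and
WRITE_ONCE() because the flag is now shared between task context and the
tick. The pattern can be sketched in user space as follows. The two macros
here are simplified stand-ins built on a volatile cast and the GNU
__typeof__ extension, not the kernel's real definitions, and struct sleeper
with its helpers is hypothetical.

```c
/* Simplified stand-ins: force a single, untorn access via a volatile cast. */
#define READ_ONCE(x)       (*(const volatile __typeof__(x) *)&(x))
#define WRITE_ONCE(x, val) (*(volatile __typeof__(x) *)&(x) = (val))

/* Hypothetical stand-in for the one field of struct worker used here. */
struct sleeper {
	int sleeping;
};

/* Task side: mark the voluntary switch-out before touching any counters. */
static void sleeper_mark_sleeping(struct sleeper *s)
{
	WRITE_ONCE(s->sleeping, 1);
}

/* Task side: clear the mark as the last step of resuming. */
static void sleeper_mark_running(struct sleeper *s)
{
	WRITE_ONCE(s->sleeping, 0);
}

/* Tick side: bail out while the task side is between the two marks. */
static int sleeper_tick_should_skip(struct sleeper *s)
{
	return READ_ONCE(s->sleeping) != 0;
}
```

The volatile access only keeps the compiler from caching, tearing or
re-reading the flag; it is not a memory barrier, which appears to be
sufficient in the scenario described because the tick interrupts the same
CPU that the worker is running on.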



* Re: [PATCH v3] workqueue: Fix WARN_ON_ONCE() triggers in worker_enter_idle()
  2023-05-24 13:53 ` Naresh Kamboju
@ 2023-05-24 22:03   ` Tejun Heo
  0 siblings, 0 replies; 4+ messages in thread
From: Tejun Heo @ 2023-05-24 22:03 UTC
  To: Naresh Kamboju
  Cc: Zqiang, jiangshanlai, linux-kernel, Anders Roxell, lkft-triage

On Wed, May 24, 2023 at 07:23:16PM +0530, Naresh Kamboju wrote:
> Reported-by: Linux Kernel Functional Testing <lkft@linaro.org>
> Tested-by: Anders Roxell <anders.roxell@linaro.org>
> 
> Since this problem occurs in only 3% of boots, Anders applied this
> patch on top of linux-next and ran 500 boot tests; all of them
> looked good.
> Thanks, Anders.

This was a tricky bug and I really appreciate the bug report and testing.
Thank you so much.

-- 
tejun
