LKML Archive mirror
* [PATCH] lkdtm/bugs: add test for hung smp_call_function_single()
@ 2024-04-19 10:34 Mark Rutland
  2024-04-19 21:53 ` Kees Cook
  2024-04-24  2:10 ` Paul E. McKenney
  0 siblings, 2 replies; 5+ messages in thread
From: Mark Rutland @ 2024-04-19 10:34 UTC
  To: linux-kernel; +Cc: keescook, mark.rutland, paulmck

The CONFIG_CSD_LOCK_WAIT_DEBUG option enables debugging of hung
smp_call_function*() calls (e.g. when the target CPU gets stuck within
the callback function). Testing this option requires triggering such
hangs.

This patch adds an lkdtm test with a hung smp_call_function_single()
callback, which can be used to test CONFIG_CSD_LOCK_WAIT_DEBUG and NMI
backtraces (as CONFIG_CSD_LOCK_WAIT_DEBUG will attempt an NMI backtrace
of the hung target CPU).

On arm64 using pseudo-NMI, this looks like:

| # mount -t debugfs none /sys/kernel/debug/
| # echo CSDLOCKUP > /sys/kernel/debug/provoke-crash/DIRECT
| lkdtm: Performing direct entry CSDLOCKUP
| smp: csd: Detected non-responsive CSD lock (#1) on CPU#0, waiting 5000001136 ns for CPU#01 __lkdtm_CSDLOCKUP+0x0/0x8(0x0).
| smp:     csd: CSD lock (#1) handling this request.
| Sending NMI from CPU 0 to CPUs 1:
| NMI backtrace for cpu 1
| CPU: 1 PID: 0 Comm: swapper/1 Not tainted 6.9.0-rc4-00001-gda84b9dede43 #7
| Hardware name: linux,dummy-virt (DT)
| pstate: 60401005 (nZCv daif +PAN -UAO -TCO -DIT +SSBS BTYPE=--)
| pc : __lkdtm_CSDLOCKUP+0x0/0x8
| lr : __flush_smp_call_function_queue+0x1b0/0x290
| sp : ffff80008000bf30
| pmr_save: 00000060
| x29: ffff80008000bf30 x28: fff00000c02dc500 x27: 0000000000000000
| x26: 0000000000000000 x25: fff00000c02dc500 x24: ffffa41b939aa140
| x23: ffffa41b939aa140 x22: 0000000000000000 x21: ffff80008066bc40
| x20: 0000000000000000 x19: 0000000000000000 x18: fff05be56bd37000
| x17: fff05be56bd07000 x16: ffff800080008000 x15: 00005b132023e6fd
| x14: 00005aeabb53d8c3 x13: 000000000000032e x12: 0000000000000001
| x11: 0000000000000040 x10: fff00000c003d0a8 x9 : fff00000c003d0a0
| x8 : fff00000c0400270 x7 : 0000000000000000 x6 : ffffa41b9251b810
| x5 : 0000000000000000 x4 : fff05be56bd07000 x3 : ffff80008000bf30
| x2 : fff05be56bd07000 x1 : ffffa41b939aa140 x0 : 0000000000000000
| Call trace:
|  __lkdtm_CSDLOCKUP+0x0/0x8
|  generic_smp_call_function_single_interrupt+0x14/0x20
|  ipi_handler+0xb8/0x178
|  handle_percpu_devid_irq+0x84/0x130
|  generic_handle_domain_irq+0x2c/0x44
|  gic_handle_irq+0x118/0x240
|  call_on_irq_stack+0x24/0x4c
|  do_interrupt_handler+0x80/0x84
|  el1_interrupt+0x44/0xc0
|  el1h_64_irq_handler+0x18/0x24
|  el1h_64_irq+0x78/0x7c
|  default_idle_call+0x40/0x60
|  do_idle+0x23c/0x2d0
|  cpu_startup_entry+0x38/0x3c
|  secondary_start_kernel+0x148/0x180
|  __secondary_switched+0xb8/0xbc
| CPU: 0 PID: 143 Comm: sh Not tainted 6.9.0-rc4-00001-gda84b9dede43 #7
| Hardware name: linux,dummy-virt (DT)
| Call trace:
|  dump_backtrace+0x90/0xe8
|  show_stack+0x18/0x24
|  dump_stack_lvl+0xac/0xe8
|  dump_stack+0x18/0x24
|  csd_lock_wait_toolong+0x268/0x338
|  smp_call_function_single+0x1dc/0x2f0
|  lkdtm_CSDLOCKUP+0xcc/0xfc
|  lkdtm_do_action+0x1c/0x38
|  direct_entry+0xbc/0x14c
|  full_proxy_write+0x60/0xb4
|  vfs_write+0xd0/0x35c
|  ksys_write+0x70/0x104
|  __arm64_sys_write+0x1c/0x28
|  invoke_syscall+0x48/0x114
|  el0_svc_common.constprop.0+0x40/0xe0
|  do_el0_svc+0x1c/0x28
|  el0_svc+0x38/0x108
|  el0t_64_sync_handler+0x120/0x12c
|  el0t_64_sync+0x1a4/0x1a8
| smp: csd: Continued non-responsive CSD lock (#1) on CPU#0, waiting 10000001888 ns for CPU#01 __lkdtm_CSDLOCKUP+0x0/0x8(0x0).
| smp:     csd: CSD lock (#1) handling this request.
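
The trace entries above use the kernel's symbol+offset/size notation, so
`__lkdtm_CSDLOCKUP+0x0/0x8` reads as offset 0x0 into an 8-byte function.
As a rough userspace illustration only (the `sym_frame` struct and the
`parse_sym_frame()` helper are hypothetical names, not kernel code), such
an entry could be split apart like this:

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical container for one "symbol+0xoff/0xsize" trace entry. */
struct sym_frame {
	char name[64];		/* symbol name, e.g. "__lkdtm_CSDLOCKUP" */
	unsigned long offset;	/* offset of the address within the symbol */
	unsigned long size;	/* total size of the symbol in bytes */
};

/*
 * Split a bare trace entry into its three parts.
 * Returns 0 on success, -1 if the entry does not match the notation.
 */
int parse_sym_frame(const char *s, struct sym_frame *out)
{
	assert(s && out);

	/* Name runs up to the '+', then two 0x-prefixed hex numbers. */
	if (sscanf(s, "%63[^+]+0x%lx/0x%lx",
		   out->name, &out->offset, &out->size) != 3)
		return -1;

	return 0;
}
```

For example, the `lr` value `__flush_smp_call_function_queue+0x1b0/0x290`
would parse to offset 0x1b0 within a 0x290-byte function. The helper
expects the bare entry, with the leading `| ` prefix and any whitespace
already stripped.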

Signed-off-by: Mark Rutland <mark.rutland@arm.com>
Cc: Kees Cook <keescook@chromium.org>
Cc: Paul E. McKenney <paulmck@kernel.org>
---
 drivers/misc/lkdtm/bugs.c | 30 ++++++++++++++++++++++++++++++
 1 file changed, 30 insertions(+)

I wrote this because I needed to guide someone through debugging a hung
smp_call_function() call, and I needed examples with/without an NMI
backtrace. It seems like it'd be useful for testing the CSD lockup
detector and NMI backtrace code in future.

I'm not sure about the CSDLOCKUP name, but everything else I tried
didn't seem great either:

* IPILOCKUP sounds like it's testing IPIs generally
* SMPCALLLOCKUP and similar look weirdly long
* SMP_CALL_LOCKUP and similar look different to {HARD,SOFT,SPIN}LOCKUP

... and I'm happy to defer to Kees for the naming. ;)

Mark.

diff --git a/drivers/misc/lkdtm/bugs.c b/drivers/misc/lkdtm/bugs.c
index 5178c02b21eba..47cd1be09ac1f 100644
--- a/drivers/misc/lkdtm/bugs.c
+++ b/drivers/misc/lkdtm/bugs.c
@@ -286,6 +286,35 @@ static void lkdtm_HARDLOCKUP(void)
 		cpu_relax();
 }
 
+static void __lkdtm_CSDLOCKUP(void *unused)
+{
+	for (;;)
+		cpu_relax();
+}
+
+static void lkdtm_CSDLOCKUP(void)
+{
+	unsigned int cpu, target;
+
+	cpus_read_lock();
+
+	cpu = get_cpu();
+	target = cpumask_any_but(cpu_online_mask, cpu);
+
+	if (target >= nr_cpu_ids) {
+		pr_err("FAIL: no other online CPUs\n");
+		goto out_put_cpus;
+	}
+
+	smp_call_function_single(target, __lkdtm_CSDLOCKUP, NULL, 1);
+
+	pr_err("FAIL: did not hang\n");
+
+out_put_cpus:
+	put_cpu();
+	cpus_read_unlock();
+}
+
 static void lkdtm_SPINLOCKUP(void)
 {
 	/* Must be called twice to trigger. */
@@ -680,6 +709,7 @@ static struct crashtype crashtypes[] = {
 	CRASHTYPE(UNALIGNED_LOAD_STORE_WRITE),
 	CRASHTYPE(SOFTLOCKUP),
 	CRASHTYPE(HARDLOCKUP),
+	CRASHTYPE(CSDLOCKUP),
 	CRASHTYPE(SPINLOCKUP),
 	CRASHTYPE(HUNG_TASK),
 	CRASHTYPE(OVERFLOW_SIGNED),
-- 
2.30.2
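
For readers unfamiliar with cpumask_any_but(), the target selection in
lkdtm_CSDLOCKUP() above can be approximated in userspace as below. This is
a sketch under assumptions, not the kernel implementation: any_cpu_but()
is a hypothetical helper, the online mask is modelled as a single 64-bit
word, and the helper always returns the lowest matching CPU. As in the
patch, a result >= nr_cpu_ids means "no other online CPU".

```c
#include <assert.h>
#include <stdint.h>

/*
 * Return the lowest CPU set in online_mask other than 'self', or
 * nr_cpu_ids if there is none. Userspace model only: bit N of
 * online_mask stands for CPU N being online.
 */
unsigned int any_cpu_but(uint64_t online_mask, unsigned int nr_cpu_ids,
			 unsigned int self)
{
	unsigned int cpu;

	/* The 64-bit mask is an assumption of this sketch. */
	assert(nr_cpu_ids <= 64);

	for (cpu = 0; cpu < nr_cpu_ids; cpu++) {
		if (cpu != self && (online_mask & (UINT64_C(1) << cpu)))
			return cpu;
	}

	return nr_cpu_ids;
}
```

With two online CPUs (mask 0x3) and the caller on CPU 0, the helper picks
CPU 1, matching the log in the commit message. With only CPU 0 online it
returns nr_cpu_ids, which corresponds to the "no other online CPUs" path.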


* Re: [PATCH] lkdtm/bugs: add test for hung smp_call_function_single()
  2024-04-19 10:34 [PATCH] lkdtm/bugs: add test for hung smp_call_function_single() Mark Rutland
@ 2024-04-19 21:53 ` Kees Cook
  2024-04-23  9:47   ` Mark Rutland
  2024-04-24  2:10 ` Paul E. McKenney
  1 sibling, 1 reply; 5+ messages in thread
From: Kees Cook @ 2024-04-19 21:53 UTC
  To: Mark Rutland; +Cc: linux-kernel, paulmck

On Fri, Apr 19, 2024 at 11:34:52AM +0100, Mark Rutland wrote:
> The CONFIG_CSD_LOCK_WAIT_DEBUG option enables debugging of hung
> smp_call_function*() calls (e.g. when the target CPU gets stuck within
> the callback function). Testing this option requires triggering such
> hangs.
> 
> This patch adds an lkdtm test with a hung smp_call_function_single()
> callback, which can be used to test CONFIG_CSD_LOCK_WAIT_DEBUG and NMI
> backtraces (as CONFIG_CSD_LOCK_WAIT_DEBUG will attempt an NMI backtrace
> of the hung target CPU).
> 
> [...]
> 
> Signed-off-by: Mark Rutland <mark.rutland@arm.com>
> Cc: Kees Cook <keescook@chromium.org>
> Cc: Paul E. McKenney <paulmck@kernel.org>
> ---
>  drivers/misc/lkdtm/bugs.c | 30 ++++++++++++++++++++++++++++++
>  1 file changed, 30 insertions(+)
> 
> I wrote this because I needed to guide someone through debugging a hung
> smp_call_function() call, and I needed examples with/without an NMI
> backtrace. It seems like it'd be useful for testing the CSD lockup
> detector and NMI backtrace code in future.

Like the other lockup detectors, I suspect we should skip it by default
in the selftests? Something like this:

diff --git a/tools/testing/selftests/lkdtm/tests.txt b/tools/testing/selftests/lkdtm/tests.txt
index 368973f05250..32baddc2c85d 100644
--- a/tools/testing/selftests/lkdtm/tests.txt
+++ b/tools/testing/selftests/lkdtm/tests.txt
@@ -31,6 +31,7 @@ SLAB_FREE_CROSS
 SLAB_FREE_PAGE
 #SOFTLOCKUP Hangs the system
 #HARDLOCKUP Hangs the system
+#CSDLOCKUP Hangs the system
 #SPINLOCKUP Hangs the system
 #HUNG_TASK Hangs the system
 EXEC_DATA
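
As the hunk suggests, a leading '#' in tests.txt marks an entry to be
skipped, with the rest of the line giving the reason. A minimal userspace
sketch of reading one such line, assuming that layout (the struct and
parse_test_line() are hypothetical and not part of the selftests):

```c
#include <assert.h>
#include <string.h>

/* Hypothetical representation of one tests.txt line. */
struct lkdtm_test_line {
	int skip;		/* 1 if the line starts with '#' */
	char name[64];		/* test name, e.g. "CSDLOCKUP" */
	char note[128];		/* rest of the line, e.g. "Hangs the system" */
};

/*
 * Split "[#]NAME[ free text]" into its parts.
 * Returns 0 on success, -1 on an empty or over-long field.
 */
int parse_test_line(const char *line, struct lkdtm_test_line *out)
{
	const char *p = line;
	size_t len;

	assert(line && out);
	memset(out, 0, sizeof(*out));	/* also NUL-terminates both strings */

	if (*p == '#') {
		out->skip = 1;
		p++;
	}

	/* The name is the first space-delimited token. */
	len = strcspn(p, " \n");
	if (len == 0 || len >= sizeof(out->name))
		return -1;
	memcpy(out->name, p, len);
	p += len;

	/* Anything after a single separating space is kept as the note. */
	if (*p == ' ')
		p++;
	len = strcspn(p, "\n");
	if (len >= sizeof(out->note))
		return -1;
	memcpy(out->note, p, len);

	return 0;
}
```

Under this reading, "#CSDLOCKUP Hangs the system" yields a skipped test
named CSDLOCKUP with the note "Hangs the system", while "EXEC_DATA" yields
a runnable test with an empty note.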

> 
> I'm not sure about the CSDLOCKUP name, but everything else I tried
> didn't seem great either:
> 
> * IPILOCKUP sounds like it's testing IPIs generally
> * SMPCALLLOCKUP and similar look weirdly long
> * SMP_CALL_LOCKUP and similar look different to {HARD,SOFT,SPIN}LOCKUP
> 
> ... and I'm happy to defer to Kees for the naming. ;)

It looks like it's only useful with the CSD lockup detector? If that's true,
sure, this name is fine.

> [...]
> diff --git a/drivers/misc/lkdtm/bugs.c b/drivers/misc/lkdtm/bugs.c
> index 5178c02b21eba..47cd1be09ac1f 100644
> --- a/drivers/misc/lkdtm/bugs.c
> +++ b/drivers/misc/lkdtm/bugs.c
> @@ -286,6 +286,35 @@ static void lkdtm_HARDLOCKUP(void)
>  		cpu_relax();
>  }
>  
> +static void __lkdtm_CSDLOCKUP(void *unused)
> +{
> +	for (;;)
> +		cpu_relax();
> +}
> +
> +static void lkdtm_CSDLOCKUP(void)
> +{
> +	unsigned int cpu, target;
> +
> +	cpus_read_lock();
> +
> +	cpu = get_cpu();
> +	target = cpumask_any_but(cpu_online_mask, cpu);
> +
> +	if (target >= nr_cpu_ids) {
> +		pr_err("FAIL: no other online CPUs\n");
> +		goto out_put_cpus;
> +	}
> +
> +	smp_call_function_single(target, __lkdtm_CSDLOCKUP, NULL, 1);
> +
> +	pr_err("FAIL: did not hang\n");
> +
> +out_put_cpus:
> +	put_cpu();
> +	cpus_read_unlock();
> +}
> +
>  static void lkdtm_SPINLOCKUP(void)
>  {
>  	/* Must be called twice to trigger. */
> @@ -680,6 +709,7 @@ static struct crashtype crashtypes[] = {
>  	CRASHTYPE(UNALIGNED_LOAD_STORE_WRITE),
>  	CRASHTYPE(SOFTLOCKUP),
>  	CRASHTYPE(HARDLOCKUP),
> +	CRASHTYPE(CSDLOCKUP),
>  	CRASHTYPE(SPINLOCKUP),
>  	CRASHTYPE(HUNG_TASK),
>  	CRASHTYPE(OVERFLOW_SIGNED),
> -- 
> 2.30.2

Seems good to me.

-- 
Kees Cook

* Re: [PATCH] lkdtm/bugs: add test for hung smp_call_function_single()
  2024-04-19 21:53 ` Kees Cook
@ 2024-04-23  9:47   ` Mark Rutland
  2024-04-23 17:21     ` Kees Cook
  0 siblings, 1 reply; 5+ messages in thread
From: Mark Rutland @ 2024-04-23  9:47 UTC
  To: Kees Cook; +Cc: linux-kernel, paulmck

On Fri, Apr 19, 2024 at 02:53:59PM -0700, Kees Cook wrote:
> On Fri, Apr 19, 2024 at 11:34:52AM +0100, Mark Rutland wrote:
> > The CONFIG_CSD_LOCK_WAIT_DEBUG option enables debugging of hung
> > smp_call_function*() calls (e.g. when the target CPU gets stuck within
> > the callback function). Testing this option requires triggering such
> > hangs.
> > 
> > This patch adds an lkdtm test with a hung smp_call_function_single()
> > callback, which can be used to test CONFIG_CSD_LOCK_WAIT_DEBUG and NMI
> > backtraces (as CONFIG_CSD_LOCK_WAIT_DEBUG will attempt an NMI backtrace
> > of the hung target CPU).

[...]

> > I wrote this because I needed to guide someone through debugging a hung
> > smp_call_function() call, and I needed examples with/without an NMI
> > backtrace. It seems like it'd be useful for testing the CSD lockup
> > detector and NMI backtrace code in future.
> 
> Like the other lockup detectors, I suspect we should skip it by default
> in the selftests? Something like this:
> 
> diff --git a/tools/testing/selftests/lkdtm/tests.txt b/tools/testing/selftests/lkdtm/tests.txt
> index 368973f05250..32baddc2c85d 100644
> --- a/tools/testing/selftests/lkdtm/tests.txt
> +++ b/tools/testing/selftests/lkdtm/tests.txt
> @@ -31,6 +31,7 @@ SLAB_FREE_CROSS
>  SLAB_FREE_PAGE
>  #SOFTLOCKUP Hangs the system
>  #HARDLOCKUP Hangs the system
> +#CSDLOCKUP Hangs the system
>  #SPINLOCKUP Hangs the system
>  #HUNG_TASK Hangs the system
>  EXEC_DATA

Ah, I wasn't aware of that file, yes.

> > I'm not sure about the CSDLOCKUP name, but everything else I tried
> > didn't seem great either:
> > 
> > * IPILOCKUP sounds like it's testing IPIs generally
> > * SMPCALLLOCKUP and similar look weirdly long
> > * SMP_CALL_LOCKUP and similar look different to {HARD,SOFT,SPIN}LOCKUP
> > 
> > ... and I'm happy to defer to Kees for the naming. ;)
> 
> It looks like it's only useful with the CSD lockup detector? If that's true,
> sure, this name is fine.

I think it's also useful for testing other things (e.g. RCU stall detection),
so how about we go with SMP_CALL_LOCKUP, as that says what the test does rather
than what specifically it can be used to test?

Mark.

* Re: [PATCH] lkdtm/bugs: add test for hung smp_call_function_single()
  2024-04-23  9:47   ` Mark Rutland
@ 2024-04-23 17:21     ` Kees Cook
  0 siblings, 0 replies; 5+ messages in thread
From: Kees Cook @ 2024-04-23 17:21 UTC
  To: Mark Rutland; +Cc: linux-kernel, paulmck

On Tue, Apr 23, 2024 at 10:47:29AM +0100, Mark Rutland wrote:
> On Fri, Apr 19, 2024 at 02:53:59PM -0700, Kees Cook wrote:
> > On Fri, Apr 19, 2024 at 11:34:52AM +0100, Mark Rutland wrote:
> > > The CONFIG_CSD_LOCK_WAIT_DEBUG option enables debugging of hung
> > > smp_call_function*() calls (e.g. when the target CPU gets stuck within
> > > the callback function). Testing this option requires triggering such
> > > hangs.
> > > 
> > > This patch adds an lkdtm test with a hung smp_call_function_single()
> > > callback, which can be used to test CONFIG_CSD_LOCK_WAIT_DEBUG and NMI
> > > backtraces (as CONFIG_CSD_LOCK_WAIT_DEBUG will attempt an NMI backtrace
> > > of the hung target CPU).
> 
> [...]
> 
> > > I wrote this because I needed to guide someone through debugging a hung
> > > smp_call_function() call, and I needed examples with/without an NMI
> > > backtrace. It seems like it'd be useful for testing the CSD lockup
> > > detector and NMI backtrace code in future.
> > 
> > Like the other lockup detectors, I suspect we should skip it by default
> > in the selftests? Something like this:
> > 
> > diff --git a/tools/testing/selftests/lkdtm/tests.txt b/tools/testing/selftests/lkdtm/tests.txt
> > index 368973f05250..32baddc2c85d 100644
> > --- a/tools/testing/selftests/lkdtm/tests.txt
> > +++ b/tools/testing/selftests/lkdtm/tests.txt
> > @@ -31,6 +31,7 @@ SLAB_FREE_CROSS
> >  SLAB_FREE_PAGE
> >  #SOFTLOCKUP Hangs the system
> >  #HARDLOCKUP Hangs the system
> > +#CSDLOCKUP Hangs the system
> >  #SPINLOCKUP Hangs the system
> >  #HUNG_TASK Hangs the system
> >  EXEC_DATA
> 
> Ah, I wasn't aware of that file, yes.
> 
> > > I'm not sure about the CSDLOCKUP name, but everything else I tried
> > > didn't seem great either:
> > > 
> > > * IPILOCKUP sounds like it's testing IPIs generally
> > > * SMPCALLLOCKUP and similar look weirdly long
> > > * SMP_CALL_LOCKUP and similar look different to {HARD,SOFT,SPIN}LOCKUP
> > > 
> > > ... and I'm happy to defer to Kees for the naming. ;)
> > 
> > It looks like it's only useful with the CSD lockup detector? If that's true,
> > sure, this name is fine.
> 
> I think it's also useful for testing other things (e.g. RCU stall detection),
> so how about we go with SMP_CALL_LOCKUP, as that says what the test does rather
> than what specifically it can be used to test?

Yeah, that works for me. Thanks!

-Kees

-- 
Kees Cook

* Re: [PATCH] lkdtm/bugs: add test for hung smp_call_function_single()
  2024-04-19 10:34 [PATCH] lkdtm/bugs: add test for hung smp_call_function_single() Mark Rutland
  2024-04-19 21:53 ` Kees Cook
@ 2024-04-24  2:10 ` Paul E. McKenney
  1 sibling, 0 replies; 5+ messages in thread
From: Paul E. McKenney @ 2024-04-24  2:10 UTC
  To: Mark Rutland; +Cc: linux-kernel, keescook

On Fri, Apr 19, 2024 at 11:34:52AM +0100, Mark Rutland wrote:
> The CONFIG_CSD_LOCK_WAIT_DEBUG option enables debugging of hung
> smp_call_function*() calls (e.g. when the target CPU gets stuck within
> the callback function). Testing this option requires triggering such
> hangs.
> 
> This patch adds an lkdtm test with a hung smp_call_function_single()
> callback, which can be used to test CONFIG_CSD_LOCK_WAIT_DEBUG and NMI
> backtraces (as CONFIG_CSD_LOCK_WAIT_DEBUG will attempt an NMI backtrace
> of the hung target CPU).
> 
> [...]
> 
> Signed-off-by: Mark Rutland <mark.rutland@arm.com>
> Cc: Kees Cook <keescook@chromium.org>
> Cc: Paul E. McKenney <paulmck@kernel.org>

Acked-by: Paul E. McKenney <paulmck@kernel.org>

> [...]
