All the mail mirrored from lore.kernel.org
 help / color / mirror / Atom feed
* net: cxgb4: Call Trace reported with PREEMPT_RT: BUG: using smp_processor_id() in preemptible [00000000] code: ethtool/78718
@ 2024-04-23  4:10 John B. Wyatt IV
  2024-04-23 15:03 ` Luis Claudio R. Goncalves
  2024-04-29  9:11 ` [PATCH net] cxgb4: Properly lock TX queue for the selftest Sebastian Andrzej Siewior
  0 siblings, 2 replies; 4+ messages in thread
From: John B. Wyatt IV @ 2024-04-23  4:10 UTC (permalink / raw
  To: Raju Rangoju, Sebastian Andrzej Siewior
  Cc: Juri Lelli, Clark Williams, netdev, kernel-rts-sst, LKML

Hello Raju, Hello Sebastian,

Red Hat QE found this issue with cxgb4 only when the kernel has PREEMPT_RT set
with the preempt-rt patchset:

git.kernel.org/pub/scm/linux/kernel/git/rt/linux-rt-devel.git

We are also seeing this in the Real-time builds of RHEL9 and 8.

The specific build is an internal build that was pulled from the mirror Clark
Williams setup for Fedora and RHEL testing.

https://gitlab.com/cki-project/kernel-ark/-/tree/os-build-rt?ref_type=heads

We use the branch: os-build-rt

I was unable to find the cause of this and I thought I should report it.

Please let me if you have any questions or you need any testing done.

Call trace is below:

kernel-rt-6.9.0-0.rc4.f8dba31b0a82.38.test.eln136.x86_64
 BUG: using smp_processor_id() in preemptible [00000000] code: ethtool/78718
 caller is cxgb4_selftest_lb_pkt+0x3d/0x3a0 cxgb4
 Hardware name: Dell Inc. PowerEdge R750/0WT8Y6, BIOS 1.5.4 12/17/2021
 Call Trace:
  <TASK>
 dump_stack_lvl (lib/dump_stack.c:116) 
 check_preemption_disabled (lib/smp_processor_id.c:49) 
 cxgb4_selftest_lb_pkt+0x3d/0x3a0 cxgb4
 cxgb4_self_test+0x8f/0xe0 cxgb4
 ethtool_self_test (net/ethtool/ioctl.c:2002) 
 __dev_ethtool (net/ethtool/ioctl.c:2997) 
 ? migrate_enable (./include/linux/preempt.h:480 (discriminator 3) ./include/linux/preempt.h:480 (discriminator 3) kernel/sched/core.c:2472 (discriminator 3)) 
 ? kmalloc_trace (./arch/x86/include/asm/jump_label.h:55 ./include/linux/memcontrol.h:1839 mm/slub.c:1980 mm/slub.c:3807 mm/slub.c:3845 mm/slub.c:3992) 
 dev_ethtool (net/ethtool/ioctl.c:3177) 
 dev_ioctl (net/core/dev_ioctl.c:724) 
 sock_do_ioctl (net/socket.c:1236) 
 __x64_sys_ioctl (fs/ioctl.c:51 fs/ioctl.c:904 fs/ioctl.c:890 fs/ioctl.c:890) 
 do_syscall_64 (arch/x86/entry/common.c:52 (discriminator 1) arch/x86/entry/common.c:83 (discriminator 1)) 
 ? mod_objcg_state (mm/memcontrol.c:3421) 
 ? migrate_enable (./include/linux/preempt.h:480 (discriminator 3) ./include/linux/preempt.h:480 (discriminator 3) kernel/sched/core.c:2472 (discriminator 3)) 
 ? try_charge_memcg (mm/memcontrol.c:2745) 
 ? __mod_node_page_state (./include/linux/preempt.h:477 (discriminator 3) mm/vmstat.c:405 (discriminator 3)) 
 ? migrate_enable (./include/linux/preempt.h:480 (discriminator 3) ./include/linux/preempt.h:480 (discriminator 3) kernel/sched/core.c:2472 (discriminator 3)) 
 ? rt_spin_unlock (kernel/locking/rtmutex.c:230 (discriminator 5) kernel/locking/spinlock_rt.c:84 (discriminator 5)) 
 ? do_anonymous_page (./include/linux/pgtable.h:114 mm/memory.c:4490) 
 ? __handle_mm_fault (mm/memory.c:3878 mm/memory.c:5300 mm/memory.c:5441) 
 ? syscall_exit_to_user_mode (kernel/entry/common.c:221) 
 ? __count_memcg_events (./include/linux/preempt.h:477 (discriminator 3) mm/memcontrol.c:704 (discriminator 3) mm/memcontrol.c:963 (discriminator 3)) 
 ? handle_mm_fault (mm/memory.c:5483 mm/memory.c:5622) 
 ? do_user_addr_fault (arch/x86/mm/fault.c:1443 (discriminator 1)) 
 ? clear_bhb_loop (arch/x86/entry/entry_64.S:1539) 
 ? clear_bhb_loop (arch/x86/entry/entry_64.S:1539) 
 entry_SYSCALL_64_after_hwframe (arch/x86/entry/entry_64.S:130) 
 RIP: 0033:0x7f55216c557b
 Code: ff ff ff 85 c0 79 9b 49 c7 c4 ff ff ff ff 5b 5d 4c 89 e0 41 5c c3 66 0f 1f 84 00 00 00 00 00 f3 0f 1e fa b8 10 00 00 00 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d 75 68 0f 00 f7 d8 64 89 01 48
All code
========
   0:	ff                   	(bad)
   1:	ff                   	(bad)
   2:	ff 85 c0 79 9b 49    	incl   0x499b79c0(%rbp)
   8:	c7 c4 ff ff ff ff    	mov    $0xffffffff,%esp
   e:	5b                   	pop    %rbx
   f:	5d                   	pop    %rbp
  10:	4c 89 e0             	mov    %r12,%rax
  13:	41 5c                	pop    %r12
  15:	c3                   	ret
  16:	66 0f 1f 84 00 00 00 	nopw   0x0(%rax,%rax,1)
  1d:	00 00 
  1f:	f3 0f 1e fa          	endbr64
  23:	b8 10 00 00 00       	mov    $0x10,%eax
  28:	0f 05                	syscall
  2a:*	48 3d 01 f0 ff ff    	cmp    $0xfffffffffffff001,%rax		<-- trapping instruction
  30:	73 01                	jae    0x33
  32:	c3                   	ret
  33:	48 8b 0d 75 68 0f 00 	mov    0xf6875(%rip),%rcx        # 0xf68af
  3a:	f7 d8                	neg    %eax
  3c:	64 89 01             	mov    %eax,%fs:(%rcx)
  3f:	48                   	rex.W

Code starting with the faulting instruction
===========================================
   0:	48 3d 01 f0 ff ff    	cmp    $0xfffffffffffff001,%rax
   6:	73 01                	jae    0x9
   8:	c3                   	ret
   9:	48 8b 0d 75 68 0f 00 	mov    0xf6875(%rip),%rcx        # 0xf6885
  10:	f7 d8                	neg    %eax
  12:	64 89 01             	mov    %eax,%fs:(%rcx)
  15:	48                   	rex.W
 RSP: 002b:00007ffd867a78f8 EFLAGS: 00000246 ORIG_RAX: 0000000000000010
 RAX: ffffffffffffffda RBX: 00007ffd867a7980 RCX: 00007f55216c557b
 RDX: 00007ffd867a7990 RSI: 0000000000008946 RDI: 0000000000000003
 RBP: 0000556fe43632e0 R08: 0000000000000003 R09: 0000000000000001
 R10: 0000000000000fff R11: 0000000000000246 R12: 0000556fe43632a0
 R13: 0000000000000018 R14: 0000000000000001 R15: 0000000000000000
  </TASK>

-- 
Sincerely,
John Wyatt
Software Engineer, Core Kernel
Red Hat


^ permalink raw reply	[flat|nested] 4+ messages in thread

* Re: net: cxgb4: Call Trace reported with PREEMPT_RT: BUG: using smp_processor_id() in preemptible [00000000] code: ethtool/78718
  2024-04-23  4:10 net: cxgb4: Call Trace reported with PREEMPT_RT: BUG: using smp_processor_id() in preemptible [00000000] code: ethtool/78718 John B. Wyatt IV
@ 2024-04-23 15:03 ` Luis Claudio R. Goncalves
  2024-04-29  9:11 ` [PATCH net] cxgb4: Properly lock TX queue for the selftest Sebastian Andrzej Siewior
  1 sibling, 0 replies; 4+ messages in thread
From: Luis Claudio R. Goncalves @ 2024-04-23 15:03 UTC (permalink / raw
  To: John B. Wyatt IV
  Cc: Raju Rangoju, Sebastian Andrzej Siewior, Juri Lelli,
	Clark Williams, netdev, LKML

On Tue, Apr 23, 2024 at 12:10:10AM -0400, John B. Wyatt IV wrote:
> Hello Raju, Hello Sebastian,
> 
> Red Hat QE found this issue with cxgb4 only when the kernel has PREEMPT_RT set
> with the preempt-rt patchset:
> 
> git.kernel.org/pub/scm/linux/kernel/git/rt/linux-rt-devel.git
> 
> We are also seeing this in the Real-time builds of RHEL9 and 8.
> 
> The specific build is an internal build that was pulled from the mirror Clark
> Williams setup for Fedora and RHEL testing.
> 
> https://gitlab.com/cki-project/kernel-ark/-/tree/os-build-rt?ref_type=heads
> 
> We use the branch: os-build-rt
> 
> I was unable to find the cause of this and I thought I should report it.
> 
> Please let me if you have any questions or you need any testing done.
> 
> Call trace is below:
> 
> kernel-rt-6.9.0-0.rc4.f8dba31b0a82.38.test.eln136.x86_64
>  BUG: using smp_processor_id() in preemptible [00000000] code: ethtool/78718
>  caller is cxgb4_selftest_lb_pkt+0x3d/0x3a0 cxgb4
>  Hardware name: Dell Inc. PowerEdge R750/0WT8Y6, BIOS 1.5.4 12/17/2021
>  Call Trace:
>   <TASK>
>  dump_stack_lvl (lib/dump_stack.c:116) 
>  check_preemption_disabled (lib/smp_processor_id.c:49) 
>  cxgb4_selftest_lb_pkt+0x3d/0x3a0 cxgb4
>  cxgb4_self_test+0x8f/0xe0 cxgb4
>  ethtool_self_test (net/ethtool/ioctl.c:2002) 
>  __dev_ethtool (net/ethtool/ioctl.c:2997) 

Hi John,

The patch below is untested but should fix the problem you reported:

======

cxgb4: fix smp_processor_id() usage in selftests

When PREEMPT_RT is enabled the following call can result in a "BUG: using
smp_processor_id() in preemptible [00000000] code: ethtool/xxxx" error
message:

    ethtool_self_test()
        cxgb4_self_test()
            cxgb4_selftest_lb_pkt()
                __netif_tx_lock(q->txq, smp_processor_id());  <--- BOOM

Replacing smp_processor_id() by raw_smp_processor_id() is safe in this
context given that __netif_tx_lock() is an inline function that takes a
spinlock and then uses the cpu value.

Signed-off-by: Luis Claudio R. Goncalves <lgoncalv@redhat.com>

diff --git a/drivers/net/ethernet/chelsio/cxgb4/sge.c b/drivers/net/ethernet/chelsio/cxgb4/sge.c
index 1948b7bf99661..803dc62a4db04 100644
--- a/drivers/net/ethernet/chelsio/cxgb4/sge.c
+++ b/drivers/net/ethernet/chelsio/cxgb4/sge.c
@@ -2674,7 +2674,7 @@ int cxgb4_selftest_lb_pkt(struct net_device *netdev)
 	lb->loopback = 1;
 
 	q = &adap->sge.ethtxq[pi->first_qset];
-	__netif_tx_lock(q->txq, smp_processor_id());
+	__netif_tx_lock(q->txq, raw_smp_processor_id());
 
 	reclaim_completed_tx(adap, &q->q, -1, true);
 	credits = txq_avail(&q->q) - ndesc;



^ permalink raw reply related	[flat|nested] 4+ messages in thread

* [PATCH net] cxgb4: Properly lock TX queue for the selftest.
  2024-04-23  4:10 net: cxgb4: Call Trace reported with PREEMPT_RT: BUG: using smp_processor_id() in preemptible [00000000] code: ethtool/78718 John B. Wyatt IV
  2024-04-23 15:03 ` Luis Claudio R. Goncalves
@ 2024-04-29  9:11 ` Sebastian Andrzej Siewior
  2024-05-01  1:40   ` patchwork-bot+netdevbpf
  1 sibling, 1 reply; 4+ messages in thread
From: Sebastian Andrzej Siewior @ 2024-04-29  9:11 UTC (permalink / raw
  To: netdev
  Cc: John B. Wyatt IV, Raju Rangoju, Juri Lelli, Clark Williams,
	Luis Claudio R. Goncalves, David S. Miller, Eric Dumazet,
	Jakub Kicinski, Paolo Abeni, Thomas Gleixner

The selftest for the driver sends a dummy packet and checks if the
packet will be received properly as it should be. The regular TX path
and the selftest can use the same network queue so locking is required
and was missing in the selftest path. This was addressed in the commit
cited below.
Unfortunately locking the TX queue requires BH to be disabled which is
not the case in selftest path which is invoked in process context.
Lockdep should be complaining about this.

Use __netif_tx_lock_bh() for TX queue locking.

Fixes: c650e04898072 ("cxgb4: Fix race between loopback and normal Tx path")
Reported-by: "John B. Wyatt IV" <jwyatt@redhat.com>
Closes: https://lore.kernel.org/all/Zic0ot5aGgR-V4Ks@thinkpad2021/
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
---
 drivers/net/ethernet/chelsio/cxgb4/sge.c | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/drivers/net/ethernet/chelsio/cxgb4/sge.c b/drivers/net/ethernet/chelsio/cxgb4/sge.c
index 49d5808b7d11d..de52bcb884c41 100644
--- a/drivers/net/ethernet/chelsio/cxgb4/sge.c
+++ b/drivers/net/ethernet/chelsio/cxgb4/sge.c
@@ -2670,12 +2670,12 @@ int cxgb4_selftest_lb_pkt(struct net_device *netdev)
 	lb->loopback = 1;
 
 	q = &adap->sge.ethtxq[pi->first_qset];
-	__netif_tx_lock(q->txq, smp_processor_id());
+	__netif_tx_lock_bh(q->txq);
 
 	reclaim_completed_tx(adap, &q->q, -1, true);
 	credits = txq_avail(&q->q) - ndesc;
 	if (unlikely(credits < 0)) {
-		__netif_tx_unlock(q->txq);
+		__netif_tx_unlock_bh(q->txq);
 		return -ENOMEM;
 	}
 
@@ -2710,7 +2710,7 @@ int cxgb4_selftest_lb_pkt(struct net_device *netdev)
 	init_completion(&lb->completion);
 	txq_advance(&q->q, ndesc);
 	cxgb4_ring_tx_db(adap, &q->q, ndesc);
-	__netif_tx_unlock(q->txq);
+	__netif_tx_unlock_bh(q->txq);
 
 	/* wait for the pkt to return */
 	ret = wait_for_completion_timeout(&lb->completion, 10 * HZ);
-- 
2.43.0

^ permalink raw reply related	[flat|nested] 4+ messages in thread

* Re: [PATCH net] cxgb4: Properly lock TX queue for the selftest.
  2024-04-29  9:11 ` [PATCH net] cxgb4: Properly lock TX queue for the selftest Sebastian Andrzej Siewior
@ 2024-05-01  1:40   ` patchwork-bot+netdevbpf
  0 siblings, 0 replies; 4+ messages in thread
From: patchwork-bot+netdevbpf @ 2024-05-01  1:40 UTC (permalink / raw
  To: Sebastian Andrzej Siewior
  Cc: netdev, jwyatt, rajur, jlelli, williams, lgoncalv, davem,
	edumazet, kuba, pabeni, tglx

Hello:

This patch was applied to netdev/net.git (main)
by Jakub Kicinski <kuba@kernel.org>:

On Mon, 29 Apr 2024 11:11:47 +0200 you wrote:
> The selftest for the driver sends a dummy packet and checks if the
> packet will be received properly as it should be. The regular TX path
> and the selftest can use the same network queue so locking is required
> and was missing in the selftest path. This was addressed in the commit
> cited below.
> Unfortunately locking the TX queue requires BH to be disabled which is
> not the case in selftest path which is invoked in process context.
> Lockdep should be complaining about this.
> 
> [...]

Here is the summary with links:
  - [net] cxgb4: Properly lock TX queue for the selftest.
    https://git.kernel.org/netdev/net/c/9067eccdd784

You are awesome, thank you!
-- 
Deet-doot-dot, I am a bot.
https://korg.docs.kernel.org/patchwork/pwbot.html



^ permalink raw reply	[flat|nested] 4+ messages in thread

end of thread, other threads:[~2024-05-01  1:40 UTC | newest]

Thread overview: 4+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2024-04-23  4:10 net: cxgb4: Call Trace reported with PREEMPT_RT: BUG: using smp_processor_id() in preemptible [00000000] code: ethtool/78718 John B. Wyatt IV
2024-04-23 15:03 ` Luis Claudio R. Goncalves
2024-04-29  9:11 ` [PATCH net] cxgb4: Properly lock TX queue for the selftest Sebastian Andrzej Siewior
2024-05-01  1:40   ` patchwork-bot+netdevbpf

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.