* net: cxgb4: Call Trace reported with PREEMPT_RT: BUG: using smp_processor_id() in preemptible [00000000] code: ethtool/78718
@ 2024-04-23 4:10 John B. Wyatt IV
2024-04-23 15:03 ` Luis Claudio R. Goncalves
2024-04-29 9:11 ` [PATCH net] cxgb4: Properly lock TX queue for the selftest Sebastian Andrzej Siewior
0 siblings, 2 replies; 4+ messages in thread
From: John B. Wyatt IV @ 2024-04-23 4:10 UTC (permalink / raw
To: Raju Rangoju, Sebastian Andrzej Siewior
Cc: Juri Lelli, Clark Williams, netdev, kernel-rts-sst, LKML
Hello Raju, Hello Sebastian,
Red Hat QE found this issue with cxgb4 only when the kernel has PREEMPT_RT set
with the preempt-rt patchset:
git.kernel.org/pub/scm/linux/kernel/git/rt/linux-rt-devel.git
We are also seeing this in the Real-time builds of RHEL9 and 8.
The specific build is an internal build that was pulled from the mirror Clark
Williams setup for Fedora and RHEL testing.
https://gitlab.com/cki-project/kernel-ark/-/tree/os-build-rt?ref_type=heads
We use the branch: os-build-rt
I was unable to find the cause of this and I thought I should report it.
Please let me if you have any questions or you need any testing done.
Call trace is below:
kernel-rt-6.9.0-0.rc4.f8dba31b0a82.38.test.eln136.x86_64
BUG: using smp_processor_id() in preemptible [00000000] code: ethtool/78718
caller is cxgb4_selftest_lb_pkt+0x3d/0x3a0 cxgb4
Hardware name: Dell Inc. PowerEdge R750/0WT8Y6, BIOS 1.5.4 12/17/2021
Call Trace:
<TASK>
dump_stack_lvl (lib/dump_stack.c:116)
check_preemption_disabled (lib/smp_processor_id.c:49)
cxgb4_selftest_lb_pkt+0x3d/0x3a0 cxgb4
cxgb4_self_test+0x8f/0xe0 cxgb4
ethtool_self_test (net/ethtool/ioctl.c:2002)
__dev_ethtool (net/ethtool/ioctl.c:2997)
? migrate_enable (./include/linux/preempt.h:480 (discriminator 3) ./include/linux/preempt.h:480 (discriminator 3) kernel/sched/core.c:2472 (discriminator 3))
? kmalloc_trace (./arch/x86/include/asm/jump_label.h:55 ./include/linux/memcontrol.h:1839 mm/slub.c:1980 mm/slub.c:3807 mm/slub.c:3845 mm/slub.c:3992)
dev_ethtool (net/ethtool/ioctl.c:3177)
dev_ioctl (net/core/dev_ioctl.c:724)
sock_do_ioctl (net/socket.c:1236)
__x64_sys_ioctl (fs/ioctl.c:51 fs/ioctl.c:904 fs/ioctl.c:890 fs/ioctl.c:890)
do_syscall_64 (arch/x86/entry/common.c:52 (discriminator 1) arch/x86/entry/common.c:83 (discriminator 1))
? mod_objcg_state (mm/memcontrol.c:3421)
? migrate_enable (./include/linux/preempt.h:480 (discriminator 3) ./include/linux/preempt.h:480 (discriminator 3) kernel/sched/core.c:2472 (discriminator 3))
? try_charge_memcg (mm/memcontrol.c:2745)
? __mod_node_page_state (./include/linux/preempt.h:477 (discriminator 3) mm/vmstat.c:405 (discriminator 3))
? migrate_enable (./include/linux/preempt.h:480 (discriminator 3) ./include/linux/preempt.h:480 (discriminator 3) kernel/sched/core.c:2472 (discriminator 3))
? rt_spin_unlock (kernel/locking/rtmutex.c:230 (discriminator 5) kernel/locking/spinlock_rt.c:84 (discriminator 5))
? do_anonymous_page (./include/linux/pgtable.h:114 mm/memory.c:4490)
? __handle_mm_fault (mm/memory.c:3878 mm/memory.c:5300 mm/memory.c:5441)
? syscall_exit_to_user_mode (kernel/entry/common.c:221)
? __count_memcg_events (./include/linux/preempt.h:477 (discriminator 3) mm/memcontrol.c:704 (discriminator 3) mm/memcontrol.c:963 (discriminator 3))
? handle_mm_fault (mm/memory.c:5483 mm/memory.c:5622)
? do_user_addr_fault (arch/x86/mm/fault.c:1443 (discriminator 1))
? clear_bhb_loop (arch/x86/entry/entry_64.S:1539)
? clear_bhb_loop (arch/x86/entry/entry_64.S:1539)
entry_SYSCALL_64_after_hwframe (arch/x86/entry/entry_64.S:130)
RIP: 0033:0x7f55216c557b
Code: ff ff ff 85 c0 79 9b 49 c7 c4 ff ff ff ff 5b 5d 4c 89 e0 41 5c c3 66 0f 1f 84 00 00 00 00 00 f3 0f 1e fa b8 10 00 00 00 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d 75 68 0f 00 f7 d8 64 89 01 48
All code
========
0: ff (bad)
1: ff (bad)
2: ff 85 c0 79 9b 49 incl 0x499b79c0(%rbp)
8: c7 c4 ff ff ff ff mov $0xffffffff,%esp
e: 5b pop %rbx
f: 5d pop %rbp
10: 4c 89 e0 mov %r12,%rax
13: 41 5c pop %r12
15: c3 ret
16: 66 0f 1f 84 00 00 00 nopw 0x0(%rax,%rax,1)
1d: 00 00
1f: f3 0f 1e fa endbr64
23: b8 10 00 00 00 mov $0x10,%eax
28: 0f 05 syscall
2a:* 48 3d 01 f0 ff ff cmp $0xfffffffffffff001,%rax <-- trapping instruction
30: 73 01 jae 0x33
32: c3 ret
33: 48 8b 0d 75 68 0f 00 mov 0xf6875(%rip),%rcx # 0xf68af
3a: f7 d8 neg %eax
3c: 64 89 01 mov %eax,%fs:(%rcx)
3f: 48 rex.W
Code starting with the faulting instruction
===========================================
0: 48 3d 01 f0 ff ff cmp $0xfffffffffffff001,%rax
6: 73 01 jae 0x9
8: c3 ret
9: 48 8b 0d 75 68 0f 00 mov 0xf6875(%rip),%rcx # 0xf6885
10: f7 d8 neg %eax
12: 64 89 01 mov %eax,%fs:(%rcx)
15: 48 rex.W
RSP: 002b:00007ffd867a78f8 EFLAGS: 00000246 ORIG_RAX: 0000000000000010
RAX: ffffffffffffffda RBX: 00007ffd867a7980 RCX: 00007f55216c557b
RDX: 00007ffd867a7990 RSI: 0000000000008946 RDI: 0000000000000003
RBP: 0000556fe43632e0 R08: 0000000000000003 R09: 0000000000000001
R10: 0000000000000fff R11: 0000000000000246 R12: 0000556fe43632a0
R13: 0000000000000018 R14: 0000000000000001 R15: 0000000000000000
</TASK>
--
Sincerely,
John Wyatt
Software Engineer, Core Kernel
Red Hat
^ permalink raw reply [flat|nested] 4+ messages in thread
* Re: net: cxgb4: Call Trace reported with PREEMPT_RT: BUG: using smp_processor_id() in preemptible [00000000] code: ethtool/78718
2024-04-23 4:10 net: cxgb4: Call Trace reported with PREEMPT_RT: BUG: using smp_processor_id() in preemptible [00000000] code: ethtool/78718 John B. Wyatt IV
@ 2024-04-23 15:03 ` Luis Claudio R. Goncalves
2024-04-29 9:11 ` [PATCH net] cxgb4: Properly lock TX queue for the selftest Sebastian Andrzej Siewior
1 sibling, 0 replies; 4+ messages in thread
From: Luis Claudio R. Goncalves @ 2024-04-23 15:03 UTC (permalink / raw
To: John B. Wyatt IV
Cc: Raju Rangoju, Sebastian Andrzej Siewior, Juri Lelli,
Clark Williams, netdev, LKML
On Tue, Apr 23, 2024 at 12:10:10AM -0400, John B. Wyatt IV wrote:
> Hello Raju, Hello Sebastian,
>
> Red Hat QE found this issue with cxgb4 only when the kernel has PREEMPT_RT set
> with the preempt-rt patchset:
>
> git.kernel.org/pub/scm/linux/kernel/git/rt/linux-rt-devel.git
>
> We are also seeing this in the Real-time builds of RHEL9 and 8.
>
> The specific build is an internal build that was pulled from the mirror Clark
> Williams setup for Fedora and RHEL testing.
>
> https://gitlab.com/cki-project/kernel-ark/-/tree/os-build-rt?ref_type=heads
>
> We use the branch: os-build-rt
>
> I was unable to find the cause of this and I thought I should report it.
>
> Please let me if you have any questions or you need any testing done.
>
> Call trace is below:
>
> kernel-rt-6.9.0-0.rc4.f8dba31b0a82.38.test.eln136.x86_64
> BUG: using smp_processor_id() in preemptible [00000000] code: ethtool/78718
> caller is cxgb4_selftest_lb_pkt+0x3d/0x3a0 cxgb4
> Hardware name: Dell Inc. PowerEdge R750/0WT8Y6, BIOS 1.5.4 12/17/2021
> Call Trace:
> <TASK>
> dump_stack_lvl (lib/dump_stack.c:116)
> check_preemption_disabled (lib/smp_processor_id.c:49)
> cxgb4_selftest_lb_pkt+0x3d/0x3a0 cxgb4
> cxgb4_self_test+0x8f/0xe0 cxgb4
> ethtool_self_test (net/ethtool/ioctl.c:2002)
> __dev_ethtool (net/ethtool/ioctl.c:2997)
Hi John,
The patch below is untested but should fix the problem you reported:
======
cxgb4: fix smp_processor_id() usage in selftests
When PREEMPT_RT is enabled the following call can result in a "BUG: using
smp_processor_id() in preemptible [00000000] code: ethtool/xxxx" error
message:
ethtool_self_test()
cxgb4_self_test()
cxgb4_selftest_lb_pkt()
__netif_tx_lock(q->txq, smp_processor_id()); <--- BOOM
Replacing smp_processor_id() by raw_smp_processor_id() is safe in this
context given that __netif_tx_lock() is an inline function that takes a
spinlock and then uses the cpu value.
Signed-off-by: Luis Claudio R. Goncalves <lgoncalv@redhat.com>
diff --git a/drivers/net/ethernet/chelsio/cxgb4/sge.c b/drivers/net/ethernet/chelsio/cxgb4/sge.c
index 1948b7bf99661..803dc62a4db04 100644
--- a/drivers/net/ethernet/chelsio/cxgb4/sge.c
+++ b/drivers/net/ethernet/chelsio/cxgb4/sge.c
@@ -2674,7 +2674,7 @@ int cxgb4_selftest_lb_pkt(struct net_device *netdev)
lb->loopback = 1;
q = &adap->sge.ethtxq[pi->first_qset];
- __netif_tx_lock(q->txq, smp_processor_id());
+ __netif_tx_lock(q->txq, raw_smp_processor_id());
reclaim_completed_tx(adap, &q->q, -1, true);
credits = txq_avail(&q->q) - ndesc;
^ permalink raw reply related [flat|nested] 4+ messages in thread
* [PATCH net] cxgb4: Properly lock TX queue for the selftest.
2024-04-23 4:10 net: cxgb4: Call Trace reported with PREEMPT_RT: BUG: using smp_processor_id() in preemptible [00000000] code: ethtool/78718 John B. Wyatt IV
2024-04-23 15:03 ` Luis Claudio R. Goncalves
@ 2024-04-29 9:11 ` Sebastian Andrzej Siewior
2024-05-01 1:40 ` patchwork-bot+netdevbpf
1 sibling, 1 reply; 4+ messages in thread
From: Sebastian Andrzej Siewior @ 2024-04-29 9:11 UTC (permalink / raw
To: netdev
Cc: John B. Wyatt IV, Raju Rangoju, Juri Lelli, Clark Williams,
Luis Claudio R. Goncalves, David S. Miller, Eric Dumazet,
Jakub Kicinski, Paolo Abeni, Thomas Gleixner
The selftest for the driver sends a dummy packet and checks if the
packet will be received properly as it should be. The regular TX path
and the selftest can use the same network queue so locking is required
and was missing in the selftest path. This was addressed in the commit
cited below.
Unfortunately locking the TX queue requires BH to be disabled which is
not the case in selftest path which is invoked in process context.
Lockdep should be complaining about this.
Use __netif_tx_lock_bh() for TX queue locking.
Fixes: c650e04898072 ("cxgb4: Fix race between loopback and normal Tx path")
Reported-by: "John B. Wyatt IV" <jwyatt@redhat.com>
Closes: https://lore.kernel.org/all/Zic0ot5aGgR-V4Ks@thinkpad2021/
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
---
drivers/net/ethernet/chelsio/cxgb4/sge.c | 6 +++---
1 file changed, 3 insertions(+), 3 deletions(-)
diff --git a/drivers/net/ethernet/chelsio/cxgb4/sge.c b/drivers/net/ethernet/chelsio/cxgb4/sge.c
index 49d5808b7d11d..de52bcb884c41 100644
--- a/drivers/net/ethernet/chelsio/cxgb4/sge.c
+++ b/drivers/net/ethernet/chelsio/cxgb4/sge.c
@@ -2670,12 +2670,12 @@ int cxgb4_selftest_lb_pkt(struct net_device *netdev)
lb->loopback = 1;
q = &adap->sge.ethtxq[pi->first_qset];
- __netif_tx_lock(q->txq, smp_processor_id());
+ __netif_tx_lock_bh(q->txq);
reclaim_completed_tx(adap, &q->q, -1, true);
credits = txq_avail(&q->q) - ndesc;
if (unlikely(credits < 0)) {
- __netif_tx_unlock(q->txq);
+ __netif_tx_unlock_bh(q->txq);
return -ENOMEM;
}
@@ -2710,7 +2710,7 @@ int cxgb4_selftest_lb_pkt(struct net_device *netdev)
init_completion(&lb->completion);
txq_advance(&q->q, ndesc);
cxgb4_ring_tx_db(adap, &q->q, ndesc);
- __netif_tx_unlock(q->txq);
+ __netif_tx_unlock_bh(q->txq);
/* wait for the pkt to return */
ret = wait_for_completion_timeout(&lb->completion, 10 * HZ);
--
2.43.0
^ permalink raw reply related [flat|nested] 4+ messages in thread
* Re: [PATCH net] cxgb4: Properly lock TX queue for the selftest.
2024-04-29 9:11 ` [PATCH net] cxgb4: Properly lock TX queue for the selftest Sebastian Andrzej Siewior
@ 2024-05-01 1:40 ` patchwork-bot+netdevbpf
0 siblings, 0 replies; 4+ messages in thread
From: patchwork-bot+netdevbpf @ 2024-05-01 1:40 UTC (permalink / raw
To: Sebastian Andrzej Siewior
Cc: netdev, jwyatt, rajur, jlelli, williams, lgoncalv, davem,
edumazet, kuba, pabeni, tglx
Hello:
This patch was applied to netdev/net.git (main)
by Jakub Kicinski <kuba@kernel.org>:
On Mon, 29 Apr 2024 11:11:47 +0200 you wrote:
> The selftest for the driver sends a dummy packet and checks if the
> packet will be received properly as it should be. The regular TX path
> and the selftest can use the same network queue so locking is required
> and was missing in the selftest path. This was addressed in the commit
> cited below.
> Unfortunately locking the TX queue requires BH to be disabled which is
> not the case in selftest path which is invoked in process context.
> Lockdep should be complaining about this.
>
> [...]
Here is the summary with links:
- [net] cxgb4: Properly lock TX queue for the selftest.
https://git.kernel.org/netdev/net/c/9067eccdd784
You are awesome, thank you!
--
Deet-doot-dot, I am a bot.
https://korg.docs.kernel.org/patchwork/pwbot.html
^ permalink raw reply [flat|nested] 4+ messages in thread
end of thread, other threads:[~2024-05-01 1:40 UTC | newest]
Thread overview: 4+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2024-04-23 4:10 net: cxgb4: Call Trace reported with PREEMPT_RT: BUG: using smp_processor_id() in preemptible [00000000] code: ethtool/78718 John B. Wyatt IV
2024-04-23 15:03 ` Luis Claudio R. Goncalves
2024-04-29 9:11 ` [PATCH net] cxgb4: Properly lock TX queue for the selftest Sebastian Andrzej Siewior
2024-05-01 1:40 ` patchwork-bot+netdevbpf
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.