From: Michael Roth <michael.roth@amd.com>
To: Mirsad Todorovac <mirsad.todorovac@alu.unizg.hr>
Cc: Borislav Petkov <bp@alien8.de>, <x86@kernel.org>,
	Sean Christopherson <seanjc@google.com>,
	Paolo Bonzini <pbonzini@redhat.com>,
	Thomas Gleixner <tglx@linutronix.de>,
	Ingo Molnar <mingo@redhat.com>,
	Dave Hansen <dave.hansen@linux.intel.com>,
	"H. Peter Anvin" <hpa@zytor.com>, <linux-kernel@vger.kernel.org>,
	Fuad Tabba <tabba@google.com>, Marc Zyngier <maz@kernel.org>,
	Shaoqin Huang <shahuang@redhat.com>,
	David Matlack <dmatlack@google.com>,
	Josh Poimboeuf <jpoimboe@kernel.org>,
	Pawan Gupta <pawan.kumar.gupta@linux.intel.com>,
	Peter Zijlstra <peterz@infradead.org>,
	Breno Leitao <leitao@debian.org>, <kvm@vger.kernel.org>,
	Ben Gardon <bgardon@google.com>
Subject: Re: [BUG net-next] arch/x86/kernel/cpu/bugs.c:2935: "Unpatched return thunk in use. This should not happen!" [STACKTRACE]
Date: Thu, 28 Mar 2024 07:38:30 -0500
Message-ID: <20240328123830.dma3nnmmlb7r52ic@amd.com>
In-Reply-To: <8fc784c2-2aad-4d1d-ba0f-e5ab69d28ec5@alu.unizg.hr>

On Tue, Mar 26, 2024 at 08:15:12PM +0100, Mirsad Todorovac wrote:
> On 3/26/24 11:16, Borislav Petkov wrote:
> > On Wed, Mar 20, 2024 at 02:28:57AM +0100, Mirsad Todorovac wrote:
> > > Please find the kernel .config attached.
> > 
> > Thanks, that's one huuuge kernel you're building. :)
> > 
> > > I got another one of these "Unpatched thunk" and it seems connected
> > > with selftest/kvm.
> > > 
> > > But running selftests/kvm one by one did not trigger the bug.
> > 
> > Which commands are you exactly running?
> > 
> > I'll try to reproduce here.
> 
> I think I have a reproducer here on the latest torvalds vanilla tree (on an Ubuntu 22.04 LTS box):
> 
> root# tools/testing/selftests/kvm/x86_64/nx_huge_pages_test.sh
> Running test with CAP_SYS_BOOT enabled
> Running as root, skipping nx_huge_pages_test with CAP_SYS_BOOT disabled
> root# git describe
> v6.9-rc1-5-g928a87efa423
> root#

I'm seeing it pretty consistently on kvm/next as well. Not sure if
there's anything special about my config but starting a fairly basic
SVM guest seems to be enough to trigger it for me on the first
invocation of svm_vcpu_run().

There seem to be two call sites, one inside:

  amd_clear_divider()

and another inside:

  __svm_vcpu_run()

which seems to match up with the decoded stack you posted here. Maybe
the first case would be easiest to focus on? It's a fairly
straightforward use of ALTERNATIVE():

  void noinstr amd_clear_divider(void)
  {
          asm volatile(ALTERNATIVE("", "div %2\n\t", X86_BUG_DIV0)
                       :: "a" (0), "d" (0), "r" (1));
  }
  EXPORT_SYMBOL_GPL(amd_clear_divider);

and it's been that way since before 4461438a84 ("x86/retpoline: Ensure
default return thunk isn't used at runtime") was added. Not sure if
anything else has changed under the covers since 4461438a84.
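
For readers less familiar with the mechanism, here is a toy user-space
model of the idea behind that commit: every return site starts out
pointing at a default thunk, boot-time patching is supposed to redirect
each site to the selected mitigation thunk, and the default thunk warns
once if it is ever reached afterwards. This is only a sketch under
assumptions. All names below (return_site, patch_return_sites(), the
"listed" flag) are hypothetical and are not kernel APIs, and the model
does not attempt to explain why the two call sites in this thread were
left unpatched.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>

/* Toy user-space model; every name here is hypothetical, not a kernel API. */

typedef void (*thunk_fn)(void);

static bool warned;       /* models a warn-once flag */
static int  warn_count;   /* number of times the warning was printed */
static int  safe_returns; /* returns that went through the patched thunk */

/* Default thunk: should never run once patching is complete. */
static void default_return_thunk(void)
{
	if (!warned) {
		warned = true;
		warn_count++;
		fprintf(stderr, "Unpatched return thunk in use (toy model)\n");
	}
}

/* Stand-in for whichever mitigation thunk was selected at boot. */
static void mitigated_return_thunk(void)
{
	safe_returns++;
}

struct return_site {
	thunk_fn target; /* where this site currently "returns" through */
	bool listed;     /* assumption: the patcher only sees listed sites */
};

/* Redirect every listed site to repl; returns how many were patched. */
static size_t patch_return_sites(struct return_site *sites, size_t n,
				 thunk_fn repl)
{
	size_t patched = 0;

	for (size_t i = 0; i < n; i++) {
		if (!sites[i].listed)
			continue; /* a missed site keeps the default thunk */
		sites[i].target = repl;
		patched++;
	}
	return patched;
}

/* Execute each site once, as if the surrounding code had returned. */
static void run_sites(const struct return_site *sites, size_t n)
{
	for (size_t i = 0; i < n; i++) {
		assert(sites[i].target != NULL);
		sites[i].target();
	}
}
```

In this model, a site that the patcher never sees keeps calling
default_return_thunk(), and the warning fires only on the first such
call, which is consistent with the splat showing up once per boot.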

-Mike

> 
> > Thx.
> 
> Not at all.
> 
> The stacktrace for the bug triggered by the above command was:
> 
> kernel: [  101.973612] ------------[ cut here ]------------
> kernel: [  101.973615] Unpatched return thunk in use. This should not happen!
> kernel: [  101.973618] WARNING: CPU: 1 PID: 3827 at arch/x86/kernel/cpu/bugs.c:2935 __warn_thunk (./arch/x86/kernel/cpu/bugs.c:2935 (discriminator 3))
> kernel: [  101.973625] Modules linked in: xfrm_user nf_tables nfnetlink nvme_fabrics binfmt_misc snd_hda_codec_realtek snd_hda_codec_generic snd_hda_scodec_component snd_hda_codec_hdmi snd_hda_intel snd_intel_dspcfg snd_intel_sdw_acpi snd_hda_codec intel_rapl_msr amd_atl snd_hda_core intel_rapl_common nls_iso8859_1 snd_hwdep snd_pcm edac_mce_amd amdgpu crct10dif_pclmul polyval_clmulni snd_seq_midi polyval_generic snd_seq_midi_event ghash_clmulni_intel sha512_ssse3 snd_rawmidi sha256_ssse3 amdxcp sha1_ssse3 drm_exec aesni_intel snd_seq gpu_sched crypto_simd drm_buddy cryptd drm_suballoc_helper drm_ttm_helper snd_seq_device joydev input_leds rapl ttm snd_timer wmi_bmof drm_display_helper cec snd drm_kms_helper k10temp ccp i2c_algo_bit soundcore mac_hid tcp_bbr msr parport_pc ppdev lp parport drm efi_pstore ip_tables x_tables autofs4 btrfs blake2b_generic xor raid6_pq hid_generic nvme r8169 xhci_pci ahci nvme_core crc32_pclmul i2c_piix4 xhci_pci_renesas nvme_auth realtek libahci video wmi gpio_amdpt
> kernel: [  101.973685] CPU: 1 PID: 3827 Comm: nx_huge_pages_t Not tainted 6.9.0-rc1-torv-00005-g928a87efa423-dirty #36
> kernel: [  101.973687] Hardware name: ASRock X670E PG Lightning/X670E PG Lightning, BIOS 1.21 04/26/2023
> kernel: [  101.973688] RIP: 0010:__warn_thunk (./arch/x86/kernel/cpu/bugs.c:2935 (discriminator 3))
> kernel: [ 101.973691] Code: 62 c5 1d 01 83 e3 01 74 0e 48 8b 5d f8 c9 31 f6 31 ff e9 be 98 3b 01 48 c7 c7 98 21 c1 bc c6 05 22 26 8d 02 01 e8 90 aa 07 00 <0f> 0b 48 8b 5d f8 c9 31 f6 31 ff e9 9b 98 3b 01 90 90 90 90 90 90
> All code
> ========
>    0:	62 c5 1d 01 83       	(bad)
>    5:	e3 01                	jrcxz  0x8
>    7:	74 0e                	je     0x17
>    9:	48 8b 5d f8          	mov    -0x8(%rbp),%rbx
>    d:	c9                   	leave
>    e:	31 f6                	xor    %esi,%esi
>   10:	31 ff                	xor    %edi,%edi
>   12:	e9 be 98 3b 01       	jmp    0x13b98d5
>   17:	48 c7 c7 98 21 c1 bc 	mov    $0xffffffffbcc12198,%rdi
>   1e:	c6 05 22 26 8d 02 01 	movb   $0x1,0x28d2622(%rip)        # 0x28d2647
>   25:	e8 90 aa 07 00       	call   0x7aaba
>   2a:*	0f 0b                	ud2    		<-- trapping instruction
>   2c:	48 8b 5d f8          	mov    -0x8(%rbp),%rbx
>   30:	c9                   	leave
>   31:	31 f6                	xor    %esi,%esi
>   33:	31 ff                	xor    %edi,%edi
>   35:	e9 9b 98 3b 01       	jmp    0x13b98d5
>   3a:	90                   	nop
>   3b:	90                   	nop
>   3c:	90                   	nop
>   3d:	90                   	nop
>   3e:	90                   	nop
>   3f:	90                   	nop
> 
> Code starting with the faulting instruction
> ===========================================
>    0:	0f 0b                	ud2
>    2:	48 8b 5d f8          	mov    -0x8(%rbp),%rbx
>    6:	c9                   	leave
>    7:	31 f6                	xor    %esi,%esi
>    9:	31 ff                	xor    %edi,%edi
>    b:	e9 9b 98 3b 01       	jmp    0x13b98ab
>   10:	90                   	nop
>   11:	90                   	nop
>   12:	90                   	nop
>   13:	90                   	nop
>   14:	90                   	nop
>   15:	90                   	nop
> kernel: [  101.973692] RSP: 0018:ffffbbd90580fc90 EFLAGS: 00010046
> kernel: [  101.973694] RAX: 0000000000000000 RBX: 0000000000000000 RCX: 0000000000000000
> kernel: [  101.973695] RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000000000000
> kernel: [  101.973696] RBP: ffffbbd90580fc98 R08: 0000000000000000 R09: 0000000000000000
> kernel: [  101.973697] R10: 0000000000000000 R11: 0000000000000000 R12: ffff9964e4b7d4f0
> kernel: [  101.973698] R13: 0000000000000000 R14: 0000000000000000 R15: ffff9964e4b7dc70
> kernel: [  101.973699] FS:  0000720b95372740(0000) GS:ffff9973d7a80000(0000) knlGS:0000000000000000
> kernel: [  101.973700] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> kernel: [  101.973701] CR2: 0000000000000000 CR3: 00000001aea6c000 CR4: 0000000000f50ef0
> kernel: [  101.973703] PKRU: 55555554
> kernel: [  101.973703] Call Trace:
> kernel: [  101.973704]  <TASK>
> kernel: [  101.973706] ? show_regs (./arch/x86/kernel/dumpstack.c:479)
> kernel: [  101.973709] ? __warn_thunk (./arch/x86/kernel/cpu/bugs.c:2935 (discriminator 3))
> kernel: [  101.973711] ? __warn (./kernel/panic.c:694)
> kernel: [  101.973713] ? __warn_thunk (./arch/x86/kernel/cpu/bugs.c:2935 (discriminator 3))
> kernel: [  101.973715] ? report_bug (./lib/bug.c:201 ./lib/bug.c:219)
> kernel: [  101.973718] ? irq_work_queue (./kernel/irq_work.c:119)
> kernel: [  101.973722] ? handle_bug (./arch/x86/kernel/traps.c:218)
> kernel: [  101.973725] ? exc_invalid_op (./arch/x86/kernel/traps.c:260 (discriminator 1))
> kernel: [  101.973727] ? asm_exc_invalid_op (././arch/x86/include/asm/idtentry.h:621)
> kernel: [  101.973731] ? __warn_thunk (./arch/x86/kernel/cpu/bugs.c:2935 (discriminator 3))
> kernel: [  101.973734] warn_thunk_thunk (./arch/x86/entry/entry.S:48)
> kernel: [  101.973738] svm_vcpu_enter_exit (././include/linux/kvm_host.h:547 ./arch/x86/kvm/svm/svm.c:4115)
> kernel: [  101.973740] svm_vcpu_run (././arch/x86/include/asm/cpufeature.h:171 ./arch/x86/kvm/svm/svm.c:4186)
> kernel: [  101.973744] kvm_arch_vcpu_ioctl_run (./arch/x86/kvm/x86.c:11008 ./arch/x86/kvm/x86.c:11211 ./arch/x86/kvm/x86.c:11437)
> kernel: [  101.973747] ? srso_alias_return_thunk (./arch/x86/lib/retpoline.S:181)
> kernel: [  101.973750] ? srso_alias_return_thunk (./arch/x86/lib/retpoline.S:181)
> kernel: [  101.973752] ? kvm_vm_stats_read (./arch/x86/kvm/../../../virt/kvm/kvm_main.c:5066)
> kernel: [  101.973755] kvm_vcpu_ioctl (./arch/x86/kvm/../../../virt/kvm/kvm_main.c:4464)
> kernel: [  101.973757] ? srso_alias_return_thunk (./arch/x86/lib/retpoline.S:181)
> kernel: [  101.973759] ? trace_hardirqs_on_prepare (./kernel/trace/trace_preemptirq.c:47 ./kernel/trace/trace_preemptirq.c:42)
> kernel: [  101.973761] ? srso_alias_return_thunk (./arch/x86/lib/retpoline.S:181)
> kernel: [  101.973763] ? syscall_exit_to_user_mode (./kernel/entry/common.c:221)
> kernel: [  101.973765] ? srso_alias_return_thunk (./arch/x86/lib/retpoline.S:181)
> kernel: [  101.973767] ? do_syscall_64 (././arch/x86/include/asm/cpufeature.h:171 ./arch/x86/entry/common.c:98)
> kernel: [  101.973770] __x64_sys_ioctl (./fs/ioctl.c:51 ./fs/ioctl.c:904 ./fs/ioctl.c:890 ./fs/ioctl.c:890)
> kernel: [  101.973773] do_syscall_64 (./arch/x86/entry/common.c:52 ./arch/x86/entry/common.c:83)
> kernel: [  101.973775] ? srso_alias_return_thunk (./arch/x86/lib/retpoline.S:181)
> kernel: [  101.973777] ? irqentry_exit (./kernel/entry/common.c:367)
> kernel: [  101.973778] ? srso_alias_return_thunk (./arch/x86/lib/retpoline.S:181)
> kernel: [  101.973780] entry_SYSCALL_64_after_hwframe (./arch/x86/entry/entry_64.S:129)
> kernel: [  101.973782] RIP: 0033:0x720b9511a94f
> kernel: [ 101.973798] Code: 00 48 89 44 24 18 31 c0 48 8d 44 24 60 c7 04 24 10 00 00 00 48 89 44 24 08 48 8d 44 24 20 48 89 44 24 10 b8 10 00 00 00 0f 05 <41> 89 c0 3d 00 f0 ff ff 77 1f 48 8b 44 24 18 64 48 2b 04 25 28 00
> All code
> ========
>    0:	00 48 89             	add    %cl,-0x77(%rax)
>    3:	44 24 18             	rex.R and $0x18,%al
>    6:	31 c0                	xor    %eax,%eax
>    8:	48 8d 44 24 60       	lea    0x60(%rsp),%rax
>    d:	c7 04 24 10 00 00 00 	movl   $0x10,(%rsp)
>   14:	48 89 44 24 08       	mov    %rax,0x8(%rsp)
>   19:	48 8d 44 24 20       	lea    0x20(%rsp),%rax
>   1e:	48 89 44 24 10       	mov    %rax,0x10(%rsp)
>   23:	b8 10 00 00 00       	mov    $0x10,%eax
>   28:	0f 05                	syscall
>   2a:*	41 89 c0             	mov    %eax,%r8d		<-- trapping instruction
>   2d:	3d 00 f0 ff ff       	cmp    $0xfffff000,%eax
>   32:	77 1f                	ja     0x53
>   34:	48 8b 44 24 18       	mov    0x18(%rsp),%rax
>   39:	64                   	fs
>   3a:	48                   	rex.W
>   3b:	2b                   	.byte 0x2b
>   3c:	04 25                	add    $0x25,%al
>   3e:	28 00                	sub    %al,(%rax)
> 
> Code starting with the faulting instruction
> ===========================================
>    0:	41 89 c0             	mov    %eax,%r8d
>    3:	3d 00 f0 ff ff       	cmp    $0xfffff000,%eax
>    8:	77 1f                	ja     0x29
>    a:	48 8b 44 24 18       	mov    0x18(%rsp),%rax
>    f:	64                   	fs
>   10:	48                   	rex.W
>   11:	2b                   	.byte 0x2b
>   12:	04 25                	add    $0x25,%al
>   14:	28 00                	sub    %al,(%rax)
> kernel: [  101.973799] RSP: 002b:00007ffd786b9ca0 EFLAGS: 00000246 ORIG_RAX: 0000000000000010
> kernel: [  101.973801] RAX: ffffffffffffffda RBX: 0000000000600000 RCX: 0000720b9511a94f
> kernel: [  101.973802] RDX: 0000000000000000 RSI: 000000000000ae80 RDI: 0000000000000005
> kernel: [  101.973803] RBP: 0000720b953726c0 R08: 000000000041b228 R09: 0000000000000000
> kernel: [  101.973804] R10: 0000720b951d8882 R11: 0000000000000246 R12: 000000000c9b18c0
> kernel: [  101.973805] R13: 000000000c9b18c0 R14: 0000000000000000 R15: 0000000000000064
> kernel: [  101.973809]  </TASK>
> kernel: [  101.973810] ---[ end trace 0000000000000000 ]---
> 
> NOTE: Cc'ed the author of the reproducer for these results.
> NOTE 2: The stack trace is printed only once; rerunning the reproducer does not trigger it again until the next reboot.
> 
> The latest config is attached as well.
> 
> Best regards,
> Mirsad Todorovac


