Regressions List Tracking
From: Kalle Valo <kvalo@kernel.org>
To: Thomas Gleixner <tglx@linutronix.de>,
	Ingo Molnar <mingo@redhat.com>, Borislav Petkov <bp@alien8.de>,
	Dave Hansen <dave.hansen@linux.intel.com>,
	"Rafael J. Wysocki" <rafael@kernel.org>
Cc: x86@kernel.org, linux-pm@vger.kernel.org,
	linux-kernel@vger.kernel.org, regressions@lists.linux.dev,
	Jeff Johnson <quic_jjohnson@quicinc.com>
Subject: [regression] suspend stress test stalls within 30 minutes
Date: Sat, 11 May 2024 21:22:43 +0300
Message-ID: <87o79cjjik.fsf@kernel.org>

Hi,

I have a weird problem with suspend. Somewhere around v6.9-rc4 (I'm not
sure exactly when) I started seeing our ath11k Wi-Fi driver suspend tests
fail randomly. I have been investigating this for some time and now it
looks like it's somehow related to the CPU_MITIGATIONS Kconfig option and
has nothing to do with wireless.

The simplified test case I have is to run suspend and resume in a loop
like this (Wi-Fi modules are not loaded):

for i in {1..400}; do echo "rtcwake test $i" > /dev/kmsg; rtcwake -m mem -s 10; sleep 10; done

If CPU_MITIGATIONS is enabled I usually see suspend stalling within 30
minutes. If I disable CPU_MITIGATIONS using menuconfig I don't see the bug.

When the bug happens, I see this in kernel.log and suspend stalls:

[  361.716546] PM: suspend entry (deep)
[  361.722558] Filesystems sync: 0.005 seconds
[  624.222721] kworker/dying (2519) used greatest stack depth: 22240 bytes left
[  633.897857] loop0: detected capacity change from 0 to 8

If I don't do anything, nothing happens for several minutes. What is
really strange is that once I run 'sudo shutdown -h now', suspend
immediately unstalls and continues, like this:

[  847.631147] Freezing user space processes
[  847.649590] Freezing user space processes completed (elapsed 0.016 seconds)
[  847.650710] OOM killer disabled.
[  847.651799] Freezing remaining freezable tasks
[  847.654618] Freezing remaining freezable tasks completed (elapsed 0.001 seconds)
[  847.663757] printk: Suspending console(s) (use no_console_suspend to debug)
[  847.710060] e1000e: EEE TX LPI TIMER: 00000011
[  847.852370] ACPI: EC: interrupt blocked
[  847.899416] ACPI: PM: Preparing to enter system sleep state S3
[  847.933433] ACPI: EC: event blocked
[  847.933437] ACPI: EC: EC stopped
[  847.933441] ACPI: PM: Saving platform NVS memory
[  847.933817] Disabling non-boot CPUs ...

And now the system goes into suspend state as it should. If I press
the power button on the device, the system resumes and then
shuts down (as expected, because I ran the shutdown command). This
behaviour is consistent; I see it every time the suspend bug happens.
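
The pattern in the logs above (a long gap between "PM: suspend entry"
and "Freezing user space processes") could be picked out of a saved log
with something like the following. This is only a rough sketch: the
function name find_suspend_stalls and the 60 second default threshold
are made up for illustration, and it assumes the usual "[ seconds]"
timestamp prefix shown in the excerpts.

```shell
# Hypothetical helper: read a kernel log on stdin and report suspend
# attempts where the freezer started more than max_gap seconds after
# "PM: suspend entry", or never started at all.
find_suspend_stalls() {
    max_gap="${1:-60}"   # threshold in seconds (assumed value, tune as needed)
    awk -v max_gap="$max_gap" '
        # Extract the timestamp from a line like "[  361.716546] message".
        function ts(line,   t) {
            t = line
            sub(/^\[ */, "", t)
            sub(/\].*/, "", t)
            return t + 0
        }
        /PM: suspend entry/ { entry = ts($0); pending = 1; next }
        pending && /Freezing user space processes/ {
            gap = ts($0) - entry
            if (gap > max_gap)
                printf "stall: entry at %.6f, freeze started %.0f s later\n", entry, gap
            pending = 0
        }
        END {
            if (pending)
                printf "stall: entry at %.6f never reached the freezer\n", entry
        }
    '
}
```

It can be fed from a file or from dmesg, for example
"dmesg | find_suspend_stalls 60".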

The test setup is a several-year-old Intel NUC x86 system; more info is
below.

Any recommendations on how I should debug this further? I tried to bisect
this earlier but that failed, most likely because I hadn't yet realised
that this is related to CPU_MITIGATIONS and might have messed up the
.config settings during the bisect.
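
One way to guard against that kind of .config mix-up would be to check
the option's state before each build. A minimal sketch, assuming the
standard .config syntax ("CONFIG_FOO=y" when set, no assignment line
when unset); the helper name config_state is hypothetical:

```shell
# Hypothetical helper: print the value of a Kconfig option from a
# .config file, or "n" if the option has no assignment line.
# $1: path to the .config, $2: option name without the CONFIG_ prefix
config_state() {
    value=$(sed -n "s/^CONFIG_$2=//p" "$1" | head -n 1)
    echo "${value:-n}"
}
```

Running "config_state .config CPU_MITIGATIONS" at each bisect step
would then show whether the build matches the intended setting.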

Kalle

DMI: Intel(R) Client Systems NUC8i7HVK/NUC8i7HVB, BIOS HNKBLi70.86A.0067.2021.0528.1339 05/28/2021

Ubuntu 20.04.6 LTS (GNU/Linux 6.9.0-rc7+ x86_64)

systemd 245.4-4ubuntu3.23 running in system mode. (+PAM +AUDIT +SELINUX
+IMA +APPARMOR +SMACK +SYSVINIT +UTMP +LIBCRYPTSETUP +GCRYPT +GNUTLS
+ACL +XZ +LZ4 +SECCOMP +BLKID +ELFUTILS +KMOD +IDN2 -IDN +PCRE2
default-hierarchy=hybrid)

I verified that I see this on the latest commit from Linus' tree:

cf87f46fd34d Merge tag 'drm-fixes-2024-05-11' of https://gitlab.freedesktop.org/drm/kernel

Here's the diff between the broken and working .config:

$ diffconfig broken.config works.config 
-CALL_PADDING y
-CALL_THUNKS y
-CALL_THUNKS_DEBUG n
-HAVE_CALL_THUNKS y
-MITIGATION_CALL_DEPTH_TRACKING y
-MITIGATION_GDS_FORCE y
-MITIGATION_IBPB_ENTRY y
-MITIGATION_IBRS_ENTRY y
-MITIGATION_PAGE_TABLE_ISOLATION y
-MITIGATION_RETHUNK y
-MITIGATION_RETPOLINE y
-MITIGATION_RFDS y
-MITIGATION_SLS y
-MITIGATION_SPECTRE_BHI y
-MITIGATION_SRSO y
-MITIGATION_UNRET_ENTRY y
-PREFIX_SYMBOLS y
 CPU_MITIGATIONS y -> n

