Linux-ACPI Archive mirror
* [PATCH V2 0/3] cpufreq: cppc: Add support for frequency invariance
@ 2021-06-16  6:48 Viresh Kumar
  2021-06-16  6:48 ` [PATCH V2 3/3] cpufreq: CPPC: " Viresh Kumar
  2021-06-16 10:02 ` [PATCH V2 0/3] cpufreq: cppc: " Vincent Guittot
  0 siblings, 2 replies; 12+ messages in thread
From: Viresh Kumar @ 2021-06-16  6:48 UTC (permalink / raw
  To: Rafael Wysocki, Ionela Voinescu, Ben Segall,
	Daniel Bristot de Oliveira, Dietmar Eggemann, Greg Kroah-Hartman,
	Ingo Molnar, Jonathan Corbet, Juri Lelli, Mel Gorman,
	Peter Zijlstra, Rafael J. Wysocki, Steven Rostedt, Sudeep Holla,
	Vincent Guittot, Viresh Kumar
  Cc: linux-pm, Qian Cai, linux-acpi, linux-doc, linux-kernel,
	Paul E. McKenney

Hello,

Changes since V1:

- A few of the patches migrating users to the ->exit() callback are posted separately.

- The CPPC patch was completely reverted and so the support for FIE is again
  added here from scratch.

- The start_cpu() and stop_cpu() interface is reworked a little so stop_cpu() is
  only ever called for a CPU if start_cpu() was called for it earlier.

- A new patch to implement RCU locking in arch_topology core to avoid some
  races.

- Some cleanup; the FIE paths in the cppc driver are now clearly separated.


-------------------------8<-------------------------

The CPPC cpufreq driver is used for ARM servers and this patch series tries
to provide counter-based frequency invariance support for them in the absence
of architecture-specific counters (like AMUs).

This was reverted earlier for the 5.13 kernel after Qian Cai reported a
kernel oops during suspend/resume.

This is based on pm/linux-next + a cleanup patchset:

https://lore.kernel.org/linux-pm/cover.1623825358.git.viresh.kumar@linaro.org/

All the patches are pushed here together for people to run:

https://git.kernel.org/pub/scm/linux/kernel/git/vireshk/pm.git/log/?h=cpufreq/cppc

This is tested on my Hikey platform (without the actual read/write to
performance counters), with this script for over an hour:

while true; do
    for i in `seq 1 7`;
    do
        echo 0 > /sys/devices/system/cpu/cpu$i/online;
    done;

    for i in `seq 1 7`;
    do
        echo 1 > /sys/devices/system/cpu/cpu$i/online;
    done;
done


Vincent will be giving this patchset a try on ThunderX2.

Meanwhile it is up for review. Ideally I would like to get this merged for 5.14,
but let's see how it goes.

Thanks.

--
Viresh

Viresh Kumar (3):
  cpufreq: Add start_cpu() and stop_cpu() callbacks
  arch_topology: Avoid use-after-free for scale_freq_data
  cpufreq: CPPC: Add support for frequency invariance

 Documentation/cpu-freq/cpu-drivers.rst |   7 +-
 drivers/base/arch_topology.c           |  27 ++-
 drivers/cpufreq/Kconfig.arm            |  10 ++
 drivers/cpufreq/cppc_cpufreq.c         | 232 +++++++++++++++++++++++--
 drivers/cpufreq/cpufreq.c              |  19 +-
 include/linux/arch_topology.h          |   1 +
 include/linux/cpufreq.h                |   5 +-
 kernel/sched/core.c                    |   1 +
 8 files changed, 272 insertions(+), 30 deletions(-)

-- 
2.31.1.272.g89b43f80a514


^ permalink raw reply	[flat|nested] 12+ messages in thread

* [PATCH V2 3/3] cpufreq: CPPC: Add support for frequency invariance
  2021-06-16  6:48 [PATCH V2 0/3] cpufreq: cppc: Add support for frequency invariance Viresh Kumar
@ 2021-06-16  6:48 ` Viresh Kumar
  2021-06-16 12:48   ` Ionela Voinescu
  2021-06-16 10:02 ` [PATCH V2 0/3] cpufreq: cppc: " Vincent Guittot
  1 sibling, 1 reply; 12+ messages in thread
From: Viresh Kumar @ 2021-06-16  6:48 UTC (permalink / raw
  To: Rafael Wysocki, Ionela Voinescu, Viresh Kumar, Sudeep Holla,
	Ingo Molnar, Peter Zijlstra, Juri Lelli, Vincent Guittot,
	Dietmar Eggemann, Steven Rostedt, Ben Segall, Mel Gorman,
	Daniel Bristot de Oliveira
  Cc: linux-pm, Qian Cai, linux-acpi, linux-kernel

The Frequency Invariance Engine (FIE) provides a frequency scaling
correction factor that helps achieve more accurate load-tracking.

Normally, this scaling factor can be obtained directly with the help of
the cpufreq drivers as they know the exact frequency the hardware is
running at. But that isn't the case for the CPPC cpufreq driver.

Another way of obtaining it is to use the arch-specific counter
support, which is already present in the kernel, but that hardware is
optional for platforms.

This patch updates the CPPC driver to register itself with the topology
core to provide its own implementation (cppc_scale_freq_tick()) of
topology_scale_freq_tick() which gets called by the scheduler on every
tick. Note that the arch specific counters have higher priority than
CPPC counters, if available, though the CPPC driver doesn't need to have
any special handling for that.

On an invocation of cppc_scale_freq_tick(), we schedule an irq work
(since we reach here from hard-irq context), which then schedules a
normal work item and cppc_scale_freq_workfn() updates the per_cpu
arch_freq_scale variable based on the counter updates since the last
tick.
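
As a purely illustrative example with made-up numbers (not taken from any
real platform): if, since the last tick, the delivered counter advanced by
80000 and the reference counter by 100000, with reference_perf = 100 and
highest_perf = 160, then:

    delivered_perf  = reference_perf * delta_delivered / delta_reference
                    = 100 * 80000 / 100000 = 80

    arch_freq_scale = (delivered_perf << SCHED_CAPACITY_SHIFT) / highest_perf
                    = 80 * 1024 / 160 = 512

i.e. the CPU ran at roughly half of its highest performance level over that
window.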

To allow platforms to disable this CPPC counter-based frequency
invariance support, this is all done under CONFIG_ACPI_CPPC_CPUFREQ_FIE,
which is enabled by default.

This also exports sched_setattr_nocheck() as the CPPC driver can be
built as a module.

Cc: linux-acpi@vger.kernel.org
Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
---
 drivers/cpufreq/Kconfig.arm    |  10 ++
 drivers/cpufreq/cppc_cpufreq.c | 232 ++++++++++++++++++++++++++++++---
 include/linux/arch_topology.h  |   1 +
 kernel/sched/core.c            |   1 +
 4 files changed, 227 insertions(+), 17 deletions(-)

diff --git a/drivers/cpufreq/Kconfig.arm b/drivers/cpufreq/Kconfig.arm
index e65e0a43be64..a5c5f70acfc9 100644
--- a/drivers/cpufreq/Kconfig.arm
+++ b/drivers/cpufreq/Kconfig.arm
@@ -19,6 +19,16 @@ config ACPI_CPPC_CPUFREQ
 
 	  If in doubt, say N.
 
+config ACPI_CPPC_CPUFREQ_FIE
+	bool "Frequency Invariance support for CPPC cpufreq driver"
+	depends on ACPI_CPPC_CPUFREQ && GENERIC_ARCH_TOPOLOGY
+	default y
+	help
+	  This extends frequency invariance support in the CPPC cpufreq driver,
+	  by using CPPC delivered and reference performance counters.
+
+	  If in doubt, say N.
+
 config ARM_ALLWINNER_SUN50I_CPUFREQ_NVMEM
 	tristate "Allwinner nvmem based SUN50I CPUFreq driver"
 	depends on ARCH_SUNXI
diff --git a/drivers/cpufreq/cppc_cpufreq.c b/drivers/cpufreq/cppc_cpufreq.c
index be4f62e2c5f1..63e4cff46f95 100644
--- a/drivers/cpufreq/cppc_cpufreq.c
+++ b/drivers/cpufreq/cppc_cpufreq.c
@@ -10,14 +10,18 @@
 
 #define pr_fmt(fmt)	"CPPC Cpufreq:"	fmt
 
+#include <linux/arch_topology.h>
 #include <linux/kernel.h>
 #include <linux/module.h>
 #include <linux/delay.h>
 #include <linux/cpu.h>
 #include <linux/cpufreq.h>
 #include <linux/dmi.h>
+#include <linux/irq_work.h>
+#include <linux/kthread.h>
 #include <linux/time.h>
 #include <linux/vmalloc.h>
+#include <uapi/linux/sched/types.h>
 
 #include <asm/unaligned.h>
 
@@ -57,6 +61,183 @@ static struct cppc_workaround_oem_info wa_info[] = {
 	}
 };
 
+#ifdef CONFIG_ACPI_CPPC_CPUFREQ_FIE
+
+/* Frequency invariance support */
+struct cppc_freq_invariance {
+	int cpu;
+	struct irq_work irq_work;
+	struct kthread_work work;
+	struct cppc_perf_fb_ctrs prev_perf_fb_ctrs;
+	struct cppc_cpudata *cpu_data;
+};
+
+static DEFINE_PER_CPU(struct cppc_freq_invariance, cppc_freq_inv);
+static struct kthread_worker *kworker_fie;
+
+static struct cpufreq_driver cppc_cpufreq_driver;
+static unsigned int hisi_cppc_cpufreq_get_rate(unsigned int cpu);
+static int cppc_perf_from_fbctrs(struct cppc_cpudata *cpu_data,
+				 struct cppc_perf_fb_ctrs *fb_ctrs_t0,
+				 struct cppc_perf_fb_ctrs *fb_ctrs_t1);
+
+/**
+ * cppc_scale_freq_workfn - CPPC arch_freq_scale updater for frequency invariance
+ * @work: The work item.
+ *
+ * The CPPC driver registers itself with the topology core to provide its own
+ * implementation (cppc_scale_freq_tick()) of topology_scale_freq_tick() which
+ * gets called by the scheduler on every tick.
+ *
+ * Note that the arch specific counters have higher priority than CPPC counters,
+ * if available, though the CPPC driver doesn't need to have any special
+ * handling for that.
+ *
+ * On an invocation of cppc_scale_freq_tick(), we schedule an irq work (since we
+ * reach here from hard-irq context), which then schedules a normal work item
+ * and cppc_scale_freq_workfn() updates the per_cpu arch_freq_scale variable
+ * based on the counter updates since the last tick.
+ */
+static void cppc_scale_freq_workfn(struct kthread_work *work)
+{
+	struct cppc_freq_invariance *cppc_fi;
+	struct cppc_perf_fb_ctrs fb_ctrs = {0};
+	struct cppc_cpudata *cpu_data;
+	unsigned long local_freq_scale;
+	u64 perf;
+
+	cppc_fi = container_of(work, struct cppc_freq_invariance, work);
+	cpu_data = cppc_fi->cpu_data;
+
+	if (cppc_get_perf_ctrs(cppc_fi->cpu, &fb_ctrs)) {
+		pr_warn("%s: failed to read perf counters\n", __func__);
+		return;
+	}
+
+	perf = cppc_perf_from_fbctrs(cpu_data, &cppc_fi->prev_perf_fb_ctrs,
+				     &fb_ctrs);
+	cppc_fi->prev_perf_fb_ctrs = fb_ctrs;
+
+	perf <<= SCHED_CAPACITY_SHIFT;
+	local_freq_scale = div64_u64(perf, cpu_data->perf_caps.highest_perf);
+	if (WARN_ON(local_freq_scale > 1024))
+		local_freq_scale = 1024;
+
+	per_cpu(arch_freq_scale, cppc_fi->cpu) = local_freq_scale;
+}
+
+static void cppc_irq_work(struct irq_work *irq_work)
+{
+	struct cppc_freq_invariance *cppc_fi;
+
+	cppc_fi = container_of(irq_work, struct cppc_freq_invariance, irq_work);
+	kthread_queue_work(kworker_fie, &cppc_fi->work);
+}
+
+static void cppc_scale_freq_tick(void)
+{
+	struct cppc_freq_invariance *cppc_fi = &per_cpu(cppc_freq_inv, smp_processor_id());
+
+	/*
+	 * cppc_get_perf_ctrs() can potentially sleep, call that from the right
+	 * context.
+	 */
+	irq_work_queue(&cppc_fi->irq_work);
+}
+
+static struct scale_freq_data cppc_sftd = {
+	.source = SCALE_FREQ_SOURCE_CPPC,
+	.set_freq_scale = cppc_scale_freq_tick,
+};
+
+static void cppc_cpufreq_start_cpu(struct cpufreq_policy *policy,
+				   unsigned int cpu)
+{
+	struct cppc_freq_invariance *cppc_fi = &per_cpu(cppc_freq_inv, cpu);
+	int ret;
+
+	cppc_fi->cpu = cpu;
+	cppc_fi->cpu_data = policy->driver_data;
+	kthread_init_work(&cppc_fi->work, cppc_scale_freq_workfn);
+	init_irq_work(&cppc_fi->irq_work, cppc_irq_work);
+
+	ret = cppc_get_perf_ctrs(cpu, &cppc_fi->prev_perf_fb_ctrs);
+	if (ret) {
+		pr_warn("%s: failed to read perf counters: %d\n", __func__,
+			ret);
+		return;
+	}
+
+	/* Register for freq-invariance */
+	topology_set_scale_freq_source(&cppc_sftd, cpumask_of(cpu));
+}
+
+static void cppc_cpufreq_stop_cpu(struct cpufreq_policy *policy,
+				  unsigned int cpu)
+{
+	struct cppc_freq_invariance *cppc_fi = &per_cpu(cppc_freq_inv, cpu);
+
+	topology_clear_scale_freq_source(SCALE_FREQ_SOURCE_CPPC, cpumask_of(cpu));
+
+	irq_work_sync(&cppc_fi->irq_work);
+	kthread_cancel_work_sync(&cppc_fi->work);
+}
+
+static void __init cppc_freq_invariance_init(void)
+{
+	struct sched_attr attr = {
+		.size		= sizeof(struct sched_attr),
+		.sched_policy	= SCHED_DEADLINE,
+		.sched_nice	= 0,
+		.sched_priority	= 0,
+		/*
+		 * Fake (unused) bandwidth; workaround to "fix"
+		 * priority inheritance.
+		 */
+		.sched_runtime	= 1000000,
+		.sched_deadline = 10000000,
+		.sched_period	= 10000000,
+	};
+	int ret;
+
+	if (cppc_cpufreq_driver.get == hisi_cppc_cpufreq_get_rate)
+		return;
+
+	kworker_fie = kthread_create_worker(0, "cppc_fie");
+	if (IS_ERR(kworker_fie))
+		return;
+
+	ret = sched_setattr_nocheck(kworker_fie->task, &attr);
+	if (ret) {
+		pr_warn("%s: failed to set SCHED_DEADLINE: %d\n", __func__,
+			ret);
+		kthread_destroy_worker(kworker_fie);
+		return;
+	}
+
+	cppc_cpufreq_driver.start_cpu = cppc_cpufreq_start_cpu;
+	cppc_cpufreq_driver.stop_cpu = cppc_cpufreq_stop_cpu;
+}
+
+static void cppc_freq_invariance_exit(void)
+{
+	if (cppc_cpufreq_driver.get == hisi_cppc_cpufreq_get_rate)
+		return;
+
+	kthread_destroy_worker(kworker_fie);
+	kworker_fie = NULL;
+}
+
+#else
+static inline void cppc_freq_invariance_init(void)
+{
+}
+
+static inline void cppc_freq_invariance_exit(void)
+{
+}
+#endif /* CONFIG_ACPI_CPPC_CPUFREQ_FIE */
+
 /* Callback function used to retrieve the max frequency from DMI */
 static void cppc_find_dmi_mhz(const struct dmi_header *dm, void *private)
 {
@@ -362,26 +543,35 @@ static inline u64 get_delta(u64 t1, u64 t0)
 	return (u32)t1 - (u32)t0;
 }
 
-static int cppc_get_rate_from_fbctrs(struct cppc_cpudata *cpu_data,
-				     struct cppc_perf_fb_ctrs fb_ctrs_t0,
-				     struct cppc_perf_fb_ctrs fb_ctrs_t1)
+static int cppc_perf_from_fbctrs(struct cppc_cpudata *cpu_data,
+				 struct cppc_perf_fb_ctrs *fb_ctrs_t0,
+				 struct cppc_perf_fb_ctrs *fb_ctrs_t1)
 {
 	u64 delta_reference, delta_delivered;
-	u64 reference_perf, delivered_perf;
+	u64 reference_perf;
 
-	reference_perf = fb_ctrs_t0.reference_perf;
+	reference_perf = fb_ctrs_t0->reference_perf;
 
-	delta_reference = get_delta(fb_ctrs_t1.reference,
-				    fb_ctrs_t0.reference);
-	delta_delivered = get_delta(fb_ctrs_t1.delivered,
-				    fb_ctrs_t0.delivered);
+	delta_reference = get_delta(fb_ctrs_t1->reference,
+				    fb_ctrs_t0->reference);
+	delta_delivered = get_delta(fb_ctrs_t1->delivered,
+				    fb_ctrs_t0->delivered);
 
-	/* Check to avoid divide-by zero */
-	if (delta_reference || delta_delivered)
-		delivered_perf = (reference_perf * delta_delivered) /
-					delta_reference;
-	else
-		delivered_perf = cpu_data->perf_ctrls.desired_perf;
+	/* Check to avoid divide-by zero and invalid delivered_perf */
+	if (!delta_reference || !delta_delivered)
+		return cpu_data->perf_ctrls.desired_perf;
+
+	return (reference_perf * delta_delivered) / delta_reference;
+}
+
+static int cppc_get_rate_from_fbctrs(struct cppc_cpudata *cpu_data,
+				     struct cppc_perf_fb_ctrs *fb_ctrs_t0,
+				     struct cppc_perf_fb_ctrs *fb_ctrs_t1)
+{
+	u64 delivered_perf;
+
+	delivered_perf = cppc_perf_from_fbctrs(cpu_data, fb_ctrs_t0,
+					       fb_ctrs_t1);
 
 	return cppc_cpufreq_perf_to_khz(cpu_data, delivered_perf);
 }
@@ -405,7 +595,7 @@ static unsigned int cppc_cpufreq_get_rate(unsigned int cpu)
 	if (ret)
 		return ret;
 
-	return cppc_get_rate_from_fbctrs(cpu_data, fb_ctrs_t0, fb_ctrs_t1);
+	return cppc_get_rate_from_fbctrs(cpu_data, &fb_ctrs_t0, &fb_ctrs_t1);
 }
 
 static int cppc_cpufreq_set_boost(struct cpufreq_policy *policy, int state)
@@ -506,14 +696,21 @@ static void cppc_check_hisi_workaround(void)
 
 static int __init cppc_cpufreq_init(void)
 {
+	int ret;
+
 	if ((acpi_disabled) || !acpi_cpc_valid())
 		return -ENODEV;
 
 	INIT_LIST_HEAD(&cpu_data_list);
 
 	cppc_check_hisi_workaround();
+	cppc_freq_invariance_init();
 
-	return cpufreq_register_driver(&cppc_cpufreq_driver);
+	ret = cpufreq_register_driver(&cppc_cpufreq_driver);
+	if (ret)
+		cppc_freq_invariance_exit();
+
+	return ret;
 }
 
 static inline void free_cpu_data(void)
@@ -531,6 +728,7 @@ static inline void free_cpu_data(void)
 static void __exit cppc_cpufreq_exit(void)
 {
 	cpufreq_unregister_driver(&cppc_cpufreq_driver);
+	cppc_freq_invariance_exit();
 
 	free_cpu_data();
 }
diff --git a/include/linux/arch_topology.h b/include/linux/arch_topology.h
index 11e555cfaecb..f180240dc95f 100644
--- a/include/linux/arch_topology.h
+++ b/include/linux/arch_topology.h
@@ -37,6 +37,7 @@ bool topology_scale_freq_invariant(void);
 enum scale_freq_source {
 	SCALE_FREQ_SOURCE_CPUFREQ = 0,
 	SCALE_FREQ_SOURCE_ARCH,
+	SCALE_FREQ_SOURCE_CPPC,
 };
 
 struct scale_freq_data {
diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index 4ca80df205ce..5226cc26a095 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -6389,6 +6389,7 @@ int sched_setattr_nocheck(struct task_struct *p, const struct sched_attr *attr)
 {
 	return __sched_setscheduler(p, attr, false, true);
 }
+EXPORT_SYMBOL_GPL(sched_setattr_nocheck);
 
 /**
  * sched_setscheduler_nocheck - change the scheduling policy and/or RT priority of a thread from kernelspace.
-- 
2.31.1.272.g89b43f80a514


^ permalink raw reply related	[flat|nested] 12+ messages in thread

* Re: [PATCH V2 0/3] cpufreq: cppc: Add support for frequency invariance
  2021-06-16  6:48 [PATCH V2 0/3] cpufreq: cppc: Add support for frequency invariance Viresh Kumar
  2021-06-16  6:48 ` [PATCH V2 3/3] cpufreq: CPPC: " Viresh Kumar
@ 2021-06-16 10:02 ` Vincent Guittot
  2021-06-16 11:54   ` Viresh Kumar
  1 sibling, 1 reply; 12+ messages in thread
From: Vincent Guittot @ 2021-06-16 10:02 UTC (permalink / raw
  To: Viresh Kumar
  Cc: Rafael Wysocki, Ionela Voinescu, Ben Segall,
	Daniel Bristot de Oliveira, Dietmar Eggemann, Greg Kroah-Hartman,
	Ingo Molnar, Jonathan Corbet, Juri Lelli, Mel Gorman,
	Peter Zijlstra, Rafael J. Wysocki, Steven Rostedt, Sudeep Holla,
	open list:THERMAL, Qian Cai, ACPI Devel Maling List,
	Linux Doc Mailing List, linux-kernel, Paul E. McKenney

On Wed, 16 Jun 2021 at 08:48, Viresh Kumar <viresh.kumar@linaro.org> wrote:
>
> Hello,
>
> Changes since V1:
>
> - Few of the patches migrating users to ->exit() callback are posted separately.
>
> - The CPPC patch was completely reverted and so the support for FIE is again
>   added here from scratch.
>
> - The start_cpu() and stop_cpu() interface is reworked a little so stop_cpu() is
>   only ever called for a CPU if start_cpu() was called for it earlier.
>
> - A new patch to implement RCU locking in arch_topology core to avoid some
>   races.
>
> - Some cleanup and very clear/separate paths for FIE in cppc driver now.
>
>
> -------------------------8<-------------------------
>
> CPPC cpufreq driver is used for ARM servers and this patch series tries to
> provide counter-based frequency invariance support for them in the absence for
> architecture specific counters (like AMUs).
>
> This was reverted earlier for the 5.13 kernel after Qian Cai reported kernel
> oops during suspend/resume.
>
> This is based of pm/linux-next + a cleanup patchset:
>
> https://lore.kernel.org/linux-pm/cover.1623825358.git.viresh.kumar@linaro.org/
>
> All the patches are pushed here together for people to run.
>
> https://git.kernel.org/pub/scm/linux/kernel/git/vireshk/pm.git/log/?h=cpufreq/cppc
>
> This is tested on my Hikey platform (without the actual read/write to
> performance counters), with this script for over an hour:
>
> while true; do
>     for i in `seq 1 7`;
>     do
>         echo 0 > /sys/devices/system/cpu/cpu$i/online;
>     done;
>
>     for i in `seq 1 7`;
>     do
>         echo 1 > /sys/devices/system/cpu/cpu$i/online;
>     done;
> done
>
>
> Vincent will be giving this patchset a try on ThunderX2.

I tested your branch and got the following while booting:

[   24.454543] zswap: loaded using pool lzo/zbud
[   24.454753] pstore: Using crash dump compression: deflate
[   24.454776] AppArmor: AppArmor sha1 policy hashing enabled
[   24.454784] ima: No TPM chip found, activating TPM-bypass!
[   24.454789] ima: Allocated hash algorithm: sha256
[   24.454801] ima: No architecture policies found
[   24.455750] pcieport 0000:0f:00.0: Adding to iommu group 0
[   24.893888] ------------[ cut here ]------------
[   24.893891] WARNING: CPU: 95 PID: 1442 at
drivers/cpufreq/cppc_cpufreq.c:123 cppc_scale_freq_workfn+0xc8/0xf8
[   24.893901] Modules linked in:
[   24.893906] CPU: 95 PID: 1442 Comm: cppc_fie Not tainted 5.13.0-rc6+ #359
[   24.893910] Hardware name: To be filled by O.E.M. Saber/Saber, BIOS
0ACKL026 03/19/2019
[   24.893912] pstate: 20400009 (nzCv daif +PAN -UAO -TCO BTYPE=--)
[   24.893915] pc : cppc_scale_freq_workfn+0xc8/0xf8
[   24.893918] lr : cppc_scale_freq_workfn+0x5c/0xf8
[   24.893921] sp : ffff80003727bd90
[   24.893922] x29: ffff80003727bd90 x28: 0000000000000000 x27: ffff800010ec2000
[   24.893928] x26: ffff800010ec2000 x25: ffff8000107c3d90 x24: 0000000000000001
[   24.893932] x23: ffff000816244880 x22: ffff8000113f9000 x21: ffff009f825a0a80
[   24.893935] x20: ffff009efc394220 x19: ffff800011199000 x18: 000000000000001b
[   24.893939] x17: 0000000000000007 x16: 0000000000000001 x15: 00000000000000bf
[   24.893943] x14: 0000000000000016 x13: 000000000000029b x12: 0000000000000016
[   24.893946] x11: 0000000000000000 x10: 0000000000000000 x9 : ffff009efc6958c0
[   24.893950] x8 : ffff009efc394248 x7 : 0000000002bde780 x6 : 00000000ffffffff
[   24.893954] x5 : 00000000916e502a x4 : 00000000d9730e80 x3 : ffffffffffffffff
[   24.893958] x2 : 00000000001e8480 x1 : 00000000002625a0 x0 : 0000000000000401
[   24.893962] Call trace:
[   24.893964]  cppc_scale_freq_workfn+0xc8/0xf8
[   24.893967]  kthread_worker_fn+0x110/0x318
[   24.893971]  kthread+0xf4/0x120
[   24.893973]  ret_from_fork+0x10/0x18
[   24.893977] ---[ end trace ea6dbaf832bce3e4 ]---


>
> Meanwhile it is up for review. Ideally I would like to get this merged for 5.14,
> but lets see how it goes.
>
> Thanks.
>
> --
> Viresh
>
> Viresh Kumar (3):
>   cpufreq: Add start_cpu() and stop_cpu() callbacks
>   arch_topology: Avoid use-after-free for scale_freq_data
>   cpufreq: CPPC: Add support for frequency invariance
>
>  Documentation/cpu-freq/cpu-drivers.rst |   7 +-
>  drivers/base/arch_topology.c           |  27 ++-
>  drivers/cpufreq/Kconfig.arm            |  10 ++
>  drivers/cpufreq/cppc_cpufreq.c         | 232 +++++++++++++++++++++++--
>  drivers/cpufreq/cpufreq.c              |  19 +-
>  include/linux/arch_topology.h          |   1 +
>  include/linux/cpufreq.h                |   5 +-
>  kernel/sched/core.c                    |   1 +
>  8 files changed, 272 insertions(+), 30 deletions(-)
>
> --
> 2.31.1.272.g89b43f80a514
>

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH V2 0/3] cpufreq: cppc: Add support for frequency invariance
  2021-06-16 10:02 ` [PATCH V2 0/3] cpufreq: cppc: " Vincent Guittot
@ 2021-06-16 11:54   ` Viresh Kumar
  0 siblings, 0 replies; 12+ messages in thread
From: Viresh Kumar @ 2021-06-16 11:54 UTC (permalink / raw
  To: Vincent Guittot
  Cc: Rafael Wysocki, Ionela Voinescu, Ben Segall,
	Daniel Bristot de Oliveira, Dietmar Eggemann, Greg Kroah-Hartman,
	Ingo Molnar, Jonathan Corbet, Juri Lelli, Mel Gorman,
	Peter Zijlstra, Rafael J. Wysocki, Steven Rostedt, Sudeep Holla,
	open list:THERMAL, Qian Cai, ACPI Devel Maling List,
	Linux Doc Mailing List, linux-kernel, Paul E. McKenney

On 16-06-21, 12:02, Vincent Guittot wrote:
> I tested your branch and got the following while booting:
> 
> [   24.454543] zswap: loaded using pool lzo/zbud
> [   24.454753] pstore: Using crash dump compression: deflate
> [   24.454776] AppArmor: AppArmor sha1 policy hashing enabled
> [   24.454784] ima: No TPM chip found, activating TPM-bypass!
> [   24.454789] ima: Allocated hash algorithm: sha256
> [   24.454801] ima: No architecture policies found
> [   24.455750] pcieport 0000:0f:00.0: Adding to iommu group 0
> [   24.893888] ------------[ cut here ]------------
> [   24.893891] WARNING: CPU: 95 PID: 1442 at
> drivers/cpufreq/cppc_cpufreq.c:123 cppc_scale_freq_workfn+0xc8/0xf8
> [   24.893901] Modules linked in:
> [   24.893906] CPU: 95 PID: 1442 Comm: cppc_fie Not tainted 5.13.0-rc6+ #359
> [   24.893910] Hardware name: To be filled by O.E.M. Saber/Saber, BIOS
> 0ACKL026 03/19/2019
> [   24.893912] pstate: 20400009 (nzCv daif +PAN -UAO -TCO BTYPE=--)
> [   24.893915] pc : cppc_scale_freq_workfn+0xc8/0xf8
> [   24.893918] lr : cppc_scale_freq_workfn+0x5c/0xf8
> [   24.893921] sp : ffff80003727bd90
> [   24.893922] x29: ffff80003727bd90 x28: 0000000000000000 x27: ffff800010ec2000
> [   24.893928] x26: ffff800010ec2000 x25: ffff8000107c3d90 x24: 0000000000000001
> [   24.893932] x23: ffff000816244880 x22: ffff8000113f9000 x21: ffff009f825a0a80
> [   24.893935] x20: ffff009efc394220 x19: ffff800011199000 x18: 000000000000001b
> [   24.893939] x17: 0000000000000007 x16: 0000000000000001 x15: 00000000000000bf
> [   24.893943] x14: 0000000000000016 x13: 000000000000029b x12: 0000000000000016
> [   24.893946] x11: 0000000000000000 x10: 0000000000000000 x9 : ffff009efc6958c0
> [   24.893950] x8 : ffff009efc394248 x7 : 0000000002bde780 x6 : 00000000ffffffff
> [   24.893954] x5 : 00000000916e502a x4 : 00000000d9730e80 x3 : ffffffffffffffff
> [   24.893958] x2 : 00000000001e8480 x1 : 00000000002625a0 x0 : 0000000000000401
> [   24.893962] Call trace:
> [   24.893964]  cppc_scale_freq_workfn+0xc8/0xf8
> [   24.893967]  kthread_worker_fn+0x110/0x318
> [   24.893971]  kthread+0xf4/0x120
> [   24.893973]  ret_from_fork+0x10/0x18
> [   24.893977] ---[ end trace ea6dbaf832bce3e4 ]---

Thanks Vincent.

This is triggering from cppc_scale_freq_workfn():

        if (WARN_ON(local_freq_scale > 1024))

Looks like there is something fishy about the perf calculations here.
After reading the counters, we tried to scale the result into the
0-1024 range and it came out larger than that.
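
For reference, the scale factor is computed (see cppc_scale_freq_workfn())
as:

    local_freq_scale = (reference_perf * delta_delivered / delta_reference)
                               * 1024 / highest_perf

so it can only exceed 1024 if the computed delivered perf ends up above
highest_perf. With made-up numbers purely to illustrate: reference_perf =
100, highest_perf = 280, and a window where delta_delivered/delta_reference
comes out as 3 (e.g. because one of the counters wrapped between the two
reads), we would get 300 * 1024 / 280 ~= 1097, which trips the WARN_ON().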

Will keep you posted.

-- 
viresh

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH V2 3/3] cpufreq: CPPC: Add support for frequency invariance
  2021-06-16  6:48 ` [PATCH V2 3/3] cpufreq: CPPC: " Viresh Kumar
@ 2021-06-16 12:48   ` Ionela Voinescu
  2021-06-17  3:24     ` Viresh Kumar
  0 siblings, 1 reply; 12+ messages in thread
From: Ionela Voinescu @ 2021-06-16 12:48 UTC (permalink / raw
  To: Viresh Kumar
  Cc: Rafael Wysocki, Sudeep Holla, Ingo Molnar, Peter Zijlstra,
	Juri Lelli, Vincent Guittot, Dietmar Eggemann, Steven Rostedt,
	Ben Segall, Mel Gorman, Daniel Bristot de Oliveira, linux-pm,
	Qian Cai, linux-acpi, linux-kernel

Hi,

I was looking forward to the complete removal of stop_cpu() :).

On Wednesday 16 Jun 2021 at 12:18:09 (+0530), Viresh Kumar wrote:
> The Frequency Invariance Engine (FIE) is providing a frequency scaling
> correction factor that helps achieve more accurate load-tracking.
> 
> Normally, this scaling factor can be obtained directly with the help of
> the cpufreq drivers as they know the exact frequency the hardware is
> running at. But that isn't the case for CPPC cpufreq driver.
> 
> Another way of obtaining that is using the arch specific counter
> support, which is already present in kernel, but that hardware is
> optional for platforms.
> 
> This patch updates the CPPC driver to register itself with the topology
> core to provide its own implementation (cppc_scale_freq_tick()) of
> topology_scale_freq_tick() which gets called by the scheduler on every
> tick. Note that the arch specific counters have higher priority than
> CPPC counters, if available, though the CPPC driver doesn't need to have
> any special handling for that.
> 
> On an invocation of cppc_scale_freq_tick(), we schedule an irq work
> (since we reach here from hard-irq context), which then schedules a
> normal work item and cppc_scale_freq_workfn() updates the per_cpu
> arch_freq_scale variable based on the counter updates since the last
> tick.
> 
> To allow platforms to disable this CPPC counter-based frequency
> invariance support, this is all done under CONFIG_ACPI_CPPC_CPUFREQ_FIE,
> which is enabled by default.
> 
> This also exports sched_setattr_nocheck() as the CPPC driver can be
> built as a module.
> 
> Cc: linux-acpi@vger.kernel.org
> Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
> ---
>  drivers/cpufreq/Kconfig.arm    |  10 ++
>  drivers/cpufreq/cppc_cpufreq.c | 232 ++++++++++++++++++++++++++++++---
>  include/linux/arch_topology.h  |   1 +
>  kernel/sched/core.c            |   1 +
>  4 files changed, 227 insertions(+), 17 deletions(-)
> 
[..]
> +static void cppc_cpufreq_start_cpu(struct cpufreq_policy *policy,
> +				   unsigned int cpu)
> +{
> +	struct cppc_freq_invariance *cppc_fi = &per_cpu(cppc_freq_inv, cpu);
> +	int ret;
> +
> +	cppc_fi->cpu = cpu;
> +	cppc_fi->cpu_data = policy->driver_data;
> +	kthread_init_work(&cppc_fi->work, cppc_scale_freq_workfn);
> +	init_irq_work(&cppc_fi->irq_work, cppc_irq_work);
> +
> +	ret = cppc_get_perf_ctrs(cpu, &cppc_fi->prev_perf_fb_ctrs);
> +	if (ret) {
> +		pr_warn("%s: failed to read perf counters: %d\n", __func__,
> +			ret);
> +		return;
> +	}
> +
> +	/* Register for freq-invariance */
> +	topology_set_scale_freq_source(&cppc_sftd, cpumask_of(cpu));
> +}
> +
> +static void cppc_cpufreq_stop_cpu(struct cpufreq_policy *policy,
> +				  unsigned int cpu)
> +{
> +	struct cppc_freq_invariance *cppc_fi = &per_cpu(cppc_freq_inv, cpu);
> +
> +	topology_clear_scale_freq_source(SCALE_FREQ_SOURCE_CPPC, cpumask_of(cpu));
> +
> +	irq_work_sync(&cppc_fi->irq_work);
> +	kthread_cancel_work_sync(&cppc_fi->work);
> +}

I'll only comment on this for now as I should know the rest.

Let's assume we don't have these, what happens now is the following:

1. We hotplug out the last CPU in a policy, we call the
   .stop_cpu()/exit() function which will free the cppc_cpudata structure.

   The only vulnerability is if we have a last tick on that last CPU,
   after the above callback was called.

2. When the CPU at 1. gets hotplugged back in, the cppc_fi->cpu_data is
   stale.

We do not have a problem when removing the CPPC cpufreq module as we're
doing cppc_freq_invariance_exit() before unregistering the driver and
freeing the data.

Are 1. and 2 the only problems we have, or have I missed any?

Thanks,
Ionela.

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH V2 3/3] cpufreq: CPPC: Add support for frequency invariance
  2021-06-16 12:48   ` Ionela Voinescu
@ 2021-06-17  3:24     ` Viresh Kumar
  2021-06-17 10:34       ` Ionela Voinescu
  0 siblings, 1 reply; 12+ messages in thread
From: Viresh Kumar @ 2021-06-17  3:24 UTC (permalink / raw
  To: Ionela Voinescu
  Cc: Rafael Wysocki, Sudeep Holla, Ingo Molnar, Peter Zijlstra,
	Juri Lelli, Vincent Guittot, Dietmar Eggemann, Steven Rostedt,
	Ben Segall, Mel Gorman, Daniel Bristot de Oliveira, linux-pm,
	Qian Cai, linux-acpi, linux-kernel

On 16-06-21, 13:48, Ionela Voinescu wrote:
> I was looking forward to the complete removal of stop_cpu() :).

No one wants to remove more code than I do :)

> I'll only comment on this for now as I should know the rest.
> 
> Let's assume we don't have these, what happens now is the following:
> 
> 1. We hotplug out the last CPU in a policy, we call the
>    .stop_cpu()/exit() function which will free the cppc_cpudata structure.
> 
>    The only vulnerability is if we have a last tick on that last CPU,
>    after the above callback was called.
> 
> 2. When the CPU at 1. gets hotplugged back in, the cppc_fi->cpu_data is
>    stale.
> 
> We do not have a problem when removing the CPPC cpufreq module as we're
> doing cppc_freq_invariance_exit() before unregistering the driver and
> freeing the data.
> 
> Are 1. and 2 the only problems we have, or have I missed any?

There is more to it. For simplicity, let's assume a quad-core setup,
with all 4 CPUs sharing the cpufreq policy. And here is what happens
without the new changes:

- On CPPC cpufreq driver insertion, we start 4 kthreads/irq-works (1
  for each CPU as it fires from tick) from the ->init() callback.

- Now let's say we offline CPU3. The CPPC cpufreq driver isn't notified
  by anyone and it hasn't registered itself to hotplug notifier as
  well. So, the irq-work/kthread isn't stopped. This results in the
  issue reported by Qian earlier.

  The very same thing happens with schedutil governor too, which uses
  very similar mechanisms, and the cpufreq core takes care of it there
  by stopping the governor before removing the CPU from policy->cpus
  and starting it again. So there we stop irq-work/kthread for all 4
  CPUs, then start them only for remaining 3.

  I thought about that approach as well, but it was too heavy to stop
  everyone and start again in this case. And so I invented start_cpu()
  and stop_cpu() callbacks.

- In this case, because the CPU is going away, we need to make sure we
  don't queue any more irq-work or kthread to it and this is one of
  the main reasons for adding synchronization in the topology core,
  because we need a hard guarantee here that irq-work won't fire
  again, as the CPU won't be there or will not be in a sane state.

- The same sequence of events is true for the case where the last CPU
  of a policy goes away (not in this example, but let's say a quad-core
  setup with separate policies for each CPU).

- Not just the policy, but the CPU may be going away as well.

I hope I was able to clarify a bit here.

-- 
viresh

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH V2 3/3] cpufreq: CPPC: Add support for frequency invariance
  2021-06-17  3:24     ` Viresh Kumar
@ 2021-06-17 10:34       ` Ionela Voinescu
  2021-06-17 11:19         ` Viresh Kumar
  2021-06-18  7:37         ` Viresh Kumar
  0 siblings, 2 replies; 12+ messages in thread
From: Ionela Voinescu @ 2021-06-17 10:34 UTC (permalink / raw
  To: Viresh Kumar
  Cc: Rafael Wysocki, Sudeep Holla, Ingo Molnar, Peter Zijlstra,
	Juri Lelli, Vincent Guittot, Dietmar Eggemann, Steven Rostedt,
	Ben Segall, Mel Gorman, Daniel Bristot de Oliveira, linux-pm,
	Qian Cai, linux-acpi, linux-kernel

Many thanks for the details!

On Thursday 17 Jun 2021 at 08:54:16 (+0530), Viresh Kumar wrote:
> On 16-06-21, 13:48, Ionela Voinescu wrote:
> > I was looking forward to the complete removal of stop_cpu() :).
> 
> No one wants to remove more code than I do :)
> 
> > I'll only comment on this for now as I should know the rest.
> > 
> > Let's assume we don't have these, what happens now is the following:
> > 
> > 1. We hotplug out the last CPU in a policy, we call the
> >    .stop_cpu()/exit() function which will free the cppc_cpudata structure.
> > 
> >    The only vulnerability is if we have a last tick on that last CPU,
> >    after the above callback was called.
> > 
> > 2. When the CPU at 1. gets hotplugged back in, the cppc_fi->cpu_data is
> >    stale.
> > 
> > We do not have a problem when removing the CPPC cpufreq module as we're
> > doing cppc_freq_invariance_exit() before unregistering the driver and
> > freeing the data.
> > 
> > Are 1. and 2 the only problems we have, or have I missed any?
> 
> There is more to it. For simplicity, lets assume a quad-core setup,
> with all 4 CPUs sharing the cpufreq policy. And here is what happens
> without the new changes:
> 
> - On CPPC cpufreq driver insertion, we start 4 kthreads/irq-works (1
>   for each CPU as it fires from tick) from the ->init() callback.
> 
> - Now lets say we offline CPU3. The CPPC cpufreq driver isn't notified
>   by anyone and it hasn't registered itself to hotplug notifier as
>   well. So, the irq-work/kthread isn't stopped. This results in the
>   issue reported by Qian earlier.
> 

I might be missing something, but when you offline a single CPU in a
policy, the worst that can happen is that a last call to
cppc_scale_freq_tick() would have sneaked in before irqs and the tick
are disabled. But even if we have a last call to
cppc_scale_freq_workfn(), the counter read methods would know how to
cope with hotplug, and the cppc_cpudata structure would still be
allocated and have valid desired_perf and highest_perf values.

Worst case, the last scale factor set for the CPU will be meaningless,
but it's already meaningless as the CPU is going down.

When you are referring to the issue reported by Qian I suppose you are
referring to this [1]. I think this is the case where you hotplug the
last CPU in a policy and free cppc_cpudata.

[1] https://lore.kernel.org/linux-pm/41f5195e-0e5f-fdfe-ba37-34e1fd8e4064@quicinc.com/

>   The very same thing happens with schedutil governor too, which uses
>   very similar mechanisms, and the cpufreq core takes care of it there
>   by stopping the governor before removing the CPU from policy->cpus
>   and starting it again. So there we stop irq-work/kthread for all 4
>   CPUs, then start them only for remaining 3.
> 

Things are different for sugov: you have to stop the governor when one
CPU in the policy goes down, and it's normal for sugov not to allow its
hooks to be called while the governor is down, so it has to do a full
cleanup when going down and a full bringup when going back up.

The difference for CPPC invariance is that only a CPU can trigger the
work to update its own scale factor, through the tick. No other CPU x
can trigger a scale factor update for CPU y, but x can carry out the
work for CPU y (x can run the cppc_scale_freq_workfn(y)).

So when y goes down, it won't have a tick to queue any irq or kthread
work any longer until it's brought back up. So I believe that the only
cleanup that needs to be done when a CPU goes offline, is to ensure
that the work triggered by that last tick is done safely.

>   I thought about that approach as well, but it was too heavy to stop
>   everyone and start again in this case. And so I invented start_cpu()
>   and stop_cpu() callbacks.
> 

> - In this case, because the CPU is going away, we need to make sure we
>   don't queue any more irq-work or kthread to it and this is one of
>   the main reasons for adding synchronization in the topology core,
>   because we need a hard guarantee here that irq-work won't fire
>   again, as the CPU won't be there or will not be in a sane state.
> 

We can't queue any more work for it as there's no tick for the offline
CPU.

> - The same sequence of events is true for the case where the last CPU
>   of a policy goes away (not in this example, but lets say quad-core
>   setup with separate policies for each CPU).
> 
> - Not just the policy, but the CPU may be going away as well.
> 
> I hope I was able to clarify a bit here.
> 

Thanks! I do agree it is better to be cautious, but I initially wanted to
make sure we don't see the problem as bigger than it actually is.

Thanks,
Ionela.

P.S. I'll give more thought to the rcu use in the arch_topology driver.
I'm the boring kind that likes to err on the side of caution, so I tend
to agree that it might be good to have even if the current problem could
be solved in this driver.


> -- 
> viresh

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH V2 3/3] cpufreq: CPPC: Add support for frequency invariance
  2021-06-17 10:34       ` Ionela Voinescu
@ 2021-06-17 11:19         ` Viresh Kumar
  2021-06-17 12:22           ` Peter Zijlstra
  2021-06-18  7:37         ` Viresh Kumar
  1 sibling, 1 reply; 12+ messages in thread
From: Viresh Kumar @ 2021-06-17 11:19 UTC (permalink / raw
  To: Ionela Voinescu
  Cc: Rafael Wysocki, Sudeep Holla, Ingo Molnar, Peter Zijlstra,
	Juri Lelli, Vincent Guittot, Dietmar Eggemann, Steven Rostedt,
	Ben Segall, Mel Gorman, Daniel Bristot de Oliveira, linux-pm,
	Qian Cai, linux-acpi, linux-kernel

On 17-06-21, 11:34, Ionela Voinescu wrote:
> I might be missing something, but when you offline a single CPU in a
> policy, the worse that can happen is that a last call to
> cppc_scale_freq_tick() would have sneaked in before irqs and the tick
> are disabled. But even if we have a last call to
> cppc_scale_freq_workfn(), the counter read methods would know how to
> cope with hotplug, and the cppc_cpudata structure would still be
> allocated and have valid desired_perf and highest_perf values.

Okay, I somehow assumed that cppc_scale_freq_workfn() needs to run on the local
CPU, while it can actually land anywhere. My fault.

But the irq-work being queued here is per-cpu and it will get queued on the
local CPU where the tick occurred.

Now I am not sure what will happen to this irq-work in that case. I tried to
look now and I see that these irq-work items are processed first on the tick
and then the scheduler's tick handler is called, so that will again queue the
cppc irq-work.

What happens if this happens along with CPU hotplug? I am not sure I understand
that. There may or may not be any side effects of this. Let's assume the work
item is left in the queue as is and no tick happens after that as the CPU is
offlined. We are good.

Now if we unload the cpufreq driver at this moment, the driver will call
irq_work_sync(), which may end up in a while loop? There is no
irq-work-cancel() API.

Peter: Can you help here on this? Lemme try to explain a bit here:

We are starting an irq-work (in the cppc cpufreq driver) from
scheduler_tick()->arch_scale_freq_tick(). What will happen if the driver doesn't
take care of CPU hotplug explicitly and make sure this work isn't queued again
from the next tick?

Is it important for the user to make sure it gets rid of the irq-work during
hotplug here?

> Worse case, the last scale factor set for the CPU will be meaningless,
> but it's already meaningless as the CPU is going down.
> 
> When you are referring to the issue reported by Qian I suppose you are
> referring to this [1]. I think this is the case where you hotplug the
> last CPU in a policy and free cppc_cpudata.
> 
> [1] https://lore.kernel.org/linux-pm/41f5195e-0e5f-fdfe-ba37-34e1fd8e4064@quicinc.com/

Yes, I was talking about this report only; I am not sure now if I understand
what actually happened here :)

Ionela: I have skipped replying to the rest of your email; I will get back to
that once we have clarification on the above.

Thanks a lot for your reviews, always on time :)

-- 
viresh

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH V2 3/3] cpufreq: CPPC: Add support for frequency invariance
  2021-06-17 11:19         ` Viresh Kumar
@ 2021-06-17 12:22           ` Peter Zijlstra
  2021-06-18  3:45             ` Viresh Kumar
  0 siblings, 1 reply; 12+ messages in thread
From: Peter Zijlstra @ 2021-06-17 12:22 UTC (permalink / raw
  To: Viresh Kumar
  Cc: Ionela Voinescu, Rafael Wysocki, Sudeep Holla, Ingo Molnar,
	Juri Lelli, Vincent Guittot, Dietmar Eggemann, Steven Rostedt,
	Ben Segall, Mel Gorman, Daniel Bristot de Oliveira, linux-pm,
	Qian Cai, linux-acpi, linux-kernel

On Thu, Jun 17, 2021 at 04:49:36PM +0530, Viresh Kumar wrote:
> Peter: Can you help here on this ? Lemme try to explain a bit here:
> 
> We are starting an irq-work (in cppc cpufreq driver) from
> scheduler_tick()->arch_scale_freq_tick(). What will happen if the driver doesn't
> take care of CPU hotplug explicitly and make sure this work isn't queued again
> from the next tick.
> 
> Is it important for user to make sure it gets rid of the irq-work during hotplug
> here ?

irq-work is flushed on hotplug, see smpcfd_dying_cpu().

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH V2 3/3] cpufreq: CPPC: Add support for frequency invariance
  2021-06-17 12:22           ` Peter Zijlstra
@ 2021-06-18  3:45             ` Viresh Kumar
  0 siblings, 0 replies; 12+ messages in thread
From: Viresh Kumar @ 2021-06-18  3:45 UTC (permalink / raw
  To: Peter Zijlstra
  Cc: Ionela Voinescu, Rafael Wysocki, Sudeep Holla, Ingo Molnar,
	Juri Lelli, Vincent Guittot, Dietmar Eggemann, Steven Rostedt,
	Ben Segall, Mel Gorman, Daniel Bristot de Oliveira, linux-pm,
	Qian Cai, linux-acpi, linux-kernel

On 17-06-21, 14:22, Peter Zijlstra wrote:
> On Thu, Jun 17, 2021 at 04:49:36PM +0530, Viresh Kumar wrote:
> > Peter: Can you help here on this ? Lemme try to explain a bit here:
> > 
> > We are starting an irq-work (in cppc cpufreq driver) from
> > scheduler_tick()->arch_scale_freq_tick(). What will happen if the driver doesn't
> > take care of CPU hotplug explicitly and make sure this work isn't queued again
> > from the next tick.
> > 
> > Is it important for user to make sure it gets rid of the irq-work during hotplug
> > here ?
> 
> irq-work is flushed on hotplug, see smpcfd_dying_cpu().

Thanks Peter.

-- 
viresh

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH V2 3/3] cpufreq: CPPC: Add support for frequency invariance
  2021-06-17 10:34       ` Ionela Voinescu
  2021-06-17 11:19         ` Viresh Kumar
@ 2021-06-18  7:37         ` Viresh Kumar
  2021-06-18 12:26           ` Ionela Voinescu
  1 sibling, 1 reply; 12+ messages in thread
From: Viresh Kumar @ 2021-06-18  7:37 UTC (permalink / raw
  To: Ionela Voinescu
  Cc: Rafael Wysocki, Sudeep Holla, Ingo Molnar, Peter Zijlstra,
	Juri Lelli, Vincent Guittot, Dietmar Eggemann, Steven Rostedt,
	Ben Segall, Mel Gorman, Daniel Bristot de Oliveira, linux-pm,
	Qian Cai, linux-acpi, linux-kernel

On 17-06-21, 11:34, Ionela Voinescu wrote:
> We can't queue any more work for it as there's no tick for the offline
> CPU.

Okay, so as discussed yesterday and confirmed by Peter, CPU hotplug shouldn't be
a problem and we can avoid handling that here. Good.

The patch 1/3 from this series is not required anymore. I will get rid of
stop_cpu() as well.

The second patch stays as is, as I still don't see another way of fixing the
same problem on policy ->exit(). We still need that guarantee from the
topology core.

> P.S. I'll give more thought to the rcu use in the arch_topology driver.
> I'm the boring kind that likes to err on the side of caution, so I tend
> to agree that it might be good to have even if the current problem could
> be solved in this driver.

Before I resend the series, maybe it is better to align on the idea here
first (as I need to fix some existing potential memleaks in policy
->init() first). So here is the new diff; does it look okay now?

diff --git a/drivers/cpufreq/cppc_cpufreq.c b/drivers/cpufreq/cppc_cpufreq.c
index be4f62e2c5f1..3e9070f107c5 100644
--- a/drivers/cpufreq/cppc_cpufreq.c
+++ b/drivers/cpufreq/cppc_cpufreq.c
@@ -10,14 +10,18 @@
 
 #define pr_fmt(fmt)	"CPPC Cpufreq:"	fmt
 
+#include <linux/arch_topology.h>
 #include <linux/kernel.h>
 #include <linux/module.h>
 #include <linux/delay.h>
 #include <linux/cpu.h>
 #include <linux/cpufreq.h>
 #include <linux/dmi.h>
+#include <linux/irq_work.h>
+#include <linux/kthread.h>
 #include <linux/time.h>
 #include <linux/vmalloc.h>
+#include <uapi/linux/sched/types.h>
 
 #include <asm/unaligned.h>
 
@@ -57,6 +61,214 @@ static struct cppc_workaround_oem_info wa_info[] = {
 	}
 };
 
+#ifdef CONFIG_ACPI_CPPC_CPUFREQ_FIE
+
+/* Frequency invariance support */
+struct cppc_freq_invariance {
+	int cpu;
+	struct irq_work irq_work;
+	struct kthread_work work;
+	struct cppc_perf_fb_ctrs prev_perf_fb_ctrs;
+	struct cppc_cpudata *cpu_data;
+};
+
+static DEFINE_PER_CPU(struct cppc_freq_invariance, cppc_freq_inv);
+static struct kthread_worker *kworker_fie;
+
+static struct cpufreq_driver cppc_cpufreq_driver;
+static unsigned int hisi_cppc_cpufreq_get_rate(unsigned int cpu);
+static int cppc_perf_from_fbctrs(struct cppc_cpudata *cpu_data,
+				 struct cppc_perf_fb_ctrs *fb_ctrs_t0,
+				 struct cppc_perf_fb_ctrs *fb_ctrs_t1);
+
+/**
+ * cppc_scale_freq_workfn - CPPC arch_freq_scale updater for frequency invariance
+ * @work: The work item.
+ *
+ * The CPPC driver registers itself with the topology core to provide its own
+ * implementation (cppc_scale_freq_tick()) of topology_scale_freq_tick() which
+ * gets called by the scheduler on every tick.
+ *
+ * Note that the arch specific counters have higher priority than CPPC counters,
+ * if available, though the CPPC driver doesn't need to have any special
+ * handling for that.
+ *
+ * On an invocation of cppc_scale_freq_tick(), we schedule an irq work (since we
+ * reach here from hard-irq context), which then schedules a normal work item
+ * and cppc_scale_freq_workfn() updates the per_cpu arch_freq_scale variable
+ * based on the counter updates since the last tick.
+ */
+static void cppc_scale_freq_workfn(struct kthread_work *work)
+{
+	struct cppc_freq_invariance *cppc_fi;
+	struct cppc_perf_fb_ctrs fb_ctrs = {0};
+	struct cppc_cpudata *cpu_data;
+	unsigned long local_freq_scale;
+	u64 perf;
+
+	cppc_fi = container_of(work, struct cppc_freq_invariance, work);
+	cpu_data = cppc_fi->cpu_data;
+
+	if (cppc_get_perf_ctrs(cppc_fi->cpu, &fb_ctrs)) {
+		pr_warn("%s: failed to read perf counters\n", __func__);
+		return;
+	}
+
+	perf = cppc_perf_from_fbctrs(cpu_data, &cppc_fi->prev_perf_fb_ctrs,
+				     &fb_ctrs);
+	cppc_fi->prev_perf_fb_ctrs = fb_ctrs;
+
+	perf <<= SCHED_CAPACITY_SHIFT;
+	local_freq_scale = div64_u64(perf, cpu_data->perf_caps.highest_perf);
+
+	/* This can happen due to counter overflows */
+	if (unlikely(local_freq_scale > 1024))
+		local_freq_scale = 1024;
+
+	per_cpu(arch_freq_scale, cppc_fi->cpu) = local_freq_scale;
+}
+
+static void cppc_irq_work(struct irq_work *irq_work)
+{
+	struct cppc_freq_invariance *cppc_fi;
+
+	cppc_fi = container_of(irq_work, struct cppc_freq_invariance, irq_work);
+	kthread_queue_work(kworker_fie, &cppc_fi->work);
+}
+
+static void cppc_scale_freq_tick(void)
+{
+	struct cppc_freq_invariance *cppc_fi = &per_cpu(cppc_freq_inv, smp_processor_id());
+
+	/*
+	 * cppc_get_perf_ctrs() can potentially sleep, call that from the right
+	 * context.
+	 */
+	irq_work_queue(&cppc_fi->irq_work);
+}
+
+static struct scale_freq_data cppc_sftd = {
+	.source = SCALE_FREQ_SOURCE_CPPC,
+	.set_freq_scale = cppc_scale_freq_tick,
+};
+
+static int cppc_cpufreq_cpu_fie_init(struct cpufreq_policy *policy)
+{
+	struct cppc_freq_invariance *cppc_fi;
+	int cpu, ret;
+
+	if (cppc_cpufreq_driver.get == hisi_cppc_cpufreq_get_rate)
+		return 0;
+
+	for_each_cpu(cpu, policy->cpus) {
+		cppc_fi = &per_cpu(cppc_freq_inv, cpu);
+		cppc_fi->cpu = cpu;
+		cppc_fi->cpu_data = policy->driver_data;
+		kthread_init_work(&cppc_fi->work, cppc_scale_freq_workfn);
+		init_irq_work(&cppc_fi->irq_work, cppc_irq_work);
+
+		ret = cppc_get_perf_ctrs(cpu, &cppc_fi->prev_perf_fb_ctrs);
+		if (ret) {
+			pr_warn("%s: failed to read perf counters for cpu:%d: %d\n",
+				__func__, cpu, ret);
+
+			/* Avoid failing driver's probe because of FIE */
+			return 0;
+		}
+	}
+
+	/* Register for freq-invariance */
+	topology_set_scale_freq_source(&cppc_sftd, policy->cpus);
+	return 0;
+}
+
+/*
+ * We free all the resources on policy's removal and not on CPU removal as the
+ * irq-work are per-cpu and the hotplug core takes care of flushing the pending
+ * irq-works (hint: smpcfd_dying_cpu()) on CPU hotplug. Even if the kthread-work
+ * fires on another CPU after the concerned CPU is removed, it won't harm.
+ *
+ * We just need to make sure to remove them all on policy->exit().
+ */
+static void cppc_cpufreq_cpu_fie_exit(struct cpufreq_policy *policy)
+{
+	struct cppc_freq_invariance *cppc_fi;
+	int cpu;
+
+	if (cppc_cpufreq_driver.get == hisi_cppc_cpufreq_get_rate)
+		return;
+
+	/* policy->cpus will be empty here, use related_cpus instead */
+	topology_clear_scale_freq_source(SCALE_FREQ_SOURCE_CPPC, policy->related_cpus);
+
+	for_each_cpu(cpu, policy->related_cpus) {
+		cppc_fi = &per_cpu(cppc_freq_inv, cpu);
+		irq_work_sync(&cppc_fi->irq_work);
+		kthread_cancel_work_sync(&cppc_fi->work);
+	}
+}
+
+static void __init cppc_freq_invariance_init(void)
+{
+	struct sched_attr attr = {
+		.size		= sizeof(struct sched_attr),
+		.sched_policy	= SCHED_DEADLINE,
+		.sched_nice	= 0,
+		.sched_priority	= 0,
+		/*
+		 * Fake (unused) bandwidth; workaround to "fix"
+		 * priority inheritance.
+		 */
+		.sched_runtime	= 1000000,
+		.sched_deadline = 10000000,
+		.sched_period	= 10000000,
+	};
+	int ret;
+
+	if (cppc_cpufreq_driver.get == hisi_cppc_cpufreq_get_rate)
+		return;
+
+	kworker_fie = kthread_create_worker(0, "cppc_fie");
+	if (IS_ERR(kworker_fie))
+		return;
+
+	ret = sched_setattr_nocheck(kworker_fie->task, &attr);
+	if (ret) {
+		pr_warn("%s: failed to set SCHED_DEADLINE: %d\n", __func__,
+			ret);
+		kthread_destroy_worker(kworker_fie);
+		return;
+	}
+}
+
+static void cppc_freq_invariance_exit(void)
+{
+	if (cppc_cpufreq_driver.get == hisi_cppc_cpufreq_get_rate)
+		return;
+
+	kthread_destroy_worker(kworker_fie);
+	kworker_fie = NULL;
+}
+
+#else
+static inline int cppc_cpufreq_cpu_fie_init(struct cpufreq_policy *policy)
+{
+	return 0;
+}
+
+static inline void cppc_cpufreq_cpu_fie_exit(struct cpufreq_policy *policy)
+{
+}
+
+static inline void cppc_freq_invariance_init(void)
+{
+}
+
+static inline void cppc_freq_invariance_exit(void)
+{
+}
+#endif /* CONFIG_ACPI_CPPC_CPUFREQ_FIE */
+
 /* Callback function used to retrieve the max frequency from DMI */
 static void cppc_find_dmi_mhz(const struct dmi_header *dm, void *private)
 {
@@ -324,11 +536,13 @@ static int cppc_cpufreq_cpu_init(struct cpufreq_policy *policy)
 	cpu_data->perf_ctrls.desired_perf =  caps->highest_perf;
 
 	ret = cppc_set_perf(cpu, &cpu_data->perf_ctrls);
-	if (ret)
+	if (ret) {
 		pr_debug("Err setting perf value:%d on CPU:%d. ret:%d\n",
 			 caps->highest_perf, cpu, ret);
+		return ret;
+	}
 
-	return ret;
+	return cppc_cpufreq_cpu_fie_init(policy);
 }
 
 static int cppc_cpufreq_cpu_exit(struct cpufreq_policy *policy)
@@ -338,6 +552,8 @@ static int cppc_cpufreq_cpu_exit(struct cpufreq_policy *policy)
 	unsigned int cpu = policy->cpu;
 	int ret;
 
+	cppc_cpufreq_cpu_fie_exit(policy);
+
 	cpu_data->perf_ctrls.desired_perf = caps->lowest_perf;
 
 	ret = cppc_set_perf(cpu, &cpu_data->perf_ctrls);
@@ -362,26 +578,35 @@ static inline u64 get_delta(u64 t1, u64 t0)
 	return (u32)t1 - (u32)t0;
 }
 
-static int cppc_get_rate_from_fbctrs(struct cppc_cpudata *cpu_data,
-				     struct cppc_perf_fb_ctrs fb_ctrs_t0,
-				     struct cppc_perf_fb_ctrs fb_ctrs_t1)
+static int cppc_perf_from_fbctrs(struct cppc_cpudata *cpu_data,
+				 struct cppc_perf_fb_ctrs *fb_ctrs_t0,
+				 struct cppc_perf_fb_ctrs *fb_ctrs_t1)
 {
 	u64 delta_reference, delta_delivered;
-	u64 reference_perf, delivered_perf;
+	u64 reference_perf;
 
-	reference_perf = fb_ctrs_t0.reference_perf;
+	reference_perf = fb_ctrs_t0->reference_perf;
 
-	delta_reference = get_delta(fb_ctrs_t1.reference,
-				    fb_ctrs_t0.reference);
-	delta_delivered = get_delta(fb_ctrs_t1.delivered,
-				    fb_ctrs_t0.delivered);
+	delta_reference = get_delta(fb_ctrs_t1->reference,
+				    fb_ctrs_t0->reference);
+	delta_delivered = get_delta(fb_ctrs_t1->delivered,
+				    fb_ctrs_t0->delivered);
 
-	/* Check to avoid divide-by zero */
-	if (delta_reference || delta_delivered)
-		delivered_perf = (reference_perf * delta_delivered) /
-					delta_reference;
-	else
-		delivered_perf = cpu_data->perf_ctrls.desired_perf;
+	/* Check to avoid divide-by zero and invalid delivered_perf */
+	if (!delta_reference || !delta_delivered)
+		return cpu_data->perf_ctrls.desired_perf;
+
+	return (reference_perf * delta_delivered) / delta_reference;
+}
+
+static int cppc_get_rate_from_fbctrs(struct cppc_cpudata *cpu_data,
+				     struct cppc_perf_fb_ctrs *fb_ctrs_t0,
+				     struct cppc_perf_fb_ctrs *fb_ctrs_t1)
+{
+	u64 delivered_perf;
+
+	delivered_perf = cppc_perf_from_fbctrs(cpu_data, fb_ctrs_t0,
+					       fb_ctrs_t1);
 
 	return cppc_cpufreq_perf_to_khz(cpu_data, delivered_perf);
 }
@@ -405,7 +630,7 @@ static unsigned int cppc_cpufreq_get_rate(unsigned int cpu)
 	if (ret)
 		return ret;
 
-	return cppc_get_rate_from_fbctrs(cpu_data, fb_ctrs_t0, fb_ctrs_t1);
+	return cppc_get_rate_from_fbctrs(cpu_data, &fb_ctrs_t0, &fb_ctrs_t1);
 }
 
 static int cppc_cpufreq_set_boost(struct cpufreq_policy *policy, int state)
@@ -506,14 +731,21 @@ static void cppc_check_hisi_workaround(void)
 
 static int __init cppc_cpufreq_init(void)
 {
+	int ret;
+
 	if ((acpi_disabled) || !acpi_cpc_valid())
 		return -ENODEV;
 
 	INIT_LIST_HEAD(&cpu_data_list);
 
 	cppc_check_hisi_workaround();
+	cppc_freq_invariance_init();
 
-	return cpufreq_register_driver(&cppc_cpufreq_driver);
+	ret = cpufreq_register_driver(&cppc_cpufreq_driver);
+	if (ret)
+		cppc_freq_invariance_exit();
+
+	return ret;
 }
 
 static inline void free_cpu_data(void)
@@ -531,6 +763,7 @@ static inline void free_cpu_data(void)
 static void __exit cppc_cpufreq_exit(void)
 {
 	cpufreq_unregister_driver(&cppc_cpufreq_driver);
+	cppc_freq_invariance_exit();
 
 	free_cpu_data();
 }

-- 
viresh

^ permalink raw reply related	[flat|nested] 12+ messages in thread

* Re: [PATCH V2 3/3] cpufreq: CPPC: Add support for frequency invariance
  2021-06-18  7:37         ` Viresh Kumar
@ 2021-06-18 12:26           ` Ionela Voinescu
  0 siblings, 0 replies; 12+ messages in thread
From: Ionela Voinescu @ 2021-06-18 12:26 UTC (permalink / raw
  To: Viresh Kumar
  Cc: Rafael Wysocki, Sudeep Holla, Ingo Molnar, Peter Zijlstra,
	Juri Lelli, Vincent Guittot, Dietmar Eggemann, Steven Rostedt,
	Ben Segall, Mel Gorman, Daniel Bristot de Oliveira, linux-pm,
	Qian Cai, linux-acpi, linux-kernel

Hey,

On Friday 18 Jun 2021 at 13:07:13 (+0530), Viresh Kumar wrote:
> On 17-06-21, 11:34, Ionela Voinescu wrote:
> > We can't queue any more work for it as there's no tick for the offline
> > CPU.
> 
> Okay, so as discussed yesterday and confirmed by Peter, CPU hotplug shouldn't be
> a problem and we can avoid handling that here. Good.
> 
> The patch 1/3 from this series is not required anymore. I will get rid of
> stop_cpu() as well.
> 
> The second patch stays as is, as I still don't see another way of fixing the
> same problem on policy ->exit(). We still need that guarantee from topology
> core.
>

Right! That is because we need to make sure the tick does not queue any
more irq work while we do our cleanup (irq_work_sync() and then
kthread_cancel_work_sync()). kthread_cancel_work_sync() then ensures the
last worker wraps up, so we can safely free the data.
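
Roughly, that ordering looks like this (an illustrative sketch only, reusing
the names from your diff below rather than being the exact code):

	/* Stop the tick from raising new irq work for these CPUs. */
	topology_clear_scale_freq_source(SCALE_FREQ_SOURCE_CPPC,
					 policy->related_cpus);

	for_each_cpu(cpu, policy->related_cpus) {
		cppc_fi = &per_cpu(cppc_freq_inv, cpu);

		/* Wait for an already-raised irq work to finish queueing. */
		irq_work_sync(&cppc_fi->irq_work);

		/* Wait for the last kthread worker to wrap up. */
		kthread_cancel_work_sync(&cppc_fi->work);
	}

	/* Only after this is it safe to free the per-CPU FIE data. */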

I was hoping to avoid this but I don't currently see another way either.

> > P.S. I'll give more thought to the RCU use in the arch_topology driver.
> > I'm the boring kind that likes to err on the side of caution, so I tend
> > to agree that it might be good to have even if the current problem could
> > be solved in this driver.
> 
> Before I resend the series, maybe it is better to align on the idea here
> first (I also need to fix some existing potential memleaks in policy
> ->init()). So here is the new diff; does it look okay now?
> 
> diff --git a/drivers/cpufreq/cppc_cpufreq.c b/drivers/cpufreq/cppc_cpufreq.c
> index be4f62e2c5f1..3e9070f107c5 100644
> --- a/drivers/cpufreq/cppc_cpufreq.c
> +++ b/drivers/cpufreq/cppc_cpufreq.c
> @@ -10,14 +10,18 @@
>  
>  #define pr_fmt(fmt)	"CPPC Cpufreq:"	fmt
>  
> +#include <linux/arch_topology.h>
>  #include <linux/kernel.h>
>  #include <linux/module.h>
>  #include <linux/delay.h>
>  #include <linux/cpu.h>
>  #include <linux/cpufreq.h>
>  #include <linux/dmi.h>
> +#include <linux/irq_work.h>
> +#include <linux/kthread.h>
>  #include <linux/time.h>
>  #include <linux/vmalloc.h>
> +#include <uapi/linux/sched/types.h>
>  
>  #include <asm/unaligned.h>
>  
> @@ -57,6 +61,214 @@ static struct cppc_workaround_oem_info wa_info[] = {
>  	}
>  };
>  
> +#ifdef CONFIG_ACPI_CPPC_CPUFREQ_FIE
> +
> +/* Frequency invariance support */
> +struct cppc_freq_invariance {
> +	int cpu;
> +	struct irq_work irq_work;
> +	struct kthread_work work;
> +	struct cppc_perf_fb_ctrs prev_perf_fb_ctrs;
> +	struct cppc_cpudata *cpu_data;
> +};
> +
> +static DEFINE_PER_CPU(struct cppc_freq_invariance, cppc_freq_inv);
> +static struct kthread_worker *kworker_fie;
> +
> +static struct cpufreq_driver cppc_cpufreq_driver;
> +static unsigned int hisi_cppc_cpufreq_get_rate(unsigned int cpu);
> +static int cppc_perf_from_fbctrs(struct cppc_cpudata *cpu_data,
> +				 struct cppc_perf_fb_ctrs *fb_ctrs_t0,
> +				 struct cppc_perf_fb_ctrs *fb_ctrs_t1);
> +
> +/**
> + * cppc_scale_freq_workfn - CPPC arch_freq_scale updater for frequency invariance
> + * @work: The work item.
> + *
> + * The CPPC driver registers itself with the topology core to provide its own
> + * implementation (cppc_scale_freq_tick()) of topology_scale_freq_tick(), which
> + * gets called by the scheduler on every tick.
> + *
> + * Note that the arch-specific counters, if available, have higher priority
> + * than the CPPC counters, though the CPPC driver doesn't need any special
> + * handling for that.
> + *
> + * On an invocation of cppc_scale_freq_tick(), we schedule an irq work (since we
> + * reach here from hard-irq context), which then schedules a normal work item
> + * and cppc_scale_freq_workfn() updates the per_cpu arch_freq_scale variable
> + * based on the counter updates since the last tick.
> + */
> +static void cppc_scale_freq_workfn(struct kthread_work *work)
> +{
> +	struct cppc_freq_invariance *cppc_fi;
> +	struct cppc_perf_fb_ctrs fb_ctrs = {0};
> +	struct cppc_cpudata *cpu_data;
> +	unsigned long local_freq_scale;
> +	u64 perf;
> +
> +	cppc_fi = container_of(work, struct cppc_freq_invariance, work);
> +	cpu_data = cppc_fi->cpu_data;
> +
> +	if (cppc_get_perf_ctrs(cppc_fi->cpu, &fb_ctrs)) {
> +		pr_warn("%s: failed to read perf counters\n", __func__);
> +		return;
> +	}
> +
> +	perf = cppc_perf_from_fbctrs(cpu_data, &cppc_fi->prev_perf_fb_ctrs,
> +				     &fb_ctrs);
> +	cppc_fi->prev_perf_fb_ctrs = fb_ctrs;
> +
> +	perf <<= SCHED_CAPACITY_SHIFT;
> +	local_freq_scale = div64_u64(perf, cpu_data->perf_caps.highest_perf);
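> +	/*
> +	 * Hypothetical numbers, only to illustrate the scaling above: a
> +	 * delivered perf of 80 with highest_perf == 160 gives
> +	 * (80 << SCHED_CAPACITY_SHIFT) / 160 = 512, i.e. the CPU ran at
> +	 * half of its maximum capacity over the last tick period.
> +	 */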
> +
> +	/* This can happen due to counter overflows */
> +	if (unlikely(local_freq_scale > 1024))
> +		local_freq_scale = 1024;
> +
> +	per_cpu(arch_freq_scale, cppc_fi->cpu) = local_freq_scale;
> +}
> +
> +static void cppc_irq_work(struct irq_work *irq_work)
> +{
> +	struct cppc_freq_invariance *cppc_fi;
> +
> +	cppc_fi = container_of(irq_work, struct cppc_freq_invariance, irq_work);
> +	kthread_queue_work(kworker_fie, &cppc_fi->work);
> +}
> +
> +static void cppc_scale_freq_tick(void)
> +{
> +	struct cppc_freq_invariance *cppc_fi = &per_cpu(cppc_freq_inv, smp_processor_id());
> +
> +	/*
> +	 * cppc_get_perf_ctrs() can potentially sleep; call it from the right
> +	 * context.
> +	 */
> +	irq_work_queue(&cppc_fi->irq_work);
> +}
> +
> +static struct scale_freq_data cppc_sftd = {
> +	.source = SCALE_FREQ_SOURCE_CPPC,
> +	.set_freq_scale = cppc_scale_freq_tick,
> +};
> +
> +static int cppc_cpufreq_cpu_fie_init(struct cpufreq_policy *policy)
> +{
> +	struct cppc_freq_invariance *cppc_fi;
> +	int cpu, ret;
> +
> +	if (cppc_cpufreq_driver.get == hisi_cppc_cpufreq_get_rate)
> +		return 0;
> +
> +	for_each_cpu(cpu, policy->cpus) {
> +		cppc_fi = &per_cpu(cppc_freq_inv, cpu);
> +		cppc_fi->cpu = cpu;
> +		cppc_fi->cpu_data = policy->driver_data;
> +		kthread_init_work(&cppc_fi->work, cppc_scale_freq_workfn);
> +		init_irq_work(&cppc_fi->irq_work, cppc_irq_work);
> +
> +		ret = cppc_get_perf_ctrs(cpu, &cppc_fi->prev_perf_fb_ctrs);
> +		if (ret) {
> +			pr_warn("%s: failed to read perf counters for cpu:%d: %d\n",
> +				__func__, cpu, ret);
> +
> +			/* Avoid failing driver's probe because of FIE */
> +			return 0;
> +		}
> +	}
> +
> +	/* Register for freq-invariance */
> +	topology_set_scale_freq_source(&cppc_sftd, policy->cpus);
> +	return 0;
> +}
> +
> +/*
> + * We free all the resources on policy removal rather than on CPU removal, as
> + * the irq works are per-CPU and the hotplug core takes care of flushing the
> + * pending irq works (hint: smpcfd_dying_cpu()) on CPU hotplug. Even if the
> + * kthread work fires on another CPU after the concerned CPU is removed, it
> + * won't do any harm.
> + *
> + * We just need to make sure to remove them all on policy->exit().
> + */
> +static void cppc_cpufreq_cpu_fie_exit(struct cpufreq_policy *policy)
> +{
> +	struct cppc_freq_invariance *cppc_fi;
> +	int cpu;
> +
> +	if (cppc_cpufreq_driver.get == hisi_cppc_cpufreq_get_rate)
> +		return;
> +
> +	/* policy->cpus will be empty here, use related_cpus instead */
> +	topology_clear_scale_freq_source(SCALE_FREQ_SOURCE_CPPC, policy->related_cpus);
> +
> +	for_each_cpu(cpu, policy->related_cpus) {
> +		cppc_fi = &per_cpu(cppc_freq_inv, cpu);
> +		irq_work_sync(&cppc_fi->irq_work);
> +		kthread_cancel_work_sync(&cppc_fi->work);
> +	}
> +}

Yep, looks good (with its call on cppc_cpufreq_cpu_exit()).

Feel free to post V3 and I'll take an even closer look during testing.

Thanks,
Ionela.


> +
> +static void __init cppc_freq_invariance_init(void)
> +{
> +	struct sched_attr attr = {
> +		.size		= sizeof(struct sched_attr),
> +		.sched_policy	= SCHED_DEADLINE,
> +		.sched_nice	= 0,
> +		.sched_priority	= 0,
> +		/*
> +		 * Fake (unused) bandwidth; workaround to "fix"
> +		 * priority inheritance.
> +		 */
> +		.sched_runtime	= 1000000,
> +		.sched_deadline = 10000000,
> +		.sched_period	= 10000000,
> +	};
> +	int ret;
> +
> +	if (cppc_cpufreq_driver.get == hisi_cppc_cpufreq_get_rate)
> +		return;
> +
> +	kworker_fie = kthread_create_worker(0, "cppc_fie");
> +	if (IS_ERR(kworker_fie))
> +		return;
> +
> +	ret = sched_setattr_nocheck(kworker_fie->task, &attr);
> +	if (ret) {
> +		pr_warn("%s: failed to set SCHED_DEADLINE: %d\n", __func__,
> +			ret);
> +		kthread_destroy_worker(kworker_fie);
> +		return;
> +	}
> +}
> +
> +static void cppc_freq_invariance_exit(void)
> +{
> +	if (cppc_cpufreq_driver.get == hisi_cppc_cpufreq_get_rate)
> +		return;
> +
> +	kthread_destroy_worker(kworker_fie);
> +	kworker_fie = NULL;
> +}
> +
> +#else
> +static inline int cppc_cpufreq_cpu_fie_init(struct cpufreq_policy *policy)
> +{
> +	return 0;
> +}
> +
> +static inline void cppc_cpufreq_cpu_fie_exit(struct cpufreq_policy *policy)
> +{
> +}
> +
> +static inline void cppc_freq_invariance_init(void)
> +{
> +}
> +
> +static inline void cppc_freq_invariance_exit(void)
> +{
> +}
> +#endif /* CONFIG_ACPI_CPPC_CPUFREQ_FIE */
> +
>  /* Callback function used to retrieve the max frequency from DMI */
>  static void cppc_find_dmi_mhz(const struct dmi_header *dm, void *private)
>  {
> @@ -324,11 +536,13 @@ static int cppc_cpufreq_cpu_init(struct cpufreq_policy *policy)
>  	cpu_data->perf_ctrls.desired_perf =  caps->highest_perf;
>  
>  	ret = cppc_set_perf(cpu, &cpu_data->perf_ctrls);
> -	if (ret)
> +	if (ret) {
>  		pr_debug("Err setting perf value:%d on CPU:%d. ret:%d\n",
>  			 caps->highest_perf, cpu, ret);
> +		return ret;
> +	}
>  
> -	return ret;
> +	return cppc_cpufreq_cpu_fie_init(policy);
>  }
>  
>  static int cppc_cpufreq_cpu_exit(struct cpufreq_policy *policy)
> @@ -338,6 +552,8 @@ static int cppc_cpufreq_cpu_exit(struct cpufreq_policy *policy)
>  	unsigned int cpu = policy->cpu;
>  	int ret;
>  
> +	cppc_cpufreq_cpu_fie_exit(policy);
> +
>  	cpu_data->perf_ctrls.desired_perf = caps->lowest_perf;
>  
>  	ret = cppc_set_perf(cpu, &cpu_data->perf_ctrls);
> @@ -362,26 +578,35 @@ static inline u64 get_delta(u64 t1, u64 t0)
>  	return (u32)t1 - (u32)t0;
>  }
>  
> -static int cppc_get_rate_from_fbctrs(struct cppc_cpudata *cpu_data,
> -				     struct cppc_perf_fb_ctrs fb_ctrs_t0,
> -				     struct cppc_perf_fb_ctrs fb_ctrs_t1)
> +static int cppc_perf_from_fbctrs(struct cppc_cpudata *cpu_data,
> +				 struct cppc_perf_fb_ctrs *fb_ctrs_t0,
> +				 struct cppc_perf_fb_ctrs *fb_ctrs_t1)
>  {
>  	u64 delta_reference, delta_delivered;
> -	u64 reference_perf, delivered_perf;
> +	u64 reference_perf;
>  
> -	reference_perf = fb_ctrs_t0.reference_perf;
> +	reference_perf = fb_ctrs_t0->reference_perf;
>  
> -	delta_reference = get_delta(fb_ctrs_t1.reference,
> -				    fb_ctrs_t0.reference);
> -	delta_delivered = get_delta(fb_ctrs_t1.delivered,
> -				    fb_ctrs_t0.delivered);
> +	delta_reference = get_delta(fb_ctrs_t1->reference,
> +				    fb_ctrs_t0->reference);
> +	delta_delivered = get_delta(fb_ctrs_t1->delivered,
> +				    fb_ctrs_t0->delivered);
>  
> -	/* Check to avoid divide-by zero */
> -	if (delta_reference || delta_delivered)
> -		delivered_perf = (reference_perf * delta_delivered) /
> -					delta_reference;
> -	else
> -		delivered_perf = cpu_data->perf_ctrls.desired_perf;
> +	/* Check to avoid divide-by-zero and invalid delivered_perf */
> +	if (!delta_reference || !delta_delivered)
> +		return cpu_data->perf_ctrls.desired_perf;
> +
> +	return (reference_perf * delta_delivered) / delta_reference;
> +}
> +
> +static int cppc_get_rate_from_fbctrs(struct cppc_cpudata *cpu_data,
> +				     struct cppc_perf_fb_ctrs *fb_ctrs_t0,
> +				     struct cppc_perf_fb_ctrs *fb_ctrs_t1)
> +{
> +	u64 delivered_perf;
> +
> +	delivered_perf = cppc_perf_from_fbctrs(cpu_data, fb_ctrs_t0,
> +					       fb_ctrs_t1);
>  
>  	return cppc_cpufreq_perf_to_khz(cpu_data, delivered_perf);
>  }
> @@ -405,7 +630,7 @@ static unsigned int cppc_cpufreq_get_rate(unsigned int cpu)
>  	if (ret)
>  		return ret;
>  
> -	return cppc_get_rate_from_fbctrs(cpu_data, fb_ctrs_t0, fb_ctrs_t1);
> +	return cppc_get_rate_from_fbctrs(cpu_data, &fb_ctrs_t0, &fb_ctrs_t1);
>  }
>  
>  static int cppc_cpufreq_set_boost(struct cpufreq_policy *policy, int state)
> @@ -506,14 +731,21 @@ static void cppc_check_hisi_workaround(void)
>  
>  static int __init cppc_cpufreq_init(void)
>  {
> +	int ret;
> +
>  	if ((acpi_disabled) || !acpi_cpc_valid())
>  		return -ENODEV;
>  
>  	INIT_LIST_HEAD(&cpu_data_list);
>  
>  	cppc_check_hisi_workaround();
> +	cppc_freq_invariance_init();
>  
> -	return cpufreq_register_driver(&cppc_cpufreq_driver);
> +	ret = cpufreq_register_driver(&cppc_cpufreq_driver);
> +	if (ret)
> +		cppc_freq_invariance_exit();
> +
> +	return ret;
>  }
>  
>  static inline void free_cpu_data(void)
> @@ -531,6 +763,7 @@ static inline void free_cpu_data(void)
>  static void __exit cppc_cpufreq_exit(void)
>  {
>  	cpufreq_unregister_driver(&cppc_cpufreq_driver);
> +	cppc_freq_invariance_exit();
>  
>  	free_cpu_data();
>  }
> 
> -- 
> viresh

^ permalink raw reply	[flat|nested] 12+ messages in thread


Thread overview: 12+ messages
2021-06-16  6:48 [PATCH V2 0/3] cpufreq: cppc: Add support for frequency invariance Viresh Kumar
2021-06-16  6:48 ` [PATCH V2 3/3] cpufreq: CPPC: " Viresh Kumar
2021-06-16 12:48   ` Ionela Voinescu
2021-06-17  3:24     ` Viresh Kumar
2021-06-17 10:34       ` Ionela Voinescu
2021-06-17 11:19         ` Viresh Kumar
2021-06-17 12:22           ` Peter Zijlstra
2021-06-18  3:45             ` Viresh Kumar
2021-06-18  7:37         ` Viresh Kumar
2021-06-18 12:26           ` Ionela Voinescu
2021-06-16 10:02 ` [PATCH V2 0/3] cpufreq: cppc: " Vincent Guittot
2021-06-16 11:54   ` Viresh Kumar
