LKML Archive mirror
 help / color / mirror / Atom feed
* [PATCH v2 0/2] thermal: core: Fix  thermal zone initialization and move passive polling management to the core
@ 2024-04-30 15:44 Rafael J. Wysocki
  2024-04-30 15:45 ` [PATCH v2 1/2] thermal: core: Do not call handle_thermal_trip() if zone temperature is invalid Rafael J. Wysocki
                   ` (2 more replies)
  0 siblings, 3 replies; 7+ messages in thread
From: Rafael J. Wysocki @ 2024-04-30 15:44 UTC (permalink / raw
  To: Linux PM, Lukasz Luba; +Cc: Daniel Lezcano, LKML, Rafael J. Wysocki

Hi Everyone,

This a v2 of the patch at

https://lore.kernel.org/linux-pm/5938055.MhkbZ0Pkbq@kreacher/

with one fix patch added and a couple of changes more in the main patch.

Patch [1/2] fixes the thermal zone initialization in the cases when getting the
first zone temperature from the sensor is delayed and patch [2/2] is an update
of the patch above.

Thanks!




^ permalink raw reply	[flat|nested] 7+ messages in thread

* [PATCH v2 1/2] thermal: core: Do not call handle_thermal_trip() if zone temperature is invalid
  2024-04-30 15:44 [PATCH v2 0/2] thermal: core: Fix thermal zone initialization and move passive polling management to the core Rafael J. Wysocki
@ 2024-04-30 15:45 ` Rafael J. Wysocki
  2024-04-30 16:43   ` Lukasz Luba
  2024-04-30 15:52 ` [PATCH v2 2/2] thermal: core: Move passive polling management to the core Rafael J. Wysocki
  2024-04-30 18:32 ` [PATCH v2 0/2] thermal: core: Fix thermal zone initialization and move " Lukasz Luba
  2 siblings, 1 reply; 7+ messages in thread
From: Rafael J. Wysocki @ 2024-04-30 15:45 UTC (permalink / raw
  To: Linux PM, Lukasz Luba; +Cc: Daniel Lezcano, LKML, Rafael J. Wysocki

From: Rafael J. Wysocki <rafael.j.wysocki@intel.com>

Make __thermal_zone_device_update() bail out if update_temperature()
fails to update the zone temperature because __thermal_zone_get_temp()
has returned an error and the current zone temperature is
THERMAL_TEMP_INVALID (user space receiving netlink thermal messages,
thermal debug code and thermal governors may get confused otherwise).

Fixes: 9ad18043fb35 ("thermal: core: Send trip crossing notifications at init time if needed")
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
---

New patch in v2.

---
 drivers/thermal/thermal_core.c |    3 +++
 1 file changed, 3 insertions(+)

Index: linux-pm/drivers/thermal/thermal_core.c
===================================================================
--- linux-pm.orig/drivers/thermal/thermal_core.c
+++ linux-pm/drivers/thermal/thermal_core.c
@@ -487,6 +487,9 @@ void __thermal_zone_device_update(struct
 
 	update_temperature(tz);
 
+	if (tz->temperature == THERMAL_TEMP_INVALID)
+		return;
+
 	__thermal_zone_set_trips(tz);
 
 	tz->notify_event = event;




^ permalink raw reply	[flat|nested] 7+ messages in thread

* [PATCH v2 2/2] thermal: core: Move passive polling management to the core
  2024-04-30 15:44 [PATCH v2 0/2] thermal: core: Fix thermal zone initialization and move passive polling management to the core Rafael J. Wysocki
  2024-04-30 15:45 ` [PATCH v2 1/2] thermal: core: Do not call handle_thermal_trip() if zone temperature is invalid Rafael J. Wysocki
@ 2024-04-30 15:52 ` Rafael J. Wysocki
  2024-04-30 16:47   ` Lukasz Luba
  2024-04-30 18:32 ` [PATCH v2 0/2] thermal: core: Fix thermal zone initialization and move " Lukasz Luba
  2 siblings, 1 reply; 7+ messages in thread
From: Rafael J. Wysocki @ 2024-04-30 15:52 UTC (permalink / raw
  To: Linux PM, Lukasz Luba; +Cc: Daniel Lezcano, LKML, Rafael J. Wysocki

From: Rafael J. Wysocki <rafael.j.wysocki@intel.com>

Passive polling is enabled by setting the 'passive' field in
struct thermal_zone_device to a positive value so long as the
'passive_delay_jiffies' field is greater than zero.  It causes
the thermal core to actively check the thermal zone temperature
periodically which in theory should be done after crossing a
passive trip point on the way up in order to allow governors to
react more rapidly to temperature changes and adjust mitigation
more precisely.

However, the 'passive' field in struct thermal_zone_device is currently
managed by governors which is quite problematic.  First of all, only
two governors, Step-Wise and Power Allocator, update that field at
all, so the other governors do not benefit from passive polling,
although in principle they should.  Moreover, if the zone governor is
changed from, say, Step-Wise to Fair-Share after 'passive' has been
incremented by the former, it is not going to be reset back to zero by
the latter even if the zone temperature falls down below all passive
trip points.

For this reason, make handle_thermal_trip() increment 'passive'
to enable passive polling for the given thermal zone whenever a
passive trip point is crossed on the way up and decrement it
whenever a passive trip point is crossed on the way down.  Also
remove the 'passive' field updates from governors and additionally
clear it in thermal_zone_device_init() to prevent passive polling
from being enabled after a system resume just beacuse it was enabled
before suspending the system.

Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
---

v1 -> v2:
   * Add a WARN_ON() check for tz->passive < 0.
   * Do not start passive polling until tz->passive is positive.

---
 drivers/thermal/gov_power_allocator.c |   12 +++++++-----
 drivers/thermal/gov_step_wise.c       |   10 ----------
 drivers/thermal/thermal_core.c        |   14 +++++++++++---
 3 files changed, 18 insertions(+), 18 deletions(-)

Index: linux-pm/drivers/thermal/thermal_core.c
===================================================================
--- linux-pm.orig/drivers/thermal/thermal_core.c
+++ linux-pm/drivers/thermal/thermal_core.c
@@ -296,7 +296,7 @@ static void monitor_thermal_zone(struct
 {
 	if (tz->mode != THERMAL_DEVICE_ENABLED)
 		thermal_zone_device_set_polling(tz, 0);
-	else if (tz->passive)
+	else if (tz->passive > 0)
 		thermal_zone_device_set_polling(tz, tz->passive_delay_jiffies);
 	else if (tz->polling_delay_jiffies)
 		thermal_zone_device_set_polling(tz, tz->polling_delay_jiffies);
@@ -389,6 +389,11 @@ static void handle_thermal_trip(struct t
 		if (tz->temperature < trip->temperature - trip->hysteresis) {
 			list_add(&td->notify_list_node, way_down_list);
 			td->notify_temp = trip->temperature - trip->hysteresis;
+
+			if (trip->type == THERMAL_TRIP_PASSIVE) {
+				tz->passive--;
+				WARN_ON(tz->passive < 0);
+			}
 		} else {
 			td->threshold -= trip->hysteresis;
 		}
@@ -402,8 +407,10 @@ static void handle_thermal_trip(struct t
 		td->notify_temp = trip->temperature;
 		td->threshold -= trip->hysteresis;
 
-		if (trip->type == THERMAL_TRIP_CRITICAL ||
-		    trip->type == THERMAL_TRIP_HOT)
+		if (trip->type == THERMAL_TRIP_PASSIVE)
+			tz->passive++;
+		else if (trip->type == THERMAL_TRIP_CRITICAL ||
+			 trip->type == THERMAL_TRIP_HOT)
 			handle_critical_trips(tz, trip);
 	}
 }
@@ -444,6 +451,7 @@ static void thermal_zone_device_init(str
 	INIT_DELAYED_WORK(&tz->poll_queue, thermal_zone_device_check);
 
 	tz->temperature = THERMAL_TEMP_INVALID;
+	tz->passive = 0;
 	tz->prev_low_trip = -INT_MAX;
 	tz->prev_high_trip = INT_MAX;
 	list_for_each_entry(pos, &tz->thermal_instances, tz_node)
Index: linux-pm/drivers/thermal/gov_step_wise.c
===================================================================
--- linux-pm.orig/drivers/thermal/gov_step_wise.c
+++ linux-pm/drivers/thermal/gov_step_wise.c
@@ -93,16 +93,6 @@ static void thermal_zone_trip_update(str
 		if (instance->initialized && old_target == instance->target)
 			continue;
 
-		if (trip->type == THERMAL_TRIP_PASSIVE) {
-			/* If needed, update the status of passive polling. */
-			if (old_target == THERMAL_NO_TARGET &&
-			    instance->target != THERMAL_NO_TARGET)
-				tz->passive++;
-			else if (old_target != THERMAL_NO_TARGET &&
-				 instance->target == THERMAL_NO_TARGET)
-				tz->passive--;
-		}
-
 		instance->initialized = true;
 
 		mutex_lock(&instance->cdev->lock);
Index: linux-pm/drivers/thermal/gov_power_allocator.c
===================================================================
--- linux-pm.orig/drivers/thermal/gov_power_allocator.c
+++ linux-pm/drivers/thermal/gov_power_allocator.c
@@ -66,6 +66,7 @@ struct power_actor {
  * struct power_allocator_params - parameters for the power allocator governor
  * @allocated_tzp:	whether we have allocated tzp for this thermal zone and
  *			it needs to be freed on unbind
+ * @update_cdevs:	whether or not update cdevs on the next run
  * @err_integral:	accumulated error in the PID controller.
  * @prev_err:	error in the previous iteration of the PID controller.
  *		Used to calculate the derivative term.
@@ -84,6 +85,7 @@ struct power_actor {
  */
 struct power_allocator_params {
 	bool allocated_tzp;
+	bool update_cdevs;
 	s64 err_integral;
 	s32 prev_err;
 	u32 sustainable_power;
@@ -533,7 +535,7 @@ static void reset_pid_controller(struct
 	params->prev_err = 0;
 }
 
-static void allow_maximum_power(struct thermal_zone_device *tz, bool update)
+static void allow_maximum_power(struct thermal_zone_device *tz)
 {
 	struct power_allocator_params *params = tz->governor_data;
 	struct thermal_cooling_device *cdev;
@@ -555,7 +557,7 @@ static void allow_maximum_power(struct t
 		 */
 		cdev->ops->get_requested_power(cdev, &req_power);
 
-		if (update)
+		if (params->update_cdevs)
 			__thermal_cdev_update(cdev);
 
 		mutex_unlock(&cdev->lock);
@@ -752,13 +754,13 @@ static void power_allocator_manage(struc
 
 	if (trip && tz->temperature < trip->temperature) {
 		reset_pid_controller(params);
-		allow_maximum_power(tz, tz->passive);
-		tz->passive = 0;
+		allow_maximum_power(tz);
+		params->update_cdevs = false;
 		return;
 	}
 
 	allocate_power(tz, params->trip_max->temperature);
-	tz->passive = 1;
+	params->update_cdevs = true;
 }
 
 static struct thermal_governor thermal_gov_power_allocator = {




^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [PATCH v2 1/2] thermal: core: Do not call handle_thermal_trip() if zone temperature is invalid
  2024-04-30 15:45 ` [PATCH v2 1/2] thermal: core: Do not call handle_thermal_trip() if zone temperature is invalid Rafael J. Wysocki
@ 2024-04-30 16:43   ` Lukasz Luba
  0 siblings, 0 replies; 7+ messages in thread
From: Lukasz Luba @ 2024-04-30 16:43 UTC (permalink / raw
  To: Rafael J. Wysocki; +Cc: Daniel Lezcano, Linux PM, LKML, Rafael J. Wysocki



On 4/30/24 16:45, Rafael J. Wysocki wrote:
> From: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
> 
> Make __thermal_zone_device_update() bail out if update_temperature()
> fails to update the zone temperature because __thermal_zone_get_temp()
> has returned an error and the current zone temperature is
> THERMAL_TEMP_INVALID (user space receiving netlink thermal messages,
> thermal debug code and thermal governors may get confused otherwise).
> 
> Fixes: 9ad18043fb35 ("thermal: core: Send trip crossing notifications at init time if needed")
> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
> ---
> 
> New patch in v2.
> 
> ---
>   drivers/thermal/thermal_core.c |    3 +++
>   1 file changed, 3 insertions(+)
> 
> Index: linux-pm/drivers/thermal/thermal_core.c
> ===================================================================
> --- linux-pm.orig/drivers/thermal/thermal_core.c
> +++ linux-pm/drivers/thermal/thermal_core.c
> @@ -487,6 +487,9 @@ void __thermal_zone_device_update(struct
>   
>   	update_temperature(tz);
>   
> +	if (tz->temperature == THERMAL_TEMP_INVALID)
> +		return;
> +
>   	__thermal_zone_set_trips(tz);
>   
>   	tz->notify_event = event;
> 
> 
> 

Reviewed-by: Lukasz Luba <lukasz.luba@arm.com>

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [PATCH v2 2/2] thermal: core: Move passive polling management to the core
  2024-04-30 15:52 ` [PATCH v2 2/2] thermal: core: Move passive polling management to the core Rafael J. Wysocki
@ 2024-04-30 16:47   ` Lukasz Luba
  0 siblings, 0 replies; 7+ messages in thread
From: Lukasz Luba @ 2024-04-30 16:47 UTC (permalink / raw
  To: Rafael J. Wysocki; +Cc: Daniel Lezcano, LKML, Rafael J. Wysocki, Linux PM



On 4/30/24 16:52, Rafael J. Wysocki wrote:
> From: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
> 
> Passive polling is enabled by setting the 'passive' field in
> struct thermal_zone_device to a positive value so long as the
> 'passive_delay_jiffies' field is greater than zero.  It causes
> the thermal core to actively check the thermal zone temperature
> periodically which in theory should be done after crossing a
> passive trip point on the way up in order to allow governors to
> react more rapidly to temperature changes and adjust mitigation
> more precisely.
> 
> However, the 'passive' field in struct thermal_zone_device is currently
> managed by governors which is quite problematic.  First of all, only
> two governors, Step-Wise and Power Allocator, update that field at
> all, so the other governors do not benefit from passive polling,
> although in principle they should.  Moreover, if the zone governor is
> changed from, say, Step-Wise to Fair-Share after 'passive' has been
> incremented by the former, it is not going to be reset back to zero by
> the latter even if the zone temperature falls down below all passive
> trip points.
> 
> For this reason, make handle_thermal_trip() increment 'passive'
> to enable passive polling for the given thermal zone whenever a
> passive trip point is crossed on the way up and decrement it
> whenever a passive trip point is crossed on the way down.  Also
> remove the 'passive' field updates from governors and additionally
> clear it in thermal_zone_device_init() to prevent passive polling
> from being enabled after a system resume just beacuse it was enabled
> before suspending the system.
> 
> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
> ---
> 
> v1 -> v2:
>     * Add a WARN_ON() check for tz->passive < 0.
>     * Do not start passive polling until tz->passive is positive.
> 
> ---
>   drivers/thermal/gov_power_allocator.c |   12 +++++++-----
>   drivers/thermal/gov_step_wise.c       |   10 ----------
>   drivers/thermal/thermal_core.c        |   14 +++++++++++---
>   3 files changed, 18 insertions(+), 18 deletions(-)
> 

Reviewed-by: Lukasz Luba <lukasz.luba@arm.com>

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [PATCH v2 0/2] thermal: core: Fix thermal zone initialization and move passive polling management to the core
  2024-04-30 15:44 [PATCH v2 0/2] thermal: core: Fix thermal zone initialization and move passive polling management to the core Rafael J. Wysocki
  2024-04-30 15:45 ` [PATCH v2 1/2] thermal: core: Do not call handle_thermal_trip() if zone temperature is invalid Rafael J. Wysocki
  2024-04-30 15:52 ` [PATCH v2 2/2] thermal: core: Move passive polling management to the core Rafael J. Wysocki
@ 2024-04-30 18:32 ` Lukasz Luba
  2024-04-30 19:15   ` Rafael J. Wysocki
  2 siblings, 1 reply; 7+ messages in thread
From: Lukasz Luba @ 2024-04-30 18:32 UTC (permalink / raw
  To: Rafael J. Wysocki; +Cc: Daniel Lezcano, LKML, Rafael J. Wysocki, Linux PM



On 4/30/24 16:44, Rafael J. Wysocki wrote:
> Hi Everyone,
> 
> This a v2 of the patch at
> 
> https://lore.kernel.org/linux-pm/5938055.MhkbZ0Pkbq@kreacher/
> 
> with one fix patch added and a couple of changes more in the main patch.
> 
> Patch [1/2] fixes the thermal zone initialization in the cases when getting the
> first zone temperature from the sensor is delayed and patch [2/2] is an update
> of the patch above.
> 
> Thanks!
> 
> 
> 

I have also tested the IPA and Step-wise - they work.

Tested-by: Lukasz Luba <lukasz.luba@arm.com>

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [PATCH v2 0/2] thermal: core: Fix thermal zone initialization and move passive polling management to the core
  2024-04-30 18:32 ` [PATCH v2 0/2] thermal: core: Fix thermal zone initialization and move " Lukasz Luba
@ 2024-04-30 19:15   ` Rafael J. Wysocki
  0 siblings, 0 replies; 7+ messages in thread
From: Rafael J. Wysocki @ 2024-04-30 19:15 UTC (permalink / raw
  To: Lukasz Luba
  Cc: Rafael J. Wysocki, Daniel Lezcano, LKML, Rafael J. Wysocki,
	Linux PM

On Tue, Apr 30, 2024 at 8:32 PM Lukasz Luba <lukasz.luba@arm.com> wrote:
>
>
>
> On 4/30/24 16:44, Rafael J. Wysocki wrote:
> > Hi Everyone,
> >
> > This a v2 of the patch at
> >
> > https://lore.kernel.org/linux-pm/5938055.MhkbZ0Pkbq@kreacher/
> >
> > with one fix patch added and a couple of changes more in the main patch.
> >
> > Patch [1/2] fixes the thermal zone initialization in the cases when getting the
> > first zone temperature from the sensor is delayed and patch [2/2] is an update
> > of the patch above.
> >
> > Thanks!
> >
> >
> >
>
> I have also tested the IPA and Step-wise - they work.
>
> Tested-by: Lukasz Luba <lukasz.luba@arm.com>

Thank you!

^ permalink raw reply	[flat|nested] 7+ messages in thread

end of thread, other threads:[~2024-04-30 19:15 UTC | newest]

Thread overview: 7+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2024-04-30 15:44 [PATCH v2 0/2] thermal: core: Fix thermal zone initialization and move passive polling management to the core Rafael J. Wysocki
2024-04-30 15:45 ` [PATCH v2 1/2] thermal: core: Do not call handle_thermal_trip() if zone temperature is invalid Rafael J. Wysocki
2024-04-30 16:43   ` Lukasz Luba
2024-04-30 15:52 ` [PATCH v2 2/2] thermal: core: Move passive polling management to the core Rafael J. Wysocki
2024-04-30 16:47   ` Lukasz Luba
2024-04-30 18:32 ` [PATCH v2 0/2] thermal: core: Fix thermal zone initialization and move " Lukasz Luba
2024-04-30 19:15   ` Rafael J. Wysocki

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).