Linux-Doc Archive mirror
 help / color / mirror / Atom feed
* [PATCH v5 0/2] x86/resctrl: Track llc_occupancy of RMIDs in limbo list
@ 2024-03-08  7:41 Haifeng Xu
  2024-03-08  7:41 ` [PATCH v5 1/2] x86/resctrl: Rename pseudo_lock_event.h to trace.h Haifeng Xu
  2024-03-08  7:41 ` [PATCH v5 2/2] x86/resctrl: Add tracepoint for llc_occupancy tracking Haifeng Xu
  0 siblings, 2 replies; 7+ messages in thread
From: Haifeng Xu @ 2024-03-08  7:41 UTC (permalink / raw
  To: reinette.chatre, james.morse
  Cc: fenghua.yu, babu.moger, tglx, mingo, bp, dave.hansen, hpa,
	peternewman, x86, linux-kernel, corbet, linux-doc, Haifeng Xu

After removing a monitor group, its RMID may not be freed immediately
unless its llc_occupancy is less than the re-allocation threshold. If
turning up the threshold, the RMID can be reused. In order to know how
much the threshold should be, it's necessary to acquire the llc_occupancy.

The patch series provides a new tracepoint to track the llc_occupancy.

Changes since v1:
- Rename pseudo_lock_event.h instead of creating a new header file.
- Modify names used in the tracepoint.
- Update changelog.

Changes since v2:
- Fix typo and use the x86/resctrl subject prefix in the cover letter.
- Track both CLOSID and RMID in the tracepoint.
- Remove the details on how perf can be used in patch2's changelog.

Changes since v3:
- Put the tracepoint in the 'else' section of the if/else after
  resctrl_arch_rmid_read().
- Modify names used in the tracepoint.
- Document the properties of the tracepoint.

Changes since v4:
- Add Reviewed-by tags.
- Include more maintainers in the submission.

Haifeng Xu (2):
  x86/resctrl: Rename pseudo_lock_event.h to trace.h
  x86/resctrl: Add tracepoint for llc_occupancy tracking

 Documentation/arch/x86/resctrl.rst            |  8 +++++++
 arch/x86/kernel/cpu/resctrl/monitor.c         |  9 ++++++++
 arch/x86/kernel/cpu/resctrl/pseudo_lock.c     |  2 +-
 .../resctrl/{pseudo_lock_event.h => trace.h}  | 22 ++++++++++++++++---
 4 files changed, 37 insertions(+), 4 deletions(-)
 rename arch/x86/kernel/cpu/resctrl/{pseudo_lock_event.h => trace.h} (57%)

-- 
2.25.1


^ permalink raw reply	[flat|nested] 7+ messages in thread

* [PATCH v5 1/2] x86/resctrl: Rename pseudo_lock_event.h to trace.h
  2024-03-08  7:41 [PATCH v5 0/2] x86/resctrl: Track llc_occupancy of RMIDs in limbo list Haifeng Xu
@ 2024-03-08  7:41 ` Haifeng Xu
  2024-03-13 22:46   ` Reinette Chatre
  2024-03-08  7:41 ` [PATCH v5 2/2] x86/resctrl: Add tracepoint for llc_occupancy tracking Haifeng Xu
  1 sibling, 1 reply; 7+ messages in thread
From: Haifeng Xu @ 2024-03-08  7:41 UTC (permalink / raw
  To: reinette.chatre, james.morse
  Cc: fenghua.yu, babu.moger, tglx, mingo, bp, dave.hansen, hpa,
	peternewman, x86, linux-kernel, corbet, linux-doc, Haifeng Xu

Now only pseudo-locking part uses tracepoints to do event tracking, but
other parts of resctrl may need new tracepoints. It is unnecessary to
create separate header files and define CREATE_TRACE_POINTS in different
c files which fragments the resctrl tracing.

Therefore, give the resctrl tracepoint header file a generic name to
support its use for tracepoints that are not specific to pseudo-locking.

No functional change.

Signed-off-by: Haifeng Xu <haifeng.xu@shopee.com>
Suggested-by: Reinette Chatre <reinette.chatre@intel.com>
---
 arch/x86/kernel/cpu/resctrl/pseudo_lock.c                   | 2 +-
 .../x86/kernel/cpu/resctrl/{pseudo_lock_event.h => trace.h} | 6 +++---
 2 files changed, 4 insertions(+), 4 deletions(-)
 rename arch/x86/kernel/cpu/resctrl/{pseudo_lock_event.h => trace.h} (88%)

diff --git a/arch/x86/kernel/cpu/resctrl/pseudo_lock.c b/arch/x86/kernel/cpu/resctrl/pseudo_lock.c
index 884b88e25141..492c8e28c4ce 100644
--- a/arch/x86/kernel/cpu/resctrl/pseudo_lock.c
+++ b/arch/x86/kernel/cpu/resctrl/pseudo_lock.c
@@ -31,7 +31,7 @@
 #include "internal.h"
 
 #define CREATE_TRACE_POINTS
-#include "pseudo_lock_event.h"
+#include "trace.h"
 
 /*
  * The bits needed to disable hardware prefetching varies based on the
diff --git a/arch/x86/kernel/cpu/resctrl/pseudo_lock_event.h b/arch/x86/kernel/cpu/resctrl/trace.h
similarity index 88%
rename from arch/x86/kernel/cpu/resctrl/pseudo_lock_event.h
rename to arch/x86/kernel/cpu/resctrl/trace.h
index 428ebbd4270b..ed5c66b8ab0b 100644
--- a/arch/x86/kernel/cpu/resctrl/pseudo_lock_event.h
+++ b/arch/x86/kernel/cpu/resctrl/trace.h
@@ -2,7 +2,7 @@
 #undef TRACE_SYSTEM
 #define TRACE_SYSTEM resctrl
 
-#if !defined(_TRACE_PSEUDO_LOCK_H) || defined(TRACE_HEADER_MULTI_READ)
+#if !defined(_TRACE_RESCTRL_H) || defined(TRACE_HEADER_MULTI_READ)
 #define _TRACE_PSEUDO_LOCK_H
 
 #include <linux/tracepoint.h>
@@ -35,9 +35,9 @@ TRACE_EVENT(pseudo_lock_l3,
 	    TP_printk("hits=%llu miss=%llu",
 		      __entry->l3_hits, __entry->l3_miss));
 
-#endif /* _TRACE_PSEUDO_LOCK_H */
+#endif /* _TRACE_RESCTRL_H */
 
 #undef TRACE_INCLUDE_PATH
 #define TRACE_INCLUDE_PATH .
-#define TRACE_INCLUDE_FILE pseudo_lock_event
+#define TRACE_INCLUDE_FILE trace
 #include <trace/define_trace.h>
-- 
2.25.1


^ permalink raw reply related	[flat|nested] 7+ messages in thread

* [PATCH v5 2/2] x86/resctrl: Add tracepoint for llc_occupancy tracking
  2024-03-08  7:41 [PATCH v5 0/2] x86/resctrl: Track llc_occupancy of RMIDs in limbo list Haifeng Xu
  2024-03-08  7:41 ` [PATCH v5 1/2] x86/resctrl: Rename pseudo_lock_event.h to trace.h Haifeng Xu
@ 2024-03-08  7:41 ` Haifeng Xu
  2024-03-13 22:47   ` Reinette Chatre
  1 sibling, 1 reply; 7+ messages in thread
From: Haifeng Xu @ 2024-03-08  7:41 UTC (permalink / raw
  To: reinette.chatre, james.morse
  Cc: fenghua.yu, babu.moger, tglx, mingo, bp, dave.hansen, hpa,
	peternewman, x86, linux-kernel, corbet, linux-doc, Haifeng Xu

In our production environment, after removing monitor groups, those unused
RMIDs get stuck in the limbo list forever because their llc_occupancy are
always larger than the threshold. But the unused RMIDs can be successfully
freed by turning up the threshold.

In order to know how much the threshold should be, perf can be used to
acquire the llc_occupancy of RMIDs in each rdt domain.

Instead of using perf tool to track llc_occupancy and filter the log
manually, it is more convenient for users to use tracepoint to do this
work. So add a new tracepoint that shows the llc_occupancy of busy RMIDs
when scanning the limbo list.

Signed-off-by: Haifeng Xu <haifeng.xu@shopee.com>
Suggested-by: Reinette Chatre <reinette.chatre@intel.com>
Suggested-by: James Morse <james.morse@arm.com>
Reviewed-by: James Morse <james.morse@arm.com>
---
 Documentation/arch/x86/resctrl.rst    |  8 ++++++++
 arch/x86/kernel/cpu/resctrl/monitor.c |  9 +++++++++
 arch/x86/kernel/cpu/resctrl/trace.h   | 16 ++++++++++++++++
 3 files changed, 33 insertions(+)

diff --git a/Documentation/arch/x86/resctrl.rst b/Documentation/arch/x86/resctrl.rst
index a6279df64a9d..dd3507dc765c 100644
--- a/Documentation/arch/x86/resctrl.rst
+++ b/Documentation/arch/x86/resctrl.rst
@@ -478,6 +478,14 @@ if non-contiguous 1s value is supported. On a system with a 20-bit mask
 each bit represents 5% of the capacity of the cache. You could partition
 the cache into four equal parts with masks: 0x1f, 0x3e0, 0x7c00, 0xf8000.
 
+Tracepoint - mon_llc_occupancy_limbo
+------------------------------------
+This tracepoint gives you the precise occupancy values for a subset of RMID
+that are not immediately available for allocation. This can't be relied on
+to produce output every second, it may be necessary to attempt to create an
+empty monitor group to force an update. Output may only be produced if creation
+of a control or monitor group fails.
+
 Memory bandwidth Allocation and monitoring
 ==========================================
 
diff --git a/arch/x86/kernel/cpu/resctrl/monitor.c b/arch/x86/kernel/cpu/resctrl/monitor.c
index c34a35ec0f03..60b6a29a9e29 100644
--- a/arch/x86/kernel/cpu/resctrl/monitor.c
+++ b/arch/x86/kernel/cpu/resctrl/monitor.c
@@ -24,6 +24,7 @@
 #include <asm/resctrl.h>
 
 #include "internal.h"
+#include "trace.h"
 
 /**
  * struct rmid_entry - dirty tracking for all RMID.
@@ -354,6 +355,14 @@ void __check_limbo(struct rdt_domain *d, bool force_free)
 			rmid_dirty = true;
 		} else {
 			rmid_dirty = (val >= resctrl_rmid_realloc_threshold);
+
+			/* x86's CLOSID and RMID are independent numbers, so the entry's
+			 * closid is a invalid CLOSID. But on arm64, the RMID value isn't
+			 * a unique number for each CLOSID. It's necessary to track both
+			 * CLOSID and RMID because there may be dependencies between each
+			 * other on some architectures.
+			 */
+			trace_mon_llc_occupancy_limbo(entry->closid, entry->rmid, d->id, val);
 		}
 
 		if (force_free || !rmid_dirty) {
diff --git a/arch/x86/kernel/cpu/resctrl/trace.h b/arch/x86/kernel/cpu/resctrl/trace.h
index ed5c66b8ab0b..b310b4985b94 100644
--- a/arch/x86/kernel/cpu/resctrl/trace.h
+++ b/arch/x86/kernel/cpu/resctrl/trace.h
@@ -35,6 +35,22 @@ TRACE_EVENT(pseudo_lock_l3,
 	    TP_printk("hits=%llu miss=%llu",
 		      __entry->l3_hits, __entry->l3_miss));
 
+TRACE_EVENT(mon_llc_occupancy_limbo,
+	    TP_PROTO(u32 ctrl_hw_id, u32 mon_hw_id, int domain_id, u64 llc_occupancy_bytes),
+	    TP_ARGS(ctrl_hw_id, mon_hw_id, domain_id, llc_occupancy_bytes),
+	    TP_STRUCT__entry(__field(u32, ctrl_hw_id)
+			     __field(u32, mon_hw_id)
+			     __field(int, domain_id)
+			     __field(u64, llc_occupancy_bytes)),
+	    TP_fast_assign(__entry->ctrl_hw_id = ctrl_hw_id;
+			   __entry->mon_hw_id = mon_hw_id;
+			   __entry->domain_id = domain_id;
+			   __entry->llc_occupancy_bytes = llc_occupancy_bytes;),
+	    TP_printk("ctrl_hw_id=%u mon_hw_id=%u domain_d=%d llc_occupancy_bytes=%llu",
+		      __entry->ctrl_hw_id, __entry->mon_hw_id, __entry->domain_id,
+		      __entry->llc_occupancy_bytes)
+	   );
+
 #endif /* _TRACE_RESCTRL_H */
 
 #undef TRACE_INCLUDE_PATH
-- 
2.25.1


^ permalink raw reply related	[flat|nested] 7+ messages in thread

* Re: [PATCH v5 1/2] x86/resctrl: Rename pseudo_lock_event.h to trace.h
  2024-03-08  7:41 ` [PATCH v5 1/2] x86/resctrl: Rename pseudo_lock_event.h to trace.h Haifeng Xu
@ 2024-03-13 22:46   ` Reinette Chatre
  2024-03-19  6:27     ` Haifeng Xu
  0 siblings, 1 reply; 7+ messages in thread
From: Reinette Chatre @ 2024-03-13 22:46 UTC (permalink / raw
  To: Haifeng Xu, james.morse
  Cc: fenghua.yu, babu.moger, tglx, mingo, bp, dave.hansen, hpa,
	peternewman, x86, linux-kernel, corbet, linux-doc

Hi Haifeng,

On 3/7/2024 11:41 PM, Haifeng Xu wrote:
> Now only pseudo-locking part uses tracepoints to do event tracking, but
> other parts of resctrl may need new tracepoints. It is unnecessary to
> create separate header files and define CREATE_TRACE_POINTS in different
> c files which fragments the resctrl tracing.
> 
> Therefore, give the resctrl tracepoint header file a generic name to
> support its use for tracepoints that are not specific to pseudo-locking.
> 
> No functional change.
> 
> Signed-off-by: Haifeng Xu <haifeng.xu@shopee.com>
> Suggested-by: Reinette Chatre <reinette.chatre@intel.com>
> ---
>  arch/x86/kernel/cpu/resctrl/pseudo_lock.c                   | 2 +-
>  .../x86/kernel/cpu/resctrl/{pseudo_lock_event.h => trace.h} | 6 +++---
>  2 files changed, 4 insertions(+), 4 deletions(-)
>  rename arch/x86/kernel/cpu/resctrl/{pseudo_lock_event.h => trace.h} (88%)
> 
> diff --git a/arch/x86/kernel/cpu/resctrl/pseudo_lock.c b/arch/x86/kernel/cpu/resctrl/pseudo_lock.c
> index 884b88e25141..492c8e28c4ce 100644
> --- a/arch/x86/kernel/cpu/resctrl/pseudo_lock.c
> +++ b/arch/x86/kernel/cpu/resctrl/pseudo_lock.c
> @@ -31,7 +31,7 @@
>  #include "internal.h"
>  
>  #define CREATE_TRACE_POINTS
> -#include "pseudo_lock_event.h"
> +#include "trace.h"
>  
>  /*
>   * The bits needed to disable hardware prefetching varies based on the
> diff --git a/arch/x86/kernel/cpu/resctrl/pseudo_lock_event.h b/arch/x86/kernel/cpu/resctrl/trace.h
> similarity index 88%
> rename from arch/x86/kernel/cpu/resctrl/pseudo_lock_event.h
> rename to arch/x86/kernel/cpu/resctrl/trace.h
> index 428ebbd4270b..ed5c66b8ab0b 100644
> --- a/arch/x86/kernel/cpu/resctrl/pseudo_lock_event.h
> +++ b/arch/x86/kernel/cpu/resctrl/trace.h
> @@ -2,7 +2,7 @@
>  #undef TRACE_SYSTEM
>  #define TRACE_SYSTEM resctrl
>  
> -#if !defined(_TRACE_PSEUDO_LOCK_H) || defined(TRACE_HEADER_MULTI_READ)
> +#if !defined(_TRACE_RESCTRL_H) || defined(TRACE_HEADER_MULTI_READ)
>  #define _TRACE_PSEUDO_LOCK_H

The above #define should match the new name also.

>  
>  #include <linux/tracepoint.h>
> @@ -35,9 +35,9 @@ TRACE_EVENT(pseudo_lock_l3,
>  	    TP_printk("hits=%llu miss=%llu",
>  		      __entry->l3_hits, __entry->l3_miss));
>  
> -#endif /* _TRACE_PSEUDO_LOCK_H */
> +#endif /* _TRACE_RESCTRL_H */
>  
>  #undef TRACE_INCLUDE_PATH
>  #define TRACE_INCLUDE_PATH .
> -#define TRACE_INCLUDE_FILE pseudo_lock_event
> +#define TRACE_INCLUDE_FILE trace
>  #include <trace/define_trace.h>

The rest looks good.

Thank you.

Reinette

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [PATCH v5 2/2] x86/resctrl: Add tracepoint for llc_occupancy tracking
  2024-03-08  7:41 ` [PATCH v5 2/2] x86/resctrl: Add tracepoint for llc_occupancy tracking Haifeng Xu
@ 2024-03-13 22:47   ` Reinette Chatre
  2024-03-19  6:38     ` Haifeng Xu
  0 siblings, 1 reply; 7+ messages in thread
From: Reinette Chatre @ 2024-03-13 22:47 UTC (permalink / raw
  To: Haifeng Xu, james.morse
  Cc: fenghua.yu, babu.moger, tglx, mingo, bp, dave.hansen, hpa,
	peternewman, x86, linux-kernel, corbet, linux-doc

Hi Haifeng,

On 3/7/2024 11:41 PM, Haifeng Xu wrote:
> In our production environment, after removing monitor groups, those unused
> RMIDs get stuck in the limbo list forever because their llc_occupancy are
> always larger than the threshold. But the unused RMIDs can be successfully
> freed by turning up the threshold.
> 
> In order to know how much the threshold should be, perf can be used to
> acquire the llc_occupancy of RMIDs in each rdt domain.
> 
> Instead of using perf tool to track llc_occupancy and filter the log
> manually, it is more convenient for users to use tracepoint to do this
> work. So add a new tracepoint that shows the llc_occupancy of busy RMIDs
> when scanning the limbo list.
> 
> Signed-off-by: Haifeng Xu <haifeng.xu@shopee.com>
> Suggested-by: Reinette Chatre <reinette.chatre@intel.com>
> Suggested-by: James Morse <james.morse@arm.com>
> Reviewed-by: James Morse <james.morse@arm.com>
> ---
>  Documentation/arch/x86/resctrl.rst    |  8 ++++++++
>  arch/x86/kernel/cpu/resctrl/monitor.c |  9 +++++++++
>  arch/x86/kernel/cpu/resctrl/trace.h   | 16 ++++++++++++++++
>  3 files changed, 33 insertions(+)
> 
> diff --git a/Documentation/arch/x86/resctrl.rst b/Documentation/arch/x86/resctrl.rst
> index a6279df64a9d..dd3507dc765c 100644
> --- a/Documentation/arch/x86/resctrl.rst
> +++ b/Documentation/arch/x86/resctrl.rst
> @@ -478,6 +478,14 @@ if non-contiguous 1s value is supported. On a system with a 20-bit mask
>  each bit represents 5% of the capacity of the cache. You could partition
>  the cache into four equal parts with masks: 0x1f, 0x3e0, 0x7c00, 0xf8000.
>  
> +Tracepoint - mon_llc_occupancy_limbo
> +------------------------------------

I think that the below paragraph would fit nicely as a new paragraph in the
existing "max_threshold_occupancy - generic concepts" section. To support that
just one change to text below ...

> +This tracepoint gives you the precise occupancy values for a subset of RMID

The mon_llc_occupancy_limbo tracepoint gives the precise occupancy in bytes
for a subset of RMID ...

> +that are not immediately available for allocation. This can't be relied on
> +to produce output every second, it may be necessary to attempt to create an
> +empty monitor group to force an update. Output may only be produced if creation
> +of a control or monitor group fails.
> +
>  Memory bandwidth Allocation and monitoring
>  ==========================================
>  
> diff --git a/arch/x86/kernel/cpu/resctrl/monitor.c b/arch/x86/kernel/cpu/resctrl/monitor.c
> index c34a35ec0f03..60b6a29a9e29 100644
> --- a/arch/x86/kernel/cpu/resctrl/monitor.c
> +++ b/arch/x86/kernel/cpu/resctrl/monitor.c
> @@ -24,6 +24,7 @@
>  #include <asm/resctrl.h>
>  
>  #include "internal.h"
> +#include "trace.h"
>  
>  /**
>   * struct rmid_entry - dirty tracking for all RMID.
> @@ -354,6 +355,14 @@ void __check_limbo(struct rdt_domain *d, bool force_free)
>  			rmid_dirty = true;
>  		} else {
>  			rmid_dirty = (val >= resctrl_rmid_realloc_threshold);
> +
> +			/* x86's CLOSID and RMID are independent numbers, so the entry's
> +			 * closid is a invalid CLOSID. But on arm64, the RMID value isn't
> +			 * a unique number for each CLOSID. It's necessary to track both
> +			 * CLOSID and RMID because there may be dependencies between each
> +			 * other on some architectures.
> +			 */

Please watch for proper formatting of multi-line comment and consistent capitalization.
I also think comment can be more accurate, for example:

	/*
	 * x86's CLOSID and RMID are independent numbers, so the entry's
 	 * CLOSID is an empty CLOSID (X86_RESCTRL_EMPTY_CLOSID). On Arm the
	 * RMID (PMG) extends the CLOSID (PARTID) space with bits that aren't used
	 * to select the configuration. It is thus necessary to track both
	 * CLOSID and RMID because there may be dependencies between them
	 * on some architectures.
	 */

> +			trace_mon_llc_occupancy_limbo(entry->closid, entry->rmid, d->id, val);
>  		}
>  
>  		if (force_free || !rmid_dirty) {
> diff --git a/arch/x86/kernel/cpu/resctrl/trace.h b/arch/x86/kernel/cpu/resctrl/trace.h
> index ed5c66b8ab0b..b310b4985b94 100644
> --- a/arch/x86/kernel/cpu/resctrl/trace.h
> +++ b/arch/x86/kernel/cpu/resctrl/trace.h
> @@ -35,6 +35,22 @@ TRACE_EVENT(pseudo_lock_l3,
>  	    TP_printk("hits=%llu miss=%llu",
>  		      __entry->l3_hits, __entry->l3_miss));
>  
> +TRACE_EVENT(mon_llc_occupancy_limbo,
> +	    TP_PROTO(u32 ctrl_hw_id, u32 mon_hw_id, int domain_id, u64 llc_occupancy_bytes),
> +	    TP_ARGS(ctrl_hw_id, mon_hw_id, domain_id, llc_occupancy_bytes),
> +	    TP_STRUCT__entry(__field(u32, ctrl_hw_id)
> +			     __field(u32, mon_hw_id)
> +			     __field(int, domain_id)
> +			     __field(u64, llc_occupancy_bytes)),
> +	    TP_fast_assign(__entry->ctrl_hw_id = ctrl_hw_id;
> +			   __entry->mon_hw_id = mon_hw_id;
> +			   __entry->domain_id = domain_id;
> +			   __entry->llc_occupancy_bytes = llc_occupancy_bytes;),
> +	    TP_printk("ctrl_hw_id=%u mon_hw_id=%u domain_d=%d llc_occupancy_bytes=%llu",

domain_d -> domain_id

> +		      __entry->ctrl_hw_id, __entry->mon_hw_id, __entry->domain_id,
> +		      __entry->llc_occupancy_bytes)
> +	   );
> +
>  #endif /* _TRACE_RESCTRL_H */
>  
>  #undef TRACE_INCLUDE_PATH


Reinette

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [PATCH v5 1/2] x86/resctrl: Rename pseudo_lock_event.h to trace.h
  2024-03-13 22:46   ` Reinette Chatre
@ 2024-03-19  6:27     ` Haifeng Xu
  0 siblings, 0 replies; 7+ messages in thread
From: Haifeng Xu @ 2024-03-19  6:27 UTC (permalink / raw
  To: Reinette Chatre, james.morse
  Cc: fenghua.yu, babu.moger, tglx, mingo, bp, dave.hansen, hpa,
	peternewman, x86, linux-kernel, corbet, linux-doc



On 2024/3/14 06:46, Reinette Chatre wrote:
> Hi Haifeng,
> 
> On 3/7/2024 11:41 PM, Haifeng Xu wrote:
>> Now only pseudo-locking part uses tracepoints to do event tracking, but
>> other parts of resctrl may need new tracepoints. It is unnecessary to
>> create separate header files and define CREATE_TRACE_POINTS in different
>> c files which fragments the resctrl tracing.
>>
>> Therefore, give the resctrl tracepoint header file a generic name to
>> support its use for tracepoints that are not specific to pseudo-locking.
>>
>> No functional change.
>>
>> Signed-off-by: Haifeng Xu <haifeng.xu@shopee.com>
>> Suggested-by: Reinette Chatre <reinette.chatre@intel.com>
>> ---
>>  arch/x86/kernel/cpu/resctrl/pseudo_lock.c                   | 2 +-
>>  .../x86/kernel/cpu/resctrl/{pseudo_lock_event.h => trace.h} | 6 +++---
>>  2 files changed, 4 insertions(+), 4 deletions(-)
>>  rename arch/x86/kernel/cpu/resctrl/{pseudo_lock_event.h => trace.h} (88%)
>>
>> diff --git a/arch/x86/kernel/cpu/resctrl/pseudo_lock.c b/arch/x86/kernel/cpu/resctrl/pseudo_lock.c
>> index 884b88e25141..492c8e28c4ce 100644
>> --- a/arch/x86/kernel/cpu/resctrl/pseudo_lock.c
>> +++ b/arch/x86/kernel/cpu/resctrl/pseudo_lock.c
>> @@ -31,7 +31,7 @@
>>  #include "internal.h"
>>  
>>  #define CREATE_TRACE_POINTS
>> -#include "pseudo_lock_event.h"
>> +#include "trace.h"
>>  
>>  /*
>>   * The bits needed to disable hardware prefetching varies based on the
>> diff --git a/arch/x86/kernel/cpu/resctrl/pseudo_lock_event.h b/arch/x86/kernel/cpu/resctrl/trace.h
>> similarity index 88%
>> rename from arch/x86/kernel/cpu/resctrl/pseudo_lock_event.h
>> rename to arch/x86/kernel/cpu/resctrl/trace.h
>> index 428ebbd4270b..ed5c66b8ab0b 100644
>> --- a/arch/x86/kernel/cpu/resctrl/pseudo_lock_event.h
>> +++ b/arch/x86/kernel/cpu/resctrl/trace.h
>> @@ -2,7 +2,7 @@
>>  #undef TRACE_SYSTEM
>>  #define TRACE_SYSTEM resctrl
>>  
>> -#if !defined(_TRACE_PSEUDO_LOCK_H) || defined(TRACE_HEADER_MULTI_READ)
>> +#if !defined(_TRACE_RESCTRL_H) || defined(TRACE_HEADER_MULTI_READ)
>>  #define _TRACE_PSEUDO_LOCK_H
> 
> The above #define should match the new name also.

Sorry, I forgot to replace it with 'RESCTRL' in this version.
Thanks for pointing that out!
> 
>>  
>>  #include <linux/tracepoint.h>
>> @@ -35,9 +35,9 @@ TRACE_EVENT(pseudo_lock_l3,
>>  	    TP_printk("hits=%llu miss=%llu",
>>  		      __entry->l3_hits, __entry->l3_miss));
>>  
>> -#endif /* _TRACE_PSEUDO_LOCK_H */
>> +#endif /* _TRACE_RESCTRL_H */
>>  
>>  #undef TRACE_INCLUDE_PATH
>>  #define TRACE_INCLUDE_PATH .
>> -#define TRACE_INCLUDE_FILE pseudo_lock_event
>> +#define TRACE_INCLUDE_FILE trace
>>  #include <trace/define_trace.h>
> 
> The rest looks good.
> 
> Thank you.
> 
> Reinette

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [PATCH v5 2/2] x86/resctrl: Add tracepoint for llc_occupancy tracking
  2024-03-13 22:47   ` Reinette Chatre
@ 2024-03-19  6:38     ` Haifeng Xu
  0 siblings, 0 replies; 7+ messages in thread
From: Haifeng Xu @ 2024-03-19  6:38 UTC (permalink / raw
  To: Reinette Chatre, james.morse
  Cc: fenghua.yu, babu.moger, tglx, mingo, bp, dave.hansen, hpa,
	peternewman, x86, linux-kernel, corbet, linux-doc



On 2024/3/14 06:47, Reinette Chatre wrote:
> Hi Haifeng,
> 
> On 3/7/2024 11:41 PM, Haifeng Xu wrote:
>> In our production environment, after removing monitor groups, those unused
>> RMIDs get stuck in the limbo list forever because their llc_occupancy are
>> always larger than the threshold. But the unused RMIDs can be successfully
>> freed by turning up the threshold.
>>
>> In order to know how much the threshold should be, perf can be used to
>> acquire the llc_occupancy of RMIDs in each rdt domain.
>>
>> Instead of using perf tool to track llc_occupancy and filter the log
>> manually, it is more convenient for users to use tracepoint to do this
>> work. So add a new tracepoint that shows the llc_occupancy of busy RMIDs
>> when scanning the limbo list.
>>
>> Signed-off-by: Haifeng Xu <haifeng.xu@shopee.com>
>> Suggested-by: Reinette Chatre <reinette.chatre@intel.com>
>> Suggested-by: James Morse <james.morse@arm.com>
>> Reviewed-by: James Morse <james.morse@arm.com>
>> ---
>>  Documentation/arch/x86/resctrl.rst    |  8 ++++++++
>>  arch/x86/kernel/cpu/resctrl/monitor.c |  9 +++++++++
>>  arch/x86/kernel/cpu/resctrl/trace.h   | 16 ++++++++++++++++
>>  3 files changed, 33 insertions(+)
>>
>> diff --git a/Documentation/arch/x86/resctrl.rst b/Documentation/arch/x86/resctrl.rst
>> index a6279df64a9d..dd3507dc765c 100644
>> --- a/Documentation/arch/x86/resctrl.rst
>> +++ b/Documentation/arch/x86/resctrl.rst
>> @@ -478,6 +478,14 @@ if non-contiguous 1s value is supported. On a system with a 20-bit mask
>>  each bit represents 5% of the capacity of the cache. You could partition
>>  the cache into four equal parts with masks: 0x1f, 0x3e0, 0x7c00, 0xf8000.
>>  
>> +Tracepoint - mon_llc_occupancy_limbo
>> +------------------------------------
> 
> I think that the below paragraph would fit nicely as a new paragraph in the
> existing "max_threshold_occupancy - generic concepts" section. To support that
> just one change to text below ...
> 
>> +This tracepoint gives you the precise occupancy values for a subset of RMID
> 
> The mon_llc_occupancy_limbo tracepoint gives the precise occupancy in bytes
> for a subset of RMID ...
> 

OK, I'll move the descriptions to "max_threshold_occupancy - generic concepts" section.

>> +that are not immediately available for allocation. This can't be relied on
>> +to produce output every second, it may be necessary to attempt to create an
>> +empty monitor group to force an update. Output may only be produced if creation
>> +of a control or monitor group fails.
>> +
>>  Memory bandwidth Allocation and monitoring
>>  ==========================================
>>  
>> diff --git a/arch/x86/kernel/cpu/resctrl/monitor.c b/arch/x86/kernel/cpu/resctrl/monitor.c
>> index c34a35ec0f03..60b6a29a9e29 100644
>> --- a/arch/x86/kernel/cpu/resctrl/monitor.c
>> +++ b/arch/x86/kernel/cpu/resctrl/monitor.c
>> @@ -24,6 +24,7 @@
>>  #include <asm/resctrl.h>
>>  
>>  #include "internal.h"
>> +#include "trace.h"
>>  
>>  /**
>>   * struct rmid_entry - dirty tracking for all RMID.
>> @@ -354,6 +355,14 @@ void __check_limbo(struct rdt_domain *d, bool force_free)
>>  			rmid_dirty = true;
>>  		} else {
>>  			rmid_dirty = (val >= resctrl_rmid_realloc_threshold);
>> +
>> +			/* x86's CLOSID and RMID are independent numbers, so the entry's
>> +			 * closid is a invalid CLOSID. But on arm64, the RMID value isn't
>> +			 * a unique number for each CLOSID. It's necessary to track both
>> +			 * CLOSID and RMID because there may be dependencies between each
>> +			 * other on some architectures.
>> +			 */
> 
> Please watch for proper formatting of multi-line comment and consistent capitalization.
> I also think comment can be more accurate, for example:
> 
> 	/*
> 	 * x86's CLOSID and RMID are independent numbers, so the entry's
>  	 * CLOSID is an empty CLOSID (X86_RESCTRL_EMPTY_CLOSID). On Arm the
> 	 * RMID (PMG) extends the CLOSID (PARTID) space with bits that aren't used
> 	 * to select the configuration. It is thus necessary to track both
> 	 * CLOSID and RMID because there may be dependencies between them
> 	 * on some architectures.
> 	 */
> 

Thanks for your suggestions!

>> +			trace_mon_llc_occupancy_limbo(entry->closid, entry->rmid, d->id, val);
>>  		}
>>  
>>  		if (force_free || !rmid_dirty) {
>> diff --git a/arch/x86/kernel/cpu/resctrl/trace.h b/arch/x86/kernel/cpu/resctrl/trace.h
>> index ed5c66b8ab0b..b310b4985b94 100644
>> --- a/arch/x86/kernel/cpu/resctrl/trace.h
>> +++ b/arch/x86/kernel/cpu/resctrl/trace.h
>> @@ -35,6 +35,22 @@ TRACE_EVENT(pseudo_lock_l3,
>>  	    TP_printk("hits=%llu miss=%llu",
>>  		      __entry->l3_hits, __entry->l3_miss));
>>  
>> +TRACE_EVENT(mon_llc_occupancy_limbo,
>> +	    TP_PROTO(u32 ctrl_hw_id, u32 mon_hw_id, int domain_id, u64 llc_occupancy_bytes),
>> +	    TP_ARGS(ctrl_hw_id, mon_hw_id, domain_id, llc_occupancy_bytes),
>> +	    TP_STRUCT__entry(__field(u32, ctrl_hw_id)
>> +			     __field(u32, mon_hw_id)
>> +			     __field(int, domain_id)
>> +			     __field(u64, llc_occupancy_bytes)),
>> +	    TP_fast_assign(__entry->ctrl_hw_id = ctrl_hw_id;
>> +			   __entry->mon_hw_id = mon_hw_id;
>> +			   __entry->domain_id = domain_id;
>> +			   __entry->llc_occupancy_bytes = llc_occupancy_bytes;),
>> +	    TP_printk("ctrl_hw_id=%u mon_hw_id=%u domain_d=%d llc_occupancy_bytes=%llu",
> 
> domain_d -> domain_id
> 
>> +		      __entry->ctrl_hw_id, __entry->mon_hw_id, __entry->domain_id,
>> +		      __entry->llc_occupancy_bytes)
>> +	   );
>> +
>>  #endif /* _TRACE_RESCTRL_H */
>>  
>>  #undef TRACE_INCLUDE_PATH
> 
> 
> Reinette

Thanks!

^ permalink raw reply	[flat|nested] 7+ messages in thread

end of thread, other threads:[~2024-03-19  6:38 UTC | newest]

Thread overview: 7+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2024-03-08  7:41 [PATCH v5 0/2] x86/resctrl: Track llc_occupancy of RMIDs in limbo list Haifeng Xu
2024-03-08  7:41 ` [PATCH v5 1/2] x86/resctrl: Rename pseudo_lock_event.h to trace.h Haifeng Xu
2024-03-13 22:46   ` Reinette Chatre
2024-03-19  6:27     ` Haifeng Xu
2024-03-08  7:41 ` [PATCH v5 2/2] x86/resctrl: Add tracepoint for llc_occupancy tracking Haifeng Xu
2024-03-13 22:47   ` Reinette Chatre
2024-03-19  6:38     ` Haifeng Xu

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).