Historical speck list archives
From: Andi Kleen <ak@linux.intel.com>
To: speck@linutronix.de
Subject: [MODERATED] mbox for PERFv3
Date: Thu, 7 Feb 2019 16:14:17 -0800	[thread overview]
Message-ID: <20190208001417.GA16922@tassilo.jf.intel.com> (raw)

[-- Attachment #1: Type: text/plain, Size: 24 bytes --]


mbox attached.

-Andi


[-- Attachment #2: mbox --]
[-- Type: text/plain, Size: 27413 bytes --]

From 3c5756531a45ca9791a9c4bca295e92e063957ac Mon Sep 17 00:00:00 2001
From: Andi Kleen <ak@linux.intel.com>
Date: Tue, 18 Sep 2018 13:18:07 -0700
Subject: [PATCH 1/6] x86/pmu/intel: Export number of counters in caps
Status: O
Content-Length: 1263
Lines: 45

Export the number of generic and fixed counters of the core PMU in caps/.
This helps users and tools with constructing groups. It will also become
more important with upcoming patches that can change the number of counters.
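
For illustration only (not part of the patch): a tool could read the
new files roughly like this, assuming the non-hybrid "cpu" PMU paths
used below.

	/* Sketch: read the generic counter count exported in caps/ */
	#include <stdio.h>

	int main(void)
	{
		FILE *f = fopen("/sys/devices/cpu/caps/num_counter", "r");
		int num = -1;

		if (f) {
			if (fscanf(f, "%d", &num) != 1)
				num = -1;
			fclose(f);
		}
		printf("generic counters: %d\n", num);
		return 0;
	}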

Signed-off-by: Andi Kleen <ak@linux.intel.com>
---
 arch/x86/events/core.c | 20 ++++++++++++++++++++
 1 file changed, 20 insertions(+)

diff --git a/arch/x86/events/core.c b/arch/x86/events/core.c
index 374a19712e20..58e659bfc2d9 100644
--- a/arch/x86/events/core.c
+++ b/arch/x86/events/core.c
@@ -2248,8 +2248,28 @@ static ssize_t max_precise_show(struct device *cdev,
 
 static DEVICE_ATTR_RO(max_precise);
 
+static ssize_t num_counter_show(struct device *cdev,
+				  struct device_attribute *attr,
+				  char *buf)
+{
+	return snprintf(buf, PAGE_SIZE, "%d\n", x86_pmu.num_counters);
+}
+
+static DEVICE_ATTR_RO(num_counter);
+
+static ssize_t num_counter_fixed_show(struct device *cdev,
+				  struct device_attribute *attr,
+				  char *buf)
+{
+	return snprintf(buf, PAGE_SIZE, "%d\n", x86_pmu.num_counters_fixed);
+}
+
+static DEVICE_ATTR_RO(num_counter_fixed);
+
 static struct attribute *x86_pmu_caps_attrs[] = {
 	&dev_attr_max_precise.attr,
+	&dev_attr_num_counter.attr,
+	&dev_attr_num_counter_fixed.attr,
 	NULL
 };
 
-- 
2.17.2


From 3dd5d6e2bc9ac53f826c251c68ce84fcc79a6872 Mon Sep 17 00:00:00 2001
From: Andi Kleen <ak@linux.intel.com>
Date: Mon, 4 Feb 2019 16:36:40 -0800
Subject: [PATCH 2/6] x86/pmu/intel: Handle TSX with counter 3 on Skylake
Status: O
Content-Length: 8362
Lines: 247

Most of the code is from Peter Zijlstra at this point,
based on earlier code from AK.

On Skylake with recent microcode updates, due to erratum XXX,
perfmon general-purpose counter 3 can be corrupted when RTM
transactions are executed.

The microcode provides a new MSR to force-disable RTM
(make all RTM transactions abort).

This patch adds the low-level code to manage this MSR.
Depending on a global flag (/sys/devices/cpu/enable_all_counters),
events are either scheduled or not scheduled on generic counter 3.

When the flag is set and an event uses counter 3, TSX is disabled
while the event is active.

This patch assumes that the kernel is using
RETPOLINE (or IBRS); otherwise speculative execution could
still corrupt counter 3 in very unlikely cases.

The enable_all_counters flag defaults to zero in this
patch. This default could be changed.

The trade-offs for setting the option default are:

    Using 4 (or 8 with HT off) events in perf versus
    allowing RTM usage while perf is active.

    - Existing programs that use perf groups with 4 counters
    may not retrieve perfmon data anymore. Perf usage
    that uses fewer than four (or 7 with HT off) counters
    is not impacted. Perf usage that does not use groups
    will still work, but will see increased multiplexing.

    - TSX programs should not functionally break from
    forcing RTM to abort because they always need a valid
    fallback path. However they will see significantly
    lower performance if they rely on TSX for performance
    (all RTM transactions will run and only abort at the end),
    potentially slowing them down so much that it is
    equivalent to functional breakage.
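
As a sketch only (equivalent to writing "1" to
/sys/devices/cpu/enable_all_counters as root): an administrator who
prefers four counters over TSX could flip the flag added by this patch
like this.

	/* Sketch: opt back in to all four counters, trading away RTM. */
	#include <stdio.h>

	int main(void)
	{
		FILE *f = fopen("/sys/devices/cpu/enable_all_counters", "w");

		if (!f)
			return 1;
		fputs("1\n", f);
		return fclose(f) ? 1 : 0;
	}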

Signed-off-by: Andi Kleen <ak@linux.intel.com>

---
v2:
Use u8 instead of bool
Rename force_rtm_abort_active.
v3:
Use correct patch version that actually compiles.
v4:
Switch to Peter's implementation with some updates by AK.
Now the TFA state is checked for in enable_all,
and the extra mask is handled by get_constraint
Use a temporary constraint instead of modifying the globals.
---
 arch/x86/events/core.c             |  6 ++-
 arch/x86/events/intel/core.c       | 64 +++++++++++++++++++++++++++++-
 arch/x86/events/perf_event.h       | 10 ++++-
 arch/x86/include/asm/cpufeatures.h |  1 +
 arch/x86/include/asm/msr-index.h   |  5 +++
 5 files changed, 83 insertions(+), 3 deletions(-)

diff --git a/arch/x86/events/core.c b/arch/x86/events/core.c
index 58e659bfc2d9..f5d1435c6071 100644
--- a/arch/x86/events/core.c
+++ b/arch/x86/events/core.c
@@ -2252,7 +2252,11 @@ static ssize_t num_counter_show(struct device *cdev,
 				  struct device_attribute *attr,
 				  char *buf)
 {
-	return snprintf(buf, PAGE_SIZE, "%d\n", x86_pmu.num_counters);
+	int num = x86_pmu.num_counters;
+	if (boot_cpu_has(X86_FEATURE_TSX_FORCE_ABORT) &&
+		perf_enable_all_counters && num > 0)
+		num--;
+	return snprintf(buf, PAGE_SIZE, "%d\n", num);
 }
 
 static DEVICE_ATTR_RO(num_counter);
diff --git a/arch/x86/events/intel/core.c b/arch/x86/events/intel/core.c
index daafb893449b..b4162b4b0899 100644
--- a/arch/x86/events/intel/core.c
+++ b/arch/x86/events/intel/core.c
@@ -1999,6 +1999,30 @@ static void intel_pmu_nhm_enable_all(int added)
 	intel_pmu_enable_all(added);
 }
 
+static void intel_skl_pmu_enable_all(int added)
+{
+	struct cpu_hw_events *cpuc = this_cpu_ptr(&cpu_hw_events);
+	u64 val;
+
+	/*
+	 * The perf code is not expected to execute RTM instructions
+	 * (and also cannot misspeculate into them due to RETPOLINE
+	 * use), so PMC3 should be 'stable'; IOW the values we
+	 * just potentially programmed into it, should still be there.
+	 *
+	 * If we programmed PMC3, make sure to set TFA before we make
+	 * things go and possibly encounter RTM instructions.
+	 * Similarly, if PMC3 got unused, make sure to clear TFA.
+	 */
+	val = MSR_TFA_RTM_FORCE_ABORT * test_bit(3, cpuc->active_mask);
+	if (cpuc->tfa_shadow != val) {
+		cpuc->tfa_shadow = val;
+		wrmsrl(MSR_TSX_FORCE_ABORT, val);
+	}
+
+	intel_pmu_enable_all(added);
+}
+
 static void enable_counter_freeze(void)
 {
 	update_debugctlmsr(get_debugctlmsr() |
@@ -3345,6 +3369,34 @@ glp_get_event_constraints(struct cpu_hw_events *cpuc, int idx,
 	return c;
 }
 
+bool perf_enable_all_counters __read_mostly;
+
+/*
+ * On Skylake counter 3 may get corrupted when RTM is used.
+ * Either avoid counter 3, or disable RTM when counter 3 used.
+ */
+
+static struct event_constraint *
+skl_get_event_constraints(struct cpu_hw_events *cpuc, int idx,
+			  struct perf_event *event)
+{
+	struct event_constraint *c;
+
+	c = hsw_get_event_constraints(cpuc, idx, event);
+
+	if (!perf_enable_all_counters) {
+		cpuc->counter3_constraint = *c;
+		c = &cpuc->counter3_constraint;
+
+		/*
+		 * Without TFA we must not use PMC3.
+		 */
+		__clear_bit(3, c->idxmsk);
+	}
+
+	return c;
+}
+
 /*
  * Broadwell:
  *
@@ -4061,8 +4113,11 @@ static struct attribute *intel_pmu_caps_attrs[] = {
        NULL
 };
 
+DEVICE_BOOL_ATTR(enable_all_counters, 0644, perf_enable_all_counters);
+
 static struct attribute *intel_pmu_attrs[] = {
 	&dev_attr_freeze_on_smi.attr,
+	NULL,	/* May be overridden with enable_all_counters */
 	NULL,
 };
 
@@ -4543,9 +4598,16 @@ __init int intel_pmu_init(void)
 		/* all extra regs are per-cpu when HT is on */
 		x86_pmu.flags |= PMU_FL_HAS_RSP_1;
 		x86_pmu.flags |= PMU_FL_NO_HT_SHARING;
+		if (boot_cpu_has(X86_FEATURE_TSX_FORCE_ABORT)) {
+			x86_pmu.enable_all = intel_skl_pmu_enable_all;
+			intel_pmu_attrs[1] = &dev_attr_enable_all_counters.attr.attr;
+			x86_pmu.get_event_constraints = skl_get_event_constraints;
+			/* Could add checking&warning for !RETPOLINE here */
+		} else {
+			x86_pmu.get_event_constraints = hsw_get_event_constraints;
+		}
 
 		x86_pmu.hw_config = hsw_hw_config;
-		x86_pmu.get_event_constraints = hsw_get_event_constraints;
 		extra_attr = boot_cpu_has(X86_FEATURE_RTM) ?
 			hsw_format_attr : nhm_format_attr;
 		extra_attr = merge_attr(extra_attr, skl_format_attr);
diff --git a/arch/x86/events/perf_event.h b/arch/x86/events/perf_event.h
index 78d7b7031bfc..2474ebfad961 100644
--- a/arch/x86/events/perf_event.h
+++ b/arch/x86/events/perf_event.h
@@ -70,7 +70,7 @@ struct event_constraint {
 #define PERF_X86_EVENT_EXCL_ACCT	0x0200 /* accounted EXCL event */
 #define PERF_X86_EVENT_AUTO_RELOAD	0x0400 /* use PEBS auto-reload */
 #define PERF_X86_EVENT_LARGE_PEBS	0x0800 /* use large PEBS */
-
+#define PERF_X86_EVENT_ABORT_TSX	0x1000 /* force abort TSX */
 
 struct amd_nb {
 	int nb_id;  /* NorthBridge id */
@@ -242,6 +242,12 @@ struct cpu_hw_events {
 	struct intel_excl_cntrs		*excl_cntrs;
 	int excl_thread_id; /* 0 or 1 */
 
+	/*
+	 * Manage using counter 3 on Skylake with TSX.
+	 */
+	int				tfa_shadow;
+	struct event_constraint		counter3_constraint;
+
 	/*
 	 * AMD specific bits
 	 */
@@ -998,6 +1004,8 @@ static inline int is_ht_workaround_enabled(void)
 	return !!(x86_pmu.flags & PMU_FL_EXCL_ENABLED);
 }
 
+extern bool perf_enable_all_counters;
+
 #else /* CONFIG_CPU_SUP_INTEL */
 
 static inline void reserve_ds_buffers(void)
diff --git a/arch/x86/include/asm/cpufeatures.h b/arch/x86/include/asm/cpufeatures.h
index 6d6122524711..981ff9479648 100644
--- a/arch/x86/include/asm/cpufeatures.h
+++ b/arch/x86/include/asm/cpufeatures.h
@@ -344,6 +344,7 @@
 /* Intel-defined CPU features, CPUID level 0x00000007:0 (EDX), word 18 */
 #define X86_FEATURE_AVX512_4VNNIW	(18*32+ 2) /* AVX-512 Neural Network Instructions */
 #define X86_FEATURE_AVX512_4FMAPS	(18*32+ 3) /* AVX-512 Multiply Accumulation Single precision */
+#define X86_FEATURE_TSX_FORCE_ABORT	(18*32+13) /* "" TSX_FORCE_ABORT */
 #define X86_FEATURE_PCONFIG		(18*32+18) /* Intel PCONFIG */
 #define X86_FEATURE_SPEC_CTRL		(18*32+26) /* "" Speculation Control (IBRS + IBPB) */
 #define X86_FEATURE_INTEL_STIBP		(18*32+27) /* "" Single Thread Indirect Branch Predictors */
diff --git a/arch/x86/include/asm/msr-index.h b/arch/x86/include/asm/msr-index.h
index 8e40c2446fd1..492b18720dba 100644
--- a/arch/x86/include/asm/msr-index.h
+++ b/arch/x86/include/asm/msr-index.h
@@ -666,6 +666,11 @@
 
 #define MSR_IA32_TSC_DEADLINE		0x000006E0
 
+#define MSR_TSX_FORCE_ABORT		0x0000010F
+
+#define MSR_TFA_RTM_FORCE_ABORT_BIT	0
+#define MSR_TFA_RTM_FORCE_ABORT		BIT_ULL(MSR_TFA_RTM_FORCE_ABORT_BIT)
+
 /* P4/Xeon+ specific */
 #define MSR_IA32_MCG_EAX		0x00000180
 #define MSR_IA32_MCG_EBX		0x00000181
-- 
2.17.2


From edd4b30b62a70017a01289b17a810d5f50560ecf Mon Sep 17 00:00:00 2001
From: Andi Kleen <ak@linux.intel.com>
Date: Mon, 4 Feb 2019 16:47:07 -0800
Subject: [PATCH 3/6] x86/pmu/intel: Add perf event attribute to control RTM
Status: O
Content-Length: 3824
Lines: 129

Add a "force_rtm_abort" perf event attribute that allows
user programs to opt in to using counter 3 and disabling
TSX while the perf event is active.

Also add an "allow_rtm" attribute that lets programs
make sure TSX stays enabled during the measurement (e.g.
if they want to measure TSX itself).

This allows non-root programs to override the defaults
for their needs.

This is also needed for correct semantics in guests,
so that the KVM PMU can set the correct default
for its guests.

We use config2 bits 0 and 1 in the core PMU for these
interfaces.
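
For illustration (a minimal sketch, not part of the patch): a program
that wants counter 3 for one event would set config2 bit 0 when
opening it, roughly as follows.

	/* Sketch: request force_rtm_abort (config2 bit 0) for one event. */
	#include <string.h>
	#include <unistd.h>
	#include <sys/syscall.h>
	#include <linux/perf_event.h>

	static int open_cycles_with_force_rtm_abort(void)
	{
		struct perf_event_attr attr;

		memset(&attr, 0, sizeof(attr));
		attr.size = sizeof(attr);
		attr.type = PERF_TYPE_HARDWARE;
		attr.config = PERF_COUNT_HW_CPU_CYCLES;
		attr.config2 = 1;	/* force_rtm_abort, bit 0 as defined above */

		/* current task, any CPU, no group leader, no flags */
		return syscall(__NR_perf_event_open, &attr, 0, -1, -1, 0);
	}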

Signed-off-by: Andi Kleen <ak@linux.intel.com>
---
 arch/x86/events/intel/core.c | 32 +++++++++++++++++++++++++++++++-
 arch/x86/events/perf_event.h |  3 +++
 2 files changed, 34 insertions(+), 1 deletion(-)

diff --git a/arch/x86/events/intel/core.c b/arch/x86/events/intel/core.c
index b4162b4b0899..0e24b827adf8 100644
--- a/arch/x86/events/intel/core.c
+++ b/arch/x86/events/intel/core.c
@@ -3371,6 +3371,12 @@ glp_get_event_constraints(struct cpu_hw_events *cpuc, int idx,
 
 bool perf_enable_all_counters __read_mostly;
 
+static unsigned merged_config2(struct perf_event *event)
+{
+	return (event->group_leader ? event->group_leader->attr.config2 : 0) |
+		event->attr.config2;
+}
+
 /*
  * On Skylake counter 3 may get corrupted when RTM is used.
  * Either avoid counter 3, or disable RTM when counter 3 used.
@@ -3381,10 +3387,19 @@ skl_get_event_constraints(struct cpu_hw_events *cpuc, int idx,
 			  struct perf_event *event)
 {
 	struct event_constraint *c;
+	u64 config2;
+	bool use_rtm;
 
 	c = hsw_get_event_constraints(cpuc, idx, event);
 
-	if (!perf_enable_all_counters) {
+	config2 = merged_config2(event);
+	/* When both flags are set ALLOW wins. */
+	if (!(config2 & (ALLOW_RTM|FORCE_RTM_ABORT)))
+		use_rtm = !perf_enable_all_counters;
+	else
+		use_rtm = config2 & ALLOW_RTM;
+
+	if (use_rtm) {
 		cpuc->counter3_constraint = *c;
 		c = &cpuc->counter3_constraint;
 
@@ -3645,6 +3660,9 @@ PMU_FORMAT_ATTR(ldlat, "config1:0-15");
 
 PMU_FORMAT_ATTR(frontend, "config1:0-23");
 
+PMU_FORMAT_ATTR(force_rtm_abort, "config2:0");
+PMU_FORMAT_ATTR(allow_rtm, "config2:1");
+
 static struct attribute *intel_arch3_formats_attr[] = {
 	&format_attr_event.attr,
 	&format_attr_umask.attr,
@@ -3680,6 +3698,12 @@ static struct attribute *skl_format_attr[] = {
 	NULL,
 };
 
+static struct attribute *skl_extra_format_attr[] = {
+	&format_attr_force_rtm_abort.attr,
+	&format_attr_allow_rtm.attr,
+	NULL,
+};
+
 static __initconst const struct x86_pmu core_pmu = {
 	.name			= "core",
 	.handle_irq		= x86_pmu_handle_irq,
@@ -4148,6 +4172,7 @@ __init int intel_pmu_init(void)
 	struct attribute **mem_attr = NULL;
 	struct attribute **tsx_attr = NULL;
 	struct attribute **to_free = NULL;
+	struct attribute **to_free2 = NULL;
 	union cpuid10_edx edx;
 	union cpuid10_eax eax;
 	union cpuid10_ebx ebx;
@@ -4619,6 +4644,10 @@ __init int intel_pmu_init(void)
 			boot_cpu_data.x86_model == INTEL_FAM6_SKYLAKE_X);
 		pr_cont("Skylake events, ");
 		name = "skylake";
+		if (boot_cpu_has(X86_FEATURE_TSX_FORCE_ABORT)) {
+			extra_attr = merge_attr(extra_attr, skl_extra_format_attr);
+			to_free2 = extra_attr;
+		}
 		break;
 
 	default:
@@ -4732,6 +4761,7 @@ __init int intel_pmu_init(void)
 		x86_pmu.handle_irq = intel_pmu_handle_irq_v4;
 
 	kfree(to_free);
+	kfree(to_free2);
 	return 0;
 }
 
diff --git a/arch/x86/events/perf_event.h b/arch/x86/events/perf_event.h
index 2474ebfad961..b4eecd9316f1 100644
--- a/arch/x86/events/perf_event.h
+++ b/arch/x86/events/perf_event.h
@@ -776,6 +776,9 @@ int x86_pmu_hw_config(struct perf_event *event);
 
 void x86_pmu_disable_all(void);
 
+#define FORCE_RTM_ABORT BIT(0)
+#define ALLOW_RTM	BIT(1)
+
 static inline void __x86_pmu_enable_event(struct hw_perf_event *hwc,
 					  u64 enable_mask)
 {
-- 
2.17.2


From f0b5b4dad741ade52b5b787c19aff9f3ad7a2912 Mon Sep 17 00:00:00 2001
From: Andi Kleen <ak@linux.intel.com>
Date: Fri, 28 Sep 2018 16:03:06 -0700
Subject: [PATCH 4/6] perf stat: Make all existing groups weak
Status: O
Content-Length: 1268
Lines: 51

Now that we may only have three counters, make the --topdown and
--transaction groups weak so that they still work.
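
For reference (string reconstructed from the surrounding source, so
treat it as approximate), the transaction group handed to
parse_events() then reads:

	task-clock,{instructions,cycles,cpu/cycles-t/,cpu/tx-start/,cpu/el-start/,cpu/cycles-ct/}:W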

Signed-off-by: Andi Kleen <ak@linux.intel.com>
---
 tools/perf/builtin-stat.c | 8 +++++---
 1 file changed, 5 insertions(+), 3 deletions(-)

diff --git a/tools/perf/builtin-stat.c b/tools/perf/builtin-stat.c
index e587808591e8..c94f5ed135f1 100644
--- a/tools/perf/builtin-stat.c
+++ b/tools/perf/builtin-stat.c
@@ -101,7 +101,7 @@ static const char *transaction_attrs = {
 	"cpu/tx-start/,"
 	"cpu/el-start/,"
 	"cpu/cycles-ct/"
-	"}"
+	"}:W"
 };
 
 /* More limited version when the CPU does not have all events. */
@@ -112,7 +112,7 @@ static const char * transaction_limited_attrs = {
 	"cycles,"
 	"cpu/cycles-t/,"
 	"cpu/tx-start/"
-	"}"
+	"}:W"
 };
 
 static const char * topdown_attrs[] = {
@@ -999,7 +999,7 @@ static int topdown_filter_events(const char **attr, char **str, bool use_group)
 	}
 	attr[i - off] = NULL;
 
-	*str = malloc(len + 1 + 2);
+	*str = malloc(len + 1 + 2 + 2);
 	if (!*str)
 		return -1;
 	s = *str;
@@ -1016,6 +1016,8 @@ static int topdown_filter_events(const char **attr, char **str, bool use_group)
 	}
 	if (use_group) {
 		s[-1] = '}';
+		*s++ = ':';
+		*s++ = 'W';
 		*s = 0;
 	} else
 		s[-1] = 0;
-- 
2.17.2


From 5cf21ef79d7930e410c9966293c3f4382c8dc8ad Mon Sep 17 00:00:00 2001
From: Andi Kleen <ak@linux.intel.com>
Date: Fri, 28 Sep 2018 16:04:08 -0700
Subject: [PATCH 5/6] perf stat: Don't count EL for --transaction with three
 counters
Status: O
Content-Length: 3235
Lines: 103

When the system only has three counters, HLE (EL) is also not
available. Detect that there are only three counters and then
automatically disable el-start for --transaction.
This avoids event multiplexing in this situation with no loss
of information.

Signed-off-by: Andi Kleen <ak@linux.intel.com>
---
 tools/perf/builtin-stat.c | 30 ++++++++++++++++++++++--------
 tools/perf/util/pmu.c     | 10 ++++++++++
 tools/perf/util/pmu.h     |  1 +
 3 files changed, 33 insertions(+), 8 deletions(-)

diff --git a/tools/perf/builtin-stat.c b/tools/perf/builtin-stat.c
index c94f5ed135f1..59a5bf0389b7 100644
--- a/tools/perf/builtin-stat.c
+++ b/tools/perf/builtin-stat.c
@@ -104,6 +104,18 @@ static const char *transaction_attrs = {
 	"}:W"
 };
 
+static const char *transaction_noel_attrs = {
+	"task-clock,"
+	"{"
+	"instructions,"
+	"cycles,"
+	"cpu/cycles-t/,"
+	"cpu/tx-start/,"
+	"cpu/cycles-ct/"
+	"}:W"
+};
+
+
 /* More limited version when the CPU does not have all events. */
 static const char * transaction_limited_attrs = {
 	"task-clock,"
@@ -1160,6 +1172,8 @@ static int add_default_attributes(void)
 		return 0;
 
 	if (transaction_run) {
+		const char *attrs = transaction_limited_attrs;
+
 		/* Handle -T as -M transaction. Once platform specific metrics
 		 * support has been added to the json files, all archictures
 		 * will use this approach. To determine transaction support
@@ -1173,16 +1187,16 @@ static int add_default_attributes(void)
 		}
 
 		if (pmu_have_event("cpu", "cycles-ct") &&
-		    pmu_have_event("cpu", "el-start"))
-			err = parse_events(evsel_list, transaction_attrs,
-					   &errinfo);
-		else
-			err = parse_events(evsel_list,
-					   transaction_limited_attrs,
-					   &errinfo);
+		    pmu_have_event("cpu", "el-start")) {
+			if (pmu_num_counters("cpu") == 3)
+				attrs = transaction_noel_attrs;
+			else
+				attrs = transaction_attrs;
+		}
+		err = parse_events(evsel_list, attrs, &errinfo);
 		if (err) {
 			fprintf(stderr, "Cannot set up transaction events\n");
-			parse_events_print_error(&errinfo, transaction_attrs);
+			parse_events_print_error(&errinfo, attrs);
 			return -1;
 		}
 		return 0;
diff --git a/tools/perf/util/pmu.c b/tools/perf/util/pmu.c
index 11a234740632..5e17409b7ff6 100644
--- a/tools/perf/util/pmu.c
+++ b/tools/perf/util/pmu.c
@@ -1495,3 +1495,13 @@ int perf_pmu__scan_file(struct perf_pmu *pmu, const char *name, const char *fmt,
 	va_end(args);
 	return ret;
 }
+
+int pmu_num_counters(const char *pname)
+{
+	unsigned long num;
+	struct perf_pmu *pmu = perf_pmu__find(pname);
+
+	if (pmu && perf_pmu__scan_file(pmu, "caps/num_counter", "%ld", &num) == 1)
+		return num;
+	return -1;
+}
diff --git a/tools/perf/util/pmu.h b/tools/perf/util/pmu.h
index 76fecec7b3f9..6e772243fd1d 100644
--- a/tools/perf/util/pmu.h
+++ b/tools/perf/util/pmu.h
@@ -85,6 +85,7 @@ struct perf_pmu *perf_pmu__scan(struct perf_pmu *pmu);
 void print_pmu_events(const char *event_glob, bool name_only, bool quiet,
 		      bool long_desc, bool details_flag);
 bool pmu_have_event(const char *pname, const char *name);
+int pmu_num_counters(const char *);
 
 int perf_pmu__scan_file(struct perf_pmu *pmu, const char *name, const char *fmt, ...) __scanf(3, 4);
 
-- 
2.17.2


From 8fb4335a313a71913ae398f3cc1dcd86306553b7 Mon Sep 17 00:00:00 2001
From: Andi Kleen <ak@linux.intel.com>
Date: Fri, 1 Feb 2019 18:17:19 -0800
Subject: [PATCH 6/6] kvm: vmx: Support TSX_FORCE_ABORT in KVM guests
Status: O
Content-Length: 7884
Lines: 220

Recent microcode for Skylake added a new CPUID bit and MSR that
control forcing TSX to abort, which in turn makes PMU counter 3
safe to use. This patch adds support for controlling counter 3
from KVM guests.

Intercept the MSR and set the correct attribute on the perf events
used by the virtualized PMU. TSX is only disabled when counter
3 is actually used by the host PMU. The guest can use all four
counters without multiplexing.

Also export the CPUID bit.
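
As a sketch of the guest side (assuming a Linux guest; MSR and bit
names match the earlier patch in this series): a guest that wants PMC3
back would set bit 0 of the MSR once the CPUID bit is visible.

	/* Sketch: guest code forcing RTM to abort so PMC3 stays usable. */
	#include <asm/cpufeature.h>
	#include <asm/msr-index.h>
	#include <asm/msr.h>

	static void guest_claim_pmc3(void)
	{
		if (!boot_cpu_has(X86_FEATURE_TSX_FORCE_ABORT))
			return;

		/* Bit 0 = RTM_FORCE_ABORT; intercepted and forwarded by KVM above. */
		wrmsrl(MSR_TSX_FORCE_ABORT, MSR_TFA_RTM_FORCE_ABORT);
	}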

Signed-off-by: Andi Kleen <ak@linux.intel.com>

---
v2:
Redo implementation. We already intercept the MSR now and
pass the correct attribute to the host perf. The MSR is only
active when counter 3 is actually used, but the guest can
use all counters without multiplexing.
v3:
Also set ALLOW_RTM to completely control guest state independent
of host.
---
 arch/x86/include/asm/kvm_host.h |  1 +
 arch/x86/kvm/cpuid.c            |  3 ++-
 arch/x86/kvm/pmu.c              | 19 ++++++++++++-------
 arch/x86/kvm/pmu.h              |  6 ++++--
 arch/x86/kvm/pmu_amd.c          |  2 +-
 arch/x86/kvm/vmx/pmu_intel.c    | 20 ++++++++++++++++++--
 6 files changed, 38 insertions(+), 13 deletions(-)

diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index 4660ce90de7f..75f098142672 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -470,6 +470,7 @@ struct kvm_pmu {
 	u64 global_ctrl_mask;
 	u64 reserved_bits;
 	u8 version;
+	u8 force_tsx_abort;
 	struct kvm_pmc gp_counters[INTEL_PMC_MAX_GENERIC];
 	struct kvm_pmc fixed_counters[INTEL_PMC_MAX_FIXED];
 	struct irq_work irq_work;
diff --git a/arch/x86/kvm/cpuid.c b/arch/x86/kvm/cpuid.c
index bbffa6c54697..2570b17ac372 100644
--- a/arch/x86/kvm/cpuid.c
+++ b/arch/x86/kvm/cpuid.c
@@ -409,7 +409,8 @@ static inline int __do_cpuid_ent(struct kvm_cpuid_entry2 *entry, u32 function,
 	/* cpuid 7.0.edx*/
 	const u32 kvm_cpuid_7_0_edx_x86_features =
 		F(AVX512_4VNNIW) | F(AVX512_4FMAPS) | F(SPEC_CTRL) |
-		F(SPEC_CTRL_SSBD) | F(ARCH_CAPABILITIES) | F(INTEL_STIBP);
+		F(SPEC_CTRL_SSBD) | F(ARCH_CAPABILITIES) | F(INTEL_STIBP) |
+		F(TSX_FORCE_ABORT);
 
 	/* all calls to cpuid_count() should be made on the same cpu */
 	get_cpu();
diff --git a/arch/x86/kvm/pmu.c b/arch/x86/kvm/pmu.c
index 58ead7db71a3..427d9fc6dbda 100644
--- a/arch/x86/kvm/pmu.c
+++ b/arch/x86/kvm/pmu.c
@@ -99,7 +99,8 @@ static void kvm_perf_overflow_intr(struct perf_event *perf_event,
 static void pmc_reprogram_counter(struct kvm_pmc *pmc, u32 type,
 				  unsigned config, bool exclude_user,
 				  bool exclude_kernel, bool intr,
-				  bool in_tx, bool in_tx_cp)
+				  bool in_tx, bool in_tx_cp,
+				  int tsx_force_abort)
 {
 	struct perf_event *event;
 	struct perf_event_attr attr = {
@@ -111,6 +112,7 @@ static void pmc_reprogram_counter(struct kvm_pmc *pmc, u32 type,
 		.exclude_user = exclude_user,
 		.exclude_kernel = exclude_kernel,
 		.config = config,
+		.config2 = tsx_force_abort,
 	};
 
 	attr.sample_period = (-pmc->counter) & pmc_bitmask(pmc);
@@ -140,7 +142,8 @@ static void pmc_reprogram_counter(struct kvm_pmc *pmc, u32 type,
 	clear_bit(pmc->idx, (unsigned long*)&pmc_to_pmu(pmc)->reprogram_pmi);
 }
 
-void reprogram_gp_counter(struct kvm_pmc *pmc, u64 eventsel)
+void reprogram_gp_counter(struct kvm_pmu *pmu, struct kvm_pmc *pmc,
+			  u64 eventsel)
 {
 	unsigned config, type = PERF_TYPE_RAW;
 	u8 event_select, unit_mask;
@@ -178,11 +181,13 @@ void reprogram_gp_counter(struct kvm_pmc *pmc, u64 eventsel)
 			      !(eventsel & ARCH_PERFMON_EVENTSEL_OS),
 			      eventsel & ARCH_PERFMON_EVENTSEL_INT,
 			      (eventsel & HSW_IN_TX),
-			      (eventsel & HSW_IN_TX_CHECKPOINTED));
+			      (eventsel & HSW_IN_TX_CHECKPOINTED),
+			      pmu->force_tsx_abort);
 }
 EXPORT_SYMBOL_GPL(reprogram_gp_counter);
 
-void reprogram_fixed_counter(struct kvm_pmc *pmc, u8 ctrl, int idx)
+void reprogram_fixed_counter(struct kvm_pmu *pmu, struct kvm_pmc *pmc,
+			     u8 ctrl, int idx)
 {
 	unsigned en_field = ctrl & 0x3;
 	bool pmi = ctrl & 0x8;
@@ -196,7 +201,7 @@ void reprogram_fixed_counter(struct kvm_pmc *pmc, u8 ctrl, int idx)
 			      kvm_x86_ops->pmu_ops->find_fixed_event(idx),
 			      !(en_field & 0x2), /* exclude user */
 			      !(en_field & 0x1), /* exclude kernel */
-			      pmi, false, false);
+			      pmi, false, false, pmu->force_tsx_abort);
 }
 EXPORT_SYMBOL_GPL(reprogram_fixed_counter);
 
@@ -208,12 +213,12 @@ void reprogram_counter(struct kvm_pmu *pmu, int pmc_idx)
 		return;
 
 	if (pmc_is_gp(pmc))
-		reprogram_gp_counter(pmc, pmc->eventsel);
+		reprogram_gp_counter(pmu, pmc, pmc->eventsel);
 	else {
 		int idx = pmc_idx - INTEL_PMC_IDX_FIXED;
 		u8 ctrl = fixed_ctrl_field(pmu->fixed_ctr_ctrl, idx);
 
-		reprogram_fixed_counter(pmc, ctrl, idx);
+		reprogram_fixed_counter(pmu, pmc, ctrl, idx);
 	}
 }
 EXPORT_SYMBOL_GPL(reprogram_counter);
diff --git a/arch/x86/kvm/pmu.h b/arch/x86/kvm/pmu.h
index ba8898e1a854..7c31f38be1ee 100644
--- a/arch/x86/kvm/pmu.h
+++ b/arch/x86/kvm/pmu.h
@@ -102,8 +102,10 @@ static inline struct kvm_pmc *get_fixed_pmc(struct kvm_pmu *pmu, u32 msr)
 	return NULL;
 }
 
-void reprogram_gp_counter(struct kvm_pmc *pmc, u64 eventsel);
-void reprogram_fixed_counter(struct kvm_pmc *pmc, u8 ctrl, int fixed_idx);
+void reprogram_gp_counter(struct kvm_pmu *pmu, struct kvm_pmc *pmc,
+			  u64 eventsel);
+void reprogram_fixed_counter(struct kvm_pmu *pmu, struct kvm_pmc *pmc, u8 ctrl,
+			     int fixed_idx);
 void reprogram_counter(struct kvm_pmu *pmu, int pmc_idx);
 
 void kvm_pmu_deliver_pmi(struct kvm_vcpu *vcpu);
diff --git a/arch/x86/kvm/pmu_amd.c b/arch/x86/kvm/pmu_amd.c
index 1495a735b38e..1de7fc73a634 100644
--- a/arch/x86/kvm/pmu_amd.c
+++ b/arch/x86/kvm/pmu_amd.c
@@ -250,7 +250,7 @@ static int amd_pmu_set_msr(struct kvm_vcpu *vcpu, struct msr_data *msr_info)
 		if (data == pmc->eventsel)
 			return 0;
 		if (!(data & pmu->reserved_bits)) {
-			reprogram_gp_counter(pmc, data);
+			reprogram_gp_counter(pmu, pmc, data);
 			return 0;
 		}
 	}
diff --git a/arch/x86/kvm/vmx/pmu_intel.c b/arch/x86/kvm/vmx/pmu_intel.c
index 5ab4a364348e..a7d330be2c8f 100644
--- a/arch/x86/kvm/vmx/pmu_intel.c
+++ b/arch/x86/kvm/vmx/pmu_intel.c
@@ -49,7 +49,7 @@ static void reprogram_fixed_counters(struct kvm_pmu *pmu, u64 data)
 		if (old_ctrl == new_ctrl)
 			continue;
 
-		reprogram_fixed_counter(pmc, new_ctrl, i);
+		reprogram_fixed_counter(pmu, pmc, new_ctrl, i);
 	}
 
 	pmu->fixed_ctr_ctrl = data;
@@ -148,6 +148,9 @@ static bool intel_is_valid_msr(struct kvm_vcpu *vcpu, u32 msr)
 	int ret;
 
 	switch (msr) {
+	case MSR_TSX_FORCE_ABORT:
+		return guest_cpuid_has(vcpu, X86_FEATURE_TSX_FORCE_ABORT);
+
 	case MSR_CORE_PERF_FIXED_CTR_CTRL:
 	case MSR_CORE_PERF_GLOBAL_STATUS:
 	case MSR_CORE_PERF_GLOBAL_CTRL:
@@ -182,6 +185,11 @@ static int intel_pmu_get_msr(struct kvm_vcpu *vcpu, u32 msr, u64 *data)
 	case MSR_CORE_PERF_GLOBAL_OVF_CTRL:
 		*data = pmu->global_ovf_ctrl;
 		return 0;
+	case MSR_TSX_FORCE_ABORT:
+		*data = pmu->force_tsx_abort;
+		if (*data == 2)
+			*data = 0;
+		return 0;
 	default:
 		if ((pmc = get_gp_pmc(pmu, msr, MSR_IA32_PERFCTR0)) ||
 		    (pmc = get_fixed_pmc(pmu, msr))) {
@@ -234,6 +242,14 @@ static int intel_pmu_set_msr(struct kvm_vcpu *vcpu, struct msr_data *msr_info)
 			return 0;
 		}
 		break;
+	case MSR_TSX_FORCE_ABORT:
+		if (data & ~1ULL)
+			break;
+		/* Will take effect at next enable */
+		if (!data)
+			data = BIT(1); /* Transform into ALLOW_RTM perf ABI */
+		pmu->force_tsx_abort = data;
+		return 0;
 	default:
 		if ((pmc = get_gp_pmc(pmu, msr, MSR_IA32_PERFCTR0)) ||
 		    (pmc = get_fixed_pmc(pmu, msr))) {
@@ -245,7 +261,7 @@ static int intel_pmu_set_msr(struct kvm_vcpu *vcpu, struct msr_data *msr_info)
 			if (data == pmc->eventsel)
 				return 0;
 			if (!(data & pmu->reserved_bits)) {
-				reprogram_gp_counter(pmc, data);
+				reprogram_gp_counter(pmu, pmc, data);
 				return 0;
 			}
 		}
-- 
2.17.2

