LKML Archive mirror
* (no subject)
@ 2013-10-02  0:15 Sukadev Bhattiprolu
  2013-10-02  0:15 ` [PATCH 1/9][v5] powerpc/perf: Rename Power8 macros to start with PME Sukadev Bhattiprolu
                   ` (8 more replies)
  0 siblings, 9 replies; 26+ messages in thread
From: Sukadev Bhattiprolu @ 2013-10-02  0:15 UTC
  To: Arnaldo Carvalho de Melo
  Cc: linux-kernel, linuxppc-dev, Stephane Eranian, Michael Ellerman,
	Paul Mackerras, Anshuman Khandual

Subject: [PATCH 0/9][v5] powerpc/perf: Export memory hierarchy level in Power7/8.

Power7 and Power8 processors record the memory hierarchy level (e.g. L2, L3)
from which a load or store instruction was satisfied. Export this hierarchy
information to user space via the perf_mem_data_src object.

Thanks to input from Stephane Eranian, Michael Ellerman, Michael Neuling
and Anshuman Khandual.

Sukadev Bhattiprolu (9):
  powerpc/perf: Rename Power8 macros to start with PME
  powerpc/perf: Export Power8 generic events in sysfs
  powerpc/perf: Add Power8 event PM_MRK_GRP_CMPL to sysfs.
  powerpc: Rename branch_opcode() to instr_opcode()
  powerpc: implement is_instr_load_store().
  powerpc/perf: Define big-endian version of perf_mem_data_src
  powerpc/perf: Export Power8 memory hierarchy info to user space.
  powerpc/perf: Export Power7 memory hierarchy info to user space.
  powerpc/perf: Update perf-mem man page for Power

 arch/powerpc/include/asm/code-patching.h     |    1 +
 arch/powerpc/include/asm/perf_event_server.h |    2 +
 arch/powerpc/lib/code-patching.c             |   96 ++++++++++++++++++++++-
 arch/powerpc/perf/core-book3s.c              |   11 +++
 arch/powerpc/perf/power7-pmu.c               |   94 +++++++++++++++++++++++
 arch/powerpc/perf/power8-pmu.c               |  105 +++++++++++++++++++++++---
 include/uapi/linux/perf_event.h              |   58 ++++++++++++++
 tools/perf/Documentation/perf-mem.txt        |   11 +++
 tools/perf/util/include/asm/byteorder.h      |    1 +
 9 files changed, 364 insertions(+), 15 deletions(-)

-- 
1.7.9.5


^ permalink raw reply	[flat|nested] 26+ messages in thread

* [PATCH 1/9][v5] powerpc/perf: Rename Power8 macros to start with PME
  2013-10-02  0:15 Sukadev Bhattiprolu
@ 2013-10-02  0:15 ` Sukadev Bhattiprolu
  2013-10-03  4:02   ` Michael Ellerman
  2013-10-02  0:15 ` [PATCH 2/9][v5] powerpc/perf: Export Power8 generic events in sysfs Sukadev Bhattiprolu
                   ` (7 subsequent siblings)
  8 siblings, 1 reply; 26+ messages in thread
From: Sukadev Bhattiprolu @ 2013-10-02  0:15 UTC
  To: Arnaldo Carvalho de Melo
  Cc: linux-kernel, linuxppc-dev, Stephane Eranian, Michael Ellerman,
	Paul Mackerras, Anshuman Khandual

We use helpers like GENERIC_EVENT_ATTR() to list the generic events in
sysfs. To avoid name collisions, GENERIC_EVENT_ATTR() requires the perf
event macro names to start with the prefix PME.

Signed-off-by: Sukadev Bhattiprolu <sukadev@linux.vnet.ibm.com>
Reviewed-by: Anshuman Khandual <khandual@linux.vnet.ibm.com>
---
 arch/powerpc/perf/power8-pmu.c |   24 ++++++++++++------------
 1 file changed, 12 insertions(+), 12 deletions(-)

diff --git a/arch/powerpc/perf/power8-pmu.c b/arch/powerpc/perf/power8-pmu.c
index 2ee4a70..976c203 100644
--- a/arch/powerpc/perf/power8-pmu.c
+++ b/arch/powerpc/perf/power8-pmu.c
@@ -18,12 +18,12 @@
 /*
  * Some power8 event codes.
  */
-#define PM_CYC				0x0001e
-#define PM_GCT_NOSLOT_CYC		0x100f8
-#define PM_CMPLU_STALL			0x4000a
-#define PM_INST_CMPL			0x00002
-#define PM_BRU_FIN			0x10068
-#define PM_BR_MPRED_CMPL		0x400f6
+#define PME_PM_CYC				0x0001e
+#define PME_PM_GCT_NOSLOT_CYC			0x100f8
+#define PME_PM_CMPLU_STALL			0x4000a
+#define PME_PM_INST_CMPL			0x00002
+#define PME_PM_BRU_FIN				0x10068
+#define PME_PM_BR_MPRED_CMPL			0x400f6
 
 
 /*
@@ -550,12 +550,12 @@ static const struct attribute_group *power8_pmu_attr_groups[] = {
 };
 
 static int power8_generic_events[] = {
-	[PERF_COUNT_HW_CPU_CYCLES] =			PM_CYC,
-	[PERF_COUNT_HW_STALLED_CYCLES_FRONTEND] =	PM_GCT_NOSLOT_CYC,
-	[PERF_COUNT_HW_STALLED_CYCLES_BACKEND] =	PM_CMPLU_STALL,
-	[PERF_COUNT_HW_INSTRUCTIONS] =			PM_INST_CMPL,
-	[PERF_COUNT_HW_BRANCH_INSTRUCTIONS] =		PM_BRU_FIN,
-	[PERF_COUNT_HW_BRANCH_MISSES] =			PM_BR_MPRED_CMPL,
+	[PERF_COUNT_HW_CPU_CYCLES] =			PME_PM_CYC,
+	[PERF_COUNT_HW_STALLED_CYCLES_FRONTEND] =	PME_PM_GCT_NOSLOT_CYC,
+	[PERF_COUNT_HW_STALLED_CYCLES_BACKEND] =	PME_PM_CMPLU_STALL,
+	[PERF_COUNT_HW_INSTRUCTIONS] =			PME_PM_INST_CMPL,
+	[PERF_COUNT_HW_BRANCH_INSTRUCTIONS] =		PME_PM_BRU_FIN,
+	[PERF_COUNT_HW_BRANCH_MISSES] =			PME_PM_BR_MPRED_CMPL,
 };
 
 static u64 power8_bhrb_filter_map(u64 branch_sample_type)
-- 
1.7.9.5



* [PATCH 2/9][v5] powerpc/perf: Export Power8 generic events in sysfs
  2013-10-02  0:15 Sukadev Bhattiprolu
  2013-10-02  0:15 ` [PATCH 1/9][v5] powerpc/perf: Rename Power8 macros to start with PME Sukadev Bhattiprolu
@ 2013-10-02  0:15 ` Sukadev Bhattiprolu
  2013-10-03  4:04   ` Michael Ellerman
  2013-10-02  0:15 ` [PATCH 3/9][v5] powerpc/perf: Add Power8 event PM_MRK_GRP_CMPL to sysfs Sukadev Bhattiprolu
                   ` (6 subsequent siblings)
  8 siblings, 1 reply; 26+ messages in thread
From: Sukadev Bhattiprolu @ 2013-10-02  0:15 UTC
  To: Arnaldo Carvalho de Melo
  Cc: linux-kernel, linuxppc-dev, Stephane Eranian, Michael Ellerman,
	Paul Mackerras, Anshuman Khandual

Export generic perf events for Power8 in sysfs.

Signed-off-by: Sukadev Bhattiprolu <sukadev@linux.vnet.ibm.com>
Reviewed-by: Anshuman Khandual <khandual@linux.vnet.ibm.com>
---
 arch/powerpc/perf/power8-pmu.c |   23 +++++++++++++++++++++++
 1 file changed, 23 insertions(+)

diff --git a/arch/powerpc/perf/power8-pmu.c b/arch/powerpc/perf/power8-pmu.c
index 976c203..b991b2e 100644
--- a/arch/powerpc/perf/power8-pmu.c
+++ b/arch/powerpc/perf/power8-pmu.c
@@ -510,6 +510,28 @@ static void power8_disable_pmc(unsigned int pmc, unsigned long mmcr[])
 		mmcr[1] &= ~(0xffUL << MMCR1_PMCSEL_SHIFT(pmc + 1));
 }
 
+GENERIC_EVENT_ATTR(cpu-cycles,			PM_CYC);
+GENERIC_EVENT_ATTR(stalled-cycles-frontend,	PM_GCT_NOSLOT_CYC);
+GENERIC_EVENT_ATTR(stalled-cycles-backend,	PM_CMPLU_STALL);
+GENERIC_EVENT_ATTR(instructions,		PM_INST_CMPL);
+GENERIC_EVENT_ATTR(branch-instructions,		PM_BRU_FIN);
+GENERIC_EVENT_ATTR(branch-misses,		PM_BR_MPRED_CMPL);
+
+static struct attribute *power8_events_attr[] = {
+	GENERIC_EVENT_PTR(PM_CYC),
+	GENERIC_EVENT_PTR(PM_GCT_NOSLOT_CYC),
+	GENERIC_EVENT_PTR(PM_CMPLU_STALL),
+	GENERIC_EVENT_PTR(PM_INST_CMPL),
+	GENERIC_EVENT_PTR(PM_BRU_FIN),
+	GENERIC_EVENT_PTR(PM_BR_MPRED_CMPL),
+	NULL
+};
+
+static struct attribute_group power8_pmu_events_group = {
+	.name = "events",
+	.attrs = power8_events_attr,
+};
+
 PMU_FORMAT_ATTR(event,		"config:0-49");
 PMU_FORMAT_ATTR(pmcxsel,	"config:0-7");
 PMU_FORMAT_ATTR(mark,		"config:8");
@@ -546,6 +568,7 @@ struct attribute_group power8_pmu_format_group = {
 
 static const struct attribute_group *power8_pmu_attr_groups[] = {
 	&power8_pmu_format_group,
+	&power8_pmu_events_group,
 	NULL,
 };
 
-- 
1.7.9.5



* [PATCH 3/9][v5] powerpc/perf: Add Power8 event PM_MRK_GRP_CMPL to sysfs.
  2013-10-02  0:15 Sukadev Bhattiprolu
  2013-10-02  0:15 ` [PATCH 1/9][v5] powerpc/perf: Rename Power8 macros to start with PME Sukadev Bhattiprolu
  2013-10-02  0:15 ` [PATCH 2/9][v5] powerpc/perf: Export Power8 generic events in sysfs Sukadev Bhattiprolu
@ 2013-10-02  0:15 ` Sukadev Bhattiprolu
  2013-10-02  0:15 ` [PATCH 4/9][v5] powerpc: Rename branch_opcode() to instr_opcode() Sukadev Bhattiprolu
                   ` (5 subsequent siblings)
  8 siblings, 0 replies; 26+ messages in thread
From: Sukadev Bhattiprolu @ 2013-10-02  0:15 UTC
  To: Arnaldo Carvalho de Melo
  Cc: linux-kernel, linuxppc-dev, Stephane Eranian, Michael Ellerman,
	Paul Mackerras, Anshuman Khandual

The perf event PM_MRK_GRP_CMPL is useful in analyzing the memory-hierarchy
behavior of applications.

Signed-off-by: Sukadev Bhattiprolu <sukadev@linux.vnet.ibm.com>
Reviewed-by: Anshuman Khandual <khandual@linux.vnet.ibm.com>
---
 arch/powerpc/perf/power8-pmu.c |    5 +++++
 1 file changed, 5 insertions(+)

diff --git a/arch/powerpc/perf/power8-pmu.c b/arch/powerpc/perf/power8-pmu.c
index b991b2e..fc7ba38 100644
--- a/arch/powerpc/perf/power8-pmu.c
+++ b/arch/powerpc/perf/power8-pmu.c
@@ -24,6 +24,7 @@
 #define PME_PM_INST_CMPL			0x00002
 #define PME_PM_BRU_FIN				0x10068
 #define PME_PM_BR_MPRED_CMPL			0x400f6
+#define PME_PM_MRK_GRP_CMPL			0x40130
 
 
 /*
@@ -517,6 +518,8 @@ GENERIC_EVENT_ATTR(instructions,		PM_INST_CMPL);
 GENERIC_EVENT_ATTR(branch-instructions,		PM_BRU_FIN);
 GENERIC_EVENT_ATTR(branch-misses,		PM_BR_MPRED_CMPL);
 
+POWER_EVENT_ATTR(PM_MRK_GRP_CMPL,		PM_MRK_GRP_CMPL);
+
 static struct attribute *power8_events_attr[] = {
 	GENERIC_EVENT_PTR(PM_CYC),
 	GENERIC_EVENT_PTR(PM_GCT_NOSLOT_CYC),
@@ -524,6 +527,8 @@ static struct attribute *power8_events_attr[] = {
 	GENERIC_EVENT_PTR(PM_INST_CMPL),
 	GENERIC_EVENT_PTR(PM_BRU_FIN),
 	GENERIC_EVENT_PTR(PM_BR_MPRED_CMPL),
+
+	POWER_EVENT_PTR(PM_MRK_GRP_CMPL),
 	NULL
 };
 
-- 
1.7.9.5



* [PATCH 4/9][v5] powerpc: Rename branch_opcode() to instr_opcode()
  2013-10-02  0:15 Sukadev Bhattiprolu
                   ` (2 preceding siblings ...)
  2013-10-02  0:15 ` [PATCH 3/9][v5] powerpc/perf: Add Power8 event PM_MRK_GRP_CMPL to sysfs Sukadev Bhattiprolu
@ 2013-10-02  0:15 ` Sukadev Bhattiprolu
  2013-10-02  0:15 ` [PATCH 5/9][v5] powerpc: implement is_instr_load_store() Sukadev Bhattiprolu
                   ` (4 subsequent siblings)
  8 siblings, 0 replies; 26+ messages in thread
From: Sukadev Bhattiprolu @ 2013-10-02  0:15 UTC
  To: Arnaldo Carvalho de Melo
  Cc: linux-kernel, linuxppc-dev, Stephane Eranian, Michael Ellerman,
	Paul Mackerras, Anshuman Khandual

The logic used in branch_opcode() to extract the opcode of an instruction
applies to non-branch instructions as well, so rename it to instr_opcode().

Signed-off-by: Sukadev Bhattiprolu <sukadev@linux.vnet.ibm.com>
Reviewed-by: Anshuman Khandual <khandual@linux.vnet.ibm.com>
---
 arch/powerpc/lib/code-patching.c |    6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/arch/powerpc/lib/code-patching.c b/arch/powerpc/lib/code-patching.c
index 17e5b23..2bc9db3 100644
--- a/arch/powerpc/lib/code-patching.c
+++ b/arch/powerpc/lib/code-patching.c
@@ -72,19 +72,19 @@ unsigned int create_cond_branch(const unsigned int *addr,
 	return instruction;
 }
 
-static unsigned int branch_opcode(unsigned int instr)
+static unsigned int instr_opcode(unsigned int instr)
 {
 	return (instr >> 26) & 0x3F;
 }
 
 static int instr_is_branch_iform(unsigned int instr)
 {
-	return branch_opcode(instr) == 18;
+	return instr_opcode(instr) == 18;
 }
 
 static int instr_is_branch_bform(unsigned int instr)
 {
-	return branch_opcode(instr) == 16;
+	return instr_opcode(instr) == 16;
 }
 
 int instr_is_relative_branch(unsigned int instr)
-- 
1.7.9.5



* [PATCH 5/9][v5] powerpc: implement is_instr_load_store().
  2013-10-02  0:15 Sukadev Bhattiprolu
                   ` (3 preceding siblings ...)
  2013-10-02  0:15 ` [PATCH 4/9][v5] powerpc: Rename branch_opcode() to instr_opcode() Sukadev Bhattiprolu
@ 2013-10-02  0:15 ` Sukadev Bhattiprolu
  2013-10-03  5:35   ` Michael Ellerman
  2013-10-02  0:15 ` [PATCH 6/9][v5] powerpc/perf: Define big-endian version of perf_mem_data_src Sukadev Bhattiprolu
                   ` (3 subsequent siblings)
  8 siblings, 1 reply; 26+ messages in thread
From: Sukadev Bhattiprolu @ 2013-10-02  0:15 UTC
  To: Arnaldo Carvalho de Melo
  Cc: linux-kernel, linuxppc-dev, Stephane Eranian, Michael Ellerman,
	Paul Mackerras, Anshuman Khandual

Implement is_instr_load_store() to detect whether a given instruction
is one of the fixed-point or floating-point load/store instructions. This
function will be used in a follow-on patch to record memory hierarchy
information for loads and stores.

Signed-off-by: Sukadev Bhattiprolu <sukadev@linux.vnet.ibm.com>
Reviewed-by: Anshuman Khandual <khandual@linux.vnet.ibm.com>
---
 arch/powerpc/include/asm/code-patching.h |    1 +
 arch/powerpc/lib/code-patching.c         |   90 ++++++++++++++++++++++++++++++
 2 files changed, 91 insertions(+)

diff --git a/arch/powerpc/include/asm/code-patching.h b/arch/powerpc/include/asm/code-patching.h
index a6f8c7a..3e47fe0 100644
--- a/arch/powerpc/include/asm/code-patching.h
+++ b/arch/powerpc/include/asm/code-patching.h
@@ -34,6 +34,7 @@ int instr_is_branch_to_addr(const unsigned int *instr, unsigned long addr);
 unsigned long branch_target(const unsigned int *instr);
 unsigned int translate_branch(const unsigned int *dest,
 			      const unsigned int *src);
+int instr_is_load_store(const unsigned int *instr);
 
 static inline unsigned long ppc_function_entry(void *func)
 {
diff --git a/arch/powerpc/lib/code-patching.c b/arch/powerpc/lib/code-patching.c
index 2bc9db3..7e5dc6f 100644
--- a/arch/powerpc/lib/code-patching.c
+++ b/arch/powerpc/lib/code-patching.c
@@ -159,6 +159,96 @@ unsigned int translate_branch(const unsigned int *dest, const unsigned int *src)
 	return 0;
 }
 
+static unsigned int load_store_xval(const unsigned int instr)
+{
+	return (instr >> 1) & 0x3FF;	/* bits 21..30 */
+}
+
+/*
+ * Values of bits 21:30 of Fixed-point and Floating-point Load and Store
+ * instructions.
+ *
+ * Reference:	PowerISA_V2.06B_Public.pdf, Sections 3.3.2 through 3.3.6 and
+ *		4.6.2 through 4.6.4.
+ */
+#define	x_lbzx		87
+#define	x_lbzux		119
+#define	x_lhzx		279
+#define	x_lhzux		311
+#define	x_lhax		343
+#define	x_lhaux		375
+#define	x_lwzx		23
+#define	x_lwzux		55
+#define	x_lwax		341
+#define	x_lwaux		373
+#define	x_ldx		21
+#define	x_ldux		53
+#define	x_stbx		215
+#define	x_stbux		247
+#define	x_sthx		407
+#define	x_sthux		439
+#define	x_stwx		151
+#define	x_stwux		183
+#define	x_stdx		149
+#define	x_stdux		181
+#define	x_lhbrx		790
+#define	x_lwbrx		534
+#define	x_sthbrx	918
+#define	x_stwbrx	662
+#define	x_ldbrx		532
+#define	x_stdbrx	660
+#define	x_lswi		597
+#define	x_lswx		533
+#define	x_stswi		725
+#define	x_stswx		661
+#define	x_lfsx		535
+#define	x_lfsux		567
+#define	x_lfdx		599
+#define	x_lfdux		631
+#define	x_lfiwax	855
+#define	x_lfiwzx	887
+#define	x_stfsx		663
+#define	x_stfsux	695
+#define	x_stfdx		727
+#define	x_stfdux	759
+#define	x_stfiwax	983
+#define	x_lfdpx		791
+#define	x_stfdpx	919
+
+static unsigned int x_form_load_store[] = {
+	x_lbzx,     x_lbzux,    x_lhzx,     x_lhzux,    x_lhax,
+	x_lhaux,    x_lwzx,     x_lwzux,    x_lwax,     x_lwaux,
+	x_ldx,      x_ldux,     x_stbx,     x_stbux,    x_sthx,
+	x_sthux,    x_stwx,     x_stwux,    x_stdx,     x_stdux,
+	x_lhbrx,    x_lwbrx,    x_sthbrx,   x_stwbrx,   x_ldbrx,
+	x_stdbrx,   x_lswi,     x_lswx,     x_stswi,    x_stswx,
+	x_lfsx,     x_lfsux,    x_lfdx,     x_lfdux,    x_lfiwax,
+	x_lfiwzx,   x_stfsx,    x_stfsux,   x_stfdx,    x_stfdux,
+	x_stfiwax,  x_lfdpx,    x_stfdpx
+};
+
+int instr_is_load_store(const unsigned int *instr)
+{
+	unsigned int op;
+	int i, n;
+
+	op = instr_opcode(*instr);
+
+	if ((op >= 32 && op <= 58) || (op == 61 || op == 62))
+		return 1;
+
+	if (op == 31) {
+		n = sizeof(x_form_load_store) / sizeof(int);
+
+		for (i = 0; i < n; i++) {
+			if (x_form_load_store[i] == load_store_xval(*instr))
+				return 1;
+		}
+	}
+
+	return 0;
+}
+
 
 #ifdef CONFIG_CODE_PATCHING_SELFTEST
 
-- 
1.7.9.5



* [PATCH 6/9][v5] powerpc/perf: Define big-endian version of perf_mem_data_src
  2013-10-02  0:15 Sukadev Bhattiprolu
                   ` (4 preceding siblings ...)
  2013-10-02  0:15 ` [PATCH 5/9][v5] powerpc: implement is_instr_load_store() Sukadev Bhattiprolu
@ 2013-10-02  0:15 ` Sukadev Bhattiprolu
  2013-10-03  5:39   ` Michael Ellerman
  2013-10-02  0:15 ` [PATCH 7/9][v5] powerpc/perf: Export Power8 memory hierarchy info to user space Sukadev Bhattiprolu
                   ` (2 subsequent siblings)
  8 siblings, 1 reply; 26+ messages in thread
From: Sukadev Bhattiprolu @ 2013-10-02  0:15 UTC
  To: Arnaldo Carvalho de Melo
  Cc: linux-kernel, linuxppc-dev, Stephane Eranian, Michael Ellerman,
	Paul Mackerras, Anshuman Khandual

perf_mem_data_src is a union that is initialized via the ->val field and
accessed via its bitfields. For this to work on big-endian platforms, we
also need a big-endian representation of perf_mem_data_src.

Cc: Stephane Eranian <eranian@google.com>
Cc: Michael Ellerman <michael@ellerman.id.au>
Signed-off-by: Sukadev Bhattiprolu <sukadev@linux.vnet.ibm.com>
Reviewed-by: Anshuman Khandual <khandual@linux.vnet.ibm.com>
---

Changelog [v5]:
	- include <endian.h> in local byteorder.h

Changelog [v4]:
	- perf_event.h includes <byteorder.h> which pulls in the local
	  byteorder.h when building the perf tool. This local byteorder.h
	  leaves __LITTLE_ENDIAN and __BIG_ENDIAN undefined.
	  Include <endian.h> explicitly in the local byteorder.h.

Changelog [v2]:
	- [Vince Weaver, Michael Ellerman] No __KERNEL__ in uapi headers.

 include/uapi/linux/perf_event.h         |   58 +++++++++++++++++++++++++++++++
 tools/perf/util/include/asm/byteorder.h |    1 +
 2 files changed, 59 insertions(+)

diff --git a/include/uapi/linux/perf_event.h b/include/uapi/linux/perf_event.h
index ca1d90b..846f399 100644
--- a/include/uapi/linux/perf_event.h
+++ b/include/uapi/linux/perf_event.h
@@ -19,6 +19,50 @@
 #include <asm/byteorder.h>
 
 /*
+ * Kernel and userspace check for endianness in incompatible ways.
+ * In user space, <endian.h> defines both __BIG_ENDIAN and __LITTLE_ENDIAN
+ * but sets __BYTE_ORDER to one or the other. So user-space checks look like:
+ *
+ *	#if __BYTE_ORDER == __LITTLE_ENDIAN
+ *
+ * In the kernel, __BYTE_ORDER is undefined, so using the above check doesn't
+ * work. Further, kernel code assumes that exactly one of __BIG_ENDIAN and
+ * __LITTLE_ENDIAN is defined.  So the kernel checks are like:
+ *
+ *	#if defined(__LITTLE_ENDIAN)
+ *
+ * But we can't use that check in user space since __LITTLE_ENDIAN (and
+ * __BIG_ENDIAN) are always defined.
+ *
+ * Since some perf data structures depend on endianness _and_ are shared
+ * between kernel and user, perf needs its own notion of endian macros (at
+ * least until user and kernel endian checks converge).
+ */
+#define __PERF_LE	1234
+#define __PERF_BE	4321
+
+#if defined(__BYTE_ORDER)
+
+#if __BYTE_ORDER == __LITTLE_ENDIAN
+#define __PERF_BYTE_ORDER	__PERF_LE
+#elif __BYTE_ORDER == __BIG_ENDIAN
+#define __PERF_BYTE_ORDER	__PERF_BE
+#endif
+
+#else /* __BYTE_ORDER */
+
+#if defined(__LITTLE_ENDIAN) && defined(__BIG_ENDIAN)
+#error "Cannot determine endianness"
+#elif defined(__LITTLE_ENDIAN)
+#define __PERF_BYTE_ORDER	__PERF_LE
+#elif defined(__BIG_ENDIAN)
+#define __PERF_BYTE_ORDER	__PERF_BE
+#endif
+
+
+#endif /* __BYTE_ORDER */
+
+/*
  * User-space ABI bits:
  */
 
@@ -695,6 +739,7 @@ enum perf_callchain_context {
 #define PERF_FLAG_FD_OUTPUT		(1U << 1)
 #define PERF_FLAG_PID_CGROUP		(1U << 2) /* pid=cgroup id, per-cpu mode only */
 
+#if __PERF_BYTE_ORDER == __PERF_LE
 union perf_mem_data_src {
 	__u64 val;
 	struct {
@@ -706,6 +751,19 @@ union perf_mem_data_src {
 			mem_rsvd:31;
 	};
 };
+#elif __PERF_BYTE_ORDER == __PERF_BE
+union perf_mem_data_src {
+	__u64 val;
+	struct {
+		__u64	mem_rsvd:31,
+			mem_dtlb:7,	/* tlb access */
+			mem_lock:2,	/* lock instr */
+			mem_snoop:5,	/* snoop mode */
+			mem_lvl:14,	/* memory hierarchy level */
+			mem_op:5;	/* type of opcode */
+	};
+};
+#endif
 
 /* type of opcode (load/store/prefetch,code) */
 #define PERF_MEM_OP_NA		0x01 /* not available */
diff --git a/tools/perf/util/include/asm/byteorder.h b/tools/perf/util/include/asm/byteorder.h
index 2a9bdc0..7112913 100644
--- a/tools/perf/util/include/asm/byteorder.h
+++ b/tools/perf/util/include/asm/byteorder.h
@@ -1,2 +1,3 @@
 #include <asm/types.h>
 #include "../../../../include/uapi/linux/swab.h"
+#include <endian.h>
-- 
1.7.9.5



* [PATCH 7/9][v5] powerpc/perf: Export Power8 memory hierarchy info to user space.
  2013-10-02  0:15 Sukadev Bhattiprolu
                   ` (5 preceding siblings ...)
  2013-10-02  0:15 ` [PATCH 6/9][v5] powerpc/perf: Define big-endian version of perf_mem_data_src Sukadev Bhattiprolu
@ 2013-10-02  0:15 ` Sukadev Bhattiprolu
  2013-10-02  0:15 ` [PATCH 8/9][v5] powerpc/perf: Export Power7 " Sukadev Bhattiprolu
  2013-10-02  0:15 ` [PATCH 9/9][v5] powerpc/perf: Update perf-mem man page for Power Sukadev Bhattiprolu
  8 siblings, 0 replies; 26+ messages in thread
From: Sukadev Bhattiprolu @ 2013-10-02  0:15 UTC
  To: Arnaldo Carvalho de Melo
  Cc: linux-kernel, linuxppc-dev, Stephane Eranian, Michael Ellerman,
	Paul Mackerras, Anshuman Khandual

On Power8, the LDST field in the SIER register identifies the memory
hierarchy level (e.g. L1, L2) from which a data-cache miss for a marked
instruction was satisfied.

Use the 'perf_mem_data_src' object to export this hierarchy level to user
space. Fortunately, the memory hierarchy levels in Power8 map fairly easily
into the arch-neutral levels as described by the ldst_src_map[] table.

Usage:

	perf record -d -e 'cpu/PM_MRK_GRP_CMPL/' <application>
	perf report -n --mem-mode --sort=mem,sym,dso,symbol_daddr,dso_daddr"

		For samples involving load/store instructions, the memory
		hierarchy level is shown as "L1 hit", "Remote RAM hit" etc.
	# or

	perf record --data <application>
	perf report -D

		Sample records contain a 'data_src' field which encodes the
		memory hierarchy level: Eg: data_src 0x442 indicates
		MEM_OP_LOAD, MEM_LVL_HIT, MEM_LVL_L2 (i.e load hit L2).

Note that the PMU event PM_MRK_GRP_CMPL tracks all marked group completion
events. While some of these are loads and stores, other instructions such as
'add' may also be sampled. Sampling on PM_MRK_GRP_CMPL and throwing away the
non-load/non-store samples could therefore yield an inconsistent profile of
the application.

As the precise semantics of 'perf mem -t load' or 'perf mem -t store' (which
require sampling only loads or only stores) cannot be implemented on Power,
we don't implement 'perf mem' on Power for now.

Thanks to input from Stephane Eranian, Michael Ellerman and Michael Neuling.

Cc: Stephane Eranian <eranian@google.com>
Cc: Michael Ellerman <michael@ellerman.id.au>
Signed-off-by: Sukadev Bhattiprolu <sukadev@linux.vnet.ibm.com>
Reviewed-by: Anshuman Khandual <khandual@linux.vnet.ibm.com>
---
Changelog[v2]:
	Drop support for 'perf mem' for Power (use perf-record and perf-report
	directly)

 arch/powerpc/include/asm/perf_event_server.h |    2 +
 arch/powerpc/perf/core-book3s.c              |   11 ++++++
 arch/powerpc/perf/power8-pmu.c               |   53 ++++++++++++++++++++++++++
 3 files changed, 66 insertions(+)

diff --git a/arch/powerpc/include/asm/perf_event_server.h b/arch/powerpc/include/asm/perf_event_server.h
index 3fd2f1b..27d2c83 100644
--- a/arch/powerpc/include/asm/perf_event_server.h
+++ b/arch/powerpc/include/asm/perf_event_server.h
@@ -38,6 +38,8 @@ struct power_pmu {
 	void            (*config_bhrb)(u64 pmu_bhrb_filter);
 	void		(*disable_pmc)(unsigned int pmc, unsigned long mmcr[]);
 	int		(*limited_pmc_event)(u64 event_id);
+	void		(*get_mem_data_src)(union perf_mem_data_src *dsrc,
+				struct pt_regs *regs);
 	u32		flags;
 	const struct attribute_group	**attr_groups;
 	int		n_generic;
diff --git a/arch/powerpc/perf/core-book3s.c b/arch/powerpc/perf/core-book3s.c
index eeae308..5221ba1 100644
--- a/arch/powerpc/perf/core-book3s.c
+++ b/arch/powerpc/perf/core-book3s.c
@@ -1696,6 +1696,13 @@ ssize_t power_events_sysfs_show(struct device *dev,
 	return sprintf(page, "event=0x%02llx\n", pmu_attr->id);
 }
 
+static inline void power_get_mem_data_src(union perf_mem_data_src *dsrc,
+				struct pt_regs *regs)
+{
+	if (ppmu->get_mem_data_src)
+		ppmu->get_mem_data_src(dsrc, regs);
+}
+
 struct pmu power_pmu = {
 	.pmu_enable	= power_pmu_enable,
 	.pmu_disable	= power_pmu_disable,
@@ -1777,6 +1784,10 @@ static void record_and_restart(struct perf_event *event, unsigned long val,
 			data.br_stack = &cpuhw->bhrb_stack;
 		}
 
+		if (event->attr.sample_type & PERF_SAMPLE_DATA_SRC &&
+						ppmu->get_mem_data_src)
+			ppmu->get_mem_data_src(&data.data_src, regs);
+
 		if (perf_event_overflow(event, &data, regs))
 			power_pmu_stop(event, 0);
 	}
diff --git a/arch/powerpc/perf/power8-pmu.c b/arch/powerpc/perf/power8-pmu.c
index fc7ba38..ff73206 100644
--- a/arch/powerpc/perf/power8-pmu.c
+++ b/arch/powerpc/perf/power8-pmu.c
@@ -537,6 +537,58 @@ static struct attribute_group power8_pmu_events_group = {
 	.attrs = power8_events_attr,
 };
 
+#define POWER8_SIER_TYPE_SHIFT	15
+#define POWER8_SIER_TYPE_MASK	(0x7LL << POWER8_SIER_TYPE_SHIFT)
+
+#define POWER8_SIER_LDST_SHIFT	1
+#define POWER8_SIER_LDST_MASK	(0x7LL << POWER8_SIER_LDST_SHIFT)
+
+#define P(a, b)			PERF_MEM_S(a, b)
+#define PLH(a, b)		(P(OP, LOAD) | P(LVL, HIT) | P(a, b))
+#define PSM(a, b)		(P(OP, STORE) | P(LVL, MISS) | P(a, b))
+
+/*
+ * Power8 interpretations:
+ * REM_CCE1: 1-hop indicates L2/L3 cache of a different core on same chip
+ * REM_CCE2: 2-hop indicates different chip or different node.
+ */
+static u64 ldst_src_map[] = {
+	/* 000 */	P(LVL, NA),
+
+	/* 001 */	PLH(LVL, L1),
+	/* 010 */	PLH(LVL, L2),
+	/* 011 */	PLH(LVL, L3),
+	/* 100 */	PLH(LVL, LOC_RAM),
+	/* 101 */	PLH(LVL, REM_CCE1),
+	/* 110 */	PLH(LVL, REM_CCE2),
+
+	/* 111 */	PSM(LVL, L1),
+};
+
+static inline bool is_load_store_inst(u64 sier)
+{
+	u64 val;
+	val = (sier & POWER8_SIER_TYPE_MASK) >> POWER8_SIER_TYPE_SHIFT;
+
+	/* 1 = load, 2 = store */
+	return val == 1 || val == 2;
+}
+
+static void power8_get_mem_data_src(union perf_mem_data_src *dsrc,
+			struct pt_regs *regs)
+{
+	u64 idx;
+	u64 sier;
+
+	sier = mfspr(SPRN_SIER);
+
+	if (is_load_store_inst(sier)) {
+		idx = (sier & POWER8_SIER_LDST_MASK) >> POWER8_SIER_LDST_SHIFT;
+
+		dsrc->val |= ldst_src_map[idx];
+	}
+}
+
 PMU_FORMAT_ATTR(event,		"config:0-49");
 PMU_FORMAT_ATTR(pmcxsel,	"config:0-7");
 PMU_FORMAT_ATTR(mark,		"config:8");
@@ -635,6 +687,7 @@ static struct power_pmu power8_pmu = {
 	.get_constraint		= power8_get_constraint,
 	.get_alternatives	= power8_get_alternatives,
 	.disable_pmc		= power8_disable_pmc,
+	.get_mem_data_src	= power8_get_mem_data_src,
 	.flags			= PPMU_HAS_SSLOT | PPMU_HAS_SIER | PPMU_BHRB | PPMU_EBB,
 	.n_generic		= ARRAY_SIZE(power8_generic_events),
 	.generic_events		= power8_generic_events,
-- 
1.7.9.5



* [PATCH 8/9][v5] powerpc/perf: Export Power7 memory hierarchy info to user space.
  2013-10-02  0:15 Sukadev Bhattiprolu
                   ` (6 preceding siblings ...)
  2013-10-02  0:15 ` [PATCH 7/9][v5] powerpc/perf: Export Power8 memory hierarchy info to user space Sukadev Bhattiprolu
@ 2013-10-02  0:15 ` Sukadev Bhattiprolu
  2013-10-02  0:15 ` [PATCH 9/9][v5] powerpc/perf: Update perf-mem man page for Power Sukadev Bhattiprolu
  8 siblings, 0 replies; 26+ messages in thread
From: Sukadev Bhattiprolu @ 2013-10-02  0:15 UTC
  To: Arnaldo Carvalho de Melo
  Cc: linux-kernel, linuxppc-dev, Stephane Eranian, Michael Ellerman,
	Paul Mackerras, Anshuman Khandual

On Power7, the DCACHE_SRC field in the MMCRA register identifies the memory
hierarchy level (e.g. L2, L3) from which a data-cache miss for a marked
instruction was satisfied.

Use the 'perf_mem_data_src' object to export this hierarchy level to user
space. Some memory hierarchy levels in Power7 don't map onto the arch-neutral
levels. However, since the newer generation of the processor (i.e. Power8)
uses fewer levels than Power7, we don't really need to define new hierarchy
levels just for Power7.

Instead, we map as many levels as possible and approximate the rest. See the
comments near dcache_src_map[] in the patch.

Usage:

	perf record -d -e 'cpu/PM_MRK_GRP_CMPL/' <application>
	perf report -n --mem-mode --sort=mem,sym,dso,symbol_daddr,dso_daddr"

		For samples involving load/store instructions, the memory
		hierarchy level is shown as "L1 hit", "Remote RAM hit" etc.
	# or

	perf record --data <application>
	perf report -D

		Sample records contain a 'data_src' field which encodes the
		memory hierarchy level: Eg: data_src 0x442 indicates
		MEM_OP_LOAD, MEM_LVL_HIT, MEM_LVL_L2 (i.e load hit L2).

Note that the PMU event PM_MRK_GRP_CMPL tracks all marked group completion
events. While some of these are loads and stores, other instructions such as
'add' may also be sampled.

As such, the precise semantics of 'perf mem -t load' or 'perf mem -t store'
(which require sampling only loads or only stores) cannot be implemented on
Power. (Sampling on PM_MRK_GRP_CMPL and throwing away the non-load/non-store
samples could yield an inconsistent profile of the application.)

Thanks to input from Stephane Eranian, Michael Ellerman and Michael Neuling.

Cc: Stephane Eranian <eranian@google.com>
Cc: Michael Ellerman <michael@ellerman.id.au>
Signed-off-by: Sukadev Bhattiprolu <sukadev@linux.vnet.ibm.com>
Reviewed-by: Anshuman Khandual <khandual@linux.vnet.ibm.com>
---
Changelog[v4]:
	Drop support for 'perf mem' for Power (use perf-record and perf-report
	directly)

Changelog[v3]:
	[Michael Ellerman] If newer levels that we defined in [v2] are not
	needed for Power8, ignore the new levels for Power7 also, and
	approximate them.
	Separate the TLB level mapping to a separate patchset.

Changelog[v2]:
        [Stephane Eranian] Define new levels rather than ORing the L2 and L3
        with REM_CCE1 and REM_CCE2.
        [Stephane Eranian] allocate a bit PERF_MEM_XLVL_NA for architectures
        that don't use the ->mem_xlvl field.
        Insert the TLB patch ahead so the new TLB bits are contigous with
        existing TLB bits.

 arch/powerpc/perf/power7-pmu.c |   94 ++++++++++++++++++++++++++++++++++++++++
 1 file changed, 94 insertions(+)

diff --git a/arch/powerpc/perf/power7-pmu.c b/arch/powerpc/perf/power7-pmu.c
index 56c67bc..ddfa548 100644
--- a/arch/powerpc/perf/power7-pmu.c
+++ b/arch/powerpc/perf/power7-pmu.c
@@ -11,8 +11,10 @@
 #include <linux/kernel.h>
 #include <linux/perf_event.h>
 #include <linux/string.h>
+#include <linux/uaccess.h>
 #include <asm/reg.h>
 #include <asm/cputable.h>
+#include <asm/code-patching.h>
 
 /*
  * Bits in event code for POWER7
@@ -317,6 +319,97 @@ static void power7_disable_pmc(unsigned int pmc, unsigned long mmcr[])
 		mmcr[1] &= ~(0xffUL << MMCR1_PMCSEL_SH(pmc));
 }
 
+#define POWER7_MMCRA_DCACHE_MISS	(0x1LL << 55)
+#define POWER7_MMCRA_DCACHE_SRC_SHIFT	51
+#define POWER7_MMCRA_DCACHE_SRC_MASK	(0xFLL << POWER7_MMCRA_DCACHE_SRC_SHIFT)
+
+#define P(a, b)		PERF_MEM_S(a, b)
+#define PLH(a, b)	(P(OP, LOAD) | P(LVL, HIT) | P(a, b))
+/*
+ * Map the Power7 DCACHE_SRC field (bits 9..12) in MMCRA register to the
+ * architecture-neutral memory hierarchy levels. For the levels in Power7
+ * that don't map to the arch-neutral levels, approximate to nearest
+ * level.
+ *
+ *	1-hop:	indicates another core on the same chip (2.1 and 3.1 levels).
+ *	2-hops:	indicates a different chip on same or different node (remote
+ *		and distant levels).
+ *
+ * For consistency with this interpretation of the hops, we don't use
+ * the REM_RAM1 level below.
+ *
+ * The *SHR and *MOD states of the cache are ignored/not exported to user.
+ *
+ * ### Levels marked with ### in comments below are approximated
+ */
+static u64 dcache_src_map[] = {
+	PLH(LVL, L2),			/* 00: FROM_L2 */
+	PLH(LVL, L3),			/* 01: FROM_L3 */
+
+	P(LVL, NA),			/* 02: Reserved */
+	P(LVL, NA),			/* 03: Reserved */
+
+	PLH(LVL, REM_CCE1),		/* 04: FROM_L2.1_SHR ### */
+	PLH(LVL, REM_CCE1),		/* 05: FROM_L2.1_MOD ### */
+
+	PLH(LVL, REM_CCE1),		/* 06: FROM_L3.1_SHR ### */
+	PLH(LVL, REM_CCE1),		/* 07: FROM_L3.1_MOD ### */
+
+	PLH(LVL, REM_CCE2),		/* 08: FROM_RL2L3_SHR ### */
+	PLH(LVL, REM_CCE2),		/* 09: FROM_RL2L3_MOD ### */
+
+	PLH(LVL, REM_CCE2),		/* 10: FROM_DL2L3_SHR ### */
+	PLH(LVL, REM_CCE2),		/* 11: FROM_DL2L3_MOD ### */
+
+	PLH(LVL, LOC_RAM),		/* 12: FROM_LMEM */
+	PLH(LVL, REM_RAM2),		/* 13: FROM_RMEM ### */
+	PLH(LVL, REM_RAM2),		/* 14: FROM_DMEM */
+
+	P(LVL, NA),			/* 15: Reserved */
+};
+
+/*
+ * Determine the memory-hierarchy information (if applicable) for the
+ * instruction/address we are sampling. If we encountered a DCACHE_MISS,
+ * mmcra[DCACHE_SRC_MASK] specifies the memory level from which the operand
+ * was loaded.
+ *
+ * Otherwise, it is an L1-hit, provided the instruction was a load/store.
+ */
+static void power7_get_mem_data_src(union perf_mem_data_src *dsrc,
+			struct pt_regs *regs)
+{
+	u64 idx;
+	u64 mmcra = regs->dsisr;
+	u64 addr;
+	int ret;
+	unsigned int instr;
+
+	if (mmcra & POWER7_MMCRA_DCACHE_MISS) {
+		idx = mmcra & POWER7_MMCRA_DCACHE_SRC_MASK;
+		idx >>= POWER7_MMCRA_DCACHE_SRC_SHIFT;
+
+		dsrc->val |= dcache_src_map[idx];
+		return;
+	}
+
+	instr = 0;
+	addr = perf_instruction_pointer(regs);
+
+	if (is_kernel_addr(addr))
+		instr = *(unsigned int *)addr;
+	else {
+		pagefault_disable();
+		ret = __get_user_inatomic(instr, (unsigned int __user *)addr);
+		pagefault_enable();
+		if (ret)
+			instr = 0;
+	}
+	if (instr && instr_is_load_store(&instr))
+		dsrc->val |= PLH(LVL, L1);
+}
+
+
 static int power7_generic_events[] = {
 	[PERF_COUNT_HW_CPU_CYCLES] =			PME_PM_CYC,
 	[PERF_COUNT_HW_STALLED_CYCLES_FRONTEND] =	PME_PM_GCT_NOSLOT_CYC,
@@ -437,6 +530,7 @@ static struct power_pmu power7_pmu = {
 	.get_constraint		= power7_get_constraint,
 	.get_alternatives	= power7_get_alternatives,
 	.disable_pmc		= power7_disable_pmc,
+	.get_mem_data_src	= power7_get_mem_data_src,
 	.flags			= PPMU_ALT_SIPR,
 	.attr_groups		= power7_pmu_attr_groups,
 	.n_generic		= ARRAY_SIZE(power7_generic_events),
-- 
1.7.9.5


^ permalink raw reply related	[flat|nested] 26+ messages in thread

* [PATCH 9/9][v5] powerpc/perf: Update perf-mem man page for Power
  2013-10-02  0:15 Sukadev Bhattiprolu
                   ` (7 preceding siblings ...)
  2013-10-02  0:15 ` [PATCH 8/9][v5] powerpc/perf: Export Power7 " Sukadev Bhattiprolu
@ 2013-10-02  0:15 ` Sukadev Bhattiprolu
  8 siblings, 0 replies; 26+ messages in thread
From: Sukadev Bhattiprolu @ 2013-10-02  0:15 UTC (permalink / raw
  To: Arnaldo Carvalho de Melo
  Cc: linux-kernel, linuxppc-dev, Stephane Eranian, Michael Ellerman,
	Paul Mackerras, Anshuman Khandual

Add a few lines to the perf-mem man page to indicate:

	- its dependence on the mem-loads and mem-stores events

	- how to use the feature on Power architecture.

Signed-off-by: Sukadev Bhattiprolu <sukadev@linux.vnet.ibm.com>
---
 tools/perf/Documentation/perf-mem.txt |   11 +++++++++++
 1 file changed, 11 insertions(+)

diff --git a/tools/perf/Documentation/perf-mem.txt b/tools/perf/Documentation/perf-mem.txt
index 888d511..f4881a0 100644
--- a/tools/perf/Documentation/perf-mem.txt
+++ b/tools/perf/Documentation/perf-mem.txt
@@ -18,6 +18,17 @@ from it, into perf.data. Perf record options are accepted and are passed through
 "perf mem -t <TYPE> report" displays the result. It invokes perf report with the
 right set of options to display a memory access profile.
 
+This command works on architectures that implement *mem-loads* and *mem-stores*
+perf events.
+
+The PowerPC architecture does not implement *mem-loads* and *mem-stores*
+events.  To get the memory hierarchy information for samples involving
+memory loads and stores, use a marked event like PM_MRK_GRP_CMPL.
+
+	perf record -d -e 'cpu/PM_MRK_GRP_CMPL/' <application>
+
+	perf report -n --mem-mode
+
 OPTIONS
 -------
 <command>...::
-- 
1.7.9.5


^ permalink raw reply related	[flat|nested] 26+ messages in thread

* Re: [PATCH 1/9][v5] powerpc/perf: Rename Power8 macros to start with PME
  2013-10-02  0:15 ` [PATCH 1/9][v5] powerpc/perf: Rename Power8 macros to start with PME Sukadev Bhattiprolu
@ 2013-10-03  4:02   ` Michael Ellerman
  0 siblings, 0 replies; 26+ messages in thread
From: Michael Ellerman @ 2013-10-03  4:02 UTC (permalink / raw
  To: Sukadev Bhattiprolu
  Cc: Arnaldo Carvalho de Melo, Michael Ellerman, linux-kernel,
	Stephane Eranian, linuxppc-dev, Paul Mackerras, Anshuman Khandual

On Tue, Oct 01, 2013 at 05:15:02PM -0700, Sukadev Bhattiprolu wrote:
> We use helpers like GENERIC_EVENT_ATTR() to list the generic events in
> sysfs. To avoid name collisions, GENERIC_EVENT_ATTR() requires the perf
> event macros to start with PME.

It's a bit unfortunate, because they no longer match the documentation,
or any of the comments. Are we seeing actual name collisions with PM_,
or is it just a theoretical worry?

cheers

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH 2/9][v5] powerpc/perf: Export Power8 generic events in sysfs
  2013-10-02  0:15 ` [PATCH 2/9][v5] powerpc/perf: Export Power8 generic events in sysfs Sukadev Bhattiprolu
@ 2013-10-03  4:04   ` Michael Ellerman
  2013-10-03 17:57     ` Sukadev Bhattiprolu
  0 siblings, 1 reply; 26+ messages in thread
From: Michael Ellerman @ 2013-10-03  4:04 UTC (permalink / raw
  To: Sukadev Bhattiprolu
  Cc: Arnaldo Carvalho de Melo, Michael Ellerman, linux-kernel,
	Stephane Eranian, linuxppc-dev, Paul Mackerras, Anshuman Khandual

On Tue, Oct 01, 2013 at 05:15:03PM -0700, Sukadev Bhattiprolu wrote:
> Export generic perf events for Power8 in sysfs.
> 
> Signed-off-by: Sukadev Bhattiprolu <sukadev@linux.vnet.ibm.com>
> Reviewed-by: Anshuman Khandual <khandual@linux.vnet.ibm.com>
> ---
>  arch/powerpc/perf/power8-pmu.c |   23 +++++++++++++++++++++++
>  1 file changed, 23 insertions(+)
> 
> diff --git a/arch/powerpc/perf/power8-pmu.c b/arch/powerpc/perf/power8-pmu.c
> index 976c203..b991b2e 100644
> --- a/arch/powerpc/perf/power8-pmu.c
> +++ b/arch/powerpc/perf/power8-pmu.c
> @@ -510,6 +510,28 @@ static void power8_disable_pmc(unsigned int pmc, unsigned long mmcr[])
>  		mmcr[1] &= ~(0xffUL << MMCR1_PMCSEL_SHIFT(pmc + 1));
>  }
>  
> +GENERIC_EVENT_ATTR(cpu-cyles,			PM_CYC);
> +GENERIC_EVENT_ATTR(stalled-cycles-frontend,	PM_GCT_NOSLOT_CYC);
> +GENERIC_EVENT_ATTR(stalled-cycles-backend,	PM_CMPLU_STALL);
> +GENERIC_EVENT_ATTR(instructions,		PM_INST_CMPL);
> +GENERIC_EVENT_ATTR(branch-instructions,		PM_BRU_FIN);
> +GENERIC_EVENT_ATTR(branch-misses,		PM_BR_MPRED_CMPL);

And here you use PM_ not PME_ - I'm confused.

cheers

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH 5/9][v5] powerpc: implement is_instr_load_store().
  2013-10-02  0:15 ` [PATCH 5/9][v5] powerpc: implement is_instr_load_store() Sukadev Bhattiprolu
@ 2013-10-03  5:35   ` Michael Ellerman
  2013-10-03 19:03     ` Sukadev Bhattiprolu
  2013-10-08 19:31     ` Sukadev Bhattiprolu
  0 siblings, 2 replies; 26+ messages in thread
From: Michael Ellerman @ 2013-10-03  5:35 UTC (permalink / raw
  To: Sukadev Bhattiprolu
  Cc: Arnaldo Carvalho de Melo, Michael Ellerman, linux-kernel,
	Stephane Eranian, linuxppc-dev, Paul Mackerras, Anshuman Khandual

On Tue, Oct 01, 2013 at 05:15:06PM -0700, Sukadev Bhattiprolu wrote:
> Implement is_instr_load_store() to detect whether a given instruction
> is one of the fixed-point or floating-point load/store instructions.
> This function will be used in a follow-on patch to save memory hierarchy
> information of the load/store.

The search over the array is a bit of a pity, especially as the worst
case penalises you when you haven't hit a load/store.

I think we can do better. If you look at the opcode maps, and in
particular the extended table for opcode 31, you'll see there's a
reasonable amount of structure.

The following is only valid for arch 2.06, ie. it will classify reserved
opcodes as being load/store, but I think that's fine for the moment. If
we need to use it somewhere in future we can update it. But we should
add a big comment saying it's only valid in that case.

Anyway, I think the following logic is all we need for opcode 31:

bool is_load_store(int ext_opcode)
{
        int upper, lower;

        upper = ext_opcode >> 5;
        lower = ext_opcode & 0x1f;

        /* Short circuit as many misses as we can */
        if (lower < 3 || lower > 23)
            return false;

        if (lower == 3) {
            if (upper >= 16)
                return true;

            return false;
        }

        if (lower == 6) {
            if (upper <= 1)
                return true;
            return false;
        }

        if (lower == 7 || lower == 12)
            return true;

        if (lower >= 20) /* && lower <= 23 (implicit) */
            return true;

        return false;
}


Which is not pretty, but I think it's preferable to the full search over the
array.

cheers

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH 6/9][v5] powerpc/perf: Define big-endian version of perf_mem_data_src
  2013-10-02  0:15 ` [PATCH 6/9][v5] powerpc/perf: Define big-endian version of perf_mem_data_src Sukadev Bhattiprolu
@ 2013-10-03  5:39   ` Michael Ellerman
  2013-10-03  6:20     ` Sukadev Bhattiprolu
                       ` (2 more replies)
  0 siblings, 3 replies; 26+ messages in thread
From: Michael Ellerman @ 2013-10-03  5:39 UTC (permalink / raw
  To: Sukadev Bhattiprolu
  Cc: Arnaldo Carvalho de Melo, Michael Ellerman, linux-kernel,
	Stephane Eranian, linuxppc-dev, Paul Mackerras, Anshuman Khandual

On Tue, Oct 01, 2013 at 05:15:07PM -0700, Sukadev Bhattiprolu wrote:
> perf_mem_data_src is a union that is initialized via the ->val field
> and accessed via the bitmap fields. For this to work on big-endian
> platforms, we also need a big-endian representation of perf_mem_data_src.
> 
> diff --git a/include/uapi/linux/perf_event.h b/include/uapi/linux/perf_event.h
> index ca1d90b..846f399 100644
> --- a/include/uapi/linux/perf_event.h
> +++ b/include/uapi/linux/perf_event.h
> @@ -19,6 +19,50 @@
>  #include <asm/byteorder.h>
>  
>  /*
> + * Kernel and userspace check for endianness in incompatible ways.
> + * In user space, <endian.h> defines both __BIG_ENDIAN and __LITTLE_ENDIAN
> + * but sets __BYTE_ORDER to one or the other. So user space checks are:


Why can't you use __BIG_ENDIAN_BITFIELD ?

cheers

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH 6/9][v5] powerpc/perf: Define big-endian version of perf_mem_data_src
  2013-10-03  5:39   ` Michael Ellerman
@ 2013-10-03  6:20     ` Sukadev Bhattiprolu
  2013-10-03 15:27     ` Sukadev Bhattiprolu
  2013-10-05  3:53     ` Sukadev Bhattiprolu
  2 siblings, 0 replies; 26+ messages in thread
From: Sukadev Bhattiprolu @ 2013-10-03  6:20 UTC (permalink / raw
  To: Michael Ellerman
  Cc: Arnaldo Carvalho de Melo, Michael Ellerman, linux-kernel,
	Stephane Eranian, linuxppc-dev, Paul Mackerras, Anshuman Khandual

Michael Ellerman [michael@ellerman.id.au] wrote:
| On Tue, Oct 01, 2013 at 05:15:07PM -0700, Sukadev Bhattiprolu wrote:
| > perf_mem_data_src is a union that is initialized via the ->val field
| > and accessed via the bitmap fields. For this to work on big-endian
| > platforms, we also need a big-endian representation of perf_mem_data_src.
| > 
| > diff --git a/include/uapi/linux/perf_event.h b/include/uapi/linux/perf_event.h
| > index ca1d90b..846f399 100644
| > --- a/include/uapi/linux/perf_event.h
| > +++ b/include/uapi/linux/perf_event.h
| > @@ -19,6 +19,50 @@
| >  #include <asm/byteorder.h>
| >  
| >  /*
| > + * Kernel and userspace check for endianness in incompatible ways.
| > + * In user space, <endian.h> defines both __BIG_ENDIAN and __LITTLE_ENDIAN
| > + * but sets __BYTE_ORDER to one or the other. So user space checks are:
| 
| 
| Why can't you use __BIG_ENDIAN_BITFIELD ?

That macro is not available when building the perf tool, because there is
a util/include/asm/byteorder.h which gets included instead of the usual
<asm/byteorder.h>.

| 
| cheers
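The incompatibility being discussed can be made concrete with a small
userspace-only sketch (the helper names here are illustrative, not from the
patch): the compile-time answer from the <endian.h> convention should always
agree with a runtime byte probe.

```c
#include <assert.h>
#include <endian.h>

/*
 * Userspace convention: <endian.h> defines BOTH __BIG_ENDIAN and
 * __LITTLE_ENDIAN as values, and sets __BYTE_ORDER to one of them,
 * so code must compare __BYTE_ORDER rather than test definedness
 * (which is what the kernel's __*_ENDIAN_BITFIELD macros allow).
 */
static int big_endian_compile_time(void)
{
#if __BYTE_ORDER == __BIG_ENDIAN
	return 1;
#else
	return 0;
#endif
}

/*
 * Runtime probe: store 1 in an integer and look at its first byte.
 * On a big-endian machine the first byte is the most significant
 * byte, so it reads back as 0.
 */
static int big_endian_run_time(void)
{
	union {
		unsigned int i;
		unsigned char c[sizeof(unsigned int)];
	} u = { .i = 1 };

	return u.c[0] == 0;
}
```

Both helpers must return the same value on any machine; that agreement is
exactly what the kernel-vs-userspace macro mismatch makes awkward to express
portably in a shared uapi header.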


^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH 6/9][v5] powerpc/perf: Define big-endian version of perf_mem_data_src
  2013-10-03  5:39   ` Michael Ellerman
  2013-10-03  6:20     ` Sukadev Bhattiprolu
@ 2013-10-03 15:27     ` Sukadev Bhattiprolu
  2013-10-05  3:53     ` Sukadev Bhattiprolu
  2 siblings, 0 replies; 26+ messages in thread
From: Sukadev Bhattiprolu @ 2013-10-03 15:27 UTC (permalink / raw
  To: Michael Ellerman
  Cc: Arnaldo Carvalho de Melo, Michael Ellerman, linux-kernel,
	Stephane Eranian, linuxppc-dev, Paul Mackerras, Anshuman Khandual

Michael Ellerman [michael@ellerman.id.au] wrote:
| On Tue, Oct 01, 2013 at 05:15:07PM -0700, Sukadev Bhattiprolu wrote:
| > perf_mem_data_src is a union that is initialized via the ->val field
| > and accessed via the bitmap fields. For this to work on big-endian
| > platforms, we also need a big-endian representation of perf_mem_data_src.
| > 
| > diff --git a/include/uapi/linux/perf_event.h b/include/uapi/linux/perf_event.h
| > index ca1d90b..846f399 100644
| > --- a/include/uapi/linux/perf_event.h
| > +++ b/include/uapi/linux/perf_event.h
| > @@ -19,6 +19,50 @@
| >  #include <asm/byteorder.h>
| >  
| >  /*
| > + * Kernel and userspace check for endianness in incompatible ways.
| > + * In user space, <endian.h> defines both __BIG_ENDIAN and __LITTLE_ENDIAN
| > + * but sets __BYTE_ORDER to one or the other. So user space checks are:
| 
| 
| Why can't you use __BIG_ENDIAN_BITFIELD ?

BTW, any clues on why there are so many different ways of checking endianness?

Any standards-related reason, or is it just evolution?

Sukadev


^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH 2/9][v5] powerpc/perf: Export Power8 generic events in sysfs
  2013-10-03  4:04   ` Michael Ellerman
@ 2013-10-03 17:57     ` Sukadev Bhattiprolu
  2013-10-08  3:58       ` Michael Ellerman
  0 siblings, 1 reply; 26+ messages in thread
From: Sukadev Bhattiprolu @ 2013-10-03 17:57 UTC (permalink / raw
  To: Michael Ellerman
  Cc: Arnaldo Carvalho de Melo, Michael Ellerman, linux-kernel,
	Stephane Eranian, linuxppc-dev, Paul Mackerras, Anshuman Khandual

Michael Ellerman [michael@ellerman.id.au] wrote:
| On Tue, Oct 01, 2013 at 05:15:03PM -0700, Sukadev Bhattiprolu wrote:
| > Export generic perf events for Power8 in sysfs.
| > 
| > Signed-off-by: Sukadev Bhattiprolu <sukadev@linux.vnet.ibm.com>
| > Reviewed-by: Anshuman Khandual <khandual@linux.vnet.ibm.com>
| > ---
| >  arch/powerpc/perf/power8-pmu.c |   23 +++++++++++++++++++++++
| >  1 file changed, 23 insertions(+)
| > 
| > diff --git a/arch/powerpc/perf/power8-pmu.c b/arch/powerpc/perf/power8-pmu.c
| > index 976c203..b991b2e 100644
| > --- a/arch/powerpc/perf/power8-pmu.c
| > +++ b/arch/powerpc/perf/power8-pmu.c
| > @@ -510,6 +510,28 @@ static void power8_disable_pmc(unsigned int pmc, unsigned long mmcr[])
| >  		mmcr[1] &= ~(0xffUL << MMCR1_PMCSEL_SHIFT(pmc + 1));
| >  }
| >  
| > +GENERIC_EVENT_ATTR(cpu-cyles,			PM_CYC);
| > +GENERIC_EVENT_ATTR(stalled-cycles-frontend,	PM_GCT_NOSLOT_CYC);
| > +GENERIC_EVENT_ATTR(stalled-cycles-backend,	PM_CMPLU_STALL);
| > +GENERIC_EVENT_ATTR(instructions,		PM_INST_CMPL);
| > +GENERIC_EVENT_ATTR(branch-instructions,		PM_BRU_FIN);
| > +GENERIC_EVENT_ATTR(branch-misses,		PM_BR_MPRED_CMPL);
| 
| And here you use PM_ not PME_ - I'm confused.

It is a bit confusing. The GENERIC_EVENT_ATTR() adds the PME_ prefix. I
kept this change minimal for now, since we will have to revisit this once
the Power8 events are finalized.

Sukadev


^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH 5/9][v5] powerpc: implement is_instr_load_store().
  2013-10-03  5:35   ` Michael Ellerman
@ 2013-10-03 19:03     ` Sukadev Bhattiprolu
  2013-10-03 19:52       ` Tom Musta
  2013-10-08  4:00       ` Michael Ellerman
  2013-10-08 19:31     ` Sukadev Bhattiprolu
  1 sibling, 2 replies; 26+ messages in thread
From: Sukadev Bhattiprolu @ 2013-10-03 19:03 UTC (permalink / raw
  To: Michael Ellerman
  Cc: Arnaldo Carvalho de Melo, Michael Ellerman, linux-kernel,
	Stephane Eranian, linuxppc-dev, Paul Mackerras, Anshuman Khandual

Michael Ellerman [michael@ellerman.id.au] wrote:
| On Tue, Oct 01, 2013 at 05:15:06PM -0700, Sukadev Bhattiprolu wrote:
| > Implement is_instr_load_store() to detect whether a given instruction
| > is one of the fixed-point or floating-point load/store instructions.
| > This function will be used in a follow-on patch to save memory hierarchy
| > information of the load/store.
| 
| The search over the array is a bit of a pity, especially as the worst
| case penalises you when you haven't hit a load/store.

Agree. Will try this out.  This is certainly more efficient.

| 
| I think we can do better. If you look at the opcode maps, and in
| particular the extended table for opcode 31, you'll see there's a
| reasonable amount of structure.
| 
| The following is only valid for arch 2.06, ie. it will classify reserved
| opcodes as being load/store, but I think that's fine for the moment. If
| we need to use it somewhere in future we can update it. But we should
| add a big comment saying it's only valid in that case.
| 
| Anyway, I think the following logic is all we need for opcode 31:
| 
| bool is_load_store(int ext_opcode)

how about I call this is_load_store_2_06() and add a comment. Horrible
but minimizes chance of misuse.

| {
|         upper = ext_opcode >> 5;
|         lower = ext_opcode & 0x1f;
| 
|         /* Short circuit as many misses as we can */
|         if (lower < 3 || lower > 23)
|             return false;
| 
|         if (lower == 3)
|             if (upper >= 16)
|                 return true;
| 
|             return false;
| 
|         if (lower == 6)
|             if (upper <= 1)
|                 return true;
|             return false;
| 
|         if (lower == 7 || lower == 12)
|             return true;
| 
|         if (lower >= 20) /* && lower <= 23 (implicit) */
|             return true;
| 
|         return false;
| }
| 
| 
| Which is not pretty, but I think it's preferable to the full search over the
| array.
| 
| cheers


^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH 5/9][v5] powerpc: implement is_instr_load_store().
  2013-10-03 19:03     ` Sukadev Bhattiprolu
@ 2013-10-03 19:52       ` Tom Musta
  2013-10-08  3:28         ` Michael Ellerman
  2013-10-08  4:00       ` Michael Ellerman
  1 sibling, 1 reply; 26+ messages in thread
From: Tom Musta @ 2013-10-03 19:52 UTC (permalink / raw
  To: Sukadev Bhattiprolu
  Cc: Michael Ellerman, Michael Ellerman, linux-kernel,
	Stephane Eranian, linuxppc-dev, Paul Mackerras,
	Arnaldo Carvalho de Melo, Anshuman Khandual

On 10/3/2013 2:03 PM, Sukadev Bhattiprolu wrote:
> Michael Ellerman [michael@ellerman.id.au] wrote:
<snip>
> |
> |         if (lower == 6)
> |             if (upper <= 1)
> |                 return true;
> |             return false;
Note that this case covers the lvsl/lvsr instructions, which, despite their
names, are not actually loads.  So you could eliminate this check and do
just a little bit better.
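Folding in this refinement, a compilable version of the opcode-31 classifier
sketched earlier in the thread might look as follows. This is a sketch under
the thread's stated assumptions: it is only valid for arch 2.06 (reserved
opcodes in the load/store columns are classified as load/store), the
`lower == 6` row for lvsl/lvsr is dropped as suggested above, and the
function name is illustrative.

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Classify an opcode-31 extended opcode (instruction bits 21-30) as a
 * load/store. Valid for Power ISA 2.06 only: reserved opcodes falling in
 * the load/store columns are reported as load/store. lvsl/lvsr
 * (extended opcodes 6 and 38) are deliberately not treated as loads.
 */
static bool is_opcode_31_load_store_2_06(int ext_opcode)
{
	int upper = ext_opcode >> 5;
	int lower = ext_opcode & 0x1f;

	/* Short circuit as many misses as we can */
	if (lower < 3 || lower > 23)
		return false;

	if (lower == 3)
		return upper >= 16;

	if (lower == 7 || lower == 12)
		return true;

	if (lower >= 20)	/* && lower <= 23 (implicit) */
		return true;

	return false;
}
```

For example, lwzx (extended opcode 23), lvx (103) and lfsx (535) classify as
load/store, while lvsl (6) and add (266) do not.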

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH 6/9][v5] powerpc/perf: Define big-endian version of perf_mem_data_src
  2013-10-03  5:39   ` Michael Ellerman
  2013-10-03  6:20     ` Sukadev Bhattiprolu
  2013-10-03 15:27     ` Sukadev Bhattiprolu
@ 2013-10-05  3:53     ` Sukadev Bhattiprolu
  2 siblings, 0 replies; 26+ messages in thread
From: Sukadev Bhattiprolu @ 2013-10-05  3:53 UTC (permalink / raw
  To: Michael Ellerman
  Cc: Arnaldo Carvalho de Melo, Michael Ellerman, linux-kernel,
	Stephane Eranian, linuxppc-dev, Paul Mackerras, Anshuman Khandual

Michael Ellerman [michael@ellerman.id.au] wrote:
| On Tue, Oct 01, 2013 at 05:15:07PM -0700, Sukadev Bhattiprolu wrote:
| > perf_mem_data_src is a union that is initialized via the ->val field
| > and accessed via the bitmap fields. For this to work on big-endian
| > platforms, we also need a big-endian representation of perf_mem_data_src.
| > 
| > diff --git a/include/uapi/linux/perf_event.h b/include/uapi/linux/perf_event.h
| > index ca1d90b..846f399 100644
| > --- a/include/uapi/linux/perf_event.h
| > +++ b/include/uapi/linux/perf_event.h
| > @@ -19,6 +19,50 @@
| >  #include <asm/byteorder.h>
| >  
| >  /*
| > + * Kernel and userspace check for endianness in incompatible ways.
| > + * In user space, <endian.h> defines both __BIG_ENDIAN and __LITTLE_ENDIAN
| > + * but sets __BYTE_ORDER to one or the other. So user space checks are:
| 
| 
| Why can't you use __BIG_ENDIAN_BITFIELD ?


So, the perf tool overrides <asm/byteorder.h> with a local version.

And since this local version is arch-neutral, we can't explicitly include
the endian headers like <asm/byteorder.h> does.

How about we do something like this (both kernel and tool seem to build
on both x86 and power).

Sukadev.

---
 include/uapi/linux/perf_event.h         |   16 ++++++++++++++++
 tools/perf/util/include/asm/byteorder.h |   27 +++++++++++++++++++++++++++
 2 files changed, 43 insertions(+), 0 deletions(-)

diff --git a/include/uapi/linux/perf_event.h b/include/uapi/linux/perf_event.h
index ca1d90b..dcfa74f 100644
--- a/include/uapi/linux/perf_event.h
+++ b/include/uapi/linux/perf_event.h
@@ -695,6 +695,7 @@ enum perf_callchain_context {
 #define PERF_FLAG_FD_OUTPUT		(1U << 1)
 #define PERF_FLAG_PID_CGROUP		(1U << 2) /* pid=cgroup id, per-cpu mode only */
 
+#if defined (__LITTLE_ENDIAN_BITFIELD)
 union perf_mem_data_src {
 	__u64 val;
 	struct {
@@ -706,6 +707,21 @@ union perf_mem_data_src {
 			mem_rsvd:31;
 	};
 };
+#elif defined(__BIG_ENDIAN_BITFIELD)
+union perf_mem_data_src {
+	__u64 val;
+	struct {
+		__u64	mem_rsvd:31,
+			mem_dtlb:7,	/* tlb access */
+			mem_lock:2,	/* lock instr */
+			mem_snoop:5,	/* snoop mode */
+			mem_lvl:14,	/* memory hierarchy level */
+			mem_op:5;	/* type of opcode */
+	};
+};
+#else
+#error "Unknown endianness"
+#endif
 
 /* type of opcode (load/store/prefetch,code) */
 #define PERF_MEM_OP_NA		0x01 /* not available */
diff --git a/tools/perf/util/include/asm/byteorder.h b/tools/perf/util/include/asm/byteorder.h
index 2a9bdc0..521a382 100644
--- a/tools/perf/util/include/asm/byteorder.h
+++ b/tools/perf/util/include/asm/byteorder.h
@@ -1,2 +1,29 @@
 #include <asm/types.h>
 #include "../../../../include/uapi/linux/swab.h"
+#include <endian.h>
+
+/*
+ * __LITTLE_ENDIAN_BITFIELD and __BIG_ENDIAN_BITFIELD are normally picked
+ * from <byteorder.h>. Since we override the default <byteorder.h>, define
+ * them explicitly here
+ */
+#if __BYTE_ORDER == __LITTLE_ENDIAN
+
+#ifndef __LITTLE_ENDIAN_BITFIELD
+#define __LITTLE_ENDIAN_BITFIELD
+#endif
+
+#elif __BYTE_ORDER == __BIG_ENDIAN
+
+#ifndef __BIG_ENDIAN_BITFIELD
+#define __BIG_ENDIAN_BITFIELD
+#endif
+
+#else
+#error "Unknown endianness"
+#endif
+
+#if defined(__LITTLE_ENDIAN_BITFIELD) && defined(__BIG_ENDIAN_BITFIELD)
+#error 	Both __LITTLE_ENDIAN_BITFIELD and __BIG_ENDIAN_BITFIELD defined! \
+	Some perf data structures (eg: perf_mem_data_src) will be wrong.
+#endif
-- 
1.7.1


^ permalink raw reply related	[flat|nested] 26+ messages in thread

* Re: [PATCH 5/9][v5] powerpc: implement is_instr_load_store().
  2013-10-03 19:52       ` Tom Musta
@ 2013-10-08  3:28         ` Michael Ellerman
  0 siblings, 0 replies; 26+ messages in thread
From: Michael Ellerman @ 2013-10-08  3:28 UTC (permalink / raw
  To: Tom Musta
  Cc: Sukadev Bhattiprolu, linux-kernel, Stephane Eranian, linuxppc-dev,
	Paul Mackerras, Arnaldo Carvalho de Melo, Anshuman Khandual

On Thu, 2013-10-03 at 14:52 -0500, Tom Musta wrote:
> On 10/3/2013 2:03 PM, Sukadev Bhattiprolu wrote:
> > Michael Ellerman [michael@ellerman.id.au] wrote:
> <snip>
> > |
> > |         if (lower == 6)
> > |             if (upper <= 1)
> > |                 return true;
> > |             return false;

> Note that this case covers the lvsl/lvsr instructions, which, despite their
> names are not actually loads.  So you could eliminate this check and do
> just a little bit better.

Yes you're right Tom, thanks for checking.

I saw "Load" in the name and that was good enough for me :)

cheers




^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH 2/9][v5] powerpc/perf: Export Power8 generic events in sysfs
  2013-10-03 17:57     ` Sukadev Bhattiprolu
@ 2013-10-08  3:58       ` Michael Ellerman
  0 siblings, 0 replies; 26+ messages in thread
From: Michael Ellerman @ 2013-10-08  3:58 UTC (permalink / raw
  To: Sukadev Bhattiprolu
  Cc: Arnaldo Carvalho de Melo, Michael Ellerman, linux-kernel,
	Stephane Eranian, linuxppc-dev, Paul Mackerras, Anshuman Khandual

On Thu, Oct 03, 2013 at 10:57:57AM -0700, Sukadev Bhattiprolu wrote:
> Michael Ellerman [michael@ellerman.id.au] wrote:
> | On Tue, Oct 01, 2013 at 05:15:03PM -0700, Sukadev Bhattiprolu wrote:
> | > Export generic perf events for Power8 in sysfs.
> | > 
> | > Signed-off-by: Sukadev Bhattiprolu <sukadev@linux.vnet.ibm.com>
> | > Reviewed-by: Anshuman Khandual <khandual@linux.vnet.ibm.com>
> | > ---
> | >  arch/powerpc/perf/power8-pmu.c |   23 +++++++++++++++++++++++
> | >  1 file changed, 23 insertions(+)
> | > 
> | > diff --git a/arch/powerpc/perf/power8-pmu.c b/arch/powerpc/perf/power8-pmu.c
> | > index 976c203..b991b2e 100644
> | > --- a/arch/powerpc/perf/power8-pmu.c
> | > +++ b/arch/powerpc/perf/power8-pmu.c
> | > @@ -510,6 +510,28 @@ static void power8_disable_pmc(unsigned int pmc, unsigned long mmcr[])
> | >  		mmcr[1] &= ~(0xffUL << MMCR1_PMCSEL_SHIFT(pmc + 1));
> | >  }
> | >  
> | > +GENERIC_EVENT_ATTR(cpu-cyles,			PM_CYC);
> | > +GENERIC_EVENT_ATTR(stalled-cycles-frontend,	PM_GCT_NOSLOT_CYC);
> | > +GENERIC_EVENT_ATTR(stalled-cycles-backend,	PM_CMPLU_STALL);
> | > +GENERIC_EVENT_ATTR(instructions,		PM_INST_CMPL);
> | > +GENERIC_EVENT_ATTR(branch-instructions,		PM_BRU_FIN);
> | > +GENERIC_EVENT_ATTR(branch-misses,		PM_BR_MPRED_CMPL);
> | 
> | And here you use PM_ not PME_ - I'm confused.
> 
> It is a bit confusing. The GENERIC_EVENT_ATTR() adds the PME_ prefix.

So doesn't that give you PME_PM_CYC ?

cheers

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH 5/9][v5] powerpc: implement is_instr_load_store().
  2013-10-03 19:03     ` Sukadev Bhattiprolu
  2013-10-03 19:52       ` Tom Musta
@ 2013-10-08  4:00       ` Michael Ellerman
  1 sibling, 0 replies; 26+ messages in thread
From: Michael Ellerman @ 2013-10-08  4:00 UTC (permalink / raw
  To: Sukadev Bhattiprolu
  Cc: Arnaldo Carvalho de Melo, Michael Ellerman, linux-kernel,
	Stephane Eranian, linuxppc-dev, Paul Mackerras, Anshuman Khandual

On Thu, Oct 03, 2013 at 12:03:25PM -0700, Sukadev Bhattiprolu wrote:
> Michael Ellerman [michael@ellerman.id.au] wrote:
> | On Tue, Oct 01, 2013 at 05:15:06PM -0700, Sukadev Bhattiprolu wrote:
> | > Implement is_instr_load_store() to detect whether a given instruction
> | > is one of the fixed-point or floating-point load/store instructions.
> | > This function will be used in a follow-on patch to save memory hierarchy
> | > information of the load/store.
> | 
> | Anyway, I think the following logic is all we need for opcode 31:
> | 
> | bool is_load_store(int ext_opcode)
> 
> how about I call this is_load_store_2_06() and add a comment. Horrible
> but minimizes chance of misuse.

Actually it's is_opcode_31_load_store_2_06() - which is even more
horrible :)

But you can probably fold it in to the main routine and then call that
is_load_store_2_06(). Or whatever seems best, but yeah I think we should
make it very clear that it's only for 2.06.

cheers

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH 5/9][v5] powerpc: implement is_instr_load_store().
  2013-10-03  5:35   ` Michael Ellerman
  2013-10-03 19:03     ` Sukadev Bhattiprolu
@ 2013-10-08 19:31     ` Sukadev Bhattiprolu
  2013-10-09  1:03       ` Michael Ellerman
  1 sibling, 1 reply; 26+ messages in thread
From: Sukadev Bhattiprolu @ 2013-10-08 19:31 UTC (permalink / raw
  To: Michael Ellerman
  Cc: Arnaldo Carvalho de Melo, Michael Ellerman, linux-kernel,
	Stephane Eranian, linuxppc-dev, Paul Mackerras, Anshuman Khandual

Michael Ellerman [michael@ellerman.id.au] wrote:
| bool is_load_store(int ext_opcode)
| {
|         upper = ext_opcode >> 5;
|         lower = ext_opcode & 0x1f;
| 
|         /* Short circuit as many misses as we can */
|         if (lower < 3 || lower > 23)
|             return false;

I see some loads/stores like these which are not covered by
the above check. Is it ok to ignore them ?

	lower == 29: ldepx, stdepx, evlddepx, evstddepx

	lower == 31: lwepx, lbepx, lfdepx, stfdepx,

Looking through the opcode maps, I also see these for primary
op code 4:

	evldd, evlddx, evldwx, evldw, evldh, evldhx.

Should we include those also ?

Sukadev


^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH 5/9][v5] powerpc: implement is_instr_load_store().
  2013-10-08 19:31     ` Sukadev Bhattiprolu
@ 2013-10-09  1:03       ` Michael Ellerman
  2013-10-09  1:27         ` Michael Ellerman
  0 siblings, 1 reply; 26+ messages in thread
From: Michael Ellerman @ 2013-10-09  1:03 UTC (permalink / raw
  To: Sukadev Bhattiprolu
  Cc: Arnaldo Carvalho de Melo, linux-kernel, Stephane Eranian,
	linuxppc-dev, Paul Mackerras, Anshuman Khandual

On Tue, 2013-10-08 at 12:31 -0700, Sukadev Bhattiprolu wrote:
> Michael Ellerman [michael@ellerman.id.au] wrote:
> | bool is_load_store(int ext_opcode)
> | {
> |         upper = ext_opcode >> 5;
> |         lower = ext_opcode & 0x1f;
> | 
> |         /* Short circuit as many misses as we can */
> |         if (lower < 3 || lower > 23)
> |             return false;
> 
> I see some loads/stores like these which are not covered by
> the above check. Is it ok to ignore them ?
> 
> 	lower == 29: ldepx, stdepx, evlddepx, evstddepx
> 
> 	lower == 31: lwepx, lbepx, lfdepx, stfdepx,

Those are the external process ID instructions, which I've never heard
of anyone using, I think we can ignore them.

> Looking through the opcode maps, I also see these for primary
> op code 4:
> 
> 	evldd, evlddx, evldwx, evldw, evldh, evldhx.
> 
> Should we include those also ?

Yes I think so. I didn't check any of the other opcodes for you.

cheers


^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH 5/9][v5] powerpc: implement is_instr_load_store().
  2013-10-09  1:03       ` Michael Ellerman
@ 2013-10-09  1:27         ` Michael Ellerman
  0 siblings, 0 replies; 26+ messages in thread
From: Michael Ellerman @ 2013-10-09  1:27 UTC (permalink / raw
  To: Sukadev Bhattiprolu
  Cc: linux-kernel, Stephane Eranian, linuxppc-dev, Paul Mackerras,
	Arnaldo Carvalho de Melo, Anshuman Khandual

On Wed, Oct 09, 2013 at 12:03:19PM +1100, Michael Ellerman wrote:
> On Tue, 2013-10-08 at 12:31 -0700, Sukadev Bhattiprolu wrote:
> > Michael Ellerman [michael@ellerman.id.au] wrote:
> > | bool is_load_store(int ext_opcode)
> > | {
> > |         upper = ext_opcode >> 5;
> > |         lower = ext_opcode & 0x1f;
> > | 
> > |         /* Short circuit as many misses as we can */
> > |         if (lower < 3 || lower > 23)
> > |             return false;
> > 
> > I see some loads/stores like these which are not covered by
> > the above check. Is it ok to ignore them ?
> > 
> > 	lower == 29: ldepx, stdepx, evlddepx, evstddepx
> > 
> > 	lower == 31: lwepx, lbepx, lfdepx, stfdepx,
> 
> Those are the external process ID instructions, which I've never heard
> of anyone using, I think we can ignore them.
> 
> > Looking through the opcode maps, I also see these for primary
> > op code 4:
> > 
> > 	evldd, evlddx, evldwx, evldw, evldh, evldhx.
> > 
> > Should we include those also ?
> 
> Yes I think so. I didn't check any of the other opcodes for you.

Paul points out these are for the SPE extension, which we also don't
care about. So ignore those as well.

cheers

^ permalink raw reply	[flat|nested] 26+ messages in thread

end of thread, other threads:[~2013-10-09  1:27 UTC | newest]

Thread overview: 26+ messages
-- links below jump to the message on this page --
2013-10-02  0:15 Sukadev Bhattiprolu
2013-10-02  0:15 ` [PATCH 1/9][v5] powerpc/perf: Rename Power8 macros to start with PME Sukadev Bhattiprolu
2013-10-03  4:02   ` Michael Ellerman
2013-10-02  0:15 ` [PATCH 2/9][v5] powerpc/perf: Export Power8 generic events in sysfs Sukadev Bhattiprolu
2013-10-03  4:04   ` Michael Ellerman
2013-10-03 17:57     ` Sukadev Bhattiprolu
2013-10-08  3:58       ` Michael Ellerman
2013-10-02  0:15 ` [PATCH 3/9][v5] powerpc/perf: Add Power8 event PM_MRK_GRP_CMPL to sysfs Sukadev Bhattiprolu
2013-10-02  0:15 ` [PATCH 4/9][v5] powerpc: Rename branch_opcode() to instr_opcode() Sukadev Bhattiprolu
2013-10-02  0:15 ` [PATCH 5/9][v5] powerpc: implement is_instr_load_store() Sukadev Bhattiprolu
2013-10-03  5:35   ` Michael Ellerman
2013-10-03 19:03     ` Sukadev Bhattiprolu
2013-10-03 19:52       ` Tom Musta
2013-10-08  3:28         ` Michael Ellerman
2013-10-08  4:00       ` Michael Ellerman
2013-10-08 19:31     ` Sukadev Bhattiprolu
2013-10-09  1:03       ` Michael Ellerman
2013-10-09  1:27         ` Michael Ellerman
2013-10-02  0:15 ` [PATCH 6/9][v5] powerpc/perf: Define big-endian version of perf_mem_data_src Sukadev Bhattiprolu
2013-10-03  5:39   ` Michael Ellerman
2013-10-03  6:20     ` Sukadev Bhattiprolu
2013-10-03 15:27     ` Sukadev Bhattiprolu
2013-10-05  3:53     ` Sukadev Bhattiprolu
2013-10-02  0:15 ` [PATCH 7/9][v5] powerpc/perf: Export Power8 memory hierarchy info to user space Sukadev Bhattiprolu
2013-10-02  0:15 ` [PATCH 8/9][v5] powerpc/perf: Export Power7 " Sukadev Bhattiprolu
2013-10-02  0:15 ` [PATCH 9/9][v5] powerpc/perf: Update perf-mem man page for Power Sukadev Bhattiprolu
