All the mail mirrored from lore.kernel.org
 help / color / mirror / Atom feed
* [RFC 0/4] perf kvm: Guest Symbol Resolution for powerpc
@ 2016-02-24  9:07 Ravi Bangoria
  2016-02-24  9:07 ` [RFC 1/4] perf kvm: Enable 'record' on powerpc Ravi Bangoria
                   ` (3 more replies)
  0 siblings, 4 replies; 18+ messages in thread
From: Ravi Bangoria @ 2016-02-24  9:07 UTC (permalink / raw
  To: acme; +Cc: linux-kernel, hemant, naveen.n.rao, Ravi Bangoria

Design of [patch v2] Guest Symbol Resolution is focused on enabling
perf kvm {record|report} on powerpc. Here is the link for the same:
        thread.gmane.org/gmane.linux.kernel/2132409

As per the point raised by acme, this design does not enable cross
arch reporting functionality. i.e. record on powerpc and report on
!powerpc.

This patch aims to enable cross arch reporting functionality along with
enabling perf kvm {record|report} on powerpc. Note that basic principle
of enabling perf kvm {record|report} on powerpc using tracepoint
kvm_hv:kvm_guest_exit has not been changed.

Major change between [patch v2] and this [RFC] patch is, I've moved
'perf kvm report' related and ppc specific functionality from
tool/perf/arch/powerpc/ to generic tool/perf/ code. This is required
because perf binary needs ppc specific code even if it's compiled on
!ppc to enable cross arch reporting.

I need suggestion specifically on patch 3 (Enable 'report' on powerpc)
which contains arch specific code in generic area. Right now I've added
code in util/evsel.c. But please let me know if there's any better way
to do this.

This patch is to get suggestions on approach so I've tagged it as RFC
and not following the patch version series.

Ravi Bangoria (4):
  perf kvm: Enable 'record' on powerpc
  perf kvm: Introduce evsel as argument to perf_event__preprocess_sample
  perf kvm: Enable 'report' on powerpc
  perf kvm: Fix output fields instead of 'trace' for perf kvm report on
    powerpc

 tools/perf/arch/powerpc/util/Build |  1 +
 tools/perf/arch/powerpc/util/kvm.c | 18 +++++++++
 tools/perf/builtin-annotate.c      |  3 +-
 tools/perf/builtin-diff.c          |  3 +-
 tools/perf/builtin-mem.c           | 10 +++--
 tools/perf/builtin-report.c        |  8 +++-
 tools/perf/builtin-script.c        |  3 +-
 tools/perf/builtin-timechart.c     |  8 ++--
 tools/perf/builtin-top.c           |  3 +-
 tools/perf/tests/hists_cumulate.c  |  2 +-
 tools/perf/tests/hists_filter.c    |  2 +-
 tools/perf/tests/hists_link.c      |  4 +-
 tools/perf/tests/hists_output.c    |  2 +-
 tools/perf/util/event.c            |  8 ++--
 tools/perf/util/event.h            |  3 +-
 tools/perf/util/evlist.c           |  9 +++++
 tools/perf/util/evlist.h           |  1 +
 tools/perf/util/evsel.c            | 77 ++++++++++++++++++++++++++++++++++++++
 tools/perf/util/evsel.h            |  7 ++++
 tools/perf/util/session.c          |  7 ++--
 tools/perf/util/util.c             |  5 +++
 tools/perf/util/util.h             |  1 +
 22 files changed, 161 insertions(+), 24 deletions(-)
 create mode 100644 tools/perf/arch/powerpc/util/kvm.c

--
2.1.4

^ permalink raw reply	[flat|nested] 18+ messages in thread

* [RFC 1/4] perf kvm: Enable 'record' on powerpc
  2016-02-24  9:07 [RFC 0/4] perf kvm: Guest Symbol Resolution for powerpc Ravi Bangoria
@ 2016-02-24  9:07 ` Ravi Bangoria
  2016-03-22 19:12   ` Arnaldo Carvalho de Melo
  2016-02-24  9:07 ` [RFC 2/4] perf kvm: Introduce evsel as argument to perf_event__preprocess_sample Ravi Bangoria
                   ` (2 subsequent siblings)
  3 siblings, 1 reply; 18+ messages in thread
From: Ravi Bangoria @ 2016-02-24  9:07 UTC (permalink / raw
  To: acme; +Cc: linux-kernel, hemant, naveen.n.rao, Ravi Bangoria

'perf kvm record' is not available on powerpc because 'perf' relies on
the 'cycles' event (a PMU event) to profile the guest. However, for
powerpc, this can't be used from the host because the PMUs are controlled
by the guest rather than the host.

There exists a tracepoint 'kvm_hv:kvm_guest_exit' in powerpc which is
hit whenever any of the threads exit the guest context. The guest
instruction pointer dumped along with this tracepoint data in the field
'pc', can be used as guest instruction pointer.

This patch changes default event as kvm_hv:kvm_guest_exit for recording
guest data in host on powerpc. As we are using host event to record guest
data, this approach will enable only --guest option of 'perf kvm'. Still
--host --guest together won't work.

Signed-off-by: Ravi Bangoria <ravi.bangoria@linux.vnet.ibm.com>
---
 tools/perf/arch/powerpc/util/Build |  1 +
 tools/perf/arch/powerpc/util/kvm.c | 18 ++++++++++++++++++
 tools/perf/util/evlist.c           |  9 +++++++++
 tools/perf/util/evlist.h           |  1 +
 tools/perf/util/util.c             |  5 +++++
 tools/perf/util/util.h             |  1 +
 6 files changed, 35 insertions(+)
 create mode 100644 tools/perf/arch/powerpc/util/kvm.c

diff --git a/tools/perf/arch/powerpc/util/Build b/tools/perf/arch/powerpc/util/Build
index c8fe207..4cb620d 100644
--- a/tools/perf/arch/powerpc/util/Build
+++ b/tools/perf/arch/powerpc/util/Build
@@ -1,6 +1,7 @@
 libperf-y += header.o
 libperf-y += sym-handling.o
 libperf-y += kvm-stat.o
+libperf-y += kvm.o
 
 libperf-$(CONFIG_DWARF) += dwarf-regs.o
 libperf-$(CONFIG_DWARF) += skip-callchain-idx.o
diff --git a/tools/perf/arch/powerpc/util/kvm.c b/tools/perf/arch/powerpc/util/kvm.c
new file mode 100644
index 0000000..878d323
--- /dev/null
+++ b/tools/perf/arch/powerpc/util/kvm.c
@@ -0,0 +1,18 @@
+#include <linux/err.h>
+#include "../../../util/evsel.h"
+#include "../../../util/evlist.h"
+
+int perf_evlist__arch_add_default(struct perf_evlist *evlist)
+{
+	struct perf_evsel *evsel;
+
+	if (!perf_guest_only())
+		return -1;
+
+	evsel = perf_evsel__newtp_idx("kvm_hv", "kvm_guest_exit", 0);
+	if (IS_ERR(evsel))
+		return PTR_ERR(evsel);
+
+	perf_evlist__add(evlist, evsel);
+	return 0;
+}
diff --git a/tools/perf/util/evlist.c b/tools/perf/util/evlist.c
index c42e196..8b7b84f 100644
--- a/tools/perf/util/evlist.c
+++ b/tools/perf/util/evlist.c
@@ -231,6 +231,12 @@ void perf_event_attr__set_max_precise_ip(struct perf_event_attr *attr)
 	}
 }
 
+int __weak
+perf_evlist__arch_add_default(struct perf_evlist *evlist __maybe_unused)
+{
+	return -1;
+}
+
 int perf_evlist__add_default(struct perf_evlist *evlist)
 {
 	struct perf_event_attr attr = {
@@ -239,6 +245,9 @@ int perf_evlist__add_default(struct perf_evlist *evlist)
 	};
 	struct perf_evsel *evsel;
 
+	if (!perf_evlist__arch_add_default(evlist))
+		return 0;
+
 	event_attr_init(&attr);
 
 	perf_event_attr__set_max_precise_ip(&attr);
diff --git a/tools/perf/util/evlist.h b/tools/perf/util/evlist.h
index a0d1522..7951406 100644
--- a/tools/perf/util/evlist.h
+++ b/tools/perf/util/evlist.h
@@ -75,6 +75,7 @@ void perf_evlist__delete(struct perf_evlist *evlist);
 
 void perf_evlist__add(struct perf_evlist *evlist, struct perf_evsel *entry);
 void perf_evlist__remove(struct perf_evlist *evlist, struct perf_evsel *evsel);
+int perf_evlist__arch_add_default(struct perf_evlist *evlist);
 int perf_evlist__add_default(struct perf_evlist *evlist);
 int __perf_evlist__add_default_attrs(struct perf_evlist *evlist,
 				     struct perf_event_attr *attrs, size_t nr_attrs);
diff --git a/tools/perf/util/util.c b/tools/perf/util/util.c
index 35b20dd..567e3da 100644
--- a/tools/perf/util/util.c
+++ b/tools/perf/util/util.c
@@ -37,6 +37,11 @@ bool test_attr__enabled;
 bool perf_host  = true;
 bool perf_guest = false;
 
+bool perf_guest_only(void)
+{
+	return !perf_host && perf_guest;
+}
+
 void event_attr_init(struct perf_event_attr *attr)
 {
 	if (!perf_host)
diff --git a/tools/perf/util/util.h b/tools/perf/util/util.h
index 3dd0408..c459c45 100644
--- a/tools/perf/util/util.h
+++ b/tools/perf/util/util.h
@@ -344,5 +344,6 @@ int fetch_kernel_version(unsigned int *puint,
 const char *perf_tip(const char *dirpath);
 bool is_regular_file(const char *file);
 int fetch_current_timestamp(char *buf, size_t sz);
+bool perf_guest_only(void);
 
 #endif /* GIT_COMPAT_UTIL_H */
-- 
2.1.4

^ permalink raw reply related	[flat|nested] 18+ messages in thread

* [RFC 2/4] perf kvm: Introduce evsel as argument to perf_event__preprocess_sample
  2016-02-24  9:07 [RFC 0/4] perf kvm: Guest Symbol Resolution for powerpc Ravi Bangoria
  2016-02-24  9:07 ` [RFC 1/4] perf kvm: Enable 'record' on powerpc Ravi Bangoria
@ 2016-02-24  9:07 ` Ravi Bangoria
  2016-02-24  9:07 ` [RFC 3/4] perf kvm: Enable 'report' on powerpc Ravi Bangoria
  2016-02-24  9:07 ` [RFC 4/4] perf kvm: Fix output fields instead of 'trace' for perf kvm report " Ravi Bangoria
  3 siblings, 0 replies; 18+ messages in thread
From: Ravi Bangoria @ 2016-02-24  9:07 UTC (permalink / raw
  To: acme; +Cc: linux-kernel, hemant, naveen.n.rao, Ravi Bangoria

This patch changes prototype of perf_event__preprocess_sample() with
additional argument evsel added at the end.

This change is required because perf_event__preprocess_sample()
function will use evsel to determine cpumode of samples for powerpc
architecture.

Signed-off-by: Ravi Bangoria <ravi.bangoria@linux.vnet.ibm.com>
---
 tools/perf/builtin-annotate.c     |  3 ++-
 tools/perf/builtin-diff.c         |  3 ++-
 tools/perf/builtin-mem.c          | 10 ++++++----
 tools/perf/builtin-report.c       |  3 ++-
 tools/perf/builtin-script.c       |  3 ++-
 tools/perf/builtin-timechart.c    |  8 +++++---
 tools/perf/builtin-top.c          |  3 ++-
 tools/perf/tests/hists_cumulate.c |  2 +-
 tools/perf/tests/hists_filter.c   |  2 +-
 tools/perf/tests/hists_link.c     |  4 ++--
 tools/perf/tests/hists_output.c   |  2 +-
 tools/perf/util/event.c           |  3 ++-
 tools/perf/util/event.h           |  3 ++-
 13 files changed, 30 insertions(+), 19 deletions(-)

diff --git a/tools/perf/builtin-annotate.c b/tools/perf/builtin-annotate.c
index cfe3663..da330ae 100644
--- a/tools/perf/builtin-annotate.c
+++ b/tools/perf/builtin-annotate.c
@@ -94,7 +94,8 @@ static int process_sample_event(struct perf_tool *tool,
 	struct addr_location al;
 	int ret = 0;
 
-	if (perf_event__preprocess_sample(event, machine, &al, sample) < 0) {
+	if (perf_event__preprocess_sample(event, machine, &al,
+					  sample, evsel) < 0) {
 		pr_warning("problem processing %d event, skipping it.\n",
 			   event->header.type);
 		return -1;
diff --git a/tools/perf/builtin-diff.c b/tools/perf/builtin-diff.c
index 36ccc2b..d2a27fe 100644
--- a/tools/perf/builtin-diff.c
+++ b/tools/perf/builtin-diff.c
@@ -330,7 +330,8 @@ static int diff__process_sample_event(struct perf_tool *tool __maybe_unused,
 	struct hists *hists = evsel__hists(evsel);
 	int ret = -1;
 
-	if (perf_event__preprocess_sample(event, machine, &al, sample) < 0) {
+	if (perf_event__preprocess_sample(event, machine, &al,
+					  sample, evsel) < 0) {
 		pr_warning("problem processing %d event, skipping it.\n",
 			   event->header.type);
 		return -1;
diff --git a/tools/perf/builtin-mem.c b/tools/perf/builtin-mem.c
index b3f8a89..a7c01fe 100644
--- a/tools/perf/builtin-mem.c
+++ b/tools/perf/builtin-mem.c
@@ -118,13 +118,15 @@ static int
 dump_raw_samples(struct perf_tool *tool,
 		 union perf_event *event,
 		 struct perf_sample *sample,
-		 struct machine *machine)
+		 struct machine *machine,
+		 struct perf_evsel *evsel)
 {
 	struct perf_mem *mem = container_of(tool, struct perf_mem, tool);
 	struct addr_location al;
 	const char *fmt;
 
-	if (perf_event__preprocess_sample(event, machine, &al, sample) < 0) {
+	if (perf_event__preprocess_sample(event, machine, &al,
+					  sample, evsel) < 0) {
 		fprintf(stderr, "problem processing %d event, skipping it.\n",
 				event->header.type);
 		return -1;
@@ -168,10 +170,10 @@ out_put:
 static int process_sample_event(struct perf_tool *tool,
 				union perf_event *event,
 				struct perf_sample *sample,
-				struct perf_evsel *evsel __maybe_unused,
+				struct perf_evsel *evsel,
 				struct machine *machine)
 {
-	return dump_raw_samples(tool, event, sample, machine);
+	return dump_raw_samples(tool, event, sample, machine, evsel);
 }
 
 static int report_raw_events(struct perf_mem *mem)
diff --git a/tools/perf/builtin-report.c b/tools/perf/builtin-report.c
index 760e886..31ec4ba 100644
--- a/tools/perf/builtin-report.c
+++ b/tools/perf/builtin-report.c
@@ -154,7 +154,8 @@ static int process_sample_event(struct perf_tool *tool,
 	};
 	int ret = 0;
 
-	if (perf_event__preprocess_sample(event, machine, &al, sample) < 0) {
+	if (perf_event__preprocess_sample(event, machine, &al,
+					  sample, evsel) < 0) {
 		pr_debug("problem processing %d event, skipping it.\n",
 			 event->header.type);
 		return -1;
diff --git a/tools/perf/builtin-script.c b/tools/perf/builtin-script.c
index f4caf48..792868e 100644
--- a/tools/perf/builtin-script.c
+++ b/tools/perf/builtin-script.c
@@ -804,7 +804,8 @@ static int process_sample_event(struct perf_tool *tool,
 		return 0;
 	}
 
-	if (perf_event__preprocess_sample(event, machine, &al, sample) < 0) {
+	if (perf_event__preprocess_sample(event, machine, &al,
+					  sample, evsel) < 0) {
 		pr_err("problem processing %d event, skipping it.\n",
 		       event->header.type);
 		return -1;
diff --git a/tools/perf/builtin-timechart.c b/tools/perf/builtin-timechart.c
index bd7a775..a177396 100644
--- a/tools/perf/builtin-timechart.c
+++ b/tools/perf/builtin-timechart.c
@@ -470,7 +470,8 @@ static void sched_switch(struct timechart *tchart, int cpu, u64 timestamp,
 
 static const char *cat_backtrace(union perf_event *event,
 				 struct perf_sample *sample,
-				 struct machine *machine)
+				 struct machine *machine,
+				 struct perf_evsel *evsel)
 {
 	struct addr_location al;
 	unsigned int i;
@@ -489,7 +490,8 @@ static const char *cat_backtrace(union perf_event *event,
 	if (!chain)
 		goto exit;
 
-	if (perf_event__preprocess_sample(event, machine, &al, sample) < 0) {
+	if (perf_event__preprocess_sample(event, machine, &al,
+					  sample, evsel) < 0) {
 		fprintf(stderr, "problem processing %d event, skipping it.\n",
 			event->header.type);
 		goto exit;
@@ -569,7 +571,7 @@ static int process_sample_event(struct perf_tool *tool,
 	if (evsel->handler != NULL) {
 		tracepoint_handler f = evsel->handler;
 		return f(tchart, evsel, sample,
-			 cat_backtrace(event, sample, machine));
+			 cat_backtrace(event, sample, machine, evsel));
 	}
 
 	return 0;
diff --git a/tools/perf/builtin-top.c b/tools/perf/builtin-top.c
index a75de39..79ec8e1 100644
--- a/tools/perf/builtin-top.c
+++ b/tools/perf/builtin-top.c
@@ -728,7 +728,8 @@ static void perf_event__process_sample(struct perf_tool *tool,
 	if (event->header.misc & PERF_RECORD_MISC_EXACT_IP)
 		top->exact_samples++;
 
-	if (perf_event__preprocess_sample(event, machine, &al, sample) < 0)
+	if (perf_event__preprocess_sample(event, machine, &al,
+					  sample, evsel) < 0)
 		return;
 
 	if (!top->kptr_restrict_warned &&
diff --git a/tools/perf/tests/hists_cumulate.c b/tools/perf/tests/hists_cumulate.c
index ecf136c..d5cdd41 100644
--- a/tools/perf/tests/hists_cumulate.c
+++ b/tools/perf/tests/hists_cumulate.c
@@ -103,7 +103,7 @@ static int add_hist_entries(struct hists *hists, struct machine *machine)
 		sample.callchain = (struct ip_callchain *)fake_callchains[i];
 
 		if (perf_event__preprocess_sample(&event, machine, &al,
-						  &sample) < 0)
+						  &sample, evsel) < 0)
 			goto out;
 
 		if (hist_entry_iter__add(&iter, &al, PERF_MAX_STACK_DEPTH,
diff --git a/tools/perf/tests/hists_filter.c b/tools/perf/tests/hists_filter.c
index 34b945a..2bede64 100644
--- a/tools/perf/tests/hists_filter.c
+++ b/tools/perf/tests/hists_filter.c
@@ -81,7 +81,7 @@ static int add_hist_entries(struct perf_evlist *evlist,
 			sample.ip = fake_samples[i].ip;
 
 			if (perf_event__preprocess_sample(&event, machine, &al,
-							  &sample) < 0)
+							  &sample, evsel) < 0)
 				goto out;
 
 			al.socket = fake_samples[i].socket;
diff --git a/tools/perf/tests/hists_link.c b/tools/perf/tests/hists_link.c
index 64b257d..7b8c590 100644
--- a/tools/perf/tests/hists_link.c
+++ b/tools/perf/tests/hists_link.c
@@ -86,7 +86,7 @@ static int add_hist_entries(struct perf_evlist *evlist, struct machine *machine)
 			sample.tid = fake_common_samples[k].pid;
 			sample.ip = fake_common_samples[k].ip;
 			if (perf_event__preprocess_sample(&event, machine, &al,
-							  &sample) < 0)
+							  &sample, evsel) < 0)
 				goto out;
 
 			he = __hists__add_entry(hists, &al, NULL,
@@ -112,7 +112,7 @@ static int add_hist_entries(struct perf_evlist *evlist, struct machine *machine)
 			sample.tid = fake_samples[i][k].pid;
 			sample.ip = fake_samples[i][k].ip;
 			if (perf_event__preprocess_sample(&event, machine, &al,
-							  &sample) < 0)
+							  &sample, evsel) < 0)
 				goto out;
 
 			he = __hists__add_entry(hists, &al, NULL,
diff --git a/tools/perf/tests/hists_output.c b/tools/perf/tests/hists_output.c
index 23cce67..fba8996 100644
--- a/tools/perf/tests/hists_output.c
+++ b/tools/perf/tests/hists_output.c
@@ -69,7 +69,7 @@ static int add_hist_entries(struct hists *hists, struct machine *machine)
 		sample.ip = fake_samples[i].ip;
 
 		if (perf_event__preprocess_sample(&event, machine, &al,
-						  &sample) < 0)
+						  &sample, evsel) < 0)
 			goto out;
 
 		if (hist_entry_iter__add(&iter, &al, PERF_MAX_STACK_DEPTH,
diff --git a/tools/perf/util/event.c b/tools/perf/util/event.c
index 7bad5c3..bc0a3f0 100644
--- a/tools/perf/util/event.c
+++ b/tools/perf/util/event.c
@@ -1298,7 +1298,8 @@ void thread__find_addr_location(struct thread *thread,
 int perf_event__preprocess_sample(const union perf_event *event,
 				  struct machine *machine,
 				  struct addr_location *al,
-				  struct perf_sample *sample)
+				  struct perf_sample *sample,
+				  struct perf_evsel *evsel __maybe_unused)
 {
 	u8 cpumode = event->header.misc & PERF_RECORD_MISC_CPUMODE_MASK;
 	struct thread *thread = machine__findnew_thread(machine, sample->pid,
diff --git a/tools/perf/util/event.h b/tools/perf/util/event.h
index b7ffb7e..fb8489b 100644
--- a/tools/perf/util/event.h
+++ b/tools/perf/util/event.h
@@ -600,7 +600,8 @@ struct addr_location;
 int perf_event__preprocess_sample(const union perf_event *event,
 				  struct machine *machine,
 				  struct addr_location *al,
-				  struct perf_sample *sample);
+				  struct perf_sample *sample,
+				  struct perf_evsel *evsel);
 
 void addr_location__put(struct addr_location *al);
 
-- 
2.1.4

^ permalink raw reply related	[flat|nested] 18+ messages in thread

* [RFC 3/4] perf kvm: Enable 'report' on powerpc
  2016-02-24  9:07 [RFC 0/4] perf kvm: Guest Symbol Resolution for powerpc Ravi Bangoria
  2016-02-24  9:07 ` [RFC 1/4] perf kvm: Enable 'record' on powerpc Ravi Bangoria
  2016-02-24  9:07 ` [RFC 2/4] perf kvm: Introduce evsel as argument to perf_event__preprocess_sample Ravi Bangoria
@ 2016-02-24  9:07 ` Ravi Bangoria
  2016-02-24  9:07 ` [RFC 4/4] perf kvm: Fix output fields instead of 'trace' for perf kvm report " Ravi Bangoria
  3 siblings, 0 replies; 18+ messages in thread
From: Ravi Bangoria @ 2016-02-24  9:07 UTC (permalink / raw
  To: acme; +Cc: linux-kernel, hemant, naveen.n.rao, Ravi Bangoria

'perf kvm record' on powerpc will record kvm_hv:kvm_guest_exit event
instead of cycles. However, to have some kind of periodicity, we can't
use all the kvm exits, rather exits which are bound to happen in certain
intervals. HV_DECREMENTER Interrupt forces the threads to exit after an
interval of 10 ms.

This patch makes use of the 'kvm_guest_exit' tracepoint and checks the
exit reason for any kvm exit. If it is HV_DECREMENTER, then the
instruction pointer dumped along with this tracepoint is retrieved and
mapped with the guest kallsyms.

Signed-off-by: Ravi Bangoria <ravi.bangoria@linux.vnet.ibm.com>
Signed-off-by: Hemant Kumar <hemant@linux.vnet.ibm.com>
---
 tools/perf/util/event.c   |  7 +++--
 tools/perf/util/evsel.c   | 77 +++++++++++++++++++++++++++++++++++++++++++++++
 tools/perf/util/evsel.h   |  7 +++++
 tools/perf/util/session.c |  7 +++--
 4 files changed, 92 insertions(+), 6 deletions(-)

diff --git a/tools/perf/util/event.c b/tools/perf/util/event.c
index bc0a3f0..31bbc50 100644
--- a/tools/perf/util/event.c
+++ b/tools/perf/util/event.c
@@ -1299,15 +1299,16 @@ int perf_event__preprocess_sample(const union perf_event *event,
 				  struct machine *machine,
 				  struct addr_location *al,
 				  struct perf_sample *sample,
-				  struct perf_evsel *evsel __maybe_unused)
+				  struct perf_evsel *evsel)
 {
-	u8 cpumode = event->header.misc & PERF_RECORD_MISC_CPUMODE_MASK;
+	u8 cpumode;
 	struct thread *thread = machine__findnew_thread(machine, sample->pid,
 							sample->tid);
-
 	if (thread == NULL)
 		return -1;
 
+	al->cpumode = cpumode = arch__get_cpumode(event, evsel, sample);
+
 	dump_printf(" ... thread: %s:%d\n", thread__comm_str(thread), thread->tid);
 	/*
 	 * Have we already created the kernel maps for this machine?
diff --git a/tools/perf/util/evsel.c b/tools/perf/util/evsel.c
index 0902fe4..a4d309e 100644
--- a/tools/perf/util/evsel.c
+++ b/tools/perf/util/evsel.c
@@ -1622,6 +1622,82 @@ static inline bool overflow(const void *endp, u16 max_size, const void *offset,
 #define OVERFLOW_CHECK_u64(offset) \
 	OVERFLOW_CHECK(offset, sizeof(u64), sizeof(u64))
 
+#define KVMPPC_EXIT "kvm_hv:kvm_guest_exit"
+#define HV_DECREMENTER 2432
+#define HV_BIT 3
+#define PR_BIT 49
+#define PPC_MAX 63
+
+bool is_kvmppc_exit_event(struct perf_evsel *evsel)
+{
+	static unsigned int kvmppc_exit;
+
+	if (evsel->attr.type != PERF_TYPE_TRACEPOINT)
+		return false;
+
+	if (unlikely(kvmppc_exit == 0)) {
+		if (strcmp(KVMPPC_EXIT, evsel->name))
+			return false;
+		kvmppc_exit = evsel->attr.config;
+	} else if (kvmppc_exit != evsel->attr.config) {
+		return false;
+	}
+
+	return true;
+}
+
+bool is_hv_dec_trap(struct perf_evsel *evsel, struct perf_sample *sample)
+{
+	int trap = perf_evsel__intval(evsel, sample, "trap");
+	return trap == HV_DECREMENTER;
+}
+
+bool is_perf_data_reorded_on_ppc(struct perf_evlist *evlist)
+{
+	if (evlist && evlist->env && evlist->env->arch)
+		return !strcmp(evlist->env->arch, "ppc64") ||
+			!strcmp(evlist->env->arch, "ppc64le");
+	return false;
+}
+
+u8 arch__get_cpumode(const union perf_event *event,
+		     struct perf_evsel *evsel,
+		     struct perf_sample *sample)
+{
+	unsigned long hv, pr, msr;
+	u8 cpumode = event->header.misc & PERF_RECORD_MISC_CPUMODE_MASK;
+
+	if (!(is_perf_data_reorded_on_ppc(evsel->evlist) &&
+	      perf_guest_only() &&
+	      is_kvmppc_exit_event(evsel)))
+		goto ret;
+
+	if (sample->raw_data && is_hv_dec_trap(evsel, sample)) {
+		msr = perf_evsel__intval(evsel, sample, "msr");
+		hv = msr & ((unsigned long)1 << (PPC_MAX - HV_BIT));
+		pr = msr & ((unsigned long)1 << (PPC_MAX - PR_BIT));
+
+		if (!hv && pr)
+			cpumode = PERF_RECORD_MISC_GUEST_USER;
+		else
+			cpumode = PERF_RECORD_MISC_GUEST_KERNEL;
+	}
+
+ret:
+	return cpumode;
+}
+
+u64 arch__get_ip(struct perf_evsel *evsel, struct perf_sample *sample)
+{
+	if (is_perf_data_reorded_on_ppc(evsel->evlist) &&
+	    perf_guest_only() &&
+	    is_kvmppc_exit_event(evsel) &&
+	    is_hv_dec_trap(evsel, sample))
+		return perf_evsel__intval(evsel, sample, "pc");
+
+	return sample->ip;
+}
+
 int perf_evsel__parse_sample(struct perf_evsel *evsel, union perf_event *event,
 			     struct perf_sample *data)
 {
@@ -1795,6 +1871,7 @@ int perf_evsel__parse_sample(struct perf_evsel *evsel, union perf_event *event,
 		OVERFLOW_CHECK(array, data->raw_size, max_size);
 		data->raw_data = (void *)array;
 		array = (void *)array + data->raw_size;
+		data->ip = arch__get_ip(evsel, data);
 	}
 
 	if (type & PERF_SAMPLE_BRANCH_STACK) {
diff --git a/tools/perf/util/evsel.h b/tools/perf/util/evsel.h
index efad78f..a5f5cb5 100644
--- a/tools/perf/util/evsel.h
+++ b/tools/perf/util/evsel.h
@@ -408,4 +408,11 @@ typedef int (*attr__fprintf_f)(FILE *, const char *, const char *, void *);
 int perf_event_attr__fprintf(FILE *fp, struct perf_event_attr *attr,
 			     attr__fprintf_f attr__fprintf, void *priv);
 
+u64 arch__get_ip(struct perf_evsel *evsel, struct perf_sample *sample);
+u8 arch__get_cpumode(const union perf_event *event, struct perf_evsel *evsel,
+		     struct perf_sample *sample);
+bool is_kvmppc_exit_event(struct perf_evsel *evsel);
+bool is_hv_dec_trap(struct perf_evsel *evsel, struct perf_sample *sample);
+bool is_perf_data_reorded_on_ppc(struct perf_evlist *evlist);
+
 #endif /* __PERF_EVSEL_H */
diff --git a/tools/perf/util/session.c b/tools/perf/util/session.c
index 40b7a0d..52beee8 100644
--- a/tools/perf/util/session.c
+++ b/tools/perf/util/session.c
@@ -1131,9 +1131,10 @@ static void dump_sample(struct perf_evsel *evsel, union perf_event *event,
 
 static struct machine *machines__find_for_cpumode(struct machines *machines,
 					       union perf_event *event,
-					       struct perf_sample *sample)
+					       struct perf_sample *sample,
+					       struct perf_evsel *evsel)
 {
-	const u8 cpumode = event->header.misc & PERF_RECORD_MISC_CPUMODE_MASK;
+	const u8 cpumode = arch__get_cpumode(event, evsel, sample);
 	struct machine *machine;
 
 	if (perf_guest &&
@@ -1237,7 +1238,7 @@ static int machines__deliver_event(struct machines *machines,
 
 	evsel = perf_evlist__id2evsel(evlist, sample->id);
 
-	machine = machines__find_for_cpumode(machines, event, sample);
+	machine = machines__find_for_cpumode(machines, event, sample, evsel);
 
 	switch (event->header.type) {
 	case PERF_RECORD_SAMPLE:
-- 
2.1.4

^ permalink raw reply related	[flat|nested] 18+ messages in thread

* [RFC 4/4] perf kvm: Fix output fields instead of 'trace' for perf kvm report on powerpc
  2016-02-24  9:07 [RFC 0/4] perf kvm: Guest Symbol Resolution for powerpc Ravi Bangoria
                   ` (2 preceding siblings ...)
  2016-02-24  9:07 ` [RFC 3/4] perf kvm: Enable 'report' on powerpc Ravi Bangoria
@ 2016-02-24  9:07 ` Ravi Bangoria
  2016-03-02 14:25   ` Arnaldo Carvalho de Melo
  3 siblings, 1 reply; 18+ messages in thread
From: Ravi Bangoria @ 2016-02-24  9:07 UTC (permalink / raw
  To: acme; +Cc: linux-kernel, hemant, naveen.n.rao, Ravi Bangoria

commit d49dadea7862 ("perf tools: Make 'trace' or 'trace_fields' sort key
default for tracepoint events") makes 'trace' sort key as a default
while displaying report for tracepoint.

Because tracepoint(kvm_hv:kvm_guest_exit) is used as a default event,
perf kvm report will display output as a list of tracepoint hits and
not with a normal report columns.

This patch will replace 'trace' field with 'overhead,comm,dso,sym' while
displaying perf kvm report of powerpc.

Before applying patch:

  $ ./perf kvm --guestkallsyms=guest.kallsyms --guestmodules=guest.modules report --stdio
  # To display the perf.data header info, please use --header/--header-only options.
  #
  #
  # Total Lost Samples: 0
  #
  # Samples: 181K of event 'kvm_hv:kvm_guest_exit'
  # Event count (approx.): 181061
  #
  # Overhead  Trace output
  # ........  .................................................................................
  #
       0.02%  VCPU 8: trap=HV_DECREMENTER pc=0xc000000000091924 msr=0x8000000000009032, ceded=0
       0.00%  VCPU 0: trap=HV_DECREMENTER pc=0xc000000000091924 msr=0x8000000000009032, ceded=0
       0.00%  VCPU 8: trap=HV_DECREMENTER pc=0x10005c7c msr=0x800000000280f032, ceded=0
       0.00%  VCPU 8: trap=HV_DECREMENTER pc=0x1001ef14 msr=0x800000000280f032, ceded=0
       0.00%  VCPU 8: trap=HV_DECREMENTER pc=0x3fff83398830 msr=0x800000000280f032, ceded=0
       0.00%  VCPU 8: trap=HV_DECREMENTER pc=0x3fff833a6fe4 msr=0x800000000280f032, ceded=0
       0.00%  VCPU 8: trap=HV_DECREMENTER pc=0x3fff833a7a64 msr=0x800000000280f032, ceded=0

After applying patch:

  $ ./perf kvm --guestkallsyms=guest.kallsyms --guestmodules=guest.modules report --stdio
  # To display the perf.data header info, please use --header/--header-only options.
  #
  #
  # Total Lost Samples: 0
  #
  # Samples: 181K of event 'kvm_hv:kvm_guest_exit'
  # Event count (approx.): 181061
  #
  # Overhead  Command  Shared Object            Symbol
  # ........  .......  .......................  ..............................
  #
       0.02%  :57276   [guest.kernel.kallsyms]  [g] .plpar_hcall_norets
       0.00%  :57274   [guest.kernel.kallsyms]  [g] .plpar_hcall_norets
       0.00%  :57276   [guest.kernel.kallsyms]  [g] .__copy_tofrom_user_power7
       0.00%  :57276   [guest.kernel.kallsyms]  [g] ._atomic_dec_and_lock
       0.00%  :57276   [guest.kernel.kallsyms]  [g] ._raw_spin_lock
       0.00%  :57276   [guest.kernel.kallsyms]  [g] ._switch
       0.00%  :57276   [guest.kernel.kallsyms]  [g] .bio_add_page
       0.00%  :57276   [guest.kernel.kallsyms]  [g] .kmem_cache_alloc

Signed-off-by: Ravi Bangoria <ravi.bangoria@linux.vnet.ibm.com>
---
 tools/perf/builtin-report.c | 5 +++++
 1 file changed, 5 insertions(+)

diff --git a/tools/perf/builtin-report.c b/tools/perf/builtin-report.c
index 31ec4ba..5d96882 100644
--- a/tools/perf/builtin-report.c
+++ b/tools/perf/builtin-report.c
@@ -930,6 +930,11 @@ repeat:
 	else
 		use_browser = 0;
 
+	if (!field_order &&
+	    is_perf_data_reorded_on_ppc(session->evlist) &&
+	    perf_guest_only())
+		field_order = "overhead,comm,dso,sym";
+
 	if (setup_sorting(session->evlist) < 0) {
 		if (sort_order)
 			parse_options_usage(report_usage, options, "s", 1);
-- 
2.1.4

^ permalink raw reply related	[flat|nested] 18+ messages in thread

* Re: [RFC 4/4] perf kvm: Fix output fields instead of 'trace' for perf kvm report on powerpc
  2016-02-24  9:07 ` [RFC 4/4] perf kvm: Fix output fields instead of 'trace' for perf kvm report " Ravi Bangoria
@ 2016-03-02 14:25   ` Arnaldo Carvalho de Melo
  2016-03-02 15:46     ` Ravi Bangoria
  0 siblings, 1 reply; 18+ messages in thread
From: Arnaldo Carvalho de Melo @ 2016-03-02 14:25 UTC (permalink / raw
  To: Ravi Bangoria; +Cc: linux-kernel, hemant, naveen.n.rao

Em Wed, Feb 24, 2016 at 02:37:45PM +0530, Ravi Bangoria escreveu:
> commit d49dadea7862 ("perf tools: Make 'trace' or 'trace_fields' sort key
> default for tracepoint events") makes 'trace' sort key as a default
> while displaying report for tracepoint.
> 
> Because tracepoint(kvm_hv:kvm_guest_exit) is used as a default event,
> perf kvm report will display output as a list of tracepoint hits and
> not with a normal report columns.
> 
> This patch will replace 'trace' field with 'overhead,comm,dso,sym' while
> displaying perf kvm report of powerpc.
> 
> Before applying patch:
> 
>   $ ./perf kvm --guestkallsyms=guest.kallsyms --guestmodules=guest.modules report --stdio
>   # To display the perf.data header info, please use --header/--header-only options.
>   #
>   #
>   # Total Lost Samples: 0
>   #
>   # Samples: 181K of event 'kvm_hv:kvm_guest_exit'
>   # Event count (approx.): 181061
>   #
>   # Overhead  Trace output
>   # ........  .................................................................................
>   #
>        0.02%  VCPU 8: trap=HV_DECREMENTER pc=0xc000000000091924 msr=0x8000000000009032, ceded=0
>        0.00%  VCPU 0: trap=HV_DECREMENTER pc=0xc000000000091924 msr=0x8000000000009032, ceded=0
>        0.00%  VCPU 8: trap=HV_DECREMENTER pc=0x10005c7c msr=0x800000000280f032, ceded=0
>        0.00%  VCPU 8: trap=HV_DECREMENTER pc=0x1001ef14 msr=0x800000000280f032, ceded=0
>        0.00%  VCPU 8: trap=HV_DECREMENTER pc=0x3fff83398830 msr=0x800000000280f032, ceded=0
>        0.00%  VCPU 8: trap=HV_DECREMENTER pc=0x3fff833a6fe4 msr=0x800000000280f032, ceded=0
>        0.00%  VCPU 8: trap=HV_DECREMENTER pc=0x3fff833a7a64 msr=0x800000000280f032, ceded=0
> 
> After applying patch:
> 
>   $ ./perf kvm --guestkallsyms=guest.kallsyms --guestmodules=guest.modules report --stdio
>   # To display the perf.data header info, please use --header/--header-only options.
>   #
>   #
>   # Total Lost Samples: 0
>   #
>   # Samples: 181K of event 'kvm_hv:kvm_guest_exit'
>   # Event count (approx.): 181061
>   #
>   # Overhead  Command  Shared Object            Symbol
>   # ........  .......  .......................  ..............................
>   #
>        0.02%  :57276   [guest.kernel.kallsyms]  [g] .plpar_hcall_norets
>        0.00%  :57274   [guest.kernel.kallsyms]  [g] .plpar_hcall_norets
>        0.00%  :57276   [guest.kernel.kallsyms]  [g] .__copy_tofrom_user_power7
>        0.00%  :57276   [guest.kernel.kallsyms]  [g] ._atomic_dec_and_lock
>        0.00%  :57276   [guest.kernel.kallsyms]  [g] ._raw_spin_lock
>        0.00%  :57276   [guest.kernel.kallsyms]  [g] ._switch
>        0.00%  :57276   [guest.kernel.kallsyms]  [g] .bio_add_page
>        0.00%  :57276   [guest.kernel.kallsyms]  [g] .kmem_cache_alloc
> 
> Signed-off-by: Ravi Bangoria <ravi.bangoria@linux.vnet.ibm.com>
> ---
>  tools/perf/builtin-report.c | 5 +++++
>  1 file changed, 5 insertions(+)
> 
> diff --git a/tools/perf/builtin-report.c b/tools/perf/builtin-report.c
> index 31ec4ba..5d96882 100644
> --- a/tools/perf/builtin-report.c
> +++ b/tools/perf/builtin-report.c
> @@ -930,6 +930,11 @@ repeat:
>  	else
>  		use_browser = 0;
>  
> +	if (!field_order &&
> +	    is_perf_data_reorded_on_ppc(session->evlist) &&
> +	    perf_guest_only())
> +		field_order = "overhead,comm,dso,sym";
> +

Can you please do it as:

__weak void arch__override_field_order(struct perf_evlist *evlist, const char **field_order)
{
}

This way we don't see any arch specific stuff in the tool, also I
haven't seen any doc update, are you sure nothing needs to be added to
tools/perf/Documentaton/ for any of these patches?

I think this needs to be documented further, probably in
tools/perf/design.txt too?

>  	if (setup_sorting(session->evlist) < 0) {
>  		if (sort_order)
>  			parse_options_usage(report_usage, options, "s", 1);
> -- 
> 2.1.4

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [RFC 4/4] perf kvm: Fix output fields instead of 'trace' for perf kvm report on powerpc
  2016-03-02 14:25   ` Arnaldo Carvalho de Melo
@ 2016-03-02 15:46     ` Ravi Bangoria
  2016-03-02 16:22       ` Arnaldo Carvalho de Melo
  0 siblings, 1 reply; 18+ messages in thread
From: Ravi Bangoria @ 2016-03-02 15:46 UTC (permalink / raw
  To: Arnaldo Carvalho de Melo; +Cc: linux-kernel, hemant, naveen.n.rao

Thanks Arnaldo,

Please find my comments.

On Wednesday 02 March 2016 07:55 PM, Arnaldo Carvalho de Melo wrote:
> Em Wed, Feb 24, 2016 at 02:37:45PM +0530, Ravi Bangoria escreveu:
>>   		use_browser = 0;
>>   
>> +	if (!field_order &&
>> +	    is_perf_data_reorded_on_ppc(session->evlist) &&
>> +	    perf_guest_only())
>> +		field_order = "overhead,comm,dso,sym";
>> +
> Can you please do it as:
>
> __weak void arch__override_field_order(struct perf_evlist *evlist, const char **field_order)
> {
> }

So you mean like this - Just implement only weak function and move code 
into it?
ie. No strong implementation at this point of time.

Like,

__weak void arch__override_field_order(struct perf_evlist *evlist, const 
char **f_order)
{
     if (!field_order &&
         is_perf_data_reorded_on_ppc(session->evlist) &&
         perf_guest_only())
             *field_order = "overhead,comm,dso,sym";
}

Then I can do that.

But if you are proposing to implement a strong function and move this code
into in, then we won't be able to enable cross arch reporting.

>
> This way we don't see any arch specific stuff in the tool, also I
> haven't seen any doc update, are you sure nothing needs to be added to
> tools/perf/Documentaton/ for any of these patches?
>
> I think this needs to be documented further, probably in
> tools/perf/design.txt too?

Yes, I'll do this in next version.

Regards,
Ravi

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [RFC 4/4] perf kvm: Fix output fields instead of 'trace' for perf kvm report on powerpc
  2016-03-02 15:46     ` Ravi Bangoria
@ 2016-03-02 16:22       ` Arnaldo Carvalho de Melo
  2016-03-03  1:19         ` Ravi Bangoria
  0 siblings, 1 reply; 18+ messages in thread
From: Arnaldo Carvalho de Melo @ 2016-03-02 16:22 UTC (permalink / raw
  To: Ravi Bangoria; +Cc: linux-kernel, hemant, naveen.n.rao

Em Wed, Mar 02, 2016 at 09:16:48PM +0530, Ravi Bangoria escreveu:
> Thanks Arnaldo,
> 
> Please find my comments.
> 
> On Wednesday 02 March 2016 07:55 PM, Arnaldo Carvalho de Melo wrote:
> >Em Wed, Feb 24, 2016 at 02:37:45PM +0530, Ravi Bangoria escreveu:
> >>  		use_browser = 0;
> >>+	if (!field_order &&
> >>+	    is_perf_data_reorded_on_ppc(session->evlist) &&
> >>+	    perf_guest_only())
> >>+		field_order = "overhead,comm,dso,sym";
> >>+
> >Can you please do it as:
> >
> >__weak void arch__override_field_order(struct perf_evlist *evlist, const char **field_order)
> >{
> >}
> 
> So you mean like this - Just implement only weak function and move code into
> it?
> ie. No strong implementation at this point of time.
> 
> Like,
> 
> __weak void arch__override_field_order(struct perf_evlist *evlist, const
> char **f_order)
> {
>     if (!field_order &&
>         is_perf_data_reorded_on_ppc(session->evlist) &&

Oh, I see, ugh, when running on x86_64 we wouldn't use this, so we need
to have per arch default field orders, now I have to recall why is it
that we need this per-arch field order :-\

- Arnaldo

>         perf_guest_only())
>             *field_order = "overhead,comm,dso,sym";
> }
> 
> Then I can do that.
> 
> But if you are proposing to implement a strong function and move this code
> into in, then we won't be able to enable cross arch reporting.
> 
> >
> >This way we don't see any arch specific stuff in the tool, also I
> >haven't seen any doc update, are you sure nothing needs to be added to
> >tools/perf/Documentaton/ for any of these patches?
> >
> >I think this needs to be documented further, probably in
> >tools/perf/design.txt too?
> 
> Yes, I'll do this in next version.
> 
> Regards,
> Ravi

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [RFC 4/4] perf kvm: Fix output fields instead of 'trace' for perf kvm report on powerpc
  2016-03-02 16:22       ` Arnaldo Carvalho de Melo
@ 2016-03-03  1:19         ` Ravi Bangoria
  2016-03-08 15:42           ` Ravi Bangoria
  0 siblings, 1 reply; 18+ messages in thread
From: Ravi Bangoria @ 2016-03-03  1:19 UTC (permalink / raw
  To: Arnaldo Carvalho de Melo; +Cc: linux-kernel, hemant, naveen.n.rao

Thanks acme,

On Wednesday 02 March 2016 09:52 PM, Arnaldo Carvalho de Melo wrote:
> Em Wed, Mar 02, 2016 at 09:16:48PM +0530, Ravi Bangoria escreveu:
>> Thanks Arnaldo,
>>
>> Please find my comments.
>>
>> On Wednesday 02 March 2016 07:55 PM, Arnaldo Carvalho de Melo wrote:
>>> Em Wed, Feb 24, 2016 at 02:37:45PM +0530, Ravi Bangoria escreveu:
>>>>   		use_browser = 0;
>>>> +	if (!field_order &&
>>>> +	    is_perf_data_reorded_on_ppc(session->evlist) &&
>>>> +	    perf_guest_only())
>>>> +		field_order = "overhead,comm,dso,sym";
>>>> +
>>> Can you please do it as:
>>>
>>> __weak void arch__override_field_order(struct perf_evlist *evlist, const char **field_order)
>>> {
>>> }
>> So you mean like this - Just implement only weak function and move code into
>> it?
>> ie. No strong implementation at this point of time.
>>
>> Like,
>>
>> __weak void arch__override_field_order(struct perf_evlist *evlist, const
>> char **f_order)
>> {
>>      if (!field_order &&
>>          is_perf_data_reorded_on_ppc(session->evlist) &&
> Oh, I see, ugh, when running on x86_64 we wouldn't use this, so we need
> to have per arch default field orders, now I have to recall why is it
> that we need this per-arch field order :-\

Sorry, I'm little bit confused. We need arch specific functionality present
on all arch to make cross arch reporting possible.

for example, record perf.data on ppc and report on x86, we need
ppc specific function present in perf binary compiled on x86.

Please let me know if I understood it wrong.

Regads,
Ravi

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [RFC 4/4] perf kvm: Fix output fields instead of 'trace' for perf kvm report on powerpc
  2016-03-03  1:19         ` Ravi Bangoria
@ 2016-03-08 15:42           ` Ravi Bangoria
  0 siblings, 0 replies; 18+ messages in thread
From: Ravi Bangoria @ 2016-03-08 15:42 UTC (permalink / raw
  To: Arnaldo Carvalho de Melo; +Cc: linux-kernel, hemant, naveen.n.rao

Hi Arnaldo,

Gentle reminder :)  Any updates?

Regards,
Ravi

On Thursday 03 March 2016 06:49 AM, Ravi Bangoria wrote:
> Thanks acme,
>
> On Wednesday 02 March 2016 09:52 PM, Arnaldo Carvalho de Melo wrote:
>> Em Wed, Mar 02, 2016 at 09:16:48PM +0530, Ravi Bangoria escreveu:
>>> Thanks Arnaldo,
>>>
>>> Please find my comments.
>>>
>>> On Wednesday 02 March 2016 07:55 PM, Arnaldo Carvalho de Melo wrote:
>>>> Em Wed, Feb 24, 2016 at 02:37:45PM +0530, Ravi Bangoria escreveu:
>>>>>           use_browser = 0;
>>>>> +    if (!field_order &&
>>>>> +        is_perf_data_reorded_on_ppc(session->evlist) &&
>>>>> +        perf_guest_only())
>>>>> +        field_order = "overhead,comm,dso,sym";
>>>>> +
>>>> Can you please do it as:
>>>>
>>>> __weak void arch__override_field_order(struct perf_evlist *evlist, 
>>>> const char **field_order)
>>>> {
>>>> }
>>> So you mean like this - Just implement only weak function and move 
>>> code into
>>> it?
>>> ie. No strong implementation at this point of time.
>>>
>>> Like,
>>>
>>> __weak void arch__override_field_order(struct perf_evlist *evlist, 
>>> const
>>> char **f_order)
>>> {
>>>      if (!field_order &&
>>>          is_perf_data_reorded_on_ppc(session->evlist) &&
>> Oh, I see, ugh, when running on x86_64 we wouldn't use this, so we need
>> to have per arch default field orders, now I have to recall why is it
>> that we need this per-arch field order :-\
>
> Sorry, I'm little bit confused. We need arch specific functionality 
> present
> on all arch to make cross arch reporting possible.
>
> for example, record perf.data on ppc and report on x86, we need
> ppc specific function present in perf binary compiled on x86.
>
> Please let me know if I understood it wrong.
>
> Regads,
> Ravi
>

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [RFC 1/4] perf kvm: Enable 'record' on powerpc
  2016-02-24  9:07 ` [RFC 1/4] perf kvm: Enable 'record' on powerpc Ravi Bangoria
@ 2016-03-22 19:12   ` Arnaldo Carvalho de Melo
  2016-03-23  2:19     ` Arnaldo Carvalho de Melo
  0 siblings, 1 reply; 18+ messages in thread
From: Arnaldo Carvalho de Melo @ 2016-03-22 19:12 UTC (permalink / raw
  To: Ravi Bangoria; +Cc: linux-kernel, hemant, naveen.n.rao

Em Wed, Feb 24, 2016 at 02:37:42PM +0530, Ravi Bangoria escreveu:
> 'perf kvm record' is not available on powerpc because 'perf' relies on
> the 'cycles' event (a PMU event) to profile the guest. However, for
> powerpc, this can't be used from the host because the PMUs are controlled
> by the guest rather than the host.
> 
> There exists a tracepoint 'kvm_hv:kvm_guest_exit' in powerpc which is
> hit whenever any of the threads exit the guest context. The guest
> instruction pointer dumped along with this tracepoint data in the field
> 'pc', can be used as guest instruction pointer.
> 
> This patch changes default event as kvm_hv:kvm_guest_exit for recording
> guest data in host on powerpc. As we are using host event to record guest
> data, this approach will enable only --guest option of 'perf kvm'. Still
> --host --guest together won't work.

It should, i.e. --host --guest should translate to:

   -e cycles:H,kvm_hv:kvm_guest_exit

I.e. both collect cycles only in the host, and also the tracepoint that
will allow us to get the guest approximation for the unavailable cycles
event, no?

I'm putting the infrastructure work needed for this the perf/cpumode
branch. More work will be put there soon.

I didn't like the fact that this was touching the common code too much,
so I'm providing infrastructure for the evsel to be used to obtain the
cpumode, but trying to constrain the kvm specific bits.

I.e. we should not touch perf_evlist__add_default, but instead replace
"cycles" with what is appropriate for 'perf kvm record' when that event
is specified, be it explicitely or not.

Doing it this way I _think_ we'll be able to remove the limitation
stated on your last paragraph above ("Still --host --guest together
won't work").

I.e. this, but automatically:

[root@jouet ~]# perf kvm record -a -e cycles:H,kvm:kvm_exit
^C[ perf record: Woken up 1 times to write data ]
[ perf record: Captured and wrote 1.428 MB perf.data.guest (1644
samples) ]

[root@jouet ~]# 
[root@jouet ~]# perf evlist -i perf.data.guest
cycles:H
kvm:kvm_exit
# Tip: use 'perf evlist --trace-fields' to show fields for tracepoint
# events
[root@jouet ~]#

If we look at the verbose output:

[root@jouet ~]# perf evlist -v -i perf.data.guest
cycles:H: size: 112, { sample_period, sample_freq }: 4000, sample_type:
IP|TID|TIME|CPU|PERIOD|IDENTIFIER, read_format: ID, disabled: 1,
inherit: 1, mmap: 1, comm: 1, freq: 1, task: 1, sample_id_all: 1,
exclude_guest: 1, mmap2: 1, comm_exec: 1
kvm:kvm_exit: type: 2, size: 112, config: 0x596, { sample_period,
sample_freq }: 1, sample_type: IP|TID|TIME|CPU|PERIOD|RAW|IDENTIFIER,
read_format: ID, disabled: 1, inherit: 1, sample_id_all: 1,
exclude_host: 1
# Tip: use 'perf evlist --trace-fields' to show fields for tracepoint
# events
[root@jouet ~]#

----------------------------------------------------------------------------

Which works as well with:

[root@jouet ~]# perf kvm --guest --host record -a -e cycles:H,kvm:kvm_exit
^C[ perf record: Woken up 1 times to write data ]
[ perf record: Captured and wrote 1.387 MB perf.data.kvm (1090 samples)
]

[root@jouet ~]# perf evlist -v -i perf.data.kvm
cycles:H: size: 112, { sample_period, sample_freq }: 4000, sample_type:
IP|TID|TIME|CPU|PERIOD|IDENTIFIER, read_format: ID, disabled: 1,
inherit: 1, mmap: 1, comm: 1, freq: 1, task: 1, sample_id_all: 1,
exclude_guest: 1, mmap2: 1, comm_exec: 1
kvm:kvm_exit: type: 2, size: 112, config: 0x596, { sample_period,
sample_freq }: 1, sample_type: IP|TID|TIME|CPU|PERIOD|RAW|IDENTIFIER,
read_format: ID, disabled: 1, inherit: 1, sample_id_all: 1
# Tip: use 'perf evlist --trace-fields' to show fields for tracepoint
# events
[root@jouet ~]# 

----------------------------------------------------------------------------

Comments?

- Arnaldo

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [RFC 1/4] perf kvm: Enable 'record' on powerpc
  2016-03-22 19:12   ` Arnaldo Carvalho de Melo
@ 2016-03-23  2:19     ` Arnaldo Carvalho de Melo
  2016-03-24 21:15       ` Arnaldo Carvalho de Melo
  0 siblings, 1 reply; 18+ messages in thread
From: Arnaldo Carvalho de Melo @ 2016-03-23  2:19 UTC (permalink / raw
  To: Ravi Bangoria; +Cc: linux-kernel, hemant, naveen.n.rao

Em Tue, Mar 22, 2016 at 04:12:11PM -0300, Arnaldo Carvalho de Melo escreveu:
> Em Wed, Feb 24, 2016 at 02:37:42PM +0530, Ravi Bangoria escreveu:
> > 'perf kvm record' is not available on powerpc because 'perf' relies on
> > the 'cycles' event (a PMU event) to profile the guest. However, for
> > powerpc, this can't be used from the host because the PMUs are controlled
> > by the guest rather than the host.
> > 
> > There exists a tracepoint 'kvm_hv:kvm_guest_exit' in powerpc which is
> > hit whenever any of the threads exit the guest context. The guest
> > instruction pointer dumped along with this tracepoint data in the field
> > 'pc', can be used as guest instruction pointer.
> > 
> > This patch changes default event as kvm_hv:kvm_guest_exit for recording
> > guest data in host on powerpc. As we are using host event to record guest
> > data, this approach will enable only --guest option of 'perf kvm'. Still
> > --host --guest together won't work.
> 
> It should, i.e. --host --guest should translate to:
> 
>    -e cycles:H,kvm_hv:kvm_guest_exit
> 
> I.e. both collect cycles only in the host, and also the tracepoint that
> will allow us to get the guest approximation for the unavailable cycles
> event, no?
> 
> I'm putting the infrastructure work needed for this the perf/cpumode
> branch. More work will be put there soon.

So I took a different path and made perf_evsel__parse_sample set a new
perf_sample.cpumode field, this way we'll end up having just to set a
per-evsel ->post_parse_sample() callback for the event that replaces
"cycles" for PPC guests where we'll just set data->ip and data->cpumode,
the rest of the code remains unchanged.

The changes I made looks useful in itself, as, IIRC more code was
removed than added.

I'll continue tomorrow and will test with the kvm:kvm_exit on x86_64 for
testing, that has:

[root@jouet ~]# perf kvm --guest record -a -e cycles:H,kvm:kvm_exit
sleep 1
[ perf record: Woken up 1 times to write data ]
[ perf record: Captured and wrote 2.410 MB perf.data.guest (16076
samples) ]
[root@jouet ~]# perf evlist -i perf.data.guest --trace-fields
cycles:H: (not a tracepoint)
kvm:kvm_exit: trace_fields: exit_reason,guest_rip,isa,info1,info2
[root@jouet ~]#

Enough for me to test this code without requiring access to a PPC64
machine with a guest.

The first approach, more in line with your latest patchkit is at
perf/cpumode.v1, the one I'm working now is at the perf/cpumode branch
in my git tree at:

https://git.kernel.org/cgit/linux/kernel/git/acme/linux.git/log/?h=perf/cpumode

- Arnaldo

> I didn't like the fact that this was touching the common code too much,
> so I'm providing infrastructure for the evsel to be used to obtain the
> cpumode, but trying to constrain the kvm specific bits.
> 
> I.e. we should not touch perf_evlist__add_default, but instead replace
> "cycles" with what is appropriate for 'perf kvm record' when that event
> is specified, be it explicitely or not.
> 
> Doing it this way I _think_ we'll be able to remove the limitation
> stated on your last paragraph above ("Still --host --guest together
> won't work").
> 
> I.e. this, but automatically:
> 
> [root@jouet ~]# perf kvm record -a -e cycles:H,kvm:kvm_exit
> ^C[ perf record: Woken up 1 times to write data ]
> [ perf record: Captured and wrote 1.428 MB perf.data.guest (1644
> samples) ]
> 
> [root@jouet ~]# 
> [root@jouet ~]# perf evlist -i perf.data.guest
> cycles:H
> kvm:kvm_exit
> # Tip: use 'perf evlist --trace-fields' to show fields for tracepoint
> # events
> [root@jouet ~]#
> 
> If we look at the verbose output:
> 
> [root@jouet ~]# perf evlist -v -i perf.data.guest
> cycles:H: size: 112, { sample_period, sample_freq }: 4000, sample_type:
> IP|TID|TIME|CPU|PERIOD|IDENTIFIER, read_format: ID, disabled: 1,
> inherit: 1, mmap: 1, comm: 1, freq: 1, task: 1, sample_id_all: 1,
> exclude_guest: 1, mmap2: 1, comm_exec: 1
> kvm:kvm_exit: type: 2, size: 112, config: 0x596, { sample_period,
> sample_freq }: 1, sample_type: IP|TID|TIME|CPU|PERIOD|RAW|IDENTIFIER,
> read_format: ID, disabled: 1, inherit: 1, sample_id_all: 1,
> exclude_host: 1
> # Tip: use 'perf evlist --trace-fields' to show fields for tracepoint
> # events
> [root@jouet ~]#
> 
> ----------------------------------------------------------------------------
> 
> Which works as well with:
> 
> [root@jouet ~]# perf kvm --guest --host record -a -e cycles:H,kvm:kvm_exit
> ^C[ perf record: Woken up 1 times to write data ]
> [ perf record: Captured and wrote 1.387 MB perf.data.kvm (1090 samples)
> ]
> 
> [root@jouet ~]# perf evlist -v -i perf.data.kvm
> cycles:H: size: 112, { sample_period, sample_freq }: 4000, sample_type:
> IP|TID|TIME|CPU|PERIOD|IDENTIFIER, read_format: ID, disabled: 1,
> inherit: 1, mmap: 1, comm: 1, freq: 1, task: 1, sample_id_all: 1,
> exclude_guest: 1, mmap2: 1, comm_exec: 1
> kvm:kvm_exit: type: 2, size: 112, config: 0x596, { sample_period,
> sample_freq }: 1, sample_type: IP|TID|TIME|CPU|PERIOD|RAW|IDENTIFIER,
> read_format: ID, disabled: 1, inherit: 1, sample_id_all: 1
> # Tip: use 'perf evlist --trace-fields' to show fields for tracepoint
> # events
> [root@jouet ~]# 
> 
> ----------------------------------------------------------------------------
> 
> Comments?
> 
> - Arnaldo

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [RFC 1/4] perf kvm: Enable 'record' on powerpc
  2016-03-23  2:19     ` Arnaldo Carvalho de Melo
@ 2016-03-24 21:15       ` Arnaldo Carvalho de Melo
  2016-03-28 10:58         ` Ravi Bangoria
  2016-04-27 12:32         ` Ravi Bangoria
  0 siblings, 2 replies; 18+ messages in thread
From: Arnaldo Carvalho de Melo @ 2016-03-24 21:15 UTC (permalink / raw
  To: Ravi Bangoria; +Cc: linux-kernel, hemant, naveen.n.rao

Em Tue, Mar 22, 2016 at 11:19:21PM -0300, Arnaldo Carvalho de Melo escreveu:
> Em Tue, Mar 22, 2016 at 04:12:11PM -0300, Arnaldo Carvalho de Melo escreveu:
> > Em Wed, Feb 24, 2016 at 02:37:42PM +0530, Ravi Bangoria escreveu:
> > > 'perf kvm record' is not available on powerpc because 'perf' relies on
> > > the 'cycles' event (a PMU event) to profile the guest. However, for
> > > powerpc, this can't be used from the host because the PMUs are controlled
> > > by the guest rather than the host.
> > > 
> > > There exists a tracepoint 'kvm_hv:kvm_guest_exit' in powerpc which is
> > > hit whenever any of the threads exit the guest context. The guest
> > > instruction pointer dumped along with this tracepoint data in the field
> > > 'pc', can be used as guest instruction pointer.
> > > 
> > > This patch changes default event as kvm_hv:kvm_guest_exit for recording
> > > guest data in host on powerpc. As we are using host event to record guest
> > > data, this approach will enable only --guest option of 'perf kvm'. Still
> > > --host --guest together won't work.
> > 
> > It should, i.e. --host --guest should translate to:
> > 
> >    -e cycles:H,kvm_hv:kvm_guest_exit
> > 
> > I.e. both collect cycles only in the host, and also the tracepoint that
> > will allow us to get the guest approximation for the unavailable cycles
> > event, no?
> > 
> > I'm putting the infrastructure work needed for this the perf/cpumode
> > branch. More work will be put there soon.
> 
> So I took a different path and made perf_evsel__parse_sample set a new
> perf_sample.cpumode field, this way we'll end up having just to set a
> per-evsel ->post_parse_sample() callback for the event that replaces
> "cycles" for PPC guests where we'll just set data->ip and data->cpumode,
> the rest of the code remains unchanged.
> 
> The changes I made looks useful in itself, as, IIRC more code was
> removed than added.
> 
> I'll continue tomorrow and will test with the kvm:kvm_exit on x86_64 for
> testing, that has:

Ok, so the infrastructure got merged already and from there the next
steps are in running with:

 perf kvm --guest record -a -e cycles:H,kvm:kvm_exit

And then, with the patch below applied, try:

perf kvm --guestkallsyms kallsyms.guest --guestmodules modules.guest report -i perf.data.guest --munge-ppc-guest-sample kvm:kvm_exit

I'm almost there, it is still not resolving to the kernel DSO, etc, so I
get:

Samples: 1K of event 'kvm:kvm_exit', Event count (approx.): 1924
Overhead  Command  Shared Object     Symbol
  34.77%  :5343    [unknown]         [g] 0xffffffff81043158
  16.84%  :5345    [unknown]         [g] 0xffffffff813f3d5a
  16.84%  :5345    [unknown]         [g] 0xffffffff813f43ec
  13.83%  :5345    [unknown]         [g] 0xffffffff81043158
   9.62%  :5343    [unknown]         [g] 0xffffffff8104301a
   3.79%  :5345    [unknown]         [g] 0xffffffff8104301a
   1.77%  :5345    [unknown]         [u] 0x0000003ae6c75dc9
   0.52%  :5343    [unknown]         [g] 0xffffffff812a29b1
   0.16%  :5343    [unknown]         [g] 0xffffffff8100bc00
   0.10%  :5343    [unknown]         [g] 0xffffffff8104315a
   0.10%  :5343    [unknown]         [g] 0xffffffff8106306f
   0.10%  :5343    [unknown]         [g] 0xffffffff8153b7fc
   0.10%  :5345    [unknown]         [g] 0xffffffff8106306f
   0.05%  :5343    [unknown]         [g] 0xffffffff8100b720

[root@jouet ~]# cat /proc/*/task/5343/comm
qemu-system-x86
[root@jouet ~]#

The patch does several of the things you did per sample, but only right after
opening the perf.data file, and I'll break it down in multiple patches, this is
just a heads up, please review it if you have the time, in the end we should
have a mechanism useful not just for PPC and that affects just 'perf kvm' in
this specific case.

- Arnaldo

diff --git a/tools/perf/builtin-kvm.c b/tools/perf/builtin-kvm.c
index bff666458b28..b7b6527446f8 100644
--- a/tools/perf/builtin-kvm.c
+++ b/tools/perf/builtin-kvm.c
@@ -1480,6 +1480,86 @@ perf_stat:
 }
 #endif /* HAVE_KVM_STAT_SUPPORT */
 
+#ifdef __powerpc__
+#define PPC_HV_DECREMENTER 2432
+#define PPC_HV_BIT 3
+#define PPC_PR_BIT 49
+#define PPC_MAX 63
+
+static bool perf_sample__is_hv_dec_trap(struct perf_sample *sample, struct perf_evsel *evsel)
+{
+	int trap = perf_evsel__intval(evsel, sample, "trap");
+	return trap == PPC_HV_DECREMENTER;
+}
+
+static void perf_kvm__munge_ppc_guest_sample(struct perf_evsel *evsel, struct perf_sample *sample)
+{
+	unsigned long msr, hv, pr;
+
+	if (!perf_sample__is_hv_dec_trap(sample, evsel))
+		return;
+
+	sample->ip = perf_evsel__intval(evsel, sample, "pc");
+	sample->cpumode = PERF_RECORD_MISC_GUEST_KERNEL;
+
+	msr = perf_evsel__intval(evsel, sample, "msr");
+	hv = msr & ((unsigned long)1 << (PPC_MAX - PPC_HV_BIT));
+	pr = msr & ((unsigned long)1 << (PPC_MAX - PPC_PR_BIT));
+	if (!hv && pr)
+		sample->cpumode = PERF_RECORD_MISC_GUEST_USER;
+}
+
+static bool perf_evlist__recorded_on_ppc(const struct perf_evlist *evlist)
+{
+	if (evlist->env && evlist->env->arch) {
+		return !strcmp(evlist->env->arch, "ppc64") ||
+		       !strcmp(evlist->env->arch, "ppc64le");
+	}
+	return false;
+}
+#elif defined(__i386__) || defined(__x86_64__)
+/*
+ * For testing with kvm:kvm_exit
+ */
+static void perf_kvm__munge_ppc_guest_sample(struct perf_evsel *evsel, struct perf_sample *sample)
+{
+	sample->ip = perf_evsel__intval(evsel, sample, "guest_rip");
+	sample->cpumode = PERF_RECORD_MISC_GUEST_KERNEL;
+
+	if (sample->ip < 0xffffffff00000000)
+		sample->cpumode = PERF_RECORD_MISC_GUEST_USER;
+
+	if (0) {
+		fprintf(stderr, "%s: %s, cpumode=%d ip=%" PRIx64 "\n",
+			__func__, perf_evsel__name(evsel), sample->cpumode, sample->ip);
+	}
+}
+
+static bool perf_evlist__recorded_on_ppc(const struct perf_evlist *evlist __maybe_unused)
+{
+	return true;
+}
+#endif
+
+int perf_kvm__setup_munge_ppc_guest_event(struct perf_evlist *evlist, const char *name)
+{
+	struct perf_evsel *evsel;
+
+	if (!perf_evlist__recorded_on_ppc(evlist))
+		return 0;
+
+	evsel = perf_evlist__find_tracepoint_by_name(evlist, name);
+	if (evsel == NULL)
+		return -1;
+
+	if (0)
+		fprintf(stderr, "%s: %s\n", __func__, perf_evsel__name(evsel));
+
+	evsel->munge_sample = perf_kvm__munge_ppc_guest_sample;
+
+	return 0;
+}
+
 static int __cmd_record(const char *file_name, int argc, const char **argv)
 {
 	int rec_argc, i = 0, j;
diff --git a/tools/perf/builtin-report.c b/tools/perf/builtin-report.c
index 160ea23b45aa..ba8786d12423 100644
--- a/tools/perf/builtin-report.c
+++ b/tools/perf/builtin-report.c
@@ -667,6 +667,7 @@ int cmd_report(int argc, const char **argv, const char *prefix __maybe_unused)
 	int branch_mode = -1;
 	bool branch_call_mode = false;
 	char callchain_default_opt[] = CALLCHAIN_DEFAULT_OPT;
+	const char *munge_ppc_guest_event = NULL;
 	const char * const report_usage[] = {
 		"perf report [<options>]",
 		NULL
@@ -814,6 +815,8 @@ int cmd_report(int argc, const char **argv, const char *prefix __maybe_unused)
 		    "Show raw trace event output (do not use print fmt or plugins)"),
 	OPT_BOOLEAN(0, "hierarchy", &symbol_conf.report_hierarchy,
 		    "Show entries in a hierarchy"),
+	OPT_STRING(0, "munge-ppc-guest-sample", &munge_ppc_guest_event, "event name",
+		   "Use a different event to simulate guest cycles, not accessible on PPC hosts"),
 	OPT_END()
 	};
 	struct perf_data_file file = {
@@ -880,6 +883,12 @@ repeat:
 	if (session == NULL)
 		return -1;
 
+	if (munge_ppc_guest_event &&
+	    perf_kvm__setup_munge_ppc_guest_event(session->evlist, munge_ppc_guest_event)) {
+		pr_err("PPC event not present in %s file\n", input_name);
+		goto error;
+	}
+
 	if (report.queue_size) {
 		ordered_events__set_alloc_size(&session->ordered_events,
 					       report.queue_size);
diff --git a/tools/perf/util/evsel.c b/tools/perf/util/evsel.c
index 738ce226002b..1665171a6e9c 100644
--- a/tools/perf/util/evsel.c
+++ b/tools/perf/util/evsel.c
@@ -216,6 +216,7 @@ void perf_evsel__init(struct perf_evsel *evsel,
 	evsel->sample_size = __perf_evsel__sample_size(attr->sample_type);
 	perf_evsel__calc_id_pos(evsel);
 	evsel->cmdline_group_boundary = false;
+	evsel->munge_sample = NULL;
 }
 
 struct perf_evsel *perf_evsel__new_idx(struct perf_event_attr *attr, int idx)
@@ -1887,6 +1888,9 @@ int perf_evsel__parse_sample(struct perf_evsel *evsel, union perf_event *event,
 		}
 	}
 
+	if (evsel->munge_sample != NULL)
+		evsel->munge_sample(evsel, data);
+
 	return 0;
 }
 
diff --git a/tools/perf/util/evsel.h b/tools/perf/util/evsel.h
index 501ea6e565f1..46379451f641 100644
--- a/tools/perf/util/evsel.h
+++ b/tools/perf/util/evsel.h
@@ -12,6 +12,7 @@
 #include "counts.h"
 
 struct perf_evsel;
+struct perf_sample;
 
 /*
  * Per fd, to map back from PERF_SAMPLE_ID to evsel, only used when there are
@@ -92,6 +93,8 @@ struct perf_evsel {
 	double			scale;
 	const char		*unit;
 	struct event_format	*tp_format;
+	void			(*munge_sample)(struct perf_evsel *evsel,
+						struct perf_sample *sample);
 	off_t			id_offset;
 	void			*priv;
 	u64			db_id;
@@ -236,8 +239,6 @@ int perf_evsel__open(struct perf_evsel *evsel, struct cpu_map *cpus,
 		     struct thread_map *threads);
 void perf_evsel__close(struct perf_evsel *evsel, int ncpus, int nthreads);
 
-struct perf_sample;
-
 void *perf_evsel__rawptr(struct perf_evsel *evsel, struct perf_sample *sample,
 			 const char *name);
 u64 perf_evsel__intval(struct perf_evsel *evsel, struct perf_sample *sample,
diff --git a/tools/perf/util/util.h b/tools/perf/util/util.h
index 8298d607c738..12bc51450d56 100644
--- a/tools/perf/util/util.h
+++ b/tools/perf/util/util.h
@@ -356,4 +356,9 @@ typedef void (*print_binary_t)(enum binary_printer_ops,
 void print_binary(unsigned char *data, size_t len,
 		  size_t bytes_per_line, print_binary_t printer,
 		  void *extra);
+
+struct perf_evlist;
+
+int perf_kvm__setup_munge_ppc_guest_event(struct perf_evlist *evlist, const char *name);
+
 #endif /* GIT_COMPAT_UTIL_H */

^ permalink raw reply related	[flat|nested] 18+ messages in thread

* Re: [RFC 1/4] perf kvm: Enable 'record' on powerpc
  2016-03-24 21:15       ` Arnaldo Carvalho de Melo
@ 2016-03-28 10:58         ` Ravi Bangoria
  2016-03-28 12:28           ` Arnaldo Carvalho de Melo
  2016-04-27 12:32         ` Ravi Bangoria
  1 sibling, 1 reply; 18+ messages in thread
From: Ravi Bangoria @ 2016-03-28 10:58 UTC (permalink / raw
  To: Arnaldo Carvalho de Melo; +Cc: linux-kernel, hemant, naveen.n.rao

Thanks Arnaldo for putting the effort.

I've tested this patch on powerpc and it looks fine to me. Please find 
my below comments.

On Friday 25 March 2016 02:45 AM, Arnaldo Carvalho de Melo wrote:
> Em Tue, Mar 22, 2016 at 11:19:21PM -0300, Arnaldo Carvalho de Melo escreveu:
>> Em Tue, Mar 22, 2016 at 04:12:11PM -0300, Arnaldo Carvalho de Melo escreveu:
>>> Em Wed, Feb 24, 2016 at 02:37:42PM +0530, Ravi Bangoria escreveu:
>>>> 'perf kvm record' is not available on powerpc because 'perf' relies on
>>>> the 'cycles' event (a PMU event) to profile the guest. However, for
>>>> powerpc, this can't be used from the host because the PMUs are controlled
>>>> by the guest rather than the host.
>>>>
>>>> There exists a tracepoint 'kvm_hv:kvm_guest_exit' in powerpc which is
>>>> hit whenever any of the threads exit the guest context. The guest
>>>> instruction pointer dumped along with this tracepoint data in the field
>>>> 'pc', can be used as guest instruction pointer.
>>>>
>>>> This patch changes default event as kvm_hv:kvm_guest_exit for recording
>>>> guest data in host on powerpc. As we are using host event to record guest
>>>> data, this approach will enable only --guest option of 'perf kvm'. Still
>>>> --host --guest together won't work.
>>> It should, i.e. --host --guest should translate to:
>>>
>>>     -e cycles:H,kvm_hv:kvm_guest_exit
>>>
>>> I.e. both collect cycles only in the host, and also the tracepoint that
>>> will allow us to get the guest approximation for the unavailable cycles
>>> event, no?
>>>
>>> I'm putting the infrastructure work needed for this the perf/cpumode
>>> branch. More work will be put there soon.
>> So I took a different path and made perf_evsel__parse_sample set a new
>> perf_sample.cpumode field, this way we'll end up having just to set a
>> per-evsel ->post_parse_sample() callback for the event that replaces
>> "cycles" for PPC guests where we'll just set data->ip and data->cpumode,
>> the rest of the code remains unchanged.
>>
>> The changes I made looks useful in itself, as, IIRC more code was
>> removed than added.
>>
>> I'll continue tomorrow and will test with the kvm:kvm_exit on x86_64 for
>> testing, that has:
> Ok, so the infrastructure got merged already and from there the next
> steps are in running with:
>
>   perf kvm --guest record -a -e cycles:H,kvm:kvm_exit
>
> And then, with the patch below applied, try:
>
> perf kvm --guestkallsyms kallsyms.guest --guestmodules modules.guest report -i perf.data.guest --munge-ppc-guest-sample kvm:kvm_exit


The initial proposal was to change the default event as "kvm_guest_exit" 
for kvm recording/reporting
on ppc. If I understand it correctly, your patch creates a handler for 
reporting kvm events
based on "munge_ppc_guest_event" and the required tracepoint i.e., we 
need to mention the
required tracepoint event name for recording and reporting.

There might be a little bit of an issue here. For scripts which depend 
on generic perf kvm record/report,
we need to change those appropriately to prevent those from failing on 
powerpc. Otherwise, (just a
thought) can we create some kind of an alias to map the ppc specific 
perf kvm commands with the
generic perf kvm.
For e.g :
perf kvm record -e "kvm_hv:kvm_guest_exit"  mapped to  perf kvm record
&
perf kvm report --munge-ppc-guest-sample kvm_hv:kvm_guest_exit mapped 
to  perf kvm report.


Regards,
Ravi

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [RFC 1/4] perf kvm: Enable 'record' on powerpc
  2016-03-28 10:58         ` Ravi Bangoria
@ 2016-03-28 12:28           ` Arnaldo Carvalho de Melo
  0 siblings, 0 replies; 18+ messages in thread
From: Arnaldo Carvalho de Melo @ 2016-03-28 12:28 UTC (permalink / raw
  To: Ravi Bangoria; +Cc: linux-kernel, hemant, naveen.n.rao

Em Mon, Mar 28, 2016 at 04:28:45PM +0530, Ravi Bangoria escreveu:
> Thanks Arnaldo for putting the effort.
> 
> I've tested this patch on powerpc and it looks fine to me. Please find my
> below comments.
> 
> On Friday 25 March 2016 02:45 AM, Arnaldo Carvalho de Melo wrote:
> >Em Tue, Mar 22, 2016 at 11:19:21PM -0300, Arnaldo Carvalho de Melo escreveu:
> >>Em Tue, Mar 22, 2016 at 04:12:11PM -0300, Arnaldo Carvalho de Melo escreveu:
> >>>Em Wed, Feb 24, 2016 at 02:37:42PM +0530, Ravi Bangoria escreveu:
> >>>>'perf kvm record' is not available on powerpc because 'perf' relies on
> >>>>the 'cycles' event (a PMU event) to profile the guest. However, for
> >>>>powerpc, this can't be used from the host because the PMUs are controlled
> >>>>by the guest rather than the host.
> >>>>
> >>>>There exists a tracepoint 'kvm_hv:kvm_guest_exit' in powerpc which is
> >>>>hit whenever any of the threads exit the guest context. The guest
> >>>>instruction pointer dumped along with this tracepoint data in the field
> >>>>'pc', can be used as guest instruction pointer.
> >>>>
> >>>>This patch changes default event as kvm_hv:kvm_guest_exit for recording
> >>>>guest data in host on powerpc. As we are using host event to record guest
> >>>>data, this approach will enable only --guest option of 'perf kvm'. Still
> >>>>--host --guest together won't work.
> >>>It should, i.e. --host --guest should translate to:
> >>>
> >>>    -e cycles:H,kvm_hv:kvm_guest_exit
> >>>
> >>>I.e. both collect cycles only in the host, and also the tracepoint that
> >>>will allow us to get the guest approximation for the unavailable cycles
> >>>event, no?
> >>>
> >>>I'm putting the infrastructure work needed for this the perf/cpumode
> >>>branch. More work will be put there soon.
> >>So I took a different path and made perf_evsel__parse_sample set a new
> >>perf_sample.cpumode field, this way we'll end up having just to set a
> >>per-evsel ->post_parse_sample() callback for the event that replaces
> >>"cycles" for PPC guests where we'll just set data->ip and data->cpumode,
> >>the rest of the code remains unchanged.
> >>
> >>The changes I made looks useful in itself, as, IIRC more code was
> >>removed than added.
> >>
> >>I'll continue tomorrow and will test with the kvm:kvm_exit on x86_64 for
> >>testing, that has:
> >Ok, so the infrastructure got merged already and from there the next
> >steps are in running with:
> >
> >  perf kvm --guest record -a -e cycles:H,kvm:kvm_exit
> >
> >And then, with the patch below applied, try:
> >
> >perf kvm --guestkallsyms kallsyms.guest --guestmodules modules.guest report -i perf.data.guest --munge-ppc-guest-sample kvm:kvm_exit
> 
> 
> The initial proposal was to change the default event as "kvm_guest_exit" for
> kvm recording/reporting
> on ppc. If I understand it correctly, your patch creates a handler for
> reporting kvm events
> based on "munge_ppc_guest_event" and the required tracepoint i.e., we need
> to mention the
> required tracepoint event name for recording and reporting.

What is present in my patch is the creation of infrastructure that is as
much generic as possible, to enable doing this event conversion for PPC.

Doing it transparently in PPC systems and on other systems when handling
a perf.data file generated by 'perf kvm record' on PPC  will be done on
top of this, thanks for testing, I'll send a complete series soon.

- Arnaldo
 
> There might be a little bit of an issue here. For scripts which depend on
> generic perf kvm record/report,
> we need to change those appropriately to prevent those from failing on
> powerpc. Otherwise, (just a
> thought) can we create some kind of an alias to map the ppc specific perf
> kvm commands with the
> generic perf kvm.
> For e.g :
> perf kvm record -e "kvm_hv:kvm_guest_exit"  mapped to  perf kvm record
> &
> perf kvm report --munge-ppc-guest-sample kvm_hv:kvm_guest_exit mapped to
> perf kvm report.
> 
> 
> Regards,
> Ravi

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [RFC 1/4] perf kvm: Enable 'record' on powerpc
  2016-03-24 21:15       ` Arnaldo Carvalho de Melo
  2016-03-28 10:58         ` Ravi Bangoria
@ 2016-04-27 12:32         ` Ravi Bangoria
  2016-04-27 21:47           ` Arnaldo Carvalho de Melo
  1 sibling, 1 reply; 18+ messages in thread
From: Ravi Bangoria @ 2016-04-27 12:32 UTC (permalink / raw
  To: Arnaldo Carvalho de Melo; +Cc: linux-kernel, hemant, naveen.n.rao

Hi Arnaldo,

I've worked on your patch. I'm sending this patch(diff) to check if this
is the same idea you want to progress with. I cleanup your patch,
removed arch specific compile time directives and changed code to
enable cross arch reporting. I tested record on powerpc and report
on x86 and it's working.

Please give suggestion about your approach. Let me know if you have
some other idea to progress with.

Here is the diff w.r.t perf/cpumode branch:

diff --git a/tools/perf/builtin-kvm.c b/tools/perf/builtin-kvm.c
index bff6664..83ef6c6 100644
--- a/tools/perf/builtin-kvm.c
+++ b/tools/perf/builtin-kvm.c
@@ -1480,6 +1480,60 @@ perf_stat:
  }
  #endif /* HAVE_KVM_STAT_SUPPORT */

+#define PPC_HV_DECREMENTER 2432
+#define PPC_HV_BIT 3
+#define PPC_PR_BIT 49
+#define PPC_MAX 63
+
+static bool perf_sample__is_hv_dec_trap(struct perf_sample *sample, 
struct perf_evsel *evsel)
+{
+    int trap = perf_evsel__intval(evsel, sample, "trap");
+    return trap == PPC_HV_DECREMENTER;
+}
+
+static void perf_kvm__munge_ppc_guest_sample(struct perf_evsel *evsel, 
struct perf_sample *sample)
+{
+    unsigned long msr, hv, pr;
+
+    if (!perf_sample__is_hv_dec_trap(sample, evsel))
+        return;
+
+    sample->ip = perf_evsel__intval(evsel, sample, "pc");
+    sample->cpumode = PERF_RECORD_MISC_GUEST_KERNEL;
+
+    msr = perf_evsel__intval(evsel, sample, "msr");
+    hv = msr & ((unsigned long)1 << (PPC_MAX - PPC_HV_BIT));
+    pr = msr & ((unsigned long)1 << (PPC_MAX - PPC_PR_BIT));
+    if (!hv && pr)
+        sample->cpumode = PERF_RECORD_MISC_GUEST_USER;
+}
+
+static bool perf_evlist__recorded_on_ppc(const struct perf_evlist *evlist)
+{
+    if (evlist->env && evlist->env->arch) {
+        return !strcmp(evlist->env->arch, "ppc64") ||
+               !strcmp(evlist->env->arch, "ppc64le");
+    }
+    return false;
+}
+
+int perf_kvm__setup_munge_ppc_guest_event(struct perf_evlist *evlist)
+{
+    struct perf_evsel *evsel;
+    const char name[] = "kvm_hv:kvm_guest_exit";
+
+    if (!perf_evlist__recorded_on_ppc(evlist))
+        return 0;
+
+    evsel = perf_evlist__find_tracepoint_by_name(evlist, name);
+    if (evsel == NULL)
+        return -1;
+
+    evsel->munge_sample = perf_kvm__munge_ppc_guest_sample;
+
+    return 0;
+}
+
  static int __cmd_record(const char *file_name, int argc, const char 
**argv)
  {
      int rec_argc, i = 0, j;
diff --git a/tools/perf/builtin-report.c b/tools/perf/builtin-report.c
index ab47273..7cb41f7 100644
--- a/tools/perf/builtin-report.c
+++ b/tools/perf/builtin-report.c
@@ -879,6 +879,12 @@ repeat:
      if (session == NULL)
          return -1;

+    if (perf_guest &&
+        perf_kvm__setup_munge_ppc_guest_event(session->evlist)) {
+        pr_err("PPC event not present in %s file\n", input_name);
+        goto error;
+    }
+
      if (report.queue_size) {
ordered_events__set_alloc_size(&session->ordered_events,
                             report.queue_size);
diff --git a/tools/perf/util/evsel.c b/tools/perf/util/evsel.c
index 738ce22..1665171 100644
--- a/tools/perf/util/evsel.c
+++ b/tools/perf/util/evsel.c
@@ -216,6 +216,7 @@ void perf_evsel__init(struct perf_evsel *evsel,
      evsel->sample_size = __perf_evsel__sample_size(attr->sample_type);
      perf_evsel__calc_id_pos(evsel);
      evsel->cmdline_group_boundary = false;
+    evsel->munge_sample = NULL;
  }

  struct perf_evsel *perf_evsel__new_idx(struct perf_event_attr *attr, 
int idx)
@@ -1887,6 +1888,9 @@ int perf_evsel__parse_sample(struct perf_evsel 
*evsel, union perf_event *event,
          }
      }

+    if (evsel->munge_sample != NULL)
+        evsel->munge_sample(evsel, data);
+
      return 0;
  }

diff --git a/tools/perf/util/evsel.h b/tools/perf/util/evsel.h
index 501ea6e..4637945 100644
--- a/tools/perf/util/evsel.h
+++ b/tools/perf/util/evsel.h
@@ -12,6 +12,7 @@
  #include "counts.h"

  struct perf_evsel;
+struct perf_sample;

  /*
   * Per fd, to map back from PERF_SAMPLE_ID to evsel, only used when 
there are
@@ -92,6 +93,8 @@ struct perf_evsel {
      double            scale;
      const char        *unit;
      struct event_format    *tp_format;
+    void            (*munge_sample)(struct perf_evsel *evsel,
+                        struct perf_sample *sample);
      off_t            id_offset;
      void            *priv;
      u64            db_id;
@@ -236,8 +239,6 @@ int perf_evsel__open(struct perf_evsel *evsel, 
struct cpu_map *cpus,
               struct thread_map *threads);
  void perf_evsel__close(struct perf_evsel *evsel, int ncpus, int nthreads);

-struct perf_sample;
-
  void *perf_evsel__rawptr(struct perf_evsel *evsel, struct perf_sample 
*sample,
               const char *name);
  u64 perf_evsel__intval(struct perf_evsel *evsel, struct perf_sample 
*sample,
diff --git a/tools/perf/util/util.h b/tools/perf/util/util.h
index d0d50ce..ffe5bbb 100644
--- a/tools/perf/util/util.h
+++ b/tools/perf/util/util.h
@@ -367,4 +367,9 @@ typedef void (*print_binary_t)(enum binary_printer_ops,
  void print_binary(unsigned char *data, size_t len,
            size_t bytes_per_line, print_binary_t printer,
            void *extra);
+
+struct perf_evlist;
+
+int perf_kvm__setup_munge_ppc_guest_event(struct perf_evlist *evlist);
+
  #endif /* GIT_COMPAT_UTIL_H */




On Friday 25 March 2016 02:45 AM, Arnaldo Carvalho de Melo wrote:
> Em Tue, Mar 22, 2016 at 11:19:21PM -0300, Arnaldo Carvalho de Melo escreveu:
>> Em Tue, Mar 22, 2016 at 04:12:11PM -0300, Arnaldo Carvalho de Melo escreveu:
>>> Em Wed, Feb 24, 2016 at 02:37:42PM +0530, Ravi Bangoria escreveu:
>>>> 'perf kvm record' is not available on powerpc because 'perf' relies on
>>>> the 'cycles' event (a PMU event) to profile the guest. However, for
>>>> powerpc, this can't be used from the host because the PMUs are controlled
>>>> by the guest rather than the host.
>>>>
>>>> There exists a tracepoint 'kvm_hv:kvm_guest_exit' in powerpc which is
>>>> hit whenever any of the threads exit the guest context. The guest
>>>> instruction pointer dumped along with this tracepoint data in the field
>>>> 'pc', can be used as guest instruction pointer.
>>>>
>>>> This patch changes default event as kvm_hv:kvm_guest_exit for recording
>>>> guest data in host on powerpc. As we are using host event to record guest
>>>> data, this approach will enable only --guest option of 'perf kvm'. Still
>>>> --host --guest together won't work.
>>> It should, i.e. --host --guest should translate to:
>>>
>>>     -e cycles:H,kvm_hv:kvm_guest_exit
>>>
>>> I.e. both collect cycles only in the host, and also the tracepoint that
>>> will allow us to get the guest approximation for the unavailable cycles
>>> event, no?
>>>
>>> I'm putting the infrastructure work needed for this the perf/cpumode
>>> branch. More work will be put there soon.
>> So I took a different path and made perf_evsel__parse_sample set a new
>> perf_sample.cpumode field, this way we'll end up having just to set a
>> per-evsel ->post_parse_sample() callback for the event that replaces
>> "cycles" for PPC guests where we'll just set data->ip and data->cpumode,
>> the rest of the code remains unchanged.
>>
>> The changes I made looks useful in itself, as, IIRC more code was
>> removed than added.
>>
>> I'll continue tomorrow and will test with the kvm:kvm_exit on x86_64 for
>> testing, that has:
> Ok, so the infrastructure got merged already and from there the next
> steps are in running with:
>
>   perf kvm --guest record -a -e cycles:H,kvm:kvm_exit
>
> And then, with the patch below applied, try:
>
> perf kvm --guestkallsyms kallsyms.guest --guestmodules modules.guest report -i perf.data.guest --munge-ppc-guest-sample kvm:kvm_exit
>
> I'm almost there, it is still not resolving to the kernel DSO, etc, so I
> get:
>
> Samples: 1K of event 'kvm:kvm_exit', Event count (approx.): 1924
> Overhead  Command  Shared Object     Symbol
>    34.77%  :5343    [unknown]         [g] 0xffffffff81043158
>    16.84%  :5345    [unknown]         [g] 0xffffffff813f3d5a
>    16.84%  :5345    [unknown]         [g] 0xffffffff813f43ec
>    13.83%  :5345    [unknown]         [g] 0xffffffff81043158
>     9.62%  :5343    [unknown]         [g] 0xffffffff8104301a
>     3.79%  :5345    [unknown]         [g] 0xffffffff8104301a
>     1.77%  :5345    [unknown]         [u] 0x0000003ae6c75dc9
>     0.52%  :5343    [unknown]         [g] 0xffffffff812a29b1
>     0.16%  :5343    [unknown]         [g] 0xffffffff8100bc00
>     0.10%  :5343    [unknown]         [g] 0xffffffff8104315a
>     0.10%  :5343    [unknown]         [g] 0xffffffff8106306f
>     0.10%  :5343    [unknown]         [g] 0xffffffff8153b7fc
>     0.10%  :5345    [unknown]         [g] 0xffffffff8106306f
>     0.05%  :5343    [unknown]         [g] 0xffffffff8100b720
>
> [root@jouet ~]# cat /proc/*/task/5343/comm
> qemu-system-x86
> [root@jouet ~]#
>
> The patch does several of the things you did per sample, but only right after
> opening the perf.data file, and I'll break it down in multiple patches, this is
> just a heads up, please review it if you have the time, in the end we should
> have a mechanism useful not just for PPC and that affects just 'perf kvm' in
> this specific case.
>
> - Arnaldo
>
> diff --git a/tools/perf/builtin-kvm.c b/tools/perf/builtin-kvm.c
> index bff666458b28..b7b6527446f8 100644
> --- a/tools/perf/builtin-kvm.c
> +++ b/tools/perf/builtin-kvm.c
> @@ -1480,6 +1480,86 @@ perf_stat:
>   }
>   #endif /* HAVE_KVM_STAT_SUPPORT */
>
> +#ifdef __powerpc__
> +#define PPC_HV_DECREMENTER 2432
> +#define PPC_HV_BIT 3
> +#define PPC_PR_BIT 49
> +#define PPC_MAX 63
> +
> +static bool perf_sample__is_hv_dec_trap(struct perf_sample *sample, struct perf_evsel *evsel)
> +{
> +	int trap = perf_evsel__intval(evsel, sample, "trap");
> +	return trap == PPC_HV_DECREMENTER;
> +}
> +
> +static void perf_kvm__munge_ppc_guest_sample(struct perf_evsel *evsel, struct perf_sample *sample)
> +{
> +	unsigned long msr, hv, pr;
> +
> +	if (!perf_sample__is_hv_dec_trap(sample, evsel))
> +		return;
> +
> +	sample->ip = perf_evsel__intval(evsel, sample, "pc");
> +	sample->cpumode = PERF_RECORD_MISC_GUEST_KERNEL;
> +
> +	msr = perf_evsel__intval(evsel, sample, "msr");
> +	hv = msr & ((unsigned long)1 << (PPC_MAX - PPC_HV_BIT));
> +	pr = msr & ((unsigned long)1 << (PPC_MAX - PPC_PR_BIT));
> +	if (!hv && pr)
> +		sample->cpumode = PERF_RECORD_MISC_GUEST_USER;
> +}
> +
> +static bool perf_evlist__recorded_on_ppc(const struct perf_evlist *evlist)
> +{
> +	if (evlist->env && evlist->env->arch) {
> +		return !strcmp(evlist->env->arch, "ppc64") ||
> +		       !strcmp(evlist->env->arch, "ppc64le");
> +	}
> +	return false;
> +}
> +#elif defined(__i386__) || defined(__x86_64__)
> +/*
> + * For testing with kvm:kvm_exit
> + */
> +static void perf_kvm__munge_ppc_guest_sample(struct perf_evsel *evsel, struct perf_sample *sample)
> +{
> +	sample->ip = perf_evsel__intval(evsel, sample, "guest_rip");
> +	sample->cpumode = PERF_RECORD_MISC_GUEST_KERNEL;
> +
> +	if (sample->ip < 0xffffffff00000000)
> +		sample->cpumode = PERF_RECORD_MISC_GUEST_USER;
> +
> +	if (0) {
> +		fprintf(stderr, "%s: %s, cpumode=%d ip=%" PRIx64 "\n",
> +			__func__, perf_evsel__name(evsel), sample->cpumode, sample->ip);
> +	}
> +}
> +
> +static bool perf_evlist__recorded_on_ppc(const struct perf_evlist *evlist __maybe_unused)
> +{
> +	return true;
> +}
> +#endif
> +
> +int perf_kvm__setup_munge_ppc_guest_event(struct perf_evlist *evlist, const char *name)
> +{
> +	struct perf_evsel *evsel;
> +
> +	if (!perf_evlist__recorded_on_ppc(evlist))
> +		return 0;
> +
> +	evsel = perf_evlist__find_tracepoint_by_name(evlist, name);
> +	if (evsel == NULL)
> +		return -1;
> +
> +	if (0)
> +		fprintf(stderr, "%s: %s\n", __func__, perf_evsel__name(evsel));
> +
> +	evsel->munge_sample = perf_kvm__munge_ppc_guest_sample;
> +
> +	return 0;
> +}
> +
>   static int __cmd_record(const char *file_name, int argc, const char **argv)
>   {
>   	int rec_argc, i = 0, j;
> diff --git a/tools/perf/builtin-report.c b/tools/perf/builtin-report.c
> index 160ea23b45aa..ba8786d12423 100644
> --- a/tools/perf/builtin-report.c
> +++ b/tools/perf/builtin-report.c
> @@ -667,6 +667,7 @@ int cmd_report(int argc, const char **argv, const char *prefix __maybe_unused)
>   	int branch_mode = -1;
>   	bool branch_call_mode = false;
>   	char callchain_default_opt[] = CALLCHAIN_DEFAULT_OPT;
> +	const char *munge_ppc_guest_event = NULL;
>   	const char * const report_usage[] = {
>   		"perf report [<options>]",
>   		NULL
> @@ -814,6 +815,8 @@ int cmd_report(int argc, const char **argv, const char *prefix __maybe_unused)
>   		    "Show raw trace event output (do not use print fmt or plugins)"),
>   	OPT_BOOLEAN(0, "hierarchy", &symbol_conf.report_hierarchy,
>   		    "Show entries in a hierarchy"),
> +	OPT_STRING(0, "munge-ppc-guest-sample", &munge_ppc_guest_event, "event name",
> +		   "Use a different event to simulate guest cycles, not accessible on PPC hosts"),
>   	OPT_END()
>   	};
>   	struct perf_data_file file = {
> @@ -880,6 +883,12 @@ repeat:
>   	if (session == NULL)
>   		return -1;
>
> +	if (munge_ppc_guest_event &&
> +	    perf_kvm__setup_munge_ppc_guest_event(session->evlist, munge_ppc_guest_event)) {
> +		pr_err("PPC event not present in %s file\n", input_name);
> +		goto error;
> +	}
> +
>   	if (report.queue_size) {
>   		ordered_events__set_alloc_size(&session->ordered_events,
>   					       report.queue_size);
> diff --git a/tools/perf/util/evsel.c b/tools/perf/util/evsel.c
> index 738ce226002b..1665171a6e9c 100644
> --- a/tools/perf/util/evsel.c
> +++ b/tools/perf/util/evsel.c
> @@ -216,6 +216,7 @@ void perf_evsel__init(struct perf_evsel *evsel,
>   	evsel->sample_size = __perf_evsel__sample_size(attr->sample_type);
>   	perf_evsel__calc_id_pos(evsel);
>   	evsel->cmdline_group_boundary = false;
> +	evsel->munge_sample = NULL;
>   }
>
>   struct perf_evsel *perf_evsel__new_idx(struct perf_event_attr *attr, int idx)
> @@ -1887,6 +1888,9 @@ int perf_evsel__parse_sample(struct perf_evsel *evsel, union perf_event *event,
>   		}
>   	}
>
> +	if (evsel->munge_sample != NULL)
> +		evsel->munge_sample(evsel, data);
> +
>   	return 0;
>   }
>
> diff --git a/tools/perf/util/evsel.h b/tools/perf/util/evsel.h
> index 501ea6e565f1..46379451f641 100644
> --- a/tools/perf/util/evsel.h
> +++ b/tools/perf/util/evsel.h
> @@ -12,6 +12,7 @@
>   #include "counts.h"
>
>   struct perf_evsel;
> +struct perf_sample;
>
>   /*
>    * Per fd, to map back from PERF_SAMPLE_ID to evsel, only used when there are
> @@ -92,6 +93,8 @@ struct perf_evsel {
>   	double			scale;
>   	const char		*unit;
>   	struct event_format	*tp_format;
> +	void			(*munge_sample)(struct perf_evsel *evsel,
> +						struct perf_sample *sample);
>   	off_t			id_offset;
>   	void			*priv;
>   	u64			db_id;
> @@ -236,8 +239,6 @@ int perf_evsel__open(struct perf_evsel *evsel, struct cpu_map *cpus,
>   		     struct thread_map *threads);
>   void perf_evsel__close(struct perf_evsel *evsel, int ncpus, int nthreads);
>
> -struct perf_sample;
> -
>   void *perf_evsel__rawptr(struct perf_evsel *evsel, struct perf_sample *sample,
>   			 const char *name);
>   u64 perf_evsel__intval(struct perf_evsel *evsel, struct perf_sample *sample,
> diff --git a/tools/perf/util/util.h b/tools/perf/util/util.h
> index 8298d607c738..12bc51450d56 100644
> --- a/tools/perf/util/util.h
> +++ b/tools/perf/util/util.h
> @@ -356,4 +356,9 @@ typedef void (*print_binary_t)(enum binary_printer_ops,
>   void print_binary(unsigned char *data, size_t len,
>   		  size_t bytes_per_line, print_binary_t printer,
>   		  void *extra);
> +
> +struct perf_evlist;
> +
> +int perf_kvm__setup_munge_ppc_guest_event(struct perf_evlist *evlist, const char *name);
> +
>   #endif /* GIT_COMPAT_UTIL_H */
>

^ permalink raw reply related	[flat|nested] 18+ messages in thread

* Re: [RFC 1/4] perf kvm: Enable 'record' on powerpc
  2016-04-27 12:32         ` Ravi Bangoria
@ 2016-04-27 21:47           ` Arnaldo Carvalho de Melo
  2016-05-09 14:58             ` Ravi Bangoria
  0 siblings, 1 reply; 18+ messages in thread
From: Arnaldo Carvalho de Melo @ 2016-04-27 21:47 UTC (permalink / raw
  To: Ravi Bangoria; +Cc: linux-kernel, hemant, naveen.n.rao

Em Wed, Apr 27, 2016 at 06:02:21PM +0530, Ravi Bangoria escreveu:
> Hi Arnaldo,
> 
> I've worked on your patch. I'm sending this patch(diff) to check if this
> is the same idea you want to progress with. I cleanup your patch,
> removed arch specific compile time directives and changed code to
> enable cross arch reporting. I tested record on powerpc and report
> on x86 and it's working.
> 
> Please give suggestion about your approach. Let me know if you have
> some other idea to progress with.
> 
> Here is the diff w.r.t perf/cpumode branch:
> 
> diff --git a/tools/perf/builtin-kvm.c b/tools/perf/builtin-kvm.c
> index bff6664..83ef6c6 100644
> --- a/tools/perf/builtin-kvm.c
> +++ b/tools/perf/builtin-kvm.c
> @@ -1480,6 +1480,60 @@ perf_stat:
>  }
>  #endif /* HAVE_KVM_STAT_SUPPORT */
> 
> +#define PPC_HV_DECREMENTER 2432
> +#define PPC_HV_BIT 3
> +#define PPC_PR_BIT 49
> +#define PPC_MAX 63
> +
> +static bool perf_sample__is_hv_dec_trap(struct perf_sample *sample, struct
> perf_evsel *evsel)
> +{
> +    int trap = perf_evsel__intval(evsel, sample, "trap");
> +    return trap == PPC_HV_DECREMENTER;
> +}
> +
> +static void perf_kvm__munge_ppc_guest_sample(struct perf_evsel *evsel,
> struct perf_sample *sample)
> +{
> +    unsigned long msr, hv, pr;
> +
> +    if (!perf_sample__is_hv_dec_trap(sample, evsel))
> +        return;
> +
> +    sample->ip = perf_evsel__intval(evsel, sample, "pc");
> +    sample->cpumode = PERF_RECORD_MISC_GUEST_KERNEL;
> +
> +    msr = perf_evsel__intval(evsel, sample, "msr");
> +    hv = msr & ((unsigned long)1 << (PPC_MAX - PPC_HV_BIT));
> +    pr = msr & ((unsigned long)1 << (PPC_MAX - PPC_PR_BIT));
> +    if (!hv && pr)
> +        sample->cpumode = PERF_RECORD_MISC_GUEST_USER;
> +}
> +
> +static bool perf_evlist__recorded_on_ppc(const struct perf_evlist *evlist)
> +{
> +    if (evlist->env && evlist->env->arch) {
> +        return !strcmp(evlist->env->arch, "ppc64") ||
> +               !strcmp(evlist->env->arch, "ppc64le");
> +    }
> +    return false;
> +}
> +
> +int perf_kvm__setup_munge_ppc_guest_event(struct perf_evlist *evlist)
> +{
> +    struct perf_evsel *evsel;
> +    const char name[] = "kvm_hv:kvm_guest_exit";
> +
> +    if (!perf_evlist__recorded_on_ppc(evlist))
> +        return 0;
> +
> +    evsel = perf_evlist__find_tracepoint_by_name(evlist, name);
> +    if (evsel == NULL)
> +        return -1;
> +
> +    evsel->munge_sample = perf_kvm__munge_ppc_guest_sample;
> +
> +    return 0;
> +}
> +
>  static int __cmd_record(const char *file_name, int argc, const char **argv)
>  {
>      int rec_argc, i = 0, j;
> diff --git a/tools/perf/builtin-report.c b/tools/perf/builtin-report.c
> index ab47273..7cb41f7 100644
> --- a/tools/perf/builtin-report.c
> +++ b/tools/perf/builtin-report.c
> @@ -879,6 +879,12 @@ repeat:
>      if (session == NULL)
>          return -1;
> 
> +    if (perf_guest &&
> +        perf_kvm__setup_munge_ppc_guest_event(session->evlist)) {
> +        pr_err("PPC event not present in %s file\n", input_name);
> +        goto error;
> +    }

This looks out of place, i.e. this reads: "For all cases where there is
a guest and we can't setup the ppc KVM guest related stuff, its an
error"

	I think this gets clearer as:

	if (perf_guest && perf_evlist__recorded_on_ppc(evlist) &&
	    perf_kvm__setup_munge_ppc_guest_event(session->evlist)) {
		pr_err("PPC event not present in %s file\n", input_name);
		goto error;
	}

Then we read this as "Hey, if this was recorded on ppc, try to set
things up for ppc", but then again, what is this KVM stuff doing in the
generic 'perf report' code? 

What if this is a perf.data file generated on PPC but being read on PPC?
This will not make sense to munge it, right?

This is with what I remember from this case, please bear with me.

- Arnaldo

> +
>      if (report.queue_size) {
> ordered_events__set_alloc_size(&session->ordered_events,
>                             report.queue_size);
> diff --git a/tools/perf/util/evsel.c b/tools/perf/util/evsel.c
> index 738ce22..1665171 100644
> --- a/tools/perf/util/evsel.c
> +++ b/tools/perf/util/evsel.c
> @@ -216,6 +216,7 @@ void perf_evsel__init(struct perf_evsel *evsel,
>      evsel->sample_size = __perf_evsel__sample_size(attr->sample_type);
>      perf_evsel__calc_id_pos(evsel);
>      evsel->cmdline_group_boundary = false;
> +    evsel->munge_sample = NULL;
>  }
> 
>  struct perf_evsel *perf_evsel__new_idx(struct perf_event_attr *attr, int
> idx)
> @@ -1887,6 +1888,9 @@ int perf_evsel__parse_sample(struct perf_evsel *evsel,
> union perf_event *event,
>          }
>      }
> 
> +    if (evsel->munge_sample != NULL)
> +        evsel->munge_sample(evsel, data);
> +
>      return 0;
>  }
> 
> diff --git a/tools/perf/util/evsel.h b/tools/perf/util/evsel.h
> index 501ea6e..4637945 100644
> --- a/tools/perf/util/evsel.h
> +++ b/tools/perf/util/evsel.h
> @@ -12,6 +12,7 @@
>  #include "counts.h"
> 
>  struct perf_evsel;
> +struct perf_sample;
> 
>  /*
>   * Per fd, to map back from PERF_SAMPLE_ID to evsel, only used when there
> are
> @@ -92,6 +93,8 @@ struct perf_evsel {
>      double            scale;
>      const char        *unit;
>      struct event_format    *tp_format;
> +    void            (*munge_sample)(struct perf_evsel *evsel,
> +                        struct perf_sample *sample);
>      off_t            id_offset;
>      void            *priv;
>      u64            db_id;
> @@ -236,8 +239,6 @@ int perf_evsel__open(struct perf_evsel *evsel, struct
> cpu_map *cpus,
>               struct thread_map *threads);
>  void perf_evsel__close(struct perf_evsel *evsel, int ncpus, int nthreads);
> 
> -struct perf_sample;
> -
>  void *perf_evsel__rawptr(struct perf_evsel *evsel, struct perf_sample
> *sample,
>               const char *name);
>  u64 perf_evsel__intval(struct perf_evsel *evsel, struct perf_sample
> *sample,
> diff --git a/tools/perf/util/util.h b/tools/perf/util/util.h
> index d0d50ce..ffe5bbb 100644
> --- a/tools/perf/util/util.h
> +++ b/tools/perf/util/util.h
> @@ -367,4 +367,9 @@ typedef void (*print_binary_t)(enum binary_printer_ops,
>  void print_binary(unsigned char *data, size_t len,
>            size_t bytes_per_line, print_binary_t printer,
>            void *extra);
> +
> +struct perf_evlist;
> +
> +int perf_kvm__setup_munge_ppc_guest_event(struct perf_evlist *evlist);
> +
>  #endif /* GIT_COMPAT_UTIL_H */
> 
> 
> 
> 
> On Friday 25 March 2016 02:45 AM, Arnaldo Carvalho de Melo wrote:
> >Em Tue, Mar 22, 2016 at 11:19:21PM -0300, Arnaldo Carvalho de Melo escreveu:
> >>Em Tue, Mar 22, 2016 at 04:12:11PM -0300, Arnaldo Carvalho de Melo escreveu:
> >>>Em Wed, Feb 24, 2016 at 02:37:42PM +0530, Ravi Bangoria escreveu:
> >>>>'perf kvm record' is not available on powerpc because 'perf' relies on
> >>>>the 'cycles' event (a PMU event) to profile the guest. However, for
> >>>>powerpc, this can't be used from the host because the PMUs are controlled
> >>>>by the guest rather than the host.
> >>>>
> >>>>There exists a tracepoint 'kvm_hv:kvm_guest_exit' in powerpc which is
> >>>>hit whenever any of the threads exit the guest context. The guest
> >>>>instruction pointer dumped along with this tracepoint data in the field
> >>>>'pc', can be used as guest instruction pointer.
> >>>>
> >>>>This patch changes default event as kvm_hv:kvm_guest_exit for recording
> >>>>guest data in host on powerpc. As we are using host event to record guest
> >>>>data, this approach will enable only --guest option of 'perf kvm'. Still
> >>>>--host --guest together won't work.
> >>>It should, i.e. --host --guest should translate to:
> >>>
> >>>    -e cycles:H,kvm_hv:kvm_guest_exit
> >>>
> >>>I.e. both collect cycles only in the host, and also the tracepoint that
> >>>will allow us to get the guest approximation for the unavailable cycles
> >>>event, no?
> >>>
> >>>I'm putting the infrastructure work needed for this the perf/cpumode
> >>>branch. More work will be put there soon.
> >>So I took a different path and made perf_evsel__parse_sample set a new
> >>perf_sample.cpumode field, this way we'll end up having just to set a
> >>per-evsel ->post_parse_sample() callback for the event that replaces
> >>"cycles" for PPC guests where we'll just set data->ip and data->cpumode,
> >>the rest of the code remains unchanged.
> >>
> >>The changes I made looks useful in itself, as, IIRC more code was
> >>removed than added.
> >>
> >>I'll continue tomorrow and will test with the kvm:kvm_exit on x86_64 for
> >>testing, that has:
> >Ok, so the infrastructure got merged already and from there the next
> >steps are in running with:
> >
> >  perf kvm --guest record -a -e cycles:H,kvm:kvm_exit
> >
> >And then, with the patch below applied, try:
> >
> >perf kvm --guestkallsyms kallsyms.guest --guestmodules modules.guest report -i perf.data.guest --munge-ppc-guest-sample kvm:kvm_exit
> >
> >I'm almost there, it is still not resolving to the kernel DSO, etc, so I
> >get:
> >
> >Samples: 1K of event 'kvm:kvm_exit', Event count (approx.): 1924
> >Overhead  Command  Shared Object     Symbol
> >   34.77%  :5343    [unknown]         [g] 0xffffffff81043158
> >   16.84%  :5345    [unknown]         [g] 0xffffffff813f3d5a
> >   16.84%  :5345    [unknown]         [g] 0xffffffff813f43ec
> >   13.83%  :5345    [unknown]         [g] 0xffffffff81043158
> >    9.62%  :5343    [unknown]         [g] 0xffffffff8104301a
> >    3.79%  :5345    [unknown]         [g] 0xffffffff8104301a
> >    1.77%  :5345    [unknown]         [u] 0x0000003ae6c75dc9
> >    0.52%  :5343    [unknown]         [g] 0xffffffff812a29b1
> >    0.16%  :5343    [unknown]         [g] 0xffffffff8100bc00
> >    0.10%  :5343    [unknown]         [g] 0xffffffff8104315a
> >    0.10%  :5343    [unknown]         [g] 0xffffffff8106306f
> >    0.10%  :5343    [unknown]         [g] 0xffffffff8153b7fc
> >    0.10%  :5345    [unknown]         [g] 0xffffffff8106306f
> >    0.05%  :5343    [unknown]         [g] 0xffffffff8100b720
> >
> >[root@jouet ~]# cat /proc/*/task/5343/comm
> >qemu-system-x86
> >[root@jouet ~]#
> >
> >The patch does several of the things you did per sample, but only right after
> >opening the perf.data file, and I'll break it down in multiple patches, this is
> >just a heads up, please review it if you have the time, in the end we should
> >have a mechanism useful not just for PPC and that affects just 'perf kvm' in
> >this specific case.
> >
> >- Arnaldo
> >
> >diff --git a/tools/perf/builtin-kvm.c b/tools/perf/builtin-kvm.c
> >index bff666458b28..b7b6527446f8 100644
> >--- a/tools/perf/builtin-kvm.c
> >+++ b/tools/perf/builtin-kvm.c
> >@@ -1480,6 +1480,86 @@ perf_stat:
> >  }
> >  #endif /* HAVE_KVM_STAT_SUPPORT */
> >
> >+#ifdef __powerpc__
> >+#define PPC_HV_DECREMENTER 2432
> >+#define PPC_HV_BIT 3
> >+#define PPC_PR_BIT 49
> >+#define PPC_MAX 63
> >+
> >+static bool perf_sample__is_hv_dec_trap(struct perf_sample *sample, struct perf_evsel *evsel)
> >+{
> >+	int trap = perf_evsel__intval(evsel, sample, "trap");
> >+	return trap == PPC_HV_DECREMENTER;
> >+}
> >+
> >+static void perf_kvm__munge_ppc_guest_sample(struct perf_evsel *evsel, struct perf_sample *sample)
> >+{
> >+	unsigned long msr, hv, pr;
> >+
> >+	if (!perf_sample__is_hv_dec_trap(sample, evsel))
> >+		return;
> >+
> >+	sample->ip = perf_evsel__intval(evsel, sample, "pc");
> >+	sample->cpumode = PERF_RECORD_MISC_GUEST_KERNEL;
> >+
> >+	msr = perf_evsel__intval(evsel, sample, "msr");
> >+	hv = msr & ((unsigned long)1 << (PPC_MAX - PPC_HV_BIT));
> >+	pr = msr & ((unsigned long)1 << (PPC_MAX - PPC_PR_BIT));
> >+	if (!hv && pr)
> >+		sample->cpumode = PERF_RECORD_MISC_GUEST_USER;
> >+}
> >+
> >+static bool perf_evlist__recorded_on_ppc(const struct perf_evlist *evlist)
> >+{
> >+	if (evlist->env && evlist->env->arch) {
> >+		return !strcmp(evlist->env->arch, "ppc64") ||
> >+		       !strcmp(evlist->env->arch, "ppc64le");
> >+	}
> >+	return false;
> >+}
> >+#elif defined(__i386__) || defined(__x86_64__)
> >+/*
> >+ * For testing with kvm:kvm_exit
> >+ */
> >+static void perf_kvm__munge_ppc_guest_sample(struct perf_evsel *evsel, struct perf_sample *sample)
> >+{
> >+	sample->ip = perf_evsel__intval(evsel, sample, "guest_rip");
> >+	sample->cpumode = PERF_RECORD_MISC_GUEST_KERNEL;
> >+
> >+	if (sample->ip < 0xffffffff00000000)
> >+		sample->cpumode = PERF_RECORD_MISC_GUEST_USER;
> >+
> >+	if (0) {
> >+		fprintf(stderr, "%s: %s, cpumode=%d ip=%" PRIx64 "\n",
> >+			__func__, perf_evsel__name(evsel), sample->cpumode, sample->ip);
> >+	}
> >+}
> >+
> >+static bool perf_evlist__recorded_on_ppc(const struct perf_evlist *evlist __maybe_unused)
> >+{
> >+	return true;
> >+}
> >+#endif
> >+
> >+int perf_kvm__setup_munge_ppc_guest_event(struct perf_evlist *evlist, const char *name)
> >+{
> >+	struct perf_evsel *evsel;
> >+
> >+	if (!perf_evlist__recorded_on_ppc(evlist))
> >+		return 0;
> >+
> >+	evsel = perf_evlist__find_tracepoint_by_name(evlist, name);
> >+	if (evsel == NULL)
> >+		return -1;
> >+
> >+	if (0)
> >+		fprintf(stderr, "%s: %s\n", __func__, perf_evsel__name(evsel));
> >+
> >+	evsel->munge_sample = perf_kvm__munge_ppc_guest_sample;
> >+
> >+	return 0;
> >+}
> >+
> >  static int __cmd_record(const char *file_name, int argc, const char **argv)
> >  {
> >  	int rec_argc, i = 0, j;
> >diff --git a/tools/perf/builtin-report.c b/tools/perf/builtin-report.c
> >index 160ea23b45aa..ba8786d12423 100644
> >--- a/tools/perf/builtin-report.c
> >+++ b/tools/perf/builtin-report.c
> >@@ -667,6 +667,7 @@ int cmd_report(int argc, const char **argv, const char *prefix __maybe_unused)
> >  	int branch_mode = -1;
> >  	bool branch_call_mode = false;
> >  	char callchain_default_opt[] = CALLCHAIN_DEFAULT_OPT;
> >+	const char *munge_ppc_guest_event = NULL;
> >  	const char * const report_usage[] = {
> >  		"perf report [<options>]",
> >  		NULL
> >@@ -814,6 +815,8 @@ int cmd_report(int argc, const char **argv, const char *prefix __maybe_unused)
> >  		    "Show raw trace event output (do not use print fmt or plugins)"),
> >  	OPT_BOOLEAN(0, "hierarchy", &symbol_conf.report_hierarchy,
> >  		    "Show entries in a hierarchy"),
> >+	OPT_STRING(0, "munge-ppc-guest-sample", &munge_ppc_guest_event, "event name",
> >+		   "Use a different event to simulate guest cycles, not accessible on PPC hosts"),
> >  	OPT_END()
> >  	};
> >  	struct perf_data_file file = {
> >@@ -880,6 +883,12 @@ repeat:
> >  	if (session == NULL)
> >  		return -1;
> >
> >+	if (munge_ppc_guest_event &&
> >+	    perf_kvm__setup_munge_ppc_guest_event(session->evlist, munge_ppc_guest_event)) {
> >+		pr_err("PPC event not present in %s file\n", input_name);
> >+		goto error;
> >+	}
> >+
> >  	if (report.queue_size) {
> >  		ordered_events__set_alloc_size(&session->ordered_events,
> >  					       report.queue_size);
> >diff --git a/tools/perf/util/evsel.c b/tools/perf/util/evsel.c
> >index 738ce226002b..1665171a6e9c 100644
> >--- a/tools/perf/util/evsel.c
> >+++ b/tools/perf/util/evsel.c
> >@@ -216,6 +216,7 @@ void perf_evsel__init(struct perf_evsel *evsel,
> >  	evsel->sample_size = __perf_evsel__sample_size(attr->sample_type);
> >  	perf_evsel__calc_id_pos(evsel);
> >  	evsel->cmdline_group_boundary = false;
> >+	evsel->munge_sample = NULL;
> >  }
> >
> >  struct perf_evsel *perf_evsel__new_idx(struct perf_event_attr *attr, int idx)
> >@@ -1887,6 +1888,9 @@ int perf_evsel__parse_sample(struct perf_evsel *evsel, union perf_event *event,
> >  		}
> >  	}
> >
> >+	if (evsel->munge_sample != NULL)
> >+		evsel->munge_sample(evsel, data);
> >+
> >  	return 0;
> >  }
> >
> >diff --git a/tools/perf/util/evsel.h b/tools/perf/util/evsel.h
> >index 501ea6e565f1..46379451f641 100644
> >--- a/tools/perf/util/evsel.h
> >+++ b/tools/perf/util/evsel.h
> >@@ -12,6 +12,7 @@
> >  #include "counts.h"
> >
> >  struct perf_evsel;
> >+struct perf_sample;
> >
> >  /*
> >   * Per fd, to map back from PERF_SAMPLE_ID to evsel, only used when there are
> >@@ -92,6 +93,8 @@ struct perf_evsel {
> >  	double			scale;
> >  	const char		*unit;
> >  	struct event_format	*tp_format;
> >+	void			(*munge_sample)(struct perf_evsel *evsel,
> >+						struct perf_sample *sample);
> >  	off_t			id_offset;
> >  	void			*priv;
> >  	u64			db_id;
> >@@ -236,8 +239,6 @@ int perf_evsel__open(struct perf_evsel *evsel, struct cpu_map *cpus,
> >  		     struct thread_map *threads);
> >  void perf_evsel__close(struct perf_evsel *evsel, int ncpus, int nthreads);
> >
> >-struct perf_sample;
> >-
> >  void *perf_evsel__rawptr(struct perf_evsel *evsel, struct perf_sample *sample,
> >  			 const char *name);
> >  u64 perf_evsel__intval(struct perf_evsel *evsel, struct perf_sample *sample,
> >diff --git a/tools/perf/util/util.h b/tools/perf/util/util.h
> >index 8298d607c738..12bc51450d56 100644
> >--- a/tools/perf/util/util.h
> >+++ b/tools/perf/util/util.h
> >@@ -356,4 +356,9 @@ typedef void (*print_binary_t)(enum binary_printer_ops,
> >  void print_binary(unsigned char *data, size_t len,
> >  		  size_t bytes_per_line, print_binary_t printer,
> >  		  void *extra);
> >+
> >+struct perf_evlist;
> >+
> >+int perf_kvm__setup_munge_ppc_guest_event(struct perf_evlist *evlist, const char *name);
> >+
> >  #endif /* GIT_COMPAT_UTIL_H */
> >

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [RFC 1/4] perf kvm: Enable 'record' on powerpc
  2016-04-27 21:47           ` Arnaldo Carvalho de Melo
@ 2016-05-09 14:58             ` Ravi Bangoria
  0 siblings, 0 replies; 18+ messages in thread
From: Ravi Bangoria @ 2016-05-09 14:58 UTC (permalink / raw
  To: Arnaldo Carvalho de Melo; +Cc: linux-kernel, hemant, naveen.n.rao

Hi Arnaldo,

Thanks for the review.  Please find my comments below.

On Thursday 28 April 2016 03:17 AM, Arnaldo Carvalho de Melo wrote:
> Em Wed, Apr 27, 2016 at 06:02:21PM +0530, Ravi Bangoria escreveu:
>> Hi Arnaldo,
>>
>> I've worked on your patch. I'm sending this patch(diff) to check if this
>> is the same idea you want to progress with. I cleanup your patch,
>> removed arch specific compile time directives and changed code to
>> enable cross arch reporting. I tested record on powerpc and report
>> on x86 and it's working.
>>
>> Please give suggestion about your approach. Let me know if you have
>> some other idea to progress with.
>>
>> Here is the diff w.r.t perf/cpumode branch:
>>
>> diff --git a/tools/perf/builtin-kvm.c b/tools/perf/builtin-kvm.c
>> index bff6664..83ef6c6 100644
>> --- a/tools/perf/builtin-kvm.c
>> +++ b/tools/perf/builtin-kvm.c
>> @@ -1480,6 +1480,60 @@ perf_stat:
>>   }
>>   #endif /* HAVE_KVM_STAT_SUPPORT */
>>
>> +#define PPC_HV_DECREMENTER 2432
>> +#define PPC_HV_BIT 3
>> +#define PPC_PR_BIT 49
>> +#define PPC_MAX 63
>> +
>> +static bool perf_sample__is_hv_dec_trap(struct perf_sample *sample, struct
>> perf_evsel *evsel)
>> +{
>> +    int trap = perf_evsel__intval(evsel, sample, "trap");
>> +    return trap == PPC_HV_DECREMENTER;
>> +}
>> +
>> +static void perf_kvm__munge_ppc_guest_sample(struct perf_evsel *evsel,
>> struct perf_sample *sample)
>> +{
>> +    unsigned long msr, hv, pr;
>> +
>> +    if (!perf_sample__is_hv_dec_trap(sample, evsel))
>> +        return;
>> +
>> +    sample->ip = perf_evsel__intval(evsel, sample, "pc");
>> +    sample->cpumode = PERF_RECORD_MISC_GUEST_KERNEL;
>> +
>> +    msr = perf_evsel__intval(evsel, sample, "msr");
>> +    hv = msr & ((unsigned long)1 << (PPC_MAX - PPC_HV_BIT));
>> +    pr = msr & ((unsigned long)1 << (PPC_MAX - PPC_PR_BIT));
>> +    if (!hv && pr)
>> +        sample->cpumode = PERF_RECORD_MISC_GUEST_USER;
>> +}
>> +
>> +static bool perf_evlist__recorded_on_ppc(const struct perf_evlist *evlist)
>> +{
>> +    if (evlist->env && evlist->env->arch) {
>> +        return !strcmp(evlist->env->arch, "ppc64") ||
>> +               !strcmp(evlist->env->arch, "ppc64le");
>> +    }
>> +    return false;
>> +}
>> +
>> +int perf_kvm__setup_munge_ppc_guest_event(struct perf_evlist *evlist)
>> +{
>> +    struct perf_evsel *evsel;
>> +    const char name[] = "kvm_hv:kvm_guest_exit";
>> +
>> +    if (!perf_evlist__recorded_on_ppc(evlist))
>> +        return 0;
>> +
>> +    evsel = perf_evlist__find_tracepoint_by_name(evlist, name);
>> +    if (evsel == NULL)
>> +        return -1;
>> +
>> +    evsel->munge_sample = perf_kvm__munge_ppc_guest_sample;
>> +
>> +    return 0;
>> +}
>> +
>>   static int __cmd_record(const char *file_name, int argc, const char **argv)
>>   {
>>       int rec_argc, i = 0, j;
>> diff --git a/tools/perf/builtin-report.c b/tools/perf/builtin-report.c
>> index ab47273..7cb41f7 100644
>> --- a/tools/perf/builtin-report.c
>> +++ b/tools/perf/builtin-report.c
>> @@ -879,6 +879,12 @@ repeat:
>>       if (session == NULL)
>>           return -1;
>>
>> +    if (perf_guest &&
>> +        perf_kvm__setup_munge_ppc_guest_event(session->evlist)) {
>> +        pr_err("PPC event not present in %s file\n", input_name);
>> +        goto error;
>> +    }
> This looks out of place, i.e. this reads: "For all cases where there is
> a guest and we can't setup the ppc KVM guest related stuff, its an
> error"
>
> 	I think this gets clearer as:
>
> 	if (perf_guest && perf_evlist__recorded_on_ppc(evlist) &&
> 	    perf_kvm__setup_munge_ppc_guest_event(session->evlist)) {
> 		pr_err("PPC event not present in %s file\n", input_name);
> 		goto error;
> 	}
>
> Then we read this as "Hey, if this was recorded on ppc, try to set
> things up for ppc",

Yes I'll change this.

>   but then again, what is this KVM stuff doing in the
> generic 'perf report' code?

Basically we are checking if data recorded on ppc in perf.data file. Which
can be done after opening a file and mapping header info in evlist. And
evlist is initialized in builtin-record.c only. So, I don't see any 
possibility to
move this in builtin-kvm.c. Kindly guide how can we do it.

> What if this is a perf.data file generated on PPC but being read on PPC?
> This will not make sense to munge it, right?

If you are asking about normal(without kvm) perf record and report, it's
working with this patch. Otherwise can you please explain little bit more.

But yes, we can change this code like this:

     if (perf_guest && perf_evlist__recorded_on_ppc(session->evlist))
         perf_kvm__setup_munge_ppc_guest_event(session->evlist);

and change definition of perf_kvm__setup_munge_ppc_guest_event as:

     void perf_kvm__setup_munge_ppc_guest_event(struct perf_evlist *evlist)
     {
         struct perf_evsel *evsel;
         const char name[] = "kvm_hv:kvm_guest_exit";

         evsel = perf_evlist__find_tracepoint_by_name(evlist, name);
         if (evsel == NULL)
             return;

         evsel->munge_sample = perf_kvm__munge_ppc_guest_sample;
     }

>
> This is with what I remember from this case, please bear with me.

Regards,
Ravi

^ permalink raw reply	[flat|nested] 18+ messages in thread

end of thread, other threads:[~2016-05-09 14:58 UTC | newest]

Thread overview: 18+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2016-02-24  9:07 [RFC 0/4] perf kvm: Guest Symbol Resolution for powerpc Ravi Bangoria
2016-02-24  9:07 ` [RFC 1/4] perf kvm: Enable 'record' on powerpc Ravi Bangoria
2016-03-22 19:12   ` Arnaldo Carvalho de Melo
2016-03-23  2:19     ` Arnaldo Carvalho de Melo
2016-03-24 21:15       ` Arnaldo Carvalho de Melo
2016-03-28 10:58         ` Ravi Bangoria
2016-03-28 12:28           ` Arnaldo Carvalho de Melo
2016-04-27 12:32         ` Ravi Bangoria
2016-04-27 21:47           ` Arnaldo Carvalho de Melo
2016-05-09 14:58             ` Ravi Bangoria
2016-02-24  9:07 ` [RFC 2/4] perf kvm: Introduce evsel as argument to perf_event__preprocess_sample Ravi Bangoria
2016-02-24  9:07 ` [RFC 3/4] perf kvm: Enable 'report' on powerpc Ravi Bangoria
2016-02-24  9:07 ` [RFC 4/4] perf kvm: Fix output fields instead of 'trace' for perf kvm report " Ravi Bangoria
2016-03-02 14:25   ` Arnaldo Carvalho de Melo
2016-03-02 15:46     ` Ravi Bangoria
2016-03-02 16:22       ` Arnaldo Carvalho de Melo
2016-03-03  1:19         ` Ravi Bangoria
2016-03-08 15:42           ` Ravi Bangoria

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.