* [RFC 0/4] tracing,x86_64 - function/graph trace without mcount/-pg/framepointer
@ 2011-02-03 15:42 Jiri Olsa
  2011-02-03 15:42 ` [PATCH 1/4] kprobe - ktrace instruction slot cache interface Jiri Olsa
                   ` (5 more replies)
  0 siblings, 6 replies; 9+ messages in thread
From: Jiri Olsa @ 2011-02-03 15:42 UTC
  To: mingo, rostedt, fweisbec; +Cc: linux-kernel, masami.hiramatsu.pt

hi,

I recently saw the direct jump probing made for kprobes
and tried to use it inside the trace framework.

The global idea is patching the function entry with direct
jump to the trace code, instead of using pregenerated gcc
profile code.
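
Roughly, for a traced function the patching looks like this (a simplified
sketch; the exact layout is what the arch code in patch 3/4 builds, byte
offsets are illustrative only):

	<func>:                      # entry rewritten at runtime
		call  <detour buffer>    # 5 bytes: 0xe8 + rel32
		<rest of func>

	<detour buffer>:             # one buffer per traced symbol
		pushfq
		call  ktrace_callback    # calls into the trace code
		popfq
		addq  $8,%rsp            # drop the return address pushed by
		                         # the call written at the entry
		<copied original entry instructions>
		jmp   <func>+N           # resume past the patched bytes

	whereas -pg/mcount relies on the compiler emitting "call mcount"
	after the prologue of every function.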

I started this just to see if it would be even possible
to hook with new probing to the current trace code. It
appears it's not that bad. I was able to run function
and function_graph trace on x86_64.

For details on direct jumps probe, please check:
http://www.linuxinsight.com/ols2007-djprobe-kernel-probing-with-the-smallest-overhead.html


I realize using this way to hook the functions has some
drawbacks, from what I can see it's roughly:
- not all functions can be patched
- need to find a way to say which function is safe to patch
- memory consumption for detour buffers and symbol records

but it seems there are some advantages as well:
- trace code could be in a module
- no profiling code is needed
- framepointer can be disabled (framepointer is needed for
  generating profile code)


As for the attached implementation, it's mostly a hack (expect bugs);
the ftrace/kprobe integration in particular could probably be done better.
It's x86_64 only.

It can be used like this:

- new menu config item is added (function tracer engine),
  to choose mcount or ktrace
- new file "ktrace" is added to the tracing dir
- to add symbols to trace run:
	echo mutex_unlock > ./ktrace
	echo mutex_lock >> ./ktrace
- to display trace symbols:
	cat ktrace
- to enable the trace, the usual is needed:
	echo function > ./current_tracer
	echo function_graph > ./current_tracer
- to remove symbols from trace:
	echo nop > ./current_tracer 
	echo > ./ktrace 
- if the function is added while the tracer is running,
  the symbol is enabled automatically.
- symbols can only be removed all at once, and only when
  no tracer is running (a complete example session is
  sketched below).
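
For example, a complete session might look like this (sketch only; it
assumes the tracing directory is the usual debugfs mount point, e.g.
/sys/kernel/debug/tracing):

	cd /sys/kernel/debug/tracing
	echo mutex_unlock >  ./ktrace
	echo mutex_lock   >> ./ktrace
	cat ./ktrace                        # lists the two symbols
	echo function_graph > ./current_tracer
	cat ./trace | head                  # graph output for the mutexes
	echo nop > ./current_tracer         # stop the tracer first ...
	echo > ./ktrace                     # ... then drop all symbols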

I'm not sure how to tell via the kallsyms interface which functions
are safe to patch, so I have omitted patching of all symbols so far.


attached patches:
 1/4 - kprobe - ktrace instruction slot cache interface
     using kprobe detour buffer allocation, adding interface
     to use it from trace framework

 2/4 - tracing - adding size parameter to do_ftrace_mod_code
     adding size parameter to be able to restore the saved
     instructions, which could be longer than relative call

 3/4 - ktrace - function trace support
     adding ktrace support with function tracer

 4/4 - ktrace - function graph trace support
     adding function graph support


please let me know what you think, thanks
jirka
---
 Makefile                   |    2 +-
 arch/x86/Kconfig           |    4 +-
 arch/x86/kernel/Makefile   |    1 +
 arch/x86/kernel/entry_64.S |   50 +++++++
 arch/x86/kernel/ftrace.c   |  157 +++++++++++----------
 arch/x86/kernel/ktrace.c   |  256 ++++++++++++++++++++++++++++++++++
 include/linux/ftrace.h     |   36 +++++-
 include/linux/kprobes.h    |    8 +
 kernel/kprobes.c           |   33 +++++
 kernel/trace/Kconfig       |   28 ++++-
 kernel/trace/Makefile      |    1 +
 kernel/trace/ftrace.c      |   21 +++
 kernel/trace/ktrace.c      |  330 ++++++++++++++++++++++++++++++++++++++++++++
 kernel/trace/trace.c       |    1 +
 14 files changed, 846 insertions(+), 82 deletions(-)


* [PATCH 1/4] kprobe - ktrace instruction slot cache interface
  2011-02-03 15:42 [RFC 0/4] tracing,x86_64 - function/graph trace without mcount/-pg/framepointer Jiri Olsa
@ 2011-02-03 15:42 ` Jiri Olsa
  2011-02-03 15:42 ` [PATCH 2/4] tracing - adding size parameter to do_ftrace_mod_code Jiri Olsa
                   ` (4 subsequent siblings)
  5 siblings, 0 replies; 9+ messages in thread
From: Jiri Olsa @ 2011-02-03 15:42 UTC
  To: mingo, rostedt, fweisbec; +Cc: linux-kernel, masami.hiramatsu.pt

using kprobe detour buffer allocation, adding interface
to use it from trace framework

wbr,
jirka
---
 include/linux/kprobes.h |    8 ++++++++
 kernel/kprobes.c        |   33 +++++++++++++++++++++++++++++++++
 2 files changed, 41 insertions(+), 0 deletions(-)

diff --git a/include/linux/kprobes.h b/include/linux/kprobes.h
index dd7c12e..1e984e9 100644
--- a/include/linux/kprobes.h
+++ b/include/linux/kprobes.h
@@ -436,4 +436,12 @@ static inline int enable_jprobe(struct jprobe *jp)
 	return enable_kprobe(&jp->kp);
 }
 
+#ifdef CONFIG_KTRACE
+
+extern kprobe_opcode_t __kprobes *get_ktrace_insn_slot(void);
+extern void __kprobes free_ktrace_insn_slot(kprobe_opcode_t * slot, int dirty);
+extern void __init ktrace_insn_init(int size);
+
+#endif /* CONFIG_KTRACE */
+
 #endif /* _LINUX_KPROBES_H */
diff --git a/kernel/kprobes.c b/kernel/kprobes.c
index 7798181..5bc31d6 100644
--- a/kernel/kprobes.c
+++ b/kernel/kprobes.c
@@ -285,6 +285,39 @@ void __kprobes free_insn_slot(kprobe_opcode_t * slot, int dirty)
 	__free_insn_slot(&kprobe_insn_slots, slot, dirty);
 	mutex_unlock(&kprobe_insn_mutex);
 }
+
+#ifdef CONFIG_KTRACE
+static DEFINE_MUTEX(ktrace_insn_mutex);
+static struct kprobe_insn_cache ktrace_insn_slots = {
+	.pages = LIST_HEAD_INIT(ktrace_insn_slots.pages),
+	.insn_size = MAX_INSN_SIZE,
+	.nr_garbage = 0,
+};
+
+kprobe_opcode_t __kprobes *get_ktrace_insn_slot(void)
+{
+	kprobe_opcode_t *ret = NULL;
+
+	mutex_lock(&ktrace_insn_mutex);
+	ret = __get_insn_slot(&ktrace_insn_slots);
+	mutex_unlock(&ktrace_insn_mutex);
+
+	return ret;
+}
+
+void __kprobes free_ktrace_insn_slot(kprobe_opcode_t * slot, int dirty)
+{
+	mutex_lock(&ktrace_insn_mutex);
+	__free_insn_slot(&ktrace_insn_slots, slot, dirty);
+	mutex_unlock(&ktrace_insn_mutex);
+}
+
+void __init ktrace_insn_init(int size)
+{
+	ktrace_insn_slots.insn_size = size;
+}
+#endif /* CONFIG_KTRACE */
+
 #ifdef CONFIG_OPTPROBES
 /* For optimized_kprobe buffer */
 static DEFINE_MUTEX(kprobe_optinsn_mutex); /* Protects kprobe_optinsn_slots */
-- 
1.7.1
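
The intended usage of this interface can be seen in patch 3/4; condensed
into a sketch, a caller does roughly this (error handling trimmed, ksym
and addr come from the caller's context):

	u8 *insn_templ;

	insn_templ = get_ktrace_insn_slot();	/* allocate a detour buffer */
	if (!insn_templ)
		return -ENOMEM;

	ksym->insn_templ = insn_templ;
	ksym->addr = addr;

	if (ktrace_init_template(ksym)) {	/* fill it in (patch 3/4) */
		free_ktrace_insn_slot(insn_templ, 1);
		return -EINVAL;
	}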



* [PATCH 2/4] tracing - adding size parameter to do_ftrace_mod_code
  2011-02-03 15:42 [RFC 0/4] tracing,x86_64 - function/graph trace without mcount/-pg/framepointer Jiri Olsa
  2011-02-03 15:42 ` [PATCH 1/4] kprobe - ktrace instruction slot cache interface Jiri Olsa
@ 2011-02-03 15:42 ` Jiri Olsa
  2011-02-03 15:42 ` [PATCH 3/4] ktrace - function trace support Jiri Olsa
                   ` (3 subsequent siblings)
  5 siblings, 0 replies; 9+ messages in thread
From: Jiri Olsa @ 2011-02-03 15:42 UTC
  To: mingo, rostedt, fweisbec; +Cc: linux-kernel, masami.hiramatsu.pt

adding size parameter to be able to restore the saved
instructions, which could be longer than relative call

wbr,
jirka
---
 arch/x86/kernel/ftrace.c |   10 ++++++----
 1 files changed, 6 insertions(+), 4 deletions(-)

diff --git a/arch/x86/kernel/ftrace.c b/arch/x86/kernel/ftrace.c
index 382eb29..979ec14 100644
--- a/arch/x86/kernel/ftrace.c
+++ b/arch/x86/kernel/ftrace.c
@@ -124,6 +124,7 @@ static atomic_t nmi_running = ATOMIC_INIT(0);
 static int mod_code_status;		/* holds return value of text write */
 static void *mod_code_ip;		/* holds the IP to write to */
 static void *mod_code_newcode;		/* holds the text to write to the IP */
+static int mod_code_size;		/* holds the size of the new code */
 
 static unsigned nmi_wait_count;
 static atomic_t nmi_update_count = ATOMIC_INIT(0);
@@ -161,7 +162,7 @@ static void ftrace_mod_code(void)
 	 * to succeed, then they all should.
 	 */
 	mod_code_status = probe_kernel_write(mod_code_ip, mod_code_newcode,
-					     MCOUNT_INSN_SIZE);
+					     mod_code_size);
 
 	/* if we fail, then kill any new writers */
 	if (mod_code_status)
@@ -225,7 +226,7 @@ within(unsigned long addr, unsigned long start, unsigned long end)
 }
 
 static int
-do_ftrace_mod_code(unsigned long ip, void *new_code)
+do_ftrace_mod_code(unsigned long ip, void *new_code, int size)
 {
 	/*
 	 * On x86_64, kernel text mappings are mapped read-only with
@@ -240,6 +241,7 @@ do_ftrace_mod_code(unsigned long ip, void *new_code)
 
 	mod_code_ip = (void *)ip;
 	mod_code_newcode = new_code;
+	mod_code_size = size;
 
 	/* The buffers need to be visible before we let NMIs write them */
 	smp_mb();
@@ -290,7 +292,7 @@ ftrace_modify_code(unsigned long ip, unsigned char *old_code,
 		return -EINVAL;
 
 	/* replace the text with the new text */
-	if (do_ftrace_mod_code(ip, new_code))
+	if (do_ftrace_mod_code(ip, new_code, MCOUNT_INSN_SIZE))
 		return -EPERM;
 
 	sync_core();
@@ -361,7 +363,7 @@ static int ftrace_mod_jmp(unsigned long ip,
 
 	*(int *)(&code[1]) = new_offset;
 
-	if (do_ftrace_mod_code(ip, &code))
+	if (do_ftrace_mod_code(ip, &code, MCOUNT_INSN_SIZE))
 		return -EPERM;
 
 	return 0;
-- 
1.7.1



* [PATCH 3/4] ktrace - function trace support
  2011-02-03 15:42 [RFC 0/4] tracing,x86_64 - function/graph trace without mcount/-pg/framepointer Jiri Olsa
  2011-02-03 15:42 ` [PATCH 1/4] kprobe - ktrace instruction slot cache interface Jiri Olsa
  2011-02-03 15:42 ` [PATCH 2/4] tracing - adding size parameter to do_ftrace_mod_code Jiri Olsa
@ 2011-02-03 15:42 ` Jiri Olsa
  2011-02-03 15:42 ` [PATCH 4/4] ktrace - function graph " Jiri Olsa
                   ` (2 subsequent siblings)
  5 siblings, 0 replies; 9+ messages in thread
From: Jiri Olsa @ 2011-02-03 15:42 UTC
  To: mingo, rostedt, fweisbec; +Cc: linux-kernel, masami.hiramatsu.pt

adding ktrace support with function tracer

wbr,
jirka
---
 Makefile                   |    2 +-
 arch/x86/Kconfig           |    2 +-
 arch/x86/kernel/Makefile   |    1 +
 arch/x86/kernel/entry_64.S |   23 +++
 arch/x86/kernel/ftrace.c   |  153 +++++++++++----------
 arch/x86/kernel/ktrace.c   |  256 ++++++++++++++++++++++++++++++++++
 include/linux/ftrace.h     |   36 +++++-
 kernel/trace/Kconfig       |   28 ++++-
 kernel/trace/Makefile      |    1 +
 kernel/trace/ftrace.c      |   11 ++
 kernel/trace/ktrace.c      |  330 ++++++++++++++++++++++++++++++++++++++++++++
 kernel/trace/trace.c       |    1 +
 12 files changed, 764 insertions(+), 80 deletions(-)
 create mode 100644 arch/x86/kernel/ktrace.c
 create mode 100644 kernel/trace/ktrace.c

diff --git a/Makefile b/Makefile
index 66e7e97..26d3d60 100644
--- a/Makefile
+++ b/Makefile
@@ -577,7 +577,7 @@ ifdef CONFIG_DEBUG_INFO_REDUCED
 KBUILD_CFLAGS 	+= $(call cc-option, -femit-struct-debug-baseonly)
 endif
 
-ifdef CONFIG_FUNCTION_TRACER
+ifdef CONFIG_FTRACE_MCOUNT_RECORD
 KBUILD_CFLAGS	+= -pg
 ifdef CONFIG_DYNAMIC_FTRACE
 	ifdef CONFIG_HAVE_C_RECORDMCOUNT
diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig
index 95c36c4..a02718c 100644
--- a/arch/x86/Kconfig
+++ b/arch/x86/Kconfig
@@ -38,7 +38,7 @@ config X86
 	select HAVE_FUNCTION_GRAPH_TRACER
 	select HAVE_FUNCTION_GRAPH_FP_TEST
 	select HAVE_FUNCTION_TRACE_MCOUNT_TEST
-	select HAVE_FTRACE_NMI_ENTER if DYNAMIC_FTRACE
+	select HAVE_FTRACE_NMI_ENTER if DYNAMIC_FTRACE || KTRACE
 	select HAVE_SYSCALL_TRACEPOINTS
 	select HAVE_KVM
 	select HAVE_ARCH_KGDB
diff --git a/arch/x86/kernel/Makefile b/arch/x86/kernel/Makefile
index 34244b2..b664584 100644
--- a/arch/x86/kernel/Makefile
+++ b/arch/x86/kernel/Makefile
@@ -73,6 +73,7 @@ obj-$(CONFIG_X86_TRAMPOLINE)	+= trampoline_$(BITS).o
 obj-$(CONFIG_X86_MPPARSE)	+= mpparse.o
 obj-y				+= apic/
 obj-$(CONFIG_X86_REBOOTFIXUPS)	+= reboot_fixups_32.o
+obj-$(CONFIG_KTRACE)		+= ktrace.o
 obj-$(CONFIG_DYNAMIC_FTRACE)	+= ftrace.o
 obj-$(CONFIG_FUNCTION_GRAPH_TRACER) += ftrace.o
 obj-$(CONFIG_FTRACE_SYSCALLS)	+= ftrace.o
diff --git a/arch/x86/kernel/entry_64.S b/arch/x86/kernel/entry_64.S
index aed1ffb..4d70019 100644
--- a/arch/x86/kernel/entry_64.S
+++ b/arch/x86/kernel/entry_64.S
@@ -62,6 +62,29 @@
 
 	.code64
 #ifdef CONFIG_FUNCTION_TRACER
+#ifdef CONFIG_KTRACE
+ENTRY(ktrace_callback)
+	cmpl $0, function_trace_stop
+	jne  ftrace_stub
+
+	cmpq $ftrace_stub, ftrace_trace_function
+	jnz ktrace_trace
+	retq
+
+ktrace_trace:
+	MCOUNT_SAVE_FRAME
+
+	movq 0x48(%rsp), %rdi
+	movq 0x50(%rsp), %rsi
+
+	call   *ftrace_trace_function
+
+	MCOUNT_RESTORE_FRAME
+
+	retq
+END(ktrace_callback)
+#endif /* CONFIG_KTRACE */
+
 #ifdef CONFIG_DYNAMIC_FTRACE
 ENTRY(mcount)
 	retq
diff --git a/arch/x86/kernel/ftrace.c b/arch/x86/kernel/ftrace.c
index 979ec14..ffa87f9 100644
--- a/arch/x86/kernel/ftrace.c
+++ b/arch/x86/kernel/ftrace.c
@@ -29,67 +29,7 @@
 #include <asm/nmi.h>
 
 
-#ifdef CONFIG_DYNAMIC_FTRACE
-
-/*
- * modifying_code is set to notify NMIs that they need to use
- * memory barriers when entering or exiting. But we don't want
- * to burden NMIs with unnecessary memory barriers when code
- * modification is not being done (which is most of the time).
- *
- * A mutex is already held when ftrace_arch_code_modify_prepare
- * and post_process are called. No locks need to be taken here.
- *
- * Stop machine will make sure currently running NMIs are done
- * and new NMIs will see the updated variable before we need
- * to worry about NMIs doing memory barriers.
- */
-static int modifying_code __read_mostly;
-static DEFINE_PER_CPU(int, save_modifying_code);
-
-int ftrace_arch_code_modify_prepare(void)
-{
-	set_kernel_text_rw();
-	set_all_modules_text_rw();
-	modifying_code = 1;
-	return 0;
-}
-
-int ftrace_arch_code_modify_post_process(void)
-{
-	modifying_code = 0;
-	set_all_modules_text_ro();
-	set_kernel_text_ro();
-	return 0;
-}
-
-union ftrace_code_union {
-	char code[MCOUNT_INSN_SIZE];
-	struct {
-		char e8;
-		int offset;
-	} __attribute__((packed));
-};
-
-static int ftrace_calc_offset(long ip, long addr)
-{
-	return (int)(addr - ip);
-}
-
-static unsigned char *ftrace_call_replace(unsigned long ip, unsigned long addr)
-{
-	static union ftrace_code_union calc;
-
-	calc.e8		= 0xe8;
-	calc.offset	= ftrace_calc_offset(ip + MCOUNT_INSN_SIZE, addr);
-
-	/*
-	 * No locking needed, this must be called via kstop_machine
-	 * which in essence is like running on a uniprocessor machine.
-	 */
-	return calc.code;
-}
-
+#if defined(CONFIG_DYNAMIC_FTRACE) || defined(CONFIG_KTRACE)
 /*
  * Modifying code must take extra care. On an SMP machine, if
  * the code being modified is also being executed on another CPU
@@ -129,15 +69,21 @@ static int mod_code_size;		/* holds the size of the new code */
 static unsigned nmi_wait_count;
 static atomic_t nmi_update_count = ATOMIC_INIT(0);
 
-int ftrace_arch_read_dyn_info(char *buf, int size)
-{
-	int r;
-
-	r = snprintf(buf, size, "%u %u",
-		     nmi_wait_count,
-		     atomic_read(&nmi_update_count));
-	return r;
-}
+/*
+ * modifying_code is set to notify NMIs that they need to use
+ * memory barriers when entering or exiting. But we don't want
+ * to burden NMIs with unnecessary memory barriers when code
+ * modification is not being done (which is most of the time).
+ *
+ * A mutex is already held when ftrace_arch_code_modify_prepare
+ * and post_process are called. No locks need to be taken here.
+ *
+ * Stop machine will make sure currently running NMIs are done
+ * and new NMIs will see the updated variable before we need
+ * to worry about NMIs doing memory barriers.
+ */
+static int modifying_code __read_mostly;
+static DEFINE_PER_CPU(int, save_modifying_code);
 
 static void clear_mod_flag(void)
 {
@@ -226,7 +172,7 @@ within(unsigned long addr, unsigned long start, unsigned long end)
 }
 
 static int
-do_ftrace_mod_code(unsigned long ip, void *new_code, int size)
+__do_ftrace_mod_code(unsigned long ip, void *new_code, int size)
 {
 	/*
 	 * On x86_64, kernel text mappings are mapped read-only with
@@ -262,6 +208,67 @@ do_ftrace_mod_code(unsigned long ip, void *new_code, int size)
 	return mod_code_status;
 }
 
+int do_ftrace_mod_code(unsigned long ip, void *new_code, int size)
+{
+	return __do_ftrace_mod_code(ip, new_code, size);
+}
+
+int ftrace_arch_code_modify_post_process(void)
+{
+	modifying_code = 0;
+	set_all_modules_text_ro();
+	set_kernel_text_ro();
+	return 0;
+}
+
+int ftrace_arch_code_modify_prepare(void)
+{
+	set_kernel_text_rw();
+	set_all_modules_text_rw();
+	modifying_code = 1;
+	return 0;
+}
+
+#endif
+
+#ifdef CONFIG_DYNAMIC_FTRACE
+int ftrace_arch_read_dyn_info(char *buf, int size)
+{
+	int r;
+
+	r = snprintf(buf, size, "%u %u",
+		     nmi_wait_count,
+		     atomic_read(&nmi_update_count));
+	return r;
+}
+
+union ftrace_code_union {
+	char code[MCOUNT_INSN_SIZE];
+	struct {
+		char e8;
+		int offset;
+	} __attribute__((packed));
+};
+
+static int ftrace_calc_offset(long ip, long addr)
+{
+	return (int)(addr - ip);
+}
+
+static unsigned char *ftrace_call_replace(unsigned long ip, unsigned long addr)
+{
+	static union ftrace_code_union calc;
+
+	calc.e8		= 0xe8;
+	calc.offset	= ftrace_calc_offset(ip + MCOUNT_INSN_SIZE, addr);
+
+	/*
+	 * No locking needed, this must be called via kstop_machine
+	 * which in essence is like running on a uniprocessor machine.
+	 */
+	return calc.code;
+}
+
 static unsigned char *ftrace_nop_replace(void)
 {
 	return ideal_nop5;
@@ -292,7 +299,7 @@ ftrace_modify_code(unsigned long ip, unsigned char *old_code,
 		return -EINVAL;
 
 	/* replace the text with the new text */
-	if (do_ftrace_mod_code(ip, new_code, MCOUNT_INSN_SIZE))
+	if (__do_ftrace_mod_code(ip, new_code, MCOUNT_INSN_SIZE))
 		return -EPERM;
 
 	sync_core();
@@ -363,7 +370,7 @@ static int ftrace_mod_jmp(unsigned long ip,
 
 	*(int *)(&code[1]) = new_offset;
 
-	if (do_ftrace_mod_code(ip, &code, MCOUNT_INSN_SIZE))
+	if (__do_ftrace_mod_code(ip, &code, MCOUNT_INSN_SIZE))
 		return -EPERM;
 
 	return 0;
diff --git a/arch/x86/kernel/ktrace.c b/arch/x86/kernel/ktrace.c
new file mode 100644
index 0000000..2bfaa77
--- /dev/null
+++ b/arch/x86/kernel/ktrace.c
@@ -0,0 +1,256 @@
+
+#include <linux/module.h>
+#include <linux/kernel.h>
+#include <linux/ftrace.h>
+#include <asm/insn.h>
+#include <asm/nops.h>
+#include <linux/kprobes.h>
+
+static void __used ktrace_template_holder(void)
+{
+	asm volatile (
+		".global ktrace_template_entry \n"
+		"ktrace_template_entry: \n"
+		"	pushfq \n"
+
+		".global ktrace_template_call \n"
+		"ktrace_template_call: \n"
+		ASM_NOP5
+
+		"	popfq \n"
+		/* eat ret value */
+		"	addq $8, %rsp \n"
+		".global ktrace_template_end \n"
+		"ktrace_template_end: \n"
+	);
+}
+
+extern u8 ktrace_template_entry;
+extern u8 ktrace_template_end;
+extern u8 ktrace_template_call;
+
+extern void ktrace_callback(void);
+
+#define TMPL_CALL_IDX \
+        ((long)&ktrace_template_call - (long)&ktrace_template_entry)
+
+#define TMPL_END_IDX \
+        ((long)&ktrace_template_end - (long)&ktrace_template_entry)
+
+#define RELATIVECALL_SIZE 5
+#define RELATIVE_ADDR_SIZE 4
+#define RELATIVECALL_OPCODE 0xe8
+#define RELATIVEJUMP_OPCODE 0xe9
+#define MAX_OPTIMIZED_LENGTH (MAX_INSN_SIZE + RELATIVE_ADDR_SIZE)
+
+#define MAX_KTRACE_INSN_SIZE                          \
+	(((unsigned long)&ktrace_template_end -       \
+	  (unsigned long)&ktrace_template_entry) +    \
+	MAX_OPTIMIZED_LENGTH + RELATIVECALL_SIZE)
+
+#define W(row, b0, b1, b2, b3, b4, b5, b6, b7, b8, b9, ba, bb, bc, bd, be, bf)\
+	(((b0##UL << 0x0)|(b1##UL << 0x1)|(b2##UL << 0x2)|(b3##UL << 0x3) |   \
+	  (b4##UL << 0x4)|(b5##UL << 0x5)|(b6##UL << 0x6)|(b7##UL << 0x7) |   \
+	  (b8##UL << 0x8)|(b9##UL << 0x9)|(ba##UL << 0xa)|(bb##UL << 0xb) |   \
+	  (bc##UL << 0xc)|(bd##UL << 0xd)|(be##UL << 0xe)|(bf##UL << 0xf))    \
+	 << (row % 32))
+	/*
+	 * Undefined/reserved opcodes, conditional jump, Opcode Extension
+	 * Groups, and some special opcodes can not boost.
+	 */
+static const u32 twobyte_is_boostable[256 / 32] = {
+	/*      0  1  2  3  4  5  6  7  8  9  a  b  c  d  e  f          */
+	/*      ----------------------------------------------          */
+	W(0x00, 0, 0, 1, 1, 0, 0, 1, 0, 1, 1, 0, 0, 0, 0, 0, 0) | /* 00 */
+	W(0x10, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0) , /* 10 */
+	W(0x20, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0) | /* 20 */
+	W(0x30, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0) , /* 30 */
+	W(0x40, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1) | /* 40 */
+	W(0x50, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0) , /* 50 */
+	W(0x60, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 0, 1, 1) | /* 60 */
+	W(0x70, 0, 0, 0, 0, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0, 1, 1) , /* 70 */
+	W(0x80, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0) | /* 80 */
+	W(0x90, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1) , /* 90 */
+	W(0xa0, 1, 1, 0, 1, 1, 1, 0, 0, 1, 1, 0, 1, 1, 1, 0, 1) | /* a0 */
+	W(0xb0, 1, 1, 1, 1, 1, 1, 1, 1, 0, 0, 0, 1, 1, 1, 1, 1) , /* b0 */
+	W(0xc0, 1, 1, 0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1, 1, 1) | /* c0 */
+	W(0xd0, 0, 1, 1, 1, 0, 1, 0, 0, 1, 1, 0, 1, 1, 1, 0, 1) , /* d0 */
+	W(0xe0, 0, 1, 1, 0, 0, 1, 0, 0, 1, 1, 0, 1, 1, 1, 0, 1) | /* e0 */
+	W(0xf0, 0, 1, 1, 1, 0, 1, 0, 0, 1, 1, 1, 0, 1, 1, 1, 0)   /* f0 */
+	/*      -----------------------------------------------         */
+	/*      0  1  2  3  4  5  6  7  8  9  a  b  c  d  e  f          */
+};
+#undef W
+
+static int __copy_instruction(u8 *dest, u8 *src)
+{
+	struct insn insn;
+
+	kernel_insn_init(&insn, src);
+	insn_get_length(&insn);
+	memcpy(dest, insn.kaddr, insn.length);
+
+#ifdef CONFIG_X86_64
+	if (insn_rip_relative(&insn)) {
+		s64 newdisp;
+		u8 *disp;
+		kernel_insn_init(&insn, dest);
+		insn_get_displacement(&insn);
+		/*
+		 * The copied instruction uses the %rip-relative addressing
+		 * mode.  Adjust the displacement for the difference between
+		 * the original location of this instruction and the location
+		 * of the copy that will actually be run.  The tricky bit here
+		 * is making sure that the sign extension happens correctly in
+		 * this calculation, since we need a signed 32-bit result to
+		 * be sign-extended to 64 bits when it's added to the %rip
+		 * value and yield the same 64-bit result that the sign-
+		 * extension of the original signed 32-bit displacement would
+		 * have given.
+		 */
+		newdisp = (u8 *) src + (s64) insn.displacement.value -
+			  (u8 *) dest;
+		BUG_ON((s64) (s32) newdisp != newdisp); /* Sanity check.  */
+		disp = (u8 *) dest + insn_offset_displacement(&insn);
+		*(s32 *) disp = (s32) newdisp;
+	}
+#endif
+	return insn.length;
+}
+
+static int can_boost(u8 *opcodes)
+{
+	u8 opcode;
+	u8 *orig_opcodes = opcodes;
+
+	if (search_exception_tables((unsigned long)opcodes))
+		return 0;	/* Page fault may occur on this address. */
+
+retry:
+	if (opcodes - orig_opcodes > MAX_INSN_SIZE - 1)
+		return 0;
+	opcode = *(opcodes++);
+
+	/* 2nd-byte opcode */
+	if (opcode == 0x0f) {
+		if (opcodes - orig_opcodes > MAX_INSN_SIZE - 1)
+			return 0;
+		return test_bit(*opcodes,
+				(unsigned long *)twobyte_is_boostable);
+	}
+
+	switch (opcode & 0xf0) {
+#ifdef CONFIG_X86_64
+	case 0x40:
+		goto retry; /* REX prefix is boostable */
+#endif
+	case 0x60:
+		if (0x63 < opcode && opcode < 0x67)
+			goto retry; /* prefixes */
+		/* can't boost Address-size override and bound */
+		return (opcode != 0x62 && opcode != 0x67);
+	case 0x70:
+		return 0; /* can't boost conditional jump */
+	case 0xc0:
+		/* can't boost software-interruptions */
+		return (0xc1 < opcode && opcode < 0xcc) || opcode == 0xcf;
+	case 0xd0:
+		/* can boost AA* and XLAT */
+		return (opcode == 0xd4 || opcode == 0xd5 || opcode == 0xd7);
+	case 0xe0:
+		/* can boost in/out and absolute jmps */
+		return ((opcode & 0x04) || opcode == 0xea);
+	case 0xf0:
+		if ((opcode & 0x0c) == 0 && opcode != 0xf1)
+			goto retry; /* lock/rep(ne) prefix */
+		/* clear and set flags are boostable */
+		return (opcode == 0xf5 || (0xf7 < opcode && opcode < 0xfe));
+	default:
+		/* segment override prefixes are boostable */
+		if (opcode == 0x26 || opcode == 0x36 || opcode == 0x3e)
+			goto retry; /* prefixes */
+		/* CS override prefix and call are not boostable */
+		return (opcode != 0x2e && opcode != 0x9a);
+	}
+}
+
+static int copy_instructions(u8 *dest, u8 *src)
+{
+	int len = 0, ret;
+
+	while (len < RELATIVECALL_SIZE) {
+		ret = __copy_instruction(dest + len, src + len);
+		if (!ret || !can_boost(dest + len))
+			return -EINVAL;
+		len += ret;
+	}
+
+	return len;
+}
+
+static void synthesize_relative_insn(u8 *buf, void *from, void *to, u8 op)
+{
+	struct __arch_relative_insn {
+		u8 op;
+		s32 raddr;
+	} __attribute__((packed)) *insn;
+
+	insn = (struct __arch_relative_insn *) buf;
+	insn->raddr = (s32)((long)(to) - ((long)(from) + 5));
+	insn->op = op;
+}
+
+void ktrace_enable_sym(struct ktrace_symbol *ksym)
+{
+	u8 call_buf[RELATIVECALL_SIZE];
+
+	synthesize_relative_insn(call_buf,
+				 ksym->addr,
+				 ksym->insn_templ,
+				 RELATIVECALL_OPCODE);
+
+	do_ftrace_mod_code((unsigned long) ksym->addr,
+			   call_buf, RELATIVECALL_SIZE);
+	ksym->enabled = 1;
+}
+
+void ktrace_disable_sym(struct ktrace_symbol *ksym)
+{
+	do_ftrace_mod_code((unsigned long) ksym->addr,
+			   ksym->insn_saved,
+			   ksym->insn_saved_size);
+	ksym->enabled = 0;
+}
+
+int ktrace_init_template(struct ktrace_symbol *ksym)
+{
+	u8* insn_templ = ksym->insn_templ;
+	u8 *addr = ksym->addr;
+	int size;
+
+	size = copy_instructions(insn_templ + TMPL_END_IDX, addr);
+	if (size < 0)
+		return -EINVAL;
+
+	memcpy(insn_templ, &ktrace_template_entry, TMPL_END_IDX);
+
+	synthesize_relative_insn(insn_templ + TMPL_END_IDX + size,
+				 insn_templ + TMPL_END_IDX + size,
+				 addr + size,
+				 RELATIVEJUMP_OPCODE);
+
+	synthesize_relative_insn(insn_templ + TMPL_CALL_IDX,
+				 insn_templ + TMPL_CALL_IDX,
+				 ktrace_callback,
+				 RELATIVECALL_OPCODE);
+
+	ksym->insn_saved = insn_templ + TMPL_END_IDX;
+	ksym->insn_saved_size = size;
+	return 0;
+}
+
+int __init ktrace_arch_init(void)
+{
+	ktrace_insn_init(MAX_KTRACE_INSN_SIZE);
+	return 0;
+}
diff --git a/include/linux/ftrace.h b/include/linux/ftrace.h
index dcd6a7c..11c3d5b 100644
--- a/include/linux/ftrace.h
+++ b/include/linux/ftrace.h
@@ -116,9 +116,6 @@ struct ftrace_func_command {
 
 #ifdef CONFIG_DYNAMIC_FTRACE
 
-int ftrace_arch_code_modify_prepare(void);
-int ftrace_arch_code_modify_post_process(void);
-
 struct seq_file;
 
 struct ftrace_probe_ops {
@@ -530,4 +527,37 @@ unsigned long arch_syscall_addr(int nr);
 
 #endif /* CONFIG_FTRACE_SYSCALLS */
 
+#ifdef CONFIG_KTRACE
+enum {
+	KTRACE_ENABLE,
+	KTRACE_DISABLE
+};
+
+struct ktrace_symbol {
+        struct list_head list;
+        int enabled;
+
+        u8 *addr;
+        u8 *insn_templ;
+        u8 *insn_saved;
+        int insn_saved_size;
+};
+
+extern void ktrace_init(void);
+extern int ktrace_init_template(struct ktrace_symbol *ksym);
+extern int ktrace_arch_init(void);
+extern void ktrace_startup(void);
+extern void ktrace_shutdown(void);
+extern void ktrace_enable_sym(struct ktrace_symbol *ksym);
+extern void ktrace_disable_sym(struct ktrace_symbol *ksym);
+#else
+static inline void ktrace_init(void) {}
+#endif /* CONFIG_KTRACE */
+
+#if defined CONFIG_DYNAMIC_FTRACE || defined CONFIG_KTRACE
+extern int do_ftrace_mod_code(unsigned long ip, void *new_code, int size);
+extern int ftrace_arch_code_modify_prepare(void);
+extern int ftrace_arch_code_modify_post_process(void);
+#endif
+
 #endif /* _LINUX_FTRACE_H */
diff --git a/kernel/trace/Kconfig b/kernel/trace/Kconfig
index 14674dc..1cf0aba 100644
--- a/kernel/trace/Kconfig
+++ b/kernel/trace/Kconfig
@@ -140,8 +140,6 @@ if FTRACE
 
 config FUNCTION_TRACER
 	bool "Kernel Function Tracer"
-	depends on HAVE_FUNCTION_TRACER
-	select FRAME_POINTER if !ARM_UNWIND && !S390
 	select KALLSYMS
 	select GENERIC_TRACER
 	select CONTEXT_SWITCH_TRACER
@@ -168,6 +166,30 @@ config FUNCTION_GRAPH_TRACER
 	  the return value. This is done by setting the current return
 	  address on the current task structure into a stack of calls.
 
+config KTRACE
+	bool
+	depends on FTRACER_ENG_KTRACE
+
+choice
+	prompt "Function trace engine"
+	default FTRACER_ENG_MCOUNT_RECORD
+	depends on FUNCTION_TRACER
+
+config FTRACER_ENG_MCOUNT_RECORD
+	bool "mcount"
+	depends on HAVE_FUNCTION_TRACER
+	select FRAME_POINTER if !ARM_UNWIND && !S390
+	help
+	  standard -pg mcount record generation
+
+config FTRACER_ENG_KTRACE
+	bool "ktrace"
+	select KTRACE
+	help
+	  dynamic call probes
+
+endchoice
+
 
 config IRQSOFF_TRACER
 	bool "Interrupts-off Latency Tracer"
@@ -389,6 +411,7 @@ config DYNAMIC_FTRACE
 	bool "enable/disable ftrace tracepoints dynamically"
 	depends on FUNCTION_TRACER
 	depends on HAVE_DYNAMIC_FTRACE
+	depends on FTRACER_ENG_MCOUNT_RECORD
 	default y
 	help
           This option will modify all the calls to ftrace dynamically
@@ -422,6 +445,7 @@ config FTRACE_MCOUNT_RECORD
 	def_bool y
 	depends on DYNAMIC_FTRACE
 	depends on HAVE_FTRACE_MCOUNT_RECORD
+	depends on FTRACER_ENG_MCOUNT_RECORD
 
 config FTRACE_SELFTEST
 	bool
diff --git a/kernel/trace/Makefile b/kernel/trace/Makefile
index 761c510..f557200 100644
--- a/kernel/trace/Makefile
+++ b/kernel/trace/Makefile
@@ -21,6 +21,7 @@ endif
 #
 obj-y += trace_clock.o
 
+obj-$(CONFIG_KTRACE) += ktrace.o
 obj-$(CONFIG_FUNCTION_TRACER) += libftrace.o
 obj-$(CONFIG_RING_BUFFER) += ring_buffer.o
 obj-$(CONFIG_RING_BUFFER_BENCHMARK) += ring_buffer_benchmark.o
diff --git a/kernel/trace/ftrace.c b/kernel/trace/ftrace.c
index f3dadae..762e2b3 100644
--- a/kernel/trace/ftrace.c
+++ b/kernel/trace/ftrace.c
@@ -3152,7 +3152,12 @@ int register_ftrace_function(struct ftrace_ops *ops)
 	mutex_lock(&ftrace_lock);
 
 	ret = __register_ftrace_function(ops);
+
+#ifdef CONFIG_KTRACE
+	ktrace_startup();
+#else
 	ftrace_startup(0);
+#endif
 
 	mutex_unlock(&ftrace_lock);
 	return ret;
@@ -3170,7 +3175,13 @@ int unregister_ftrace_function(struct ftrace_ops *ops)
 
 	mutex_lock(&ftrace_lock);
 	ret = __unregister_ftrace_function(ops);
+
+#ifdef CONFIG_KTRACE
+	ktrace_shutdown();
+#else
 	ftrace_shutdown(0);
+#endif
+
 	mutex_unlock(&ftrace_lock);
 
 	return ret;
diff --git a/kernel/trace/ktrace.c b/kernel/trace/ktrace.c
new file mode 100644
index 0000000..3e45e2c
--- /dev/null
+++ b/kernel/trace/ktrace.c
@@ -0,0 +1,330 @@
+
+#include <linux/module.h>
+#include <linux/kernel.h>
+#include <linux/kallsyms.h>
+#include <linux/ctype.h>
+#include <linux/slab.h>
+#include <linux/kprobes.h>
+#include <linux/slab.h>
+#include <linux/stop_machine.h>
+
+#include "trace.h"
+
+static DEFINE_MUTEX(symbols_mutex);
+static LIST_HEAD(symbols);
+
+static struct kmem_cache *symbols_cache;
+static int ktrace_disabled;
+static int ktrace_enabled;
+
+static void ktrace_enable_all(void);
+
+static struct ktrace_symbol* ktrace_find_symbol(u8 *addr)
+{
+	struct ktrace_symbol *ksym, *found = NULL;
+
+	mutex_lock(&symbols_mutex);
+
+	list_for_each_entry(ksym, &symbols, list) {
+		if (ksym->addr == addr) {
+			found = ksym;
+			break;
+		}
+	}
+
+	mutex_unlock(&symbols_mutex);
+	return found;
+}
+
+static int ktrace_unregister_symbol(struct ktrace_symbol *ksym)
+{
+	free_ktrace_insn_slot(ksym->insn_templ, 1);
+	kmem_cache_free(symbols_cache, ksym);
+	return 0;
+}
+
+static int ktrace_unregister_all_symbols(void)
+{
+	struct ktrace_symbol *ksym, *n;
+
+	if (ktrace_enabled)
+		return -EINVAL;
+
+	mutex_lock(&symbols_mutex);
+
+	list_for_each_entry_safe(ksym, n, &symbols, list) {
+		list_del(&ksym->list);
+		ktrace_unregister_symbol(ksym);
+	}
+
+	mutex_unlock(&symbols_mutex);
+	return 0;
+}
+
+static int ktrace_register_symbol(char *symbol)
+{
+	struct ktrace_symbol *ksym;
+	u8 *addr, *insn_templ;
+	int ret = -ENOMEM;
+
+	/* Is it really symbol address. */
+	addr = (void*) kallsyms_lookup_name(symbol);
+	if (!addr)
+		return -EINVAL;
+
+	/* Is it already registered. */
+	if (ktrace_find_symbol(addr))
+		return -EINVAL;
+
+	/* Register new symbol. */
+	ksym = kmem_cache_zalloc(symbols_cache, GFP_KERNEL);
+	if (!ksym)
+		return -ENOMEM;
+
+	insn_templ = get_ktrace_insn_slot();
+	if (!insn_templ)
+		goto err_release_ksym;
+
+	ksym->insn_templ = insn_templ;
+	ksym->addr = addr;
+
+	ret = ktrace_init_template(ksym);
+	if (ret)
+		goto err_release_insn;
+
+	mutex_lock(&symbols_mutex);
+	list_add(&ksym->list, &symbols);
+	mutex_unlock(&symbols_mutex);
+
+	return 0;
+
+ err_release_insn:
+	free_ktrace_insn_slot(insn_templ, 1);
+
+ err_release_ksym:
+	kmem_cache_free(symbols_cache, ksym);
+
+	return ret;
+}
+
+static inline int
+within(unsigned long addr, unsigned long start, unsigned long end)
+{
+	return addr >= start && addr < end;
+}
+
+static int ktrace_symbol(void *data, const char *symbol,
+		  struct module *mod, unsigned long addr)
+{
+	if (!within(addr, (unsigned long)_text, (unsigned long)_etext))
+		return 0;
+
+	ktrace_register_symbol((char*) symbol);
+	return 0;
+}
+
+static int ktrace_register_all(void)
+{
+	printk("not supported\n");
+	return 0;
+
+	kallsyms_on_each_symbol(ktrace_symbol, NULL);
+	return 0;
+}
+
+static void *ktrace_start(struct seq_file *m, loff_t *pos)
+{
+	mutex_lock(&symbols_mutex);
+
+	if (list_empty(&symbols) && (!*pos))
+		return (void *) 1;
+
+	return seq_list_start(&symbols, *pos);
+}
+
+static void *ktrace_next(struct seq_file *m, void *v, loff_t *pos)
+{
+	if (v == (void *)1)
+		return NULL;
+
+	return seq_list_next(v, &symbols, pos);
+}
+
+static void ktrace_stop(struct seq_file *m, void *p)
+{
+	mutex_unlock(&symbols_mutex);
+}
+
+static int ktrace_show(struct seq_file *m, void *v)
+{
+	const struct ktrace_symbol *ksym = list_entry(v, struct ktrace_symbol, list);
+
+	if (v == (void *)1) {
+		seq_printf(m, "no symbol\n");
+		return 0;
+	}
+
+	seq_printf(m, "%ps\n", ksym->addr);
+	return 0;
+}
+
+static const struct seq_operations ktrace_sops = {
+        .start = ktrace_start,
+        .next = ktrace_next,
+        .stop = ktrace_stop,
+        .show = ktrace_show,
+};
+
+static int
+ktrace_open(struct inode *inode, struct file *file)
+{
+	int ret = 0;
+
+	if ((file->f_mode & FMODE_WRITE) &&
+	    (file->f_flags & O_TRUNC))
+		ktrace_unregister_all_symbols();
+
+	if (file->f_mode & FMODE_READ)
+		ret = seq_open(file, &ktrace_sops);
+
+	return ret;
+}
+
+static ssize_t
+ktrace_write(struct file *filp, const char __user *ubuf,
+                      size_t cnt, loff_t *ppos)
+{
+#define SYMMAX 50
+	char symbol[SYMMAX];
+	int ret, i;
+
+	if (cnt >= SYMMAX)
+		return -EINVAL;
+
+	if (copy_from_user(&symbol, ubuf, cnt))
+		return -EFAULT;
+
+	symbol[cnt] = 0;
+
+	for (i = cnt - 1;
+	     i >= 0 && (isspace(symbol[i]) || (symbol[i] == '\n')); i--)
+		symbol[i] = 0;
+
+	if (!symbol[0])
+		return cnt;
+
+	if (!strcmp(symbol, "all"))
+		ret = ktrace_register_all();
+	else
+		ret = ktrace_register_symbol(symbol);
+
+	if (ret)
+		return ret;
+
+	if (ktrace_enabled)
+		ktrace_startup();
+
+	return ret ? ret : cnt;
+}
+
+static const struct file_operations ktrace_fops = {
+	.open           = ktrace_open,
+	.read           = seq_read,
+	.llseek         = seq_lseek,
+	.write          = ktrace_write,
+};
+
+static void ktrace_enable_all(void)
+{
+	struct ktrace_symbol *ksym;
+
+	list_for_each_entry(ksym, &symbols, list) {
+		if (ksym->enabled)
+			continue;
+
+		ktrace_enable_sym(ksym);
+	}
+
+	ktrace_enabled = 1;
+}
+
+static void ktrace_disable_all(void)
+{
+	struct ktrace_symbol *ksym;
+
+	list_for_each_entry(ksym, &symbols, list) {
+		if (!ksym->enabled)
+			continue;
+
+		ktrace_disable_sym(ksym);
+	}
+
+	ktrace_enabled = 0;
+}
+
+static int __ktrace_modify_code(void *data)
+{
+	int *command = data;
+
+	if (*command == KTRACE_ENABLE)
+		ktrace_enable_all();
+
+	if (*command == KTRACE_DISABLE)
+		ktrace_disable_all();
+
+	return 0;
+}
+
+#define FTRACE_WARN_ON(cond)	\
+do {				\
+	if (WARN_ON(cond))	\
+	ftrace_kill();		\
+} while (0)
+
+static void ktrace_run_update_code(int command)
+{
+	int ret;
+
+	if (ktrace_disabled)
+		return;
+
+	ret = ftrace_arch_code_modify_prepare();
+	FTRACE_WARN_ON(ret);
+	if (ret)
+		return;
+
+	stop_machine(__ktrace_modify_code, &command, NULL);
+
+	ret = ftrace_arch_code_modify_post_process();
+	FTRACE_WARN_ON(ret);
+}
+
+void ktrace_startup(void)
+{
+	ktrace_run_update_code(KTRACE_ENABLE);
+}
+
+void ktrace_shutdown(void)
+{
+	ktrace_run_update_code(KTRACE_DISABLE);
+}
+
+void __init ktrace_init(void)
+{
+	struct dentry *d_tracer = tracing_init_dentry();
+
+	trace_create_file("ktrace", 0644, d_tracer,
+			NULL, &ktrace_fops);
+
+	symbols_cache = KMEM_CACHE(ktrace_symbol, 0);
+	if (!symbols_cache) {
+		printk("ktrace disabled - kmem cache allocation failed\n");
+		ktrace_disabled = 1;
+		return;
+	}
+
+	ktrace_arch_init();
+	printk("ktrace initialized\n");
+}
+
+MODULE_LICENSE("GPL");
diff --git a/kernel/trace/trace.c b/kernel/trace/trace.c
index dc53ecb..b901c94 100644
--- a/kernel/trace/trace.c
+++ b/kernel/trace/trace.c
@@ -4361,6 +4361,7 @@ static __init int tracer_init_debugfs(void)
 	for_each_tracing_cpu(cpu)
 		tracing_init_debugfs_percpu(cpu);
 
+	ktrace_init();
 	return 0;
 }
 
-- 
1.7.1
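
To summarize what the above ends up building for each traced symbol (a
sketch; the exact offsets depend on how many instructions had to be copied
to cover the 5-byte call):

	ksym->insn_templ:                   # allocated from the ktrace
		pushfq                          # insn slot cache (patch 1/4)
		call  ktrace_callback           # written over the ASM_NOP5 at
		                                # TMPL_CALL_IDX
		popfq
		addq  $8,%rsp                   # drop the return address pushed
		                                # by the call at the function entry
		<copied original instructions>  # >= RELATIVECALL_SIZE bytes,
		                                # RIP-relative displacements fixed
		                                # up by __copy_instruction()
		jmp   <addr>+size               # back into the original function

	and ktrace_enable_sym() then rewrites the function entry at ksym->addr
	to "call ksym->insn_templ" via do_ftrace_mod_code().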



* [PATCH 4/4] ktrace - function graph trace support
  2011-02-03 15:42 [RFC 0/4] tracing,x86_64 - function/graph trace without mcount/-pg/framepointer Jiri Olsa
                   ` (2 preceding siblings ...)
  2011-02-03 15:42 ` [PATCH 3/4] ktrace - function trace support Jiri Olsa
@ 2011-02-03 15:42 ` Jiri Olsa
  2011-02-03 16:33 ` [RFC 0/4] tracing,x86_64 - function/graph trace without mcount/-pg/framepointer Steven Rostedt
  2011-02-04  6:03 ` Masami Hiramatsu
  5 siblings, 0 replies; 9+ messages in thread
From: Jiri Olsa @ 2011-02-03 15:42 UTC
  To: mingo, rostedt, fweisbec; +Cc: linux-kernel, masami.hiramatsu.pt

adding function graph support

wbr,
jirka
---
 arch/x86/Kconfig           |    2 +-
 arch/x86/kernel/entry_64.S |   27 +++++++++++++++++++++++++++
 kernel/trace/ftrace.c      |   10 ++++++++++
 3 files changed, 38 insertions(+), 1 deletions(-)

diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig
index a02718c..befe1e0 100644
--- a/arch/x86/Kconfig
+++ b/arch/x86/Kconfig
@@ -36,7 +36,7 @@ config X86
 	select HAVE_DYNAMIC_FTRACE
 	select HAVE_FUNCTION_TRACER
 	select HAVE_FUNCTION_GRAPH_TRACER
-	select HAVE_FUNCTION_GRAPH_FP_TEST
+	select HAVE_FUNCTION_GRAPH_FP_TEST if !KTRACE
 	select HAVE_FUNCTION_TRACE_MCOUNT_TEST
 	select HAVE_FTRACE_NMI_ENTER if DYNAMIC_FTRACE || KTRACE
 	select HAVE_SYSCALL_TRACEPOINTS
diff --git a/arch/x86/kernel/entry_64.S b/arch/x86/kernel/entry_64.S
index 4d70019..ec9e234 100644
--- a/arch/x86/kernel/entry_64.S
+++ b/arch/x86/kernel/entry_64.S
@@ -69,6 +69,14 @@ ENTRY(ktrace_callback)
 
 	cmpq $ftrace_stub, ftrace_trace_function
 	jnz ktrace_trace
+
+#ifdef CONFIG_FUNCTION_GRAPH_TRACER
+	cmpq $ftrace_stub, ftrace_graph_return
+	jnz ktrace_graph_caller
+
+	cmpq $ftrace_graph_entry_stub, ftrace_graph_entry
+	jnz ktrace_graph_caller
+#endif
 	retq
 
 ktrace_trace:
@@ -83,6 +91,25 @@ ktrace_trace:
 
 	retq
 END(ktrace_callback)
+
+#ifdef CONFIG_FUNCTION_GRAPH_TRACER
+ENTRY(ktrace_graph_caller)
+	cmpl $0, function_trace_stop
+	jne ftrace_stub
+
+	MCOUNT_SAVE_FRAME
+
+	leaq 0x50(%rsp), %rdi
+	movq 0x48(%rsp), %rsi
+	movq $0, %rdx
+
+	call	prepare_ftrace_return
+
+	MCOUNT_RESTORE_FRAME
+
+	retq
+END(ktrace_graph_caller)
+#endif /* CONFIG_FUNCTION_GRAPH_TRACER */
 #endif /* CONFIG_KTRACE */
 
 #ifdef CONFIG_DYNAMIC_FTRACE
diff --git a/kernel/trace/ftrace.c b/kernel/trace/ftrace.c
index 762e2b3..f6e30a8 100644
--- a/kernel/trace/ftrace.c
+++ b/kernel/trace/ftrace.c
@@ -3404,7 +3404,11 @@ int register_ftrace_graph(trace_func_graph_ret_t retfunc,
 	ftrace_graph_return = retfunc;
 	ftrace_graph_entry = entryfunc;
 
+#ifdef CONFIG_KTRACE
+	ktrace_startup();
+#else
 	ftrace_startup(FTRACE_START_FUNC_RET);
+#endif
 
 out:
 	mutex_unlock(&ftrace_lock);
@@ -3421,7 +3425,13 @@ void unregister_ftrace_graph(void)
 	ftrace_graph_active--;
 	ftrace_graph_return = (trace_func_graph_ret_t)ftrace_stub;
 	ftrace_graph_entry = ftrace_graph_entry_stub;
+
+#ifdef CONFIG_KTRACE
+	ktrace_shutdown();
+#else
 	ftrace_shutdown(FTRACE_STOP_FUNC_RET);
+#endif
+
 	unregister_pm_notifier(&ftrace_suspend_notifier);
 	unregister_trace_sched_switch(ftrace_graph_probe_sched_switch, NULL);
 
-- 
1.7.1



* Re: [RFC 0/4] tracing,x86_64 - function/graph trace without mcount/-pg/framepointer
  2011-02-03 15:42 [RFC 0/4] tracing,x86_64 - function/graph trace without mcount/-pg/framepointer Jiri Olsa
                   ` (3 preceding siblings ...)
  2011-02-03 15:42 ` [PATCH 4/4] ktrace - function graph " Jiri Olsa
@ 2011-02-03 16:33 ` Steven Rostedt
  2011-02-03 17:35   ` Frederic Weisbecker
  2011-02-04  6:03 ` Masami Hiramatsu
  5 siblings, 1 reply; 9+ messages in thread
From: Steven Rostedt @ 2011-02-03 16:33 UTC
  To: Jiri Olsa; +Cc: mingo, fweisbec, linux-kernel, masami.hiramatsu.pt

On Thu, 2011-02-03 at 16:42 +0100, Jiri Olsa wrote:
> hi,
> 
> I recently saw the direct jump probing made for kprobes
> and tried to use it inside the trace framework.
> 
> The global idea is patching the function entry with direct
> jump to the trace code, instead of using pregenerated gcc
> profile code.

Interesting, but ideally, it would be nice if gcc provided a better
"mcount" mechanism. One that calls mcount (or whatever new name it would
have) before it does anything with the stack.
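
For illustration, with frame pointers enabled the difference being asked
for looks roughly like this (-pg places the profiling call only after the
prologue, so the stack has already been touched):

	<func>:        # -pg today          <func>:        # wanted
		push  %rbp                        call  <hook>
		mov   %rsp,%rbp                   push  %rbp
		call  mcount                      mov   %rsp,%rbp
		...                               ...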

> 
> I started this just to see if it would be even possible
> to hook with new probing to the current trace code. It
> appears it's not that bad. I was able to run function
> and function_graph trace on x86_64.
> 
> For details on direct jumps probe, please check:
> http://www.linuxinsight.com/ols2007-djprobe-kernel-probing-with-the-smallest-overhead.html
> 
> 
> I realize using this way to hook the functions has some
> drawbacks, from what I can see it's roughly:
> > - not all functions can be patched

What's the reason for not all functions?

> - need to find a way to say which function is safe to patch
> - memory consumption for detour buffers and symbol records
> 
> > but it seems there are some advantages as well:
> - trace code could be in a module

What makes this allow module code?

ftrace could do that now, but it would require a separate handler. I
would need to disable preemption before calling the module code function
handler.

> - no profiling code is needed
> - framepointer can be disabled (framepointer is needed for
>   generating profile code)

Again ideally, gcc should fix this.

-- Steve




* Re: [RFC 0/4] tracing,x86_64 - function/graph trace without mcount/-pg/framepointer
  2011-02-03 16:33 ` [RFC 0/4] tracing,x86_64 - function/graph trace without mcount/-pg/framepointer Steven Rostedt
@ 2011-02-03 17:35   ` Frederic Weisbecker
  2011-02-03 19:00     ` Steven Rostedt
  0 siblings, 1 reply; 9+ messages in thread
From: Frederic Weisbecker @ 2011-02-03 17:35 UTC
  To: Steven Rostedt; +Cc: Jiri Olsa, mingo, linux-kernel, masami.hiramatsu.pt

On Thu, Feb 03, 2011 at 11:33:25AM -0500, Steven Rostedt wrote:
> On Thu, 2011-02-03 at 16:42 +0100, Jiri Olsa wrote:
> > hi,
> > 
> > I recently saw the direct jump probing made for kprobes
> > and tried to use it inside the trace framework.
> > 
> > The global idea is patching the function entry with direct
> > jump to the trace code, instead of using pregenerated gcc
> > profile code.
> 
> Interesting, but ideally, it would be nice if gcc provided a better
> "mcount" mechanism. One that calls mcount (or whatever new name it would
> have) before it does anything with the stack.
> 
> > 
> > I started this just to see if it would be even possible
> > to hook with new probing to the current trace code. It
> > appears it's not that bad. I was able to run function
> > and function_graph trace on x86_64.
> > 
> > For details on direct jumps probe, please check:
> > http://www.linuxinsight.com/ols2007-djprobe-kernel-probing-with-the-smallest-overhead.html
> > 
> > 
> > I realize using this way to hook the functions has some
> > drawbacks, from what I can see it's roughly:
> > > - not all functions can be patched
> 
> What's the reason for not all functions?

Because of the functions that kprobes itself calls, to avoid recursion.
kprobes has some recursion detection mechanism, IIRC, but
until we reach that checkpoint, I think there are some functions
in the path.

Well, ftrace has the same problem. That's just due to the nature of
function tracing.

There may be some places too fragile to use kprobes there too.

Ah, the whole trap path for example :-(
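
A minimal sketch of the kind of recursion guard meant here (illustrative
only, not the actual kprobes or ftrace code; it assumes the callback runs
with preemption disabled):

	static DEFINE_PER_CPU(int, tracer_recursion);

	static void my_trace_callback(unsigned long ip, unsigned long parent_ip)
	{
		/*
		 * If the callback (or anything it calls) ends up being traced
		 * itself, we re-enter here; bail out instead of recursing.
		 */
		if (__get_cpu_var(tracer_recursion))
			return;
		__get_cpu_var(tracer_recursion) = 1;

		/* ... record ip/parent_ip into the ring buffer ... */

		__get_cpu_var(tracer_recursion) = 0;
	}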

> > - need to find a way to say which function is safe to patch
> > - memory consumption for detour buffers and symbol records
> > 
> > > but it seems there are some advantages as well:
> > - trace code could be in a module
> 
> What makes this allow module code?
> 
> ftrace could do that now, but it would require a separate handler. I
> would need to disable preemption before calling the module code function
> handler.

Kprobes takes care of handlers from modules already.
I'm not sure we want that, it makes the tracing code more sensitive.

Look, for example I think kprobes doesn't trace the kernel fault path
because module space is allocated through vmalloc (hmm, is that still
the case?).

> > - no profiling code is needed
> > - framepointer can be disabled (framepointer is needed for
> >   generating profile code)
> 
> Again ideally, gcc should fix this.

As another drawback of using kprobes, there is also the overhead.
I can't imagine a trap triggering for every function. But then
yeah, we have the jmp optimisation. But that needs the detour
buffer, which we can avoid with mcount.

So like Steve I think mcount is still a better backend for function
tracing. More optimized by nature, even though it indeed needs
some fixes.


* Re: [RFC 0/4] tracing,x86_64 - function/graph trace without mcount/-pg/framepointer
  2011-02-03 17:35   ` Frederic Weisbecker
@ 2011-02-03 19:00     ` Steven Rostedt
  0 siblings, 0 replies; 9+ messages in thread
From: Steven Rostedt @ 2011-02-03 19:00 UTC
  To: Masami Hiramatsu; +Cc: Frederic Weisbecker, Jiri Olsa, mingo, linux-kernel

On Thu, 2011-02-03 at 18:35 +0100, Frederic Weisbecker wrote:

> > ftrace could do that now, but it would require a separate handler. I
> > would need to disable preemption before calling the module code function
> > handler.
> 
> Kprobes takes care of handlers from modules already.
> I'm not sure we want that, it makes the tracing code more sensitive.

Masami,

I'm looking at the kprobes optimization code, particularly
kprobes_optinsn_template_holder(), which looks to be the template that
is called on optimized kprobes. I don't see where preemption or
interrupts are disabled when a probe is called.

If modules can register probes, and those probes can be called at any
arbitrary location in the kernel, then preemption must be disabled prior to
calling the module code. Otherwise you risk crashing the system on
module unload.


module:
-------
register_kprobe(probe);


Core:
-----
hit break point
call probe

      module:
      -------
      in probe function
      preempted

module:
-------
unregister_kprobe(probe);
stop_machine();
<module unloaded>

Core:
-----
      module <zombie>:
      ----------------
      gets CPU again
      executes module code that's been freed
      DEATH BY ZOMBIES

Maybe I missed something. But does the optimized kprobes code disable
preemption or interrupts before calling the optimized probe?

-- Steve




* Re: [RFC 0/4] tracing,x86_64 - function/graph trace without mcount/-pg/framepointer
  2011-02-03 15:42 [RFC 0/4] tracing,x86_64 - function/graph trace without mcount/-pg/framepointer Jiri Olsa
                   ` (4 preceding siblings ...)
  2011-02-03 16:33 ` [RFC 0/4] tracing,x86_64 - function/graph trace without mcount/-pg/framepointer Steven Rostedt
@ 2011-02-04  6:03 ` Masami Hiramatsu
  5 siblings, 0 replies; 9+ messages in thread
From: Masami Hiramatsu @ 2011-02-04  6:03 UTC
  To: Jiri Olsa; +Cc: mingo, rostedt, fweisbec, linux-kernel, 2nddept-manager

Hi,

(2011/02/04 0:42), Jiri Olsa wrote:
> hi,
> 
> I recently saw the direct jump probing made for kprobes
> and tried to use it inside the trace framework.
> 
> The global idea is patching the function entry with direct
> jump to the trace code, instead of using pregenerated gcc
> profile code.
> 
> I started this just to see if it would be even possible
> to hook with new probing to the current trace code. It
> appears it's not that bad. I was able to run function
> and function_graph trace on x86_64.
> 
> For details on direct jumps probe, please check:
> http://www.linuxinsight.com/ols2007-djprobe-kernel-probing-with-the-smallest-overhead.html

Thank you for referring to it ;-)

> I realize using this way to hook the functions has some
> drawbacks, from what I can see it's roughly:
> - not all functions can be patched

Yeah, that is why "djprobe" became "optprobe". If kprobes
finds there is no space to patch, it just falls back to a
breakpoint. Since this check is done internally, kprobes
users get this benefit transparently (no need to
change the user's code).

> - need to find a way to say which function is safe to patch
> - memory consumption for detour buffers and symbol records

Also, you can't patch over more than one instruction without the
int3 bypass method (or a special stack checker), because some context
may have been interrupted on the 2nd instruction when stop_machine
is issued, and it will resume there after the patching.
That's the 2nd reason why djprobe became a part of kprobes.
This "int3 bypass" method disallows probing NMI handlers,
since an int3 inside an NMI will clear the additional NMI masking
by issuing IRET.
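
To illustrate the interrupted-on-the-2nd-instruction problem with a made-up
example: say the entry is a 1-byte push %rbp followed by a 3-byte
mov %rsp,%rbp, so the 5-byte call overwrites both plus the first byte of
whatever comes next:

	before:  addr+0: push %rbp      addr+1: mov %rsp,%rbp      addr+4: ...
	after:   addr+0: call <detour>  (5 bytes)

A task that was interrupted or preempted with its saved IP at addr+1 is not
running while stop_machine patches the text, but once it is scheduled back
in it resumes at addr+1, which is now the middle of the new call's rel32
operand, and executes garbage.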

> but it seems there are some advantages as well:
> - trace code could be in a module
> - no profiling code is needed
> - framepointer can be disabled (framepointer is needed for
>   generating profile code)

Nowadays the profiling code with dynamic ftrace does not add
visible overhead, and if you need to do that without a
profiled binary, you can already use kprobe-tracer for it.
(Using kprobe-tracer via perf-probe allows you to probe not
 only actual functions but also inlined function entries ;-))
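
For example, a quick sketch of that workflow (the symbol and options are
just an example):

	perf probe --add mutex_lock
	perf record -e probe:mutex_lock -a sleep 1
	perf report
	perf probe --del mutex_lock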


Thank you,

> 
> As for the attached implementation, it's mostly a hack (expect bugs);
> the ftrace/kprobe integration in particular could probably be done better.
> It's x86_64 only.
> 
> It can be used like this:
> 
> - new menu config item is added (function tracer engine),
>   to choose mcount or ktrace
> - new file "ktrace" is added to the tracing dir
> - to add symbols to trace run:
> 	echo mutex_unlock > ./ktrace
> 	echo mutex_lock >> ./ktrace
> - to display trace symbols:
> 	cat ktrace
> - to enable the trace, the usual is needed:
> 	echo function > ./current_tracer
> 	echo function_graph > ./current_tracer
> - to remove symbols from trace:
> 	echo nop > ./current_tracer 
> 	echo > ./ktrace 
> - if the function is added while the tracer is running,
>   the symbol is enabled automatically.
> - symbols can only be removed all at once, and only when
>   no tracer is running.
> 
> I'm not sure how to tell via the kallsyms interface which functions
> are safe to patch, so I have omitted patching of all symbols so far.


> 
> 
> attached patches:
>  1/4 - kprobe - ktrace instruction slot cache interface
>      using kprobe detour buffer allocation, adding interface
>      to use it from trace framework
> 
>  2/4 - tracing - adding size parameter to do_ftrace_mod_code
>      adding size parameter to be able to restore the saved
>      instructions, which could be longer than relative call
> 
>  3/4 - ktrace - function trace support
>      adding ktrace support with function tracer
> 
>  4/4 - ktrace - function graph trace support
>      adding function graph support
> 
> 
> please let me know what you think, thanks
> jirka
> ---
>  Makefile                   |    2 +-
>  arch/x86/Kconfig           |    4 +-
>  arch/x86/kernel/Makefile   |    1 +
>  arch/x86/kernel/entry_64.S |   50 +++++++
>  arch/x86/kernel/ftrace.c   |  157 +++++++++++----------
>  arch/x86/kernel/ktrace.c   |  256 ++++++++++++++++++++++++++++++++++
>  include/linux/ftrace.h     |   36 +++++-
>  include/linux/kprobes.h    |    8 +
>  kernel/kprobes.c           |   33 +++++
>  kernel/trace/Kconfig       |   28 ++++-
>  kernel/trace/Makefile      |    1 +
>  kernel/trace/ftrace.c      |   21 +++
>  kernel/trace/ktrace.c      |  330 ++++++++++++++++++++++++++++++++++++++++++++
>  kernel/trace/trace.c       |    1 +
>  14 files changed, 846 insertions(+), 82 deletions(-)


-- 
Masami HIRAMATSU
2nd Dept. Linux Technology Center
Hitachi, Ltd., Systems Development Laboratory
E-mail: masami.hiramatsu.pt@hitachi.com

