From: Alban Crequy <alban.crequy-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org>
To: netdev-u79uwXL29TY76Z2rM5mHXA@public.gmane.org,
	linux-kernel-u79uwXL29TY76Z2rM5mHXA@public.gmane.org,
	containers-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA@public.gmane.org,
	cgroups-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
Cc: Alban Crequy <alban-lYLaGTFnO9sWenYVfaLwtA@public.gmane.org>
Subject: [PATCH] [RFC] bpf: tracing: new helper bpf_get_current_cgroup_ino
Date: Sun, 13 May 2018 19:33:18 +0200
Message-ID: <20180513173318.21680-1-alban__404.0096806877$1526232815$gmane$org@kinvolk.io>

From: Alban Crequy <alban-lYLaGTFnO9sWenYVfaLwtA@public.gmane.org>

bpf_get_current_cgroup_ino() lets BPF tracing programs get the inode
number of the cgroup that the current task belongs to.

My use case is to collect statistics about the syscalls made by a
specific Kubernetes container. I attach a BPF program to the
raw_syscalls/sys_enter tracepoint and keep the inodes of the cgroups I
want to trace in a BPF hash map. The program calls
bpf_get_current_cgroup_ino() and returns immediately if the inode is
not in the map.
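
The filtering logic described above can be modelled in plain user-space
C. This is only a sketch, not the BPF program itself: 'struct tracer',
tracer_add(), tracer_contains() and on_sys_enter() are hypothetical
names, the array stands in for the BPF hash map, and the cgroup_ino
argument stands in for the value bpf_get_current_cgroup_ino(0, 0) would
return in the kernel.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define MAX_TRACED 8	/* arbitrary capacity for this sketch */

/* Hypothetical stand-in for the BPF hash map plus a syscall counter. */
struct tracer {
	uint64_t traced_inos[MAX_TRACED];
	size_t traced_count;
	uint64_t syscall_count;
};

/* Add a cgroup inode to the traced set; returns 0, or -1 if full or ino is 0. */
int tracer_add(struct tracer *t, uint64_t ino)
{
	assert(t != NULL);
	if (ino == 0 || t->traced_count == MAX_TRACED)
		return -1;
	t->traced_inos[t->traced_count++] = ino;
	return 0;
}

/* Linear lookup; a real BPF program would use bpf_map_lookup_elem(). */
int tracer_contains(const struct tracer *t, uint64_t ino)
{
	size_t i;

	for (i = 0; i < t->traced_count; i++)
		if (t->traced_inos[i] == ino)
			return 1;
	return 0;
}

/*
 * Models the sys_enter handler. cgroup_ino is the value that
 * bpf_get_current_cgroup_ino(0, 0) would return for the current task;
 * 0 is the documented error value.
 */
void on_sys_enter(struct tracer *t, uint64_t cgroup_ino)
{
	if (cgroup_ino == 0 || !tracer_contains(t, cgroup_ino))
		return;		/* not a traced container: bail out early */
	t->syscall_count++;
}
```

The early return is the point of the helper: syscalls from tasks outside
the traced cgroups cost one helper call and one map lookup, and no pid
tracking is needed.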

Without this BPF helper, I would need to keep track of all pids in the
container. The Netlink proc connector can be used to follow process
creation and exit, but it is racy: events arrive asynchronously, so a
new process can make syscalls before its pid is being tracked.

This patch only looks at the memory cgroup, which is enough for me
since each Kubernetes container is placed in a different memory cgroup.
Until the hierarchy can be selected, the 'hierarchy' and 'flags'
arguments must both be 0. For a generic implementation, I'm not sure
how to proceed: it seems I would need to use 'for_each_root(root)' (see
the example in proc_cgroup_show() in kernel/cgroup/cgroup.c), but I
don't know whether taking the cgroup mutex is possible in a BPF helper
function. It might be OK in the raw_syscalls/sys_enter tracepoint, but
could the mutex already be held in some other tracepoints?
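
On the user-space side, the value to store in the map could be obtained
with stat(2) on the container's directory in the memory cgroup
hierarchy. The sketch below assumes that the st_ino reported for a
cgroup directory is the same number as cgrp->kn->id.ino in the kernel;
cgroup_dir_ino() is a hypothetical helper name, and the path to pass in
depends on where the memory controller is mounted and on how the
container runtime names its cgroups.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>
#include <sys/stat.h>

/*
 * Store the inode number of the directory at 'path' in *ino.
 * Returns 0 on success, -1 on failure with errno set (by stat(2),
 * or to ENOTDIR if the path is not a directory).
 */
int cgroup_dir_ino(const char *path, uint64_t *ino)
{
	struct stat st;

	assert(path != NULL && ino != NULL);
	if (stat(path, &st) != 0)
		return -1;
	if (!S_ISDIR(st.st_mode)) {
		errno = ENOTDIR;
		return -1;
	}
	*ino = (uint64_t)st.st_ino;
	return 0;
}
```

The resulting number would then be inserted as a key in the BPF hash
map before the tracepoint program is attached.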

Signed-off-by: Alban Crequy <alban-lYLaGTFnO9sWenYVfaLwtA@public.gmane.org>
---
 include/uapi/linux/bpf.h | 11 ++++++++++-
 kernel/trace/bpf_trace.c | 25 +++++++++++++++++++++++++
 2 files changed, 35 insertions(+), 1 deletion(-)

diff --git a/include/uapi/linux/bpf.h b/include/uapi/linux/bpf.h
index c5ec89732a8d..38ac3959cdf3 100644
--- a/include/uapi/linux/bpf.h
+++ b/include/uapi/linux/bpf.h
@@ -755,6 +755,14 @@ union bpf_attr {
  *     @addr: pointer to struct sockaddr to bind socket to
  *     @addr_len: length of sockaddr structure
  *     Return: 0 on success or negative error code
+ *
+ * u64 bpf_get_current_cgroup_ino(hierarchy, flags)
+ *     Get the cgroup{1,2} inode of current task under the specified hierarchy.
+ *     @hierarchy: cgroup hierarchy
+ *     @flags: reserved for future use
+ *     Return:
+ *       == 0 error
+ *        > 0 inode of the cgroup
  */
 #define __BPF_FUNC_MAPPER(FN)		\
 	FN(unspec),			\
@@ -821,7 +829,8 @@ union bpf_attr {
 	FN(msg_apply_bytes),		\
 	FN(msg_cork_bytes),		\
 	FN(msg_pull_data),		\
-	FN(bind),
+	FN(bind),			\
+	FN(get_current_cgroup_ino),
 
 /* integer value in 'imm' field of BPF_CALL instruction selects which helper
  * function eBPF program intends to call
diff --git a/kernel/trace/bpf_trace.c b/kernel/trace/bpf_trace.c
index 56ba0f2a01db..9bf92a786639 100644
--- a/kernel/trace/bpf_trace.c
+++ b/kernel/trace/bpf_trace.c
@@ -524,6 +524,29 @@ static const struct bpf_func_proto bpf_probe_read_str_proto = {
 	.arg3_type	= ARG_ANYTHING,
 };
 
+BPF_CALL_2(bpf_get_current_cgroup_ino, u32, hierarchy, u64, flags)
+{
+	/* TODO: pick the correct hierarchy instead of the mem controller */
+	struct cgroup *cgrp = task_cgroup(current, memory_cgrp_id);
+
+	if (unlikely(!cgrp))
+		return 0;
+	if (unlikely(hierarchy))
+		return 0;
+	if (unlikely(flags))
+		return 0;
+
+	return cgrp->kn->id.ino;
+}
+
+static const struct bpf_func_proto bpf_get_current_cgroup_ino_proto = {
+	.func           = bpf_get_current_cgroup_ino,
+	.gpl_only       = false,
+	.ret_type       = RET_INTEGER,
+	.arg1_type      = ARG_ANYTHING,
+	.arg2_type      = ARG_ANYTHING,
+};
+
 static const struct bpf_func_proto *
 tracing_func_proto(enum bpf_func_id func_id, const struct bpf_prog *prog)
 {
@@ -564,6 +587,8 @@ tracing_func_proto(enum bpf_func_id func_id, const struct bpf_prog *prog)
 		return &bpf_get_prandom_u32_proto;
 	case BPF_FUNC_probe_read_str:
 		return &bpf_probe_read_str_proto;
+	case BPF_FUNC_get_current_cgroup_ino:
+		return &bpf_get_current_cgroup_ino_proto;
 	default:
 		return NULL;
 	}
-- 
2.14.3
