From: Y Song <ys114321-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org>
To: Alban Crequy <alban.crequy-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org>
Cc: netdev <netdev-u79uwXL29TY76Z2rM5mHXA@public.gmane.org>,
	containers-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA@public.gmane.org,
	Alban Crequy <alban-lYLaGTFnO9sWenYVfaLwtA@public.gmane.org>,
	linux-kernel-u79uwXL29TY76Z2rM5mHXA@public.gmane.org,
	cgroups-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
Subject: Re: [PATCH] [RFC] bpf: tracing: new helper bpf_get_current_cgroup_ino
Date: Mon, 14 May 2018 12:38:39 -0700	[thread overview]
Message-ID: <CAH3MdRUe7K8zJHuGAfnY6_VEkBLAWY1F_WaJgcLs4qDdQv1bTA__47133.6255818189$1526326640$gmane$org@mail.gmail.com> (raw)
In-Reply-To: <20180513173318.21680-1-alban-lYLaGTFnO9sWenYVfaLwtA@public.gmane.org>

On Sun, May 13, 2018 at 10:33 AM, Alban Crequy <alban.crequy-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org> wrote:
> From: Alban Crequy <alban-lYLaGTFnO9sWenYVfaLwtA@public.gmane.org>
>
> bpf_get_current_cgroup_ino() allows BPF trace programs to get the inode
> of the cgroup where the current process resides.
>
> My use case is to get statistics about syscalls done by a specific
> Kubernetes container. I have a tracepoint on raw_syscalls/sys_enter and
> a BPF map containing the cgroup inode that I want to trace. I use
> bpf_get_current_cgroup_ino() and I quickly return from the tracepoint if
> the inode is not in the BPF hash map.

Alternatively, the kernel already has the bpf_current_task_under_cgroup()
helper, which uses a cgroup array to store cgroup fds. If the current task
is in the hierarchy of a particular cgroup, the helper returns true.

One difference between your helper and bpf_current_task_under_cgroup() is
that your helper tests against one particular cgroup, not including its
children, whereas bpf_current_task_under_cgroup() returns true even if the
task is in a nested cgroup.

Maybe this will work for you?
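
To make the difference concrete, here is a minimal userspace sketch of the
two matching rules. It is only a model, not kernel code: struct model_cgroup
and the matches_*() functions are hypothetical names made up for this
illustration, and the ancestor walk is an assumption about how a hierarchy
test behaves, not a copy of the kernel implementation.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical userspace model of a cgroup node; not the kernel's struct cgroup. */
struct model_cgroup {
	uint64_t ino;                      /* inode number of the cgroup directory */
	const struct model_cgroup *parent; /* NULL for the hierarchy root */
};

/* Exact match, as in the proposed helper: only the task's own cgroup counts. */
static bool matches_exact(const struct model_cgroup *task_cgrp, uint64_t target_ino)
{
	return task_cgrp != NULL && task_cgrp->ino == target_ino;
}

/*
 * Hierarchy match, modelling bpf_current_task_under_cgroup(): true if the
 * task's cgroup is the target itself or any descendant of it.
 */
static bool matches_under(const struct model_cgroup *task_cgrp, uint64_t target_ino)
{
	const struct model_cgroup *c;

	for (c = task_cgrp; c != NULL; c = c->parent)
		if (c->ino == target_ino)
			return true;
	return false;
}

/* Small usage example: a task in a cgroup nested below the traced one. */
void model_demo(void)
{
	struct model_cgroup root = { 1, NULL };
	struct model_cgroup container = { 42, &root };
	struct model_cgroup nested = { 43, &container };

	assert(!matches_exact(&nested, 42)); /* exact test misses the child */
	assert(matches_under(&nested, 42));  /* hierarchy test includes it */
}
```

If the container never creates nested cgroups, both rules select the same
tasks; they only diverge once a task moves into a child cgroup.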

>
> Without this BPF helper, I would need to keep track of all pids in the
> container. The Netlink proc connector can be used to follow process
> creation and destruction but it is racy.
>
> This patch only looks at the memory cgroup, which was enough for me
> since each Kubernetes container is placed in a different mem cgroup.
> For a generic implementation, I'm not sure how to proceed: it seems I
> would need to use 'for_each_root(root)' (see example in
> proc_cgroup_show() from kernel/cgroup/cgroup.c) but I don't know if
> taking the cgroup mutex is possible in the BPF helper function. It might
> be ok in the tracepoint raw_syscalls/sys_enter but could the mutex
> already be taken in some other tracepoints?

A mutex is not allowed in a helper, since taking it can block.

>
> Signed-off-by: Alban Crequy <alban-lYLaGTFnO9sWenYVfaLwtA@public.gmane.org>
> ---
>  include/uapi/linux/bpf.h | 11 ++++++++++-
>  kernel/trace/bpf_trace.c | 25 +++++++++++++++++++++++++
>  2 files changed, 35 insertions(+), 1 deletion(-)
>
> diff --git a/include/uapi/linux/bpf.h b/include/uapi/linux/bpf.h
> index c5ec89732a8d..38ac3959cdf3 100644
> --- a/include/uapi/linux/bpf.h
> +++ b/include/uapi/linux/bpf.h
> @@ -755,6 +755,14 @@ union bpf_attr {
>   *     @addr: pointer to struct sockaddr to bind socket to
>   *     @addr_len: length of sockaddr structure
>   *     Return: 0 on success or negative error code
> + *
> + * u64 bpf_get_current_cgroup_ino(hierarchy, flags)
> + *     Get the cgroup{1,2} inode of current task under the specified hierarchy.
> + *     @hierarchy: cgroup hierarchy

It is not clear what value should be passed to specify the hierarchy here.
A cgroup directory fd?

> + *     @flags: reserved for future use
> + *     Return:
> + *       == 0 error

The code returns -EINVAL, so it looks like < 0 means error.

> + *        > 0 inode of the cgroup
So >= 0 means success?
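
To illustrate why the documented "== 0 error" does not match the code, here
is a minimal userspace sketch. model_get_cgroup_ino() and model_is_error()
are hypothetical stand-ins written for this example, not the patch itself;
the sketch assumes the helper result reaches the caller as a u64 and that
real inode numbers never have the top bit set.

```c
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical stand-in for the helper's return path. The result is a u64,
 * so -EINVAL is carried as a very large unsigned value, never as 0.
 */
static uint64_t model_get_cgroup_ino(uint32_t hierarchy, uint64_t flags, uint64_t ino)
{
	if (hierarchy != 0 || flags != 0)
		return (uint64_t)-EINVAL;
	return ino;
}

/*
 * Caller-side check: reinterpret the value as signed to detect a negative
 * errno. Assumes the usual two's complement conversion.
 */
static bool model_is_error(uint64_t ret)
{
	return (int64_t)ret < 0;
}
```

With these definitions, a caller that tests "ret == 0" would treat the
-EINVAL case as a valid inode, which is why the comment and the code should
agree on one convention.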
>   */
>  #define __BPF_FUNC_MAPPER(FN)          \
>         FN(unspec),                     \
> @@ -821,7 +829,8 @@ union bpf_attr {
>         FN(msg_apply_bytes),            \
>         FN(msg_cork_bytes),             \
>         FN(msg_pull_data),              \
> -       FN(bind),
> +       FN(bind),                       \
> +       FN(get_current_cgroup_ino),
>
>  /* integer value in 'imm' field of BPF_CALL instruction selects which helper
>   * function eBPF program intends to call
> diff --git a/kernel/trace/bpf_trace.c b/kernel/trace/bpf_trace.c
> index 56ba0f2a01db..9bf92a786639 100644
> --- a/kernel/trace/bpf_trace.c
> +++ b/kernel/trace/bpf_trace.c
> @@ -524,6 +524,29 @@ static const struct bpf_func_proto bpf_probe_read_str_proto = {
>         .arg3_type      = ARG_ANYTHING,
>  };
>
> +BPF_CALL_2(bpf_get_current_cgroup_ino, u32, hierarchy, u64, flags)
> +{
> +       // TODO: pick the correct hierarchy instead of the mem controller
> +       struct cgroup *cgrp = task_cgroup(current, memory_cgrp_id);
> +
> +       if (unlikely(!cgrp))
> +               return -EINVAL;
> +       if (unlikely(hierarchy))
> +               return -EINVAL;
> +       if (unlikely(flags))
> +               return -EINVAL;
> +
> +       return cgrp->kn->id.ino;
> +}
> +
> +static const struct bpf_func_proto bpf_get_current_cgroup_ino_proto = {
> +       .func           = bpf_get_current_cgroup_ino,
> +       .gpl_only       = false,
> +       .ret_type       = RET_INTEGER,
> +       .arg1_type      = ARG_DONTCARE,
> +       .arg2_type      = ARG_DONTCARE,
> +};
> +
>  static const struct bpf_func_proto *
>  tracing_func_proto(enum bpf_func_id func_id, const struct bpf_prog *prog)
>  {
> @@ -564,6 +587,8 @@ tracing_func_proto(enum bpf_func_id func_id, const struct bpf_prog *prog)
>                 return &bpf_get_prandom_u32_proto;
>         case BPF_FUNC_probe_read_str:
>                 return &bpf_probe_read_str_proto;
> +       case BPF_FUNC_get_current_cgroup_ino:
> +               return &bpf_get_current_cgroup_ino_proto;
>         default:
>                 return NULL;
>         }
> --
> 2.14.3
>
