From: Vinicius Costa Gomes <vinicius.gomes@intel.com>
To: brauner@kernel.org, amir73il@gmail.com, hu1.chen@intel.com
Cc: miklos@szeredi.hu, malini.bhandaru@intel.com,
tim.c.chen@intel.com, mikko.ylinen@intel.com,
lizhen.you@intel.com, linux-unionfs@vger.kernel.org,
linux-kernel@vger.kernel.org,
Vinicius Costa Gomes <vinicius.gomes@intel.com>
Subject: [RFC v2 0/4] overlayfs: Optimize override/revert creds
Date: Thu, 25 Jan 2024 15:57:19 -0800
Message-ID: <20240125235723.39507-1-vinicius.gomes@intel.com>
Hi,
It was noticed that some workloads suffer from contention on
incrementing/decrementing the ->usage counter in their credentials.
Those refcount operations are associated with overriding/reverting the
current task credentials (the linked thread adds more context).
In some specialized cases (overlayfs is one of them), the credentials
in question have a longer lifetime than the override/revert "critical
section". In the overlayfs case, the credentials are created when the
fs is mounted and destroyed when it's unmounted. For such long-lived
credentials, the usage counter doesn't need to be
incremented/decremented.
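
To make the cost concrete, here is a small userspace model of the two
patterns. It is only a sketch and not the kernel code: every name ending
in _model, and the refcount_ops counter, are made up for illustration.
It assumes the regular override takes a reference on the new credentials
and the regular revert drops it, while the light pattern only swaps the
pointer because the caller guarantees the credentials outlive the
section.

```c
#include <assert.h>
#include <stdatomic.h>

/* Hypothetical userspace stand-ins; not the kernel's struct cred/task_struct. */
struct cred_model {
	atomic_long usage;		/* models cred->usage */
	unsigned int fsuid;
};

struct task_model {
	const struct cred_model *cred;	/* models current->cred */
};

static long refcount_ops;	/* atomic read-modify-write ops done on ->usage */

static void get_cred_model(const struct cred_model *c)
{
	/* The counter is mutable even through a const pointer, hence the cast. */
	atomic_fetch_add(&((struct cred_model *)c)->usage, 1);
	refcount_ops++;
}

static void put_cred_model(const struct cred_model *c)
{
	atomic_fetch_sub(&((struct cred_model *)c)->usage, 1);
	refcount_ops++;
}

/* Regular pattern: pin the new credentials for the duration of the section. */
static const struct cred_model *override_model(struct task_model *t,
						const struct cred_model *new)
{
	const struct cred_model *old = t->cred;

	get_cred_model(new);
	t->cred = new;
	return old;
}

static void revert_model(struct task_model *t, const struct cred_model *old)
{
	const struct cred_model *override = t->cred;

	t->cred = old;
	put_cred_model(override);
}

/* Light pattern: the caller guarantees 'new' outlives the section. */
static const struct cred_model *override_light_model(struct task_model *t,
						      const struct cred_model *new)
{
	const struct cred_model *old = t->cred;

	t->cred = new;
	return old;
}

static void revert_light_model(struct task_model *t,
			       const struct cred_model *old)
{
	t->cred = old;
}

/* Runs n override/revert pairs; returns how many refcount ops they cost. */
static long count_refcount_ops(int n, int light)
{
	struct cred_model mounter = { .fsuid = 0 };
	struct cred_model own = { .fsuid = 1000 };
	struct task_model task = { .cred = &own };

	atomic_init(&mounter.usage, 1);	/* reference held since "mount" */
	atomic_init(&own.usage, 1);
	refcount_ops = 0;

	for (int i = 0; i < n; i++) {
		const struct cred_model *old = light ?
			override_light_model(&task, &mounter) :
			override_model(&task, &mounter);

		assert(task.cred == &mounter);
		if (light)
			revert_light_model(&task, old);
		else
			revert_model(&task, old);
	}

	assert(task.cred == &own);
	assert(atomic_load(&mounter.usage) == 1);	/* net effect is zero */
	return refcount_ops;
}
```

In this model each regular pair costs two atomic operations on the same
shared counter and leaves it unchanged, which is the traffic that shows
up as contention when many CPUs do it at once; the light pair costs none.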
Add a lighter version of credentials override/revert to be used in
these specialized cases. To make sure that the override/revert calls
are paired, add a cleanup guard macro. This was suggested here:
https://lore.kernel.org/all/20231219-marken-pochen-26d888fb9bb9@brauner/
With a small number of tweaks:
- Used inline functions instead of macros;
- A small change to store the credentials into the passed argument;
the guard is now defined as (note the added '_T ='):

    DEFINE_GUARD(cred, const struct cred *, _T = override_creds_light(_T),
                 revert_creds_light(_T));

- Allow "const" arguments to be used with this kind of guard;
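
As an illustration of why the '_T =' matters, here is a userspace sketch
of the same idea built directly on the compiler's cleanup attribute
(GCC/Clang). It is not the kernel's DEFINE_GUARD machinery: CRED_GUARD,
cred_guard_exit and all names ending in _model are hypothetical. The
point is that the override returns the previous credentials, and that
value is what must be stored in the guard variable so the revert at
scope exit restores it.

```c
#include <assert.h>

/* Hypothetical userspace stand-in for the kernel's struct cred. */
struct cred_model {
	unsigned int fsuid;
};

/* Models current->cred for a single task. */
static const struct cred_model *current_cred_model;

/* Swap in 'new' without touching any refcount; return the previous creds. */
static inline const struct cred_model *
override_creds_light_model(const struct cred_model *new)
{
	const struct cred_model *old = current_cred_model;

	current_cred_model = new;
	return old;
}

static inline void revert_creds_light_model(const struct cred_model *old)
{
	current_cred_model = old;
}

/* Cleanup handlers receive a pointer to the guarded variable. */
static inline void cred_guard_exit(const struct cred_model **saved)
{
	revert_creds_light_model(*saved);
}

/*
 * The guard variable holds the OLD credentials returned by the override,
 * which mirrors the '_T = override_creds_light(_T)' in the guard above.
 */
#define CRED_GUARD(new)							\
	const struct cred_model *saved_cred_				\
		__attribute__((cleanup(cred_guard_exit), unused)) =	\
		override_creds_light_model(new)

/* Reads the fsuid while running as 'mounter'; every return path reverts. */
static unsigned int fsuid_as(const struct cred_model *mounter, int bail_out)
{
	CRED_GUARD(mounter);

	assert(current_cred_model == mounter);
	if (bail_out)
		return 0;	/* early return: the guard still reverts */

	return current_cred_model->fsuid;
}
```

Because the revert runs on every exit from the scope, the override and
revert calls cannot end up unpaired, including on early returns.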
Some comments:
- If patch 1/4 is not a good idea (adding the cast), the alternative
I can see is using some kind of container for the credentials;
- The only user for the backing file ops is overlayfs, so these
changes make sense, but may not make sense in the most general
case;
For the numbers, some from 'perf c2c', before this series:
(edited to fit)
#
#        ----- HITM -----                              Shared
#   Num  RmtHitm  LclHitm  Symbol                      Object            Source:Line         Node
# .....  .......  .......  ..........................  ................  ..................  ....
#
-------------------------
      0      412     1028
-------------------------
          41.50%   42.22%  [k] revert_creds            [kernel.vmlinux]  atomic64_64.h:39    0 1
          15.05%   10.60%  [k] override_creds          [kernel.vmlinux]  atomic64_64.h:25    0 1
           0.73%    0.58%  [k] init_file               [kernel.vmlinux]  atomic64_64.h:25    0 1
           0.24%    0.10%  [k] revert_creds            [kernel.vmlinux]  cred.h:266          0 1
          32.28%   37.16%  [k] generic_permission      [kernel.vmlinux]  mnt_idmapping.h:81  0 1
           9.47%    8.75%  [k] generic_permission      [kernel.vmlinux]  mnt_idmapping.h:81  0 1
           0.49%    0.58%  [k] inode_owner_or_capable  [kernel.vmlinux]  mnt_idmapping.h:81  0 1
           0.24%    0.00%  [k] generic_permission      [kernel.vmlinux]  namei.c:354         0
-------------------------
      1       50      103
-------------------------
         100.00%  100.00%  [k] update_cfs_group        [kernel.vmlinux]  atomic64_64.h:15    0 1
-------------------------
      2       50       98
-------------------------
          96.00%   96.94%  [k] update_cfs_group        [kernel.vmlinux]  atomic64_64.h:15    0 1
           2.00%    1.02%  [k] update_load_avg         [kernel.vmlinux]  atomic64_64.h:25    0 1
           0.00%    2.04%  [k] update_load_avg         [kernel.vmlinux]  fair.c:4118         0
           2.00%    0.00%  [k] update_cfs_group        [kernel.vmlinux]  fair.c:3932         0 1
after this series:
#
#        ----- HITM -----                              Shared
#   Num  RmtHitm  LclHitm  Symbol                      Object            Source:Line         Node
# .....  .......  .......  ....................        ................  ................    ....
#
-------------------------
      0       54       88
-------------------------
         100.00%  100.00%  [k] update_cfs_group        [kernel.vmlinux]  atomic64_64.h:15    0 1
-------------------------
      1       48       83
-------------------------
          97.92%   97.59%  [k] update_cfs_group        [kernel.vmlinux]  atomic64_64.h:15    0 1
           2.08%    1.20%  [k] update_load_avg         [kernel.vmlinux]  atomic64_64.h:25    0 1
           0.00%    1.20%  [k] update_load_avg         [kernel.vmlinux]  fair.c:4118         0 1
-------------------------
      2       28       44
-------------------------
          85.71%   79.55%  [k] generic_permission      [kernel.vmlinux]  mnt_idmapping.h:81  0 1
          14.29%   20.45%  [k] generic_permission      [kernel.vmlinux]  mnt_idmapping.h:81  0 1
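The contention on the credentials' ->usage counter is practically gone:
revert_creds() and override_creds() no longer show up in the entries above.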
Link: https://lore.kernel.org/all/20231018074553.41333-1-hu1.chen@intel.com/
Vinicius Costa Gomes (4):
  cleanup: Fix discarded const warning when defining guards
  cred: Add a light version of override/revert_creds()
  overlayfs: Optimize credentials usage
  fs: Optimize credentials reference count for backing file ops

 fs/backing-file.c       | 124 +++++++++++++++++++---------------------
 fs/overlayfs/copy_up.c  |   4 +-
 fs/overlayfs/dir.c      |  22 +++----
 fs/overlayfs/file.c     |  70 ++++++++++++-----------
 fs/overlayfs/inode.c    |  60 ++++++++++---------
 fs/overlayfs/namei.c    |  21 ++++---
 fs/overlayfs/readdir.c  |  18 +++---
 fs/overlayfs/util.c     |  23 ++++----
 fs/overlayfs/xattrs.c   |  34 +++++------
 include/linux/cleanup.h |   2 +-
 include/linux/cred.h    |  21 +++++++
 kernel/cred.c           |   6 +-
 12 files changed, 215 insertions(+), 190 deletions(-)

--
2.43.0