From: "Christian Göttsche" <cgzones@googlemail.com>
To: selinux@vger.kernel.org
Subject: [RFC PATCH v2 0/9] libselinux: rework selabel_file(5) database
Date: Wed, 31 Jan 2024 14:08:26 +0100
Message-ID: <20240131130840.48155-1-cgzones@googlemail.com>
Currently the database for file backend of selabel stores the file
context specifications in a single long array. This array is sorted by
special precedence rules, e.g. regular expressions without meta
characters first, ordered by length, and the remaining regular
expressions ordered by stem (the prefix part of the regular expressions
without meta characters) length.
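
To illustrate the stem notion used above, here is a minimal sketch that
computes the length of the leading part of a specification that contains no
meta character. The function name literal_prefix_len and the exact set of
meta characters are assumptions made for this example; they are not taken
from the patch.

```c
#include <stddef.h>
#include <string.h>

/* Characters treated as regex meta characters in this sketch (an assumption,
 * not necessarily the exact set libselinux checks). */
static const char META_CHARS[] = ".^$?*+|[](){}\\";

/* Hypothetical helper: return the number of leading bytes of `regex` that
 * precede the first meta character, i.e. the stem length as described in
 * the text above. */
size_t literal_prefix_len(const char *regex)
{
	return strcspn(regex, META_CHARS);
}
```

For '/usr/(.*/)?lib(/.*)?' this yields the length of '/usr/', and for a
specification without meta characters it yields the full string length.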
This results in suboptimal lookup performance for two reasons:
- File context specifications without any meta characters (e.g.
  '/etc/passwd') are still matched via an expensive regular expression
  match operation.
- All such trivial regular expressions are tried before any non-trivial
  regular expression, resulting in thousands of regex match operations
  for lookups of paths not matching any of the trivial ones.
Rework the internal representation of the database in two ways:
- Convert regular expressions without any meta characters and containing
  only supported escaped characters (e.g. '/etc/rc\.d/init\.d') into
  literal strings, which get compared via strcmp(3) later on.
- Store the specifications in a tree structure to reduce the number of
  specifications that need to be checked.
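
The literal conversion described above can be sketched roughly as follows.
This is only an illustration under stated assumptions: the function name
regex_to_literal, the meta character set, and the rule that only an escaped
meta character or backslash counts as a "supported" escape are all
hypothetical and may differ from what patch 6 actually does.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical helper: try to turn `regex` into a literal string in `out`
 * (capacity `outsz`, including the terminating NUL). Returns true on
 * success. Returns false if the pattern contains an unescaped meta
 * character, an escape this sketch does not support, or does not fit.
 */
bool regex_to_literal(const char *regex, char *out, size_t outsz)
{
	/* Assumed meta character set, not necessarily the one libselinux uses. */
	static const char meta[] = ".^$?*+|[](){}";
	size_t n = 0;

	if (outsz == 0)
		return false;

	for (const char *p = regex; *p != '\0'; p++) {
		char c = *p;

		if (c == '\\') {
			/* Accept only an escaped meta character or backslash, e.g. "\." */
			c = *++p;
			if (c == '\0' || (c != '\\' && strchr(meta, c) == NULL))
				return false;
		} else if (strchr(meta, c) != NULL) {
			/* A real regular expression: leave it to the regex engine. */
			return false;
		}

		/* Keep one byte free for the terminating NUL. */
		if (n + 1 >= outsz)
			return false;
		out[n++] = c;
	}

	out[n] = '\0';
	return true;
}
```

A specification converted this way can then be matched against a lookup key
with strcmp(3) instead of a regex match.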
Since the internal representation is completely rewritten, introduce a
new compiled file context file format mirroring the tree structure.
The new format also stores all multi-byte data in network byte-order, so
that such compiled files can be cross-compiled, e.g. for embedded
devices with read-only filesystems (except for the regular expressions,
which are still architecture-dependent, but ignored on architecture
mismatch).
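
As a small illustration of what storing multi-byte data in network
byte-order means, the following sketch encodes and decodes a 32-bit value
as big-endian bytes independent of the host architecture. The helper names
put_be32 and get_be32 are made up for this example; the patch may well use
htonl(3)/ntohl(3) or similar helpers instead.

```c
#include <stdint.h>

/* Hypothetical helper: store `v` in network (big-endian) byte order. */
void put_be32(uint8_t out[4], uint32_t v)
{
	out[0] = (uint8_t)(v >> 24);
	out[1] = (uint8_t)(v >> 16);
	out[2] = (uint8_t)(v >> 8);
	out[3] = (uint8_t)v;
}

/* Hypothetical helper: read a value stored in network byte order. */
uint32_t get_be32(const uint8_t in[4])
{
	return ((uint32_t)in[0] << 24) | ((uint32_t)in[1] << 16) |
	       ((uint32_t)in[2] << 8) | (uint32_t)in[3];
}
```

Because the on-disk bytes do not depend on host endianness, a file written
on a little-endian build machine reads back the same on a big-endian target.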
The improved lookup performance will also benefit SELinux aware daemons,
which create files with their default context, e.g. systemd.
Fedora 39 (pre-compiled regular expressions are omitted on Fedora):
    file_contexts.bin:            599034 ->  430211 (bytes)
    file_contexts.homedirs.bin:    21275 ->   13491 (bytes)
Debian Sid (pre-compiled regular expressions are included):
    file_contexts.bin:           7790690 -> 3646256 (bytes)
    file_contexts.homedirs.bin:   835950 ->  708793 (bytes)
(selabel_lookup -b file -k /bin/bash)
Fedora 39 in VM:
    text:      time:       3.1 ms -> 3.6 ms
               peak heap:  2.33M  -> 1.81M
               peak rss:   6.64M  -> 6.37M
    compiled:  time:       1.8 ms -> 1.7 ms
               peak heap:  2.14M  -> 1.23M
               peak rss:   6.76M  -> 5.91M
Debian Sid on Raspberry Pi 3:
    text:      time:       33.4 ms -> 21.2 ms
               peak heap:  10.59M  -> 607.32K
               peak rss:   6.55M   -> 4.46M
    compiled:  time:       38.3 ms -> 23.5 ms
               peak heap:  13.28M  -> 2.00M
               peak rss:   12.21M  -> 7.60M
(restorecon -vRn /)
Fedora 39 in VM:
    28.3 s -> 3.6 s
Debian Sid on Raspberry Pi 3:
    94.6 s -> 12.1 s
(restorecon -vRn -T0 /)
Fedora 39 in VM (8 cores):
    31.1 s -> 2.5 s
Debian Sid on Raspberry Pi 3 (4 cores):
    58.9 s -> 12.6 s
(note: I am unsure why the parallel runs on Fedora are slower)
There might be subtle differences in lookup results which evaded my
testing, because some precedence rules are obscure. For example
`/usr/(.*/)?lib(/.*)?` has to have a higher precedence than
`/usr/(.*/)?bin(/.*)?` to match the current Fedora behavior. Please
report any behavior changes.
The maximum node depth in the database is set to 3, which seems to give
the best ratio of performance to memory usage. It might need tweaking for
systems with different filesystem hierarchies (Android?).
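
To make the depth limit more concrete, here is a rough sketch of how many
tree levels a specification could descend before it has to be stored in a
node. It counts leading directory components that are free of meta
characters, capped at the maximum depth of 3 mentioned above. The function
name spec_node_depth, the meta character set, and the rule that the final
path component never becomes a node are assumptions for illustration; the
actual logic in patch 6 may differ.

```c
#include <stddef.h>
#include <string.h>

#define MAX_NODE_DEPTH 3 /* value stated in the text above */

/*
 * Hypothetical helper: count the leading '/'-terminated path components of
 * `spec` that contain no meta character, up to MAX_NODE_DEPTH.
 */
size_t spec_node_depth(const char *spec)
{
	/* Assumed meta character set, not necessarily the one libselinux uses. */
	static const char meta[] = ".^$?*+|[](){}\\";
	size_t depth = 0;
	const char *p = spec;

	while (depth < MAX_NODE_DEPTH && *p == '/') {
		const char *start = p + 1;
		size_t len = strcspn(start, "/");

		/* Stop at an empty component ("//") or at the last component. */
		if (len == 0 || start[len] != '/')
			break;
		/* Stop at the first component containing a meta character. */
		if (strcspn(start, meta) < len)
			break;

		depth++;
		p = start + len;
	}

	return depth;
}
```

Under these assumptions '/etc/passwd' would live one level down (under
'etc'), while '/usr/(.*/)?lib(/.*)?' could not descend past 'usr'.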
I am not that familiar with the selabel_partial_match(3),
selabel_get_digests_all_partial_matches(3) and
selabel_hash_all_partial_matches(3) related interfaces, so I only did
some rudimentary tests for them.
# Patches
Patches 1 and 2 introduce two helpers useful for developers and users.
Patches 3-5 tweak the sidtab code to be used in a later patch.
Patch 6 is the main rework. Due to its complete rewrite it is too large
for the mailing list, so I added some developers in CC for this one and
the patch is available on GitHub (see below). I'd like to refrain from
splitting it, since there are no trivially splittable parts and splitting
would make future reverts or bisections more complicated.
Patch 7 removes code left unused after the rework in patch 6.
Patch 8 introduces new fuzzers for selabel_file(5).
Patch 9 improves thread-safety for concurrent selabel lookup.
This patchset is also available at https://github.com/SELinuxProject/selinux/pull/406
v2:
- add two fuzzers performing label lookup, one for textual and one for
compiled fcontext definitions
- various fixes, among others encountered via fuzzing
- rename helper to unsetfiles
- add sidtab tweaks to store a context array in the binary fcontext format
to deduplicate context strings
- add thread-safety patch
Christian Göttsche (9):
  policycoreutils: introduce unsetfiles
  libselinux/utils: introduce selabel_compare
  libselinux: use more appropriate types in sidtab
  libselinux: add unique id to sidtab entries
  libselinux: sidtab updates
  libselinux: rework selabel_file(5) database
  libselinux: remove unused hashtab code
  libselinux: add selabel_file(5) fuzzer
  libselinux: support parallel selabel_lookup(3)
libselinux/fuzz/input | 0
.../fuzz/selabel_file_compiled-fuzzer.c | 281 +++
libselinux/fuzz/selabel_file_text-fuzzer.c | 225 ++
libselinux/include/selinux/avc.h | 2 +-
libselinux/src/avc_sidtab.c | 68 +-
libselinux/src/avc_sidtab.h | 4 +-
libselinux/src/hashtab.c | 234 --
libselinux/src/hashtab.h | 117 -
libselinux/src/label.c | 56 +-
libselinux/src/label_backends_android.c | 2 +-
libselinux/src/label_db.c | 2 +
libselinux/src/label_file.c | 2216 ++++++++++++-----
libselinux/src/label_file.h | 972 +++++---
libselinux/src/label_internal.h | 7 +-
libselinux/src/label_media.c | 1 +
libselinux/src/label_support.c | 26 +-
libselinux/src/label_x.c | 1 +
libselinux/src/regex.c | 55 +-
libselinux/utils/.gitignore | 1 +
libselinux/utils/sefcontext_compile.c | 658 +++--
libselinux/utils/selabel_compare.c | 122 +
policycoreutils/.gitignore | 1 +
policycoreutils/Makefile | 2 +-
policycoreutils/unsetfiles/Makefile | 26 +
policycoreutils/unsetfiles/unsetfiles.1 | 46 +
policycoreutils/unsetfiles/unsetfiles.c | 183 ++
scripts/oss-fuzz.sh | 25 +
27 files changed, 3771 insertions(+), 1562 deletions(-)
create mode 100644 libselinux/fuzz/input
create mode 100644 libselinux/fuzz/selabel_file_compiled-fuzzer.c
create mode 100644 libselinux/fuzz/selabel_file_text-fuzzer.c
delete mode 100644 libselinux/src/hashtab.c
delete mode 100644 libselinux/src/hashtab.h
create mode 100644 libselinux/utils/selabel_compare.c
create mode 100644 policycoreutils/unsetfiles/Makefile
create mode 100644 policycoreutils/unsetfiles/unsetfiles.1
create mode 100644 policycoreutils/unsetfiles/unsetfiles.c
--
2.43.0