Linux-Fsdevel Archive mirror
 help / color / mirror / Atom feed
From: Aleksa Sarai <cyphar@cyphar.com>
To: Christian Brauner <brauner@kernel.org>
Cc: Alexander Viro <viro@zeniv.linux.org.uk>, Jan Kara <jack@suse.cz>,
	 Chuck Lever <chuck.lever@oracle.com>,
	Jeff Layton <jlayton@kernel.org>,
	 Amir Goldstein <amir73il@gmail.com>,
	Alexander Aring <alex.aring@gmail.com>,
	 linux-fsdevel@vger.kernel.org, linux-nfs@vger.kernel.org,
	linux-kernel@vger.kernel.org
Subject: Re: [PATCH RFC] fhandle: expose u64 mount id to name_to_handle_at(2)
Date: Fri, 24 May 2024 08:58:55 -0700	[thread overview]
Message-ID: <20240524.154429-smoked.node.sleepy.dragster-w2EokFBsl7RC@cyphar.com> (raw)
In-Reply-To: <20240524-ahnden-danken-02a2e9b87190@brauner>

[-- Attachment #1: Type: text/plain, Size: 6488 bytes --]

On 2024-05-24, Christian Brauner <brauner@kernel.org> wrote:
> On Thu, May 23, 2024 at 09:52:20AM -0600, Aleksa Sarai wrote:
> > On 2024-05-21, Christian Brauner <brauner@kernel.org> wrote:
> > > On Mon, May 20, 2024 at 05:35:49PM -0400, Aleksa Sarai wrote:
> > > > Now that we have stabilised the unique 64-bit mount ID interface in
> > > > statx, we can now provide a race-free way for name_to_handle_at(2) to
> > > > provide a file handle and corresponding mount without needing to worry
> > > > about racing with /proc/mountinfo parsing.
> > > > 
> > > > As with AT_HANDLE_FID, AT_HANDLE_UNIQUE_MNT_ID reuses a statx AT_* bit
> > > > that doesn't make sense for name_to_handle_at(2).
> > > > 
> > > > Signed-off-by: Aleksa Sarai <cyphar@cyphar.com>
> > > > ---
> > > 
> > > So I think overall this is probably fine (famous last words). If it's
> > > just about being able to retrieve the new mount id without having to
> > > take the hit of another statx system call it's indeed a bit much to
> > > add a revised system call for this. Althoug I did say earlier that I
> > > wouldn't rule that out.
> > > 
> > > But if we'd that then it'll be a long discussion on the form of the new
> > > system call and the information it exposes.
> > > 
> > > For example, I lack the grey hair needed to understand why
> > > name_to_handle_at() returns a mount id at all. The pitch in commit
> > > 990d6c2d7aee ("vfs: Add name to file handle conversion support") is that
> > > the (old) mount id can be used to "lookup file system specific
> > > information [...] in /proc/<pid>/mountinfo".
> > 
> > The logic was presumably to allow you to know what mount the resolved
> > file handle came from. If you use AT_EMPTY_PATH this is not needed
> > because you could just fstatfs (and now statx(AT_EMPTY_PATH)), but if
> > you just give name_to_handle_at() almost any path, there is no race-free
> > way to make sure that you know which filesystem the file handle came
> > from.
> > 
> > I don't know if that could lead to security issues (I guess an attacker
> > could find a way to try to manipulate the file handle you get back, and
> > then try to trick you into operating on the wrong filesystem with
> > open_by_handle_at()) but it is definitely something you'd want to avoid.
> 
> So the following paragraphs are prefaced with: I'm not an expert on file
> handle encoding and so might be totally wrong.
> 
> Afaiu, the uniqueness guarantee of the file handle mostly depends on:
> 
> (1) the filesystem
> (2) the actual file handling encoding
> 
> Looking at file handle encoding to me it looks like it's fairly easy to
> fake them in userspace (I guess that's ok if you think about them like a
> path but with a weird permission model built around them.) for quite a
> few filesystems.

The old Docker breakout attack did brute-force the fhandle for the root
directory of the host filesystem, so it is entirely possible.

However, the attack I was thinking of was whether a directory tree that
an attacker had mount permissions over could be manipulated such that a
privileged process doing name_to_handle_at() on a path within the tree
would get a file handle that open_by_handle_at() on a different
filesystem would result in a potentially dangerous path being opened.

For instance (M is management process, C is the malicious container
process):

 C: Bind-mounts the root of the container filesystem at /foo.
 M: name_to_handle_at($CONTAINER/foo)
     -> gets an fhandle of / of the container filesystem
     -> stores a copy of the (recycled) mount id
 C: Swaps /foo with a bind-mount of the host root filesystem such that
    the mount id is recycled, before M can scan /proc/self/mountinfo.
 C: Triggers M to try to use the filehandle for some administrative
    process.
 M: open_by_handle_at(...) on the wrong mount id, getting an fd of the
    host / directory. Possibly does something bad to this directory
    (deleting files, passing the fd back to the container, etc).

It seems possible that this could happen, so having a unique mount id is
kind of important if you plan to use open_by_handle_at() with malicious
actors in control of the target filesystem tree.

Though, regardless of the attack you are worried about, I guess we are
in agreement that a unique mount id from name_to_handle_at() would be a
good idea if we are planning for userspace to use file handles for
everything.

> For example, for anything that uses fs/libfs.c:generic_encode_ino32_fh()
> it's easy to construct a file handle by retrieving the inode number via
> stat and the generation number via FS_IOC_GETVERSION.
> 
> Encoding using the inode number and the inode generation number is
> probably not very strong so it's not impossible to generate a file
> handle that is not unique without knowing in which filesystem to
> interpret it in.
> 
> The problem is with what name_to_handle_at() returns imho. A mnt_id
> doesn't pin the filesystem and the old mnt_id isn't unique. That means
> the filesystem can be unmounted and go away and the mnt_id can be
> recycled almost immediately for another mount but the file handle is
> still there.
> 
> So to guarantee that a (weakly encoded) file handle is interpreted in
> the right filesystem the file handle must either be accompanied by a
> file descriptor that pins the relevant mount or have any other guarantee
> that the filesystem doesn't go away (Or of course, the file handle
> encodes the uuid of the filesystem or something or uses some sort of
> hashing scheme.).
> 
> One of the features of file handles is that they're globally usable so
> they're interesting to use as handles that can be shared. IOW, one can
> send around a file handle to another process without having to pin
> anything or have a file descriptor open that needs to be sent via
> AF_UNIX.
> 
> But as stated above that's potentially risky so one might still have to
> send around an fd together with the file handle if sender and receiver
> don't share the filesystem for the handle.
> 
> However, with the unique mount id things improve quite a bit. Because
> now it's possible to send around the unique mount id and the file
> handle. Then one can use statmount() to figure out which filesystem this
> file handle needs to be interpreted in.

-- 
Aleksa Sarai
Senior Software Engineer (Containers)
SUSE Linux GmbH
<https://www.cyphar.com/>

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 228 bytes --]

  reply	other threads:[~2024-05-24 15:59 UTC|newest]

Thread overview: 17+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2024-05-20 21:35 [PATCH RFC] fhandle: expose u64 mount id to name_to_handle_at(2) Aleksa Sarai
2024-05-20 21:53 ` Jeff Layton
2024-05-20 22:27   ` Aleksa Sarai
2024-05-21  5:04     ` Amir Goldstein
2024-05-21 10:42       ` Aleksa Sarai
2024-05-21 13:45 ` Christian Brauner
2024-05-21 14:11   ` Christian Brauner
2024-05-21 14:27     ` Jeff Layton
2024-05-21 16:42       ` Amir Goldstein
2024-05-23 19:16         ` Aleksa Sarai
2024-05-23 15:52   ` Aleksa Sarai
2024-05-24 12:25     ` Christian Brauner
2024-05-24 15:58       ` Aleksa Sarai [this message]
2024-05-27  0:48         ` Dave Chinner
2024-05-27  0:06       ` Dave Chinner
2024-05-27 13:39         ` Christian Brauner
2024-05-26  8:55 ` Christoph Hellwig

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20240524.154429-smoked.node.sleepy.dragster-w2EokFBsl7RC@cyphar.com \
    --to=cyphar@cyphar.com \
    --cc=alex.aring@gmail.com \
    --cc=amir73il@gmail.com \
    --cc=brauner@kernel.org \
    --cc=chuck.lever@oracle.com \
    --cc=jack@suse.cz \
    --cc=jlayton@kernel.org \
    --cc=linux-fsdevel@vger.kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-nfs@vger.kernel.org \
    --cc=viro@zeniv.linux.org.uk \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).