QEMU-Devel Archive mirror
From: "Dr. David Alan Gilbert" <dgilbert@redhat.com>
To: Stefan Hajnoczi <stefanha@redhat.com>
Cc: virtio-fs@redhat.com, qemu-devel@nongnu.org,
	Vivek Goyal <vgoyal@redhat.com>,
	groug@kaod.org
Subject: Re: [PATCH v3 26/26] virtiofsd: Ask qemu to drop CAP_FSETID if client asked for it
Date: Thu, 27 May 2021 20:09:05 +0100
Message-ID: <YK/uUUZI3zy9k8Vk@work-vm>
In-Reply-To: <YJlSHZ0vzNtCAjkJ@stefanha-x1.localdomain>

* Stefan Hajnoczi (stefanha@redhat.com) wrote:
> On Mon, May 10, 2021 at 11:23:24AM -0400, Vivek Goyal wrote:
> > On Mon, May 10, 2021 at 10:05:09AM +0100, Stefan Hajnoczi wrote:
> > > On Thu, May 06, 2021 at 12:02:23PM -0400, Vivek Goyal wrote:
> > > > On Thu, May 06, 2021 at 04:37:04PM +0100, Stefan Hajnoczi wrote:
> > > > > On Wed, Apr 28, 2021 at 12:01:00PM +0100, Dr. David Alan Gilbert (git) wrote:
> > > > > > From: Vivek Goyal <vgoyal@redhat.com>
> > > > > > 
> > > > > > If qemu guest asked to drop CAP_FSETID upon write, send that info
> > > > > > to qemu in SLAVE_FS_IO message so that qemu can drop capability
> > > > > > before WRITE. This is to make sure that any setuid bit is killed
> > > > > > on fd (if there is one set).
> > > > > > 
> > > > > > Signed-off-by: Vivek Goyal <vgoyal@redhat.com>
> > > > > 
> > > > > I'm not sure if the QEMU FSETID patches make sense. QEMU shouldn't be
> > > > > running with CAP_FSETID because QEMU is untrusted. CAP_FSETID would allow
> > > > > QEMU to create setgid files, thereby potentially allowing an attacker to
> > > > > gain any GID.
> > > > 
> > > > Sure, it's not recommended to run QEMU as root, but we don't block that
> > > > either and I do regularly test with qemu running as root.
> > > > 
> > > > > 
> > > > > I think it's better not to implement QEMU FSETID functionality at all
> > > > > and to handle it another way.
> > > > 
> > > > One way could be for virtiofsd to clear the setuid bit after the I/O
> > > > has finished. But that would be a non-atomic operation, and it is fraught
> > > > with peril because it requires virtiofsd to know everything the kernel
> > > > would do if the write had been done with CAP_FSETID dropped.
> > > > 
> > > > > In the worst case I/O requests should just
> > > > > fail, it seems like a rare case anyway:
> > > > 
> > > > Is there a way for virtiofsd to know if qemu is running with CAP_FSETID
> > > > or not? If there is one, it might be reasonable to error out. If we
> > > > don't know, then we can't fail all the operations.
> > > > 
> > > > > I/O to a setuid/setgid file with
> > > > > a memory buffer that is not mapped in virtiofsd.
> > > > 
> > > > With DAX it is easily triggerable. A user only has to append to a setuid
> > > > file in virtiofs and this path will trigger.
> > > > 
> > > > I am fine with not supporting this patch but will also need a reasonable
> > > > alternative solution.
> > > 
> > > One way to avoid this problem is by introducing DMA read/write functions
> > > into the vhost-user protocol that can be used by all device types, not
> > > just virtio-fs.
> > > 
> > > Today virtio-fs uses the IO slave request when it cannot access a region
> > > of guest memory. It sends the file descriptor to QEMU and QEMU performs
> > > the pread(2)/pwrite(2) on behalf of virtiofsd.
> > > 
> > > I mentioned in the past that this solution is over-specialized. It
> > > doesn't solve the larger problem that vhost-user processes do not have
> > > full access to the guest memory space (e.g. DAX window).
> > > 
> > > Instead of sending file I/O requests over to QEMU, the vhost-user
> > > protocol should offer DMA read/write requests so any vhost-user process
> > > can access the guest memory space where vhost's shared memory mechanism
> > > is insufficient.
> > > 
> > > Here is how it would work:
> > > 
> > > 1. Drop the IO slave request, replace it with DMA read/write slave
> > >    requests.
> > > 
> > >    Note that these new requests can also be used in environments where
> > >    maximum vIOMMU isolation is needed for security reasons and sharing
> > >    all of guest RAM with the vhost-user process is considered
> > >    unacceptable.
> > > 
> > > 2. When virtqueue buffer mapping fails, send DMA read/write slave
> > >    requests to transfer the data from/to QEMU. virtiofsd calls
> > >    pread(2)/pwrite(2) itself with virtiofsd's Linux capabilities.
> > 
> > Can you elaborate a bit more on how these new DMA read/write vhost-user
> > commands could be implemented? I am assuming it's not real DMA, just
> > a sort of emulation of DMA. Effectively we have two processes and one
> > process needs to read/write to/from the address space of the other process.
> > 
> > We were also wondering if we could make use of the process_vm_readv()
> > and process_vm_writev() syscalls to achieve this. But this at least
> > requires virtiofsd to be more privileged than qemu, and virtiofsd also
> > needs to know where the DAX mapping window is. We briefly discussed this here.
> > 
> > https://lore.kernel.org/qemu-devel/20210421200746.GH1579961@redhat.com/
> 
> I wasn't thinking of directly allowing QEMU virtual memory access via
> process_vm_readv/writev(). That would be more efficient but requires
> privileges and also exposes internals of QEMU's virtual memory layout
> and vIOMMU translation to the vhost-user process.
> 
> Instead I was thinking about VHOST_USER_DMA_READ/WRITE messages
> containing the address (a device IOVA, it could just be a guest physical
> memory address in most cases) and the length. The WRITE message would
> also contain the data that the vhost-user device wishes to write. The
> READ message reply would contain the data that the device read from
> QEMU.
> 
> QEMU would implement this using QEMU's address_space_read/write() APIs.
> 
> So basically just a new vhost-user protocol message to do a memcpy(),
> but with guest addresses and vIOMMU support :).

This doesn't actually feel that hard - ignoring vIOMMU for a minute
which I know very little about - I'd have to think where the data
actually flows, probably the slave fd.
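
A minimal sketch of the message pair described above, just to make the
data flow concrete. Everything named here is hypothetical: the DmaMsg
layout, the DMA_READ/DMA_WRITE codes and dma_handle() are not part of the
vhost-user spec or of this series, and a flat byte array stands in for
the guest address space that QEMU would reach through
address_space_read/write(). vIOMMU translation is ignored.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical request codes; not taken from the vhost-user spec. */
enum { DMA_READ = 1, DMA_WRITE = 2 };

#define DMA_MAX_PAYLOAD 4096
#define GUEST_MEM_SIZE  (64 * 1024)

/* Hypothetical wire format: address, length, then inline data. */
typedef struct {
    uint32_t request;               /* DMA_READ or DMA_WRITE */
    uint64_t iova;                  /* device IOVA, often a guest physical address */
    uint32_t len;                   /* number of bytes to transfer */
    uint8_t  data[DMA_MAX_PAYLOAD]; /* WRITE: request payload; READ: reply payload */
} DmaMsg;

/* Stand-in for the guest address space (assumption: flat, no vIOMMU). */
static uint8_t guest_mem[GUEST_MEM_SIZE];

/*
 * QEMU side: service one message in place.
 * Returns 0 on success, -1 if the request is malformed or out of range.
 */
static int dma_handle(DmaMsg *msg)
{
    assert(msg != NULL);

    /* Reject oversized payloads and ranges that fall outside guest memory. */
    if (msg->len > DMA_MAX_PAYLOAD ||
        msg->iova > GUEST_MEM_SIZE ||
        msg->len > GUEST_MEM_SIZE - msg->iova) {
        return -1;
    }

    switch (msg->request) {
    case DMA_READ:
        /* Real QEMU would use address_space_read() here. */
        memcpy(msg->data, guest_mem + msg->iova, msg->len);
        return 0;
    case DMA_WRITE:
        /* Real QEMU would use address_space_write() here. */
        memcpy(guest_mem + msg->iova, msg->data, msg->len);
        return 0;
    default:
        return -1;
    }
}
```

In this sketch the READ reply reuses the request buffer, so the payload
travels inline with the message on whichever channel ends up carrying it.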

> The vhost-user device will need to do bounce buffering so using these
> new messages is slower than zero-copy I/O to shared guest RAM.

I guess the theory is it's only in the weird corner cases anyway.
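
For that corner case, the write path could look roughly like the sketch
below: pull the guest data into a bounce buffer with a DMA read, then let
the daemon call pwrite(2) itself so the kernel applies the daemon's own
capabilities. dma_read() and write_via_bounce() are hypothetical names,
and guest_mem is only a stand-in for what would really be a message
round trip to QEMU.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdint.h>
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

#define BOUNCE_SIZE    4096
#define GUEST_MEM_SIZE (64 * 1024)

/* Stand-in for guest memory owned by QEMU (assumption for this sketch). */
static uint8_t guest_mem[GUEST_MEM_SIZE];

/*
 * Stand-in for sending a hypothetical DMA READ message to QEMU and
 * waiting for the reply. Returns 0 on success, -1 on a bad range.
 */
static int dma_read(uint64_t iova, void *buf, size_t len)
{
    if (iova > GUEST_MEM_SIZE || len > GUEST_MEM_SIZE - iova) {
        return -1;
    }
    memcpy(buf, guest_mem + iova, len);
    return 0;
}

/*
 * Daemon side: write 'len' bytes of guest data at 'iova' to 'fd' at file
 * offset 'off', one bounce buffer at a time. The pwrite() is issued by
 * the daemon, not by QEMU. Returns bytes written or -1 on error.
 */
static ssize_t write_via_bounce(int fd, off_t off, uint64_t iova, size_t len)
{
    uint8_t bounce[BOUNCE_SIZE];
    size_t done = 0;

    assert(fd >= 0);

    while (done < len) {
        size_t chunk = len - done < sizeof(bounce) ? len - done : sizeof(bounce);
        size_t put = 0;

        /* Extra copy: guest memory -> bounce buffer. */
        if (dma_read(iova + done, bounce, chunk) < 0) {
            return -1;
        }

        /* Bounce buffer -> file, retrying on short writes. */
        while (put < chunk) {
            ssize_t n = pwrite(fd, bounce + put, chunk - put,
                               off + (off_t)(done + put));
            if (n <= 0) {
                return -1;
            }
            put += (size_t)n;
        }
        done += chunk;
    }
    return (ssize_t)done;
}
```

The extra copy through bounce[] is the cost mentioned above; the
zero-copy path would still be used whenever the buffer is mappable.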

Dave

> Stefan


-- 
Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK



