KVM Archive mirror
From: Jason Gunthorpe <jgg@nvidia.com>
To: Alex Williamson <alex.williamson@redhat.com>
Cc: "Tian, Kevin" <kevin.tian@intel.com>,
	"Liu, Yi L" <yi.l.liu@intel.com>,
	"joro@8bytes.org" <joro@8bytes.org>,
	"robin.murphy@arm.com" <robin.murphy@arm.com>,
	"eric.auger@redhat.com" <eric.auger@redhat.com>,
	"nicolinc@nvidia.com" <nicolinc@nvidia.com>,
	"kvm@vger.kernel.org" <kvm@vger.kernel.org>,
	"chao.p.peng@linux.intel.com" <chao.p.peng@linux.intel.com>,
	"iommu@lists.linux.dev" <iommu@lists.linux.dev>,
	"baolu.lu@linux.intel.com" <baolu.lu@linux.intel.com>,
	"Duan, Zhenzhong" <zhenzhong.duan@intel.com>,
	"Pan, Jacob jun" <jacob.jun.pan@intel.com>,
	"Cédric Le Goater" <clg@redhat.com>
Subject: Re: [PATCH v2 0/4] vfio-pci support pasid attach/detach
Date: Fri, 26 Apr 2024 11:11:17 -0300
Message-ID: <20240426141117.GY941030@nvidia.com>
In-Reply-To: <20240424141349.376bdbf9.alex.williamson@redhat.com>

On Wed, Apr 24, 2024 at 02:13:49PM -0600, Alex Williamson wrote:

> This is kind of an absurd example to portray as a ubiquitous problem.
> Typically the config space layout is a reflection of hardware whether
> the device supports migration or not.

Er, all our HW has FW-constructed config space. It changes with FW
upgrades. We change it during the life of the product. This has to be
considered.

> If a driver were to insert a
> virtual capability, then yes it would want to be consistent about it if
> it also cares about migration.  If the driver needs to change the
> location of a virtual capability, problems will arise, but that's also
> not something that every driver needs to do.

Well, mlx5 has to cope with this. It supports so many devices with so
many config space layouts :( I don't know if we can just hard-wire an
offset to stick in a PASID cap and expect that to work...
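
To illustrate why a hard-wired offset is fragile, here is a minimal Python sketch of how a VMM might pick a free DWORD-aligned slot for an 8-byte PASID capability in extended config space (0x100 to 0xFFF). The function name `find_free_slot` is hypothetical, and so is the assumption that the VMM already knows each existing capability's (offset, length): lengths are not encoded in the extended capability header and must come from per-capability knowledge.

```python
EXT_CFG_START = 0x100   # extended config space begins here
EXT_CFG_END = 0x1000    # exclusive end: config space is 4 KiB
PASID_CAP_SIZE = 8      # ext header + PASID Capability reg + Control reg

def find_free_slot(occupied, size=PASID_CAP_SIZE):
    """Return the lowest DWORD-aligned offset in extended config space
    where `size` bytes fit without overlapping any (offset, length)
    range in `occupied`, or None if there is no room."""
    candidate = EXT_CFG_START
    for start, length in sorted(occupied):
        if candidate + size <= start:
            return candidate          # the gap before this capability fits
        # Skip past this capability, rounding up to a DWORD boundary.
        candidate = max(candidate, (start + length + 3) & ~3)
    return candidate if candidate + size <= EXT_CFG_END else None
```

For example, a layout with capabilities at 0x100 (0x40 bytes) and 0x140 (0x20 bytes) yields 0x160, while one with 0x100 (0x30 bytes) and 0x200 (0x50 bytes) yields 0x130: two firmware layouts, two different answers.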

> Also, how exactly does emulating the capability in the VMM solve this
> problem?  Currently QEMU migration simply applies state to an identical
> VM on the target.  QEMU doesn't modify the target VM to conform to the
> data stream.  So in either case, the problem might be more along the
> lines of how to make a V1 device from a V2 driver, which is more the
> device type/flavor/persona problem.

Yes, it doesn't solve anything; it just puts the responsibility for
something very complicated in userspace, where there are more
options to configure and customize it to the environment.

> Currently QEMU relies on determinism that a given command line results
> in an identical machine configuration and identical devices.  State of
> that target VM is then populated by, not defined by, the migration stream.

But that won't be true if the kernel is making decisions. The config
space layout now depends on the kernel driver version too.

> > I think we need to decide, either only the VMM or only the kernel
> > should do this.
> 
> What are you actually proposing?

Okay, what I'm thinking about is a text file that describes the vPCI
function configuration space to create. The community will standardize
this, and VMMs will have to implement it to get PASID etc. Maybe the
community will provide a BSD-licensed library to do this job.

The text file allows the operator to specify exactly the configuration
space the VFIO function should have. It would not be derived
automatically from the physical device. AFAIK QEMU does not have this
capability currently.
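
A minimal sketch of what consuming such a text file could look like, assuming a purely hypothetical one-line-per-extended-capability format `<offset> <cap_id> <version> <length>` (the real format would be whatever the community standardizes). `parse_layout` validates the description and `build_ext_config` writes chained extended capability headers into a 4 KiB image; both names are illustrative, and capability bodies are left zeroed since filling them is capability and device specific.

```python
import struct

CFG_SIZE = 4096          # PCIe config space, including extended space
EXT_CFG_START = 0x100    # the extended capability list always starts here

def parse_layout(text):
    """Parse '<offset> <cap_id> <version> <length>' lines (hex or decimal,
    '#' starts a comment) into a sorted, validated list of tuples."""
    caps = []
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()
        if not line:
            continue
        off, cap_id, ver, length = (int(tok, 0) for tok in line.split())
        if off < EXT_CFG_START or off % 4 or length < 4 or off + length > CFG_SIZE:
            raise ValueError(f"bad capability at {off:#x}")
        caps.append((off, cap_id, ver, length))
    caps.sort()
    if caps and caps[0][0] != EXT_CFG_START:
        raise ValueError("first extended capability must sit at 0x100")
    for (o1, _, _, l1), (o2, _, _, _) in zip(caps, caps[1:]):
        if o1 + l1 > o2:
            raise ValueError(f"capabilities overlap at {o2:#x}")
    return caps

def build_ext_config(caps):
    """Return a 4 KiB config image whose extended capability headers
    (ID in bits 15:0, version in 19:16, next offset in 31:20) are
    chained in offset order. Capability bodies stay zeroed."""
    cfg = bytearray(CFG_SIZE)
    for i, (off, cap_id, ver, _) in enumerate(caps):
        nxt = caps[i + 1][0] if i + 1 < len(caps) else 0
        header = (nxt << 20) | ((ver & 0xF) << 16) | (cap_id & 0xFFFF)
        struct.pack_into("<I", cfg, off, header)
    return cfg
```

The design point is that the layout comes only from the file: nothing in this sketch reads the physical device.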

This reflects my observations from discussions around live migration
standardization. I believe we are fast reaching a point where this is
required.

Consider standards-based migration between wildly different
devices. The devices will not standardize their physical config space,
but an operator could generate a consistent vPCI config space that
works with all the devices in their fleet.

Consider the usual working model of the large operators: they define
instance types with some regularity, but an instance type is set in
concrete once it is specified, and things like the vPCI config space are
fixed.

Running Instance A on newer hardware with a changed physical config
space should continue to present Instance A's vPCI config layout
regardless. I.e., Instance A might not support PASID, but Instance B can
run on newer HW that does. The config space layout depends on the
requested Instance Type, not on the physical layout.
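
The Instance Type model can be reduced to a compatibility check: a host device can back an instance type if it supports everything the instance exposes, while extra physical features simply stay hidden. This is a hypothetical Python sketch; `DeviceFeatures`, `InstanceType` and `can_host` are illustrative names, and PASID support is modeled only by the Max PASID Width, with `None` meaning no PASID capability at all.

```python
from dataclasses import dataclass, field
from typing import FrozenSet, Optional

@dataclass(frozen=True)
class DeviceFeatures:
    """Hypothetical summary of what a physical function supports."""
    pasid_width: Optional[int] = None            # None: no PASID capability
    other_caps: FrozenSet[str] = field(default_factory=frozenset)

@dataclass(frozen=True)
class InstanceType:
    """Hypothetical, fixed instance-type definition. The vPCI layout it
    presents is part of the definition, never derived from the host."""
    name: str
    pasid_width: Optional[int] = None            # None: guest sees no PASID cap
    other_caps: FrozenSet[str] = field(default_factory=frozenset)

def can_host(instance: InstanceType, phys: DeviceFeatures) -> bool:
    """True if `phys` can back every feature `instance` exposes.
    Physical features the instance does not expose stay hidden."""
    if instance.pasid_width is not None:
        if phys.pasid_width is None or phys.pasid_width < instance.pasid_width:
            return False
    return instance.other_caps <= phys.other_caps
```

Instance A (no PASID) runs on both old and new hardware; Instance B (say, `pasid_width=20`) only runs where the hardware reports at least that width, and in both cases the guest-visible layout is the one the instance type defines.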

The auto-configuration of the config layout from physical is a nice
feature and is excellent for development/small scale, but it shouldn't
be the only way to work.

So - if we accept that text file configuration should be something the
VMM supports then let's reconsider how to solve the PASID problem.

I'd say the way to solve it should be via a text file specifying a
full config space layout that includes the PASID cap. From the VMM
perspective this works fine, and it ports directly to every VMM, since
each one only has to process the text file.
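
For a sense of how little a VMM needs to do once the text file specifies the PASID cap, here is a sketch that encodes the 8-byte PCIe PASID Extended Capability (ID 0x001B, version 1): the extended header, the PASID Capability register at +4 (Execute Permission Supported in bit 1, Privileged Mode Supported in bit 2, Max PASID Width in bits 12:8) and the PASID Control register at +6. The function name `pasid_cap_bytes` and its parameters are hypothetical; the register layout follows the PCIe specification.

```python
import struct

PCI_EXT_CAP_ID_PASID = 0x001B
PASID_CAP_EXEC = 1 << 1    # Execute Permission Supported
PASID_CAP_PRIV = 1 << 2    # Privileged Mode Supported

def pasid_cap_bytes(max_width, next_off=0, exec_perm=False, priv=False):
    """Encode an 8-byte PASID extended capability (little-endian) as a
    VMM might emit it from a config-space description."""
    if not 0 <= max_width <= 20:
        raise ValueError("Max PASID Width must be 0..20")
    if next_off & 3 or not (next_off == 0 or 0x100 <= next_off <= 0xFFC):
        raise ValueError("next offset must be 0 or a DWORD-aligned ext-cfg offset")
    header = (next_off << 20) | (1 << 16) | PCI_EXT_CAP_ID_PASID
    cap = max_width << 8
    if exec_perm:
        cap |= PASID_CAP_EXEC
    if priv:
        cap |= PASID_CAP_PRIV
    control = 0    # PASID Enable and friends start cleared
    return struct.pack("<IHH", header, cap, control)
```

Leaving the control register at zero means PASID starts disabled and the guest driver enables it.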

The autoconfiguration use case can be handled by a tool that builds the
text file by deriving it from the physical device, much like today. The
single instance of that tool could have device-specific knowledge to
work around quirks. This way the smarts can still be shared by all the
VMMs without going into the kernel. Special devices with hidden config
space could get special quirks or special reference text files in the
tool repo.
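
The autoconfiguration tool could start by walking the physical extended capability list and emitting a description. This is a minimal sketch under several assumptions: the hypothetical output format is `<offset> <cap_id> <version> <length>` per line, the input is a raw 4 KiB config-space image (for example as read from sysfs with sufficient privileges), lengths come from a small table of known sizes or else fall back to the gap before the next capability, and the `hide` set is a crude stand-in for real per-device quirks. `walk_ext_caps`, `derive_layout` and `KNOWN_SIZES` are illustrative names.

```python
import struct

# Sizes of extended capabilities whose length is fixed by the PCIe spec;
# anything else falls back to the gap before the next capability.
KNOWN_SIZES = {0x001B: 8}   # PASID

def walk_ext_caps(cfg):
    """Yield (offset, cap_id, version) for each extended capability in a
    4 KiB config-space image, following the Next Capability Offset."""
    off, seen = 0x100, set()
    while off and off not in seen and 0x100 <= off <= 0xFFC:
        seen.add(off)                       # guard against looping lists
        (hdr,) = struct.unpack_from("<I", cfg, off)
        if hdr in (0, 0xFFFFFFFF):          # empty list or unreadable
            break
        yield off, hdr & 0xFFFF, (hdr >> 16) & 0xF
        off = (hdr >> 20) & 0xFFC           # low 2 bits are reserved

def derive_layout(cfg, hide=frozenset()):
    """Emit '<offset> <cap_id> <version> <length>' lines, dropping any
    capability IDs listed in `hide`."""
    caps = sorted(walk_ext_caps(cfg))
    lines = []
    for i, (off, cap_id, ver) in enumerate(caps):
        if cap_id in hide:
            continue
        end = caps[i + 1][0] if i + 1 < len(caps) else 0x1000
        length = KNOWN_SIZES.get(cap_id, end - off)
        lines.append(f"{off:#05x} {cap_id:#06x} {ver} {length:#x}")
    return "\n".join(lines)
```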

Serious operators doing production SR-IOV etc. would negotiate the text
file with the HW vendors when they define their Instance Type. Ideally
these reference text files would be contributed to the tool repo
above. I think it would be a nice idea to define fully open-source
Instance Types that include VFIO devices too.

Is it too much of a fantasy?

Jason


Thread overview: 72+ messages
2024-04-12  8:21 [PATCH v2 0/4] vfio-pci support pasid attach/detach Yi Liu
2024-04-12  8:21 ` [PATCH v2 1/4] ida: Add ida_get_lowest() Yi Liu
2024-04-16 16:03   ` Alex Williamson
2024-04-18  7:02     ` Yi Liu
2024-04-18 16:23       ` Alex Williamson
2024-04-18 17:12         ` Jason Gunthorpe
2024-04-19 13:43           ` Yi Liu
2024-04-19 13:55             ` Alex Williamson
2024-04-19 14:00               ` Jason Gunthorpe
2024-04-23  7:19                 ` Yi Liu
2024-04-19 13:40         ` Yi Liu
2024-04-12  8:21 ` [PATCH v2 2/4] vfio-iommufd: Support pasid [at|de]tach for physical VFIO devices Yi Liu
2024-04-16  9:01   ` Tian, Kevin
2024-04-16  9:24     ` Yi Liu
2024-04-16  9:47       ` Tian, Kevin
2024-04-18  7:04         ` Yi Liu
2024-04-23 12:43   ` Jason Gunthorpe
2024-04-24  0:33     ` Tian, Kevin
2024-04-24  4:48     ` Yi Liu
2024-04-12  8:21 ` [PATCH v2 3/4] vfio: Add VFIO_DEVICE_PASID_[AT|DE]TACH_IOMMUFD_PT Yi Liu
2024-04-16  9:13   ` Tian, Kevin
2024-04-16  9:36     ` Yi Liu
2024-04-23 12:45   ` Jason Gunthorpe
2024-04-12  8:21 ` [PATCH v2 4/4] vfio: Report PASID capability via VFIO_DEVICE_FEATURE ioctl Yi Liu
2024-04-16  9:40   ` Tian, Kevin
2024-04-16 17:57   ` Alex Williamson
2024-04-17  7:09     ` Tian, Kevin
2024-04-17 20:25       ` Alex Williamson
2024-04-18  0:21         ` Tian, Kevin
2024-04-18  8:23           ` Yi Liu
2024-04-18 16:34           ` Alex Williamson
2024-04-23 12:39   ` Jason Gunthorpe
2024-04-24  0:24     ` Tian, Kevin
2024-04-24 13:59       ` Jason Gunthorpe
2024-04-16  8:38 ` [PATCH v2 0/4] vfio-pci support pasid attach/detach Tian, Kevin
2024-04-16 17:50   ` Jason Gunthorpe
2024-04-17  7:16     ` Tian, Kevin
2024-04-17 12:20       ` Jason Gunthorpe
2024-04-17 23:02         ` Alex Williamson
2024-04-18  0:06           ` Tian, Kevin
2024-04-18  9:03             ` Yi Liu
2024-04-18 20:37               ` Alex Williamson
2024-04-19  5:52                 ` Tian, Kevin
2024-04-19 16:35                   ` Alex Williamson
2024-04-23  7:43                     ` Tian, Kevin
2024-04-23 12:01                       ` Jason Gunthorpe
2024-04-23 23:47                         ` Tian, Kevin
2024-04-24  0:12                           ` Jason Gunthorpe
2024-04-24  2:57                             ` Tian, Kevin
2024-04-24 12:29                               ` Baolu Lu
2024-04-24 14:04                               ` Jason Gunthorpe
2024-04-24  5:19                             ` Tian, Kevin
2024-04-24 14:15                               ` Jason Gunthorpe
2024-04-24 18:38                                 ` Alex Williamson
2024-04-24 18:45                                   ` Jason Gunthorpe
2024-04-24 18:24                             ` Alex Williamson
2024-04-24 18:36                               ` Jason Gunthorpe
2024-04-24 20:13                                 ` Alex Williamson
2024-04-26 14:11                                   ` Jason Gunthorpe [this message]
2024-04-26 20:13                                     ` Alex Williamson
2024-04-28  6:19                                       ` Tian, Kevin
2024-04-29  7:43                                         ` Yi Liu
2024-04-29 17:15                                         ` Jason Gunthorpe
2024-04-29 17:44                                       ` Jason Gunthorpe
2024-04-27  5:05                                     ` Christoph Hellwig
2024-04-25  9:26                               ` Yi Liu
2024-04-25 12:58                                 ` Alex Williamson
2024-04-26  9:01                                   ` Yi Liu
2024-04-19 13:59                 ` Jason Gunthorpe
2024-04-23  7:58                   ` Yi Liu
2024-04-23 12:05                     ` Jason Gunthorpe
2024-04-19 13:34           ` Jason Gunthorpe
