All the mail mirrored from lore.kernel.org
* [RFC] TDX module configurability of 0x80000008
@ 2024-04-24 16:55 Edgecombe, Rick P
  2024-04-25 15:09 ` Xiaoyao Li
  2024-05-07 16:41 ` Xiaoyao Li
  0 siblings, 2 replies; 18+ messages in thread
From: Edgecombe, Rick P @ 2024-04-24 16:55 UTC
  To: kvm@vger.kernel.org, pbonzini@redhat.com, seanjc@google.com

Hi, 

This is a new effort to solicit community feedback for potential future TDX
module features. There are two features in different stages of development
around the configurability of the max physical address exposed in
0x80000008.EAX. I was hoping to get some comments on them and share the current
plans on whether to implement them in KVM. 

One of the TDX module features is called MAXPA_VIRT. In short, it is similar to
KVM’s allow_smaller_maxphyaddr. It requires an explicit opt-in by the VMM, and
allows a TD’s 0x80000008.EAX[7:0] to be configured by the VMM. Accesses to
physical addresses above the specified value by the TD will cause the TDX module
to inject a mostly correct #PF with the RSVD error code set. It has to deal with
the same problems as allow_smaller_maxphyaddr for correctly setting the RSVD
bit. I wasn’t planning to push this feature for KVM due to the movement away from
allow_smaller_maxphyaddr and towards 0x80000008.EAX[23:16].

There is also a potential future TDX module feature currently being evaluated
around the configurability of 0x80000008.EAX[23:16]. I wanted to get some
community comments on the feature while it is still in the early stages of
development. 

0x80000008.EAX[7:0] is defined by the SDM as MAXPHYADDR. KVM is designed to work
with guest MAXPHYADDR set to host MAXPHYADDR. Going forward, there is work for
KVM to also accommodate a potentially smaller value in 0x80000008.EAX[23:16] for
normal VMs. This value is defined by the AMD spec as GuestPhysAddrSize:
   Maximum guest physical address size in bits. This number applies only to guests
   using nested paging. When this field is zero, refer to the PhysAddrSize field
   for the maximum guest physical address size. 
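
As a rough illustration of the fields discussed above, the following Python
sketch splits an EAX value from leaf 0x80000008 into its byte-wide fields and
applies the fallback rule from the AMD definition just quoted. The function
name, dictionary keys, and sample EAX value are made up for this example.

```python
def decode_leaf_80000008_eax(eax: int) -> dict:
    """Split CPUID.0x80000008:EAX into byte-wide fields (illustrative names)."""
    phys_addr_size = eax & 0xFF                 # EAX[7:0], MAXPHYADDR
    linear_addr_size = (eax >> 8) & 0xFF        # EAX[15:8]
    guest_phys_addr_size = (eax >> 16) & 0xFF   # EAX[23:16], GuestPhysAddrSize
    # Per the AMD wording: a zero GuestPhysAddrSize means "refer to PhysAddrSize".
    effective = guest_phys_addr_size or phys_addr_size
    return {
        "phys_addr_size": phys_addr_size,
        "linear_addr_size": linear_addr_size,
        "guest_phys_addr_size": guest_phys_addr_size,
        "effective_guest_phys_bits": effective,
    }


if __name__ == "__main__":
    # Arbitrary sample value: [23:16] = 48, [15:8] = 57, [7:0] = 52.
    print(decode_leaf_80000008_eax(0x00303934))
```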

The idea is that the TDX module could add the capability to configure these bits as
well, so that TDs could match normal VMs for cases where there is a desire for
the guest's MAXPA to be smaller than the host's. The requirements would be,
roughly:
 - The VMM specifies 0x80000008.EAX[23:16] when creating a TD.
 - The TDX module does sanity checking.
 - The 0x80000008.EAX[23:16] field is used to communicate the max addressable
   GPA to the guest. It will be used by the guest firmware to make sure
   resources like PCI BARs are mapped into the addressable GPA range.
 - If the guest attempts to access memory beyond the max addressable GPA, then
   the TDX module generates an EPT violation to the VMM. For the VMM, this case
   means that the guest attempted to access "invalid" (I/O) memory.
 - The VMM will be expected to terminate the TD guest. The VMM may send
   a notification, but the TDX module doesn't necessarily need to know how.
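
To make the firmware side of the list above concrete, here is a minimal Python
sketch of how guest firmware might use the advertised bit count to keep a PCI
BAR inside the addressable GPA range. It is only an illustration: the helper
names are hypothetical, the top-down placement policy is an assumption, and
real firmware has many more constraints.

```python
def max_addressable_gpa(guest_phys_bits: int) -> int:
    """Highest GPA the guest may use, given the advertised bit count."""
    return (1 << guest_phys_bits) - 1


def place_bar_top_down(bar_size: int, guest_phys_bits: int) -> int:
    """Return the highest naturally aligned base for a power-of-two sized BAR
    that still fits entirely below 2**guest_phys_bits (hypothetical policy)."""
    if bar_size <= 0 or bar_size & (bar_size - 1):
        raise ValueError("BAR size must be a power of two")
    limit = 1 << guest_phys_bits
    if bar_size > limit:
        raise ValueError("BAR does not fit in the addressable range")
    # limit is a multiple of bar_size, so the result is naturally aligned.
    return limit - bar_size


if __name__ == "__main__":
    base = place_bar_top_down(1 << 30, 48)
    print(hex(base), hex(max_addressable_gpa(48)))
```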

Glad to hear any comments. Thanks.

Rick

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [RFC] TDX module configurability of 0x80000008
  2024-04-24 16:55 [RFC] TDX module configurability of 0x80000008 Edgecombe, Rick P
@ 2024-04-25 15:09 ` Xiaoyao Li
  2024-04-25 16:31   ` Edgecombe, Rick P
  2024-05-07 16:41 ` Xiaoyao Li
  1 sibling, 1 reply; 18+ messages in thread
From: Xiaoyao Li @ 2024-04-25 15:09 UTC
  To: Edgecombe, Rick P, kvm@vger.kernel.org, pbonzini@redhat.com,
	seanjc@google.com

On 4/25/2024 12:55 AM, Edgecombe, Rick P wrote:
> Hi,
> 
> This is a new effort to solicit community feedback for potential future TDX
> module features. There are two features in different stages of development
> around the configurability of the max physical address exposed in
> 0x80000008.EAX. I was hoping to get some comments on them and share the current
> plans on whether to implement them in KVM.
> 
> One of the TDX module features is called MAXPA_VIRT. In short, it is similar to
> KVM’s allow_smaller_maxphyaddr. It requires an explicit opt-in by the VMM, and
> allows a TD’s 0x80000008.EAX[7:0] to be configured by the VMM. Accesses to
> physical addresses above the specified value by the TD will cause the TDX module
> to inject a mostly correct #PF with the RSVD error code set. It has to deal with
> the same problems as allow_smaller_maxphyaddr for correctly setting the RSVD
> bit. I wasn’t planning to push this feature for KVM due to the movement away from
> allow_smaller_maxphyaddr and towards 0x80000008.EAX[23:16].
> 
> There is also a potential future TDX module feature currently being evaluated
> around the configurability of 0x80000008.EAX[23:16]. I wanted to get some
> community comments on the feature while it is still in the early stages of
> development.
> 
> 0x80000008[7:0] is defined by the SDM as MAXPHYADDR. KVM is designed to work
> with guest MAXPHYADDR set to host MAXPHYADDR. In the future there is work for
> KVM to also accommodate a potentially smaller value in 0x80000008.EAX[23:16] for
> normal VMs. This value is defined by AMD spec as GuestPhysAddrSize:
>     Maximum guest physical address size in bits. This number applies only to guests
>     using nested paging. When this field is zero, refer to the PhysAddrSize field
>     for the maximum guest physical address size.
> 
> The idea is that TDX module could add the capability to configure these bits as
> well, so that TDs could match normal VMs for cases where there is a desire for
> the guest's MAXPA to be smaller than the host's. The requirements would be,
> roughly:
>   - The VMM specifies the 0x80000008.EAX[23:16] when creating a TD.
>   - The TDX module does sanity checking. 
>   - The 0x80000008.EAX[23:16] field is used to communicate the max addressable
>   GPA to  the guest. It will be used by the guest firmware to make sure
>   resources like PCI bars are mapped into the addressable GPA.
>   - If the guest attempts to access memory beyond the max addressable GPA, then
>   the TDX module generates EPT violation to the VMM. For the VMM, this case
>   means that the guest attempted to access "invalid" (I/O) memory.
>   - The VMM will be expected to terminate the TD guest. The VMM may send
>   a notification, but the TDX module doesn't necessarily need to know how.

This is not the same as how it works for normal (non-TDX) VMs.

For normal VMs, when userspace configures a value smaller than what
hardware EPT/NPT supports, it doesn't cause any issue if the guest accesses
a GPA beyond [23:16] but within hardware EPT/NPT capability.

It's more of a hint to the guest; KVM doesn't enforce its semantics.
However, for the TDX case, you are proposing to make it a hard rule.

> Glad to hear any comments. Thanks.
> 
> Rick



* Re: [RFC] TDX module configurability of 0x80000008
  2024-04-25 15:09 ` Xiaoyao Li
@ 2024-04-25 16:31   ` Edgecombe, Rick P
  2024-04-25 16:59     ` Sean Christopherson
  0 siblings, 1 reply; 18+ messages in thread
From: Edgecombe, Rick P @ 2024-04-25 16:31 UTC
  To: Li, Xiaoyao, kvm@vger.kernel.org, pbonzini@redhat.com,
	seanjc@google.com

On Thu, 2024-04-25 at 23:09 +0800, Xiaoyao Li wrote:
> > The idea is that TDX module could add the capability to configure these bits
> > as
> > well, so that TDs could match normal VMs for cases where there is a desire
> > for
> > the guest's MAXPA to be smaller than the host's. The requirements would be,
> > roughly:
> >    - The VMM specifies the 0x80000008.EAX[23:16] when creating a TD.
> >    - The TDX module does sanity checking. 
> >    - The 0x80000008.EAX[23:16] field is used to communicate the max
> > addressable
> >    GPA to  the guest. It will be used by the guest firmware to make sure
> >    resources like PCI bars are mapped into the addressable GPA.
> >    - If the guest attempts to access memory beyond the max addressable GPA,
> > then
> >    the TDX module generates EPT violation to the VMM. For the VMM, this case
> >    means that the guest attempted to access "invalid" (I/O) memory.
> >    - The VMM will be expected to terminate the TD guest. The VMM may send
> >    a notification, but the TDX module doesn't necessarily need to know how.
> 
> This is not the same as how it works for normal (non-TDX) VMs.
> 
> For normal VMs, when userspace configures a smaller one than what 
> hardware EPT/NPT supports, it doesn't cause any issue if guest accesses 
> GPA beyond [23:16] but within hardware EPT/NPT capability.
> 
> It's more a hint to guest that KVM doesn't enforce the semantics of it. 
> However, for TDX case, you are proposing to make it a hard rule.

If we limit ourselves to worrying about valid configurations, accessing a GPA
beyond [23:16] is similar to accessing a GPA with no memslot. Like you say,
[23:16] is a hint, so there is really no change from KVM's perspective. KVM
behaves as usual, based on the [7:0] MAXPA.

What do you think should happen in the case a TD accesses a GPA with no memslot?
KVM/QEMU don't have a lot of options to recover. So are the differences here
just the existing differences between normal VMs and TDX?


* Re: [RFC] TDX module configurability of 0x80000008
  2024-04-25 16:31   ` Edgecombe, Rick P
@ 2024-04-25 16:59     ` Sean Christopherson
  2024-04-25 18:20       ` Edgecombe, Rick P
  0 siblings, 1 reply; 18+ messages in thread
From: Sean Christopherson @ 2024-04-25 16:59 UTC
  To: Rick P Edgecombe; +Cc: Xiaoyao Li, kvm@vger.kernel.org, pbonzini@redhat.com

On Thu, Apr 25, 2024, Rick P Edgecombe wrote:
> On Thu, 2024-04-25 at 23:09 +0800, Xiaoyao Li wrote:
> > > The idea is that TDX module could add the capability to configure these
> > > bits as well, so that TDs could match normal VMs for cases where there is
> > > a desire for the guest's MAXPA to be smaller than the host's. The
> > > requirements would be,
> > > roughly:
> > >    - The VMM specifies the 0x80000008.EAX[23:16] when creating a TD.
> > >    - The TDX module does sanity checking. 
> > >    - The 0x80000008.EAX[23:16] field is used to communicate the max
> > > addressable
> > >    GPA to  the guest. It will be used by the guest firmware to make sure
> > >    resources like PCI bars are mapped into the addressable GPA.
> > >    - If the guest attempts to access memory beyond the max addressable GPA,
> > > then
> > >    the TDX module generates EPT violation to the VMM. For the VMM, this case
> > >    means that the guest attempted to access "invalid" (I/O) memory.
> > >    - The VMM will be expected to terminate the TD guest. The VMM may send
> > >    a notification, but the TDX module doesn't necessarily need to know how.
> > 
> > This is not the same as how it works for normal (non-TDX) VMs.
> > 
> > For normal VMs, when userspace configures a smaller one than what 
> > hardware EPT/NPT supports, it doesn't cause any issue if guest accesses 
> > > GPA beyond [23:16] but within hardware EPT/NPT capability.
> > 
> > It's more a hint to guest that KVM doesn't enforce the semantics of it. 
> > However, for TDX case, you are proposing to make it a hard rule.
> 
> If we limit ourselves to worrying about valid configurations,

Define "valid configurations".  

> accessing a GPA beyond [23:16] is similar to accessing a GPA with no memslot.

No, it's not.  A GPA without a memslot has *very* well-defined semantics in KVM,
and KVM can provide those semantics for all guest-legal GPAs regardless of
hardware EPT/NPT support.

> Like you say, [23:16] is a hint, so there is really no change from KVM's
> perspective. It behaves like normal based on the [7:0] MAXPA.
> 
> What do you think should happen in the case a TD accesses a GPA with no memslot?
 
Synthesize a #VE into the guest.  The GPA isn't a violation of the "real" MAXPHYADDR,
so killing the guest isn't warranted.  And that also means the VMM could legitimately
want to put emulated MMIO above the max addressable GPA.  Synthesizing a #VE is
also aligned with KVM's non-memslot behavior for TDX (configured to trigger #VE).

And most importantly, as you note above, the VMM *can't* resolve the problem.  On
the other hand, the guest *might* be able to resolve the issue, e.g. it could
request MMIO, which may or may not succeed.  Even if the guest panics, that's
far better than it being terminated by the host as it gives the guest a chance
to capture what led to the panic/crash.

The only downside is that the VMM doesn't have a chance to "bless" the #VE, but
since the VMM literally cannot handle the "bad" access in any way other than killing
the guest, I don't see that as a major problem.

> KVM/QEMU don't have a lot of options to recover. So are the differences here
> just the existing differences between normal VMs and TDX?


* Re: [RFC] TDX module configurability of 0x80000008
  2024-04-25 16:59     ` Sean Christopherson
@ 2024-04-25 18:20       ` Edgecombe, Rick P
  2024-04-25 21:39         ` Sean Christopherson
  0 siblings, 1 reply; 18+ messages in thread
From: Edgecombe, Rick P @ 2024-04-25 18:20 UTC
  To: seanjc@google.com; +Cc: Li, Xiaoyao, kvm@vger.kernel.org, pbonzini@redhat.com

On Thu, 2024-04-25 at 09:59 -0700, Sean Christopherson wrote:
> > If we limit ourselves to worrying about valid configurations,
> 
> Define "valid configurations".  

I meant configurations with no memslots above guest max pa. If there are
memslots in that region, I don't know. Maybe valid is the wrong word.

> 
> > accessing a GPA beyond [23:16] is similar to accessing a GPA with no
> > memslot.
> 
> No, it's not.  A GPA without a memslot has *very* well-defined semantics in
> KVM,
> and KVM can provide those semantics for all guest-legal GPAs regardless of
> hardware EPT/NPT support.

Sorry, not following. Are we expecting there to be memslots above the guest
maxpa 23:16? If there are no memslots in that region, it seems exactly like
accessing a GPA with no memslots. What is the difference between before and
after the introduction of guest MAXPA? (there will be normal VMs and TDX
differences of course).

> 
> > Like you say, [23:16] is a hint, so there is really no change from KVM's
> > perspective. It behaves like normal based on the [7:0] MAXPA.
> > 
> > What do you think should happen in the case a TD accesses a GPA with no
> > memslot?
>  
> Synthesize a #VE into the guest.  The GPA isn't a violation of the "real"
> MAXPHYADDR,
> so killing the guest isn't warranted.  And that also means the VMM could
> legitimately
> want to put emulated MMIO above the max addressable GPA.  Synthesizing a #VE
> is
> also aligned with KVM's non-memslot behavior for TDX (configured to trigger
> #VE).
> 
> And most importantly, as you note above, the VMM *can't* resolve the problem. 
> On
> the other hand, the guest *might* be able to resolve the issue, e.g. it could
> request MMIO, which may or may not succeed.  Even if the guest panics, that's
> far better than it being terminated by the host as it gives the guest a chance
> to capture what led to the panic/crash.
> 
> The only downside is that the VMM doesn't have a chance to "bless" the #VE,
> but
> since the VMM literally cannot handle the "bad" access in any way other than
> killing
> the guest, I don't see that as a major problem.

Ok, so we want the TDX module to expect the TD to continue to live. Then we need
to handle two things:
1. Trigger #VE for a GPA that is mappable by the EPT level (we can already do
this)
2. Trigger #VE for a GPA that is not mappable by the EPT level

We could ask the TDX module to just handle both of these cases. But this means
KVM loses a bit of control and debug-ability from the host side. Also, it adds
complexity for cases where KVM maps GPAs above guest maxpa anyway. So maybe we
want it to just handle 2? It might have some nuances still.

Another question, should we just tie guest maxpa to GPAW? Either enforce they
are the same, or expose 23:16 based on GPAW.



* Re: [RFC] TDX module configurability of 0x80000008
  2024-04-25 18:20       ` Edgecombe, Rick P
@ 2024-04-25 21:39         ` Sean Christopherson
  2024-04-25 22:41           ` Edgecombe, Rick P
  0 siblings, 1 reply; 18+ messages in thread
From: Sean Christopherson @ 2024-04-25 21:39 UTC
  To: Rick P Edgecombe; +Cc: Xiaoyao Li, kvm@vger.kernel.org, pbonzini@redhat.com

On Thu, Apr 25, 2024, Rick P Edgecombe wrote:
> On Thu, 2024-04-25 at 09:59 -0700, Sean Christopherson wrote:
> > > accessing a GPA beyond [23:16] is similar to accessing a GPA with no
> > > memslot.
> > 
> > No, it's not.  A GPA without a memslot has *very* well-defined semantics in
> > KVM, and KVM can provide those semantics for all guest-legal GPAs
> > regardless of hardware EPT/NPT support.
> 
> Sorry, not following. Are we expecting there to be memslots above the guest
> maxpa 23:16? If there are no memslots in that region, it seems exactly like
> accessing a GPA with no memslots. What is the difference between before and
> after the introduction of guest MAXPA? (there will be normal VMs and TDX
> differences of course).

If there are no memslots, nothing changes from a functional perspective, just a very
slight increase in latency.  Pre-TDX, KVM can always emulate in response to an EPT
violation on an unmappable GPA.  I.e. as long as there is no memslot, KVM doesn't
*need* to create SPTEs, and so whether or not a GPA is mappable is completely
irrelevant.

Enter TDX, and suddenly that doesn't work because KVM can't emulate without guest
cooperation.  And to get guest cooperation, _something_ needs to kick the guest
with a #VE.
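
The distinction drawn above can be modelled roughly as follows. This Python
sketch is only a toy model of the decision, not KVM code: the enum, the
function name, and the (base, size) memslot representation are all invented
for illustration.

```python
from enum import Enum


class Action(Enum):
    MAP_MEMSLOT = "map the backing memslot"
    EMULATE_MMIO = "emulate the access on the host side"
    NEEDS_GUEST_VE = "guest must be kicked with a #VE"


def handle_unmapped_access(gpa, memslots, is_td):
    """Toy model: decide what can be done for a faulting GPA.

    memslots is a list of (base, size) tuples (hypothetical representation).
    """
    for base, size in memslots:
        if base <= gpa < base + size:
            return Action.MAP_MEMSLOT
    if not is_td:
        # Normal VM: no memslot means emulation, so whether the GPA is
        # mappable by EPT/NPT does not matter.
        return Action.EMULATE_MMIO
    # TD: emulation needs guest cooperation, which requires a #VE.
    return Action.NEEDS_GUEST_VE
```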

> > > Like you say, [23:16] is a hint, so there is really no change from KVM's
> > > perspective. It behaves like normal based on the [7:0] MAXPA.
> > > 
> > > What do you think should happen in the case a TD accesses a GPA with no
> > > memslot?
> >  
> > Synthesize a #VE into the guest.  The GPA isn't a violation of the "real"
> > MAXPHYADDR, so killing the guest isn't warranted.  And that also means the
> > VMM could legitimately want to put emulated MMIO above the max addressable
> > GPA.  Synthesizing a #VE is also aligned with KVM's non-memslot behavior
> > for TDX (configured to trigger #VE).
> > 
> > And most importantly, as you note above, the VMM *can't* resolve the
> > problem.  On the other hand, the guest *might* be able to resolve the
> > issue, e.g. it could request MMIO, which may or may not succeed.  Even if
> > the guest panics, that's far better than it being terminated by the host as
> > it gives the guest a chance to capture what led to the panic/crash.
> > 
> > The only downside is that the VMM doesn't have a chance to "bless" the #VE,
> > but since the VMM literally cannot handle the "bad" access in any way other
> > than killing the guest, I don't see that as a major problem.
> 
> Ok, so we want the TDX module to expect the TD to continue to live. Then we need
> to handle two things:
> 1. Trigger #VE for a GPA that is mappable by the EPT level (we can already do
> this)
> 2. Trigger #VE for a GPA that is not mappable by the EPT level
> 
> We could ask the TDX module to just handle both of these cases. But this means
> KVM loses a bit of control and debug-ability from the host side.

Why would the TDX module touch #1?  Just leave it as is.

> Also, it adds complexity for cases where KVM maps GPAs above guest maxpa
> anyway.

That should be disallowed.  If KVM tries to map an address that it told the guest
was impossible to map, then the TDX module should throw an error.

> So maybe we want it to just handle 2? It might have some nuances still.

I'm sure there are nuances, but I don't know that we care.  I see three options:

 1. Resume the guest without doing anything and hang the guest.

 2. Punt the issue to the VMM and kill the guest.

 3. Inject #VE into the guest and maybe the guest lives.

#1 is terrible for obvious reasons, so given the choice between guaranteed death
and a slim chance of survival, I'll take that slim chance of survival :-) 

> Another question, should we just tie guest maxpa to GPAW?

Yes

> Either enforce they are the same, or expose 23:16 based on GPAW.

I can't think of any reason not to derive 23:16 from GPAW.  Unless I'm missing
some subtlety, they're quite literally the same thing.
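
A minimal Python sketch of what "derive 23:16 from GPAW" could look like. It
assumes GPAW is a single setting that selects between a 48-bit and a 52-bit
guest physical address width; that assumption, the function names, and the
sample EAX value are illustrative and not taken from the TDX module spec.

```python
def guest_phys_bits_from_gpaw(gpaw_is_52bit: bool) -> int:
    """Assumed mapping: GPAW selects a 48-bit or 52-bit GPA width."""
    return 52 if gpaw_is_52bit else 48


def eax_with_gpaw(eax: int, gpaw_is_52bit: bool) -> int:
    """Return EAX of leaf 0x80000008 with bits [23:16] replaced by the
    width derived from GPAW, leaving all other bits untouched."""
    bits = guest_phys_bits_from_gpaw(gpaw_is_52bit)
    cleared = eax & ~(0xFF << 16) & 0xFFFFFFFF
    return cleared | (bits << 16)


if __name__ == "__main__":
    print(hex(eax_with_gpaw(0x00003934, False)))
```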


* Re: [RFC] TDX module configurability of 0x80000008
  2024-04-25 21:39         ` Sean Christopherson
@ 2024-04-25 22:41           ` Edgecombe, Rick P
  2024-04-25 22:53             ` Sean Christopherson
  0 siblings, 1 reply; 18+ messages in thread
From: Edgecombe, Rick P @ 2024-04-25 22:41 UTC
  To: seanjc@google.com; +Cc: Li, Xiaoyao, kvm@vger.kernel.org, pbonzini@redhat.com

On Thu, 2024-04-25 at 14:39 -0700, Sean Christopherson wrote:
> On Thu, Apr 25, 2024, Rick P Edgecombe wrote:
> > On Thu, 2024-04-25 at 09:59 -0700, Sean Christopherson wrote:
> > > > accessing a GPA beyond [23:16] is similar to accessing a GPA with no
> > > > memslot.
> > > 
> > > No, it's not.  A GPA without a memslot has *very* well-defined semantics
> > > in
> > > KVM, and KVM can provide those semantics for all guest-legal GPAs
> > > regardless of hardware EPT/NPT support.
> > 
> > Sorry, not following. Are we expecting there to be memslots above the guest
> > maxpa 23:16? If there are no memslots in that region, it seems exactly like
> > accessing a GPA with no memslots. What is the difference between before and
> > after the introduction of guest MAXPA? (there will be normal VMs and TDX
> > differences of course).
> 
> If there are no memslots, nothing changes from a functional perspective, just a very
> slight increase in latency.  Pre-TDX, KVM can always emulate in response to an
> EPT
> violation on an unmappable GPA.  I.e. as long as there is no memslot, KVM
> doesn't
> *need* to create SPTEs, and so whether or not a GPA is mappable is completely
> irrelevant.

Right, although there are gaps in emulation that could fail. If the emulation
succeeds and there is an MMIO exit targeting a totally unknown GPA, then I guess
it's up to userspace to decide what to do.

KVM's done its job. But userspace still has to handle it. It can, but I was
under the impression it didn't (maybe bad assumption).

> 
> Enter TDX, and suddenly that doesn't work because KVM can't emulate without
> guest
> cooperation.  And to get guest cooperation, _something_ needs to kick the
> guest
> with a #VE.
> 
> > > > Like you say, [23:16] is a hint, so there is really no change from KVM's
> > > > perspective. It behaves like normal based on the [7:0] MAXPA.
> > > > 
> > > > What do you think should happen in the case a TD accesses a GPA with no
> > > > memslot?
> > >  
> > > Synthesize a #VE into the guest.  The GPA isn't a violation of the "real"
> > > MAXPHYADDR, so killing the guest isn't warranted.  And that also means the
> > > VMM could legitimately want to put emulated MMIO above the max addressable
> > > GPA.  Synthesizing a #VE is also aligned with KVM's non-memslot behavior
> > > for TDX (configured to trigger #VE).
> > > 
> > > And most importantly, as you note above, the VMM *can't* resolve the
> > > problem.  On the other hand, the guest *might* be able to resolve the
> > > issue, e.g. it could request MMIO, which may or may not succeed.  Even if
> > > the guest panics, that's far better than it being terminated by the host
> > > as
> > > it gives the guest a chance to capture what led to the panic/crash.
> > > 
> > > The only downside is that the VMM doesn't have a chance to "bless" the
> > > #VE,
> > > > but since the VMM literally cannot handle the "bad" access in any way other
> > > than killing the guest, I don't see that as a major problem.
> > 
> > Ok, so we want the TDX module to expect the TD to continue to live. Then we
> > need
> > to handle two things:
> > 1. Trigger #VE for a GPA that is mappable by the EPT level (we can already
> > do
> > this)
> > 2. Trigger #VE for a GPA that is not mappable by the EPT level
> > 
> > We could ask the TDX module to just handle both of these cases. But this
> > means
> > KVM loses a bit of control and debug-ability from the host side.
> 
> Why would the TDX module touch #1?  Just leave it as is.

I think it won't even come up if GPAW is locked to 23:16 like discussed below.
(and the current plan for picking EPT level).

> 
> > Also, it adds complexity for cases where KVM maps GPAs above guest maxpa
> > anyway.
> 
> That should be disallowed.  If KVM tries to map an address that it told the
> guest
> was impossible to map, then the TDX module should throw an error.

Hmm. I'll mention this, but I don't see why KVM needs the TDX module to filter
it. It seems in the range of userspace being allowed to create nonsense
configurations that only hurt its own guest.

If we think the TDX module should do it, then maybe we should have KVM sanity
filter these out today in preparation.

> 
> > So maybe we want it to just handle 2? It might have some nuances still.
> 
> I'm sure there are nuances, but I don't know that we care.  I see three
> options:
> 
>  1. Resume the guest without doing anything and hang the guest.
> 
>  2. Punt the issue to the VMM and kill the guest.
> 
>  3. Inject #VE into the guest and maybe the guest lives.
> 
> #1 is terrible for obvious reasons, so given the choice between guaranteed
> death
> and a slim chance of survival, I'll take that slim chance of survival :-) 

Yes, maybe this proposal was being a bit lazy.

> 
> > Another question, should we just tie guest maxpa to GPAW?
> 
> Yes
> 
> > Either enforce they are the same, or expose 23:16 based on GPAW.
> 
> I can't think of any reason not to derive 23:16 from GPAW, unless I'm missing
> some subtlety, they're quite literally the same thing.

So we have:
 - Expose GPAW in 23:16
 - Inject #VE if the EPT violation is for a GPA that can't be mapped by the EPT level
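
The second item can be sketched as a simple range check. This Python model is
hypothetical: the function name and return strings are invented, the number of
bits the EPT level can map is passed in rather than assumed, and details such
as shared versus private GPAs are ignored.

```python
def classify_ept_violation(gpa: int, ept_mappable_bits: int) -> str:
    """Toy model of the proposed rule for an EPT violation in a TD."""
    if gpa < 0:
        raise ValueError("GPA must be non-negative")
    if gpa >> ept_mappable_bits:
        # The GPA has bits set above what the EPT level can map, so the
        # VMM could never resolve it: hand it to the guest as a #VE.
        return "inject_ve"
    # Otherwise the violation goes to the VMM as it does today.
    return "exit_to_vmm"
```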

Seems relatively simple. I'll wait a bit for more comments, and circle back with
TDX module folks.



* Re: [RFC] TDX module configurability of 0x80000008
  2024-04-25 22:41           ` Edgecombe, Rick P
@ 2024-04-25 22:53             ` Sean Christopherson
  2024-04-25 23:08               ` Edgecombe, Rick P
  0 siblings, 1 reply; 18+ messages in thread
From: Sean Christopherson @ 2024-04-25 22:53 UTC
  To: Rick P Edgecombe; +Cc: Xiaoyao Li, kvm@vger.kernel.org, pbonzini@redhat.com

On Thu, Apr 25, 2024, Rick P Edgecombe wrote:
> On Thu, 2024-04-25 at 14:39 -0700, Sean Christopherson wrote:
> > On Thu, Apr 25, 2024, Rick P Edgecombe wrote:
> > > On Thu, 2024-04-25 at 09:59 -0700, Sean Christopherson wrote:
> > > > > accessing a GPA beyond [23:16] is similar to accessing a GPA with no
> > > > > memslot.
> > > > 
> > > > No, it's not.  A GPA without a memslot has *very* well-defined
> > > > semantics in KVM, and KVM can provide those semantics for all
> > > > guest-legal GPAs regardless of hardware EPT/NPT support.
> > > 
> > > Sorry, not following. Are we expecting there to be memslots above the guest
> > > maxpa 23:16? If there are no memslots in that region, it seems exactly like
> > > accessing a GPA with no memslots. What is the difference between before and
> > > after the introduction of guest MAXPA? (there will be normal VMs and TDX
> > > differences of course).
> > 
> > > If there are no memslots, nothing changes from a functional perspective, just a
> > > very slight increase in latency.  Pre-TDX, KVM can always emulate in
> > > response to an EPT violation on an unmappable GPA.  I.e. as long as there is
> > no memslot, KVM doesn't *need* to create SPTEs, and so whether or not a GPA
> > is mappable is completely irrelevant.
> 
> Right, although there are gaps in emulation that could fail. If the emulation
> succeeds and there is an MMIO exit targeting a totally unknown GPA, then I guess
> it's up to userspace to decide what to do.
> 
> KVM's done its job.

Yep.

> But userspace still has to handle it. It can, but I was under the impression
> it didn't (maybe bad assumption).

I'm pretty sure QEMU handles accesses to non-existent MMIO with PCI abort semantics,
i.e. ignores writes and returns all FFs for reads.
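
For reference, "PCI abort semantics" as described here amounts to something
like the following Python sketch. It models only the behavior stated above
(writes dropped, reads return all ones); the function names are made up and
this is not QEMU's actual code or API.

```python
def unassigned_mmio_read(size_bytes: int) -> int:
    """Read of non-existent MMIO: return all FFs for the access width."""
    if size_bytes not in (1, 2, 4, 8):
        raise ValueError("unsupported access size")
    return (1 << (8 * size_bytes)) - 1


def unassigned_mmio_write(addr: int, value: int, size_bytes: int) -> None:
    """Write to non-existent MMIO: silently ignored."""
    return None
```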

> > > Also, it adds complexity for cases where KVM maps GPAs above guest maxpa
> > > anyway.
> > 
> > That should be disallowed.  If KVM tries to map an address that it told the
> > guest was impossible to map, then the TDX module should throw an error.
> 
> Hmm. I'll mention this, but I don't see why KVM needs the TDX module to filter
> it. It seems in the range of userspace being allowed to create nonsense
> configurations that only hurt its own guest.

Because the whole point of TDX is to protect the guest from the bad, naughty host?

> If we think the TDX module should do it, then maybe we should have KVM sanity
> filter these out today in preparation.

Nope.  KVM isn't in the guest's TCB, TDX is.  KVM's stance is that userspace is
responsible for providing a sane vCPU model, because defining what is "sane" is
extremely difficult unless the definition is super prescriptive, a la TDX. 

E.g. letting the host map something that TDX's spec says will cause #VE would
create a novel attack surface.


* Re: [RFC] TDX module configurability of 0x80000008
  2024-04-25 22:53             ` Sean Christopherson
@ 2024-04-25 23:08               ` Edgecombe, Rick P
  2024-04-25 23:28                 ` Sean Christopherson
  0 siblings, 1 reply; 18+ messages in thread
From: Edgecombe, Rick P @ 2024-04-25 23:08 UTC
  To: seanjc@google.com; +Cc: Li, Xiaoyao, kvm@vger.kernel.org, pbonzini@redhat.com

On Thu, 2024-04-25 at 15:53 -0700, Sean Christopherson wrote:
> > Hmm. I'll mention this, but I don't see why KVM needs the TDX module to
> > filter
> > it. It seems in the range of userspace being allowed to create nonsense
> > configurations that only hurt its own guest.
> 
> Because the whole point of TDX is to protect the guest from the bad, naughty
> host?

DoS naughtiness by the host is allowed though.

> 
> > If we think the TDX module should do it, then maybe we should have KVM
> > sanity
> > filter these out today in preparation.
> 
> Nope.  KVM isn't in the guest's TCB, TDX is.
>   KVM's stance is that userspace is
> responsible for providing a sane vCPU model, because defining what is "sane"
> is
> extremely difficult unless the definition is super prescriptive, a la TDX. 
> 
> E.g. letting the host map something that TDX's spec says will cause #VE would
> create a novel attack surface.

I thought that the shared half could be mapped in that range unless KVM gets
involved. But, no, as long as we tie GPAW, 23:16, and the EPT level all together, then
mapping something above it won't even make sense.

I don't immediately see an attack surface risk. I expect this will get more
internal scrutiny in that regard though.




* Re: [RFC] TDX module configurability of 0x80000008
  2024-04-25 23:08               ` Edgecombe, Rick P
@ 2024-04-25 23:28                 ` Sean Christopherson
  2024-04-25 23:38                   ` Edgecombe, Rick P
  0 siblings, 1 reply; 18+ messages in thread
From: Sean Christopherson @ 2024-04-25 23:28 UTC
  To: Rick P Edgecombe; +Cc: Xiaoyao Li, kvm@vger.kernel.org, pbonzini@redhat.com

On Thu, Apr 25, 2024, Rick P Edgecombe wrote:
> On Thu, 2024-04-25 at 15:53 -0700, Sean Christopherson wrote:
> > > Hmm. I'll mention this, but I don't see why KVM needs the TDX module to
> > > filter
> > > it. It seems in the range of userspace being allowed to create nonsense
> > > configurations that only hurt its own guest.
> > 
> > Because the whole point of TDX is to protect the guest from the bad, naughty
> > host?
> 
> DOS naughtiness by the host is allowed though.
> 
> > 
> > > If we think the TDX module should do it, then maybe we should have KVM
> > > sanity filter these out today in preparation.
> > 
> > Nope.  KVM isn't in the guest's TCB, TDX is.    KVM's stance is that
> > userspace is responsible for providing a sane vCPU model, because defining
> > what is "sane" is extremely difficult unless the definition is super
> > prescriptive, a la TDX. 
> > 
> > E.g. letting the host map something that TDX's spec says will cause #VE would
> > create a novel attack surface.
> 
> I thought that the shared half could be mapped in that range unless KVM gets
> involved. But, no, as long as we tie GPAW, 23:16, ept-level all together, then
> mapping something above it won't even make sense.
> 
> I don't see attack surface risk immediately. I expect this will get more
> internal scrutiny in that regard though.

Oooh, I thought you were talking about KVM mapping a private GPA address in S-EPT
above the reported GPAW.  In hindsight, I don't know _why_ I thought that.

Yeah, trying to sanity check the shared EPT in the TDX module would be comically
pointless.


* Re: [RFC] TDX module configurability of 0x80000008
  2024-04-25 23:28                 ` Sean Christopherson
@ 2024-04-25 23:38                   ` Edgecombe, Rick P
  2024-05-06 18:40                     ` Edgecombe, Rick P
  0 siblings, 1 reply; 18+ messages in thread
From: Edgecombe, Rick P @ 2024-04-25 23:38 UTC (permalink / raw
  To: seanjc@google.com; +Cc: Li, Xiaoyao, kvm@vger.kernel.org, pbonzini@redhat.com

On Thu, 2024-04-25 at 16:28 -0700, Sean Christopherson wrote:
> > > > If we think the TDX module should do it, then maybe we should have KVM
> > > > sanity filter these out today in preparation.
> > > 
> > > Nope.  KVM isn't in the guest's TCB, TDX is.    KVM's stance is that
> > > userspace is responsible for providing a sane vCPU model, because defining
> > > what is "sane" is extremely difficult unless the definition is super
> > > prescriptive, a la TDX. 
> > > 
> > > > E.g. letting the host map something that TDX's spec says will cause #VE
> > > > would create a novel attack surface.
> > > 
> > > I thought that the shared half could be mapped in that range unless KVM gets
> > > involved. But, no, as long as we tie GPAW, 23:16, ept-level all together,
> > > then mapping something above it won't even make sense.
> > 
> > I don't see attack surface risk immediately. I expect this will get more
> > internal scrutiny in that regard though.
> 
> > Oooh, I thought you were talking about KVM mapping a private GPA address in
> > S-EPT above the reported GPAW.  In hindsight, I don't know _why_ I thought
> > that.
> > 
> > Yeah, trying to sanity check the shared EPT in the TDX module would be
> > comically pointless.

I might have been thinking that as well? Wasn't the fullest thought. Sorry for
the confusion.

In any case it should be moot for the solution we are going for in KVM. I'll
mention it to them though, because just because KVM will not do GPA_48 and
5-level EPT doesn't mean another VMM won't.


* Re: [RFC] TDX module configurability of 0x80000008
  2024-04-25 23:38                   ` Edgecombe, Rick P
@ 2024-05-06 18:40                     ` Edgecombe, Rick P
  2024-05-07 14:22                       ` Chao Gao
  0 siblings, 1 reply; 18+ messages in thread
From: Edgecombe, Rick P @ 2024-05-06 18:40 UTC (permalink / raw
  To: seanjc@google.com; +Cc: Li, Xiaoyao, kvm@vger.kernel.org, pbonzini@redhat.com

Follow up on this:

1. The plan is to just always inject the #VEs for private and shared GPAs that
exceed GPAW. (i.e. not pass the subset of EPT violations that could be handled
by the VMM by clearing suppress #VE)


2. There was some concern that exposing non-zero bits in [23:16] could confuse
existing TDs. Of course KVM doesn't support any TDs today, but if this feature
comes after initial KVM support for TDX and KVM wants to set it by default, then
it could be an issue.

For normal VMs, is there any concern that guests might not be masking the bits
correctly?

TDX module folks were pushing for a guest opt-in out of concern some breakages
could result. Of course it requires additional enabling in the guest OS and
vBIOS then. I was thinking it should be a host opt-in without guest control. If
there was a problem it could be a host userspace opt-in. Any concerns there?

Thanks,

Rick
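
A minimal sketch of the check implied by item 1, for illustration only. It
assumes GPAW is expressed as a bit width (TDX uses 48 or 52) and that a GPA
"exceeds GPAW" when any bit at or above that width is set. The names
gpa_exceeds_gpaw(), classify_ept_violation() and the td_fault_action enum are
hypothetical and are not taken from the TDX module or KVM; the "normal
handling" branch stands in for whatever the module does today for in-range
GPAs.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical outcome of an EPT violation taken by a TD. */
enum td_fault_action {
	TD_FAULT_NORMAL_HANDLING,	/* GPA is within GPAW: existing behavior */
	TD_FAULT_INJECT_VE,		/* GPA is above GPAW: always inject #VE */
};

/* True if any bit at or above position 'gpaw' is set in 'gpa'. */
static inline bool gpa_exceeds_gpaw(uint64_t gpa, unsigned int gpaw)
{
	/* A shift by 64 or more is undefined in C, so treat it as "no limit". */
	return gpaw < 64 && (gpa >> gpaw) != 0;
}

/*
 * Sketch of the plan in item 1: private or shared, a GPA above GPAW gets a
 * #VE, with no attempt to pass the violation to the VMM first.
 */
static inline enum td_fault_action
classify_ept_violation(uint64_t gpa, unsigned int gpaw)
{
	return gpa_exceeds_gpaw(gpa, gpaw) ? TD_FAULT_INJECT_VE
					   : TD_FAULT_NORMAL_HANDLING;
}
```

With GPAW = 48, a GPA of 1ULL << 48 would be classified as
TD_FAULT_INJECT_VE, while (1ULL << 48) - 1 would take the normal path.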


* Re: [RFC] TDX module configurability of 0x80000008
  2024-05-06 18:40                     ` Edgecombe, Rick P
@ 2024-05-07 14:22                       ` Chao Gao
  2024-05-07 14:49                         ` Edgecombe, Rick P
  0 siblings, 1 reply; 18+ messages in thread
From: Chao Gao @ 2024-05-07 14:22 UTC (permalink / raw
  To: Edgecombe, Rick P
  Cc: seanjc@google.com, Li, Xiaoyao, kvm@vger.kernel.org,
	pbonzini@redhat.com

On Mon, May 06, 2024 at 06:40:03PM +0000, Edgecombe, Rick P wrote:
>Follow up on this:
>
>1. The plan is to just always inject the #VEs for private and shared GPAs that
>exceed GPAW. (i.e. not pass the subset of EPT violations that could be handled
>by the VMM by clearing suppress #VE)
>
>
>2. There was some concern that exposing non-zero bits in [23:16] could confuse
>existing TDs. Of course KVM doesn't support any TDs today, but if this feature
>comes after initial KVM support for TDX and KVM wants to set it by default, then
>it could be an issue.

Do you mean some TDs may assert that [23:16] are 0s? A future-proof design
won't have this assertion. And this case (i.e., some CPUID bits become non-zero)
happens on every new generation of CPUs and doesn't confuse existing OSes. I
don't understand why it would be a problem for TDs.

>
>For normal VMs, is there any concern that guests might not be masking the bits
>correctly?
>
>TDX module folks were pushing for a guest opt-in out of concern some breakages
>could result. Of course it requires additional enabling in the guest OS and
>vBIOS then. I was thinking it should be a host opt-in without guest control. If
>there was a problem it could be a host userspace opt-in. Any concerns there?
>
>Thanks,
>
>Rick


* Re: [RFC] TDX module configurability of 0x80000008
  2024-05-07 14:22                       ` Chao Gao
@ 2024-05-07 14:49                         ` Edgecombe, Rick P
  2024-05-07 16:21                           ` Sean Christopherson
  0 siblings, 1 reply; 18+ messages in thread
From: Edgecombe, Rick P @ 2024-05-07 14:49 UTC (permalink / raw
  To: Gao, Chao
  Cc: Li, Xiaoyao, kvm@vger.kernel.org, pbonzini@redhat.com,
	seanjc@google.com

On Tue, 2024-05-07 at 22:22 +0800, Chao Gao wrote:
> > 2. There was some concern that exposing non-zero bits in [23:16] could
> > confuse existing TDs. Of course KVM doesn't support any TDs today, but if
> > this feature comes after initial KVM support for TDX and KVM wants to set
> > it by default, then it could be an issue.
> 
> Do you mean some TDs may assert that [23:16] are 0s? A future-proof design
> won't have this assertion. And this case (i.e., some CPUID bits become
> non-zero) happens on every new generation of CPUs and doesn't confuse existing
> OSes. I don't understand why it would be a problem for TDs.

Intel defined these as reserved. AMD defined them for guest MAXPA. So, yes, OSs
should be masking them. I'm not suggesting that any are not, but TDX module
folks were concerned about this, and that then KVM would not be able to turn
this on later without breaking them. So just circling back here to double check.


* Re: [RFC] TDX module configurability of 0x80000008
  2024-05-07 14:49                         ` Edgecombe, Rick P
@ 2024-05-07 16:21                           ` Sean Christopherson
  0 siblings, 0 replies; 18+ messages in thread
From: Sean Christopherson @ 2024-05-07 16:21 UTC (permalink / raw
  To: Rick P Edgecombe
  Cc: Chao Gao, Xiaoyao Li, kvm@vger.kernel.org, pbonzini@redhat.com

On Tue, May 07, 2024, Rick P Edgecombe wrote:
> On Tue, 2024-05-07 at 22:22 +0800, Chao Gao wrote:
> > > 2. There was some concern that exposing non-zero bits in [23:16] could
> > > confuse existing TDs. Of course KVM doesn't support any TDs today, but if
> > > this feature comes after initial KVM support for TDX and KVM wants to set
> > > it by default, then it could be an issue.
> > 
> > Do you mean some TDs may assert that [23:16] are 0s? A future-proof design
> > won't have this assertion. And this case (i.e., some CPUID bits become non-
> > zero) happens on every new generation of CPUs and doesn't confuse existing
> > OSes. I don't understand why it would be a problem for TDs.
> 
> Intel defined these as reserved. AMD defined them for guest MAXPA. So, yes, OSs
> should be masking them. I'm not suggesting that any are not, but TDX module
> folks were concerned about this, and that then KVM would not be able to turn
> this on later without breaking them. So just circling back here to double check.

I'm with Chao, a kernel/firmware implementation that asserts some CPUID bits that
are currently reserved on _some_ CPUs are always zero deserves to be broken.
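
For illustration, a minimal sketch of what "masking the bits" looks like for
a guest that reads 0x80000008.EAX. The field positions ([7:0] for MAXPHYADDR,
[23:16] for the AMD-defined GuestPhysAddrSize) and the "zero means refer to
PhysAddrSize" rule come from the definitions quoted in this thread; the macro
and function names are hypothetical and do not correspond to any existing
kernel or firmware helper.

```c
#include <stdint.h>

/* Field layout of CPUID.0x80000008:EAX as discussed in this thread. */
#define EAX_MAXPHYADDR_MASK	0x000000ffu	/* bits 7:0 */
#define EAX_GUEST_PA_MASK	0x00ff0000u	/* bits 23:16 */
#define EAX_GUEST_PA_SHIFT	16

/* MAXPHYADDR, extracted with an explicit mask so other fields are ignored. */
static inline unsigned int eax_maxphyaddr(uint32_t eax)
{
	return eax & EAX_MAXPHYADDR_MASK;
}

/* Raw value of bits 23:16 (reserved per Intel, GuestPhysAddrSize per AMD). */
static inline unsigned int eax_guest_phys_bits(uint32_t eax)
{
	return (eax & EAX_GUEST_PA_MASK) >> EAX_GUEST_PA_SHIFT;
}

/*
 * Usable guest physical address width: bits 23:16 if non-zero, otherwise
 * fall back to MAXPHYADDR, following the AMD definition.
 */
static inline unsigned int effective_guest_phys_bits(uint32_t eax)
{
	unsigned int guest_bits = eax_guest_phys_bits(eax);

	return guest_bits ? guest_bits : eax_maxphyaddr(eax);
}
```

A guest written this way reads the same MAXPHYADDR whether or not [23:16] is
populated. A guest that instead asserts the upper bits of EAX are zero is the
case that would break.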


* Re: [RFC] TDX module configurability of 0x80000008
  2024-04-24 16:55 [RFC] TDX module configurability of 0x80000008 Edgecombe, Rick P
  2024-04-25 15:09 ` Xiaoyao Li
@ 2024-05-07 16:41 ` Xiaoyao Li
  2024-05-07 17:11   ` Sean Christopherson
  1 sibling, 1 reply; 18+ messages in thread
From: Xiaoyao Li @ 2024-05-07 16:41 UTC (permalink / raw
  To: Edgecombe, Rick P, kvm@vger.kernel.org, pbonzini@redhat.com,
	seanjc@google.com

On 4/25/2024 12:55 AM, Edgecombe, Rick P wrote:
> Hi,
> 
> This is a new effort to solicit community feedback for potential future TDX
> module features. There are two features in different stages of development
> around the configurability of the max physical address exposed in
> 0x80000008.EAX. I was hoping to get some comments on them and share the current
> plans on whether to implement them in KVM.

Sean and Paolo,

> One of the TDX module features is called MAXPA_VIRT. In short, it is similar to
> KVM’s allow_smaller_maxphyaddr. It requires an explicit opt-in by the VMM, and
> allows a TD’s 0x80000008.EAX[7:0] to be configured by the VMM. Accesses to
> physical addresses above the specified value by the TD will cause the TDX module
> to inject a mostly correct #PF with the RSVD error code set. It has to deal with
> the same problems as allow_smaller_maxphyaddr for correctly setting the RSVD
> bit. I wasn’t thinking to push this feature for KVM due the movement away from
> allow_smaller_maxphyaddr and towards 0x80000008.EAX[23:16].
> 

I would like to get your opinion on the MAXPA_VIRT feature of TDX. What is
KVM's likely decision on it? Will KVM not support it because it has the same
limitations as allow_smaller_maxphyaddr?


* Re: [RFC] TDX module configurability of 0x80000008
  2024-05-07 16:41 ` Xiaoyao Li
@ 2024-05-07 17:11   ` Sean Christopherson
  2024-05-08  7:50     ` Xiaoyao Li
  0 siblings, 1 reply; 18+ messages in thread
From: Sean Christopherson @ 2024-05-07 17:11 UTC (permalink / raw
  To: Xiaoyao Li; +Cc: Rick P Edgecombe, kvm@vger.kernel.org, pbonzini@redhat.com

On Wed, May 08, 2024, Xiaoyao Li wrote:
> On 4/25/2024 12:55 AM, Edgecombe, Rick P wrote:
> > One of the TDX module features is called MAXPA_VIRT. In short, it is similar to
> > KVM’s allow_smaller_maxphyaddr. It requires an explicit opt-in by the VMM, and
> > allows a TD’s 0x80000008.EAX[7:0] to be configured by the VMM. Accesses to
> > physical addresses above the specified value by the TD will cause the TDX module
> > to inject a mostly correct #PF with the RSVD error code set. It has to deal with
> > the same problems as allow_smaller_maxphyaddr for correctly setting the RSVD
> > bit. I wasn’t thinking to push this feature for KVM due the movement away from
> > allow_smaller_maxphyaddr and towards 0x80000008.EAX[23:16].
> > 
> 
> I would like to get your opinion of the MAXPA_VIRT feature of TDX. What is
> likely the KVM's decision on it? Won't support it due to it has the same
> limitation of allow_smaller_maxphyaddr?

Not supporting MAXPA_VIRT has my vote.  I'm of the opinion that allow_smaller_maxphyaddr
should die a horrible, fiery death :-)


* Re: [RFC] TDX module configurability of 0x80000008
  2024-05-07 17:11   ` Sean Christopherson
@ 2024-05-08  7:50     ` Xiaoyao Li
  0 siblings, 0 replies; 18+ messages in thread
From: Xiaoyao Li @ 2024-05-08  7:50 UTC (permalink / raw
  To: Sean Christopherson
  Cc: Rick P Edgecombe, kvm@vger.kernel.org, pbonzini@redhat.com

On 5/8/2024 1:11 AM, Sean Christopherson wrote:
> On Wed, May 08, 2024, Xiaoyao Li wrote:
>> On 4/25/2024 12:55 AM, Edgecombe, Rick P wrote:
>>> One of the TDX module features is called MAXPA_VIRT. In short, it is similar to
>>> KVM’s allow_smaller_maxphyaddr. It requires an explicit opt-in by the VMM, and
>>> allows a TD’s 0x80000008.EAX[7:0] to be configured by the VMM. Accesses to
>>> physical addresses above the specified value by the TD will cause the TDX module
>>> to inject a mostly correct #PF with the RSVD error code set. It has to deal with
>>> the same problems as allow_smaller_maxphyaddr for correctly setting the RSVD
>>> bit. I wasn’t thinking to push this feature for KVM due the movement away from
>>> allow_smaller_maxphyaddr and towards 0x80000008.EAX[23:16].
>>>
>>
>> I would like to get your opinion of the MAXPA_VIRT feature of TDX. What is
>> likely the KVM's decision on it? Won't support it due to it has the same
>> limitation of allow_smaller_maxphyaddr?
> 
> Not supporting MAXPA_VIRT has my vote.  I'm of the opinion that allow_smaller_maxphyaddr
> should die a horrible, fiery death :-)

Thanks for the response. It's good to know your preference.

I'm not sure if there is any user of "allow_smaller_maxphyaddr". On the QEMU
side, it neither checks it nor relies on it. QEMU always allows the user to
configure a smaller PA.


