From: "Yang, Weijiang" <weijiang.yang@intel.com>
To: Sean Christopherson <seanjc@google.com>
Cc: <pbonzini@redhat.com>, <kvm@vger.kernel.org>,
	<linux-kernel@vger.kernel.org>, <chao.gao@intel.com>,
	<rick.p.edgecombe@intel.com>, <mlevitsk@redhat.com>,
	<john.allen@amd.com>, Aaron Lewis <aaronlewis@google.com>,
	Jim Mattson <jmattson@google.com>,
	Oliver Upton <oupton@google.com>,
	Mingwei Zhang <mizhang@google.com>
Subject: Re: [PATCH v10 20/27] KVM: VMX: Emulate read and write to CET MSRs
Date: Wed, 13 Mar 2024 17:43:12 +0800
Message-ID: <9f820b96-0e4b-4cdc-93ff-f63aed829f0d@intel.com>
In-Reply-To: <ZfDdS8rtVtyEr0UR@google.com>

On 3/13/2024 6:55 AM, Sean Christopherson wrote:
> -non-KVM people, +Mingwei, Aaron, Oliver, and Jim
>
> On Sun, Feb 18, 2024, Yang Weijiang wrote:
>>   	case MSR_IA32_PERF_CAPABILITIES:
>>   		if (data && !vcpu_to_pmu(vcpu)->version)
>>   			return 1;
> Ha, perfect, this is already in the diff context.
>
>> diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
>> index c0ed69353674..281c3fe728c5 100644
>> --- a/arch/x86/kvm/x86.c
>> +++ b/arch/x86/kvm/x86.c
>> @@ -1849,6 +1849,36 @@ bool kvm_msr_allowed(struct kvm_vcpu *vcpu, u32 index, u32 type)
>>   }
>>   EXPORT_SYMBOL_GPL(kvm_msr_allowed);
>>   
>> +#define CET_US_RESERVED_BITS		GENMASK(9, 6)
>> +#define CET_US_SHSTK_MASK_BITS		GENMASK(1, 0)
>> +#define CET_US_IBT_MASK_BITS		(GENMASK_ULL(5, 2) | GENMASK_ULL(63, 10))
>> +#define CET_US_LEGACY_BITMAP_BASE(data)	((data) >> 12)
>> +
>> +static bool is_set_cet_msr_allowed(struct kvm_vcpu *vcpu, u32 index, u64 data,
>> +				   bool host_initiated)
>> +{
> ...
>
>> +	/*
>> +	 * If KVM supports the MSR, i.e. has enumerated the MSR existence to
>> +	 * userspace, then userspace is allowed to write '0' irrespective of
>> +	 * whether or not the MSR is exposed to the guest.
>> +	 */
>> +	if (!host_initiated || data)
>> +		return false;
> ...
>
>> @@ -1951,6 +2017,20 @@ static int __kvm_get_msr(struct kvm_vcpu *vcpu, u32 index, u64 *data,
>>   		    !guest_cpuid_has(vcpu, X86_FEATURE_RDPID))
>>   			return 1;
>>   		break;
>> +	case MSR_IA32_U_CET:
>> +	case MSR_IA32_S_CET:
>> +		if (!guest_can_use(vcpu, X86_FEATURE_SHSTK) &&
>> +		    !guest_can_use(vcpu, X86_FEATURE_IBT))
>> +			return 1;
> As pointed out by Mingwei in a conversation about PERF_CAPABILITIES, rejecting
> host *reads* while allowing host writes of '0' is inconsistent.  Which, while
> arguably par for the course for KVM's ABI, will likely result in the exact problem
> we're trying to avoid: killing userspace because it attempts to access an MSR KVM
> has said exists.

Thanks for pointing this out!
I agree.

>
> PERF_CAPABILITIES has a similar, but opposite, problem where KVM returns a non-zero
> value on reads, but rejects that same non-zero value on write.  PERF_CAPABILITIES
> is even more complicated because KVM stuffs a non-zero value at vCPU creation, but
> that's not really relevant to this discussion, just another data point for how
> messed up this all is.
>
> Also relevant to this discussion are KVM's PV MSRs, e.g. MSR_KVM_ASYNC_PF_ACK,
> as KVM rejects attempts to write '0' if the guest doesn't support the MSR, but
> if and only if userspace has enabled KVM_CAP_ENFORCE_PV_FEATURE_CPUID.
>
> Coming to the point, this mess is getting too hard to maintain, both from a code
> perspective and "what is KVM's ABI?" perspective.
>
> Rather than play whack-a-mole and inevitably end up with bugs and/or inconsistencies,
> what if we (a) return KVM_MSR_RET_INVALID when an MSR access is denied based on
> guest CPUID,

Can we define a new return value, KVM_MSR_RET_REJECTED, for this case, to tell it apart from KVM_MSR_RET_INVALID, which means the MSR index doesn't exist?
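
To make the distinction concrete, here is a minimal sketch. KVM_MSR_RET_REJECTED is only the name suggested above, both numeric values are assumptions made for illustration, and the two helpers are hypothetical. The idea that only a CPUID-based rejection would qualify for the "read or write '0'" exemption is likewise an assumption of the sketch, not something settled in this thread.

```c
#include <stdbool.h>

/*
 * Illustration only. KVM_MSR_RET_REJECTED is hypothetical, and both
 * numeric values are assumptions made for this sketch.
 */
#define KVM_MSR_RET_INVALID	2	/* MSR index is unknown to KVM */
#define KVM_MSR_RET_REJECTED	4	/* known MSR, denied by guest CPUID */

/* Hypothetical helper: pick the return code for a denied MSR access. */
static int msr_denied_ret(bool kvm_knows_msr)
{
	return kvm_knows_msr ? KVM_MSR_RET_REJECTED : KVM_MSR_RET_INVALID;
}

/*
 * Hypothetical policy: only a CPUID-based rejection is a candidate for
 * the userspace "read or write 0" exemption; an unknown index is not.
 */
static bool msr_ret_is_exemptible(int ret)
{
	return ret == KVM_MSR_RET_REJECTED;
}
```

With separate codes, the top-level wrapper could skip the to-be-saved lookup for unknown indices. With a single code, that lookup has to make the distinction instead.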
> (b) wrap userspace MSR accesses at the very top level and convert
> KVM_MSR_RET_INVALID to "success" when KVM reported the MSR as savable and userspace
> is reading or writing '0',

Yes, this limits the changes needed on the KVM side.

> and (c) drop all of the host_initiated checks that
> exist purely to exempt userspace access from guest CPUID checks.
>
> The only possible hiccup I can think of is that this could subtly break userspace
> that is setting CPUID _after_ MSRs, but my understanding is that we've agreed to
> draw a line and say that that's unsupported.

Yeah, it would mess things up.

> And I think it's low risk, because
> I don't see how code like this:
>
> 	case MSR_TSC_AUX:
> 		if (!kvm_is_supported_user_return_msr(MSR_TSC_AUX))
> 			return 1;
>
> 		if (!host_initiated &&
> 		    !guest_cpuid_has(vcpu, X86_FEATURE_RDTSCP) &&
> 		    !guest_cpuid_has(vcpu, X86_FEATURE_RDPID))
> 			return 1;
>
> 		if (guest_cpuid_is_intel(vcpu) && (data >> 32) != 0)
> 			return 1;
>
> can possibly work if userspace sets MSRs first.  The RDTSCP/RDPID checks are
> exempt, but the vendor in guest CPUID would be '0', not Intel's magic string,
> and so setting MSRs before CPUID would fail, at least if the target vCPU model
> is Intel.
>
> P.S. I also want to rename KVM_MSR_RET_INVALID => KVM_MSR_RET_UNSUPPORTED, because
> I can never remember that "invalid" doesn't mean the value was invalid, it means
> the MSR index was invalid.

So do I :-)

>
> It'll take a few patches, but I believe we can end up with something like this:
>
> static bool kvm_is_msr_to_save(u32 msr_index)
> {
> 	unsigned int i;
>
> 	for (i = 0; i < num_msrs_to_save; i++) {
> 		if (msrs_to_save[i] == msr_index)
> 			return true;
> 	}

Should we also check the emulated_msrs list here, since KVM_GET_MSR_INDEX_LIST exposes it too?

>
> 	return false;
> }
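
For the emulated_msrs question above, a lookup covering both lists could look roughly like the userspace sketch below. The table contents are example entries only, and list_has() and msr_is_reported_to_userspace() are hypothetical names, not existing KVM helpers.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Stand-in tables with example entries; the real lists live in x86.c. */
static const uint32_t msrs_to_save[]  = { 0x6a0, 0x6a2 };	/* U_CET, S_CET */
static const uint32_t emulated_msrs[] = { 0x4b564d07 };	/* MSR_KVM_ASYNC_PF_ACK */

/* Hypothetical helper: linear search of one MSR index table. */
static bool list_has(const uint32_t *list, size_t n, uint32_t msr)
{
	size_t i;

	for (i = 0; i < n; i++) {
		if (list[i] == msr)
			return true;
	}
	return false;
}

/* Hypothetical: true if the MSR appears in either list reported to userspace. */
static bool msr_is_reported_to_userspace(uint32_t msr)
{
	return list_has(msrs_to_save,
			sizeof(msrs_to_save) / sizeof(msrs_to_save[0]), msr) ||
	       list_has(emulated_msrs,
			sizeof(emulated_msrs) / sizeof(emulated_msrs[0]), msr);
}
```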
> typedef int (*msr_uaccess_t)(struct kvm_vcpu *vcpu, u32 index, u64 *data,
> 			     bool host_initiated);
>
> static __always_inline int kvm_do_msr_uaccess(struct kvm_vcpu *vcpu, u32 msr,
> 					      u64 *data, bool host_initiated,
> 					      enum kvm_msr_access rw,
> 					      msr_uaccess_t msr_uaccess_fn)
> {
> 	const char *op = rw == MSR_TYPE_W ? "wrmsr" : "rdmsr";
> 	int ret;
>
> 	BUILD_BUG_ON(rw != MSR_TYPE_R && rw != MSR_TYPE_W);
>
> 	/*
> 	 * Zero the data on read failures to avoid leaking stack data to the
> 	 * guest and/or userspace, e.g. if the failure is ignored below.
> 	 */
> 	ret = msr_uaccess_fn(vcpu, msr, data, host_initiated);
> 	if (ret && rw == MSR_TYPE_R)
> 		*data = 0;
>
> 	if (ret != KVM_MSR_RET_UNSUPPORTED)
> 		return ret;
>
> 	/*
> 	 * Userspace is allowed to read MSRs, and write '0' to MSRs, that KVM
> 	 * reports as to-be-saved, even if an MSR isn't fully supported.
> 	 * Simply check that @data is '0', which covers both the write '0' case
> 	 * and all reads (in which case @data is zeroed on failure; see above).
> 	 */
> 	if (kvm_is_msr_to_save(msr) && !*data)
> 		return 0;
>
> 	if (!ignore_msrs) {
> 		kvm_debug_ratelimited("unhandled %s: 0x%x data 0x%llx\n",
> 				      op, msr, *data);
> 		return ret;
> 	}
>
> 	if (report_ignored_msrs)
> 		kvm_pr_unimpl("ignored %s: 0x%x data 0x%llx\n", op, msr, *data);
>
> 	return 0;
> }

The handling flow looks good to me. Thanks a lot!
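
For reference, the flow can be modelled in userspace as in the sketch below. It mirrors only the control flow of the quoted kvm_do_msr_uaccess() and drops the logging. RET_UNSUPPORTED, the mock backend, and the two-entry to-be-saved check are stand-ins assumed for illustration.

```c
#include <stdbool.h>
#include <stdint.h>

#define RET_UNSUPPORTED 2	/* stand-in for KVM_MSR_RET_UNSUPPORTED */

enum msr_access { ACCESS_R, ACCESS_W };

static bool ignore_msrs;	/* models the module parameter, off by default */

typedef int (*msr_fn_t)(uint32_t msr, uint64_t *data);

/* Stand-in for kvm_is_msr_to_save(): pretend only U_CET/S_CET are reported. */
static bool is_msr_to_save(uint32_t msr)
{
	return msr == 0x6a0 || msr == 0x6a2;
}

/* Mock backend: guest CPUID lacks the feature, so every access is denied. */
static int mock_unsupported(uint32_t msr, uint64_t *data)
{
	(void)msr;
	(void)data;
	return RET_UNSUPPORTED;
}

static int do_msr_uaccess(uint32_t msr, uint64_t *data, enum msr_access rw,
			  msr_fn_t fn)
{
	int ret = fn(msr, data);

	/* Zero the data on read failures so no stale value is returned. */
	if (ret && rw == ACCESS_R)
		*data = 0;

	if (ret != RET_UNSUPPORTED)
		return ret;

	/* Reads, and writes of 0, succeed for MSRs reported as to-be-saved. */
	if (is_msr_to_save(msr) && !*data)
		return 0;

	/* Otherwise fail unless unhandled MSR accesses are being ignored. */
	return ignore_msrs ? 0 : ret;
}
```

In this model, a read or a write of 0 to a to-be-saved MSR succeeds even though the backend denies it, a non-zero write still fails, and an MSR outside the list fails unless ignore_msrs is set.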


