From: Sean Christopherson <seanjc@google.com>
To: stsp <stsp2@yandex.ru>
Cc: kvm@vger.kernel.org
Subject: Re: guest/host mem out of sync on core2duo?
Date: Thu, 17 Jun 2021 14:42:08 +0000
Message-ID: <YMtfQHGJL7XP/0Rq@google.com>
In-Reply-To: <73f1f90e-f952-45a4-184e-1aafb3e4a8fd@yandex.ru>

Dropped my old @intel email to stop getting bounces.

On Mon, Jun 14, 2021, stsp wrote:
> 14.06.2021 20:06, Sean Christopherson wrote:
> > On Sun, Jun 13, 2021, stsp wrote:
> > > Hi kvm developers.
> > > 
> > > I am having a strange problem that can only be reproduced on a core2duo CPU
> > > but not on an AMD FX or Intel Core i7.
> > > 
> > > My code has 2 ways of setting the guest registers: one is the guest's ring0
> > > stub that just pops all regs from stack and does iret to ring3.  That works
> > > fine.  But sometimes I use KVM_SET_SREGS and resume the VM directly to ring3.
> > > That randomly results in either a good run or invalid guest state return, or
> > > a page fault in guest.
> > Hmm, a core2duo failure is more than likely due to lack of unrestricted guest.
> > You can verify this by loading kvm_intel on the Core i7 with unrestricted_guest=0.
> 
> Wow, excellent shot!  Indeed, the problem then starts reproducing also there!
> So at least I now have a problematic setup myself, rather than needing to ask
> for ssh from everyone involved. :)
> 
> What does this mean to us, though?  That it's completely unrelated to any
> memory synchronization?

Yes, more than likely this has nothing to do with memory synchronization.

> > > I tried to analyze when either of the above happens exactly, and I have a
> > > very strong suspicion that the problem is in the way I update the LDT. The LDT is
> > > shared between guest and host with KVM_SET_USER_MEMORY_REGION, and I modify
> > > it on host.  So it seems like if I just allocated the new LDT entry, there is
> > > a risk of invalid guest state, as if the guest's LDT still doesn't have it.
> > > If I modified some LDT entry, there can be a page fault in guest, as if the
> > > entry is still old.
> > IIUC, you are updating the LDT itself, e.g. an FS/GS descriptor in the LDT, as
> > opposed to updating the LDT descriptor in the GDT?
> 
> I am updating the LDT itself, not modifying its descriptor in the GDT. And with
> the same KVM_SET_SREGS call I also update the segregs to the new values, if
> needed.
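
For readers following along, a minimal sketch of what "updating the LDT
itself" from the host side can look like.  The helper names
encode_descriptor() and ldt_write() are hypothetical and are not taken from
the reporter's code; "ldt" is assumed to be the host mapping of the region
registered with KVM_SET_USER_MEMORY_REGION.  The bit layout is the legacy
8-byte x86 segment descriptor format.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Pack a legacy 8-byte x86 segment descriptor.
 * access: descriptor byte 5 (P, DPL, S, type).
 * flags:  high nibble of descriptor byte 6 (G, D/B, L, AVL).
 */
static uint64_t encode_descriptor(uint32_t base, uint32_t limit,
                                  uint8_t access, uint8_t flags)
{
    assert(limit <= 0xFFFFF);   /* the limit field is 20 bits wide */
    assert(flags <= 0xF);       /* only four flag bits exist */

    uint64_t d = 0;
    d |= (uint64_t)(limit & 0xFFFF);              /* limit[15:0]  -> bits 0..15  */
    d |= (uint64_t)(base & 0xFFFFFF) << 16;       /* base[23:0]   -> bits 16..39 */
    d |= (uint64_t)access << 40;                  /* access byte  -> bits 40..47 */
    d |= (uint64_t)((limit >> 16) & 0xF) << 48;   /* limit[19:16] -> bits 48..51 */
    d |= (uint64_t)flags << 52;                   /* flags        -> bits 52..55 */
    d |= (uint64_t)(base >> 24) << 56;            /* base[31:24]  -> bits 56..63 */
    return d;
}

/* Store a descriptor into slot `index` of an LDT held in host memory
 * (hypothetical helper; `ldt` is the host view of the shared region). */
static void ldt_write(uint64_t *ldt, unsigned index, uint64_t desc)
{
    assert(index < 8192);       /* a selector carries a 13-bit index */
    ldt[index] = desc;
}
```

Note that a write like this only changes memory.  The hidden part of any
segment register already loaded from that slot is not refreshed by it.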

Hmm, unconditionally calling KVM_SET_SREGS if you modify anything in the LDT
would be worth trying.  Or did I misunderstand the "if needed" part?

> > Either way, do you also update all relevant segments via KVM_SET_SREGS after
> > modifying memory?
> 
> Yes, if this is needed.  Sometimes it's not needed, and when it's not, it seems
> a page fault is more likely. If I also update segregs - then invalid guest
> state.  But these are just statistical guesses so far.

Ah.  Hrm.  It would still be worth doing KVM_SET_SREGS unconditionally, e.g. it
would narrow the search if the page faults go away and the failures are always
invalid guest state.
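
A rough sketch of the experiment, assuming an x86 host and an already
created vCPU file descriptor.  resync_segments() and its "wanted" argument
are hypothetical names, not something from your code; the idea is only that
this gets called after every LDT write, regardless of whether any selector
changed.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <sys/ioctl.h>
#include <linux/kvm.h>

/*
 * Push the caller's view of the segment registers into KVM.
 * Returns 0 on success or a negative errno value on failure.
 * Hypothetical helper: `wanted` holds the segment values the caller
 * derived from its own copy of the descriptors.
 */
static int resync_segments(int vcpu_fd, const struct kvm_sregs *wanted)
{
    struct kvm_sregs sregs;

    assert(wanted != NULL);

    /* Start from KVM's current state so CRs, EFER, etc. are preserved. */
    if (ioctl(vcpu_fd, KVM_GET_SREGS, &sregs) < 0)
        return -errno;

    /* Overwrite only the code, stack and data segments. */
    sregs.cs = wanted->cs;
    sregs.ss = wanted->ss;
    sregs.ds = wanted->ds;
    sregs.es = wanted->es;
    sregs.fs = wanted->fs;
    sregs.gs = wanted->gs;

    if (ioctl(vcpu_fd, KVM_SET_SREGS, &sregs) < 0)
        return -errno;

    return 0;
}
```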

> >     Best guess is that KVM doesn't detect that the VM has state
> > that needs to be emulated, or that KVM's internal register state and what's in
> > memory are not consistent.
> 
> Hope you know what parts are emulated w/o unrestricted guest, in which case
> we can advance. :)

It's not parts per se.  KVM needs to emulate "everything", one instruction at a
time, until guest state is no longer invalid with respect to the !unrestricted
rules.
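
To give a flavor of those rules, here is a simplified sketch of the kind of
check that applies to SS in protected mode when unrestricted guest is off.
It is not KVM's code and it covers only a small subset of the VMX guest
state checks; struct seg and ss_state_ok() are made-up names, with fields
chosen to mirror struct kvm_segment.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical mirror of the struct kvm_segment fields used below. */
struct seg {
    uint16_t selector;
    uint8_t type, present, dpl, s, unusable;
};

/*
 * Simplified subset of the protected-mode rules for SS without
 * unrestricted guest.  Returns true if this subset is satisfied.
 */
static bool ss_state_ok(const struct seg *cs, const struct seg *ss)
{
    assert(cs != NULL && ss != NULL);

    unsigned cs_rpl = cs->selector & 3;
    unsigned ss_rpl = ss->selector & 3;

    if (cs_rpl != ss_rpl)               /* SS.RPL must equal CS.RPL */
        return false;
    if (ss->unusable)                   /* remaining checks need a usable SS */
        return true;
    if (ss->type != 3 && ss->type != 7) /* read/write data, accessed bit set */
        return false;
    if (!ss->s || !ss->present)         /* non-system and present */
        return false;
    return ss->dpl == ss_rpl;           /* SS.DPL must equal SS.RPL */
}
```

If a check like this fails, the state cannot be entered via VM-Enter as is,
which is when the one-instruction-at-a-time emulation kicks in.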

> > Anyways, I highly doubt this is a memory synchronization issue, a corner case
> > related to lack of unrestricted guest is much more likely.
> 
> Just to be sure I tried the CD bit in CR0 to rule out the caching issues, and
> that changes nothing.  So...
>
> What to do next?

In addition to the above experiment, can you get a state dump for the invalid
guest state failure?  I.e. load kvm_intel with dump_invalid_vmcs=1.  And on that
failure, also provide the input to KVM_SET_SREGS.  The LDT in memory might also
be interesting, but it's hopefully unnecessary, especially if unconditionally
doing KVM_SET_SREGS makes the page faults go away.
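
For the KVM_SET_SREGS input, something along these lines would be enough
to paste into a reply.  format_segment() is a hypothetical helper and the
output format is arbitrary; it assumes an x86 build where <linux/kvm.h>
provides struct kvm_segment.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <linux/kvm.h>

/*
 * Render one segment register as a single line of text.
 * Returns what snprintf() returns: the length the full line needs,
 * which may exceed `len` if the buffer was too small.
 */
static int format_segment(char *buf, size_t len, const char *name,
                          const struct kvm_segment *seg)
{
    assert(buf != NULL && name != NULL && seg != NULL);

    return snprintf(buf, len,
                    "%-4s sel=%04x base=%08llx limit=%08x type=%x s=%u "
                    "dpl=%u p=%u db=%u g=%u unusable=%u",
                    name, (unsigned)seg->selector,
                    (unsigned long long)seg->base, (unsigned)seg->limit,
                    (unsigned)seg->type, (unsigned)seg->s,
                    (unsigned)seg->dpl, (unsigned)seg->present,
                    (unsigned)seg->db, (unsigned)seg->g,
                    (unsigned)seg->unusable);
}
```

Calling it once each for cs, ss, ds, es, fs, gs, tr and ldt just before the
ioctl gives the state that was actually handed to KVM.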

Best case scenario is that KVM_SET_SREGS stuffs invalid guest state that KVM
doesn't correctly detect.  That would be easy to debug and fix, and would give us
a regression test as well.

