Historical speck list archives
From: Paolo Bonzini <pbonzini@redhat.com>
To: speck@linutronix.de
Subject: [MODERATED] Re: L1D-Fault KVM mitigation
Date: Tue, 24 Apr 2018 14:53:15 +0200
Message-ID: <8cbc35b2-f75a-6357-014d-e20ff7284ac0@redhat.com>
In-Reply-To: <20180424093537.GC4064@hirez.programming.kicks-ass.net>


On 24/04/2018 11:35, speck for Peter Zijlstra wrote:
> I know that I worked a little with Tim on this, and I know Google did
> their own thing (but have not seen patches from them -- is pjt on this
> list?). I've also heard Amazon was also working on things (are they
> here?). And I think RHT was also looking into something (mingo, bonzini
> -- are you guys reading?)

Yes, I am.  To get the numbers out of the way first: the cost of doing an
L1D flush on every vmentry is absolutely horrible on KVM microbenchmarks,
but looks somewhat better (around 6% in the worst case) on syscall
microbenchmarks.  "Message-passing" workloads whose vCPUs repeatedly go
to sleep are hit hardest.

More generally, hyperthreading doesn't exactly shine when running many
small virtual machines: the VMs are unlikely to share any code or data,
so each sibling effectively gets half the usual amount of L1 cache.
Perhaps KSM can share the guest kernels and recover some of the icache
(assuming there are kernel-heavy benchmarks that _also_ benefit from
hyperthreading), but hyperthreading is still more likely to hurt
performance than to improve it.

Hyperthreading may provide slightly lower jitter when you run two
different guests on the siblings.  But with gang scheduling you wouldn't
do that, so it's not an issue.  As a result, in the overcommitted case
the main problem is having to explain to customers that disabling
hyperthreading is not that bad.

Even in the non-overcommitted case, there is a possibility that host
IRQs or NMIs happen, which as Thomas pointed out can also pollute the cache.

The only case where hyperthreading may be salvaged is when 1) each guest
CPU is pinned to its own physical CPU, and memory is also reserved
because you use 1GB hugetlbfs; 2) host IRQs either use VT-d posted
interrupts or are pinned away from the physical CPUs that run guests;
3) you use nohz_full and similar fine-tuned configuration to ensure that
the guest CPUs run smoothly.  This covers NFV use cases, but big
databases such as SAP are sometimes run like this too.
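For concreteness, such a fine-tuned setup typically combines boot
parameters along these lines (illustrative values only; the CPU ranges
and hugepage count are of course machine-specific):

```
isolcpus=2-15 nohz_full=2-15 rcu_nocbs=2-15 default_hugepagesz=1G hugepagesz=1G hugepages=8
```

Here CPUs 2-15 are kept free of other tasks and of the periodic tick so
the pinned vCPU threads run undisturbed, and the 1GB hugetlbfs pages
back the guest memory reservation mentioned above.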

In this case, you'd need to add synchronization around vmexits.
However, these workloads _will_ actually do vmexits, sometimes a lot of
them (e.g. unless you use nohz_full in the guest as well, you'll have
vmexits to program the LAPIC timer).  Either all of them have to pay the
synchronization cost, or you have to decide, somewhat arbitrarily, that
some vmexits are "confined" and unlikely to pollute the cache; for those
you skip both the synchronization and the L1D flush.  For example you
could say "anything that does not do get_user_pages is confined".

Because that choice is arbitrary, the synchronization is pure security
theater unless you know what you're doing: no two different guests on
the same core, no interrupt handlers that can run during a vmexit and
pollute the L1 cache (if that happens, the sibling would be able to
read that data), and so on.

BUT: 1) I'm not saying hyperthreading is valuable in those cases, only
that it can be salvaged; 2) if you're paranoid you're more likely to
disable HT anyway.  So while I do plan to test what happens when we add
the synchronization, it's far from certain that we're going to ship it.
And even then only if it is acceptable upstream; I'm not going to make
it a special Red Hat-only patch.

Ingo suggested, for ease of testing and also of deployment, a knob to
easily online/offline all siblings but the first on each core.  There's
still the chance that some userspace daemon starts before hyperthreading
is software-disabled that way and gets confused by the number of CPUs
suddenly halving, so the knob would have to exist both on the kernel
command line and in debugfs.

Paolo


