qemu-riscv.nongnu.org archive mirror
 help / color / mirror / Atom feed
From: "Christoph Müllner" <christoph.muellner@vrull.eu>
To: Richard Henderson <richard.henderson@linaro.org>
Cc: Daniel Henrique Barboza <dbarboza@ventanamicro.com>,
	Huang Tao <eric.huang@linux.alibaba.com>,
	 qemu-devel@nongnu.org, qemu-riscv@nongnu.org,
	zhiwei_liu@linux.alibaba.com,  liwei1518@gmail.com,
	bin.meng@windriver.com, alistair.francis@wdc.com,
	 palmer@dabbelt.com, Philipp Tomsich <philipp.tomsich@vrull.eu>
Subject: Re: [PATCH] target/riscv: Implement dynamic establishment of custom decoder
Date: Fri, 8 Mar 2024 12:56:43 +0100	[thread overview]
Message-ID: <CAEg0e7hS+pntt75jmEREJaUoGfT6fpqS4M9E-aJ2LTXtiM78Pw@mail.gmail.com> (raw)
In-Reply-To: <dc9f673a-3f5a-4703-b171-b8599bdec22e@linaro.org>

On Thu, Mar 7, 2024 at 9:35 PM Richard Henderson
<richard.henderson@linaro.org> wrote:
>
> >>> -        for (size_t i = 0; i < ARRAY_SIZE(decoders); ++i) {
> >>> -            if (decoders[i].guard_func(ctx->cfg_ptr) &&
> >>> -                decoders[i].decode_func(ctx, opcode32)) {
> >>> +        for (size_t i = 0; i < decoder_table_size; ++i) {
> >>> +            if (ctx->decoder[i](ctx, opcode32)) {
> >>>                   return;
>
> By the way, you're adding layers of pointer chasing, so I suspect you'll find all of this
> is a wash or worse, performance-wise.
>
>
> Indeed, since some of the predicates are trivial, going the other way might help: allow
> everything to be inlined:
>
>      if (decode_insn32(...)) {
>          return;
>      }
>      if (has_xthead_p(...) && decode_xthead(...)) {
>          return;
>      }
>      ...
>
>
> Even if there are 10 entries here, so what?  All of the code has to be compiled into QEMU.
>   You're not going to somehow add truly dynamic code that you've loaded from a module.

I just tested this with GCC -O2/-O3. The generated code from the
existing decoder loop will
result in exactly what you have listed here (loop unrolling,
transforming the indirect calls
to direct calls, inlining, and evaluation of statically known conditions).
has_xthead_p() can get inlined as well, if the inlining costs are
considered low enough
(thank you, Richard, for giving some hints about that below).

What the commit message is not mentioning (and what this patch was
actually addressing and
therefore should have been mentioned):
Having dynamic control of the decoder order was meant to allow adding
a vendor_decoder
before decode_insn32() with minimal overhead (no guard_func) for users
that don't need
this particular vendor_decoder.

Being more explicit: there is interest in supporting the
non-conforming (conflicting) instruction
extension XTheadVector:
  https://github.com/T-head-Semi/thead-extension-spec/blob/master/xtheadvector.adoc
XTheadVector uses the RVV 0.7.1 draft encoding, which conflicts with
the ratified RVV spec.
The idea is to avoid these conflicts with a call to
decode_xtheadvector() before decode_insn32().
This implies that everyone has to call has_xtheadvector_p() before
calling decode_insn32().
And the intent of this patch is to provide a mechanism to reduce this overhead.

When suggesting the dynamic decoder list, I was not aware that
always_true_p() will be
eliminated by the compiler. Since this is the case, I agree with the
"wash or worse" for
decode_insn32(). The elimination of following guard functions for
vendor decoders is likely
less performance relevant.

I don't think we should discuss efficiency any further unless we have
some data to
justify any changes. E.g. emulating a RISC-V SPEC CPU 2017 run on x86_64 and
looking at the runtimes could give the relevant insights for the
following scenarios:
* existing code on upstream/master
* existing code + adding a new extension that comes before
decode_insn32 in decoders[]
* existing code + this patch (dynamic decoders)
Related overheads that could be measured: adding 20 new instructions
to decode_insn32(),
which are not executed (to put the costs into perspective).

> You could perhaps streamline predicates such as has_xthead_p to not test 11 variables by
> adding an artificial "ext_xthead_any" configuration entry that is the sum of all of those.
>
>
> r~


  parent reply	other threads:[~2024-03-08 11:57 UTC|newest]

Thread overview: 7+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2024-03-06  9:33 [PATCH] target/riscv: Implement dynamic establishment of custom decoder Huang Tao
2024-03-07 19:55 ` Daniel Henrique Barboza
2024-03-07 20:11   ` Richard Henderson
2024-03-07 20:35     ` Richard Henderson
2024-03-08  9:41       ` Huang Tao
2024-03-08 11:56       ` Christoph Müllner [this message]
2024-03-07 21:43     ` Daniel Henrique Barboza

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=CAEg0e7hS+pntt75jmEREJaUoGfT6fpqS4M9E-aJ2LTXtiM78Pw@mail.gmail.com \
    --to=christoph.muellner@vrull.eu \
    --cc=alistair.francis@wdc.com \
    --cc=bin.meng@windriver.com \
    --cc=dbarboza@ventanamicro.com \
    --cc=eric.huang@linux.alibaba.com \
    --cc=liwei1518@gmail.com \
    --cc=palmer@dabbelt.com \
    --cc=philipp.tomsich@vrull.eu \
    --cc=qemu-devel@nongnu.org \
    --cc=qemu-riscv@nongnu.org \
    --cc=richard.henderson@linaro.org \
    --cc=zhiwei_liu@linux.alibaba.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).