Re: Building with PGO: concurrency and test data

Git Mailing List Archive mirror
 help / color / mirror / Atom feed

From: Jeff King <peff@peff.net>
To: intelfx@intelfx.name
Cc: Theodore Ts'o <tytso@mit.edu>, git@vger.kernel.org
Subject: Re: Building with PGO: concurrency and test data
Date: Tue, 23 Apr 2024 18:42:32 -0400	[thread overview]
Message-ID: <20240423224232.GE1172807@coredump.intra.peff.net> (raw)
In-Reply-To: <65f32df3f49341bf192b606914d44cc937f7971a.camel@intelfx.name>

On Sun, Apr 21, 2024 at 02:52:48AM +0200, intelfx@intelfx.name wrote:

> 1. The INSTALL doc says that the profiling pass has to run the test
> suite using a single CPU, and the Makefile `profile` target also
> encodes this rule:
> 
> > As a caveat: a profile-optimized build takes a *lot* longer since the
> > git tree must be built twice, and in order for the profiling
> > measurements to work properly, ccache must be disabled and the test
> > suite has to be run using only a single CPU. <...>
> ( https://github.com/git/git/blob/master/INSTALL#L54-L59 )
> [...]
> However, some cursory searching tells me that gcc is equipped to handle
> concurrent runs of an instrumented program:

That text was added quite a while ago, in f2d713fc3e (Fix build problems
related to profile-directed optimization, 2012-02-06). It may be that it
was a problem back then, but isn't anymore.

+cc the author of that commit; I don't know offhand how many people
use "make profile" (now or back then).

> 2. The performance test suite (t/perf/) uses up to two git repositories
> ("normal" and "large") as test data to run git commands against. Does
> the internal organization of these repositories matter? I.e., does it
> matter if those are "real-world-used" repositories with overlapping
> packs, cruft, loose objects, many refs etc., or can I simply use fresh
> clones of git.git and linux.git without loss of profile quality?

I'd be surprised if the choice of repository didn't have some impact.
After all, if there are no loose objects, then the routines that
interact with them are not going to get a lot of exercise. But how much
does it actually matter in practice? I think you'd have to do a bunch of
trial and error measurements to find out.

My gut is that "larger is better" to emphasize the hot loops, but even
that might not be true. The main reason we want "large" repos in some
perf scripts is that it makes it easier to measure the thing we are
speeding up versus the overhead of starting processes, etc. But PGO
might not be as sensitive to that, if it can get what it needs from a
smaller number of runs of the sensitive spots.

All of which is to say "no idea". I know that's not very satisfying, but
I don't recall anybody really discussing PGO much here in the last
decade, so I think you're largely on your own.

-Peff

     prev parent reply	other threads:[~2024-04-23 22:42 UTC|newest]

Thread overview: 3+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2024-04-21  0:52 Building with PGO: concurrency and test data intelfx
2024-04-21 15:45 ` Mike Castle
2024-04-23 22:42 ` Jeff King [this message]

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20240423224232.GE1172807@coredump.intra.peff.net \
    --to=peff@peff.net \
    --cc=git@vger.kernel.org \
    --cc=intelfx@intelfx.name \
    --cc=tytso@mit.edu \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).