Git Mailing List Archive mirror
From: Junio C Hamano <gitster@pobox.com>
To: Jeff Hostetler <git@jeffhostetler.com>
Cc: Anh Le via GitGitGadget <gitgitgadget@gmail.com>,
	git@vger.kernel.org, Timothy Jones <timothy@canva.com>,
	Jeff Hostetler <jeffhost@microsoft.com>, Anh Le <anh@canva.com>
Subject: Re: [PATCH] index: add trace2 region for clear skip worktree
Date: Wed, 26 Oct 2022 09:01:12 -0700
Message-ID: <xmqq8rl2lgl3.fsf@gitster.g>
In-Reply-To: <d4103788-5153-11f2-487f-5cc795d261a8@jeffhostetler.com> (Jeff Hostetler's message of "Wed, 26 Oct 2022 10:13:18 -0400")

Jeff Hostetler <git@jeffhostetler.com> writes:

> In the worst case, we walk the entire index and lstat() for a
> significant number of skipped-and-not-present files, then near
> the end of the loop, we find a skipped-but-present directory
> and have to restart the loop.  The second pass will still run
> the full loop again.  Will the second pass actually see any
> skipped cache-entries?  Will it re-lstat() them?  Could the
> `goto restart` just be a `break` or `return`?
>
> I haven't had time to look under the hood here, but I was
> hoping that these two counters would help the series author
> collect telemetry over many runs and gain more insight into
> the perf problem.

Without being able to answer these questions, would we be able to
interpret the numbers reported from these counters?

> Continuing the example from above, if we've already paid the
> costs to lstat() the 95% sparse files AND THEN near the bottom
> of the loop we have to do a restart, then we should expect
> this loop to be doubly slow.  It was my hope that this combination
> of counters would help us understand the variations in perf.

Yes, I understand that double-counting may give some clue to detect
that, but it looked like a roundabout way to do so.  Perhaps
counting the paths inspected during the first iteration and the paths
inspected during the second iteration separately, without the "how
many times did we repeat?" counter, would give you a better picture?
"After inspecting 2400 paths, we needed to go back and then ended up
scanning 3000 paths in the flattened index in the second round" would
be easier to interpret than "We needed flattening, and scanned 5400
paths in total in the two iterations".
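
To make the suggestion concrete, here is a minimal sketch of counting
the two passes separately.  It is a standalone simulation, not Git's
actual loop: the struct, the function name, and the "needs_restart"
array standing in for "skipped-but-present directory found here" are
all hypothetical, and it assumes the second pass never restarts again.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical per-pass statistics; not taken from Git's source. */
struct pass_counts {
	int first_pass;  /* paths inspected before any restart */
	int second_pass; /* paths inspected after the restart, 0 if none */
};

/*
 * Simulate the scan.  needs_restart[i] != 0 marks an entry that would
 * force the index to be flattened and the loop to start over.
 * flattened_nr is the assumed number of entries in the flattened index.
 */
static struct pass_counts scan_with_restart(const int *needs_restart,
					    size_t nr, size_t flattened_nr)
{
	struct pass_counts counts = { 0, 0 };
	size_t i, j;

	assert(needs_restart || nr == 0);

	for (i = 0; i < nr; i++) {
		counts.first_pass++;
		if (needs_restart[i]) {
			/* Second pass: walk the whole flattened index once. */
			for (j = 0; j < flattened_nr; j++)
				counts.second_pass++;
			break;
		}
	}
	return counts;
}
```

With 2400 entries where only the last one forces a restart and a
flattened index of 3000 entries, this reports 2400 and 3000 as two
separate numbers instead of a single total of 5400.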

> WRT the `intmax_t` vs just `int`: either is fine.

I thought "int" was supposed to be the natural machine word, while
incrementing "intmax_t" is allowed to be much slower than "int".
Do we expect more than 2 billion paths?
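
As a rough illustration of the range question, the sketch below checks
whether a count fits in a plain "int".  The helper names are
hypothetical.  The C standard only guarantees INT_MAX >= 32767; the
"2 billion" figure assumes the common case of a 32-bit int, where
INT_MAX is 2147483647.

```c
#include <assert.h>
#include <limits.h>
#include <stdint.h>

/* Return 1 if count is representable as a plain int, 0 otherwise. */
static int fits_in_int(intmax_t count)
{
	return count >= INT_MIN && count <= INT_MAX;
}

/* Narrow a count to int, asserting that no information is lost. */
static int to_int_checked(intmax_t count)
{
	assert(fits_in_int(count));
	return (int)count;
}
```

A total such as 5400 paths fits comfortably; only a count beyond
INT_MAX would need the wider type.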

Thread overview: 18+ messages
2022-10-26  0:05 [PATCH] index: add trace2 region for clear skip worktree Anh Le via GitGitGadget
2022-10-26  3:16 ` Junio C Hamano
2022-10-26 14:13   ` Jeff Hostetler
2022-10-26 16:01     ` Junio C Hamano [this message]
2022-10-26 18:29       ` Jeff Hostetler
2022-10-27  0:04         ` Anh Le
2022-10-28  0:46 ` [PATCH v2] " Anh Le via GitGitGadget
2022-10-28 15:49   ` Derrick Stolee
2022-10-28 17:17     ` Junio C Hamano
2022-10-30 23:28       ` Anh Le
2022-10-28 16:50   ` Jeff Hostetler
2022-10-31  0:56   ` [PATCH v3] " Anh Le via GitGitGadget
2022-10-31 22:34     ` Taylor Blau
2022-11-03 23:04     ` [PATCH v4 0/2] " Anh Le via GitGitGadget
2022-11-03 23:05       ` [PATCH v4 1/2] " Anh Le via GitGitGadget
2022-11-03 23:05       ` [PATCH v4 2/2] index: raise a bug if the index is materialised more than once Anh Le via GitGitGadget
2022-11-05  0:29       ` [PATCH v4 0/2] index: add trace2 region for clear skip worktree Taylor Blau
2022-11-07 20:50         ` Derrick Stolee
