Git Mailing List Archive mirror
From: Taylor Blau <me@ttaylorr.com>
To: Jeff King <peff@peff.net>
Cc: git@vger.kernel.org, Chris Torek <chris.torek@gmail.com>,
	Derrick Stolee <derrickstolee@github.com>,
	Junio C Hamano <gitster@pobox.com>,
	Patrick Steinhardt <ps@pks.im>
Subject: Re: [PATCH v4 09/16] refs/packed-backend.c: implement jump lists to avoid excluded pattern(s)
Date: Mon, 3 Jul 2023 13:38:13 -0400
Message-ID: <ZKMHhUVQ3EckJFIE@nand.local>
In-Reply-To: <20230703055627.GF3502534@coredump.intra.peff.net>

On Mon, Jul 03, 2023 at 01:56:27AM -0400, Jeff King wrote:
> On Tue, Jun 20, 2023 at 10:21:42AM -0400, Taylor Blau wrote:
>
> > Second, note that the jump list is best-effort, since we do not handle
> > loose references, and because of the meta-character issue above. The
> > jump list may not skip past all references which won't appear in the
> > results, but will never skip over a reference which does appear in the
> > result set.
>
> I wonder if we should be advertising this in a docstring comment above
> the relevant function. The problem may be that there are several such
> functions. I just think that it's a gotcha that may affect somebody who
> wants to call the function, and they're not going to think to dig up
> this commit message.

Good idea, thanks.

> >     $ hyperfine \
> >       'git for-each-ref --format="%(objectname) %(refname)" | grep -vE "^[0-9a-f]{40} refs/pull/"' \
> >       'git.compile for-each-ref --format="%(objectname) %(refname)" --exclude="refs/pull"'
> >     Benchmark 1: git for-each-ref --format="%(objectname) %(refname)" | grep -vE "^[0-9a-f]{40} refs/pull/"
> >       Time (mean ± σ):     802.7 ms ±   2.1 ms    [User: 691.6 ms, System: 147.0 ms]
> >       Range (min … max):   800.0 ms … 807.7 ms    10 runs
> >
> >     Benchmark 2: git.compile for-each-ref --format="%(objectname) %(refname)" --exclude="refs/pull"
> >       Time (mean ± σ):       4.7 ms ±   0.3 ms    [User: 0.7 ms, System: 4.0 ms]
> >       Range (min … max):     4.3 ms …   6.7 ms    422 runs
> >
> >     Summary
> >       'git.compile for-each-ref --format="%(objectname) %(refname)" --exclude="refs/pull"' ran
> >       172.03 ± 9.60 times faster than 'git for-each-ref --format="%(objectname) %(refname)" | grep -vE "^[0-9a-f]{40} refs/pull/"'
>
> This measurement is cheating a little, I think, because the earlier
> patch to implement --exclude sped that up from ~800ms to ~100ms (because
> we avoid writing and all of the ref-filter malloc slowness for the
> excluded entries). So the better comparison is between two invocations
> with "--exclude", but before/after this patch. You should still see a
> 20x speedup (100ms down to 5).

I agree. I included a build from the previous commit in this benchmark.
As expected, it takes around 100ms, which gives readers a clearer
picture of how performance changes as a result of this patch.

> > @@ -802,14 +826,34 @@ struct packed_ref_iterator {
> >   */
> >  static int next_record(struct packed_ref_iterator *iter)
> >  {
> > -	const char *p = iter->pos, *eol;
> > +	const char *p, *eol;
> >
> >  	strbuf_reset(&iter->refname_buf);
> >
> > +	/*
> > +	 * If iter->pos is contained within a skipped region, jump past
> > +	 * it.
> > +	 *
> > +	 * Note that each skipped region is considered at most once,
> > +	 * since they are ordered based on their starting position.
> > +	 */
> > +	while (iter->jump_cur < iter->jump_nr) {
> > +		struct jump_list_entry *curr = &iter->jump[iter->jump_cur];
> > +		if (iter->pos < curr->start)
> > +			break; /* not to the next jump yet */
> > +
> > +		iter->jump_cur++;
> > +		if (iter->pos < curr->end) {
> > +			iter->pos = curr->end;
> > +			break;
> > +		}
> > +	}
>
> It took me a minute to convince myself that this second "break" was
> right. If we get to it, we know that iter->pos (the current record we
> are looking at) is in the current jump region. So it makes sense to
> advance to curr->end. But might we hit another jump region immediately?
>
> I guess not, because earlier we would have coalesced the jump regions.
> So either there is a non-excluded entry there _or_ we would have
> coalesced the later region into a single larger region. So breaking is
> the right thing to do.

Exactly. I added a short comment to this effect to hopefully avoid any
confusion here.
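
For anyone following along, here is a minimal, self-contained sketch of
the skipping logic discussed above. It is not the code from the patch:
`struct jump_region`, `struct jump_iter`, and `skip_jump_regions()` are
hypothetical names, and plain offsets stand in for the pointers into the
packed-refs buffer. It assumes the regions are sorted by start,
non-empty, and already coalesced.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for the patch's jump_list_entry: a half-open
 * region [start, end) of offsets to skip. */
struct jump_region {
	size_t start;
	size_t end;
};

/* Hypothetical, simplified iterator state. */
struct jump_iter {
	const struct jump_region *jump;
	size_t jump_nr;
	size_t jump_cur;
	size_t pos;
};

/*
 * If it->pos lies inside a skipped region, move it to the end of that
 * region. Returns the (possibly updated) position.
 */
static size_t skip_jump_regions(struct jump_iter *it)
{
	while (it->jump_cur < it->jump_nr) {
		const struct jump_region *curr = &it->jump[it->jump_cur];

		/* assumed precondition: empty regions were never added */
		assert(curr->start < curr->end);

		if (it->pos < curr->start)
			break; /* not to the next jump yet */

		it->jump_cur++;
		if (it->pos < curr->end) {
			it->pos = curr->end;
			/*
			 * Safe to stop here: because the list is
			 * coalesced, no later region starts at or
			 * before curr->end.
			 */
			break;
		}
	}
	return it->pos;
}
```

Since `jump_cur` only ever moves forward, each region is inspected at
most once over the lifetime of the iterator.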

> > +	for (pattern = excluded_patterns; *pattern; pattern++) {
> > +		struct jump_list_entry *e;
> > +
> > +		/*
> > +		 * We can't feed any excludes with globs in them to the
> > +		 * refs machinery.  It only understands prefix matching.
> > +		 * We likewise can't even feed the string leading up to
> > +		 * the first meta-character, as something like "foo[a]"
> > +		 * should not exclude "foobar" (but the prefix "foo"
> > +		 * would match that and mark it for exclusion).
> > +		 */
> > +		if (has_glob_special(*pattern))
> > +			continue;
>
> OK, and here's where we could split "foo[ac]" into "fooa" and "fooc" if
> we wanted. But I think it is a very good idea to leave that out of this
> initial patch. :)

Oh, definitely ;-).
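
To make the meta-character problem from the quoted comment concrete,
here is a small sketch. The helper names and the exact set of
meta-characters are assumptions made for illustration (this is not
Git's has_glob_special()), and the prefix check is a plain byte-wise
comparison.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Assumed set of glob meta-characters; for illustration only. */
#define GLOB_SPECIAL_CHARS "*?[\\"

/* Hypothetical stand-in for has_glob_special(). */
static int pattern_has_glob(const char *pattern)
{
	return strpbrk(pattern, GLOB_SPECIAL_CHARS) != NULL;
}

/*
 * What would happen if we fed only the literal text before the first
 * meta-character to a prefix matcher: returns 1 if that truncated
 * prefix matches refname.
 */
static int truncated_prefix_matches(const char *pattern, const char *refname)
{
	size_t len = strcspn(pattern, GLOB_SPECIAL_CHARS);

	return strncmp(refname, pattern, len) == 0;
}

/* Only glob-free patterns are safe to turn into a jump region. */
static int usable_for_jump_list(const char *pattern)
{
	assert(pattern != NULL);
	return !pattern_has_glob(pattern);
}
```

With these helpers, truncating "foo[a]" to "foo" matches "foobar", which
the glob itself does not, so such a pattern has to be skipped rather
than truncated.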

> > +	/*
> > +	 * As an optimization, merge adjacent entries in the jump list
> > +	 * to jump forwards as far as possible when entering a skipped
> > +	 * region.
> > +	 *
> > +	 * For example, if we have two skipped regions:
> > +	 *
> > +	 *	[[A, B], [B, C]]
> > +	 *
> > +	 * we want to combine that into a single entry jumping from A to
> > +	 * C.
> > +	 */
> > +	last_disjoint = iter->jump;
> > +
> > +	for (i = 1, j = 1; i < iter->jump_nr; i++) {
> > +		struct jump_list_entry *ours = &iter->jump[i];
> > +
> > +		if (ours->start == ours->end) {
> > +			/* ignore empty regions (no matching entries) */
> > +			continue;
>
> Dropping empty regions makes sense, but our iteration starts at "1"
> (because the rest of the checks are inherently looking at last_disjoint
> before deciding if each region is worth keeping). So we'd fail to throw
> away iter->jump[0] if it is empty, I think.
>
> That could be fixed here by iterating from 0 and checking for a NULL
> last_disjoint, but maybe it would be easier to avoid allocating at all
> in the earlier loop, when we find that start == end?

Yeah, I agree with this. I think Patrick made a similar suggestion in an
earlier response, and I decided not to take it since it makes the patch
more verbose.

But I think that avoiding the empty region special case is worth it.
Thanks, both :-).
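
Roughly, the shape I have in mind is sketched below. This is a
simplified illustration rather than the actual patch:
`struct jump_region`, `jump_list_add()`, and `jump_list_coalesce()` are
hypothetical names, offsets stand in for buffer pointers, and the list
is assumed to be sorted by start before coalescing.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical half-open region [start, end) to skip. */
struct jump_region {
	size_t start;
	size_t end;
};

/*
 * Append [start, end) to the list unless it is empty, so that later
 * steps never see an empty region. The caller must ensure the array
 * has room. Returns the new count.
 */
static size_t jump_list_add(struct jump_region *list, size_t nr,
			    size_t start, size_t end)
{
	if (start == end)
		return nr; /* no matching entries, nothing to skip */
	list[nr].start = start;
	list[nr].end = end;
	return nr + 1;
}

/*
 * Merge touching or overlapping regions in place, e.g.
 * [[A, B], [B, C]] becomes [[A, C]]. Returns the new count.
 */
static size_t jump_list_coalesce(struct jump_region *list, size_t nr)
{
	size_t i, j;

	if (!nr)
		return 0;

	for (i = 1, j = 0; i < nr; i++) {
		struct jump_region *last = &list[j];
		const struct jump_region *ours = &list[i];

		assert(ours->start < ours->end); /* guaranteed by add */

		if (ours->start <= last->end) {
			/* extend the previous region if we reach further */
			if (ours->end > last->end)
				last->end = ours->end;
		} else {
			/* disjoint: keep it as the next output entry */
			list[++j] = *ours;
		}
	}
	return j + 1;
}
```

Because empty regions are rejected when the list is built, the merge
loop can start at index 1 without a special case for the first entry.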

Thanks,
Taylor

