Git Mailing List Archive mirror
 help / color / mirror / Atom feed
From: Junio C Hamano <gitster@pobox.com>
To: Taylor Blau <me@ttaylorr.com>
Cc: git@vger.kernel.org, Chris Torek <chris.torek@gmail.com>,
	Derrick Stolee <derrickstolee@github.com>,
	Jeff King <peff@peff.net>, Patrick Steinhardt <ps@pks.im>
Subject: Re: [PATCH v3 09/16] refs/packed-backend.c: implement jump lists to avoid excluded pattern(s)
Date: Tue, 13 Jun 2023 17:27:05 -0700	[thread overview]
Message-ID: <xmqq352u3n3a.fsf@gitster.g> (raw)
In-Reply-To: <8066858bf59386eeec68f0f1e4247eeebb6482f7.1686134440.git.me@ttaylorr.com> (Taylor Blau's message of "Wed, 7 Jun 2023 06:41:33 -0400")

Taylor Blau <me@ttaylorr.com> writes:

> When iterating through the `packed-refs` file in order to answer a query
> like:
>
>     $ git for-each-ref --exclude=refs/__hidden__
>
> it would be useful to avoid walking over all of the entries in
> `refs/__hidden__/*` when possible, since we know that the ref-filter
> code is going to throw them away anyways.
>
> In certain circumstances, doing so is possible. The algorithm for doing
> so is as follows:
>
>   - For each excluded pattern, find the first record that matches it,
>     and the first record that *doesn't* match it (i.e. the location
>     you'd next want to consider when excluding that pattern).
>
>   - Sort the set of excluded regions from the previous step in ascending
>     order of the first location within the `packed-refs` file that
>     matches.
>
>   - Clean up the results from the previous step: discard empty regions,
>     and combine adjacent regions.

Say something like

   The resulting list of regions that would never contain refs that
   are not excluded is called the "jump list".

here, probably.  Otherwise the first reference to "jump list" we see
later feels a bit too abrupt.

> Then when iterating through the `packed-refs` file, if `iter->pos` is
> ever contained in one of the regions from the previous steps, advance
> `iter->pos` past the end of that region, and continue enumeration.
>
> Note that we only perform this optimization when none of the excluded
> pattern(s) have special meta-characters in them. For a pattern like
> "refs/foo[ac]", the excluded regions ("refs/fooa", "refs/fooc", and
> everything underneath them) are not connected. A future implementation
> that handles this case may split the character class (pretending as if
> two patterns were excluded: "refs/fooa", and "refs/fooc").

Makes sense.

> There are a few other gotchas worth considering. First, note that the
> jump list is sorted, so once we jump past a region, we can avoid
> considering it (or any regions preceding it) again. The member
> `jump_pos` is used to track the first next-possible region to jump
> through.
>
> Second, note that the exclusion list is best-effort, since we do not
> handle loose references, and because of the meta-character issue above.

I found this a bit misleading; a natural reading of "exclusion list"
is that the phrase refers to the list of exclude patterns given from
the command line, and users would be upset if the processing of it
is best effort.

I think what you meant to say was that optimization to avoid full
scan using the jump list does not aim for perfection, and entries
that are not skipped using the jump list may still be excluded by
the exclude patterns.

> In repositories with a large number of hidden references, the speed-up

"hidden" -> "excluded".  Your final objective to implement the
feature to exclude refs using patterns and optimize it using the
jump list data may be to implement "hidden references", but that
hasn't be talked about in the above.  All we have been hearing was
that we are optimizing the walk over packed-refs using exclude
patterns.

> can be significant. Tests here are done with a copy of linux.git with a
> reference "refs/pull/N" pointing at every commit, as in:
> ...
> Co-authored-by: Jeff King <peff@peff.net>
> Signed-off-by: Jeff King <peff@peff.net>
> Signed-off-by: Taylor Blau <me@ttaylorr.com>
> ---
>  ref-filter.c              |   5 +-
>  refs/packed-backend.c     | 166 ++++++++++++++++++++++++++++++++++++--
>  t/helper/test-ref-store.c |  10 +++
>  t/t1419-exclude-refs.sh   | 101 +++++++++++++++++++++++
>  4 files changed, 274 insertions(+), 8 deletions(-)
>  create mode 100755 t/t1419-exclude-refs.sh

Nice.

> @@ -785,6 +802,13 @@ struct packed_ref_iterator {
>  	/* The end of the part of the buffer that will be iterated over: */
>  	const char *eof;
>  
> +	struct jump_list_entry {
> +		const char *start;
> +		const char *end;
> +	} *jump;
> +	size_t jump_nr, jump_alloc;
> +	size_t jump_pos;
> +
>  	/* Scratch space for current values: */
>  	struct object_id oid, peeled;
>  	struct strbuf refname_buf;
> @@ -802,14 +826,34 @@ struct packed_ref_iterator {
>   */
>  static int next_record(struct packed_ref_iterator *iter)
>  {
> -	const char *p = iter->pos, *eol;
> +	const char *p, *eol;
>  
>  	strbuf_reset(&iter->refname_buf);
>  
> +	/*
> +	 * If iter->pos is contained within a skipped region, jump past
> +	 * it.
> +	 *
> +	 * Note that each skipped region is considered at most once,
> +	 * since they are ordered based on their starting position.
> +	 */
> +	while (iter->jump_pos < iter->jump_nr) {
> +		struct jump_list_entry *curr = &iter->jump[iter->jump_pos];
> +		if (iter->pos < curr->start)
> +			break; /* not to the next jump yet */
> +
> +		iter->jump_pos++;
> +		if (iter->pos < curr->end) {
> +			iter->pos = curr->end;
> +			break;
> +		}
> +	}

Quite straight-forward.

> +static const char *ptr_max(const char *x, const char *y)
> +{
> +	if (x > y)
> +		return x;
> +	return y;
> +}

Hopefully the compiler would inline the function without being told.

These pointers point into the same mmapped region of contiguous
memory that holds the contents of the packed-refs file, so
comparison between them is always defined.  Good.

I wondered if

	return (x > y) ? x : y;

is easier to read, simply because it treats both cases more equally
(in other words, as written, (x>y) appears more "special"), but that
is minor.

> +static void populate_excluded_jump_list(struct packed_ref_iterator *iter,
> +					struct snapshot *snapshot,
> +					const char **excluded_patterns)
> +{
> +	size_t i, j;
> +	const char **pattern;
> +	struct jump_list_entry *last_disjoint;
> +
> +	if (!excluded_patterns)
> +		return;
> +
> +	for (pattern = excluded_patterns; *pattern; pattern++) {
> +		struct jump_list_entry *e;
> +
> +		/*
> +		 * We can't feed any excludes with globs in them to the
> +		 * refs machinery.  It only understands prefix matching.
> +		 * We likewise can't even feed the string leading up to
> +		 * the first meta-character, as something like "foo[a]"
> +		 * should not exclude "foobar" (but the prefix "foo"
> +		 * would match that and mark it for exclusion).
> +		 */
> +		if (has_glob_special(*pattern))
> +			continue;

OK.  When we have a "literal" exclude pattern and another "glob"
exclude pattern, we can afford to ignore the "glob" one when
building the jump list and include only the "literal" one, because
the jump list is used only to skip over entries that obviously can
never be in the result, *and* --exclude are additive (i.e. being
on jump list because of the "literal" pattern is a reason enough to
be excluded from the result; matching or not matching the other
patterns does not affect the fate of the ref that got excluded due
to matching the "literal" pattern).

Makes sense.

> +		ALLOC_GROW(iter->jump, iter->jump_nr + 1, iter->jump_alloc);
> +
> +		e = &iter->jump[iter->jump_nr++];
> +		e->start = find_reference_location(snapshot, *pattern, 0);
> +		e->end = find_reference_location_end(snapshot, *pattern, 0);
> +	}
> +
> +	if (!iter->jump_nr) {
> +		/*
> +		 * Every entry in exclude_patterns has a meta-character,
> +		 * nothing to do here.
> +		 */
> +		return;
> +	}

OK.

Thanks.

  reply	other threads:[~2023-06-14  0:27 UTC|newest]

Thread overview: 149+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2023-05-08 21:59 [PATCH 00/15] refs: implement skip lists for packed backend Taylor Blau
2023-05-08 21:59 ` [PATCH 01/15] refs.c: rename `ref_filter` Taylor Blau
2023-05-08 21:59 ` [PATCH 02/15] ref-filter.h: provide `REF_FILTER_INIT` Taylor Blau
2023-05-08 21:59 ` [PATCH 03/15] ref-filter: clear reachable list pointers after freeing Taylor Blau
2023-05-08 21:59 ` [PATCH 04/15] ref-filter: add ref_filter_clear() Taylor Blau
2023-05-08 22:29   ` Junio C Hamano
2023-05-08 22:33     ` Taylor Blau
2023-05-09 15:14   ` Patrick Steinhardt
2023-05-09 19:11     ` Taylor Blau
2023-05-08 21:59 ` [PATCH 05/15] ref-filter.c: parameterize match functions over patterns Taylor Blau
2023-05-08 22:36   ` Junio C Hamano
2023-05-09 20:13     ` Taylor Blau
2023-05-08 21:59 ` [PATCH 06/15] builtin/for-each-ref.c: add `--exclude` option Taylor Blau
2023-05-08 23:22   ` Junio C Hamano
2023-05-09 20:22     ` Taylor Blau
2023-05-08 22:00 ` [PATCH 07/15] refs: plumb `exclude_patterns` argument throughout Taylor Blau
2023-05-09 15:14   ` Patrick Steinhardt
2023-05-09 20:23     ` Taylor Blau
2023-05-08 22:00 ` [PATCH 08/15] refs/packed-backend.c: refactor `find_reference_location()` Taylor Blau
2023-05-08 23:56   ` Junio C Hamano
2023-05-09 20:29     ` Taylor Blau
2023-05-08 22:00 ` [PATCH 09/15] refs/packed-backend.c: implement skip lists to avoid excluded pattern(s) Taylor Blau
2023-05-09  0:10   ` Chris Torek
2023-05-09 20:39     ` Taylor Blau
2023-05-09 15:15   ` Patrick Steinhardt
2023-05-09 20:55     ` Taylor Blau
2023-05-09 21:15       ` Taylor Blau
2023-05-10  7:25       ` Patrick Steinhardt
2023-05-09 23:40   ` Junio C Hamano
2023-05-10  2:30     ` Taylor Blau
2023-05-08 22:00 ` [PATCH 10/15] refs/packed-backend.c: add trace2 counters for skip list Taylor Blau
2023-05-08 22:00 ` [PATCH 11/15] revision.h: store hidden refs in a `strvec` Taylor Blau
2023-05-08 22:00 ` [PATCH 12/15] refs/packed-backend.c: ignore complicated hidden refs rules Taylor Blau
2023-05-08 22:00 ` [PATCH 13/15] refs.h: let `for_each_namespaced_ref()` take excluded patterns Taylor Blau
2023-05-08 22:00 ` [PATCH 14/15] upload-pack.c: avoid enumerating hidden refs where possible Taylor Blau
2023-05-09 15:15   ` Patrick Steinhardt
2023-05-09 21:34     ` Taylor Blau
2023-05-08 22:00 ` [PATCH 15/15] builtin/receive-pack.c: avoid enumerating hidden references Taylor Blau
2023-05-15 19:23 ` [PATCH v2 00/16] refs: implement jump lists for packed backend Taylor Blau
2023-05-15 19:23   ` [PATCH v2 01/16] refs.c: rename `ref_filter` Taylor Blau
2023-05-15 19:23   ` [PATCH v2 02/16] ref-filter.h: provide `REF_FILTER_INIT` Taylor Blau
2023-05-15 19:23   ` [PATCH v2 03/16] ref-filter: clear reachable list pointers after freeing Taylor Blau
2023-05-15 19:23   ` [PATCH v2 04/16] ref-filter: add `ref_filter_clear()` Taylor Blau
2023-05-15 19:23   ` [PATCH v2 05/16] ref-filter.c: parameterize match functions over patterns Taylor Blau
2023-05-15 19:23   ` [PATCH v2 06/16] builtin/for-each-ref.c: add `--exclude` option Taylor Blau
2023-05-15 19:23   ` [PATCH v2 07/16] refs: plumb `exclude_patterns` argument throughout Taylor Blau
2023-05-15 19:23   ` [PATCH v2 08/16] refs/packed-backend.c: refactor `find_reference_location()` Taylor Blau
2023-05-15 19:23   ` [PATCH v2 09/16] refs/packed-backend.c: implement jump lists to avoid excluded pattern(s) Taylor Blau
2023-06-06  7:00     ` Patrick Steinhardt
2023-06-20 12:15       ` Taylor Blau
2023-05-15 19:23   ` [PATCH v2 10/16] refs/packed-backend.c: add trace2 counters for jump list Taylor Blau
2023-05-15 19:23   ` [PATCH v2 11/16] revision.h: store hidden refs in a `strvec` Taylor Blau
2023-06-06  7:00     ` Patrick Steinhardt
2023-06-20 12:16       ` Taylor Blau
2023-05-15 19:23   ` [PATCH v2 12/16] refs/packed-backend.c: ignore complicated hidden refs rules Taylor Blau
2023-05-15 19:23   ` [PATCH v2 13/16] refs.h: let `for_each_namespaced_ref()` take excluded patterns Taylor Blau
2023-06-06  7:01     ` Patrick Steinhardt
2023-06-20 12:18       ` Taylor Blau
2023-05-15 19:23   ` [PATCH v2 14/16] builtin/receive-pack.c: avoid enumerating hidden references Taylor Blau
2023-05-15 19:23   ` [PATCH v2 15/16] upload-pack.c: avoid enumerating hidden refs where possible Taylor Blau
2023-05-15 19:23   ` [PATCH v2 16/16] ls-refs.c: " Taylor Blau
2023-06-06  7:01   ` [PATCH v2 00/16] refs: implement jump lists for packed backend Patrick Steinhardt
2023-06-20 12:22     ` Taylor Blau
2023-06-07 10:40 ` [PATCH v3 " Taylor Blau
2023-06-07 10:40   ` [PATCH v3 01/16] refs.c: rename `ref_filter` Taylor Blau
2023-06-13 22:19     ` Junio C Hamano
2023-06-07 10:40   ` [PATCH v3 02/16] ref-filter.h: provide `REF_FILTER_INIT` Taylor Blau
2023-06-07 10:41   ` [PATCH v3 03/16] ref-filter: clear reachable list pointers after freeing Taylor Blau
2023-06-07 10:41   ` [PATCH v3 04/16] ref-filter: add `ref_filter_clear()` Taylor Blau
2023-06-07 10:41   ` [PATCH v3 05/16] ref-filter.c: parameterize match functions over patterns Taylor Blau
2023-06-13 22:37     ` Junio C Hamano
2023-06-07 10:41   ` [PATCH v3 06/16] builtin/for-each-ref.c: add `--exclude` option Taylor Blau
2023-06-07 10:41   ` [PATCH v3 07/16] refs: plumb `exclude_patterns` argument throughout Taylor Blau
2023-06-13 23:42     ` Junio C Hamano
2023-06-20 11:52       ` Taylor Blau
2023-06-07 10:41   ` [PATCH v3 08/16] refs/packed-backend.c: refactor `find_reference_location()` Taylor Blau
2023-06-07 10:41   ` [PATCH v3 09/16] refs/packed-backend.c: implement jump lists to avoid excluded pattern(s) Taylor Blau
2023-06-14  0:27     ` Junio C Hamano [this message]
2023-06-20 12:05       ` Taylor Blau
2023-06-20 18:49         ` Junio C Hamano
2023-06-07 10:41   ` [PATCH v3 10/16] refs/packed-backend.c: add trace2 counters for jump list Taylor Blau
2023-06-14  0:32     ` Junio C Hamano
2023-06-20 12:08       ` Taylor Blau
2023-06-07 10:41   ` [PATCH v3 11/16] revision.h: store hidden refs in a `strvec` Taylor Blau
2023-06-07 10:41   ` [PATCH v3 12/16] refs/packed-backend.c: ignore complicated hidden refs rules Taylor Blau
2023-06-14  0:40     ` Junio C Hamano
2023-06-07 10:41   ` [PATCH v3 13/16] refs.h: let `for_each_namespaced_ref()` take excluded patterns Taylor Blau
2023-06-07 10:42   ` [PATCH v3 14/16] builtin/receive-pack.c: avoid enumerating hidden references Taylor Blau
2023-06-07 10:42   ` [PATCH v3 15/16] upload-pack.c: avoid enumerating hidden refs where possible Taylor Blau
2023-06-07 10:42   ` [PATCH v3 16/16] ls-refs.c: " Taylor Blau
2023-06-12 21:05   ` [PATCH v3 00/16] refs: implement jump lists for packed backend Junio C Hamano
2023-06-20 14:20 ` [PATCH v4 " Taylor Blau
2023-06-20 14:21   ` [PATCH v4 01/16] refs.c: rename `ref_filter` Taylor Blau
2023-07-03  5:13     ` Jeff King
2023-06-20 14:21   ` [PATCH v4 02/16] ref-filter.h: provide `REF_FILTER_INIT` Taylor Blau
2023-07-03  5:15     ` Jeff King
2023-07-03 17:07       ` Taylor Blau
2023-06-20 14:21   ` [PATCH v4 03/16] ref-filter: clear reachable list pointers after freeing Taylor Blau
2023-07-03  5:16     ` Jeff King
2023-06-20 14:21   ` [PATCH v4 04/16] ref-filter: add `ref_filter_clear()` Taylor Blau
2023-07-03  5:19     ` Jeff King
2023-07-03 17:13       ` Taylor Blau
2023-07-03 17:32         ` Jeff King
2023-06-20 14:21   ` [PATCH v4 05/16] ref-filter.c: parameterize match functions over patterns Taylor Blau
2023-07-03  5:27     ` Jeff King
2023-07-03 17:18       ` Taylor Blau
2023-07-03 17:22         ` Taylor Blau
2023-07-03 17:33           ` Jeff King
2023-06-20 14:21   ` [PATCH v4 06/16] builtin/for-each-ref.c: add `--exclude` option Taylor Blau
2023-06-20 14:21   ` [PATCH v4 07/16] refs: plumb `exclude_patterns` argument throughout Taylor Blau
2023-06-20 14:21   ` [PATCH v4 08/16] refs/packed-backend.c: refactor `find_reference_location()` Taylor Blau
2023-06-20 14:21   ` [PATCH v4 09/16] refs/packed-backend.c: implement jump lists to avoid excluded pattern(s) Taylor Blau
2023-07-03  5:56     ` Jeff King
2023-07-03 17:38       ` Taylor Blau
2023-06-20 14:21   ` [PATCH v4 10/16] refs/packed-backend.c: add trace2 counters for jump list Taylor Blau
2023-06-20 14:21   ` [PATCH v4 11/16] revision.h: store hidden refs in a `strvec` Taylor Blau
2023-07-03  5:59     ` Jeff King
2023-06-20 14:22   ` [PATCH v4 12/16] refs/packed-backend.c: ignore complicated hidden refs rules Taylor Blau
2023-07-03  6:18     ` Jeff King
2023-07-04 18:22       ` Taylor Blau
2023-06-20 14:22   ` [PATCH v4 13/16] refs.h: let `for_each_namespaced_ref()` take excluded patterns Taylor Blau
2023-06-20 14:22   ` [PATCH v4 14/16] builtin/receive-pack.c: avoid enumerating hidden references Taylor Blau
2023-06-20 14:22   ` [PATCH v4 15/16] upload-pack.c: avoid enumerating hidden refs where possible Taylor Blau
2023-07-03  6:26     ` Jeff King
2023-07-04 18:43       ` Taylor Blau
2023-06-20 14:22   ` [PATCH v4 16/16] ls-refs.c: " Taylor Blau
2023-07-03  6:27     ` Jeff King
2023-07-03  6:29   ` [PATCH v4 00/16] refs: implement jump lists for packed backend Jeff King
2023-07-10 21:12 ` [PATCH v5 " Taylor Blau
2023-07-10 21:12   ` [PATCH v5 01/16] refs.c: rename `ref_filter` Taylor Blau
2023-07-10 21:12   ` [PATCH v5 02/16] ref-filter.h: provide `REF_FILTER_INIT` Taylor Blau
2023-07-10 21:12   ` [PATCH v5 03/16] ref-filter: clear reachable list pointers after freeing Taylor Blau
2023-07-10 21:12   ` [PATCH v5 04/16] ref-filter: add `ref_filter_clear()` Taylor Blau
2023-07-10 21:12   ` [PATCH v5 05/16] ref-filter.c: parameterize match functions over patterns Taylor Blau
2023-07-10 21:12   ` [PATCH v5 06/16] builtin/for-each-ref.c: add `--exclude` option Taylor Blau
2023-07-10 21:12   ` [PATCH v5 07/16] refs: plumb `exclude_patterns` argument throughout Taylor Blau
2023-07-10 21:12   ` [PATCH v5 08/16] refs/packed-backend.c: refactor `find_reference_location()` Taylor Blau
2023-07-10 21:12   ` [PATCH v5 09/16] refs/packed-backend.c: implement jump lists to avoid excluded pattern(s) Taylor Blau
2023-07-10 21:12   ` [PATCH v5 10/16] refs/packed-backend.c: add trace2 counters for jump list Taylor Blau
2023-07-10 21:12   ` [PATCH v5 11/16] revision.h: store hidden refs in a `strvec` Taylor Blau
2023-07-10 21:12   ` [PATCH v5 12/16] refs.h: let `for_each_namespaced_ref()` take excluded patterns Taylor Blau
2023-07-10 21:12   ` [PATCH v5 13/16] refs.h: implement `hidden_refs_to_excludes()` Taylor Blau
2023-07-10 21:12   ` [PATCH v5 14/16] builtin/receive-pack.c: avoid enumerating hidden references Taylor Blau
2023-07-10 21:12   ` [PATCH v5 15/16] upload-pack.c: avoid enumerating hidden refs where possible Taylor Blau
2023-07-10 21:12   ` [PATCH v5 16/16] ls-refs.c: " Taylor Blau
2023-07-10 22:35   ` [PATCH v5 00/16] refs: implement jump lists for packed backend Junio C Hamano
2023-07-11  9:37     ` Patrick Steinhardt
2023-07-11 15:56       ` Junio C Hamano
2023-07-11 17:19         ` Taylor Blau

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=xmqq352u3n3a.fsf@gitster.g \
    --to=gitster@pobox.com \
    --cc=chris.torek@gmail.com \
    --cc=derrickstolee@github.com \
    --cc=git@vger.kernel.org \
    --cc=me@ttaylorr.com \
    --cc=peff@peff.net \
    --cc=ps@pks.im \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).