Git Mailing List Archive mirror
 help / color / mirror / Atom feed
From: Junio C Hamano <gitster@pobox.com>
To: Taylor Blau <me@ttaylorr.com>
Cc: git@vger.kernel.org, Jeff King <peff@peff.net>,
	Derrick Stolee <derrickstolee@github.com>
Subject: Re: [PATCH 09/15] refs/packed-backend.c: implement skip lists to avoid excluded pattern(s)
Date: Tue, 09 May 2023 16:40:50 -0700	[thread overview]
Message-ID: <xmqqh6slt6nx.fsf@gitster.g> (raw)
In-Reply-To: <a39d1107c1f3e9fbd859a23aed72e63bbd394fa2.1683581621.git.me@ttaylorr.com> (Taylor Blau's message of "Mon, 8 May 2023 18:00:08 -0400")

Taylor Blau <me@ttaylorr.com> writes:

> When iterating through the `packed-refs` file in order to answer a query
> like:
>
>     $ git for-each-ref --exclude=refs/__hidden__
>
> it would be useful to avoid walking over all of the entries in
> `refs/__hidden__/*` when possible, since we know that the ref-filter
> code is going to throw them away anyways.
>
> In certain circumstances, doing so is possible. The algorithm for doing
> so is as follows:
>
>   - For each excluded pattern, find the first record that matches it,
>     and the first pattern that *doesn't* match it (i.e. the location
>     you'd next want to consider when excluding that pattern).

Do we find "record" and then "pattern", or is the latter a misspelt
"record"?  I will assume it is the latter while reading the rest.
Is the latter "the record that does not match the pattern, whose
record number is the smallest but yet larger than the first record
that matches the pattern?"  That is we assume that the refs in the
packed refs file are sorted and can be partitioned into three by
each pattern: (1) refs before the first matching record---they do
not match the pattern. (2) refs at or after the first matching
record that match.  (3) refs after (2) that do not match.

I am not sure if that would work.  What if the pattern were refs/tags/[ad-z]*
and there are packed tags refs/tags/{a,b,c,d,e}?  The pattern would partition
the refs into [a](matches), [b,c](does not match), [d,e](matches).

Perhaps I am grossly misunderstanding what the above explanation
says.

>   - Sort the patterns by their starting location within the
>     `packed-refs` file.

So the idea is that the patterns are sorted by the first record they
match, and after sorting these patterns, refs that are between the
beginning of the list of refs and the first record associated with
the first pattern will not match _any_ pattern in that list?

>   - Construct a skip list of regions by combining adjacent and
>     overlapping regions from the previous step.

"skip list of regions" -> "list of regions to skip", I guess.

>   - When iterating through the `packed-refs` file, if `iter->pos` is
>     ever contained in one of the regions from the previous steps,
>     advance `iter->pos` past the end of that region, and continue
>     enumeration.
>
> Note that this optimization is only possible when none of the excluded
> pattern(s) have special meta-characters in them.

Ah, so this is called "patterns" but it deals with literal patterns
without globs?  Then I'll stop worrying about the refs/tags/[ad-z]
counter-example.

> To see why this is the
> case, consider the exclusion pattern "refs/foo[a]". In general, in order
> to find the location of the first record that matches this pattern, we
> could only consider up to the first meta-character, "refs/foo". But this
> doesn't work, since the excluded region we'd come up with would include
> "refs/foobar", even though it is not excluded.

OK.

> Using the skip list is fairly straightforward (see the changes to
> `refs/packed-backend.c::next_record()`), but constructing the list is
> not. To ensure that the construction is correct, add a new suite of
> tests in t1419 covering various corner cases (overlapping regions,
> partially overlapping regions, adjacent regions, etc.).

Sounds good.  Does this actually use the skip list data structure,
or do they happen to share the same two words in their names, but
otherwise have nothing common with each other?  If the latter, we
may want to revise the explanation, data type names, and variable
names, to avoid confusion, as Chris pointed out earlier.

> +static void populate_excluded_skip_list(struct packed_ref_iterator *iter,
> +					struct snapshot *snapshot,
> +					const char **excluded_patterns)
> +{
> +	size_t i, j;
> +	const char **pattern;
> +
> +	if (!excluded_patterns)
> +		return;
> +
> +	for (pattern = excluded_patterns; *pattern; pattern++) {
> +		struct skip_list_entry *e;
> +
> +		/*
> +		 * We can't feed any excludes with globs in them to the
> +		 * refs machinery.  It only understands prefix matching.
> +		 * We likewise can't even feed the string leading up to
> +		 * the first meta-character, as something like "foo[a]"
> +		 * should not exclude "foobar" (but the prefix "foo"
> +		 * would match that and mark it for exclusion).
> +		 */
> +		if (has_glob_special(*pattern))
> +			continue;

Hmph.  I would have expected that a set of patterns with any one
pattern with glob would invalidate the whole skip optimization, but
it is nice if we can salvage such a set and still optimize, if only
for literal patterns.  Interesting.

Ah, that is because we are dealing with ranges that cannot possibly
match.  With the mention of "first record that matches" etc. in the
earlier descriptoin, I somehow misled myself that we are dealing
with ranges that have interesting records.  So, a pattern with glob
does not contribute any range to be skipped, but that is OK.

> +		ALLOC_GROW(iter->skip, iter->skip_nr + 1, iter->skip_alloc);
> +
> +		e = &iter->skip[iter->skip_nr++];
> +		e->start = find_reference_location(snapshot, *pattern, 0);
> +		e->end = find_reference_location_end(snapshot, *pattern, 0);

So, iter->skip[] array has one range per pattern at most, but some
patterns may not contribute any range to the list.

> +	}
> +
> +	if (!iter->skip_nr) {
> +		/*
> +		 * Every entry in exclude_patterns has a meta-character,
> +		 * nothing to do here.
> +		 */
> +		return;
> +	}
> +
> +	QSORT(iter->skip, iter->skip_nr, skip_list_entry_cmp);
> +
> +	/*
> +	 * As an optimization, merge adjacent entries in the skip list
> +	 * to jump forwards as far as possible when entering a skipped
> +	 * region.
> +	 *
> +	 * For example, if we have two skipped regions:
> +	 *
> +	 *	[[A, B], [B, C]]

I am confused.  The first pattern may never match records in [A..B]
range, and the second pattern may never match records in [B..C]
range, but what does it mean to combine these two ranges?

> +	 * we want to combine that into a single entry jumping from A to
> +	 * C.
> +	 */
> +	for (i = 1, j = 1; i < iter->skip_nr; i++) {
> +		struct skip_list_entry *ours = &iter->skip[i];
> +		struct skip_list_entry *prev = &iter->skip[i - 1];


> +		if (ours->start == ours->end) {
> +			/* ignore empty regions (no matching entries) */
> +			continue;
> +		} else if (prev->end >= ours->start) {
> +			/* overlapping regions extend the previous one */
> +			prev->end = ptr_max(prev->end, ours->end);
> +		} else {
> +			/* otherwise, insert a new region */
> +			iter->skip[j++] = *ours;
> +		}

None of the {braces} seem needed, but OK.  

> +	}
> +
> +	iter->skip_nr = j;
> +	iter->skip_pos = 0;
> +}

  parent reply	other threads:[~2023-05-09 23:40 UTC|newest]

Thread overview: 149+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2023-05-08 21:59 [PATCH 00/15] refs: implement skip lists for packed backend Taylor Blau
2023-05-08 21:59 ` [PATCH 01/15] refs.c: rename `ref_filter` Taylor Blau
2023-05-08 21:59 ` [PATCH 02/15] ref-filter.h: provide `REF_FILTER_INIT` Taylor Blau
2023-05-08 21:59 ` [PATCH 03/15] ref-filter: clear reachable list pointers after freeing Taylor Blau
2023-05-08 21:59 ` [PATCH 04/15] ref-filter: add ref_filter_clear() Taylor Blau
2023-05-08 22:29   ` Junio C Hamano
2023-05-08 22:33     ` Taylor Blau
2023-05-09 15:14   ` Patrick Steinhardt
2023-05-09 19:11     ` Taylor Blau
2023-05-08 21:59 ` [PATCH 05/15] ref-filter.c: parameterize match functions over patterns Taylor Blau
2023-05-08 22:36   ` Junio C Hamano
2023-05-09 20:13     ` Taylor Blau
2023-05-08 21:59 ` [PATCH 06/15] builtin/for-each-ref.c: add `--exclude` option Taylor Blau
2023-05-08 23:22   ` Junio C Hamano
2023-05-09 20:22     ` Taylor Blau
2023-05-08 22:00 ` [PATCH 07/15] refs: plumb `exclude_patterns` argument throughout Taylor Blau
2023-05-09 15:14   ` Patrick Steinhardt
2023-05-09 20:23     ` Taylor Blau
2023-05-08 22:00 ` [PATCH 08/15] refs/packed-backend.c: refactor `find_reference_location()` Taylor Blau
2023-05-08 23:56   ` Junio C Hamano
2023-05-09 20:29     ` Taylor Blau
2023-05-08 22:00 ` [PATCH 09/15] refs/packed-backend.c: implement skip lists to avoid excluded pattern(s) Taylor Blau
2023-05-09  0:10   ` Chris Torek
2023-05-09 20:39     ` Taylor Blau
2023-05-09 15:15   ` Patrick Steinhardt
2023-05-09 20:55     ` Taylor Blau
2023-05-09 21:15       ` Taylor Blau
2023-05-10  7:25       ` Patrick Steinhardt
2023-05-09 23:40   ` Junio C Hamano [this message]
2023-05-10  2:30     ` Taylor Blau
2023-05-08 22:00 ` [PATCH 10/15] refs/packed-backend.c: add trace2 counters for skip list Taylor Blau
2023-05-08 22:00 ` [PATCH 11/15] revision.h: store hidden refs in a `strvec` Taylor Blau
2023-05-08 22:00 ` [PATCH 12/15] refs/packed-backend.c: ignore complicated hidden refs rules Taylor Blau
2023-05-08 22:00 ` [PATCH 13/15] refs.h: let `for_each_namespaced_ref()` take excluded patterns Taylor Blau
2023-05-08 22:00 ` [PATCH 14/15] upload-pack.c: avoid enumerating hidden refs where possible Taylor Blau
2023-05-09 15:15   ` Patrick Steinhardt
2023-05-09 21:34     ` Taylor Blau
2023-05-08 22:00 ` [PATCH 15/15] builtin/receive-pack.c: avoid enumerating hidden references Taylor Blau
2023-05-15 19:23 ` [PATCH v2 00/16] refs: implement jump lists for packed backend Taylor Blau
2023-05-15 19:23   ` [PATCH v2 01/16] refs.c: rename `ref_filter` Taylor Blau
2023-05-15 19:23   ` [PATCH v2 02/16] ref-filter.h: provide `REF_FILTER_INIT` Taylor Blau
2023-05-15 19:23   ` [PATCH v2 03/16] ref-filter: clear reachable list pointers after freeing Taylor Blau
2023-05-15 19:23   ` [PATCH v2 04/16] ref-filter: add `ref_filter_clear()` Taylor Blau
2023-05-15 19:23   ` [PATCH v2 05/16] ref-filter.c: parameterize match functions over patterns Taylor Blau
2023-05-15 19:23   ` [PATCH v2 06/16] builtin/for-each-ref.c: add `--exclude` option Taylor Blau
2023-05-15 19:23   ` [PATCH v2 07/16] refs: plumb `exclude_patterns` argument throughout Taylor Blau
2023-05-15 19:23   ` [PATCH v2 08/16] refs/packed-backend.c: refactor `find_reference_location()` Taylor Blau
2023-05-15 19:23   ` [PATCH v2 09/16] refs/packed-backend.c: implement jump lists to avoid excluded pattern(s) Taylor Blau
2023-06-06  7:00     ` Patrick Steinhardt
2023-06-20 12:15       ` Taylor Blau
2023-05-15 19:23   ` [PATCH v2 10/16] refs/packed-backend.c: add trace2 counters for jump list Taylor Blau
2023-05-15 19:23   ` [PATCH v2 11/16] revision.h: store hidden refs in a `strvec` Taylor Blau
2023-06-06  7:00     ` Patrick Steinhardt
2023-06-20 12:16       ` Taylor Blau
2023-05-15 19:23   ` [PATCH v2 12/16] refs/packed-backend.c: ignore complicated hidden refs rules Taylor Blau
2023-05-15 19:23   ` [PATCH v2 13/16] refs.h: let `for_each_namespaced_ref()` take excluded patterns Taylor Blau
2023-06-06  7:01     ` Patrick Steinhardt
2023-06-20 12:18       ` Taylor Blau
2023-05-15 19:23   ` [PATCH v2 14/16] builtin/receive-pack.c: avoid enumerating hidden references Taylor Blau
2023-05-15 19:23   ` [PATCH v2 15/16] upload-pack.c: avoid enumerating hidden refs where possible Taylor Blau
2023-05-15 19:23   ` [PATCH v2 16/16] ls-refs.c: " Taylor Blau
2023-06-06  7:01   ` [PATCH v2 00/16] refs: implement jump lists for packed backend Patrick Steinhardt
2023-06-20 12:22     ` Taylor Blau
2023-06-07 10:40 ` [PATCH v3 " Taylor Blau
2023-06-07 10:40   ` [PATCH v3 01/16] refs.c: rename `ref_filter` Taylor Blau
2023-06-13 22:19     ` Junio C Hamano
2023-06-07 10:40   ` [PATCH v3 02/16] ref-filter.h: provide `REF_FILTER_INIT` Taylor Blau
2023-06-07 10:41   ` [PATCH v3 03/16] ref-filter: clear reachable list pointers after freeing Taylor Blau
2023-06-07 10:41   ` [PATCH v3 04/16] ref-filter: add `ref_filter_clear()` Taylor Blau
2023-06-07 10:41   ` [PATCH v3 05/16] ref-filter.c: parameterize match functions over patterns Taylor Blau
2023-06-13 22:37     ` Junio C Hamano
2023-06-07 10:41   ` [PATCH v3 06/16] builtin/for-each-ref.c: add `--exclude` option Taylor Blau
2023-06-07 10:41   ` [PATCH v3 07/16] refs: plumb `exclude_patterns` argument throughout Taylor Blau
2023-06-13 23:42     ` Junio C Hamano
2023-06-20 11:52       ` Taylor Blau
2023-06-07 10:41   ` [PATCH v3 08/16] refs/packed-backend.c: refactor `find_reference_location()` Taylor Blau
2023-06-07 10:41   ` [PATCH v3 09/16] refs/packed-backend.c: implement jump lists to avoid excluded pattern(s) Taylor Blau
2023-06-14  0:27     ` Junio C Hamano
2023-06-20 12:05       ` Taylor Blau
2023-06-20 18:49         ` Junio C Hamano
2023-06-07 10:41   ` [PATCH v3 10/16] refs/packed-backend.c: add trace2 counters for jump list Taylor Blau
2023-06-14  0:32     ` Junio C Hamano
2023-06-20 12:08       ` Taylor Blau
2023-06-07 10:41   ` [PATCH v3 11/16] revision.h: store hidden refs in a `strvec` Taylor Blau
2023-06-07 10:41   ` [PATCH v3 12/16] refs/packed-backend.c: ignore complicated hidden refs rules Taylor Blau
2023-06-14  0:40     ` Junio C Hamano
2023-06-07 10:41   ` [PATCH v3 13/16] refs.h: let `for_each_namespaced_ref()` take excluded patterns Taylor Blau
2023-06-07 10:42   ` [PATCH v3 14/16] builtin/receive-pack.c: avoid enumerating hidden references Taylor Blau
2023-06-07 10:42   ` [PATCH v3 15/16] upload-pack.c: avoid enumerating hidden refs where possible Taylor Blau
2023-06-07 10:42   ` [PATCH v3 16/16] ls-refs.c: " Taylor Blau
2023-06-12 21:05   ` [PATCH v3 00/16] refs: implement jump lists for packed backend Junio C Hamano
2023-06-20 14:20 ` [PATCH v4 " Taylor Blau
2023-06-20 14:21   ` [PATCH v4 01/16] refs.c: rename `ref_filter` Taylor Blau
2023-07-03  5:13     ` Jeff King
2023-06-20 14:21   ` [PATCH v4 02/16] ref-filter.h: provide `REF_FILTER_INIT` Taylor Blau
2023-07-03  5:15     ` Jeff King
2023-07-03 17:07       ` Taylor Blau
2023-06-20 14:21   ` [PATCH v4 03/16] ref-filter: clear reachable list pointers after freeing Taylor Blau
2023-07-03  5:16     ` Jeff King
2023-06-20 14:21   ` [PATCH v4 04/16] ref-filter: add `ref_filter_clear()` Taylor Blau
2023-07-03  5:19     ` Jeff King
2023-07-03 17:13       ` Taylor Blau
2023-07-03 17:32         ` Jeff King
2023-06-20 14:21   ` [PATCH v4 05/16] ref-filter.c: parameterize match functions over patterns Taylor Blau
2023-07-03  5:27     ` Jeff King
2023-07-03 17:18       ` Taylor Blau
2023-07-03 17:22         ` Taylor Blau
2023-07-03 17:33           ` Jeff King
2023-06-20 14:21   ` [PATCH v4 06/16] builtin/for-each-ref.c: add `--exclude` option Taylor Blau
2023-06-20 14:21   ` [PATCH v4 07/16] refs: plumb `exclude_patterns` argument throughout Taylor Blau
2023-06-20 14:21   ` [PATCH v4 08/16] refs/packed-backend.c: refactor `find_reference_location()` Taylor Blau
2023-06-20 14:21   ` [PATCH v4 09/16] refs/packed-backend.c: implement jump lists to avoid excluded pattern(s) Taylor Blau
2023-07-03  5:56     ` Jeff King
2023-07-03 17:38       ` Taylor Blau
2023-06-20 14:21   ` [PATCH v4 10/16] refs/packed-backend.c: add trace2 counters for jump list Taylor Blau
2023-06-20 14:21   ` [PATCH v4 11/16] revision.h: store hidden refs in a `strvec` Taylor Blau
2023-07-03  5:59     ` Jeff King
2023-06-20 14:22   ` [PATCH v4 12/16] refs/packed-backend.c: ignore complicated hidden refs rules Taylor Blau
2023-07-03  6:18     ` Jeff King
2023-07-04 18:22       ` Taylor Blau
2023-06-20 14:22   ` [PATCH v4 13/16] refs.h: let `for_each_namespaced_ref()` take excluded patterns Taylor Blau
2023-06-20 14:22   ` [PATCH v4 14/16] builtin/receive-pack.c: avoid enumerating hidden references Taylor Blau
2023-06-20 14:22   ` [PATCH v4 15/16] upload-pack.c: avoid enumerating hidden refs where possible Taylor Blau
2023-07-03  6:26     ` Jeff King
2023-07-04 18:43       ` Taylor Blau
2023-06-20 14:22   ` [PATCH v4 16/16] ls-refs.c: " Taylor Blau
2023-07-03  6:27     ` Jeff King
2023-07-03  6:29   ` [PATCH v4 00/16] refs: implement jump lists for packed backend Jeff King
2023-07-10 21:12 ` [PATCH v5 " Taylor Blau
2023-07-10 21:12   ` [PATCH v5 01/16] refs.c: rename `ref_filter` Taylor Blau
2023-07-10 21:12   ` [PATCH v5 02/16] ref-filter.h: provide `REF_FILTER_INIT` Taylor Blau
2023-07-10 21:12   ` [PATCH v5 03/16] ref-filter: clear reachable list pointers after freeing Taylor Blau
2023-07-10 21:12   ` [PATCH v5 04/16] ref-filter: add `ref_filter_clear()` Taylor Blau
2023-07-10 21:12   ` [PATCH v5 05/16] ref-filter.c: parameterize match functions over patterns Taylor Blau
2023-07-10 21:12   ` [PATCH v5 06/16] builtin/for-each-ref.c: add `--exclude` option Taylor Blau
2023-07-10 21:12   ` [PATCH v5 07/16] refs: plumb `exclude_patterns` argument throughout Taylor Blau
2023-07-10 21:12   ` [PATCH v5 08/16] refs/packed-backend.c: refactor `find_reference_location()` Taylor Blau
2023-07-10 21:12   ` [PATCH v5 09/16] refs/packed-backend.c: implement jump lists to avoid excluded pattern(s) Taylor Blau
2023-07-10 21:12   ` [PATCH v5 10/16] refs/packed-backend.c: add trace2 counters for jump list Taylor Blau
2023-07-10 21:12   ` [PATCH v5 11/16] revision.h: store hidden refs in a `strvec` Taylor Blau
2023-07-10 21:12   ` [PATCH v5 12/16] refs.h: let `for_each_namespaced_ref()` take excluded patterns Taylor Blau
2023-07-10 21:12   ` [PATCH v5 13/16] refs.h: implement `hidden_refs_to_excludes()` Taylor Blau
2023-07-10 21:12   ` [PATCH v5 14/16] builtin/receive-pack.c: avoid enumerating hidden references Taylor Blau
2023-07-10 21:12   ` [PATCH v5 15/16] upload-pack.c: avoid enumerating hidden refs where possible Taylor Blau
2023-07-10 21:12   ` [PATCH v5 16/16] ls-refs.c: " Taylor Blau
2023-07-10 22:35   ` [PATCH v5 00/16] refs: implement jump lists for packed backend Junio C Hamano
2023-07-11  9:37     ` Patrick Steinhardt
2023-07-11 15:56       ` Junio C Hamano
2023-07-11 17:19         ` Taylor Blau

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=xmqqh6slt6nx.fsf@gitster.g \
    --to=gitster@pobox.com \
    --cc=derrickstolee@github.com \
    --cc=git@vger.kernel.org \
    --cc=me@ttaylorr.com \
    --cc=peff@peff.net \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).