Git Mailing List Archive mirror
* [PATCH 00/15] refs: implement skip lists for packed backend
@ 2023-05-08 21:59 Taylor Blau
  2023-05-08 21:59 ` [PATCH 01/15] refs.c: rename `ref_filter` Taylor Blau
                   ` (18 more replies)
  0 siblings, 19 replies; 149+ messages in thread
From: Taylor Blau @ 2023-05-08 21:59 UTC (permalink / raw)
  To: git; +Cc: Jeff King, Derrick Stolee, Junio C Hamano

This series implements the concept of a skip list for the packed-refs
backend. When the caller has many references it wants to exclude, the
packed-refs iterator maintains a list of (start, end) tuples describing
regions to jump over (i.e. if `iter->pos == start`, jump forward to
`end`).
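
To make the jump above concrete, here is a minimal sketch in C. It is
not the packed-refs backend's code: `struct skip_region`, `struct
skip_iter` and `skip_iter_next()` are hypothetical, array indices stand
in for the byte offsets a real packed-refs iterator would use, and the
regions are assumed to be sorted by `start` and non-overlapping.

```c
#include <stddef.h>

/* A region to jump over: positions in [start, end) are never visited. */
struct skip_region {
	size_t start;
	size_t end;
};

/*
 * Hypothetical iterator over a sorted list of refnames. Array indices
 * stand in for the byte offsets a real packed-refs iterator would use.
 */
struct skip_iter {
	const char **refs;
	size_t nr;
	size_t pos;
	const struct skip_region *skips; /* sorted by start, non-overlapping */
	size_t skips_nr;
	size_t next_skip;                /* first region not yet passed */
};

/* Return the next refname outside every skip region, or NULL at the end. */
static const char *skip_iter_next(struct skip_iter *it)
{
	while (it->pos < it->nr) {
		/* Drop regions that end at or before the current position. */
		while (it->next_skip < it->skips_nr &&
		       it->skips[it->next_skip].end <= it->pos)
			it->next_skip++;

		if (it->next_skip < it->skips_nr &&
		    it->skips[it->next_skip].start == it->pos) {
			/* pos == start: jump forward to end in one step. */
			it->pos = it->skips[it->next_skip].end;
			it->next_skip++;
			continue;
		}
		return it->refs[it->pos++];
	}
	return NULL;
}
```

Because the iterator either advances by one entry or jumps straight to
a region's `end`, it lands exactly on each region's `start`, so the
equality check suffices. An excluded region then costs one jump instead
of a visit to every ref inside it.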

The series then uses this machinery to power a few new things:

  - `git for-each-ref --exclude`, which allows callers to specify
    prefixes they wish to discard.

    In a synthetic example, enumerating all references and discarding
    the unwanted ones took ~820ms; the naive `--exclude` implementation
    brings this down to ~106ms. By the end of the series, it drops to
    under ~5ms.

  - `git receive-pack` uses the new skip-list machinery to avoid
    visiting references hidden via a hideRefs rule.

  - `git upload-pack` gets an analogous optimization (though it can
    kick in less often, for reasons described in that patch).

The series is laid out as follows:

  - The first five patches are preparatory (various refs.c and
    ref-filter cleanup).
  - The sixth patch provides a naive implementation of `--exclude`.
  - The next four patches prepare for and implement skip lists in the
    packed-refs backend.
  - The last five patches apply the optimization in `upload-pack` and
    `receive-pack`.

The series is ordered so that any prefix of the sections listed above
could be queued independently. Thanks in advance for your review.

Jeff King (5):
  refs.c: rename `ref_filter`
  ref-filter.h: provide `REF_FILTER_INIT`
  ref-filter: clear reachable list pointers after freeing
  ref-filter: add ref_filter_clear()
  ref-filter.c: parameterize match functions over patterns

Taylor Blau (10):
  builtin/for-each-ref.c: add `--exclude` option
  refs: plumb `exclude_patterns` argument throughout
  refs/packed-backend.c: refactor `find_reference_location()`
  refs/packed-backend.c: implement skip lists to avoid excluded
    pattern(s)
  refs/packed-backend.c: add trace2 counters for skip list
  revision.h: store hidden refs in a `strvec`
  refs/packed-backend.c: ignore complicated hidden refs rules
  refs.h: let `for_each_namespaced_ref()` take excluded patterns
  upload-pack.c: avoid enumerating hidden refs where possible
  builtin/receive-pack.c: avoid enumerating hidden references

 Documentation/git-for-each-ref.txt |   6 +
 builtin/branch.c                   |   4 +-
 builtin/for-each-ref.c             |   8 +-
 builtin/receive-pack.c             |   7 +-
 builtin/tag.c                      |   4 +-
 http-backend.c                     |   2 +-
 ls-refs.c                          |   8 +-
 ref-filter.c                       |  61 ++++++--
 ref-filter.h                       |  12 ++
 refs.c                             |  61 ++++----
 refs.h                             |  15 +-
 refs/debug.c                       |   5 +-
 refs/files-backend.c               |   5 +-
 refs/packed-backend.c              | 222 ++++++++++++++++++++++++++---
 refs/refs-internal.h               |   7 +-
 revision.c                         |   4 +-
 revision.h                         |   5 +-
 t/helper/test-reach.c              |   2 +-
 t/helper/test-ref-store.c          |  10 ++
 t/t1419-exclude-refs.sh            | 131 +++++++++++++++++
 t/t6300-for-each-ref.sh            |  35 +++++
 trace2.h                           |   2 +
 trace2/tr2_ctr.c                   |   5 +
 upload-pack.c                      |  43 ++++--
 24 files changed, 559 insertions(+), 105 deletions(-)
 create mode 100755 t/t1419-exclude-refs.sh

-- 
2.40.1.477.g956c797dfc


* [PATCH 01/15] refs.c: rename `ref_filter`
  2023-05-08 21:59 [PATCH 00/15] refs: implement skip lists for packed backend Taylor Blau
@ 2023-05-08 21:59 ` Taylor Blau
  2023-05-08 21:59 ` [PATCH 02/15] ref-filter.h: provide `REF_FILTER_INIT` Taylor Blau
                   ` (17 subsequent siblings)
  18 siblings, 0 replies; 149+ messages in thread
From: Taylor Blau @ 2023-05-08 21:59 UTC (permalink / raw)
  To: git; +Cc: Jeff King, Derrick Stolee, Junio C Hamano

From: Jeff King <peff@peff.net>

The refs machinery has its own implementation of a `ref_filter` (used by
`for-each-ref`), which is distinct from the `ref-filter.h` API (also
used by `for-each-ref`, among other things).

Rename the one within refs.c to more clearly indicate its purpose.

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Taylor Blau <me@ttaylorr.com>
---
 refs.c | 15 ++++++++-------
 1 file changed, 8 insertions(+), 7 deletions(-)

diff --git a/refs.c b/refs.c
index d2a98e1c21..b9b77d2eff 100644
--- a/refs.c
+++ b/refs.c
@@ -375,8 +375,8 @@ char *resolve_refdup(const char *refname, int resolve_flags,
 				   oid, flags);
 }
 
-/* The argument to filter_refs */
-struct ref_filter {
+/* The argument to for_each_filter_refs */
+struct for_each_ref_filter {
 	const char *pattern;
 	const char *prefix;
 	each_ref_fn *fn;
@@ -409,10 +409,11 @@ int ref_exists(const char *refname)
 	return refs_ref_exists(get_main_ref_store(the_repository), refname);
 }
 
-static int filter_refs(const char *refname, const struct object_id *oid,
-			   int flags, void *data)
+static int for_each_filter_refs(const char *refname,
+				const struct object_id *oid,
+				int flags, void *data)
 {
-	struct ref_filter *filter = (struct ref_filter *)data;
+	struct for_each_ref_filter *filter = data;
 
 	if (wildmatch(filter->pattern, refname, 0))
 		return 0;
@@ -569,7 +570,7 @@ int for_each_glob_ref_in(each_ref_fn fn, const char *pattern,
 	const char *prefix, void *cb_data)
 {
 	struct strbuf real_pattern = STRBUF_INIT;
-	struct ref_filter filter;
+	struct for_each_ref_filter filter;
 	int ret;
 
 	if (!prefix && !starts_with(pattern, "refs/"))
@@ -589,7 +590,7 @@ int for_each_glob_ref_in(each_ref_fn fn, const char *pattern,
 	filter.prefix = prefix;
 	filter.fn = fn;
 	filter.cb_data = cb_data;
-	ret = for_each_ref(filter_refs, &filter);
+	ret = for_each_ref(for_each_filter_refs, &filter);
 
 	strbuf_release(&real_pattern);
 	return ret;
-- 
2.40.1.477.g956c797dfc



* [PATCH 02/15] ref-filter.h: provide `REF_FILTER_INIT`
  2023-05-08 21:59 [PATCH 00/15] refs: implement skip lists for packed backend Taylor Blau
  2023-05-08 21:59 ` [PATCH 01/15] refs.c: rename `ref_filter` Taylor Blau
@ 2023-05-08 21:59 ` Taylor Blau
  2023-05-08 21:59 ` [PATCH 03/15] ref-filter: clear reachable list pointers after freeing Taylor Blau
                   ` (16 subsequent siblings)
  18 siblings, 0 replies; 149+ messages in thread
From: Taylor Blau @ 2023-05-08 21:59 UTC (permalink / raw)
  To: git; +Cc: Jeff King, Derrick Stolee, Junio C Hamano

From: Jeff King <peff@peff.net>

Provide a sane initialization value for `struct ref_filter`, which in a
subsequent patch will be used to initialize a new field.

In the meantime, fix a case in test-reach.c where its `ref_filter` is
not even zero-initialized.

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Taylor Blau <me@ttaylorr.com>
---
 builtin/branch.c       | 3 +--
 builtin/for-each-ref.c | 3 +--
 builtin/tag.c          | 3 +--
 ref-filter.h           | 3 +++
 t/helper/test-reach.c  | 2 +-
 5 files changed, 7 insertions(+), 7 deletions(-)

diff --git a/builtin/branch.c b/builtin/branch.c
index 501c47657c..03bb8e414c 100644
--- a/builtin/branch.c
+++ b/builtin/branch.c
@@ -662,7 +662,7 @@ int cmd_branch(int argc, const char **argv, const char *prefix)
 	int reflog = 0, quiet = 0, icase = 0, force = 0,
 	    recurse_submodules_explicit = 0;
 	enum branch_track track;
-	struct ref_filter filter;
+	struct ref_filter filter = REF_FILTER_INIT;
 	static struct ref_sorting *sorting;
 	struct string_list sorting_options = STRING_LIST_INIT_DUP;
 	struct ref_format format = REF_FORMAT_INIT;
@@ -720,7 +720,6 @@ int cmd_branch(int argc, const char **argv, const char *prefix)
 
 	setup_ref_filter_porcelain_msg();
 
-	memset(&filter, 0, sizeof(filter));
 	filter.kind = FILTER_REFS_BRANCHES;
 	filter.abbrev = -1;
 
diff --git a/builtin/for-each-ref.c b/builtin/for-each-ref.c
index 695fc8f4a5..99ccb73518 100644
--- a/builtin/for-each-ref.c
+++ b/builtin/for-each-ref.c
@@ -24,7 +24,7 @@ int cmd_for_each_ref(int argc, const char **argv, const char *prefix)
 	struct string_list sorting_options = STRING_LIST_INIT_DUP;
 	int maxcount = 0, icase = 0, omit_empty = 0;
 	struct ref_array array;
-	struct ref_filter filter;
+	struct ref_filter filter = REF_FILTER_INIT;
 	struct ref_format format = REF_FORMAT_INIT;
 	struct strbuf output = STRBUF_INIT;
 	struct strbuf err = STRBUF_INIT;
@@ -61,7 +61,6 @@ int cmd_for_each_ref(int argc, const char **argv, const char *prefix)
 	};
 
 	memset(&array, 0, sizeof(array));
-	memset(&filter, 0, sizeof(filter));
 
 	format.format = "%(objectname) %(objecttype)\t%(refname)";
 
diff --git a/builtin/tag.c b/builtin/tag.c
index 1850a6a6fd..6b41bb7374 100644
--- a/builtin/tag.c
+++ b/builtin/tag.c
@@ -443,7 +443,7 @@ int cmd_tag(int argc, const char **argv, const char *prefix)
 	struct msg_arg msg = { .buf = STRBUF_INIT };
 	struct ref_transaction *transaction;
 	struct strbuf err = STRBUF_INIT;
-	struct ref_filter filter;
+	struct ref_filter filter = REF_FILTER_INIT;
 	struct ref_sorting *sorting;
 	struct string_list sorting_options = STRING_LIST_INIT_DUP;
 	struct ref_format format = REF_FORMAT_INIT;
@@ -501,7 +501,6 @@ int cmd_tag(int argc, const char **argv, const char *prefix)
 	git_config(git_tag_config, &sorting_options);
 
 	memset(&opt, 0, sizeof(opt));
-	memset(&filter, 0, sizeof(filter));
 	filter.lines = -1;
 	opt.sign = -1;
 
diff --git a/ref-filter.h b/ref-filter.h
index 430701cfb7..a920f73b29 100644
--- a/ref-filter.h
+++ b/ref-filter.h
@@ -92,6 +92,9 @@ struct ref_format {
 	struct string_list bases;
 };
 
+#define REF_FILTER_INIT { \
+	.points_at = OID_ARRAY_INIT, \
+}
 #define REF_FORMAT_INIT {             \
 	.use_color = -1,              \
 	.bases = STRING_LIST_INIT_DUP, \
diff --git a/t/helper/test-reach.c b/t/helper/test-reach.c
index 5b6f217441..ef58f10c2d 100644
--- a/t/helper/test-reach.c
+++ b/t/helper/test-reach.c
@@ -139,7 +139,7 @@ int cmd__reach(int ac, const char **av)
 
 		printf("%s(X,_,_,0,0):%d\n", av[1], can_all_from_reach_with_flag(&X_obj, 2, 4, 0, 0));
 	} else if (!strcmp(av[1], "commit_contains")) {
-		struct ref_filter filter;
+		struct ref_filter filter = REF_FILTER_INIT;
 		struct contains_cache cache;
 		init_contains_cache(&cache);
 
-- 
2.40.1.477.g956c797dfc



* [PATCH 03/15] ref-filter: clear reachable list pointers after freeing
  2023-05-08 21:59 [PATCH 00/15] refs: implement skip lists for packed backend Taylor Blau
  2023-05-08 21:59 ` [PATCH 01/15] refs.c: rename `ref_filter` Taylor Blau
  2023-05-08 21:59 ` [PATCH 02/15] ref-filter.h: provide `REF_FILTER_INIT` Taylor Blau
@ 2023-05-08 21:59 ` Taylor Blau
  2023-05-08 21:59 ` [PATCH 04/15] ref-filter: add ref_filter_clear() Taylor Blau
                   ` (15 subsequent siblings)
  18 siblings, 0 replies; 149+ messages in thread
From: Taylor Blau @ 2023-05-08 21:59 UTC (permalink / raw)
  To: git; +Cc: Jeff King, Derrick Stolee, Junio C Hamano

From: Jeff King <peff@peff.net>

In reach_filter(), we pop all commits from the reachable lists, leaving
them empty. But because we're operating on a list pointer that was
passed by value, the original filter.reachable_from pointer is left
dangling.
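
A minimal, self-contained C sketch of the difference follows. `struct
node` and `pop()` are hypothetical stand-ins for `struct commit_list`
and pop_commit(); only the pointer handling is meant to mirror the
change below.

```c
#include <stdlib.h>

/* Hypothetical singly linked list standing in for struct commit_list. */
struct node {
	int item;
	struct node *next;
};

static struct node *push(struct node *list, int item)
{
	struct node *n = malloc(sizeof(*n));

	if (!n)
		abort();
	n->item = item;
	n->next = list;
	return n;
}

/* Remove and free the head of *list, updating the caller's pointer. */
static int pop(struct node **list)
{
	struct node *head = *list;
	int item = head->item;

	*list = head->next;
	free(head);
	return item;
}

/* Before: drains a by-value copy, so the caller's pointer dangles. */
static void drain_by_value(struct node *list)
{
	while (list)
		pop(&list);
}

/* After: drains through the caller's pointer, leaving it NULL. */
static void drain_by_reference(struct node **list)
{
	while (*list)
		pop(list);
}
```

In drain_by_value(), only the local copy ends up NULL; the caller's
pointer still refers to the freed head node. drain_by_reference() pops
through the caller's pointer, so it is left NULL, which is what passing
`&filter->reachable_from` achieves below.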

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Taylor Blau <me@ttaylorr.com>
---
 ref-filter.c | 14 +++++++-------
 1 file changed, 7 insertions(+), 7 deletions(-)

diff --git a/ref-filter.c b/ref-filter.c
index 10aab14f03..b1d5022a51 100644
--- a/ref-filter.c
+++ b/ref-filter.c
@@ -2416,13 +2416,13 @@ void ref_array_clear(struct ref_array *array)
 #define EXCLUDE_REACHED 0
 #define INCLUDE_REACHED 1
 static void reach_filter(struct ref_array *array,
-			 struct commit_list *check_reachable,
+			 struct commit_list **check_reachable,
 			 int include_reached)
 {
 	int i, old_nr;
 	struct commit **to_clear;
 
-	if (!check_reachable)
+	if (!*check_reachable)
 		return;
 
 	CALLOC_ARRAY(to_clear, array->nr);
@@ -2432,7 +2432,7 @@ static void reach_filter(struct ref_array *array,
 	}
 
 	tips_reachable_from_bases(the_repository,
-				  check_reachable,
+				  *check_reachable,
 				  to_clear, array->nr,
 				  UNINTERESTING);
 
@@ -2453,8 +2453,8 @@ static void reach_filter(struct ref_array *array,
 
 	clear_commit_marks_many(old_nr, to_clear, ALL_REV_FLAGS);
 
-	while (check_reachable) {
-		struct commit *merge_commit = pop_commit(&check_reachable);
+	while (*check_reachable) {
+		struct commit *merge_commit = pop_commit(check_reachable);
 		clear_commit_marks(merge_commit, ALL_REV_FLAGS);
 	}
 
@@ -2551,8 +2551,8 @@ int filter_refs(struct ref_array *array, struct ref_filter *filter, unsigned int
 	clear_contains_cache(&ref_cbdata.no_contains_cache);
 
 	/*  Filters that need revision walking */
-	reach_filter(array, filter->reachable_from, INCLUDE_REACHED);
-	reach_filter(array, filter->unreachable_from, EXCLUDE_REACHED);
+	reach_filter(array, &filter->reachable_from, INCLUDE_REACHED);
+	reach_filter(array, &filter->unreachable_from, EXCLUDE_REACHED);
 
 	save_commit_buffer = save_commit_buffer_orig;
 	return ret;
-- 
2.40.1.477.g956c797dfc



* [PATCH 04/15] ref-filter: add ref_filter_clear()
  2023-05-08 21:59 [PATCH 00/15] refs: implement skip lists for packed backend Taylor Blau
                   ` (2 preceding siblings ...)
  2023-05-08 21:59 ` [PATCH 03/15] ref-filter: clear reachable list pointers after freeing Taylor Blau
@ 2023-05-08 21:59 ` Taylor Blau
  2023-05-08 22:29   ` Junio C Hamano
  2023-05-09 15:14   ` Patrick Steinhardt
  2023-05-08 21:59 ` [PATCH 05/15] ref-filter.c: parameterize match functions over patterns Taylor Blau
                   ` (14 subsequent siblings)
  18 siblings, 2 replies; 149+ messages in thread
From: Taylor Blau @ 2023-05-08 21:59 UTC (permalink / raw)
  To: git; +Cc: Jeff King, Derrick Stolee, Junio C Hamano

From: Jeff King <peff@peff.net>

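Add ref_filter_clear(), which releases the lists and arrays held by a
`struct ref_filter` and then resets it to its initial state via
ref_filter_init(). Previously, branch and tag did not bother to clean
up the filter at all, and for-each-ref only freed a few of its fields.
So this probably plugs some leaks, but I haven't verified that yet.

Note that the reachable_from and unreachable_from lists should already
be emptied as they are used (by reach_filter()), so freeing them here
only covers cases where we bail out before running the reachability
check.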
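
The init/clear pair follows a common C pattern, sketched here with a
hypothetical `struct filter` and `FILTER_INIT` rather than the real
ref-filter types (the `.lines = -1` default only illustrates a non-zero
initial value; it is not taken from REF_FILTER_INIT).

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical growable array, loosely modeled on oid_array. */
struct int_array {
	int *v;
	size_t nr, alloc;
};
#define INT_ARRAY_INIT { 0 }

/* Hypothetical filter with a field that needs a non-zero default. */
struct filter {
	struct int_array points_at;
	int lines;
};
#define FILTER_INIT { .points_at = INT_ARRAY_INIT, .lines = -1 }

/* Copy the static initializer into an existing struct. */
static void filter_init(struct filter *f)
{
	struct filter blank = FILTER_INIT;
	memcpy(f, &blank, sizeof(blank));
}

/* Free owned memory, then reset so the struct can be reused or cleared again. */
static void filter_clear(struct filter *f)
{
	free(f->points_at.v);
	filter_init(f);
}
```

Resetting via filter_init() at the end of filter_clear() leaves no
dangling pointers behind, so clearing twice, or reusing the struct
afterwards, is safe.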
---
 builtin/branch.c       |  1 +
 builtin/for-each-ref.c |  3 +--
 builtin/tag.c          |  1 +
 ref-filter.c           | 16 ++++++++++++++++
 ref-filter.h           |  3 +++
 5 files changed, 22 insertions(+), 2 deletions(-)

diff --git a/builtin/branch.c b/builtin/branch.c
index 03bb8e414c..c201f0cb0b 100644
--- a/builtin/branch.c
+++ b/builtin/branch.c
@@ -813,6 +813,7 @@ int cmd_branch(int argc, const char **argv, const char *prefix)
 		print_columns(&output, colopts, NULL);
 		string_list_clear(&output, 0);
 		ref_sorting_release(sorting);
+		ref_filter_clear(&filter);
 		return 0;
 	} else if (edit_description) {
 		const char *branch_name;
diff --git a/builtin/for-each-ref.c b/builtin/for-each-ref.c
index 99ccb73518..c01fa6fefe 100644
--- a/builtin/for-each-ref.c
+++ b/builtin/for-each-ref.c
@@ -120,8 +120,7 @@ int cmd_for_each_ref(int argc, const char **argv, const char *prefix)
 	strbuf_release(&err);
 	strbuf_release(&output);
 	ref_array_clear(&array);
-	free_commit_list(filter.with_commit);
-	free_commit_list(filter.no_commit);
+	ref_filter_clear(&filter);
 	ref_sorting_release(sorting);
 	strvec_clear(&vec);
 	return 0;
diff --git a/builtin/tag.c b/builtin/tag.c
index 6b41bb7374..aab5e693fe 100644
--- a/builtin/tag.c
+++ b/builtin/tag.c
@@ -645,6 +645,7 @@ int cmd_tag(int argc, const char **argv, const char *prefix)
 
 cleanup:
 	ref_sorting_release(sorting);
+	ref_filter_clear(&filter);
 	strbuf_release(&buf);
 	strbuf_release(&ref);
 	strbuf_release(&reflog_msg);
diff --git a/ref-filter.c b/ref-filter.c
index b1d5022a51..9ea92b9637 100644
--- a/ref-filter.c
+++ b/ref-filter.c
@@ -2864,3 +2864,19 @@ int parse_opt_merge_filter(const struct option *opt, const char *arg, int unset)
 
 	return 0;
 }
+
+void ref_filter_init(struct ref_filter *filter)
+{
+	struct ref_filter blank = REF_FILTER_INIT;
+	memcpy(filter, &blank, sizeof(blank));
+}
+
+void ref_filter_clear(struct ref_filter *filter)
+{
+	oid_array_clear(&filter->points_at);
+	free_commit_list(filter->with_commit);
+	free_commit_list(filter->no_commit);
+	free_commit_list(filter->reachable_from);
+	free_commit_list(filter->unreachable_from);
+	ref_filter_init(filter);
+}
diff --git a/ref-filter.h b/ref-filter.h
index a920f73b29..160b807224 100644
--- a/ref-filter.h
+++ b/ref-filter.h
@@ -170,4 +170,7 @@ void filter_ahead_behind(struct repository *r,
 			 struct ref_format *format,
 			 struct ref_array *array);
 
+void ref_filter_init(struct ref_filter *filter);
+void ref_filter_clear(struct ref_filter *filter);
+
 #endif /*  REF_FILTER_H  */
-- 
2.40.1.477.g956c797dfc



* [PATCH 05/15] ref-filter.c: parameterize match functions over patterns
  2023-05-08 21:59 [PATCH 00/15] refs: implement skip lists for packed backend Taylor Blau
                   ` (3 preceding siblings ...)
  2023-05-08 21:59 ` [PATCH 04/15] ref-filter: add ref_filter_clear() Taylor Blau
@ 2023-05-08 21:59 ` Taylor Blau
  2023-05-08 22:36   ` Junio C Hamano
  2023-05-08 21:59 ` [PATCH 06/15] builtin/for-each-ref.c: add `--exclude` option Taylor Blau
                   ` (13 subsequent siblings)
  18 siblings, 1 reply; 149+ messages in thread
From: Taylor Blau @ 2023-05-08 21:59 UTC (permalink / raw)
  To: git; +Cc: Jeff King, Derrick Stolee, Junio C Hamano

From: Jeff King <peff@peff.net>

`match_pattern()` and `match_name_as_path()` both take a `struct
ref_filter *`, and then set up a local variable (`patterns` and
`pattern`, respectively) pointing at `filter->name_patterns`.

The subsequent patch will add a new array of patterns to match over (the
excluded patterns, via a new `git for-each-ref --exclude` option),
treating the return value of these functions differently depending on
which patterns are being used to match.

Tweak `match_pattern()` and `match_name_as_path()` to take an array of
patterns to prepare for passing either in.

Co-authored-by: Taylor Blau <me@ttaylorr.com>
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Taylor Blau <me@ttaylorr.com>
---
 ref-filter.c | 14 ++++++++------
 1 file changed, 8 insertions(+), 6 deletions(-)

diff --git a/ref-filter.c b/ref-filter.c
index 9ea92b9637..6c5eed144f 100644
--- a/ref-filter.c
+++ b/ref-filter.c
@@ -2102,9 +2102,10 @@ static int get_ref_atom_value(struct ref_array_item *ref, int atom,
  * matches a pattern "refs/heads/mas") or a wildcard (e.g. the same ref
  * matches "refs/heads/mas*", too).
  */
-static int match_pattern(const struct ref_filter *filter, const char *refname)
+static int match_pattern(const struct ref_filter *filter,
+			 const char **patterns,
+			 const char *refname)
 {
-	const char **patterns = filter->name_patterns;
 	unsigned flags = 0;
 
 	if (filter->ignore_case)
@@ -2132,9 +2133,10 @@ static int match_pattern(const struct ref_filter *filter, const char *refname)
  * matches a pattern "refs/heads/" but not "refs/heads/m") or a
  * wildcard (e.g. the same ref matches "refs/heads/m*", too).
  */
-static int match_name_as_path(const struct ref_filter *filter, const char *refname)
+static int match_name_as_path(const struct ref_filter *filter,
+			      const char **pattern,
+			      const char *refname)
 {
-	const char **pattern = filter->name_patterns;
 	int namelen = strlen(refname);
 	unsigned flags = WM_PATHNAME;
 
@@ -2163,8 +2165,8 @@ static int filter_pattern_match(struct ref_filter *filter, const char *refname)
 	if (!*filter->name_patterns)
 		return 1; /* No pattern always matches */
 	if (filter->match_as_path)
-		return match_name_as_path(filter, refname);
-	return match_pattern(filter, refname);
+		return match_name_as_path(filter, filter->name_patterns, refname);
+	return match_pattern(filter, filter->name_patterns, refname);
 }
 
 /*
-- 
2.40.1.477.g956c797dfc



* [PATCH 06/15] builtin/for-each-ref.c: add `--exclude` option
  2023-05-08 21:59 [PATCH 00/15] refs: implement skip lists for packed backend Taylor Blau
                   ` (4 preceding siblings ...)
  2023-05-08 21:59 ` [PATCH 05/15] ref-filter.c: parameterize match functions over patterns Taylor Blau
@ 2023-05-08 21:59 ` Taylor Blau
  2023-05-08 23:22   ` Junio C Hamano
  2023-05-08 22:00 ` [PATCH 07/15] refs: plumb `exclude_patterns` argument throughout Taylor Blau
                   ` (12 subsequent siblings)
  18 siblings, 1 reply; 149+ messages in thread
From: Taylor Blau @ 2023-05-08 21:59 UTC (permalink / raw)
  To: git; +Cc: Jeff King, Derrick Stolee, Junio C Hamano

When using `for-each-ref`, it is sometimes convenient for the caller to
be able to exclude certain references.

For example, if there are many `refs/__hidden__/*` references, the
caller may want to emit all references *except* the hidden ones.
Currently, the only way to do this is to post-process the output, like:

    $ git for-each-ref --format='%(refname)' | grep -v '^refs/__hidden__/'

This is doable, but requires processing a potentially large number of
references.

Teach `git for-each-ref` a new `--exclude=<pattern>` option, which
excludes references from the results if they match one or more excluded
patterns.

This patch provides a naive implementation where the `ref_filter` still
sees all references (including ones that it will discard) and is left to
check whether each reference matches any excluded pattern(s) before
emitting them.
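
The per-ref decision in this naive implementation can be sketched as
follows. This is a simplified approximation, not the ref-filter code:
match_any_as_path() and want_ref() are hypothetical names, POSIX
fnmatch() with FNM_PATHNAME stands in for wildmatch() with WM_PATHNAME,
and case folding is ignored. The prefix rule is meant to mirror
match_name_as_path(), which filter_exclude_match() below uses when
`filter->match_as_path` is set.

```c
#include <fnmatch.h>
#include <string.h>

/*
 * Approximate path-style matching over a NULL-terminated pattern list:
 * a pattern matches if it is a prefix of the refname ending at a '/'
 * boundary (or the whole name), or if it glob-matches the refname.
 */
static int match_any_as_path(const char **patterns, const char *refname)
{
	size_t namelen = strlen(refname);

	for (; *patterns; patterns++) {
		const char *p = *patterns;
		size_t plen = strlen(p);

		if (plen && plen <= namelen && !strncmp(refname, p, plen) &&
		    (refname[plen] == '\0' || refname[plen] == '/' ||
		     p[plen - 1] == '/'))
			return 1;
		if (!fnmatch(p, refname, FNM_PATHNAME))
			return 1;
	}
	return 0;
}

/* Keep a ref only if it matches an include pattern and no exclude pattern. */
static int want_ref(const char **include, const char **exclude,
		    const char *refname)
{
	if (*include && !match_any_as_path(include, refname))
		return 0; /* include patterns given, but none matched */
	if (match_any_as_path(exclude, refname))
		return 0; /* an exclude pattern matched */
	return 1;
}
```

With `refs/tags/` as the only include pattern, `--exclude=refs/tags/foo`
drops `refs/tags/foo/one` (a prefix ending at a `/` boundary) but keeps
`refs/tags/foobar`, while `--exclude='refs/tags/foo/t*'` drops only
`foo/two` and `foo/three`, in line with the t6300 tests below.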

By culling out references we know the caller doesn't care about, we can
avoid allocating memory for their storage, as well as spending time
sorting the output (among other things). Even the naive implementation
provides a significant speed-up on a modified copy of linux.git (that
has a hidden ref pointing at each commit):

    $ hyperfine \
      'git.compile for-each-ref --format="%(objectname) %(refname)" | grep -vE "[0-9a-f]{40} refs/pull/"' \
      'git.compile for-each-ref --format="%(objectname) %(refname)" --exclude refs/pull/'
    Benchmark 1: git.compile for-each-ref --format="%(objectname) %(refname)" | grep -vE "[0-9a-f]{40} refs/pull/"
      Time (mean ± σ):     820.1 ms ±   2.0 ms    [User: 703.7 ms, System: 152.0 ms]
      Range (min … max):   817.7 ms … 823.3 ms    10 runs

    Benchmark 2: git.compile for-each-ref --format="%(objectname) %(refname)" --exclude refs/pull/
      Time (mean ± σ):     106.6 ms ±   1.1 ms    [User: 99.4 ms, System: 7.1 ms]
      Range (min … max):   104.7 ms … 109.1 ms    27 runs

    Summary
      'git.compile for-each-ref --format="%(objectname) %(refname)" --exclude refs/pull/' ran
        7.69 ± 0.08 times faster than 'git.compile for-each-ref --format="%(objectname) %(refname)" | grep -vE "[0-9a-f]{40} refs/pull/"'

Subsequent patches will improve on this by avoiding visiting excluded
sections of the `packed-refs` file in certain cases.

Co-authored-by: Jeff King <peff@peff.net>
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Taylor Blau <me@ttaylorr.com>
---
 Documentation/git-for-each-ref.txt |  6 +++++
 builtin/for-each-ref.c             |  2 ++
 ref-filter.c                       | 13 +++++++++++
 ref-filter.h                       |  6 +++++
 t/t6300-for-each-ref.sh            | 35 ++++++++++++++++++++++++++++++
 5 files changed, 62 insertions(+)

diff --git a/Documentation/git-for-each-ref.txt b/Documentation/git-for-each-ref.txt
index 1e215d4e73..5743eb5def 100644
--- a/Documentation/git-for-each-ref.txt
+++ b/Documentation/git-for-each-ref.txt
@@ -14,6 +14,7 @@ SYNOPSIS
 		   [--points-at=<object>]
 		   [--merged[=<object>]] [--no-merged[=<object>]]
 		   [--contains[=<object>]] [--no-contains[=<object>]]
+		   [--exclude=<pattern> ...]
 
 DESCRIPTION
 -----------
@@ -102,6 +103,11 @@ OPTIONS
 	Do not print a newline after formatted refs where the format expands
 	to the empty string.
 
+--exclude=<pattern>::
+	If one or more patterns are given, only refs which do not match
+	any excluded pattern(s) are shown. Matching is done using the
+	same rules as `<pattern>` above.
+
 FIELD NAMES
 -----------
 
diff --git a/builtin/for-each-ref.c b/builtin/for-each-ref.c
index c01fa6fefe..449da61e11 100644
--- a/builtin/for-each-ref.c
+++ b/builtin/for-each-ref.c
@@ -14,6 +14,7 @@ static char const * const for_each_ref_usage[] = {
 	N_("git for-each-ref [--points-at <object>]"),
 	N_("git for-each-ref [--merged [<commit>]] [--no-merged [<commit>]]"),
 	N_("git for-each-ref [--contains [<commit>]] [--no-contains [<commit>]]"),
+	N_("git for-each-ref [--exclude=<pattern> ...]"),
 	NULL
 };
 
@@ -47,6 +48,7 @@ int cmd_for_each_ref(int argc, const char **argv, const char *prefix)
 		OPT_INTEGER( 0 , "count", &maxcount, N_("show only <n> matched refs")),
 		OPT_STRING(  0 , "format", &format.format, N_("format"), N_("format to use for the output")),
 		OPT__COLOR(&format.use_color, N_("respect format colors")),
+		OPT_REF_FILTER_EXCLUDE(&filter),
 		OPT_REF_SORT(&sorting_options),
 		OPT_CALLBACK(0, "points-at", &filter.points_at,
 			     N_("object"), N_("print only refs which points at the given object"),
diff --git a/ref-filter.c b/ref-filter.c
index 6c5eed144f..93dc9b331f 100644
--- a/ref-filter.c
+++ b/ref-filter.c
@@ -2169,6 +2169,15 @@ static int filter_pattern_match(struct ref_filter *filter, const char *refname)
 	return match_pattern(filter, filter->name_patterns, refname);
 }
 
+static int filter_exclude_match(struct ref_filter *filter, const char *refname)
+{
+	if (!filter->exclude.nr)
+		return 0;
+	if (filter->match_as_path)
+		return match_name_as_path(filter, filter->exclude.v, refname);
+	return match_pattern(filter, filter->exclude.v, refname);
+}
+
 /*
  * This is the same as for_each_fullref_in(), but it tries to iterate
  * only over the patterns we'll care about. Note that it _doesn't_ do a full
@@ -2336,6 +2345,9 @@ static int ref_filter_handler(const char *refname, const struct object_id *oid,
 	if (!filter_pattern_match(filter, refname))
 		return 0;
 
+	if (filter_exclude_match(filter, refname))
+		return 0;
+
 	if (filter->points_at.nr && !match_points_at(&filter->points_at, oid, refname))
 		return 0;
 
@@ -2875,6 +2887,7 @@ void ref_filter_init(struct ref_filter *filter)
 
 void ref_filter_clear(struct ref_filter *filter)
 {
+	strvec_clear(&filter->exclude);
 	oid_array_clear(&filter->points_at);
 	free_commit_list(filter->with_commit);
 	free_commit_list(filter->no_commit);
diff --git a/ref-filter.h b/ref-filter.h
index 160b807224..1524bc463a 100644
--- a/ref-filter.h
+++ b/ref-filter.h
@@ -6,6 +6,7 @@
 #include "refs.h"
 #include "commit.h"
 #include "string-list.h"
+#include "strvec.h"
 
 /* Quoting styles */
 #define QUOTE_NONE 0
@@ -59,6 +60,7 @@ struct ref_array {
 
 struct ref_filter {
 	const char **name_patterns;
+	struct strvec exclude;
 	struct oid_array points_at;
 	struct commit_list *with_commit;
 	struct commit_list *no_commit;
@@ -94,6 +96,7 @@ struct ref_format {
 
 #define REF_FILTER_INIT { \
 	.points_at = OID_ARRAY_INIT, \
+	.exclude = STRVEC_INIT, \
 }
 #define REF_FORMAT_INIT {             \
 	.use_color = -1,              \
@@ -112,6 +115,9 @@ struct ref_format {
 #define OPT_REF_SORT(var) \
 	OPT_STRING_LIST(0, "sort", (var), \
 			N_("key"), N_("field name to sort on"))
+#define OPT_REF_FILTER_EXCLUDE(var) \
+	OPT_STRVEC(0, "exclude", &(var)->exclude, \
+		   N_("pattern"), N_("exclude refs which match pattern"))
 
 /*
  * API for filtering a set of refs. Based on the type of refs the user
diff --git a/t/t6300-for-each-ref.sh b/t/t6300-for-each-ref.sh
index 5c00607608..7e8d578522 100755
--- a/t/t6300-for-each-ref.sh
+++ b/t/t6300-for-each-ref.sh
@@ -447,6 +447,41 @@ test_expect_success 'exercise glob patterns with prefixes' '
 	test_cmp expected actual
 '
 
+cat >expected <<\EOF
+refs/tags/bar
+refs/tags/baz
+refs/tags/testtag
+EOF
+
+test_expect_success 'exercise patterns with prefix exclusions' '
+	for tag in foo/one foo/two foo/three bar baz
+	do
+		git tag "$tag" || return 1
+	done &&
+	test_when_finished "git tag -d foo/one foo/two foo/three bar baz" &&
+	git for-each-ref --format="%(refname)" \
+		refs/tags/ --exclude=refs/tags/foo >actual &&
+	test_cmp expected actual
+'
+
+cat >expected <<\EOF
+refs/tags/bar
+refs/tags/baz
+refs/tags/foo/one
+refs/tags/testtag
+EOF
+
+test_expect_success 'exercise patterns with pattern exclusions' '
+	for tag in foo/one foo/two foo/three bar baz
+	do
+		git tag "$tag" || return 1
+	done &&
+	test_when_finished "git tag -d foo/one foo/two foo/three bar baz" &&
+	git for-each-ref --format="%(refname)" \
+		refs/tags/ --exclude="refs/tags/foo/t*" >actual &&
+	test_cmp expected actual
+'
+
 cat >expected <<\EOF
 'refs/heads/main'
 'refs/remotes/origin/main'
-- 
2.40.1.477.g956c797dfc



* [PATCH 07/15] refs: plumb `exclude_patterns` argument throughout
  2023-05-08 21:59 [PATCH 00/15] refs: implement skip lists for packed backend Taylor Blau
                   ` (5 preceding siblings ...)
  2023-05-08 21:59 ` [PATCH 06/15] builtin/for-each-ref.c: add `--exclude` option Taylor Blau
@ 2023-05-08 22:00 ` Taylor Blau
  2023-05-09 15:14   ` Patrick Steinhardt
  2023-05-08 22:00 ` [PATCH 08/15] refs/packed-backend.c: refactor `find_reference_location()` Taylor Blau
                   ` (11 subsequent siblings)
  18 siblings, 1 reply; 149+ messages in thread
From: Taylor Blau @ 2023-05-08 22:00 UTC (permalink / raw)
  To: git; +Cc: Jeff King, Derrick Stolee, Junio C Hamano

The subsequent patch will want to access an optional `exclude_patterns`
array within refs/packed-backend.c. To do so, the refs subsystem needs
to pass this value through several layers of its API.

Prepare for a future patch by introducing this plumbing now, passing
NULLs at top-level APIs in order to make that patch less noisy and more
easily readable.

Signed-off-by: Taylor Blau <me@ttaylorr.com>
---
 ls-refs.c             |  2 +-
 ref-filter.c          |  5 +++--
 refs.c                | 32 +++++++++++++++++++-------------
 refs.h                |  8 +++++++-
 refs/debug.c          |  5 +++--
 refs/files-backend.c  |  5 +++--
 refs/packed-backend.c |  5 +++--
 refs/refs-internal.h  |  7 ++++---
 revision.c            |  2 +-
 9 files changed, 44 insertions(+), 27 deletions(-)

diff --git a/ls-refs.c b/ls-refs.c
index b9f3e08ec3..c7ad39611a 100644
--- a/ls-refs.c
+++ b/ls-refs.c
@@ -192,7 +192,7 @@ int ls_refs(struct repository *r, struct packet_reader *request)
 		strvec_push(&data.prefixes, "");
 	refs_for_each_fullref_in_prefixes(get_main_ref_store(r),
 					  get_git_namespace(), data.prefixes.v,
-					  send_ref, &data);
+					  NULL, send_ref, &data);
 	packet_fflush(stdout);
 	strvec_clear(&data.prefixes);
 	strbuf_release(&data.buf);
diff --git a/ref-filter.c b/ref-filter.c
index 93dc9b331f..c8ced1104b 100644
--- a/ref-filter.c
+++ b/ref-filter.c
@@ -2207,12 +2207,13 @@ static int for_each_fullref_in_pattern(struct ref_filter *filter,
 
 	if (!filter->name_patterns[0]) {
 		/* no patterns; we have to look at everything */
-		return for_each_fullref_in("", cb, cb_data);
+		return refs_for_each_fullref_in(get_main_ref_store(the_repository),
+						 "", NULL, cb, cb_data);
 	}
 
 	return refs_for_each_fullref_in_prefixes(get_main_ref_store(the_repository),
 						 NULL, filter->name_patterns,
-						 cb, cb_data);
+						 NULL, cb, cb_data);
 }
 
 /*
diff --git a/refs.c b/refs.c
index b9b77d2eff..538bde644e 100644
--- a/refs.c
+++ b/refs.c
@@ -1526,7 +1526,9 @@ int head_ref(each_ref_fn fn, void *cb_data)
 
 struct ref_iterator *refs_ref_iterator_begin(
 		struct ref_store *refs,
-		const char *prefix, int trim,
+		const char *prefix,
+		const char **exclude_patterns,
+		int trim,
 		enum do_for_each_ref_flags flags)
 {
 	struct ref_iterator *iter;
@@ -1542,8 +1544,7 @@ struct ref_iterator *refs_ref_iterator_begin(
 		}
 	}
 
-	iter = refs->be->iterator_begin(refs, prefix, flags);
-
+	iter = refs->be->iterator_begin(refs, prefix, exclude_patterns, flags);
 	/*
 	 * `iterator_begin()` already takes care of prefix, but we
 	 * might need to do some trimming:
@@ -1577,7 +1578,7 @@ static int do_for_each_repo_ref(struct repository *r, const char *prefix,
 	if (!refs)
 		return 0;
 
-	iter = refs_ref_iterator_begin(refs, prefix, trim, flags);
+	iter = refs_ref_iterator_begin(refs, prefix, NULL, trim, flags);
 
 	return do_for_each_repo_ref_iterator(r, iter, fn, cb_data);
 }
@@ -1599,6 +1600,7 @@ static int do_for_each_ref_helper(struct repository *r,
 }
 
 static int do_for_each_ref(struct ref_store *refs, const char *prefix,
+			   const char **exclude_patterns,
 			   each_ref_fn fn, int trim,
 			   enum do_for_each_ref_flags flags, void *cb_data)
 {
@@ -1608,7 +1610,8 @@ static int do_for_each_ref(struct ref_store *refs, const char *prefix,
 	if (!refs)
 		return 0;
 
-	iter = refs_ref_iterator_begin(refs, prefix, trim, flags);
+	iter = refs_ref_iterator_begin(refs, prefix, exclude_patterns, trim,
+				       flags);
 
 	return do_for_each_repo_ref_iterator(the_repository, iter,
 					do_for_each_ref_helper, &hp);
@@ -1616,7 +1619,7 @@ static int do_for_each_ref(struct ref_store *refs, const char *prefix,
 
 int refs_for_each_ref(struct ref_store *refs, each_ref_fn fn, void *cb_data)
 {
-	return do_for_each_ref(refs, "", fn, 0, 0, cb_data);
+	return do_for_each_ref(refs, "", NULL, fn, 0, 0, cb_data);
 }
 
 int for_each_ref(each_ref_fn fn, void *cb_data)
@@ -1627,7 +1630,7 @@ int for_each_ref(each_ref_fn fn, void *cb_data)
 int refs_for_each_ref_in(struct ref_store *refs, const char *prefix,
 			 each_ref_fn fn, void *cb_data)
 {
-	return do_for_each_ref(refs, prefix, fn, strlen(prefix), 0, cb_data);
+	return do_for_each_ref(refs, prefix, NULL, fn, strlen(prefix), 0, cb_data);
 }
 
 int for_each_ref_in(const char *prefix, each_ref_fn fn, void *cb_data)
@@ -1638,13 +1641,14 @@ int for_each_ref_in(const char *prefix, each_ref_fn fn, void *cb_data)
 int for_each_fullref_in(const char *prefix, each_ref_fn fn, void *cb_data)
 {
 	return do_for_each_ref(get_main_ref_store(the_repository),
-			       prefix, fn, 0, 0, cb_data);
+			       prefix, NULL, fn, 0, 0, cb_data);
 }
 
 int refs_for_each_fullref_in(struct ref_store *refs, const char *prefix,
+			     const char **exclude_patterns,
 			     each_ref_fn fn, void *cb_data)
 {
-	return do_for_each_ref(refs, prefix, fn, 0, 0, cb_data);
+	return do_for_each_ref(refs, prefix, exclude_patterns, fn, 0, 0, cb_data);
 }
 
 int for_each_replace_ref(struct repository *r, each_repo_ref_fn fn, void *cb_data)
@@ -1661,14 +1665,14 @@ int for_each_namespaced_ref(each_ref_fn fn, void *cb_data)
 	int ret;
 	strbuf_addf(&buf, "%srefs/", get_git_namespace());
 	ret = do_for_each_ref(get_main_ref_store(the_repository),
-			      buf.buf, fn, 0, 0, cb_data);
+			      buf.buf, NULL, fn, 0, 0, cb_data);
 	strbuf_release(&buf);
 	return ret;
 }
 
 int refs_for_each_rawref(struct ref_store *refs, each_ref_fn fn, void *cb_data)
 {
-	return do_for_each_ref(refs, "", fn, 0,
+	return do_for_each_ref(refs, "", NULL, fn, 0,
 			       DO_FOR_EACH_INCLUDE_BROKEN, cb_data);
 }
 
@@ -1738,6 +1742,7 @@ static void find_longest_prefixes(struct string_list *out,
 int refs_for_each_fullref_in_prefixes(struct ref_store *ref_store,
 				      const char *namespace,
 				      const char **patterns,
+				      const char **exclude_patterns,
 				      each_ref_fn fn, void *cb_data)
 {
 	struct string_list prefixes = STRING_LIST_INIT_DUP;
@@ -1753,7 +1758,8 @@ int refs_for_each_fullref_in_prefixes(struct ref_store *ref_store,
 
 	for_each_string_list_item(prefix, &prefixes) {
 		strbuf_addstr(&buf, prefix->string);
-		ret = refs_for_each_fullref_in(ref_store, buf.buf, fn, cb_data);
+		ret = refs_for_each_fullref_in(ref_store, buf.buf,
+					       exclude_patterns, fn, cb_data);
 		if (ret)
 			break;
 		strbuf_setlen(&buf, namespace_len);
@@ -2408,7 +2414,7 @@ int refs_verify_refname_available(struct ref_store *refs,
 	strbuf_addstr(&dirname, refname + dirname.len);
 	strbuf_addch(&dirname, '/');
 
-	iter = refs_ref_iterator_begin(refs, dirname.buf, 0,
+	iter = refs_ref_iterator_begin(refs, dirname.buf, NULL, 0,
 				       DO_FOR_EACH_INCLUDE_BROKEN);
 	while ((ok = ref_iterator_advance(iter)) == ITER_OK) {
 		if (skip &&
diff --git a/refs.h b/refs.h
index 123cfa4424..d672d636cf 100644
--- a/refs.h
+++ b/refs.h
@@ -338,6 +338,7 @@ int for_each_ref(each_ref_fn fn, void *cb_data);
 int for_each_ref_in(const char *prefix, each_ref_fn fn, void *cb_data);
 
 int refs_for_each_fullref_in(struct ref_store *refs, const char *prefix,
+			     const char **exclude_patterns,
 			     each_ref_fn fn, void *cb_data);
 int for_each_fullref_in(const char *prefix, each_ref_fn fn, void *cb_data);
 
@@ -345,10 +346,15 @@ int for_each_fullref_in(const char *prefix, each_ref_fn fn, void *cb_data);
  * iterate all refs in "patterns" by partitioning patterns into disjoint sets
  * and iterating the longest-common prefix of each set.
  *
+ * references matching any pattern in "exclude_patterns" are omitted from the
+ * result set on a best-effort basis.
+ *
  * callers should be prepared to ignore references that they did not ask for.
  */
 int refs_for_each_fullref_in_prefixes(struct ref_store *refs,
-				      const char *namespace, const char **patterns,
+				      const char *namespace,
+				      const char **patterns,
+				      const char **exclude_patterns,
 				      each_ref_fn fn, void *cb_data);
 
 /**
diff --git a/refs/debug.c b/refs/debug.c
index adc34c836f..8131133e99 100644
--- a/refs/debug.c
+++ b/refs/debug.c
@@ -228,11 +228,12 @@ static struct ref_iterator_vtable debug_ref_iterator_vtable = {
 
 static struct ref_iterator *
 debug_ref_iterator_begin(struct ref_store *ref_store, const char *prefix,
-			 unsigned int flags)
+			 const char **exclude_patterns, unsigned int flags)
 {
 	struct debug_ref_store *drefs = (struct debug_ref_store *)ref_store;
 	struct ref_iterator *res =
-		drefs->refs->be->iterator_begin(drefs->refs, prefix, flags);
+		drefs->refs->be->iterator_begin(drefs->refs, prefix,
+						exclude_patterns, flags);
 	struct debug_ref_iterator *diter = xcalloc(1, sizeof(*diter));
 	base_ref_iterator_init(&diter->base, &debug_ref_iterator_vtable, 1);
 	diter->iter = res;
diff --git a/refs/files-backend.c b/refs/files-backend.c
index d0581ee41a..5fae864334 100644
--- a/refs/files-backend.c
+++ b/refs/files-backend.c
@@ -827,7 +827,8 @@ static struct ref_iterator_vtable files_ref_iterator_vtable = {
 
 static struct ref_iterator *files_ref_iterator_begin(
 		struct ref_store *ref_store,
-		const char *prefix, unsigned int flags)
+		const char *prefix, const char **exclude_patterns,
+		unsigned int flags)
 {
 	struct files_ref_store *refs;
 	struct ref_iterator *loose_iter, *packed_iter, *overlay_iter;
@@ -872,7 +873,7 @@ static struct ref_iterator *files_ref_iterator_begin(
 	 * the packed and loose references.
 	 */
 	packed_iter = refs_ref_iterator_begin(
-			refs->packed_ref_store, prefix, 0,
+			refs->packed_ref_store, prefix, exclude_patterns, 0,
 			DO_FOR_EACH_INCLUDE_BROKEN);
 
 	overlay_iter = overlay_ref_iterator_begin(loose_iter, packed_iter);
diff --git a/refs/packed-backend.c b/refs/packed-backend.c
index 34c0c4e20f..e54e78e540 100644
--- a/refs/packed-backend.c
+++ b/refs/packed-backend.c
@@ -923,7 +923,8 @@ static struct ref_iterator_vtable packed_ref_iterator_vtable = {
 
 static struct ref_iterator *packed_ref_iterator_begin(
 		struct ref_store *ref_store,
-		const char *prefix, unsigned int flags)
+		const char *prefix, const char **exclude_patterns,
+		unsigned int flags)
 {
 	struct packed_ref_store *refs;
 	struct snapshot *snapshot;
@@ -1148,7 +1149,7 @@ static int write_with_updates(struct packed_ref_store *refs,
 	 * list of refs is exhausted, set iter to NULL. When the list
 	 * of updates is exhausted, leave i set to updates->nr.
 	 */
-	iter = packed_ref_iterator_begin(&refs->base, "",
+	iter = packed_ref_iterator_begin(&refs->base, "", NULL,
 					 DO_FOR_EACH_INCLUDE_BROKEN);
 	if ((ok = ref_iterator_advance(iter)) != ITER_OK)
 		iter = NULL;
diff --git a/refs/refs-internal.h b/refs/refs-internal.h
index a85d113123..28a11b9d61 100644
--- a/refs/refs-internal.h
+++ b/refs/refs-internal.h
@@ -367,8 +367,8 @@ int is_empty_ref_iterator(struct ref_iterator *ref_iterator);
  */
 struct ref_iterator *refs_ref_iterator_begin(
 		struct ref_store *refs,
-		const char *prefix, int trim,
-		enum do_for_each_ref_flags flags);
+		const char *prefix, const char **exclude_patterns,
+		int trim, enum do_for_each_ref_flags flags);
 
 /*
  * A callback function used to instruct merge_ref_iterator how to
@@ -570,7 +570,8 @@ typedef int copy_ref_fn(struct ref_store *ref_store,
  */
 typedef struct ref_iterator *ref_iterator_begin_fn(
 		struct ref_store *ref_store,
-		const char *prefix, unsigned int flags);
+		const char *prefix, const char **exclude_patterns,
+		unsigned int flags);
 
 /* reflog functions */
 
diff --git a/revision.c b/revision.c
index b33cc1d106..89953592f9 100644
--- a/revision.c
+++ b/revision.c
@@ -2670,7 +2670,7 @@ static int for_each_bisect_ref(struct ref_store *refs, each_ref_fn fn,
 	struct strbuf bisect_refs = STRBUF_INIT;
 	int status;
 	strbuf_addf(&bisect_refs, "refs/bisect/%s", term);
-	status = refs_for_each_fullref_in(refs, bisect_refs.buf, fn, cb_data);
+	status = refs_for_each_fullref_in(refs, bisect_refs.buf, NULL, fn, cb_data);
 	strbuf_release(&bisect_refs);
 	return status;
 }
-- 
2.40.1.477.g956c797dfc


^ permalink raw reply related	[flat|nested] 149+ messages in thread

* [PATCH 08/15] refs/packed-backend.c: refactor `find_reference_location()`
  2023-05-08 21:59 [PATCH 00/15] refs: implement skip lists for packed backend Taylor Blau
                   ` (6 preceding siblings ...)
  2023-05-08 22:00 ` [PATCH 07/15] refs: plumb `exclude_patterns` argument throughout Taylor Blau
@ 2023-05-08 22:00 ` Taylor Blau
  2023-05-08 23:56   ` Junio C Hamano
  2023-05-08 22:00 ` [PATCH 09/15] refs/packed-backend.c: implement skip lists to avoid excluded pattern(s) Taylor Blau
                   ` (10 subsequent siblings)
  18 siblings, 1 reply; 149+ messages in thread
From: Taylor Blau @ 2023-05-08 22:00 UTC (permalink / raw)
  To: git; +Cc: Jeff King, Derrick Stolee, Junio C Hamano

The function `find_reference_location()` performs a binary-search-like
lookup over the contents of a repository's `$GIT_DIR/packed-refs` file.

The search it implements is unlike a standard binary search in that the
records it searches over are not of a fixed width, so each probe must
first locate the boundaries of the record it lands in before comparing
that record against the target.

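Extract the core routine of `find_reference_location()` so that the
following patch can implement a function which finds the first location
in the `packed-refs` file that *doesn't* match the given pattern. To
support this, teach `cmp_record_to_refname()` a `start` flag: when
`refname` is a proper prefix of the record, the record compares greater
than `refname` if `start` is set (so the search stops at the first such
record) and less than it otherwise (so the search moves past the last
one).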

The behavior of `find_reference_location()` is unchanged.
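To illustrate the search described above, here is a minimal, standalone
sketch in C. It is a simplified model, not the code in
refs/packed-backend.c: records are assumed to be bare "refname\n" lines
with no object IDs or peeled lines, and the helper names
(`record_start()`, `record_end()`, `cmp_record()`, `find_location()`)
are hypothetical. The `start` flag mirrors the one added to
`cmp_record_to_refname()` in this patch.

```c
#include <string.h>

/* Walk back from `p` to the first byte of the record containing it. */
static const char *record_start(const char *buf, const char *p)
{
	while (p > buf && p[-1] != '\n')
		p--;
	return p;
}

/* Walk forward from `p` to the byte just past its record's newline. */
static const char *record_end(const char *p, const char *eof)
{
	const char *nl = memchr(p, '\n', eof - p);
	return nl ? nl + 1 : eof;
}

/*
 * Compare the record at `rec` with `name`. If `name` is a proper prefix
 * of the record, treat the record as greater when `start` is set (stop
 * at the first such record) and as smaller otherwise (move past the
 * last one).
 */
static int cmp_record(const char *rec, const char *name, int start)
{
	for (;; rec++, name++) {
		if (*rec == '\n')
			return *name ? -1 : 0;
		if (!*name)
			return start ? 1 : -1;
		if (*rec != *name)
			return (unsigned char)*rec < (unsigned char)*name ? -1 : 1;
	}
}

/* Binary search over variable-width, sorted, newline-terminated records. */
static const char *find_location(const char *buf, const char *eof,
				 const char *name, int start)
{
	const char *lo = buf, *hi = eof;

	while (lo < hi) {
		const char *mid = lo + (hi - lo) / 2;
		const char *rec = record_start(lo, mid); /* snap to a boundary */
		int cmp = cmp_record(rec, name, start);

		if (cmp < 0)
			lo = record_end(mid, hi);
		else if (cmp > 0)
			hi = rec;
		else
			return rec;
	}
	return lo; /* insertion point, possibly `eof` */
}
```

With the buffer "refs/heads/bar/1\nrefs/heads/bar/2\nrefs/heads/baz/1\n",
searching for "refs/heads/bar" returns offset 0 with `start` set and
offset 34 (the first "baz" record) without it; the two results bound
every record under that prefix.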

Co-authored-by: Jeff King <peff@peff.net>
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Taylor Blau <me@ttaylorr.com>
---
 refs/packed-backend.c | 46 +++++++++++++++++++++++++------------------
 1 file changed, 27 insertions(+), 19 deletions(-)

diff --git a/refs/packed-backend.c b/refs/packed-backend.c
index e54e78e540..98f96bf3ee 100644
--- a/refs/packed-backend.c
+++ b/refs/packed-backend.c
@@ -302,7 +302,8 @@ static int cmp_packed_ref_records(const void *v1, const void *v2)
  * Compare a snapshot record at `rec` to the specified NUL-terminated
  * refname.
  */
-static int cmp_record_to_refname(const char *rec, const char *refname)
+static int cmp_record_to_refname(const char *rec, const char *refname,
+				 int start)
 {
 	const char *r1 = rec + the_hash_algo->hexsz + 1;
 	const char *r2 = refname;
@@ -311,7 +312,7 @@ static int cmp_record_to_refname(const char *rec, const char *refname)
 		if (*r1 == '\n')
 			return *r2 ? -1 : 0;
 		if (!*r2)
-			return 1;
+			return start ? 1 : -1;
 		if (*r1 != *r2)
 			return (unsigned char)*r1 < (unsigned char)*r2 ? -1 : +1;
 		r1++;
@@ -526,22 +527,9 @@ static int load_contents(struct snapshot *snapshot)
 	return 1;
 }
 
-/*
- * Find the place in `snapshot->buf` where the start of the record for
- * `refname` starts. If `mustexist` is true and the reference doesn't
- * exist, then return NULL. If `mustexist` is false and the reference
- * doesn't exist, then return the point where that reference would be
- * inserted, or `snapshot->eof` (which might be NULL) if it would be
- * inserted at the end of the file. In the latter mode, `refname`
- * doesn't have to be a proper reference name; for example, one could
- * search for "refs/replace/" to find the start of any replace
- * references.
- *
- * The record is sought using a binary search, so `snapshot->buf` must
- * be sorted.
- */
-static const char *find_reference_location(struct snapshot *snapshot,
-					   const char *refname, int mustexist)
+static const char *find_reference_location_1(struct snapshot *snapshot,
+					     const char *refname, int mustexist,
+					     int start)
 {
 	/*
 	 * This is not *quite* a garden-variety binary search, because
@@ -571,7 +559,7 @@ static const char *find_reference_location(struct snapshot *snapshot,
 
 		mid = lo + (hi - lo) / 2;
 		rec = find_start_of_record(lo, mid);
-		cmp = cmp_record_to_refname(rec, refname);
+		cmp = cmp_record_to_refname(rec, refname, start);
 		if (cmp < 0) {
 			lo = find_end_of_record(mid, hi);
 		} else if (cmp > 0) {
@@ -587,6 +575,26 @@ static const char *find_reference_location(struct snapshot *snapshot,
 		return lo;
 }
 
+/*
+ * Find the place in `snapshot->buf` where the start of the record for
+ * `refname` starts. If `mustexist` is true and the reference doesn't
+ * exist, then return NULL. If `mustexist` is false and the reference
+ * doesn't exist, then return the point where that reference would be
+ * inserted, or `snapshot->eof` (which might be NULL) if it would be
+ * inserted at the end of the file. In the latter mode, `refname`
+ * doesn't have to be a proper reference name; for example, one could
+ * search for "refs/replace/" to find the start of any replace
+ * references.
+ *
+ * The record is sought using a binary search, so `snapshot->buf` must
+ * be sorted.
+ */
+static const char *find_reference_location(struct snapshot *snapshot,
+					   const char *refname, int mustexist)
+{
+	return find_reference_location_1(snapshot, refname, mustexist, 1);
+}
+
 /*
  * Create a newly-allocated `snapshot` of the `packed-refs` file in
  * its current state and return it. The return value will already have
-- 
2.40.1.477.g956c797dfc


^ permalink raw reply related	[flat|nested] 149+ messages in thread

* [PATCH 09/15] refs/packed-backend.c: implement skip lists to avoid excluded pattern(s)
  2023-05-08 21:59 [PATCH 00/15] refs: implement skip lists for packed backend Taylor Blau
                   ` (7 preceding siblings ...)
  2023-05-08 22:00 ` [PATCH 08/15] refs/packed-backend.c: refactor `find_reference_location()` Taylor Blau
@ 2023-05-08 22:00 ` Taylor Blau
  2023-05-09  0:10   ` Chris Torek
                     ` (2 more replies)
  2023-05-08 22:00 ` [PATCH 10/15] refs/packed-backend.c: add trace2 counters for skip list Taylor Blau
                   ` (9 subsequent siblings)
  18 siblings, 3 replies; 149+ messages in thread
From: Taylor Blau @ 2023-05-08 22:00 UTC (permalink / raw)
  To: git; +Cc: Jeff King, Derrick Stolee, Junio C Hamano

When iterating through the `packed-refs` file in order to answer a query
like:

    $ git for-each-ref --exclude=refs/__hidden__

it would be useful to avoid walking over all of the entries in
`refs/__hidden__/*` when possible, since we know that the ref-filter
code is going to throw them away anyway.

In certain circumstances, doing so is possible. The algorithm for doing
so is as follows:

  - For each excluded pattern, find the first record that matches it,
    and the first record after it that *doesn't* match it (i.e. the
    location you'd next want to consider when excluding that pattern).

  - Sort the resulting regions by their starting location within the
    `packed-refs` file.

  - Construct a skip list of regions by combining adjacent and
    overlapping regions from the previous step.

  - When iterating through the `packed-refs` file, if `iter->pos` is
    ever contained in one of the regions from the previous steps,
    advance `iter->pos` past the end of that region, and continue
    enumeration.
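The construction and use of the skip list can be modeled in a few lines
of standalone C. This sketch is an illustration under simplifying
assumptions, not the patch's code: regions are half-open pairs of byte
offsets rather than pointers into the snapshot, and `struct region`,
`build_skip_list()` and `skip_forward()` are hypothetical names. The
coalescing loop compares each region against the last region *kept* so
far (`r[j - 1]`), so an empty or already-merged entry can never absorb a
later region and silently drop it.

```c
#include <stddef.h>
#include <stdlib.h>

/* A half-open region [start, end) of byte offsets to jump over. */
struct region {
	size_t start, end;
};

static int region_cmp(const void *va, const void *vb)
{
	const struct region *a = va, *b = vb;

	if (a->start != b->start)
		return a->start < b->start ? -1 : 1;
	return 0;
}

/*
 * Sort regions by start, drop empty ones, and coalesce adjacent or
 * overlapping ones in place. Returns the new number of regions.
 */
static size_t build_skip_list(struct region *r, size_t nr)
{
	size_t i, j = 0;

	if (!nr)
		return 0;
	qsort(r, nr, sizeof(*r), region_cmp);
	for (i = 0; i < nr; i++) {
		if (r[i].start == r[i].end)
			continue; /* empty: the pattern matched nothing */
		if (j && r[j - 1].end >= r[i].start) {
			/* overlaps or abuts the last kept region: extend it */
			if (r[i].end > r[j - 1].end)
				r[j - 1].end = r[i].end;
		} else {
			r[j++] = r[i]; /* disjoint: start a new region */
		}
	}
	return j;
}

/*
 * If `pos` has reached a skipped region, return the end of that region;
 * otherwise return `pos` unchanged. `*skip_pos` only moves forward, so
 * each region is examined at most once per scan.
 */
static size_t skip_forward(const struct region *r, size_t nr,
			   size_t *skip_pos, size_t pos)
{
	while (*skip_pos < nr) {
		const struct region *cur = &r[*skip_pos];

		if (pos < cur->start)
			break; /* not at the next region yet */
		(*skip_pos)++;
		if (pos < cur->end)
			return cur->end;
	}
	return pos;
}
```

For example, the regions {10,20}, {0,5}, {7,7}, {7,12} and {20,30}
collapse to {0,5} and {7,30}: the empty {7,7} is dropped, and the rest
merge because each overlaps or abuts the region before it.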

Note that this optimization only applies to excluded patterns that
contain no glob meta-characters; patterns that do are ignored here and
left for the caller (e.g. ref-filter) to discard. To see why, consider
the exclusion pattern "refs/foo[a]". To binary-search for the first
record matching it, the best we could do is search for the literal
prefix before the first meta-character, "refs/foo". But this
doesn't work, since the excluded region we'd come up with would include
"refs/foobar", even though it is not excluded.
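A small standalone check makes the meta-character problem concrete. It
uses POSIX `fnmatch()` as a stand-in for Git's own wildmatch code, the
helpers `literal_prefix()` and `starts_with_prefix()` are hypothetical,
and the character set "*?[\\" is assumed to approximate what Git's
`is_glob_special()` treats as special.

```c
#include <fnmatch.h>
#include <string.h>

/* Does `refname` begin with `prefix`? */
static int starts_with_prefix(const char *refname, const char *prefix)
{
	return !strncmp(refname, prefix, strlen(prefix));
}

/*
 * Copy the literal part of `pattern` (everything before its first glob
 * meta-character) into `out`, which must be large enough to hold it.
 */
static void literal_prefix(const char *pattern, char *out)
{
	size_t n = strcspn(pattern, "*?[\\");

	memcpy(out, pattern, n);
	out[n] = '\0';
}
```

The literal prefix "refs/foo" would pull "refs/foobar" into the skipped
region, even though `fnmatch()` reports that "refs/foo[a]" matches only
"refs/fooa" and not "refs/foobar".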

There are a few other gotchas worth considering. First, note that the
skip list is sorted, so once we skip past a region, we never need to
consider it (or any region preceding it) again. The member `skip_pos`
tracks the index of the next region that could still apply.

Second, note that the exclusion list is best-effort, both because loose
references are not skipped and because of the meta-character issue above.

In repositories with a large number of hidden references, the speed-up
can be significant. Tests here are done with a copy of linux.git with a
reference "refs/pull/N" pointing at every commit, as in:

    $ git rev-list HEAD | awk '{ print "create refs/pull/" NR " " $0 }' |
        git update-ref --stdin
    $ git pack-refs --all

In such a repository, it is significantly faster to have `for-each-ref`
skip over the excluded references than to filter them out after the fact:

    $ hyperfine \
      'git for-each-ref --format="%(objectname) %(refname)" | grep -vE "^[0-9a-f]{40} refs/pull/"' \
      'git.compile for-each-ref --format="%(objectname) %(refname)" --exclude="refs/pull"'
    Benchmark 1: git for-each-ref --format="%(objectname) %(refname)" | grep -vE "^[0-9a-f]{40} refs/pull/"
      Time (mean ± σ):     802.7 ms ±   2.1 ms    [User: 691.6 ms, System: 147.0 ms]
      Range (min … max):   800.0 ms … 807.7 ms    10 runs

    Benchmark 2: git.compile for-each-ref --format="%(objectname) %(refname)" --exclude="refs/pull"
      Time (mean ± σ):       4.7 ms ±   0.3 ms    [User: 0.7 ms, System: 4.0 ms]
      Range (min … max):     4.3 ms …   6.7 ms    422 runs

    Summary
      'git.compile for-each-ref --format="%(objectname) %(refname)" --exclude="refs/pull"' ran
      172.03 ± 9.60 times faster than 'git for-each-ref --format="%(objectname) %(refname)" | grep -vE "^[0-9a-f]{40} refs/pull/"'

Using the skip list is fairly straightforward (see the changes to
`refs/packed-backend.c::next_record()`), but constructing the list is
not. To ensure that the construction is correct, add a new suite of
tests in t1419 covering various corner cases (overlapping regions,
partially overlapping regions, adjacent regions, etc.).

Co-authored-by: Jeff King <peff@peff.net>
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Taylor Blau <me@ttaylorr.com>
---
 ref-filter.c              |   5 +-
 refs/packed-backend.c     | 150 +++++++++++++++++++++++++++++++++++++-
 t/helper/test-ref-store.c |  10 +++
 t/t1419-exclude-refs.sh   | 101 +++++++++++++++++++++++++
 4 files changed, 263 insertions(+), 3 deletions(-)
 create mode 100755 t/t1419-exclude-refs.sh

diff --git a/ref-filter.c b/ref-filter.c
index c8ced1104b..56ebd332fa 100644
--- a/ref-filter.c
+++ b/ref-filter.c
@@ -2208,12 +2208,13 @@ static int for_each_fullref_in_pattern(struct ref_filter *filter,
 	if (!filter->name_patterns[0]) {
 		/* no patterns; we have to look at everything */
 		return refs_for_each_fullref_in(get_main_ref_store(the_repository),
-						 "", NULL, cb, cb_data);
+						 "", filter->exclude.v, cb, cb_data);
 	}
 
 	return refs_for_each_fullref_in_prefixes(get_main_ref_store(the_repository),
 						 NULL, filter->name_patterns,
-						 NULL, cb, cb_data);
+						 filter->exclude.v,
+						 cb, cb_data);
 }
 
 /*
diff --git a/refs/packed-backend.c b/refs/packed-backend.c
index 98f96bf3ee..137a4233f6 100644
--- a/refs/packed-backend.c
+++ b/refs/packed-backend.c
@@ -595,6 +595,21 @@ static const char *find_reference_location(struct snapshot *snapshot,
 	return find_reference_location_1(snapshot, refname, mustexist, 1);
 }
 
+/*
+ * Find the place in `snapshot->buf` after the end of the record for
+ * `refname`. In other words, find the location of first thing *after*
+ * `refname`.
+ *
+ * Other semantics are identical to the ones in
+ * `find_reference_location()`.
+ */
+static const char *find_reference_location_end(struct snapshot *snapshot,
+					       const char *refname,
+					       int mustexist)
+{
+	return find_reference_location_1(snapshot, refname, mustexist, 0);
+}
+
 /*
  * Create a newly-allocated `snapshot` of the `packed-refs` file in
  * its current state and return it. The return value will already have
@@ -786,6 +801,13 @@ struct packed_ref_iterator {
 	/* The end of the part of the buffer that will be iterated over: */
 	const char *eof;
 
+	struct skip_list_entry {
+		const char *start;
+		const char *end;
+	} *skip;
+	size_t skip_nr, skip_alloc;
+	size_t skip_pos;
+
 	/* Scratch space for current values: */
 	struct object_id oid, peeled;
 	struct strbuf refname_buf;
@@ -803,14 +825,34 @@ struct packed_ref_iterator {
  */
 static int next_record(struct packed_ref_iterator *iter)
 {
-	const char *p = iter->pos, *eol;
+	const char *p, *eol;
 
 	strbuf_reset(&iter->refname_buf);
 
+	/*
+	 * If iter->pos is contained within a skipped region, jump past
+	 * it.
+	 *
+	 * Note that each skipped region is considered at most once,
+	 * since they are ordered based on their starting position.
+	 */
+	while (iter->skip_pos < iter->skip_nr) {
+		struct skip_list_entry *curr = &iter->skip[iter->skip_pos];
+		if (iter->pos < curr->start)
+			break; /* not to the next jump yet */
+
+		iter->skip_pos++;
+		if (iter->pos < curr->end) {
+			iter->pos = curr->end;
+			break;
+		}
+	}
+
 	if (iter->pos == iter->eof)
 		return ITER_DONE;
 
 	iter->base.flags = REF_ISPACKED;
+	p = iter->pos;
 
 	if (iter->eof - p < the_hash_algo->hexsz + 2 ||
 	    parse_oid_hex(p, &iter->oid, &p) ||
@@ -918,6 +960,7 @@ static int packed_ref_iterator_abort(struct ref_iterator *ref_iterator)
 	int ok = ITER_DONE;
 
 	strbuf_release(&iter->refname_buf);
+	free(iter->skip);
 	release_snapshot(iter->snapshot);
 	base_ref_iterator_free(ref_iterator);
 	return ok;
@@ -929,6 +972,108 @@ static struct ref_iterator_vtable packed_ref_iterator_vtable = {
 	.abort = packed_ref_iterator_abort
 };
 
+static int skip_list_entry_cmp(const void *va, const void *vb)
+{
+	const struct skip_list_entry *a = va;
+	const struct skip_list_entry *b = vb;
+
+	if (a->start < b->start)
+		return -1;
+	if (a->start > b->start)
+		return 1;
+	return 0;
+}
+
+static int has_glob_special(const char *str)
+{
+	const char *p;
+	for (p = str; *p; p++) {
+		if (is_glob_special(*p))
+			return 1;
+	}
+	return 0;
+}
+
+static const char *ptr_max(const char *x, const char *y)
+{
+	if (x > y)
+		return x;
+	return y;
+}
+
+static void populate_excluded_skip_list(struct packed_ref_iterator *iter,
+					struct snapshot *snapshot,
+					const char **excluded_patterns)
+{
+	size_t i, j;
+	const char **pattern;
+
+	if (!excluded_patterns)
+		return;
+
+	for (pattern = excluded_patterns; *pattern; pattern++) {
+		struct skip_list_entry *e;
+
+		/*
+		 * We can't feed any excludes with globs in them to the
+		 * refs machinery.  It only understands prefix matching.
+		 * We likewise can't even feed the string leading up to
+		 * the first meta-character, as something like "foo[a]"
+		 * should not exclude "foobar" (but the prefix "foo"
+		 * would match that and mark it for exclusion).
+		 */
+		if (has_glob_special(*pattern))
+			continue;
+
+		ALLOC_GROW(iter->skip, iter->skip_nr + 1, iter->skip_alloc);
+
+		e = &iter->skip[iter->skip_nr++];
+		e->start = find_reference_location(snapshot, *pattern, 0);
+		e->end = find_reference_location_end(snapshot, *pattern, 0);
+	}
+
+	if (!iter->skip_nr) {
+		/*
+		 * Every entry in exclude_patterns has a meta-character,
+		 * nothing to do here.
+		 */
+		return;
+	}
+
+	QSORT(iter->skip, iter->skip_nr, skip_list_entry_cmp);
+
+	/*
+	 * As an optimization, merge adjacent entries in the skip list
+	 * to jump forwards as far as possible when entering a skipped
+	 * region.
+	 *
+	 * For example, if we have two skipped regions:
+	 *
+	 *	[[A, B], [B, C]]
+	 *
+	 * we want to combine that into a single entry jumping from A to
+	 * C.
+	 */
+	for (i = 1, j = 1; i < iter->skip_nr; i++) {
+		struct skip_list_entry *ours = &iter->skip[i];
+		struct skip_list_entry *prev = &iter->skip[j - 1];
+
+		if (ours->start == ours->end) {
+			/* ignore empty regions (no matching entries) */
+			continue;
+		} else if (prev->end >= ours->start) {
+			/* overlapping regions extend the previous one */
+			prev->end = ptr_max(prev->end, ours->end);
+		} else {
+			/* otherwise, insert a new region */
+			iter->skip[j++] = *ours;
+		}
+	}
+
+	iter->skip_nr = j;
+	iter->skip_pos = 0;
+}
+
 static struct ref_iterator *packed_ref_iterator_begin(
 		struct ref_store *ref_store,
 		const char *prefix, const char **exclude_patterns,
@@ -964,6 +1109,9 @@ static struct ref_iterator *packed_ref_iterator_begin(
 	ref_iterator = &iter->base;
 	base_ref_iterator_init(ref_iterator, &packed_ref_iterator_vtable, 1);
 
+	if (exclude_patterns)
+		populate_excluded_skip_list(iter, snapshot, exclude_patterns);
+
 	iter->snapshot = snapshot;
 	acquire_snapshot(snapshot);
 
diff --git a/t/helper/test-ref-store.c b/t/helper/test-ref-store.c
index 6d8f844e9c..2bff003f7c 100644
--- a/t/helper/test-ref-store.c
+++ b/t/helper/test-ref-store.c
@@ -175,6 +175,15 @@ static int cmd_for_each_ref(struct ref_store *refs, const char **argv)
 	return refs_for_each_ref_in(refs, prefix, each_ref, NULL);
 }
 
+static int cmd_for_each_ref__exclude(struct ref_store *refs, const char **argv)
+{
+	const char *prefix = notnull(*argv++, "prefix");
+	const char **exclude_patterns = argv;
+
+	return refs_for_each_fullref_in(refs, prefix, exclude_patterns, each_ref,
+					NULL);
+}
+
 static int cmd_resolve_ref(struct ref_store *refs, const char **argv)
 {
 	struct object_id oid = *null_oid();
@@ -307,6 +316,7 @@ static struct command commands[] = {
 	{ "delete-refs", cmd_delete_refs },
 	{ "rename-ref", cmd_rename_ref },
 	{ "for-each-ref", cmd_for_each_ref },
+	{ "for-each-ref--exclude", cmd_for_each_ref__exclude },
 	{ "resolve-ref", cmd_resolve_ref },
 	{ "verify-ref", cmd_verify_ref },
 	{ "for-each-reflog", cmd_for_each_reflog },
diff --git a/t/t1419-exclude-refs.sh b/t/t1419-exclude-refs.sh
new file mode 100755
index 0000000000..da5265a5a8
--- /dev/null
+++ b/t/t1419-exclude-refs.sh
@@ -0,0 +1,101 @@
+#!/bin/sh
+
+test_description='test exclude_patterns functionality in main ref store'
+
+GIT_TEST_DEFAULT_INITIAL_BRANCH_NAME=main
+export GIT_TEST_DEFAULT_INITIAL_BRANCH_NAME
+
+TEST_PASSES_SANITIZE_LEAK=true
+. ./test-lib.sh
+
+for_each_ref__exclude () {
+	test-tool ref-store main for-each-ref--exclude "$@" >actual.raw
+	cut -d ' ' -f 2 actual.raw
+}
+
+for_each_ref () {
+	git for-each-ref --format='%(refname)' "$@"
+}
+
+test_expect_success 'setup' '
+	test_commit --no-tag base &&
+	base="$(git rev-parse HEAD)" &&
+
+	for name in foo bar baz quux
+	do
+		for i in 1 2 3
+		do
+			echo "create refs/heads/$name/$i $base" || return 1
+		done || return 1
+	done >in &&
+	echo "delete refs/heads/main" >>in &&
+
+	git update-ref --stdin <in &&
+	git pack-refs --all
+'
+
+test_expect_success 'for_each_ref__exclude(refs/heads/foo/)' '
+	# region in middle
+	for_each_ref__exclude refs/heads refs/heads/foo >actual &&
+	for_each_ref refs/heads/bar refs/heads/baz refs/heads/quux >expect &&
+
+	test_cmp expect actual
+'
+
+test_expect_success 'for_each_ref__exclude(refs/heads/bar/)' '
+	# region at beginning
+	for_each_ref__exclude refs/heads refs/heads/bar >actual &&
+	for_each_ref refs/heads/baz refs/heads/foo refs/heads/quux >expect &&
+
+	test_cmp expect actual
+'
+
+test_expect_success 'for_each_ref__exclude(refs/heads/quux/)' '
+	# region at end
+	for_each_ref__exclude refs/heads refs/heads/quux >actual &&
+	for_each_ref refs/heads/foo refs/heads/bar refs/heads/baz >expect &&
+
+	test_cmp expect actual
+'
+
+test_expect_success 'for_each_ref__exclude(refs/heads/bar/, refs/heads/quux/)' '
+	# disjoint regions
+	for_each_ref__exclude refs/heads refs/heads/bar refs/heads/quux >actual &&
+	for_each_ref refs/heads/baz refs/heads/foo >expect &&
+
+	test_cmp expect actual
+'
+
+test_expect_success 'for_each_ref__exclude(refs/heads/bar/, refs/heads/baz/)' '
+	# adjacent, non-overlapping regions
+	for_each_ref__exclude refs/heads refs/heads/bar refs/heads/baz >actual &&
+	for_each_ref refs/heads/foo refs/heads/quux >expect &&
+
+	test_cmp expect actual
+'
+
+test_expect_success 'for_each_ref__exclude(refs/heads/ba refs/heads/baz/)' '
+	# overlapping region
+	for_each_ref__exclude refs/heads refs/heads/ba refs/heads/baz >actual &&
+	for_each_ref refs/heads/foo refs/heads/quux >expect &&
+
+	test_cmp expect actual
+'
+
+test_expect_success 'for_each_ref__exclude(refs/heads/does/not/exist)' '
+	# empty region
+	for_each_ref__exclude refs/heads refs/heads/does/not/exist >actual &&
+	for_each_ref >expect &&
+
+	test_cmp expect actual
+'
+
+test_expect_success 'for_each_ref__exclude(refs/heads/ba*)' '
+	# discards meta-characters
+	for_each_ref__exclude refs/heads "refs/heads/ba*" >actual &&
+	for_each_ref >expect &&
+
+	test_cmp expect actual
+'
+
+test_done
-- 
2.40.1.477.g956c797dfc


^ permalink raw reply related	[flat|nested] 149+ messages in thread

* [PATCH 10/15] refs/packed-backend.c: add trace2 counters for skip list
  2023-05-08 21:59 [PATCH 00/15] refs: implement skip lists for packed backend Taylor Blau
                   ` (8 preceding siblings ...)
  2023-05-08 22:00 ` [PATCH 09/15] refs/packed-backend.c: implement skip lists to avoid excluded pattern(s) Taylor Blau
@ 2023-05-08 22:00 ` Taylor Blau
  2023-05-08 22:00 ` [PATCH 11/15] revision.h: store hidden refs in a `strvec` Taylor Blau
                   ` (8 subsequent siblings)
  18 siblings, 0 replies; 149+ messages in thread
From: Taylor Blau @ 2023-05-08 22:00 UTC (permalink / raw)
  To: git; +Cc: Jeff King, Derrick Stolee, Junio C Hamano

The previous commit added low-level tests to ensure that the packed-refs
iterator did not enumerate excluded sections of the refspace.

However, there was no guarantee that these sections weren't being
visited, only that they were being suppressed from the output. To harden
these tests, add a trace2 counter which tracks the number of regions
skipped by the packed-refs iterator, and assert on its value.

Suggested-by: Derrick Stolee <derrickstolee@github.com>
Signed-off-by: Taylor Blau <me@ttaylorr.com>
---
 refs/packed-backend.c   |  2 ++
 t/t1419-exclude-refs.sh | 54 ++++++++++++++++++++++++++++-------------
 trace2.h                |  2 ++
 trace2/tr2_ctr.c        |  5 ++++
 4 files changed, 46 insertions(+), 17 deletions(-)

diff --git a/refs/packed-backend.c b/refs/packed-backend.c
index 137a4233f6..ddfa9add14 100644
--- a/refs/packed-backend.c
+++ b/refs/packed-backend.c
@@ -11,6 +11,7 @@
 #include "../chdir-notify.h"
 #include "../wrapper.h"
 #include "../write-or-die.h"
+#include "../trace2.h"
 
 enum mmap_strategy {
 	/*
@@ -844,6 +845,7 @@ static int next_record(struct packed_ref_iterator *iter)
 		iter->skip_pos++;
 		if (iter->pos < curr->end) {
 			iter->pos = curr->end;
+			trace2_counter_add(TRACE2_COUNTER_ID_PACKED_REFS_SKIPS, 1);
 			break;
 		}
 	}
diff --git a/t/t1419-exclude-refs.sh b/t/t1419-exclude-refs.sh
index da5265a5a8..051b5a54ce 100755
--- a/t/t1419-exclude-refs.sh
+++ b/t/t1419-exclude-refs.sh
@@ -9,7 +9,8 @@ TEST_PASSES_SANITIZE_LEAK=true
 . ./test-lib.sh
 
 for_each_ref__exclude () {
-	test-tool ref-store main for-each-ref--exclude "$@" >actual.raw
+	GIT_TRACE2_PERF=1 test-tool ref-store main \
+		for-each-ref--exclude "$@" >actual.raw
 	cut -d ' ' -f 2 actual.raw
 }
 
@@ -17,6 +18,17 @@ for_each_ref () {
 	git for-each-ref --format='%(refname)' "$@"
 }
 
+assert_skips () {
+	local nr="$1"
+	local trace="$2"
+
+	grep -q "name:skips_made value:$nr" "$trace"
+}
+
+assert_no_skips () {
+	! assert_skips ".*" "$1"
+}
+
 test_expect_success 'setup' '
 	test_commit --no-tag base &&
 	base="$(git rev-parse HEAD)" &&
@@ -36,66 +48,74 @@ test_expect_success 'setup' '
 
 test_expect_success 'for_each_ref__exclude(refs/heads/foo/)' '
 	# region in middle
-	for_each_ref__exclude refs/heads refs/heads/foo >actual &&
+	for_each_ref__exclude refs/heads refs/heads/foo >actual 2>perf &&
 	for_each_ref refs/heads/bar refs/heads/baz refs/heads/quux >expect &&
 
-	test_cmp expect actual
+	test_cmp expect actual &&
+	assert_skips 1 perf
 '
 
 test_expect_success 'for_each_ref__exclude(refs/heads/bar/)' '
 	# region at beginning
-	for_each_ref__exclude refs/heads refs/heads/bar >actual &&
+	for_each_ref__exclude refs/heads refs/heads/bar >actual 2>perf &&
 	for_each_ref refs/heads/baz refs/heads/foo refs/heads/quux >expect &&
 
-	test_cmp expect actual
+	test_cmp expect actual &&
+	assert_skips 1 perf
 '
 
 test_expect_success 'for_each_ref__exclude(refs/heads/quux/)' '
 	# region at end
-	for_each_ref__exclude refs/heads refs/heads/quux >actual &&
+	for_each_ref__exclude refs/heads refs/heads/quux >actual 2>perf &&
 	for_each_ref refs/heads/foo refs/heads/bar refs/heads/baz >expect &&
 
-	test_cmp expect actual
+	test_cmp expect actual &&
+	assert_skips 1 perf
 '
 
 test_expect_success 'for_each_ref__exclude(refs/heads/bar/, refs/heads/quux/)' '
 	# disjoint regions
-	for_each_ref__exclude refs/heads refs/heads/bar refs/heads/quux >actual &&
+	for_each_ref__exclude refs/heads refs/heads/bar refs/heads/quux >actual 2>perf &&
 	for_each_ref refs/heads/baz refs/heads/foo >expect &&
 
-	test_cmp expect actual
+	test_cmp expect actual &&
+	assert_skips 2 perf
 '
 
 test_expect_success 'for_each_ref__exclude(refs/heads/bar/, refs/heads/baz/)' '
 	# adjacent, non-overlapping regions
-	for_each_ref__exclude refs/heads refs/heads/bar refs/heads/baz >actual &&
+	for_each_ref__exclude refs/heads refs/heads/bar refs/heads/baz >actual 2>perf &&
 	for_each_ref refs/heads/foo refs/heads/quux >expect &&
 
-	test_cmp expect actual
+	test_cmp expect actual &&
+	assert_skips 1 perf
 '
 
 test_expect_success 'for_each_ref__exclude(refs/heads/ba refs/heads/baz/)' '
 	# overlapping region
-	for_each_ref__exclude refs/heads refs/heads/ba refs/heads/baz >actual &&
+	for_each_ref__exclude refs/heads refs/heads/ba refs/heads/baz >actual 2>perf &&
 	for_each_ref refs/heads/foo refs/heads/quux >expect &&
 
-	test_cmp expect actual
+	test_cmp expect actual &&
+	assert_skips 1 perf
 '
 
 test_expect_success 'for_each_ref__exclude(refs/heads/does/not/exist)' '
 	# empty region
-	for_each_ref__exclude refs/heads refs/heads/does/not/exist >actual &&
+	for_each_ref__exclude refs/heads refs/heads/does/not/exist >actual 2>perf &&
 	for_each_ref >expect &&
 
-	test_cmp expect actual
+	test_cmp expect actual &&
+	assert_no_skips perf
 '
 
 test_expect_success 'for_each_ref__exclude(refs/heads/ba*)' '
 	# discards meta-characters
-	for_each_ref__exclude refs/heads "refs/heads/ba*" >actual &&
+	for_each_ref__exclude refs/heads "refs/heads/ba*" >actual 2>perf &&
 	for_each_ref >expect &&
 
-	test_cmp expect actual
+	test_cmp expect actual &&
+	assert_no_skips perf
 '
 
 test_done
diff --git a/trace2.h b/trace2.h
index 4ced30c0db..6a116f60a9 100644
--- a/trace2.h
+++ b/trace2.h
@@ -551,6 +551,8 @@ enum trace2_counter_id {
 	TRACE2_COUNTER_ID_TEST1 = 0, /* emits summary event only */
 	TRACE2_COUNTER_ID_TEST2,     /* emits summary and thread events */
 
+	TRACE2_COUNTER_ID_PACKED_REFS_SKIPS, /* counts number of skips */
+
 	/* Add additional counter definitions before here. */
 	TRACE2_NUMBER_OF_COUNTERS
 };
diff --git a/trace2/tr2_ctr.c b/trace2/tr2_ctr.c
index b342d3b1a3..f7efbc7646 100644
--- a/trace2/tr2_ctr.c
+++ b/trace2/tr2_ctr.c
@@ -27,6 +27,11 @@ static struct tr2_counter_metadata tr2_counter_metadata[TRACE2_NUMBER_OF_COUNTER
 		.name = "test2",
 		.want_per_thread_events = 1,
 	},
+	[TRACE2_COUNTER_ID_PACKED_REFS_SKIPS] = {
+		.category = "packed-refs",
+		.name = "skips_made",
+		.want_per_thread_events = 0,
+	},
 
 	/* Add additional metadata before here. */
 };
-- 
2.40.1.477.g956c797dfc



* [PATCH 11/15] revision.h: store hidden refs in a `strvec`
  2023-05-08 21:59 [PATCH 00/15] refs: implement skip lists for packed backend Taylor Blau
                   ` (9 preceding siblings ...)
  2023-05-08 22:00 ` [PATCH 10/15] refs/packed-backend.c: add trace2 counters for skip list Taylor Blau
@ 2023-05-08 22:00 ` Taylor Blau
  2023-05-08 22:00 ` [PATCH 12/15] refs/packed-backend.c: ignore complicated hidden refs rules Taylor Blau
                   ` (7 subsequent siblings)
  18 siblings, 0 replies; 149+ messages in thread
From: Taylor Blau @ 2023-05-08 22:00 UTC (permalink / raw)
  To: git; +Cc: Jeff King, Derrick Stolee, Junio C Hamano

In subsequent commits, it will be convenient to have a 'const char **'
of hidden refs (matching `transfer.hiderefs`, `uploadpack.hideRefs`,
etc.), instead of a `string_list`.

Convert spots throughout the tree that store the list of hidden refs
from a `string_list` to a `strvec`.

Note that in `parse_hide_refs_config()` there is an ugly const-cast used
to avoid an extra copy of each value before trimming any trailing slash
characters. This could instead be written as:

    ref = xstrdup(value);
    len = strlen(ref);
    while (len && ref[len - 1] == '/')
            ref[--len] = '\0';
    strvec_push(hide_refs, ref);
    free(ref);

but the double-copy (once when calling `xstrdup()`, and another via
`strvec_push()`) is wasteful.
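
Here is a minimal C sketch of the single-copy approach. It assumes a hypothetical `struct toy_strvec` whose push helper stores a private copy and returns a pointer to it. Git's own `strvec_push()` also returns the stored string, which is what the patch relies on, but this sketch is not its implementation. Trimming happens in the vector's own copy, so each value is allocated once. `v` stays NULL-terminated, so it can be passed around as a `const char **`.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical minimal string vector; not Git's `struct strvec`. */
struct toy_strvec {
	char **v;     /* NULL-terminated once non-empty */
	size_t nr;
	size_t alloc;
};

#define TOY_STRVEC_INIT { NULL, 0, 0 }

/* Append a private copy of `s` and return a pointer to the stored copy. */
static const char *toy_strvec_push(struct toy_strvec *vec, const char *s)
{
	size_t len = strlen(s);
	char *copy = malloc(len + 1);

	if (!copy)
		abort();
	memcpy(copy, s, len + 1);

	/* keep room for the new entry plus the trailing NULL */
	if (vec->nr + 2 > vec->alloc) {
		size_t alloc = vec->alloc ? 2 * vec->alloc : 4;
		char **v = realloc(vec->v, alloc * sizeof(*v));
		if (!v)
			abort();
		vec->v = v;
		vec->alloc = alloc;
	}
	vec->v[vec->nr++] = copy;
	vec->v[vec->nr] = NULL;
	return copy;
}

static void toy_strvec_clear(struct toy_strvec *vec)
{
	for (size_t i = 0; i < vec->nr; i++)
		free(vec->v[i]);
	free(vec->v);
	vec->v = NULL;
	vec->nr = vec->alloc = 0;
}

/* One copy per value: push first, then trim trailing '/' in the stored copy. */
static void push_hide_ref(struct toy_strvec *hide_refs, const char *value)
{
	/* casting away const is safe: the vector owns this buffer */
	char *ref = (char *)toy_strvec_push(hide_refs, value);
	size_t len = strlen(ref);

	while (len && ref[len - 1] == '/')
		ref[--len] = '\0';
}
```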

Signed-off-by: Taylor Blau <me@ttaylorr.com>
---
 builtin/receive-pack.c |  4 ++--
 ls-refs.c              |  6 +++---
 refs.c                 | 11 ++++++-----
 refs.h                 |  4 ++--
 revision.c             |  2 +-
 revision.h             |  5 +++--
 upload-pack.c          | 10 +++++-----
 7 files changed, 22 insertions(+), 20 deletions(-)

diff --git a/builtin/receive-pack.c b/builtin/receive-pack.c
index d22180435c..064df74715 100644
--- a/builtin/receive-pack.c
+++ b/builtin/receive-pack.c
@@ -89,7 +89,7 @@ static struct object_id push_cert_oid;
 static struct signature_check sigcheck;
 static const char *push_cert_nonce;
 static const char *cert_nonce_seed;
-static struct string_list hidden_refs = STRING_LIST_INIT_DUP;
+static struct strvec hidden_refs = STRVEC_INIT;
 
 static const char *NONCE_UNSOLICITED = "UNSOLICITED";
 static const char *NONCE_BAD = "BAD";
@@ -2618,7 +2618,7 @@ int cmd_receive_pack(int argc, const char **argv, const char *prefix)
 		packet_flush(1);
 	oid_array_clear(&shallow);
 	oid_array_clear(&ref);
-	string_list_clear(&hidden_refs, 0);
+	strvec_clear(&hidden_refs);
 	free((void *)push_cert_nonce);
 	return 0;
 }
diff --git a/ls-refs.c b/ls-refs.c
index c7ad39611a..d3d7e13e5a 100644
--- a/ls-refs.c
+++ b/ls-refs.c
@@ -71,7 +71,7 @@ struct ls_refs_data {
 	unsigned symrefs;
 	struct strvec prefixes;
 	struct strbuf buf;
-	struct string_list hidden_refs;
+	struct strvec hidden_refs;
 	unsigned unborn : 1;
 };
 
@@ -154,7 +154,7 @@ int ls_refs(struct repository *r, struct packet_reader *request)
 	memset(&data, 0, sizeof(data));
 	strvec_init(&data.prefixes);
 	strbuf_init(&data.buf, 0);
-	string_list_init_dup(&data.hidden_refs);
+	strvec_init(&data.hidden_refs);
 
 	git_config(ls_refs_config, &data);
 
@@ -196,7 +196,7 @@ int ls_refs(struct repository *r, struct packet_reader *request)
 	packet_fflush(stdout);
 	strvec_clear(&data.prefixes);
 	strbuf_release(&data.buf);
-	string_list_clear(&data.hidden_refs, 0);
+	strvec_clear(&data.hidden_refs);
 	return 0;
 }
 
diff --git a/refs.c b/refs.c
index 538bde644e..ec4d5b9101 100644
--- a/refs.c
+++ b/refs.c
@@ -1427,7 +1427,7 @@ char *shorten_unambiguous_ref(const char *refname, int strict)
 }
 
 int parse_hide_refs_config(const char *var, const char *value, const char *section,
-			   struct string_list *hide_refs)
+			   struct strvec *hide_refs)
 {
 	const char *key;
 	if (!strcmp("transfer.hiderefs", var) ||
@@ -1438,22 +1438,23 @@ int parse_hide_refs_config(const char *var, const char *value, const char *secti
 
 		if (!value)
 			return config_error_nonbool(var);
-		ref = xstrdup(value);
+
+		/* drop const to remove trailing '/' characters */
+		ref = (char *)strvec_push(hide_refs, value);
 		len = strlen(ref);
 		while (len && ref[len - 1] == '/')
 			ref[--len] = '\0';
-		string_list_append_nodup(hide_refs, ref);
 	}
 	return 0;
 }
 
 int ref_is_hidden(const char *refname, const char *refname_full,
-		  const struct string_list *hide_refs)
+		  const struct strvec *hide_refs)
 {
 	int i;
 
 	for (i = hide_refs->nr - 1; i >= 0; i--) {
-		const char *match = hide_refs->items[i].string;
+		const char *match = hide_refs->v[i];
 		const char *subject;
 		int neg = 0;
 		const char *p;
diff --git a/refs.h b/refs.h
index d672d636cf..a7751a1fc9 100644
--- a/refs.h
+++ b/refs.h
@@ -810,7 +810,7 @@ int update_ref(const char *msg, const char *refname,
 	       unsigned int flags, enum action_on_err onerr);
 
 int parse_hide_refs_config(const char *var, const char *value, const char *,
-			   struct string_list *);
+			   struct strvec *);
 
 /*
  * Check whether a ref is hidden. If no namespace is set, both the first and
@@ -820,7 +820,7 @@ int parse_hide_refs_config(const char *var, const char *value, const char *,
  * the ref is outside that namespace, the first parameter is NULL. The second
  * parameter always points to the full ref name.
  */
-int ref_is_hidden(const char *, const char *, const struct string_list *);
+int ref_is_hidden(const char *, const char *, const struct strvec *);
 
 /* Is this a per-worktree ref living in the refs/ namespace? */
 int is_per_worktree_ref(const char *refname);
diff --git a/revision.c b/revision.c
index 89953592f9..7c9367a266 100644
--- a/revision.c
+++ b/revision.c
@@ -1558,7 +1558,7 @@ void init_ref_exclusions(struct ref_exclusions *exclusions)
 void clear_ref_exclusions(struct ref_exclusions *exclusions)
 {
 	string_list_clear(&exclusions->excluded_refs, 0);
-	string_list_clear(&exclusions->hidden_refs, 0);
+	strvec_clear(&exclusions->hidden_refs);
 	exclusions->hidden_refs_configured = 0;
 }
 
diff --git a/revision.h b/revision.h
index e8f6de9684..30b5b5919d 100644
--- a/revision.h
+++ b/revision.h
@@ -9,6 +9,7 @@
 #include "commit-slab-decl.h"
 #include "ident.h"
 #include "list-objects-filter-options.h"
+#include "strvec.h"
 
 /**
  * The revision walking API offers functions to build a list of revisions
@@ -94,7 +95,7 @@ struct ref_exclusions {
 	 * Hidden refs is a list of patterns that is to be hidden via
 	 * `ref_is_hidden()`.
 	 */
-	struct string_list hidden_refs;
+	struct strvec hidden_refs;
 
 	/*
 	 * Indicates whether hidden refs have been configured. This is to
@@ -109,7 +110,7 @@ struct ref_exclusions {
  */
 #define REF_EXCLUSIONS_INIT { \
 	.excluded_refs = STRING_LIST_INIT_DUP, \
-	.hidden_refs = STRING_LIST_INIT_DUP, \
+	.hidden_refs = STRVEC_INIT, \
 }
 
 struct oidset;
diff --git a/upload-pack.c b/upload-pack.c
index 08633dc121..d77d58bdde 100644
--- a/upload-pack.c
+++ b/upload-pack.c
@@ -69,7 +69,7 @@ struct upload_pack_data {
 	struct object_array have_obj;
 	struct oid_array haves;					/* v2 only */
 	struct string_list wanted_refs;				/* v2 only */
-	struct string_list hidden_refs;
+	struct strvec hidden_refs;
 
 	struct object_array shallows;
 	struct string_list deepen_not;
@@ -126,7 +126,7 @@ static void upload_pack_data_init(struct upload_pack_data *data)
 {
 	struct string_list symref = STRING_LIST_INIT_DUP;
 	struct string_list wanted_refs = STRING_LIST_INIT_DUP;
-	struct string_list hidden_refs = STRING_LIST_INIT_DUP;
+	struct strvec hidden_refs = STRVEC_INIT;
 	struct object_array want_obj = OBJECT_ARRAY_INIT;
 	struct object_array have_obj = OBJECT_ARRAY_INIT;
 	struct oid_array haves = OID_ARRAY_INIT;
@@ -161,7 +161,7 @@ static void upload_pack_data_clear(struct upload_pack_data *data)
 {
 	string_list_clear(&data->symref, 1);
 	string_list_clear(&data->wanted_refs, 1);
-	string_list_clear(&data->hidden_refs, 0);
+	strvec_clear(&data->hidden_refs);
 	object_array_clear(&data->want_obj);
 	object_array_clear(&data->have_obj);
 	oid_array_clear(&data->haves);
@@ -1169,7 +1169,7 @@ static void receive_needs(struct upload_pack_data *data,
 
 /* return non-zero if the ref is hidden, otherwise 0 */
 static int mark_our_ref(const char *refname, const char *refname_full,
-			const struct object_id *oid, const struct string_list *hidden_refs)
+			const struct object_id *oid, const struct strvec *hidden_refs)
 {
 	struct object *o = lookup_unknown_object(the_repository, oid);
 
@@ -1453,7 +1453,7 @@ static int parse_want(struct packet_writer *writer, const char *line,
 
 static int parse_want_ref(struct packet_writer *writer, const char *line,
 			  struct string_list *wanted_refs,
-			  struct string_list *hidden_refs,
+			  struct strvec *hidden_refs,
 			  struct object_array *want_obj)
 {
 	const char *refname_nons;
-- 
2.40.1.477.g956c797dfc



* [PATCH 12/15] refs/packed-backend.c: ignore complicated hidden refs rules
  2023-05-08 21:59 [PATCH 00/15] refs: implement skip lists for packed backend Taylor Blau
                   ` (10 preceding siblings ...)
  2023-05-08 22:00 ` [PATCH 11/15] revision.h: store hidden refs in a `strvec` Taylor Blau
@ 2023-05-08 22:00 ` Taylor Blau
  2023-05-08 22:00 ` [PATCH 13/15] refs.h: let `for_each_namespaced_ref()` take excluded patterns Taylor Blau
                   ` (6 subsequent siblings)
  18 siblings, 0 replies; 149+ messages in thread
From: Taylor Blau @ 2023-05-08 22:00 UTC (permalink / raw)
  To: git; +Cc: Jeff King, Derrick Stolee, Junio C Hamano

In subsequent commits, we'll teach `receive-pack` and `upload-pack` to
use the new skip-list feature in the packed-refs iterator by ignoring
references which are mentioned in their respective hideRefs lists.

However, the packed-ref skip lists cannot handle un-hiding rules (that
begin with '!'), or namespace comparisons (that begin with '^'). Detect
and avoid these cases by falling back to the normal enumeration without
a skip list when such patterns exist.
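
The following is a simplified C model of why '!' rules defeat a skip list. It assumes prefix rules that must match a whole path component, scanned from last to first with the first hit winning, as the reversed loop in `ref_is_hidden()` suggests. `is_hidden()` and `skip_list_usable()` are hypothetical helpers, and namespaces ('^') are not modeled. With "refs/foo" followed by "!refs/foo/bar", a skip region built for "refs/foo" would wrongly drop "refs/foo/bar". The check therefore bails out as soon as any rule starts with '!' or '^'.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Simplified hide check (hypothetical): scan rules from last to first;
 * the first rule that matches a whole path prefix decides, and a
 * leading '!' means "un-hide".
 */
static int is_hidden(const char *refname, const char **rules)
{
	size_t nr = 0;

	while (rules[nr])
		nr++;
	while (nr--) {
		const char *match = rules[nr];
		int neg = 0;
		size_t len;

		if (*match == '!') {
			neg = 1;
			match++;
		}
		len = strlen(match);
		if (!strncmp(refname, match, len) &&
		    (!refname[len] || refname[len] == '/'))
			return !neg;
	}
	return 0;
}

/* A skip list is only safe when no rule can override an earlier one. */
static int skip_list_usable(const char **rules)
{
	for (const char **p = rules; *p; p++)
		if (**p == '!' || **p == '^')
			return 0;
	return 1;
}
```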

Signed-off-by: Taylor Blau <me@ttaylorr.com>
---
 refs/packed-backend.c   | 19 +++++++++++++++++++
 t/t1419-exclude-refs.sh | 10 ++++++++++
 2 files changed, 29 insertions(+)

diff --git a/refs/packed-backend.c b/refs/packed-backend.c
index ddfa9add14..7f09201f35 100644
--- a/refs/packed-backend.c
+++ b/refs/packed-backend.c
@@ -1013,6 +1013,25 @@ static void populate_excluded_skip_list(struct packed_ref_iterator *iter,
 	if (!excluded_patterns)
 		return;
 
+	for (pattern = excluded_patterns; *pattern; pattern++) {
+		/*
+		 * We also can't feed any excludes from hidden refs
+		 * config sections, since later rules may override
+		 * previous ones. For example, with rules "refs/foo" and
+		 * "!refs/foo/bar", we should show "refs/foo/bar" (and
+		 * everything underneath it), but the earlier exclusion
+		 * would cause us to skip all of "refs/foo". We likewise
+		 * don't implement the namespace stripping required for
+		 * '^' rules.
+		 *
+		 * Both are possible to do, but complicated, so avoid
+		 * populating the skip list at all if we see either of
+		 * these patterns.
+		 */
+		if (**pattern == '!' || **pattern == '^')
+			return;
+	}
+
 	for (pattern = excluded_patterns; *pattern; pattern++) {
 		struct skip_list_entry *e;
 
diff --git a/t/t1419-exclude-refs.sh b/t/t1419-exclude-refs.sh
index 051b5a54ce..026e4414cd 100755
--- a/t/t1419-exclude-refs.sh
+++ b/t/t1419-exclude-refs.sh
@@ -118,4 +118,14 @@ test_expect_success 'for_each_ref__exclude(refs/heads/ba*)' '
 	assert_no_skips perf
 '
 
+test_expect_success 'for_each_ref__exclude(refs/heads/foo, !refs/heads/foo/1)' '
+	# discards complex hidden ref rules
+	for_each_ref__exclude refs/heads refs/heads/foo "!refs/heads/foo/1" \
+		>actual 2>perf &&
+	for_each_ref >expect &&
+
+	test_cmp expect actual &&
+	assert_no_skips perf
+'
+
 test_done
-- 
2.40.1.477.g956c797dfc



* [PATCH 13/15] refs.h: let `for_each_namespaced_ref()` take excluded patterns
  2023-05-08 21:59 [PATCH 00/15] refs: implement skip lists for packed backend Taylor Blau
                   ` (11 preceding siblings ...)
  2023-05-08 22:00 ` [PATCH 12/15] refs/packed-backend.c: ignore complicated hidden refs rules Taylor Blau
@ 2023-05-08 22:00 ` Taylor Blau
  2023-05-08 22:00 ` [PATCH 14/15] upload-pack.c: avoid enumerating hidden refs where possible Taylor Blau
                   ` (5 subsequent siblings)
  18 siblings, 0 replies; 149+ messages in thread
From: Taylor Blau @ 2023-05-08 22:00 UTC (permalink / raw)
  To: git; +Cc: Jeff King, Derrick Stolee, Junio C Hamano

The following commit will want to call `for_each_namespaced_ref()` with
a list of excluded patterns.

We could introduce a variant of that function, say,
`for_each_namespaced_ref_exclude()` which takes the extra parameter, and
reimplement the original function in terms of that. But all but one
caller (in `http-backend.c`) will supply the new parameter, so add the
new parameter to `for_each_namespaced_ref()` itself instead of
introducing a new function.

For now, supply NULL for the list of excluded patterns at all callers so
that behavior is unchanged; the subsequent commit will start passing
real patterns from `upload-pack`.

Signed-off-by: Taylor Blau <me@ttaylorr.com>
---
 http-backend.c | 2 +-
 refs.c         | 5 +++--
 refs.h         | 3 ++-
 upload-pack.c  | 6 +++---
 4 files changed, 9 insertions(+), 7 deletions(-)

diff --git a/http-backend.c b/http-backend.c
index ac146d85c5..ad500683c8 100644
--- a/http-backend.c
+++ b/http-backend.c
@@ -559,7 +559,7 @@ static void get_info_refs(struct strbuf *hdr, char *arg UNUSED)
 
 	} else {
 		select_getanyfile(hdr);
-		for_each_namespaced_ref(show_text_ref, &buf);
+		for_each_namespaced_ref(NULL, show_text_ref, &buf);
 		send_strbuf(hdr, "text/plain", &buf);
 	}
 	strbuf_release(&buf);
diff --git a/refs.c b/refs.c
index ec4d5b9101..95a7db9563 100644
--- a/refs.c
+++ b/refs.c
@@ -1660,13 +1660,14 @@ int for_each_replace_ref(struct repository *r, each_repo_ref_fn fn, void *cb_dat
 				    DO_FOR_EACH_INCLUDE_BROKEN, cb_data);
 }
 
-int for_each_namespaced_ref(each_ref_fn fn, void *cb_data)
+int for_each_namespaced_ref(const char **exclude_patterns,
+			    each_ref_fn fn, void *cb_data)
 {
 	struct strbuf buf = STRBUF_INIT;
 	int ret;
 	strbuf_addf(&buf, "%srefs/", get_git_namespace());
 	ret = do_for_each_ref(get_main_ref_store(the_repository),
-			      buf.buf, NULL, fn, 0, 0, cb_data);
+			      buf.buf, exclude_patterns, fn, 0, 0, cb_data);
 	strbuf_release(&buf);
 	return ret;
 }
diff --git a/refs.h b/refs.h
index a7751a1fc9..f23626beca 100644
--- a/refs.h
+++ b/refs.h
@@ -372,7 +372,8 @@ int for_each_glob_ref_in(each_ref_fn fn, const char *pattern,
 			 const char *prefix, void *cb_data);
 
 int head_ref_namespaced(each_ref_fn fn, void *cb_data);
-int for_each_namespaced_ref(each_ref_fn fn, void *cb_data);
+int for_each_namespaced_ref(const char **exclude_patterns,
+			    each_ref_fn fn, void *cb_data);
 
 /* can be used to learn about broken ref and symref */
 int refs_for_each_rawref(struct ref_store *refs, each_ref_fn fn, void *cb_data);
diff --git a/upload-pack.c b/upload-pack.c
index d77d58bdde..7c646ea5bd 100644
--- a/upload-pack.c
+++ b/upload-pack.c
@@ -854,7 +854,7 @@ static void deepen(struct upload_pack_data *data, int depth)
 		 * marked with OUR_REF.
 		 */
 		head_ref_namespaced(check_ref, data);
-		for_each_namespaced_ref(check_ref, data);
+		for_each_namespaced_ref(NULL, check_ref, data);
 
 		get_reachable_list(data, &reachable_shallows);
 		result = get_shallow_commits(&reachable_shallows,
@@ -1378,7 +1378,7 @@ void upload_pack(const int advertise_refs, const int stateless_rpc,
 		if (advertise_refs)
 			data.no_done = 1;
 		head_ref_namespaced(send_ref, &data);
-		for_each_namespaced_ref(send_ref, &data);
+		for_each_namespaced_ref(NULL, send_ref, &data);
 		/*
 		 * fflush stdout before calling advertise_shallow_grafts because send_ref
 		 * uses stdio.
@@ -1388,7 +1388,7 @@ void upload_pack(const int advertise_refs, const int stateless_rpc,
 		packet_flush(1);
 	} else {
 		head_ref_namespaced(check_ref, &data);
-		for_each_namespaced_ref(check_ref, &data);
+		for_each_namespaced_ref(NULL, check_ref, &data);
 	}
 
 	if (!advertise_refs) {
-- 
2.40.1.477.g956c797dfc



* [PATCH 14/15] upload-pack.c: avoid enumerating hidden refs where possible
  2023-05-08 21:59 [PATCH 00/15] refs: implement skip lists for packed backend Taylor Blau
                   ` (12 preceding siblings ...)
  2023-05-08 22:00 ` [PATCH 13/15] refs.h: let `for_each_namespaced_ref()` take excluded patterns Taylor Blau
@ 2023-05-08 22:00 ` Taylor Blau
  2023-05-09 15:15   ` Patrick Steinhardt
  2023-05-08 22:00 ` [PATCH 15/15] builtin/receive-pack.c: avoid enumerating hidden references Taylor Blau
                   ` (4 subsequent siblings)
  18 siblings, 1 reply; 149+ messages in thread
From: Taylor Blau @ 2023-05-08 22:00 UTC (permalink / raw)
  To: git; +Cc: Jeff King, Derrick Stolee, Junio C Hamano

In a similar fashion to a previous commit, teach `upload-pack` to avoid
enumerating hidden references where possible.

Note, however, that there are certain cases where we cannot avoid
enumerating even hidden references, in particular when either of:

  - `uploadpack.allowTipSHA1InWant`, or
  - `uploadpack.allowReachableSHA1InWant`

are set, corresponding to `ALLOW_TIP_SHA1` and `ALLOW_REACHABLE_SHA1`,
respectively.

When either of these bits is set, upload-pack's `is_our_ref()` function
needs to consider the `HIDDEN_REF` bit of the referent's object flags.
So we must visit all references, including the hidden ones, in order to
mark their referents with the `HIDDEN_REF` bit.

When neither `ALLOW_TIP_SHA1` nor `ALLOW_REACHABLE_SHA1` is set, the
`is_our_ref()` function considers only the `OUR_REF` bit, and not the
`HIDDEN_REF` one. `OUR_REF` is applied via `mark_our_ref()`, and only
to objects at the tips of non-hidden references, so we do not need to
visit hidden references in this case.
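
The flag logic above can be sketched in C as follows. The bit values and the helper `may_exclude_hidden()` are hypothetical, while `allow_hidden_refs()` and `is_our_ref()` mirror the shape of the functions in this patch. `is_our_ref()` ignores `HIDDEN_REF` only when neither ALLOW bit is set, and that is exactly when the ref walk may pass the hidden patterns along as excludes.

```c
#include <assert.h>

/* Hypothetical bit positions for the object flags used by upload-pack. */
#define OUR_REF    (1u << 0)
#define HIDDEN_REF (1u << 1)

/* Hypothetical values for the allow_uor bits discussed above. */
enum allow_uor_bits {
	ALLOW_TIP_SHA1       = 01,
	ALLOW_REACHABLE_SHA1 = 02,
};

/* Non-zero if clients may want objects at (or reachable from) hidden tips. */
static int allow_hidden_refs(unsigned allow_uor)
{
	return !!(allow_uor & (ALLOW_TIP_SHA1 | ALLOW_REACHABLE_SHA1));
}

/* HIDDEN_REF only counts when one of the ALLOW bits is set. */
static int is_our_ref(unsigned flags, unsigned allow_uor)
{
	unsigned wanted = OUR_REF | (allow_hidden_refs(allow_uor) ? HIDDEN_REF : 0);
	return !!(flags & wanted);
}

/* Hypothetical helper: may the ref walk pass hidden patterns as excludes? */
static int may_exclude_hidden(unsigned allow_uor)
{
	return !allow_hidden_refs(allow_uor);
}
```

Under this model, a walk that skips hidden refs never sets `HIDDEN_REF` on anything. That is harmless precisely when `is_our_ref()` would not look at that bit.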

When neither of those bits is set, `upload-pack` can potentially avoid
enumerating a large number of references. In the same example as a
previous commit (linux.git with one hidden reference per commit,
"refs/pull/N"):

    $ printf 0000 >in
    $ hyperfine --warmup=1 \
      'git -c transfer.hideRefs=refs/pull upload-pack . <in' \
      'git.compile -c transfer.hideRefs=refs/pull -c uploadpack.allowTipSHA1InWant upload-pack . <in' \
      'git.compile -c transfer.hideRefs=refs/pull upload-pack . <in'
    Benchmark 1: git -c transfer.hideRefs=refs/pull upload-pack . <in
      Time (mean ± σ):     406.9 ms ±   1.1 ms    [User: 357.3 ms, System: 49.5 ms]
      Range (min … max):   405.7 ms … 409.2 ms    10 runs

    Benchmark 2: git.compile -c transfer.hideRefs=refs/pull -c uploadpack.allowTipSHA1InWant upload-pack . <in
      Time (mean ± σ):     406.5 ms ±   1.3 ms    [User: 356.5 ms, System: 49.9 ms]
      Range (min … max):   404.6 ms … 408.8 ms    10 runs

    Benchmark 3: git.compile -c transfer.hideRefs=refs/pull upload-pack . <in
      Time (mean ± σ):       4.7 ms ±   0.2 ms    [User: 0.7 ms, System: 3.9 ms]
      Range (min … max):     4.3 ms …   6.1 ms    472 runs

    Summary
      'git.compile -c transfer.hideRefs=refs/pull upload-pack . <in' ran
       86.62 ± 4.33 times faster than 'git.compile -c transfer.hideRefs=refs/pull -c uploadpack.allowTipSHA1InWant upload-pack . <in'
       86.70 ± 4.33 times faster than 'git -c transfer.hideRefs=refs/pull upload-pack . <in'

As above, we must visit every reference when
uploadPack.allowTipSHA1InWant is set. But when it is unset, we can visit
far fewer references.

Signed-off-by: Taylor Blau <me@ttaylorr.com>
---
 upload-pack.c | 33 +++++++++++++++++++++++++++------
 1 file changed, 27 insertions(+), 6 deletions(-)

diff --git a/upload-pack.c b/upload-pack.c
index 7c646ea5bd..0162fffce0 100644
--- a/upload-pack.c
+++ b/upload-pack.c
@@ -601,11 +601,32 @@ static int get_common_commits(struct upload_pack_data *data,
 	}
 }
 
+static int allow_hidden_refs(enum allow_uor allow_uor)
+{
+	return allow_uor & (ALLOW_TIP_SHA1 | ALLOW_REACHABLE_SHA1);
+}
+
+static void for_each_namespaced_ref_1(each_ref_fn fn,
+				      struct upload_pack_data *data)
+{
+	/*
+	 * If `data->allow_uor` allows fetching hidden refs, we need to
+	 * mark all references (including hidden ones), to check in
+	 * `is_our_ref()` below.
+	 *
+	 * Otherwise, we only care about whether each reference's object
+	 * has the OUR_REF bit set or not, so do not need to visit
+	 * hidden references.
+	 */
+	if (allow_hidden_refs(data->allow_uor))
+		for_each_namespaced_ref(NULL, fn, data);
+	else
+		for_each_namespaced_ref(data->hidden_refs.v, fn, data);
+}
+
 static int is_our_ref(struct object *o, enum allow_uor allow_uor)
 {
-	int allow_hidden_ref = (allow_uor &
-				(ALLOW_TIP_SHA1 | ALLOW_REACHABLE_SHA1));
-	return o->flags & ((allow_hidden_ref ? HIDDEN_REF : 0) | OUR_REF);
+	return o->flags & ((allow_hidden_refs(allow_uor) ? HIDDEN_REF : 0) | OUR_REF);
 }
 
 /*
@@ -854,7 +875,7 @@ static void deepen(struct upload_pack_data *data, int depth)
 		 * marked with OUR_REF.
 		 */
 		head_ref_namespaced(check_ref, data);
-		for_each_namespaced_ref(NULL, check_ref, data);
+		for_each_namespaced_ref_1(check_ref, data);
 
 		get_reachable_list(data, &reachable_shallows);
 		result = get_shallow_commits(&reachable_shallows,
@@ -1378,7 +1399,7 @@ void upload_pack(const int advertise_refs, const int stateless_rpc,
 		if (advertise_refs)
 			data.no_done = 1;
 		head_ref_namespaced(send_ref, &data);
-		for_each_namespaced_ref(NULL, send_ref, &data);
+		for_each_namespaced_ref_1(send_ref, &data);
 		/*
 		 * fflush stdout before calling advertise_shallow_grafts because send_ref
 		 * uses stdio.
@@ -1388,7 +1409,7 @@ void upload_pack(const int advertise_refs, const int stateless_rpc,
 		packet_flush(1);
 	} else {
 		head_ref_namespaced(check_ref, &data);
-		for_each_namespaced_ref(NULL, check_ref, &data);
+		for_each_namespaced_ref_1(check_ref, &data);
 	}
 
 	if (!advertise_refs) {
-- 
2.40.1.477.g956c797dfc



* [PATCH 15/15] builtin/receive-pack.c: avoid enumerating hidden references
  2023-05-08 21:59 [PATCH 00/15] refs: implement skip lists for packed backend Taylor Blau
                   ` (13 preceding siblings ...)
  2023-05-08 22:00 ` [PATCH 14/15] upload-pack.c: avoid enumerating hidden refs where possible Taylor Blau
@ 2023-05-08 22:00 ` Taylor Blau
  2023-05-15 19:23 ` [PATCH v2 00/16] refs: implement jump lists for packed backend Taylor Blau
                   ` (3 subsequent siblings)
  18 siblings, 0 replies; 149+ messages in thread
From: Taylor Blau @ 2023-05-08 22:00 UTC (permalink / raw)
  To: git; +Cc: Jeff King, Derrick Stolee, Junio C Hamano

Now that `refs_for_each_fullref_in()` has the ability to avoid
enumerating references matching certain pattern(s), use that to avoid
visiting hidden refs when constructing the ref advertisement via
receive-pack.

Note that since this exclusion is best-effort, we still need
`show_ref_cb()` to check whether or not each reference is hidden
before including it in the advertisement.
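
This small C sketch shows why the callback must keep filtering. It assumes simple prefix rules and a hypothetical `count_advertised()` that plays the role of `show_ref_cb()`. The iterator may deliver a list with hidden refs already skipped, or the full list (for example because a '!' rule disabled the skip list). Either way, the advertised set comes out the same.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical: is `refname` under any of the (simple) hidden prefixes? */
static int hidden(const char *refname, const char **hide)
{
	for (; *hide; hide++) {
		size_t len = strlen(*hide);
		if (!strncmp(refname, *hide, len) &&
		    (!refname[len] || refname[len] == '/'))
			return 1;
	}
	return 0;
}

/*
 * Count the refs that would be advertised. `refs` is whatever the
 * iterator produced: with a working skip list the hidden refs never
 * appear, but without one they do, so the per-ref check must stay.
 */
static size_t count_advertised(const char **refs, size_t nr,
			       const char **hide)
{
	size_t shown = 0;

	for (size_t i = 0; i < nr; i++)
		if (!hidden(refs[i], hide))
			shown++;
	return shown;
}
```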

As was the case when applying this same optimization to `upload-pack`,
`receive-pack`'s reference advertisement phase can proceed much quicker
by avoiding enumerating references that will not be part of the
advertisement.

(Below, we're still using linux.git with one hidden refs/pull/N ref per
commit):

    $ hyperfine -L v ,.compile 'git{v} -c transfer.hideRefs=refs/pull receive-pack --advertise-refs .git'
    Benchmark 1: git -c transfer.hideRefs=refs/pull receive-pack --advertise-refs .git
      Time (mean ± σ):      89.1 ms ±   1.7 ms    [User: 82.0 ms, System: 7.0 ms]
      Range (min … max):    87.7 ms …  95.5 ms    31 runs

    Benchmark 2: git.compile -c transfer.hideRefs=refs/pull receive-pack --advertise-refs .git
      Time (mean ± σ):       4.5 ms ±   0.2 ms    [User: 0.5 ms, System: 3.9 ms]
      Range (min … max):     4.1 ms …   5.6 ms    508 runs

    Summary
      'git.compile -c transfer.hideRefs=refs/pull receive-pack --advertise-refs .git' ran
       20.00 ± 1.05 times faster than 'git -c transfer.hideRefs=refs/pull receive-pack --advertise-refs .git'

Signed-off-by: Taylor Blau <me@ttaylorr.com>
---
 builtin/receive-pack.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/builtin/receive-pack.c b/builtin/receive-pack.c
index 064df74715..b954bcf802 100644
--- a/builtin/receive-pack.c
+++ b/builtin/receive-pack.c
@@ -336,7 +336,8 @@ static void write_head_info(void)
 {
 	static struct oidset seen = OIDSET_INIT;
 
-	for_each_ref(show_ref_cb, &seen);
+	refs_for_each_fullref_in(get_main_ref_store(the_repository), "",
+				 hidden_refs.v, show_ref_cb, &seen);
 	for_each_alternate_ref(show_one_alternate_ref, &seen);
 	oidset_clear(&seen);
 	if (!sent_capabilities)
-- 
2.40.1.477.g956c797dfc


* Re: [PATCH 04/15] ref-filter: add ref_filter_clear()
  2023-05-08 21:59 ` [PATCH 04/15] ref-filter: add ref_filter_clear() Taylor Blau
@ 2023-05-08 22:29   ` Junio C Hamano
  2023-05-08 22:33     ` Taylor Blau
  2023-05-09 15:14   ` Patrick Steinhardt
  1 sibling, 1 reply; 149+ messages in thread
From: Junio C Hamano @ 2023-05-08 22:29 UTC (permalink / raw)
  To: Taylor Blau; +Cc: git, Jeff King, Derrick Stolee

Taylor Blau <me@ttaylorr.com> writes:

> From: Jeff King <peff@peff.net>
>
> We did not bother to clean up at all in branch/tag, and for-each-ref
> only hit a few elements. So this is probably cleaning up leaks, but I
> didn't check yet.
>
> Note that the reachable_from and unreachable_from lists should be
> cleaned as they are used. So this is just covering any case where we
> might bail before running the reachability check.
> ---

Not signed-off?

> +void ref_filter_clear(struct ref_filter *filter)
> +{
> +	oid_array_clear(&filter->points_at);
> +	free_commit_list(filter->with_commit);
> +	free_commit_list(filter->no_commit);
> +	free_commit_list(filter->reachable_from);
> +	free_commit_list(filter->unreachable_from);
> +	ref_filter_init(filter);
> +}

And the previous step matters here---otherwise we will end up
walking two commit lists whose elements have all been popped
in reach_filter().

Makes sense.

> +void ref_filter_init(struct ref_filter *filter)
> +{
> +	struct ref_filter blank = REF_FILTER_INIT;
> +	memcpy(filter, &blank, sizeof(blank));
> +}


I wonder if structure assignment "*filter = blank" is easier to see
but I think we've seen this "_INIT; memcpy()" dance before.
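For illustration, a minimal sketch of the two spellings being compared. `struct demo_filter` and `DEMO_FILTER_INIT` are hypothetical stand-ins, not git's `struct ref_filter` or `REF_FILTER_INIT`. Both functions leave the same member values behind, so the choice is mostly about readability and consistency with the existing `_init()` functions.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for a filter struct; not git's definition. */
struct demo_filter {
	const char **name_patterns;
	int ignore_case;
	int match_as_path;
};

/* Hypothetical initializer; members not named here are zeroed. */
#define DEMO_FILTER_INIT { .match_as_path = 1 }

/* The "_INIT; memcpy()" dance. */
static void demo_filter_init_memcpy(struct demo_filter *filter)
{
	struct demo_filter blank = DEMO_FILTER_INIT;

	assert(filter);
	memcpy(filter, &blank, sizeof(blank));
}

/* Plain structure assignment; same resulting member values. */
static void demo_filter_init_assign(struct demo_filter *filter)
{
	struct demo_filter blank = DEMO_FILTER_INIT;

	assert(filter);
	*filter = blank;
}
```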



* Re: [PATCH 04/15] ref-filter: add ref_filter_clear()
  2023-05-08 22:29   ` Junio C Hamano
@ 2023-05-08 22:33     ` Taylor Blau
  0 siblings, 0 replies; 149+ messages in thread
From: Taylor Blau @ 2023-05-08 22:33 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: git, Jeff King, Derrick Stolee

On Mon, May 08, 2023 at 03:29:22PM -0700, Junio C Hamano wrote:
> Taylor Blau <me@ttaylorr.com> writes:
>
> > From: Jeff King <peff@peff.net>
> >
> > We did not bother to clean up at all in branch/tag, and for-each-ref
> > only hit a few elements. So this is probably cleaning up leaks, but I
> > didn't check yet.
> >
> > Note that the reachable_from and unreachable_from lists should be
> > cleaned as they are used. So this is just covering any case where we
> > might bail before running the reachability check.
> > ---
>
> Not signed-off?

Oops. Sorry about that, this should be:

    Signed-off-by: Jeff King <peff@peff.net>
    Signed-off-by: Taylor Blau <me@ttaylorr.com>

> I wonder if structure assignment "*filter = blank" is easier to see
> but I think we've seen this "_INIT; memcpy()" dance before.

Yeah, this matches how many other _init() functions work (just looking
through the output of `git grep -B3 memcpy | grep -A3 _init`).

Thanks,
Taylor

* Re: [PATCH 05/15] ref-filter.c: parameterize match functions over patterns
  2023-05-08 21:59 ` [PATCH 05/15] ref-filter.c: parameterize match functions over patterns Taylor Blau
@ 2023-05-08 22:36   ` Junio C Hamano
  2023-05-09 20:13     ` Taylor Blau
  0 siblings, 1 reply; 149+ messages in thread
From: Junio C Hamano @ 2023-05-08 22:36 UTC (permalink / raw)
  To: Taylor Blau; +Cc: git, Jeff King, Derrick Stolee

Taylor Blau <me@ttaylorr.com> writes:

> -static int match_pattern(const struct ref_filter *filter, const char *refname)
> +static int match_pattern(const struct ref_filter *filter,
> +			 const char **patterns,
> +			 const char *refname)
>  {
> -	const char **patterns = filter->name_patterns;
>  	unsigned flags = 0;
>  
>  	if (filter->ignore_case)
> @@ -2132,9 +2133,10 @@ static int match_pattern(const struct ref_filter *filter, const char *refname)
>   * matches a pattern "refs/heads/" but not "refs/heads/m") or a
>   * wildcard (e.g. the same ref matches "refs/heads/m*", too).
>   */
> -static int match_name_as_path(const struct ref_filter *filter, const char *refname)
> +static int match_name_as_path(const struct ref_filter *filter,
> +			      const char **pattern,
> +			      const char *refname)
>  {
> -	const char **pattern = filter->name_patterns;
>  	int namelen = strlen(refname);
>  	unsigned flags = WM_PATHNAME;
>  

These hint that we'd eventually lose .name_patterns member from the
structure so that we can pass pattern array that is not necessarily
tied to any instance of a filter?

> @@ -2163,8 +2165,8 @@ static int filter_pattern_match(struct ref_filter *filter, const char *refname)
>  	if (!*filter->name_patterns)
>  		return 1; /* No pattern always matches */
>  	if (filter->match_as_path)
> -		return match_name_as_path(filter, refname);
> -	return match_pattern(filter, refname);
> +		return match_name_as_path(filter, filter->name_patterns, refname);
> +	return match_pattern(filter, filter->name_patterns, refname);

And we are not there yet, so we hoist the use of .name_patterns
member one level up to the only caller?

Without seeing how it evolves, we can tell this does not make
anything break, but we cannot tell how this helps anything (yet).

Let's read on.


* Re: [PATCH 06/15] builtin/for-each-ref.c: add `--exclude` option
  2023-05-08 21:59 ` [PATCH 06/15] builtin/for-each-ref.c: add `--exclude` option Taylor Blau
@ 2023-05-08 23:22   ` Junio C Hamano
  2023-05-09 20:22     ` Taylor Blau
  0 siblings, 1 reply; 149+ messages in thread
From: Junio C Hamano @ 2023-05-08 23:22 UTC (permalink / raw)
  To: Taylor Blau; +Cc: git, Jeff King, Derrick Stolee

Taylor Blau <me@ttaylorr.com> writes:

> diff --git a/builtin/for-each-ref.c b/builtin/for-each-ref.c
> index c01fa6fefe..449da61e11 100644
> --- a/builtin/for-each-ref.c
> +++ b/builtin/for-each-ref.c
> @@ -14,6 +14,7 @@ static char const * const for_each_ref_usage[] = {
>  	N_("git for-each-ref [--points-at <object>]"),
>  	N_("git for-each-ref [--merged [<commit>]] [--no-merged [<commit>]]"),
>  	N_("git for-each-ref [--contains [<commit>]] [--no-contains [<commit>]]"),
> +	N_("git for-each-ref [--exclude=<pattern> ...]"),
>  	NULL
>  };

I think the original is already wrong, but the easiest thing we can
do in order to avoid making it worse is to drop this hunk, as the
existing usage is this:

static char const * const for_each_ref_usage[] = {
	N_("git for-each-ref [<options>] [<pattern>]"),
	N_("git for-each-ref [--points-at <object>]"),
	N_("git for-each-ref [--merged [<commit>]] [--no-merged [<commit>]]"),
	N_("git for-each-ref [--contains [<commit>]] [--no-contains [<commit>]]"),
	NULL
};

and this series merely adds a new "--exclude=<pattern>" as one of
the "<options>". 

As we can see from the fact that for example

 $ git for-each-ref --no-merged next refs/heads/\?\?/\*

works just fine, exactly the same thing can be said about the other
--points-at/--merged/--no-merged/--contains/--no-contains options.

The SYNOPSIS section of the manual page is fine.

> diff --git a/ref-filter.c b/ref-filter.c
> index 6c5eed144f..93dc9b331f 100644
> --- a/ref-filter.c
> +++ b/ref-filter.c
> @@ -2169,6 +2169,15 @@ static int filter_pattern_match(struct ref_filter *filter, const char *refname)
>  	return match_pattern(filter, filter->name_patterns, refname);
>  }
>  
> +static int filter_exclude_match(struct ref_filter *filter, const char *refname)
> +{
> +	if (!filter->exclude.nr)
> +		return 0;
> +	if (filter->match_as_path)
> +		return match_name_as_path(filter, filter->exclude.v, refname);
> +	return match_pattern(filter, filter->exclude.v, refname);
> +}

Earlier I made a comment about .name_patterns member becoming
unnecessary, but I think what should need to happen is instead
match_pattern() and match_name_as_path() to lose the "filter"
parameter, and take a boolean "ignore_case" instead.
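A minimal sketch of that suggested shape, under stated assumptions: `match_path_prefix()` and `prefix_differs()` are hypothetical names, and wildcards are deliberately not handled. The matcher takes the pattern array plus an `ignore_case` flag instead of a whole filter, so one function can serve both the name patterns and the exclude list. It assumes a path-prefix rule in the spirit of the comment above `match_name_as_path()`: a pattern matches the ref itself, or refs below it at a '/' boundary.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Compare the first len bytes, optionally ignoring ASCII case. */
static int prefix_differs(const char *ref, const char *pat, size_t len,
			  int ignore_case)
{
	size_t i;

	for (i = 0; i < len; i++) {
		int a = (unsigned char)ref[i], b = (unsigned char)pat[i];

		if (ignore_case) {
			a = tolower(a);
			b = tolower(b);
		}
		if (a != b)
			return 1;
	}
	return 0;
}

/*
 * Hypothetical matcher: returns 1 if refname equals a pattern or lies
 * below it at a '/' boundary. No wildcard support.
 */
static int match_path_prefix(const char **patterns, const char *refname,
			     int ignore_case)
{
	size_t reflen = strlen(refname);

	assert(patterns);
	for (; *patterns; patterns++) {
		size_t len = strlen(*patterns);

		if (len > reflen ||
		    prefix_differs(refname, *patterns, len, ignore_case))
			continue;
		if (len == reflen || refname[len] == '/' ||
		    (len && (*patterns)[len - 1] == '/'))
			return 1;
	}
	return 0;
}
```

With this shape, an exclude check could read `match_path_prefix(filter->exclude.v, refname, filter->ignore_case)`, and the filter itself no longer needs to be passed down.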

>  struct ref_filter {
>  	const char **name_patterns;
> +	struct strvec exclude;

At some point after the dust settles, we may want to tweak
name_patterns so that these two appear to complement each other more
clearly, e.g. use "struct strvec include" to replace "name_patterns"
or something.  But that is an unrelated tangent.

In any case, there wasn't anything surprising or unexpected in the
code.  Looking good.

> diff --git a/t/t6300-for-each-ref.sh b/t/t6300-for-each-ref.sh
> index 5c00607608..7e8d578522 100755
> --- a/t/t6300-for-each-ref.sh
> +++ b/t/t6300-for-each-ref.sh
> @@ -447,6 +447,41 @@ test_expect_success 'exercise glob patterns with prefixes' '
>  	test_cmp expected actual
>  '
>  
> +cat >expected <<\EOF
> +refs/tags/bar
> +refs/tags/baz
> +refs/tags/testtag
> +EOF
> +
> +test_expect_success 'exercise patterns with prefix exclusions' '
> +	for tag in foo/one foo/two foo/three bar baz
> +	do
> +		git tag "$tag" || return 1
> +	done &&
> +	test_when_finished "git tag -d foo/one foo/two foo/three bar baz" &&
> +	git for-each-ref --format="%(refname)" \
> +		refs/tags/ --exclude=refs/tags/foo >actual &&
> +	test_cmp expected actual
> +'
> +
> +cat >expected <<\EOF
> +refs/tags/bar
> +refs/tags/baz
> +refs/tags/foo/one
> +refs/tags/testtag
> +EOF
> +
> +test_expect_success 'exercise patterns with pattern exclusions' '
> +	for tag in foo/one foo/two foo/three bar baz
> +	do
> +		git tag "$tag" || return 1
> +	done &&
> +	test_when_finished "git tag -d foo/one foo/two foo/three bar baz" &&
> +	git for-each-ref --format="%(refname)" \
> +		refs/tags/ --exclude="refs/tags/foo/t*" >actual &&
> +	test_cmp expected actual
> +'

These are doing as Romans do, so I won't comment on the outdated
pattern of preparing the expectation outside the test script.  After
the dust settles, somebody needs to go in and clean it up.

Thanks.

* Re: [PATCH 08/15] refs/packed-backend.c: refactor `find_reference_location()`
  2023-05-08 22:00 ` [PATCH 08/15] refs/packed-backend.c: refactor `find_reference_location()` Taylor Blau
@ 2023-05-08 23:56   ` Junio C Hamano
  2023-05-09 20:29     ` Taylor Blau
  0 siblings, 1 reply; 149+ messages in thread
From: Junio C Hamano @ 2023-05-08 23:56 UTC (permalink / raw)
  To: Taylor Blau; +Cc: git, Jeff King, Derrick Stolee

Taylor Blau <me@ttaylorr.com> writes:

> The function `find_reference_location()` is used to perform a
> binary search-like function over the contents of a repository's
> `$GIT_DIR/packed-refs` file.
>
> The search it implements is unlike a standard binary search in that the
> records it searches over are not of a fixed width, so the comparison
> must locate the end of a record before comparing it.
>
> Extract the core routine of `find_reference_location()` in order to
> implement a function in the following patch which will find the first
> location in the `packed-refs` file that *doesn't* match the given
> pattern.
>
> The behavior of `find_reference_location()` is unchanged.

The splitting out of this step is rather unfortunate in that the
extra parameter "start" given to cmp_record_to_refname(), together
with a rather curious return of "-1" when the parameter is false,
are not justified at all by looking at the caller, because the only
caller of the function, find_reference_location_1() always passes
"start" it got from its caller without modifying, and the sole
caller of that passes "1" to "start".  IOW, the behaviour of
cmp_record_to_refname() when "start" is false can be any arbitrary
garbage (it could even be BUG("")) and the claim "the behaviour is
unchanged" will still be true.

But that does not help the readers all that much.

So, yes, I can agree that this does not introduce any new bug, but it
is a mysterious no-op: why we would want to pass different values in
"start" in future steps, and to achieve what, is not explained, which
leaves the readers frustrated.
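To make the role of such a flag concrete, here is a hedged sketch (not git's `cmp_record_to_refname()`; every name below is hypothetical) of a lower-bound binary search over variable-width, newline-terminated records. Because records are not fixed width, each probe backs up from the midpoint to the start of the record containing it. One comparator with a `start` flag can then find both ends of the region of records that begin with a prefix: with `start` set, such records compare as "after" the prefix, so the search stops on the first of them; with it cleared, they compare as "before", so the search stops just past the last one. The sketch omits the object IDs that real packed-refs lines carry.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Back up from p to the start of the record that contains it. */
static const char *start_of_record(const char *lo, const char *p)
{
	while (p > lo && p[-1] != '\n')
		p--;
	return p;
}

/* Advance from p to just past the newline that ends its record. */
static const char *end_of_record(const char *p, const char *hi)
{
	const char *nl = memchr(p, '\n', (size_t)(hi - p));

	return nl ? nl + 1 : hi;
}

static int cmp_record(const char *rec, const char *prefix, int start)
{
	for (;; rec++, prefix++) {
		if (!*prefix)
			return start ? 1 : -1; /* record begins with prefix */
		if (*rec == '\n')
			return -1; /* record is shorter, so it sorts first */
		if (*rec != *prefix)
			return (unsigned char)*rec < (unsigned char)*prefix ? -1 : 1;
	}
}

/* Lower-bound search over sorted "refname\n" records in [buf, eof). */
static const char *find_location(const char *buf, const char *eof,
				 const char *prefix, int start)
{
	const char *lo = buf, *hi = eof;

	assert(buf == eof || eof[-1] == '\n');
	while (lo < hi) {
		const char *mid = lo + (hi - lo) / 2;
		const char *rec = start_of_record(lo, mid);

		if (cmp_record(rec, prefix, start) < 0)
			lo = end_of_record(mid, hi);
		else
			hi = rec;
	}
	return lo;
}
```

If no record begins with the prefix, both searches return the same position, i.e. an empty region.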

I'll stop here for now and come back to the rest later.

Thanks.


* Re: [PATCH 09/15] refs/packed-backend.c: implement skip lists to avoid excluded pattern(s)
  2023-05-08 22:00 ` [PATCH 09/15] refs/packed-backend.c: implement skip lists to avoid excluded pattern(s) Taylor Blau
@ 2023-05-09  0:10   ` Chris Torek
  2023-05-09 20:39     ` Taylor Blau
  2023-05-09 15:15   ` Patrick Steinhardt
  2023-05-09 23:40   ` Junio C Hamano
  2 siblings, 1 reply; 149+ messages in thread
From: Chris Torek @ 2023-05-09  0:10 UTC (permalink / raw)
  To: Taylor Blau; +Cc: git, Jeff King, Derrick Stolee, Junio C Hamano

I haven't read through any of this closely (don't have time for it now) but
want to mention one thing here:

On Mon, May 8, 2023 at 3:06 PM Taylor Blau <me@ttaylorr.com> wrote:
>   - Construct a skip list of regions by combining adjacent and
>     overlapping regions from the previous step.

You might want to add a note to the code that there is no
relationship here to the skip list data structure (see
https://en.wikipedia.org/wiki/Skip_list).

Chris

* Re: [PATCH 04/15] ref-filter: add ref_filter_clear()
  2023-05-08 21:59 ` [PATCH 04/15] ref-filter: add ref_filter_clear() Taylor Blau
  2023-05-08 22:29   ` Junio C Hamano
@ 2023-05-09 15:14   ` Patrick Steinhardt
  2023-05-09 19:11     ` Taylor Blau
  1 sibling, 1 reply; 149+ messages in thread
From: Patrick Steinhardt @ 2023-05-09 15:14 UTC (permalink / raw)
  To: Taylor Blau; +Cc: git, Jeff King, Derrick Stolee, Junio C Hamano

On Mon, May 08, 2023 at 05:59:52PM -0400, Taylor Blau wrote:
> From: Jeff King <peff@peff.net>
> 
> We did not bother to clean up at all in branch/tag, and for-each-ref
> only hit a few elements. So this is probably cleaning up leaks, but I
> didn't check yet.

Nit: it sounds like there still is the intent to check whether this does
fix leaks. How about updating the commit message to either not mention
the intent or perform the check?

Patrick

* Re: [PATCH 07/15] refs: plumb `exclude_patterns` argument throughout
  2023-05-08 22:00 ` [PATCH 07/15] refs: plumb `exclude_patterns` argument throughout Taylor Blau
@ 2023-05-09 15:14   ` Patrick Steinhardt
  2023-05-09 20:23     ` Taylor Blau
  0 siblings, 1 reply; 149+ messages in thread
From: Patrick Steinhardt @ 2023-05-09 15:14 UTC (permalink / raw)
  To: Taylor Blau; +Cc: git, Jeff King, Derrick Stolee, Junio C Hamano

On Mon, May 08, 2023 at 06:00:01PM -0400, Taylor Blau wrote:
> The subsequent patch will want to access an optional `excluded_patterns`
> array within refs/packed-backend.c. To do so, the refs subsystem needs
> to be updated to pass this value across a number of different locations.
> 
> Prepare for a future patch by introducing this plumbing now, passing
> NULLs at top-level APIs in order to make that patch less noisy and more
> easily readable.

It might be worth mentioning in the commit message that the exclude
patterns are supposed to be best-effort. In other words, any caller
would still need to do manual filtering if I understand correctly. And
while this is indeed documented via `refs_for_each_fullref_in_prefixes`
it is quite easy to miss this important little detail.
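A minimal sketch of what "best-effort" means for a caller, with hypothetical names throughout: the iterator jumps over sorted, merged skip regions (half-open ranges of ref indices, standing in for byte offsets into packed-refs), while the callback still re-applies the full exclusion list using simple prefix matching, because some exclusions (glob patterns, for example) may never have been turned into regions. Here "refs/heads/qu" plays the role of such an exclusion.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

struct skip_region {
	size_t start, end; /* half-open [start, end) of ref indices */
};

typedef int (*ref_cb)(const char *refname, void *data);

/* Visit refs[0..nr) in order, jumping over the (sorted, merged) regions. */
static int iterate_with_skips(const char **refs, size_t nr,
			      const struct skip_region *skip, size_t skip_nr,
			      ref_cb fn, void *data)
{
	size_t pos = 0, k = 0;

	while (pos < nr) {
		int ret;

		/* Drop regions that end at or before the current position. */
		while (k < skip_nr && skip[k].end <= pos)
			k++;
		if (k < skip_nr && skip[k].start <= pos) {
			pos = skip[k].end; /* jump past the excluded region */
			continue;
		}
		ret = fn(refs[pos], data);
		if (ret)
			return ret;
		pos++;
	}
	return 0;
}

struct collect {
	const char **excludes; /* NULL-terminated; re-checked for every ref */
	const char *kept[8];
	size_t kept_nr, visited;
};

/* Callback that filters again, since the skip list is only best-effort. */
static int collect_cb(const char *refname, void *data)
{
	struct collect *c = data;
	const char **p;

	c->visited++;
	for (p = c->excludes; *p; p++)
		if (!strncmp(refname, *p, strlen(*p)))
			return 0;
	assert(c->kept_nr < 8);
	c->kept[c->kept_nr++] = refname;
	return 0;
}
```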

Patrick

* Re: [PATCH 09/15] refs/packed-backend.c: implement skip lists to avoid excluded pattern(s)
  2023-05-08 22:00 ` [PATCH 09/15] refs/packed-backend.c: implement skip lists to avoid excluded pattern(s) Taylor Blau
  2023-05-09  0:10   ` Chris Torek
@ 2023-05-09 15:15   ` Patrick Steinhardt
  2023-05-09 20:55     ` Taylor Blau
  2023-05-09 23:40   ` Junio C Hamano
  2 siblings, 1 reply; 149+ messages in thread
From: Patrick Steinhardt @ 2023-05-09 15:15 UTC (permalink / raw)
  To: Taylor Blau; +Cc: git, Jeff King, Derrick Stolee, Junio C Hamano

On Mon, May 08, 2023 at 06:00:08PM -0400, Taylor Blau wrote:
> When iterating through the `packed-refs` file in order to answer a query
> like:
> 
>     $ git for-each-ref --exclude=refs/__hidden__
> 
> it would be useful to avoid walking over all of the entries in
> `refs/__hidden__/*` when possible, since we know that the ref-filter
> code is going to throw them away anyways.
> 
> In certain circumstances, doing so is possible. The algorithm for doing
> so is as follows:
> 
>   - For each excluded pattern, find the first record that matches it,
>     and the first pattern that *doesn't* match it (i.e. the location
>     you'd next want to consider when excluding that pattern).
> 
>   - Sort the patterns by their starting location within the
>     `packed-refs` file.
> 
>   - Construct a skip list of regions by combining adjacent and
>     overlapping regions from the previous step.
> 
>   - When iterating through the `packed-refs` file, if `iter->pos` is
>     ever contained in one of the regions from the previous steps,
>     advance `iter->pos` past the end of that region, and continue
>     enumeration.
> 
> Note that this optimization is only possible when none of the excluded
> pattern(s) have special meta-characters in them. To see why this is the
> case, consider the exclusion pattern "refs/foo[a]". In general, in order
> to find the location of the first record that matches this pattern, we
> could only consider up to the first meta-character, "refs/foo". But this
> doesn't work, since the excluded region we'd come up with would include
> "refs/foobar", even though it is not excluded.

Is this generally true though? A naive implementation would iterate
through all references and find the first reference that matches the
exclusion regular expression. From there on we continue to iterate until
we find the first entry that doesn't match. This may cause us to end up
with a suboptimal skip list, but the skip list would still be valid.

As I said, this implementation would be naive as we're now forced to
iterate through all references starting at the beginning. I assume that
your implementation will instead use a binary search to locate the first
entry that matches the exclusion pattern and the last such entry. But the
way this paragraph is formulated makes it sound like this is a general
fact, even though it is a fact that derives from the implementation.

I of course don't propose to change the algorithm here, but instead to
clarify where this restriction actually comes from and why the tradeoff
makes sense.
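For comparison, a hedged sketch of the naive approach described above. The names are hypothetical, and POSIX `fnmatch()` with `FNM_PATHNAME` stands in for git's own `wildmatch()`. It scans the sorted refs once per pattern and records every run of consecutive matches as a half-open region, which works for any glob at the cost of a linear scan.

```c
#include <assert.h>
#include <fnmatch.h> /* POSIX; git itself uses wildmatch() */
#include <stddef.h>

struct skip_region {
	size_t start, end; /* half-open [start, end) of ref indices */
};

/*
 * Scan sorted refs[0..nr) once and record each run of consecutive
 * matches of glob as one region. Returns the number of regions written.
 */
static size_t naive_regions(const char **refs, size_t nr, const char *glob,
			    struct skip_region *out, size_t out_alloc)
{
	size_t i = 0, n = 0;

	while (i < nr) {
		size_t start;

		if (fnmatch(glob, refs[i], FNM_PATHNAME)) {
			i++; /* no match; keep looking */
			continue;
		}
		start = i;
		while (i < nr && !fnmatch(glob, refs[i], FNM_PATHNAME))
			i++;
		assert(n < out_alloc);
		out[n].start = start;
		out[n].end = i;
		n++;
	}
	return n;
}
```

For the sorted list bar, baz, foo/one, foo/three, foo/two, testtag under refs/tags/, the pattern "refs/tags/foo/t*" yields the single region [3, 5), while "refs/tags/*" yields two regions, [0, 2) and [5, 6), because the refs under refs/tags/foo/ sit between its matches.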

[snip]
> @@ -929,6 +972,108 @@ static struct ref_iterator_vtable packed_ref_iterator_vtable = {
>  	.abort = packed_ref_iterator_abort
>  };
>  
> +static int skip_list_entry_cmp(const void *va, const void *vb)
> +{
> +	const struct skip_list_entry *a = va;
> +	const struct skip_list_entry *b = vb;
> +
> +	if (a->start < b->start)
> +		return -1;
> +	if (a->start > b->start)
> +		return 1;
> +	return 0;
> +}
> +
> +static int has_glob_special(const char *str)
> +{
> +	const char *p;
> +	for (p = str; *p; p++) {
> +		if (is_glob_special(*p))
> +			return 1;
> +	}
> +	return 0;
> +}
> +
> +static const char *ptr_max(const char *x, const char *y)
> +{
> +	if (x > y)
> +		return x;
> +	return y;
> +}
> +
> +static void populate_excluded_skip_list(struct packed_ref_iterator *iter,
> +					struct snapshot *snapshot,
> +					const char **excluded_patterns)
> +{
> +	size_t i, j;
> +	const char **pattern;
> +
> +	if (!excluded_patterns)
> +		return;
> +
> +	for (pattern = excluded_patterns; *pattern; pattern++) {
> +		struct skip_list_entry *e;
> +
> +		/*
> +		 * We can't feed any excludes with globs in them to the
> +		 * refs machinery.  It only understands prefix matching.
> +		 * We likewise can't even feed the string leading up to
> +		 * the first meta-character, as something like "foo[a]"
> +		 * should not exclude "foobar" (but the prefix "foo"
> +		 * would match that and mark it for exclusion).
> +		 */
> +		if (has_glob_special(*pattern))
> +			continue;
> +
> +		ALLOC_GROW(iter->skip, iter->skip_nr + 1, iter->skip_alloc);
> +
> +		e = &iter->skip[iter->skip_nr++];
> +		e->start = find_reference_location(snapshot, *pattern, 0);
> +		e->end = find_reference_location_end(snapshot, *pattern, 0);

One thing that makes this hard to reason about is that we don't
explicitly handle the case where the pattern doesn't match at all. So
you require a bunch of knowledge about what exactly the functions
`find_reference_location()` and `find_reference_location_end()` do in
that case in order to determine whether we will end up doing the right
thing.

Explicitly handling this would give us some benefits:

- It makes the code more obvious.

- We can avoid an extra skip list entry for every non-matching
  pattern.

- We wouldn't have to perform binary search over the snapshot twice.

Might be I'm missing something though.
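A hedged sketch of the explicit handling suggested here, over a sorted array of refnames rather than the packed-refs buffer (all names hypothetical): look up the start first, and only if the entry there actually begins with the pattern, run a second search for the end, restricted to the remaining tail. A pattern that matches nothing then produces no entry and costs a single search.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * First index in [lo, hi) of sorted refs whose name does not sort before
 * prefix. With past_prefix set, names beginning with prefix also count
 * as "before", so the result is the first index past all of them.
 */
static size_t lower_bound(const char **refs, size_t lo, size_t hi,
			  const char *prefix, int past_prefix)
{
	size_t plen = strlen(prefix);

	while (lo < hi) {
		size_t mid = lo + (hi - lo) / 2;
		int cmp = strncmp(refs[mid], prefix, plen);

		if (cmp < 0 || (!cmp && past_prefix))
			lo = mid + 1;
		else
			hi = mid;
	}
	return lo;
}

/* Returns 1 and sets [*start, *end) if any ref begins with prefix, else 0. */
static int prefix_region(const char **refs, size_t nr, const char *prefix,
			 size_t *start, size_t *end)
{
	size_t s = lower_bound(refs, 0, nr, prefix, 0);

	assert(start && end);
	if (s == nr || strncmp(refs[s], prefix, strlen(prefix)))
		return 0; /* no match: no entry, and no second search */
	*start = s;
	*end = lower_bound(refs, s, nr, prefix, 1); /* search only the tail */
	return 1;
}
```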

> +	}
> +
> +	if (!iter->skip_nr) {
> +		/*
> +		 * Every entry in exclude_patterns has a meta-character,
> +		 * nothing to do here.
> +		 */
> +		return;
> +	}
> +
> +	QSORT(iter->skip, iter->skip_nr, skip_list_entry_cmp);
> +
> +	/*
> +	 * As an optimization, merge adjacent entries in the skip list
> +	 * to jump forwards as far as possible when entering a skipped
> +	 * region.
> +	 *
> +	 * For example, if we have two skipped regions:
> +	 *
> +	 *	[[A, B], [B, C]]
> +	 *
> +	 * we want to combine that into a single entry jumping from A to
> +	 * C.
> +	 */
> +	for (i = 1, j = 1; i < iter->skip_nr; i++) {
> +		struct skip_list_entry *ours = &iter->skip[i];
> +		struct skip_list_entry *prev = &iter->skip[i - 1];
> +
> +		if (ours->start == ours->end) {
> +			/* ignore empty regions (no matching entries) */
> +			continue;
> +		} else if (prev->end >= ours->start) {
> +			/* overlapping regions extend the previous one */
> +			prev->end = ptr_max(prev->end, ours->end);
> +		} else {
> +			/* otherwise, insert a new region */
> +			iter->skip[j++] = *ours;
> +		}
> +	}

Mh. Does this do the right thing in case we have multiple consecutive
overlapping skip list entries? We always end up adjusting the immediate
predecessor as we use `i - 1` to find `prev`. Shouldn't we instead start
with `j = 0` and assign `prev = &iter->skip[j]`?
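Tracing the quoted loop (and assuming it is followed by `iter->skip_nr = j`) on the input [0, 10], [2, 5], [4, 12]: the extension to 12 is written into `skip[1]`, which is not kept, so only [0, 10] survives. Refs in [10, 12) would then be walked and filtered later rather than skipped, so the effect is lost work rather than wrong output. A minimal sketch of the suggested fix follows, with hypothetical names and index ranges standing in for pointers: each region is compared with the last region *kept*, and empty regions are dropped wherever they appear.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

struct skip_region {
	size_t start, end; /* half-open [start, end) */
};

/* Order regions by their starting position. */
static int region_cmp(const void *va, const void *vb)
{
	const struct skip_region *a = va, *b = vb;

	if (a->start < b->start)
		return -1;
	if (a->start > b->start)
		return 1;
	return 0;
}

/*
 * Sort, drop empty regions, and merge overlapping or touching ones in
 * place. Returns the new number of regions. Each input region is compared
 * with the last region kept (r[j - 1]), not with its input neighbor.
 */
static size_t merge_regions(struct skip_region *r, size_t nr)
{
	size_t i, j = 0;

	if (nr > 1)
		qsort(r, nr, sizeof(*r), region_cmp);
	for (i = 0; i < nr; i++) {
		assert(r[i].start <= r[i].end);
		if (r[i].start == r[i].end)
			continue; /* empty: the pattern matched nothing */
		if (j && r[j - 1].end >= r[i].start) {
			if (r[i].end > r[j - 1].end)
				r[j - 1].end = r[i].end; /* extend kept region */
		} else {
			r[j++] = r[i]; /* start a new kept region */
		}
	}
	return j;
}
```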

[snip]
> diff --git a/t/t1419-exclude-refs.sh b/t/t1419-exclude-refs.sh
> new file mode 100755
> index 0000000000..da5265a5a8
> --- /dev/null
> +++ b/t/t1419-exclude-refs.sh
> @@ -0,0 +1,101 @@
> +#!/bin/sh
> +
> +test_description='test exclude_patterns functionality in main ref store'
> +
> +GIT_TEST_DEFAULT_INITIAL_BRANCH_NAME=main
> +export GIT_TEST_DEFAULT_INITIAL_BRANCH_NAME
> +
> +TEST_PASSES_SANITIZE_LEAK=true
> +. ./test-lib.sh
> +
> +for_each_ref__exclude () {
> +	test-tool ref-store main for-each-ref--exclude "$@" >actual.raw
> +	cut -d ' ' -f 2 actual.raw
> +}
> +
> +for_each_ref () {
> +	git for-each-ref --format='%(refname)' "$@"
> +}
> +
> +test_expect_success 'setup' '
> +	test_commit --no-tag base &&
> +	base="$(git rev-parse HEAD)" &&
> +
> +	for name in foo bar baz quux
> +	do
> +		for i in 1 2 3
> +		do
> +			echo "create refs/heads/$name/$i $base" || return 1
> +		done || return 1
> +	done >in &&
> +	echo "delete refs/heads/main" >>in &&
> +
> +	git update-ref --stdin <in &&
> +	git pack-refs --all
> +'
> +
> +test_expect_success 'for_each_ref__exclude(refs/heads/foo/)' '
> +	# region in middle
> +	for_each_ref__exclude refs/heads refs/heads/foo >actual &&
> +	for_each_ref refs/heads/bar refs/heads/baz refs/heads/quux >expect &&
> +
> +	test_cmp expect actual
> +'

Nit: it might be a bit more readable if we put the comment into the test
description instead of having an opaque description that mentions ref
names that don't have much of a meaning without reading the test itself.

Patrick

* Re: [PATCH 14/15] upload-pack.c: avoid enumerating hidden refs where possible
  2023-05-08 22:00 ` [PATCH 14/15] upload-pack.c: avoid enumerating hidden refs where possible Taylor Blau
@ 2023-05-09 15:15   ` Patrick Steinhardt
  2023-05-09 21:34     ` Taylor Blau
  0 siblings, 1 reply; 149+ messages in thread
From: Patrick Steinhardt @ 2023-05-09 15:15 UTC (permalink / raw)
  To: Taylor Blau; +Cc: git, Jeff King, Derrick Stolee, Junio C Hamano

On Mon, May 08, 2023 at 06:00:26PM -0400, Taylor Blau wrote:
[snip]
> diff --git a/upload-pack.c b/upload-pack.c
> index 7c646ea5bd..0162fffce0 100644
> --- a/upload-pack.c
> +++ b/upload-pack.c
> @@ -601,11 +601,32 @@ static int get_common_commits(struct upload_pack_data *data,
>  	}
>  }
>  
> +static int allow_hidden_refs(enum allow_uor allow_uor)
> +{
> +	return allow_uor & (ALLOW_TIP_SHA1 | ALLOW_REACHABLE_SHA1);
> +}
> +
> +static void for_each_namespaced_ref_1(each_ref_fn fn,
> +				      struct upload_pack_data *data)

I know it's common practice in the Git project, but personally I tend to
fight with functions that have a `_1` suffix. You simply cannot tell
what the difference is to the non-suffixed variant without checking its
declaration.

`for_each_namespaced_ref_with_optional_hidden_refs()` is definitely a
mouthful though, and I can't really think of something better either.

> +{
> +	/*
> +	 * If `data->allow_uor` allows updating hidden refs, we need to
> +	 * mark all references (including hidden ones), to check in
> +	 * `is_our_ref()` below.

Doesn't this influence whether somebody can _fetch_ objects pointed to
by the hidden refs instead of _updating_ them?
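For reference, these bits are set from the uploadpack.allowTipSHA1InWant, uploadpack.allowReachableSHA1InWant and uploadpack.allowAnySHA1InWant settings. In other words, they govern which objects a client may *request*, which supports reading the comment as being about fetching. A minimal sketch of the resulting decision follows. The enum values are assumed to mirror `enum allow_uor` in upload-pack.c, and `excludes_for()` is a hypothetical helper, not part of the patch.

```c
#include <assert.h>
#include <stddef.h>

/* Assumed to mirror enum allow_uor in upload-pack.c. */
enum allow_uor {
	ALLOW_TIP_SHA1 = 0x01,       /* uploadpack.allowTipSHA1InWant */
	ALLOW_REACHABLE_SHA1 = 0x02, /* uploadpack.allowReachableSHA1InWant */
	ALLOW_ANY_SHA1 = 0x07        /* uploadpack.allowAnySHA1InWant */
};

static int allow_hidden_refs(enum allow_uor allow_uor)
{
	return allow_uor & (ALLOW_TIP_SHA1 | ALLOW_REACHABLE_SHA1);
}

/*
 * Hypothetical helper: pick the exclude list for the ref walk. If clients
 * may fetch hidden tips (or objects reachable from them), every ref must
 * be marked for is_our_ref(), so nothing may be skipped.
 */
static const char **excludes_for(enum allow_uor allow_uor,
				 const char **hidden_refs)
{
	if (allow_hidden_refs(allow_uor))
		return NULL;
	return hidden_refs;
}
```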

Patrick

* Re: [PATCH 04/15] ref-filter: add ref_filter_clear()
  2023-05-09 15:14   ` Patrick Steinhardt
@ 2023-05-09 19:11     ` Taylor Blau
  0 siblings, 0 replies; 149+ messages in thread
From: Taylor Blau @ 2023-05-09 19:11 UTC (permalink / raw)
  To: Patrick Steinhardt; +Cc: git, Jeff King, Derrick Stolee, Junio C Hamano

On Tue, May 09, 2023 at 05:14:49PM +0200, Patrick Steinhardt wrote:
> On Mon, May 08, 2023 at 05:59:52PM -0400, Taylor Blau wrote:
> > From: Jeff King <peff@peff.net>
> >
> > We did not bother to clean up at all in branch/tag, and for-each-ref
> > only hit a few elements. So this is probably cleaning up leaks, but I
> > didn't check yet.
>
> Nit: it sounds like there still is the intent to check whether this does
> fix leaks. How about updating the commit message to either not mention
> the intent or perform the check?

Oops. Thanks for the reminder. Peff sent this patch to me while we were
working on this topic together, and I forgot to come back and actually
perform this check.

Luckily, it helps out quite a bit:

   t/t0041-usage.sh                          |  1 +
   t/t2016-checkout-patch.sh                 |  1 +
   t/t3402-rebase-merge.sh                   |  1 +
   t/t3407-rebase-abort.sh                   |  1 +
   t/t4058-diff-duplicates.sh                |  2 ++
   t/t5407-post-rewrite-hook.sh              |  1 +
   t/t5811-proto-disable-git.sh              |  2 ++
   t/t6001-rev-list-graft.sh                 |  1 +
   t/t7008-filter-branch-null-sha1.sh        |  1 +
   t/t7408-submodule-reference.sh            |  2 ++
   t/t9502-gitweb-standalone-parse-output.sh |  1 +

11 new leak-free tests! I'll take it ;-).

Thanks,
Taylor

* Re: [PATCH 05/15] ref-filter.c: parameterize match functions over patterns
  2023-05-08 22:36   ` Junio C Hamano
@ 2023-05-09 20:13     ` Taylor Blau
  0 siblings, 0 replies; 149+ messages in thread
From: Taylor Blau @ 2023-05-09 20:13 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: git, Jeff King, Derrick Stolee

On Mon, May 08, 2023 at 03:36:45PM -0700, Junio C Hamano wrote:
> > @@ -2132,9 +2133,10 @@ static int match_pattern(const struct ref_filter *filter, const char *refname)
> >   * matches a pattern "refs/heads/" but not "refs/heads/m") or a
> >   * wildcard (e.g. the same ref matches "refs/heads/m*", too).
> >   */
> > -static int match_name_as_path(const struct ref_filter *filter, const char *refname)
> > +static int match_name_as_path(const struct ref_filter *filter,
> > +			      const char **pattern,
> > +			      const char *refname)
> >  {
> > -	const char **pattern = filter->name_patterns;
> >  	int namelen = strlen(refname);
> >  	unsigned flags = WM_PATHNAME;
> >
>
> These hint that we'd eventually lose .name_patterns member from the
> structure so that we can pass pattern array that is not necessarily
> tied to any instance of a filter?

Right. We'll pass in the excluded patterns in the next commit.

> And we are not there yet, so we hoist the use of .name_patterns
> member one level up to the only caller?
>
> Without seeing how it evolves, we can tell this does not make
> anything break, but we cannot tell how this helps anything (yet).

Hmm. I tried to allude to this in the patch message with the paragraph:

    The subsequent patch will add a new array of patterns to match over (the
    excluded patterns, via a new `git for-each-ref --exclude` option),
    treating the return value of these functions differently depending on
    which patterns are being used to match.

    Tweak `match_pattern()` and `match_name_as_path()` to take an array of
    patterns to prepare for passing either in.

...but I'm happy to add detail or clarify it if you think that these
paragraphs could use it.

Thanks,
Taylor

* Re: [PATCH 06/15] builtin/for-each-ref.c: add `--exclude` option
  2023-05-08 23:22   ` Junio C Hamano
@ 2023-05-09 20:22     ` Taylor Blau
  0 siblings, 0 replies; 149+ messages in thread
From: Taylor Blau @ 2023-05-09 20:22 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: git, Jeff King, Derrick Stolee

On Mon, May 08, 2023 at 04:22:08PM -0700, Junio C Hamano wrote:
> Taylor Blau <me@ttaylorr.com> writes:
>
> > diff --git a/builtin/for-each-ref.c b/builtin/for-each-ref.c
> > index c01fa6fefe..449da61e11 100644
> > --- a/builtin/for-each-ref.c
> > +++ b/builtin/for-each-ref.c
> > @@ -14,6 +14,7 @@ static char const * const for_each_ref_usage[] = {
> >  	N_("git for-each-ref [--points-at <object>]"),
> >  	N_("git for-each-ref [--merged [<commit>]] [--no-merged [<commit>]]"),
> >  	N_("git for-each-ref [--contains [<commit>]] [--no-contains [<commit>]]"),
> > +	N_("git for-each-ref [--exclude=<pattern> ...]"),
> >  	NULL
> >  };
>
> I think the original is already wrong, but the easiest thing we can
> do in order to avoid making it worse is to drop this hunk, as the
> existing usage is this:
>
> static char const * const for_each_ref_usage[] = {
> 	N_("git for-each-ref [<options>] [<pattern>]"),
> 	N_("git for-each-ref [--points-at <object>]"),
> 	N_("git for-each-ref [--merged [<commit>]] [--no-merged [<commit>]]"),
> 	N_("git for-each-ref [--contains [<commit>]] [--no-contains [<commit>]]"),
> 	NULL
> };
>
> and this series merely adds a new "--exclude=<pattern>" as one of
> the "<options>".
>
> As we can see from the fact that for example
>
>  $ git for-each-ref --no-merged next refs/heads/\?\?/\*
>
> works just fine, exactly the same thing can be said about the other
> --points-at/--merged/--no-merged/--contains/--no-contains options.
>
> The SYNOPSIS section of the manual page is fine.

Good point, will tweak; thanks.

> > @@ -2169,6 +2169,15 @@ static int filter_pattern_match(struct ref_filter *filter, const char *refname)
> >  	return match_pattern(filter, filter->name_patterns, refname);
> >  }
> >
> > +static int filter_exclude_match(struct ref_filter *filter, const char *refname)
> > +{
> > +	if (!filter->exclude.nr)
> > +		return 0;
> > +	if (filter->match_as_path)
> > +		return match_name_as_path(filter, filter->exclude.v, refname);
> > +	return match_pattern(filter, filter->exclude.v, refname);
> > +}
>
> Earlier I made a comment about .name_patterns member becoming
> unnecessary, but I think what should need to happen is instead
> match_pattern() and match_name_as_path() to lose the "filter"
> parameter, and take a boolean "ignore_case" instead.

Agreed.

> > +cat >expected <<\EOF
> > +refs/tags/bar
> > +refs/tags/baz
> > +refs/tags/foo/one
> > +refs/tags/testtag
> > +EOF
> > +
> > +test_expect_success 'exercise patterns with pattern exclusions' '
> > +	for tag in foo/one foo/two foo/three bar baz
> > +	do
> > +		git tag "$tag" || return 1
> > +	done &&
> > +	test_when_finished "git tag -d foo/one foo/two foo/three bar baz" &&
> > +	git for-each-ref --format="%(refname)" \
> > +		refs/tags/ --exclude="refs/tags/foo/t*" >actual &&
> > +	test_cmp expected actual
> > +'
>
> These are doing as Romans do, so I won't comment on the outdated
> pattern of preparing the expectation outside the test script.  After
> the dust settles, somebody needs to go in and clean it up.

Yeah, I figured that this series was already getting pretty long, but
that it would be expedient to propagate forward this pattern. But it
should be cleaned up. Let's tag it with #leftoverbits accordingly.

Thanks,
Taylor

* Re: [PATCH 07/15] refs: plumb `exclude_patterns` argument throughout
  2023-05-09 15:14   ` Patrick Steinhardt
@ 2023-05-09 20:23     ` Taylor Blau
  0 siblings, 0 replies; 149+ messages in thread
From: Taylor Blau @ 2023-05-09 20:23 UTC (permalink / raw)
  To: Patrick Steinhardt; +Cc: git, Jeff King, Derrick Stolee, Junio C Hamano

On Tue, May 09, 2023 at 05:14:58PM +0200, Patrick Steinhardt wrote:
> On Mon, May 08, 2023 at 06:00:01PM -0400, Taylor Blau wrote:
> > The subsequent patch will want to access an optional `excluded_patterns`
> > array within refs/packed-backend.c. To do so, the refs subsystem needs
> > to be updated to pass this value across a number of different locations.
> >
> > Prepare for a future patch by introducing this plumbing now, passing
> > NULLs at top-level APIs in order to make that patch less noisy and more
> > easily readable.
>
> It might be worth mentioning in the commit message that the exclude
> patterns are supposed to be best-effort. In other words, any caller
> would still need to do manual filtering if I understand correctly. And
> while this is indeed documented via `refs_for_each_fullref_in_prefixes`
> it is quite easy to miss this important little detail.

Good suggestion, I updated this patch message accordingly.
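The best-effort contract Patrick describes can be illustrated with a small model. Everything here is hypothetical (`backend_for_each()`, `collect_cb()`, and plain prefix matching standing in for the real pattern matching): the backend may or may not skip excluded refs, and because the caller re-applies the exclusion to every ref it receives, the final result is the same either way.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical callback type, loosely modeled on each_ref_fn. */
typedef int (*ref_cb)(const char *refname, void *data);

static int has_prefix(const char *s, const char *prefix)
{
	return !strncmp(s, prefix, strlen(prefix));
}

/*
 * Hypothetical backend: it MAY skip refs under an excluded prefix
 * (modeled by can_skip), but it is never required to.
 */
static void backend_for_each(const char **refs, const char **excludes,
			     int can_skip, ref_cb fn, void *data)
{
	for (; *refs; refs++) {
		int skip = 0;
		for (const char **e = excludes; can_skip && e && *e; e++)
			if (has_prefix(*refs, *e))
				skip = 1;
		if (!skip)
			fn(*refs, data);
	}
}

struct collect {
	const char **excludes;
	size_t nr;
};

/* The caller re-checks every ref it sees, so skipping is only an optimization. */
static int collect_cb(const char *refname, void *data)
{
	struct collect *c = data;

	for (const char **e = c->excludes; e && *e; e++)
		if (has_prefix(refname, *e))
			return 0; /* excluded: discard */
	c->nr++;
	return 0;
}
```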

Thanks,
Taylor

* Re: [PATCH 08/15] refs/packed-backend.c: refactor `find_reference_location()`
  2023-05-08 23:56   ` Junio C Hamano
@ 2023-05-09 20:29     ` Taylor Blau
  0 siblings, 0 replies; 149+ messages in thread
From: Taylor Blau @ 2023-05-09 20:29 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: git, Jeff King, Derrick Stolee

On Mon, May 08, 2023 at 04:56:41PM -0700, Junio C Hamano wrote:
> So, yes I can agree that this does not introduce any new bug, it is
> a mysterious no-op, and why we want to pass different values in "start"
> in future steps in order to achieve what is not explained and leaves
> the readers frustrated.

Re-reading this patch again with your review in mind, I agree that the
split is poorly placed.

I modified this patch to just extract the core routine behind a helper
function and avoided adding the "start" parameter here.

Thanks,
Taylor

* Re: [PATCH 09/15] refs/packed-backend.c: implement skip lists to avoid excluded pattern(s)
  2023-05-09  0:10   ` Chris Torek
@ 2023-05-09 20:39     ` Taylor Blau
  0 siblings, 0 replies; 149+ messages in thread
From: Taylor Blau @ 2023-05-09 20:39 UTC (permalink / raw)
  To: Chris Torek; +Cc: git, Jeff King, Derrick Stolee, Junio C Hamano

On Mon, May 08, 2023 at 05:10:52PM -0700, Chris Torek wrote:
> On Mon, May 8, 2023 at 3:06 PM Taylor Blau <me@ttaylorr.com> wrote:
> >   - Construct a skip list of regions by combining adjacent and
> >     overlapping regions from the previous step.
>
> You might want to add a note to the code that there is no
> relationship here to the skip list data structure (see
> https://en.wikipedia.org/wiki/Skip_list).

Good suggestion, thanks. I picked the name skip list for this concept
since we're skipping over excluded regions of the packed-refs file, but
you're right it has no relation to the skip list data structure.

Will note in the patch message.

Thanks,
Taylor

* Re: [PATCH 09/15] refs/packed-backend.c: implement skip lists to avoid excluded pattern(s)
  2023-05-09 15:15   ` Patrick Steinhardt
@ 2023-05-09 20:55     ` Taylor Blau
  2023-05-09 21:15       ` Taylor Blau
  2023-05-10  7:25       ` Patrick Steinhardt
  0 siblings, 2 replies; 149+ messages in thread
From: Taylor Blau @ 2023-05-09 20:55 UTC (permalink / raw)
  To: Patrick Steinhardt; +Cc: git, Jeff King, Derrick Stolee, Junio C Hamano

On Tue, May 09, 2023 at 05:15:43PM +0200, Patrick Steinhardt wrote:
> On Mon, May 08, 2023 at 06:00:08PM -0400, Taylor Blau wrote:
>
> > Note that this optimization is only possible when none of the excluded
> > pattern(s) have special meta-characters in them. To see why this is the
> > case, consider the exclusion pattern "refs/foo[a]". In general, in order
> > to find the location of the first record that matches this pattern, we
> > could only consider up to the first meta-character, "refs/foo". But this
> > doesn't work, since the excluded region we'd come up with would include
> > "refs/foobar", even though it is not excluded.
>
> Is this generally true though? A naive implementation would iterate
> through all references and find the first reference that matches the
> exclusion regular expression. From thereon we continue to iterate until
> we find the first entry that doesn't match. This may cause us to end up
> with a suboptimal skip list, but the skip list would still be valid.
>
> As I said, this implementation would be naive as we're now forced to
> iterate through all references starting at the beginning. I assume that
> your implementation will instead use a binary search to locate the first
> entry that matches the exclusion pattern and the last pattern. But the
> way this paragraph is formulated makes it sound like this is a general
> fact, even though it is a fact that derives from the implementation.
>
> I of course don't propose to change the algorithm here, but instead to
> clarify where this restriction actually comes from and why the tradeoff
> makes sense.

In the example you include, it's possible. But consider something like:

    $ git for-each-ref --exclude='refs/foo[ac]'

The region that matches that expression ("refs/fooa", "refs/fooc" and
everything underneath them) does not have to appear as a contiguous
single region in the packed-refs file. If you have, say, "refs/foobar",
that will appear between the two regions you want to exclude.

So I think you *might* be able to do it in general, but at the very
least it would involve splitting each character class and finding the
start and end of any region(s) that it matches.

Even so, you'd have to try and match each entry as you determine the
width of the excluded region, at which point you're on par with
enumerating them anyway and having the caller discard any entries it
doesn't want.
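To make the `refs/foo[ac]` case concrete, the sketch below counts how many separate runs of matching entries a pattern produces in a sorted ref list. A single (start, end) region can only describe the exclusion when that count is at most one. The helper name is hypothetical, and `fnmatch()` with `FNM_PATHNAME` stands in for Git's own matcher.

```c
#include <assert.h>
#include <fnmatch.h>
#include <stddef.h>

/*
 * Count maximal runs of consecutive entries in a sorted, NULL-terminated
 * ref list that match `pattern`. A single (start, end) skip region can
 * only represent the matches when this returns 0 or 1.
 */
static size_t count_match_runs(const char **sorted_refs, const char *pattern)
{
	size_t runs = 0;
	int in_run = 0;

	for (; *sorted_refs; sorted_refs++) {
		int match = !fnmatch(pattern, *sorted_refs, FNM_PATHNAME);
		if (match && !in_run)
			runs++; /* a new run of matches starts here */
		in_run = match;
	}
	return runs;
}
```

With `refs/fooa`, `refs/foobar` and `refs/fooc` in sorted order, `refs/foo[ac]` yields two runs, because `refs/foobar` sits between them.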

> > +	for (pattern = excluded_patterns; *pattern; pattern++) {
> > +		struct skip_list_entry *e;
> > +
> > +		/*
> > +		 * We can't feed any excludes with globs in them to the
> > +		 * refs machinery.  It only understands prefix matching.
> > +		 * We likewise can't even feed the string leading up to
> > +		 * the first meta-character, as something like "foo[a]"
> > +		 * should not exclude "foobar" (but the prefix "foo"
> > +		 * would match that and mark it for exclusion).
> > +		 */
> > +		if (has_glob_special(*pattern))
> > +			continue;
> > +
> > +		ALLOC_GROW(iter->skip, iter->skip_nr + 1, iter->skip_alloc);
> > +
> > +		e = &iter->skip[iter->skip_nr++];
> > +		e->start = find_reference_location(snapshot, *pattern, 0);
> > +		e->end = find_reference_location_end(snapshot, *pattern, 0);
>
> One thing that makes this hard to reason about is that we don't
> explicitly handle the case where the pattern doesn't match at all. So
> you require a bunch of knowledge about what exactly the functions
> `find_reference_location()` and `find_reference_location_end()` do in
> that case in order to determine whether we will end up doing the right
> thing.
>
> Explicitly handling this would give us some benefits:
>
> - It makes the code more obvious.
>
> - We can avoid an extra skip list entry for every non-matching
>   pattern.
>
> - We wouldn't have to perform binary search over the snapshot twice.
>
> Might be I'm missing something though.

We handle it implicitly via the mustexist parameter that both
find_reference_location() and find_reference_location_end() take: when
it is zero, they return the position where a matching entry would be
found if one does not already exist.

You're right that it would save you a second binary search, but that is
likely to be a vanishingly small cost compared to the actual traversal,
disk I/O, etc.

I could imagine modifying the signature of both of those functions to be
something like:

    int find_reference_location(struct snapshot *, const char *pattern,
                                int mustexist, const char **out)

Which would propagate the result through `out`, and the return value
would be whether or not it found a matching entry. But that only really
makes sense when `mustexist` is set to 0, and it adds some verbosity
throughout.

I think that it's totally possible to avoid the second search, but I'm
not sure that the cost of additional complexity and verbosity is worth
the benefit of avoiding one binary search (which will likely be
resident, anyway).

Note though that if e->start == e->end here we will discard the empty
skip region below.
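A minimal sketch of the `mustexist == 0` behavior, assuming a sorted in-memory array of refnames in place of the mmap'd `packed-refs` buffer. `find_location()` and `find_location_end()` are hypothetical stand-ins, not the real functions. When nothing matches, both searches land on the same insertion point, so the "did it match?" answer Patrick asks about is simply `start != end`, which is also what lets the merge step drop empty regions.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* First index whose refname sorts at or after `prefix` (a lower bound). */
static size_t find_location(const char **refs, size_t nr, const char *prefix)
{
	size_t lo = 0, hi = nr;

	while (lo < hi) {
		size_t mid = lo + (hi - lo) / 2;
		if (strcmp(refs[mid], prefix) < 0)
			lo = mid + 1;
		else
			hi = mid;
	}
	return lo;
}

/*
 * First index after the run of refnames that start with `prefix`.
 * Within [lower bound, nr), entries with the prefix compare equal on
 * the first strlen(prefix) bytes and all later entries compare greater,
 * so a second binary search finds the boundary.
 */
static size_t find_location_end(const char **refs, size_t nr, const char *prefix)
{
	size_t len = strlen(prefix);
	size_t lo = find_location(refs, nr, prefix), hi = nr;

	while (lo < hi) {
		size_t mid = lo + (hi - lo) / 2;
		if (strncmp(refs[mid], prefix, len) <= 0)
			lo = mid + 1;
		else
			hi = mid;
	}
	return lo;
}
```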

> > +	for (i = 1, j = 1; i < iter->skip_nr; i++) {
> > +		struct skip_list_entry *ours = &iter->skip[i];
> > +		struct skip_list_entry *prev = &iter->skip[i - 1];
> > +
> > +		if (ours->start == ours->end) {
> > +			/* ignore empty regions (no matching entries) */
> > +			continue;
> > +		} else if (prev->end >= ours->start) {
> > +			/* overlapping regions extend the previous one */
> > +			prev->end = ptr_max(prev->end, ours->end);
> > +		} else {
> > +			/* otherwise, insert a new region */
> > +			iter->skip[j++] = *ours;
> > +		}
> > +	}
>
> Mh. Does this do the right thing in case we have multiple consecutive
> overlapping skip list entries? We always end up adjusting the immediate
> predecessor as we use `i - 1` to find `prev`. Shouldn't we instead start
> with `j = 0` and assign `prev = &iter->skip[j]`?

Good catch. I think applying this on top should do the trick:

--- 8< ---
diff --git a/refs/packed-backend.c b/refs/packed-backend.c
index 137a4233f6..3b1337267a 100644
--- a/refs/packed-backend.c
+++ b/refs/packed-backend.c
@@ -1054,9 +1054,9 @@ static void populate_excluded_skip_list(struct packed_ref_iterator *iter,
 	 * we want to combine that into a single entry jumping from A to
 	 * C.
 	 */
-	for (i = 1, j = 1; i < iter->skip_nr; i++) {
+	for (i = 0, j = 0; i < iter->skip_nr; i++) {
 		struct skip_list_entry *ours = &iter->skip[i];
-		struct skip_list_entry *prev = &iter->skip[i - 1];
+		struct skip_list_entry *prev = &iter->skip[j];

 		if (ours->start == ours->end) {
 			/* ignore empty regions (no matching entries) */
@@ -1066,7 +1066,7 @@ static void populate_excluded_skip_list(struct packed_ref_iterator *iter,
 			prev->end = ptr_max(prev->end, ours->end);
 		} else {
 			/* otherwise, insert a new region */
-			iter->skip[j++] = *ours;
+			iter->skip[++j] = *ours;
 		}
 	}
--- >8 ---

> > +test_expect_success 'for_each_ref__exclude(refs/heads/foo/)' '
> > +	# region in middle
> > +	for_each_ref__exclude refs/heads refs/heads/foo >actual &&
> > +	for_each_ref refs/heads/bar refs/heads/baz refs/heads/quux >expect &&
> > +
> > +	test_cmp expect actual
> > +'
>
> Nit: it might be a bit more readable if we put the comment into the test
> description instead of having an opaque description that mentions ref
> names that don't have much of a meaning without reading the test itself.

Fair enough ;-).

Thanks,
Taylor

* Re: [PATCH 09/15] refs/packed-backend.c: implement skip lists to avoid excluded pattern(s)
  2023-05-09 20:55     ` Taylor Blau
@ 2023-05-09 21:15       ` Taylor Blau
  2023-05-10  7:25       ` Patrick Steinhardt
  1 sibling, 0 replies; 149+ messages in thread
From: Taylor Blau @ 2023-05-09 21:15 UTC (permalink / raw)
  To: Patrick Steinhardt; +Cc: git, Jeff King, Derrick Stolee, Junio C Hamano

On Tue, May 09, 2023 at 04:55:55PM -0400, Taylor Blau wrote:
> Good catch. I think applying this on top should do the trick:
>
> --- 8< ---
> diff --git a/refs/packed-backend.c b/refs/packed-backend.c
> index 137a4233f6..3b1337267a 100644
> --- a/refs/packed-backend.c
> +++ b/refs/packed-backend.c
> @@ -1054,9 +1054,9 @@ static void populate_excluded_skip_list(struct packed_ref_iterator *iter,
>  	 * we want to combine that into a single entry jumping from A to
>  	 * C.
>  	 */
> -	for (i = 1, j = 1; i < iter->skip_nr; i++) {
> +	for (i = 0, j = 0; i < iter->skip_nr; i++) {
>  		struct skip_list_entry *ours = &iter->skip[i];
> -		struct skip_list_entry *prev = &iter->skip[i - 1];
> +		struct skip_list_entry *prev = &iter->skip[j];
>
>  		if (ours->start == ours->end) {
>  			/* ignore empty regions (no matching entries) */
> @@ -1066,7 +1066,7 @@ static void populate_excluded_skip_list(struct packed_ref_iterator *iter,
>  			prev->end = ptr_max(prev->end, ours->end);
>  		} else {
>  			/* otherwise, insert a new region */
> -			iter->skip[j++] = *ours;
> +			iter->skip[++j] = *ours;
>  		}
>  	}
> --- >8 ---

Oops, this is wrong. It should be something like this instead:

--- 8< ---
diff --git a/refs/packed-backend.c b/refs/packed-backend.c
index 137a4233f6..574f32d67f 100644
--- a/refs/packed-backend.c
+++ b/refs/packed-backend.c
@@ -1007,6 +1007,7 @@ static void populate_excluded_skip_list(struct packed_ref_iterator *iter,
 {
 	size_t i, j;
 	const char **pattern;
+	struct skip_list_entry *last_disjoint;

 	if (!excluded_patterns)
 		return;
@@ -1054,19 +1055,22 @@ static void populate_excluded_skip_list(struct packed_ref_iterator *iter,
 	 * we want to combine that into a single entry jumping from A to
 	 * C.
 	 */
+	last_disjoint = iter->skip;
+
 	for (i = 1, j = 1; i < iter->skip_nr; i++) {
 		struct skip_list_entry *ours = &iter->skip[i];
-		struct skip_list_entry *prev = &iter->skip[i - 1];

 		if (ours->start == ours->end) {
 			/* ignore empty regions (no matching entries) */
 			continue;
-		} else if (prev->end >= ours->start) {
+		} else if (ours->start <= last_disjoint->end) {
 			/* overlapping regions extend the previous one */
-			prev->end = ptr_max(prev->end, ours->end);
+			last_disjoint->end = ptr_max(last_disjoint->end, ours->end);
 		} else {
 			/* otherwise, insert a new region */
 			iter->skip[j++] = *ours;
+			last_disjoint = ours;
+
 		}
 	}
--- >8 ---
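The merge step can be sketched with byte offsets in place of the iterator's pointers and a hypothetical `struct region`. The sketch records the last kept region by its slot in the compacted output (`r[j - 1]`), so every extension is applied to an entry that survives compaction, and it treats touching regions such as [[A, B], [B, C]] as overlapping.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical region type: skip records in [start, end). */
struct region {
	size_t start, end;
};

static int region_cmp(const void *va, const void *vb)
{
	const struct region *a = va, *b = vb;

	if (a->start != b->start)
		return a->start < b->start ? -1 : 1;
	return 0;
}

/*
 * Sort regions by start, drop empty ones, and merge overlapping or
 * adjacent ones in place. Returns the new number of regions.
 */
static size_t merge_regions(struct region *r, size_t nr)
{
	size_t i, j = 0;

	qsort(r, nr, sizeof(*r), region_cmp);
	for (i = 0; i < nr; i++) {
		if (r[i].start == r[i].end)
			continue; /* empty: the pattern matched nothing */
		if (j && r[i].start <= r[j - 1].end) {
			/* overlaps or touches the last kept region: extend it */
			if (r[i].end > r[j - 1].end)
				r[j - 1].end = r[i].end;
		} else {
			r[j++] = r[i]; /* disjoint: keep as a new region */
		}
	}
	return j;
}
```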

Thanks,
Taylor

* Re: [PATCH 14/15] upload-pack.c: avoid enumerating hidden refs where possible
  2023-05-09 15:15   ` Patrick Steinhardt
@ 2023-05-09 21:34     ` Taylor Blau
  0 siblings, 0 replies; 149+ messages in thread
From: Taylor Blau @ 2023-05-09 21:34 UTC (permalink / raw)
  To: Patrick Steinhardt; +Cc: git, Jeff King, Derrick Stolee, Junio C Hamano

On Tue, May 09, 2023 at 05:15:50PM +0200, Patrick Steinhardt wrote:
> On Mon, May 08, 2023 at 06:00:26PM -0400, Taylor Blau wrote:
> > @@ -601,11 +601,32 @@ static int get_common_commits(struct upload_pack_data *data,
> >  	}
> >  }
> >
> > +static int allow_hidden_refs(enum allow_uor allow_uor)
> > +{
> > +	return allow_uor & (ALLOW_TIP_SHA1 | ALLOW_REACHABLE_SHA1);
> > +}
> > +
> > +static void for_each_namespaced_ref_1(each_ref_fn fn,
> > +				      struct upload_pack_data *data)
>
> I know it's common practice in the Git project, but personally I tend to
> fight with functions that have a `_1` suffix. You simply cannot tell
> what the difference is to the non-suffixed variant without checking its
> declaration.
>
> `for_each_namespaced_ref_with_optional_hidden_refs()` is definitely a
> mouthful though, and I can't really think of something better either.

Yeah, I know. It's not my favorite convention, either, but I also
couldn't come up with anything shorter ;-).

> > +{
> > +	/*
> > +	 * If `data->allow_uor` allows updating hidden refs, we need to
> > +	 * mark all references (including hidden ones), to check in
> > +	 * `is_our_ref()` below.
>
> Doesn't this influence whether somebody can _fetch_ objects pointed to
> by the hidden refs instead of _updating_ them?

Oops, yes. Thanks for catching my obvious typo.
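The decision in the quoted hunk can be sketched as follows. The enum values are assumed to mirror `enum allow_uor` in `upload-pack.c`, and `pick_excludes()` is a hypothetical helper. When clients may fetch objects at or reachable from hidden refs, those refs still need to be visited so they can be marked for `is_our_ref()`. Only otherwise can they be handed to the refs machinery as excluded patterns.

```c
#include <assert.h>
#include <stddef.h>

/* Assumed to mirror upload-pack.c's enum allow_uor. */
enum allow_uor {
	ALLOW_TIP_SHA1 = 0x01,
	ALLOW_REACHABLE_SHA1 = 0x02,
	ALLOW_ANY_SHA1 = 0x07
};

static int allow_hidden_refs(enum allow_uor allow_uor)
{
	return allow_uor & (ALLOW_TIP_SHA1 | ALLOW_REACHABLE_SHA1);
}

/*
 * Hypothetical helper: choose the exclude list to pass to the ref
 * iterator. NULL means "visit everything, including hidden refs".
 */
static const char **pick_excludes(enum allow_uor allow_uor,
				  const char **hidden_refs)
{
	return allow_hidden_refs(allow_uor) ? NULL : hidden_refs;
}
```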

Thanks,
Taylor

* Re: [PATCH 09/15] refs/packed-backend.c: implement skip lists to avoid excluded pattern(s)
  2023-05-08 22:00 ` [PATCH 09/15] refs/packed-backend.c: implement skip lists to avoid excluded pattern(s) Taylor Blau
  2023-05-09  0:10   ` Chris Torek
  2023-05-09 15:15   ` Patrick Steinhardt
@ 2023-05-09 23:40   ` Junio C Hamano
  2023-05-10  2:30     ` Taylor Blau
  2 siblings, 1 reply; 149+ messages in thread
From: Junio C Hamano @ 2023-05-09 23:40 UTC (permalink / raw)
  To: Taylor Blau; +Cc: git, Jeff King, Derrick Stolee

Taylor Blau <me@ttaylorr.com> writes:

> When iterating through the `packed-refs` file in order to answer a query
> like:
>
>     $ git for-each-ref --exclude=refs/__hidden__
>
> it would be useful to avoid walking over all of the entries in
> `refs/__hidden__/*` when possible, since we know that the ref-filter
> code is going to throw them away anyways.
>
> In certain circumstances, doing so is possible. The algorithm for doing
> so is as follows:
>
>   - For each excluded pattern, find the first record that matches it,
>     and the first pattern that *doesn't* match it (i.e. the location
>     you'd next want to consider when excluding that pattern).

Do we find "record" and then "pattern", or is the latter a misspelt
"record"?  I will assume it is the latter while reading the rest.
Is the latter "the record that does not match the pattern, whose
record number is the smallest but yet larger than the first record
that matches the pattern?"  That is we assume that the refs in the
packed refs file are sorted and can be partitioned into three by
each pattern: (1) refs before the first matching record---they do
not match the pattern. (2) refs at or after the first matching
record that match.  (3) refs after (2) that do not match.

I am not sure if that would work.  What if the pattern were refs/tags/[ad-z]*
and there are packed tags refs/tags/{a,b,c,d,e}?  The pattern would partition
the refs into [a](matches), [b,c](does not match), [d,e](matches).

Perhaps I am grossly misunderstanding what the above explanation
says.

>   - Sort the patterns by their starting location within the
>     `packed-refs` file.

So the idea is that the patterns are sorted by the first record they
match, and after sorting these patterns, refs that are between the
beginning of the list of refs and the first record associated with
the first pattern will not match _any_ pattern in that list?

>   - Construct a skip list of regions by combining adjacent and
>     overlapping regions from the previous step.

"skip list of regions" -> "list of regions to skip", I guess.

>   - When iterating through the `packed-refs` file, if `iter->pos` is
>     ever contained in one of the regions from the previous steps,
>     advance `iter->pos` past the end of that region, and continue
>     enumeration.
>
> Note that this optimization is only possible when none of the excluded
> pattern(s) have special meta-characters in them.

Ah, so this is called "patterns" but it deals with literal patterns
without globs?  Then I'll stop worrying about the refs/tags/[ad-z]
counter-example.

> To see why this is the
> case, consider the exclusion pattern "refs/foo[a]". In general, in order
> to find the location of the first record that matches this pattern, we
> could only consider up to the first meta-character, "refs/foo". But this
> doesn't work, since the excluded region we'd come up with would include
> "refs/foobar", even though it is not excluded.

OK.

> Using the skip list is fairly straightforward (see the changes to
> `refs/packed-backend.c::next_record()`), but constructing the list is
> not. To ensure that the construction is correct, add a new suite of
> tests in t1419 covering various corner cases (overlapping regions,
> partially overlapping regions, adjacent regions, etc.).

Sounds good.  Does this actually use the skip list data structure,
or do they happen to share the same two words in their names, but
otherwise have nothing common with each other?  If the latter, we
may want to revise the explanation, data type names, and variable
names, to avoid confusion, as Chris pointed out earlier.
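Consuming the merged regions during iteration can be sketched as below. It models records as indices rather than positions in the `packed-refs` buffer, and `struct jump_iter` and its fields are hypothetical. Because the regions are sorted and disjoint and the position only moves forward, one cursor into the region list is enough.

```c
#include <assert.h>
#include <stddef.h>

struct region {
	size_t start, end; /* skip records in [start, end) */
};

/* Hypothetical iterator over record indices 0..nr-1. */
struct jump_iter {
	size_t pos, nr;
	const struct region *jump; /* sorted, disjoint, non-empty */
	size_t jump_nr, jump_cur;
};

/* Return the next record index outside every region, or nr when done. */
static size_t next_record(struct jump_iter *it)
{
	while (it->jump_cur < it->jump_nr) {
		const struct region *r = &it->jump[it->jump_cur];

		if (it->pos < r->start)
			break;            /* next region is still ahead of us */
		if (it->pos < r->end)
			it->pos = r->end; /* inside a region: jump past it */
		it->jump_cur++;           /* this region is now behind us */
	}
	return it->pos < it->nr ? it->pos++ : it->nr;
}
```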

> +static void populate_excluded_skip_list(struct packed_ref_iterator *iter,
> +					struct snapshot *snapshot,
> +					const char **excluded_patterns)
> +{
> +	size_t i, j;
> +	const char **pattern;
> +
> +	if (!excluded_patterns)
> +		return;
> +
> +	for (pattern = excluded_patterns; *pattern; pattern++) {
> +		struct skip_list_entry *e;
> +
> +		/*
> +		 * We can't feed any excludes with globs in them to the
> +		 * refs machinery.  It only understands prefix matching.
> +		 * We likewise can't even feed the string leading up to
> +		 * the first meta-character, as something like "foo[a]"
> +		 * should not exclude "foobar" (but the prefix "foo"
> +		 * would match that and mark it for exclusion).
> +		 */
> +		if (has_glob_special(*pattern))
> +			continue;

Hmph.  I would have expected that a set of patterns with any one
pattern with glob would invalidate the whole skip optimization, but
it is nice if we can salvage such a set and still optimize, if only
for literal patterns.  Interesting.

Ah, that is because we are dealing with ranges that cannot possibly
match.  With the mention of "first record that matches" etc. in the
earlier description, I somehow misled myself that we are dealing
with ranges that have interesting records.  So, a pattern with glob
does not contribute any range to be skipped, but that is OK.
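The per-pattern filtering described here, where a glob pattern simply contributes no region while literal patterns still do, can be sketched like this. `has_glob_chars()` is a hypothetical stand-in for Git's `has_glob_special()`, assumed here to treat `*`, `?`, `[` and `\` as special.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for has_glob_special(). */
static int has_glob_chars(const char *s)
{
	return strpbrk(s, "*?[\\") != NULL;
}

/*
 * Copy the literal (glob-free) patterns into `out` and return how many
 * there are. Glob patterns contribute no region to skip; refs they
 * match are still enumerated and left for the caller to discard.
 */
static size_t literal_patterns(const char **patterns, const char **out)
{
	size_t n = 0;

	for (; patterns && *patterns; patterns++)
		if (!has_glob_chars(*patterns))
			out[n++] = *patterns;
	return n;
}
```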

> +		ALLOC_GROW(iter->skip, iter->skip_nr + 1, iter->skip_alloc);
> +
> +		e = &iter->skip[iter->skip_nr++];
> +		e->start = find_reference_location(snapshot, *pattern, 0);
> +		e->end = find_reference_location_end(snapshot, *pattern, 0);

So, iter->skip[] array has one range per pattern at most, but some
patterns may not contribute any range to the list.

> +	}
> +
> +	if (!iter->skip_nr) {
> +		/*
> +		 * Every entry in exclude_patterns has a meta-character,
> +		 * nothing to do here.
> +		 */
> +		return;
> +	}
> +
> +	QSORT(iter->skip, iter->skip_nr, skip_list_entry_cmp);
> +
> +	/*
> +	 * As an optimization, merge adjacent entries in the skip list
> +	 * to jump forwards as far as possible when entering a skipped
> +	 * region.
> +	 *
> +	 * For example, if we have two skipped regions:
> +	 *
> +	 *	[[A, B], [B, C]]

I am confused.  The first pattern may never match records in [A..B]
range, and the second pattern may never match records in [B..C]
range, but what does it mean to combine these two ranges?

> +	 * we want to combine that into a single entry jumping from A to
> +	 * C.
> +	 */
> +	for (i = 1, j = 1; i < iter->skip_nr; i++) {
> +		struct skip_list_entry *ours = &iter->skip[i];
> +		struct skip_list_entry *prev = &iter->skip[i - 1];


> +		if (ours->start == ours->end) {
> +			/* ignore empty regions (no matching entries) */
> +			continue;
> +		} else if (prev->end >= ours->start) {
> +			/* overlapping regions extend the previous one */
> +			prev->end = ptr_max(prev->end, ours->end);
> +		} else {
> +			/* otherwise, insert a new region */
> +			iter->skip[j++] = *ours;
> +		}

None of the {braces} seem needed, but OK.  

> +	}
> +
> +	iter->skip_nr = j;
> +	iter->skip_pos = 0;
> +}

* Re: [PATCH 09/15] refs/packed-backend.c: implement skip lists to avoid excluded pattern(s)
  2023-05-09 23:40   ` Junio C Hamano
@ 2023-05-10  2:30     ` Taylor Blau
  0 siblings, 0 replies; 149+ messages in thread
From: Taylor Blau @ 2023-05-10  2:30 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: git, Jeff King, Derrick Stolee

On Tue, May 09, 2023 at 04:40:50PM -0700, Junio C Hamano wrote:
> Taylor Blau <me@ttaylorr.com> writes:
>
> > When iterating through the `packed-refs` file in order to answer a query
> > like:
> >
> >     $ git for-each-ref --exclude=refs/__hidden__
> >
> > it would be useful to avoid walking over all of the entries in
> > `refs/__hidden__/*` when possible, since we know that the ref-filter
> > code is going to throw them away anyways.
> >
> > In certain circumstances, doing so is possible. The algorithm for doing
> > so is as follows:
> >
> >   - For each excluded pattern, find the first record that matches it,
> >     and the first pattern that *doesn't* match it (i.e. the location
> >     you'd next want to consider when excluding that pattern).
>
> Do we find "record" and then "pattern", or is the latter a misspelt
> "record"?  I will assume it is the latter while reading the rest.
> Is the latter "the record that does not match the pattern, whose
> record number is the smallest but yet larger than the first record
> that matches the pattern?"  That is we assume that the refs in the
> packed refs file are sorted and can be partitioned into three by
> each pattern: (1) refs before the first matching record---they do
> not match the pattern. (2) refs at or after the first matching
> record that match.  (3) refs after (2) that do not match.
>
> I am not sure if that would work.  What if the pattern were refs/tags/[ad-z]*
> and there are packed tags refs/tags/{a,b,c,d,e}?  The pattern would partition
> the refs into [a](matches), [b,c](does not match), [d,e](matches).
>
> Perhaps I am grossly misunderstanding what the above explanation
> says.

Sorry, you're right to assume the latter. This should read "..., find
the record that matches it, and the first record that *doesn't* match."

In the example you give, we would ignore that whole pattern and
enumerate everything (including entries that match 'refs/tags/[ad-z]*'),
and the caller is expected to discard them.

> >   - Sort the patterns by their starting location within the
> >     `packed-refs` file.
>
> So the idea is that the patterns are sorted by the first record they
> match, and after sorting these patterns, refs that are between the
> beginning of the list of refs and the first record associated with
> the first pattern will not match _any_ pattern in that list?

My patch could use some clarification here: it is much easier to treat
the patterns as unsorted, and then sort the ranges they match by where
each range begins.

> >   - Construct a skip list of regions by combining adjacent and
> >     overlapping regions from the previous step.
>
> "skip list of regions" -> "list of regions to skip", I guess.

Thanks, will update.

> > Using the skip list is fairly straightforward (see the changes to
> > `refs/packed-backend.c::next_record()`), but constructing the list is
> > not. To ensure that the construction is correct, add a new suite of
> > tests in t1419 covering various corner cases (overlapping regions,
> > partially overlapping regions, adjacent regions, etc.).
>
> Sounds good.  Does this actually use the skip list data structure,
> or do they happen to share the same two words in their names, but
> otherwise have nothing common with each other?  If the latter, we
> may want to revise the explanation, data type names, and variable
> names, to avoid confusion, as Chris pointed out earlier.

They have nothing to do with each other ;-). I made a note in this patch
in the revised version I'll send in the next day or two to note the
distinction. But I'm fine with renaming the whole concept to "jump list"
or something like that if you prefer.

> > +	for (pattern = excluded_patterns; *pattern; pattern++) {
> > +		struct skip_list_entry *e;
> > +
> > +		/*
> > +		 * We can't feed any excludes with globs in them to the
> > +		 * refs machinery.  It only understands prefix matching.
> > +		 * We likewise can't even feed the string leading up to
> > +		 * the first meta-character, as something like "foo[a]"
> > +		 * should not exclude "foobar" (but the prefix "foo"
> > +		 * would match that and mark it for exclusion).
> > +		 */
> > +		if (has_glob_special(*pattern))
> > +			continue;
>
> Hmph.  I would have expected that a set of patterns with any one
> pattern with glob would invalidate the whole skip optimization, but
> it is nice if we can salvage such a set and still optimize, if only
> for literal patterns.  Interesting.
>
> Ah, that is because we are dealing with ranges that cannot possibly
> match.  With the mention of "first record that matches" etc. in the
> earlier description, I somehow misled myself that we are dealing
> with ranges that have interesting records.  So, a pattern with glob
> does not contribute any range to be skipped, but that is OK.

Exactly right. This is all "best effort" anyway, since there are some
patterns that we cannot construct a skip list entry for in the general
case. So it's fine if we enumerate some references that match one or
more of the excluded patterns, because the caller is expected to drop
those results themselves.

> > +	/*
> > +	 * As an optimization, merge adjacent entries in the skip list
> > +	 * to jump forwards as far as possible when entering a skipped
> > +	 * region.
> > +	 *
> > +	 * For example, if we have two skipped regions:
> > +	 *
> > +	 *	[[A, B], [B, C]]
>
> I am confused.  The first pattern may never match records in [A..B]
> range, and the second pattern may never match records in [B..C]
> range, but what does it mean to combine these two ranges?

The patterns would match all records in their respective regions, but
since they are excluded we want to jump over those references instead of
iterating (and then discarding them later).

So if we have a jump from A->B, and another from B->C, it would be
correct (if wasteful) to perform two jumps to get from A to C. But we
can detect this case and combine adjacent/overlapping regions into a
single jump.

> > +		if (ours->start == ours->end) {
> > +			/* ignore empty regions (no matching entries) */
> > +			continue;
> > +		} else if (prev->end >= ours->start) {
> > +			/* overlapping regions extend the previous one */
> > +			prev->end = ptr_max(prev->end, ours->end);
> > +		} else {
> > +			/* otherwise, insert a new region */
> > +			iter->skip[j++] = *ours;
> > +		}
>
> None of the {braces} seem needed, but OK.

Yeah. I added the braces here intentionally to match the
CodingGuidelines, which state that this case is an exception.

Thanks,
Taylor

* Re: [PATCH 09/15] refs/packed-backend.c: implement skip lists to avoid excluded pattern(s)
  2023-05-09 20:55     ` Taylor Blau
  2023-05-09 21:15       ` Taylor Blau
@ 2023-05-10  7:25       ` Patrick Steinhardt
  1 sibling, 0 replies; 149+ messages in thread
From: Patrick Steinhardt @ 2023-05-10  7:25 UTC (permalink / raw)
  To: Taylor Blau; +Cc: git, Jeff King, Derrick Stolee, Junio C Hamano

On Tue, May 09, 2023 at 04:55:55PM -0400, Taylor Blau wrote:
> On Tue, May 09, 2023 at 05:15:43PM +0200, Patrick Steinhardt wrote:
> > On Mon, May 08, 2023 at 06:00:08PM -0400, Taylor Blau wrote:
> >
> > > Note that this optimization is only possible when none of the excluded
> > > pattern(s) have special meta-characters in them. To see why this is the
> > > case, consider the exclusion pattern "refs/foo[a]". In general, in order
> > > to find the location of the first record that matches this pattern, we
> > > could only consider up to the first meta-character, "refs/foo". But this
> > > doesn't work, since the excluded region we'd come up with would include
> > > "refs/foobar", even though it is not excluded.
> >
> > Is this generally true though? A naive implementation would iterate
> > through all references and find the first reference that matches the
> > exclusion regular expression. From thereon we continue to iterate until
> > we find the first entry that doesn't match. This may cause us to end up
> > with a suboptimal skip list, but the skip list would still be valid.
> >
> > As I said, this implementation would be naive as we're now forced to
> > iterate through all references starting at the beginning. I assume that
> > your implementation will instead use a binary search to locate the first
> > entry that matches the exclusion pattern and the last entry that matches it. But the
> > way this paragraph is formulated makes it sound like this is a general
> > fact, even though it is a fact that derives from the implementation.
> >
> > I of course don't propose to change the algorithm here, but instead to
> > clarify where this restriction actually comes from and why the tradeoff
> > makes sense.
> 
> In the example you include, it's possible. But consider something like:
> 
>     $ git for-each-ref --exclude='refs/foo[ac]'
> 
> The region that matches that expression ("refs/fooa", "refs/fooc" and
> everything underneath them) does not have to appear as a single
> contiguous region in the packed-refs file. If you have, say, "refs/foobar",
> that will appear between the two regions you want to exclude.
> 
> So I think you *might* be able to do it in general, but at the very
> least it would involve splitting each character class and finding the
> start and end of any region(s) that it matches.
> 
> Even so, you'd have to try and match each entry as you determine the
> width of the excluded region, at which point you're at par with
> enumerating them anyway and having the caller discard any entries it
> doesn't want.

Alternatively you could also do this on a best-effort basis and only
find the first matching region. But anyway, as I said: I'm fine with the
limitations but think that we should document better where they come
from. The current commit message sounds like the limitation is of a
general nature even though it is in fact a consciously chosen tradeoff
that allows us to make the implementation more efficient for most cases.
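
To make the "refs/foo[ac]" example concrete, here is a small sketch that computes the half-open region of a sorted ref list sharing a literal prefix, using two binary searches in the spirit of `find_reference_location()` and `find_reference_location_end()`. The helper names are hypothetical and not taken from the series, and treating each pattern as a plain byte-wise prefix is an assumption made for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical half-open region [start, end) of indices into a sorted list. */
struct region {
	size_t start;
	size_t end;
};

/*
 * Binary search over the sorted array `refs` for the first index whose
 * first strlen(prefix) bytes compare >= prefix (strict == 0) or
 * > prefix (strict == 1). Truncating to the prefix length keeps the
 * comparison monotonic over a sorted list, so the search is valid.
 */
static size_t prefix_bound(const char **refs, size_t nr, const char *prefix,
			   int strict)
{
	size_t lo = 0, hi = nr, len = strlen(prefix);

	while (lo < hi) {
		size_t mid = lo + (hi - lo) / 2;
		int cmp = strncmp(refs[mid], prefix, len);

		if (cmp < 0 || (strict && cmp == 0))
			lo = mid + 1;
		else
			hi = mid;
	}
	return lo;
}

/* The contiguous run of refs that start with `prefix` (possibly empty). */
static struct region region_for_prefix(const char **refs, size_t nr,
				       const char *prefix)
{
	struct region r;

	r.start = prefix_bound(refs, nr, prefix, 0);
	r.end = prefix_bound(refs, nr, prefix, 1);
	assert(r.start <= r.end); /* holds as long as refs is sorted */
	return r;
}
```

With the sorted list refs/foo, refs/fooa, refs/fooa/x, refs/foobar, refs/fooc, refs/fooc/y, refs/food, the prefixes "refs/fooa" and "refs/fooc" give the regions [1, 3) and [4, 6), with "refs/foobar" at index 3 sitting between them. Truncating the pattern at its first meta-character ("refs/foo") instead gives [0, 7), which would wrongly swallow "refs/foobar".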

Patrick

* [PATCH v2 00/16] refs: implement jump lists for packed backend
  2023-05-08 21:59 [PATCH 00/15] refs: implement skip lists for packed backend Taylor Blau
                   ` (14 preceding siblings ...)
  2023-05-08 22:00 ` [PATCH 15/15] builtin/receive-pack.c: avoid enumerating hidden references Taylor Blau
@ 2023-05-15 19:23 ` Taylor Blau
  2023-05-15 19:23   ` [PATCH v2 01/16] refs.c: rename `ref_filter` Taylor Blau
                     ` (16 more replies)
  2023-06-07 10:40 ` [PATCH v3 " Taylor Blau
                   ` (2 subsequent siblings)
  18 siblings, 17 replies; 149+ messages in thread
From: Taylor Blau @ 2023-05-15 19:23 UTC (permalink / raw)
  To: git
  Cc: Chris Torek, Derrick Stolee, Jeff King, Junio C Hamano,
	Patrick Steinhardt

Here is a reroll of my series to implement jump (née skip) lists for the
packed refs backend.

Not a ton has changed since last time, but some notable things that have
changed include:

  - Renaming "skip lists" to "jump lists" to clarify that this
    implementation does not use the skip list data structure.
  - Patch reorganization, splitting out `find_reference_location_end()`
    more sensibly, rewording patch messages, etc.
  - Addressing feedback from Junio's and Patrick Steinhardt's helpful
    reviews.
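
As background for the rename: a "jump list" here is just a sorted array of disjoint half-open regions, consulted by a cursor that only moves forward. It is not the probabilistic skip list data structure. The sketch below models that with hypothetical types (`struct jump`, `struct cursor`) and record indices in place of byte offsets. `apply_jumps()` loosely mirrors the loop that the "implement jump lists" patch adds to `next_record()`, where each region is considered at most once.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for `struct jump_list_entry`: half-open [start, end). */
struct jump {
	size_t start;
	size_t end;
};

/* Hypothetical iterator state: record index plus the sorted, disjoint jumps. */
struct cursor {
	size_t pos;
	const struct jump *jumps;
	size_t jump_nr;
	size_t jump_pos; /* first jump not yet consumed */
};

/*
 * Called before reading the record at c->pos. Jumps that begin at or
 * before the cursor are consumed; if the cursor is inside one, it moves
 * to that jump's end. jump_pos only increases, so each jump is examined
 * at most once over the whole iteration.
 */
static void apply_jumps(struct cursor *c)
{
	while (c->jump_pos < c->jump_nr) {
		const struct jump *curr = &c->jumps[c->jump_pos];

		assert(curr->start <= curr->end);
		if (c->pos < curr->start)
			break; /* not at the next jump yet */

		c->jump_pos++;
		if (c->pos < curr->end) {
			c->pos = curr->end;
			break;
		}
	}
}

/* Store in `out` the indices of records 0..nr_records-1 not jumped over. */
static size_t walk(size_t nr_records, const struct jump *jumps,
		   size_t jump_nr, size_t *out)
{
	struct cursor c = { 0, jumps, jump_nr, 0 };
	size_t n = 0;

	for (;;) {
		apply_jumps(&c);
		if (c.pos >= nr_records)
			break;
		out[n++] = c.pos++;
	}
	return n;
}
```

With 10 records and the jumps [2, 4) and [7, 9), the cursor visits records 0, 1, 4, 5, 6 and 9.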

As usual, a range-diff is included below for convenience.

Given that we are expecting -rc0 today, we should aim to not let review
of this topic direct our attention away from testing the release
candidates. We can get more serious about it on the other side of 2.41.

Thanks in advance for another look.

Jeff King (5):
  refs.c: rename `ref_filter`
  ref-filter.h: provide `REF_FILTER_INIT`
  ref-filter: clear reachable list pointers after freeing
  ref-filter: add `ref_filter_clear()`
  ref-filter.c: parameterize match functions over patterns

Taylor Blau (11):
  builtin/for-each-ref.c: add `--exclude` option
  refs: plumb `exclude_patterns` argument throughout
  refs/packed-backend.c: refactor `find_reference_location()`
  refs/packed-backend.c: implement jump lists to avoid excluded
    pattern(s)
  refs/packed-backend.c: add trace2 counters for jump list
  revision.h: store hidden refs in a `strvec`
  refs/packed-backend.c: ignore complicated hidden refs rules
  refs.h: let `for_each_namespaced_ref()` take excluded patterns
  builtin/receive-pack.c: avoid enumerating hidden references
  upload-pack.c: avoid enumerating hidden refs where possible
  ls-refs.c: avoid enumerating hidden refs where possible

 Documentation/git-for-each-ref.txt |   6 +
 builtin/branch.c                   |   4 +-
 builtin/for-each-ref.c             |   7 +-
 builtin/receive-pack.c             |   7 +-
 builtin/tag.c                      |   4 +-
 http-backend.c                     |   2 +-
 ls-refs.c                          |   8 +-
 ref-filter.c                       |  63 ++++++--
 ref-filter.h                       |  12 ++
 refs.c                             |  61 ++++----
 refs.h                             |  15 +-
 refs/debug.c                       |   5 +-
 refs/files-backend.c               |   5 +-
 refs/packed-backend.c              | 226 ++++++++++++++++++++++++++---
 refs/refs-internal.h               |   7 +-
 revision.c                         |   4 +-
 revision.h                         |   5 +-
 t/helper/test-reach.c              |   2 +-
 t/helper/test-ref-store.c          |  10 ++
 t/t0041-usage.sh                   |   1 +
 t/t1419-exclude-refs.sh            | 131 +++++++++++++++++
 t/t3402-rebase-merge.sh            |   1 +
 t/t6300-for-each-ref.sh            |  35 +++++
 trace2.h                           |   2 +
 trace2/tr2_ctr.c                   |   5 +
 upload-pack.c                      |  43 ++++--
 26 files changed, 565 insertions(+), 106 deletions(-)
 create mode 100755 t/t1419-exclude-refs.sh

Range-diff against v1:
 1:  2225f79941 =  1:  6cac42e70e refs.c: rename `ref_filter`
 2:  ea1c7834db =  2:  8dda7db738 ref-filter.h: provide `REF_FILTER_INIT`
 3:  7fe8623f60 =  3:  bf21df783d ref-filter: clear reachable list pointers after freeing
 4:  c804ba3620 !  4:  85ecb70957 ref-filter: add ref_filter_clear()
    @@ Metadata
     Author: Jeff King <peff@peff.net>
     
      ## Commit message ##
    -    ref-filter: add ref_filter_clear()
    +    ref-filter: add `ref_filter_clear()`
     
    -    We did not bother to clean up at all in branch/tag, and for-each-ref
    -    only hit a few elements. So this is probably cleaning up leaks, but I
    -    didn't check yet.
    +    We did not bother to clean up at all in `git branch` or `git tag`, and
    +    `git for-each-ref` only cleans up a couple of members.
     
    -    Note that the reachable_from and unreachable_from lists should be
    +    Add and call `ref_filter_clear()` when cleaning up a `struct
    +    ref_filter`. Running this patch (without any test changes) indicates a
    +    couple of now leak-free tests. This was found by running:
    +
    +        $ make SANITIZE=leak
    +        $ make -C t GIT_TEST_PASSING_SANITIZE_LEAK=check GIT_TEST_OPTS=--immediate
    +
    +    (Note that the `reachable_from` and `unreachable_from` lists should be
         cleaned as they are used. So this is just covering any case where we
    -    might bail before running the reachability check.
    +    might bail before running the reachability check.)
    +
    +    Signed-off-by: Jeff King <peff@peff.net>
    +    Signed-off-by: Taylor Blau <me@ttaylorr.com>
     
      ## builtin/branch.c ##
     @@ builtin/branch.c: int cmd_branch(int argc, const char **argv, const char *prefix)
    @@ ref-filter.h: void filter_ahead_behind(struct repository *r,
     +void ref_filter_clear(struct ref_filter *filter);
     +
      #endif /*  REF_FILTER_H  */
    +
    + ## t/t0041-usage.sh ##
    +@@ t/t0041-usage.sh: test_description='Test commands behavior when given invalid argument value'
    + GIT_TEST_DEFAULT_INITIAL_BRANCH_NAME=main
    + export GIT_TEST_DEFAULT_INITIAL_BRANCH_NAME
    + 
    ++TEST_PASSES_SANITIZE_LEAK=true
    + . ./test-lib.sh
    + 
    + test_expect_success 'setup ' '
    +
    + ## t/t3402-rebase-merge.sh ##
    +@@ t/t3402-rebase-merge.sh: test_description='git rebase --merge test'
    + GIT_TEST_DEFAULT_INITIAL_BRANCH_NAME=main
    + export GIT_TEST_DEFAULT_INITIAL_BRANCH_NAME
    + 
    ++TEST_PASSES_SANITIZE_LEAK=true
    + . ./test-lib.sh
    + 
    + T="A quick brown fox
 5:  c54000f5f5 !  5:  385890b459 ref-filter.c: parameterize match functions over patterns
    @@ Commit message
         Tweak `match_pattern()` and `match_name_as_path()` to take an array of
         patterns to prepare for passing either in.
     
    +    Once we start passing either in, `match_pattern()` will have little to
    +    do with a particular `struct ref_filter *` instance. To clarify this,
    +    drop it from the argument list, and replace it with the only bit of the
    +    `ref_filter` that we care about (`filter->ignore_case`).
    +
         Co-authored-by: Taylor Blau <me@ttaylorr.com>
         Signed-off-by: Jeff King <peff@peff.net>
         Signed-off-by: Taylor Blau <me@ttaylorr.com>
    @@ ref-filter.c: static int get_ref_atom_value(struct ref_array_item *ref, int atom
       * matches "refs/heads/mas*", too).
       */
     -static int match_pattern(const struct ref_filter *filter, const char *refname)
    -+static int match_pattern(const struct ref_filter *filter,
    -+			 const char **patterns,
    -+			 const char *refname)
    ++static int match_pattern(const char **patterns, const char *refname,
    ++			 const int ignore_case)
      {
     -	const char **patterns = filter->name_patterns;
      	unsigned flags = 0;
      
    - 	if (filter->ignore_case)
    +-	if (filter->ignore_case)
    ++	if (ignore_case)
    + 		flags |= WM_CASEFOLD;
    + 
    + 	/*
     @@ ref-filter.c: static int match_pattern(const struct ref_filter *filter, const char *refname)
       * matches a pattern "refs/heads/" but not "refs/heads/m") or a
       * wildcard (e.g. the same ref matches "refs/heads/m*", too).
    @@ ref-filter.c: static int filter_pattern_match(struct ref_filter *filter, const c
     -		return match_name_as_path(filter, refname);
     -	return match_pattern(filter, refname);
     +		return match_name_as_path(filter, filter->name_patterns, refname);
    -+	return match_pattern(filter, filter->name_patterns, refname);
    ++	return match_pattern(filter->name_patterns, refname,
    ++			     filter->ignore_case);
      }
      
      /*
 6:  ea5c0ddc10 !  6:  1a3371a0a7 builtin/for-each-ref.c: add `--exclude` option
    @@ Documentation/git-for-each-ref.txt: OPTIONS
      
     
      ## builtin/for-each-ref.c ##
    -@@ builtin/for-each-ref.c: static char const * const for_each_ref_usage[] = {
    - 	N_("git for-each-ref [--points-at <object>]"),
    - 	N_("git for-each-ref [--merged [<commit>]] [--no-merged [<commit>]]"),
    - 	N_("git for-each-ref [--contains [<commit>]] [--no-contains [<commit>]]"),
    -+	N_("git for-each-ref [--exclude=<pattern> ...]"),
    - 	NULL
    - };
    - 
     @@ builtin/for-each-ref.c: int cmd_for_each_ref(int argc, const char **argv, const char *prefix)
      		OPT_INTEGER( 0 , "count", &maxcount, N_("show only <n> matched refs")),
      		OPT_STRING(  0 , "format", &format.format, N_("format"), N_("format to use for the output")),
    @@ builtin/for-each-ref.c: int cmd_for_each_ref(int argc, const char **argv, const
     
      ## ref-filter.c ##
     @@ ref-filter.c: static int filter_pattern_match(struct ref_filter *filter, const char *refname)
    - 	return match_pattern(filter, filter->name_patterns, refname);
    + 			     filter->ignore_case);
      }
      
     +static int filter_exclude_match(struct ref_filter *filter, const char *refname)
    @@ ref-filter.c: static int filter_pattern_match(struct ref_filter *filter, const c
     +		return 0;
     +	if (filter->match_as_path)
     +		return match_name_as_path(filter, filter->exclude.v, refname);
    -+	return match_pattern(filter, filter->exclude.v, refname);
    ++	return match_pattern(filter->exclude.v, refname, filter->ignore_case);
     +}
     +
      /*
 7:  f437cd83e2 !  7:  aa05549b6e refs: plumb `exclude_patterns` argument throughout
    @@ Commit message
         refs: plumb `exclude_patterns` argument throughout
     
         The subsequent patch will want to access an optional `excluded_patterns`
    -    array within refs/packed-backend.c. To do so, the refs subsystem needs
    -    to be updated to pass this value across a number of different locations.
    +    array within `refs/packed-backend.c` that will cull out certain
    +    references matching any of the given patterns on a best-effort basis.
    +
    +    To do so, the refs subsystem needs to be updated to pass this value
    +    across a number of different locations.
     
         Prepare for a future patch by introducing this plumbing now, passing
         NULLs at top-level APIs in order to make that patch less noisy and more
 8:  836a5665b7 !  8:  6002c568b5 refs/packed-backend.c: refactor `find_reference_location()`
    @@ Commit message
         Signed-off-by: Taylor Blau <me@ttaylorr.com>
     
      ## refs/packed-backend.c ##
    -@@ refs/packed-backend.c: static int cmp_packed_ref_records(const void *v1, const void *v2)
    -  * Compare a snapshot record at `rec` to the specified NUL-terminated
    -  * refname.
    -  */
    --static int cmp_record_to_refname(const char *rec, const char *refname)
    -+static int cmp_record_to_refname(const char *rec, const char *refname,
    -+				 int start)
    - {
    - 	const char *r1 = rec + the_hash_algo->hexsz + 1;
    - 	const char *r2 = refname;
    -@@ refs/packed-backend.c: static int cmp_record_to_refname(const char *rec, const char *refname)
    - 		if (*r1 == '\n')
    - 			return *r2 ? -1 : 0;
    - 		if (!*r2)
    --			return 1;
    -+			return start ? 1 : -1;
    - 		if (*r1 != *r2)
    - 			return (unsigned char)*r1 < (unsigned char)*r2 ? -1 : +1;
    - 		r1++;
     @@ refs/packed-backend.c: static int load_contents(struct snapshot *snapshot)
      	return 1;
      }
    @@ refs/packed-backend.c: static int load_contents(struct snapshot *snapshot)
     -static const char *find_reference_location(struct snapshot *snapshot,
     -					   const char *refname, int mustexist)
     +static const char *find_reference_location_1(struct snapshot *snapshot,
    -+					     const char *refname, int mustexist,
    -+					     int start)
    ++					     const char *refname, int mustexist)
      {
      	/*
      	 * This is not *quite* a garden-variety binary search, because
     @@ refs/packed-backend.c: static const char *find_reference_location(struct snapshot *snapshot,
    - 
    - 		mid = lo + (hi - lo) / 2;
    - 		rec = find_start_of_record(lo, mid);
    --		cmp = cmp_record_to_refname(rec, refname);
    -+		cmp = cmp_record_to_refname(rec, refname, start);
    - 		if (cmp < 0) {
    - 			lo = find_end_of_record(mid, hi);
    - 		} else if (cmp > 0) {
    -@@ refs/packed-backend.c: static const char *find_reference_location(struct snapshot *snapshot,
      		return lo;
      }
      
    @@ refs/packed-backend.c: static const char *find_reference_location(struct snapsho
     +static const char *find_reference_location(struct snapshot *snapshot,
     +					   const char *refname, int mustexist)
     +{
    -+	return find_reference_location_1(snapshot, refname, mustexist, 1);
    ++	return find_reference_location_1(snapshot, refname, mustexist);
     +}
     +
      /*
 9:  a39d1107c1 !  9:  8c78f49a8d refs/packed-backend.c: implement skip lists to avoid excluded pattern(s)
    @@ Metadata
     Author: Taylor Blau <me@ttaylorr.com>
     
      ## Commit message ##
    -    refs/packed-backend.c: implement skip lists to avoid excluded pattern(s)
    +    refs/packed-backend.c: implement jump lists to avoid excluded pattern(s)
     
         When iterating through the `packed-refs` file in order to answer a query
         like:
    @@ Commit message
         so is as follows:
     
           - For each excluded pattern, find the first record that matches it,
    -        and the first pattern that *doesn't* match it (i.e. the location
    +        and the first record that *doesn't* match it (i.e. the location
             you'd next want to consider when excluding that pattern).
     
    -      - Sort the patterns by their starting location within the
    -        `packed-refs` file.
    +      - Sort the set of excluded regions from the previous step in ascending
    +        order of the first location within the `packed-refs` file that
    +        matches.
     
    -      - Construct a skip list of regions by combining adjacent and
    -        overlapping regions from the previous step.
    +      - Clean up the results from the previous step: discard empty regions,
    +        and combine adjacent regions.
     
    -      - When iterating through the `packed-refs` file, if `iter->pos` is
    -        ever contained in one of the regions from the previous steps,
    -        advance `iter->pos` past the end of that region, and continue
    -        enumeration.
    +    Then when iterating through the `packed-refs` file, if `iter->pos` is
    +    ever contained in one of the regions from the previous steps, advance
    +    `iter->pos` past the end of that region, and continue enumeration.
     
    -    Note that this optimization is only possible when none of the excluded
    -    pattern(s) have special meta-characters in them. To see why this is the
    -    case, consider the exclusion pattern "refs/foo[a]". In general, in order
    -    to find the location of the first record that matches this pattern, we
    -    could only consider up to the first meta-character, "refs/foo". But this
    -    doesn't work, since the excluded region we'd come up with would include
    -    "refs/foobar", even though it is not excluded.
    +    Note that we only perform this optimization when none of the excluded
    +    pattern(s) have special meta-characters in them. For a pattern like
    +    "refs/foo[ac]", the excluded regions ("refs/fooa", "refs/fooc", and
    +    everything underneath them) are not connected. A future implementation
    +    that handles this case may split the character class (pretending as if
    +    two patterns were excluded: "refs/fooa", and "refs/fooc").
     
         There are a few other gotchas worth considering. First, note that the
    -    skip list is sorted, so once we skip past a region, we can avoid
    +    jump list is sorted, so once we jump past a region, we can avoid
         considering it (or any regions preceding it) again. The member
    -    `skip_pos` is used to track the first next-possible region to jump
    +    `jump_pos` is used to track the first next-possible region to jump
         through.
     
         Second, note that the exclusion list is best-effort, since we do not
    @@ Commit message
                 git update-ref --stdin
             $ git pack-refs --all
     
    -    , it is significantly faster to have `for-each-ref` skip over the
    +    , it is significantly faster to have `for-each-ref` jump over the
         excluded references, as opposed to filtering them out after the fact:
     
             $ hyperfine \
    @@ Commit message
               'git.compile for-each-ref --format="%(objectname) %(refname)" --exclude="refs/pull"' ran
               172.03 ± 9.60 times faster than 'git for-each-ref --format="%(objectname) %(refname)" | grep -vE "^[0-9a-f]{40} refs/pull/"'
     
    -    Using the skip list is fairly straightforward (see the changes to
    +    Using the jump list is fairly straightforward (see the changes to
         `refs/packed-backend.c::next_record()`), but constructing the list is
         not. To ensure that the construction is correct, add a new suite of
         tests in t1419 covering various corner cases (overlapping regions,
    @@ ref-filter.c: static int for_each_fullref_in_pattern(struct ref_filter *filter,
      /*
     
      ## refs/packed-backend.c ##
    -@@ refs/packed-backend.c: static const char *find_reference_location(struct snapshot *snapshot,
    - 	return find_reference_location_1(snapshot, refname, mustexist, 1);
    +@@ refs/packed-backend.c: static int cmp_packed_ref_records(const void *v1, const void *v2)
    +  * Compare a snapshot record at `rec` to the specified NUL-terminated
    +  * refname.
    +  */
    +-static int cmp_record_to_refname(const char *rec, const char *refname)
    ++static int cmp_record_to_refname(const char *rec, const char *refname,
    ++				 int start)
    + {
    + 	const char *r1 = rec + the_hash_algo->hexsz + 1;
    + 	const char *r2 = refname;
    +@@ refs/packed-backend.c: static int cmp_record_to_refname(const char *rec, const char *refname)
    + 		if (*r1 == '\n')
    + 			return *r2 ? -1 : 0;
    + 		if (!*r2)
    +-			return 1;
    ++			return start ? 1 : -1;
    + 		if (*r1 != *r2)
    + 			return (unsigned char)*r1 < (unsigned char)*r2 ? -1 : +1;
    + 		r1++;
    +@@ refs/packed-backend.c: static int load_contents(struct snapshot *snapshot)
      }
      
    + static const char *find_reference_location_1(struct snapshot *snapshot,
    +-					     const char *refname, int mustexist)
    ++					     const char *refname, int mustexist,
    ++					     int start)
    + {
    + 	/*
    + 	 * This is not *quite* a garden-variety binary search, because
    +@@ refs/packed-backend.c: static const char *find_reference_location_1(struct snapshot *snapshot,
    + 
    + 		mid = lo + (hi - lo) / 2;
    + 		rec = find_start_of_record(lo, mid);
    +-		cmp = cmp_record_to_refname(rec, refname);
    ++		cmp = cmp_record_to_refname(rec, refname, start);
    + 		if (cmp < 0) {
    + 			lo = find_end_of_record(mid, hi);
    + 		} else if (cmp > 0) {
    +@@ refs/packed-backend.c: static const char *find_reference_location_1(struct snapshot *snapshot,
    + static const char *find_reference_location(struct snapshot *snapshot,
    + 					   const char *refname, int mustexist)
    + {
    +-	return find_reference_location_1(snapshot, refname, mustexist);
    ++	return find_reference_location_1(snapshot, refname, mustexist, 1);
    ++}
    ++
     +/*
     + * Find the place in `snapshot->buf` after the end of the record for
     + * `refname`. In other words, find the location of first thing *after*
    @@ refs/packed-backend.c: static const char *find_reference_location(struct snapsho
     +					       int mustexist)
     +{
     +	return find_reference_location_1(snapshot, refname, mustexist, 0);
    -+}
    -+
    + }
    + 
      /*
    -  * Create a newly-allocated `snapshot` of the `packed-refs` file in
    -  * its current state and return it. The return value will already have
     @@ refs/packed-backend.c: struct packed_ref_iterator {
      	/* The end of the part of the buffer that will be iterated over: */
      	const char *eof;
      
    -+	struct skip_list_entry {
    ++	struct jump_list_entry {
     +		const char *start;
     +		const char *end;
    -+	} *skip;
    -+	size_t skip_nr, skip_alloc;
    -+	size_t skip_pos;
    ++	} *jump;
    ++	size_t jump_nr, jump_alloc;
    ++	size_t jump_pos;
     +
      	/* Scratch space for current values: */
      	struct object_id oid, peeled;
    @@ refs/packed-backend.c: struct packed_ref_iterator {
     +	 * Note that each skipped region is considered at most once,
     +	 * since they are ordered based on their starting position.
     +	 */
    -+	while (iter->skip_pos < iter->skip_nr) {
    -+		struct skip_list_entry *curr = &iter->skip[iter->skip_pos];
    ++	while (iter->jump_pos < iter->jump_nr) {
    ++		struct jump_list_entry *curr = &iter->jump[iter->jump_pos];
     +		if (iter->pos < curr->start)
     +			break; /* not to the next jump yet */
     +
    -+		iter->skip_pos++;
    ++		iter->jump_pos++;
     +		if (iter->pos < curr->end) {
     +			iter->pos = curr->end;
     +			break;
    @@ refs/packed-backend.c: static int packed_ref_iterator_abort(struct ref_iterator
      	int ok = ITER_DONE;
      
      	strbuf_release(&iter->refname_buf);
    -+	free(iter->skip);
    ++	free(iter->jump);
      	release_snapshot(iter->snapshot);
      	base_ref_iterator_free(ref_iterator);
      	return ok;
    @@ refs/packed-backend.c: static struct ref_iterator_vtable packed_ref_iterator_vta
      	.abort = packed_ref_iterator_abort
      };
      
    -+static int skip_list_entry_cmp(const void *va, const void *vb)
    ++static int jump_list_entry_cmp(const void *va, const void *vb)
     +{
    -+	const struct skip_list_entry *a = va;
    -+	const struct skip_list_entry *b = vb;
    ++	const struct jump_list_entry *a = va;
    ++	const struct jump_list_entry *b = vb;
     +
     +	if (a->start < b->start)
     +		return -1;
    @@ refs/packed-backend.c: static struct ref_iterator_vtable packed_ref_iterator_vta
     +	return y;
     +}
     +
    -+static void populate_excluded_skip_list(struct packed_ref_iterator *iter,
    ++static void populate_excluded_jump_list(struct packed_ref_iterator *iter,
     +					struct snapshot *snapshot,
     +					const char **excluded_patterns)
     +{
     +	size_t i, j;
     +	const char **pattern;
    ++	struct jump_list_entry *last_disjoint;
     +
     +	if (!excluded_patterns)
     +		return;
     +
     +	for (pattern = excluded_patterns; *pattern; pattern++) {
    -+		struct skip_list_entry *e;
    ++		struct jump_list_entry *e;
     +
     +		/*
     +		 * We can't feed any excludes with globs in them to the
    @@ refs/packed-backend.c: static struct ref_iterator_vtable packed_ref_iterator_vta
     +		if (has_glob_special(*pattern))
     +			continue;
     +
    -+		ALLOC_GROW(iter->skip, iter->skip_nr + 1, iter->skip_alloc);
    ++		ALLOC_GROW(iter->jump, iter->jump_nr + 1, iter->jump_alloc);
     +
    -+		e = &iter->skip[iter->skip_nr++];
    ++		e = &iter->jump[iter->jump_nr++];
     +		e->start = find_reference_location(snapshot, *pattern, 0);
     +		e->end = find_reference_location_end(snapshot, *pattern, 0);
     +	}
     +
    -+	if (!iter->skip_nr) {
    ++	if (!iter->jump_nr) {
     +		/*
     +		 * Every entry in exclude_patterns has a meta-character,
     +		 * nothing to do here.
    @@ refs/packed-backend.c: static struct ref_iterator_vtable packed_ref_iterator_vta
     +		return;
     +	}
     +
    -+	QSORT(iter->skip, iter->skip_nr, skip_list_entry_cmp);
    ++	QSORT(iter->jump, iter->jump_nr, jump_list_entry_cmp);
     +
     +	/*
    -+	 * As an optimization, merge adjacent entries in the skip list
    ++	 * As an optimization, merge adjacent entries in the jump list
     +	 * to jump forwards as far as possible when entering a skipped
     +	 * region.
     +	 *
    @@ refs/packed-backend.c: static struct ref_iterator_vtable packed_ref_iterator_vta
     +	 * we want to combine that into a single entry jumping from A to
     +	 * C.
     +	 */
    -+	for (i = 1, j = 1; i < iter->skip_nr; i++) {
    -+		struct skip_list_entry *ours = &iter->skip[i];
    -+		struct skip_list_entry *prev = &iter->skip[i - 1];
    ++	last_disjoint = iter->jump;
    ++
    ++	for (i = 1, j = 1; i < iter->jump_nr; i++) {
    ++		struct jump_list_entry *ours = &iter->jump[i];
     +
     +		if (ours->start == ours->end) {
     +			/* ignore empty regions (no matching entries) */
     +			continue;
    -+		} else if (prev->end >= ours->start) {
    ++		} else if (ours->start <= last_disjoint->end) {
     +			/* overlapping regions extend the previous one */
    -+			prev->end = ptr_max(prev->end, ours->end);
    ++			last_disjoint->end = ptr_max(last_disjoint->end, ours->end);
     +		} else {
     +			/* otherwise, insert a new region */
    -+			iter->skip[j++] = *ours;
    ++			iter->jump[j++] = *ours;
    ++			last_disjoint = ours;
    ++
     +		}
     +	}
     +
    -+	iter->skip_nr = j;
    -+	iter->skip_pos = 0;
    ++	iter->jump_nr = j;
    ++	iter->jump_pos = 0;
     +}
     +
      static struct ref_iterator *packed_ref_iterator_begin(
    @@ refs/packed-backend.c: static struct ref_iterator *packed_ref_iterator_begin(
      	base_ref_iterator_init(ref_iterator, &packed_ref_iterator_vtable, 1);
      
     +	if (exclude_patterns)
    -+		populate_excluded_skip_list(iter, snapshot, exclude_patterns);
    ++		populate_excluded_jump_list(iter, snapshot, exclude_patterns);
     +
      	iter->snapshot = snapshot;
      	acquire_snapshot(snapshot);
    @@ t/t1419-exclude-refs.sh (new)
     +	git pack-refs --all
     +'
     +
    -+test_expect_success 'for_each_ref__exclude(refs/heads/foo/)' '
    -+	# region in middle
    ++test_expect_success 'excluded region in middle' '
     +	for_each_ref__exclude refs/heads refs/heads/foo >actual &&
     +	for_each_ref refs/heads/bar refs/heads/baz refs/heads/quux >expect &&
     +
     +	test_cmp expect actual
     +'
     +
    -+test_expect_success 'for_each_ref__exclude(refs/heads/bar/)' '
    -+	# region at beginning
    ++test_expect_success 'excluded region at beginning' '
     +	for_each_ref__exclude refs/heads refs/heads/bar >actual &&
     +	for_each_ref refs/heads/baz refs/heads/foo refs/heads/quux >expect &&
     +
     +	test_cmp expect actual
     +'
     +
    -+test_expect_success 'for_each_ref__exclude(refs/heads/quux/)' '
    -+	# region at end
    ++test_expect_success 'excluded region at end' '
     +	for_each_ref__exclude refs/heads refs/heads/quux >actual &&
     +	for_each_ref refs/heads/foo refs/heads/bar refs/heads/baz >expect &&
     +
     +	test_cmp expect actual
     +'
     +
    -+test_expect_success 'for_each_ref__exclude(refs/heads/bar/, refs/heads/quux/)' '
    -+	# disjoint regions
    ++test_expect_success 'disjoint excluded regions' '
     +	for_each_ref__exclude refs/heads refs/heads/bar refs/heads/quux >actual &&
     +	for_each_ref refs/heads/baz refs/heads/foo >expect &&
     +
     +	test_cmp expect actual
     +'
     +
    -+test_expect_success 'for_each_ref__exclude(refs/heads/bar/, refs/heads/baz/)' '
    -+	# adjacent, non-overlapping regions
    ++test_expect_success 'adjacent, non-overlapping excluded regions' '
     +	for_each_ref__exclude refs/heads refs/heads/bar refs/heads/baz >actual &&
     +	for_each_ref refs/heads/foo refs/heads/quux >expect &&
     +
     +	test_cmp expect actual
     +'
     +
    -+test_expect_success 'for_each_ref__exclude(refs/heads/ba refs/heads/baz/)' '
    -+	# overlapping region
    ++test_expect_success 'overlapping excluded regions' '
     +	for_each_ref__exclude refs/heads refs/heads/ba refs/heads/baz >actual &&
     +	for_each_ref refs/heads/foo refs/heads/quux >expect &&
     +
     +	test_cmp expect actual
     +'
     +
    -+test_expect_success 'for_each_ref__exclude(refs/heads/does/not/exist)' '
    -+	# empty region
    ++test_expect_success 'several overlapping excluded regions' '
    ++	for_each_ref__exclude refs/heads \
    ++		refs/heads/bar refs/heads/baz refs/heads/foo >actual &&
    ++	for_each_ref refs/heads/quux >expect &&
    ++
    ++	test_cmp expect actual
    ++'
    ++
    ++test_expect_success 'non-matching excluded section' '
     +	for_each_ref__exclude refs/heads refs/heads/does/not/exist >actual &&
     +	for_each_ref >expect &&
     +
     +	test_cmp expect actual
     +'
     +
    -+test_expect_success 'for_each_ref__exclude(refs/heads/ba*)' '
    -+	# discards meta-characters
    ++test_expect_success 'meta-characters are discarded' '
     +	for_each_ref__exclude refs/heads "refs/heads/ba*" >actual &&
     +	for_each_ref >expect &&
     +
10:  5698c2794f ! 10:  5059f5dd42 refs/packed-backend.c: add trace2 counters for skip list
    @@ Metadata
     Author: Taylor Blau <me@ttaylorr.com>
     
      ## Commit message ##
    -    refs/packed-backend.c: add trace2 counters for skip list
    +    refs/packed-backend.c: add trace2 counters for jump list
     
         The previous commit added low-level tests to ensure that the packed-refs
         iterator did not enumerate excluded sections of the refspace.
    @@ refs/packed-backend.c
      enum mmap_strategy {
      	/*
     @@ refs/packed-backend.c: static int next_record(struct packed_ref_iterator *iter)
    - 		iter->skip_pos++;
    + 		iter->jump_pos++;
      		if (iter->pos < curr->end) {
      			iter->pos = curr->end;
    -+			trace2_counter_add(TRACE2_COUNTER_ID_PACKED_REFS_SKIPS, 1);
    ++			trace2_counter_add(TRACE2_COUNTER_ID_PACKED_REFS_JUMPS, 1);
      			break;
      		}
      	}
    @@ t/t1419-exclude-refs.sh: for_each_ref () {
      	git for-each-ref --format='%(refname)' "$@"
      }
      
    -+assert_skips () {
    ++assert_jumps () {
     +	local nr="$1"
     +	local trace="$2"
     +
    -+	grep -q "name:skips_made value:$nr" $trace
    ++	grep -q "name:jumps_made value:$nr" $trace
     +}
     +
    -+assert_no_skips () {
    -+	! assert_skips ".*" "$1"
    ++assert_no_jumps () {
    ++	! assert_jumps ".*" "$1"
     +}
     +
      test_expect_success 'setup' '
      	test_commit --no-tag base &&
      	base="$(git rev-parse HEAD)" &&
     @@ t/t1419-exclude-refs.sh: test_expect_success 'setup' '
    + '
      
    - test_expect_success 'for_each_ref__exclude(refs/heads/foo/)' '
    - 	# region in middle
    + test_expect_success 'excluded region in middle' '
     -	for_each_ref__exclude refs/heads refs/heads/foo >actual &&
     +	for_each_ref__exclude refs/heads refs/heads/foo >actual 2>perf &&
      	for_each_ref refs/heads/bar refs/heads/baz refs/heads/quux >expect &&
      
     -	test_cmp expect actual
     +	test_cmp expect actual &&
    -+	assert_skips 1 perf
    ++	assert_jumps 1 perf
      '
      
    - test_expect_success 'for_each_ref__exclude(refs/heads/bar/)' '
    - 	# region at beginning
    + test_expect_success 'excluded region at beginning' '
     -	for_each_ref__exclude refs/heads refs/heads/bar >actual &&
     +	for_each_ref__exclude refs/heads refs/heads/bar >actual 2>perf &&
      	for_each_ref refs/heads/baz refs/heads/foo refs/heads/quux >expect &&
      
     -	test_cmp expect actual
     +	test_cmp expect actual &&
    -+	assert_skips 1 perf
    ++	assert_jumps 1 perf
      '
      
    - test_expect_success 'for_each_ref__exclude(refs/heads/quux/)' '
    - 	# region at end
    + test_expect_success 'excluded region at end' '
     -	for_each_ref__exclude refs/heads refs/heads/quux >actual &&
     +	for_each_ref__exclude refs/heads refs/heads/quux >actual 2>perf &&
      	for_each_ref refs/heads/foo refs/heads/bar refs/heads/baz >expect &&
      
     -	test_cmp expect actual
     +	test_cmp expect actual &&
    -+	assert_skips 1 perf
    ++	assert_jumps 1 perf
      '
      
    - test_expect_success 'for_each_ref__exclude(refs/heads/bar/, refs/heads/quux/)' '
    - 	# disjoint regions
    + test_expect_success 'disjoint excluded regions' '
     -	for_each_ref__exclude refs/heads refs/heads/bar refs/heads/quux >actual &&
     +	for_each_ref__exclude refs/heads refs/heads/bar refs/heads/quux >actual 2>perf &&
      	for_each_ref refs/heads/baz refs/heads/foo >expect &&
      
     -	test_cmp expect actual
     +	test_cmp expect actual &&
    -+	assert_skips 2 perf
    ++	assert_jumps 2 perf
      '
      
    - test_expect_success 'for_each_ref__exclude(refs/heads/bar/, refs/heads/baz/)' '
    - 	# adjacent, non-overlapping regions
    + test_expect_success 'adjacent, non-overlapping excluded regions' '
     -	for_each_ref__exclude refs/heads refs/heads/bar refs/heads/baz >actual &&
     +	for_each_ref__exclude refs/heads refs/heads/bar refs/heads/baz >actual 2>perf &&
      	for_each_ref refs/heads/foo refs/heads/quux >expect &&
      
     -	test_cmp expect actual
     +	test_cmp expect actual &&
    -+	assert_skips 1 perf
    ++	assert_jumps 1 perf
      '
      
    - test_expect_success 'for_each_ref__exclude(refs/heads/ba refs/heads/baz/)' '
    - 	# overlapping region
    + test_expect_success 'overlapping excluded regions' '
     -	for_each_ref__exclude refs/heads refs/heads/ba refs/heads/baz >actual &&
     +	for_each_ref__exclude refs/heads refs/heads/ba refs/heads/baz >actual 2>perf &&
      	for_each_ref refs/heads/foo refs/heads/quux >expect &&
      
     -	test_cmp expect actual
     +	test_cmp expect actual &&
    -+	assert_skips 1 perf
    ++	assert_jumps 1 perf
      '
      
    - test_expect_success 'for_each_ref__exclude(refs/heads/does/not/exist)' '
    - 	# empty region
    + test_expect_success 'several overlapping excluded regions' '
    + 	for_each_ref__exclude refs/heads \
    +-		refs/heads/bar refs/heads/baz refs/heads/foo >actual &&
    ++		refs/heads/bar refs/heads/baz refs/heads/foo >actual 2>perf &&
    + 	for_each_ref refs/heads/quux >expect &&
    + 
    +-	test_cmp expect actual
    ++	test_cmp expect actual &&
    ++	assert_jumps 1 perf
    + '
    + 
    + test_expect_success 'non-matching excluded section' '
     -	for_each_ref__exclude refs/heads refs/heads/does/not/exist >actual &&
     +	for_each_ref__exclude refs/heads refs/heads/does/not/exist >actual 2>perf &&
      	for_each_ref >expect &&
      
     -	test_cmp expect actual
     +	test_cmp expect actual &&
    -+	assert_no_skips
    ++	assert_no_jumps
      '
      
    - test_expect_success 'for_each_ref__exclude(refs/heads/ba*)' '
    - 	# discards meta-characters
    + test_expect_success 'meta-characters are discarded' '
     -	for_each_ref__exclude refs/heads "refs/heads/ba*" >actual &&
     +	for_each_ref__exclude refs/heads "refs/heads/ba*" >actual 2>perf &&
      	for_each_ref >expect &&
      
     -	test_cmp expect actual
     +	test_cmp expect actual &&
    -+	assert_no_skips
    ++	assert_no_jumps
      '
      
      test_done
    @@ trace2.h: enum trace2_counter_id {
      	TRACE2_COUNTER_ID_TEST1 = 0, /* emits summary event only */
      	TRACE2_COUNTER_ID_TEST2,     /* emits summary and thread events */
      
    -+	TRACE2_COUNTER_ID_PACKED_REFS_SKIPS, /* counts number of skips */
    ++	TRACE2_COUNTER_ID_PACKED_REFS_JUMPS, /* counts number of jumps */
     +
      	/* Add additional counter definitions before here. */
      	TRACE2_NUMBER_OF_COUNTERS
    @@ trace2/tr2_ctr.c: static struct tr2_counter_metadata tr2_counter_metadata[TRACE2
      		.name = "test2",
      		.want_per_thread_events = 1,
      	},
    -+	[TRACE2_COUNTER_ID_PACKED_REFS_SKIPS] = {
    ++	[TRACE2_COUNTER_ID_PACKED_REFS_JUMPS] = {
     +		.category = "packed-refs",
    -+		.name = "skips_made",
    ++		.name = "jumps_made",
     +		.want_per_thread_events = 0,
     +	},
      
11:  5b9814ad8c ! 11:  f765b50a84 revision.h: store hidden refs in a `strvec`
    @@ revision.c: void init_ref_exclusions(struct ref_exclusions *exclusions)
     
      ## revision.h ##
     @@
    - #include "commit-slab-decl.h"
    + #include "decorate.h"
      #include "ident.h"
      #include "list-objects-filter-options.h"
     +#include "strvec.h"
12:  dd5b34185c ! 12:  254bcc4361 refs/packed-backend.c: ignore complicated hidden refs rules
    @@ Commit message
         Signed-off-by: Taylor Blau <me@ttaylorr.com>
     
      ## refs/packed-backend.c ##
    -@@ refs/packed-backend.c: static void populate_excluded_skip_list(struct packed_ref_iterator *iter,
    +@@ refs/packed-backend.c: static void populate_excluded_jump_list(struct packed_ref_iterator *iter,
      	if (!excluded_patterns)
      		return;
      
    @@ refs/packed-backend.c: static void populate_excluded_skip_list(struct packed_ref
     +		 * '^' rules.
     +		 *
     +		 * Both are possible to do, but complicated, so avoid
    -+		 * populating the skip list at all if we see either of
    ++		 * populating the jump list at all if we see either of
     +		 * these patterns.
     +		 */
     +		if (**pattern == '!' || **pattern == '^')
    @@ refs/packed-backend.c: static void populate_excluded_skip_list(struct packed_ref
     +	}
     +
      	for (pattern = excluded_patterns; *pattern; pattern++) {
    - 		struct skip_list_entry *e;
    + 		struct jump_list_entry *e;
      
     
      ## t/t1419-exclude-refs.sh ##
    -@@ t/t1419-exclude-refs.sh: test_expect_success 'for_each_ref__exclude(refs/heads/ba*)' '
    - 	assert_no_skips
    +@@ t/t1419-exclude-refs.sh: test_expect_success 'meta-characters are discarded' '
    + 	assert_no_jumps
      '
      
    -+test_expect_success 'for_each_ref__exclude(refs/heads/foo, !refs/heads/foo/1)' '
    -+	# discards complex hidden ref rules
    ++test_expect_success 'complex hidden ref rules are discarded' '
     +	for_each_ref__exclude refs/heads refs/heads/foo "!refs/heads/foo/1" \
     +		>actual 2>perf &&
     +	for_each_ref >expect &&
     +
     +	test_cmp expect actual &&
    -+	assert_no_skips
    ++	assert_no_jumps
     +'
     +
      test_done
13:  c65b3dea81 = 13:  50e7df7dc0 refs.h: let `for_each_namespaced_ref()` take excluded patterns
15:  7d3383083d = 14:  f6a3a5a6ba builtin/receive-pack.c: avoid enumerating hidden references
14:  44bbf85e73 ! 15:  2331fa7a4d upload-pack.c: avoid enumerating hidden refs where possible
    @@ upload-pack.c: static int get_common_commits(struct upload_pack_data *data,
     +				      struct upload_pack_data *data)
     +{
     +	/*
    -+	 * If `data->allow_uor` allows updating hidden refs, we need to
    ++	 * If `data->allow_uor` allows fetching hidden refs, we need to
     +	 * mark all references (including hidden ones), to check in
     +	 * `is_our_ref()` below.
     +	 *
 -:  ---------- > 16:  2c6b89d64a ls-refs.c: avoid enumerating hidden refs where possible
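The jump counts asserted in the t1419 tests in the range-diff above follow from how excluded regions are built and coalesced:
- two jumps for disjoint regions;
- one jump for a single region anywhere, or for adjacent or overlapping regions;
- no jumps for non-matching, meta-character, or `!`/`^` patterns.

What follows is a rough sketch, not git's implementation. It assumes an in-memory sorted array of refnames in place of the mmap'd `packed-refs` buffer, and plain string-prefix matching. `build_jump_list()`, `lower_bound()`, and `prefix_end()` are hypothetical names.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Half-open range [start, end) of indices into a sorted refs[] array. */
struct jump {
	size_t start, end;
};

/* First index whose name is >= key (refs[] must be sorted). */
static size_t lower_bound(const char **refs, size_t nr, const char *key)
{
	size_t lo = 0, hi = nr;
	while (lo < hi) {
		size_t mid = lo + (hi - lo) / 2;
		if (strcmp(refs[mid], key) < 0)
			lo = mid + 1;
		else
			hi = mid;
	}
	return lo;
}

/*
 * First index whose name neither sorts before prefix nor starts with it.
 * Names sharing a prefix are contiguous in sorted order, so this is a
 * valid binary search.
 */
static size_t prefix_end(const char **refs, size_t nr, const char *prefix)
{
	size_t len = strlen(prefix), lo = 0, hi = nr;
	while (lo < hi) {
		size_t mid = lo + (hi - lo) / 2;
		if (strcmp(refs[mid], prefix) < 0 ||
		    !strncmp(refs[mid], prefix, len))
			lo = mid + 1;
		else
			hi = mid;
	}
	return lo;
}

static int cmp_jump(const void *a, const void *b)
{
	const struct jump *x = a, *y = b;
	return (x->start > y->start) - (x->start < y->start);
}

/*
 * Build a merged jump list for NULL-terminated exclude patterns.
 * "out" needs room for one entry per pattern. Returns the number of
 * merged (non-overlapping, non-adjacent) ranges.
 */
static size_t build_jump_list(const char **refs, size_t nr,
			      const char **patterns, struct jump *out)
{
	size_t n = 0, merged = 0;

	/* Negated ('!') or '^' rules: do not jump at all. */
	for (const char **p = patterns; *p; p++)
		if (**p == '!' || **p == '^')
			return 0;

	for (const char **p = patterns; *p; p++) {
		size_t start, end;
		/* Glob meta-characters: not usable as a literal prefix. */
		if (strpbrk(*p, "*?[\\"))
			continue;
		start = lower_bound(refs, nr, *p);
		end = prefix_end(refs, nr, *p);
		if (start < end) /* drop empty (non-matching) regions */
			out[n++] = (struct jump){ start, end };
	}

	qsort(out, n, sizeof(*out), cmp_jump);
	for (size_t i = 0; i < n; i++) {
		/* Overlapping or adjacent: extend the previous range. */
		if (merged && out[i].start <= out[merged - 1].end) {
			if (out[i].end > out[merged - 1].end)
				out[merged - 1].end = out[i].end;
		} else {
			out[merged++] = out[i];
		}
	}
	return merged;
}
```

Merging adjacent ranges, not just overlapping ones, is why excluding `refs/heads/bar` and `refs/heads/baz` costs a single jump in this sketch.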
-- 
2.40.1.572.g5c4ab523ef

^ permalink raw reply	[flat|nested] 149+ messages in thread

* [PATCH v2 01/16] refs.c: rename `ref_filter`
  2023-05-15 19:23 ` [PATCH v2 00/16] refs: implement jump lists for packed backend Taylor Blau
@ 2023-05-15 19:23   ` Taylor Blau
  2023-05-15 19:23   ` [PATCH v2 02/16] ref-filter.h: provide `REF_FILTER_INIT` Taylor Blau
                     ` (15 subsequent siblings)
  16 siblings, 0 replies; 149+ messages in thread
From: Taylor Blau @ 2023-05-15 19:23 UTC (permalink / raw)
  To: git
  Cc: Chris Torek, Derrick Stolee, Jeff King, Junio C Hamano,
	Patrick Steinhardt

From: Jeff King <peff@peff.net>

The refs machinery has its own implementation of a `ref_filter` (used by
`for-each-ref`), which is distinct from the `ref-filter.h` API (also
used by `for-each-ref`, among other things).

Rename the one within refs.c to more clearly indicate its purpose.

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Taylor Blau <me@ttaylorr.com>
---
 refs.c | 15 ++++++++-------
 1 file changed, 8 insertions(+), 7 deletions(-)

diff --git a/refs.c b/refs.c
index d2a98e1c21..b9b77d2eff 100644
--- a/refs.c
+++ b/refs.c
@@ -375,8 +375,8 @@ char *resolve_refdup(const char *refname, int resolve_flags,
 				   oid, flags);
 }
 
-/* The argument to filter_refs */
-struct ref_filter {
+/* The argument to for_each_filter_refs */
+struct for_each_ref_filter {
 	const char *pattern;
 	const char *prefix;
 	each_ref_fn *fn;
@@ -409,10 +409,11 @@ int ref_exists(const char *refname)
 	return refs_ref_exists(get_main_ref_store(the_repository), refname);
 }
 
-static int filter_refs(const char *refname, const struct object_id *oid,
-			   int flags, void *data)
+static int for_each_filter_refs(const char *refname,
+				const struct object_id *oid,
+				int flags, void *data)
 {
-	struct ref_filter *filter = (struct ref_filter *)data;
+	struct for_each_ref_filter *filter = data;
 
 	if (wildmatch(filter->pattern, refname, 0))
 		return 0;
@@ -569,7 +570,7 @@ int for_each_glob_ref_in(each_ref_fn fn, const char *pattern,
 	const char *prefix, void *cb_data)
 {
 	struct strbuf real_pattern = STRBUF_INIT;
-	struct ref_filter filter;
+	struct for_each_ref_filter filter;
 	int ret;
 
 	if (!prefix && !starts_with(pattern, "refs/"))
@@ -589,7 +590,7 @@ int for_each_glob_ref_in(each_ref_fn fn, const char *pattern,
 	filter.prefix = prefix;
 	filter.fn = fn;
 	filter.cb_data = cb_data;
-	ret = for_each_ref(filter_refs, &filter);
+	ret = for_each_ref(for_each_filter_refs, &filter);
 
 	strbuf_release(&real_pattern);
 	return ret;
-- 
2.40.1.572.g5c4ab523ef

* [PATCH v2 02/16] ref-filter.h: provide `REF_FILTER_INIT`
  2023-05-15 19:23 ` [PATCH v2 00/16] refs: implement jump lists for packed backend Taylor Blau
  2023-05-15 19:23   ` [PATCH v2 01/16] refs.c: rename `ref_filter` Taylor Blau
@ 2023-05-15 19:23   ` Taylor Blau
  2023-05-15 19:23   ` [PATCH v2 03/16] ref-filter: clear reachable list pointers after freeing Taylor Blau
                     ` (14 subsequent siblings)
  16 siblings, 0 replies; 149+ messages in thread
From: Taylor Blau @ 2023-05-15 19:23 UTC (permalink / raw)
  To: git
  Cc: Chris Torek, Derrick Stolee, Jeff King, Junio C Hamano,
	Patrick Steinhardt

From: Jeff King <peff@peff.net>

Provide a sane initialization value for `struct ref_filter`, which in a
subsequent patch will be used to initialize a new field.

In the meantime, fix a case in test-reach.c where its `ref_filter` is
not even zero-initialized.
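A designated initializer zeroes every member it does not name. That lets a `*_INIT` macro replace the `memset()` calls this patch removes, while also supplying non-zero defaults for members that need them.

The sketch below uses hypothetical types (`struct my_filter`, `struct str_list`, `MY_FILTER_INIT`). It assumes a strvec-style member whose "empty" state points at a shared NULL-terminated array rather than at NULL, loosely modeled on git's `STRVEC_INIT`.

```c
#include <assert.h>
#include <stddef.h>

/* Shared "empty" array: an empty list points here, never at NULL. */
static const char *empty_list[] = { NULL };

struct str_list {
	const char **v;
	size_t nr, alloc;
};
#define STR_LIST_EMPTY_INIT { .v = empty_list }

/* Members not named in MY_FILTER_INIT are zeroed by the compiler. */
struct my_filter {
	const char **name_patterns;
	int ignore_case;
	struct str_list exclude;
};
#define MY_FILTER_INIT { .exclude = STR_LIST_EMPTY_INIT }

/* Safe on a freshly initialized filter, because exclude.v is never NULL. */
static size_t count_excludes(const struct my_filter *f)
{
	size_t n = 0;
	for (const char **p = f->exclude.v; *p; p++)
		n++;
	return n;
}
```

With a plain `memset()` instead, `exclude.v` would be NULL and `count_excludes()` would dereference it. A macro keeps every caller's starting state in one place.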

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Taylor Blau <me@ttaylorr.com>
---
 builtin/branch.c       | 3 +--
 builtin/for-each-ref.c | 3 +--
 builtin/tag.c          | 3 +--
 ref-filter.h           | 3 +++
 t/helper/test-reach.c  | 2 +-
 5 files changed, 7 insertions(+), 7 deletions(-)

diff --git a/builtin/branch.c b/builtin/branch.c
index 501c47657c..03bb8e414c 100644
--- a/builtin/branch.c
+++ b/builtin/branch.c
@@ -662,7 +662,7 @@ int cmd_branch(int argc, const char **argv, const char *prefix)
 	int reflog = 0, quiet = 0, icase = 0, force = 0,
 	    recurse_submodules_explicit = 0;
 	enum branch_track track;
-	struct ref_filter filter;
+	struct ref_filter filter = REF_FILTER_INIT;
 	static struct ref_sorting *sorting;
 	struct string_list sorting_options = STRING_LIST_INIT_DUP;
 	struct ref_format format = REF_FORMAT_INIT;
@@ -720,7 +720,6 @@ int cmd_branch(int argc, const char **argv, const char *prefix)
 
 	setup_ref_filter_porcelain_msg();
 
-	memset(&filter, 0, sizeof(filter));
 	filter.kind = FILTER_REFS_BRANCHES;
 	filter.abbrev = -1;
 
diff --git a/builtin/for-each-ref.c b/builtin/for-each-ref.c
index 695fc8f4a5..99ccb73518 100644
--- a/builtin/for-each-ref.c
+++ b/builtin/for-each-ref.c
@@ -24,7 +24,7 @@ int cmd_for_each_ref(int argc, const char **argv, const char *prefix)
 	struct string_list sorting_options = STRING_LIST_INIT_DUP;
 	int maxcount = 0, icase = 0, omit_empty = 0;
 	struct ref_array array;
-	struct ref_filter filter;
+	struct ref_filter filter = REF_FILTER_INIT;
 	struct ref_format format = REF_FORMAT_INIT;
 	struct strbuf output = STRBUF_INIT;
 	struct strbuf err = STRBUF_INIT;
@@ -61,7 +61,6 @@ int cmd_for_each_ref(int argc, const char **argv, const char *prefix)
 	};
 
 	memset(&array, 0, sizeof(array));
-	memset(&filter, 0, sizeof(filter));
 
 	format.format = "%(objectname) %(objecttype)\t%(refname)";
 
diff --git a/builtin/tag.c b/builtin/tag.c
index 1850a6a6fd..6b41bb7374 100644
--- a/builtin/tag.c
+++ b/builtin/tag.c
@@ -443,7 +443,7 @@ int cmd_tag(int argc, const char **argv, const char *prefix)
 	struct msg_arg msg = { .buf = STRBUF_INIT };
 	struct ref_transaction *transaction;
 	struct strbuf err = STRBUF_INIT;
-	struct ref_filter filter;
+	struct ref_filter filter = REF_FILTER_INIT;
 	struct ref_sorting *sorting;
 	struct string_list sorting_options = STRING_LIST_INIT_DUP;
 	struct ref_format format = REF_FORMAT_INIT;
@@ -501,7 +501,6 @@ int cmd_tag(int argc, const char **argv, const char *prefix)
 	git_config(git_tag_config, &sorting_options);
 
 	memset(&opt, 0, sizeof(opt));
-	memset(&filter, 0, sizeof(filter));
 	filter.lines = -1;
 	opt.sign = -1;
 
diff --git a/ref-filter.h b/ref-filter.h
index 430701cfb7..a920f73b29 100644
--- a/ref-filter.h
+++ b/ref-filter.h
@@ -92,6 +92,9 @@ struct ref_format {
 	struct string_list bases;
 };
 
+#define REF_FILTER_INIT { \
+	.points_at = OID_ARRAY_INIT, \
+}
 #define REF_FORMAT_INIT {             \
 	.use_color = -1,              \
 	.bases = STRING_LIST_INIT_DUP, \
diff --git a/t/helper/test-reach.c b/t/helper/test-reach.c
index 5b6f217441..ef58f10c2d 100644
--- a/t/helper/test-reach.c
+++ b/t/helper/test-reach.c
@@ -139,7 +139,7 @@ int cmd__reach(int ac, const char **av)
 
 		printf("%s(X,_,_,0,0):%d\n", av[1], can_all_from_reach_with_flag(&X_obj, 2, 4, 0, 0));
 	} else if (!strcmp(av[1], "commit_contains")) {
-		struct ref_filter filter;
+		struct ref_filter filter = REF_FILTER_INIT;
 		struct contains_cache cache;
 		init_contains_cache(&cache);
 
-- 
2.40.1.572.g5c4ab523ef

* [PATCH v2 03/16] ref-filter: clear reachable list pointers after freeing
  2023-05-15 19:23 ` [PATCH v2 00/16] refs: implement jump lists for packed backend Taylor Blau
  2023-05-15 19:23   ` [PATCH v2 01/16] refs.c: rename `ref_filter` Taylor Blau
  2023-05-15 19:23   ` [PATCH v2 02/16] ref-filter.h: provide `REF_FILTER_INIT` Taylor Blau
@ 2023-05-15 19:23   ` Taylor Blau
  2023-05-15 19:23   ` [PATCH v2 04/16] ref-filter: add `ref_filter_clear()` Taylor Blau
                     ` (13 subsequent siblings)
  16 siblings, 0 replies; 149+ messages in thread
From: Taylor Blau @ 2023-05-15 19:23 UTC (permalink / raw)
  To: git
  Cc: Chris Torek, Derrick Stolee, Jeff King, Junio C Hamano,
	Patrick Steinhardt

From: Jeff King <peff@peff.net>

In reach_filter(), we pop all commits from the reachable lists, leaving
them empty. But because we're operating on a list pointer that was
passed by value, the original filter.reachable_from pointer is left
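dangling. Pass a pointer to the list instead, so that popping through it
also leaves the caller's pointer NULL.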
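Here is a minimal illustration of the bug and the fix. It uses a hypothetical singly linked list (`struct node`, `pop()`, `push()`) in place of git's `struct commit_list`. Popping through a by-value copy empties only the copy. Popping through a pointer to the caller's pointer leaves the caller holding NULL.

```c
#include <assert.h>
#include <stdlib.h>

struct node {
	int value;
	struct node *next;
};

/* Remove and free the head of *list, returning its value. */
static int pop(struct node **list)
{
	struct node *head = *list;
	int value = head->value;
	*list = head->next;
	free(head);
	return value;
}

/*
 * Buggy shape: "list" is a copy. After this returns, the caller's
 * pointer still refers to the first node, which has been freed.
 */
static int drain_by_value(struct node *list)
{
	int sum = 0;
	while (list)
		sum += pop(&list);
	return sum;
}

/* Fixed shape: pop through the caller's pointer, which ends up NULL. */
static int drain_by_reference(struct node **list)
{
	int sum = 0;
	while (*list)
		sum += pop(list);
	return sum;
}

static struct node *push(struct node *list, int value)
{
	struct node *n = malloc(sizeof(*n));
	if (!n)
		abort();
	n->value = value;
	n->next = list;
	return n;
}
```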

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Taylor Blau <me@ttaylorr.com>
---
 ref-filter.c | 14 +++++++-------
 1 file changed, 7 insertions(+), 7 deletions(-)

diff --git a/ref-filter.c b/ref-filter.c
index 4991cd4f7a..048d277cbf 100644
--- a/ref-filter.c
+++ b/ref-filter.c
@@ -2418,13 +2418,13 @@ void ref_array_clear(struct ref_array *array)
 #define EXCLUDE_REACHED 0
 #define INCLUDE_REACHED 1
 static void reach_filter(struct ref_array *array,
-			 struct commit_list *check_reachable,
+			 struct commit_list **check_reachable,
 			 int include_reached)
 {
 	int i, old_nr;
 	struct commit **to_clear;
 
-	if (!check_reachable)
+	if (!*check_reachable)
 		return;
 
 	CALLOC_ARRAY(to_clear, array->nr);
@@ -2434,7 +2434,7 @@ static void reach_filter(struct ref_array *array,
 	}
 
 	tips_reachable_from_bases(the_repository,
-				  check_reachable,
+				  *check_reachable,
 				  to_clear, array->nr,
 				  UNINTERESTING);
 
@@ -2455,8 +2455,8 @@ static void reach_filter(struct ref_array *array,
 
 	clear_commit_marks_many(old_nr, to_clear, ALL_REV_FLAGS);
 
-	while (check_reachable) {
-		struct commit *merge_commit = pop_commit(&check_reachable);
+	while (*check_reachable) {
+		struct commit *merge_commit = pop_commit(check_reachable);
 		clear_commit_marks(merge_commit, ALL_REV_FLAGS);
 	}
 
@@ -2553,8 +2553,8 @@ int filter_refs(struct ref_array *array, struct ref_filter *filter, unsigned int
 	clear_contains_cache(&ref_cbdata.no_contains_cache);
 
 	/*  Filters that need revision walking */
-	reach_filter(array, filter->reachable_from, INCLUDE_REACHED);
-	reach_filter(array, filter->unreachable_from, EXCLUDE_REACHED);
+	reach_filter(array, &filter->reachable_from, INCLUDE_REACHED);
+	reach_filter(array, &filter->unreachable_from, EXCLUDE_REACHED);
 
 	save_commit_buffer = save_commit_buffer_orig;
 	return ret;
-- 
2.40.1.572.g5c4ab523ef

* [PATCH v2 04/16] ref-filter: add `ref_filter_clear()`
  2023-05-15 19:23 ` [PATCH v2 00/16] refs: implement jump lists for packed backend Taylor Blau
                     ` (2 preceding siblings ...)
  2023-05-15 19:23   ` [PATCH v2 03/16] ref-filter: clear reachable list pointers after freeing Taylor Blau
@ 2023-05-15 19:23   ` Taylor Blau
  2023-05-15 19:23   ` [PATCH v2 05/16] ref-filter.c: parameterize match functions over patterns Taylor Blau
                     ` (12 subsequent siblings)
  16 siblings, 0 replies; 149+ messages in thread
From: Taylor Blau @ 2023-05-15 19:23 UTC (permalink / raw)
  To: git
  Cc: Chris Torek, Derrick Stolee, Jeff King, Junio C Hamano,
	Patrick Steinhardt

From: Jeff King <peff@peff.net>

We did not bother to clean up at all in `git branch` or `git tag`, and
`git for-each-ref` only cleans up a couple of members.

Add and call `ref_filter_clear()` when cleaning up a `struct
ref_filter`. With this patch applied (and before any test changes), a
couple of tests become leak-free. They were found by running:

    $ make SANITIZE=leak
    $ make -C t GIT_TEST_PASSING_SANITIZE_LEAK=check GIT_TEST_OPTS=--immediate

(Note that the `reachable_from` and `unreachable_from` lists should
already be emptied as they are used, so freeing them here only covers
cases where we bail out before running the reachability check.)
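The `ref_filter_clear()` added below frees the owned lists and then calls `ref_filter_init()`. A cleared filter is therefore back in its initial state, and clearing it a second time is harmless.

Here is a sketch of the same clear-then-reinit shape. It uses a hypothetical `struct demo_filter`, which is not a git type. Its `abbrev = -1` default is an illustrative assumption, included to show that re-initialization restores non-zero defaults too.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical filter owning two heap-allocated arrays. */
struct demo_filter {
	int *with_ids;
	size_t with_nr;
	int *without_ids;
	size_t without_nr;
	int abbrev;
};
#define DEMO_FILTER_INIT { .abbrev = -1 }

static void demo_filter_init(struct demo_filter *f)
{
	struct demo_filter blank = DEMO_FILTER_INIT;
	memcpy(f, &blank, sizeof(blank));
}

/* Release owned memory, then reset so the struct is reusable. */
static void demo_filter_clear(struct demo_filter *f)
{
	free(f->with_ids);
	free(f->without_ids);
	demo_filter_init(f);
}
```

Because the pointers are reset to NULL, a second `demo_filter_clear()` only calls `free(NULL)`, which is a no-op.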

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Taylor Blau <me@ttaylorr.com>
---
 builtin/branch.c        |  1 +
 builtin/for-each-ref.c  |  3 +--
 builtin/tag.c           |  1 +
 ref-filter.c            | 16 ++++++++++++++++
 ref-filter.h            |  3 +++
 t/t0041-usage.sh        |  1 +
 t/t3402-rebase-merge.sh |  1 +
 7 files changed, 24 insertions(+), 2 deletions(-)

diff --git a/builtin/branch.c b/builtin/branch.c
index 03bb8e414c..c201f0cb0b 100644
--- a/builtin/branch.c
+++ b/builtin/branch.c
@@ -813,6 +813,7 @@ int cmd_branch(int argc, const char **argv, const char *prefix)
 		print_columns(&output, colopts, NULL);
 		string_list_clear(&output, 0);
 		ref_sorting_release(sorting);
+		ref_filter_clear(&filter);
 		return 0;
 	} else if (edit_description) {
 		const char *branch_name;
diff --git a/builtin/for-each-ref.c b/builtin/for-each-ref.c
index 99ccb73518..c01fa6fefe 100644
--- a/builtin/for-each-ref.c
+++ b/builtin/for-each-ref.c
@@ -120,8 +120,7 @@ int cmd_for_each_ref(int argc, const char **argv, const char *prefix)
 	strbuf_release(&err);
 	strbuf_release(&output);
 	ref_array_clear(&array);
-	free_commit_list(filter.with_commit);
-	free_commit_list(filter.no_commit);
+	ref_filter_clear(&filter);
 	ref_sorting_release(sorting);
 	strvec_clear(&vec);
 	return 0;
diff --git a/builtin/tag.c b/builtin/tag.c
index 6b41bb7374..aab5e693fe 100644
--- a/builtin/tag.c
+++ b/builtin/tag.c
@@ -645,6 +645,7 @@ int cmd_tag(int argc, const char **argv, const char *prefix)
 
 cleanup:
 	ref_sorting_release(sorting);
+	ref_filter_clear(&filter);
 	strbuf_release(&buf);
 	strbuf_release(&ref);
 	strbuf_release(&reflog_msg);
diff --git a/ref-filter.c b/ref-filter.c
index 048d277cbf..d32f426898 100644
--- a/ref-filter.c
+++ b/ref-filter.c
@@ -2866,3 +2866,19 @@ int parse_opt_merge_filter(const struct option *opt, const char *arg, int unset)
 
 	return 0;
 }
+
+void ref_filter_init(struct ref_filter *filter)
+{
+	struct ref_filter blank = REF_FILTER_INIT;
+	memcpy(filter, &blank, sizeof(blank));
+}
+
+void ref_filter_clear(struct ref_filter *filter)
+{
+	oid_array_clear(&filter->points_at);
+	free_commit_list(filter->with_commit);
+	free_commit_list(filter->no_commit);
+	free_commit_list(filter->reachable_from);
+	free_commit_list(filter->unreachable_from);
+	ref_filter_init(filter);
+}
diff --git a/ref-filter.h b/ref-filter.h
index a920f73b29..160b807224 100644
--- a/ref-filter.h
+++ b/ref-filter.h
@@ -170,4 +170,7 @@ void filter_ahead_behind(struct repository *r,
 			 struct ref_format *format,
 			 struct ref_array *array);
 
+void ref_filter_init(struct ref_filter *filter);
+void ref_filter_clear(struct ref_filter *filter);
+
 #endif /*  REF_FILTER_H  */
diff --git a/t/t0041-usage.sh b/t/t0041-usage.sh
index c4fc34eb18..9ea974b0c6 100755
--- a/t/t0041-usage.sh
+++ b/t/t0041-usage.sh
@@ -5,6 +5,7 @@ test_description='Test commands behavior when given invalid argument value'
 GIT_TEST_DEFAULT_INITIAL_BRANCH_NAME=main
 export GIT_TEST_DEFAULT_INITIAL_BRANCH_NAME
 
+TEST_PASSES_SANITIZE_LEAK=true
 . ./test-lib.sh
 
 test_expect_success 'setup ' '
diff --git a/t/t3402-rebase-merge.sh b/t/t3402-rebase-merge.sh
index 79b0640c00..e9e03ca4b5 100755
--- a/t/t3402-rebase-merge.sh
+++ b/t/t3402-rebase-merge.sh
@@ -8,6 +8,7 @@ test_description='git rebase --merge test'
 GIT_TEST_DEFAULT_INITIAL_BRANCH_NAME=main
 export GIT_TEST_DEFAULT_INITIAL_BRANCH_NAME
 
+TEST_PASSES_SANITIZE_LEAK=true
 . ./test-lib.sh
 
 T="A quick brown fox
-- 
2.40.1.572.g5c4ab523ef

* [PATCH v2 05/16] ref-filter.c: parameterize match functions over patterns
  2023-05-15 19:23 ` [PATCH v2 00/16] refs: implement jump lists for packed backend Taylor Blau
                     ` (3 preceding siblings ...)
  2023-05-15 19:23   ` [PATCH v2 04/16] ref-filter: add `ref_filter_clear()` Taylor Blau
@ 2023-05-15 19:23   ` Taylor Blau
  2023-05-15 19:23   ` [PATCH v2 06/16] builtin/for-each-ref.c: add `--exclude` option Taylor Blau
                     ` (11 subsequent siblings)
  16 siblings, 0 replies; 149+ messages in thread
From: Taylor Blau @ 2023-05-15 19:23 UTC (permalink / raw)
  To: git
  Cc: Chris Torek, Derrick Stolee, Jeff King, Junio C Hamano,
	Patrick Steinhardt

From: Jeff King <peff@peff.net>

`match_pattern()` and `match_name_as_path()` both take a `struct
ref_filter *`, and then store a stack variable `patterns` pointing at
`filter->patterns`.

The subsequent patch will add a new array of patterns to match over (the
excluded patterns, via a new `git for-each-ref --exclude` option),
treating the return value of these functions differently depending on
which patterns are being used to match.

Tweak `match_pattern()` and `match_name_as_path()` to take an array of
patterns to prepare for passing either in.

Once we start passing either in, `match_pattern()` will have little to
do with a particular `struct ref_filter *` instance. To clarify this,
drop it from the argument list, and replace it with the only bit of the
`ref_filter` that we care about (`filter->ignore_case`).
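Passing the pattern array and the case-folding bit directly means one helper can answer two questions: whether a ref matches an included pattern, and whether it matches an excluded one.

This rough sketch makes several substitutions:
- POSIX `fnmatch()` stands in for git's internal `wildmatch()`. Note that `FNM_CASEFOLD` is a GNU/BSD extension.
- Any special handling the real `match_pattern()` may do is ignored.
- `match_any()` is a hypothetical name.

```c
#define _GNU_SOURCE /* exposes FNM_CASEFOLD on glibc */
#include <assert.h>
#include <fnmatch.h>
#include <stddef.h>

/*
 * Return 1 if refname matches any pattern in the NULL-terminated array.
 * The caller decides what a match means (keep vs. drop), so the same
 * helper serves both the include and the exclude list.
 */
static int match_any(const char **patterns, const char *refname,
		     int ignore_case)
{
	int flags = ignore_case ? FNM_CASEFOLD : 0;

	for (; *patterns; patterns++)
		if (!fnmatch(*patterns, refname, flags))
			return 1;
	return 0;
}
```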

Co-authored-by: Taylor Blau <me@ttaylorr.com>
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Taylor Blau <me@ttaylorr.com>
---
 ref-filter.c | 16 +++++++++-------
 1 file changed, 9 insertions(+), 7 deletions(-)

diff --git a/ref-filter.c b/ref-filter.c
index d32f426898..6d91c7cb0d 100644
--- a/ref-filter.c
+++ b/ref-filter.c
@@ -2104,12 +2104,12 @@ static int get_ref_atom_value(struct ref_array_item *ref, int atom,
  * matches a pattern "refs/heads/mas") or a wildcard (e.g. the same ref
  * matches "refs/heads/mas*", too).
  */
-static int match_pattern(const struct ref_filter *filter, const char *refname)
+static int match_pattern(const char **patterns, const char *refname,
+			 const int ignore_case)
 {
-	const char **patterns = filter->name_patterns;
 	unsigned flags = 0;
 
-	if (filter->ignore_case)
+	if (ignore_case)
 		flags |= WM_CASEFOLD;
 
 	/*
@@ -2134,9 +2134,10 @@ static int match_pattern(const struct ref_filter *filter, const char *refname)
  * matches a pattern "refs/heads/" but not "refs/heads/m") or a
  * wildcard (e.g. the same ref matches "refs/heads/m*", too).
  */
-static int match_name_as_path(const struct ref_filter *filter, const char *refname)
+static int match_name_as_path(const struct ref_filter *filter,
+			      const char **pattern,
+			      const char *refname)
 {
-	const char **pattern = filter->name_patterns;
 	int namelen = strlen(refname);
 	unsigned flags = WM_PATHNAME;
 
@@ -2165,8 +2166,9 @@ static int filter_pattern_match(struct ref_filter *filter, const char *refname)
 	if (!*filter->name_patterns)
 		return 1; /* No pattern always matches */
 	if (filter->match_as_path)
-		return match_name_as_path(filter, refname);
-	return match_pattern(filter, refname);
+		return match_name_as_path(filter, filter->name_patterns, refname);
+	return match_pattern(filter->name_patterns, refname,
+			     filter->ignore_case);
 }
 
 /*
-- 
2.40.1.572.g5c4ab523ef

* [PATCH v2 06/16] builtin/for-each-ref.c: add `--exclude` option
  2023-05-15 19:23 ` [PATCH v2 00/16] refs: implement jump lists for packed backend Taylor Blau
                     ` (4 preceding siblings ...)
  2023-05-15 19:23   ` [PATCH v2 05/16] ref-filter.c: parameterize match functions over patterns Taylor Blau
@ 2023-05-15 19:23   ` Taylor Blau
  2023-05-15 19:23   ` [PATCH v2 07/16] refs: plumb `exclude_patterns` argument throughout Taylor Blau
                     ` (10 subsequent siblings)
  16 siblings, 0 replies; 149+ messages in thread
From: Taylor Blau @ 2023-05-15 19:23 UTC (permalink / raw)
  To: git
  Cc: Chris Torek, Derrick Stolee, Jeff King, Junio C Hamano,
	Patrick Steinhardt

When using `for-each-ref`, it is sometimes convenient for the caller to
be able to exclude certain references from the output.

For example, if there are many `refs/__hidden__/*` references, the
caller may want to emit all references *except* the hidden ones.
Currently, the only way to do this is to post-process the output, like:

    $ git for-each-ref --format='%(refname)' | grep -v '^refs/__hidden__/'

This works, but it requires processing a potentially large number of
references.

Teach `git for-each-ref` a new `--exclude=<pattern>` option, which
excludes references from the results if they match one or more excluded
patterns.

This patch provides a naive implementation where the `ref_filter` still
sees all references (including ones that it will discard) and is left to
check whether each reference matches any excluded pattern(s) before
emitting them.
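The naive approach comes down to two checks per reference, with one asymmetry worth noting. An empty list of name patterns matches everything, while an empty exclude list must exclude nothing (compare `filter_pattern_match()` with the new `filter_exclude_match()` below).

Here is a sketch of that logic. It assumes plain string-prefix matching instead of git's pattern rules, and every function name in it is hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Simplification: a pattern "matches" if it is a string prefix. */
static int match_prefix_any(const char **patterns, const char *refname)
{
	for (; *patterns; patterns++)
		if (!strncmp(refname, *patterns, strlen(*patterns)))
			return 1;
	return 0;
}

/* No name patterns: every ref is included. */
static int include_match(const char **include, const char *refname)
{
	if (!*include)
		return 1;
	return match_prefix_any(include, refname);
}

/* No exclude patterns: nothing is excluded (early return mirrors the patch). */
static int exclude_match(const char **exclude, const char *refname)
{
	if (!*exclude)
		return 0;
	return match_prefix_any(exclude, refname);
}

/* Every ref is still visited; it is kept only if it passes both checks. */
static int keep_ref(const char **include, const char **exclude,
		    const char *refname)
{
	if (!include_match(include, refname))
		return 0;
	if (exclude_match(exclude, refname))
		return 0;
	return 1;
}
```

Even in this form, dropped refs are never stored or sorted. That matches the source of the speed-up described next, although each excluded ref is still visited.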

By culling out references we know the caller doesn't care about, we can
avoid allocating memory for their storage, as well as spending time
sorting the output (among other things). Even the naive implementation
provides a significant speed-up on a modified copy of linux.git (that
has a hidden ref pointing at each commit):

    $ hyperfine \
      'git.compile for-each-ref --format="%(objectname) %(refname)" | grep -vE "[0-9a-f]{40} refs/pull/"' \
      'git.compile for-each-ref --format="%(objectname) %(refname)" --exclude refs/pull/'
    Benchmark 1: git.compile for-each-ref --format="%(objectname) %(refname)" | grep -vE "[0-9a-f]{40} refs/pull/"
      Time (mean ± σ):     820.1 ms ±   2.0 ms    [User: 703.7 ms, System: 152.0 ms]
      Range (min … max):   817.7 ms … 823.3 ms    10 runs

    Benchmark 2: git.compile for-each-ref --format="%(objectname) %(refname)" --exclude refs/pull/
      Time (mean ± σ):     106.6 ms ±   1.1 ms    [User: 99.4 ms, System: 7.1 ms]
      Range (min … max):   104.7 ms … 109.1 ms    27 runs

    Summary
      'git.compile for-each-ref --format="%(objectname) %(refname)" --exclude refs/pull/' ran
        7.69 ± 0.08 times faster than 'git.compile for-each-ref --format="%(objectname) %(refname)" | grep -vE "[0-9a-f]{40} refs/pull/"'

Subsequent patches will improve on this by avoiding visiting excluded
sections of the `packed-refs` file in certain cases.
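The idea behind those later patches can be sketched independently of the `packed-refs` format. Keep a sorted, non-overlapping list of [start, end) regions to skip. When the iterator reaches one, it moves straight to that region's end instead of visiting each record inside it.

This sketch makes a few simplifying choices:
- It works on record indices rather than byte offsets into the mmap'd file.
- `struct jump`, `iterate_with_jumps()`, and `count_visit()` are hypothetical names.
- It returns the number of jumps taken, similar in spirit to the `jumps_made` trace2 counter in the range-diff earlier in this thread.

```c
#include <assert.h>
#include <stddef.h>

/* Half-open region [start, end); assumed sorted and already merged. */
struct jump {
	size_t start, end;
};

/* Call fn on every record outside the jump regions; return jumps taken. */
static size_t iterate_with_jumps(size_t nr, const struct jump *jumps,
				 size_t jumps_nr,
				 void (*fn)(size_t pos, void *data),
				 void *data)
{
	size_t pos = 0, jump_pos = 0, made = 0;

	while (pos < nr) {
		/* Discard regions that end at or before the current position. */
		while (jump_pos < jumps_nr && jumps[jump_pos].end <= pos)
			jump_pos++;
		/* Inside (or at the start of) a region: skip to its end. */
		if (jump_pos < jumps_nr && jumps[jump_pos].start <= pos) {
			pos = jumps[jump_pos].end;
			made++;
			continue;
		}
		fn(pos, data);
		pos++;
	}
	return made;
}

/* Example callback: count visited records. */
static void count_visit(size_t pos, void *data)
{
	(void)pos;
	(*(size_t *)data)++;
}
```

For 10 records with regions {2, 5} and {7, 8}, the sketch visits records 0, 1, 5, 6, 8, and 9, and takes two jumps.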

Co-authored-by: Jeff King <peff@peff.net>
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Taylor Blau <me@ttaylorr.com>
---
 Documentation/git-for-each-ref.txt |  6 +++++
 builtin/for-each-ref.c             |  1 +
 ref-filter.c                       | 13 +++++++++++
 ref-filter.h                       |  6 +++++
 t/t6300-for-each-ref.sh            | 35 ++++++++++++++++++++++++++++++
 5 files changed, 61 insertions(+)

diff --git a/Documentation/git-for-each-ref.txt b/Documentation/git-for-each-ref.txt
index 1e215d4e73..5743eb5def 100644
--- a/Documentation/git-for-each-ref.txt
+++ b/Documentation/git-for-each-ref.txt
@@ -14,6 +14,7 @@ SYNOPSIS
 		   [--points-at=<object>]
 		   [--merged[=<object>]] [--no-merged[=<object>]]
 		   [--contains[=<object>]] [--no-contains[=<object>]]
+		   [--exclude=<pattern> ...]
 
 DESCRIPTION
 -----------
@@ -102,6 +103,11 @@ OPTIONS
 	Do not print a newline after formatted refs where the format expands
 	to the empty string.
 
+--exclude=<pattern>::
+	If one or more patterns are given, only refs which do not match
+	any excluded pattern(s) are shown. Matching is done using the
+	same rules as `<pattern>` above.
+
 FIELD NAMES
 -----------
 
diff --git a/builtin/for-each-ref.c b/builtin/for-each-ref.c
index c01fa6fefe..3384987428 100644
--- a/builtin/for-each-ref.c
+++ b/builtin/for-each-ref.c
@@ -47,6 +47,7 @@ int cmd_for_each_ref(int argc, const char **argv, const char *prefix)
 		OPT_INTEGER( 0 , "count", &maxcount, N_("show only <n> matched refs")),
 		OPT_STRING(  0 , "format", &format.format, N_("format"), N_("format to use for the output")),
 		OPT__COLOR(&format.use_color, N_("respect format colors")),
+		OPT_REF_FILTER_EXCLUDE(&filter),
 		OPT_REF_SORT(&sorting_options),
 		OPT_CALLBACK(0, "points-at", &filter.points_at,
 			     N_("object"), N_("print only refs which points at the given object"),
diff --git a/ref-filter.c b/ref-filter.c
index 6d91c7cb0d..d44418efb7 100644
--- a/ref-filter.c
+++ b/ref-filter.c
@@ -2171,6 +2171,15 @@ static int filter_pattern_match(struct ref_filter *filter, const char *refname)
 			     filter->ignore_case);
 }
 
+static int filter_exclude_match(struct ref_filter *filter, const char *refname)
+{
+	if (!filter->exclude.nr)
+		return 0;
+	if (filter->match_as_path)
+		return match_name_as_path(filter, filter->exclude.v, refname);
+	return match_pattern(filter->exclude.v, refname, filter->ignore_case);
+}
+
 /*
  * This is the same as for_each_fullref_in(), but it tries to iterate
  * only over the patterns we'll care about. Note that it _doesn't_ do a full
@@ -2338,6 +2347,9 @@ static int ref_filter_handler(const char *refname, const struct object_id *oid,
 	if (!filter_pattern_match(filter, refname))
 		return 0;
 
+	if (filter_exclude_match(filter, refname))
+		return 0;
+
 	if (filter->points_at.nr && !match_points_at(&filter->points_at, oid, refname))
 		return 0;
 
@@ -2877,6 +2889,7 @@ void ref_filter_init(struct ref_filter *filter)
 
 void ref_filter_clear(struct ref_filter *filter)
 {
+	strvec_clear(&filter->exclude);
 	oid_array_clear(&filter->points_at);
 	free_commit_list(filter->with_commit);
 	free_commit_list(filter->no_commit);
diff --git a/ref-filter.h b/ref-filter.h
index 160b807224..1524bc463a 100644
--- a/ref-filter.h
+++ b/ref-filter.h
@@ -6,6 +6,7 @@
 #include "refs.h"
 #include "commit.h"
 #include "string-list.h"
+#include "strvec.h"
 
 /* Quoting styles */
 #define QUOTE_NONE 0
@@ -59,6 +60,7 @@ struct ref_array {
 
 struct ref_filter {
 	const char **name_patterns;
+	struct strvec exclude;
 	struct oid_array points_at;
 	struct commit_list *with_commit;
 	struct commit_list *no_commit;
@@ -94,6 +96,7 @@ struct ref_format {
 
 #define REF_FILTER_INIT { \
 	.points_at = OID_ARRAY_INIT, \
+	.exclude = STRVEC_INIT, \
 }
 #define REF_FORMAT_INIT {             \
 	.use_color = -1,              \
@@ -112,6 +115,9 @@ struct ref_format {
 #define OPT_REF_SORT(var) \
 	OPT_STRING_LIST(0, "sort", (var), \
 			N_("key"), N_("field name to sort on"))
+#define OPT_REF_FILTER_EXCLUDE(var) \
+	OPT_STRVEC(0, "exclude", &(var)->exclude, \
+		   N_("pattern"), N_("exclude refs which match pattern"))
 
 /*
  * API for filtering a set of refs. Based on the type of refs the user
diff --git a/t/t6300-for-each-ref.sh b/t/t6300-for-each-ref.sh
index 5c00607608..7e8d578522 100755
--- a/t/t6300-for-each-ref.sh
+++ b/t/t6300-for-each-ref.sh
@@ -447,6 +447,41 @@ test_expect_success 'exercise glob patterns with prefixes' '
 	test_cmp expected actual
 '
 
+cat >expected <<\EOF
+refs/tags/bar
+refs/tags/baz
+refs/tags/testtag
+EOF
+
+test_expect_success 'exercise patterns with prefix exclusions' '
+	for tag in foo/one foo/two foo/three bar baz
+	do
+		git tag "$tag" || return 1
+	done &&
+	test_when_finished "git tag -d foo/one foo/two foo/three bar baz" &&
+	git for-each-ref --format="%(refname)" \
+		refs/tags/ --exclude=refs/tags/foo >actual &&
+	test_cmp expected actual
+'
+
+cat >expected <<\EOF
+refs/tags/bar
+refs/tags/baz
+refs/tags/foo/one
+refs/tags/testtag
+EOF
+
+test_expect_success 'exercise patterns with pattern exclusions' '
+	for tag in foo/one foo/two foo/three bar baz
+	do
+		git tag "$tag" || return 1
+	done &&
+	test_when_finished "git tag -d foo/one foo/two foo/three bar baz" &&
+	git for-each-ref --format="%(refname)" \
+		refs/tags/ --exclude="refs/tags/foo/t*" >actual &&
+	test_cmp expected actual
+'
+
 cat >expected <<\EOF
 'refs/heads/main'
 'refs/remotes/origin/main'
-- 
2.40.1.572.g5c4ab523ef



* [PATCH v2 07/16] refs: plumb `exclude_patterns` argument throughout
  2023-05-15 19:23 ` [PATCH v2 00/16] refs: implement jump lists for packed backend Taylor Blau
                     ` (5 preceding siblings ...)
  2023-05-15 19:23   ` [PATCH v2 06/16] builtin/for-each-ref.c: add `--exclude` option Taylor Blau
@ 2023-05-15 19:23   ` Taylor Blau
  2023-05-15 19:23   ` [PATCH v2 08/16] refs/packed-backend.c: refactor `find_reference_location()` Taylor Blau
                     ` (9 subsequent siblings)
  16 siblings, 0 replies; 149+ messages in thread
From: Taylor Blau @ 2023-05-15 19:23 UTC (permalink / raw)
  To: git
  Cc: Chris Torek, Derrick Stolee, Jeff King, Junio C Hamano,
	Patrick Steinhardt

The subsequent patch will want to access an optional `exclude_patterns`
array within `refs/packed-backend.c` that will cull out certain
references matching any of the given patterns on a best-effort basis.

To do so, the refs subsystem needs to be updated to pass this value
across a number of different locations.

Prepare for a future patch by introducing this plumbing now, passing
NULLs at top-level APIs in order to make that patch less noisy and more
easily readable.
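A minimal sketch of the calling convention assumed here: `exclude_patterns` is either NULL (no exclusions requested) or a NULL-terminated array of strings, the same shape as a strvec's `v` member. Both function names below are hypothetical stand-ins for the generic refs layer and a backend's `iterator_begin` callback; the point is only that intermediate layers forward the value untouched and that NULL preserves the existing behavior.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Hypothetical backend hook: `exclude_patterns` is either NULL or a
 * NULL-terminated array of strings.
 */
static size_t backend_iterator_begin(const char *prefix,
				     const char **exclude_patterns)
{
	size_t nr = 0;

	assert(prefix); /* "" means "all refs" */
	if (!exclude_patterns)
		return 0; /* callers passing NULL get the old behavior */
	while (exclude_patterns[nr])
		nr++;
	return nr; /* a real backend could build skip regions here */
}

/* Hypothetical generic layer: forwards the argument untouched. */
static size_t refs_iterator_begin(const char *prefix,
				  const char **exclude_patterns)
{
	return backend_iterator_begin(prefix, exclude_patterns);
}
```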

Signed-off-by: Taylor Blau <me@ttaylorr.com>
---
 ls-refs.c             |  2 +-
 ref-filter.c          |  5 +++--
 refs.c                | 32 +++++++++++++++++++-------------
 refs.h                |  8 +++++++-
 refs/debug.c          |  5 +++--
 refs/files-backend.c  |  5 +++--
 refs/packed-backend.c |  5 +++--
 refs/refs-internal.h  |  7 ++++---
 revision.c            |  2 +-
 9 files changed, 44 insertions(+), 27 deletions(-)

diff --git a/ls-refs.c b/ls-refs.c
index f385938b64..6f490b2d9c 100644
--- a/ls-refs.c
+++ b/ls-refs.c
@@ -193,7 +193,7 @@ int ls_refs(struct repository *r, struct packet_reader *request)
 		strvec_push(&data.prefixes, "");
 	refs_for_each_fullref_in_prefixes(get_main_ref_store(r),
 					  get_git_namespace(), data.prefixes.v,
-					  send_ref, &data);
+					  NULL, send_ref, &data);
 	packet_fflush(stdout);
 	strvec_clear(&data.prefixes);
 	strbuf_release(&data.buf);
diff --git a/ref-filter.c b/ref-filter.c
index d44418efb7..717c3c4bcf 100644
--- a/ref-filter.c
+++ b/ref-filter.c
@@ -2209,12 +2209,13 @@ static int for_each_fullref_in_pattern(struct ref_filter *filter,
 
 	if (!filter->name_patterns[0]) {
 		/* no patterns; we have to look at everything */
-		return for_each_fullref_in("", cb, cb_data);
+		return refs_for_each_fullref_in(get_main_ref_store(the_repository),
+						 "", NULL, cb, cb_data);
 	}
 
 	return refs_for_each_fullref_in_prefixes(get_main_ref_store(the_repository),
 						 NULL, filter->name_patterns,
-						 cb, cb_data);
+						 NULL, cb, cb_data);
 }
 
 /*
diff --git a/refs.c b/refs.c
index b9b77d2eff..538bde644e 100644
--- a/refs.c
+++ b/refs.c
@@ -1526,7 +1526,9 @@ int head_ref(each_ref_fn fn, void *cb_data)
 
 struct ref_iterator *refs_ref_iterator_begin(
 		struct ref_store *refs,
-		const char *prefix, int trim,
+		const char *prefix,
+		const char **exclude_patterns,
+		int trim,
 		enum do_for_each_ref_flags flags)
 {
 	struct ref_iterator *iter;
@@ -1542,8 +1544,7 @@ struct ref_iterator *refs_ref_iterator_begin(
 		}
 	}
 
-	iter = refs->be->iterator_begin(refs, prefix, flags);
-
+	iter = refs->be->iterator_begin(refs, prefix, exclude_patterns, flags);
 	/*
 	 * `iterator_begin()` already takes care of prefix, but we
 	 * might need to do some trimming:
@@ -1577,7 +1578,7 @@ static int do_for_each_repo_ref(struct repository *r, const char *prefix,
 	if (!refs)
 		return 0;
 
-	iter = refs_ref_iterator_begin(refs, prefix, trim, flags);
+	iter = refs_ref_iterator_begin(refs, prefix, NULL, trim, flags);
 
 	return do_for_each_repo_ref_iterator(r, iter, fn, cb_data);
 }
@@ -1599,6 +1600,7 @@ static int do_for_each_ref_helper(struct repository *r,
 }
 
 static int do_for_each_ref(struct ref_store *refs, const char *prefix,
+			   const char **exclude_patterns,
 			   each_ref_fn fn, int trim,
 			   enum do_for_each_ref_flags flags, void *cb_data)
 {
@@ -1608,7 +1610,8 @@ static int do_for_each_ref(struct ref_store *refs, const char *prefix,
 	if (!refs)
 		return 0;
 
-	iter = refs_ref_iterator_begin(refs, prefix, trim, flags);
+	iter = refs_ref_iterator_begin(refs, prefix, exclude_patterns, trim,
+				       flags);
 
 	return do_for_each_repo_ref_iterator(the_repository, iter,
 					do_for_each_ref_helper, &hp);
@@ -1616,7 +1619,7 @@ static int do_for_each_ref(struct ref_store *refs, const char *prefix,
 
 int refs_for_each_ref(struct ref_store *refs, each_ref_fn fn, void *cb_data)
 {
-	return do_for_each_ref(refs, "", fn, 0, 0, cb_data);
+	return do_for_each_ref(refs, "", NULL, fn, 0, 0, cb_data);
 }
 
 int for_each_ref(each_ref_fn fn, void *cb_data)
@@ -1627,7 +1630,7 @@ int for_each_ref(each_ref_fn fn, void *cb_data)
 int refs_for_each_ref_in(struct ref_store *refs, const char *prefix,
 			 each_ref_fn fn, void *cb_data)
 {
-	return do_for_each_ref(refs, prefix, fn, strlen(prefix), 0, cb_data);
+	return do_for_each_ref(refs, prefix, NULL, fn, strlen(prefix), 0, cb_data);
 }
 
 int for_each_ref_in(const char *prefix, each_ref_fn fn, void *cb_data)
@@ -1638,13 +1641,14 @@ int for_each_ref_in(const char *prefix, each_ref_fn fn, void *cb_data)
 int for_each_fullref_in(const char *prefix, each_ref_fn fn, void *cb_data)
 {
 	return do_for_each_ref(get_main_ref_store(the_repository),
-			       prefix, fn, 0, 0, cb_data);
+			       prefix, NULL, fn, 0, 0, cb_data);
 }
 
 int refs_for_each_fullref_in(struct ref_store *refs, const char *prefix,
+			     const char **exclude_patterns,
 			     each_ref_fn fn, void *cb_data)
 {
-	return do_for_each_ref(refs, prefix, fn, 0, 0, cb_data);
+	return do_for_each_ref(refs, prefix, exclude_patterns, fn, 0, 0, cb_data);
 }
 
 int for_each_replace_ref(struct repository *r, each_repo_ref_fn fn, void *cb_data)
@@ -1661,14 +1665,14 @@ int for_each_namespaced_ref(each_ref_fn fn, void *cb_data)
 	int ret;
 	strbuf_addf(&buf, "%srefs/", get_git_namespace());
 	ret = do_for_each_ref(get_main_ref_store(the_repository),
-			      buf.buf, fn, 0, 0, cb_data);
+			      buf.buf, NULL, fn, 0, 0, cb_data);
 	strbuf_release(&buf);
 	return ret;
 }
 
 int refs_for_each_rawref(struct ref_store *refs, each_ref_fn fn, void *cb_data)
 {
-	return do_for_each_ref(refs, "", fn, 0,
+	return do_for_each_ref(refs, "", NULL, fn, 0,
 			       DO_FOR_EACH_INCLUDE_BROKEN, cb_data);
 }
 
@@ -1738,6 +1742,7 @@ static void find_longest_prefixes(struct string_list *out,
 int refs_for_each_fullref_in_prefixes(struct ref_store *ref_store,
 				      const char *namespace,
 				      const char **patterns,
+				      const char **exclude_patterns,
 				      each_ref_fn fn, void *cb_data)
 {
 	struct string_list prefixes = STRING_LIST_INIT_DUP;
@@ -1753,7 +1758,8 @@ int refs_for_each_fullref_in_prefixes(struct ref_store *ref_store,
 
 	for_each_string_list_item(prefix, &prefixes) {
 		strbuf_addstr(&buf, prefix->string);
-		ret = refs_for_each_fullref_in(ref_store, buf.buf, fn, cb_data);
+		ret = refs_for_each_fullref_in(ref_store, buf.buf,
+					       exclude_patterns, fn, cb_data);
 		if (ret)
 			break;
 		strbuf_setlen(&buf, namespace_len);
@@ -2408,7 +2414,7 @@ int refs_verify_refname_available(struct ref_store *refs,
 	strbuf_addstr(&dirname, refname + dirname.len);
 	strbuf_addch(&dirname, '/');
 
-	iter = refs_ref_iterator_begin(refs, dirname.buf, 0,
+	iter = refs_ref_iterator_begin(refs, dirname.buf, NULL, 0,
 				       DO_FOR_EACH_INCLUDE_BROKEN);
 	while ((ok = ref_iterator_advance(iter)) == ITER_OK) {
 		if (skip &&
diff --git a/refs.h b/refs.h
index 123cfa4424..d672d636cf 100644
--- a/refs.h
+++ b/refs.h
@@ -338,6 +338,7 @@ int for_each_ref(each_ref_fn fn, void *cb_data);
 int for_each_ref_in(const char *prefix, each_ref_fn fn, void *cb_data);
 
 int refs_for_each_fullref_in(struct ref_store *refs, const char *prefix,
+			     const char **exclude_patterns,
 			     each_ref_fn fn, void *cb_data);
 int for_each_fullref_in(const char *prefix, each_ref_fn fn, void *cb_data);
 
@@ -345,10 +346,15 @@ int for_each_fullref_in(const char *prefix, each_ref_fn fn, void *cb_data);
  * iterate all refs in "patterns" by partitioning patterns into disjoint sets
  * and iterating the longest-common prefix of each set.
  *
+ * references matching any pattern in "exclude_patterns" are omitted from the
+ * result set on a best-effort basis.
+ *
  * callers should be prepared to ignore references that they did not ask for.
  */
 int refs_for_each_fullref_in_prefixes(struct ref_store *refs,
-				      const char *namespace, const char **patterns,
+				      const char *namespace,
+				      const char **patterns,
+				      const char **exclude_patterns,
 				      each_ref_fn fn, void *cb_data);
 
 /**
diff --git a/refs/debug.c b/refs/debug.c
index 6f11e6de46..328f894177 100644
--- a/refs/debug.c
+++ b/refs/debug.c
@@ -229,11 +229,12 @@ static struct ref_iterator_vtable debug_ref_iterator_vtable = {
 
 static struct ref_iterator *
 debug_ref_iterator_begin(struct ref_store *ref_store, const char *prefix,
-			 unsigned int flags)
+			 const char **exclude_patterns, unsigned int flags)
 {
 	struct debug_ref_store *drefs = (struct debug_ref_store *)ref_store;
 	struct ref_iterator *res =
-		drefs->refs->be->iterator_begin(drefs->refs, prefix, flags);
+		drefs->refs->be->iterator_begin(drefs->refs, prefix,
+						exclude_patterns, flags);
 	struct debug_ref_iterator *diter = xcalloc(1, sizeof(*diter));
 	base_ref_iterator_init(&diter->base, &debug_ref_iterator_vtable, 1);
 	diter->iter = res;
diff --git a/refs/files-backend.c b/refs/files-backend.c
index bca7b851c5..3bc3c57c05 100644
--- a/refs/files-backend.c
+++ b/refs/files-backend.c
@@ -829,7 +829,8 @@ static struct ref_iterator_vtable files_ref_iterator_vtable = {
 
 static struct ref_iterator *files_ref_iterator_begin(
 		struct ref_store *ref_store,
-		const char *prefix, unsigned int flags)
+		const char *prefix, const char **exclude_patterns,
+		unsigned int flags)
 {
 	struct files_ref_store *refs;
 	struct ref_iterator *loose_iter, *packed_iter, *overlay_iter;
@@ -874,7 +875,7 @@ static struct ref_iterator *files_ref_iterator_begin(
 	 * the packed and loose references.
 	 */
 	packed_iter = refs_ref_iterator_begin(
-			refs->packed_ref_store, prefix, 0,
+			refs->packed_ref_store, prefix, exclude_patterns, 0,
 			DO_FOR_EACH_INCLUDE_BROKEN);
 
 	overlay_iter = overlay_ref_iterator_begin(loose_iter, packed_iter);
diff --git a/refs/packed-backend.c b/refs/packed-backend.c
index 5b412a133b..176bd3905b 100644
--- a/refs/packed-backend.c
+++ b/refs/packed-backend.c
@@ -924,7 +924,8 @@ static struct ref_iterator_vtable packed_ref_iterator_vtable = {
 
 static struct ref_iterator *packed_ref_iterator_begin(
 		struct ref_store *ref_store,
-		const char *prefix, unsigned int flags)
+		const char *prefix, const char **exclude_patterns,
+		unsigned int flags)
 {
 	struct packed_ref_store *refs;
 	struct snapshot *snapshot;
@@ -1149,7 +1150,7 @@ static int write_with_updates(struct packed_ref_store *refs,
 	 * list of refs is exhausted, set iter to NULL. When the list
 	 * of updates is exhausted, leave i set to updates->nr.
 	 */
-	iter = packed_ref_iterator_begin(&refs->base, "",
+	iter = packed_ref_iterator_begin(&refs->base, "", NULL,
 					 DO_FOR_EACH_INCLUDE_BROKEN);
 	if ((ok = ref_iterator_advance(iter)) != ITER_OK)
 		iter = NULL;
diff --git a/refs/refs-internal.h b/refs/refs-internal.h
index a85d113123..28a11b9d61 100644
--- a/refs/refs-internal.h
+++ b/refs/refs-internal.h
@@ -367,8 +367,8 @@ int is_empty_ref_iterator(struct ref_iterator *ref_iterator);
  */
 struct ref_iterator *refs_ref_iterator_begin(
 		struct ref_store *refs,
-		const char *prefix, int trim,
-		enum do_for_each_ref_flags flags);
+		const char *prefix, const char **exclude_patterns,
+		int trim, enum do_for_each_ref_flags flags);
 
 /*
  * A callback function used to instruct merge_ref_iterator how to
@@ -570,7 +570,8 @@ typedef int copy_ref_fn(struct ref_store *ref_store,
  */
 typedef struct ref_iterator *ref_iterator_begin_fn(
 		struct ref_store *ref_store,
-		const char *prefix, unsigned int flags);
+		const char *prefix, const char **exclude_patterns,
+		unsigned int flags);
 
 /* reflog functions */
 
diff --git a/revision.c b/revision.c
index b33cc1d106..89953592f9 100644
--- a/revision.c
+++ b/revision.c
@@ -2670,7 +2670,7 @@ static int for_each_bisect_ref(struct ref_store *refs, each_ref_fn fn,
 	struct strbuf bisect_refs = STRBUF_INIT;
 	int status;
 	strbuf_addf(&bisect_refs, "refs/bisect/%s", term);
-	status = refs_for_each_fullref_in(refs, bisect_refs.buf, fn, cb_data);
+	status = refs_for_each_fullref_in(refs, bisect_refs.buf, NULL, fn, cb_data);
 	strbuf_release(&bisect_refs);
 	return status;
 }
-- 
2.40.1.572.g5c4ab523ef



* [PATCH v2 08/16] refs/packed-backend.c: refactor `find_reference_location()`
  2023-05-15 19:23 ` [PATCH v2 00/16] refs: implement jump lists for packed backend Taylor Blau
                     ` (6 preceding siblings ...)
  2023-05-15 19:23   ` [PATCH v2 07/16] refs: plumb `exclude_patterns` argument throughout Taylor Blau
@ 2023-05-15 19:23   ` Taylor Blau
  2023-05-15 19:23   ` [PATCH v2 09/16] refs/packed-backend.c: implement jump lists to avoid excluded pattern(s) Taylor Blau
                     ` (8 subsequent siblings)
  16 siblings, 0 replies; 149+ messages in thread
From: Taylor Blau @ 2023-05-15 19:23 UTC (permalink / raw)
  To: git
  Cc: Chris Torek, Derrick Stolee, Jeff King, Junio C Hamano,
	Patrick Steinhardt

The function `find_reference_location()` performs a binary-search-like
lookup over the contents of a repository's `$GIT_DIR/packed-refs` file.

The search it implements is unlike a standard binary search in that the
records it searches over are not of a fixed width, so each probe must
first find the boundaries of the record containing the midpoint before
comparing it.

Extract the core routine of `find_reference_location()` so that the
following patch can implement a function which finds the first location
in the `packed-refs` file that *doesn't* match the given pattern.

The behavior of `find_reference_location()` is unchanged.
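To illustrate both ideas (a binary search over variable-width, newline-terminated records, and a variant that finds the first record *after* everything matching a prefix), here is a simplified sketch. It assumes records contain only a refname (real `packed-refs` lines start with a hex object ID and a space), and `cmp_prefix()` / `prefix_bound()` are hypothetical names; the `start` argument plays a role similar to, but not identical with, the flag the next patch adds to `cmp_record_to_refname()`.

```c
#include <assert.h>
#include <string.h>

/*
 * Compare the '\n'-terminated record at `rec` with `prefix`. Returns <0
 * if the answer lies after this record and >0 if it lies at or before
 * it. A record that begins with `prefix` (including an exact match)
 * counts as >0 when `start` is set, and <0 otherwise.
 */
static int cmp_prefix(const char *rec, const char *prefix, int start)
{
	for (; *prefix; rec++, prefix++) {
		if (*rec == '\n')
			return -1; /* record ended first, so it sorts before */
		if (*rec != *prefix)
			return (unsigned char)*rec < (unsigned char)*prefix ? -1 : 1;
	}
	return start ? 1 : -1;
}

/*
 * Binary search over sorted, variable-width records in [buf, eof).
 * With `start` set, return the first record >= `prefix`; otherwise
 * return the first record past every record beginning with `prefix`.
 * Assumes the buffer ends with '\n'.
 */
static const char *prefix_bound(const char *buf, const char *eof,
				const char *prefix, int start)
{
	const char *lo = buf, *hi = eof;

	assert(buf <= eof);
	while (lo < hi) {
		const char *mid = lo + (hi - lo) / 2;
		const char *rec = mid;

		/* Records vary in width: back up to the start of mid's record. */
		while (rec > lo && rec[-1] != '\n')
			rec--;
		if (cmp_prefix(rec, prefix, start) < 0) {
			/* Answer is later: move lo just past this record. */
			const char *nl = memchr(mid, '\n', hi - mid);
			lo = nl ? nl + 1 : hi;
		} else {
			hi = rec; /* answer is this record or an earlier one */
		}
	}
	return lo;
}
```

Calling `prefix_bound()` once with `start` set and once without yields the half-open region of records beginning with the prefix; a prefix that matches nothing comes back as two equal pointers.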

Co-authored-by: Jeff King <peff@peff.net>
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Taylor Blau <me@ttaylorr.com>
---
 refs/packed-backend.c | 38 ++++++++++++++++++++++----------------
 1 file changed, 22 insertions(+), 16 deletions(-)

diff --git a/refs/packed-backend.c b/refs/packed-backend.c
index 176bd3905b..33639f73e1 100644
--- a/refs/packed-backend.c
+++ b/refs/packed-backend.c
@@ -527,22 +527,8 @@ static int load_contents(struct snapshot *snapshot)
 	return 1;
 }
 
-/*
- * Find the place in `snapshot->buf` where the start of the record for
- * `refname` starts. If `mustexist` is true and the reference doesn't
- * exist, then return NULL. If `mustexist` is false and the reference
- * doesn't exist, then return the point where that reference would be
- * inserted, or `snapshot->eof` (which might be NULL) if it would be
- * inserted at the end of the file. In the latter mode, `refname`
- * doesn't have to be a proper reference name; for example, one could
- * search for "refs/replace/" to find the start of any replace
- * references.
- *
- * The record is sought using a binary search, so `snapshot->buf` must
- * be sorted.
- */
-static const char *find_reference_location(struct snapshot *snapshot,
-					   const char *refname, int mustexist)
+static const char *find_reference_location_1(struct snapshot *snapshot,
+					     const char *refname, int mustexist)
 {
 	/*
 	 * This is not *quite* a garden-variety binary search, because
@@ -588,6 +574,26 @@ static const char *find_reference_location(struct snapshot *snapshot,
 		return lo;
 }
 
+/*
+ * Find the place in `snapshot->buf` where the start of the record for
+ * `refname` starts. If `mustexist` is true and the reference doesn't
+ * exist, then return NULL. If `mustexist` is false and the reference
+ * doesn't exist, then return the point where that reference would be
+ * inserted, or `snapshot->eof` (which might be NULL) if it would be
+ * inserted at the end of the file. In the latter mode, `refname`
+ * doesn't have to be a proper reference name; for example, one could
+ * search for "refs/replace/" to find the start of any replace
+ * references.
+ *
+ * The record is sought using a binary search, so `snapshot->buf` must
+ * be sorted.
+ */
+static const char *find_reference_location(struct snapshot *snapshot,
+					   const char *refname, int mustexist)
+{
+	return find_reference_location_1(snapshot, refname, mustexist);
+}
+
 /*
  * Create a newly-allocated `snapshot` of the `packed-refs` file in
  * its current state and return it. The return value will already have
-- 
2.40.1.572.g5c4ab523ef



* [PATCH v2 09/16] refs/packed-backend.c: implement jump lists to avoid excluded pattern(s)
  2023-05-15 19:23 ` [PATCH v2 00/16] refs: implement jump lists for packed backend Taylor Blau
                     ` (7 preceding siblings ...)
  2023-05-15 19:23   ` [PATCH v2 08/16] refs/packed-backend.c: refactor `find_reference_location()` Taylor Blau
@ 2023-05-15 19:23   ` Taylor Blau
  2023-06-06  7:00     ` Patrick Steinhardt
  2023-05-15 19:23   ` [PATCH v2 10/16] refs/packed-backend.c: add trace2 counters for jump list Taylor Blau
                     ` (7 subsequent siblings)
  16 siblings, 1 reply; 149+ messages in thread
From: Taylor Blau @ 2023-05-15 19:23 UTC (permalink / raw)
  To: git
  Cc: Chris Torek, Derrick Stolee, Jeff King, Junio C Hamano,
	Patrick Steinhardt

When iterating through the `packed-refs` file in order to answer a query
like:

    $ git for-each-ref --exclude=refs/__hidden__

it would be useful to avoid walking over all of the entries in
`refs/__hidden__/*` when possible, since we know that the ref-filter
code is going to throw them away anyway.

In certain circumstances, doing so is possible. The algorithm for doing
so is as follows:

  - For each excluded pattern, find the first record that matches it,
    and the first record that *doesn't* match it (i.e. the location
    you'd next want to consider when excluding that pattern).

  - Sort the set of excluded regions from the previous step in ascending
    order of the first location within the `packed-refs` file that
    matches.

  - Clean up the results from the previous step: discard empty regions,
    and combine overlapping or adjacent regions.
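The sorting and clean-up steps can be sketched in isolation. This version uses byte offsets instead of pointers and a hypothetical `struct region`; it always merges into the most recently kept entry (`r[out - 1]`), so regions that overlap or touch collapse into one.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical stand-in for a jump-list entry, using byte offsets. */
struct region {
	size_t start, end; /* half-open: [start, end) is skipped */
};

static int region_cmp(const void *va, const void *vb)
{
	const struct region *a = va, *b = vb;

	if (a->start < b->start)
		return -1;
	return a->start > b->start;
}

/*
 * Sort regions by start, drop empty ones, and merge any that overlap
 * or touch. Works in place; returns the number of regions kept.
 */
static size_t normalize_regions(struct region *r, size_t nr)
{
	size_t i, out = 0;

	assert(r || nr == 0);
	if (!nr)
		return 0;
	qsort(r, nr, sizeof(*r), region_cmp);
	for (i = 0; i < nr; i++) {
		if (r[i].start == r[i].end)
			continue; /* empty: no refs matched this pattern */
		if (out && r[i].start <= r[out - 1].end) {
			/* overlapping or adjacent: extend the last kept region */
			if (r[i].end > r[out - 1].end)
				r[out - 1].end = r[i].end;
		} else {
			r[out++] = r[i]; /* disjoint: keep as a new region */
		}
	}
	return out;
}
```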

Then when iterating through the `packed-refs` file, if `iter->pos` is
ever contained in one of the regions from the previous steps, advance
`iter->pos` past the end of that region, and continue enumeration.
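A simplified model of that iteration, assuming the regions have already been sorted and merged. `struct iter`, `ITER_END` and this `next_record()` are illustrative stand-ins for the iterator in `refs/packed-backend.c`, with records again reduced to bare newline-terminated refnames. Because the regions are sorted, `jump_pos` only moves forward, so each region is examined at most once.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical jump-list entry: skip byte offsets [start, end). */
struct region {
	size_t start, end;
};

/* Simplified iterator over '\n'-terminated records in buf[0..eof). */
struct iter {
	const char *buf;
	size_t pos, eof;
	const struct region *jump; /* sorted by start, already merged */
	size_t jump_nr, jump_pos;
};

#define ITER_END ((size_t)-1)

/* Return the offset of the next visited record, or ITER_END. */
static size_t next_record(struct iter *it)
{
	const char *nl;
	size_t rec;

	while (it->jump_pos < it->jump_nr) {
		const struct region *curr = &it->jump[it->jump_pos];

		if (it->pos < curr->start)
			break; /* not at the next region yet */
		it->jump_pos++; /* regions are sorted: never revisit this one */
		if (it->pos < curr->end) {
			it->pos = curr->end; /* inside a region: jump past it */
			break;
		}
	}
	assert(it->pos <= it->eof);
	if (it->pos == it->eof)
		return ITER_END;

	rec = it->pos;
	nl = memchr(it->buf + rec, '\n', it->eof - rec);
	it->pos = nl ? (size_t)(nl - it->buf) + 1 : it->eof;
	return rec;
}
```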

Note that we only perform this optimization for excluded patterns that
have no special meta-characters in them. For a pattern like
"refs/foo[ac]", the excluded regions ("refs/fooa", "refs/fooc", and
everything underneath them) are not contiguous. A future implementation
that handles this case may split the character class (treating it as if
two patterns had been excluded: "refs/fooa" and "refs/fooc").
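The non-contiguity is easy to demonstrate. The sketch below uses POSIX `fnmatch()` as a stand-in for Git's own wildmatch and ignores the "and everything underneath" part of ref matching; `count_matching_runs()` is a hypothetical helper that counts how many separate runs of a sorted ref list a pattern matches.

```c
#include <assert.h>
#include <fnmatch.h>
#include <stddef.h>

/*
 * Hypothetical helper: count the maximal runs of consecutive entries in
 * the sorted array `refs` that match `pattern`. A single run could be
 * skipped with one (start, end) jump; more than one run cannot.
 */
static size_t count_matching_runs(const char **refs, size_t nr,
				  const char *pattern)
{
	size_t i, runs = 0;
	int in_run = 0;

	assert(refs || nr == 0);
	for (i = 0; i < nr; i++) {
		int match = fnmatch(pattern, refs[i], 0) == 0;

		if (match && !in_run)
			runs++; /* a new run starts here */
		in_run = match;
	}
	return runs;
}
```

With `refs/foo[ac]` and the sorted list `refs/fooa`, `refs/foob`, `refs/fooc`, the unmatched `refs/foob` sits between the two matches, so no single (start, end) pair can describe the excluded set.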

There are a few other gotchas worth considering. First, note that the
jump list is sorted, so once we jump past a region, we can avoid
considering it (or any regions preceding it) again. The member
`jump_pos` tracks the next region we may still need to jump over.

Second, note that the exclusion list is best-effort, since we do not
handle loose references, and because of the meta-character issue above.

In repositories with a large number of hidden references, the speed-up
can be significant. Tests here are done with a copy of linux.git with a
reference "refs/pull/N" pointing at every commit, as in:

    $ git rev-list HEAD | awk '{ print "create refs/pull/" NR " " $0 }' |
        git update-ref --stdin
    $ git pack-refs --all

In that repository, it is significantly faster to have `for-each-ref` jump over the
excluded references, as opposed to filtering them out after the fact:

    $ hyperfine \
      'git for-each-ref --format="%(objectname) %(refname)" | grep -vE "^[0-9a-f]{40} refs/pull/"' \
      'git.compile for-each-ref --format="%(objectname) %(refname)" --exclude="refs/pull"'
    Benchmark 1: git for-each-ref --format="%(objectname) %(refname)" | grep -vE "^[0-9a-f]{40} refs/pull/"
      Time (mean ± σ):     802.7 ms ±   2.1 ms    [User: 691.6 ms, System: 147.0 ms]
      Range (min … max):   800.0 ms … 807.7 ms    10 runs

    Benchmark 2: git.compile for-each-ref --format="%(objectname) %(refname)" --exclude="refs/pull"
      Time (mean ± σ):       4.7 ms ±   0.3 ms    [User: 0.7 ms, System: 4.0 ms]
      Range (min … max):     4.3 ms …   6.7 ms    422 runs

    Summary
      'git.compile for-each-ref --format="%(objectname) %(refname)" --exclude="refs/pull"' ran
      172.03 ± 9.60 times faster than 'git for-each-ref --format="%(objectname) %(refname)" | grep -vE "^[0-9a-f]{40} refs/pull/"'

Using the jump list is fairly straightforward (see the changes to
`refs/packed-backend.c::next_record()`), but constructing the list is
not. To ensure that the construction is correct, add a new suite of
tests in t1419 covering various corner cases (overlapping regions,
partially overlapping regions, adjacent regions, etc.).

Co-authored-by: Jeff King <peff@peff.net>
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Taylor Blau <me@ttaylorr.com>
---
 ref-filter.c              |   5 +-
 refs/packed-backend.c     | 166 ++++++++++++++++++++++++++++++++++++--
 t/helper/test-ref-store.c |  10 +++
 t/t1419-exclude-refs.sh   | 101 +++++++++++++++++++++++
 4 files changed, 274 insertions(+), 8 deletions(-)
 create mode 100755 t/t1419-exclude-refs.sh

diff --git a/ref-filter.c b/ref-filter.c
index 717c3c4bcf..ddc7f5204f 100644
--- a/ref-filter.c
+++ b/ref-filter.c
@@ -2210,12 +2210,13 @@ static int for_each_fullref_in_pattern(struct ref_filter *filter,
 	if (!filter->name_patterns[0]) {
 		/* no patterns; we have to look at everything */
 		return refs_for_each_fullref_in(get_main_ref_store(the_repository),
-						 "", NULL, cb, cb_data);
+						 "", filter->exclude.v, cb, cb_data);
 	}
 
 	return refs_for_each_fullref_in_prefixes(get_main_ref_store(the_repository),
 						 NULL, filter->name_patterns,
-						 NULL, cb, cb_data);
+						 filter->exclude.v,
+						 cb, cb_data);
 }
 
 /*
diff --git a/refs/packed-backend.c b/refs/packed-backend.c
index 33639f73e1..67327e579c 100644
--- a/refs/packed-backend.c
+++ b/refs/packed-backend.c
@@ -303,7 +303,8 @@ static int cmp_packed_ref_records(const void *v1, const void *v2)
  * Compare a snapshot record at `rec` to the specified NUL-terminated
  * refname.
  */
-static int cmp_record_to_refname(const char *rec, const char *refname)
+static int cmp_record_to_refname(const char *rec, const char *refname,
+				 int start)
 {
 	const char *r1 = rec + the_hash_algo->hexsz + 1;
 	const char *r2 = refname;
@@ -312,7 +313,7 @@ static int cmp_record_to_refname(const char *rec, const char *refname)
 		if (*r1 == '\n')
 			return *r2 ? -1 : 0;
 		if (!*r2)
-			return 1;
+			return start ? 1 : -1;
 		if (*r1 != *r2)
 			return (unsigned char)*r1 < (unsigned char)*r2 ? -1 : +1;
 		r1++;
@@ -528,7 +529,8 @@ static int load_contents(struct snapshot *snapshot)
 }
 
 static const char *find_reference_location_1(struct snapshot *snapshot,
-					     const char *refname, int mustexist)
+					     const char *refname, int mustexist,
+					     int start)
 {
 	/*
 	 * This is not *quite* a garden-variety binary search, because
@@ -558,7 +560,7 @@ static const char *find_reference_location_1(struct snapshot *snapshot,
 
 		mid = lo + (hi - lo) / 2;
 		rec = find_start_of_record(lo, mid);
-		cmp = cmp_record_to_refname(rec, refname);
+		cmp = cmp_record_to_refname(rec, refname, start);
 		if (cmp < 0) {
 			lo = find_end_of_record(mid, hi);
 		} else if (cmp > 0) {
@@ -591,7 +593,22 @@ static const char *find_reference_location_1(struct snapshot *snapshot,
 static const char *find_reference_location(struct snapshot *snapshot,
 					   const char *refname, int mustexist)
 {
-	return find_reference_location_1(snapshot, refname, mustexist);
+	return find_reference_location_1(snapshot, refname, mustexist, 1);
+}
+
+/*
+ * Find the place in `snapshot->buf` after the end of the record for
+ * `refname`. In other words, find the location of first thing *after*
+ * `refname`.
+ *
+ * Other semantics are identical to the ones in
+ * `find_reference_location()`.
+ */
+static const char *find_reference_location_end(struct snapshot *snapshot,
+					       const char *refname,
+					       int mustexist)
+{
+	return find_reference_location_1(snapshot, refname, mustexist, 0);
 }
 
 /*
@@ -785,6 +802,13 @@ struct packed_ref_iterator {
 	/* The end of the part of the buffer that will be iterated over: */
 	const char *eof;
 
+	struct jump_list_entry {
+		const char *start;
+		const char *end;
+	} *jump;
+	size_t jump_nr, jump_alloc;
+	size_t jump_pos;
+
 	/* Scratch space for current values: */
 	struct object_id oid, peeled;
 	struct strbuf refname_buf;
@@ -802,14 +826,34 @@ struct packed_ref_iterator {
  */
 static int next_record(struct packed_ref_iterator *iter)
 {
-	const char *p = iter->pos, *eol;
+	const char *p, *eol;
 
 	strbuf_reset(&iter->refname_buf);
 
+	/*
+	 * If iter->pos is contained within a skipped region, jump past
+	 * it.
+	 *
+	 * Note that each skipped region is considered at most once,
+	 * since they are ordered based on their starting position.
+	 */
+	while (iter->jump_pos < iter->jump_nr) {
+		struct jump_list_entry *curr = &iter->jump[iter->jump_pos];
+		if (iter->pos < curr->start)
+			break; /* not to the next jump yet */
+
+		iter->jump_pos++;
+		if (iter->pos < curr->end) {
+			iter->pos = curr->end;
+			break;
+		}
+	}
+
 	if (iter->pos == iter->eof)
 		return ITER_DONE;
 
 	iter->base.flags = REF_ISPACKED;
+	p = iter->pos;
 
 	if (iter->eof - p < the_hash_algo->hexsz + 2 ||
 	    parse_oid_hex(p, &iter->oid, &p) ||
@@ -917,6 +961,7 @@ static int packed_ref_iterator_abort(struct ref_iterator *ref_iterator)
 	int ok = ITER_DONE;
 
 	strbuf_release(&iter->refname_buf);
+	free(iter->jump);
 	release_snapshot(iter->snapshot);
 	base_ref_iterator_free(ref_iterator);
 	return ok;
@@ -928,6 +973,112 @@ static struct ref_iterator_vtable packed_ref_iterator_vtable = {
 	.abort = packed_ref_iterator_abort
 };
 
+static int jump_list_entry_cmp(const void *va, const void *vb)
+{
+	const struct jump_list_entry *a = va;
+	const struct jump_list_entry *b = vb;
+
+	if (a->start < b->start)
+		return -1;
+	if (a->start > b->start)
+		return 1;
+	return 0;
+}
+
+static int has_glob_special(const char *str)
+{
+	const char *p;
+	for (p = str; *p; p++) {
+		if (is_glob_special(*p))
+			return 1;
+	}
+	return 0;
+}
+
+static const char *ptr_max(const char *x, const char *y)
+{
+	if (x > y)
+		return x;
+	return y;
+}
+
+static void populate_excluded_jump_list(struct packed_ref_iterator *iter,
+					struct snapshot *snapshot,
+					const char **excluded_patterns)
+{
+	size_t i, j;
+	const char **pattern;
+	struct jump_list_entry *last_disjoint;
+
+	if (!excluded_patterns)
+		return;
+
+	for (pattern = excluded_patterns; *pattern; pattern++) {
+		struct jump_list_entry *e;
+
+		/*
+		 * We can't feed any excludes with globs in them to the
+		 * refs machinery.  It only understands prefix matching.
+		 * We likewise can't even feed the string leading up to
+		 * the first meta-character, as something like "foo[a]"
+		 * should not exclude "foobar" (but the prefix "foo"
+		 * would match that and mark it for exclusion).
+		 */
+		if (has_glob_special(*pattern))
+			continue;
+
+		ALLOC_GROW(iter->jump, iter->jump_nr + 1, iter->jump_alloc);
+
+		e = &iter->jump[iter->jump_nr++];
+		e->start = find_reference_location(snapshot, *pattern, 0);
+		e->end = find_reference_location_end(snapshot, *pattern, 0);
+	}
+
+	if (!iter->jump_nr) {
+		/*
+		 * Every entry in exclude_patterns has a meta-character,
+		 * nothing to do here.
+		 */
+		return;
+	}
+
+	QSORT(iter->jump, iter->jump_nr, jump_list_entry_cmp);
+
+	/*
+	 * As an optimization, merge adjacent entries in the jump list
+	 * to jump forwards as far as possible when entering a skipped
+	 * region.
+	 *
+	 * For example, if we have two skipped regions:
+	 *
+	 *	[[A, B], [B, C]]
+	 *
+	 * we want to combine that into a single entry jumping from A to
+	 * C.
+	 */
+	last_disjoint = iter->jump;
+
+	for (i = 1, j = 1; i < iter->jump_nr; i++) {
+		struct jump_list_entry *ours = &iter->jump[i];
+
+		if (ours->start == ours->end) {
+			/* ignore empty regions (no matching entries) */
+			continue;
+		} else if (ours->start <= last_disjoint->end) {
+			/* overlapping regions extend the previous one */
+			last_disjoint->end = ptr_max(last_disjoint->end, ours->end);
+		} else {
+			/* otherwise, insert a new region */
+			iter->jump[j++] = *ours;
+			last_disjoint = &iter->jump[j - 1];
+
+		}
+	}
+
+	iter->jump_nr = j;
+	iter->jump_pos = 0;
+}
+
 static struct ref_iterator *packed_ref_iterator_begin(
 		struct ref_store *ref_store,
 		const char *prefix, const char **exclude_patterns,
@@ -963,6 +1114,9 @@ static struct ref_iterator *packed_ref_iterator_begin(
 	ref_iterator = &iter->base;
 	base_ref_iterator_init(ref_iterator, &packed_ref_iterator_vtable, 1);
 
+	if (exclude_patterns)
+		populate_excluded_jump_list(iter, snapshot, exclude_patterns);
+
 	iter->snapshot = snapshot;
 	acquire_snapshot(snapshot);
 
diff --git a/t/helper/test-ref-store.c b/t/helper/test-ref-store.c
index 6d8f844e9c..2bff003f7c 100644
--- a/t/helper/test-ref-store.c
+++ b/t/helper/test-ref-store.c
@@ -175,6 +175,15 @@ static int cmd_for_each_ref(struct ref_store *refs, const char **argv)
 	return refs_for_each_ref_in(refs, prefix, each_ref, NULL);
 }
 
+static int cmd_for_each_ref__exclude(struct ref_store *refs, const char **argv)
+{
+	const char *prefix = notnull(*argv++, "prefix");
+	const char **exclude_patterns = argv;
+
+	return refs_for_each_fullref_in(refs, prefix, exclude_patterns, each_ref,
+					NULL);
+}
+
 static int cmd_resolve_ref(struct ref_store *refs, const char **argv)
 {
 	struct object_id oid = *null_oid();
@@ -307,6 +316,7 @@ static struct command commands[] = {
 	{ "delete-refs", cmd_delete_refs },
 	{ "rename-ref", cmd_rename_ref },
 	{ "for-each-ref", cmd_for_each_ref },
+	{ "for-each-ref--exclude", cmd_for_each_ref__exclude },
 	{ "resolve-ref", cmd_resolve_ref },
 	{ "verify-ref", cmd_verify_ref },
 	{ "for-each-reflog", cmd_for_each_reflog },
diff --git a/t/t1419-exclude-refs.sh b/t/t1419-exclude-refs.sh
new file mode 100755
index 0000000000..bc534c8ea1
--- /dev/null
+++ b/t/t1419-exclude-refs.sh
@@ -0,0 +1,101 @@
+#!/bin/sh
+
+test_description='test exclude_patterns functionality in main ref store'
+
+GIT_TEST_DEFAULT_INITIAL_BRANCH_NAME=main
+export GIT_TEST_DEFAULT_INITIAL_BRANCH_NAME
+
+TEST_PASSES_SANITIZE_LEAK=true
+. ./test-lib.sh
+
+for_each_ref__exclude () {
+	test-tool ref-store main for-each-ref--exclude "$@" >actual.raw
+	cut -d ' ' -f 2 actual.raw
+}
+
+for_each_ref () {
+	git for-each-ref --format='%(refname)' "$@"
+}
+
+test_expect_success 'setup' '
+	test_commit --no-tag base &&
+	base="$(git rev-parse HEAD)" &&
+
+	for name in foo bar baz quux
+	do
+		for i in 1 2 3
+		do
+			echo "create refs/heads/$name/$i $base" || return 1
+		done || return 1
+	done >in &&
+	echo "delete refs/heads/main" >>in &&
+
+	git update-ref --stdin <in &&
+	git pack-refs --all
+'
+
+test_expect_success 'excluded region in middle' '
+	for_each_ref__exclude refs/heads refs/heads/foo >actual &&
+	for_each_ref refs/heads/bar refs/heads/baz refs/heads/quux >expect &&
+
+	test_cmp expect actual
+'
+
+test_expect_success 'excluded region at beginning' '
+	for_each_ref__exclude refs/heads refs/heads/bar >actual &&
+	for_each_ref refs/heads/baz refs/heads/foo refs/heads/quux >expect &&
+
+	test_cmp expect actual
+'
+
+test_expect_success 'excluded region at end' '
+	for_each_ref__exclude refs/heads refs/heads/quux >actual &&
+	for_each_ref refs/heads/foo refs/heads/bar refs/heads/baz >expect &&
+
+	test_cmp expect actual
+'
+
+test_expect_success 'disjoint excluded regions' '
+	for_each_ref__exclude refs/heads refs/heads/bar refs/heads/quux >actual &&
+	for_each_ref refs/heads/baz refs/heads/foo >expect &&
+
+	test_cmp expect actual
+'
+
+test_expect_success 'adjacent, non-overlapping excluded regions' '
+	for_each_ref__exclude refs/heads refs/heads/bar refs/heads/baz >actual &&
+	for_each_ref refs/heads/foo refs/heads/quux >expect &&
+
+	test_cmp expect actual
+'
+
+test_expect_success 'overlapping excluded regions' '
+	for_each_ref__exclude refs/heads refs/heads/ba refs/heads/baz >actual &&
+	for_each_ref refs/heads/foo refs/heads/quux >expect &&
+
+	test_cmp expect actual
+'
+
+test_expect_success 'several overlapping excluded regions' '
+	for_each_ref__exclude refs/heads \
+		refs/heads/bar refs/heads/baz refs/heads/foo >actual &&
+	for_each_ref refs/heads/quux >expect &&
+
+	test_cmp expect actual
+'
+
+test_expect_success 'non-matching excluded section' '
+	for_each_ref__exclude refs/heads refs/heads/does/not/exist >actual &&
+	for_each_ref >expect &&
+
+	test_cmp expect actual
+'
+
+test_expect_success 'meta-characters are discarded' '
+	for_each_ref__exclude refs/heads "refs/heads/ba*" >actual &&
+	for_each_ref >expect &&
+
+	test_cmp expect actual
+'
+
+test_done
-- 
2.40.1.572.g5c4ab523ef
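The merging step in `populate_excluded_jump_list()` is ordinary interval merging over regions sorted by their start. Below is a minimal standalone sketch. It assumes `size_t` offsets instead of the `const char *` positions used above, and it uses a hypothetical `struct region` and `merge_regions()` in place of `struct jump_list_entry` and the in-place loop. The sketch tracks the last kept region by its index in the compacted prefix (`r[out - 1]`), so an extension always lands on an entry that survives compaction. Unlike the loop above, it also drops an empty first region.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-in for struct jump_list_entry: skip [start, end). */
struct region {
	size_t start;
	size_t end;
};

static int region_cmp(const void *va, const void *vb)
{
	const struct region *a = va;
	const struct region *b = vb;

	if (a->start < b->start)
		return -1;
	if (a->start > b->start)
		return 1;
	return 0;
}

/*
 * Sort regions by start, drop empty ones, and merge regions that
 * overlap or touch. Returns how many regions remain at the front of
 * 'r'. The last kept region is always r[out - 1], so extensions are
 * applied to the entry that survives compaction.
 */
static size_t merge_regions(struct region *r, size_t nr)
{
	size_t i, out = 0;

	if (!nr)
		return 0;
	qsort(r, nr, sizeof(*r), region_cmp);

	for (i = 0; i < nr; i++) {
		assert(r[i].start <= r[i].end);
		if (r[i].start == r[i].end)
			continue; /* empty: no matching refs */
		if (out && r[i].start <= r[out - 1].end) {
			/* overlapping or adjacent: extend the last kept region */
			if (r[i].end > r[out - 1].end)
				r[out - 1].end = r[i].end;
		} else {
			r[out++] = r[i]; /* disjoint: keep as a new region */
		}
	}
	return out;
}
```

For the regions [40,50), [10,20), [20,30), [35,35) and [45,60), this leaves [10,30) and [40,60).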



* [PATCH v2 10/16] refs/packed-backend.c: add trace2 counters for jump list
  2023-05-15 19:23 ` [PATCH v2 00/16] refs: implement jump lists for packed backend Taylor Blau
                     ` (8 preceding siblings ...)
  2023-05-15 19:23   ` [PATCH v2 09/16] refs/packed-backend.c: implement jump lists to avoid excluded pattern(s) Taylor Blau
@ 2023-05-15 19:23   ` Taylor Blau
  2023-05-15 19:23   ` [PATCH v2 11/16] revision.h: store hidden refs in a `strvec` Taylor Blau
                     ` (6 subsequent siblings)
  16 siblings, 0 replies; 149+ messages in thread
From: Taylor Blau @ 2023-05-15 19:23 UTC (permalink / raw)
  To: git
  Cc: Chris Torek, Derrick Stolee, Jeff King, Junio C Hamano,
	Patrick Steinhardt

The previous commit added low-level tests to ensure that the packed-refs
iterator did not enumerate excluded sections of the refspace.

However, there was no guarantee that these sections weren't being
visited, only that they were being suppressed from the output. To harden
these tests, add a trace2 counter which tracks the number of regions
skipped by the packed-refs iterator, and assert on its value.
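The following standalone sketch shows concretely what the counter measures by modeling the skip logic in `next_record()`. It assumes that records sit at integer positions and that a plain `jumps_made` field stands in for `trace2_counter_add(TRACE2_COUNTER_ID_PACKED_REFS_JUMPS, 1)`. The names `struct sketch_iter`, `sketch_next_record()` and `SKETCH_DONE` are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

#define SKETCH_DONE ((size_t)-1)

struct region {
	size_t start, end; /* skip positions in [start, end) */
};

/* Hypothetical model: records live at positions 0..eof-1. */
struct sketch_iter {
	size_t pos, eof;
	const struct region *jump; /* sorted, merged, disjoint */
	size_t jump_nr, jump_pos;
	unsigned jumps_made; /* stand-in for the trace2 counter */
};

/*
 * Jump past any skipped region containing it->pos, then return the
 * next record's position, or SKETCH_DONE at the end. Each region is
 * considered at most once because jump_pos only moves forward.
 */
static size_t sketch_next_record(struct sketch_iter *it)
{
	assert(it->pos <= it->eof);

	while (it->jump_pos < it->jump_nr) {
		const struct region *curr = &it->jump[it->jump_pos];

		if (it->pos < curr->start)
			break; /* not at the next region yet */

		it->jump_pos++;
		if (it->pos < curr->end) {
			it->pos = curr->end;
			it->jumps_made++; /* one jump, one counter bump */
			break;
		}
	}

	if (it->pos >= it->eof)
		return SKETCH_DONE;
	return it->pos++;
}
```

With ten records, skipping the two disjoint regions [2,4) and [7,9) visits positions 0, 1, 4, 5, 6 and 9 and counts two jumps. The 'disjoint excluded regions' test expects the same count.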

Suggested-by: Derrick Stolee <derrickstolee@github.com>
Signed-off-by: Taylor Blau <me@ttaylorr.com>
---
 refs/packed-backend.c   |  2 ++
 t/t1419-exclude-refs.sh | 59 ++++++++++++++++++++++++++++-------------
 trace2.h                |  2 ++
 trace2/tr2_ctr.c        |  5 ++++
 4 files changed, 49 insertions(+), 19 deletions(-)

diff --git a/refs/packed-backend.c b/refs/packed-backend.c
index 67327e579c..7ba9fa2bb8 100644
--- a/refs/packed-backend.c
+++ b/refs/packed-backend.c
@@ -12,6 +12,7 @@
 #include "../chdir-notify.h"
 #include "../wrapper.h"
 #include "../write-or-die.h"
+#include "../trace2.h"
 
 enum mmap_strategy {
 	/*
@@ -845,6 +846,7 @@ static int next_record(struct packed_ref_iterator *iter)
 		iter->jump_pos++;
 		if (iter->pos < curr->end) {
 			iter->pos = curr->end;
+			trace2_counter_add(TRACE2_COUNTER_ID_PACKED_REFS_JUMPS, 1);
 			break;
 		}
 	}
diff --git a/t/t1419-exclude-refs.sh b/t/t1419-exclude-refs.sh
index bc534c8ea1..350a7d2587 100755
--- a/t/t1419-exclude-refs.sh
+++ b/t/t1419-exclude-refs.sh
@@ -9,7 +9,8 @@ TEST_PASSES_SANITIZE_LEAK=true
 . ./test-lib.sh
 
 for_each_ref__exclude () {
-	test-tool ref-store main for-each-ref--exclude "$@" >actual.raw
+	GIT_TRACE2_PERF=1 test-tool ref-store main \
+		for-each-ref--exclude "$@" >actual.raw
 	cut -d ' ' -f 2 actual.raw
 }
 
@@ -17,6 +18,17 @@ for_each_ref () {
 	git for-each-ref --format='%(refname)' "$@"
 }
 
+assert_jumps () {
+	local nr="$1"
+	local trace="$2"
+
+	grep -q "name:jumps_made value:$nr" $trace
+}
+
+assert_no_jumps () {
+	! assert_jumps ".*" "$1"
+}
+
 test_expect_success 'setup' '
 	test_commit --no-tag base &&
 	base="$(git rev-parse HEAD)" &&
@@ -35,67 +47,76 @@ test_expect_success 'setup' '
 '
 
 test_expect_success 'excluded region in middle' '
-	for_each_ref__exclude refs/heads refs/heads/foo >actual &&
+	for_each_ref__exclude refs/heads refs/heads/foo >actual 2>perf &&
 	for_each_ref refs/heads/bar refs/heads/baz refs/heads/quux >expect &&
 
-	test_cmp expect actual
+	test_cmp expect actual &&
+	assert_jumps 1 perf
 '
 
 test_expect_success 'excluded region at beginning' '
-	for_each_ref__exclude refs/heads refs/heads/bar >actual &&
+	for_each_ref__exclude refs/heads refs/heads/bar >actual 2>perf &&
 	for_each_ref refs/heads/baz refs/heads/foo refs/heads/quux >expect &&
 
-	test_cmp expect actual
+	test_cmp expect actual &&
+	assert_jumps 1 perf
 '
 
 test_expect_success 'excluded region at end' '
-	for_each_ref__exclude refs/heads refs/heads/quux >actual &&
+	for_each_ref__exclude refs/heads refs/heads/quux >actual 2>perf &&
 	for_each_ref refs/heads/foo refs/heads/bar refs/heads/baz >expect &&
 
-	test_cmp expect actual
+	test_cmp expect actual &&
+	assert_jumps 1 perf
 '
 
 test_expect_success 'disjoint excluded regions' '
-	for_each_ref__exclude refs/heads refs/heads/bar refs/heads/quux >actual &&
+	for_each_ref__exclude refs/heads refs/heads/bar refs/heads/quux >actual 2>perf &&
 	for_each_ref refs/heads/baz refs/heads/foo >expect &&
 
-	test_cmp expect actual
+	test_cmp expect actual &&
+	assert_jumps 2 perf
 '
 
 test_expect_success 'adjacent, non-overlapping excluded regions' '
-	for_each_ref__exclude refs/heads refs/heads/bar refs/heads/baz >actual &&
+	for_each_ref__exclude refs/heads refs/heads/bar refs/heads/baz >actual 2>perf &&
 	for_each_ref refs/heads/foo refs/heads/quux >expect &&
 
-	test_cmp expect actual
+	test_cmp expect actual &&
+	assert_jumps 1 perf
 '
 
 test_expect_success 'overlapping excluded regions' '
-	for_each_ref__exclude refs/heads refs/heads/ba refs/heads/baz >actual &&
+	for_each_ref__exclude refs/heads refs/heads/ba refs/heads/baz >actual 2>perf &&
 	for_each_ref refs/heads/foo refs/heads/quux >expect &&
 
-	test_cmp expect actual
+	test_cmp expect actual &&
+	assert_jumps 1 perf
 '
 
 test_expect_success 'several overlapping excluded regions' '
 	for_each_ref__exclude refs/heads \
-		refs/heads/bar refs/heads/baz refs/heads/foo >actual &&
+		refs/heads/bar refs/heads/baz refs/heads/foo >actual 2>perf &&
 	for_each_ref refs/heads/quux >expect &&
 
-	test_cmp expect actual
+	test_cmp expect actual &&
+	assert_jumps 1 perf
 '
 
 test_expect_success 'non-matching excluded section' '
-	for_each_ref__exclude refs/heads refs/heads/does/not/exist >actual &&
+	for_each_ref__exclude refs/heads refs/heads/does/not/exist >actual 2>perf &&
 	for_each_ref >expect &&
 
-	test_cmp expect actual
+	test_cmp expect actual &&
+	assert_no_jumps perf
 '
 
 test_expect_success 'meta-characters are discarded' '
-	for_each_ref__exclude refs/heads "refs/heads/ba*" >actual &&
+	for_each_ref__exclude refs/heads "refs/heads/ba*" >actual 2>perf &&
 	for_each_ref >expect &&
 
-	test_cmp expect actual
+	test_cmp expect actual &&
+	assert_no_jumps
 '
 
 test_done
diff --git a/trace2.h b/trace2.h
index 4ced30c0db..9452e291f5 100644
--- a/trace2.h
+++ b/trace2.h
@@ -551,6 +551,8 @@ enum trace2_counter_id {
 	TRACE2_COUNTER_ID_TEST1 = 0, /* emits summary event only */
 	TRACE2_COUNTER_ID_TEST2,     /* emits summary and thread events */
 
+	TRACE2_COUNTER_ID_PACKED_REFS_JUMPS, /* counts number of jumps */
+
 	/* Add additional counter definitions before here. */
 	TRACE2_NUMBER_OF_COUNTERS
 };
diff --git a/trace2/tr2_ctr.c b/trace2/tr2_ctr.c
index b342d3b1a3..50570d0165 100644
--- a/trace2/tr2_ctr.c
+++ b/trace2/tr2_ctr.c
@@ -27,6 +27,11 @@ static struct tr2_counter_metadata tr2_counter_metadata[TRACE2_NUMBER_OF_COUNTER
 		.name = "test2",
 		.want_per_thread_events = 1,
 	},
+	[TRACE2_COUNTER_ID_PACKED_REFS_JUMPS] = {
+		.category = "packed-refs",
+		.name = "jumps_made",
+		.want_per_thread_events = 0,
+	},
 
 	/* Add additional metadata before here. */
 };
-- 
2.40.1.572.g5c4ab523ef



* [PATCH v2 11/16] revision.h: store hidden refs in a `strvec`
  2023-05-15 19:23 ` [PATCH v2 00/16] refs: implement jump lists for packed backend Taylor Blau
                     ` (9 preceding siblings ...)
  2023-05-15 19:23   ` [PATCH v2 10/16] refs/packed-backend.c: add trace2 counters for jump list Taylor Blau
@ 2023-05-15 19:23   ` Taylor Blau
  2023-06-06  7:00     ` Patrick Steinhardt
  2023-05-15 19:23   ` [PATCH v2 12/16] refs/packed-backend.c: ignore complicated hidden refs rules Taylor Blau
                     ` (5 subsequent siblings)
  16 siblings, 1 reply; 149+ messages in thread
From: Taylor Blau @ 2023-05-15 19:23 UTC (permalink / raw)
  To: git
  Cc: Chris Torek, Derrick Stolee, Jeff King, Junio C Hamano,
	Patrick Steinhardt

In subsequent commits, it will be convenient to have a 'const char **'
of hidden refs (matching `transfer.hiderefs`, `uploadpack.hideRefs`,
etc.), instead of a `string_list`.

Convert spots throughout the tree that store the list of hidden refs
from a `string_list` to a `strvec`.

Note that in `parse_hide_refs_config()` there is an ugly const-cast used
to avoid an extra copy of each value before trimming any trailing slash
characters. This could instead be written as:

    ref = xstrdup(value);
    len = strlen(ref);
    while (len && ref[len - 1] == '/')
            ref[--len] = '\0';
    strvec_push(hide_refs, ref);
    free(ref);

but the double-copy (once when calling `xstrdup()`, and another via
`strvec_push()`) is wasteful.
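The trade-off comes down to ownership. `strvec_push()` stores its own copy of the string and returns it as `const char *`, so writing through the cast-away pointer only touches the vector's private copy. Below is a minimal sketch of the single-copy approach using a toy owning vector. The names `struct vec`, `vec_push()`, `push_trimmed()` and `vec_clear()` are hypothetical; they are not git's `strvec` API.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Toy owning string vector; NOT git's strvec API. */
struct vec {
	char **v;
	size_t nr, alloc;
};

/* Store a private copy of 's' and return that copy. */
static char *vec_push(struct vec *vec, const char *s)
{
	size_t len = strlen(s);
	char *copy = malloc(len + 1);

	if (!copy)
		abort();
	memcpy(copy, s, len + 1);

	if (vec->nr == vec->alloc) {
		size_t alloc = vec->alloc ? 2 * vec->alloc : 4;
		char **v = realloc(vec->v, alloc * sizeof(*v));

		if (!v)
			abort();
		vec->v = v;
		vec->alloc = alloc;
	}
	vec->v[vec->nr++] = copy;
	return copy;
}

/* One string copy per value: push first, then trim the stored copy. */
static void push_trimmed(struct vec *vec, const char *value)
{
	char *ref;
	size_t len;

	assert(value);
	ref = vec_push(vec, value);
	len = strlen(ref);
	while (len && ref[len - 1] == '/')
		ref[--len] = '\0';
}

static void vec_clear(struct vec *vec)
{
	size_t i;

	for (i = 0; i < vec->nr; i++)
		free(vec->v[i]);
	free(vec->v);
	vec->v = NULL;
	vec->nr = vec->alloc = 0;
}
```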

Signed-off-by: Taylor Blau <me@ttaylorr.com>
---
 builtin/receive-pack.c |  4 ++--
 ls-refs.c              |  6 +++---
 refs.c                 | 11 ++++++-----
 refs.h                 |  4 ++--
 revision.c             |  2 +-
 revision.h             |  5 +++--
 upload-pack.c          | 10 +++++-----
 7 files changed, 22 insertions(+), 20 deletions(-)

diff --git a/builtin/receive-pack.c b/builtin/receive-pack.c
index 1a31a58367..1a8472eddc 100644
--- a/builtin/receive-pack.c
+++ b/builtin/receive-pack.c
@@ -90,7 +90,7 @@ static struct object_id push_cert_oid;
 static struct signature_check sigcheck;
 static const char *push_cert_nonce;
 static const char *cert_nonce_seed;
-static struct string_list hidden_refs = STRING_LIST_INIT_DUP;
+static struct strvec hidden_refs = STRVEC_INIT;
 
 static const char *NONCE_UNSOLICITED = "UNSOLICITED";
 static const char *NONCE_BAD = "BAD";
@@ -2619,7 +2619,7 @@ int cmd_receive_pack(int argc, const char **argv, const char *prefix)
 		packet_flush(1);
 	oid_array_clear(&shallow);
 	oid_array_clear(&ref);
-	string_list_clear(&hidden_refs, 0);
+	strvec_clear(&hidden_refs);
 	free((void *)push_cert_nonce);
 	return 0;
 }
diff --git a/ls-refs.c b/ls-refs.c
index 6f490b2d9c..8c3181d051 100644
--- a/ls-refs.c
+++ b/ls-refs.c
@@ -72,7 +72,7 @@ struct ls_refs_data {
 	unsigned symrefs;
 	struct strvec prefixes;
 	struct strbuf buf;
-	struct string_list hidden_refs;
+	struct strvec hidden_refs;
 	unsigned unborn : 1;
 };
 
@@ -155,7 +155,7 @@ int ls_refs(struct repository *r, struct packet_reader *request)
 	memset(&data, 0, sizeof(data));
 	strvec_init(&data.prefixes);
 	strbuf_init(&data.buf, 0);
-	string_list_init_dup(&data.hidden_refs);
+	strvec_init(&data.hidden_refs);
 
 	git_config(ls_refs_config, &data);
 
@@ -197,7 +197,7 @@ int ls_refs(struct repository *r, struct packet_reader *request)
 	packet_fflush(stdout);
 	strvec_clear(&data.prefixes);
 	strbuf_release(&data.buf);
-	string_list_clear(&data.hidden_refs, 0);
+	strvec_clear(&data.hidden_refs);
 	return 0;
 }
 
diff --git a/refs.c b/refs.c
index 538bde644e..ec4d5b9101 100644
--- a/refs.c
+++ b/refs.c
@@ -1427,7 +1427,7 @@ char *shorten_unambiguous_ref(const char *refname, int strict)
 }
 
 int parse_hide_refs_config(const char *var, const char *value, const char *section,
-			   struct string_list *hide_refs)
+			   struct strvec *hide_refs)
 {
 	const char *key;
 	if (!strcmp("transfer.hiderefs", var) ||
@@ -1438,22 +1438,23 @@ int parse_hide_refs_config(const char *var, const char *value, const char *secti
 
 		if (!value)
 			return config_error_nonbool(var);
-		ref = xstrdup(value);
+
+		/* drop const to remove trailing '/' characters */
+		ref = (char *)strvec_push(hide_refs, value);
 		len = strlen(ref);
 		while (len && ref[len - 1] == '/')
 			ref[--len] = '\0';
-		string_list_append_nodup(hide_refs, ref);
 	}
 	return 0;
 }
 
 int ref_is_hidden(const char *refname, const char *refname_full,
-		  const struct string_list *hide_refs)
+		  const struct strvec *hide_refs)
 {
 	int i;
 
 	for (i = hide_refs->nr - 1; i >= 0; i--) {
-		const char *match = hide_refs->items[i].string;
+		const char *match = hide_refs->v[i];
 		const char *subject;
 		int neg = 0;
 		const char *p;
diff --git a/refs.h b/refs.h
index d672d636cf..a7751a1fc9 100644
--- a/refs.h
+++ b/refs.h
@@ -810,7 +810,7 @@ int update_ref(const char *msg, const char *refname,
 	       unsigned int flags, enum action_on_err onerr);
 
 int parse_hide_refs_config(const char *var, const char *value, const char *,
-			   struct string_list *);
+			   struct strvec *);
 
 /*
  * Check whether a ref is hidden. If no namespace is set, both the first and
@@ -820,7 +820,7 @@ int parse_hide_refs_config(const char *var, const char *value, const char *,
  * the ref is outside that namespace, the first parameter is NULL. The second
  * parameter always points to the full ref name.
  */
-int ref_is_hidden(const char *, const char *, const struct string_list *);
+int ref_is_hidden(const char *, const char *, const struct strvec *);
 
 /* Is this a per-worktree ref living in the refs/ namespace? */
 int is_per_worktree_ref(const char *refname);
diff --git a/revision.c b/revision.c
index 89953592f9..7c9367a266 100644
--- a/revision.c
+++ b/revision.c
@@ -1558,7 +1558,7 @@ void init_ref_exclusions(struct ref_exclusions *exclusions)
 void clear_ref_exclusions(struct ref_exclusions *exclusions)
 {
 	string_list_clear(&exclusions->excluded_refs, 0);
-	string_list_clear(&exclusions->hidden_refs, 0);
+	strvec_clear(&exclusions->hidden_refs);
 	exclusions->hidden_refs_configured = 0;
 }
 
diff --git a/revision.h b/revision.h
index 31828748dc..94f035fa22 100644
--- a/revision.h
+++ b/revision.h
@@ -10,6 +10,7 @@
 #include "decorate.h"
 #include "ident.h"
 #include "list-objects-filter-options.h"
+#include "strvec.h"
 
 /**
  * The revision walking API offers functions to build a list of revisions
@@ -95,7 +96,7 @@ struct ref_exclusions {
 	 * Hidden refs is a list of patterns that is to be hidden via
 	 * `ref_is_hidden()`.
 	 */
-	struct string_list hidden_refs;
+	struct strvec hidden_refs;
 
 	/*
 	 * Indicates whether hidden refs have been configured. This is to
@@ -110,7 +111,7 @@ struct ref_exclusions {
  */
 #define REF_EXCLUSIONS_INIT { \
 	.excluded_refs = STRING_LIST_INIT_DUP, \
-	.hidden_refs = STRING_LIST_INIT_DUP, \
+	.hidden_refs = STRVEC_INIT, \
 }
 
 struct oidset;
diff --git a/upload-pack.c b/upload-pack.c
index 08633dc121..d77d58bdde 100644
--- a/upload-pack.c
+++ b/upload-pack.c
@@ -69,7 +69,7 @@ struct upload_pack_data {
 	struct object_array have_obj;
 	struct oid_array haves;					/* v2 only */
 	struct string_list wanted_refs;				/* v2 only */
-	struct string_list hidden_refs;
+	struct strvec hidden_refs;
 
 	struct object_array shallows;
 	struct string_list deepen_not;
@@ -126,7 +126,7 @@ static void upload_pack_data_init(struct upload_pack_data *data)
 {
 	struct string_list symref = STRING_LIST_INIT_DUP;
 	struct string_list wanted_refs = STRING_LIST_INIT_DUP;
-	struct string_list hidden_refs = STRING_LIST_INIT_DUP;
+	struct strvec hidden_refs = STRVEC_INIT;
 	struct object_array want_obj = OBJECT_ARRAY_INIT;
 	struct object_array have_obj = OBJECT_ARRAY_INIT;
 	struct oid_array haves = OID_ARRAY_INIT;
@@ -161,7 +161,7 @@ static void upload_pack_data_clear(struct upload_pack_data *data)
 {
 	string_list_clear(&data->symref, 1);
 	string_list_clear(&data->wanted_refs, 1);
-	string_list_clear(&data->hidden_refs, 0);
+	strvec_clear(&data->hidden_refs);
 	object_array_clear(&data->want_obj);
 	object_array_clear(&data->have_obj);
 	oid_array_clear(&data->haves);
@@ -1169,7 +1169,7 @@ static void receive_needs(struct upload_pack_data *data,
 
 /* return non-zero if the ref is hidden, otherwise 0 */
 static int mark_our_ref(const char *refname, const char *refname_full,
-			const struct object_id *oid, const struct string_list *hidden_refs)
+			const struct object_id *oid, const struct strvec *hidden_refs)
 {
 	struct object *o = lookup_unknown_object(the_repository, oid);
 
@@ -1453,7 +1453,7 @@ static int parse_want(struct packet_writer *writer, const char *line,
 
 static int parse_want_ref(struct packet_writer *writer, const char *line,
 			  struct string_list *wanted_refs,
-			  struct string_list *hidden_refs,
+			  struct strvec *hidden_refs,
 			  struct object_array *want_obj)
 {
 	const char *refname_nons;
-- 
2.40.1.572.g5c4ab523ef



* [PATCH v2 12/16] refs/packed-backend.c: ignore complicated hidden refs rules
  2023-05-15 19:23 ` [PATCH v2 00/16] refs: implement jump lists for packed backend Taylor Blau
                     ` (10 preceding siblings ...)
  2023-05-15 19:23   ` [PATCH v2 11/16] revision.h: store hidden refs in a `strvec` Taylor Blau
@ 2023-05-15 19:23   ` Taylor Blau
  2023-05-15 19:23   ` [PATCH v2 13/16] refs.h: let `for_each_namespaced_ref()` take excluded patterns Taylor Blau
                     ` (4 subsequent siblings)
  16 siblings, 0 replies; 149+ messages in thread
From: Taylor Blau @ 2023-05-15 19:23 UTC (permalink / raw)
  To: git
  Cc: Chris Torek, Derrick Stolee, Jeff King, Junio C Hamano,
	Patrick Steinhardt

In subsequent commits, we'll teach `receive-pack` and `upload-pack` to
use the new skip-list feature in the packed-refs iterator by skipping
references mentioned in their respective hideRefs lists.

However, the packed-ref skip lists cannot handle un-hiding rules (which
begin with '!') or namespace comparisons (which begin with '^'). Detect
these cases and fall back to the normal enumeration, without a skip
list, whenever such patterns exist.
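The fallback is an all-or-nothing check over the pattern list, layered on top of the per-pattern glob check in `populate_excluded_jump_list()`. The sketch below assumes a hypothetical `jump_list_eligible()` helper and treats `*`, `?`, `[` and `\` as glob meta-characters, which approximates git's `is_glob_special()`. Any '!' or '^' rule disables the jump list entirely, while a glob pattern only disqualifies itself.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Approximation of git's glob meta-characters (assumption). */
static int has_glob_special(const char *s)
{
	return strpbrk(s, "*?[\\") != NULL;
}

/*
 * Return the number of patterns usable as jump-list prefixes, or 0 if
 * any rule un-hides ('!') or is namespace-relative ('^'), in which
 * case no jump list should be built at all.
 */
static size_t jump_list_eligible(const char **patterns)
{
	const char **p;
	size_t usable = 0;

	if (!patterns)
		return 0;
	for (p = patterns; *p; p++)
		if (**p == '!' || **p == '^')
			return 0;
	for (p = patterns; *p; p++)
		if (!has_glob_special(*p))
			usable++;
	return usable;
}
```

For the rules "refs/heads/foo" and "!refs/heads/foo/1" used by the new test, the helper returns 0, so no jumps are made.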

Signed-off-by: Taylor Blau <me@ttaylorr.com>
---
 refs/packed-backend.c   | 19 +++++++++++++++++++
 t/t1419-exclude-refs.sh |  9 +++++++++
 2 files changed, 28 insertions(+)

diff --git a/refs/packed-backend.c b/refs/packed-backend.c
index 7ba9fa2bb8..9ea6c07866 100644
--- a/refs/packed-backend.c
+++ b/refs/packed-backend.c
@@ -1015,6 +1015,25 @@ static void populate_excluded_jump_list(struct packed_ref_iterator *iter,
 	if (!excluded_patterns)
 		return;
 
+	for (pattern = excluded_patterns; *pattern; pattern++) {
+		/*
+		 * We also can't feed any excludes from hidden refs
+		 * config sections, since later rules may override
+		 * previous ones. For example, with rules "refs/foo" and
+		 * "!refs/foo/bar", we should show "refs/foo/bar" (and
+		 * everything underneath it), but the earlier exclusion
+		 * would cause us to skip all of "refs/foo". We likewise
+		 * don't implement the namespace stripping required for
+		 * '^' rules.
+		 *
+		 * Both are possible to do, but complicated, so avoid
+		 * populating the jump list at all if we see either of
+		 * these patterns.
+		 */
+		if (**pattern == '!' || **pattern == '^')
+			return;
+	}
+
 	for (pattern = excluded_patterns; *pattern; pattern++) {
 		struct jump_list_entry *e;
 
diff --git a/t/t1419-exclude-refs.sh b/t/t1419-exclude-refs.sh
index 350a7d2587..0e91e2f399 100755
--- a/t/t1419-exclude-refs.sh
+++ b/t/t1419-exclude-refs.sh
@@ -119,4 +119,13 @@ test_expect_success 'meta-characters are discarded' '
 	assert_no_jumps
 '
 
+test_expect_success 'complex hidden ref rules are discarded' '
+	for_each_ref__exclude refs/heads refs/heads/foo "!refs/heads/foo/1" \
+		>actual 2>perf &&
+	for_each_ref >expect &&
+
+	test_cmp expect actual &&
+	assert_no_jumps perf
+'
+
 test_done
-- 
2.40.1.572.g5c4ab523ef



* [PATCH v2 13/16] refs.h: let `for_each_namespaced_ref()` take excluded patterns
  2023-05-15 19:23 ` [PATCH v2 00/16] refs: implement jump lists for packed backend Taylor Blau
                     ` (11 preceding siblings ...)
  2023-05-15 19:23   ` [PATCH v2 12/16] refs/packed-backend.c: ignore complicated hidden refs rules Taylor Blau
@ 2023-05-15 19:23   ` Taylor Blau
  2023-06-06  7:01     ` Patrick Steinhardt
  2023-05-15 19:23   ` [PATCH v2 14/16] builtin/receive-pack.c: avoid enumerating hidden references Taylor Blau
                     ` (3 subsequent siblings)
  16 siblings, 1 reply; 149+ messages in thread
From: Taylor Blau @ 2023-05-15 19:23 UTC (permalink / raw)
  To: git
  Cc: Chris Torek, Derrick Stolee, Jeff King, Junio C Hamano,
	Patrick Steinhardt

The following commit will want to call `for_each_namespaced_ref()` with
a list of excluded patterns.

We could introduce a variant of that function, say,
`for_each_namespaced_ref_exclude()` which takes the extra parameter, and
reimplement the original function in terms of that. But all but one
caller (in `http-backend.c`) will supply the new parameter, so add the
new parameter to `for_each_namespaced_ref()` itself instead of
introducing a new function.

For now, supply NULL for the list of excluded patterns at all callers to
avoid changing behavior, which we will do in the subsequent commit.
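The convention relied on here is that a NULL pattern list excludes nothing, which is why passing NULL at every caller preserves behavior. Below is a minimal sketch over an in-memory list. It assumes a hypothetical `for_each_ref_in_list()` in place of `do_for_each_ref()` and uses plain string-prefix matching for exclusions.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

typedef int (*each_name_fn)(const char *refname, void *cb_data);

static int has_prefix(const char *s, const char *prefix)
{
	return !strncmp(s, prefix, strlen(prefix));
}

/* A NULL pattern list excludes nothing: the pre-patch behavior. */
static int excluded_by(const char *refname, const char **exclude_patterns)
{
	const char **p;

	if (!exclude_patterns)
		return 0;
	for (p = exclude_patterns; *p; p++)
		if (has_prefix(refname, *p))
			return 1;
	return 0;
}

/* Hypothetical stand-in for do_for_each_ref() over an in-memory list. */
static int for_each_ref_in_list(const char **refs, size_t nr,
				const char *prefix,
				const char **exclude_patterns,
				each_name_fn fn, void *cb_data)
{
	size_t i;

	for (i = 0; i < nr; i++) {
		int ret;

		if (!has_prefix(refs[i], prefix) ||
		    excluded_by(refs[i], exclude_patterns))
			continue;
		ret = fn(refs[i], cb_data);
		if (ret)
			return ret; /* a non-zero callback result stops iteration */
	}
	return 0;
}

/* Example callback: count the refs visited. */
static int count_ref(const char *refname, void *cb_data)
{
	(void)refname;
	++*(size_t *)cb_data;
	return 0;
}
```

A linear filter like this only hides excluded refs from the callback. The packed backend's jump list goes further and avoids reading those records at all.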

Signed-off-by: Taylor Blau <me@ttaylorr.com>
---
 http-backend.c | 2 +-
 refs.c         | 5 +++--
 refs.h         | 3 ++-
 upload-pack.c  | 6 +++---
 4 files changed, 9 insertions(+), 7 deletions(-)

diff --git a/http-backend.c b/http-backend.c
index ac146d85c5..ad500683c8 100644
--- a/http-backend.c
+++ b/http-backend.c
@@ -559,7 +559,7 @@ static void get_info_refs(struct strbuf *hdr, char *arg UNUSED)
 
 	} else {
 		select_getanyfile(hdr);
-		for_each_namespaced_ref(show_text_ref, &buf);
+		for_each_namespaced_ref(NULL, show_text_ref, &buf);
 		send_strbuf(hdr, "text/plain", &buf);
 	}
 	strbuf_release(&buf);
diff --git a/refs.c b/refs.c
index ec4d5b9101..95a7db9563 100644
--- a/refs.c
+++ b/refs.c
@@ -1660,13 +1660,14 @@ int for_each_replace_ref(struct repository *r, each_repo_ref_fn fn, void *cb_dat
 				    DO_FOR_EACH_INCLUDE_BROKEN, cb_data);
 }
 
-int for_each_namespaced_ref(each_ref_fn fn, void *cb_data)
+int for_each_namespaced_ref(const char **exclude_patterns,
+			    each_ref_fn fn, void *cb_data)
 {
 	struct strbuf buf = STRBUF_INIT;
 	int ret;
 	strbuf_addf(&buf, "%srefs/", get_git_namespace());
 	ret = do_for_each_ref(get_main_ref_store(the_repository),
-			      buf.buf, NULL, fn, 0, 0, cb_data);
+			      buf.buf, exclude_patterns, fn, 0, 0, cb_data);
 	strbuf_release(&buf);
 	return ret;
 }
diff --git a/refs.h b/refs.h
index a7751a1fc9..f23626beca 100644
--- a/refs.h
+++ b/refs.h
@@ -372,7 +372,8 @@ int for_each_glob_ref_in(each_ref_fn fn, const char *pattern,
 			 const char *prefix, void *cb_data);
 
 int head_ref_namespaced(each_ref_fn fn, void *cb_data);
-int for_each_namespaced_ref(each_ref_fn fn, void *cb_data);
+int for_each_namespaced_ref(const char **exclude_patterns,
+			    each_ref_fn fn, void *cb_data);
 
 /* can be used to learn about broken ref and symref */
 int refs_for_each_rawref(struct ref_store *refs, each_ref_fn fn, void *cb_data);
diff --git a/upload-pack.c b/upload-pack.c
index d77d58bdde..7c646ea5bd 100644
--- a/upload-pack.c
+++ b/upload-pack.c
@@ -854,7 +854,7 @@ static void deepen(struct upload_pack_data *data, int depth)
 		 * marked with OUR_REF.
 		 */
 		head_ref_namespaced(check_ref, data);
-		for_each_namespaced_ref(check_ref, data);
+		for_each_namespaced_ref(NULL, check_ref, data);
 
 		get_reachable_list(data, &reachable_shallows);
 		result = get_shallow_commits(&reachable_shallows,
@@ -1378,7 +1378,7 @@ void upload_pack(const int advertise_refs, const int stateless_rpc,
 		if (advertise_refs)
 			data.no_done = 1;
 		head_ref_namespaced(send_ref, &data);
-		for_each_namespaced_ref(send_ref, &data);
+		for_each_namespaced_ref(NULL, send_ref, &data);
 		/*
 		 * fflush stdout before calling advertise_shallow_grafts because send_ref
 		 * uses stdio.
@@ -1388,7 +1388,7 @@ void upload_pack(const int advertise_refs, const int stateless_rpc,
 		packet_flush(1);
 	} else {
 		head_ref_namespaced(check_ref, &data);
-		for_each_namespaced_ref(check_ref, &data);
+		for_each_namespaced_ref(NULL, check_ref, &data);
 	}
 
 	if (!advertise_refs) {
-- 
2.40.1.572.g5c4ab523ef



* [PATCH v2 14/16] builtin/receive-pack.c: avoid enumerating hidden references
  2023-05-15 19:23 ` [PATCH v2 00/16] refs: implement jump lists for packed backend Taylor Blau
                     ` (12 preceding siblings ...)
  2023-05-15 19:23   ` [PATCH v2 13/16] refs.h: let `for_each_namespaced_ref()` take excluded patterns Taylor Blau
@ 2023-05-15 19:23   ` Taylor Blau
  2023-05-15 19:23   ` [PATCH v2 15/16] upload-pack.c: avoid enumerating hidden refs where possible Taylor Blau
                     ` (2 subsequent siblings)
  16 siblings, 0 replies; 149+ messages in thread
From: Taylor Blau @ 2023-05-15 19:23 UTC (permalink / raw)
  To: git
  Cc: Chris Torek, Derrick Stolee, Jeff King, Junio C Hamano,
	Patrick Steinhardt

Now that `refs_for_each_fullref_in()` has the ability to avoid
enumerating references matching certain pattern(s), use that to avoid
visiting hidden refs when constructing the ref advertisement via
receive-pack.

Note that since this exclusion is best-effort, we still need
`show_ref_cb()` to check whether each reference is hidden before
including it in the advertisement.
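The callback check stays authoritative because hideRefs rules are evaluated last-match-wins, which a forward jump over a prefix cannot express. The sketch below follows the spirit of `ref_is_hidden()`. It assumes component-wise prefix matching and ignores both the '^' form and the refname/refname_full distinction. `ref_hidden_sketch()` is hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Does 'refname' equal 'prefix' or live underneath it? */
static int matches_component_prefix(const char *refname, const char *prefix)
{
	size_t len = strlen(prefix);

	return !strncmp(refname, prefix, len) &&
	       (refname[len] == '\0' || refname[len] == '/');
}

/*
 * Evaluate rules from last to first; the first rule that matches
 * decides. A leading '!' turns a match into "not hidden".
 */
static int ref_hidden_sketch(const char *refname,
			     const char **rules, size_t nr)
{
	size_t i = nr;

	while (i--) {
		const char *rule = rules[i];
		int neg = 0;

		if (*rule == '!') {
			neg = 1;
			rule++;
		}
		if (matches_component_prefix(refname, rule))
			return !neg;
	}
	return 0;
}
```

With the rules "refs/foo" and "!refs/foo/bar", refs/foo/baz is hidden but refs/foo/bar (and anything below it) is not. This is exactly the case where jumping over all of refs/foo would be wrong.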

As with the same optimization for `upload-pack`, `receive-pack`'s
reference advertisement phase can proceed much more quickly by not
enumerating references that will not be part of the advertisement.

(Below, we're using linux.git with one hidden refs/pull/N ref per
commit):

    $ hyperfine -L v ,.compile 'git{v} -c transfer.hideRefs=refs/pull receive-pack --advertise-refs .git'
    Benchmark 1: git -c transfer.hideRefs=refs/pull receive-pack --advertise-refs .git
      Time (mean ± σ):      89.1 ms ±   1.7 ms    [User: 82.0 ms, System: 7.0 ms]
      Range (min … max):    87.7 ms …  95.5 ms    31 runs

    Benchmark 2: git.compile -c transfer.hideRefs=refs/pull receive-pack --advertise-refs .git
      Time (mean ± σ):       4.5 ms ±   0.2 ms    [User: 0.5 ms, System: 3.9 ms]
      Range (min … max):     4.1 ms …   5.6 ms    508 runs

    Summary
      'git.compile -c transfer.hideRefs=refs/pull receive-pack --advertise-refs .git' ran
       20.00 ± 1.05 times faster than 'git -c transfer.hideRefs=refs/pull receive-pack --advertise-refs .git'

Signed-off-by: Taylor Blau <me@ttaylorr.com>
---
 builtin/receive-pack.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/builtin/receive-pack.c b/builtin/receive-pack.c
index 1a8472eddc..bd5bcc375f 100644
--- a/builtin/receive-pack.c
+++ b/builtin/receive-pack.c
@@ -337,7 +337,8 @@ static void write_head_info(void)
 {
 	static struct oidset seen = OIDSET_INIT;
 
-	for_each_ref(show_ref_cb, &seen);
+	refs_for_each_fullref_in(get_main_ref_store(the_repository), "",
+				 hidden_refs.v, show_ref_cb, &seen);
 	for_each_alternate_ref(show_one_alternate_ref, &seen);
 	oidset_clear(&seen);
 	if (!sent_capabilities)
-- 
2.40.1.572.g5c4ab523ef


* [PATCH v2 15/16] upload-pack.c: avoid enumerating hidden refs where possible
  2023-05-15 19:23 ` [PATCH v2 00/16] refs: implement jump lists for packed backend Taylor Blau
                     ` (13 preceding siblings ...)
  2023-05-15 19:23   ` [PATCH v2 14/16] builtin/receive-pack.c: avoid enumerating hidden references Taylor Blau
@ 2023-05-15 19:23   ` Taylor Blau
  2023-05-15 19:23   ` [PATCH v2 16/16] ls-refs.c: " Taylor Blau
  2023-06-06  7:01   ` [PATCH v2 00/16] refs: implement jump lists for packed backend Patrick Steinhardt
  16 siblings, 0 replies; 149+ messages in thread
From: Taylor Blau @ 2023-05-15 19:23 UTC (permalink / raw)
  To: git
  Cc: Chris Torek, Derrick Stolee, Jeff King, Junio C Hamano,
	Patrick Steinhardt

In a similar fashion to a previous commit, teach `upload-pack` to avoid
enumerating hidden references where possible.

Note, however, that there are certain cases where we cannot avoid
enumerating even hidden references, in particular when either of:

  - `uploadpack.allowTipSHA1InWant`, or
  - `uploadpack.allowReachableSHA1InWant`

are set, corresponding to `ALLOW_TIP_SHA1` and `ALLOW_REACHABLE_SHA1`,
respectively.

When either of these bits is set, upload-pack's `is_our_ref()` function
needs to consider the `HIDDEN_REF` bit of the referent's object flags.
So we must visit all references, including the hidden ones, in order to
mark their referents with the `HIDDEN_REF` bit.

When neither `ALLOW_TIP_SHA1` nor `ALLOW_REACHABLE_SHA1` is set, the
`is_our_ref()` function considers only the `OUR_REF` bit, and not the
`HIDDEN_REF` one. `OUR_REF` is applied via `mark_our_ref()`, and only
to objects at the tips of non-hidden references, so we do not need to
visit hidden references in this case.
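
The decision above comes down to a bitmask test. Here is a minimal
standalone sketch of it. It is modeled on the `allow_hidden_refs()` and
`is_our_ref()` helpers in the diff below, but it takes the object's
flags directly. The numeric flag values are assumptions for illustration
and do not match git's definitions.

```c
/*
 * Illustrative values only; git defines its own constants for these
 * flags and for the allow_uor bits.
 */
enum {
	ALLOW_TIP_SHA1 = 01,
	ALLOW_REACHABLE_SHA1 = 02,
};

#define OUR_REF    (1u << 0)
#define HIDDEN_REF (1u << 1)

/* Do the config bits make objects behind hidden refs fetchable? */
static int allow_hidden_refs(unsigned allow_uor)
{
	return (allow_uor & (ALLOW_TIP_SHA1 | ALLOW_REACHABLE_SHA1)) != 0;
}

/*
 * An object counts as "ours" if it carries OUR_REF or, when hidden
 * refs are allowed, HIDDEN_REF. When they are not allowed, HIDDEN_REF
 * is never consulted, so enumerating hidden refs just to set it would
 * be wasted work.
 */
static int is_our_ref(unsigned obj_flags, unsigned allow_uor)
{
	unsigned wanted = OUR_REF;

	if (allow_hidden_refs(allow_uor))
		wanted |= HIDDEN_REF;
	return (obj_flags & wanted) != 0;
}
```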

When neither of those bits is set, `upload-pack` can potentially avoid
enumerating a large number of references. In the same example as a
previous commit (linux.git with one hidden reference per commit,
"refs/pull/N"):

    $ printf 0000 >in
    $ hyperfine --warmup=1 \
      'git -c transfer.hideRefs=refs/pull upload-pack . <in' \
      'git.compile -c transfer.hideRefs=refs/pull -c uploadpack.allowTipSHA1InWant upload-pack . <in' \
      'git.compile -c transfer.hideRefs=refs/pull upload-pack . <in'
    Benchmark 1: git -c transfer.hideRefs=refs/pull upload-pack . <in
      Time (mean ± σ):     406.9 ms ±   1.1 ms    [User: 357.3 ms, System: 49.5 ms]
      Range (min … max):   405.7 ms … 409.2 ms    10 runs

    Benchmark 2: git.compile -c transfer.hideRefs=refs/pull -c uploadpack.allowTipSHA1InWant upload-pack . <in
      Time (mean ± σ):     406.5 ms ±   1.3 ms    [User: 356.5 ms, System: 49.9 ms]
      Range (min … max):   404.6 ms … 408.8 ms    10 runs

    Benchmark 3: git.compile -c transfer.hideRefs=refs/pull upload-pack . <in
      Time (mean ± σ):       4.7 ms ±   0.2 ms    [User: 0.7 ms, System: 3.9 ms]
      Range (min … max):     4.3 ms …   6.1 ms    472 runs

    Summary
      'git.compile -c transfer.hideRefs=refs/pull upload-pack . <in' ran
       86.62 ± 4.33 times faster than 'git.compile -c transfer.hideRefs=refs/pull -c uploadpack.allowTipSHA1InWant upload-pack . <in'
       86.70 ± 4.33 times faster than 'git -c transfer.hideRefs=refs/pull upload-pack . <in'

As above, we must visit every reference when
`uploadpack.allowTipSHA1InWant` is set. But when it is unset, we can
visit far fewer references.

Signed-off-by: Taylor Blau <me@ttaylorr.com>
---
 upload-pack.c | 33 +++++++++++++++++++++++++++------
 1 file changed, 27 insertions(+), 6 deletions(-)

diff --git a/upload-pack.c b/upload-pack.c
index 7c646ea5bd..6fa667bf25 100644
--- a/upload-pack.c
+++ b/upload-pack.c
@@ -601,11 +601,32 @@ static int get_common_commits(struct upload_pack_data *data,
 	}
 }
 
+static int allow_hidden_refs(enum allow_uor allow_uor)
+{
+	return allow_uor & (ALLOW_TIP_SHA1 | ALLOW_REACHABLE_SHA1);
+}
+
+static void for_each_namespaced_ref_1(each_ref_fn fn,
+				      struct upload_pack_data *data)
+{
+	/*
+	 * If `data->allow_uor` allows fetching hidden refs, we need to
+	 * mark all references (including hidden ones), to check in
+	 * `is_our_ref()` below.
+	 *
+	 * Otherwise, we only care about whether each reference's object
+	 * has the OUR_REF bit set or not, so do not need to visit
+	 * hidden references.
+	 */
+	if (allow_hidden_refs(data->allow_uor))
+		for_each_namespaced_ref(NULL, fn, data);
+	else
+		for_each_namespaced_ref(data->hidden_refs.v, fn, data);
+}
+
 static int is_our_ref(struct object *o, enum allow_uor allow_uor)
 {
-	int allow_hidden_ref = (allow_uor &
-				(ALLOW_TIP_SHA1 | ALLOW_REACHABLE_SHA1));
-	return o->flags & ((allow_hidden_ref ? HIDDEN_REF : 0) | OUR_REF);
+	return o->flags & ((allow_hidden_refs(allow_uor) ? HIDDEN_REF : 0) | OUR_REF);
 }
 
 /*
@@ -854,7 +875,7 @@ static void deepen(struct upload_pack_data *data, int depth)
 		 * marked with OUR_REF.
 		 */
 		head_ref_namespaced(check_ref, data);
-		for_each_namespaced_ref(NULL, check_ref, data);
+		for_each_namespaced_ref_1(check_ref, data);
 
 		get_reachable_list(data, &reachable_shallows);
 		result = get_shallow_commits(&reachable_shallows,
@@ -1378,7 +1399,7 @@ void upload_pack(const int advertise_refs, const int stateless_rpc,
 		if (advertise_refs)
 			data.no_done = 1;
 		head_ref_namespaced(send_ref, &data);
-		for_each_namespaced_ref(NULL, send_ref, &data);
+		for_each_namespaced_ref_1(send_ref, &data);
 		/*
 		 * fflush stdout before calling advertise_shallow_grafts because send_ref
 		 * uses stdio.
@@ -1388,7 +1409,7 @@ void upload_pack(const int advertise_refs, const int stateless_rpc,
 		packet_flush(1);
 	} else {
 		head_ref_namespaced(check_ref, &data);
-		for_each_namespaced_ref(NULL, check_ref, &data);
+		for_each_namespaced_ref_1(check_ref, &data);
 	}
 
 	if (!advertise_refs) {
-- 
2.40.1.572.g5c4ab523ef


* [PATCH v2 16/16] ls-refs.c: avoid enumerating hidden refs where possible
  2023-05-15 19:23 ` [PATCH v2 00/16] refs: implement jump lists for packed backend Taylor Blau
                     ` (14 preceding siblings ...)
  2023-05-15 19:23   ` [PATCH v2 15/16] upload-pack.c: avoid enumerating hidden refs where possible Taylor Blau
@ 2023-05-15 19:23   ` Taylor Blau
  2023-06-06  7:01   ` [PATCH v2 00/16] refs: implement jump lists for packed backend Patrick Steinhardt
  16 siblings, 0 replies; 149+ messages in thread
From: Taylor Blau @ 2023-05-15 19:23 UTC (permalink / raw)
  To: git
  Cc: Chris Torek, Derrick Stolee, Jeff King, Junio C Hamano,
	Patrick Steinhardt

In a similar fashion to previous commits, teach `ls-refs` to avoid
enumerating hidden references where possible.

As before, this is linux.git with one hidden reference per commit.

    $ hyperfine -L v ,.compile 'git{v} -c protocol.version=2 ls-remote .'
    Benchmark 1: git -c protocol.version=2 ls-remote .
      Time (mean ± σ):      89.8 ms ±   0.6 ms    [User: 84.3 ms, System: 5.7 ms]
      Range (min … max):    88.8 ms …  91.3 ms    32 runs

    Benchmark 2: git.compile -c protocol.version=2 ls-remote .
      Time (mean ± σ):       6.5 ms ±   0.1 ms    [User: 2.4 ms, System: 4.3 ms]
      Range (min … max):     6.2 ms …   8.3 ms    397 runs

    Summary
      'git.compile -c protocol.version=2 ls-remote .' ran
       13.85 ± 0.33 times faster than 'git -c protocol.version=2 ls-remote .'

Signed-off-by: Taylor Blau <me@ttaylorr.com>
---
 ls-refs.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/ls-refs.c b/ls-refs.c
index 8c3181d051..c9a723ba89 100644
--- a/ls-refs.c
+++ b/ls-refs.c
@@ -193,7 +193,7 @@ int ls_refs(struct repository *r, struct packet_reader *request)
 		strvec_push(&data.prefixes, "");
 	refs_for_each_fullref_in_prefixes(get_main_ref_store(r),
 					  get_git_namespace(), data.prefixes.v,
-					  NULL, send_ref, &data);
+					  data.hidden_refs.v, send_ref, &data);
 	packet_fflush(stdout);
 	strvec_clear(&data.prefixes);
 	strbuf_release(&data.buf);
-- 
2.40.1.572.g5c4ab523ef

* Re: [PATCH v2 09/16] refs/packed-backend.c: implement jump lists to avoid excluded pattern(s)
  2023-05-15 19:23   ` [PATCH v2 09/16] refs/packed-backend.c: implement jump lists to avoid excluded pattern(s) Taylor Blau
@ 2023-06-06  7:00     ` Patrick Steinhardt
  2023-06-20 12:15       ` Taylor Blau
  0 siblings, 1 reply; 149+ messages in thread
From: Patrick Steinhardt @ 2023-06-06  7:00 UTC (permalink / raw)
  To: Taylor Blau; +Cc: git, Chris Torek, Derrick Stolee, Jeff King, Junio C Hamano

On Mon, May 15, 2023 at 03:23:33PM -0400, Taylor Blau wrote:
> When iterating through the `packed-refs` file in order to answer a query
> like:
> 
>     $ git for-each-ref --exclude=refs/__hidden__
> 
> it would be useful to avoid walking over all of the entries in
> `refs/__hidden__/*` when possible, since we know that the ref-filter
> code is going to throw them away anyways.
> 
> In certain circumstances, doing so is possible. The algorithm for doing
> so is as follows:
> 
>   - For each excluded pattern, find the first record that matches it,
>     and the first record that *doesn't* match it (i.e. the location
>     you'd next want to consider when excluding that pattern).
> 
>   - Sort the set of excluded regions from the previous step in ascending
>     order of the first location within the `packed-refs` file that
>     matches.
> 
>   - Clean up the results from the previous step: discard empty regions,
>     and combine adjacent regions.
> 
> Then when iterating through the `packed-refs` file, if `iter->pos` is
> ever contained in one of the regions from the previous steps, advance
> `iter->pos` past the end of that region, and continue enumeration.
> 
> Note that we only perform this optimization when none of the excluded
> pattern(s) have special meta-characters in them. For a pattern like
> "refs/foo[ac]", the excluded regions ("refs/fooa", "refs/fooc", and
> everything underneath them) are not connected. A future implementation
> that handles this case may split the character class (pretending as if
> two patterns were excluded: "refs/fooa", and "refs/fooc").
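
The first step can be illustrated outside of git with a minimal sketch.
It assumes a sorted array of refnames rather than the raw `packed-refs`
buffer. It also assumes a particular set of glob meta-characters (`*`,
`?`, `[`, `\`). `find_regions()`, `search()`, and `cmp_prefix()` are
hypothetical names. Because the comparison never reports equality, a
single lower-bound binary search lands on one of two places. With
`start` set, it finds the first ref with the prefix. With `start` unset,
it finds the first ref past all of them.

```c
#include <stddef.h>
#include <string.h>

/* A half-open range [start, end) of indices into a sorted refname array. */
struct region {
	size_t start, end;
};

/*
 * Compare `rec` with `prefix`, never reporting equality: a record that
 * begins with `prefix` sorts after it when `start` is set (so the
 * search finds the first match) and before it otherwise (so the search
 * finds the first record past every match).
 */
static int cmp_prefix(const char *rec, const char *prefix, int start)
{
	int cmp = strncmp(rec, prefix, strlen(prefix));

	if (cmp)
		return cmp;
	return start ? 1 : -1;
}

/* Lower-bound search: index of the first record comparing "after". */
static size_t search(const char **refs, size_t nr, const char *prefix,
		     int start)
{
	size_t lo = 0, hi = nr;

	while (lo < hi) {
		size_t mid = lo + (hi - lo) / 2;

		if (cmp_prefix(refs[mid], prefix, start) < 0)
			lo = mid + 1;
		else
			hi = mid;
	}
	return lo;
}

/* Assumed glob meta-characters; such patterns are skipped entirely. */
static int has_glob_special(const char *s)
{
	return strpbrk(s, "*?[\\") != NULL;
}

/*
 * Write one region per usable pattern into `out` (room for one entry
 * per pattern) and return how many were written. A region with
 * start == end means the pattern matched nothing.
 */
static size_t find_regions(const char **refs, size_t nr,
			   const char **patterns, struct region *out)
{
	size_t n = 0;

	for (const char **p = patterns; *p; p++) {
		if (has_glob_special(*p))
			continue;
		out[n].start = search(refs, nr, *p, 1);
		out[n].end = search(refs, nr, *p, 0);
		n++;
	}
	return n;
}
```

Take the refs bar/1, bar/2, baz/1, foo/1, and quux/1 under refs/heads/.
The pattern "refs/heads/ba" yields [0, 3), "refs/heads/foo" yields
[3, 4), and "refs/heads/ba*" is skipped.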
> 
> There are a few other gotchas worth considering. First, note that the
> jump list is sorted, so once we jump past a region, we can avoid
> considering it (or any regions preceding it) again. The member
> `jump_pos` is used to track the first next-possible region to jump
> through.
> 
> Second, note that the exclusion list is best-effort, since we do not
> handle loose references, and because of the meta-character issue above.
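
The iteration side can be sketched the same way. This minimal version
uses array indices instead of `iter->pos` pointers. `struct region`,
`next_visible()`, and `collect_visible()` are hypothetical names. The
loop in `next_record()` breaks after a jump, but this sketch keeps
checking. That lets it also handle sorted regions that were left
adjacent or overlapping. On a fully merged list the two behave the same.

```c
#include <stddef.h>

/* A half-open range [start, end) of record indices to skip. */
struct region {
	size_t start, end;
};

/*
 * Return the first index >= `pos` that lies outside every region.
 * `jump` must be sorted by start. `*jump_pos` only moves forward, so
 * once a region has been passed it is never considered again.
 */
static size_t next_visible(size_t pos, const struct region *jump,
			   size_t jump_nr, size_t *jump_pos)
{
	while (*jump_pos < jump_nr) {
		const struct region *curr = &jump[*jump_pos];

		if (pos < curr->start)
			break;			/* not at the next region yet */
		(*jump_pos)++;			/* region is behind us either way */
		if (pos < curr->end)
			pos = curr->end;	/* inside it: jump past the end */
	}
	return pos;
}

/* Store every visible index in [0, nr) into `out`; return the count. */
static size_t collect_visible(size_t nr, const struct region *jump,
			      size_t jump_nr, size_t *out)
{
	size_t jump_pos = 0, n = 0;

	for (size_t pos = next_visible(0, jump, jump_nr, &jump_pos); pos < nr;
	     pos = next_visible(pos + 1, jump, jump_nr, &jump_pos))
		out[n++] = pos;
	return n;
}
```

With regions [1, 3) and [4, 5) over seven records, the visible indices
are 0, 3, 5, and 6.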
> 
> In repositories with a large number of hidden references, the speed-up
> can be significant. Tests here are done with a copy of linux.git with a
> reference "refs/pull/N" pointing at every commit, as in:
> 
>     $ git rev-list HEAD | awk '{ print "create refs/pull/" NR " " $0 }' |
>         git update-ref --stdin
>     $ git pack-refs --all
> 
> , it is significantly faster to have `for-each-ref` jump over the
> excluded references, as opposed to filtering them out after the fact:
> 
>     $ hyperfine \
>       'git for-each-ref --format="%(objectname) %(refname)" | grep -vE "^[0-9a-f]{40} refs/pull/"' \
>       'git.compile for-each-ref --format="%(objectname) %(refname)" --exclude="refs/pull"'
>     Benchmark 1: git for-each-ref --format="%(objectname) %(refname)" | grep -vE "^[0-9a-f]{40} refs/pull/"
>       Time (mean ± σ):     802.7 ms ±   2.1 ms    [User: 691.6 ms, System: 147.0 ms]
>       Range (min … max):   800.0 ms … 807.7 ms    10 runs
> 
>     Benchmark 2: git.compile for-each-ref --format="%(objectname) %(refname)" --exclude="refs/pull"
>       Time (mean ± σ):       4.7 ms ±   0.3 ms    [User: 0.7 ms, System: 4.0 ms]
>       Range (min … max):     4.3 ms …   6.7 ms    422 runs
> 
>     Summary
>       'git.compile for-each-ref --format="%(objectname) %(refname)" --exclude="refs/pull"' ran
>       172.03 ± 9.60 times faster than 'git for-each-ref --format="%(objectname) %(refname)" | grep -vE "^[0-9a-f]{40} refs/pull/"'
> 
> Using the jump list is fairly straightforward (see the changes to
> `refs/packed-backend.c::next_record()`), but constructing the list is
> not. To ensure that the construction is correct, add a new suite of
> tests in t1419 covering various corner cases (overlapping regions,
> partially overlapping regions, adjacent regions, etc.).
> 
> Co-authored-by: Jeff King <peff@peff.net>
> Signed-off-by: Jeff King <peff@peff.net>
> Signed-off-by: Taylor Blau <me@ttaylorr.com>
> ---
>  ref-filter.c              |   5 +-
>  refs/packed-backend.c     | 166 ++++++++++++++++++++++++++++++++++++--
>  t/helper/test-ref-store.c |  10 +++
>  t/t1419-exclude-refs.sh   | 101 +++++++++++++++++++++++
>  4 files changed, 274 insertions(+), 8 deletions(-)
>  create mode 100755 t/t1419-exclude-refs.sh
> 
> diff --git a/ref-filter.c b/ref-filter.c
> index 717c3c4bcf..ddc7f5204f 100644
> --- a/ref-filter.c
> +++ b/ref-filter.c
> @@ -2210,12 +2210,13 @@ static int for_each_fullref_in_pattern(struct ref_filter *filter,
>  	if (!filter->name_patterns[0]) {
>  		/* no patterns; we have to look at everything */
>  		return refs_for_each_fullref_in(get_main_ref_store(the_repository),
> -						 "", NULL, cb, cb_data);
> +						 "", filter->exclude.v, cb, cb_data);
>  	}
>  
>  	return refs_for_each_fullref_in_prefixes(get_main_ref_store(the_repository),
>  						 NULL, filter->name_patterns,
> -						 NULL, cb, cb_data);
> +						 filter->exclude.v,
> +						 cb, cb_data);
>  }
>  
>  /*
> diff --git a/refs/packed-backend.c b/refs/packed-backend.c
> index 33639f73e1..67327e579c 100644
> --- a/refs/packed-backend.c
> +++ b/refs/packed-backend.c
> @@ -303,7 +303,8 @@ static int cmp_packed_ref_records(const void *v1, const void *v2)
>   * Compare a snapshot record at `rec` to the specified NUL-terminated
>   * refname.
>   */
> -static int cmp_record_to_refname(const char *rec, const char *refname)
> +static int cmp_record_to_refname(const char *rec, const char *refname,
> +				 int start)
>  {
>  	const char *r1 = rec + the_hash_algo->hexsz + 1;
>  	const char *r2 = refname;
> @@ -312,7 +313,7 @@ static int cmp_record_to_refname(const char *rec, const char *refname)
>  		if (*r1 == '\n')
>  			return *r2 ? -1 : 0;
>  		if (!*r2)
> -			return 1;
> +			return start ? 1 : -1;
>  		if (*r1 != *r2)
>  			return (unsigned char)*r1 < (unsigned char)*r2 ? -1 : +1;
>  		r1++;
> @@ -528,7 +529,8 @@ static int load_contents(struct snapshot *snapshot)
>  }
>  
>  static const char *find_reference_location_1(struct snapshot *snapshot,
> -					     const char *refname, int mustexist)
> +					     const char *refname, int mustexist,
> +					     int start)
>  {
>  	/*
>  	 * This is not *quite* a garden-variety binary search, because
> @@ -558,7 +560,7 @@ static const char *find_reference_location_1(struct snapshot *snapshot,
>  
>  		mid = lo + (hi - lo) / 2;
>  		rec = find_start_of_record(lo, mid);
> -		cmp = cmp_record_to_refname(rec, refname);
> +		cmp = cmp_record_to_refname(rec, refname, start);
>  		if (cmp < 0) {
>  			lo = find_end_of_record(mid, hi);
>  		} else if (cmp > 0) {
> @@ -591,7 +593,22 @@ static const char *find_reference_location_1(struct snapshot *snapshot,
>  static const char *find_reference_location(struct snapshot *snapshot,
>  					   const char *refname, int mustexist)
>  {
> -	return find_reference_location_1(snapshot, refname, mustexist);
> +	return find_reference_location_1(snapshot, refname, mustexist, 1);
> +}
> +
> +/*
> + * Find the place in `snapshot->buf` after the end of the record for
> + * `refname`. In other words, find the location of first thing *after*
> + * `refname`.
> + *
> + * Other semantics are identical to the ones in
> + * `find_reference_location()`.
> + */
> +static const char *find_reference_location_end(struct snapshot *snapshot,
> +					       const char *refname,
> +					       int mustexist)
> +{
> +	return find_reference_location_1(snapshot, refname, mustexist, 0);
>  }
>  
>  /*
> @@ -785,6 +802,13 @@ struct packed_ref_iterator {
>  	/* The end of the part of the buffer that will be iterated over: */
>  	const char *eof;
>  
> +	struct jump_list_entry {
> +		const char *start;
> +		const char *end;
> +	} *jump;
> +	size_t jump_nr, jump_alloc;
> +	size_t jump_pos;
> 
Nit: I had some trouble with `jump_pos` given that it sounds so similar
to `iter->pos`, and thus you tend to think that they both apply to the
position in the packed-refs file. `jump_curr` or `jump_idx` might help
to avoid this confusion.

>  	/* Scratch space for current values: */
>  	struct object_id oid, peeled;
>  	struct strbuf refname_buf;
> @@ -802,14 +826,34 @@ struct packed_ref_iterator {
>   */
>  static int next_record(struct packed_ref_iterator *iter)
>  {
> -	const char *p = iter->pos, *eol;
> +	const char *p, *eol;
>  
>  	strbuf_reset(&iter->refname_buf);
>  
> +	/*
> +	 * If iter->pos is contained within a skipped region, jump past
> +	 * it.
> +	 *
> +	 * Note that each skipped region is considered at most once,
> +	 * since they are ordered based on their starting position.
> +	 */
> +	while (iter->jump_pos < iter->jump_nr) {
> +		struct jump_list_entry *curr = &iter->jump[iter->jump_pos];
> +		if (iter->pos < curr->start)
> +			break; /* not to the next jump yet */
> +
> +		iter->jump_pos++;
> +		if (iter->pos < curr->end) {
> +			iter->pos = curr->end;
> +			break;
> +		}
> +	}
> +
>  	if (iter->pos == iter->eof)
>  		return ITER_DONE;
>  
>  	iter->base.flags = REF_ISPACKED;
> +	p = iter->pos;
>  
>  	if (iter->eof - p < the_hash_algo->hexsz + 2 ||
>  	    parse_oid_hex(p, &iter->oid, &p) ||
> @@ -917,6 +961,7 @@ static int packed_ref_iterator_abort(struct ref_iterator *ref_iterator)
>  	int ok = ITER_DONE;
>  
>  	strbuf_release(&iter->refname_buf);
> +	free(iter->jump);
>  	release_snapshot(iter->snapshot);
>  	base_ref_iterator_free(ref_iterator);
>  	return ok;
> @@ -928,6 +973,112 @@ static struct ref_iterator_vtable packed_ref_iterator_vtable = {
>  	.abort = packed_ref_iterator_abort
>  };
>  
> +static int jump_list_entry_cmp(const void *va, const void *vb)
> +{
> +	const struct jump_list_entry *a = va;
> +	const struct jump_list_entry *b = vb;
> +
> +	if (a->start < b->start)
> +		return -1;
> +	if (a->start > b->start)
> +		return 1;
> +	return 0;
> +}
> +
> +static int has_glob_special(const char *str)
> +{
> +	const char *p;
> +	for (p = str; *p; p++) {
> +		if (is_glob_special(*p))
> +			return 1;
> +	}
> +	return 0;
> +}
> +
> +static const char *ptr_max(const char *x, const char *y)
> +{
> +	if (x > y)
> +		return x;
> +	return y;
> +}
> +
> +static void populate_excluded_jump_list(struct packed_ref_iterator *iter,
> +					struct snapshot *snapshot,
> +					const char **excluded_patterns)
> +{
> +	size_t i, j;
> +	const char **pattern;
> +	struct jump_list_entry *last_disjoint;
> +
> +	if (!excluded_patterns)
> +		return;
> +
> +	for (pattern = excluded_patterns; *pattern; pattern++) {
> +		struct jump_list_entry *e;
> +
> +		/*
> +		 * We can't feed any excludes with globs in them to the
> +		 * refs machinery.  It only understands prefix matching.
> +		 * We likewise can't even feed the string leading up to
> +		 * the first meta-character, as something like "foo[a]"
> +		 * should not exclude "foobar" (but the prefix "foo"
> +		 * would match that and mark it for exclusion).
> +		 */
> +		if (has_glob_special(*pattern))
> +			continue;
> +
> +		ALLOC_GROW(iter->jump, iter->jump_nr + 1, iter->jump_alloc);
> +
> +		e = &iter->jump[iter->jump_nr++];
> +		e->start = find_reference_location(snapshot, *pattern, 0);
> +		e->end = find_reference_location_end(snapshot, *pattern, 0);

Nit: we could detect the non-matching case here already (where
`e->start == e->end`), which would allow us to skip an allocation. It's
probably premature optimization though, so please feel free to ignore.

> +	}
> +
> +	if (!iter->jump_nr) {
> +		/*
> +		 * Every entry in exclude_patterns has a meta-character,
> +		 * nothing to do here.
> +		 */
> +		return;
> +	}
> +
> +	QSORT(iter->jump, iter->jump_nr, jump_list_entry_cmp);
> +
> +	/*
> +	 * As an optimization, merge adjacent entries in the jump list
> +	 * to jump forwards as far as possible when entering a skipped
> +	 * region.
> +	 *
> +	 * For example, if we have two skipped regions:
> +	 *
> +	 *	[[A, B], [B, C]]
> +	 *
> +	 * we want to combine that into a single entry jumping from A to
> +	 * C.
> +	 */
> +	last_disjoint = iter->jump;

Nit: if we initialized `j = 0`, then `last_disjoint` would always be
equal to `&iter->jump[j]`. We could then declare the variable inside of
the loop to make it a bit easier to understand.
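
For illustration, here is a minimal standalone sketch of that
restructuring. In it, `j` indexes the last disjoint region kept so far,
so extensions always land in the slot that survives compaction
(`jump[j]`). The entry type mirrors `struct jump_list_entry`.
`merge_jump_list()` is a hypothetical helper. Unlike the quoted loop, it
also drops an empty region at index 0.

```c
#include <stddef.h>
#include <stdlib.h>

/* Mirrors the shape of the patch's entry: a half-open [start, end). */
struct jump_list_entry {
	const char *start;
	const char *end;
};

static int jump_list_entry_cmp(const void *va, const void *vb)
{
	const struct jump_list_entry *a = va;
	const struct jump_list_entry *b = vb;

	if (a->start < b->start)
		return -1;
	return a->start > b->start;
}

/*
 * Sort `jump`, then merge overlapping or adjacent entries in place and
 * drop empty ones. `j` always indexes the last disjoint region kept so
 * far, so no separate `last_disjoint` pointer is needed. Returns the
 * new number of entries.
 */
static size_t merge_jump_list(struct jump_list_entry *jump, size_t nr)
{
	size_t i, j = 0;

	if (!nr)
		return 0;
	qsort(jump, nr, sizeof(*jump), jump_list_entry_cmp);

	for (i = 1; i < nr; i++) {
		const struct jump_list_entry *ours = &jump[i];

		if (ours->start == ours->end) {
			continue;			/* empty: nothing to skip */
		} else if (jump[j].start == jump[j].end) {
			jump[j] = *ours;		/* replace an empty jump[0] */
		} else if (ours->start <= jump[j].end) {
			if (ours->end > jump[j].end)
				jump[j].end = ours->end;	/* extend */
		} else {
			jump[++j] = *ours;		/* new disjoint region */
		}
	}
	/* jump[j] can only still be empty if every entry was empty */
	return jump[j].start == jump[j].end ? j : j + 1;
}
```

Take [5, 7), [0, 2), [2, 3), an empty [4, 4), and [6, 9), as offsets into
one buffer. Merging them yields [0, 3) and [5, 9).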

Patrick

> +	for (i = 1, j = 1; i < iter->jump_nr; i++) {
> +		struct jump_list_entry *ours = &iter->jump[i];
> +
> +		if (ours->start == ours->end) {
> +			/* ignore empty regions (no matching entries) */
> +			continue;
> +		} else if (ours->start <= last_disjoint->end) {
> +			/* overlapping regions extend the previous one */
> +			last_disjoint->end = ptr_max(last_disjoint->end, ours->end);
> +		} else {
> +			/* otherwise, insert a new region */
> +			iter->jump[j++] = *ours;
> +			last_disjoint = ours;
> +
> +		}
> +	}
> +
> +	iter->jump_nr = j;
> +	iter->jump_pos = 0;
> +}
> +
>  static struct ref_iterator *packed_ref_iterator_begin(
>  		struct ref_store *ref_store,
>  		const char *prefix, const char **exclude_patterns,
> @@ -963,6 +1114,9 @@ static struct ref_iterator *packed_ref_iterator_begin(
>  	ref_iterator = &iter->base;
>  	base_ref_iterator_init(ref_iterator, &packed_ref_iterator_vtable, 1);
>  
> +	if (exclude_patterns)
> +		populate_excluded_jump_list(iter, snapshot, exclude_patterns);
> +
>  	iter->snapshot = snapshot;
>  	acquire_snapshot(snapshot);
>  
> diff --git a/t/helper/test-ref-store.c b/t/helper/test-ref-store.c
> index 6d8f844e9c..2bff003f7c 100644
> --- a/t/helper/test-ref-store.c
> +++ b/t/helper/test-ref-store.c
> @@ -175,6 +175,15 @@ static int cmd_for_each_ref(struct ref_store *refs, const char **argv)
>  	return refs_for_each_ref_in(refs, prefix, each_ref, NULL);
>  }
>  
> +static int cmd_for_each_ref__exclude(struct ref_store *refs, const char **argv)
> +{
> +	const char *prefix = notnull(*argv++, "prefix");
> +	const char **exclude_patterns = argv;
> +
> +	return refs_for_each_fullref_in(refs, prefix, exclude_patterns, each_ref,
> +					NULL);
> +}
> +
>  static int cmd_resolve_ref(struct ref_store *refs, const char **argv)
>  {
>  	struct object_id oid = *null_oid();
> @@ -307,6 +316,7 @@ static struct command commands[] = {
>  	{ "delete-refs", cmd_delete_refs },
>  	{ "rename-ref", cmd_rename_ref },
>  	{ "for-each-ref", cmd_for_each_ref },
> +	{ "for-each-ref--exclude", cmd_for_each_ref__exclude },
>  	{ "resolve-ref", cmd_resolve_ref },
>  	{ "verify-ref", cmd_verify_ref },
>  	{ "for-each-reflog", cmd_for_each_reflog },
> diff --git a/t/t1419-exclude-refs.sh b/t/t1419-exclude-refs.sh
> new file mode 100755
> index 0000000000..bc534c8ea1
> --- /dev/null
> +++ b/t/t1419-exclude-refs.sh
> @@ -0,0 +1,101 @@
> +#!/bin/sh
> +
> +test_description='test exclude_patterns functionality in main ref store'
> +
> +GIT_TEST_DEFAULT_INITIAL_BRANCH_NAME=main
> +export GIT_TEST_DEFAULT_INITIAL_BRANCH_NAME
> +
> +TEST_PASSES_SANITIZE_LEAK=true
> +. ./test-lib.sh
> +
> +for_each_ref__exclude () {
> +	test-tool ref-store main for-each-ref--exclude "$@" >actual.raw
> +	cut -d ' ' -f 2 actual.raw
> +}
> +
> +for_each_ref () {
> +	git for-each-ref --format='%(refname)' "$@"
> +}
> +
> +test_expect_success 'setup' '
> +	test_commit --no-tag base &&
> +	base="$(git rev-parse HEAD)" &&
> +
> +	for name in foo bar baz quux
> +	do
> +		for i in 1 2 3
> +		do
> +			echo "create refs/heads/$name/$i $base" || return 1
> +		done || return 1
> +	done >in &&
> +	echo "delete refs/heads/main" >>in &&
> +
> +	git update-ref --stdin <in &&
> +	git pack-refs --all
> +'
> +
> +test_expect_success 'excluded region in middle' '
> +	for_each_ref__exclude refs/heads refs/heads/foo >actual &&
> +	for_each_ref refs/heads/bar refs/heads/baz refs/heads/quux >expect &&
> +
> +	test_cmp expect actual
> +'
> +
> +test_expect_success 'excluded region at beginning' '
> +	for_each_ref__exclude refs/heads refs/heads/bar >actual &&
> +	for_each_ref refs/heads/baz refs/heads/foo refs/heads/quux >expect &&
> +
> +	test_cmp expect actual
> +'
> +
> +test_expect_success 'excluded region at end' '
> +	for_each_ref__exclude refs/heads refs/heads/quux >actual &&
> +	for_each_ref refs/heads/foo refs/heads/bar refs/heads/baz >expect &&
> +
> +	test_cmp expect actual
> +'
> +
> +test_expect_success 'disjoint excluded regions' '
> +	for_each_ref__exclude refs/heads refs/heads/bar refs/heads/quux >actual &&
> +	for_each_ref refs/heads/baz refs/heads/foo >expect &&
> +
> +	test_cmp expect actual
> +'
> +
> +test_expect_success 'adjacent, non-overlapping excluded regions' '
> +	for_each_ref__exclude refs/heads refs/heads/bar refs/heads/baz >actual &&
> +	for_each_ref refs/heads/foo refs/heads/quux >expect &&
> +
> +	test_cmp expect actual
> +'
> +
> +test_expect_success 'overlapping excluded regions' '
> +	for_each_ref__exclude refs/heads refs/heads/ba refs/heads/baz >actual &&
> +	for_each_ref refs/heads/foo refs/heads/quux >expect &&
> +
> +	test_cmp expect actual
> +'
> +
> +test_expect_success 'several overlapping excluded regions' '
> +	for_each_ref__exclude refs/heads \
> +		refs/heads/bar refs/heads/baz refs/heads/foo >actual &&
> +	for_each_ref refs/heads/quux >expect &&
> +
> +	test_cmp expect actual
> +'
> +
> +test_expect_success 'non-matching excluded section' '
> +	for_each_ref__exclude refs/heads refs/heads/does/not/exist >actual &&
> +	for_each_ref >expect &&
> +
> +	test_cmp expect actual
> +'
> +
> +test_expect_success 'meta-characters are discarded' '
> +	for_each_ref__exclude refs/heads "refs/heads/ba*" >actual &&
> +	for_each_ref >expect &&
> +
> +	test_cmp expect actual
> +'
> +
> +test_done
> -- 
> 2.40.1.572.g5c4ab523ef
> 

* Re: [PATCH v2 11/16] revision.h: store hidden refs in a `strvec`
  2023-05-15 19:23   ` [PATCH v2 11/16] revision.h: store hidden refs in a `strvec` Taylor Blau
@ 2023-06-06  7:00     ` Patrick Steinhardt
  2023-06-20 12:16       ` Taylor Blau
  0 siblings, 1 reply; 149+ messages in thread
From: Patrick Steinhardt @ 2023-06-06  7:00 UTC (permalink / raw)
  To: Taylor Blau; +Cc: git, Chris Torek, Derrick Stolee, Jeff King, Junio C Hamano

On Mon, May 15, 2023 at 03:23:39PM -0400, Taylor Blau wrote:
> In subsequent commits, it will be convenient to have a 'const char **'
> of hidden refs (matching `transfer.hiderefs`, `uploadpack.hideRefs`,
> etc.), instead of a `string_list`.
> 
> Convert spots throughout the tree that store the list of hidden refs
> from a `string_list` to a `strvec`.
> 
> Note that in `parse_hide_refs_config()` there is an ugly const-cast used
> to avoid an extra copy of each value before trimming any trailing slash
> characters. This could instead be written as:
> 
>     ref = xstrdup(value);
>     len = strlen(ref);
>     while (len && ref[len - 1] == '/')
>             ref[--len] = '\0';
>     strvec_push(hide_refs, ref);
>     free(ref);
> 
> but the double-copy (once when calling `xstrdup()`, and another via
> `strvec_push()`) is wasteful.

I guess the proper way to fix this would be to introduce something like
a `strvec_push_nodup()` function that takes ownership. And in fact this
helper exists already, but it's declared as static. So we could get
around the ugly cast with a simple change to expose the helper function.
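
Here is a minimal sketch of what that could look like. It uses a
hypothetical stand-in for `struct strvec`; `struct mini_strvec` and its
helpers are illustrative and not git's API. The value is copied exactly
once, trimmed in place, and handed over without a second copy.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical minimal stand-in for git's `struct strvec`. */
struct mini_strvec {
	char **v;
	size_t nr, alloc;
};

/* Append `value`, taking ownership of it (no copy is made). */
static void mini_strvec_push_nodup(struct mini_strvec *sv, char *value)
{
	if (sv->nr + 2 > sv->alloc) {
		size_t alloc = sv->alloc ? 2 * sv->alloc : 4;
		char **v = realloc(sv->v, alloc * sizeof(*v));

		if (!v)
			abort();
		sv->v = v;
		sv->alloc = alloc;
	}
	sv->v[sv->nr++] = value;
	sv->v[sv->nr] = NULL;	/* keep the array NULL-terminated */
}

/* Copy `value` once, trim trailing slashes in place, then hand it over. */
static void push_hide_ref(struct mini_strvec *sv, const char *value)
{
	size_t len = strlen(value);
	char *ref = malloc(len + 1);

	if (!ref)
		abort();
	memcpy(ref, value, len + 1);
	while (len && ref[len - 1] == '/')
		ref[--len] = '\0';
	mini_strvec_push_nodup(sv, ref);
}

static void mini_strvec_clear(struct mini_strvec *sv)
{
	for (size_t i = 0; i < sv->nr; i++)
		free(sv->v[i]);
	free(sv->v);
	sv->v = NULL;
	sv->nr = sv->alloc = 0;
}
```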

Patrick

* Re: [PATCH v2 13/16] refs.h: let `for_each_namespaced_ref()` take excluded patterns
  2023-05-15 19:23   ` [PATCH v2 13/16] refs.h: let `for_each_namespaced_ref()` take excluded patterns Taylor Blau
@ 2023-06-06  7:01     ` Patrick Steinhardt
  2023-06-20 12:18       ` Taylor Blau
  0 siblings, 1 reply; 149+ messages in thread
From: Patrick Steinhardt @ 2023-06-06  7:01 UTC (permalink / raw)
  To: Taylor Blau; +Cc: git, Chris Torek, Derrick Stolee, Jeff King, Junio C Hamano

On Mon, May 15, 2023 at 03:23:45PM -0400, Taylor Blau wrote:
> The following commit will want to call `for_each_namespaced_ref()` with
> a list of excluded patterns.

This statement isn't quite true, as the following commit touches
git-receive-pack(1), which doesn't use `for_each_namespaced_ref()`.

Patrick

* Re: [PATCH v2 00/16] refs: implement jump lists for packed backend
  2023-05-15 19:23 ` [PATCH v2 00/16] refs: implement jump lists for packed backend Taylor Blau
                     ` (15 preceding siblings ...)
  2023-05-15 19:23   ` [PATCH v2 16/16] ls-refs.c: " Taylor Blau
@ 2023-06-06  7:01   ` Patrick Steinhardt
  2023-06-20 12:22     ` Taylor Blau
  16 siblings, 1 reply; 149+ messages in thread
From: Patrick Steinhardt @ 2023-06-06  7:01 UTC (permalink / raw)
  To: Taylor Blau; +Cc: git, Chris Torek, Derrick Stolee, Jeff King, Junio C Hamano

On Mon, May 15, 2023 at 03:23:07PM -0400, Taylor Blau wrote:
> Here is a reroll of my series to implement jump (née skip) lists for the
> packed refs backend.
> 
> Not a ton has changed since last time, but some notable things that have
> changed include:
> 
>   - Renaming "skip lists" to "jump lists" to clarify that this
>     implementation does not use the skip list data structure.
>   - Patch reorganization, splitting out `find_reference_location_end()`
>     more sensibly, rewording patch messages, etc.
>   - Addresses feedback from Junio and Patrick Steinhardt's helpful
>     reviews.
> 
> As usual, a range-diff is included below for convenience.
> 
> Given that we are expecting -rc0 today, we should aim to not let review
> of this topic direct our attention away from testing the release
> candidates. We can get more serious about it on the other side of 2.41.
> 
> Thanks in advance for another look.

I didn't have many comments in this round. Personally, though, I'd split
up this patch series into two in order to land the individual parts
faster, where the first part introduces `git for-each-ref --exclude` and
the second part introduces the jump list for the packed-refs backend.

Each of these has merit on its own, and especially the first part
should require less discussion. Furthermore, splitting it up makes the
review easier to manage, as 16 patches do require quite a long
attention span to handle.

Anyway, this is just a suggestion from my side; please feel free to
ignore it.

Patrick

* [PATCH v3 00/16] refs: implement jump lists for packed backend
  2023-05-08 21:59 [PATCH 00/15] refs: implement skip lists for packed backend Taylor Blau
                   ` (15 preceding siblings ...)
  2023-05-15 19:23 ` [PATCH v2 00/16] refs: implement jump lists for packed backend Taylor Blau
@ 2023-06-07 10:40 ` Taylor Blau
  2023-06-07 10:40   ` [PATCH v3 01/16] refs.c: rename `ref_filter` Taylor Blau
                     ` (16 more replies)
  2023-06-20 14:20 ` [PATCH v4 " Taylor Blau
  2023-07-10 21:12 ` [PATCH v5 " Taylor Blau
  18 siblings, 17 replies; 149+ messages in thread
From: Taylor Blau @ 2023-06-07 10:40 UTC (permalink / raw)
  To: git
  Cc: Chris Torek, Derrick Stolee, Jeff King, Junio C Hamano,
	Patrick Steinhardt

Here is a reroll of my series to implement jump (née skip) lists for the
packed refs backend, based on top of the current 'master'.

Nothing substantive has changed since the last version, where review had
stabilized. This version just resolves a couple of merge conflicts with
633390bd08 (Merge branch 'bc/clone-empty-repo-via-protocol-v0',
2023-05-19).

As usual, a range-diff is included below for convenience.

Thanks in advance for a hopefully final look ;-).

Jeff King (5):
  refs.c: rename `ref_filter`
  ref-filter.h: provide `REF_FILTER_INIT`
  ref-filter: clear reachable list pointers after freeing
  ref-filter: add `ref_filter_clear()`
  ref-filter.c: parameterize match functions over patterns

Taylor Blau (11):
  builtin/for-each-ref.c: add `--exclude` option
  refs: plumb `exclude_patterns` argument throughout
  refs/packed-backend.c: refactor `find_reference_location()`
  refs/packed-backend.c: implement jump lists to avoid excluded
    pattern(s)
  refs/packed-backend.c: add trace2 counters for jump list
  revision.h: store hidden refs in a `strvec`
  refs/packed-backend.c: ignore complicated hidden refs rules
  refs.h: let `for_each_namespaced_ref()` take excluded patterns
  builtin/receive-pack.c: avoid enumerating hidden references
  upload-pack.c: avoid enumerating hidden refs where possible
  ls-refs.c: avoid enumerating hidden refs where possible

 Documentation/git-for-each-ref.txt |   6 +
 builtin/branch.c                   |   4 +-
 builtin/for-each-ref.c             |   7 +-
 builtin/receive-pack.c             |   7 +-
 builtin/tag.c                      |   4 +-
 http-backend.c                     |   2 +-
 ls-refs.c                          |   8 +-
 ref-filter.c                       |  63 ++++++--
 ref-filter.h                       |  12 ++
 refs.c                             |  61 ++++----
 refs.h                             |  15 +-
 refs/debug.c                       |   5 +-
 refs/files-backend.c               |   5 +-
 refs/packed-backend.c              | 226 ++++++++++++++++++++++++++---
 refs/refs-internal.h               |   7 +-
 revision.c                         |   4 +-
 revision.h                         |   5 +-
 t/helper/test-reach.c              |   2 +-
 t/helper/test-ref-store.c          |  10 ++
 t/t0041-usage.sh                   |   1 +
 t/t1419-exclude-refs.sh            | 131 +++++++++++++++++
 t/t3402-rebase-merge.sh            |   1 +
 t/t6300-for-each-ref.sh            |  35 +++++
 trace2.h                           |   2 +
 trace2/tr2_ctr.c                   |   5 +
 upload-pack.c                      |  43 ++++--
 26 files changed, 565 insertions(+), 106 deletions(-)
 create mode 100755 t/t1419-exclude-refs.sh

Range-diff against v2:
 1:  6cac42e70e =  1:  afac948f04 refs.c: rename `ref_filter`
 2:  8dda7db738 =  2:  b9336e3b77 ref-filter.h: provide `REF_FILTER_INIT`
 3:  bf21df783d =  3:  fc28b5caaa ref-filter: clear reachable list pointers after freeing
 4:  85ecb70957 =  4:  bc5356fe4b ref-filter: add `ref_filter_clear()`
 5:  385890b459 =  5:  1988ca4c0a ref-filter.c: parameterize match functions over patterns
 6:  1a3371a0a7 =  6:  60d85aa4ad builtin/for-each-ref.c: add `--exclude` option
 7:  aa05549b6e =  7:  c4fe9a1893 refs: plumb `exclude_patterns` argument throughout
 8:  6002c568b5 =  8:  9cab5e0699 refs/packed-backend.c: refactor `find_reference_location()`
 9:  8c78f49a8d =  9:  8066858bf5 refs/packed-backend.c: implement jump lists to avoid excluded pattern(s)
10:  5059f5dd42 = 10:  3c045076a9 refs/packed-backend.c: add trace2 counters for jump list
11:  f765b50a84 = 11:  0ff542eaef revision.h: store hidden refs in a `strvec`
12:  254bcc4361 = 12:  ca006b2c3f refs/packed-backend.c: ignore complicated hidden refs rules
13:  50e7df7dc0 ! 13:  cae703a425 refs.h: let `for_each_namespaced_ref()` take excluded patterns
    @@ upload-pack.c: void upload_pack(const int advertise_refs, const int stateless_rp
      		head_ref_namespaced(send_ref, &data);
     -		for_each_namespaced_ref(send_ref, &data);
     +		for_each_namespaced_ref(NULL, send_ref, &data);
    - 		/*
    - 		 * fflush stdout before calling advertise_shallow_grafts because send_ref
    - 		 * uses stdio.
    + 		if (!data.sent_capabilities) {
    + 			const char *refname = "capabilities^{}";
    + 			write_v0_ref(&data, refname, refname, null_oid());
     @@ upload-pack.c: void upload_pack(const int advertise_refs, const int stateless_rpc,
      		packet_flush(1);
      	} else {
14:  f6a3a5a6ba = 14:  1db10b76ea builtin/receive-pack.c: avoid enumerating hidden references
15:  2331fa7a4d ! 15:  014243ebe6 upload-pack.c: avoid enumerating hidden refs where possible
    @@ upload-pack.c: void upload_pack(const int advertise_refs, const int stateless_rp
      		head_ref_namespaced(send_ref, &data);
     -		for_each_namespaced_ref(NULL, send_ref, &data);
     +		for_each_namespaced_ref_1(send_ref, &data);
    - 		/*
    - 		 * fflush stdout before calling advertise_shallow_grafts because send_ref
    - 		 * uses stdio.
    + 		if (!data.sent_capabilities) {
    + 			const char *refname = "capabilities^{}";
    + 			write_v0_ref(&data, refname, refname, null_oid());
     @@ upload-pack.c: void upload_pack(const int advertise_refs, const int stateless_rpc,
      		packet_flush(1);
      	} else {
16:  2c6b89d64a = 16:  e02fe93379 ls-refs.c: avoid enumerating hidden refs where possible
-- 
2.41.0.16.g26cd413590

* [PATCH v3 01/16] refs.c: rename `ref_filter`
  2023-06-07 10:40 ` [PATCH v3 " Taylor Blau
@ 2023-06-07 10:40   ` Taylor Blau
  2023-06-13 22:19     ` Junio C Hamano
  2023-06-07 10:40   ` [PATCH v3 02/16] ref-filter.h: provide `REF_FILTER_INIT` Taylor Blau
                     ` (15 subsequent siblings)
  16 siblings, 1 reply; 149+ messages in thread
From: Taylor Blau @ 2023-06-07 10:40 UTC (permalink / raw)
  To: git
  Cc: Chris Torek, Derrick Stolee, Jeff King, Junio C Hamano,
	Patrick Steinhardt

From: Jeff King <peff@peff.net>

The refs machinery has its own implementation of a `ref_filter` (used by
`for-each-ref`), which is distinct from the `ref-filter.h` API (also
used by `for-each-ref`, among other things).

Rename the one within refs.c to more clearly indicate its purpose.

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Taylor Blau <me@ttaylorr.com>
---
 refs.c | 15 ++++++++-------
 1 file changed, 8 insertions(+), 7 deletions(-)

diff --git a/refs.c b/refs.c
index d2a98e1c21..b9b77d2eff 100644
--- a/refs.c
+++ b/refs.c
@@ -375,8 +375,8 @@ char *resolve_refdup(const char *refname, int resolve_flags,
 				   oid, flags);
 }
 
-/* The argument to filter_refs */
-struct ref_filter {
+/* The argument to for_each_filter_refs */
+struct for_each_ref_filter {
 	const char *pattern;
 	const char *prefix;
 	each_ref_fn *fn;
@@ -409,10 +409,11 @@ int ref_exists(const char *refname)
 	return refs_ref_exists(get_main_ref_store(the_repository), refname);
 }
 
-static int filter_refs(const char *refname, const struct object_id *oid,
-			   int flags, void *data)
+static int for_each_filter_refs(const char *refname,
+				const struct object_id *oid,
+				int flags, void *data)
 {
-	struct ref_filter *filter = (struct ref_filter *)data;
+	struct for_each_ref_filter *filter = data;
 
 	if (wildmatch(filter->pattern, refname, 0))
 		return 0;
@@ -569,7 +570,7 @@ int for_each_glob_ref_in(each_ref_fn fn, const char *pattern,
 	const char *prefix, void *cb_data)
 {
 	struct strbuf real_pattern = STRBUF_INIT;
-	struct ref_filter filter;
+	struct for_each_ref_filter filter;
 	int ret;
 
 	if (!prefix && !starts_with(pattern, "refs/"))
@@ -589,7 +590,7 @@ int for_each_glob_ref_in(each_ref_fn fn, const char *pattern,
 	filter.prefix = prefix;
 	filter.fn = fn;
 	filter.cb_data = cb_data;
-	ret = for_each_ref(filter_refs, &filter);
+	ret = for_each_ref(for_each_filter_refs, &filter);
 
 	strbuf_release(&real_pattern);
 	return ret;
-- 
2.41.0.16.g26cd413590


* [PATCH v3 02/16] ref-filter.h: provide `REF_FILTER_INIT`
  2023-06-07 10:40 ` [PATCH v3 " Taylor Blau
  2023-06-07 10:40   ` [PATCH v3 01/16] refs.c: rename `ref_filter` Taylor Blau
@ 2023-06-07 10:40   ` Taylor Blau
  2023-06-07 10:41   ` [PATCH v3 03/16] ref-filter: clear reachable list pointers after freeing Taylor Blau
                     ` (14 subsequent siblings)
  16 siblings, 0 replies; 149+ messages in thread
From: Taylor Blau @ 2023-06-07 10:40 UTC (permalink / raw)
  To: git
  Cc: Chris Torek, Derrick Stolee, Jeff King, Junio C Hamano,
	Patrick Steinhardt

From: Jeff King <peff@peff.net>

Provide a sane initialization value for `struct ref_filter`, which in a
subsequent patch will be used to initialize a new field.

In the meantime, fix a case in test-reach.c where its `ref_filter` is
not even zero-initialized.
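
To illustrate why an initializer macro can replace the `memset()`
calls, here is a minimal standalone sketch. The types `struct
int_array` and `struct example_filter`, and the `lines` default of -1,
are hypothetical stand-ins rather than git's actual definitions. The
point is that C zero-initializes every member a designated initializer
does not name, so only fields needing a non-zero default must be listed.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for an array type such as git's oid_array. */
struct int_array {
	int *v;
	size_t nr, alloc;
};
#define INT_ARRAY_INIT { 0 }

/* Hypothetical filter struct; not git's struct ref_filter. */
struct example_filter {
	const char **name_patterns;
	struct int_array points_at;
	int lines; /* hypothetical field whose default is non-zero */
};

/*
 * Only fields with a non-zero default are named; every other member
 * (name_patterns here) is zero-initialized by the C standard.
 */
#define EXAMPLE_FILTER_INIT { \
	.points_at = INT_ARRAY_INIT, \
	.lines = -1, \
}

/* Reset an existing filter to its defaults without memset(). */
void example_filter_reset(struct example_filter *filter)
{
	assert(filter);
	*filter = (struct example_filter)EXAMPLE_FILTER_INIT;
}
```

A caller writes `struct example_filter filter = EXAMPLE_FILTER_INIT;`
at the declaration, so adding a field with a non-zero default later
touches only the macro rather than every call site.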

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Taylor Blau <me@ttaylorr.com>
---
 builtin/branch.c       | 3 +--
 builtin/for-each-ref.c | 3 +--
 builtin/tag.c          | 3 +--
 ref-filter.h           | 3 +++
 t/helper/test-reach.c  | 2 +-
 5 files changed, 7 insertions(+), 7 deletions(-)

diff --git a/builtin/branch.c b/builtin/branch.c
index e6c2655af6..7891dec361 100644
--- a/builtin/branch.c
+++ b/builtin/branch.c
@@ -707,7 +707,7 @@ int cmd_branch(int argc, const char **argv, const char *prefix)
 	int reflog = 0, quiet = 0, icase = 0, force = 0,
 	    recurse_submodules_explicit = 0;
 	enum branch_track track;
-	struct ref_filter filter;
+	struct ref_filter filter = REF_FILTER_INIT;
 	static struct ref_sorting *sorting;
 	struct string_list sorting_options = STRING_LIST_INIT_DUP;
 	struct ref_format format = REF_FORMAT_INIT;
@@ -765,7 +765,6 @@ int cmd_branch(int argc, const char **argv, const char *prefix)
 
 	setup_ref_filter_porcelain_msg();
 
-	memset(&filter, 0, sizeof(filter));
 	filter.kind = FILTER_REFS_BRANCHES;
 	filter.abbrev = -1;
 
diff --git a/builtin/for-each-ref.c b/builtin/for-each-ref.c
index 695fc8f4a5..99ccb73518 100644
--- a/builtin/for-each-ref.c
+++ b/builtin/for-each-ref.c
@@ -24,7 +24,7 @@ int cmd_for_each_ref(int argc, const char **argv, const char *prefix)
 	struct string_list sorting_options = STRING_LIST_INIT_DUP;
 	int maxcount = 0, icase = 0, omit_empty = 0;
 	struct ref_array array;
-	struct ref_filter filter;
+	struct ref_filter filter = REF_FILTER_INIT;
 	struct ref_format format = REF_FORMAT_INIT;
 	struct strbuf output = STRBUF_INIT;
 	struct strbuf err = STRBUF_INIT;
@@ -61,7 +61,6 @@ int cmd_for_each_ref(int argc, const char **argv, const char *prefix)
 	};
 
 	memset(&array, 0, sizeof(array));
-	memset(&filter, 0, sizeof(filter));
 
 	format.format = "%(objectname) %(objecttype)\t%(refname)";
 
diff --git a/builtin/tag.c b/builtin/tag.c
index 1850a6a6fd..6b41bb7374 100644
--- a/builtin/tag.c
+++ b/builtin/tag.c
@@ -443,7 +443,7 @@ int cmd_tag(int argc, const char **argv, const char *prefix)
 	struct msg_arg msg = { .buf = STRBUF_INIT };
 	struct ref_transaction *transaction;
 	struct strbuf err = STRBUF_INIT;
-	struct ref_filter filter;
+	struct ref_filter filter = REF_FILTER_INIT;
 	struct ref_sorting *sorting;
 	struct string_list sorting_options = STRING_LIST_INIT_DUP;
 	struct ref_format format = REF_FORMAT_INIT;
@@ -501,7 +501,6 @@ int cmd_tag(int argc, const char **argv, const char *prefix)
 	git_config(git_tag_config, &sorting_options);
 
 	memset(&opt, 0, sizeof(opt));
-	memset(&filter, 0, sizeof(filter));
 	filter.lines = -1;
 	opt.sign = -1;
 
diff --git a/ref-filter.h b/ref-filter.h
index 430701cfb7..a920f73b29 100644
--- a/ref-filter.h
+++ b/ref-filter.h
@@ -92,6 +92,9 @@ struct ref_format {
 	struct string_list bases;
 };
 
+#define REF_FILTER_INIT { \
+	.points_at = OID_ARRAY_INIT, \
+}
 #define REF_FORMAT_INIT {             \
 	.use_color = -1,              \
 	.bases = STRING_LIST_INIT_DUP, \
diff --git a/t/helper/test-reach.c b/t/helper/test-reach.c
index 5b6f217441..ef58f10c2d 100644
--- a/t/helper/test-reach.c
+++ b/t/helper/test-reach.c
@@ -139,7 +139,7 @@ int cmd__reach(int ac, const char **av)
 
 		printf("%s(X,_,_,0,0):%d\n", av[1], can_all_from_reach_with_flag(&X_obj, 2, 4, 0, 0));
 	} else if (!strcmp(av[1], "commit_contains")) {
-		struct ref_filter filter;
+		struct ref_filter filter = REF_FILTER_INIT;
 		struct contains_cache cache;
 		init_contains_cache(&cache);
 
-- 
2.41.0.16.g26cd413590


* [PATCH v3 03/16] ref-filter: clear reachable list pointers after freeing
  2023-06-07 10:40 ` [PATCH v3 " Taylor Blau
  2023-06-07 10:40   ` [PATCH v3 01/16] refs.c: rename `ref_filter` Taylor Blau
  2023-06-07 10:40   ` [PATCH v3 02/16] ref-filter.h: provide `REF_FILTER_INIT` Taylor Blau
@ 2023-06-07 10:41   ` Taylor Blau
  2023-06-07 10:41   ` [PATCH v3 04/16] ref-filter: add `ref_filter_clear()` Taylor Blau
                     ` (13 subsequent siblings)
  16 siblings, 0 replies; 149+ messages in thread
From: Taylor Blau @ 2023-06-07 10:41 UTC (permalink / raw)
  To: git
  Cc: Chris Torek, Derrick Stolee, Jeff King, Junio C Hamano,
	Patrick Steinhardt

From: Jeff King <peff@peff.net>

In reach_filter(), we pop all commits from the reachable lists, leaving
them empty. But because we're operating on a list pointer that was
passed by value, the original filter.reachable_from pointer is left
dangling.
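
This is the usual C pitfall of modifying a pointer that was passed by
value. The standalone sketch below uses a hypothetical singly linked
`struct node` in place of `struct commit_list`, and a `pop_node()`
helper shaped only loosely like `pop_commit()`. Passing the address of
the caller's head pointer lets the draining loop leave it NULL instead
of pointing at freed memory.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical stand-in for git's struct commit_list. */
struct node {
	int value;
	struct node *next;
};

/* Prepend a value; returns the new head (aborts on allocation failure). */
static struct node *push_node(struct node *list, int value)
{
	struct node *n = malloc(sizeof(*n));

	if (!n)
		abort();
	n->value = value;
	n->next = list;
	return n;
}

/* Free the head of *list, advance *list, and return the popped value. */
static int pop_node(struct node **list)
{
	struct node *top = *list;
	int value;

	assert(top);
	value = top->value;
	*list = top->next;
	free(top);
	return value;
}

/*
 * Drain the caller's list through a pointer to its head pointer, so
 * the caller's variable is NULL afterwards. Had this taken
 * "struct node *list" by value, only the local copy would advance and
 * the caller would be left holding a pointer to freed memory.
 */
static int drain_list(struct node **list)
{
	int sum = 0;

	while (*list)
		sum += pop_node(list);
	return sum;
}
```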

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Taylor Blau <me@ttaylorr.com>
---
 ref-filter.c | 14 +++++++-------
 1 file changed, 7 insertions(+), 7 deletions(-)

diff --git a/ref-filter.c b/ref-filter.c
index 4991cd4f7a..048d277cbf 100644
--- a/ref-filter.c
+++ b/ref-filter.c
@@ -2418,13 +2418,13 @@ void ref_array_clear(struct ref_array *array)
 #define EXCLUDE_REACHED 0
 #define INCLUDE_REACHED 1
 static void reach_filter(struct ref_array *array,
-			 struct commit_list *check_reachable,
+			 struct commit_list **check_reachable,
 			 int include_reached)
 {
 	int i, old_nr;
 	struct commit **to_clear;
 
-	if (!check_reachable)
+	if (!*check_reachable)
 		return;
 
 	CALLOC_ARRAY(to_clear, array->nr);
@@ -2434,7 +2434,7 @@ static void reach_filter(struct ref_array *array,
 	}
 
 	tips_reachable_from_bases(the_repository,
-				  check_reachable,
+				  *check_reachable,
 				  to_clear, array->nr,
 				  UNINTERESTING);
 
@@ -2455,8 +2455,8 @@ static void reach_filter(struct ref_array *array,
 
 	clear_commit_marks_many(old_nr, to_clear, ALL_REV_FLAGS);
 
-	while (check_reachable) {
-		struct commit *merge_commit = pop_commit(&check_reachable);
+	while (*check_reachable) {
+		struct commit *merge_commit = pop_commit(check_reachable);
 		clear_commit_marks(merge_commit, ALL_REV_FLAGS);
 	}
 
@@ -2553,8 +2553,8 @@ int filter_refs(struct ref_array *array, struct ref_filter *filter, unsigned int
 	clear_contains_cache(&ref_cbdata.no_contains_cache);
 
 	/*  Filters that need revision walking */
-	reach_filter(array, filter->reachable_from, INCLUDE_REACHED);
-	reach_filter(array, filter->unreachable_from, EXCLUDE_REACHED);
+	reach_filter(array, &filter->reachable_from, INCLUDE_REACHED);
+	reach_filter(array, &filter->unreachable_from, EXCLUDE_REACHED);
 
 	save_commit_buffer = save_commit_buffer_orig;
 	return ret;
-- 
2.41.0.16.g26cd413590


* [PATCH v3 04/16] ref-filter: add `ref_filter_clear()`
  2023-06-07 10:40 ` [PATCH v3 " Taylor Blau
                     ` (2 preceding siblings ...)
  2023-06-07 10:41   ` [PATCH v3 03/16] ref-filter: clear reachable list pointers after freeing Taylor Blau
@ 2023-06-07 10:41   ` Taylor Blau
  2023-06-07 10:41   ` [PATCH v3 05/16] ref-filter.c: parameterize match functions over patterns Taylor Blau
                     ` (12 subsequent siblings)
  16 siblings, 0 replies; 149+ messages in thread
From: Taylor Blau @ 2023-06-07 10:41 UTC (permalink / raw)
  To: git
  Cc: Chris Torek, Derrick Stolee, Jeff King, Junio C Hamano,
	Patrick Steinhardt

From: Jeff King <peff@peff.net>

We did not bother to clean up at all in `git branch` or `git tag`, and
`git for-each-ref` only cleans up a couple of members.

Add and call `ref_filter_clear()` when cleaning up a `struct
ref_filter`. Running the test suite with this patch applied (without
any test changes) shows that a couple of tests are now leak-free. This
was found by running:

    $ make SANITIZE=leak
    $ make -C t GIT_TEST_PASSING_SANITIZE_LEAK=check GIT_TEST_OPTS=--immediate

(Note that the `reachable_from` and `unreachable_from` lists should be
cleaned as they are used. So this is just covering any case where we
might bail before running the reachability check.)
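
The init/clear pairing follows a common C idiom: the clear function
releases everything the struct owns and then re-initializes it, so the
same variable can be reused, or cleared a second time, safely. A
minimal sketch of that idiom follows; `struct pattern_set` and its
functions are hypothetical and not git's actual types.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical struct that owns heap copies of its patterns. */
struct pattern_set {
	char **v;
	size_t nr;
};
#define PATTERN_SET_INIT { .v = NULL, .nr = 0 }

void pattern_set_init(struct pattern_set *set)
{
	struct pattern_set blank = PATTERN_SET_INIT;
	memcpy(set, &blank, sizeof(blank));
}

/* Append a private copy of pattern; returns 0 on success, -1 on OOM. */
int pattern_set_add(struct pattern_set *set, const char *pattern)
{
	size_t len = strlen(pattern) + 1;
	char *copy = malloc(len);
	char **grown;

	if (!copy)
		return -1;
	memcpy(copy, pattern, len);
	grown = realloc(set->v, (set->nr + 1) * sizeof(*grown));
	if (!grown) {
		free(copy);
		return -1;
	}
	grown[set->nr++] = copy;
	set->v = grown;
	return 0;
}

/*
 * Release everything the set owns, then re-initialize it so the
 * variable can be reused, or cleared again, without special cases.
 */
void pattern_set_clear(struct pattern_set *set)
{
	size_t i;

	assert(set);
	for (i = 0; i < set->nr; i++)
		free(set->v[i]);
	free(set->v);
	pattern_set_init(set);
}
```

In the patch itself, `ref_filter_clear()` likewise finishes with a
call to `ref_filter_init()`.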

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Taylor Blau <me@ttaylorr.com>
---
 builtin/branch.c        |  1 +
 builtin/for-each-ref.c  |  3 +--
 builtin/tag.c           |  1 +
 ref-filter.c            | 16 ++++++++++++++++
 ref-filter.h            |  3 +++
 t/t0041-usage.sh        |  1 +
 t/t3402-rebase-merge.sh |  1 +
 7 files changed, 24 insertions(+), 2 deletions(-)

diff --git a/builtin/branch.c b/builtin/branch.c
index 7891dec361..07ee874617 100644
--- a/builtin/branch.c
+++ b/builtin/branch.c
@@ -858,6 +858,7 @@ int cmd_branch(int argc, const char **argv, const char *prefix)
 		print_columns(&output, colopts, NULL);
 		string_list_clear(&output, 0);
 		ref_sorting_release(sorting);
+		ref_filter_clear(&filter);
 		return 0;
 	} else if (edit_description) {
 		const char *branch_name;
diff --git a/builtin/for-each-ref.c b/builtin/for-each-ref.c
index 99ccb73518..c01fa6fefe 100644
--- a/builtin/for-each-ref.c
+++ b/builtin/for-each-ref.c
@@ -120,8 +120,7 @@ int cmd_for_each_ref(int argc, const char **argv, const char *prefix)
 	strbuf_release(&err);
 	strbuf_release(&output);
 	ref_array_clear(&array);
-	free_commit_list(filter.with_commit);
-	free_commit_list(filter.no_commit);
+	ref_filter_clear(&filter);
 	ref_sorting_release(sorting);
 	strvec_clear(&vec);
 	return 0;
diff --git a/builtin/tag.c b/builtin/tag.c
index 6b41bb7374..aab5e693fe 100644
--- a/builtin/tag.c
+++ b/builtin/tag.c
@@ -645,6 +645,7 @@ int cmd_tag(int argc, const char **argv, const char *prefix)
 
 cleanup:
 	ref_sorting_release(sorting);
+	ref_filter_clear(&filter);
 	strbuf_release(&buf);
 	strbuf_release(&ref);
 	strbuf_release(&reflog_msg);
diff --git a/ref-filter.c b/ref-filter.c
index 048d277cbf..d32f426898 100644
--- a/ref-filter.c
+++ b/ref-filter.c
@@ -2866,3 +2866,19 @@ int parse_opt_merge_filter(const struct option *opt, const char *arg, int unset)
 
 	return 0;
 }
+
+void ref_filter_init(struct ref_filter *filter)
+{
+	struct ref_filter blank = REF_FILTER_INIT;
+	memcpy(filter, &blank, sizeof(blank));
+}
+
+void ref_filter_clear(struct ref_filter *filter)
+{
+	oid_array_clear(&filter->points_at);
+	free_commit_list(filter->with_commit);
+	free_commit_list(filter->no_commit);
+	free_commit_list(filter->reachable_from);
+	free_commit_list(filter->unreachable_from);
+	ref_filter_init(filter);
+}
diff --git a/ref-filter.h b/ref-filter.h
index a920f73b29..160b807224 100644
--- a/ref-filter.h
+++ b/ref-filter.h
@@ -170,4 +170,7 @@ void filter_ahead_behind(struct repository *r,
 			 struct ref_format *format,
 			 struct ref_array *array);
 
+void ref_filter_init(struct ref_filter *filter);
+void ref_filter_clear(struct ref_filter *filter);
+
 #endif /*  REF_FILTER_H  */
diff --git a/t/t0041-usage.sh b/t/t0041-usage.sh
index c4fc34eb18..9ea974b0c6 100755
--- a/t/t0041-usage.sh
+++ b/t/t0041-usage.sh
@@ -5,6 +5,7 @@ test_description='Test commands behavior when given invalid argument value'
 GIT_TEST_DEFAULT_INITIAL_BRANCH_NAME=main
 export GIT_TEST_DEFAULT_INITIAL_BRANCH_NAME
 
+TEST_PASSES_SANITIZE_LEAK=true
 . ./test-lib.sh
 
 test_expect_success 'setup ' '
diff --git a/t/t3402-rebase-merge.sh b/t/t3402-rebase-merge.sh
index 79b0640c00..e9e03ca4b5 100755
--- a/t/t3402-rebase-merge.sh
+++ b/t/t3402-rebase-merge.sh
@@ -8,6 +8,7 @@ test_description='git rebase --merge test'
 GIT_TEST_DEFAULT_INITIAL_BRANCH_NAME=main
 export GIT_TEST_DEFAULT_INITIAL_BRANCH_NAME
 
+TEST_PASSES_SANITIZE_LEAK=true
 . ./test-lib.sh
 
 T="A quick brown fox
-- 
2.41.0.16.g26cd413590


* [PATCH v3 05/16] ref-filter.c: parameterize match functions over patterns
  2023-06-07 10:40 ` [PATCH v3 " Taylor Blau
                     ` (3 preceding siblings ...)
  2023-06-07 10:41   ` [PATCH v3 04/16] ref-filter: add `ref_filter_clear()` Taylor Blau
@ 2023-06-07 10:41   ` Taylor Blau
  2023-06-13 22:37     ` Junio C Hamano
  2023-06-07 10:41   ` [PATCH v3 06/16] builtin/for-each-ref.c: add `--exclude` option Taylor Blau
                     ` (11 subsequent siblings)
  16 siblings, 1 reply; 149+ messages in thread
From: Taylor Blau @ 2023-06-07 10:41 UTC (permalink / raw)
  To: git
  Cc: Chris Torek, Derrick Stolee, Jeff King, Junio C Hamano,
	Patrick Steinhardt

From: Jeff King <peff@peff.net>

`match_pattern()` and `match_name_as_path()` both take a `struct
ref_filter *`, and then store a stack variable `patterns` pointing at
`filter->patterns`.

The subsequent patch will add a new array of patterns to match over (the
excluded patterns, via a new `git for-each-ref --exclude` option),
treating the return value of these functions differently depending on
which patterns are being used to match.

Tweak `match_pattern()` and `match_name_as_path()` to take an array of
patterns to prepare for passing either in.

Once we start passing either in, `match_pattern()` will have little to
do with a particular `struct ref_filter *` instance. To clarify this,
drop it from the argument list, and replace it with the only bit of the
`ref_filter` that we care about (`filter->ignore_case`).
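
Once the pattern list and the case-folding flag are explicit
parameters, the same matcher can be pointed at either the include list
or an exclude list. A simplified sketch of that shape follows.
`glob_match()` is a hypothetical, minimal stand-in for git's
`wildmatch()`: it understands only `*` and `?`, ignores the finer
points of git's matching rules, and (unlike `wildmatch()`) returns
non-zero on a match.

```c
#include <assert.h>
#include <ctype.h>

static int fold_char(int c, int ignore_case)
{
	return ignore_case ? tolower((unsigned char)c) : c;
}

/*
 * Minimal recursive glob (hypothetical stand-in for wildmatch()):
 * '*' matches any run of characters, including '/'; '?' matches one.
 * Returns 1 on a match and 0 otherwise.
 */
static int glob_match(const char *pat, const char *str, int ignore_case)
{
	if (!*pat)
		return !*str;
	if (*pat == '*')
		return glob_match(pat + 1, str, ignore_case) ||
		       (*str && glob_match(pat, str + 1, ignore_case));
	if (!*str)
		return 0;
	if (*pat == '?' ||
	    fold_char(*pat, ignore_case) == fold_char(*str, ignore_case))
		return glob_match(pat + 1, str + 1, ignore_case);
	return 0;
}

/*
 * Same shape as the reworked match_pattern(): a NULL-terminated
 * pattern array plus an ignore_case flag, with no struct ref_filter.
 */
static int match_any_pattern(const char **patterns, const char *refname,
			     int ignore_case)
{
	assert(refname);
	for (; patterns && *patterns; patterns++)
		if (glob_match(*patterns, refname, ignore_case))
			return 1;
	return 0;
}
```

Reusing the matcher for exclusions is then just a second call with a
different array, e.g. `match_any_pattern(exclude, refname, ignore_case)`.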

Co-authored-by: Taylor Blau <me@ttaylorr.com>
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Taylor Blau <me@ttaylorr.com>
---
 ref-filter.c | 16 +++++++++-------
 1 file changed, 9 insertions(+), 7 deletions(-)

diff --git a/ref-filter.c b/ref-filter.c
index d32f426898..6d91c7cb0d 100644
--- a/ref-filter.c
+++ b/ref-filter.c
@@ -2104,12 +2104,12 @@ static int get_ref_atom_value(struct ref_array_item *ref, int atom,
  * matches a pattern "refs/heads/mas") or a wildcard (e.g. the same ref
  * matches "refs/heads/mas*", too).
  */
-static int match_pattern(const struct ref_filter *filter, const char *refname)
+static int match_pattern(const char **patterns, const char *refname,
+			 const int ignore_case)
 {
-	const char **patterns = filter->name_patterns;
 	unsigned flags = 0;
 
-	if (filter->ignore_case)
+	if (ignore_case)
 		flags |= WM_CASEFOLD;
 
 	/*
@@ -2134,9 +2134,10 @@ static int match_pattern(const struct ref_filter *filter, const char *refname)
  * matches a pattern "refs/heads/" but not "refs/heads/m") or a
  * wildcard (e.g. the same ref matches "refs/heads/m*", too).
  */
-static int match_name_as_path(const struct ref_filter *filter, const char *refname)
+static int match_name_as_path(const struct ref_filter *filter,
+			      const char **pattern,
+			      const char *refname)
 {
-	const char **pattern = filter->name_patterns;
 	int namelen = strlen(refname);
 	unsigned flags = WM_PATHNAME;
 
@@ -2165,8 +2166,9 @@ static int filter_pattern_match(struct ref_filter *filter, const char *refname)
 	if (!*filter->name_patterns)
 		return 1; /* No pattern always matches */
 	if (filter->match_as_path)
-		return match_name_as_path(filter, refname);
-	return match_pattern(filter, refname);
+		return match_name_as_path(filter, filter->name_patterns, refname);
+	return match_pattern(filter->name_patterns, refname,
+			     filter->ignore_case);
 }
 
 /*
-- 
2.41.0.16.g26cd413590


* [PATCH v3 06/16] builtin/for-each-ref.c: add `--exclude` option
  2023-06-07 10:40 ` [PATCH v3 " Taylor Blau
                     ` (4 preceding siblings ...)
  2023-06-07 10:41   ` [PATCH v3 05/16] ref-filter.c: parameterize match functions over patterns Taylor Blau
@ 2023-06-07 10:41   ` Taylor Blau
  2023-06-07 10:41   ` [PATCH v3 07/16] refs: plumb `exclude_patterns` argument throughout Taylor Blau
                     ` (10 subsequent siblings)
  16 siblings, 0 replies; 149+ messages in thread
From: Taylor Blau @ 2023-06-07 10:41 UTC (permalink / raw)
  To: git
  Cc: Chris Torek, Derrick Stolee, Jeff King, Junio C Hamano,
	Patrick Steinhardt

When using `for-each-ref`, it is sometimes convenient for the caller to
be able to exclude certain references from the output.

For example, if there are many `refs/__hidden__/*` references, the
caller may want to emit all references *except* the hidden ones.
Currently, the only way to do this is to post-process the output, like:

    $ git for-each-ref --format='%(refname)' | grep -v '^refs/__hidden__/'

This is doable, but it requires processing a potentially large quantity
of references.

Teach `git for-each-ref` a new `--exclude=<pattern>` option, which
excludes references from the results if they match one or more excluded
patterns.

This patch provides a naive implementation where the `ref_filter` still
sees all references (including ones that it will discard) and is left to
check whether each reference matches any excluded pattern(s) before
emitting them.

By culling out references we know the caller doesn't care about, we can
avoid allocating memory for their storage, as well as spending time
sorting the output (among other things). Even the naive implementation
provides a significant speed-up on a modified copy of linux.git (that
has a hidden ref pointing at each commit):

    $ hyperfine \
      'git.compile for-each-ref --format="%(objectname) %(refname)" | grep -vE "[0-9a-f]{40} refs/pull/"' \
      'git.compile for-each-ref --format="%(objectname) %(refname)" --exclude refs/pull/'
    Benchmark 1: git.compile for-each-ref --format="%(objectname) %(refname)" | grep -vE "[0-9a-f]{40} refs/pull/"
      Time (mean ± σ):     820.1 ms ±   2.0 ms    [User: 703.7 ms, System: 152.0 ms]
      Range (min … max):   817.7 ms … 823.3 ms    10 runs

    Benchmark 2: git.compile for-each-ref --format="%(objectname) %(refname)" --exclude refs/pull/
      Time (mean ± σ):     106.6 ms ±   1.1 ms    [User: 99.4 ms, System: 7.1 ms]
      Range (min … max):   104.7 ms … 109.1 ms    27 runs

    Summary
      'git.compile for-each-ref --format="%(objectname) %(refname)" --exclude refs/pull/' ran
        7.69 ± 0.08 times faster than 'git.compile for-each-ref --format="%(objectname) %(refname)" | grep -vE "[0-9a-f]{40} refs/pull/"'

Subsequent patches will improve on this by avoiding visiting excluded
sections of the `packed-refs` file in certain cases.
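
The naive approach described above can be sketched as a standalone
filter in which every reference is still visited and excluded ones are
merely dropped before being kept. This is only an illustration:
`matches_as_path()`, `is_excluded()`, and `filter_excluded()` are
hypothetical helpers, and the path-prefix rule (a literal pattern
matches the ref itself or anything below it on a `/` boundary) is
modeled loosely on the non-glob case of `match_name_as_path()`.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * "refs/tags/foo" matches "refs/tags/foo" and "refs/tags/foo/one",
 * but not "refs/tags/foobar". A trailing '/' matches any suffix.
 */
static int matches_as_path(const char *pattern, const char *refname)
{
	size_t len = strlen(pattern);

	if (strncmp(pattern, refname, len))
		return 0;
	return (len && pattern[len - 1] == '/') ||
	       refname[len] == '\0' || refname[len] == '/';
}

static int is_excluded(const char **exclude, const char *refname)
{
	for (; exclude && *exclude; exclude++)
		if (matches_as_path(*exclude, refname))
			return 1;
	return 0;
}

/*
 * Naive filtering: each ref is still examined, and excluded ones are
 * dropped before being stored. Compacts refs[] in place and returns
 * the number of refs kept.
 */
static size_t filter_excluded(const char **refs, size_t nr,
			      const char **exclude)
{
	size_t i, kept = 0;

	assert(refs || !nr);
	for (i = 0; i < nr; i++)
		if (!is_excluded(exclude, refs[i]))
			refs[kept++] = refs[i];
	return kept;
}
```

Because every excluded ref is still visited before being discarded,
the work remains proportional to the total number of refs, which is
what the subsequent packed-refs patches aim to avoid.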

Co-authored-by: Jeff King <peff@peff.net>
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Taylor Blau <me@ttaylorr.com>
---
 Documentation/git-for-each-ref.txt |  6 +++++
 builtin/for-each-ref.c             |  1 +
 ref-filter.c                       | 13 +++++++++++
 ref-filter.h                       |  6 +++++
 t/t6300-for-each-ref.sh            | 35 ++++++++++++++++++++++++++++++
 5 files changed, 61 insertions(+)

diff --git a/Documentation/git-for-each-ref.txt b/Documentation/git-for-each-ref.txt
index 1e215d4e73..5743eb5def 100644
--- a/Documentation/git-for-each-ref.txt
+++ b/Documentation/git-for-each-ref.txt
@@ -14,6 +14,7 @@ SYNOPSIS
 		   [--points-at=<object>]
 		   [--merged[=<object>]] [--no-merged[=<object>]]
 		   [--contains[=<object>]] [--no-contains[=<object>]]
+		   [--exclude=<pattern> ...]
 
 DESCRIPTION
 -----------
@@ -102,6 +103,11 @@ OPTIONS
 	Do not print a newline after formatted refs where the format expands
 	to the empty string.
 
+--exclude=<pattern>::
+	If one or more patterns are given, only refs which do not match
+	any excluded pattern(s) are shown. Matching is done using the
+	same rules as `<pattern>` above.
+
 FIELD NAMES
 -----------
 
diff --git a/builtin/for-each-ref.c b/builtin/for-each-ref.c
index c01fa6fefe..3384987428 100644
--- a/builtin/for-each-ref.c
+++ b/builtin/for-each-ref.c
@@ -47,6 +47,7 @@ int cmd_for_each_ref(int argc, const char **argv, const char *prefix)
 		OPT_INTEGER( 0 , "count", &maxcount, N_("show only <n> matched refs")),
 		OPT_STRING(  0 , "format", &format.format, N_("format"), N_("format to use for the output")),
 		OPT__COLOR(&format.use_color, N_("respect format colors")),
+		OPT_REF_FILTER_EXCLUDE(&filter),
 		OPT_REF_SORT(&sorting_options),
 		OPT_CALLBACK(0, "points-at", &filter.points_at,
 			     N_("object"), N_("print only refs which points at the given object"),
diff --git a/ref-filter.c b/ref-filter.c
index 6d91c7cb0d..d44418efb7 100644
--- a/ref-filter.c
+++ b/ref-filter.c
@@ -2171,6 +2171,15 @@ static int filter_pattern_match(struct ref_filter *filter, const char *refname)
 			     filter->ignore_case);
 }
 
+static int filter_exclude_match(struct ref_filter *filter, const char *refname)
+{
+	if (!filter->exclude.nr)
+		return 0;
+	if (filter->match_as_path)
+		return match_name_as_path(filter, filter->exclude.v, refname);
+	return match_pattern(filter->exclude.v, refname, filter->ignore_case);
+}
+
 /*
  * This is the same as for_each_fullref_in(), but it tries to iterate
  * only over the patterns we'll care about. Note that it _doesn't_ do a full
@@ -2338,6 +2347,9 @@ static int ref_filter_handler(const char *refname, const struct object_id *oid,
 	if (!filter_pattern_match(filter, refname))
 		return 0;
 
+	if (filter_exclude_match(filter, refname))
+		return 0;
+
 	if (filter->points_at.nr && !match_points_at(&filter->points_at, oid, refname))
 		return 0;
 
@@ -2877,6 +2889,7 @@ void ref_filter_init(struct ref_filter *filter)
 
 void ref_filter_clear(struct ref_filter *filter)
 {
+	strvec_clear(&filter->exclude);
 	oid_array_clear(&filter->points_at);
 	free_commit_list(filter->with_commit);
 	free_commit_list(filter->no_commit);
diff --git a/ref-filter.h b/ref-filter.h
index 160b807224..1524bc463a 100644
--- a/ref-filter.h
+++ b/ref-filter.h
@@ -6,6 +6,7 @@
 #include "refs.h"
 #include "commit.h"
 #include "string-list.h"
+#include "strvec.h"
 
 /* Quoting styles */
 #define QUOTE_NONE 0
@@ -59,6 +60,7 @@ struct ref_array {
 
 struct ref_filter {
 	const char **name_patterns;
+	struct strvec exclude;
 	struct oid_array points_at;
 	struct commit_list *with_commit;
 	struct commit_list *no_commit;
@@ -94,6 +96,7 @@ struct ref_format {
 
 #define REF_FILTER_INIT { \
 	.points_at = OID_ARRAY_INIT, \
+	.exclude = STRVEC_INIT, \
 }
 #define REF_FORMAT_INIT {             \
 	.use_color = -1,              \
@@ -112,6 +115,9 @@ struct ref_format {
 #define OPT_REF_SORT(var) \
 	OPT_STRING_LIST(0, "sort", (var), \
 			N_("key"), N_("field name to sort on"))
+#define OPT_REF_FILTER_EXCLUDE(var) \
+	OPT_STRVEC(0, "exclude", &(var)->exclude, \
+		   N_("pattern"), N_("exclude refs which match pattern"))
 
 /*
  * API for filtering a set of refs. Based on the type of refs the user
diff --git a/t/t6300-for-each-ref.sh b/t/t6300-for-each-ref.sh
index 5c00607608..7e8d578522 100755
--- a/t/t6300-for-each-ref.sh
+++ b/t/t6300-for-each-ref.sh
@@ -447,6 +447,41 @@ test_expect_success 'exercise glob patterns with prefixes' '
 	test_cmp expected actual
 '
 
+cat >expected <<\EOF
+refs/tags/bar
+refs/tags/baz
+refs/tags/testtag
+EOF
+
+test_expect_success 'exercise patterns with prefix exclusions' '
+	for tag in foo/one foo/two foo/three bar baz
+	do
+		git tag "$tag" || return 1
+	done &&
+	test_when_finished "git tag -d foo/one foo/two foo/three bar baz" &&
+	git for-each-ref --format="%(refname)" \
+		refs/tags/ --exclude=refs/tags/foo >actual &&
+	test_cmp expected actual
+'
+
+cat >expected <<\EOF
+refs/tags/bar
+refs/tags/baz
+refs/tags/foo/one
+refs/tags/testtag
+EOF
+
+test_expect_success 'exercise patterns with pattern exclusions' '
+	for tag in foo/one foo/two foo/three bar baz
+	do
+		git tag "$tag" || return 1
+	done &&
+	test_when_finished "git tag -d foo/one foo/two foo/three bar baz" &&
+	git for-each-ref --format="%(refname)" \
+		refs/tags/ --exclude="refs/tags/foo/t*" >actual &&
+	test_cmp expected actual
+'
+
 cat >expected <<\EOF
 'refs/heads/main'
 'refs/remotes/origin/main'
-- 
2.41.0.16.g26cd413590


* [PATCH v3 07/16] refs: plumb `exclude_patterns` argument throughout
  2023-06-07 10:40 ` [PATCH v3 " Taylor Blau
                     ` (5 preceding siblings ...)
  2023-06-07 10:41   ` [PATCH v3 06/16] builtin/for-each-ref.c: add `--exclude` option Taylor Blau
@ 2023-06-07 10:41   ` Taylor Blau
  2023-06-13 23:42     ` Junio C Hamano
  2023-06-07 10:41   ` [PATCH v3 08/16] refs/packed-backend.c: refactor `find_reference_location()` Taylor Blau
                     ` (9 subsequent siblings)
  16 siblings, 1 reply; 149+ messages in thread
From: Taylor Blau @ 2023-06-07 10:41 UTC (permalink / raw)
  To: git
  Cc: Chris Torek, Derrick Stolee, Jeff King, Junio C Hamano,
	Patrick Steinhardt

The subsequent patch will want to access an optional `exclude_patterns`
array within `refs/packed-backend.c`, which it will use to cull out, on a
best-effort basis, references matching any of the given patterns.

To do so, the refs subsystem needs to be updated to pass this value
across a number of different locations.

Prepare for a future patch by introducing this plumbing now, passing
NULLs at top-level APIs in order to make that patch less noisy and more
easily readable.

Signed-off-by: Taylor Blau <me@ttaylorr.co>
---
 ls-refs.c             |  2 +-
 ref-filter.c          |  5 +++--
 refs.c                | 32 +++++++++++++++++++-------------
 refs.h                |  8 +++++++-
 refs/debug.c          |  5 +++--
 refs/files-backend.c  |  5 +++--
 refs/packed-backend.c |  5 +++--
 refs/refs-internal.h  |  7 ++++---
 revision.c            |  2 +-
 9 files changed, 44 insertions(+), 27 deletions(-)

diff --git a/ls-refs.c b/ls-refs.c
index f385938b64..6f490b2d9c 100644
--- a/ls-refs.c
+++ b/ls-refs.c
@@ -193,7 +193,7 @@ int ls_refs(struct repository *r, struct packet_reader *request)
 		strvec_push(&data.prefixes, "");
 	refs_for_each_fullref_in_prefixes(get_main_ref_store(r),
 					  get_git_namespace(), data.prefixes.v,
-					  send_ref, &data);
+					  NULL, send_ref, &data);
 	packet_fflush(stdout);
 	strvec_clear(&data.prefixes);
 	strbuf_release(&data.buf);
diff --git a/ref-filter.c b/ref-filter.c
index d44418efb7..717c3c4bcf 100644
--- a/ref-filter.c
+++ b/ref-filter.c
@@ -2209,12 +2209,13 @@ static int for_each_fullref_in_pattern(struct ref_filter *filter,
 
 	if (!filter->name_patterns[0]) {
 		/* no patterns; we have to look at everything */
-		return for_each_fullref_in("", cb, cb_data);
+		return refs_for_each_fullref_in(get_main_ref_store(the_repository),
+						 "", NULL, cb, cb_data);
 	}
 
 	return refs_for_each_fullref_in_prefixes(get_main_ref_store(the_repository),
 						 NULL, filter->name_patterns,
-						 cb, cb_data);
+						 NULL, cb, cb_data);
 }
 
 /*
diff --git a/refs.c b/refs.c
index b9b77d2eff..538bde644e 100644
--- a/refs.c
+++ b/refs.c
@@ -1526,7 +1526,9 @@ int head_ref(each_ref_fn fn, void *cb_data)
 
 struct ref_iterator *refs_ref_iterator_begin(
 		struct ref_store *refs,
-		const char *prefix, int trim,
+		const char *prefix,
+		const char **exclude_patterns,
+		int trim,
 		enum do_for_each_ref_flags flags)
 {
 	struct ref_iterator *iter;
@@ -1542,8 +1544,7 @@ struct ref_iterator *refs_ref_iterator_begin(
 		}
 	}
 
-	iter = refs->be->iterator_begin(refs, prefix, flags);
-
+	iter = refs->be->iterator_begin(refs, prefix, exclude_patterns, flags);
 	/*
 	 * `iterator_begin()` already takes care of prefix, but we
 	 * might need to do some trimming:
@@ -1577,7 +1578,7 @@ static int do_for_each_repo_ref(struct repository *r, const char *prefix,
 	if (!refs)
 		return 0;
 
-	iter = refs_ref_iterator_begin(refs, prefix, trim, flags);
+	iter = refs_ref_iterator_begin(refs, prefix, NULL, trim, flags);
 
 	return do_for_each_repo_ref_iterator(r, iter, fn, cb_data);
 }
@@ -1599,6 +1600,7 @@ static int do_for_each_ref_helper(struct repository *r,
 }
 
 static int do_for_each_ref(struct ref_store *refs, const char *prefix,
+			   const char **exclude_patterns,
 			   each_ref_fn fn, int trim,
 			   enum do_for_each_ref_flags flags, void *cb_data)
 {
@@ -1608,7 +1610,8 @@ static int do_for_each_ref(struct ref_store *refs, const char *prefix,
 	if (!refs)
 		return 0;
 
-	iter = refs_ref_iterator_begin(refs, prefix, trim, flags);
+	iter = refs_ref_iterator_begin(refs, prefix, exclude_patterns, trim,
+				       flags);
 
 	return do_for_each_repo_ref_iterator(the_repository, iter,
 					do_for_each_ref_helper, &hp);
@@ -1616,7 +1619,7 @@ static int do_for_each_ref(struct ref_store *refs, const char *prefix,
 
 int refs_for_each_ref(struct ref_store *refs, each_ref_fn fn, void *cb_data)
 {
-	return do_for_each_ref(refs, "", fn, 0, 0, cb_data);
+	return do_for_each_ref(refs, "", NULL, fn, 0, 0, cb_data);
 }
 
 int for_each_ref(each_ref_fn fn, void *cb_data)
@@ -1627,7 +1630,7 @@ int for_each_ref(each_ref_fn fn, void *cb_data)
 int refs_for_each_ref_in(struct ref_store *refs, const char *prefix,
 			 each_ref_fn fn, void *cb_data)
 {
-	return do_for_each_ref(refs, prefix, fn, strlen(prefix), 0, cb_data);
+	return do_for_each_ref(refs, prefix, NULL, fn, strlen(prefix), 0, cb_data);
 }
 
 int for_each_ref_in(const char *prefix, each_ref_fn fn, void *cb_data)
@@ -1638,13 +1641,14 @@ int for_each_ref_in(const char *prefix, each_ref_fn fn, void *cb_data)
 int for_each_fullref_in(const char *prefix, each_ref_fn fn, void *cb_data)
 {
 	return do_for_each_ref(get_main_ref_store(the_repository),
-			       prefix, fn, 0, 0, cb_data);
+			       prefix, NULL, fn, 0, 0, cb_data);
 }
 
 int refs_for_each_fullref_in(struct ref_store *refs, const char *prefix,
+			     const char **exclude_patterns,
 			     each_ref_fn fn, void *cb_data)
 {
-	return do_for_each_ref(refs, prefix, fn, 0, 0, cb_data);
+	return do_for_each_ref(refs, prefix, exclude_patterns, fn, 0, 0, cb_data);
 }
 
 int for_each_replace_ref(struct repository *r, each_repo_ref_fn fn, void *cb_data)
@@ -1661,14 +1665,14 @@ int for_each_namespaced_ref(each_ref_fn fn, void *cb_data)
 	int ret;
 	strbuf_addf(&buf, "%srefs/", get_git_namespace());
 	ret = do_for_each_ref(get_main_ref_store(the_repository),
-			      buf.buf, fn, 0, 0, cb_data);
+			      buf.buf, NULL, fn, 0, 0, cb_data);
 	strbuf_release(&buf);
 	return ret;
 }
 
 int refs_for_each_rawref(struct ref_store *refs, each_ref_fn fn, void *cb_data)
 {
-	return do_for_each_ref(refs, "", fn, 0,
+	return do_for_each_ref(refs, "", NULL, fn, 0,
 			       DO_FOR_EACH_INCLUDE_BROKEN, cb_data);
 }
 
@@ -1738,6 +1742,7 @@ static void find_longest_prefixes(struct string_list *out,
 int refs_for_each_fullref_in_prefixes(struct ref_store *ref_store,
 				      const char *namespace,
 				      const char **patterns,
+				      const char **exclude_patterns,
 				      each_ref_fn fn, void *cb_data)
 {
 	struct string_list prefixes = STRING_LIST_INIT_DUP;
@@ -1753,7 +1758,8 @@ int refs_for_each_fullref_in_prefixes(struct ref_store *ref_store,
 
 	for_each_string_list_item(prefix, &prefixes) {
 		strbuf_addstr(&buf, prefix->string);
-		ret = refs_for_each_fullref_in(ref_store, buf.buf, fn, cb_data);
+		ret = refs_for_each_fullref_in(ref_store, buf.buf,
+					       exclude_patterns, fn, cb_data);
 		if (ret)
 			break;
 		strbuf_setlen(&buf, namespace_len);
@@ -2408,7 +2414,7 @@ int refs_verify_refname_available(struct ref_store *refs,
 	strbuf_addstr(&dirname, refname + dirname.len);
 	strbuf_addch(&dirname, '/');
 
-	iter = refs_ref_iterator_begin(refs, dirname.buf, 0,
+	iter = refs_ref_iterator_begin(refs, dirname.buf, NULL, 0,
 				       DO_FOR_EACH_INCLUDE_BROKEN);
 	while ((ok = ref_iterator_advance(iter)) == ITER_OK) {
 		if (skip &&
diff --git a/refs.h b/refs.h
index 123cfa4424..d672d636cf 100644
--- a/refs.h
+++ b/refs.h
@@ -338,6 +338,7 @@ int for_each_ref(each_ref_fn fn, void *cb_data);
 int for_each_ref_in(const char *prefix, each_ref_fn fn, void *cb_data);
 
 int refs_for_each_fullref_in(struct ref_store *refs, const char *prefix,
+			     const char **exclude_patterns,
 			     each_ref_fn fn, void *cb_data);
 int for_each_fullref_in(const char *prefix, each_ref_fn fn, void *cb_data);
 
@@ -345,10 +346,15 @@ int for_each_fullref_in(const char *prefix, each_ref_fn fn, void *cb_data);
  * iterate all refs in "patterns" by partitioning patterns into disjoint sets
  * and iterating the longest-common prefix of each set.
  *
+ * references matching any pattern in "exclude_patterns" are omitted from the
+ * result set on a best-effort basis.
+ *
  * callers should be prepared to ignore references that they did not ask for.
  */
 int refs_for_each_fullref_in_prefixes(struct ref_store *refs,
-				      const char *namespace, const char **patterns,
+				      const char *namespace,
+				      const char **patterns,
+				      const char **exclude_patterns,
 				      each_ref_fn fn, void *cb_data);
 
 /**
diff --git a/refs/debug.c b/refs/debug.c
index 6f11e6de46..328f894177 100644
--- a/refs/debug.c
+++ b/refs/debug.c
@@ -229,11 +229,12 @@ static struct ref_iterator_vtable debug_ref_iterator_vtable = {
 
 static struct ref_iterator *
 debug_ref_iterator_begin(struct ref_store *ref_store, const char *prefix,
-			 unsigned int flags)
+			 const char **exclude_patterns, unsigned int flags)
 {
 	struct debug_ref_store *drefs = (struct debug_ref_store *)ref_store;
 	struct ref_iterator *res =
-		drefs->refs->be->iterator_begin(drefs->refs, prefix, flags);
+		drefs->refs->be->iterator_begin(drefs->refs, prefix,
+						exclude_patterns, flags);
 	struct debug_ref_iterator *diter = xcalloc(1, sizeof(*diter));
 	base_ref_iterator_init(&diter->base, &debug_ref_iterator_vtable, 1);
 	diter->iter = res;
diff --git a/refs/files-backend.c b/refs/files-backend.c
index bca7b851c5..3bc3c57c05 100644
--- a/refs/files-backend.c
+++ b/refs/files-backend.c
@@ -829,7 +829,8 @@ static struct ref_iterator_vtable files_ref_iterator_vtable = {
 
 static struct ref_iterator *files_ref_iterator_begin(
 		struct ref_store *ref_store,
-		const char *prefix, unsigned int flags)
+		const char *prefix, const char **exclude_patterns,
+		unsigned int flags)
 {
 	struct files_ref_store *refs;
 	struct ref_iterator *loose_iter, *packed_iter, *overlay_iter;
@@ -874,7 +875,7 @@ static struct ref_iterator *files_ref_iterator_begin(
 	 * the packed and loose references.
 	 */
 	packed_iter = refs_ref_iterator_begin(
-			refs->packed_ref_store, prefix, 0,
+			refs->packed_ref_store, prefix, exclude_patterns, 0,
 			DO_FOR_EACH_INCLUDE_BROKEN);
 
 	overlay_iter = overlay_ref_iterator_begin(loose_iter, packed_iter);
diff --git a/refs/packed-backend.c b/refs/packed-backend.c
index 5b412a133b..176bd3905b 100644
--- a/refs/packed-backend.c
+++ b/refs/packed-backend.c
@@ -924,7 +924,8 @@ static struct ref_iterator_vtable packed_ref_iterator_vtable = {
 
 static struct ref_iterator *packed_ref_iterator_begin(
 		struct ref_store *ref_store,
-		const char *prefix, unsigned int flags)
+		const char *prefix, const char **exclude_patterns,
+		unsigned int flags)
 {
 	struct packed_ref_store *refs;
 	struct snapshot *snapshot;
@@ -1149,7 +1150,7 @@ static int write_with_updates(struct packed_ref_store *refs,
 	 * list of refs is exhausted, set iter to NULL. When the list
 	 * of updates is exhausted, leave i set to updates->nr.
 	 */
-	iter = packed_ref_iterator_begin(&refs->base, "",
+	iter = packed_ref_iterator_begin(&refs->base, "", NULL,
 					 DO_FOR_EACH_INCLUDE_BROKEN);
 	if ((ok = ref_iterator_advance(iter)) != ITER_OK)
 		iter = NULL;
diff --git a/refs/refs-internal.h b/refs/refs-internal.h
index a85d113123..28a11b9d61 100644
--- a/refs/refs-internal.h
+++ b/refs/refs-internal.h
@@ -367,8 +367,8 @@ int is_empty_ref_iterator(struct ref_iterator *ref_iterator);
  */
 struct ref_iterator *refs_ref_iterator_begin(
 		struct ref_store *refs,
-		const char *prefix, int trim,
-		enum do_for_each_ref_flags flags);
+		const char *prefix, const char **exclude_patterns,
+		int trim, enum do_for_each_ref_flags flags);
 
 /*
  * A callback function used to instruct merge_ref_iterator how to
@@ -570,7 +570,8 @@ typedef int copy_ref_fn(struct ref_store *ref_store,
  */
 typedef struct ref_iterator *ref_iterator_begin_fn(
 		struct ref_store *ref_store,
-		const char *prefix, unsigned int flags);
+		const char *prefix, const char **exclude_patterns,
+		unsigned int flags);
 
 /* reflog functions */
 
diff --git a/revision.c b/revision.c
index b33cc1d106..89953592f9 100644
--- a/revision.c
+++ b/revision.c
@@ -2670,7 +2670,7 @@ static int for_each_bisect_ref(struct ref_store *refs, each_ref_fn fn,
 	struct strbuf bisect_refs = STRBUF_INIT;
 	int status;
 	strbuf_addf(&bisect_refs, "refs/bisect/%s", term);
-	status = refs_for_each_fullref_in(refs, bisect_refs.buf, fn, cb_data);
+	status = refs_for_each_fullref_in(refs, bisect_refs.buf, NULL, fn, cb_data);
 	strbuf_release(&bisect_refs);
 	return status;
 }
-- 
2.41.0.16.g26cd413590


* [PATCH v3 08/16] refs/packed-backend.c: refactor `find_reference_location()`
  2023-06-07 10:40 ` [PATCH v3 " Taylor Blau
                     ` (6 preceding siblings ...)
  2023-06-07 10:41   ` [PATCH v3 07/16] refs: plumb `exclude_patterns` argument throughout Taylor Blau
@ 2023-06-07 10:41   ` Taylor Blau
  2023-06-07 10:41   ` [PATCH v3 09/16] refs/packed-backend.c: implement jump lists to avoid excluded pattern(s) Taylor Blau
                     ` (8 subsequent siblings)
  16 siblings, 0 replies; 149+ messages in thread
From: Taylor Blau @ 2023-06-07 10:41 UTC (permalink / raw)
  To: git
  Cc: Chris Torek, Derrick Stolee, Jeff King, Junio C Hamano,
	Patrick Steinhardt

The function `find_reference_location()` performs a binary-search-like
lookup over the contents of a repository's `$GIT_DIR/packed-refs`
file.

The search it implements is unlike a standard binary search in that the
records it searches over are not of a fixed width, so the comparison
must locate the end of a record before comparing it.
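
For illustration only, here is a minimal standalone sketch (not Git's
code) of such a search. It assumes a simplified format in which each
record is just a refname followed by '\n', instead of the
"<oid> <refname>" lines of a real `packed-refs` file, and all helper
names are hypothetical. After picking a midpoint, it backs up to the
start of the enclosing record before comparing, and on a "less than"
result it resumes from the end of that record:

```c
#include <assert.h>
#include <string.h>

/* Back up from `p` to the first byte of the record that contains it. */
static const char *find_start_of_line(const char *lo, const char *p)
{
	while (p > lo && p[-1] != '\n')
		p--;
	return p;
}

/* Return the first byte after the '\n' that ends the record containing `p`. */
static const char *find_end_of_line(const char *p, const char *hi)
{
	const char *nl = memchr(p, '\n', (size_t)(hi - p));
	return nl ? nl + 1 : hi;
}

/* Three-way comparison of a '\n'-terminated record with a C string. */
static int cmp_record(const char *rec, const char *name)
{
	for (;; rec++, name++) {
		if (*rec == '\n')
			return *name ? -1 : 0;
		if (!*name)
			return 1;
		if (*rec != *name)
			return (unsigned char)*rec < (unsigned char)*name ? -1 : 1;
	}
}

/*
 * Return the start of the first record in [buf, eof) that sorts at or
 * after `name`, or `eof` if there is none. Records must be sorted and
 * each must end with '\n'. `lo` and `hi` always sit on record boundaries.
 */
static const char *lower_bound_record(const char *buf, const char *eof,
				      const char *name)
{
	const char *lo = buf, *hi = eof;

	while (lo < hi) {
		const char *mid = lo + (hi - lo) / 2;
		/* `mid` may land anywhere inside a record; find its start. */
		const char *rec = find_start_of_line(lo, mid);

		assert(rec >= lo && rec < hi);
		if (cmp_record(rec, name) < 0)
			lo = find_end_of_line(mid, hi); /* skip the whole record */
		else
			hi = rec;
	}
	return lo;
}
```

Because `lo` and `hi` always sit on record boundaries and every step
strictly shrinks the window, the loop terminates even though the records
have different lengths.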

Extract the core routine of `find_reference_location()` in order to
implement a function in the following patch which will find the first
location in the `packed-refs` file that *doesn't* match the given
pattern.

The behavior of `find_reference_location()` is unchanged.
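
The following patch relies on finding where the records that begin with
a given prefix *end*, not just where they begin. A minimal sketch of that
idea, assuming for simplicity a sorted array of C strings rather than the
`packed-refs` buffer (`sorts_before_bound()` and `find_prefix_bound()`
are hypothetical names): the only difference between the two searches is
how a string that starts with `prefix` is classified. With `start` set it
counts as "at or after" the bound, so the search stops at the first such
string; with `start` clear it counts as "before", so the search runs past
all of them.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Nonzero if `rec` sorts strictly before the bound being searched for.
 * With `start`, the bound is the first string beginning with `prefix`;
 * without it, the bound is the first string after all such strings.
 */
static int sorts_before_bound(const char *rec, const char *prefix, int start)
{
	for (; *prefix; rec++, prefix++) {
		if (*rec != *prefix)
			return (unsigned char)*rec < (unsigned char)*prefix;
	}
	/* `rec` begins with `prefix` */
	return !start;
}

/* First index in the sorted array `refs[0..n)` that is not before the bound. */
static size_t find_prefix_bound(const char **refs, size_t n,
				const char *prefix, int start)
{
	size_t lo = 0, hi = n;

	assert(refs || !n);
	while (lo < hi) {
		size_t mid = lo + (hi - lo) / 2;
		if (sorts_before_bound(refs[mid], prefix, start))
			lo = mid + 1;
		else
			hi = mid;
	}
	return lo;
}
```

The half-open range [find_prefix_bound(..., 1), find_prefix_bound(..., 0))
then covers exactly the entries that start with `prefix`, and is empty
when nothing matches.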

Co-authored-by: Jeff King <peff@peff.net>
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Taylor Blau <me@ttaylorr.com>
---
 refs/packed-backend.c | 38 ++++++++++++++++++++++----------------
 1 file changed, 22 insertions(+), 16 deletions(-)

diff --git a/refs/packed-backend.c b/refs/packed-backend.c
index 176bd3905b..33639f73e1 100644
--- a/refs/packed-backend.c
+++ b/refs/packed-backend.c
@@ -527,22 +527,8 @@ static int load_contents(struct snapshot *snapshot)
 	return 1;
 }
 
-/*
- * Find the place in `snapshot->buf` where the start of the record for
- * `refname` starts. If `mustexist` is true and the reference doesn't
- * exist, then return NULL. If `mustexist` is false and the reference
- * doesn't exist, then return the point where that reference would be
- * inserted, or `snapshot->eof` (which might be NULL) if it would be
- * inserted at the end of the file. In the latter mode, `refname`
- * doesn't have to be a proper reference name; for example, one could
- * search for "refs/replace/" to find the start of any replace
- * references.
- *
- * The record is sought using a binary search, so `snapshot->buf` must
- * be sorted.
- */
-static const char *find_reference_location(struct snapshot *snapshot,
-					   const char *refname, int mustexist)
+static const char *find_reference_location_1(struct snapshot *snapshot,
+					     const char *refname, int mustexist)
 {
 	/*
 	 * This is not *quite* a garden-variety binary search, because
@@ -588,6 +574,26 @@ static const char *find_reference_location(struct snapshot *snapshot,
 		return lo;
 }
 
+/*
+ * Find the place in `snapshot->buf` where the start of the record for
+ * `refname` starts. If `mustexist` is true and the reference doesn't
+ * exist, then return NULL. If `mustexist` is false and the reference
+ * doesn't exist, then return the point where that reference would be
+ * inserted, or `snapshot->eof` (which might be NULL) if it would be
+ * inserted at the end of the file. In the latter mode, `refname`
+ * doesn't have to be a proper reference name; for example, one could
+ * search for "refs/replace/" to find the start of any replace
+ * references.
+ *
+ * The record is sought using a binary search, so `snapshot->buf` must
+ * be sorted.
+ */
+static const char *find_reference_location(struct snapshot *snapshot,
+					   const char *refname, int mustexist)
+{
+	return find_reference_location_1(snapshot, refname, mustexist);
+}
+
 /*
  * Create a newly-allocated `snapshot` of the `packed-refs` file in
  * its current state and return it. The return value will already have
-- 
2.41.0.16.g26cd413590


* [PATCH v3 09/16] refs/packed-backend.c: implement jump lists to avoid excluded pattern(s)
  2023-06-07 10:40 ` [PATCH v3 " Taylor Blau
                     ` (7 preceding siblings ...)
  2023-06-07 10:41   ` [PATCH v3 08/16] refs/packed-backend.c: refactor `find_reference_location()` Taylor Blau
@ 2023-06-07 10:41   ` Taylor Blau
  2023-06-14  0:27     ` Junio C Hamano
  2023-06-07 10:41   ` [PATCH v3 10/16] refs/packed-backend.c: add trace2 counters for jump list Taylor Blau
                     ` (7 subsequent siblings)
  16 siblings, 1 reply; 149+ messages in thread
From: Taylor Blau @ 2023-06-07 10:41 UTC (permalink / raw)
  To: git
  Cc: Chris Torek, Derrick Stolee, Jeff King, Junio C Hamano,
	Patrick Steinhardt

When iterating through the `packed-refs` file in order to answer a query
like:

    $ git for-each-ref --exclude=refs/__hidden__

it would be useful to avoid walking over all of the entries in
`refs/__hidden__/*` when possible, since we know that the ref-filter
code is going to throw them away anyway.

In certain circumstances, doing so is possible. The algorithm for doing
so is as follows:

  - For each excluded pattern, find the first record that matches it,
    and the first record that *doesn't* match it (i.e. the location
    you'd next want to consider when excluding that pattern).

  - Sort the set of excluded regions from the previous step in ascending
    order of the first location within the `packed-refs` file that
    matches.

  - Clean up the results from the previous step: discard empty regions,
    and merge regions that overlap or are adjacent.

Then when iterating through the `packed-refs` file, if `iter->pos` is
ever contained in one of the regions from the previous steps, advance
`iter->pos` past the end of that region, and continue enumeration.
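
The region bookkeeping can be sketched in isolation. This standalone
example (not the code in this patch) uses record indices instead of
pointers into the `packed-refs` buffer, and the names `struct region`
and `normalize_regions()` are hypothetical. It sorts the regions by their
start, drops empty ones, and coalesces any region that begins at or
before the end of the last region it kept:

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* A half-open range [start, end) of record positions to skip. */
struct region {
	size_t start, end;
};

static int region_cmp(const void *va, const void *vb)
{
	const struct region *a = va, *b = vb;

	if (a->start < b->start)
		return -1;
	return a->start > b->start;
}

/*
 * Sort `r[0..nr)` by start, drop empty regions, and coalesce regions
 * that overlap or touch. Returns the number of regions kept, which are
 * left in ascending order at the front of `r`.
 */
static size_t normalize_regions(struct region *r, size_t nr)
{
	size_t i, j = 0;

	if (!nr)
		return 0;
	qsort(r, nr, sizeof(*r), region_cmp);

	for (i = 0; i < nr; i++) {
		assert(r[i].start <= r[i].end);
		if (r[i].start == r[i].end)
			continue; /* the pattern matched nothing */
		if (j && r[i].start <= r[j - 1].end) {
			/* overlaps or touches the last kept region: extend it */
			if (r[i].end > r[j - 1].end)
				r[j - 1].end = r[i].end;
		} else {
			r[j++] = r[i]; /* start a new disjoint region */
		}
	}
	return j;
}
```

One design point worth noting: the region that gets extended is always
the last *kept* entry, `r[j - 1]`, rather than the source element it was
copied from. Once earlier entries have been dropped or merged, those two
can live at different indices, and extending the source element would
silently lose the extension.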

Note that we only perform this optimization for excluded patterns that
contain no special meta-characters; patterns that do are simply left out
of the jump list. For a pattern like "refs/foo[ac]", the excluded regions
("refs/fooa", "refs/fooc", and everything underneath them) are not
contiguous. A future implementation that handles this case may split the
character class (treating it as if two patterns were excluded:
"refs/fooa" and "refs/fooc").
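
To see why such a pattern cannot be turned into a single region,
consider a standalone check. The helpers are hypothetical, they use POSIX
fnmatch(3) rather than Git's own wildmatch machinery, and they assume the
meta-characters of interest are `*`, `?`, `[` and `\`. Over the sorted
names "refs/fooa", "refs/foob", "refs/fooc", the pattern "refs/foo[ac]"
matches the first and last but not the middle one, so no single
[start, end) jump can cover its matches:

```c
#include <assert.h>
#include <fnmatch.h>
#include <stddef.h>

/* Nonzero if `pattern` contains a glob meta-character (assumed set: * ? [ \). */
static int has_glob_meta(const char *pattern)
{
	for (; *pattern; pattern++)
		if (*pattern == '*' || *pattern == '?' ||
		    *pattern == '[' || *pattern == '\\')
			return 1;
	return 0;
}

/*
 * Nonzero if the entries of the sorted array `refs[0..n)` that match
 * `pattern` (per fnmatch(3)) form one contiguous run, i.e. could be
 * skipped with a single [start, end) jump.
 */
static int matches_are_contiguous(const char **refs, size_t n,
				  const char *pattern)
{
	int seen = 0, ended = 0;

	assert(refs || !n);
	for (size_t i = 0; i < n; i++) {
		int match = !fnmatch(pattern, refs[i], 0);

		if (match && ended)
			return 0; /* a second, separate run of matches */
		if (match)
			seen = 1;
		else if (seen)
			ended = 1;
	}
	return 1;
}
```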

There are a few other gotchas worth considering. First, note that the
jump list is sorted, so once we jump past a region, we never need to
consider it (or any region preceding it) again. The member `jump_pos`
tracks the next region that we may still need to jump over.
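
A standalone sketch of that cursor, again using indices instead of
buffer pointers (`struct region` and `iterate_with_jumps()` are
hypothetical names). The jump list is assumed to be sorted and disjoint;
because `jump_pos` only ever moves forward, each region is examined at
most once over the whole iteration:

```c
#include <assert.h>
#include <stddef.h>

/* A half-open range [start, end) of record positions to skip. */
struct region {
	size_t start, end;
};

/*
 * Store in `out` every position in [0, n) not covered by `jump`, in
 * order, and return how many were stored. `out` must have room for `n`
 * entries; `jump` must be sorted by start and hold disjoint regions.
 */
static size_t iterate_with_jumps(size_t n, const struct region *jump,
				 size_t jump_nr, size_t *out)
{
	size_t pos = 0, jump_pos = 0, nr_out = 0;

	assert(jump || !jump_nr);
	while (pos < n) {
		/* Consume every region that starts at or before `pos`. */
		while (jump_pos < jump_nr && jump[jump_pos].start <= pos) {
			if (pos < jump[jump_pos].end)
				pos = jump[jump_pos].end; /* inside it: skip to its end */
			jump_pos++;
		}
		if (pos >= n)
			break;
		out[nr_out++] = pos++;
	}
	return nr_out;
}
```

This sketch re-checks the jump list after each jump, so it would also
cope with a region that starts exactly where the previous one ended,
although merging adjacent regions up front means that case should not
arise.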

Second, note that the exclusion list is best-effort, since we do not
handle loose references, and because of the meta-character issue above.

In repositories with a large number of hidden references, the speed-up
can be significant. Tests here are done with a copy of linux.git with a
reference "refs/pull/N" pointing at every commit, as in:

    $ git rev-list HEAD | awk '{ print "create refs/pull/" NR " " $0 }' |
        git update-ref --stdin
    $ git pack-refs --all

With that setup, it is significantly faster to have `for-each-ref` jump
over the excluded references than to filter them out after the fact:

    $ hyperfine \
      'git for-each-ref --format="%(objectname) %(refname)" | grep -vE "^[0-9a-f]{40} refs/pull/"' \
      'git.compile for-each-ref --format="%(objectname) %(refname)" --exclude="refs/pull"'
    Benchmark 1: git for-each-ref --format="%(objectname) %(refname)" | grep -vE "^[0-9a-f]{40} refs/pull/"
      Time (mean ± σ):     802.7 ms ±   2.1 ms    [User: 691.6 ms, System: 147.0 ms]
      Range (min … max):   800.0 ms … 807.7 ms    10 runs

    Benchmark 2: git.compile for-each-ref --format="%(objectname) %(refname)" --exclude="refs/pull"
      Time (mean ± σ):       4.7 ms ±   0.3 ms    [User: 0.7 ms, System: 4.0 ms]
      Range (min … max):     4.3 ms …   6.7 ms    422 runs

    Summary
      'git.compile for-each-ref --format="%(objectname) %(refname)" --exclude="refs/pull"' ran
      172.03 ± 9.60 times faster than 'git for-each-ref --format="%(objectname) %(refname)" | grep -vE "^[0-9a-f]{40} refs/pull/"'

Using the jump list is fairly straightforward (see the changes to
`refs/packed-backend.c::next_record()`), but constructing the list is
not. To ensure that the construction is correct, add a new suite of
tests in t1419 covering various corner cases (overlapping regions,
partially overlapping regions, adjacent regions, etc.).

Co-authored-by: Jeff King <peff@peff.net>
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Taylor Blau <me@ttaylorr.com>
---
 ref-filter.c              |   5 +-
 refs/packed-backend.c     | 166 ++++++++++++++++++++++++++++++++++++--
 t/helper/test-ref-store.c |  10 +++
 t/t1419-exclude-refs.sh   | 101 +++++++++++++++++++++++
 4 files changed, 274 insertions(+), 8 deletions(-)
 create mode 100755 t/t1419-exclude-refs.sh

diff --git a/ref-filter.c b/ref-filter.c
index 717c3c4bcf..ddc7f5204f 100644
--- a/ref-filter.c
+++ b/ref-filter.c
@@ -2210,12 +2210,13 @@ static int for_each_fullref_in_pattern(struct ref_filter *filter,
 	if (!filter->name_patterns[0]) {
 		/* no patterns; we have to look at everything */
 		return refs_for_each_fullref_in(get_main_ref_store(the_repository),
-						 "", NULL, cb, cb_data);
+						 "", filter->exclude.v, cb, cb_data);
 	}
 
 	return refs_for_each_fullref_in_prefixes(get_main_ref_store(the_repository),
 						 NULL, filter->name_patterns,
-						 NULL, cb, cb_data);
+						 filter->exclude.v,
+						 cb, cb_data);
 }
 
 /*
diff --git a/refs/packed-backend.c b/refs/packed-backend.c
index 33639f73e1..67327e579c 100644
--- a/refs/packed-backend.c
+++ b/refs/packed-backend.c
@@ -303,7 +303,8 @@ static int cmp_packed_ref_records(const void *v1, const void *v2)
  * Compare a snapshot record at `rec` to the specified NUL-terminated
  * refname.
  */
-static int cmp_record_to_refname(const char *rec, const char *refname)
+static int cmp_record_to_refname(const char *rec, const char *refname,
+				 int start)
 {
 	const char *r1 = rec + the_hash_algo->hexsz + 1;
 	const char *r2 = refname;
@@ -312,7 +313,7 @@ static int cmp_record_to_refname(const char *rec, const char *refname)
 		if (*r1 == '\n')
 			return *r2 ? -1 : 0;
 		if (!*r2)
-			return 1;
+			return start ? 1 : -1;
 		if (*r1 != *r2)
 			return (unsigned char)*r1 < (unsigned char)*r2 ? -1 : +1;
 		r1++;
@@ -528,7 +529,8 @@ static int load_contents(struct snapshot *snapshot)
 }
 
 static const char *find_reference_location_1(struct snapshot *snapshot,
-					     const char *refname, int mustexist)
+					     const char *refname, int mustexist,
+					     int start)
 {
 	/*
 	 * This is not *quite* a garden-variety binary search, because
@@ -558,7 +560,7 @@ static const char *find_reference_location_1(struct snapshot *snapshot,
 
 		mid = lo + (hi - lo) / 2;
 		rec = find_start_of_record(lo, mid);
-		cmp = cmp_record_to_refname(rec, refname);
+		cmp = cmp_record_to_refname(rec, refname, start);
 		if (cmp < 0) {
 			lo = find_end_of_record(mid, hi);
 		} else if (cmp > 0) {
@@ -591,7 +593,22 @@ static const char *find_reference_location_1(struct snapshot *snapshot,
 static const char *find_reference_location(struct snapshot *snapshot,
 					   const char *refname, int mustexist)
 {
-	return find_reference_location_1(snapshot, refname, mustexist);
+	return find_reference_location_1(snapshot, refname, mustexist, 1);
+}
+
+/*
+ * Find the place in `snapshot->buf` after the end of the record for
+ * `refname`. In other words, find the location of the first thing
+ * *after* `refname`.
+ *
+ * Other semantics are identical to the ones in
+ * `find_reference_location()`.
+ */
+static const char *find_reference_location_end(struct snapshot *snapshot,
+					       const char *refname,
+					       int mustexist)
+{
+	return find_reference_location_1(snapshot, refname, mustexist, 0);
 }
 
 /*
@@ -785,6 +802,13 @@ struct packed_ref_iterator {
 	/* The end of the part of the buffer that will be iterated over: */
 	const char *eof;
 
+	struct jump_list_entry {
+		const char *start;
+		const char *end;
+	} *jump;
+	size_t jump_nr, jump_alloc;
+	size_t jump_pos;
+
 	/* Scratch space for current values: */
 	struct object_id oid, peeled;
 	struct strbuf refname_buf;
@@ -802,14 +826,34 @@ struct packed_ref_iterator {
  */
 static int next_record(struct packed_ref_iterator *iter)
 {
-	const char *p = iter->pos, *eol;
+	const char *p, *eol;
 
 	strbuf_reset(&iter->refname_buf);
 
+	/*
+	 * If iter->pos is contained within a skipped region, jump past
+	 * it.
+	 *
+	 * Note that each skipped region is considered at most once,
+	 * since they are ordered based on their starting position.
+	 */
+	while (iter->jump_pos < iter->jump_nr) {
+		struct jump_list_entry *curr = &iter->jump[iter->jump_pos];
+		if (iter->pos < curr->start)
+			break; /* not to the next jump yet */
+
+		iter->jump_pos++;
+		if (iter->pos < curr->end) {
+			iter->pos = curr->end;
+			break;
+		}
+	}
+
 	if (iter->pos == iter->eof)
 		return ITER_DONE;
 
 	iter->base.flags = REF_ISPACKED;
+	p = iter->pos;
 
 	if (iter->eof - p < the_hash_algo->hexsz + 2 ||
 	    parse_oid_hex(p, &iter->oid, &p) ||
@@ -917,6 +961,7 @@ static int packed_ref_iterator_abort(struct ref_iterator *ref_iterator)
 	int ok = ITER_DONE;
 
 	strbuf_release(&iter->refname_buf);
+	free(iter->jump);
 	release_snapshot(iter->snapshot);
 	base_ref_iterator_free(ref_iterator);
 	return ok;
@@ -928,6 +973,112 @@ static struct ref_iterator_vtable packed_ref_iterator_vtable = {
 	.abort = packed_ref_iterator_abort
 };
 
+static int jump_list_entry_cmp(const void *va, const void *vb)
+{
+	const struct jump_list_entry *a = va;
+	const struct jump_list_entry *b = vb;
+
+	if (a->start < b->start)
+		return -1;
+	if (a->start > b->start)
+		return 1;
+	return 0;
+}
+
+static int has_glob_special(const char *str)
+{
+	const char *p;
+	for (p = str; *p; p++) {
+		if (is_glob_special(*p))
+			return 1;
+	}
+	return 0;
+}
+
+static const char *ptr_max(const char *x, const char *y)
+{
+	if (x > y)
+		return x;
+	return y;
+}
+
+static void populate_excluded_jump_list(struct packed_ref_iterator *iter,
+					struct snapshot *snapshot,
+					const char **excluded_patterns)
+{
+	size_t i, j;
+	const char **pattern;
+	struct jump_list_entry *last_disjoint;
+
+	if (!excluded_patterns)
+		return;
+
+	for (pattern = excluded_patterns; *pattern; pattern++) {
+		struct jump_list_entry *e;
+
+		/*
+		 * We can't feed any excludes with globs in them to the
+		 * refs machinery.  It only understands prefix matching.
+		 * We likewise can't even feed the string leading up to
+		 * the first meta-character, as something like "foo[a]"
+		 * should not exclude "foobar" (but the prefix "foo"
+		 * would match that and mark it for exclusion).
+		 */
+		if (has_glob_special(*pattern))
+			continue;
+
+		ALLOC_GROW(iter->jump, iter->jump_nr + 1, iter->jump_alloc);
+
+		e = &iter->jump[iter->jump_nr++];
+		e->start = find_reference_location(snapshot, *pattern, 0);
+		e->end = find_reference_location_end(snapshot, *pattern, 0);
+	}
+
+	if (!iter->jump_nr) {
+		/*
+		 * Every entry in exclude_patterns has a meta-character,
+		 * nothing to do here.
+		 */
+		return;
+	}
+
+	QSORT(iter->jump, iter->jump_nr, jump_list_entry_cmp);
+
+	/*
+	 * As an optimization, merge adjacent entries in the jump list
+	 * to jump forwards as far as possible when entering a skipped
+	 * region.
+	 *
+	 * For example, if we have two skipped regions:
+	 *
+	 *	[[A, B], [B, C]]
+	 *
+	 * we want to combine that into a single entry jumping from A to
+	 * C.
+	 */
+	last_disjoint = iter->jump;
+
+	for (i = 1, j = 1; i < iter->jump_nr; i++) {
+		struct jump_list_entry *ours = &iter->jump[i];
+
+		if (ours->start == ours->end) {
+			/* ignore empty regions (no matching entries) */
+			continue;
+		} else if (ours->start <= last_disjoint->end) {
+			/* overlapping regions extend the previous one */
+			last_disjoint->end = ptr_max(last_disjoint->end, ours->end);
+		} else {
+			/* otherwise, insert a new region */
+			iter->jump[j++] = *ours;
+			last_disjoint = &iter->jump[j - 1];
+
+		}
+	}
+
+	iter->jump_nr = j;
+	iter->jump_pos = 0;
+}
+
 static struct ref_iterator *packed_ref_iterator_begin(
 		struct ref_store *ref_store,
 		const char *prefix, const char **exclude_patterns,
@@ -963,6 +1114,9 @@ static struct ref_iterator *packed_ref_iterator_begin(
 	ref_iterator = &iter->base;
 	base_ref_iterator_init(ref_iterator, &packed_ref_iterator_vtable, 1);
 
+	if (exclude_patterns)
+		populate_excluded_jump_list(iter, snapshot, exclude_patterns);
+
 	iter->snapshot = snapshot;
 	acquire_snapshot(snapshot);
 
diff --git a/t/helper/test-ref-store.c b/t/helper/test-ref-store.c
index 6d8f844e9c..2bff003f7c 100644
--- a/t/helper/test-ref-store.c
+++ b/t/helper/test-ref-store.c
@@ -175,6 +175,15 @@ static int cmd_for_each_ref(struct ref_store *refs, const char **argv)
 	return refs_for_each_ref_in(refs, prefix, each_ref, NULL);
 }
 
+static int cmd_for_each_ref__exclude(struct ref_store *refs, const char **argv)
+{
+	const char *prefix = notnull(*argv++, "prefix");
+	const char **exclude_patterns = argv;
+
+	return refs_for_each_fullref_in(refs, prefix, exclude_patterns, each_ref,
+					NULL);
+}
+
 static int cmd_resolve_ref(struct ref_store *refs, const char **argv)
 {
 	struct object_id oid = *null_oid();
@@ -307,6 +316,7 @@ static struct command commands[] = {
 	{ "delete-refs", cmd_delete_refs },
 	{ "rename-ref", cmd_rename_ref },
 	{ "for-each-ref", cmd_for_each_ref },
+	{ "for-each-ref--exclude", cmd_for_each_ref__exclude },
 	{ "resolve-ref", cmd_resolve_ref },
 	{ "verify-ref", cmd_verify_ref },
 	{ "for-each-reflog", cmd_for_each_reflog },
diff --git a/t/t1419-exclude-refs.sh b/t/t1419-exclude-refs.sh
new file mode 100755
index 0000000000..bc534c8ea1
--- /dev/null
+++ b/t/t1419-exclude-refs.sh
@@ -0,0 +1,101 @@
+#!/bin/sh
+
+test_description='test exclude_patterns functionality in main ref store'
+
+GIT_TEST_DEFAULT_INITIAL_BRANCH_NAME=main
+export GIT_TEST_DEFAULT_INITIAL_BRANCH_NAME
+
+TEST_PASSES_SANITIZE_LEAK=true
+. ./test-lib.sh
+
+for_each_ref__exclude () {
+	test-tool ref-store main for-each-ref--exclude "$@" >actual.raw
+	cut -d ' ' -f 2 actual.raw
+}
+
+for_each_ref () {
+	git for-each-ref --format='%(refname)' "$@"
+}
+
+test_expect_success 'setup' '
+	test_commit --no-tag base &&
+	base="$(git rev-parse HEAD)" &&
+
+	for name in foo bar baz quux
+	do
+		for i in 1 2 3
+		do
+			echo "create refs/heads/$name/$i $base" || return 1
+		done || return 1
+	done >in &&
+	echo "delete refs/heads/main" >>in &&
+
+	git update-ref --stdin <in &&
+	git pack-refs --all
+'
+
+test_expect_success 'excluded region in middle' '
+	for_each_ref__exclude refs/heads refs/heads/foo >actual &&
+	for_each_ref refs/heads/bar refs/heads/baz refs/heads/quux >expect &&
+
+	test_cmp expect actual
+'
+
+test_expect_success 'excluded region at beginning' '
+	for_each_ref__exclude refs/heads refs/heads/bar >actual &&
+	for_each_ref refs/heads/baz refs/heads/foo refs/heads/quux >expect &&
+
+	test_cmp expect actual
+'
+
+test_expect_success 'excluded region at end' '
+	for_each_ref__exclude refs/heads refs/heads/quux >actual &&
+	for_each_ref refs/heads/foo refs/heads/bar refs/heads/baz >expect &&
+
+	test_cmp expect actual
+'
+
+test_expect_success 'disjoint excluded regions' '
+	for_each_ref__exclude refs/heads refs/heads/bar refs/heads/quux >actual &&
+	for_each_ref refs/heads/baz refs/heads/foo >expect &&
+
+	test_cmp expect actual
+'
+
+test_expect_success 'adjacent, non-overlapping excluded regions' '
+	for_each_ref__exclude refs/heads refs/heads/bar refs/heads/baz >actual &&
+	for_each_ref refs/heads/foo refs/heads/quux >expect &&
+
+	test_cmp expect actual
+'
+
+test_expect_success 'overlapping excluded regions' '
+	for_each_ref__exclude refs/heads refs/heads/ba refs/heads/baz >actual &&
+	for_each_ref refs/heads/foo refs/heads/quux >expect &&
+
+	test_cmp expect actual
+'
+
+test_expect_success 'several overlapping excluded regions' '
+	for_each_ref__exclude refs/heads \
+		refs/heads/bar refs/heads/baz refs/heads/foo >actual &&
+	for_each_ref refs/heads/quux >expect &&
+
+	test_cmp expect actual
+'
+
+test_expect_success 'non-matching excluded section' '
+	for_each_ref__exclude refs/heads refs/heads/does/not/exist >actual &&
+	for_each_ref >expect &&
+
+	test_cmp expect actual
+'
+
+test_expect_success 'meta-characters are discarded' '
+	for_each_ref__exclude refs/heads "refs/heads/ba*" >actual &&
+	for_each_ref >expect &&
+
+	test_cmp expect actual
+'
+
+test_done
-- 
2.41.0.16.g26cd413590


* [PATCH v3 10/16] refs/packed-backend.c: add trace2 counters for jump list
  2023-06-07 10:40 ` [PATCH v3 " Taylor Blau
                     ` (8 preceding siblings ...)
  2023-06-07 10:41   ` [PATCH v3 09/16] refs/packed-backend.c: implement jump lists to avoid excluded pattern(s) Taylor Blau
@ 2023-06-07 10:41   ` Taylor Blau
  2023-06-14  0:32     ` Junio C Hamano
  2023-06-07 10:41   ` [PATCH v3 11/16] revision.h: store hidden refs in a `strvec` Taylor Blau
                     ` (6 subsequent siblings)
  16 siblings, 1 reply; 149+ messages in thread
From: Taylor Blau @ 2023-06-07 10:41 UTC (permalink / raw)
  To: git
  Cc: Chris Torek, Derrick Stolee, Jeff King, Junio C Hamano,
	Patrick Steinhardt

The previous commit added low-level tests to ensure that the packed-refs
iterator did not enumerate excluded sections of the refspace.

However, there was no guarantee that these sections weren't being
visited, only that they were being suppressed from the output. To harden
these tests, add a trace2 counter which tracks the number of regions
skipped by the packed-refs iterator, and assert on its value.
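
The following small standalone C sketch shows what the counter measures. It is not the packed-refs code itself. It models an iterator that skips sorted, non-overlapping [start, end) regions and counts each jump it takes, as the "jumps_made" counter does. `struct jump_range` and `walk_with_jumps()` are hypothetical names, and record indices stand in for the byte offsets the real iterator uses. Suppose there are twelve records laid out like the test's refs, with three each under bar, baz, foo and quux. Excluding the foo region {6, 9} takes one jump. Excluding bar {0, 3} and quux {9, 12} takes two. These match the counts the tests assert.

```c
#include <stddef.h>

/*
 * Hypothetical stand-in for a jump list entry: a half-open range
 * [start, end) of record indices to skip. (The real iterator works
 * on byte offsets into the packed-refs snapshot.)
 */
struct jump_range {
	size_t start;
	size_t end;
};

/*
 * Walk records 0..nr-1, skipping every region covered by `jumps`
 * (sorted by start, non-overlapping). Returns the number of records
 * visited and stores the number of jumps taken in *jumps_made,
 * analogous to the "jumps_made" trace2 counter.
 */
static size_t walk_with_jumps(size_t nr, const struct jump_range *jumps,
			      size_t jump_nr, size_t *jumps_made)
{
	size_t pos = 0, jump_pos = 0, visited = 0;

	*jumps_made = 0;
	while (pos < nr) {
		/* Consume every jump whose region starts at or before pos. */
		while (jump_pos < jump_nr && pos >= jumps[jump_pos].start) {
			const struct jump_range *curr = &jumps[jump_pos++];

			if (pos < curr->end) {
				pos = curr->end;
				(*jumps_made)++;
			}
		}
		if (pos >= nr)
			break;
		visited++; /* a real iterator would parse the record here */
		pos++;
	}
	return visited;
}
```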

Suggested-by: Derrick Stolee <derrickstolee@github.com>
Signed-off-by: Taylor Blau <me@ttaylorr.com>
---
 refs/packed-backend.c   |  2 ++
 t/t1419-exclude-refs.sh | 59 ++++++++++++++++++++++++++++-------------
 trace2.h                |  2 ++
 trace2/tr2_ctr.c        |  5 ++++
 4 files changed, 49 insertions(+), 19 deletions(-)

diff --git a/refs/packed-backend.c b/refs/packed-backend.c
index 67327e579c..7ba9fa2bb8 100644
--- a/refs/packed-backend.c
+++ b/refs/packed-backend.c
@@ -12,6 +12,7 @@
 #include "../chdir-notify.h"
 #include "../wrapper.h"
 #include "../write-or-die.h"
+#include "../trace2.h"
 
 enum mmap_strategy {
 	/*
@@ -845,6 +846,7 @@ static int next_record(struct packed_ref_iterator *iter)
 		iter->jump_pos++;
 		if (iter->pos < curr->end) {
 			iter->pos = curr->end;
+			trace2_counter_add(TRACE2_COUNTER_ID_PACKED_REFS_JUMPS, 1);
 			break;
 		}
 	}
diff --git a/t/t1419-exclude-refs.sh b/t/t1419-exclude-refs.sh
index bc534c8ea1..350a7d2587 100755
--- a/t/t1419-exclude-refs.sh
+++ b/t/t1419-exclude-refs.sh
@@ -9,7 +9,8 @@ TEST_PASSES_SANITIZE_LEAK=true
 . ./test-lib.sh
 
 for_each_ref__exclude () {
-	test-tool ref-store main for-each-ref--exclude "$@" >actual.raw
+	GIT_TRACE2_PERF=1 test-tool ref-store main \
+		for-each-ref--exclude "$@" >actual.raw
 	cut -d ' ' -f 2 actual.raw
 }
 
@@ -17,6 +18,17 @@ for_each_ref () {
 	git for-each-ref --format='%(refname)' "$@"
 }
 
+assert_jumps () {
+	local nr="$1"
+	local trace="$2"
+
+	grep -q "name:jumps_made value:$nr" $trace
+}
+
+assert_no_jumps () {
+	! assert_jumps ".*" "$1"
+}
+
 test_expect_success 'setup' '
 	test_commit --no-tag base &&
 	base="$(git rev-parse HEAD)" &&
@@ -35,67 +47,76 @@ test_expect_success 'setup' '
 '
 
 test_expect_success 'excluded region in middle' '
-	for_each_ref__exclude refs/heads refs/heads/foo >actual &&
+	for_each_ref__exclude refs/heads refs/heads/foo >actual 2>perf &&
 	for_each_ref refs/heads/bar refs/heads/baz refs/heads/quux >expect &&
 
-	test_cmp expect actual
+	test_cmp expect actual &&
+	assert_jumps 1 perf
 '
 
 test_expect_success 'excluded region at beginning' '
-	for_each_ref__exclude refs/heads refs/heads/bar >actual &&
+	for_each_ref__exclude refs/heads refs/heads/bar >actual 2>perf &&
 	for_each_ref refs/heads/baz refs/heads/foo refs/heads/quux >expect &&
 
-	test_cmp expect actual
+	test_cmp expect actual &&
+	assert_jumps 1 perf
 '
 
 test_expect_success 'excluded region at end' '
-	for_each_ref__exclude refs/heads refs/heads/quux >actual &&
+	for_each_ref__exclude refs/heads refs/heads/quux >actual 2>perf &&
 	for_each_ref refs/heads/foo refs/heads/bar refs/heads/baz >expect &&
 
-	test_cmp expect actual
+	test_cmp expect actual &&
+	assert_jumps 1 perf
 '
 
 test_expect_success 'disjoint excluded regions' '
-	for_each_ref__exclude refs/heads refs/heads/bar refs/heads/quux >actual &&
+	for_each_ref__exclude refs/heads refs/heads/bar refs/heads/quux >actual 2>perf &&
 	for_each_ref refs/heads/baz refs/heads/foo >expect &&
 
-	test_cmp expect actual
+	test_cmp expect actual &&
+	assert_jumps 2 perf
 '
 
 test_expect_success 'adjacent, non-overlapping excluded regions' '
-	for_each_ref__exclude refs/heads refs/heads/bar refs/heads/baz >actual &&
+	for_each_ref__exclude refs/heads refs/heads/bar refs/heads/baz >actual 2>perf &&
 	for_each_ref refs/heads/foo refs/heads/quux >expect &&
 
-	test_cmp expect actual
+	test_cmp expect actual &&
+	assert_jumps 1 perf
 '
 
 test_expect_success 'overlapping excluded regions' '
-	for_each_ref__exclude refs/heads refs/heads/ba refs/heads/baz >actual &&
+	for_each_ref__exclude refs/heads refs/heads/ba refs/heads/baz >actual 2>perf &&
 	for_each_ref refs/heads/foo refs/heads/quux >expect &&
 
-	test_cmp expect actual
+	test_cmp expect actual &&
+	assert_jumps 1 perf
 '
 
 test_expect_success 'several overlapping excluded regions' '
 	for_each_ref__exclude refs/heads \
-		refs/heads/bar refs/heads/baz refs/heads/foo >actual &&
+		refs/heads/bar refs/heads/baz refs/heads/foo >actual 2>perf &&
 	for_each_ref refs/heads/quux >expect &&
 
-	test_cmp expect actual
+	test_cmp expect actual &&
+	assert_jumps 1 perf
 '
 
 test_expect_success 'non-matching excluded section' '
-	for_each_ref__exclude refs/heads refs/heads/does/not/exist >actual &&
+	for_each_ref__exclude refs/heads refs/heads/does/not/exist >actual 2>perf &&
 	for_each_ref >expect &&
 
-	test_cmp expect actual
+	test_cmp expect actual &&
+	assert_no_jumps
 '
 
 test_expect_success 'meta-characters are discarded' '
-	for_each_ref__exclude refs/heads "refs/heads/ba*" >actual &&
+	for_each_ref__exclude refs/heads "refs/heads/ba*" >actual 2>perf &&
 	for_each_ref >expect &&
 
-	test_cmp expect actual
+	test_cmp expect actual &&
+	assert_no_jumps
 '
 
 test_done
diff --git a/trace2.h b/trace2.h
index 4ced30c0db..9452e291f5 100644
--- a/trace2.h
+++ b/trace2.h
@@ -551,6 +551,8 @@ enum trace2_counter_id {
 	TRACE2_COUNTER_ID_TEST1 = 0, /* emits summary event only */
 	TRACE2_COUNTER_ID_TEST2,     /* emits summary and thread events */
 
+	TRACE2_COUNTER_ID_PACKED_REFS_JUMPS, /* counts number of jumps */
+
 	/* Add additional counter definitions before here. */
 	TRACE2_NUMBER_OF_COUNTERS
 };
diff --git a/trace2/tr2_ctr.c b/trace2/tr2_ctr.c
index b342d3b1a3..50570d0165 100644
--- a/trace2/tr2_ctr.c
+++ b/trace2/tr2_ctr.c
@@ -27,6 +27,11 @@ static struct tr2_counter_metadata tr2_counter_metadata[TRACE2_NUMBER_OF_COUNTER
 		.name = "test2",
 		.want_per_thread_events = 1,
 	},
+	[TRACE2_COUNTER_ID_PACKED_REFS_JUMPS] = {
+		.category = "packed-refs",
+		.name = "jumps_made",
+		.want_per_thread_events = 0,
+	},
 
 	/* Add additional metadata before here. */
 };
-- 
2.41.0.16.g26cd413590


* [PATCH v3 11/16] revision.h: store hidden refs in a `strvec`
  2023-06-07 10:40 ` [PATCH v3 " Taylor Blau
                     ` (9 preceding siblings ...)
  2023-06-07 10:41   ` [PATCH v3 10/16] refs/packed-backend.c: add trace2 counters for jump list Taylor Blau
@ 2023-06-07 10:41   ` Taylor Blau
  2023-06-07 10:41   ` [PATCH v3 12/16] refs/packed-backend.c: ignore complicated hidden refs rules Taylor Blau
                     ` (5 subsequent siblings)
  16 siblings, 0 replies; 149+ messages in thread
From: Taylor Blau @ 2023-06-07 10:41 UTC (permalink / raw)
  To: git
  Cc: Chris Torek, Derrick Stolee, Jeff King, Junio C Hamano,
	Patrick Steinhardt

In subsequent commits, it will be convenient to have the hidden refs
(matching `transfer.hideRefs`, `uploadpack.hideRefs`, etc.) available as
a NULL-terminated 'const char **' array, instead of a `string_list`.

Convert spots throughout the tree that store the list of hidden refs
from a `string_list` to a `strvec`.

Note that in `parse_hide_refs_config()` there is an ugly const-cast used
to avoid an extra copy of each value before trimming any trailing slash
characters. This could instead be written as:

    ref = xstrdup(value);
    len = strlen(ref);
    while (len && ref[len - 1] == '/')
            ref[--len] = '\0';
    strvec_push(hide_refs, ref);
    free(ref);

but the double-copy (once when calling `xstrdup()`, and another via
`strvec_push()`) is wasteful.
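
The single-copy approach can be sketched outside of Git. `struct mini_strvec` and its helpers below are hypothetical stand-ins for `struct strvec`. This sketch's push function returns a mutable `char *`, so no cast is needed. The patch, by contrast, must cast away the `const` returned by `strvec_push()`.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical, minimal stand-in for git's `struct strvec`. */
struct mini_strvec {
	char **v;	/* NULL-terminated once non-empty */
	size_t nr;
	size_t alloc;
};

/* Store exactly one copy of `value` and return a pointer to that copy. */
static char *mini_strvec_push(struct mini_strvec *vec, const char *value)
{
	size_t len = strlen(value);
	char *copy = malloc(len + 1);

	if (!copy)
		abort();
	memcpy(copy, value, len + 1);

	/* Leave room for the new entry plus the NULL terminator. */
	if (vec->nr + 2 > vec->alloc) {
		size_t alloc = vec->alloc ? 2 * vec->alloc : 4;
		char **v = realloc(vec->v, alloc * sizeof(*v));

		if (!v)
			abort();
		vec->v = v;
		vec->alloc = alloc;
	}
	vec->v[vec->nr++] = copy;
	vec->v[vec->nr] = NULL;
	return copy;
}

/* Push a hideRefs value, then trim trailing '/' in the stored copy. */
static void push_hide_ref(struct mini_strvec *vec, const char *value)
{
	char *ref = mini_strvec_push(vec, value);
	size_t len = strlen(ref);

	while (len && ref[len - 1] == '/')
		ref[--len] = '\0';
}

static void mini_strvec_clear(struct mini_strvec *vec)
{
	size_t i;

	for (i = 0; i < vec->nr; i++)
		free(vec->v[i]);
	free(vec->v);
	vec->v = NULL;
	vec->nr = vec->alloc = 0;
}
```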

Signed-off-by: Taylor Blau <me@ttaylorr.com>
---
 builtin/receive-pack.c |  4 ++--
 ls-refs.c              |  6 +++---
 refs.c                 | 11 ++++++-----
 refs.h                 |  4 ++--
 revision.c             |  2 +-
 revision.h             |  5 +++--
 upload-pack.c          | 10 +++++-----
 7 files changed, 22 insertions(+), 20 deletions(-)

diff --git a/builtin/receive-pack.c b/builtin/receive-pack.c
index 1a31a58367..1a8472eddc 100644
--- a/builtin/receive-pack.c
+++ b/builtin/receive-pack.c
@@ -90,7 +90,7 @@ static struct object_id push_cert_oid;
 static struct signature_check sigcheck;
 static const char *push_cert_nonce;
 static const char *cert_nonce_seed;
-static struct string_list hidden_refs = STRING_LIST_INIT_DUP;
+static struct strvec hidden_refs = STRVEC_INIT;
 
 static const char *NONCE_UNSOLICITED = "UNSOLICITED";
 static const char *NONCE_BAD = "BAD";
@@ -2619,7 +2619,7 @@ int cmd_receive_pack(int argc, const char **argv, const char *prefix)
 		packet_flush(1);
 	oid_array_clear(&shallow);
 	oid_array_clear(&ref);
-	string_list_clear(&hidden_refs, 0);
+	strvec_clear(&hidden_refs);
 	free((void *)push_cert_nonce);
 	return 0;
 }
diff --git a/ls-refs.c b/ls-refs.c
index 6f490b2d9c..8c3181d051 100644
--- a/ls-refs.c
+++ b/ls-refs.c
@@ -72,7 +72,7 @@ struct ls_refs_data {
 	unsigned symrefs;
 	struct strvec prefixes;
 	struct strbuf buf;
-	struct string_list hidden_refs;
+	struct strvec hidden_refs;
 	unsigned unborn : 1;
 };
 
@@ -155,7 +155,7 @@ int ls_refs(struct repository *r, struct packet_reader *request)
 	memset(&data, 0, sizeof(data));
 	strvec_init(&data.prefixes);
 	strbuf_init(&data.buf, 0);
-	string_list_init_dup(&data.hidden_refs);
+	strvec_init(&data.hidden_refs);
 
 	git_config(ls_refs_config, &data);
 
@@ -197,7 +197,7 @@ int ls_refs(struct repository *r, struct packet_reader *request)
 	packet_fflush(stdout);
 	strvec_clear(&data.prefixes);
 	strbuf_release(&data.buf);
-	string_list_clear(&data.hidden_refs, 0);
+	strvec_clear(&data.hidden_refs);
 	return 0;
 }
 
diff --git a/refs.c b/refs.c
index 538bde644e..ec4d5b9101 100644
--- a/refs.c
+++ b/refs.c
@@ -1427,7 +1427,7 @@ char *shorten_unambiguous_ref(const char *refname, int strict)
 }
 
 int parse_hide_refs_config(const char *var, const char *value, const char *section,
-			   struct string_list *hide_refs)
+			   struct strvec *hide_refs)
 {
 	const char *key;
 	if (!strcmp("transfer.hiderefs", var) ||
@@ -1438,22 +1438,23 @@ int parse_hide_refs_config(const char *var, const char *value, const char *secti
 
 		if (!value)
 			return config_error_nonbool(var);
-		ref = xstrdup(value);
+
+		/* drop const to remove trailing '/' characters */
+		ref = (char *)strvec_push(hide_refs, value);
 		len = strlen(ref);
 		while (len && ref[len - 1] == '/')
 			ref[--len] = '\0';
-		string_list_append_nodup(hide_refs, ref);
 	}
 	return 0;
 }
 
 int ref_is_hidden(const char *refname, const char *refname_full,
-		  const struct string_list *hide_refs)
+		  const struct strvec *hide_refs)
 {
 	int i;
 
 	for (i = hide_refs->nr - 1; i >= 0; i--) {
-		const char *match = hide_refs->items[i].string;
+		const char *match = hide_refs->v[i];
 		const char *subject;
 		int neg = 0;
 		const char *p;
diff --git a/refs.h b/refs.h
index d672d636cf..a7751a1fc9 100644
--- a/refs.h
+++ b/refs.h
@@ -810,7 +810,7 @@ int update_ref(const char *msg, const char *refname,
 	       unsigned int flags, enum action_on_err onerr);
 
 int parse_hide_refs_config(const char *var, const char *value, const char *,
-			   struct string_list *);
+			   struct strvec *);
 
 /*
  * Check whether a ref is hidden. If no namespace is set, both the first and
@@ -820,7 +820,7 @@ int parse_hide_refs_config(const char *var, const char *value, const char *,
  * the ref is outside that namespace, the first parameter is NULL. The second
  * parameter always points to the full ref name.
  */
-int ref_is_hidden(const char *, const char *, const struct string_list *);
+int ref_is_hidden(const char *, const char *, const struct strvec *);
 
 /* Is this a per-worktree ref living in the refs/ namespace? */
 int is_per_worktree_ref(const char *refname);
diff --git a/revision.c b/revision.c
index 89953592f9..7c9367a266 100644
--- a/revision.c
+++ b/revision.c
@@ -1558,7 +1558,7 @@ void init_ref_exclusions(struct ref_exclusions *exclusions)
 void clear_ref_exclusions(struct ref_exclusions *exclusions)
 {
 	string_list_clear(&exclusions->excluded_refs, 0);
-	string_list_clear(&exclusions->hidden_refs, 0);
+	strvec_clear(&exclusions->hidden_refs);
 	exclusions->hidden_refs_configured = 0;
 }
 
diff --git a/revision.h b/revision.h
index 31828748dc..94f035fa22 100644
--- a/revision.h
+++ b/revision.h
@@ -10,6 +10,7 @@
 #include "decorate.h"
 #include "ident.h"
 #include "list-objects-filter-options.h"
+#include "strvec.h"
 
 /**
  * The revision walking API offers functions to build a list of revisions
@@ -95,7 +96,7 @@ struct ref_exclusions {
 	 * Hidden refs is a list of patterns that is to be hidden via
 	 * `ref_is_hidden()`.
 	 */
-	struct string_list hidden_refs;
+	struct strvec hidden_refs;
 
 	/*
 	 * Indicates whether hidden refs have been configured. This is to
@@ -110,7 +111,7 @@ struct ref_exclusions {
  */
 #define REF_EXCLUSIONS_INIT { \
 	.excluded_refs = STRING_LIST_INIT_DUP, \
-	.hidden_refs = STRING_LIST_INIT_DUP, \
+	.hidden_refs = STRVEC_INIT, \
 }
 
 struct oidset;
diff --git a/upload-pack.c b/upload-pack.c
index d3312006a3..1a213ed775 100644
--- a/upload-pack.c
+++ b/upload-pack.c
@@ -69,7 +69,7 @@ struct upload_pack_data {
 	struct object_array have_obj;
 	struct oid_array haves;					/* v2 only */
 	struct string_list wanted_refs;				/* v2 only */
-	struct string_list hidden_refs;
+	struct strvec hidden_refs;
 
 	struct object_array shallows;
 	struct string_list deepen_not;
@@ -127,7 +127,7 @@ static void upload_pack_data_init(struct upload_pack_data *data)
 {
 	struct string_list symref = STRING_LIST_INIT_DUP;
 	struct string_list wanted_refs = STRING_LIST_INIT_DUP;
-	struct string_list hidden_refs = STRING_LIST_INIT_DUP;
+	struct strvec hidden_refs = STRVEC_INIT;
 	struct object_array want_obj = OBJECT_ARRAY_INIT;
 	struct object_array have_obj = OBJECT_ARRAY_INIT;
 	struct oid_array haves = OID_ARRAY_INIT;
@@ -162,7 +162,7 @@ static void upload_pack_data_clear(struct upload_pack_data *data)
 {
 	string_list_clear(&data->symref, 1);
 	string_list_clear(&data->wanted_refs, 1);
-	string_list_clear(&data->hidden_refs, 0);
+	strvec_clear(&data->hidden_refs);
 	object_array_clear(&data->want_obj);
 	object_array_clear(&data->have_obj);
 	oid_array_clear(&data->haves);
@@ -1170,7 +1170,7 @@ static void receive_needs(struct upload_pack_data *data,
 
 /* return non-zero if the ref is hidden, otherwise 0 */
 static int mark_our_ref(const char *refname, const char *refname_full,
-			const struct object_id *oid, const struct string_list *hidden_refs)
+			const struct object_id *oid, const struct strvec *hidden_refs)
 {
 	struct object *o = lookup_unknown_object(the_repository, oid);
 
@@ -1465,7 +1465,7 @@ static int parse_want(struct packet_writer *writer, const char *line,
 
 static int parse_want_ref(struct packet_writer *writer, const char *line,
 			  struct string_list *wanted_refs,
-			  struct string_list *hidden_refs,
+			  struct strvec *hidden_refs,
 			  struct object_array *want_obj)
 {
 	const char *refname_nons;
-- 
2.41.0.16.g26cd413590


* [PATCH v3 12/16] refs/packed-backend.c: ignore complicated hidden refs rules
  2023-06-07 10:40 ` [PATCH v3 " Taylor Blau
                     ` (10 preceding siblings ...)
  2023-06-07 10:41   ` [PATCH v3 11/16] revision.h: store hidden refs in a `strvec` Taylor Blau
@ 2023-06-07 10:41   ` Taylor Blau
  2023-06-14  0:40     ` Junio C Hamano
  2023-06-07 10:41   ` [PATCH v3 13/16] refs.h: let `for_each_namespaced_ref()` take excluded patterns Taylor Blau
                     ` (4 subsequent siblings)
  16 siblings, 1 reply; 149+ messages in thread
From: Taylor Blau @ 2023-06-07 10:41 UTC (permalink / raw)
  To: git
  Cc: Chris Torek, Derrick Stolee, Jeff King, Junio C Hamano,
	Patrick Steinhardt

In subsequent commits, we'll teach `receive-pack` and `upload-pack` to
use the new skip-list feature in the packed-refs iterator to skip over
references that are hidden by their respective hideRefs lists.

However, the packed-refs skip lists cannot handle un-hiding rules (which
begin with '!') or namespace comparisons (which begin with '^'). Detect
these cases and fall back to normal enumeration, without a skip list,
whenever such patterns are present.
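
The check can be sketched as a standalone predicate. In the patch itself, the loop lives inline in `populate_excluded_jump_list()`. The function name `patterns_allow_jump_list()` below is hypothetical.

```c
#include <stddef.h>

/*
 * Return 1 if every pattern in the NULL-terminated `patterns` array is
 * a plain prefix that a jump list could skip over. Return 0 if there
 * are no patterns, or if any pattern is an un-hiding ('!') or namespace
 * ('^') rule, in which case the caller should enumerate normally.
 */
static int patterns_allow_jump_list(const char **patterns)
{
	const char **p;

	if (!patterns)
		return 0; /* nothing to exclude, so no jump list is needed */
	for (p = patterns; *p; p++)
		if (**p == '!' || **p == '^')
			return 0;
	return 1;
}
```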

Signed-off-by: Taylor Blau <me@ttaylorr.com>
---
 refs/packed-backend.c   | 19 +++++++++++++++++++
 t/t1419-exclude-refs.sh |  9 +++++++++
 2 files changed, 28 insertions(+)

diff --git a/refs/packed-backend.c b/refs/packed-backend.c
index 7ba9fa2bb8..9ea6c07866 100644
--- a/refs/packed-backend.c
+++ b/refs/packed-backend.c
@@ -1015,6 +1015,25 @@ static void populate_excluded_jump_list(struct packed_ref_iterator *iter,
 	if (!excluded_patterns)
 		return;
 
+	for (pattern = excluded_patterns; *pattern; pattern++) {
+		/*
+		 * We also can't feed any excludes from hidden refs
+		 * config sections, since later rules may override
+		 * previous ones. For example, with rules "refs/foo" and
+		 * "!refs/foo/bar", we should show "refs/foo/bar" (and
+		 * everything underneath it), but the earlier exclusion
+		 * would cause us to skip all of "refs/foo". We likewise
+		 * don't implement the namespace stripping required for
+		 * '^' rules.
+		 *
+		 * Both are possible to do, but complicated, so avoid
+		 * populating the jump list at all if we see either of
+		 * these patterns.
+		 */
+		if (**pattern == '!' || **pattern == '^')
+			return;
+	}
+
 	for (pattern = excluded_patterns; *pattern; pattern++) {
 		struct jump_list_entry *e;
 
diff --git a/t/t1419-exclude-refs.sh b/t/t1419-exclude-refs.sh
index 350a7d2587..0e91e2f399 100755
--- a/t/t1419-exclude-refs.sh
+++ b/t/t1419-exclude-refs.sh
@@ -119,4 +119,13 @@ test_expect_success 'meta-characters are discarded' '
 	assert_no_jumps
 '
 
+test_expect_success 'complex hidden ref rules are discarded' '
+	for_each_ref__exclude refs/heads refs/heads/foo "!refs/heads/foo/1" \
+		>actual 2>perf &&
+	for_each_ref >expect &&
+
+	test_cmp expect actual &&
+	assert_no_jumps
+'
+
 test_done
-- 
2.41.0.16.g26cd413590


* [PATCH v3 13/16] refs.h: let `for_each_namespaced_ref()` take excluded patterns
  2023-06-07 10:40 ` [PATCH v3 " Taylor Blau
                     ` (11 preceding siblings ...)
  2023-06-07 10:41   ` [PATCH v3 12/16] refs/packed-backend.c: ignore complicated hidden refs rules Taylor Blau
@ 2023-06-07 10:41   ` Taylor Blau
  2023-06-07 10:42   ` [PATCH v3 14/16] builtin/receive-pack.c: avoid enumerating hidden references Taylor Blau
                     ` (3 subsequent siblings)
  16 siblings, 0 replies; 149+ messages in thread
From: Taylor Blau @ 2023-06-07 10:41 UTC (permalink / raw)
  To: git
  Cc: Chris Torek, Derrick Stolee, Jeff King, Junio C Hamano,
	Patrick Steinhardt

A later commit will want to call `for_each_namespaced_ref()` with a list
of excluded patterns.

We could introduce a variant of that function, say,
`for_each_namespaced_ref_exclude()`, which takes the extra parameter, and
reimplement the original function in terms of that. But all but one
caller (in `http-backend.c`) will supply the new parameter, so add the
new parameter to `for_each_namespaced_ref()` itself instead of
introducing a new function.

For now, pass NULL as the list of excluded patterns at all callers so
that behavior is unchanged; a later commit will start passing real
patterns.

Signed-off-by: Taylor Blau <me@ttaylorr.com>
---
 http-backend.c | 2 +-
 refs.c         | 5 +++--
 refs.h         | 3 ++-
 upload-pack.c  | 6 +++---
 4 files changed, 9 insertions(+), 7 deletions(-)

diff --git a/http-backend.c b/http-backend.c
index ac146d85c5..ad500683c8 100644
--- a/http-backend.c
+++ b/http-backend.c
@@ -559,7 +559,7 @@ static void get_info_refs(struct strbuf *hdr, char *arg UNUSED)
 
 	} else {
 		select_getanyfile(hdr);
-		for_each_namespaced_ref(show_text_ref, &buf);
+		for_each_namespaced_ref(NULL, show_text_ref, &buf);
 		send_strbuf(hdr, "text/plain", &buf);
 	}
 	strbuf_release(&buf);
diff --git a/refs.c b/refs.c
index ec4d5b9101..95a7db9563 100644
--- a/refs.c
+++ b/refs.c
@@ -1660,13 +1660,14 @@ int for_each_replace_ref(struct repository *r, each_repo_ref_fn fn, void *cb_dat
 				    DO_FOR_EACH_INCLUDE_BROKEN, cb_data);
 }
 
-int for_each_namespaced_ref(each_ref_fn fn, void *cb_data)
+int for_each_namespaced_ref(const char **exclude_patterns,
+			    each_ref_fn fn, void *cb_data)
 {
 	struct strbuf buf = STRBUF_INIT;
 	int ret;
 	strbuf_addf(&buf, "%srefs/", get_git_namespace());
 	ret = do_for_each_ref(get_main_ref_store(the_repository),
-			      buf.buf, NULL, fn, 0, 0, cb_data);
+			      buf.buf, exclude_patterns, fn, 0, 0, cb_data);
 	strbuf_release(&buf);
 	return ret;
 }
diff --git a/refs.h b/refs.h
index a7751a1fc9..f23626beca 100644
--- a/refs.h
+++ b/refs.h
@@ -372,7 +372,8 @@ int for_each_glob_ref_in(each_ref_fn fn, const char *pattern,
 			 const char *prefix, void *cb_data);
 
 int head_ref_namespaced(each_ref_fn fn, void *cb_data);
-int for_each_namespaced_ref(each_ref_fn fn, void *cb_data);
+int for_each_namespaced_ref(const char **exclude_patterns,
+			    each_ref_fn fn, void *cb_data);
 
 /* can be used to learn about broken ref and symref */
 int refs_for_each_rawref(struct ref_store *refs, each_ref_fn fn, void *cb_data);
diff --git a/upload-pack.c b/upload-pack.c
index 1a213ed775..99d216938c 100644
--- a/upload-pack.c
+++ b/upload-pack.c
@@ -855,7 +855,7 @@ static void deepen(struct upload_pack_data *data, int depth)
 		 * marked with OUR_REF.
 		 */
 		head_ref_namespaced(check_ref, data);
-		for_each_namespaced_ref(check_ref, data);
+		for_each_namespaced_ref(NULL, check_ref, data);
 
 		get_reachable_list(data, &reachable_shallows);
 		result = get_shallow_commits(&reachable_shallows,
@@ -1386,7 +1386,7 @@ void upload_pack(const int advertise_refs, const int stateless_rpc,
 		if (advertise_refs)
 			data.no_done = 1;
 		head_ref_namespaced(send_ref, &data);
-		for_each_namespaced_ref(send_ref, &data);
+		for_each_namespaced_ref(NULL, send_ref, &data);
 		if (!data.sent_capabilities) {
 			const char *refname = "capabilities^{}";
 			write_v0_ref(&data, refname, refname, null_oid());
@@ -1400,7 +1400,7 @@ void upload_pack(const int advertise_refs, const int stateless_rpc,
 		packet_flush(1);
 	} else {
 		head_ref_namespaced(check_ref, &data);
-		for_each_namespaced_ref(check_ref, &data);
+		for_each_namespaced_ref(NULL, check_ref, &data);
 	}
 
 	if (!advertise_refs) {
-- 
2.41.0.16.g26cd413590


* [PATCH v3 14/16] builtin/receive-pack.c: avoid enumerating hidden references
  2023-06-07 10:40 ` [PATCH v3 " Taylor Blau
                     ` (12 preceding siblings ...)
  2023-06-07 10:41   ` [PATCH v3 13/16] refs.h: let `for_each_namespaced_ref()` take excluded patterns Taylor Blau
@ 2023-06-07 10:42   ` Taylor Blau
  2023-06-07 10:42   ` [PATCH v3 15/16] upload-pack.c: avoid enumerating hidden refs where possible Taylor Blau
                     ` (2 subsequent siblings)
  16 siblings, 0 replies; 149+ messages in thread
From: Taylor Blau @ 2023-06-07 10:42 UTC (permalink / raw)
  To: git
  Cc: Chris Torek, Derrick Stolee, Jeff King, Junio C Hamano,
	Patrick Steinhardt

Now that `refs_for_each_fullref_in()` has the ability to avoid
enumerating references matching certain pattern(s), use that to avoid
visiting hidden refs when constructing the ref advertisement via
receive-pack.

Note that since this exclusion is best-effort, we still need
`show_ref_cb()` to check whether each reference is hidden before
including it in the advertisement.
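
A simplified model of hideRefs matching shows why the per-ref check must stay authoritative. It loosely follows `ref_is_hidden()`. Rules are scanned from last to first, so the last rule in configuration order wins, and a leading '!' un-hides. The name `is_hidden_simplified()` is hypothetical, and namespace ('^') handling is omitted. Take the rules "refs/foo" and "!refs/foo/bar". Here "refs/foo/bar" is visible even though it falls under the hidden "refs/foo" prefix, so skipping everything under "refs/foo" would be wrong. This is also why such rules disable the jump list.

```c
#include <stddef.h>
#include <string.h>

/*
 * Simplified hideRefs check: walk `rules` from last to first and let
 * the first match found (the last rule in config order) decide. A
 * rule matches the ref itself or anything below it ("rule/...").
 * Namespace ('^') rules are not handled in this sketch.
 */
static int is_hidden_simplified(const char *refname,
				const char **rules, size_t nr)
{
	size_t i = nr;

	while (i-- > 0) {
		const char *match = rules[i];
		int neg = 0;
		size_t len;

		if (*match == '!') {
			neg = 1;
			match++;
		}
		len = strlen(match);
		if (!strncmp(refname, match, len) &&
		    (refname[len] == '\0' || refname[len] == '/'))
			return !neg;
	}
	return 0; /* no rule matched: the ref is visible */
}
```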

By not enumerating references that will never be part of the
advertisement, `receive-pack` can complete its reference advertisement
phase much more quickly. (The following commit applies the same
optimization to `upload-pack`.)

Below, we use linux.git with one hidden refs/pull/N ref per commit:

    $ hyperfine -L v ,.compile 'git{v} -c transfer.hideRefs=refs/pull receive-pack --advertise-refs .git'
    Benchmark 1: git -c transfer.hideRefs=refs/pull receive-pack --advertise-refs .git
      Time (mean ± σ):      89.1 ms ±   1.7 ms    [User: 82.0 ms, System: 7.0 ms]
      Range (min … max):    87.7 ms …  95.5 ms    31 runs

    Benchmark 2: git.compile -c transfer.hideRefs=refs/pull receive-pack --advertise-refs .git
      Time (mean ± σ):       4.5 ms ±   0.2 ms    [User: 0.5 ms, System: 3.9 ms]
      Range (min … max):     4.1 ms …   5.6 ms    508 runs

    Summary
      'git.compile -c transfer.hideRefs=refs/pull receive-pack --advertise-refs .git' ran
       20.00 ± 1.05 times faster than 'git -c transfer.hideRefs=refs/pull receive-pack --advertise-refs .git'

Signed-off-by: Taylor Blau <me@ttaylorr.com>
---
 builtin/receive-pack.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/builtin/receive-pack.c b/builtin/receive-pack.c
index 1a8472eddc..bd5bcc375f 100644
--- a/builtin/receive-pack.c
+++ b/builtin/receive-pack.c
@@ -337,7 +337,8 @@ static void write_head_info(void)
 {
 	static struct oidset seen = OIDSET_INIT;
 
-	for_each_ref(show_ref_cb, &seen);
+	refs_for_each_fullref_in(get_main_ref_store(the_repository), "",
+				 hidden_refs.v, show_ref_cb, &seen);
 	for_each_alternate_ref(show_one_alternate_ref, &seen);
 	oidset_clear(&seen);
 	if (!sent_capabilities)
-- 
2.41.0.16.g26cd413590


* [PATCH v3 15/16] upload-pack.c: avoid enumerating hidden refs where possible
  2023-06-07 10:40 ` [PATCH v3 " Taylor Blau
                     ` (13 preceding siblings ...)
  2023-06-07 10:42   ` [PATCH v3 14/16] builtin/receive-pack.c: avoid enumerating hidden references Taylor Blau
@ 2023-06-07 10:42   ` Taylor Blau
  2023-06-07 10:42   ` [PATCH v3 16/16] ls-refs.c: " Taylor Blau
  2023-06-12 21:05   ` [PATCH v3 00/16] refs: implement jump lists for packed backend Junio C Hamano
  16 siblings, 0 replies; 149+ messages in thread
From: Taylor Blau @ 2023-06-07 10:42 UTC (permalink / raw)
  To: git
  Cc: Chris Torek, Derrick Stolee, Jeff King, Junio C Hamano,
	Patrick Steinhardt

As in the previous commit, teach `upload-pack` to avoid enumerating
hidden references where possible.

Note, however, that there are certain cases where we cannot avoid
enumerating even hidden references, in particular when either of:

  - `uploadpack.allowTipSHA1InWant`, or
  - `uploadpack.allowReachableSHA1InWant`

are set, corresponding to `ALLOW_TIP_SHA1` and `ALLOW_REACHABLE_SHA1`,
respectively.

When either of these bits is set, upload-pack's `is_our_ref()` function
needs to consider the `HIDDEN_REF` bit of the referent's object flags.
So we must visit all references, including the hidden ones, in order to
mark their referents with the `HIDDEN_REF` bit.

When neither `ALLOW_TIP_SHA1` nor `ALLOW_REACHABLE_SHA1` is set, the
`is_our_ref()` function considers only the `OUR_REF` bit, and not the
`HIDDEN_REF` one. `OUR_REF` is applied via `mark_our_ref()`, and only
to objects at the tips of non-hidden references, so we do not need to
visit hidden references in this case.
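
The decision can be modeled in a few lines of C. This is a simplified, hypothetical sketch. It operates on a bare flags word rather than `struct object`, and the numeric flag values are made up, since upload-pack.c defines its own. `may_skip_hidden_refs()` is an invented helper that corresponds to the branch in the patch's `for_each_namespaced_ref_1()`.

```c
/* Illustrative flag values only; upload-pack.c uses its own bits. */
#define OUR_REF    (1u << 0)
#define HIDDEN_REF (1u << 1)

#define ALLOW_TIP_SHA1       0x01u
#define ALLOW_REACHABLE_SHA1 0x02u

/* May clients request objects that only hidden refs point at? */
static int allow_hidden_refs(unsigned allow_uor)
{
	return (allow_uor & (ALLOW_TIP_SHA1 | ALLOW_REACHABLE_SHA1)) != 0;
}

/*
 * An object is "ours" if a visible ref points at it (OUR_REF) or,
 * when hidden tips may be requested, if a hidden ref does (HIDDEN_REF).
 */
static int is_our_ref(unsigned obj_flags, unsigned allow_uor)
{
	unsigned wanted = OUR_REF;

	if (allow_hidden_refs(allow_uor))
		wanted |= HIDDEN_REF;
	return (obj_flags & wanted) != 0;
}

/*
 * Hidden refs only need to be visited (to set HIDDEN_REF) when
 * is_our_ref() may consult that bit; otherwise they can be skipped.
 */
static int may_skip_hidden_refs(unsigned allow_uor)
{
	return !allow_hidden_refs(allow_uor);
}
```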

In that case, `upload-pack` can potentially avoid enumerating a large
number of references. Using the same example as the previous commit
(linux.git with one hidden reference per commit, "refs/pull/N"):

    $ printf 0000 >in
    $ hyperfine --warmup=1 \
      'git -c transfer.hideRefs=refs/pull upload-pack . <in' \
      'git.compile -c transfer.hideRefs=refs/pull -c uploadpack.allowTipSHA1InWant upload-pack . <in' \
      'git.compile -c transfer.hideRefs=refs/pull upload-pack . <in'
    Benchmark 1: git -c transfer.hideRefs=refs/pull upload-pack . <in
      Time (mean ± σ):     406.9 ms ±   1.1 ms    [User: 357.3 ms, System: 49.5 ms]
      Range (min … max):   405.7 ms … 409.2 ms    10 runs

    Benchmark 2: git.compile -c transfer.hideRefs=refs/pull -c uploadpack.allowTipSHA1InWant upload-pack . <in
      Time (mean ± σ):     406.5 ms ±   1.3 ms    [User: 356.5 ms, System: 49.9 ms]
      Range (min … max):   404.6 ms … 408.8 ms    10 runs

    Benchmark 3: git.compile -c transfer.hideRefs=refs/pull upload-pack . <in
      Time (mean ± σ):       4.7 ms ±   0.2 ms    [User: 0.7 ms, System: 3.9 ms]
      Range (min … max):     4.3 ms …   6.1 ms    472 runs

    Summary
      'git.compile -c transfer.hideRefs=refs/pull upload-pack . <in' ran
       86.62 ± 4.33 times faster than 'git.compile -c transfer.hideRefs=refs/pull -c uploadpack.allowTipSHA1InWant upload-pack . <in'
       86.70 ± 4.33 times faster than 'git -c transfer.hideRefs=refs/pull upload-pack . <in'

As the second benchmark shows, we must still visit every reference when
`uploadpack.allowTipSHA1InWant` is set. But when it is unset, we can
visit far fewer references.

Signed-off-by: Taylor Blau <me@ttaylorr.com>
---
 upload-pack.c | 33 +++++++++++++++++++++++++++------
 1 file changed, 27 insertions(+), 6 deletions(-)

diff --git a/upload-pack.c b/upload-pack.c
index 99d216938c..366a101d8d 100644
--- a/upload-pack.c
+++ b/upload-pack.c
@@ -602,11 +602,32 @@ static int get_common_commits(struct upload_pack_data *data,
 	}
 }
 
+static int allow_hidden_refs(enum allow_uor allow_uor)
+{
+	return allow_uor & (ALLOW_TIP_SHA1 | ALLOW_REACHABLE_SHA1);
+}
+
+static void for_each_namespaced_ref_1(each_ref_fn fn,
+				      struct upload_pack_data *data)
+{
+	/*
+	 * If `data->allow_uor` allows fetching hidden refs, we need to
+	 * mark all references (including hidden ones), to check in
+	 * `is_our_ref()` below.
+	 *
+	 * Otherwise, we only care about whether each reference's object
+	 * has the OUR_REF bit set or not, so do not need to visit
+	 * hidden references.
+	 */
+	if (allow_hidden_refs(data->allow_uor))
+		for_each_namespaced_ref(NULL, fn, data);
+	else
+		for_each_namespaced_ref(data->hidden_refs.v, fn, data);
+}
+
 static int is_our_ref(struct object *o, enum allow_uor allow_uor)
 {
-	int allow_hidden_ref = (allow_uor &
-				(ALLOW_TIP_SHA1 | ALLOW_REACHABLE_SHA1));
-	return o->flags & ((allow_hidden_ref ? HIDDEN_REF : 0) | OUR_REF);
+	return o->flags & ((allow_hidden_refs(allow_uor) ? HIDDEN_REF : 0) | OUR_REF);
 }
 
 /*
@@ -855,7 +876,7 @@ static void deepen(struct upload_pack_data *data, int depth)
 		 * marked with OUR_REF.
 		 */
 		head_ref_namespaced(check_ref, data);
-		for_each_namespaced_ref(NULL, check_ref, data);
+		for_each_namespaced_ref_1(check_ref, data);
 
 		get_reachable_list(data, &reachable_shallows);
 		result = get_shallow_commits(&reachable_shallows,
@@ -1386,7 +1407,7 @@ void upload_pack(const int advertise_refs, const int stateless_rpc,
 		if (advertise_refs)
 			data.no_done = 1;
 		head_ref_namespaced(send_ref, &data);
-		for_each_namespaced_ref(NULL, send_ref, &data);
+		for_each_namespaced_ref_1(send_ref, &data);
 		if (!data.sent_capabilities) {
 			const char *refname = "capabilities^{}";
 			write_v0_ref(&data, refname, refname, null_oid());
@@ -1400,7 +1421,7 @@ void upload_pack(const int advertise_refs, const int stateless_rpc,
 		packet_flush(1);
 	} else {
 		head_ref_namespaced(check_ref, &data);
-		for_each_namespaced_ref(NULL, check_ref, &data);
+		for_each_namespaced_ref_1(check_ref, &data);
 	}
 
 	if (!advertise_refs) {
-- 
2.41.0.16.g26cd413590


* [PATCH v3 16/16] ls-refs.c: avoid enumerating hidden refs where possible
  2023-06-07 10:40 ` [PATCH v3 " Taylor Blau
                     ` (14 preceding siblings ...)
  2023-06-07 10:42   ` [PATCH v3 15/16] upload-pack.c: avoid enumerating hidden refs where possible Taylor Blau
@ 2023-06-07 10:42   ` Taylor Blau
  2023-06-12 21:05   ` [PATCH v3 00/16] refs: implement jump lists for packed backend Junio C Hamano
  16 siblings, 0 replies; 149+ messages in thread
From: Taylor Blau @ 2023-06-07 10:42 UTC (permalink / raw)
  To: git
  Cc: Chris Torek, Derrick Stolee, Jeff King, Junio C Hamano,
	Patrick Steinhardt

In a similar fashion as in previous commits, teach `ls-refs` to avoid
enumerating hidden references where possible.

As before, this is linux.git with one hidden reference per commit.

    $ hyperfine -L v ,.compile 'git{v} -c protocol.version=2 ls-remote .'
    Benchmark 1: git -c protocol.version=2 ls-remote .
      Time (mean ± σ):      89.8 ms ±   0.6 ms    [User: 84.3 ms, System: 5.7 ms]
      Range (min … max):    88.8 ms …  91.3 ms    32 runs

    Benchmark 2: git.compile -c protocol.version=2 ls-remote .
      Time (mean ± σ):       6.5 ms ±   0.1 ms    [User: 2.4 ms, System: 4.3 ms]
      Range (min … max):     6.2 ms …   8.3 ms    397 runs

    Summary
      'git.compile -c protocol.version=2 ls-remote .' ran
       13.85 ± 0.33 times faster than 'git -c protocol.version=2 ls-remote .'

Signed-off-by: Taylor Blau <me@ttaylorr.com>
---
 ls-refs.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/ls-refs.c b/ls-refs.c
index 8c3181d051..c9a723ba89 100644
--- a/ls-refs.c
+++ b/ls-refs.c
@@ -193,7 +193,7 @@ int ls_refs(struct repository *r, struct packet_reader *request)
 		strvec_push(&data.prefixes, "");
 	refs_for_each_fullref_in_prefixes(get_main_ref_store(r),
 					  get_git_namespace(), data.prefixes.v,
-					  NULL, send_ref, &data);
+					  data.hidden_refs.v, send_ref, &data);
 	packet_fflush(stdout);
 	strvec_clear(&data.prefixes);
 	strbuf_release(&data.buf);
-- 
2.41.0.16.g26cd413590

* Re: [PATCH v3 00/16] refs: implement jump lists for packed backend
  2023-06-07 10:40 ` [PATCH v3 " Taylor Blau
                     ` (15 preceding siblings ...)
  2023-06-07 10:42   ` [PATCH v3 16/16] ls-refs.c: " Taylor Blau
@ 2023-06-12 21:05   ` Junio C Hamano
  16 siblings, 0 replies; 149+ messages in thread
From: Junio C Hamano @ 2023-06-12 21:05 UTC (permalink / raw)
  To: Taylor Blau
  Cc: git, Chris Torek, Derrick Stolee, Jeff King, Patrick Steinhardt

Taylor Blau <me@ttaylorr.com> writes:

> Here is a reroll of my series to implement jump (née skip) lists for the
> packed refs backend, based on top of the current 'master'.

Hmph.  I kind of liked Patrick's suggestion to split this into two
series to make it easier for the earlier half to graduate faster,
but perhaps it didn't make much difference?  I certainly did not get
the impression that "review had stabilized".  During my review of
the initial round, for example, I lost steam in the middle because
it was simply too long a series and didn't have a chance to give the
remainder a proper review.  I do not know about others.


* Re: [PATCH v3 01/16] refs.c: rename `ref_filter`
  2023-06-07 10:40   ` [PATCH v3 01/16] refs.c: rename `ref_filter` Taylor Blau
@ 2023-06-13 22:19     ` Junio C Hamano
  0 siblings, 0 replies; 149+ messages in thread
From: Junio C Hamano @ 2023-06-13 22:19 UTC (permalink / raw)
  To: Taylor Blau
  Cc: git, Chris Torek, Derrick Stolee, Jeff King, Patrick Steinhardt

Taylor Blau <me@ttaylorr.com> writes:

> -static int filter_refs(const char *refname, const struct object_id *oid,
> -			   int flags, void *data)
> +static int for_each_filter_refs(const char *refname,
> +				const struct object_id *oid,
> +				int flags, void *data)
>  {
> -	struct ref_filter *filter = (struct ref_filter *)data;
> +	struct for_each_ref_filter *filter = data;

Nice to see that a trivial and obvious clean-up like this is
silently included ;-)

* Re: [PATCH v3 05/16] ref-filter.c: parameterize match functions over patterns
  2023-06-07 10:41   ` [PATCH v3 05/16] ref-filter.c: parameterize match functions over patterns Taylor Blau
@ 2023-06-13 22:37     ` Junio C Hamano
  0 siblings, 0 replies; 149+ messages in thread
From: Junio C Hamano @ 2023-06-13 22:37 UTC (permalink / raw)
  To: Taylor Blau
  Cc: git, Chris Torek, Derrick Stolee, Jeff King, Patrick Steinhardt

Taylor Blau <me@ttaylorr.com> writes:

> Once we start passing either in, `match_pattern()` will have little to
> do with a particular `struct ref_filter *` instance. To clarify this,
> drop it from the argument list, and replace it with the only bit of the
> `ref_filter` that we care about (`filter->ignore_case`).

Makes sense.

* Re: [PATCH v3 07/16] refs: plumb `exclude_patterns` argument throughout
  2023-06-07 10:41   ` [PATCH v3 07/16] refs: plumb `exclude_patterns` argument throughout Taylor Blau
@ 2023-06-13 23:42     ` Junio C Hamano
  2023-06-20 11:52       ` Taylor Blau
  0 siblings, 1 reply; 149+ messages in thread
From: Junio C Hamano @ 2023-06-13 23:42 UTC (permalink / raw)
  To: Taylor Blau
  Cc: git, Chris Torek, Derrick Stolee, Jeff King, Patrick Steinhardt

Taylor Blau <me@ttaylorr.com> writes:

> diff --git a/ls-refs.c b/ls-refs.c
> index f385938b64..6f490b2d9c 100644
> --- a/ls-refs.c
> +++ b/ls-refs.c
> @@ -193,7 +193,7 @@ int ls_refs(struct repository *r, struct packet_reader *request)
>  		strvec_push(&data.prefixes, "");
>  	refs_for_each_fullref_in_prefixes(get_main_ref_store(r),
>  					  get_git_namespace(), data.prefixes.v,
> -					  send_ref, &data);
> +					  NULL, send_ref, &data);

OK.

> diff --git a/ref-filter.c b/ref-filter.c
> index d44418efb7..717c3c4bcf 100644
> --- a/ref-filter.c
> +++ b/ref-filter.c
> @@ -2209,12 +2209,13 @@ static int for_each_fullref_in_pattern(struct ref_filter *filter,
>  
>  	if (!filter->name_patterns[0]) {
>  		/* no patterns; we have to look at everything */
> -		return for_each_fullref_in("", cb, cb_data);
> +		return refs_for_each_fullref_in(get_main_ref_store(the_repository),
> +						 "", NULL, cb, cb_data);
>  	}

Is this merely "while at it", or was there a reason why refs_*
variant must be used here?  It is curious that we do not teach the
exclude_patterns to some functions like for_each_fullref_in() while
adding exclude_patterns to others, making the API surface uneven.


* Re: [PATCH v3 09/16] refs/packed-backend.c: implement jump lists to avoid excluded pattern(s)
  2023-06-07 10:41   ` [PATCH v3 09/16] refs/packed-backend.c: implement jump lists to avoid excluded pattern(s) Taylor Blau
@ 2023-06-14  0:27     ` Junio C Hamano
  2023-06-20 12:05       ` Taylor Blau
  0 siblings, 1 reply; 149+ messages in thread
From: Junio C Hamano @ 2023-06-14  0:27 UTC (permalink / raw)
  To: Taylor Blau
  Cc: git, Chris Torek, Derrick Stolee, Jeff King, Patrick Steinhardt

Taylor Blau <me@ttaylorr.com> writes:

> When iterating through the `packed-refs` file in order to answer a query
> like:
>
>     $ git for-each-ref --exclude=refs/__hidden__
>
> it would be useful to avoid walking over all of the entries in
> `refs/__hidden__/*` when possible, since we know that the ref-filter
> code is going to throw them away anyways.
>
> In certain circumstances, doing so is possible. The algorithm for doing
> so is as follows:
>
>   - For each excluded pattern, find the first record that matches it,
>     and the first record that *doesn't* match it (i.e. the location
>     you'd next want to consider when excluding that pattern).
>
>   - Sort the set of excluded regions from the previous step in ascending
>     order of the first location within the `packed-refs` file that
>     matches.
>
>   - Clean up the results from the previous step: discard empty regions,
>     and combine adjacent regions.

Say something like

   The resulting list of regions that would never contain refs that
   are not excluded is called the "jump list".

here, probably.  Otherwise the first reference to "jump list" we see
later feels a bit too abrupt.

> Then when iterating through the `packed-refs` file, if `iter->pos` is
> ever contained in one of the regions from the previous steps, advance
> `iter->pos` past the end of that region, and continue enumeration.
>
> Note that we only perform this optimization when none of the excluded
> pattern(s) have special meta-characters in them. For a pattern like
> "refs/foo[ac]", the excluded regions ("refs/fooa", "refs/fooc", and
> everything underneath them) are not connected. A future implementation
> that handles this case may split the character class (pretending as if
> two patterns were excluded: "refs/fooa", and "refs/fooc").

Makes sense.
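
To make the region-finding step concrete, here is a minimal, self-contained sketch. It is not the code from the patch: it models the `packed-refs` snapshot as a sorted array of refnames, represents each region as a half-open range of indices rather than a pair of pointers into the mmap'd buffer, and assumes plain string-prefix matching. `struct region`, `region_start()`, `region_end()` and `find_region()` are hypothetical names. Because the array is sorted, every name that begins with a literal prefix falls in one contiguous run, so two binary searches are enough to locate it.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical model: a region is the half-open index range [start, end). */
struct region {
	size_t start;
	size_t end;
};

/* Index of the first refname that does not sort before `prefix`. */
static size_t region_start(const char **refs, size_t nr, const char *prefix)
{
	size_t lo = 0, hi = nr;

	while (lo < hi) {
		size_t mid = lo + (hi - lo) / 2;
		if (strcmp(refs[mid], prefix) < 0)
			lo = mid + 1;
		else
			hi = mid;
	}
	return lo;
}

/*
 * Index of the first refname that neither sorts before `prefix` nor
 * starts with it. In sorted order, any name that is >= prefix but does
 * not start with it sorts after every name that does, so the predicate
 * below is true for a leading run of the array and false afterwards.
 */
static size_t region_end(const char **refs, size_t nr, const char *prefix)
{
	size_t lo = 0, hi = nr, len = strlen(prefix);

	while (lo < hi) {
		size_t mid = lo + (hi - lo) / 2;
		if (strcmp(refs[mid], prefix) < 0 ||
		    !strncmp(refs[mid], prefix, len))
			lo = mid + 1;
		else
			hi = mid;
	}
	return lo;
}

/* Locate the contiguous run of refnames beginning with `prefix`. */
static struct region find_region(const char **refs, size_t nr,
				 const char *prefix)
{
	struct region r;

	r.start = region_start(refs, nr, prefix);
	r.end = region_end(refs, nr, prefix);
	assert(r.start <= r.end); /* an empty region has start == end */
	return r;
}
```

For the refs `refs/heads/main`, `refs/pull/1`, `refs/pull/2`, `refs/pull/3` and `refs/tags/v1.0` (in that order), `find_region(refs, 5, "refs/pull/")` yields the range [1, 4), i.e. exactly the three `refs/pull/` entries.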

> There are a few other gotchas worth considering. First, note that the
> jump list is sorted, so once we jump past a region, we can avoid
> considering it (or any regions preceding it) again. The member
> `jump_pos` is used to track the first next-possible region to jump
> through.
>
> Second, note that the exclusion list is best-effort, since we do not
> handle loose references, and because of the meta-character issue above.

I found this a bit misleading; a natural reading of "exclusion list"
is that the phrase refers to the list of exclude patterns given from
the command line, and users would be upset if the processing of it
is best effort.

I think what you meant to say was that optimization to avoid full
scan using the jump list does not aim for perfection, and entries
that are not skipped using the jump list may still be excluded by
the exclude patterns.

> In repositories with a large number of hidden references, the speed-up

"hidden" -> "excluded".  Your final objective to implement the
feature to exclude refs using patterns and optimize it using the
jump list data may be to implement "hidden references", but that
hasn't been talked about in the above.  All we have been hearing was
that we are optimizing the walk over packed-refs using exclude
patterns.

> can be significant. Tests here are done with a copy of linux.git with a
> reference "refs/pull/N" pointing at every commit, as in:
> ...
> Co-authored-by: Jeff King <peff@peff.net>
> Signed-off-by: Jeff King <peff@peff.net>
> Signed-off-by: Taylor Blau <me@ttaylorr.com>
> ---
>  ref-filter.c              |   5 +-
>  refs/packed-backend.c     | 166 ++++++++++++++++++++++++++++++++++++--
>  t/helper/test-ref-store.c |  10 +++
>  t/t1419-exclude-refs.sh   | 101 +++++++++++++++++++++++
>  4 files changed, 274 insertions(+), 8 deletions(-)
>  create mode 100755 t/t1419-exclude-refs.sh

Nice.

> @@ -785,6 +802,13 @@ struct packed_ref_iterator {
>  	/* The end of the part of the buffer that will be iterated over: */
>  	const char *eof;
>  
> +	struct jump_list_entry {
> +		const char *start;
> +		const char *end;
> +	} *jump;
> +	size_t jump_nr, jump_alloc;
> +	size_t jump_pos;
> +
>  	/* Scratch space for current values: */
>  	struct object_id oid, peeled;
>  	struct strbuf refname_buf;
> @@ -802,14 +826,34 @@ struct packed_ref_iterator {
>   */
>  static int next_record(struct packed_ref_iterator *iter)
>  {
> -	const char *p = iter->pos, *eol;
> +	const char *p, *eol;
>  
>  	strbuf_reset(&iter->refname_buf);
>  
> +	/*
> +	 * If iter->pos is contained within a skipped region, jump past
> +	 * it.
> +	 *
> +	 * Note that each skipped region is considered at most once,
> +	 * since they are ordered based on their starting position.
> +	 */
> +	while (iter->jump_pos < iter->jump_nr) {
> +		struct jump_list_entry *curr = &iter->jump[iter->jump_pos];
> +		if (iter->pos < curr->start)
> +			break; /* not to the next jump yet */
> +
> +		iter->jump_pos++;
> +		if (iter->pos < curr->end) {
> +			iter->pos = curr->end;
> +			break;
> +		}
> +	}

Quite straight-forward.
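
The same loop can be exercised outside of the packed-refs backend. The sketch below is a hypothetical, index-based model of `next_record()`: records are numbered 0 to `nr - 1`, the jump list is assumed to be already sorted and merged, and the cursor is called `jump_cur` (the name later adopted in this thread). `struct iter` and `struct jump_entry` are illustrative types, not Git's.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical index-based jump-list entry: skip records in [start, end). */
struct jump_entry {
	size_t start;
	size_t end;
};

struct iter {
	size_t pos;                    /* next record to consider */
	size_t nr;                     /* total number of records */
	const struct jump_entry *jump; /* sorted, merged jump list */
	size_t jump_nr;
	size_t jump_cur;               /* first entry we may still jump through */
};

/*
 * Return the index of the next record to yield, or `it->nr` once the
 * iteration is exhausted. Entries are sorted by start, so each one is
 * consulted at most once.
 */
static size_t next_record(struct iter *it)
{
	while (it->jump_cur < it->jump_nr) {
		const struct jump_entry *curr = &it->jump[it->jump_cur];

		assert(curr->start <= curr->end);
		if (it->pos < curr->start)
			break; /* not at the next jump yet */

		it->jump_cur++;
		if (it->pos < curr->end) {
			it->pos = curr->end; /* land just past the region */
			break;
		}
	}
	if (it->pos >= it->nr)
		return it->nr;
	return it->pos++;
}
```

With ten records and the jump list [2, 5), [7, 9), successive calls yield 0, 1, 5, 6, 9 and then 10 (exhausted). The `break` after a jump is only safe because overlapping entries were merged beforehand; otherwise the landing position could still fall inside a later region.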

> +static const char *ptr_max(const char *x, const char *y)
> +{
> +	if (x > y)
> +		return x;
> +	return y;
> +}

Hopefully the compiler would inline the function without being told.

These pointers point into the same mmapped region of contiguous
memory that holds the contents of the packed-refs file, so
comparison between them is always defined.  Good.

I wondered if

	return (x > y) ? x : y;

is easier to read, simply because it treats both cases more equally
(in other words, as written, (x>y) appears more "special"), but that
is minor.

> +static void populate_excluded_jump_list(struct packed_ref_iterator *iter,
> +					struct snapshot *snapshot,
> +					const char **excluded_patterns)
> +{
> +	size_t i, j;
> +	const char **pattern;
> +	struct jump_list_entry *last_disjoint;
> +
> +	if (!excluded_patterns)
> +		return;
> +
> +	for (pattern = excluded_patterns; *pattern; pattern++) {
> +		struct jump_list_entry *e;
> +
> +		/*
> +		 * We can't feed any excludes with globs in them to the
> +		 * refs machinery.  It only understands prefix matching.
> +		 * We likewise can't even feed the string leading up to
> +		 * the first meta-character, as something like "foo[a]"
> +		 * should not exclude "foobar" (but the prefix "foo"
> +		 * would match that and mark it for exclusion).
> +		 */
> +		if (has_glob_special(*pattern))
> +			continue;

OK.  When we have a "literal" exclude pattern and another "glob"
exclude pattern, we can afford to ignore the "glob" one when
building the jump list and include only the "literal" one, because
the jump list is used only to skip over entries that obviously can
never be in the result, *and* --exclude are additive (i.e. being
on jump list because of the "literal" pattern is a reason enough to
be excluded from the result; matching or not matching the other
patterns does not affect the fate of the ref that got excluded due
to matching the "literal" pattern).

Makes sense.
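
The filtering rule can be sketched as follows. This is an illustration under stated assumptions rather than the patch's code: `has_glob_meta()` and `literal_excludes()` are hypothetical, and the meta-character set `*?[\` is assumed. Glob patterns are dropped outright, not truncated, for the reason given in the comment quoted above.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical: true if `pattern` contains an (assumed) glob meta-character. */
static bool has_glob_meta(const char *pattern)
{
	return strpbrk(pattern, "*?[\\") != NULL;
}

/*
 * Copy only the literal patterns into `out` and return how many were
 * kept. A glob like "refs/foo[ac]" is skipped entirely: truncating it to
 * "refs/foo" would wrongly cover "refs/foobar". Skipping is safe because
 * the jump list only needs to cover refs that some exclude pattern
 * removes anyway; the globs are still applied when filtering.
 */
static size_t literal_excludes(const char **patterns, size_t nr,
			       const char **out)
{
	size_t i, n = 0;

	for (i = 0; i < nr; i++)
		if (!has_glob_meta(patterns[i]))
			out[n++] = patterns[i];
	return n;
}
```

Given `refs/pull/`, `refs/foo[ac]`, `refs/tmp/*` and `refs/keep`, only `refs/pull/` and `refs/keep` contribute to the jump list.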

> +		ALLOC_GROW(iter->jump, iter->jump_nr + 1, iter->jump_alloc);
> +
> +		e = &iter->jump[iter->jump_nr++];
> +		e->start = find_reference_location(snapshot, *pattern, 0);
> +		e->end = find_reference_location_end(snapshot, *pattern, 0);
> +	}
> +
> +	if (!iter->jump_nr) {
> +		/*
> +		 * Every entry in exclude_patterns has a meta-character,
> +		 * nothing to do here.
> +		 */
> +		return;
> +	}

OK.

Thanks.

* Re: [PATCH v3 10/16] refs/packed-backend.c: add trace2 counters for jump list
  2023-06-07 10:41   ` [PATCH v3 10/16] refs/packed-backend.c: add trace2 counters for jump list Taylor Blau
@ 2023-06-14  0:32     ` Junio C Hamano
  2023-06-20 12:08       ` Taylor Blau
  0 siblings, 1 reply; 149+ messages in thread
From: Junio C Hamano @ 2023-06-14  0:32 UTC (permalink / raw)
  To: Taylor Blau
  Cc: git, Chris Torek, Derrick Stolee, Jeff King, Patrick Steinhardt

Taylor Blau <me@ttaylorr.com> writes:

> +assert_jumps () {
> +	local nr="$1"
> +	local trace="$2"
> +
> +	grep -q "name:jumps_made value:$nr" $trace
> +}

You may expect 1 hit and the test will pass if trace had 10 or 12 hits.
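
The underlying problem is that `grep` performs a substring match, so a pattern ending in `value:1` also matches `value:10` or `value:12`. The C sketch below models a stricter check. `counter_matches()` is a hypothetical helper, and the only assumption about the trace line is the `name:<counter> value:<n>` layout used by the quoted grep pattern, with the match bounded by whitespace or the ends of the line.

```c
#include <ctype.h>
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical: true if `line` contains "name:<name> value:<want>"
 * bounded by whitespace (or the ends of the string) on both sides, so
 * that an expected value of "1" does not accept "10" or "12".
 */
static bool counter_matches(const char *line, const char *name,
			    const char *want)
{
	char needle[256];
	const char *p = line;
	size_t len;
	int n = snprintf(needle, sizeof(needle), "name:%s value:%s",
			 name, want);

	if (n < 0 || (size_t)n >= sizeof(needle))
		return false; /* needle does not fit; treat as no match */
	len = (size_t)n;

	while ((p = strstr(p, needle)) != NULL) {
		bool before_ok = p == line ||
				 isspace((unsigned char)p[-1]);
		bool after_ok = p[len] == '\0' ||
				isspace((unsigned char)p[len]);

		if (before_ok && after_ok)
			return true;
		p++; /* partial hit; keep scanning */
	}
	return false;
}
```

With this check, a line containing `name:jumps_made value:12` no longer satisfies an expectation of `1`, while it still satisfies an expectation of `12`.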


* Re: [PATCH v3 12/16] refs/packed-backend.c: ignore complicated hidden refs rules
  2023-06-07 10:41   ` [PATCH v3 12/16] refs/packed-backend.c: ignore complicated hidden refs rules Taylor Blau
@ 2023-06-14  0:40     ` Junio C Hamano
  0 siblings, 0 replies; 149+ messages in thread
From: Junio C Hamano @ 2023-06-14  0:40 UTC (permalink / raw)
  To: Taylor Blau
  Cc: git, Chris Torek, Derrick Stolee, Jeff King, Patrick Steinhardt

Taylor Blau <me@ttaylorr.com> writes:

> In subsequent commits, we'll teach `receive-pack` and `upload-pack` to
> use the new skip-list feature in the packed-refs iterator by ignoring

"skip" -> "jump".

> references which are mentioned via its respective hideRefs lists.
>
> However, the packed-ref skip lists cannot handle un-hiding rules (that
> begin with '!'), or namespace comparisons (that begin with '^'). Detect
> and avoid these cases by falling back to the normal enumeration without
> a skip list when such patterns exist.

> +		 * We also can't feed any excludes from hidden refs
> +		 * config sections, since later rules may override
> +		 * previous ones. For example, with rules "refs/foo" and
> +		 * "!refs/foo/bar", we should show "refs/foo/bar" (and
> +		 * everything underneath it), but the earlier exclusion
> +		 * would cause us to skip all of "refs/foo".

Good observation.  The presence of !refs/foo/bar in hide list
forbids us from adding refs/foo to the jump list, and it is the
simplest to disable the whole jump list business when we have such a
feature in use.

Makes sense.  Thanks.
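
A minimal sketch of that all-or-nothing gate, assuming the hideRefs values are available as a plain array of strings. `hide_rules_allow_jump_list()` is a hypothetical name, not the function in the patch. It only inspects each rule's first character, matching the two cases the commit message calls out: un-hiding rules (`!`) and namespace comparisons (`^`).

```c
#include <stdbool.h>
#include <stddef.h>

/*
 * Hypothetical: a hideRefs list may seed the jump list only if no rule
 * un-hides ('!') or compares against the namespaced refname ('^').
 * A single such rule disables the jump list for the whole enumeration.
 */
static bool hide_rules_allow_jump_list(const char **rules, size_t nr)
{
	size_t i;

	for (i = 0; i < nr; i++)
		if (rules[i][0] == '!' || rules[i][0] == '^')
			return false;
	return true;
}
```

Rules such as `refs/pull/` and `refs/tmp/` pass; a list containing `!refs/foo/bar` or `^refs/namespaces/a/refs/pull` disables the jump list. Disabling it costs only speed, since the regular hidden-ref check still runs on each enumerated ref.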

* Re: [PATCH v3 07/16] refs: plumb `exclude_patterns` argument throughout
  2023-06-13 23:42     ` Junio C Hamano
@ 2023-06-20 11:52       ` Taylor Blau
  0 siblings, 0 replies; 149+ messages in thread
From: Taylor Blau @ 2023-06-20 11:52 UTC (permalink / raw)
  To: Junio C Hamano
  Cc: git, Chris Torek, Derrick Stolee, Jeff King, Patrick Steinhardt

On Tue, Jun 13, 2023 at 04:42:05PM -0700, Junio C Hamano wrote:
> > diff --git a/ref-filter.c b/ref-filter.c
> > index d44418efb7..717c3c4bcf 100644
> > --- a/ref-filter.c
> > +++ b/ref-filter.c
> > @@ -2209,12 +2209,13 @@ static int for_each_fullref_in_pattern(struct ref_filter *filter,
> >
> >  	if (!filter->name_patterns[0]) {
> >  		/* no patterns; we have to look at everything */
> > -		return for_each_fullref_in("", cb, cb_data);
> > +		return refs_for_each_fullref_in(get_main_ref_store(the_repository),
> > +						 "", NULL, cb, cb_data);
> >  	}
>
> Is this merely "while at it", or was there a reason why refs_*
> variant must be used here?

Good question. This changes later in the series (via
"refs/packed-backend.c: implement jump lists to avoid excluded
pattern(s)") to pass the excluded patterns from the ref_filter.

There's no need to change it in this patch, since the functionality at
this point is equivalent in the pre- and post-image. I think this
staging is a consequence of having written much of this series before
committing anything, and then trying to segment it out into meaningful
patches after the fact.

> It is curious that we do not teach the exclude_patterns to some
> functions like for_each_fullref_in() while adding exclude_patterns to
> others, making the API surface uneven.

We could plumb it through in more places, but my preference would be to
modify the API if/as new callers need to pass a list of excluded
patterns. The API has an enormous amount of surface area (and many
functions which take a ton of arguments), so I'd rather keep it small as
long as possible, even at the expense of some unevenness in its
interface.

Thanks,
Taylor

* Re: [PATCH v3 09/16] refs/packed-backend.c: implement jump lists to avoid excluded pattern(s)
  2023-06-14  0:27     ` Junio C Hamano
@ 2023-06-20 12:05       ` Taylor Blau
  2023-06-20 18:49         ` Junio C Hamano
  0 siblings, 1 reply; 149+ messages in thread
From: Taylor Blau @ 2023-06-20 12:05 UTC (permalink / raw)
  To: Junio C Hamano
  Cc: git, Chris Torek, Derrick Stolee, Jeff King, Patrick Steinhardt

On Tue, Jun 13, 2023 at 05:27:05PM -0700, Junio C Hamano wrote:
> >   - Clean up the results from the previous step: discard empty regions,
> >     and combine adjacent regions.
>
> Say something like
>
>    The resulting list of regions that would never contain refs that
>    are not excluded is called the "jump list".
>
> here, probably.  Otherwise the first reference to "jump list" we see
> later feels a bit too abrupt.

Good suggestion. I phrased it slightly differently in the version that
I'll send shortly, but the spirit is the same.

> > There are a few other gotchas worth considering. First, note that the
> > jump list is sorted, so once we jump past a region, we can avoid
> > considering it (or any regions preceding it) again. The member
> > `jump_pos` is used to track the first next-possible region to jump
> > through.
> >
> > Second, note that the exclusion list is best-effort, since we do not
> > handle loose references, and because of the meta-character issue above.
>
> I found this a bit misleading; a natural reading of "exclusion list"
> is that the phrase refers to the list of exclude patterns given from
> the command line, and users would be upset if the processing of it
> is best effort.
>
> I think what you meant to say was that optimization to avoid full
> scan using the jump list does not aim for perfection, and entries
> that are not skipped using the jump list may still be excluded by
> the exclude patterns.

Yep, exactly. I think this was from an earlier version from before this
was called "jump lists", and I missed it when find-and-replacing
through the patch messages. Thanks for spotting.

> > In repositories with a large number of hidden references, the speed-up
>
> "hidden" -> "excluded".  Your final objective to implement the
> feature to exclude refs using patterns and optimize it using the
> jump list data may be to implement "hidden references", but that
> hasn't been talked about in the above.  All we have been hearing was
> that we are optimizing the walk over packed-refs using exclude
> patterns.

Yep, thanks.

> > +static const char *ptr_max(const char *x, const char *y)
> > +{
> > +	if (x > y)
> > +		return x;
> > +	return y;
> > +}
>
> Hopefully the compiler would inline the function without being told.
>
> These pointers point into the same mmapped region of contiguous
> memory that holds the contents of the packed-refs file, so
> comparison between them is always defined.  Good.
>
> I wondered if
>
> 	return (x > y) ? x : y;
>
> is easier to read, simply because it treats both cases more equally
> (in other words, as written, (x>y) appears more "special"), but that
> is minor.

Yeah, I think that any reasonable compiler would almost certainly inline
this, especially at higher optimization levels. But I agree with your
suggestion nonetheless, thanks.

> > +		/*
> > +		 * We can't feed any excludes with globs in them to the
> > +		 * refs machinery.  It only understands prefix matching.
> > +		 * We likewise can't even feed the string leading up to
> > +		 * the first meta-character, as something like "foo[a]"
> > +		 * should not exclude "foobar" (but the prefix "foo"
> > +		 * would match that and mark it for exclusion).
> > +		 */
> > +		if (has_glob_special(*pattern))
> > +			continue;
>
> OK.  When we have a "literal" exclude pattern and another "glob"
> exclude pattern, we can afford to ignore the "glob" one when
> building the jump list and include only the "literal" one, because
> the jump list is used only to skip over entries that obviously can
> never be in the result, *and* --exclude are additive (i.e. being
> on jump list because of the "literal" pattern is a reason enough to
> be excluded from the result; matching or not matching the other
> patterns does not affect the fate of the ref that got excluded due
> to matching the "literal" pattern).
>
> Makes sense.

Exactly.

Thanks,
Taylor

* Re: [PATCH v3 10/16] refs/packed-backend.c: add trace2 counters for jump list
  2023-06-14  0:32     ` Junio C Hamano
@ 2023-06-20 12:08       ` Taylor Blau
  0 siblings, 0 replies; 149+ messages in thread
From: Taylor Blau @ 2023-06-20 12:08 UTC (permalink / raw)
  To: Junio C Hamano
  Cc: git, Chris Torek, Derrick Stolee, Jeff King, Patrick Steinhardt

On Tue, Jun 13, 2023 at 05:32:16PM -0700, Junio C Hamano wrote:
> Taylor Blau <me@ttaylorr.com> writes:
>
> > +assert_jumps () {
> > +	local nr="$1"
> > +	local trace="$2"
> > +
> > +	grep -q "name:jumps_made value:$nr" $trace
> > +}
>
> You may expect 1 hit and the test will pass if trace had 10 or 12 hits.

Oops. Well spotted, thanks.

Thanks,
Taylor

* Re: [PATCH v2 09/16] refs/packed-backend.c: implement jump lists to avoid excluded pattern(s)
  2023-06-06  7:00     ` Patrick Steinhardt
@ 2023-06-20 12:15       ` Taylor Blau
  0 siblings, 0 replies; 149+ messages in thread
From: Taylor Blau @ 2023-06-20 12:15 UTC (permalink / raw)
  To: Patrick Steinhardt
  Cc: git, Chris Torek, Derrick Stolee, Jeff King, Junio C Hamano

On Tue, Jun 06, 2023 at 09:00:52AM +0200, Patrick Steinhardt wrote:
> > @@ -785,6 +802,13 @@ struct packed_ref_iterator {
> >  	/* The end of the part of the buffer that will be iterated over: */
> >  	const char *eof;
> >
> > +	struct jump_list_entry {
> > +		const char *start;
> > +		const char *end;
> > +	} *jump;
> > +	size_t jump_nr, jump_alloc;
> > +	size_t jump_pos;
> >
> Nit: I had some trouble with `jump_pos` given that it sounds so similar
> to `iter->pos`, and thus you tend to think that they both apply to the
> position in the packed-refs file. `jump_curr` or `jump_idx` might help
> to avoid this confusion.

Very fair, thanks for observing. I went with "jump_cur" (as a shorthand
for "cursor").

> > +	for (pattern = excluded_patterns; *pattern; pattern++) {
> > +		struct jump_list_entry *e;
> > +
> > +		/*
> > +		 * We can't feed any excludes with globs in them to the
> > +		 * refs machinery.  It only understands prefix matching.
> > +		 * We likewise can't even feed the string leading up to
> > +		 * the first meta-character, as something like "foo[a]"
> > +		 * should not exclude "foobar" (but the prefix "foo"
> > +		 * would match that and mark it for exclusion).
> > +		 */
> > +		if (has_glob_special(*pattern))
> > +			continue;
> > +
> > +		ALLOC_GROW(iter->jump, iter->jump_nr + 1, iter->jump_alloc);
> > +
> > +		e = &iter->jump[iter->jump_nr++];
> > +		e->start = find_reference_location(snapshot, *pattern, 0);
> > +		e->end = find_reference_location_end(snapshot, *pattern, 0);
>
> Nit: we could detect the non-matching case here already, which would
> allow us to skip an allocation. It's probably pre-mature optimization
> though, so please feel free to ignore.

Probably so. This allocation is so lightweight in comparison to all of
the other things that for-each-ref does throughout its execution that I
think shaving off a few allocations would make a negligible difference.

> > +	}
> > +
> > +	if (!iter->jump_nr) {
> > +		/*
> > +		 * Every entry in exclude_patterns has a meta-character,
> > +		 * nothing to do here.
> > +		 */
> > +		return;
> > +	}
> > +
> > +	QSORT(iter->jump, iter->jump_nr, jump_list_entry_cmp);
> > +
> > +	/*
> > +	 * As an optimization, merge adjacent entries in the jump list
> > +	 * to jump forwards as far as possible when entering a skipped
> > +	 * region.
> > +	 *
> > +	 * For example, if we have two skipped regions:
> > +	 *
> > +	 *	[[A, B], [B, C]]
> > +	 *
> > +	 * we want to combine that into a single entry jumping from A to
> > +	 * C.
> > +	 */
> > +	last_disjoint = iter->jump;
>
> Nit: if we initialized `j = 0`, then `last_disjoint` would always be
> equal to `iter->jump[j]`. We could then declare the variable inside of
> the loop to make it a bit easier to understand.

Sure, though we would then need to assign `iter->jump_nr = j + 1`, which
I think adds more confusion than inlining the variable is worth.
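
For comparison, here is a hypothetical, standalone version of the sort-and-merge step that tracks the last kept entry by index, along the lines Patrick suggested. It models regions as index ranges rather than pointers, uses Junio's `?:` form of the max helper, and drops empty regions. Here `j` is the number of entries kept so far, so it can be returned directly as the new length; whether that reads better than `last_disjoint` in the real code is a matter of taste.

```c
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical index-based model of a jump-list entry: [start, end). */
struct jump_list_entry {
	size_t start;
	size_t end;
};

static size_t max_size(size_t x, size_t y)
{
	return (x > y) ? x : y;
}

/* Order by start, then by end, for qsort(). */
static int jump_list_entry_cmp(const void *va, const void *vb)
{
	const struct jump_list_entry *a = va, *b = vb;

	if (a->start != b->start)
		return a->start < b->start ? -1 : 1;
	if (a->end != b->end)
		return a->end < b->end ? -1 : 1;
	return 0;
}

/*
 * Sort the regions, drop empty ones, and merge any region that overlaps
 * or touches the previously kept one (so [A, B] and [B, C] become
 * [A, C]). Returns the new number of entries.
 */
static size_t merge_jump_list(struct jump_list_entry *jump, size_t nr)
{
	size_t i, j = 0;

	if (!nr)
		return 0;
	qsort(jump, nr, sizeof(*jump), jump_list_entry_cmp);

	for (i = 0; i < nr; i++) {
		const struct jump_list_entry *ours = &jump[i];

		if (ours->start == ours->end)
			continue; /* empty region: nothing to skip */
		if (j && jump[j - 1].end >= ours->start)
			jump[j - 1].end = max_size(jump[j - 1].end, ours->end);
		else
			jump[j++] = *ours; /* j <= i, so this never clobbers unread input */
	}
	return j;
}
```

For the input [7, 9), [0, 5), [3, 3), [5, 6), [2, 4), [12, 12), the result is [0, 6), [7, 9).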

Thanks,
Taylor

* Re: [PATCH v2 11/16] revision.h: store hidden refs in a `strvec`
  2023-06-06  7:00     ` Patrick Steinhardt
@ 2023-06-20 12:16       ` Taylor Blau
  0 siblings, 0 replies; 149+ messages in thread
From: Taylor Blau @ 2023-06-20 12:16 UTC (permalink / raw)
  To: Patrick Steinhardt
  Cc: git, Chris Torek, Derrick Stolee, Jeff King, Junio C Hamano

On Tue, Jun 06, 2023 at 09:00:58AM +0200, Patrick Steinhardt wrote:
> On Mon, May 15, 2023 at 03:23:39PM -0400, Taylor Blau wrote:
> > In subsequent commits, it will be convenient to have a 'const char **'
> > of hidden refs (matching `transfer.hiderefs`, `uploadpack.hideRefs`,
> > etc.), instead of a `string_list`.
> >
> > Convert spots throughout the tree that store the list of hidden refs
> > from a `string_list` to a `strvec`.
> >
> > Note that in `parse_hide_refs_config()` there is an ugly const-cast used
> > to avoid an extra copy of each value before trimming any trailing slash
> > characters. This could instead be written as:
> >
> >     ref = xstrdup(value);
> >     len = strlen(ref);
> >     while (len && ref[len - 1] == '/')
> >             ref[--len] = '\0';
> >     strvec_push(hide_refs, ref);
> >     free(ref);
> >
> > but the double-copy (once when calling `xstrdup()`, and another via
> > `strvec_push()`) is wasteful.
>
> I guess the proper way to fix this would be to introduce something like
> a `strvec_push_nodup()` function that takes ownership. And in fact this
> helper exists already, but it's declared as static. So we could get
> around the ugly cast with a simple change to expose the helper function.

We could, but I'd prefer to explore doing so outside of this series,
since I think strvec_push_nodup() is a little bit of a footgun unless
you are thinking carefully about ownership. So making the function
part of the exposed API may be controversial.
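
One way to sidestep both the const-cast and the double copy, without exposing `strvec_push_nodup()`, is to measure the trimmed length on the const input and copy exactly once. The sketch below is a hypothetical standalone helper, not code from the series. Inside Git, a similar effect might be had with `strvec_pushf()` and a `%.*s` format, though that is an untested assumption.

```c
#include <stdlib.h>
#include <string.h>

/*
 * Hypothetical: return a newly allocated copy of `value` with any
 * trailing '/' characters removed, or NULL on allocation failure.
 * The input is only read, so no const-cast is needed, and the bytes
 * are copied exactly once. The caller owns (and must free) the result.
 */
static char *dup_without_trailing_slashes(const char *value)
{
	size_t len = strlen(value);
	char *ret;

	while (len && value[len - 1] == '/')
		len--;

	ret = malloc(len + 1);
	if (!ret)
		return NULL;
	memcpy(ret, value, len);
	ret[len] = '\0';
	return ret;
}
```

Handing the result to a container that then takes ownership of it is exactly where a `_nodup()`-style push would come in, which is the ownership question raised above.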

Thanks,
Taylor

* Re: [PATCH v2 13/16] refs.h: let `for_each_namespaced_ref()` take excluded patterns
  2023-06-06  7:01     ` Patrick Steinhardt
@ 2023-06-20 12:18       ` Taylor Blau
  0 siblings, 0 replies; 149+ messages in thread
From: Taylor Blau @ 2023-06-20 12:18 UTC (permalink / raw)
  To: Patrick Steinhardt
  Cc: git, Chris Torek, Derrick Stolee, Jeff King, Junio C Hamano

On Tue, Jun 06, 2023 at 09:01:04AM +0200, Patrick Steinhardt wrote:
> On Mon, May 15, 2023 at 03:23:45PM -0400, Taylor Blau wrote:
> > The following commit will want to call `for_each_namespaced_ref()` with
> > a list of excluded patterns.
>
> This statement isn't quite true as the following commit touches
> git-receive-pack(1), which doesn't use `for_each_namespaced_ref()`.

Oops, well spotted. This was true when I wrote the series, but of course
I decided to swap the order of patches 14 and 15 with one another,
making this statement false. Good eyes ;-).

Thanks,
Taylor

* Re: [PATCH v2 00/16] refs: implement jump lists for packed backend
  2023-06-06  7:01   ` [PATCH v2 00/16] refs: implement jump lists for packed backend Patrick Steinhardt
@ 2023-06-20 12:22     ` Taylor Blau
  0 siblings, 0 replies; 149+ messages in thread
From: Taylor Blau @ 2023-06-20 12:22 UTC (permalink / raw)
  To: Patrick Steinhardt
  Cc: git, Chris Torek, Derrick Stolee, Jeff King, Junio C Hamano

On Tue, Jun 06, 2023 at 09:01:10AM +0200, Patrick Steinhardt wrote:
> On Mon, May 15, 2023 at 03:23:07PM -0400, Taylor Blau wrote:
> > Here is a reroll of my series to implement jump (née skip) lists for the
> > packed refs backend.
> >
> > Not a ton has changed since last time, but some notable things that have
> > changed include:
> >
> >   - Renaming "skip lists" to "jump lists" to clarify that this
> >     implementation does not use the skip list data structure.
> >   - Patch reorganization, splitting out `find_reference_location_end()`
> >     more sensibly, rewording patch messages, etc.
> >   - Addresses feedback from Junio and Patrick Steinhardt's helpful
> >     reviews.
> >
> > As usual, a range-diff is included below for convenience.
> >
> > Given that we are expecting -rc0 today, we should aim to not let review
> > of this topic direct our attention away from testing the release
> > candidates. We can get more serious about it on the other side of 2.41.
> >
> > Thanks in advance for another look.
>
> I didn't have many comments in this round. Personally though I'd split
> up this patch series into two in order to land the individual parts
> faster, where the first part introduces `git for-each-ref --exclude` and
> the second part introduces the jump list for the packed-refs backend.

Thanks for reviewing, and for the good suggestion. I think that
splitting this series in two could be worthwhile, but I'm not sure I
want to make this change for v4.

You could imagine splitting the series there, where the first half would
implement the naive version of `for-each-ref --exclude`, and the second
half would implement jump lists, and make upload- and receive-pack take
advantage of it.

But I think that makes the first half trivial and leaves all of the
complexity in this series to the latter half. I suppose that makes it
easier to graduate the first six or so patches earlier, but they aren't
really all that useful without the remaining patches.

Another split might be:

  - the naive implementation of `for-each-ref --exclude`
  - jump lists, and using them within `for-each-ref`
  - teaching upload- and receive-pack to use the jump list

But juggling this series as three sub-topics feels like it would be
burdensome to the maintainer ;-). So I think I'd rather leave it as-is,
unless you feel strongly.

Thanks,
Taylor

* [PATCH v4 00/16] refs: implement jump lists for packed backend
  2023-05-08 21:59 [PATCH 00/15] refs: implement skip lists for packed backend Taylor Blau
                   ` (16 preceding siblings ...)
  2023-06-07 10:40 ` [PATCH v3 " Taylor Blau
@ 2023-06-20 14:20 ` Taylor Blau
  2023-06-20 14:21   ` [PATCH v4 01/16] refs.c: rename `ref_filter` Taylor Blau
                     ` (16 more replies)
  2023-07-10 21:12 ` [PATCH v5 " Taylor Blau
  18 siblings, 17 replies; 149+ messages in thread
From: Taylor Blau @ 2023-06-20 14:20 UTC (permalink / raw)
  To: git
  Cc: Chris Torek, Derrick Stolee, Jeff King, Junio C Hamano,
	Patrick Steinhardt

Here is another reroll of my series to implement jump (née skip) lists
for the packed refs backend, based on top of the current 'master'.

Most changes are minor, limited to changing variable names, moving
changes around between patches and tweaking commit messages for clarity.
I think that the first 9 or so patches are stable, but we may want some
more eyes on the remainder.

As usual, a range-diff is included below for convenience.

Thanks in advance for your review.

Jeff King (5):
  refs.c: rename `ref_filter`
  ref-filter.h: provide `REF_FILTER_INIT`
  ref-filter: clear reachable list pointers after freeing
  ref-filter: add `ref_filter_clear()`
  ref-filter.c: parameterize match functions over patterns

Taylor Blau (11):
  builtin/for-each-ref.c: add `--exclude` option
  refs: plumb `exclude_patterns` argument throughout
  refs/packed-backend.c: refactor `find_reference_location()`
  refs/packed-backend.c: implement jump lists to avoid excluded
    pattern(s)
  refs/packed-backend.c: add trace2 counters for jump list
  revision.h: store hidden refs in a `strvec`
  refs/packed-backend.c: ignore complicated hidden refs rules
  refs.h: let `for_each_namespaced_ref()` take excluded patterns
  builtin/receive-pack.c: avoid enumerating hidden references
  upload-pack.c: avoid enumerating hidden refs where possible
  ls-refs.c: avoid enumerating hidden refs where possible

 Documentation/git-for-each-ref.txt |   6 +
 builtin/branch.c                   |   4 +-
 builtin/for-each-ref.c             |   7 +-
 builtin/receive-pack.c             |   7 +-
 builtin/tag.c                      |   4 +-
 http-backend.c                     |   2 +-
 ls-refs.c                          |   8 +-
 ref-filter.c                       |  63 +++++++--
 ref-filter.h                       |  12 ++
 refs.c                             |  61 ++++----
 refs.h                             |  15 +-
 refs/debug.c                       |   5 +-
 refs/files-backend.c               |   5 +-
 refs/packed-backend.c              | 220 ++++++++++++++++++++++++++---
 refs/refs-internal.h               |   7 +-
 revision.c                         |   4 +-
 revision.h                         |   5 +-
 t/helper/test-reach.c              |   2 +-
 t/helper/test-ref-store.c          |  10 ++
 t/t0041-usage.sh                   |   1 +
 t/t1419-exclude-refs.sh            | 131 +++++++++++++++++
 t/t3402-rebase-merge.sh            |   1 +
 t/t6300-for-each-ref.sh            |  35 +++++
 trace2.h                           |   2 +
 trace2/tr2_ctr.c                   |   5 +
 upload-pack.c                      |  43 ++++--
 26 files changed, 559 insertions(+), 106 deletions(-)
 create mode 100755 t/t1419-exclude-refs.sh

Range-diff against v3:
 1:  afac948f04 =  1:  c12def5a30 refs.c: rename `ref_filter`
 2:  b9336e3b77 =  2:  7ce82b6a5a ref-filter.h: provide `REF_FILTER_INIT`
 3:  fc28b5caaa =  3:  7e6bf7766d ref-filter: clear reachable list pointers after freeing
 4:  bc5356fe4b =  4:  777e71004d ref-filter: add `ref_filter_clear()`
 5:  1988ca4c0a =  5:  39e9b0f50d ref-filter.c: parameterize match functions over patterns
 6:  60d85aa4ad =  6:  c4fd47fd75 builtin/for-each-ref.c: add `--exclude` option
 7:  c4fe9a1893 !  7:  e6b50c5021 refs: plumb `exclude_patterns` argument throughout
    @@ ls-refs.c: int ls_refs(struct repository *r, struct packet_reader *request)
      ## ref-filter.c ##
     @@ ref-filter.c: static int for_each_fullref_in_pattern(struct ref_filter *filter,
      
    - 	if (!filter->name_patterns[0]) {
    - 		/* no patterns; we have to look at everything */
    --		return for_each_fullref_in("", cb, cb_data);
    -+		return refs_for_each_fullref_in(get_main_ref_store(the_repository),
    -+						 "", NULL, cb, cb_data);
    - 	}
    - 
      	return refs_for_each_fullref_in_prefixes(get_main_ref_store(the_repository),
      						 NULL, filter->name_patterns,
     -						 cb, cb_data);
 8:  9cab5e0699 =  8:  a0990b2916 refs/packed-backend.c: refactor `find_reference_location()`
 9:  8066858bf5 !  9:  386ed468fa refs/packed-backend.c: implement jump lists to avoid excluded pattern(s)
    @@ Commit message
             matches.
     
           - Clean up the results from the previous step: discard empty regions,
    -        and combine adjacent regions.
    +        and combine adjacent regions. The set of regions which remains is
    +        referred to as the "jump list", and never contains any references
    +        which should be included in the result set.
     
         Then when iterating through the `packed-refs` file, if `iter->pos` is
         ever contained in one of the regions from the previous steps, advance
    @@ Commit message
         `jump_pos` is used to track the first next-possible region to jump
         through.
     
    -    Second, note that the exclusion list is best-effort, since we do not
    -    handle loose references, and because of the meta-character issue above.
    +    Second, note that the jump list is best-effort, since we do not handle
    +    loose references, and because of the meta-character issue above. The
    +    jump list may not skip past all references which won't appear in the
    +    results, but will never skip over a reference which does appear in the
    +    result set.
     
         In repositories with a large number of hidden references, the speed-up
         can be significant. Tests here are done with a copy of linux.git with a
    @@ Commit message
     
      ## ref-filter.c ##
     @@ ref-filter.c: static int for_each_fullref_in_pattern(struct ref_filter *filter,
    + 
      	if (!filter->name_patterns[0]) {
      		/* no patterns; we have to look at everything */
    - 		return refs_for_each_fullref_in(get_main_ref_store(the_repository),
    --						 "", NULL, cb, cb_data);
    +-		return for_each_fullref_in("", cb, cb_data);
    ++		return refs_for_each_fullref_in(get_main_ref_store(the_repository),
     +						 "", filter->exclude.v, cb, cb_data);
      	}
      
    @@ refs/packed-backend.c: struct packed_ref_iterator {
     +		const char *end;
     +	} *jump;
     +	size_t jump_nr, jump_alloc;
    -+	size_t jump_pos;
    ++	size_t jump_cur;
     +
      	/* Scratch space for current values: */
      	struct object_id oid, peeled;
    @@ refs/packed-backend.c: struct packed_ref_iterator {
     +	 * Note that each skipped region is considered at most once,
     +	 * since they are ordered based on their starting position.
     +	 */
    -+	while (iter->jump_pos < iter->jump_nr) {
    -+		struct jump_list_entry *curr = &iter->jump[iter->jump_pos];
    ++	while (iter->jump_cur < iter->jump_nr) {
    ++		struct jump_list_entry *curr = &iter->jump[iter->jump_cur];
     +		if (iter->pos < curr->start)
     +			break; /* not to the next jump yet */
     +
    -+		iter->jump_pos++;
    ++		iter->jump_cur++;
     +		if (iter->pos < curr->end) {
     +			iter->pos = curr->end;
     +			break;
    @@ refs/packed-backend.c: static struct ref_iterator_vtable packed_ref_iterator_vta
     +	return 0;
     +}
     +
    -+static const char *ptr_max(const char *x, const char *y)
    -+{
    -+	if (x > y)
    -+		return x;
    -+	return y;
    -+}
    -+
     +static void populate_excluded_jump_list(struct packed_ref_iterator *iter,
     +					struct snapshot *snapshot,
     +					const char **excluded_patterns)
    @@ refs/packed-backend.c: static struct ref_iterator_vtable packed_ref_iterator_vta
     +			continue;
     +		} else if (ours->start <= last_disjoint->end) {
     +			/* overlapping regions extend the previous one */
    -+			last_disjoint->end = ptr_max(last_disjoint->end, ours->end);
    ++			last_disjoint->end = last_disjoint->end > ours->end
    ++				? last_disjoint->end : ours->end;
     +		} else {
     +			/* otherwise, insert a new region */
     +			iter->jump[j++] = *ours;
    @@ refs/packed-backend.c: static struct ref_iterator_vtable packed_ref_iterator_vta
     +	}
     +
     +	iter->jump_nr = j;
    -+	iter->jump_pos = 0;
    ++	iter->jump_cur = 0;
     +}
     +
      static struct ref_iterator *packed_ref_iterator_begin(
10:  3c045076a9 ! 10:  49c8f5173a refs/packed-backend.c: add trace2 counters for jump list
    @@ refs/packed-backend.c
      enum mmap_strategy {
      	/*
     @@ refs/packed-backend.c: static int next_record(struct packed_ref_iterator *iter)
    - 		iter->jump_pos++;
    + 		iter->jump_cur++;
      		if (iter->pos < curr->end) {
      			iter->pos = curr->end;
     +			trace2_counter_add(TRACE2_COUNTER_ID_PACKED_REFS_JUMPS, 1);
    @@ t/t1419-exclude-refs.sh: for_each_ref () {
     +	local nr="$1"
     +	local trace="$2"
     +
    -+	grep -q "name:jumps_made value:$nr" $trace
    ++	grep -q "name:jumps_made value:$nr$" $trace
     +}
     +
     +assert_no_jumps () {
    @@ t/t1419-exclude-refs.sh: test_expect_success 'setup' '
      
     -	test_cmp expect actual
     +	test_cmp expect actual &&
    -+	assert_no_jumps
    ++	assert_no_jumps perf
      '
      
      test_expect_success 'meta-characters are discarded' '
    @@ t/t1419-exclude-refs.sh: test_expect_success 'setup' '
      
     -	test_cmp expect actual
     +	test_cmp expect actual &&
    -+	assert_no_jumps
    ++	assert_no_jumps perf
      '
      
      test_done
11:  0ff542eaef = 11:  dd856a3982 revision.h: store hidden refs in a `strvec`
12:  ca006b2c3f ! 12:  845904853e refs/packed-backend.c: ignore complicated hidden refs rules
    @@ Commit message
         refs/packed-backend.c: ignore complicated hidden refs rules
     
         In subsequent commits, we'll teach `receive-pack` and `upload-pack` to
    -    use the new skip-list feature in the packed-refs iterator by ignoring
    +    use the new jump list feature in the packed-refs iterator by ignoring
         references which are mentioned via its respective hideRefs lists.
     
    -    However, the packed-ref skip lists cannot handle un-hiding rules (that
    +    However, the packed-ref jump lists cannot handle un-hiding rules (that
         begin with '!'), or namespace comparisons (that begin with '^'). Detect
         and avoid these cases by falling back to the normal enumeration without
    -    a skip list when such patterns exist.
    +    a jump list when such patterns exist.
     
         Signed-off-by: Taylor Blau <me@ttaylorr.com>
     
    @@ refs/packed-backend.c: static void populate_excluded_jump_list(struct packed_ref
     
      ## t/t1419-exclude-refs.sh ##
     @@ t/t1419-exclude-refs.sh: test_expect_success 'meta-characters are discarded' '
    - 	assert_no_jumps
    + 	assert_no_jumps perf
      '
      
     +test_expect_success 'complex hidden ref rules are discarded' '
13:  cae703a425 ! 13:  8d4d7cc22e refs.h: let `for_each_namespaced_ref()` take excluded patterns
    @@ Metadata
      ## Commit message ##
         refs.h: let `for_each_namespaced_ref()` take excluded patterns
     
    -    The following commit will want to call `for_each_namespaced_ref()` with
    +    A future commit will want to call `for_each_namespaced_ref()` with
         a list of excluded patterns.
     
         We could introduce a variant of that function, say,
    @@ Commit message
         introducing a new function.
     
         For now, supply NULL for the list of excluded patterns at all callers to
    -    avoid changing behavior, which we will do in the subsequent commit.
    +    avoid changing behavior, which we will do in a future change.
     
         Signed-off-by: Taylor Blau <me@ttaylorr.com>
     
14:  1db10b76ea = 14:  49c665f9f8 builtin/receive-pack.c: avoid enumerating hidden references
15:  014243ebe6 = 15:  19bf4a52d6 upload-pack.c: avoid enumerating hidden refs where possible
16:  e02fe93379 = 16:  ea6cbaf292 ls-refs.c: avoid enumerating hidden refs where possible
-- 
2.41.0.44.gf2359540d2

* [PATCH v4 01/16] refs.c: rename `ref_filter`
  2023-06-20 14:20 ` [PATCH v4 " Taylor Blau
@ 2023-06-20 14:21   ` Taylor Blau
  2023-07-03  5:13     ` Jeff King
  2023-06-20 14:21   ` [PATCH v4 02/16] ref-filter.h: provide `REF_FILTER_INIT` Taylor Blau
                     ` (15 subsequent siblings)
  16 siblings, 1 reply; 149+ messages in thread
From: Taylor Blau @ 2023-06-20 14:21 UTC (permalink / raw)
  To: git
  Cc: Chris Torek, Derrick Stolee, Jeff King, Junio C Hamano,
	Patrick Steinhardt

From: Jeff King <peff@peff.net>

The refs machinery has its own implementation of a `ref_filter` (used by
`for-each-ref`), which is distinct from the `ref-filter.h` API (also
used by `for-each-ref`, among other things).

Rename the one within refs.c to more clearly indicate its purpose.

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Taylor Blau <me@ttaylorr.com>
---
 refs.c | 15 ++++++++-------
 1 file changed, 8 insertions(+), 7 deletions(-)

diff --git a/refs.c b/refs.c
index 881a0da65c..ba63b69090 100644
--- a/refs.c
+++ b/refs.c
@@ -375,8 +375,8 @@ char *resolve_refdup(const char *refname, int resolve_flags,
 				   oid, flags);
 }
 
-/* The argument to filter_refs */
-struct ref_filter {
+/* The argument to for_each_filter_refs */
+struct for_each_ref_filter {
 	const char *pattern;
 	const char *prefix;
 	each_ref_fn *fn;
@@ -409,10 +409,11 @@ int ref_exists(const char *refname)
 	return refs_ref_exists(get_main_ref_store(the_repository), refname);
 }
 
-static int filter_refs(const char *refname, const struct object_id *oid,
-			   int flags, void *data)
+static int for_each_filter_refs(const char *refname,
+				const struct object_id *oid,
+				int flags, void *data)
 {
-	struct ref_filter *filter = (struct ref_filter *)data;
+	struct for_each_ref_filter *filter = data;
 
 	if (wildmatch(filter->pattern, refname, 0))
 		return 0;
@@ -569,7 +570,7 @@ int for_each_glob_ref_in(each_ref_fn fn, const char *pattern,
 	const char *prefix, void *cb_data)
 {
 	struct strbuf real_pattern = STRBUF_INIT;
-	struct ref_filter filter;
+	struct for_each_ref_filter filter;
 	int ret;
 
 	if (!prefix && !starts_with(pattern, "refs/"))
@@ -589,7 +590,7 @@ int for_each_glob_ref_in(each_ref_fn fn, const char *pattern,
 	filter.prefix = prefix;
 	filter.fn = fn;
 	filter.cb_data = cb_data;
-	ret = for_each_ref(filter_refs, &filter);
+	ret = for_each_ref(for_each_filter_refs, &filter);
 
 	strbuf_release(&real_pattern);
 	return ret;
-- 
2.41.0.44.gf2359540d2


* [PATCH v4 02/16] ref-filter.h: provide `REF_FILTER_INIT`
  2023-06-20 14:20 ` [PATCH v4 " Taylor Blau
  2023-06-20 14:21   ` [PATCH v4 01/16] refs.c: rename `ref_filter` Taylor Blau
@ 2023-06-20 14:21   ` Taylor Blau
  2023-07-03  5:15     ` Jeff King
  2023-06-20 14:21   ` [PATCH v4 03/16] ref-filter: clear reachable list pointers after freeing Taylor Blau
                     ` (14 subsequent siblings)
  16 siblings, 1 reply; 149+ messages in thread
From: Taylor Blau @ 2023-06-20 14:21 UTC (permalink / raw)
  To: git
  Cc: Chris Torek, Derrick Stolee, Jeff King, Junio C Hamano,
	Patrick Steinhardt

From: Jeff King <peff@peff.net>

Provide a sane initialization value for `struct ref_filter`, which in a
subsequent patch will be used to initialize a new field.

In the meantime, fix a case in test-reach.c where its `ref_filter` is
not even zero-initialized.

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Taylor Blau <me@ttaylorr.com>
---
 builtin/branch.c       | 3 +--
 builtin/for-each-ref.c | 3 +--
 builtin/tag.c          | 3 +--
 ref-filter.h           | 3 +++
 t/helper/test-reach.c  | 2 +-
 5 files changed, 7 insertions(+), 7 deletions(-)

diff --git a/builtin/branch.c b/builtin/branch.c
index e6c2655af6..7891dec361 100644
--- a/builtin/branch.c
+++ b/builtin/branch.c
@@ -707,7 +707,7 @@ int cmd_branch(int argc, const char **argv, const char *prefix)
 	int reflog = 0, quiet = 0, icase = 0, force = 0,
 	    recurse_submodules_explicit = 0;
 	enum branch_track track;
-	struct ref_filter filter;
+	struct ref_filter filter = REF_FILTER_INIT;
 	static struct ref_sorting *sorting;
 	struct string_list sorting_options = STRING_LIST_INIT_DUP;
 	struct ref_format format = REF_FORMAT_INIT;
@@ -765,7 +765,6 @@ int cmd_branch(int argc, const char **argv, const char *prefix)
 
 	setup_ref_filter_porcelain_msg();
 
-	memset(&filter, 0, sizeof(filter));
 	filter.kind = FILTER_REFS_BRANCHES;
 	filter.abbrev = -1;
 
diff --git a/builtin/for-each-ref.c b/builtin/for-each-ref.c
index 695fc8f4a5..99ccb73518 100644
--- a/builtin/for-each-ref.c
+++ b/builtin/for-each-ref.c
@@ -24,7 +24,7 @@ int cmd_for_each_ref(int argc, const char **argv, const char *prefix)
 	struct string_list sorting_options = STRING_LIST_INIT_DUP;
 	int maxcount = 0, icase = 0, omit_empty = 0;
 	struct ref_array array;
-	struct ref_filter filter;
+	struct ref_filter filter = REF_FILTER_INIT;
 	struct ref_format format = REF_FORMAT_INIT;
 	struct strbuf output = STRBUF_INIT;
 	struct strbuf err = STRBUF_INIT;
@@ -61,7 +61,6 @@ int cmd_for_each_ref(int argc, const char **argv, const char *prefix)
 	};
 
 	memset(&array, 0, sizeof(array));
-	memset(&filter, 0, sizeof(filter));
 
 	format.format = "%(objectname) %(objecttype)\t%(refname)";
 
diff --git a/builtin/tag.c b/builtin/tag.c
index 49b64c7a28..ec778ba860 100644
--- a/builtin/tag.c
+++ b/builtin/tag.c
@@ -437,7 +437,7 @@ int cmd_tag(int argc, const char **argv, const char *prefix)
 	struct msg_arg msg = { .buf = STRBUF_INIT };
 	struct ref_transaction *transaction;
 	struct strbuf err = STRBUF_INIT;
-	struct ref_filter filter;
+	struct ref_filter filter = REF_FILTER_INIT;
 	struct ref_sorting *sorting;
 	struct string_list sorting_options = STRING_LIST_INIT_DUP;
 	struct ref_format format = REF_FORMAT_INIT;
@@ -496,7 +496,6 @@ int cmd_tag(int argc, const char **argv, const char *prefix)
 	git_config(git_tag_config, &sorting_options);
 
 	memset(&opt, 0, sizeof(opt));
-	memset(&filter, 0, sizeof(filter));
 	filter.lines = -1;
 	opt.sign = -1;
 
diff --git a/ref-filter.h b/ref-filter.h
index 430701cfb7..a920f73b29 100644
--- a/ref-filter.h
+++ b/ref-filter.h
@@ -92,6 +92,9 @@ struct ref_format {
 	struct string_list bases;
 };
 
+#define REF_FILTER_INIT { \
+	.points_at = OID_ARRAY_INIT, \
+}
 #define REF_FORMAT_INIT {             \
 	.use_color = -1,              \
 	.bases = STRING_LIST_INIT_DUP, \
diff --git a/t/helper/test-reach.c b/t/helper/test-reach.c
index 5b6f217441..ef58f10c2d 100644
--- a/t/helper/test-reach.c
+++ b/t/helper/test-reach.c
@@ -139,7 +139,7 @@ int cmd__reach(int ac, const char **av)
 
 		printf("%s(X,_,_,0,0):%d\n", av[1], can_all_from_reach_with_flag(&X_obj, 2, 4, 0, 0));
 	} else if (!strcmp(av[1], "commit_contains")) {
-		struct ref_filter filter;
+		struct ref_filter filter = REF_FILTER_INIT;
 		struct contains_cache cache;
 		init_contains_cache(&cache);
 
-- 
2.41.0.44.gf2359540d2


* [PATCH v4 03/16] ref-filter: clear reachable list pointers after freeing
  2023-06-20 14:20 ` [PATCH v4 " Taylor Blau
  2023-06-20 14:21   ` [PATCH v4 01/16] refs.c: rename `ref_filter` Taylor Blau
  2023-06-20 14:21   ` [PATCH v4 02/16] ref-filter.h: provide `REF_FILTER_INIT` Taylor Blau
@ 2023-06-20 14:21   ` Taylor Blau
  2023-07-03  5:16     ` Jeff King
  2023-06-20 14:21   ` [PATCH v4 04/16] ref-filter: add `ref_filter_clear()` Taylor Blau
                     ` (13 subsequent siblings)
  16 siblings, 1 reply; 149+ messages in thread
From: Taylor Blau @ 2023-06-20 14:21 UTC (permalink / raw)
  To: git
  Cc: Chris Torek, Derrick Stolee, Jeff King, Junio C Hamano,
	Patrick Steinhardt

From: Jeff King <peff@peff.net>

In reach_filter(), we pop all commits from the reachable lists, leaving
them empty. But because we're operating on a list pointer that was
passed by value, the original filter.reachable_from pointer is left
dangling.

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Taylor Blau <me@ttaylorr.com>
---
 ref-filter.c | 14 +++++++-------
 1 file changed, 7 insertions(+), 7 deletions(-)

diff --git a/ref-filter.c b/ref-filter.c
index 4991cd4f7a..048d277cbf 100644
--- a/ref-filter.c
+++ b/ref-filter.c
@@ -2418,13 +2418,13 @@ void ref_array_clear(struct ref_array *array)
 #define EXCLUDE_REACHED 0
 #define INCLUDE_REACHED 1
 static void reach_filter(struct ref_array *array,
-			 struct commit_list *check_reachable,
+			 struct commit_list **check_reachable,
 			 int include_reached)
 {
 	int i, old_nr;
 	struct commit **to_clear;
 
-	if (!check_reachable)
+	if (!*check_reachable)
 		return;
 
 	CALLOC_ARRAY(to_clear, array->nr);
@@ -2434,7 +2434,7 @@ static void reach_filter(struct ref_array *array,
 	}
 
 	tips_reachable_from_bases(the_repository,
-				  check_reachable,
+				  *check_reachable,
 				  to_clear, array->nr,
 				  UNINTERESTING);
 
@@ -2455,8 +2455,8 @@ static void reach_filter(struct ref_array *array,
 
 	clear_commit_marks_many(old_nr, to_clear, ALL_REV_FLAGS);
 
-	while (check_reachable) {
-		struct commit *merge_commit = pop_commit(&check_reachable);
+	while (*check_reachable) {
+		struct commit *merge_commit = pop_commit(check_reachable);
 		clear_commit_marks(merge_commit, ALL_REV_FLAGS);
 	}
 
@@ -2553,8 +2553,8 @@ int filter_refs(struct ref_array *array, struct ref_filter *filter, unsigned int
 	clear_contains_cache(&ref_cbdata.no_contains_cache);
 
 	/*  Filters that need revision walking */
-	reach_filter(array, filter->reachable_from, INCLUDE_REACHED);
-	reach_filter(array, filter->unreachable_from, EXCLUDE_REACHED);
+	reach_filter(array, &filter->reachable_from, INCLUDE_REACHED);
+	reach_filter(array, &filter->unreachable_from, EXCLUDE_REACHED);
 
 	save_commit_buffer = save_commit_buffer_orig;
 	return ret;
-- 
2.41.0.44.gf2359540d2


* [PATCH v4 04/16] ref-filter: add `ref_filter_clear()`
  2023-06-20 14:20 ` [PATCH v4 " Taylor Blau
                     ` (2 preceding siblings ...)
  2023-06-20 14:21   ` [PATCH v4 03/16] ref-filter: clear reachable list pointers after freeing Taylor Blau
@ 2023-06-20 14:21   ` Taylor Blau
  2023-07-03  5:19     ` Jeff King
  2023-06-20 14:21   ` [PATCH v4 05/16] ref-filter.c: parameterize match functions over patterns Taylor Blau
                     ` (12 subsequent siblings)
  16 siblings, 1 reply; 149+ messages in thread
From: Taylor Blau @ 2023-06-20 14:21 UTC (permalink / raw)
  To: git
  Cc: Chris Torek, Derrick Stolee, Jeff King, Junio C Hamano,
	Patrick Steinhardt

From: Jeff King <peff@peff.net>

We did not bother to clean up at all in `git branch` or `git tag`, and
`git for-each-ref` only cleans up a couple of members.

Add and call `ref_filter_clear()` when cleaning up a `struct
ref_filter`. Running the leak checker with this patch applied (before
marking any tests) shows a couple of tests that are now leak-free. This
was found by running:

    $ make SANITIZE=leak
    $ make -C t GIT_TEST_PASSING_SANITIZE_LEAK=check GIT_TEST_OPTS=--immediate

(Note that the `reachable_from` and `unreachable_from` lists should be
cleaned as they are used. So this is just covering any case where we
might bail before running the reachability check.)

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Taylor Blau <me@ttaylorr.com>
---
 builtin/branch.c        |  1 +
 builtin/for-each-ref.c  |  3 +--
 builtin/tag.c           |  1 +
 ref-filter.c            | 16 ++++++++++++++++
 ref-filter.h            |  3 +++
 t/t0041-usage.sh        |  1 +
 t/t3402-rebase-merge.sh |  1 +
 7 files changed, 24 insertions(+), 2 deletions(-)

diff --git a/builtin/branch.c b/builtin/branch.c
index 7891dec361..07ee874617 100644
--- a/builtin/branch.c
+++ b/builtin/branch.c
@@ -858,6 +858,7 @@ int cmd_branch(int argc, const char **argv, const char *prefix)
 		print_columns(&output, colopts, NULL);
 		string_list_clear(&output, 0);
 		ref_sorting_release(sorting);
+		ref_filter_clear(&filter);
 		return 0;
 	} else if (edit_description) {
 		const char *branch_name;
diff --git a/builtin/for-each-ref.c b/builtin/for-each-ref.c
index 99ccb73518..c01fa6fefe 100644
--- a/builtin/for-each-ref.c
+++ b/builtin/for-each-ref.c
@@ -120,8 +120,7 @@ int cmd_for_each_ref(int argc, const char **argv, const char *prefix)
 	strbuf_release(&err);
 	strbuf_release(&output);
 	ref_array_clear(&array);
-	free_commit_list(filter.with_commit);
-	free_commit_list(filter.no_commit);
+	ref_filter_clear(&filter);
 	ref_sorting_release(sorting);
 	strvec_clear(&vec);
 	return 0;
diff --git a/builtin/tag.c b/builtin/tag.c
index ec778ba860..2895ff0e45 100644
--- a/builtin/tag.c
+++ b/builtin/tag.c
@@ -651,6 +651,7 @@ int cmd_tag(int argc, const char **argv, const char *prefix)
 
 cleanup:
 	ref_sorting_release(sorting);
+	ref_filter_clear(&filter);
 	strbuf_release(&buf);
 	strbuf_release(&ref);
 	strbuf_release(&reflog_msg);
diff --git a/ref-filter.c b/ref-filter.c
index 048d277cbf..d32f426898 100644
--- a/ref-filter.c
+++ b/ref-filter.c
@@ -2866,3 +2866,19 @@ int parse_opt_merge_filter(const struct option *opt, const char *arg, int unset)
 
 	return 0;
 }
+
+void ref_filter_init(struct ref_filter *filter)
+{
+	struct ref_filter blank = REF_FILTER_INIT;
+	memcpy(filter, &blank, sizeof(blank));
+}
+
+void ref_filter_clear(struct ref_filter *filter)
+{
+	oid_array_clear(&filter->points_at);
+	free_commit_list(filter->with_commit);
+	free_commit_list(filter->no_commit);
+	free_commit_list(filter->reachable_from);
+	free_commit_list(filter->unreachable_from);
+	ref_filter_init(filter);
+}
diff --git a/ref-filter.h b/ref-filter.h
index a920f73b29..160b807224 100644
--- a/ref-filter.h
+++ b/ref-filter.h
@@ -170,4 +170,7 @@ void filter_ahead_behind(struct repository *r,
 			 struct ref_format *format,
 			 struct ref_array *array);
 
+void ref_filter_init(struct ref_filter *filter);
+void ref_filter_clear(struct ref_filter *filter);
+
 #endif /*  REF_FILTER_H  */
diff --git a/t/t0041-usage.sh b/t/t0041-usage.sh
index c4fc34eb18..9ea974b0c6 100755
--- a/t/t0041-usage.sh
+++ b/t/t0041-usage.sh
@@ -5,6 +5,7 @@ test_description='Test commands behavior when given invalid argument value'
 GIT_TEST_DEFAULT_INITIAL_BRANCH_NAME=main
 export GIT_TEST_DEFAULT_INITIAL_BRANCH_NAME
 
+TEST_PASSES_SANITIZE_LEAK=true
 . ./test-lib.sh
 
 test_expect_success 'setup ' '
diff --git a/t/t3402-rebase-merge.sh b/t/t3402-rebase-merge.sh
index 79b0640c00..e9e03ca4b5 100755
--- a/t/t3402-rebase-merge.sh
+++ b/t/t3402-rebase-merge.sh
@@ -8,6 +8,7 @@ test_description='git rebase --merge test'
 GIT_TEST_DEFAULT_INITIAL_BRANCH_NAME=main
 export GIT_TEST_DEFAULT_INITIAL_BRANCH_NAME
 
+TEST_PASSES_SANITIZE_LEAK=true
 . ./test-lib.sh
 
 T="A quick brown fox
-- 
2.41.0.44.gf2359540d2


* [PATCH v4 05/16] ref-filter.c: parameterize match functions over patterns
  2023-06-20 14:20 ` [PATCH v4 " Taylor Blau
                     ` (3 preceding siblings ...)
  2023-06-20 14:21   ` [PATCH v4 04/16] ref-filter: add `ref_filter_clear()` Taylor Blau
@ 2023-06-20 14:21   ` Taylor Blau
  2023-07-03  5:27     ` Jeff King
  2023-06-20 14:21   ` [PATCH v4 06/16] builtin/for-each-ref.c: add `--exclude` option Taylor Blau
                     ` (11 subsequent siblings)
  16 siblings, 1 reply; 149+ messages in thread
From: Taylor Blau @ 2023-06-20 14:21 UTC (permalink / raw)
  To: git
  Cc: Chris Torek, Derrick Stolee, Jeff King, Junio C Hamano,
	Patrick Steinhardt

From: Jeff King <peff@peff.net>

`match_pattern()` and `match_name_as_path()` both take a `struct
ref_filter *`, and then store a local variable pointing at
`filter->name_patterns`.

The subsequent patch will add a new array of patterns to match over (the
excluded patterns, via a new `git for-each-ref --exclude` option),
treating the return value of these functions differently depending on
which patterns are being used to match.

Tweak `match_pattern()` and `match_name_as_path()` to take an array of
patterns to prepare for passing either in.

Once we start passing either in, `match_pattern()` will have little to
do with a particular `struct ref_filter *` instance. To clarify this,
drop it from the argument list, and replace it with the only bit of the
`ref_filter` that we care about (`filter->ignore_case`).
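
To illustrate the new calling convention, here is a minimal standalone sketch (not Git's code) of a matcher that takes a NULL-terminated pattern array plus an `ignore_case` flag instead of a `struct ref_filter *`. It assumes POSIX `fnmatch()` as a stand-in for Git's internal `wildmatch()` and uses the GNU/BSD `FNM_CASEFOLD` extension when it is available. `sketch_match_pattern()` is a hypothetical name:

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE /* expose FNM_CASEFOLD on glibc */
#endif
#include <assert.h>
#include <fnmatch.h>
#include <string.h>
#include <strings.h>

#ifndef FNM_CASEFOLD
#define FNM_CASEFOLD 0 /* assumption: globs stay case-sensitive on this platform */
#endif

/*
 * Hypothetical sketch: return 1 if `refname` matches any entry of the
 * NULL-terminated `patterns` array, either as a literal prefix or as a
 * glob. fnmatch() stands in for Git's wildmatch() here.
 */
static int sketch_match_pattern(const char **patterns, const char *refname,
				int ignore_case)
{
	int flags = ignore_case ? FNM_CASEFOLD : 0;

	for (; *patterns; patterns++) {
		size_t len = strlen(*patterns);
		int cmp = ignore_case ? strncasecmp(refname, *patterns, len)
				      : strncmp(refname, *patterns, len);

		if (!cmp)
			return 1; /* literal prefix, e.g. "refs/heads/mas" */
		if (!fnmatch(*patterns, refname, flags))
			return 1; /* glob, e.g. "refs/heads/mas*" */
	}
	return 0;
}
```

Because the matcher no longer sees the filter, the same function can be pointed at either the name patterns or a separate array of excluded patterns.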

Co-authored-by: Taylor Blau <me@ttaylorr.com>
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Taylor Blau <me@ttaylorr.com>
---
 ref-filter.c | 16 +++++++++-------
 1 file changed, 9 insertions(+), 7 deletions(-)

diff --git a/ref-filter.c b/ref-filter.c
index d32f426898..6d91c7cb0d 100644
--- a/ref-filter.c
+++ b/ref-filter.c
@@ -2104,12 +2104,12 @@ static int get_ref_atom_value(struct ref_array_item *ref, int atom,
  * matches a pattern "refs/heads/mas") or a wildcard (e.g. the same ref
  * matches "refs/heads/mas*", too).
  */
-static int match_pattern(const struct ref_filter *filter, const char *refname)
+static int match_pattern(const char **patterns, const char *refname,
+			 const int ignore_case)
 {
-	const char **patterns = filter->name_patterns;
 	unsigned flags = 0;
 
-	if (filter->ignore_case)
+	if (ignore_case)
 		flags |= WM_CASEFOLD;
 
 	/*
@@ -2134,9 +2134,10 @@ static int match_pattern(const struct ref_filter *filter, const char *refname)
  * matches a pattern "refs/heads/" but not "refs/heads/m") or a
  * wildcard (e.g. the same ref matches "refs/heads/m*", too).
  */
-static int match_name_as_path(const struct ref_filter *filter, const char *refname)
+static int match_name_as_path(const struct ref_filter *filter,
+			      const char **pattern,
+			      const char *refname)
 {
-	const char **pattern = filter->name_patterns;
 	int namelen = strlen(refname);
 	unsigned flags = WM_PATHNAME;
 
@@ -2165,8 +2166,9 @@ static int filter_pattern_match(struct ref_filter *filter, const char *refname)
 	if (!*filter->name_patterns)
 		return 1; /* No pattern always matches */
 	if (filter->match_as_path)
-		return match_name_as_path(filter, refname);
-	return match_pattern(filter, refname);
+		return match_name_as_path(filter, filter->name_patterns, refname);
+	return match_pattern(filter->name_patterns, refname,
+			     filter->ignore_case);
 }
 
 /*
-- 
2.41.0.44.gf2359540d2



* [PATCH v4 06/16] builtin/for-each-ref.c: add `--exclude` option
  2023-06-20 14:20 ` [PATCH v4 " Taylor Blau
                     ` (4 preceding siblings ...)
  2023-06-20 14:21   ` [PATCH v4 05/16] ref-filter.c: parameterize match functions over patterns Taylor Blau
@ 2023-06-20 14:21   ` Taylor Blau
  2023-06-20 14:21   ` [PATCH v4 07/16] refs: plumb `exclude_patterns` argument throughout Taylor Blau
                     ` (10 subsequent siblings)
  16 siblings, 0 replies; 149+ messages in thread
From: Taylor Blau @ 2023-06-20 14:21 UTC (permalink / raw)
  To: git
  Cc: Chris Torek, Derrick Stolee, Jeff King, Junio C Hamano,
	Patrick Steinhardt

When using `for-each-ref`, it is sometimes convenient for the caller to
be able to exclude certain references from the output.

For example, if there are many `refs/__hidden__/*` references, the
caller may want to emit all references *except* the hidden ones.
Currently, the only way to do this is to post-process the output, like:

    $ git for-each-ref --format='%(refname)' | grep -v '^refs/__hidden__/'

This is doable, but it requires generating and then discarding a
potentially large number of references.

Teach `git for-each-ref` a new `--exclude=<pattern>` option, which
excludes references from the results if they match one or more excluded
patterns.

This patch provides a naive implementation where the `ref_filter` still
sees all references (including ones that it will discard) and is left to
check whether each reference matches any excluded pattern(s) before
emitting them.
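
The naive flow can be sketched as follows. This is an illustration under simplifying assumptions, not the patch itself: patterns are treated as literal prefixes only (the real code also accepts globs through `match_pattern()`), and `sketch_keep_ref()` and `sketch_has_prefix()` are hypothetical helpers. As in `ref_filter_handler()`, every reference is still visited; the exclusion only decides whether it is kept:

```c
#include <assert.h>
#include <string.h>

/* Hypothetical helper: 1 if `refname` starts with any listed prefix. */
static int sketch_has_prefix(const char **prefixes, const char *refname)
{
	for (; prefixes && *prefixes; prefixes++)
		if (!strncmp(refname, *prefixes, strlen(*prefixes)))
			return 1;
	return 0;
}

/*
 * Naive filtering: every ref reaches this function. Keep it only if it
 * matches an include pattern (an empty list matches everything) and no
 * exclude pattern.
 */
static int sketch_keep_ref(const char **include, const char **exclude,
			   const char *refname)
{
	if (include && *include && !sketch_has_prefix(include, refname))
		return 0;
	if (sketch_has_prefix(exclude, refname))
		return 0;
	return 1;
}
```

With `refs/tags/` as the include pattern and `refs/tags/foo` as the exclusion, `refs/tags/bar` is kept while `refs/tags/foo/one` is dropped, which is the behavior the first new t6300 test checks.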

By culling out references we know the caller doesn't care about, we can
avoid allocating memory for their storage, as well as spending time
sorting the output (among other things). Even the naive implementation
provides a significant speed-up on a modified copy of linux.git (that
has a hidden ref pointing at each commit):

    $ hyperfine \
      'git.compile for-each-ref --format="%(objectname) %(refname)" | grep -vE "[0-9a-f]{40} refs/pull/"' \
      'git.compile for-each-ref --format="%(objectname) %(refname)" --exclude refs/pull/'
    Benchmark 1: git.compile for-each-ref --format="%(objectname) %(refname)" | grep -vE "[0-9a-f]{40} refs/pull/"
      Time (mean ± σ):     820.1 ms ±   2.0 ms    [User: 703.7 ms, System: 152.0 ms]
      Range (min … max):   817.7 ms … 823.3 ms    10 runs

    Benchmark 2: git.compile for-each-ref --format="%(objectname) %(refname)" --exclude refs/pull/
      Time (mean ± σ):     106.6 ms ±   1.1 ms    [User: 99.4 ms, System: 7.1 ms]
      Range (min … max):   104.7 ms … 109.1 ms    27 runs

    Summary
      'git.compile for-each-ref --format="%(objectname) %(refname)" --exclude refs/pull/' ran
        7.69 ± 0.08 times faster than 'git.compile for-each-ref --format="%(objectname) %(refname)" | grep -vE "[0-9a-f]{40} refs/pull/"'

Subsequent patches will improve on this by avoiding visiting excluded
sections of the `packed-refs` file in certain cases.

Co-authored-by: Jeff King <peff@peff.net>
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Taylor Blau <me@ttaylorr.com>
---
 Documentation/git-for-each-ref.txt |  6 +++++
 builtin/for-each-ref.c             |  1 +
 ref-filter.c                       | 13 +++++++++++
 ref-filter.h                       |  6 +++++
 t/t6300-for-each-ref.sh            | 35 ++++++++++++++++++++++++++++++
 5 files changed, 61 insertions(+)

diff --git a/Documentation/git-for-each-ref.txt b/Documentation/git-for-each-ref.txt
index 1e215d4e73..5743eb5def 100644
--- a/Documentation/git-for-each-ref.txt
+++ b/Documentation/git-for-each-ref.txt
@@ -14,6 +14,7 @@ SYNOPSIS
 		   [--points-at=<object>]
 		   [--merged[=<object>]] [--no-merged[=<object>]]
 		   [--contains[=<object>]] [--no-contains[=<object>]]
+		   [--exclude=<pattern> ...]
 
 DESCRIPTION
 -----------
@@ -102,6 +103,11 @@ OPTIONS
 	Do not print a newline after formatted refs where the format expands
 	to the empty string.
 
+--exclude=<pattern>::
+	If one or more patterns are given, only refs which do not match
+	any excluded pattern(s) are shown. Matching is done using the
+	same rules as `<pattern>` above.
+
 FIELD NAMES
 -----------
 
diff --git a/builtin/for-each-ref.c b/builtin/for-each-ref.c
index c01fa6fefe..3384987428 100644
--- a/builtin/for-each-ref.c
+++ b/builtin/for-each-ref.c
@@ -47,6 +47,7 @@ int cmd_for_each_ref(int argc, const char **argv, const char *prefix)
 		OPT_INTEGER( 0 , "count", &maxcount, N_("show only <n> matched refs")),
 		OPT_STRING(  0 , "format", &format.format, N_("format"), N_("format to use for the output")),
 		OPT__COLOR(&format.use_color, N_("respect format colors")),
+		OPT_REF_FILTER_EXCLUDE(&filter),
 		OPT_REF_SORT(&sorting_options),
 		OPT_CALLBACK(0, "points-at", &filter.points_at,
 			     N_("object"), N_("print only refs which points at the given object"),
diff --git a/ref-filter.c b/ref-filter.c
index 6d91c7cb0d..d44418efb7 100644
--- a/ref-filter.c
+++ b/ref-filter.c
@@ -2171,6 +2171,15 @@ static int filter_pattern_match(struct ref_filter *filter, const char *refname)
 			     filter->ignore_case);
 }
 
+static int filter_exclude_match(struct ref_filter *filter, const char *refname)
+{
+	if (!filter->exclude.nr)
+		return 0;
+	if (filter->match_as_path)
+		return match_name_as_path(filter, filter->exclude.v, refname);
+	return match_pattern(filter->exclude.v, refname, filter->ignore_case);
+}
+
 /*
  * This is the same as for_each_fullref_in(), but it tries to iterate
  * only over the patterns we'll care about. Note that it _doesn't_ do a full
@@ -2338,6 +2347,9 @@ static int ref_filter_handler(const char *refname, const struct object_id *oid,
 	if (!filter_pattern_match(filter, refname))
 		return 0;
 
+	if (filter_exclude_match(filter, refname))
+		return 0;
+
 	if (filter->points_at.nr && !match_points_at(&filter->points_at, oid, refname))
 		return 0;
 
@@ -2877,6 +2889,7 @@ void ref_filter_init(struct ref_filter *filter)
 
 void ref_filter_clear(struct ref_filter *filter)
 {
+	strvec_clear(&filter->exclude);
 	oid_array_clear(&filter->points_at);
 	free_commit_list(filter->with_commit);
 	free_commit_list(filter->no_commit);
diff --git a/ref-filter.h b/ref-filter.h
index 160b807224..1524bc463a 100644
--- a/ref-filter.h
+++ b/ref-filter.h
@@ -6,6 +6,7 @@
 #include "refs.h"
 #include "commit.h"
 #include "string-list.h"
+#include "strvec.h"
 
 /* Quoting styles */
 #define QUOTE_NONE 0
@@ -59,6 +60,7 @@ struct ref_array {
 
 struct ref_filter {
 	const char **name_patterns;
+	struct strvec exclude;
 	struct oid_array points_at;
 	struct commit_list *with_commit;
 	struct commit_list *no_commit;
@@ -94,6 +96,7 @@ struct ref_format {
 
 #define REF_FILTER_INIT { \
 	.points_at = OID_ARRAY_INIT, \
+	.exclude = STRVEC_INIT, \
 }
 #define REF_FORMAT_INIT {             \
 	.use_color = -1,              \
@@ -112,6 +115,9 @@ struct ref_format {
 #define OPT_REF_SORT(var) \
 	OPT_STRING_LIST(0, "sort", (var), \
 			N_("key"), N_("field name to sort on"))
+#define OPT_REF_FILTER_EXCLUDE(var) \
+	OPT_STRVEC(0, "exclude", &(var)->exclude, \
+		   N_("pattern"), N_("exclude refs which match pattern"))
 
 /*
  * API for filtering a set of refs. Based on the type of refs the user
diff --git a/t/t6300-for-each-ref.sh b/t/t6300-for-each-ref.sh
index 5c00607608..7e8d578522 100755
--- a/t/t6300-for-each-ref.sh
+++ b/t/t6300-for-each-ref.sh
@@ -447,6 +447,41 @@ test_expect_success 'exercise glob patterns with prefixes' '
 	test_cmp expected actual
 '
 
+cat >expected <<\EOF
+refs/tags/bar
+refs/tags/baz
+refs/tags/testtag
+EOF
+
+test_expect_success 'exercise patterns with prefix exclusions' '
+	for tag in foo/one foo/two foo/three bar baz
+	do
+		git tag "$tag" || return 1
+	done &&
+	test_when_finished "git tag -d foo/one foo/two foo/three bar baz" &&
+	git for-each-ref --format="%(refname)" \
+		refs/tags/ --exclude=refs/tags/foo >actual &&
+	test_cmp expected actual
+'
+
+cat >expected <<\EOF
+refs/tags/bar
+refs/tags/baz
+refs/tags/foo/one
+refs/tags/testtag
+EOF
+
+test_expect_success 'exercise patterns with pattern exclusions' '
+	for tag in foo/one foo/two foo/three bar baz
+	do
+		git tag "$tag" || return 1
+	done &&
+	test_when_finished "git tag -d foo/one foo/two foo/three bar baz" &&
+	git for-each-ref --format="%(refname)" \
+		refs/tags/ --exclude="refs/tags/foo/t*" >actual &&
+	test_cmp expected actual
+'
+
 cat >expected <<\EOF
 'refs/heads/main'
 'refs/remotes/origin/main'
-- 
2.41.0.44.gf2359540d2



* [PATCH v4 07/16] refs: plumb `exclude_patterns` argument throughout
  2023-06-20 14:20 ` [PATCH v4 " Taylor Blau
                     ` (5 preceding siblings ...)
  2023-06-20 14:21   ` [PATCH v4 06/16] builtin/for-each-ref.c: add `--exclude` option Taylor Blau
@ 2023-06-20 14:21   ` Taylor Blau
  2023-06-20 14:21   ` [PATCH v4 08/16] refs/packed-backend.c: refactor `find_reference_location()` Taylor Blau
                     ` (9 subsequent siblings)
  16 siblings, 0 replies; 149+ messages in thread
From: Taylor Blau @ 2023-06-20 14:21 UTC (permalink / raw)
  To: git
  Cc: Chris Torek, Derrick Stolee, Jeff King, Junio C Hamano,
	Patrick Steinhardt

The subsequent patch will want to access an optional `exclude_patterns`
array within `refs/packed-backend.c` that will cull out certain
references matching any of the given patterns on a best-effort basis.

To do so, the refs subsystem needs to be updated to pass this value
across a number of different locations.

Prepare for a future patch by introducing this plumbing now, passing
NULLs at top-level APIs in order to make that patch less noisy and more
easily readable.

Signed-off-by: Taylor Blau <me@ttaylorr.com>
---
 ls-refs.c             |  2 +-
 ref-filter.c          |  2 +-
 refs.c                | 32 +++++++++++++++++++-------------
 refs.h                |  8 +++++++-
 refs/debug.c          |  5 +++--
 refs/files-backend.c  |  5 +++--
 refs/packed-backend.c |  5 +++--
 refs/refs-internal.h  |  7 ++++---
 revision.c            |  2 +-
 9 files changed, 42 insertions(+), 26 deletions(-)

diff --git a/ls-refs.c b/ls-refs.c
index f385938b64..6f490b2d9c 100644
--- a/ls-refs.c
+++ b/ls-refs.c
@@ -193,7 +193,7 @@ int ls_refs(struct repository *r, struct packet_reader *request)
 		strvec_push(&data.prefixes, "");
 	refs_for_each_fullref_in_prefixes(get_main_ref_store(r),
 					  get_git_namespace(), data.prefixes.v,
-					  send_ref, &data);
+					  NULL, send_ref, &data);
 	packet_fflush(stdout);
 	strvec_clear(&data.prefixes);
 	strbuf_release(&data.buf);
diff --git a/ref-filter.c b/ref-filter.c
index d44418efb7..5e7ed204dc 100644
--- a/ref-filter.c
+++ b/ref-filter.c
@@ -2214,7 +2214,7 @@ static int for_each_fullref_in_pattern(struct ref_filter *filter,
 
 	return refs_for_each_fullref_in_prefixes(get_main_ref_store(the_repository),
 						 NULL, filter->name_patterns,
-						 cb, cb_data);
+						 NULL, cb, cb_data);
 }
 
 /*
diff --git a/refs.c b/refs.c
index ba63b69090..b4b7165fc0 100644
--- a/refs.c
+++ b/refs.c
@@ -1526,7 +1526,9 @@ int head_ref(each_ref_fn fn, void *cb_data)
 
 struct ref_iterator *refs_ref_iterator_begin(
 		struct ref_store *refs,
-		const char *prefix, int trim,
+		const char *prefix,
+		const char **exclude_patterns,
+		int trim,
 		enum do_for_each_ref_flags flags)
 {
 	struct ref_iterator *iter;
@@ -1542,8 +1544,7 @@ struct ref_iterator *refs_ref_iterator_begin(
 		}
 	}
 
-	iter = refs->be->iterator_begin(refs, prefix, flags);
-
+	iter = refs->be->iterator_begin(refs, prefix, exclude_patterns, flags);
 	/*
 	 * `iterator_begin()` already takes care of prefix, but we
 	 * might need to do some trimming:
@@ -1577,7 +1578,7 @@ static int do_for_each_repo_ref(struct repository *r, const char *prefix,
 	if (!refs)
 		return 0;
 
-	iter = refs_ref_iterator_begin(refs, prefix, trim, flags);
+	iter = refs_ref_iterator_begin(refs, prefix, NULL, trim, flags);
 
 	return do_for_each_repo_ref_iterator(r, iter, fn, cb_data);
 }
@@ -1599,6 +1600,7 @@ static int do_for_each_ref_helper(struct repository *r,
 }
 
 static int do_for_each_ref(struct ref_store *refs, const char *prefix,
+			   const char **exclude_patterns,
 			   each_ref_fn fn, int trim,
 			   enum do_for_each_ref_flags flags, void *cb_data)
 {
@@ -1608,7 +1610,8 @@ static int do_for_each_ref(struct ref_store *refs, const char *prefix,
 	if (!refs)
 		return 0;
 
-	iter = refs_ref_iterator_begin(refs, prefix, trim, flags);
+	iter = refs_ref_iterator_begin(refs, prefix, exclude_patterns, trim,
+				       flags);
 
 	return do_for_each_repo_ref_iterator(the_repository, iter,
 					do_for_each_ref_helper, &hp);
@@ -1616,7 +1619,7 @@ static int do_for_each_ref(struct ref_store *refs, const char *prefix,
 
 int refs_for_each_ref(struct ref_store *refs, each_ref_fn fn, void *cb_data)
 {
-	return do_for_each_ref(refs, "", fn, 0, 0, cb_data);
+	return do_for_each_ref(refs, "", NULL, fn, 0, 0, cb_data);
 }
 
 int for_each_ref(each_ref_fn fn, void *cb_data)
@@ -1627,7 +1630,7 @@ int for_each_ref(each_ref_fn fn, void *cb_data)
 int refs_for_each_ref_in(struct ref_store *refs, const char *prefix,
 			 each_ref_fn fn, void *cb_data)
 {
-	return do_for_each_ref(refs, prefix, fn, strlen(prefix), 0, cb_data);
+	return do_for_each_ref(refs, prefix, NULL, fn, strlen(prefix), 0, cb_data);
 }
 
 int for_each_ref_in(const char *prefix, each_ref_fn fn, void *cb_data)
@@ -1638,13 +1641,14 @@ int for_each_ref_in(const char *prefix, each_ref_fn fn, void *cb_data)
 int for_each_fullref_in(const char *prefix, each_ref_fn fn, void *cb_data)
 {
 	return do_for_each_ref(get_main_ref_store(the_repository),
-			       prefix, fn, 0, 0, cb_data);
+			       prefix, NULL, fn, 0, 0, cb_data);
 }
 
 int refs_for_each_fullref_in(struct ref_store *refs, const char *prefix,
+			     const char **exclude_patterns,
 			     each_ref_fn fn, void *cb_data)
 {
-	return do_for_each_ref(refs, prefix, fn, 0, 0, cb_data);
+	return do_for_each_ref(refs, prefix, exclude_patterns, fn, 0, 0, cb_data);
 }
 
 int for_each_replace_ref(struct repository *r, each_repo_ref_fn fn, void *cb_data)
@@ -1661,14 +1665,14 @@ int for_each_namespaced_ref(each_ref_fn fn, void *cb_data)
 	int ret;
 	strbuf_addf(&buf, "%srefs/", get_git_namespace());
 	ret = do_for_each_ref(get_main_ref_store(the_repository),
-			      buf.buf, fn, 0, 0, cb_data);
+			      buf.buf, NULL, fn, 0, 0, cb_data);
 	strbuf_release(&buf);
 	return ret;
 }
 
 int refs_for_each_rawref(struct ref_store *refs, each_ref_fn fn, void *cb_data)
 {
-	return do_for_each_ref(refs, "", fn, 0,
+	return do_for_each_ref(refs, "", NULL, fn, 0,
 			       DO_FOR_EACH_INCLUDE_BROKEN, cb_data);
 }
 
@@ -1738,6 +1742,7 @@ static void find_longest_prefixes(struct string_list *out,
 int refs_for_each_fullref_in_prefixes(struct ref_store *ref_store,
 				      const char *namespace,
 				      const char **patterns,
+				      const char **exclude_patterns,
 				      each_ref_fn fn, void *cb_data)
 {
 	struct string_list prefixes = STRING_LIST_INIT_DUP;
@@ -1753,7 +1758,8 @@ int refs_for_each_fullref_in_prefixes(struct ref_store *ref_store,
 
 	for_each_string_list_item(prefix, &prefixes) {
 		strbuf_addstr(&buf, prefix->string);
-		ret = refs_for_each_fullref_in(ref_store, buf.buf, fn, cb_data);
+		ret = refs_for_each_fullref_in(ref_store, buf.buf,
+					       exclude_patterns, fn, cb_data);
 		if (ret)
 			break;
 		strbuf_setlen(&buf, namespace_len);
@@ -2408,7 +2414,7 @@ int refs_verify_refname_available(struct ref_store *refs,
 	strbuf_addstr(&dirname, refname + dirname.len);
 	strbuf_addch(&dirname, '/');
 
-	iter = refs_ref_iterator_begin(refs, dirname.buf, 0,
+	iter = refs_ref_iterator_begin(refs, dirname.buf, NULL, 0,
 				       DO_FOR_EACH_INCLUDE_BROKEN);
 	while ((ok = ref_iterator_advance(iter)) == ITER_OK) {
 		if (skip &&
diff --git a/refs.h b/refs.h
index 933fdebe58..3c03b035a3 100644
--- a/refs.h
+++ b/refs.h
@@ -344,6 +344,7 @@ int for_each_ref(each_ref_fn fn, void *cb_data);
 int for_each_ref_in(const char *prefix, each_ref_fn fn, void *cb_data);
 
 int refs_for_each_fullref_in(struct ref_store *refs, const char *prefix,
+			     const char **exclude_patterns,
 			     each_ref_fn fn, void *cb_data);
 int for_each_fullref_in(const char *prefix, each_ref_fn fn, void *cb_data);
 
@@ -351,10 +352,15 @@ int for_each_fullref_in(const char *prefix, each_ref_fn fn, void *cb_data);
  * iterate all refs in "patterns" by partitioning patterns into disjoint sets
  * and iterating the longest-common prefix of each set.
  *
+ * references matching any pattern in "exclude_patterns" are omitted from the
+ * result set on a best-effort basis.
+ *
  * callers should be prepared to ignore references that they did not ask for.
  */
 int refs_for_each_fullref_in_prefixes(struct ref_store *refs,
-				      const char *namespace, const char **patterns,
+				      const char *namespace,
+				      const char **patterns,
+				      const char **exclude_patterns,
 				      each_ref_fn fn, void *cb_data);
 
 /**
diff --git a/refs/debug.c b/refs/debug.c
index c0fa707a1d..b7ffc4ce67 100644
--- a/refs/debug.c
+++ b/refs/debug.c
@@ -229,11 +229,12 @@ static struct ref_iterator_vtable debug_ref_iterator_vtable = {
 
 static struct ref_iterator *
 debug_ref_iterator_begin(struct ref_store *ref_store, const char *prefix,
-			 unsigned int flags)
+			 const char **exclude_patterns, unsigned int flags)
 {
 	struct debug_ref_store *drefs = (struct debug_ref_store *)ref_store;
 	struct ref_iterator *res =
-		drefs->refs->be->iterator_begin(drefs->refs, prefix, flags);
+		drefs->refs->be->iterator_begin(drefs->refs, prefix,
+						exclude_patterns, flags);
 	struct debug_ref_iterator *diter = xcalloc(1, sizeof(*diter));
 	base_ref_iterator_init(&diter->base, &debug_ref_iterator_vtable, 1);
 	diter->iter = res;
diff --git a/refs/files-backend.c b/refs/files-backend.c
index 9a8333c0d0..0a60037530 100644
--- a/refs/files-backend.c
+++ b/refs/files-backend.c
@@ -831,7 +831,8 @@ static struct ref_iterator_vtable files_ref_iterator_vtable = {
 
 static struct ref_iterator *files_ref_iterator_begin(
 		struct ref_store *ref_store,
-		const char *prefix, unsigned int flags)
+		const char *prefix, const char **exclude_patterns,
+		unsigned int flags)
 {
 	struct files_ref_store *refs;
 	struct ref_iterator *loose_iter, *packed_iter, *overlay_iter;
@@ -876,7 +877,7 @@ static struct ref_iterator *files_ref_iterator_begin(
 	 * the packed and loose references.
 	 */
 	packed_iter = refs_ref_iterator_begin(
-			refs->packed_ref_store, prefix, 0,
+			refs->packed_ref_store, prefix, exclude_patterns, 0,
 			DO_FOR_EACH_INCLUDE_BROKEN);
 
 	overlay_iter = overlay_ref_iterator_begin(loose_iter, packed_iter);
diff --git a/refs/packed-backend.c b/refs/packed-backend.c
index 291e53f5cf..6855c9a237 100644
--- a/refs/packed-backend.c
+++ b/refs/packed-backend.c
@@ -924,7 +924,8 @@ static struct ref_iterator_vtable packed_ref_iterator_vtable = {
 
 static struct ref_iterator *packed_ref_iterator_begin(
 		struct ref_store *ref_store,
-		const char *prefix, unsigned int flags)
+		const char *prefix, const char **exclude_patterns,
+		unsigned int flags)
 {
 	struct packed_ref_store *refs;
 	struct snapshot *snapshot;
@@ -1149,7 +1150,7 @@ static int write_with_updates(struct packed_ref_store *refs,
 	 * list of refs is exhausted, set iter to NULL. When the list
 	 * of updates is exhausted, leave i set to updates->nr.
 	 */
-	iter = packed_ref_iterator_begin(&refs->base, "",
+	iter = packed_ref_iterator_begin(&refs->base, "", NULL,
 					 DO_FOR_EACH_INCLUDE_BROKEN);
 	if ((ok = ref_iterator_advance(iter)) != ITER_OK)
 		iter = NULL;
diff --git a/refs/refs-internal.h b/refs/refs-internal.h
index f72b7be894..9db8aec4da 100644
--- a/refs/refs-internal.h
+++ b/refs/refs-internal.h
@@ -367,8 +367,8 @@ int is_empty_ref_iterator(struct ref_iterator *ref_iterator);
  */
 struct ref_iterator *refs_ref_iterator_begin(
 		struct ref_store *refs,
-		const char *prefix, int trim,
-		enum do_for_each_ref_flags flags);
+		const char *prefix, const char **exclude_patterns,
+		int trim, enum do_for_each_ref_flags flags);
 
 /*
  * A callback function used to instruct merge_ref_iterator how to
@@ -571,7 +571,8 @@ typedef int copy_ref_fn(struct ref_store *ref_store,
  */
 typedef struct ref_iterator *ref_iterator_begin_fn(
 		struct ref_store *ref_store,
-		const char *prefix, unsigned int flags);
+		const char *prefix, const char **exclude_patterns,
+		unsigned int flags);
 
 /* reflog functions */
 
diff --git a/revision.c b/revision.c
index b33cc1d106..89953592f9 100644
--- a/revision.c
+++ b/revision.c
@@ -2670,7 +2670,7 @@ static int for_each_bisect_ref(struct ref_store *refs, each_ref_fn fn,
 	struct strbuf bisect_refs = STRBUF_INIT;
 	int status;
 	strbuf_addf(&bisect_refs, "refs/bisect/%s", term);
-	status = refs_for_each_fullref_in(refs, bisect_refs.buf, fn, cb_data);
+	status = refs_for_each_fullref_in(refs, bisect_refs.buf, NULL, fn, cb_data);
 	strbuf_release(&bisect_refs);
 	return status;
 }
-- 
2.41.0.44.gf2359540d2



* [PATCH v4 08/16] refs/packed-backend.c: refactor `find_reference_location()`
  2023-06-20 14:20 ` [PATCH v4 " Taylor Blau
                     ` (6 preceding siblings ...)
  2023-06-20 14:21   ` [PATCH v4 07/16] refs: plumb `exclude_patterns` argument throughout Taylor Blau
@ 2023-06-20 14:21   ` Taylor Blau
  2023-06-20 14:21   ` [PATCH v4 09/16] refs/packed-backend.c: implement jump lists to avoid excluded pattern(s) Taylor Blau
                     ` (8 subsequent siblings)
  16 siblings, 0 replies; 149+ messages in thread
From: Taylor Blau @ 2023-06-20 14:21 UTC (permalink / raw)
  To: git
  Cc: Chris Torek, Derrick Stolee, Jeff King, Junio C Hamano,
	Patrick Steinhardt

The function `find_reference_location()` performs a binary search over
the contents of a repository's `$GIT_DIR/packed-refs` file.

The search it implements differs from a standard binary search in that
the records it searches over are not of a fixed width, so each probe
must first locate the boundaries of the record it lands in before
comparing it.

Extract the core routine of `find_reference_location()` in order to
implement a function in the following patch which will find the first
location in the `packed-refs` file that *doesn't* match the given
pattern.

The behavior of `find_reference_location()` is unchanged.
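
For readers unfamiliar with the search, here is a simplified, self-contained sketch of a binary search over variable-width, newline-terminated records. It assumes each record is just a refname followed by `\n` (real `packed-refs` records also carry an object ID and may be followed by peeled lines), and all `sk_*` names are hypothetical. Each probe backs up from the midpoint to the start of its record. With `prefix_only` set, records that start with the needle compare as smaller, so the search returns the first record that does *not* match, mirroring the purpose of the helper described above:

```c
#include <assert.h>
#include <string.h>

/* Back up from `p` to the start of its record; `lo` is a record boundary. */
static const char *sk_record_start(const char *lo, const char *p)
{
	while (p > lo && p[-1] != '\n')
		p--;
	return p;
}

/* Return the start of the record after the one containing `p`. */
static const char *sk_record_end(const char *p, const char *hi)
{
	const char *nl = memchr(p, '\n', hi - p);
	return nl ? nl + 1 : hi;
}

/*
 * Compare the record at `rec` with `name`. With `prefix_only`, a record
 * that starts with `name` compares as smaller, so the search moves past it.
 */
static int sk_cmp(const char *rec, const char *eof, const char *name,
		  int prefix_only)
{
	for (; rec < eof && *rec != '\n' && *name; rec++, name++)
		if (*rec != *name)
			return (unsigned char)*rec - (unsigned char)*name;

	if (*name)
		return -1; /* record is a proper prefix of `name` */
	if (prefix_only)
		return -1; /* record starts with `name`: keep going right */
	return (rec < eof && *rec != '\n') ? 1 : 0; /* longer, or equal */
}

/* Return the first record >= `name` (or past all `name`-prefixed records). */
static const char *sk_find(const char *buf, const char *eof,
			   const char *name, int prefix_only)
{
	const char *lo = buf, *hi = eof;

	while (lo < hi) {
		const char *mid = lo + (hi - lo) / 2;
		const char *rec = sk_record_start(lo, mid);
		int cmp = sk_cmp(rec, eof, name, prefix_only);

		if (cmp < 0)
			lo = sk_record_end(mid, hi);
		else if (cmp > 0)
			hi = rec;
		else
			return rec;
	}
	return lo;
}
```

Calling `sk_find()` twice with the same prefix, once with `prefix_only` clear and once with it set, yields the start and end of the region occupied by that prefix.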

Co-authored-by: Jeff King <peff@peff.net>
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Taylor Blau <me@ttaylorr.com>
---
 refs/packed-backend.c | 38 ++++++++++++++++++++++----------------
 1 file changed, 22 insertions(+), 16 deletions(-)

diff --git a/refs/packed-backend.c b/refs/packed-backend.c
index 6855c9a237..d9b61d9e03 100644
--- a/refs/packed-backend.c
+++ b/refs/packed-backend.c
@@ -527,22 +527,8 @@ static int load_contents(struct snapshot *snapshot)
 	return 1;
 }
 
-/*
- * Find the place in `snapshot->buf` where the start of the record for
- * `refname` starts. If `mustexist` is true and the reference doesn't
- * exist, then return NULL. If `mustexist` is false and the reference
- * doesn't exist, then return the point where that reference would be
- * inserted, or `snapshot->eof` (which might be NULL) if it would be
- * inserted at the end of the file. In the latter mode, `refname`
- * doesn't have to be a proper reference name; for example, one could
- * search for "refs/replace/" to find the start of any replace
- * references.
- *
- * The record is sought using a binary search, so `snapshot->buf` must
- * be sorted.
- */
-static const char *find_reference_location(struct snapshot *snapshot,
-					   const char *refname, int mustexist)
+static const char *find_reference_location_1(struct snapshot *snapshot,
+					     const char *refname, int mustexist)
 {
 	/*
 	 * This is not *quite* a garden-variety binary search, because
@@ -588,6 +574,26 @@ static const char *find_reference_location(struct snapshot *snapshot,
 		return lo;
 }
 
+/*
+ * Find the place in `snapshot->buf` where the start of the record for
+ * `refname` starts. If `mustexist` is true and the reference doesn't
+ * exist, then return NULL. If `mustexist` is false and the reference
+ * doesn't exist, then return the point where that reference would be
+ * inserted, or `snapshot->eof` (which might be NULL) if it would be
+ * inserted at the end of the file. In the latter mode, `refname`
+ * doesn't have to be a proper reference name; for example, one could
+ * search for "refs/replace/" to find the start of any replace
+ * references.
+ *
+ * The record is sought using a binary search, so `snapshot->buf` must
+ * be sorted.
+ */
+static const char *find_reference_location(struct snapshot *snapshot,
+					   const char *refname, int mustexist)
+{
+	return find_reference_location_1(snapshot, refname, mustexist);
+}
+
 /*
  * Create a newly-allocated `snapshot` of the `packed-refs` file in
  * its current state and return it. The return value will already have
-- 
2.41.0.44.gf2359540d2



* [PATCH v4 09/16] refs/packed-backend.c: implement jump lists to avoid excluded pattern(s)
  2023-06-20 14:20 ` [PATCH v4 " Taylor Blau
                     ` (7 preceding siblings ...)
  2023-06-20 14:21   ` [PATCH v4 08/16] refs/packed-backend.c: refactor `find_reference_location()` Taylor Blau
@ 2023-06-20 14:21   ` Taylor Blau
  2023-07-03  5:56     ` Jeff King
  2023-06-20 14:21   ` [PATCH v4 10/16] refs/packed-backend.c: add trace2 counters for jump list Taylor Blau
                     ` (7 subsequent siblings)
  16 siblings, 1 reply; 149+ messages in thread
From: Taylor Blau @ 2023-06-20 14:21 UTC (permalink / raw)
  To: git
  Cc: Chris Torek, Derrick Stolee, Jeff King, Junio C Hamano,
	Patrick Steinhardt

When iterating through the `packed-refs` file in order to answer a query
like:

    $ git for-each-ref --exclude=refs/__hidden__

it would be useful to avoid walking over all of the entries in
`refs/__hidden__/*` when possible, since we know that the ref-filter
code is going to throw them away anyway.

In certain circumstances, doing so is possible. The algorithm for doing
so is as follows:

  - For each excluded pattern, find the first record that matches it,
    and the first record that *doesn't* match it (i.e. the location
    you'd next want to consider when excluding that pattern).

  - Sort the set of excluded regions from the previous step in ascending
    order of the first location within the `packed-refs` file that
    matches.

  - Clean up the results from the previous step: discard empty regions,
    and combine adjacent regions. The set of regions which remains is
    referred to as the "jump list", and never contains any references
    which should be included in the result set.
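
The three steps above can be sketched over a sorted array of refnames instead of the raw `packed-refs` buffer. This is an illustrative model under that simplifying assumption, with literal prefixes as patterns; `struct sk_region`, `sk_build_jump_list()`, and the other `sk_*` names are hypothetical and do not appear in Git:

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

struct sk_region { size_t start, end; }; /* half-open [start, end) */

/* First index whose ref is >= `name` (plain lower bound). */
static size_t sk_lower(const char **refs, size_t nr, const char *name)
{
	size_t lo = 0, hi = nr;

	while (lo < hi) {
		size_t mid = lo + (hi - lo) / 2;
		if (strcmp(refs[mid], name) < 0)
			lo = mid + 1;
		else
			hi = mid;
	}
	return lo;
}

/* First index at or after `from` whose ref does not start with `prefix`. */
static size_t sk_past_prefix(const char **refs, size_t nr, size_t from,
			     const char *prefix)
{
	size_t lo = from, hi = nr, len = strlen(prefix);

	while (lo < hi) {
		size_t mid = lo + (hi - lo) / 2;
		if (!strncmp(refs[mid], prefix, len))
			lo = mid + 1;
		else
			hi = mid;
	}
	return lo;
}

static int sk_cmp_region(const void *a, const void *b)
{
	const struct sk_region *x = a, *y = b;

	if (x->start != y->start)
		return x->start < y->start ? -1 : 1;
	return 0;
}

/* Fill `out` (room for `nr_ex` entries); return the number of regions kept. */
static size_t sk_build_jump_list(const char **refs, size_t nr,
				 const char **excluded, size_t nr_ex,
				 struct sk_region *out)
{
	size_t i, kept = 0;

	/* Step 1: [first match, first non-match) for each pattern. */
	for (i = 0; i < nr_ex; i++) {
		out[i].start = sk_lower(refs, nr, excluded[i]);
		out[i].end = sk_past_prefix(refs, nr, out[i].start, excluded[i]);
	}

	/* Step 2: sort by starting position. */
	qsort(out, nr_ex, sizeof(*out), sk_cmp_region);

	/* Step 3: drop empty regions; merge overlapping or adjacent ones. */
	for (i = 0; i < nr_ex; i++) {
		if (out[i].start == out[i].end)
			continue;
		if (kept && out[i].start <= out[kept - 1].end) {
			if (out[i].end > out[kept - 1].end)
				out[kept - 1].end = out[i].end;
			continue;
		}
		out[kept++] = out[i];
	}
	return kept;
}
```

Overlapping regions (such as those for `refs/pull/` and `refs/pull/2`) collapse into one, and a region that starts exactly where the previous one ends is merged into it because of the `<=` comparison.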

Then when iterating through the `packed-refs` file, if `iter->pos` is
ever contained in one of the regions from the previous steps, advance
`iter->pos` past the end of that region, and continue enumeration.

Note that we only perform this optimization when none of the excluded
pattern(s) have special meta-characters in them. For a pattern like
"refs/foo[ac]", the excluded regions ("refs/fooa", "refs/fooc", and
everything underneath them) are not connected. A future implementation
that handles this case may split the character class (pretending as if
two patterns were excluded: "refs/fooa", and "refs/fooc").
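
A minimal sketch of the guard described above, assuming the meta-characters of interest are `*`, `?`, `[`, and `\`; `sk_patterns_are_literal()` is a hypothetical name. If any pattern is not a plain prefix, the caller would skip building a jump list altogether and fall back to visiting every reference:

```c
#include <assert.h>
#include <string.h>

/*
 * Hypothetical check: return 1 if every excluded pattern is a plain
 * prefix (no glob meta-characters), so each one maps to a single
 * contiguous region of the sorted packed-refs file.
 */
static int sk_patterns_are_literal(const char **patterns)
{
	for (; patterns && *patterns; patterns++)
		if (strpbrk(*patterns, "*?[\\"))
			return 0;
	return 1;
}
```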

There are a few other gotchas worth considering. First, note that the
jump list is sorted, so once we jump past a region, we can avoid
considering it (or any regions preceding it) again. The member
`jump_pos` tracks the index of the next region that we may still need
to jump over.
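
Skipping during iteration, including the role of `jump_pos`, can be modeled with record indices instead of byte offsets. This is a hypothetical sketch, not the code in `next_record()`: it assumes the jump list is already sorted and disjoint (as produced by the cleanup step), and `struct sk_iter` and `sk_next()` are illustrative names:

```c
#include <assert.h>
#include <stddef.h>

struct sk_jump { size_t start, end; }; /* sorted, disjoint, half-open */

/* Hypothetical iterator over `nr` records with a jump list. */
struct sk_iter {
	size_t pos, nr;
	const struct sk_jump *jump;
	size_t jump_nr, jump_pos; /* jump_pos: first region that may still apply */
};

/* Store the next visible record index in `*out`; return 0 when exhausted. */
static int sk_next(struct sk_iter *it, size_t *out)
{
	while (it->jump_pos < it->jump_nr) {
		const struct sk_jump *j = &it->jump[it->jump_pos];

		if (it->pos < j->start)
			break;            /* next region is still ahead of us */
		if (it->pos < j->end)
			it->pos = j->end; /* inside a region: skip past it */
		it->jump_pos++;           /* this region can never apply again */
	}
	if (it->pos >= it->nr)
		return 0;
	*out = it->pos++;
	return 1;
}
```

Because `jump_pos` only moves forward, each region is examined at most once, so consulting the jump list costs time linear in its length over the whole iteration.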

Second, note that the jump list is best-effort, since we do not handle
loose references, and because of the meta-character issue above. The
jump list may not skip past all references which won't appear in the
results, but will never skip over a reference which does appear in the
result set.

In repositories with a large number of hidden references, the speed-up
can be significant. Tests here are done with a copy of linux.git with a
reference "refs/pull/N" pointing at every commit, as in:

    $ git rev-list HEAD | awk '{ print "create refs/pull/" NR " " $0 }' |
        git update-ref --stdin
    $ git pack-refs --all

In that setup, it is significantly faster to have `for-each-ref` jump
over the excluded references than to filter them out after the fact:

    $ hyperfine \
      'git for-each-ref --format="%(objectname) %(refname)" | grep -vE "^[0-9a-f]{40} refs/pull/"' \
      'git.compile for-each-ref --format="%(objectname) %(refname)" --exclude="refs/pull"'
    Benchmark 1: git for-each-ref --format="%(objectname) %(refname)" | grep -vE "^[0-9a-f]{40} refs/pull/"
      Time (mean ± σ):     802.7 ms ±   2.1 ms    [User: 691.6 ms, System: 147.0 ms]
      Range (min … max):   800.0 ms … 807.7 ms    10 runs

    Benchmark 2: git.compile for-each-ref --format="%(objectname) %(refname)" --exclude="refs/pull"
      Time (mean ± σ):       4.7 ms ±   0.3 ms    [User: 0.7 ms, System: 4.0 ms]
      Range (min … max):     4.3 ms …   6.7 ms    422 runs

    Summary
      'git.compile for-each-ref --format="%(objectname) %(refname)" --exclude="refs/pull"' ran
      172.03 ± 9.60 times faster than 'git for-each-ref --format="%(objectname) %(refname)" | grep -vE "^[0-9a-f]{40} refs/pull/"'

Using the jump list is fairly straightforward (see the changes to
`refs/packed-backend.c::next_record()`), but constructing the list is
not. To ensure that the construction is correct, add a new suite of
tests in t1419 covering various corner cases (overlapping regions,
partially overlapping regions, adjacent regions, etc.).
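
The construction can likewise be sketched under the same simplifying
assumptions (a sorted in-memory array of refnames instead of the
packed-refs buffer; `build_jump_list()` and the other names are
hypothetical). Each prefix is located twice with a binary search whose
comparison treats a record beginning with the prefix as sorting after
it (to find the start) or before it (to find the end), mirroring the new
`start` argument to `cmp_record_to_refname()`. The regions are then
sorted and merged:

```c
#include <stdlib.h>
#include <string.h>

struct jump_region {
	size_t start;
	size_t end;
};

/*
 * Compare `rec` against `prefix`. A record beginning with `prefix`
 * sorts after it when `start` is set, and before it otherwise, so this
 * never returns 0.
 */
static int cmp_to_prefix(const char *rec, const char *prefix, int start)
{
	int cmp = strncmp(rec, prefix, strlen(prefix));
	if (cmp)
		return cmp;
	return start ? 1 : -1;
}

/* Return the first index whose record does not compare below `prefix`. */
static size_t find_location(const char **recs, size_t nr,
			    const char *prefix, int start)
{
	size_t lo = 0, hi = nr;

	while (lo < hi) {
		size_t mid = lo + (hi - lo) / 2;
		if (cmp_to_prefix(recs[mid], prefix, start) < 0)
			lo = mid + 1;
		else
			hi = mid;
	}
	return lo;
}

static int region_cmp(const void *va, const void *vb)
{
	const struct jump_region *a = va, *b = vb;
	if (a->start < b->start)
		return -1;
	return a->start > b->start;
}

/*
 * Fill `jumps` (room for `prefixes_nr` entries) with merged regions
 * covering every record that starts with one of `prefixes`. Returns the
 * number of regions kept.
 */
static size_t build_jump_list(const char **recs, size_t nr,
			      const char **prefixes, size_t prefixes_nr,
			      struct jump_region *jumps)
{
	size_t i, j = 0;

	for (i = 0; i < prefixes_nr; i++) {
		jumps[i].start = find_location(recs, nr, prefixes[i], 1);
		jumps[i].end = find_location(recs, nr, prefixes[i], 0);
	}
	qsort(jumps, prefixes_nr, sizeof(*jumps), region_cmp);

	for (i = 0; i < prefixes_nr; i++) {
		struct jump_region *ours = &jumps[i];
		if (ours->start == ours->end)
			continue; /* prefix matched nothing */
		if (j && ours->start <= jumps[j - 1].end) {
			/* overlapping or adjacent: extend the last kept region */
			if (ours->end > jumps[j - 1].end)
				jumps[j - 1].end = ours->end;
		} else {
			jumps[j++] = *ours; /* start a new disjoint region */
		}
	}
	return j;
}
```

Tracking the most recent merged region by index (`jumps[j - 1]`) rather
than through a separate pointer guarantees that extending a region
always updates the compacted entry that survives in the final list.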

Co-authored-by: Jeff King <peff@peff.net>
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Taylor Blau <me@ttaylorr.com>
---
 ref-filter.c              |   6 +-
 refs/packed-backend.c     | 160 ++++++++++++++++++++++++++++++++++++--
 t/helper/test-ref-store.c |  10 +++
 t/t1419-exclude-refs.sh   | 101 ++++++++++++++++++++++++
 4 files changed, 269 insertions(+), 8 deletions(-)
 create mode 100755 t/t1419-exclude-refs.sh

diff --git a/ref-filter.c b/ref-filter.c
index 5e7ed204dc..ddc7f5204f 100644
--- a/ref-filter.c
+++ b/ref-filter.c
@@ -2209,12 +2209,14 @@ static int for_each_fullref_in_pattern(struct ref_filter *filter,
 
 	if (!filter->name_patterns[0]) {
 		/* no patterns; we have to look at everything */
-		return for_each_fullref_in("", cb, cb_data);
+		return refs_for_each_fullref_in(get_main_ref_store(the_repository),
+						 "", filter->exclude.v, cb, cb_data);
 	}
 
 	return refs_for_each_fullref_in_prefixes(get_main_ref_store(the_repository),
 						 NULL, filter->name_patterns,
-						 NULL, cb, cb_data);
+						 filter->exclude.v,
+						 cb, cb_data);
 }
 
 /*
diff --git a/refs/packed-backend.c b/refs/packed-backend.c
index d9b61d9e03..f624c9921a 100644
--- a/refs/packed-backend.c
+++ b/refs/packed-backend.c
@@ -303,7 +303,8 @@ static int cmp_packed_ref_records(const void *v1, const void *v2)
  * Compare a snapshot record at `rec` to the specified NUL-terminated
  * refname.
  */
-static int cmp_record_to_refname(const char *rec, const char *refname)
+static int cmp_record_to_refname(const char *rec, const char *refname,
+				 int start)
 {
 	const char *r1 = rec + the_hash_algo->hexsz + 1;
 	const char *r2 = refname;
@@ -312,7 +313,7 @@ static int cmp_record_to_refname(const char *rec, const char *refname)
 		if (*r1 == '\n')
 			return *r2 ? -1 : 0;
 		if (!*r2)
-			return 1;
+			return start ? 1 : -1;
 		if (*r1 != *r2)
 			return (unsigned char)*r1 < (unsigned char)*r2 ? -1 : +1;
 		r1++;
@@ -528,7 +529,8 @@ static int load_contents(struct snapshot *snapshot)
 }
 
 static const char *find_reference_location_1(struct snapshot *snapshot,
-					     const char *refname, int mustexist)
+					     const char *refname, int mustexist,
+					     int start)
 {
 	/*
 	 * This is not *quite* a garden-variety binary search, because
@@ -558,7 +560,7 @@ static const char *find_reference_location_1(struct snapshot *snapshot,
 
 		mid = lo + (hi - lo) / 2;
 		rec = find_start_of_record(lo, mid);
-		cmp = cmp_record_to_refname(rec, refname);
+		cmp = cmp_record_to_refname(rec, refname, start);
 		if (cmp < 0) {
 			lo = find_end_of_record(mid, hi);
 		} else if (cmp > 0) {
@@ -591,7 +593,22 @@ static const char *find_reference_location_1(struct snapshot *snapshot,
 static const char *find_reference_location(struct snapshot *snapshot,
 					   const char *refname, int mustexist)
 {
-	return find_reference_location_1(snapshot, refname, mustexist);
+	return find_reference_location_1(snapshot, refname, mustexist, 1);
+}
+
+/*
+ * Find the place in `snapshot->buf` after the end of the record for
+ * `refname`. In other words, find the location of first thing *after*
+ * `refname`.
+ *
+ * Other semantics are identical to the ones in
+ * `find_reference_location()`.
+ */
+static const char *find_reference_location_end(struct snapshot *snapshot,
+					       const char *refname,
+					       int mustexist)
+{
+	return find_reference_location_1(snapshot, refname, mustexist, 0);
 }
 
 /*
@@ -785,6 +802,13 @@ struct packed_ref_iterator {
 	/* The end of the part of the buffer that will be iterated over: */
 	const char *eof;
 
+	struct jump_list_entry {
+		const char *start;
+		const char *end;
+	} *jump;
+	size_t jump_nr, jump_alloc;
+	size_t jump_cur;
+
 	/* Scratch space for current values: */
 	struct object_id oid, peeled;
 	struct strbuf refname_buf;
@@ -802,14 +826,34 @@ struct packed_ref_iterator {
  */
 static int next_record(struct packed_ref_iterator *iter)
 {
-	const char *p = iter->pos, *eol;
+	const char *p, *eol;
 
 	strbuf_reset(&iter->refname_buf);
 
+	/*
+	 * If iter->pos is contained within a skipped region, jump past
+	 * it.
+	 *
+	 * Note that each skipped region is considered at most once,
+	 * since they are ordered based on their starting position.
+	 */
+	while (iter->jump_cur < iter->jump_nr) {
+		struct jump_list_entry *curr = &iter->jump[iter->jump_cur];
+		if (iter->pos < curr->start)
+			break; /* not to the next jump yet */
+
+		iter->jump_cur++;
+		if (iter->pos < curr->end) {
+			iter->pos = curr->end;
+			break;
+		}
+	}
+
 	if (iter->pos == iter->eof)
 		return ITER_DONE;
 
 	iter->base.flags = REF_ISPACKED;
+	p = iter->pos;
 
 	if (iter->eof - p < the_hash_algo->hexsz + 2 ||
 	    parse_oid_hex(p, &iter->oid, &p) ||
@@ -917,6 +961,7 @@ static int packed_ref_iterator_abort(struct ref_iterator *ref_iterator)
 	int ok = ITER_DONE;
 
 	strbuf_release(&iter->refname_buf);
+	free(iter->jump);
 	release_snapshot(iter->snapshot);
 	base_ref_iterator_free(ref_iterator);
 	return ok;
@@ -928,6 +973,106 @@ static struct ref_iterator_vtable packed_ref_iterator_vtable = {
 	.abort = packed_ref_iterator_abort
 };
 
+static int jump_list_entry_cmp(const void *va, const void *vb)
+{
+	const struct jump_list_entry *a = va;
+	const struct jump_list_entry *b = vb;
+
+	if (a->start < b->start)
+		return -1;
+	if (a->start > b->start)
+		return 1;
+	return 0;
+}
+
+static int has_glob_special(const char *str)
+{
+	const char *p;
+	for (p = str; *p; p++) {
+		if (is_glob_special(*p))
+			return 1;
+	}
+	return 0;
+}
+
+static void populate_excluded_jump_list(struct packed_ref_iterator *iter,
+					struct snapshot *snapshot,
+					const char **excluded_patterns)
+{
+	size_t i, j;
+	const char **pattern;
+	struct jump_list_entry *last_disjoint;
+
+	if (!excluded_patterns)
+		return;
+
+	for (pattern = excluded_patterns; *pattern; pattern++) {
+		struct jump_list_entry *e;
+
+		/*
+		 * We can't feed any excludes with globs in them to the
+		 * refs machinery.  It only understands prefix matching.
+		 * We likewise can't even feed the string leading up to
+		 * the first meta-character, as something like "foo[a]"
+		 * should not exclude "foobar" (but the prefix "foo"
+		 * would match that and mark it for exclusion).
+		 */
+		if (has_glob_special(*pattern))
+			continue;
+
+		ALLOC_GROW(iter->jump, iter->jump_nr + 1, iter->jump_alloc);
+
+		e = &iter->jump[iter->jump_nr++];
+		e->start = find_reference_location(snapshot, *pattern, 0);
+		e->end = find_reference_location_end(snapshot, *pattern, 0);
+	}
+
+	if (!iter->jump_nr) {
+		/*
+		 * Every entry in exclude_patterns has a meta-character,
+		 * nothing to do here.
+		 */
+		return;
+	}
+
+	QSORT(iter->jump, iter->jump_nr, jump_list_entry_cmp);
+
+	/*
+	 * As an optimization, merge adjacent entries in the jump list
+	 * to jump forwards as far as possible when entering a skipped
+	 * region.
+	 *
+	 * For example, if we have two skipped regions:
+	 *
+	 *	[[A, B], [B, C]]
+	 *
+	 * we want to combine that into a single entry jumping from A to
+	 * C.
+	 */
+	last_disjoint = iter->jump;
+
+	for (i = 1, j = 1; i < iter->jump_nr; i++) {
+		struct jump_list_entry *ours = &iter->jump[i];
+
+		if (ours->start == ours->end) {
+			/* ignore empty regions (no matching entries) */
+			continue;
+		} else if (ours->start <= last_disjoint->end) {
+			/* overlapping regions extend the previous one */
+			last_disjoint->end = last_disjoint->end > ours->end
+				? last_disjoint->end : ours->end;
+		} else {
+			/* otherwise, insert a new region */
+			iter->jump[j++] = *ours;
+			last_disjoint = &iter->jump[j - 1];
+
+		}
+	}
+
+	iter->jump_nr = j;
+	iter->jump_cur = 0;
+}
+
 static struct ref_iterator *packed_ref_iterator_begin(
 		struct ref_store *ref_store,
 		const char *prefix, const char **exclude_patterns,
@@ -963,6 +1108,9 @@ static struct ref_iterator *packed_ref_iterator_begin(
 	ref_iterator = &iter->base;
 	base_ref_iterator_init(ref_iterator, &packed_ref_iterator_vtable, 1);
 
+	if (exclude_patterns)
+		populate_excluded_jump_list(iter, snapshot, exclude_patterns);
+
 	iter->snapshot = snapshot;
 	acquire_snapshot(snapshot);
 
diff --git a/t/helper/test-ref-store.c b/t/helper/test-ref-store.c
index a6977b5e83..08b6d5a86c 100644
--- a/t/helper/test-ref-store.c
+++ b/t/helper/test-ref-store.c
@@ -184,6 +184,15 @@ static int cmd_for_each_ref(struct ref_store *refs, const char **argv)
 	return refs_for_each_ref_in(refs, prefix, each_ref, NULL);
 }
 
+static int cmd_for_each_ref__exclude(struct ref_store *refs, const char **argv)
+{
+	const char *prefix = notnull(*argv++, "prefix");
+	const char **exclude_patterns = argv;
+
+	return refs_for_each_fullref_in(refs, prefix, exclude_patterns, each_ref,
+					NULL);
+}
+
 static int cmd_resolve_ref(struct ref_store *refs, const char **argv)
 {
 	struct object_id oid = *null_oid();
@@ -316,6 +325,7 @@ static struct command commands[] = {
 	{ "delete-refs", cmd_delete_refs },
 	{ "rename-ref", cmd_rename_ref },
 	{ "for-each-ref", cmd_for_each_ref },
+	{ "for-each-ref--exclude", cmd_for_each_ref__exclude },
 	{ "resolve-ref", cmd_resolve_ref },
 	{ "verify-ref", cmd_verify_ref },
 	{ "for-each-reflog", cmd_for_each_reflog },
diff --git a/t/t1419-exclude-refs.sh b/t/t1419-exclude-refs.sh
new file mode 100755
index 0000000000..bc534c8ea1
--- /dev/null
+++ b/t/t1419-exclude-refs.sh
@@ -0,0 +1,101 @@
+#!/bin/sh
+
+test_description='test exclude_patterns functionality in main ref store'
+
+GIT_TEST_DEFAULT_INITIAL_BRANCH_NAME=main
+export GIT_TEST_DEFAULT_INITIAL_BRANCH_NAME
+
+TEST_PASSES_SANITIZE_LEAK=true
+. ./test-lib.sh
+
+for_each_ref__exclude () {
+	test-tool ref-store main for-each-ref--exclude "$@" >actual.raw
+	cut -d ' ' -f 2 actual.raw
+}
+
+for_each_ref () {
+	git for-each-ref --format='%(refname)' "$@"
+}
+
+test_expect_success 'setup' '
+	test_commit --no-tag base &&
+	base="$(git rev-parse HEAD)" &&
+
+	for name in foo bar baz quux
+	do
+		for i in 1 2 3
+		do
+			echo "create refs/heads/$name/$i $base" || return 1
+		done || return 1
+	done >in &&
+	echo "delete refs/heads/main" >>in &&
+
+	git update-ref --stdin <in &&
+	git pack-refs --all
+'
+
+test_expect_success 'excluded region in middle' '
+	for_each_ref__exclude refs/heads refs/heads/foo >actual &&
+	for_each_ref refs/heads/bar refs/heads/baz refs/heads/quux >expect &&
+
+	test_cmp expect actual
+'
+
+test_expect_success 'excluded region at beginning' '
+	for_each_ref__exclude refs/heads refs/heads/bar >actual &&
+	for_each_ref refs/heads/baz refs/heads/foo refs/heads/quux >expect &&
+
+	test_cmp expect actual
+'
+
+test_expect_success 'excluded region at end' '
+	for_each_ref__exclude refs/heads refs/heads/quux >actual &&
+	for_each_ref refs/heads/foo refs/heads/bar refs/heads/baz >expect &&
+
+	test_cmp expect actual
+'
+
+test_expect_success 'disjoint excluded regions' '
+	for_each_ref__exclude refs/heads refs/heads/bar refs/heads/quux >actual &&
+	for_each_ref refs/heads/baz refs/heads/foo >expect &&
+
+	test_cmp expect actual
+'
+
+test_expect_success 'adjacent, non-overlapping excluded regions' '
+	for_each_ref__exclude refs/heads refs/heads/bar refs/heads/baz >actual &&
+	for_each_ref refs/heads/foo refs/heads/quux >expect &&
+
+	test_cmp expect actual
+'
+
+test_expect_success 'overlapping excluded regions' '
+	for_each_ref__exclude refs/heads refs/heads/ba refs/heads/baz >actual &&
+	for_each_ref refs/heads/foo refs/heads/quux >expect &&
+
+	test_cmp expect actual
+'
+
+test_expect_success 'several overlapping excluded regions' '
+	for_each_ref__exclude refs/heads \
+		refs/heads/bar refs/heads/baz refs/heads/foo >actual &&
+	for_each_ref refs/heads/quux >expect &&
+
+	test_cmp expect actual
+'
+
+test_expect_success 'non-matching excluded section' '
+	for_each_ref__exclude refs/heads refs/heads/does/not/exist >actual &&
+	for_each_ref >expect &&
+
+	test_cmp expect actual
+'
+
+test_expect_success 'meta-characters are discarded' '
+	for_each_ref__exclude refs/heads "refs/heads/ba*" >actual &&
+	for_each_ref >expect &&
+
+	test_cmp expect actual
+'
+
+test_done
-- 
2.41.0.44.gf2359540d2


* [PATCH v4 10/16] refs/packed-backend.c: add trace2 counters for jump list
  2023-06-20 14:20 ` [PATCH v4 " Taylor Blau
                     ` (8 preceding siblings ...)
  2023-06-20 14:21   ` [PATCH v4 09/16] refs/packed-backend.c: implement jump lists to avoid excluded pattern(s) Taylor Blau
@ 2023-06-20 14:21   ` Taylor Blau
  2023-06-20 14:21   ` [PATCH v4 11/16] revision.h: store hidden refs in a `strvec` Taylor Blau
                     ` (6 subsequent siblings)
  16 siblings, 0 replies; 149+ messages in thread
From: Taylor Blau @ 2023-06-20 14:21 UTC (permalink / raw)
  To: git
  Cc: Chris Torek, Derrick Stolee, Jeff King, Junio C Hamano,
	Patrick Steinhardt

The previous commit added low-level tests to ensure that the packed-refs
iterator did not enumerate excluded sections of the refspace.

However, there was no guarantee that these sections weren't being
visited, only that they were being suppressed from the output. To harden
these tests, add a trace2 counter which tracks the number of regions
skipped by the packed-refs iterator, and assert on its value.

Suggested-by: Derrick Stolee <derrickstolee@github.com>
Signed-off-by: Taylor Blau <me@ttaylorr.com>
---
 refs/packed-backend.c   |  2 ++
 t/t1419-exclude-refs.sh | 59 ++++++++++++++++++++++++++++-------------
 trace2.h                |  2 ++
 trace2/tr2_ctr.c        |  5 ++++
 4 files changed, 49 insertions(+), 19 deletions(-)

diff --git a/refs/packed-backend.c b/refs/packed-backend.c
index f624c9921a..80b877e00c 100644
--- a/refs/packed-backend.c
+++ b/refs/packed-backend.c
@@ -12,6 +12,7 @@
 #include "../chdir-notify.h"
 #include "../wrapper.h"
 #include "../write-or-die.h"
+#include "../trace2.h"
 
 enum mmap_strategy {
 	/*
@@ -845,6 +846,7 @@ static int next_record(struct packed_ref_iterator *iter)
 		iter->jump_cur++;
 		if (iter->pos < curr->end) {
 			iter->pos = curr->end;
+			trace2_counter_add(TRACE2_COUNTER_ID_PACKED_REFS_JUMPS, 1);
 			break;
 		}
 	}
diff --git a/t/t1419-exclude-refs.sh b/t/t1419-exclude-refs.sh
index bc534c8ea1..5d8c86b657 100755
--- a/t/t1419-exclude-refs.sh
+++ b/t/t1419-exclude-refs.sh
@@ -9,7 +9,8 @@ TEST_PASSES_SANITIZE_LEAK=true
 . ./test-lib.sh
 
 for_each_ref__exclude () {
-	test-tool ref-store main for-each-ref--exclude "$@" >actual.raw
+	GIT_TRACE2_PERF=1 test-tool ref-store main \
+		for-each-ref--exclude "$@" >actual.raw
 	cut -d ' ' -f 2 actual.raw
 }
 
@@ -17,6 +18,17 @@ for_each_ref () {
 	git for-each-ref --format='%(refname)' "$@"
 }
 
+assert_jumps () {
+	local nr="$1"
+	local trace="$2"
+
+	grep -q "name:jumps_made value:$nr$" $trace
+}
+
+assert_no_jumps () {
+	! assert_jumps ".*" "$1"
+}
+
 test_expect_success 'setup' '
 	test_commit --no-tag base &&
 	base="$(git rev-parse HEAD)" &&
@@ -35,67 +47,76 @@ test_expect_success 'setup' '
 '
 
 test_expect_success 'excluded region in middle' '
-	for_each_ref__exclude refs/heads refs/heads/foo >actual &&
+	for_each_ref__exclude refs/heads refs/heads/foo >actual 2>perf &&
 	for_each_ref refs/heads/bar refs/heads/baz refs/heads/quux >expect &&
 
-	test_cmp expect actual
+	test_cmp expect actual &&
+	assert_jumps 1 perf
 '
 
 test_expect_success 'excluded region at beginning' '
-	for_each_ref__exclude refs/heads refs/heads/bar >actual &&
+	for_each_ref__exclude refs/heads refs/heads/bar >actual 2>perf &&
 	for_each_ref refs/heads/baz refs/heads/foo refs/heads/quux >expect &&
 
-	test_cmp expect actual
+	test_cmp expect actual &&
+	assert_jumps 1 perf
 '
 
 test_expect_success 'excluded region at end' '
-	for_each_ref__exclude refs/heads refs/heads/quux >actual &&
+	for_each_ref__exclude refs/heads refs/heads/quux >actual 2>perf &&
 	for_each_ref refs/heads/foo refs/heads/bar refs/heads/baz >expect &&
 
-	test_cmp expect actual
+	test_cmp expect actual &&
+	assert_jumps 1 perf
 '
 
 test_expect_success 'disjoint excluded regions' '
-	for_each_ref__exclude refs/heads refs/heads/bar refs/heads/quux >actual &&
+	for_each_ref__exclude refs/heads refs/heads/bar refs/heads/quux >actual 2>perf &&
 	for_each_ref refs/heads/baz refs/heads/foo >expect &&
 
-	test_cmp expect actual
+	test_cmp expect actual &&
+	assert_jumps 2 perf
 '
 
 test_expect_success 'adjacent, non-overlapping excluded regions' '
-	for_each_ref__exclude refs/heads refs/heads/bar refs/heads/baz >actual &&
+	for_each_ref__exclude refs/heads refs/heads/bar refs/heads/baz >actual 2>perf &&
 	for_each_ref refs/heads/foo refs/heads/quux >expect &&
 
-	test_cmp expect actual
+	test_cmp expect actual &&
+	assert_jumps 1 perf
 '
 
 test_expect_success 'overlapping excluded regions' '
-	for_each_ref__exclude refs/heads refs/heads/ba refs/heads/baz >actual &&
+	for_each_ref__exclude refs/heads refs/heads/ba refs/heads/baz >actual 2>perf &&
 	for_each_ref refs/heads/foo refs/heads/quux >expect &&
 
-	test_cmp expect actual
+	test_cmp expect actual &&
+	assert_jumps 1 perf
 '
 
 test_expect_success 'several overlapping excluded regions' '
 	for_each_ref__exclude refs/heads \
-		refs/heads/bar refs/heads/baz refs/heads/foo >actual &&
+		refs/heads/bar refs/heads/baz refs/heads/foo >actual 2>perf &&
 	for_each_ref refs/heads/quux >expect &&
 
-	test_cmp expect actual
+	test_cmp expect actual &&
+	assert_jumps 1 perf
 '
 
 test_expect_success 'non-matching excluded section' '
-	for_each_ref__exclude refs/heads refs/heads/does/not/exist >actual &&
+	for_each_ref__exclude refs/heads refs/heads/does/not/exist >actual 2>perf &&
 	for_each_ref >expect &&
 
-	test_cmp expect actual
+	test_cmp expect actual &&
+	assert_no_jumps perf
 '
 
 test_expect_success 'meta-characters are discarded' '
-	for_each_ref__exclude refs/heads "refs/heads/ba*" >actual &&
+	for_each_ref__exclude refs/heads "refs/heads/ba*" >actual 2>perf &&
 	for_each_ref >expect &&
 
-	test_cmp expect actual
+	test_cmp expect actual &&
+	assert_no_jumps perf
 '
 
 test_done
diff --git a/trace2.h b/trace2.h
index 4ced30c0db..9452e291f5 100644
--- a/trace2.h
+++ b/trace2.h
@@ -551,6 +551,8 @@ enum trace2_counter_id {
 	TRACE2_COUNTER_ID_TEST1 = 0, /* emits summary event only */
 	TRACE2_COUNTER_ID_TEST2,     /* emits summary and thread events */
 
+	TRACE2_COUNTER_ID_PACKED_REFS_JUMPS, /* counts number of jumps */
+
 	/* Add additional counter definitions before here. */
 	TRACE2_NUMBER_OF_COUNTERS
 };
diff --git a/trace2/tr2_ctr.c b/trace2/tr2_ctr.c
index b342d3b1a3..50570d0165 100644
--- a/trace2/tr2_ctr.c
+++ b/trace2/tr2_ctr.c
@@ -27,6 +27,11 @@ static struct tr2_counter_metadata tr2_counter_metadata[TRACE2_NUMBER_OF_COUNTER
 		.name = "test2",
 		.want_per_thread_events = 1,
 	},
+	[TRACE2_COUNTER_ID_PACKED_REFS_JUMPS] = {
+		.category = "packed-refs",
+		.name = "jumps_made",
+		.want_per_thread_events = 0,
+	},
 
 	/* Add additional metadata before here. */
 };
-- 
2.41.0.44.gf2359540d2


* [PATCH v4 11/16] revision.h: store hidden refs in a `strvec`
  2023-06-20 14:20 ` [PATCH v4 " Taylor Blau
                     ` (9 preceding siblings ...)
  2023-06-20 14:21   ` [PATCH v4 10/16] refs/packed-backend.c: add trace2 counters for jump list Taylor Blau
@ 2023-06-20 14:21   ` Taylor Blau
  2023-07-03  5:59     ` Jeff King
  2023-06-20 14:22   ` [PATCH v4 12/16] refs/packed-backend.c: ignore complicated hidden refs rules Taylor Blau
                     ` (5 subsequent siblings)
  16 siblings, 1 reply; 149+ messages in thread
From: Taylor Blau @ 2023-06-20 14:21 UTC (permalink / raw)
  To: git
  Cc: Chris Torek, Derrick Stolee, Jeff King, Junio C Hamano,
	Patrick Steinhardt

In subsequent commits, it will be convenient to have a `const char **`
of hidden refs (matching `transfer.hiderefs`, `uploadpack.hideRefs`,
etc.), instead of a `string_list`.

Convert spots throughout the tree that store the list of hidden refs
from a `string_list` to a `strvec`.

Note that in `parse_hide_refs_config()` there is an ugly const-cast used
to avoid an extra copy of each value before trimming any trailing slash
characters. This could instead be written as:

    ref = xstrdup(value);
    len = strlen(ref);
    while (len && ref[len - 1] == '/')
            ref[--len] = '\0';
    strvec_push(hide_refs, ref);
    free(ref);

but the double-copy (once when calling `xstrdup()`, and another via
`strvec_push()`) is wasteful.
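
The cast is safe because `strvec_push()` stores its own heap-allocated
copy of `value` and returns a pointer to that copy. A minimal sketch of
the idea, using a hypothetical `struct mini_strvec` and helper
`push_hide_ref()` in place of git's `struct strvec` and the real config
parsing code:

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for git's NULL-terminated `struct strvec`. */
struct mini_strvec {
	char **v;
	size_t nr, alloc;
};

static char *copy_string(const char *s)
{
	size_t len = strlen(s) + 1;
	char *copy = malloc(len);
	if (!copy)
		abort();
	memcpy(copy, s, len);
	return copy;
}

/* Append a private copy of `value` and return a pointer to that copy. */
static const char *mini_strvec_push(struct mini_strvec *sv, const char *value)
{
	/* Keep room for the new element plus a terminating NULL. */
	if (sv->nr + 2 > sv->alloc) {
		size_t alloc = sv->alloc ? 2 * sv->alloc : 4;
		char **v = realloc(sv->v, alloc * sizeof(*v));
		if (!v)
			abort();
		sv->v = v;
		sv->alloc = alloc;
	}
	sv->v[sv->nr] = copy_string(value);
	sv->v[sv->nr + 1] = NULL;
	return sv->v[sv->nr++];
}

/* Push `value`, then strip trailing '/' from the stored copy in place. */
static void push_hide_ref(struct mini_strvec *sv, const char *value)
{
	/* drop const: the string is a heap copy owned by `sv` */
	char *ref = (char *)mini_strvec_push(sv, value);
	size_t len = strlen(ref);

	while (len && ref[len - 1] == '/')
		ref[--len] = '\0';
}

static void mini_strvec_clear(struct mini_strvec *sv)
{
	size_t i;

	for (i = 0; i < sv->nr; i++)
		free(sv->v[i]);
	free(sv->v);
	sv->v = NULL;
	sv->nr = sv->alloc = 0;
}
```

Only one allocation per value is made, and the trimmed string is the one
the vector will later free.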

Signed-off-by: Taylor Blau <me@ttaylorr.com>
---
 builtin/receive-pack.c |  4 ++--
 ls-refs.c              |  6 +++---
 refs.c                 | 11 ++++++-----
 refs.h                 |  4 ++--
 revision.c             |  2 +-
 revision.h             |  5 +++--
 upload-pack.c          | 10 +++++-----
 7 files changed, 22 insertions(+), 20 deletions(-)

diff --git a/builtin/receive-pack.c b/builtin/receive-pack.c
index 1a31a58367..1a8472eddc 100644
--- a/builtin/receive-pack.c
+++ b/builtin/receive-pack.c
@@ -90,7 +90,7 @@ static struct object_id push_cert_oid;
 static struct signature_check sigcheck;
 static const char *push_cert_nonce;
 static const char *cert_nonce_seed;
-static struct string_list hidden_refs = STRING_LIST_INIT_DUP;
+static struct strvec hidden_refs = STRVEC_INIT;
 
 static const char *NONCE_UNSOLICITED = "UNSOLICITED";
 static const char *NONCE_BAD = "BAD";
@@ -2619,7 +2619,7 @@ int cmd_receive_pack(int argc, const char **argv, const char *prefix)
 		packet_flush(1);
 	oid_array_clear(&shallow);
 	oid_array_clear(&ref);
-	string_list_clear(&hidden_refs, 0);
+	strvec_clear(&hidden_refs);
 	free((void *)push_cert_nonce);
 	return 0;
 }
diff --git a/ls-refs.c b/ls-refs.c
index 6f490b2d9c..8c3181d051 100644
--- a/ls-refs.c
+++ b/ls-refs.c
@@ -72,7 +72,7 @@ struct ls_refs_data {
 	unsigned symrefs;
 	struct strvec prefixes;
 	struct strbuf buf;
-	struct string_list hidden_refs;
+	struct strvec hidden_refs;
 	unsigned unborn : 1;
 };
 
@@ -155,7 +155,7 @@ int ls_refs(struct repository *r, struct packet_reader *request)
 	memset(&data, 0, sizeof(data));
 	strvec_init(&data.prefixes);
 	strbuf_init(&data.buf, 0);
-	string_list_init_dup(&data.hidden_refs);
+	strvec_init(&data.hidden_refs);
 
 	git_config(ls_refs_config, &data);
 
@@ -197,7 +197,7 @@ int ls_refs(struct repository *r, struct packet_reader *request)
 	packet_fflush(stdout);
 	strvec_clear(&data.prefixes);
 	strbuf_release(&data.buf);
-	string_list_clear(&data.hidden_refs, 0);
+	strvec_clear(&data.hidden_refs);
 	return 0;
 }
 
diff --git a/refs.c b/refs.c
index b4b7165fc0..1f01fdf9e8 100644
--- a/refs.c
+++ b/refs.c
@@ -1427,7 +1427,7 @@ char *shorten_unambiguous_ref(const char *refname, int strict)
 }
 
 int parse_hide_refs_config(const char *var, const char *value, const char *section,
-			   struct string_list *hide_refs)
+			   struct strvec *hide_refs)
 {
 	const char *key;
 	if (!strcmp("transfer.hiderefs", var) ||
@@ -1438,22 +1438,23 @@ int parse_hide_refs_config(const char *var, const char *value, const char *secti
 
 		if (!value)
 			return config_error_nonbool(var);
-		ref = xstrdup(value);
+
+		/* drop const to remove trailing '/' characters */
+		ref = (char *)strvec_push(hide_refs, value);
 		len = strlen(ref);
 		while (len && ref[len - 1] == '/')
 			ref[--len] = '\0';
-		string_list_append_nodup(hide_refs, ref);
 	}
 	return 0;
 }
 
 int ref_is_hidden(const char *refname, const char *refname_full,
-		  const struct string_list *hide_refs)
+		  const struct strvec *hide_refs)
 {
 	int i;
 
 	for (i = hide_refs->nr - 1; i >= 0; i--) {
-		const char *match = hide_refs->items[i].string;
+		const char *match = hide_refs->v[i];
 		const char *subject;
 		int neg = 0;
 		const char *p;
diff --git a/refs.h b/refs.h
index 3c03b035a3..f091741bfa 100644
--- a/refs.h
+++ b/refs.h
@@ -816,7 +816,7 @@ int update_ref(const char *msg, const char *refname,
 	       unsigned int flags, enum action_on_err onerr);
 
 int parse_hide_refs_config(const char *var, const char *value, const char *,
-			   struct string_list *);
+			   struct strvec *);
 
 /*
  * Check whether a ref is hidden. If no namespace is set, both the first and
@@ -826,7 +826,7 @@ int parse_hide_refs_config(const char *var, const char *value, const char *,
  * the ref is outside that namespace, the first parameter is NULL. The second
  * parameter always points to the full ref name.
  */
-int ref_is_hidden(const char *, const char *, const struct string_list *);
+int ref_is_hidden(const char *, const char *, const struct strvec *);
 
 /* Is this a per-worktree ref living in the refs/ namespace? */
 int is_per_worktree_ref(const char *refname);
diff --git a/revision.c b/revision.c
index 89953592f9..7c9367a266 100644
--- a/revision.c
+++ b/revision.c
@@ -1558,7 +1558,7 @@ void init_ref_exclusions(struct ref_exclusions *exclusions)
 void clear_ref_exclusions(struct ref_exclusions *exclusions)
 {
 	string_list_clear(&exclusions->excluded_refs, 0);
-	string_list_clear(&exclusions->hidden_refs, 0);
+	strvec_clear(&exclusions->hidden_refs);
 	exclusions->hidden_refs_configured = 0;
 }
 
diff --git a/revision.h b/revision.h
index 25776af381..7f219cde62 100644
--- a/revision.h
+++ b/revision.h
@@ -10,6 +10,7 @@
 #include "decorate.h"
 #include "ident.h"
 #include "list-objects-filter-options.h"
+#include "strvec.h"
 
 /**
  * The revision walking API offers functions to build a list of revisions
@@ -95,7 +96,7 @@ struct ref_exclusions {
 	 * Hidden refs is a list of patterns that is to be hidden via
 	 * `ref_is_hidden()`.
 	 */
-	struct string_list hidden_refs;
+	struct strvec hidden_refs;
 
 	/*
 	 * Indicates whether hidden refs have been configured. This is to
@@ -110,7 +111,7 @@ struct ref_exclusions {
  */
 #define REF_EXCLUSIONS_INIT { \
 	.excluded_refs = STRING_LIST_INIT_DUP, \
-	.hidden_refs = STRING_LIST_INIT_DUP, \
+	.hidden_refs = STRVEC_INIT, \
 }
 
 struct oidset;
diff --git a/upload-pack.c b/upload-pack.c
index d3312006a3..1a213ed775 100644
--- a/upload-pack.c
+++ b/upload-pack.c
@@ -69,7 +69,7 @@ struct upload_pack_data {
 	struct object_array have_obj;
 	struct oid_array haves;					/* v2 only */
 	struct string_list wanted_refs;				/* v2 only */
-	struct string_list hidden_refs;
+	struct strvec hidden_refs;
 
 	struct object_array shallows;
 	struct string_list deepen_not;
@@ -127,7 +127,7 @@ static void upload_pack_data_init(struct upload_pack_data *data)
 {
 	struct string_list symref = STRING_LIST_INIT_DUP;
 	struct string_list wanted_refs = STRING_LIST_INIT_DUP;
-	struct string_list hidden_refs = STRING_LIST_INIT_DUP;
+	struct strvec hidden_refs = STRVEC_INIT;
 	struct object_array want_obj = OBJECT_ARRAY_INIT;
 	struct object_array have_obj = OBJECT_ARRAY_INIT;
 	struct oid_array haves = OID_ARRAY_INIT;
@@ -162,7 +162,7 @@ static void upload_pack_data_clear(struct upload_pack_data *data)
 {
 	string_list_clear(&data->symref, 1);
 	string_list_clear(&data->wanted_refs, 1);
-	string_list_clear(&data->hidden_refs, 0);
+	strvec_clear(&data->hidden_refs);
 	object_array_clear(&data->want_obj);
 	object_array_clear(&data->have_obj);
 	oid_array_clear(&data->haves);
@@ -1170,7 +1170,7 @@ static void receive_needs(struct upload_pack_data *data,
 
 /* return non-zero if the ref is hidden, otherwise 0 */
 static int mark_our_ref(const char *refname, const char *refname_full,
-			const struct object_id *oid, const struct string_list *hidden_refs)
+			const struct object_id *oid, const struct strvec *hidden_refs)
 {
 	struct object *o = lookup_unknown_object(the_repository, oid);
 
@@ -1465,7 +1465,7 @@ static int parse_want(struct packet_writer *writer, const char *line,
 
 static int parse_want_ref(struct packet_writer *writer, const char *line,
 			  struct string_list *wanted_refs,
-			  struct string_list *hidden_refs,
+			  struct strvec *hidden_refs,
 			  struct object_array *want_obj)
 {
 	const char *refname_nons;
-- 
2.41.0.44.gf2359540d2


* [PATCH v4 12/16] refs/packed-backend.c: ignore complicated hidden refs rules
  2023-06-20 14:20 ` [PATCH v4 " Taylor Blau
                     ` (10 preceding siblings ...)
  2023-06-20 14:21   ` [PATCH v4 11/16] revision.h: store hidden refs in a `strvec` Taylor Blau
@ 2023-06-20 14:22   ` Taylor Blau
  2023-07-03  6:18     ` Jeff King
  2023-06-20 14:22   ` [PATCH v4 13/16] refs.h: let `for_each_namespaced_ref()` take excluded patterns Taylor Blau
                     ` (4 subsequent siblings)
  16 siblings, 1 reply; 149+ messages in thread
From: Taylor Blau @ 2023-06-20 14:22 UTC (permalink / raw)
  To: git
  Cc: Chris Torek, Derrick Stolee, Jeff King, Junio C Hamano,
	Patrick Steinhardt

In subsequent commits, we'll teach `receive-pack` and `upload-pack` to
use the new jump list feature in the packed-refs iterator by ignoring
references which are mentioned in their respective hideRefs lists.

However, the packed-ref jump lists cannot handle un-hiding rules (that
begin with '!'), or namespace comparisons (that begin with '^'). Detect
and avoid these cases by falling back to the normal enumeration without
a jump list when such patterns exist.
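
As a rough sketch, the decision made by `populate_excluded_jump_list()`
looks like the following. The helper `select_jump_prefixes()` is
hypothetical, and the glob meta-character set `*`, `?`, `[` and `\` is
an assumption meant to approximate git's `is_glob_special()`. Any '!' or
'^' rule disables the jump list entirely, while a glob pattern only
drops that one pattern:

```c
#include <stddef.h>
#include <string.h>

/*
 * Copy the patterns usable as plain prefixes into `out`. Returns their
 * count, or -1 if the caller must fall back to enumerating without a
 * jump list.
 */
static int select_jump_prefixes(const char **patterns, const char **out)
{
	const char **p;
	int n = 0;

	if (!patterns)
		return 0;
	/* Pass 1: un-hiding ('!') or namespace ('^') rules disable it all. */
	for (p = patterns; *p; p++)
		if (**p == '!' || **p == '^')
			return -1;
	/* Pass 2: keep only patterns without (assumed) glob meta-characters. */
	for (p = patterns; *p; p++) {
		if (strpbrk(*p, "*?[\\"))
			continue; /* cannot be expressed as a prefix */
		out[n++] = *p;
	}
	return n;
}
```

Checking every pattern for '!' and '^' before building any region
matters because a later rule can un-hide part of an earlier one.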

Signed-off-by: Taylor Blau <me@ttaylorr.com>
---
 refs/packed-backend.c   | 19 +++++++++++++++++++
 t/t1419-exclude-refs.sh |  9 +++++++++
 2 files changed, 28 insertions(+)

diff --git a/refs/packed-backend.c b/refs/packed-backend.c
index 80b877e00c..2aeec5c601 100644
--- a/refs/packed-backend.c
+++ b/refs/packed-backend.c
@@ -1008,6 +1008,25 @@ static void populate_excluded_jump_list(struct packed_ref_iterator *iter,
 	if (!excluded_patterns)
 		return;
 
+	for (pattern = excluded_patterns; *pattern; pattern++) {
+		/*
+		 * We also can't feed any excludes from hidden refs
+		 * config sections, since later rules may override
+		 * previous ones. For example, with rules "refs/foo" and
+		 * "!refs/foo/bar", we should show "refs/foo/bar" (and
+		 * everything underneath it), but the earlier exclusion
+		 * would cause us to skip all of "refs/foo". We likewise
+		 * don't implement the namespace stripping required for
+		 * '^' rules.
+		 *
+		 * Both are possible to do, but complicated, so avoid
+		 * populating the jump list at all if we see either of
+		 * these patterns.
+		 */
+		if (**pattern == '!' || **pattern == '^')
+			return;
+	}
+
 	for (pattern = excluded_patterns; *pattern; pattern++) {
 		struct jump_list_entry *e;
 
diff --git a/t/t1419-exclude-refs.sh b/t/t1419-exclude-refs.sh
index 5d8c86b657..f8abf75ab8 100755
--- a/t/t1419-exclude-refs.sh
+++ b/t/t1419-exclude-refs.sh
@@ -119,4 +119,13 @@ test_expect_success 'meta-characters are discarded' '
 	assert_no_jumps perf
 '
 
+test_expect_success 'complex hidden ref rules are discarded' '
+	for_each_ref__exclude refs/heads refs/heads/foo "!refs/heads/foo/1" \
+		>actual 2>perf &&
+	for_each_ref >expect &&
+
+	test_cmp expect actual &&
+	assert_no_jumps perf
+'
+
 test_done
-- 
2.41.0.44.gf2359540d2


* [PATCH v4 13/16] refs.h: let `for_each_namespaced_ref()` take excluded patterns
  2023-06-20 14:20 ` [PATCH v4 " Taylor Blau
                     ` (11 preceding siblings ...)
  2023-06-20 14:22   ` [PATCH v4 12/16] refs/packed-backend.c: ignore complicated hidden refs rules Taylor Blau
@ 2023-06-20 14:22   ` Taylor Blau
  2023-06-20 14:22   ` [PATCH v4 14/16] builtin/receive-pack.c: avoid enumerating hidden references Taylor Blau
                     ` (3 subsequent siblings)
  16 siblings, 0 replies; 149+ messages in thread
From: Taylor Blau @ 2023-06-20 14:22 UTC (permalink / raw)
  To: git
  Cc: Chris Torek, Derrick Stolee, Jeff King, Junio C Hamano,
	Patrick Steinhardt

A future commit will want to call `for_each_namespaced_ref()` with
a list of excluded patterns.

We could introduce a variant of that function, say,
`for_each_namespaced_ref_exclude()`, which takes the extra parameter,
and reimplement the original function in terms of that. But since every
caller except one (in `http-backend.c`) will supply the new parameter,
add it to `for_each_namespaced_ref()` itself instead of introducing a
new function.

For now, supply NULL for the list of excluded patterns at all callers to
avoid changing behavior, which we will do in a future change.

Signed-off-by: Taylor Blau <me@ttaylorr.com>
---
 http-backend.c | 2 +-
 refs.c         | 5 +++--
 refs.h         | 3 ++-
 upload-pack.c  | 6 +++---
 4 files changed, 9 insertions(+), 7 deletions(-)

diff --git a/http-backend.c b/http-backend.c
index ac146d85c5..ad500683c8 100644
--- a/http-backend.c
+++ b/http-backend.c
@@ -559,7 +559,7 @@ static void get_info_refs(struct strbuf *hdr, char *arg UNUSED)
 
 	} else {
 		select_getanyfile(hdr);
-		for_each_namespaced_ref(show_text_ref, &buf);
+		for_each_namespaced_ref(NULL, show_text_ref, &buf);
 		send_strbuf(hdr, "text/plain", &buf);
 	}
 	strbuf_release(&buf);
diff --git a/refs.c b/refs.c
index 1f01fdf9e8..8613184703 100644
--- a/refs.c
+++ b/refs.c
@@ -1660,13 +1660,14 @@ int for_each_replace_ref(struct repository *r, each_repo_ref_fn fn, void *cb_dat
 				    DO_FOR_EACH_INCLUDE_BROKEN, cb_data);
 }
 
-int for_each_namespaced_ref(each_ref_fn fn, void *cb_data)
+int for_each_namespaced_ref(const char **exclude_patterns,
+			    each_ref_fn fn, void *cb_data)
 {
 	struct strbuf buf = STRBUF_INIT;
 	int ret;
 	strbuf_addf(&buf, "%srefs/", get_git_namespace());
 	ret = do_for_each_ref(get_main_ref_store(the_repository),
-			      buf.buf, NULL, fn, 0, 0, cb_data);
+			      buf.buf, exclude_patterns, fn, 0, 0, cb_data);
 	strbuf_release(&buf);
 	return ret;
 }
diff --git a/refs.h b/refs.h
index f091741bfa..27d341d282 100644
--- a/refs.h
+++ b/refs.h
@@ -378,7 +378,8 @@ int for_each_glob_ref_in(each_ref_fn fn, const char *pattern,
 			 const char *prefix, void *cb_data);
 
 int head_ref_namespaced(each_ref_fn fn, void *cb_data);
-int for_each_namespaced_ref(each_ref_fn fn, void *cb_data);
+int for_each_namespaced_ref(const char **exclude_patterns,
+			    each_ref_fn fn, void *cb_data);
 
 /* can be used to learn about broken ref and symref */
 int refs_for_each_rawref(struct ref_store *refs, each_ref_fn fn, void *cb_data);
diff --git a/upload-pack.c b/upload-pack.c
index 1a213ed775..99d216938c 100644
--- a/upload-pack.c
+++ b/upload-pack.c
@@ -855,7 +855,7 @@ static void deepen(struct upload_pack_data *data, int depth)
 		 * marked with OUR_REF.
 		 */
 		head_ref_namespaced(check_ref, data);
-		for_each_namespaced_ref(check_ref, data);
+		for_each_namespaced_ref(NULL, check_ref, data);
 
 		get_reachable_list(data, &reachable_shallows);
 		result = get_shallow_commits(&reachable_shallows,
@@ -1386,7 +1386,7 @@ void upload_pack(const int advertise_refs, const int stateless_rpc,
 		if (advertise_refs)
 			data.no_done = 1;
 		head_ref_namespaced(send_ref, &data);
-		for_each_namespaced_ref(send_ref, &data);
+		for_each_namespaced_ref(NULL, send_ref, &data);
 		if (!data.sent_capabilities) {
 			const char *refname = "capabilities^{}";
 			write_v0_ref(&data, refname, refname, null_oid());
@@ -1400,7 +1400,7 @@ void upload_pack(const int advertise_refs, const int stateless_rpc,
 		packet_flush(1);
 	} else {
 		head_ref_namespaced(check_ref, &data);
-		for_each_namespaced_ref(check_ref, &data);
+		for_each_namespaced_ref(NULL, check_ref, &data);
 	}
 
 	if (!advertise_refs) {
-- 
2.41.0.44.gf2359540d2


* [PATCH v4 14/16] builtin/receive-pack.c: avoid enumerating hidden references
  2023-06-20 14:20 ` [PATCH v4 " Taylor Blau
                     ` (12 preceding siblings ...)
  2023-06-20 14:22   ` [PATCH v4 13/16] refs.h: let `for_each_namespaced_ref()` take excluded patterns Taylor Blau
@ 2023-06-20 14:22   ` Taylor Blau
  2023-06-20 14:22   ` [PATCH v4 15/16] upload-pack.c: avoid enumerating hidden refs where possible Taylor Blau
                     ` (2 subsequent siblings)
  16 siblings, 0 replies; 149+ messages in thread
From: Taylor Blau @ 2023-06-20 14:22 UTC (permalink / raw)
  To: git
  Cc: Chris Torek, Derrick Stolee, Jeff King, Junio C Hamano,
	Patrick Steinhardt

Now that `refs_for_each_fullref_in()` has the ability to avoid
enumerating references matching certain pattern(s), use that to avoid
visiting hidden refs when constructing the ref advertisement via
receive-pack.

Note that since this exclusion is best-effort, we still need
`show_ref_cb()` to check whether each reference is hidden before
including it in the advertisement.

As was the case when applying this same optimization to `upload-pack`,
`receive-pack`'s reference advertisement phase can proceed much more
quickly by skipping references that will not be part of the
advertisement.

(Below, we're still using linux.git with one hidden refs/pull/N ref per
commit):

    $ hyperfine -L v ,.compile 'git{v} -c transfer.hideRefs=refs/pull receive-pack --advertise-refs .git'
    Benchmark 1: git -c transfer.hideRefs=refs/pull receive-pack --advertise-refs .git
      Time (mean ± σ):      89.1 ms ±   1.7 ms    [User: 82.0 ms, System: 7.0 ms]
      Range (min … max):    87.7 ms …  95.5 ms    31 runs

    Benchmark 2: git.compile -c transfer.hideRefs=refs/pull receive-pack --advertise-refs .git
      Time (mean ± σ):       4.5 ms ±   0.2 ms    [User: 0.5 ms, System: 3.9 ms]
      Range (min … max):     4.1 ms …   5.6 ms    508 runs

    Summary
      'git.compile -c transfer.hideRefs=refs/pull receive-pack --advertise-refs .git' ran
       20.00 ± 1.05 times faster than 'git -c transfer.hideRefs=refs/pull receive-pack --advertise-refs .git'

Signed-off-by: Taylor Blau <me@ttaylorr.com>
---
 builtin/receive-pack.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/builtin/receive-pack.c b/builtin/receive-pack.c
index 1a8472eddc..bd5bcc375f 100644
--- a/builtin/receive-pack.c
+++ b/builtin/receive-pack.c
@@ -337,7 +337,8 @@ static void write_head_info(void)
 {
 	static struct oidset seen = OIDSET_INIT;
 
-	for_each_ref(show_ref_cb, &seen);
+	refs_for_each_fullref_in(get_main_ref_store(the_repository), "",
+				 hidden_refs.v, show_ref_cb, &seen);
 	for_each_alternate_ref(show_one_alternate_ref, &seen);
 	oidset_clear(&seen);
 	if (!sent_capabilities)
-- 
2.41.0.44.gf2359540d2


* [PATCH v4 15/16] upload-pack.c: avoid enumerating hidden refs where possible
  2023-06-20 14:20 ` [PATCH v4 " Taylor Blau
                     ` (13 preceding siblings ...)
  2023-06-20 14:22   ` [PATCH v4 14/16] builtin/receive-pack.c: avoid enumerating hidden references Taylor Blau
@ 2023-06-20 14:22   ` Taylor Blau
  2023-07-03  6:26     ` Jeff King
  2023-06-20 14:22   ` [PATCH v4 16/16] ls-refs.c: " Taylor Blau
  2023-07-03  6:29   ` [PATCH v4 00/16] refs: implement jump lists for packed backend Jeff King
  16 siblings, 1 reply; 149+ messages in thread
From: Taylor Blau @ 2023-06-20 14:22 UTC (permalink / raw)
  To: git
  Cc: Chris Torek, Derrick Stolee, Jeff King, Junio C Hamano,
	Patrick Steinhardt

In a similar fashion to a previous commit, teach `upload-pack` to avoid
enumerating hidden references where possible.

Note, however, that there are certain cases where we cannot avoid
enumerating even hidden references, in particular when either of:

  - `uploadpack.allowTipSHA1InWant`, or
  - `uploadpack.allowReachableSHA1InWant`

are set, corresponding to `ALLOW_TIP_SHA1` and `ALLOW_REACHABLE_SHA1`,
respectively.

When either of these bits is set, upload-pack's `is_our_ref()` function
needs to consider the `HIDDEN_REF` bit of the referent's object flags.
So we must visit all references, including the hidden ones, in order to
mark their referents with the `HIDDEN_REF` bit.

When neither `ALLOW_TIP_SHA1` nor `ALLOW_REACHABLE_SHA1` is set, the
`is_our_ref()` function considers only the `OUR_REF` bit, and not the
`HIDDEN_REF` one. `OUR_REF` is applied via `mark_our_ref()`, and only
to objects at the tips of non-hidden references, so we do not need to
visit hidden references in this case.
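A toy C model of the flag check above may make the two cases easier to compare. The bit values here are assumptions chosen for illustration, not the constants defined in upload-pack.c, and object flags are passed as a plain `unsigned`; only the shape of `allow_hidden_refs()` and `is_our_ref()` follows the patch.

```c
#include <stdbool.h>

/* Assumed bit values for illustration; upload-pack.c defines its own. */
enum allow_uor {
	ALLOW_TIP_SHA1       = 1 << 0,
	ALLOW_REACHABLE_SHA1 = 1 << 1
};

/* Assumed object-flag bits. */
#define OUR_REF    (1u << 0)
#define HIDDEN_REF (1u << 1)

/* True when clients may ask for objects at (or reachable from) hidden tips. */
static bool allow_hidden_refs(enum allow_uor allow_uor)
{
	return allow_uor & (ALLOW_TIP_SHA1 | ALLOW_REACHABLE_SHA1);
}

/*
 * OUR_REF always counts; HIDDEN_REF counts only when hidden tips are
 * allowed. Since HIDDEN_REF is set by visiting hidden refs, those refs
 * must be enumerated exactly when allow_hidden_refs() is true.
 */
static bool is_our_ref(unsigned flags, enum allow_uor allow_uor)
{
	unsigned mask = OUR_REF;

	if (allow_hidden_refs(allow_uor))
		mask |= HIDDEN_REF;
	return flags & mask;
}
```

With neither allow bit set, `HIDDEN_REF` never affects the result, so nothing is lost by not visiting hidden refs at all.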

When neither of those bits is set, `upload-pack` can potentially avoid
enumerating a large number of references. In the same example as a
previous commit (linux.git with one hidden reference per commit,
"refs/pull/N"):

    $ printf 0000 >in
    $ hyperfine --warmup=1 \
      'git -c transfer.hideRefs=refs/pull upload-pack . <in' \
      'git.compile -c transfer.hideRefs=refs/pull -c uploadpack.allowTipSHA1InWant upload-pack . <in' \
      'git.compile -c transfer.hideRefs=refs/pull upload-pack . <in'
    Benchmark 1: git -c transfer.hideRefs=refs/pull upload-pack . <in
      Time (mean ± σ):     406.9 ms ±   1.1 ms    [User: 357.3 ms, System: 49.5 ms]
      Range (min … max):   405.7 ms … 409.2 ms    10 runs

    Benchmark 2: git.compile -c transfer.hideRefs=refs/pull -c uploadpack.allowTipSHA1InWant upload-pack . <in
      Time (mean ± σ):     406.5 ms ±   1.3 ms    [User: 356.5 ms, System: 49.9 ms]
      Range (min … max):   404.6 ms … 408.8 ms    10 runs

    Benchmark 3: git.compile -c transfer.hideRefs=refs/pull upload-pack . <in
      Time (mean ± σ):       4.7 ms ±   0.2 ms    [User: 0.7 ms, System: 3.9 ms]
      Range (min … max):     4.3 ms …   6.1 ms    472 runs

    Summary
      'git.compile -c transfer.hideRefs=refs/pull upload-pack . <in' ran
       86.62 ± 4.33 times faster than 'git.compile -c transfer.hideRefs=refs/pull -c uploadpack.allowTipSHA1InWant upload-pack . <in'
       86.70 ± 4.33 times faster than 'git -c transfer.hideRefs=refs/pull upload-pack . <in'

As above, we must visit every reference when
uploadPack.allowTipSHA1InWant is set. But when it is unset, we can visit
far fewer references.

Signed-off-by: Taylor Blau <me@ttaylorr.com>
---
 upload-pack.c | 33 +++++++++++++++++++++++++++------
 1 file changed, 27 insertions(+), 6 deletions(-)

diff --git a/upload-pack.c b/upload-pack.c
index 99d216938c..366a101d8d 100644
--- a/upload-pack.c
+++ b/upload-pack.c
@@ -602,11 +602,32 @@ static int get_common_commits(struct upload_pack_data *data,
 	}
 }
 
+static int allow_hidden_refs(enum allow_uor allow_uor)
+{
+	return allow_uor & (ALLOW_TIP_SHA1 | ALLOW_REACHABLE_SHA1);
+}
+
+static void for_each_namespaced_ref_1(each_ref_fn fn,
+				      struct upload_pack_data *data)
+{
+	/*
+	 * If `data->allow_uor` allows fetching hidden refs, we need to
+	 * mark all references (including hidden ones), to check in
+	 * `is_our_ref()` below.
+	 *
+	 * Otherwise, we only care about whether each reference's object
+	 * has the OUR_REF bit set or not, so do not need to visit
+	 * hidden references.
+	 */
+	if (allow_hidden_refs(data->allow_uor))
+		for_each_namespaced_ref(NULL, fn, data);
+	else
+		for_each_namespaced_ref(data->hidden_refs.v, fn, data);
+}
+
 static int is_our_ref(struct object *o, enum allow_uor allow_uor)
 {
-	int allow_hidden_ref = (allow_uor &
-				(ALLOW_TIP_SHA1 | ALLOW_REACHABLE_SHA1));
-	return o->flags & ((allow_hidden_ref ? HIDDEN_REF : 0) | OUR_REF);
+	return o->flags & ((allow_hidden_refs(allow_uor) ? HIDDEN_REF : 0) | OUR_REF);
 }
 
 /*
@@ -855,7 +876,7 @@ static void deepen(struct upload_pack_data *data, int depth)
 		 * marked with OUR_REF.
 		 */
 		head_ref_namespaced(check_ref, data);
-		for_each_namespaced_ref(NULL, check_ref, data);
+		for_each_namespaced_ref_1(check_ref, data);
 
 		get_reachable_list(data, &reachable_shallows);
 		result = get_shallow_commits(&reachable_shallows,
@@ -1386,7 +1407,7 @@ void upload_pack(const int advertise_refs, const int stateless_rpc,
 		if (advertise_refs)
 			data.no_done = 1;
 		head_ref_namespaced(send_ref, &data);
-		for_each_namespaced_ref(NULL, send_ref, &data);
+		for_each_namespaced_ref_1(send_ref, &data);
 		if (!data.sent_capabilities) {
 			const char *refname = "capabilities^{}";
 			write_v0_ref(&data, refname, refname, null_oid());
@@ -1400,7 +1421,7 @@ void upload_pack(const int advertise_refs, const int stateless_rpc,
 		packet_flush(1);
 	} else {
 		head_ref_namespaced(check_ref, &data);
-		for_each_namespaced_ref(NULL, check_ref, &data);
+		for_each_namespaced_ref_1(check_ref, &data);
 	}
 
 	if (!advertise_refs) {
-- 
2.41.0.44.gf2359540d2


* [PATCH v4 16/16] ls-refs.c: avoid enumerating hidden refs where possible
  2023-06-20 14:20 ` [PATCH v4 " Taylor Blau
                     ` (14 preceding siblings ...)
  2023-06-20 14:22   ` [PATCH v4 15/16] upload-pack.c: avoid enumerating hidden refs where possible Taylor Blau
@ 2023-06-20 14:22   ` Taylor Blau
  2023-07-03  6:27     ` Jeff King
  2023-07-03  6:29   ` [PATCH v4 00/16] refs: implement jump lists for packed backend Jeff King
  16 siblings, 1 reply; 149+ messages in thread
From: Taylor Blau @ 2023-06-20 14:22 UTC (permalink / raw)
  To: git
  Cc: Chris Torek, Derrick Stolee, Jeff King, Junio C Hamano,
	Patrick Steinhardt

In a similar fashion to previous commits, teach `ls-refs` to avoid
enumerating hidden references where possible.

As before, this is linux.git with one hidden reference per commit.

    $ hyperfine -L v ,.compile 'git{v} -c protocol.version=2 ls-remote .'
    Benchmark 1: git -c protocol.version=2 ls-remote .
      Time (mean ± σ):      89.8 ms ±   0.6 ms    [User: 84.3 ms, System: 5.7 ms]
      Range (min … max):    88.8 ms …  91.3 ms    32 runs

    Benchmark 2: git.compile -c protocol.version=2 ls-remote .
      Time (mean ± σ):       6.5 ms ±   0.1 ms    [User: 2.4 ms, System: 4.3 ms]
      Range (min … max):     6.2 ms …   8.3 ms    397 runs

    Summary
      'git.compile -c protocol.version=2 ls-remote .' ran
       13.85 ± 0.33 times faster than 'git -c protocol.version=2 ls-remote .'

Signed-off-by: Taylor Blau <me@ttaylorr.com>
---
 ls-refs.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/ls-refs.c b/ls-refs.c
index 8c3181d051..c9a723ba89 100644
--- a/ls-refs.c
+++ b/ls-refs.c
@@ -193,7 +193,7 @@ int ls_refs(struct repository *r, struct packet_reader *request)
 		strvec_push(&data.prefixes, "");
 	refs_for_each_fullref_in_prefixes(get_main_ref_store(r),
 					  get_git_namespace(), data.prefixes.v,
-					  NULL, send_ref, &data);
+					  data.hidden_refs.v, send_ref, &data);
 	packet_fflush(stdout);
 	strvec_clear(&data.prefixes);
 	strbuf_release(&data.buf);
-- 
2.41.0.44.gf2359540d2

* Re: [PATCH v3 09/16] refs/packed-backend.c: implement jump lists to avoid excluded pattern(s)
  2023-06-20 12:05       ` Taylor Blau
@ 2023-06-20 18:49         ` Junio C Hamano
  0 siblings, 0 replies; 149+ messages in thread
From: Junio C Hamano @ 2023-06-20 18:49 UTC (permalink / raw)
  To: Taylor Blau
  Cc: git, Chris Torek, Derrick Stolee, Jeff King, Patrick Steinhardt

Taylor Blau <me@ttaylorr.com> writes:

>> > +static const char *ptr_max(const char *x, const char *y)
>> > +{
>> > +	if (x > y)
>> > +		return x;
>> > +	return y;
>> > +}
>>
>> Hopefully the compiler would inline the function without being told.
>>
>> These pointers point into the same mmapped region of contiguous
>> memory that holds the contents of the packed-refs file, so
>> comparison between them is always defined.  Good.
>>
>> I wondered if
>>
>> 	return (x > y) ? x : y;
>>
>> is easier to read, simply because it treats both cases more equally
>> (in other words, as written, (x>y) appears more "special"), but that
>> is minor.
>
> Yeah, I think that any reasonable compiler would almost certainly inline
> this, especially at higher optimization levels. But I agree with your
> suggestion nonetheless, thanks.

Having seen how this is used (only at a single callsite), I actually
think that special casing (x>y) is the right thing to do, especially
if you inline it in the caller.  That is,

	if (last_disjoint->end < ours->end)
		last_disjoint->end = ours->end;

reads much more naturally than

	last_disjoint->end = (last_disjoint->end > ours->end)
		? last_disjoint->end : ours->end;

as a way to say "if ours is larger, record it as the largest
position we have seen so far".

* Re: [PATCH v4 01/16] refs.c: rename `ref_filter`
  2023-06-20 14:21   ` [PATCH v4 01/16] refs.c: rename `ref_filter` Taylor Blau
@ 2023-07-03  5:13     ` Jeff King
  0 siblings, 0 replies; 149+ messages in thread
From: Jeff King @ 2023-07-03  5:13 UTC (permalink / raw)
  To: Taylor Blau
  Cc: git, Chris Torek, Derrick Stolee, Junio C Hamano,
	Patrick Steinhardt

On Tue, Jun 20, 2023 at 10:21:00AM -0400, Taylor Blau wrote:

> From: Jeff King <peff@peff.net>
> 
> The refs machinery has its own implementation of a `ref_filter` (used by
> `for-each-ref`), which is distinct from the `ref-filler.h` API (also
> used by `for-each-ref`, among other things).

Small typo here: s/filler/filter/

-Peff

* Re: [PATCH v4 02/16] ref-filter.h: provide `REF_FILTER_INIT`
  2023-06-20 14:21   ` [PATCH v4 02/16] ref-filter.h: provide `REF_FILTER_INIT` Taylor Blau
@ 2023-07-03  5:15     ` Jeff King
  2023-07-03 17:07       ` Taylor Blau
  0 siblings, 1 reply; 149+ messages in thread
From: Jeff King @ 2023-07-03  5:15 UTC (permalink / raw)
  To: Taylor Blau
  Cc: git, Chris Torek, Derrick Stolee, Junio C Hamano,
	Patrick Steinhardt

On Tue, Jun 20, 2023 at 10:21:06AM -0400, Taylor Blau wrote:

> From: Jeff King <peff@peff.net>
> 
> Provide a sane initialization value for `struct ref_filter`, which in a
> subsequent patch will be used to initialize a new field.
> 
> In the meantime, fix a case in test-reach.c where its `ref_filter` is
> not even zero-initialized.

This test-reach case scared me, but it happens to work now because
commit_contains() only looks at the one field that we set. So we're not
fixing a bug, but more like a bug waiting to happen. :)

-Peff

* Re: [PATCH v4 03/16] ref-filter: clear reachable list pointers after freeing
  2023-06-20 14:21   ` [PATCH v4 03/16] ref-filter: clear reachable list pointers after freeing Taylor Blau
@ 2023-07-03  5:16     ` Jeff King
  0 siblings, 0 replies; 149+ messages in thread
From: Jeff King @ 2023-07-03  5:16 UTC (permalink / raw)
  To: Taylor Blau
  Cc: git, Chris Torek, Derrick Stolee, Junio C Hamano,
	Patrick Steinhardt

On Tue, Jun 20, 2023 at 10:21:11AM -0400, Taylor Blau wrote:

> From: Jeff King <peff@peff.net>
> 
> In reach_filter(), we pop all commits from the reachable lists, leaving
> them empty. But because we're operating on a list pointer that was
> passed by value, the original filter.reachable_from pointer is left
> dangling.

Yep. This isn't a bug (yet) because nobody looks at the now-dangling
pointer. So as with the last patch, we're future-proofing ourselves
against dangerous situations.
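The hazard can be shown with a toy linked list. `struct node`, `push()`, and the two `pop_all_*()` helpers are hypothetical stand-ins for git's `commit_list` handling, not its actual code: popping through a copy of the head pointer frees every node but leaves the caller's pointer dangling, whereas updating the caller's pointer (or clearing it afterwards, as the patch title suggests) leaves it NULL.

```c
#include <stdlib.h>

/* A minimal singly linked list, standing in for git's commit_list. */
struct node {
	int value;
	struct node *next;
};

static struct node *push(struct node *head, int value)
{
	struct node *n = malloc(sizeof(*n));

	if (!n)
		abort();
	n->value = value;
	n->next = head;
	return n;
}

/*
 * Pops and frees every node, but only updates its local copy of the
 * head pointer: the caller's pointer still refers to freed memory.
 */
static int pop_all_by_value(struct node *head)
{
	int popped = 0;

	while (head) {
		struct node *next = head->next;
		free(head);
		head = next;
		popped++;
	}
	return popped;
}

/* Same loop, but through the caller's head pointer, which ends up NULL. */
static int pop_all_by_ref(struct node **head)
{
	int popped = 0;

	while (*head) {
		struct node *next = (*head)->next;
		free(*head);
		*head = next;
		popped++;
	}
	return popped;
}
```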

-Peff

* Re: [PATCH v4 04/16] ref-filter: add `ref_filter_clear()`
  2023-06-20 14:21   ` [PATCH v4 04/16] ref-filter: add `ref_filter_clear()` Taylor Blau
@ 2023-07-03  5:19     ` Jeff King
  2023-07-03 17:13       ` Taylor Blau
  0 siblings, 1 reply; 149+ messages in thread
From: Jeff King @ 2023-07-03  5:19 UTC (permalink / raw)
  To: Taylor Blau
  Cc: git, Chris Torek, Derrick Stolee, Junio C Hamano,
	Patrick Steinhardt

On Tue, Jun 20, 2023 at 10:21:16AM -0400, Taylor Blau wrote:

> From: Jeff King <peff@peff.net>
> 
> We did not bother to clean up at all in `git branch` or `git tag`, and
> `git for-each-ref` only cleans up a couple of members.
> 
> Add and call `ref_filter_clear()` when cleaning up a `struct
> ref_filter`. Running this patch (without any test changes) indicates a
> couple of now leak-free tests. This was found by running:
> 
>     $ make SANITIZE=leak
>     $ make -C t GIT_TEST_PASSING_SANITIZE_LEAK=check GIT_TEST_OPTS=--immediate
> 
> (Note that the `reachable_from` and `unreachable_from` lists should be
> cleaned as they are used. So this is just covering any case where we
> might bail before running the reachability check.)

And this is the one that benefits from the earlier future-proofing. :)

(In case anyone is wondering why I am reviewing my own commits, it's
because Taylor and I worked on this topic together off-list, but he
wrote the commit messages after I dumped a bunch of cleanups on him).

> +void ref_filter_init(struct ref_filter *filter)
> +{
> +	struct ref_filter blank = REF_FILTER_INIT;
> +	memcpy(filter, &blank, sizeof(blank));
> +}

I was a little surprised by adding init() here, but we need it at the
end of clear(). So this is an OK place for it (the other option would be
in the earlier INIT patch, but it would be unused until now).
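The INIT-macro-plus-init() idiom discussed here can be sketched as follows. `struct toy_filter`, `TOY_FILTER_INIT`, and their fields are hypothetical; the real `struct ref_filter` has many more members. The clear() function frees what it owns and then re-runs init(), which is why init() has to exist by this point in the series.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical, much-reduced stand-in for struct ref_filter. */
struct toy_filter {
	const char **name_patterns;
	char *owned_buf;   /* something clear() must free */
	int ignore_case;
	int kind;          /* a field whose default is non-zero */
};

#define TOY_FILTER_INIT { .kind = -1 }

static void toy_filter_init(struct toy_filter *filter)
{
	/* Copy a "blank" built from the macro so INIT and init() never disagree. */
	struct toy_filter blank = TOY_FILTER_INIT;
	memcpy(filter, &blank, sizeof(blank));
}

static void toy_filter_clear(struct toy_filter *filter)
{
	free(filter->owned_buf);
	/* Leave the struct reusable instead of full of stale pointers. */
	toy_filter_init(filter);
}
```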

-Peff

* Re: [PATCH v4 05/16] ref-filter.c: parameterize match functions over patterns
  2023-06-20 14:21   ` [PATCH v4 05/16] ref-filter.c: parameterize match functions over patterns Taylor Blau
@ 2023-07-03  5:27     ` Jeff King
  2023-07-03 17:18       ` Taylor Blau
  0 siblings, 1 reply; 149+ messages in thread
From: Jeff King @ 2023-07-03  5:27 UTC (permalink / raw)
  To: Taylor Blau
  Cc: git, Chris Torek, Derrick Stolee, Junio C Hamano,
	Patrick Steinhardt

On Tue, Jun 20, 2023 at 10:21:21AM -0400, Taylor Blau wrote:

> Once we start passing either in, `match_pattern()` will have little to
> do with a particular `struct ref_filter *` instance. To clarify this,
> drop it from the argument list, and replace it with the only bit of the
> `ref_filter` that we care about (`filter->ignore_case`).

Makes sense, but...

> @@ -2134,9 +2134,10 @@ static int match_pattern(const struct ref_filter *filter, const char *refname)
>   * matches a pattern "refs/heads/" but not "refs/heads/m") or a
>   * wildcard (e.g. the same ref matches "refs/heads/m*", too).
>   */
> -static int match_name_as_path(const struct ref_filter *filter, const char *refname)
> +static int match_name_as_path(const struct ref_filter *filter,
> +			      const char **pattern,
> +			      const char *refname)

...wouldn't we then want to do the same for match_name_as_path()?

I.e., this:

diff --git a/ref-filter.c b/ref-filter.c
index 6aacb99be7..cf10c753e2 100644
--- a/ref-filter.c
+++ b/ref-filter.c
@@ -2134,14 +2134,13 @@ static int match_pattern(const char **patterns, const char *refname,
  * matches a pattern "refs/heads/" but not "refs/heads/m") or a
  * wildcard (e.g. the same ref matches "refs/heads/m*", too).
  */
-static int match_name_as_path(const struct ref_filter *filter,
-			      const char **pattern,
-			      const char *refname)
+static int match_name_as_path(const char **pattern, const char *refname,
+			      int ignore_case)
 {
 	int namelen = strlen(refname);
 	unsigned flags = WM_PATHNAME;
 
-	if (filter->ignore_case)
+	if (ignore_case)
 		flags |= WM_CASEFOLD;
 
 	for (; *pattern; pattern++) {
@@ -2166,7 +2165,8 @@ static int filter_pattern_match(struct ref_filter *filter, const char *refname)
 	if (!*filter->name_patterns)
 		return 1; /* No pattern always matches */
 	if (filter->match_as_path)
-		return match_name_as_path(filter, filter->name_patterns, refname);
+		return match_name_as_path(filter->name_patterns, refname,
+					  filter->ignore_case);
 	return match_pattern(filter->name_patterns, refname,
 			     filter->ignore_case);
 }

Also, I noticed that you declared it as "const int ignore_case" in
match_pattern(). That's not wrong, but we usually do not bother (it is
passed by value, so const-ness is irrelevant to the caller, and the
compiler can see inside the function that the value is not changed and
optimize appropriately).

-Peff

* Re: [PATCH v4 09/16] refs/packed-backend.c: implement jump lists to avoid excluded pattern(s)
  2023-06-20 14:21   ` [PATCH v4 09/16] refs/packed-backend.c: implement jump lists to avoid excluded pattern(s) Taylor Blau
@ 2023-07-03  5:56     ` Jeff King
  2023-07-03 17:38       ` Taylor Blau
  0 siblings, 1 reply; 149+ messages in thread
From: Jeff King @ 2023-07-03  5:56 UTC (permalink / raw)
  To: Taylor Blau
  Cc: git, Chris Torek, Derrick Stolee, Junio C Hamano,
	Patrick Steinhardt

On Tue, Jun 20, 2023 at 10:21:42AM -0400, Taylor Blau wrote:

> Second, note that the jump list is best-effort, since we do not handle
> loose references, and because of the meta-character issue above. The
> jump list may not skip past all references which won't appear in the
> results, but will never skip over a reference which does appear in the
> result set.

I wonder if we should be advertising this in a docstring comment above
the relevant function. The problem may be that there are several such
functions. I just think that it's a gotcha that may affect somebody who
wants to call the function, and they're not going to think to dig up
this commit message.

>     $ hyperfine \
>       'git for-each-ref --format="%(objectname) %(refname)" | grep -vE "^[0-9a-f]{40} refs/pull/"' \
>       'git.compile for-each-ref --format="%(objectname) %(refname)" --exclude="refs/pull"'
>     Benchmark 1: git for-each-ref --format="%(objectname) %(refname)" | grep -vE "^[0-9a-f]{40} refs/pull/"
>       Time (mean ± σ):     802.7 ms ±   2.1 ms    [User: 691.6 ms, System: 147.0 ms]
>       Range (min … max):   800.0 ms … 807.7 ms    10 runs
> 
>     Benchmark 2: git.compile for-each-ref --format="%(objectname) %(refname)" --exclude="refs/pull"
>       Time (mean ± σ):       4.7 ms ±   0.3 ms    [User: 0.7 ms, System: 4.0 ms]
>       Range (min … max):     4.3 ms …   6.7 ms    422 runs
> 
>     Summary
>       'git.compile for-each-ref --format="%(objectname) %(refname)" --exclude="refs/pull"' ran
>       172.03 ± 9.60 times faster than 'git for-each-ref --format="%(objectname) %(refname)" | grep -vE "^[0-9a-f]{40} refs/pull/"'

This measurement is cheating a little, I think, because the earlier
patch to implement --exclude sped that up from ~800ms to ~100ms (because
we avoid writing and all of the ref-filter malloc slowness for the
excluded entries). So the better comparison is between two invocations
with "--exclude", but before/after this patch. You should still see a
20x speedup (100ms down to 5).

> @@ -802,14 +826,34 @@ struct packed_ref_iterator {
>   */
>  static int next_record(struct packed_ref_iterator *iter)
>  {
> -	const char *p = iter->pos, *eol;
> +	const char *p, *eol;
>  
>  	strbuf_reset(&iter->refname_buf);
>  
> +	/*
> +	 * If iter->pos is contained within a skipped region, jump past
> +	 * it.
> +	 *
> +	 * Note that each skipped region is considered at most once,
> +	 * since they are ordered based on their starting position.
> +	 */
> +	while (iter->jump_cur < iter->jump_nr) {
> +		struct jump_list_entry *curr = &iter->jump[iter->jump_cur];
> +		if (iter->pos < curr->start)
> +			break; /* not to the next jump yet */
> +
> +		iter->jump_cur++;
> +		if (iter->pos < curr->end) {
> +			iter->pos = curr->end;
> +			break;
> +		}
> +	}

It took me a minute to convince myself that this second "break" was
right. If we get to it, we know that iter->pos (the current record we
are looking at) is in the current jump region. So it makes sense to
advance to curr->end. But might we hit another jump region immediately?

I guess not, because earlier we would have coalesced the jump regions.
So either there is a non-excluded entry there _or_ we would have
coalesced the later region into a single larger region. So breaking is
the right thing to do.
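A standalone model of the skipping loop may help when checking that argument. Byte offsets stand in for the `const char *` positions into the mmapped packed-refs file, and `struct jump`, `struct toy_iter`, and `skip_excluded()` are hypothetical names; the loop body follows the quoted hunk. It assumes the regions are sorted and already coalesced, so a single jump never lands inside another region.

```c
#include <stddef.h>

/* A half-open region [start, end) of the buffer to skip over. */
struct jump {
	size_t start, end;
};

struct toy_iter {
	size_t pos;               /* offset of the next record */
	const struct jump *jump;  /* sorted by start, coalesced */
	size_t jump_nr, jump_cur;
};

/*
 * Mirrors the loop in next_record(): consume regions that start at or
 * before pos; if pos falls inside one, move to its end and stop.
 */
static void skip_excluded(struct toy_iter *iter)
{
	while (iter->jump_cur < iter->jump_nr) {
		const struct jump *curr = &iter->jump[iter->jump_cur];

		if (iter->pos < curr->start)
			break; /* not at the next region yet */

		iter->jump_cur++;
		if (iter->pos < curr->end) {
			iter->pos = curr->end;
			break; /* coalesced, so curr->end is outside every region */
		}
	}
}
```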

> +	for (pattern = excluded_patterns; *pattern; pattern++) {
> +		struct jump_list_entry *e;
> +
> +		/*
> +		 * We can't feed any excludes with globs in them to the
> +		 * refs machinery.  It only understands prefix matching.
> +		 * We likewise can't even feed the string leading up to
> +		 * the first meta-character, as something like "foo[a]"
> +		 * should not exclude "foobar" (but the prefix "foo"
> +		 * would match that and mark it for exclusion).
> +		 */
> +		if (has_glob_special(*pattern))
> +			continue;

OK, and here's where we could split "foo[ab]" into "fooa" and "foob" if
we wanted. But I think it is a very good idea to leave that out of this
initial patch. :)
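To make the prefix-only restriction concrete, here is a sketch that turns one literal exclude pattern into a half-open index range over a sorted array of refnames using two binary searches. `has_glob_meta()` is a simplified, hypothetical stand-in for git's `has_glob_special()`, the pattern is treated as a raw string prefix (any path-boundary rule is left out), and the real code searches the packed-refs buffer rather than an array.

```c
#include <stddef.h>
#include <string.h>

/* Simplified stand-in for has_glob_special(): any wildmatch meta-character. */
static int has_glob_meta(const char *s)
{
	return strpbrk(s, "*?[\\") != NULL;
}

/* First index whose name is >= key. */
static size_t lower_bound(const char **names, size_t nr, const char *key)
{
	size_t lo = 0, hi = nr;

	while (lo < hi) {
		size_t mid = lo + (hi - lo) / 2;
		if (strcmp(names[mid], key) < 0)
			lo = mid + 1;
		else
			hi = mid;
	}
	return lo;
}

/* First index at or after 'from' whose name does not start with prefix. */
static size_t prefix_end(const char **names, size_t nr, size_t from,
			 const char *prefix)
{
	size_t len = strlen(prefix), lo = from, hi = nr;

	while (lo < hi) {
		size_t mid = lo + (hi - lo) / 2;
		if (!strncmp(names[mid], prefix, len))
			lo = mid + 1;
		else
			hi = mid;
	}
	return lo;
}

/*
 * Compute [*start, *end) covering every name that starts with pattern.
 * Returns 0 (no region) for patterns with glob meta-characters, since a
 * prefix like "foo" taken from "foo[a]" would wrongly cover "foobar".
 */
static int exclude_region(const char **names, size_t nr, const char *pattern,
			  size_t *start, size_t *end)
{
	if (has_glob_meta(pattern))
		return 0;
	*start = lower_bound(names, nr, pattern);
	*end = prefix_end(names, nr, *start, pattern);
	return 1;
}
```

Note that a pattern matching nothing still yields a region, just an empty one (`start == end`).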

> +	/*
> +	 * As an optimization, merge adjacent entries in the jump list
> +	 * to jump forwards as far as possible when entering a skipped
> +	 * region.
> +	 *
> +	 * For example, if we have two skipped regions:
> +	 *
> +	 *	[[A, B], [B, C]]
> +	 *
> +	 * we want to combine that into a single entry jumping from A to
> +	 * C.
> +	 */
> +	last_disjoint = iter->jump;
> +
> +	for (i = 1, j = 1; i < iter->jump_nr; i++) {
> +		struct jump_list_entry *ours = &iter->jump[i];
> +
> +		if (ours->start == ours->end) {
> +			/* ignore empty regions (no matching entries) */
> +			continue;

Dropping empty regions makes sense, but our iteration starts at "1"
(because the rest of the checks are inherently looking at last_disjoint
before deciding if each region is worth keeping). So we'd fail to throw
away iter->jump[0] if it is empty, I think.

That could be fixed here by iterating from 0 and checking for a NULL
last_disjoint, but maybe it would be easier to avoid allocating at all
in the earlier loop, when we find that start == end?
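One way to cover the first-entry case is sketched below: scan from index 0, drop empty regions wherever they appear, and compare against the last kept region only once one exists. The names are hypothetical and offsets replace pointers; the extension step uses the `if (last < ours) last = ours` form Junio suggested earlier in the thread. It assumes the input is sorted by `start`.

```c
#include <stddef.h>

/* A half-open region [start, end); start == end means nothing matched. */
struct region {
	size_t start, end;
};

/*
 * Coalesce regions (sorted by start) in place, dropping empty ones,
 * including an empty first entry. Returns the new number of regions.
 */
static size_t coalesce_regions(struct region *r, size_t nr)
{
	size_t i, j = 0;

	for (i = 0; i < nr; i++) {
		struct region ours = r[i];

		if (ours.start == ours.end)
			continue; /* empty: no matching entries */

		if (j && ours.start <= r[j - 1].end) {
			/* overlapping or adjacent: extend the last kept region */
			if (r[j - 1].end < ours.end)
				r[j - 1].end = ours.end;
		} else {
			r[j++] = ours; /* disjoint: keep as a new region */
		}
	}
	return j;
}
```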

> +		} else if (ours->start <= last_disjoint->end) {
> +			/* overlapping regions extend the previous one */
> +			last_disjoint->end = last_disjoint->end > ours->end
> +				? last_disjoint->end : ours->end;

OK, this covers both ([A,C],[B,D]) via "<" and ([A,B],[B,C]) via "=".
Good.

> +		} else {
> +			/* otherwise, insert a new region */
> +			iter->jump[j++] = *ours;
> +			last_disjoint = ours;
> +
> +		}

And this is the rest. Good. There's an extra blank line here before the
closing brace.

-Peff

* Re: [PATCH v4 11/16] revision.h: store hidden refs in a `strvec`
  2023-06-20 14:21   ` [PATCH v4 11/16] revision.h: store hidden refs in a `strvec` Taylor Blau
@ 2023-07-03  5:59     ` Jeff King
  0 siblings, 0 replies; 149+ messages in thread
From: Jeff King @ 2023-07-03  5:59 UTC (permalink / raw)
  To: Taylor Blau
  Cc: git, Chris Torek, Derrick Stolee, Junio C Hamano,
	Patrick Steinhardt

On Tue, Jun 20, 2023 at 10:21:57AM -0400, Taylor Blau wrote:

> In subsequent commits, it will be convenient to have a 'const char **'
> of hidden refs (matching `transfer.hiderefs`, `uploadpack.hideRefs`,
> etc.), instead of a `string_list`.
> 
> Convert spots throughout the tree that store the list of hidden refs
> from a `string_list` to a `strvec`.
> 
> Note that in `parse_hide_refs_config()` there is an ugly const-cast used
> to avoid an extra copy of each value before trimming any trailing slash
> characters. This could instead be written as:
> 
>     ref = xstrdup(value);
>     len = strlen(ref);
>     while (len && ref[len - 1] == '/')
>             ref[--len] = '\0';
>     strvec_push(hide_refs, ref);
>     free(ref);
> 
> but the double-copy (once when calling `xstrdup()`, and another via
> `strvec_push()`) is wasteful.

I saw strvec_push_nodup() suggested here. I'm OK leaving it like this
for now, but I do think we'll want that in the long run.
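For comparison, a single-copy version can be sketched without git's helpers: measure the trimmed length first, copy exactly that many bytes once, and hand ownership of the copy to the container. `struct toy_strvec`, `toy_strvec_push_nodup()`, and `push_hide_ref()` are hypothetical; git's real `strvec` API, and any future `strvec_push_nodup()`, may look different.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical growable array of owned strings, NULL-terminated like strvec. */
struct toy_strvec {
	char **v;
	size_t nr, alloc;
};

/* Takes ownership of 'value' instead of duplicating it. */
static void toy_strvec_push_nodup(struct toy_strvec *sv, char *value)
{
	if (sv->nr + 2 > sv->alloc) {
		size_t alloc = sv->alloc ? 2 * sv->alloc : 4;
		char **v = realloc(sv->v, alloc * sizeof(*v));
		if (!v)
			abort();
		sv->v = v;
		sv->alloc = alloc;
	}
	sv->v[sv->nr++] = value;
	sv->v[sv->nr] = NULL;
}

/* Copy 'value' once, minus any trailing slashes, and push the copy. */
static void push_hide_ref(struct toy_strvec *hide_refs, const char *value)
{
	size_t len = strlen(value);
	char *ref;

	while (len && value[len - 1] == '/')
		len--;
	ref = malloc(len + 1);
	if (!ref)
		abort();
	memcpy(ref, value, len);
	ref[len] = '\0';
	toy_strvec_push_nodup(hide_refs, ref);
}
```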

-Peff

* Re: [PATCH v4 12/16] refs/packed-backend.c: ignore complicated hidden refs rules
  2023-06-20 14:22   ` [PATCH v4 12/16] refs/packed-backend.c: ignore complicated hidden refs rules Taylor Blau
@ 2023-07-03  6:18     ` Jeff King
  2023-07-04 18:22       ` Taylor Blau
  0 siblings, 1 reply; 149+ messages in thread
From: Jeff King @ 2023-07-03  6:18 UTC (permalink / raw)
  To: Taylor Blau
  Cc: git, Chris Torek, Derrick Stolee, Junio C Hamano,
	Patrick Steinhardt

On Tue, Jun 20, 2023 at 10:22:02AM -0400, Taylor Blau wrote:

> In subsequent commits, we'll teach `receive-pack` and `upload-pack` to
> use the new jump list feature in the packed-refs iterator by ignoring
> references which are mentioned via its respective hideRefs lists.
> 
> However, the packed-ref jump lists cannot handle un-hiding rules (that
> begin with '!'), or namespace comparisons (that begin with '^'). Detect
> and avoid these cases by falling back to the normal enumeration without
> a jump list when such patterns exist.

I'm a fan of punting on such cases to keep things simple and
incremental. But the location here seems weird to me:

> diff --git a/refs/packed-backend.c b/refs/packed-backend.c
> index 80b877e00c..2aeec5c601 100644
> --- a/refs/packed-backend.c
> +++ b/refs/packed-backend.c
> @@ -1008,6 +1008,25 @@ static void populate_excluded_jump_list(struct packed_ref_iterator *iter,
>  	if (!excluded_patterns)
>  		return;
>  
> +	for (pattern = excluded_patterns; *pattern; pattern++) {
> +		/*
> +		 * We also can't feed any excludes from hidden refs
> +		 * config sections, since later rules may override
> +		 * previous ones. For example, with rules "refs/foo" and
> +		 * "!refs/foo/bar", we should show "refs/foo/bar" (and
> +		 * everything underneath it), but the earlier exclusion
> +		 * would cause us to skip all of "refs/foo". We likewise
> +		 * don't implement the namespace stripping required for
> +		 * '^' rules.
> +		 *
> +		 * Both are possible to do, but complicated, so avoid
> +		 * populating the jump list at all if we see either of
> +		 * these patterns.
> +		 */
> +		if (**pattern == '!' || **pattern == '^')
> +			return;
> +	}
> +

This is deep in the packed-refs code, but the magic of "!" and "^" are
specific to ref_is_hidden().

So if I did:

  git for-each-ref --exclude='!refs/heads/foo'

my understanding is that "!" would _not_ have an effect normally, but
now it is turning off this optimization.

The point may be somewhat academic for "^", as it is not allowed in a
refname anyway. But I don't think "!" is forbidden (as stupid as it
would be to include it in this way), is it?

It feels like the hiderefs code should be the one checking for these,
and then feeding only non-adorned refnames to the "exclude" list (though
there is no need to un-adorn them; once we see any with either form of
magic, we know we cannot use this "exclude" feature at all).
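
A standalone sketch of that caller-side check, assuming a NULL-terminated pattern list like `strvec.v`. The name `sketch_hidden_refs_to_excludes()` is hypothetical and chosen for this example. Because the check only ever sees hideRefs patterns, a literal `--exclude='!refs/heads/foo'` given to `for-each-ref` would never pass through it and so would keep the jump-list optimization.

```c
#include <stddef.h>

/*
 * Given a NULL-terminated list of hideRefs patterns, return it unchanged
 * if it can be used directly as an exclude list, or NULL if any pattern
 * carries '!' (un-hide) or '^' (namespace) magic. A NULL result means
 * "no excludes": the caller enumerates every ref as before.
 */
static const char **sketch_hidden_refs_to_excludes(const char **patterns)
{
	const char **p;

	if (!patterns)
		return NULL;
	for (p = patterns; *p; p++)
		if (**p == '!' || **p == '^')
			return NULL;
	return patterns;
}
```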

Something along the lines of (you'd want a similar tweak for
upload-pack):

diff --git a/builtin/receive-pack.c b/builtin/receive-pack.c
index 2b2faa5d18..80a6b11c90 100644
--- a/builtin/receive-pack.c
+++ b/builtin/receive-pack.c
@@ -339,7 +339,8 @@ static void write_head_info(void)
 	static struct oidset seen = OIDSET_INIT;
 
 	refs_for_each_fullref_in(get_main_ref_store(the_repository), "",
-				 hidden_refs.v, show_ref_cb, &seen);
+				 hidden_refs_to_excludes(&hidden_refs),
+				 show_ref_cb, &seen);
 	for_each_alternate_ref(show_one_alternate_ref, &seen);
 	oidset_clear(&seen);
 	if (!sent_capabilities)
diff --git a/refs.c b/refs.c
index 3065e514fd..213412efd4 100644
--- a/refs.c
+++ b/refs.c
@@ -1482,6 +1482,31 @@ int ref_is_hidden(const char *refname, const char *refname_full,
 	return 0;
 }
 
+const char **hidden_refs_to_excludes(const struct strvec *hide_refs)
+{
+	const char **pattern;
+
+	for (pattern = hide_refs->v; *pattern; pattern++) {
+		/*
+		 * We also can't feed any excludes from hidden refs
+		 * config sections, since later rules may override
+		 * previous ones. For example, with rules "refs/foo" and
+		 * "!refs/foo/bar", we should show "refs/foo/bar" (and
+		 * everything underneath it), but the earlier exclusion
+		 * would cause us to skip all of "refs/foo". We likewise
+		 * don't implement the namespace stripping required for
+		 * '^' rules.
+		 *
+		 * Both are possible to do, but complicated, so avoid
+		 * populating the jump list at all if we see either of
+		 * these patterns.
+		 */
+		if (**pattern == '!' || **pattern == '^')
+			return NULL;
+	}
+	return hide_refs->v;
+}
+
 const char *find_descendant_ref(const char *dirname,
 				const struct string_list *extras,
 				const struct string_list *skip)
diff --git a/refs.h b/refs.h
index 27d341d282..50c92d1f55 100644
--- a/refs.h
+++ b/refs.h
@@ -829,6 +829,8 @@ int parse_hide_refs_config(const char *var, const char *value, const char *,
  */
 int ref_is_hidden(const char *, const char *, const struct strvec *);
 
+const char **hidden_refs_to_excludes(const struct strvec *hide_refs);
+
 /* Is this a per-worktree ref living in the refs/ namespace? */
 int is_per_worktree_ref(const char *refname);
 
diff --git a/refs/packed-backend.c b/refs/packed-backend.c
index 9dd1795bf2..59c3fe9d91 100644
--- a/refs/packed-backend.c
+++ b/refs/packed-backend.c
@@ -1009,25 +1009,6 @@ static void populate_excluded_jump_list(struct packed_ref_iterator *iter,
 	if (!excluded_patterns)
 		return;
 
-	for (pattern = excluded_patterns; *pattern; pattern++) {
-		/*
-		 * We also can't feed any excludes from hidden refs
-		 * config sections, since later rules may override
-		 * previous ones. For example, with rules "refs/foo" and
-		 * "!refs/foo/bar", we should show "refs/foo/bar" (and
-		 * everything underneath it), but the earlier exclusion
-		 * would cause us to skip all of "refs/foo". We likewise
-		 * don't implement the namespace stripping required for
-		 * '^' rules.
-		 *
-		 * Both are possible to do, but complicated, so avoid
-		 * populating the jump list at all if we see either of
-		 * these patterns.
-		 */
-		if (**pattern == '!' || **pattern == '^')
-			return;
-	}
-
 	for (pattern = excluded_patterns; *pattern; pattern++) {
 		struct jump_list_entry *e;
 

* Re: [PATCH v4 15/16] upload-pack.c: avoid enumerating hidden refs where possible
  2023-06-20 14:22   ` [PATCH v4 15/16] upload-pack.c: avoid enumerating hidden refs where possible Taylor Blau
@ 2023-07-03  6:26     ` Jeff King
  2023-07-04 18:43       ` Taylor Blau
  0 siblings, 1 reply; 149+ messages in thread
From: Jeff King @ 2023-07-03  6:26 UTC (permalink / raw)
  To: Taylor Blau
  Cc: git, Chris Torek, Derrick Stolee, Junio C Hamano,
	Patrick Steinhardt

On Tue, Jun 20, 2023 at 10:22:17AM -0400, Taylor Blau wrote:

> In a similar fashion as a previous commit, teach `upload-pack` to avoid
> enumerating hidden references where possible.
> 
> Note, however, that there are certain cases where we cannot avoid
> enumerating even hidden references, in particular when either of:
> 
>   - `uploadpack.allowTipSHA1InWant`, or
>   - `uploadpack.allowReachableSHA1InWant`
> 
> are set, corresponding to `ALLOW_TIP_SHA1` and `ALLOW_REACHABLE_SHA1`,
> respectively.
> 
> When either of these bits is set, upload-pack's `is_our_ref()` function
> needs to consider the `HIDDEN_REF` bit of the referent's object flags.
> So we must visit all references, including the hidden ones, in order to
> mark their referents with the `HIDDEN_REF` bit.
> 
> When neither `ALLOW_TIP_SHA1` nor `ALLOW_REACHABLE_SHA1` are set, the
> `is_our_ref()` function considers only the `OUR_REF` bit, and not the
> `HIDDEN_REF` one. `OUR_REF` is applied via `mark_our_ref()`, and only
> to objects at the tips of non-hidden references, so we do not need to
> visit hidden references in this case.

Both of these are noops if uploadpack.allowAnySHA1InWant is set, or if
we are using the v2 protocol. Setting both allowAny and allowTip is
sufficiently stupid that I don't care much that they lose out on an
optimization. But I could see somebody setting one of those for the
benefit of v0 clients, but then also serving v2 clients (which
effectively ignore those restrictions).

I guess v2 clients don't hit this code at all (they are handled by
ls-refs, which comes in the next patch). So that just leaves the case
that allowAny is set. By itself the optimization should kick in (good).
With allowTip or allowReachable it would not, but that is the "this is
stupid" case in which it is OK to fall back to the existing behavior
(even though we _could_ enable the optimization). OTOH, it would be easy
to check it, as it's just another bit in allow_uor.
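
To make the "just another bit" point concrete, here is a hedged sketch of the decision. The enum values are an assumption modeled on `upload-pack.c`, where `ALLOW_ANY_SHA1` is a mask that includes the tip and reachable bits; `must_visit_hidden_refs()` is a hypothetical helper, not git's `allow_hidden_refs()`. Hidden refs only need to be walked when tip or reachable wants are allowed but arbitrary wants are not.

```c
/* Assumed bit values; the ANY mask is taken to include the other two bits. */
enum sketch_allow_uor {
	SKETCH_ALLOW_TIP_SHA1 = 0x01,
	SKETCH_ALLOW_REACHABLE_SHA1 = 0x02,
	SKETCH_ALLOW_ANY_SHA1 = 0x07
};

/*
 * Return non-zero if upload-pack would have to visit hidden refs to mark
 * their tips HIDDEN_REF: tip/reachable wants are allowed, but any-SHA1
 * wants are not (with allowAny the HIDDEN_REF check is never consulted).
 */
static int must_visit_hidden_refs(unsigned allow_uor)
{
	if ((allow_uor & SKETCH_ALLOW_ANY_SHA1) == SKETCH_ALLOW_ANY_SHA1)
		return 0;
	return !!(allow_uor & (SKETCH_ALLOW_TIP_SHA1 | SKETCH_ALLOW_REACHABLE_SHA1));
}
```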

I'm OK going either way.

-Peff

* Re: [PATCH v4 16/16] ls-refs.c: avoid enumerating hidden refs where possible
  2023-06-20 14:22   ` [PATCH v4 16/16] ls-refs.c: " Taylor Blau
@ 2023-07-03  6:27     ` Jeff King
  0 siblings, 0 replies; 149+ messages in thread
From: Jeff King @ 2023-07-03  6:27 UTC (permalink / raw)
  To: Taylor Blau
  Cc: git, Chris Torek, Derrick Stolee, Junio C Hamano,
	Patrick Steinhardt

On Tue, Jun 20, 2023 at 10:22:22AM -0400, Taylor Blau wrote:

> In a similar fashion as in previous commits, teach `ls-refs` to avoid
> enumerating hidden references where possible.
> 
> As before, this is linux.git with one hidden reference per commit.
> 
>     $ hyperfine -L v ,.compile 'git{v} -c protocol.version=2 ls-remote .'
>     Benchmark 1: git -c protocol.version=2 ls-remote .
>       Time (mean ± σ):      89.8 ms ±   0.6 ms    [User: 84.3 ms, System: 5.7 ms]
>       Range (min … max):    88.8 ms …  91.3 ms    32 runs
> 
>     Benchmark 2: git.compile -c protocol.version=2 ls-remote .
>       Time (mean ± σ):       6.5 ms ±   0.1 ms    [User: 2.4 ms, System: 4.3 ms]
>       Range (min … max):     6.2 ms …   8.3 ms    397 runs

Very nice. I think this may have big real-world consequences for certain
repositories on forges (where they may accrue a large number of hidden
metadata like refs/pull).

-Peff

* Re: [PATCH v4 00/16] refs: implement jump lists for packed backend
  2023-06-20 14:20 ` [PATCH v4 " Taylor Blau
                     ` (15 preceding siblings ...)
  2023-06-20 14:22   ` [PATCH v4 16/16] ls-refs.c: " Taylor Blau
@ 2023-07-03  6:29   ` Jeff King
  16 siblings, 0 replies; 149+ messages in thread
From: Jeff King @ 2023-07-03  6:29 UTC (permalink / raw)
  To: Taylor Blau
  Cc: git, Chris Torek, Derrick Stolee, Junio C Hamano,
	Patrick Steinhardt

On Tue, Jun 20, 2023 at 10:20:51AM -0400, Taylor Blau wrote:

> Here is another reroll of my series to implement jump (née skip) lists
> for the packed refs backend, based on top of the current 'master'.
> 
> Most changes are minor, limited to changing variable names, moving
> changes around between patches and tweaking commit messages for clarity.
> I think that the first 9 or so patches are stable, but we may want some
> more eyes on the remainder.

I had been avoiding reading this series too carefully, as I was involved
in many of the early patches. But now it's been long enough that I
mostly forgot everything, so I could look at it with fresh eyes. ;) Plus
the hidden-refs bits at the end were totally new to me.

It looks good to me overall. I left a few small comments, some of which
I think probably justify a re-roll.

-Peff

* Re: [PATCH v4 02/16] ref-filter.h: provide `REF_FILTER_INIT`
  2023-07-03  5:15     ` Jeff King
@ 2023-07-03 17:07       ` Taylor Blau
  0 siblings, 0 replies; 149+ messages in thread
From: Taylor Blau @ 2023-07-03 17:07 UTC (permalink / raw)
  To: Jeff King
  Cc: git, Chris Torek, Derrick Stolee, Junio C Hamano,
	Patrick Steinhardt

On Mon, Jul 03, 2023 at 01:15:04AM -0400, Jeff King wrote:
> > In the meantime, fix a case in test-reach.c where its `ref_filter` is
> > not even zero-initialized.
>
> This test-reach case scared me, but it happens to work now because
> commit_contains() only looks at the one field that we set. So we're not
> fixing a bug, but more like a bug waiting to happen. :)

Good point, I updated the commit message to more clearly signal that
this isn't fixing a bug, but rather preventing one.

Thanks,
Taylor

* Re: [PATCH v4 04/16] ref-filter: add `ref_filter_clear()`
  2023-07-03  5:19     ` Jeff King
@ 2023-07-03 17:13       ` Taylor Blau
  2023-07-03 17:32         ` Jeff King
  0 siblings, 1 reply; 149+ messages in thread
From: Taylor Blau @ 2023-07-03 17:13 UTC (permalink / raw)
  To: Jeff King
  Cc: git, Chris Torek, Derrick Stolee, Junio C Hamano,
	Patrick Steinhardt

On Mon, Jul 03, 2023 at 01:19:46AM -0400, Jeff King wrote:
> > +void ref_filter_init(struct ref_filter *filter)
> > +{
> > +	struct ref_filter blank = REF_FILTER_INIT;
> > +	memcpy(filter, &blank, sizeof(blank));
> > +}
>
> I was a little surprised by adding init() here, but we need it at the
> end of clear(). So this is an OK place for it (the other option would be
> in the earlier INIT patch, but it would be unused until now).

I used to write more patches in this style where I would add as much of
a new API as possible as early as possible in the series. But I think
reviewers seem to have an easier time reviewing API additions in the
same patch that adds their caller.

So I tend to agree that this patch is probably a good spot to introduce
`ref_filter_init()`. But if you feel strongly, I'm happy to drag it
around.
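
The init-from-a-blank-template idiom in the quoted hunk, and why `clear()` wants it, can be sketched with a cut-down struct. `toy_filter`, `TOY_FILTER_INIT` and its non-zero `match_as_path` default are hypothetical; the only claim is that copying a statically initialized blank lets `clear()` restore non-zero defaults that a plain `memset()` to zero would lose.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical, cut-down filter; git's struct ref_filter has many more fields. */
struct toy_filter {
	int ignore_case;
	int match_as_path;   /* hypothetical non-zero default */
	char **owned;        /* heap data released by toy_filter_clear() */
	size_t owned_nr;
};

#define TOY_FILTER_INIT { .match_as_path = 1 }

/* Reset to the same state as TOY_FILTER_INIT, non-zero defaults included. */
static void toy_filter_init(struct toy_filter *filter)
{
	struct toy_filter blank = TOY_FILTER_INIT;
	memcpy(filter, &blank, sizeof(blank));
}

/* Free owned data, then re-init so the struct is safe to reuse. */
static void toy_filter_clear(struct toy_filter *filter)
{
	for (size_t i = 0; i < filter->owned_nr; i++)
		free(filter->owned[i]);
	free(filter->owned);
	toy_filter_init(filter);
}
```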

Thanks,
Taylor

* Re: [PATCH v4 05/16] ref-filter.c: parameterize match functions over patterns
  2023-07-03  5:27     ` Jeff King
@ 2023-07-03 17:18       ` Taylor Blau
  2023-07-03 17:22         ` Taylor Blau
  0 siblings, 1 reply; 149+ messages in thread
From: Taylor Blau @ 2023-07-03 17:18 UTC (permalink / raw)
  To: Jeff King
  Cc: git, Chris Torek, Derrick Stolee, Junio C Hamano,
	Patrick Steinhardt

On Mon, Jul 03, 2023 at 01:27:24AM -0400, Jeff King wrote:
> On Tue, Jun 20, 2023 at 10:21:21AM -0400, Taylor Blau wrote:
>
> > Once we start passing either in, `match_pattern()` will have little to
> > do with a particular `struct ref_filter *` instance. To clarify this,
> > drop it from the argument list, and replace it with the only bit of the
> > `ref_filter` that we care about (`filter->ignore_case`).
>
> Makes sense, but...
>
> > @@ -2134,9 +2134,10 @@ static int match_pattern(const struct ref_filter *filter, const char *refname)
> >   * matches a pattern "refs/heads/" but not "refs/heads/m") or a
> >   * wildcard (e.g. the same ref matches "refs/heads/m*", too).
> >   */
> > -static int match_name_as_path(const struct ref_filter *filter, const char *refname)
> > +static int match_name_as_path(const struct ref_filter *filter,
> > +			      const char **pattern,
> > +			      const char *refname)
>
> ...wouldn't we then want to do the same for match_name_as_path()?

Yes, definitely :-). I'm not sure how I missed this, since the patch
message even says that `match_name_as_path()` gets the same treatment as
the other function.

But in any case, I obviously agree (and the diff below makes sense to
me). Applied.

> Also, I noticed that you declared it as "const int ignore_case" in
> match_pattern(). That's not wrong, but we usually do not bother (it is
> passed by value, so const-ness is irrelevant to the caller, and the
> compiler can see inside the function that the value is not changed and
> optimize appropriately).

Indeed :-).
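
A rough sketch of what parameterizing over patterns buys: one matcher can be pointed at either the name patterns or an exclude list, with `ignore_case` passed as a plain `int`. `sketch_match_as_path()` and `starts_with_n()` are hypothetical, prefix-only stand-ins for `match_name_as_path()`; they accept a pattern that equals the refname or is a leading path component of it, and ignore wildcard patterns entirely.

```c
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Does `s` start with the first `n` bytes of `prefix` (optionally ignoring ASCII case)? */
static int starts_with_n(const char *s, const char *prefix, size_t n, int ignore_case)
{
	for (size_t i = 0; i < n; i++) {
		int a = (unsigned char)s[i];
		int b = (unsigned char)prefix[i];

		if (!a)
			return 0; /* s is shorter than the prefix */
		if (ignore_case) {
			a = tolower(a);
			b = tolower(b);
		}
		if (a != b)
			return 0;
	}
	return 1;
}

/*
 * `patterns` is NULL-terminated, so callers can pass either their name
 * patterns or an exclude list. A pattern matches only at a path-component
 * boundary: "refs/heads" matches "refs/heads/main" but not "refs/headsx".
 */
static int sketch_match_as_path(const char **patterns, const char *refname,
				int ignore_case)
{
	for (; *patterns; patterns++) {
		const char *p = *patterns;
		size_t len = strlen(p);

		if (!starts_with_n(refname, p, len, ignore_case))
			continue;
		if (!len || p[len - 1] == '/' ||
		    refname[len] == '\0' || refname[len] == '/')
			return 1;
	}
	return 0;
}
```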

Thanks,
Taylor

* Re: [PATCH v4 05/16] ref-filter.c: parameterize match functions over patterns
  2023-07-03 17:18       ` Taylor Blau
@ 2023-07-03 17:22         ` Taylor Blau
  2023-07-03 17:33           ` Jeff King
  0 siblings, 1 reply; 149+ messages in thread
From: Taylor Blau @ 2023-07-03 17:22 UTC (permalink / raw)
  To: Jeff King
  Cc: git, Chris Torek, Derrick Stolee, Junio C Hamano,
	Patrick Steinhardt

On Mon, Jul 03, 2023 at 01:18:06PM -0400, Taylor Blau wrote:
> On Mon, Jul 03, 2023 at 01:27:24AM -0400, Jeff King wrote:
> > On Tue, Jun 20, 2023 at 10:21:21AM -0400, Taylor Blau wrote:
> >
> > > Once we start passing either in, `match_pattern()` will have little to
> > > do with a particular `struct ref_filter *` instance. To clarify this,
> > > drop it from the argument list, and replace it with the only bit of the
> > > `ref_filter` that we care about (`filter->ignore_case`).
> >
> > Makes sense, but...
> >
> > > @@ -2134,9 +2134,10 @@ static int match_pattern(const struct ref_filter *filter, const char *refname)
> > >   * matches a pattern "refs/heads/" but not "refs/heads/m") or a
> > >   * wildcard (e.g. the same ref matches "refs/heads/m*", too).
> > >   */
> > > -static int match_name_as_path(const struct ref_filter *filter, const char *refname)
> > > +static int match_name_as_path(const struct ref_filter *filter,
> > > +			      const char **pattern,
> > > +			      const char *refname)
> >
> > ...wouldn't we then want to do the same for match_name_as_path()?
>
> Yes, definitely :-). I'm not sure how I missed this, since the patch
> message even says that `match_name_as_path()` gets the same treatment as
> the other function.
>
> But in any case, I obviously agree (and the diff below makes sense to
> me). Applied.

Ah, this is missing one more spot (that we wouldn't complain about
during a non-DEVELOPER build). This needs to go on top, but I otherwise
agree:

--- 8< ---
diff --git a/ref-filter.c b/ref-filter.c
index de85904b8d..845173a904 100644
--- a/ref-filter.c
+++ b/ref-filter.c
@@ -2176,7 +2176,8 @@ static int filter_exclude_match(struct ref_filter *filter, const char *refname)
 	if (!filter->exclude.nr)
 		return 0;
 	if (filter->match_as_path)
-		return match_name_as_path(filter, filter->exclude.v, refname);
+		return match_name_as_path(filter->exclude.v, refname,
+					  filter->ignore_case);
 	return match_pattern(filter->exclude.v, refname, filter->ignore_case);
 }
--- >8 ---

Thanks,
Taylor

* Re: [PATCH v4 04/16] ref-filter: add `ref_filter_clear()`
  2023-07-03 17:13       ` Taylor Blau
@ 2023-07-03 17:32         ` Jeff King
  0 siblings, 0 replies; 149+ messages in thread
From: Jeff King @ 2023-07-03 17:32 UTC (permalink / raw)
  To: Taylor Blau
  Cc: git, Chris Torek, Derrick Stolee, Junio C Hamano,
	Patrick Steinhardt

On Mon, Jul 03, 2023 at 01:13:42PM -0400, Taylor Blau wrote:

> On Mon, Jul 03, 2023 at 01:19:46AM -0400, Jeff King wrote:
> > > +void ref_filter_init(struct ref_filter *filter)
> > > +{
> > > +	struct ref_filter blank = REF_FILTER_INIT;
> > > +	memcpy(filter, &blank, sizeof(blank));
> > > +}
> >
> > I was a little surprised by adding init() here, but we need it at the
> > end of clear(). So this is an OK place for it (the other option would be
> > in the earlier INIT patch, but it would be unused until now).
> 
> I used to write more patches in this style where I would add as much of
> a new API as possible as early as possible in the series. But I think
> reviewers seem to have an easier time reviewing API additions in the
> same patch that adds their caller.
> 
> So I tend to agree that this patch is probably a good spot to introduce
> `ref_filter_init()`. But if you feel strongly, I'm happy to drag it
> around.

Nope, I don't feel strongly at all. Let's leave it as you have it.

-Peff

* Re: [PATCH v4 05/16] ref-filter.c: parameterize match functions over patterns
  2023-07-03 17:22         ` Taylor Blau
@ 2023-07-03 17:33           ` Jeff King
  0 siblings, 0 replies; 149+ messages in thread
From: Jeff King @ 2023-07-03 17:33 UTC (permalink / raw)
  To: Taylor Blau
  Cc: git, Chris Torek, Derrick Stolee, Junio C Hamano,
	Patrick Steinhardt

On Mon, Jul 03, 2023 at 01:22:28PM -0400, Taylor Blau wrote:

> On Mon, Jul 03, 2023 at 01:18:06PM -0400, Taylor Blau wrote:
> > On Mon, Jul 03, 2023 at 01:27:24AM -0400, Jeff King wrote:
> > > On Tue, Jun 20, 2023 at 10:21:21AM -0400, Taylor Blau wrote:
> > >
> > > > Once we start passing either in, `match_pattern()` will have little to
> > > > do with a particular `struct ref_filter *` instance. To clarify this,
> > > > drop it from the argument list, and replace it with the only bit of the
> > > > `ref_filter` that we care about (`filter->ignore_case`).
> > >
> > > Makes sense, but...
> > >
> > > > @@ -2134,9 +2134,10 @@ static int match_pattern(const struct ref_filter *filter, const char *refname)
> > > >   * matches a pattern "refs/heads/" but not "refs/heads/m") or a
> > > >   * wildcard (e.g. the same ref matches "refs/heads/m*", too).
> > > >   */
> > > > -static int match_name_as_path(const struct ref_filter *filter, const char *refname)
> > > > +static int match_name_as_path(const struct ref_filter *filter,
> > > > +			      const char **pattern,
> > > > +			      const char *refname)
> > >
> > > ...wouldn't we then want to do the same for match_name_as_path()?
> >
> > Yes, definitely :-). I'm not sure how I missed this, since the patch
> > message even says that `match_name_as_path()` gets the same treatment as
> > the other function.
> >
> > But in any case, I obviously agree (and the diff below makes sense to
> > me). Applied.
> 
> Ah, this is missing one more spot (that we wouldn't complain about
> during a non-DEVELOPER build). This needs to go on top, but I otherwise
> agree:

Heh, yes. I was applying them in order, and had to make the same fixup
on top of the next patch. I think that was the only other fallout as I
applied the rest, though.

-Peff

* Re: [PATCH v4 09/16] refs/packed-backend.c: implement jump lists to avoid excluded pattern(s)
  2023-07-03  5:56     ` Jeff King
@ 2023-07-03 17:38       ` Taylor Blau
  0 siblings, 0 replies; 149+ messages in thread
From: Taylor Blau @ 2023-07-03 17:38 UTC (permalink / raw)
  To: Jeff King
  Cc: git, Chris Torek, Derrick Stolee, Junio C Hamano,
	Patrick Steinhardt

On Mon, Jul 03, 2023 at 01:56:27AM -0400, Jeff King wrote:
> On Tue, Jun 20, 2023 at 10:21:42AM -0400, Taylor Blau wrote:
>
> > Second, note that the jump list is best-effort, since we do not handle
> > loose references, and because of the meta-character issue above. The
> > jump list may not skip past all references which won't appear in the
> > results, but will never skip over a reference which does appear in the
> > result set.
>
> I wonder if we should be advertising this in a docstring comment above
> the relevant function. The problem may be that there are several such
> functions. I just think that it's a gotcha that may affect somebody who
> wants to call the function, and they're not going to think to dig up
> this commit message.

Good idea, thanks.

> >     $ hyperfine \
> >       'git for-each-ref --format="%(objectname) %(refname)" | grep -vE "^[0-9a-f]{40} refs/pull/"' \
> >       'git.compile for-each-ref --format="%(objectname) %(refname)" --exclude="refs/pull"'
> >     Benchmark 1: git for-each-ref --format="%(objectname) %(refname)" | grep -vE "^[0-9a-f]{40} refs/pull/"
> >       Time (mean ± σ):     802.7 ms ±   2.1 ms    [User: 691.6 ms, System: 147.0 ms]
> >       Range (min … max):   800.0 ms … 807.7 ms    10 runs
> >
> >     Benchmark 2: git.compile for-each-ref --format="%(objectname) %(refname)" --exclude="refs/pull"
> >       Time (mean ± σ):       4.7 ms ±   0.3 ms    [User: 0.7 ms, System: 4.0 ms]
> >       Range (min … max):     4.3 ms …   6.7 ms    422 runs
> >
> >     Summary
> >       'git.compile for-each-ref --format="%(objectname) %(refname)" --exclude="refs/pull"' ran
> >       172.03 ± 9.60 times faster than 'git for-each-ref --format="%(objectname) %(refname)" | grep -vE "^[0-9a-f]{40} refs/pull/"'
>
> This measurement is cheating a little, I think, because the earlier
> patch to implement --exclude sped that up from ~800ms to ~100ms (because
> we avoid writing and all of the ref-filter malloc slowness for the
> excluded entries). So the better comparison is between two invocations
> with "--exclude", but before/after this patch. You should still see a
> 20x speedup (100ms down to 5).

I agree. I included a build from the previous commit in this benchmark.
As expected, it's around ~100ms, but at least it gives readers a clearer
picture of how performance changes as a result of this patch.

> > @@ -802,14 +826,34 @@ struct packed_ref_iterator {
> >   */
> >  static int next_record(struct packed_ref_iterator *iter)
> >  {
> > -	const char *p = iter->pos, *eol;
> > +	const char *p, *eol;
> >
> >  	strbuf_reset(&iter->refname_buf);
> >
> > +	/*
> > +	 * If iter->pos is contained within a skipped region, jump past
> > +	 * it.
> > +	 *
> > +	 * Note that each skipped region is considered at most once,
> > +	 * since they are ordered based on their starting position.
> > +	 */
> > +	while (iter->jump_cur < iter->jump_nr) {
> > +		struct jump_list_entry *curr = &iter->jump[iter->jump_cur];
> > +		if (iter->pos < curr->start)
> > +			break; /* not to the next jump yet */
> > +
> > +		iter->jump_cur++;
> > +		if (iter->pos < curr->end) {
> > +			iter->pos = curr->end;
> > +			break;
> > +		}
> > +	}
>
> It took me a minute to convince myself that this second "break" was
> right. If we get to it, we know that iter->pos (the current record we
> are looking at) is in the current jump region. So it makes sense to
> advance to curr->end. But might we hit another jump region immediately?
>
> I guess not, because earlier we would have coalesced the jump regions.
> So either there is a non-excluded entry there _or_ we would have
> coalesced the later region into a single larger region. So breaking is
> the right thing to do.

Exactly. I added a short comment to this effect to hopefully avoid any
confusion here.
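
The skip logic in the quoted `next_record()` hunk can be exercised in isolation. This is a simplified model under stated assumptions: records are fixed-width and identified by offsets rather than pointers into the packed-refs buffer, and `struct jump_entry`, `struct toy_iter`, `skip_excluded()` and `collect()` are hypothetical names. The loop body follows the quoted code, including the `break` that relies on regions having been coalesced.

```c
#include <stddef.h>

/* Hypothetical model: each entry skips records in [start, end). */
struct jump_entry {
	size_t start, end;
};

struct toy_iter {
	size_t pos;                     /* offset of the next record */
	const struct jump_entry *jump;  /* sorted, coalesced regions */
	size_t jump_nr, jump_cur;
};

/* If iter->pos falls inside the next skipped region, jump past it. */
static void skip_excluded(struct toy_iter *iter)
{
	while (iter->jump_cur < iter->jump_nr) {
		const struct jump_entry *curr = &iter->jump[iter->jump_cur];

		if (iter->pos < curr->start)
			break; /* not at the next region yet */
		iter->jump_cur++;
		if (iter->pos < curr->end) {
			iter->pos = curr->end;
			/* coalesced, so no other region starts at curr->end */
			break;
		}
	}
}

/* Walk fixed-width records in [0, size), storing surviving offsets in out. */
static size_t collect(struct toy_iter *iter, size_t size, size_t width,
		      size_t *out)
{
	size_t n = 0;

	for (;;) {
		skip_excluded(iter);
		if (iter->pos >= size)
			break;
		out[n++] = iter->pos;
		iter->pos += width;
	}
	return n;
}
```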

> > +	for (pattern = excluded_patterns; *pattern; pattern++) {
> > +		struct jump_list_entry *e;
> > +
> > +		/*
> > +		 * We can't feed any excludes with globs in them to the
> > +		 * refs machinery.  It only understands prefix matching.
> > +		 * We likewise can't even feed the string leading up to
> > +		 * the first meta-character, as something like "foo[a]"
> > +		 * should not exclude "foobar" (but the prefix "foo"
> > +		 * would match that and mark it for exclusion).
> > +		 */
> > +		if (has_glob_special(*pattern))
> > +			continue;
>
> OK, and here's where we could split "foo[ac]" into "fooa" and "foob" if
> we wanted. But I think it is a very good idea to leave that out of this
> initial patch. :)

Oh, definitely ;-).
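
The glob rule from the quoted comment can be sketched as a filter over the exclude patterns. `sketch_has_glob_special()` is a hypothetical stand-in for git's `has_glob_special()` that treats `*`, `?`, `[` and `\` as metacharacters, and `usable_prefixes()` is likewise hypothetical. Globbed patterns are dropped whole rather than truncated, so `foo[a]` never degrades into a prefix `foo` that would also exclude `foobar`.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in: '*', '?', '[' and '\\' count as glob metacharacters. */
static int sketch_has_glob_special(const char *s)
{
	return strpbrk(s, "*?[\\") != NULL;
}

/*
 * Copy the patterns that are safe to use as literal prefixes into `out`,
 * which must have room for every pattern plus a terminating NULL.
 * Returns the number of patterns kept.
 */
static size_t usable_prefixes(const char **patterns, const char **out)
{
	size_t n = 0;

	for (; *patterns; patterns++)
		if (!sketch_has_glob_special(*patterns))
			out[n++] = *patterns;
	out[n] = NULL;
	return n;
}
```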

> > +	/*
> > +	 * As an optimization, merge adjacent entries in the jump list
> > +	 * to jump forwards as far as possible when entering a skipped
> > +	 * region.
> > +	 *
> > +	 * For example, if we have two skipped regions:
> > +	 *
> > +	 *	[[A, B], [B, C]]
> > +	 *
> > +	 * we want to combine that into a single entry jumping from A to
> > +	 * C.
> > +	 */
> > +	last_disjoint = iter->jump;
> > +
> > +	for (i = 1, j = 1; i < iter->jump_nr; i++) {
> > +		struct jump_list_entry *ours = &iter->jump[i];
> > +
> > +		if (ours->start == ours->end) {
> > +			/* ignore empty regions (no matching entries) */
> > +			continue;
>
> Dropping empty regions makes sense, but our iteration starts at "1"
> (because the rest of the checks are inherently looking at last_disjoint
> before deciding if each region is worth keeping). So we'd fail to throw
> away iter->jump[0] if it is empty, I think.
>
> That could be fixed here by iterating from 0 and checking for a NULL
> last_disjoint, but maybe it would be easier to avoid allocating at all
> in the earlier loop, when we find that start == end?

Yeah, I agree with this. I think Patrick made a similar suggestion in an
earlier response, and I decided not to take it since it makes the patch
more verbose.

But I think that avoiding the empty region special case is worth it.
Thanks, both :-).
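
Peff's first alternative (walk from index 0 and only compare against a previous entry once one has been kept) can be sketched as follows. Taylor's reply leans toward the other route, never allocating empty entries, so this is not presented as the applied code. `struct region` and `coalesce_regions()` are hypothetical names; entries are modeled as offsets rather than pointers, and merging overlapping (not just touching) regions is an assumption of the sketch.

```c
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical model of a jump list entry: skip records in [start, end). */
struct region {
	size_t start, end;
};

static int cmp_region(const void *va, const void *vb)
{
	const struct region *a = va, *b = vb;

	if (a->start != b->start)
		return a->start < b->start ? -1 : 1;
	return 0;
}

/*
 * Sort regions, drop empty ones (no ref matched that pattern), and merge
 * each region into the previously kept one when they touch or overlap.
 * Starting at index 0 means the first entry gets the emptiness check too.
 * Returns the number of regions kept at the front of `r`.
 */
static size_t coalesce_regions(struct region *r, size_t nr)
{
	size_t i, kept = 0;

	if (!nr)
		return 0;
	qsort(r, nr, sizeof(*r), cmp_region);
	for (i = 0; i < nr; i++) {
		struct region cur = r[i];

		if (cur.start == cur.end)
			continue;
		if (kept && cur.start <= r[kept - 1].end) {
			if (cur.end > r[kept - 1].end)
				r[kept - 1].end = cur.end;
		} else {
			r[kept++] = cur;
		}
	}
	return kept;
}
```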

Thanks,
Taylor

* Re: [PATCH v4 12/16] refs/packed-backend.c: ignore complicated hidden refs rules
  2023-07-03  6:18     ` Jeff King
@ 2023-07-04 18:22       ` Taylor Blau
  0 siblings, 0 replies; 149+ messages in thread
From: Taylor Blau @ 2023-07-04 18:22 UTC (permalink / raw)
  To: Jeff King
  Cc: git, Chris Torek, Derrick Stolee, Junio C Hamano,
	Patrick Steinhardt

On Mon, Jul 03, 2023 at 02:18:39AM -0400, Jeff King wrote:
> On Tue, Jun 20, 2023 at 10:22:02AM -0400, Taylor Blau wrote:
>
> > In subsequent commits, we'll teach `receive-pack` and `upload-pack` to
> > use the new jump list feature in the packed-refs iterator by ignoring
> > references which are mentioned via its respective hideRefs lists.
> >
> > However, the packed-ref jump lists cannot handle un-hiding rules (that
> > begin with '!'), or namespace comparisons (that begin with '^'). Detect
> > and avoid these cases by falling back to the normal enumeration without
> > a jump list when such patterns exist.
>
> I'm a fan of punting on such cases to keep things simple and
> incremental. But the location here seems weird to me:
>
> > diff --git a/refs/packed-backend.c b/refs/packed-backend.c
> > index 80b877e00c..2aeec5c601 100644
> > --- a/refs/packed-backend.c
> > +++ b/refs/packed-backend.c
> > @@ -1008,6 +1008,25 @@ static void populate_excluded_jump_list(struct packed_ref_iterator *iter,
> >  	if (!excluded_patterns)
> >  		return;
> >
> > +	for (pattern = excluded_patterns; *pattern; pattern++) {
> > +		/*
> > +		 * We also can't feed any excludes from hidden refs
> > +		 * config sections, since later rules may override
> > +		 * previous ones. For example, with rules "refs/foo" and
> > +		 * "!refs/foo/bar", we should show "refs/foo/bar" (and
> > +		 * everything underneath it), but the earlier exclusion
> > +		 * would cause us to skip all of "refs/foo". We likewise
> > +		 * don't implement the namespace stripping required for
> > +		 * '^' rules.
> > +		 *
> > +		 * Both are possible to do, but complicated, so avoid
> > +		 * populating the jump list at all if we see either of
> > +		 * these patterns.
> > +		 */
> > +		if (**pattern == '!' || **pattern == '^')
> > +			return;
> > +	}
> > +
>
> This is deep in the packed-refs code, but the magic of "!" and "^" are
> specific to ref_is_hidden().
>
> So if I did:
>
>   git for-each-ref --exclude='!refs/heads/foo'
>
> my understanding is that "!" would _not_ have an effect normally, but
> now it is turning off this optimization.

Yeah, that makes sense -- I agree that it is silly to have a reference
with "!" at the beginning, but since it's allowed we should absolutely
support it.

> Something along the lines of (you'd want a similar tweak for
> upload-pack):

Yep. Here's the extra tweak for upload-pack:

--- 8< ---
commit 5a8902731b91a8fc6900586968a79ebc6272e502
Author: Taylor Blau <me@ttaylorr.com>
Date:   Tue Jul 4 14:21:22 2023 -0400

    fixup! upload-pack.c: avoid enumerating hidden refs where possible

diff --git a/upload-pack.c b/upload-pack.c
index 3a176a7209..ef2ca36feb 100644
--- a/upload-pack.c
+++ b/upload-pack.c
@@ -610,6 +610,7 @@ static int allow_hidden_refs(enum allow_uor allow_uor)
 static void for_each_namespaced_ref_1(each_ref_fn fn,
 				      struct upload_pack_data *data)
 {
+	const char **excludes = NULL;
 	/*
 	 * If `data->allow_uor` allows fetching hidden refs, we need to
 	 * mark all references (including hidden ones), to check in
@@ -619,12 +620,13 @@ static void for_each_namespaced_ref_1(each_ref_fn fn,
 	 * has the OUR_REF bit set or not, so do not need to visit
 	 * hidden references.
 	 */
-	if (allow_hidden_refs(data->allow_uor))
-		for_each_namespaced_ref(NULL, fn, data);
-	else
-		for_each_namespaced_ref(data->hidden_refs.v, fn, data);
+	if (!allow_hidden_refs(data->allow_uor))
+		excludes = hidden_refs_to_excludes(&data->hidden_refs);
+
+	for_each_namespaced_ref(excludes, fn, data);
 }

+
 static int is_our_ref(struct object *o, enum allow_uor allow_uor)
 {
 	return o->flags & ((allow_hidden_refs(allow_uor) ? HIDDEN_REF : 0) | OUR_REF);
--- >8 ---

Thanks,
Taylor

* Re: [PATCH v4 15/16] upload-pack.c: avoid enumerating hidden refs where possible
  2023-07-03  6:26     ` Jeff King
@ 2023-07-04 18:43       ` Taylor Blau
  0 siblings, 0 replies; 149+ messages in thread
From: Taylor Blau @ 2023-07-04 18:43 UTC (permalink / raw)
  To: Jeff King
  Cc: git, Chris Torek, Derrick Stolee, Junio C Hamano,
	Patrick Steinhardt

On Mon, Jul 03, 2023 at 02:26:44AM -0400, Jeff King wrote:
> I guess v2 clients don't hit this code at all (they are handled by
> ls-refs, which comes in the next patch). So that just leaves the case
> that allowAny is set. By itself the optimization should kick in (good).
> With allowTip or allowReachable it would not, but that is the "this is
> stupid" case in which it is OK to fall back to the existing behavior
> (even though we _could_ enable the optimization). OTOH, it would be easy
> to check it, as it's just another bit in allow_uor.
>
> I'm OK going either way.

As am I, though I agree that checking for it is not hard. The return
value of `allow_hidden_refs()` as currently written is confusing, so
here's an inversion of that function with this suggestion on top:

--- >8 ---
diff --git a/upload-pack.c b/upload-pack.c
index ef2ca36feb..5a0fa028c6 100644
--- a/upload-pack.c
+++ b/upload-pack.c
@@ -604,7 +604,9 @@ static int get_common_commits(struct upload_pack_data *data,

 static int allow_hidden_refs(enum allow_uor allow_uor)
 {
-	return allow_uor & (ALLOW_TIP_SHA1 | ALLOW_REACHABLE_SHA1);
+	if ((allow_uor & ALLOW_ANY_SHA1) == ALLOW_ANY_SHA1)
+		return 1;
+	return !(allow_uor & (ALLOW_TIP_SHA1 | ALLOW_REACHABLE_SHA1));
 }

 static void for_each_namespaced_ref_1(each_ref_fn fn,
@@ -620,7 +622,7 @@ static void for_each_namespaced_ref_1(each_ref_fn fn,
 	 * has the OUR_REF bit set or not, so do not need to visit
 	 * hidden references.
 	 */
-	if (!allow_hidden_refs(data->allow_uor))
+	if (allow_hidden_refs(data->allow_uor))
 		excludes = hidden_refs_to_excludes(&data->hidden_refs);

 	for_each_namespaced_ref(excludes, fn, data);
@@ -629,7 +631,7 @@ static void for_each_namespaced_ref_1(each_ref_fn fn,

 static int is_our_ref(struct object *o, enum allow_uor allow_uor)
 {
-	return o->flags & ((allow_hidden_refs(allow_uor) ? HIDDEN_REF : 0) | OUR_REF);
+	return o->flags & ((allow_hidden_refs(allow_uor) ? 0 : HIDDEN_REF) | OUR_REF);
 }

 /*
--- 8< ---
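Because the inverted return value is easy to misread, here is a standalone sketch that copies the new `allow_hidden_refs()` logic so its behavior can be tabulated. The flag values are an assumption: they follow my understanding of `enum allow_uor` in upload-pack.c, where `ALLOW_ANY_SHA1` sets all three bits. The function is renamed `can_exclude_hidden_refs()` purely for illustration, since a nonzero result now means the enumeration may skip hidden refs.

```c
/* Assumed flag values; ALLOW_ANY_SHA1 implies the other two bits. */
enum allow_uor {
	ALLOW_TIP_SHA1 = 01,
	ALLOW_REACHABLE_SHA1 = 02,
	ALLOW_ANY_SHA1 = 07
};

/*
 * Same logic as the inverted allow_hidden_refs() above, under a
 * hypothetical name: nonzero means hidden refs may be excluded.
 */
static int can_exclude_hidden_refs(enum allow_uor allow_uor)
{
	/* allowAny: any object may be requested, so hidden refs need no marks. */
	if ((allow_uor & ALLOW_ANY_SHA1) == ALLOW_ANY_SHA1)
		return 1;
	/* allowTip/allowReachable must visit hidden refs to mark them. */
	return !(allow_uor & (ALLOW_TIP_SHA1 | ALLOW_REACHABLE_SHA1));
}
```

With these values, only allowTip or allowReachable without allowAny forces hidden refs to be enumerated; no flags at all, or the full allowAny mask, lets the optimization kick in.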

Thanks,
Taylor

* [PATCH v5 00/16] refs: implement jump lists for packed backend
  2023-05-08 21:59 [PATCH 00/15] refs: implement skip lists for packed backend Taylor Blau
                   ` (17 preceding siblings ...)
  2023-06-20 14:20 ` [PATCH v4 " Taylor Blau
@ 2023-07-10 21:12 ` Taylor Blau
  2023-07-10 21:12   ` [PATCH v5 01/16] refs.c: rename `ref_filter` Taylor Blau
                     ` (16 more replies)
  18 siblings, 17 replies; 149+ messages in thread
From: Taylor Blau @ 2023-07-10 21:12 UTC (permalink / raw)
  To: git
  Cc: Chris Torek, Derrick Stolee, Jeff King, Junio C Hamano,
	Patrick Steinhardt

Here is another reroll of my series to implement jump (née skip) lists
for the packed refs backend, based on top of the current 'master'.
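For readers coming to the series fresh, the jump-list idea can be modeled in a few dozen lines of C. This is a simplified, hypothetical sketch, not the code in `refs/packed-backend.c`. It indexes a sorted array of refnames where the real iterator works with byte offsets into the packed-refs snapshot, all function names are invented, and `has_glob()` only approximates git's glob check. Each non-glob exclude pattern becomes a half-open `[start, end)` range found by binary search. Empty ranges are dropped, the rest are sorted and coalesced, and the iterator jumps straight to `end` whenever its position falls inside a range.

```c
#include <stdlib.h>
#include <string.h>

/* Half-open range [start, end) of sorted refnames to skip. */
struct jump_entry {
	size_t start, end;
};

/* Approximation: globs cannot be turned into one contiguous range. */
static int has_glob(const char *s)
{
	return strpbrk(s, "*?[\\") != NULL;
}

/* Index of the first refname that sorts at or after `prefix`. */
static size_t lower_bound(const char **refs, size_t nr, const char *prefix)
{
	size_t lo = 0, hi = nr;

	while (lo < hi) {
		size_t mid = lo + (hi - lo) / 2;
		if (strcmp(refs[mid], prefix) < 0)
			lo = mid + 1;
		else
			hi = mid;
	}
	return lo;
}

static int cmp_start(const void *a, const void *b)
{
	const struct jump_entry *x = a, *y = b;
	return (x->start > y->start) - (x->start < y->start);
}

/* Fill `out` (room for one entry per pattern); return the list length. */
static size_t build_jump_list(const char **refs, size_t nr,
			      const char **excludes, struct jump_entry *out)
{
	size_t n = 0, i, j;

	for (; *excludes; excludes++) {
		size_t len = strlen(*excludes), start, end;

		if (has_glob(*excludes))
			continue;
		/* All refs sharing a prefix form one contiguous run. */
		start = end = lower_bound(refs, nr, *excludes);
		while (end < nr && !strncmp(refs[end], *excludes, len))
			end++;
		if (start == end)
			continue; /* nothing to jump over */
		out[n].start = start;
		out[n].end = end;
		n++;
	}
	if (!n)
		return 0;

	qsort(out, n, sizeof(*out), cmp_start);
	for (i = 1, j = 1; i < n; i++) {
		if (out[i].start <= out[j - 1].end) {
			/* overlapping or adjacent ranges extend the previous one */
			if (out[i].end > out[j - 1].end)
				out[j - 1].end = out[i].end;
		} else {
			out[j++] = out[i];
		}
	}
	return j;
}

/* Copy refs not covered by the jump list into `kept`; return the count. */
static size_t iterate_with_jumps(const char **refs, size_t nr,
				 const struct jump_entry *jump, size_t jump_nr,
				 const char **kept)
{
	size_t pos = 0, cur = 0, n = 0;

	while (pos < nr) {
		while (cur < jump_nr && pos >= jump[cur].start) {
			const struct jump_entry *e = &jump[cur++];
			if (pos < e->end) {
				pos = e->end;
				break; /* ranges are coalesced: one jump suffices */
			}
		}
		if (pos >= nr)
			break;
		kept[n++] = refs[pos++];
	}
	return n;
}
```

Merging overlapping and adjacent ranges while the list is built is what lets the iterator get away with a single jump each time it lands in an excluded region.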

This version responds to Peff's review of the entire series from a
couple of weeks ago[1], and should address all of the comments he
raised, most of which were relatively minor.

The notable changes from last time are:

  - clarifying some commit messages, notably to explicitly state that
    certain clean-ups do not indicate existing bugs, but are defensive
    against potential future ones
  - polishing up some of the API changes, to avoid having
    `match_name_as_path()` take a ref_filter pointer.
  - more timing data and documentation in the first substantive patch
  - clean-up in upload-pack.c to make the changes clearer

As usual, a range-diff is included below for convenience.

Thanks in advance for your review!

[1]: https://lore.kernel.org/git/20230703062925.GK3502534@coredump.intra.peff.net/

Jeff King (5):
  refs.c: rename `ref_filter`
  ref-filter.h: provide `REF_FILTER_INIT`
  ref-filter: clear reachable list pointers after freeing
  ref-filter: add `ref_filter_clear()`
  ref-filter.c: parameterize match functions over patterns

Taylor Blau (11):
  builtin/for-each-ref.c: add `--exclude` option
  refs: plumb `exclude_patterns` argument throughout
  refs/packed-backend.c: refactor `find_reference_location()`
  refs/packed-backend.c: implement jump lists to avoid excluded
    pattern(s)
  refs/packed-backend.c: add trace2 counters for jump list
  revision.h: store hidden refs in a `strvec`
  refs.h: let `for_each_namespaced_ref()` take excluded patterns
  refs.h: implement `hidden_refs_to_excludes()`
  builtin/receive-pack.c: avoid enumerating hidden references
  upload-pack.c: avoid enumerating hidden refs where possible
  ls-refs.c: avoid enumerating hidden refs where possible

 Documentation/git-for-each-ref.txt |   6 +
 builtin/branch.c                   |   4 +-
 builtin/for-each-ref.c             |   7 +-
 builtin/receive-pack.c             |   8 +-
 builtin/tag.c                      |   4 +-
 http-backend.c                     |   2 +-
 ls-refs.c                          |   7 +-
 ref-filter.c                       |  66 +++++++---
 ref-filter.h                       |  12 ++
 refs.c                             |  85 ++++++++----
 refs.h                             |  29 +++-
 refs/debug.c                       |   5 +-
 refs/files-backend.c               |   5 +-
 refs/packed-backend.c              | 204 +++++++++++++++++++++++++----
 refs/refs-internal.h               |   7 +-
 revision.c                         |   4 +-
 revision.h                         |   5 +-
 t/helper/test-reach.c              |   2 +-
 t/helper/test-ref-store.c          |  10 ++
 t/t0041-usage.sh                   |   1 +
 t/t1419-exclude-refs.sh            | 122 +++++++++++++++++
 t/t3402-rebase-merge.sh            |   1 +
 t/t6300-for-each-ref.sh            |  35 +++++
 trace2.h                           |   2 +
 trace2/tr2_ctr.c                   |   5 +
 upload-pack.c                      |  47 +++++--
 26 files changed, 579 insertions(+), 106 deletions(-)
 create mode 100755 t/t1419-exclude-refs.sh

Range-diff against v4:
 1:  c12def5a30a !  1:  5b5ccc40d6b refs.c: rename `ref_filter`
    @@ Commit message
         refs.c: rename `ref_filter`
     
         The refs machinery has its own implementation of a `ref_filter` (used by
    -    `for-each-ref`), which is distinct from the `ref-filler.h` API (also
    +    `for-each-ref`), which is distinct from the `ref-filter.h` API (also
         used by `for-each-ref`, among other things).
     
         Rename the one within refs.c to more clearly indicate its purpose.
 2:  7ce82b6a5a4 !  2:  0c4e995f1d3 ref-filter.h: provide `REF_FILTER_INIT`
    @@ Commit message
         Provide a sane initialization value for `struct ref_filter`, which in a
         subsequent patch will be used to initialize a new field.
     
    -    In the meantime, fix a case in test-reach.c where its `ref_filter` is
    -    not even zero-initialized.
    +    In the meantime, ensure that the `ref_filter` struct used in the
    +    test-helper's `cmd__reach()` is zero-initialized. The lack of
    +    initialization is OK, since `commit_contains()` only looks at the single
    +    `with_commit_tag_algo` field that *is* initialized directly above.
    +
    +    So this does not fix a bug, but rather prevents one from biting us in
    +    the future.
     
         Signed-off-by: Jeff King <peff@peff.net>
         Signed-off-by: Taylor Blau <me@ttaylorr.com>
 3:  7e6bf7766d0 !  3:  cea1e88d358 ref-filter: clear reachable list pointers after freeing
    @@ Metadata
      ## Commit message ##
         ref-filter: clear reachable list pointers after freeing
     
    -    In reach_filter(), we pop all commits from the reachable lists, leaving
    -    them empty. But because we're operating on a list pointer that was
    -    passed by value, the original filter.reachable_from pointer is left
    -    dangling.
    +    In `reach_filter()`, we pop all commits from the reachable lists,
    +    leaving them empty. But because we're operating on a list pointer that
    +    was passed by value, the original `filter.reachable_from` and
    +    `filter.unreachable_from` pointers are left dangling.
    +
    +    As is the case with the previous commit, nobody touches either of these
    +    fields after calling `reach_filter()`, so leaving them dangling is OK.
    +    But this future proofs against dangerous situations.
     
         Signed-off-by: Jeff King <peff@peff.net>
         Signed-off-by: Taylor Blau <me@ttaylorr.com>
 4:  777e71004d6 =  4:  00a4532d54d ref-filter: add `ref_filter_clear()`
 5:  39e9b0f50d0 !  5:  3ab03ac20df ref-filter.c: parameterize match functions over patterns
    @@ ref-filter.c: static int get_ref_atom_value(struct ref_array_item *ref, int atom
       */
     -static int match_pattern(const struct ref_filter *filter, const char *refname)
     +static int match_pattern(const char **patterns, const char *refname,
    -+			 const int ignore_case)
    ++			 int ignore_case)
      {
     -	const char **patterns = filter->name_patterns;
      	unsigned flags = 0;
    @@ ref-filter.c: static int match_pattern(const struct ref_filter *filter, const ch
       * wildcard (e.g. the same ref matches "refs/heads/m*", too).
       */
     -static int match_name_as_path(const struct ref_filter *filter, const char *refname)
    -+static int match_name_as_path(const struct ref_filter *filter,
    -+			      const char **pattern,
    -+			      const char *refname)
    ++static int match_name_as_path(const char **pattern, const char *refname,
    ++			      int ignore_case)
      {
     -	const char **pattern = filter->name_patterns;
      	int namelen = strlen(refname);
      	unsigned flags = WM_PATHNAME;
      
    +-	if (filter->ignore_case)
    ++	if (ignore_case)
    + 		flags |= WM_CASEFOLD;
    + 
    + 	for (; *pattern; pattern++) {
     @@ ref-filter.c: static int filter_pattern_match(struct ref_filter *filter, const char *refname)
      	if (!*filter->name_patterns)
      		return 1; /* No pattern always matches */
      	if (filter->match_as_path)
     -		return match_name_as_path(filter, refname);
     -	return match_pattern(filter, refname);
    -+		return match_name_as_path(filter, filter->name_patterns, refname);
    ++		return match_name_as_path(filter->name_patterns, refname,
    ++					  filter->ignore_case);
     +	return match_pattern(filter->name_patterns, refname,
     +			     filter->ignore_case);
      }
 6:  c4fd47fd750 !  6:  aa881ca06fa builtin/for-each-ref.c: add `--exclude` option
    @@ ref-filter.c: static int filter_pattern_match(struct ref_filter *filter, const c
     +	if (!filter->exclude.nr)
     +		return 0;
     +	if (filter->match_as_path)
    -+		return match_name_as_path(filter, filter->exclude.v, refname);
    ++		return match_name_as_path(filter->exclude.v, refname,
    ++					  filter->ignore_case);
     +	return match_pattern(filter->exclude.v, refname, filter->ignore_case);
     +}
     +
 7:  e6b50c50219 =  7:  81e223de0c8 refs: plumb `exclude_patterns` argument throughout
 8:  a0990b2916c =  8:  25c099a528c refs/packed-backend.c: refactor `find_reference_location()`
 9:  386ed468fa7 !  9:  af0ce43cc90 refs/packed-backend.c: implement jump lists to avoid excluded pattern(s)
    @@ Commit message
     
             $ hyperfine \
               'git for-each-ref --format="%(objectname) %(refname)" | grep -vE "^[0-9a-f]{40} refs/pull/"' \
    +          'git.prev for-each-ref --format="%(objectname) %(refname)" --exclude="refs/pull"' \
               'git.compile for-each-ref --format="%(objectname) %(refname)" --exclude="refs/pull"'
             Benchmark 1: git for-each-ref --format="%(objectname) %(refname)" | grep -vE "^[0-9a-f]{40} refs/pull/"
    -          Time (mean ± σ):     802.7 ms ±   2.1 ms    [User: 691.6 ms, System: 147.0 ms]
    -          Range (min … max):   800.0 ms … 807.7 ms    10 runs
    +          Time (mean ± σ):     798.1 ms ±   3.3 ms    [User: 687.6 ms, System: 146.4 ms]
    +          Range (min … max):   794.5 ms … 805.5 ms    10 runs
     
    -        Benchmark 2: git.compile for-each-ref --format="%(objectname) %(refname)" --exclude="refs/pull"
    -          Time (mean ± σ):       4.7 ms ±   0.3 ms    [User: 0.7 ms, System: 4.0 ms]
    -          Range (min … max):     4.3 ms …   6.7 ms    422 runs
    +        Benchmark 2: git.prev for-each-ref --format="%(objectname) %(refname)" --exclude="refs/pull"
    +          Time (mean ± σ):      98.9 ms ±   1.4 ms    [User: 93.1 ms, System: 5.7 ms]
    +          Range (min … max):    97.0 ms … 104.0 ms    29 runs
    +
    +        Benchmark 3: git.compile for-each-ref --format="%(objectname) %(refname)" --exclude="refs/pull"
    +          Time (mean ± σ):       4.5 ms ±   0.2 ms    [User: 0.7 ms, System: 3.8 ms]
    +          Range (min … max):     4.1 ms …   5.8 ms    524 runs
     
             Summary
               'git.compile for-each-ref --format="%(objectname) %(refname)" --exclude="refs/pull"' ran
    -          172.03 ± 9.60 times faster than 'git for-each-ref --format="%(objectname) %(refname)" | grep -vE "^[0-9a-f]{40} refs/pull/"'
    +           21.87 ± 1.05 times faster than 'git.prev for-each-ref --format="%(objectname) %(refname)" --exclude="refs/pull"'
    +          176.52 ± 8.19 times faster than 'git for-each-ref --format="%(objectname) %(refname)" | grep -vE "^[0-9a-f]{40} refs/pull/"'
    +
    +    (Comparing stock git and this patch isn't quite fair, since an earlier
    +    commit in this series adds a naive implementation of the `--exclude`
    +    option. `git.prev` is built from the previous commit and includes this
    +    naive implementation).
     
         Using the jump list is fairly straightforward (see the changes to
         `refs/packed-backend.c::next_record()`), but constructing the list is
    @@ ref-filter.c: static int for_each_fullref_in_pattern(struct ref_filter *filter,
      
      /*
     
    + ## refs.h ##
    +@@ refs.h: int for_each_ref(each_ref_fn fn, void *cb_data);
    +  */
    + int for_each_ref_in(const char *prefix, each_ref_fn fn, void *cb_data);
    + 
    ++/*
    ++ * references matching any pattern in "exclude_patterns" are omitted from the
    ++ * result set on a best-effort basis.
    ++ */
    + int refs_for_each_fullref_in(struct ref_store *refs, const char *prefix,
    + 			     const char **exclude_patterns,
    + 			     each_ref_fn fn, void *cb_data);
    +
      ## refs/packed-backend.c ##
     @@ refs/packed-backend.c: static int cmp_packed_ref_records(const void *v1, const void *v2)
       * Compare a snapshot record at `rec` to the specified NUL-terminated
    @@ refs/packed-backend.c: struct packed_ref_iterator {
     +		iter->jump_cur++;
     +		if (iter->pos < curr->end) {
     +			iter->pos = curr->end;
    ++			/* jumps are coalesced, so only one jump is necessary */
     +			break;
     +		}
     +	}
    @@ refs/packed-backend.c: static struct ref_iterator_vtable packed_ref_iterator_vta
     +
     +	for (pattern = excluded_patterns; *pattern; pattern++) {
     +		struct jump_list_entry *e;
    ++		const char *start, *end;
     +
     +		/*
     +		 * We can't feed any excludes with globs in them to the
    @@ refs/packed-backend.c: static struct ref_iterator_vtable packed_ref_iterator_vta
     +		if (has_glob_special(*pattern))
     +			continue;
     +
    ++		start = find_reference_location(snapshot, *pattern, 0);
    ++		end = find_reference_location_end(snapshot, *pattern, 0);
    ++
    ++		if (start == end)
    ++			continue; /* nothing to jump over */
    ++
     +		ALLOC_GROW(iter->jump, iter->jump_nr + 1, iter->jump_alloc);
     +
     +		e = &iter->jump[iter->jump_nr++];
    -+		e->start = find_reference_location(snapshot, *pattern, 0);
    -+		e->end = find_reference_location_end(snapshot, *pattern, 0);
    ++		e->start = start;
    ++		e->end = end;
     +	}
     +
     +	if (!iter->jump_nr) {
    @@ refs/packed-backend.c: static struct ref_iterator_vtable packed_ref_iterator_vta
     +
     +	for (i = 1, j = 1; i < iter->jump_nr; i++) {
     +		struct jump_list_entry *ours = &iter->jump[i];
    -+
    -+		if (ours->start == ours->end) {
    -+			/* ignore empty regions (no matching entries) */
    -+			continue;
    -+		} else if (ours->start <= last_disjoint->end) {
    ++		if (ours->start <= last_disjoint->end) {
     +			/* overlapping regions extend the previous one */
     +			last_disjoint->end = last_disjoint->end > ours->end
     +				? last_disjoint->end : ours->end;
    @@ refs/packed-backend.c: static struct ref_iterator_vtable packed_ref_iterator_vta
     +			/* otherwise, insert a new region */
     +			iter->jump[j++] = *ours;
     +			last_disjoint = ours;
    -+
     +		}
     +	}
     +
10:  49c8f5173aa ! 10:  e150941c1d1 refs/packed-backend.c: add trace2 counters for jump list
    @@ Commit message
     
      ## refs/packed-backend.c ##
     @@
    - #include "../chdir-notify.h"
    + #include "../statinfo.h"
      #include "../wrapper.h"
      #include "../write-or-die.h"
     +#include "../trace2.h"
    @@ refs/packed-backend.c: static int next_record(struct packed_ref_iterator *iter)
      		if (iter->pos < curr->end) {
      			iter->pos = curr->end;
     +			trace2_counter_add(TRACE2_COUNTER_ID_PACKED_REFS_JUMPS, 1);
    + 			/* jumps are coalesced, so only one jump is necessary */
      			break;
      		}
    - 	}
     
      ## t/t1419-exclude-refs.sh ##
     @@ t/t1419-exclude-refs.sh: TEST_PASSES_SANITIZE_LEAK=true
11:  dd856a3982b = 11:  a5093d52008 revision.h: store hidden refs in a `strvec`
12:  845904853ee <  -:  ----------- refs/packed-backend.c: ignore complicated hidden refs rules
13:  8d4d7cc22ee ! 12:  7e72c23c8a4 refs.h: let `for_each_namespaced_ref()` take excluded patterns
    @@ refs.h: int for_each_glob_ref_in(each_ref_fn fn, const char *pattern,
      
      int head_ref_namespaced(each_ref_fn fn, void *cb_data);
     -int for_each_namespaced_ref(each_ref_fn fn, void *cb_data);
    ++/*
    ++ * references matching any pattern in "exclude_patterns" are omitted from the
    ++ * result set on a best-effort basis.
    ++ */
     +int for_each_namespaced_ref(const char **exclude_patterns,
     +			    each_ref_fn fn, void *cb_data);
      
 -:  ----------- > 13:  f99d89d53b9 refs.h: implement `hidden_refs_to_excludes()`
14:  49c665f9f8f ! 14:  96aada36f9b builtin/receive-pack.c: avoid enumerating hidden references
    @@ builtin/receive-pack.c: static void write_head_info(void)
      
     -	for_each_ref(show_ref_cb, &seen);
     +	refs_for_each_fullref_in(get_main_ref_store(the_repository), "",
    -+				 hidden_refs.v, show_ref_cb, &seen);
    ++				 hidden_refs_to_excludes(&hidden_refs),
    ++				 show_ref_cb, &seen);
      	for_each_alternate_ref(show_one_alternate_ref, &seen);
      	oidset_clear(&seen);
      	if (!sent_capabilities)
15:  19bf4a52d69 ! 15:  8544a647798 upload-pack.c: avoid enumerating hidden refs where possible
    @@ upload-pack.c: static int get_common_commits(struct upload_pack_data *data,
      
     +static int allow_hidden_refs(enum allow_uor allow_uor)
     +{
    -+	return allow_uor & (ALLOW_TIP_SHA1 | ALLOW_REACHABLE_SHA1);
    ++	if ((allow_uor & ALLOW_ANY_SHA1) == ALLOW_ANY_SHA1)
    ++		return 1;
    ++	return !(allow_uor & (ALLOW_TIP_SHA1 | ALLOW_REACHABLE_SHA1));
     +}
     +
     +static void for_each_namespaced_ref_1(each_ref_fn fn,
     +				      struct upload_pack_data *data)
     +{
    ++	const char **excludes = NULL;
     +	/*
     +	 * If `data->allow_uor` allows fetching hidden refs, we need to
     +	 * mark all references (including hidden ones), to check in
    @@ upload-pack.c: static int get_common_commits(struct upload_pack_data *data,
     +	 * hidden references.
     +	 */
     +	if (allow_hidden_refs(data->allow_uor))
    -+		for_each_namespaced_ref(NULL, fn, data);
    -+	else
    -+		for_each_namespaced_ref(data->hidden_refs.v, fn, data);
    ++		excludes = hidden_refs_to_excludes(&data->hidden_refs);
    ++
    ++	for_each_namespaced_ref(excludes, fn, data);
     +}
    ++
     +
      static int is_our_ref(struct object *o, enum allow_uor allow_uor)
      {
     -	int allow_hidden_ref = (allow_uor &
     -				(ALLOW_TIP_SHA1 | ALLOW_REACHABLE_SHA1));
     -	return o->flags & ((allow_hidden_ref ? HIDDEN_REF : 0) | OUR_REF);
    -+	return o->flags & ((allow_hidden_refs(allow_uor) ? HIDDEN_REF : 0) | OUR_REF);
    ++	return o->flags & ((allow_hidden_refs(allow_uor) ? 0 : HIDDEN_REF) | OUR_REF);
      }
      
      /*
16:  ea6cbaf292f ! 16:  dff068c469f ls-refs.c: avoid enumerating hidden refs where possible
    @@ ls-refs.c: int ls_refs(struct repository *r, struct packet_reader *request)
      	refs_for_each_fullref_in_prefixes(get_main_ref_store(r),
      					  get_git_namespace(), data.prefixes.v,
     -					  NULL, send_ref, &data);
    -+					  data.hidden_refs.v, send_ref, &data);
    ++					  hidden_refs_to_excludes(&data.hidden_refs),
    ++					  send_ref, &data);
      	packet_fflush(stdout);
      	strvec_clear(&data.prefixes);
      	strbuf_release(&data.buf);
-- 
2.41.0.343.gdff068c469f

* [PATCH v5 01/16] refs.c: rename `ref_filter`
  2023-07-10 21:12 ` [PATCH v5 " Taylor Blau
@ 2023-07-10 21:12   ` Taylor Blau
  2023-07-10 21:12   ` [PATCH v5 02/16] ref-filter.h: provide `REF_FILTER_INIT` Taylor Blau
                     ` (15 subsequent siblings)
  16 siblings, 0 replies; 149+ messages in thread
From: Taylor Blau @ 2023-07-10 21:12 UTC (permalink / raw)
  To: git
  Cc: Chris Torek, Derrick Stolee, Jeff King, Junio C Hamano,
	Patrick Steinhardt

From: Jeff King <peff@peff.net>

The refs machinery has its own implementation of a `ref_filter` (used by
`for-each-ref`), which is distinct from the `ref-filter.h` API (also
used by `for-each-ref`, among other things).

Rename the one within refs.c to more clearly indicate its purpose.

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Taylor Blau <me@ttaylorr.com>
---
 refs.c | 15 ++++++++-------
 1 file changed, 8 insertions(+), 7 deletions(-)

diff --git a/refs.c b/refs.c
index c029f64982f..876df4931f6 100644
--- a/refs.c
+++ b/refs.c
@@ -377,8 +377,8 @@ char *resolve_refdup(const char *refname, int resolve_flags,
 				   oid, flags);
 }
 
-/* The argument to filter_refs */
-struct ref_filter {
+/* The argument to for_each_filter_refs */
+struct for_each_ref_filter {
 	const char *pattern;
 	const char *prefix;
 	each_ref_fn *fn;
@@ -411,10 +411,11 @@ int ref_exists(const char *refname)
 	return refs_ref_exists(get_main_ref_store(the_repository), refname);
 }
 
-static int filter_refs(const char *refname, const struct object_id *oid,
-			   int flags, void *data)
+static int for_each_filter_refs(const char *refname,
+				const struct object_id *oid,
+				int flags, void *data)
 {
-	struct ref_filter *filter = (struct ref_filter *)data;
+	struct for_each_ref_filter *filter = data;
 
 	if (wildmatch(filter->pattern, refname, 0))
 		return 0;
@@ -571,7 +572,7 @@ int for_each_glob_ref_in(each_ref_fn fn, const char *pattern,
 	const char *prefix, void *cb_data)
 {
 	struct strbuf real_pattern = STRBUF_INIT;
-	struct ref_filter filter;
+	struct for_each_ref_filter filter;
 	int ret;
 
 	if (!prefix && !starts_with(pattern, "refs/"))
@@ -591,7 +592,7 @@ int for_each_glob_ref_in(each_ref_fn fn, const char *pattern,
 	filter.prefix = prefix;
 	filter.fn = fn;
 	filter.cb_data = cb_data;
-	ret = for_each_ref(filter_refs, &filter);
+	ret = for_each_ref(for_each_filter_refs, &filter);
 
 	strbuf_release(&real_pattern);
 	return ret;
-- 
2.41.0.343.gdff068c469f

* [PATCH v5 02/16] ref-filter.h: provide `REF_FILTER_INIT`
  2023-07-10 21:12 ` [PATCH v5 " Taylor Blau
  2023-07-10 21:12   ` [PATCH v5 01/16] refs.c: rename `ref_filter` Taylor Blau
@ 2023-07-10 21:12   ` Taylor Blau
  2023-07-10 21:12   ` [PATCH v5 03/16] ref-filter: clear reachable list pointers after freeing Taylor Blau
                     ` (14 subsequent siblings)
  16 siblings, 0 replies; 149+ messages in thread
From: Taylor Blau @ 2023-07-10 21:12 UTC (permalink / raw)
  To: git
  Cc: Chris Torek, Derrick Stolee, Jeff King, Junio C Hamano,
	Patrick Steinhardt

From: Jeff King <peff@peff.net>

Provide a sane initialization value for `struct ref_filter`, which in a
subsequent patch will be used to initialize a new field.

In the meantime, ensure that the `ref_filter` struct used in the
test-helper's `cmd__reach()` is zero-initialized. Its current lack of
initialization is harmless, since `commit_contains()` only looks at the
single `with_commit_tag_algo` field, which *is* set directly above.

So this does not fix a bug, but rather prevents one from biting us in
the future.

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Taylor Blau <me@ttaylorr.com>
---
 builtin/branch.c       | 3 +--
 builtin/for-each-ref.c | 3 +--
 builtin/tag.c          | 3 +--
 ref-filter.h           | 3 +++
 t/helper/test-reach.c  | 2 +-
 5 files changed, 7 insertions(+), 7 deletions(-)
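The `memset()` calls can be replaced by `= REF_FILTER_INIT` because of a C rule: in an initializer that names only some members, every member not named is initialized to zero (NULL for pointers). The standalone sketch below illustrates this with a hypothetical `demo_filter` struct and `DEMO_FILTER_INIT` macro, not git's real `struct ref_filter`. The `.lines = -1` default is invented to show the kind of non-zero field a later patch could add in one place.

```c
#include <stddef.h>

/* Hypothetical stand-ins, not git's real types. */
struct demo_oid_array {
	void *oid;
	size_t nr, alloc;
};
#define DEMO_OID_ARRAY_INIT { 0 }

struct demo_filter {
	const char **name_patterns;
	struct demo_oid_array points_at;
	int ignore_case;
	int lines;
};

/*
 * Members not named here (name_patterns, ignore_case) start out as
 * zero/NULL; only the defaults that differ need to be spelled out.
 */
#define DEMO_FILTER_INIT { \
	.points_at = DEMO_OID_ARRAY_INIT, \
	.lines = -1, \
}
```

With a bare `memset(&filter, 0, sizeof(filter))`, every caller would instead have to remember to set any such default by hand after the memset.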

diff --git a/builtin/branch.c b/builtin/branch.c
index e8ff3ecc072..322646f38f5 100644
--- a/builtin/branch.c
+++ b/builtin/branch.c
@@ -702,7 +702,7 @@ int cmd_branch(int argc, const char **argv, const char *prefix)
 	int reflog = 0, quiet = 0, icase = 0, force = 0,
 	    recurse_submodules_explicit = 0;
 	enum branch_track track;
-	struct ref_filter filter;
+	struct ref_filter filter = REF_FILTER_INIT;
 	static struct ref_sorting *sorting;
 	struct string_list sorting_options = STRING_LIST_INIT_DUP;
 	struct ref_format format = REF_FORMAT_INIT;
@@ -760,7 +760,6 @@ int cmd_branch(int argc, const char **argv, const char *prefix)
 
 	setup_ref_filter_porcelain_msg();
 
-	memset(&filter, 0, sizeof(filter));
 	filter.kind = FILTER_REFS_BRANCHES;
 	filter.abbrev = -1;
 
diff --git a/builtin/for-each-ref.c b/builtin/for-each-ref.c
index 15409337f83..6b5e313123f 100644
--- a/builtin/for-each-ref.c
+++ b/builtin/for-each-ref.c
@@ -24,7 +24,7 @@ int cmd_for_each_ref(int argc, const char **argv, const char *prefix)
 	struct string_list sorting_options = STRING_LIST_INIT_DUP;
 	int maxcount = 0, icase = 0, omit_empty = 0;
 	struct ref_array array;
-	struct ref_filter filter;
+	struct ref_filter filter = REF_FILTER_INIT;
 	struct ref_format format = REF_FORMAT_INIT;
 	struct strbuf output = STRBUF_INIT;
 	struct strbuf err = STRBUF_INIT;
@@ -61,7 +61,6 @@ int cmd_for_each_ref(int argc, const char **argv, const char *prefix)
 	};
 
 	memset(&array, 0, sizeof(array));
-	memset(&filter, 0, sizeof(filter));
 
 	format.format = "%(objectname) %(objecttype)\t%(refname)";
 
diff --git a/builtin/tag.c b/builtin/tag.c
index 7d34af416c7..a624185d105 100644
--- a/builtin/tag.c
+++ b/builtin/tag.c
@@ -445,7 +445,7 @@ int cmd_tag(int argc, const char **argv, const char *prefix)
 	struct msg_arg msg = { .buf = STRBUF_INIT };
 	struct ref_transaction *transaction;
 	struct strbuf err = STRBUF_INIT;
-	struct ref_filter filter;
+	struct ref_filter filter = REF_FILTER_INIT;
 	struct ref_sorting *sorting;
 	struct string_list sorting_options = STRING_LIST_INIT_DUP;
 	struct ref_format format = REF_FORMAT_INIT;
@@ -504,7 +504,6 @@ int cmd_tag(int argc, const char **argv, const char *prefix)
 	git_config(git_tag_config, &sorting_options);
 
 	memset(&opt, 0, sizeof(opt));
-	memset(&filter, 0, sizeof(filter));
 	filter.lines = -1;
 	opt.sign = -1;
 
diff --git a/ref-filter.h b/ref-filter.h
index 430701cfb76..a920f73b297 100644
--- a/ref-filter.h
+++ b/ref-filter.h
@@ -92,6 +92,9 @@ struct ref_format {
 	struct string_list bases;
 };
 
+#define REF_FILTER_INIT { \
+	.points_at = OID_ARRAY_INIT, \
+}
 #define REF_FORMAT_INIT {             \
 	.use_color = -1,              \
 	.bases = STRING_LIST_INIT_DUP, \
diff --git a/t/helper/test-reach.c b/t/helper/test-reach.c
index 5b6f2174418..ef58f10c2d6 100644
--- a/t/helper/test-reach.c
+++ b/t/helper/test-reach.c
@@ -139,7 +139,7 @@ int cmd__reach(int ac, const char **av)
 
 		printf("%s(X,_,_,0,0):%d\n", av[1], can_all_from_reach_with_flag(&X_obj, 2, 4, 0, 0));
 	} else if (!strcmp(av[1], "commit_contains")) {
-		struct ref_filter filter;
+		struct ref_filter filter = REF_FILTER_INIT;
 		struct contains_cache cache;
 		init_contains_cache(&cache);
 
-- 
2.41.0.343.gdff068c469f

* [PATCH v5 03/16] ref-filter: clear reachable list pointers after freeing
  2023-07-10 21:12 ` [PATCH v5 " Taylor Blau
  2023-07-10 21:12   ` [PATCH v5 01/16] refs.c: rename `ref_filter` Taylor Blau
  2023-07-10 21:12   ` [PATCH v5 02/16] ref-filter.h: provide `REF_FILTER_INIT` Taylor Blau
@ 2023-07-10 21:12   ` Taylor Blau
  2023-07-10 21:12   ` [PATCH v5 04/16] ref-filter: add `ref_filter_clear()` Taylor Blau
                     ` (13 subsequent siblings)
  16 siblings, 0 replies; 149+ messages in thread
From: Taylor Blau @ 2023-07-10 21:12 UTC (permalink / raw)
  To: git
  Cc: Chris Torek, Derrick Stolee, Jeff King, Junio C Hamano,
	Patrick Steinhardt

From: Jeff King <peff@peff.net>

In `reach_filter()`, we pop all commits from the reachable lists,
leaving them empty. But because we're operating on a list pointer that
was passed by value, the original `filter.reachable_from` and
`filter.unreachable_from` pointers are left dangling.

As with the previous commit, nobody touches either of these fields
after calling `reach_filter()`, so leaving them dangling is harmless
today. But clearing them future-proofs against a later caller reading
freed memory.

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Taylor Blau <me@ttaylorr.com>
---
 ref-filter.c | 14 +++++++-------
 1 file changed, 7 insertions(+), 7 deletions(-)
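The change hinges on a C detail. When `reach_filter()` received the list head by value, popping freed each node but only advanced its local copy, so the caller's pointer was left aimed at freed memory. Taking `struct commit_list **` lets the pops update the caller's pointer, which ends up NULL. The sketch below shows the same pattern with a hypothetical minimal `node` list and `pop()` helper standing in for `struct commit_list` and `pop_commit()`.

```c
#include <stdlib.h>

/* Hypothetical stand-in for struct commit_list. */
struct node {
	int item;
	struct node *next;
};

static struct node *push(struct node *head, int item)
{
	struct node *n = malloc(sizeof(*n));

	if (!n)
		abort();
	n->item = item;
	n->next = head;
	return n;
}

/* Like pop_commit(): frees the head node and advances *list. */
static int pop(struct node **list)
{
	struct node *top = *list;
	int item = top->item;

	*list = top->next;
	free(top);
	return item;
}

/*
 * Taking `struct node **` means draining the list also resets the
 * caller's head pointer instead of leaving it dangling.
 */
static int drain(struct node **list)
{
	int sum = 0;

	if (!*list)
		return 0;
	while (*list)
		sum += pop(list);
	return sum;
}
```

After `drain(&list)`, the caller's `list` is NULL, so a second call (or any later check of the field) is safe.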

diff --git a/ref-filter.c b/ref-filter.c
index e0d03a9f8e9..4dd6575400c 100644
--- a/ref-filter.c
+++ b/ref-filter.c
@@ -2418,13 +2418,13 @@ void ref_array_clear(struct ref_array *array)
 #define EXCLUDE_REACHED 0
 #define INCLUDE_REACHED 1
 static void reach_filter(struct ref_array *array,
-			 struct commit_list *check_reachable,
+			 struct commit_list **check_reachable,
 			 int include_reached)
 {
 	int i, old_nr;
 	struct commit **to_clear;
 
-	if (!check_reachable)
+	if (!*check_reachable)
 		return;
 
 	CALLOC_ARRAY(to_clear, array->nr);
@@ -2434,7 +2434,7 @@ static void reach_filter(struct ref_array *array,
 	}
 
 	tips_reachable_from_bases(the_repository,
-				  check_reachable,
+				  *check_reachable,
 				  to_clear, array->nr,
 				  UNINTERESTING);
 
@@ -2455,8 +2455,8 @@ static void reach_filter(struct ref_array *array,
 
 	clear_commit_marks_many(old_nr, to_clear, ALL_REV_FLAGS);
 
-	while (check_reachable) {
-		struct commit *merge_commit = pop_commit(&check_reachable);
+	while (*check_reachable) {
+		struct commit *merge_commit = pop_commit(check_reachable);
 		clear_commit_marks(merge_commit, ALL_REV_FLAGS);
 	}
 
@@ -2553,8 +2553,8 @@ int filter_refs(struct ref_array *array, struct ref_filter *filter, unsigned int
 	clear_contains_cache(&ref_cbdata.no_contains_cache);
 
 	/*  Filters that need revision walking */
-	reach_filter(array, filter->reachable_from, INCLUDE_REACHED);
-	reach_filter(array, filter->unreachable_from, EXCLUDE_REACHED);
+	reach_filter(array, &filter->reachable_from, INCLUDE_REACHED);
+	reach_filter(array, &filter->unreachable_from, EXCLUDE_REACHED);
 
 	save_commit_buffer = save_commit_buffer_orig;
 	return ret;
-- 
2.41.0.343.gdff068c469f

* [PATCH v5 04/16] ref-filter: add `ref_filter_clear()`
  2023-07-10 21:12 ` [PATCH v5 " Taylor Blau
                     ` (2 preceding siblings ...)
  2023-07-10 21:12   ` [PATCH v5 03/16] ref-filter: clear reachable list pointers after freeing Taylor Blau
@ 2023-07-10 21:12   ` Taylor Blau
  2023-07-10 21:12   ` [PATCH v5 05/16] ref-filter.c: parameterize match functions over patterns Taylor Blau
                     ` (12 subsequent siblings)
  16 siblings, 0 replies; 149+ messages in thread
From: Taylor Blau @ 2023-07-10 21:12 UTC (permalink / raw)
  To: git
  Cc: Chris Torek, Derrick Stolee, Jeff King, Junio C Hamano,
	Patrick Steinhardt

From: Jeff King <peff@peff.net>

We do not bother to clean up the `struct ref_filter` at all in `git branch`
or `git tag`, and `git for-each-ref` only frees a couple of its members.

Add `ref_filter_clear()`, and call it when cleaning up a `struct
ref_filter`. With this patch applied (and no test changes), a couple of
tests become leak-free. They were found by running:

    $ make SANITIZE=leak
    $ make -C t GIT_TEST_PASSING_SANITIZE_LEAK=check GIT_TEST_OPTS=--immediate

(Note that the `reachable_from` and `unreachable_from` lists should
already be freed as the reachability check consumes them, so freeing
them here only covers cases where we bail out before that check runs.)

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Taylor Blau <me@ttaylorr.com>
---
 builtin/branch.c        |  1 +
 builtin/for-each-ref.c  |  3 +--
 builtin/tag.c           |  1 +
 ref-filter.c            | 16 ++++++++++++++++
 ref-filter.h            |  3 +++
 t/t0041-usage.sh        |  1 +
 t/t3402-rebase-merge.sh |  1 +
 7 files changed, 24 insertions(+), 2 deletions(-)

diff --git a/builtin/branch.c b/builtin/branch.c
index 322646f38f5..f06df4be7a5 100644
--- a/builtin/branch.c
+++ b/builtin/branch.c
@@ -855,6 +855,7 @@ int cmd_branch(int argc, const char **argv, const char *prefix)
 		print_columns(&output, colopts, NULL);
 		string_list_clear(&output, 0);
 		ref_sorting_release(sorting);
+		ref_filter_clear(&filter);
 		return 0;
 	} else if (edit_description) {
 		const char *branch_name;
diff --git a/builtin/for-each-ref.c b/builtin/for-each-ref.c
index 6b5e313123f..ccceba54aa1 100644
--- a/builtin/for-each-ref.c
+++ b/builtin/for-each-ref.c
@@ -120,8 +120,7 @@ int cmd_for_each_ref(int argc, const char **argv, const char *prefix)
 	strbuf_release(&err);
 	strbuf_release(&output);
 	ref_array_clear(&array);
-	free_commit_list(filter.with_commit);
-	free_commit_list(filter.no_commit);
+	ref_filter_clear(&filter);
 	ref_sorting_release(sorting);
 	strvec_clear(&vec);
 	return 0;
diff --git a/builtin/tag.c b/builtin/tag.c
index a624185d105..f33600c0506 100644
--- a/builtin/tag.c
+++ b/builtin/tag.c
@@ -659,6 +659,7 @@ int cmd_tag(int argc, const char **argv, const char *prefix)
 
 cleanup:
 	ref_sorting_release(sorting);
+	ref_filter_clear(&filter);
 	strbuf_release(&buf);
 	strbuf_release(&ref);
 	strbuf_release(&reflog_msg);
diff --git a/ref-filter.c b/ref-filter.c
index 4dd6575400c..84cd3921307 100644
--- a/ref-filter.c
+++ b/ref-filter.c
@@ -2866,3 +2866,19 @@ int parse_opt_merge_filter(const struct option *opt, const char *arg, int unset)
 
 	return 0;
 }
+
+void ref_filter_init(struct ref_filter *filter)
+{
+	struct ref_filter blank = REF_FILTER_INIT;
+	memcpy(filter, &blank, sizeof(blank));
+}
+
+void ref_filter_clear(struct ref_filter *filter)
+{
+	oid_array_clear(&filter->points_at);
+	free_commit_list(filter->with_commit);
+	free_commit_list(filter->no_commit);
+	free_commit_list(filter->reachable_from);
+	free_commit_list(filter->unreachable_from);
+	ref_filter_init(filter);
+}
diff --git a/ref-filter.h b/ref-filter.h
index a920f73b297..160b8072245 100644
--- a/ref-filter.h
+++ b/ref-filter.h
@@ -170,4 +170,7 @@ void filter_ahead_behind(struct repository *r,
 			 struct ref_format *format,
 			 struct ref_array *array);
 
+void ref_filter_init(struct ref_filter *filter);
+void ref_filter_clear(struct ref_filter *filter);
+
 #endif /*  REF_FILTER_H  */