Git Mailing List Archive mirror
 help / color / mirror / Atom feed
* [PATCH 0/6] reflog: introduce subcommand to list reflogs
@ 2024-02-19 14:35 Patrick Steinhardt
  2024-02-19 14:35 ` [PATCH 1/6] dir-iterator: pass name to `prepare_next_entry_data()` directly Patrick Steinhardt
                   ` (7 more replies)
  0 siblings, 8 replies; 39+ messages in thread
From: Patrick Steinhardt @ 2024-02-19 14:35 UTC (permalink / raw)
  To: git

[-- Attachment #1: Type: text/plain, Size: 3051 bytes --]

Hi,

this patch series introduces a new `git reflog list` subcommand that
lists all reflogs of the current repository. This addresses an issue
with discoverability as there is no way for a user to learn about which
reflogs exist. While this isn't all that bad with the "files" backend as
a user could in the worst case figure out which reflogs exist by walking
the ".git/logs" directory, with the "reftable" backend it's basically
impossible to achieve this.

While I think this is sufficient motivation to have such a subcommand
nowadays already, I think the need for such a thing will grow in the
future. It was noted in multiple threads that we may eventually want to
lift the artificial limitations in the "reftable" backend where reflogs
are deleted together with their refs. This limitation is inherited from
the "files" backend, which may otherwise hit issues with directory/file
conflicts if it didn't delete reflogs.

Once that limitation is lifted for the "reftable" backend though, it
will become even more important to give users the tools to discover
reflogs that do not have a corresponding ref.

The series is structured as follows:

  - Patches 1-3 extend the dir iterator so that it can sort directory
    entries lexicographically. This is required such that we can list
    reflogs with deterministic ordering.

  - Patch 4 refactors the reflog iterator interface to demonstrate that
    the object ID and flags aren't needed nowadays, and patch 5 builds
    on top of that and stops resolving the refs altogether. This allows
    us to also surface reflogs of broken refs.

  - Patch 6 introduces the new subcommand.

The series depends on Junio's ps/reftable-backend at 8a0bebdeae
(refs/reftable: fix leak when copying reflog fails, 2024-02-08): the
change in behaviour in patches 4 and 5 apply to both backends.

Patrick

Patrick Steinhardt (6):
  dir-iterator: pass name to `prepare_next_entry_data()` directly
  dir-iterator: support iteration in sorted order
  refs/files: sort reflogs returned by the reflog iterator
  refs: drop unused params from the reflog iterator callback
  refs: stop resolving ref corresponding to reflogs
  builtin/reflog: introduce subcommand to list reflogs

 Documentation/git-reflog.txt   |  3 ++
 builtin/fsck.c                 |  4 +-
 builtin/reflog.c               | 37 +++++++++++++-
 dir-iterator.c                 | 93 ++++++++++++++++++++++++++--------
 dir-iterator.h                 |  3 ++
 refs.c                         | 23 +++++++--
 refs.h                         | 11 +++-
 refs/files-backend.c           | 22 ++------
 refs/reftable-backend.c        | 12 +----
 revision.c                     |  4 +-
 t/helper/test-ref-store.c      | 18 ++++---
 t/t0600-reffiles-backend.sh    | 24 ++++-----
 t/t1405-main-ref-store.sh      |  8 +--
 t/t1406-submodule-ref-store.sh |  8 +--
 t/t1410-reflog.sh              | 69 +++++++++++++++++++++++++
 15 files changed, 251 insertions(+), 88 deletions(-)

-- 
2.44.0-rc1


[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

^ permalink raw reply	[flat|nested] 39+ messages in thread

* [PATCH 1/6] dir-iterator: pass name to `prepare_next_entry_data()` directly
  2024-02-19 14:35 [PATCH 0/6] reflog: introduce subcommand to list reflogs Patrick Steinhardt
@ 2024-02-19 14:35 ` Patrick Steinhardt
  2024-02-19 14:35 ` [PATCH 2/6] dir-iterator: support iteration in sorted order Patrick Steinhardt
                   ` (6 subsequent siblings)
  7 siblings, 0 replies; 39+ messages in thread
From: Patrick Steinhardt @ 2024-02-19 14:35 UTC (permalink / raw)
  To: git

[-- Attachment #1: Type: text/plain, Size: 1694 bytes --]

When adding the next directory entry for `struct dir_iterator` we pass
the complete `struct dirent *` to `prepare_next_entry_data()` even
though we only need the entry's name.

Refactor the code to pass in the name, only. This prepares for a
subsequent commit where we introduce the ability to iterate through
dir entries in an ordered manner.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
---
 dir-iterator.c | 8 ++++----
 1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/dir-iterator.c b/dir-iterator.c
index 278b04243a..f58a97e089 100644
--- a/dir-iterator.c
+++ b/dir-iterator.c
@@ -94,15 +94,15 @@ static int pop_level(struct dir_iterator_int *iter)
 
 /*
  * Populate iter->base with the necessary information on the next iteration
- * entry, represented by the given dirent de. Return 0 on success and -1
+ * entry, represented by the given name. Return 0 on success and -1
  * otherwise, setting errno accordingly.
  */
 static int prepare_next_entry_data(struct dir_iterator_int *iter,
-				   struct dirent *de)
+				   const char *name)
 {
 	int err, saved_errno;
 
-	strbuf_addstr(&iter->base.path, de->d_name);
+	strbuf_addstr(&iter->base.path, name);
 	/*
 	 * We have to reset these because the path strbuf might have
 	 * been realloc()ed at the previous strbuf_addstr().
@@ -159,7 +159,7 @@ int dir_iterator_advance(struct dir_iterator *dir_iterator)
 		if (is_dot_or_dotdot(de->d_name))
 			continue;
 
-		if (prepare_next_entry_data(iter, de)) {
+		if (prepare_next_entry_data(iter, de->d_name)) {
 			if (errno != ENOENT && iter->flags & DIR_ITERATOR_PEDANTIC)
 				goto error_out;
 			continue;
-- 
2.44.0-rc1


[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

^ permalink raw reply related	[flat|nested] 39+ messages in thread

* [PATCH 2/6] dir-iterator: support iteration in sorted order
  2024-02-19 14:35 [PATCH 0/6] reflog: introduce subcommand to list reflogs Patrick Steinhardt
  2024-02-19 14:35 ` [PATCH 1/6] dir-iterator: pass name to `prepare_next_entry_data()` directly Patrick Steinhardt
@ 2024-02-19 14:35 ` Patrick Steinhardt
  2024-02-19 23:39   ` Junio C Hamano
  2024-02-19 14:35 ` [PATCH 3/6] refs/files: sort reflogs returned by the reflog iterator Patrick Steinhardt
                   ` (5 subsequent siblings)
  7 siblings, 1 reply; 39+ messages in thread
From: Patrick Steinhardt @ 2024-02-19 14:35 UTC (permalink / raw)
  To: git

[-- Attachment #1: Type: text/plain, Size: 5586 bytes --]

The `struct dir_iterator` is a helper that allows us to iterate through
directory entries. This iterator returns entries in the exact same order
as readdir(3P) does -- or in other words, it guarantees no specific
order at all.

This is about to become problematic as we are introducing a new reflog
subcommand to list reflogs. As the "files" backend uses the directory
iterator to enumerate reflogs, returning reflog names and exposing them
to the user would inherit the indeterministic ordering. Naturally, it
would make for a terrible user interface to show a list with no
discernible order. While this could be handled at a higher level by the
new subcommand itself by collecting and ordering the reflogs, this would
be inefficient and introduce latency when there are many reflogs.

Instead, introduce a new option into the directory iterator that asks
for its entries to be yielded in lexicographical order. If set, the
iterator will read all directory entries greedily end sort them before
we start to iterate over them.

While this will of course also incur overhead as we cannot yield the
directory entries immediately, it should at least be more efficient than
having to sort the complete list of reflogs as we only need to sort one
directory at a time.

This functionality will be used in a follow-up commit.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
---
 dir-iterator.c | 87 ++++++++++++++++++++++++++++++++++++++++----------
 dir-iterator.h |  3 ++
 2 files changed, 73 insertions(+), 17 deletions(-)

diff --git a/dir-iterator.c b/dir-iterator.c
index f58a97e089..396c28178f 100644
--- a/dir-iterator.c
+++ b/dir-iterator.c
@@ -2,9 +2,12 @@
 #include "dir.h"
 #include "iterator.h"
 #include "dir-iterator.h"
+#include "string-list.h"
 
 struct dir_iterator_level {
 	DIR *dir;
+	struct string_list entries;
+	size_t entries_idx;
 
 	/*
 	 * The length of the directory part of path at this level
@@ -72,6 +75,40 @@ static int push_level(struct dir_iterator_int *iter)
 		return -1;
 	}
 
+	string_list_init_dup(&level->entries);
+	level->entries_idx = 0;
+
+	/*
+	 * When the iterator is sorted we read and sort all directory entries
+	 * directly.
+	 */
+	if (iter->flags & DIR_ITERATOR_SORTED) {
+		while (1) {
+			struct dirent *de;
+
+			errno = 0;
+			de = readdir(level->dir);
+			if (!de) {
+				if (errno && errno != ENOENT) {
+					warning_errno("error reading directory '%s'",
+						      iter->base.path.buf);
+					return -1;
+				}
+
+				break;
+			}
+
+			if (is_dot_or_dotdot(de->d_name))
+				continue;
+
+			string_list_append(&level->entries, de->d_name);
+		}
+		string_list_sort(&level->entries);
+
+		closedir(level->dir);
+		level->dir = NULL;
+	}
+
 	return 0;
 }
 
@@ -88,6 +125,7 @@ static int pop_level(struct dir_iterator_int *iter)
 		warning_errno("error closing directory '%s'",
 			      iter->base.path.buf);
 	level->dir = NULL;
+	string_list_clear(&level->entries, 0);
 
 	return --iter->levels_nr;
 }
@@ -136,30 +174,43 @@ int dir_iterator_advance(struct dir_iterator *dir_iterator)
 
 	/* Loop until we find an entry that we can give back to the caller. */
 	while (1) {
-		struct dirent *de;
 		struct dir_iterator_level *level =
 			&iter->levels[iter->levels_nr - 1];
+		struct dirent *de;
+		const char *name;
 
 		strbuf_setlen(&iter->base.path, level->prefix_len);
-		errno = 0;
-		de = readdir(level->dir);
-
-		if (!de) {
-			if (errno) {
-				warning_errno("error reading directory '%s'",
-					      iter->base.path.buf);
-				if (iter->flags & DIR_ITERATOR_PEDANTIC)
-					goto error_out;
-			} else if (pop_level(iter) == 0) {
-				return dir_iterator_abort(dir_iterator);
+
+		if (level->dir) {
+			errno = 0;
+			de = readdir(level->dir);
+			if (!de) {
+				if (errno) {
+					warning_errno("error reading directory '%s'",
+						      iter->base.path.buf);
+					if (iter->flags & DIR_ITERATOR_PEDANTIC)
+						goto error_out;
+				} else if (pop_level(iter) == 0) {
+					return dir_iterator_abort(dir_iterator);
+				}
+				continue;
 			}
-			continue;
-		}
 
-		if (is_dot_or_dotdot(de->d_name))
-			continue;
+			if (is_dot_or_dotdot(de->d_name))
+				continue;
 
-		if (prepare_next_entry_data(iter, de->d_name)) {
+			name = de->d_name;
+		} else {
+			if (level->entries_idx >= level->entries.nr) {
+				if (pop_level(iter) == 0)
+					return dir_iterator_abort(dir_iterator);
+				continue;
+			}
+
+			name = level->entries.items[level->entries_idx++].string;
+		}
+
+		if (prepare_next_entry_data(iter, name)) {
 			if (errno != ENOENT && iter->flags & DIR_ITERATOR_PEDANTIC)
 				goto error_out;
 			continue;
@@ -188,6 +239,8 @@ int dir_iterator_abort(struct dir_iterator *dir_iterator)
 			warning_errno("error closing directory '%s'",
 				      iter->base.path.buf);
 		}
+
+		string_list_clear(&level->entries, 0);
 	}
 
 	free(iter->levels);
diff --git a/dir-iterator.h b/dir-iterator.h
index 479e1ec784..6d438809b6 100644
--- a/dir-iterator.h
+++ b/dir-iterator.h
@@ -54,8 +54,11 @@
  *   and ITER_ERROR is returned immediately. In both cases, a meaningful
  *   warning is emitted. Note: ENOENT errors are always ignored so that
  *   the API users may remove files during iteration.
+ *
+ * - DIR_ITERATOR_SORTED: sort directory entries alphabetically.
  */
 #define DIR_ITERATOR_PEDANTIC (1 << 0)
+#define DIR_ITERATOR_SORTED   (1 << 1)
 
 struct dir_iterator {
 	/* The current path: */
-- 
2.44.0-rc1


[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

^ permalink raw reply related	[flat|nested] 39+ messages in thread

* [PATCH 3/6] refs/files: sort reflogs returned by the reflog iterator
  2024-02-19 14:35 [PATCH 0/6] reflog: introduce subcommand to list reflogs Patrick Steinhardt
  2024-02-19 14:35 ` [PATCH 1/6] dir-iterator: pass name to `prepare_next_entry_data()` directly Patrick Steinhardt
  2024-02-19 14:35 ` [PATCH 2/6] dir-iterator: support iteration in sorted order Patrick Steinhardt
@ 2024-02-19 14:35 ` Patrick Steinhardt
  2024-02-20  0:04   ` Junio C Hamano
  2024-02-19 14:35 ` [PATCH 4/6] refs: drop unused params from the reflog iterator callback Patrick Steinhardt
                   ` (4 subsequent siblings)
  7 siblings, 1 reply; 39+ messages in thread
From: Patrick Steinhardt @ 2024-02-19 14:35 UTC (permalink / raw)
  To: git

[-- Attachment #1: Type: text/plain, Size: 3433 bytes --]

We use a directory iterator to return reflogs via the reflog iterator.
This iterator returns entries in the same order as readdir(3P) would and
will thus yield reflogs with no discernible order.

Set the new `DIR_ITERATOR_SORTED` flag that was introduced in the
preceding commit so that the order is deterministic. While the effect of
this can only been observed in a test tool, a subsequent commit will
start to expose this functionality to users via a new `git reflog list`
subcommand.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
---
 refs/files-backend.c           | 4 ++--
 t/t0600-reffiles-backend.sh    | 4 ++--
 t/t1405-main-ref-store.sh      | 2 +-
 t/t1406-submodule-ref-store.sh | 2 +-
 4 files changed, 6 insertions(+), 6 deletions(-)

diff --git a/refs/files-backend.c b/refs/files-backend.c
index 75dcc21ecb..2ffc63185f 100644
--- a/refs/files-backend.c
+++ b/refs/files-backend.c
@@ -2193,7 +2193,7 @@ static struct ref_iterator *reflog_iterator_begin(struct ref_store *ref_store,
 
 	strbuf_addf(&sb, "%s/logs", gitdir);
 
-	diter = dir_iterator_begin(sb.buf, 0);
+	diter = dir_iterator_begin(sb.buf, DIR_ITERATOR_SORTED);
 	if (!diter) {
 		strbuf_release(&sb);
 		return empty_ref_iterator_begin();
@@ -2202,7 +2202,7 @@ static struct ref_iterator *reflog_iterator_begin(struct ref_store *ref_store,
 	CALLOC_ARRAY(iter, 1);
 	ref_iterator = &iter->base;
 
-	base_ref_iterator_init(ref_iterator, &files_reflog_iterator_vtable, 0);
+	base_ref_iterator_init(ref_iterator, &files_reflog_iterator_vtable, 1);
 	iter->dir_iterator = diter;
 	iter->ref_store = ref_store;
 	strbuf_release(&sb);
diff --git a/t/t0600-reffiles-backend.sh b/t/t0600-reffiles-backend.sh
index e6a5f1868f..4f860285cc 100755
--- a/t/t0600-reffiles-backend.sh
+++ b/t/t0600-reffiles-backend.sh
@@ -287,7 +287,7 @@ test_expect_success 'for_each_reflog()' '
 	mkdir -p     .git/worktrees/wt/logs/refs/bisect &&
 	echo $ZERO_OID > .git/worktrees/wt/logs/refs/bisect/wt-random &&
 
-	$RWT for-each-reflog | cut -d" " -f 2- | sort >actual &&
+	$RWT for-each-reflog | cut -d" " -f 2- >actual &&
 	cat >expected <<-\EOF &&
 	HEAD 0x1
 	PSEUDO-WT 0x0
@@ -297,7 +297,7 @@ test_expect_success 'for_each_reflog()' '
 	EOF
 	test_cmp expected actual &&
 
-	$RMAIN for-each-reflog | cut -d" " -f 2- | sort >actual &&
+	$RMAIN for-each-reflog | cut -d" " -f 2- >actual &&
 	cat >expected <<-\EOF &&
 	HEAD 0x1
 	PSEUDO-MAIN 0x0
diff --git a/t/t1405-main-ref-store.sh b/t/t1405-main-ref-store.sh
index 976bd71efb..cfb583f544 100755
--- a/t/t1405-main-ref-store.sh
+++ b/t/t1405-main-ref-store.sh
@@ -74,7 +74,7 @@ test_expect_success 'verify_ref(new-main)' '
 '
 
 test_expect_success 'for_each_reflog()' '
-	$RUN for-each-reflog | sort -k2 | cut -d" " -f 2- >actual &&
+	$RUN for-each-reflog | cut -d" " -f 2- >actual &&
 	cat >expected <<-\EOF &&
 	HEAD 0x1
 	refs/heads/main 0x0
diff --git a/t/t1406-submodule-ref-store.sh b/t/t1406-submodule-ref-store.sh
index e6a7f7334b..40332e23cc 100755
--- a/t/t1406-submodule-ref-store.sh
+++ b/t/t1406-submodule-ref-store.sh
@@ -63,7 +63,7 @@ test_expect_success 'verify_ref(new-main)' '
 '
 
 test_expect_success 'for_each_reflog()' '
-	$RUN for-each-reflog | sort | cut -d" " -f 2- >actual &&
+	$RUN for-each-reflog | cut -d" " -f 2- >actual &&
 	cat >expected <<-\EOF &&
 	HEAD 0x1
 	refs/heads/main 0x0
-- 
2.44.0-rc1


[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

^ permalink raw reply related	[flat|nested] 39+ messages in thread

* [PATCH 4/6] refs: drop unused params from the reflog iterator callback
  2024-02-19 14:35 [PATCH 0/6] reflog: introduce subcommand to list reflogs Patrick Steinhardt
                   ` (2 preceding siblings ...)
  2024-02-19 14:35 ` [PATCH 3/6] refs/files: sort reflogs returned by the reflog iterator Patrick Steinhardt
@ 2024-02-19 14:35 ` Patrick Steinhardt
  2024-02-20  0:14   ` Junio C Hamano
  2024-02-19 14:35 ` [PATCH 5/6] refs: stop resolving ref corresponding to reflogs Patrick Steinhardt
                   ` (3 subsequent siblings)
  7 siblings, 1 reply; 39+ messages in thread
From: Patrick Steinhardt @ 2024-02-19 14:35 UTC (permalink / raw)
  To: git

[-- Attachment #1: Type: text/plain, Size: 12618 bytes --]

The ref and reflog iterators share much of the same underlying code to
iterate over the corresponding entries. This results in some weird code
because the reflog iterator also exposes an object ID as well as a flag
to the callback function. Neither of these fields do refer to the reflog
though -- they refer to the corresponding ref with the same name. This
is quite misleading. In practice at least the object ID cannot really be
implemented in any other way as a reflog does not have a specific object
ID in the first place. This is further stressed by the fact that none of
the callbacks except for our test helper make use of these fields.

Split up the infrastucture so that ref and reflog iterators use separate
callback signatures. This allows us to drop the nonsensical fields from
the reflog iterator.

Note that internally, the backends still use the same shared infra to
iterate over both types. As the backends should never end up being
called directly anyway, this is not much of a problem and thus kept
as-is for simplicity's sake.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
---
 builtin/fsck.c                 |  4 +---
 builtin/reflog.c               |  3 +--
 refs.c                         | 23 +++++++++++++++++++----
 refs.h                         | 11 +++++++++--
 refs/files-backend.c           |  8 +-------
 refs/reftable-backend.c        |  8 +-------
 revision.c                     |  4 +---
 t/helper/test-ref-store.c      | 18 ++++++++++++------
 t/t0600-reffiles-backend.sh    | 24 ++++++++++++------------
 t/t1405-main-ref-store.sh      |  8 ++++----
 t/t1406-submodule-ref-store.sh |  8 ++++----
 11 files changed, 65 insertions(+), 54 deletions(-)

diff --git a/builtin/fsck.c b/builtin/fsck.c
index a7cf94f67e..f892487c9b 100644
--- a/builtin/fsck.c
+++ b/builtin/fsck.c
@@ -509,9 +509,7 @@ static int fsck_handle_reflog_ent(struct object_id *ooid, struct object_id *noid
 	return 0;
 }
 
-static int fsck_handle_reflog(const char *logname,
-			      const struct object_id *oid UNUSED,
-			      int flag UNUSED, void *cb_data)
+static int fsck_handle_reflog(const char *logname, void *cb_data)
 {
 	struct strbuf refname = STRBUF_INIT;
 
diff --git a/builtin/reflog.c b/builtin/reflog.c
index a5a4099f61..3a0c4d4322 100644
--- a/builtin/reflog.c
+++ b/builtin/reflog.c
@@ -60,8 +60,7 @@ struct worktree_reflogs {
 	struct string_list reflogs;
 };
 
-static int collect_reflog(const char *ref, const struct object_id *oid UNUSED,
-			  int flags UNUSED, void *cb_data)
+static int collect_reflog(const char *ref, void *cb_data)
 {
 	struct worktree_reflogs *cb = cb_data;
 	struct worktree *worktree = cb->worktree;
diff --git a/refs.c b/refs.c
index fff343c256..6d76000f13 100644
--- a/refs.c
+++ b/refs.c
@@ -2516,18 +2516,33 @@ int refs_verify_refname_available(struct ref_store *refs,
 	return ret;
 }
 
-int refs_for_each_reflog(struct ref_store *refs, each_ref_fn fn, void *cb_data)
+struct do_for_each_reflog_help {
+	each_reflog_fn *fn;
+	void *cb_data;
+};
+
+static int do_for_each_reflog_helper(struct repository *r UNUSED,
+				     const char *refname,
+				     const struct object_id *oid UNUSED,
+				     int flags,
+				     void *cb_data)
+{
+	struct do_for_each_reflog_help *hp = cb_data;
+	return hp->fn(refname, hp->cb_data);
+}
+
+int refs_for_each_reflog(struct ref_store *refs, each_reflog_fn fn, void *cb_data)
 {
 	struct ref_iterator *iter;
-	struct do_for_each_ref_help hp = { fn, cb_data };
+	struct do_for_each_reflog_help hp = { fn, cb_data };
 
 	iter = refs->be->reflog_iterator_begin(refs);
 
 	return do_for_each_repo_ref_iterator(the_repository, iter,
-					     do_for_each_ref_helper, &hp);
+					     do_for_each_reflog_helper, &hp);
 }
 
-int for_each_reflog(each_ref_fn fn, void *cb_data)
+int for_each_reflog(each_reflog_fn fn, void *cb_data)
 {
 	return refs_for_each_reflog(get_main_ref_store(the_repository), fn, cb_data);
 }
diff --git a/refs.h b/refs.h
index 303c5fac4d..895579aeb7 100644
--- a/refs.h
+++ b/refs.h
@@ -534,12 +534,19 @@ int for_each_reflog_ent(const char *refname, each_reflog_ent_fn fn, void *cb_dat
 /* youngest entry first */
 int for_each_reflog_ent_reverse(const char *refname, each_reflog_ent_fn fn, void *cb_data);
 
+/*
+ * The signature for the callback function for the {refs_,}for_each_reflog()
+ * functions below. The memory pointed to by the refname argument is only
+ * guaranteed to be valid for the duration of a single callback invocation.
+ */
+typedef int each_reflog_fn(const char *refname, void *cb_data);
+
 /*
  * Calls the specified function for each reflog file until it returns nonzero,
  * and returns the value. Reflog file order is unspecified.
  */
-int refs_for_each_reflog(struct ref_store *refs, each_ref_fn fn, void *cb_data);
-int for_each_reflog(each_ref_fn fn, void *cb_data);
+int refs_for_each_reflog(struct ref_store *refs, each_reflog_fn fn, void *cb_data);
+int for_each_reflog(each_reflog_fn fn, void *cb_data);
 
 #define REFNAME_ALLOW_ONELEVEL 1
 #define REFNAME_REFSPEC_PATTERN 2
diff --git a/refs/files-backend.c b/refs/files-backend.c
index 2ffc63185f..2b3c99b00d 100644
--- a/refs/files-backend.c
+++ b/refs/files-backend.c
@@ -2116,10 +2116,8 @@ static int files_for_each_reflog_ent(struct ref_store *ref_store,
 
 struct files_reflog_iterator {
 	struct ref_iterator base;
-
 	struct ref_store *ref_store;
 	struct dir_iterator *dir_iterator;
-	struct object_id oid;
 };
 
 static int files_reflog_iterator_advance(struct ref_iterator *ref_iterator)
@@ -2130,8 +2128,6 @@ static int files_reflog_iterator_advance(struct ref_iterator *ref_iterator)
 	int ok;
 
 	while ((ok = dir_iterator_advance(diter)) == ITER_OK) {
-		int flags;
-
 		if (!S_ISREG(diter->st.st_mode))
 			continue;
 		if (diter->basename[0] == '.')
@@ -2141,14 +2137,12 @@ static int files_reflog_iterator_advance(struct ref_iterator *ref_iterator)
 
 		if (!refs_resolve_ref_unsafe(iter->ref_store,
 					     diter->relative_path, 0,
-					     &iter->oid, &flags)) {
+					     NULL, NULL)) {
 			error("bad ref for %s", diter->path.buf);
 			continue;
 		}
 
 		iter->base.refname = diter->relative_path;
-		iter->base.oid = &iter->oid;
-		iter->base.flags = flags;
 		return ITER_OK;
 	}
 
diff --git a/refs/reftable-backend.c b/refs/reftable-backend.c
index a14f2ad7f4..889bb1f1ba 100644
--- a/refs/reftable-backend.c
+++ b/refs/reftable-backend.c
@@ -1637,7 +1637,6 @@ struct reftable_reflog_iterator {
 	struct reftable_ref_store *refs;
 	struct reftable_iterator iter;
 	struct reftable_log_record log;
-	struct object_id oid;
 	char *last_name;
 	int err;
 };
@@ -1648,8 +1647,6 @@ static int reftable_reflog_iterator_advance(struct ref_iterator *ref_iterator)
 		(struct reftable_reflog_iterator *)ref_iterator;
 
 	while (!iter->err) {
-		int flags;
-
 		iter->err = reftable_iterator_next_log(&iter->iter, &iter->log);
 		if (iter->err)
 			break;
@@ -1663,7 +1660,7 @@ static int reftable_reflog_iterator_advance(struct ref_iterator *ref_iterator)
 			continue;
 
 		if (!refs_resolve_ref_unsafe(&iter->refs->base, iter->log.refname,
-					     0, &iter->oid, &flags)) {
+					     0, NULL, NULL)) {
 			error(_("bad ref for %s"), iter->log.refname);
 			continue;
 		}
@@ -1671,8 +1668,6 @@ static int reftable_reflog_iterator_advance(struct ref_iterator *ref_iterator)
 		free(iter->last_name);
 		iter->last_name = xstrdup(iter->log.refname);
 		iter->base.refname = iter->log.refname;
-		iter->base.oid = &iter->oid;
-		iter->base.flags = flags;
 
 		break;
 	}
@@ -1725,7 +1720,6 @@ static struct reftable_reflog_iterator *reflog_iterator_for_stack(struct reftabl
 	iter = xcalloc(1, sizeof(*iter));
 	base_ref_iterator_init(&iter->base, &reftable_reflog_iterator_vtable, 1);
 	iter->refs = refs;
-	iter->base.oid = &iter->oid;
 
 	ret = refs->err;
 	if (ret)
diff --git a/revision.c b/revision.c
index 2424c9bd67..ac45c6d8f2 100644
--- a/revision.c
+++ b/revision.c
@@ -1686,9 +1686,7 @@ static int handle_one_reflog_ent(struct object_id *ooid, struct object_id *noid,
 	return 0;
 }
 
-static int handle_one_reflog(const char *refname_in_wt,
-			     const struct object_id *oid UNUSED,
-			     int flag UNUSED, void *cb_data)
+static int handle_one_reflog(const char *refname_in_wt, void *cb_data)
 {
 	struct all_refs_cb *cb = cb_data;
 	struct strbuf refname = STRBUF_INIT;
diff --git a/t/helper/test-ref-store.c b/t/helper/test-ref-store.c
index 702ec1f128..7a0f6cac53 100644
--- a/t/helper/test-ref-store.c
+++ b/t/helper/test-ref-store.c
@@ -221,15 +221,21 @@ static int cmd_verify_ref(struct ref_store *refs, const char **argv)
 	return ret;
 }
 
+static int each_reflog(const char *refname, void *cb_data UNUSED)
+{
+	printf("%s\n", refname);
+	return 0;
+}
+
 static int cmd_for_each_reflog(struct ref_store *refs,
 			       const char **argv UNUSED)
 {
-	return refs_for_each_reflog(refs, each_ref, NULL);
+	return refs_for_each_reflog(refs, each_reflog, NULL);
 }
 
-static int each_reflog(struct object_id *old_oid, struct object_id *new_oid,
-		       const char *committer, timestamp_t timestamp,
-		       int tz, const char *msg, void *cb_data UNUSED)
+static int each_reflog_ent(struct object_id *old_oid, struct object_id *new_oid,
+			   const char *committer, timestamp_t timestamp,
+			   int tz, const char *msg, void *cb_data UNUSED)
 {
 	printf("%s %s %s %" PRItime " %+05d%s%s", oid_to_hex(old_oid),
 	       oid_to_hex(new_oid), committer, timestamp, tz,
@@ -241,14 +247,14 @@ static int cmd_for_each_reflog_ent(struct ref_store *refs, const char **argv)
 {
 	const char *refname = notnull(*argv++, "refname");
 
-	return refs_for_each_reflog_ent(refs, refname, each_reflog, refs);
+	return refs_for_each_reflog_ent(refs, refname, each_reflog_ent, refs);
 }
 
 static int cmd_for_each_reflog_ent_reverse(struct ref_store *refs, const char **argv)
 {
 	const char *refname = notnull(*argv++, "refname");
 
-	return refs_for_each_reflog_ent_reverse(refs, refname, each_reflog, refs);
+	return refs_for_each_reflog_ent_reverse(refs, refname, each_reflog_ent, refs);
 }
 
 static int cmd_reflog_exists(struct ref_store *refs, const char **argv)
diff --git a/t/t0600-reffiles-backend.sh b/t/t0600-reffiles-backend.sh
index 4f860285cc..56a3196b83 100755
--- a/t/t0600-reffiles-backend.sh
+++ b/t/t0600-reffiles-backend.sh
@@ -287,23 +287,23 @@ test_expect_success 'for_each_reflog()' '
 	mkdir -p     .git/worktrees/wt/logs/refs/bisect &&
 	echo $ZERO_OID > .git/worktrees/wt/logs/refs/bisect/wt-random &&
 
-	$RWT for-each-reflog | cut -d" " -f 2- >actual &&
+	$RWT for-each-reflog >actual &&
 	cat >expected <<-\EOF &&
-	HEAD 0x1
-	PSEUDO-WT 0x0
-	refs/bisect/wt-random 0x0
-	refs/heads/main 0x0
-	refs/heads/wt-main 0x0
+	HEAD
+	PSEUDO-WT
+	refs/bisect/wt-random
+	refs/heads/main
+	refs/heads/wt-main
 	EOF
 	test_cmp expected actual &&
 
-	$RMAIN for-each-reflog | cut -d" " -f 2- >actual &&
+	$RMAIN for-each-reflog >actual &&
 	cat >expected <<-\EOF &&
-	HEAD 0x1
-	PSEUDO-MAIN 0x0
-	refs/bisect/random 0x0
-	refs/heads/main 0x0
-	refs/heads/wt-main 0x0
+	HEAD
+	PSEUDO-MAIN
+	refs/bisect/random
+	refs/heads/main
+	refs/heads/wt-main
 	EOF
 	test_cmp expected actual
 '
diff --git a/t/t1405-main-ref-store.sh b/t/t1405-main-ref-store.sh
index cfb583f544..3eee758bce 100755
--- a/t/t1405-main-ref-store.sh
+++ b/t/t1405-main-ref-store.sh
@@ -74,11 +74,11 @@ test_expect_success 'verify_ref(new-main)' '
 '
 
 test_expect_success 'for_each_reflog()' '
-	$RUN for-each-reflog | cut -d" " -f 2- >actual &&
+	$RUN for-each-reflog >actual &&
 	cat >expected <<-\EOF &&
-	HEAD 0x1
-	refs/heads/main 0x0
-	refs/heads/new-main 0x0
+	HEAD
+	refs/heads/main
+	refs/heads/new-main
 	EOF
 	test_cmp expected actual
 '
diff --git a/t/t1406-submodule-ref-store.sh b/t/t1406-submodule-ref-store.sh
index 40332e23cc..c01f0f14a1 100755
--- a/t/t1406-submodule-ref-store.sh
+++ b/t/t1406-submodule-ref-store.sh
@@ -63,11 +63,11 @@ test_expect_success 'verify_ref(new-main)' '
 '
 
 test_expect_success 'for_each_reflog()' '
-	$RUN for-each-reflog | cut -d" " -f 2- >actual &&
+	$RUN for-each-reflog >actual &&
 	cat >expected <<-\EOF &&
-	HEAD 0x1
-	refs/heads/main 0x0
-	refs/heads/new-main 0x0
+	HEAD
+	refs/heads/main
+	refs/heads/new-main
 	EOF
 	test_cmp expected actual
 '
-- 
2.44.0-rc1


[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

^ permalink raw reply related	[flat|nested] 39+ messages in thread

* [PATCH 5/6] refs: stop resolving ref corresponding to reflogs
  2024-02-19 14:35 [PATCH 0/6] reflog: introduce subcommand to list reflogs Patrick Steinhardt
                   ` (3 preceding siblings ...)
  2024-02-19 14:35 ` [PATCH 4/6] refs: drop unused params from the reflog iterator callback Patrick Steinhardt
@ 2024-02-19 14:35 ` Patrick Steinhardt
  2024-02-20  0:14   ` Junio C Hamano
  2024-02-19 14:35 ` [PATCH 6/6] builtin/reflog: introduce subcommand to list reflogs Patrick Steinhardt
                   ` (2 subsequent siblings)
  7 siblings, 1 reply; 39+ messages in thread
From: Patrick Steinhardt @ 2024-02-19 14:35 UTC (permalink / raw)
  To: git

[-- Attachment #1: Type: text/plain, Size: 3407 bytes --]

The reflog iterator tries to resolve the corresponding ref for every
reflog that it is about to yield. Historically, this was done due to
multiple reasons:

  - It ensures that the refname is safe because we end up calling
    `check_refname_format()`. Also, non-conformant refnames are skipped
    altogether.

  - The iterator used to yield the resolved object ID as well as its
    flags to the callback. This info was never used though, and the
    corresponding parameters were dropped in the preceding commit.

  - When a ref is corrupt then the reflog is not emitted at all.

We're about to introduce a new `git reflog list` subcommand that will
print all reflogs that the refdb knows about. Skipping over reflogs
whose refs are corrupted would be quite counterproductive in this case
as the user would have no way to learn about reflogs which may still
exist in their repository to help and rescue such a corrupted ref. Thus,
the only remaining reason for why we'd want to resolve the ref is to
verify its refname.

Refactor the code to call `check_refname_format()` directly instead of
trying to resolve the ref. This is significantly more efficient given
that we don't have to hit the object database anymore to list reflogs.
And second, it ensures that we end up showing reflogs of broken refs,
which will help to make the reflog more useful.

Note that this really only impacts the case where the corresponding ref
is corrupt. Reflogs for nonexistent refs would have been returned to the
caller beforehand already as we did not pass `RESOLVE_REF_READING` to
the function, and thus `refs_resolve_ref_unsafe()` would have returned
successfully in that case.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
---
 refs/files-backend.c    | 12 ++----------
 refs/reftable-backend.c |  6 ++----
 2 files changed, 4 insertions(+), 14 deletions(-)

diff --git a/refs/files-backend.c b/refs/files-backend.c
index 2b3c99b00d..741148087d 100644
--- a/refs/files-backend.c
+++ b/refs/files-backend.c
@@ -2130,17 +2130,9 @@ static int files_reflog_iterator_advance(struct ref_iterator *ref_iterator)
 	while ((ok = dir_iterator_advance(diter)) == ITER_OK) {
 		if (!S_ISREG(diter->st.st_mode))
 			continue;
-		if (diter->basename[0] == '.')
+		if (check_refname_format(diter->basename,
+					 REFNAME_ALLOW_ONELEVEL))
 			continue;
-		if (ends_with(diter->basename, ".lock"))
-			continue;
-
-		if (!refs_resolve_ref_unsafe(iter->ref_store,
-					     diter->relative_path, 0,
-					     NULL, NULL)) {
-			error("bad ref for %s", diter->path.buf);
-			continue;
-		}
 
 		iter->base.refname = diter->relative_path;
 		return ITER_OK;
diff --git a/refs/reftable-backend.c b/refs/reftable-backend.c
index 889bb1f1ba..efbbf23c72 100644
--- a/refs/reftable-backend.c
+++ b/refs/reftable-backend.c
@@ -1659,11 +1659,9 @@ static int reftable_reflog_iterator_advance(struct ref_iterator *ref_iterator)
 		if (iter->last_name && !strcmp(iter->log.refname, iter->last_name))
 			continue;
 
-		if (!refs_resolve_ref_unsafe(&iter->refs->base, iter->log.refname,
-					     0, NULL, NULL)) {
-			error(_("bad ref for %s"), iter->log.refname);
+		if (check_refname_format(iter->log.refname,
+					 REFNAME_ALLOW_ONELEVEL))
 			continue;
-		}
 
 		free(iter->last_name);
 		iter->last_name = xstrdup(iter->log.refname);
-- 
2.44.0-rc1


[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

^ permalink raw reply related	[flat|nested] 39+ messages in thread

* [PATCH 6/6] builtin/reflog: introduce subcommand to list reflogs
  2024-02-19 14:35 [PATCH 0/6] reflog: introduce subcommand to list reflogs Patrick Steinhardt
                   ` (4 preceding siblings ...)
  2024-02-19 14:35 ` [PATCH 5/6] refs: stop resolving ref corresponding to reflogs Patrick Steinhardt
@ 2024-02-19 14:35 ` Patrick Steinhardt
  2024-02-20  0:32   ` Junio C Hamano
  2024-02-20  9:06 ` [PATCH v2 0/7] reflog: " Patrick Steinhardt
  2024-02-21 12:37 ` [PATCH v3 0/8] " Patrick Steinhardt
  7 siblings, 1 reply; 39+ messages in thread
From: Patrick Steinhardt @ 2024-02-19 14:35 UTC (permalink / raw)
  To: git

[-- Attachment #1: Type: text/plain, Size: 6086 bytes --]

While the git-reflog(1) command has subcommands to show reflog entries
or check for reflog existence, it does not have any subcommands that
would allow the user to enumerate all existing reflogs. This makes it
quite hard to discover which reflogs a repository has. While this can
be worked around with the "files" backend by enumerating files in the
".git/logs" directory, users of the "reftable" backend don't enjoy such
a luxury.

Introduce a new subcommand `git reflog list` that lists all reflogs the
repository knows of to fill this gap.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
---
 Documentation/git-reflog.txt |  3 ++
 builtin/reflog.c             | 34 ++++++++++++++++++
 t/t1410-reflog.sh            | 69 ++++++++++++++++++++++++++++++++++++
 3 files changed, 106 insertions(+)

diff --git a/Documentation/git-reflog.txt b/Documentation/git-reflog.txt
index ec64cbff4c..a929c52982 100644
--- a/Documentation/git-reflog.txt
+++ b/Documentation/git-reflog.txt
@@ -10,6 +10,7 @@ SYNOPSIS
 --------
 [verse]
 'git reflog' [show] [<log-options>] [<ref>]
+'git reflog list'
 'git reflog expire' [--expire=<time>] [--expire-unreachable=<time>]
 	[--rewrite] [--updateref] [--stale-fix]
 	[--dry-run | -n] [--verbose] [--all [--single-worktree] | <refs>...]
@@ -39,6 +40,8 @@ actions, and in addition the `HEAD` reflog records branch switching.
 `git reflog show` is an alias for `git log -g --abbrev-commit
 --pretty=oneline`; see linkgit:git-log[1] for more information.
 
+The "list" subcommand lists all refs which have a corresponding reflog.
+
 The "expire" subcommand prunes older reflog entries. Entries older
 than `expire` time, or entries older than `expire-unreachable` time
 and not reachable from the current tip, are removed from the reflog.
diff --git a/builtin/reflog.c b/builtin/reflog.c
index 3a0c4d4322..63cd4d8b29 100644
--- a/builtin/reflog.c
+++ b/builtin/reflog.c
@@ -7,11 +7,15 @@
 #include "wildmatch.h"
 #include "worktree.h"
 #include "reflog.h"
+#include "refs.h"
 #include "parse-options.h"
 
 #define BUILTIN_REFLOG_SHOW_USAGE \
 	N_("git reflog [show] [<log-options>] [<ref>]")
 
+#define BUILTIN_REFLOG_LIST_USAGE \
+	N_("git reflog list")
+
 #define BUILTIN_REFLOG_EXPIRE_USAGE \
 	N_("git reflog expire [--expire=<time>] [--expire-unreachable=<time>]\n" \
 	   "                  [--rewrite] [--updateref] [--stale-fix]\n" \
@@ -29,6 +33,11 @@ static const char *const reflog_show_usage[] = {
 	NULL,
 };
 
+static const char *const reflog_list_usage[] = {
+	BUILTIN_REFLOG_LIST_USAGE,
+	NULL,
+};
+
 static const char *const reflog_expire_usage[] = {
 	BUILTIN_REFLOG_EXPIRE_USAGE,
 	NULL
@@ -46,6 +55,7 @@ static const char *const reflog_exists_usage[] = {
 
 static const char *const reflog_usage[] = {
 	BUILTIN_REFLOG_SHOW_USAGE,
+	BUILTIN_REFLOG_LIST_USAGE,
 	BUILTIN_REFLOG_EXPIRE_USAGE,
 	BUILTIN_REFLOG_DELETE_USAGE,
 	BUILTIN_REFLOG_EXISTS_USAGE,
@@ -238,6 +248,29 @@ static int cmd_reflog_show(int argc, const char **argv, const char *prefix)
 	return cmd_log_reflog(argc, argv, prefix);
 }
 
+static int show_reflog(const char *refname, void *cb_data UNUSED)
+{
+	printf("%s\n", refname);
+	return 0;
+}
+
+static int cmd_reflog_list(int argc, const char **argv, const char *prefix)
+{
+	struct option options[] = {
+		OPT_END()
+	};
+	struct ref_store *ref_store;
+
+	argc = parse_options(argc, argv, prefix, options, reflog_list_usage, 0);
+	if (argc)
+		return error(_("%s does not accept arguments: '%s'"),
+			     "list", argv[0]);
+
+	ref_store = get_main_ref_store(the_repository);
+
+	return refs_for_each_reflog(ref_store, show_reflog, NULL);
+}
+
 static int cmd_reflog_expire(int argc, const char **argv, const char *prefix)
 {
 	struct cmd_reflog_expire_cb cmd = { 0 };
@@ -417,6 +450,7 @@ int cmd_reflog(int argc, const char **argv, const char *prefix)
 	parse_opt_subcommand_fn *fn = NULL;
 	struct option options[] = {
 		OPT_SUBCOMMAND("show", &fn, cmd_reflog_show),
+		OPT_SUBCOMMAND("list", &fn, cmd_reflog_list),
 		OPT_SUBCOMMAND("expire", &fn, cmd_reflog_expire),
 		OPT_SUBCOMMAND("delete", &fn, cmd_reflog_delete),
 		OPT_SUBCOMMAND("exists", &fn, cmd_reflog_exists),
diff --git a/t/t1410-reflog.sh b/t/t1410-reflog.sh
index d2f5f42e67..6d8d5a253d 100755
--- a/t/t1410-reflog.sh
+++ b/t/t1410-reflog.sh
@@ -436,4 +436,73 @@ test_expect_success 'empty reflog' '
 	test_must_be_empty err
 '
 
+test_expect_success 'list reflogs' '
+	test_when_finished "rm -rf repo" &&
+	git init repo &&
+	(
+		cd repo &&
+		git reflog list >actual &&
+		test_must_be_empty actual &&
+
+		test_commit A &&
+		cat >expect <<-EOF &&
+		HEAD
+		refs/heads/main
+		EOF
+		git reflog list >actual &&
+		test_cmp expect actual &&
+
+		git branch b &&
+		cat >expect <<-EOF &&
+		HEAD
+		refs/heads/b
+		refs/heads/main
+		EOF
+		git reflog list >actual &&
+		test_cmp expect actual
+	)
+'
+
+test_expect_success 'reflog list returns error with additional args' '
+	cat >expect <<-EOF &&
+	error: list does not accept arguments: ${SQ}bogus${SQ}
+	EOF
+	test_must_fail git reflog list bogus 2>err &&
+	test_cmp expect err
+'
+
+test_expect_success 'reflog for symref with unborn target can be listed' '
+	test_when_finished "rm -rf repo" &&
+	git init repo &&
+	(
+		cd repo &&
+		test_commit A &&
+		git symbolic-ref HEAD refs/heads/unborn &&
+		cat >expect <<-EOF &&
+		HEAD
+		refs/heads/main
+		EOF
+		git reflog list >actual &&
+		test_cmp expect actual
+	)
+'
+
+test_expect_success 'reflog with invalid object ID can be listed' '
+	test_when_finished "rm -rf repo" &&
+	git init repo &&
+	(
+		cd repo &&
+		test_commit A &&
+		test-tool ref-store main update-ref msg refs/heads/missing \
+			$(test_oid deadbeef) "$ZERO_OID" REF_SKIP_OID_VERIFICATION &&
+		cat >expect <<-EOF &&
+		HEAD
+		refs/heads/main
+		refs/heads/missing
+		EOF
+		git reflog list >actual &&
+		test_cmp expect actual
+	)
+'
+
 test_done
-- 
2.44.0-rc1


[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

^ permalink raw reply related	[flat|nested] 39+ messages in thread

* Re: [PATCH 2/6] dir-iterator: support iteration in sorted order
  2024-02-19 14:35 ` [PATCH 2/6] dir-iterator: support iteration in sorted order Patrick Steinhardt
@ 2024-02-19 23:39   ` Junio C Hamano
  2024-02-20  8:34     ` Patrick Steinhardt
  0 siblings, 1 reply; 39+ messages in thread
From: Junio C Hamano @ 2024-02-19 23:39 UTC (permalink / raw)
  To: Patrick Steinhardt; +Cc: git

Patrick Steinhardt <ps@pks.im> writes:

> The `struct dir_iterator` is a helper that allows us to iterate through
> directory entries. This iterator returns entries in the exact same order
> as readdir(3P) does -- or in other words, it guarantees no specific
> order at all.
>
> This is about to become problematic as we are introducing a new reflog
> subcommand to list reflogs. As the "files" backend uses the directory
> iterator to enumerate reflogs, returning reflog names and exposing them
> to the user would inherit the indeterministic ordering. Naturally, it
> would make for a terrible user interface to show a list with no
> discernible order. While this could be handled at a higher level by the
> new subcommand itself by collecting and ordering the reflogs, this would
> be inefficient and introduce latency when there are many reflogs.

I do not quite understand this argument.  Why is sorting at higher
level less (or more, for that matter) efficient than doing so at
lower level?  We'd need to sort somewhere no matter what, and I of
course have no problem in listing in a deterministic order.

> Instead, introduce a new option into the directory iterator that asks
> for its entries to be yielded in lexicographical order. If set, the
> iterator will read all directory entries greedily end sort them before
> we start to iterate over them.

"end" -> "and".  And of course without such sorting option, this
codepath is allowed to yield entries in any order that is the
easiest to produce?  That makes sense.

> While this will of course also incur overhead as we cannot yield the
> directory entries immediately, it should at least be more efficient than
> having to sort the complete list of reflogs as we only need to sort one
> directory at a time.

True.  The initial latency before we see the first byte of the
output often matters more in perceived performance the throughput.
As we need to sort to give a reasonable output, that cannot be
avoided.

> This functionality will be used in a follow-up commit.
>
> Signed-off-by: Patrick Steinhardt <ps@pks.im>
> ---
>  dir-iterator.c | 87 ++++++++++++++++++++++++++++++++++++++++----------
>  dir-iterator.h |  3 ++
>  2 files changed, 73 insertions(+), 17 deletions(-)
>
> diff --git a/dir-iterator.c b/dir-iterator.c
> index f58a97e089..396c28178f 100644
> --- a/dir-iterator.c
> +++ b/dir-iterator.c
> @@ -2,9 +2,12 @@
>  #include "dir.h"
>  #include "iterator.h"
>  #include "dir-iterator.h"
> +#include "string-list.h"
>  
>  struct dir_iterator_level {
>  	DIR *dir;
> +	struct string_list entries;
> +	size_t entries_idx;

Does it deserve a comment that "dir == NULL" is used as a signal
that we have read the level and sorted its contents into the
"entries" list (and also we have already called closedir(), of
course)?

> @@ -72,6 +75,40 @@ static int push_level(struct dir_iterator_int *iter)
>  		return -1;
>  	}
>  
> +	string_list_init_dup(&level->entries);
> +	level->entries_idx = 0;
> +
> +	/*
> +	 * When the iterator is sorted we read and sort all directory entries
> +	 * directly.
> +	 */
> +	if (iter->flags & DIR_ITERATOR_SORTED) {
> +		while (1) {
> +			struct dirent *de;
> +
> +			errno = 0;
> +			de = readdir(level->dir);
> +			if (!de) {
> +				if (errno && errno != ENOENT) {
> +					warning_errno("error reading directory '%s'",
> +						      iter->base.path.buf);
> +					return -1;
> +				}
> +
> +				break;
> +			}
> +
> +			if (is_dot_or_dotdot(de->d_name))
> +				continue;

The condition to skip an entry currently is simple enough that "."
and ".." are the only ones that are skipped, but it must be kept in
sync with the condition in dir_iterator_advance().

If it becomes more complex than it is now (e.g., we may start to
skip any name that begins with a dot, like ".git" or ".dummy"), it
probably is a good idea *not* to add the same filtering logic here
and in dir_iterator_advance().  Instead, keep the filtering here to
an absolute minumum, and filter the name, whether it came from
readdir() or from the .entries string list, in a single copy of
filtering logic in dir_iterator_advance() function.

We could drop the dot-or-dotdot filter here, too, if we want to
ensure that unified filtering will be correctly done over there.

> +			string_list_append(&level->entries, de->d_name);
> +		}
> +		string_list_sort(&level->entries);
> +
> +		closedir(level->dir);
> +		level->dir = NULL;
> +	}
> +
>  	return 0;
>  }
>  
> @@ -88,6 +125,7 @@ static int pop_level(struct dir_iterator_int *iter)
>  		warning_errno("error closing directory '%s'",
>  			      iter->base.path.buf);
>  	level->dir = NULL;
> +	string_list_clear(&level->entries, 0);
>  
>  	return --iter->levels_nr;
>  }

It is somewhat interesting that the original code already has
conditional call to closedir() and prepares .dir to be NULL,
so that we do not have to make it conditional here.

> @@ -136,30 +174,43 @@ int dir_iterator_advance(struct dir_iterator *dir_iterator)
>  
>  	/* Loop until we find an entry that we can give back to the caller. */
>  	while (1) {
> -		struct dirent *de;
>  		struct dir_iterator_level *level =
>  			&iter->levels[iter->levels_nr - 1];
> +		struct dirent *de;
> +		const char *name;

Not a huge deal but this is an unnecessary reordering, right?

>  		strbuf_setlen(&iter->base.path, level->prefix_len);
> +
> +		if (level->dir) {
> +			errno = 0;
> +			de = readdir(level->dir);
> +			if (!de) {
> +				if (errno) {
> +					warning_errno("error reading directory '%s'",
> +						      iter->base.path.buf);
> +					if (iter->flags & DIR_ITERATOR_PEDANTIC)
> +						goto error_out;
> +				} else if (pop_level(iter) == 0) {
> +					return dir_iterator_abort(dir_iterator);
> +				}
> +				continue;
>  			}
>  
> +			if (is_dot_or_dotdot(de->d_name))
> +				continue;

This is the target of the "if we will end up filtering even more in
the future, it would probably be a good idea not to duplicate the
logic to decide what gets filtered in this function and in
push_level()" comment.  If we wanted to go that route, we can get
rid of the filtering from push_level(), and move this filter code
outside this if/else before calling prepare_next_entry_data().

The fact that .entries.nr represents the number of entries that are
shown is unusable (because there is an unsorted codepath that does
not even populate .entries), so I am not worried about correctness
gotchas caused by including names in .entries to be filtered out.
But an obvious downside is that the size of the list to be sorted
will become larger.

Or we could introduce a shared helper function that takes a name and
decides if it is to be included, and replace the is_dot_or_dotdot()
call here and in the push_level() with calls to that helper.

In any case, that is primarily a maintainability issue.  The code
posted as-is is correct.

> +			name = de->d_name;
> +		} else {
> +			if (level->entries_idx >= level->entries.nr) {
> +				if (pop_level(iter) == 0)
> +					return dir_iterator_abort(dir_iterator);
> +				continue;
> +			}
> +
> +			name = level->entries.items[level->entries_idx++].string;
> +		}
> +
> +		if (prepare_next_entry_data(iter, name)) {
>  			if (errno != ENOENT && iter->flags & DIR_ITERATOR_PEDANTIC)
>  				goto error_out;
>  			continue;
> @@ -188,6 +239,8 @@ int dir_iterator_abort(struct dir_iterator *dir_iterator)
>  			warning_errno("error closing directory '%s'",
>  				      iter->base.path.buf);
>  		}
> +
> +		string_list_clear(&level->entries, 0);
>  	}
>  
>  	free(iter->levels);
> diff --git a/dir-iterator.h b/dir-iterator.h
> index 479e1ec784..6d438809b6 100644
> --- a/dir-iterator.h
> +++ b/dir-iterator.h
> @@ -54,8 +54,11 @@
>   *   and ITER_ERROR is returned immediately. In both cases, a meaningful
>   *   warning is emitted. Note: ENOENT errors are always ignored so that
>   *   the API users may remove files during iteration.
> + *
> + * - DIR_ITERATOR_SORTED: sort directory entries alphabetically.
>   */
>  #define DIR_ITERATOR_PEDANTIC (1 << 0)
> +#define DIR_ITERATOR_SORTED   (1 << 1)
>  
>  struct dir_iterator {
>  	/* The current path: */

^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [PATCH 3/6] refs/files: sort reflogs returned by the reflog iterator
  2024-02-19 14:35 ` [PATCH 3/6] refs/files: sort reflogs returned by the reflog iterator Patrick Steinhardt
@ 2024-02-20  0:04   ` Junio C Hamano
  2024-02-20  8:34     ` Patrick Steinhardt
  0 siblings, 1 reply; 39+ messages in thread
From: Junio C Hamano @ 2024-02-20  0:04 UTC (permalink / raw)
  To: Patrick Steinhardt; +Cc: git

Patrick Steinhardt <ps@pks.im> writes:

> We use a directory iterator to return reflogs via the reflog iterator.
> This iterator returns entries in the same order as readdir(3P) would and
> will thus yield reflogs with no discernible order.
>
> Set the new `DIR_ITERATOR_SORTED` flag that was introduced in the
> preceding commit so that the order is deterministic. While the effect of
> this can only been observed in a test tool, a subsequent commit will
> start to expose this functionality to users via a new `git reflog list`
> subcommand.
>
> Signed-off-by: Patrick Steinhardt <ps@pks.im>
> ---
>  refs/files-backend.c           | 4 ++--
>  t/t0600-reffiles-backend.sh    | 4 ++--
>  t/t1405-main-ref-store.sh      | 2 +-
>  t/t1406-submodule-ref-store.sh | 2 +-
>  4 files changed, 6 insertions(+), 6 deletions(-)
>
> diff --git a/refs/files-backend.c b/refs/files-backend.c
> index 75dcc21ecb..2ffc63185f 100644
> --- a/refs/files-backend.c
> +++ b/refs/files-backend.c
> @@ -2193,7 +2193,7 @@ static struct ref_iterator *reflog_iterator_begin(struct ref_store *ref_store,
>  
>  	strbuf_addf(&sb, "%s/logs", gitdir);
>  
> -	diter = dir_iterator_begin(sb.buf, 0);
> +	diter = dir_iterator_begin(sb.buf, DIR_ITERATOR_SORTED);
>  	if (!diter) {
>  		strbuf_release(&sb);
>  		return empty_ref_iterator_begin();
> @@ -2202,7 +2202,7 @@ static struct ref_iterator *reflog_iterator_begin(struct ref_store *ref_store,
>  	CALLOC_ARRAY(iter, 1);
>  	ref_iterator = &iter->base;
>  
> -	base_ref_iterator_init(ref_iterator, &files_reflog_iterator_vtable, 0);
> +	base_ref_iterator_init(ref_iterator, &files_reflog_iterator_vtable, 1);

This caught my attention.  Once we apply this patch, the only way
base_ref_iterator_init() can receive 0 for its last parameter
(i.e. 'ordered') is via the merge_ref_iterator_begin() call in
files_reflog_iterator_begin() that passes 0 as 'ordered'.  If we
force files_reflog_iterator_begin() to ask for an ordered
merge_ref_iterator, then we will have no unordered ref iterators.
Am I reading the code right?

^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [PATCH 4/6] refs: drop unused params from the reflog iterator callback
  2024-02-19 14:35 ` [PATCH 4/6] refs: drop unused params from the reflog iterator callback Patrick Steinhardt
@ 2024-02-20  0:14   ` Junio C Hamano
  2024-02-20  8:34     ` Patrick Steinhardt
  0 siblings, 1 reply; 39+ messages in thread
From: Junio C Hamano @ 2024-02-20  0:14 UTC (permalink / raw)
  To: Patrick Steinhardt; +Cc: git

Patrick Steinhardt <ps@pks.im> writes:

> The ref and reflog iterators share much of the same underlying code to
> iterate over the corresponding entries. This results in some weird code
> because the reflog iterator also exposes an object ID as well as a flag
> to the callback function. Neither of these fields do refer to the reflog
> though -- they refer to the corresponding ref with the same name. This
> is quite misleading. In practice at least the object ID cannot really be
> implemented in any other way as a reflog does not have a specific object
> ID in the first place. This is further stressed by the fact that none of
> the callbacks except for our test helper make use of these fields.

Interesting observation.  Of course this will make the callstack
longer by another level of indirection ...

> +struct do_for_each_reflog_help {
> +	each_reflog_fn *fn;
> +	void *cb_data;
> +};
> +
> +static int do_for_each_reflog_helper(struct repository *r UNUSED,
> +				     const char *refname,
> +				     const struct object_id *oid UNUSED,
> +				     int flags,
> +				     void *cb_data)
> +{
> +	struct do_for_each_reflog_help *hp = cb_data;
> +	return hp->fn(refname, hp->cb_data);
> +}

... but I think it would be worth it.

> +/*
> + * The signature for the callback function for the {refs_,}for_each_reflog()
> + * functions below. The memory pointed to by the refname argument is only
> + * guaranteed to be valid for the duration of a single callback invocation.
> + */
> +typedef int each_reflog_fn(const char *refname, void *cb_data);
> +
>  /*
>   * Calls the specified function for each reflog file until it returns nonzero,
>   * and returns the value. Reflog file order is unspecified.
>   */
> -int refs_for_each_reflog(struct ref_store *refs, each_ref_fn fn, void *cb_data);
> -int for_each_reflog(each_ref_fn fn, void *cb_data);
> +int refs_for_each_reflog(struct ref_store *refs, each_reflog_fn fn, void *cb_data);
> +int for_each_reflog(each_reflog_fn fn, void *cb_data);

Nice simplification.

^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [PATCH 5/6] refs: stop resolving ref corresponding to reflogs
  2024-02-19 14:35 ` [PATCH 5/6] refs: stop resolving ref corresponding to reflogs Patrick Steinhardt
@ 2024-02-20  0:14   ` Junio C Hamano
  2024-02-20  8:34     ` Patrick Steinhardt
  0 siblings, 1 reply; 39+ messages in thread
From: Junio C Hamano @ 2024-02-20  0:14 UTC (permalink / raw)
  To: Patrick Steinhardt; +Cc: git

Patrick Steinhardt <ps@pks.im> writes:

> Refactor the code to call `check_refname_format()` directly instead of
> trying to resolve the ref. This is significantly more efficient given
> that we don't have to hit the object database anymore to list reflogs.
> And second, it ensures that we end up showing reflogs of broken refs,
> which will help to make the reflog more useful.

And the user would notice corrupt ones among those reflogs listed
when using "rev-list -g" on the reflog anyway?  Which sounds like a
sensible thing to do.

> Note that this really only impacts the case where the corresponding ref
> is corrupt. Reflogs for nonexistent refs would have been returned to the
> caller beforehand already as we did not pass `RESOLVE_REF_READING` to
> the function, and thus `refs_resolve_ref_unsafe()` would have returned
> successfully in that case.

What do "Reflogs for nonexistent refs" really mean?  With the files
backend, if "git branch -d main" that removed the "main" branch
somehow forgot to remove the ".git/logs/refs/heads/main" file, the
reflog entries in such a file is for nonexistent ref.  Is that what
you meant?  As a tool to help diagnosing and correcting minor repo
breakages, finding such a leftover file that should not exist is a
good idea, I would think.

Would we see missing reflog for a ref that exists in the iteration?
I guess we shouldn't, as the reflog iterator that recursively
enumerates files under "$GIT_DIR/logs/" would not see such a missing
reflog by definition.

> diff --git a/refs/files-backend.c b/refs/files-backend.c
> index 2b3c99b00d..741148087d 100644
> --- a/refs/files-backend.c
> +++ b/refs/files-backend.c
> @@ -2130,17 +2130,9 @@ static int files_reflog_iterator_advance(struct ref_iterator *ref_iterator)
>  	while ((ok = dir_iterator_advance(diter)) == ITER_OK) {
>  		if (!S_ISREG(diter->st.st_mode))
>  			continue;
> -		if (diter->basename[0] == '.')
> +		if (check_refname_format(diter->basename,
> +					 REFNAME_ALLOW_ONELEVEL))
>  			continue;

A tangent.

I've never liked the code arrangement in the check_refname_format()
that assumes that each level can be separately checked with exactly
the same logic, and the only thing ALLOW_ONELEVEL does is to include
pseudorefs and HEAD; this makes such assumption even more ingrained.
I am not sure what to think about it, but let's keep reading.

> -		if (ends_with(diter->basename, ".lock"))
> -			continue;

This can safely go, as it is rejected by check_refname_format().

> -		if (!refs_resolve_ref_unsafe(iter->ref_store,
> -					     diter->relative_path, 0,
> -					     NULL, NULL)) {
> -			error("bad ref for %s", diter->path.buf);
> -			continue;
> -		}

This is the focus of this step in the series.  We did not abort the
iteration before, but now we no longer issue any error message.

>  		iter->base.refname = diter->relative_path;
>  		return ITER_OK;
> diff --git a/refs/reftable-backend.c b/refs/reftable-backend.c
> index 889bb1f1ba..efbbf23c72 100644
> --- a/refs/reftable-backend.c
> +++ b/refs/reftable-backend.c
> @@ -1659,11 +1659,9 @@ static int reftable_reflog_iterator_advance(struct ref_iterator *ref_iterator)
>  		if (iter->last_name && !strcmp(iter->log.refname, iter->last_name))
>  			continue;
>  
> -		if (!refs_resolve_ref_unsafe(&iter->refs->base, iter->log.refname,
> -					     0, NULL, NULL)) {
> -			error(_("bad ref for %s"), iter->log.refname);
> +		if (check_refname_format(iter->log.refname,
> +					 REFNAME_ALLOW_ONELEVEL))
>  			continue;
> -		}

This side is much more straight-forward.  Looking good.

>  
>  		free(iter->last_name);
>  		iter->last_name = xstrdup(iter->log.refname);

^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [PATCH 6/6] builtin/reflog: introduce subcommand to list reflogs
  2024-02-19 14:35 ` [PATCH 6/6] builtin/reflog: introduce subcommand to list reflogs Patrick Steinhardt
@ 2024-02-20  0:32   ` Junio C Hamano
  2024-02-20  8:34     ` Patrick Steinhardt
  0 siblings, 1 reply; 39+ messages in thread
From: Junio C Hamano @ 2024-02-20  0:32 UTC (permalink / raw)
  To: Patrick Steinhardt; +Cc: git

Patrick Steinhardt <ps@pks.im> writes:

> diff --git a/t/t1410-reflog.sh b/t/t1410-reflog.sh
> index d2f5f42e67..6d8d5a253d 100755
> --- a/t/t1410-reflog.sh
> +++ b/t/t1410-reflog.sh
> @@ -436,4 +436,73 @@ test_expect_success 'empty reflog' '
>  	test_must_be_empty err
>  '
>  
> +test_expect_success 'list reflogs' '
> +	test_when_finished "rm -rf repo" &&
> +	git init repo &&
> +	(
> +		cd repo &&
> +		git reflog list >actual &&
> +		test_must_be_empty actual &&
> +
> +		test_commit A &&
> +		cat >expect <<-EOF &&
> +		HEAD
> +		refs/heads/main
> +		EOF
> +		git reflog list >actual &&
> +		test_cmp expect actual &&
> +
> +		git branch b &&
> +		cat >expect <<-EOF &&
> +		HEAD
> +		refs/heads/b
> +		refs/heads/main
> +		EOF
> +		git reflog list >actual &&
> +		test_cmp expect actual
> +	)
> +'

OK.  This is a quite boring baseline.

> +test_expect_success 'reflog list returns error with additional args' '
> +	cat >expect <<-EOF &&
> +	error: list does not accept arguments: ${SQ}bogus${SQ}
> +	EOF
> +	test_must_fail git reflog list bogus 2>err &&
> +	test_cmp expect err
> +'

Makes sense.

> +test_expect_success 'reflog for symref with unborn target can be listed' '
> +	test_when_finished "rm -rf repo" &&
> +	git init repo &&
> +	(
> +		cd repo &&
> +		test_commit A &&
> +		git symbolic-ref HEAD refs/heads/unborn &&
> +		cat >expect <<-EOF &&
> +		HEAD
> +		refs/heads/main
> +		EOF
> +		git reflog list >actual &&
> +		test_cmp expect actual
> +	)
> +'

Should this be under REFFILES?  Ah, no, "git symbolic-ref" is valid
under reftable as well, so there is no need to.

Without [5/6], would it have failed to show the reflog for HEAD?

> +test_expect_success 'reflog with invalid object ID can be listed' '
> +	test_when_finished "rm -rf repo" &&
> +	git init repo &&
> +	(
> +		cd repo &&
> +		test_commit A &&
> +		test-tool ref-store main update-ref msg refs/heads/missing \
> +			$(test_oid deadbeef) "$ZERO_OID" REF_SKIP_OID_VERIFICATION &&
> +		cat >expect <<-EOF &&
> +		HEAD
> +		refs/heads/main
> +		refs/heads/missing
> +		EOF
> +		git reflog list >actual &&
> +		test_cmp expect actual
> +	)
> +'

OK.

>  test_done

It would have been "interesting" to see an example of "there is a
reflog but the underlying ref for it is missing" case, but I think
that falls into a minor repository corruption category, so lack of
such a test is also fine.


^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [PATCH 2/6] dir-iterator: support iteration in sorted order
  2024-02-19 23:39   ` Junio C Hamano
@ 2024-02-20  8:34     ` Patrick Steinhardt
  0 siblings, 0 replies; 39+ messages in thread
From: Patrick Steinhardt @ 2024-02-20  8:34 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: git

[-- Attachment #1: Type: text/plain, Size: 9982 bytes --]

On Mon, Feb 19, 2024 at 03:39:08PM -0800, Junio C Hamano wrote:
> Patrick Steinhardt <ps@pks.im> writes:
> 
> > The `struct dir_iterator` is a helper that allows us to iterate through
> > directory entries. This iterator returns entries in the exact same order
> > as readdir(3P) does -- or in other words, it guarantees no specific
> > order at all.
> >
> > This is about to become problematic as we are introducing a new reflog
> > subcommand to list reflogs. As the "files" backend uses the directory
> > iterator to enumerate reflogs, returning reflog names and exposing them
> > to the user would inherit the indeterministic ordering. Naturally, it
> > would make for a terrible user interface to show a list with no
> > discernible order. While this could be handled at a higher level by the
> > new subcommand itself by collecting and ordering the reflogs, this would
> > be inefficient and introduce latency when there are many reflogs.
> 
> I do not quite understand this argument.  Why is sorting at higher
> level less (or more, for that matter) efficient than doing so at
> lower level?  We'd need to sort somewhere no matter what, and I of
> course have no problem in listing in a deterministic order.

By sorting at a lower level we only need to sort the respective
directory entries and can then return them without having to recurse
into all subdirectories yet. Sorting at a higher level would require us
to first collect _all_ reflogs and then sort them.

Will rephrase a bit.

> > Instead, introduce a new option into the directory iterator that asks
> > for its entries to be yielded in lexicographical order. If set, the
> > iterator will read all directory entries greedily end sort them before
> > we start to iterate over them.
> 
> "end" -> "and".  And of course without such sorting option, this
> codepath is allowed to yield entries in any order that is the
> easiest to produce?  That makes sense.
> 
> > While this will of course also incur overhead as we cannot yield the
> > directory entries immediately, it should at least be more efficient than
> > having to sort the complete list of reflogs as we only need to sort one
> > directory at a time.
> 
> True.  The initial latency before we see the first byte of the
> output often matters more in perceived performance the throughput.
> As we need to sort to give a reasonable output, that cannot be
> avoided.
> 
> > This functionality will be used in a follow-up commit.
> >
> > Signed-off-by: Patrick Steinhardt <ps@pks.im>
> > ---
> >  dir-iterator.c | 87 ++++++++++++++++++++++++++++++++++++++++----------
> >  dir-iterator.h |  3 ++
> >  2 files changed, 73 insertions(+), 17 deletions(-)
> >
> > diff --git a/dir-iterator.c b/dir-iterator.c
> > index f58a97e089..396c28178f 100644
> > --- a/dir-iterator.c
> > +++ b/dir-iterator.c
> > @@ -2,9 +2,12 @@
> >  #include "dir.h"
> >  #include "iterator.h"
> >  #include "dir-iterator.h"
> > +#include "string-list.h"
> >  
> >  struct dir_iterator_level {
> >  	DIR *dir;
> > +	struct string_list entries;
> > +	size_t entries_idx;
> 
> Does it deserve a comment that "dir == NULL" is used as a signal
> that we have read the level and sorted its contents into the
> "entries" list (and also we have already called closedir(), of
> course)?

Yeah, probably.

> > @@ -72,6 +75,40 @@ static int push_level(struct dir_iterator_int *iter)
> >  		return -1;
> >  	}
> >  
> > +	string_list_init_dup(&level->entries);
> > +	level->entries_idx = 0;
> > +
> > +	/*
> > +	 * When the iterator is sorted we read and sort all directory entries
> > +	 * directly.
> > +	 */
> > +	if (iter->flags & DIR_ITERATOR_SORTED) {
> > +		while (1) {
> > +			struct dirent *de;
> > +
> > +			errno = 0;
> > +			de = readdir(level->dir);
> > +			if (!de) {
> > +				if (errno && errno != ENOENT) {
> > +					warning_errno("error reading directory '%s'",
> > +						      iter->base.path.buf);
> > +					return -1;
> > +				}
> > +
> > +				break;
> > +			}
> > +
> > +			if (is_dot_or_dotdot(de->d_name))
> > +				continue;
> 
> The condition to skip an entry currently is simple enough that "."
> and ".." are the only ones that are skipped, but it must be kept in
> sync with the condition in dir_iterator_advance().
> 
> If it becomes more complex than it is now (e.g., we may start to
> skip any name that begins with a dot, like ".git" or ".dummy"), it
> probably is a good idea *not* to add the same filtering logic here
> and in dir_iterator_advance().  Instead, keep the filtering here to
> an absolute minumum, and filter the name, whether it came from
> readdir() or from the .entries string list, in a single copy of
> filtering logic in dir_iterator_advance() function.
> 
> We could drop the dot-or-dotdot filter here, too, if we want to
> ensure that unified filtering will be correctly done over there.

Fair point. As you mention further down below, there are two ways to
approach it:

  - Just filter at the later stage and accept that we'll allocate memory
    for entries that are about to be discarded.

  - Create a function `should_include_entry()` that gets called at both
    code sites.

I don't think the allocation overhead should matter much, but neither
does it hurt to create a common `should_include_entry()` function. And
as both are trivial to implement I rather lean towards the more
efficient variant, even though the efficiency gain should be negligible.

> > +			string_list_append(&level->entries, de->d_name);
> > +		}
> > +		string_list_sort(&level->entries);
> > +
> > +		closedir(level->dir);
> > +		level->dir = NULL;
> > +	}
> > +
> >  	return 0;
> >  }
> >  
> > @@ -88,6 +125,7 @@ static int pop_level(struct dir_iterator_int *iter)
> >  		warning_errno("error closing directory '%s'",
> >  			      iter->base.path.buf);
> >  	level->dir = NULL;
> > +	string_list_clear(&level->entries, 0);
> >  
> >  	return --iter->levels_nr;
> >  }
> 
> It is somewhat interesting that the original code already has
> conditional call to closedir() and prepares .dir to be NULL,
> so that we do not have to make it conditional here.
> 
> > @@ -136,30 +174,43 @@ int dir_iterator_advance(struct dir_iterator *dir_iterator)
> >  
> >  	/* Loop until we find an entry that we can give back to the caller. */
> >  	while (1) {
> > -		struct dirent *de;
> >  		struct dir_iterator_level *level =
> >  			&iter->levels[iter->levels_nr - 1];
> > +		struct dirent *de;
> > +		const char *name;
> 
> Not a huge deal but this is an unnecessary reordering, right?

Right.

> >  		strbuf_setlen(&iter->base.path, level->prefix_len);
> > +
> > +		if (level->dir) {
> > +			errno = 0;
> > +			de = readdir(level->dir);
> > +			if (!de) {
> > +				if (errno) {
> > +					warning_errno("error reading directory '%s'",
> > +						      iter->base.path.buf);
> > +					if (iter->flags & DIR_ITERATOR_PEDANTIC)
> > +						goto error_out;
> > +				} else if (pop_level(iter) == 0) {
> > +					return dir_iterator_abort(dir_iterator);
> > +				}
> > +				continue;
> >  			}
> >  
> > +			if (is_dot_or_dotdot(de->d_name))
> > +				continue;
> 
> This is the target of the "if we will end up filtering even more in
> the future, it would probably be a good idea not to duplicate the
> logic to decide what gets filtered in this function and in
> push_level()" comment.  If we wanted to go that route, we can get
> rid of the filtering from push_level(), and move this filter code
> outside this if/else before calling prepare_next_entry_data().
> 
> The fact that .entries.nr represents the number of entries that are
> shown is unusable (because there is an unsorted codepath that does
> not even populate .entries), so I am not worried about correctness
> gotchas caused by including names in .entries to be filtered out.
> But an obvious downside is that the size of the list to be sorted
> will become larger.
> 
> Or we could introduce a shared helper function that takes a name and
> decides if it is to be included, and replace the is_dot_or_dotdot()
> call here and in the push_level() with calls to that helper.
> 
> In any case, that is primarily a maintainability issue.  The code
> posted as-is is correct.

Yeah, let's use the proposed helper function. In fact, I think we can
share even more code than merely the filtering part: the errno handling
is a bit special, and the warning is the same across both code sites,
too.

Patrick

> > +			name = de->d_name;
> > +		} else {
> > +			if (level->entries_idx >= level->entries.nr) {
> > +				if (pop_level(iter) == 0)
> > +					return dir_iterator_abort(dir_iterator);
> > +				continue;
> > +			}
> > +
> > +			name = level->entries.items[level->entries_idx++].string;
> > +		}
> > +
> > +		if (prepare_next_entry_data(iter, name)) {
> >  			if (errno != ENOENT && iter->flags & DIR_ITERATOR_PEDANTIC)
> >  				goto error_out;
> >  			continue;
> > @@ -188,6 +239,8 @@ int dir_iterator_abort(struct dir_iterator *dir_iterator)
> >  			warning_errno("error closing directory '%s'",
> >  				      iter->base.path.buf);
> >  		}
> > +
> > +		string_list_clear(&level->entries, 0);
> >  	}
> >  
> >  	free(iter->levels);
> > diff --git a/dir-iterator.h b/dir-iterator.h
> > index 479e1ec784..6d438809b6 100644
> > --- a/dir-iterator.h
> > +++ b/dir-iterator.h
> > @@ -54,8 +54,11 @@
> >   *   and ITER_ERROR is returned immediately. In both cases, a meaningful
> >   *   warning is emitted. Note: ENOENT errors are always ignored so that
> >   *   the API users may remove files during iteration.
> > + *
> > + * - DIR_ITERATOR_SORTED: sort directory entries alphabetically.
> >   */
> >  #define DIR_ITERATOR_PEDANTIC (1 << 0)
> > +#define DIR_ITERATOR_SORTED   (1 << 1)
> >  
> >  struct dir_iterator {
> >  	/* The current path: */

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [PATCH 4/6] refs: drop unused params from the reflog iterator callback
  2024-02-20  0:14   ` Junio C Hamano
@ 2024-02-20  8:34     ` Patrick Steinhardt
  0 siblings, 0 replies; 39+ messages in thread
From: Patrick Steinhardt @ 2024-02-20  8:34 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: git

[-- Attachment #1: Type: text/plain, Size: 2472 bytes --]

On Mon, Feb 19, 2024 at 04:14:16PM -0800, Junio C Hamano wrote:
> Patrick Steinhardt <ps@pks.im> writes:
> 
> > The ref and reflog iterators share much of the same underlying code to
> > iterate over the corresponding entries. This results in some weird code
> > because the reflog iterator also exposes an object ID as well as a flag
> > to the callback function. Neither of these fields do refer to the reflog
> > though -- they refer to the corresponding ref with the same name. This
> > is quite misleading. In practice at least the object ID cannot really be
> > implemented in any other way as a reflog does not have a specific object
> > ID in the first place. This is further stressed by the fact that none of
> > the callbacks except for our test helper make use of these fields.
> 
> Interesting observation.  Of course this will make the callstack
> longer by another level of indirection ...

It actually doesn't -- the old code had a `do_for_each_ref_helper()`
that did the same wrapping, so the level of indirection is exactly the
same. Over there we have it to drop the unused `struct repository`
parameter.

Patrick

> > +struct do_for_each_reflog_help {
> > +	each_reflog_fn *fn;
> > +	void *cb_data;
> > +};
> > +
> > +static int do_for_each_reflog_helper(struct repository *r UNUSED,
> > +				     const char *refname,
> > +				     const struct object_id *oid UNUSED,
> > +				     int flags,
> > +				     void *cb_data)
> > +{
> > +	struct do_for_each_reflog_help *hp = cb_data;
> > +	return hp->fn(refname, hp->cb_data);
> > +}
> 
> ... but I think it would be worth it.
> 
> > +/*
> > + * The signature for the callback function for the {refs_,}for_each_reflog()
> > + * functions below. The memory pointed to by the refname argument is only
> > + * guaranteed to be valid for the duration of a single callback invocation.
> > + */
> > +typedef int each_reflog_fn(const char *refname, void *cb_data);
> > +
> >  /*
> >   * Calls the specified function for each reflog file until it returns nonzero,
> >   * and returns the value. Reflog file order is unspecified.
> >   */
> > -int refs_for_each_reflog(struct ref_store *refs, each_ref_fn fn, void *cb_data);
> > -int for_each_reflog(each_ref_fn fn, void *cb_data);
> > +int refs_for_each_reflog(struct ref_store *refs, each_reflog_fn fn, void *cb_data);
> > +int for_each_reflog(each_reflog_fn fn, void *cb_data);
> 
> Nice simplification.

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [PATCH 5/6] refs: stop resolving ref corresponding to reflogs
  2024-02-20  0:14   ` Junio C Hamano
@ 2024-02-20  8:34     ` Patrick Steinhardt
  0 siblings, 0 replies; 39+ messages in thread
From: Patrick Steinhardt @ 2024-02-20  8:34 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: git

[-- Attachment #1: Type: text/plain, Size: 6112 bytes --]

On Mon, Feb 19, 2024 at 04:14:21PM -0800, Junio C Hamano wrote:
> Patrick Steinhardt <ps@pks.im> writes:
> 
> > Refactor the code to call `check_refname_format()` directly instead of
> > trying to resolve the ref. This is significantly more efficient given
> > that we don't have to hit the object database anymore to list reflogs.
> > And second, it ensures that we end up showing reflogs of broken refs,
> > which will help to make the reflog more useful.
> 
> And the user would notice corrupt ones among those reflogs listed
> when using "rev-list -g" on the reflog anyway?  Which sounds like a
> sensible thing to do.

Yeah. Overall the user experience is still quite lacking when you have
such "funny" reflogs. Corrupted ones would result in errors as you
mentioned, and that's to be expected in my opinion.

The more dubious behaviour is that `git reflog show $REFLOG` refuses to
show the reflog when the corresponding ref is missing. This is something
I plan to address in a follow-up patch series.

> > Note that this really only impacts the case where the corresponding ref
> > is corrupt. Reflogs for nonexistent refs would have been returned to the
> > caller beforehand already as we did not pass `RESOLVE_REF_READING` to
> > the function, and thus `refs_resolve_ref_unsafe()` would have returned
> > successfully in that case.
> 
> What do "Reflogs for nonexistent refs" really mean?  With the files
> backend, if "git branch -d main" that removed the "main" branch
> somehow forgot to remove the ".git/logs/refs/heads/main" file, the
> reflog entries in such a file is for nonexistent ref.  Is that what
> you meant?

Yes. Would "Reflogs which do not have a corresponding ref with the same
name" be clearer?

> As a tool to help diagnosing and correcting minor repo
> breakages, finding such a leftover file that should not exist is a
> good idea, I would think.
> 
> Would we see missing reflog for a ref that exists in the iteration?
> I guess we shouldn't, as the reflog iterator that recursively
> enumerates files under "$GIT_DIR/logs/" would not see such a missing
> reflog by definition.

No, and I'd claim we shouldn't. The reflog mechanism gives the user
control over which reflogs should and which shouldn't exist. For one,
`core.logAllRefUpdates` allows the user to either enable or disable the
reflog mechanism. If set to "false" then no reflogs are created, with
"true" some are created, and with "always" we always end up creating
reflogs. So depending on this setting it's expected that a subset of
reflogs do not exist.

But that'also not the whole story yet. Theoretically speaking, reflogs
have a subtle opt-in mechanism: once a reflog is created, we will
continue writing to it no matter what `core.logAllRefUpdates` says. So
it's feasible to have `core.logAllRefUpdates=false`, but then explicitly
create a specific reflog so that you log changes to a specific ref.

With this behaviour in mind I'd say that we shouldn't log missing
reflogs.

> > diff --git a/refs/files-backend.c b/refs/files-backend.c
> > index 2b3c99b00d..741148087d 100644
> > --- a/refs/files-backend.c
> > +++ b/refs/files-backend.c
> > @@ -2130,17 +2130,9 @@ static int files_reflog_iterator_advance(struct ref_iterator *ref_iterator)
> >  	while ((ok = dir_iterator_advance(diter)) == ITER_OK) {
> >  		if (!S_ISREG(diter->st.st_mode))
> >  			continue;
> > -		if (diter->basename[0] == '.')
> > +		if (check_refname_format(diter->basename,
> > +					 REFNAME_ALLOW_ONELEVEL))
> >  			continue;
> 
> A tangent.
> 
> I've never liked the code arrangement in the check_refname_format()
> that assumes that each level can be separately checked with exactly
> the same logic, and the only thing ALLOW_ONELEVEL does is to include
> pseudorefs and HEAD; this makes such assumption even more ingrained.
> I am not sure what to think about it, but let's keep reading.

Yeah. This code here is basically just copied over from
`refs_resolve_ref_unsafe()` to ensure that it remains compatible. In a
future patch series we might include a new option `--include-broken`
that would also surface broken-but-safe reflog names.

But going down the tangent even more: one think I've noticed is that the
way `check_refname_format()` is structured is also wildly inefficient.
It's quite astonishing that when iterating over refs, we spend _more_
time in `check_refname_format()` than reading the refs from disk,
parsing them and massaging them into their final representation.

Overall, the whole infra to check refnames could use some improvement.
But this has already been discussed in other threads recently.

Patrick

> > -		if (ends_with(diter->basename, ".lock"))
> > -			continue;
> 
> This can safely go, as it is rejected by check_refname_format().
> 
> > -		if (!refs_resolve_ref_unsafe(iter->ref_store,
> > -					     diter->relative_path, 0,
> > -					     NULL, NULL)) {
> > -			error("bad ref for %s", diter->path.buf);
> > -			continue;
> > -		}
> 
> This is the focus of this step in the series.  We did not abort the
> iteration before, but now we no longer issue any error message.
> 
> >  		iter->base.refname = diter->relative_path;
> >  		return ITER_OK;
> > diff --git a/refs/reftable-backend.c b/refs/reftable-backend.c
> > index 889bb1f1ba..efbbf23c72 100644
> > --- a/refs/reftable-backend.c
> > +++ b/refs/reftable-backend.c
> > @@ -1659,11 +1659,9 @@ static int reftable_reflog_iterator_advance(struct ref_iterator *ref_iterator)
> >  		if (iter->last_name && !strcmp(iter->log.refname, iter->last_name))
> >  			continue;
> >  
> > -		if (!refs_resolve_ref_unsafe(&iter->refs->base, iter->log.refname,
> > -					     0, NULL, NULL)) {
> > -			error(_("bad ref for %s"), iter->log.refname);
> > +		if (check_refname_format(iter->log.refname,
> > +					 REFNAME_ALLOW_ONELEVEL))
> >  			continue;
> > -		}
> 
> This side is much more straight-forward.  Looking good.
> 
> >  
> >  		free(iter->last_name);
> >  		iter->last_name = xstrdup(iter->log.refname);

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [PATCH 6/6] builtin/reflog: introduce subcommand to list reflogs
  2024-02-20  0:32   ` Junio C Hamano
@ 2024-02-20  8:34     ` Patrick Steinhardt
  0 siblings, 0 replies; 39+ messages in thread
From: Patrick Steinhardt @ 2024-02-20  8:34 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: git

[-- Attachment #1: Type: text/plain, Size: 3649 bytes --]

On Mon, Feb 19, 2024 at 04:32:43PM -0800, Junio C Hamano wrote:
> Patrick Steinhardt <ps@pks.im> writes:
> 
> > diff --git a/t/t1410-reflog.sh b/t/t1410-reflog.sh
> > index d2f5f42e67..6d8d5a253d 100755
> > --- a/t/t1410-reflog.sh
> > +++ b/t/t1410-reflog.sh
> > @@ -436,4 +436,73 @@ test_expect_success 'empty reflog' '
> >  	test_must_be_empty err
> >  '
> >  
> > +test_expect_success 'list reflogs' '
> > +	test_when_finished "rm -rf repo" &&
> > +	git init repo &&
> > +	(
> > +		cd repo &&
> > +		git reflog list >actual &&
> > +		test_must_be_empty actual &&
> > +
> > +		test_commit A &&
> > +		cat >expect <<-EOF &&
> > +		HEAD
> > +		refs/heads/main
> > +		EOF
> > +		git reflog list >actual &&
> > +		test_cmp expect actual &&
> > +
> > +		git branch b &&
> > +		cat >expect <<-EOF &&
> > +		HEAD
> > +		refs/heads/b
> > +		refs/heads/main
> > +		EOF
> > +		git reflog list >actual &&
> > +		test_cmp expect actual
> > +	)
> > +'
> 
> OK.  This is a quite boring baseline.
> 
> > +test_expect_success 'reflog list returns error with additional args' '
> > +	cat >expect <<-EOF &&
> > +	error: list does not accept arguments: ${SQ}bogus${SQ}
> > +	EOF
> > +	test_must_fail git reflog list bogus 2>err &&
> > +	test_cmp expect err
> > +'
> 
> Makes sense.
> 
> > +test_expect_success 'reflog for symref with unborn target can be listed' '
> > +	test_when_finished "rm -rf repo" &&
> > +	git init repo &&
> > +	(
> > +		cd repo &&
> > +		test_commit A &&
> > +		git symbolic-ref HEAD refs/heads/unborn &&
> > +		cat >expect <<-EOF &&
> > +		HEAD
> > +		refs/heads/main
> > +		EOF
> > +		git reflog list >actual &&
> > +		test_cmp expect actual
> > +	)
> > +'
> 
> Should this be under REFFILES?  Ah, no, "git symbolic-ref" is valid
> under reftable as well, so there is no need to.
> 
> Without [5/6], would it have failed to show the reflog for HEAD?

I initially thought so, but no. `refs_resolve_ref_unsafe()` is weird as
it returns successfully even if a symref cannot be resolved unless you
pass `RESOLVE_REF_READING`, which we didn't.

The case where it does make a difference is if we had a corrupt ref. So
if you "echo garbage >.git/refs/heads/branch", then the corresponding
reflog would not have been listed. Even worse, even after this patch
series it's still impossible to `git reflog show` the reflog because we
fail to resolve the ref itself, which basically breaks the whole point
of the reflog.

This is something that I plan to address in a follow-up patch series.

> > +test_expect_success 'reflog with invalid object ID can be listed' '
> > +	test_when_finished "rm -rf repo" &&
> > +	git init repo &&
> > +	(
> > +		cd repo &&
> > +		test_commit A &&
> > +		test-tool ref-store main update-ref msg refs/heads/missing \
> > +			$(test_oid deadbeef) "$ZERO_OID" REF_SKIP_OID_VERIFICATION &&
> > +		cat >expect <<-EOF &&
> > +		HEAD
> > +		refs/heads/main
> > +		refs/heads/missing
> > +		EOF
> > +		git reflog list >actual &&
> > +		test_cmp expect actual
> > +	)
> > +'
> 
> OK.
> 
> >  test_done
> 
> It would have been "interesting" to see an example of "there is a
> reflog but the underlying ref for it is missing" case, but I think
> that falls into a minor repository corruption category, so lack of
> such a test is also fine.

The reason why I didn't include such a test is that it's by necessity
specific to the backend: we don't have any way to delete a ref without
also deleting the corresponding reflog. So we'd have to manually delete
it, which only works with the REFFILES backend.

Patrick

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [PATCH 3/6] refs/files: sort reflogs returned by the reflog iterator
  2024-02-20  0:04   ` Junio C Hamano
@ 2024-02-20  8:34     ` Patrick Steinhardt
  0 siblings, 0 replies; 39+ messages in thread
From: Patrick Steinhardt @ 2024-02-20  8:34 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: git

[-- Attachment #1: Type: text/plain, Size: 2447 bytes --]

On Mon, Feb 19, 2024 at 04:04:16PM -0800, Junio C Hamano wrote:
> Patrick Steinhardt <ps@pks.im> writes:
> 
> > We use a directory iterator to return reflogs via the reflog iterator.
> > This iterator returns entries in the same order as readdir(3P) would and
> > will thus yield reflogs with no discernible order.
> >
> > Set the new `DIR_ITERATOR_SORTED` flag that was introduced in the
> > preceding commit so that the order is deterministic. While the effect of
> > this can only been observed in a test tool, a subsequent commit will
> > start to expose this functionality to users via a new `git reflog list`
> > subcommand.
> >
> > Signed-off-by: Patrick Steinhardt <ps@pks.im>
> > ---
> >  refs/files-backend.c           | 4 ++--
> >  t/t0600-reffiles-backend.sh    | 4 ++--
> >  t/t1405-main-ref-store.sh      | 2 +-
> >  t/t1406-submodule-ref-store.sh | 2 +-
> >  4 files changed, 6 insertions(+), 6 deletions(-)
> >
> > diff --git a/refs/files-backend.c b/refs/files-backend.c
> > index 75dcc21ecb..2ffc63185f 100644
> > --- a/refs/files-backend.c
> > +++ b/refs/files-backend.c
> > @@ -2193,7 +2193,7 @@ static struct ref_iterator *reflog_iterator_begin(struct ref_store *ref_store,
> >  
> >  	strbuf_addf(&sb, "%s/logs", gitdir);
> >  
> > -	diter = dir_iterator_begin(sb.buf, 0);
> > +	diter = dir_iterator_begin(sb.buf, DIR_ITERATOR_SORTED);
> >  	if (!diter) {
> >  		strbuf_release(&sb);
> >  		return empty_ref_iterator_begin();
> > @@ -2202,7 +2202,7 @@ static struct ref_iterator *reflog_iterator_begin(struct ref_store *ref_store,
> >  	CALLOC_ARRAY(iter, 1);
> >  	ref_iterator = &iter->base;
> >  
> > -	base_ref_iterator_init(ref_iterator, &files_reflog_iterator_vtable, 0);
> > +	base_ref_iterator_init(ref_iterator, &files_reflog_iterator_vtable, 1);
> 
> This caught my attention.  Once we apply this patch, the only way
> base_ref_iterator_init() can receive 0 for its last parameter
> (i.e. 'ordered') is via the merge_ref_iterator_begin() call in
> files_reflog_iterator_begin() that passes 0 as 'ordered'.  If we
> force files_reflog_iterator_begin() to ask for an ordered
> merge_ref_iterator, then we will have no unordered ref iterators.
> Am I reading the code right?

Ah, true indeed. The "files" reflog iterator was the only remaining
iterator that wasn't ordered. I'll include an additional patch on top
that drops the `ordered` bit altogether.

Patrick

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

^ permalink raw reply	[flat|nested] 39+ messages in thread

* [PATCH v2 0/7] reflog: introduce subcommand to list reflogs
  2024-02-19 14:35 [PATCH 0/6] reflog: introduce subcommand to list reflogs Patrick Steinhardt
                   ` (5 preceding siblings ...)
  2024-02-19 14:35 ` [PATCH 6/6] builtin/reflog: introduce subcommand to list reflogs Patrick Steinhardt
@ 2024-02-20  9:06 ` Patrick Steinhardt
  2024-02-20  9:06   ` [PATCH v2 1/7] dir-iterator: pass name to `prepare_next_entry_data()` directly Patrick Steinhardt
                     ` (7 more replies)
  2024-02-21 12:37 ` [PATCH v3 0/8] " Patrick Steinhardt
  7 siblings, 8 replies; 39+ messages in thread
From: Patrick Steinhardt @ 2024-02-20  9:06 UTC (permalink / raw)
  To: git; +Cc: Junio C Hamano

[-- Attachment #1: Type: text/plain, Size: 10804 bytes --]

Hi,

this is the second version of my patch series that introduces a new `git
reflog list` subcommand to list available reflogs in a repository.

Changes compared to v1:

  - Patch 2: Clarified the commit message to hopefully explain better
    why a higher level implementation of reflog sorting would have
    increased latency.

  - Patch 2: Introduced a helper function that unifies the logic to
    yield the next directory entry.

  - Patch 3: Mark the merged reflog iterator as sorted, which I missed
    in my previous round.

  - Patch 4: This patch is new and simplifies the code to require all
    ref iterators to be sorted.

Junio, I noticed that you already merged v1 of this patch series to
`next`. I was a bit surprised to see it merged down this fast, so I
assume that this is only done due to the pending Git v2.44 release and
that you plan to reroll `next` anyway. I thus didn't send follow-up
patches but resent the whole patch series as v2. If I misinterpreted
your intent I'm happy to send the changes as follow-up patches instead.

The patch series continues to depend on ps/reftable-backend at
8a0bebdeae (refs/reftable: fix leak when copying reflog fails,
2024-02-08).

Thanks!

Patrick

Patrick Steinhardt (7):
  dir-iterator: pass name to `prepare_next_entry_data()` directly
  dir-iterator: support iteration in sorted order
  refs/files: sort reflogs returned by the reflog iterator
  refs: always treat iterators as ordered
  refs: drop unused params from the reflog iterator callback
  refs: stop resolving ref corresponding to reflogs
  builtin/reflog: introduce subcommand to list reflogs

 Documentation/git-reflog.txt   |   3 +
 builtin/fsck.c                 |   4 +-
 builtin/reflog.c               |  37 +++++++++++-
 dir-iterator.c                 | 105 ++++++++++++++++++++++++++++-----
 dir-iterator.h                 |   3 +
 refs.c                         |  27 ++++++---
 refs.h                         |  11 +++-
 refs/debug.c                   |   3 +-
 refs/files-backend.c           |  27 ++-------
 refs/iterator.c                |  26 +++-----
 refs/packed-backend.c          |   2 +-
 refs/ref-cache.c               |   2 +-
 refs/refs-internal.h           |  18 +-----
 refs/reftable-backend.c        |  20 ++-----
 revision.c                     |   4 +-
 t/helper/test-ref-store.c      |  18 ++++--
 t/t0600-reffiles-backend.sh    |  24 ++++----
 t/t1405-main-ref-store.sh      |   8 +--
 t/t1406-submodule-ref-store.sh |   8 +--
 t/t1410-reflog.sh              |  69 ++++++++++++++++++++++
 20 files changed, 286 insertions(+), 133 deletions(-)

Range-diff against v1:
1:  12de25dfe2 = 1:  12de25dfe2 dir-iterator: pass name to `prepare_next_entry_data()` directly
2:  8a588175db ! 2:  788afce189 dir-iterator: support iteration in sorted order
    @@ Commit message
         iterator to enumerate reflogs, returning reflog names and exposing them
         to the user would inherit the indeterministic ordering. Naturally, it
         would make for a terrible user interface to show a list with no
    -    discernible order. While this could be handled at a higher level by the
    -    new subcommand itself by collecting and ordering the reflogs, this would
    -    be inefficient and introduce latency when there are many reflogs.
    +    discernible order.
    +
    +    While this could be handled at a higher level by the new subcommand
    +    itself by collecting and ordering the reflogs, this would be inefficient
    +    because we would first have to collect all reflogs before we can sort
    +    them, which would introduce additional latency when there are many
    +    reflogs.
     
         Instead, introduce a new option into the directory iterator that asks
         for its entries to be yielded in lexicographical order. If set, the
    -    iterator will read all directory entries greedily end sort them before
    +    iterator will read all directory entries greedily and sort them before
         we start to iterate over them.
     
         While this will of course also incur overhead as we cannot yield the
    @@ dir-iterator.c
      
      struct dir_iterator_level {
      	DIR *dir;
    + 
    ++	/*
    ++	 * The directory entries of the current level. This list will only be
    ++	 * populated when the iterator is ordered. In that case, `dir` will be
    ++	 * set to `NULL`.
    ++	 */
     +	struct string_list entries;
     +	size_t entries_idx;
    - 
    ++
      	/*
      	 * The length of the directory part of path at this level
    + 	 * (including a trailing '/'):
    +@@ dir-iterator.c: struct dir_iterator_int {
    + 	unsigned int flags;
    + };
    + 
    ++static int next_directory_entry(DIR *dir, const char *path,
    ++				struct dirent **out)
    ++{
    ++	struct dirent *de;
    ++
    ++repeat:
    ++	errno = 0;
    ++	de = readdir(dir);
    ++	if (!de) {
    ++		if (errno) {
    ++			warning_errno("error reading directory '%s'",
    ++				      path);
    ++			return -1;
    ++		}
    ++
    ++		return 1;
    ++	}
    ++
    ++	if (is_dot_or_dotdot(de->d_name))
    ++		goto repeat;
    ++
    ++	*out = de;
    ++	return 0;
    ++}
    ++
    + /*
    +  * Push a level in the iter stack and initialize it with information from
    +  * the directory pointed by iter->base->path. It is assumed that this
     @@ dir-iterator.c: static int push_level(struct dir_iterator_int *iter)
      		return -1;
      	}
    @@ dir-iterator.c: static int push_level(struct dir_iterator_int *iter)
     +	 * directly.
     +	 */
     +	if (iter->flags & DIR_ITERATOR_SORTED) {
    -+		while (1) {
    -+			struct dirent *de;
    ++		struct dirent *de;
     +
    -+			errno = 0;
    -+			de = readdir(level->dir);
    -+			if (!de) {
    -+				if (errno && errno != ENOENT) {
    -+					warning_errno("error reading directory '%s'",
    -+						      iter->base.path.buf);
    ++		while (1) {
    ++			int ret = next_directory_entry(level->dir, iter->base.path.buf, &de);
    ++			if (ret < 0) {
    ++				if (errno != ENOENT &&
    ++				    iter->flags & DIR_ITERATOR_PEDANTIC)
     +					return -1;
    -+				}
    -+
    ++				continue;
    ++			} else if (ret > 0) {
     +				break;
     +			}
     +
    -+			if (is_dot_or_dotdot(de->d_name))
    -+				continue;
    -+
     +			string_list_append(&level->entries, de->d_name);
     +		}
     +		string_list_sort(&level->entries);
    @@ dir-iterator.c: static int pop_level(struct dir_iterator_int *iter)
      	return --iter->levels_nr;
      }
     @@ dir-iterator.c: int dir_iterator_advance(struct dir_iterator *dir_iterator)
    - 
    - 	/* Loop until we find an entry that we can give back to the caller. */
    - 	while (1) {
    --		struct dirent *de;
    + 		struct dirent *de;
      		struct dir_iterator_level *level =
      			&iter->levels[iter->levels_nr - 1];
    -+		struct dirent *de;
     +		const char *name;
      
      		strbuf_setlen(&iter->base.path, level->prefix_len);
     -		errno = 0;
     -		de = readdir(level->dir);
    --
    + 
     -		if (!de) {
     -			if (errno) {
     -				warning_errno("error reading directory '%s'",
     -					      iter->base.path.buf);
    --				if (iter->flags & DIR_ITERATOR_PEDANTIC)
    --					goto error_out;
    ++		if (level->dir) {
    ++			int ret = next_directory_entry(level->dir, iter->base.path.buf, &de);
    ++			if (ret < 0) {
    + 				if (iter->flags & DIR_ITERATOR_PEDANTIC)
    + 					goto error_out;
     -			} else if (pop_level(iter) == 0) {
     -				return dir_iterator_abort(dir_iterator);
    -+
    -+		if (level->dir) {
    -+			errno = 0;
    -+			de = readdir(level->dir);
    -+			if (!de) {
    -+				if (errno) {
    -+					warning_errno("error reading directory '%s'",
    -+						      iter->base.path.buf);
    -+					if (iter->flags & DIR_ITERATOR_PEDANTIC)
    -+						goto error_out;
    -+				} else if (pop_level(iter) == 0) {
    ++				continue;
    ++			} else if (ret > 0) {
    ++				if (pop_level(iter) == 0)
     +					return dir_iterator_abort(dir_iterator);
    -+				}
     +				continue;
      			}
     -			continue;
    @@ dir-iterator.c: int dir_iterator_advance(struct dir_iterator *dir_iterator)
      
     -		if (is_dot_or_dotdot(de->d_name))
     -			continue;
    -+			if (is_dot_or_dotdot(de->d_name))
    -+				continue;
    - 
    --		if (prepare_next_entry_data(iter, de->d_name)) {
     +			name = de->d_name;
     +		} else {
     +			if (level->entries_idx >= level->entries.nr) {
    @@ dir-iterator.c: int dir_iterator_advance(struct dir_iterator *dir_iterator)
     +					return dir_iterator_abort(dir_iterator);
     +				continue;
     +			}
    -+
    + 
    +-		if (prepare_next_entry_data(iter, de->d_name)) {
     +			name = level->entries.items[level->entries_idx++].string;
     +		}
     +
3:  e4e4fac05c ! 3:  32b24a3d4b refs/files: sort reflogs returned by the reflog iterator
    @@ refs/files-backend.c: static struct ref_iterator *reflog_iterator_begin(struct r
      	iter->dir_iterator = diter;
      	iter->ref_store = ref_store;
      	strbuf_release(&sb);
    +@@ refs/files-backend.c: static struct ref_iterator *files_reflog_iterator_begin(struct ref_store *ref_st
    + 		return reflog_iterator_begin(ref_store, refs->gitcommondir);
    + 	} else {
    + 		return merge_ref_iterator_begin(
    +-			0, reflog_iterator_begin(ref_store, refs->base.gitdir),
    ++			1, reflog_iterator_begin(ref_store, refs->base.gitdir),
    + 			reflog_iterator_begin(ref_store, refs->gitcommondir),
    + 			reflog_iterator_select, refs);
    + 	}
     
      ## t/t0600-reffiles-backend.sh ##
     @@ t/t0600-reffiles-backend.sh: test_expect_success 'for_each_reflog()' '
-:  ---------- > 4:  4254f23fd4 refs: always treat iterators as ordered
4:  be512ef268 ! 5:  240334df6c refs: drop unused params from the reflog iterator callback
    @@ refs/reftable-backend.c: static int reftable_reflog_iterator_advance(struct ref_
      	}
     @@ refs/reftable-backend.c: static struct reftable_reflog_iterator *reflog_iterator_for_stack(struct reftabl
      	iter = xcalloc(1, sizeof(*iter));
    - 	base_ref_iterator_init(&iter->base, &reftable_reflog_iterator_vtable, 1);
    + 	base_ref_iterator_init(&iter->base, &reftable_reflog_iterator_vtable);
      	iter->refs = refs;
     -	iter->base.oid = &iter->oid;
      
5:  a7459b9483 = 6:  7928661318 refs: stop resolving ref corresponding to reflogs
6:  cddb2de939 = 7:  d7b9cff4c3 builtin/reflog: introduce subcommand to list reflogs
-- 
2.44.0-rc1


[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

^ permalink raw reply	[flat|nested] 39+ messages in thread

* [PATCH v2 1/7] dir-iterator: pass name to `prepare_next_entry_data()` directly
  2024-02-20  9:06 ` [PATCH v2 0/7] reflog: " Patrick Steinhardt
@ 2024-02-20  9:06   ` Patrick Steinhardt
  2024-02-20  9:06   ` [PATCH v2 2/7] dir-iterator: support iteration in sorted order Patrick Steinhardt
                     ` (6 subsequent siblings)
  7 siblings, 0 replies; 39+ messages in thread
From: Patrick Steinhardt @ 2024-02-20  9:06 UTC (permalink / raw)
  To: git; +Cc: Junio C Hamano

[-- Attachment #1: Type: text/plain, Size: 1694 bytes --]

When adding the next directory entry for `struct dir_iterator` we pass
the complete `struct dirent *` to `prepare_next_entry_data()` even
though we only need the entry's name.

Refactor the code to pass in the name, only. This prepares for a
subsequent commit where we introduce the ability to iterate through
dir entries in an ordered manner.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
---
 dir-iterator.c | 8 ++++----
 1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/dir-iterator.c b/dir-iterator.c
index 278b04243a..f58a97e089 100644
--- a/dir-iterator.c
+++ b/dir-iterator.c
@@ -94,15 +94,15 @@ static int pop_level(struct dir_iterator_int *iter)
 
 /*
  * Populate iter->base with the necessary information on the next iteration
- * entry, represented by the given dirent de. Return 0 on success and -1
+ * entry, represented by the given name. Return 0 on success and -1
  * otherwise, setting errno accordingly.
  */
 static int prepare_next_entry_data(struct dir_iterator_int *iter,
-				   struct dirent *de)
+				   const char *name)
 {
 	int err, saved_errno;
 
-	strbuf_addstr(&iter->base.path, de->d_name);
+	strbuf_addstr(&iter->base.path, name);
 	/*
 	 * We have to reset these because the path strbuf might have
 	 * been realloc()ed at the previous strbuf_addstr().
@@ -159,7 +159,7 @@ int dir_iterator_advance(struct dir_iterator *dir_iterator)
 		if (is_dot_or_dotdot(de->d_name))
 			continue;
 
-		if (prepare_next_entry_data(iter, de)) {
+		if (prepare_next_entry_data(iter, de->d_name)) {
 			if (errno != ENOENT && iter->flags & DIR_ITERATOR_PEDANTIC)
 				goto error_out;
 			continue;
-- 
2.44.0-rc1


[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

^ permalink raw reply related	[flat|nested] 39+ messages in thread

* [PATCH v2 2/7] dir-iterator: support iteration in sorted order
  2024-02-20  9:06 ` [PATCH v2 0/7] reflog: " Patrick Steinhardt
  2024-02-20  9:06   ` [PATCH v2 1/7] dir-iterator: pass name to `prepare_next_entry_data()` directly Patrick Steinhardt
@ 2024-02-20  9:06   ` Patrick Steinhardt
  2024-02-20  9:06   ` [PATCH v2 3/7] refs/files: sort reflogs returned by the reflog iterator Patrick Steinhardt
                     ` (5 subsequent siblings)
  7 siblings, 0 replies; 39+ messages in thread
From: Patrick Steinhardt @ 2024-02-20  9:06 UTC (permalink / raw)
  To: git; +Cc: Junio C Hamano

[-- Attachment #1: Type: text/plain, Size: 6195 bytes --]

The `struct dir_iterator` is a helper that allows us to iterate through
directory entries. This iterator returns entries in the exact same order
as readdir(3P) does -- or in other words, it guarantees no specific
order at all.

This is about to become problematic as we are introducing a new reflog
subcommand to list reflogs. As the "files" backend uses the directory
iterator to enumerate reflogs, returning reflog names and exposing them
to the user would inherit the indeterministic ordering. Naturally, it
would make for a terrible user interface to show a list with no
discernible order.

While this could be handled at a higher level by the new subcommand
itself by collecting and ordering the reflogs, this would be inefficient
because we would first have to collect all reflogs before we can sort
them, which would introduce additional latency when there are many
reflogs.

Instead, introduce a new option into the directory iterator that asks
for its entries to be yielded in lexicographical order. If set, the
iterator will read all directory entries greedily and sort them before
we start to iterate over them.

While this will of course also incur overhead as we cannot yield the
directory entries immediately, it should at least be more efficient than
having to sort the complete list of reflogs as we only need to sort one
directory at a time.

This functionality will be used in a follow-up commit.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
---
 dir-iterator.c | 99 +++++++++++++++++++++++++++++++++++++++++++-------
 dir-iterator.h |  3 ++
 2 files changed, 89 insertions(+), 13 deletions(-)

diff --git a/dir-iterator.c b/dir-iterator.c
index f58a97e089..de619846f2 100644
--- a/dir-iterator.c
+++ b/dir-iterator.c
@@ -2,10 +2,19 @@
 #include "dir.h"
 #include "iterator.h"
 #include "dir-iterator.h"
+#include "string-list.h"
 
 struct dir_iterator_level {
 	DIR *dir;
 
+	/*
+	 * The directory entries of the current level. This list will only be
+	 * populated when the iterator is ordered. In that case, `dir` will be
+	 * set to `NULL`.
+	 */
+	struct string_list entries;
+	size_t entries_idx;
+
 	/*
 	 * The length of the directory part of path at this level
 	 * (including a trailing '/'):
@@ -43,6 +52,31 @@ struct dir_iterator_int {
 	unsigned int flags;
 };
 
+static int next_directory_entry(DIR *dir, const char *path,
+				struct dirent **out)
+{
+	struct dirent *de;
+
+repeat:
+	errno = 0;
+	de = readdir(dir);
+	if (!de) {
+		if (errno) {
+			warning_errno("error reading directory '%s'",
+				      path);
+			return -1;
+		}
+
+		return 1;
+	}
+
+	if (is_dot_or_dotdot(de->d_name))
+		goto repeat;
+
+	*out = de;
+	return 0;
+}
+
 /*
  * Push a level in the iter stack and initialize it with information from
  * the directory pointed by iter->base->path. It is assumed that this
@@ -72,6 +106,35 @@ static int push_level(struct dir_iterator_int *iter)
 		return -1;
 	}
 
+	string_list_init_dup(&level->entries);
+	level->entries_idx = 0;
+
+	/*
+	 * When the iterator is sorted we read and sort all directory entries
+	 * directly.
+	 */
+	if (iter->flags & DIR_ITERATOR_SORTED) {
+		struct dirent *de;
+
+		while (1) {
+			int ret = next_directory_entry(level->dir, iter->base.path.buf, &de);
+			if (ret < 0) {
+				if (errno != ENOENT &&
+				    iter->flags & DIR_ITERATOR_PEDANTIC)
+					return -1;
+				continue;
+			} else if (ret > 0) {
+				break;
+			}
+
+			string_list_append(&level->entries, de->d_name);
+		}
+		string_list_sort(&level->entries);
+
+		closedir(level->dir);
+		level->dir = NULL;
+	}
+
 	return 0;
 }
 
@@ -88,6 +151,7 @@ static int pop_level(struct dir_iterator_int *iter)
 		warning_errno("error closing directory '%s'",
 			      iter->base.path.buf);
 	level->dir = NULL;
+	string_list_clear(&level->entries, 0);
 
 	return --iter->levels_nr;
 }
@@ -139,27 +203,34 @@ int dir_iterator_advance(struct dir_iterator *dir_iterator)
 		struct dirent *de;
 		struct dir_iterator_level *level =
 			&iter->levels[iter->levels_nr - 1];
+		const char *name;
 
 		strbuf_setlen(&iter->base.path, level->prefix_len);
-		errno = 0;
-		de = readdir(level->dir);
 
-		if (!de) {
-			if (errno) {
-				warning_errno("error reading directory '%s'",
-					      iter->base.path.buf);
+		if (level->dir) {
+			int ret = next_directory_entry(level->dir, iter->base.path.buf, &de);
+			if (ret < 0) {
 				if (iter->flags & DIR_ITERATOR_PEDANTIC)
 					goto error_out;
-			} else if (pop_level(iter) == 0) {
-				return dir_iterator_abort(dir_iterator);
+				continue;
+			} else if (ret > 0) {
+				if (pop_level(iter) == 0)
+					return dir_iterator_abort(dir_iterator);
+				continue;
 			}
-			continue;
-		}
 
-		if (is_dot_or_dotdot(de->d_name))
-			continue;
+			name = de->d_name;
+		} else {
+			if (level->entries_idx >= level->entries.nr) {
+				if (pop_level(iter) == 0)
+					return dir_iterator_abort(dir_iterator);
+				continue;
+			}
 
-		if (prepare_next_entry_data(iter, de->d_name)) {
+			name = level->entries.items[level->entries_idx++].string;
+		}
+
+		if (prepare_next_entry_data(iter, name)) {
 			if (errno != ENOENT && iter->flags & DIR_ITERATOR_PEDANTIC)
 				goto error_out;
 			continue;
@@ -188,6 +259,8 @@ int dir_iterator_abort(struct dir_iterator *dir_iterator)
 			warning_errno("error closing directory '%s'",
 				      iter->base.path.buf);
 		}
+
+		string_list_clear(&level->entries, 0);
 	}
 
 	free(iter->levels);
diff --git a/dir-iterator.h b/dir-iterator.h
index 479e1ec784..6d438809b6 100644
--- a/dir-iterator.h
+++ b/dir-iterator.h
@@ -54,8 +54,11 @@
  *   and ITER_ERROR is returned immediately. In both cases, a meaningful
  *   warning is emitted. Note: ENOENT errors are always ignored so that
  *   the API users may remove files during iteration.
+ *
+ * - DIR_ITERATOR_SORTED: sort directory entries alphabetically.
  */
 #define DIR_ITERATOR_PEDANTIC (1 << 0)
+#define DIR_ITERATOR_SORTED   (1 << 1)
 
 struct dir_iterator {
 	/* The current path: */
-- 
2.44.0-rc1


[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

^ permalink raw reply related	[flat|nested] 39+ messages in thread

* [PATCH v2 3/7] refs/files: sort reflogs returned by the reflog iterator
  2024-02-20  9:06 ` [PATCH v2 0/7] reflog: " Patrick Steinhardt
  2024-02-20  9:06   ` [PATCH v2 1/7] dir-iterator: pass name to `prepare_next_entry_data()` directly Patrick Steinhardt
  2024-02-20  9:06   ` [PATCH v2 2/7] dir-iterator: support iteration in sorted order Patrick Steinhardt
@ 2024-02-20  9:06   ` Patrick Steinhardt
  2024-02-20  9:06   ` [PATCH v2 4/7] refs: always treat iterators as ordered Patrick Steinhardt
                     ` (4 subsequent siblings)
  7 siblings, 0 replies; 39+ messages in thread
From: Patrick Steinhardt @ 2024-02-20  9:06 UTC (permalink / raw)
  To: git; +Cc: Junio C Hamano

[-- Attachment #1: Type: text/plain, Size: 3875 bytes --]

We use a directory iterator to return reflogs via the reflog iterator.
This iterator returns entries in the same order as readdir(3P) would and
will thus yield reflogs with no discernible order.

Set the new `DIR_ITERATOR_SORTED` flag that was introduced in the
preceding commit so that the order is deterministic. While the effect of
this can only been observed in a test tool, a subsequent commit will
start to expose this functionality to users via a new `git reflog list`
subcommand.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
---
 refs/files-backend.c           | 6 +++---
 t/t0600-reffiles-backend.sh    | 4 ++--
 t/t1405-main-ref-store.sh      | 2 +-
 t/t1406-submodule-ref-store.sh | 2 +-
 4 files changed, 7 insertions(+), 7 deletions(-)

diff --git a/refs/files-backend.c b/refs/files-backend.c
index 75dcc21ecb..a7b7cdef36 100644
--- a/refs/files-backend.c
+++ b/refs/files-backend.c
@@ -2193,7 +2193,7 @@ static struct ref_iterator *reflog_iterator_begin(struct ref_store *ref_store,
 
 	strbuf_addf(&sb, "%s/logs", gitdir);
 
-	diter = dir_iterator_begin(sb.buf, 0);
+	diter = dir_iterator_begin(sb.buf, DIR_ITERATOR_SORTED);
 	if (!diter) {
 		strbuf_release(&sb);
 		return empty_ref_iterator_begin();
@@ -2202,7 +2202,7 @@ static struct ref_iterator *reflog_iterator_begin(struct ref_store *ref_store,
 	CALLOC_ARRAY(iter, 1);
 	ref_iterator = &iter->base;
 
-	base_ref_iterator_init(ref_iterator, &files_reflog_iterator_vtable, 0);
+	base_ref_iterator_init(ref_iterator, &files_reflog_iterator_vtable, 1);
 	iter->dir_iterator = diter;
 	iter->ref_store = ref_store;
 	strbuf_release(&sb);
@@ -2246,7 +2246,7 @@ static struct ref_iterator *files_reflog_iterator_begin(struct ref_store *ref_st
 		return reflog_iterator_begin(ref_store, refs->gitcommondir);
 	} else {
 		return merge_ref_iterator_begin(
-			0, reflog_iterator_begin(ref_store, refs->base.gitdir),
+			1, reflog_iterator_begin(ref_store, refs->base.gitdir),
 			reflog_iterator_begin(ref_store, refs->gitcommondir),
 			reflog_iterator_select, refs);
 	}
diff --git a/t/t0600-reffiles-backend.sh b/t/t0600-reffiles-backend.sh
index e6a5f1868f..4f860285cc 100755
--- a/t/t0600-reffiles-backend.sh
+++ b/t/t0600-reffiles-backend.sh
@@ -287,7 +287,7 @@ test_expect_success 'for_each_reflog()' '
 	mkdir -p     .git/worktrees/wt/logs/refs/bisect &&
 	echo $ZERO_OID > .git/worktrees/wt/logs/refs/bisect/wt-random &&
 
-	$RWT for-each-reflog | cut -d" " -f 2- | sort >actual &&
+	$RWT for-each-reflog | cut -d" " -f 2- >actual &&
 	cat >expected <<-\EOF &&
 	HEAD 0x1
 	PSEUDO-WT 0x0
@@ -297,7 +297,7 @@ test_expect_success 'for_each_reflog()' '
 	EOF
 	test_cmp expected actual &&
 
-	$RMAIN for-each-reflog | cut -d" " -f 2- | sort >actual &&
+	$RMAIN for-each-reflog | cut -d" " -f 2- >actual &&
 	cat >expected <<-\EOF &&
 	HEAD 0x1
 	PSEUDO-MAIN 0x0
diff --git a/t/t1405-main-ref-store.sh b/t/t1405-main-ref-store.sh
index 976bd71efb..cfb583f544 100755
--- a/t/t1405-main-ref-store.sh
+++ b/t/t1405-main-ref-store.sh
@@ -74,7 +74,7 @@ test_expect_success 'verify_ref(new-main)' '
 '
 
 test_expect_success 'for_each_reflog()' '
-	$RUN for-each-reflog | sort -k2 | cut -d" " -f 2- >actual &&
+	$RUN for-each-reflog | cut -d" " -f 2- >actual &&
 	cat >expected <<-\EOF &&
 	HEAD 0x1
 	refs/heads/main 0x0
diff --git a/t/t1406-submodule-ref-store.sh b/t/t1406-submodule-ref-store.sh
index e6a7f7334b..40332e23cc 100755
--- a/t/t1406-submodule-ref-store.sh
+++ b/t/t1406-submodule-ref-store.sh
@@ -63,7 +63,7 @@ test_expect_success 'verify_ref(new-main)' '
 '
 
 test_expect_success 'for_each_reflog()' '
-	$RUN for-each-reflog | sort | cut -d" " -f 2- >actual &&
+	$RUN for-each-reflog | cut -d" " -f 2- >actual &&
 	cat >expected <<-\EOF &&
 	HEAD 0x1
 	refs/heads/main 0x0
-- 
2.44.0-rc1


[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

^ permalink raw reply related	[flat|nested] 39+ messages in thread

* [PATCH v2 4/7] refs: always treat iterators as ordered
  2024-02-20  9:06 ` [PATCH v2 0/7] reflog: " Patrick Steinhardt
                     ` (2 preceding siblings ...)
  2024-02-20  9:06   ` [PATCH v2 3/7] refs/files: sort reflogs returned by the reflog iterator Patrick Steinhardt
@ 2024-02-20  9:06   ` Patrick Steinhardt
  2024-02-20  9:06   ` [PATCH v2 5/7] refs: drop unused params from the reflog iterator callback Patrick Steinhardt
                     ` (3 subsequent siblings)
  7 siblings, 0 replies; 39+ messages in thread
From: Patrick Steinhardt @ 2024-02-20  9:06 UTC (permalink / raw)
  To: git; +Cc: Junio C Hamano

[-- Attachment #1: Type: text/plain, Size: 11337 bytes --]

In the preceding commit we have converted the reflog iterator of the
"files" backend to be ordered, which was the only remaining ref iterator
that wasn't ordered. Refactor the ref iterator infrastructure so that we
always assume iterators to be ordered, thus simplifying the code.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
---
 refs.c                  |  4 ----
 refs/debug.c            |  3 +--
 refs/files-backend.c    |  7 +++----
 refs/iterator.c         | 26 ++++++++------------------
 refs/packed-backend.c   |  2 +-
 refs/ref-cache.c        |  2 +-
 refs/refs-internal.h    | 18 ++----------------
 refs/reftable-backend.c |  8 ++++----
 8 files changed, 20 insertions(+), 50 deletions(-)

diff --git a/refs.c b/refs.c
index fff343c256..dc25606a82 100644
--- a/refs.c
+++ b/refs.c
@@ -1594,10 +1594,6 @@ struct ref_iterator *refs_ref_iterator_begin(
 	if (trim)
 		iter = prefix_ref_iterator_begin(iter, "", trim);
 
-	/* Sanity check for subclasses: */
-	if (!iter->ordered)
-		BUG("reference iterator is not ordered");
-
 	return iter;
 }
 
diff --git a/refs/debug.c b/refs/debug.c
index 634681ca44..c7531b17f0 100644
--- a/refs/debug.c
+++ b/refs/debug.c
@@ -181,7 +181,6 @@ static int debug_ref_iterator_advance(struct ref_iterator *ref_iterator)
 		trace_printf_key(&trace_refs, "iterator_advance: %s (0)\n",
 			diter->iter->refname);
 
-	diter->base.ordered = diter->iter->ordered;
 	diter->base.refname = diter->iter->refname;
 	diter->base.oid = diter->iter->oid;
 	diter->base.flags = diter->iter->flags;
@@ -222,7 +221,7 @@ debug_ref_iterator_begin(struct ref_store *ref_store, const char *prefix,
 		drefs->refs->be->iterator_begin(drefs->refs, prefix,
 						exclude_patterns, flags);
 	struct debug_ref_iterator *diter = xcalloc(1, sizeof(*diter));
-	base_ref_iterator_init(&diter->base, &debug_ref_iterator_vtable, 1);
+	base_ref_iterator_init(&diter->base, &debug_ref_iterator_vtable);
 	diter->iter = res;
 	trace_printf_key(&trace_refs, "ref_iterator_begin: \"%s\" (0x%x)\n",
 			 prefix, flags);
diff --git a/refs/files-backend.c b/refs/files-backend.c
index a7b7cdef36..51d57d98d2 100644
--- a/refs/files-backend.c
+++ b/refs/files-backend.c
@@ -879,8 +879,7 @@ static struct ref_iterator *files_ref_iterator_begin(
 
 	CALLOC_ARRAY(iter, 1);
 	ref_iterator = &iter->base;
-	base_ref_iterator_init(ref_iterator, &files_ref_iterator_vtable,
-			       overlay_iter->ordered);
+	base_ref_iterator_init(ref_iterator, &files_ref_iterator_vtable);
 	iter->iter0 = overlay_iter;
 	iter->repo = ref_store->repo;
 	iter->flags = flags;
@@ -2202,7 +2201,7 @@ static struct ref_iterator *reflog_iterator_begin(struct ref_store *ref_store,
 	CALLOC_ARRAY(iter, 1);
 	ref_iterator = &iter->base;
 
-	base_ref_iterator_init(ref_iterator, &files_reflog_iterator_vtable, 1);
+	base_ref_iterator_init(ref_iterator, &files_reflog_iterator_vtable);
 	iter->dir_iterator = diter;
 	iter->ref_store = ref_store;
 	strbuf_release(&sb);
@@ -2246,7 +2245,7 @@ static struct ref_iterator *files_reflog_iterator_begin(struct ref_store *ref_st
 		return reflog_iterator_begin(ref_store, refs->gitcommondir);
 	} else {
 		return merge_ref_iterator_begin(
-			1, reflog_iterator_begin(ref_store, refs->base.gitdir),
+			reflog_iterator_begin(ref_store, refs->base.gitdir),
 			reflog_iterator_begin(ref_store, refs->gitcommondir),
 			reflog_iterator_select, refs);
 	}
diff --git a/refs/iterator.c b/refs/iterator.c
index 6b680f610e..f9a9a808e0 100644
--- a/refs/iterator.c
+++ b/refs/iterator.c
@@ -25,11 +25,9 @@ int ref_iterator_abort(struct ref_iterator *ref_iterator)
 }
 
 void base_ref_iterator_init(struct ref_iterator *iter,
-			    struct ref_iterator_vtable *vtable,
-			    int ordered)
+			    struct ref_iterator_vtable *vtable)
 {
 	iter->vtable = vtable;
-	iter->ordered = !!ordered;
 	iter->refname = NULL;
 	iter->oid = NULL;
 	iter->flags = 0;
@@ -74,7 +72,7 @@ struct ref_iterator *empty_ref_iterator_begin(void)
 	struct empty_ref_iterator *iter = xcalloc(1, sizeof(*iter));
 	struct ref_iterator *ref_iterator = &iter->base;
 
-	base_ref_iterator_init(ref_iterator, &empty_ref_iterator_vtable, 1);
+	base_ref_iterator_init(ref_iterator, &empty_ref_iterator_vtable);
 	return ref_iterator;
 }
 
@@ -207,7 +205,6 @@ static struct ref_iterator_vtable merge_ref_iterator_vtable = {
 };
 
 struct ref_iterator *merge_ref_iterator_begin(
-		int ordered,
 		struct ref_iterator *iter0, struct ref_iterator *iter1,
 		ref_iterator_select_fn *select, void *cb_data)
 {
@@ -222,7 +219,7 @@ struct ref_iterator *merge_ref_iterator_begin(
 	 * references through only if they exist in both iterators.
 	 */
 
-	base_ref_iterator_init(ref_iterator, &merge_ref_iterator_vtable, ordered);
+	base_ref_iterator_init(ref_iterator, &merge_ref_iterator_vtable);
 	iter->iter0 = iter0;
 	iter->iter1 = iter1;
 	iter->select = select;
@@ -271,12 +268,9 @@ struct ref_iterator *overlay_ref_iterator_begin(
 	} else if (is_empty_ref_iterator(back)) {
 		ref_iterator_abort(back);
 		return front;
-	} else if (!front->ordered || !back->ordered) {
-		BUG("overlay_ref_iterator requires ordered inputs");
 	}
 
-	return merge_ref_iterator_begin(1, front, back,
-					overlay_iterator_select, NULL);
+	return merge_ref_iterator_begin(front, back, overlay_iterator_select, NULL);
 }
 
 struct prefix_ref_iterator {
@@ -315,16 +309,12 @@ static int prefix_ref_iterator_advance(struct ref_iterator *ref_iterator)
 
 		if (cmp > 0) {
 			/*
-			 * If the source iterator is ordered, then we
+			 * As the source iterator is ordered, we
 			 * can stop the iteration as soon as we see a
 			 * refname that comes after the prefix:
 			 */
-			if (iter->iter0->ordered) {
-				ok = ref_iterator_abort(iter->iter0);
-				break;
-			} else {
-				continue;
-			}
+			ok = ref_iterator_abort(iter->iter0);
+			break;
 		}
 
 		if (iter->trim) {
@@ -396,7 +386,7 @@ struct ref_iterator *prefix_ref_iterator_begin(struct ref_iterator *iter0,
 	CALLOC_ARRAY(iter, 1);
 	ref_iterator = &iter->base;
 
-	base_ref_iterator_init(ref_iterator, &prefix_ref_iterator_vtable, iter0->ordered);
+	base_ref_iterator_init(ref_iterator, &prefix_ref_iterator_vtable);
 
 	iter->iter0 = iter0;
 	iter->prefix = xstrdup(prefix);
diff --git a/refs/packed-backend.c b/refs/packed-backend.c
index a499a91c7e..4e826c05ff 100644
--- a/refs/packed-backend.c
+++ b/refs/packed-backend.c
@@ -1111,7 +1111,7 @@ static struct ref_iterator *packed_ref_iterator_begin(
 
 	CALLOC_ARRAY(iter, 1);
 	ref_iterator = &iter->base;
-	base_ref_iterator_init(ref_iterator, &packed_ref_iterator_vtable, 1);
+	base_ref_iterator_init(ref_iterator, &packed_ref_iterator_vtable);
 
 	if (exclude_patterns)
 		populate_excluded_jump_list(iter, snapshot, exclude_patterns);
diff --git a/refs/ref-cache.c b/refs/ref-cache.c
index a372a00941..9f9797209a 100644
--- a/refs/ref-cache.c
+++ b/refs/ref-cache.c
@@ -486,7 +486,7 @@ struct ref_iterator *cache_ref_iterator_begin(struct ref_cache *cache,
 
 	CALLOC_ARRAY(iter, 1);
 	ref_iterator = &iter->base;
-	base_ref_iterator_init(ref_iterator, &cache_ref_iterator_vtable, 1);
+	base_ref_iterator_init(ref_iterator, &cache_ref_iterator_vtable);
 	ALLOC_GROW(iter->levels, 10, iter->levels_alloc);
 
 	iter->levels_nr = 1;
diff --git a/refs/refs-internal.h b/refs/refs-internal.h
index 83e0f0bba3..1e8a9f9f13 100644
--- a/refs/refs-internal.h
+++ b/refs/refs-internal.h
@@ -312,13 +312,6 @@ enum do_for_each_ref_flags {
  */
 struct ref_iterator {
 	struct ref_iterator_vtable *vtable;
-
-	/*
-	 * Does this `ref_iterator` iterate over references in order
-	 * by refname?
-	 */
-	unsigned int ordered : 1;
-
 	const char *refname;
 	const struct object_id *oid;
 	unsigned int flags;
@@ -390,11 +383,9 @@ typedef enum iterator_selection ref_iterator_select_fn(
  * Iterate over the entries from iter0 and iter1, with the values
  * interleaved as directed by the select function. The iterator takes
  * ownership of iter0 and iter1 and frees them when the iteration is
- * over. A derived class should set `ordered` to 1 or 0 based on
- * whether it generates its output in order by reference name.
+ * over.
  */
 struct ref_iterator *merge_ref_iterator_begin(
-		int ordered,
 		struct ref_iterator *iter0, struct ref_iterator *iter1,
 		ref_iterator_select_fn *select, void *cb_data);
 
@@ -423,8 +414,6 @@ struct ref_iterator *overlay_ref_iterator_begin(
  * As an convenience to callers, if prefix is the empty string and
  * trim is zero, this function returns iter0 directly, without
  * wrapping it.
- *
- * The resulting ref_iterator is ordered if iter0 is.
  */
 struct ref_iterator *prefix_ref_iterator_begin(struct ref_iterator *iter0,
 					       const char *prefix,
@@ -435,14 +424,11 @@ struct ref_iterator *prefix_ref_iterator_begin(struct ref_iterator *iter0,
 /*
  * Base class constructor for ref_iterators. Initialize the
  * ref_iterator part of iter, setting its vtable pointer as specified.
- * `ordered` should be set to 1 if the iterator will iterate over
- * references in order by refname; otherwise it should be set to 0.
  * This is meant to be called only by the initializers of derived
  * classes.
  */
 void base_ref_iterator_init(struct ref_iterator *iter,
-			    struct ref_iterator_vtable *vtable,
-			    int ordered);
+			    struct ref_iterator_vtable *vtable);
 
 /*
  * Base class destructor for ref_iterators. Destroy the ref_iterator
diff --git a/refs/reftable-backend.c b/refs/reftable-backend.c
index a14f2ad7f4..70a16dfb9e 100644
--- a/refs/reftable-backend.c
+++ b/refs/reftable-backend.c
@@ -479,7 +479,7 @@ static struct reftable_ref_iterator *ref_iterator_for_stack(struct reftable_ref_
 	int ret;
 
 	iter = xcalloc(1, sizeof(*iter));
-	base_ref_iterator_init(&iter->base, &reftable_ref_iterator_vtable, 1);
+	base_ref_iterator_init(&iter->base, &reftable_ref_iterator_vtable);
 	iter->prefix = prefix;
 	iter->base.oid = &iter->oid;
 	iter->flags = flags;
@@ -575,7 +575,7 @@ static struct ref_iterator *reftable_be_iterator_begin(struct ref_store *ref_sto
 	 * single iterator.
 	 */
 	worktree_iter = ref_iterator_for_stack(refs, refs->worktree_stack, prefix, flags);
-	return merge_ref_iterator_begin(1, &worktree_iter->base, &main_iter->base,
+	return merge_ref_iterator_begin(&worktree_iter->base, &main_iter->base,
 					iterator_select, NULL);
 }
 
@@ -1723,7 +1723,7 @@ static struct reftable_reflog_iterator *reflog_iterator_for_stack(struct reftabl
 	int ret;
 
 	iter = xcalloc(1, sizeof(*iter));
-	base_ref_iterator_init(&iter->base, &reftable_reflog_iterator_vtable, 1);
+	base_ref_iterator_init(&iter->base, &reftable_reflog_iterator_vtable);
 	iter->refs = refs;
 	iter->base.oid = &iter->oid;
 
@@ -1758,7 +1758,7 @@ static struct ref_iterator *reftable_be_reflog_iterator_begin(struct ref_store *
 
 	worktree_iter = reflog_iterator_for_stack(refs, refs->worktree_stack);
 
-	return merge_ref_iterator_begin(1, &worktree_iter->base, &main_iter->base,
+	return merge_ref_iterator_begin(&worktree_iter->base, &main_iter->base,
 					iterator_select, NULL);
 }
 
-- 
2.44.0-rc1


[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

^ permalink raw reply related	[flat|nested] 39+ messages in thread

* [PATCH v2 5/7] refs: drop unused params from the reflog iterator callback
  2024-02-20  9:06 ` [PATCH v2 0/7] reflog: " Patrick Steinhardt
                     ` (3 preceding siblings ...)
  2024-02-20  9:06   ` [PATCH v2 4/7] refs: always treat iterators as ordered Patrick Steinhardt
@ 2024-02-20  9:06   ` Patrick Steinhardt
  2024-02-20  9:06   ` [PATCH v2 6/7] refs: stop resolving ref corresponding to reflogs Patrick Steinhardt
                     ` (2 subsequent siblings)
  7 siblings, 0 replies; 39+ messages in thread
From: Patrick Steinhardt @ 2024-02-20  9:06 UTC (permalink / raw)
  To: git; +Cc: Junio C Hamano

[-- Attachment #1: Type: text/plain, Size: 12615 bytes --]

The ref and reflog iterators share much of the same underlying code to
iterate over the corresponding entries. This results in some weird code
because the reflog iterator also exposes an object ID as well as a flag
to the callback function. Neither of these fields do refer to the reflog
though -- they refer to the corresponding ref with the same name. This
is quite misleading. In practice at least the object ID cannot really be
implemented in any other way as a reflog does not have a specific object
ID in the first place. This is further stressed by the fact that none of
the callbacks except for our test helper make use of these fields.

Split up the infrastucture so that ref and reflog iterators use separate
callback signatures. This allows us to drop the nonsensical fields from
the reflog iterator.

Note that internally, the backends still use the same shared infra to
iterate over both types. As the backends should never end up being
called directly anyway, this is not much of a problem and thus kept
as-is for simplicity's sake.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
---
 builtin/fsck.c                 |  4 +---
 builtin/reflog.c               |  3 +--
 refs.c                         | 23 +++++++++++++++++++----
 refs.h                         | 11 +++++++++--
 refs/files-backend.c           |  8 +-------
 refs/reftable-backend.c        |  8 +-------
 revision.c                     |  4 +---
 t/helper/test-ref-store.c      | 18 ++++++++++++------
 t/t0600-reffiles-backend.sh    | 24 ++++++++++++------------
 t/t1405-main-ref-store.sh      |  8 ++++----
 t/t1406-submodule-ref-store.sh |  8 ++++----
 11 files changed, 65 insertions(+), 54 deletions(-)

diff --git a/builtin/fsck.c b/builtin/fsck.c
index a7cf94f67e..f892487c9b 100644
--- a/builtin/fsck.c
+++ b/builtin/fsck.c
@@ -509,9 +509,7 @@ static int fsck_handle_reflog_ent(struct object_id *ooid, struct object_id *noid
 	return 0;
 }
 
-static int fsck_handle_reflog(const char *logname,
-			      const struct object_id *oid UNUSED,
-			      int flag UNUSED, void *cb_data)
+static int fsck_handle_reflog(const char *logname, void *cb_data)
 {
 	struct strbuf refname = STRBUF_INIT;
 
diff --git a/builtin/reflog.c b/builtin/reflog.c
index a5a4099f61..3a0c4d4322 100644
--- a/builtin/reflog.c
+++ b/builtin/reflog.c
@@ -60,8 +60,7 @@ struct worktree_reflogs {
 	struct string_list reflogs;
 };
 
-static int collect_reflog(const char *ref, const struct object_id *oid UNUSED,
-			  int flags UNUSED, void *cb_data)
+static int collect_reflog(const char *ref, void *cb_data)
 {
 	struct worktree_reflogs *cb = cb_data;
 	struct worktree *worktree = cb->worktree;
diff --git a/refs.c b/refs.c
index dc25606a82..f9261267f0 100644
--- a/refs.c
+++ b/refs.c
@@ -2512,18 +2512,33 @@ int refs_verify_refname_available(struct ref_store *refs,
 	return ret;
 }
 
-int refs_for_each_reflog(struct ref_store *refs, each_ref_fn fn, void *cb_data)
+struct do_for_each_reflog_help {
+	each_reflog_fn *fn;
+	void *cb_data;
+};
+
+static int do_for_each_reflog_helper(struct repository *r UNUSED,
+				     const char *refname,
+				     const struct object_id *oid UNUSED,
+				     int flags,
+				     void *cb_data)
+{
+	struct do_for_each_reflog_help *hp = cb_data;
+	return hp->fn(refname, hp->cb_data);
+}
+
+int refs_for_each_reflog(struct ref_store *refs, each_reflog_fn fn, void *cb_data)
 {
 	struct ref_iterator *iter;
-	struct do_for_each_ref_help hp = { fn, cb_data };
+	struct do_for_each_reflog_help hp = { fn, cb_data };
 
 	iter = refs->be->reflog_iterator_begin(refs);
 
 	return do_for_each_repo_ref_iterator(the_repository, iter,
-					     do_for_each_ref_helper, &hp);
+					     do_for_each_reflog_helper, &hp);
 }
 
-int for_each_reflog(each_ref_fn fn, void *cb_data)
+int for_each_reflog(each_reflog_fn fn, void *cb_data)
 {
 	return refs_for_each_reflog(get_main_ref_store(the_repository), fn, cb_data);
 }
diff --git a/refs.h b/refs.h
index 303c5fac4d..895579aeb7 100644
--- a/refs.h
+++ b/refs.h
@@ -534,12 +534,19 @@ int for_each_reflog_ent(const char *refname, each_reflog_ent_fn fn, void *cb_dat
 /* youngest entry first */
 int for_each_reflog_ent_reverse(const char *refname, each_reflog_ent_fn fn, void *cb_data);
 
+/*
+ * The signature for the callback function for the {refs_,}for_each_reflog()
+ * functions below. The memory pointed to by the refname argument is only
+ * guaranteed to be valid for the duration of a single callback invocation.
+ */
+typedef int each_reflog_fn(const char *refname, void *cb_data);
+
 /*
  * Calls the specified function for each reflog file until it returns nonzero,
  * and returns the value. Reflog file order is unspecified.
  */
-int refs_for_each_reflog(struct ref_store *refs, each_ref_fn fn, void *cb_data);
-int for_each_reflog(each_ref_fn fn, void *cb_data);
+int refs_for_each_reflog(struct ref_store *refs, each_reflog_fn fn, void *cb_data);
+int for_each_reflog(each_reflog_fn fn, void *cb_data);
 
 #define REFNAME_ALLOW_ONELEVEL 1
 #define REFNAME_REFSPEC_PATTERN 2
diff --git a/refs/files-backend.c b/refs/files-backend.c
index 51d57d98d2..48cc60d71b 100644
--- a/refs/files-backend.c
+++ b/refs/files-backend.c
@@ -2115,10 +2115,8 @@ static int files_for_each_reflog_ent(struct ref_store *ref_store,
 
 struct files_reflog_iterator {
 	struct ref_iterator base;
-
 	struct ref_store *ref_store;
 	struct dir_iterator *dir_iterator;
-	struct object_id oid;
 };
 
 static int files_reflog_iterator_advance(struct ref_iterator *ref_iterator)
@@ -2129,8 +2127,6 @@ static int files_reflog_iterator_advance(struct ref_iterator *ref_iterator)
 	int ok;
 
 	while ((ok = dir_iterator_advance(diter)) == ITER_OK) {
-		int flags;
-
 		if (!S_ISREG(diter->st.st_mode))
 			continue;
 		if (diter->basename[0] == '.')
@@ -2140,14 +2136,12 @@ static int files_reflog_iterator_advance(struct ref_iterator *ref_iterator)
 
 		if (!refs_resolve_ref_unsafe(iter->ref_store,
 					     diter->relative_path, 0,
-					     &iter->oid, &flags)) {
+					     NULL, NULL)) {
 			error("bad ref for %s", diter->path.buf);
 			continue;
 		}
 
 		iter->base.refname = diter->relative_path;
-		iter->base.oid = &iter->oid;
-		iter->base.flags = flags;
 		return ITER_OK;
 	}
 
diff --git a/refs/reftable-backend.c b/refs/reftable-backend.c
index 70a16dfb9e..5247e09d58 100644
--- a/refs/reftable-backend.c
+++ b/refs/reftable-backend.c
@@ -1637,7 +1637,6 @@ struct reftable_reflog_iterator {
 	struct reftable_ref_store *refs;
 	struct reftable_iterator iter;
 	struct reftable_log_record log;
-	struct object_id oid;
 	char *last_name;
 	int err;
 };
@@ -1648,8 +1647,6 @@ static int reftable_reflog_iterator_advance(struct ref_iterator *ref_iterator)
 		(struct reftable_reflog_iterator *)ref_iterator;
 
 	while (!iter->err) {
-		int flags;
-
 		iter->err = reftable_iterator_next_log(&iter->iter, &iter->log);
 		if (iter->err)
 			break;
@@ -1663,7 +1660,7 @@ static int reftable_reflog_iterator_advance(struct ref_iterator *ref_iterator)
 			continue;
 
 		if (!refs_resolve_ref_unsafe(&iter->refs->base, iter->log.refname,
-					     0, &iter->oid, &flags)) {
+					     0, NULL, NULL)) {
 			error(_("bad ref for %s"), iter->log.refname);
 			continue;
 		}
@@ -1671,8 +1668,6 @@ static int reftable_reflog_iterator_advance(struct ref_iterator *ref_iterator)
 		free(iter->last_name);
 		iter->last_name = xstrdup(iter->log.refname);
 		iter->base.refname = iter->log.refname;
-		iter->base.oid = &iter->oid;
-		iter->base.flags = flags;
 
 		break;
 	}
@@ -1725,7 +1720,6 @@ static struct reftable_reflog_iterator *reflog_iterator_for_stack(struct reftabl
 	iter = xcalloc(1, sizeof(*iter));
 	base_ref_iterator_init(&iter->base, &reftable_reflog_iterator_vtable);
 	iter->refs = refs;
-	iter->base.oid = &iter->oid;
 
 	ret = refs->err;
 	if (ret)
diff --git a/revision.c b/revision.c
index 2424c9bd67..ac45c6d8f2 100644
--- a/revision.c
+++ b/revision.c
@@ -1686,9 +1686,7 @@ static int handle_one_reflog_ent(struct object_id *ooid, struct object_id *noid,
 	return 0;
 }
 
-static int handle_one_reflog(const char *refname_in_wt,
-			     const struct object_id *oid UNUSED,
-			     int flag UNUSED, void *cb_data)
+static int handle_one_reflog(const char *refname_in_wt, void *cb_data)
 {
 	struct all_refs_cb *cb = cb_data;
 	struct strbuf refname = STRBUF_INIT;
diff --git a/t/helper/test-ref-store.c b/t/helper/test-ref-store.c
index 702ec1f128..7a0f6cac53 100644
--- a/t/helper/test-ref-store.c
+++ b/t/helper/test-ref-store.c
@@ -221,15 +221,21 @@ static int cmd_verify_ref(struct ref_store *refs, const char **argv)
 	return ret;
 }
 
+static int each_reflog(const char *refname, void *cb_data UNUSED)
+{
+	printf("%s\n", refname);
+	return 0;
+}
+
 static int cmd_for_each_reflog(struct ref_store *refs,
 			       const char **argv UNUSED)
 {
-	return refs_for_each_reflog(refs, each_ref, NULL);
+	return refs_for_each_reflog(refs, each_reflog, NULL);
 }
 
-static int each_reflog(struct object_id *old_oid, struct object_id *new_oid,
-		       const char *committer, timestamp_t timestamp,
-		       int tz, const char *msg, void *cb_data UNUSED)
+static int each_reflog_ent(struct object_id *old_oid, struct object_id *new_oid,
+			   const char *committer, timestamp_t timestamp,
+			   int tz, const char *msg, void *cb_data UNUSED)
 {
 	printf("%s %s %s %" PRItime " %+05d%s%s", oid_to_hex(old_oid),
 	       oid_to_hex(new_oid), committer, timestamp, tz,
@@ -241,14 +247,14 @@ static int cmd_for_each_reflog_ent(struct ref_store *refs, const char **argv)
 {
 	const char *refname = notnull(*argv++, "refname");
 
-	return refs_for_each_reflog_ent(refs, refname, each_reflog, refs);
+	return refs_for_each_reflog_ent(refs, refname, each_reflog_ent, refs);
 }
 
 static int cmd_for_each_reflog_ent_reverse(struct ref_store *refs, const char **argv)
 {
 	const char *refname = notnull(*argv++, "refname");
 
-	return refs_for_each_reflog_ent_reverse(refs, refname, each_reflog, refs);
+	return refs_for_each_reflog_ent_reverse(refs, refname, each_reflog_ent, refs);
 }
 
 static int cmd_reflog_exists(struct ref_store *refs, const char **argv)
diff --git a/t/t0600-reffiles-backend.sh b/t/t0600-reffiles-backend.sh
index 4f860285cc..56a3196b83 100755
--- a/t/t0600-reffiles-backend.sh
+++ b/t/t0600-reffiles-backend.sh
@@ -287,23 +287,23 @@ test_expect_success 'for_each_reflog()' '
 	mkdir -p     .git/worktrees/wt/logs/refs/bisect &&
 	echo $ZERO_OID > .git/worktrees/wt/logs/refs/bisect/wt-random &&
 
-	$RWT for-each-reflog | cut -d" " -f 2- >actual &&
+	$RWT for-each-reflog >actual &&
 	cat >expected <<-\EOF &&
-	HEAD 0x1
-	PSEUDO-WT 0x0
-	refs/bisect/wt-random 0x0
-	refs/heads/main 0x0
-	refs/heads/wt-main 0x0
+	HEAD
+	PSEUDO-WT
+	refs/bisect/wt-random
+	refs/heads/main
+	refs/heads/wt-main
 	EOF
 	test_cmp expected actual &&
 
-	$RMAIN for-each-reflog | cut -d" " -f 2- >actual &&
+	$RMAIN for-each-reflog >actual &&
 	cat >expected <<-\EOF &&
-	HEAD 0x1
-	PSEUDO-MAIN 0x0
-	refs/bisect/random 0x0
-	refs/heads/main 0x0
-	refs/heads/wt-main 0x0
+	HEAD
+	PSEUDO-MAIN
+	refs/bisect/random
+	refs/heads/main
+	refs/heads/wt-main
 	EOF
 	test_cmp expected actual
 '
diff --git a/t/t1405-main-ref-store.sh b/t/t1405-main-ref-store.sh
index cfb583f544..3eee758bce 100755
--- a/t/t1405-main-ref-store.sh
+++ b/t/t1405-main-ref-store.sh
@@ -74,11 +74,11 @@ test_expect_success 'verify_ref(new-main)' '
 '
 
 test_expect_success 'for_each_reflog()' '
-	$RUN for-each-reflog | cut -d" " -f 2- >actual &&
+	$RUN for-each-reflog >actual &&
 	cat >expected <<-\EOF &&
-	HEAD 0x1
-	refs/heads/main 0x0
-	refs/heads/new-main 0x0
+	HEAD
+	refs/heads/main
+	refs/heads/new-main
 	EOF
 	test_cmp expected actual
 '
diff --git a/t/t1406-submodule-ref-store.sh b/t/t1406-submodule-ref-store.sh
index 40332e23cc..c01f0f14a1 100755
--- a/t/t1406-submodule-ref-store.sh
+++ b/t/t1406-submodule-ref-store.sh
@@ -63,11 +63,11 @@ test_expect_success 'verify_ref(new-main)' '
 '
 
 test_expect_success 'for_each_reflog()' '
-	$RUN for-each-reflog | cut -d" " -f 2- >actual &&
+	$RUN for-each-reflog >actual &&
 	cat >expected <<-\EOF &&
-	HEAD 0x1
-	refs/heads/main 0x0
-	refs/heads/new-main 0x0
+	HEAD
+	refs/heads/main
+	refs/heads/new-main
 	EOF
 	test_cmp expected actual
 '
-- 
2.44.0-rc1


[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

^ permalink raw reply related	[flat|nested] 39+ messages in thread

* [PATCH v2 6/7] refs: stop resolving ref corresponding to reflogs
  2024-02-20  9:06 ` [PATCH v2 0/7] reflog: " Patrick Steinhardt
                     ` (4 preceding siblings ...)
  2024-02-20  9:06   ` [PATCH v2 5/7] refs: drop unused params from the reflog iterator callback Patrick Steinhardt
@ 2024-02-20  9:06   ` Patrick Steinhardt
  2024-02-20  9:06   ` [PATCH v2 7/7] builtin/reflog: introduce subcommand to list reflogs Patrick Steinhardt
  2024-02-20 17:22   ` [PATCH v2 0/7] reflog: " Junio C Hamano
  7 siblings, 0 replies; 39+ messages in thread
From: Patrick Steinhardt @ 2024-02-20  9:06 UTC (permalink / raw)
  To: git; +Cc: Junio C Hamano

[-- Attachment #1: Type: text/plain, Size: 3407 bytes --]

The reflog iterator tries to resolve the corresponding ref for every
reflog that it is about to yield. Historically, this was done due to
multiple reasons:

  - It ensures that the refname is safe because we end up calling
    `check_refname_format()`. Also, non-conformant refnames are skipped
    altogether.

  - The iterator used to yield the resolved object ID as well as its
    flags to the callback. This info was never used though, and the
    corresponding parameters were dropped in the preceding commit.

  - When a ref is corrupt then the reflog is not emitted at all.

We're about to introduce a new `git reflog list` subcommand that will
print all reflogs that the refdb knows about. Skipping over reflogs
whose refs are corrupted would be quite counterproductive in this case
as the user would have no way to learn about reflogs which may still
exist in their repository to help and rescue such a corrupted ref. Thus,
the only remaining reason for why we'd want to resolve the ref is to
verify its refname.

Refactor the code to call `check_refname_format()` directly instead of
trying to resolve the ref. This is significantly more efficient given
that we don't have to hit the object database anymore to list reflogs.
And second, it ensures that we end up showing reflogs of broken refs,
which will help to make the reflog more useful.

Note that this really only impacts the case where the corresponding ref
is corrupt. Reflogs for nonexistent refs would have been returned to the
caller beforehand already as we did not pass `RESOLVE_REF_READING` to
the function, and thus `refs_resolve_ref_unsafe()` would have returned
successfully in that case.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
---
 refs/files-backend.c    | 12 ++----------
 refs/reftable-backend.c |  6 ++----
 2 files changed, 4 insertions(+), 14 deletions(-)

diff --git a/refs/files-backend.c b/refs/files-backend.c
index 48cc60d71b..4726b04baa 100644
--- a/refs/files-backend.c
+++ b/refs/files-backend.c
@@ -2129,17 +2129,9 @@ static int files_reflog_iterator_advance(struct ref_iterator *ref_iterator)
 	while ((ok = dir_iterator_advance(diter)) == ITER_OK) {
 		if (!S_ISREG(diter->st.st_mode))
 			continue;
-		if (diter->basename[0] == '.')
+		if (check_refname_format(diter->basename,
+					 REFNAME_ALLOW_ONELEVEL))
 			continue;
-		if (ends_with(diter->basename, ".lock"))
-			continue;
-
-		if (!refs_resolve_ref_unsafe(iter->ref_store,
-					     diter->relative_path, 0,
-					     NULL, NULL)) {
-			error("bad ref for %s", diter->path.buf);
-			continue;
-		}
 
 		iter->base.refname = diter->relative_path;
 		return ITER_OK;
diff --git a/refs/reftable-backend.c b/refs/reftable-backend.c
index 5247e09d58..f3200a1886 100644
--- a/refs/reftable-backend.c
+++ b/refs/reftable-backend.c
@@ -1659,11 +1659,9 @@ static int reftable_reflog_iterator_advance(struct ref_iterator *ref_iterator)
 		if (iter->last_name && !strcmp(iter->log.refname, iter->last_name))
 			continue;
 
-		if (!refs_resolve_ref_unsafe(&iter->refs->base, iter->log.refname,
-					     0, NULL, NULL)) {
-			error(_("bad ref for %s"), iter->log.refname);
+		if (check_refname_format(iter->log.refname,
+					 REFNAME_ALLOW_ONELEVEL))
 			continue;
-		}
 
 		free(iter->last_name);
 		iter->last_name = xstrdup(iter->log.refname);
-- 
2.44.0-rc1


[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

^ permalink raw reply related	[flat|nested] 39+ messages in thread

* [PATCH v2 7/7] builtin/reflog: introduce subcommand to list reflogs
  2024-02-20  9:06 ` [PATCH v2 0/7] reflog: " Patrick Steinhardt
                     ` (5 preceding siblings ...)
  2024-02-20  9:06   ` [PATCH v2 6/7] refs: stop resolving ref corresponding to reflogs Patrick Steinhardt
@ 2024-02-20  9:06   ` Patrick Steinhardt
  2024-04-24  7:30     ` Teng Long
  2024-02-20 17:22   ` [PATCH v2 0/7] reflog: " Junio C Hamano
  7 siblings, 1 reply; 39+ messages in thread
From: Patrick Steinhardt @ 2024-02-20  9:06 UTC (permalink / raw)
  To: git; +Cc: Junio C Hamano

[-- Attachment #1: Type: text/plain, Size: 6086 bytes --]

While the git-reflog(1) command has subcommands to show reflog entries
or check for reflog existence, it does not have any subcommands that
would allow the user to enumerate all existing reflogs. This makes it
quite hard to discover which reflogs a repository has. While this can
be worked around with the "files" backend by enumerating files in the
".git/logs" directory, users of the "reftable" backend don't enjoy such
a luxury.

Introduce a new subcommand `git reflog list` that lists all reflogs the
repository knows of to fill this gap.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
---
 Documentation/git-reflog.txt |  3 ++
 builtin/reflog.c             | 34 ++++++++++++++++++
 t/t1410-reflog.sh            | 69 ++++++++++++++++++++++++++++++++++++
 3 files changed, 106 insertions(+)

diff --git a/Documentation/git-reflog.txt b/Documentation/git-reflog.txt
index ec64cbff4c..a929c52982 100644
--- a/Documentation/git-reflog.txt
+++ b/Documentation/git-reflog.txt
@@ -10,6 +10,7 @@ SYNOPSIS
 --------
 [verse]
 'git reflog' [show] [<log-options>] [<ref>]
+'git reflog list'
 'git reflog expire' [--expire=<time>] [--expire-unreachable=<time>]
 	[--rewrite] [--updateref] [--stale-fix]
 	[--dry-run | -n] [--verbose] [--all [--single-worktree] | <refs>...]
@@ -39,6 +40,8 @@ actions, and in addition the `HEAD` reflog records branch switching.
 `git reflog show` is an alias for `git log -g --abbrev-commit
 --pretty=oneline`; see linkgit:git-log[1] for more information.
 
+The "list" subcommand lists all refs which have a corresponding reflog.
+
 The "expire" subcommand prunes older reflog entries. Entries older
 than `expire` time, or entries older than `expire-unreachable` time
 and not reachable from the current tip, are removed from the reflog.
diff --git a/builtin/reflog.c b/builtin/reflog.c
index 3a0c4d4322..63cd4d8b29 100644
--- a/builtin/reflog.c
+++ b/builtin/reflog.c
@@ -7,11 +7,15 @@
 #include "wildmatch.h"
 #include "worktree.h"
 #include "reflog.h"
+#include "refs.h"
 #include "parse-options.h"
 
 #define BUILTIN_REFLOG_SHOW_USAGE \
 	N_("git reflog [show] [<log-options>] [<ref>]")
 
+#define BUILTIN_REFLOG_LIST_USAGE \
+	N_("git reflog list")
+
 #define BUILTIN_REFLOG_EXPIRE_USAGE \
 	N_("git reflog expire [--expire=<time>] [--expire-unreachable=<time>]\n" \
 	   "                  [--rewrite] [--updateref] [--stale-fix]\n" \
@@ -29,6 +33,11 @@ static const char *const reflog_show_usage[] = {
 	NULL,
 };
 
+static const char *const reflog_list_usage[] = {
+	BUILTIN_REFLOG_LIST_USAGE,
+	NULL,
+};
+
 static const char *const reflog_expire_usage[] = {
 	BUILTIN_REFLOG_EXPIRE_USAGE,
 	NULL
@@ -46,6 +55,7 @@ static const char *const reflog_exists_usage[] = {
 
 static const char *const reflog_usage[] = {
 	BUILTIN_REFLOG_SHOW_USAGE,
+	BUILTIN_REFLOG_LIST_USAGE,
 	BUILTIN_REFLOG_EXPIRE_USAGE,
 	BUILTIN_REFLOG_DELETE_USAGE,
 	BUILTIN_REFLOG_EXISTS_USAGE,
@@ -238,6 +248,29 @@ static int cmd_reflog_show(int argc, const char **argv, const char *prefix)
 	return cmd_log_reflog(argc, argv, prefix);
 }
 
+static int show_reflog(const char *refname, void *cb_data UNUSED)
+{
+	printf("%s\n", refname);
+	return 0;
+}
+
+static int cmd_reflog_list(int argc, const char **argv, const char *prefix)
+{
+	struct option options[] = {
+		OPT_END()
+	};
+	struct ref_store *ref_store;
+
+	argc = parse_options(argc, argv, prefix, options, reflog_list_usage, 0);
+	if (argc)
+		return error(_("%s does not accept arguments: '%s'"),
+			     "list", argv[0]);
+
+	ref_store = get_main_ref_store(the_repository);
+
+	return refs_for_each_reflog(ref_store, show_reflog, NULL);
+}
+
 static int cmd_reflog_expire(int argc, const char **argv, const char *prefix)
 {
 	struct cmd_reflog_expire_cb cmd = { 0 };
@@ -417,6 +450,7 @@ int cmd_reflog(int argc, const char **argv, const char *prefix)
 	parse_opt_subcommand_fn *fn = NULL;
 	struct option options[] = {
 		OPT_SUBCOMMAND("show", &fn, cmd_reflog_show),
+		OPT_SUBCOMMAND("list", &fn, cmd_reflog_list),
 		OPT_SUBCOMMAND("expire", &fn, cmd_reflog_expire),
 		OPT_SUBCOMMAND("delete", &fn, cmd_reflog_delete),
 		OPT_SUBCOMMAND("exists", &fn, cmd_reflog_exists),
diff --git a/t/t1410-reflog.sh b/t/t1410-reflog.sh
index d2f5f42e67..6d8d5a253d 100755
--- a/t/t1410-reflog.sh
+++ b/t/t1410-reflog.sh
@@ -436,4 +436,73 @@ test_expect_success 'empty reflog' '
 	test_must_be_empty err
 '
 
+test_expect_success 'list reflogs' '
+	test_when_finished "rm -rf repo" &&
+	git init repo &&
+	(
+		cd repo &&
+		git reflog list >actual &&
+		test_must_be_empty actual &&
+
+		test_commit A &&
+		cat >expect <<-EOF &&
+		HEAD
+		refs/heads/main
+		EOF
+		git reflog list >actual &&
+		test_cmp expect actual &&
+
+		git branch b &&
+		cat >expect <<-EOF &&
+		HEAD
+		refs/heads/b
+		refs/heads/main
+		EOF
+		git reflog list >actual &&
+		test_cmp expect actual
+	)
+'
+
+test_expect_success 'reflog list returns error with additional args' '
+	cat >expect <<-EOF &&
+	error: list does not accept arguments: ${SQ}bogus${SQ}
+	EOF
+	test_must_fail git reflog list bogus 2>err &&
+	test_cmp expect err
+'
+
+test_expect_success 'reflog for symref with unborn target can be listed' '
+	test_when_finished "rm -rf repo" &&
+	git init repo &&
+	(
+		cd repo &&
+		test_commit A &&
+		git symbolic-ref HEAD refs/heads/unborn &&
+		cat >expect <<-EOF &&
+		HEAD
+		refs/heads/main
+		EOF
+		git reflog list >actual &&
+		test_cmp expect actual
+	)
+'
+
+test_expect_success 'reflog with invalid object ID can be listed' '
+	test_when_finished "rm -rf repo" &&
+	git init repo &&
+	(
+		cd repo &&
+		test_commit A &&
+		test-tool ref-store main update-ref msg refs/heads/missing \
+			$(test_oid deadbeef) "$ZERO_OID" REF_SKIP_OID_VERIFICATION &&
+		cat >expect <<-EOF &&
+		HEAD
+		refs/heads/main
+		refs/heads/missing
+		EOF
+		git reflog list >actual &&
+		test_cmp expect actual
+	)
+'
+
 test_done
-- 
2.44.0-rc1


[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

^ permalink raw reply related	[flat|nested] 39+ messages in thread

* Re: [PATCH v2 0/7] reflog: introduce subcommand to list reflogs
  2024-02-20  9:06 ` [PATCH v2 0/7] reflog: " Patrick Steinhardt
                     ` (6 preceding siblings ...)
  2024-02-20  9:06   ` [PATCH v2 7/7] builtin/reflog: introduce subcommand to list reflogs Patrick Steinhardt
@ 2024-02-20 17:22   ` Junio C Hamano
  2024-02-21 11:48     ` Patrick Steinhardt
  7 siblings, 1 reply; 39+ messages in thread
From: Junio C Hamano @ 2024-02-20 17:22 UTC (permalink / raw)
  To: Patrick Steinhardt; +Cc: git

Patrick Steinhardt <ps@pks.im> writes:

>       struct dir_iterator_level {
>       	DIR *dir;
>     + 
>     ++	/*
>     ++	 * The directory entries of the current level. This list will only be
>     ++	 * populated when the iterator is ordered. In that case, `dir` will be
>     ++	 * set to `NULL`.
>     ++	 */
>      +	struct string_list entries;
>      +	size_t entries_idx;

Reads well.  Nice.

>     ++static int next_directory_entry(DIR *dir, const char *path,
>     ++				struct dirent **out)
>     ++{
>     ++	struct dirent *de;
>     ++
>     ++repeat:
>     ++	errno = 0;
>     ++	de = readdir(dir);
>     ++	if (!de) {
>     ++		if (errno) {
>     ++			warning_errno("error reading directory '%s'",
>     ++				      path);
>     ++			return -1;
>     ++		}
>     ++
>     ++		return 1;
>     ++	}
>     ++
>     ++	if (is_dot_or_dotdot(de->d_name))
>     ++		goto repeat;
>     ++
>     ++	*out = de;
>     ++	return 0;
>     ++}

Very nice to encapsulate the common readdir() loop into this helper.

> 3:  e4e4fac05c ! 3:  32b24a3d4b refs/files: sort reflogs returned by the reflog iterator
>     @@ refs/files-backend.c: static struct ref_iterator *reflog_iterator_begin(struct r
>       	iter->dir_iterator = diter;
>       	iter->ref_store = ref_store;
>       	strbuf_release(&sb);
>     +@@ refs/files-backend.c: static struct ref_iterator *files_reflog_iterator_begin(struct ref_store *ref_st
>     + 		return reflog_iterator_begin(ref_store, refs->gitcommondir);
>     + 	} else {
>     + 		return merge_ref_iterator_begin(
>     +-			0, reflog_iterator_begin(ref_store, refs->base.gitdir),
>     ++			1, reflog_iterator_begin(ref_store, refs->base.gitdir),
>     + 			reflog_iterator_begin(ref_store, refs->gitcommondir),
>     + 			reflog_iterator_select, refs);
>     + 	}

This hunk is new.  Is there a downside to force merged iterators to
always be sorted?  The ones that are combined are all sorted so it
is natural to force sorting like this code does?  It might deserve
explaining, and would certainly help future readers who runs "blame"
on this code to figure out what made us think always sorting is a
good direction forward.

> -:  ---------- > 4:  4254f23fd4 refs: always treat iterators as ordered

This one is new, and deserves a separate review.

> 4:  be512ef268 ! 5:  240334df6c refs: drop unused params from the reflog iterator callback
>     @@ refs/reftable-backend.c: static int reftable_reflog_iterator_advance(struct ref_
>       	}
>      @@ refs/reftable-backend.c: static struct reftable_reflog_iterator *reflog_iterator_for_stack(struct reftabl
>       	iter = xcalloc(1, sizeof(*iter));
>     - 	base_ref_iterator_init(&iter->base, &reftable_reflog_iterator_vtable, 1);
>     + 	base_ref_iterator_init(&iter->base, &reftable_reflog_iterator_vtable);
>       	iter->refs = refs;
>      -	iter->base.oid = &iter->oid;
>       
> 5:  a7459b9483 = 6:  7928661318 refs: stop resolving ref corresponding to reflogs
> 6:  cddb2de939 = 7:  d7b9cff4c3 builtin/reflog: introduce subcommand to list reflogs

Looking good from a cursory read.

Thanks for a quick reroll.

^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [PATCH v2 0/7] reflog: introduce subcommand to list reflogs
  2024-02-20 17:22   ` [PATCH v2 0/7] reflog: " Junio C Hamano
@ 2024-02-21 11:48     ` Patrick Steinhardt
  0 siblings, 0 replies; 39+ messages in thread
From: Patrick Steinhardt @ 2024-02-21 11:48 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: git

[-- Attachment #1: Type: text/plain, Size: 1840 bytes --]

On Tue, Feb 20, 2024 at 09:22:36AM -0800, Junio C Hamano wrote:
> Patrick Steinhardt <ps@pks.im> writes:
[snip]
> > 3:  e4e4fac05c ! 3:  32b24a3d4b refs/files: sort reflogs returned by the reflog iterator
> >     @@ refs/files-backend.c: static struct ref_iterator *reflog_iterator_begin(struct r
> >       	iter->dir_iterator = diter;
> >       	iter->ref_store = ref_store;
> >       	strbuf_release(&sb);
> >     +@@ refs/files-backend.c: static struct ref_iterator *files_reflog_iterator_begin(struct ref_store *ref_st
> >     + 		return reflog_iterator_begin(ref_store, refs->gitcommondir);
> >     + 	} else {
> >     + 		return merge_ref_iterator_begin(
> >     +-			0, reflog_iterator_begin(ref_store, refs->base.gitdir),
> >     ++			1, reflog_iterator_begin(ref_store, refs->base.gitdir),
> >     + 			reflog_iterator_begin(ref_store, refs->gitcommondir),
> >     + 			reflog_iterator_select, refs);
> >     + 	}
> 
> This hunk is new.  Is there a downside to force merged iterators to
> always be sorted?  The ones that are combined are all sorted so it
> is natural to force sorting like this code does?  It might deserve
> explaining, and would certainly help future readers who runs "blame"
> on this code to figure out what made us think always sorting is a
> good direction forward.

Not really -- it merely gets passed down to the base ref iterator to
indicate that the entries are returned in lexicographic order. But I've
been jumping the gun here: the `reflog_iterator_select()` function does
not ensure lexicographic ordering between the two merged iterators right
now. I was assuming so because I implemented it in the reftable backend
like that. Should've double checked.

It's an easy fix though, which I'll add as another patch on top. Thanks
for making me think twice.

Patrick

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

^ permalink raw reply	[flat|nested] 39+ messages in thread

* [PATCH v3 0/8] reflog: introduce subcommand to list reflogs
  2024-02-19 14:35 [PATCH 0/6] reflog: introduce subcommand to list reflogs Patrick Steinhardt
                   ` (6 preceding siblings ...)
  2024-02-20  9:06 ` [PATCH v2 0/7] reflog: " Patrick Steinhardt
@ 2024-02-21 12:37 ` Patrick Steinhardt
  2024-02-21 12:37   ` [PATCH v3 1/8] dir-iterator: pass name to `prepare_next_entry_data()` directly Patrick Steinhardt
                     ` (7 more replies)
  7 siblings, 8 replies; 39+ messages in thread
From: Patrick Steinhardt @ 2024-02-21 12:37 UTC (permalink / raw)
  To: git; +Cc: Junio C Hamano

[-- Attachment #1: Type: text/plain, Size: 7246 bytes --]

Hi,

this is the second version of my patch series that introduces a new `git
reflog list` subcommand to list available reflogs in a repository.

There is only a single change compared to v2. Junio made me double check
my assumption that the select function for reflogs yielded by the merged
iterator in the "files" backend would return things lexicographically.
Turns out it didn't -- instead, it returns all worktree refs first, then
all common refs.

This isn't only an issue because I prematurely marked the merged iter as
sorted. It's also an issue because the user-visible ordering would be
quite weird when executing `git reflog list` in a worktree.

I've thus added another commit on top that extracts the preexisting
logic to merge worktree and common refs by name from the "reftable"
backend and reuses it for the "files" reflogs.

Patrick

Patrick Steinhardt (8):
  dir-iterator: pass name to `prepare_next_entry_data()` directly
  dir-iterator: support iteration in sorted order
  refs/files: sort reflogs returned by the reflog iterator
  refs/files: sort merged worktree and common reflogs
  refs: always treat iterators as ordered
  refs: drop unused params from the reflog iterator callback
  refs: stop resolving ref corresponding to reflogs
  builtin/reflog: introduce subcommand to list reflogs

 Documentation/git-reflog.txt   |   3 +
 builtin/fsck.c                 |   4 +-
 builtin/reflog.c               |  37 ++++++++++-
 dir-iterator.c                 | 105 +++++++++++++++++++++++++++-----
 dir-iterator.h                 |   3 +
 refs.c                         |  27 ++++++---
 refs.h                         |  11 +++-
 refs/debug.c                   |   3 +-
 refs/files-backend.c           |  55 +++--------------
 refs/iterator.c                |  69 +++++++++++++++------
 refs/packed-backend.c          |   2 +-
 refs/ref-cache.c               |   2 +-
 refs/refs-internal.h           |  27 ++++-----
 refs/reftable-backend.c        |  67 +++-----------------
 revision.c                     |   4 +-
 t/helper/test-ref-store.c      |  18 ++++--
 t/t0600-reffiles-backend.sh    |  24 ++++----
 t/t1405-main-ref-store.sh      |   8 +--
 t/t1406-submodule-ref-store.sh |   8 +--
 t/t1410-reflog.sh              | 108 +++++++++++++++++++++++++++++++++
 20 files changed, 380 insertions(+), 205 deletions(-)

Range-diff against v2:
1:  12de25dfe2 = 1:  d474f9cf77 dir-iterator: pass name to `prepare_next_entry_data()` directly
2:  788afce189 = 2:  89cf960d47 dir-iterator: support iteration in sorted order
3:  32b24a3d4b ! 3:  8ad63eb3f6 refs/files: sort reflogs returned by the reflog iterator
    @@ refs/files-backend.c: static struct ref_iterator *reflog_iterator_begin(struct r
      	iter->dir_iterator = diter;
      	iter->ref_store = ref_store;
      	strbuf_release(&sb);
    -@@ refs/files-backend.c: static struct ref_iterator *files_reflog_iterator_begin(struct ref_store *ref_st
    - 		return reflog_iterator_begin(ref_store, refs->gitcommondir);
    - 	} else {
    - 		return merge_ref_iterator_begin(
    --			0, reflog_iterator_begin(ref_store, refs->base.gitdir),
    -+			1, reflog_iterator_begin(ref_store, refs->base.gitdir),
    - 			reflog_iterator_begin(ref_store, refs->gitcommondir),
    - 			reflog_iterator_select, refs);
    - 	}
     
      ## t/t0600-reffiles-backend.sh ##
     @@ t/t0600-reffiles-backend.sh: test_expect_success 'for_each_reflog()' '
-:  ---------- > 4:  0b52f6c4af refs/files: sort merged worktree and common reflogs
4:  4254f23fd4 ! 5:  d44564c8b3 refs: always treat iterators as ordered
    @@ refs/files-backend.c: static struct ref_iterator *files_reflog_iterator_begin(st
     -			1, reflog_iterator_begin(ref_store, refs->base.gitdir),
     +			reflog_iterator_begin(ref_store, refs->base.gitdir),
      			reflog_iterator_begin(ref_store, refs->gitcommondir),
    - 			reflog_iterator_select, refs);
    + 			ref_iterator_select, refs);
      	}
     
      ## refs/iterator.c ##
    @@ refs/refs-internal.h: enum do_for_each_ref_flags {
      	const char *refname;
      	const struct object_id *oid;
      	unsigned int flags;
    -@@ refs/refs-internal.h: typedef enum iterator_selection ref_iterator_select_fn(
    +@@ refs/refs-internal.h: enum iterator_selection ref_iterator_select(struct ref_iterator *iter_worktree,
       * Iterate over the entries from iter0 and iter1, with the values
       * interleaved as directed by the select function. The iterator takes
       * ownership of iter0 and iter1 and frees them when the iteration is
    @@ refs/reftable-backend.c: static struct ref_iterator *reftable_be_iterator_begin(
      	worktree_iter = ref_iterator_for_stack(refs, refs->worktree_stack, prefix, flags);
     -	return merge_ref_iterator_begin(1, &worktree_iter->base, &main_iter->base,
     +	return merge_ref_iterator_begin(&worktree_iter->base, &main_iter->base,
    - 					iterator_select, NULL);
    + 					ref_iterator_select, NULL);
      }
      
     @@ refs/reftable-backend.c: static struct reftable_reflog_iterator *reflog_iterator_for_stack(struct reftabl
    @@ refs/reftable-backend.c: static struct ref_iterator *reftable_be_reflog_iterator
      
     -	return merge_ref_iterator_begin(1, &worktree_iter->base, &main_iter->base,
     +	return merge_ref_iterator_begin(&worktree_iter->base, &main_iter->base,
    - 					iterator_select, NULL);
    + 					ref_iterator_select, NULL);
      }
      
5:  240334df6c = 6:  c06fef8a64 refs: drop unused params from the reflog iterator callback
6:  7928661318 = 7:  fc96d5bbab refs: stop resolving ref corresponding to reflogs
7:  d7b9cff4c3 ! 8:  f3f50f3742 builtin/reflog: introduce subcommand to list reflogs
    @@ t/t1410-reflog.sh: test_expect_success 'empty reflog' '
     +	)
     +'
     +
    ++test_expect_success 'list reflogs with worktree' '
    ++	test_when_finished "rm -rf repo" &&
    ++	git init repo &&
    ++	(
    ++		cd repo &&
    ++
    ++		test_commit A &&
    ++		git worktree add wt &&
    ++		git -c core.logAllRefUpdates=always \
    ++			update-ref refs/worktree/main HEAD &&
    ++		git -c core.logAllRefUpdates=always \
    ++			update-ref refs/worktree/per-worktree HEAD &&
    ++		git -c core.logAllRefUpdates=always -C wt \
    ++			update-ref refs/worktree/per-worktree HEAD &&
    ++		git -c core.logAllRefUpdates=always -C wt \
    ++			update-ref refs/worktree/worktree HEAD &&
    ++
    ++		cat >expect <<-EOF &&
    ++		HEAD
    ++		refs/heads/main
    ++		refs/heads/wt
    ++		refs/worktree/main
    ++		refs/worktree/per-worktree
    ++		EOF
    ++		git reflog list >actual &&
    ++		test_cmp expect actual &&
    ++
    ++		cat >expect <<-EOF &&
    ++		HEAD
    ++		refs/heads/main
    ++		refs/heads/wt
    ++		refs/worktree/per-worktree
    ++		refs/worktree/worktree
    ++		EOF
    ++		git -C wt reflog list >actual &&
    ++		test_cmp expect actual
    ++	)
    ++'
    ++
     +test_expect_success 'reflog list returns error with additional args' '
     +	cat >expect <<-EOF &&
     +	error: list does not accept arguments: ${SQ}bogus${SQ}
-- 
2.44.0-rc1


[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

^ permalink raw reply	[flat|nested] 39+ messages in thread

* [PATCH v3 1/8] dir-iterator: pass name to `prepare_next_entry_data()` directly
  2024-02-21 12:37 ` [PATCH v3 0/8] " Patrick Steinhardt
@ 2024-02-21 12:37   ` Patrick Steinhardt
  2024-02-21 12:37   ` [PATCH v3 2/8] dir-iterator: support iteration in sorted order Patrick Steinhardt
                     ` (6 subsequent siblings)
  7 siblings, 0 replies; 39+ messages in thread
From: Patrick Steinhardt @ 2024-02-21 12:37 UTC (permalink / raw)
  To: git; +Cc: Junio C Hamano

[-- Attachment #1: Type: text/plain, Size: 1694 bytes --]

When adding the next directory entry for `struct dir_iterator` we pass
the complete `struct dirent *` to `prepare_next_entry_data()` even
though we only need the entry's name.

Refactor the code to pass in the name, only. This prepares for a
subsequent commit where we introduce the ability to iterate through
dir entries in an ordered manner.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
---
 dir-iterator.c | 8 ++++----
 1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/dir-iterator.c b/dir-iterator.c
index 278b04243a..f58a97e089 100644
--- a/dir-iterator.c
+++ b/dir-iterator.c
@@ -94,15 +94,15 @@ static int pop_level(struct dir_iterator_int *iter)
 
 /*
  * Populate iter->base with the necessary information on the next iteration
- * entry, represented by the given dirent de. Return 0 on success and -1
+ * entry, represented by the given name. Return 0 on success and -1
  * otherwise, setting errno accordingly.
  */
 static int prepare_next_entry_data(struct dir_iterator_int *iter,
-				   struct dirent *de)
+				   const char *name)
 {
 	int err, saved_errno;
 
-	strbuf_addstr(&iter->base.path, de->d_name);
+	strbuf_addstr(&iter->base.path, name);
 	/*
 	 * We have to reset these because the path strbuf might have
 	 * been realloc()ed at the previous strbuf_addstr().
@@ -159,7 +159,7 @@ int dir_iterator_advance(struct dir_iterator *dir_iterator)
 		if (is_dot_or_dotdot(de->d_name))
 			continue;
 
-		if (prepare_next_entry_data(iter, de)) {
+		if (prepare_next_entry_data(iter, de->d_name)) {
 			if (errno != ENOENT && iter->flags & DIR_ITERATOR_PEDANTIC)
 				goto error_out;
 			continue;
-- 
2.44.0-rc1


[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

^ permalink raw reply related	[flat|nested] 39+ messages in thread

* [PATCH v3 2/8] dir-iterator: support iteration in sorted order
  2024-02-21 12:37 ` [PATCH v3 0/8] " Patrick Steinhardt
  2024-02-21 12:37   ` [PATCH v3 1/8] dir-iterator: pass name to `prepare_next_entry_data()` directly Patrick Steinhardt
@ 2024-02-21 12:37   ` Patrick Steinhardt
  2024-02-21 12:37   ` [PATCH v3 3/8] refs/files: sort reflogs returned by the reflog iterator Patrick Steinhardt
                     ` (5 subsequent siblings)
  7 siblings, 0 replies; 39+ messages in thread
From: Patrick Steinhardt @ 2024-02-21 12:37 UTC (permalink / raw)
  To: git; +Cc: Junio C Hamano

[-- Attachment #1: Type: text/plain, Size: 6195 bytes --]

The `struct dir_iterator` is a helper that allows us to iterate through
directory entries. This iterator returns entries in the exact same order
as readdir(3P) does -- or in other words, it guarantees no specific
order at all.

This is about to become problematic as we are introducing a new reflog
subcommand to list reflogs. As the "files" backend uses the directory
iterator to enumerate reflogs, returning reflog names and exposing them
to the user would inherit the indeterministic ordering. Naturally, it
would make for a terrible user interface to show a list with no
discernible order.

While this could be handled at a higher level by the new subcommand
itself by collecting and ordering the reflogs, this would be inefficient
because we would first have to collect all reflogs before we can sort
them, which would introduce additional latency when there are many
reflogs.

Instead, introduce a new option into the directory iterator that asks
for its entries to be yielded in lexicographical order. If set, the
iterator will read all directory entries greedily and sort them before
we start to iterate over them.

While this will of course also incur overhead as we cannot yield the
directory entries immediately, it should at least be more efficient than
having to sort the complete list of reflogs as we only need to sort one
directory at a time.

This functionality will be used in a follow-up commit.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
---
 dir-iterator.c | 99 +++++++++++++++++++++++++++++++++++++++++++-------
 dir-iterator.h |  3 ++
 2 files changed, 89 insertions(+), 13 deletions(-)

diff --git a/dir-iterator.c b/dir-iterator.c
index f58a97e089..de619846f2 100644
--- a/dir-iterator.c
+++ b/dir-iterator.c
@@ -2,10 +2,19 @@
 #include "dir.h"
 #include "iterator.h"
 #include "dir-iterator.h"
+#include "string-list.h"
 
 struct dir_iterator_level {
 	DIR *dir;
 
+	/*
+	 * The directory entries of the current level. This list will only be
+	 * populated when the iterator is ordered. In that case, `dir` will be
+	 * set to `NULL`.
+	 */
+	struct string_list entries;
+	size_t entries_idx;
+
 	/*
 	 * The length of the directory part of path at this level
 	 * (including a trailing '/'):
@@ -43,6 +52,31 @@ struct dir_iterator_int {
 	unsigned int flags;
 };
 
+static int next_directory_entry(DIR *dir, const char *path,
+				struct dirent **out)
+{
+	struct dirent *de;
+
+repeat:
+	errno = 0;
+	de = readdir(dir);
+	if (!de) {
+		if (errno) {
+			warning_errno("error reading directory '%s'",
+				      path);
+			return -1;
+		}
+
+		return 1;
+	}
+
+	if (is_dot_or_dotdot(de->d_name))
+		goto repeat;
+
+	*out = de;
+	return 0;
+}
+
 /*
  * Push a level in the iter stack and initialize it with information from
  * the directory pointed by iter->base->path. It is assumed that this
@@ -72,6 +106,35 @@ static int push_level(struct dir_iterator_int *iter)
 		return -1;
 	}
 
+	string_list_init_dup(&level->entries);
+	level->entries_idx = 0;
+
+	/*
+	 * When the iterator is sorted we read and sort all directory entries
+	 * directly.
+	 */
+	if (iter->flags & DIR_ITERATOR_SORTED) {
+		struct dirent *de;
+
+		while (1) {
+			int ret = next_directory_entry(level->dir, iter->base.path.buf, &de);
+			if (ret < 0) {
+				if (errno != ENOENT &&
+				    iter->flags & DIR_ITERATOR_PEDANTIC)
+					return -1;
+				continue;
+			} else if (ret > 0) {
+				break;
+			}
+
+			string_list_append(&level->entries, de->d_name);
+		}
+		string_list_sort(&level->entries);
+
+		closedir(level->dir);
+		level->dir = NULL;
+	}
+
 	return 0;
 }
 
@@ -88,6 +151,7 @@ static int pop_level(struct dir_iterator_int *iter)
 		warning_errno("error closing directory '%s'",
 			      iter->base.path.buf);
 	level->dir = NULL;
+	string_list_clear(&level->entries, 0);
 
 	return --iter->levels_nr;
 }
@@ -139,27 +203,34 @@ int dir_iterator_advance(struct dir_iterator *dir_iterator)
 		struct dirent *de;
 		struct dir_iterator_level *level =
 			&iter->levels[iter->levels_nr - 1];
+		const char *name;
 
 		strbuf_setlen(&iter->base.path, level->prefix_len);
-		errno = 0;
-		de = readdir(level->dir);
 
-		if (!de) {
-			if (errno) {
-				warning_errno("error reading directory '%s'",
-					      iter->base.path.buf);
+		if (level->dir) {
+			int ret = next_directory_entry(level->dir, iter->base.path.buf, &de);
+			if (ret < 0) {
 				if (iter->flags & DIR_ITERATOR_PEDANTIC)
 					goto error_out;
-			} else if (pop_level(iter) == 0) {
-				return dir_iterator_abort(dir_iterator);
+				continue;
+			} else if (ret > 0) {
+				if (pop_level(iter) == 0)
+					return dir_iterator_abort(dir_iterator);
+				continue;
 			}
-			continue;
-		}
 
-		if (is_dot_or_dotdot(de->d_name))
-			continue;
+			name = de->d_name;
+		} else {
+			if (level->entries_idx >= level->entries.nr) {
+				if (pop_level(iter) == 0)
+					return dir_iterator_abort(dir_iterator);
+				continue;
+			}
 
-		if (prepare_next_entry_data(iter, de->d_name)) {
+			name = level->entries.items[level->entries_idx++].string;
+		}
+
+		if (prepare_next_entry_data(iter, name)) {
 			if (errno != ENOENT && iter->flags & DIR_ITERATOR_PEDANTIC)
 				goto error_out;
 			continue;
@@ -188,6 +259,8 @@ int dir_iterator_abort(struct dir_iterator *dir_iterator)
 			warning_errno("error closing directory '%s'",
 				      iter->base.path.buf);
 		}
+
+		string_list_clear(&level->entries, 0);
 	}
 
 	free(iter->levels);
diff --git a/dir-iterator.h b/dir-iterator.h
index 479e1ec784..6d438809b6 100644
--- a/dir-iterator.h
+++ b/dir-iterator.h
@@ -54,8 +54,11 @@
  *   and ITER_ERROR is returned immediately. In both cases, a meaningful
  *   warning is emitted. Note: ENOENT errors are always ignored so that
  *   the API users may remove files during iteration.
+ *
+ * - DIR_ITERATOR_SORTED: sort directory entries alphabetically.
  */
 #define DIR_ITERATOR_PEDANTIC (1 << 0)
+#define DIR_ITERATOR_SORTED   (1 << 1)
 
 struct dir_iterator {
 	/* The current path: */
-- 
2.44.0-rc1


[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

^ permalink raw reply related	[flat|nested] 39+ messages in thread

* [PATCH v3 3/8] refs/files: sort reflogs returned by the reflog iterator
  2024-02-21 12:37 ` [PATCH v3 0/8] " Patrick Steinhardt
  2024-02-21 12:37   ` [PATCH v3 1/8] dir-iterator: pass name to `prepare_next_entry_data()` directly Patrick Steinhardt
  2024-02-21 12:37   ` [PATCH v3 2/8] dir-iterator: support iteration in sorted order Patrick Steinhardt
@ 2024-02-21 12:37   ` Patrick Steinhardt
  2024-02-21 12:37   ` [PATCH v3 4/8] refs/files: sort merged worktree and common reflogs Patrick Steinhardt
                     ` (4 subsequent siblings)
  7 siblings, 0 replies; 39+ messages in thread
From: Patrick Steinhardt @ 2024-02-21 12:37 UTC (permalink / raw)
  To: git; +Cc: Junio C Hamano

[-- Attachment #1: Type: text/plain, Size: 3433 bytes --]

We use a directory iterator to return reflogs via the reflog iterator.
This iterator returns entries in the same order as readdir(3P) would and
will thus yield reflogs with no discernible order.

Set the new `DIR_ITERATOR_SORTED` flag that was introduced in the
preceding commit so that the order is deterministic. While the effect of
this can only been observed in a test tool, a subsequent commit will
start to expose this functionality to users via a new `git reflog list`
subcommand.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
---
 refs/files-backend.c           | 4 ++--
 t/t0600-reffiles-backend.sh    | 4 ++--
 t/t1405-main-ref-store.sh      | 2 +-
 t/t1406-submodule-ref-store.sh | 2 +-
 4 files changed, 6 insertions(+), 6 deletions(-)

diff --git a/refs/files-backend.c b/refs/files-backend.c
index 75dcc21ecb..2ffc63185f 100644
--- a/refs/files-backend.c
+++ b/refs/files-backend.c
@@ -2193,7 +2193,7 @@ static struct ref_iterator *reflog_iterator_begin(struct ref_store *ref_store,
 
 	strbuf_addf(&sb, "%s/logs", gitdir);
 
-	diter = dir_iterator_begin(sb.buf, 0);
+	diter = dir_iterator_begin(sb.buf, DIR_ITERATOR_SORTED);
 	if (!diter) {
 		strbuf_release(&sb);
 		return empty_ref_iterator_begin();
@@ -2202,7 +2202,7 @@ static struct ref_iterator *reflog_iterator_begin(struct ref_store *ref_store,
 	CALLOC_ARRAY(iter, 1);
 	ref_iterator = &iter->base;
 
-	base_ref_iterator_init(ref_iterator, &files_reflog_iterator_vtable, 0);
+	base_ref_iterator_init(ref_iterator, &files_reflog_iterator_vtable, 1);
 	iter->dir_iterator = diter;
 	iter->ref_store = ref_store;
 	strbuf_release(&sb);
diff --git a/t/t0600-reffiles-backend.sh b/t/t0600-reffiles-backend.sh
index e6a5f1868f..4f860285cc 100755
--- a/t/t0600-reffiles-backend.sh
+++ b/t/t0600-reffiles-backend.sh
@@ -287,7 +287,7 @@ test_expect_success 'for_each_reflog()' '
 	mkdir -p     .git/worktrees/wt/logs/refs/bisect &&
 	echo $ZERO_OID > .git/worktrees/wt/logs/refs/bisect/wt-random &&
 
-	$RWT for-each-reflog | cut -d" " -f 2- | sort >actual &&
+	$RWT for-each-reflog | cut -d" " -f 2- >actual &&
 	cat >expected <<-\EOF &&
 	HEAD 0x1
 	PSEUDO-WT 0x0
@@ -297,7 +297,7 @@ test_expect_success 'for_each_reflog()' '
 	EOF
 	test_cmp expected actual &&
 
-	$RMAIN for-each-reflog | cut -d" " -f 2- | sort >actual &&
+	$RMAIN for-each-reflog | cut -d" " -f 2- >actual &&
 	cat >expected <<-\EOF &&
 	HEAD 0x1
 	PSEUDO-MAIN 0x0
diff --git a/t/t1405-main-ref-store.sh b/t/t1405-main-ref-store.sh
index 976bd71efb..cfb583f544 100755
--- a/t/t1405-main-ref-store.sh
+++ b/t/t1405-main-ref-store.sh
@@ -74,7 +74,7 @@ test_expect_success 'verify_ref(new-main)' '
 '
 
 test_expect_success 'for_each_reflog()' '
-	$RUN for-each-reflog | sort -k2 | cut -d" " -f 2- >actual &&
+	$RUN for-each-reflog | cut -d" " -f 2- >actual &&
 	cat >expected <<-\EOF &&
 	HEAD 0x1
 	refs/heads/main 0x0
diff --git a/t/t1406-submodule-ref-store.sh b/t/t1406-submodule-ref-store.sh
index e6a7f7334b..40332e23cc 100755
--- a/t/t1406-submodule-ref-store.sh
+++ b/t/t1406-submodule-ref-store.sh
@@ -63,7 +63,7 @@ test_expect_success 'verify_ref(new-main)' '
 '
 
 test_expect_success 'for_each_reflog()' '
-	$RUN for-each-reflog | sort | cut -d" " -f 2- >actual &&
+	$RUN for-each-reflog | cut -d" " -f 2- >actual &&
 	cat >expected <<-\EOF &&
 	HEAD 0x1
 	refs/heads/main 0x0
-- 
2.44.0-rc1


[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

^ permalink raw reply related	[flat|nested] 39+ messages in thread

* [PATCH v3 4/8] refs/files: sort merged worktree and common reflogs
  2024-02-21 12:37 ` [PATCH v3 0/8] " Patrick Steinhardt
                     ` (2 preceding siblings ...)
  2024-02-21 12:37   ` [PATCH v3 3/8] refs/files: sort reflogs returned by the reflog iterator Patrick Steinhardt
@ 2024-02-21 12:37   ` Patrick Steinhardt
  2024-02-21 12:37   ` [PATCH v3 5/8] refs: always treat iterators as ordered Patrick Steinhardt
                     ` (3 subsequent siblings)
  7 siblings, 0 replies; 39+ messages in thread
From: Patrick Steinhardt @ 2024-02-21 12:37 UTC (permalink / raw)
  To: git; +Cc: Junio C Hamano

[-- Attachment #1: Type: text/plain, Size: 8134 bytes --]

When iterating through reflogs in a worktree we create a merged iterator
that merges reflogs from both refdbs. The resulting refs are ordered so
that instead we first return all worktree reflogs before we return all
common refs.

This is the only remaining case where a ref iterator returns entries in
a non-lexicographic order. The result would look something like the
following (listed with a command we introduce in a subsequent commit):

```
$ git reflog list
HEAD
refs/worktree/per-worktree
refs/heads/main
refs/heads/wt
```

So we first print the per-worktree reflogs in lexicographic order, then
the common reflogs in lexicographic order. This is confusing and not
consistent with how we print per-worktree refs, which are exclusively
sorted lexicographically.

Sort reflogs lexicographically in the same way as we sort normal refs.
As this is already implemented properly by the "reftable" backend via a
separate selection function, we simply pull out that logic and reuse it
for the "files" backend. As logs are properly sorted now, mark the
merged reflog iterator as sorted.

Tests will be added in a subsequent commit.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
---
 refs/files-backend.c    | 30 ++------------------------
 refs/iterator.c         | 43 +++++++++++++++++++++++++++++++++++++
 refs/refs-internal.h    |  9 ++++++++
 refs/reftable-backend.c | 47 ++---------------------------------------
 4 files changed, 56 insertions(+), 73 deletions(-)

diff --git a/refs/files-backend.c b/refs/files-backend.c
index 2ffc63185f..551cafdf76 100644
--- a/refs/files-backend.c
+++ b/refs/files-backend.c
@@ -2210,32 +2210,6 @@ static struct ref_iterator *reflog_iterator_begin(struct ref_store *ref_store,
 	return ref_iterator;
 }
 
-static enum iterator_selection reflog_iterator_select(
-	struct ref_iterator *iter_worktree,
-	struct ref_iterator *iter_common,
-	void *cb_data UNUSED)
-{
-	if (iter_worktree) {
-		/*
-		 * We're a bit loose here. We probably should ignore
-		 * common refs if they are accidentally added as
-		 * per-worktree refs.
-		 */
-		return ITER_SELECT_0;
-	} else if (iter_common) {
-		if (parse_worktree_ref(iter_common->refname, NULL, NULL,
-				       NULL) == REF_WORKTREE_SHARED)
-			return ITER_SELECT_1;
-
-		/*
-		 * The main ref store may contain main worktree's
-		 * per-worktree refs, which should be ignored
-		 */
-		return ITER_SKIP_1;
-	} else
-		return ITER_DONE;
-}
-
 static struct ref_iterator *files_reflog_iterator_begin(struct ref_store *ref_store)
 {
 	struct files_ref_store *refs =
@@ -2246,9 +2220,9 @@ static struct ref_iterator *files_reflog_iterator_begin(struct ref_store *ref_st
 		return reflog_iterator_begin(ref_store, refs->gitcommondir);
 	} else {
 		return merge_ref_iterator_begin(
-			0, reflog_iterator_begin(ref_store, refs->base.gitdir),
+			1, reflog_iterator_begin(ref_store, refs->base.gitdir),
 			reflog_iterator_begin(ref_store, refs->gitcommondir),
-			reflog_iterator_select, refs);
+			ref_iterator_select, refs);
 	}
 }
 
diff --git a/refs/iterator.c b/refs/iterator.c
index 6b680f610e..b7ab5dc92f 100644
--- a/refs/iterator.c
+++ b/refs/iterator.c
@@ -98,6 +98,49 @@ struct merge_ref_iterator {
 	struct ref_iterator **current;
 };
 
+enum iterator_selection ref_iterator_select(struct ref_iterator *iter_worktree,
+					    struct ref_iterator *iter_common,
+					    void *cb_data UNUSED)
+{
+	if (iter_worktree && !iter_common) {
+		/*
+		 * Return the worktree ref if there are no more common refs.
+		 */
+		return ITER_SELECT_0;
+	} else if (iter_common) {
+		/*
+		 * In case we have pending worktree and common refs we need to
+		 * yield them based on their lexicographical order. Worktree
+		 * refs that have the same name as common refs shadow the
+		 * latter.
+		 */
+		if (iter_worktree) {
+			int cmp = strcmp(iter_worktree->refname,
+					 iter_common->refname);
+			if (cmp < 0)
+				return ITER_SELECT_0;
+			else if (!cmp)
+				return ITER_SELECT_0_SKIP_1;
+		}
+
+		 /*
+		  * We now know that the lexicographically-next ref is a common
+		  * ref. When the common ref is a shared one we return it.
+		  */
+		if (parse_worktree_ref(iter_common->refname, NULL, NULL,
+				       NULL) == REF_WORKTREE_SHARED)
+			return ITER_SELECT_1;
+
+		/*
+		 * Otherwise, if the common ref is a per-worktree ref we skip
+		 * it because it would belong to the main worktree, not ours.
+		 */
+		return ITER_SKIP_1;
+	} else {
+		return ITER_DONE;
+	}
+}
+
 static int merge_ref_iterator_advance(struct ref_iterator *ref_iterator)
 {
 	struct merge_ref_iterator *iter =
diff --git a/refs/refs-internal.h b/refs/refs-internal.h
index 83e0f0bba3..51f612e122 100644
--- a/refs/refs-internal.h
+++ b/refs/refs-internal.h
@@ -386,6 +386,15 @@ typedef enum iterator_selection ref_iterator_select_fn(
 		struct ref_iterator *iter0, struct ref_iterator *iter1,
 		void *cb_data);
 
+/*
+ * An implementation of ref_iterator_select_fn that merges worktree and common
+ * refs. Per-worktree refs from the common iterator are ignored, worktree refs
+ * override common refs. Refs are selected lexicographically.
+ */
+enum iterator_selection ref_iterator_select(struct ref_iterator *iter_worktree,
+					    struct ref_iterator *iter_common,
+					    void *cb_data);
+
 /*
  * Iterate over the entries from iter0 and iter1, with the values
  * interleaved as directed by the select function. The iterator takes
diff --git a/refs/reftable-backend.c b/refs/reftable-backend.c
index a14f2ad7f4..68d32a9101 100644
--- a/refs/reftable-backend.c
+++ b/refs/reftable-backend.c
@@ -504,49 +504,6 @@ static struct reftable_ref_iterator *ref_iterator_for_stack(struct reftable_ref_
 	return iter;
 }
 
-static enum iterator_selection iterator_select(struct ref_iterator *iter_worktree,
-					       struct ref_iterator *iter_common,
-					       void *cb_data UNUSED)
-{
-	if (iter_worktree && !iter_common) {
-		/*
-		 * Return the worktree ref if there are no more common refs.
-		 */
-		return ITER_SELECT_0;
-	} else if (iter_common) {
-		/*
-		 * In case we have pending worktree and common refs we need to
-		 * yield them based on their lexicographical order. Worktree
-		 * refs that have the same name as common refs shadow the
-		 * latter.
-		 */
-		if (iter_worktree) {
-			int cmp = strcmp(iter_worktree->refname,
-					 iter_common->refname);
-			if (cmp < 0)
-				return ITER_SELECT_0;
-			else if (!cmp)
-				return ITER_SELECT_0_SKIP_1;
-		}
-
-		 /*
-		  * We now know that the lexicographically-next ref is a common
-		  * ref. When the common ref is a shared one we return it.
-		  */
-		if (parse_worktree_ref(iter_common->refname, NULL, NULL,
-				       NULL) == REF_WORKTREE_SHARED)
-			return ITER_SELECT_1;
-
-		/*
-		 * Otherwise, if the common ref is a per-worktree ref we skip
-		 * it because it would belong to the main worktree, not ours.
-		 */
-		return ITER_SKIP_1;
-	} else {
-		return ITER_DONE;
-	}
-}
-
 static struct ref_iterator *reftable_be_iterator_begin(struct ref_store *ref_store,
 						       const char *prefix,
 						       const char **exclude_patterns,
@@ -576,7 +533,7 @@ static struct ref_iterator *reftable_be_iterator_begin(struct ref_store *ref_sto
 	 */
 	worktree_iter = ref_iterator_for_stack(refs, refs->worktree_stack, prefix, flags);
 	return merge_ref_iterator_begin(1, &worktree_iter->base, &main_iter->base,
-					iterator_select, NULL);
+					ref_iterator_select, NULL);
 }
 
 static int reftable_be_read_raw_ref(struct ref_store *ref_store,
@@ -1759,7 +1716,7 @@ static struct ref_iterator *reftable_be_reflog_iterator_begin(struct ref_store *
 	worktree_iter = reflog_iterator_for_stack(refs, refs->worktree_stack);
 
 	return merge_ref_iterator_begin(1, &worktree_iter->base, &main_iter->base,
-					iterator_select, NULL);
+					ref_iterator_select, NULL);
 }
 
 static int yield_log_record(struct reftable_log_record *log,
-- 
2.44.0-rc1


[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

^ permalink raw reply related	[flat|nested] 39+ messages in thread

* [PATCH v3 5/8] refs: always treat iterators as ordered
  2024-02-21 12:37 ` [PATCH v3 0/8] " Patrick Steinhardt
                     ` (3 preceding siblings ...)
  2024-02-21 12:37   ` [PATCH v3 4/8] refs/files: sort merged worktree and common reflogs Patrick Steinhardt
@ 2024-02-21 12:37   ` Patrick Steinhardt
  2024-02-21 12:37   ` [PATCH v3 6/8] refs: drop unused params from the reflog iterator callback Patrick Steinhardt
                     ` (2 subsequent siblings)
  7 siblings, 0 replies; 39+ messages in thread
From: Patrick Steinhardt @ 2024-02-21 12:37 UTC (permalink / raw)
  To: git; +Cc: Junio C Hamano

[-- Attachment #1: Type: text/plain, Size: 11366 bytes --]

In the preceding commit we have converted the reflog iterator of the
"files" backend to be ordered, which was the only remaining ref iterator
that wasn't ordered. Refactor the ref iterator infrastructure so that we
always assume iterators to be ordered, thus simplifying the code.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
---
 refs.c                  |  4 ----
 refs/debug.c            |  3 +--
 refs/files-backend.c    |  7 +++----
 refs/iterator.c         | 26 ++++++++------------------
 refs/packed-backend.c   |  2 +-
 refs/ref-cache.c        |  2 +-
 refs/refs-internal.h    | 18 ++----------------
 refs/reftable-backend.c |  8 ++++----
 8 files changed, 20 insertions(+), 50 deletions(-)

diff --git a/refs.c b/refs.c
index fff343c256..dc25606a82 100644
--- a/refs.c
+++ b/refs.c
@@ -1594,10 +1594,6 @@ struct ref_iterator *refs_ref_iterator_begin(
 	if (trim)
 		iter = prefix_ref_iterator_begin(iter, "", trim);
 
-	/* Sanity check for subclasses: */
-	if (!iter->ordered)
-		BUG("reference iterator is not ordered");
-
 	return iter;
 }
 
diff --git a/refs/debug.c b/refs/debug.c
index 634681ca44..c7531b17f0 100644
--- a/refs/debug.c
+++ b/refs/debug.c
@@ -181,7 +181,6 @@ static int debug_ref_iterator_advance(struct ref_iterator *ref_iterator)
 		trace_printf_key(&trace_refs, "iterator_advance: %s (0)\n",
 			diter->iter->refname);
 
-	diter->base.ordered = diter->iter->ordered;
 	diter->base.refname = diter->iter->refname;
 	diter->base.oid = diter->iter->oid;
 	diter->base.flags = diter->iter->flags;
@@ -222,7 +221,7 @@ debug_ref_iterator_begin(struct ref_store *ref_store, const char *prefix,
 		drefs->refs->be->iterator_begin(drefs->refs, prefix,
 						exclude_patterns, flags);
 	struct debug_ref_iterator *diter = xcalloc(1, sizeof(*diter));
-	base_ref_iterator_init(&diter->base, &debug_ref_iterator_vtable, 1);
+	base_ref_iterator_init(&diter->base, &debug_ref_iterator_vtable);
 	diter->iter = res;
 	trace_printf_key(&trace_refs, "ref_iterator_begin: \"%s\" (0x%x)\n",
 			 prefix, flags);
diff --git a/refs/files-backend.c b/refs/files-backend.c
index 551cafdf76..05bb0c875c 100644
--- a/refs/files-backend.c
+++ b/refs/files-backend.c
@@ -879,8 +879,7 @@ static struct ref_iterator *files_ref_iterator_begin(
 
 	CALLOC_ARRAY(iter, 1);
 	ref_iterator = &iter->base;
-	base_ref_iterator_init(ref_iterator, &files_ref_iterator_vtable,
-			       overlay_iter->ordered);
+	base_ref_iterator_init(ref_iterator, &files_ref_iterator_vtable);
 	iter->iter0 = overlay_iter;
 	iter->repo = ref_store->repo;
 	iter->flags = flags;
@@ -2202,7 +2201,7 @@ static struct ref_iterator *reflog_iterator_begin(struct ref_store *ref_store,
 	CALLOC_ARRAY(iter, 1);
 	ref_iterator = &iter->base;
 
-	base_ref_iterator_init(ref_iterator, &files_reflog_iterator_vtable, 1);
+	base_ref_iterator_init(ref_iterator, &files_reflog_iterator_vtable);
 	iter->dir_iterator = diter;
 	iter->ref_store = ref_store;
 	strbuf_release(&sb);
@@ -2220,7 +2219,7 @@ static struct ref_iterator *files_reflog_iterator_begin(struct ref_store *ref_st
 		return reflog_iterator_begin(ref_store, refs->gitcommondir);
 	} else {
 		return merge_ref_iterator_begin(
-			1, reflog_iterator_begin(ref_store, refs->base.gitdir),
+			reflog_iterator_begin(ref_store, refs->base.gitdir),
 			reflog_iterator_begin(ref_store, refs->gitcommondir),
 			ref_iterator_select, refs);
 	}
diff --git a/refs/iterator.c b/refs/iterator.c
index b7ab5dc92f..9db8b056d5 100644
--- a/refs/iterator.c
+++ b/refs/iterator.c
@@ -25,11 +25,9 @@ int ref_iterator_abort(struct ref_iterator *ref_iterator)
 }
 
 void base_ref_iterator_init(struct ref_iterator *iter,
-			    struct ref_iterator_vtable *vtable,
-			    int ordered)
+			    struct ref_iterator_vtable *vtable)
 {
 	iter->vtable = vtable;
-	iter->ordered = !!ordered;
 	iter->refname = NULL;
 	iter->oid = NULL;
 	iter->flags = 0;
@@ -74,7 +72,7 @@ struct ref_iterator *empty_ref_iterator_begin(void)
 	struct empty_ref_iterator *iter = xcalloc(1, sizeof(*iter));
 	struct ref_iterator *ref_iterator = &iter->base;
 
-	base_ref_iterator_init(ref_iterator, &empty_ref_iterator_vtable, 1);
+	base_ref_iterator_init(ref_iterator, &empty_ref_iterator_vtable);
 	return ref_iterator;
 }
 
@@ -250,7 +248,6 @@ static struct ref_iterator_vtable merge_ref_iterator_vtable = {
 };
 
 struct ref_iterator *merge_ref_iterator_begin(
-		int ordered,
 		struct ref_iterator *iter0, struct ref_iterator *iter1,
 		ref_iterator_select_fn *select, void *cb_data)
 {
@@ -265,7 +262,7 @@ struct ref_iterator *merge_ref_iterator_begin(
 	 * references through only if they exist in both iterators.
 	 */
 
-	base_ref_iterator_init(ref_iterator, &merge_ref_iterator_vtable, ordered);
+	base_ref_iterator_init(ref_iterator, &merge_ref_iterator_vtable);
 	iter->iter0 = iter0;
 	iter->iter1 = iter1;
 	iter->select = select;
@@ -314,12 +311,9 @@ struct ref_iterator *overlay_ref_iterator_begin(
 	} else if (is_empty_ref_iterator(back)) {
 		ref_iterator_abort(back);
 		return front;
-	} else if (!front->ordered || !back->ordered) {
-		BUG("overlay_ref_iterator requires ordered inputs");
 	}
 
-	return merge_ref_iterator_begin(1, front, back,
-					overlay_iterator_select, NULL);
+	return merge_ref_iterator_begin(front, back, overlay_iterator_select, NULL);
 }
 
 struct prefix_ref_iterator {
@@ -358,16 +352,12 @@ static int prefix_ref_iterator_advance(struct ref_iterator *ref_iterator)
 
 		if (cmp > 0) {
 			/*
-			 * If the source iterator is ordered, then we
+			 * As the source iterator is ordered, we
 			 * can stop the iteration as soon as we see a
 			 * refname that comes after the prefix:
 			 */
-			if (iter->iter0->ordered) {
-				ok = ref_iterator_abort(iter->iter0);
-				break;
-			} else {
-				continue;
-			}
+			ok = ref_iterator_abort(iter->iter0);
+			break;
 		}
 
 		if (iter->trim) {
@@ -439,7 +429,7 @@ struct ref_iterator *prefix_ref_iterator_begin(struct ref_iterator *iter0,
 	CALLOC_ARRAY(iter, 1);
 	ref_iterator = &iter->base;
 
-	base_ref_iterator_init(ref_iterator, &prefix_ref_iterator_vtable, iter0->ordered);
+	base_ref_iterator_init(ref_iterator, &prefix_ref_iterator_vtable);
 
 	iter->iter0 = iter0;
 	iter->prefix = xstrdup(prefix);
diff --git a/refs/packed-backend.c b/refs/packed-backend.c
index a499a91c7e..4e826c05ff 100644
--- a/refs/packed-backend.c
+++ b/refs/packed-backend.c
@@ -1111,7 +1111,7 @@ static struct ref_iterator *packed_ref_iterator_begin(
 
 	CALLOC_ARRAY(iter, 1);
 	ref_iterator = &iter->base;
-	base_ref_iterator_init(ref_iterator, &packed_ref_iterator_vtable, 1);
+	base_ref_iterator_init(ref_iterator, &packed_ref_iterator_vtable);
 
 	if (exclude_patterns)
 		populate_excluded_jump_list(iter, snapshot, exclude_patterns);
diff --git a/refs/ref-cache.c b/refs/ref-cache.c
index a372a00941..9f9797209a 100644
--- a/refs/ref-cache.c
+++ b/refs/ref-cache.c
@@ -486,7 +486,7 @@ struct ref_iterator *cache_ref_iterator_begin(struct ref_cache *cache,
 
 	CALLOC_ARRAY(iter, 1);
 	ref_iterator = &iter->base;
-	base_ref_iterator_init(ref_iterator, &cache_ref_iterator_vtable, 1);
+	base_ref_iterator_init(ref_iterator, &cache_ref_iterator_vtable);
 	ALLOC_GROW(iter->levels, 10, iter->levels_alloc);
 
 	iter->levels_nr = 1;
diff --git a/refs/refs-internal.h b/refs/refs-internal.h
index 51f612e122..a9b6e887f8 100644
--- a/refs/refs-internal.h
+++ b/refs/refs-internal.h
@@ -312,13 +312,6 @@ enum do_for_each_ref_flags {
  */
 struct ref_iterator {
 	struct ref_iterator_vtable *vtable;
-
-	/*
-	 * Does this `ref_iterator` iterate over references in order
-	 * by refname?
-	 */
-	unsigned int ordered : 1;
-
 	const char *refname;
 	const struct object_id *oid;
 	unsigned int flags;
@@ -399,11 +392,9 @@ enum iterator_selection ref_iterator_select(struct ref_iterator *iter_worktree,
  * Iterate over the entries from iter0 and iter1, with the values
  * interleaved as directed by the select function. The iterator takes
  * ownership of iter0 and iter1 and frees them when the iteration is
- * over. A derived class should set `ordered` to 1 or 0 based on
- * whether it generates its output in order by reference name.
+ * over.
  */
 struct ref_iterator *merge_ref_iterator_begin(
-		int ordered,
 		struct ref_iterator *iter0, struct ref_iterator *iter1,
 		ref_iterator_select_fn *select, void *cb_data);
 
@@ -432,8 +423,6 @@ struct ref_iterator *overlay_ref_iterator_begin(
  * As an convenience to callers, if prefix is the empty string and
  * trim is zero, this function returns iter0 directly, without
  * wrapping it.
- *
- * The resulting ref_iterator is ordered if iter0 is.
  */
 struct ref_iterator *prefix_ref_iterator_begin(struct ref_iterator *iter0,
 					       const char *prefix,
@@ -444,14 +433,11 @@ struct ref_iterator *prefix_ref_iterator_begin(struct ref_iterator *iter0,
 /*
  * Base class constructor for ref_iterators. Initialize the
  * ref_iterator part of iter, setting its vtable pointer as specified.
- * `ordered` should be set to 1 if the iterator will iterate over
- * references in order by refname; otherwise it should be set to 0.
  * This is meant to be called only by the initializers of derived
  * classes.
  */
 void base_ref_iterator_init(struct ref_iterator *iter,
-			    struct ref_iterator_vtable *vtable,
-			    int ordered);
+			    struct ref_iterator_vtable *vtable);
 
 /*
  * Base class destructor for ref_iterators. Destroy the ref_iterator
diff --git a/refs/reftable-backend.c b/refs/reftable-backend.c
index 68d32a9101..39e9a9d4e2 100644
--- a/refs/reftable-backend.c
+++ b/refs/reftable-backend.c
@@ -479,7 +479,7 @@ static struct reftable_ref_iterator *ref_iterator_for_stack(struct reftable_ref_
 	int ret;
 
 	iter = xcalloc(1, sizeof(*iter));
-	base_ref_iterator_init(&iter->base, &reftable_ref_iterator_vtable, 1);
+	base_ref_iterator_init(&iter->base, &reftable_ref_iterator_vtable);
 	iter->prefix = prefix;
 	iter->base.oid = &iter->oid;
 	iter->flags = flags;
@@ -532,7 +532,7 @@ static struct ref_iterator *reftable_be_iterator_begin(struct ref_store *ref_sto
 	 * single iterator.
 	 */
 	worktree_iter = ref_iterator_for_stack(refs, refs->worktree_stack, prefix, flags);
-	return merge_ref_iterator_begin(1, &worktree_iter->base, &main_iter->base,
+	return merge_ref_iterator_begin(&worktree_iter->base, &main_iter->base,
 					ref_iterator_select, NULL);
 }
 
@@ -1680,7 +1680,7 @@ static struct reftable_reflog_iterator *reflog_iterator_for_stack(struct reftabl
 	int ret;
 
 	iter = xcalloc(1, sizeof(*iter));
-	base_ref_iterator_init(&iter->base, &reftable_reflog_iterator_vtable, 1);
+	base_ref_iterator_init(&iter->base, &reftable_reflog_iterator_vtable);
 	iter->refs = refs;
 	iter->base.oid = &iter->oid;
 
@@ -1715,7 +1715,7 @@ static struct ref_iterator *reftable_be_reflog_iterator_begin(struct ref_store *
 
 	worktree_iter = reflog_iterator_for_stack(refs, refs->worktree_stack);
 
-	return merge_ref_iterator_begin(1, &worktree_iter->base, &main_iter->base,
+	return merge_ref_iterator_begin(&worktree_iter->base, &main_iter->base,
 					ref_iterator_select, NULL);
 }
 
-- 
2.44.0-rc1


[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

^ permalink raw reply related	[flat|nested] 39+ messages in thread

* [PATCH v3 6/8] refs: drop unused params from the reflog iterator callback
  2024-02-21 12:37 ` [PATCH v3 0/8] " Patrick Steinhardt
                     ` (4 preceding siblings ...)
  2024-02-21 12:37   ` [PATCH v3 5/8] refs: always treat iterators as ordered Patrick Steinhardt
@ 2024-02-21 12:37   ` Patrick Steinhardt
  2024-02-21 12:37   ` [PATCH v3 7/8] refs: stop resolving ref corresponding to reflogs Patrick Steinhardt
  2024-02-21 12:37   ` [PATCH v3 8/8] builtin/reflog: introduce subcommand to list reflogs Patrick Steinhardt
  7 siblings, 0 replies; 39+ messages in thread
From: Patrick Steinhardt @ 2024-02-21 12:37 UTC (permalink / raw)
  To: git; +Cc: Junio C Hamano

[-- Attachment #1: Type: text/plain, Size: 12615 bytes --]

The ref and reflog iterators share much of the same underlying code to
iterate over the corresponding entries. This results in some weird code
because the reflog iterator also exposes an object ID as well as a flag
to the callback function. Neither of these fields do refer to the reflog
though -- they refer to the corresponding ref with the same name. This
is quite misleading. In practice at least the object ID cannot really be
implemented in any other way as a reflog does not have a specific object
ID in the first place. This is further stressed by the fact that none of
the callbacks except for our test helper make use of these fields.

Split up the infrastucture so that ref and reflog iterators use separate
callback signatures. This allows us to drop the nonsensical fields from
the reflog iterator.

Note that internally, the backends still use the same shared infra to
iterate over both types. As the backends should never end up being
called directly anyway, this is not much of a problem and thus kept
as-is for simplicity's sake.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
---
 builtin/fsck.c                 |  4 +---
 builtin/reflog.c               |  3 +--
 refs.c                         | 23 +++++++++++++++++++----
 refs.h                         | 11 +++++++++--
 refs/files-backend.c           |  8 +-------
 refs/reftable-backend.c        |  8 +-------
 revision.c                     |  4 +---
 t/helper/test-ref-store.c      | 18 ++++++++++++------
 t/t0600-reffiles-backend.sh    | 24 ++++++++++++------------
 t/t1405-main-ref-store.sh      |  8 ++++----
 t/t1406-submodule-ref-store.sh |  8 ++++----
 11 files changed, 65 insertions(+), 54 deletions(-)

diff --git a/builtin/fsck.c b/builtin/fsck.c
index a7cf94f67e..f892487c9b 100644
--- a/builtin/fsck.c
+++ b/builtin/fsck.c
@@ -509,9 +509,7 @@ static int fsck_handle_reflog_ent(struct object_id *ooid, struct object_id *noid
 	return 0;
 }
 
-static int fsck_handle_reflog(const char *logname,
-			      const struct object_id *oid UNUSED,
-			      int flag UNUSED, void *cb_data)
+static int fsck_handle_reflog(const char *logname, void *cb_data)
 {
 	struct strbuf refname = STRBUF_INIT;
 
diff --git a/builtin/reflog.c b/builtin/reflog.c
index a5a4099f61..3a0c4d4322 100644
--- a/builtin/reflog.c
+++ b/builtin/reflog.c
@@ -60,8 +60,7 @@ struct worktree_reflogs {
 	struct string_list reflogs;
 };
 
-static int collect_reflog(const char *ref, const struct object_id *oid UNUSED,
-			  int flags UNUSED, void *cb_data)
+static int collect_reflog(const char *ref, void *cb_data)
 {
 	struct worktree_reflogs *cb = cb_data;
 	struct worktree *worktree = cb->worktree;
diff --git a/refs.c b/refs.c
index dc25606a82..f9261267f0 100644
--- a/refs.c
+++ b/refs.c
@@ -2512,18 +2512,33 @@ int refs_verify_refname_available(struct ref_store *refs,
 	return ret;
 }
 
-int refs_for_each_reflog(struct ref_store *refs, each_ref_fn fn, void *cb_data)
+struct do_for_each_reflog_help {
+	each_reflog_fn *fn;
+	void *cb_data;
+};
+
+static int do_for_each_reflog_helper(struct repository *r UNUSED,
+				     const char *refname,
+				     const struct object_id *oid UNUSED,
+				     int flags,
+				     void *cb_data)
+{
+	struct do_for_each_reflog_help *hp = cb_data;
+	return hp->fn(refname, hp->cb_data);
+}
+
+int refs_for_each_reflog(struct ref_store *refs, each_reflog_fn fn, void *cb_data)
 {
 	struct ref_iterator *iter;
-	struct do_for_each_ref_help hp = { fn, cb_data };
+	struct do_for_each_reflog_help hp = { fn, cb_data };
 
 	iter = refs->be->reflog_iterator_begin(refs);
 
 	return do_for_each_repo_ref_iterator(the_repository, iter,
-					     do_for_each_ref_helper, &hp);
+					     do_for_each_reflog_helper, &hp);
 }
 
-int for_each_reflog(each_ref_fn fn, void *cb_data)
+int for_each_reflog(each_reflog_fn fn, void *cb_data)
 {
 	return refs_for_each_reflog(get_main_ref_store(the_repository), fn, cb_data);
 }
diff --git a/refs.h b/refs.h
index 303c5fac4d..895579aeb7 100644
--- a/refs.h
+++ b/refs.h
@@ -534,12 +534,19 @@ int for_each_reflog_ent(const char *refname, each_reflog_ent_fn fn, void *cb_dat
 /* youngest entry first */
 int for_each_reflog_ent_reverse(const char *refname, each_reflog_ent_fn fn, void *cb_data);
 
+/*
+ * The signature for the callback function for the {refs_,}for_each_reflog()
+ * functions below. The memory pointed to by the refname argument is only
+ * guaranteed to be valid for the duration of a single callback invocation.
+ */
+typedef int each_reflog_fn(const char *refname, void *cb_data);
+
 /*
  * Calls the specified function for each reflog file until it returns nonzero,
  * and returns the value. Reflog file order is unspecified.
  */
-int refs_for_each_reflog(struct ref_store *refs, each_ref_fn fn, void *cb_data);
-int for_each_reflog(each_ref_fn fn, void *cb_data);
+int refs_for_each_reflog(struct ref_store *refs, each_reflog_fn fn, void *cb_data);
+int for_each_reflog(each_reflog_fn fn, void *cb_data);
 
 #define REFNAME_ALLOW_ONELEVEL 1
 #define REFNAME_REFSPEC_PATTERN 2
diff --git a/refs/files-backend.c b/refs/files-backend.c
index 05bb0c875c..c7aff6b331 100644
--- a/refs/files-backend.c
+++ b/refs/files-backend.c
@@ -2115,10 +2115,8 @@ static int files_for_each_reflog_ent(struct ref_store *ref_store,
 
 struct files_reflog_iterator {
 	struct ref_iterator base;
-
 	struct ref_store *ref_store;
 	struct dir_iterator *dir_iterator;
-	struct object_id oid;
 };
 
 static int files_reflog_iterator_advance(struct ref_iterator *ref_iterator)
@@ -2129,8 +2127,6 @@ static int files_reflog_iterator_advance(struct ref_iterator *ref_iterator)
 	int ok;
 
 	while ((ok = dir_iterator_advance(diter)) == ITER_OK) {
-		int flags;
-
 		if (!S_ISREG(diter->st.st_mode))
 			continue;
 		if (diter->basename[0] == '.')
@@ -2140,14 +2136,12 @@ static int files_reflog_iterator_advance(struct ref_iterator *ref_iterator)
 
 		if (!refs_resolve_ref_unsafe(iter->ref_store,
 					     diter->relative_path, 0,
-					     &iter->oid, &flags)) {
+					     NULL, NULL)) {
 			error("bad ref for %s", diter->path.buf);
 			continue;
 		}
 
 		iter->base.refname = diter->relative_path;
-		iter->base.oid = &iter->oid;
-		iter->base.flags = flags;
 		return ITER_OK;
 	}
 
diff --git a/refs/reftable-backend.c b/refs/reftable-backend.c
index 39e9a9d4e2..4998b676c2 100644
--- a/refs/reftable-backend.c
+++ b/refs/reftable-backend.c
@@ -1594,7 +1594,6 @@ struct reftable_reflog_iterator {
 	struct reftable_ref_store *refs;
 	struct reftable_iterator iter;
 	struct reftable_log_record log;
-	struct object_id oid;
 	char *last_name;
 	int err;
 };
@@ -1605,8 +1604,6 @@ static int reftable_reflog_iterator_advance(struct ref_iterator *ref_iterator)
 		(struct reftable_reflog_iterator *)ref_iterator;
 
 	while (!iter->err) {
-		int flags;
-
 		iter->err = reftable_iterator_next_log(&iter->iter, &iter->log);
 		if (iter->err)
 			break;
@@ -1620,7 +1617,7 @@ static int reftable_reflog_iterator_advance(struct ref_iterator *ref_iterator)
 			continue;
 
 		if (!refs_resolve_ref_unsafe(&iter->refs->base, iter->log.refname,
-					     0, &iter->oid, &flags)) {
+					     0, NULL, NULL)) {
 			error(_("bad ref for %s"), iter->log.refname);
 			continue;
 		}
@@ -1628,8 +1625,6 @@ static int reftable_reflog_iterator_advance(struct ref_iterator *ref_iterator)
 		free(iter->last_name);
 		iter->last_name = xstrdup(iter->log.refname);
 		iter->base.refname = iter->log.refname;
-		iter->base.oid = &iter->oid;
-		iter->base.flags = flags;
 
 		break;
 	}
@@ -1682,7 +1677,6 @@ static struct reftable_reflog_iterator *reflog_iterator_for_stack(struct reftabl
 	iter = xcalloc(1, sizeof(*iter));
 	base_ref_iterator_init(&iter->base, &reftable_reflog_iterator_vtable);
 	iter->refs = refs;
-	iter->base.oid = &iter->oid;
 
 	ret = refs->err;
 	if (ret)
diff --git a/revision.c b/revision.c
index 2424c9bd67..ac45c6d8f2 100644
--- a/revision.c
+++ b/revision.c
@@ -1686,9 +1686,7 @@ static int handle_one_reflog_ent(struct object_id *ooid, struct object_id *noid,
 	return 0;
 }
 
-static int handle_one_reflog(const char *refname_in_wt,
-			     const struct object_id *oid UNUSED,
-			     int flag UNUSED, void *cb_data)
+static int handle_one_reflog(const char *refname_in_wt, void *cb_data)
 {
 	struct all_refs_cb *cb = cb_data;
 	struct strbuf refname = STRBUF_INIT;
diff --git a/t/helper/test-ref-store.c b/t/helper/test-ref-store.c
index 702ec1f128..7a0f6cac53 100644
--- a/t/helper/test-ref-store.c
+++ b/t/helper/test-ref-store.c
@@ -221,15 +221,21 @@ static int cmd_verify_ref(struct ref_store *refs, const char **argv)
 	return ret;
 }
 
+static int each_reflog(const char *refname, void *cb_data UNUSED)
+{
+	printf("%s\n", refname);
+	return 0;
+}
+
 static int cmd_for_each_reflog(struct ref_store *refs,
 			       const char **argv UNUSED)
 {
-	return refs_for_each_reflog(refs, each_ref, NULL);
+	return refs_for_each_reflog(refs, each_reflog, NULL);
 }
 
-static int each_reflog(struct object_id *old_oid, struct object_id *new_oid,
-		       const char *committer, timestamp_t timestamp,
-		       int tz, const char *msg, void *cb_data UNUSED)
+static int each_reflog_ent(struct object_id *old_oid, struct object_id *new_oid,
+			   const char *committer, timestamp_t timestamp,
+			   int tz, const char *msg, void *cb_data UNUSED)
 {
 	printf("%s %s %s %" PRItime " %+05d%s%s", oid_to_hex(old_oid),
 	       oid_to_hex(new_oid), committer, timestamp, tz,
@@ -241,14 +247,14 @@ static int cmd_for_each_reflog_ent(struct ref_store *refs, const char **argv)
 {
 	const char *refname = notnull(*argv++, "refname");
 
-	return refs_for_each_reflog_ent(refs, refname, each_reflog, refs);
+	return refs_for_each_reflog_ent(refs, refname, each_reflog_ent, refs);
 }
 
 static int cmd_for_each_reflog_ent_reverse(struct ref_store *refs, const char **argv)
 {
 	const char *refname = notnull(*argv++, "refname");
 
-	return refs_for_each_reflog_ent_reverse(refs, refname, each_reflog, refs);
+	return refs_for_each_reflog_ent_reverse(refs, refname, each_reflog_ent, refs);
 }
 
 static int cmd_reflog_exists(struct ref_store *refs, const char **argv)
diff --git a/t/t0600-reffiles-backend.sh b/t/t0600-reffiles-backend.sh
index 4f860285cc..56a3196b83 100755
--- a/t/t0600-reffiles-backend.sh
+++ b/t/t0600-reffiles-backend.sh
@@ -287,23 +287,23 @@ test_expect_success 'for_each_reflog()' '
 	mkdir -p     .git/worktrees/wt/logs/refs/bisect &&
 	echo $ZERO_OID > .git/worktrees/wt/logs/refs/bisect/wt-random &&
 
-	$RWT for-each-reflog | cut -d" " -f 2- >actual &&
+	$RWT for-each-reflog >actual &&
 	cat >expected <<-\EOF &&
-	HEAD 0x1
-	PSEUDO-WT 0x0
-	refs/bisect/wt-random 0x0
-	refs/heads/main 0x0
-	refs/heads/wt-main 0x0
+	HEAD
+	PSEUDO-WT
+	refs/bisect/wt-random
+	refs/heads/main
+	refs/heads/wt-main
 	EOF
 	test_cmp expected actual &&
 
-	$RMAIN for-each-reflog | cut -d" " -f 2- >actual &&
+	$RMAIN for-each-reflog >actual &&
 	cat >expected <<-\EOF &&
-	HEAD 0x1
-	PSEUDO-MAIN 0x0
-	refs/bisect/random 0x0
-	refs/heads/main 0x0
-	refs/heads/wt-main 0x0
+	HEAD
+	PSEUDO-MAIN
+	refs/bisect/random
+	refs/heads/main
+	refs/heads/wt-main
 	EOF
 	test_cmp expected actual
 '
diff --git a/t/t1405-main-ref-store.sh b/t/t1405-main-ref-store.sh
index cfb583f544..3eee758bce 100755
--- a/t/t1405-main-ref-store.sh
+++ b/t/t1405-main-ref-store.sh
@@ -74,11 +74,11 @@ test_expect_success 'verify_ref(new-main)' '
 '
 
 test_expect_success 'for_each_reflog()' '
-	$RUN for-each-reflog | cut -d" " -f 2- >actual &&
+	$RUN for-each-reflog >actual &&
 	cat >expected <<-\EOF &&
-	HEAD 0x1
-	refs/heads/main 0x0
-	refs/heads/new-main 0x0
+	HEAD
+	refs/heads/main
+	refs/heads/new-main
 	EOF
 	test_cmp expected actual
 '
diff --git a/t/t1406-submodule-ref-store.sh b/t/t1406-submodule-ref-store.sh
index 40332e23cc..c01f0f14a1 100755
--- a/t/t1406-submodule-ref-store.sh
+++ b/t/t1406-submodule-ref-store.sh
@@ -63,11 +63,11 @@ test_expect_success 'verify_ref(new-main)' '
 '
 
 test_expect_success 'for_each_reflog()' '
-	$RUN for-each-reflog | cut -d" " -f 2- >actual &&
+	$RUN for-each-reflog >actual &&
 	cat >expected <<-\EOF &&
-	HEAD 0x1
-	refs/heads/main 0x0
-	refs/heads/new-main 0x0
+	HEAD
+	refs/heads/main
+	refs/heads/new-main
 	EOF
 	test_cmp expected actual
 '
-- 
2.44.0-rc1


[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

^ permalink raw reply related	[flat|nested] 39+ messages in thread

* [PATCH v3 7/8] refs: stop resolving ref corresponding to reflogs
  2024-02-21 12:37 ` [PATCH v3 0/8] " Patrick Steinhardt
                     ` (5 preceding siblings ...)
  2024-02-21 12:37   ` [PATCH v3 6/8] refs: drop unused params from the reflog iterator callback Patrick Steinhardt
@ 2024-02-21 12:37   ` Patrick Steinhardt
  2024-02-21 12:37   ` [PATCH v3 8/8] builtin/reflog: introduce subcommand to list reflogs Patrick Steinhardt
  7 siblings, 0 replies; 39+ messages in thread
From: Patrick Steinhardt @ 2024-02-21 12:37 UTC (permalink / raw)
  To: git; +Cc: Junio C Hamano

[-- Attachment #1: Type: text/plain, Size: 3407 bytes --]

The reflog iterator tries to resolve the corresponding ref for every
reflog that it is about to yield. Historically, this was done due to
multiple reasons:

  - It ensures that the refname is safe because we end up calling
    `check_refname_format()`. Also, non-conformant refnames are skipped
    altogether.

  - The iterator used to yield the resolved object ID as well as its
    flags to the callback. This info was never used though, and the
    corresponding parameters were dropped in the preceding commit.

  - When a ref is corrupt then the reflog is not emitted at all.

We're about to introduce a new `git reflog list` subcommand that will
print all reflogs that the refdb knows about. Skipping over reflogs
whose refs are corrupted would be quite counterproductive in this case
as the user would have no way to learn about reflogs which may still
exist in their repository to help and rescue such a corrupted ref. Thus,
the only remaining reason for why we'd want to resolve the ref is to
verify its refname.

Refactor the code to call `check_refname_format()` directly instead of
trying to resolve the ref. This is significantly more efficient given
that we don't have to hit the object database anymore to list reflogs.
And second, it ensures that we end up showing reflogs of broken refs,
which will help to make the reflog more useful.

Note that this really only impacts the case where the corresponding ref
is corrupt. Reflogs for nonexistent refs would have been returned to the
caller beforehand already as we did not pass `RESOLVE_REF_READING` to
the function, and thus `refs_resolve_ref_unsafe()` would have returned
successfully in that case.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
---
 refs/files-backend.c    | 12 ++----------
 refs/reftable-backend.c |  6 ++----
 2 files changed, 4 insertions(+), 14 deletions(-)

diff --git a/refs/files-backend.c b/refs/files-backend.c
index c7aff6b331..6f98168a81 100644
--- a/refs/files-backend.c
+++ b/refs/files-backend.c
@@ -2129,17 +2129,9 @@ static int files_reflog_iterator_advance(struct ref_iterator *ref_iterator)
 	while ((ok = dir_iterator_advance(diter)) == ITER_OK) {
 		if (!S_ISREG(diter->st.st_mode))
 			continue;
-		if (diter->basename[0] == '.')
+		if (check_refname_format(diter->basename,
+					 REFNAME_ALLOW_ONELEVEL))
 			continue;
-		if (ends_with(diter->basename, ".lock"))
-			continue;
-
-		if (!refs_resolve_ref_unsafe(iter->ref_store,
-					     diter->relative_path, 0,
-					     NULL, NULL)) {
-			error("bad ref for %s", diter->path.buf);
-			continue;
-		}
 
 		iter->base.refname = diter->relative_path;
 		return ITER_OK;
diff --git a/refs/reftable-backend.c b/refs/reftable-backend.c
index 4998b676c2..6c11c4a5e3 100644
--- a/refs/reftable-backend.c
+++ b/refs/reftable-backend.c
@@ -1616,11 +1616,9 @@ static int reftable_reflog_iterator_advance(struct ref_iterator *ref_iterator)
 		if (iter->last_name && !strcmp(iter->log.refname, iter->last_name))
 			continue;
 
-		if (!refs_resolve_ref_unsafe(&iter->refs->base, iter->log.refname,
-					     0, NULL, NULL)) {
-			error(_("bad ref for %s"), iter->log.refname);
+		if (check_refname_format(iter->log.refname,
+					 REFNAME_ALLOW_ONELEVEL))
 			continue;
-		}
 
 		free(iter->last_name);
 		iter->last_name = xstrdup(iter->log.refname);
-- 
2.44.0-rc1


[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

^ permalink raw reply related	[flat|nested] 39+ messages in thread

* [PATCH v3 8/8] builtin/reflog: introduce subcommand to list reflogs
  2024-02-21 12:37 ` [PATCH v3 0/8] " Patrick Steinhardt
                     ` (6 preceding siblings ...)
  2024-02-21 12:37   ` [PATCH v3 7/8] refs: stop resolving ref corresponding to reflogs Patrick Steinhardt
@ 2024-02-21 12:37   ` Patrick Steinhardt
  7 siblings, 0 replies; 39+ messages in thread
From: Patrick Steinhardt @ 2024-02-21 12:37 UTC (permalink / raw)
  To: git; +Cc: Junio C Hamano

[-- Attachment #1: Type: text/plain, Size: 7053 bytes --]

While the git-reflog(1) command has subcommands to show reflog entries
or check for reflog existence, it does not have any subcommands that
would allow the user to enumerate all existing reflogs. This makes it
quite hard to discover which reflogs a repository has. While this can
be worked around with the "files" backend by enumerating files in the
".git/logs" directory, users of the "reftable" backend don't enjoy such
a luxury.

Introduce a new subcommand `git reflog list` that lists all reflogs the
repository knows of to fill this gap.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
---
 Documentation/git-reflog.txt |   3 +
 builtin/reflog.c             |  34 +++++++++++
 t/t1410-reflog.sh            | 108 +++++++++++++++++++++++++++++++++++
 3 files changed, 145 insertions(+)

diff --git a/Documentation/git-reflog.txt b/Documentation/git-reflog.txt
index ec64cbff4c..a929c52982 100644
--- a/Documentation/git-reflog.txt
+++ b/Documentation/git-reflog.txt
@@ -10,6 +10,7 @@ SYNOPSIS
 --------
 [verse]
 'git reflog' [show] [<log-options>] [<ref>]
+'git reflog list'
 'git reflog expire' [--expire=<time>] [--expire-unreachable=<time>]
 	[--rewrite] [--updateref] [--stale-fix]
 	[--dry-run | -n] [--verbose] [--all [--single-worktree] | <refs>...]
@@ -39,6 +40,8 @@ actions, and in addition the `HEAD` reflog records branch switching.
 `git reflog show` is an alias for `git log -g --abbrev-commit
 --pretty=oneline`; see linkgit:git-log[1] for more information.
 
+The "list" subcommand lists all refs which have a corresponding reflog.
+
 The "expire" subcommand prunes older reflog entries. Entries older
 than `expire` time, or entries older than `expire-unreachable` time
 and not reachable from the current tip, are removed from the reflog.
diff --git a/builtin/reflog.c b/builtin/reflog.c
index 3a0c4d4322..63cd4d8b29 100644
--- a/builtin/reflog.c
+++ b/builtin/reflog.c
@@ -7,11 +7,15 @@
 #include "wildmatch.h"
 #include "worktree.h"
 #include "reflog.h"
+#include "refs.h"
 #include "parse-options.h"
 
 #define BUILTIN_REFLOG_SHOW_USAGE \
 	N_("git reflog [show] [<log-options>] [<ref>]")
 
+#define BUILTIN_REFLOG_LIST_USAGE \
+	N_("git reflog list")
+
 #define BUILTIN_REFLOG_EXPIRE_USAGE \
 	N_("git reflog expire [--expire=<time>] [--expire-unreachable=<time>]\n" \
 	   "                  [--rewrite] [--updateref] [--stale-fix]\n" \
@@ -29,6 +33,11 @@ static const char *const reflog_show_usage[] = {
 	NULL,
 };
 
+static const char *const reflog_list_usage[] = {
+	BUILTIN_REFLOG_LIST_USAGE,
+	NULL,
+};
+
 static const char *const reflog_expire_usage[] = {
 	BUILTIN_REFLOG_EXPIRE_USAGE,
 	NULL
@@ -46,6 +55,7 @@ static const char *const reflog_exists_usage[] = {
 
 static const char *const reflog_usage[] = {
 	BUILTIN_REFLOG_SHOW_USAGE,
+	BUILTIN_REFLOG_LIST_USAGE,
 	BUILTIN_REFLOG_EXPIRE_USAGE,
 	BUILTIN_REFLOG_DELETE_USAGE,
 	BUILTIN_REFLOG_EXISTS_USAGE,
@@ -238,6 +248,29 @@ static int cmd_reflog_show(int argc, const char **argv, const char *prefix)
 	return cmd_log_reflog(argc, argv, prefix);
 }
 
+static int show_reflog(const char *refname, void *cb_data UNUSED)
+{
+	printf("%s\n", refname);
+	return 0;
+}
+
+static int cmd_reflog_list(int argc, const char **argv, const char *prefix)
+{
+	struct option options[] = {
+		OPT_END()
+	};
+	struct ref_store *ref_store;
+
+	argc = parse_options(argc, argv, prefix, options, reflog_list_usage, 0);
+	if (argc)
+		return error(_("%s does not accept arguments: '%s'"),
+			     "list", argv[0]);
+
+	ref_store = get_main_ref_store(the_repository);
+
+	return refs_for_each_reflog(ref_store, show_reflog, NULL);
+}
+
 static int cmd_reflog_expire(int argc, const char **argv, const char *prefix)
 {
 	struct cmd_reflog_expire_cb cmd = { 0 };
@@ -417,6 +450,7 @@ int cmd_reflog(int argc, const char **argv, const char *prefix)
 	parse_opt_subcommand_fn *fn = NULL;
 	struct option options[] = {
 		OPT_SUBCOMMAND("show", &fn, cmd_reflog_show),
+		OPT_SUBCOMMAND("list", &fn, cmd_reflog_list),
 		OPT_SUBCOMMAND("expire", &fn, cmd_reflog_expire),
 		OPT_SUBCOMMAND("delete", &fn, cmd_reflog_delete),
 		OPT_SUBCOMMAND("exists", &fn, cmd_reflog_exists),
diff --git a/t/t1410-reflog.sh b/t/t1410-reflog.sh
index d2f5f42e67..5bf883f1e3 100755
--- a/t/t1410-reflog.sh
+++ b/t/t1410-reflog.sh
@@ -436,4 +436,112 @@ test_expect_success 'empty reflog' '
 	test_must_be_empty err
 '
 
+test_expect_success 'list reflogs' '
+	test_when_finished "rm -rf repo" &&
+	git init repo &&
+	(
+		cd repo &&
+		git reflog list >actual &&
+		test_must_be_empty actual &&
+
+		test_commit A &&
+		cat >expect <<-EOF &&
+		HEAD
+		refs/heads/main
+		EOF
+		git reflog list >actual &&
+		test_cmp expect actual &&
+
+		git branch b &&
+		cat >expect <<-EOF &&
+		HEAD
+		refs/heads/b
+		refs/heads/main
+		EOF
+		git reflog list >actual &&
+		test_cmp expect actual
+	)
+'
+
+test_expect_success 'list reflogs with worktree' '
+	test_when_finished "rm -rf repo" &&
+	git init repo &&
+	(
+		cd repo &&
+
+		test_commit A &&
+		git worktree add wt &&
+		git -c core.logAllRefUpdates=always \
+			update-ref refs/worktree/main HEAD &&
+		git -c core.logAllRefUpdates=always \
+			update-ref refs/worktree/per-worktree HEAD &&
+		git -c core.logAllRefUpdates=always -C wt \
+			update-ref refs/worktree/per-worktree HEAD &&
+		git -c core.logAllRefUpdates=always -C wt \
+			update-ref refs/worktree/worktree HEAD &&
+
+		cat >expect <<-EOF &&
+		HEAD
+		refs/heads/main
+		refs/heads/wt
+		refs/worktree/main
+		refs/worktree/per-worktree
+		EOF
+		git reflog list >actual &&
+		test_cmp expect actual &&
+
+		cat >expect <<-EOF &&
+		HEAD
+		refs/heads/main
+		refs/heads/wt
+		refs/worktree/per-worktree
+		refs/worktree/worktree
+		EOF
+		git -C wt reflog list >actual &&
+		test_cmp expect actual
+	)
+'
+
+test_expect_success 'reflog list returns error with additional args' '
+	cat >expect <<-EOF &&
+	error: list does not accept arguments: ${SQ}bogus${SQ}
+	EOF
+	test_must_fail git reflog list bogus 2>err &&
+	test_cmp expect err
+'
+
+test_expect_success 'reflog for symref with unborn target can be listed' '
+	test_when_finished "rm -rf repo" &&
+	git init repo &&
+	(
+		cd repo &&
+		test_commit A &&
+		git symbolic-ref HEAD refs/heads/unborn &&
+		cat >expect <<-EOF &&
+		HEAD
+		refs/heads/main
+		EOF
+		git reflog list >actual &&
+		test_cmp expect actual
+	)
+'
+
+test_expect_success 'reflog with invalid object ID can be listed' '
+	test_when_finished "rm -rf repo" &&
+	git init repo &&
+	(
+		cd repo &&
+		test_commit A &&
+		test-tool ref-store main update-ref msg refs/heads/missing \
+			$(test_oid deadbeef) "$ZERO_OID" REF_SKIP_OID_VERIFICATION &&
+		cat >expect <<-EOF &&
+		HEAD
+		refs/heads/main
+		refs/heads/missing
+		EOF
+		git reflog list >actual &&
+		test_cmp expect actual
+	)
+'
+
 test_done
-- 
2.44.0-rc1


[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

^ permalink raw reply related	[flat|nested] 39+ messages in thread

* [PATCH v2 7/7] builtin/reflog: introduce subcommand to list reflogs
  2024-02-20  9:06   ` [PATCH v2 7/7] builtin/reflog: introduce subcommand to list reflogs Patrick Steinhardt
@ 2024-04-24  7:30     ` Teng Long
  2024-04-24  8:01       ` Patrick Steinhardt
  0 siblings, 1 reply; 39+ messages in thread
From: Teng Long @ 2024-04-24  7:30 UTC (permalink / raw)
  To: ps; +Cc: git, gitster

Patrick Steinhardt <ps@pks.im> wrote:

+#define BUILTIN_REFLOG_LIST_USAGE \
+	N_("git reflog list")

Doesn't seem to need a translation here?

Thanks.

^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [PATCH v2 7/7] builtin/reflog: introduce subcommand to list reflogs
  2024-04-24  7:30     ` Teng Long
@ 2024-04-24  8:01       ` Patrick Steinhardt
  2024-04-24 14:53         ` Junio C Hamano
  0 siblings, 1 reply; 39+ messages in thread
From: Patrick Steinhardt @ 2024-04-24  8:01 UTC (permalink / raw)
  To: Teng Long; +Cc: git, gitster

[-- Attachment #1: Type: text/plain, Size: 499 bytes --]

On Wed, Apr 24, 2024 at 03:30:47PM +0800, Teng Long wrote:
> Patrick Steinhardt <ps@pks.im> wrote:
> 
> +#define BUILTIN_REFLOG_LIST_USAGE \
> +	N_("git reflog list")
> 
> Doesn't seem to need a translation here?

I was following the precedent of the other subcommands, which all mark
their usage as needing translation. Whether that is ultimately warranted
I can't really tell. In any case, if we decide that it's not we should
also drop the marker for all the other usages.

Patrick

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [PATCH v2 7/7] builtin/reflog: introduce subcommand to list reflogs
  2024-04-24  8:01       ` Patrick Steinhardt
@ 2024-04-24 14:53         ` Junio C Hamano
  0 siblings, 0 replies; 39+ messages in thread
From: Junio C Hamano @ 2024-04-24 14:53 UTC (permalink / raw)
  To: Patrick Steinhardt; +Cc: Teng Long, git

Patrick Steinhardt <ps@pks.im> writes:

> On Wed, Apr 24, 2024 at 03:30:47PM +0800, Teng Long wrote:
>> Patrick Steinhardt <ps@pks.im> wrote:
>> 
>> +#define BUILTIN_REFLOG_LIST_USAGE \
>> +	N_("git reflog list")
>> 
>> Doesn't seem to need a translation here?
>
> I was following the precedent of the other subcommands, which all mark
> their usage as needing translation. Whether that is ultimately warranted
> I can't really tell. In any case, if we decide that it's not we should
> also drop the marker for all the other usages.

The motivation for N_() in others (namely, the ones with
<placeholder>s) is that the literal part like "git" "reflog"
"expire" and "--expire=" cannot be given differently even when the
user works in a different locale, but placeholders that explain the
meaning of what the user must plug in there like "<time>" is easier
for the users to be in their language.

The "list" subcommand is currently an oddball that does not happen
to take an argument or an option with value, and does not use any
<placeholder>, but for consistency and future-proofing, it is better
to have it in N_().

Thanks.


^ permalink raw reply	[flat|nested] 39+ messages in thread

end of thread, other threads:[~2024-04-24 14:53 UTC | newest]

Thread overview: 39+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2024-02-19 14:35 [PATCH 0/6] reflog: introduce subcommand to list reflogs Patrick Steinhardt
2024-02-19 14:35 ` [PATCH 1/6] dir-iterator: pass name to `prepare_next_entry_data()` directly Patrick Steinhardt
2024-02-19 14:35 ` [PATCH 2/6] dir-iterator: support iteration in sorted order Patrick Steinhardt
2024-02-19 23:39   ` Junio C Hamano
2024-02-20  8:34     ` Patrick Steinhardt
2024-02-19 14:35 ` [PATCH 3/6] refs/files: sort reflogs returned by the reflog iterator Patrick Steinhardt
2024-02-20  0:04   ` Junio C Hamano
2024-02-20  8:34     ` Patrick Steinhardt
2024-02-19 14:35 ` [PATCH 4/6] refs: drop unused params from the reflog iterator callback Patrick Steinhardt
2024-02-20  0:14   ` Junio C Hamano
2024-02-20  8:34     ` Patrick Steinhardt
2024-02-19 14:35 ` [PATCH 5/6] refs: stop resolving ref corresponding to reflogs Patrick Steinhardt
2024-02-20  0:14   ` Junio C Hamano
2024-02-20  8:34     ` Patrick Steinhardt
2024-02-19 14:35 ` [PATCH 6/6] builtin/reflog: introduce subcommand to list reflogs Patrick Steinhardt
2024-02-20  0:32   ` Junio C Hamano
2024-02-20  8:34     ` Patrick Steinhardt
2024-02-20  9:06 ` [PATCH v2 0/7] reflog: " Patrick Steinhardt
2024-02-20  9:06   ` [PATCH v2 1/7] dir-iterator: pass name to `prepare_next_entry_data()` directly Patrick Steinhardt
2024-02-20  9:06   ` [PATCH v2 2/7] dir-iterator: support iteration in sorted order Patrick Steinhardt
2024-02-20  9:06   ` [PATCH v2 3/7] refs/files: sort reflogs returned by the reflog iterator Patrick Steinhardt
2024-02-20  9:06   ` [PATCH v2 4/7] refs: always treat iterators as ordered Patrick Steinhardt
2024-02-20  9:06   ` [PATCH v2 5/7] refs: drop unused params from the reflog iterator callback Patrick Steinhardt
2024-02-20  9:06   ` [PATCH v2 6/7] refs: stop resolving ref corresponding to reflogs Patrick Steinhardt
2024-02-20  9:06   ` [PATCH v2 7/7] builtin/reflog: introduce subcommand to list reflogs Patrick Steinhardt
2024-04-24  7:30     ` Teng Long
2024-04-24  8:01       ` Patrick Steinhardt
2024-04-24 14:53         ` Junio C Hamano
2024-02-20 17:22   ` [PATCH v2 0/7] reflog: " Junio C Hamano
2024-02-21 11:48     ` Patrick Steinhardt
2024-02-21 12:37 ` [PATCH v3 0/8] " Patrick Steinhardt
2024-02-21 12:37   ` [PATCH v3 1/8] dir-iterator: pass name to `prepare_next_entry_data()` directly Patrick Steinhardt
2024-02-21 12:37   ` [PATCH v3 2/8] dir-iterator: support iteration in sorted order Patrick Steinhardt
2024-02-21 12:37   ` [PATCH v3 3/8] refs/files: sort reflogs returned by the reflog iterator Patrick Steinhardt
2024-02-21 12:37   ` [PATCH v3 4/8] refs/files: sort merged worktree and common reflogs Patrick Steinhardt
2024-02-21 12:37   ` [PATCH v3 5/8] refs: always treat iterators as ordered Patrick Steinhardt
2024-02-21 12:37   ` [PATCH v3 6/8] refs: drop unused params from the reflog iterator callback Patrick Steinhardt
2024-02-21 12:37   ` [PATCH v3 7/8] refs: stop resolving ref corresponding to reflogs Patrick Steinhardt
2024-02-21 12:37   ` [PATCH v3 8/8] builtin/reflog: introduce subcommand to list reflogs Patrick Steinhardt

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).