Git Mailing List Archive mirror
 help / color / mirror / Atom feed
* [PATCH 0/4] merge-tree: handle missing objects correctly
@ 2024-02-06  9:49 Johannes Schindelin via GitGitGadget
  2024-02-06  9:49 ` [PATCH 1/4] merge-tree: fail with a non-zero exit code on missing tree objects Johannes Schindelin via GitGitGadget
                   ` (6 more replies)
  0 siblings, 7 replies; 48+ messages in thread
From: Johannes Schindelin via GitGitGadget @ 2024-02-06  9:49 UTC (permalink / raw
  To: git; +Cc: Johannes Schindelin

I recently looked into issues where git merge-tree calls returned bogus data
(in one instance returning an empty tree for non-empty merge parents). By
the time I had a look at the corresponding repository, the issue was no
longer reproducible, but a closer look at the code combined with some manual
experimenting turned up the fact that missing tree objects aren't handled as
errors by git merge-tree.

While at it, I added a commit on top that tries to catch all remaining
unchecked parse_tree() calls.

This patch series is based on js/merge-tree-3-trees because I introduced
three unchecked parse_tree() calls in that topic branch.

Johannes Schindelin (4):
  merge-tree: fail with a non-zero exit code on missing tree objects
  merge-ort: do check `parse_tree()`'s return value
  t4301: verify that merge-tree fails on missing blob objects
  Always check `parse_tree*()`'s return value

 builtin/checkout.c               | 19 ++++++++++++++++---
 builtin/clone.c                  |  3 ++-
 builtin/commit.c                 |  3 ++-
 builtin/merge-tree.c             |  6 ++++++
 builtin/read-tree.c              |  3 ++-
 builtin/reset.c                  |  4 ++++
 cache-tree.c                     |  4 ++--
 merge-ort.c                      | 16 +++++++++++-----
 merge-recursive.c                |  3 ++-
 merge.c                          |  5 ++++-
 reset.c                          |  5 +++++
 sequencer.c                      |  4 ++++
 t/t4301-merge-tree-write-tree.sh | 24 ++++++++++++++++++++++++
 13 files changed, 84 insertions(+), 15 deletions(-)


base-commit: 5f43cf5b2e4b68386d3774bce880b0f74d801635
Published-As: https://github.com/gitgitgadget/git/releases/tag/pr-1651%2Fdscho%2Fmerge-tree-and-missing-objects-v1
Fetch-It-Via: git fetch https://github.com/gitgitgadget/git pr-1651/dscho/merge-tree-and-missing-objects-v1
Pull-Request: https://github.com/gitgitgadget/git/pull/1651
-- 
gitgitgadget

^ permalink raw reply	[flat|nested] 48+ messages in thread

* [PATCH 1/4] merge-tree: fail with a non-zero exit code on missing tree objects
  2024-02-06  9:49 [PATCH 0/4] merge-tree: handle missing objects correctly Johannes Schindelin via GitGitGadget
@ 2024-02-06  9:49 ` Johannes Schindelin via GitGitGadget
  2024-02-07  7:42   ` Patrick Steinhardt
  2024-02-06  9:49 ` [PATCH 2/4] merge-ort: do check `parse_tree()`'s return value Johannes Schindelin via GitGitGadget
                   ` (5 subsequent siblings)
  6 siblings, 1 reply; 48+ messages in thread
From: Johannes Schindelin via GitGitGadget @ 2024-02-06  9:49 UTC (permalink / raw
  To: git; +Cc: Johannes Schindelin, Johannes Schindelin

From: Johannes Schindelin <johannes.schindelin@gmx.de>

When `git merge-tree` encounters a missing tree object, it should error
out and not continue quietly as if nothing had happened.

However, as of time of writing, `git merge-tree` _does_ continue, and
then offers the empty tree as result.

Let's fix this.

Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
---
 merge-ort.c                      |  7 ++++---
 t/t4301-merge-tree-write-tree.sh | 10 ++++++++++
 2 files changed, 14 insertions(+), 3 deletions(-)

diff --git a/merge-ort.c b/merge-ort.c
index 6491070d965..c37fc035f13 100644
--- a/merge-ort.c
+++ b/merge-ort.c
@@ -1659,9 +1659,10 @@ static int collect_merge_info(struct merge_options *opt,
 	info.data = opt;
 	info.show_all_errors = 1;
 
-	parse_tree(merge_base);
-	parse_tree(side1);
-	parse_tree(side2);
+	if (parse_tree(merge_base) < 0 ||
+	    parse_tree(side1) < 0 ||
+	    parse_tree(side2) < 0)
+		return -1;
 	init_tree_desc(t + 0, merge_base->buffer, merge_base->size);
 	init_tree_desc(t + 1, side1->buffer, side1->size);
 	init_tree_desc(t + 2, side2->buffer, side2->size);
diff --git a/t/t4301-merge-tree-write-tree.sh b/t/t4301-merge-tree-write-tree.sh
index 7d0fa74da74..4ea1b74445d 100755
--- a/t/t4301-merge-tree-write-tree.sh
+++ b/t/t4301-merge-tree-write-tree.sh
@@ -950,5 +950,15 @@ test_expect_success '--merge-base with tree OIDs' '
 	git merge-tree --merge-base=side1^^{tree} side1^{tree} side3^{tree} >with-trees &&
 	test_cmp with-commits with-trees
 '
+test_expect_success 'error out on missing tree objects' '
+	git init --bare missing-tree.git &&
+	(
+		git rev-list side3 &&
+		git rev-parse side3^:
+	) | git pack-objects missing-tree.git/objects/pack/side3-tree-is-missing &&
+	side3=$(git rev-parse side3) &&
+	test_must_fail git --git-dir=missing-tree.git merge-tree $side3^ $side3 >actual &&
+	test_must_be_empty actual
+'
 
 test_done
-- 
gitgitgadget


^ permalink raw reply related	[flat|nested] 48+ messages in thread

* [PATCH 2/4] merge-ort: do check `parse_tree()`'s return value
  2024-02-06  9:49 [PATCH 0/4] merge-tree: handle missing objects correctly Johannes Schindelin via GitGitGadget
  2024-02-06  9:49 ` [PATCH 1/4] merge-tree: fail with a non-zero exit code on missing tree objects Johannes Schindelin via GitGitGadget
@ 2024-02-06  9:49 ` Johannes Schindelin via GitGitGadget
  2024-02-06  9:49 ` [PATCH 3/4] t4301: verify that merge-tree fails on missing blob objects Johannes Schindelin via GitGitGadget
                   ` (4 subsequent siblings)
  6 siblings, 0 replies; 48+ messages in thread
From: Johannes Schindelin via GitGitGadget @ 2024-02-06  9:49 UTC (permalink / raw
  To: git; +Cc: Johannes Schindelin, Johannes Schindelin

From: Johannes Schindelin <johannes.schindelin@gmx.de>

The previous commit fixed a bug where a missing tree was reported, but
not treated as an error.

This patch addresses the same issue for the remaining two callers of
`parse_tree()`.

This change is not accompanied by a regression test because the code in
question is only reached at the `checkout` stage, i.e. after the merge
has happened (and therefore the tree objects could only be missing if
the disk had gone bad in that short time window, or something similarly
tricky to recreate in the test suite).

Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
---
 merge-ort.c | 6 ++++--
 1 file changed, 4 insertions(+), 2 deletions(-)

diff --git a/merge-ort.c b/merge-ort.c
index c37fc035f13..79d9e18f63d 100644
--- a/merge-ort.c
+++ b/merge-ort.c
@@ -4379,9 +4379,11 @@ static int checkout(struct merge_options *opt,
 	unpack_opts.verbose_update = (opt->verbosity > 2);
 	unpack_opts.fn = twoway_merge;
 	unpack_opts.preserve_ignored = 0; /* FIXME: !opts->overwrite_ignore */
-	parse_tree(prev);
+	if (parse_tree(prev) < 0)
+		return -1;
 	init_tree_desc(&trees[0], prev->buffer, prev->size);
-	parse_tree(next);
+	if (parse_tree(next) < 0)
+		return -1;
 	init_tree_desc(&trees[1], next->buffer, next->size);
 
 	ret = unpack_trees(2, trees, &unpack_opts);
-- 
gitgitgadget


^ permalink raw reply related	[flat|nested] 48+ messages in thread

* [PATCH 3/4] t4301: verify that merge-tree fails on missing blob objects
  2024-02-06  9:49 [PATCH 0/4] merge-tree: handle missing objects correctly Johannes Schindelin via GitGitGadget
  2024-02-06  9:49 ` [PATCH 1/4] merge-tree: fail with a non-zero exit code on missing tree objects Johannes Schindelin via GitGitGadget
  2024-02-06  9:49 ` [PATCH 2/4] merge-ort: do check `parse_tree()`'s return value Johannes Schindelin via GitGitGadget
@ 2024-02-06  9:49 ` Johannes Schindelin via GitGitGadget
  2024-02-06  9:49 ` [PATCH 4/4] Always check `parse_tree*()`'s return value Johannes Schindelin via GitGitGadget
                   ` (3 subsequent siblings)
  6 siblings, 0 replies; 48+ messages in thread
From: Johannes Schindelin via GitGitGadget @ 2024-02-06  9:49 UTC (permalink / raw
  To: git; +Cc: Johannes Schindelin, Johannes Schindelin

From: Johannes Schindelin <johannes.schindelin@gmx.de>

We just fixed a problem where `merge-tree` would not fail on missing
tree objects. Let's ensure that that problem does not occur with blob
objects (and won't, in the future, either).

Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
---
 t/t4301-merge-tree-write-tree.sh | 14 ++++++++++++++
 1 file changed, 14 insertions(+)

diff --git a/t/t4301-merge-tree-write-tree.sh b/t/t4301-merge-tree-write-tree.sh
index 4ea1b74445d..86807a57d4d 100755
--- a/t/t4301-merge-tree-write-tree.sh
+++ b/t/t4301-merge-tree-write-tree.sh
@@ -961,4 +961,18 @@ test_expect_success 'error out on missing tree objects' '
 	test_must_be_empty actual
 '
 
+test_expect_success 'error out on missing blob objects' '
+	seq1=$(test_seq 1 10 | git hash-object -w --stdin) &&
+	seq2=$(test_seq 1 11 | git hash-object -w --stdin) &&
+	seq3=$(test_seq 0 10 | git hash-object -w --stdin) &&
+	tree1=$(printf "100644 blob %s\tsequence" $seq1 | git mktree) &&
+	tree2=$(printf "100644 blob %s\tsequence" $seq2 | git mktree) &&
+	tree3=$(printf "100644 blob %s\tsequence" $seq3 | git mktree) &&
+	git init --bare missing-blob.git &&
+	test_write_lines $seq1 $seq3 $tree1 $tree2 $tree3 |
+	git pack-objects missing-blob.git/objects/pack/side1-whatever-is-missing &&
+	test_must_fail git --git-dir=missing-blob.git merge-tree --merge-base=$tree1 $tree2 $tree3 >actual &&
+	test_must_be_empty actual
+'
+
 test_done
-- 
gitgitgadget


^ permalink raw reply related	[flat|nested] 48+ messages in thread

* [PATCH 4/4] Always check `parse_tree*()`'s return value
  2024-02-06  9:49 [PATCH 0/4] merge-tree: handle missing objects correctly Johannes Schindelin via GitGitGadget
                   ` (2 preceding siblings ...)
  2024-02-06  9:49 ` [PATCH 3/4] t4301: verify that merge-tree fails on missing blob objects Johannes Schindelin via GitGitGadget
@ 2024-02-06  9:49 ` Johannes Schindelin via GitGitGadget
  2024-02-06 22:07   ` Junio C Hamano
  2024-02-07  7:42   ` Patrick Steinhardt
  2024-02-06 21:12 ` [PATCH 0/4] merge-tree: handle missing objects correctly Junio C Hamano
                   ` (2 subsequent siblings)
  6 siblings, 2 replies; 48+ messages in thread
From: Johannes Schindelin via GitGitGadget @ 2024-02-06  9:49 UTC (permalink / raw
  To: git; +Cc: Johannes Schindelin, Johannes Schindelin

From: Johannes Schindelin <johannes.schindelin@gmx.de>

Otherwise we may easily run into serious crashes: For example, if we run
`init_tree_desc()` directly after a failed `parse_tree()`, we are
accessing uninitialized data or trying to dereference `NULL`.

Note that the `parse_tree()` function already takes care of showing an
error message. The `parse_tree_indirectly()` and
`repo_get_commit_tree()` functions do not, therefore those latter call
sites need to show a useful error message while the former do not.

Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
---
 builtin/checkout.c   | 19 ++++++++++++++++---
 builtin/clone.c      |  3 ++-
 builtin/commit.c     |  3 ++-
 builtin/merge-tree.c |  6 ++++++
 builtin/read-tree.c  |  3 ++-
 builtin/reset.c      |  4 ++++
 cache-tree.c         |  4 ++--
 merge-ort.c          |  3 +++
 merge-recursive.c    |  3 ++-
 merge.c              |  5 ++++-
 reset.c              |  5 +++++
 sequencer.c          |  4 ++++
 12 files changed, 52 insertions(+), 10 deletions(-)

diff --git a/builtin/checkout.c b/builtin/checkout.c
index f02434bc155..84108ec3635 100644
--- a/builtin/checkout.c
+++ b/builtin/checkout.c
@@ -707,7 +707,8 @@ static int reset_tree(struct tree *tree, const struct checkout_opts *o,
 	init_checkout_metadata(&opts.meta, info->refname,
 			       info->commit ? &info->commit->object.oid : null_oid(),
 			       NULL);
-	parse_tree(tree);
+	if (parse_tree(tree) < 0)
+		return 128;
 	init_tree_desc(&tree_desc, tree->buffer, tree->size);
 	switch (unpack_trees(1, &tree_desc, &opts)) {
 	case -2:
@@ -786,9 +787,15 @@ static int merge_working_tree(const struct checkout_opts *opts,
 		if (new_branch_info->commit)
 			BUG("'switch --orphan' should never accept a commit as starting point");
 		new_tree = parse_tree_indirect(the_hash_algo->empty_tree);
-	} else
+		if (!new_tree)
+			BUG("unable to read empty tree");
+	} else {
 		new_tree = repo_get_commit_tree(the_repository,
 						new_branch_info->commit);
+		if (!new_tree)
+			return error(_("unable to read tree %s"),
+				     oid_to_hex(&new_branch_info->commit->object.oid));
+	}
 	if (opts->discard_changes) {
 		ret = reset_tree(new_tree, opts, 1, writeout_error, new_branch_info);
 		if (ret)
@@ -823,7 +830,8 @@ static int merge_working_tree(const struct checkout_opts *opts,
 				oid_to_hex(old_commit_oid));
 
 		init_tree_desc(&trees[0], tree->buffer, tree->size);
-		parse_tree(new_tree);
+		if (parse_tree(new_tree) < 0)
+			exit(128);
 		tree = new_tree;
 		init_tree_desc(&trees[1], tree->buffer, tree->size);
 
@@ -1239,10 +1247,15 @@ static void setup_new_branch_info_and_source_tree(
 	if (!new_branch_info->commit) {
 		/* not a commit */
 		*source_tree = parse_tree_indirect(rev);
+		if (!*source_tree)
+			die(_("unable to read tree %s"), oid_to_hex(rev));
 	} else {
 		parse_commit_or_die(new_branch_info->commit);
 		*source_tree = repo_get_commit_tree(the_repository,
 						    new_branch_info->commit);
+		if (!*source_tree)
+			die(_("unable to read tree %s"),
+			    oid_to_hex(&new_branch_info->commit->object.oid));
 	}
 }
 
diff --git a/builtin/clone.c b/builtin/clone.c
index c6357af9498..4410b55be98 100644
--- a/builtin/clone.c
+++ b/builtin/clone.c
@@ -736,7 +736,8 @@ static int checkout(int submodule_progress, int filter_submodules)
 	tree = parse_tree_indirect(&oid);
 	if (!tree)
 		die(_("unable to parse commit %s"), oid_to_hex(&oid));
-	parse_tree(tree);
+	if (parse_tree(tree) < 0)
+		exit(128);
 	init_tree_desc(&t, tree->buffer, tree->size);
 	if (unpack_trees(1, &t, &opts) < 0)
 		die(_("unable to checkout working tree"));
diff --git a/builtin/commit.c b/builtin/commit.c
index 781af2e206c..0723f06de7a 100644
--- a/builtin/commit.c
+++ b/builtin/commit.c
@@ -339,7 +339,8 @@ static void create_base_index(const struct commit *current_head)
 	tree = parse_tree_indirect(&current_head->object.oid);
 	if (!tree)
 		die(_("failed to unpack HEAD tree object"));
-	parse_tree(tree);
+	if (parse_tree(tree) < 0)
+		exit(128);
 	init_tree_desc(&t, tree->buffer, tree->size);
 	if (unpack_trees(1, &t, &opts))
 		exit(128); /* We've already reported the error, finish dying */
diff --git a/builtin/merge-tree.c b/builtin/merge-tree.c
index 2d4ce5b3886..ba84d00deee 100644
--- a/builtin/merge-tree.c
+++ b/builtin/merge-tree.c
@@ -447,12 +447,18 @@ static int real_merge(struct merge_tree_options *o,
 		if (repo_get_oid_treeish(the_repository, merge_base, &base_oid))
 			die(_("could not parse as tree '%s'"), merge_base);
 		base_tree = parse_tree_indirect(&base_oid);
+		if (!base_tree)
+			die(_("unable to read tree %s"), oid_to_hex(&base_oid));
 		if (repo_get_oid_treeish(the_repository, branch1, &head_oid))
 			die(_("could not parse as tree '%s'"), branch1);
 		parent1_tree = parse_tree_indirect(&head_oid);
+		if (!parent1_tree)
+			die(_("unable to read tree %s"), oid_to_hex(&head_oid));
 		if (repo_get_oid_treeish(the_repository, branch2, &merge_oid))
 			die(_("could not parse as tree '%s'"), branch2);
 		parent2_tree = parse_tree_indirect(&merge_oid);
+		if (!parent2_tree)
+			die(_("unable to read tree %s"), oid_to_hex(&merge_oid));
 
 		opt.ancestor = merge_base;
 		merge_incore_nonrecursive(&opt, base_tree, parent1_tree, parent2_tree, &result);
diff --git a/builtin/read-tree.c b/builtin/read-tree.c
index 8196ca9dd85..5923ed36893 100644
--- a/builtin/read-tree.c
+++ b/builtin/read-tree.c
@@ -263,7 +263,8 @@ int cmd_read_tree(int argc, const char **argv, const char *cmd_prefix)
 	cache_tree_free(&the_index.cache_tree);
 	for (i = 0; i < nr_trees; i++) {
 		struct tree *tree = trees[i];
-		parse_tree(tree);
+		if (parse_tree(tree) < 0)
+			return 128;
 		init_tree_desc(t+i, tree->buffer, tree->size);
 	}
 	if (unpack_trees(nr_trees, t, &opts))
diff --git a/builtin/reset.c b/builtin/reset.c
index 4b018d20e3b..f030f57f4e9 100644
--- a/builtin/reset.c
+++ b/builtin/reset.c
@@ -119,6 +119,10 @@ static int reset_index(const char *ref, const struct object_id *oid, int reset_t
 
 	if (reset_type == MIXED || reset_type == HARD) {
 		tree = parse_tree_indirect(oid);
+		if (!tree) {
+			error(_("unable to read tree %s"), oid_to_hex(oid));
+			goto out;
+		}
 		prime_cache_tree(the_repository, the_repository->index, tree);
 	}
 
diff --git a/cache-tree.c b/cache-tree.c
index 641427ed410..c6508b64a5c 100644
--- a/cache-tree.c
+++ b/cache-tree.c
@@ -779,8 +779,8 @@ static void prime_cache_tree_rec(struct repository *r,
 			struct cache_tree_sub *sub;
 			struct tree *subtree = lookup_tree(r, &entry.oid);
 
-			if (!subtree->object.parsed)
-				parse_tree(subtree);
+			if (!subtree->object.parsed && parse_tree(subtree) < 0)
+				exit(128);
 			sub = cache_tree_sub(it, entry.path);
 			sub->cache_tree = cache_tree();
 
diff --git a/merge-ort.c b/merge-ort.c
index 79d9e18f63d..534ddaf16ba 100644
--- a/merge-ort.c
+++ b/merge-ort.c
@@ -4983,6 +4983,9 @@ static void merge_ort_nonrecursive_internal(struct merge_options *opt,
 
 	if (result->clean >= 0) {
 		result->tree = parse_tree_indirect(&working_tree_oid);
+		if (!result->tree)
+			die(_("unable to read tree %s"),
+			    oid_to_hex(&working_tree_oid));
 		/* existence of conflicted entries implies unclean */
 		result->clean &= strmap_empty(&opt->priv->conflicted);
 	}
diff --git a/merge-recursive.c b/merge-recursive.c
index e3beb0801b1..10d41bfd487 100644
--- a/merge-recursive.c
+++ b/merge-recursive.c
@@ -410,7 +410,8 @@ static inline int merge_detect_rename(struct merge_options *opt)
 
 static void init_tree_desc_from_tree(struct tree_desc *desc, struct tree *tree)
 {
-	parse_tree(tree);
+	if (parse_tree(tree) < 0)
+		exit(128);
 	init_tree_desc(desc, tree->buffer, tree->size);
 }
 
diff --git a/merge.c b/merge.c
index b60925459c2..14a7325859d 100644
--- a/merge.c
+++ b/merge.c
@@ -80,7 +80,10 @@ int checkout_fast_forward(struct repository *r,
 		return -1;
 	}
 	for (i = 0; i < nr_trees; i++) {
-		parse_tree(trees[i]);
+		if (parse_tree(trees[i]) < 0) {
+			rollback_lock_file(&lock_file);
+			return -1;
+		}
 		init_tree_desc(t+i, trees[i]->buffer, trees[i]->size);
 	}
 
diff --git a/reset.c b/reset.c
index 48da0adf851..a93fdbc12e3 100644
--- a/reset.c
+++ b/reset.c
@@ -158,6 +158,11 @@ int reset_head(struct repository *r, const struct reset_head_opts *opts)
 	}
 
 	tree = parse_tree_indirect(oid);
+	if (!tree) {
+		ret = error(_("unable to read tree %s"), oid_to_hex(oid));
+		goto leave_reset_head;
+	}
+
 	prime_cache_tree(r, r->index, tree);
 
 	if (write_locked_index(r->index, &lock, COMMIT_LOCK) < 0) {
diff --git a/sequencer.c b/sequencer.c
index d584cac8ed9..407473bab28 100644
--- a/sequencer.c
+++ b/sequencer.c
@@ -715,6 +715,8 @@ static int do_recursive_merge(struct repository *r,
 	o.show_rename_progress = 1;
 
 	head_tree = parse_tree_indirect(head);
+	if (!head_tree)
+		return error(_("unable to read tree %s"), oid_to_hex(head));
 	next_tree = next ? repo_get_commit_tree(r, next) : empty_tree(r);
 	base_tree = base ? repo_get_commit_tree(r, base) : empty_tree(r);
 
@@ -3887,6 +3889,8 @@ static int do_reset(struct repository *r,
 	}
 
 	tree = parse_tree_indirect(&oid);
+	if (!tree)
+		return error(_("unable to read tree %s"), oid_to_hex(&oid));
 	prime_cache_tree(r, r->index, tree);
 
 	if (write_locked_index(r->index, &lock, COMMIT_LOCK) < 0)
-- 
gitgitgadget

^ permalink raw reply related	[flat|nested] 48+ messages in thread

* Re: [PATCH 0/4] merge-tree: handle missing objects correctly
  2024-02-06  9:49 [PATCH 0/4] merge-tree: handle missing objects correctly Johannes Schindelin via GitGitGadget
                   ` (3 preceding siblings ...)
  2024-02-06  9:49 ` [PATCH 4/4] Always check `parse_tree*()`'s return value Johannes Schindelin via GitGitGadget
@ 2024-02-06 21:12 ` Junio C Hamano
  2024-02-07  7:42 ` Patrick Steinhardt
  2024-02-07 16:47 ` [PATCH v2 0/5] " Johannes Schindelin via GitGitGadget
  6 siblings, 0 replies; 48+ messages in thread
From: Junio C Hamano @ 2024-02-06 21:12 UTC (permalink / raw
  To: Johannes Schindelin via GitGitGadget; +Cc: git, Johannes Schindelin

"Johannes Schindelin via GitGitGadget" <gitgitgadget@gmail.com>
writes:

> I recently looked into issues where git merge-tree calls returned bogus data
> (in one instance returning an empty tree for non-empty merge parents). By
> the time I had a look at the corresponding repository, the issue was no
> longer reproducible, but a closer look at the code combined with some manual
> experimenting turned up the fact that missing tree objects aren't handled as
> errors by git merge-tree.
>
> While at it, I added a commit on top that tries to catch all remaining
> unchecked parse_tree() calls.
>
> This patch series is based on js/merge-tree-3-trees because I introduced
> three unchecked parse_tree() calls in that topic branch.

Thanks.  All the added checks looked reasonable to me.

Will queue.

>
> Johannes Schindelin (4):
>   merge-tree: fail with a non-zero exit code on missing tree objects
>   merge-ort: do check `parse_tree()`'s return value
>   t4301: verify that merge-tree fails on missing blob objects
>   Always check `parse_tree*()`'s return value
>
>  builtin/checkout.c               | 19 ++++++++++++++++---
>  builtin/clone.c                  |  3 ++-
>  builtin/commit.c                 |  3 ++-
>  builtin/merge-tree.c             |  6 ++++++
>  builtin/read-tree.c              |  3 ++-
>  builtin/reset.c                  |  4 ++++
>  cache-tree.c                     |  4 ++--
>  merge-ort.c                      | 16 +++++++++++-----
>  merge-recursive.c                |  3 ++-
>  merge.c                          |  5 ++++-
>  reset.c                          |  5 +++++
>  sequencer.c                      |  4 ++++
>  t/t4301-merge-tree-write-tree.sh | 24 ++++++++++++++++++++++++
>  13 files changed, 84 insertions(+), 15 deletions(-)
>
>
> base-commit: 5f43cf5b2e4b68386d3774bce880b0f74d801635
> Published-As: https://github.com/gitgitgadget/git/releases/tag/pr-1651%2Fdscho%2Fmerge-tree-and-missing-objects-v1
> Fetch-It-Via: git fetch https://github.com/gitgitgadget/git pr-1651/dscho/merge-tree-and-missing-objects-v1
> Pull-Request: https://github.com/gitgitgadget/git/pull/1651

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [PATCH 4/4] Always check `parse_tree*()`'s return value
  2024-02-06  9:49 ` [PATCH 4/4] Always check `parse_tree*()`'s return value Johannes Schindelin via GitGitGadget
@ 2024-02-06 22:07   ` Junio C Hamano
  2024-02-07  7:42   ` Patrick Steinhardt
  1 sibling, 0 replies; 48+ messages in thread
From: Junio C Hamano @ 2024-02-06 22:07 UTC (permalink / raw
  To: Johannes Schindelin via GitGitGadget; +Cc: git, Johannes Schindelin

"Johannes Schindelin via GitGitGadget" <gitgitgadget@gmail.com>
writes:

> @@ -707,7 +707,8 @@ static int reset_tree(struct tree *tree, const struct checkout_opts *o,
>  	init_checkout_metadata(&opts.meta, info->refname,
>  			       info->commit ? &info->commit->object.oid : null_oid(),
>  			       NULL);
> -	parse_tree(tree);
> +	if (parse_tree(tree) < 0)
> +		return 128;
>  	init_tree_desc(&tree_desc, tree->buffer, tree->size);

Another thing that caught my attention during today's integration
cycle was that this "parse the tree object in variable tree and then
call init_tree_desc on it" is a recurring pattern.  After the dust
settles, we might want to have a helper function perhaps like

    int init_tree_desc_from_tree(struct tree_desc *desc, struct tree *tree)
    {
	if (parse_tree(tree))
		return -1;
	init_tree_desc(desc, tree->buffer, tree->size);
    }

It was a bit irritating as another topic in flight changes the
function signature for init_tree_desc(), causing a textual conflict
that could be easily avoided.


^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [PATCH 1/4] merge-tree: fail with a non-zero exit code on missing tree objects
  2024-02-06  9:49 ` [PATCH 1/4] merge-tree: fail with a non-zero exit code on missing tree objects Johannes Schindelin via GitGitGadget
@ 2024-02-07  7:42   ` Patrick Steinhardt
  0 siblings, 0 replies; 48+ messages in thread
From: Patrick Steinhardt @ 2024-02-07  7:42 UTC (permalink / raw
  To: Johannes Schindelin via GitGitGadget; +Cc: git, Johannes Schindelin

[-- Attachment #1: Type: text/plain, Size: 2478 bytes --]

On Tue, Feb 06, 2024 at 09:49:38AM +0000, Johannes Schindelin via GitGitGadget wrote:
> From: Johannes Schindelin <johannes.schindelin@gmx.de>
> 
> When `git merge-tree` encounters a missing tree object, it should error
> out and not continue quietly as if nothing had happened.
> 
> However, as of time of writing, `git merge-tree` _does_ continue, and
> then offers the empty tree as result.
> 
> Let's fix this.
> 
> Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
> ---
>  merge-ort.c                      |  7 ++++---
>  t/t4301-merge-tree-write-tree.sh | 10 ++++++++++
>  2 files changed, 14 insertions(+), 3 deletions(-)
> 
> diff --git a/merge-ort.c b/merge-ort.c
> index 6491070d965..c37fc035f13 100644
> --- a/merge-ort.c
> +++ b/merge-ort.c
> @@ -1659,9 +1659,10 @@ static int collect_merge_info(struct merge_options *opt,
>  	info.data = opt;
>  	info.show_all_errors = 1;
>  
> -	parse_tree(merge_base);
> -	parse_tree(side1);
> -	parse_tree(side2);
> +	if (parse_tree(merge_base) < 0 ||
> +	    parse_tree(side1) < 0 ||
> +	    parse_tree(side2) < 0)
> +		return -1;

I was wondering whether we also want to print an error in this case. But
`parse_tree()` calls `parse_tree_gently(tree, 0)`, where the second
parameter instructs the function to print an error message when the tree
is missing.

>  	init_tree_desc(t + 0, merge_base->buffer, merge_base->size);
>  	init_tree_desc(t + 1, side1->buffer, side1->size);
>  	init_tree_desc(t + 2, side2->buffer, side2->size);
> diff --git a/t/t4301-merge-tree-write-tree.sh b/t/t4301-merge-tree-write-tree.sh
> index 7d0fa74da74..4ea1b74445d 100755
> --- a/t/t4301-merge-tree-write-tree.sh
> +++ b/t/t4301-merge-tree-write-tree.sh
> @@ -950,5 +950,15 @@ test_expect_success '--merge-base with tree OIDs' '
>  	git merge-tree --merge-base=side1^^{tree} side1^{tree} side3^{tree} >with-trees &&
>  	test_cmp with-commits with-trees
>  '

Nit: missing newline.

> +test_expect_success 'error out on missing tree objects' '
> +	git init --bare missing-tree.git &&
> +	(
> +		git rev-list side3 &&
> +		git rev-parse side3^:
> +	) | git pack-objects missing-tree.git/objects/pack/side3-tree-is-missing &&
> +	side3=$(git rev-parse side3) &&
> +	test_must_fail git --git-dir=missing-tree.git merge-tree $side3^ $side3 >actual &&
> +	test_must_be_empty actual
> +'

Can't we avoid the subshell by using curly braces to pipe the outputs

Patrick

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [PATCH 4/4] Always check `parse_tree*()`'s return value
  2024-02-06  9:49 ` [PATCH 4/4] Always check `parse_tree*()`'s return value Johannes Schindelin via GitGitGadget
  2024-02-06 22:07   ` Junio C Hamano
@ 2024-02-07  7:42   ` Patrick Steinhardt
  1 sibling, 0 replies; 48+ messages in thread
From: Patrick Steinhardt @ 2024-02-07  7:42 UTC (permalink / raw
  To: Johannes Schindelin via GitGitGadget; +Cc: git, Johannes Schindelin

[-- Attachment #1: Type: text/plain, Size: 894 bytes --]

On Tue, Feb 06, 2024 at 09:49:41AM +0000, Johannes Schindelin via GitGitGadget wrote:
> From: Johannes Schindelin <johannes.schindelin@gmx.de>
[snip]
> diff --git a/cache-tree.c b/cache-tree.c
> index 641427ed410..c6508b64a5c 100644
> --- a/cache-tree.c
> +++ b/cache-tree.c
> @@ -779,8 +779,8 @@ static void prime_cache_tree_rec(struct repository *r,
>  			struct cache_tree_sub *sub;
>  			struct tree *subtree = lookup_tree(r, &entry.oid);
>  
> -			if (!subtree->object.parsed)
> -				parse_tree(subtree);
> +			if (!subtree->object.parsed && parse_tree(subtree) < 0)
> +				exit(128);

It's somewhat weird that we have the `subtree->object.parsed` check in
the first place, because the first thing that `parse_tree_gently()` does
is to check that flag and immediately return when the tree is parsed
already. It's a preexisting issue though, so this is fine.

Patrick

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [PATCH 0/4] merge-tree: handle missing objects correctly
  2024-02-06  9:49 [PATCH 0/4] merge-tree: handle missing objects correctly Johannes Schindelin via GitGitGadget
                   ` (4 preceding siblings ...)
  2024-02-06 21:12 ` [PATCH 0/4] merge-tree: handle missing objects correctly Junio C Hamano
@ 2024-02-07  7:42 ` Patrick Steinhardt
  2024-02-07 16:47 ` [PATCH v2 0/5] " Johannes Schindelin via GitGitGadget
  6 siblings, 0 replies; 48+ messages in thread
From: Patrick Steinhardt @ 2024-02-07  7:42 UTC (permalink / raw
  To: Johannes Schindelin via GitGitGadget; +Cc: git, Johannes Schindelin

[-- Attachment #1: Type: text/plain, Size: 910 bytes --]

On Tue, Feb 06, 2024 at 09:49:37AM +0000, Johannes Schindelin via GitGitGadget wrote:
> I recently looked into issues where git merge-tree calls returned bogus data
> (in one instance returning an empty tree for non-empty merge parents). By
> the time I had a look at the corresponding repository, the issue was no
> longer reproducible, but a closer look at the code combined with some manual
> experimenting turned up the fact that missing tree objects aren't handled as
> errors by git merge-tree.
> 
> While at it, I added a commit on top that tries to catch all remaining
> unchecked parse_tree() calls.
> 
> This patch series is based on js/merge-tree-3-trees because I introduced
> three unchecked parse_tree() calls in that topic branch.

These patches all look good to me, thanks! I have two nits for the first
patch, but I don't really think that those are worth a reroll.

Patrick

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

^ permalink raw reply	[flat|nested] 48+ messages in thread

* [PATCH v2 0/5] merge-tree: handle missing objects correctly
  2024-02-06  9:49 [PATCH 0/4] merge-tree: handle missing objects correctly Johannes Schindelin via GitGitGadget
                   ` (5 preceding siblings ...)
  2024-02-07  7:42 ` Patrick Steinhardt
@ 2024-02-07 16:47 ` Johannes Schindelin via GitGitGadget
  2024-02-07 16:47   ` [PATCH v2 1/5] merge-tree: fail with a non-zero exit code on missing tree objects Johannes Schindelin via GitGitGadget
                     ` (5 more replies)
  6 siblings, 6 replies; 48+ messages in thread
From: Johannes Schindelin via GitGitGadget @ 2024-02-07 16:47 UTC (permalink / raw
  To: git; +Cc: Patrick Steinhardt, Johannes Schindelin

I recently looked into issues where git merge-tree calls returned bogus data
(in one instance returning an empty tree for non-empty merge parents). By
the time I had a look at the corresponding repository, the issue was no
longer reproducible, but a closer look at the code combined with some manual
experimenting turned up the fact that missing tree objects aren't handled as
errors by git merge-tree.

While at it, I added a commit on top that tries to catch all remaining
unchecked parse_tree() calls.

This patch series is based on js/merge-tree-3-trees because I introduced
three unchecked parse_tree() calls in that topic branch.

Changes since v1:

 * Simplified the test case, avoiding a subshell and a pipe in the process.
 * Added a patch to remove a superfluous subtree->object.parsed guard around
   a parse_tree(subtree) call.

Johannes Schindelin (5):
  merge-tree: fail with a non-zero exit code on missing tree objects
  merge-ort: do check `parse_tree()`'s return value
  t4301: verify that merge-tree fails on missing blob objects
  Always check `parse_tree*()`'s return value
  cache-tree: avoid an unnecessary check

 builtin/checkout.c               | 19 ++++++++++++++++---
 builtin/clone.c                  |  3 ++-
 builtin/commit.c                 |  3 ++-
 builtin/merge-tree.c             |  6 ++++++
 builtin/read-tree.c              |  3 ++-
 builtin/reset.c                  |  4 ++++
 cache-tree.c                     |  4 ++--
 merge-ort.c                      | 16 +++++++++++-----
 merge-recursive.c                |  3 ++-
 merge.c                          |  5 ++++-
 reset.c                          |  5 +++++
 sequencer.c                      |  4 ++++
 t/t4301-merge-tree-write-tree.sh | 24 ++++++++++++++++++++++++
 13 files changed, 84 insertions(+), 15 deletions(-)


base-commit: 5f43cf5b2e4b68386d3774bce880b0f74d801635
Published-As: https://github.com/gitgitgadget/git/releases/tag/pr-1651%2Fdscho%2Fmerge-tree-and-missing-objects-v2
Fetch-It-Via: git fetch https://github.com/gitgitgadget/git pr-1651/dscho/merge-tree-and-missing-objects-v2
Pull-Request: https://github.com/gitgitgadget/git/pull/1651

Range-diff vs v1:

 1:  a3e8ae86114 ! 1:  01dfd66568c merge-tree: fail with a non-zero exit code on missing tree objects
     @@ merge-ort.c: static int collect_merge_info(struct merge_options *opt,
      
       ## t/t4301-merge-tree-write-tree.sh ##
      @@ t/t4301-merge-tree-write-tree.sh: test_expect_success '--merge-base with tree OIDs' '
     - 	git merge-tree --merge-base=side1^^{tree} side1^{tree} side3^{tree} >with-trees &&
       	test_cmp with-commits with-trees
       '
     + 
      +test_expect_success 'error out on missing tree objects' '
      +	git init --bare missing-tree.git &&
     -+	(
     -+		git rev-list side3 &&
     -+		git rev-parse side3^:
     -+	) | git pack-objects missing-tree.git/objects/pack/side3-tree-is-missing &&
     ++	git rev-list side3 >list &&
     ++	git rev-parse side3^: >list &&
     ++	git pack-objects missing-tree.git/objects/pack/side3-tree-is-missing <list &&
      +	side3=$(git rev-parse side3) &&
      +	test_must_fail git --git-dir=missing-tree.git merge-tree $side3^ $side3 >actual &&
      +	test_must_be_empty actual
      +'
     - 
     ++
       test_done
 2:  3e5b787fc03 = 2:  a1bbb7e06e5 merge-ort: do check `parse_tree()`'s return value
 3:  85d3e672871 = 3:  be1dadf2850 t4301: verify that merge-tree fails on missing blob objects
 4:  93abd7000b8 = 4:  ffd38ad602a Always check `parse_tree*()`'s return value
 -:  ----------- > 5:  43c04749513 cache-tree: avoid an unnecessary check

-- 
gitgitgadget

^ permalink raw reply	[flat|nested] 48+ messages in thread

* [PATCH v2 1/5] merge-tree: fail with a non-zero exit code on missing tree objects
  2024-02-07 16:47 ` [PATCH v2 0/5] " Johannes Schindelin via GitGitGadget
@ 2024-02-07 16:47   ` Johannes Schindelin via GitGitGadget
  2024-02-07 17:02     ` Eric Sunshine
  2024-02-07 16:47   ` [PATCH v2 2/5] merge-ort: do check `parse_tree()`'s return value Johannes Schindelin via GitGitGadget
                     ` (4 subsequent siblings)
  5 siblings, 1 reply; 48+ messages in thread
From: Johannes Schindelin via GitGitGadget @ 2024-02-07 16:47 UTC (permalink / raw
  To: git; +Cc: Patrick Steinhardt, Johannes Schindelin, Johannes Schindelin

From: Johannes Schindelin <johannes.schindelin@gmx.de>

When `git merge-tree` encounters a missing tree object, it should error
out and not continue quietly as if nothing had happened.

However, as of time of writing, `git merge-tree` _does_ continue, and
then offers the empty tree as result.

Let's fix this.

Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
---
 merge-ort.c                      |  7 ++++---
 t/t4301-merge-tree-write-tree.sh | 10 ++++++++++
 2 files changed, 14 insertions(+), 3 deletions(-)

diff --git a/merge-ort.c b/merge-ort.c
index 6491070d965..c37fc035f13 100644
--- a/merge-ort.c
+++ b/merge-ort.c
@@ -1659,9 +1659,10 @@ static int collect_merge_info(struct merge_options *opt,
 	info.data = opt;
 	info.show_all_errors = 1;
 
-	parse_tree(merge_base);
-	parse_tree(side1);
-	parse_tree(side2);
+	if (parse_tree(merge_base) < 0 ||
+	    parse_tree(side1) < 0 ||
+	    parse_tree(side2) < 0)
+		return -1;
 	init_tree_desc(t + 0, merge_base->buffer, merge_base->size);
 	init_tree_desc(t + 1, side1->buffer, side1->size);
 	init_tree_desc(t + 2, side2->buffer, side2->size);
diff --git a/t/t4301-merge-tree-write-tree.sh b/t/t4301-merge-tree-write-tree.sh
index 7d0fa74da74..7d588557bdf 100755
--- a/t/t4301-merge-tree-write-tree.sh
+++ b/t/t4301-merge-tree-write-tree.sh
@@ -951,4 +951,14 @@ test_expect_success '--merge-base with tree OIDs' '
 	test_cmp with-commits with-trees
 '
 
+test_expect_success 'error out on missing tree objects' '
+	git init --bare missing-tree.git &&
+	git rev-list side3 >list &&
+	git rev-parse side3^: >list &&
+	git pack-objects missing-tree.git/objects/pack/side3-tree-is-missing <list &&
+	side3=$(git rev-parse side3) &&
+	test_must_fail git --git-dir=missing-tree.git merge-tree $side3^ $side3 >actual &&
+	test_must_be_empty actual
+'
+
 test_done
-- 
gitgitgadget


^ permalink raw reply related	[flat|nested] 48+ messages in thread

* [PATCH v2 2/5] merge-ort: do check `parse_tree()`'s return value
  2024-02-07 16:47 ` [PATCH v2 0/5] " Johannes Schindelin via GitGitGadget
  2024-02-07 16:47   ` [PATCH v2 1/5] merge-tree: fail with a non-zero exit code on missing tree objects Johannes Schindelin via GitGitGadget
@ 2024-02-07 16:47   ` Johannes Schindelin via GitGitGadget
  2024-02-07 16:47   ` [PATCH v2 3/5] t4301: verify that merge-tree fails on missing blob objects Johannes Schindelin via GitGitGadget
                     ` (3 subsequent siblings)
  5 siblings, 0 replies; 48+ messages in thread
From: Johannes Schindelin via GitGitGadget @ 2024-02-07 16:47 UTC (permalink / raw
  To: git; +Cc: Patrick Steinhardt, Johannes Schindelin, Johannes Schindelin

From: Johannes Schindelin <johannes.schindelin@gmx.de>

The previous commit fixed a bug where a missing tree was reported, but
not treated as an error.

This patch addresses the same issue for the remaining two callers of
`parse_tree()`.

This change is not accompanied by a regression test because the code in
question is only reached at the `checkout` stage, i.e. after the merge
has happened (and therefore the tree objects could only be missing if
the disk had gone bad in that short time window, or something similarly
tricky to recreate in the test suite).

Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
---
 merge-ort.c | 6 ++++--
 1 file changed, 4 insertions(+), 2 deletions(-)

diff --git a/merge-ort.c b/merge-ort.c
index c37fc035f13..79d9e18f63d 100644
--- a/merge-ort.c
+++ b/merge-ort.c
@@ -4379,9 +4379,11 @@ static int checkout(struct merge_options *opt,
 	unpack_opts.verbose_update = (opt->verbosity > 2);
 	unpack_opts.fn = twoway_merge;
 	unpack_opts.preserve_ignored = 0; /* FIXME: !opts->overwrite_ignore */
-	parse_tree(prev);
+	if (parse_tree(prev) < 0)
+		return -1;
 	init_tree_desc(&trees[0], prev->buffer, prev->size);
-	parse_tree(next);
+	if (parse_tree(next) < 0)
+		return -1;
 	init_tree_desc(&trees[1], next->buffer, next->size);
 
 	ret = unpack_trees(2, trees, &unpack_opts);
-- 
gitgitgadget


^ permalink raw reply related	[flat|nested] 48+ messages in thread

* [PATCH v2 3/5] t4301: verify that merge-tree fails on missing blob objects
  2024-02-07 16:47 ` [PATCH v2 0/5] " Johannes Schindelin via GitGitGadget
  2024-02-07 16:47   ` [PATCH v2 1/5] merge-tree: fail with a non-zero exit code on missing tree objects Johannes Schindelin via GitGitGadget
  2024-02-07 16:47   ` [PATCH v2 2/5] merge-ort: do check `parse_tree()`'s return value Johannes Schindelin via GitGitGadget
@ 2024-02-07 16:47   ` Johannes Schindelin via GitGitGadget
  2024-02-07 17:24     ` Eric Sunshine
  2024-02-07 16:47   ` [PATCH v2 4/5] Always check `parse_tree*()`'s return value Johannes Schindelin via GitGitGadget
                     ` (2 subsequent siblings)
  5 siblings, 1 reply; 48+ messages in thread
From: Johannes Schindelin via GitGitGadget @ 2024-02-07 16:47 UTC (permalink / raw
  To: git; +Cc: Patrick Steinhardt, Johannes Schindelin, Johannes Schindelin

From: Johannes Schindelin <johannes.schindelin@gmx.de>

We just fixed a problem where `merge-tree` would not fail on missing
tree objects. Let's ensure that that problem does not occur with blob
objects (and won't, in the future, either).

Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
---
 t/t4301-merge-tree-write-tree.sh | 14 ++++++++++++++
 1 file changed, 14 insertions(+)

diff --git a/t/t4301-merge-tree-write-tree.sh b/t/t4301-merge-tree-write-tree.sh
index 7d588557bdf..9211cb58aa1 100755
--- a/t/t4301-merge-tree-write-tree.sh
+++ b/t/t4301-merge-tree-write-tree.sh
@@ -961,4 +961,18 @@ test_expect_success 'error out on missing tree objects' '
 	test_must_be_empty actual
 '
 
+test_expect_success 'error out on missing blob objects' '
+	seq1=$(test_seq 1 10 | git hash-object -w --stdin) &&
+	seq2=$(test_seq 1 11 | git hash-object -w --stdin) &&
+	seq3=$(test_seq 0 10 | git hash-object -w --stdin) &&
+	tree1=$(printf "100644 blob %s\tsequence" $seq1 | git mktree) &&
+	tree2=$(printf "100644 blob %s\tsequence" $seq2 | git mktree) &&
+	tree3=$(printf "100644 blob %s\tsequence" $seq3 | git mktree) &&
+	git init --bare missing-blob.git &&
+	test_write_lines $seq1 $seq3 $tree1 $tree2 $tree3 |
+	git pack-objects missing-blob.git/objects/pack/side1-whatever-is-missing &&
+	test_must_fail git --git-dir=missing-blob.git merge-tree --merge-base=$tree1 $tree2 $tree3 >actual &&
+	test_must_be_empty actual
+'
+
 test_done
-- 
gitgitgadget


^ permalink raw reply related	[flat|nested] 48+ messages in thread

* [PATCH v2 4/5] Always check `parse_tree*()`'s return value
  2024-02-07 16:47 ` [PATCH v2 0/5] " Johannes Schindelin via GitGitGadget
                     ` (2 preceding siblings ...)
  2024-02-07 16:47   ` [PATCH v2 3/5] t4301: verify that merge-tree fails on missing blob objects Johannes Schindelin via GitGitGadget
@ 2024-02-07 16:47   ` Johannes Schindelin via GitGitGadget
  2024-02-07 17:26     ` Eric Sunshine
  2024-02-07 16:47   ` [PATCH v2 5/5] cache-tree: avoid an unnecessary check Johannes Schindelin via GitGitGadget
  2024-02-22 14:36   ` [PATCH v3 0/5] merge-tree: handle missing objects correctly Johannes Schindelin via GitGitGadget
  5 siblings, 1 reply; 48+ messages in thread
From: Johannes Schindelin via GitGitGadget @ 2024-02-07 16:47 UTC (permalink / raw
  To: git; +Cc: Patrick Steinhardt, Johannes Schindelin, Johannes Schindelin

From: Johannes Schindelin <johannes.schindelin@gmx.de>

Otherwise we may easily run into serious crashes: For example, if we run
`init_tree_desc()` directly after a failed `parse_tree()`, we are
accessing uninitialized data or trying to dereference `NULL`.

Note that the `parse_tree()` function already takes care of showing an
error message. The `parse_tree_indirectly()` and
`repo_get_commit_tree()` functions do not, therefore those latter call
sites need to show a useful error message while the former do not.

Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
---
 builtin/checkout.c   | 19 ++++++++++++++++---
 builtin/clone.c      |  3 ++-
 builtin/commit.c     |  3 ++-
 builtin/merge-tree.c |  6 ++++++
 builtin/read-tree.c  |  3 ++-
 builtin/reset.c      |  4 ++++
 cache-tree.c         |  4 ++--
 merge-ort.c          |  3 +++
 merge-recursive.c    |  3 ++-
 merge.c              |  5 ++++-
 reset.c              |  5 +++++
 sequencer.c          |  4 ++++
 12 files changed, 52 insertions(+), 10 deletions(-)

diff --git a/builtin/checkout.c b/builtin/checkout.c
index f02434bc155..84108ec3635 100644
--- a/builtin/checkout.c
+++ b/builtin/checkout.c
@@ -707,7 +707,8 @@ static int reset_tree(struct tree *tree, const struct checkout_opts *o,
 	init_checkout_metadata(&opts.meta, info->refname,
 			       info->commit ? &info->commit->object.oid : null_oid(),
 			       NULL);
-	parse_tree(tree);
+	if (parse_tree(tree) < 0)
+		return 128;
 	init_tree_desc(&tree_desc, tree->buffer, tree->size);
 	switch (unpack_trees(1, &tree_desc, &opts)) {
 	case -2:
@@ -786,9 +787,15 @@ static int merge_working_tree(const struct checkout_opts *opts,
 		if (new_branch_info->commit)
 			BUG("'switch --orphan' should never accept a commit as starting point");
 		new_tree = parse_tree_indirect(the_hash_algo->empty_tree);
-	} else
+		if (!new_tree)
+			BUG("unable to read empty tree");
+	} else {
 		new_tree = repo_get_commit_tree(the_repository,
 						new_branch_info->commit);
+		if (!new_tree)
+			return error(_("unable to read tree %s"),
+				     oid_to_hex(&new_branch_info->commit->object.oid));
+	}
 	if (opts->discard_changes) {
 		ret = reset_tree(new_tree, opts, 1, writeout_error, new_branch_info);
 		if (ret)
@@ -823,7 +830,8 @@ static int merge_working_tree(const struct checkout_opts *opts,
 				oid_to_hex(old_commit_oid));
 
 		init_tree_desc(&trees[0], tree->buffer, tree->size);
-		parse_tree(new_tree);
+		if (parse_tree(new_tree) < 0)
+			exit(128);
 		tree = new_tree;
 		init_tree_desc(&trees[1], tree->buffer, tree->size);
 
@@ -1239,10 +1247,15 @@ static void setup_new_branch_info_and_source_tree(
 	if (!new_branch_info->commit) {
 		/* not a commit */
 		*source_tree = parse_tree_indirect(rev);
+		if (!*source_tree)
+			die(_("unable to read tree %s"), oid_to_hex(rev));
 	} else {
 		parse_commit_or_die(new_branch_info->commit);
 		*source_tree = repo_get_commit_tree(the_repository,
 						    new_branch_info->commit);
+		if (!*source_tree)
+			die(_("unable to read tree %s"),
+			    oid_to_hex(&new_branch_info->commit->object.oid));
 	}
 }
 
diff --git a/builtin/clone.c b/builtin/clone.c
index c6357af9498..4410b55be98 100644
--- a/builtin/clone.c
+++ b/builtin/clone.c
@@ -736,7 +736,8 @@ static int checkout(int submodule_progress, int filter_submodules)
 	tree = parse_tree_indirect(&oid);
 	if (!tree)
 		die(_("unable to parse commit %s"), oid_to_hex(&oid));
-	parse_tree(tree);
+	if (parse_tree(tree) < 0)
+		exit(128);
 	init_tree_desc(&t, tree->buffer, tree->size);
 	if (unpack_trees(1, &t, &opts) < 0)
 		die(_("unable to checkout working tree"));
diff --git a/builtin/commit.c b/builtin/commit.c
index 781af2e206c..0723f06de7a 100644
--- a/builtin/commit.c
+++ b/builtin/commit.c
@@ -339,7 +339,8 @@ static void create_base_index(const struct commit *current_head)
 	tree = parse_tree_indirect(&current_head->object.oid);
 	if (!tree)
 		die(_("failed to unpack HEAD tree object"));
-	parse_tree(tree);
+	if (parse_tree(tree) < 0)
+		exit(128);
 	init_tree_desc(&t, tree->buffer, tree->size);
 	if (unpack_trees(1, &t, &opts))
 		exit(128); /* We've already reported the error, finish dying */
diff --git a/builtin/merge-tree.c b/builtin/merge-tree.c
index 2d4ce5b3886..ba84d00deee 100644
--- a/builtin/merge-tree.c
+++ b/builtin/merge-tree.c
@@ -447,12 +447,18 @@ static int real_merge(struct merge_tree_options *o,
 		if (repo_get_oid_treeish(the_repository, merge_base, &base_oid))
 			die(_("could not parse as tree '%s'"), merge_base);
 		base_tree = parse_tree_indirect(&base_oid);
+		if (!base_tree)
+			die(_("unable to read tree %s"), oid_to_hex(&base_oid));
 		if (repo_get_oid_treeish(the_repository, branch1, &head_oid))
 			die(_("could not parse as tree '%s'"), branch1);
 		parent1_tree = parse_tree_indirect(&head_oid);
+		if (!parent1_tree)
+			die(_("unable to read tree %s"), oid_to_hex(&head_oid));
 		if (repo_get_oid_treeish(the_repository, branch2, &merge_oid))
 			die(_("could not parse as tree '%s'"), branch2);
 		parent2_tree = parse_tree_indirect(&merge_oid);
+		if (!parent2_tree)
+			die(_("unable to read tree %s"), oid_to_hex(&merge_oid));
 
 		opt.ancestor = merge_base;
 		merge_incore_nonrecursive(&opt, base_tree, parent1_tree, parent2_tree, &result);
diff --git a/builtin/read-tree.c b/builtin/read-tree.c
index 8196ca9dd85..5923ed36893 100644
--- a/builtin/read-tree.c
+++ b/builtin/read-tree.c
@@ -263,7 +263,8 @@ int cmd_read_tree(int argc, const char **argv, const char *cmd_prefix)
 	cache_tree_free(&the_index.cache_tree);
 	for (i = 0; i < nr_trees; i++) {
 		struct tree *tree = trees[i];
-		parse_tree(tree);
+		if (parse_tree(tree) < 0)
+			return 128;
 		init_tree_desc(t+i, tree->buffer, tree->size);
 	}
 	if (unpack_trees(nr_trees, t, &opts))
diff --git a/builtin/reset.c b/builtin/reset.c
index 4b018d20e3b..f030f57f4e9 100644
--- a/builtin/reset.c
+++ b/builtin/reset.c
@@ -119,6 +119,10 @@ static int reset_index(const char *ref, const struct object_id *oid, int reset_t
 
 	if (reset_type == MIXED || reset_type == HARD) {
 		tree = parse_tree_indirect(oid);
+		if (!tree) {
+			error(_("unable to read tree %s"), oid_to_hex(oid));
+			goto out;
+		}
 		prime_cache_tree(the_repository, the_repository->index, tree);
 	}
 
diff --git a/cache-tree.c b/cache-tree.c
index 641427ed410..c6508b64a5c 100644
--- a/cache-tree.c
+++ b/cache-tree.c
@@ -779,8 +779,8 @@ static void prime_cache_tree_rec(struct repository *r,
 			struct cache_tree_sub *sub;
 			struct tree *subtree = lookup_tree(r, &entry.oid);
 
-			if (!subtree->object.parsed)
-				parse_tree(subtree);
+			if (!subtree->object.parsed && parse_tree(subtree) < 0)
+				exit(128);
 			sub = cache_tree_sub(it, entry.path);
 			sub->cache_tree = cache_tree();
 
diff --git a/merge-ort.c b/merge-ort.c
index 79d9e18f63d..534ddaf16ba 100644
--- a/merge-ort.c
+++ b/merge-ort.c
@@ -4983,6 +4983,9 @@ static void merge_ort_nonrecursive_internal(struct merge_options *opt,
 
 	if (result->clean >= 0) {
 		result->tree = parse_tree_indirect(&working_tree_oid);
+		if (!result->tree)
+			die(_("unable to read tree %s"),
+			    oid_to_hex(&working_tree_oid));
 		/* existence of conflicted entries implies unclean */
 		result->clean &= strmap_empty(&opt->priv->conflicted);
 	}
diff --git a/merge-recursive.c b/merge-recursive.c
index e3beb0801b1..10d41bfd487 100644
--- a/merge-recursive.c
+++ b/merge-recursive.c
@@ -410,7 +410,8 @@ static inline int merge_detect_rename(struct merge_options *opt)
 
 static void init_tree_desc_from_tree(struct tree_desc *desc, struct tree *tree)
 {
-	parse_tree(tree);
+	if (parse_tree(tree) < 0)
+		exit(128);
 	init_tree_desc(desc, tree->buffer, tree->size);
 }
 
diff --git a/merge.c b/merge.c
index b60925459c2..14a7325859d 100644
--- a/merge.c
+++ b/merge.c
@@ -80,7 +80,10 @@ int checkout_fast_forward(struct repository *r,
 		return -1;
 	}
 	for (i = 0; i < nr_trees; i++) {
-		parse_tree(trees[i]);
+		if (parse_tree(trees[i]) < 0) {
+			rollback_lock_file(&lock_file);
+			return -1;
+		}
 		init_tree_desc(t+i, trees[i]->buffer, trees[i]->size);
 	}
 
diff --git a/reset.c b/reset.c
index 48da0adf851..a93fdbc12e3 100644
--- a/reset.c
+++ b/reset.c
@@ -158,6 +158,11 @@ int reset_head(struct repository *r, const struct reset_head_opts *opts)
 	}
 
 	tree = parse_tree_indirect(oid);
+	if (!tree) {
+		ret = error(_("unable to read tree %s"), oid_to_hex(oid));
+		goto leave_reset_head;
+	}
+
 	prime_cache_tree(r, r->index, tree);
 
 	if (write_locked_index(r->index, &lock, COMMIT_LOCK) < 0) {
diff --git a/sequencer.c b/sequencer.c
index d584cac8ed9..407473bab28 100644
--- a/sequencer.c
+++ b/sequencer.c
@@ -715,6 +715,8 @@ static int do_recursive_merge(struct repository *r,
 	o.show_rename_progress = 1;
 
 	head_tree = parse_tree_indirect(head);
+	if (!head_tree)
+		return error(_("unable to read tree %s"), oid_to_hex(head));
 	next_tree = next ? repo_get_commit_tree(r, next) : empty_tree(r);
 	base_tree = base ? repo_get_commit_tree(r, base) : empty_tree(r);
 
@@ -3887,6 +3889,8 @@ static int do_reset(struct repository *r,
 	}
 
 	tree = parse_tree_indirect(&oid);
+	if (!tree)
+		return error(_("unable to read tree %s"), oid_to_hex(&oid));
 	prime_cache_tree(r, r->index, tree);
 
 	if (write_locked_index(r->index, &lock, COMMIT_LOCK) < 0)
-- 
gitgitgadget


^ permalink raw reply related	[flat|nested] 48+ messages in thread

* [PATCH v2 5/5] cache-tree: avoid an unnecessary check
  2024-02-07 16:47 ` [PATCH v2 0/5] " Johannes Schindelin via GitGitGadget
                     ` (3 preceding siblings ...)
  2024-02-07 16:47   ` [PATCH v2 4/5] Always check `parse_tree*()`'s return value Johannes Schindelin via GitGitGadget
@ 2024-02-07 16:47   ` Johannes Schindelin via GitGitGadget
  2024-02-07 16:58     ` Junio C Hamano
  2024-02-22 14:36   ` [PATCH v3 0/5] merge-tree: handle missing objects correctly Johannes Schindelin via GitGitGadget
  5 siblings, 1 reply; 48+ messages in thread
From: Johannes Schindelin via GitGitGadget @ 2024-02-07 16:47 UTC (permalink / raw
  To: git; +Cc: Patrick Steinhardt, Johannes Schindelin, Johannes Schindelin

From: Johannes Schindelin <johannes.schindelin@gmx.de>

The first thing the `parse_tree()` function does is to return early if
the tree has already been parsed. Therefore we do not need to guard the
`parse_tree()` call behind a check of that flag.

As of time of writing, there are no other instances of this in Git's
code bases: whenever the `parsed` flag guards a `parse_tree()` call, it
guards more than just that call.

Suggested-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
---
 cache-tree.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/cache-tree.c b/cache-tree.c
index c6508b64a5c..78d6ba92853 100644
--- a/cache-tree.c
+++ b/cache-tree.c
@@ -779,7 +779,7 @@ static void prime_cache_tree_rec(struct repository *r,
 			struct cache_tree_sub *sub;
 			struct tree *subtree = lookup_tree(r, &entry.oid);
 
-			if (!subtree->object.parsed && parse_tree(subtree) < 0)
+			if (parse_tree(subtree) < 0)
 				exit(128);
 			sub = cache_tree_sub(it, entry.path);
 			sub->cache_tree = cache_tree();
-- 
gitgitgadget

^ permalink raw reply related	[flat|nested] 48+ messages in thread

* Re: [PATCH v2 5/5] cache-tree: avoid an unnecessary check
  2024-02-07 16:47   ` [PATCH v2 5/5] cache-tree: avoid an unnecessary check Johannes Schindelin via GitGitGadget
@ 2024-02-07 16:58     ` Junio C Hamano
  0 siblings, 0 replies; 48+ messages in thread
From: Junio C Hamano @ 2024-02-07 16:58 UTC (permalink / raw
  To: Johannes Schindelin via GitGitGadget
  Cc: git, Patrick Steinhardt, Johannes Schindelin

"Johannes Schindelin via GitGitGadget" <gitgitgadget@gmail.com>
writes:

> From: Johannes Schindelin <johannes.schindelin@gmx.de>
>
> The first thing the `parse_tree()` function does is to return early if
> the tree has already been parsed. Therefore we do not need to guard the
> `parse_tree()` call behind a check of that flag.

Makes sense, and it doubly makes sense to keep this separate from
the change done to the same location in [4/5].

Will queue.

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [PATCH v2 1/5] merge-tree: fail with a non-zero exit code on missing tree objects
  2024-02-07 16:47   ` [PATCH v2 1/5] merge-tree: fail with a non-zero exit code on missing tree objects Johannes Schindelin via GitGitGadget
@ 2024-02-07 17:02     ` Eric Sunshine
  2024-02-22 14:09       ` Johannes Schindelin
  0 siblings, 1 reply; 48+ messages in thread
From: Eric Sunshine @ 2024-02-07 17:02 UTC (permalink / raw
  To: Johannes Schindelin via GitGitGadget
  Cc: git, Patrick Steinhardt, Johannes Schindelin

On Wed, Feb 7, 2024 at 11:48 AM Johannes Schindelin via GitGitGadget
<gitgitgadget@gmail.com> wrote:
> When `git merge-tree` encounters a missing tree object, it should error
> out and not continue quietly as if nothing had happened.
>
> However, as of time of writing, `git merge-tree` _does_ continue, and
> then offers the empty tree as result.
>
> Let's fix this.
>
> Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
> ---
> diff --git a/t/t4301-merge-tree-write-tree.sh b/t/t4301-merge-tree-write-tree.sh
> @@ -951,4 +951,14 @@ test_expect_success '--merge-base with tree OIDs' '
> +test_expect_success 'error out on missing tree objects' '
> +       git init --bare missing-tree.git &&
> +       git rev-list side3 >list &&
> +       git rev-parse side3^: >list &&

Isn't the git-rev-parse invocation simply overwriting "list" rather
than appending to it? Did you mean:

    git rev-list side3 >list &&
    git rev-parse side3^: >>list &&

An alternative would be:

    {
        git rev-list side3 &&
        git rev-parse side3^:
    } >list &&

> +       git pack-objects missing-tree.git/objects/pack/side3-tree-is-missing <list &&
> +       side3=$(git rev-parse side3) &&
> +       test_must_fail git --git-dir=missing-tree.git merge-tree $side3^ $side3 >actual &&
> +       test_must_be_empty actual
> +'

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [PATCH v2 3/5] t4301: verify that merge-tree fails on missing blob objects
  2024-02-07 16:47   ` [PATCH v2 3/5] t4301: verify that merge-tree fails on missing blob objects Johannes Schindelin via GitGitGadget
@ 2024-02-07 17:24     ` Eric Sunshine
  2024-02-07 21:24       ` Junio C Hamano
                         ` (2 more replies)
  0 siblings, 3 replies; 48+ messages in thread
From: Eric Sunshine @ 2024-02-07 17:24 UTC (permalink / raw
  To: Johannes Schindelin via GitGitGadget
  Cc: git, Patrick Steinhardt, Johannes Schindelin

On Wed, Feb 7, 2024 at 11:48 AM Johannes Schindelin via GitGitGadget
<gitgitgadget@gmail.com> wrote:
> We just fixed a problem where `merge-tree` would not fail on missing
> tree objects. Let's ensure that that problem does not occur with blob
> objects (and won't, in the future, either).
>
> Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
> ---
> diff --git a/t/t4301-merge-tree-write-tree.sh b/t/t4301-merge-tree-write-tree.sh
> @@ -961,4 +961,18 @@ test_expect_success 'error out on missing tree objects' '
> +test_expect_success 'error out on missing blob objects' '
> +       seq1=$(test_seq 1 10 | git hash-object -w --stdin) &&
> +       seq2=$(test_seq 1 11 | git hash-object -w --stdin) &&
> +       seq3=$(test_seq 0 10 | git hash-object -w --stdin) &&

Is there significance to the ranges passed to test_seq()? Or, can the
same be achieved by using arbitrary content for each blob?

    blob1=$(echo "one" | git hash-object -w --stdin) &&
    blob2=$(echo "two" | git hash-object -w --stdin) &&
    blob3=$(echo "three" | git hash-object -w --stdin) &&

> +       tree1=$(printf "100644 blob %s\tsequence" $seq1 | git mktree) &&
> +       tree2=$(printf "100644 blob %s\tsequence" $seq2 | git mktree) &&
> +       tree3=$(printf "100644 blob %s\tsequence" $seq3 | git mktree) &&

I found the lack of terminating "\n" in the `printf` confusing,
especially since the variable names (seq1, seq2, etc.) and the use of
`printf` seem to imply, at first glance, that each git-mktree
invocation is receiving multiple lines as input which, it turns out,
is not the case. Adding the missing "\n" would help:

   tree1=$(printf "100644 blob %s\tsequence\n" $seq1 | git mktree) &&
   tree2=$(printf "100644 blob %s\tsequence\n" $seq2 | git mktree) &&
   tree3=$(printf "100644 blob %s\tsequence\n" $seq3 | git mktree) &&

Interpolating the $seqN variable directly into the string rather than
using %s would make it even clearer that only a single line is being
generated as input to git-mktree:

   tree1=$(printf "100644 blob $seq1\tsequence\n" | git mktree) &&
   tree2=$(printf "100644 blob $seq2\tsequence\n" | git mktree) &&
   tree3=$(printf "100644 blob $seq3\tsequence\n" | git mktree) &&

Alternatively `echo` could be used, though it's not necessarily any nicer:

    tree1=$(echo "100644 blob $seq1Qsequence" | q_to_tab | git mktree) &&
    tree2=$(echo "100644 blob $seq2Qsequence" | q_to_tab | git mktree) &&
    tree3=$(echo "100644 blob $seq3Qsequence" | q_to_tab | git mktree) &&

> +       git init --bare missing-blob.git &&
> +       test_write_lines $seq1 $seq3 $tree1 $tree2 $tree3 |
> +       git pack-objects missing-blob.git/objects/pack/side1-whatever-is-missing &&
> +       test_must_fail git --git-dir=missing-blob.git merge-tree --merge-base=$tree1 $tree2 $tree3 >actual &&
> +       test_must_be_empty actual
> +'

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [PATCH v2 4/5] Always check `parse_tree*()`'s return value
  2024-02-07 16:47   ` [PATCH v2 4/5] Always check `parse_tree*()`'s return value Johannes Schindelin via GitGitGadget
@ 2024-02-07 17:26     ` Eric Sunshine
  2024-02-22 14:08       ` Johannes Schindelin
  0 siblings, 1 reply; 48+ messages in thread
From: Eric Sunshine @ 2024-02-07 17:26 UTC (permalink / raw
  To: Johannes Schindelin via GitGitGadget
  Cc: git, Patrick Steinhardt, Johannes Schindelin

On Wed, Feb 7, 2024 at 11:48 AM Johannes Schindelin via GitGitGadget
<gitgitgadget@gmail.com> wrote:
> Always check `parse_tree*()`'s return value

If you happen to reroll for some reason, perhaps: s/Always/always/

> Otherwise we may easily run into serious crashes: For example, if we run
> `init_tree_desc()` directly after a failed `parse_tree()`, we are
> accessing uninitialized data or trying to dereference `NULL`.
>
> Note that the `parse_tree()` function already takes care of showing an
> error message. The `parse_tree_indirectly()` and
> `repo_get_commit_tree()` functions do not, therefore those latter call
> sites need to show a useful error message while the former do not.
>
> Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [PATCH v2 3/5] t4301: verify that merge-tree fails on missing blob objects
  2024-02-07 17:24     ` Eric Sunshine
@ 2024-02-07 21:24       ` Junio C Hamano
  2024-02-08  0:52       ` Eric Sunshine
  2024-02-22 14:12       ` Johannes Schindelin
  2 siblings, 0 replies; 48+ messages in thread
From: Junio C Hamano @ 2024-02-07 21:24 UTC (permalink / raw
  To: Eric Sunshine
  Cc: Johannes Schindelin via GitGitGadget, git, Patrick Steinhardt,
	Johannes Schindelin

Eric Sunshine <sunshine@sunshineco.com> writes:

>> diff --git a/t/t4301-merge-tree-write-tree.sh b/t/t4301-merge-tree-write-tree.sh
>> @@ -961,4 +961,18 @@ test_expect_success 'error out on missing tree objects' '
>> +test_expect_success 'error out on missing blob objects' '
>> +       seq1=$(test_seq 1 10 | git hash-object -w --stdin) &&
>> +       seq2=$(test_seq 1 11 | git hash-object -w --stdin) &&
>> +       seq3=$(test_seq 0 10 | git hash-object -w --stdin) &&
>
> Is there significance to the ranges passed to test_seq()? Or, can the
> same be achieved by using arbitrary content for each blob?
>
>     blob1=$(echo "one" | git hash-object -w --stdin) &&
>     blob2=$(echo "two" | git hash-object -w --stdin) &&
>     blob3=$(echo "three" | git hash-object -w --stdin) &&
>
>> +       tree1=$(printf "100644 blob %s\tsequence" $seq1 | git mktree) &&
>> +       tree2=$(printf "100644 blob %s\tsequence" $seq2 | git mktree) &&
>> +       tree3=$(printf "100644 blob %s\tsequence" $seq3 | git mktree) &&
>
> I found the lack of terminating "\n" in the `printf` confusing,
> especially since the variable names (seq1, seq2, etc.) and the use of
> `printf` seem to imply, at first glance, that each git-mktree
> invocation is receiving multiple lines as input which, it turns out,
> is not the case. Adding the missing "\n" would help:
>
>    tree1=$(printf "100644 blob %s\tsequence\n" $seq1 | git mktree) &&
>    tree2=$(printf "100644 blob %s\tsequence\n" $seq2 | git mktree) &&
>    tree3=$(printf "100644 blob %s\tsequence\n" $seq3 | git mktree) &&
>
> Interpolating the $seqN variable directly into the string rather than
> using %s would make it even clearer that only a single line is being
> generated as input to git-mktree:
>
>    tree1=$(printf "100644 blob $seq1\tsequence\n" | git mktree) &&
>    tree2=$(printf "100644 blob $seq2\tsequence\n" | git mktree) &&
>    tree3=$(printf "100644 blob $seq3\tsequence\n" | git mktree) &&

All good points.  Thanks for excellent reviews (not just this one
but another one in the series).

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [PATCH v2 3/5] t4301: verify that merge-tree fails on missing blob objects
  2024-02-07 17:24     ` Eric Sunshine
  2024-02-07 21:24       ` Junio C Hamano
@ 2024-02-08  0:52       ` Eric Sunshine
  2024-02-22 14:12       ` Johannes Schindelin
  2 siblings, 0 replies; 48+ messages in thread
From: Eric Sunshine @ 2024-02-08  0:52 UTC (permalink / raw
  To: Johannes Schindelin via GitGitGadget
  Cc: git, Patrick Steinhardt, Johannes Schindelin

On Wed, Feb 7, 2024 at 12:24 PM Eric Sunshine <sunshine@sunshineco.com> wrote:
> Interpolating the $seqN variable directly into the string rather than
> using %s would make it even clearer that only a single line is being
> generated as input to git-mktree:
>
>    tree1=$(printf "100644 blob $seq1\tsequence\n" | git mktree) &&
>    tree2=$(printf "100644 blob $seq2\tsequence\n" | git mktree) &&
>    tree3=$(printf "100644 blob $seq3\tsequence\n" | git mktree) &&
>
> Alternatively `echo` could be used, though it's not necessarily any nicer:
>
>     tree1=$(echo "100644 blob $seq1Qsequence" | q_to_tab | git mktree) &&
>     tree2=$(echo "100644 blob $seq2Qsequence" | q_to_tab | git mktree) &&
>     tree3=$(echo "100644 blob $seq3Qsequence" | q_to_tab | git mktree) &&

The `printf` example is probably cleaner, thus preferable. For
completeness, though, I should mention that the `echo` example is, of
course, not quite correct. For the interpolation to work as intended,
it would need ${...}:

    tree1=$(echo "100644 blob ${seq1}Qsequence" | q_to_tab | git mktree) &&
    tree2=$(echo "100644 blob ${seq2}Qsequence" | q_to_tab | git mktree) &&
    tree3=$(echo "100644 blob ${seq3}Qsequence" | q_to_tab | git mktree) &&

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [PATCH v2 4/5] Always check `parse_tree*()`'s return value
  2024-02-07 17:26     ` Eric Sunshine
@ 2024-02-22 14:08       ` Johannes Schindelin
  2024-02-22 17:05         ` Junio C Hamano
  0 siblings, 1 reply; 48+ messages in thread
From: Johannes Schindelin @ 2024-02-22 14:08 UTC (permalink / raw
  To: Eric Sunshine
  Cc: Johannes Schindelin via GitGitGadget, git, Patrick Steinhardt

[-- Attachment #1: Type: text/plain, Size: 1101 bytes --]

Hi Eric,

On Wed, 7 Feb 2024, Eric Sunshine wrote:

> On Wed, Feb 7, 2024 at 11:48 AM Johannes Schindelin via GitGitGadget
> <gitgitgadget@gmail.com> wrote:
> > Always check `parse_tree*()`'s return value
>
> If you happen to reroll for some reason, perhaps: s/Always/always/

Unless I am missing something we only ask the part of a oneline after an
initial "<something>:" to be downcased.

At least all the "Merge branch [...]" commits are still capitalized and
nobody complains ;-)

Ciao,
Johannes

>
> > Otherwise we may easily run into serious crashes: For example, if we run
> > `init_tree_desc()` directly after a failed `parse_tree()`, we are
> > accessing uninitialized data or trying to dereference `NULL`.
> >
> > Note that the `parse_tree()` function already takes care of showing an
> > error message. The `parse_tree_indirectly()` and
> > `repo_get_commit_tree()` functions do not, therefore those latter call
> > sites need to show a useful error message while the former do not.
> >
> > Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
>

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [PATCH v2 1/5] merge-tree: fail with a non-zero exit code on missing tree objects
  2024-02-07 17:02     ` Eric Sunshine
@ 2024-02-22 14:09       ` Johannes Schindelin
  0 siblings, 0 replies; 48+ messages in thread
From: Johannes Schindelin @ 2024-02-22 14:09 UTC (permalink / raw
  To: Eric Sunshine
  Cc: Johannes Schindelin via GitGitGadget, git, Patrick Steinhardt

[-- Attachment #1: Type: text/plain, Size: 1827 bytes --]

Hi Eric,

On Wed, 7 Feb 2024, Eric Sunshine wrote:

> On Wed, Feb 7, 2024 at 11:48 AM Johannes Schindelin via GitGitGadget
> <gitgitgadget@gmail.com> wrote:
> > When `git merge-tree` encounters a missing tree object, it should error
> > out and not continue quietly as if nothing had happened.
> >
> > However, as of time of writing, `git merge-tree` _does_ continue, and
> > then offers the empty tree as result.
> >
> > Let's fix this.
> >
> > Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
> > ---
> > diff --git a/t/t4301-merge-tree-write-tree.sh b/t/t4301-merge-tree-write-tree.sh
> > @@ -951,4 +951,14 @@ test_expect_success '--merge-base with tree OIDs' '
> > +test_expect_success 'error out on missing tree objects' '
> > +       git init --bare missing-tree.git &&
> > +       git rev-list side3 >list &&
> > +       git rev-parse side3^: >list &&
>
> Isn't the git-rev-parse invocation simply overwriting "list" rather
> than appending to it? Did you mean:
>
>     git rev-list side3 >list &&
>     git rev-parse side3^: >>list &&

Yes. And the fact that this test case still passed was for the _wrong_
reason! I adjusted the test case by writing the correct contents to `list`
and by verifying that the expected error message is shown.

>
> An alternative would be:
>
>     {
>         git rev-list side3 &&
>         git rev-parse side3^:
>     } >list &&

I find that uglier and longer, so I'll stay with the first option ;-)

Thank you for your review!
Johannes

>
> > +       git pack-objects missing-tree.git/objects/pack/side3-tree-is-missing <list &&
> > +       side3=$(git rev-parse side3) &&
> > +       test_must_fail git --git-dir=missing-tree.git merge-tree $side3^ $side3 >actual &&
> > +       test_must_be_empty actual
> > +'
>

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [PATCH v2 3/5] t4301: verify that merge-tree fails on missing blob objects
  2024-02-07 17:24     ` Eric Sunshine
  2024-02-07 21:24       ` Junio C Hamano
  2024-02-08  0:52       ` Eric Sunshine
@ 2024-02-22 14:12       ` Johannes Schindelin
  2 siblings, 0 replies; 48+ messages in thread
From: Johannes Schindelin @ 2024-02-22 14:12 UTC (permalink / raw
  To: Eric Sunshine
  Cc: Johannes Schindelin via GitGitGadget, git, Patrick Steinhardt

[-- Attachment #1: Type: text/plain, Size: 3458 bytes --]

Hi Eric,

On Wed, 7 Feb 2024, Eric Sunshine wrote:

> On Wed, Feb 7, 2024 at 11:48 AM Johannes Schindelin via GitGitGadget
> <gitgitgadget@gmail.com> wrote:
> > We just fixed a problem where `merge-tree` would not fail on missing
> > tree objects. Let's ensure that that problem does not occur with blob
> > objects (and won't, in the future, either).
> >
> > Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
> > ---
> > diff --git a/t/t4301-merge-tree-write-tree.sh b/t/t4301-merge-tree-write-tree.sh
> > @@ -961,4 +961,18 @@ test_expect_success 'error out on missing tree objects' '
> > +test_expect_success 'error out on missing blob objects' '
> > +       seq1=$(test_seq 1 10 | git hash-object -w --stdin) &&
> > +       seq2=$(test_seq 1 11 | git hash-object -w --stdin) &&
> > +       seq3=$(test_seq 0 10 | git hash-object -w --stdin) &&
>
> Is there significance to the ranges passed to test_seq()?

Originally, there was, since I wanted the merge to succeed without
conflicts.

However, I simplified the test case similar to what you suggested and
now specifically verify that the expected error message is shown (as
opposed to, say, an error message indicating a merge conflict).

> Or, can the same be achieved by using arbitrary content for each blob?
>
>     blob1=$(echo "one" | git hash-object -w --stdin) &&
>     blob2=$(echo "two" | git hash-object -w --stdin) &&
>     blob3=$(echo "three" | git hash-object -w --stdin) &&
>
> > +       tree1=$(printf "100644 blob %s\tsequence" $seq1 | git mktree) &&
> > +       tree2=$(printf "100644 blob %s\tsequence" $seq2 | git mktree) &&
> > +       tree3=$(printf "100644 blob %s\tsequence" $seq3 | git mktree) &&
>
> I found the lack of terminating "\n" in the `printf` confusing,
> especially since the variable names (seq1, seq2, etc.) and the use of
> `printf` seem to imply, at first glance, that each git-mktree
> invocation is receiving multiple lines as input which, it turns out,
> is not the case. Adding the missing "\n" would help:
>
>    tree1=$(printf "100644 blob %s\tsequence\n" $seq1 | git mktree) &&
>    tree2=$(printf "100644 blob %s\tsequence\n" $seq2 | git mktree) &&
>    tree3=$(printf "100644 blob %s\tsequence\n" $seq3 | git mktree) &&

I added the `\n` and now avoid the `%s`.

Thank you for your review!
Johannes

>
> Interpolating the $seqN variable directly into the string rather than
> using %s would make it even clearer that only a single line is being
> generated as input to git-mktree:
>
>    tree1=$(printf "100644 blob $seq1\tsequence\n" | git mktree) &&
>    tree2=$(printf "100644 blob $seq2\tsequence\n" | git mktree) &&
>    tree3=$(printf "100644 blob $seq3\tsequence\n" | git mktree) &&
>
> Alternatively `echo` could be used, though it's not necessarily any nicer:
>
>     tree1=$(echo "100644 blob $seq1Qsequence" | q_to_tab | git mktree) &&
>     tree2=$(echo "100644 blob $seq2Qsequence" | q_to_tab | git mktree) &&
>     tree3=$(echo "100644 blob $seq3Qsequence" | q_to_tab | git mktree) &&
>
> > +       git init --bare missing-blob.git &&
> > +       test_write_lines $seq1 $seq3 $tree1 $tree2 $tree3 |
> > +       git pack-objects missing-blob.git/objects/pack/side1-whatever-is-missing &&
> > +       test_must_fail git --git-dir=missing-blob.git merge-tree --merge-base=$tree1 $tree2 $tree3 >actual &&
> > +       test_must_be_empty actual
> > +'
>

^ permalink raw reply	[flat|nested] 48+ messages in thread

* [PATCH v3 0/5] merge-tree: handle missing objects correctly
  2024-02-07 16:47 ` [PATCH v2 0/5] " Johannes Schindelin via GitGitGadget
                     ` (4 preceding siblings ...)
  2024-02-07 16:47   ` [PATCH v2 5/5] cache-tree: avoid an unnecessary check Johannes Schindelin via GitGitGadget
@ 2024-02-22 14:36   ` Johannes Schindelin via GitGitGadget
  2024-02-22 14:36     ` [PATCH v3 1/5] merge-tree: fail with a non-zero exit code on missing tree objects Johannes Schindelin via GitGitGadget
                       ` (5 more replies)
  5 siblings, 6 replies; 48+ messages in thread
From: Johannes Schindelin via GitGitGadget @ 2024-02-22 14:36 UTC (permalink / raw
  To: git; +Cc: Patrick Steinhardt, Eric Sunshine, Johannes Schindelin

I recently looked into issues where git merge-tree calls returned bogus data
(in one instance returning an empty tree for non-empty merge parents). By
the time I had a look at the corresponding repository, the issue was no
longer reproducible, but a closer look at the code combined with some manual
experimenting turned up the fact that missing tree objects aren't handled as
errors by git merge-tree.

While at it, I added a commit on top that tries to catch all remaining
unchecked parse_tree() calls.

This patch series is based on 5f43cf5b2e4 (merge-tree: accept 3 trees as
arguments, 2024-01-28) (the original tip commit of js/merge-tree-3-trees)
because I introduced three unchecked parse_tree() calls in that topic.

Changes since v2:

 * Fixed the new "missing tree object" test case in t4301 that succeeded for
   the wrong reason.
 * Adjusted the new "missing blob object" test case to avoid succeeding for
   the wrong reason.
 * Simplified the "missing blob object" test case.

Changes since v1:

 * Simplified the test case, avoiding a subshell and a pipe in the process.
 * Added a patch to remove a superfluous subtree->object.parsed guard around
   a parse_tree(subtree) call.

Johannes Schindelin (5):
  merge-tree: fail with a non-zero exit code on missing tree objects
  merge-ort: do check `parse_tree()`'s return value
  t4301: verify that merge-tree fails on missing blob objects
  Always check `parse_tree*()`'s return value
  cache-tree: avoid an unnecessary check

 builtin/checkout.c               | 19 ++++++++++++++++---
 builtin/clone.c                  |  3 ++-
 builtin/commit.c                 |  3 ++-
 builtin/merge-tree.c             |  6 ++++++
 builtin/read-tree.c              |  3 ++-
 builtin/reset.c                  |  4 ++++
 cache-tree.c                     |  4 ++--
 merge-ort.c                      | 16 +++++++++++-----
 merge-recursive.c                |  3 ++-
 merge.c                          |  5 ++++-
 reset.c                          |  5 +++++
 sequencer.c                      |  4 ++++
 t/t4301-merge-tree-write-tree.sh | 27 +++++++++++++++++++++++++++
 13 files changed, 87 insertions(+), 15 deletions(-)


base-commit: 5f43cf5b2e4b68386d3774bce880b0f74d801635
Published-As: https://github.com/gitgitgadget/git/releases/tag/pr-1651%2Fdscho%2Fmerge-tree-and-missing-objects-v3
Fetch-It-Via: git fetch https://github.com/gitgitgadget/git pr-1651/dscho/merge-tree-and-missing-objects-v3
Pull-Request: https://github.com/gitgitgadget/git/pull/1651

Range-diff vs v2:

 1:  01dfd66568c ! 1:  11b9cd8c5da merge-tree: fail with a non-zero exit code on missing tree objects
     @@ t/t4301-merge-tree-write-tree.sh: test_expect_success '--merge-base with tree OI
      +test_expect_success 'error out on missing tree objects' '
      +	git init --bare missing-tree.git &&
      +	git rev-list side3 >list &&
     -+	git rev-parse side3^: >list &&
     ++	git rev-parse side3^: >>list &&
      +	git pack-objects missing-tree.git/objects/pack/side3-tree-is-missing <list &&
      +	side3=$(git rev-parse side3) &&
     -+	test_must_fail git --git-dir=missing-tree.git merge-tree $side3^ $side3 >actual &&
     ++	test_must_fail git --git-dir=missing-tree.git merge-tree $side3^ $side3 >actual 2>err &&
     ++	test_grep "Could not read $(git rev-parse $side3:)" err &&
      +	test_must_be_empty actual
      +'
      +
 2:  a1bbb7e06e5 = 2:  f01f4eb011b merge-ort: do check `parse_tree()`'s return value
 3:  be1dadf2850 ! 3:  e82fdf7fbcb t4301: verify that merge-tree fails on missing blob objects
     @@ t/t4301-merge-tree-write-tree.sh: test_expect_success 'error out on missing tree
       '
       
      +test_expect_success 'error out on missing blob objects' '
     -+	seq1=$(test_seq 1 10 | git hash-object -w --stdin) &&
     -+	seq2=$(test_seq 1 11 | git hash-object -w --stdin) &&
     -+	seq3=$(test_seq 0 10 | git hash-object -w --stdin) &&
     -+	tree1=$(printf "100644 blob %s\tsequence" $seq1 | git mktree) &&
     -+	tree2=$(printf "100644 blob %s\tsequence" $seq2 | git mktree) &&
     -+	tree3=$(printf "100644 blob %s\tsequence" $seq3 | git mktree) &&
     ++	echo 1 | git hash-object -w --stdin >blob1 &&
     ++	echo 2 | git hash-object -w --stdin >blob2 &&
     ++	echo 3 | git hash-object -w --stdin >blob3 &&
     ++	printf "100644 blob $(cat blob1)\tblob\n" | git mktree >tree1 &&
     ++	printf "100644 blob $(cat blob2)\tblob\n" | git mktree >tree2 &&
     ++	printf "100644 blob $(cat blob3)\tblob\n" | git mktree >tree3 &&
      +	git init --bare missing-blob.git &&
     -+	test_write_lines $seq1 $seq3 $tree1 $tree2 $tree3 |
     ++	cat blob1 blob3 tree1 tree2 tree3 |
      +	git pack-objects missing-blob.git/objects/pack/side1-whatever-is-missing &&
     -+	test_must_fail git --git-dir=missing-blob.git merge-tree --merge-base=$tree1 $tree2 $tree3 >actual &&
     ++	test_must_fail git --git-dir=missing-blob.git >actual 2>err \
     ++		merge-tree --merge-base=$(cat tree1) $(cat tree2) $(cat tree3) &&
     ++	test_grep "unable to read blob object $(cat blob2)" err &&
      +	test_must_be_empty actual
      +'
      +
 4:  ffd38ad602a = 4:  9e4dc94ef03 Always check `parse_tree*()`'s return value
 5:  43c04749513 = 5:  91dc4ccd04e cache-tree: avoid an unnecessary check

-- 
gitgitgadget

^ permalink raw reply	[flat|nested] 48+ messages in thread

* [PATCH v3 1/5] merge-tree: fail with a non-zero exit code on missing tree objects
  2024-02-22 14:36   ` [PATCH v3 0/5] merge-tree: handle missing objects correctly Johannes Schindelin via GitGitGadget
@ 2024-02-22 14:36     ` Johannes Schindelin via GitGitGadget
  2024-02-22 17:13       ` Junio C Hamano
  2024-02-22 14:36     ` [PATCH v3 2/5] merge-ort: do check `parse_tree()`'s return value Johannes Schindelin via GitGitGadget
                       ` (4 subsequent siblings)
  5 siblings, 1 reply; 48+ messages in thread
From: Johannes Schindelin via GitGitGadget @ 2024-02-22 14:36 UTC (permalink / raw
  To: git
  Cc: Patrick Steinhardt, Eric Sunshine, Johannes Schindelin,
	Johannes Schindelin

From: Johannes Schindelin <johannes.schindelin@gmx.de>

When `git merge-tree` encounters a missing tree object, it should error
out and not continue quietly as if nothing had happened.

However, as of time of writing, `git merge-tree` _does_ continue, and
then offers the empty tree as result.

Let's fix this.

Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
---
 merge-ort.c                      |  7 ++++---
 t/t4301-merge-tree-write-tree.sh | 11 +++++++++++
 2 files changed, 15 insertions(+), 3 deletions(-)

diff --git a/merge-ort.c b/merge-ort.c
index 6491070d965..c37fc035f13 100644
--- a/merge-ort.c
+++ b/merge-ort.c
@@ -1659,9 +1659,10 @@ static int collect_merge_info(struct merge_options *opt,
 	info.data = opt;
 	info.show_all_errors = 1;
 
-	parse_tree(merge_base);
-	parse_tree(side1);
-	parse_tree(side2);
+	if (parse_tree(merge_base) < 0 ||
+	    parse_tree(side1) < 0 ||
+	    parse_tree(side2) < 0)
+		return -1;
 	init_tree_desc(t + 0, merge_base->buffer, merge_base->size);
 	init_tree_desc(t + 1, side1->buffer, side1->size);
 	init_tree_desc(t + 2, side2->buffer, side2->size);
diff --git a/t/t4301-merge-tree-write-tree.sh b/t/t4301-merge-tree-write-tree.sh
index 7d0fa74da74..908c9b540c8 100755
--- a/t/t4301-merge-tree-write-tree.sh
+++ b/t/t4301-merge-tree-write-tree.sh
@@ -951,4 +951,15 @@ test_expect_success '--merge-base with tree OIDs' '
 	test_cmp with-commits with-trees
 '
 
+test_expect_success 'error out on missing tree objects' '
+	git init --bare missing-tree.git &&
+	git rev-list side3 >list &&
+	git rev-parse side3^: >>list &&
+	git pack-objects missing-tree.git/objects/pack/side3-tree-is-missing <list &&
+	side3=$(git rev-parse side3) &&
+	test_must_fail git --git-dir=missing-tree.git merge-tree $side3^ $side3 >actual 2>err &&
+	test_grep "Could not read $(git rev-parse $side3:)" err &&
+	test_must_be_empty actual
+'
+
 test_done
-- 
gitgitgadget


^ permalink raw reply related	[flat|nested] 48+ messages in thread

* [PATCH v3 2/5] merge-ort: do check `parse_tree()`'s return value
  2024-02-22 14:36   ` [PATCH v3 0/5] merge-tree: handle missing objects correctly Johannes Schindelin via GitGitGadget
  2024-02-22 14:36     ` [PATCH v3 1/5] merge-tree: fail with a non-zero exit code on missing tree objects Johannes Schindelin via GitGitGadget
@ 2024-02-22 14:36     ` Johannes Schindelin via GitGitGadget
  2024-02-22 17:18       ` Junio C Hamano
  2024-02-22 14:36     ` [PATCH v3 3/5] t4301: verify that merge-tree fails on missing blob objects Johannes Schindelin via GitGitGadget
                       ` (3 subsequent siblings)
  5 siblings, 1 reply; 48+ messages in thread
From: Johannes Schindelin via GitGitGadget @ 2024-02-22 14:36 UTC (permalink / raw
  To: git
  Cc: Patrick Steinhardt, Eric Sunshine, Johannes Schindelin,
	Johannes Schindelin

From: Johannes Schindelin <johannes.schindelin@gmx.de>

The previous commit fixed a bug where a missing tree was reported, but
not treated as an error.

This patch addresses the same issue for the remaining two callers of
`parse_tree()`.

This change is not accompanied by a regression test because the code in
question is only reached at the `checkout` stage, i.e. after the merge
has happened (and therefore the tree objects could only be missing if
the disk had gone bad in that short time window, or something similarly
tricky to recreate in the test suite).

Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
---
 merge-ort.c | 6 ++++--
 1 file changed, 4 insertions(+), 2 deletions(-)

diff --git a/merge-ort.c b/merge-ort.c
index c37fc035f13..79d9e18f63d 100644
--- a/merge-ort.c
+++ b/merge-ort.c
@@ -4379,9 +4379,11 @@ static int checkout(struct merge_options *opt,
 	unpack_opts.verbose_update = (opt->verbosity > 2);
 	unpack_opts.fn = twoway_merge;
 	unpack_opts.preserve_ignored = 0; /* FIXME: !opts->overwrite_ignore */
-	parse_tree(prev);
+	if (parse_tree(prev) < 0)
+		return -1;
 	init_tree_desc(&trees[0], prev->buffer, prev->size);
-	parse_tree(next);
+	if (parse_tree(next) < 0)
+		return -1;
 	init_tree_desc(&trees[1], next->buffer, next->size);
 
 	ret = unpack_trees(2, trees, &unpack_opts);
-- 
gitgitgadget


^ permalink raw reply related	[flat|nested] 48+ messages in thread

* [PATCH v3 3/5] t4301: verify that merge-tree fails on missing blob objects
  2024-02-22 14:36   ` [PATCH v3 0/5] merge-tree: handle missing objects correctly Johannes Schindelin via GitGitGadget
  2024-02-22 14:36     ` [PATCH v3 1/5] merge-tree: fail with a non-zero exit code on missing tree objects Johannes Schindelin via GitGitGadget
  2024-02-22 14:36     ` [PATCH v3 2/5] merge-ort: do check `parse_tree()`'s return value Johannes Schindelin via GitGitGadget
@ 2024-02-22 14:36     ` Johannes Schindelin via GitGitGadget
  2024-02-22 17:27       ` Junio C Hamano
  2024-02-22 14:36     ` [PATCH v3 4/5] Always check `parse_tree*()`'s return value Johannes Schindelin via GitGitGadget
                       ` (2 subsequent siblings)
  5 siblings, 1 reply; 48+ messages in thread
From: Johannes Schindelin via GitGitGadget @ 2024-02-22 14:36 UTC (permalink / raw
  To: git
  Cc: Patrick Steinhardt, Eric Sunshine, Johannes Schindelin,
	Johannes Schindelin

From: Johannes Schindelin <johannes.schindelin@gmx.de>

We just fixed a problem where `merge-tree` would not fail on missing
tree objects. Let's ensure that that problem does not occur with blob
objects (and won't, in the future, either).

Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
---
 t/t4301-merge-tree-write-tree.sh | 16 ++++++++++++++++
 1 file changed, 16 insertions(+)

diff --git a/t/t4301-merge-tree-write-tree.sh b/t/t4301-merge-tree-write-tree.sh
index 908c9b540c8..d4463a45706 100755
--- a/t/t4301-merge-tree-write-tree.sh
+++ b/t/t4301-merge-tree-write-tree.sh
@@ -962,4 +962,20 @@ test_expect_success 'error out on missing tree objects' '
 	test_must_be_empty actual
 '
 
+test_expect_success 'error out on missing blob objects' '
+	echo 1 | git hash-object -w --stdin >blob1 &&
+	echo 2 | git hash-object -w --stdin >blob2 &&
+	echo 3 | git hash-object -w --stdin >blob3 &&
+	printf "100644 blob $(cat blob1)\tblob\n" | git mktree >tree1 &&
+	printf "100644 blob $(cat blob2)\tblob\n" | git mktree >tree2 &&
+	printf "100644 blob $(cat blob3)\tblob\n" | git mktree >tree3 &&
+	git init --bare missing-blob.git &&
+	cat blob1 blob3 tree1 tree2 tree3 |
+	git pack-objects missing-blob.git/objects/pack/side1-whatever-is-missing &&
+	test_must_fail git --git-dir=missing-blob.git >actual 2>err \
+		merge-tree --merge-base=$(cat tree1) $(cat tree2) $(cat tree3) &&
+	test_grep "unable to read blob object $(cat blob2)" err &&
+	test_must_be_empty actual
+'
+
 test_done
-- 
gitgitgadget


^ permalink raw reply related	[flat|nested] 48+ messages in thread

* [PATCH v3 4/5] Always check `parse_tree*()`'s return value
  2024-02-22 14:36   ` [PATCH v3 0/5] merge-tree: handle missing objects correctly Johannes Schindelin via GitGitGadget
                       ` (2 preceding siblings ...)
  2024-02-22 14:36     ` [PATCH v3 3/5] t4301: verify that merge-tree fails on missing blob objects Johannes Schindelin via GitGitGadget
@ 2024-02-22 14:36     ` Johannes Schindelin via GitGitGadget
  2024-02-22 17:58       ` Junio C Hamano
  2024-02-22 14:36     ` [PATCH v3 5/5] cache-tree: avoid an unnecessary check Johannes Schindelin via GitGitGadget
  2024-02-23  8:34     ` [PATCH v4 0/6] merge-tree: handle missing objects correctly Johannes Schindelin via GitGitGadget
  5 siblings, 1 reply; 48+ messages in thread
From: Johannes Schindelin via GitGitGadget @ 2024-02-22 14:36 UTC (permalink / raw
  To: git
  Cc: Patrick Steinhardt, Eric Sunshine, Johannes Schindelin,
	Johannes Schindelin

From: Johannes Schindelin <johannes.schindelin@gmx.de>

Otherwise we may easily run into serious crashes: For example, if we run
`init_tree_desc()` directly after a failed `parse_tree()`, we are
accessing uninitialized data or trying to dereference `NULL`.

Note that the `parse_tree()` function already takes care of showing an
error message. The `parse_tree_indirectly()` and
`repo_get_commit_tree()` functions do not, therefore those latter call
sites need to show a useful error message while the former do not.

Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
---
 builtin/checkout.c   | 19 ++++++++++++++++---
 builtin/clone.c      |  3 ++-
 builtin/commit.c     |  3 ++-
 builtin/merge-tree.c |  6 ++++++
 builtin/read-tree.c  |  3 ++-
 builtin/reset.c      |  4 ++++
 cache-tree.c         |  4 ++--
 merge-ort.c          |  3 +++
 merge-recursive.c    |  3 ++-
 merge.c              |  5 ++++-
 reset.c              |  5 +++++
 sequencer.c          |  4 ++++
 12 files changed, 52 insertions(+), 10 deletions(-)

diff --git a/builtin/checkout.c b/builtin/checkout.c
index f02434bc155..84108ec3635 100644
--- a/builtin/checkout.c
+++ b/builtin/checkout.c
@@ -707,7 +707,8 @@ static int reset_tree(struct tree *tree, const struct checkout_opts *o,
 	init_checkout_metadata(&opts.meta, info->refname,
 			       info->commit ? &info->commit->object.oid : null_oid(),
 			       NULL);
-	parse_tree(tree);
+	if (parse_tree(tree) < 0)
+		return 128;
 	init_tree_desc(&tree_desc, tree->buffer, tree->size);
 	switch (unpack_trees(1, &tree_desc, &opts)) {
 	case -2:
@@ -786,9 +787,15 @@ static int merge_working_tree(const struct checkout_opts *opts,
 		if (new_branch_info->commit)
 			BUG("'switch --orphan' should never accept a commit as starting point");
 		new_tree = parse_tree_indirect(the_hash_algo->empty_tree);
-	} else
+		if (!new_tree)
+			BUG("unable to read empty tree");
+	} else {
 		new_tree = repo_get_commit_tree(the_repository,
 						new_branch_info->commit);
+		if (!new_tree)
+			return error(_("unable to read tree %s"),
+				     oid_to_hex(&new_branch_info->commit->object.oid));
+	}
 	if (opts->discard_changes) {
 		ret = reset_tree(new_tree, opts, 1, writeout_error, new_branch_info);
 		if (ret)
@@ -823,7 +830,8 @@ static int merge_working_tree(const struct checkout_opts *opts,
 				oid_to_hex(old_commit_oid));
 
 		init_tree_desc(&trees[0], tree->buffer, tree->size);
-		parse_tree(new_tree);
+		if (parse_tree(new_tree) < 0)
+			exit(128);
 		tree = new_tree;
 		init_tree_desc(&trees[1], tree->buffer, tree->size);
 
@@ -1239,10 +1247,15 @@ static void setup_new_branch_info_and_source_tree(
 	if (!new_branch_info->commit) {
 		/* not a commit */
 		*source_tree = parse_tree_indirect(rev);
+		if (!*source_tree)
+			die(_("unable to read tree %s"), oid_to_hex(rev));
 	} else {
 		parse_commit_or_die(new_branch_info->commit);
 		*source_tree = repo_get_commit_tree(the_repository,
 						    new_branch_info->commit);
+		if (!*source_tree)
+			die(_("unable to read tree %s"),
+			    oid_to_hex(&new_branch_info->commit->object.oid));
 	}
 }
 
diff --git a/builtin/clone.c b/builtin/clone.c
index c6357af9498..4410b55be98 100644
--- a/builtin/clone.c
+++ b/builtin/clone.c
@@ -736,7 +736,8 @@ static int checkout(int submodule_progress, int filter_submodules)
 	tree = parse_tree_indirect(&oid);
 	if (!tree)
 		die(_("unable to parse commit %s"), oid_to_hex(&oid));
-	parse_tree(tree);
+	if (parse_tree(tree) < 0)
+		exit(128);
 	init_tree_desc(&t, tree->buffer, tree->size);
 	if (unpack_trees(1, &t, &opts) < 0)
 		die(_("unable to checkout working tree"));
diff --git a/builtin/commit.c b/builtin/commit.c
index 781af2e206c..0723f06de7a 100644
--- a/builtin/commit.c
+++ b/builtin/commit.c
@@ -339,7 +339,8 @@ static void create_base_index(const struct commit *current_head)
 	tree = parse_tree_indirect(&current_head->object.oid);
 	if (!tree)
 		die(_("failed to unpack HEAD tree object"));
-	parse_tree(tree);
+	if (parse_tree(tree) < 0)
+		exit(128);
 	init_tree_desc(&t, tree->buffer, tree->size);
 	if (unpack_trees(1, &t, &opts))
 		exit(128); /* We've already reported the error, finish dying */
diff --git a/builtin/merge-tree.c b/builtin/merge-tree.c
index 2d4ce5b3886..ba84d00deee 100644
--- a/builtin/merge-tree.c
+++ b/builtin/merge-tree.c
@@ -447,12 +447,18 @@ static int real_merge(struct merge_tree_options *o,
 		if (repo_get_oid_treeish(the_repository, merge_base, &base_oid))
 			die(_("could not parse as tree '%s'"), merge_base);
 		base_tree = parse_tree_indirect(&base_oid);
+		if (!base_tree)
+			die(_("unable to read tree %s"), oid_to_hex(&base_oid));
 		if (repo_get_oid_treeish(the_repository, branch1, &head_oid))
 			die(_("could not parse as tree '%s'"), branch1);
 		parent1_tree = parse_tree_indirect(&head_oid);
+		if (!parent1_tree)
+			die(_("unable to read tree %s"), oid_to_hex(&head_oid));
 		if (repo_get_oid_treeish(the_repository, branch2, &merge_oid))
 			die(_("could not parse as tree '%s'"), branch2);
 		parent2_tree = parse_tree_indirect(&merge_oid);
+		if (!parent2_tree)
+			die(_("unable to read tree %s"), oid_to_hex(&merge_oid));
 
 		opt.ancestor = merge_base;
 		merge_incore_nonrecursive(&opt, base_tree, parent1_tree, parent2_tree, &result);
diff --git a/builtin/read-tree.c b/builtin/read-tree.c
index 8196ca9dd85..5923ed36893 100644
--- a/builtin/read-tree.c
+++ b/builtin/read-tree.c
@@ -263,7 +263,8 @@ int cmd_read_tree(int argc, const char **argv, const char *cmd_prefix)
 	cache_tree_free(&the_index.cache_tree);
 	for (i = 0; i < nr_trees; i++) {
 		struct tree *tree = trees[i];
-		parse_tree(tree);
+		if (parse_tree(tree) < 0)
+			return 128;
 		init_tree_desc(t+i, tree->buffer, tree->size);
 	}
 	if (unpack_trees(nr_trees, t, &opts))
diff --git a/builtin/reset.c b/builtin/reset.c
index 4b018d20e3b..f030f57f4e9 100644
--- a/builtin/reset.c
+++ b/builtin/reset.c
@@ -119,6 +119,10 @@ static int reset_index(const char *ref, const struct object_id *oid, int reset_t
 
 	if (reset_type == MIXED || reset_type == HARD) {
 		tree = parse_tree_indirect(oid);
+		if (!tree) {
+			error(_("unable to read tree %s"), oid_to_hex(oid));
+			goto out;
+		}
 		prime_cache_tree(the_repository, the_repository->index, tree);
 	}
 
diff --git a/cache-tree.c b/cache-tree.c
index 641427ed410..c6508b64a5c 100644
--- a/cache-tree.c
+++ b/cache-tree.c
@@ -779,8 +779,8 @@ static void prime_cache_tree_rec(struct repository *r,
 			struct cache_tree_sub *sub;
 			struct tree *subtree = lookup_tree(r, &entry.oid);
 
-			if (!subtree->object.parsed)
-				parse_tree(subtree);
+			if (!subtree->object.parsed && parse_tree(subtree) < 0)
+				exit(128);
 			sub = cache_tree_sub(it, entry.path);
 			sub->cache_tree = cache_tree();
 
diff --git a/merge-ort.c b/merge-ort.c
index 79d9e18f63d..534ddaf16ba 100644
--- a/merge-ort.c
+++ b/merge-ort.c
@@ -4983,6 +4983,9 @@ static void merge_ort_nonrecursive_internal(struct merge_options *opt,
 
 	if (result->clean >= 0) {
 		result->tree = parse_tree_indirect(&working_tree_oid);
+		if (!result->tree)
+			die(_("unable to read tree %s"),
+			    oid_to_hex(&working_tree_oid));
 		/* existence of conflicted entries implies unclean */
 		result->clean &= strmap_empty(&opt->priv->conflicted);
 	}
diff --git a/merge-recursive.c b/merge-recursive.c
index e3beb0801b1..10d41bfd487 100644
--- a/merge-recursive.c
+++ b/merge-recursive.c
@@ -410,7 +410,8 @@ static inline int merge_detect_rename(struct merge_options *opt)
 
 static void init_tree_desc_from_tree(struct tree_desc *desc, struct tree *tree)
 {
-	parse_tree(tree);
+	if (parse_tree(tree) < 0)
+		exit(128);
 	init_tree_desc(desc, tree->buffer, tree->size);
 }
 
diff --git a/merge.c b/merge.c
index b60925459c2..14a7325859d 100644
--- a/merge.c
+++ b/merge.c
@@ -80,7 +80,10 @@ int checkout_fast_forward(struct repository *r,
 		return -1;
 	}
 	for (i = 0; i < nr_trees; i++) {
-		parse_tree(trees[i]);
+		if (parse_tree(trees[i]) < 0) {
+			rollback_lock_file(&lock_file);
+			return -1;
+		}
 		init_tree_desc(t+i, trees[i]->buffer, trees[i]->size);
 	}
 
diff --git a/reset.c b/reset.c
index 48da0adf851..a93fdbc12e3 100644
--- a/reset.c
+++ b/reset.c
@@ -158,6 +158,11 @@ int reset_head(struct repository *r, const struct reset_head_opts *opts)
 	}
 
 	tree = parse_tree_indirect(oid);
+	if (!tree) {
+		ret = error(_("unable to read tree %s"), oid_to_hex(oid));
+		goto leave_reset_head;
+	}
+
 	prime_cache_tree(r, r->index, tree);
 
 	if (write_locked_index(r->index, &lock, COMMIT_LOCK) < 0) {
diff --git a/sequencer.c b/sequencer.c
index d584cac8ed9..407473bab28 100644
--- a/sequencer.c
+++ b/sequencer.c
@@ -715,6 +715,8 @@ static int do_recursive_merge(struct repository *r,
 	o.show_rename_progress = 1;
 
 	head_tree = parse_tree_indirect(head);
+	if (!head_tree)
+		return error(_("unable to read tree %s"), oid_to_hex(head));
 	next_tree = next ? repo_get_commit_tree(r, next) : empty_tree(r);
 	base_tree = base ? repo_get_commit_tree(r, base) : empty_tree(r);
 
@@ -3887,6 +3889,8 @@ static int do_reset(struct repository *r,
 	}
 
 	tree = parse_tree_indirect(&oid);
+	if (!tree)
+		return error(_("unable to read tree %s"), oid_to_hex(&oid));
 	prime_cache_tree(r, r->index, tree);
 
 	if (write_locked_index(r->index, &lock, COMMIT_LOCK) < 0)
-- 
gitgitgadget


^ permalink raw reply related	[flat|nested] 48+ messages in thread

* [PATCH v3 5/5] cache-tree: avoid an unnecessary check
  2024-02-22 14:36   ` [PATCH v3 0/5] merge-tree: handle missing objects correctly Johannes Schindelin via GitGitGadget
                       ` (3 preceding siblings ...)
  2024-02-22 14:36     ` [PATCH v3 4/5] Always check `parse_tree*()`'s return value Johannes Schindelin via GitGitGadget
@ 2024-02-22 14:36     ` Johannes Schindelin via GitGitGadget
  2024-02-22 18:00       ` Junio C Hamano
  2024-02-23  8:34     ` [PATCH v4 0/6] merge-tree: handle missing objects correctly Johannes Schindelin via GitGitGadget
  5 siblings, 1 reply; 48+ messages in thread
From: Johannes Schindelin via GitGitGadget @ 2024-02-22 14:36 UTC (permalink / raw
  To: git
  Cc: Patrick Steinhardt, Eric Sunshine, Johannes Schindelin,
	Johannes Schindelin

From: Johannes Schindelin <johannes.schindelin@gmx.de>

The first thing the `parse_tree()` function does is to return early if
the tree has already been parsed. Therefore we do not need to guard the
`parse_tree()` call behind a check of that flag.

As of time of writing, there are no other instances of this in Git's
code bases: whenever the `parsed` flag guards a `parse_tree()` call, it
guards more than just that call.

Suggested-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
---
 cache-tree.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/cache-tree.c b/cache-tree.c
index c6508b64a5c..78d6ba92853 100644
--- a/cache-tree.c
+++ b/cache-tree.c
@@ -779,7 +779,7 @@ static void prime_cache_tree_rec(struct repository *r,
 			struct cache_tree_sub *sub;
 			struct tree *subtree = lookup_tree(r, &entry.oid);
 
-			if (!subtree->object.parsed && parse_tree(subtree) < 0)
+			if (parse_tree(subtree) < 0)
 				exit(128);
 			sub = cache_tree_sub(it, entry.path);
 			sub->cache_tree = cache_tree();
-- 
gitgitgadget

^ permalink raw reply related	[flat|nested] 48+ messages in thread

* Re: [PATCH v2 4/5] Always check `parse_tree*()`'s return value
  2024-02-22 14:08       ` Johannes Schindelin
@ 2024-02-22 17:05         ` Junio C Hamano
  0 siblings, 0 replies; 48+ messages in thread
From: Junio C Hamano @ 2024-02-22 17:05 UTC (permalink / raw
  To: Johannes Schindelin
  Cc: Eric Sunshine, Johannes Schindelin via GitGitGadget, git,
	Patrick Steinhardt

Johannes Schindelin <Johannes.Schindelin@gmx.de> writes:

>> If you happen to reroll for some reason, perhaps: s/Always/always/
>
> Unless I am missing something we only ask the part of a oneline after an
> initial "<something>:" to be downcased.
>
> At least all the "Merge branch [...]" commits are still capitalized and
> nobody complains ;-)

Of course.  Merge commits and single-parent commits are very
different.

I suspect that the lack of "<area>:" was what triggered the comment,
but this being more about the callers all over the place in the
code, it may be hard to say what "area" this belongs to.  If I were
pressed to choose:

    Subject: parse_tree*(): make callers always check return value

would probably be what I would choose, but I usually opt for going
the lazy way of not labelling the area at all ;-).


^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [PATCH v3 1/5] merge-tree: fail with a non-zero exit code on missing tree objects
  2024-02-22 14:36     ` [PATCH v3 1/5] merge-tree: fail with a non-zero exit code on missing tree objects Johannes Schindelin via GitGitGadget
@ 2024-02-22 17:13       ` Junio C Hamano
  0 siblings, 0 replies; 48+ messages in thread
From: Junio C Hamano @ 2024-02-22 17:13 UTC (permalink / raw
  To: Johannes Schindelin via GitGitGadget
  Cc: git, Patrick Steinhardt, Eric Sunshine, Johannes Schindelin

"Johannes Schindelin via GitGitGadget" <gitgitgadget@gmail.com>
writes:

> -	parse_tree(merge_base);
> -	parse_tree(side1);
> -	parse_tree(side2);
> +	if (parse_tree(merge_base) < 0 ||
> +	    parse_tree(side1) < 0 ||
> +	    parse_tree(side2) < 0)
> +		return -1;

Obviously good.

> diff --git a/t/t4301-merge-tree-write-tree.sh b/t/t4301-merge-tree-write-tree.sh
> index 7d0fa74da74..908c9b540c8 100755
> --- a/t/t4301-merge-tree-write-tree.sh
> +++ b/t/t4301-merge-tree-write-tree.sh
> @@ -951,4 +951,15 @@ test_expect_success '--merge-base with tree OIDs' '
>  	test_cmp with-commits with-trees
>  '
>  
> +test_expect_success 'error out on missing tree objects' '
> +	git init --bare missing-tree.git &&
> +	git rev-list side3 >list &&
> +	git rev-parse side3^: >>list &&
> +	git pack-objects missing-tree.git/objects/pack/side3-tree-is-missing <list &&
> +	side3=$(git rev-parse side3) &&
> +	test_must_fail git --git-dir=missing-tree.git merge-tree $side3^ $side3 >actual 2>err &&
> +	test_grep "Could not read $(git rev-parse $side3:)" err &&
> +	test_must_be_empty actual
> +'

I very much like the way this test emulates an operation in a
repository that lack certaion objects so cleanly, and wish we
had used this pattern instead of poking at loose object files
in hundreds of existing tests (#leftoverbits obviously).

It also justifies why a silent "return -1" in the patch is
sufficient ;-)

Thanks.

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [PATCH v3 2/5] merge-ort: do check `parse_tree()`'s return value
  2024-02-22 14:36     ` [PATCH v3 2/5] merge-ort: do check `parse_tree()`'s return value Johannes Schindelin via GitGitGadget
@ 2024-02-22 17:18       ` Junio C Hamano
  0 siblings, 0 replies; 48+ messages in thread
From: Junio C Hamano @ 2024-02-22 17:18 UTC (permalink / raw
  To: Johannes Schindelin via GitGitGadget
  Cc: git, Patrick Steinhardt, Eric Sunshine, Johannes Schindelin

"Johannes Schindelin via GitGitGadget" <gitgitgadget@gmail.com>
writes:

> This change is not accompanied by a regression test because the code in
> question is only reached at the `checkout` stage, i.e. after the merge
> has happened (and therefore the tree objects could only be missing if
> the disk had gone bad in that short time window, or something similarly
> tricky to recreate in the test suite).

Makes sense.

A complete tangent I wonder is if unit-test-minded folks have clever
ideas to allow better test coverage, perhaps injecting a failure on
demand to any codepath (in this case, the codepath to write the
resulting tree) to simulate a situation where we fail to parse the
tree.

In any case, the patch looks good, of course, and I see no need for
further comments.

Thanks.

> Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
> ---
>  merge-ort.c | 6 ++++--
>  1 file changed, 4 insertions(+), 2 deletions(-)
>
> diff --git a/merge-ort.c b/merge-ort.c
> index c37fc035f13..79d9e18f63d 100644
> --- a/merge-ort.c
> +++ b/merge-ort.c
> @@ -4379,9 +4379,11 @@ static int checkout(struct merge_options *opt,
>  	unpack_opts.verbose_update = (opt->verbosity > 2);
>  	unpack_opts.fn = twoway_merge;
>  	unpack_opts.preserve_ignored = 0; /* FIXME: !opts->overwrite_ignore */
> -	parse_tree(prev);
> +	if (parse_tree(prev) < 0)
> +		return -1;
>  	init_tree_desc(&trees[0], prev->buffer, prev->size);
> -	parse_tree(next);
> +	if (parse_tree(next) < 0)
> +		return -1;
>  	init_tree_desc(&trees[1], next->buffer, next->size);
>  
>  	ret = unpack_trees(2, trees, &unpack_opts);

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [PATCH v3 3/5] t4301: verify that merge-tree fails on missing blob objects
  2024-02-22 14:36     ` [PATCH v3 3/5] t4301: verify that merge-tree fails on missing blob objects Johannes Schindelin via GitGitGadget
@ 2024-02-22 17:27       ` Junio C Hamano
  2024-02-22 18:23         ` Junio C Hamano
  0 siblings, 1 reply; 48+ messages in thread
From: Junio C Hamano @ 2024-02-22 17:27 UTC (permalink / raw
  To: Johannes Schindelin via GitGitGadget
  Cc: git, Patrick Steinhardt, Eric Sunshine, Johannes Schindelin

"Johannes Schindelin via GitGitGadget" <gitgitgadget@gmail.com>
writes:

> From: Johannes Schindelin <johannes.schindelin@gmx.de>
>
> We just fixed a problem where `merge-tree` would not fail on missing
> tree objects. Let's ensure that that problem does not occur with blob
> objects (and won't, in the future, either).
>
> Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
> ---
>  t/t4301-merge-tree-write-tree.sh | 16 ++++++++++++++++
>  1 file changed, 16 insertions(+)
>
> diff --git a/t/t4301-merge-tree-write-tree.sh b/t/t4301-merge-tree-write-tree.sh
> index 908c9b540c8..d4463a45706 100755
> --- a/t/t4301-merge-tree-write-tree.sh
> +++ b/t/t4301-merge-tree-write-tree.sh
> @@ -962,4 +962,20 @@ test_expect_success 'error out on missing tree objects' '
>  	test_must_be_empty actual
>  '
>  
> +test_expect_success 'error out on missing blob objects' '
> +	echo 1 | git hash-object -w --stdin >blob1 &&
> +	echo 2 | git hash-object -w --stdin >blob2 &&
> +	echo 3 | git hash-object -w --stdin >blob3 &&
> +	printf "100644 blob $(cat blob1)\tblob\n" | git mktree >tree1 &&
> +	printf "100644 blob $(cat blob2)\tblob\n" | git mktree >tree2 &&
> +	printf "100644 blob $(cat blob3)\tblob\n" | git mktree >tree3 &&
> +	git init --bare missing-blob.git &&
> +	cat blob1 blob3 tree1 tree2 tree3 |
> +	git pack-objects missing-blob.git/objects/pack/side1-whatever-is-missing &&
> +	test_must_fail git --git-dir=missing-blob.git >actual 2>err \
> +		merge-tree --merge-base=$(cat tree1) $(cat tree2) $(cat tree3) &&
> +	test_grep "unable to read blob object $(cat blob2)" err &&
> +	test_must_be_empty actual
> +'

It would have been even easier to see that blob2 is what we expect
to be complained about, if you listed all objects and filtered blob2
out, but the number of objects involved here is so small, a "cat" of
all objects we want to keep is OK here.

Again, I very much love the way the test repository/object store
that lack certain objects are constructed without making our hands
dirty.

I see no need for further comments.  Looking very good.

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [PATCH v3 4/5] Always check `parse_tree*()`'s return value
  2024-02-22 14:36     ` [PATCH v3 4/5] Always check `parse_tree*()`'s return value Johannes Schindelin via GitGitGadget
@ 2024-02-22 17:58       ` Junio C Hamano
  2024-02-23  8:33         ` Johannes Schindelin
  0 siblings, 1 reply; 48+ messages in thread
From: Junio C Hamano @ 2024-02-22 17:58 UTC (permalink / raw
  To: Johannes Schindelin via GitGitGadget
  Cc: git, Patrick Steinhardt, Eric Sunshine, Johannes Schindelin

"Johannes Schindelin via GitGitGadget" <gitgitgadget@gmail.com>
writes:

> @@ -707,7 +707,8 @@ static int reset_tree(struct tree *tree, const struct checkout_opts *o,
>  	init_checkout_metadata(&opts.meta, info->refname,
>  			       info->commit ? &info->commit->object.oid : null_oid(),
>  			       NULL);
> -	parse_tree(tree);
> +	if (parse_tree(tree) < 0)
> +		return 128;

The other error returned from this function is when unpack_trees()
fails well before the writeout phase and the value we return is also
128, so the caller is prepared to act on it.  OK.

> @@ -786,9 +787,15 @@ static int merge_working_tree(const struct checkout_opts *opts,
>  		if (new_branch_info->commit)
>  			BUG("'switch --orphan' should never accept a commit as starting point");
>  		new_tree = parse_tree_indirect(the_hash_algo->empty_tree);
> -	} else
> +		if (!new_tree)
> +			BUG("unable to read empty tree");

parse_tree() of the_hash_algo->empty_tree should result in a tree
object without having to even consult the object store, so BUG(),
not die(), is very much appropriate here.  OK.

> +	} else {
>  		new_tree = repo_get_commit_tree(the_repository,
>  						new_branch_info->commit);
> +		if (!new_tree)
> +			return error(_("unable to read tree %s"),
> +				     oid_to_hex(&new_branch_info->commit->object.oid));

We can help translators by enclosing %s inside a pair of parentheses.

    $ git grep -h 'msgid .*unable to read tree' po | sort | uniq -c
     18 msgid "unable to read tree (%s)"

FYI, the message with "(%s)" is shared by four places; there is one
instance of the message without parentheses added very recently that
forced .po files to have both entries.  We probably should unify them
to use the one with more existing users.

> @@ -823,7 +830,8 @@ static int merge_working_tree(const struct checkout_opts *opts,
>  				oid_to_hex(old_commit_oid));
>  
>  		init_tree_desc(&trees[0], tree->buffer, tree->size);
> -		parse_tree(new_tree);
> +		if (parse_tree(new_tree) < 0)
> +			exit(128);

There is another exit() in the same else clause this code is in, and
upon failing to unpack_trees(), that call exits with 128.  This
parse_tree() is about preparing the input for that call, so it makes
sense to exit with the same code.  Excellent.

> diff --git a/builtin/clone.c b/builtin/clone.c
> index c6357af9498..4410b55be98 100644
> --- a/builtin/clone.c
> +++ b/builtin/clone.c
> @@ -736,7 +736,8 @@ static int checkout(int submodule_progress, int filter_submodules)
>  	tree = parse_tree_indirect(&oid);
>  	if (!tree)
>  		die(_("unable to parse commit %s"), oid_to_hex(&oid));
> -	parse_tree(tree);
> +	if (parse_tree(tree) < 0)
> +		exit(128);
>  	init_tree_desc(&t, tree->buffer, tree->size);
>  	if (unpack_trees(1, &t, &opts) < 0)
>  		die(_("unable to checkout working tree"));

Exactly the same reasoning applies, as die() exits with 128.

We may want to "#define EXIT_DIE 128" and use it in appropriate
places to make such a reasoning/review easier (possibly an entry for
#leftoverbits)?

> diff --git a/builtin/commit.c b/builtin/commit.c
> index 781af2e206c..0723f06de7a 100644
> --- a/builtin/commit.c
> +++ b/builtin/commit.c
> @@ -339,7 +339,8 @@ static void create_base_index(const struct commit *current_head)
>  	tree = parse_tree_indirect(&current_head->object.oid);
>  	if (!tree)
>  		die(_("failed to unpack HEAD tree object"));
> -	parse_tree(tree);
> +	if (parse_tree(tree) < 0)
> +		exit(128);
>  	init_tree_desc(&t, tree->buffer, tree->size);
>  	if (unpack_trees(1, &t, &opts))
>  		exit(128); /* We've already reported the error, finish dying */

Ditto.

> diff --git a/builtin/read-tree.c b/builtin/read-tree.c
> index 8196ca9dd85..5923ed36893 100644
> --- a/builtin/read-tree.c
> +++ b/builtin/read-tree.c
> @@ -263,7 +263,8 @@ int cmd_read_tree(int argc, const char **argv, const char *cmd_prefix)
>  	cache_tree_free(&the_index.cache_tree);
>  	for (i = 0; i < nr_trees; i++) {
>  		struct tree *tree = trees[i];
> -		parse_tree(tree);
> +		if (parse_tree(tree) < 0)
> +			return 128;
>  		init_tree_desc(t+i, tree->buffer, tree->size);
>  	}
>  	if (unpack_trees(nr_trees, t, &opts))

Ditto.  After the post-context we also return 128.

> diff --git a/builtin/reset.c b/builtin/reset.c
> index 4b018d20e3b..f030f57f4e9 100644
> --- a/builtin/reset.c
> +++ b/builtin/reset.c
> @@ -119,6 +119,10 @@ static int reset_index(const char *ref, const struct object_id *oid, int reset_t
>  
>  	if (reset_type == MIXED || reset_type == HARD) {
>  		tree = parse_tree_indirect(oid);
> +		if (!tree) {
> +			error(_("unable to read tree %s"), oid_to_hex(oid));
> +			goto out;
> +		}
>  		prime_cache_tree(the_repository, the_repository->index, tree);
>  	}

Good.

> diff --git a/cache-tree.c b/cache-tree.c
> index 641427ed410..c6508b64a5c 100644
> --- a/cache-tree.c
> +++ b/cache-tree.c
> @@ -779,8 +779,8 @@ static void prime_cache_tree_rec(struct repository *r,
>  			struct cache_tree_sub *sub;
>  			struct tree *subtree = lookup_tree(r, &entry.oid);
>  
> -			if (!subtree->object.parsed)
> -				parse_tree(subtree);
> +			if (!subtree->object.parsed && parse_tree(subtree) < 0)
> +				exit(128);
>  			sub = cache_tree_sub(it, entry.path);
>  			sub->cache_tree = cache_tree();

The cache_tree() used to be just an optimization mechanism, but
there is no other way than fully populating it to write a tree
object out of the index, so dying here is the only sensible thing to
do upon unparseable subtree.  Otherwise we would end up silently
writing a bogus result.  Good.

> diff --git a/merge-recursive.c b/merge-recursive.c
> index e3beb0801b1..10d41bfd487 100644
> --- a/merge-recursive.c
> +++ b/merge-recursive.c
> @@ -410,7 +410,8 @@ static inline int merge_detect_rename(struct merge_options *opt)
>  
>  static void init_tree_desc_from_tree(struct tree_desc *desc, struct tree *tree)
>  {
> -	parse_tree(tree);
> +	if (parse_tree(tree) < 0)
> +		exit(128);
>  	init_tree_desc(desc, tree->buffer, tree->size);
>  }

OK.

> diff --git a/merge.c b/merge.c
> index b60925459c2..14a7325859d 100644
> --- a/merge.c
> +++ b/merge.c
> @@ -80,7 +80,10 @@ int checkout_fast_forward(struct repository *r,
>  		return -1;
>  	}
>  	for (i = 0; i < nr_trees; i++) {
> -		parse_tree(trees[i]);
> +		if (parse_tree(trees[i]) < 0) {
> +			rollback_lock_file(&lock_file);
> +			return -1;
> +		}
>  		init_tree_desc(t+i, trees[i]->buffer, trees[i]->size);
>  	}

This handles the error in the same way as the other case earlier
where any of the tree-ish failed to load.  OK.

> diff --git a/reset.c b/reset.c
> index 48da0adf851..a93fdbc12e3 100644
> --- a/reset.c
> +++ b/reset.c
> @@ -158,6 +158,11 @@ int reset_head(struct repository *r, const struct reset_head_opts *opts)
>  	}
>  
>  	tree = parse_tree_indirect(oid);
> +	if (!tree) {
> +		ret = error(_("unable to read tree %s"), oid_to_hex(oid));
> +		goto leave_reset_head;
> +	}

OK, but the _("unable to read tree (%s)") comment applies here, too.

> diff --git a/sequencer.c b/sequencer.c
> index d584cac8ed9..407473bab28 100644
> --- a/sequencer.c
> +++ b/sequencer.c
> @@ -715,6 +715,8 @@ static int do_recursive_merge(struct repository *r,
>  	o.show_rename_progress = 1;
>  
>  	head_tree = parse_tree_indirect(head);
> +	if (!head_tree)
> +		return error(_("unable to read tree %s"), oid_to_hex(head));
>  	next_tree = next ? repo_get_commit_tree(r, next) : empty_tree(r);
>  	base_tree = base ? repo_get_commit_tree(r, base) : empty_tree(r);

Ditto.

> @@ -3887,6 +3889,8 @@ static int do_reset(struct repository *r,
>  	}
>  
>  	tree = parse_tree_indirect(&oid);
> +	if (!tree)
> +		return error(_("unable to read tree %s"), oid_to_hex(&oid));

Ditto.

>  	prime_cache_tree(r, r->index, tree);
>  
>  	if (write_locked_index(r->index, &lock, COMMIT_LOCK) < 0)

Looking good, modulo the additional translator burden.

Thanks.

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [PATCH v3 5/5] cache-tree: avoid an unnecessary check
  2024-02-22 14:36     ` [PATCH v3 5/5] cache-tree: avoid an unnecessary check Johannes Schindelin via GitGitGadget
@ 2024-02-22 18:00       ` Junio C Hamano
  0 siblings, 0 replies; 48+ messages in thread
From: Junio C Hamano @ 2024-02-22 18:00 UTC (permalink / raw
  To: Johannes Schindelin via GitGitGadget
  Cc: git, Patrick Steinhardt, Eric Sunshine, Johannes Schindelin

"Johannes Schindelin via GitGitGadget" <gitgitgadget@gmail.com>
writes:

> From: Johannes Schindelin <johannes.schindelin@gmx.de>
>
> The first thing the `parse_tree()` function does is to return early if
> the tree has already been parsed. Therefore we do not need to guard the
> `parse_tree()` call behind a check of that flag.
>
> As of time of writing, there are no other instances of this in Git's
> code bases: whenever the `parsed` flag guards a `parse_tree()` call, it
> guards more than just that call.
>
> Suggested-by: Patrick Steinhardt <ps@pks.im>
> Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
> ---
>  cache-tree.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/cache-tree.c b/cache-tree.c
> index c6508b64a5c..78d6ba92853 100644
> --- a/cache-tree.c
> +++ b/cache-tree.c
> @@ -779,7 +779,7 @@ static void prime_cache_tree_rec(struct repository *r,
>  			struct cache_tree_sub *sub;
>  			struct tree *subtree = lookup_tree(r, &entry.oid);
>  
> -			if (!subtree->object.parsed && parse_tree(subtree) < 0)
> +			if (parse_tree(subtree) < 0)
>  				exit(128);
>  			sub = cache_tree_sub(it, entry.path);
>  			sub->cache_tree = cache_tree();

Obviously makes sense.
I see no need for further comments.  Will queue.

Thanks.


^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [PATCH v3 3/5] t4301: verify that merge-tree fails on missing blob objects
  2024-02-22 17:27       ` Junio C Hamano
@ 2024-02-22 18:23         ` Junio C Hamano
  0 siblings, 0 replies; 48+ messages in thread
From: Junio C Hamano @ 2024-02-22 18:23 UTC (permalink / raw
  To: Johannes Schindelin via GitGitGadget
  Cc: git, Patrick Steinhardt, Eric Sunshine, Johannes Schindelin

Junio C Hamano <gitster@pobox.com> writes:

>> +test_expect_success 'error out on missing blob objects' '
>> +	echo 1 | git hash-object -w --stdin >blob1 &&
>> +	echo 2 | git hash-object -w --stdin >blob2 &&
>> +	echo 3 | git hash-object -w --stdin >blob3 &&
>> +	printf "100644 blob $(cat blob1)\tblob\n" | git mktree >tree1 &&
>> +	printf "100644 blob $(cat blob2)\tblob\n" | git mktree >tree2 &&
>> +	printf "100644 blob $(cat blob3)\tblob\n" | git mktree >tree3 &&
>> +	git init --bare missing-blob.git &&
>> +	cat blob1 blob3 tree1 tree2 tree3 |
>> +	git pack-objects missing-blob.git/objects/pack/side1-whatever-is-missing &&
>> +	test_must_fail git --git-dir=missing-blob.git >actual 2>err \
>> +		merge-tree --merge-base=$(cat tree1) $(cat tree2) $(cat tree3) &&
>> +	test_grep "unable to read blob object $(cat blob2)" err &&
>> +	test_must_be_empty actual
>> +'
>
> It would have been even easier to see that blob2 is what we expect
> to be complained about, if you listed all objects and filtered blob2
> out, but the number of objects involved here is so small, a "cat" of
> all objects we want to keep is OK here.

Just FYI for anybody reading from the sideline.

	git cat-file --unordered \
		--batch-check="%(objectname)" --batch-all-objects |
	grep -v $(cat blob2) |
	git pack-objects ...

would be a compact way to say "Give me a pack that lets me easily
simulate what happens in a repository identical to the current one,
if it lacked object "blob2", without having to enumerate what object
we want to include in the test repository.

Again, I think in this particular case, it is easy enough to see in
the "blob1 blob3 tree1 tree2 tree3" enumeration what is missing, so
the way the patch is written is fine.

Thanks.

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [PATCH v3 4/5] Always check `parse_tree*()`'s return value
  2024-02-22 17:58       ` Junio C Hamano
@ 2024-02-23  8:33         ` Johannes Schindelin
  2024-02-23 17:17           ` Junio C Hamano
  0 siblings, 1 reply; 48+ messages in thread
From: Johannes Schindelin @ 2024-02-23  8:33 UTC (permalink / raw
  To: Junio C Hamano
  Cc: Johannes Schindelin via GitGitGadget, git, Patrick Steinhardt,
	Eric Sunshine

[-- Attachment #1: Type: text/plain, Size: 1679 bytes --]

Hi Junio,

thank you so much for reviewing!

On Thu, 22 Feb 2024, Junio C Hamano wrote:

> "Johannes Schindelin via GitGitGadget" <gitgitgadget@gmail.com>
> writes:
>
> > +	} else {
> >  		new_tree = repo_get_commit_tree(the_repository,
> >  						new_branch_info->commit);
> > +		if (!new_tree)
> > +			return error(_("unable to read tree %s"),
> > +				     oid_to_hex(&new_branch_info->commit->object.oid));
>
> We can help translators by enclosing %s inside a pair of parentheses.
>
>     $ git grep -h 'msgid .*unable to read tree' po | sort | uniq -c
>      18 msgid "unable to read tree (%s)"

Right. I had used

  $ git grep 'unable to read tree .*%s' | sed -n 's/.*_("\([^"]*\).*/\1/p' | sort | uniq -c
       11 unable to read tree %s
        3 unable to read tree (%s)

only to realize that the 11 were the ones I added.🤦 Re-running the same
command on v2.43.0 reports only the 3 parenthesized ones.

I've fixed those error messages in v4, and also added a patch to adjust
the one error message that I imitated (and to mark it for translation).
This patch might be slightly controversial because, in what is probably
only a hypothetical scenario, users might have scripted around `git
bisect` to parse error messages _and_ special-cased the "unable to read
tree" message to be able to deal with the missing tree in a programmatical
manner. I might be misjudging the likelihood for something like that, but
I vividly remember the, ahem, "exciting times" I had after 7560f547e61
(treewide: correct several "up-to-date" to "up to date", 2017-08-23). If
that patch is too concerning, I am open to dropping it.

Ciao,
Johannes

^ permalink raw reply	[flat|nested] 48+ messages in thread

* [PATCH v4 0/6] merge-tree: handle missing objects correctly
  2024-02-22 14:36   ` [PATCH v3 0/5] merge-tree: handle missing objects correctly Johannes Schindelin via GitGitGadget
                       ` (4 preceding siblings ...)
  2024-02-22 14:36     ` [PATCH v3 5/5] cache-tree: avoid an unnecessary check Johannes Schindelin via GitGitGadget
@ 2024-02-23  8:34     ` Johannes Schindelin via GitGitGadget
  2024-02-23  8:34       ` [PATCH v4 1/6] merge-tree: fail with a non-zero exit code on missing tree objects Johannes Schindelin via GitGitGadget
                         ` (6 more replies)
  5 siblings, 7 replies; 48+ messages in thread
From: Johannes Schindelin via GitGitGadget @ 2024-02-23  8:34 UTC (permalink / raw
  To: git; +Cc: Patrick Steinhardt, Eric Sunshine, Johannes Schindelin

I recently looked into issues where git merge-tree calls returned bogus data
(in one instance returning an empty tree for non-empty merge parents). By
the time I had a look at the corresponding repository, the issue was no
longer reproducible, but a closer look at the code combined with some manual
experimenting turned up the fact that missing tree objects aren't handled as
errors by git merge-tree.

While at it, I added a commit on top that tries to catch all remaining
unchecked parse_tree() calls.

This patch series is based on 5f43cf5b2e4 (merge-tree: accept 3 trees as
arguments, 2024-01-28) (the original tip commit of js/merge-tree-3-trees)
because I introduced three unchecked parse_tree() calls in that topic.

Changes since v3:

 * Aligned the translated error messages with pre-existing ones (sorry, I
   forgot to make that change in v2!).
 * Added a new commit at the end to mark the one error message for
   translation which I had imitated, after making it consistent with the
   remaining "unable to read tree" error messages.

Changes since v2:

 * Fixed the new "missing tree object" test case in t4301 that succeeded for
   the wrong reason.
 * Adjusted the new "missing blob object" test case to avoid succeeding for
   the wrong reason.
 * Simplified the "missing blob object" test case.

Changes since v1:

 * Simplified the test case, avoiding a subshell and a pipe in the process.
 * Added a patch to remove a superfluous subtree->object.parsed guard around
   a parse_tree(subtree) call.

Johannes Schindelin (6):
  merge-tree: fail with a non-zero exit code on missing tree objects
  merge-ort: do check `parse_tree()`'s return value
  t4301: verify that merge-tree fails on missing blob objects
  Always check `parse_tree*()`'s return value
  cache-tree: avoid an unnecessary check
  fill_tree_descriptor(): mark error message for translation

 builtin/checkout.c               | 19 ++++++++++++++++---
 builtin/clone.c                  |  3 ++-
 builtin/commit.c                 |  3 ++-
 builtin/merge-tree.c             |  6 ++++++
 builtin/read-tree.c              |  3 ++-
 builtin/reset.c                  |  4 ++++
 cache-tree.c                     |  4 ++--
 merge-ort.c                      | 16 +++++++++++-----
 merge-recursive.c                |  3 ++-
 merge.c                          |  5 ++++-
 reset.c                          |  5 +++++
 sequencer.c                      |  4 ++++
 t/t4301-merge-tree-write-tree.sh | 27 +++++++++++++++++++++++++++
 t/t6030-bisect-porcelain.sh      |  2 +-
 tree-walk.c                      |  2 +-
 15 files changed, 89 insertions(+), 17 deletions(-)


base-commit: 5f43cf5b2e4b68386d3774bce880b0f74d801635
Published-As: https://github.com/gitgitgadget/git/releases/tag/pr-1651%2Fdscho%2Fmerge-tree-and-missing-objects-v4
Fetch-It-Via: git fetch https://github.com/gitgitgadget/git pr-1651/dscho/merge-tree-and-missing-objects-v4
Pull-Request: https://github.com/gitgitgadget/git/pull/1651

Range-diff vs v3:

 1:  11b9cd8c5da = 1:  11b9cd8c5da merge-tree: fail with a non-zero exit code on missing tree objects
 2:  f01f4eb011b = 2:  f01f4eb011b merge-ort: do check `parse_tree()`'s return value
 3:  e82fdf7fbcb = 3:  e82fdf7fbcb t4301: verify that merge-tree fails on missing blob objects
 4:  9e4dc94ef03 ! 4:  5942c27f439 Always check `parse_tree*()`'s return value
     @@ builtin/checkout.c: static int merge_working_tree(const struct checkout_opts *op
       		new_tree = repo_get_commit_tree(the_repository,
       						new_branch_info->commit);
      +		if (!new_tree)
     -+			return error(_("unable to read tree %s"),
     ++			return error(_("unable to read tree (%s)"),
      +				     oid_to_hex(&new_branch_info->commit->object.oid));
      +	}
       	if (opts->discard_changes) {
     @@ builtin/checkout.c: static void setup_new_branch_info_and_source_tree(
       		/* not a commit */
       		*source_tree = parse_tree_indirect(rev);
      +		if (!*source_tree)
     -+			die(_("unable to read tree %s"), oid_to_hex(rev));
     ++			die(_("unable to read tree (%s)"), oid_to_hex(rev));
       	} else {
       		parse_commit_or_die(new_branch_info->commit);
       		*source_tree = repo_get_commit_tree(the_repository,
       						    new_branch_info->commit);
      +		if (!*source_tree)
     -+			die(_("unable to read tree %s"),
     ++			die(_("unable to read tree (%s)"),
      +			    oid_to_hex(&new_branch_info->commit->object.oid));
       	}
       }
     @@ builtin/merge-tree.c: static int real_merge(struct merge_tree_options *o,
       			die(_("could not parse as tree '%s'"), merge_base);
       		base_tree = parse_tree_indirect(&base_oid);
      +		if (!base_tree)
     -+			die(_("unable to read tree %s"), oid_to_hex(&base_oid));
     ++			die(_("unable to read tree (%s)"), oid_to_hex(&base_oid));
       		if (repo_get_oid_treeish(the_repository, branch1, &head_oid))
       			die(_("could not parse as tree '%s'"), branch1);
       		parent1_tree = parse_tree_indirect(&head_oid);
      +		if (!parent1_tree)
     -+			die(_("unable to read tree %s"), oid_to_hex(&head_oid));
     ++			die(_("unable to read tree (%s)"), oid_to_hex(&head_oid));
       		if (repo_get_oid_treeish(the_repository, branch2, &merge_oid))
       			die(_("could not parse as tree '%s'"), branch2);
       		parent2_tree = parse_tree_indirect(&merge_oid);
      +		if (!parent2_tree)
     -+			die(_("unable to read tree %s"), oid_to_hex(&merge_oid));
     ++			die(_("unable to read tree (%s)"), oid_to_hex(&merge_oid));
       
       		opt.ancestor = merge_base;
       		merge_incore_nonrecursive(&opt, base_tree, parent1_tree, parent2_tree, &result);
     @@ builtin/reset.c: static int reset_index(const char *ref, const struct object_id
       	if (reset_type == MIXED || reset_type == HARD) {
       		tree = parse_tree_indirect(oid);
      +		if (!tree) {
     -+			error(_("unable to read tree %s"), oid_to_hex(oid));
     ++			error(_("unable to read tree (%s)"), oid_to_hex(oid));
      +			goto out;
      +		}
       		prime_cache_tree(the_repository, the_repository->index, tree);
     @@ merge-ort.c: static void merge_ort_nonrecursive_internal(struct merge_options *o
       	if (result->clean >= 0) {
       		result->tree = parse_tree_indirect(&working_tree_oid);
      +		if (!result->tree)
     -+			die(_("unable to read tree %s"),
     ++			die(_("unable to read tree (%s)"),
      +			    oid_to_hex(&working_tree_oid));
       		/* existence of conflicted entries implies unclean */
       		result->clean &= strmap_empty(&opt->priv->conflicted);
     @@ reset.c: int reset_head(struct repository *r, const struct reset_head_opts *opts
       
       	tree = parse_tree_indirect(oid);
      +	if (!tree) {
     -+		ret = error(_("unable to read tree %s"), oid_to_hex(oid));
     ++		ret = error(_("unable to read tree (%s)"), oid_to_hex(oid));
      +		goto leave_reset_head;
      +	}
      +
     @@ sequencer.c: static int do_recursive_merge(struct repository *r,
       
       	head_tree = parse_tree_indirect(head);
      +	if (!head_tree)
     -+		return error(_("unable to read tree %s"), oid_to_hex(head));
     ++		return error(_("unable to read tree (%s)"), oid_to_hex(head));
       	next_tree = next ? repo_get_commit_tree(r, next) : empty_tree(r);
       	base_tree = base ? repo_get_commit_tree(r, base) : empty_tree(r);
       
     @@ sequencer.c: static int do_reset(struct repository *r,
       
       	tree = parse_tree_indirect(&oid);
      +	if (!tree)
     -+		return error(_("unable to read tree %s"), oid_to_hex(&oid));
     ++		return error(_("unable to read tree (%s)"), oid_to_hex(&oid));
       	prime_cache_tree(r, r->index, tree);
       
       	if (write_locked_index(r->index, &lock, COMMIT_LOCK) < 0)
 5:  91dc4ccd04e = 5:  7e5e84a4e7c cache-tree: avoid an unnecessary check
 -:  ----------- > 6:  ee2fcee5a10 fill_tree_descriptor(): mark error message for translation

-- 
gitgitgadget

^ permalink raw reply	[flat|nested] 48+ messages in thread

* [PATCH v4 1/6] merge-tree: fail with a non-zero exit code on missing tree objects
  2024-02-23  8:34     ` [PATCH v4 0/6] merge-tree: handle missing objects correctly Johannes Schindelin via GitGitGadget
@ 2024-02-23  8:34       ` Johannes Schindelin via GitGitGadget
  2024-02-23  8:34       ` [PATCH v4 2/6] merge-ort: do check `parse_tree()`'s return value Johannes Schindelin via GitGitGadget
                         ` (5 subsequent siblings)
  6 siblings, 0 replies; 48+ messages in thread
From: Johannes Schindelin via GitGitGadget @ 2024-02-23  8:34 UTC (permalink / raw
  To: git
  Cc: Patrick Steinhardt, Eric Sunshine, Johannes Schindelin,
	Johannes Schindelin

From: Johannes Schindelin <johannes.schindelin@gmx.de>

When `git merge-tree` encounters a missing tree object, it should error
out and not continue quietly as if nothing had happened.

However, as of time of writing, `git merge-tree` _does_ continue, and
then offers the empty tree as result.

Let's fix this.

Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
---
 merge-ort.c                      |  7 ++++---
 t/t4301-merge-tree-write-tree.sh | 11 +++++++++++
 2 files changed, 15 insertions(+), 3 deletions(-)

diff --git a/merge-ort.c b/merge-ort.c
index 6491070d965..c37fc035f13 100644
--- a/merge-ort.c
+++ b/merge-ort.c
@@ -1659,9 +1659,10 @@ static int collect_merge_info(struct merge_options *opt,
 	info.data = opt;
 	info.show_all_errors = 1;
 
-	parse_tree(merge_base);
-	parse_tree(side1);
-	parse_tree(side2);
+	if (parse_tree(merge_base) < 0 ||
+	    parse_tree(side1) < 0 ||
+	    parse_tree(side2) < 0)
+		return -1;
 	init_tree_desc(t + 0, merge_base->buffer, merge_base->size);
 	init_tree_desc(t + 1, side1->buffer, side1->size);
 	init_tree_desc(t + 2, side2->buffer, side2->size);
diff --git a/t/t4301-merge-tree-write-tree.sh b/t/t4301-merge-tree-write-tree.sh
index 7d0fa74da74..908c9b540c8 100755
--- a/t/t4301-merge-tree-write-tree.sh
+++ b/t/t4301-merge-tree-write-tree.sh
@@ -951,4 +951,15 @@ test_expect_success '--merge-base with tree OIDs' '
 	test_cmp with-commits with-trees
 '
 
+test_expect_success 'error out on missing tree objects' '
+	git init --bare missing-tree.git &&
+	git rev-list side3 >list &&
+	git rev-parse side3^: >>list &&
+	git pack-objects missing-tree.git/objects/pack/side3-tree-is-missing <list &&
+	side3=$(git rev-parse side3) &&
+	test_must_fail git --git-dir=missing-tree.git merge-tree $side3^ $side3 >actual 2>err &&
+	test_grep "Could not read $(git rev-parse $side3:)" err &&
+	test_must_be_empty actual
+'
+
 test_done
-- 
gitgitgadget


^ permalink raw reply related	[flat|nested] 48+ messages in thread

* [PATCH v4 2/6] merge-ort: do check `parse_tree()`'s return value
  2024-02-23  8:34     ` [PATCH v4 0/6] merge-tree: handle missing objects correctly Johannes Schindelin via GitGitGadget
  2024-02-23  8:34       ` [PATCH v4 1/6] merge-tree: fail with a non-zero exit code on missing tree objects Johannes Schindelin via GitGitGadget
@ 2024-02-23  8:34       ` Johannes Schindelin via GitGitGadget
  2024-02-23  8:34       ` [PATCH v4 3/6] t4301: verify that merge-tree fails on missing blob objects Johannes Schindelin via GitGitGadget
                         ` (4 subsequent siblings)
  6 siblings, 0 replies; 48+ messages in thread
From: Johannes Schindelin via GitGitGadget @ 2024-02-23  8:34 UTC (permalink / raw
  To: git
  Cc: Patrick Steinhardt, Eric Sunshine, Johannes Schindelin,
	Johannes Schindelin

From: Johannes Schindelin <johannes.schindelin@gmx.de>

The previous commit fixed a bug where a missing tree was reported, but
not treated as an error.

This patch addresses the same issue for the remaining two callers of
`parse_tree()`.

This change is not accompanied by a regression test because the code in
question is only reached at the `checkout` stage, i.e. after the merge
has happened (and therefore the tree objects could only be missing if
the disk had gone bad in that short time window, or something similarly
tricky to recreate in the test suite).

Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
---
 merge-ort.c | 6 ++++--
 1 file changed, 4 insertions(+), 2 deletions(-)

diff --git a/merge-ort.c b/merge-ort.c
index c37fc035f13..79d9e18f63d 100644
--- a/merge-ort.c
+++ b/merge-ort.c
@@ -4379,9 +4379,11 @@ static int checkout(struct merge_options *opt,
 	unpack_opts.verbose_update = (opt->verbosity > 2);
 	unpack_opts.fn = twoway_merge;
 	unpack_opts.preserve_ignored = 0; /* FIXME: !opts->overwrite_ignore */
-	parse_tree(prev);
+	if (parse_tree(prev) < 0)
+		return -1;
 	init_tree_desc(&trees[0], prev->buffer, prev->size);
-	parse_tree(next);
+	if (parse_tree(next) < 0)
+		return -1;
 	init_tree_desc(&trees[1], next->buffer, next->size);
 
 	ret = unpack_trees(2, trees, &unpack_opts);
-- 
gitgitgadget


^ permalink raw reply related	[flat|nested] 48+ messages in thread

* [PATCH v4 3/6] t4301: verify that merge-tree fails on missing blob objects
  2024-02-23  8:34     ` [PATCH v4 0/6] merge-tree: handle missing objects correctly Johannes Schindelin via GitGitGadget
  2024-02-23  8:34       ` [PATCH v4 1/6] merge-tree: fail with a non-zero exit code on missing tree objects Johannes Schindelin via GitGitGadget
  2024-02-23  8:34       ` [PATCH v4 2/6] merge-ort: do check `parse_tree()`'s return value Johannes Schindelin via GitGitGadget
@ 2024-02-23  8:34       ` Johannes Schindelin via GitGitGadget
  2024-02-23  8:34       ` [PATCH v4 4/6] Always check `parse_tree*()`'s return value Johannes Schindelin via GitGitGadget
                         ` (3 subsequent siblings)
  6 siblings, 0 replies; 48+ messages in thread
From: Johannes Schindelin via GitGitGadget @ 2024-02-23  8:34 UTC (permalink / raw
  To: git
  Cc: Patrick Steinhardt, Eric Sunshine, Johannes Schindelin,
	Johannes Schindelin

From: Johannes Schindelin <johannes.schindelin@gmx.de>

We just fixed a problem where `merge-tree` would not fail on missing
tree objects. Let's ensure that that problem does not occur with blob
objects (and won't, in the future, either).

Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
---
 t/t4301-merge-tree-write-tree.sh | 16 ++++++++++++++++
 1 file changed, 16 insertions(+)

diff --git a/t/t4301-merge-tree-write-tree.sh b/t/t4301-merge-tree-write-tree.sh
index 908c9b540c8..d4463a45706 100755
--- a/t/t4301-merge-tree-write-tree.sh
+++ b/t/t4301-merge-tree-write-tree.sh
@@ -962,4 +962,20 @@ test_expect_success 'error out on missing tree objects' '
 	test_must_be_empty actual
 '
 
+test_expect_success 'error out on missing blob objects' '
+	echo 1 | git hash-object -w --stdin >blob1 &&
+	echo 2 | git hash-object -w --stdin >blob2 &&
+	echo 3 | git hash-object -w --stdin >blob3 &&
+	printf "100644 blob $(cat blob1)\tblob\n" | git mktree >tree1 &&
+	printf "100644 blob $(cat blob2)\tblob\n" | git mktree >tree2 &&
+	printf "100644 blob $(cat blob3)\tblob\n" | git mktree >tree3 &&
+	git init --bare missing-blob.git &&
+	cat blob1 blob3 tree1 tree2 tree3 |
+	git pack-objects missing-blob.git/objects/pack/side1-whatever-is-missing &&
+	test_must_fail git --git-dir=missing-blob.git >actual 2>err \
+		merge-tree --merge-base=$(cat tree1) $(cat tree2) $(cat tree3) &&
+	test_grep "unable to read blob object $(cat blob2)" err &&
+	test_must_be_empty actual
+'
+
 test_done
-- 
gitgitgadget


^ permalink raw reply related	[flat|nested] 48+ messages in thread

* [PATCH v4 4/6] Always check `parse_tree*()`'s return value
  2024-02-23  8:34     ` [PATCH v4 0/6] merge-tree: handle missing objects correctly Johannes Schindelin via GitGitGadget
                         ` (2 preceding siblings ...)
  2024-02-23  8:34       ` [PATCH v4 3/6] t4301: verify that merge-tree fails on missing blob objects Johannes Schindelin via GitGitGadget
@ 2024-02-23  8:34       ` Johannes Schindelin via GitGitGadget
  2024-02-23  8:34       ` [PATCH v4 5/6] cache-tree: avoid an unnecessary check Johannes Schindelin via GitGitGadget
                         ` (2 subsequent siblings)
  6 siblings, 0 replies; 48+ messages in thread
From: Johannes Schindelin via GitGitGadget @ 2024-02-23  8:34 UTC (permalink / raw
  To: git
  Cc: Patrick Steinhardt, Eric Sunshine, Johannes Schindelin,
	Johannes Schindelin

From: Johannes Schindelin <johannes.schindelin@gmx.de>

Otherwise we may easily run into serious crashes: For example, if we run
`init_tree_desc()` directly after a failed `parse_tree()`, we are
accessing uninitialized data or trying to dereference `NULL`.

Note that the `parse_tree()` function already takes care of showing an
error message. The `parse_tree_indirectly()` and
`repo_get_commit_tree()` functions do not, therefore those latter call
sites need to show a useful error message while the former do not.

Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
---
 builtin/checkout.c   | 19 ++++++++++++++++---
 builtin/clone.c      |  3 ++-
 builtin/commit.c     |  3 ++-
 builtin/merge-tree.c |  6 ++++++
 builtin/read-tree.c  |  3 ++-
 builtin/reset.c      |  4 ++++
 cache-tree.c         |  4 ++--
 merge-ort.c          |  3 +++
 merge-recursive.c    |  3 ++-
 merge.c              |  5 ++++-
 reset.c              |  5 +++++
 sequencer.c          |  4 ++++
 12 files changed, 52 insertions(+), 10 deletions(-)

diff --git a/builtin/checkout.c b/builtin/checkout.c
index f02434bc155..9ab0901d629 100644
--- a/builtin/checkout.c
+++ b/builtin/checkout.c
@@ -707,7 +707,8 @@ static int reset_tree(struct tree *tree, const struct checkout_opts *o,
 	init_checkout_metadata(&opts.meta, info->refname,
 			       info->commit ? &info->commit->object.oid : null_oid(),
 			       NULL);
-	parse_tree(tree);
+	if (parse_tree(tree) < 0)
+		return 128;
 	init_tree_desc(&tree_desc, tree->buffer, tree->size);
 	switch (unpack_trees(1, &tree_desc, &opts)) {
 	case -2:
@@ -786,9 +787,15 @@ static int merge_working_tree(const struct checkout_opts *opts,
 		if (new_branch_info->commit)
 			BUG("'switch --orphan' should never accept a commit as starting point");
 		new_tree = parse_tree_indirect(the_hash_algo->empty_tree);
-	} else
+		if (!new_tree)
+			BUG("unable to read empty tree");
+	} else {
 		new_tree = repo_get_commit_tree(the_repository,
 						new_branch_info->commit);
+		if (!new_tree)
+			return error(_("unable to read tree (%s)"),
+				     oid_to_hex(&new_branch_info->commit->object.oid));
+	}
 	if (opts->discard_changes) {
 		ret = reset_tree(new_tree, opts, 1, writeout_error, new_branch_info);
 		if (ret)
@@ -823,7 +830,8 @@ static int merge_working_tree(const struct checkout_opts *opts,
 				oid_to_hex(old_commit_oid));
 
 		init_tree_desc(&trees[0], tree->buffer, tree->size);
-		parse_tree(new_tree);
+		if (parse_tree(new_tree) < 0)
+			exit(128);
 		tree = new_tree;
 		init_tree_desc(&trees[1], tree->buffer, tree->size);
 
@@ -1239,10 +1247,15 @@ static void setup_new_branch_info_and_source_tree(
 	if (!new_branch_info->commit) {
 		/* not a commit */
 		*source_tree = parse_tree_indirect(rev);
+		if (!*source_tree)
+			die(_("unable to read tree (%s)"), oid_to_hex(rev));
 	} else {
 		parse_commit_or_die(new_branch_info->commit);
 		*source_tree = repo_get_commit_tree(the_repository,
 						    new_branch_info->commit);
+		if (!*source_tree)
+			die(_("unable to read tree (%s)"),
+			    oid_to_hex(&new_branch_info->commit->object.oid));
 	}
 }
 
diff --git a/builtin/clone.c b/builtin/clone.c
index c6357af9498..4410b55be98 100644
--- a/builtin/clone.c
+++ b/builtin/clone.c
@@ -736,7 +736,8 @@ static int checkout(int submodule_progress, int filter_submodules)
 	tree = parse_tree_indirect(&oid);
 	if (!tree)
 		die(_("unable to parse commit %s"), oid_to_hex(&oid));
-	parse_tree(tree);
+	if (parse_tree(tree) < 0)
+		exit(128);
 	init_tree_desc(&t, tree->buffer, tree->size);
 	if (unpack_trees(1, &t, &opts) < 0)
 		die(_("unable to checkout working tree"));
diff --git a/builtin/commit.c b/builtin/commit.c
index 781af2e206c..0723f06de7a 100644
--- a/builtin/commit.c
+++ b/builtin/commit.c
@@ -339,7 +339,8 @@ static void create_base_index(const struct commit *current_head)
 	tree = parse_tree_indirect(&current_head->object.oid);
 	if (!tree)
 		die(_("failed to unpack HEAD tree object"));
-	parse_tree(tree);
+	if (parse_tree(tree) < 0)
+		exit(128);
 	init_tree_desc(&t, tree->buffer, tree->size);
 	if (unpack_trees(1, &t, &opts))
 		exit(128); /* We've already reported the error, finish dying */
diff --git a/builtin/merge-tree.c b/builtin/merge-tree.c
index 2d4ce5b3886..3492a575a6c 100644
--- a/builtin/merge-tree.c
+++ b/builtin/merge-tree.c
@@ -447,12 +447,18 @@ static int real_merge(struct merge_tree_options *o,
 		if (repo_get_oid_treeish(the_repository, merge_base, &base_oid))
 			die(_("could not parse as tree '%s'"), merge_base);
 		base_tree = parse_tree_indirect(&base_oid);
+		if (!base_tree)
+			die(_("unable to read tree (%s)"), oid_to_hex(&base_oid));
 		if (repo_get_oid_treeish(the_repository, branch1, &head_oid))
 			die(_("could not parse as tree '%s'"), branch1);
 		parent1_tree = parse_tree_indirect(&head_oid);
+		if (!parent1_tree)
+			die(_("unable to read tree (%s)"), oid_to_hex(&head_oid));
 		if (repo_get_oid_treeish(the_repository, branch2, &merge_oid))
 			die(_("could not parse as tree '%s'"), branch2);
 		parent2_tree = parse_tree_indirect(&merge_oid);
+		if (!parent2_tree)
+			die(_("unable to read tree (%s)"), oid_to_hex(&merge_oid));
 
 		opt.ancestor = merge_base;
 		merge_incore_nonrecursive(&opt, base_tree, parent1_tree, parent2_tree, &result);
diff --git a/builtin/read-tree.c b/builtin/read-tree.c
index 8196ca9dd85..5923ed36893 100644
--- a/builtin/read-tree.c
+++ b/builtin/read-tree.c
@@ -263,7 +263,8 @@ int cmd_read_tree(int argc, const char **argv, const char *cmd_prefix)
 	cache_tree_free(&the_index.cache_tree);
 	for (i = 0; i < nr_trees; i++) {
 		struct tree *tree = trees[i];
-		parse_tree(tree);
+		if (parse_tree(tree) < 0)
+			return 128;
 		init_tree_desc(t+i, tree->buffer, tree->size);
 	}
 	if (unpack_trees(nr_trees, t, &opts))
diff --git a/builtin/reset.c b/builtin/reset.c
index 4b018d20e3b..fd36fc5bd95 100644
--- a/builtin/reset.c
+++ b/builtin/reset.c
@@ -119,6 +119,10 @@ static int reset_index(const char *ref, const struct object_id *oid, int reset_t
 
 	if (reset_type == MIXED || reset_type == HARD) {
 		tree = parse_tree_indirect(oid);
+		if (!tree) {
+			error(_("unable to read tree (%s)"), oid_to_hex(oid));
+			goto out;
+		}
 		prime_cache_tree(the_repository, the_repository->index, tree);
 	}
 
diff --git a/cache-tree.c b/cache-tree.c
index 641427ed410..c6508b64a5c 100644
--- a/cache-tree.c
+++ b/cache-tree.c
@@ -779,8 +779,8 @@ static void prime_cache_tree_rec(struct repository *r,
 			struct cache_tree_sub *sub;
 			struct tree *subtree = lookup_tree(r, &entry.oid);
 
-			if (!subtree->object.parsed)
-				parse_tree(subtree);
+			if (!subtree->object.parsed && parse_tree(subtree) < 0)
+				exit(128);
 			sub = cache_tree_sub(it, entry.path);
 			sub->cache_tree = cache_tree();
 
diff --git a/merge-ort.c b/merge-ort.c
index 79d9e18f63d..910ba38ff05 100644
--- a/merge-ort.c
+++ b/merge-ort.c
@@ -4983,6 +4983,9 @@ static void merge_ort_nonrecursive_internal(struct merge_options *opt,
 
 	if (result->clean >= 0) {
 		result->tree = parse_tree_indirect(&working_tree_oid);
+		if (!result->tree)
+			die(_("unable to read tree (%s)"),
+			    oid_to_hex(&working_tree_oid));
 		/* existence of conflicted entries implies unclean */
 		result->clean &= strmap_empty(&opt->priv->conflicted);
 	}
diff --git a/merge-recursive.c b/merge-recursive.c
index e3beb0801b1..10d41bfd487 100644
--- a/merge-recursive.c
+++ b/merge-recursive.c
@@ -410,7 +410,8 @@ static inline int merge_detect_rename(struct merge_options *opt)
 
 static void init_tree_desc_from_tree(struct tree_desc *desc, struct tree *tree)
 {
-	parse_tree(tree);
+	if (parse_tree(tree) < 0)
+		exit(128);
 	init_tree_desc(desc, tree->buffer, tree->size);
 }
 
diff --git a/merge.c b/merge.c
index b60925459c2..14a7325859d 100644
--- a/merge.c
+++ b/merge.c
@@ -80,7 +80,10 @@ int checkout_fast_forward(struct repository *r,
 		return -1;
 	}
 	for (i = 0; i < nr_trees; i++) {
-		parse_tree(trees[i]);
+		if (parse_tree(trees[i]) < 0) {
+			rollback_lock_file(&lock_file);
+			return -1;
+		}
 		init_tree_desc(t+i, trees[i]->buffer, trees[i]->size);
 	}
 
diff --git a/reset.c b/reset.c
index 48da0adf851..080bcb6d656 100644
--- a/reset.c
+++ b/reset.c
@@ -158,6 +158,11 @@ int reset_head(struct repository *r, const struct reset_head_opts *opts)
 	}
 
 	tree = parse_tree_indirect(oid);
+	if (!tree) {
+		ret = error(_("unable to read tree (%s)"), oid_to_hex(oid));
+		goto leave_reset_head;
+	}
+
 	prime_cache_tree(r, r->index, tree);
 
 	if (write_locked_index(r->index, &lock, COMMIT_LOCK) < 0) {
diff --git a/sequencer.c b/sequencer.c
index d584cac8ed9..33d12b2ffd1 100644
--- a/sequencer.c
+++ b/sequencer.c
@@ -715,6 +715,8 @@ static int do_recursive_merge(struct repository *r,
 	o.show_rename_progress = 1;
 
 	head_tree = parse_tree_indirect(head);
+	if (!head_tree)
+		return error(_("unable to read tree (%s)"), oid_to_hex(head));
 	next_tree = next ? repo_get_commit_tree(r, next) : empty_tree(r);
 	base_tree = base ? repo_get_commit_tree(r, base) : empty_tree(r);
 
@@ -3887,6 +3889,8 @@ static int do_reset(struct repository *r,
 	}
 
 	tree = parse_tree_indirect(&oid);
+	if (!tree)
+		return error(_("unable to read tree (%s)"), oid_to_hex(&oid));
 	prime_cache_tree(r, r->index, tree);
 
 	if (write_locked_index(r->index, &lock, COMMIT_LOCK) < 0)
-- 
gitgitgadget


^ permalink raw reply related	[flat|nested] 48+ messages in thread

* [PATCH v4 5/6] cache-tree: avoid an unnecessary check
  2024-02-23  8:34     ` [PATCH v4 0/6] merge-tree: handle missing objects correctly Johannes Schindelin via GitGitGadget
                         ` (3 preceding siblings ...)
  2024-02-23  8:34       ` [PATCH v4 4/6] Always check `parse_tree*()`'s return value Johannes Schindelin via GitGitGadget
@ 2024-02-23  8:34       ` Johannes Schindelin via GitGitGadget
  2024-02-23  8:34       ` [PATCH v4 6/6] fill_tree_descriptor(): mark error message for translation Johannes Schindelin via GitGitGadget
  2024-02-23 18:23       ` [PATCH v4 0/6] merge-tree: handle missing objects correctly Junio C Hamano
  6 siblings, 0 replies; 48+ messages in thread
From: Johannes Schindelin via GitGitGadget @ 2024-02-23  8:34 UTC (permalink / raw
  To: git
  Cc: Patrick Steinhardt, Eric Sunshine, Johannes Schindelin,
	Johannes Schindelin

From: Johannes Schindelin <johannes.schindelin@gmx.de>

The first thing the `parse_tree()` function does is to return early if
the tree has already been parsed. Therefore we do not need to guard the
`parse_tree()` call behind a check of that flag.

As of time of writing, there are no other instances of this in Git's
code bases: whenever the `parsed` flag guards a `parse_tree()` call, it
guards more than just that call.

Suggested-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
---
 cache-tree.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/cache-tree.c b/cache-tree.c
index c6508b64a5c..78d6ba92853 100644
--- a/cache-tree.c
+++ b/cache-tree.c
@@ -779,7 +779,7 @@ static void prime_cache_tree_rec(struct repository *r,
 			struct cache_tree_sub *sub;
 			struct tree *subtree = lookup_tree(r, &entry.oid);
 
-			if (!subtree->object.parsed && parse_tree(subtree) < 0)
+			if (parse_tree(subtree) < 0)
 				exit(128);
 			sub = cache_tree_sub(it, entry.path);
 			sub->cache_tree = cache_tree();
-- 
gitgitgadget


^ permalink raw reply related	[flat|nested] 48+ messages in thread

* [PATCH v4 6/6] fill_tree_descriptor(): mark error message for translation
  2024-02-23  8:34     ` [PATCH v4 0/6] merge-tree: handle missing objects correctly Johannes Schindelin via GitGitGadget
                         ` (4 preceding siblings ...)
  2024-02-23  8:34       ` [PATCH v4 5/6] cache-tree: avoid an unnecessary check Johannes Schindelin via GitGitGadget
@ 2024-02-23  8:34       ` Johannes Schindelin via GitGitGadget
  2024-02-23 18:23       ` [PATCH v4 0/6] merge-tree: handle missing objects correctly Junio C Hamano
  6 siblings, 0 replies; 48+ messages in thread
From: Johannes Schindelin via GitGitGadget @ 2024-02-23  8:34 UTC (permalink / raw
  To: git
  Cc: Patrick Steinhardt, Eric Sunshine, Johannes Schindelin,
	Johannes Schindelin

From: Johannes Schindelin <johannes.schindelin@gmx.de>

There is an error message in that function to report a missing tree; In
contrast to three other, similar error messages, it is not marked for
translation yet.

Mark it for translation, and while at it, make the error message
consistent with the others by enclosing the SHA in parentheses.

This requires a change to t6030 which expects the previous format of the
commit message. Theoretically, this could present problems with existing
scripts that use `git bisect` and parse its output (because Git does not
provide other means for callers to discern between error conditions).
However, this is unlikely to matter in practice because the most common
course of action to deal with fatal corruptions is to report the error
message to the user and exit, rather than trying to do something with
the reported SHA of the missing tree.

Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
---
 t/t6030-bisect-porcelain.sh | 2 +-
 tree-walk.c                 | 2 +-
 2 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/t/t6030-bisect-porcelain.sh b/t/t6030-bisect-porcelain.sh
index 2a5b7d8379c..58f3d9c675e 100755
--- a/t/t6030-bisect-porcelain.sh
+++ b/t/t6030-bisect-porcelain.sh
@@ -872,7 +872,7 @@ test_expect_success 'broken branch creation' '
 
 echo "" > expected.ok
 cat > expected.missing-tree.default <<EOF
-fatal: unable to read tree $deleted
+fatal: unable to read tree ($deleted)
 EOF
 
 test_expect_success 'bisect fails if tree is broken on start commit' '
diff --git a/tree-walk.c b/tree-walk.c
index b517792ba23..690fa6569bd 100644
--- a/tree-walk.c
+++ b/tree-walk.c
@@ -100,7 +100,7 @@ void *fill_tree_descriptor(struct repository *r,
 	if (oid) {
 		buf = read_object_with_reference(r, oid, OBJ_TREE, &size, NULL);
 		if (!buf)
-			die("unable to read tree %s", oid_to_hex(oid));
+			die(_("unable to read tree (%s)"), oid_to_hex(oid));
 	}
 	init_tree_desc(desc, buf, size);
 	return buf;
-- 
gitgitgadget

^ permalink raw reply related	[flat|nested] 48+ messages in thread

* Re: [PATCH v3 4/5] Always check `parse_tree*()`'s return value
  2024-02-23  8:33         ` Johannes Schindelin
@ 2024-02-23 17:17           ` Junio C Hamano
  0 siblings, 0 replies; 48+ messages in thread
From: Junio C Hamano @ 2024-02-23 17:17 UTC (permalink / raw
  To: Johannes Schindelin
  Cc: Johannes Schindelin via GitGitGadget, git, Patrick Steinhardt,
	Eric Sunshine

Johannes Schindelin <Johannes.Schindelin@gmx.de> writes:

> Right. I had used
>
>   $ git grep 'unable to read tree .*%s' | sed -n 's/.*_("\([^"]*\).*/\1/p' | sort | uniq -c
>        11 unable to read tree %s
>         3 unable to read tree (%s)
>
> only to realize that the 11 were the ones I added.🤦 Re-running the same
> command on v2.43.0 reports only the 3 parenthesized ones.

I think we probably should discuss what format is the easiest to
understand and most logical to readers, and unify to such a format,
not necessarily the current majority, in the longer term.  But for
now, let's pick one that costs less to unify to.

We may also want to cast a wider net to make things consistent.  For
example, we learn that the parenthesized one is not necessarily more
prevalent in a larger picture.

Let me annotate the output from this command:

    $ git grep -h -E \
      -e '(unable to|not) (read|find|acccess) (blob|tree|commit|tag) .*%s' master po/ |
      sort -u

1. msgid "cannot find commit %s (%s)"

   The first one is a textual refname, the next one is oid-to-hex.

2. msgid "cannot read blob %s for path %s"

   The first one is oid-to-hex, the other is a pathname.

3. msgid "could not find commit %s"

   oid-to-hex (in commit-graph.c)

4. msgid "could not read commit message of %s"

   oid-to-hex (in sequencer.c)

5. msgid "could not read commit message: %s"

   This is irrelevant to the topic, as %s is for strerror().

6. msgid "unable to access commit %s"

   oid-to-hex (in builtin/pull.c)

7. msgid "unable to read commit message from '%s'"

   This message is in po/ but it seems that it no longer is used
   anywhere in the source.

8. msgid "unable to read tree %s"
9. msgid "unable to read tree (%s)"

   We know about these two already.

We seem to just use unadorned %s for many messages when talking
about commit objects, some are inside a pair of 'single quotes',
When we are giving a long hexadecimal string, especially without
doing any abbreviation, I personally think it is a waste to enclose
it in any punctuation pair, so if I were to vote today, I would
probably support standardizing on "tree %s", "blob %s", etc., but I
think that is just a personal preference (not even a "taste" thing).

In any case, all of the above is clearly outside the scope of this
series.

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [PATCH v4 0/6] merge-tree: handle missing objects correctly
  2024-02-23  8:34     ` [PATCH v4 0/6] merge-tree: handle missing objects correctly Johannes Schindelin via GitGitGadget
                         ` (5 preceding siblings ...)
  2024-02-23  8:34       ` [PATCH v4 6/6] fill_tree_descriptor(): mark error message for translation Johannes Schindelin via GitGitGadget
@ 2024-02-23 18:23       ` Junio C Hamano
  6 siblings, 0 replies; 48+ messages in thread
From: Junio C Hamano @ 2024-02-23 18:23 UTC (permalink / raw
  To: Johannes Schindelin via GitGitGadget
  Cc: git, Patrick Steinhardt, Eric Sunshine, Johannes Schindelin

"Johannes Schindelin via GitGitGadget" <gitgitgadget@gmail.com>
writes:

> I recently looked into issues where git merge-tree calls returned bogus data
> (in one instance returning an empty tree for non-empty merge parents). By
> the time I had a look at the corresponding repository, the issue was no
> longer reproducible, but a closer look at the code combined with some manual
> experimenting turned up the fact that missing tree objects aren't handled as
> errors by git merge-tree.

Looking good.  Will replace.  Thanks.


^ permalink raw reply	[flat|nested] 48+ messages in thread

end of thread, other threads:[~2024-02-23 18:23 UTC | newest]

Thread overview: 48+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2024-02-06  9:49 [PATCH 0/4] merge-tree: handle missing objects correctly Johannes Schindelin via GitGitGadget
2024-02-06  9:49 ` [PATCH 1/4] merge-tree: fail with a non-zero exit code on missing tree objects Johannes Schindelin via GitGitGadget
2024-02-07  7:42   ` Patrick Steinhardt
2024-02-06  9:49 ` [PATCH 2/4] merge-ort: do check `parse_tree()`'s return value Johannes Schindelin via GitGitGadget
2024-02-06  9:49 ` [PATCH 3/4] t4301: verify that merge-tree fails on missing blob objects Johannes Schindelin via GitGitGadget
2024-02-06  9:49 ` [PATCH 4/4] Always check `parse_tree*()`'s return value Johannes Schindelin via GitGitGadget
2024-02-06 22:07   ` Junio C Hamano
2024-02-07  7:42   ` Patrick Steinhardt
2024-02-06 21:12 ` [PATCH 0/4] merge-tree: handle missing objects correctly Junio C Hamano
2024-02-07  7:42 ` Patrick Steinhardt
2024-02-07 16:47 ` [PATCH v2 0/5] " Johannes Schindelin via GitGitGadget
2024-02-07 16:47   ` [PATCH v2 1/5] merge-tree: fail with a non-zero exit code on missing tree objects Johannes Schindelin via GitGitGadget
2024-02-07 17:02     ` Eric Sunshine
2024-02-22 14:09       ` Johannes Schindelin
2024-02-07 16:47   ` [PATCH v2 2/5] merge-ort: do check `parse_tree()`'s return value Johannes Schindelin via GitGitGadget
2024-02-07 16:47   ` [PATCH v2 3/5] t4301: verify that merge-tree fails on missing blob objects Johannes Schindelin via GitGitGadget
2024-02-07 17:24     ` Eric Sunshine
2024-02-07 21:24       ` Junio C Hamano
2024-02-08  0:52       ` Eric Sunshine
2024-02-22 14:12       ` Johannes Schindelin
2024-02-07 16:47   ` [PATCH v2 4/5] Always check `parse_tree*()`'s return value Johannes Schindelin via GitGitGadget
2024-02-07 17:26     ` Eric Sunshine
2024-02-22 14:08       ` Johannes Schindelin
2024-02-22 17:05         ` Junio C Hamano
2024-02-07 16:47   ` [PATCH v2 5/5] cache-tree: avoid an unnecessary check Johannes Schindelin via GitGitGadget
2024-02-07 16:58     ` Junio C Hamano
2024-02-22 14:36   ` [PATCH v3 0/5] merge-tree: handle missing objects correctly Johannes Schindelin via GitGitGadget
2024-02-22 14:36     ` [PATCH v3 1/5] merge-tree: fail with a non-zero exit code on missing tree objects Johannes Schindelin via GitGitGadget
2024-02-22 17:13       ` Junio C Hamano
2024-02-22 14:36     ` [PATCH v3 2/5] merge-ort: do check `parse_tree()`'s return value Johannes Schindelin via GitGitGadget
2024-02-22 17:18       ` Junio C Hamano
2024-02-22 14:36     ` [PATCH v3 3/5] t4301: verify that merge-tree fails on missing blob objects Johannes Schindelin via GitGitGadget
2024-02-22 17:27       ` Junio C Hamano
2024-02-22 18:23         ` Junio C Hamano
2024-02-22 14:36     ` [PATCH v3 4/5] Always check `parse_tree*()`'s return value Johannes Schindelin via GitGitGadget
2024-02-22 17:58       ` Junio C Hamano
2024-02-23  8:33         ` Johannes Schindelin
2024-02-23 17:17           ` Junio C Hamano
2024-02-22 14:36     ` [PATCH v3 5/5] cache-tree: avoid an unnecessary check Johannes Schindelin via GitGitGadget
2024-02-22 18:00       ` Junio C Hamano
2024-02-23  8:34     ` [PATCH v4 0/6] merge-tree: handle missing objects correctly Johannes Schindelin via GitGitGadget
2024-02-23  8:34       ` [PATCH v4 1/6] merge-tree: fail with a non-zero exit code on missing tree objects Johannes Schindelin via GitGitGadget
2024-02-23  8:34       ` [PATCH v4 2/6] merge-ort: do check `parse_tree()`'s return value Johannes Schindelin via GitGitGadget
2024-02-23  8:34       ` [PATCH v4 3/6] t4301: verify that merge-tree fails on missing blob objects Johannes Schindelin via GitGitGadget
2024-02-23  8:34       ` [PATCH v4 4/6] Always check `parse_tree*()`'s return value Johannes Schindelin via GitGitGadget
2024-02-23  8:34       ` [PATCH v4 5/6] cache-tree: avoid an unnecessary check Johannes Schindelin via GitGitGadget
2024-02-23  8:34       ` [PATCH v4 6/6] fill_tree_descriptor(): mark error message for translation Johannes Schindelin via GitGitGadget
2024-02-23 18:23       ` [PATCH v4 0/6] merge-tree: handle missing objects correctly Junio C Hamano

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).