Git Mailing List Archive mirror
* [PATCH 0/3] commit -a -m: allow the top-level tree to become empty again
@ 2023-06-29 13:23 Johannes Schindelin via GitGitGadget
  2023-06-29 13:23 ` [PATCH 1/3] do_read_index(): always mark index as initialized unless erroring out Johannes Schindelin via GitGitGadget
                   ` (2 more replies)
  0 siblings, 3 replies; 6+ messages in thread
From: Johannes Schindelin via GitGitGadget @ 2023-06-29 13:23 UTC
  To: git; +Cc: Johannes Schindelin

This patch series is in response to the bug report in
https://github.com/git-for-windows/git/issues/4462, which demonstrates that
`git commit -a -m <msg>` would no longer always stage all updates to tracked
files. The bug was introduced in Git v2.40.0.

Johannes Schindelin (3):
  do_read_index(): always mark index as initialized unless erroring out
  split-index: accept that a base index can be empty
  commit -a -m: allow the top-level tree to become empty again

 builtin/commit.c      |  7 ++-----
 read-cache.c          | 15 +++++++++------
 t/t2200-add-update.sh | 11 +++++++++++
 3 files changed, 22 insertions(+), 11 deletions(-)


base-commit: a9e066fa63149291a55f383cfa113d8bdbdaa6b3
Published-As: https://github.com/gitgitgadget/git/releases/tag/pr-1554%2Fdscho%2Ffix-git-commit-a-m-when-tree-becomes-empty-v1
Fetch-It-Via: git fetch https://github.com/gitgitgadget/git pr-1554/dscho/fix-git-commit-a-m-when-tree-becomes-empty-v1
Pull-Request: https://github.com/gitgitgadget/git/pull/1554
-- 
gitgitgadget


* [PATCH 1/3] do_read_index(): always mark index as initialized unless erroring out
  2023-06-29 13:23 [PATCH 0/3] commit -a -m: allow the top-level tree to become empty again Johannes Schindelin via GitGitGadget
@ 2023-06-29 13:23 ` Johannes Schindelin via GitGitGadget
  2023-06-29 13:23 ` [PATCH 2/3] split-index: accept that a base index can be empty Johannes Schindelin via GitGitGadget
  2023-06-29 13:23 ` [PATCH 3/3] commit -a -m: allow the top-level tree to become empty again Johannes Schindelin via GitGitGadget
  2 siblings, 0 replies; 6+ messages in thread
From: Johannes Schindelin via GitGitGadget @ 2023-06-29 13:23 UTC
  To: git; +Cc: Johannes Schindelin, Johannes Schindelin

From: Johannes Schindelin <johannes.schindelin@gmx.de>

In 913e0e99b6a (unpack_trees(): protect the handcrafted in-core index
from read_cache(), 2008-08-23) a flag was introduced into the
`index_state` structure to indicate whether it had been initialized (or
more correctly: read and parsed).

There was one code path that was not handled, though: when the index
file does not yet exist (but the `must_exist` parameter is set to 0 to
indicate that that's okay). In this instance, Git wants to go forward
with a new, pristine Git index, almost as if the file had existed and
contained no index entries or extensions.

Since Git wants to handle this situation the same as if an "empty" Git
index file existed, let's set the `initialized` flag in that case, too.

This is necessary to prepare for fixing the bug where the condition
`cache_nr == 0` is incorrectly used as an indicator that the index has
not been read yet, when the condition `initialized == 0` needs to be used
instead.

Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
---
 read-cache.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/read-cache.c b/read-cache.c
index f4c31a68c85..b10caa9831c 100644
--- a/read-cache.c
+++ b/read-cache.c
@@ -2285,6 +2285,7 @@ int do_read_index(struct index_state *istate, const char *path, int must_exist)
 	if (fd < 0) {
 		if (!must_exist && errno == ENOENT) {
 			set_new_index_sparsity(istate);
+			istate->initialized = 1;
 			return 0;
 		}
 		die_errno(_("%s: index file open failed"), path);
-- 
gitgitgadget



* [PATCH 2/3] split-index: accept that a base index can be empty
  2023-06-29 13:23 [PATCH 0/3] commit -a -m: allow the top-level tree to become empty again Johannes Schindelin via GitGitGadget
  2023-06-29 13:23 ` [PATCH 1/3] do_read_index(): always mark index as initialized unless erroring out Johannes Schindelin via GitGitGadget
@ 2023-06-29 13:23 ` Johannes Schindelin via GitGitGadget
  2023-06-29 19:02   ` Junio C Hamano
  2023-06-29 13:23 ` [PATCH 3/3] commit -a -m: allow the top-level tree to become empty again Johannes Schindelin via GitGitGadget
  2 siblings, 1 reply; 6+ messages in thread
From: Johannes Schindelin via GitGitGadget @ 2023-06-29 13:23 UTC
  To: git; +Cc: Johannes Schindelin, Johannes Schindelin

From: Johannes Schindelin <johannes.schindelin@gmx.de>

We are about to fix an ancient bug where `do_read_index()` pretended
that the index was not initialized when there are no index entries.

Before the `index_state` structure gained the `initialized` flag in
913e0e99b6a (unpack_trees(): protect the handcrafted in-core index from
read_cache(), 2008-08-23), that was the best we could do (even if it was
incorrect: it is totally possible to read a Git index file that contains
no index entries).

This pattern was repeated also in 998330ac2e7 (read-cache: look for
shared index files next to the index, too, 2021-08-26), which we fix
here by _not_ mistaking an empty base index for a missing
`sharedindex.*` file.

Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
---
 read-cache.c | 14 ++++++++------
 1 file changed, 8 insertions(+), 6 deletions(-)

diff --git a/read-cache.c b/read-cache.c
index b10caa9831c..e15a472f54f 100644
--- a/read-cache.c
+++ b/read-cache.c
@@ -2455,12 +2455,14 @@ int read_index_from(struct index_state *istate, const char *path,
 
 	base_oid_hex = oid_to_hex(&split_index->base_oid);
 	base_path = xstrfmt("%s/sharedindex.%s", gitdir, base_oid_hex);
-	trace2_region_enter_printf("index", "shared/do_read_index",
-				   the_repository, "%s", base_path);
-	ret = do_read_index(split_index->base, base_path, 0);
-	trace2_region_leave_printf("index", "shared/do_read_index",
-				   the_repository, "%s", base_path);
-	if (!ret) {
+	if (file_exists(base_path)) {
+		trace2_region_enter_printf("index", "shared/do_read_index",
+					the_repository, "%s", base_path);
+
+		ret = do_read_index(split_index->base, base_path, 0);
+		trace2_region_leave_printf("index", "shared/do_read_index",
+					the_repository, "%s", base_path);
+	} else {
 		char *path_copy = xstrdup(path);
 		char *base_path2 = xstrfmt("%s/sharedindex.%s",
 					   dirname(path_copy), base_oid_hex);
-- 
gitgitgadget



* [PATCH 3/3] commit -a -m: allow the top-level tree to become empty again
  2023-06-29 13:23 [PATCH 0/3] commit -a -m: allow the top-level tree to become empty again Johannes Schindelin via GitGitGadget
  2023-06-29 13:23 ` [PATCH 1/3] do_read_index(): always mark index as initialized unless erroring out Johannes Schindelin via GitGitGadget
  2023-06-29 13:23 ` [PATCH 2/3] split-index: accept that a base index can be empty Johannes Schindelin via GitGitGadget
@ 2023-06-29 13:23 ` Johannes Schindelin via GitGitGadget
  2023-06-29 19:17   ` Junio C Hamano
  2 siblings, 1 reply; 6+ messages in thread
From: Johannes Schindelin via GitGitGadget @ 2023-06-29 13:23 UTC
  To: git; +Cc: Johannes Schindelin, Johannes Schindelin

From: Johannes Schindelin <johannes.schindelin@gmx.de>

In 03267e8656c (commit: discard partial cache before (re-)reading it,
2022-11-08), a memory leak was plugged by discarding any partial index
before re-reading it.

The problem with this memory leak fix is that it was based on an
incomplete understanding of the logic introduced in 7168624c353 (Do not
generate full commit log message if it is not going to be used,
2007-11-28).

That logic was introduced to add a shortcut when committing without
editing the commit message interactively. A part of that logic was to
ensure that the index was read into memory:

	if (!active_nr && read_cache() < 0)
		die(...)

Translation to English: If the index has not yet been read, read it, and
if that fails, error out.

That logic was incorrect, though: It used `!active_nr` as an indicator
that the index was not yet read. Usually this is not a problem because
in the vast majority of instances, the index contains at least one
entry.

And it was natural to do it this way because at the time that condition
was introduced, the `index_state` structure had no explicit flag to
indicate that it was initialized: This flag was only introduced in
913e0e99b6a (unpack_trees(): protect the handcrafted in-core index from
read_cache(), 2008-08-23), but that commit did not adjust the code path
where no index file was found and a new, pristine index was initialized.

Now, when the index does not contain any entry (which is quite
common in Git's test suite because it starts quite a few repositories
from scratch), subsequent calls to `do_read_index()` will mistakenly treat
the index as not initialized, and read it again unnecessarily.

This is a problem because, after the empty index has been initialized,
other parts of it (e.g. its `cache_tree`) may already have been populated
by the time a subsequent call to `do_read_index()` is made to ensure an
initialized index. If that subsequent call mistakes the index for an
uninitialized one, the already-populated data is leaked.

The correct fix for that memory leak is to adjust the condition so that
it does not mistake `active_nr == 0` to mean that the index has not yet
been read.

Using the `initialized` flag instead, we avoid that mistake and, as a
bonus, fix a bug that was introduced by the memory leak fix: When
deleting all tracked files and then asking `git commit -a -m ...` to
commit the result, Git would internally update the index, then discard
and re-read the index (undoing the update), and fail to commit anything.

This fixes https://github.com/git-for-windows/git/issues/4462

Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
---
 builtin/commit.c      |  7 ++-----
 t/t2200-add-update.sh | 11 +++++++++++
 2 files changed, 13 insertions(+), 5 deletions(-)

diff --git a/builtin/commit.c b/builtin/commit.c
index 65a5c0e29d5..4cf2baaf943 100644
--- a/builtin/commit.c
+++ b/builtin/commit.c
@@ -998,11 +998,8 @@ static int prepare_to_commit(const char *index_file, const char *prefix,
 		struct object_id oid;
 		const char *parent = "HEAD";
 
-		if (!the_index.cache_nr) {
-			discard_index(&the_index);
-			if (repo_read_index(the_repository) < 0)
-				die(_("Cannot read index"));
-		}
+		if (!the_index.initialized && repo_read_index(the_repository) < 0)
+			die(_("Cannot read index"));
 
 		if (amend)
 			parent = "HEAD^1";
diff --git a/t/t2200-add-update.sh b/t/t2200-add-update.sh
index be394f1131a..c01492f33f8 100755
--- a/t/t2200-add-update.sh
+++ b/t/t2200-add-update.sh
@@ -197,4 +197,15 @@ test_expect_success '"add -u non-existent" should fail' '
 	! grep "non-existent" actual
 '
 
+test_expect_success '"commit -a" implies "add -u" if index becomes empty' '
+	git rm -rf \* &&
+	git commit -m clean-slate &&
+	test_commit file1 &&
+	rm file1.t &&
+	test_tick &&
+	git commit -a -m remove &&
+	git ls-tree HEAD: >out &&
+	test_must_be_empty out
+'
+
 test_done
-- 
gitgitgadget


* Re: [PATCH 2/3] split-index: accept that a base index can be empty
  2023-06-29 13:23 ` [PATCH 2/3] split-index: accept that a base index can be empty Johannes Schindelin via GitGitGadget
@ 2023-06-29 19:02   ` Junio C Hamano
  0 siblings, 0 replies; 6+ messages in thread
From: Junio C Hamano @ 2023-06-29 19:02 UTC
  To: Johannes Schindelin via GitGitGadget; +Cc: git, Johannes Schindelin

"Johannes Schindelin via GitGitGadget" <gitgitgadget@gmail.com>
writes:

> From: Johannes Schindelin <johannes.schindelin@gmx.de>
>
> We are about to fix an ancient bug where `do_read_index()` pretended
> that the index was not initialized when there are no index entries.
>
> Before the `index_state` structure gained the `initialized` flag in
> 913e0e99b6a (unpack_trees(): protect the handcrafted in-core index from
> read_cache(), 2008-08-23), that was the best we could do (even if it was
> incorrect: it is totally possible to read a Git index file that contains
> no index entries).

Yeah, I very much remember how that single bit made our lives much
easier.

> This pattern was repeated also in 998330ac2e7 (read-cache: look for
> shared index files next to the index, too, 2021-08-26), which we fix
> here by _not_ mistaking an empty base index for a missing
> `sharedindex.*` file.

Ahh, this is in the codepath to deal with a separate worktree.  We
allow sharing of the "sharedindex.*" file across worktrees and
entries read from the "index" files from individual worktrees to
overlay it.  But we also do allow worktrees to have their own
"sharedindex.*" file, which is what the commit in question wanted to
do, and the way it (wanted to) implement it was

 - check the "gitdir" version first, as before
 - if that did not exist, then look at the one next to "index"

but "if that did not exist" was implemented incorrectly and did not
account for the case where that "gitdir" version was an empty index.

So, instead, the updated code checks and reads the "gitdir" version *if*
the file exists, regardless of how many entries there are in it.

Makes sense.

> Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
> ---
>  read-cache.c | 14 ++++++++------
>  1 file changed, 8 insertions(+), 6 deletions(-)
>
> diff --git a/read-cache.c b/read-cache.c
> index b10caa9831c..e15a472f54f 100644
> --- a/read-cache.c
> +++ b/read-cache.c
> @@ -2455,12 +2455,14 @@ int read_index_from(struct index_state *istate, const char *path,
>  
>  	base_oid_hex = oid_to_hex(&split_index->base_oid);
>  	base_path = xstrfmt("%s/sharedindex.%s", gitdir, base_oid_hex);
> -	trace2_region_enter_printf("index", "shared/do_read_index",
> -				   the_repository, "%s", base_path);
> -	ret = do_read_index(split_index->base, base_path, 0);
> -	trace2_region_leave_printf("index", "shared/do_read_index",
> -				   the_repository, "%s", base_path);
> -	if (!ret) {
> +	if (file_exists(base_path)) {
> +		trace2_region_enter_printf("index", "shared/do_read_index",
> +					the_repository, "%s", base_path);
> +
> +		ret = do_read_index(split_index->base, base_path, 0);
> +		trace2_region_leave_printf("index", "shared/do_read_index",
> +					the_repository, "%s", base_path);
> +	} else {
>  		char *path_copy = xstrdup(path);
>  		char *base_path2 = xstrfmt("%s/sharedindex.%s",
>  					   dirname(path_copy), base_oid_hex);


* Re: [PATCH 3/3] commit -a -m: allow the top-level tree to become empty again
  2023-06-29 13:23 ` [PATCH 3/3] commit -a -m: allow the top-level tree to become empty again Johannes Schindelin via GitGitGadget
@ 2023-06-29 19:17   ` Junio C Hamano
  0 siblings, 0 replies; 6+ messages in thread
From: Junio C Hamano @ 2023-06-29 19:17 UTC
  To: Johannes Schindelin via GitGitGadget; +Cc: git, Johannes Schindelin

"Johannes Schindelin via GitGitGadget" <gitgitgadget@gmail.com>
writes:

> That logic was introduced to add a shortcut when committing without
> editing the commit message interactively. A part of that logic was to
> ensure that the index was read into memory:
>
> 	if (!active_nr && read_cache() < 0)
> 		die(...)
>
> Translation to English: If the index has not yet been read, read it, and
> if that fails, error out.

Well described.  It does make sense to turn !active_nr used here
into a check on the .initialized member.

> And it was natural to do it this way because at the time that condition
> was introduced, the `index_state` structure had no explicit flag to
> indicate that it was initialized: This flag was only introduced in
> 913e0e99b6a (unpack_trees(): protect the handcrafted in-core index from
> read_cache(), 2008-08-23), but that commit did not adjust the code path
> where no index file was found and a new, pristine index was initialized.

My mistake, but after 15 years it probably is beyond statute of
limitations ;-)

> Using the `initialized` flag instead, we avoid that mistake, and as a
> bonus we can fix a bug at the same time that was introduced by the
> memory leak fix: When deleting all tracked files and then asking `git
> commit -a -m ...` to commit the result, Git would internally update the
> index, then discard and re-read the index undoing the update, and fail
> to commit anything.

That does sound like the primary bug fixed with this change, not a
bonus, but anyway, the change is very sensible and clearly described
with a good test.  Will queue.

Thanks.


Thread overview: 6+ messages
2023-06-29 13:23 [PATCH 0/3] commit -a -m: allow the top-level tree to become empty again Johannes Schindelin via GitGitGadget
2023-06-29 13:23 ` [PATCH 1/3] do_read_index(): always mark index as initialized unless erroring out Johannes Schindelin via GitGitGadget
2023-06-29 13:23 ` [PATCH 2/3] split-index: accept that a base index can be empty Johannes Schindelin via GitGitGadget
2023-06-29 19:02   ` Junio C Hamano
2023-06-29 13:23 ` [PATCH 3/3] commit -a -m: allow the top-level tree to become empty again Johannes Schindelin via GitGitGadget
2023-06-29 19:17   ` Junio C Hamano
