Git Mailing List Archive mirror
 help / color / mirror / Atom feed
* [PATCH 0/2] Fix behavior of worktree config in submodules
@ 2023-05-23 23:17 Victoria Dye via GitGitGadget
  2023-05-23 23:17 ` [PATCH 1/2] config: use gitdir to get worktree config Victoria Dye via GitGitGadget
                   ` (4 more replies)
  0 siblings, 5 replies; 29+ messages in thread
From: Victoria Dye via GitGitGadget @ 2023-05-23 23:17 UTC (permalink / raw)
  To: git; +Cc: derrickstolee, chooglen, Victoria Dye

About a year ago, discussion on the sparse index integration of 'git grep'
surfaced larger incompatibilities between sparse-checkout and submodules
[1]. This series fixes one of the underlying issues to that incompatibility,
which is that the worktree config of the submodule (where
'core.sparseCheckout', 'core.sparseCheckoutCone', and 'index.sparse' are
set) is not used when operating on the submodule from its super project
(e.g., in a command with '--recurse-submodules').

The outcome of this series is that 'extensions.worktreeConfig' and the
contents of the repository's worktree config are read and applied to (and
only to) the relevant repo when working in a super project/submodule setup.
This alone doesn't fix sparse-checkout/submodule interoperability; the
additional changes needed for that will be submitted in a later series. I'm
also hoping this will help (or at least not hurt) the work to avoid use of
global state in config parsing [2].

Thanks!

 * Victoria

[1]
https://lore.kernel.org/git/093827ae-41ef-5f7c-7829-647536ce1305@github.com/
[2]
https://lore.kernel.org/git/pull.1497.git.git.1682104398.gitgitgadget@gmail.com/

Victoria Dye (2):
  config: use gitdir to get worktree config
  repository: move 'repository_format_worktree_config' to repo scope

 builtin/config.c                       |  3 +-
 builtin/worktree.c                     |  2 +-
 config.c                               | 42 ++++++++++++++++++--------
 environment.c                          |  1 -
 environment.h                          |  1 -
 repository.c                           |  1 +
 repository.h                           |  1 +
 setup.c                                | 10 ++++--
 t/t3007-ls-files-recurse-submodules.sh | 34 +++++++++++++++++++++
 worktree.c                             |  4 +--
 10 files changed, 78 insertions(+), 21 deletions(-)


base-commit: 4a714b37029a4b63dbd22f7d7ed81f7a0d693680
Published-As: https://github.com/gitgitgadget/git/releases/tag/pr-1536%2Fvdye%2Fvdye%2Fsubmodule-worktree-config-v1
Fetch-It-Via: git fetch https://github.com/gitgitgadget/git pr-1536/vdye/vdye/submodule-worktree-config-v1
Pull-Request: https://github.com/gitgitgadget/git/pull/1536
-- 
gitgitgadget

^ permalink raw reply	[flat|nested] 29+ messages in thread

* [PATCH 1/2] config: use gitdir to get worktree config
  2023-05-23 23:17 [PATCH 0/2] Fix behavior of worktree config in submodules Victoria Dye via GitGitGadget
@ 2023-05-23 23:17 ` Victoria Dye via GitGitGadget
  2023-05-25  1:05   ` Glen Choo
  2023-05-23 23:17 ` [PATCH 2/2] repository: move 'repository_format_worktree_config' to repo scope Victoria Dye via GitGitGadget
                   ` (3 subsequent siblings)
  4 siblings, 1 reply; 29+ messages in thread
From: Victoria Dye via GitGitGadget @ 2023-05-23 23:17 UTC (permalink / raw)
  To: git; +Cc: derrickstolee, chooglen, Victoria Dye, Victoria Dye

From: Victoria Dye <vdye@github.com>

Update 'do_git_config_sequence()' to read the worktree config from
'config.worktree' in 'opts->git_dir' rather than the gitdir of
'the_repository'.

The worktree config is loaded from the path returned by
'git_pathdup("config.worktree")', the 'config.worktree' relative to the
gitdir of 'the_repository'. If loading the config for a submodule, this path
is incorrect, since 'the_repository' is the super project. Conversely,
'opts->git_dir' is the gitdir of the submodule being configured, so the
config file in that location should be read instead.

To ensure the use of 'opts->git_dir' is safe, require that 'opts->git_dir'
is set if-and-only-if 'opts->commondir' is set (rather than "only-if" as it
is now). In all current usage of 'config_options', these values are set
together, so the stricter check does not change any behavior.

Finally, add tests to 't3007-ls-files-recurse-submodules.sh' to demonstrate
the corrected config loading behavior. Note that behavior still isn't ideal
because 'extensions.worktreeConfig' in the super project controls whether or
not the worktree config is used in the submodule. This will be fixed in a
later patch.

Signed-off-by: Victoria Dye <vdye@github.com>
---
 config.c                               | 28 +++++++++++++++++---------
 t/t3007-ls-files-recurse-submodules.sh | 23 +++++++++++++++++++++
 2 files changed, 42 insertions(+), 9 deletions(-)

diff --git a/config.c b/config.c
index b79baf83e35..a93f7bfa3aa 100644
--- a/config.c
+++ b/config.c
@@ -2200,14 +2200,24 @@ static int do_git_config_sequence(struct config_reader *reader,
 	char *xdg_config = NULL;
 	char *user_config = NULL;
 	char *repo_config;
+	char *worktree_config;
 	enum config_scope prev_parsing_scope = reader->parsing_scope;
 
-	if (opts->commondir)
+	/*
+	 * Ensure that either:
+	 * - the git_dir and commondir are both set, or
+	 * - the git_dir and commondir are both NULL
+	 */
+	if (!opts->git_dir != !opts->commondir)
+		BUG("only one of commondir and git_dir is non-NULL");
+
+	if (opts->commondir) {
 		repo_config = mkpathdup("%s/config", opts->commondir);
-	else if (opts->git_dir)
-		BUG("git_dir without commondir");
-	else
+		worktree_config = mkpathdup("%s/config.worktree", opts->git_dir);
+	} else {
 		repo_config = NULL;
+		worktree_config = NULL;
+	}
 
 	config_reader_set_scope(reader, CONFIG_SCOPE_SYSTEM);
 	if (git_config_system() && system_config &&
@@ -2230,11 +2240,10 @@ static int do_git_config_sequence(struct config_reader *reader,
 		ret += git_config_from_file(fn, repo_config, data);
 
 	config_reader_set_scope(reader, CONFIG_SCOPE_WORKTREE);
-	if (!opts->ignore_worktree && repository_format_worktree_config) {
-		char *path = git_pathdup("config.worktree");
-		if (!access_or_die(path, R_OK, 0))
-			ret += git_config_from_file(fn, path, data);
-		free(path);
+	if (!opts->ignore_worktree && worktree_config &&
+	    repository_format_worktree_config &&
+	    !access_or_die(worktree_config, R_OK, 0)) {
+		ret += git_config_from_file(fn, worktree_config, data);
 	}
 
 	config_reader_set_scope(reader, CONFIG_SCOPE_COMMAND);
@@ -2246,6 +2255,7 @@ static int do_git_config_sequence(struct config_reader *reader,
 	free(xdg_config);
 	free(user_config);
 	free(repo_config);
+	free(worktree_config);
 	return ret;
 }
 
diff --git a/t/t3007-ls-files-recurse-submodules.sh b/t/t3007-ls-files-recurse-submodules.sh
index dd7770e85de..e35c203241f 100755
--- a/t/t3007-ls-files-recurse-submodules.sh
+++ b/t/t3007-ls-files-recurse-submodules.sh
@@ -299,6 +299,29 @@ test_expect_success '--recurse-submodules does not support --error-unmatch' '
 	test_i18ngrep "does not support --error-unmatch" actual
 '
 
+test_expect_success '--recurse-submodules parses submodule repo config' '
+	test_when_finished "git -C submodule config --unset feature.experimental" &&
+	git -C submodule config feature.experimental "invalid non-boolean value" &&
+	test_must_fail git ls-files --recurse-submodules 2>err &&
+	grep "bad boolean config value" err
+'
+
+test_expect_success '--recurse-submodules parses submodule worktree config' '
+	test_when_finished "git -C submodule config --unset extensions.worktreeConfig" &&
+	test_when_finished "git -C submodule config --worktree --unset feature.experimental" &&
+	test_when_finished "git config --unset extensions.worktreeConfig" &&
+
+	git -C submodule config extensions.worktreeConfig true &&
+	git -C submodule config --worktree feature.experimental "invalid non-boolean value" &&
+
+	# NEEDSWORK: the extensions.worktreeConfig is set globally based on super
+	# project, so we need to enable it in the super project.
+	git config extensions.worktreeConfig true &&
+
+	test_must_fail git ls-files --recurse-submodules 2>err &&
+	grep "bad boolean config value" err
+'
+
 test_incompatible_with_recurse_submodules () {
 	test_expect_success "--recurse-submodules and $1 are incompatible" "
 		test_must_fail git ls-files --recurse-submodules $1 2>actual &&
-- 
gitgitgadget


^ permalink raw reply related	[flat|nested] 29+ messages in thread

* [PATCH 2/2] repository: move 'repository_format_worktree_config' to repo scope
  2023-05-23 23:17 [PATCH 0/2] Fix behavior of worktree config in submodules Victoria Dye via GitGitGadget
  2023-05-23 23:17 ` [PATCH 1/2] config: use gitdir to get worktree config Victoria Dye via GitGitGadget
@ 2023-05-23 23:17 ` Victoria Dye via GitGitGadget
  2023-05-25  1:29   ` Glen Choo
  2023-05-25 20:13   ` Derrick Stolee
  2023-05-24 10:25 ` [PATCH 0/2] Fix behavior of worktree config in submodules Junio C Hamano
                   ` (2 subsequent siblings)
  4 siblings, 2 replies; 29+ messages in thread
From: Victoria Dye via GitGitGadget @ 2023-05-23 23:17 UTC (permalink / raw)
  To: git; +Cc: derrickstolee, chooglen, Victoria Dye, Victoria Dye

From: Victoria Dye <vdye@github.com>

Move 'repository_format_worktree_config' out of the global scope and into
the 'repository' struct. This change is similar to how
'repository_format_partial_clone' was moved in ebaf3bcf1ae (repository: move
global r_f_p_c to repo struct, 2021-06-17), adding to the 'repository'
struct and updating 'setup.c' & 'repository.c' functions to assign the value
appropriately. In addition, update usage of the setting to reference the
relevant context's repo or, as a fallback, 'the_repository'.

The primary goal of this change is to be able to load worktree config for a
submodule depending on whether that submodule - not the super project - has
'extensions.worktreeConfig' enabled. To ensure 'do_git_config_sequence()'
has access to the newly repo-scoped configuration:

- update 'repo_read_config()' to create a 'config_source' to hold the
  repo instance
- add a 'repo' argument to 'do_git_config_sequence()'
- update 'config_with_options' to call 'do_git_config_sequence()' with
  'config_source.repo', or 'the_repository' as a fallback

Finally, add/update tests in 't3007-ls-files-recurse-submodules.sh' to
verify 'extensions.worktreeConfig' is read an used independently by super
projects and submodules.

Signed-off-by: Victoria Dye <vdye@github.com>
---
 builtin/config.c                       |  3 ++-
 builtin/worktree.c                     |  2 +-
 config.c                               | 16 +++++++++++-----
 environment.c                          |  1 -
 environment.h                          |  1 -
 repository.c                           |  1 +
 repository.h                           |  1 +
 setup.c                                | 10 ++++++++--
 t/t3007-ls-files-recurse-submodules.sh | 21 ++++++++++++++++-----
 worktree.c                             |  4 ++--
 10 files changed, 42 insertions(+), 18 deletions(-)

diff --git a/builtin/config.c b/builtin/config.c
index ff2fe8ef125..5e352117c7a 100644
--- a/builtin/config.c
+++ b/builtin/config.c
@@ -5,6 +5,7 @@
 #include "color.h"
 #include "editor.h"
 #include "environment.h"
+#include "repository.h"
 #include "gettext.h"
 #include "ident.h"
 #include "parse-options.h"
@@ -713,7 +714,7 @@ int cmd_config(int argc, const char **argv, const char *prefix)
 		given_config_source.scope = CONFIG_SCOPE_LOCAL;
 	} else if (use_worktree_config) {
 		struct worktree **worktrees = get_worktrees();
-		if (repository_format_worktree_config)
+		if (the_repository->repository_format_worktree_config)
 			given_config_source.file = git_pathdup("config.worktree");
 		else if (worktrees[0] && worktrees[1])
 			die(_("--worktree cannot be used with multiple "
diff --git a/builtin/worktree.c b/builtin/worktree.c
index f3180463be2..60e389aaedb 100644
--- a/builtin/worktree.c
+++ b/builtin/worktree.c
@@ -483,7 +483,7 @@ static int add_worktree(const char *path, const char *refname,
 	 * values from the current worktree into the new one, that way the
 	 * new worktree behaves the same as this one.
 	 */
-	if (repository_format_worktree_config)
+	if (the_repository->repository_format_worktree_config)
 		copy_filtered_worktree_config(sb_repo.buf);
 
 	strvec_pushf(&child_env, "%s=%s", GIT_DIR_ENVIRONMENT, sb_git.buf);
diff --git a/config.c b/config.c
index a93f7bfa3aa..9ce2ffff5e1 100644
--- a/config.c
+++ b/config.c
@@ -2193,6 +2193,7 @@ int git_config_system(void)
 
 static int do_git_config_sequence(struct config_reader *reader,
 				  const struct config_options *opts,
+				  const struct repository *repo,
 				  config_fn_t fn, void *data)
 {
 	int ret = 0;
@@ -2241,7 +2242,7 @@ static int do_git_config_sequence(struct config_reader *reader,
 
 	config_reader_set_scope(reader, CONFIG_SCOPE_WORKTREE);
 	if (!opts->ignore_worktree && worktree_config &&
-	    repository_format_worktree_config &&
+	    repo && repo->repository_format_worktree_config &&
 	    !access_or_die(worktree_config, R_OK, 0)) {
 		ret += git_config_from_file(fn, worktree_config, data);
 	}
@@ -2277,7 +2278,7 @@ int config_with_options(config_fn_t fn, void *data,
 		data = &inc;
 	}
 
-	if (config_source)
+	if (config_source && config_source->scope != CONFIG_SCOPE_UNKNOWN)
 		config_reader_set_scope(&the_reader, config_source->scope);
 
 	/*
@@ -2294,7 +2295,9 @@ int config_with_options(config_fn_t fn, void *data,
 		ret = git_config_from_blob_ref(fn, repo, config_source->blob,
 						data);
 	} else {
-		ret = do_git_config_sequence(&the_reader, opts, fn, data);
+		struct repository *repo = config_source && config_source->repo ?
+			config_source->repo : the_repository;
+		ret = do_git_config_sequence(&the_reader, opts, repo, fn, data);
 	}
 
 	if (inc.remote_urls) {
@@ -2667,11 +2670,14 @@ static void repo_read_config(struct repository *repo)
 {
 	struct config_options opts = { 0 };
 	struct configset_add_data data = CONFIGSET_ADD_INIT;
+	struct git_config_source config_source = { 0 };
 
 	opts.respect_includes = 1;
 	opts.commondir = repo->commondir;
 	opts.git_dir = repo->gitdir;
 
+	config_source.repo = repo;
+
 	if (!repo->config)
 		CALLOC_ARRAY(repo->config, 1);
 	else
@@ -2681,7 +2687,7 @@ static void repo_read_config(struct repository *repo)
 	data.config_set = repo->config;
 	data.config_reader = &the_reader;
 
-	if (config_with_options(config_set_callback, &data, NULL, &opts) < 0)
+	if (config_with_options(config_set_callback, &data, &config_source, &opts) < 0)
 		/*
 		 * config_with_options() normally returns only
 		 * zero, as most errors are fatal, and
@@ -3337,7 +3343,7 @@ int repo_config_set_worktree_gently(struct repository *r,
 				    const char *key, const char *value)
 {
 	/* Only use worktree-specific config if it is already enabled. */
-	if (repository_format_worktree_config) {
+	if (r->repository_format_worktree_config) {
 		char *file = repo_git_path(r, "config.worktree");
 		int ret = git_config_set_multivar_in_file_gently(
 					file, key, value, NULL, 0);
diff --git a/environment.c b/environment.c
index 28d18eaca8e..6bd001efbde 100644
--- a/environment.c
+++ b/environment.c
@@ -42,7 +42,6 @@ int is_bare_repository_cfg = -1; /* unspecified */
 int warn_ambiguous_refs = 1;
 int warn_on_object_refname_ambiguity = 1;
 int repository_format_precious_objects;
-int repository_format_worktree_config;
 const char *git_commit_encoding;
 const char *git_log_output_encoding;
 char *apply_default_whitespace;
diff --git a/environment.h b/environment.h
index 30cb7e0fa34..e6668079269 100644
--- a/environment.h
+++ b/environment.h
@@ -197,7 +197,6 @@ extern char *notes_ref_name;
 extern int grafts_replace_parents;
 
 extern int repository_format_precious_objects;
-extern int repository_format_worktree_config;
 
 /*
  * Create a temporary file rooted in the object database directory, or
diff --git a/repository.c b/repository.c
index c53e480e326..104960f8f59 100644
--- a/repository.c
+++ b/repository.c
@@ -182,6 +182,7 @@ int repo_init(struct repository *repo,
 		goto error;
 
 	repo_set_hash_algo(repo, format.hash_algo);
+	repo->repository_format_worktree_config = format.worktree_config;
 
 	/* take ownership of format.partial_clone */
 	repo->repository_format_partial_clone = format.partial_clone;
diff --git a/repository.h b/repository.h
index 1a13ff28677..74ae26635a4 100644
--- a/repository.h
+++ b/repository.h
@@ -163,6 +163,7 @@ struct repository {
 	struct promisor_remote_config *promisor_remote_config;
 
 	/* Configurations */
+	int repository_format_worktree_config;
 
 	/* Indicate if a repository has a different 'commondir' from 'gitdir' */
 	unsigned different_commondir:1;
diff --git a/setup.c b/setup.c
index 458582207ea..d8663954350 100644
--- a/setup.c
+++ b/setup.c
@@ -650,11 +650,10 @@ static int check_repository_format_gently(const char *gitdir, struct repository_
 	}
 
 	repository_format_precious_objects = candidate->precious_objects;
-	repository_format_worktree_config = candidate->worktree_config;
 	string_list_clear(&candidate->unknown_extensions, 0);
 	string_list_clear(&candidate->v1_only_extensions, 0);
 
-	if (repository_format_worktree_config) {
+	if (candidate->worktree_config) {
 		/*
 		 * pick up core.bare and core.worktree from per-worktree
 		 * config if present
@@ -1423,6 +1422,9 @@ int discover_git_directory(struct strbuf *commondir,
 		return -1;
 	}
 
+	the_repository->repository_format_worktree_config =
+		candidate.worktree_config;
+
 	/* take ownership of candidate.partial_clone */
 	the_repository->repository_format_partial_clone =
 		candidate.partial_clone;
@@ -1560,6 +1562,8 @@ const char *setup_git_directory_gently(int *nongit_ok)
 		}
 		if (startup_info->have_repository) {
 			repo_set_hash_algo(the_repository, repo_fmt.hash_algo);
+			the_repository->repository_format_worktree_config =
+				repo_fmt.worktree_config;
 			/* take ownership of repo_fmt.partial_clone */
 			the_repository->repository_format_partial_clone =
 				repo_fmt.partial_clone;
@@ -1651,6 +1655,8 @@ void check_repository_format(struct repository_format *fmt)
 	check_repository_format_gently(get_git_dir(), fmt, NULL);
 	startup_info->have_repository = 1;
 	repo_set_hash_algo(the_repository, fmt->hash_algo);
+	the_repository->repository_format_worktree_config =
+		fmt->worktree_config;
 	the_repository->repository_format_partial_clone =
 		xstrdup_or_null(fmt->partial_clone);
 	clear_repository_format(&repo_fmt);
diff --git a/t/t3007-ls-files-recurse-submodules.sh b/t/t3007-ls-files-recurse-submodules.sh
index e35c203241f..6d0bacef4de 100755
--- a/t/t3007-ls-files-recurse-submodules.sh
+++ b/t/t3007-ls-files-recurse-submodules.sh
@@ -309,19 +309,30 @@ test_expect_success '--recurse-submodules parses submodule repo config' '
 test_expect_success '--recurse-submodules parses submodule worktree config' '
 	test_when_finished "git -C submodule config --unset extensions.worktreeConfig" &&
 	test_when_finished "git -C submodule config --worktree --unset feature.experimental" &&
-	test_when_finished "git config --unset extensions.worktreeConfig" &&
 
 	git -C submodule config extensions.worktreeConfig true &&
 	git -C submodule config --worktree feature.experimental "invalid non-boolean value" &&
 
-	# NEEDSWORK: the extensions.worktreeConfig is set globally based on super
-	# project, so we need to enable it in the super project.
-	git config extensions.worktreeConfig true &&
-
 	test_must_fail git ls-files --recurse-submodules 2>err &&
 	grep "bad boolean config value" err
 '
 
+test_expect_success '--recurse-submodules submodules ignore super project worktreeConfig extension' '
+	test_when_finished "git config --unset extensions.worktreeConfig" &&
+
+	# Enable worktree config in both super project & submodule, set an
+	# invalid config in the submodule worktree config, then disable worktree
+	# config in the submodule. The invalid worktree config should not be
+	# picked up.
+	git config extensions.worktreeConfig true &&
+	git -C submodule config extensions.worktreeConfig true &&
+	git -C submodule config --worktree feature.experimental "invalid non-boolean value" &&
+	git -C submodule config --unset extensions.worktreeConfig &&
+
+	git ls-files --recurse-submodules 2>err &&
+	! grep "bad boolean config value" err
+'
+
 test_incompatible_with_recurse_submodules () {
 	test_expect_success "--recurse-submodules and $1 are incompatible" "
 		test_must_fail git ls-files --recurse-submodules $1 2>actual &&
diff --git a/worktree.c b/worktree.c
index b5ee71c5ebd..c448fecd4b3 100644
--- a/worktree.c
+++ b/worktree.c
@@ -806,7 +806,7 @@ int init_worktree_config(struct repository *r)
 	 * If the extension is already enabled, then we can skip the
 	 * upgrade process.
 	 */
-	if (repository_format_worktree_config)
+	if (r->repository_format_worktree_config)
 		return 0;
 	if ((res = git_config_set_gently("extensions.worktreeConfig", "true")))
 		return error(_("failed to set extensions.worktreeConfig setting"));
@@ -846,7 +846,7 @@ int init_worktree_config(struct repository *r)
 	 * Ensure that we use worktree config for the remaining lifetime
 	 * of the current process.
 	 */
-	repository_format_worktree_config = 1;
+	r->repository_format_worktree_config = 1;
 
 cleanup:
 	git_configset_clear(&cs);
-- 
gitgitgadget

^ permalink raw reply related	[flat|nested] 29+ messages in thread

* Re: [PATCH 0/2] Fix behavior of worktree config in submodules
  2023-05-23 23:17 [PATCH 0/2] Fix behavior of worktree config in submodules Victoria Dye via GitGitGadget
  2023-05-23 23:17 ` [PATCH 1/2] config: use gitdir to get worktree config Victoria Dye via GitGitGadget
  2023-05-23 23:17 ` [PATCH 2/2] repository: move 'repository_format_worktree_config' to repo scope Victoria Dye via GitGitGadget
@ 2023-05-24 10:25 ` Junio C Hamano
  2023-05-25 19:56 ` Glen Choo
  2023-05-26  1:32 ` [PATCH v2 0/3] " Victoria Dye via GitGitGadget
  4 siblings, 0 replies; 29+ messages in thread
From: Junio C Hamano @ 2023-05-24 10:25 UTC (permalink / raw)
  To: Victoria Dye via GitGitGadget; +Cc: git, derrickstolee, chooglen, Victoria Dye

"Victoria Dye via GitGitGadget" <gitgitgadget@gmail.com> writes:

> About a year ago, discussion on the sparse index integration of 'git grep'
> surfaced larger incompatibilities between sparse-checkout and submodules
> [1]. This series fixes one of the underlying issues to that incompatibility,
> which is that the worktree config of the submodule (where
> 'core.sparseCheckout', 'core.sparseCheckoutCone', and 'index.sparse' are
> set) is not used when operating on the submodule from its super project
> (e.g., in a command with '--recurse-submodules').

OK.  So in short, worktreeConfig used to be a singleton, global for
the entire git process, but because "git grep" (and possibly others)
wants to operate across repository boundary when recursing into
submodules, the repository_format_worktree_config needs to be per
repository instance, not a global singleton.

Makes sense.

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH 1/2] config: use gitdir to get worktree config
  2023-05-23 23:17 ` [PATCH 1/2] config: use gitdir to get worktree config Victoria Dye via GitGitGadget
@ 2023-05-25  1:05   ` Glen Choo
  2023-05-25 20:05     ` Derrick Stolee
  0 siblings, 1 reply; 29+ messages in thread
From: Glen Choo @ 2023-05-25  1:05 UTC (permalink / raw)
  To: Victoria Dye via GitGitGadget, git; +Cc: derrickstolee, Victoria Dye

"Victoria Dye via GitGitGadget" <gitgitgadget@gmail.com> writes:

> From: Victoria Dye <vdye@github.com>
>
> Update 'do_git_config_sequence()' to read the worktree config from
> 'config.worktree' in 'opts->git_dir' rather than the gitdir of
> 'the_repository'.

Thanks for the patches! This makes sense. do_git_config_sequence() is
eventually called by repo_config(), which is supposed to read config
into a "struct repository", so any reliance on the_repository's settings
is wrong.

>                                        Note that behavior still isn't ideal
> because 'extensions.worktreeConfig' in the super project[...]

Nit: We typically use "superproject" without the space.

> diff --git a/config.c b/config.c
> index b79baf83e35..a93f7bfa3aa 100644
> --- a/config.c
> +++ b/config.c
> @@ -2200,14 +2200,24 @@ static int do_git_config_sequence(struct config_reader *reader,
>  	char *xdg_config = NULL;
>  	char *user_config = NULL;
>  	char *repo_config;
> +	char *worktree_config;
>  	enum config_scope prev_parsing_scope = reader->parsing_scope;
>  
> -	if (opts->commondir)
> +	/*
> +	 * Ensure that either:
> +	 * - the git_dir and commondir are both set, or
> +	 * - the git_dir and commondir are both NULL
> +	 */
> +	if (!opts->git_dir != !opts->commondir)
> +		BUG("only one of commondir and git_dir is non-NULL");
> +
> +	if (opts->commondir) {
>  		repo_config = mkpathdup("%s/config", opts->commondir);
> -	else if (opts->git_dir)
> -		BUG("git_dir without commondir");
> -	else
> +		worktree_config = mkpathdup("%s/config.worktree", opts->git_dir);
> +	} else {
>  		repo_config = NULL;
> +		worktree_config = NULL;
> +	}

Makes sense to me. I don't see why we would ever want to set one without
the other.

I looked into whether we could get replace opts->commondir and
opts->git_dir with a "struct repository" arg, but unfortunately
read_early_config() needs to pass these values without touching
"the_repository".

> diff --git a/t/t3007-ls-files-recurse-submodules.sh b/t/t3007-ls-files-recurse-submodules.sh
> index dd7770e85de..e35c203241f 100755
> --- a/t/t3007-ls-files-recurse-submodules.sh
> +++ b/t/t3007-ls-files-recurse-submodules.sh
> @@ -299,6 +299,29 @@ test_expect_success '--recurse-submodules does not support --error-unmatch' '
>  	test_i18ngrep "does not support --error-unmatch" actual
>  '
>  
> +test_expect_success '--recurse-submodules parses submodule repo config' '
> +	test_when_finished "git -C submodule config --unset feature.experimental" &&
> +	git -C submodule config feature.experimental "invalid non-boolean value" &&
> +	test_must_fail git ls-files --recurse-submodules 2>err &&
> +	grep "bad boolean config value" err
> +'

This test has a few bits that are important but non-obvious. It would be
useful to capture them in either the commit message or a comment.

Firstly, we can't test this using "git config" because that only uses
the_repository, and we specifically need to read config in-core into a
"struct repository" that is a submodule, so we need a command that
recurses into a submodule without using subprocesses. IIRC the only
choices are "git grep" and "git ls-files".

Secondly, when we test that config is read from the submodule the choice
of "feature.experimental" is quite important. The config is read quite
indirectly: "git ls-files" reads from the submodule's index, which
will call prepare_repo_settings() on the submodule, and eventually calls
repo_config_get_bool() on "feature.experimental". Any of the configs in
prepare_repo_settings() should do, though. A tiny suggestion would be to
use "index.sparse" instead of "feature.experimental", since (I presume)
we'll have to add sparse index + submodule tests for "git ls-files"
eventually.

> +test_expect_success '--recurse-submodules parses submodule worktree config' '
> +	test_when_finished "git -C submodule config --unset extensions.worktreeConfig" &&

I believe "test_config -C" will achieve the desired effect.

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH 2/2] repository: move 'repository_format_worktree_config' to repo scope
  2023-05-23 23:17 ` [PATCH 2/2] repository: move 'repository_format_worktree_config' to repo scope Victoria Dye via GitGitGadget
@ 2023-05-25  1:29   ` Glen Choo
  2023-05-25 16:09     ` Glen Choo
  2023-05-25 20:13   ` Derrick Stolee
  1 sibling, 1 reply; 29+ messages in thread
From: Glen Choo @ 2023-05-25  1:29 UTC (permalink / raw)
  To: Victoria Dye via GitGitGadget, git; +Cc: derrickstolee, Victoria Dye

Here's a quick response on the config.c bits, I haven't looked through
the global-removing parts closely yet. Rearranging the hunks for
clarity...

"Victoria Dye via GitGitGadget" <gitgitgadget@gmail.com> writes:


> @@ -2667,11 +2670,14 @@ static void repo_read_config(struct repository *repo)
>  {
>  	struct config_options opts = { 0 };
>  	struct configset_add_data data = CONFIGSET_ADD_INIT;
> +	struct git_config_source config_source = { 0 };
>  
>  	opts.respect_includes = 1;
>  	opts.commondir = repo->commondir;
>  	opts.git_dir = repo->gitdir;
>  
> +	config_source.repo = repo;
> +
>  	if (!repo->config)
>  		CALLOC_ARRAY(repo->config, 1);
>  	else
> @@ -2681,7 +2687,7 @@ static void repo_read_config(struct repository *repo)
>  	data.config_set = repo->config;
>  	data.config_reader = &the_reader;
>  
> -	if (config_with_options(config_set_callback, &data, NULL, &opts) < 0)
> +	if (config_with_options(config_set_callback, &data, &config_source, &opts) < 0)
>  		/*
>  		 * config_with_options() normally returns only
>  		 * zero, as most errors are fatal, and

I think it would be better to pass a "struct repository" arg to
config_with_options() instead of mocking a config_source to hold a .repo
member. config_with_options() does double duty - it either discovers and
reads the configs for the repo (system, global, worktree, etc), or it
reads the config from just config_source. From this perspective, it
doesn't make sense that the caller can pass config_source but
config_with_options() will still discover and read all configs, and I
think the only reason why this behavior is supported at all is that
builtin/config.c sometimes "reads all config" and sometimes "reads from
a single file", but sloppily passes a non-NULL "config_source" arg
unconditionally.

> diff --git a/config.c b/config.c
> index a93f7bfa3aa..9ce2ffff5e1 100644
> --- a/config.c
> +++ b/config.c
> @@ -2277,7 +2278,7 @@ int config_with_options(config_fn_t fn, void *data,
>  		data = &inc;
>  	}
>  
> -	if (config_source)
> +	if (config_source && config_source->scope != CONFIG_SCOPE_UNKNOWN)
>  		config_reader_set_scope(&the_reader, config_source->scope);
>  
>  	/*

The aforemented change would also let us get rid of this, which might
not always be correct. I think there might be cases where the scope is
actually unknown, but I'm not sure if we have any of those situations
in-tree.

> diff --git a/environment.c b/environment.c
> index 28d18eaca8e..6bd001efbde 100644
> --- a/environment.c
> +++ b/environment.c
> @@ -42,7 +42,6 @@ int is_bare_repository_cfg = -1; /* unspecified */
>  int warn_ambiguous_refs = 1;
>  int warn_on_object_refname_ambiguity = 1;
>  int repository_format_precious_objects;
> -int repository_format_worktree_config;
>  const char *git_commit_encoding;
>  const char *git_log_output_encoding;
>  char *apply_default_whitespace;

As an aside, I'm really happy to lose another global :)


^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH 2/2] repository: move 'repository_format_worktree_config' to repo scope
  2023-05-25  1:29   ` Glen Choo
@ 2023-05-25 16:09     ` Glen Choo
  2023-05-25 20:02       ` Victoria Dye
  0 siblings, 1 reply; 29+ messages in thread
From: Glen Choo @ 2023-05-25 16:09 UTC (permalink / raw)
  To: Victoria Dye via GitGitGadget, git; +Cc: derrickstolee, Victoria Dye

Glen Choo <chooglen@google.com> writes:

> I think it would be better to pass a "struct repository" arg to
> config_with_options() instead of mocking a config_source to hold a .repo
> member.

The flipside is that this would be redundant with an existing use of
git_config_source.repo, so for consistency, we should probably remove
git_config_source.repo. There's only one user of git_config_source.repo
- reading .gitmodules from a blob. It probably made sense to add .repo
the time, but now that we have a second, different use of "struct
repository", accepting an arg is probably better.

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH 0/2] Fix behavior of worktree config in submodules
  2023-05-23 23:17 [PATCH 0/2] Fix behavior of worktree config in submodules Victoria Dye via GitGitGadget
                   ` (2 preceding siblings ...)
  2023-05-24 10:25 ` [PATCH 0/2] Fix behavior of worktree config in submodules Junio C Hamano
@ 2023-05-25 19:56 ` Glen Choo
  2023-05-26  1:32 ` [PATCH v2 0/3] " Victoria Dye via GitGitGadget
  4 siblings, 0 replies; 29+ messages in thread
From: Glen Choo @ 2023-05-25 19:56 UTC (permalink / raw)
  To: Victoria Dye via GitGitGadget, git; +Cc: derrickstolee, Victoria Dye

"Victoria Dye via GitGitGadget" <gitgitgadget@gmail.com> writes:

> The outcome of this series is that 'extensions.worktreeConfig' and the
> contents of the repository's worktree config are read and applied to (and
> only to) the relevant repo when working in a super project/submodule setup.

Thanks sending these patches! This is an obvious bug, and squashing
another global is always helpful.

> I'm
> also hoping this will help (or at least not hurt) the work to avoid use of
> global state in config parsing [2].

Both pieces of work touch different bits I think - yours plumbs and
reads a "struct repository" before the config iterating begins, mine
plumbs config source information after the config interating - so
they don't help each other, but at least the conflicts are only textual.

Junio: There are quite a few conflicts. AFAICT each only has to be
resolved once (IOW the number of conflicts to be fixed is the same with
"git merge" vs "git rebase"). I have a v2 that I'm about to send, and if
you'd prefer (especially since you're on half-vacation), I can base it
off Victoria's topic, which looks pretty straightforward and will
probably get merged pretty quickly.

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH 2/2] repository: move 'repository_format_worktree_config' to repo scope
  2023-05-25 16:09     ` Glen Choo
@ 2023-05-25 20:02       ` Victoria Dye
  0 siblings, 0 replies; 29+ messages in thread
From: Victoria Dye @ 2023-05-25 20:02 UTC (permalink / raw)
  To: Glen Choo, Victoria Dye via GitGitGadget, git; +Cc: derrickstolee

Glen Choo wrote:
> Glen Choo <chooglen@google.com> writes:
> 
>> I think it would be better to pass a "struct repository" arg to
>> config_with_options() instead of mocking a config_source to hold a .repo
>> member.
> 
> The flipside is that this would be redundant with an existing use of
> git_config_source.repo, so for consistency, we should probably remove
> git_config_source.repo. There's only one user of git_config_source.repo
> - reading .gitmodules from a blob. It probably made sense to add .repo
> the time, but now that we have a second, different use of "struct
> repository", accepting an arg is probably better.

Agreed, I'd much prefer having a single 'struct repository' instance used in
the context of config parsing (if they ever had different values, it
probably wouldn't be immediately obvious for debugging). This and all of
your other recommendations seem reasonable to me, I'll re-roll shortly.

Thanks for the review!

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH 1/2] config: use gitdir to get worktree config
  2023-05-25  1:05   ` Glen Choo
@ 2023-05-25 20:05     ` Derrick Stolee
  0 siblings, 0 replies; 29+ messages in thread
From: Derrick Stolee @ 2023-05-25 20:05 UTC (permalink / raw)
  To: Glen Choo, Victoria Dye via GitGitGadget, git; +Cc: Victoria Dye

On 5/24/2023 9:05 PM, Glen Choo wrote:
> "Victoria Dye via GitGitGadget" <gitgitgadget@gmail.com> writes:
> 
>> From: Victoria Dye <vdye@github.com>
>>
>> Update 'do_git_config_sequence()' to read the worktree config from
>> 'config.worktree' in 'opts->git_dir' rather than the gitdir of
>> 'the_repository'.
> 
> Thanks for the patches! This makes sense. do_git_config_sequence() is
> eventually called by repo_config(), which is supposed to read config
> into a "struct repository", so any reliance on the_repository's settings
> is wrong.

>> +test_expect_success '--recurse-submodules parses submodule repo config' '
>> +	test_when_finished "git -C submodule config --unset feature.experimental" &&
>> +	git -C submodule config feature.experimental "invalid non-boolean value" &&
>> +	test_must_fail git ls-files --recurse-submodules 2>err &&
>> +	grep "bad boolean config value" err
>> +'
> 
> This test has a few bits that are important but non-obvious. It would be
> useful to capture them in either the commit message or a comment.
> 
> Firstly, we can't test this using "git config" because that only uses
> the_repository, and we specifically need to read config in-core into a
> "struct repository" that is a submodule, so we need a command that
> recurses into a submodule without using subprocesses. IIRC the only
> choices are "git grep" and "git ls-files".
> 
> Secondly, when we test that config is read from the submodule the choice
> of "feature.experimental" is quite important. The config is read quite
> indirectly: "git ls-files" reads from the submodule's index, which
> will call prepare_repo_settings() on the submodule, and eventually calls
> repo_config_get_bool() on "feature.experimental". Any of the configs in
> prepare_repo_settings() should do, though. A tiny suggestion would be to
> use "index.sparse" instead of "feature.experimental", since (I presume)
> we'll have to add sparse index + submodule tests for "git ls-files"
> eventually.

Some of the points you bring up are definitely subtle, like the choice
of config variable.

I appreciate that there are two tests here: one to verify the test
checks have a similar effect without using the worktree config, and
then a second test to show the same behavior with worktree config.

If I understand correctly, the first test would pass without this
code change, but it is a helpful one to help add confidence in the
second test.
 
> +test_expect_success '--recurse-submodules parses submodule worktree config' '
>> +	test_when_finished "git -C submodule config --unset extensions.worktreeConfig" &&
> 
> I believe "test_config -C" will achieve the desired effect.

This should work, though it requires acting a bit strangely, at least
if we want to replace the 'git config --worktree' command.

test_config treats the positions of the arguments as special, so we
would need to write it as:

 test_config -C submodule feature.experimental --worktree "non boolean value"

and that's assuming that 'git -C submodule config feature.experimental
--worktree "non boolean value"' is parsed correctly to use the --worktree
argument. (I haven't tried it.) By using this order, that allows the
test_config helper to run the appropriate 'test_when_finished git config
--unset feature.experimental' command.

Thanks,
-Stolee

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH 2/2] repository: move 'repository_format_worktree_config' to repo scope
  2023-05-23 23:17 ` [PATCH 2/2] repository: move 'repository_format_worktree_config' to repo scope Victoria Dye via GitGitGadget
  2023-05-25  1:29   ` Glen Choo
@ 2023-05-25 20:13   ` Derrick Stolee
  1 sibling, 0 replies; 29+ messages in thread
From: Derrick Stolee @ 2023-05-25 20:13 UTC (permalink / raw)
  To: Victoria Dye via GitGitGadget, git; +Cc: chooglen, Victoria Dye

On 5/23/2023 7:17 PM, Victoria Dye via GitGitGadget wrote:
> From: Victoria Dye <vdye@github.com>
> 
> Move 'repository_format_worktree_config' out of the global scope and into
> the 'repository' struct. This change is similar to how
> 'repository_format_partial_clone' was moved in ebaf3bcf1ae (repository: move
> global r_f_p_c to repo struct, 2021-06-17), adding to the 'repository'
> struct and updating 'setup.c' & 'repository.c' functions to assign the value
> appropriately. In addition, update usage of the setting to reference the
> relevant context's repo or, as a fallback, 'the_repository'.
> 
> The primary goal of this change is to be able to load worktree config for a
> submodule depending on whether that submodule - not the super project - has
> 'extensions.worktreeConfig' enabled. To ensure 'do_git_config_sequence()'
> has access to the newly repo-scoped configuration:
> 
> - update 'repo_read_config()' to create a 'config_source' to hold the
>   repo instance
> - add a 'repo' argument to 'do_git_config_sequence()'
> - update 'config_with_options' to call 'do_git_config_sequence()' with
>   'config_source.repo', or 'the_repository' as a fallback
> 
> Finally, add/update tests in 't3007-ls-files-recurse-submodules.sh' to
> verify 'extensions.worktreeConfig' is read an used independently by super
> projects and submodules.
 
> @@ -2277,7 +2278,7 @@ int config_with_options(config_fn_t fn, void *data,
>  		data = &inc;
>  	}
>  
> -	if (config_source)
> +	if (config_source && config_source->scope != CONFIG_SCOPE_UNKNOWN)
>  		config_reader_set_scope(&the_reader, config_source->scope);

This extra condition on config_source->scope surprised me. Could you
elaborate on the reason this is necessary?

> @@ -2667,11 +2670,14 @@ static void repo_read_config(struct repository *repo)
>  {
>  	struct config_options opts = { 0 };
>  	struct configset_add_data data = CONFIGSET_ADD_INIT;
> +	struct git_config_source config_source = { 0 };

This could be...

	struct git_config_source config_source = { .repo = repo };

>  
>  	opts.respect_includes = 1;
>  	opts.commondir = repo->commondir;
>  	opts.git_dir = repo->gitdir;
>  
> +	config_source.repo = repo;
> +

...avoiding these lines.

> diff --git a/t/t3007-ls-files-recurse-submodules.sh b/t/t3007-ls-files-recurse-submodules.sh
> index e35c203241f..6d0bacef4de 100755
> --- a/t/t3007-ls-files-recurse-submodules.sh
> +++ b/t/t3007-ls-files-recurse-submodules.sh
> @@ -309,19 +309,30 @@ test_expect_success '--recurse-submodules parses submodule repo config' '
>  test_expect_success '--recurse-submodules parses submodule worktree config' '
>  	test_when_finished "git -C submodule config --unset extensions.worktreeConfig" &&
>  	test_when_finished "git -C submodule config --worktree --unset feature.experimental" &&
> -	test_when_finished "git config --unset extensions.worktreeConfig" &&
>  
>  	git -C submodule config extensions.worktreeConfig true &&
>  	git -C submodule config --worktree feature.experimental "invalid non-boolean value" &&
>  
> -	# NEEDSWORK: the extensions.worktreeConfig is set globally based on super
> -	# project, so we need to enable it in the super project.
> -	git config extensions.worktreeConfig true &&
> -
>  	test_must_fail git ls-files --recurse-submodules 2>err &&
>  	grep "bad boolean config value" err
>  '

These are my favorite kind of test updates: deleting extra setup that's no
longer needed.

> +test_expect_success '--recurse-submodules submodules ignore super project worktreeConfig extension' '
> +	test_when_finished "git config --unset extensions.worktreeConfig" &&
> +
> +	# Enable worktree config in both super project & submodule, set an
> +	# invalid config in the submodule worktree config, then disable worktree
> +	# config in the submodule. The invalid worktree config should not be
> +	# picked up.
> +	git config extensions.worktreeConfig true &&
> +	git -C submodule config extensions.worktreeConfig true &&
> +	git -C submodule config --worktree feature.experimental "invalid non-boolean value" &&
> +	git -C submodule config --unset extensions.worktreeConfig &&
> +
> +	git ls-files --recurse-submodules 2>err &&
> +	! grep "bad boolean config value" err
> +'

We have the same ways to improve here using 'test_config' as recommended
in patch 1.

Thanks,
-Stolee

^ permalink raw reply	[flat|nested] 29+ messages in thread

* [PATCH v2 0/3] Fix behavior of worktree config in submodules
  2023-05-23 23:17 [PATCH 0/2] Fix behavior of worktree config in submodules Victoria Dye via GitGitGadget
                   ` (3 preceding siblings ...)
  2023-05-25 19:56 ` Glen Choo
@ 2023-05-26  1:32 ` Victoria Dye via GitGitGadget
  2023-05-26  1:32   ` [PATCH v2 1/3] config: use gitdir to get worktree config Victoria Dye via GitGitGadget
                     ` (4 more replies)
  4 siblings, 5 replies; 29+ messages in thread
From: Victoria Dye via GitGitGadget @ 2023-05-26  1:32 UTC (permalink / raw)
  To: git; +Cc: derrickstolee, chooglen, gitster, Victoria Dye

About a year ago, discussion on the sparse index integration of 'git grep'
surfaced larger incompatibilities between sparse-checkout and submodules
[1]. This series fixes one of the underlying issues to that incompatibility,
which is that the worktree config of the submodule (where
'core.sparseCheckout', 'core.sparseCheckoutCone', and 'index.sparse' are
set) is not used when operating on the submodule from its super project
(e.g., in a command with '--recurse-submodules').

The outcome of this series is that 'extensions.worktreeConfig' and the
contents of the repository's worktree config are read and applied to (and
only to) the relevant repo when working in a super project/submodule setup.
This alone doesn't fix sparse-checkout/submodule interoperability; the
additional changes needed for that will be submitted in a later series. I'm
also hoping this will help (or at least not hurt) the work to avoid use of
global state in config parsing [2].


Changes since V1
================

 * In 't3007', replaced manual 'git config'/'test_when_finished "git config
   --unset"' pairs with 'test_config' helper. Updated 'test_config' to
   handle the '--worktree' option.
 * Updated commit messages & test comments to better explain the purpose and
   more subtle functionality details to the new tests
 * Added a commit to move 'struct repository' out of 'git_config_source',
   rather than creating a dummy 'config_source' just to hold a repository
   instance.
 * Changed the config setting in the new tests from 'feature.experimental'
   to 'index.sparse' to tie these changes to their intended use case.
 * "super project" -> "superproject"

Thanks!

 * Victoria

[1]
https://lore.kernel.org/git/093827ae-41ef-5f7c-7829-647536ce1305@github.com/
[2]
https://lore.kernel.org/git/pull.1497.git.git.1682104398.gitgitgadget@gmail.com/

Victoria Dye (3):
  config: use gitdir to get worktree config
  config: pass 'repo' directly to 'config_with_options()'
  repository: move 'repository_format_worktree_config' to repo scope

 builtin/config.c                       | 17 +++++----
 builtin/worktree.c                     |  2 +-
 config.c                               | 49 ++++++++++++++++----------
 config.h                               |  4 +--
 environment.c                          |  1 -
 environment.h                          |  1 -
 repository.c                           |  1 +
 repository.h                           |  1 +
 setup.c                                | 10 ++++--
 submodule-config.c                     |  3 +-
 t/t3007-ls-files-recurse-submodules.sh | 33 +++++++++++++++++
 t/test-lib-functions.sh                | 13 +++++--
 worktree.c                             |  4 +--
 13 files changed, 102 insertions(+), 37 deletions(-)


base-commit: 4a714b37029a4b63dbd22f7d7ed81f7a0d693680
Published-As: https://github.com/gitgitgadget/git/releases/tag/pr-1536%2Fvdye%2Fvdye%2Fsubmodule-worktree-config-v2
Fetch-It-Via: git fetch https://github.com/gitgitgadget/git pr-1536/vdye/vdye/submodule-worktree-config-v2
Pull-Request: https://github.com/gitgitgadget/git/pull/1536

Range-diff vs v1:

 1:  aead2fe1ce1 ! 1:  fb597cdfeb0 config: use gitdir to get worktree config
     @@ Commit message
          The worktree config is loaded from the path returned by
          'git_pathdup("config.worktree")', the 'config.worktree' relative to the
          gitdir of 'the_repository'. If loading the config for a submodule, this path
     -    is incorrect, since 'the_repository' is the super project. Conversely,
     -    'opts->git_dir' is the gitdir of the submodule being configured, so the
     -    config file in that location should be read instead.
     +    is incorrect, since 'the_repository' is the superproject. 'opts->git_dir' is
     +    the gitdir of the submodule being configured, so the config file in that
     +    location should be read instead.
      
          To ensure the use of 'opts->git_dir' is safe, require that 'opts->git_dir'
          is set if-and-only-if 'opts->commondir' is set (rather than "only-if" as it
          is now). In all current usage of 'config_options', these values are set
          together, so the stricter check does not change any behavior.
      
     -    Finally, add tests to 't3007-ls-files-recurse-submodules.sh' to demonstrate
     -    the corrected config loading behavior. Note that behavior still isn't ideal
     -    because 'extensions.worktreeConfig' in the super project controls whether or
     -    not the worktree config is used in the submodule. This will be fixed in a
     -    later patch.
     +    Finally, add tests to 't3007-ls-files-recurse-submodules.sh' to verify the
     +    corrected config is loaded. Use 'ls-files' to test this because, unlike some
     +    other '--recurse-submodules' commands, 'ls-files' parses the config of the
     +    submodule in the same process as the superproject (via 'show_submodule()' ->
     +    'repo_read_index()' -> 'prepare_repo_settings()'). As a result,
     +    'the_repository' points to the config of the superproject but the
     +    commondir/gitdir in the config sequence will be that of the submodule,
     +    providing the exact scenario needed to verify this patch.
     +
     +    The first test ('--recurse-submodules parses submodule repo config') checks
     +    that the submodule's *repo* config is read when running 'ls-files' on the
     +    superproject; this confirms already-working behavior, serving as a reference
     +    for how worktree config parsing should behave. The second test
     +    ('--recurse-submodules parses submodule worktree config') tests the same
     +    scenario as the previous but instead using the *worktree* config,
     +    demonstrating the corrected behavior. The 'test_config' helper is extended
     +    for this case so that it properly applies the '--worktree' option to the
     +    configure/unconfigure operations it performs.
     +
     +    Note that, although the submodule worktree config is now parsed instead of
     +    the superproject's, 'extensions.worktreeConfig' in the superproject still
     +    controls whether or not the worktree config is enabled at all in the
     +    submodule. This will be fixed in a later patch.
      
          Signed-off-by: Victoria Dye <vdye@github.com>
      
     @@ t/t3007-ls-files-recurse-submodules.sh: test_expect_success '--recurse-submodule
       '
       
      +test_expect_success '--recurse-submodules parses submodule repo config' '
     -+	test_when_finished "git -C submodule config --unset feature.experimental" &&
     -+	git -C submodule config feature.experimental "invalid non-boolean value" &&
     ++	test_config -C submodule index.sparse "invalid non-boolean value" &&
      +	test_must_fail git ls-files --recurse-submodules 2>err &&
      +	grep "bad boolean config value" err
      +'
      +
      +test_expect_success '--recurse-submodules parses submodule worktree config' '
     -+	test_when_finished "git -C submodule config --unset extensions.worktreeConfig" &&
     -+	test_when_finished "git -C submodule config --worktree --unset feature.experimental" &&
     -+	test_when_finished "git config --unset extensions.worktreeConfig" &&
     ++	test_config -C submodule extensions.worktreeConfig true &&
     ++	test_config -C submodule --worktree index.sparse "invalid non-boolean value" &&
      +
     -+	git -C submodule config extensions.worktreeConfig true &&
     -+	git -C submodule config --worktree feature.experimental "invalid non-boolean value" &&
     -+
     -+	# NEEDSWORK: the extensions.worktreeConfig is set globally based on super
     -+	# project, so we need to enable it in the super project.
     -+	git config extensions.worktreeConfig true &&
     ++	# NEEDSWORK: the extensions.worktreeConfig is set globally based on
     ++	# superproject, so we need to enable it in the superproject.
     ++	test_config extensions.worktreeConfig true &&
      +
      +	test_must_fail git ls-files --recurse-submodules 2>err &&
      +	grep "bad boolean config value" err
     @@ t/t3007-ls-files-recurse-submodules.sh: test_expect_success '--recurse-submodule
       test_incompatible_with_recurse_submodules () {
       	test_expect_success "--recurse-submodules and $1 are incompatible" "
       		test_must_fail git ls-files --recurse-submodules $1 2>actual &&
     +
     + ## t/test-lib-functions.sh ##
     +@@ t/test-lib-functions.sh: test_config () {
     + 		config_dir=$1
     + 		shift
     + 	fi
     +-	test_when_finished "test_unconfig ${config_dir:+-C '$config_dir'} '$1'" &&
     +-	git ${config_dir:+-C "$config_dir"} config "$@"
     ++
     ++	# If --worktree is provided, use it to configure/unconfigure
     ++	is_worktree=
     ++	if test "$1" = --worktree
     ++	then
     ++		is_worktree=1
     ++		shift
     ++	fi
     ++
     ++	test_when_finished "test_unconfig ${config_dir:+-C '$config_dir'} ${is_worktree:+--worktree} '$1'" &&
     ++	git ${config_dir:+-C "$config_dir"} config ${is_worktree:+--worktree} "$@"
     + }
     + 
     + test_config_global () {
 -:  ----------- > 2:  26a36423a8a config: pass 'repo' directly to 'config_with_options()'
 2:  5ed9100a770 ! 3:  506a2cf8c73 repository: move 'repository_format_worktree_config' to repo scope
     @@ Commit message
          Move 'repository_format_worktree_config' out of the global scope and into
          the 'repository' struct. This change is similar to how
          'repository_format_partial_clone' was moved in ebaf3bcf1ae (repository: move
     -    global r_f_p_c to repo struct, 2021-06-17), adding to the 'repository'
     +    global r_f_p_c to repo struct, 2021-06-17), adding it to the 'repository'
          struct and updating 'setup.c' & 'repository.c' functions to assign the value
     -    appropriately. In addition, update usage of the setting to reference the
     -    relevant context's repo or, as a fallback, 'the_repository'.
     +    appropriately.
      
     -    The primary goal of this change is to be able to load worktree config for a
     -    submodule depending on whether that submodule - not the super project - has
     +    The primary goal of this change is to be able to load the worktree config of
     +    a submodule depending on whether that submodule - not its superproject - has
          'extensions.worktreeConfig' enabled. To ensure 'do_git_config_sequence()'
     -    has access to the newly repo-scoped configuration:
     -
     -    - update 'repo_read_config()' to create a 'config_source' to hold the
     -      repo instance
     -    - add a 'repo' argument to 'do_git_config_sequence()'
     -    - update 'config_with_options' to call 'do_git_config_sequence()' with
     -      'config_source.repo', or 'the_repository' as a fallback
     +    has access to the newly repo-scoped configuration, add a 'struct repository'
     +    argument to 'do_git_config_sequence()' and pass it the 'repo' value from
     +    'config_with_options()'.
      
          Finally, add/update tests in 't3007-ls-files-recurse-submodules.sh' to
     -    verify 'extensions.worktreeConfig' is read an used independently by super
     -    projects and submodules.
     +    verify 'extensions.worktreeConfig' is read an used independently by
     +    superprojects and submodules.
      
          Signed-off-by: Victoria Dye <vdye@github.com>
      
     @@ config.c: static int do_git_config_sequence(struct config_reader *reader,
       	    !access_or_die(worktree_config, R_OK, 0)) {
       		ret += git_config_from_file(fn, worktree_config, data);
       	}
     -@@ config.c: int config_with_options(config_fn_t fn, void *data,
     - 		data = &inc;
     - 	}
     - 
     --	if (config_source)
     -+	if (config_source && config_source->scope != CONFIG_SCOPE_UNKNOWN)
     - 		config_reader_set_scope(&the_reader, config_source->scope);
     - 
     - 	/*
      @@ config.c: int config_with_options(config_fn_t fn, void *data,
       		ret = git_config_from_blob_ref(fn, repo, config_source->blob,
       						data);
       	} else {
      -		ret = do_git_config_sequence(&the_reader, opts, fn, data);
     -+		struct repository *repo = config_source && config_source->repo ?
     -+			config_source->repo : the_repository;
      +		ret = do_git_config_sequence(&the_reader, opts, repo, fn, data);
       	}
       
       	if (inc.remote_urls) {
     -@@ config.c: static void repo_read_config(struct repository *repo)
     - {
     - 	struct config_options opts = { 0 };
     - 	struct configset_add_data data = CONFIGSET_ADD_INIT;
     -+	struct git_config_source config_source = { 0 };
     - 
     - 	opts.respect_includes = 1;
     - 	opts.commondir = repo->commondir;
     - 	opts.git_dir = repo->gitdir;
     - 
     -+	config_source.repo = repo;
     -+
     - 	if (!repo->config)
     - 		CALLOC_ARRAY(repo->config, 1);
     - 	else
     -@@ config.c: static void repo_read_config(struct repository *repo)
     - 	data.config_set = repo->config;
     - 	data.config_reader = &the_reader;
     - 
     --	if (config_with_options(config_set_callback, &data, NULL, &opts) < 0)
     -+	if (config_with_options(config_set_callback, &data, &config_source, &opts) < 0)
     - 		/*
     - 		 * config_with_options() normally returns only
     - 		 * zero, as most errors are fatal, and
      @@ config.c: int repo_config_set_worktree_gently(struct repository *r,
       				    const char *key, const char *value)
       {
     @@ setup.c: void check_repository_format(struct repository_format *fmt)
       	clear_repository_format(&repo_fmt);
      
       ## t/t3007-ls-files-recurse-submodules.sh ##
     -@@ t/t3007-ls-files-recurse-submodules.sh: test_expect_success '--recurse-submodules parses submodule repo config' '
     - test_expect_success '--recurse-submodules parses submodule worktree config' '
     - 	test_when_finished "git -C submodule config --unset extensions.worktreeConfig" &&
     - 	test_when_finished "git -C submodule config --worktree --unset feature.experimental" &&
     --	test_when_finished "git config --unset extensions.worktreeConfig" &&
     +@@ t/t3007-ls-files-recurse-submodules.sh: test_expect_success '--recurse-submodules parses submodule worktree config' '
     + 	test_config -C submodule extensions.worktreeConfig true &&
     + 	test_config -C submodule --worktree index.sparse "invalid non-boolean value" &&
       
     - 	git -C submodule config extensions.worktreeConfig true &&
     - 	git -C submodule config --worktree feature.experimental "invalid non-boolean value" &&
     - 
     --	# NEEDSWORK: the extensions.worktreeConfig is set globally based on super
     --	# project, so we need to enable it in the super project.
     --	git config extensions.worktreeConfig true &&
     +-	# NEEDSWORK: the extensions.worktreeConfig is set globally based on
     +-	# superproject, so we need to enable it in the superproject.
     +-	test_config extensions.worktreeConfig true &&
      -
       	test_must_fail git ls-files --recurse-submodules 2>err &&
       	grep "bad boolean config value" err
       '
       
      +test_expect_success '--recurse-submodules submodules ignore super project worktreeConfig extension' '
     -+	test_when_finished "git config --unset extensions.worktreeConfig" &&
     -+
      +	# Enable worktree config in both super project & submodule, set an
     -+	# invalid config in the submodule worktree config, then disable worktree
     -+	# config in the submodule. The invalid worktree config should not be
     -+	# picked up.
     -+	git config extensions.worktreeConfig true &&
     -+	git -C submodule config extensions.worktreeConfig true &&
     -+	git -C submodule config --worktree feature.experimental "invalid non-boolean value" &&
     -+	git -C submodule config --unset extensions.worktreeConfig &&
     ++	# invalid config in the submodule worktree config
     ++	test_config extensions.worktreeConfig true &&
     ++	test_config -C submodule extensions.worktreeConfig true &&
     ++	test_config -C submodule --worktree index.sparse "invalid non-boolean value" &&
     ++
     ++	# Now, disable the worktree config in the submodule. Note that we need
     ++	# to manually re-enable extensions.worktreeConfig when the test is
     ++	# finished, otherwise the test_unconfig of index.sparse will not work.
     ++	test_unconfig -C submodule extensions.worktreeConfig &&
     ++	test_when_finished "git -C submodule config extensions.worktreeConfig true" &&
      +
     ++	# With extensions.worktreeConfig disabled in the submodule, the invalid
     ++	# worktree config is not picked up.
      +	git ls-files --recurse-submodules 2>err &&
      +	! grep "bad boolean config value" err
      +'

-- 
gitgitgadget

^ permalink raw reply	[flat|nested] 29+ messages in thread

* [PATCH v2 1/3] config: use gitdir to get worktree config
  2023-05-26  1:32 ` [PATCH v2 0/3] " Victoria Dye via GitGitGadget
@ 2023-05-26  1:32   ` Victoria Dye via GitGitGadget
  2023-05-26  1:32   ` [PATCH v2 2/3] config: pass 'repo' directly to 'config_with_options()' Victoria Dye via GitGitGadget
                     ` (3 subsequent siblings)
  4 siblings, 0 replies; 29+ messages in thread
From: Victoria Dye via GitGitGadget @ 2023-05-26  1:32 UTC (permalink / raw)
  To: git; +Cc: derrickstolee, chooglen, gitster, Victoria Dye, Victoria Dye

From: Victoria Dye <vdye@github.com>

Update 'do_git_config_sequence()' to read the worktree config from
'config.worktree' in 'opts->git_dir' rather than the gitdir of
'the_repository'.

The worktree config is loaded from the path returned by
'git_pathdup("config.worktree")', the 'config.worktree' relative to the
gitdir of 'the_repository'. If loading the config for a submodule, this path
is incorrect, since 'the_repository' is the superproject. 'opts->git_dir' is
the gitdir of the submodule being configured, so the config file in that
location should be read instead.

To ensure the use of 'opts->git_dir' is safe, require that 'opts->git_dir'
is set if-and-only-if 'opts->commondir' is set (rather than "only-if" as it
is now). In all current usage of 'config_options', these values are set
together, so the stricter check does not change any behavior.

Finally, add tests to 't3007-ls-files-recurse-submodules.sh' to verify the
corrected config is loaded. Use 'ls-files' to test this because, unlike some
other '--recurse-submodules' commands, 'ls-files' parses the config of the
submodule in the same process as the superproject (via 'show_submodule()' ->
'repo_read_index()' -> 'prepare_repo_settings()'). As a result,
'the_repository' points to the config of the superproject but the
commondir/gitdir in the config sequence will be that of the submodule,
providing the exact scenario needed to verify this patch.

The first test ('--recurse-submodules parses submodule repo config') checks
that the submodule's *repo* config is read when running 'ls-files' on the
superproject; this confirms already-working behavior, serving as a reference
for how worktree config parsing should behave. The second test
('--recurse-submodules parses submodule worktree config') tests the same
scenario as the previous but instead using the *worktree* config,
demonstrating the corrected behavior. The 'test_config' helper is extended
for this case so that it properly applies the '--worktree' option to the
configure/unconfigure operations it performs.

Note that, although the submodule worktree config is now parsed instead of
the superproject's, 'extensions.worktreeConfig' in the superproject still
controls whether or not the worktree config is enabled at all in the
submodule. This will be fixed in a later patch.

Signed-off-by: Victoria Dye <vdye@github.com>
---
 config.c                               | 28 +++++++++++++++++---------
 t/t3007-ls-files-recurse-submodules.sh | 18 +++++++++++++++++
 t/test-lib-functions.sh                | 13 ++++++++++--
 3 files changed, 48 insertions(+), 11 deletions(-)

diff --git a/config.c b/config.c
index b79baf83e35..a93f7bfa3aa 100644
--- a/config.c
+++ b/config.c
@@ -2200,14 +2200,24 @@ static int do_git_config_sequence(struct config_reader *reader,
 	char *xdg_config = NULL;
 	char *user_config = NULL;
 	char *repo_config;
+	char *worktree_config;
 	enum config_scope prev_parsing_scope = reader->parsing_scope;
 
-	if (opts->commondir)
+	/*
+	 * Ensure that either:
+	 * - the git_dir and commondir are both set, or
+	 * - the git_dir and commondir are both NULL
+	 */
+	if (!opts->git_dir != !opts->commondir)
+		BUG("only one of commondir and git_dir is non-NULL");
+
+	if (opts->commondir) {
 		repo_config = mkpathdup("%s/config", opts->commondir);
-	else if (opts->git_dir)
-		BUG("git_dir without commondir");
-	else
+		worktree_config = mkpathdup("%s/config.worktree", opts->git_dir);
+	} else {
 		repo_config = NULL;
+		worktree_config = NULL;
+	}
 
 	config_reader_set_scope(reader, CONFIG_SCOPE_SYSTEM);
 	if (git_config_system() && system_config &&
@@ -2230,11 +2240,10 @@ static int do_git_config_sequence(struct config_reader *reader,
 		ret += git_config_from_file(fn, repo_config, data);
 
 	config_reader_set_scope(reader, CONFIG_SCOPE_WORKTREE);
-	if (!opts->ignore_worktree && repository_format_worktree_config) {
-		char *path = git_pathdup("config.worktree");
-		if (!access_or_die(path, R_OK, 0))
-			ret += git_config_from_file(fn, path, data);
-		free(path);
+	if (!opts->ignore_worktree && worktree_config &&
+	    repository_format_worktree_config &&
+	    !access_or_die(worktree_config, R_OK, 0)) {
+		ret += git_config_from_file(fn, worktree_config, data);
 	}
 
 	config_reader_set_scope(reader, CONFIG_SCOPE_COMMAND);
@@ -2246,6 +2255,7 @@ static int do_git_config_sequence(struct config_reader *reader,
 	free(xdg_config);
 	free(user_config);
 	free(repo_config);
+	free(worktree_config);
 	return ret;
 }
 
diff --git a/t/t3007-ls-files-recurse-submodules.sh b/t/t3007-ls-files-recurse-submodules.sh
index dd7770e85de..a3e26751427 100755
--- a/t/t3007-ls-files-recurse-submodules.sh
+++ b/t/t3007-ls-files-recurse-submodules.sh
@@ -299,6 +299,24 @@ test_expect_success '--recurse-submodules does not support --error-unmatch' '
 	test_i18ngrep "does not support --error-unmatch" actual
 '
 
+test_expect_success '--recurse-submodules parses submodule repo config' '
+	test_config -C submodule index.sparse "invalid non-boolean value" &&
+	test_must_fail git ls-files --recurse-submodules 2>err &&
+	grep "bad boolean config value" err
+'
+
+test_expect_success '--recurse-submodules parses submodule worktree config' '
+	test_config -C submodule extensions.worktreeConfig true &&
+	test_config -C submodule --worktree index.sparse "invalid non-boolean value" &&
+
+	# NEEDSWORK: the extensions.worktreeConfig is set globally based on
+	# superproject, so we need to enable it in the superproject.
+	test_config extensions.worktreeConfig true &&
+
+	test_must_fail git ls-files --recurse-submodules 2>err &&
+	grep "bad boolean config value" err
+'
+
 test_incompatible_with_recurse_submodules () {
 	test_expect_success "--recurse-submodules and $1 are incompatible" "
 		test_must_fail git ls-files --recurse-submodules $1 2>actual &&
diff --git a/t/test-lib-functions.sh b/t/test-lib-functions.sh
index 6e19ebc922a..b3864e22e9a 100644
--- a/t/test-lib-functions.sh
+++ b/t/test-lib-functions.sh
@@ -542,8 +542,17 @@ test_config () {
 		config_dir=$1
 		shift
 	fi
-	test_when_finished "test_unconfig ${config_dir:+-C '$config_dir'} '$1'" &&
-	git ${config_dir:+-C "$config_dir"} config "$@"
+
+	# If --worktree is provided, use it to configure/unconfigure
+	is_worktree=
+	if test "$1" = --worktree
+	then
+		is_worktree=1
+		shift
+	fi
+
+	test_when_finished "test_unconfig ${config_dir:+-C '$config_dir'} ${is_worktree:+--worktree} '$1'" &&
+	git ${config_dir:+-C "$config_dir"} config ${is_worktree:+--worktree} "$@"
 }
 
 test_config_global () {
-- 
gitgitgadget


^ permalink raw reply related	[flat|nested] 29+ messages in thread

* [PATCH v2 2/3] config: pass 'repo' directly to 'config_with_options()'
  2023-05-26  1:32 ` [PATCH v2 0/3] " Victoria Dye via GitGitGadget
  2023-05-26  1:32   ` [PATCH v2 1/3] config: use gitdir to get worktree config Victoria Dye via GitGitGadget
@ 2023-05-26  1:32   ` Victoria Dye via GitGitGadget
  2023-05-26  1:33   ` [PATCH v2 3/3] repository: move 'repository_format_worktree_config' to repo scope Victoria Dye via GitGitGadget
                     ` (2 subsequent siblings)
  4 siblings, 0 replies; 29+ messages in thread
From: Victoria Dye via GitGitGadget @ 2023-05-26  1:32 UTC (permalink / raw)
  To: git; +Cc: derrickstolee, chooglen, gitster, Victoria Dye, Victoria Dye

From: Victoria Dye <vdye@github.com>

Add a 'struct repository' argument to 'config_with_options()' and remove the
'repo' field from 'struct git_config_source'.

A 'struct repository' instance was originally added to the config source in
e3e8bf046e9 (submodule-config: pass repo upon blob config read, 2021-08-16)
to improve how submodule blob config content was accessed. At the time, this
was the only use for a 'repository' instance, so it was naturally added only
where it was needed: to 'struct git_config_source'. However, in upcoming
patches, 'config_with_options()' will need the repository instance to access
extension information (regardless of whether a 'config_source' exists). To
make the 'struct repository' instance more easily accessible, move it into
the function's arguments.

Update all callers of 'config_with_options()' to pass the appropriate 'repo'
value:

* in 'builtin/config.c', use 'the_repository'
* in 'submodule--config.c', use the 'repo' arg in 'config_from_gitmodules()'
* in 'read_[very_]early_config()' & 'read_protected_config()', set 'repo' to
  NULL (repository instances aren't available there)
* in 'populate_remote_urls()', use the repo instance that has been added to
  the 'struct config_include_data'
* in 'repo_read_config()', use the given 'repo' arg

Finally, note that this patch eliminates the fallback to 'the_repository'
that previously existed for the 'config_source' repo instance if it was
NULL. The fallback is no longer necessary, as the 'repo' is set explicitly
in all cases where it is needed.

Signed-off-by: Victoria Dye <vdye@github.com>
---
 builtin/config.c   | 14 +++++++++-----
 config.c           | 16 +++++++++-------
 config.h           |  4 ++--
 submodule-config.c |  3 +--
 4 files changed, 21 insertions(+), 16 deletions(-)

diff --git a/builtin/config.c b/builtin/config.c
index ff2fe8ef125..8fc90288f9e 100644
--- a/builtin/config.c
+++ b/builtin/config.c
@@ -375,7 +375,8 @@ static int get_value(const char *key_, const char *regex_, unsigned flags)
 	}
 
 	config_with_options(collect_config, &values,
-			    &given_config_source, &config_options);
+			    &given_config_source, the_repository,
+			    &config_options);
 
 	if (!values.nr && default_value) {
 		struct strbuf *item;
@@ -486,7 +487,8 @@ static void get_color(const char *var, const char *def_color)
 	get_color_found = 0;
 	parsed_color[0] = '\0';
 	config_with_options(git_get_color_config, NULL,
-			    &given_config_source, &config_options);
+			    &given_config_source, the_repository,
+			    &config_options);
 
 	if (!get_color_found && def_color) {
 		if (color_parse(def_color, parsed_color) < 0)
@@ -518,7 +520,8 @@ static int get_colorbool(const char *var, int print)
 	get_diff_color_found = -1;
 	get_color_ui_found = -1;
 	config_with_options(git_get_colorbool_config, NULL,
-			    &given_config_source, &config_options);
+			    &given_config_source, the_repository,
+			    &config_options);
 
 	if (get_colorbool_found < 0) {
 		if (!strcmp(get_colorbool_slot, "color.diff"))
@@ -607,7 +610,8 @@ static int get_urlmatch(const char *var, const char *url)
 	}
 
 	config_with_options(urlmatch_config_entry, &config,
-			    &given_config_source, &config_options);
+			    &given_config_source, the_repository,
+			    &config_options);
 
 	ret = !values.nr;
 
@@ -827,7 +831,7 @@ int cmd_config(int argc, const char **argv, const char *prefix)
 	if (actions == ACTION_LIST) {
 		check_argc(argc, 0, 0);
 		if (config_with_options(show_all_config, NULL,
-					&given_config_source,
+					&given_config_source, the_repository,
 					&config_options) < 0) {
 			if (given_config_source.file)
 				die_errno(_("unable to read config file '%s'"),
diff --git a/config.c b/config.c
index a93f7bfa3aa..67e60e131c2 100644
--- a/config.c
+++ b/config.c
@@ -199,6 +199,7 @@ struct config_include_data {
 	void *data;
 	const struct config_options *opts;
 	struct git_config_source *config_source;
+	struct repository *repo;
 	struct config_reader *config_reader;
 
 	/*
@@ -415,7 +416,8 @@ static void populate_remote_urls(struct config_include_data *inc)
 
 	inc->remote_urls = xmalloc(sizeof(*inc->remote_urls));
 	string_list_init_dup(inc->remote_urls);
-	config_with_options(add_remote_url, inc->remote_urls, inc->config_source, &opts);
+	config_with_options(add_remote_url, inc->remote_urls,
+			    inc->config_source, inc->repo, &opts);
 
 	config_reader_set_scope(inc->config_reader, store_scope);
 }
@@ -2261,6 +2263,7 @@ static int do_git_config_sequence(struct config_reader *reader,
 
 int config_with_options(config_fn_t fn, void *data,
 			struct git_config_source *config_source,
+			struct repository *repo,
 			const struct config_options *opts)
 {
 	struct config_include_data inc = CONFIG_INCLUDE_INIT;
@@ -2271,6 +2274,7 @@ int config_with_options(config_fn_t fn, void *data,
 		inc.fn = fn;
 		inc.data = data;
 		inc.opts = opts;
+		inc.repo = repo;
 		inc.config_source = config_source;
 		inc.config_reader = &the_reader;
 		fn = git_config_include;
@@ -2289,8 +2293,6 @@ int config_with_options(config_fn_t fn, void *data,
 	} else if (config_source && config_source->file) {
 		ret = git_config_from_file(fn, config_source->file, data);
 	} else if (config_source && config_source->blob) {
-		struct repository *repo = config_source->repo ?
-			config_source->repo : the_repository;
 		ret = git_config_from_blob_ref(fn, repo, config_source->blob,
 						data);
 	} else {
@@ -2353,7 +2355,7 @@ void read_early_config(config_fn_t cb, void *data)
 		opts.git_dir = gitdir.buf;
 	}
 
-	config_with_options(cb, data, NULL, &opts);
+	config_with_options(cb, data, NULL, NULL, &opts);
 
 	strbuf_release(&commondir);
 	strbuf_release(&gitdir);
@@ -2373,7 +2375,7 @@ void read_very_early_config(config_fn_t cb, void *data)
 	opts.ignore_cmdline = 1;
 	opts.system_gently = 1;
 
-	config_with_options(cb, data, NULL, &opts);
+	config_with_options(cb, data, NULL, NULL, &opts);
 }
 
 RESULT_MUST_BE_USED
@@ -2681,7 +2683,7 @@ static void repo_read_config(struct repository *repo)
 	data.config_set = repo->config;
 	data.config_reader = &the_reader;
 
-	if (config_with_options(config_set_callback, &data, NULL, &opts) < 0)
+	if (config_with_options(config_set_callback, &data, NULL, repo, &opts) < 0)
 		/*
 		 * config_with_options() normally returns only
 		 * zero, as most errors are fatal, and
@@ -2825,7 +2827,7 @@ static void read_protected_config(void)
 	git_configset_init(&protected_config);
 	data.config_set = &protected_config;
 	data.config_reader = &the_reader;
-	config_with_options(config_set_callback, &data, NULL, &opts);
+	config_with_options(config_set_callback, &data, NULL, NULL, &opts);
 }
 
 void git_protected_config(config_fn_t fn, void *data)
diff --git a/config.h b/config.h
index 247b572b37b..d1c5577589e 100644
--- a/config.h
+++ b/config.h
@@ -3,6 +3,7 @@
 
 #include "hashmap.h"
 #include "string-list.h"
+#include "repository.h"
 
 
 /**
@@ -49,8 +50,6 @@ const char *config_scope_name(enum config_scope scope);
 struct git_config_source {
 	unsigned int use_stdin:1;
 	const char *file;
-	/* The repository if blob is not NULL; leave blank for the_repository */
-	struct repository *repo;
 	const char *blob;
 	enum config_scope scope;
 };
@@ -196,6 +195,7 @@ void git_config(config_fn_t fn, void *);
  */
 int config_with_options(config_fn_t fn, void *,
 			struct git_config_source *config_source,
+			struct repository *repo,
 			const struct config_options *opts);
 
 /**
diff --git a/submodule-config.c b/submodule-config.c
index 58dfbde9ae5..7eb7a0d88d2 100644
--- a/submodule-config.c
+++ b/submodule-config.c
@@ -659,7 +659,6 @@ static void config_from_gitmodules(config_fn_t fn, struct repository *repo, void
 			config_source.file = file;
 		} else if (repo_get_oid(repo, GITMODULES_INDEX, &oid) >= 0 ||
 			   repo_get_oid(repo, GITMODULES_HEAD, &oid) >= 0) {
-			config_source.repo = repo;
 			config_source.blob = oidstr = xstrdup(oid_to_hex(&oid));
 			if (repo != the_repository)
 				add_submodule_odb_by_path(repo->objects->odb->path);
@@ -667,7 +666,7 @@ static void config_from_gitmodules(config_fn_t fn, struct repository *repo, void
 			goto out;
 		}
 
-		config_with_options(fn, data, &config_source, &opts);
+		config_with_options(fn, data, &config_source, repo, &opts);
 
 out:
 		free(oidstr);
-- 
gitgitgadget


^ permalink raw reply related	[flat|nested] 29+ messages in thread

* [PATCH v2 3/3] repository: move 'repository_format_worktree_config' to repo scope
  2023-05-26  1:32 ` [PATCH v2 0/3] " Victoria Dye via GitGitGadget
  2023-05-26  1:32   ` [PATCH v2 1/3] config: use gitdir to get worktree config Victoria Dye via GitGitGadget
  2023-05-26  1:32   ` [PATCH v2 2/3] config: pass 'repo' directly to 'config_with_options()' Victoria Dye via GitGitGadget
@ 2023-05-26  1:33   ` Victoria Dye via GitGitGadget
  2023-05-31 22:17     ` Glen Choo
  2023-05-26 15:48   ` [PATCH v2 0/3] Fix behavior of worktree config in submodules Derrick Stolee
  2023-06-13 22:09   ` Glen Choo
  4 siblings, 1 reply; 29+ messages in thread
From: Victoria Dye via GitGitGadget @ 2023-05-26  1:33 UTC (permalink / raw)
  To: git; +Cc: derrickstolee, chooglen, gitster, Victoria Dye, Victoria Dye

From: Victoria Dye <vdye@github.com>

Move 'repository_format_worktree_config' out of the global scope and into
the 'repository' struct. This change is similar to how
'repository_format_partial_clone' was moved in ebaf3bcf1ae (repository: move
global r_f_p_c to repo struct, 2021-06-17), adding it to the 'repository'
struct and updating 'setup.c' & 'repository.c' functions to assign the value
appropriately.

The primary goal of this change is to be able to load the worktree config of
a submodule depending on whether that submodule - not its superproject - has
'extensions.worktreeConfig' enabled. To ensure 'do_git_config_sequence()'
has access to the newly repo-scoped configuration, add a 'struct repository'
argument to 'do_git_config_sequence()' and pass it the 'repo' value from
'config_with_options()'.

Finally, add/update tests in 't3007-ls-files-recurse-submodules.sh' to
verify 'extensions.worktreeConfig' is read an used independently by
superprojects and submodules.

Signed-off-by: Victoria Dye <vdye@github.com>
---
 builtin/config.c                       |  3 ++-
 builtin/worktree.c                     |  2 +-
 config.c                               |  7 ++++---
 environment.c                          |  1 -
 environment.h                          |  1 -
 repository.c                           |  1 +
 repository.h                           |  1 +
 setup.c                                | 10 ++++++++--
 t/t3007-ls-files-recurse-submodules.sh | 23 +++++++++++++++++++----
 worktree.c                             |  4 ++--
 10 files changed, 38 insertions(+), 15 deletions(-)

diff --git a/builtin/config.c b/builtin/config.c
index 8fc90288f9e..d40fddb042a 100644
--- a/builtin/config.c
+++ b/builtin/config.c
@@ -5,6 +5,7 @@
 #include "color.h"
 #include "editor.h"
 #include "environment.h"
+#include "repository.h"
 #include "gettext.h"
 #include "ident.h"
 #include "parse-options.h"
@@ -717,7 +718,7 @@ int cmd_config(int argc, const char **argv, const char *prefix)
 		given_config_source.scope = CONFIG_SCOPE_LOCAL;
 	} else if (use_worktree_config) {
 		struct worktree **worktrees = get_worktrees();
-		if (repository_format_worktree_config)
+		if (the_repository->repository_format_worktree_config)
 			given_config_source.file = git_pathdup("config.worktree");
 		else if (worktrees[0] && worktrees[1])
 			die(_("--worktree cannot be used with multiple "
diff --git a/builtin/worktree.c b/builtin/worktree.c
index f3180463be2..60e389aaedb 100644
--- a/builtin/worktree.c
+++ b/builtin/worktree.c
@@ -483,7 +483,7 @@ static int add_worktree(const char *path, const char *refname,
 	 * values from the current worktree into the new one, that way the
 	 * new worktree behaves the same as this one.
 	 */
-	if (repository_format_worktree_config)
+	if (the_repository->repository_format_worktree_config)
 		copy_filtered_worktree_config(sb_repo.buf);
 
 	strvec_pushf(&child_env, "%s=%s", GIT_DIR_ENVIRONMENT, sb_git.buf);
diff --git a/config.c b/config.c
index 67e60e131c2..f5bdac0aeed 100644
--- a/config.c
+++ b/config.c
@@ -2195,6 +2195,7 @@ int git_config_system(void)
 
 static int do_git_config_sequence(struct config_reader *reader,
 				  const struct config_options *opts,
+				  const struct repository *repo,
 				  config_fn_t fn, void *data)
 {
 	int ret = 0;
@@ -2243,7 +2244,7 @@ static int do_git_config_sequence(struct config_reader *reader,
 
 	config_reader_set_scope(reader, CONFIG_SCOPE_WORKTREE);
 	if (!opts->ignore_worktree && worktree_config &&
-	    repository_format_worktree_config &&
+	    repo && repo->repository_format_worktree_config &&
 	    !access_or_die(worktree_config, R_OK, 0)) {
 		ret += git_config_from_file(fn, worktree_config, data);
 	}
@@ -2296,7 +2297,7 @@ int config_with_options(config_fn_t fn, void *data,
 		ret = git_config_from_blob_ref(fn, repo, config_source->blob,
 						data);
 	} else {
-		ret = do_git_config_sequence(&the_reader, opts, fn, data);
+		ret = do_git_config_sequence(&the_reader, opts, repo, fn, data);
 	}
 
 	if (inc.remote_urls) {
@@ -3339,7 +3340,7 @@ int repo_config_set_worktree_gently(struct repository *r,
 				    const char *key, const char *value)
 {
 	/* Only use worktree-specific config if it is already enabled. */
-	if (repository_format_worktree_config) {
+	if (r->repository_format_worktree_config) {
 		char *file = repo_git_path(r, "config.worktree");
 		int ret = git_config_set_multivar_in_file_gently(
 					file, key, value, NULL, 0);
diff --git a/environment.c b/environment.c
index 28d18eaca8e..6bd001efbde 100644
--- a/environment.c
+++ b/environment.c
@@ -42,7 +42,6 @@ int is_bare_repository_cfg = -1; /* unspecified */
 int warn_ambiguous_refs = 1;
 int warn_on_object_refname_ambiguity = 1;
 int repository_format_precious_objects;
-int repository_format_worktree_config;
 const char *git_commit_encoding;
 const char *git_log_output_encoding;
 char *apply_default_whitespace;
diff --git a/environment.h b/environment.h
index 30cb7e0fa34..e6668079269 100644
--- a/environment.h
+++ b/environment.h
@@ -197,7 +197,6 @@ extern char *notes_ref_name;
 extern int grafts_replace_parents;
 
 extern int repository_format_precious_objects;
-extern int repository_format_worktree_config;
 
 /*
  * Create a temporary file rooted in the object database directory, or
diff --git a/repository.c b/repository.c
index c53e480e326..104960f8f59 100644
--- a/repository.c
+++ b/repository.c
@@ -182,6 +182,7 @@ int repo_init(struct repository *repo,
 		goto error;
 
 	repo_set_hash_algo(repo, format.hash_algo);
+	repo->repository_format_worktree_config = format.worktree_config;
 
 	/* take ownership of format.partial_clone */
 	repo->repository_format_partial_clone = format.partial_clone;
diff --git a/repository.h b/repository.h
index 1a13ff28677..74ae26635a4 100644
--- a/repository.h
+++ b/repository.h
@@ -163,6 +163,7 @@ struct repository {
 	struct promisor_remote_config *promisor_remote_config;
 
 	/* Configurations */
+	int repository_format_worktree_config;
 
 	/* Indicate if a repository has a different 'commondir' from 'gitdir' */
 	unsigned different_commondir:1;
diff --git a/setup.c b/setup.c
index 458582207ea..d8663954350 100644
--- a/setup.c
+++ b/setup.c
@@ -650,11 +650,10 @@ static int check_repository_format_gently(const char *gitdir, struct repository_
 	}
 
 	repository_format_precious_objects = candidate->precious_objects;
-	repository_format_worktree_config = candidate->worktree_config;
 	string_list_clear(&candidate->unknown_extensions, 0);
 	string_list_clear(&candidate->v1_only_extensions, 0);
 
-	if (repository_format_worktree_config) {
+	if (candidate->worktree_config) {
 		/*
 		 * pick up core.bare and core.worktree from per-worktree
 		 * config if present
@@ -1423,6 +1422,9 @@ int discover_git_directory(struct strbuf *commondir,
 		return -1;
 	}
 
+	the_repository->repository_format_worktree_config =
+		candidate.worktree_config;
+
 	/* take ownership of candidate.partial_clone */
 	the_repository->repository_format_partial_clone =
 		candidate.partial_clone;
@@ -1560,6 +1562,8 @@ const char *setup_git_directory_gently(int *nongit_ok)
 		}
 		if (startup_info->have_repository) {
 			repo_set_hash_algo(the_repository, repo_fmt.hash_algo);
+			the_repository->repository_format_worktree_config =
+				repo_fmt.worktree_config;
 			/* take ownership of repo_fmt.partial_clone */
 			the_repository->repository_format_partial_clone =
 				repo_fmt.partial_clone;
@@ -1651,6 +1655,8 @@ void check_repository_format(struct repository_format *fmt)
 	check_repository_format_gently(get_git_dir(), fmt, NULL);
 	startup_info->have_repository = 1;
 	repo_set_hash_algo(the_repository, fmt->hash_algo);
+	the_repository->repository_format_worktree_config =
+		fmt->worktree_config;
 	the_repository->repository_format_partial_clone =
 		xstrdup_or_null(fmt->partial_clone);
 	clear_repository_format(&repo_fmt);
diff --git a/t/t3007-ls-files-recurse-submodules.sh b/t/t3007-ls-files-recurse-submodules.sh
index a3e26751427..7308a3d4e25 100755
--- a/t/t3007-ls-files-recurse-submodules.sh
+++ b/t/t3007-ls-files-recurse-submodules.sh
@@ -309,14 +309,29 @@ test_expect_success '--recurse-submodules parses submodule worktree config' '
 	test_config -C submodule extensions.worktreeConfig true &&
 	test_config -C submodule --worktree index.sparse "invalid non-boolean value" &&
 
-	# NEEDSWORK: the extensions.worktreeConfig is set globally based on
-	# superproject, so we need to enable it in the superproject.
-	test_config extensions.worktreeConfig true &&
-
 	test_must_fail git ls-files --recurse-submodules 2>err &&
 	grep "bad boolean config value" err
 '
 
+test_expect_success '--recurse-submodules submodules ignore super project worktreeConfig extension' '
+	# Enable worktree config in both super project & submodule, set an
+	# invalid config in the submodule worktree config
+	test_config extensions.worktreeConfig true &&
+	test_config -C submodule extensions.worktreeConfig true &&
+	test_config -C submodule --worktree index.sparse "invalid non-boolean value" &&
+
+	# Now, disable the worktree config in the submodule. Note that we need
+	# to manually re-enable extensions.worktreeConfig when the test is
+	# finished, otherwise the test_unconfig of index.sparse will not work.
+	test_unconfig -C submodule extensions.worktreeConfig &&
+	test_when_finished "git -C submodule config extensions.worktreeConfig true" &&
+
+	# With extensions.worktreeConfig disabled in the submodule, the invalid
+	# worktree config is not picked up.
+	git ls-files --recurse-submodules 2>err &&
+	! grep "bad boolean config value" err
+'
+
 test_incompatible_with_recurse_submodules () {
 	test_expect_success "--recurse-submodules and $1 are incompatible" "
 		test_must_fail git ls-files --recurse-submodules $1 2>actual &&
diff --git a/worktree.c b/worktree.c
index b5ee71c5ebd..c448fecd4b3 100644
--- a/worktree.c
+++ b/worktree.c
@@ -806,7 +806,7 @@ int init_worktree_config(struct repository *r)
 	 * If the extension is already enabled, then we can skip the
 	 * upgrade process.
 	 */
-	if (repository_format_worktree_config)
+	if (r->repository_format_worktree_config)
 		return 0;
 	if ((res = git_config_set_gently("extensions.worktreeConfig", "true")))
 		return error(_("failed to set extensions.worktreeConfig setting"));
@@ -846,7 +846,7 @@ int init_worktree_config(struct repository *r)
 	 * Ensure that we use worktree config for the remaining lifetime
 	 * of the current process.
 	 */
-	repository_format_worktree_config = 1;
+	r->repository_format_worktree_config = 1;
 
 cleanup:
 	git_configset_clear(&cs);
-- 
gitgitgadget

^ permalink raw reply related	[flat|nested] 29+ messages in thread

* Re: [PATCH v2 0/3] Fix behavior of worktree config in submodules
  2023-05-26  1:32 ` [PATCH v2 0/3] " Victoria Dye via GitGitGadget
                     ` (2 preceding siblings ...)
  2023-05-26  1:33   ` [PATCH v2 3/3] repository: move 'repository_format_worktree_config' to repo scope Victoria Dye via GitGitGadget
@ 2023-05-26 15:48   ` Derrick Stolee
  2023-06-13 22:09   ` Glen Choo
  4 siblings, 0 replies; 29+ messages in thread
From: Derrick Stolee @ 2023-05-26 15:48 UTC (permalink / raw)
  To: Victoria Dye via GitGitGadget, git; +Cc: chooglen, gitster, Victoria Dye

On 5/25/2023 9:32 PM, Victoria Dye via GitGitGadget wrote:
> About a year ago, discussion on the sparse index integration of 'git grep'
> surfaced larger incompatibilities between sparse-checkout and submodules
> [1]. This series fixes one of the underlying issues to that incompatibility,
> which is that the worktree config of the submodule (where
> 'core.sparseCheckout', 'core.sparseCheckoutCone', and 'index.sparse' are
> set) is not used when operating on the submodule from its super project
> (e.g., in a command with '--recurse-submodules').
> 
> The outcome of this series is that 'extensions.worktreeConfig' and the
> contents of the repository's worktree config are read and applied to (and
> only to) the relevant repo when working in a super project/submodule setup.
> This alone doesn't fix sparse-checkout/submodule interoperability; the
> additional changes needed for that will be submitted in a later series. I'm
> also hoping this will help (or at least not hurt) the work to avoid use of
> global state in config parsing [2].
> 
> 
> Changes since V1
> ================
> 
>  * In 't3007', replaced manual 'git config'/'test_when_finished "git config
>    --unset"' pairs with 'test_config' helper. Updated 'test_config' to
>    handle the '--worktree' option.
>  * Updated commit messages & test comments to better explain the purpose and
>    more subtle functionality details to the new tests
>  * Added a commit to move 'struct repository' out of 'git_config_source',
>    rather than creating a dummy 'config_source' just to hold a repository
>    instance.
>  * Changed the config setting in the new tests from 'feature.experimental'
>    to 'index.sparse' to tie these changes to their intended use case.
>  * "super project" -> "superproject"

Thanks for these updates. I'm happy with this version.

Thanks,
-Stolee

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH v2 3/3] repository: move 'repository_format_worktree_config' to repo scope
  2023-05-26  1:33   ` [PATCH v2 3/3] repository: move 'repository_format_worktree_config' to repo scope Victoria Dye via GitGitGadget
@ 2023-05-31 22:17     ` Glen Choo
  2023-06-01  4:43       ` Junio C Hamano
  2023-06-07 22:29       ` Victoria Dye
  0 siblings, 2 replies; 29+ messages in thread
From: Glen Choo @ 2023-05-31 22:17 UTC (permalink / raw)
  To: Victoria Dye via GitGitGadget, git, Jonathan Tan
  Cc: derrickstolee, gitster, Victoria Dye

The changes that replace repository_format_worktree_config with "struct
repository".worktree_config look trivially good. The "struct
repository_format" bits track ebaf3bcf1ae (repository: move global
r_f_p_c to repo struct, 2021-06-17) so it preserves the status quo, but
I have some questions about ebaf3bcf1ae. Cc-ing the author (Jonathan
Tan) for context.

Rearranging the hunks for clarity,

"Victoria Dye via GitGitGadget" <gitgitgadget@gmail.com> writes:

> @@ -1560,6 +1562,8 @@ const char *setup_git_directory_gently(int *nongit_ok)
>  		}
>  		if (startup_info->have_repository) {
>  			repo_set_hash_algo(the_repository, repo_fmt.hash_algo);
> +			the_repository->repository_format_worktree_config =
> +				repo_fmt.worktree_config;
>  			/* take ownership of repo_fmt.partial_clone */
>  			the_repository->repository_format_partial_clone =
>  				repo_fmt.partial_clone;

[snip]

> @@ -1651,6 +1655,8 @@ void check_repository_format(struct repository_format *fmt)
>  	check_repository_format_gently(get_git_dir(), fmt, NULL);
>  	startup_info->have_repository = 1;
>  	repo_set_hash_algo(the_repository, fmt->hash_algo);
> +	the_repository->repository_format_worktree_config =
> +		fmt->worktree_config;
>  	the_repository->repository_format_partial_clone =
>  		xstrdup_or_null(fmt->partial_clone);
>  	clear_repository_format(&repo_fmt);

[snip]

> diff --git a/repository.c b/repository.c
> index c53e480e326..104960f8f59 100644
> --- a/repository.c
> +++ b/repository.c
> @@ -182,6 +182,7 @@ int repo_init(struct repository *repo,
>  		goto error;
>  
>  	repo_set_hash_algo(repo, format.hash_algo);
> +	repo->repository_format_worktree_config = format.worktree_config;
>  
>  	/* take ownership of format.partial_clone */
>  	repo->repository_format_partial_clone = format.partial_clone;

This patch adds another instance of copying fields from "struct
repository_format" to "struct repository", so I think that we should
start doing this with a helper function instead of copy-pasting the
logic.

As for what should be in the helper function, the above hunks suggest
that we should copy .hash_algo, .partial_clone, and .worktree_config.
However...

> @@ -1423,6 +1422,9 @@ int discover_git_directory(struct strbuf *commondir,
>  		return -1;
>  	}
>  
> +	the_repository->repository_format_worktree_config =
> +		candidate.worktree_config;
> +
>  	/* take ownership of candidate.partial_clone */
>  	the_repository->repository_format_partial_clone =
>  		candidate.partial_clone;

This hunk does not copy .hash_algo. I initially wondered if it is safe
to just copy .hash_algo here too, but I now suspect that we shouldn't
have done the_repository setup in discover_git_directory() in the first
place. It isn't used by the setup.c machinery - its one caller in "git"
(it's used by "scalar") is read_early_config(), which is supposed to
work without a fully set up repository, and bears a comment saying that
"no global state is changed" by calling discover_git_directory() (which
stopped being true in ebaf3bcf1ae). It looks like
discover_git_directory() is just a lightweight entrypoint into the
setup.c machinery. 16ac8b8db6 (setup: introduce the
discover_git_directory() function, 2017-03-13)) says "Let's just provide
a convenient wrapper function with an easier signature that *just*
discovers the .git/ directory. We will use it in a subsequent patch to
fix the early config."

If I'm wrong and we _should_ be doing the_repository setup, then I'm
guessing it's safe to copy .hash_algo here too. So either way, I think
we should introduce a helper function to do the copying, especially
because we will probably need to repeat this process yet again for
"repository_format_precious_objects".

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH v2 3/3] repository: move 'repository_format_worktree_config' to repo scope
  2023-05-31 22:17     ` Glen Choo
@ 2023-06-01  4:43       ` Junio C Hamano
  2023-06-12 21:37         ` Glen Choo
  2023-06-07 22:29       ` Victoria Dye
  1 sibling, 1 reply; 29+ messages in thread
From: Junio C Hamano @ 2023-06-01  4:43 UTC (permalink / raw)
  To: Glen Choo
  Cc: Victoria Dye via GitGitGadget, git, Jonathan Tan, derrickstolee,
	Victoria Dye

Glen Choo <chooglen@google.com> writes:

>> @@ -1423,6 +1422,9 @@ int discover_git_directory(struct strbuf *commondir,
>>  		return -1;
>>  	}
>>  
>> +	the_repository->repository_format_worktree_config =
>> +		candidate.worktree_config;
>> +
>>  	/* take ownership of candidate.partial_clone */
>>  	the_repository->repository_format_partial_clone =
>>  		candidate.partial_clone;
>
> This hunk does not copy .hash_algo. I initially wondered if it is safe
> to just copy .hash_algo here too, but I now suspect that we shouldn't
> have done the_repository setup in discover_git_directory() in the first
> place.

That's quite a departure from the established practice, isn't it?
Due to recent and not so recent header shuffling (moving everything
out of cache.h, dropping "extern", etc.), "git blame" is a bit hard
to follow, but ever since 16ac8b8d (setup: introduce the
discover_git_directory() function, 2017-03-13) added the function,
we do execute the "setup" when we know we are in a repository.

It would probably be worth mentioning that the "global state" Dscho
refers to in that commit is primarily about the current directory of
the Git process.  During the discovery, we used to go up one level
at a time and tried to see if the current directory is either the
top of the working tree (i.e.  has ".git/" that is a git repository)
or the top of a GIT_DIR-looking directory.  That was changed in
ce9b8aab (setup_git_directory_1(): avoid changing global state,
2017-03-13) in the same series and discusses what "global state" the
series addresses.

If a relatively recent and oddball caller calls the function when it
does not want any of the setup donw after finding out that we could
use the directory as a repository, a new early "pure discovery" part
should be split out of the function, and both the function itself
and the oddball caller should be taught to call that pure-discovery
helper, I think.

> If I'm wrong and we _should_ be doing the_repository setup, then I'm
> guessing it's safe to copy .hash_algo here too. So either way, I think
> we should introduce a helper function to do the copying, especially
> because we will probably need to repeat this process yet again for
> "repository_format_precious_objects".

I do not know (or care in the context of this thread) about the
"precious objects" bit, but .worktree-config is the third one on top
of .hash_algo and .partial_clone, and it generally is a good time to
refactor when you find yourself adding the third instance of
repetitive code.  So I agree with you that it is time to introduce a
helper function to copy from a "struct repository_format" to the
repository instance.


^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH v2 3/3] repository: move 'repository_format_worktree_config' to repo scope
  2023-05-31 22:17     ` Glen Choo
  2023-06-01  4:43       ` Junio C Hamano
@ 2023-06-07 22:29       ` Victoria Dye
  2023-06-12 18:10         ` Glen Choo
  1 sibling, 1 reply; 29+ messages in thread
From: Victoria Dye @ 2023-06-07 22:29 UTC (permalink / raw)
  To: Glen Choo, Victoria Dye via GitGitGadget, git, Jonathan Tan
  Cc: derrickstolee, gitster

Glen Choo wrote:
> This patch adds another instance of copying fields from "struct
> repository_format" to "struct repository", so I think that we should
> start doing this with a helper function instead of copy-pasting the
> logic.
> 
> As for what should be in the helper function, the above hunks suggest
> that we should copy .hash_algo, .partial_clone, and .worktree_config.
> However...
> 

...

> 
> This hunk does not copy .hash_algo. I initially wondered if it is safe
> to just copy .hash_algo here too, but I now suspect that we shouldn't
> have done the_repository setup in discover_git_directory() in the first
> place. It isn't used by the setup.c machinery - its one caller in "git"
> (it's used by "scalar") is read_early_config(), which is supposed to
> work without a fully set up repository, and bears a comment saying that
> "no global state is changed" by calling discover_git_directory() (which
> stopped being true in ebaf3bcf1ae). It looks like
> discover_git_directory() is just a lightweight entrypoint into the
> setup.c machinery. 16ac8b8db6 (setup: introduce the
> discover_git_directory() function, 2017-03-13)) says "Let's just provide
> a convenient wrapper function with an easier signature that *just*
> discovers the .git/ directory. We will use it in a subsequent patch to
> fix the early config."
> 
> If I'm wrong and we _should_ be doing the_repository setup, then I'm
> guessing it's safe to copy .hash_algo here too. So either way, I think
> we should introduce a helper function to do the copying, especially
> because we will probably need to repeat this process yet again for
> "repository_format_precious_objects".

Thanks for pointing this out & sharing your findings! 

I agree with the desire to reduce code duplication, but the reason I avoided
that refactor when putting these patches together is because of the subtle
differences across the different repository format assignment blocks.

For example, in addition to what you mentioned here w.r.t. '.hash_algo',
there are also differences in how 'repository_format_partial_clone' is
assigned: it's deep-copied in 'check_repository_format', but shallow-copied
(then subsequently NULL'd in the 'struct repository_format' to avoid freeing
the pointer when the struct is disposed of) in 'discover_git_directory()' &
'setup_git_directory_gently()'. 

If we were to settle on a single "copy repository format settings" function,
it's not obvious what the "right" approach is. We could change
'check_repository_format()' to the shallow-copy-then-null like the others:
its two callers (in 'init-db.c' and 'path.c') don't use the value of
'repository_format_partial_clone' in 'struct repository_format' after
calling 'check_repository_format()'. But, if we did that, it'd introduce a
side effect to the input 'struct repository_format', which IMO would be
surprising behavior for a function called 'check_<something>()'. Conversely,
unifying on a deep copy or adding a flag to toggle deep vs. shallow copy
feels like unnecessary complexity if we don't actually need a deep copy.

Beyond the smaller subtleties, there's the larger question (that you sort of
get at with the questions around 'discover_git_directory()') as to whether
we should more heavily refactor or consolidate these setup functions. The
similar code implies "yes", but such a refactor feels firmly out-of-scope
for this series. A smaller change (e.g. just moving the assignments into
their own function) could be less of a diversion, but any benefit seems like
it'd be outweighed by the added churn/complexity of a new function.

In any case, sorry for the long-winded response. I'd initially tried to
implement your feedback, but every time I did I'd get stopped up on the
things I mentioned above. So, rather than continue to put off responding to
this thread, I tried to capture what kept stopping me from moving forward -
hopefully it makes (at least a little bit of) sense!


^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH v2 3/3] repository: move 'repository_format_worktree_config' to repo scope
  2023-06-07 22:29       ` Victoria Dye
@ 2023-06-12 18:10         ` Glen Choo
  2023-06-12 19:45           ` Victoria Dye
  0 siblings, 1 reply; 29+ messages in thread
From: Glen Choo @ 2023-06-12 18:10 UTC (permalink / raw)
  To: Victoria Dye, Victoria Dye via GitGitGadget, git, Jonathan Tan
  Cc: derrickstolee, gitster

Victoria Dye <vdye@github.com> writes:

> For example, in addition to what you mentioned here w.r.t. '.hash_algo',
> there are also differences in how 'repository_format_partial_clone' is
> assigned: it's deep-copied in 'check_repository_format', but shallow-copied
> (then subsequently NULL'd in the 'struct repository_format' to avoid freeing
> the pointer when the struct is disposed of) in 'discover_git_directory()' &
> 'setup_git_directory_gently()'. 

Thanks for the analysis and explanation. It's quite a pain that the
various sites are similar but subtly different.

> If we were to settle on a single "copy repository format settings" function,
> it's not obvious what the "right" approach is. We could change
> 'check_repository_format()' to the shallow-copy-then-null like the others:
> its two callers (in 'init-db.c' and 'path.c') don't use the value of
> 'repository_format_partial_clone' in 'struct repository_format' after
> calling 'check_repository_format()'. But, if we did that, it'd introduce a
> side effect to the input 'struct repository_format', which IMO would be
> surprising behavior for a function called 'check_<something>()'. Conversely,
> unifying on a deep copy or adding a flag to toggle deep vs. shallow copy
> feels like unnecessary complexity if we don't actually need a deep copy.
>
> Beyond the smaller subtleties, there's the larger question (that you sort of
> get at with the questions around 'discover_git_directory()') as to whether
> we should more heavily refactor or consolidate these setup functions. The
> similar code implies "yes", but such a refactor feels firmly out-of-scope
> for this series. A smaller change (e.g. just moving the assignments into
> their own function) could be less of a diversion, but any benefit seems like
> it'd be outweighed by the added churn/complexity of a new function.

I don't agree that this refactor is out of scope. I think we agree that
the refactor is desirable, but if we apply the same heuristics in the
future, the next author to copy a member from 'repository_format' to
'repository' could do the same and we'd never end up with the refactor
we wanted. I strongly feel that if we don't put in a concerted effort
into such refactors along the way, we end up creating more of the churn
that made our lives harder in the first place.

I sympathize with the 'out-of-scope' sentiment, though, and I find it
frustrating when a simple change starts growing in scope because a
reviewer suggests fixing oddities in the codebase that I didn't think
were in scope. In that vein, I think the helper function can simplify
the in-scope things even if we punt on the difficult-to-reason-about
parts.

E.g. we could support both deep and shallow copying, like:

  /*
   * Copy members from a repository_format to repository.
   *
   * If 'src' will no longer be read after copying (e.g. it will be
   * cleared soon), pass a nonzero value so that pointer members will be
   * moved to 'dest' (NULL-ed and shallow copied) instead of being deep
   * copied.
   */
  void copy_repository_format(struct repository *dest,
                              struct repository_format *src,
                              int take_ownership);

And in discover_git_directory(), where we don't copy .hash_algo, we
could leave the code as-is and put a FIXME to figure out if we should
use the helper function or drop the copying entirely.

(I'm somewhat convinced that we can just do shallow copying, though.
Inspecting check_repository_format() shows that it calls
clear_repository_format() right afterwards, so we really don't need the
deep copy there. Using shallow copying seems to work just fine here [1].
I'll ping Jonathan Tan to see if there was a good reason to deep copy.)

[1] https://github.com/chooglen/git/actions/runs/5246795137/jobs/9476098535

> In any case, sorry for the long-winded response. I'd initially tried to
> implement your feedback, but every time I did I'd get stopped up on the
> things I mentioned above. So, rather than continue to put off responding to
> this thread, I tried to capture what kept stopping me from moving forward -
> hopefully it makes (at least a little bit of) sense!

Thanks for being receptive to the feedback in the first round. I really
appreciate the response, and I agree that discussing this was a better
way forward than being stuck.

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH v2 3/3] repository: move 'repository_format_worktree_config' to repo scope
  2023-06-12 18:10         ` Glen Choo
@ 2023-06-12 19:45           ` Victoria Dye
  2023-06-12 20:23             ` Glen Choo
  0 siblings, 1 reply; 29+ messages in thread
From: Victoria Dye @ 2023-06-12 19:45 UTC (permalink / raw)
  To: Glen Choo, Victoria Dye via GitGitGadget, git, Jonathan Tan
  Cc: derrickstolee, gitster

Glen Choo wrote:
> Victoria Dye <vdye@github.com> writes:
>> If we were to settle on a single "copy repository format settings" function,
>> it's not obvious what the "right" approach is. We could change
>> 'check_repository_format()' to the shallow-copy-then-null like the others:
>> its two callers (in 'init-db.c' and 'path.c') don't use the value of
>> 'repository_format_partial_clone' in 'struct repository_format' after
>> calling 'check_repository_format()'. But, if we did that, it'd introduce a
>> side effect to the input 'struct repository_format', which IMO would be
>> surprising behavior for a function called 'check_<something>()'. Conversely,
>> unifying on a deep copy or adding a flag to toggle deep vs. shallow copy
>> feels like unnecessary complexity if we don't actually need a deep copy.
>>
>> Beyond the smaller subtleties, there's the larger question (that you sort of
>> get at with the questions around 'discover_git_directory()') as to whether
>> we should more heavily refactor or consolidate these setup functions. The
>> similar code implies "yes", but such a refactor feels firmly out-of-scope
>> for this series. A smaller change (e.g. just moving the assignments into
>> their own function) could be less of a diversion, but any benefit seems like
>> it'd be outweighed by the added churn/complexity of a new function.
> 
> I don't agree that this refactor is out of scope. I think we agree that
> the refactor is desirable, but if we apply the same heuristics in the
> future, the next author to copy a member from 'repository_format' to
> 'repository' could do the same and we'd never end up with the refactor
> we wanted. I strongly feel that if we don't put in a concerted effort
> into such refactors along the way, we end up creating more of the churn
> that made our lives harder in the first place.

I don't actually agree that *this* refactor (that is, moving all the
"repository_format -> repository" assignments into a helper function) is
desirable, though, even if we extrapolate to future updates. 

Even if we updated the only other 'repository_format' value
('repository_format_precious_objects') to be copied the same way, the
benefit we'd get from eliminating a couple of lines of code duplication
wouldn't necessarily outweigh the the extra complexity of a new abstraction
- which may or may not need special-casing based on who's calling it -
and/or the risk associated with changing behavior if we want to eliminate
those special cases. IOW, I don't feel it's a definitive net improvement in
this situation.

What I do feel is desirable (although, to be honest, not particularly
strongly) is a larger refactor of 'setup.c' - including a deeper analysis
into the all the different setup functions we have, how they're used, why
they differ, etc. - to create a more unified setup process across all of
Git. That might include a helper function like what you've described, but it
might not need one at all, hence my comment about "added churn" (it's just
another thing a larger refactor would need to work around). 

*That* refactor is what I was referring to as "out-of-scope" for this series
(sorry I wasn't clearer earlier!), which - considering the goal is "make
submodules work with worktree config" - doesn't seem unreasonable to me.

> E.g. we could support both deep and shallow copying, like:
> 
>   /*
>    * Copy members from a repository_format to repository.
>    *
>    * If 'src' will no longer be read after copying (e.g. it will be
>    * cleared soon), pass a nonzero value so that pointer members will be
>    * moved to 'dest' (NULL-ed and shallow copied) instead of being deep
>    * copied.
>    */
>   void copy_repository_format(struct repository *dest,
>                               struct repository_format *src,
>                               int take_ownership);

Unless we find that we *need* to support both, this approach would be more
harmful than helpful. If it doesn't matter whether the copy is shallow or
deep, this design proliferates that meaningless distinction in a way that
can easily confuse developers (or at least create more work for them as try
to try to understand it) if they ever want to change or use the function.


^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH v2 3/3] repository: move 'repository_format_worktree_config' to repo scope
  2023-06-12 19:45           ` Victoria Dye
@ 2023-06-12 20:23             ` Glen Choo
  2023-06-12 23:04               ` [PATCH] setup: copy repository_format using helper Glen Choo
  0 siblings, 1 reply; 29+ messages in thread
From: Glen Choo @ 2023-06-12 20:23 UTC (permalink / raw)
  To: Victoria Dye, Victoria Dye via GitGitGadget, git, Jonathan Tan
  Cc: derrickstolee, gitster

Victoria Dye <vdye@github.com> writes:

> Even if we updated the only other 'repository_format' value
> ('repository_format_precious_objects') to be copied the same way, the
> benefit we'd get from eliminating a couple of lines of code duplication
> wouldn't necessarily outweigh the the extra complexity of a new abstraction
> - which may or may not need special-casing based on who's calling it -
> and/or the risk associated with changing behavior if we want to eliminate
> those special cases. IOW, I don't feel it's a definitive net improvement in
> this situation.

I see. In the process of doing this digging, I've become quite convinced
that the risk is minimal. I definitely want the refactor to happen, but
I suppose it's not reasonable for you to bear the risk.

I'll send a follow up patch on top of your series that implements the
cleanup I hope to see, and I'd be happy to give _that_ series a
Reviewed-by (though it's a bit weird since one of the patches will be
mine). It'll touch the same lines twice, but at least the patches will
be owned by the people who care about them the most.

>> E.g. we could support both deep and shallow copying, like:
>> 
>>   /*
>>    * Copy members from a repository_format to repository.
>>    *
>>    * If 'src' will no longer be read after copying (e.g. it will be
>>    * cleared soon), pass a nonzero value so that pointer members will be
>>    * moved to 'dest' (NULL-ed and shallow copied) instead of being deep
>>    * copied.
>>    */
>>   void copy_repository_format(struct repository *dest,
>>                               struct repository_format *src,
>>                               int take_ownership);
>
> Unless we find that we *need* to support both, this approach would be more
> harmful than helpful. If it doesn't matter whether the copy is shallow or
> deep, this design proliferates that meaningless distinction in a way that
> can easily confuse developers (or at least create more work for them as try
> to try to understand it) if they ever want to change or use the function.

Fair enough. I agree we're better off figuring out if the need exists
before trying to support it.

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH v2 3/3] repository: move 'repository_format_worktree_config' to repo scope
  2023-06-01  4:43       ` Junio C Hamano
@ 2023-06-12 21:37         ` Glen Choo
  0 siblings, 0 replies; 29+ messages in thread
From: Glen Choo @ 2023-06-12 21:37 UTC (permalink / raw)
  To: Junio C Hamano
  Cc: Victoria Dye via GitGitGadget, git, Jonathan Tan, derrickstolee,
	Victoria Dye

Junio C Hamano <gitster@pobox.com> writes:

> That's quite a departure from the established practice, isn't it?
> Due to recent and not so recent header shuffling (moving everything
> out of cache.h, dropping "extern", etc.), "git blame" is a bit hard
> to follow, but ever since 16ac8b8d (setup: introduce the
> discover_git_directory() function, 2017-03-13) added the function,
> we do execute the "setup" when we know we are in a repository.
>
> It would probably be worth mentioning that the "global state" Dscho
> refers to in that commit is primarily about the current directory of
> the Git process.  During the discovery, we used to go up one level
> at a time and tried to see if the current directory is either the
> top of the working tree (i.e.  has ".git/" that is a git repository)
> or the top of a GIT_DIR-looking directory.  That was changed in
> ce9b8aab (setup_git_directory_1(): avoid changing global state,
> 2017-03-13) in the same series and discusses what "global state" the
> series addresses.
>
> If a relatively recent and oddball caller calls the function when it
> does not want any of the setup donw after finding out that we could
> use the directory as a repository, a new early "pure discovery" part
> should be split out of the function, and both the function itself
> and the oddball caller should be taught to call that pure-discovery
> helper, I think.

Hm, isn't discover_git_directory() that pure-discovery helper? In the
ce9b8aab, Dscho created stateless machinery (setup_git_directory_1()),
and in 16ac8b8d, he used that stateless machinery to create a
pure-discovery helper (discover_git_directory()).

It's true that the global state is primarily the cwd, but the spirit of
the change is the same - setup_git_directory_1() should have no global
side effects and neither should discover_git_diretory() (since it's
supposed to only do discovery).

^ permalink raw reply	[flat|nested] 29+ messages in thread

* [PATCH] setup: copy repository_format using helper
  2023-06-12 20:23             ` Glen Choo
@ 2023-06-12 23:04               ` Glen Choo
  2023-06-13  0:03                 ` Victoria Dye
  0 siblings, 1 reply; 29+ messages in thread
From: Glen Choo @ 2023-06-12 23:04 UTC (permalink / raw)
  To: git
  Cc: Glen Choo, Victoria Dye, Victoria Dye via GitGitGadget,
	Jonathan Tan, derrickstolee, gitster

In several parts of the setup machinery, we set up a repository_format
and then use it to set up the_repository in nearly the exact same way,
suggesting that we might be able to use a helper function to standardize
the behavior and make future modifications easier. Create this helper
function, setup_repository_from_format(), thus standardizing this
behavior.

To determine what the 'standardized behavior' should be, we can compare
the candidate call sites in repo_init(), setup_git_directory_gently(),
check_repository_format() and discover_git_directory(),

- All of them copy .worktree_config.

- All of them 'copy' .partial_clone. Most perform a shallow copy of the
  pointer, then set the .partial_clone = NULL so that it doesn't get
  cleared by clear_repository_format(). However,
  check_repository_format() copies the string deeply because the
  repository_format is sometimes read back (it is an "out" parameter).
  To accomodate both shallow copying and deep copying, toggle this
  behavior using the "modify_fmt_ok" parameter.

- Most of them set up repository.hash_algo, except
  discover_git_directory(). Our helper function unconditionally sets up
  .hash_algo because it turns out that discover_git_directory() probably
  doesn't need to set up "struct repository" at all!
  discover_git_directory() isn't actually used in the setup process - its
  only caller in the Git binary is read_early_config(). As explained by
  16ac8b8db6 (setup: introduce the discover_git_directory() function,
  2017-03-13), it is supposed to be an entrypoint into setup.c machinery
  that allows the Git directory to be discovered without side effects,
  in other words, we shouldn't have introduced side effects in
  ebaf3bcf1ae (repository: move global r_f_p_c to repo struct,
  2021-06-17). Fortunately, we didn't start to rely on this unintended
  behavior between then and now, so we can just drop it.

Signed-off-by: Glen Choo <chooglen@google.com>
---
Here's the helper function I had in mind. I was initially mistaken and
it turns out that we need to support deep copying, but fortunately,
t0001 is extremely thorough and will catch virtually any mistake in the
setup process. CI seems to pass, though it appears to be a little flaky
today and sometimes cancels jobs
(https://github.com/chooglen/git/actions/runs/5249029150).

If you're comfortable with it, I would prefer for you to squash this
into your patches so that we don't just end up changing the same few
lines. If not, I'll Reviewed-by your patches (if I don't find any other
concerns on a re-read) and send this as a 1-patch on top.

 repository.c |  7 +------
 setup.c      | 31 +++++++++++++++++++------------
 setup.h      | 10 ++++++++++
 3 files changed, 30 insertions(+), 18 deletions(-)

diff --git a/repository.c b/repository.c
index 104960f8f5..50f0b26a6c 100644
--- a/repository.c
+++ b/repository.c
@@ -181,12 +181,7 @@ int repo_init(struct repository *repo,
 	if (read_and_verify_repository_format(&format, repo->commondir))
 		goto error;
 
-	repo_set_hash_algo(repo, format.hash_algo);
-	repo->repository_format_worktree_config = format.worktree_config;
-
-	/* take ownership of format.partial_clone */
-	repo->repository_format_partial_clone = format.partial_clone;
-	format.partial_clone = NULL;
+	setup_repository_from_format(repo, &format, 1);
 
 	if (worktree)
 		repo_set_worktree(repo, worktree);
diff --git a/setup.c b/setup.c
index d866395435..33ce58676f 100644
--- a/setup.c
+++ b/setup.c
@@ -1561,13 +1561,8 @@ const char *setup_git_directory_gently(int *nongit_ok)
 			setup_git_env(gitdir);
 		}
 		if (startup_info->have_repository) {
-			repo_set_hash_algo(the_repository, repo_fmt.hash_algo);
-			the_repository->repository_format_worktree_config =
-				repo_fmt.worktree_config;
-			/* take ownership of repo_fmt.partial_clone */
-			the_repository->repository_format_partial_clone =
-				repo_fmt.partial_clone;
-			repo_fmt.partial_clone = NULL;
+			setup_repository_from_format(the_repository,
+						     &repo_fmt, 1);
 		}
 	}
 	/*
@@ -1654,14 +1649,26 @@ void check_repository_format(struct repository_format *fmt)
 		fmt = &repo_fmt;
 	check_repository_format_gently(get_git_dir(), fmt, NULL);
 	startup_info->have_repository = 1;
-	repo_set_hash_algo(the_repository, fmt->hash_algo);
-	the_repository->repository_format_worktree_config =
-		fmt->worktree_config;
-	the_repository->repository_format_partial_clone =
-		xstrdup_or_null(fmt->partial_clone);
+	setup_repository_from_format(the_repository, fmt, 0);
 	clear_repository_format(&repo_fmt);
 }
 
+void setup_repository_from_format(struct repository *repo,
+				  struct repository_format *fmt,
+				  int modify_fmt_ok)
+{
+	repo_set_hash_algo(repo, fmt->hash_algo);
+	repo->repository_format_worktree_config = fmt->worktree_config;
+	if (modify_fmt_ok) {
+		repo->repository_format_partial_clone =
+			fmt->partial_clone;
+		fmt->partial_clone = NULL;
+	} else {
+		repo->repository_format_partial_clone =
+			xstrdup_or_null(fmt->partial_clone);
+	}
+}
+
 /*
  * Returns the "prefix", a path to the current working directory
  * relative to the work tree root, or NULL, if the current working
diff --git a/setup.h b/setup.h
index 4c1ca9d0c9..ed39aa38e0 100644
--- a/setup.h
+++ b/setup.h
@@ -140,6 +140,16 @@ int verify_repository_format(const struct repository_format *format,
  */
 void check_repository_format(struct repository_format *fmt);
 
+/*
+ * Setup a "struct repository" from the fields from the repository format.
+ * If "modify_fmt_ok" is nonzero, pointer members in "fmt" will be shallowly
+ * copied to repo and set to NULL (so that it's safe to clear "fmt").
+ */
+struct repository;
+void setup_repository_from_format(struct repository *repo,
+				  struct repository_format *fmt,
+				  int modify_fmt_ok);
+
 /*
  * NOTE NOTE NOTE!!
  *
-- 
2.41.0.162.gfafddb0af9-goog


^ permalink raw reply related	[flat|nested] 29+ messages in thread

* Re: [PATCH] setup: copy repository_format using helper
  2023-06-12 23:04               ` [PATCH] setup: copy repository_format using helper Glen Choo
@ 2023-06-13  0:03                 ` Victoria Dye
  2023-06-13 18:25                   ` Glen Choo
  0 siblings, 1 reply; 29+ messages in thread
From: Victoria Dye @ 2023-06-13  0:03 UTC (permalink / raw)
  To: Glen Choo, git
  Cc: Victoria Dye via GitGitGadget, Jonathan Tan, derrickstolee,
	gitster

Glen Choo wrote:
> In several parts of the setup machinery, we set up a repository_format
> and then use it to set up the_repository in nearly the exact same way,
> suggesting that we might be able to use a helper function to standardize
> the behavior and make future modifications easier. Create this helper
> function, setup_repository_from_format(), thus standardizing this
> behavior.
> 
> To determine what the 'standardized behavior' should be, we can compare
> the candidate call sites in repo_init(), setup_git_directory_gently(),
> check_repository_format() and discover_git_directory(),
> 
> - All of them copy .worktree_config.
> 
> - All of them 'copy' .partial_clone. Most perform a shallow copy of the
>   pointer, then set the .partial_clone = NULL so that it doesn't get
>   cleared by clear_repository_format(). However,
>   check_repository_format() copies the string deeply because the
>   repository_format is sometimes read back (it is an "out" parameter).
>   To accomodate both shallow copying and deep copying, toggle this
>   behavior using the "modify_fmt_ok" parameter.

Do you have a specific example of this happening? I see two uses of
'check_repository_format()' in the codebase:

1. in 'enter_repo()' ('path.c')
2. in 'init_db()' ('init-db.c')

The first one calls 'check_repository_format()' with 'NULL', which causes
the function to create a temporary 'struct repository_format' that is then
discarded at the end of the function - no need to worry about the value
being cleared there.

The second one does call 'check_repository_format()' with a 'struct
repository_format' instance, but the 'partial_clone' field field is not
accessed again after that. The only subsequent usages of the 'repo_fmt'
variable in 'init_db()' are:

- in 'validate_hash_algorithm()', where only the 'version' and 'hash_algo'
  fields are accessed.
- in 'create_default_files()', where only 'hash_algo' is accessed.

So, shouldn't it be safe to shallow-copy-and-NULL? But as I noted earlier
[1], if you do that it'll make the name 'check_repository_format()' a bit
misleading (since it's actually modifying its arg in place). So, if you
update to always shallow copy, 'check_repository_format()' should be renamed
to reflect its side effects.

[1] https://lore.kernel.org/git/49509708-c0a1-2439-a551-cab05d944b66@github.com/

> 
> - Most of them set up repository.hash_algo, except
>   discover_git_directory(). Our helper function unconditionally sets up
>   .hash_algo because it turns out that discover_git_directory() probably
>   doesn't need to set up "struct repository" at all!

If that's the case, shouldn't the 'repository_format' assignments in
'discover_git_directory()' be removed altogether? 

>   discover_git_directory() isn't actually used in the setup process - its
>   only caller in the Git binary is read_early_config(). As explained by
>   16ac8b8db6 (setup: introduce the discover_git_directory() function,
>   2017-03-13), it is supposed to be an entrypoint into setup.c machinery
>   that allows the Git directory to be discovered without side effects,
>   in other words, we shouldn't have introduced side effects in
>   ebaf3bcf1ae (repository: move global r_f_p_c to repo struct,
>   2021-06-17). Fortunately, we didn't start to rely on this unintended
>   behavior between then and now, so we can just drop it.
> 
> Signed-off-by: Glen Choo <chooglen@google.com>
> ---
> Here's the helper function I had in mind. I was initially mistaken and
> it turns out that we need to support deep copying, but fortunately,
> t0001 is extremely thorough and will catch virtually any mistake in the
> setup process. CI seems to pass, though it appears to be a little flaky
> today and sometimes cancels jobs
> (https://github.com/chooglen/git/actions/runs/5249029150).
> 
> If you're comfortable with it, I would prefer for you to squash this
> into your patches so that we don't just end up changing the same few
> lines. If not, I'll Reviewed-by your patches (if I don't find any other
> concerns on a re-read) and send this as a 1-patch on top.

Reading through the commit message & patch, I'm still not convinced this
refactor is a good idea - to me, it doesn't leave the code in a clearly
better state. If you feel strongly that it does, though, I'm happy to leave
it to others to review/decide but I would prefer that you keep it a separate
patch submission on top.

Thanks!

> diff --git a/repository.c b/repository.c
> index 104960f8f5..50f0b26a6c 100644
> --- a/repository.c
> +++ b/repository.c
> @@ -181,12 +181,7 @@ int repo_init(struct repository *repo,
>  	if (read_and_verify_repository_format(&format, repo->commondir))
>  		goto error;
>  
> -	repo_set_hash_algo(repo, format.hash_algo);
> -	repo->repository_format_worktree_config = format.worktree_config;
> -
> -	/* take ownership of format.partial_clone */
> -	repo->repository_format_partial_clone = format.partial_clone;
> -	format.partial_clone = NULL;
> +	setup_repository_from_format(repo, &format, 1);
>  
>  	if (worktree)
>  		repo_set_worktree(repo, worktree);
> diff --git a/setup.c b/setup.c
> index d866395435..33ce58676f 100644
> --- a/setup.c
> +++ b/setup.c
> @@ -1561,13 +1561,8 @@ const char *setup_git_directory_gently(int *nongit_ok)
>  			setup_git_env(gitdir);
>  		}
>  		if (startup_info->have_repository) {
> -			repo_set_hash_algo(the_repository, repo_fmt.hash_algo);
> -			the_repository->repository_format_worktree_config =
> -				repo_fmt.worktree_config;
> -			/* take ownership of repo_fmt.partial_clone */
> -			the_repository->repository_format_partial_clone =
> -				repo_fmt.partial_clone;
> -			repo_fmt.partial_clone = NULL;
> +			setup_repository_from_format(the_repository,
> +						     &repo_fmt, 1);
>  		}
>  	}
>  	/*
> @@ -1654,14 +1649,26 @@ void check_repository_format(struct repository_format *fmt)
>  		fmt = &repo_fmt;
>  	check_repository_format_gently(get_git_dir(), fmt, NULL);
>  	startup_info->have_repository = 1;
> -	repo_set_hash_algo(the_repository, fmt->hash_algo);
> -	the_repository->repository_format_worktree_config =
> -		fmt->worktree_config;
> -	the_repository->repository_format_partial_clone =
> -		xstrdup_or_null(fmt->partial_clone);
> +	setup_repository_from_format(the_repository, fmt, 0);
>  	clear_repository_format(&repo_fmt);
>  }
>  

I think you may be missing changes to 'discover_git_directory()'? Like I
mentioned above, though, if you don't think 'discover_git_directory()' needs
to set up 'the_repository', then those assignments should just be removed
(not replaced with 'setup_repository_from_format()').


^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH] setup: copy repository_format using helper
  2023-06-13  0:03                 ` Victoria Dye
@ 2023-06-13 18:25                   ` Glen Choo
  2023-06-13 19:45                     ` Junio C Hamano
  0 siblings, 1 reply; 29+ messages in thread
From: Glen Choo @ 2023-06-13 18:25 UTC (permalink / raw)
  To: Victoria Dye, git
  Cc: Victoria Dye via GitGitGadget, Jonathan Tan, derrickstolee,
	gitster

Victoria Dye <vdye@github.com> writes:

>> - All of them 'copy' .partial_clone. Most perform a shallow copy of the
>>   pointer, then set the .partial_clone = NULL so that it doesn't get
>>   cleared by clear_repository_format(). However,
>>   check_repository_format() copies the string deeply because the
>>   repository_format is sometimes read back (it is an "out" parameter).
>>   To accomodate both shallow copying and deep copying, toggle this
>>   behavior using the "modify_fmt_ok" parameter.
>
> Do you have a specific example of this happening? I see two uses of
> 'check_repository_format()' in the codebase:
>
> 1. in 'enter_repo()' ('path.c')
> 2. in 'init_db()' ('init-db.c')
>
> The first one calls 'check_repository_format()' with 'NULL', which causes
> the function to create a temporary 'struct repository_format' that is then
> discarded at the end of the function - no need to worry about the value
> being cleared there.
>
> The second one does call 'check_repository_format()' with a 'struct
> repository_format' instance, but the 'partial_clone' field field is not
> accessed again after that. The only subsequent usages of the 'repo_fmt'
> variable in 'init_db()' are:
>
> - in 'validate_hash_algorithm()', where only the 'version' and 'hash_algo'
>   fields are accessed.
> - in 'create_default_files()', where only 'hash_algo' is accessed.
>
> So, shouldn't it be safe to shallow-copy-and-NULL? But as I noted earlier
> [1], if you do that it'll make the name 'check_repository_format()' a bit
> misleading (since it's actually modifying its arg in place). So, if you
> update to always shallow copy, 'check_repository_format()' should be renamed
> to reflect its side effects.

My understanding of check_repository_format() is that it serves double
duty of doing a) setup of the_repository and b) populating an "out"
parameter with the appropriate values. IMO a) is the side effect that
could warrant the rename, and b) is the expected, "read-only" use case.
From that perspective, doing a shallow copy here isn't really
introducing a weird side-effect (because the arg to an "out" parameter
should be zero-ed out to begin with), but it's returning a 'wrong'
value. You're right that it's safe because the NULL-ed value isn't read
back right now, but it's not any good if this function gains more
callers.

Your point about not having side effects in check_*() is a good one
though, and I'm starting to feel doubtful that we should be doing setup
there either....

>> If you're comfortable with it, I would prefer for you to squash this
>> into your patches so that we don't just end up changing the same few
>> lines. If not, I'll Reviewed-by your patches (if I don't find any other
>> concerns on a re-read) and send this as a 1-patch on top.
>
> Reading through the commit message & patch, I'm still not convinced this
> refactor is a good idea - to me, it doesn't leave the code in a clearly
> better state. If you feel strongly that it does, though, I'm happy to leave
> it to others to review/decide but I would prefer that you keep it a separate
> patch submission on top.

Okay. Given how weird check_repository_format() and
discover_git_directory() are, I think we haven't done enough
investigation to properly consolidate this logic, and doing that
introduces quite a lot of scope creep. It feels very unsatisfactory that
we are propagating a pattern that is suspicious in some places and
outright wrong in others instead of cleaning up as we go and leaving it
in a better state for future authors, but this series does leave some
_other_ parts in a better state (removing the global), and I think it's
still a net positive.

The helper function might not be a good idea yet, but I'm convinced that
removing the setup from discover_git_directory() is a good idea. I think
this series would be in a better state if we get rid of the wrong
pattern instead of extending it. Unfortunately, I forgot to include that
change in the patch I sent (ugh) but here's a patch that _just_ includes
the discover_git_directory() change that I hope you can squash into your
series (and you can use whatever bits of my commit message you see fit).

----- >8 --------- >8 --------- >8 --------- >8 --------- >8 ----

  diff --git a/setup.c b/setup.c
  index 33ce58676f..b172ffd48a 100644
  --- a/setup.c
  +++ b/setup.c
  @@ -1422,14 +1422,6 @@ int discover_git_directory(struct strbuf *commondir,
      return -1;
    }

  -	the_repository->repository_format_worktree_config =
  -		candidate.worktree_config;
  -
  -	/* take ownership of candidate.partial_clone */
  -	the_repository->repository_format_partial_clone =
  -		candidate.partial_clone;
  -	candidate.partial_clone = NULL;
  -
    clear_repository_format(&candidate);
    return 0;
  }

----- >8 --------- >8 --------- >8 --------- >8 --------- >8 ----

You can see that this patch based on top of yours passes CI

  https://github.com/git/git/commit/9469fe3a6b0efbe89d26ef096a2eebabea59c55f
  https://github.com/chooglen/git/actions/runs/5258672473

> I think you may be missing changes to 'discover_git_directory()'? Like I
> mentioned above, though, if you don't think 'discover_git_directory()' needs
> to set up 'the_repository', then those assignments should just be removed
> (not replaced with 'setup_repository_from_format()').

Ah sorry, yes they were meant to be removed. I somehow missed those as I
was preparing the patch.


^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH] setup: copy repository_format using helper
  2023-06-13 18:25                   ` Glen Choo
@ 2023-06-13 19:45                     ` Junio C Hamano
  0 siblings, 0 replies; 29+ messages in thread
From: Junio C Hamano @ 2023-06-13 19:45 UTC (permalink / raw)
  To: Glen Choo
  Cc: Victoria Dye, git, Victoria Dye via GitGitGadget, Jonathan Tan,
	derrickstolee

Glen Choo <chooglen@google.com> writes:

> Victoria Dye <vdye@github.com> writes:
>
>> So, shouldn't it be safe to shallow-copy-and-NULL? But as I noted earlier
>> [1], if you do that it'll make the name 'check_repository_format()' a bit
>> misleading (since it's actually modifying its arg in place). So, if you
>> update to always shallow copy, 'check_repository_format()' should be renamed
>> to reflect its side effects.
>
> My understanding of check_repository_format() is that it serves double
> duty of doing a) setup of the_repository and b) populating an "out"
> parameter with the appropriate values. IMO a) is the side effect that
> could warrant the rename, and b) is the expected, "read-only" use case.
>
> From that perspective, doing a shallow copy here isn't really
> introducing a weird side-effect (because the arg to an "out" parameter
> should be zero-ed out to begin with), but it's returning a 'wrong'
> value. You're right that it's safe because the NULL-ed value isn't read
> back right now, but it's not any good if this function gains more
> callers.

Thanks for having this discussion.  The above makes perfect sense to
me.

> The helper function might not be a good idea yet, but I'm convinced that
> removing the setup from discover_git_directory() is a good idea. I think
> this series would be in a better state if we get rid of the wrong
> pattern instead of extending it.
> ...
>> I think you may be missing changes to 'discover_git_directory()'? Like I
>> mentioned above, though, if you don't think 'discover_git_directory()' needs
>> to set up 'the_repository', then those assignments should just be removed
>> (not replaced with 'setup_repository_from_format()').
>
> Ah sorry, yes they were meant to be removed. I somehow missed those as I
> was preparing the patch.

It looks like you two are in agreement at the end.  It does feel
that the change to make discover purely about discovering extends
the scope a bit too much, but it would be a good direction to go in
the longer term.

Thanks.

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH v2 0/3] Fix behavior of worktree config in submodules
  2023-05-26  1:32 ` [PATCH v2 0/3] " Victoria Dye via GitGitGadget
                     ` (3 preceding siblings ...)
  2023-05-26 15:48   ` [PATCH v2 0/3] Fix behavior of worktree config in submodules Derrick Stolee
@ 2023-06-13 22:09   ` Glen Choo
  2023-06-13 22:17     ` Victoria Dye
  4 siblings, 1 reply; 29+ messages in thread
From: Glen Choo @ 2023-06-13 22:09 UTC (permalink / raw)
  To: Victoria Dye via GitGitGadget, git; +Cc: derrickstolee, gitster, Victoria Dye

"Victoria Dye via GitGitGadget" <gitgitgadget@gmail.com> writes:

>  * Added a commit to move 'struct repository' out of 'git_config_source',
>    rather than creating a dummy 'config_source' just to hold a repository
>    instance.
>  * Changed the config setting in the new tests from 'feature.experimental'
>    to 'index.sparse' to tie these changes to their intended use case.
>  * "super project" -> "superproject"

Thanks! Discounting the discussions on the side thread (which we've
decided are mostly out of scope) I think this version is good enough to
merge as-is.

In

  https://lore.kernel.org/git/kl6llegnfccw.fsf@chooglen-macbookpro.roam.corp.google.com

I said that this series is better if we squash in a patch to drop the
setup code from discover_git_directory(), but on hindsight, I think it
also makes perfect sense for me to send that as a standalone patch. Let
me know if you plan to squash it in or not so I'll know whether to send
it :)

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH v2 0/3] Fix behavior of worktree config in submodules
  2023-06-13 22:09   ` Glen Choo
@ 2023-06-13 22:17     ` Victoria Dye
  0 siblings, 0 replies; 29+ messages in thread
From: Victoria Dye @ 2023-06-13 22:17 UTC (permalink / raw)
  To: Glen Choo, Victoria Dye via GitGitGadget, git; +Cc: derrickstolee, gitster

Glen Choo wrote:
> "Victoria Dye via GitGitGadget" <gitgitgadget@gmail.com> writes:
> 
>>  * Added a commit to move 'struct repository' out of 'git_config_source',
>>    rather than creating a dummy 'config_source' just to hold a repository
>>    instance.
>>  * Changed the config setting in the new tests from 'feature.experimental'
>>    to 'index.sparse' to tie these changes to their intended use case.
>>  * "super project" -> "superproject"
> 
> Thanks! Discounting the discussions on the side thread (which we've
> decided are mostly out of scope) I think this version is good enough to
> merge as-is.
> 
> In
> 
>   https://lore.kernel.org/git/kl6llegnfccw.fsf@chooglen-macbookpro.roam.corp.google.com
> 
> I said that this series is better if we squash in a patch to drop the
> setup code from discover_git_directory(), but on hindsight, I think it
> also makes perfect sense for me to send that as a standalone patch. Let
> me know if you plan to squash it in or not so I'll know whether to send
> it :)

Thanks for the re-review! This series was just merged to 'next', so I think
sending the new patch separately would be the least disruptive option.

^ permalink raw reply	[flat|nested] 29+ messages in thread

end of thread, other threads:[~2023-06-13 22:17 UTC | newest]

Thread overview: 29+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2023-05-23 23:17 [PATCH 0/2] Fix behavior of worktree config in submodules Victoria Dye via GitGitGadget
2023-05-23 23:17 ` [PATCH 1/2] config: use gitdir to get worktree config Victoria Dye via GitGitGadget
2023-05-25  1:05   ` Glen Choo
2023-05-25 20:05     ` Derrick Stolee
2023-05-23 23:17 ` [PATCH 2/2] repository: move 'repository_format_worktree_config' to repo scope Victoria Dye via GitGitGadget
2023-05-25  1:29   ` Glen Choo
2023-05-25 16:09     ` Glen Choo
2023-05-25 20:02       ` Victoria Dye
2023-05-25 20:13   ` Derrick Stolee
2023-05-24 10:25 ` [PATCH 0/2] Fix behavior of worktree config in submodules Junio C Hamano
2023-05-25 19:56 ` Glen Choo
2023-05-26  1:32 ` [PATCH v2 0/3] " Victoria Dye via GitGitGadget
2023-05-26  1:32   ` [PATCH v2 1/3] config: use gitdir to get worktree config Victoria Dye via GitGitGadget
2023-05-26  1:32   ` [PATCH v2 2/3] config: pass 'repo' directly to 'config_with_options()' Victoria Dye via GitGitGadget
2023-05-26  1:33   ` [PATCH v2 3/3] repository: move 'repository_format_worktree_config' to repo scope Victoria Dye via GitGitGadget
2023-05-31 22:17     ` Glen Choo
2023-06-01  4:43       ` Junio C Hamano
2023-06-12 21:37         ` Glen Choo
2023-06-07 22:29       ` Victoria Dye
2023-06-12 18:10         ` Glen Choo
2023-06-12 19:45           ` Victoria Dye
2023-06-12 20:23             ` Glen Choo
2023-06-12 23:04               ` [PATCH] setup: copy repository_format using helper Glen Choo
2023-06-13  0:03                 ` Victoria Dye
2023-06-13 18:25                   ` Glen Choo
2023-06-13 19:45                     ` Junio C Hamano
2023-05-26 15:48   ` [PATCH v2 0/3] Fix behavior of worktree config in submodules Derrick Stolee
2023-06-13 22:09   ` Glen Choo
2023-06-13 22:17     ` Victoria Dye

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).