From: Mike Hommey <mh@glandium.org>
To: Patrick Steinhardt <ps@pks.im>
Cc: git@vger.kernel.org, Junio C Hamano <gitster@pobox.com>
Subject: Re: [PATCH 2/2] builtin/clone: allow remote helpers to detect repo
Date: Fri, 3 May 2024 11:04:32 +0900 [thread overview]
Message-ID: <20240503020432.2fxwuhjsvumy7i7z@glandium.org> (raw)
In-Reply-To: <9d888adf92e9a8af7c18847219f97d3e595e3e36.1709041721.git.ps@pks.im>
Hey, it's me again, coming back way too late to figure out that this fix
had an adverse side effect in a corner case.
Here's a simplified reproducer:
```
$ cat > git-remote-foo <<EOF
#!/bin/sh
# capabilities
echo option
echo import
echo push
echo
# option progress true
echo ok
# option verbosity 1
echo ok
# list
echo
EOF
$ chmod +x git-remote-foo
$ PATH=$PWD:$PATH git clone foo::bar
Cloning into 'bar'...
created HEAD
warning: this remote helper should implement refspec capability
created reference db
warning: You appear to have cloned an empty repository.
hint: Using 'master' as the name for the initial branch. This default branch name
hint: is subject to change. To configure the initial branch name to use in all
hint: of your new repositories, which will suppress this warning, call:
hint:
hint: git config --global init.defaultBranch <name>
hint:
hint: Names commonly chosen instead of 'master' are 'main', 'trunk' and
hint: 'development'. The just-created branch can be renamed via this command:
hint:
hint: git branch -m <name>
```
That hint is the adverse side effect. It shouldn't be shown.
It's happening because `builtin/clone.c` does this:
`create_reference_database(the_repository->ref_storage_format, NULL, 1);`
`create_reference` calls `is_reinit`, which returns true because there is
HEAD, and proceeds to skip the block behind `if (!reinit)`.
Before the change, that code block that is now skipped would have called
`git_default_branch_name(quiet)` (where `quiet` is 1, per the above call
to `create_reference_database`). In turn
`git_default_branch_name(quiet)` would have a) skipped the initial
branch hint (because quiet), and b) cached the branch name for subsequent
`git_default_branch_name` calls.
Then later in `builtin/clone.c`, we end up in this code:
`branch = git_default_branch_name(0);`
Before the change, this would get the cache value from the previous
call, but now it has to compute it, and because 0 is passed, it's not
quiet about the initial branch hint.
The non-remote-helper case goes through with this without a problem
because it ends up in the transport_ls_refs_options.unborn_head_target
case and never calls `git_default_branch_name(0)`.
Mike
On Tue, Feb 27, 2024 at 03:27:44PM +0100, Patrick Steinhardt wrote:
> In 18c9cb7524 (builtin/clone: create the refdb with the correct object
> format, 2023-12-12), we have changed git-clone(1) so that it delays
> creation of the refdb until after it has learned about the remote's
> object format. This change was required for the reftable backend, which
> encodes the object format into the tables. So if we pre-initialized the
> refdb with the default object format, but the remote uses a different
> object format than that, then the resulting tables would have encoded
> the wrong object format.
>
> This change unfortunately breaks remote helpers which try to access the
> repository that is about to be created. Because the refdb has not yet
> been initialized at the point where we spawn the remote helper, we also
> don't yet have "HEAD" or "refs/". Consequently, any Git commands ran by
> the remote helper which try to access the repository would fail because
> it cannot be discovered.
>
> This is essentially a chicken-and-egg problem: we cannot initialize the
> refdb because we don't know about the object format. But we cannot learn
> about the object format because the remote helper may be unable to
> access the partially-initialized repository.
>
> Ideally, we would address this issue via capabilities. But the remote
> helper protocol is not structured in a way that guarantees that the
> capability announcement happens before the remote helper tries to access
> the repository.
>
> Instead, fix this issue by partially initializing the refdb up to the
> point where it becomes discoverable by Git commands.
>
> Reported-by: Mike Hommey <mh@glandium.org>
> Signed-off-by: Patrick Steinhardt <ps@pks.im>
> ---
> builtin/clone.c | 46 ++++++++++++++++++++++++++++++++++++++
> setup.c | 9 +++++++-
> t/t5801/git-remote-testgit | 5 +++++
> 3 files changed, 59 insertions(+), 1 deletion(-)
>
> diff --git a/builtin/clone.c b/builtin/clone.c
> index bad1b70ce8..5d7f112125 100644
> --- a/builtin/clone.c
> +++ b/builtin/clone.c
> @@ -926,6 +926,7 @@ int cmd_clone(int argc, const char **argv, const char *prefix)
> struct ref *mapped_refs = NULL;
> const struct ref *ref;
> struct strbuf key = STRBUF_INIT;
> + struct strbuf buf = STRBUF_INIT;
> struct strbuf branch_top = STRBUF_INIT, reflog_msg = STRBUF_INIT;
> struct transport *transport = NULL;
> const char *src_ref_prefix = "refs/heads/";
> @@ -1125,6 +1126,50 @@ int cmd_clone(int argc, const char **argv, const char *prefix)
> git_dir = real_git_dir;
> }
>
> + /*
> + * We have a chicken-and-egg situation between initializing the refdb
> + * and spawning transport helpers:
> + *
> + * - Initializing the refdb requires us to know about the object
> + * format. We thus have to spawn the transport helper to learn
> + * about it.
> + *
> + * - The transport helper may want to access the Git repository. But
> + * because the refdb has not been initialized, we don't have "HEAD"
> + * or "refs/". Thus, the helper cannot find the Git repository.
> + *
> + * Ideally, we would have structured the helper protocol such that it's
> + * mandatory for the helper to first announce its capabilities without
> + * yet assuming a fully initialized repository. Like that, we could
> + * have added a "lazy-refdb-init" capability that announces whether the
> + * helper is ready to handle not-yet-initialized refdbs. If any helper
> + * didn't support them, we would have fully initialized the refdb with
> + * the SHA1 object format, but later on bailed out if we found out that
> + * the remote repository used a different object format.
> + *
> + * But we didn't, and thus we use the following workaround to partially
> + * initialize the repository's refdb such that it can be discovered by
> + * Git commands. To do so, we:
> + *
> + * - Create an invalid HEAD ref pointing at "refs/heads/.invalid".
> + *
> + * - Create the "refs/" directory.
> + *
> + * - Set up the ref storage format and repository version as
> + * required.
> + *
> + * This is sufficient for Git commands to discover the Git directory.
> + */
> + initialize_repository_version(GIT_HASH_UNKNOWN,
> + the_repository->ref_storage_format, 1);
> +
> + strbuf_addf(&buf, "%s/HEAD", git_dir);
> + write_file(buf.buf, "ref: refs/heads/.invalid");
> +
> + strbuf_reset(&buf);
> + strbuf_addf(&buf, "%s/refs", git_dir);
> + safe_create_dir(buf.buf, 1);
> +
> /*
> * additional config can be injected with -c, make sure it's included
> * after init_db, which clears the entire config environment.
> @@ -1453,6 +1498,7 @@ int cmd_clone(int argc, const char **argv, const char *prefix)
> free(remote_name);
> strbuf_release(&reflog_msg);
> strbuf_release(&branch_top);
> + strbuf_release(&buf);
> strbuf_release(&key);
> free_refs(mapped_refs);
> free_refs(remote_head_points_at);
> diff --git a/setup.c b/setup.c
> index b69b1cbc2a..e3b76e84b5 100644
> --- a/setup.c
> +++ b/setup.c
> @@ -1889,6 +1889,13 @@ void initialize_repository_version(int hash_algo,
> char repo_version_string[10];
> int repo_version = GIT_REPO_VERSION;
>
> + /*
> + * Note that we initialize the repository version to 1 when the ref
> + * storage format is unknown. This is on purpose so that we can add the
> + * correct object format to the config during git-clone(1). The format
> + * version will get adjusted by git-clone(1) once it has learned about
> + * the remote repository's format.
> + */
> if (hash_algo != GIT_HASH_SHA1 ||
> ref_storage_format != REF_STORAGE_FORMAT_FILES)
> repo_version = GIT_REPO_VERSION_READ;
> @@ -1898,7 +1905,7 @@ void initialize_repository_version(int hash_algo,
> "%d", repo_version);
> git_config_set("core.repositoryformatversion", repo_version_string);
>
> - if (hash_algo != GIT_HASH_SHA1)
> + if (hash_algo != GIT_HASH_SHA1 && hash_algo != GIT_HASH_UNKNOWN)
> git_config_set("extensions.objectformat",
> hash_algos[hash_algo].name);
> else if (reinit)
> diff --git a/t/t5801/git-remote-testgit b/t/t5801/git-remote-testgit
> index 1544d6dc6b..bcfb358c51 100755
> --- a/t/t5801/git-remote-testgit
> +++ b/t/t5801/git-remote-testgit
> @@ -12,6 +12,11 @@ url=$2
>
> dir="$GIT_DIR/testgit/$alias"
>
> +if ! git rev-parse --is-inside-git-dir
> +then
> + exit 1
> +fi
> +
> h_refspec="refs/heads/*:refs/testgit/$alias/heads/*"
> t_refspec="refs/tags/*:refs/testgit/$alias/tags/*"
>
> --
> 2.44.0
>
next prev parent reply other threads:[~2024-05-03 2:23 UTC|newest]
Thread overview: 18+ messages / expand[flat|nested] mbox.gz Atom feed top
2024-02-27 14:27 [PATCH 0/2] builtin/clone: allow remote helpers to detect repo Patrick Steinhardt
2024-02-27 14:27 ` [PATCH 1/2] refs/reftable: don't fail empty transactions in repo without HEAD Patrick Steinhardt
2024-02-28 11:32 ` Karthik Nayak
2024-03-04 6:44 ` Patrick Steinhardt
2024-03-04 16:28 ` Junio C Hamano
2024-03-05 11:43 ` Patrick Steinhardt
2024-03-05 15:59 ` Junio C Hamano
2024-03-06 11:17 ` [PATCH] t0610: remove unused variable assignment Patrick Steinhardt
2024-03-01 1:20 ` [PATCH 1/2] refs/reftable: don't fail empty transactions in repo without HEAD Justin Tobler
2024-03-04 6:44 ` Patrick Steinhardt
2024-02-27 14:27 ` [PATCH 2/2] builtin/clone: allow remote helpers to detect repo Patrick Steinhardt
2024-02-27 21:31 ` Junio C Hamano
2024-03-04 6:44 ` Patrick Steinhardt
2024-03-04 16:17 ` Junio C Hamano
2024-05-03 2:04 ` Mike Hommey [this message]
2024-02-27 20:58 ` [PATCH 0/2] " Junio C Hamano
2024-02-27 21:33 ` Junio C Hamano
2024-03-04 6:44 ` Patrick Steinhardt
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20240503020432.2fxwuhjsvumy7i7z@glandium.org \
--to=mh@glandium.org \
--cc=git@vger.kernel.org \
--cc=gitster@pobox.com \
--cc=ps@pks.im \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).