Git Mailing List Archive mirror
 help / color / mirror / Atom feed
* Re: When someone creates a new worktree by copying an existing one
       [not found] <CAPMMpohHo2co7n_NjD9XntBs3DVU91Rqx8dmRrSWg=1eof+Rhw@mail.gmail.com>
@ 2023-06-01  8:32 ` Tao Klerks
  2023-06-01 19:48 ` Eric Sunshine
  1 sibling, 0 replies; 3+ messages in thread
From: Tao Klerks @ 2023-06-01  8:32 UTC (permalink / raw)
  To: git

On Wed, May 31, 2023 at 8:59 PM Tao Klerks <tao@klerks.biz> wrote:
>
> I think I can implement something reasonably effective with a pair of
> hooks (pre-commit and post-checkout look like good candidates), but
> I'm weirded out that something like this should need to be "custom".
>

FWIW, here's what that code has ended up looking like, in
post-checkout and pre-commit:

########################################################################
# WORKTREE GIT FILE PATH TO REPO WORKTREE GITDIR FILE PATH CONSISTENCY #
########################################################################
# (this code is duplicated across pre-commit and post-checkout)

WORKTREE_GIT_FILE_PATH="$PWD/.git"
GITDIR_PREFIX="gitdir: "
# Only run worktree consistency checks if it's a worktree - if .git is a file
if [[ -f "$WORKTREE_GIT_FILE_PATH" ]]; then
  WORKTREE_GIT_FILE_CONTENT=$(head -n 1 "$WORKTREE_GIT_FILE_PATH")
  WORKTREE_METADATA_FOLDER_PATH=${WORKTREE_GIT_FILE_CONTENT#"$GITDIR_PREFIX"}
  WORKTREE_GITDIR_FILE_PATH="$WORKTREE_METADATA_FOLDER_PATH/gitdir"
  # if the gitdir file doesn't exist, git will complain loudly, no
need to interfere.
  if [[ -f "$WORKTREE_GITDIR_FILE_PATH" ]]; then
    WORKTREE_GITDIR_PATH=$(head -n 1 "$WORKTREE_GITDIR_FILE_PATH")
    # if the gitdir value doesn't match this worktree's git file path,
we have a problem.
    if ! [[ "$WORKTREE_GIT_FILE_PATH" -ef "$WORKTREE_GITDIR_PATH" ]]; then
      >&2 echo ""
      >&2 echo "******************************************************************************"
      >&2 echo "* WARNING: It looks like this worktree has moved or
been copied incorrectly! *"
      >&2 echo "******************************************************************************"
      >&2 echo ""
      >&2 echo "The git repo expects this worktree to be at:
$(realpath "$(dirname "$WORKTREE_GITDIR_PATH")")"
      >&2 echo ""
      >&2 echo "However, it is at: $(realpath "$(dirname
"$WORKTREE_GIT_FILE_PATH")")"
      >&2 echo ""
      >&2 echo "If you copied an existing worktree to create this one,
then the two worktrees"
      >&2 echo "will be badly 'linked', with changes in one affecting
the other. To resolve this,"
      >&2 echo "please delete one of the two, and run 'git worktree
repair' in the remaining one."
      >&2 echo "You can then properly create a new worktree using 'git
worktree add'. You"
      >&2 echo "should delete whichever worktree doesn't contain
uncommitted changes you"
      >&2 echo "want to keep, of course."
      >&2 echo ""
      >&2 echo "If you simply moved/renamed this worktree, then run
'git worktree repair' to"
      >&2 echo "make the git repo and worktree 'match up' again, and
resolve this warning."
      >&2 echo ""
      >&2 echo "******************************************************************************"
      >&2 echo ""
      # in post-checkout this exit code won't change git's behavior,
but git still "echoes" it to
      # the calling program, so it's still a useful signal that
something went wrong - which is
      # warranted under the circumstances.
      exit 1
    fi
  fi
fi

^ permalink raw reply	[flat|nested] 3+ messages in thread

* Re: When someone creates a new worktree by copying an existing one
       [not found] <CAPMMpohHo2co7n_NjD9XntBs3DVU91Rqx8dmRrSWg=1eof+Rhw@mail.gmail.com>
  2023-06-01  8:32 ` When someone creates a new worktree by copying an existing one Tao Klerks
@ 2023-06-01 19:48 ` Eric Sunshine
  2023-06-02 19:19   ` Tao Klerks
  1 sibling, 1 reply; 3+ messages in thread
From: Eric Sunshine @ 2023-06-01 19:48 UTC (permalink / raw)
  To: Tao Klerks; +Cc: git

On Wed, May 31, 2023 at 3:00 PM Tao Klerks <tao@klerks.biz> wrote:
> I've recently encountered some users who've got themselves into
> trouble by copying a worktree into another folder, to create a second
> worktree, instead of using "git worktree add".
>
> They assumed this would be fine, just the same as creating a new
> *repository* by copying an new *repository* is generally fine (except
> when you introduce worktrees, of course).
>
> There users eventually understood that something was wrong, as you
> might expect, when a checkout in one worktree also "checked out" (and
> produced lots of unexpected changes) in the other worktree - or
> similar weirdnesses, eg on commit.
>
> Ideally, as I user, I would expect git to start shouting at me to "git
> worktree repair" the moment the mutual reference between the ".git"
> file of the worktree, and the "gitdir" file of the repo's worktree
> metadata, stopped agreeing.
>
> Is there any reason we can't/don't have such a safety check? (would it
> be expensive??)
>
> I think I can implement something reasonably effective with a pair of
> hooks (pre-commit and post-checkout look like good candidates), but
> I'm weirded out that something like this should need to be "custom".
>
> I did a quick search of the archives but didn't find anything
> relevant. Has this been discussed? Should I try to prepare a patch
> including that sort of validation... somewhere?

Just some thoughts, not a proper answer...

If you want to detect this sort of problem early, then patching one of
the initialization functions which is called by each Git command which
requires a working tree (i.e. those with NEED_WORK_TREE set or which
call setup_work_tree() manually) would be one approach. It's not
immediately obvious which function should be patched since there is
deep and mysterious interaction between the various startup functions,
but probably something in setup.c or environment.c.

An alternative might be to perform the detection only when the "index"
is about to be updated or some other worktree-local administrative
entry, but that seems like a much less tractable approach.

Aside from the corruption you describe above, there may be other types
of corruption worth detecting early, but any such detection may be
prohibitively expensive, even if it's just the type you mentioned. (By
"prohibitively", I'm referring to previous work to reduce the startup
time of Git commands; people may not want to see those gains reversed,
especially for a case such as this which is likely to be rare.)

Unless I'm mistaken, the described corruption can only be detected by
a Git command running within the duplicate directory itself since
there will be no entry in .git/worktrees/ corresponding to the
manually-created copy. Hence, Git commands run in any other worktree
will be unable to detect it, not even the "git worktree" command.

The validation performed by worktree.c:validate_worktree() is already
capable of detecting the sort of corruption you describe, if handed a
`struct worktree` populated to point at the worktree which is a copy
of the original worktree. However, there is no API presently to create
a `struct worktree` from an arbitrary path, so that is something you'd
have to add if you want to take advantage of
worktree.c:validate_worktree(). Adding such a function does feel
somewhat ugly and special-purpose, though.

I was going to mention that suggesting to the user merely to use "git
worktree repair" would not help in this situation, but I see in your
followup email that you instruct the user to first delete one of the
worktrees before running "git worktree repair", which would indeed
work. However, you probably want the instructions to be clearer,
saying that the user should delete the worktree directory manually
(say, with "rm -rf"), _not_ with "git worktree remove".

It wouldn't help with early-detection, but it might be worthwhile to
add a "git worktree validate" command which detects and reports
problems with worktree administrative data (i.e. call
worktree.c:validate_worktree() for each worktree). However, the above
caveat still applies about inability to detect the problem from other
worktrees.

Upgrading "git worktree repair" to report the problem might be
worthwhile, but it also is subject to the same caveat about inability
to detect the problem from other worktrees.

One might suggest that "git worktree repair" should somehow repair the
situation itself, perhaps by creating a new entry in .git/worktrees/,
but it's not at all clear if that is a good idea. Any such automated
solution would need to be very careful not to throw away the user's
work (for instance, work which is staged in the "index").

It may be possible to add some option or subcommand to "git worktree"
which would allow a user to "attach" an existing directory as a
worktree. For instance:

    % cp -R worktreeA worktreeB
    % git worktree attach worktreeB

would create the necessary administrative entries in
.git/worktrees/worktreeB/ and make worktreeB/.git point at
.git/worktrees/worktreeB. However, I'm extremely reluctant to suggest
doing this since we don't want to encourage users to perform worktree
manipulation manually; they should be using "git worktree" for all
such tasks. (We also don't want users hijacking a worktree which
belongs to some other repository.)

^ permalink raw reply	[flat|nested] 3+ messages in thread

* Re: When someone creates a new worktree by copying an existing one
  2023-06-01 19:48 ` Eric Sunshine
@ 2023-06-02 19:19   ` Tao Klerks
  0 siblings, 0 replies; 3+ messages in thread
From: Tao Klerks @ 2023-06-02 19:19 UTC (permalink / raw)
  To: Eric Sunshine; +Cc: git

On Thu, Jun 1, 2023 at 9:48 PM Eric Sunshine <sunshine@sunshineco.com> wrote:
>
> On Wed, May 31, 2023 at 3:00 PM Tao Klerks <tao@klerks.biz> wrote:
> >
>
>
> Aside from the corruption you describe above, there may be other types
> of corruption worth detecting early, but any such detection may be
> prohibitively expensive, even if it's just the type you mentioned. (By
> "prohibitively", I'm referring to previous work to reduce the startup
> time of Git commands; people may not want to see those gains reversed,
> especially for a case such as this which is likely to be rare.)

Will keep that in mind, thx.

>
> Unless I'm mistaken, the described corruption can only be detected by
> a Git command running within the duplicate directory itself since
> there will be no entry in .git/worktrees/ corresponding to the
> manually-created copy. Hence, Git commands run in any other worktree
> will be unable to detect it, not even the "git worktree" command.
>

That matches my understanding.

> I was going to mention that suggesting to the user merely to use "git
> worktree repair" would not help in this situation, but I see in your
> followup email that you instruct the user to first delete one of the
> worktrees before running "git worktree repair", which would indeed
> work. However, you probably want the instructions to be clearer,
> saying that the user should delete the worktree directory manually
> (say, with "rm -rf"), _not_ with "git worktree remove".
>

Makes sense, thanks for the tip!

> It wouldn't help with early-detection, but it might be worthwhile to
> add a "git worktree validate" command which detects and reports
> problems with worktree administrative data (i.e. call
> worktree.c:validate_worktree() for each worktree). However, the above
> caveat still applies about inability to detect the problem from other
> worktrees.
>

Yep, that might be interesting, but wouldn't obviously address the
case I'm most interested in.

My objective is to tell the user "you've done something wrong, and are
now in a problem state" as soon as they start operating in their
newly-copied worktree.

> Upgrading "git worktree repair" to report the problem might be
> worthwhile, but it also is subject to the same caveat about inability
> to detect the problem from other worktrees.
>

Yeah, I would have appreciated a "git worktree repair -n" option when
I was writing the hook scripts, to avoid having to "bash" my way
around the .git and gitdir references.

To your point, my hook script could do a much better job than it
currently does: instead of saying "there's *something* wrong", it
could actually check whether the "wrong" original worktree gitdir path
stored in the repo actually corresponds to a real path,. and thereby
differentiate between "run git worktree repair to fix things" and
"delete this unholy worktree you have created with rm -rf".

> One might suggest that "git worktree repair" should somehow repair the
> situation itself, perhaps by creating a new entry in .git/worktrees/,
> but it's not at all clear if that is a good idea. Any such automated
> solution would need to be very careful not to throw away the user's
> work (for instance, work which is staged in the "index").
>

Yeah, I agree that could end up weird. Imagine:
* The user creates a copy (B) of the original worktree (A)
* Initially, the index, HEAD etc is all correct for both worktrees -
they are identical
* The user keeps working in the original worktree, A, staging stuff,
committing stuff, switching branches - git has no way of warning of
anything wrong. There *is* nothing wrong with the git repo.
* The user changes directory to the "bad" worktree, "B".
* Now git can warn that there is something wrong, and offer "repair by
creating a new worktree from the shared state?"... but the resulting
worktree would still be a complete mess, with a large set of
unexplained unexplainable unstaged changes, some of which might have
originally been real unstaged changes at the time of the copy.

As far as I can tell, the only "right" thing is to tell the user to
delete the duplicate-and-mistargeted worktree. As long as the warning
from git comes early enough, the user is unlikely to have managed to
do any work in that folder before they find out it's unusable.

> It may be possible to add some option or subcommand to "git worktree"
> which would allow a user to "attach" an existing directory as a
> worktree. For instance:
>
>     % cp -R worktreeA worktreeB
>     % git worktree attach worktreeB
>
> would create the necessary administrative entries in
> .git/worktrees/worktreeB/ and make worktreeB/.git point at
> .git/worktrees/worktreeB. However, I'm extremely reluctant to suggest
> doing this since we don't want to encourage users to perform worktree
> manipulation manually; they should be using "git worktree" for all
> such tasks. (We also don't want users hijacking a worktree which
> belongs to some other repository.)

Agreed - my main concern with even attempting to offer this is that
the user who does this by accident is ill-equipped to let git know
what the correct HEAD for that directory is!

Thanks again for the feedback.

I'll add this to my list of "things to see if one day I could help
with", but likely won't get to it.

^ permalink raw reply	[flat|nested] 3+ messages in thread

end of thread, other threads:[~2023-06-02 19:19 UTC | newest]

Thread overview: 3+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
     [not found] <CAPMMpohHo2co7n_NjD9XntBs3DVU91Rqx8dmRrSWg=1eof+Rhw@mail.gmail.com>
2023-06-01  8:32 ` When someone creates a new worktree by copying an existing one Tao Klerks
2023-06-01 19:48 ` Eric Sunshine
2023-06-02 19:19   ` Tao Klerks

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).