Git Mailing List Archive mirror
 help / color / mirror / Atom feed
* bug? round-trip through fast-import/fast-export loses files
@ 2023-03-20 17:10 Priedhorsky, Reid
  2023-03-21  1:57 ` Elijah Newren
  0 siblings, 1 reply; 6+ messages in thread
From: Priedhorsky, Reid @ 2023-03-20 17:10 UTC (permalink / raw
  To: git@vger.kernel.org

  Hello,

  I believe I’ve found a bug in Git. It seems that (1) round-tripping through
  fast-export/fast-import a repository (2) that contains a commit that changes
  a file to a directory (3) deletes the contents of that directory from the
  repository.

Thank you for filling out a Git bug report!
Please answer the following questions to help us understand your issue.

What did you do before the bug happened? (Steps to reproduce your issue)

  Run this shell script:

  ~~~~
  #!/bin/bash

  set -ex

  mkdir -p /tmp/weirdal
  cd /tmp/weirdal
  git --version

  # init repo
  rm -Rf wd
  mkdir wd
  cd wd
  git init -b main

  # first commit - foo is a file
  touch foo
  git add -A
  git commit -m 'file'

  # second commit - foo is a directory
  rm foo
  mkdir foo
  touch foo/bar
  git add -A
  git commit -m 'directory'

  # the contents of foo are in the working dir and the repo
  git status
  ls -lR
  git ls-tree --name-only -r HEAD

  # import/export repository (add --full-tree to work around bug)
  git fast-export --no-data -- --all > ../export
  cat ../export
  git fast-import --force --quiet < ../export

  # bug: foo is still in the WD but not the repo; should still be both
  git status
  ls -lR
  git ls-tree --name-only -r HEAD
  #git fast-export --no-data -- --all | diff -u --text ../export - || true
  ~~~~

What did you expect to happen? (Expected behavior)

  Repo should be unchanged, i.e.:

  + git status
  On branch main
  nothing to commit, working tree clean

What happened instead? (Actual behavior)

  Git thinks foo/bar has been staged:

  + git status
  On branch main
  Changes to be committed:
    (use "git restore --staged <file>..." to unstage)
          new file:   foo/bar

What's different between what you expected and what actually happened?

  File foo/bar is staged when it should be unchanged.

Anything else you want to add:

  This also happens in 2.38.1 built from source.

  The bad behavior can be worked around with “--full-tree” on fast-export, but
  the real repo where I want to do this is pretty large, so I’d prefer not to.

  Note the “git fast-export” output:

    commit refs/heads/main
    mark :2
    author Reid Priedhorsky <reidpr@lanl.gov> 1679330805 -0600
    committer Reid Priedhorsky <reidpr@lanl.gov> 1679330805 -0600
    data 10
    directory
    from :1
    M 100644 e69de29bb2d1d6434b8b29ae775ad8c2e48c5391 foo/bar
    D foo

  It looks to me like the “M ... foo/bar” is being processed before “D foo”
  when it should happen in the opposite order.

Please review the rest of the bug report below.
You can delete any lines you don't wish to share.


[System Info]
git version:
git version 2.40.0.71.g950264636c
cpu: x86_64
built from commit: 950264636c68591989456e3ba0a5442f93152c1a
sizeof-long: 8
sizeof-size_t: 8
shell-path: /bin/sh
uname: Linux 5.10.0-21-amd64 #1 SMP Debian 5.10.162-1 (2023-01-21) x86_64
compiler info: gnuc: 10.2
libc info: glibc: 2.31
$SHELL (typically, interactive shell): /bin/bash


[Enabled Hooks]

Thanks,
Reid

—
he/his


^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: bug? round-trip through fast-import/fast-export loses files
  2023-03-20 17:10 bug? round-trip through fast-import/fast-export loses files Priedhorsky, Reid
@ 2023-03-21  1:57 ` Elijah Newren
  2023-03-21 15:54   ` Priedhorsky, Reid
                     ` (2 more replies)
  0 siblings, 3 replies; 6+ messages in thread
From: Elijah Newren @ 2023-03-21  1:57 UTC (permalink / raw
  To: Priedhorsky, Reid; +Cc: git@vger.kernel.org

Hi,

On Mon, Mar 20, 2023 at 11:23 AM Priedhorsky, Reid <reidpr@lanl.gov> wrote:
>
>   Hello,
>
>   I believe I’ve found a bug in Git. It seems that (1) round-tripping through
>   fast-export/fast-import a repository (2) that contains a commit that changes
>   a file to a directory (3) deletes the contents of that directory from the
>   repository.
>
> Thank you for filling out a Git bug report!
> Please answer the following questions to help us understand your issue.
>
> What did you do before the bug happened? (Steps to reproduce your issue)
>
>   Run this shell script:
>
>   ~~~~
>   #!/bin/bash
>
>   set -ex
>
>   mkdir -p /tmp/weirdal
>   cd /tmp/weirdal
>   git --version
>
>   # init repo
>   rm -Rf wd
>   mkdir wd
>   cd wd
>   git init -b main
>
>   # first commit - foo is a file
>   touch foo
>   git add -A
>   git commit -m 'file'
>
>   # second commit - foo is a directory
>   rm foo
>   mkdir foo
>   touch foo/bar
>   git add -A
>   git commit -m 'directory'
>
>   # the contents of foo are in the working dir and the repo
>   git status
>   ls -lR
>   git ls-tree --name-only -r HEAD
>
>   # import/export repository (add --full-tree to work around bug)
>   git fast-export --no-data -- --all > ../export
>   cat ../export
>   git fast-import --force --quiet < ../export
>
>   # bug: foo is still in the WD but not the repo; should still be both
>   git status
>   ls -lR
>   git ls-tree --name-only -r HEAD
>   #git fast-export --no-data -- --all | diff -u --text ../export - || true
>   ~~~~
>
> What did you expect to happen? (Expected behavior)
>
>   Repo should be unchanged, i.e.:
>
>   + git status
>   On branch main
>   nothing to commit, working tree clean
>
> What happened instead? (Actual behavior)
>
>   Git thinks foo/bar has been staged:
>
>   + git status
>   On branch main
>   Changes to be committed:
>     (use "git restore --staged <file>..." to unstage)
>           new file:   foo/bar
>
> What's different between what you expected and what actually happened?
>
>   File foo/bar is staged when it should be unchanged.
>
> Anything else you want to add:
>
>   This also happens in 2.38.1 built from source.
>
>   The bad behavior can be worked around with “--full-tree” on fast-export, but
>   the real repo where I want to do this is pretty large, so I’d prefer not to.
>
>   Note the “git fast-export” output:
>
>     commit refs/heads/main
>     mark :2
>     author Reid Priedhorsky <reidpr@lanl.gov> 1679330805 -0600
>     committer Reid Priedhorsky <reidpr@lanl.gov> 1679330805 -0600
>     data 10
>     directory
>     from :1
>     M 100644 e69de29bb2d1d6434b8b29ae775ad8c2e48c5391 foo/bar
>     D foo
>
>   It looks to me like the “M ... foo/bar” is being processed before “D foo”
>   when it should happen in the opposite order.

Thanks for the well-written bug report, including not only a testcase
but even the relevant bits of the fast-export output.  I thought I had
fixed D/F issues in fast-export & fast-import before, and indeed a
search turns up both of

253fb5f889 (fast-import: Improve robustness when D->F changes provided
in wrong order, 2010-07-09)
060df62422 (fast-export: Fix output order of D/F changes, 2010-07-09)

However, it looks like both of those only considered D->F (directory
becomes a file) changes, whereas you specifically have a case of F->D
(file becoming a directory).

Honestly, looking back at those two patches of mine, I think both were
rather suboptimal.  A better solution that would handle both F->D and
D->F would be having fast-export sort the diff_filepairs such that it
processes the deletes before the modifies.  Another improved solution
would be having fast-import sort the files given to it and handling
deletes first.  Either should fix this.

Might be a good task for a new contributor.  Any takers?  (Tagging as
#leftoverbits.)

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: bug? round-trip through fast-import/fast-export loses files
  2023-03-21  1:57 ` Elijah Newren
@ 2023-03-21 15:54   ` Priedhorsky, Reid
  2023-03-21 17:07   ` Junio C Hamano
  2023-03-21 18:31   ` Jeff King
  2 siblings, 0 replies; 6+ messages in thread
From: Priedhorsky, Reid @ 2023-03-21 15:54 UTC (permalink / raw
  To: git@vger.kernel.org; +Cc: Elijah Newren


> On Mar 20, 2023, at 7:57 PM, Elijah Newren <newren@gmail.com> wrote:
> 
> Thanks for the well-written bug report, including not only a testcase
> but even the relevant bits of the fast-export output.

😁

> A better solution that would handle both F->D and
> D->F would be having fast-export sort the diff_filepairs such that it
> processes the deletes before the modifies.

In fact, the successful workaround in my own code is to re-order the fast-export output in exactly this way (i.e., sort the lines so all the D lines come before all the M lines).

> Another improved solution
> would be having fast-import sort the files given to it and handling
> deletes first.  Either should fix this.
> 
> Might be a good task for a new contributor.  Any takers?  (Tagging as
> #leftoverbits.)

Let me take a look, but I can’t guarantee anything.

HTH,
Reid

—
he/his


^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: bug? round-trip through fast-import/fast-export loses files
  2023-03-21  1:57 ` Elijah Newren
  2023-03-21 15:54   ` Priedhorsky, Reid
@ 2023-03-21 17:07   ` Junio C Hamano
  2023-03-21 18:31   ` Jeff King
  2 siblings, 0 replies; 6+ messages in thread
From: Junio C Hamano @ 2023-03-21 17:07 UTC (permalink / raw
  To: Elijah Newren; +Cc: Priedhorsky, Reid, git@vger.kernel.org

Elijah Newren <newren@gmail.com> writes:

> Honestly, looking back at those two patches of mine, I think both were
> rather suboptimal.  A better solution that would handle both F->D and
> D->F would be having fast-export sort the diff_filepairs such that it
> processes the deletes before the modifies.  Another improved solution
> would be having fast-import sort the files given to it and handling
> deletes first.  Either should fix this.

Reminds me of the trick we had to invent for "git apply", where we
process deletions first and then creation, to avoid the exact
problem of creating "foo" while removing "foo/bar" ;-)

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: bug? round-trip through fast-import/fast-export loses files
  2023-03-21  1:57 ` Elijah Newren
  2023-03-21 15:54   ` Priedhorsky, Reid
  2023-03-21 17:07   ` Junio C Hamano
@ 2023-03-21 18:31   ` Jeff King
  2023-03-22  3:07     ` Elijah Newren
  2 siblings, 1 reply; 6+ messages in thread
From: Jeff King @ 2023-03-21 18:31 UTC (permalink / raw
  To: Elijah Newren; +Cc: Miguel Torroja, Priedhorsky, Reid, git@vger.kernel.org

On Mon, Mar 20, 2023 at 06:57:21PM -0700, Elijah Newren wrote:

> Honestly, looking back at those two patches of mine, I think both were
> rather suboptimal.  A better solution that would handle both F->D and
> D->F would be having fast-export sort the diff_filepairs such that it
> processes the deletes before the modifies.  Another improved solution
> would be having fast-import sort the files given to it and handling
> deletes first.  Either should fix this.
> 
> Might be a good task for a new contributor.  Any takers?  (Tagging as
> #leftoverbits.)

There was a patch a while ago, but it didn't get applied:

  https://lore.kernel.org/git/1493079137-1838-1-git-send-email-miguel.torroja@gmail.com/

It got hung up on the fact that fast-export can also generate renames,
and ordering there is tricky. I stand by the sentiment from back then
that it is still worth it to order things to make the no-rename case
work, even if there are still corner cases with renames (since you can
have a cycle of renames, you should not use them if you want to be
robust against this kind of ordering dependency).

-Peff

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: bug? round-trip through fast-import/fast-export loses files
  2023-03-21 18:31   ` Jeff King
@ 2023-03-22  3:07     ` Elijah Newren
  0 siblings, 0 replies; 6+ messages in thread
From: Elijah Newren @ 2023-03-22  3:07 UTC (permalink / raw
  To: Jeff King; +Cc: Miguel Torroja, Priedhorsky, Reid, git@vger.kernel.org

On Tue, Mar 21, 2023 at 11:31 AM Jeff King <peff@peff.net> wrote:
>
> On Mon, Mar 20, 2023 at 06:57:21PM -0700, Elijah Newren wrote:
>
> > Honestly, looking back at those two patches of mine, I think both were
> > rather suboptimal.  A better solution that would handle both F->D and
> > D->F would be having fast-export sort the diff_filepairs such that it
> > processes the deletes before the modifies.  Another improved solution
> > would be having fast-import sort the files given to it and handling
> > deletes first.  Either should fix this.
> >
> > Might be a good task for a new contributor.  Any takers?  (Tagging as
> > #leftoverbits.)
>
> There was a patch a while ago, but it didn't get applied:
>
>   https://lore.kernel.org/git/1493079137-1838-1-git-send-email-miguel.torroja@gmail.com/
>
> It got hung up on the fact that fast-export can also generate renames,
> and ordering there is tricky. I stand by the sentiment from back then
> that it is still worth it to order things to make the no-rename case
> work, even if there are still corner cases with renames (since you can
> have a cycle of renames, you should not use them if you want to be
> robust against this kind of ordering dependency).

Ah, thanks for the heads up.  Stinkin' renames.  ;-)  I tend to forget
those in the fast-export/fast-import context since fast-export doesn't
detect renames by default (and e.g. filter-repo would turn them off if
they were on by default).

So, I think we probably need to have the fix in fast-import.
Currently, parse_new_commit() calls the various file_change_*()
functions immediately, which in turn immediately call the various
tree_content_*() functions (e.g. file_change_cr() for a rename will
call both tree_content_remove() and tree_content_set()).  Instead of
immediately handling all these, I think that queueing all the
file_change_*() calls up, splitting the renames into delete + modify,
and sorting all the deletes first should fix things for both F->D and
D->F, even when cycles of renames are present as per your example in
that thread.

^ permalink raw reply	[flat|nested] 6+ messages in thread

end of thread, other threads:[~2023-03-22  3:08 UTC | newest]

Thread overview: 6+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2023-03-20 17:10 bug? round-trip through fast-import/fast-export loses files Priedhorsky, Reid
2023-03-21  1:57 ` Elijah Newren
2023-03-21 15:54   ` Priedhorsky, Reid
2023-03-21 17:07   ` Junio C Hamano
2023-03-21 18:31   ` Jeff King
2023-03-22  3:07     ` Elijah Newren

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).