Git Mailing List Archive mirror
From: Matt Cree <matt.cree@gearset.com>
To: git@vger.kernel.org
Subject: Issue with git log and reference repositories using --dissociate and --filter=blob:none
Date: Fri, 17 May 2024 14:52:45 +0100
Message-ID: <3B4EFAA3-0EE1-4C08-ADCD-7B03E184B016@gearset.com>

Hello all, I am working with an application that uses a reference cache for repositories. We use that cache on subsequent clones to save on some of the overheads of cloning the repositories each time we need them, which is fairly frequently.

We've run into an issue with git log combined with our use of the reference repository (and possibly --dissociate on the clone) where, for a while after the clone, the commit history cannot be fetched. It fails with the error:
> error: Could not read 2ffba4df2f9ec9df145fcdd84fe20a3d934b4555
> fatal: Failed to traverse parents of commit 7603ede45da4d396f2641b01e2ef3e13d49b572f

This is part of a user-facing feature, but thankfully it is extremely infrequent right now.

We do a partial clone of the repository with the following:
> git -c core.symlinks=false -c gc.auto=0 clone --reference ./$reference_repo --dissociate --filter=blob:none --no-checkout -b master https://github.com/mattcree/dissociate-clone-issue ./$clone_repo

Then we try to get the log:
> git -c core.symlinks=false -c gc.auto=0 log -100 --first-parent --pretty="%H %an %ct %s" master b3447a67238c760aa2845d32e5eb95b96e67c733

I set the following trace options to help with debugging:
> GIT_TRACE=2 GIT_CURL_VERBOSE=2 GIT_TRACE_PERFORMANCE=2 GIT_TRACE_PACK_ACCESS=2 GIT_TRACE_PACKET=2 GIT_TRACE_PACKFILE=2 GIT_TRACE_SETUP=2 GIT_TRACE_SHALLOW=2
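
As an illustration only (this is not the script from the gist), the trace variables above can be applied to a single command and the trace output kept in a file. The function name `run_traced` and the log file name are hypothetical.

```shell
# Hypothetical helper: run a command with the trace variables from this
# message set, writing the trace output (stderr) to the given file.
run_traced() {
    trace_log=$1
    shift
    env GIT_TRACE=2 GIT_CURL_VERBOSE=2 GIT_TRACE_PERFORMANCE=2 \
        GIT_TRACE_PACK_ACCESS=2 GIT_TRACE_PACKET=2 GIT_TRACE_PACKFILE=2 \
        GIT_TRACE_SETUP=2 GIT_TRACE_SHALLOW=2 \
        "$@" 2>"$trace_log"
}

# Example use (not executed here):
#   run_traced log-trace.txt git log -100 --first-parent master
```

Keeping the variables inside the helper means the rest of the shell session is not affected by them.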

From what I can tell, this is what is going on:
1. We clone the repository using --dissociate, which forces a repack of the cloned repository.
2. The clone completes quickly (on git 2.40+), but 'git-remote-https' is still running in the background.
3. The bug appears if I request the log while that process is still running.
4. Once the 'git-remote-https' process ends, the log can be requested successfully.
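
Because the failure in the steps above is transient, one possible stopgap is to retry the log a few times before giving up. This is only a sketch of a mitigation, not a fix for the underlying behaviour; the function name `retry` and the attempt and delay values are hypothetical.

```shell
# Hypothetical helper: run a command up to $1 times, sleeping $2 seconds
# between attempts, and stop as soon as it succeeds.
retry() {
    attempts=$1
    delay=$2
    shift 2
    n=1
    while ! "$@"; do
        if [ "$n" -ge "$attempts" ]; then
            return 1    # every attempt failed
        fi
        n=$((n + 1))
        sleep "$delay"
    done
    return 0
}

# Example use (not executed here):
#   retry 5 2 git log -100 --first-parent master
```

Whether retrying is acceptable depends on how long 'git-remote-https' keeps running, which I have not measured.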

As far as I can tell, git repack is called as part of the dissociate, which I guess forces an update to the pack files. For this reason, the pack file containing the commit objects may not exist at the moment git log is called.

This means that while going through the commits, git log eventually fails to find one here: https://github.com/git/git/blob/492ee03f60297e7e83d101f4519ab8abc98782bc/revision.c#L1106 -- this code path does seem aware of the possibility of missing objects, but in this case the arguments it has been given clearly don't stop it from failing.

When we remove '--dissociate' this issue does not appear. The decision to use dissociate was mainly just driven by perceived safety (I did not work on this part and can't say either way) -- we may fix it for now by removing this.

When we remove '--filter=blob:none' the issue also does not appear.
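
To make the comparison concrete, the three clone variants discussed above (both flags, without '--dissociate', without '--filter=blob:none') could be driven from one helper. This is a sketch under assumptions: the function name `clone_variant`, the variant names, and the `GIT` override are hypothetical; the git options themselves are the ones from the clone command earlier in this message.

```shell
# Allow the git binary to be overridden, e.g. GIT=echo for a dry run.
GIT=${GIT:-git}

# Hypothetical helper: clone_variant <variant> <reference_repo> <clone_repo>
clone_variant() {
    variant=$1 reference_repo=$2 clone_repo=$3
    case $variant in
        full)          extra="--dissociate --filter=blob:none" ;;
        no-dissociate) extra="--filter=blob:none" ;;
        no-filter)     extra="--dissociate" ;;
        *) echo "unknown variant: $variant" >&2; return 2 ;;
    esac
    # $extra is left unquoted on purpose so it splits into separate flags
    # (sh/bash word splitting is assumed).
    "$GIT" -c core.symlinks=false -c gc.auto=0 clone \
        --reference "$reference_repo" $extra \
        --no-checkout -b master \
        https://github.com/mattcree/dissociate-clone-issue "$clone_repo"
}

# Example use (not executed here):
#   clone_variant full ./reference ./clone-full
#   GIT=echo clone_variant no-filter ./reference ./clone-nofilter   # dry run
```

According to the observations above, only the "full" variant should be able to hit the error.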

When running the log command, Git 2.39.3 appears to print quite a bit of HTTP logging, e.g.:
> Info: [HTTP/2] [1] OPENED stream for https://github.com/mattcree/dissociate-clone-issue/info/refs?service=git-upload-pack

This output does not appear with 2.40+. I believe this is either a bug or a poorly handled scenario caused by the lazy retrieval of pack files, but I can't say for sure. This is the second time I'm coming to the mailing list with an 'is this a bug?' type of question -- it appears to me that it is, but our use case is fairly niche, so I wasn't sure whether we had found something. I've dug into the code a bit and there are a lot of moving parts that I am not familiar with, so I may have missed something, but I think I've covered the main points, and I have supplied a recreation below.

The following gist contains a recreation: the reference repository state, the repository to clone, a script for repeating the situation, and a selection from the log output of running with `--filter=blob:none`. It's probably easier to run the script yourself for the full output, though.
> https://gist.github.com/mattcree/b5fcd364c97219465f37b62598db36b0


Thread overview: 2+ messages
2024-05-17 13:52 Matt Cree [this message]
2024-05-21  9:55 ` Issue with git log and reference repositories using --dissociate and --filter=blob:none Jeff King
