Git Mailing List Archive mirror
 help / color / mirror / Atom feed
From: Patrick Steinhardt <ps@pks.im>
To: blanet via GitGitGadget <gitgitgadget@gmail.com>
Cc: git@vger.kernel.org, blanet <bupt_xingxin@163.com>,
	Xing Xin <xingxin.xx@bytedance.com>
Subject: Re: [PATCH] bundle-uri: refresh packed_git if unbundle succeed
Date: Fri, 17 May 2024 07:00:49 +0200	[thread overview]
Message-ID: <ZkbkgT46mdNEhcf0@tanuki> (raw)
In-Reply-To: <pull.1730.git.1715742069966.gitgitgadget@gmail.com>

[-- Attachment #1: Type: text/plain, Size: 4820 bytes --]

On Wed, May 15, 2024 at 03:01:09AM +0000, blanet via GitGitGadget wrote:
> From: Xing Xin <xingxin.xx@bytedance.com>

Long time no see :)

> When using the bundle-uri mechanism with a bundle list containing
> multiple interrelated bundles, we encountered a bug where tips from
> downloaded bundles were not being discovered, resulting in rather slow
> clones. This was particularly problematic when employing the heuristic
> `creationTokens`.
> 
> And this is easy to reproduce. Suppose we have a repository with a
> single branch `main` pointing to commit `A`, firstly we create a base
> bundle with
> 
>   git bundle create base.bundle main
> 
> Then let's add a new commit `B` on top of `A`, so that an incremental
> bundle for `main` can be created with
> 
>   git bundle create incr.bundle A..main
> 
> Now we can generate a bundle list with the following content:
> 
>   [bundle]
>       version = 1
>       mode = all
>       heuristic = creationToken
> 
>   [bundle "base"]
>       uri = base.bundle
>       creationToken = 1
> 
>   [bundle "incr"]
>       uri = incr.bundle
>       creationToken = 2
> 
> A fresh clone with the bundle list above would give the expected
> `refs/bundles/main` pointing at `B` in new repository, in other words we
> already had everything locally from the bundles, but git would still
> download everything from server as if we got nothing.
> 
> So why the `refs/bundles/main` is not discovered? After some digging I
> found that:
> 
> 1. when unbundling a downloaded bundle, a `verify_bundle` is called to
>    check its prerequisites if any. The verify procedure would find oids
>    so `packed_git` is initialized.
> 
> 2. after unbundled all bundles, we would enter `do_fetch_pack_v2`,
>    during which `mark_complete_and_common_ref` and `mark_tips` would
>    find oids with `OBJECT_INFO_QUICK` flag set, so no new packs would be
>    enlisted if `packed_git` has already initialized in 1.

And I assume we do not want it to not use `OBJECT_INFO_QUICK`?

> Back to the example above, when unbunding `incr.bundle`, `base.pack` is
> enlisted to `packed_git` bacause of the prerequisites to verify. Then we
> can not find `B` for negotiation at a latter time bacause `B` exists in
> `incr.pack` which is not enlisted in `packed_git`.

Okay, the explanation feels sensible.

> This commit fixes this by adding a `reprepare_packed_git` call for every
> successfully unbundled bundle, which ensures to enlist all generated
> packs from bundle uri. And a set of negotiation related tests are added.

This makes me wonder though. Do we really need to call
`reprepare_packed_git()` once for every bundle, or can't we instead call
it once at the end once we have fetched all bundles? Reading on.

> Signed-off-by: Xing Xin <xingxin.xx@bytedance.com>
> ---
>     bundle-uri: refresh packed_git if unbundle succeed
> 
> Published-As: https://github.com/gitgitgadget/git/releases/tag/pr-1730%2Fblanet%2Fxx%2Fbundle-uri-bug-using-bundle-list-v1
> Fetch-It-Via: git fetch https://github.com/gitgitgadget/git pr-1730/blanet/xx/bundle-uri-bug-using-bundle-list-v1
> Pull-Request: https://github.com/gitgitgadget/git/pull/1730
> 
>  bundle-uri.c                |   3 +
>  t/t5558-clone-bundle-uri.sh | 129 ++++++++++++++++++++++++++++++++++++
>  2 files changed, 132 insertions(+)
> 
> diff --git a/bundle-uri.c b/bundle-uri.c
> index ca32050a78f..2b9d36cfd8e 100644
> --- a/bundle-uri.c
> +++ b/bundle-uri.c
> @@ -7,6 +7,7 @@
>  #include "refs.h"
>  #include "run-command.h"
>  #include "hashmap.h"
> +#include "packfile.h"
>  #include "pkt-line.h"
>  #include "config.h"
>  #include "remote.h"
> @@ -376,6 +377,8 @@ static int unbundle_from_file(struct repository *r, const char *file)
>  			       VERIFY_BUNDLE_QUIET)))
>  		return 1;
>  
> +	reprepare_packed_git(r);
> +

So what's hidden here is that `unbundle_from_file()` will also try to
access the bundle's refs right away. Surprisingly, we do so by calling
`refs_update_ref()` with `REF_SKIP_OID_VERIFICATION`, which has the
effect that we basically accept arbitrary object IDs here even if we do
not know those. That's why we didn't have to `reprepare_packed_git()`
before this change.

Now there are two conflicting thoughts here:

  - Either we can now drop `REF_SKIP_OID_VERIFICATION` as the object IDs
    should now be accessible.

  - Or we can avoid calling `reprepare_packed_git()` inside the loop and
    instead call it once after we have fetched all bundles.

The second one feels a bit like premature optimization to me. But the
first item does feel like it could help us to catch broken bundles
because we wouldn't end up creating refs for objects that neither we nor
the bundle have.

Patrick

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

  reply	other threads:[~2024-05-17  5:00 UTC|newest]

Thread overview: 66+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2024-05-15  3:01 [PATCH] bundle-uri: refresh packed_git if unbundle succeed blanet via GitGitGadget
2024-05-17  5:00 ` Patrick Steinhardt [this message]
2024-05-17 16:14   ` Junio C Hamano
2024-05-20 11:48     ` Xing Xin
2024-05-20 17:19       ` Junio C Hamano
2024-05-27 16:04         ` Xing Xin
2024-05-20  9:41   ` Xing Xin
2024-05-17  7:36 ` Karthik Nayak
2024-05-20 10:19   ` Xing Xin
2024-05-20 12:36 ` [PATCH v2] bundle-uri: verify oid before writing refs blanet via GitGitGadget
2024-05-21 15:41   ` Karthik Nayak
2024-05-27 15:41   ` [PATCH v3 0/4] object checking related additions and fixes for bundles in fetches blanet via GitGitGadget
2024-05-27 15:41     ` [PATCH v3 1/4] bundle-uri: verify oid before writing refs Xing Xin via GitGitGadget
2024-05-28 11:55       ` Patrick Steinhardt
2024-05-30  8:32         ` Xing Xin
2024-05-27 15:41     ` [PATCH v3 2/4] unbundle: introduce unbundle_fsck_flags for fsckobjects handling Xing Xin via GitGitGadget
2024-05-28 12:03       ` Patrick Steinhardt
2024-05-29 18:12         ` Xing Xin
2024-05-30  4:38           ` Patrick Steinhardt
2024-05-30  8:46             ` Xing Xin
2024-05-27 15:41     ` [PATCH v3 3/4] fetch-pack: expose fsckObjects configuration logic Xing Xin via GitGitGadget
2024-05-28 12:03       ` Patrick Steinhardt
2024-05-28 17:10         ` Junio C Hamano
2024-05-28 17:24           ` Junio C Hamano
2024-05-29  5:52             ` Patrick Steinhardt
2024-05-30  8:48             ` Xing Xin
2024-05-29  5:52           ` Patrick Steinhardt
2024-05-27 15:41     ` [PATCH v3 4/4] unbundle: introduce new option UNBUNDLE_FSCK_FOLLOW_FETCH Xing Xin via GitGitGadget
2024-05-28 12:05       ` Patrick Steinhardt
2024-05-30  8:54         ` Xing Xin
2024-05-30  8:21     ` [PATCH v4 0/4] object checking related additions and fixes for bundles in fetches blanet via GitGitGadget
2024-05-30  8:21       ` [PATCH v4 1/4] bundle-uri: verify oid before writing refs Xing Xin via GitGitGadget
2024-05-30  8:21       ` [PATCH v4 2/4] unbundle: extend verify_bundle_flags to support fsck-objects Xing Xin via GitGitGadget
2024-06-06 12:06         ` Patrick Steinhardt
2024-06-11  6:46           ` Xing Xin
2024-05-30  8:21       ` [PATCH v4 3/4] fetch-pack: expose fsckObjects configuration logic Xing Xin via GitGitGadget
2024-05-30  8:21       ` [PATCH v4 4/4] unbundle: introduce option VERIFY_BUNDLE_FSCK_FOLLOW_FETCH Xing Xin via GitGitGadget
2024-06-06 12:06         ` Patrick Steinhardt
2024-06-11  6:46           ` Xing Xin
2024-06-11  6:42       ` [PATCH v5 0/4] object checking related additions and fixes for bundles in fetches blanet via GitGitGadget
2024-06-11  6:42         ` [PATCH v5 1/4] bundle-uri: verify oid before writing refs Xing Xin via GitGitGadget
2024-06-11  6:42         ` [PATCH v5 2/4] fetch-pack: expose fsckObjects configuration logic Xing Xin via GitGitGadget
2024-06-11  6:42         ` [PATCH v5 3/4] unbundle: extend options to support object verification Xing Xin via GitGitGadget
2024-06-11  9:11           ` Patrick Steinhardt
2024-06-11 12:47             ` Xing Xin
2024-06-11  6:42         ` [PATCH v5 4/4] unbundle: use VERIFY_BUNDLE_FSCK_FOLLOW_FETCH for fetches Xing Xin via GitGitGadget
2024-06-11 12:45         ` [PATCH v6 0/3] object checking related additions and fixes for bundles in fetches blanet via GitGitGadget
2024-06-11 12:45           ` [PATCH v6 1/3] bundle-uri: verify oid before writing refs Xing Xin via GitGitGadget
2024-06-11 19:08             ` Junio C Hamano
2024-06-17 13:53               ` Xing Xin
2024-06-11 12:45           ` [PATCH v6 2/3] fetch-pack: expose fsckObjects configuration logic Xing Xin via GitGitGadget
2024-06-11 19:20             ` Junio C Hamano
2024-06-11 12:45           ` [PATCH v6 3/3] unbundle: support object verification for fetches Xing Xin via GitGitGadget
2024-06-11 20:05             ` Junio C Hamano
2024-06-12 18:33               ` Xing Xin
2024-06-11 13:14           ` [PATCH v6 0/3] object checking related additions and fixes for bundles in fetches Patrick Steinhardt
2024-06-17 13:55           ` [PATCH v7 " blanet via GitGitGadget
2024-06-17 13:55             ` [PATCH v7 1/3] bundle-uri: verify oid before writing refs Xing Xin via GitGitGadget
2024-06-18 17:37               ` Junio C Hamano
2024-06-19  6:30                 ` Xing Xin
2024-06-17 13:55             ` [PATCH v7 2/3] fetch-pack: expose fsckObjects configuration logic Xing Xin via GitGitGadget
2024-06-17 13:55             ` [PATCH v7 3/3] unbundle: extend object verification for fetches Xing Xin via GitGitGadget
2024-06-19  4:07             ` [PATCH v8 0/3] object checking related additions and fixes for bundles in fetches blanet via GitGitGadget
2024-06-19  4:07               ` [PATCH v8 1/3] bundle-uri: verify oid before writing refs Xing Xin via GitGitGadget
2024-06-19  4:07               ` [PATCH v8 2/3] fetch-pack: expose fsckObjects configuration logic Xing Xin via GitGitGadget
2024-06-19  4:07               ` [PATCH v8 3/3] unbundle: extend object verification for fetches Xing Xin via GitGitGadget

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=ZkbkgT46mdNEhcf0@tanuki \
    --to=ps@pks.im \
    --cc=bupt_xingxin@163.com \
    --cc=git@vger.kernel.org \
    --cc=gitgitgadget@gmail.com \
    --cc=xingxin.xx@bytedance.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).