* [PATCH] bundle-uri: refresh packed_git if unbundle succeed
@ 2024-05-15 3:01 blanet via GitGitGadget
2024-05-17 5:00 ` Patrick Steinhardt
` (2 more replies)
0 siblings, 3 replies; 66+ messages in thread
From: blanet via GitGitGadget @ 2024-05-15 3:01 UTC (permalink / raw)
To: git; +Cc: blanet, Xing Xin
From: Xing Xin <xingxin.xx@bytedance.com>
When using the bundle-uri mechanism with a bundle list containing
multiple interrelated bundles, we encountered a bug where tips from
downloaded bundles were not being discovered, resulting in rather slow
clones. This was particularly problematic when employing the heuristic
`creationTokens`.
And this is easy to reproduce. Suppose we have a repository with a
single branch `main` pointing to commit `A`, firstly we create a base
bundle with
git bundle create base.bundle main
Then let's add a new commit `B` on top of `A`, so that an incremental
bundle for `main` can be created with
git bundle create incr.bundle A..main
Now we can generate a bundle list with the following content:
[bundle]
version = 1
mode = all
heuristic = creationToken
[bundle "base"]
uri = base.bundle
creationToken = 1
[bundle "incr"]
uri = incr.bundle
creationToken = 2
A fresh clone with the bundle list above would give the expected
`refs/bundles/main` pointing at `B` in new repository, in other words we
already had everything locally from the bundles, but git would still
download everything from server as if we got nothing.
So why the `refs/bundles/main` is not discovered? After some digging I
found that:
1. when unbundling a downloaded bundle, a `verify_bundle` is called to
check its prerequisites if any. The verify procedure would find oids
so `packed_git` is initialized.
2. after unbundled all bundles, we would enter `do_fetch_pack_v2`,
during which `mark_complete_and_common_ref` and `mark_tips` would
find oids with `OBJECT_INFO_QUICK` flag set, so no new packs would be
enlisted if `packed_git` has already initialized in 1.
Back to the example above, when unbunding `incr.bundle`, `base.pack` is
enlisted to `packed_git` bacause of the prerequisites to verify. Then we
can not find `B` for negotiation at a latter time bacause `B` exists in
`incr.pack` which is not enlisted in `packed_git`.
This commit fixes this by adding a `reprepare_packed_git` call for every
successfully unbundled bundle, which ensures to enlist all generated
packs from bundle uri. And a set of negotiation related tests are added.
Signed-off-by: Xing Xin <xingxin.xx@bytedance.com>
---
bundle-uri: refresh packed_git if unbundle succeed
Published-As: https://github.com/gitgitgadget/git/releases/tag/pr-1730%2Fblanet%2Fxx%2Fbundle-uri-bug-using-bundle-list-v1
Fetch-It-Via: git fetch https://github.com/gitgitgadget/git pr-1730/blanet/xx/bundle-uri-bug-using-bundle-list-v1
Pull-Request: https://github.com/gitgitgadget/git/pull/1730
bundle-uri.c | 3 +
t/t5558-clone-bundle-uri.sh | 129 ++++++++++++++++++++++++++++++++++++
2 files changed, 132 insertions(+)
diff --git a/bundle-uri.c b/bundle-uri.c
index ca32050a78f..2b9d36cfd8e 100644
--- a/bundle-uri.c
+++ b/bundle-uri.c
@@ -7,6 +7,7 @@
#include "refs.h"
#include "run-command.h"
#include "hashmap.h"
+#include "packfile.h"
#include "pkt-line.h"
#include "config.h"
#include "remote.h"
@@ -376,6 +377,8 @@ static int unbundle_from_file(struct repository *r, const char *file)
VERIFY_BUNDLE_QUIET)))
return 1;
+ reprepare_packed_git(r);
+
/*
* Convert all refs/heads/ from the bundle into refs/bundles/
* in the local repository.
diff --git a/t/t5558-clone-bundle-uri.sh b/t/t5558-clone-bundle-uri.sh
index 1ca5f745e73..a5b04d6f187 100755
--- a/t/t5558-clone-bundle-uri.sh
+++ b/t/t5558-clone-bundle-uri.sh
@@ -20,7 +20,10 @@ test_expect_success 'fail to clone from non-bundle file' '
test_expect_success 'create bundle' '
git init clone-from &&
git -C clone-from checkout -b topic &&
+
test_commit -C clone-from A &&
+ git -C clone-from bundle create A.bundle topic &&
+
test_commit -C clone-from B &&
git -C clone-from bundle create B.bundle topic
'
@@ -259,6 +262,132 @@ test_expect_success 'clone bundle list (file, any mode, all failures)' '
! grep "refs/bundles/" refs
'
+#########################################################################
+# Clone negotiation related tests begin here
+
+test_expect_success 'negotiation: bundle with part of wanted commits' '
+ test_when_finished rm -rf trace*.txt &&
+ GIT_TRACE_PACKET="$(pwd)/trace-packet.txt" \
+ git clone --no-local --bundle-uri="clone-from/A.bundle" \
+ clone-from nego-bundle-part &&
+ git -C nego-bundle-part for-each-ref --format="%(refname)" >refs &&
+ grep "refs/bundles/" refs >actual &&
+ cat >expect <<-\EOF &&
+ refs/bundles/topic
+ EOF
+ test_cmp expect actual &&
+ # Ensure that refs/bundles/topic are sent as "have".
+ grep "clone> have $(git -C clone-from rev-parse A)" trace-packet.txt
+'
+
+test_expect_success 'negotiation: bundle with all wanted commits' '
+ test_when_finished rm -rf trace*.txt &&
+ GIT_TRACE_PACKET="$(pwd)/trace-packet.txt" \
+ git clone --no-local --single-branch --branch=topic --no-tags \
+ --bundle-uri="clone-from/B.bundle" \
+ clone-from nego-bundle-all &&
+ git -C nego-bundle-all for-each-ref --format="%(refname)" >refs &&
+ grep "refs/bundles/" refs >actual &&
+ cat >expect <<-\EOF &&
+ refs/bundles/topic
+ EOF
+ test_cmp expect actual &&
+ # We already have all needed commits so no "want" needed.
+ ! grep "clone> want " trace-packet.txt
+'
+
+test_expect_success 'negotiation: bundle list (no heuristic)' '
+ test_when_finished rm -f trace*.txt &&
+ cat >bundle-list <<-EOF &&
+ [bundle]
+ version = 1
+ mode = all
+
+ [bundle "bundle-1"]
+ uri = file://$(pwd)/clone-from/bundle-1.bundle
+
+ [bundle "bundle-2"]
+ uri = file://$(pwd)/clone-from/bundle-2.bundle
+ EOF
+
+ GIT_TRACE_PACKET="$(pwd)/trace-packet.txt" \
+ git clone --no-local --bundle-uri="file://$(pwd)/bundle-list" \
+ clone-from nego-bundle-list-no-heuristic &&
+
+ git -C nego-bundle-list-no-heuristic for-each-ref --format="%(refname)" >refs &&
+ grep "refs/bundles/" refs >actual &&
+ cat >expect <<-\EOF &&
+ refs/bundles/base
+ refs/bundles/left
+ EOF
+ test_cmp expect actual &&
+ grep "clone> have $(git -C nego-bundle-list-no-heuristic rev-parse refs/bundles/left)" trace-packet.txt
+'
+
+test_expect_success 'negotiation: bundle list (creationToken)' '
+ test_when_finished rm -f trace*.txt &&
+ cat >bundle-list <<-EOF &&
+ [bundle]
+ version = 1
+ mode = all
+ heuristic = creationToken
+
+ [bundle "bundle-1"]
+ uri = file://$(pwd)/clone-from/bundle-1.bundle
+ creationToken = 1
+
+ [bundle "bundle-2"]
+ uri = file://$(pwd)/clone-from/bundle-2.bundle
+ creationToken = 2
+ EOF
+
+ GIT_TRACE_PACKET="$(pwd)/trace-packet.txt" \
+ git clone --no-local --bundle-uri="file://$(pwd)/bundle-list" \
+ clone-from nego-bundle-list-heuristic &&
+
+ git -C nego-bundle-list-heuristic for-each-ref --format="%(refname)" >refs &&
+ grep "refs/bundles/" refs >actual &&
+ cat >expect <<-\EOF &&
+ refs/bundles/base
+ refs/bundles/left
+ EOF
+ test_cmp expect actual &&
+ grep "clone> have $(git -C nego-bundle-list-heuristic rev-parse refs/bundles/left)" trace-packet.txt
+'
+
+test_expect_success 'negotiation: bundle list with all wanted commits' '
+ test_when_finished rm -f trace*.txt &&
+ cat >bundle-list <<-EOF &&
+ [bundle]
+ version = 1
+ mode = all
+ heuristic = creationToken
+
+ [bundle "bundle-1"]
+ uri = file://$(pwd)/clone-from/bundle-1.bundle
+ creationToken = 1
+
+ [bundle "bundle-2"]
+ uri = file://$(pwd)/clone-from/bundle-2.bundle
+ creationToken = 2
+ EOF
+
+ GIT_TRACE_PACKET="$(pwd)/trace-packet.txt" \
+ git clone --no-local --single-branch --branch=left --no-tags \
+ --bundle-uri="file://$(pwd)/bundle-list" \
+ clone-from nego-bundle-list-all &&
+
+ git -C nego-bundle-list-all for-each-ref --format="%(refname)" >refs &&
+ grep "refs/bundles/" refs >actual &&
+ cat >expect <<-\EOF &&
+ refs/bundles/base
+ refs/bundles/left
+ EOF
+ test_cmp expect actual &&
+ # We already have all needed commits so no "want" needed.
+ ! grep "clone> want " trace-packet.txt
+'
+
#########################################################################
# HTTP tests begin here
base-commit: 83f1add914c6b4682de1e944ec0d1ac043d53d78
--
gitgitgadget
^ permalink raw reply related [flat|nested] 66+ messages in thread
* Re: [PATCH] bundle-uri: refresh packed_git if unbundle succeed
2024-05-15 3:01 [PATCH] bundle-uri: refresh packed_git if unbundle succeed blanet via GitGitGadget
@ 2024-05-17 5:00 ` Patrick Steinhardt
2024-05-17 16:14 ` Junio C Hamano
2024-05-20 9:41 ` Xing Xin
2024-05-17 7:36 ` Karthik Nayak
2024-05-20 12:36 ` [PATCH v2] bundle-uri: verify oid before writing refs blanet via GitGitGadget
2 siblings, 2 replies; 66+ messages in thread
From: Patrick Steinhardt @ 2024-05-17 5:00 UTC (permalink / raw)
To: blanet via GitGitGadget; +Cc: git, blanet, Xing Xin
[-- Attachment #1: Type: text/plain, Size: 4820 bytes --]
On Wed, May 15, 2024 at 03:01:09AM +0000, blanet via GitGitGadget wrote:
> From: Xing Xin <xingxin.xx@bytedance.com>
Long time no see :)
> When using the bundle-uri mechanism with a bundle list containing
> multiple interrelated bundles, we encountered a bug where tips from
> downloaded bundles were not being discovered, resulting in rather slow
> clones. This was particularly problematic when employing the heuristic
> `creationTokens`.
>
> And this is easy to reproduce. Suppose we have a repository with a
> single branch `main` pointing to commit `A`, firstly we create a base
> bundle with
>
> git bundle create base.bundle main
>
> Then let's add a new commit `B` on top of `A`, so that an incremental
> bundle for `main` can be created with
>
> git bundle create incr.bundle A..main
>
> Now we can generate a bundle list with the following content:
>
> [bundle]
> version = 1
> mode = all
> heuristic = creationToken
>
> [bundle "base"]
> uri = base.bundle
> creationToken = 1
>
> [bundle "incr"]
> uri = incr.bundle
> creationToken = 2
>
> A fresh clone with the bundle list above would give the expected
> `refs/bundles/main` pointing at `B` in new repository, in other words we
> already had everything locally from the bundles, but git would still
> download everything from server as if we got nothing.
>
> So why the `refs/bundles/main` is not discovered? After some digging I
> found that:
>
> 1. when unbundling a downloaded bundle, a `verify_bundle` is called to
> check its prerequisites if any. The verify procedure would find oids
> so `packed_git` is initialized.
>
> 2. after unbundled all bundles, we would enter `do_fetch_pack_v2`,
> during which `mark_complete_and_common_ref` and `mark_tips` would
> find oids with `OBJECT_INFO_QUICK` flag set, so no new packs would be
> enlisted if `packed_git` has already initialized in 1.
And I assume we do not want it to not use `OBJECT_INFO_QUICK`?
> Back to the example above, when unbunding `incr.bundle`, `base.pack` is
> enlisted to `packed_git` bacause of the prerequisites to verify. Then we
> can not find `B` for negotiation at a latter time bacause `B` exists in
> `incr.pack` which is not enlisted in `packed_git`.
Okay, the explanation feels sensible.
> This commit fixes this by adding a `reprepare_packed_git` call for every
> successfully unbundled bundle, which ensures to enlist all generated
> packs from bundle uri. And a set of negotiation related tests are added.
This makes me wonder though. Do we really need to call
`reprepare_packed_git()` once for every bundle, or can't we instead call
it once at the end once we have fetched all bundles? Reading on.
> Signed-off-by: Xing Xin <xingxin.xx@bytedance.com>
> ---
> bundle-uri: refresh packed_git if unbundle succeed
>
> Published-As: https://github.com/gitgitgadget/git/releases/tag/pr-1730%2Fblanet%2Fxx%2Fbundle-uri-bug-using-bundle-list-v1
> Fetch-It-Via: git fetch https://github.com/gitgitgadget/git pr-1730/blanet/xx/bundle-uri-bug-using-bundle-list-v1
> Pull-Request: https://github.com/gitgitgadget/git/pull/1730
>
> bundle-uri.c | 3 +
> t/t5558-clone-bundle-uri.sh | 129 ++++++++++++++++++++++++++++++++++++
> 2 files changed, 132 insertions(+)
>
> diff --git a/bundle-uri.c b/bundle-uri.c
> index ca32050a78f..2b9d36cfd8e 100644
> --- a/bundle-uri.c
> +++ b/bundle-uri.c
> @@ -7,6 +7,7 @@
> #include "refs.h"
> #include "run-command.h"
> #include "hashmap.h"
> +#include "packfile.h"
> #include "pkt-line.h"
> #include "config.h"
> #include "remote.h"
> @@ -376,6 +377,8 @@ static int unbundle_from_file(struct repository *r, const char *file)
> VERIFY_BUNDLE_QUIET)))
> return 1;
>
> + reprepare_packed_git(r);
> +
So what's hidden here is that `unbundle_from_file()` will also try to
access the bundle's refs right away. Surprisingly, we do so by calling
`refs_update_ref()` with `REF_SKIP_OID_VERIFICATION`, which has the
effect that we basically accept arbitrary object IDs here even if we do
not know those. That's why we didn't have to `reprepare_packed_git()`
before this change.
Now there are two conflicting thoughts here:
- Either we can now drop `REF_SKIP_OID_VERIFICATION` as the object IDs
should now be accessible.
- Or we can avoid calling `reprepare_packed_git()` inside the loop and
instead call it once after we have fetched all bundles.
The second one feels a bit like premature optimization to me. But the
first item does feel like it could help us to catch broken bundles
because we wouldn't end up creating refs for objects that neither we nor
the bundle have.
Patrick
[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 833 bytes --]
^ permalink raw reply [flat|nested] 66+ messages in thread
* Re: [PATCH] bundle-uri: refresh packed_git if unbundle succeed
2024-05-15 3:01 [PATCH] bundle-uri: refresh packed_git if unbundle succeed blanet via GitGitGadget
2024-05-17 5:00 ` Patrick Steinhardt
@ 2024-05-17 7:36 ` Karthik Nayak
2024-05-20 10:19 ` Xing Xin
2024-05-20 12:36 ` [PATCH v2] bundle-uri: verify oid before writing refs blanet via GitGitGadget
2 siblings, 1 reply; 66+ messages in thread
From: Karthik Nayak @ 2024-05-17 7:36 UTC (permalink / raw)
To: blanet via GitGitGadget, git; +Cc: blanet, Xing Xin
[-- Attachment #1: Type: text/plain, Size: 4706 bytes --]
"blanet via GitGitGadget" <gitgitgadget@gmail.com> writes:
> From: Xing Xin <xingxin.xx@bytedance.com>>
> When using the bundle-uri mechanism with a bundle list containing
> multiple interrelated bundles, we encountered a bug where tips from
> downloaded bundles were not being discovered, resulting in rather slow
> clones. This was particularly problematic when employing the heuristic
> `creationTokens`.
>
> And this is easy to reproduce. Suppose we have a repository with a
> single branch `main` pointing to commit `A`, firstly we create a base
> bundle with
>
> git bundle create base.bundle main
>
> Then let's add a new commit `B` on top of `A`, so that an incremental
> bundle for `main` can be created with
>
> git bundle create incr.bundle A..main
>
> Now we can generate a bundle list with the following content:
>
> [bundle]
> version = 1
> mode = all
> heuristic = creationToken
>
> [bundle "base"]
> uri = base.bundle
> creationToken = 1
>
> [bundle "incr"]
> uri = incr.bundle
> creationToken = 2
>
> A fresh clone with the bundle list above would give the expected
> `refs/bundles/main` pointing at `B` in new repository, in other words we
> already had everything locally from the bundles, but git would still
> download everything from server as if we got nothing.
>
> So why the `refs/bundles/main` is not discovered? After some digging I
> found that:
>
> 1. when unbundling a downloaded bundle, a `verify_bundle` is called to
s/a//
> check its prerequisites if any. The verify procedure would find oids
> so `packed_git` is initialized.
>
So the flow is:
1. `fetch_bundle_list` fetches all the bundles advertised via
`download_bundle_list` to local files.
2. It then calls `unbundle_all_bundles` to unbundle all the bundles.
3. Each bundle is then unbundled using `unbundle_from_file`.
4. Here, we first read the bundle header to get all the prerequisites
for the bundle, this is done in `read_bundle_header`.
5. Then we call `unbundle`, which calls `verify_bundle` to ensure that
the repository does indeed contain the prerequisites mentioned in the
bundle. Then it creates the index from the bundle file.
So because the objects are being checked, the `prepare_packed_git`
function is eventually called, which means that the
`raw_object_store->packed_git` data gets filled in and
`packed_git_initialized` is set.
This means consecutive calls to `prepare_packed_git` doesn't
re-initiate `raw_object_store->packed_git` since
`packed_git_initialized` already is set.
So your explanation makes sense, as a _nit_ I would perhaps add the part
about why consecutive calls to `prepare_packed_git` are ineffective.
> 2. after unbundled all bundles, we would enter `do_fetch_pack_v2`,
s/unbundled/unbundling
> during which `mark_complete_and_common_ref` and `mark_tips` would
> find oids with `OBJECT_INFO_QUICK` flag set, so no new packs would be
> enlisted if `packed_git` has already initialized in 1.
> Back to the example above, when unbunding `incr.bundle`, `base.pack` is
> enlisted to `packed_git` bacause of the prerequisites to verify. Then we
> can not find `B` for negotiation at a latter time bacause `B` exists in
> `incr.pack` which is not enlisted in `packed_git`.
>
> This commit fixes this by adding a `reprepare_packed_git` call for every
> successfully unbundled bundle, which ensures to enlist all generated
> packs from bundle uri. And a set of negotiation related tests are added.
>
The solution makes sense.
> Signed-off-by: Xing Xin <xingxin.xx@bytedance.com>
> ---
> bundle-uri: refresh packed_git if unbundle succeed
>
> Published-As: https://github.com/gitgitgadget/git/releases/tag/pr-1730%2Fblanet%2Fxx%2Fbundle-uri-bug-using-bundle-list-v1
> Fetch-It-Via: git fetch https://github.com/gitgitgadget/git pr-1730/blanet/xx/bundle-uri-bug-using-bundle-list-v1
> Pull-Request: https://github.com/gitgitgadget/git/pull/1730
>
> bundle-uri.c | 3 +
> t/t5558-clone-bundle-uri.sh | 129 ++++++++++++++++++++++++++++++++++++
> 2 files changed, 132 insertions(+)
>
> diff --git a/bundle-uri.c b/bundle-uri.c
> index ca32050a78f..2b9d36cfd8e 100644
> --- a/bundle-uri.c
> +++ b/bundle-uri.c
> @@ -7,6 +7,7 @@
> #include "refs.h"
> #include "run-command.h"
> #include "hashmap.h"
> +#include "packfile.h"
> #include "pkt-line.h"
> #include "config.h"
> #include "remote.h"
> @@ -376,6 +377,8 @@ static int unbundle_from_file(struct repository *r, const char *file)
> VERIFY_BUNDLE_QUIET)))
> return 1;
>
> + reprepare_packed_git(r);
> +
>
Would it make sense to move this to `bundle.c:unbundle()`, since that is
also where the idx is created?
[snip]
[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 690 bytes --]
^ permalink raw reply [flat|nested] 66+ messages in thread
* Re: [PATCH] bundle-uri: refresh packed_git if unbundle succeed
2024-05-17 5:00 ` Patrick Steinhardt
@ 2024-05-17 16:14 ` Junio C Hamano
2024-05-20 11:48 ` Xing Xin
2024-05-20 9:41 ` Xing Xin
1 sibling, 1 reply; 66+ messages in thread
From: Junio C Hamano @ 2024-05-17 16:14 UTC (permalink / raw)
To: Patrick Steinhardt; +Cc: blanet via GitGitGadget, git, blanet, Xing Xin
Patrick Steinhardt <ps@pks.im> writes:
> Now there are two conflicting thoughts here:
>
> - Either we can now drop `REF_SKIP_OID_VERIFICATION` as the object IDs
> should now be accessible.
>
> - Or we can avoid calling `reprepare_packed_git()` inside the loop and
> instead call it once after we have fetched all bundles.
>
> The second one feels a bit like premature optimization to me. But the
> first item does feel like it could help us to catch broken bundles
> because we wouldn't end up creating refs for objects that neither we nor
> the bundle have.
I like the way your thoughts are structured around here.
I do agree that the latter is a wrong approach---we shouldn't be
trusting what came from elsewhere over the network without first
checking. We should probably be running the "index-pack --fix-thin"
the unbundling process runs with also the "--fsck-objects" option if
we are not doing so already, and even then, we should make sure that
the object we are making our ref point at have everything behind it.
^ permalink raw reply [flat|nested] 66+ messages in thread
* Re:Re: [PATCH] bundle-uri: refresh packed_git if unbundle succeed
2024-05-17 5:00 ` Patrick Steinhardt
2024-05-17 16:14 ` Junio C Hamano
@ 2024-05-20 9:41 ` Xing Xin
1 sibling, 0 replies; 66+ messages in thread
From: Xing Xin @ 2024-05-20 9:41 UTC (permalink / raw)
To: Patrick Steinhardt; +Cc: blanet via GitGitGadget, git, Xing Xin
At 2024-05-17 13:00:49, "Patrick Steinhardt" <ps@pks.im> wrote:
>On Wed, May 15, 2024 at 03:01:09AM +0000, blanet via GitGitGadget wrote:
>> From: Xing Xin <xingxin.xx@bytedance.com>
>
>Long time no see :)
Glad to see you again here :)
>>
>> So why the `refs/bundles/main` is not discovered? After some digging I
>> found that:
>>
>> 1. when unbundling a downloaded bundle, a `verify_bundle` is called to
>> check its prerequisites if any. The verify procedure would find oids
>> so `packed_git` is initialized.
>>
>> 2. after unbundled all bundles, we would enter `do_fetch_pack_v2`,
>> during which `mark_complete_and_common_ref` and `mark_tips` would
>> find oids with `OBJECT_INFO_QUICK` flag set, so no new packs would be
>> enlisted if `packed_git` has already initialized in 1.
>
>And I assume we do not want it to not use `OBJECT_INFO_QUICK`?
I think so. For clones or fetches without using bundle-uri, we can hardly
encounter the case that new packs are added during the negotiation process.
So using `OBJECT_INFO_QUICK` can get some performance gain.
>
>> Back to the example above, when unbunding `incr.bundle`, `base.pack` is
>> enlisted to `packed_git` bacause of the prerequisites to verify. Then we
>> can not find `B` for negotiation at a latter time bacause `B` exists in
>> `incr.pack` which is not enlisted in `packed_git`.
>
>Okay, the explanation feels sensible.
>
>> This commit fixes this by adding a `reprepare_packed_git` call for every
>> successfully unbundled bundle, which ensures to enlist all generated
>> packs from bundle uri. And a set of negotiation related tests are added.
>
>This makes me wonder though. Do we really need to call
>`reprepare_packed_git()` once for every bundle, or can't we instead call
>it once at the end once we have fetched all bundles? Reading on.
>
>> Signed-off-by: Xing Xin <xingxin.xx@bytedance.com>
>> ---
>> bundle-uri: refresh packed_git if unbundle succeed
>>
>> Published-As: https://github.com/gitgitgadget/git/releases/tag/pr-1730%2Fblanet%2Fxx%2Fbundle-uri-bug-using-bundle-list-v1
>> Fetch-It-Via: git fetch https://github.com/gitgitgadget/git pr-1730/blanet/xx/bundle-uri-bug-using-bundle-list-v1
>> Pull-Request: https://github.com/gitgitgadget/git/pull/1730
>>
>> bundle-uri.c | 3 +
>> t/t5558-clone-bundle-uri.sh | 129 ++++++++++++++++++++++++++++++++++++
>> 2 files changed, 132 insertions(+)
>>
>> diff --git a/bundle-uri.c b/bundle-uri.c
>> index ca32050a78f..2b9d36cfd8e 100644
>> --- a/bundle-uri.c
>> +++ b/bundle-uri.c
>> @@ -7,6 +7,7 @@
>> #include "refs.h"
>> #include "run-command.h"
>> #include "hashmap.h"
>> +#include "packfile.h"
>> #include "pkt-line.h"
>> #include "config.h"
>> #include "remote.h"
>> @@ -376,6 +377,8 @@ static int unbundle_from_file(struct repository *r, const char *file)
>> VERIFY_BUNDLE_QUIET)))
>> return 1;
>>
>> + reprepare_packed_git(r);
>> +
>
>So what's hidden here is that `unbundle_from_file()` will also try to
>access the bundle's refs right away. Surprisingly, we do so by calling
>`refs_update_ref()` with `REF_SKIP_OID_VERIFICATION`, which has the
>effect that we basically accept arbitrary object IDs here even if we do
>not know those. That's why we didn't have to `reprepare_packed_git()`
>before this change.
You are right! I tried dropping this `REF_SKIP_OID_VERIFICATION` flag and
the negotiation works as expected.
After some further digging I find that without `REF_SKIP_OID_VERIFICATION`,
both `write_ref_to_lockfile` for files backend and `reftable_be_transaction_prepare`
for reftable backend would call `parse_object` to check the oid. `parse_object`
can help refresh `packed_git` via `reprepare_packed_git`.
>
>Now there are two conflicting thoughts here:
>
> - Either we can now drop `REF_SKIP_OID_VERIFICATION` as the object IDs
> should now be accessible.
>
> - Or we can avoid calling `reprepare_packed_git()` inside the loop and
> instead call it once after we have fetched all bundles.
>
>The second one feels a bit like premature optimization to me. But the
>first item does feel like it could help us to catch broken bundles
>because we wouldn't end up creating refs for objects that neither we nor
>the bundle have.
I favor the first approach because a validation on the object IDs we are
writing is a safe guard . And the flag itself was designed to be used in
testing scenarios.
/*
* Blindly write an object_id. This is useful for testing data corruption
* scenarios.
*/
#define REF_SKIP_OID_VERIFICATION (1 << 10)
^ permalink raw reply [flat|nested] 66+ messages in thread
* Re:Re: [PATCH] bundle-uri: refresh packed_git if unbundle succeed
2024-05-17 7:36 ` Karthik Nayak
@ 2024-05-20 10:19 ` Xing Xin
0 siblings, 0 replies; 66+ messages in thread
From: Xing Xin @ 2024-05-20 10:19 UTC (permalink / raw)
To: Karthik Nayak; +Cc: blanet via GitGitGadget, git, Xing Xin
At 2024-05-17 15:36:53, "Karthik Nayak" <karthik.188@gmail.com> wrote:
>"blanet via GitGitGadget" <gitgitgadget@gmail.com> writes:
>
>> From: Xing Xin <xingxin.xx@bytedance.com>>
>> When using the bundle-uri mechanism with a bundle list containing
>> multiple interrelated bundles, we encountered a bug where tips from
>> downloaded bundles were not being discovered, resulting in rather slow
>> clones. This was particularly problematic when employing the heuristic
>> `creationTokens`.
>>
>> And this is easy to reproduce. Suppose we have a repository with a
>> single branch `main` pointing to commit `A`, firstly we create a base
>> bundle with
>>
>> git bundle create base.bundle main
>>
>> Then let's add a new commit `B` on top of `A`, so that an incremental
>> bundle for `main` can be created with
>>
>> git bundle create incr.bundle A..main
>>
>> Now we can generate a bundle list with the following content:
>>
>> [bundle]
>> version = 1
>> mode = all
>> heuristic = creationToken
>>
>> [bundle "base"]
>> uri = base.bundle
>> creationToken = 1
>>
>> [bundle "incr"]
>> uri = incr.bundle
>> creationToken = 2
>>
>> A fresh clone with the bundle list above would give the expected
>> `refs/bundles/main` pointing at `B` in new repository, in other words we
>> already had everything locally from the bundles, but git would still
>> download everything from server as if we got nothing.
>>
>> So why the `refs/bundles/main` is not discovered? After some digging I
>> found that:
>>
>> 1. when unbundling a downloaded bundle, a `verify_bundle` is called to
>
>s/a//
Thanks!
>
>> check its prerequisites if any. The verify procedure would find oids
>> so `packed_git` is initialized.
>>
>
>So the flow is:
>1. `fetch_bundle_list` fetches all the bundles advertised via
>`download_bundle_list` to local files.
>2. It then calls `unbundle_all_bundles` to unbundle all the bundles.
>3. Each bundle is then unbundled using `unbundle_from_file`.
>4. Here, we first read the bundle header to get all the prerequisites
>for the bundle, this is done in `read_bundle_header`.
>5. Then we call `unbundle`, which calls `verify_bundle` to ensure that
>the repository does indeed contain the prerequisites mentioned in the
>bundle. Then it creates the index from the bundle file.
>
>So because the objects are being checked, the `prepare_packed_git`
>function is eventually called, which means that the
>`raw_object_store->packed_git` data gets filled in and
>`packed_git_initialized` is set.
>
>This means consecutive calls to `prepare_packed_git` doesn't
>re-initiate `raw_object_store->packed_git` since
>`packed_git_initialized` already is set.
>
>So your explanation makes sense, as a _nit_ I would perhaps add the part
>about why consecutive calls to `prepare_packed_git` are ineffective.
Thanks my friend, you have expressed this issue more clearly. I will
post a new description based on your explanation with the creationToken
case covered.
>
>> 2. after unbundled all bundles, we would enter `do_fetch_pack_v2`,
>
>s/unbundled/unbundling
Copy that.
>
>> +#include "packfile.h"
>> #include "pkt-line.h"
>> #include "config.h"
>> #include "remote.h"
>> @@ -376,6 +377,8 @@ static int unbundle_from_file(struct repository *r, const char *file)
>> VERIFY_BUNDLE_QUIET)))
>> return 1;
>>
>> + reprepare_packed_git(r);
>> +
>>
>
>Would it make sense to move this to `bundle.c:unbundle()`, since that is
>also where the idx is created?
>
I wonder if we need a mental model that we should `reprepare_packed_git`
that when a new pack and its corresponding idx is generated? Currently
whether to call `reprepare_packed_git` is determined by the caller.
But within the scope of this bug, I tend to remove the
`REF_SKIP_OID_VERIFICATION` flag when writing refs as Patrick suggested.
Xing Xin
^ permalink raw reply [flat|nested] 66+ messages in thread
* Re:Re: [PATCH] bundle-uri: refresh packed_git if unbundle succeed
2024-05-17 16:14 ` Junio C Hamano
@ 2024-05-20 11:48 ` Xing Xin
2024-05-20 17:19 ` Junio C Hamano
0 siblings, 1 reply; 66+ messages in thread
From: Xing Xin @ 2024-05-20 11:48 UTC (permalink / raw)
To: Junio C Hamano; +Cc: Patrick Steinhardt, blanet via GitGitGadget, git, Xing Xin
At 2024-05-18 00:14:47, "Junio C Hamano" <gitster@pobox.com> wrote:
>Patrick Steinhardt <ps@pks.im> writes:
>
>> Now there are two conflicting thoughts here:
>>
>> - Either we can now drop `REF_SKIP_OID_VERIFICATION` as the object IDs
>> should now be accessible.
>>
>> - Or we can avoid calling `reprepare_packed_git()` inside the loop and
>> instead call it once after we have fetched all bundles.
>>
>> The second one feels a bit like premature optimization to me. But the
>> first item does feel like it could help us to catch broken bundles
>> because we wouldn't end up creating refs for objects that neither we nor
>> the bundle have.
>
>I like the way your thoughts are structured around here.
>
>I do agree that the latter is a wrong approach---we shouldn't be
>trusting what came from elsewhere over the network without first
>checking. We should probably be running the "index-pack --fix-thin"
>the unbundling process runs with also the "--fsck-objects" option if
>we are not doing so already, and even then, we should make sure that
>the object we are making our ref point at have everything behind it.
Currently `unbundle` and all its callers are not adding "--fsck-objects".
There is a param `extra_index_pack_args` for `unbundle` but it is
mainly used for progress related options.
Personally I think data from bundles and data received via network
should be treated equally. For "fetch-pack" we now have some configs
such as "fetch.fsckobjects" and "transfer.fsckobjects" to decide the
behavior, these configs are invisible when we are fetching bundles.
So for bundles I think we have some different ways to go:
- Acknowledge the configs mentioned above and behave like
"fetch-pack".
- Add "--fsck-objects" as a default in `unbundle`.
- In `unbundle_from_file`, pass in "--fsck-objects" as
`extra_index_pack_args` for `unbundle` so this only affect the
bundle-uri related process.
What do you think?
Xing Xin
^ permalink raw reply [flat|nested] 66+ messages in thread
* [PATCH v2] bundle-uri: verify oid before writing refs
2024-05-15 3:01 [PATCH] bundle-uri: refresh packed_git if unbundle succeed blanet via GitGitGadget
2024-05-17 5:00 ` Patrick Steinhardt
2024-05-17 7:36 ` Karthik Nayak
@ 2024-05-20 12:36 ` blanet via GitGitGadget
2024-05-21 15:41 ` Karthik Nayak
2024-05-27 15:41 ` [PATCH v3 0/4] object checking related additions and fixes for bundles in fetches blanet via GitGitGadget
2 siblings, 2 replies; 66+ messages in thread
From: blanet via GitGitGadget @ 2024-05-20 12:36 UTC (permalink / raw)
To: git; +Cc: blanet, Xing Xin
From: Xing Xin <xingxin.xx@bytedance.com>
When using the bundle-uri mechanism with a bundle list containing
multiple interrelated bundles, we encountered a bug where tips from
downloaded bundles were not discovered, thus resulting in rather slow
clones. This was particularly problematic when employing the heuristic
`creationTokens`.
And this is easy to reproduce. Suppose we have a repository with a
single branch `main` pointing to commit `A`, firstly we create a base
bundle with
git bundle create base.bundle main
Then let's add a new commit `B` on top of `A`, so that an incremental
bundle for `main` can be created with
git bundle create incr.bundle A..main
Now we can generate a bundle list with the following content:
[bundle]
version = 1
mode = all
heuristic = creationToken
[bundle "base"]
uri = base.bundle
creationToken = 1
[bundle "incr"]
uri = incr.bundle
creationToken = 2
A fresh clone with the bundle list above would give the expected
`refs/bundles/main` pointing at `B` in new repository, in other words we
already had everything locally from the bundles, but git would still
download everything from server as if we got nothing.
So why the `refs/bundles/main` is not discovered? After some digging I
found that:
1. Bundles in bundle list are downloaded to local files via
`download_bundle_list` or via `fetch_bundles_by_token` for the
creationToken heuristic case.
2. Then it tries to unbundle each bundle via `unbundle_from_file`, which
is called by `unbundle_all_bundles` or called within
`fetch_bundles_by_token` for the creationToken heuristic case.
3. Here, we first read the bundle header to get all the prerequisites
for the bundle, this is done in `read_bundle_header`.
4. Then we call `unbundle`, which calls `verify_bundle` to ensure that
the repository does indeed contain the prerequisites mentioned in the
bundle.
5. The `verify_bundle` will call `parse_object`, within which the
`prepare_packed_git` or `reprepare_packed_git` is eventually called,
which means that the `raw_object_store->packed_git` data gets filled
in and ``packed_git_initialized` is set. This also means consecutive
calls to `prepare_packed_git` doesn't re-initiate
`raw_object_store->packed_git` since `packed_git_initialized` already
is set.
6. If `unbundle` succeeds, it writes some refs via `refs_update_ref`
with `REF_SKIP_OID_VERIFICATION` set. So the bundle refs which can
target arbitrary objects are written to the repository.
7. Finally in `do_fetch_pack_v2`, `mark_complete_and_common_ref` and
`mark_tips` are called with `OBJECT_INFO_QUICK` set to find local
tips. Here it won't call `reprepare_packed_git` anymore so it would
fail to parse oids that only reside in the last bundle.
Back to the example above, when unbunding `incr.bundle`, `base.pack` is
enlisted to `packed_git` bacause of the prerequisites to verify. While
we can not find `B` for negotiation at a latter time because `B` exists
in `incr.pack` which is not enlisted in `packed_git`.
This commit fixes this bug by dropping the `REF_SKIP_OID_VERIFICATION`
flag when writing bundle refs, so we can:
1. Ensure that the bundle refs we are writing are pointing to valid
objects.
2. Ensure all the tips from bundle refs can be correctly parsed.
And a set of negotiation related tests for bundle-uri are added.
Signed-off-by: Xing Xin <xingxin.xx@bytedance.com>
---
bundle-uri: refresh packed_git if unbundle succeed
cc: Patrick Steinhardt ps@pks.im cc: Karthik Nayak karthik.188@gmail.com
Published-As: https://github.com/gitgitgadget/git/releases/tag/pr-1730%2Fblanet%2Fxx%2Fbundle-uri-bug-using-bundle-list-v2
Fetch-It-Via: git fetch https://github.com/gitgitgadget/git pr-1730/blanet/xx/bundle-uri-bug-using-bundle-list-v2
Pull-Request: https://github.com/gitgitgadget/git/pull/1730
Range-diff vs v1:
1: d4841eea556 ! 1: 8bdeacf1360 bundle-uri: refresh packed_git if unbundle succeed
@@ Metadata
Author: Xing Xin <xingxin.xx@bytedance.com>
## Commit message ##
- bundle-uri: refresh packed_git if unbundle succeed
+ bundle-uri: verify oid before writing refs
When using the bundle-uri mechanism with a bundle list containing
multiple interrelated bundles, we encountered a bug where tips from
- downloaded bundles were not being discovered, resulting in rather slow
+ downloaded bundles were not discovered, thus resulting in rather slow
clones. This was particularly problematic when employing the heuristic
`creationTokens`.
@@ Commit message
So why the `refs/bundles/main` is not discovered? After some digging I
found that:
- 1. when unbundling a downloaded bundle, a `verify_bundle` is called to
- check its prerequisites if any. The verify procedure would find oids
- so `packed_git` is initialized.
-
- 2. after unbundled all bundles, we would enter `do_fetch_pack_v2`,
- during which `mark_complete_and_common_ref` and `mark_tips` would
- find oids with `OBJECT_INFO_QUICK` flag set, so no new packs would be
- enlisted if `packed_git` has already initialized in 1.
+ 1. Bundles in bundle list are downloaded to local files via
+ `download_bundle_list` or via `fetch_bundles_by_token` for the
+ creationToken heuristic case.
+ 2. Then it tries to unbundle each bundle via `unbundle_from_file`, which
+ is called by `unbundle_all_bundles` or called within
+ `fetch_bundles_by_token` for the creationToken heuristic case.
+ 3. Here, we first read the bundle header to get all the prerequisites
+ for the bundle, this is done in `read_bundle_header`.
+ 4. Then we call `unbundle`, which calls `verify_bundle` to ensure that
+ the repository does indeed contain the prerequisites mentioned in the
+ bundle.
+ 5. The `verify_bundle` will call `parse_object`, within which the
+ `prepare_packed_git` or `reprepare_packed_git` is eventually called,
+ which means that the `raw_object_store->packed_git` data gets filled
+ in and ``packed_git_initialized` is set. This also means consecutive
+ calls to `prepare_packed_git` doesn't re-initiate
+ `raw_object_store->packed_git` since `packed_git_initialized` already
+ is set.
+ 6. If `unbundle` succeeds, it writes some refs via `refs_update_ref`
+ with `REF_SKIP_OID_VERIFICATION` set. So the bundle refs which can
+ target arbitrary objects are written to the repository.
+ 7. Finally in `do_fetch_pack_v2`, `mark_complete_and_common_ref` and
+ `mark_tips` are called with `OBJECT_INFO_QUICK` set to find local
+ tips. Here it won't call `reprepare_packed_git` anymore so it would
+ fail to parse oids that only reside in the last bundle.
Back to the example above, when unbunding `incr.bundle`, `base.pack` is
- enlisted to `packed_git` bacause of the prerequisites to verify. Then we
- can not find `B` for negotiation at a latter time bacause `B` exists in
- `incr.pack` which is not enlisted in `packed_git`.
+ enlisted to `packed_git` bacause of the prerequisites to verify. While
+ we can not find `B` for negotiation at a latter time because `B` exists
+ in `incr.pack` which is not enlisted in `packed_git`.
+
+ This commit fixes this bug by dropping the `REF_SKIP_OID_VERIFICATION`
+ flag when writing bundle refs, so we can:
- This commit fixes this by adding a `reprepare_packed_git` call for every
- successfully unbundled bundle, which ensures to enlist all generated
- packs from bundle uri. And a set of negotiation related tests are added.
+ 1. Ensure that the bundle refs we are writing are pointing to valid
+ objects.
+ 2. Ensure all the tips from bundle refs can be correctly parsed.
+
+ And a set of negotiation related tests for bundle-uri are added.
Signed-off-by: Xing Xin <xingxin.xx@bytedance.com>
## bundle-uri.c ##
-@@
- #include "refs.h"
- #include "run-command.h"
- #include "hashmap.h"
-+#include "packfile.h"
- #include "pkt-line.h"
- #include "config.h"
- #include "remote.h"
@@ bundle-uri.c: static int unbundle_from_file(struct repository *r, const char *file)
- VERIFY_BUNDLE_QUIET)))
- return 1;
+ refs_update_ref(get_main_ref_store(the_repository),
+ "fetched bundle", bundle_ref.buf, oid,
+ has_old ? &old_oid : NULL,
+- REF_SKIP_OID_VERIFICATION,
+- UPDATE_REFS_MSG_ON_ERR);
++ 0, UPDATE_REFS_MSG_ON_ERR);
+ }
-+ reprepare_packed_git(r);
-+
- /*
- * Convert all refs/heads/ from the bundle into refs/bundles/
- * in the local repository.
+ bundle_header_release(&header);
## t/t5558-clone-bundle-uri.sh ##
@@ t/t5558-clone-bundle-uri.sh: test_expect_success 'fail to clone from non-bundle file' '
bundle-uri.c | 3 +-
t/t5558-clone-bundle-uri.sh | 129 ++++++++++++++++++++++++++++++++++++
2 files changed, 130 insertions(+), 2 deletions(-)
diff --git a/bundle-uri.c b/bundle-uri.c
index 91b3319a5c1..65666a11d9c 100644
--- a/bundle-uri.c
+++ b/bundle-uri.c
@@ -400,8 +400,7 @@ static int unbundle_from_file(struct repository *r, const char *file)
refs_update_ref(get_main_ref_store(the_repository),
"fetched bundle", bundle_ref.buf, oid,
has_old ? &old_oid : NULL,
- REF_SKIP_OID_VERIFICATION,
- UPDATE_REFS_MSG_ON_ERR);
+ 0, UPDATE_REFS_MSG_ON_ERR);
}
bundle_header_release(&header);
diff --git a/t/t5558-clone-bundle-uri.sh b/t/t5558-clone-bundle-uri.sh
index 1ca5f745e73..a5b04d6f187 100755
--- a/t/t5558-clone-bundle-uri.sh
+++ b/t/t5558-clone-bundle-uri.sh
@@ -20,7 +20,10 @@ test_expect_success 'fail to clone from non-bundle file' '
test_expect_success 'create bundle' '
git init clone-from &&
git -C clone-from checkout -b topic &&
+
test_commit -C clone-from A &&
+ git -C clone-from bundle create A.bundle topic &&
+
test_commit -C clone-from B &&
git -C clone-from bundle create B.bundle topic
'
@@ -259,6 +262,132 @@ test_expect_success 'clone bundle list (file, any mode, all failures)' '
! grep "refs/bundles/" refs
'
+#########################################################################
+# Clone negotiation related tests begin here
+
+test_expect_success 'negotiation: bundle with part of wanted commits' '
+ test_when_finished rm -rf trace*.txt &&
+ GIT_TRACE_PACKET="$(pwd)/trace-packet.txt" \
+ git clone --no-local --bundle-uri="clone-from/A.bundle" \
+ clone-from nego-bundle-part &&
+ git -C nego-bundle-part for-each-ref --format="%(refname)" >refs &&
+ grep "refs/bundles/" refs >actual &&
+ cat >expect <<-\EOF &&
+ refs/bundles/topic
+ EOF
+ test_cmp expect actual &&
+ # Ensure that refs/bundles/topic are sent as "have".
+ grep "clone> have $(git -C clone-from rev-parse A)" trace-packet.txt
+'
+
+test_expect_success 'negotiation: bundle with all wanted commits' '
+ test_when_finished rm -rf trace*.txt &&
+ GIT_TRACE_PACKET="$(pwd)/trace-packet.txt" \
+ git clone --no-local --single-branch --branch=topic --no-tags \
+ --bundle-uri="clone-from/B.bundle" \
+ clone-from nego-bundle-all &&
+ git -C nego-bundle-all for-each-ref --format="%(refname)" >refs &&
+ grep "refs/bundles/" refs >actual &&
+ cat >expect <<-\EOF &&
+ refs/bundles/topic
+ EOF
+ test_cmp expect actual &&
+ # We already have all needed commits so no "want" needed.
+ ! grep "clone> want " trace-packet.txt
+'
+
+test_expect_success 'negotiation: bundle list (no heuristic)' '
+ test_when_finished rm -f trace*.txt &&
+ cat >bundle-list <<-EOF &&
+ [bundle]
+ version = 1
+ mode = all
+
+ [bundle "bundle-1"]
+ uri = file://$(pwd)/clone-from/bundle-1.bundle
+
+ [bundle "bundle-2"]
+ uri = file://$(pwd)/clone-from/bundle-2.bundle
+ EOF
+
+ GIT_TRACE_PACKET="$(pwd)/trace-packet.txt" \
+ git clone --no-local --bundle-uri="file://$(pwd)/bundle-list" \
+ clone-from nego-bundle-list-no-heuristic &&
+
+ git -C nego-bundle-list-no-heuristic for-each-ref --format="%(refname)" >refs &&
+ grep "refs/bundles/" refs >actual &&
+ cat >expect <<-\EOF &&
+ refs/bundles/base
+ refs/bundles/left
+ EOF
+ test_cmp expect actual &&
+ grep "clone> have $(git -C nego-bundle-list-no-heuristic rev-parse refs/bundles/left)" trace-packet.txt
+'
+
+test_expect_success 'negotiation: bundle list (creationToken)' '
+ test_when_finished rm -f trace*.txt &&
+ cat >bundle-list <<-EOF &&
+ [bundle]
+ version = 1
+ mode = all
+ heuristic = creationToken
+
+ [bundle "bundle-1"]
+ uri = file://$(pwd)/clone-from/bundle-1.bundle
+ creationToken = 1
+
+ [bundle "bundle-2"]
+ uri = file://$(pwd)/clone-from/bundle-2.bundle
+ creationToken = 2
+ EOF
+
+ GIT_TRACE_PACKET="$(pwd)/trace-packet.txt" \
+ git clone --no-local --bundle-uri="file://$(pwd)/bundle-list" \
+ clone-from nego-bundle-list-heuristic &&
+
+ git -C nego-bundle-list-heuristic for-each-ref --format="%(refname)" >refs &&
+ grep "refs/bundles/" refs >actual &&
+ cat >expect <<-\EOF &&
+ refs/bundles/base
+ refs/bundles/left
+ EOF
+ test_cmp expect actual &&
+ grep "clone> have $(git -C nego-bundle-list-heuristic rev-parse refs/bundles/left)" trace-packet.txt
+'
+
+test_expect_success 'negotiation: bundle list with all wanted commits' '
+ test_when_finished rm -f trace*.txt &&
+ cat >bundle-list <<-EOF &&
+ [bundle]
+ version = 1
+ mode = all
+ heuristic = creationToken
+
+ [bundle "bundle-1"]
+ uri = file://$(pwd)/clone-from/bundle-1.bundle
+ creationToken = 1
+
+ [bundle "bundle-2"]
+ uri = file://$(pwd)/clone-from/bundle-2.bundle
+ creationToken = 2
+ EOF
+
+ GIT_TRACE_PACKET="$(pwd)/trace-packet.txt" \
+ git clone --no-local --single-branch --branch=left --no-tags \
+ --bundle-uri="file://$(pwd)/bundle-list" \
+ clone-from nego-bundle-list-all &&
+
+ git -C nego-bundle-list-all for-each-ref --format="%(refname)" >refs &&
+ grep "refs/bundles/" refs >actual &&
+ cat >expect <<-\EOF &&
+ refs/bundles/base
+ refs/bundles/left
+ EOF
+ test_cmp expect actual &&
+ # We already have all needed commits so no "want" needed.
+ ! grep "clone> want " trace-packet.txt
+'
+
#########################################################################
# HTTP tests begin here
base-commit: d8ab1d464d07baa30e5a180eb33b3f9aa5c93adf
--
gitgitgadget
^ permalink raw reply related [flat|nested] 66+ messages in thread
* Re: [PATCH] bundle-uri: refresh packed_git if unbundle succeed
2024-05-20 11:48 ` Xing Xin
@ 2024-05-20 17:19 ` Junio C Hamano
2024-05-27 16:04 ` Xing Xin
0 siblings, 1 reply; 66+ messages in thread
From: Junio C Hamano @ 2024-05-20 17:19 UTC (permalink / raw)
To: Xing Xin; +Cc: Patrick Steinhardt, blanet via GitGitGadget, git, Xing Xin
"Xing Xin" <bupt_xingxin@163.com> writes:
> Personally I think data from bundles and data received via network
> should be treated equally.
Yup, that is not personal ;-) but is universally accepted as a good
discipline. In the case of bundle-uri, the bundle came over the
network so it is even more true that they should be treated the
same.
> For "fetch-pack" we now have some configs
> such as "fetch.fsckobjects" and "transfer.fsckobjects" to decide the
> behavior, these configs are invisible when we are fetching bundles.
When fetching over network, transport.c:fetch_refs_via_pack() calls
fetch_pack.c:fetch_pack(), which eventually calls get_pack() and the
configuration variables are honored there. It appears that the
transport layer is unaware of the .fsckobjects configuration knobs.
When fetching from a bundle, transport.c:fetch_refs_from_bundle()
calls bundle.c:unbundle(). This function has three callers, i.e.
"git bundle unbundle", normal fetching from a bundle, and more
recently added bundle-uri codepaths.
I think one reasonable approach to take is to add an extra parameter
that takes one of three values: (never, use-config, always), and
conditionally add "--fsck-objects" to the command line of the
index-pack. Teach "git bundle unbundle" the "--fsck-objects" option
so that it can pass 'never' or 'always' from the command line, and
pass 'use-config' from the code paths for normal fetching from a
budnle and bundle-uri.
To implement use-config, you'd probably need to refactor a small
part of fetch-pack.c:get_pack()
if (fetch_fsck_objects >= 0
? fetch_fsck_objects
: transfer_fsck_objects >= 0
? transfer_fsck_objects
: 0)
fsck_objects = 1;
into a public function (to support a caller like unbundle() that
comes from sideways, the new function may also need to call
fetch_pack_setup() to prime them).
A patch series may take a structure like so:
* define enum { UNBUNDLE_FSCK_NEVER, UNBUNDLE_FSCK_ALWAYS } in
bundle.h, have bundle.c:unbundle() accept a new parameter of that
type, and conditionally add "--fsck-objects" to its call to
"index-pack". "git bundle unbundle" can pass 'never' to its
invocation to unbundle() as an easy way to test it. For the
other two callers, we can start by passing 'always'.
* (optional) teach "git bundle unbundle" a new "--fsck-objects"
option to allow passing 'always' to its call to unbundle(). With
that, add tests to feed it a bundle with questionable objects in
it and make sure that unbundling notices.
* refactor fetch-pack.c:get_pack() to make the fetch-then-transfer
configuration logic available to external callers.
* Add UNBUNDLE_FSCK_USE_CONFIG to the enum, enhance unbundle() to
react to the value by calling the helper function you introduced
in the previous step.
Thanks.
^ permalink raw reply [flat|nested] 66+ messages in thread
* Re: [PATCH v2] bundle-uri: verify oid before writing refs
2024-05-20 12:36 ` [PATCH v2] bundle-uri: verify oid before writing refs blanet via GitGitGadget
@ 2024-05-21 15:41 ` Karthik Nayak
2024-05-27 15:41 ` [PATCH v3 0/4] object checking related additions and fixes for bundles in fetches blanet via GitGitGadget
1 sibling, 0 replies; 66+ messages in thread
From: Karthik Nayak @ 2024-05-21 15:41 UTC (permalink / raw)
To: blanet via GitGitGadget, git; +Cc: blanet, Xing Xin
[-- Attachment #1: Type: text/plain, Size: 8781 bytes --]
"blanet via GitGitGadget" <gitgitgadget@gmail.com> writes:
> From: Xing Xin <xingxin.xx@bytedance.com>
>
> When using the bundle-uri mechanism with a bundle list containing
> multiple interrelated bundles, we encountered a bug where tips from
> downloaded bundles were not discovered, thus resulting in rather slow
> clones. This was particularly problematic when employing the heuristic
> `creationTokens`.
>
> And this is easy to reproduce. Suppose we have a repository with a
> single branch `main` pointing to commit `A`, firstly we create a base
> bundle with
>
> git bundle create base.bundle main
>
> Then let's add a new commit `B` on top of `A`, so that an incremental
> bundle for `main` can be created with
>
> git bundle create incr.bundle A..main
>
> Now we can generate a bundle list with the following content:
>
> [bundle]
> version = 1
> mode = all
> heuristic = creationToken
>
> [bundle "base"]
> uri = base.bundle
> creationToken = 1
>
> [bundle "incr"]
> uri = incr.bundle
> creationToken = 2
>
> A fresh clone with the bundle list above would give the expected
> `refs/bundles/main` pointing at `B` in new repository, in other words we
> already had everything locally from the bundles, but git would still
> download everything from server as if we got nothing.
>
> So why the `refs/bundles/main` is not discovered? After some digging I
> found that:
>
> 1. Bundles in bundle list are downloaded to local files via
> `download_bundle_list` or via `fetch_bundles_by_token` for the
> creationToken heuristic case.
> 2. Then it tries to unbundle each bundle via `unbundle_from_file`, which
> is called by `unbundle_all_bundles` or called within
> `fetch_bundles_by_token` for the creationToken heuristic case.
> 3. Here, we first read the bundle header to get all the prerequisites
> for the bundle, this is done in `read_bundle_header`.
> 4. Then we call `unbundle`, which calls `verify_bundle` to ensure that
> the repository does indeed contain the prerequisites mentioned in the
> bundle.
> 5. The `verify_bundle` will call `parse_object`, within which the
> `prepare_packed_git` or `reprepare_packed_git` is eventually called,
> which means that the `raw_object_store->packed_git` data gets filled
> in and ``packed_git_initialized` is set. This also means consecutive
> calls to `prepare_packed_git` doesn't re-initiate
> `raw_object_store->packed_git` since `packed_git_initialized` already
> is set.
> 6. If `unbundle` succeeds, it writes some refs via `refs_update_ref`
> with `REF_SKIP_OID_VERIFICATION` set. So the bundle refs which can
> target arbitrary objects are written to the repository.
> 7. Finally in `do_fetch_pack_v2`, `mark_complete_and_common_ref` and
> `mark_tips` are called with `OBJECT_INFO_QUICK` set to find local
> tips. Here it won't call `reprepare_packed_git` anymore so it would
> fail to parse oids that only reside in the last bundle.
>
> Back to the example above, when unbunding `incr.bundle`, `base.pack` is
> enlisted to `packed_git` bacause of the prerequisites to verify. While
s/bacause/because
[snip]
> diff --git a/bundle-uri.c b/bundle-uri.c
> index 91b3319a5c1..65666a11d9c 100644
> --- a/bundle-uri.c
> +++ b/bundle-uri.c
> @@ -400,8 +400,7 @@ static int unbundle_from_file(struct repository *r, const char *file)
> refs_update_ref(get_main_ref_store(the_repository),
> "fetched bundle", bundle_ref.buf, oid,
> has_old ? &old_oid : NULL,
> - REF_SKIP_OID_VERIFICATION,
> - UPDATE_REFS_MSG_ON_ERR);
> + 0, UPDATE_REFS_MSG_ON_ERR);
> }
>
> bundle_header_release(&header);
> diff --git a/t/t5558-clone-bundle-uri.sh b/t/t5558-clone-bundle-uri.sh
> index 1ca5f745e73..a5b04d6f187 100755
> --- a/t/t5558-clone-bundle-uri.sh
> +++ b/t/t5558-clone-bundle-uri.sh
> @@ -20,7 +20,10 @@ test_expect_success 'fail to clone from non-bundle file' '
> test_expect_success 'create bundle' '
> git init clone-from &&
> git -C clone-from checkout -b topic &&
> +
> test_commit -C clone-from A &&
> + git -C clone-from bundle create A.bundle topic &&
> +
> test_commit -C clone-from B &&
> git -C clone-from bundle create B.bundle topic
> '
> @@ -259,6 +262,132 @@ test_expect_success 'clone bundle list (file, any mode, all failures)' '
> ! grep "refs/bundles/" refs
> '
>
> +#########################################################################
> +# Clone negotiation related tests begin here
> +
> +test_expect_success 'negotiation: bundle with part of wanted commits' '
> + test_when_finished rm -rf trace*.txt &&
> + GIT_TRACE_PACKET="$(pwd)/trace-packet.txt" \
> + git clone --no-local --bundle-uri="clone-from/A.bundle" \
> + clone-from nego-bundle-part &&
> + git -C nego-bundle-part for-each-ref --format="%(refname)" >refs &&
> + grep "refs/bundles/" refs >actual &&
> + cat >expect <<-\EOF &&
> + refs/bundles/topic
> + EOF
> + test_cmp expect actual &&
> + # Ensure that refs/bundles/topic are sent as "have".
> + grep "clone> have $(git -C clone-from rev-parse A)" trace-packet.txt
> +'
> +
> +test_expect_success 'negotiation: bundle with all wanted commits' '
> + test_when_finished rm -rf trace*.txt &&
> + GIT_TRACE_PACKET="$(pwd)/trace-packet.txt" \
> + git clone --no-local --single-branch --branch=topic --no-tags \
> + --bundle-uri="clone-from/B.bundle" \
> + clone-from nego-bundle-all &&
> + git -C nego-bundle-all for-each-ref --format="%(refname)" >refs &&
> + grep "refs/bundles/" refs >actual &&
> + cat >expect <<-\EOF &&
> + refs/bundles/topic
> + EOF
> + test_cmp expect actual &&
> + # We already have all needed commits so no "want" needed.
> + ! grep "clone> want " trace-packet.txt
> +'
> +
> +test_expect_success 'negotiation: bundle list (no heuristic)' '
> + test_when_finished rm -f trace*.txt &&
> + cat >bundle-list <<-EOF &&
> + [bundle]
> + version = 1
> + mode = all
> +
> + [bundle "bundle-1"]
> + uri = file://$(pwd)/clone-from/bundle-1.bundle
> +
> + [bundle "bundle-2"]
> + uri = file://$(pwd)/clone-from/bundle-2.bundle
> + EOF
> +
> + GIT_TRACE_PACKET="$(pwd)/trace-packet.txt" \
> + git clone --no-local --bundle-uri="file://$(pwd)/bundle-list" \
> + clone-from nego-bundle-list-no-heuristic &&
> +
> + git -C nego-bundle-list-no-heuristic for-each-ref --format="%(refname)" >refs &&
> + grep "refs/bundles/" refs >actual &&
> + cat >expect <<-\EOF &&
> + refs/bundles/base
> + refs/bundles/left
> + EOF
> + test_cmp expect actual &&
> + grep "clone> have $(git -C nego-bundle-list-no-heuristic rev-parse refs/bundles/left)" trace-packet.txt
> +'
> +
> +test_expect_success 'negotiation: bundle list (creationToken)' '
> + test_when_finished rm -f trace*.txt &&
> + cat >bundle-list <<-EOF &&
> + [bundle]
> + version = 1
> + mode = all
> + heuristic = creationToken
> +
> + [bundle "bundle-1"]
> + uri = file://$(pwd)/clone-from/bundle-1.bundle
> + creationToken = 1
> +
> + [bundle "bundle-2"]
> + uri = file://$(pwd)/clone-from/bundle-2.bundle
> + creationToken = 2
> + EOF
> +
> + GIT_TRACE_PACKET="$(pwd)/trace-packet.txt" \
> + git clone --no-local --bundle-uri="file://$(pwd)/bundle-list" \
> + clone-from nego-bundle-list-heuristic &&
> +
> + git -C nego-bundle-list-heuristic for-each-ref --format="%(refname)" >refs &&
> + grep "refs/bundles/" refs >actual &&
> + cat >expect <<-\EOF &&
> + refs/bundles/base
> + refs/bundles/left
> + EOF
> + test_cmp expect actual &&
> + grep "clone> have $(git -C nego-bundle-list-heuristic rev-parse refs/bundles/left)" trace-packet.txt
> +'
> +
> +test_expect_success 'negotiation: bundle list with all wanted commits' '
> + test_when_finished rm -f trace*.txt &&
> + cat >bundle-list <<-EOF &&
> + [bundle]
> + version = 1
> + mode = all
> + heuristic = creationToken
> +
> + [bundle "bundle-1"]
> + uri = file://$(pwd)/clone-from/bundle-1.bundle
> + creationToken = 1
> +
> + [bundle "bundle-2"]
> + uri = file://$(pwd)/clone-from/bundle-2.bundle
> + creationToken = 2
> + EOF
> +
> + GIT_TRACE_PACKET="$(pwd)/trace-packet.txt" \
> + git clone --no-local --single-branch --branch=left --no-tags \
> + --bundle-uri="file://$(pwd)/bundle-list" \
> + clone-from nego-bundle-list-all &&
> +
> + git -C nego-bundle-list-all for-each-ref --format="%(refname)" >refs &&
> + grep "refs/bundles/" refs >actual &&
> + cat >expect <<-\EOF &&
> + refs/bundles/base
> + refs/bundles/left
> + EOF
> + test_cmp expect actual &&
> + # We already have all needed commits so no "want" needed.
> + ! grep "clone> want " trace-packet.txt
> +'
> +
> #########################################################################
> # HTTP tests begin here
>
>
> base-commit: d8ab1d464d07baa30e5a180eb33b3f9aa5c93adf
> --
> gitgitgadget
This update looks good to me, Thanks.
[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 690 bytes --]
^ permalink raw reply [flat|nested] 66+ messages in thread
* [PATCH v3 0/4] object checking related additions and fixes for bundles in fetches
2024-05-20 12:36 ` [PATCH v2] bundle-uri: verify oid before writing refs blanet via GitGitGadget
2024-05-21 15:41 ` Karthik Nayak
@ 2024-05-27 15:41 ` blanet via GitGitGadget
2024-05-27 15:41 ` [PATCH v3 1/4] bundle-uri: verify oid before writing refs Xing Xin via GitGitGadget
` (4 more replies)
1 sibling, 5 replies; 66+ messages in thread
From: blanet via GitGitGadget @ 2024-05-27 15:41 UTC (permalink / raw)
To: git; +Cc: Patrick Steinhardt, Karthik Nayak, blanet
While attempting to fix a reference negotiation bug in bundle-uri, we
discovered that the fetch process are missing some helpful object validation
logic when processing bundles. The main issues are:
* In the bundle-uri scenario, we did not validate object IDs before writing
bundle references. This is the root cause of the original negotiation bug
in bundle-uri, and can cause potential repository corruption.
* The existing fetch.fsckObjects and transfer.fsckObjects are not detected
when directly fetching bundles. In fact there is no object validation
support for unbundle.
The first patch fixes the bundle-uri negotiation issue by dropping the
REF_SKIP_OID_VERIFICATION flag when writing bundle references.
Patch 2~4 extend bundle.c:unbundle with a unbundle_fsck_flags to control
object fscking in different scenarios, the implementation mainly follows
what Junio suggested on the mailing list.
Xing Xin (4):
bundle-uri: verify oid before writing refs
unbundle: introduce unbundle_fsck_flags for fsckobjects handling
fetch-pack: expose fsckObjects configuration logic
unbundle: introduce new option UNBUNDLE_FSCK_FOLLOW_FETCH
builtin/bundle.c | 2 +-
bundle-uri.c | 5 +-
bundle.c | 20 ++++-
bundle.h | 9 +-
fetch-pack.c | 18 ++--
fetch-pack.h | 2 +
t/t5558-clone-bundle-uri.sh | 163 +++++++++++++++++++++++++++++++++++-
t/t5607-clone-bundle.sh | 23 +++++
transport.c | 2 +-
9 files changed, 227 insertions(+), 17 deletions(-)
base-commit: b9cfe4845cb2562584837bc0101c0ab76490a239
Published-As: https://github.com/gitgitgadget/git/releases/tag/pr-1730%2Fblanet%2Fxx%2Fbundle-uri-bug-using-bundle-list-v3
Fetch-It-Via: git fetch https://github.com/gitgitgadget/git pr-1730/blanet/xx/bundle-uri-bug-using-bundle-list-v3
Pull-Request: https://github.com/gitgitgadget/git/pull/1730
Range-diff vs v2:
1: 8bdeacf1360 = 1: 8f488a5eeaa bundle-uri: verify oid before writing refs
-: ----------- > 2: 057c697970f unbundle: introduce unbundle_fsck_flags for fsckobjects handling
-: ----------- > 3: 67401d4fbcb fetch-pack: expose fsckObjects configuration logic
-: ----------- > 4: c19b8f633cb unbundle: introduce new option UNBUNDLE_FSCK_FOLLOW_FETCH
--
gitgitgadget
^ permalink raw reply [flat|nested] 66+ messages in thread
* [PATCH v3 1/4] bundle-uri: verify oid before writing refs
2024-05-27 15:41 ` [PATCH v3 0/4] object checking related additions and fixes for bundles in fetches blanet via GitGitGadget
@ 2024-05-27 15:41 ` Xing Xin via GitGitGadget
2024-05-28 11:55 ` Patrick Steinhardt
2024-05-27 15:41 ` [PATCH v3 2/4] unbundle: introduce unbundle_fsck_flags for fsckobjects handling Xing Xin via GitGitGadget
` (3 subsequent siblings)
4 siblings, 1 reply; 66+ messages in thread
From: Xing Xin via GitGitGadget @ 2024-05-27 15:41 UTC (permalink / raw)
To: git; +Cc: Patrick Steinhardt, Karthik Nayak, blanet, Xing Xin
From: Xing Xin <xingxin.xx@bytedance.com>
When using the bundle-uri mechanism with a bundle list containing
multiple interrelated bundles, we encountered a bug where tips from
downloaded bundles were not discovered, thus resulting in rather slow
clones. This was particularly problematic when employing the heuristic
`creationTokens`.
And this is easy to reproduce. Suppose we have a repository with a
single branch `main` pointing to commit `A`, firstly we create a base
bundle with
git bundle create base.bundle main
Then let's add a new commit `B` on top of `A`, so that an incremental
bundle for `main` can be created with
git bundle create incr.bundle A..main
Now we can generate a bundle list with the following content:
[bundle]
version = 1
mode = all
heuristic = creationToken
[bundle "base"]
uri = base.bundle
creationToken = 1
[bundle "incr"]
uri = incr.bundle
creationToken = 2
A fresh clone with the bundle list above would give the expected
`refs/bundles/main` pointing at `B` in new repository, in other words we
already had everything locally from the bundles, but git would still
download everything from server as if we got nothing.
So why the `refs/bundles/main` is not discovered? After some digging I
found that:
1. Bundles in bundle list are downloaded to local files via
`download_bundle_list` or via `fetch_bundles_by_token` for the
creationToken heuristic case.
2. Then it tries to unbundle each bundle via `unbundle_from_file`, which
is called by `unbundle_all_bundles` or called within
`fetch_bundles_by_token` for the creationToken heuristic case.
3. Here, we first read the bundle header to get all the prerequisites
for the bundle, this is done in `read_bundle_header`.
4. Then we call `unbundle`, which calls `verify_bundle` to ensure that
the repository does indeed contain the prerequisites mentioned in the
bundle.
5. The `verify_bundle` will call `parse_object`, within which the
`prepare_packed_git` or `reprepare_packed_git` is eventually called,
which means that the `raw_object_store->packed_git` data gets filled
in and ``packed_git_initialized` is set. This also means consecutive
calls to `prepare_packed_git` doesn't re-initiate
`raw_object_store->packed_git` since `packed_git_initialized` already
is set.
6. If `unbundle` succeeds, it writes some refs via `refs_update_ref`
with `REF_SKIP_OID_VERIFICATION` set. So the bundle refs which can
target arbitrary objects are written to the repository.
7. Finally in `do_fetch_pack_v2`, `mark_complete_and_common_ref` and
`mark_tips` are called with `OBJECT_INFO_QUICK` set to find local
tips. Here it won't call `reprepare_packed_git` anymore so it would
fail to parse oids that only reside in the last bundle.
Back to the example above, when unbunding `incr.bundle`, `base.pack` is
enlisted to `packed_git` bacause of the prerequisites to verify. While
we can not find `B` for negotiation at a latter time because `B` exists
in `incr.pack` which is not enlisted in `packed_git`.
This commit fixes this bug by dropping the `REF_SKIP_OID_VERIFICATION`
flag when writing bundle refs, so we can:
1. Ensure that the bundle refs we are writing are pointing to valid
objects.
2. Ensure all the tips from bundle refs can be correctly parsed.
And a set of negotiation related tests for bundle-uri are added.
Signed-off-by: Xing Xin <xingxin.xx@bytedance.com>
---
bundle-uri.c | 3 +-
t/t5558-clone-bundle-uri.sh | 129 ++++++++++++++++++++++++++++++++++++
2 files changed, 130 insertions(+), 2 deletions(-)
diff --git a/bundle-uri.c b/bundle-uri.c
index 91b3319a5c1..65666a11d9c 100644
--- a/bundle-uri.c
+++ b/bundle-uri.c
@@ -400,8 +400,7 @@ static int unbundle_from_file(struct repository *r, const char *file)
refs_update_ref(get_main_ref_store(the_repository),
"fetched bundle", bundle_ref.buf, oid,
has_old ? &old_oid : NULL,
- REF_SKIP_OID_VERIFICATION,
- UPDATE_REFS_MSG_ON_ERR);
+ 0, UPDATE_REFS_MSG_ON_ERR);
}
bundle_header_release(&header);
diff --git a/t/t5558-clone-bundle-uri.sh b/t/t5558-clone-bundle-uri.sh
index 1ca5f745e73..a5b04d6f187 100755
--- a/t/t5558-clone-bundle-uri.sh
+++ b/t/t5558-clone-bundle-uri.sh
@@ -20,7 +20,10 @@ test_expect_success 'fail to clone from non-bundle file' '
test_expect_success 'create bundle' '
git init clone-from &&
git -C clone-from checkout -b topic &&
+
test_commit -C clone-from A &&
+ git -C clone-from bundle create A.bundle topic &&
+
test_commit -C clone-from B &&
git -C clone-from bundle create B.bundle topic
'
@@ -259,6 +262,132 @@ test_expect_success 'clone bundle list (file, any mode, all failures)' '
! grep "refs/bundles/" refs
'
+#########################################################################
+# Clone negotiation related tests begin here
+
+test_expect_success 'negotiation: bundle with part of wanted commits' '
+ test_when_finished rm -rf trace*.txt &&
+ GIT_TRACE_PACKET="$(pwd)/trace-packet.txt" \
+ git clone --no-local --bundle-uri="clone-from/A.bundle" \
+ clone-from nego-bundle-part &&
+ git -C nego-bundle-part for-each-ref --format="%(refname)" >refs &&
+ grep "refs/bundles/" refs >actual &&
+ cat >expect <<-\EOF &&
+ refs/bundles/topic
+ EOF
+ test_cmp expect actual &&
+ # Ensure that refs/bundles/topic are sent as "have".
+ grep "clone> have $(git -C clone-from rev-parse A)" trace-packet.txt
+'
+
+test_expect_success 'negotiation: bundle with all wanted commits' '
+ test_when_finished rm -rf trace*.txt &&
+ GIT_TRACE_PACKET="$(pwd)/trace-packet.txt" \
+ git clone --no-local --single-branch --branch=topic --no-tags \
+ --bundle-uri="clone-from/B.bundle" \
+ clone-from nego-bundle-all &&
+ git -C nego-bundle-all for-each-ref --format="%(refname)" >refs &&
+ grep "refs/bundles/" refs >actual &&
+ cat >expect <<-\EOF &&
+ refs/bundles/topic
+ EOF
+ test_cmp expect actual &&
+ # We already have all needed commits so no "want" needed.
+ ! grep "clone> want " trace-packet.txt
+'
+
+test_expect_success 'negotiation: bundle list (no heuristic)' '
+ test_when_finished rm -f trace*.txt &&
+ cat >bundle-list <<-EOF &&
+ [bundle]
+ version = 1
+ mode = all
+
+ [bundle "bundle-1"]
+ uri = file://$(pwd)/clone-from/bundle-1.bundle
+
+ [bundle "bundle-2"]
+ uri = file://$(pwd)/clone-from/bundle-2.bundle
+ EOF
+
+ GIT_TRACE_PACKET="$(pwd)/trace-packet.txt" \
+ git clone --no-local --bundle-uri="file://$(pwd)/bundle-list" \
+ clone-from nego-bundle-list-no-heuristic &&
+
+ git -C nego-bundle-list-no-heuristic for-each-ref --format="%(refname)" >refs &&
+ grep "refs/bundles/" refs >actual &&
+ cat >expect <<-\EOF &&
+ refs/bundles/base
+ refs/bundles/left
+ EOF
+ test_cmp expect actual &&
+ grep "clone> have $(git -C nego-bundle-list-no-heuristic rev-parse refs/bundles/left)" trace-packet.txt
+'
+
+test_expect_success 'negotiation: bundle list (creationToken)' '
+ test_when_finished rm -f trace*.txt &&
+ cat >bundle-list <<-EOF &&
+ [bundle]
+ version = 1
+ mode = all
+ heuristic = creationToken
+
+ [bundle "bundle-1"]
+ uri = file://$(pwd)/clone-from/bundle-1.bundle
+ creationToken = 1
+
+ [bundle "bundle-2"]
+ uri = file://$(pwd)/clone-from/bundle-2.bundle
+ creationToken = 2
+ EOF
+
+ GIT_TRACE_PACKET="$(pwd)/trace-packet.txt" \
+ git clone --no-local --bundle-uri="file://$(pwd)/bundle-list" \
+ clone-from nego-bundle-list-heuristic &&
+
+ git -C nego-bundle-list-heuristic for-each-ref --format="%(refname)" >refs &&
+ grep "refs/bundles/" refs >actual &&
+ cat >expect <<-\EOF &&
+ refs/bundles/base
+ refs/bundles/left
+ EOF
+ test_cmp expect actual &&
+ grep "clone> have $(git -C nego-bundle-list-heuristic rev-parse refs/bundles/left)" trace-packet.txt
+'
+
+test_expect_success 'negotiation: bundle list with all wanted commits' '
+ test_when_finished rm -f trace*.txt &&
+ cat >bundle-list <<-EOF &&
+ [bundle]
+ version = 1
+ mode = all
+ heuristic = creationToken
+
+ [bundle "bundle-1"]
+ uri = file://$(pwd)/clone-from/bundle-1.bundle
+ creationToken = 1
+
+ [bundle "bundle-2"]
+ uri = file://$(pwd)/clone-from/bundle-2.bundle
+ creationToken = 2
+ EOF
+
+ GIT_TRACE_PACKET="$(pwd)/trace-packet.txt" \
+ git clone --no-local --single-branch --branch=left --no-tags \
+ --bundle-uri="file://$(pwd)/bundle-list" \
+ clone-from nego-bundle-list-all &&
+
+ git -C nego-bundle-list-all for-each-ref --format="%(refname)" >refs &&
+ grep "refs/bundles/" refs >actual &&
+ cat >expect <<-\EOF &&
+ refs/bundles/base
+ refs/bundles/left
+ EOF
+ test_cmp expect actual &&
+ # We already have all needed commits so no "want" needed.
+ ! grep "clone> want " trace-packet.txt
+'
+
#########################################################################
# HTTP tests begin here
--
gitgitgadget
^ permalink raw reply related [flat|nested] 66+ messages in thread
* [PATCH v3 2/4] unbundle: introduce unbundle_fsck_flags for fsckobjects handling
2024-05-27 15:41 ` [PATCH v3 0/4] object checking related additions and fixes for bundles in fetches blanet via GitGitGadget
2024-05-27 15:41 ` [PATCH v3 1/4] bundle-uri: verify oid before writing refs Xing Xin via GitGitGadget
@ 2024-05-27 15:41 ` Xing Xin via GitGitGadget
2024-05-28 12:03 ` Patrick Steinhardt
2024-05-27 15:41 ` [PATCH v3 3/4] fetch-pack: expose fsckObjects configuration logic Xing Xin via GitGitGadget
` (2 subsequent siblings)
4 siblings, 1 reply; 66+ messages in thread
From: Xing Xin via GitGitGadget @ 2024-05-27 15:41 UTC (permalink / raw)
To: git; +Cc: Patrick Steinhardt, Karthik Nayak, blanet, Xing Xin
From: Xing Xin <xingxin.xx@bytedance.com>
This commit adds a new enum `unbundle_fsck_flags` which is designed to
control the fsck behavior when unbundling. `unbundle` can use this newly
passed in enum to further decide whether to enable `--fsck-objects` for
"git-index-pack".
Currently only `UNBUNDLE_FSCK_NEVER` and `UNBUNDLE_FSCK_ALWAYS` are
supported as the very basic options. Another interesting option would be
added in later commits.
Signed-off-by: Xing Xin <xingxin.xx@bytedance.com>
---
builtin/bundle.c | 2 +-
bundle-uri.c | 2 +-
bundle.c | 12 +++++++++++-
bundle.h | 8 +++++++-
transport.c | 2 +-
5 files changed, 21 insertions(+), 5 deletions(-)
diff --git a/builtin/bundle.c b/builtin/bundle.c
index 3ad11dc5d05..6c10961c640 100644
--- a/builtin/bundle.c
+++ b/builtin/bundle.c
@@ -212,7 +212,7 @@ static int cmd_bundle_unbundle(int argc, const char **argv, const char *prefix)
strvec_pushl(&extra_index_pack_args, "-v", "--progress-title",
_("Unbundling objects"), NULL);
ret = !!unbundle(the_repository, &header, bundle_fd,
- &extra_index_pack_args, 0) ||
+ &extra_index_pack_args, 0, UNBUNDLE_FSCK_NEVER) ||
list_bundle_refs(&header, argc, argv);
bundle_header_release(&header);
cleanup:
diff --git a/bundle-uri.c b/bundle-uri.c
index 65666a11d9c..80f02aac6f1 100644
--- a/bundle-uri.c
+++ b/bundle-uri.c
@@ -373,7 +373,7 @@ static int unbundle_from_file(struct repository *r, const char *file)
* the prerequisite commits.
*/
if ((result = unbundle(r, &header, bundle_fd, NULL,
- VERIFY_BUNDLE_QUIET)))
+ VERIFY_BUNDLE_QUIET, UNBUNDLE_FSCK_ALWAYS)))
return 1;
/*
diff --git a/bundle.c b/bundle.c
index 95367c2d0a0..a922d592782 100644
--- a/bundle.c
+++ b/bundle.c
@@ -612,7 +612,8 @@ int create_bundle(struct repository *r, const char *path,
int unbundle(struct repository *r, struct bundle_header *header,
int bundle_fd, struct strvec *extra_index_pack_args,
- enum verify_bundle_flags flags)
+ enum verify_bundle_flags flags,
+ enum unbundle_fsck_flags fsck_flags)
{
struct child_process ip = CHILD_PROCESS_INIT;
@@ -625,6 +626,15 @@ int unbundle(struct repository *r, struct bundle_header *header,
if (header->filter.choice)
strvec_push(&ip.args, "--promisor=from-bundle");
+ switch (fsck_flags) {
+ case UNBUNDLE_FSCK_ALWAYS:
+ strvec_push(&ip.args, "--fsck-objects");
+ break;
+ case UNBUNDLE_FSCK_NEVER:
+ default:
+ break;
+ }
+
if (extra_index_pack_args) {
strvec_pushv(&ip.args, extra_index_pack_args->v);
strvec_clear(extra_index_pack_args);
diff --git a/bundle.h b/bundle.h
index 021adbdcbb3..cfa9daddda6 100644
--- a/bundle.h
+++ b/bundle.h
@@ -30,6 +30,11 @@ int create_bundle(struct repository *r, const char *path,
int argc, const char **argv, struct strvec *pack_options,
int version);
+enum unbundle_fsck_flags {
+ UNBUNDLE_FSCK_NEVER = 0,
+ UNBUNDLE_FSCK_ALWAYS,
+};
+
enum verify_bundle_flags {
VERIFY_BUNDLE_VERBOSE = (1 << 0),
VERIFY_BUNDLE_QUIET = (1 << 1),
@@ -53,7 +58,8 @@ int verify_bundle(struct repository *r, struct bundle_header *header,
*/
int unbundle(struct repository *r, struct bundle_header *header,
int bundle_fd, struct strvec *extra_index_pack_args,
- enum verify_bundle_flags flags);
+ enum verify_bundle_flags flags,
+ enum unbundle_fsck_flags fsck_flags);
int list_bundle_refs(struct bundle_header *header,
int argc, const char **argv);
diff --git a/transport.c b/transport.c
index 0ad04b77fd2..6799988f10c 100644
--- a/transport.c
+++ b/transport.c
@@ -184,7 +184,7 @@ static int fetch_refs_from_bundle(struct transport *transport,
if (!data->get_refs_from_bundle_called)
get_refs_from_bundle_inner(transport);
ret = unbundle(the_repository, &data->header, data->fd,
- &extra_index_pack_args, 0);
+ &extra_index_pack_args, 0, UNBUNDLE_FSCK_ALWAYS);
transport->hash_algo = data->header.hash_algo;
return ret;
}
--
gitgitgadget
^ permalink raw reply related [flat|nested] 66+ messages in thread
* [PATCH v3 3/4] fetch-pack: expose fsckObjects configuration logic
2024-05-27 15:41 ` [PATCH v3 0/4] object checking related additions and fixes for bundles in fetches blanet via GitGitGadget
2024-05-27 15:41 ` [PATCH v3 1/4] bundle-uri: verify oid before writing refs Xing Xin via GitGitGadget
2024-05-27 15:41 ` [PATCH v3 2/4] unbundle: introduce unbundle_fsck_flags for fsckobjects handling Xing Xin via GitGitGadget
@ 2024-05-27 15:41 ` Xing Xin via GitGitGadget
2024-05-28 12:03 ` Patrick Steinhardt
2024-05-27 15:41 ` [PATCH v3 4/4] unbundle: introduce new option UNBUNDLE_FSCK_FOLLOW_FETCH Xing Xin via GitGitGadget
2024-05-30 8:21 ` [PATCH v4 0/4] object checking related additions and fixes for bundles in fetches blanet via GitGitGadget
4 siblings, 1 reply; 66+ messages in thread
From: Xing Xin via GitGitGadget @ 2024-05-27 15:41 UTC (permalink / raw)
To: git; +Cc: Patrick Steinhardt, Karthik Nayak, blanet, Xing Xin
From: Xing Xin <xingxin.xx@bytedance.com>
Currently we can use "transfer.fsckObjects" or "fetch.fsckObjects" to
control whether to enable checks for broken objects during fetching. But
these configs are only acknowledged by `fetch-pack.c:get_pack` and do
not make sense when fetching from bundles or using bundle-uris.
This commit exposed the fetch-then-transfer configuration logic by
adding a new function `fetch_pack_fsck_objects` in fetch-pack.h. In next
commit, this new function will be used by `unbundle` in fetching
scenarios.
Signed-off-by: Xing Xin <xingxin.xx@bytedance.com>
---
fetch-pack.c | 18 ++++++++++++------
fetch-pack.h | 2 ++
2 files changed, 14 insertions(+), 6 deletions(-)
diff --git a/fetch-pack.c b/fetch-pack.c
index 7d2aef21add..81a64be6951 100644
--- a/fetch-pack.c
+++ b/fetch-pack.c
@@ -954,12 +954,7 @@ static int get_pack(struct fetch_pack_args *args,
strvec_push(&cmd.args, alternate_shallow_file);
}
- if (fetch_fsck_objects >= 0
- ? fetch_fsck_objects
- : transfer_fsck_objects >= 0
- ? transfer_fsck_objects
- : 0)
- fsck_objects = 1;
+ fsck_objects = fetch_pack_fsck_objects();
if (do_keep || args->from_promisor || index_pack_args || fsck_objects) {
if (pack_lockfiles || fsck_objects)
@@ -2046,6 +2041,17 @@ static const struct object_id *iterate_ref_map(void *cb_data)
return &ref->old_oid;
}
+int fetch_pack_fsck_objects(void)
+{
+ fetch_pack_setup();
+
+ return fetch_fsck_objects >= 0
+ ? fetch_fsck_objects
+ : transfer_fsck_objects >= 0
+ ? transfer_fsck_objects
+ : 0;
+}
+
struct ref *fetch_pack(struct fetch_pack_args *args,
int fd[],
const struct ref *ref,
diff --git a/fetch-pack.h b/fetch-pack.h
index 6775d265175..38956d9b748 100644
--- a/fetch-pack.h
+++ b/fetch-pack.h
@@ -101,4 +101,6 @@ void negotiate_using_fetch(const struct oid_array *negotiation_tips,
*/
int report_unmatched_refs(struct ref **sought, int nr_sought);
+int fetch_pack_fsck_objects(void);
+
#endif
--
gitgitgadget
^ permalink raw reply related [flat|nested] 66+ messages in thread
* [PATCH v3 4/4] unbundle: introduce new option UNBUNDLE_FSCK_FOLLOW_FETCH
2024-05-27 15:41 ` [PATCH v3 0/4] object checking related additions and fixes for bundles in fetches blanet via GitGitGadget
` (2 preceding siblings ...)
2024-05-27 15:41 ` [PATCH v3 3/4] fetch-pack: expose fsckObjects configuration logic Xing Xin via GitGitGadget
@ 2024-05-27 15:41 ` Xing Xin via GitGitGadget
2024-05-28 12:05 ` Patrick Steinhardt
2024-05-30 8:21 ` [PATCH v4 0/4] object checking related additions and fixes for bundles in fetches blanet via GitGitGadget
4 siblings, 1 reply; 66+ messages in thread
From: Xing Xin via GitGitGadget @ 2024-05-27 15:41 UTC (permalink / raw)
To: git; +Cc: Patrick Steinhardt, Karthik Nayak, blanet, Xing Xin
From: Xing Xin <xingxin.xx@bytedance.com>
This commit adds a new option `UNBUNDLE_FSCK_FOLLOW_FETCH` to
`unbundle_fsck_flags`, this new flag is currently used in the _fetch_
process by
- `transport.c:fetch_refs_from_bundle` for fetching directly from a
bundle.
- `bundle-uri.c:unbundle_from_file` for unbundling bundles downloaded
from bundle-uri.
So we now have a relatively consistent logic for checking objects during
fetching. Add tests for the above two situations are added.
Signed-off-by: Xing Xin <xingxin.xx@bytedance.com>
---
bundle-uri.c | 2 +-
bundle.c | 10 +++++++++-
bundle.h | 1 +
t/t5558-clone-bundle-uri.sh | 36 +++++++++++++++++++++++++++++++-----
t/t5607-clone-bundle.sh | 23 +++++++++++++++++++++++
transport.c | 2 +-
6 files changed, 66 insertions(+), 8 deletions(-)
diff --git a/bundle-uri.c b/bundle-uri.c
index 80f02aac6f1..0da3e5a61b9 100644
--- a/bundle-uri.c
+++ b/bundle-uri.c
@@ -373,7 +373,7 @@ static int unbundle_from_file(struct repository *r, const char *file)
* the prerequisite commits.
*/
if ((result = unbundle(r, &header, bundle_fd, NULL,
- VERIFY_BUNDLE_QUIET, UNBUNDLE_FSCK_ALWAYS)))
+ VERIFY_BUNDLE_QUIET, UNBUNDLE_FSCK_FOLLOW_FETCH)))
return 1;
/*
diff --git a/bundle.c b/bundle.c
index a922d592782..c7344543aa4 100644
--- a/bundle.c
+++ b/bundle.c
@@ -17,6 +17,7 @@
#include "list-objects-filter-options.h"
#include "connected.h"
#include "write-or-die.h"
+#include "fetch-pack.h"
static const char v2_bundle_signature[] = "# v2 git bundle\n";
static const char v3_bundle_signature[] = "# v3 git bundle\n";
@@ -616,6 +617,7 @@ int unbundle(struct repository *r, struct bundle_header *header,
enum unbundle_fsck_flags fsck_flags)
{
struct child_process ip = CHILD_PROCESS_INIT;
+ int fsck_objects = 0;
if (verify_bundle(r, header, flags))
return -1;
@@ -628,13 +630,19 @@ int unbundle(struct repository *r, struct bundle_header *header,
switch (fsck_flags) {
case UNBUNDLE_FSCK_ALWAYS:
- strvec_push(&ip.args, "--fsck-objects");
+ fsck_objects = 1;
+ break;
+ case UNBUNDLE_FSCK_FOLLOW_FETCH:
+ fsck_objects = fetch_pack_fsck_objects();
break;
case UNBUNDLE_FSCK_NEVER:
default:
break;
}
+ if (fsck_objects)
+ strvec_push(&ip.args, "--fsck-objects");
+
if (extra_index_pack_args) {
strvec_pushv(&ip.args, extra_index_pack_args->v);
strvec_clear(extra_index_pack_args);
diff --git a/bundle.h b/bundle.h
index cfa9daddda6..c46488422ce 100644
--- a/bundle.h
+++ b/bundle.h
@@ -33,6 +33,7 @@ int create_bundle(struct repository *r, const char *path,
enum unbundle_fsck_flags {
UNBUNDLE_FSCK_NEVER = 0,
UNBUNDLE_FSCK_ALWAYS,
+ UNBUNDLE_FSCK_FOLLOW_FETCH,
};
enum verify_bundle_flags {
diff --git a/t/t5558-clone-bundle-uri.sh b/t/t5558-clone-bundle-uri.sh
index a5b04d6f187..3df4d44e78f 100755
--- a/t/t5558-clone-bundle-uri.sh
+++ b/t/t5558-clone-bundle-uri.sh
@@ -19,13 +19,30 @@ test_expect_success 'fail to clone from non-bundle file' '
test_expect_success 'create bundle' '
git init clone-from &&
- git -C clone-from checkout -b topic &&
+ (
+ cd clone-from &&
+ git checkout -b topic &&
+
+ test_commit A &&
+ git bundle create A.bundle topic &&
+
+ test_commit B &&
+ git bundle create B.bundle topic &&
+
+ cat >data <<-EOF &&
+ tree $(git rev-parse HEAD^{tree})
+ parent $(git rev-parse HEAD)
+ author A U Thor
+ committer A U Thor
- test_commit -C clone-from A &&
- git -C clone-from bundle create A.bundle topic &&
+ commit: this is a commit with bad emails
- test_commit -C clone-from B &&
- git -C clone-from bundle create B.bundle topic
+ EOF
+ git hash-object --literally -t commit -w --stdin <data >commit &&
+ git branch bad $(cat commit) &&
+ git bundle create bad.bundle bad &&
+ git update-ref -d refs/heads/bad
+ )
'
test_expect_success 'clone with path bundle' '
@@ -36,6 +53,15 @@ test_expect_success 'clone with path bundle' '
test_cmp expect actual
'
+test_expect_success 'clone with bad bundle' '
+ git -c fetch.fsckObjects=true clone --bundle-uri="clone-from/bad.bundle" \
+ clone-from clone-bad 2>err &&
+ # Unbundle fails, but clone can still proceed.
+ test_grep "missingEmail" err &&
+ git -C clone-bad for-each-ref --format="%(refname)" >refs &&
+ ! grep "refs/bundles/" refs
+'
+
test_expect_success 'clone with path bundle and non-default hash' '
test_when_finished "rm -rf clone-path-non-default-hash" &&
GIT_DEFAULT_HASH=sha256 git clone --bundle-uri="clone-from/B.bundle" \
diff --git a/t/t5607-clone-bundle.sh b/t/t5607-clone-bundle.sh
index 0d1e92d9963..423b35ac237 100755
--- a/t/t5607-clone-bundle.sh
+++ b/t/t5607-clone-bundle.sh
@@ -138,6 +138,29 @@ test_expect_success 'fetch SHA-1 from bundle' '
git fetch --no-tags foo/tip.bundle "$(cat hash)"
'
+test_expect_success 'clone bundle with fetch.fsckObjects' '
+ test_create_repo bundle-fsck &&
+ (
+ cd bundle-fsck &&
+ test_commit first &&
+ cat >data <<-EOF &&
+ tree $(git rev-parse HEAD^{tree})
+ parent $(git rev-parse HEAD)
+ author A U Thor
+ committer A U Thor
+
+ commit: this is a commit with bad emails
+
+ EOF
+ git hash-object --literally -t commit -w --stdin <data >commit &&
+ git branch bad $(cat commit) &&
+ git bundle create bad.bundle bad
+ ) &&
+ test_must_fail git -c fetch.fsckObjects=true \
+ clone bundle-fsck/bad.bundle bundle-fsck-clone 2>err &&
+ test_grep "missingEmail" err
+'
+
test_expect_success 'git bundle uses expected default format' '
git bundle create bundle HEAD^.. &&
cat >expect <<-EOF &&
diff --git a/transport.c b/transport.c
index 6799988f10c..a140d4b03c0 100644
--- a/transport.c
+++ b/transport.c
@@ -184,7 +184,7 @@ static int fetch_refs_from_bundle(struct transport *transport,
if (!data->get_refs_from_bundle_called)
get_refs_from_bundle_inner(transport);
ret = unbundle(the_repository, &data->header, data->fd,
- &extra_index_pack_args, 0, UNBUNDLE_FSCK_ALWAYS);
+ &extra_index_pack_args, 0, UNBUNDLE_FSCK_FOLLOW_FETCH);
transport->hash_algo = data->header.hash_algo;
return ret;
}
--
gitgitgadget
^ permalink raw reply related [flat|nested] 66+ messages in thread
* Re:Re: [PATCH] bundle-uri: refresh packed_git if unbundle succeed
2024-05-20 17:19 ` Junio C Hamano
@ 2024-05-27 16:04 ` Xing Xin
0 siblings, 0 replies; 66+ messages in thread
From: Xing Xin @ 2024-05-27 16:04 UTC (permalink / raw)
To: Junio C Hamano; +Cc: Patrick Steinhardt, blanet via GitGitGadget, git, Xing Xin
At 2024-05-21 01:19:02, "Junio C Hamano" <gitster@pobox.com> wrote:
>"Xing Xin" <bupt_xingxin@163.com> writes:
>
>> Personally I think data from bundles and data received via network
>> should be treated equally.
>
>Yup, that is not personal ;-) but is universally accepted as a good
>discipline. In the case of bundle-uri, the bundle came over the
>network so it is even more true that they should be treated the
>same.
>
>> For "fetch-pack" we now have some configs
>> such as "fetch.fsckobjects" and "transfer.fsckobjects" to decide the
>> behavior, these configs are invisible when we are fetching bundles.
>
>When fetching over network, transport.c:fetch_refs_via_pack() calls
>fetch_pack.c:fetch_pack(), which eventually calls get_pack() and the
>configuration variables are honored there. It appears that the
>transport layer is unaware of the .fsckobjects configuration knobs.
>
>When fetching from a bundle, transport.c:fetch_refs_from_bundle()
>calls bundle.c:unbundle(). This function has three callers, i.e.
>"git bundle unbundle", normal fetching from a bundle, and more
>recently added bundle-uri codepaths.
>
>I think one reasonable approach to take is to add an extra parameter
>that takes one of three values: (never, use-config, always), and
>conditionally add "--fsck-objects" to the command line of the
>index-pack. Teach "git bundle unbundle" the "--fsck-objects" option
>so that it can pass 'never' or 'always' from the command line, and
>pass 'use-config' from the code paths for normal fetching from a
>budnle and bundle-uri.
>
>To implement use-config, you'd probably need to refactor a small
>part of fetch-pack.c:get_pack()
>
> if (fetch_fsck_objects >= 0
> ? fetch_fsck_objects
> : transfer_fsck_objects >= 0
> ? transfer_fsck_objects
> : 0)
> fsck_objects = 1;
>
>into a public function (to support a caller like unbundle() that
>comes from sideways, the new function may also need to call
>fetch_pack_setup() to prime them).
>
>A patch series may take a structure like so:
>
> * define enum { UNBUNDLE_FSCK_NEVER, UNBUNDLE_FSCK_ALWAYS } in
> bundle.h, have bundle.c:unbundle() accept a new parameter of that
> type, and conditionally add "--fsck-objects" to its call to
> "index-pack". "git bundle unbundle" can pass 'never' to its
> invocation to unbundle() as an easy way to test it. For the
> other two callers, we can start by passing 'always'.
>
> * (optional) teach "git bundle unbundle" a new "--fsck-objects"
> option to allow passing 'always' to its call to unbundle(). With
> that, add tests to feed it a bundle with questionable objects in
> it and make sure that unbundling notices.
I just submitted a new series mainly focusing on the unbundle handling during
fetches. I would like to submit a new one for teaching "git bundle unbundle" a
"--fsck-objects" option after this to make changes more targeted.
> * refactor fetch-pack.c:get_pack() to make the fetch-then-transfer
> configuration logic available to external callers.
>
> * Add UNBUNDLE_FSCK_USE_CONFIG to the enum, enhance unbundle() to
I tend to use `UNBUNDLE_FSCK_FOLLOW_FETCH` because this option is only
used in fetches, though the current implementation is indeed reading configs.
> react to the value by calling the helper function you introduced
> in the previous step.
The new patch series is constructed right as you suggested, thanks a lot for
your help.
Xing Xin
^ permalink raw reply [flat|nested] 66+ messages in thread
* Re: [PATCH v3 1/4] bundle-uri: verify oid before writing refs
2024-05-27 15:41 ` [PATCH v3 1/4] bundle-uri: verify oid before writing refs Xing Xin via GitGitGadget
@ 2024-05-28 11:55 ` Patrick Steinhardt
2024-05-30 8:32 ` Xing Xin
0 siblings, 1 reply; 66+ messages in thread
From: Patrick Steinhardt @ 2024-05-28 11:55 UTC (permalink / raw)
To: Xing Xin via GitGitGadget; +Cc: git, Karthik Nayak, blanet, Xing Xin
[-- Attachment #1: Type: text/plain, Size: 1895 bytes --]
On Mon, May 27, 2024 at 03:41:54PM +0000, Xing Xin via GitGitGadget wrote:
> From: Xing Xin <xingxin.xx@bytedance.com>
[snip]
> 5. The `verify_bundle` will call `parse_object`, within which the
> `prepare_packed_git` or `reprepare_packed_git` is eventually called,
> which means that the `raw_object_store->packed_git` data gets filled
> in and ``packed_git_initialized` is set. This also means consecutive
s/``/`/
[snip]
> This commit fixes this bug by dropping the `REF_SKIP_OID_VERIFICATION`
> flag when writing bundle refs, so we can:
>
> 1. Ensure that the bundle refs we are writing are pointing to valid
> objects.
> 2. Ensure all the tips from bundle refs can be correctly parsed.
I think one angle that your explanation doesn't cover is why exactly
dropping the flag fixes the observed issue.
> And a set of negotiation related tests for bundle-uri are added.
s/And/Add/
[snip]
> +#########################################################################
> +# Clone negotiation related tests begin here
> +
> +test_expect_success 'negotiation: bundle with part of wanted commits' '
> + test_when_finished rm -rf trace*.txt &&
> + GIT_TRACE_PACKET="$(pwd)/trace-packet.txt" \
> + git clone --no-local --bundle-uri="clone-from/A.bundle" \
> + clone-from nego-bundle-part &&
> + git -C nego-bundle-part for-each-ref --format="%(refname)" >refs &&
> + grep "refs/bundles/" refs >actual &&
> + cat >expect <<-\EOF &&
> + refs/bundles/topic
> + EOF
> + test_cmp expect actual &&
> + # Ensure that refs/bundles/topic are sent as "have".
> + grep "clone> have $(git -C clone-from rev-parse A)" trace-packet.txt
> +'
As far as I can see there is no test that verifies the case where the
bundle contains refs, but misses the objects to satisfy the refs. Can we
craft such a bundle and exercise this new failure mode?
Patrick
[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 833 bytes --]
^ permalink raw reply [flat|nested] 66+ messages in thread
* Re: [PATCH v3 3/4] fetch-pack: expose fsckObjects configuration logic
2024-05-27 15:41 ` [PATCH v3 3/4] fetch-pack: expose fsckObjects configuration logic Xing Xin via GitGitGadget
@ 2024-05-28 12:03 ` Patrick Steinhardt
2024-05-28 17:10 ` Junio C Hamano
0 siblings, 1 reply; 66+ messages in thread
From: Patrick Steinhardt @ 2024-05-28 12:03 UTC (permalink / raw)
To: Xing Xin via GitGitGadget; +Cc: git, Karthik Nayak, blanet, Xing Xin
[-- Attachment #1: Type: text/plain, Size: 2849 bytes --]
On Mon, May 27, 2024 at 03:41:56PM +0000, Xing Xin via GitGitGadget wrote:
> From: Xing Xin <xingxin.xx@bytedance.com>
>
> Currently we can use "transfer.fsckObjects" or "fetch.fsckObjects" to
> control whether to enable checks for broken objects during fetching. But
> these configs are only acknowledged by `fetch-pack.c:get_pack` and do
> not make sense when fetching from bundles or using bundle-uris.
Do they not make sense, or are they not effective? I assume you mean the
latter, right?
> This commit exposed the fetch-then-transfer configuration logic by
s/exposed/exposes/
> adding a new function `fetch_pack_fsck_objects` in fetch-pack.h. In next
> commit, this new function will be used by `unbundle` in fetching
> scenarios.
>
> Signed-off-by: Xing Xin <xingxin.xx@bytedance.com>
> ---
> fetch-pack.c | 18 ++++++++++++------
> fetch-pack.h | 2 ++
> 2 files changed, 14 insertions(+), 6 deletions(-)
>
> diff --git a/fetch-pack.c b/fetch-pack.c
> index 7d2aef21add..81a64be6951 100644
> --- a/fetch-pack.c
> +++ b/fetch-pack.c
> @@ -954,12 +954,7 @@ static int get_pack(struct fetch_pack_args *args,
> strvec_push(&cmd.args, alternate_shallow_file);
> }
>
> - if (fetch_fsck_objects >= 0
> - ? fetch_fsck_objects
> - : transfer_fsck_objects >= 0
> - ? transfer_fsck_objects
> - : 0)
> - fsck_objects = 1;
This statement is really weird to read, but that is certainly not the
fault of this patch, but...
> + fsck_objects = fetch_pack_fsck_objects();
>
> if (do_keep || args->from_promisor || index_pack_args || fsck_objects) {
> if (pack_lockfiles || fsck_objects)
> @@ -2046,6 +2041,17 @@ static const struct object_id *iterate_ref_map(void *cb_data)
> return &ref->old_oid;
> }
>
> +int fetch_pack_fsck_objects(void)
> +{
> + fetch_pack_setup();
> +
> + return fetch_fsck_objects >= 0
> + ? fetch_fsck_objects
> + : transfer_fsck_objects >= 0
> + ? transfer_fsck_objects
> + : 0;
> +}
... can we maybe rewrite it to something more customary here? The
following is way easier to read, at least for me.
int fetch_pack_fsck_objects(void)
{
fetch_pack_setup();
if (fetch_fsck_objects >= 0 ||
transfer_fsck_objects >= 0)
return 1;
return 0;
}
> struct ref *fetch_pack(struct fetch_pack_args *args,
> int fd[],
> const struct ref *ref,
> diff --git a/fetch-pack.h b/fetch-pack.h
> index 6775d265175..38956d9b748 100644
> --- a/fetch-pack.h
> +++ b/fetch-pack.h
> @@ -101,4 +101,6 @@ void negotiate_using_fetch(const struct oid_array *negotiation_tips,
> */
> int report_unmatched_refs(struct ref **sought, int nr_sought);
>
> +int fetch_pack_fsck_objects(void);
Let's add a comment here saying what this function does.
Patrick
[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 833 bytes --]
^ permalink raw reply [flat|nested] 66+ messages in thread
* Re: [PATCH v3 2/4] unbundle: introduce unbundle_fsck_flags for fsckobjects handling
2024-05-27 15:41 ` [PATCH v3 2/4] unbundle: introduce unbundle_fsck_flags for fsckobjects handling Xing Xin via GitGitGadget
@ 2024-05-28 12:03 ` Patrick Steinhardt
2024-05-29 18:12 ` Xing Xin
0 siblings, 1 reply; 66+ messages in thread
From: Patrick Steinhardt @ 2024-05-28 12:03 UTC (permalink / raw)
To: Xing Xin via GitGitGadget; +Cc: git, Karthik Nayak, blanet, Xing Xin
[-- Attachment #1: Type: text/plain, Size: 694 bytes --]
On Mon, May 27, 2024 at 03:41:55PM +0000, Xing Xin via GitGitGadget wrote:
[snip]
> diff --git a/bundle.h b/bundle.h
> index 021adbdcbb3..cfa9daddda6 100644
> --- a/bundle.h
> +++ b/bundle.h
> @@ -30,6 +30,11 @@ int create_bundle(struct repository *r, const char *path,
> int argc, const char **argv, struct strvec *pack_options,
> int version);
>
> +enum unbundle_fsck_flags {
> + UNBUNDLE_FSCK_NEVER = 0,
> + UNBUNDLE_FSCK_ALWAYS,
> +};
> +
> enum verify_bundle_flags {
> VERIFY_BUNDLE_VERBOSE = (1 << 0),
> VERIFY_BUNDLE_QUIET = (1 << 1),
Wouldn't this have been a natural fit for the new flag, e.g. via
something like `VERIFY_BUNDLE_FSCK`?
Patrick
[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 833 bytes --]
^ permalink raw reply [flat|nested] 66+ messages in thread
* Re: [PATCH v3 4/4] unbundle: introduce new option UNBUNDLE_FSCK_FOLLOW_FETCH
2024-05-27 15:41 ` [PATCH v3 4/4] unbundle: introduce new option UNBUNDLE_FSCK_FOLLOW_FETCH Xing Xin via GitGitGadget
@ 2024-05-28 12:05 ` Patrick Steinhardt
2024-05-30 8:54 ` Xing Xin
0 siblings, 1 reply; 66+ messages in thread
From: Patrick Steinhardt @ 2024-05-28 12:05 UTC (permalink / raw)
To: Xing Xin via GitGitGadget; +Cc: git, Karthik Nayak, blanet, Xing Xin
[-- Attachment #1: Type: text/plain, Size: 1191 bytes --]
On Mon, May 27, 2024 at 03:41:57PM +0000, Xing Xin via GitGitGadget wrote:
> From: Xing Xin <xingxin.xx@bytedance.com>
> diff --git a/t/t5607-clone-bundle.sh b/t/t5607-clone-bundle.sh
> index 0d1e92d9963..423b35ac237 100755
> --- a/t/t5607-clone-bundle.sh
> +++ b/t/t5607-clone-bundle.sh
> @@ -138,6 +138,29 @@ test_expect_success 'fetch SHA-1 from bundle' '
> git fetch --no-tags foo/tip.bundle "$(cat hash)"
> '
>
> +test_expect_success 'clone bundle with fetch.fsckObjects' '
> + test_create_repo bundle-fsck &&
> + (
> + cd bundle-fsck &&
> + test_commit first &&
> + cat >data <<-EOF &&
> + tree $(git rev-parse HEAD^{tree})
> + parent $(git rev-parse HEAD)
> + author A U Thor
> + committer A U Thor
> +
> + commit: this is a commit with bad emails
> +
> + EOF
> + git hash-object --literally -t commit -w --stdin <data >commit &&
> + git branch bad $(cat commit) &&
> + git bundle create bad.bundle bad
> + ) &&
> + test_must_fail git -c fetch.fsckObjects=true \
> + clone bundle-fsck/bad.bundle bundle-fsck-clone 2>err &&
> + test_grep "missingEmail" err
> +'
Do we also want to have a test for `transfer.fsckObjects`?
Patrick
[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 833 bytes --]
^ permalink raw reply [flat|nested] 66+ messages in thread
* Re: [PATCH v3 3/4] fetch-pack: expose fsckObjects configuration logic
2024-05-28 12:03 ` Patrick Steinhardt
@ 2024-05-28 17:10 ` Junio C Hamano
2024-05-28 17:24 ` Junio C Hamano
2024-05-29 5:52 ` Patrick Steinhardt
0 siblings, 2 replies; 66+ messages in thread
From: Junio C Hamano @ 2024-05-28 17:10 UTC (permalink / raw)
To: Patrick Steinhardt
Cc: Xing Xin via GitGitGadget, git, Karthik Nayak, blanet, Xing Xin
Patrick Steinhardt <ps@pks.im> writes:
>> +int fetch_pack_fsck_objects(void)
>> +{
>> + fetch_pack_setup();
>> +
>> + return fetch_fsck_objects >= 0
>> + ? fetch_fsck_objects
>> + : transfer_fsck_objects >= 0
>> + ? transfer_fsck_objects
>> + : 0;
>> +}
>
> ... can we maybe rewrite it to something more customary here? The
> following is way easier to read, at least for me.
>
> int fetch_pack_fsck_objects(void)
> {
> fetch_pack_setup();
> if (fetch_fsck_objects >= 0 ||
> transfer_fsck_objects >= 0)
> return 1;
> return 0;
> }
But do they mean the same thing? In a repository where
[fetch] fsckobjects = no
is set, no matter what transfer.fsckobjects says (or left unspecified),
we want to return "no, we are not doing fsck".
^ permalink raw reply [flat|nested] 66+ messages in thread
* Re: [PATCH v3 3/4] fetch-pack: expose fsckObjects configuration logic
2024-05-28 17:10 ` Junio C Hamano
@ 2024-05-28 17:24 ` Junio C Hamano
2024-05-29 5:52 ` Patrick Steinhardt
2024-05-30 8:48 ` Xing Xin
2024-05-29 5:52 ` Patrick Steinhardt
1 sibling, 2 replies; 66+ messages in thread
From: Junio C Hamano @ 2024-05-28 17:24 UTC (permalink / raw)
To: Patrick Steinhardt
Cc: Xing Xin via GitGitGadget, git, Karthik Nayak, blanet, Xing Xin
Junio C Hamano <gitster@pobox.com> writes:
> Patrick Steinhardt <ps@pks.im> writes:
>
>>> +int fetch_pack_fsck_objects(void)
>>> +{
>>> + fetch_pack_setup();
>>> +
>>> + return fetch_fsck_objects >= 0
>>> + ? fetch_fsck_objects
>>> + : transfer_fsck_objects >= 0
>>> + ? transfer_fsck_objects
>>> + : 0;
>>> +}
>>
>> ... can we maybe rewrite it to something more customary here? The
>> following is way easier to read, at least for me.
>>
>> int fetch_pack_fsck_objects(void)
>> {
>> fetch_pack_setup();
>> if (fetch_fsck_objects >= 0 ||
>> transfer_fsck_objects >= 0)
>> return 1;
>> return 0;
>> }
>
> But do they mean the same thing? In a repository where
>
> [fetch] fsckobjects = no
>
> is set, no matter what transfer.fsckobjects says (or left unspecified),
> we want to return "no, we are not doing fsck".
The original before it was made into a helper function was written
as a cascade of ?: operators, because it had to be a single
expression. As the body of a helper function, we now can sprinkle
multiple return statements in it. I think the way that is easiest
to understand is
/* the most specific, if specified */
if (fetch_fsck_objects >= 0)
return fetch_fsck_objects;
/* the less specific, catch-all for both directions */
if (transfer_fsck_objects >= 0)
return transfer_fsck_objects;
/* the fallback hardcoded default */
return 0;
without the /* comments */.
^ permalink raw reply [flat|nested] 66+ messages in thread
* Re: [PATCH v3 3/4] fetch-pack: expose fsckObjects configuration logic
2024-05-28 17:10 ` Junio C Hamano
2024-05-28 17:24 ` Junio C Hamano
@ 2024-05-29 5:52 ` Patrick Steinhardt
1 sibling, 0 replies; 66+ messages in thread
From: Patrick Steinhardt @ 2024-05-29 5:52 UTC (permalink / raw)
To: Junio C Hamano
Cc: Xing Xin via GitGitGadget, git, Karthik Nayak, blanet, Xing Xin
[-- Attachment #1: Type: text/plain, Size: 1278 bytes --]
On Tue, May 28, 2024 at 10:10:46AM -0700, Junio C Hamano wrote:
> Patrick Steinhardt <ps@pks.im> writes:
>
> >> +int fetch_pack_fsck_objects(void)
> >> +{
> >> + fetch_pack_setup();
> >> +
> >> + return fetch_fsck_objects >= 0
> >> + ? fetch_fsck_objects
> >> + : transfer_fsck_objects >= 0
> >> + ? transfer_fsck_objects
> >> + : 0;
> >> +}
> >
> > ... can we maybe rewrite it to something more customary here? The
> > following is way easier to read, at least for me.
> >
> > int fetch_pack_fsck_objects(void)
> > {
> > fetch_pack_setup();
> > if (fetch_fsck_objects >= 0 ||
> > transfer_fsck_objects >= 0)
> > return 1;
> > return 0;
> > }
>
> But do they mean the same thing? In a repository where
>
> [fetch] fsckobjects = no
>
> is set, no matter what transfer.fsckobjects says (or left unspecified),
> we want to return "no, we are not doing fsck".
Oh, of course they don't. This here would be a faithful conversion:
int fetch_pack_fsck_objects(void)
{
fetch_pack_setup();
if (fetch_fsck_objects >= 0)
return fetch_fsck_objects;
if (transfer_fsck_objects >= 0)
return transfer_fsck_objects;
return 0;
}
Still easier to read in my opinion.
Patrick
[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 833 bytes --]
^ permalink raw reply [flat|nested] 66+ messages in thread
* Re: [PATCH v3 3/4] fetch-pack: expose fsckObjects configuration logic
2024-05-28 17:24 ` Junio C Hamano
@ 2024-05-29 5:52 ` Patrick Steinhardt
2024-05-30 8:48 ` Xing Xin
1 sibling, 0 replies; 66+ messages in thread
From: Patrick Steinhardt @ 2024-05-29 5:52 UTC (permalink / raw)
To: Junio C Hamano
Cc: Xing Xin via GitGitGadget, git, Karthik Nayak, blanet, Xing Xin
[-- Attachment #1: Type: text/plain, Size: 1787 bytes --]
On Tue, May 28, 2024 at 10:24:35AM -0700, Junio C Hamano wrote:
> Junio C Hamano <gitster@pobox.com> writes:
>
> > Patrick Steinhardt <ps@pks.im> writes:
> >
> >>> +int fetch_pack_fsck_objects(void)
> >>> +{
> >>> + fetch_pack_setup();
> >>> +
> >>> + return fetch_fsck_objects >= 0
> >>> + ? fetch_fsck_objects
> >>> + : transfer_fsck_objects >= 0
> >>> + ? transfer_fsck_objects
> >>> + : 0;
> >>> +}
> >>
> >> ... can we maybe rewrite it to something more customary here? The
> >> following is way easier to read, at least for me.
> >>
> >> int fetch_pack_fsck_objects(void)
> >> {
> >> fetch_pack_setup();
> >> if (fetch_fsck_objects >= 0 ||
> >> transfer_fsck_objects >= 0)
> >> return 1;
> >> return 0;
> >> }
> >
> > But do they mean the same thing? In a repository where
> >
> > [fetch] fsckobjects = no
> >
> > is set, no matter what transfer.fsckobjects says (or left unspecified),
> > we want to return "no, we are not doing fsck".
>
> The original before it was made into a helper function was written
> as a cascade of ?: operators, because it had to be a single
> expression. As the body of a helper function, we now can sprinkle
> multiple return statements in it. I think the way that is easiest
> to understand is
>
> /* the most specific, if specified */
> if (fetch_fsck_objects >= 0)
> return fetch_fsck_objects;
> /* the less specific, catch-all for both directions */
> if (transfer_fsck_objects >= 0)
> return transfer_fsck_objects;
> /* the fallback hardcoded default */
> return 0;
>
> without the /* comments */.
Ah, right, didn't see this mail. My revised version looks the same as
yours, except for the added comments.
Patrick
[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 833 bytes --]
^ permalink raw reply [flat|nested] 66+ messages in thread
* Re:Re: [PATCH v3 2/4] unbundle: introduce unbundle_fsck_flags for fsckobjects handling
2024-05-28 12:03 ` Patrick Steinhardt
@ 2024-05-29 18:12 ` Xing Xin
2024-05-30 4:38 ` Patrick Steinhardt
0 siblings, 1 reply; 66+ messages in thread
From: Xing Xin @ 2024-05-29 18:12 UTC (permalink / raw)
To: Patrick Steinhardt
Cc: Xing Xin via GitGitGadget, git, Karthik Nayak, Xing Xin
At 2024-05-28 20:03:25, "Patrick Steinhardt" <ps@pks.im> wrote:
>On Mon, May 27, 2024 at 03:41:55PM +0000, Xing Xin via GitGitGadget wrote:
>[snip]
>> diff --git a/bundle.h b/bundle.h
>> index 021adbdcbb3..cfa9daddda6 100644
>> --- a/bundle.h
>> +++ b/bundle.h
>> @@ -30,6 +30,11 @@ int create_bundle(struct repository *r, const char *path,
>> int argc, const char **argv, struct strvec *pack_options,
>> int version);
>>
>> +enum unbundle_fsck_flags {
>> + UNBUNDLE_FSCK_NEVER = 0,
>> + UNBUNDLE_FSCK_ALWAYS,
>> +};
>> +
>> enum verify_bundle_flags {
>> VERIFY_BUNDLE_VERBOSE = (1 << 0),
>> VERIFY_BUNDLE_QUIET = (1 << 1),
>
>Wouldn't this have been a natural fit for the new flag, e.g. via
>something like `VERIFY_BUNDLE_FSCK`?
It makes sense to me. Currently, verify_bundle_flags controls the amount
of information displayed when checking a bundle's prerequisites. The
newly added unbundle_fsck_flags is designed to check for broken objects
during the unbundle process, which is essentially a form of bundle
verification. I believe we should extend some object verification
capabilities to the git bundle verify command as well, perhaps by adding
a --fsck-objects option.
With this in mind, I support adding new options to verify_bundle_flags.
Since bundle.c:unbundle needs to combine multiple options, we must
define new options using bitwise shifting:
enum verify_bundle_flags {
VERIFY_BUNDLE_VERBOSE = (1 << 0),
VERIFY_BUNDLE_QUIET = (1 << 1),
VERIFY_BUNDLE_FSCK_OBJECTS_ALWAYS = (1 << 2),
VERIFY_BUNDLE_FSCK_OBJECTS_FOLLOW_FETCH = (1 << 3),
};
How about the naming? I'm not very good at naming :)
Xing Xin
^ permalink raw reply [flat|nested] 66+ messages in thread
* Re: Re: [PATCH v3 2/4] unbundle: introduce unbundle_fsck_flags for fsckobjects handling
2024-05-29 18:12 ` Xing Xin
@ 2024-05-30 4:38 ` Patrick Steinhardt
2024-05-30 8:46 ` Xing Xin
0 siblings, 1 reply; 66+ messages in thread
From: Patrick Steinhardt @ 2024-05-30 4:38 UTC (permalink / raw)
To: Xing Xin; +Cc: Xing Xin via GitGitGadget, git, Karthik Nayak, Xing Xin
[-- Attachment #1: Type: text/plain, Size: 2259 bytes --]
On Thu, May 30, 2024 at 02:12:47AM +0800, Xing Xin wrote:
> At 2024-05-28 20:03:25, "Patrick Steinhardt" <ps@pks.im> wrote:
> >On Mon, May 27, 2024 at 03:41:55PM +0000, Xing Xin via GitGitGadget wrote:
> >[snip]
> >> diff --git a/bundle.h b/bundle.h
> >> index 021adbdcbb3..cfa9daddda6 100644
> >> --- a/bundle.h
> >> +++ b/bundle.h
> >> @@ -30,6 +30,11 @@ int create_bundle(struct repository *r, const char *path,
> >> int argc, const char **argv, struct strvec *pack_options,
> >> int version);
> >>
> >> +enum unbundle_fsck_flags {
> >> + UNBUNDLE_FSCK_NEVER = 0,
> >> + UNBUNDLE_FSCK_ALWAYS,
> >> +};
> >> +
> >> enum verify_bundle_flags {
> >> VERIFY_BUNDLE_VERBOSE = (1 << 0),
> >> VERIFY_BUNDLE_QUIET = (1 << 1),
> >
> >Wouldn't this have been a natural fit for the new flag, e.g. via
> >something like `VERIFY_BUNDLE_FSCK`?
>
> It makes sense to me. Currently, verify_bundle_flags controls the amount
> of information displayed when checking a bundle's prerequisites. The
> newly added unbundle_fsck_flags is designed to check for broken objects
> during the unbundle process, which is essentially a form of bundle
> verification. I believe we should extend some object verification
> capabilities to the git bundle verify command as well, perhaps by adding
> a --fsck-objects option.
>
> With this in mind, I support adding new options to verify_bundle_flags.
> Since bundle.c:unbundle needs to combine multiple options, we must
> define new options using bitwise shifting:
>
> enum verify_bundle_flags {
> VERIFY_BUNDLE_VERBOSE = (1 << 0),
> VERIFY_BUNDLE_QUIET = (1 << 1),
> VERIFY_BUNDLE_FSCK_OBJECTS_ALWAYS = (1 << 2),
> VERIFY_BUNDLE_FSCK_OBJECTS_FOLLOW_FETCH = (1 << 3),
> };
>
> How about the naming? I'm not very good at naming :)
I later noticed that you extend the `unbundle_fsck_flags` in a later
patch. With that in mind I don't think it's all that important anymore
to merge those into the `verify_bundle_flags` as you would otherwise
allow for weirdness. What happens for example when both `ALWAYS` and
`FOLLOW_FETCH` are set?
So feel free to ignore this advice. If you still think it's a good idea
then the above naming looks okay to me.
Patrick
[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 833 bytes --]
^ permalink raw reply [flat|nested] 66+ messages in thread
* [PATCH v4 0/4] object checking related additions and fixes for bundles in fetches
2024-05-27 15:41 ` [PATCH v3 0/4] object checking related additions and fixes for bundles in fetches blanet via GitGitGadget
` (3 preceding siblings ...)
2024-05-27 15:41 ` [PATCH v3 4/4] unbundle: introduce new option UNBUNDLE_FSCK_FOLLOW_FETCH Xing Xin via GitGitGadget
@ 2024-05-30 8:21 ` blanet via GitGitGadget
2024-05-30 8:21 ` [PATCH v4 1/4] bundle-uri: verify oid before writing refs Xing Xin via GitGitGadget
` (4 more replies)
4 siblings, 5 replies; 66+ messages in thread
From: blanet via GitGitGadget @ 2024-05-30 8:21 UTC (permalink / raw)
To: git; +Cc: Patrick Steinhardt, Karthik Nayak, blanet
While attempting to fix a reference negotiation bug in bundle-uri, we
identified that the fetch process lacks some crucial object validation
checks when processing bundles. The primary issues are:
1. In the bundle-uri scenario, object IDs were not validated before writing
bundle references. This was the root cause of the original negotiation
bug in bundle-uri and could lead to potential repository corruption.
2. The existing "fetch.fsckObjects" and "transfer.fsckObjects"
configurations were not applied when directly fetching bundles or
fetching with bundle-uri enabled. In fact, there were no object
validation supports for unbundle.
The first patch addresses the bundle-uri negotiation issue by removing the
REF_SKIP_OID_VERIFICATION flag when writing bundle references.
Patches 2 through 4 extend verify_bundle_flags for bundle.c:unbundle to add
support for object validation (fsck) in different scenarios, mainly
following the suggestions from Junio on the mailing list.
Xing Xin (4):
bundle-uri: verify oid before writing refs
unbundle: extend verify_bundle_flags to support fsck-objects
fetch-pack: expose fsckObjects configuration logic
unbundle: introduce option VERIFY_BUNDLE_FSCK_FOLLOW_FETCH
bundle-uri.c | 5 +-
bundle.c | 10 ++
bundle.h | 2 +
fetch-pack.c | 17 ++--
fetch-pack.h | 5 +
t/t5558-clone-bundle-uri.sh | 186 +++++++++++++++++++++++++++++++++++-
t/t5607-clone-bundle.sh | 33 +++++++
transport.c | 2 +-
8 files changed, 246 insertions(+), 14 deletions(-)
base-commit: b9cfe4845cb2562584837bc0101c0ab76490a239
Published-As: https://github.com/gitgitgadget/git/releases/tag/pr-1730%2Fblanet%2Fxx%2Fbundle-uri-bug-using-bundle-list-v4
Fetch-It-Via: git fetch https://github.com/gitgitgadget/git pr-1730/blanet/xx/bundle-uri-bug-using-bundle-list-v4
Pull-Request: https://github.com/gitgitgadget/git/pull/1730
Range-diff vs v3:
1: 8f488a5eeaa ! 1: e958a3ab20c bundle-uri: verify oid before writing refs
@@ Commit message
When using the bundle-uri mechanism with a bundle list containing
multiple interrelated bundles, we encountered a bug where tips from
downloaded bundles were not discovered, thus resulting in rather slow
- clones. This was particularly problematic when employing the heuristic
- `creationTokens`.
+ clones. This was particularly problematic when employing the
+ "creationTokens" heuristic.
- And this is easy to reproduce. Suppose we have a repository with a
- single branch `main` pointing to commit `A`, firstly we create a base
- bundle with
+ To reproduce this issue, consider a repository with a single branch
+ "main" pointing to commit "A". Firstly, create a base bundle with:
git bundle create base.bundle main
- Then let's add a new commit `B` on top of `A`, so that an incremental
- bundle for `main` can be created with
+ Then, add a new commit "B" on top of "A", and create an incremental
+ bundle for "main":
git bundle create incr.bundle A..main
- Now we can generate a bundle list with the following content:
+ Now, generate a bundle list with the following content:
[bundle]
version = 1
@@ Commit message
uri = incr.bundle
creationToken = 2
- A fresh clone with the bundle list above would give the expected
- `refs/bundles/main` pointing at `B` in new repository, in other words we
- already had everything locally from the bundles, but git would still
- download everything from server as if we got nothing.
+ A fresh clone with the bundle list above should result in a reference
+ "refs/bundles/main" pointing to "B" in the new repository. However, git
+ would still download everything from the server, as if it had fetched
+ nothing locally.
- So why the `refs/bundles/main` is not discovered? After some digging I
+ So why the "refs/bundles/main" is not discovered? After some digging I
found that:
1. Bundles in bundle list are downloaded to local files via
- `download_bundle_list` or via `fetch_bundles_by_token` for the
- creationToken heuristic case.
- 2. Then it tries to unbundle each bundle via `unbundle_from_file`, which
- is called by `unbundle_all_bundles` or called within
- `fetch_bundles_by_token` for the creationToken heuristic case.
- 3. Here, we first read the bundle header to get all the prerequisites
- for the bundle, this is done in `read_bundle_header`.
- 4. Then we call `unbundle`, which calls `verify_bundle` to ensure that
- the repository does indeed contain the prerequisites mentioned in the
+ `bundle-uri.c:download_bundle_list` or via
+ `bundle-uri.c:fetch_bundles_by_token` for the "creationToken"
+ heuristic.
+ 2. Each bundle is unbundled via `bundle-uri.c:unbundle_from_file`, which
+ is called by `bundle-uri.c:unbundle_all_bundles` or called within
+ `bundle-uri.c:fetch_bundles_by_token` for the "creationToken"
+ heuristic.
+ 3. To get all prerequisites of the bundle, the bundle header is read
+ inside `bundle-uri.c:unbundle_from_file` to by calling
+ `bundle.c:read_bundle_header`.
+ 4. Then it calls `bundle.c:unbundle`, which calls
+ `bundle.c:verify_bundle` to ensure the repository contains all the
+ prerequisites.
+ 5. `bundle.c:verify_bundle` calls `parse_object`, which eventually
+ invokes `packfile.c:prepare_packed_git` or
+ `packfile.c:reprepare_packed_git`, filling
+ `raw_object_store->packed_git` and setting `packed_git_initialized`.
+ 6. If `bundle.c:unbundle` succeeds, it writes refs via
+ `refs.c:refs_update_ref` with `REF_SKIP_OID_VERIFICATION` set. Here
+ bundle refs which can target arbitrary objects are written to the
+ repository.
+ 7. Finally, in `fetch-pack.c:do_fetch_pack_v2`, the functions
+ `fetch-pack.c:mark_complete_and_common_ref` and
+ `fetch-pack.c:mark_tips` are called with `OBJECT_INFO_QUICK` set to
+ find local tips for negotiation. The `OBJECT_INFO_QUICK` flag
+ prevents `packfile.c:reprepare_packed_git` from being called,
+ resulting in failures to parse OIDs that reside only in the latest
bundle.
- 5. The `verify_bundle` will call `parse_object`, within which the
- `prepare_packed_git` or `reprepare_packed_git` is eventually called,
- which means that the `raw_object_store->packed_git` data gets filled
- in and ``packed_git_initialized` is set. This also means consecutive
- calls to `prepare_packed_git` doesn't re-initiate
- `raw_object_store->packed_git` since `packed_git_initialized` already
- is set.
- 6. If `unbundle` succeeds, it writes some refs via `refs_update_ref`
- with `REF_SKIP_OID_VERIFICATION` set. So the bundle refs which can
- target arbitrary objects are written to the repository.
- 7. Finally in `do_fetch_pack_v2`, `mark_complete_and_common_ref` and
- `mark_tips` are called with `OBJECT_INFO_QUICK` set to find local
- tips. Here it won't call `reprepare_packed_git` anymore so it would
- fail to parse oids that only reside in the last bundle.
-
- Back to the example above, when unbunding `incr.bundle`, `base.pack` is
- enlisted to `packed_git` bacause of the prerequisites to verify. While
- we can not find `B` for negotiation at a latter time because `B` exists
- in `incr.pack` which is not enlisted in `packed_git`.
-
- This commit fixes this bug by dropping the `REF_SKIP_OID_VERIFICATION`
- flag when writing bundle refs, so we can:
-
- 1. Ensure that the bundle refs we are writing are pointing to valid
- objects.
- 2. Ensure all the tips from bundle refs can be correctly parsed.
-
- And a set of negotiation related tests for bundle-uri are added.
+ In the example above, when unbunding "incr.bundle", "base.pack" is added
+ to `packed_git` due to prerequisites verification. However, "B" cannot
+ be found for negotiation because it exists in "incr.pack", which is not
+ included in `packed_git`.
+
+ This commit fixes the bug by removing `REF_SKIP_OID_VERIFICATION` flag
+ when writing bundle refs. When `refs.c:refs_update_ref` is called to to
+ write the corresponding bundle refs, it triggers
+ `refs.c:ref_transaction_commit`. This, in turn, invokes
+ `refs.c:ref_transaction_prepare`, which calls `transaction_prepare` of
+ the refs storage backend. For files backend, this function is
+ `files-backend.c:files_transaction_prepare`, and for reftable backend,
+ it is `reftable-backend.c:reftable_be_transaction_prepare`. Both
+ functions eventually call `object.c:parse_object`, which can invoke
+ `packfile.c:reprepare_packed_git` to refresh `packed_git`. This ensures
+ that bundle refs point to valid objects and that all tips from bundle
+ refs are correctly parsed during subsequent negotiations.
+
+ A test has been added to demonstrate that bundles with incorrect
+ headers, where refs point to non-existent objects, do not result in any
+ bundle refs being created in the repository. Additionally, a set of
+ negotiation-related tests for fetching with bundle-uri has been
+ included.
+
+ Reviewed-by: Karthik Nayak <karthik.188@gmail.com>
+ Reviewed-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Xing Xin <xingxin.xx@bytedance.com>
## bundle-uri.c ##
@@ bundle-uri.c: static int unbundle_from_file(struct repository *r, const char *fi
## t/t5558-clone-bundle-uri.sh ##
@@ t/t5558-clone-bundle-uri.sh: test_expect_success 'fail to clone from non-bundle file' '
+
test_expect_success 'create bundle' '
git init clone-from &&
- git -C clone-from checkout -b topic &&
+- git -C clone-from checkout -b topic &&
+- test_commit -C clone-from A &&
+- test_commit -C clone-from B &&
+- git -C clone-from bundle create B.bundle topic
++ (
++ cd clone-from &&
++ git checkout -b topic &&
++
++ test_commit A &&
++ git bundle create A.bundle topic &&
+
- test_commit -C clone-from A &&
-+ git -C clone-from bundle create A.bundle topic &&
++ test_commit B &&
++ git bundle create B.bundle topic &&
+
- test_commit -C clone-from B &&
- git -C clone-from bundle create B.bundle topic
++ # Create a bundle with reference pointing to non-existent object.
++ sed "s/$(git rev-parse A)/$(git rev-parse B)/" <A.bundle >bad-header.bundle
++ )
'
+
+ test_expect_success 'clone with path bundle' '
+@@ t/t5558-clone-bundle-uri.sh: test_expect_success 'clone with path bundle' '
+ test_cmp expect actual
+ '
+
++test_expect_success 'clone with bundle that has bad header' '
++ git clone --bundle-uri="clone-from/bad-header.bundle" \
++ clone-from clone-bad-header 2>err &&
++ # Write bundle ref fails, but clone can still proceed.
++ commit_b=$(git -C clone-from rev-parse B) &&
++ test_grep "trying to write ref '\''refs/bundles/topic'\'' with nonexistent object $commit_b" err &&
++ git -C clone-bad-header for-each-ref --format="%(refname)" >refs &&
++ ! grep "refs/bundles/" refs
++'
++
+ test_expect_success 'clone with path bundle and non-default hash' '
+ test_when_finished "rm -rf clone-path-non-default-hash" &&
+ GIT_DEFAULT_HASH=sha256 git clone --bundle-uri="clone-from/B.bundle" \
@@ t/t5558-clone-bundle-uri.sh: test_expect_success 'clone bundle list (file, any mode, all failures)' '
! grep "refs/bundles/" refs
'
2: 057c697970f ! 2: beb70735811 unbundle: introduce unbundle_fsck_flags for fsckobjects handling
@@ Metadata
Author: Xing Xin <xingxin.xx@bytedance.com>
## Commit message ##
- unbundle: introduce unbundle_fsck_flags for fsckobjects handling
+ unbundle: extend verify_bundle_flags to support fsck-objects
- This commit adds a new enum `unbundle_fsck_flags` which is designed to
- control the fsck behavior when unbundling. `unbundle` can use this newly
- passed in enum to further decide whether to enable `--fsck-objects` for
- "git-index-pack".
+ This commit extends `verify_bundle_flags` by adding a new option
+ `VERIFY_BUNDLE_FSCK_ALWAYS`, which enables checks for broken objects in
+ `bundle.c:unbundle`. This option is now used as the default for fetches
+ involving bundles, specifically by `transport.c:fetch_refs_from_bundle`
+ for direct bundle fetches and by `bundle-uri.c:unbundle_from_file` for
+ _bundle-uri_ enabled fetches.
- Currently only `UNBUNDLE_FSCK_NEVER` and `UNBUNDLE_FSCK_ALWAYS` are
- supported as the very basic options. Another interesting option would be
- added in later commits.
+ Upcoming commits will introduce another option as a replacement that
+ fits better with fetch operations. `VERIFY_BUNDLE_FSCK_ALWAYS` will be
+ further used to add "--fsck-objects" support for "git bundle unbundle"
+ and "git bundle verify".
+ Reviewed-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Xing Xin <xingxin.xx@bytedance.com>
- ## builtin/bundle.c ##
-@@ builtin/bundle.c: static int cmd_bundle_unbundle(int argc, const char **argv, const char *prefix)
- strvec_pushl(&extra_index_pack_args, "-v", "--progress-title",
- _("Unbundling objects"), NULL);
- ret = !!unbundle(the_repository, &header, bundle_fd,
-- &extra_index_pack_args, 0) ||
-+ &extra_index_pack_args, 0, UNBUNDLE_FSCK_NEVER) ||
- list_bundle_refs(&header, argc, argv);
- bundle_header_release(&header);
- cleanup:
-
## bundle-uri.c ##
@@ bundle-uri.c: static int unbundle_from_file(struct repository *r, const char *file)
* the prerequisite commits.
*/
if ((result = unbundle(r, &header, bundle_fd, NULL,
- VERIFY_BUNDLE_QUIET)))
-+ VERIFY_BUNDLE_QUIET, UNBUNDLE_FSCK_ALWAYS)))
++ VERIFY_BUNDLE_QUIET | VERIFY_BUNDLE_FSCK_ALWAYS)))
return 1;
/*
## bundle.c ##
-@@ bundle.c: int create_bundle(struct repository *r, const char *path,
-
- int unbundle(struct repository *r, struct bundle_header *header,
- int bundle_fd, struct strvec *extra_index_pack_args,
-- enum verify_bundle_flags flags)
-+ enum verify_bundle_flags flags,
-+ enum unbundle_fsck_flags fsck_flags)
- {
- struct child_process ip = CHILD_PROCESS_INIT;
-
@@ bundle.c: int unbundle(struct repository *r, struct bundle_header *header,
if (header->filter.choice)
strvec_push(&ip.args, "--promisor=from-bundle");
-+ switch (fsck_flags) {
-+ case UNBUNDLE_FSCK_ALWAYS:
++ if (flags & VERIFY_BUNDLE_FSCK_ALWAYS)
+ strvec_push(&ip.args, "--fsck-objects");
-+ break;
-+ case UNBUNDLE_FSCK_NEVER:
-+ default:
-+ break;
-+ }
+
if (extra_index_pack_args) {
strvec_pushv(&ip.args, extra_index_pack_args->v);
@@ bundle.c: int unbundle(struct repository *r, struct bundle_header *header,
## bundle.h ##
@@ bundle.h: int create_bundle(struct repository *r, const char *path,
- int argc, const char **argv, struct strvec *pack_options,
- int version);
-
-+enum unbundle_fsck_flags {
-+ UNBUNDLE_FSCK_NEVER = 0,
-+ UNBUNDLE_FSCK_ALWAYS,
-+};
-+
enum verify_bundle_flags {
VERIFY_BUNDLE_VERBOSE = (1 << 0),
VERIFY_BUNDLE_QUIET = (1 << 1),
-@@ bundle.h: int verify_bundle(struct repository *r, struct bundle_header *header,
- */
- int unbundle(struct repository *r, struct bundle_header *header,
- int bundle_fd, struct strvec *extra_index_pack_args,
-- enum verify_bundle_flags flags);
-+ enum verify_bundle_flags flags,
-+ enum unbundle_fsck_flags fsck_flags);
- int list_bundle_refs(struct bundle_header *header,
- int argc, const char **argv);
++ VERIFY_BUNDLE_FSCK_ALWAYS = (1 << 2),
+ };
+ int verify_bundle(struct repository *r, struct bundle_header *header,
## transport.c ##
@@ transport.c: static int fetch_refs_from_bundle(struct transport *transport,
@@ transport.c: static int fetch_refs_from_bundle(struct transport *transport,
get_refs_from_bundle_inner(transport);
ret = unbundle(the_repository, &data->header, data->fd,
- &extra_index_pack_args, 0);
-+ &extra_index_pack_args, 0, UNBUNDLE_FSCK_ALWAYS);
++ &extra_index_pack_args, VERIFY_BUNDLE_FSCK_ALWAYS);
transport->hash_algo = data->header.hash_algo;
return ret;
}
3: 67401d4fbcb ! 3: 5ddc894c2c1 fetch-pack: expose fsckObjects configuration logic
@@ Metadata
## Commit message ##
fetch-pack: expose fsckObjects configuration logic
- Currently we can use "transfer.fsckObjects" or "fetch.fsckObjects" to
- control whether to enable checks for broken objects during fetching. But
- these configs are only acknowledged by `fetch-pack.c:get_pack` and do
- not make sense when fetching from bundles or using bundle-uris.
+ Currently, we can use "transfer.fsckObjects" and the more specific
+ "fetch.fsckObjects" to control checks for broken objects in received
+ packs during fetches. However, these configurations were only
+ acknowledged by `fetch-pack.c:get_pack` and did not take effect in
+ direct bundle fetches and fetches with _bundle-uri_ enabled.
- This commit exposed the fetch-then-transfer configuration logic by
- adding a new function `fetch_pack_fsck_objects` in fetch-pack.h. In next
- commit, this new function will be used by `unbundle` in fetching
- scenarios.
+ This commit exposes the fetch-then-transfer configuration logic by
+ adding a new function `fetch_pack_fsck_objects` in fetch-pack.h. This
+ new function is used to replace the assignment for `fsck_objects` in
+ `fetch-pack.c:get_pack`. In the next commit, it will also be used by
+ `bundle.c:unbundle` to better fit fetching scenarios.
+ Helped-by: Junio C Hamano <gitster@pobox.com>
+ Helped-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Xing Xin <xingxin.xx@bytedance.com>
## fetch-pack.c ##
@@ fetch-pack.c: static const struct object_id *iterate_ref_map(void *cb_data)
+int fetch_pack_fsck_objects(void)
+{
+ fetch_pack_setup();
-+
-+ return fetch_fsck_objects >= 0
-+ ? fetch_fsck_objects
-+ : transfer_fsck_objects >= 0
-+ ? transfer_fsck_objects
-+ : 0;
++ if (fetch_fsck_objects >= 0)
++ return fetch_fsck_objects;
++ if (transfer_fsck_objects >= 0)
++ return transfer_fsck_objects;
++ return 0;
+}
+
struct ref *fetch_pack(struct fetch_pack_args *args,
@@ fetch-pack.h: void negotiate_using_fetch(const struct oid_array *negotiation_tip
*/
int report_unmatched_refs(struct ref **sought, int nr_sought);
++/*
++ * Return true if checks for broken objects in received pack are required.
++ */
+int fetch_pack_fsck_objects(void);
+
#endif
4: c19b8f633cb ! 4: 68b9bca9f8b unbundle: introduce new option UNBUNDLE_FSCK_FOLLOW_FETCH
@@ Metadata
Author: Xing Xin <xingxin.xx@bytedance.com>
## Commit message ##
- unbundle: introduce new option UNBUNDLE_FSCK_FOLLOW_FETCH
+ unbundle: introduce option VERIFY_BUNDLE_FSCK_FOLLOW_FETCH
- This commit adds a new option `UNBUNDLE_FSCK_FOLLOW_FETCH` to
- `unbundle_fsck_flags`, this new flag is currently used in the _fetch_
- process by
+ This commit introduces a new option `VERIFY_BUNDLE_FSCK_FOLLOW_FETCH` to
+ `verify_bundle_flags`. In `bundle.c:unbundle`, this new option controls
+ whether broken object checks should be enabled by invoking
+ `fetch-pack.c:fetch_pack_fsck_objects`. Note that the option
+ `VERIFY_BUNDLE_FSCK_ALWAYS` takes precedence over
+ `VERIFY_BUNDLE_FSCK_FOLLOW_FETCH`.
- - `transport.c:fetch_refs_from_bundle` for fetching directly from a
- bundle.
- - `bundle-uri.c:unbundle_from_file` for unbundling bundles downloaded
- from bundle-uri.
+ This flag is now used in the fetching process by:
- So we now have a relatively consistent logic for checking objects during
- fetching. Add tests for the above two situations are added.
+ - `transport.c:fetch_refs_from_bundle` for direct bundle fetches.
+ - `bundle-uri.c:unbundle_from_file` for bundle-uri enabled fetches.
+
+ This addition ensures a consistent logic for object verification during
+ fetch operations. Tests have been added to confirm functionality in the
+ scenarios mentioned above.
Signed-off-by: Xing Xin <xingxin.xx@bytedance.com>
@@ bundle-uri.c: static int unbundle_from_file(struct repository *r, const char *fi
* the prerequisite commits.
*/
if ((result = unbundle(r, &header, bundle_fd, NULL,
-- VERIFY_BUNDLE_QUIET, UNBUNDLE_FSCK_ALWAYS)))
-+ VERIFY_BUNDLE_QUIET, UNBUNDLE_FSCK_FOLLOW_FETCH)))
+- VERIFY_BUNDLE_QUIET | VERIFY_BUNDLE_FSCK_ALWAYS)))
++ VERIFY_BUNDLE_QUIET | VERIFY_BUNDLE_FSCK_FOLLOW_FETCH)))
return 1;
/*
@@ bundle.c
static const char v2_bundle_signature[] = "# v2 git bundle\n";
static const char v3_bundle_signature[] = "# v3 git bundle\n";
@@ bundle.c: int unbundle(struct repository *r, struct bundle_header *header,
- enum unbundle_fsck_flags fsck_flags)
+ enum verify_bundle_flags flags)
{
struct child_process ip = CHILD_PROCESS_INIT;
+ int fsck_objects = 0;
@@ bundle.c: int unbundle(struct repository *r, struct bundle_header *header,
if (verify_bundle(r, header, flags))
return -1;
@@ bundle.c: int unbundle(struct repository *r, struct bundle_header *header,
+ strvec_push(&ip.args, "--promisor=from-bundle");
- switch (fsck_flags) {
- case UNBUNDLE_FSCK_ALWAYS:
-- strvec_push(&ip.args, "--fsck-objects");
+ if (flags & VERIFY_BUNDLE_FSCK_ALWAYS)
+ fsck_objects = 1;
-+ break;
-+ case UNBUNDLE_FSCK_FOLLOW_FETCH:
++ else if (flags & VERIFY_BUNDLE_FSCK_FOLLOW_FETCH)
+ fsck_objects = fetch_pack_fsck_objects();
- break;
- case UNBUNDLE_FSCK_NEVER:
- default:
- break;
- }
-
-+ if (fsck_objects)
-+ strvec_push(&ip.args, "--fsck-objects");
+
++ if (fsck_objects)
+ strvec_push(&ip.args, "--fsck-objects");
+
if (extra_index_pack_args) {
- strvec_pushv(&ip.args, extra_index_pack_args->v);
- strvec_clear(extra_index_pack_args);
## bundle.h ##
-@@ bundle.h: int create_bundle(struct repository *r, const char *path,
- enum unbundle_fsck_flags {
- UNBUNDLE_FSCK_NEVER = 0,
- UNBUNDLE_FSCK_ALWAYS,
-+ UNBUNDLE_FSCK_FOLLOW_FETCH,
+@@ bundle.h: enum verify_bundle_flags {
+ VERIFY_BUNDLE_VERBOSE = (1 << 0),
+ VERIFY_BUNDLE_QUIET = (1 << 1),
+ VERIFY_BUNDLE_FSCK_ALWAYS = (1 << 2),
++ VERIFY_BUNDLE_FSCK_FOLLOW_FETCH = (1 << 3),
};
- enum verify_bundle_flags {
+ int verify_bundle(struct repository *r, struct bundle_header *header,
## t/t5558-clone-bundle-uri.sh ##
-@@ t/t5558-clone-bundle-uri.sh: test_expect_success 'fail to clone from non-bundle file' '
+@@ t/t5558-clone-bundle-uri.sh: test_expect_success 'create bundle' '
+ git bundle create B.bundle topic &&
- test_expect_success 'create bundle' '
- git init clone-from &&
-- git -C clone-from checkout -b topic &&
-+ (
-+ cd clone-from &&
-+ git checkout -b topic &&
-+
-+ test_commit A &&
-+ git bundle create A.bundle topic &&
-+
-+ test_commit B &&
-+ git bundle create B.bundle topic &&
+ # Create a bundle with reference pointing to non-existent object.
+- sed "s/$(git rev-parse A)/$(git rev-parse B)/" <A.bundle >bad-header.bundle
++ sed "s/$(git rev-parse A)/$(git rev-parse B)/" <A.bundle >bad-header.bundle &&
+
+ cat >data <<-EOF &&
+ tree $(git rev-parse HEAD^{tree})
+ parent $(git rev-parse HEAD)
+ author A U Thor
+ committer A U Thor
-
-- test_commit -C clone-from A &&
-- git -C clone-from bundle create A.bundle topic &&
++
+ commit: this is a commit with bad emails
-
-- test_commit -C clone-from B &&
-- git -C clone-from bundle create B.bundle topic
++
+ EOF
+ git hash-object --literally -t commit -w --stdin <data >commit &&
+ git branch bad $(cat commit) &&
-+ git bundle create bad.bundle bad &&
++ git bundle create bad-object.bundle bad &&
+ git update-ref -d refs/heads/bad
-+ )
+ )
'
- test_expect_success 'clone with path bundle' '
-@@ t/t5558-clone-bundle-uri.sh: test_expect_success 'clone with path bundle' '
- test_cmp expect actual
+@@ t/t5558-clone-bundle-uri.sh: test_expect_success 'clone with bundle that has bad header' '
+ ! grep "refs/bundles/" refs
'
-+test_expect_success 'clone with bad bundle' '
-+ git -c fetch.fsckObjects=true clone --bundle-uri="clone-from/bad.bundle" \
-+ clone-from clone-bad 2>err &&
-+ # Unbundle fails, but clone can still proceed.
++test_expect_success 'clone with bundle that has bad object' '
++ # Unbundle succeeds if no fsckObjects confugured.
++ git clone --bundle-uri="clone-from/bad-object.bundle" \
++ clone-from clone-bad-object-no-fsck &&
++ git -C clone-bad-object-no-fsck for-each-ref --format="%(refname)" >refs &&
++ grep "refs/bundles/" refs >actual &&
++ cat >expect <<-\EOF &&
++ refs/bundles/bad
++ EOF
++ test_cmp expect actual &&
++
++ # Unbundle fails with fsckObjects set true, but clone can still proceed.
++ git -c fetch.fsckObjects=true clone --bundle-uri="clone-from/bad-object.bundle" \
++ clone-from clone-bad-object-fsck 2>err &&
+ test_grep "missingEmail" err &&
-+ git -C clone-bad for-each-ref --format="%(refname)" >refs &&
++ git -C clone-bad-object-fsck for-each-ref --format="%(refname)" >refs &&
+ ! grep "refs/bundles/" refs
+'
+
@@ t/t5607-clone-bundle.sh: test_expect_success 'fetch SHA-1 from bundle' '
git fetch --no-tags foo/tip.bundle "$(cat hash)"
'
-+test_expect_success 'clone bundle with fetch.fsckObjects' '
++test_expect_success 'clone bundle with different fsckObjects configurations' '
+ test_create_repo bundle-fsck &&
+ (
+ cd bundle-fsck &&
@@ t/t5607-clone-bundle.sh: test_expect_success 'fetch SHA-1 from bundle' '
+ git branch bad $(cat commit) &&
+ git bundle create bad.bundle bad
+ ) &&
++
++ git clone bundle-fsck/bad.bundle bundle-no-fsck &&
++
++ git -c fetch.fsckObjects=false -c transfer.fsckObjects=true \
++ clone bundle-fsck/bad.bundle bundle-fetch-no-fsck &&
++
+ test_must_fail git -c fetch.fsckObjects=true \
-+ clone bundle-fsck/bad.bundle bundle-fsck-clone 2>err &&
++ clone bundle-fsck/bad.bundle bundle-fetch-fsck 2>err &&
++ test_grep "missingEmail" err &&
++
++ test_must_fail git -c transfer.fsckObjects=true \
++ clone bundle-fsck/bad.bundle bundle-transfer-fsck 2>err &&
+ test_grep "missingEmail" err
+'
+
@@ transport.c: static int fetch_refs_from_bundle(struct transport *transport,
if (!data->get_refs_from_bundle_called)
get_refs_from_bundle_inner(transport);
ret = unbundle(the_repository, &data->header, data->fd,
-- &extra_index_pack_args, 0, UNBUNDLE_FSCK_ALWAYS);
-+ &extra_index_pack_args, 0, UNBUNDLE_FSCK_FOLLOW_FETCH);
+- &extra_index_pack_args, VERIFY_BUNDLE_FSCK_ALWAYS);
++ &extra_index_pack_args, VERIFY_BUNDLE_FSCK_FOLLOW_FETCH);
transport->hash_algo = data->header.hash_algo;
return ret;
}
--
gitgitgadget
^ permalink raw reply [flat|nested] 66+ messages in thread
* [PATCH v4 1/4] bundle-uri: verify oid before writing refs
2024-05-30 8:21 ` [PATCH v4 0/4] object checking related additions and fixes for bundles in fetches blanet via GitGitGadget
@ 2024-05-30 8:21 ` Xing Xin via GitGitGadget
2024-05-30 8:21 ` [PATCH v4 2/4] unbundle: extend verify_bundle_flags to support fsck-objects Xing Xin via GitGitGadget
` (3 subsequent siblings)
4 siblings, 0 replies; 66+ messages in thread
From: Xing Xin via GitGitGadget @ 2024-05-30 8:21 UTC (permalink / raw)
To: git; +Cc: Patrick Steinhardt, Karthik Nayak, blanet, Xing Xin
From: Xing Xin <xingxin.xx@bytedance.com>
When using the bundle-uri mechanism with a bundle list containing
multiple interrelated bundles, we encountered a bug where tips from
downloaded bundles were not discovered, thus resulting in rather slow
clones. This was particularly problematic when employing the
"creationTokens" heuristic.
To reproduce this issue, consider a repository with a single branch
"main" pointing to commit "A". Firstly, create a base bundle with:
git bundle create base.bundle main
Then, add a new commit "B" on top of "A", and create an incremental
bundle for "main":
git bundle create incr.bundle A..main
Now, generate a bundle list with the following content:
[bundle]
version = 1
mode = all
heuristic = creationToken
[bundle "base"]
uri = base.bundle
creationToken = 1
[bundle "incr"]
uri = incr.bundle
creationToken = 2
A fresh clone with the bundle list above should result in a reference
"refs/bundles/main" pointing to "B" in the new repository. However, git
would still download everything from the server, as if it had fetched
nothing locally.
So why the "refs/bundles/main" is not discovered? After some digging I
found that:
1. Bundles in bundle list are downloaded to local files via
`bundle-uri.c:download_bundle_list` or via
`bundle-uri.c:fetch_bundles_by_token` for the "creationToken"
heuristic.
2. Each bundle is unbundled via `bundle-uri.c:unbundle_from_file`, which
is called by `bundle-uri.c:unbundle_all_bundles` or called within
`bundle-uri.c:fetch_bundles_by_token` for the "creationToken"
heuristic.
3. To get all prerequisites of the bundle, the bundle header is read
inside `bundle-uri.c:unbundle_from_file` to by calling
`bundle.c:read_bundle_header`.
4. Then it calls `bundle.c:unbundle`, which calls
`bundle.c:verify_bundle` to ensure the repository contains all the
prerequisites.
5. `bundle.c:verify_bundle` calls `parse_object`, which eventually
invokes `packfile.c:prepare_packed_git` or
`packfile.c:reprepare_packed_git`, filling
`raw_object_store->packed_git` and setting `packed_git_initialized`.
6. If `bundle.c:unbundle` succeeds, it writes refs via
`refs.c:refs_update_ref` with `REF_SKIP_OID_VERIFICATION` set. Here
bundle refs which can target arbitrary objects are written to the
repository.
7. Finally, in `fetch-pack.c:do_fetch_pack_v2`, the functions
`fetch-pack.c:mark_complete_and_common_ref` and
`fetch-pack.c:mark_tips` are called with `OBJECT_INFO_QUICK` set to
find local tips for negotiation. The `OBJECT_INFO_QUICK` flag
prevents `packfile.c:reprepare_packed_git` from being called,
resulting in failures to parse OIDs that reside only in the latest
bundle.
In the example above, when unbunding "incr.bundle", "base.pack" is added
to `packed_git` due to prerequisites verification. However, "B" cannot
be found for negotiation because it exists in "incr.pack", which is not
included in `packed_git`.
This commit fixes the bug by removing `REF_SKIP_OID_VERIFICATION` flag
when writing bundle refs. When `refs.c:refs_update_ref` is called to to
write the corresponding bundle refs, it triggers
`refs.c:ref_transaction_commit`. This, in turn, invokes
`refs.c:ref_transaction_prepare`, which calls `transaction_prepare` of
the refs storage backend. For files backend, this function is
`files-backend.c:files_transaction_prepare`, and for reftable backend,
it is `reftable-backend.c:reftable_be_transaction_prepare`. Both
functions eventually call `object.c:parse_object`, which can invoke
`packfile.c:reprepare_packed_git` to refresh `packed_git`. This ensures
that bundle refs point to valid objects and that all tips from bundle
refs are correctly parsed during subsequent negotiations.
A test has been added to demonstrate that bundles with incorrect
headers, where refs point to non-existent objects, do not result in any
bundle refs being created in the repository. Additionally, a set of
negotiation-related tests for fetching with bundle-uri has been
included.
Reviewed-by: Karthik Nayak <karthik.188@gmail.com>
Reviewed-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Xing Xin <xingxin.xx@bytedance.com>
---
bundle-uri.c | 3 +-
t/t5558-clone-bundle-uri.sh | 153 +++++++++++++++++++++++++++++++++++-
2 files changed, 150 insertions(+), 6 deletions(-)
diff --git a/bundle-uri.c b/bundle-uri.c
index 91b3319a5c1..65666a11d9c 100644
--- a/bundle-uri.c
+++ b/bundle-uri.c
@@ -400,8 +400,7 @@ static int unbundle_from_file(struct repository *r, const char *file)
refs_update_ref(get_main_ref_store(the_repository),
"fetched bundle", bundle_ref.buf, oid,
has_old ? &old_oid : NULL,
- REF_SKIP_OID_VERIFICATION,
- UPDATE_REFS_MSG_ON_ERR);
+ 0, UPDATE_REFS_MSG_ON_ERR);
}
bundle_header_release(&header);
diff --git a/t/t5558-clone-bundle-uri.sh b/t/t5558-clone-bundle-uri.sh
index 1ca5f745e73..8f4f802e4f1 100755
--- a/t/t5558-clone-bundle-uri.sh
+++ b/t/t5558-clone-bundle-uri.sh
@@ -19,10 +19,19 @@ test_expect_success 'fail to clone from non-bundle file' '
test_expect_success 'create bundle' '
git init clone-from &&
- git -C clone-from checkout -b topic &&
- test_commit -C clone-from A &&
- test_commit -C clone-from B &&
- git -C clone-from bundle create B.bundle topic
+ (
+ cd clone-from &&
+ git checkout -b topic &&
+
+ test_commit A &&
+ git bundle create A.bundle topic &&
+
+ test_commit B &&
+ git bundle create B.bundle topic &&
+
+ # Create a bundle with reference pointing to non-existent object.
+ sed "s/$(git rev-parse A)/$(git rev-parse B)/" <A.bundle >bad-header.bundle
+ )
'
test_expect_success 'clone with path bundle' '
@@ -33,6 +42,16 @@ test_expect_success 'clone with path bundle' '
test_cmp expect actual
'
+test_expect_success 'clone with bundle that has bad header' '
+ git clone --bundle-uri="clone-from/bad-header.bundle" \
+ clone-from clone-bad-header 2>err &&
+ # Write bundle ref fails, but clone can still proceed.
+ commit_b=$(git -C clone-from rev-parse B) &&
+ test_grep "trying to write ref '\''refs/bundles/topic'\'' with nonexistent object $commit_b" err &&
+ git -C clone-bad-header for-each-ref --format="%(refname)" >refs &&
+ ! grep "refs/bundles/" refs
+'
+
test_expect_success 'clone with path bundle and non-default hash' '
test_when_finished "rm -rf clone-path-non-default-hash" &&
GIT_DEFAULT_HASH=sha256 git clone --bundle-uri="clone-from/B.bundle" \
@@ -259,6 +278,132 @@ test_expect_success 'clone bundle list (file, any mode, all failures)' '
! grep "refs/bundles/" refs
'
+#########################################################################
+# Clone negotiation related tests begin here
+
+test_expect_success 'negotiation: bundle with part of wanted commits' '
+ test_when_finished rm -rf trace*.txt &&
+ GIT_TRACE_PACKET="$(pwd)/trace-packet.txt" \
+ git clone --no-local --bundle-uri="clone-from/A.bundle" \
+ clone-from nego-bundle-part &&
+ git -C nego-bundle-part for-each-ref --format="%(refname)" >refs &&
+ grep "refs/bundles/" refs >actual &&
+ cat >expect <<-\EOF &&
+ refs/bundles/topic
+ EOF
+ test_cmp expect actual &&
+ # Ensure that refs/bundles/topic are sent as "have".
+ grep "clone> have $(git -C clone-from rev-parse A)" trace-packet.txt
+'
+
+test_expect_success 'negotiation: bundle with all wanted commits' '
+ test_when_finished rm -rf trace*.txt &&
+ GIT_TRACE_PACKET="$(pwd)/trace-packet.txt" \
+ git clone --no-local --single-branch --branch=topic --no-tags \
+ --bundle-uri="clone-from/B.bundle" \
+ clone-from nego-bundle-all &&
+ git -C nego-bundle-all for-each-ref --format="%(refname)" >refs &&
+ grep "refs/bundles/" refs >actual &&
+ cat >expect <<-\EOF &&
+ refs/bundles/topic
+ EOF
+ test_cmp expect actual &&
+ # We already have all needed commits so no "want" needed.
+ ! grep "clone> want " trace-packet.txt
+'
+
+test_expect_success 'negotiation: bundle list (no heuristic)' '
+ test_when_finished rm -f trace*.txt &&
+ cat >bundle-list <<-EOF &&
+ [bundle]
+ version = 1
+ mode = all
+
+ [bundle "bundle-1"]
+ uri = file://$(pwd)/clone-from/bundle-1.bundle
+
+ [bundle "bundle-2"]
+ uri = file://$(pwd)/clone-from/bundle-2.bundle
+ EOF
+
+ GIT_TRACE_PACKET="$(pwd)/trace-packet.txt" \
+ git clone --no-local --bundle-uri="file://$(pwd)/bundle-list" \
+ clone-from nego-bundle-list-no-heuristic &&
+
+ git -C nego-bundle-list-no-heuristic for-each-ref --format="%(refname)" >refs &&
+ grep "refs/bundles/" refs >actual &&
+ cat >expect <<-\EOF &&
+ refs/bundles/base
+ refs/bundles/left
+ EOF
+ test_cmp expect actual &&
+ grep "clone> have $(git -C nego-bundle-list-no-heuristic rev-parse refs/bundles/left)" trace-packet.txt
+'
+
+test_expect_success 'negotiation: bundle list (creationToken)' '
+ test_when_finished rm -f trace*.txt &&
+ cat >bundle-list <<-EOF &&
+ [bundle]
+ version = 1
+ mode = all
+ heuristic = creationToken
+
+ [bundle "bundle-1"]
+ uri = file://$(pwd)/clone-from/bundle-1.bundle
+ creationToken = 1
+
+ [bundle "bundle-2"]
+ uri = file://$(pwd)/clone-from/bundle-2.bundle
+ creationToken = 2
+ EOF
+
+ GIT_TRACE_PACKET="$(pwd)/trace-packet.txt" \
+ git clone --no-local --bundle-uri="file://$(pwd)/bundle-list" \
+ clone-from nego-bundle-list-heuristic &&
+
+ git -C nego-bundle-list-heuristic for-each-ref --format="%(refname)" >refs &&
+ grep "refs/bundles/" refs >actual &&
+ cat >expect <<-\EOF &&
+ refs/bundles/base
+ refs/bundles/left
+ EOF
+ test_cmp expect actual &&
+ grep "clone> have $(git -C nego-bundle-list-heuristic rev-parse refs/bundles/left)" trace-packet.txt
+'
+
+test_expect_success 'negotiation: bundle list with all wanted commits' '
+ test_when_finished rm -f trace*.txt &&
+ cat >bundle-list <<-EOF &&
+ [bundle]
+ version = 1
+ mode = all
+ heuristic = creationToken
+
+ [bundle "bundle-1"]
+ uri = file://$(pwd)/clone-from/bundle-1.bundle
+ creationToken = 1
+
+ [bundle "bundle-2"]
+ uri = file://$(pwd)/clone-from/bundle-2.bundle
+ creationToken = 2
+ EOF
+
+ GIT_TRACE_PACKET="$(pwd)/trace-packet.txt" \
+ git clone --no-local --single-branch --branch=left --no-tags \
+ --bundle-uri="file://$(pwd)/bundle-list" \
+ clone-from nego-bundle-list-all &&
+
+ git -C nego-bundle-list-all for-each-ref --format="%(refname)" >refs &&
+ grep "refs/bundles/" refs >actual &&
+ cat >expect <<-\EOF &&
+ refs/bundles/base
+ refs/bundles/left
+ EOF
+ test_cmp expect actual &&
+ # We already have all needed commits so no "want" needed.
+ ! grep "clone> want " trace-packet.txt
+'
+
#########################################################################
# HTTP tests begin here
--
gitgitgadget
^ permalink raw reply related [flat|nested] 66+ messages in thread
* [PATCH v4 2/4] unbundle: extend verify_bundle_flags to support fsck-objects
2024-05-30 8:21 ` [PATCH v4 0/4] object checking related additions and fixes for bundles in fetches blanet via GitGitGadget
2024-05-30 8:21 ` [PATCH v4 1/4] bundle-uri: verify oid before writing refs Xing Xin via GitGitGadget
@ 2024-05-30 8:21 ` Xing Xin via GitGitGadget
2024-06-06 12:06 ` Patrick Steinhardt
2024-05-30 8:21 ` [PATCH v4 3/4] fetch-pack: expose fsckObjects configuration logic Xing Xin via GitGitGadget
` (2 subsequent siblings)
4 siblings, 1 reply; 66+ messages in thread
From: Xing Xin via GitGitGadget @ 2024-05-30 8:21 UTC (permalink / raw)
To: git; +Cc: Patrick Steinhardt, Karthik Nayak, blanet, Xing Xin
From: Xing Xin <xingxin.xx@bytedance.com>
This commit extends `verify_bundle_flags` by adding a new option
`VERIFY_BUNDLE_FSCK_ALWAYS`, which enables checks for broken objects in
`bundle.c:unbundle`. This option is now used as the default for fetches
involving bundles, specifically by `transport.c:fetch_refs_from_bundle`
for direct bundle fetches and by `bundle-uri.c:unbundle_from_file` for
_bundle-uri_ enabled fetches.
Upcoming commits will introduce another option as a replacement that
fits better with fetch operations. `VERIFY_BUNDLE_FSCK_ALWAYS` will be
further used to add "--fsck-objects" support for "git bundle unbundle"
and "git bundle verify".
Reviewed-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Xing Xin <xingxin.xx@bytedance.com>
---
bundle-uri.c | 2 +-
bundle.c | 3 +++
bundle.h | 1 +
transport.c | 2 +-
4 files changed, 6 insertions(+), 2 deletions(-)
diff --git a/bundle-uri.c b/bundle-uri.c
index 65666a11d9c..066ff788104 100644
--- a/bundle-uri.c
+++ b/bundle-uri.c
@@ -373,7 +373,7 @@ static int unbundle_from_file(struct repository *r, const char *file)
* the prerequisite commits.
*/
if ((result = unbundle(r, &header, bundle_fd, NULL,
- VERIFY_BUNDLE_QUIET)))
+ VERIFY_BUNDLE_QUIET | VERIFY_BUNDLE_FSCK_ALWAYS)))
return 1;
/*
diff --git a/bundle.c b/bundle.c
index 95367c2d0a0..26574e74bdd 100644
--- a/bundle.c
+++ b/bundle.c
@@ -625,6 +625,9 @@ int unbundle(struct repository *r, struct bundle_header *header,
if (header->filter.choice)
strvec_push(&ip.args, "--promisor=from-bundle");
+ if (flags & VERIFY_BUNDLE_FSCK_ALWAYS)
+ strvec_push(&ip.args, "--fsck-objects");
+
if (extra_index_pack_args) {
strvec_pushv(&ip.args, extra_index_pack_args->v);
strvec_clear(extra_index_pack_args);
diff --git a/bundle.h b/bundle.h
index 021adbdcbb3..cf23c8615d3 100644
--- a/bundle.h
+++ b/bundle.h
@@ -33,6 +33,7 @@ int create_bundle(struct repository *r, const char *path,
enum verify_bundle_flags {
VERIFY_BUNDLE_VERBOSE = (1 << 0),
VERIFY_BUNDLE_QUIET = (1 << 1),
+ VERIFY_BUNDLE_FSCK_ALWAYS = (1 << 2),
};
int verify_bundle(struct repository *r, struct bundle_header *header,
diff --git a/transport.c b/transport.c
index 0ad04b77fd2..1b3d61ffcec 100644
--- a/transport.c
+++ b/transport.c
@@ -184,7 +184,7 @@ static int fetch_refs_from_bundle(struct transport *transport,
if (!data->get_refs_from_bundle_called)
get_refs_from_bundle_inner(transport);
ret = unbundle(the_repository, &data->header, data->fd,
- &extra_index_pack_args, 0);
+ &extra_index_pack_args, VERIFY_BUNDLE_FSCK_ALWAYS);
transport->hash_algo = data->header.hash_algo;
return ret;
}
--
gitgitgadget
^ permalink raw reply related [flat|nested] 66+ messages in thread
* [PATCH v4 3/4] fetch-pack: expose fsckObjects configuration logic
2024-05-30 8:21 ` [PATCH v4 0/4] object checking related additions and fixes for bundles in fetches blanet via GitGitGadget
2024-05-30 8:21 ` [PATCH v4 1/4] bundle-uri: verify oid before writing refs Xing Xin via GitGitGadget
2024-05-30 8:21 ` [PATCH v4 2/4] unbundle: extend verify_bundle_flags to support fsck-objects Xing Xin via GitGitGadget
@ 2024-05-30 8:21 ` Xing Xin via GitGitGadget
2024-05-30 8:21 ` [PATCH v4 4/4] unbundle: introduce option VERIFY_BUNDLE_FSCK_FOLLOW_FETCH Xing Xin via GitGitGadget
2024-06-11 6:42 ` [PATCH v5 0/4] object checking related additions and fixes for bundles in fetches blanet via GitGitGadget
4 siblings, 0 replies; 66+ messages in thread
From: Xing Xin via GitGitGadget @ 2024-05-30 8:21 UTC (permalink / raw)
To: git; +Cc: Patrick Steinhardt, Karthik Nayak, blanet, Xing Xin
From: Xing Xin <xingxin.xx@bytedance.com>
Currently, we can use "transfer.fsckObjects" and the more specific
"fetch.fsckObjects" to control checks for broken objects in received
packs during fetches. However, these configurations were only
acknowledged by `fetch-pack.c:get_pack` and did not take effect in
direct bundle fetches and fetches with _bundle-uri_ enabled.
This commit exposes the fetch-then-transfer configuration logic by
adding a new function `fetch_pack_fsck_objects` in fetch-pack.h. This
new function is used to replace the assignment for `fsck_objects` in
`fetch-pack.c:get_pack`. In the next commit, it will also be used by
`bundle.c:unbundle` to better fit fetching scenarios.
Helped-by: Junio C Hamano <gitster@pobox.com>
Helped-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Xing Xin <xingxin.xx@bytedance.com>
---
fetch-pack.c | 17 +++++++++++------
fetch-pack.h | 5 +++++
2 files changed, 16 insertions(+), 6 deletions(-)
diff --git a/fetch-pack.c b/fetch-pack.c
index 7d2aef21add..3acff2baf09 100644
--- a/fetch-pack.c
+++ b/fetch-pack.c
@@ -954,12 +954,7 @@ static int get_pack(struct fetch_pack_args *args,
strvec_push(&cmd.args, alternate_shallow_file);
}
- if (fetch_fsck_objects >= 0
- ? fetch_fsck_objects
- : transfer_fsck_objects >= 0
- ? transfer_fsck_objects
- : 0)
- fsck_objects = 1;
+ fsck_objects = fetch_pack_fsck_objects();
if (do_keep || args->from_promisor || index_pack_args || fsck_objects) {
if (pack_lockfiles || fsck_objects)
@@ -2046,6 +2041,16 @@ static const struct object_id *iterate_ref_map(void *cb_data)
return &ref->old_oid;
}
+int fetch_pack_fsck_objects(void)
+{
+ fetch_pack_setup();
+ if (fetch_fsck_objects >= 0)
+ return fetch_fsck_objects;
+ if (transfer_fsck_objects >= 0)
+ return transfer_fsck_objects;
+ return 0;
+}
+
struct ref *fetch_pack(struct fetch_pack_args *args,
int fd[],
const struct ref *ref,
diff --git a/fetch-pack.h b/fetch-pack.h
index 6775d265175..b5c579cdae2 100644
--- a/fetch-pack.h
+++ b/fetch-pack.h
@@ -101,4 +101,9 @@ void negotiate_using_fetch(const struct oid_array *negotiation_tips,
*/
int report_unmatched_refs(struct ref **sought, int nr_sought);
+/*
+ * Return true if checks for broken objects in received pack are required.
+ */
+int fetch_pack_fsck_objects(void);
+
#endif
--
gitgitgadget
^ permalink raw reply related [flat|nested] 66+ messages in thread
* [PATCH v4 4/4] unbundle: introduce option VERIFY_BUNDLE_FSCK_FOLLOW_FETCH
2024-05-30 8:21 ` [PATCH v4 0/4] object checking related additions and fixes for bundles in fetches blanet via GitGitGadget
` (2 preceding siblings ...)
2024-05-30 8:21 ` [PATCH v4 3/4] fetch-pack: expose fsckObjects configuration logic Xing Xin via GitGitGadget
@ 2024-05-30 8:21 ` Xing Xin via GitGitGadget
2024-06-06 12:06 ` Patrick Steinhardt
2024-06-11 6:42 ` [PATCH v5 0/4] object checking related additions and fixes for bundles in fetches blanet via GitGitGadget
4 siblings, 1 reply; 66+ messages in thread
From: Xing Xin via GitGitGadget @ 2024-05-30 8:21 UTC (permalink / raw)
To: git; +Cc: Patrick Steinhardt, Karthik Nayak, blanet, Xing Xin
From: Xing Xin <xingxin.xx@bytedance.com>
This commit introduces a new option `VERIFY_BUNDLE_FSCK_FOLLOW_FETCH` to
`verify_bundle_flags`. In `bundle.c:unbundle`, this new option controls
whether broken object checks should be enabled by invoking
`fetch-pack.c:fetch_pack_fsck_objects`. Note that the option
`VERIFY_BUNDLE_FSCK_ALWAYS` takes precedence over
`VERIFY_BUNDLE_FSCK_FOLLOW_FETCH`.
This flag is now used in the fetching process by:
- `transport.c:fetch_refs_from_bundle` for direct bundle fetches.
- `bundle-uri.c:unbundle_from_file` for bundle-uri enabled fetches.
This addition ensures a consistent logic for object verification during
fetch operations. Tests have been added to confirm functionality in the
scenarios mentioned above.
Signed-off-by: Xing Xin <xingxin.xx@bytedance.com>
---
bundle-uri.c | 2 +-
bundle.c | 7 +++++++
bundle.h | 1 +
t/t5558-clone-bundle-uri.sh | 35 ++++++++++++++++++++++++++++++++++-
t/t5607-clone-bundle.sh | 33 +++++++++++++++++++++++++++++++++
transport.c | 2 +-
6 files changed, 77 insertions(+), 3 deletions(-)
diff --git a/bundle-uri.c b/bundle-uri.c
index 066ff788104..e7ebac6ce57 100644
--- a/bundle-uri.c
+++ b/bundle-uri.c
@@ -373,7 +373,7 @@ static int unbundle_from_file(struct repository *r, const char *file)
* the prerequisite commits.
*/
if ((result = unbundle(r, &header, bundle_fd, NULL,
- VERIFY_BUNDLE_QUIET | VERIFY_BUNDLE_FSCK_ALWAYS)))
+ VERIFY_BUNDLE_QUIET | VERIFY_BUNDLE_FSCK_FOLLOW_FETCH)))
return 1;
/*
diff --git a/bundle.c b/bundle.c
index 26574e74bdd..53ac73834ea 100644
--- a/bundle.c
+++ b/bundle.c
@@ -17,6 +17,7 @@
#include "list-objects-filter-options.h"
#include "connected.h"
#include "write-or-die.h"
+#include "fetch-pack.h"
static const char v2_bundle_signature[] = "# v2 git bundle\n";
static const char v3_bundle_signature[] = "# v3 git bundle\n";
@@ -615,6 +616,7 @@ int unbundle(struct repository *r, struct bundle_header *header,
enum verify_bundle_flags flags)
{
struct child_process ip = CHILD_PROCESS_INIT;
+ int fsck_objects = 0;
if (verify_bundle(r, header, flags))
return -1;
@@ -626,6 +628,11 @@ int unbundle(struct repository *r, struct bundle_header *header,
strvec_push(&ip.args, "--promisor=from-bundle");
if (flags & VERIFY_BUNDLE_FSCK_ALWAYS)
+ fsck_objects = 1;
+ else if (flags & VERIFY_BUNDLE_FSCK_FOLLOW_FETCH)
+ fsck_objects = fetch_pack_fsck_objects();
+
+ if (fsck_objects)
strvec_push(&ip.args, "--fsck-objects");
if (extra_index_pack_args) {
diff --git a/bundle.h b/bundle.h
index cf23c8615d3..a39d8ea1a7e 100644
--- a/bundle.h
+++ b/bundle.h
@@ -34,6 +34,7 @@ enum verify_bundle_flags {
VERIFY_BUNDLE_VERBOSE = (1 << 0),
VERIFY_BUNDLE_QUIET = (1 << 1),
VERIFY_BUNDLE_FSCK_ALWAYS = (1 << 2),
+ VERIFY_BUNDLE_FSCK_FOLLOW_FETCH = (1 << 3),
};
int verify_bundle(struct repository *r, struct bundle_header *header,
diff --git a/t/t5558-clone-bundle-uri.sh b/t/t5558-clone-bundle-uri.sh
index 8f4f802e4f1..48be1b18802 100755
--- a/t/t5558-clone-bundle-uri.sh
+++ b/t/t5558-clone-bundle-uri.sh
@@ -30,7 +30,21 @@ test_expect_success 'create bundle' '
git bundle create B.bundle topic &&
# Create a bundle with reference pointing to non-existent object.
- sed "s/$(git rev-parse A)/$(git rev-parse B)/" <A.bundle >bad-header.bundle
+ sed "s/$(git rev-parse A)/$(git rev-parse B)/" <A.bundle >bad-header.bundle &&
+
+ cat >data <<-EOF &&
+ tree $(git rev-parse HEAD^{tree})
+ parent $(git rev-parse HEAD)
+ author A U Thor
+ committer A U Thor
+
+ commit: this is a commit with bad emails
+
+ EOF
+ git hash-object --literally -t commit -w --stdin <data >commit &&
+ git branch bad $(cat commit) &&
+ git bundle create bad-object.bundle bad &&
+ git update-ref -d refs/heads/bad
)
'
@@ -52,6 +66,25 @@ test_expect_success 'clone with bundle that has bad header' '
! grep "refs/bundles/" refs
'
+test_expect_success 'clone with bundle that has bad object' '
+ # Unbundle succeeds if no fsckObjects confugured.
+ git clone --bundle-uri="clone-from/bad-object.bundle" \
+ clone-from clone-bad-object-no-fsck &&
+ git -C clone-bad-object-no-fsck for-each-ref --format="%(refname)" >refs &&
+ grep "refs/bundles/" refs >actual &&
+ cat >expect <<-\EOF &&
+ refs/bundles/bad
+ EOF
+ test_cmp expect actual &&
+
+ # Unbundle fails with fsckObjects set true, but clone can still proceed.
+ git -c fetch.fsckObjects=true clone --bundle-uri="clone-from/bad-object.bundle" \
+ clone-from clone-bad-object-fsck 2>err &&
+ test_grep "missingEmail" err &&
+ git -C clone-bad-object-fsck for-each-ref --format="%(refname)" >refs &&
+ ! grep "refs/bundles/" refs
+'
+
test_expect_success 'clone with path bundle and non-default hash' '
test_when_finished "rm -rf clone-path-non-default-hash" &&
GIT_DEFAULT_HASH=sha256 git clone --bundle-uri="clone-from/B.bundle" \
diff --git a/t/t5607-clone-bundle.sh b/t/t5607-clone-bundle.sh
index 0d1e92d9963..5182efc0b45 100755
--- a/t/t5607-clone-bundle.sh
+++ b/t/t5607-clone-bundle.sh
@@ -138,6 +138,39 @@ test_expect_success 'fetch SHA-1 from bundle' '
git fetch --no-tags foo/tip.bundle "$(cat hash)"
'
+test_expect_success 'clone bundle with different fsckObjects configurations' '
+ test_create_repo bundle-fsck &&
+ (
+ cd bundle-fsck &&
+ test_commit first &&
+ cat >data <<-EOF &&
+ tree $(git rev-parse HEAD^{tree})
+ parent $(git rev-parse HEAD)
+ author A U Thor
+ committer A U Thor
+
+ commit: this is a commit with bad emails
+
+ EOF
+ git hash-object --literally -t commit -w --stdin <data >commit &&
+ git branch bad $(cat commit) &&
+ git bundle create bad.bundle bad
+ ) &&
+
+ git clone bundle-fsck/bad.bundle bundle-no-fsck &&
+
+ git -c fetch.fsckObjects=false -c transfer.fsckObjects=true \
+ clone bundle-fsck/bad.bundle bundle-fetch-no-fsck &&
+
+ test_must_fail git -c fetch.fsckObjects=true \
+ clone bundle-fsck/bad.bundle bundle-fetch-fsck 2>err &&
+ test_grep "missingEmail" err &&
+
+ test_must_fail git -c transfer.fsckObjects=true \
+ clone bundle-fsck/bad.bundle bundle-transfer-fsck 2>err &&
+ test_grep "missingEmail" err
+'
+
test_expect_success 'git bundle uses expected default format' '
git bundle create bundle HEAD^.. &&
cat >expect <<-EOF &&
diff --git a/transport.c b/transport.c
index 1b3d61ffcec..6cd5683bb45 100644
--- a/transport.c
+++ b/transport.c
@@ -184,7 +184,7 @@ static int fetch_refs_from_bundle(struct transport *transport,
if (!data->get_refs_from_bundle_called)
get_refs_from_bundle_inner(transport);
ret = unbundle(the_repository, &data->header, data->fd,
- &extra_index_pack_args, VERIFY_BUNDLE_FSCK_ALWAYS);
+ &extra_index_pack_args, VERIFY_BUNDLE_FSCK_FOLLOW_FETCH);
transport->hash_algo = data->header.hash_algo;
return ret;
}
--
gitgitgadget
^ permalink raw reply related [flat|nested] 66+ messages in thread
* Re:Re: [PATCH v3 1/4] bundle-uri: verify oid before writing refs
2024-05-28 11:55 ` Patrick Steinhardt
@ 2024-05-30 8:32 ` Xing Xin
0 siblings, 0 replies; 66+ messages in thread
From: Xing Xin @ 2024-05-30 8:32 UTC (permalink / raw)
To: Patrick Steinhardt
Cc: Xing Xin via GitGitGadget, git, Karthik Nayak, Xing Xin
At 2024-05-28 19:55:15, "Patrick Steinhardt" <ps@pks.im> wrote:
[snip]
>> which means that the `raw_object_store->packed_git` data gets filled
>> in and ``packed_git_initialized` is set. This also means consecutive
>
>s/``/`/
Copy that.
[snip]
>
>> And a set of negotiation related tests for bundle-uri are added.
>
>s/And/Add/
The "And" is correct here, though I changed the commit message in the new
version. :)
[snip]
>As far as I can see there is no test that verifies the case where the
>bundle contains refs, but misses the objects to satisfy the refs. Can we
>craft such a bundle and exercise this new failure mode?
A new test is added in [PATCH v4 1/4], which shows it correctly refuses to
target a bundle reference to a non-existent object.
Xing Xin
^ permalink raw reply [flat|nested] 66+ messages in thread
* Re:Re: Re: [PATCH v3 2/4] unbundle: introduce unbundle_fsck_flags for fsckobjects handling
2024-05-30 4:38 ` Patrick Steinhardt
@ 2024-05-30 8:46 ` Xing Xin
0 siblings, 0 replies; 66+ messages in thread
From: Xing Xin @ 2024-05-30 8:46 UTC (permalink / raw)
To: Patrick Steinhardt
Cc: Xing Xin via GitGitGadget, git, Karthik Nayak, Xing Xin
At 2024-05-30 12:38:49, "Patrick Steinhardt" <ps@pks.im> wrote:
[snip]
>> >
>> >Wouldn't this have been a natural fit for the new flag, e.g. via
>> >something like `VERIFY_BUNDLE_FSCK`?
>>
>> It makes sense to me. Currently, verify_bundle_flags controls the amount
>> of information displayed when checking a bundle's prerequisites. The
>> newly added unbundle_fsck_flags is designed to check for broken objects
>> during the unbundle process, which is essentially a form of bundle
>> verification. I believe we should extend some object verification
>> capabilities to the git bundle verify command as well, perhaps by adding
>> a --fsck-objects option.
>>
>> With this in mind, I support adding new options to verify_bundle_flags.
>> Since bundle.c:unbundle needs to combine multiple options, we must
>> define new options using bitwise shifting:
>>
>> enum verify_bundle_flags {
>> VERIFY_BUNDLE_VERBOSE = (1 << 0),
>> VERIFY_BUNDLE_QUIET = (1 << 1),
>> VERIFY_BUNDLE_FSCK_OBJECTS_ALWAYS = (1 << 2),
>> VERIFY_BUNDLE_FSCK_OBJECTS_FOLLOW_FETCH = (1 << 3),
>> };
>>
>> How about the naming? I'm not very good at naming :)
>
>I later noticed that you extend the `unbundle_fsck_flags` in a later
>patch. With that in mind I don't think it's all that important anymore
>to merge those into the `verify_bundle_flags` as you would otherwise
>allow for weirdness. What happens for example when both `ALWAYS` and
>`FOLLOW_FETCH` are set?
>
>So feel free to ignore this advice. If you still think it's a good idea
>then the above naming looks okay to me.
With the idea of extending "--fsck-objects" support for "git bundle verify" and
"git bundle unbundle", I prefer to grouping these options together. Especially
in the "git bundle verify" scenario, adding a new parameter like
`unbundle_fsck_flags` for `bundle.c:verify_bundle` is confusing.
Xing Xin
^ permalink raw reply [flat|nested] 66+ messages in thread
* Re:Re: [PATCH v3 3/4] fetch-pack: expose fsckObjects configuration logic
2024-05-28 17:24 ` Junio C Hamano
2024-05-29 5:52 ` Patrick Steinhardt
@ 2024-05-30 8:48 ` Xing Xin
1 sibling, 0 replies; 66+ messages in thread
From: Xing Xin @ 2024-05-30 8:48 UTC (permalink / raw)
To: Junio C Hamano
Cc: Patrick Steinhardt, Xing Xin via GitGitGadget, git, Karthik Nayak,
Xing Xin
At 2024-05-29 01:24:35, "Junio C Hamano" <gitster@pobox.com> wrote:
[snip]
>The original before it was made into a helper function was written
>as a cascade of ?: operators, because it had to be a single
>expression. As the body of a helper function, we now can sprinkle
>multiple return statements in it. I think the way that is easiest
>to understand is
>
> /* the most specific, if specified */
> if (fetch_fsck_objects >= 0)
> return fetch_fsck_objects;
> /* the less specific, catch-all for both directions */
> if (transfer_fsck_objects >= 0)
> return transfer_fsck_objects;
> /* the fallback hardcoded default */
> return 0;
>
>without the /* comments */.
Applied, thanks!
Xing Xin
^ permalink raw reply [flat|nested] 66+ messages in thread
* Re:Re: [PATCH v3 4/4] unbundle: introduce new option UNBUNDLE_FSCK_FOLLOW_FETCH
2024-05-28 12:05 ` Patrick Steinhardt
@ 2024-05-30 8:54 ` Xing Xin
0 siblings, 0 replies; 66+ messages in thread
From: Xing Xin @ 2024-05-30 8:54 UTC (permalink / raw)
To: Patrick Steinhardt
Cc: Xing Xin via GitGitGadget, git, Karthik Nayak, Xing Xin
At 2024-05-28 20:05:48, "Patrick Steinhardt" <ps@pks.im> wrote:
[snip]
>>
>> +test_expect_success 'clone bundle with fetch.fsckObjects' '
>> + test_create_repo bundle-fsck &&
>> + (
>> + cd bundle-fsck &&
>> + test_commit first &&
>> + cat >data <<-EOF &&
>> + tree $(git rev-parse HEAD^{tree})
>> + parent $(git rev-parse HEAD)
>> + author A U Thor
>> + committer A U Thor
>> +
>> + commit: this is a commit with bad emails
>> +
>> + EOF
>> + git hash-object --literally -t commit -w --stdin <data >commit &&
>> + git branch bad $(cat commit) &&
>> + git bundle create bad.bundle bad
>> + ) &&
>> + test_must_fail git -c fetch.fsckObjects=true \
>> + clone bundle-fsck/bad.bundle bundle-fsck-clone 2>err &&
>> + test_grep "missingEmail" err
>> +'
>
>Do we also want to have a test for `transfer.fsckObjects`?
Sure, some more combinations of "fetch.fsckObjects" and "transfer.fsckObjects"
are added in tests.
Xing Xin
^ permalink raw reply [flat|nested] 66+ messages in thread
* Re: [PATCH v4 2/4] unbundle: extend verify_bundle_flags to support fsck-objects
2024-05-30 8:21 ` [PATCH v4 2/4] unbundle: extend verify_bundle_flags to support fsck-objects Xing Xin via GitGitGadget
@ 2024-06-06 12:06 ` Patrick Steinhardt
2024-06-11 6:46 ` Xing Xin
0 siblings, 1 reply; 66+ messages in thread
From: Patrick Steinhardt @ 2024-06-06 12:06 UTC (permalink / raw)
To: Xing Xin via GitGitGadget; +Cc: git, Karthik Nayak, blanet, Xing Xin
[-- Attachment #1: Type: text/plain, Size: 318 bytes --]
On Thu, May 30, 2024 at 08:21:28AM +0000, Xing Xin via GitGitGadget wrote:
> From: Xing Xin <xingxin.xx@bytedance.com>
Tiny nit: the important change in this commit is not that you wire up
the flag, but rather that we start to execute git-fsck(1) now. I'd thus
propose to adapt the commit title accordingly.
Patrick
[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 833 bytes --]
^ permalink raw reply [flat|nested] 66+ messages in thread
* Re: [PATCH v4 4/4] unbundle: introduce option VERIFY_BUNDLE_FSCK_FOLLOW_FETCH
2024-05-30 8:21 ` [PATCH v4 4/4] unbundle: introduce option VERIFY_BUNDLE_FSCK_FOLLOW_FETCH Xing Xin via GitGitGadget
@ 2024-06-06 12:06 ` Patrick Steinhardt
2024-06-11 6:46 ` Xing Xin
0 siblings, 1 reply; 66+ messages in thread
From: Patrick Steinhardt @ 2024-06-06 12:06 UTC (permalink / raw)
To: Xing Xin via GitGitGadget; +Cc: git, Karthik Nayak, blanet, Xing Xin
[-- Attachment #1: Type: text/plain, Size: 1112 bytes --]
On Thu, May 30, 2024 at 08:21:30AM +0000, Xing Xin via GitGitGadget wrote:
> From: Xing Xin <xingxin.xx@bytedance.com>
Same here, the important part is not that we introduce the flag, but
that we start using it in `unbundle_from_file()`.
> diff --git a/bundle-uri.c b/bundle-uri.c
> index 066ff788104..e7ebac6ce57 100644
> --- a/bundle-uri.c
> +++ b/bundle-uri.c
> @@ -373,7 +373,7 @@ static int unbundle_from_file(struct repository *r, const char *file)
> * the prerequisite commits.
> */
> if ((result = unbundle(r, &header, bundle_fd, NULL,
> - VERIFY_BUNDLE_QUIET | VERIFY_BUNDLE_FSCK_ALWAYS)))
> + VERIFY_BUNDLE_QUIET | VERIFY_BUNDLE_FSCK_FOLLOW_FETCH)))
> return 1;
>
> /*
One thing that is a bit weird is that we first change `unbundle()` to
use `FSCK_ALWAYS` in a preceding patch, and then convert it to use
`FSCK_FOLLOW_FETCH` in the same series. It could be restructured a bit
to first introduce the flags, only, while not modifying any of the
callsites yet. Passing the respective flags would then be done in a
separate commit.
Patrick
[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 833 bytes --]
^ permalink raw reply [flat|nested] 66+ messages in thread
* [PATCH v5 0/4] object checking related additions and fixes for bundles in fetches
2024-05-30 8:21 ` [PATCH v4 0/4] object checking related additions and fixes for bundles in fetches blanet via GitGitGadget
` (3 preceding siblings ...)
2024-05-30 8:21 ` [PATCH v4 4/4] unbundle: introduce option VERIFY_BUNDLE_FSCK_FOLLOW_FETCH Xing Xin via GitGitGadget
@ 2024-06-11 6:42 ` blanet via GitGitGadget
2024-06-11 6:42 ` [PATCH v5 1/4] bundle-uri: verify oid before writing refs Xing Xin via GitGitGadget
` (4 more replies)
4 siblings, 5 replies; 66+ messages in thread
From: blanet via GitGitGadget @ 2024-06-11 6:42 UTC (permalink / raw)
To: git; +Cc: Patrick Steinhardt, Karthik Nayak, blanet
While attempting to fix a reference negotiation bug in bundle-uri, we
identified that the fetch process lacks some crucial object validation
checks when processing bundles. The primary issues are:
1. In the bundle-uri scenario, object IDs were not validated before writing
bundle references. This was the root cause of the original negotiation
bug in bundle-uri and could lead to potential repository corruption.
2. The existing "fetch.fsckObjects" and "transfer.fsckObjects"
configurations were not applied when directly fetching bundles or
fetching with bundle-uri enabled. In fact, there were no object
validation supports for unbundle.
The first patch addresses the bundle-uri negotiation issue by removing the
REF_SKIP_OID_VERIFICATION flag when writing bundle references.
Patches 2 through 4 extend verify_bundle_flags for bundle.c:unbundle to add
support for object validation (fsck) in different scenarios, mainly
following the suggestions from Junio on the mailing list.
Xing Xin (4):
bundle-uri: verify oid before writing refs
fetch-pack: expose fsckObjects configuration logic
unbundle: extend options to support object verification
unbundle: use VERIFY_BUNDLE_FSCK_FOLLOW_FETCH for fetches
bundle-uri.c | 5 +-
bundle.c | 10 ++
bundle.h | 2 +
fetch-pack.c | 17 ++--
fetch-pack.h | 5 +
t/t5558-clone-bundle-uri.sh | 186 +++++++++++++++++++++++++++++++++++-
t/t5607-clone-bundle.sh | 33 +++++++
transport.c | 2 +-
8 files changed, 246 insertions(+), 14 deletions(-)
base-commit: b9cfe4845cb2562584837bc0101c0ab76490a239
Published-As: https://github.com/gitgitgadget/git/releases/tag/pr-1730%2Fblanet%2Fxx%2Fbundle-uri-bug-using-bundle-list-v5
Fetch-It-Via: git fetch https://github.com/gitgitgadget/git pr-1730/blanet/xx/bundle-uri-bug-using-bundle-list-v5
Pull-Request: https://github.com/gitgitgadget/git/pull/1730
Range-diff vs v4:
1: e958a3ab20c = 1: e958a3ab20c bundle-uri: verify oid before writing refs
3: 5ddc894c2c1 = 2: d21c236b8de fetch-pack: expose fsckObjects configuration logic
2: beb70735811 ! 3: 0a18d7839be unbundle: extend verify_bundle_flags to support fsck-objects
@@ Metadata
Author: Xing Xin <xingxin.xx@bytedance.com>
## Commit message ##
- unbundle: extend verify_bundle_flags to support fsck-objects
+ unbundle: extend options to support object verification
- This commit extends `verify_bundle_flags` by adding a new option
- `VERIFY_BUNDLE_FSCK_ALWAYS`, which enables checks for broken objects in
- `bundle.c:unbundle`. This option is now used as the default for fetches
- involving bundles, specifically by `transport.c:fetch_refs_from_bundle`
- for direct bundle fetches and by `bundle-uri.c:unbundle_from_file` for
- _bundle-uri_ enabled fetches.
+ This commit extends object verification support in `bundle.c:unbundle`
+ by adding two new options to `verify_bundle_flags`:
- Upcoming commits will introduce another option as a replacement that
- fits better with fetch operations. `VERIFY_BUNDLE_FSCK_ALWAYS` will be
- further used to add "--fsck-objects" support for "git bundle unbundle"
- and "git bundle verify".
+ - `VERIFY_BUNDLE_FSCK_ALWAYS` explicitly enables checks for broken
+ objects. It will be used to add "--fsck-objects" support for "git
+ bundle unbundle" in a separate series.
+ - `VERIFY_BUNDLE_FSCK_FOLLOW_FETCH` is designed to be used during fetch
+ operations, specifically for direct bundle fetches and _bundle-uri_
+ enabled fetches. When enabled, `bundle.c:unbundle` invokes
+ `fetch-pack.c:fetch_pack_fsck_objects` to determine whether to enable
+ checks for broken objects. Passing this flag during fetching will be
+ implemented in a subsequent commit.
+
+ Note that the option `VERIFY_BUNDLE_FSCK_ALWAYS` takes precedence over
+ `VERIFY_BUNDLE_FSCK_FOLLOW_FETCH`.
Reviewed-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Xing Xin <xingxin.xx@bytedance.com>
- ## bundle-uri.c ##
-@@ bundle-uri.c: static int unbundle_from_file(struct repository *r, const char *file)
- * the prerequisite commits.
- */
- if ((result = unbundle(r, &header, bundle_fd, NULL,
-- VERIFY_BUNDLE_QUIET)))
-+ VERIFY_BUNDLE_QUIET | VERIFY_BUNDLE_FSCK_ALWAYS)))
- return 1;
-
- /*
-
## bundle.c ##
+@@
+ #include "list-objects-filter-options.h"
+ #include "connected.h"
+ #include "write-or-die.h"
++#include "fetch-pack.h"
+
+ static const char v2_bundle_signature[] = "# v2 git bundle\n";
+ static const char v3_bundle_signature[] = "# v3 git bundle\n";
+@@ bundle.c: int unbundle(struct repository *r, struct bundle_header *header,
+ enum verify_bundle_flags flags)
+ {
+ struct child_process ip = CHILD_PROCESS_INIT;
++ int fsck_objects = 0;
+
+ if (verify_bundle(r, header, flags))
+ return -1;
@@ bundle.c: int unbundle(struct repository *r, struct bundle_header *header,
if (header->filter.choice)
strvec_push(&ip.args, "--promisor=from-bundle");
+ if (flags & VERIFY_BUNDLE_FSCK_ALWAYS)
++ fsck_objects = 1;
++ else if (flags & VERIFY_BUNDLE_FSCK_FOLLOW_FETCH)
++ fsck_objects = fetch_pack_fsck_objects();
++
++ if (fsck_objects)
+ strvec_push(&ip.args, "--fsck-objects");
+
if (extra_index_pack_args) {
@@ bundle.h: int create_bundle(struct repository *r, const char *path,
VERIFY_BUNDLE_VERBOSE = (1 << 0),
VERIFY_BUNDLE_QUIET = (1 << 1),
+ VERIFY_BUNDLE_FSCK_ALWAYS = (1 << 2),
++ VERIFY_BUNDLE_FSCK_FOLLOW_FETCH = (1 << 3),
};
int verify_bundle(struct repository *r, struct bundle_header *header,
-
- ## transport.c ##
-@@ transport.c: static int fetch_refs_from_bundle(struct transport *transport,
- if (!data->get_refs_from_bundle_called)
- get_refs_from_bundle_inner(transport);
- ret = unbundle(the_repository, &data->header, data->fd,
-- &extra_index_pack_args, 0);
-+ &extra_index_pack_args, VERIFY_BUNDLE_FSCK_ALWAYS);
- transport->hash_algo = data->header.hash_algo;
- return ret;
- }
4: 68b9bca9f8b ! 4: eb9f21f16b5 unbundle: introduce option VERIFY_BUNDLE_FSCK_FOLLOW_FETCH
@@ Metadata
Author: Xing Xin <xingxin.xx@bytedance.com>
## Commit message ##
- unbundle: introduce option VERIFY_BUNDLE_FSCK_FOLLOW_FETCH
+ unbundle: use VERIFY_BUNDLE_FSCK_FOLLOW_FETCH for fetches
- This commit introduces a new option `VERIFY_BUNDLE_FSCK_FOLLOW_FETCH` to
- `verify_bundle_flags`. In `bundle.c:unbundle`, this new option controls
- whether broken object checks should be enabled by invoking
- `fetch-pack.c:fetch_pack_fsck_objects`. Note that the option
- `VERIFY_BUNDLE_FSCK_ALWAYS` takes precedence over
- `VERIFY_BUNDLE_FSCK_FOLLOW_FETCH`.
-
- This flag is now used in the fetching process by:
+ This commit passes `VERIFY_BUNDLE_FSCK_FOLLOW_FETCH` to `unbundle` in
+ the fetching process, including:
- `transport.c:fetch_refs_from_bundle` for direct bundle fetches.
- `bundle-uri.c:unbundle_from_file` for bundle-uri enabled fetches.
@@ Commit message
fetch operations. Tests have been added to confirm functionality in the
scenarios mentioned above.
+ Reviewed-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Xing Xin <xingxin.xx@bytedance.com>
## bundle-uri.c ##
@@ bundle-uri.c: static int unbundle_from_file(struct repository *r, const char *fi
* the prerequisite commits.
*/
if ((result = unbundle(r, &header, bundle_fd, NULL,
-- VERIFY_BUNDLE_QUIET | VERIFY_BUNDLE_FSCK_ALWAYS)))
+- VERIFY_BUNDLE_QUIET)))
+ VERIFY_BUNDLE_QUIET | VERIFY_BUNDLE_FSCK_FOLLOW_FETCH)))
return 1;
/*
- ## bundle.c ##
-@@
- #include "list-objects-filter-options.h"
- #include "connected.h"
- #include "write-or-die.h"
-+#include "fetch-pack.h"
-
- static const char v2_bundle_signature[] = "# v2 git bundle\n";
- static const char v3_bundle_signature[] = "# v3 git bundle\n";
-@@ bundle.c: int unbundle(struct repository *r, struct bundle_header *header,
- enum verify_bundle_flags flags)
- {
- struct child_process ip = CHILD_PROCESS_INIT;
-+ int fsck_objects = 0;
-
- if (verify_bundle(r, header, flags))
- return -1;
-@@ bundle.c: int unbundle(struct repository *r, struct bundle_header *header,
- strvec_push(&ip.args, "--promisor=from-bundle");
-
- if (flags & VERIFY_BUNDLE_FSCK_ALWAYS)
-+ fsck_objects = 1;
-+ else if (flags & VERIFY_BUNDLE_FSCK_FOLLOW_FETCH)
-+ fsck_objects = fetch_pack_fsck_objects();
-+
-+ if (fsck_objects)
- strvec_push(&ip.args, "--fsck-objects");
-
- if (extra_index_pack_args) {
-
- ## bundle.h ##
-@@ bundle.h: enum verify_bundle_flags {
- VERIFY_BUNDLE_VERBOSE = (1 << 0),
- VERIFY_BUNDLE_QUIET = (1 << 1),
- VERIFY_BUNDLE_FSCK_ALWAYS = (1 << 2),
-+ VERIFY_BUNDLE_FSCK_FOLLOW_FETCH = (1 << 3),
- };
-
- int verify_bundle(struct repository *r, struct bundle_header *header,
-
## t/t5558-clone-bundle-uri.sh ##
@@ t/t5558-clone-bundle-uri.sh: test_expect_success 'create bundle' '
git bundle create B.bundle topic &&
@@ transport.c: static int fetch_refs_from_bundle(struct transport *transport,
if (!data->get_refs_from_bundle_called)
get_refs_from_bundle_inner(transport);
ret = unbundle(the_repository, &data->header, data->fd,
-- &extra_index_pack_args, VERIFY_BUNDLE_FSCK_ALWAYS);
+- &extra_index_pack_args, 0);
+ &extra_index_pack_args, VERIFY_BUNDLE_FSCK_FOLLOW_FETCH);
transport->hash_algo = data->header.hash_algo;
return ret;
--
gitgitgadget
^ permalink raw reply [flat|nested] 66+ messages in thread
* [PATCH v5 1/4] bundle-uri: verify oid before writing refs
2024-06-11 6:42 ` [PATCH v5 0/4] object checking related additions and fixes for bundles in fetches blanet via GitGitGadget
@ 2024-06-11 6:42 ` Xing Xin via GitGitGadget
2024-06-11 6:42 ` [PATCH v5 2/4] fetch-pack: expose fsckObjects configuration logic Xing Xin via GitGitGadget
` (3 subsequent siblings)
4 siblings, 0 replies; 66+ messages in thread
From: Xing Xin via GitGitGadget @ 2024-06-11 6:42 UTC (permalink / raw)
To: git; +Cc: Patrick Steinhardt, Karthik Nayak, blanet, Xing Xin
From: Xing Xin <xingxin.xx@bytedance.com>
When using the bundle-uri mechanism with a bundle list containing
multiple interrelated bundles, we encountered a bug where tips from
downloaded bundles were not discovered, thus resulting in rather slow
clones. This was particularly problematic when employing the
"creationTokens" heuristic.
To reproduce this issue, consider a repository with a single branch
"main" pointing to commit "A". Firstly, create a base bundle with:
git bundle create base.bundle main
Then, add a new commit "B" on top of "A", and create an incremental
bundle for "main":
git bundle create incr.bundle A..main
Now, generate a bundle list with the following content:
[bundle]
version = 1
mode = all
heuristic = creationToken
[bundle "base"]
uri = base.bundle
creationToken = 1
[bundle "incr"]
uri = incr.bundle
creationToken = 2
A fresh clone with the bundle list above should result in a reference
"refs/bundles/main" pointing to "B" in the new repository. However, git
would still download everything from the server, as if it had fetched
nothing locally.
So why the "refs/bundles/main" is not discovered? After some digging I
found that:
1. Bundles in bundle list are downloaded to local files via
`bundle-uri.c:download_bundle_list` or via
`bundle-uri.c:fetch_bundles_by_token` for the "creationToken"
heuristic.
2. Each bundle is unbundled via `bundle-uri.c:unbundle_from_file`, which
is called by `bundle-uri.c:unbundle_all_bundles` or called within
`bundle-uri.c:fetch_bundles_by_token` for the "creationToken"
heuristic.
3. To get all prerequisites of the bundle, the bundle header is read
inside `bundle-uri.c:unbundle_from_file` to by calling
`bundle.c:read_bundle_header`.
4. Then it calls `bundle.c:unbundle`, which calls
`bundle.c:verify_bundle` to ensure the repository contains all the
prerequisites.
5. `bundle.c:verify_bundle` calls `parse_object`, which eventually
invokes `packfile.c:prepare_packed_git` or
`packfile.c:reprepare_packed_git`, filling
`raw_object_store->packed_git` and setting `packed_git_initialized`.
6. If `bundle.c:unbundle` succeeds, it writes refs via
`refs.c:refs_update_ref` with `REF_SKIP_OID_VERIFICATION` set. Here
bundle refs which can target arbitrary objects are written to the
repository.
7. Finally, in `fetch-pack.c:do_fetch_pack_v2`, the functions
`fetch-pack.c:mark_complete_and_common_ref` and
`fetch-pack.c:mark_tips` are called with `OBJECT_INFO_QUICK` set to
find local tips for negotiation. The `OBJECT_INFO_QUICK` flag
prevents `packfile.c:reprepare_packed_git` from being called,
resulting in failures to parse OIDs that reside only in the latest
bundle.
In the example above, when unbunding "incr.bundle", "base.pack" is added
to `packed_git` due to prerequisites verification. However, "B" cannot
be found for negotiation because it exists in "incr.pack", which is not
included in `packed_git`.
This commit fixes the bug by removing `REF_SKIP_OID_VERIFICATION` flag
when writing bundle refs. When `refs.c:refs_update_ref` is called to to
write the corresponding bundle refs, it triggers
`refs.c:ref_transaction_commit`. This, in turn, invokes
`refs.c:ref_transaction_prepare`, which calls `transaction_prepare` of
the refs storage backend. For files backend, this function is
`files-backend.c:files_transaction_prepare`, and for reftable backend,
it is `reftable-backend.c:reftable_be_transaction_prepare`. Both
functions eventually call `object.c:parse_object`, which can invoke
`packfile.c:reprepare_packed_git` to refresh `packed_git`. This ensures
that bundle refs point to valid objects and that all tips from bundle
refs are correctly parsed during subsequent negotiations.
A test has been added to demonstrate that bundles with incorrect
headers, where refs point to non-existent objects, do not result in any
bundle refs being created in the repository. Additionally, a set of
negotiation-related tests for fetching with bundle-uri has been
included.
Reviewed-by: Karthik Nayak <karthik.188@gmail.com>
Reviewed-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Xing Xin <xingxin.xx@bytedance.com>
---
bundle-uri.c | 3 +-
t/t5558-clone-bundle-uri.sh | 153 +++++++++++++++++++++++++++++++++++-
2 files changed, 150 insertions(+), 6 deletions(-)
diff --git a/bundle-uri.c b/bundle-uri.c
index 91b3319a5c1..65666a11d9c 100644
--- a/bundle-uri.c
+++ b/bundle-uri.c
@@ -400,8 +400,7 @@ static int unbundle_from_file(struct repository *r, const char *file)
refs_update_ref(get_main_ref_store(the_repository),
"fetched bundle", bundle_ref.buf, oid,
has_old ? &old_oid : NULL,
- REF_SKIP_OID_VERIFICATION,
- UPDATE_REFS_MSG_ON_ERR);
+ 0, UPDATE_REFS_MSG_ON_ERR);
}
bundle_header_release(&header);
diff --git a/t/t5558-clone-bundle-uri.sh b/t/t5558-clone-bundle-uri.sh
index 1ca5f745e73..8f4f802e4f1 100755
--- a/t/t5558-clone-bundle-uri.sh
+++ b/t/t5558-clone-bundle-uri.sh
@@ -19,10 +19,19 @@ test_expect_success 'fail to clone from non-bundle file' '
test_expect_success 'create bundle' '
git init clone-from &&
- git -C clone-from checkout -b topic &&
- test_commit -C clone-from A &&
- test_commit -C clone-from B &&
- git -C clone-from bundle create B.bundle topic
+ (
+ cd clone-from &&
+ git checkout -b topic &&
+
+ test_commit A &&
+ git bundle create A.bundle topic &&
+
+ test_commit B &&
+ git bundle create B.bundle topic &&
+
+ # Create a bundle with reference pointing to non-existent object.
+ sed "s/$(git rev-parse A)/$(git rev-parse B)/" <A.bundle >bad-header.bundle
+ )
'
test_expect_success 'clone with path bundle' '
@@ -33,6 +42,16 @@ test_expect_success 'clone with path bundle' '
test_cmp expect actual
'
+test_expect_success 'clone with bundle that has bad header' '
+ git clone --bundle-uri="clone-from/bad-header.bundle" \
+ clone-from clone-bad-header 2>err &&
+ # Write bundle ref fails, but clone can still proceed.
+ commit_b=$(git -C clone-from rev-parse B) &&
+ test_grep "trying to write ref '\''refs/bundles/topic'\'' with nonexistent object $commit_b" err &&
+ git -C clone-bad-header for-each-ref --format="%(refname)" >refs &&
+ ! grep "refs/bundles/" refs
+'
+
test_expect_success 'clone with path bundle and non-default hash' '
test_when_finished "rm -rf clone-path-non-default-hash" &&
GIT_DEFAULT_HASH=sha256 git clone --bundle-uri="clone-from/B.bundle" \
@@ -259,6 +278,132 @@ test_expect_success 'clone bundle list (file, any mode, all failures)' '
! grep "refs/bundles/" refs
'
+#########################################################################
+# Clone negotiation related tests begin here
+
+test_expect_success 'negotiation: bundle with part of wanted commits' '
+ test_when_finished rm -rf trace*.txt &&
+ GIT_TRACE_PACKET="$(pwd)/trace-packet.txt" \
+ git clone --no-local --bundle-uri="clone-from/A.bundle" \
+ clone-from nego-bundle-part &&
+ git -C nego-bundle-part for-each-ref --format="%(refname)" >refs &&
+ grep "refs/bundles/" refs >actual &&
+ cat >expect <<-\EOF &&
+ refs/bundles/topic
+ EOF
+ test_cmp expect actual &&
+ # Ensure that refs/bundles/topic are sent as "have".
+ grep "clone> have $(git -C clone-from rev-parse A)" trace-packet.txt
+'
+
+test_expect_success 'negotiation: bundle with all wanted commits' '
+ test_when_finished rm -rf trace*.txt &&
+ GIT_TRACE_PACKET="$(pwd)/trace-packet.txt" \
+ git clone --no-local --single-branch --branch=topic --no-tags \
+ --bundle-uri="clone-from/B.bundle" \
+ clone-from nego-bundle-all &&
+ git -C nego-bundle-all for-each-ref --format="%(refname)" >refs &&
+ grep "refs/bundles/" refs >actual &&
+ cat >expect <<-\EOF &&
+ refs/bundles/topic
+ EOF
+ test_cmp expect actual &&
+ # We already have all needed commits so no "want" needed.
+ ! grep "clone> want " trace-packet.txt
+'
+
+test_expect_success 'negotiation: bundle list (no heuristic)' '
+ test_when_finished rm -f trace*.txt &&
+ cat >bundle-list <<-EOF &&
+ [bundle]
+ version = 1
+ mode = all
+
+ [bundle "bundle-1"]
+ uri = file://$(pwd)/clone-from/bundle-1.bundle
+
+ [bundle "bundle-2"]
+ uri = file://$(pwd)/clone-from/bundle-2.bundle
+ EOF
+
+ GIT_TRACE_PACKET="$(pwd)/trace-packet.txt" \
+ git clone --no-local --bundle-uri="file://$(pwd)/bundle-list" \
+ clone-from nego-bundle-list-no-heuristic &&
+
+ git -C nego-bundle-list-no-heuristic for-each-ref --format="%(refname)" >refs &&
+ grep "refs/bundles/" refs >actual &&
+ cat >expect <<-\EOF &&
+ refs/bundles/base
+ refs/bundles/left
+ EOF
+ test_cmp expect actual &&
+ grep "clone> have $(git -C nego-bundle-list-no-heuristic rev-parse refs/bundles/left)" trace-packet.txt
+'
+
+test_expect_success 'negotiation: bundle list (creationToken)' '
+ test_when_finished rm -f trace*.txt &&
+ cat >bundle-list <<-EOF &&
+ [bundle]
+ version = 1
+ mode = all
+ heuristic = creationToken
+
+ [bundle "bundle-1"]
+ uri = file://$(pwd)/clone-from/bundle-1.bundle
+ creationToken = 1
+
+ [bundle "bundle-2"]
+ uri = file://$(pwd)/clone-from/bundle-2.bundle
+ creationToken = 2
+ EOF
+
+ GIT_TRACE_PACKET="$(pwd)/trace-packet.txt" \
+ git clone --no-local --bundle-uri="file://$(pwd)/bundle-list" \
+ clone-from nego-bundle-list-heuristic &&
+
+ git -C nego-bundle-list-heuristic for-each-ref --format="%(refname)" >refs &&
+ grep "refs/bundles/" refs >actual &&
+ cat >expect <<-\EOF &&
+ refs/bundles/base
+ refs/bundles/left
+ EOF
+ test_cmp expect actual &&
+ grep "clone> have $(git -C nego-bundle-list-heuristic rev-parse refs/bundles/left)" trace-packet.txt
+'
+
+test_expect_success 'negotiation: bundle list with all wanted commits' '
+ test_when_finished rm -f trace*.txt &&
+ cat >bundle-list <<-EOF &&
+ [bundle]
+ version = 1
+ mode = all
+ heuristic = creationToken
+
+ [bundle "bundle-1"]
+ uri = file://$(pwd)/clone-from/bundle-1.bundle
+ creationToken = 1
+
+ [bundle "bundle-2"]
+ uri = file://$(pwd)/clone-from/bundle-2.bundle
+ creationToken = 2
+ EOF
+
+ GIT_TRACE_PACKET="$(pwd)/trace-packet.txt" \
+ git clone --no-local --single-branch --branch=left --no-tags \
+ --bundle-uri="file://$(pwd)/bundle-list" \
+ clone-from nego-bundle-list-all &&
+
+ git -C nego-bundle-list-all for-each-ref --format="%(refname)" >refs &&
+ grep "refs/bundles/" refs >actual &&
+ cat >expect <<-\EOF &&
+ refs/bundles/base
+ refs/bundles/left
+ EOF
+ test_cmp expect actual &&
+ # We already have all needed commits so no "want" needed.
+ ! grep "clone> want " trace-packet.txt
+'
+
#########################################################################
# HTTP tests begin here
--
gitgitgadget
^ permalink raw reply related [flat|nested] 66+ messages in thread
* [PATCH v5 2/4] fetch-pack: expose fsckObjects configuration logic
2024-06-11 6:42 ` [PATCH v5 0/4] object checking related additions and fixes for bundles in fetches blanet via GitGitGadget
2024-06-11 6:42 ` [PATCH v5 1/4] bundle-uri: verify oid before writing refs Xing Xin via GitGitGadget
@ 2024-06-11 6:42 ` Xing Xin via GitGitGadget
2024-06-11 6:42 ` [PATCH v5 3/4] unbundle: extend options to support object verification Xing Xin via GitGitGadget
` (2 subsequent siblings)
4 siblings, 0 replies; 66+ messages in thread
From: Xing Xin via GitGitGadget @ 2024-06-11 6:42 UTC (permalink / raw)
To: git; +Cc: Patrick Steinhardt, Karthik Nayak, blanet, Xing Xin
From: Xing Xin <xingxin.xx@bytedance.com>
Currently, we can use "transfer.fsckObjects" and the more specific
"fetch.fsckObjects" to control checks for broken objects in received
packs during fetches. However, these configurations were only
acknowledged by `fetch-pack.c:get_pack` and did not take effect in
direct bundle fetches and fetches with _bundle-uri_ enabled.
This commit exposes the fetch-then-transfer configuration logic by
adding a new function `fetch_pack_fsck_objects` in fetch-pack.h. This
new function is used to replace the assignment for `fsck_objects` in
`fetch-pack.c:get_pack`. In the next commit, it will also be used by
`bundle.c:unbundle` to better fit fetching scenarios.
Helped-by: Junio C Hamano <gitster@pobox.com>
Helped-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Xing Xin <xingxin.xx@bytedance.com>
---
fetch-pack.c | 17 +++++++++++------
fetch-pack.h | 5 +++++
2 files changed, 16 insertions(+), 6 deletions(-)
diff --git a/fetch-pack.c b/fetch-pack.c
index 7d2aef21add..3acff2baf09 100644
--- a/fetch-pack.c
+++ b/fetch-pack.c
@@ -954,12 +954,7 @@ static int get_pack(struct fetch_pack_args *args,
strvec_push(&cmd.args, alternate_shallow_file);
}
- if (fetch_fsck_objects >= 0
- ? fetch_fsck_objects
- : transfer_fsck_objects >= 0
- ? transfer_fsck_objects
- : 0)
- fsck_objects = 1;
+ fsck_objects = fetch_pack_fsck_objects();
if (do_keep || args->from_promisor || index_pack_args || fsck_objects) {
if (pack_lockfiles || fsck_objects)
@@ -2046,6 +2041,16 @@ static const struct object_id *iterate_ref_map(void *cb_data)
return &ref->old_oid;
}
+int fetch_pack_fsck_objects(void)
+{
+ fetch_pack_setup();
+ if (fetch_fsck_objects >= 0)
+ return fetch_fsck_objects;
+ if (transfer_fsck_objects >= 0)
+ return transfer_fsck_objects;
+ return 0;
+}
+
struct ref *fetch_pack(struct fetch_pack_args *args,
int fd[],
const struct ref *ref,
diff --git a/fetch-pack.h b/fetch-pack.h
index 6775d265175..b5c579cdae2 100644
--- a/fetch-pack.h
+++ b/fetch-pack.h
@@ -101,4 +101,9 @@ void negotiate_using_fetch(const struct oid_array *negotiation_tips,
*/
int report_unmatched_refs(struct ref **sought, int nr_sought);
+/*
+ * Return true if checks for broken objects in received pack are required.
+ */
+int fetch_pack_fsck_objects(void);
+
#endif
--
gitgitgadget
^ permalink raw reply related [flat|nested] 66+ messages in thread
* [PATCH v5 3/4] unbundle: extend options to support object verification
2024-06-11 6:42 ` [PATCH v5 0/4] object checking related additions and fixes for bundles in fetches blanet via GitGitGadget
2024-06-11 6:42 ` [PATCH v5 1/4] bundle-uri: verify oid before writing refs Xing Xin via GitGitGadget
2024-06-11 6:42 ` [PATCH v5 2/4] fetch-pack: expose fsckObjects configuration logic Xing Xin via GitGitGadget
@ 2024-06-11 6:42 ` Xing Xin via GitGitGadget
2024-06-11 9:11 ` Patrick Steinhardt
2024-06-11 6:42 ` [PATCH v5 4/4] unbundle: use VERIFY_BUNDLE_FSCK_FOLLOW_FETCH for fetches Xing Xin via GitGitGadget
2024-06-11 12:45 ` [PATCH v6 0/3] object checking related additions and fixes for bundles in fetches blanet via GitGitGadget
4 siblings, 1 reply; 66+ messages in thread
From: Xing Xin via GitGitGadget @ 2024-06-11 6:42 UTC (permalink / raw)
To: git; +Cc: Patrick Steinhardt, Karthik Nayak, blanet, Xing Xin
From: Xing Xin <xingxin.xx@bytedance.com>
This commit extends object verification support in `bundle.c:unbundle`
by adding two new options to `verify_bundle_flags`:
- `VERIFY_BUNDLE_FSCK_ALWAYS` explicitly enables checks for broken
objects. It will be used to add "--fsck-objects" support for "git
bundle unbundle" in a separate series.
- `VERIFY_BUNDLE_FSCK_FOLLOW_FETCH` is designed to be used during fetch
operations, specifically for direct bundle fetches and _bundle-uri_
enabled fetches. When enabled, `bundle.c:unbundle` invokes
`fetch-pack.c:fetch_pack_fsck_objects` to determine whether to enable
checks for broken objects. Passing this flag during fetching will be
implemented in a subsequent commit.
Note that the option `VERIFY_BUNDLE_FSCK_ALWAYS` takes precedence over
`VERIFY_BUNDLE_FSCK_FOLLOW_FETCH`.
Reviewed-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Xing Xin <xingxin.xx@bytedance.com>
---
bundle.c | 10 ++++++++++
bundle.h | 2 ++
2 files changed, 12 insertions(+)
diff --git a/bundle.c b/bundle.c
index 95367c2d0a0..53ac73834ea 100644
--- a/bundle.c
+++ b/bundle.c
@@ -17,6 +17,7 @@
#include "list-objects-filter-options.h"
#include "connected.h"
#include "write-or-die.h"
+#include "fetch-pack.h"
static const char v2_bundle_signature[] = "# v2 git bundle\n";
static const char v3_bundle_signature[] = "# v3 git bundle\n";
@@ -615,6 +616,7 @@ int unbundle(struct repository *r, struct bundle_header *header,
enum verify_bundle_flags flags)
{
struct child_process ip = CHILD_PROCESS_INIT;
+ int fsck_objects = 0;
if (verify_bundle(r, header, flags))
return -1;
@@ -625,6 +627,14 @@ int unbundle(struct repository *r, struct bundle_header *header,
if (header->filter.choice)
strvec_push(&ip.args, "--promisor=from-bundle");
+ if (flags & VERIFY_BUNDLE_FSCK_ALWAYS)
+ fsck_objects = 1;
+ else if (flags & VERIFY_BUNDLE_FSCK_FOLLOW_FETCH)
+ fsck_objects = fetch_pack_fsck_objects();
+
+ if (fsck_objects)
+ strvec_push(&ip.args, "--fsck-objects");
+
if (extra_index_pack_args) {
strvec_pushv(&ip.args, extra_index_pack_args->v);
strvec_clear(extra_index_pack_args);
diff --git a/bundle.h b/bundle.h
index 021adbdcbb3..a39d8ea1a7e 100644
--- a/bundle.h
+++ b/bundle.h
@@ -33,6 +33,8 @@ int create_bundle(struct repository *r, const char *path,
enum verify_bundle_flags {
VERIFY_BUNDLE_VERBOSE = (1 << 0),
VERIFY_BUNDLE_QUIET = (1 << 1),
+ VERIFY_BUNDLE_FSCK_ALWAYS = (1 << 2),
+ VERIFY_BUNDLE_FSCK_FOLLOW_FETCH = (1 << 3),
};
int verify_bundle(struct repository *r, struct bundle_header *header,
--
gitgitgadget
^ permalink raw reply related [flat|nested] 66+ messages in thread
* [PATCH v5 4/4] unbundle: use VERIFY_BUNDLE_FSCK_FOLLOW_FETCH for fetches
2024-06-11 6:42 ` [PATCH v5 0/4] object checking related additions and fixes for bundles in fetches blanet via GitGitGadget
` (2 preceding siblings ...)
2024-06-11 6:42 ` [PATCH v5 3/4] unbundle: extend options to support object verification Xing Xin via GitGitGadget
@ 2024-06-11 6:42 ` Xing Xin via GitGitGadget
2024-06-11 12:45 ` [PATCH v6 0/3] object checking related additions and fixes for bundles in fetches blanet via GitGitGadget
4 siblings, 0 replies; 66+ messages in thread
From: Xing Xin via GitGitGadget @ 2024-06-11 6:42 UTC (permalink / raw)
To: git; +Cc: Patrick Steinhardt, Karthik Nayak, blanet, Xing Xin
From: Xing Xin <xingxin.xx@bytedance.com>
This commit passes `VERIFY_BUNDLE_FSCK_FOLLOW_FETCH` to `unbundle` in
the fetching process, including:
- `transport.c:fetch_refs_from_bundle` for direct bundle fetches.
- `bundle-uri.c:unbundle_from_file` for bundle-uri enabled fetches.
This addition ensures a consistent logic for object verification during
fetch operations. Tests have been added to confirm functionality in the
scenarios mentioned above.
Reviewed-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Xing Xin <xingxin.xx@bytedance.com>
---
bundle-uri.c | 2 +-
t/t5558-clone-bundle-uri.sh | 35 ++++++++++++++++++++++++++++++++++-
t/t5607-clone-bundle.sh | 33 +++++++++++++++++++++++++++++++++
transport.c | 2 +-
4 files changed, 69 insertions(+), 3 deletions(-)
diff --git a/bundle-uri.c b/bundle-uri.c
index 65666a11d9c..e7ebac6ce57 100644
--- a/bundle-uri.c
+++ b/bundle-uri.c
@@ -373,7 +373,7 @@ static int unbundle_from_file(struct repository *r, const char *file)
* the prerequisite commits.
*/
if ((result = unbundle(r, &header, bundle_fd, NULL,
- VERIFY_BUNDLE_QUIET)))
+ VERIFY_BUNDLE_QUIET | VERIFY_BUNDLE_FSCK_FOLLOW_FETCH)))
return 1;
/*
diff --git a/t/t5558-clone-bundle-uri.sh b/t/t5558-clone-bundle-uri.sh
index 8f4f802e4f1..48be1b18802 100755
--- a/t/t5558-clone-bundle-uri.sh
+++ b/t/t5558-clone-bundle-uri.sh
@@ -30,7 +30,21 @@ test_expect_success 'create bundle' '
git bundle create B.bundle topic &&
# Create a bundle with reference pointing to non-existent object.
- sed "s/$(git rev-parse A)/$(git rev-parse B)/" <A.bundle >bad-header.bundle
+ sed "s/$(git rev-parse A)/$(git rev-parse B)/" <A.bundle >bad-header.bundle &&
+
+ cat >data <<-EOF &&
+ tree $(git rev-parse HEAD^{tree})
+ parent $(git rev-parse HEAD)
+ author A U Thor
+ committer A U Thor
+
+ commit: this is a commit with bad emails
+
+ EOF
+ git hash-object --literally -t commit -w --stdin <data >commit &&
+ git branch bad $(cat commit) &&
+ git bundle create bad-object.bundle bad &&
+ git update-ref -d refs/heads/bad
)
'
@@ -52,6 +66,25 @@ test_expect_success 'clone with bundle that has bad header' '
! grep "refs/bundles/" refs
'
+test_expect_success 'clone with bundle that has bad object' '
+ # Unbundle succeeds if no fsckObjects confugured.
+ git clone --bundle-uri="clone-from/bad-object.bundle" \
+ clone-from clone-bad-object-no-fsck &&
+ git -C clone-bad-object-no-fsck for-each-ref --format="%(refname)" >refs &&
+ grep "refs/bundles/" refs >actual &&
+ cat >expect <<-\EOF &&
+ refs/bundles/bad
+ EOF
+ test_cmp expect actual &&
+
+ # Unbundle fails with fsckObjects set true, but clone can still proceed.
+ git -c fetch.fsckObjects=true clone --bundle-uri="clone-from/bad-object.bundle" \
+ clone-from clone-bad-object-fsck 2>err &&
+ test_grep "missingEmail" err &&
+ git -C clone-bad-object-fsck for-each-ref --format="%(refname)" >refs &&
+ ! grep "refs/bundles/" refs
+'
+
test_expect_success 'clone with path bundle and non-default hash' '
test_when_finished "rm -rf clone-path-non-default-hash" &&
GIT_DEFAULT_HASH=sha256 git clone --bundle-uri="clone-from/B.bundle" \
diff --git a/t/t5607-clone-bundle.sh b/t/t5607-clone-bundle.sh
index 0d1e92d9963..5182efc0b45 100755
--- a/t/t5607-clone-bundle.sh
+++ b/t/t5607-clone-bundle.sh
@@ -138,6 +138,39 @@ test_expect_success 'fetch SHA-1 from bundle' '
git fetch --no-tags foo/tip.bundle "$(cat hash)"
'
+test_expect_success 'clone bundle with different fsckObjects configurations' '
+ test_create_repo bundle-fsck &&
+ (
+ cd bundle-fsck &&
+ test_commit first &&
+ cat >data <<-EOF &&
+ tree $(git rev-parse HEAD^{tree})
+ parent $(git rev-parse HEAD)
+ author A U Thor
+ committer A U Thor
+
+ commit: this is a commit with bad emails
+
+ EOF
+ git hash-object --literally -t commit -w --stdin <data >commit &&
+ git branch bad $(cat commit) &&
+ git bundle create bad.bundle bad
+ ) &&
+
+ git clone bundle-fsck/bad.bundle bundle-no-fsck &&
+
+ git -c fetch.fsckObjects=false -c transfer.fsckObjects=true \
+ clone bundle-fsck/bad.bundle bundle-fetch-no-fsck &&
+
+ test_must_fail git -c fetch.fsckObjects=true \
+ clone bundle-fsck/bad.bundle bundle-fetch-fsck 2>err &&
+ test_grep "missingEmail" err &&
+
+ test_must_fail git -c transfer.fsckObjects=true \
+ clone bundle-fsck/bad.bundle bundle-transfer-fsck 2>err &&
+ test_grep "missingEmail" err
+'
+
test_expect_success 'git bundle uses expected default format' '
git bundle create bundle HEAD^.. &&
cat >expect <<-EOF &&
diff --git a/transport.c b/transport.c
index 0ad04b77fd2..6cd5683bb45 100644
--- a/transport.c
+++ b/transport.c
@@ -184,7 +184,7 @@ static int fetch_refs_from_bundle(struct transport *transport,
if (!data->get_refs_from_bundle_called)
get_refs_from_bundle_inner(transport);
ret = unbundle(the_repository, &data->header, data->fd,
- &extra_index_pack_args, 0);
+ &extra_index_pack_args, VERIFY_BUNDLE_FSCK_FOLLOW_FETCH);
transport->hash_algo = data->header.hash_algo;
return ret;
}
--
gitgitgadget
^ permalink raw reply related [flat|nested] 66+ messages in thread
* Re:Re: [PATCH v4 4/4] unbundle: introduce option VERIFY_BUNDLE_FSCK_FOLLOW_FETCH
2024-06-06 12:06 ` Patrick Steinhardt
@ 2024-06-11 6:46 ` Xing Xin
0 siblings, 0 replies; 66+ messages in thread
From: Xing Xin @ 2024-06-11 6:46 UTC (permalink / raw)
To: Patrick Steinhardt
Cc: Xing Xin via GitGitGadget, git, Karthik Nayak, Xing Xin
At 2024-06-06 20:06:47, "Patrick Steinhardt" <ps@pks.im> wrote:
>On Thu, May 30, 2024 at 08:21:30AM +0000, Xing Xin via GitGitGadget wrote:
>> From: Xing Xin <xingxin.xx@bytedance.com>
>
>Same here, the important part is not that we introduce the flag, but
>that we start using it in `unbundle_from_file()`.
>
>> diff --git a/bundle-uri.c b/bundle-uri.c
>> index 066ff788104..e7ebac6ce57 100644
>> --- a/bundle-uri.c
>> +++ b/bundle-uri.c
>> @@ -373,7 +373,7 @@ static int unbundle_from_file(struct repository *r, const char *file)
>> * the prerequisite commits.
>> */
>> if ((result = unbundle(r, &header, bundle_fd, NULL,
>> - VERIFY_BUNDLE_QUIET | VERIFY_BUNDLE_FSCK_ALWAYS)))
>> + VERIFY_BUNDLE_QUIET | VERIFY_BUNDLE_FSCK_FOLLOW_FETCH)))
>> return 1;
>>
>> /*
>
>One thing that is a bit weird is that we first change `unbundle()` to
>use `FSCK_ALWAYS` in a preceding patch, and then convert it to use
>`FSCK_FOLLOW_FETCH` in the same series. It could be restructured a bit
>to first introduce the flags, only, while not modifying any of the
>callsites yet. Passing the respective flags would then be done in a
>separate commit.
This makes sense to me, thanks!
Xing Xin
^ permalink raw reply [flat|nested] 66+ messages in thread
* Re:Re: [PATCH v4 2/4] unbundle: extend verify_bundle_flags to support fsck-objects
2024-06-06 12:06 ` Patrick Steinhardt
@ 2024-06-11 6:46 ` Xing Xin
0 siblings, 0 replies; 66+ messages in thread
From: Xing Xin @ 2024-06-11 6:46 UTC (permalink / raw)
To: Patrick Steinhardt
Cc: Xing Xin via GitGitGadget, git, Karthik Nayak, Xing Xin
At 2024-06-06 20:06:41, "Patrick Steinhardt" <ps@pks.im> wrote:
>On Thu, May 30, 2024 at 08:21:28AM +0000, Xing Xin via GitGitGadget wrote:
>> From: Xing Xin <xingxin.xx@bytedance.com>
>
>Tiny nit: the important change in this commit is not that you wire up
>the flag, but rather that we start to execute git-fsck(1) now. I'd thus
To be precise, it is adding a "--fsck-objects" flag to "git-index-pack".
>propose to adapt the commit title accordingly.
This commit is mapped to [PATCH v5 3/4] due to some adjustments to the
commit order and implementation details. Hope the new title can better
describe the new patch.
Xing Xin
^ permalink raw reply [flat|nested] 66+ messages in thread
* Re: [PATCH v5 3/4] unbundle: extend options to support object verification
2024-06-11 6:42 ` [PATCH v5 3/4] unbundle: extend options to support object verification Xing Xin via GitGitGadget
@ 2024-06-11 9:11 ` Patrick Steinhardt
2024-06-11 12:47 ` Xing Xin
0 siblings, 1 reply; 66+ messages in thread
From: Patrick Steinhardt @ 2024-06-11 9:11 UTC (permalink / raw)
To: Xing Xin via GitGitGadget; +Cc: git, Karthik Nayak, blanet, Xing Xin
[-- Attachment #1: Type: text/plain, Size: 3306 bytes --]
On Tue, Jun 11, 2024 at 06:42:05AM +0000, Xing Xin via GitGitGadget wrote:
> From: Xing Xin <xingxin.xx@bytedance.com>
>
> This commit extends object verification support in `bundle.c:unbundle`
> by adding two new options to `verify_bundle_flags`:
>
> - `VERIFY_BUNDLE_FSCK_ALWAYS` explicitly enables checks for broken
> objects. It will be used to add "--fsck-objects" support for "git
> bundle unbundle" in a separate series.
> - `VERIFY_BUNDLE_FSCK_FOLLOW_FETCH` is designed to be used during fetch
> operations, specifically for direct bundle fetches and _bundle-uri_
> enabled fetches. When enabled, `bundle.c:unbundle` invokes
> `fetch-pack.c:fetch_pack_fsck_objects` to determine whether to enable
> checks for broken objects. Passing this flag during fetching will be
> implemented in a subsequent commit.
>
> Note that the option `VERIFY_BUNDLE_FSCK_ALWAYS` takes precedence over
> `VERIFY_BUNDLE_FSCK_FOLLOW_FETCH`.
Thanks, the new sequence of commits is much easier to follow. It also
shows that there is no user of `VERIFY_BUNDLE_FSCK_ALWAYS` at the end of
this series. So maybe we should drop that flag?
If you do that, then I'd also propose to merge patches 2 and 3 into one
given that both are quite trivial and related to each other.
Other than that this series looks good to me.
Patrick
> Reviewed-by: Patrick Steinhardt <ps@pks.im>
> Signed-off-by: Xing Xin <xingxin.xx@bytedance.com>
> ---
> bundle.c | 10 ++++++++++
> bundle.h | 2 ++
> 2 files changed, 12 insertions(+)
>
> diff --git a/bundle.c b/bundle.c
> index 95367c2d0a0..53ac73834ea 100644
> --- a/bundle.c
> +++ b/bundle.c
> @@ -17,6 +17,7 @@
> #include "list-objects-filter-options.h"
> #include "connected.h"
> #include "write-or-die.h"
> +#include "fetch-pack.h"
>
> static const char v2_bundle_signature[] = "# v2 git bundle\n";
> static const char v3_bundle_signature[] = "# v3 git bundle\n";
> @@ -615,6 +616,7 @@ int unbundle(struct repository *r, struct bundle_header *header,
> enum verify_bundle_flags flags)
> {
> struct child_process ip = CHILD_PROCESS_INIT;
> + int fsck_objects = 0;
>
> if (verify_bundle(r, header, flags))
> return -1;
> @@ -625,6 +627,14 @@ int unbundle(struct repository *r, struct bundle_header *header,
> if (header->filter.choice)
> strvec_push(&ip.args, "--promisor=from-bundle");
>
> + if (flags & VERIFY_BUNDLE_FSCK_ALWAYS)
> + fsck_objects = 1;
> + else if (flags & VERIFY_BUNDLE_FSCK_FOLLOW_FETCH)
> + fsck_objects = fetch_pack_fsck_objects();
> +
> + if (fsck_objects)
> + strvec_push(&ip.args, "--fsck-objects");
> +
> if (extra_index_pack_args) {
> strvec_pushv(&ip.args, extra_index_pack_args->v);
> strvec_clear(extra_index_pack_args);
> diff --git a/bundle.h b/bundle.h
> index 021adbdcbb3..a39d8ea1a7e 100644
> --- a/bundle.h
> +++ b/bundle.h
> @@ -33,6 +33,8 @@ int create_bundle(struct repository *r, const char *path,
> enum verify_bundle_flags {
> VERIFY_BUNDLE_VERBOSE = (1 << 0),
> VERIFY_BUNDLE_QUIET = (1 << 1),
> + VERIFY_BUNDLE_FSCK_ALWAYS = (1 << 2),
> + VERIFY_BUNDLE_FSCK_FOLLOW_FETCH = (1 << 3),
> };
>
> int verify_bundle(struct repository *r, struct bundle_header *header,
> --
> gitgitgadget
>
[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 833 bytes --]
^ permalink raw reply [flat|nested] 66+ messages in thread
* [PATCH v6 0/3] object checking related additions and fixes for bundles in fetches
2024-06-11 6:42 ` [PATCH v5 0/4] object checking related additions and fixes for bundles in fetches blanet via GitGitGadget
` (3 preceding siblings ...)
2024-06-11 6:42 ` [PATCH v5 4/4] unbundle: use VERIFY_BUNDLE_FSCK_FOLLOW_FETCH for fetches Xing Xin via GitGitGadget
@ 2024-06-11 12:45 ` blanet via GitGitGadget
2024-06-11 12:45 ` [PATCH v6 1/3] bundle-uri: verify oid before writing refs Xing Xin via GitGitGadget
` (4 more replies)
4 siblings, 5 replies; 66+ messages in thread
From: blanet via GitGitGadget @ 2024-06-11 12:45 UTC (permalink / raw)
To: git; +Cc: Patrick Steinhardt, Karthik Nayak, blanet
While attempting to fix a reference negotiation bug in bundle-uri, we
identified that the fetch process lacks some crucial object validation
checks when processing bundles. The primary issues are:
1. In the bundle-uri scenario, object IDs were not validated before writing
bundle references. This was the root cause of the original negotiation
bug in bundle-uri and could lead to potential repository corruption.
2. The existing "fetch.fsckObjects" and "transfer.fsckObjects"
configurations were not applied when directly fetching bundles or
fetching with bundle-uri enabled. In fact, there were no object
validation supports for unbundle.
The first patch addresses the bundle-uri negotiation issue by removing the
REF_SKIP_OID_VERIFICATION flag when writing bundle references.
Patches 2 through 3 extend verify_bundle_flags for bundle.c:unbundle to add
support for object validation (fsck) in fetch scenarios, mainly following
the suggestions from Junio and Patrick on the mailing list.
Xing Xin (3):
bundle-uri: verify oid before writing refs
fetch-pack: expose fsckObjects configuration logic
unbundle: support object verification for fetches
bundle-uri.c | 5 +-
bundle.c | 5 +
bundle.h | 1 +
fetch-pack.c | 17 ++--
fetch-pack.h | 5 +
t/t5558-clone-bundle-uri.sh | 186 +++++++++++++++++++++++++++++++++++-
t/t5607-clone-bundle.sh | 33 +++++++
transport.c | 2 +-
8 files changed, 240 insertions(+), 14 deletions(-)
base-commit: b9cfe4845cb2562584837bc0101c0ab76490a239
Published-As: https://github.com/gitgitgadget/git/releases/tag/pr-1730%2Fblanet%2Fxx%2Fbundle-uri-bug-using-bundle-list-v6
Fetch-It-Via: git fetch https://github.com/gitgitgadget/git pr-1730/blanet/xx/bundle-uri-bug-using-bundle-list-v6
Pull-Request: https://github.com/gitgitgadget/git/pull/1730
Range-diff vs v5:
1: e958a3ab20c = 1: e958a3ab20c bundle-uri: verify oid before writing refs
2: d21c236b8de = 2: d21c236b8de fetch-pack: expose fsckObjects configuration logic
3: 0a18d7839be < -: ----------- unbundle: extend options to support object verification
4: eb9f21f16b5 ! 3: 53395e8c08a unbundle: use VERIFY_BUNDLE_FSCK_FOLLOW_FETCH for fetches
@@ Metadata
Author: Xing Xin <xingxin.xx@bytedance.com>
## Commit message ##
- unbundle: use VERIFY_BUNDLE_FSCK_FOLLOW_FETCH for fetches
+ unbundle: support object verification for fetches
- This commit passes `VERIFY_BUNDLE_FSCK_FOLLOW_FETCH` to `unbundle` in
- the fetching process, including:
+ This commit extends object verification support for fetches in
+ `bundle.c:unbundle` by adding the `VERIFY_BUNDLE_FSCK_FOLLOW_FETCH`
+ option to `verify_bundle_flags`. When this option is enabled,
+ `bundle.c:unbundle` invokes `fetch-pack.c:fetch_pack_fsck_objects` to
+ determine whether to append the "--fsck-objects" flag to
+ "git-index-pack".
+
+ `VERIFY_BUNDLE_FSCK_FOLLOW_FETCH` is now passed to `unbundle` in the
+ fetching process, including:
- `transport.c:fetch_refs_from_bundle` for direct bundle fetches.
- `bundle-uri.c:unbundle_from_file` for bundle-uri enabled fetches.
@@ bundle-uri.c: static int unbundle_from_file(struct repository *r, const char *fi
/*
+ ## bundle.c ##
+@@
+ #include "list-objects-filter-options.h"
+ #include "connected.h"
+ #include "write-or-die.h"
++#include "fetch-pack.h"
+
+ static const char v2_bundle_signature[] = "# v2 git bundle\n";
+ static const char v3_bundle_signature[] = "# v3 git bundle\n";
+@@ bundle.c: int unbundle(struct repository *r, struct bundle_header *header,
+ if (header->filter.choice)
+ strvec_push(&ip.args, "--promisor=from-bundle");
+
++ if (flags & VERIFY_BUNDLE_FSCK_FOLLOW_FETCH)
++ if (fetch_pack_fsck_objects())
++ strvec_push(&ip.args, "--fsck-objects");
++
+ if (extra_index_pack_args) {
+ strvec_pushv(&ip.args, extra_index_pack_args->v);
+ strvec_clear(extra_index_pack_args);
+
+ ## bundle.h ##
+@@ bundle.h: int create_bundle(struct repository *r, const char *path,
+ enum verify_bundle_flags {
+ VERIFY_BUNDLE_VERBOSE = (1 << 0),
+ VERIFY_BUNDLE_QUIET = (1 << 1),
++ VERIFY_BUNDLE_FSCK_FOLLOW_FETCH = (1 << 2),
+ };
+
+ int verify_bundle(struct repository *r, struct bundle_header *header,
+
## t/t5558-clone-bundle-uri.sh ##
@@ t/t5558-clone-bundle-uri.sh: test_expect_success 'create bundle' '
git bundle create B.bundle topic &&
--
gitgitgadget
^ permalink raw reply [flat|nested] 66+ messages in thread
* [PATCH v6 1/3] bundle-uri: verify oid before writing refs
2024-06-11 12:45 ` [PATCH v6 0/3] object checking related additions and fixes for bundles in fetches blanet via GitGitGadget
@ 2024-06-11 12:45 ` Xing Xin via GitGitGadget
2024-06-11 19:08 ` Junio C Hamano
2024-06-11 12:45 ` [PATCH v6 2/3] fetch-pack: expose fsckObjects configuration logic Xing Xin via GitGitGadget
` (3 subsequent siblings)
4 siblings, 1 reply; 66+ messages in thread
From: Xing Xin via GitGitGadget @ 2024-06-11 12:45 UTC (permalink / raw)
To: git; +Cc: Patrick Steinhardt, Karthik Nayak, blanet, Xing Xin
From: Xing Xin <xingxin.xx@bytedance.com>
When using the bundle-uri mechanism with a bundle list containing
multiple interrelated bundles, we encountered a bug where tips from
downloaded bundles were not discovered, thus resulting in rather slow
clones. This was particularly problematic when employing the
"creationTokens" heuristic.
To reproduce this issue, consider a repository with a single branch
"main" pointing to commit "A". Firstly, create a base bundle with:
git bundle create base.bundle main
Then, add a new commit "B" on top of "A", and create an incremental
bundle for "main":
git bundle create incr.bundle A..main
Now, generate a bundle list with the following content:
[bundle]
version = 1
mode = all
heuristic = creationToken
[bundle "base"]
uri = base.bundle
creationToken = 1
[bundle "incr"]
uri = incr.bundle
creationToken = 2
A fresh clone with the bundle list above should result in a reference
"refs/bundles/main" pointing to "B" in the new repository. However, git
would still download everything from the server, as if it had fetched
nothing locally.
So why the "refs/bundles/main" is not discovered? After some digging I
found that:
1. Bundles in bundle list are downloaded to local files via
`bundle-uri.c:download_bundle_list` or via
`bundle-uri.c:fetch_bundles_by_token` for the "creationToken"
heuristic.
2. Each bundle is unbundled via `bundle-uri.c:unbundle_from_file`, which
is called by `bundle-uri.c:unbundle_all_bundles` or called within
`bundle-uri.c:fetch_bundles_by_token` for the "creationToken"
heuristic.
3. To get all prerequisites of the bundle, the bundle header is read
inside `bundle-uri.c:unbundle_from_file` to by calling
`bundle.c:read_bundle_header`.
4. Then it calls `bundle.c:unbundle`, which calls
`bundle.c:verify_bundle` to ensure the repository contains all the
prerequisites.
5. `bundle.c:verify_bundle` calls `parse_object`, which eventually
invokes `packfile.c:prepare_packed_git` or
`packfile.c:reprepare_packed_git`, filling
`raw_object_store->packed_git` and setting `packed_git_initialized`.
6. If `bundle.c:unbundle` succeeds, it writes refs via
`refs.c:refs_update_ref` with `REF_SKIP_OID_VERIFICATION` set. Here
bundle refs which can target arbitrary objects are written to the
repository.
7. Finally, in `fetch-pack.c:do_fetch_pack_v2`, the functions
`fetch-pack.c:mark_complete_and_common_ref` and
`fetch-pack.c:mark_tips` are called with `OBJECT_INFO_QUICK` set to
find local tips for negotiation. The `OBJECT_INFO_QUICK` flag
prevents `packfile.c:reprepare_packed_git` from being called,
resulting in failures to parse OIDs that reside only in the latest
bundle.
In the example above, when unbunding "incr.bundle", "base.pack" is added
to `packed_git` due to prerequisites verification. However, "B" cannot
be found for negotiation because it exists in "incr.pack", which is not
included in `packed_git`.
This commit fixes the bug by removing `REF_SKIP_OID_VERIFICATION` flag
when writing bundle refs. When `refs.c:refs_update_ref` is called to to
write the corresponding bundle refs, it triggers
`refs.c:ref_transaction_commit`. This, in turn, invokes
`refs.c:ref_transaction_prepare`, which calls `transaction_prepare` of
the refs storage backend. For files backend, this function is
`files-backend.c:files_transaction_prepare`, and for reftable backend,
it is `reftable-backend.c:reftable_be_transaction_prepare`. Both
functions eventually call `object.c:parse_object`, which can invoke
`packfile.c:reprepare_packed_git` to refresh `packed_git`. This ensures
that bundle refs point to valid objects and that all tips from bundle
refs are correctly parsed during subsequent negotiations.
A test has been added to demonstrate that bundles with incorrect
headers, where refs point to non-existent objects, do not result in any
bundle refs being created in the repository. Additionally, a set of
negotiation-related tests for fetching with bundle-uri has been
included.
Reviewed-by: Karthik Nayak <karthik.188@gmail.com>
Reviewed-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Xing Xin <xingxin.xx@bytedance.com>
---
bundle-uri.c | 3 +-
t/t5558-clone-bundle-uri.sh | 153 +++++++++++++++++++++++++++++++++++-
2 files changed, 150 insertions(+), 6 deletions(-)
diff --git a/bundle-uri.c b/bundle-uri.c
index 91b3319a5c1..65666a11d9c 100644
--- a/bundle-uri.c
+++ b/bundle-uri.c
@@ -400,8 +400,7 @@ static int unbundle_from_file(struct repository *r, const char *file)
refs_update_ref(get_main_ref_store(the_repository),
"fetched bundle", bundle_ref.buf, oid,
has_old ? &old_oid : NULL,
- REF_SKIP_OID_VERIFICATION,
- UPDATE_REFS_MSG_ON_ERR);
+ 0, UPDATE_REFS_MSG_ON_ERR);
}
bundle_header_release(&header);
diff --git a/t/t5558-clone-bundle-uri.sh b/t/t5558-clone-bundle-uri.sh
index 1ca5f745e73..8f4f802e4f1 100755
--- a/t/t5558-clone-bundle-uri.sh
+++ b/t/t5558-clone-bundle-uri.sh
@@ -19,10 +19,19 @@ test_expect_success 'fail to clone from non-bundle file' '
test_expect_success 'create bundle' '
git init clone-from &&
- git -C clone-from checkout -b topic &&
- test_commit -C clone-from A &&
- test_commit -C clone-from B &&
- git -C clone-from bundle create B.bundle topic
+ (
+ cd clone-from &&
+ git checkout -b topic &&
+
+ test_commit A &&
+ git bundle create A.bundle topic &&
+
+ test_commit B &&
+ git bundle create B.bundle topic &&
+
+ # Create a bundle with reference pointing to non-existent object.
+ sed "s/$(git rev-parse A)/$(git rev-parse B)/" <A.bundle >bad-header.bundle
+ )
'
test_expect_success 'clone with path bundle' '
@@ -33,6 +42,16 @@ test_expect_success 'clone with path bundle' '
test_cmp expect actual
'
+test_expect_success 'clone with bundle that has bad header' '
+ git clone --bundle-uri="clone-from/bad-header.bundle" \
+ clone-from clone-bad-header 2>err &&
+ # Write bundle ref fails, but clone can still proceed.
+ commit_b=$(git -C clone-from rev-parse B) &&
+ test_grep "trying to write ref '\''refs/bundles/topic'\'' with nonexistent object $commit_b" err &&
+ git -C clone-bad-header for-each-ref --format="%(refname)" >refs &&
+ ! grep "refs/bundles/" refs
+'
+
test_expect_success 'clone with path bundle and non-default hash' '
test_when_finished "rm -rf clone-path-non-default-hash" &&
GIT_DEFAULT_HASH=sha256 git clone --bundle-uri="clone-from/B.bundle" \
@@ -259,6 +278,132 @@ test_expect_success 'clone bundle list (file, any mode, all failures)' '
! grep "refs/bundles/" refs
'
+#########################################################################
+# Clone negotiation related tests begin here
+
+test_expect_success 'negotiation: bundle with part of wanted commits' '
+ test_when_finished rm -rf trace*.txt &&
+ GIT_TRACE_PACKET="$(pwd)/trace-packet.txt" \
+ git clone --no-local --bundle-uri="clone-from/A.bundle" \
+ clone-from nego-bundle-part &&
+ git -C nego-bundle-part for-each-ref --format="%(refname)" >refs &&
+ grep "refs/bundles/" refs >actual &&
+ cat >expect <<-\EOF &&
+ refs/bundles/topic
+ EOF
+ test_cmp expect actual &&
+ # Ensure that refs/bundles/topic are sent as "have".
+ grep "clone> have $(git -C clone-from rev-parse A)" trace-packet.txt
+'
+
+test_expect_success 'negotiation: bundle with all wanted commits' '
+ test_when_finished rm -rf trace*.txt &&
+ GIT_TRACE_PACKET="$(pwd)/trace-packet.txt" \
+ git clone --no-local --single-branch --branch=topic --no-tags \
+ --bundle-uri="clone-from/B.bundle" \
+ clone-from nego-bundle-all &&
+ git -C nego-bundle-all for-each-ref --format="%(refname)" >refs &&
+ grep "refs/bundles/" refs >actual &&
+ cat >expect <<-\EOF &&
+ refs/bundles/topic
+ EOF
+ test_cmp expect actual &&
+ # We already have all needed commits so no "want" needed.
+ ! grep "clone> want " trace-packet.txt
+'
+
+test_expect_success 'negotiation: bundle list (no heuristic)' '
+ test_when_finished rm -f trace*.txt &&
+ cat >bundle-list <<-EOF &&
+ [bundle]
+ version = 1
+ mode = all
+
+ [bundle "bundle-1"]
+ uri = file://$(pwd)/clone-from/bundle-1.bundle
+
+ [bundle "bundle-2"]
+ uri = file://$(pwd)/clone-from/bundle-2.bundle
+ EOF
+
+ GIT_TRACE_PACKET="$(pwd)/trace-packet.txt" \
+ git clone --no-local --bundle-uri="file://$(pwd)/bundle-list" \
+ clone-from nego-bundle-list-no-heuristic &&
+
+ git -C nego-bundle-list-no-heuristic for-each-ref --format="%(refname)" >refs &&
+ grep "refs/bundles/" refs >actual &&
+ cat >expect <<-\EOF &&
+ refs/bundles/base
+ refs/bundles/left
+ EOF
+ test_cmp expect actual &&
+ grep "clone> have $(git -C nego-bundle-list-no-heuristic rev-parse refs/bundles/left)" trace-packet.txt
+'
+
+test_expect_success 'negotiation: bundle list (creationToken)' '
+ test_when_finished rm -f trace*.txt &&
+ cat >bundle-list <<-EOF &&
+ [bundle]
+ version = 1
+ mode = all
+ heuristic = creationToken
+
+ [bundle "bundle-1"]
+ uri = file://$(pwd)/clone-from/bundle-1.bundle
+ creationToken = 1
+
+ [bundle "bundle-2"]
+ uri = file://$(pwd)/clone-from/bundle-2.bundle
+ creationToken = 2
+ EOF
+
+ GIT_TRACE_PACKET="$(pwd)/trace-packet.txt" \
+ git clone --no-local --bundle-uri="file://$(pwd)/bundle-list" \
+ clone-from nego-bundle-list-heuristic &&
+
+ git -C nego-bundle-list-heuristic for-each-ref --format="%(refname)" >refs &&
+ grep "refs/bundles/" refs >actual &&
+ cat >expect <<-\EOF &&
+ refs/bundles/base
+ refs/bundles/left
+ EOF
+ test_cmp expect actual &&
+ grep "clone> have $(git -C nego-bundle-list-heuristic rev-parse refs/bundles/left)" trace-packet.txt
+'
+
+test_expect_success 'negotiation: bundle list with all wanted commits' '
+ test_when_finished rm -f trace*.txt &&
+ cat >bundle-list <<-EOF &&
+ [bundle]
+ version = 1
+ mode = all
+ heuristic = creationToken
+
+ [bundle "bundle-1"]
+ uri = file://$(pwd)/clone-from/bundle-1.bundle
+ creationToken = 1
+
+ [bundle "bundle-2"]
+ uri = file://$(pwd)/clone-from/bundle-2.bundle
+ creationToken = 2
+ EOF
+
+ GIT_TRACE_PACKET="$(pwd)/trace-packet.txt" \
+ git clone --no-local --single-branch --branch=left --no-tags \
+ --bundle-uri="file://$(pwd)/bundle-list" \
+ clone-from nego-bundle-list-all &&
+
+ git -C nego-bundle-list-all for-each-ref --format="%(refname)" >refs &&
+ grep "refs/bundles/" refs >actual &&
+ cat >expect <<-\EOF &&
+ refs/bundles/base
+ refs/bundles/left
+ EOF
+ test_cmp expect actual &&
+ # We already have all needed commits so no "want" needed.
+ ! grep "clone> want " trace-packet.txt
+'
+
#########################################################################
# HTTP tests begin here
--
gitgitgadget
^ permalink raw reply related [flat|nested] 66+ messages in thread
* [PATCH v6 2/3] fetch-pack: expose fsckObjects configuration logic
2024-06-11 12:45 ` [PATCH v6 0/3] object checking related additions and fixes for bundles in fetches blanet via GitGitGadget
2024-06-11 12:45 ` [PATCH v6 1/3] bundle-uri: verify oid before writing refs Xing Xin via GitGitGadget
@ 2024-06-11 12:45 ` Xing Xin via GitGitGadget
2024-06-11 19:20 ` Junio C Hamano
2024-06-11 12:45 ` [PATCH v6 3/3] unbundle: support object verification for fetches Xing Xin via GitGitGadget
` (2 subsequent siblings)
4 siblings, 1 reply; 66+ messages in thread
From: Xing Xin via GitGitGadget @ 2024-06-11 12:45 UTC (permalink / raw)
To: git; +Cc: Patrick Steinhardt, Karthik Nayak, blanet, Xing Xin
From: Xing Xin <xingxin.xx@bytedance.com>
Currently, we can use "transfer.fsckObjects" and the more specific
"fetch.fsckObjects" to control checks for broken objects in received
packs during fetches. However, these configurations were only
acknowledged by `fetch-pack.c:get_pack` and did not take effect in
direct bundle fetches and fetches with _bundle-uri_ enabled.
This commit exposes the fetch-then-transfer configuration logic by
adding a new function `fetch_pack_fsck_objects` in fetch-pack.h. This
new function is used to replace the assignment for `fsck_objects` in
`fetch-pack.c:get_pack`. In the next commit, it will also be used by
`bundle.c:unbundle` to better fit fetching scenarios.
Helped-by: Junio C Hamano <gitster@pobox.com>
Helped-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Xing Xin <xingxin.xx@bytedance.com>
---
fetch-pack.c | 17 +++++++++++------
fetch-pack.h | 5 +++++
2 files changed, 16 insertions(+), 6 deletions(-)
diff --git a/fetch-pack.c b/fetch-pack.c
index 7d2aef21add..3acff2baf09 100644
--- a/fetch-pack.c
+++ b/fetch-pack.c
@@ -954,12 +954,7 @@ static int get_pack(struct fetch_pack_args *args,
strvec_push(&cmd.args, alternate_shallow_file);
}
- if (fetch_fsck_objects >= 0
- ? fetch_fsck_objects
- : transfer_fsck_objects >= 0
- ? transfer_fsck_objects
- : 0)
- fsck_objects = 1;
+ fsck_objects = fetch_pack_fsck_objects();
if (do_keep || args->from_promisor || index_pack_args || fsck_objects) {
if (pack_lockfiles || fsck_objects)
@@ -2046,6 +2041,16 @@ static const struct object_id *iterate_ref_map(void *cb_data)
return &ref->old_oid;
}
+int fetch_pack_fsck_objects(void)
+{
+ fetch_pack_setup();
+ if (fetch_fsck_objects >= 0)
+ return fetch_fsck_objects;
+ if (transfer_fsck_objects >= 0)
+ return transfer_fsck_objects;
+ return 0;
+}
+
struct ref *fetch_pack(struct fetch_pack_args *args,
int fd[],
const struct ref *ref,
diff --git a/fetch-pack.h b/fetch-pack.h
index 6775d265175..b5c579cdae2 100644
--- a/fetch-pack.h
+++ b/fetch-pack.h
@@ -101,4 +101,9 @@ void negotiate_using_fetch(const struct oid_array *negotiation_tips,
*/
int report_unmatched_refs(struct ref **sought, int nr_sought);
+/*
+ * Return true if checks for broken objects in received pack are required.
+ */
+int fetch_pack_fsck_objects(void);
+
#endif
--
gitgitgadget
^ permalink raw reply related [flat|nested] 66+ messages in thread
* [PATCH v6 3/3] unbundle: support object verification for fetches
2024-06-11 12:45 ` [PATCH v6 0/3] object checking related additions and fixes for bundles in fetches blanet via GitGitGadget
2024-06-11 12:45 ` [PATCH v6 1/3] bundle-uri: verify oid before writing refs Xing Xin via GitGitGadget
2024-06-11 12:45 ` [PATCH v6 2/3] fetch-pack: expose fsckObjects configuration logic Xing Xin via GitGitGadget
@ 2024-06-11 12:45 ` Xing Xin via GitGitGadget
2024-06-11 20:05 ` Junio C Hamano
2024-06-11 13:14 ` [PATCH v6 0/3] object checking related additions and fixes for bundles in fetches Patrick Steinhardt
2024-06-17 13:55 ` [PATCH v7 " blanet via GitGitGadget
4 siblings, 1 reply; 66+ messages in thread
From: Xing Xin via GitGitGadget @ 2024-06-11 12:45 UTC (permalink / raw)
To: git; +Cc: Patrick Steinhardt, Karthik Nayak, blanet, Xing Xin
From: Xing Xin <xingxin.xx@bytedance.com>
This commit extends object verification support for fetches in
`bundle.c:unbundle` by adding the `VERIFY_BUNDLE_FSCK_FOLLOW_FETCH`
option to `verify_bundle_flags`. When this option is enabled,
`bundle.c:unbundle` invokes `fetch-pack.c:fetch_pack_fsck_objects` to
determine whether to append the "--fsck-objects" flag to
"git-index-pack".
`VERIFY_BUNDLE_FSCK_FOLLOW_FETCH` is now passed to `unbundle` in the
fetching process, including:
- `transport.c:fetch_refs_from_bundle` for direct bundle fetches.
- `bundle-uri.c:unbundle_from_file` for bundle-uri enabled fetches.
This addition ensures a consistent logic for object verification during
fetch operations. Tests have been added to confirm functionality in the
scenarios mentioned above.
Reviewed-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Xing Xin <xingxin.xx@bytedance.com>
---
bundle-uri.c | 2 +-
bundle.c | 5 +++++
bundle.h | 1 +
t/t5558-clone-bundle-uri.sh | 35 ++++++++++++++++++++++++++++++++++-
t/t5607-clone-bundle.sh | 33 +++++++++++++++++++++++++++++++++
transport.c | 2 +-
6 files changed, 75 insertions(+), 3 deletions(-)
diff --git a/bundle-uri.c b/bundle-uri.c
index 65666a11d9c..e7ebac6ce57 100644
--- a/bundle-uri.c
+++ b/bundle-uri.c
@@ -373,7 +373,7 @@ static int unbundle_from_file(struct repository *r, const char *file)
* the prerequisite commits.
*/
if ((result = unbundle(r, &header, bundle_fd, NULL,
- VERIFY_BUNDLE_QUIET)))
+ VERIFY_BUNDLE_QUIET | VERIFY_BUNDLE_FSCK_FOLLOW_FETCH)))
return 1;
/*
diff --git a/bundle.c b/bundle.c
index 95367c2d0a0..20bbfffbb43 100644
--- a/bundle.c
+++ b/bundle.c
@@ -17,6 +17,7 @@
#include "list-objects-filter-options.h"
#include "connected.h"
#include "write-or-die.h"
+#include "fetch-pack.h"
static const char v2_bundle_signature[] = "# v2 git bundle\n";
static const char v3_bundle_signature[] = "# v3 git bundle\n";
@@ -625,6 +626,10 @@ int unbundle(struct repository *r, struct bundle_header *header,
if (header->filter.choice)
strvec_push(&ip.args, "--promisor=from-bundle");
+ if (flags & VERIFY_BUNDLE_FSCK_FOLLOW_FETCH)
+ if (fetch_pack_fsck_objects())
+ strvec_push(&ip.args, "--fsck-objects");
+
if (extra_index_pack_args) {
strvec_pushv(&ip.args, extra_index_pack_args->v);
strvec_clear(extra_index_pack_args);
diff --git a/bundle.h b/bundle.h
index 021adbdcbb3..545df6e9d40 100644
--- a/bundle.h
+++ b/bundle.h
@@ -33,6 +33,7 @@ int create_bundle(struct repository *r, const char *path,
enum verify_bundle_flags {
VERIFY_BUNDLE_VERBOSE = (1 << 0),
VERIFY_BUNDLE_QUIET = (1 << 1),
+ VERIFY_BUNDLE_FSCK_FOLLOW_FETCH = (1 << 2),
};
int verify_bundle(struct repository *r, struct bundle_header *header,
diff --git a/t/t5558-clone-bundle-uri.sh b/t/t5558-clone-bundle-uri.sh
index 8f4f802e4f1..48be1b18802 100755
--- a/t/t5558-clone-bundle-uri.sh
+++ b/t/t5558-clone-bundle-uri.sh
@@ -30,7 +30,21 @@ test_expect_success 'create bundle' '
git bundle create B.bundle topic &&
# Create a bundle with reference pointing to non-existent object.
- sed "s/$(git rev-parse A)/$(git rev-parse B)/" <A.bundle >bad-header.bundle
+ sed "s/$(git rev-parse A)/$(git rev-parse B)/" <A.bundle >bad-header.bundle &&
+
+ cat >data <<-EOF &&
+ tree $(git rev-parse HEAD^{tree})
+ parent $(git rev-parse HEAD)
+ author A U Thor
+ committer A U Thor
+
+ commit: this is a commit with bad emails
+
+ EOF
+ git hash-object --literally -t commit -w --stdin <data >commit &&
+ git branch bad $(cat commit) &&
+ git bundle create bad-object.bundle bad &&
+ git update-ref -d refs/heads/bad
)
'
@@ -52,6 +66,25 @@ test_expect_success 'clone with bundle that has bad header' '
! grep "refs/bundles/" refs
'
+test_expect_success 'clone with bundle that has bad object' '
+ # Unbundle succeeds if no fsckObjects confugured.
+ git clone --bundle-uri="clone-from/bad-object.bundle" \
+ clone-from clone-bad-object-no-fsck &&
+ git -C clone-bad-object-no-fsck for-each-ref --format="%(refname)" >refs &&
+ grep "refs/bundles/" refs >actual &&
+ cat >expect <<-\EOF &&
+ refs/bundles/bad
+ EOF
+ test_cmp expect actual &&
+
+ # Unbundle fails with fsckObjects set true, but clone can still proceed.
+ git -c fetch.fsckObjects=true clone --bundle-uri="clone-from/bad-object.bundle" \
+ clone-from clone-bad-object-fsck 2>err &&
+ test_grep "missingEmail" err &&
+ git -C clone-bad-object-fsck for-each-ref --format="%(refname)" >refs &&
+ ! grep "refs/bundles/" refs
+'
+
test_expect_success 'clone with path bundle and non-default hash' '
test_when_finished "rm -rf clone-path-non-default-hash" &&
GIT_DEFAULT_HASH=sha256 git clone --bundle-uri="clone-from/B.bundle" \
diff --git a/t/t5607-clone-bundle.sh b/t/t5607-clone-bundle.sh
index 0d1e92d9963..5182efc0b45 100755
--- a/t/t5607-clone-bundle.sh
+++ b/t/t5607-clone-bundle.sh
@@ -138,6 +138,39 @@ test_expect_success 'fetch SHA-1 from bundle' '
git fetch --no-tags foo/tip.bundle "$(cat hash)"
'
+test_expect_success 'clone bundle with different fsckObjects configurations' '
+ test_create_repo bundle-fsck &&
+ (
+ cd bundle-fsck &&
+ test_commit first &&
+ cat >data <<-EOF &&
+ tree $(git rev-parse HEAD^{tree})
+ parent $(git rev-parse HEAD)
+ author A U Thor
+ committer A U Thor
+
+ commit: this is a commit with bad emails
+
+ EOF
+ git hash-object --literally -t commit -w --stdin <data >commit &&
+ git branch bad $(cat commit) &&
+ git bundle create bad.bundle bad
+ ) &&
+
+ git clone bundle-fsck/bad.bundle bundle-no-fsck &&
+
+ git -c fetch.fsckObjects=false -c transfer.fsckObjects=true \
+ clone bundle-fsck/bad.bundle bundle-fetch-no-fsck &&
+
+ test_must_fail git -c fetch.fsckObjects=true \
+ clone bundle-fsck/bad.bundle bundle-fetch-fsck 2>err &&
+ test_grep "missingEmail" err &&
+
+ test_must_fail git -c transfer.fsckObjects=true \
+ clone bundle-fsck/bad.bundle bundle-transfer-fsck 2>err &&
+ test_grep "missingEmail" err
+'
+
test_expect_success 'git bundle uses expected default format' '
git bundle create bundle HEAD^.. &&
cat >expect <<-EOF &&
diff --git a/transport.c b/transport.c
index 0ad04b77fd2..6cd5683bb45 100644
--- a/transport.c
+++ b/transport.c
@@ -184,7 +184,7 @@ static int fetch_refs_from_bundle(struct transport *transport,
if (!data->get_refs_from_bundle_called)
get_refs_from_bundle_inner(transport);
ret = unbundle(the_repository, &data->header, data->fd,
- &extra_index_pack_args, 0);
+ &extra_index_pack_args, VERIFY_BUNDLE_FSCK_FOLLOW_FETCH);
transport->hash_algo = data->header.hash_algo;
return ret;
}
--
gitgitgadget
^ permalink raw reply related [flat|nested] 66+ messages in thread
* Re:Re: [PATCH v5 3/4] unbundle: extend options to support object verification
2024-06-11 9:11 ` Patrick Steinhardt
@ 2024-06-11 12:47 ` Xing Xin
0 siblings, 0 replies; 66+ messages in thread
From: Xing Xin @ 2024-06-11 12:47 UTC (permalink / raw)
To: Patrick Steinhardt
Cc: Xing Xin via GitGitGadget, git, Karthik Nayak, Xing Xin
At 2024-06-11 17:11:09, "Patrick Steinhardt" <ps@pks.im> wrote:
>On Tue, Jun 11, 2024 at 06:42:05AM +0000, Xing Xin via GitGitGadget wrote:
>> From: Xing Xin <xingxin.xx@bytedance.com>
>>
>> This commit extends object verification support in `bundle.c:unbundle`
>> by adding two new options to `verify_bundle_flags`:
>>
>> - `VERIFY_BUNDLE_FSCK_ALWAYS` explicitly enables checks for broken
>> objects. It will be used to add "--fsck-objects" support for "git
>> bundle unbundle" in a separate series.
>> - `VERIFY_BUNDLE_FSCK_FOLLOW_FETCH` is designed to be used during fetch
>> operations, specifically for direct bundle fetches and _bundle-uri_
>> enabled fetches. When enabled, `bundle.c:unbundle` invokes
>> `fetch-pack.c:fetch_pack_fsck_objects` to determine whether to enable
>> checks for broken objects. Passing this flag during fetching will be
>> implemented in a subsequent commit.
>>
>> Note that the option `VERIFY_BUNDLE_FSCK_ALWAYS` takes precedence over
>> `VERIFY_BUNDLE_FSCK_FOLLOW_FETCH`.
>
>Thanks, the new sequence of commits is much easier to follow. It also
>shows that there is no user of `VERIFY_BUNDLE_FSCK_ALWAYS` at the end of
>this series. So maybe we should drop that flag?
OK, let's focus on the fetching scenarios in this patch series.
>If you do that, then I'd also propose to merge patches 2 and 3 into one
>given that both are quite trivial and related to each other.
I merged patches 3 and 4 instead to combine the new option definition and usage
logic, as the two are more closely related and more trivial. :)
Xing Xin
>Other than that this series looks good to me.
>
>Patrick
>
^ permalink raw reply [flat|nested] 66+ messages in thread
* Re: [PATCH v6 0/3] object checking related additions and fixes for bundles in fetches
2024-06-11 12:45 ` [PATCH v6 0/3] object checking related additions and fixes for bundles in fetches blanet via GitGitGadget
` (2 preceding siblings ...)
2024-06-11 12:45 ` [PATCH v6 3/3] unbundle: support object verification for fetches Xing Xin via GitGitGadget
@ 2024-06-11 13:14 ` Patrick Steinhardt
2024-06-17 13:55 ` [PATCH v7 " blanet via GitGitGadget
4 siblings, 0 replies; 66+ messages in thread
From: Patrick Steinhardt @ 2024-06-11 13:14 UTC (permalink / raw)
To: blanet via GitGitGadget; +Cc: git, Karthik Nayak, blanet
[-- Attachment #1: Type: text/plain, Size: 1194 bytes --]
On Tue, Jun 11, 2024 at 12:45:40PM +0000, blanet via GitGitGadget wrote:
> While attempting to fix a reference negotiation bug in bundle-uri, we
> identified that the fetch process lacks some crucial object validation
> checks when processing bundles. The primary issues are:
>
> 1. In the bundle-uri scenario, object IDs were not validated before writing
> bundle references. This was the root cause of the original negotiation
> bug in bundle-uri and could lead to potential repository corruption.
> 2. The existing "fetch.fsckObjects" and "transfer.fsckObjects"
> configurations were not applied when directly fetching bundles or
> fetching with bundle-uri enabled. In fact, there were no object
> validation supports for unbundle.
>
> The first patch addresses the bundle-uri negotiation issue by removing the
> REF_SKIP_OID_VERIFICATION flag when writing bundle references.
>
> Patches 2 through 3 extend verify_bundle_flags for bundle.c:unbundle to add
> support for object validation (fsck) in fetch scenarios, mainly following
> the suggestions from Junio and Patrick on the mailing list.
Thanks, this version looks good to me.
Patrick
[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 833 bytes --]
^ permalink raw reply [flat|nested] 66+ messages in thread
* Re: [PATCH v6 1/3] bundle-uri: verify oid before writing refs
2024-06-11 12:45 ` [PATCH v6 1/3] bundle-uri: verify oid before writing refs Xing Xin via GitGitGadget
@ 2024-06-11 19:08 ` Junio C Hamano
2024-06-17 13:53 ` Xing Xin
0 siblings, 1 reply; 66+ messages in thread
From: Junio C Hamano @ 2024-06-11 19:08 UTC (permalink / raw)
To: Xing Xin via GitGitGadget
Cc: git, Patrick Steinhardt, Karthik Nayak, blanet, Xing Xin
"Xing Xin via GitGitGadget" <gitgitgadget@gmail.com> writes:
> This commit fixes the bug by removing `REF_SKIP_OID_VERIFICATION` flag
"Fix the bug by ...".
> when writing bundle refs. When `refs.c:refs_update_ref` is called to to
"to to"
> write the corresponding bundle refs, it triggers
> `refs.c:ref_transaction_commit`. This, in turn, invokes
> `refs.c:ref_transaction_prepare`, which calls `transaction_prepare` of
> the refs storage backend. For files backend, this function is
> `files-backend.c:files_transaction_prepare`, and for reftable backend,
> it is `reftable-backend.c:reftable_be_transaction_prepare`. Both
> functions eventually call `object.c:parse_object`, which can invoke
> `packfile.c:reprepare_packed_git` to refresh `packed_git`.
Nice. That does sound like the right fix. The one who did
something to _require_ us to reprepare causes the packed_git
list refreshed.
> test_expect_success 'create bundle' '
> git init clone-from &&
> - git -C clone-from checkout -b topic &&
> - test_commit -C clone-from A &&
> - test_commit -C clone-from B &&
> - git -C clone-from bundle create B.bundle topic
> + (
> + cd clone-from &&
> + git checkout -b topic &&
> +
> + test_commit A &&
> + git bundle create A.bundle topic &&
> +
> + test_commit B &&
> + git bundle create B.bundle topic &&
> +
> + # Create a bundle with reference pointing to non-existent object.
> + sed "s/$(git rev-parse A)/$(git rev-parse B)/" <A.bundle >bad-header.bundle
I suspect that this would be terribly unportable. The early part of
a bundle file may be text and sed may be able to grok but are you
sure everybody's implementation of sed would not barf (or even
worse, corrupt) the pack stream data that follows?
The code used in t/lib-bundle.sh:convert_bundle_to_pack() has been
in use since 8315588b (bundle: fix wrong check of read_header()'s
return value & add tests, 2007-03-06), so munging the bundle with
a code similar to it may have a better portability story.
Add something like:
corrupt_bundle_header () {
sed -e '/^$/q' "$@"
cat
}
to t/lib-bundle.sh, which can take an arbitrary sequence of command
line parameters to drive "sed", and can be used like so:
corrupt_bundle_header \
-e "s/^$(git rev-parse A) /$(git rev-parse B) /" \
<A.bndl >B.bndl
perhaps?
> @@ -33,6 +42,16 @@ test_expect_success 'clone with path bundle' '
> test_cmp expect actual
> '
>
> +test_expect_success 'clone with bundle that has bad header' '
> + git clone --bundle-uri="clone-from/bad-header.bundle" \
> + clone-from clone-bad-header 2>err &&
> + # Write bundle ref fails, but clone can still proceed.
> + commit_b=$(git -C clone-from rev-parse B) &&
> + test_grep "trying to write ref '\''refs/bundles/topic'\'' with nonexistent object $commit_b" err &&
> + git -C clone-bad-header for-each-ref --format="%(refname)" >refs &&
> + ! grep "refs/bundles/" refs
> +'
> +
So this is the test the proposed log message discussed. The
description gave a false impression that the "broken header" test
that has not much to do with the bug being fixed was the only added
test---we probably want to correct that impression.
> @@ -259,6 +278,132 @@ test_expect_success 'clone bundle list (file, any mode, all failures)' '
> ! grep "refs/bundles/" refs
> '
>
> +#########################################################################
> +# Clone negotiation related tests begin here
Drop this divider and comment. The HTTP tests you see below has a
much better reason to be separated like that in order to warn test
writers (they shouldn't add their random new tests after that point,
because everything after that one is skipped when HTTPD tests are
disabled---see the beginning of t/lib-httpd.sh which is included
after that divider line), but everything here you added is not
special. Everybody should run these tests.
> +test_expect_success 'negotiation: bundle with part of wanted commits' '
> + test_when_finished rm -rf trace*.txt &&
Do not overly depend on glob not matching at this point when you
establish the when-finished handler (or glob matching the files you
want to catch and later test not adding anything you would want to
clean). Quote "rm -f trace*.txt" and drop "r" if you do not absolutely
need it (and I would imagine you don't, given the .txt suffix).
> + GIT_TRACE_PACKET="$(pwd)/trace-packet.txt" \
> + git clone --no-local --bundle-uri="clone-from/A.bundle" \
> + clone-from nego-bundle-part &&
> + git -C nego-bundle-part for-each-ref --format="%(refname)" >refs &&
> + grep "refs/bundles/" refs >actual &&
> + cat >expect <<-\EOF &&
> + refs/bundles/topic
> + EOF
Hmph, if the expected pattern is only a few lines without any magic,
test_write_lines >expect refs/bundles/topic &&
may be easier to follow.
> + test_cmp expect actual &&
> + # Ensure that refs/bundles/topic are sent as "have".
> + grep "clone> have $(git -C clone-from rev-parse A)" trace-packet.txt
> +'
Using "test_grep" would make it easier to diagnose when test breaks.
A failing "grep" will be silent (i.e., "I didn't find anything you
told me to look for"). A failing "test_grep" will tell you "I was
told to find this, but didn't find any in that".
> +test_expect_success 'negotiation: bundle with all wanted commits' '
> + test_when_finished rm -rf trace*.txt &&
> + GIT_TRACE_PACKET="$(pwd)/trace-packet.txt" \
> + git clone --no-local --single-branch --branch=topic --no-tags \
> + --bundle-uri="clone-from/B.bundle" \
> + clone-from nego-bundle-all &&
> + git -C nego-bundle-all for-each-ref --format="%(refname)" >refs &&
> + grep "refs/bundles/" refs >actual &&
> + cat >expect <<-\EOF &&
> + refs/bundles/topic
> + EOF
> + test_cmp expect actual &&
> + # We already have all needed commits so no "want" needed.
> + ! grep "clone> want " trace-packet.txt
> +'
> +
> +test_expect_success 'negotiation: bundle list (no heuristic)' '
> + test_when_finished rm -f trace*.txt &&
> + cat >bundle-list <<-EOF &&
> + [bundle]
> + version = 1
> + mode = all
> +
> + [bundle "bundle-1"]
> + uri = file://$(pwd)/clone-from/bundle-1.bundle
> +
> + [bundle "bundle-2"]
> + uri = file://$(pwd)/clone-from/bundle-2.bundle
> + EOF
OK. This is a good use of here-doc (as opposed to test_write_lines
I sugested earlier for different purposes). I wondered if these
$(pwd) and file://$(pwd) are safe (I always get confused by the need
to sometimes use $PWD to help Windows), but I see them used in what
Derrick wrote in this file, so they must be fine.
But there may be characters in the leading part of $(pwd) that we do
not control that needs quoting (like a double quote '"'). The value
of bundle.*.uri may need to be quoted a bit carefully. This is not
a new problem this patch introduces, so you do not have to rewrite
this part of the patch; I'll mark it with #leftoverbits here---the
idea being somebody else who is too bored can come back, see if it
is truly broken, and fix them after all dust settles.
Abusing "git config -f bundle-list" might be safer, e.g.
$ git config -f bundle.list bundle.bundle-1.uri \
'file:///home/abc"def/t/trash dir/clone-from/b1.bndl'
$ cat bundle.list
[bundle "bundle-1"]
uri = file:///home/abc\"def/t/trash dir/clone-from/b1.bndl
as you do not know what other garbage character is in $(pwd) part.
> + # We already have all needed commits so no "want" needed.
> + ! grep "clone> want " trace-packet.txt
Just FYI, to negate test_grep, use
test_grep ! "clone > want " trace-packet.txt
not
! test_grep "clone > want " trace-packet.txt ;# WRONG
^ permalink raw reply [flat|nested] 66+ messages in thread
* Re: [PATCH v6 2/3] fetch-pack: expose fsckObjects configuration logic
2024-06-11 12:45 ` [PATCH v6 2/3] fetch-pack: expose fsckObjects configuration logic Xing Xin via GitGitGadget
@ 2024-06-11 19:20 ` Junio C Hamano
0 siblings, 0 replies; 66+ messages in thread
From: Junio C Hamano @ 2024-06-11 19:20 UTC (permalink / raw)
To: Xing Xin via GitGitGadget
Cc: git, Patrick Steinhardt, Karthik Nayak, blanet, Xing Xin
"Xing Xin via GitGitGadget" <gitgitgadget@gmail.com> writes:
> - if (fetch_fsck_objects >= 0
> - ? fetch_fsck_objects
> - : transfer_fsck_objects >= 0
> - ? transfer_fsck_objects
> - : 0)
> - fsck_objects = 1;
> + fsck_objects = fetch_pack_fsck_objects();
> ...
> +int fetch_pack_fsck_objects(void)
> +{
> + fetch_pack_setup();
OK, this one is designed to be as lightweight as possible when it
has already been called, so a potentially duplicated call would not
cause too much worries here.
> + if (fetch_fsck_objects >= 0)
> + return fetch_fsck_objects;
> + if (transfer_fsck_objects >= 0)
> + return transfer_fsck_objects;
> + return 0;
> +}
Much easier to follow. Nicely done.
^ permalink raw reply [flat|nested] 66+ messages in thread
* Re: [PATCH v6 3/3] unbundle: support object verification for fetches
2024-06-11 12:45 ` [PATCH v6 3/3] unbundle: support object verification for fetches Xing Xin via GitGitGadget
@ 2024-06-11 20:05 ` Junio C Hamano
2024-06-12 18:33 ` Xing Xin
0 siblings, 1 reply; 66+ messages in thread
From: Junio C Hamano @ 2024-06-11 20:05 UTC (permalink / raw)
To: Xing Xin via GitGitGadget
Cc: git, Patrick Steinhardt, Karthik Nayak, blanet, Xing Xin
"Xing Xin via GitGitGadget" <gitgitgadget@gmail.com> writes:
> From: Xing Xin <xingxin.xx@bytedance.com>
>
> This commit extends object verification support for fetches in
> `bundle.c:unbundle` by adding the `VERIFY_BUNDLE_FSCK_FOLLOW_FETCH`
> option to `verify_bundle_flags`. When this option is enabled,
> `bundle.c:unbundle` invokes `fetch-pack.c:fetch_pack_fsck_objects` to
> determine whether to append the "--fsck-objects" flag to
> "git-index-pack".
Please start your proposed log message by stating what the perceived
problem without this patch in the current world is. Without it, you
cannot easily answer the most basic question: "why are we doing this?"
The name VERIFY_BUNDLE_FSCK_FOLLOW_FETCH does not read very well.
VERIFY_BUNDLE part is common across various flags, so what is
specific to the feature is "FSCK_FOLLOW_FETCH", and it is good that
we convey the fact that we do a bit more than the normal
VERIFY_BUNDLE (which is, to read the prerequisite headers and make
sure we have them in the sense that they are reachable from our
refs) with the word FSCK.
But is it necessary (or even a good idea) to limit its usability
with "FOLLOW_FETCH" (which does not look even grammatical)? Aren't
we closing the door to other folks who may want to do a more
thorough fsck-like checks in other codepaths by saying "if you are
not doing this immediately after you fetch, you are unwelcome to use
this flag"?
> `VERIFY_BUNDLE_FSCK_FOLLOW_FETCH` is now passed to `unbundle` in the
> fetching process, including:
>
> - `transport.c:fetch_refs_from_bundle` for direct bundle fetches.
> - `bundle-uri.c:unbundle_from_file` for bundle-uri enabled fetches.
>
> This addition ensures a consistent logic for object verification during
> fetch operations. Tests have been added to confirm functionality in the
> scenarios mentioned above.
>
> Reviewed-by: Patrick Steinhardt <ps@pks.im>
> Signed-off-by: Xing Xin <xingxin.xx@bytedance.com>
> ---
> bundle-uri.c | 2 +-
> bundle.c | 5 +++++
> bundle.h | 1 +
> t/t5558-clone-bundle-uri.sh | 35 ++++++++++++++++++++++++++++++++++-
> t/t5607-clone-bundle.sh | 33 +++++++++++++++++++++++++++++++++
> transport.c | 2 +-
> 6 files changed, 75 insertions(+), 3 deletions(-)
>
> diff --git a/bundle-uri.c b/bundle-uri.c
> index 65666a11d9c..e7ebac6ce57 100644
> --- a/bundle-uri.c
> +++ b/bundle-uri.c
> @@ -373,7 +373,7 @@ static int unbundle_from_file(struct repository *r, const char *file)
> * the prerequisite commits.
> */
> if ((result = unbundle(r, &header, bundle_fd, NULL,
> - VERIFY_BUNDLE_QUIET)))
> + VERIFY_BUNDLE_QUIET | VERIFY_BUNDLE_FSCK_FOLLOW_FETCH)))
> return 1;
OK (modulo the flag name).
> + if (flags & VERIFY_BUNDLE_FSCK_FOLLOW_FETCH)
> + if (fetch_pack_fsck_objects())
> + strvec_push(&ip.args, "--fsck-objects");
> +
OK, that's quite straight-forward. We are running "index-pack
--fix-thin --stdin" and feeding the bundle data to it. We just tell
it to also work in the "--fsck-objects" mode.
> diff --git a/transport.c b/transport.c
> index 0ad04b77fd2..6cd5683bb45 100644
> --- a/transport.c
> +++ b/transport.c
> @@ -184,7 +184,7 @@ static int fetch_refs_from_bundle(struct transport *transport,
> if (!data->get_refs_from_bundle_called)
> get_refs_from_bundle_inner(transport);
> ret = unbundle(the_repository, &data->header, data->fd,
> - &extra_index_pack_args, 0);
> + &extra_index_pack_args, VERIFY_BUNDLE_FSCK_FOLLOW_FETCH);
OK.
I wonder if something like this is a potential follow-up topic
somebody may be interested in after the dust settles. That is
exactly why the name of this bit matters.
builtin/bundle.c | 6 +++++-
1 file changed, 5 insertions(+), 1 deletion(-)
diff --git c/builtin/bundle.c w/builtin/bundle.c
index d5d41a8f67..eeb5963dcb 100644
--- c/builtin/bundle.c
+++ w/builtin/bundle.c
@@ -194,10 +194,13 @@ static int cmd_bundle_unbundle(int argc, const char **argv, const char *prefix)
int bundle_fd = -1;
int ret;
int progress = isatty(2);
+ int fsck_objects = 0;
struct option options[] = {
OPT_BOOL(0, "progress", &progress,
N_("show progress meter")),
+ OPT_BOOL(0, "fsck-objects", &fsck_objects,
+ N_("check the objects in the bundle")),
OPT_END()
};
char *bundle_file;
@@ -217,7 +220,8 @@ static int cmd_bundle_unbundle(int argc, const char **argv, const char *prefix)
strvec_pushl(&extra_index_pack_args, "-v", "--progress-title",
_("Unbundling objects"), NULL);
ret = !!unbundle(the_repository, &header, bundle_fd,
- &extra_index_pack_args, 0) ||
+ &extra_index_pack_args,
+ fsck_objects ? VERIFY_BUNDLE_FSCK_FOLLOW_FETCH : 0) ||
list_bundle_refs(&header, argc, argv);
bundle_header_release(&header);
cleanup:
^ permalink raw reply related [flat|nested] 66+ messages in thread
* Re:Re: [PATCH v6 3/3] unbundle: support object verification for fetches
2024-06-11 20:05 ` Junio C Hamano
@ 2024-06-12 18:33 ` Xing Xin
0 siblings, 0 replies; 66+ messages in thread
From: Xing Xin @ 2024-06-12 18:33 UTC (permalink / raw)
To: Junio C Hamano
Cc: Xing Xin via GitGitGadget, git, Patrick Steinhardt, Karthik Nayak,
Xing Xin
At 2024-06-12 04:05:35, "Junio C Hamano" <gitster@pobox.com> wrote:
>"Xing Xin via GitGitGadget" <gitgitgadget@gmail.com> writes:
>
>> From: Xing Xin <xingxin.xx@bytedance.com>
>>
>> This commit extends object verification support for fetches in
>> `bundle.c:unbundle` by adding the `VERIFY_BUNDLE_FSCK_FOLLOW_FETCH`
>> option to `verify_bundle_flags`. When this option is enabled,
>> `bundle.c:unbundle` invokes `fetch-pack.c:fetch_pack_fsck_objects` to
>> determine whether to append the "--fsck-objects" flag to
>> "git-index-pack".
>
>Please start your proposed log message by stating what the perceived
>problem without this patch in the current world is. Without it, you
>cannot easily answer the most basic question: "why are we doing this?"
Got it.
>The name VERIFY_BUNDLE_FSCK_FOLLOW_FETCH does not read very well.
>VERIFY_BUNDLE part is common across various flags, so what is
>specific to the feature is "FSCK_FOLLOW_FETCH", and it is good that
>we convey the fact that we do a bit more than the normal
>VERIFY_BUNDLE (which is, to read the prerequisite headers and make
>sure we have them in the sense that they are reachable from our
>refs) with the word FSCK.
>
>But is it necessary (or even a good idea) to limit its usability
>with "FOLLOW_FETCH" (which does not look even grammatical)? Aren't
>we closing the door to other folks who may want to do a more
>thorough fsck-like checks in other codepaths by saying "if you are
>not doing this immediately after you fetch, you are unwelcome to use
>this flag"?
I initially considered adding another option VERIFY_BUNDLE_FSCK_ALWAYS
for other scenarios, which would take a higher priority than
VERIFY_BUNDLE_FSCK_FOLLOW_FETCH. However, that approach is also
confusing, as we would have two flags both controlling the fsck
behavior.
How about extending VERIFY_BUNDLE_FSCK and letting the callers decide
whether to add the flag for fscking?
In bundle.c, we can make a small change like:
@@ -625,6 +626,9 @@ int unbundle(struct repository *r, struct bundle_header *header,
if (header->filter.choice)
strvec_push(&ip.args, "--promisor=from-bundle");
+ if (flags & VERIFY_BUNDLE_FSCK)
+ strvec_push(&ip.args, "--fsck-objects");
+
if (extra_index_pack_args) {
strvec_pushv(&ip.args, extra_index_pack_args->v);
strvec_clear(extra_index_pack_args);
For example, in `bundle-uri.c:unbundle_from_file`, which is one of the
callers of unbundle, we can use `fetch_pack_fsck_objects` to decide
whether to add that option:
@@ -373,7 +373,7 @@ static int unbundle_from_file(struct repository *r, const char *file)
* the prerequisite commits.
*/
if ((result = unbundle(r, &header, bundle_fd, NULL,
- VERIFY_BUNDLE_QUIET)))
+ VERIFY_BUNDLE_QUIET | (fetch_pack_fsck_objects() ? VERIFY_BUNDLE_FSCK : 0))))
return 1;
This approach should streamline the code while maintaining flexibility.
The follow-up patch you mentioned below should just work then, it is not
for now because we are touching the unwanted `fetch_pack_fsck_objects`
within `unbundle`.
Xing Xin
>I wonder if something like this is a potential follow-up topic
>somebody may be interested in after the dust settles. That is
>exactly why the name of this bit matters.
>
>
>
> builtin/bundle.c | 6 +++++-
> 1 file changed, 5 insertions(+), 1 deletion(-)
>
>diff --git c/builtin/bundle.c w/builtin/bundle.c
>index d5d41a8f67..eeb5963dcb 100644
>--- c/builtin/bundle.c
>+++ w/builtin/bundle.c
>@@ -194,10 +194,13 @@ static int cmd_bundle_unbundle(int argc, const char **argv, const char *prefix)
> int bundle_fd = -1;
> int ret;
> int progress = isatty(2);
>+ int fsck_objects = 0;
>
> struct option options[] = {
> OPT_BOOL(0, "progress", &progress,
> N_("show progress meter")),
>+ OPT_BOOL(0, "fsck-objects", &fsck_objects,
>+ N_("check the objects in the bundle")),
> OPT_END()
> };
> char *bundle_file;
>@@ -217,7 +220,8 @@ static int cmd_bundle_unbundle(int argc, const char **argv, const char *prefix)
> strvec_pushl(&extra_index_pack_args, "-v", "--progress-title",
> _("Unbundling objects"), NULL);
> ret = !!unbundle(the_repository, &header, bundle_fd,
>- &extra_index_pack_args, 0) ||
>+ &extra_index_pack_args,
>+ fsck_objects ? VERIFY_BUNDLE_FSCK_FOLLOW_FETCH : 0) ||
> list_bundle_refs(&header, argc, argv);
> bundle_header_release(&header);
> cleanup:
^ permalink raw reply [flat|nested] 66+ messages in thread
* Re:Re: [PATCH v6 1/3] bundle-uri: verify oid before writing refs
2024-06-11 19:08 ` Junio C Hamano
@ 2024-06-17 13:53 ` Xing Xin
0 siblings, 0 replies; 66+ messages in thread
From: Xing Xin @ 2024-06-17 13:53 UTC (permalink / raw)
To: Junio C Hamano
Cc: Xing Xin via GitGitGadget, git, Patrick Steinhardt, Karthik Nayak,
Xing Xin
At 2024-06-12 03:08:04, "Junio C Hamano" <gitster@pobox.com> wrote:
>"Xing Xin via GitGitGadget" <gitgitgadget@gmail.com> writes:
>
>> This commit fixes the bug by removing `REF_SKIP_OID_VERIFICATION` flag
>
>"Fix the bug by ...".
>
>> when writing bundle refs. When `refs.c:refs_update_ref` is called to to
>
>"to to"
Ah, always typos. :-)
[snip]
>> test_expect_success 'create bundle' '
>> git init clone-from &&
>> - git -C clone-from checkout -b topic &&
>> - test_commit -C clone-from A &&
>> - test_commit -C clone-from B &&
>> - git -C clone-from bundle create B.bundle topic
>> + (
>> + cd clone-from &&
>> + git checkout -b topic &&
>> +
>> + test_commit A &&
>> + git bundle create A.bundle topic &&
>> +
>> + test_commit B &&
>> + git bundle create B.bundle topic &&
>> +
>> + # Create a bundle with reference pointing to non-existent object.
>> + sed "s/$(git rev-parse A)/$(git rev-parse B)/" <A.bundle >bad-header.bundle
>
>I suspect that this would be terribly unportable. The early part of
>a bundle file may be text and sed may be able to grok but are you
>sure everybody's implementation of sed would not barf (or even
>worse, corrupt) the pack stream data that follows?
>
>The code used in t/lib-bundle.sh:convert_bundle_to_pack() has been
>in use since 8315588b (bundle: fix wrong check of read_header()'s
>return value & add tests, 2007-03-06), so munging the bundle with
>a code similar to it may have a better portability story.
>
>Add something like:
>
> corrupt_bundle_header () {
> sed -e '/^$/q' "$@"
> cat
> }
>
>to t/lib-bundle.sh, which can take an arbitrary sequence of command
>line parameters to drive "sed", and can be used like so:
>
> corrupt_bundle_header \
> -e "s/^$(git rev-parse A) /$(git rev-parse B) /" \
> <A.bndl >B.bndl
>
>perhaps?
Thanks, I never knew sed could be used this way! It's so concise!
However, after applying these code, the added test "t5558.5 clone with
bundle that has bad header" fails in some CI jobs showing that we can
not get expected error. For example CI "linux-musl (alpine)" gives:
+ git clone '--bundle-uri=clone-from/bad-header.bundle' clone-from clone-bad-header
+ git -C clone-from rev-parse B
+ commit_b=d9df4505cb3522088b9e29d6051ac16f1564154a
+ test_grep 'trying to write ref '"'"'refs/bundles/topic'"'"' with nonexistent object d9df4505cb3522088b9e29d6051ac16f1564154a' err
+ eval 'last_arg=${2}'
+ last_arg=err
+ test -f err
+ test 2 -lt 2
+ test 'x!' '=' 'xtrying to write ref '"'"'refs/bundles/topic'"'"' with nonexistent object d9df4505cb3522088b9e29d6051ac16f1564154a'
+ test 'x!' '=' 'xtrying to write ref '"'"'refs/bundles/topic'"'"' with nonexistent object d9df4505cb3522088b9e29d6051ac16f1564154a'
+ grep 'trying to write ref '"'"'refs/bundles/topic'"'"' with nonexistent object d9df4505cb3522088b9e29d6051ac16f1564154a' err
+ echo 'error: '"'"'grep trying to write ref '"'"'refs/bundles/topic'"'"' with nonexistent object d9df4505cb3522088b9e29d6051ac16f1564154a' 'err'"'"' didn'"'"'t find a match in:'
error: 'grep trying to write ref 'refs/bundles/topic' with nonexistent object d9df4505cb3522088b9e29d6051ac16f1564154a err' didn't find a match in:
+ test -s err
+ cat err
Cloning into 'clone-bad-header'...
fatal: pack signature mismatch
error: index-pack died
done.
+ return 1
error: last command exited with $?=1
not ok 5 - clone with bundle that has bad header
+# Create a bundle with reference pointing to non-existent object.
+sed -e "/^$/q" -e "s/$(git rev-parse A) /$(git rev-parse B) /" \
+ <A.bundle >bad-header.bundle &&
+convert_bundle_to_pack \
+ <A.bundle >>bad-header.bundle
More details can be found at https://github.com/blanet/git/actions/runs/9541478191/job/26294731254.
After some digging, I discovered that inside the Alpine container,
`corrupt_bundle_header` is missing the leading "PACK\x00" in the pack
section of the converted bundle. This is likely caused by the
incompatibility of "sed" across different operating systems. I stopped
investigating further due to my unfamiliarity with containers and "sed"
itself. Instead, I found another approach that utilizes the suggested
usage of "sed" and `lib-bundle.sh:convert_bundle_to_pack`.
+# Create a bundle with reference pointing to non-existent object.
+sed -e "/^$/q" -e "s/$(git rev-parse A) /$(git rev-parse B) /" \
+ <A.bundle >bad-header.bundle &&
+convert_bundle_to_pack \
+ <A.bundle >>bad-header.bundle
>> @@ -33,6 +42,16 @@ test_expect_success 'clone with path bundle' '
>> test_cmp expect actual
>> '
>>
>> +test_expect_success 'clone with bundle that has bad header' '
>> + git clone --bundle-uri="clone-from/bad-header.bundle" \
>> + clone-from clone-bad-header 2>err &&
>> + # Write bundle ref fails, but clone can still proceed.
>> + commit_b=$(git -C clone-from rev-parse B) &&
>> + test_grep "trying to write ref '\''refs/bundles/topic'\'' with nonexistent object $commit_b" err &&
>> + git -C clone-bad-header for-each-ref --format="%(refname)" >refs &&
>> + ! grep "refs/bundles/" refs
>> +'
>> +
>
>So this is the test the proposed log message discussed. The
>description gave a false impression that the "broken header" test
>that has not much to do with the bug being fixed was the only added
>test---we probably want to correct that impression.
Adjusted the commit message.
>> @@ -259,6 +278,132 @@ test_expect_success 'clone bundle list (file, any mode, all failures)' '
>> ! grep "refs/bundles/" refs
>> '
>>
>> +#########################################################################
>> +# Clone negotiation related tests begin here
>
>Drop this divider and comment. The HTTP tests you see below has a
>much better reason to be separated like that in order to warn test
>writers (they shouldn't add their random new tests after that point,
>because everything after that one is skipped when HTTPD tests are
>disabled---see the beginning of t/lib-httpd.sh which is included
>after that divider line), but everything here you added is not
>special. Everybody should run these tests.
Thanks for the explanation, dropped the divider in new series.
>> +test_expect_success 'negotiation: bundle with part of wanted commits' '
>> + test_when_finished rm -rf trace*.txt &&
>
>Do not overly depend on glob not matching at this point when you
>establish the when-finished handler (or glob matching the files you
>want to catch and later test not adding anything you would want to
>clean). Quote "rm -f trace*.txt" and drop "r" if you do not absolutely
>need it (and I would imagine you don't, given the .txt suffix).
Yes, the "-r" is not safe and I just need to clean the "*.txt", not involving directories.
>> + GIT_TRACE_PACKET="$(pwd)/trace-packet.txt" \
>> + git clone --no-local --bundle-uri="clone-from/A.bundle" \
>> + clone-from nego-bundle-part &&
>> + git -C nego-bundle-part for-each-ref --format="%(refname)" >refs &&
>> + grep "refs/bundles/" refs >actual &&
>> + cat >expect <<-\EOF &&
>> + refs/bundles/topic
>> + EOF
>
>Hmph, if the expected pattern is only a few lines without any magic,
>
> test_write_lines >expect refs/bundles/topic &&
>
>may be easier to follow.
Applied!
>> + test_cmp expect actual &&
>> + # Ensure that refs/bundles/topic are sent as "have".
>> + grep "clone> have $(git -C clone-from rev-parse A)" trace-packet.txt
>> +'
>
>Using "test_grep" would make it easier to diagnose when test breaks.
>A failing "grep" will be silent (i.e., "I didn't find anything you
>told me to look for"). A failing "test_grep" will tell you "I was
>told to find this, but didn't find any in that".
Thanks, it just accelerated the digging process of the test failure
mentioned above. :)
>> +test_expect_success 'negotiation: bundle with all wanted commits' '
>> + test_when_finished rm -rf trace*.txt &&
>> + GIT_TRACE_PACKET="$(pwd)/trace-packet.txt" \
>> + git clone --no-local --single-branch --branch=topic --no-tags \
>> + --bundle-uri="clone-from/B.bundle" \
>> + clone-from nego-bundle-all &&
>> + git -C nego-bundle-all for-each-ref --format="%(refname)" >refs &&
>> + grep "refs/bundles/" refs >actual &&
>> + cat >expect <<-\EOF &&
>> + refs/bundles/topic
>> + EOF
>> + test_cmp expect actual &&
>> + # We already have all needed commits so no "want" needed.
>> + ! grep "clone> want " trace-packet.txt
>> +'
>> +
>> +test_expect_success 'negotiation: bundle list (no heuristic)' '
>> + test_when_finished rm -f trace*.txt &&
>> + cat >bundle-list <<-EOF &&
>> + [bundle]
>> + version = 1
>> + mode = all
>> +
>> + [bundle "bundle-1"]
>> + uri = file://$(pwd)/clone-from/bundle-1.bundle
>> +
>> + [bundle "bundle-2"]
>> + uri = file://$(pwd)/clone-from/bundle-2.bundle
>> + EOF
>
>OK. This is a good use of here-doc (as opposed to test_write_lines
>I sugested earlier for different purposes). I wondered if these
>$(pwd) and file://$(pwd) are safe (I always get confused by the need
>to sometimes use $PWD to help Windows), but I see them used in what
>Derrick wrote in this file, so they must be fine.
>
>But there may be characters in the leading part of $(pwd) that we do
>not control that needs quoting (like a double quote '"'). The value
>of bundle.*.uri may need to be quoted a bit carefully. This is not
>a new problem this patch introduces, so you do not have to rewrite
>this part of the patch; I'll mark it with #leftoverbits here---the
>idea being somebody else who is too bored can come back, see if it
>is truly broken, and fix them after all dust settles.
>
>Abusing "git config -f bundle-list" might be safer, e.g.
>
> $ git config -f bundle.list bundle.bundle-1.uri \
> 'file:///home/abc"def/t/trash dir/clone-from/b1.bndl'
> $ cat bundle.list
> [bundle "bundle-1"]
> uri = file:///home/abc\"def/t/trash dir/clone-from/b1.bndl
>
>as you do not know what other garbage character is in $(pwd) part.
>
>> + # We already have all needed commits so no "want" needed.
>> + ! grep "clone> want " trace-packet.txt
>
>Just FYI, to negate test_grep, use
>
> test_grep ! "clone > want " trace-packet.txt
>
>not
>
> ! test_grep "clone > want " trace-packet.txt ;# WRONG
Thanks your through review!
Xing Xin
^ permalink raw reply [flat|nested] 66+ messages in thread
* [PATCH v7 0/3] object checking related additions and fixes for bundles in fetches
2024-06-11 12:45 ` [PATCH v6 0/3] object checking related additions and fixes for bundles in fetches blanet via GitGitGadget
` (3 preceding siblings ...)
2024-06-11 13:14 ` [PATCH v6 0/3] object checking related additions and fixes for bundles in fetches Patrick Steinhardt
@ 2024-06-17 13:55 ` blanet via GitGitGadget
2024-06-17 13:55 ` [PATCH v7 1/3] bundle-uri: verify oid before writing refs Xing Xin via GitGitGadget
` (3 more replies)
4 siblings, 4 replies; 66+ messages in thread
From: blanet via GitGitGadget @ 2024-06-17 13:55 UTC (permalink / raw)
To: git; +Cc: Patrick Steinhardt, Karthik Nayak, blanet
While attempting to fix a reference negotiation bug in bundle-uri, we
identified that the fetch process lacks some crucial object validation
checks when processing bundles. The primary issues are:
1. In the bundle-uri scenario, object IDs were not validated before writing
bundle references. This was the root cause of the original negotiation
bug in bundle-uri and could lead to potential repository corruption.
2. The existing "fetch.fsckObjects" and "transfer.fsckObjects"
configurations were not applied when directly fetching bundles or
fetching with bundle-uri enabled. In fact, there were no object
validation supports for unbundle.
The first patch addresses the bundle-uri negotiation issue by removing the
REF_SKIP_OID_VERIFICATION flag when writing bundle references.
Patches 2 through 3 extend verify_bundle_flags for bundle.c:unbundle to add
support for object validation (fsck) in fetch scenarios, mainly following
the suggestions from Junio and Patrick on the mailing list.
Xing Xin (3):
bundle-uri: verify oid before writing refs
fetch-pack: expose fsckObjects configuration logic
unbundle: extend object verification for fetches
bundle-uri.c | 6 +-
bundle.c | 3 +
bundle.h | 1 +
fetch-pack.c | 17 ++--
fetch-pack.h | 5 +
t/t5558-clone-bundle-uri.sh | 181 +++++++++++++++++++++++++++++++++++-
t/t5607-clone-bundle.sh | 33 +++++++
transport.c | 3 +-
8 files changed, 235 insertions(+), 14 deletions(-)
base-commit: b9cfe4845cb2562584837bc0101c0ab76490a239
Published-As: https://github.com/gitgitgadget/git/releases/tag/pr-1730%2Fblanet%2Fxx%2Fbundle-uri-bug-using-bundle-list-v7
Fetch-It-Via: git fetch https://github.com/gitgitgadget/git pr-1730/blanet/xx/bundle-uri-bug-using-bundle-list-v7
Pull-Request: https://github.com/gitgitgadget/git/pull/1730
Range-diff vs v6:
1: e958a3ab20c ! 1: fc9f44fda00 bundle-uri: verify oid before writing refs
@@ Commit message
be found for negotiation because it exists in "incr.pack", which is not
included in `packed_git`.
- This commit fixes the bug by removing `REF_SKIP_OID_VERIFICATION` flag
- when writing bundle refs. When `refs.c:refs_update_ref` is called to to
- write the corresponding bundle refs, it triggers
- `refs.c:ref_transaction_commit`. This, in turn, invokes
- `refs.c:ref_transaction_prepare`, which calls `transaction_prepare` of
- the refs storage backend. For files backend, this function is
- `files-backend.c:files_transaction_prepare`, and for reftable backend,
- it is `reftable-backend.c:reftable_be_transaction_prepare`. Both
- functions eventually call `object.c:parse_object`, which can invoke
+ Fix the bug by removing `REF_SKIP_OID_VERIFICATION` flag when writing
+ bundle refs. When `refs.c:refs_update_ref` is called to write the
+ corresponding bundle refs, it triggers `refs.c:ref_transaction_commit`.
+ This, in turn, invokes `refs.c:ref_transaction_prepare`, which calls
+ `transaction_prepare` of the refs storage backend. For files backend, it
+ is `files-backend.c:files_transaction_prepare`, and for reftable
+ backend, it is `reftable-backend.c:reftable_be_transaction_prepare`.
+ Both functions eventually call `object.c:parse_object`, which can invoke
`packfile.c:reprepare_packed_git` to refresh `packed_git`. This ensures
that bundle refs point to valid objects and that all tips from bundle
refs are correctly parsed during subsequent negotiations.
- A test has been added to demonstrate that bundles with incorrect
- headers, where refs point to non-existent objects, do not result in any
- bundle refs being created in the repository. Additionally, a set of
- negotiation-related tests for fetching with bundle-uri has been
- included.
+ A set of negotiation-related tests for cloning with bundle-uri has been
+ included to demonstrate that downloaded bundles are utilized to
+ accelerate fetching.
+
+ Additionally, another test has been added to show that bundles with
+ incorrect headers, where refs point to non-existent objects, do not
+ result in any bundle refs being created in the repository.
Reviewed-by: Karthik Nayak <karthik.188@gmail.com>
Reviewed-by: Patrick Steinhardt <ps@pks.im>
@@ bundle-uri.c: static int unbundle_from_file(struct repository *r, const char *fi
bundle_header_release(&header);
## t/t5558-clone-bundle-uri.sh ##
+@@
+ test_description='test fetching bundles with --bundle-uri'
+
+ . ./test-lib.sh
++. "$TEST_DIRECTORY"/lib-bundle.sh
+
+ test_expect_success 'fail to clone from non-existent file' '
+ test_when_finished rm -rf test &&
@@ t/t5558-clone-bundle-uri.sh: test_expect_success 'fail to clone from non-bundle file' '
test_expect_success 'create bundle' '
@@ t/t5558-clone-bundle-uri.sh: test_expect_success 'fail to clone from non-bundle
+ git bundle create B.bundle topic &&
+
+ # Create a bundle with reference pointing to non-existent object.
-+ sed "s/$(git rev-parse A)/$(git rev-parse B)/" <A.bundle >bad-header.bundle
++ sed -e "/^$/q" -e "s/$(git rev-parse A) /$(git rev-parse B) /" \
++ <A.bundle >bad-header.bundle &&
++ convert_bundle_to_pack \
++ <A.bundle >>bad-header.bundle
+ )
'
@@ t/t5558-clone-bundle-uri.sh: test_expect_success 'clone with path bundle' '
'
+test_expect_success 'clone with bundle that has bad header' '
++ # Write bundle ref fails, but clone can still proceed.
+ git clone --bundle-uri="clone-from/bad-header.bundle" \
+ clone-from clone-bad-header 2>err &&
-+ # Write bundle ref fails, but clone can still proceed.
+ commit_b=$(git -C clone-from rev-parse B) &&
+ test_grep "trying to write ref '\''refs/bundles/topic'\'' with nonexistent object $commit_b" err &&
+ git -C clone-bad-header for-each-ref --format="%(refname)" >refs &&
@@ t/t5558-clone-bundle-uri.sh: test_expect_success 'clone bundle list (file, any m
! grep "refs/bundles/" refs
'
-+#########################################################################
-+# Clone negotiation related tests begin here
-+
+test_expect_success 'negotiation: bundle with part of wanted commits' '
-+ test_when_finished rm -rf trace*.txt &&
++ test_when_finished "rm -f trace*.txt" &&
+ GIT_TRACE_PACKET="$(pwd)/trace-packet.txt" \
+ git clone --no-local --bundle-uri="clone-from/A.bundle" \
+ clone-from nego-bundle-part &&
+ git -C nego-bundle-part for-each-ref --format="%(refname)" >refs &&
+ grep "refs/bundles/" refs >actual &&
-+ cat >expect <<-\EOF &&
-+ refs/bundles/topic
-+ EOF
++ test_write_lines refs/bundles/topic >expect &&
+ test_cmp expect actual &&
+ # Ensure that refs/bundles/topic are sent as "have".
-+ grep "clone> have $(git -C clone-from rev-parse A)" trace-packet.txt
++ test_grep "clone> have $(git -C clone-from rev-parse A)" trace-packet.txt
+'
+
+test_expect_success 'negotiation: bundle with all wanted commits' '
-+ test_when_finished rm -rf trace*.txt &&
++ test_when_finished "rm -f trace*.txt" &&
+ GIT_TRACE_PACKET="$(pwd)/trace-packet.txt" \
+ git clone --no-local --single-branch --branch=topic --no-tags \
+ --bundle-uri="clone-from/B.bundle" \
+ clone-from nego-bundle-all &&
+ git -C nego-bundle-all for-each-ref --format="%(refname)" >refs &&
+ grep "refs/bundles/" refs >actual &&
-+ cat >expect <<-\EOF &&
-+ refs/bundles/topic
-+ EOF
++ test_write_lines refs/bundles/topic >expect &&
+ test_cmp expect actual &&
+ # We already have all needed commits so no "want" needed.
+ ! grep "clone> want " trace-packet.txt
+'
+
+test_expect_success 'negotiation: bundle list (no heuristic)' '
-+ test_when_finished rm -f trace*.txt &&
++ test_when_finished "rm -f trace*.txt" &&
+ cat >bundle-list <<-EOF &&
+ [bundle]
+ version = 1
@@ t/t5558-clone-bundle-uri.sh: test_expect_success 'clone bundle list (file, any m
+ refs/bundles/left
+ EOF
+ test_cmp expect actual &&
-+ grep "clone> have $(git -C nego-bundle-list-no-heuristic rev-parse refs/bundles/left)" trace-packet.txt
++ test_grep "clone> have $(git -C nego-bundle-list-no-heuristic rev-parse refs/bundles/left)" trace-packet.txt
+'
+
+test_expect_success 'negotiation: bundle list (creationToken)' '
-+ test_when_finished rm -f trace*.txt &&
++ test_when_finished "rm -f trace*.txt" &&
+ cat >bundle-list <<-EOF &&
+ [bundle]
+ version = 1
@@ t/t5558-clone-bundle-uri.sh: test_expect_success 'clone bundle list (file, any m
+ refs/bundles/left
+ EOF
+ test_cmp expect actual &&
-+ grep "clone> have $(git -C nego-bundle-list-heuristic rev-parse refs/bundles/left)" trace-packet.txt
++ test_grep "clone> have $(git -C nego-bundle-list-heuristic rev-parse refs/bundles/left)" trace-packet.txt
+'
+
+test_expect_success 'negotiation: bundle list with all wanted commits' '
-+ test_when_finished rm -f trace*.txt &&
++ test_when_finished "rm -f trace*.txt" &&
+ cat >bundle-list <<-EOF &&
+ [bundle]
+ version = 1
2: d21c236b8de = 2: 3dc0d9dd22f fetch-pack: expose fsckObjects configuration logic
3: 53395e8c08a ! 3: 2f15099bbb9 unbundle: support object verification for fetches
@@ Metadata
Author: Xing Xin <xingxin.xx@bytedance.com>
## Commit message ##
- unbundle: support object verification for fetches
+ unbundle: extend object verification for fetches
- This commit extends object verification support for fetches in
- `bundle.c:unbundle` by adding the `VERIFY_BUNDLE_FSCK_FOLLOW_FETCH`
- option to `verify_bundle_flags`. When this option is enabled,
- `bundle.c:unbundle` invokes `fetch-pack.c:fetch_pack_fsck_objects` to
- determine whether to append the "--fsck-objects" flag to
- "git-index-pack".
+ The existing fetch.fsckObjects and transfer.fsckObjects configurations
+ were not fully applied to bundle-involved fetches, including direct
+ bundle fetches and bundle-uri enabled fetches. Furthermore, there was no
+ object verification support for unbundle.
- `VERIFY_BUNDLE_FSCK_FOLLOW_FETCH` is now passed to `unbundle` in the
- fetching process, including:
+ This commit extends object verification support in `bundle.c:unbundle`
+ by adding the `VERIFY_BUNDLE_FSCK` option to `verify_bundle_flags`. When
+ this option is enabled, we append the `--fsck-objects` flag to
+ `git-index-pack`.
+
+ The `VERIFY_BUNDLE_FSCK` option is now used by bundle-involved fetches,
+ where we use `fetch-pack.c:fetch_pack_fsck_objects` to determine whether
+ to enable this option for `bundle.c:unbundle`, specifically in:
- `transport.c:fetch_refs_from_bundle` for direct bundle fetches.
- `bundle-uri.c:unbundle_from_file` for bundle-uri enabled fetches.
This addition ensures a consistent logic for object verification during
- fetch operations. Tests have been added to confirm functionality in the
- scenarios mentioned above.
+ fetches. Tests have been added to confirm functionality in the scenarios
+ mentioned above.
Reviewed-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Xing Xin <xingxin.xx@bytedance.com>
## bundle-uri.c ##
+@@
+ #include "hashmap.h"
+ #include "pkt-line.h"
+ #include "config.h"
++#include "fetch-pack.h"
+ #include "remote.h"
+
+ static struct {
@@ bundle-uri.c: static int unbundle_from_file(struct repository *r, const char *file)
* the prerequisite commits.
*/
if ((result = unbundle(r, &header, bundle_fd, NULL,
- VERIFY_BUNDLE_QUIET)))
-+ VERIFY_BUNDLE_QUIET | VERIFY_BUNDLE_FSCK_FOLLOW_FETCH)))
++ VERIFY_BUNDLE_QUIET | (fetch_pack_fsck_objects() ? VERIFY_BUNDLE_FSCK : 0))))
return 1;
/*
## bundle.c ##
-@@
- #include "list-objects-filter-options.h"
- #include "connected.h"
- #include "write-or-die.h"
-+#include "fetch-pack.h"
-
- static const char v2_bundle_signature[] = "# v2 git bundle\n";
- static const char v3_bundle_signature[] = "# v3 git bundle\n";
@@ bundle.c: int unbundle(struct repository *r, struct bundle_header *header,
if (header->filter.choice)
strvec_push(&ip.args, "--promisor=from-bundle");
-+ if (flags & VERIFY_BUNDLE_FSCK_FOLLOW_FETCH)
-+ if (fetch_pack_fsck_objects())
-+ strvec_push(&ip.args, "--fsck-objects");
++ if (flags & VERIFY_BUNDLE_FSCK)
++ strvec_push(&ip.args, "--fsck-objects");
+
if (extra_index_pack_args) {
strvec_pushv(&ip.args, extra_index_pack_args->v);
@@ bundle.h: int create_bundle(struct repository *r, const char *path,
enum verify_bundle_flags {
VERIFY_BUNDLE_VERBOSE = (1 << 0),
VERIFY_BUNDLE_QUIET = (1 << 1),
-+ VERIFY_BUNDLE_FSCK_FOLLOW_FETCH = (1 << 2),
++ VERIFY_BUNDLE_FSCK = (1 << 2),
};
int verify_bundle(struct repository *r, struct bundle_header *header,
## t/t5558-clone-bundle-uri.sh ##
@@ t/t5558-clone-bundle-uri.sh: test_expect_success 'create bundle' '
- git bundle create B.bundle topic &&
-
- # Create a bundle with reference pointing to non-existent object.
-- sed "s/$(git rev-parse A)/$(git rev-parse B)/" <A.bundle >bad-header.bundle
-+ sed "s/$(git rev-parse A)/$(git rev-parse B)/" <A.bundle >bad-header.bundle &&
+ sed -e "/^$/q" -e "s/$(git rev-parse A) /$(git rev-parse B) /" \
+ <A.bundle >bad-header.bundle &&
+ convert_bundle_to_pack \
+- <A.bundle >>bad-header.bundle
++ <A.bundle >>bad-header.bundle &&
+
+ cat >data <<-EOF &&
+ tree $(git rev-parse HEAD^{tree})
@@ t/t5558-clone-bundle-uri.sh: test_expect_success 'clone with bundle that has bad
+ clone-from clone-bad-object-no-fsck &&
+ git -C clone-bad-object-no-fsck for-each-ref --format="%(refname)" >refs &&
+ grep "refs/bundles/" refs >actual &&
-+ cat >expect <<-\EOF &&
-+ refs/bundles/bad
-+ EOF
++ test_write_lines refs/bundles/bad >expect &&
+ test_cmp expect actual &&
+
+ # Unbundle fails with fsckObjects set true, but clone can still proceed.
@@ transport.c: static int fetch_refs_from_bundle(struct transport *transport,
get_refs_from_bundle_inner(transport);
ret = unbundle(the_repository, &data->header, data->fd,
- &extra_index_pack_args, 0);
-+ &extra_index_pack_args, VERIFY_BUNDLE_FSCK_FOLLOW_FETCH);
++ &extra_index_pack_args,
++ fetch_pack_fsck_objects() ? VERIFY_BUNDLE_FSCK : 0);
transport->hash_algo = data->header.hash_algo;
return ret;
}
--
gitgitgadget
^ permalink raw reply [flat|nested] 66+ messages in thread
* [PATCH v7 1/3] bundle-uri: verify oid before writing refs
2024-06-17 13:55 ` [PATCH v7 " blanet via GitGitGadget
@ 2024-06-17 13:55 ` Xing Xin via GitGitGadget
2024-06-18 17:37 ` Junio C Hamano
2024-06-17 13:55 ` [PATCH v7 2/3] fetch-pack: expose fsckObjects configuration logic Xing Xin via GitGitGadget
` (2 subsequent siblings)
3 siblings, 1 reply; 66+ messages in thread
From: Xing Xin via GitGitGadget @ 2024-06-17 13:55 UTC (permalink / raw)
To: git; +Cc: Patrick Steinhardt, Karthik Nayak, blanet, Xing Xin
From: Xing Xin <xingxin.xx@bytedance.com>
When using the bundle-uri mechanism with a bundle list containing
multiple interrelated bundles, we encountered a bug where tips from
downloaded bundles were not discovered, thus resulting in rather slow
clones. This was particularly problematic when employing the
"creationTokens" heuristic.
To reproduce this issue, consider a repository with a single branch
"main" pointing to commit "A". Firstly, create a base bundle with:
git bundle create base.bundle main
Then, add a new commit "B" on top of "A", and create an incremental
bundle for "main":
git bundle create incr.bundle A..main
Now, generate a bundle list with the following content:
[bundle]
version = 1
mode = all
heuristic = creationToken
[bundle "base"]
uri = base.bundle
creationToken = 1
[bundle "incr"]
uri = incr.bundle
creationToken = 2
A fresh clone with the bundle list above should result in a reference
"refs/bundles/main" pointing to "B" in the new repository. However, git
would still download everything from the server, as if it had fetched
nothing locally.
So why the "refs/bundles/main" is not discovered? After some digging I
found that:
1. Bundles in bundle list are downloaded to local files via
`bundle-uri.c:download_bundle_list` or via
`bundle-uri.c:fetch_bundles_by_token` for the "creationToken"
heuristic.
2. Each bundle is unbundled via `bundle-uri.c:unbundle_from_file`, which
is called by `bundle-uri.c:unbundle_all_bundles` or called within
`bundle-uri.c:fetch_bundles_by_token` for the "creationToken"
heuristic.
3. To get all prerequisites of the bundle, the bundle header is read
inside `bundle-uri.c:unbundle_from_file` to by calling
`bundle.c:read_bundle_header`.
4. Then it calls `bundle.c:unbundle`, which calls
`bundle.c:verify_bundle` to ensure the repository contains all the
prerequisites.
5. `bundle.c:verify_bundle` calls `parse_object`, which eventually
invokes `packfile.c:prepare_packed_git` or
`packfile.c:reprepare_packed_git`, filling
`raw_object_store->packed_git` and setting `packed_git_initialized`.
6. If `bundle.c:unbundle` succeeds, it writes refs via
`refs.c:refs_update_ref` with `REF_SKIP_OID_VERIFICATION` set. Here
bundle refs which can target arbitrary objects are written to the
repository.
7. Finally, in `fetch-pack.c:do_fetch_pack_v2`, the functions
`fetch-pack.c:mark_complete_and_common_ref` and
`fetch-pack.c:mark_tips` are called with `OBJECT_INFO_QUICK` set to
find local tips for negotiation. The `OBJECT_INFO_QUICK` flag
prevents `packfile.c:reprepare_packed_git` from being called,
resulting in failures to parse OIDs that reside only in the latest
bundle.
In the example above, when unbunding "incr.bundle", "base.pack" is added
to `packed_git` due to prerequisites verification. However, "B" cannot
be found for negotiation because it exists in "incr.pack", which is not
included in `packed_git`.
Fix the bug by removing `REF_SKIP_OID_VERIFICATION` flag when writing
bundle refs. When `refs.c:refs_update_ref` is called to write the
corresponding bundle refs, it triggers `refs.c:ref_transaction_commit`.
This, in turn, invokes `refs.c:ref_transaction_prepare`, which calls
`transaction_prepare` of the refs storage backend. For files backend, it
is `files-backend.c:files_transaction_prepare`, and for reftable
backend, it is `reftable-backend.c:reftable_be_transaction_prepare`.
Both functions eventually call `object.c:parse_object`, which can invoke
`packfile.c:reprepare_packed_git` to refresh `packed_git`. This ensures
that bundle refs point to valid objects and that all tips from bundle
refs are correctly parsed during subsequent negotiations.
A set of negotiation-related tests for cloning with bundle-uri has been
included to demonstrate that downloaded bundles are utilized to
accelerate fetching.
Additionally, another test has been added to show that bundles with
incorrect headers, where refs point to non-existent objects, do not
result in any bundle refs being created in the repository.
Reviewed-by: Karthik Nayak <karthik.188@gmail.com>
Reviewed-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Xing Xin <xingxin.xx@bytedance.com>
---
bundle-uri.c | 3 +-
t/t5558-clone-bundle-uri.sh | 150 +++++++++++++++++++++++++++++++++++-
2 files changed, 147 insertions(+), 6 deletions(-)
diff --git a/bundle-uri.c b/bundle-uri.c
index 91b3319a5c1..65666a11d9c 100644
--- a/bundle-uri.c
+++ b/bundle-uri.c
@@ -400,8 +400,7 @@ static int unbundle_from_file(struct repository *r, const char *file)
refs_update_ref(get_main_ref_store(the_repository),
"fetched bundle", bundle_ref.buf, oid,
has_old ? &old_oid : NULL,
- REF_SKIP_OID_VERIFICATION,
- UPDATE_REFS_MSG_ON_ERR);
+ 0, UPDATE_REFS_MSG_ON_ERR);
}
bundle_header_release(&header);
diff --git a/t/t5558-clone-bundle-uri.sh b/t/t5558-clone-bundle-uri.sh
index 1ca5f745e73..2dcdd238a90 100755
--- a/t/t5558-clone-bundle-uri.sh
+++ b/t/t5558-clone-bundle-uri.sh
@@ -3,6 +3,7 @@
test_description='test fetching bundles with --bundle-uri'
. ./test-lib.sh
+. "$TEST_DIRECTORY"/lib-bundle.sh
test_expect_success 'fail to clone from non-existent file' '
test_when_finished rm -rf test &&
@@ -19,10 +20,22 @@ test_expect_success 'fail to clone from non-bundle file' '
test_expect_success 'create bundle' '
git init clone-from &&
- git -C clone-from checkout -b topic &&
- test_commit -C clone-from A &&
- test_commit -C clone-from B &&
- git -C clone-from bundle create B.bundle topic
+ (
+ cd clone-from &&
+ git checkout -b topic &&
+
+ test_commit A &&
+ git bundle create A.bundle topic &&
+
+ test_commit B &&
+ git bundle create B.bundle topic &&
+
+ # Create a bundle with reference pointing to non-existent object.
+ sed -e "/^$/q" -e "s/$(git rev-parse A) /$(git rev-parse B) /" \
+ <A.bundle >bad-header.bundle &&
+ convert_bundle_to_pack \
+ <A.bundle >>bad-header.bundle
+ )
'
test_expect_success 'clone with path bundle' '
@@ -33,6 +46,16 @@ test_expect_success 'clone with path bundle' '
test_cmp expect actual
'
+test_expect_success 'clone with bundle that has bad header' '
+ # Write bundle ref fails, but clone can still proceed.
+ git clone --bundle-uri="clone-from/bad-header.bundle" \
+ clone-from clone-bad-header 2>err &&
+ commit_b=$(git -C clone-from rev-parse B) &&
+ test_grep "trying to write ref '\''refs/bundles/topic'\'' with nonexistent object $commit_b" err &&
+ git -C clone-bad-header for-each-ref --format="%(refname)" >refs &&
+ ! grep "refs/bundles/" refs
+'
+
test_expect_success 'clone with path bundle and non-default hash' '
test_when_finished "rm -rf clone-path-non-default-hash" &&
GIT_DEFAULT_HASH=sha256 git clone --bundle-uri="clone-from/B.bundle" \
@@ -259,6 +282,125 @@ test_expect_success 'clone bundle list (file, any mode, all failures)' '
! grep "refs/bundles/" refs
'
+test_expect_success 'negotiation: bundle with part of wanted commits' '
+ test_when_finished "rm -f trace*.txt" &&
+ GIT_TRACE_PACKET="$(pwd)/trace-packet.txt" \
+ git clone --no-local --bundle-uri="clone-from/A.bundle" \
+ clone-from nego-bundle-part &&
+ git -C nego-bundle-part for-each-ref --format="%(refname)" >refs &&
+ grep "refs/bundles/" refs >actual &&
+ test_write_lines refs/bundles/topic >expect &&
+ test_cmp expect actual &&
+ # Ensure that refs/bundles/topic are sent as "have".
+ test_grep "clone> have $(git -C clone-from rev-parse A)" trace-packet.txt
+'
+
+test_expect_success 'negotiation: bundle with all wanted commits' '
+ test_when_finished "rm -f trace*.txt" &&
+ GIT_TRACE_PACKET="$(pwd)/trace-packet.txt" \
+ git clone --no-local --single-branch --branch=topic --no-tags \
+ --bundle-uri="clone-from/B.bundle" \
+ clone-from nego-bundle-all &&
+ git -C nego-bundle-all for-each-ref --format="%(refname)" >refs &&
+ grep "refs/bundles/" refs >actual &&
+ test_write_lines refs/bundles/topic >expect &&
+ test_cmp expect actual &&
+ # We already have all needed commits so no "want" needed.
+ ! grep "clone> want " trace-packet.txt
+'
+
+test_expect_success 'negotiation: bundle list (no heuristic)' '
+ test_when_finished "rm -f trace*.txt" &&
+ cat >bundle-list <<-EOF &&
+ [bundle]
+ version = 1
+ mode = all
+
+ [bundle "bundle-1"]
+ uri = file://$(pwd)/clone-from/bundle-1.bundle
+
+ [bundle "bundle-2"]
+ uri = file://$(pwd)/clone-from/bundle-2.bundle
+ EOF
+
+ GIT_TRACE_PACKET="$(pwd)/trace-packet.txt" \
+ git clone --no-local --bundle-uri="file://$(pwd)/bundle-list" \
+ clone-from nego-bundle-list-no-heuristic &&
+
+ git -C nego-bundle-list-no-heuristic for-each-ref --format="%(refname)" >refs &&
+ grep "refs/bundles/" refs >actual &&
+ cat >expect <<-\EOF &&
+ refs/bundles/base
+ refs/bundles/left
+ EOF
+ test_cmp expect actual &&
+ test_grep "clone> have $(git -C nego-bundle-list-no-heuristic rev-parse refs/bundles/left)" trace-packet.txt
+'
+
+test_expect_success 'negotiation: bundle list (creationToken)' '
+ test_when_finished "rm -f trace*.txt" &&
+ cat >bundle-list <<-EOF &&
+ [bundle]
+ version = 1
+ mode = all
+ heuristic = creationToken
+
+ [bundle "bundle-1"]
+ uri = file://$(pwd)/clone-from/bundle-1.bundle
+ creationToken = 1
+
+ [bundle "bundle-2"]
+ uri = file://$(pwd)/clone-from/bundle-2.bundle
+ creationToken = 2
+ EOF
+
+ GIT_TRACE_PACKET="$(pwd)/trace-packet.txt" \
+ git clone --no-local --bundle-uri="file://$(pwd)/bundle-list" \
+ clone-from nego-bundle-list-heuristic &&
+
+ git -C nego-bundle-list-heuristic for-each-ref --format="%(refname)" >refs &&
+ grep "refs/bundles/" refs >actual &&
+ cat >expect <<-\EOF &&
+ refs/bundles/base
+ refs/bundles/left
+ EOF
+ test_cmp expect actual &&
+ test_grep "clone> have $(git -C nego-bundle-list-heuristic rev-parse refs/bundles/left)" trace-packet.txt
+'
+
+test_expect_success 'negotiation: bundle list with all wanted commits' '
+ test_when_finished "rm -f trace*.txt" &&
+ cat >bundle-list <<-EOF &&
+ [bundle]
+ version = 1
+ mode = all
+ heuristic = creationToken
+
+ [bundle "bundle-1"]
+ uri = file://$(pwd)/clone-from/bundle-1.bundle
+ creationToken = 1
+
+ [bundle "bundle-2"]
+ uri = file://$(pwd)/clone-from/bundle-2.bundle
+ creationToken = 2
+ EOF
+
+ GIT_TRACE_PACKET="$(pwd)/trace-packet.txt" \
+ git clone --no-local --single-branch --branch=left --no-tags \
+ --bundle-uri="file://$(pwd)/bundle-list" \
+ clone-from nego-bundle-list-all &&
+
+ git -C nego-bundle-list-all for-each-ref --format="%(refname)" >refs &&
+ grep "refs/bundles/" refs >actual &&
+ cat >expect <<-\EOF &&
+ refs/bundles/base
+ refs/bundles/left
+ EOF
+ test_cmp expect actual &&
+ # We already have all needed commits so no "want" needed.
+ ! grep "clone> want " trace-packet.txt
+'
+
#########################################################################
# HTTP tests begin here
--
gitgitgadget
^ permalink raw reply related [flat|nested] 66+ messages in thread
* [PATCH v7 2/3] fetch-pack: expose fsckObjects configuration logic
2024-06-17 13:55 ` [PATCH v7 " blanet via GitGitGadget
2024-06-17 13:55 ` [PATCH v7 1/3] bundle-uri: verify oid before writing refs Xing Xin via GitGitGadget
@ 2024-06-17 13:55 ` Xing Xin via GitGitGadget
2024-06-17 13:55 ` [PATCH v7 3/3] unbundle: extend object verification for fetches Xing Xin via GitGitGadget
2024-06-19 4:07 ` [PATCH v8 0/3] object checking related additions and fixes for bundles in fetches blanet via GitGitGadget
3 siblings, 0 replies; 66+ messages in thread
From: Xing Xin via GitGitGadget @ 2024-06-17 13:55 UTC (permalink / raw)
To: git; +Cc: Patrick Steinhardt, Karthik Nayak, blanet, Xing Xin
From: Xing Xin <xingxin.xx@bytedance.com>
Currently, we can use "transfer.fsckObjects" and the more specific
"fetch.fsckObjects" to control checks for broken objects in received
packs during fetches. However, these configurations were only
acknowledged by `fetch-pack.c:get_pack` and did not take effect in
direct bundle fetches and fetches with _bundle-uri_ enabled.
This commit exposes the fetch-then-transfer configuration logic by
adding a new function `fetch_pack_fsck_objects` in fetch-pack.h. This
new function is used to replace the assignment for `fsck_objects` in
`fetch-pack.c:get_pack`. In the next commit, it will also be used by
`bundle.c:unbundle` to better fit fetching scenarios.
Helped-by: Junio C Hamano <gitster@pobox.com>
Helped-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Xing Xin <xingxin.xx@bytedance.com>
---
fetch-pack.c | 17 +++++++++++------
fetch-pack.h | 5 +++++
2 files changed, 16 insertions(+), 6 deletions(-)
diff --git a/fetch-pack.c b/fetch-pack.c
index 7d2aef21add..3acff2baf09 100644
--- a/fetch-pack.c
+++ b/fetch-pack.c
@@ -954,12 +954,7 @@ static int get_pack(struct fetch_pack_args *args,
strvec_push(&cmd.args, alternate_shallow_file);
}
- if (fetch_fsck_objects >= 0
- ? fetch_fsck_objects
- : transfer_fsck_objects >= 0
- ? transfer_fsck_objects
- : 0)
- fsck_objects = 1;
+ fsck_objects = fetch_pack_fsck_objects();
if (do_keep || args->from_promisor || index_pack_args || fsck_objects) {
if (pack_lockfiles || fsck_objects)
@@ -2046,6 +2041,16 @@ static const struct object_id *iterate_ref_map(void *cb_data)
return &ref->old_oid;
}
+int fetch_pack_fsck_objects(void)
+{
+ fetch_pack_setup();
+ if (fetch_fsck_objects >= 0)
+ return fetch_fsck_objects;
+ if (transfer_fsck_objects >= 0)
+ return transfer_fsck_objects;
+ return 0;
+}
+
struct ref *fetch_pack(struct fetch_pack_args *args,
int fd[],
const struct ref *ref,
diff --git a/fetch-pack.h b/fetch-pack.h
index 6775d265175..b5c579cdae2 100644
--- a/fetch-pack.h
+++ b/fetch-pack.h
@@ -101,4 +101,9 @@ void negotiate_using_fetch(const struct oid_array *negotiation_tips,
*/
int report_unmatched_refs(struct ref **sought, int nr_sought);
+/*
+ * Return true if checks for broken objects in received pack are required.
+ */
+int fetch_pack_fsck_objects(void);
+
#endif
--
gitgitgadget
^ permalink raw reply related [flat|nested] 66+ messages in thread
* [PATCH v7 3/3] unbundle: extend object verification for fetches
2024-06-17 13:55 ` [PATCH v7 " blanet via GitGitGadget
2024-06-17 13:55 ` [PATCH v7 1/3] bundle-uri: verify oid before writing refs Xing Xin via GitGitGadget
2024-06-17 13:55 ` [PATCH v7 2/3] fetch-pack: expose fsckObjects configuration logic Xing Xin via GitGitGadget
@ 2024-06-17 13:55 ` Xing Xin via GitGitGadget
2024-06-19 4:07 ` [PATCH v8 0/3] object checking related additions and fixes for bundles in fetches blanet via GitGitGadget
3 siblings, 0 replies; 66+ messages in thread
From: Xing Xin via GitGitGadget @ 2024-06-17 13:55 UTC (permalink / raw)
To: git; +Cc: Patrick Steinhardt, Karthik Nayak, blanet, Xing Xin
From: Xing Xin <xingxin.xx@bytedance.com>
The existing fetch.fsckObjects and transfer.fsckObjects configurations
were not fully applied to bundle-involved fetches, including direct
bundle fetches and bundle-uri enabled fetches. Furthermore, there was no
object verification support for unbundle.
This commit extends object verification support in `bundle.c:unbundle`
by adding the `VERIFY_BUNDLE_FSCK` option to `verify_bundle_flags`. When
this option is enabled, we append the `--fsck-objects` flag to
`git-index-pack`.
The `VERIFY_BUNDLE_FSCK` option is now used by bundle-involved fetches,
where we use `fetch-pack.c:fetch_pack_fsck_objects` to determine whether
to enable this option for `bundle.c:unbundle`, specifically in:
- `transport.c:fetch_refs_from_bundle` for direct bundle fetches.
- `bundle-uri.c:unbundle_from_file` for bundle-uri enabled fetches.
This addition ensures a consistent logic for object verification during
fetches. Tests have been added to confirm functionality in the scenarios
mentioned above.
Reviewed-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Xing Xin <xingxin.xx@bytedance.com>
---
bundle-uri.c | 3 ++-
bundle.c | 3 +++
bundle.h | 1 +
t/t5558-clone-bundle-uri.sh | 33 ++++++++++++++++++++++++++++++++-
t/t5607-clone-bundle.sh | 33 +++++++++++++++++++++++++++++++++
transport.c | 3 ++-
6 files changed, 73 insertions(+), 3 deletions(-)
diff --git a/bundle-uri.c b/bundle-uri.c
index 65666a11d9c..ed9b49fdbc1 100644
--- a/bundle-uri.c
+++ b/bundle-uri.c
@@ -9,6 +9,7 @@
#include "hashmap.h"
#include "pkt-line.h"
#include "config.h"
+#include "fetch-pack.h"
#include "remote.h"
static struct {
@@ -373,7 +374,7 @@ static int unbundle_from_file(struct repository *r, const char *file)
* the prerequisite commits.
*/
if ((result = unbundle(r, &header, bundle_fd, NULL,
- VERIFY_BUNDLE_QUIET)))
+ VERIFY_BUNDLE_QUIET | (fetch_pack_fsck_objects() ? VERIFY_BUNDLE_FSCK : 0))))
return 1;
/*
diff --git a/bundle.c b/bundle.c
index 95367c2d0a0..f124a2a5626 100644
--- a/bundle.c
+++ b/bundle.c
@@ -625,6 +625,9 @@ int unbundle(struct repository *r, struct bundle_header *header,
if (header->filter.choice)
strvec_push(&ip.args, "--promisor=from-bundle");
+ if (flags & VERIFY_BUNDLE_FSCK)
+ strvec_push(&ip.args, "--fsck-objects");
+
if (extra_index_pack_args) {
strvec_pushv(&ip.args, extra_index_pack_args->v);
strvec_clear(extra_index_pack_args);
diff --git a/bundle.h b/bundle.h
index 021adbdcbb3..5ccc9a061a4 100644
--- a/bundle.h
+++ b/bundle.h
@@ -33,6 +33,7 @@ int create_bundle(struct repository *r, const char *path,
enum verify_bundle_flags {
VERIFY_BUNDLE_VERBOSE = (1 << 0),
VERIFY_BUNDLE_QUIET = (1 << 1),
+ VERIFY_BUNDLE_FSCK = (1 << 2),
};
int verify_bundle(struct repository *r, struct bundle_header *header,
diff --git a/t/t5558-clone-bundle-uri.sh b/t/t5558-clone-bundle-uri.sh
index 2dcdd238a90..38a25d08d0a 100755
--- a/t/t5558-clone-bundle-uri.sh
+++ b/t/t5558-clone-bundle-uri.sh
@@ -34,7 +34,21 @@ test_expect_success 'create bundle' '
sed -e "/^$/q" -e "s/$(git rev-parse A) /$(git rev-parse B) /" \
<A.bundle >bad-header.bundle &&
convert_bundle_to_pack \
- <A.bundle >>bad-header.bundle
+ <A.bundle >>bad-header.bundle &&
+
+ cat >data <<-EOF &&
+ tree $(git rev-parse HEAD^{tree})
+ parent $(git rev-parse HEAD)
+ author A U Thor
+ committer A U Thor
+
+ commit: this is a commit with bad emails
+
+ EOF
+ git hash-object --literally -t commit -w --stdin <data >commit &&
+ git branch bad $(cat commit) &&
+ git bundle create bad-object.bundle bad &&
+ git update-ref -d refs/heads/bad
)
'
@@ -56,6 +70,23 @@ test_expect_success 'clone with bundle that has bad header' '
! grep "refs/bundles/" refs
'
+test_expect_success 'clone with bundle that has bad object' '
+ # Unbundle succeeds if no fsckObjects confugured.
+ git clone --bundle-uri="clone-from/bad-object.bundle" \
+ clone-from clone-bad-object-no-fsck &&
+ git -C clone-bad-object-no-fsck for-each-ref --format="%(refname)" >refs &&
+ grep "refs/bundles/" refs >actual &&
+ test_write_lines refs/bundles/bad >expect &&
+ test_cmp expect actual &&
+
+ # Unbundle fails with fsckObjects set true, but clone can still proceed.
+ git -c fetch.fsckObjects=true clone --bundle-uri="clone-from/bad-object.bundle" \
+ clone-from clone-bad-object-fsck 2>err &&
+ test_grep "missingEmail" err &&
+ git -C clone-bad-object-fsck for-each-ref --format="%(refname)" >refs &&
+ ! grep "refs/bundles/" refs
+'
+
test_expect_success 'clone with path bundle and non-default hash' '
test_when_finished "rm -rf clone-path-non-default-hash" &&
GIT_DEFAULT_HASH=sha256 git clone --bundle-uri="clone-from/B.bundle" \
diff --git a/t/t5607-clone-bundle.sh b/t/t5607-clone-bundle.sh
index 0d1e92d9963..5182efc0b45 100755
--- a/t/t5607-clone-bundle.sh
+++ b/t/t5607-clone-bundle.sh
@@ -138,6 +138,39 @@ test_expect_success 'fetch SHA-1 from bundle' '
git fetch --no-tags foo/tip.bundle "$(cat hash)"
'
+test_expect_success 'clone bundle with different fsckObjects configurations' '
+ test_create_repo bundle-fsck &&
+ (
+ cd bundle-fsck &&
+ test_commit first &&
+ cat >data <<-EOF &&
+ tree $(git rev-parse HEAD^{tree})
+ parent $(git rev-parse HEAD)
+ author A U Thor
+ committer A U Thor
+
+ commit: this is a commit with bad emails
+
+ EOF
+ git hash-object --literally -t commit -w --stdin <data >commit &&
+ git branch bad $(cat commit) &&
+ git bundle create bad.bundle bad
+ ) &&
+
+ git clone bundle-fsck/bad.bundle bundle-no-fsck &&
+
+ git -c fetch.fsckObjects=false -c transfer.fsckObjects=true \
+ clone bundle-fsck/bad.bundle bundle-fetch-no-fsck &&
+
+ test_must_fail git -c fetch.fsckObjects=true \
+ clone bundle-fsck/bad.bundle bundle-fetch-fsck 2>err &&
+ test_grep "missingEmail" err &&
+
+ test_must_fail git -c transfer.fsckObjects=true \
+ clone bundle-fsck/bad.bundle bundle-transfer-fsck 2>err &&
+ test_grep "missingEmail" err
+'
+
test_expect_success 'git bundle uses expected default format' '
git bundle create bundle HEAD^.. &&
cat >expect <<-EOF &&
diff --git a/transport.c b/transport.c
index 0ad04b77fd2..a93c4171f7b 100644
--- a/transport.c
+++ b/transport.c
@@ -184,7 +184,8 @@ static int fetch_refs_from_bundle(struct transport *transport,
if (!data->get_refs_from_bundle_called)
get_refs_from_bundle_inner(transport);
ret = unbundle(the_repository, &data->header, data->fd,
- &extra_index_pack_args, 0);
+ &extra_index_pack_args,
+ fetch_pack_fsck_objects() ? VERIFY_BUNDLE_FSCK : 0);
transport->hash_algo = data->header.hash_algo;
return ret;
}
--
gitgitgadget
^ permalink raw reply related [flat|nested] 66+ messages in thread
* Re: [PATCH v7 1/3] bundle-uri: verify oid before writing refs
2024-06-17 13:55 ` [PATCH v7 1/3] bundle-uri: verify oid before writing refs Xing Xin via GitGitGadget
@ 2024-06-18 17:37 ` Junio C Hamano
2024-06-19 6:30 ` Xing Xin
0 siblings, 1 reply; 66+ messages in thread
From: Junio C Hamano @ 2024-06-18 17:37 UTC (permalink / raw)
To: Xing Xin via GitGitGadget
Cc: git, Patrick Steinhardt, Karthik Nayak, blanet, Xing Xin
"Xing Xin via GitGitGadget" <gitgitgadget@gmail.com> writes:
> @@ -33,6 +46,16 @@ test_expect_success 'clone with path bundle' '
> test_cmp expect actual
> '
>
> +test_expect_success 'clone with bundle that has bad header' '
> + # Write bundle ref fails, but clone can still proceed.
> + git clone --bundle-uri="clone-from/bad-header.bundle" \
> + clone-from clone-bad-header 2>err &&
> + commit_b=$(git -C clone-from rev-parse B) &&
> + test_grep "trying to write ref '\''refs/bundles/topic'\'' with nonexistent object $commit_b" err &&
> + git -C clone-bad-header for-each-ref --format="%(refname)" >refs &&
> + ! grep "refs/bundles/" refs
Why not "test_grep !" here? There are other uses of bare grep in
the newly added lines, but I won't repeat them here; the same
comment applies to them.
> + test_write_lines refs/bundles/topic >expect &&
> + test_cmp expect actual &&
> + # Ensure that refs/bundles/topic are sent as "have".
> + test_grep "clone> have $(git -C clone-from rev-parse A)" trace-packet.txt
> +'
Can this rev-parse fail (the failure would be hidden from the test)?
If so,
...
test_cmp expect actual &&
# Ensure that refs/bundles/topic is sent as "have"
tip=$(git -C clone-from rev-parse A) &&
test_grep "clone> have $tip" trace-packet.txt
would catch such a failure. You are doing so in the previous test
in the hunk starting at 33/46 above with commit_b variable already.
There are other uses of git command in $(command substitution) whose
exit status are ignored in the newly added lines, but I won't
repeat them here; the same comment applies to them.
Thanks.
^ permalink raw reply [flat|nested] 66+ messages in thread
* [PATCH v8 0/3] object checking related additions and fixes for bundles in fetches
2024-06-17 13:55 ` [PATCH v7 " blanet via GitGitGadget
` (2 preceding siblings ...)
2024-06-17 13:55 ` [PATCH v7 3/3] unbundle: extend object verification for fetches Xing Xin via GitGitGadget
@ 2024-06-19 4:07 ` blanet via GitGitGadget
2024-06-19 4:07 ` [PATCH v8 1/3] bundle-uri: verify oid before writing refs Xing Xin via GitGitGadget
` (2 more replies)
3 siblings, 3 replies; 66+ messages in thread
From: blanet via GitGitGadget @ 2024-06-19 4:07 UTC (permalink / raw)
To: git; +Cc: Patrick Steinhardt, Karthik Nayak, blanet
While attempting to fix a reference negotiation bug in bundle-uri, we
identified that the fetch process lacks some crucial object validation
checks when processing bundles. The primary issues are:
1. In the bundle-uri scenario, object IDs were not validated before writing
bundle references. This was the root cause of the original negotiation
bug in bundle-uri and could lead to potential repository corruption.
2. The existing "fetch.fsckObjects" and "transfer.fsckObjects"
configurations were not applied when directly fetching bundles or
fetching with bundle-uri enabled. In fact, there were no object
validation supports for unbundle.
The first patch addresses the bundle-uri negotiation issue by removing the
REF_SKIP_OID_VERIFICATION flag when writing bundle references.
Patches 2 through 3 extend verify_bundle_flags for bundle.c:unbundle to add
support for object validation (fsck) in fetch scenarios, mainly following
the suggestions from Junio and Patrick on the mailing list.
Xing Xin (3):
bundle-uri: verify oid before writing refs
fetch-pack: expose fsckObjects configuration logic
unbundle: extend object verification for fetches
bundle-uri.c | 6 +-
bundle.c | 3 +
bundle.h | 1 +
fetch-pack.c | 17 ++--
fetch-pack.h | 5 +
t/t5558-clone-bundle-uri.sh | 187 +++++++++++++++++++++++++++++++++++-
t/t5607-clone-bundle.sh | 35 +++++++
transport.c | 3 +-
8 files changed, 243 insertions(+), 14 deletions(-)
base-commit: b9cfe4845cb2562584837bc0101c0ab76490a239
Published-As: https://github.com/gitgitgadget/git/releases/tag/pr-1730%2Fblanet%2Fxx%2Fbundle-uri-bug-using-bundle-list-v8
Fetch-It-Via: git fetch https://github.com/gitgitgadget/git pr-1730/blanet/xx/bundle-uri-bug-using-bundle-list-v8
Pull-Request: https://github.com/gitgitgadget/git/pull/1730
Range-diff vs v7:
1: fc9f44fda00 ! 1: d8fbde2dcd4 bundle-uri: verify oid before writing refs
@@ t/t5558-clone-bundle-uri.sh: test_expect_success 'fail to clone from non-bundle
+ git bundle create B.bundle topic &&
+
+ # Create a bundle with reference pointing to non-existent object.
-+ sed -e "/^$/q" -e "s/$(git rev-parse A) /$(git rev-parse B) /" \
++ commit_a=$(git rev-parse A) &&
++ commit_b=$(git rev-parse B) &&
++ sed -e "/^$/q" -e "s/$commit_a /$commit_b /" \
+ <A.bundle >bad-header.bundle &&
+ convert_bundle_to_pack \
+ <A.bundle >>bad-header.bundle
@@ t/t5558-clone-bundle-uri.sh: test_expect_success 'clone with path bundle' '
+ commit_b=$(git -C clone-from rev-parse B) &&
+ test_grep "trying to write ref '\''refs/bundles/topic'\'' with nonexistent object $commit_b" err &&
+ git -C clone-bad-header for-each-ref --format="%(refname)" >refs &&
-+ ! grep "refs/bundles/" refs
++ test_grep ! "refs/bundles/" refs
+'
+
test_expect_success 'clone with path bundle and non-default hash' '
@@ t/t5558-clone-bundle-uri.sh: test_expect_success 'clone bundle list (file, any m
+ test_write_lines refs/bundles/topic >expect &&
+ test_cmp expect actual &&
+ # Ensure that refs/bundles/topic are sent as "have".
-+ test_grep "clone> have $(git -C clone-from rev-parse A)" trace-packet.txt
++ tip=$(git -C clone-from rev-parse A) &&
++ test_grep "clone> have $tip" trace-packet.txt
+'
+
+test_expect_success 'negotiation: bundle with all wanted commits' '
@@ t/t5558-clone-bundle-uri.sh: test_expect_success 'clone bundle list (file, any m
+ test_write_lines refs/bundles/topic >expect &&
+ test_cmp expect actual &&
+ # We already have all needed commits so no "want" needed.
-+ ! grep "clone> want " trace-packet.txt
++ test_grep ! "clone> want " trace-packet.txt
+'
+
+test_expect_success 'negotiation: bundle list (no heuristic)' '
@@ t/t5558-clone-bundle-uri.sh: test_expect_success 'clone bundle list (file, any m
+ refs/bundles/left
+ EOF
+ test_cmp expect actual &&
-+ test_grep "clone> have $(git -C nego-bundle-list-no-heuristic rev-parse refs/bundles/left)" trace-packet.txt
++ tip=$(git -C nego-bundle-list-no-heuristic rev-parse refs/bundles/left) &&
++ test_grep "clone> have $tip" trace-packet.txt
+'
+
+test_expect_success 'negotiation: bundle list (creationToken)' '
@@ t/t5558-clone-bundle-uri.sh: test_expect_success 'clone bundle list (file, any m
+ refs/bundles/left
+ EOF
+ test_cmp expect actual &&
-+ test_grep "clone> have $(git -C nego-bundle-list-heuristic rev-parse refs/bundles/left)" trace-packet.txt
++ tip=$(git -C nego-bundle-list-heuristic rev-parse refs/bundles/left) &&
++ test_grep "clone> have $tip" trace-packet.txt
+'
+
+test_expect_success 'negotiation: bundle list with all wanted commits' '
@@ t/t5558-clone-bundle-uri.sh: test_expect_success 'clone bundle list (file, any m
+ EOF
+ test_cmp expect actual &&
+ # We already have all needed commits so no "want" needed.
-+ ! grep "clone> want " trace-packet.txt
++ test_grep ! "clone> want " trace-packet.txt
+'
+
#########################################################################
2: 3dc0d9dd22f ! 2: 518584c8698 fetch-pack: expose fsckObjects configuration logic
@@ Commit message
"fetch.fsckObjects" to control checks for broken objects in received
packs during fetches. However, these configurations were only
acknowledged by `fetch-pack.c:get_pack` and did not take effect in
- direct bundle fetches and fetches with _bundle-uri_ enabled.
+ direct bundle fetches or fetches with _bundle-uri_ enabled.
This commit exposes the fetch-then-transfer configuration logic by
adding a new function `fetch_pack_fsck_objects` in fetch-pack.h. This
new function is used to replace the assignment for `fsck_objects` in
- `fetch-pack.c:get_pack`. In the next commit, it will also be used by
- `bundle.c:unbundle` to better fit fetching scenarios.
+ `fetch-pack.c:get_pack`. In the next commit, this function will also be
+ used to extend fsck support for bundle-involved fetches.
Helped-by: Junio C Hamano <gitster@pobox.com>
Helped-by: Patrick Steinhardt <ps@pks.im>
3: 2f15099bbb9 ! 3: 698dd6e49b7 unbundle: extend object verification for fetches
@@ bundle.h: int create_bundle(struct repository *r, const char *path,
## t/t5558-clone-bundle-uri.sh ##
@@ t/t5558-clone-bundle-uri.sh: test_expect_success 'create bundle' '
- sed -e "/^$/q" -e "s/$(git rev-parse A) /$(git rev-parse B) /" \
+ sed -e "/^$/q" -e "s/$commit_a /$commit_b /" \
<A.bundle >bad-header.bundle &&
convert_bundle_to_pack \
- <A.bundle >>bad-header.bundle
+ <A.bundle >>bad-header.bundle &&
+
++ tree_b=$(git rev-parse B^{tree}) &&
+ cat >data <<-EOF &&
-+ tree $(git rev-parse HEAD^{tree})
-+ parent $(git rev-parse HEAD)
++ tree $tree_b
++ parent $commit_b
+ author A U Thor
+ committer A U Thor
+
+ commit: this is a commit with bad emails
+
+ EOF
-+ git hash-object --literally -t commit -w --stdin <data >commit &&
-+ git branch bad $(cat commit) &&
++ bad_commit=$(git hash-object --literally -t commit -w --stdin <data) &&
++ git branch bad $bad_commit &&
+ git bundle create bad-object.bundle bad &&
+ git update-ref -d refs/heads/bad
)
'
@@ t/t5558-clone-bundle-uri.sh: test_expect_success 'clone with bundle that has bad header' '
- ! grep "refs/bundles/" refs
+ test_grep ! "refs/bundles/" refs
'
+test_expect_success 'clone with bundle that has bad object' '
-+ # Unbundle succeeds if no fsckObjects confugured.
++ # Unbundle succeeds if no fsckObjects configured.
+ git clone --bundle-uri="clone-from/bad-object.bundle" \
+ clone-from clone-bad-object-no-fsck &&
+ git -C clone-bad-object-no-fsck for-each-ref --format="%(refname)" >refs &&
@@ t/t5558-clone-bundle-uri.sh: test_expect_success 'clone with bundle that has bad
+ clone-from clone-bad-object-fsck 2>err &&
+ test_grep "missingEmail" err &&
+ git -C clone-bad-object-fsck for-each-ref --format="%(refname)" >refs &&
-+ ! grep "refs/bundles/" refs
++ test_grep ! "refs/bundles/" refs
+'
+
test_expect_success 'clone with path bundle and non-default hash' '
@@ t/t5607-clone-bundle.sh: test_expect_success 'fetch SHA-1 from bundle' '
+ test_create_repo bundle-fsck &&
+ (
+ cd bundle-fsck &&
-+ test_commit first &&
++ test_commit A &&
++ commit_a=$(git rev-parse A) &&
++ tree_a=$(git rev-parse A^{tree}) &&
+ cat >data <<-EOF &&
-+ tree $(git rev-parse HEAD^{tree})
-+ parent $(git rev-parse HEAD)
++ tree $tree_a
++ parent $commit_a
+ author A U Thor
+ committer A U Thor
+
+ commit: this is a commit with bad emails
+
+ EOF
-+ git hash-object --literally -t commit -w --stdin <data >commit &&
-+ git branch bad $(cat commit) &&
++ bad_commit=$(git hash-object --literally -t commit -w --stdin <data) &&
++ git branch bad $bad_commit &&
+ git bundle create bad.bundle bad
+ ) &&
+
--
gitgitgadget
^ permalink raw reply [flat|nested] 66+ messages in thread
* [PATCH v8 1/3] bundle-uri: verify oid before writing refs
2024-06-19 4:07 ` [PATCH v8 0/3] object checking related additions and fixes for bundles in fetches blanet via GitGitGadget
@ 2024-06-19 4:07 ` Xing Xin via GitGitGadget
2024-06-19 4:07 ` [PATCH v8 2/3] fetch-pack: expose fsckObjects configuration logic Xing Xin via GitGitGadget
2024-06-19 4:07 ` [PATCH v8 3/3] unbundle: extend object verification for fetches Xing Xin via GitGitGadget
2 siblings, 0 replies; 66+ messages in thread
From: Xing Xin via GitGitGadget @ 2024-06-19 4:07 UTC (permalink / raw)
To: git; +Cc: Patrick Steinhardt, Karthik Nayak, blanet, Xing Xin
From: Xing Xin <xingxin.xx@bytedance.com>
When using the bundle-uri mechanism with a bundle list containing
multiple interrelated bundles, we encountered a bug where tips from
downloaded bundles were not discovered, thus resulting in rather slow
clones. This was particularly problematic when employing the
"creationTokens" heuristic.
To reproduce this issue, consider a repository with a single branch
"main" pointing to commit "A". Firstly, create a base bundle with:
git bundle create base.bundle main
Then, add a new commit "B" on top of "A", and create an incremental
bundle for "main":
git bundle create incr.bundle A..main
Now, generate a bundle list with the following content:
[bundle]
version = 1
mode = all
heuristic = creationToken
[bundle "base"]
uri = base.bundle
creationToken = 1
[bundle "incr"]
uri = incr.bundle
creationToken = 2
A fresh clone with the bundle list above should result in a reference
"refs/bundles/main" pointing to "B" in the new repository. However, git
would still download everything from the server, as if it had fetched
nothing locally.
So why the "refs/bundles/main" is not discovered? After some digging I
found that:
1. Bundles in bundle list are downloaded to local files via
`bundle-uri.c:download_bundle_list` or via
`bundle-uri.c:fetch_bundles_by_token` for the "creationToken"
heuristic.
2. Each bundle is unbundled via `bundle-uri.c:unbundle_from_file`, which
is called by `bundle-uri.c:unbundle_all_bundles` or called within
`bundle-uri.c:fetch_bundles_by_token` for the "creationToken"
heuristic.
3. To get all prerequisites of the bundle, the bundle header is read
inside `bundle-uri.c:unbundle_from_file` to by calling
`bundle.c:read_bundle_header`.
4. Then it calls `bundle.c:unbundle`, which calls
`bundle.c:verify_bundle` to ensure the repository contains all the
prerequisites.
5. `bundle.c:verify_bundle` calls `parse_object`, which eventually
invokes `packfile.c:prepare_packed_git` or
`packfile.c:reprepare_packed_git`, filling
`raw_object_store->packed_git` and setting `packed_git_initialized`.
6. If `bundle.c:unbundle` succeeds, it writes refs via
`refs.c:refs_update_ref` with `REF_SKIP_OID_VERIFICATION` set. Here
bundle refs which can target arbitrary objects are written to the
repository.
7. Finally, in `fetch-pack.c:do_fetch_pack_v2`, the functions
`fetch-pack.c:mark_complete_and_common_ref` and
`fetch-pack.c:mark_tips` are called with `OBJECT_INFO_QUICK` set to
find local tips for negotiation. The `OBJECT_INFO_QUICK` flag
prevents `packfile.c:reprepare_packed_git` from being called,
resulting in failures to parse OIDs that reside only in the latest
bundle.
In the example above, when unbunding "incr.bundle", "base.pack" is added
to `packed_git` due to prerequisites verification. However, "B" cannot
be found for negotiation because it exists in "incr.pack", which is not
included in `packed_git`.
Fix the bug by removing `REF_SKIP_OID_VERIFICATION` flag when writing
bundle refs. When `refs.c:refs_update_ref` is called to write the
corresponding bundle refs, it triggers `refs.c:ref_transaction_commit`.
This, in turn, invokes `refs.c:ref_transaction_prepare`, which calls
`transaction_prepare` of the refs storage backend. For files backend, it
is `files-backend.c:files_transaction_prepare`, and for reftable
backend, it is `reftable-backend.c:reftable_be_transaction_prepare`.
Both functions eventually call `object.c:parse_object`, which can invoke
`packfile.c:reprepare_packed_git` to refresh `packed_git`. This ensures
that bundle refs point to valid objects and that all tips from bundle
refs are correctly parsed during subsequent negotiations.
A set of negotiation-related tests for cloning with bundle-uri has been
included to demonstrate that downloaded bundles are utilized to
accelerate fetching.
Additionally, another test has been added to show that bundles with
incorrect headers, where refs point to non-existent objects, do not
result in any bundle refs being created in the repository.
Reviewed-by: Karthik Nayak <karthik.188@gmail.com>
Reviewed-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Xing Xin <xingxin.xx@bytedance.com>
---
bundle-uri.c | 3 +-
t/t5558-clone-bundle-uri.sh | 155 +++++++++++++++++++++++++++++++++++-
2 files changed, 152 insertions(+), 6 deletions(-)
diff --git a/bundle-uri.c b/bundle-uri.c
index 91b3319a5c1..65666a11d9c 100644
--- a/bundle-uri.c
+++ b/bundle-uri.c
@@ -400,8 +400,7 @@ static int unbundle_from_file(struct repository *r, const char *file)
refs_update_ref(get_main_ref_store(the_repository),
"fetched bundle", bundle_ref.buf, oid,
has_old ? &old_oid : NULL,
- REF_SKIP_OID_VERIFICATION,
- UPDATE_REFS_MSG_ON_ERR);
+ 0, UPDATE_REFS_MSG_ON_ERR);
}
bundle_header_release(&header);
diff --git a/t/t5558-clone-bundle-uri.sh b/t/t5558-clone-bundle-uri.sh
index 1ca5f745e73..a0895913fe9 100755
--- a/t/t5558-clone-bundle-uri.sh
+++ b/t/t5558-clone-bundle-uri.sh
@@ -3,6 +3,7 @@
test_description='test fetching bundles with --bundle-uri'
. ./test-lib.sh
+. "$TEST_DIRECTORY"/lib-bundle.sh
test_expect_success 'fail to clone from non-existent file' '
test_when_finished rm -rf test &&
@@ -19,10 +20,24 @@ test_expect_success 'fail to clone from non-bundle file' '
test_expect_success 'create bundle' '
git init clone-from &&
- git -C clone-from checkout -b topic &&
- test_commit -C clone-from A &&
- test_commit -C clone-from B &&
- git -C clone-from bundle create B.bundle topic
+ (
+ cd clone-from &&
+ git checkout -b topic &&
+
+ test_commit A &&
+ git bundle create A.bundle topic &&
+
+ test_commit B &&
+ git bundle create B.bundle topic &&
+
+ # Create a bundle with reference pointing to non-existent object.
+ commit_a=$(git rev-parse A) &&
+ commit_b=$(git rev-parse B) &&
+ sed -e "/^$/q" -e "s/$commit_a /$commit_b /" \
+ <A.bundle >bad-header.bundle &&
+ convert_bundle_to_pack \
+ <A.bundle >>bad-header.bundle
+ )
'
test_expect_success 'clone with path bundle' '
@@ -33,6 +48,16 @@ test_expect_success 'clone with path bundle' '
test_cmp expect actual
'
+test_expect_success 'clone with bundle that has bad header' '
+ # Write bundle ref fails, but clone can still proceed.
+ git clone --bundle-uri="clone-from/bad-header.bundle" \
+ clone-from clone-bad-header 2>err &&
+ commit_b=$(git -C clone-from rev-parse B) &&
+ test_grep "trying to write ref '\''refs/bundles/topic'\'' with nonexistent object $commit_b" err &&
+ git -C clone-bad-header for-each-ref --format="%(refname)" >refs &&
+ test_grep ! "refs/bundles/" refs
+'
+
test_expect_success 'clone with path bundle and non-default hash' '
test_when_finished "rm -rf clone-path-non-default-hash" &&
GIT_DEFAULT_HASH=sha256 git clone --bundle-uri="clone-from/B.bundle" \
@@ -259,6 +284,128 @@ test_expect_success 'clone bundle list (file, any mode, all failures)' '
! grep "refs/bundles/" refs
'
+test_expect_success 'negotiation: bundle with part of wanted commits' '
+ test_when_finished "rm -f trace*.txt" &&
+ GIT_TRACE_PACKET="$(pwd)/trace-packet.txt" \
+ git clone --no-local --bundle-uri="clone-from/A.bundle" \
+ clone-from nego-bundle-part &&
+ git -C nego-bundle-part for-each-ref --format="%(refname)" >refs &&
+ grep "refs/bundles/" refs >actual &&
+ test_write_lines refs/bundles/topic >expect &&
+ test_cmp expect actual &&
+ # Ensure that refs/bundles/topic are sent as "have".
+ tip=$(git -C clone-from rev-parse A) &&
+ test_grep "clone> have $tip" trace-packet.txt
+'
+
+test_expect_success 'negotiation: bundle with all wanted commits' '
+ test_when_finished "rm -f trace*.txt" &&
+ GIT_TRACE_PACKET="$(pwd)/trace-packet.txt" \
+ git clone --no-local --single-branch --branch=topic --no-tags \
+ --bundle-uri="clone-from/B.bundle" \
+ clone-from nego-bundle-all &&
+ git -C nego-bundle-all for-each-ref --format="%(refname)" >refs &&
+ grep "refs/bundles/" refs >actual &&
+ test_write_lines refs/bundles/topic >expect &&
+ test_cmp expect actual &&
+ # We already have all needed commits so no "want" needed.
+ test_grep ! "clone> want " trace-packet.txt
+'
+
+test_expect_success 'negotiation: bundle list (no heuristic)' '
+ test_when_finished "rm -f trace*.txt" &&
+ cat >bundle-list <<-EOF &&
+ [bundle]
+ version = 1
+ mode = all
+
+ [bundle "bundle-1"]
+ uri = file://$(pwd)/clone-from/bundle-1.bundle
+
+ [bundle "bundle-2"]
+ uri = file://$(pwd)/clone-from/bundle-2.bundle
+ EOF
+
+ GIT_TRACE_PACKET="$(pwd)/trace-packet.txt" \
+ git clone --no-local --bundle-uri="file://$(pwd)/bundle-list" \
+ clone-from nego-bundle-list-no-heuristic &&
+
+ git -C nego-bundle-list-no-heuristic for-each-ref --format="%(refname)" >refs &&
+ grep "refs/bundles/" refs >actual &&
+ cat >expect <<-\EOF &&
+ refs/bundles/base
+ refs/bundles/left
+ EOF
+ test_cmp expect actual &&
+ tip=$(git -C nego-bundle-list-no-heuristic rev-parse refs/bundles/left) &&
+ test_grep "clone> have $tip" trace-packet.txt
+'
+
+test_expect_success 'negotiation: bundle list (creationToken)' '
+ test_when_finished "rm -f trace*.txt" &&
+ cat >bundle-list <<-EOF &&
+ [bundle]
+ version = 1
+ mode = all
+ heuristic = creationToken
+
+ [bundle "bundle-1"]
+ uri = file://$(pwd)/clone-from/bundle-1.bundle
+ creationToken = 1
+
+ [bundle "bundle-2"]
+ uri = file://$(pwd)/clone-from/bundle-2.bundle
+ creationToken = 2
+ EOF
+
+ GIT_TRACE_PACKET="$(pwd)/trace-packet.txt" \
+ git clone --no-local --bundle-uri="file://$(pwd)/bundle-list" \
+ clone-from nego-bundle-list-heuristic &&
+
+ git -C nego-bundle-list-heuristic for-each-ref --format="%(refname)" >refs &&
+ grep "refs/bundles/" refs >actual &&
+ cat >expect <<-\EOF &&
+ refs/bundles/base
+ refs/bundles/left
+ EOF
+ test_cmp expect actual &&
+ tip=$(git -C nego-bundle-list-heuristic rev-parse refs/bundles/left) &&
+ test_grep "clone> have $tip" trace-packet.txt
+'
+
+test_expect_success 'negotiation: bundle list with all wanted commits' '
+ test_when_finished "rm -f trace*.txt" &&
+ cat >bundle-list <<-EOF &&
+ [bundle]
+ version = 1
+ mode = all
+ heuristic = creationToken
+
+ [bundle "bundle-1"]
+ uri = file://$(pwd)/clone-from/bundle-1.bundle
+ creationToken = 1
+
+ [bundle "bundle-2"]
+ uri = file://$(pwd)/clone-from/bundle-2.bundle
+ creationToken = 2
+ EOF
+
+ GIT_TRACE_PACKET="$(pwd)/trace-packet.txt" \
+ git clone --no-local --single-branch --branch=left --no-tags \
+ --bundle-uri="file://$(pwd)/bundle-list" \
+ clone-from nego-bundle-list-all &&
+
+ git -C nego-bundle-list-all for-each-ref --format="%(refname)" >refs &&
+ grep "refs/bundles/" refs >actual &&
+ cat >expect <<-\EOF &&
+ refs/bundles/base
+ refs/bundles/left
+ EOF
+ test_cmp expect actual &&
+ # We already have all needed commits so no "want" needed.
+ test_grep ! "clone> want " trace-packet.txt
+'
+
#########################################################################
# HTTP tests begin here
--
gitgitgadget
^ permalink raw reply related [flat|nested] 66+ messages in thread
* [PATCH v8 2/3] fetch-pack: expose fsckObjects configuration logic
2024-06-19 4:07 ` [PATCH v8 0/3] object checking related additions and fixes for bundles in fetches blanet via GitGitGadget
2024-06-19 4:07 ` [PATCH v8 1/3] bundle-uri: verify oid before writing refs Xing Xin via GitGitGadget
@ 2024-06-19 4:07 ` Xing Xin via GitGitGadget
2024-06-19 4:07 ` [PATCH v8 3/3] unbundle: extend object verification for fetches Xing Xin via GitGitGadget
2 siblings, 0 replies; 66+ messages in thread
From: Xing Xin via GitGitGadget @ 2024-06-19 4:07 UTC (permalink / raw)
To: git; +Cc: Patrick Steinhardt, Karthik Nayak, blanet, Xing Xin
From: Xing Xin <xingxin.xx@bytedance.com>
Currently, we can use "transfer.fsckObjects" and the more specific
"fetch.fsckObjects" to control checks for broken objects in received
packs during fetches. However, these configurations were only
acknowledged by `fetch-pack.c:get_pack` and did not take effect in
direct bundle fetches or fetches with _bundle-uri_ enabled.
This commit exposes the fetch-then-transfer configuration logic by
adding a new function `fetch_pack_fsck_objects` in fetch-pack.h. This
new function is used to replace the assignment for `fsck_objects` in
`fetch-pack.c:get_pack`. In the next commit, this function will also be
used to extend fsck support for bundle-involved fetches.
Helped-by: Junio C Hamano <gitster@pobox.com>
Helped-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Xing Xin <xingxin.xx@bytedance.com>
---
fetch-pack.c | 17 +++++++++++------
fetch-pack.h | 5 +++++
2 files changed, 16 insertions(+), 6 deletions(-)
diff --git a/fetch-pack.c b/fetch-pack.c
index 7d2aef21add..3acff2baf09 100644
--- a/fetch-pack.c
+++ b/fetch-pack.c
@@ -954,12 +954,7 @@ static int get_pack(struct fetch_pack_args *args,
strvec_push(&cmd.args, alternate_shallow_file);
}
- if (fetch_fsck_objects >= 0
- ? fetch_fsck_objects
- : transfer_fsck_objects >= 0
- ? transfer_fsck_objects
- : 0)
- fsck_objects = 1;
+ fsck_objects = fetch_pack_fsck_objects();
if (do_keep || args->from_promisor || index_pack_args || fsck_objects) {
if (pack_lockfiles || fsck_objects)
@@ -2046,6 +2041,16 @@ static const struct object_id *iterate_ref_map(void *cb_data)
return &ref->old_oid;
}
+int fetch_pack_fsck_objects(void)
+{
+ fetch_pack_setup();
+ if (fetch_fsck_objects >= 0)
+ return fetch_fsck_objects;
+ if (transfer_fsck_objects >= 0)
+ return transfer_fsck_objects;
+ return 0;
+}
+
struct ref *fetch_pack(struct fetch_pack_args *args,
int fd[],
const struct ref *ref,
diff --git a/fetch-pack.h b/fetch-pack.h
index 6775d265175..b5c579cdae2 100644
--- a/fetch-pack.h
+++ b/fetch-pack.h
@@ -101,4 +101,9 @@ void negotiate_using_fetch(const struct oid_array *negotiation_tips,
*/
int report_unmatched_refs(struct ref **sought, int nr_sought);
+/*
+ * Return true if checks for broken objects in received pack are required.
+ */
+int fetch_pack_fsck_objects(void);
+
#endif
--
gitgitgadget
^ permalink raw reply related [flat|nested] 66+ messages in thread
* [PATCH v8 3/3] unbundle: extend object verification for fetches
2024-06-19 4:07 ` [PATCH v8 0/3] object checking related additions and fixes for bundles in fetches blanet via GitGitGadget
2024-06-19 4:07 ` [PATCH v8 1/3] bundle-uri: verify oid before writing refs Xing Xin via GitGitGadget
2024-06-19 4:07 ` [PATCH v8 2/3] fetch-pack: expose fsckObjects configuration logic Xing Xin via GitGitGadget
@ 2024-06-19 4:07 ` Xing Xin via GitGitGadget
2 siblings, 0 replies; 66+ messages in thread
From: Xing Xin via GitGitGadget @ 2024-06-19 4:07 UTC (permalink / raw)
To: git; +Cc: Patrick Steinhardt, Karthik Nayak, blanet, Xing Xin
From: Xing Xin <xingxin.xx@bytedance.com>
The existing fetch.fsckObjects and transfer.fsckObjects configurations
were not fully applied to bundle-involved fetches, including direct
bundle fetches and bundle-uri enabled fetches. Furthermore, there was no
object verification support for unbundle.
This commit extends object verification support in `bundle.c:unbundle`
by adding the `VERIFY_BUNDLE_FSCK` option to `verify_bundle_flags`. When
this option is enabled, we append the `--fsck-objects` flag to
`git-index-pack`.
The `VERIFY_BUNDLE_FSCK` option is now used by bundle-involved fetches,
where we use `fetch-pack.c:fetch_pack_fsck_objects` to determine whether
to enable this option for `bundle.c:unbundle`, specifically in:
- `transport.c:fetch_refs_from_bundle` for direct bundle fetches.
- `bundle-uri.c:unbundle_from_file` for bundle-uri enabled fetches.
This addition ensures a consistent logic for object verification during
fetches. Tests have been added to confirm functionality in the scenarios
mentioned above.
Reviewed-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Xing Xin <xingxin.xx@bytedance.com>
---
bundle-uri.c | 3 ++-
bundle.c | 3 +++
bundle.h | 1 +
t/t5558-clone-bundle-uri.sh | 34 +++++++++++++++++++++++++++++++++-
t/t5607-clone-bundle.sh | 35 +++++++++++++++++++++++++++++++++++
transport.c | 3 ++-
6 files changed, 76 insertions(+), 3 deletions(-)
diff --git a/bundle-uri.c b/bundle-uri.c
index 65666a11d9c..ed9b49fdbc1 100644
--- a/bundle-uri.c
+++ b/bundle-uri.c
@@ -9,6 +9,7 @@
#include "hashmap.h"
#include "pkt-line.h"
#include "config.h"
+#include "fetch-pack.h"
#include "remote.h"
static struct {
@@ -373,7 +374,7 @@ static int unbundle_from_file(struct repository *r, const char *file)
* the prerequisite commits.
*/
if ((result = unbundle(r, &header, bundle_fd, NULL,
- VERIFY_BUNDLE_QUIET)))
+ VERIFY_BUNDLE_QUIET | (fetch_pack_fsck_objects() ? VERIFY_BUNDLE_FSCK : 0))))
return 1;
/*
diff --git a/bundle.c b/bundle.c
index 95367c2d0a0..f124a2a5626 100644
--- a/bundle.c
+++ b/bundle.c
@@ -625,6 +625,9 @@ int unbundle(struct repository *r, struct bundle_header *header,
if (header->filter.choice)
strvec_push(&ip.args, "--promisor=from-bundle");
+ if (flags & VERIFY_BUNDLE_FSCK)
+ strvec_push(&ip.args, "--fsck-objects");
+
if (extra_index_pack_args) {
strvec_pushv(&ip.args, extra_index_pack_args->v);
strvec_clear(extra_index_pack_args);
diff --git a/bundle.h b/bundle.h
index 021adbdcbb3..5ccc9a061a4 100644
--- a/bundle.h
+++ b/bundle.h
@@ -33,6 +33,7 @@ int create_bundle(struct repository *r, const char *path,
enum verify_bundle_flags {
VERIFY_BUNDLE_VERBOSE = (1 << 0),
VERIFY_BUNDLE_QUIET = (1 << 1),
+ VERIFY_BUNDLE_FSCK = (1 << 2),
};
int verify_bundle(struct repository *r, struct bundle_header *header,
diff --git a/t/t5558-clone-bundle-uri.sh b/t/t5558-clone-bundle-uri.sh
index a0895913fe9..cd05321e176 100755
--- a/t/t5558-clone-bundle-uri.sh
+++ b/t/t5558-clone-bundle-uri.sh
@@ -36,7 +36,22 @@ test_expect_success 'create bundle' '
sed -e "/^$/q" -e "s/$commit_a /$commit_b /" \
<A.bundle >bad-header.bundle &&
convert_bundle_to_pack \
- <A.bundle >>bad-header.bundle
+ <A.bundle >>bad-header.bundle &&
+
+ tree_b=$(git rev-parse B^{tree}) &&
+ cat >data <<-EOF &&
+ tree $tree_b
+ parent $commit_b
+ author A U Thor
+ committer A U Thor
+
+ commit: this is a commit with bad emails
+
+ EOF
+ bad_commit=$(git hash-object --literally -t commit -w --stdin <data) &&
+ git branch bad $bad_commit &&
+ git bundle create bad-object.bundle bad &&
+ git update-ref -d refs/heads/bad
)
'
@@ -58,6 +73,23 @@ test_expect_success 'clone with bundle that has bad header' '
test_grep ! "refs/bundles/" refs
'
+test_expect_success 'clone with bundle that has bad object' '
+ # Unbundle succeeds if no fsckObjects configured.
+ git clone --bundle-uri="clone-from/bad-object.bundle" \
+ clone-from clone-bad-object-no-fsck &&
+ git -C clone-bad-object-no-fsck for-each-ref --format="%(refname)" >refs &&
+ grep "refs/bundles/" refs >actual &&
+ test_write_lines refs/bundles/bad >expect &&
+ test_cmp expect actual &&
+
+ # Unbundle fails with fsckObjects set true, but clone can still proceed.
+ git -c fetch.fsckObjects=true clone --bundle-uri="clone-from/bad-object.bundle" \
+ clone-from clone-bad-object-fsck 2>err &&
+ test_grep "missingEmail" err &&
+ git -C clone-bad-object-fsck for-each-ref --format="%(refname)" >refs &&
+ test_grep ! "refs/bundles/" refs
+'
+
test_expect_success 'clone with path bundle and non-default hash' '
test_when_finished "rm -rf clone-path-non-default-hash" &&
GIT_DEFAULT_HASH=sha256 git clone --bundle-uri="clone-from/B.bundle" \
diff --git a/t/t5607-clone-bundle.sh b/t/t5607-clone-bundle.sh
index 0d1e92d9963..489c6570da5 100755
--- a/t/t5607-clone-bundle.sh
+++ b/t/t5607-clone-bundle.sh
@@ -138,6 +138,41 @@ test_expect_success 'fetch SHA-1 from bundle' '
git fetch --no-tags foo/tip.bundle "$(cat hash)"
'
+test_expect_success 'clone bundle with different fsckObjects configurations' '
+ test_create_repo bundle-fsck &&
+ (
+ cd bundle-fsck &&
+ test_commit A &&
+ commit_a=$(git rev-parse A) &&
+ tree_a=$(git rev-parse A^{tree}) &&
+ cat >data <<-EOF &&
+ tree $tree_a
+ parent $commit_a
+ author A U Thor
+ committer A U Thor
+
+ commit: this is a commit with bad emails
+
+ EOF
+ bad_commit=$(git hash-object --literally -t commit -w --stdin <data) &&
+ git branch bad $bad_commit &&
+ git bundle create bad.bundle bad
+ ) &&
+
+ git clone bundle-fsck/bad.bundle bundle-no-fsck &&
+
+ git -c fetch.fsckObjects=false -c transfer.fsckObjects=true \
+ clone bundle-fsck/bad.bundle bundle-fetch-no-fsck &&
+
+ test_must_fail git -c fetch.fsckObjects=true \
+ clone bundle-fsck/bad.bundle bundle-fetch-fsck 2>err &&
+ test_grep "missingEmail" err &&
+
+ test_must_fail git -c transfer.fsckObjects=true \
+ clone bundle-fsck/bad.bundle bundle-transfer-fsck 2>err &&
+ test_grep "missingEmail" err
+'
+
test_expect_success 'git bundle uses expected default format' '
git bundle create bundle HEAD^.. &&
cat >expect <<-EOF &&
diff --git a/transport.c b/transport.c
index 0ad04b77fd2..a93c4171f7b 100644
--- a/transport.c
+++ b/transport.c
@@ -184,7 +184,8 @@ static int fetch_refs_from_bundle(struct transport *transport,
if (!data->get_refs_from_bundle_called)
get_refs_from_bundle_inner(transport);
ret = unbundle(the_repository, &data->header, data->fd,
- &extra_index_pack_args, 0);
+ &extra_index_pack_args,
+ fetch_pack_fsck_objects() ? VERIFY_BUNDLE_FSCK : 0);
transport->hash_algo = data->header.hash_algo;
return ret;
}
--
gitgitgadget
^ permalink raw reply related [flat|nested] 66+ messages in thread
* Re:Re: [PATCH v7 1/3] bundle-uri: verify oid before writing refs
2024-06-18 17:37 ` Junio C Hamano
@ 2024-06-19 6:30 ` Xing Xin
0 siblings, 0 replies; 66+ messages in thread
From: Xing Xin @ 2024-06-19 6:30 UTC (permalink / raw)
To: Junio C Hamano
Cc: Xing Xin via GitGitGadget, git, Patrick Steinhardt, Karthik Nayak,
Xing Xin
At 2024-06-19 01:37:58, "Junio C Hamano" <gitster@pobox.com> wrote:
>"Xing Xin via GitGitGadget" <gitgitgadget@gmail.com> writes:
>
>> @@ -33,6 +46,16 @@ test_expect_success 'clone with path bundle' '
>> test_cmp expect actual
>> '
>>
>> +test_expect_success 'clone with bundle that has bad header' '
>> + # Write bundle ref fails, but clone can still proceed.
>> + git clone --bundle-uri="clone-from/bad-header.bundle" \
>> + clone-from clone-bad-header 2>err &&
>> + commit_b=$(git -C clone-from rev-parse B) &&
>> + test_grep "trying to write ref '\''refs/bundles/topic'\'' with nonexistent object $commit_b" err &&
>> + git -C clone-bad-header for-each-ref --format="%(refname)" >refs &&
>> + ! grep "refs/bundles/" refs
>
>Why not "test_grep !" here? There are other uses of bare grep in
>the newly added lines, but I won't repeat them here; the same
>comment applies to them.
Both `test_grep !` and `! grep` are widely used in tests. Sorry for not
realizing that the former is preferred.
>> + test_write_lines refs/bundles/topic >expect &&
>> + test_cmp expect actual &&
>> + # Ensure that refs/bundles/topic are sent as "have".
>> + test_grep "clone> have $(git -C clone-from rev-parse A)" trace-packet.txt
>> +'
>
>Can this rev-parse fail (the failure would be hidden from the test)?
>If so,
>
> ...
> test_cmp expect actual &&
> # Ensure that refs/bundles/topic is sent as "have"
> tip=$(git -C clone-from rev-parse A) &&
> test_grep "clone> have $tip" trace-packet.txt
>
>would catch such a failure. You are doing so in the previous test
>in the hunk starting at 33/46 above with commit_b variable already.
>
>There are other uses of git command in $(command substitution) whose
>exit status are ignored in the newly added lines, but I won't
>repeat them here; the same comment applies to them.
Fixed in the new series. Thanks.
Xing Xin
^ permalink raw reply [flat|nested] 66+ messages in thread
end of thread, other threads:[~2024-06-19 6:31 UTC | newest]
Thread overview: 66+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2024-05-15 3:01 [PATCH] bundle-uri: refresh packed_git if unbundle succeed blanet via GitGitGadget
2024-05-17 5:00 ` Patrick Steinhardt
2024-05-17 16:14 ` Junio C Hamano
2024-05-20 11:48 ` Xing Xin
2024-05-20 17:19 ` Junio C Hamano
2024-05-27 16:04 ` Xing Xin
2024-05-20 9:41 ` Xing Xin
2024-05-17 7:36 ` Karthik Nayak
2024-05-20 10:19 ` Xing Xin
2024-05-20 12:36 ` [PATCH v2] bundle-uri: verify oid before writing refs blanet via GitGitGadget
2024-05-21 15:41 ` Karthik Nayak
2024-05-27 15:41 ` [PATCH v3 0/4] object checking related additions and fixes for bundles in fetches blanet via GitGitGadget
2024-05-27 15:41 ` [PATCH v3 1/4] bundle-uri: verify oid before writing refs Xing Xin via GitGitGadget
2024-05-28 11:55 ` Patrick Steinhardt
2024-05-30 8:32 ` Xing Xin
2024-05-27 15:41 ` [PATCH v3 2/4] unbundle: introduce unbundle_fsck_flags for fsckobjects handling Xing Xin via GitGitGadget
2024-05-28 12:03 ` Patrick Steinhardt
2024-05-29 18:12 ` Xing Xin
2024-05-30 4:38 ` Patrick Steinhardt
2024-05-30 8:46 ` Xing Xin
2024-05-27 15:41 ` [PATCH v3 3/4] fetch-pack: expose fsckObjects configuration logic Xing Xin via GitGitGadget
2024-05-28 12:03 ` Patrick Steinhardt
2024-05-28 17:10 ` Junio C Hamano
2024-05-28 17:24 ` Junio C Hamano
2024-05-29 5:52 ` Patrick Steinhardt
2024-05-30 8:48 ` Xing Xin
2024-05-29 5:52 ` Patrick Steinhardt
2024-05-27 15:41 ` [PATCH v3 4/4] unbundle: introduce new option UNBUNDLE_FSCK_FOLLOW_FETCH Xing Xin via GitGitGadget
2024-05-28 12:05 ` Patrick Steinhardt
2024-05-30 8:54 ` Xing Xin
2024-05-30 8:21 ` [PATCH v4 0/4] object checking related additions and fixes for bundles in fetches blanet via GitGitGadget
2024-05-30 8:21 ` [PATCH v4 1/4] bundle-uri: verify oid before writing refs Xing Xin via GitGitGadget
2024-05-30 8:21 ` [PATCH v4 2/4] unbundle: extend verify_bundle_flags to support fsck-objects Xing Xin via GitGitGadget
2024-06-06 12:06 ` Patrick Steinhardt
2024-06-11 6:46 ` Xing Xin
2024-05-30 8:21 ` [PATCH v4 3/4] fetch-pack: expose fsckObjects configuration logic Xing Xin via GitGitGadget
2024-05-30 8:21 ` [PATCH v4 4/4] unbundle: introduce option VERIFY_BUNDLE_FSCK_FOLLOW_FETCH Xing Xin via GitGitGadget
2024-06-06 12:06 ` Patrick Steinhardt
2024-06-11 6:46 ` Xing Xin
2024-06-11 6:42 ` [PATCH v5 0/4] object checking related additions and fixes for bundles in fetches blanet via GitGitGadget
2024-06-11 6:42 ` [PATCH v5 1/4] bundle-uri: verify oid before writing refs Xing Xin via GitGitGadget
2024-06-11 6:42 ` [PATCH v5 2/4] fetch-pack: expose fsckObjects configuration logic Xing Xin via GitGitGadget
2024-06-11 6:42 ` [PATCH v5 3/4] unbundle: extend options to support object verification Xing Xin via GitGitGadget
2024-06-11 9:11 ` Patrick Steinhardt
2024-06-11 12:47 ` Xing Xin
2024-06-11 6:42 ` [PATCH v5 4/4] unbundle: use VERIFY_BUNDLE_FSCK_FOLLOW_FETCH for fetches Xing Xin via GitGitGadget
2024-06-11 12:45 ` [PATCH v6 0/3] object checking related additions and fixes for bundles in fetches blanet via GitGitGadget
2024-06-11 12:45 ` [PATCH v6 1/3] bundle-uri: verify oid before writing refs Xing Xin via GitGitGadget
2024-06-11 19:08 ` Junio C Hamano
2024-06-17 13:53 ` Xing Xin
2024-06-11 12:45 ` [PATCH v6 2/3] fetch-pack: expose fsckObjects configuration logic Xing Xin via GitGitGadget
2024-06-11 19:20 ` Junio C Hamano
2024-06-11 12:45 ` [PATCH v6 3/3] unbundle: support object verification for fetches Xing Xin via GitGitGadget
2024-06-11 20:05 ` Junio C Hamano
2024-06-12 18:33 ` Xing Xin
2024-06-11 13:14 ` [PATCH v6 0/3] object checking related additions and fixes for bundles in fetches Patrick Steinhardt
2024-06-17 13:55 ` [PATCH v7 " blanet via GitGitGadget
2024-06-17 13:55 ` [PATCH v7 1/3] bundle-uri: verify oid before writing refs Xing Xin via GitGitGadget
2024-06-18 17:37 ` Junio C Hamano
2024-06-19 6:30 ` Xing Xin
2024-06-17 13:55 ` [PATCH v7 2/3] fetch-pack: expose fsckObjects configuration logic Xing Xin via GitGitGadget
2024-06-17 13:55 ` [PATCH v7 3/3] unbundle: extend object verification for fetches Xing Xin via GitGitGadget
2024-06-19 4:07 ` [PATCH v8 0/3] object checking related additions and fixes for bundles in fetches blanet via GitGitGadget
2024-06-19 4:07 ` [PATCH v8 1/3] bundle-uri: verify oid before writing refs Xing Xin via GitGitGadget
2024-06-19 4:07 ` [PATCH v8 2/3] fetch-pack: expose fsckObjects configuration logic Xing Xin via GitGitGadget
2024-06-19 4:07 ` [PATCH v8 3/3] unbundle: extend object verification for fetches Xing Xin via GitGitGadget
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).