* git clone of empty repositories doesn't preserve hash @ 2023-04-05 10:28 Adam Majer 2023-04-05 19:04 ` Junio C Hamano 0 siblings, 1 reply; 58+ messages in thread From: Adam Majer @ 2023-04-05 10:28 UTC (permalink / raw) To: git Hi all, I've noticed while adding support for sha256 repositories for Gitea that, git init --bare --object-format=sha256 a git clone a b Then the repository b is initialized as default hash, so sha1. It seems that receive-pack will list the null OID in the header if there are no refs available, but the upload-pack doesn't list anything and hence the header with capabilities and the hash function is missing git receive-pack a git upload-pack a What is the right approach here? Could upload-pack send a NULL OID followed by header info that is then used by clone? There is a workaround to specify the hash via GIT_DEFAULT_HASH. Not ideal though. Thanks, Adam ^ permalink raw reply [flat|nested] 58+ messages in thread
* Re: git clone of empty repositories doesn't preserve hash 2023-04-05 10:28 git clone of empty repositories doesn't preserve hash Adam Majer @ 2023-04-05 19:04 ` Junio C Hamano 2023-04-05 19:47 ` Adam Majer 0 siblings, 1 reply; 58+ messages in thread From: Junio C Hamano @ 2023-04-05 19:04 UTC (permalink / raw) To: Adam Majer; +Cc: git Adam Majer <adamm@zombino.com> writes: > I've noticed while adding support for sha256 repositories for Gitea that, > > git init --bare --object-format=sha256 a > git clone a b > > Then the repository b is initialized as default hash, so sha1. It > seems that receive-pack will list the null OID in the header if there > are no refs available, but the upload-pack doesn't list anything and > hence the header with capabilities and the hash function is missing > > git receive-pack a > git upload-pack a > > What is the right approach here? Could upload-pack send a NULL OID > followed by header info that is then used by clone? Interesting. Does such a clone copy the name of the primary branch from the remote repository to the newly created repository? I recall we had seen such a feature request but offhand I do not recall how we solved it. The need to transmit a capability even when there is no concrete reference to be fetched to satisfy such a feature request should be the same as yours, so there may already be a good place to add that information. Or we may have ended up not solving the "what is the name of the primary branch in this empty repository?", in which case, the solution for this "what is the hash function to be used in this empty repository?" should be designed to be extensive enough to allow us later support that feature easily. Thanks for raising the issue. ^ permalink raw reply [flat|nested] 58+ messages in thread
* Re: git clone of empty repositories doesn't preserve hash 2023-04-05 19:04 ` Junio C Hamano @ 2023-04-05 19:47 ` Adam Majer 2023-04-05 20:01 ` Jeff King 0 siblings, 1 reply; 58+ messages in thread From: Adam Majer @ 2023-04-05 19:47 UTC (permalink / raw) To: Junio C Hamano; +Cc: git On 4/5/23 21:04, Junio C Hamano wrote: > Does such a clone copy the name of the primary branch from the > remote repository to the newly created repository? Yes it does. # git init -b maestro --object-format=sha256 a # git clone a b # cat b/.git/HEAD ref: refs/heads/maestro - Adam ^ permalink raw reply [flat|nested] 58+ messages in thread
* Re: git clone of empty repositories doesn't preserve hash 2023-04-05 19:47 ` Adam Majer @ 2023-04-05 20:01 ` Jeff King 2023-04-05 20:40 ` Junio C Hamano 0 siblings, 1 reply; 58+ messages in thread From: Jeff King @ 2023-04-05 20:01 UTC (permalink / raw) To: Adam Majer; +Cc: Junio C Hamano, git On Wed, Apr 05, 2023 at 09:47:55PM +0200, Adam Majer wrote: > On 4/5/23 21:04, Junio C Hamano wrote: > > Does such a clone copy the name of the primary branch from the > > remote repository to the newly created repository? > > Yes it does. > > # git init -b maestro --object-format=sha256 a > # git clone a b > # cat b/.git/HEAD > ref: refs/heads/maestro Yeah, we send a special capability line in that case. If you do: GIT_TRACE_PACKET=1 git clone a b you can see that upload-pack indicates that ls-refs understands the "unborn" capability: packet: upload-pack> version 2 packet: upload-pack> agent=git/2.40.0.824.g7b678b1f643 packet: upload-pack> ls-refs=unborn packet: upload-pack> fetch=shallow wait-for-done packet: upload-pack> server-option packet: upload-pack> object-format=sha256 packet: upload-pack> object-info packet: upload-pack> 0000 And then clone asks for it say "yes, I also understand unborn": packet: clone> command=ls-refs packet: clone> agent=git/2.40.0.824.g7b678b1f643 packet: clone> object-format=sha256 packet: clone> 0001 packet: clone> peel packet: clone> symrefs packet: clone> unborn packet: clone> ref-prefix HEAD packet: clone> ref-prefix refs/heads/ packet: clone> ref-prefix refs/tags/ packet: clone> 0000 And then upload-pack can send us the extra information: packet: upload-pack> unborn HEAD symref-target:refs/heads/maestro packet: upload-pack> 0000 I think we'd need to do something similar here for object-format. -Peff ^ permalink raw reply [flat|nested] 58+ messages in thread
* Re: git clone of empty repositories doesn't preserve hash 2023-04-05 20:01 ` Jeff King @ 2023-04-05 20:40 ` Junio C Hamano 2023-04-05 21:15 ` Junio C Hamano 2023-04-05 21:23 ` git clone of empty repositories doesn't preserve hash Jeff King 0 siblings, 2 replies; 58+ messages in thread From: Junio C Hamano @ 2023-04-05 20:40 UTC (permalink / raw) To: Jeff King; +Cc: Adam Majer, git Jeff King <peff@peff.net> writes: > Yeah, we send a special capability line in that case. If you do: > > GIT_TRACE_PACKET=1 git clone a b > > you can see that upload-pack indicates that ls-refs understands the > "unborn" capability: > > packet: upload-pack> version 2 > packet: upload-pack> agent=git/2.40.0.824.g7b678b1f643 > packet: upload-pack> ls-refs=unborn > packet: upload-pack> fetch=shallow wait-for-done > packet: upload-pack> server-option > packet: upload-pack> object-format=sha256 > packet: upload-pack> object-info > packet: upload-pack> 0000 > > And then clone asks for it say "yes, I also understand unborn": > > packet: clone> command=ls-refs > packet: clone> agent=git/2.40.0.824.g7b678b1f643 > packet: clone> object-format=sha256 > packet: clone> 0001 > packet: clone> peel > packet: clone> symrefs > packet: clone> unborn > packet: clone> ref-prefix HEAD > packet: clone> ref-prefix refs/heads/ > packet: clone> ref-prefix refs/tags/ > packet: clone> 0000 > > And then upload-pack can send us the extra information: > > packet: upload-pack> unborn HEAD symref-target:refs/heads/maestro > packet: upload-pack> 0000 > > I think we'd need to do something similar here for object-format. I guess we only need to touch "git clone" then. Without being asked, it advertsizes object-format=sha256 already, and when the maestro repository is prepared without --object-format=sha256, upload-pack advertises object-format=sha1 instead. So it probably is just the matter of capturing it and using it to populate the extensions.objectformat with an appropriate value. ^ permalink raw reply [flat|nested] 58+ messages in thread
* Re: git clone of empty repositories doesn't preserve hash 2023-04-05 20:40 ` Junio C Hamano @ 2023-04-05 21:15 ` Junio C Hamano 2023-04-05 21:26 ` Jeff King ` (3 more replies) 2023-04-05 21:23 ` git clone of empty repositories doesn't preserve hash Jeff King 1 sibling, 4 replies; 58+ messages in thread From: Junio C Hamano @ 2023-04-05 21:15 UTC (permalink / raw) To: Jeff King; +Cc: Adam Majer, git Junio C Hamano <gitster@pobox.com> writes: > I guess we only need to touch "git clone" then. Without being > asked, it advertsizes object-format=sha256 already, and when the > maestro repository is prepared without --object-format=sha256, > upload-pack advertises object-format=sha1 instead. So it probably > is just the matter of capturing it and using it to populate the > extensions.objectformat with an appropriate value. It turns out that there was a readily mimickable example in 3d8314f8 (clone: propagate empty remote HEAD even with other branches, 2022-07-07). The commit lifted code out of a block, in which we know we are copying from a non-empty repository, to execute also when talking with an empty repository. The recording of the hash-algorithm in the extensions section is done in the same way, so we can do the same "fix". ----- >8 ----- Subject: [PATCH] clone: propagate object-format when cloning from void A user could prepare an empty repository and set it to use SHA256 as the object format. The new repository created by "git clone" from such a repository however would not record that it is expecting objects in the same SHA256 format. This works as expected if the source repository is not empty. Just like we started copying the name of the primary branch from the remote repository even if it is unborn in 3d8314f8 (clone: propagate empty remote HEAD even with other branches, 2022-07-07), lift the code that records the object format out of the block executed only when cloning from an instantiated repository, so that it works also when cloning from an empty repository. Signed-off-by: Junio C Hamano <gitster@pobox.com> --- builtin/clone.c | 11 ++++++----- t/t5702-protocol-v2.sh | 11 +++++++++++ 2 files changed, 17 insertions(+), 5 deletions(-) diff --git c/builtin/clone.c w/builtin/clone.c index 462c286274..8f16d18a43 100644 --- c/builtin/clone.c +++ w/builtin/clone.c @@ -910,6 +910,7 @@ int cmd_clone(int argc, const char **argv, const char *prefix) int err = 0, complete_refs_before_fetch = 1; int submodule_progress; int filter_submodules = 0; + int hash_algo; struct transport_ls_refs_options transport_ls_refs_options = TRANSPORT_LS_REFS_OPTIONS_INIT; @@ -1298,15 +1299,15 @@ int cmd_clone(int argc, const char **argv, const char *prefix) } } - if (mapped_refs) { - int hash_algo = hash_algo_by_ptr(transport_get_hash_algo(transport)); - /* * Now that we know what algorithm the remote side is using, * let's set ours to the same thing. */ - initialize_repository_version(hash_algo, 1); - repo_set_hash_algo(the_repository, hash_algo); + hash_algo = hash_algo_by_ptr(transport_get_hash_algo(transport)); + initialize_repository_version(hash_algo, 1); + repo_set_hash_algo(the_repository, hash_algo); + + if (mapped_refs) { /* * transport_get_remote_refs() may return refs with null sha-1 * in mapped_refs (see struct transport->get_refs_list diff --git c/t/t5702-protocol-v2.sh w/t/t5702-protocol-v2.sh index 71aabe30b7..6af5c2062f 100755 --- c/t/t5702-protocol-v2.sh +++ w/t/t5702-protocol-v2.sh @@ -269,6 +269,17 @@ test_expect_success 'clone propagates unborn HEAD from non-empty repo' ' grep "warning: remote HEAD refers to nonexistent ref" stderr ' +test_expect_success 'clone propagates object-format from empty repo' ' + test_when_finished "rm -fr src256 dst256" && + + echo sha256 >expect && + git init --object-format=sha256 src256 && + git clone src256 dst256 && + git -C dst256 rev-parse --show-object-format >actual && + + test_cmp expect actual +' + test_expect_success 'bare clone propagates unborn HEAD from non-empty repo' ' test_when_finished "rm -rf file_unborn_parent file_unborn_child.git" && ^ permalink raw reply related [flat|nested] 58+ messages in thread
* Re: git clone of empty repositories doesn't preserve hash 2023-04-05 21:15 ` Junio C Hamano @ 2023-04-05 21:26 ` Jeff King 2023-04-05 22:48 ` brian m. carlson ` (2 subsequent siblings) 3 siblings, 0 replies; 58+ messages in thread From: Jeff King @ 2023-04-05 21:26 UTC (permalink / raw) To: Junio C Hamano; +Cc: Adam Majer, git On Wed, Apr 05, 2023 at 02:15:33PM -0700, Junio C Hamano wrote: > ----- >8 ----- > Subject: [PATCH] clone: propagate object-format when cloning from void > > A user could prepare an empty repository and set it to use SHA256 as > the object format. The new repository created by "git clone" from > such a repository however would not record that it is expecting > objects in the same SHA256 format. This works as expected if the > source repository is not empty. > > Just like we started copying the name of the primary branch from the > remote repository even if it is unborn in 3d8314f8 (clone: propagate > empty remote HEAD even with other branches, 2022-07-07), lift the > code that records the object format out of the block executed only > when cloning from an instantiated repository, so that it works also > when cloning from an empty repository. Heh, our mails just crossed, but I stumbled upon the same patch. So yes, this looks good to me. I suspect that setting the flag in the_repository might not matter, since we do not have any refs or objects to manipulate, and what we care about here is that initialize_repository_version() is run. But I agree that the two conceptually belong together, and hoisting both out of the block is the right thing to do. > diff --git c/t/t5702-protocol-v2.sh w/t/t5702-protocol-v2.sh > index 71aabe30b7..6af5c2062f 100755 > --- c/t/t5702-protocol-v2.sh > +++ w/t/t5702-protocol-v2.sh > @@ -269,6 +269,17 @@ test_expect_success 'clone propagates unborn HEAD from non-empty repo' ' > grep "warning: remote HEAD refers to nonexistent ref" stderr > ' > > +test_expect_success 'clone propagates object-format from empty repo' ' > + test_when_finished "rm -fr src256 dst256" && > + > + echo sha256 >expect && > + git init --object-format=sha256 src256 && > + git clone src256 dst256 && > + git -C dst256 rev-parse --show-object-format >actual && > + > + test_cmp expect actual > +' And this test looks straightforward. Very nice. -Peff ^ permalink raw reply [flat|nested] 58+ messages in thread
* Re: git clone of empty repositories doesn't preserve hash 2023-04-05 21:15 ` Junio C Hamano 2023-04-05 21:26 ` Jeff King @ 2023-04-05 22:48 ` brian m. carlson 2023-04-06 13:11 ` Adam Majer 2023-04-25 21:35 ` brian m. carlson 3 siblings, 0 replies; 58+ messages in thread From: brian m. carlson @ 2023-04-05 22:48 UTC (permalink / raw) To: Junio C Hamano; +Cc: Jeff King, Adam Majer, git [-- Attachment #1: Type: text/plain, Size: 1106 bytes --] On 2023-04-05 at 21:15:33, Junio C Hamano wrote: > A user could prepare an empty repository and set it to use SHA256 as > the object format. The new repository created by "git clone" from > such a repository however would not record that it is expecting > objects in the same SHA256 format. This works as expected if the > source repository is not empty. > > Just like we started copying the name of the primary branch from the > remote repository even if it is unborn in 3d8314f8 (clone: propagate > empty remote HEAD even with other branches, 2022-07-07), lift the > code that records the object format out of the block executed only > when cloning from an instantiated repository, so that it works also > when cloning from an empty repository. Yeah, this looks like the right thing to do. I know this did work originally, at least for protocol v2, but I may have neglected to add a test when I wrote it and it regressed. Thanks for the patch, which I think is obviously correct, and adding a test for this case. -- brian m. carlson (he/him or they/them) Toronto, Ontario, CA [-- Attachment #2: signature.asc --] [-- Type: application/pgp-signature, Size: 263 bytes --] ^ permalink raw reply [flat|nested] 58+ messages in thread
* Re: git clone of empty repositories doesn't preserve hash 2023-04-05 21:15 ` Junio C Hamano 2023-04-05 21:26 ` Jeff King 2023-04-05 22:48 ` brian m. carlson @ 2023-04-06 13:11 ` Adam Majer 2023-04-25 21:35 ` brian m. carlson 3 siblings, 0 replies; 58+ messages in thread From: Adam Majer @ 2023-04-06 13:11 UTC (permalink / raw) To: Junio C Hamano, Jeff King; +Cc: git On 4/5/23 23:15, Junio C Hamano wrote: > Subject: [PATCH] clone: propagate object-format when cloning from void > Thank you, this fixes this issue. - Adam ^ permalink raw reply [flat|nested] 58+ messages in thread
* Re: git clone of empty repositories doesn't preserve hash 2023-04-05 21:15 ` Junio C Hamano ` (2 preceding siblings ...) 2023-04-06 13:11 ` Adam Majer @ 2023-04-25 21:35 ` brian m. carlson 2023-04-25 22:24 ` Junio C Hamano 2023-04-25 23:12 ` Junio C Hamano 3 siblings, 2 replies; 58+ messages in thread From: brian m. carlson @ 2023-04-25 21:35 UTC (permalink / raw) To: Junio C Hamano; +Cc: Jeff King, Adam Majer, git [-- Attachment #1: Type: text/plain, Size: 2116 bytes --] On 2023-04-05 at 21:15:33, Junio C Hamano wrote: > Junio C Hamano <gitster@pobox.com> writes: > ----- >8 ----- > Subject: [PATCH] clone: propagate object-format when cloning from void > > A user could prepare an empty repository and set it to use SHA256 as > the object format. The new repository created by "git clone" from > such a repository however would not record that it is expecting > objects in the same SHA256 format. This works as expected if the > source repository is not empty. > > Just like we started copying the name of the primary branch from the > remote repository even if it is unborn in 3d8314f8 (clone: propagate > empty remote HEAD even with other branches, 2022-07-07), lift the > code that records the object format out of the block executed only > when cloning from an instantiated repository, so that it works also > when cloning from an empty repository. > > Signed-off-by: Junio C Hamano <gitster@pobox.com> So it looks like this has made it into master and works for v2 but breaks things for v0 and v1. I noticed because the Git LFS testsuite is broken and the following test demonstrates it: ----- %< ----- diff --git a/t/t5700-protocol-v1.sh b/t/t5700-protocol-v1.sh index 6c8d4c6cf1..3be5057579 100755 --- a/t/t5700-protocol-v1.sh +++ b/t/t5700-protocol-v1.sh @@ -244,6 +244,17 @@ test_expect_success 'push with ssh:// using protocol v1' ' grep "push< version 1" log ' +test_expect_success 'clone propagates object-format from empty repo' ' + test_when_finished "rm -fr src256 dst256" && + + echo sha256 >expect && + git init --object-format=sha256 src256 && + git clone src256 dst256 && + git -C dst256 rev-parse --show-object-format >actual && + + test_cmp expect actual +' + # Test protocol v1 with 'http://' transport # . "$TEST_DIRECTORY"/lib-httpd.sh ----- %< ----- If nobody looks at this, I'll take a look tomorrow and hopefully send a patch. I just wanted to point this out to the list right away in the interest of getting it noticed. -- brian m. carlson (he/him or they/them) Toronto, Ontario, CA [-- Attachment #2: signature.asc --] [-- Type: application/pgp-signature, Size: 263 bytes --] ^ permalink raw reply related [flat|nested] 58+ messages in thread
* Re: git clone of empty repositories doesn't preserve hash 2023-04-25 21:35 ` brian m. carlson @ 2023-04-25 22:24 ` Junio C Hamano 2023-04-25 23:12 ` Junio C Hamano 1 sibling, 0 replies; 58+ messages in thread From: Junio C Hamano @ 2023-04-25 22:24 UTC (permalink / raw) To: brian m. carlson; +Cc: Jeff King, Adam Majer, git "brian m. carlson" <sandals@crustytoothpaste.net> writes: > So it looks like this has made it into master and works for v2 but > breaks things for v0 and v1. I noticed because the Git LFS testsuite is > broken and the following test demonstrates it: > > ----- %< ----- > diff --git a/t/t5700-protocol-v1.sh b/t/t5700-protocol-v1.sh > index 6c8d4c6cf1..3be5057579 100755 > --- a/t/t5700-protocol-v1.sh > +++ b/t/t5700-protocol-v1.sh > @@ -244,6 +244,17 @@ test_expect_success 'push with ssh:// using protocol v1' ' > grep "push< version 1" log > ' > > +test_expect_success 'clone propagates object-format from empty repo' ' > + test_when_finished "rm -fr src256 dst256" && > + > + echo sha256 >expect && > + git init --object-format=sha256 src256 && > + git clone src256 dst256 && > + git -C dst256 rev-parse --show-object-format >actual && > + > + test_cmp expect actual > +' > + > # Test protocol v1 with 'http://' transport > # > . "$TEST_DIRECTORY"/lib-httpd.sh > ----- %< ----- Interesting. I applied - the above addition to the t/t5700 - reverse of 96f4113a (Merge branch 'jc/clone-object-format-from-void', 2023-04-11), but only the part outside t/ on top of today's 'master', and tried to run t5700 and t5702 (the latter has new tests added by 96f4113a to protect the change in the topic from future breakage for v2). It seems both t5700 and t5702 fails. The latter failing is very much expected; the code change reverted by the above experiment is what made it work in the first place. But the former not working, after reverting the change, is totally unexpected---doesn't it mean that the change in question does not have much to do in the breakage? In fact, the new test to 5700 applied on top of v2.40.1 fails, too. Or are you complaining that the fix merged to 'master' only covers the current protocol and not historical v0/v1 protocol? I couldn't tell, and didn't get that impression from the phrasing "breaks", implying whatever that was "broken" used to be working. Puzzled. ^ permalink raw reply [flat|nested] 58+ messages in thread
* Re: git clone of empty repositories doesn't preserve hash 2023-04-25 21:35 ` brian m. carlson 2023-04-25 22:24 ` Junio C Hamano @ 2023-04-25 23:12 ` Junio C Hamano 2023-04-26 0:20 ` brian m. carlson 2023-04-26 10:51 ` Jeff King 1 sibling, 2 replies; 58+ messages in thread From: Junio C Hamano @ 2023-04-25 23:12 UTC (permalink / raw) To: brian m. carlson; +Cc: Jeff King, Adam Majer, git "brian m. carlson" <sandals@crustytoothpaste.net> writes: > If nobody looks at this, I'll take a look tomorrow and hopefully send a > patch. I just wanted to point this out to the list right away in the > interest of getting it noticed. Thanks. The topic in question has been in 'master' for 2 weeks, since it was merged at 96f4113a (Merge branch 'jc/clone-object-format-from-void', 2023-04-11), and as I said, what your test demonstrates is not a regression caused by the topic but "was broken, did not get addressed, is still broken". So it does not sound like it needs "right away" kind of attention. > +test_expect_success 'clone propagates object-format from empty repo' ' > + test_when_finished "rm -fr src256 dst256" && > + > + echo sha256 >expect && > + git init --object-format=sha256 src256 && > + git clone src256 dst256 && This needs to be at least "git clone --no-local" for it to be v0/v1 protocol test. It is testing the local optimization codepath. We could peek the original repository and copy the hash function name to the target as part of the local cloning. That obviously is outside the scope of the earlier fix that worked only at the protocol level. And I think even with "--no-local" to make it about v0/v1 protocol, the outcome is still pretty much expected. If we make the above command line to GIT_TRACE_PACKET=1 \ git clone --no-local src256 dst256 && to clone over the on-the-wire protocol, then we see Cloning into 'dst256'... packet: upload-pack> 0000 packet: clone< 0000 warning: You appear to have cloned an empty repository. packet: clone> 0000 packet: upload-pack< 0000 --- expect 2023-04-25 22:55:39.771850195 +0000 ... in the output of "sh t5700-*.sh -i -v". Without any ref, v0/v1 can not carry any capabilities, because there is no ref information to tuck the capabilities on. I unfortunately doubt that any solution would exist that does not break compatibility with the deployed clients that expect the current v0/v1. Thanks. ^ permalink raw reply [flat|nested] 58+ messages in thread
* Re: git clone of empty repositories doesn't preserve hash 2023-04-25 23:12 ` Junio C Hamano @ 2023-04-26 0:20 ` brian m. carlson 2023-04-26 11:25 ` Jeff King 2023-04-26 10:51 ` Jeff King 1 sibling, 1 reply; 58+ messages in thread From: brian m. carlson @ 2023-04-26 0:20 UTC (permalink / raw) To: Junio C Hamano; +Cc: Jeff King, Adam Majer, git [-- Attachment #1: Type: text/plain, Size: 1687 bytes --] On 2023-04-25 at 23:12:15, Junio C Hamano wrote: > "brian m. carlson" <sandals@crustytoothpaste.net> writes: > > > If nobody looks at this, I'll take a look tomorrow and hopefully send a > > patch. I just wanted to point this out to the list right away in the > > interest of getting it noticed. > > Thanks. > > The topic in question has been in 'master' for 2 weeks, since it was > merged at 96f4113a (Merge branch 'jc/clone-object-format-from-void', > 2023-04-11), and as I said, what your test demonstrates is not a > regression caused by the topic but "was broken, did not get > addressed, is still broken". So it does not sound like it needs > "right away" kind of attention. In my case, the clone is over HTTP, so this may not be the ideal way to reproduce it and it may need a better testcase, but it does bisect to the patch above and it is new in master (and doesn't reproduce in 2.40.0). Note that in our case in the Git LFS testsuite, we're using GIT_DEFAULT_HASH=sha256. I believe what is happening is that for some reason, the object-format data in v0 and v1 is not being read properly, and so we're now setting it to sha1 whereas before we were reading the value from the default setting of the repository (sha256). It very well may be that it's always been broken and this has just made it obvious that it's broken, but I'll look tomorrow and probably send a patch. I don't think we should revert this change, but I do think we need to fix it before 2.41, since I think it means right now that all clones over protocol v0 and v1 end up with a SHA-1 repository. -- brian m. carlson (he/him or they/them) Toronto, Ontario, CA [-- Attachment #2: signature.asc --] [-- Type: application/pgp-signature, Size: 263 bytes --] ^ permalink raw reply [flat|nested] 58+ messages in thread
* Re: git clone of empty repositories doesn't preserve hash 2023-04-26 0:20 ` brian m. carlson @ 2023-04-26 11:25 ` Jeff King 2023-04-26 15:08 ` Junio C Hamano 0 siblings, 1 reply; 58+ messages in thread From: Jeff King @ 2023-04-26 11:25 UTC (permalink / raw) To: brian m. carlson; +Cc: Junio C Hamano, Adam Majer, git On Wed, Apr 26, 2023 at 12:20:00AM +0000, brian m. carlson wrote: > In my case, the clone is over HTTP, so this may not be the ideal way to > reproduce it and it may need a better testcase, but it does bisect to > the patch above and it is new in master (and doesn't reproduce in > 2.40.0). Note that in our case in the Git LFS testsuite, we're using > GIT_DEFAULT_HASH=sha256. > > I believe what is happening is that for some reason, the object-format > data in v0 and v1 is not being read properly, and so we're now setting > it to sha1 whereas before we were reading the value from the default > setting of the repository (sha256). I'm having trouble finding any breakage at all. E.g., this test passes: diff --git a/t/t5551-http-fetch-smart.sh b/t/t5551-http-fetch-smart.sh index 0908534f25..95b10288e7 100755 --- a/t/t5551-http-fetch-smart.sh +++ b/t/t5551-http-fetch-smart.sh @@ -704,4 +704,24 @@ test_expect_success 'no empty path components' ' ! grep "//" log ' +test_expect_success 'v0 clone over http recognizes object-format' ' + git init --bare --object-format=sha256 \ + "$HTTPD_DOCUMENT_ROOT_PATH/sha256.git" && + + # do not test an empty repo. In v0, we have no way for an + # empty server to report its object format, so we would + # always default to sha1. We could in theory test that + # a client who wants to default to sha256 will realize + # the other side is sha1, but we have no way to set that local + # default. Unlike git-init, git-clone does not support + # --object-format, nor GIT_DEFAULT_HASH. + git -C "$HTTPD_DOCUMENT_ROOT_PATH/sha256.git" --work-tree=. \ + commit --allow-empty -m foo && + + git -c protocol.version=0 clone $HTTPD_URL/smart/sha256.git sha256 && + git -C sha256 rev-parse --show-object-format >actual && + echo sha256 >expect && + test_cmp expect actual +' + test_done I'd expect it to break in the empty-repo case, for the reasons given in the comment (v0 cannot communicate object-format in an empty repo). But that is nothing new. It sounds from your description that your test is running in a mode where the client defaults to sha256 (though I'm not sure how, since we explicitly document that GIT_DEFAULT_HASH should not affect clone), and then you clone an empty sha256 repository via v0, expecting the result to be sha256. But I think that is a wrong expectation, at least from the client's perspective. An empty repository cannot communicate its object-format over v0, so the client should assume its v0, and should then itself become v0. And that last "should itself become" is what Junio's patch fixed. The first part, "empty repository cannot communicate its object-format over v0" is the part is "it's always been broken". We could fix it, but I'm not sure if it is worth the trouble (see my other message). > It very well may be that it's always been broken and this has just made > it obvious that it's broken, but I'll look tomorrow and probably send a > patch. I don't think we should revert this change, but I do think we > need to fix it before 2.41, since I think it means right now that all > clones over protocol v0 and v1 end up with a SHA-1 repository. Hopefully my guess at what your test is doing is correct, and I didn't just leave us off on a tangent. ;) But if it is, then I think that everything in Git is OK. Non-empty repos over v0 work correctly both before and after Junio's patch. Empty ones before his patch were erroneously using sha256 if they preferred it locally, even when the other side really was sha1. They _also_ were using sha256 erroneously when the other side was sha256 but wasn't able to report it (because of v0 limitations). Which is counter-intuitive, perhaps, but was still the wrong thing for a client to do. -Peff ^ permalink raw reply related [flat|nested] 58+ messages in thread
* Re: git clone of empty repositories doesn't preserve hash 2023-04-26 11:25 ` Jeff King @ 2023-04-26 15:08 ` Junio C Hamano 2023-04-26 15:13 ` [PATCH] doc: GIT_DEFAULT_HASH is and will be ignored during "clone" Junio C Hamano 2023-04-27 4:46 ` git clone of empty repositories doesn't preserve hash Jeff King 0 siblings, 2 replies; 58+ messages in thread From: Junio C Hamano @ 2023-04-26 15:08 UTC (permalink / raw) To: brian m. carlson, Jeff King; +Cc: Adam Majer, git Jeff King <peff@peff.net> writes: > It sounds from your description that your test is running in a mode > where the client defaults to sha256 (though I'm not sure how, since we > explicitly document that GIT_DEFAULT_HASH should not affect clone), and > then you clone an empty sha256 repository via v0, expecting the result > to be sha256. Thanks for coming up with an excellent guess that helped come up with a reproduction. With Brian's patch a bit tweaked (attached below), the test does fail with the current 'master' and passes before the merge in question. And the trace clearly shows that without being told anything about the object format via the capability, the client chooses to honor GIT_DEFAULT_HASH to initialize the new repository with sha256. There is this entry in the release notes of 2.29.0: * "git clone" that clones from SHA-1 repository, while GIT_DEFAULT_HASH set to use SHA-256 already, resulted in an unusable repository that half-claims to be SHA-256 repository with SHA-1 objects and refs. This has been corrected. This refers to 47ac9703 (builtin/clone: avoid failure with GIT_DEFAULT_HASH, 2020-09-20). It seems that the "fix" described there was incomplete and did not address "clone a void" case, and the patch under discussion finished the incomplete fix without knowing by accident. The test that was added by 47ac9703 to t5601 does use non-empty repositories (one with SHA-1, the other with SHA-256) and tries to check the interaction with "clone" with the environment variable. With the patch under discussion, this test did not break and "clone" is still ignoring the GIT_DEFAULT_HASH variable. Also the description in the document of GIT_DEFAULT_HASH makes readers ambivalent: `GIT_DEFAULT_HASH`:: If this variable is set, the default hash algorithm for new repositories will be set to this value. This value is currently ignored when cloning; the setting of the remote repository is used instead. The default is "sha1". THIS VARIABLE IS EXPERIMENTAL! See `--object-format` in linkgit:git-init[1]. To me, "is currently ignored" hints that the author, or somebody close to the author, of this entry feels that it is a bug that "clone" does not honor it. The log message of 47ac9703 also phrases "failed to honor the GIT_DEFAULT_HASH" as if it were a bad thing [*]. On the other hand, it is clearly documented as experimental so anything that has been relying on the behaviour of any command with a particular value set to this variable are waiting to be broken. I am torn about this. Even though we may have been very clear that GIT_DEFAULT_HASH should not kick in in this case, "clone" was buggy (in the sense that it did not behave as documented in an empty repository) and whatever thing that changed behaviour that Brian noticed was ignoring the documentation and taking advantage of that "bug", and Hyrum's law ensues. In the longer term, after/when we allow incremental/over-the-wire migration of object-format, i.e. clone from SHA-1 repository to create SHA-256 repository (or vice versa) and fetching and pushing between them would bidirectionally convert the object format on the fly, it is likely we would teach a new option "--object-format" to "git clone" to say "you would use whatever object format the origin uses by default, but this time, I am telling you to use this format on our side, doing on-the-fly object format conversion as needed". So it would be OK to make sure that GIT_DEFAULT_HASH is ignored by "clone" as documented now, and even after on-the-fly object-format migration is implemented, I would think. [Footnote] * ... but I think it was misguided. What it says there is this: And we also don't want to initialize the repository as SHA-1 initially, since that means if we're cloning an empty repository, we'll have failed to honor the GIT_DEFAULT_HASH variable and will end up with a SHA-1 repository, not a SHA-256 repository. If this were "When we're cloning an empty repository, we'd have failed to honor the object format the other side has chosen and will end up with a SHA-1 repository, not a SHA-256 repository.", then it is very much in line with the reality before the patch under discussion and also in line with the official stance that "clone" should not honor GIT_DEFAULT_HASH. Where the original description breaks down is when the other side is SHA-1 and this side has GIT_DEFAULT_HASH set to SHA-256. If we honored the variable, we'd create a SHA-256 repository that will talk to SHA-1 repository before the rest of the system is ready. t/t5700-protocol-v1.sh | 13 +++++++++++++ 1 file changed, 13 insertions(+) diff --git i/t/t5700-protocol-v1.sh w/t/t5700-protocol-v1.sh index 6c8d4c6cf1..2f39dd2e05 100755 --- i/t/t5700-protocol-v1.sh +++ w/t/t5700-protocol-v1.sh @@ -244,6 +244,19 @@ test_expect_success 'push with ssh:// using protocol v1' ' grep "push< version 1" log ' +test_expect_success 'clone propagates object-format from empty repo' ' + test_when_finished "rm -fr src256 dst256" && + + echo sha256 >expect && + git init --object-format=sha256 src256 && + GIT_DEFAULT_HASH=sha256 \ + GIT_TRACE_PACKET=1 \ + git clone --no-local src256 dst256 && + git -C dst256 rev-parse --show-object-format >actual && + + test_cmp expect actual +' + # Test protocol v1 with 'http://' transport # . "$TEST_DIRECTORY"/lib-httpd.sh ^ permalink raw reply related [flat|nested] 58+ messages in thread
* [PATCH] doc: GIT_DEFAULT_HASH is and will be ignored during "clone" 2023-04-26 15:08 ` Junio C Hamano @ 2023-04-26 15:13 ` Junio C Hamano 2023-04-26 21:06 ` brian m. carlson 2023-04-27 4:46 ` git clone of empty repositories doesn't preserve hash Jeff King 1 sibling, 1 reply; 58+ messages in thread From: Junio C Hamano @ 2023-04-26 15:13 UTC (permalink / raw) To: git; +Cc: brian m. carlson, Jeff King, Adam Majer The phrasing "is currently ignored" was prone to be misinterpreted as if we were wishing if it were honored. Rephrase it to make it clear that the experimental variable will be ignored. In the longer term, after/when we allow incremental/over-the-wire migration of the object-format, i.e. cloning from an SHA-1 repository to create an SHA-256 repository (or vice versa) and fetching and pushing between them would bidirectionally convert the object format on the fly, it is likely that we would teach a new option "--object-format" to "git clone" to say "you would use whatever object format the origin uses by default, but this time, I am telling you to use this format on our side, doing on-the-fly object format conversion as needed". So it is perfectly OK to ignore the settings of this experimental variable, even after such an extension happens that makes it necessary for us to have a way to create a new repository that uses different object format from the origin repository. Signed-off-by: Junio C Hamano <gitster@pobox.com> --- * An obvious follow-up to the previous discussion. Documentation/git.txt | 6 +++--- 1 file changed, 3 insertions(+), 3 deletions(-) diff --git c/Documentation/git.txt w/Documentation/git.txt index 74973d3cc4..54b043899f 100644 --- c/Documentation/git.txt +++ w/Documentation/git.txt @@ -546,9 +546,9 @@ double-quotes and respecting backslash escapes. E.g., the value `GIT_DEFAULT_HASH`:: If this variable is set, the default hash algorithm for new - repositories will be set to this value. This value is currently - ignored when cloning; the setting of the remote repository - is used instead. The default is "sha1". THIS VARIABLE IS + repositories will be set to this value. This value is + ignored when cloning and the setting of the remote repository + is always used. The default is "sha1". THIS VARIABLE IS EXPERIMENTAL! See `--object-format` in linkgit:git-init[1]. Git Commits ^ permalink raw reply related [flat|nested] 58+ messages in thread
* Re: [PATCH] doc: GIT_DEFAULT_HASH is and will be ignored during "clone" 2023-04-26 15:13 ` [PATCH] doc: GIT_DEFAULT_HASH is and will be ignored during "clone" Junio C Hamano @ 2023-04-26 21:06 ` brian m. carlson 0 siblings, 0 replies; 58+ messages in thread From: brian m. carlson @ 2023-04-26 21:06 UTC (permalink / raw) To: Junio C Hamano; +Cc: git, Jeff King, Adam Majer [-- Attachment #1: Type: text/plain, Size: 1429 bytes --] On 2023-04-26 at 15:13:55, Junio C Hamano wrote: > The phrasing "is currently ignored" was prone to be misinterpreted > as if we were wishing if it were honored. Rephrase it to make it > clear that the experimental variable will be ignored. > > In the longer term, after/when we allow incremental/over-the-wire > migration of the object-format, i.e. cloning from an SHA-1 > repository to create an SHA-256 repository (or vice versa) and > fetching and pushing between them would bidirectionally convert the > object format on the fly, it is likely that we would teach a new > option "--object-format" to "git clone" to say "you would use > whatever object format the origin uses by default, but this time, I > am telling you to use this format on our side, doing on-the-fly > object format conversion as needed". So it is perfectly OK to > ignore the settings of this experimental variable, even after such > an extension happens that makes it necessary for us to have a way to > create a new repository that uses different object format from the > origin repository. I have a different proposal which clarifies when it will and will not be honoured in my series. I think we would want to honour this variable once we have SHA-1 and SHA-256 interop, and can convert on the fly, so I think keeping the "currently" here is a good idea. -- brian m. carlson (he/him or they/them) Toronto, Ontario, CA [-- Attachment #2: signature.asc --] [-- Type: application/pgp-signature, Size: 263 bytes --] ^ permalink raw reply [flat|nested] 58+ messages in thread
* Re: git clone of empty repositories doesn't preserve hash 2023-04-26 15:08 ` Junio C Hamano 2023-04-26 15:13 ` [PATCH] doc: GIT_DEFAULT_HASH is and will be ignored during "clone" Junio C Hamano @ 2023-04-27 4:46 ` Jeff King 1 sibling, 0 replies; 58+ messages in thread From: Jeff King @ 2023-04-27 4:46 UTC (permalink / raw) To: Junio C Hamano; +Cc: brian m. carlson, Adam Majer, git On Wed, Apr 26, 2023 at 08:08:28AM -0700, Junio C Hamano wrote: > Jeff King <peff@peff.net> writes: > > > It sounds from your description that your test is running in a mode > > where the client defaults to sha256 (though I'm not sure how, since we > > explicitly document that GIT_DEFAULT_HASH should not affect clone), and > > then you clone an empty sha256 repository via v0, expecting the result > > to be sha256. > > Thanks for coming up with an excellent guess that helped come up > with a reproduction. > > With Brian's patch a bit tweaked (attached below), the test does > fail with the current 'master' and passes before the merge in > question. And the trace clearly shows that without being told > anything about the object format via the capability, the client > chooses to honor GIT_DEFAULT_HASH to initialize the new repository > with sha256. Ah, OK. The fact that we do respect GIT_DEFAULT_HASH there solves my remaining confusion. And in retrospect it makes sense, because clone is going to use the equivalent of "git init" under the hood. So we'll start with _something_, and then adjust it based on what we read from the other side. And that "something" respects GIT_DEFAULT_HASH. Sort of a side note: This behavior is somewhat due to the fact that clone is implemented as "init, then fetch into the new repository". It could also conceptually be "talk to the remote, then init, then actually fetch". It never mattered before, but with options like object-format, the "init" step may be affected by things the remote said. So in our world, clone has to "fix up" parts of the repository that were initialized by tweaking the object-format config of the live repo. In a world where the very first thing it did was talk to the remote, then it would pass the option along to init in the first place. It's probably not worth trying to re-architect clone, though (not just in terms of work, but who knows what other subtle assumptions are baked into the current ordering). And it's not like it _solves_ this issue. It just might have made finding it a little less confusing. > [Footnote] > > * ... but I think it was misguided. What it says there is this: > > And we also don't want to initialize the repository as SHA-1 > initially, since that means if we're cloning an empty > repository, we'll have failed to honor the GIT_DEFAULT_HASH > variable and will end up with a SHA-1 repository, not a SHA-256 > repository. > > If this were "When we're cloning an empty repository, we'd have > failed to honor the object format the other side has chosen and will > end up with a SHA-1 repository, not a SHA-256 repository.", then it > is very much in line with the reality before the patch under > discussion and also in line with the official stance that "clone" > should not honor GIT_DEFAULT_HASH. > > Where the original description breaks down is when the other side is > SHA-1 and this side has GIT_DEFAULT_HASH set to SHA-256. If we > honored the variable, we'd create a SHA-256 repository that will > talk to SHA-1 repository before the rest of the system is ready. Exactly. We are (and always have been) wrong 50% of the time in an empty repo for v0. Your recent patch fixed v2 in an empty repo, but in doing so flipped which halves of v0 were wrong. -Peff ^ permalink raw reply [flat|nested] 58+ messages in thread
* Re: git clone of empty repositories doesn't preserve hash 2023-04-25 23:12 ` Junio C Hamano 2023-04-26 0:20 ` brian m. carlson @ 2023-04-26 10:51 ` Jeff King 2023-04-26 15:42 ` Junio C Hamano 2023-04-26 20:40 ` brian m. carlson 1 sibling, 2 replies; 58+ messages in thread From: Jeff King @ 2023-04-26 10:51 UTC (permalink / raw) To: Junio C Hamano; +Cc: brian m. carlson, Adam Majer, git On Tue, Apr 25, 2023 at 04:12:15PM -0700, Junio C Hamano wrote: > And I think even with "--no-local" to make it about v0/v1 protocol, > the outcome is still pretty much expected. If we make the above > command line to > > GIT_TRACE_PACKET=1 \ > git clone --no-local src256 dst256 && > > to clone over the on-the-wire protocol, then we see > > Cloning into 'dst256'... > packet: upload-pack> 0000 > packet: clone< 0000 > warning: You appear to have cloned an empty repository. > packet: clone> 0000 > packet: upload-pack< 0000 > --- expect 2023-04-25 22:55:39.771850195 +0000 > ... > > in the output of "sh t5700-*.sh -i -v". Without any ref, v0/v1 can > not carry any capabilities, because there is no ref information to > tuck the capabilities on. > > I unfortunately doubt that any solution would exist that does not > break compatibility with the deployed clients that expect the > current v0/v1. We could send a capabilities^{} line, which Git has supported on the client side since eb398797cd (connect: advertized capability is not a ref, 2016-09-09). So sending it should not break even old clients (though we would have to check what alternate implementations like libgit2 or dulwich do; we know JGit supports it). However, the object-format support here was broken until the very recent 13e67aa39b (v0 protocol: fix sha1/sha256 confusion for capabilities^{}, 2023-04-14), so it would only be useful going forward (before then we'd die(), but maybe that is preferable to having the wrong object format?). I'm not sure it's worth the effort, though. If you want to use sha256 everywhere and tell the other side about it, you need a modern client anyway, and that means the ability to speak v2. So this would only matter if for some reason the v2 probe was being ignored (e.g., proxies eating it, ssh refusing environment variable, etc), which itself are things that ideally would be fixed (and can maybe one day even go away if we optimistically default to v2). -Peff ^ permalink raw reply [flat|nested] 58+ messages in thread
* Re: git clone of empty repositories doesn't preserve hash 2023-04-26 10:51 ` Jeff King @ 2023-04-26 15:42 ` Junio C Hamano 2023-04-26 20:40 ` brian m. carlson 1 sibling, 0 replies; 58+ messages in thread From: Junio C Hamano @ 2023-04-26 15:42 UTC (permalink / raw) To: Jeff King; +Cc: brian m. carlson, Adam Majer, git Jeff King <peff@peff.net> writes: > We could send a capabilities^{} line, which Git has supported on the > client side since eb398797cd (connect: advertized capability is not a > ref, 2016-09-09). So sending it should not break even old clients > (though we would have to check what alternate implementations like > libgit2 or dulwich do; we know JGit supports it). Ah, I forgot all about that JGit workaround. Yes, we can exploit it, and any implementation that does not understand it correctly (including git before 13e67aa3 (v0 protocol: fix sha1/sha256 confusion for capabilities^{}, 2023-04-14)) does not work with JGit when cloning an empty repository anyway, so it is not all that bad. But I tend to agree with your conclusion that it may be an update for the sake of completeness to retrofit v0/v1 on the serving side. It would not help any real-world use cases all that much. Thanks. ^ permalink raw reply [flat|nested] 58+ messages in thread
* Re: git clone of empty repositories doesn't preserve hash 2023-04-26 10:51 ` Jeff King 2023-04-26 15:42 ` Junio C Hamano @ 2023-04-26 20:40 ` brian m. carlson 2023-04-26 20:53 ` [PATCH 0/2] Fix empty SHA-256 clones with v0 and v1 brian m. carlson ` (3 more replies) 1 sibling, 4 replies; 58+ messages in thread From: brian m. carlson @ 2023-04-26 20:40 UTC (permalink / raw) To: Jeff King; +Cc: Junio C Hamano, Adam Majer, git [-- Attachment #1: Type: text/plain, Size: 2072 bytes --] On 2023-04-26 at 10:51:34, Jeff King wrote: > We could send a capabilities^{} line, which Git has supported on the > client side since eb398797cd (connect: advertized capability is not a > ref, 2016-09-09). So sending it should not break even old clients > (though we would have to check what alternate implementations like > libgit2 or dulwich do; we know JGit supports it). I have a patch which does exactly this, which I will be sending shortly. I've confirmed that libgit2 and JGit support it, which is unsurprising, since all of the implementations, Git included, share the same code. In addition, this is the behaviour we document as supporting, so all implementations should support it. > However, the object-format support here was broken until the very recent > 13e67aa39b (v0 protocol: fix sha1/sha256 confusion for capabilities^{}, > 2023-04-14), so it would only be useful going forward (before then we'd > die(), but maybe that is preferable to having the wrong object format?). I think it's better to die than to silently have the wrong object format, and it also prevents the problem if other clients using v0 or v1 (which effectively have to be supported for compatibility, while v2 is optional) try to clone from a fixed server. > I'm not sure it's worth the effort, though. If you want to use sha256 > everywhere and tell the other side about it, you need a modern client > anyway, and that means the ability to speak v2. So this would only > matter if for some reason the v2 probe was being ignored (e.g., proxies > eating it, ssh refusing environment variable, etc), which itself are > things that ideally would be fixed (and can maybe one day even go away > if we optimistically default to v2). Using v2 everywhere is difficult because many SSH servers still don't pass GIT_PROTOCOL by default, meaning that we're stuck with v0 and v1. In retrospect, sending an environment variable here was not a great decision, but we're stuck with it now. -- brian m. carlson (he/him or they/them) Toronto, Ontario, CA [-- Attachment #2: signature.asc --] [-- Type: application/pgp-signature, Size: 263 bytes --] ^ permalink raw reply [flat|nested] 58+ messages in thread
* [PATCH 0/2] Fix empty SHA-256 clones with v0 and v1 2023-04-26 20:40 ` brian m. carlson @ 2023-04-26 20:53 ` brian m. carlson 2023-04-26 20:53 ` [PATCH 1/2] http: advertise capabilities when cloning empty repos brian m. carlson ` (2 more replies) 2023-04-27 4:56 ` git clone of empty repositories doesn't preserve hash Jeff King ` (2 subsequent siblings) 3 siblings, 3 replies; 58+ messages in thread From: brian m. carlson @ 2023-04-26 20:53 UTC (permalink / raw) To: git; +Cc: Jeff King, Junio C Hamano, Adam Majer We recently fixed empty clones with SHA-256 over protocol v2 by honouring the hash algorithm specified even when no refs are present. However, in doing so, we made it impossible to set up a v0 or v1 repository by cloning from an empty SHA-256 repository. In doing so, we also broke the Git LFS testsuite for SHA-256 repositories. This series consists of two patches. The first introduces the dummy `capabilities^{}` entry for fetches and clones from an empty repository for v0 and v1, just as we do for clones. This is already supported by older versions of Git, as well as libgit2 and JGit. The second introduces some backwards compatibility to avoid regressing the old behaviour of using GIT_DEFAULT_HASH to initialize the proper hash in this case. We add a flag to see if we explicitly obtained a hash algorithm from the remote side, and if not, we honour GIT_DEFAULT_HASH, as before. brian m. carlson (2): http: advertise capabilities when cloning empty repos Honor GIT_DEFAULT_HASH for empty clones without remote algo Documentation/git.txt | 10 +++++++--- builtin/clone.c | 8 +++++--- connect.c | 5 ++++- pkt-line.h | 2 ++ t/t5551-http-fetch-smart.sh | 27 +++++++++++++++++++++++++++ t/t5700-protocol-v1.sh | 32 ++++++++++++++++++++++++++++++-- transport-helper.c | 1 + transport.c | 14 ++++++++++++++ transport.h | 14 ++++++++++++++ upload-pack.c | 4 ++++ 10 files changed, 108 insertions(+), 9 deletions(-) ^ permalink raw reply [flat|nested] 58+ messages in thread
* [PATCH 1/2] http: advertise capabilities when cloning empty repos 2023-04-26 20:53 ` [PATCH 0/2] Fix empty SHA-256 clones with v0 and v1 brian m. carlson @ 2023-04-26 20:53 ` brian m. carlson 2023-04-26 21:14 ` Junio C Hamano 2023-04-27 5:30 ` Jeff King 2023-04-26 20:53 ` [PATCH 2/2] Honor GIT_DEFAULT_HASH for empty clones without remote algo brian m. carlson 2023-04-26 21:12 ` [PATCH 0/2] Fix empty SHA-256 clones with v0 and v1 Junio C Hamano 2 siblings, 2 replies; 58+ messages in thread From: brian m. carlson @ 2023-04-26 20:53 UTC (permalink / raw) To: git; +Cc: Jeff King, Junio C Hamano, Adam Majer From: "brian m. carlson" <bk2204@github.com> When cloning an empty repository, the HTTP protocol version 0 currently offers nothing but the header and flush packets for the /info/refs endpoint. This means that no capabilities are provided, so the client side doesn't know what capabilities are present. However, this does pose a problem when working with SHA-256 repositories, since we use the capabilities to know the remote side's object format (hash algorithm). It used to be possible to set the correct algorithm with `GIT_DEFAULT_HASH` (which is what the Git LFS testsuite did), but this no longer works as of 8b214c2e9d ("clone: propagate object-format when cloning from void", 2023-04-05), since there we always read the hash algorithm from the remote. If there is no hash algorithm provided, we default to SHA-1 for backwards compatibility. Fortunately, the push version of the protocol already indicates a clue for how to solve this. When the /info/refs endpoint is accessed for a push and the remote is empty, we include a dummy "capabilities^{}" ref pointing to the all-zeros object ID. The protocol documentation already indicates this should _always_ be sent, even for fetches and clones, so let's just do that, which means we'll properly announce the hash algorithm as part of the capabilities. This just works with the existing code because we share the same ref code for fetches and clones, and libgit2 does as well. Signed-off-by: brian m. carlson <bk2204@github.com> --- t/t5551-http-fetch-smart.sh | 27 +++++++++++++++++++++++++++ t/t5700-protocol-v1.sh | 21 +++++++++++++++++++-- upload-pack.c | 4 ++++ 3 files changed, 50 insertions(+), 2 deletions(-) diff --git a/t/t5551-http-fetch-smart.sh b/t/t5551-http-fetch-smart.sh index 0908534f25..21b7767cbd 100755 --- a/t/t5551-http-fetch-smart.sh +++ b/t/t5551-http-fetch-smart.sh @@ -611,6 +611,33 @@ test_expect_success 'client falls back from v2 to v0 to match server' ' grep symref=HEAD:refs/heads/ trace ' +test_expect_success 'create empty http-accessible SHA-256 repository' ' + mkdir "$HTTPD_DOCUMENT_ROOT_PATH/sha256.git" && + (cd "$HTTPD_DOCUMENT_ROOT_PATH/sha256.git" && + git --bare init --object-format=sha256 + ) +' + +test_expect_success 'clone empty SHA-256 repository with protocol v2' ' + rm -fr sha256 && + echo sha256 >expected && + git -c protocol.version=2 clone "$HTTPD_URL/smart/sha256.git" && + git -C sha256 rev-parse --show-object-format >actual && + test_cmp actual expected && + git ls-remote "$HTTPD_URL/smart/sha256.git" >actual && + test_must_be_empty actual +' + +test_expect_success 'clone empty SHA-256 repository with protocol v0' ' + rm -fr sha256 && + echo sha256 >expected && + GIT_TRACE=1 GIT_TRACE_PACKET=1 git -c protocol.version=0 clone "$HTTPD_URL/smart/sha256.git" && + git -C sha256 rev-parse --show-object-format >actual && + test_cmp actual expected && + git ls-remote "$HTTPD_URL/smart/sha256.git" >actual && + test_must_be_empty actual +' + test_expect_success 'passing hostname resolution information works' ' BOGUS_HOST=gitbogusexamplehost.invalid && BOGUS_HTTPD_URL=$HTTPD_PROTO://$BOGUS_HOST:$LIB_HTTPD_PORT && diff --git a/t/t5700-protocol-v1.sh b/t/t5700-protocol-v1.sh index 6c8d4c6cf1..3cd9db9012 100755 --- a/t/t5700-protocol-v1.sh +++ b/t/t5700-protocol-v1.sh @@ -249,10 +249,12 @@ test_expect_success 'push with ssh:// using protocol v1' ' . "$TEST_DIRECTORY"/lib-httpd.sh start_httpd -test_expect_success 'create repo to be served by http:// transport' ' +test_expect_success 'create repos to be served by http:// transport' ' git init "$HTTPD_DOCUMENT_ROOT_PATH/http_parent" && git -C "$HTTPD_DOCUMENT_ROOT_PATH/http_parent" config http.receivepack true && - test_commit -C "$HTTPD_DOCUMENT_ROOT_PATH/http_parent" one + test_commit -C "$HTTPD_DOCUMENT_ROOT_PATH/http_parent" one && + git init --object-format=sha256 "$HTTPD_DOCUMENT_ROOT_PATH/sha256" && + git -C "$HTTPD_DOCUMENT_ROOT_PATH/sha256" config http.receivepack true ' test_expect_success 'clone with http:// using protocol v1' ' @@ -269,6 +271,21 @@ test_expect_success 'clone with http:// using protocol v1' ' grep "git< version 1" log ' +test_expect_success 'clone with http:// using protocol v1 with empty SHA-256 repo' ' + GIT_TRACE_PACKET=1 GIT_TRACE_CURL=1 git -c protocol.version=1 \ + clone "$HTTPD_URL/smart/sha256" sha256 2>log && + + cat log && + echo sha256 >expect && + git -C sha256 rev-parse --show-object-format >actual && + test_cmp expect actual && + + # Client requested to use protocol v1 + grep "Git-Protocol: version=1" log && + # Server responded using protocol v1 + grep "git< version 1" log +' + test_expect_success 'fetch with http:// using protocol v1' ' test_commit -C "$HTTPD_DOCUMENT_ROOT_PATH/http_parent" two && diff --git a/upload-pack.c b/upload-pack.c index 08633dc121..5ef9b162b6 100644 --- a/upload-pack.c +++ b/upload-pack.c @@ -120,6 +120,7 @@ struct upload_pack_data { unsigned allow_ref_in_want : 1; /* v2 only */ unsigned allow_sideband_all : 1; /* v2 only */ unsigned advertise_sid : 1; + unsigned sent_capabilities : 1; }; static void upload_pack_data_init(struct upload_pack_data *data) @@ -1240,6 +1241,7 @@ static int send_ref(const char *refname, const struct object_id *oid, git_user_agent_sanitized()); strbuf_release(&symref_info); strbuf_release(&session_id); + data->sent_capabilities = 1; } else { packet_fwrite_fmt(stdout, "%s %s\n", oid_to_hex(oid), refname_nons); } @@ -1379,6 +1381,8 @@ void upload_pack(const int advertise_refs, const int stateless_rpc, data.no_done = 1; head_ref_namespaced(send_ref, &data); for_each_namespaced_ref(send_ref, &data); + if (!data.sent_capabilities && advertise_refs) + send_ref("capabilities^{}", null_oid(), 0, &data); /* * fflush stdout before calling advertise_shallow_grafts because send_ref * uses stdio. ^ permalink raw reply related [flat|nested] 58+ messages in thread
* Re: [PATCH 1/2] http: advertise capabilities when cloning empty repos 2023-04-26 20:53 ` [PATCH 1/2] http: advertise capabilities when cloning empty repos brian m. carlson @ 2023-04-26 21:14 ` Junio C Hamano 2023-04-26 21:28 ` brian m. carlson 2023-04-27 5:30 ` Jeff King 1 sibling, 1 reply; 58+ messages in thread From: Junio C Hamano @ 2023-04-26 21:14 UTC (permalink / raw) To: brian m. carlson; +Cc: git, Jeff King, Adam Majer "brian m. carlson" <sandals@crustytoothpaste.net> writes: > From: "brian m. carlson" <bk2204@github.com> > > When cloning an empty repository, the HTTP protocol version 0 currently > offers nothing but the header and flush packets for the /info/refs > endpoint. This means that no capabilities are provided, so the client > side doesn't know what capabilities are present. > > However, this does pose a problem when working with SHA-256 > repositories, since we use the capabilities to know the remote side's > object format (hash algorithm). It used to be possible to set the > correct algorithm with `GIT_DEFAULT_HASH` (which is what the Git LFS > testsuite did), but this no longer works as of 8b214c2e9d ("clone: "this no longer works as of" -> "this was a mistake and was fixed by". > propagate object-format when cloning from void", 2023-04-05), since > there we always read the hash algorithm from the remote. If there is no > hash algorithm provided, we default to SHA-1 for backwards > compatibility. Other than that, looks good to me. Thanks. Will queue. ^ permalink raw reply [flat|nested] 58+ messages in thread
* Re: [PATCH 1/2] http: advertise capabilities when cloning empty repos 2023-04-26 21:14 ` Junio C Hamano @ 2023-04-26 21:28 ` brian m. carlson 2023-04-27 5:00 ` Jeff King 0 siblings, 1 reply; 58+ messages in thread From: brian m. carlson @ 2023-04-26 21:28 UTC (permalink / raw) To: Junio C Hamano; +Cc: git, Jeff King, Adam Majer [-- Attachment #1: Type: text/plain, Size: 1707 bytes --] On 2023-04-26 at 21:14:37, Junio C Hamano wrote: > "brian m. carlson" <sandals@crustytoothpaste.net> writes: > > > From: "brian m. carlson" <bk2204@github.com> > > > > When cloning an empty repository, the HTTP protocol version 0 currently > > offers nothing but the header and flush packets for the /info/refs > > endpoint. This means that no capabilities are provided, so the client > > side doesn't know what capabilities are present. > > > > However, this does pose a problem when working with SHA-256 > > repositories, since we use the capabilities to know the remote side's > > object format (hash algorithm). It used to be possible to set the > > correct algorithm with `GIT_DEFAULT_HASH` (which is what the Git LFS > > testsuite did), but this no longer works as of 8b214c2e9d ("clone: > > "this no longer works as of" -> "this was a mistake and was fixed by". I tend to disagree. While I agree that change is valuable because it fixes v2, which we want, it does cause a change in user-visible behaviour, which broke the Git LFS testsuite. Whether we like things working that way or not, clearly there were people relying on it. Fortunately, in that case, Git LFS can just enable protocol v2 and things work again, but I think "this no longer works" is accurate and more neutral, and addresses the issue. We wouldn't have to deal with that issue if we could gracefully handle git clone --local with older versions of the protocol, but one of the tests fails when we do that. I'll take some more time to see if I can come up with a nice way to gracefully handle that, and if so, I'll send a v2. -- brian m. carlson (he/him or they/them) Toronto, Ontario, CA [-- Attachment #2: signature.asc --] [-- Type: application/pgp-signature, Size: 263 bytes --] ^ permalink raw reply [flat|nested] 58+ messages in thread
* Re: [PATCH 1/2] http: advertise capabilities when cloning empty repos 2023-04-26 21:28 ` brian m. carlson @ 2023-04-27 5:00 ` Jeff King 0 siblings, 0 replies; 58+ messages in thread From: Jeff King @ 2023-04-27 5:00 UTC (permalink / raw) To: brian m. carlson; +Cc: Junio C Hamano, git, Adam Majer On Wed, Apr 26, 2023 at 09:28:31PM +0000, brian m. carlson wrote: > > > However, this does pose a problem when working with SHA-256 > > > repositories, since we use the capabilities to know the remote side's > > > object format (hash algorithm). It used to be possible to set the > > > correct algorithm with `GIT_DEFAULT_HASH` (which is what the Git LFS > > > testsuite did), but this no longer works as of 8b214c2e9d ("clone: > > > > "this no longer works as of" -> "this was a mistake and was fixed by". > > I tend to disagree. While I agree that change is valuable because it > fixes v2, which we want, it does cause a change in user-visible > behaviour, which broke the Git LFS testsuite. Whether we like things > working that way or not, clearly there were people relying on it. > > Fortunately, in that case, Git LFS can just enable protocol v2 and > things work again, but I think "this no longer works" is accurate and > more neutral, and addresses the issue. We wouldn't have to deal with > that issue if we could gracefully handle git clone --local with older > versions of the protocol, but one of the tests fails when we do that. > I'll take some more time to see if I can come up with a nice way to > gracefully handle that, and if so, I'll send a v2. Reiterating what I said upthread, I think it was always 50% broken. Taking the local hash format over the remote one was always the wrong thing to do, but it sometimes worked out (because we happened to match the remote). But the opposite case: git init --object-format=sha1 dst.git GIT_DEFAULT_HASH=sha256 git clone dst.git would previously have done the wrong thing. We just flipped which half was broken and which was not. -Peff ^ permalink raw reply [flat|nested] 58+ messages in thread
* Re: [PATCH 1/2] http: advertise capabilities when cloning empty repos 2023-04-26 20:53 ` [PATCH 1/2] http: advertise capabilities when cloning empty repos brian m. carlson 2023-04-26 21:14 ` Junio C Hamano @ 2023-04-27 5:30 ` Jeff King 2023-04-27 20:40 ` Junio C Hamano 1 sibling, 1 reply; 58+ messages in thread From: Jeff King @ 2023-04-27 5:30 UTC (permalink / raw) To: brian m. carlson; +Cc: git, Junio C Hamano, Adam Majer On Wed, Apr 26, 2023 at 08:53:23PM +0000, brian m. carlson wrote: > From: "brian m. carlson" <bk2204@github.com> > > When cloning an empty repository, the HTTP protocol version 0 currently > offers nothing but the header and flush packets for the /info/refs > endpoint. This means that no capabilities are provided, so the client > side doesn't know what capabilities are present. Is this really an HTTP problem? If I do: git init --bare --object-format=sha256 remote.git git -c protocol.version=0 clone --bare remote.git local.git git -C local.git rev-parse --show-object-format I will get sha1, which is wrong. Likewise with GIT_DEFAULT_HASH=sha256 on the clone (after Junio's recent patch), regardless of what the server claims. This is really a git-protocol issue that affects all transports. So I think in this hunk: > @@ -1379,6 +1381,8 @@ void upload_pack(const int advertise_refs, const int stateless_rpc, > data.no_done = 1; > head_ref_namespaced(send_ref, &data); > for_each_namespaced_ref(send_ref, &data); > + if (!data.sent_capabilities && advertise_refs) > + send_ref("capabilities^{}", null_oid(), 0, &data); > /* > * fflush stdout before calling advertise_shallow_grafts because send_ref > * uses stdio. you would want to drop the "&& advertise_refs" bit, after which both of the cases above would yield a sha256 repository. There is one other catch, though. Doing as I suggest results in a failure in t5509, because the new code does not interact correctly with namespaces. That is true of your version, as well; it's just that the test suite does not cover the combination of namespaces, http, and empty repos. The issue is that send_ref() will try to strip the namespace, and end up with NULL (which on my glibc system ends up with a ref named "(null)", but obviously could segfault, too). Something like this fixes it: diff --git a/environment.c b/environment.c index 8a96997539..37cd66b295 100644 --- a/environment.c +++ b/environment.c @@ -234,6 +234,8 @@ const char *get_git_namespace(void) const char *strip_namespace(const char *namespaced_ref) { const char *out; + if (!strcmp(namespaced_ref, "capabilities^{}")) + return namespaced_ref; /* magic ref */ if (skip_prefix(namespaced_ref, get_git_namespace(), &out)) return out; return NULL; but I suspect it would be cleaner to refactor send_ref() to allow sending a name more directly. (As an aside, it feels like send_ref() is also wrong not to check for NULL from strip_namespace(), but I guess in practice we do not feed it names outside of the namespace. Might be a good candidate for a BUG() check or other assertion). > +test_expect_success 'clone empty SHA-256 repository with protocol v0' ' > + rm -fr sha256 && > + echo sha256 >expected && > + GIT_TRACE=1 GIT_TRACE_PACKET=1 git -c protocol.version=0 clone "$HTTPD_URL/smart/sha256.git" && > + git -C sha256 rev-parse --show-object-format >actual && > + test_cmp actual expected && > + git ls-remote "$HTTPD_URL/smart/sha256.git" >actual && > + test_must_be_empty actual > +' This looks reasonable, though I think if we do not need HTTP to demonstrate the issue (and I don't think we do), then we should probably avoid it, just to get test coverage on platforms that don't support HTTP. -Peff ^ permalink raw reply related [flat|nested] 58+ messages in thread
* Re: [PATCH 1/2] http: advertise capabilities when cloning empty repos 2023-04-27 5:30 ` Jeff King @ 2023-04-27 20:40 ` Junio C Hamano 0 siblings, 0 replies; 58+ messages in thread From: Junio C Hamano @ 2023-04-27 20:40 UTC (permalink / raw) To: Jeff King; +Cc: brian m. carlson, git, Adam Majer Jeff King <peff@peff.net> writes: > So I think in this hunk: > >> @@ -1379,6 +1381,8 @@ void upload_pack(const int advertise_refs, const int stateless_rpc, >> data.no_done = 1; >> head_ref_namespaced(send_ref, &data); >> for_each_namespaced_ref(send_ref, &data); >> + if (!data.sent_capabilities && advertise_refs) >> + send_ref("capabilities^{}", null_oid(), 0, &data); >> /* >> * fflush stdout before calling advertise_shallow_grafts because send_ref >> * uses stdio. > > you would want to drop the "&& advertise_refs" bit, after which both of > the cases above would yield a sha256 repository. Good suggestion. >> +test_expect_success 'clone empty SHA-256 repository with protocol v0' ' >> + rm -fr sha256 && >> + echo sha256 >expected && >> + GIT_TRACE=1 GIT_TRACE_PACKET=1 git -c protocol.version=0 clone "$HTTPD_URL/smart/sha256.git" && >> + git -C sha256 rev-parse --show-object-format >actual && >> + test_cmp actual expected && >> + git ls-remote "$HTTPD_URL/smart/sha256.git" >actual && >> + test_must_be_empty actual >> +' > > This looks reasonable, though I think if we do not need HTTP to > demonstrate the issue (and I don't think we do), then we should probably > avoid it, just to get test coverage on platforms that don't support > HTTP. HTTP tests tend to be more cumbersome to set up and harder to debug than the plain vanilla "over the pipe on the same machine" transport, so I tend to agree with the statement. They however represent a more common use case, so having HTTP tests in addition to non-HTTP tests would be nicer, if we can afford to. Thanks. ^ permalink raw reply [flat|nested] 58+ messages in thread
* [PATCH 2/2] Honor GIT_DEFAULT_HASH for empty clones without remote algo 2023-04-26 20:53 ` [PATCH 0/2] Fix empty SHA-256 clones with v0 and v1 brian m. carlson 2023-04-26 20:53 ` [PATCH 1/2] http: advertise capabilities when cloning empty repos brian m. carlson @ 2023-04-26 20:53 ` brian m. carlson 2023-04-26 21:18 ` Junio C Hamano 2023-04-26 21:33 ` Junio C Hamano 2023-04-26 21:12 ` [PATCH 0/2] Fix empty SHA-256 clones with v0 and v1 Junio C Hamano 2 siblings, 2 replies; 58+ messages in thread From: brian m. carlson @ 2023-04-26 20:53 UTC (permalink / raw) To: git; +Cc: Jeff King, Junio C Hamano, Adam Majer From: "brian m. carlson" <bk2204@github.com> The previous commit introduced a change that allows HTTP v0 and v1 operations to determine the hash of an empty remote repository by sending capabilities. However, there are still some cases, such as when cloning locally, where the capabilities are not sent. This is because for local operations, we don't strip out the fake "capabilities^{}" ref, and thus "git ls-remote" would produce incorrect values if we did. However, up until 8b214c2e9d ("clone: propagate object-format when cloning from void", 2023-04-05), we honored GIT_DEFAULT_HASH in this case, so let's continue to do that. Check whether the hash algorithm was explicitly set, and if so, continue to use that value. If not, use the default value for GIT_DEFAULT_HASH to ensure that we can at least properly configure an empty clone whose hash algorithm we know. Note that without this patch, git clone cannot create a SHA-256 repository from an empty remote without protocol v2 (except over HTTP, as in the previous patch). --- Documentation/git.txt | 10 +++++++--- builtin/clone.c | 8 +++++--- connect.c | 5 ++++- pkt-line.h | 2 ++ t/t5700-protocol-v1.sh | 11 +++++++++++ transport-helper.c | 1 + transport.c | 14 ++++++++++++++ transport.h | 14 ++++++++++++++ 8 files changed, 58 insertions(+), 7 deletions(-) diff --git a/Documentation/git.txt b/Documentation/git.txt index 74973d3cc4..48eda9f883 100644 --- a/Documentation/git.txt +++ b/Documentation/git.txt @@ -547,9 +547,13 @@ double-quotes and respecting backslash escapes. E.g., the value `GIT_DEFAULT_HASH`:: If this variable is set, the default hash algorithm for new repositories will be set to this value. This value is currently - ignored when cloning; the setting of the remote repository - is used instead. The default is "sha1". THIS VARIABLE IS - EXPERIMENTAL! See `--object-format` in linkgit:git-init[1]. + ignored when cloning if the remote value can be definitively + determined; the setting of the remote repository is used + instead. The value is honored if the remote repository's + algorithm cannot be determined, such as some cases when + the remote repository is empty. The default is "sha1". + THIS VARIABLE IS EXPERIMENTAL! See `--object-format` + in linkgit:git-init[1]. Git Commits ~~~~~~~~~~~ diff --git a/builtin/clone.c b/builtin/clone.c index 186845ef0b..c207798de9 100644 --- a/builtin/clone.c +++ b/builtin/clone.c @@ -1316,13 +1316,15 @@ int cmd_clone(int argc, const char **argv, const char *prefix) } } + if (transport_get_hash_algo_explicit(transport)) { /* * Now that we know what algorithm the remote side is using, * let's set ours to the same thing. */ - hash_algo = hash_algo_by_ptr(transport_get_hash_algo(transport)); - initialize_repository_version(hash_algo, 1); - repo_set_hash_algo(the_repository, hash_algo); + hash_algo = hash_algo_by_ptr(transport_get_hash_algo(transport)); + initialize_repository_version(hash_algo, 1); + repo_set_hash_algo(the_repository, hash_algo); + } if (mapped_refs) { /* diff --git a/connect.c b/connect.c index 3a0186280c..40cb9bf261 100644 --- a/connect.c +++ b/connect.c @@ -243,8 +243,10 @@ static void process_capabilities(struct packet_reader *reader, int *linelen) if (feat_val) { char *hash_name = xstrndup(feat_val, feat_len); int hash_algo = hash_algo_by_name(hash_name); - if (hash_algo != GIT_HASH_UNKNOWN) + if (hash_algo != GIT_HASH_UNKNOWN) { reader->hash_algo = &hash_algos[hash_algo]; + reader->hash_algo_explicit = 1; + } free(hash_name); } else { reader->hash_algo = &hash_algos[GIT_HASH_SHA1]; @@ -493,6 +495,7 @@ static void send_capabilities(int fd_out, struct packet_reader *reader) if (hash_algo == GIT_HASH_UNKNOWN) die(_("unknown object format '%s' specified by server"), hash_name); reader->hash_algo = &hash_algos[hash_algo]; + reader->hash_algo_explicit = 1; packet_write_fmt(fd_out, "object-format=%s", reader->hash_algo->name); } else { reader->hash_algo = &hash_algos[GIT_HASH_SHA1]; diff --git a/pkt-line.h b/pkt-line.h index 8e9846f315..10700a9d8c 100644 --- a/pkt-line.h +++ b/pkt-line.h @@ -190,6 +190,8 @@ struct packet_reader { int line_peeked; unsigned use_sideband : 1; + /* indicates if we saw an explicit capability */ + unsigned hash_algo_explicit : 1; const char *me; /* hash algorithm in use */ diff --git a/t/t5700-protocol-v1.sh b/t/t5700-protocol-v1.sh index 3cd9db9012..ad24c7fe64 100755 --- a/t/t5700-protocol-v1.sh +++ b/t/t5700-protocol-v1.sh @@ -244,6 +244,17 @@ test_expect_success 'push with ssh:// using protocol v1' ' grep "push< version 1" log ' +test_expect_success 'clone propagates object-format from empty repo' ' + test_when_finished "rm -fr src256 dst256" && + + echo sha256 >expect && + git init --object-format=sha256 src256 && + GIT_DEFAULT_HASH=sha256 git -c protocol.version=1 clone --no-local src256 dst256 && + git -C dst256 rev-parse --show-object-format >actual && + + test_cmp expect actual +' + # Test protocol v1 with 'http://' transport # . "$TEST_DIRECTORY"/lib-httpd.sh diff --git a/transport-helper.c b/transport-helper.c index 6b816940dc..c65cf7c620 100644 --- a/transport-helper.c +++ b/transport-helper.c @@ -1236,6 +1236,7 @@ static struct ref *get_refs_list_using_list(struct transport *transport, die(_("unsupported object format '%s'"), value); transport->hash_algo = &hash_algos[algo]; + transport->hash_algo_explicit = 1; } continue; } diff --git a/transport.c b/transport.c index 67afdae57c..7774487e8d 100644 --- a/transport.c +++ b/transport.c @@ -147,6 +147,12 @@ static void get_refs_from_bundle_inner(struct transport *transport) die(_("could not read bundle '%s'"), transport->url); transport->hash_algo = data->header.hash_algo; + /* + * This is always set, even if we didn't get an explicit object-format + * capability, since we know that a missing capability or a v2 bundle + * definitively indicates SHA-1. + */ + transport->hash_algo_explicit = 1; } static struct ref *get_refs_from_bundle(struct transport *transport, @@ -190,6 +196,7 @@ static int fetch_refs_from_bundle(struct transport *transport, ret = unbundle(the_repository, &data->header, data->fd, &extra_index_pack_args, 0); transport->hash_algo = data->header.hash_algo; + transport->hash_algo_explicit = 1; return ret; } @@ -360,6 +367,7 @@ static struct ref *handshake(struct transport *transport, int for_push, } data->finished_handshake = 1; transport->hash_algo = reader.hash_algo; + transport->hash_algo_explicit = reader.hash_algo_explicit; if (reader.line_peeked) BUG("buffer must be empty at the end of handshake()"); @@ -1190,6 +1198,7 @@ struct transport *transport_get(struct remote *remote, const char *url) } ret->hash_algo = &hash_algos[GIT_HASH_SHA1]; + ret->hash_algo_explicit = 0; return ret; } @@ -1199,6 +1208,11 @@ const struct git_hash_algo *transport_get_hash_algo(struct transport *transport) return transport->hash_algo; } +int transport_get_hash_algo_explicit(struct transport *transport) +{ + return transport->hash_algo_explicit; +} + int transport_set_option(struct transport *transport, const char *name, const char *value) { diff --git a/transport.h b/transport.h index 6393cd9823..ce67eefc58 100644 --- a/transport.h +++ b/transport.h @@ -128,6 +128,11 @@ struct transport { * in transport_set_verbosity(). **/ unsigned progress : 1; + /* + * Indicates whether the hash algorithm was initialized explicitly as + * opposed to using a fallback. + */ + unsigned hash_algo_explicit : 1; /* * If transport is at least potentially smart, this points to * git_transport_options structure to use in case transport @@ -305,6 +310,15 @@ int transport_get_remote_bundle_uri(struct transport *transport); * This can only be called after fetching the remote refs. */ const struct git_hash_algo *transport_get_hash_algo(struct transport *transport); +/* + * Fetch whether the hash algorithm provided was explicitly set. + * + * If this value is false, "transport_get_hash_algo" will always return a value + * of SHA-1, which is the default algorithm if none is specified. + * + * This can only be called after fetching the remote refs. + */ +int transport_get_hash_algo_explicit(struct transport *transport); int transport_fetch_refs(struct transport *transport, struct ref *refs); /* ^ permalink raw reply related [flat|nested] 58+ messages in thread
* Re: [PATCH 2/2] Honor GIT_DEFAULT_HASH for empty clones without remote algo 2023-04-26 20:53 ` [PATCH 2/2] Honor GIT_DEFAULT_HASH for empty clones without remote algo brian m. carlson @ 2023-04-26 21:18 ` Junio C Hamano 2023-04-26 21:33 ` Junio C Hamano 1 sibling, 0 replies; 58+ messages in thread From: Junio C Hamano @ 2023-04-26 21:18 UTC (permalink / raw) To: brian m. carlson; +Cc: git, Jeff King, Adam Majer "brian m. carlson" <sandals@crustytoothpaste.net> writes: > However, up until 8b214c2e9d ("clone: propagate object-format when > cloning from void", 2023-04-05), we honored GIT_DEFAULT_HASH in this > case, so let's continue to do that. Let's not. Once we identified the bug of mistakenly honoring a wrong variable, let's fix it and keep it fixed. The local optimization, if necessary, can be taught to peek the source repository and propagating the object format selection to the destination repository. ^ permalink raw reply [flat|nested] 58+ messages in thread
* Re: [PATCH 2/2] Honor GIT_DEFAULT_HASH for empty clones without remote algo 2023-04-26 20:53 ` [PATCH 2/2] Honor GIT_DEFAULT_HASH for empty clones without remote algo brian m. carlson 2023-04-26 21:18 ` Junio C Hamano @ 2023-04-26 21:33 ` Junio C Hamano 2023-04-27 5:43 ` Jeff King 1 sibling, 1 reply; 58+ messages in thread From: Junio C Hamano @ 2023-04-26 21:33 UTC (permalink / raw) To: brian m. carlson; +Cc: git, Jeff King, Adam Majer "brian m. carlson" <sandals@crustytoothpaste.net> writes: > `GIT_DEFAULT_HASH`:: > If this variable is set, the default hash algorithm for new > repositories will be set to this value. This value is currently > + ignored when cloning if the remote value can be definitively > + determined; the setting of the remote repository is used > + instead. The value is honored if the remote repository's > + algorithm cannot be determined, such as some cases when > + the remote repository is empty. The default is "sha1". > + THIS VARIABLE IS EXPERIMENTAL! See `--object-format` > + in linkgit:git-init[1]. We'd need to evantually cover all the transports (and non-transport like the "--local" optimization) so that the object-format and other choices are communicated from the origin to a new clone anyway, so this extra complexity "until X is fixed, it behaves this way, but otherwise the variable is read in the meantime" may be a disservice to the end users, even though it may make it easier in the shorter term for maintainers of programs that rely on the buggy "git clone" that partially honored this environment variable. In short, I am still not convinced that the above is a good design choice in the longer term. Thanks. ^ permalink raw reply [flat|nested] 58+ messages in thread
* Re: [PATCH 2/2] Honor GIT_DEFAULT_HASH for empty clones without remote algo 2023-04-26 21:33 ` Junio C Hamano @ 2023-04-27 5:43 ` Jeff King 2023-05-02 23:46 ` Is GIT_DEFAULT_HASH flawed? Felipe Contreras 0 siblings, 1 reply; 58+ messages in thread From: Jeff King @ 2023-04-27 5:43 UTC (permalink / raw) To: Junio C Hamano; +Cc: brian m. carlson, git, Adam Majer On Wed, Apr 26, 2023 at 02:33:30PM -0700, Junio C Hamano wrote: > "brian m. carlson" <sandals@crustytoothpaste.net> writes: > > > `GIT_DEFAULT_HASH`:: > > If this variable is set, the default hash algorithm for new > > repositories will be set to this value. This value is currently > > + ignored when cloning if the remote value can be definitively > > + determined; the setting of the remote repository is used > > + instead. The value is honored if the remote repository's > > + algorithm cannot be determined, such as some cases when > > + the remote repository is empty. The default is "sha1". > > + THIS VARIABLE IS EXPERIMENTAL! See `--object-format` > > + in linkgit:git-init[1]. > > We'd need to evantually cover all the transports (and non-transport > like the "--local" optimization) so that the object-format and other > choices are communicated from the origin to a new clone anyway, so > this extra complexity "until X is fixed, it behaves this way, but > otherwise the variable is read in the meantime" may be a disservice > to the end users, even though it may make it easier in the shorter > term for maintainers of programs that rely on the buggy "git clone" > that partially honored this environment variable. > > In short, I am still not convinced that the above is a good design > choice in the longer term. I also think it is working against the backwards-compatible design of the hash function transition. If we do not see an object-format line from the remote, then either: 1. They sent us capabilities, but it did not include object-format. So if we are in GIT_DEFAULT_HASH=sha256 mode locally, but the other side is an older version of Git (or even a current version of other implementations, like Dulwich) that do not send object-format at all, then we will not correctly fall back to assuming they are sha1. In a non-empty repo, this means we'll fail to parse their ref advertisement (we'll expect sha256 hashes but get sha1), and cloning will be broken. 2. They did not send us capabilities, because the repo is empty (and the server does not have brian's patch 1). The hash transition doc says we're supposed to assume they're sha1. It's _sort of_ academic, in that they also are not telling us about any refs on their side. But we may end up mis-matched with the server (again, this is the 50/50 thing; we don't know what their format is). Presumably that bites us later when we try to push up new objects (but would eventually work when we support interop). I think handling (2) is iffy as a goal, but the collateral damage of (1) is a complete show-stopper for this patch. If we wanted to do (2) by itself, we'd have to distinguish "did they even send us a capabilities line" as a separate case (but I tend to agree with you that it is not worth doing for now). -Peff ^ permalink raw reply [flat|nested] 58+ messages in thread
* Is GIT_DEFAULT_HASH flawed? 2023-04-27 5:43 ` Jeff King @ 2023-05-02 23:46 ` Felipe Contreras 2023-05-03 9:03 ` Adam Majer ` (2 more replies) 0 siblings, 3 replies; 58+ messages in thread From: Felipe Contreras @ 2023-05-02 23:46 UTC (permalink / raw) To: Jeff King, Junio C Hamano; +Cc: brian m. carlson, git, Adam Majer Hi, Changing the subject as this message seems like a different topic. Jeff King wrote: > On Wed, Apr 26, 2023 at 02:33:30PM -0700, Junio C Hamano wrote: > > "brian m. carlson" <sandals@crustytoothpaste.net> writes: > > > > > `GIT_DEFAULT_HASH`:: > > > If this variable is set, the default hash algorithm for new > > > repositories will be set to this value. This value is currently > > > + ignored when cloning if the remote value can be definitively > > > + determined; the setting of the remote repository is used > > > + instead. The value is honored if the remote repository's > > > + algorithm cannot be determined, such as some cases when > > > + the remote repository is empty. The default is "sha1". > > > + THIS VARIABLE IS EXPERIMENTAL! See `--object-format` > > > + in linkgit:git-init[1]. > > > > We'd need to evantually cover all the transports (and non-transport > > like the "--local" optimization) so that the object-format and other > > choices are communicated from the origin to a new clone anyway, so > > this extra complexity "until X is fixed, it behaves this way, but > > otherwise the variable is read in the meantime" may be a disservice > > to the end users, even though it may make it easier in the shorter > > term for maintainers of programs that rely on the buggy "git clone" > > that partially honored this environment variable. > > > > In short, I am still not convinced that the above is a good design > > choice in the longer term. > > I also think it is working against the backwards-compatible design of > the hash function transition. To be honest this whole approach seems to be completely flawed to me and against the whole design of git in the first place. In a recent email Linus Torvalds explained why object ids were calculated based {type, size, data} [1], and he explained very clearly that two objects with exactly the same data are not supposed to have the same id if the type is different. If even the tiniest change such as adding a period to a commit messange changes the object id (and thus semantically makes it a different object), then it makes sense that changing the type of an object also changes the object id (and thus it's also a different object). And because the id of the parent is included in the content of every commit, the top-level id ensures the integrity of the whole graph. But then comes this notion that the hash algorithm is a property of the repository, and not part of the object storage, which means changing the whole hash algorithm of a repository is considered less of a change than adding a period to the commit message, worse: not a change at all. I am reminded of the warning Sam Smith gave to the Git project [2] which seemed to be unheard, but the notion of cryptographic algorithm agility makes complete sense to me. In my view one repository should be able to have part SHA-1 history, part SHA3-256 history, and part BLAKE2b history. Changing the hash algorithm of one commit should change the object id of that commit, and thus make it semantically a different commit. In other words: an object of type "blob" should never be confused with an object of type "blob:sha-256", even if the content is exactly the same. The fact that apparently it's so easy to clone a repository with the wrong hash algorithm should give developers pause, as it means the whole point of using cryptographic hash algorithms to ensure the integrity of the commit history is completely gone. I have not been following the SHA-1 -> OID discussions, but I distinctively recall Linus Torvalds mentioning that the choice of using SHA-1 wasn't even for security purposes, it was to ensure integrity. When I do a `git fetch` as long as the new commits have the same SHA-1 as parent as the SHA-1s I have in my repository I can be relatively certain the repository has not been tampered with. Which means that if I do a `git fetch` that suddenly brings SHA-256 commits, some of them must have SHA-1 parents that match the ones I currently have. Otherwise how do I know it's the same history? Maybe that's one of the reasons people don't seem particularly eager to move away from SHA-1: Better the SHA-1 you know, than the SHA-256 you don't. Cheers. [1] https://lore.kernel.org/git/CAHk-=wjr-CMLX2Jo2++rwcv0VNr+HmZqXEVXNsJGiPRUwNxzBQ@mail.gmail.com/ [2] https://lore.kernel.org/git/D433038A-2643-4F63-8677-CA8AB6904AE1@samuelsmith.org/ -- Felipe Contreras ^ permalink raw reply [flat|nested] 58+ messages in thread
* Re: Is GIT_DEFAULT_HASH flawed? 2023-05-02 23:46 ` Is GIT_DEFAULT_HASH flawed? Felipe Contreras @ 2023-05-03 9:03 ` Adam Majer 2023-05-03 15:44 ` Felipe Contreras 2023-05-03 9:09 ` demerphq 2023-05-03 22:54 ` brian m. carlson 2 siblings, 1 reply; 58+ messages in thread From: Adam Majer @ 2023-05-03 9:03 UTC (permalink / raw) To: Felipe Contreras, Jeff King, Junio C Hamano; +Cc: brian m. carlson, git On 5/3/23 01:46, Felipe Contreras wrote: > To be honest this whole approach seems to be completely flawed to me and > against the whole design of git in the first place. The discussion above is mostly moot now since this has been fixed in later patches in this thread, AFAIK. It's also moot for other reasons, like the hash function transition plan is not really implemented, yet. Also, this was about corner-case, like it often is. > In a recent email Linus Torvalds explained why object ids were > calculated based {type, size, data} [1], and he explained very clearly > that two objects with exactly the same data are not supposed to have the > same id if the type is different. This is different. But aside, type + size + data are not really much different from just having data in a hash function. There are plenty of hash collisions where HASH(type + size + data) == HASH(type + size + data') by definition of how these functions work. The problem is always in finding these collisions. But anyway... > In my view one repository should be able to have part SHA-1 history, > part SHA3-256 history, and part BLAKE2b history. Yes, that would be great. Please provide patch series for this :-) > I have not been following the SHA-1 -> OID discussions, but I > distinctively recall Linus Torvalds mentioning that the choice of using > SHA-1 wasn't even for security purposes, it was to ensure integrity. These are different sides of the same coin. Hashes are used to provide integrity. Hashes like MD4, MD5, SHA1, SHA256 are there for integrity. Some of these are no longer recommended and some are completely broken. > Better the SHA-1 you know, than the SHA-256 you don't. Wrong conclusion ;) Also, we know SHA-256 The problem in git-core and virtually all clients and other implementations is/was that SHA1 was hardcoded and assumed to be THE ONE and ONLY hash. It will take quite a bit of work outside of git-core to remove this one assumption (remember two digit year and 2000? - yes I'm old). Once this hash assumption is removed, you can start talking about adding other hashes and interop. Keep in mind -- hashes are there for object reference. They are the glue in git. But there is really nothing stopping us from recalculating them "on the fly". If you have SHA1 repo, you can calculate a SHA256 or whatever hash for any type object. That's not the problem, conceptually speaking. Finally, let not have a "bike shed" discussion about this. The GIT_DEFAULT_HASH is meant to be used by `git init` in-lieu of --object-format parameter, so it's not flawed. When used in other applications, it probably indicates a bug. But we can't fix all the bugs at once :-) Cheers, - Adam ^ permalink raw reply [flat|nested] 58+ messages in thread
* Re: Is GIT_DEFAULT_HASH flawed? 2023-05-03 9:03 ` Adam Majer @ 2023-05-03 15:44 ` Felipe Contreras 2023-05-03 17:21 ` Adam Majer 0 siblings, 1 reply; 58+ messages in thread From: Felipe Contreras @ 2023-05-03 15:44 UTC (permalink / raw) To: Adam Majer, Felipe Contreras, Jeff King, Junio C Hamano Cc: brian m. carlson, git Adam Majer wrote: > On 5/3/23 01:46, Felipe Contreras wrote: > > To be honest this whole approach seems to be completely flawed to me and > > against the whole design of git in the first place. > > The discussion above is mostly moot now since this has been fixed in > later patches in this thread, AFAIK. That particular isssue might be fixed, but that issue should never have happened in the first place if the design was correct. A bad design makes certain errors prone to happen, a good design makes the same errors happen rarely, a great design makes those errors impossible. Git was designed to make it *impossible* to confuse two commits with similar data. The symptom might have been fixed, that doesn't mean there's no underlying problem. > It's also moot for other reasons, like the hash function transition plan is > not really implemented, yet. The implemention of the plan isn't the problem, it's the plan itself. > Also, this was about corner-case, like it often is. A corner-case that should be impossible. > > In a recent email Linus Torvalds explained why object ids were > > calculated based {type, size, data} [1], and he explained very clearly > > that two objects with exactly the same data are not supposed to have the > > same id if the type is different. > > This is different. But aside, type + size + data are not really much > different from just having data in a hash function. It's completely different. > There are plenty of hash collisions where > > HASH(type + size + data) == HASH(type + size + data') > > by definition of how these functions work. The problem is always in > finding these collisions. But anyway... I don't think you understand why Linus Torvalds chose to hash objects. > > In my view one repository should be able to have part SHA-1 history, > > part SHA3-256 history, and part BLAKE2b history. > > Yes, that would be great. Please provide patch series for this :-) I have hundreds of patches being ignored, why would I write yet another patch series that will be ignored? > > I have not been following the SHA-1 -> OID discussions, but I > > distinctively recall Linus Torvalds mentioning that the choice of using > > SHA-1 wasn't even for security purposes, it was to ensure integrity. > > These are different sides of the same coin. Hashes are used to provide > integrity. Hashes like MD4, MD5, SHA1, SHA256 are there for integrity. > Some of these are no longer recommended and some are completely broken. There are different philosophical views of what "security" means, and it seems pretty clear to me that your view does not align with the view of Linus Torvalds. > > Better the SHA-1 you know, than the SHA-256 you don't. > > Wrong conclusion ;) Also, we know SHA-256 You don't understand what is being said. Which hash is more trustworthy? a. 69c786637d7a7fe3b2b8f7d989af095f5f49c3a8 b. d891b12414e1d9331f8cbb15acfe690671974f27ba76e2b423294cfb7a055f2f If you answer b just beacuse it's SHA-256 you don't understand security. b is a random commit I generated, a is the current git.git master. A SHA-1 hash from a source you trust is inifinitely more trustworthy than a random SHA-256 hash. Even a known MD5 hash is better in this respect. > Keep in mind -- hashes are there for object reference. No. I don't think you understand why Linus Torvalds used hashes. > If you have SHA1 repo, you can calculate a SHA256 or whatever hash for any > type object. I know it *can* be done, I understand how hash algorithms work, but just because something *can* be done doesn't mean it *should*. You *can* generate a SHA-1 of a blob's data, instead of a SHA-1 of a blob's `type + size + data`, does that mean we should? No. > Finally, let not have a "bike shed" discussion about this. Discussing the original design of git's object storage which has withstood the test of time for 18 years is not "bike sheding". I don't even think you understand what I'm trying to say. --- Why do you think these commands generate different hashes? git hash-object -t blob /dev/null git hash-object -t tree /dev/null -- Felipe Contreras ^ permalink raw reply [flat|nested] 58+ messages in thread
* Re: Is GIT_DEFAULT_HASH flawed? 2023-05-03 15:44 ` Felipe Contreras @ 2023-05-03 17:21 ` Adam Majer 2023-05-08 0:34 ` Felipe Contreras 0 siblings, 1 reply; 58+ messages in thread From: Adam Majer @ 2023-05-03 17:21 UTC (permalink / raw) To: Felipe Contreras, Jeff King, Junio C Hamano; +Cc: brian m. carlson, git On May 3, 2023 5:44:24 p.m. GMT+02:00, Felipe Contreras <felipe.contreras@gmail.com> wrote: >Git was designed to make it *impossible* to confuse two commits with similar >data. That was never ever the problem here. >> This is different. But aside, type + size + data are not really much >> different from just having data in a hash function. > >It's completely different. How so? Type and size are just about 2 and a dozen bits of data, respectfully. >There are different philosophical views of what "security" means, and it seems >pretty clear to me that your view does not align with the view of Linus >Torvalds. I'm not sure why you are name dropping Linus everywhere or assuming you know more than anyone here about hash functions. Your explanation is quite clear to me (and probably everyone else here). But I'll just leave it at that. Cheers, Adam ^ permalink raw reply [flat|nested] 58+ messages in thread
* Re: Is GIT_DEFAULT_HASH flawed? 2023-05-03 17:21 ` Adam Majer @ 2023-05-08 0:34 ` Felipe Contreras 0 siblings, 0 replies; 58+ messages in thread From: Felipe Contreras @ 2023-05-08 0:34 UTC (permalink / raw) To: Adam Majer, Felipe Contreras, Jeff King, Junio C Hamano Cc: brian m. carlson, git Adam Majer wrote: > On May 3, 2023 5:44:24 p.m. GMT+02:00, Felipe Contreras <felipe.contreras@gmail.com> wrote: > >Git was designed to make it *impossible* to confuse two commits with similar > >data. > > That was never ever the problem here. But it will be. > >> This is different. But aside, type + size + data are not really much > >> different from just having data in a hash function. > > > >It's completely different. > > How so? Type and size are just about 2 and a dozen bits of data, respectfully. Do you understand how checksums work? Compare these two objects: 1. 0beec7b5ea3f0fdbc95d0dd47f3c5bc275da8a33 2. 6ba62a7c5e3e9a260c5a30adf2756882c02f12a6 Are they a) "not much different", or b) "completely different"? Answer: doesn't matter, they are *different*. Period. > >There are different philosophical views of what "security" means, and it seems > >pretty clear to me that your view does not align with the view of Linus > >Torvalds. > > > I'm not sure why you are name dropping Linus everywhere I don't know if you are aware, but Linus Torvalds is the author of git. He also happens to be the author of the most successful software project in history: Linux. So generally his design choices are considered to be good. > or assuming you know more than anyone here about hash functions. I don't assume such a thing. But I'm pretty certain not many people are aware of the integrity issues VCSs presented circa 2004, that git hashes solved in 2005, because if they did, they could have created an object model storage similar to git's, and no one did (except Linus Torvalds). > Your explanation is quite clear to me (and probably everyone else > here). But I'll just leave it at that. Is it? Then you would have no trouble steel manning my argument, which you haven't done. -- Felipe Contreras ^ permalink raw reply [flat|nested] 58+ messages in thread
* Re: Is GIT_DEFAULT_HASH flawed? 2023-05-02 23:46 ` Is GIT_DEFAULT_HASH flawed? Felipe Contreras 2023-05-03 9:03 ` Adam Majer @ 2023-05-03 9:09 ` demerphq 2023-05-03 18:20 ` Felipe Contreras 2023-05-03 22:54 ` brian m. carlson 2 siblings, 1 reply; 58+ messages in thread From: demerphq @ 2023-05-03 9:09 UTC (permalink / raw) To: Felipe Contreras Cc: Jeff King, Junio C Hamano, brian m. carlson, git, Adam Majer On Wed, 3 May 2023 at 02:17, Felipe Contreras <felipe.contreras@gmail.com> wrote: > > Hi, > > Changing the subject as this message seems like a different topic. > > Jeff King wrote: > > On Wed, Apr 26, 2023 at 02:33:30PM -0700, Junio C Hamano wrote: > > > "brian m. carlson" <sandals@crustytoothpaste.net> writes: > > > > > > > `GIT_DEFAULT_HASH`:: > > > > If this variable is set, the default hash algorithm for new > > > > repositories will be set to this value. This value is currently > > > > + ignored when cloning if the remote value can be definitively > > > > + determined; the setting of the remote repository is used > > > > + instead. The value is honored if the remote repository's > > > > + algorithm cannot be determined, such as some cases when > > > > + the remote repository is empty. The default is "sha1". > > > > + THIS VARIABLE IS EXPERIMENTAL! See `--object-format` > > > > + in linkgit:git-init[1]. > > > > > > We'd need to evantually cover all the transports (and non-transport > > > like the "--local" optimization) so that the object-format and other > > > choices are communicated from the origin to a new clone anyway, so > > > this extra complexity "until X is fixed, it behaves this way, but > > > otherwise the variable is read in the meantime" may be a disservice > > > to the end users, even though it may make it easier in the shorter > > > term for maintainers of programs that rely on the buggy "git clone" > > > that partially honored this environment variable. > > > > > > In short, I am still not convinced that the above is a good design > > > choice in the longer term. > > > > I also think it is working against the backwards-compatible design of > > the hash function transition. > > To be honest this whole approach seems to be completely flawed to me and > against the whole design of git in the first place. > > In a recent email Linus Torvalds explained why object ids were > calculated based {type, size, data} [1], and he explained very clearly > that two objects with exactly the same data are not supposed to have the > same id if the type is different. He said: --- quote-begin --- The "no aliasing" means that no two distinct pointers can point to the same data. So a tagged pointer of type "commit" can not point to the same object as a tagged pointer of type "blob". They are distinct pointers, even if (maybe) the commit object encoding ends up then being identical to a blob object. --- quote-end --- As far as I could tell he didn't really explain *why* he wanted this, and IMO it is non-obvious why he would care if a blob and a commit had the same text, and thus the same ID. He just said he didnt want it to happen, not why. I can imagine some aesthetic reasons why you might want to ensure that no blob has the same ID as a commit, and I can imagine it might make debugging easier at certain points, but it seems unnecessary given the data is write once. > If even the tiniest change such as adding a period to a commit messange > changes the object id (and thus semantically makes it a different > object), then it makes sense that changing the type of an object also > changes the object id (and thus it's also a different object). > > And because the id of the parent is included in the content of every > commit, the top-level id ensures the integrity of the whole graph. > > But then comes this notion that the hash algorithm is a property of the > repository, and not part of the object storage, which means changing the > whole hash algorithm of a repository is considered less of a change than > adding a period to the commit message, worse: not a change at all. I really dont understand why you think having two hash functions producing different results for the same data is comparable to a single hash producing different results for different data. In one case you have two different continuum of identifiers, with one ID per continuum, and in the other you have two different identifiers in the same continuum, and if you a continuum you would have 4 different identifiers right? Eg, the two cases are really quite different at a fundamental level. > I am reminded of the warning Sam Smith gave to the Git project [2] which > seemed to be unheard, but the notion of cryptographic algorithm agility > makes complete sense to me. > > In my view one repository should be able to have part SHA-1 history, > part SHA3-256 history, and part BLAKE2b history. Isn't this orthagonal to your other points? > Changing the hash algorithm of one commit should change the object id of > that commit, and thus make it semantically a different commit. > > In other words: an object of type "blob" should never be confused with > an object of type "blob:sha-256", even if the content is exactly the > same. This doesn't make sense to me. As long as we can distinguish the hashes produced by the different hash functions in use we can create a mapping of the data that is hashed such that we have a 1:1 mapping of identifiers of each type at which point it really doesn't matter which hash function is used. > The fact that apparently it's so easy to clone a repository with > the wrong hash algorithm should give developers pause, as it means the > whole point of using cryptographic hash algorithms to ensure the > integrity of the commit history is completely gone. This is a leap too far. The fact that it is "so easy to clone a repo with the wrong hash algorithm" is completely orthogonal to the fundamental principles of hash identifiers from strong hash functions. You seem to be deriving grand conclusions from what sounds to me like a simple bug/design-oversight. > I have not been following the SHA-1 -> OID discussions, but I > distinctively recall Linus Torvalds mentioning that the choice of using > SHA-1 wasn't even for security purposes, it was to ensure integrity. > When I do a `git fetch` as long as the new commits have the same SHA-1 > as parent as the SHA-1s I have in my repository I can be relatively > certain the repository has not been tampered with. Which means that if I > do a `git fetch` that suddenly brings SHA-256 commits, some of them must > have SHA-1 parents that match the ones I currently have. Otherwise how > do I know it's the same history? So consider what /could/ happen here. You fetch a commit which uses SHA-256 into a repo where all of your local commits use SHA-1. The commit you fetched says its parent is some SHA-256 ID you don't know about as all your ID's are SHA-1. So git then could go and construct an index, hashing each item using SHA-256 instead of SHA-1, and using the result to build a bi-directional mapping from SHA-1 to SHA-256 and back. All it has to do then is look into the mapping to find if the SHA-256 parent id is present in your repo. If it is then you know it's the same history. The key point here is that if you ignore SHAttered artifacts (which seems reasonable as you can detect the attack during hashing) you can build a 1:1 map of SHA-1 and SHA-256 ids. Once you have that mapping it doesn't matter which ID is used. > Maybe that's one of the reasons people don't seem particularly eager to > move away from SHA-1: Maybe, but it doesn't make sense to me. You seem to be putting undue weight on an unnecessary aspect of the git design: there doesn't seem to be a reason for Linuses "no aliasing" policy, and it seems like one could build a git-a-like without it without suffering any significant penalties. Regardless, provided that the hash functions allow a 1:1 mapping of ID's (which is assumed by using "collision free hash functions"), it seems like it really doesn't matter which hash is used at any given time. cheers, Yves -- perl -Mre=debug -e "/just|another|perl|hacker/" ^ permalink raw reply [flat|nested] 58+ messages in thread
* Re: Is GIT_DEFAULT_HASH flawed? 2023-05-03 9:09 ` demerphq @ 2023-05-03 18:20 ` Felipe Contreras 0 siblings, 0 replies; 58+ messages in thread From: Felipe Contreras @ 2023-05-03 18:20 UTC (permalink / raw) To: demerphq, Felipe Contreras Cc: Jeff King, Junio C Hamano, brian m. carlson, git, Adam Majer demerphq wrote: > On Wed, 3 May 2023 at 02:17, Felipe Contreras > <felipe.contreras@gmail.com> wrote: > > Changing the subject as this message seems like a different topic. > > Jeff King wrote: > > > On Wed, Apr 26, 2023 at 02:33:30PM -0700, Junio C Hamano wrote: > > > > "brian m. carlson" <sandals@crustytoothpaste.net> writes: > > > > > > > > > `GIT_DEFAULT_HASH`:: > > > > > If this variable is set, the default hash algorithm for new > > > > > repositories will be set to this value. This value is currently > > > > > + ignored when cloning if the remote value can be definitively > > > > > + determined; the setting of the remote repository is used > > > > > + instead. The value is honored if the remote repository's > > > > > + algorithm cannot be determined, such as some cases when > > > > > + the remote repository is empty. The default is "sha1". > > > > > + THIS VARIABLE IS EXPERIMENTAL! See `--object-format` > > > > > + in linkgit:git-init[1]. > > > > > > > > We'd need to evantually cover all the transports (and non-transport > > > > like the "--local" optimization) so that the object-format and other > > > > choices are communicated from the origin to a new clone anyway, so > > > > this extra complexity "until X is fixed, it behaves this way, but > > > > otherwise the variable is read in the meantime" may be a disservice > > > > to the end users, even though it may make it easier in the shorter > > > > term for maintainers of programs that rely on the buggy "git clone" > > > > that partially honored this environment variable. > > > > > > > > In short, I am still not convinced that the above is a good design > > > > choice in the longer term. > > > > > > I also think it is working against the backwards-compatible design of > > > the hash function transition. > > > > To be honest this whole approach seems to be completely flawed to me and > > against the whole design of git in the first place. > > > > In a recent email Linus Torvalds explained why object ids were > > calculated based {type, size, data} [1], and he explained very clearly > > that two objects with exactly the same data are not supposed to have the > > same id if the type is different. > > He said: > > --- quote-begin --- > The "no aliasing" means that no two distinct pointers can point to the > same data. So a tagged pointer of type "commit" can not point to the > same object as a tagged pointer of type "blob". They are distinct > pointers, even if (maybe) the commit object encoding ends up then > being identical to a blob object. > --- quote-end --- > > As far as I could tell he didn't really explain *why* he wanted this, > and IMO it is non-obvious why he would care if a blob and a commit had > the same text, and thus the same ID. He just said he didnt want it to > happen, not why. But we don't need to understand why to know it's part of the core design. If something is part of the core design as a rule it's better to not mess with it. > I can imagine some aesthetic reasons why you might want to ensure that > no blob has the same ID as a commit, and I can imagine it might make > debugging easier at certain points, but it seems unnecessary given the > data is write once. I don't know, but to me separating objects makes sense not just conceptually, but in practice there's a whole class of potential errors that could be avoided. For example, I can think of an implementation of `git prune` that would check commits first, then trees, then blobs, and the blobs that not reachable from any trees are removed. But if a commit can have the same id as a blob, you have to think of a different implementation. If that's not possible, then you just forget about those potential issues. > > If even the tiniest change such as adding a period to a commit messange > > changes the object id (and thus semantically makes it a different > > object), then it makes sense that changing the type of an object also > > changes the object id (and thus it's also a different object). > > > > And because the id of the parent is included in the content of every > > commit, the top-level id ensures the integrity of the whole graph. > > > > But then comes this notion that the hash algorithm is a property of the > > repository, and not part of the object storage, which means changing the > > whole hash algorithm of a repository is considered less of a change than > > adding a period to the commit message, worse: not a change at all. > > I really dont understand why you think having two hash functions > producing different results for the same data is comparable to a > single hash producing different results for different data. That depends on what you consider the "data" to be. If you consider the content of a blob to be the data, then you wouldn't have different results if a commit has the same data: it would be the same id. If instead you consider the data to be `type+content`, then you would have different results. > In one case you have two different continuum of identifiers, with one > ID per continuum, and in the other you have two different identifiers > in the same continuum, and if you a continuum you would have 4 > different identifiers right? Eg, the two cases are really quite > different at a fundamental level. That entirely depends on what data you hash. If you hash `algo+type+size+data` there's only one id per object. Period. > > I am reminded of the warning Sam Smith gave to the Git project [2] which > > seemed to be unheard, but the notion of cryptographic algorithm agility > > makes complete sense to me. > > > > In my view one repository should be able to have part SHA-1 history, > > part SHA3-256 history, and part BLAKE2b history. > > Isn't this orthagonal to your other points? Not if you consider changing the hash algorithm of a repository to be an important part of its history (more important than adding a period to a commit). > > Changing the hash algorithm of one commit should change the object id of > > that commit, and thus make it semantically a different commit. > > > > In other words: an object of type "blob" should never be confused with > > an object of type "blob:sha-256", even if the content is exactly the > > same. > > This doesn't make sense to me. As long as we can distinguish the > hashes produced by the different hash functions in use we can create a > mapping of the data that is hashed such that we have a 1:1 mapping of > identifiers of each type at which point it really doesn't matter which > hash function is used. Yes, we *can*, that doesn't mean we *should*. If you do `git commit --ammend --signoff` to add your `Signed-off-by` to a commit, there's a 1:1 mapping from the original commit, to the new one, but conceptually in git they are different objects. I recall Linus Torvalds mentioned he used Monotone as a guideline of what *not* to do. In Monotone you could add the equivalent of `Signed-off-by` without changing the hash of the commit, in fact, you could add any metadata if I recall correctly. But this opens a whole can of worms because now how do you know you have all the metadata relevant to the commit? Making all the metadata of a commit part of the commit solves the integrity problem Monotone had at the cost of making git commits essentially immutable: any change means it's a different commit. If making *any* change in the object, makes it conceptually a different object, including the type of the object, how on Earth is changing the hash algorithm not considered a change? This object: ❯ git hash-object -t blob /dev/null e69de29bb2d1d6434b8b29ae775ad8c2e48c5391 Is considered different from this object: ❯ git hash-object -t tree /dev/null 4b825dc642cb6eb9a060e54bf8d69288fbee4904 That's why they have a different hash. Why would these objects be considered the same? ❯ git hash-object -t blob /dev/null e69de29bb2d1d6434b8b29ae775ad8c2e48c5391 ❯ git hash-object -t blob /dev/null 473a0f4c3be8a93681a267e3b1e9a7dcda1185436fe141f7749120a303721813 It makes *zero* sense that adding a period changes the object, adding a s-o-b changes the object, changing the type changes the object, but changing the hash algorithm does not. > > The fact that apparently it's so easy to clone a repository with > > the wrong hash algorithm should give developers pause, as it means the > > whole point of using cryptographic hash algorithms to ensure the > > integrity of the commit history is completely gone. > > This is a leap too far. The fact that it is "so easy to clone a repo > with the wrong hash algorithm" is completely orthogonal to the > fundamental principles of hash identifiers from strong hash functions. Only if you think changing the hash algorithm is a less important part of an object than adding a period. Do you honestly think these two should be considered the same object? a) tree 4b825dc642cb6eb9a060e54bf8d69288fbee4904 author Felipe Contreras <felipe.contreras@gmail.com> 0 -0600 committer Felipe Contreras <felipe.contreras@gmail.com> 0 -0600 Initial commit b) tree 6ef19b41225c5369f1c104d45d8d85efa9b057b53b14b4b9b939dd74decc5321 author Felipe Contreras <felipe.contreras@gmail.com> 0 -0600 committer Felipe Contreras <felipe.contreras@gmail.com> 0 -0600 Initial commit > You seem to be deriving grand conclusions from what sounds to me like > a simple bug/design-oversight. I think you are dismissing the brilliant idea that made git's object storage model so successful. > > I have not been following the SHA-1 -> OID discussions, but I > > distinctively recall Linus Torvalds mentioning that the choice of using > > SHA-1 wasn't even for security purposes, it was to ensure integrity. > > When I do a `git fetch` as long as the new commits have the same SHA-1 > > as parent as the SHA-1s I have in my repository I can be relatively > > certain the repository has not been tampered with. Which means that if I > > do a `git fetch` that suddenly brings SHA-256 commits, some of them must > > have SHA-1 parents that match the ones I currently have. Otherwise how > > do I know it's the same history? > > So consider what /could/ happen here. You fetch a commit which uses > SHA-256 into a repo where all of your local commits use SHA-1. The > commit you fetched says its parent is some SHA-256 ID you don't know > about as all your ID's are SHA-1. So git then could go and construct > an index, hashing each item using SHA-256 instead of SHA-1, and using > the result to build a bi-directional mapping from SHA-1 to SHA-256 and > back. All it has to do then is look into the mapping to find if the > SHA-256 parent id is present in your repo. If it is then you know it's > the same history. Yeah, that *could* happen. Doesn't mean it *should*. > The key point here is that if you ignore SHAttered artifacts (which > seems reasonable as you can detect the attack during hashing) you can > build a 1:1 map of SHA-1 and SHA-256 ids. Once you have that mapping > it doesn't matter which ID is used. It may not matter to you. It matters to me. > > Maybe that's one of the reasons people don't seem particularly eager to > > move away from SHA-1: > > Maybe, but it doesn't make sense to me. You seem to be putting undue > weight on an unnecessary aspect of the git design: there doesn't seem > to be a reason for Linuses "no aliasing" policy, and it seems like one > could build a git-a-like without it without suffering any significant > penalties. The fact you don't see a reason doesn't mean there isn't one. This is an argument from ignorance fallacy. The appendix was considered a vestigial organ because nobody could see a reason for it. Did that mean it served no puprpose? No. > Regardless, provided that the hash functions allow a 1:1 mapping of > ID's (which is assumed by using "collision free hash functions"), it > seems like it really doesn't matter which hash is used at any given > time. People who designed CVS, Subversion, Monotone, and Mercurial didn't see a reason for many of Git's design choices either. I'd argue they were wrong. I think changing the hash algorithm of a commit matters. Cheers. -- Felipe Contreras ^ permalink raw reply [flat|nested] 58+ messages in thread
* Re: Is GIT_DEFAULT_HASH flawed? 2023-05-02 23:46 ` Is GIT_DEFAULT_HASH flawed? Felipe Contreras 2023-05-03 9:03 ` Adam Majer 2023-05-03 9:09 ` demerphq @ 2023-05-03 22:54 ` brian m. carlson 2023-05-08 2:00 ` Felipe Contreras 2 siblings, 1 reply; 58+ messages in thread From: brian m. carlson @ 2023-05-03 22:54 UTC (permalink / raw) To: Felipe Contreras; +Cc: Jeff King, Junio C Hamano, git, Adam Majer [-- Attachment #1: Type: text/plain, Size: 3096 bytes --] On 2023-05-02 at 23:46:02, Felipe Contreras wrote: > In my view one repository should be able to have part SHA-1 history, > part SHA3-256 history, and part BLAKE2b history. That is practically very difficult and it means that it's hard to have confidence in the later history because SHA-1 is weak and you have to rely on it to verify the SHA-256 history later. Since attacks always get better, SHA-1 will eventually be so weak that collisions can be computed in the amount of time we now take for MD4 or MD5 collisions (i.e., seconds), and with your plan, we'd have to retain that history forever with the resulting lack of confidence in part of the history. This also doesn't work with various structures like trees, the index, and pack and index formats, which have no indication of the algorithm used and simply rely on fixed-size, often 4-byte aligned object IDs without any metadata. In addition, the internals of the code often don't pass around enough data to make these values variable and thus this approach would substantially complicate the code in many ways. Also, we've already decided on the current design a long time ago with the transition plan after extensive, thoughtful discussion by many people. Very few people other than me have worked on sending patches to work on the hash function transition, and that work up to now has all been done on my personal time, without compensation of any sort, out of a desire to improve the project. Lots of people have opined on how it should have been different without sending any patches. If you would like to propose patches for the extensive amount of work to implement your solution, then we could consider them, although I will warn you that your approach will likely require at least several hundred patches. However, I refer you to the list archives to determine why your approach is not the one we chose and is not, in my view, the best path forward. I should also be clear that I have no intention of submitting patches to change our approach now or in the future, or redoing the patches I've already sent. > The fact that apparently it's so easy to clone a repository with > the wrong hash algorithm should give developers pause, as it means the > whole point of using cryptographic hash algorithms to ensure the > integrity of the commit history is completely gone. No, it doesn't. It means that our empty repositories until recently lacked any indication of the algorithm or other capabilities, which was a mistake in our original protocol design that has now been corrected. If you interact with the repository later on when it has data, then if you're using the wrong hash algorithm, you'll find that you get a helpful error message that that's not yet supported. If you patched Git to ignore that check, you'd find that your repository would just be very broken in many ways with lots of random crashing and seemingly unrelated error messages instead of subtly using the wrong algorithm. -- brian m. carlson (he/him or they/them) Toronto, Ontario, CA [-- Attachment #2: signature.asc --] [-- Type: application/pgp-signature, Size: 263 bytes --] ^ permalink raw reply [flat|nested] 58+ messages in thread
* Re: Is GIT_DEFAULT_HASH flawed? 2023-05-03 22:54 ` brian m. carlson @ 2023-05-08 2:00 ` Felipe Contreras 2023-05-08 21:38 ` brian m. carlson 0 siblings, 1 reply; 58+ messages in thread From: Felipe Contreras @ 2023-05-08 2:00 UTC (permalink / raw) To: brian m. carlson, Felipe Contreras Cc: Jeff King, Junio C Hamano, git, Adam Majer brian m. carlson wrote: > On 2023-05-02 at 23:46:02, Felipe Contreras wrote: > > In my view one repository should be able to have part SHA-1 history, > > part SHA3-256 history, and part BLAKE2b history. > > That is practically very difficult and it means that it's hard to have > confidence in the later history because SHA-1 is weak and you have to > rely on it to verify the SHA-256 history later. Why would I have to rely on SHA-1 to verify the SHA-256 history later on? > Since attacks always get better, SHA-1 will eventually be so weak that > collisions can be computed in the amount of time we now take for MD4 > or MD5 collisions (i.e., seconds), and with your plan, we'd have to > retain that history forever with the resulting lack of confidence in > part of the history. We have to do the same with your plan as well. Your plan relies on SHA-256 being interchangeable with SHA-1, so if the Git project decided to switch *today* to SHA-256, we would have two object ids: 1. 69c786637d7a7fe3b2b8f7d989af095f5f49c3a8 2. 2b4ebdace10518280172449c012af17b51e9d46e023a91a5d3dd3a8ad9e4a116 This object would refer to a tree and a parent object with SHA-1 ids, which would be OK, because they would be interchangeable with some corresponding SHA-256 ids. Isn't that your plan? Therefore the SHA-1 of the parent of the commit, and the tree of the commit would be trusted and retained forever. > This also doesn't work with various structures like trees, the index, > and pack and index formats, which have no indication of the algorithm > used and simply rely on fixed-size, often 4-byte aligned object IDs > without any metadata. So? The index and pack objects can be regenerated, so at any point in time they could be regenerated for SHA-1 or SHA-256. The tree object is a no-brainer. For an object of type "commit:256" you require a tree of type "tree:256". Easy. > In addition, the internals of the code often don't pass around enough > data to make these values variable and thus this approach would > substantially complicate the code in many ways. Really? `enum object_type` is not passed around? > Also, we've already decided on the current design a long time ago with > the transition plan after extensive, thoughtful discussion by many > people. Who is "we"? I've participated in many discussions in the git mailing list where the consensus is that 99% of people decide to do something, and that something never happens. The fact that "we" have decided something doesn't carry as much weight as you seem to think it does. Moreover, haven't "we" decided that this transitioning plan is *tentantive*, and the SHA-256 feature is *experimental*? > Very few people other than me have worked on sending patches to > work on the hash function transition, and that work up to now has all > been done on my personal time, without compensation of any sort, out of > a desire to improve the project. Which seems to suggest if there is a need, it's not very pressing. Doesn't it? > Lots of people have opined on how it should have been different > without sending any patches. As is typical. > If you would like to propose patches for the extensive amount of work > to implement your solution, then we could consider them, although I > will warn you that your approach will likely require at least several > hundred patches. That's not an issue. I've started projects with several hundred patches just to prove that something is possible. > However, I refer you to the list archives to determine why > your approach is not the one we chose and is not, in my view, the best > path forward. Yeah? Provide me with *one* mail proposing my approach. > I should also be clear that I have no intention of submitting patches > to change our approach now or in the future, or redoing the patches > I've already sent. You don't have to. (and it's not really necessary as it's typically the case that people don't provide patches for designs that compete against their own). > > The fact that apparently it's so easy to clone a repository with > > the wrong hash algorithm should give developers pause, as it means the > > whole point of using cryptographic hash algorithms to ensure the > > integrity of the commit history is completely gone. > > No, it doesn't. It means that our empty repositories until recently > lacked any indication of the algorithm or other capabilities, which was > a mistake in our original protocol design that has now been corrected. Yes it does. Can I clone a repository that already transitioned to SHA-256, and then push a SHA-1 commit? Well, of course I can't because that's not currently implemented, but if we followed the current plan that apparently "we" have decided on, it should be. > If you interact with the repository later on when it has data, then if > you're using the wrong hash algorithm, you'll find that you get a > helpful error message that that's not yet supported. That isn't true according to your plan. SHA-1 would be interchangeable with SHA-256, would it not? So according to the current plan, I would be able to push a SHA-1 commit on a SHA-256 repository. --- Is it not the case that the current plan aims to have support for SHA-1 and SHA-256 object ids at the same time? In other words: in your ideal world, the following object ids would *both* refer to the same git object: 1. 69c786637d7a7fe3b2b8f7d989af095f5f49c3a8 2. 2b4ebdace10518280172449c012af17b51e9d46e023a91a5d3dd3a8ad9e4a116 Would they not? -- Felipe Contreras ^ permalink raw reply [flat|nested] 58+ messages in thread
* Re: Is GIT_DEFAULT_HASH flawed? 2023-05-08 2:00 ` Felipe Contreras @ 2023-05-08 21:38 ` brian m. carlson 2023-05-09 10:32 ` Oswald Buddenhagen 0 siblings, 1 reply; 58+ messages in thread From: brian m. carlson @ 2023-05-08 21:38 UTC (permalink / raw) To: Felipe Contreras; +Cc: Jeff King, Junio C Hamano, git, Adam Majer [-- Attachment #1: Type: text/plain, Size: 5737 bytes --] On 2023-05-08 at 02:00:56, Felipe Contreras wrote: > brian m. carlson wrote: > > On 2023-05-02 at 23:46:02, Felipe Contreras wrote: > > > In my view one repository should be able to have part SHA-1 history, > > > part SHA3-256 history, and part BLAKE2b history. > > > > That is practically very difficult and it means that it's hard to have > > confidence in the later history because SHA-1 is weak and you have to > > rely on it to verify the SHA-256 history later. > > Why would I have to rely on SHA-1 to verify the SHA-256 history later > on? If your history contains mixed and matched hash algorithms, you'll need to be able to verify those commits to the root to have any confidence in a signed commit or tag, which means trusting SHA-1 if you have any SHA-1 commits in the repository. > > Since attacks always get better, SHA-1 will eventually be so weak that > > collisions can be computed in the amount of time we now take for MD4 > > or MD5 collisions (i.e., seconds), and with your plan, we'd have to > > retain that history forever with the resulting lack of confidence in > > part of the history. > > We have to do the same with your plan as well. > > Your plan relies on SHA-256 being interchangeable with SHA-1, so if the > Git project decided to switch *today* to SHA-256, we would have two > object ids: > > 1. 69c786637d7a7fe3b2b8f7d989af095f5f49c3a8 > 2. 2b4ebdace10518280172449c012af17b51e9d46e023a91a5d3dd3a8ad9e4a116 > > This object would refer to a tree and a parent object with SHA-1 ids, > which would be OK, because they would be interchangeable with some > corresponding SHA-256 ids. > > Isn't that your plan? For a period of time, yes. At some point, people will abandon SHA-1 and won't use it anymore for a particular repository, and then its security doesn't matter. > Therefore the SHA-1 of the parent of the commit, and the tree of the > commit would be trusted and retained forever. Nope. At some point, we just turn off SHA-1. > > This also doesn't work with various structures like trees, the index, > > and pack and index formats, which have no indication of the algorithm > > used and simply rely on fixed-size, often 4-byte aligned object IDs > > without any metadata. > > So? The index and pack objects can be regenerated, so at any point in > time they could be regenerated for SHA-1 or SHA-256. Right, and that point you've basically converted the repository over. > The tree object is a no-brainer. For an object of type "commit:256" you > require a tree of type "tree:256". Easy. That doesn't work with the pack format because there are only seven valid types of objects, and five of them are used. > > Also, we've already decided on the current design a long time ago with > > the transition plan after extensive, thoughtful discussion by many > > people. > > Who is "we"? The list members. > I've participated in many discussions in the git mailing list where the > consensus is that 99% of people decide to do something, and that > something never happens. > > The fact that "we" have decided something doesn't carry as much weight > as you seem to think it does. Jonathan Nieder decided to propose an approach for how we'd go about this, and it was discussed extensively on the list and the parts that I've implemented have almost completely conformed to that documentation. I think his approach was very thoughtful and addressed many questions about how the project was to proceed, and I appreciate that he sent it and that others contributed in a helpful way. The project wouldn't have been possible unless we had a clear decision on how to implement things. There was an opportunity at the time to comment on the approach and propose alternatives, and we didn't choose to adopt any. > Moreover, haven't "we" decided that this transitioning plan is > *tentantive*, and the SHA-256 feature is *experimental*? We've documented that it's experimental because it's not seeing wide use. I expect that will change, at which point it will no longer be experimental. The design is not tentative and there are no plans to change it. > > Very few people other than me have worked on sending patches to > > work on the hash function transition, and that work up to now has all > > been done on my personal time, without compensation of any sort, out of > > a desire to improve the project. > > Which seems to suggest if there is a need, it's not very pressing. > > Doesn't it? No, I don't agree. An approach to continue to use SHA-1 indefinitely means that Git will not be viable at many major organizations and in many major governments which have restrictions on using insecure cryptography. I think this ends the point at which I'd like to respond to your proposal further. I continue to feel that it's argumentative, unproductive, and not in the best interests of the project, and I don't think continuing here is going to shed any more light on the topic. Ultimately, I feel that my previous statement that we're not going to see eye to eye on most things continues to be correct. I also don't think arguing with you is a productive use of my time or a positive contribution to the community, and in general, I'm going to avoid commenting on your patches or proposals because I can't see a way that we can interact in a useful and helpful way on the list or elsewhere. As a result, I'll kindly ask you to refrain from CCing me on further emails to the list or otherwise emailing me for the indefinite future, and I'll do the same for you after this email. -- brian m. carlson (he/him or they/them) Toronto, Ontario, CA [-- Attachment #2: signature.asc --] [-- Type: application/pgp-signature, Size: 263 bytes --] ^ permalink raw reply [flat|nested] 58+ messages in thread
* Re: Is GIT_DEFAULT_HASH flawed? 2023-05-08 21:38 ` brian m. carlson @ 2023-05-09 10:32 ` Oswald Buddenhagen 2023-05-09 16:47 ` Junio C Hamano 0 siblings, 1 reply; 58+ messages in thread From: Oswald Buddenhagen @ 2023-05-09 10:32 UTC (permalink / raw) To: brian m. carlson, Felipe Contreras, Jeff King, Junio C Hamano, git, Adam Majer On Mon, May 08, 2023 at 09:38:56PM +0000, brian m. carlson wrote: >On 2023-05-08 at 02:00:56, Felipe Contreras wrote: >> brian m. carlson wrote: >> > On 2023-05-02 at 23:46:02, Felipe Contreras wrote: >> > > In my view one repository should be able to have part SHA-1 history, >> > > part SHA3-256 history, and part BLAKE2b history. >> > >> > That is practically very difficult and it means that it's hard to have >> > confidence in the later history because SHA-1 is weak and you have to >> > rely on it to verify the SHA-256 history later. >> >> Why would I have to rely on SHA-1 to verify the SHA-256 history later >> on? > >If your history contains mixed and matched hash algorithms, you'll need >to be able to verify those commits to the root to have any confidence in >a signed commit or tag, which means trusting SHA-1 if you have any SHA-1 >commits in the repository. > the history is traversed from the end anyway, so having sha-1 in the history is entirely irrelevant for verifying sha-256 commits, assuming one may only upgrade the algorithm. the transition plan implies the intent to ultimately get rid of old algos, but this is a non-starter, because old histories need to remain accessible indefinitely (you can't rewrite all external references, and even for in-history references this would be unreliable and would falsify historical builds). i won't try making an argument for mixed histories, as i'm assuming i wouldn't add anything that hasn't already been written. -- ossi ^ permalink raw reply [flat|nested] 58+ messages in thread
* Re: Is GIT_DEFAULT_HASH flawed? 2023-05-09 10:32 ` Oswald Buddenhagen @ 2023-05-09 16:47 ` Junio C Hamano 0 siblings, 0 replies; 58+ messages in thread From: Junio C Hamano @ 2023-05-09 16:47 UTC (permalink / raw) To: Oswald Buddenhagen Cc: brian m. carlson, Felipe Contreras, Jeff King, git, Adam Majer Oswald Buddenhagen <oswald.buddenhagen@gmx.de> writes: >>If your history contains mixed and matched hash algorithms, you'll need >>to be able to verify those commits to the root to have any confidence in >>a signed commit or tag, which means trusting SHA-1 if you have any SHA-1 >>commits in the repository. >> > the history is traversed from the end anyway, so having sha-1 in the > history is entirely irrelevant for verifying sha-256 commits, assuming > one may only upgrade the algorithm. That depends on what is meant by "verify a commit". If we are only interested in the tree contents, then the newer commits whose trees are hashed with a more secure hash algorithm recursively down to the blobs would lack any weak link hashed by a less secure algorithm. Most people do not care as deeply how their project tree came to the shape it has today as they care about what is in the recent trees, so this is an acceptable stance to take. If the less secure algorithm becomes so weak that the history, up to the last commit that was signed by it, can be rewritten arbitrarily, however, an attacker can lie about why the code that survives to this day looks the way it does by forging old parts of the history, and mislead today's developers to do wrong things. The only way to prevent that kind of attack is to verify a commit by recursively making sure not just the commit and its tree, but its parents are all authentic, and less secure algorithm may make it impossible. The old history hashed with the old algorithm needs to be kept to help external references, but I think as a mitigation, those who care about that part of history can create a copy of the history hashed with the new algorithm and publish the correspondence between the two parallel histories to assure the integrity of that part of the old history. ^ permalink raw reply [flat|nested] 58+ messages in thread
* Re: [PATCH 0/2] Fix empty SHA-256 clones with v0 and v1 2023-04-26 20:53 ` [PATCH 0/2] Fix empty SHA-256 clones with v0 and v1 brian m. carlson 2023-04-26 20:53 ` [PATCH 1/2] http: advertise capabilities when cloning empty repos brian m. carlson 2023-04-26 20:53 ` [PATCH 2/2] Honor GIT_DEFAULT_HASH for empty clones without remote algo brian m. carlson @ 2023-04-26 21:12 ` Junio C Hamano 2 siblings, 0 replies; 58+ messages in thread From: Junio C Hamano @ 2023-04-26 21:12 UTC (permalink / raw) To: brian m. carlson; +Cc: git, Jeff King, Adam Majer "brian m. carlson" <sandals@crustytoothpaste.net> writes: > The second introduces some backwards compatibility to avoid regressing > the old behaviour of using GIT_DEFAULT_HASH to initialize the proper > hash in this case. We add a flag to see if we explicitly obtained a > hash algorithm from the remote side, and if not, we honour > GIT_DEFAULT_HASH, as before. I am fairly negative on this half of the series. The first one is excellent, though. THanks. ^ permalink raw reply [flat|nested] 58+ messages in thread
* Re: git clone of empty repositories doesn't preserve hash 2023-04-26 20:40 ` brian m. carlson 2023-04-26 20:53 ` [PATCH 0/2] Fix empty SHA-256 clones with v0 and v1 brian m. carlson @ 2023-04-27 4:56 ` Jeff King 2023-05-01 17:00 ` [PATCH v2 0/1] Fix empty SHA-256 clones with v0 and v1 brian m. carlson 2023-05-17 19:24 ` [PATCH v3 " brian m. carlson 3 siblings, 0 replies; 58+ messages in thread From: Jeff King @ 2023-04-27 4:56 UTC (permalink / raw) To: brian m. carlson; +Cc: Junio C Hamano, Adam Majer, git On Wed, Apr 26, 2023 at 08:40:48PM +0000, brian m. carlson wrote: > On 2023-04-26 at 10:51:34, Jeff King wrote: > > We could send a capabilities^{} line, which Git has supported on the > > client side since eb398797cd (connect: advertized capability is not a > > ref, 2016-09-09). So sending it should not break even old clients > > (though we would have to check what alternate implementations like > > libgit2 or dulwich do; we know JGit supports it). > > I have a patch which does exactly this, which I will be sending shortly. > I've confirmed that libgit2 and JGit support it, which is unsurprising, > since all of the implementations, Git included, share the same code. In > addition, this is the behaviour we document as supporting, so all > implementations should support it. Yeah, I was worried about how accurate that "should" is. :) Since you checked the others, I peeked at dulwich's code, and it appears to support it, too (I didn't actually run a test, but the code is pretty clear). It doesn't support sha256 at all, but that's OK. What I'd be concerned about is breaking clients when there is no useful capability to advertise (though of course we could decide to send capabilities^{} only when there is something useful to say). > > However, the object-format support here was broken until the very recent > > 13e67aa39b (v0 protocol: fix sha1/sha256 confusion for capabilities^{}, > > 2023-04-14), so it would only be useful going forward (before then we'd > > die(), but maybe that is preferable to having the wrong object format?). > > I think it's better to die than to silently have the wrong object > format, and it also prevents the problem if other clients using v0 or v1 > (which effectively have to be supported for compatibility, while v2 is > optional) try to clone from a fixed server. OK, good. So we can ignore that recently fixed bug (which only affects the case where older versions are cloning something whose hash format does not match theirs). > > I'm not sure it's worth the effort, though. If you want to use sha256 > > everywhere and tell the other side about it, you need a modern client > > anyway, and that means the ability to speak v2. So this would only > > matter if for some reason the v2 probe was being ignored (e.g., proxies > > eating it, ssh refusing environment variable, etc), which itself are > > things that ideally would be fixed (and can maybe one day even go away > > if we optimistically default to v2). > > Using v2 everywhere is difficult because many SSH servers still don't > pass GIT_PROTOCOL by default, meaning that we're stuck with v0 and v1. > In retrospect, sending an environment variable here was not a great > decision, but we're stuck with it now. True (I even got bit by this at one point, not realizing that I wasn't using v2 with one of my personal servers). I certainly don't have an objection if you're willing to do the work. It would also allow us to fix the long-broken "unborn HEAD" problem for v0 in an empty repo (though I am also fine if we don't go that far). -Peff ^ permalink raw reply [flat|nested] 58+ messages in thread
* [PATCH v2 0/1] Fix empty SHA-256 clones with v0 and v1 2023-04-26 20:40 ` brian m. carlson 2023-04-26 20:53 ` [PATCH 0/2] Fix empty SHA-256 clones with v0 and v1 brian m. carlson 2023-04-27 4:56 ` git clone of empty repositories doesn't preserve hash Jeff King @ 2023-05-01 17:00 ` brian m. carlson 2023-05-01 17:00 ` [PATCH v2 1/1] upload-pack: advertise capabilities when cloning empty repos brian m. carlson 2023-05-01 17:37 ` [PATCH v2 0/1] Fix empty SHA-256 clones with v0 and v1 Junio C Hamano 2023-05-17 19:24 ` [PATCH v3 " brian m. carlson 3 siblings, 2 replies; 58+ messages in thread From: brian m. carlson @ 2023-05-01 17:00 UTC (permalink / raw) To: git; +Cc: Jeff King, Junio C Hamano, Adam Majer We recently fixed empty clones with SHA-256 over protocol v2 by honouring the hash algorithm specified even when no refs are present. However, in doing so, we made it impossible to set up a v0 or v1 repository by cloning from an empty SHA-256 repository. In doing so, we also broke the Git LFS testsuite for SHA-256 repositories. This series introduces the dummy `capabilities^{}` entry for fetches and clones from an empty repository for v0 and v1, just as we do for clones. This is already supported by older versions of Git, as well as libgit2, dulwich, and JGit. Unlike in v1, we wire this up for all protocols and fix the NULL pointer dereference that would occur in that case, as well as add some more tests. The second patch has been dropped, since it is no longer needed and was not very popular. Changes since v1: * Drop patch to honour GIT_DEFAULT_HASH * Support all requests, not just HTTP. * Add more tests. * Fix NULL pointer dereference. brian m. carlson (1): upload-pack: advertise capabilities when cloning empty repos t/t5551-http-fetch-smart.sh | 27 +++++++++++++++++++++++++++ t/t5700-protocol-v1.sh | 31 +++++++++++++++++++++++++++++-- upload-pack.c | 7 ++++++- 3 files changed, 62 insertions(+), 3 deletions(-) ^ permalink raw reply [flat|nested] 58+ messages in thread
* [PATCH v2 1/1] upload-pack: advertise capabilities when cloning empty repos 2023-05-01 17:00 ` [PATCH v2 0/1] Fix empty SHA-256 clones with v0 and v1 brian m. carlson @ 2023-05-01 17:00 ` brian m. carlson 2023-05-01 22:40 ` Jeff King 2023-05-01 17:37 ` [PATCH v2 0/1] Fix empty SHA-256 clones with v0 and v1 Junio C Hamano 1 sibling, 1 reply; 58+ messages in thread From: brian m. carlson @ 2023-05-01 17:00 UTC (permalink / raw) To: git; +Cc: Jeff King, Junio C Hamano, Adam Majer From: "brian m. carlson" <bk2204@github.com> When cloning an empty repository, protocol versions 0 and 1 currently offer nothing but the header and flush packets for the /info/refs endpoint. This means that no capabilities are provided, so the client side doesn't know what capabilities are present. However, this does pose a problem when working with SHA-256 repositories, since we use the capabilities to know the remote side's object format (hash algorithm). As of 8b214c2e9d ("clone: propagate object-format when cloning from void", 2023-04-05), this has been fixed for protocol v2, since there we always read the hash algorithm from the remote. Fortunately, the push version of the protocol already indicates a clue for how to solve this. When the /info/refs endpoint is accessed for a push and the remote is empty, we include a dummy "capabilities^{}" ref pointing to the all-zeros object ID. The protocol documentation already indicates this should _always_ be sent, even for fetches and clones, so let's just do that, which means we'll properly announce the hash algorithm as part of the capabilities. This just works with the existing code because we share the same ref code for fetches and clones, and libgit2, JGit, and dulwich do as well. There is one minor issue to fix, though. When we call send_ref with namespaces, we would return NULL with the capabilities entry, which would cause a crash. Instead, let's make sure we don't try to strip the namespace if we're using our special capabilities entry. Add several sets of tests for HTTP as well as for local clones. Signed-off-by: brian m. carlson <bk2204@github.com> --- t/t5551-http-fetch-smart.sh | 27 +++++++++++++++++++++++++++ t/t5700-protocol-v1.sh | 31 +++++++++++++++++++++++++++++-- upload-pack.c | 7 ++++++- 3 files changed, 62 insertions(+), 3 deletions(-) diff --git a/t/t5551-http-fetch-smart.sh b/t/t5551-http-fetch-smart.sh index 0908534f25..21b7767cbd 100755 --- a/t/t5551-http-fetch-smart.sh +++ b/t/t5551-http-fetch-smart.sh @@ -611,6 +611,33 @@ test_expect_success 'client falls back from v2 to v0 to match server' ' grep symref=HEAD:refs/heads/ trace ' +test_expect_success 'create empty http-accessible SHA-256 repository' ' + mkdir "$HTTPD_DOCUMENT_ROOT_PATH/sha256.git" && + (cd "$HTTPD_DOCUMENT_ROOT_PATH/sha256.git" && + git --bare init --object-format=sha256 + ) +' + +test_expect_success 'clone empty SHA-256 repository with protocol v2' ' + rm -fr sha256 && + echo sha256 >expected && + git -c protocol.version=2 clone "$HTTPD_URL/smart/sha256.git" && + git -C sha256 rev-parse --show-object-format >actual && + test_cmp actual expected && + git ls-remote "$HTTPD_URL/smart/sha256.git" >actual && + test_must_be_empty actual +' + +test_expect_success 'clone empty SHA-256 repository with protocol v0' ' + rm -fr sha256 && + echo sha256 >expected && + GIT_TRACE=1 GIT_TRACE_PACKET=1 git -c protocol.version=0 clone "$HTTPD_URL/smart/sha256.git" && + git -C sha256 rev-parse --show-object-format >actual && + test_cmp actual expected && + git ls-remote "$HTTPD_URL/smart/sha256.git" >actual && + test_must_be_empty actual +' + test_expect_success 'passing hostname resolution information works' ' BOGUS_HOST=gitbogusexamplehost.invalid && BOGUS_HTTPD_URL=$HTTPD_PROTO://$BOGUS_HOST:$LIB_HTTPD_PORT && diff --git a/t/t5700-protocol-v1.sh b/t/t5700-protocol-v1.sh index 6c8d4c6cf1..a73b4d4ff6 100755 --- a/t/t5700-protocol-v1.sh +++ b/t/t5700-protocol-v1.sh @@ -244,15 +244,28 @@ test_expect_success 'push with ssh:// using protocol v1' ' grep "push< version 1" log ' +test_expect_success 'clone propagates object-format from empty repo' ' + test_when_finished "rm -fr src256 dst256" && + + echo sha256 >expect && + git init --object-format=sha256 src256 && + git clone --no-local src256 dst256 && + git -C dst256 rev-parse --show-object-format >actual && + + test_cmp expect actual +' + # Test protocol v1 with 'http://' transport # . "$TEST_DIRECTORY"/lib-httpd.sh start_httpd -test_expect_success 'create repo to be served by http:// transport' ' +test_expect_success 'create repos to be served by http:// transport' ' git init "$HTTPD_DOCUMENT_ROOT_PATH/http_parent" && git -C "$HTTPD_DOCUMENT_ROOT_PATH/http_parent" config http.receivepack true && - test_commit -C "$HTTPD_DOCUMENT_ROOT_PATH/http_parent" one + test_commit -C "$HTTPD_DOCUMENT_ROOT_PATH/http_parent" one && + git init --object-format=sha256 "$HTTPD_DOCUMENT_ROOT_PATH/sha256" && + git -C "$HTTPD_DOCUMENT_ROOT_PATH/sha256" config http.receivepack true ' test_expect_success 'clone with http:// using protocol v1' ' @@ -269,6 +282,20 @@ test_expect_success 'clone with http:// using protocol v1' ' grep "git< version 1" log ' +test_expect_success 'clone with http:// using protocol v1 with empty SHA-256 repo' ' + GIT_TRACE_PACKET=1 GIT_TRACE_CURL=1 git -c protocol.version=1 \ + clone "$HTTPD_URL/smart/sha256" sha256 2>log && + + echo sha256 >expect && + git -C sha256 rev-parse --show-object-format >actual && + test_cmp expect actual && + + # Client requested to use protocol v1 + grep "Git-Protocol: version=1" log && + # Server responded using protocol v1 + grep "git< version 1" log +' + test_expect_success 'fetch with http:// using protocol v1' ' test_commit -C "$HTTPD_DOCUMENT_ROOT_PATH/http_parent" two && diff --git a/upload-pack.c b/upload-pack.c index 08633dc121..d7b31d0527 100644 --- a/upload-pack.c +++ b/upload-pack.c @@ -120,6 +120,7 @@ struct upload_pack_data { unsigned allow_ref_in_want : 1; /* v2 only */ unsigned allow_sideband_all : 1; /* v2 only */ unsigned advertise_sid : 1; + unsigned sent_capabilities : 1; }; static void upload_pack_data_init(struct upload_pack_data *data) @@ -1212,7 +1213,8 @@ static int send_ref(const char *refname, const struct object_id *oid, static const char *capabilities = "multi_ack thin-pack side-band" " side-band-64k ofs-delta shallow deepen-since deepen-not" " deepen-relative no-progress include-tag multi_ack_detailed"; - const char *refname_nons = strip_namespace(refname); + const char *refname_nons = !strcmp(refname, "capabilities^{}") ? + refname : strip_namespace(refname); struct object_id peeled; struct upload_pack_data *data = cb_data; @@ -1240,6 +1242,7 @@ static int send_ref(const char *refname, const struct object_id *oid, git_user_agent_sanitized()); strbuf_release(&symref_info); strbuf_release(&session_id); + data->sent_capabilities = 1; } else { packet_fwrite_fmt(stdout, "%s %s\n", oid_to_hex(oid), refname_nons); } @@ -1379,6 +1382,8 @@ void upload_pack(const int advertise_refs, const int stateless_rpc, data.no_done = 1; head_ref_namespaced(send_ref, &data); for_each_namespaced_ref(send_ref, &data); + if (!data.sent_capabilities) + send_ref("capabilities^{}", null_oid(), 0, &data); /* * fflush stdout before calling advertise_shallow_grafts because send_ref * uses stdio. ^ permalink raw reply related [flat|nested] 58+ messages in thread
* Re: [PATCH v2 1/1] upload-pack: advertise capabilities when cloning empty repos 2023-05-01 17:00 ` [PATCH v2 1/1] upload-pack: advertise capabilities when cloning empty repos brian m. carlson @ 2023-05-01 22:40 ` Jeff King 2023-05-01 22:51 ` Junio C Hamano 0 siblings, 1 reply; 58+ messages in thread From: Jeff King @ 2023-05-01 22:40 UTC (permalink / raw) To: brian m. carlson; +Cc: git, Junio C Hamano, Adam Majer On Mon, May 01, 2023 at 05:00:18PM +0000, brian m. carlson wrote: > There is one minor issue to fix, though. When we call send_ref with > namespaces, we would return NULL with the capabilities entry, which > would cause a crash. Instead, let's make sure we don't try to strip the > namespace if we're using our special capabilities entry. Thanks, this hunk: > @@ -1212,7 +1213,8 @@ static int send_ref(const char *refname, const struct object_id *oid, > static const char *capabilities = "multi_ack thin-pack side-band" > " side-band-64k ofs-delta shallow deepen-since deepen-not" > " deepen-relative no-progress include-tag multi_ack_detailed"; > - const char *refname_nons = strip_namespace(refname); > + const char *refname_nons = !strcmp(refname, "capabilities^{}") ? > + refname : strip_namespace(refname); > struct object_id peeled; > struct upload_pack_data *data = cb_data; looks much better than sticking it in strip_namespace() as I did earlier. I did wonder about refactoring further: diff --git a/upload-pack.c b/upload-pack.c index d7b31d0527..e1d75d7c3c 100644 --- a/upload-pack.c +++ b/upload-pack.c @@ -1207,19 +1207,17 @@ static void format_session_id(struct strbuf *buf, struct upload_pack_data *d) { strbuf_addf(buf, " session-id=%s", trace2_session_id()); } -static int send_ref(const char *refname, const struct object_id *oid, - int flag UNUSED, void *cb_data) +static void write_v0_ref(struct upload_pack_data *data, + const char *refname, const char *refname_nons, + const struct object_id *oid) { static const char *capabilities = "multi_ack thin-pack side-band" " side-band-64k ofs-delta shallow deepen-since deepen-not" " deepen-relative no-progress include-tag multi_ack_detailed"; - const char *refname_nons = !strcmp(refname, "capabilities^{}") ? - refname : strip_namespace(refname); struct object_id peeled; - struct upload_pack_data *data = cb_data; if (mark_our_ref(refname_nons, refname, oid, &data->hidden_refs)) - return 0; + return; if (capabilities) { struct strbuf symref_info = STRBUF_INIT; @@ -1249,6 +1247,12 @@ static int send_ref(const char *refname, const struct object_id *oid, capabilities = NULL; if (!peel_iterated_oid(oid, &peeled)) packet_fwrite_fmt(stdout, "%s %s^{}\n", oid_to_hex(&peeled), refname_nons); +} + +static int send_ref(const char *refname, const struct object_id *oid, + int flag UNUSED, void *cb_data) +{ + write_v0_ref(cb_data, refname, strip_namespace(refname), oid); return 0; } @@ -1382,8 +1386,10 @@ void upload_pack(const int advertise_refs, const int stateless_rpc, data.no_done = 1; head_ref_namespaced(send_ref, &data); for_each_namespaced_ref(send_ref, &data); - if (!data.sent_capabilities) - send_ref("capabilities^{}", null_oid(), 0, &data); + if (!data.sent_capabilities) { + const char *ref = "capabilities^{}"; + write_v0_ref(&data, ref, ref, null_oid()); + } /* * fflush stdout before calling advertise_shallow_grafts because send_ref * uses stdio. which avoids doing an extra strcmp() on every ref. But probably it is not that big a deal either way. > Add several sets of tests for HTTP as well as for local clones. This part puzzled me a bit. There's a local test in t5700, which is good. But it also gets HTTP tests. What do they offer versus the ones in t5551 (or vice versa)? -Peff ^ permalink raw reply related [flat|nested] 58+ messages in thread
* Re: [PATCH v2 1/1] upload-pack: advertise capabilities when cloning empty repos 2023-05-01 22:40 ` Jeff King @ 2023-05-01 22:51 ` Junio C Hamano 0 siblings, 0 replies; 58+ messages in thread From: Junio C Hamano @ 2023-05-01 22:51 UTC (permalink / raw) To: Jeff King; +Cc: brian m. carlson, git, Adam Majer Jeff King <peff@peff.net> writes: > @@ -1382,8 +1386,10 @@ void upload_pack(const int advertise_refs, const int stateless_rpc, > data.no_done = 1; > head_ref_namespaced(send_ref, &data); > for_each_namespaced_ref(send_ref, &data); > - if (!data.sent_capabilities) > - send_ref("capabilities^{}", null_oid(), 0, &data); > + if (!data.sent_capabilities) { > + const char *ref = "capabilities^{}"; > + write_v0_ref(&data, ref, ref, null_oid()); > + } Ah, this separation of duties wrt the namespace stripping makes the result easier to read. The version brian posted looked good enough to me, but if we are to have another iteration, incorporating this change would be nice. Thanks. ^ permalink raw reply [flat|nested] 58+ messages in thread
* Re: [PATCH v2 0/1] Fix empty SHA-256 clones with v0 and v1 2023-05-01 17:00 ` [PATCH v2 0/1] Fix empty SHA-256 clones with v0 and v1 brian m. carlson 2023-05-01 17:00 ` [PATCH v2 1/1] upload-pack: advertise capabilities when cloning empty repos brian m. carlson @ 2023-05-01 17:37 ` Junio C Hamano 1 sibling, 0 replies; 58+ messages in thread From: Junio C Hamano @ 2023-05-01 17:37 UTC (permalink / raw) To: brian m. carlson; +Cc: git, Jeff King, Adam Majer "brian m. carlson" <sandals@crustytoothpaste.net> writes: > Changes since v1: > * Support all requests, not just HTTP. > * Add more tests. > * Fix NULL pointer dereference. Great. Will queue. Let's merge it down to 'next' soonish. Thanks. ^ permalink raw reply [flat|nested] 58+ messages in thread
* [PATCH v3 0/1] Fix empty SHA-256 clones with v0 and v1 2023-04-26 20:40 ` brian m. carlson ` (2 preceding siblings ...) 2023-05-01 17:00 ` [PATCH v2 0/1] Fix empty SHA-256 clones with v0 and v1 brian m. carlson @ 2023-05-17 19:24 ` brian m. carlson 2023-05-17 19:24 ` [PATCH v3 1/1] upload-pack: advertise capabilities when cloning empty repos brian m. carlson ` (2 more replies) 3 siblings, 3 replies; 58+ messages in thread From: brian m. carlson @ 2023-05-17 19:24 UTC (permalink / raw) To: git; +Cc: Jeff King, Junio C Hamano, Adam Majer We recently fixed empty clones with SHA-256 over protocol v2 by honouring the hash algorithm specified even when no refs are present. However, in doing so, we made it impossible to set up a v0 or v1 repository by cloning from an empty SHA-256 repository. In doing so, we also broke the Git LFS testsuite for SHA-256 repositories. This series introduces the dummy `capabilities^{}` entry for fetches and clones from an empty repository for v0 and v1, just as we do for clones. This is already supported by older versions of Git, as well as libgit2, dulwich, and JGit. Changes since v2: * Move advertisement of fake capabilities ref to a separate function to avoid an extra strcmp. Changes since v1: * Drop patch to honour GIT_DEFAULT_HASH * Support all requests, not just HTTP. * Add more tests. * Fix NULL pointer dereference. brian m. carlson (1): upload-pack: advertise capabilities when cloning empty repos t/t5551-http-fetch-smart.sh | 27 +++++++++++++++++++++++++++ t/t5700-protocol-v1.sh | 31 +++++++++++++++++++++++++++++-- upload-pack.c | 22 +++++++++++++++++----- 3 files changed, 73 insertions(+), 7 deletions(-) ^ permalink raw reply [flat|nested] 58+ messages in thread
* [PATCH v3 1/1] upload-pack: advertise capabilities when cloning empty repos 2023-05-17 19:24 ` [PATCH v3 " brian m. carlson @ 2023-05-17 19:24 ` brian m. carlson 2023-05-17 21:48 ` [PATCH v3 0/1] Fix empty SHA-256 clones with v0 and v1 Junio C Hamano 2023-05-18 18:28 ` Jeff King 2 siblings, 0 replies; 58+ messages in thread From: brian m. carlson @ 2023-05-17 19:24 UTC (permalink / raw) To: git; +Cc: Jeff King, Junio C Hamano, Adam Majer From: "brian m. carlson" <bk2204@github.com> When cloning an empty repository, protocol versions 0 and 1 currently offer nothing but the header and flush packets for the /info/refs endpoint. This means that no capabilities are provided, so the client side doesn't know what capabilities are present. However, this does pose a problem when working with SHA-256 repositories, since we use the capabilities to know the remote side's object format (hash algorithm). As of 8b214c2e9d ("clone: propagate object-format when cloning from void", 2023-04-05), this has been fixed for protocol v2, since there we always read the hash algorithm from the remote. Fortunately, the push version of the protocol already indicates a clue for how to solve this. When the /info/refs endpoint is accessed for a push and the remote is empty, we include a dummy "capabilities^{}" ref pointing to the all-zeros object ID. The protocol documentation already indicates this should _always_ be sent, even for fetches and clones, so let's just do that, which means we'll properly announce the hash algorithm as part of the capabilities. This just works with the existing code because we share the same ref code for fetches and clones, and libgit2, JGit, and dulwich do as well. There is one minor issue to fix, though. If we called send_ref with namespaces, we would return NULL with the capabilities entry, which would cause a crash. Instead, let's refactor out a function to print just the ref itself without stripping the namespace and use it for our special capabilities entry. Add several sets of tests for HTTP as well as for local clones. The behavior can be slightly different for HTTP versus a local or SSH clone because of the stateless-rpc functionality, so it's worth testing both. Signed-off-by: brian m. carlson <bk2204@github.com> --- t/t5551-http-fetch-smart.sh | 27 +++++++++++++++++++++++++++ t/t5700-protocol-v1.sh | 31 +++++++++++++++++++++++++++++-- upload-pack.c | 22 +++++++++++++++++----- 3 files changed, 73 insertions(+), 7 deletions(-) diff --git a/t/t5551-http-fetch-smart.sh b/t/t5551-http-fetch-smart.sh index 0908534f25..21b7767cbd 100755 --- a/t/t5551-http-fetch-smart.sh +++ b/t/t5551-http-fetch-smart.sh @@ -611,6 +611,33 @@ test_expect_success 'client falls back from v2 to v0 to match server' ' grep symref=HEAD:refs/heads/ trace ' +test_expect_success 'create empty http-accessible SHA-256 repository' ' + mkdir "$HTTPD_DOCUMENT_ROOT_PATH/sha256.git" && + (cd "$HTTPD_DOCUMENT_ROOT_PATH/sha256.git" && + git --bare init --object-format=sha256 + ) +' + +test_expect_success 'clone empty SHA-256 repository with protocol v2' ' + rm -fr sha256 && + echo sha256 >expected && + git -c protocol.version=2 clone "$HTTPD_URL/smart/sha256.git" && + git -C sha256 rev-parse --show-object-format >actual && + test_cmp actual expected && + git ls-remote "$HTTPD_URL/smart/sha256.git" >actual && + test_must_be_empty actual +' + +test_expect_success 'clone empty SHA-256 repository with protocol v0' ' + rm -fr sha256 && + echo sha256 >expected && + GIT_TRACE=1 GIT_TRACE_PACKET=1 git -c protocol.version=0 clone "$HTTPD_URL/smart/sha256.git" && + git -C sha256 rev-parse --show-object-format >actual && + test_cmp actual expected && + git ls-remote "$HTTPD_URL/smart/sha256.git" >actual && + test_must_be_empty actual +' + test_expect_success 'passing hostname resolution information works' ' BOGUS_HOST=gitbogusexamplehost.invalid && BOGUS_HTTPD_URL=$HTTPD_PROTO://$BOGUS_HOST:$LIB_HTTPD_PORT && diff --git a/t/t5700-protocol-v1.sh b/t/t5700-protocol-v1.sh index 6c8d4c6cf1..a73b4d4ff6 100755 --- a/t/t5700-protocol-v1.sh +++ b/t/t5700-protocol-v1.sh @@ -244,15 +244,28 @@ test_expect_success 'push with ssh:// using protocol v1' ' grep "push< version 1" log ' +test_expect_success 'clone propagates object-format from empty repo' ' + test_when_finished "rm -fr src256 dst256" && + + echo sha256 >expect && + git init --object-format=sha256 src256 && + git clone --no-local src256 dst256 && + git -C dst256 rev-parse --show-object-format >actual && + + test_cmp expect actual +' + # Test protocol v1 with 'http://' transport # . "$TEST_DIRECTORY"/lib-httpd.sh start_httpd -test_expect_success 'create repo to be served by http:// transport' ' +test_expect_success 'create repos to be served by http:// transport' ' git init "$HTTPD_DOCUMENT_ROOT_PATH/http_parent" && git -C "$HTTPD_DOCUMENT_ROOT_PATH/http_parent" config http.receivepack true && - test_commit -C "$HTTPD_DOCUMENT_ROOT_PATH/http_parent" one + test_commit -C "$HTTPD_DOCUMENT_ROOT_PATH/http_parent" one && + git init --object-format=sha256 "$HTTPD_DOCUMENT_ROOT_PATH/sha256" && + git -C "$HTTPD_DOCUMENT_ROOT_PATH/sha256" config http.receivepack true ' test_expect_success 'clone with http:// using protocol v1' ' @@ -269,6 +282,20 @@ test_expect_success 'clone with http:// using protocol v1' ' grep "git< version 1" log ' +test_expect_success 'clone with http:// using protocol v1 with empty SHA-256 repo' ' + GIT_TRACE_PACKET=1 GIT_TRACE_CURL=1 git -c protocol.version=1 \ + clone "$HTTPD_URL/smart/sha256" sha256 2>log && + + echo sha256 >expect && + git -C sha256 rev-parse --show-object-format >actual && + test_cmp expect actual && + + # Client requested to use protocol v1 + grep "Git-Protocol: version=1" log && + # Server responded using protocol v1 + grep "git< version 1" log +' + test_expect_success 'fetch with http:// using protocol v1' ' test_commit -C "$HTTPD_DOCUMENT_ROOT_PATH/http_parent" two && diff --git a/upload-pack.c b/upload-pack.c index 08633dc121..d3312006a3 100644 --- a/upload-pack.c +++ b/upload-pack.c @@ -120,6 +120,7 @@ struct upload_pack_data { unsigned allow_ref_in_want : 1; /* v2 only */ unsigned allow_sideband_all : 1; /* v2 only */ unsigned advertise_sid : 1; + unsigned sent_capabilities : 1; }; static void upload_pack_data_init(struct upload_pack_data *data) @@ -1206,18 +1207,17 @@ static void format_session_id(struct strbuf *buf, struct upload_pack_data *d) { strbuf_addf(buf, " session-id=%s", trace2_session_id()); } -static int send_ref(const char *refname, const struct object_id *oid, - int flag UNUSED, void *cb_data) +static void write_v0_ref(struct upload_pack_data *data, + const char *refname, const char *refname_nons, + const struct object_id *oid) { static const char *capabilities = "multi_ack thin-pack side-band" " side-band-64k ofs-delta shallow deepen-since deepen-not" " deepen-relative no-progress include-tag multi_ack_detailed"; - const char *refname_nons = strip_namespace(refname); struct object_id peeled; - struct upload_pack_data *data = cb_data; if (mark_our_ref(refname_nons, refname, oid, &data->hidden_refs)) - return 0; + return; if (capabilities) { struct strbuf symref_info = STRBUF_INIT; @@ -1240,12 +1240,20 @@ static int send_ref(const char *refname, const struct object_id *oid, git_user_agent_sanitized()); strbuf_release(&symref_info); strbuf_release(&session_id); + data->sent_capabilities = 1; } else { packet_fwrite_fmt(stdout, "%s %s\n", oid_to_hex(oid), refname_nons); } capabilities = NULL; if (!peel_iterated_oid(oid, &peeled)) packet_fwrite_fmt(stdout, "%s %s^{}\n", oid_to_hex(&peeled), refname_nons); + return; +} + +static int send_ref(const char *refname, const struct object_id *oid, + int flag UNUSED, void *cb_data) +{ + write_v0_ref(cb_data, refname, strip_namespace(refname), oid); return 0; } @@ -1379,6 +1387,10 @@ void upload_pack(const int advertise_refs, const int stateless_rpc, data.no_done = 1; head_ref_namespaced(send_ref, &data); for_each_namespaced_ref(send_ref, &data); + if (!data.sent_capabilities) { + const char *refname = "capabilities^{}"; + write_v0_ref(&data, refname, refname, null_oid()); + } /* * fflush stdout before calling advertise_shallow_grafts because send_ref * uses stdio. ^ permalink raw reply related [flat|nested] 58+ messages in thread
* Re: [PATCH v3 0/1] Fix empty SHA-256 clones with v0 and v1 2023-05-17 19:24 ` [PATCH v3 " brian m. carlson 2023-05-17 19:24 ` [PATCH v3 1/1] upload-pack: advertise capabilities when cloning empty repos brian m. carlson @ 2023-05-17 21:48 ` Junio C Hamano 2023-05-17 22:28 ` brian m. carlson 2023-05-18 18:28 ` Jeff King 2 siblings, 1 reply; 58+ messages in thread From: Junio C Hamano @ 2023-05-17 21:48 UTC (permalink / raw) To: brian m. carlson; +Cc: git, Jeff King, Adam Majer "brian m. carlson" <sandals@crustytoothpaste.net> writes: > We recently fixed empty clones with SHA-256 over protocol v2 by > honouring the hash algorithm specified even when no refs are present. > However, in doing so, we made it impossible to set up a v0 or v1 > repository by cloning from an empty SHA-256 repository. In doing so, we > also broke the Git LFS testsuite for SHA-256 repositories. > > This series introduces the dummy `capabilities^{}` entry for fetches and > clones from an empty repository for v0 and v1, just as we do for clones. > This is already supported by older versions of Git, as well as libgit2, > dulwich, and JGit. > > Changes since v2: > * Move advertisement of fake capabilities ref to a separate function to > avoid an extra strcmp. We want this in -rc2 if not -rc1 for the upcoming release, right? I've read the patch again and it all looked sensible. Thanks. > Changes since v1: > * Drop patch to honour GIT_DEFAULT_HASH > * Support all requests, not just HTTP. > * Add more tests. > * Fix NULL pointer dereference. > > brian m. carlson (1): > upload-pack: advertise capabilities when cloning empty repos > > t/t5551-http-fetch-smart.sh | 27 +++++++++++++++++++++++++++ > t/t5700-protocol-v1.sh | 31 +++++++++++++++++++++++++++++-- > upload-pack.c | 22 +++++++++++++++++----- > 3 files changed, 73 insertions(+), 7 deletions(-) ^ permalink raw reply [flat|nested] 58+ messages in thread
* Re: [PATCH v3 0/1] Fix empty SHA-256 clones with v0 and v1 2023-05-17 21:48 ` [PATCH v3 0/1] Fix empty SHA-256 clones with v0 and v1 Junio C Hamano @ 2023-05-17 22:28 ` brian m. carlson 0 siblings, 0 replies; 58+ messages in thread From: brian m. carlson @ 2023-05-17 22:28 UTC (permalink / raw) To: Junio C Hamano; +Cc: git, Jeff King, Adam Majer [-- Attachment #1: Type: text/plain, Size: 1279 bytes --] On 2023-05-17 at 21:48:39, Junio C Hamano wrote: > "brian m. carlson" <sandals@crustytoothpaste.net> writes: > > > We recently fixed empty clones with SHA-256 over protocol v2 by > > honouring the hash algorithm specified even when no refs are present. > > However, in doing so, we made it impossible to set up a v0 or v1 > > repository by cloning from an empty SHA-256 repository. In doing so, we > > also broke the Git LFS testsuite for SHA-256 repositories. > > > > This series introduces the dummy `capabilities^{}` entry for fetches and > > clones from an empty repository for v0 and v1, just as we do for clones. > > This is already supported by older versions of Git, as well as libgit2, > > dulwich, and JGit. > > > > Changes since v2: > > * Move advertisement of fake capabilities ref to a separate function to > > avoid an extra strcmp. > > We want this in -rc2 if not -rc1 for the upcoming release, right? > I've read the patch again and it all looked sensible. If it's possible, that would be great. I understand you just put out -rc0, and I apologize for the delay in getting back to you, but it would be ideal to avoid having the problem we're fixing in the release. -- brian m. carlson (he/him or they/them) Toronto, Ontario, CA [-- Attachment #2: signature.asc --] [-- Type: application/pgp-signature, Size: 263 bytes --] ^ permalink raw reply [flat|nested] 58+ messages in thread
* Re: [PATCH v3 0/1] Fix empty SHA-256 clones with v0 and v1 2023-05-17 19:24 ` [PATCH v3 " brian m. carlson 2023-05-17 19:24 ` [PATCH v3 1/1] upload-pack: advertise capabilities when cloning empty repos brian m. carlson 2023-05-17 21:48 ` [PATCH v3 0/1] Fix empty SHA-256 clones with v0 and v1 Junio C Hamano @ 2023-05-18 18:28 ` Jeff King 2023-05-19 15:32 ` brian m. carlson 2 siblings, 1 reply; 58+ messages in thread From: Jeff King @ 2023-05-18 18:28 UTC (permalink / raw) To: brian m. carlson; +Cc: git, Junio C Hamano, Adam Majer On Wed, May 17, 2023 at 07:24:42PM +0000, brian m. carlson wrote: > Changes since v2: > * Move advertisement of fake capabilities ref to a separate function to > avoid an extra strcmp. Thanks, the code change looks good to me. I'm still a little puzzled why we need http tests both in t5551 and t5700. Not too big a deal to have redundant tests, obviously, but I wonder if I am missing something. -Peff ^ permalink raw reply [flat|nested] 58+ messages in thread
* Re: [PATCH v3 0/1] Fix empty SHA-256 clones with v0 and v1 2023-05-18 18:28 ` Jeff King @ 2023-05-19 15:32 ` brian m. carlson 0 siblings, 0 replies; 58+ messages in thread From: brian m. carlson @ 2023-05-19 15:32 UTC (permalink / raw) To: Jeff King; +Cc: git, Junio C Hamano, Adam Majer [-- Attachment #1: Type: text/plain, Size: 487 bytes --] On 2023-05-18 at 18:28:38, Jeff King wrote: > I'm still a little puzzled why we need http tests both in t5551 and > t5700. Not too big a deal to have redundant tests, obviously, but I > wonder if I am missing something. Technically, they work with v0 and v1, which are separate, and I wanted to test both, not just v0. I realize that practically, they are very similar, but I prefer to be a bit more thorough. -- brian m. carlson (he/him or they/them) Toronto, Ontario, CA [-- Attachment #2: signature.asc --] [-- Type: application/pgp-signature, Size: 263 bytes --] ^ permalink raw reply [flat|nested] 58+ messages in thread
* Re: git clone of empty repositories doesn't preserve hash 2023-04-05 20:40 ` Junio C Hamano 2023-04-05 21:15 ` Junio C Hamano @ 2023-04-05 21:23 ` Jeff King 1 sibling, 0 replies; 58+ messages in thread From: Jeff King @ 2023-04-05 21:23 UTC (permalink / raw) To: Junio C Hamano; +Cc: Adam Majer, git On Wed, Apr 05, 2023 at 01:40:22PM -0700, Junio C Hamano wrote: > Jeff King <peff@peff.net> writes: > > > Yeah, we send a special capability line in that case. If you do: > > > > GIT_TRACE_PACKET=1 git clone a b > > > > you can see that upload-pack indicates that ls-refs understands the > > "unborn" capability: > > > > packet: upload-pack> version 2 > > packet: upload-pack> agent=git/2.40.0.824.g7b678b1f643 > > packet: upload-pack> ls-refs=unborn > > packet: upload-pack> fetch=shallow wait-for-done > > packet: upload-pack> server-option > > packet: upload-pack> object-format=sha256 > > packet: upload-pack> object-info > > packet: upload-pack> 0000 > > > > And then clone asks for it say "yes, I also understand unborn": > > > > packet: clone> command=ls-refs > > packet: clone> agent=git/2.40.0.824.g7b678b1f643 > > packet: clone> object-format=sha256 > > packet: clone> 0001 > > packet: clone> peel > > packet: clone> symrefs > > packet: clone> unborn > > packet: clone> ref-prefix HEAD > > packet: clone> ref-prefix refs/heads/ > > packet: clone> ref-prefix refs/tags/ > > packet: clone> 0000 > > > > And then upload-pack can send us the extra information: > > > > packet: upload-pack> unborn HEAD symref-target:refs/heads/maestro > > packet: upload-pack> 0000 > > > > I think we'd need to do something similar here for object-format. > > I guess we only need to touch "git clone" then. Without being > asked, it advertsizes object-format=sha256 already, and when the > maestro repository is prepared without --object-format=sha256, > upload-pack advertises object-format=sha1 instead. So it probably > is just the matter of capturing it and using it to populate the > extensions.objectformat with an appropriate value. Ah, yeah, you're right. I was thinking that capability advertisement was "by the way, I understand sha256". But I may just be showing my ignorance of the current state of the hash-transition protocol extensions. :) I'm actually surprised this does not Just Work already. The client gets the intended algorithm from that capability line, which is how we know how to parse any ls-refs output at all (we are not just guessing from the length). But we only bother to set it in the local config if we have refs to fetch, which seems like a bug. So the solution is maybe something like this: diff --git a/builtin/clone.c b/builtin/clone.c index 462c286274c..5eca95cb892 100644 --- a/builtin/clone.c +++ b/builtin/clone.c @@ -1296,19 +1296,22 @@ int cmd_clone(int argc, const char **argv, const char *prefix) clear_bundle_list(transport->bundles); FREE_AND_NULL(transport->bundles); } } - if (mapped_refs) { + { int hash_algo = hash_algo_by_ptr(transport_get_hash_algo(transport)); /* * Now that we know what algorithm the remote side is using, * let's set ours to the same thing. */ initialize_repository_version(hash_algo, 1); repo_set_hash_algo(the_repository, hash_algo); + } + + if (mapped_refs) { /* * transport_get_remote_refs() may return refs with null sha-1 * in mapped_refs (see struct transport->get_refs_list * comment). In that case we need fetch it early because * remote_head code below relies on it. ^ permalink raw reply related [flat|nested] 58+ messages in thread
end of thread, other threads:[~2023-05-19 15:32 UTC | newest] Thread overview: 58+ messages (download: mbox.gz follow: Atom feed -- links below jump to the message on this page -- 2023-04-05 10:28 git clone of empty repositories doesn't preserve hash Adam Majer 2023-04-05 19:04 ` Junio C Hamano 2023-04-05 19:47 ` Adam Majer 2023-04-05 20:01 ` Jeff King 2023-04-05 20:40 ` Junio C Hamano 2023-04-05 21:15 ` Junio C Hamano 2023-04-05 21:26 ` Jeff King 2023-04-05 22:48 ` brian m. carlson 2023-04-06 13:11 ` Adam Majer 2023-04-25 21:35 ` brian m. carlson 2023-04-25 22:24 ` Junio C Hamano 2023-04-25 23:12 ` Junio C Hamano 2023-04-26 0:20 ` brian m. carlson 2023-04-26 11:25 ` Jeff King 2023-04-26 15:08 ` Junio C Hamano 2023-04-26 15:13 ` [PATCH] doc: GIT_DEFAULT_HASH is and will be ignored during "clone" Junio C Hamano 2023-04-26 21:06 ` brian m. carlson 2023-04-27 4:46 ` git clone of empty repositories doesn't preserve hash Jeff King 2023-04-26 10:51 ` Jeff King 2023-04-26 15:42 ` Junio C Hamano 2023-04-26 20:40 ` brian m. carlson 2023-04-26 20:53 ` [PATCH 0/2] Fix empty SHA-256 clones with v0 and v1 brian m. carlson 2023-04-26 20:53 ` [PATCH 1/2] http: advertise capabilities when cloning empty repos brian m. carlson 2023-04-26 21:14 ` Junio C Hamano 2023-04-26 21:28 ` brian m. carlson 2023-04-27 5:00 ` Jeff King 2023-04-27 5:30 ` Jeff King 2023-04-27 20:40 ` Junio C Hamano 2023-04-26 20:53 ` [PATCH 2/2] Honor GIT_DEFAULT_HASH for empty clones without remote algo brian m. carlson 2023-04-26 21:18 ` Junio C Hamano 2023-04-26 21:33 ` Junio C Hamano 2023-04-27 5:43 ` Jeff King 2023-05-02 23:46 ` Is GIT_DEFAULT_HASH flawed? Felipe Contreras 2023-05-03 9:03 ` Adam Majer 2023-05-03 15:44 ` Felipe Contreras 2023-05-03 17:21 ` Adam Majer 2023-05-08 0:34 ` Felipe Contreras 2023-05-03 9:09 ` demerphq 2023-05-03 18:20 ` Felipe Contreras 2023-05-03 22:54 ` brian m. carlson 2023-05-08 2:00 ` Felipe Contreras 2023-05-08 21:38 ` brian m. carlson 2023-05-09 10:32 ` Oswald Buddenhagen 2023-05-09 16:47 ` Junio C Hamano 2023-04-26 21:12 ` [PATCH 0/2] Fix empty SHA-256 clones with v0 and v1 Junio C Hamano 2023-04-27 4:56 ` git clone of empty repositories doesn't preserve hash Jeff King 2023-05-01 17:00 ` [PATCH v2 0/1] Fix empty SHA-256 clones with v0 and v1 brian m. carlson 2023-05-01 17:00 ` [PATCH v2 1/1] upload-pack: advertise capabilities when cloning empty repos brian m. carlson 2023-05-01 22:40 ` Jeff King 2023-05-01 22:51 ` Junio C Hamano 2023-05-01 17:37 ` [PATCH v2 0/1] Fix empty SHA-256 clones with v0 and v1 Junio C Hamano 2023-05-17 19:24 ` [PATCH v3 " brian m. carlson 2023-05-17 19:24 ` [PATCH v3 1/1] upload-pack: advertise capabilities when cloning empty repos brian m. carlson 2023-05-17 21:48 ` [PATCH v3 0/1] Fix empty SHA-256 clones with v0 and v1 Junio C Hamano 2023-05-17 22:28 ` brian m. carlson 2023-05-18 18:28 ` Jeff King 2023-05-19 15:32 ` brian m. carlson 2023-04-05 21:23 ` git clone of empty repositories doesn't preserve hash Jeff King
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox; as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).