Git Mailing List Archive mirror
From: Felipe Contreras <felipe.contreras@gmail.com>
To: Jeff King <peff@peff.net>, Junio C Hamano <gitster@pobox.com>
Cc: "brian m. carlson" <sandals@crustytoothpaste.net>,
	git@vger.kernel.org, Adam Majer <adamm@zombino.com>
Subject: Is GIT_DEFAULT_HASH flawed?
Date: Tue, 02 May 2023 17:46:02 -0600
Message-ID: <6451a0ba5c3fb_200ae2945b@chronos.notmuch>
In-Reply-To: <20230427054343.GE982277@coredump.intra.peff.net>

Hi,

Changing the subject as this message seems like a different topic.

Jeff King wrote:
> On Wed, Apr 26, 2023 at 02:33:30PM -0700, Junio C Hamano wrote:
> > "brian m. carlson" <sandals@crustytoothpaste.net> writes:
> > 
> > >  `GIT_DEFAULT_HASH`::
> > >  	If this variable is set, the default hash algorithm for new
> > >  	repositories will be set to this value. This value is currently
> > > +	ignored when cloning if the remote value can be definitively
> > > +	determined; the setting of the remote repository is used
> > > +	instead. The value is honored if the remote repository's
> > > +	algorithm cannot be determined, such as some cases when
> > > +	the remote repository is empty. The default is "sha1".
> > > +	THIS VARIABLE IS EXPERIMENTAL! See `--object-format`
> > > +	in linkgit:git-init[1].
> > 
> > We'd need to eventually cover all the transports (and non-transport
> > like the "--local" optimization) so that the object-format and other
> > choices are communicated from the origin to a new clone anyway, so
> > this extra complexity "until X is fixed, it behaves this way, but
> > otherwise the variable is read in the meantime" may be a disservice
> > to the end users, even though it may make it easier in the shorter
> > term for maintainers of programs that rely on the buggy "git clone"
> > that partially honored this environment variable.
> > 
> > In short, I am still not convinced that the above is a good design
> > choice in the longer term.
> 
> I also think it is working against the backwards-compatible design of
> the hash function transition.
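
For reference, the clone-time behavior described in the quoted documentation patch boils down to a simple precedence order. This is a minimal Python sketch, not Git's code: `resolve_clone_hash` is a hypothetical name, `remote_algo` being `None` stands for "the remote's algorithm cannot be determined", and rejecting unknown values and treating an empty variable as unset are assumptions of the sketch.

```python
import os
from typing import Mapping, Optional

SUPPORTED = {"sha1", "sha256"}  # the two object formats Git currently knows

def resolve_clone_hash(remote_algo: Optional[str],
                       env: Optional[Mapping[str, str]] = None) -> str:
    """Choose the object format for a new clone (hypothetical helper).

    remote_algo is None when the remote's algorithm cannot be determined,
    e.g. in some empty-repository cases; otherwise it wins outright.
    """
    if env is None:
        env = os.environ
    if remote_algo is not None:
        return remote_algo  # remote setting used; GIT_DEFAULT_HASH ignored
    value = env.get("GIT_DEFAULT_HASH")
    if value:  # assumption: unset and empty are treated the same
        if value not in SUPPORTED:
            raise ValueError(f"unknown hash algorithm: {value!r}")
        return value
    return "sha1"  # the documented default

print(resolve_clone_hash("sha256", {"GIT_DEFAULT_HASH": "sha1"}))  # sha256
print(resolve_clone_hash(None, {"GIT_DEFAULT_HASH": "sha256"}))    # sha256
print(resolve_clone_hash(None, {}))                                # sha1
```

The middle branch is the "until X is fixed" behavior Junio objects to above.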

To be honest, this whole approach seems completely flawed to me, and
against the whole design of git in the first place.

In a recent email Linus Torvalds explained why object ids were
calculated based on {type, size, data} [1], and he explained very clearly
that two objects with exactly the same data are not supposed to have the
same id if the type is different.
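
For reference, the id of an object is the hash of a short header (the type, a space, the size in decimal, and a NUL byte) followed by the data, which is why the type matters. A minimal Python sketch of the SHA-1 case; the function name is mine, and the "hello\n" blob id matches what `git hash-object` prints for that content:

```python
import hashlib

def git_object_id(obj_type: str, data: bytes) -> str:
    """SHA-1 object id: hash of b"<type> <size>" + NUL + data."""
    header = f"{obj_type} {len(data)}".encode("ascii") + b"\0"
    return hashlib.sha1(header + data).hexdigest()

data = b"hello\n"
print(git_object_id("blob", data))  # ce013625030ba8dba906f756967f9e9ca394464a
# Same bytes, different type: a different id (this body is not a valid
# commit; it only shows that the type is part of what gets hashed).
print(git_object_id("commit", data))
```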

If even the tiniest change, such as adding a period to a commit message,
changes the object id (and thus semantically makes it a different
object), then it makes sense that changing the type of an object also
changes the object id (and thus it's also a different object).

And because the id of the parent is included in the content of every
commit, the top-level id ensures the integrity of the whole graph.
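
A rough Python sketch of both points together. The commit body here is deliberately simplified (real commits also carry tree, author, and committer lines), and `make_commit` and `tip_of` are hypothetical helpers: adding one period to the root commit's message changes every id after it, including the tip.

```python
import hashlib
from typing import List, Optional

def git_object_id(obj_type: str, data: bytes) -> str:
    header = f"{obj_type} {len(data)}".encode("ascii") + b"\0"
    return hashlib.sha1(header + data).hexdigest()

def make_commit(message: str, parent: Optional[str]) -> str:
    # Simplified body: only an optional parent line plus the message.
    body = ""
    if parent is not None:
        body += f"parent {parent}\n"
    body += f"\n{message}\n"
    return git_object_id("commit", body.encode("utf-8"))

def tip_of(messages: List[str]) -> Optional[str]:
    parent = None
    for msg in messages:  # each commit embeds the id of the previous one
        parent = make_commit(msg, parent)
    return parent

history = ["Initial commit", "Add feature", "Fix bug"]
edited = ["Initial commit.", "Add feature", "Fix bug"]  # one period added to the root
print(tip_of(history) != tip_of(edited))  # True
```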

But then comes this notion that the hash algorithm is a property of the
repository, and not part of the object storage, which means changing the
hash algorithm of an entire repository is considered less of a change than
adding a period to a commit message, or worse: not a change at all.
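
To make this concrete: the header hashed along with a blob's content is the same under both object formats, so nothing in the hashed bytes names the algorithm. A minimal Python sketch (the helper name `object_id` is mine) hashing the same blob under SHA-1 and SHA-256:

```python
import hashlib

def object_id(obj_type: str, data: bytes, algo: str) -> str:
    # The header is identical for both algorithms: nothing in it names the hash.
    header = f"{obj_type} {len(data)}".encode("ascii") + b"\0"
    return hashlib.new(algo, header + data).hexdigest()

data = b"hello\n"
sha1_id = object_id("blob", data, "sha1")
sha256_id = object_id("blob", data, "sha256")
print(len(sha1_id), len(sha256_id))  # 40 64
```

Only the length of the resulting id hints at which function was used.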

I am reminded of the warning Sam Smith gave to the Git project [2], which
seems to have gone unheeded; the notion of cryptographic algorithm agility
makes complete sense to me.

In my view one repository should be able to have part SHA-1 history,
part SHA3-256 history, and part BLAKE2b history.

Changing the hash algorithm of one commit should change the object id of
that commit, and thus make it semantically a different commit.

In other words: an object of type "blob" should never be confused with
an object of type "blob:sha-256", even if the content is exactly the
same.
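
A minimal Python sketch of what I mean. This is purely hypothetical and not Git's object format: the type tag carries the algorithm, plain "blob" keeps meaning SHA-1, and ids are prefixed with the algorithm name so that a parent in a mixed history says which function verifies it. The tag spellings and the "algo:hex" id form are my own inventions for illustration.

```python
import hashlib

# Hypothetical algorithm names mapped to hashlib constructors.
ALGORITHMS = {
    "sha-1": hashlib.sha1,
    "sha-256": hashlib.sha256,
    "sha3-256": hashlib.sha3_256,
    "blake2b": hashlib.blake2b,
}

def tagged_object_id(obj_type: str, algo: str, data: bytes) -> str:
    # Plain type for SHA-1 (today's objects); "<type>:<algo>" for the rest.
    tag = obj_type if algo == "sha-1" else f"{obj_type}:{algo}"
    header = f"{tag} {len(data)}".encode("ascii") + b"\0"
    return f"{algo}:" + ALGORITHMS[algo](header + data).hexdigest()

data = b"hello\n"
ids = {algo: tagged_object_id("blob", algo, data) for algo in ALGORITHMS}
print(len(set(ids.values())))  # 4: same content, four distinct objects
```

With SHA-1 the result is today's id with a prefix, so existing history would keep its ids.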

The fact that apparently it's so easy to clone a repository with
the wrong hash algorithm should give developers pause, as it means the
whole point of using cryptographic hash algorithms to ensure the
integrity of the commit history is completely gone.

I have not been following the SHA-1 -> OID discussions, but I
distinctly recall Linus Torvalds mentioning that SHA-1 wasn't even chosen
for security purposes; it was chosen to ensure integrity.
When I do a `git fetch`, as long as the parents of the new commits have
SHA-1s that match the ones I have in my repository, I can be relatively
certain the repository has not been tampered with. This means that if I
do a `git fetch` that suddenly brings SHA-256 commits, some of them must
have SHA-1 parents that match the ones I currently have. Otherwise, how
do I know it's the same history?
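
A rough Python sketch of the check I have in mind, using placeholder ids. It is not Git's actual connectivity check, and `fetch_is_anchored` is a hypothetical name: new commits may point at each other, but at least one must point at a commit I already have, and none may point at something I can neither see nor verify.

```python
from typing import Dict, List, Set

def fetch_is_anchored(local_ids: Set[str],
                      fetched: Dict[str, List[str]]) -> bool:
    """fetched maps each newly received commit id to its parent ids."""
    if not fetched or not local_ids:
        return True  # nothing new, or an initial clone: nothing to anchor to
    anchored = False
    for parents in fetched.values():
        for parent in parents:
            if parent in local_ids:
                anchored = True  # links back to history I already trust
            elif parent not in fetched:
                return False  # dangling parent I cannot verify
    return anchored

local = {"c1", "c2"}
extends = {"c3": ["c2"], "c4": ["c3"]}
# The same history re-hashed with another algorithm: every id is new.
rehashed = {"d1": [], "d2": ["d1"], "d3": ["d2"]}
print(fetch_is_anchored(local, extends), fetch_is_anchored(local, rehashed))  # True False
```

A legitimately unrelated branch would also be rejected by this toy rule; it only illustrates the argument.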

Maybe that's one of the reasons people don't seem particularly eager to
move away from SHA-1:

Better the SHA-1 you know, than the SHA-256 you don't.

Cheers.

[1] https://lore.kernel.org/git/CAHk-=wjr-CMLX2Jo2++rwcv0VNr+HmZqXEVXNsJGiPRUwNxzBQ@mail.gmail.com/
[2] https://lore.kernel.org/git/D433038A-2643-4F63-8677-CA8AB6904AE1@samuelsmith.org/

-- 
Felipe Contreras
