Git Mailing List Archive mirror
* [RFC][PATCH 0/32] SHA256 and SHA1 interoperability
@ 2023-09-08 23:05 Eric W. Biederman
  2023-09-08 23:10 ` [PATCH 01/32] doc hash-file-transition: A map file for mapping between sha1 and sha256 Eric W. Biederman
                   ` (34 more replies)
  0 siblings, 35 replies; 59+ messages in thread
From: Eric W. Biederman @ 2023-09-08 23:05 UTC (permalink / raw)
  To: git; +Cc: Junio C Hamano, brian m. carlson


I would like to see the SHA256 transition happen so I started playing
with the k2204-transition-interop branch of brian m. carlson's tree.

Before I go further I need some other folks to look at this and see
whether this is a general direction that the git project can stand
behind.

This patchset is not complete: it does not implement converting a
received pack from the compatibility hash into the hash function of the
repository, nor have I written any automated tests.  Both need to happen
before this is finalized.

That said I think I have working implementations of all of the
interesting cases.  In particular I have "git index-pack" computing the
compatibility hash of every object in a pack file, and I can tell you
the sha256 of every sha1 in git://git.kernel.org/pub/scm/git/git.git.

To get there I have tweaked the transition plan a little.

So far I have just aimed for code that works, so there is doubtless
room for improvement.  My hope is that I have implemented enough
that people can play with this, and that people can see all of the
weird little details that need to be taken care of to make this work.

What does everyone else think?  Does this direction look plausible?

Eric W. Biederman (24):
      doc hash-file-transition: A map file for mapping between sha1 and sha256
      doc hash-function-transition: Replace compatObjectFormat with compatMap
      object-file-convert:  Stubs for converting from one object format to another
      object-name: Initial support for ^{sha1} and ^{sha256}
      repository: add a compatibility hash algorithm
      loose: Compatibility short name support
      object-file: Update the loose object map when writing loose objects
      bulk-checkin: Only accept blobs
      pack: Communicate the compat_oid through struct pack_idx_entry
      object-file: Add a compat_oid_in parameter to write_object_file_flags
      object: Factor parse_mode out of fast-import and tree-walk into object.h
      builtin/cat-file:  Let the oid determine the output algorithm
      tree-walk: init_tree_desc take an oid to get the hash algorithm
      object-file: Handle compat objects in check_object_signature
      builtin/ls-tree: Let the oid determine the output algorithm
      builtin/pack-objects:  Communicate the compatibility hash through struct pack_idx_entry
      pack-compat-map:  Add support for .compat files of a packfile
      object-file-convert: Implement convert_object_file_{begin,step,end}
      builtin/fast-import: compute compatibility hashes for imported objects
      builtin/index-pack:  Add a simple oid index
      builtin/index-pack:  Compute the compatibility hash
      builtin/index-pack: Make the stack in compute_compat_oid explicit
      unpack-objects: Update to compute and write the compatibility hashes
      object-file-convert: Implement repo_submodule_oid_to_algop

brian m. carlson (8):
      repository: Implement core.compatMap
      loose: add a mapping between SHA-1 and SHA-256 for loose objects
      bulk-checkin: hash object with compatibility algorithm
      commit: write commits for both hashes
      cache: add a function to read an OID of a specific algorithm
      object-file-convert: add a function to convert trees between algorithms
      object-file-convert: convert commit objects when writing
      object-file-convert: convert tag commits when writing


 Documentation/config/core.txt                      |   6 +
 .../technical/hash-function-transition.txt         |  56 ++-
 Makefile                                           |   4 +
 archive.c                                          |   3 +-
 builtin.h                                          |   1 +
 builtin/am.c                                       |   6 +-
 builtin/cat-file.c                                 |   8 +-
 builtin/checkout.c                                 |   8 +-
 builtin/clone.c                                    |   2 +-
 builtin/commit.c                                   |   2 +-
 builtin/fast-import.c                              | 110 +++--
 builtin/grep.c                                     |   8 +-
 builtin/index-pack.c                               | 441 ++++++++++++++++++++-
 builtin/ls-tree.c                                  |   5 +-
 builtin/merge.c                                    |   3 +-
 builtin/pack-objects.c                             |  13 +-
 builtin/read-tree.c                                |   2 +-
 builtin/show-compat-map.c                          | 139 +++++++
 builtin/stash.c                                    |   5 +-
 builtin/unpack-objects.c                           |  14 +-
 bulk-checkin.c                                     |  55 ++-
 bulk-checkin.h                                     |   6 +-
 cache-tree.c                                       |   4 +-
 commit.c                                           | 176 +++++---
 commit.h                                           |   1 +
 delta-islands.c                                    |   2 +-
 diff-lib.c                                         |   2 +-
 fsck.c                                             |   6 +-
 git.c                                              |   1 +
 hash-ll.h                                          |   3 +
 hash.h                                             |   9 +-
 http-push.c                                        |   2 +-
 list-objects.c                                     |   2 +-
 loose.c                                            | 256 ++++++++++++
 loose.h                                            |  20 +
 match-trees.c                                      |   4 +-
 merge-ort.c                                        |  11 +-
 merge-recursive.c                                  |   2 +-
 merge.c                                            |   3 +-
 object-file-convert.c                              | 366 +++++++++++++++++
 object-file-convert.h                              |  50 +++
 object-file.c                                      | 197 +++++++--
 object-name.c                                      |  77 +++-
 object-store-ll.h                                  |  13 +-
 object.c                                           |   2 +
 object.h                                           |  18 +
 pack-bitmap-write.c                                |   2 +-
 pack-compat-map.c                                  | 334 ++++++++++++++++
 pack-compat-map.h                                  |  27 ++
 pack-write.c                                       | 158 ++++++++
 pack.h                                             |   1 +
 packfile.c                                         |  15 +-
 reflog.c                                           |   2 +-
 repository.c                                       |  17 +
 repository.h                                       |   4 +
 revision.c                                         |   4 +-
 setup.c                                            |   5 +
 setup.h                                            |   1 +
 tree-walk.c                                        |  58 ++-
 tree-walk.h                                        |   7 +-
 tree.c                                             |   2 +-
 walker.c                                           |   2 +-
 62 files changed, 2525 insertions(+), 238 deletions(-)

Eric

^ permalink raw reply	[flat|nested] 59+ messages in thread

* [PATCH 01/32] doc hash-file-transition: A map file for mapping between sha1 and sha256
  2023-09-08 23:05 [RFC][PATCH 0/32] SHA256 and SHA1 interoperability Eric W. Biederman
@ 2023-09-08 23:10 ` Eric W. Biederman
  2023-09-10 14:24   ` brian m. carlson
  2023-09-12  0:14   ` brian m. carlson
  2023-09-08 23:10 ` [PATCH 02/32] doc hash-function-transition: Replace compatObjectFormat with compatMap Eric W. Biederman
                   ` (33 subsequent siblings)
  34 siblings, 2 replies; 59+ messages in thread
From: Eric W. Biederman @ 2023-09-08 23:10 UTC (permalink / raw)
  To: git; +Cc: Junio C Hamano, brian m. carlson, Eric W. Biederman

The v3 pack index file as documented has a lot of complexity, making it
difficult to implement correctly.  I worked with brian's preliminary
implementation and it took several passes to get the bugs out.

The complexity also requires multiple table look-ups to find all of
the information that is needed to translate from one kind of oid to
another, which can't be good for cache locality.

Even worse, coming up with a new index file version requires making
changes that have the potential to break anything that uses the index
of a pack file.

Instead of continuing to deal with the chance of breaking things
beyond the oid mapping functionality, the additional complexity in
the file format, and worries about whether the performance would be
reasonable, I stripped the problem down to its fundamental complexity
and came up with a file format that is exactly about mapping one kind
of oid to another, and that only supports two kinds of oids.

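For illustration, the header described in the hunk below consists of
the fields sketched here.  This is a sketch only: the struct and field
names are mine, not part of the patch, an actual reader would parse the
bytes individually rather than overlay a struct, and I am assuming the
4-byte count is stored in network byte order like other pack index
data.

#include <stdint.h>

struct compat_map_header {
	uint8_t  signature[4];       /* 'C', 'M', 'A', 'P' */
	uint8_t  version;            /* only version 1 is defined */
	uint8_t  first_oid_version;  /* 1 => SHA-1, 2 => SHA-256 */
	uint8_t  second_oid_version; /* 1 => SHA-1, 2 => SHA-256 */
	uint8_t  reserved0;          /* must be zero */
	uint32_t nr_objects;         /* number of object names mapped */
	uint8_t  first_short_len;    /* shortest unambiguous length, first names */
	uint8_t  reserved1;          /* must be zero */
	uint8_t  second_short_len;   /* shortest unambiguous length, second names */
	uint8_t  reserved2;          /* must be zero */
};
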
Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
---
 .../technical/hash-function-transition.txt    | 40 +++++++++++++++++++
 1 file changed, 40 insertions(+)

diff --git a/Documentation/technical/hash-function-transition.txt b/Documentation/technical/hash-function-transition.txt
index ed574810891c..4b937480848a 100644
--- a/Documentation/technical/hash-function-transition.txt
+++ b/Documentation/technical/hash-function-transition.txt
@@ -209,6 +209,46 @@ format described in linkgit:gitformat-pack[5], just like
 today. The content that is compressed and stored uses SHA-256 content
 instead of SHA-1 content.
 
+Per Pack Mapping Table
+~~~~~~~~~~~~~~~~~~~~~~
+A pack compat map (.compat) file has the following format:
+
+HEADER:
+	4-byte signature:
+	    The signature is: {'C', 'M', 'A', 'P'}
+	1-byte version number:
+	    Git only writes or recognizes version 1.
+	1-byte First Object Id Version
+	    We infer the length of object IDs (OIDs) from this value:
+		1 => SHA-1
+		2 => SHA-256
+	1-byte Second Object Id Version
+	    We infer the length of object IDs (OIDs) from this value:
+		1 => SHA-1
+		2 => SHA-256
+	1-byte reserved (must be zero)
+	4-byte number of object names contained in this mapping
+	1-byte length in bytes of shortened object names for the first object id.
+	       This is the shortest possible length needed to make the
+	       first object names unambiguous.
+	1-byte reserved (must be zero)
+	1-byte length in bytes of shortened object names for the second object id.
+	       This is the shortest possible length needed to make the
+	       second object names unambiguous.
+	1-byte reserved (must be zero)
+
+OBJECT NAME TABLES:
+	[Object name raw length + 4]*Number of object names
+	   This table is sorted by object name
+	   Each entry in the table is formatted as:
+		[20 or 32 byte] Object name
+		4-byte index into the other object name table
+
+TRAILER:
+	checksum of the corresponding packfile, and
+
+	checksum of all of the above.
+
 Pack index
 ~~~~~~~~~~
 Pack index (.idx) files use a new v3 format that supports multiple
-- 
2.41.0


^ permalink raw reply related	[flat|nested] 59+ messages in thread

* [PATCH 02/32] doc hash-function-transition: Replace compatObjectFormat with compatMap
  2023-09-08 23:05 [RFC][PATCH 0/32] SHA256 and SHA1 interoperability Eric W. Biederman
  2023-09-08 23:10 ` [PATCH 01/32] doc hash-file-transition: A map file for mapping between sha1 and sha256 Eric W. Biederman
@ 2023-09-08 23:10 ` Eric W. Biederman
  2023-09-10 14:34   ` brian m. carlson
  2023-09-08 23:10 ` [PATCH 03/32] object-file-convert: Stubs for converting from one object format to another Eric W. Biederman
                   ` (32 subsequent siblings)
  34 siblings, 1 reply; 59+ messages in thread
From: Eric W. Biederman @ 2023-09-08 23:10 UTC (permalink / raw)
  To: git; +Cc: Junio C Hamano, brian m. carlson, Eric W. Biederman

It makes a lot of sense for the hash algorithm that determines how all
of the objects in the repository are named to be an extension, so that
versions of git that don't know about it won't even try to use the
repository.

For the compatibility maps that really is not the case.  A version of
git that does not recognize the option won't care and will continue
to use the repository as is.  The mapping functionality simply won't be
present.

Similarly, if not all of the objects are mapped, this could cause
some practical difficulties, but it will not cause anything to perform
the wrong actions on the repository.  Some commands just won't work.
In the worst case all that needs to happen is for the compatibility
maps to be rebuilt.

So let's use a plain configuration option instead of an extension that
would force unnecessary breakage of existing tools.

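To illustrate the distinction being relied on here (a sketch, not
git's actual setup code): under repositoryFormatVersion=1 an
unrecognized extensions.* key makes an older git refuse to operate on
the repository, while an unrecognized core.* key such as
core.compatMap is simply skipped.

#include <string.h>

/* Illustration only; not git's setup.c. */
static int older_git_accepts_key(const char *key, int key_is_known)
{
	/* An unknown extensions.* key poisons a version-1 repository. */
	if (!strncmp(key, "extensions.", strlen("extensions.")) && !key_is_known)
		return 0;	/* refuse to touch the repository */
	/* Unknown keys outside extensions.* are ignored. */
	return 1;
}
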
Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
---
 .../technical/hash-function-transition.txt       | 16 ++++++++--------
 1 file changed, 8 insertions(+), 8 deletions(-)

diff --git a/Documentation/technical/hash-function-transition.txt b/Documentation/technical/hash-function-transition.txt
index 4b937480848a..10572c5794f9 100644
--- a/Documentation/technical/hash-function-transition.txt
+++ b/Documentation/technical/hash-function-transition.txt
@@ -148,14 +148,14 @@ Detailed Design
 Repository format extension
 ~~~~~~~~~~~~~~~~~~~~~~~~~~~
 A SHA-256 repository uses repository format version `1` (see
-Documentation/technical/repository-version.txt) with extensions
-`objectFormat` and `compatObjectFormat`:
+Documentation/technical/repository-version.txt) with the extension
+`objectFormat`, and an optional core.compatMap configuration.
 
 	[core]
 		repositoryFormatVersion = 1
+		compatMap = on
 	[extensions]
 		objectFormat = sha256
-		compatObjectFormat = sha1
 
 The combination of setting `core.repositoryFormatVersion=1` and
 populating `extensions.*` ensures that all versions of Git later than
@@ -682,7 +682,7 @@ Some initial steps can be implemented independently of one another:
 - adding support for the PSRC field and safer object pruning
 
 The first user-visible change is the introduction of the objectFormat
-extension (without compatObjectFormat). This requires:
+extension. This requires:
 
 - teaching fsck about this mode of operation
 - using the hash function API (vtable) when computing object names
@@ -690,7 +690,7 @@ extension (without compatObjectFormat). This requires:
 - rejecting attempts to fetch from or push to an incompatible
   repository
 
-Next comes introduction of compatObjectFormat:
+Next comes introduction of compatMap:
 
 - implementing the loose-object-idx
 - translating object names between object formats
@@ -724,9 +724,9 @@ Over time projects would encourage their users to adopt the "early
 transition" and then "late transition" modes to take advantage of the
 new, more futureproof SHA-256 object names.
 
-When objectFormat and compatObjectFormat are both set, commands
-generating signatures would generate both SHA-1 and SHA-256 signatures
-by default to support both new and old users.
+When objectFormat and compatMap are both set, commands generating
+signatures would generate both SHA-1 and SHA-256 signatures by default
+to support both new and old users.
 
 In projects using SHA-256 heavily, users could be encouraged to adopt
 the "post-transition" mode to avoid accidentally making implicit use
-- 
2.41.0


^ permalink raw reply related	[flat|nested] 59+ messages in thread

* [PATCH 03/32] object-file-convert:  Stubs for converting from one object format to another
  2023-09-08 23:05 [RFC][PATCH 0/32] SHA256 and SHA1 interoperability Eric W. Biederman
  2023-09-08 23:10 ` [PATCH 01/32] doc hash-file-transition: A map file for mapping between sha1 and sha256 Eric W. Biederman
  2023-09-08 23:10 ` [PATCH 02/32] doc hash-function-transition: Replace compatObjectFormat with compatMap Eric W. Biederman
@ 2023-09-08 23:10 ` Eric W. Biederman
  2023-09-08 23:10 ` [PATCH 04/32] object-name: Initial support for ^{sha1} and ^{sha256} Eric W. Biederman
                   ` (31 subsequent siblings)
  34 siblings, 0 replies; 59+ messages in thread
From: Eric W. Biederman @ 2023-09-08 23:10 UTC (permalink / raw)
  To: git; +Cc: Junio C Hamano, brian m. carlson, Eric W. Biederman

Two basic functions are provided:
- convert_object_file takes an object file, its type, and its hash
  algorithm, and converts it into the equivalent object file that
  would have been generated with hash algorithm "to".

  For blob objects there is no conversion to be done and it is an
  error to use this function on them.

  For commit, tree and tag objects the embedded oids are replaced by
  the oids those objects would have had if they had been generated
  with the hash "to".

- repo_oid_to_algop, which takes an oid that refers to an object file
  and returns the oid of the equivalent object file generated
  with the target hash algorithm.

One core function is modified:
- oid_object_info_extended is updated to detect an oid encoding
  that does not match the current repository, use repo_oid_to_algop
  to find the corresponding oid in the current repository, and
  return the data for that oid.

The pair of files object-file-convert.c and object-file-convert.h
is introduced to hold as much of this logic as possible, keeping
the conversion logic cleanly separated from everything else, in
the hope that someday the code will be clean enough that git can
support compiling out sha1 support along with the various
conversion functions.

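A sketch of how a caller would use these, assuming the series is
applied, repo->compat_hash_algo is set, and the usual git-internal
helpers (die, oid_to_hex) are available; the function below is mine,
not code from the patch:

#include "git-compat-util.h"
#include "hex.h"
#include "repository.h"
#include "object-file-convert.h"

static void show_compat_name(struct repository *repo,
			     const struct object_id *oid)
{
	struct object_id compat;

	/* Map oid to its equivalent under the compatibility algorithm. */
	if (repo_oid_to_algop(repo, oid, repo->compat_hash_algo, &compat))
		die("no mapping known for %s", oid_to_hex(oid));
	printf("%s -> %s\n", oid_to_hex(oid), oid_to_hex(&compat));
}
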
Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
---
 Makefile              |  1 +
 object-file-convert.c | 55 ++++++++++++++++++++++++++++
 object-file-convert.h | 24 +++++++++++++
 object-file.c         | 83 +++++++++++++++++++++++++++++++++++++++++++
 4 files changed, 163 insertions(+)
 create mode 100644 object-file-convert.c
 create mode 100644 object-file-convert.h

diff --git a/Makefile b/Makefile
index 577630936535..f7e824f25cda 100644
--- a/Makefile
+++ b/Makefile
@@ -1073,6 +1073,7 @@ LIB_OBJS += notes-cache.o
 LIB_OBJS += notes-merge.o
 LIB_OBJS += notes-utils.o
 LIB_OBJS += notes.o
+LIB_OBJS += object-file-convert.o
 LIB_OBJS += object-file.o
 LIB_OBJS += object-name.o
 LIB_OBJS += object.o
diff --git a/object-file-convert.c b/object-file-convert.c
new file mode 100644
index 000000000000..9f4d5b354f5f
--- /dev/null
+++ b/object-file-convert.c
@@ -0,0 +1,55 @@
+#include "git-compat-util.h"
+#include "gettext.h"
+#include "strbuf.h"
+#include "repository.h"
+#include "hash-ll.h"
+#include "object.h"
+#include "object-file-convert.h"
+
+int repo_oid_to_algop(struct repository *repo, const struct object_id *src,
+		      const struct git_hash_algo *to, struct object_id *dest)
+{
+	/*
+	 * If the source algorithm is not set, then we're using the
+	 * default hash algorithm for that object.
+	 */
+	const struct git_hash_algo *from =
+		src->algo ? &hash_algos[src->algo] : repo->hash_algo;
+
+	if (from == to) {
+		if (src != dest)
+			oidcpy(dest, src);
+		return 0;
+	}
+	return -1;
+}
+
+int convert_object_file(struct strbuf *outbuf,
+			const struct git_hash_algo *from,
+			const struct git_hash_algo *to,
+			const void *buf, size_t len,
+			enum object_type type,
+			int gentle)
+{
+	int ret;
+
+	/* Don't call this function when no conversion is necessary */
+	if ((from == to) || (type == OBJ_BLOB))
+		die("Refusing noop object file conversion");
+
+	switch (type) {
+	case OBJ_COMMIT:
+	case OBJ_TREE:
+	case OBJ_TAG:
+	default:
+		/* Not implemented yet, so fail. */
+		ret = -1;
+		break;
+	}
+	if (!ret)
+		return 0;
+	if (gentle)
+		return ret;
+	die(_("Failed to convert object from %s to %s"),
+		from->name, to->name);
+}
diff --git a/object-file-convert.h b/object-file-convert.h
new file mode 100644
index 000000000000..a4f802aa8eea
--- /dev/null
+++ b/object-file-convert.h
@@ -0,0 +1,24 @@
+#ifndef OBJECT_CONVERT_H
+#define OBJECT_CONVERT_H
+
+struct repository;
+struct object_id;
+struct git_hash_algo;
+struct strbuf;
+#include "object.h"
+
+int repo_oid_to_algop(struct repository *repo, const struct object_id *src,
+		      const struct git_hash_algo *to, struct object_id *dest);
+
+/*
+ * Convert an object file from one hash algorithm to another algorithm.
+ * Return -1 on failure, 0 on success.
+ */
+int convert_object_file(struct strbuf *outbuf,
+			const struct git_hash_algo *from,
+			const struct git_hash_algo *to,
+			const void *buf, size_t len,
+			enum object_type type,
+			int gentle);
+
+#endif /* OBJECT_CONVERT_H */
diff --git a/object-file.c b/object-file.c
index 7dc0c4bfbba8..7f24f19b8a68 100644
--- a/object-file.c
+++ b/object-file.c
@@ -36,6 +36,7 @@
 #include "quote.h"
 #include "packfile.h"
 #include "object-file.h"
+#include "object-file-convert.h"
 #include "object-store.h"
 #include "oidtree.h"
 #include "path.h"
@@ -1660,10 +1661,92 @@ static int do_oid_object_info_extended(struct repository *r,
 	return 0;
 }
 
+static int oid_object_info_convert(struct repository *r,
+				   const struct object_id *input_oid,
+				   struct object_info *input_oi, unsigned flags)
+{
+	const struct git_hash_algo *input_algo = &hash_algos[input_oid->algo];
+	int do_die = flags & OBJECT_INFO_DIE_IF_CORRUPT;
+	struct strbuf type_name = STRBUF_INIT;
+	struct object_info oi = *input_oi;
+	struct object_id oid, delta_base_oid;
+	unsigned long size;
+	void *content;
+	int ret;
+
+	if (repo_oid_to_algop(r, input_oid, the_hash_algo, &oid)) {
+		if (do_die)
+			die(_("missing mapping of %s to %s"),
+			    oid_to_hex(input_oid), the_hash_algo->name);
+		return -1;
+	}
+
+	/* Do we need to convert the delta base oid? */
+	if (oi.delta_base_oid)
+		oi.delta_base_oid = &delta_base_oid;
+
+	/* Do we need attributes that differ when converted? */
+	if (oi.sizep || oi.contentp) {
+		oi.contentp = &content;
+		oi.sizep = &size;
+		oi.type_name = &type_name;
+	}
+
+	ret = oid_object_info_extended(r, &oid, &oi, flags);
+	if (ret)
+		return -1;
+
+	if (oi.contentp == &content) {
+		struct strbuf outbuf = STRBUF_INIT;
+		enum object_type type;
+
+		type = type_from_string_gently(type_name.buf, type_name.len,
+					       !do_die);
+		if (type == -1)
+			return -1;
+		if (type != OBJ_BLOB) {
+			ret = convert_object_file(&outbuf,
+						  the_hash_algo, input_algo,
+						  content, size, type, !do_die);
+			if (ret == -1)
+				return -1;
+			size = outbuf.len;
+			content = strbuf_detach(&outbuf, NULL);
+		}
+		if (input_oi->sizep)
+			*input_oi->sizep = size;
+		if (input_oi->contentp)
+			*input_oi->contentp = content;
+		else
+			free(content);
+		if (input_oi->type_name)
+			*input_oi->type_name = type_name;
+		else
+			strbuf_release(&type_name);
+	}
+	if (oi.delta_base_oid == &delta_base_oid) {
+		if (repo_oid_to_algop(r, &delta_base_oid, input_algo,
+				 input_oi->delta_base_oid)) {
+			if (do_die)
+				die(_("missing mapping of %s to %s"),
+				    oid_to_hex(&delta_base_oid),
+				    input_algo->name);
+			return -1;
+		}
+	}
+	input_oi->whence = oi.whence;
+	input_oi->u = oi.u;
+	return ret;
+}
+
 int oid_object_info_extended(struct repository *r, const struct object_id *oid,
 			     struct object_info *oi, unsigned flags)
 {
 	int ret;
+
+	if (oid->algo && (hash_algo_by_ptr(r->hash_algo) != oid->algo))
+		return oid_object_info_convert(r, oid, oi, flags);
+
 	obj_read_lock();
 	ret = do_oid_object_info_extended(r, oid, oi, flags);
 	obj_read_unlock();
-- 
2.41.0


^ permalink raw reply related	[flat|nested] 59+ messages in thread

* [PATCH 04/32] object-name: Initial support for ^{sha1} and ^{sha256}
  2023-09-08 23:05 [RFC][PATCH 0/32] SHA256 and SHA1 interoperability Eric W. Biederman
                   ` (2 preceding siblings ...)
  2023-09-08 23:10 ` [PATCH 03/32] object-file-convert: Stubs for converting from one object format to another Eric W. Biederman
@ 2023-09-08 23:10 ` Eric W. Biederman
  2023-09-08 23:10 ` [PATCH 05/32] repository: add a compatibility hash algorithm Eric W. Biederman
                   ` (30 subsequent siblings)
  34 siblings, 0 replies; 59+ messages in thread
From: Eric W. Biederman @ 2023-09-08 23:10 UTC (permalink / raw)
  To: git; +Cc: Junio C Hamano, brian m. carlson, Eric W. Biederman

In Documentation/technical/hash-function-transition.txt it suggests
supporting references like abac87a^{sha1} and f787cac^{sha256}.

This change goes a step further and supports a short oid in any
algorithm, just ensuring that enough of the oid is present to
disambiguate it from all possible oids in any algorithm.

Support for suffixes of ^{sha1} and ^{sha256} is implemented as it is
easy, and can be handy for testing.  To support this mode of operation
two flags are added: GET_OID_SHA1 and GET_OID_SHA256.

By default when an oid is specified in an algorithm that does not
match the algorithm of the repository, the oid is translated to the
oid that matches the hash algorithm of the repository.  This ensures
oids that don't match the repository hash algorithm can be used
everywhere oids can currently be used.

A new flag, GET_OID_UNTRANSLATED, is added to suppress the
translation of an oid into the repository's hash algorithm.
This is useful for testing and for raw tools like git cat-file.

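A usage sketch (not from the patch): abac87a and f787cac are the
example abbreviations quoted above from hash-function-transition.txt,
not real objects, and repo_get_oid is the existing object-name entry
point that reaches peel_onion and get_short_oid for these suffixes.

static void lookup_both_names(struct repository *r)
{
	struct object_id oid;

	/*
	 * Resolve a SHA-1 abbreviation; by default the result is
	 * translated into the repository's own hash algorithm.
	 */
	if (!repo_get_oid(r, "abac87a^{sha1}", &oid))
		printf("%s\n", oid_to_hex(&oid));

	/* The same lookup using a SHA-256 abbreviation. */
	if (!repo_get_oid(r, "f787cac^{sha256}", &oid))
		printf("%s\n", oid_to_hex(&oid));
}
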
Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
---
 hash-ll.h     |  3 +++
 object-name.c | 59 +++++++++++++++++++++++++++++++++++++++++++++------
 2 files changed, 55 insertions(+), 7 deletions(-)

diff --git a/hash-ll.h b/hash-ll.h
index 10d84cc20888..2a4f72d70c3f 100644
--- a/hash-ll.h
+++ b/hash-ll.h
@@ -143,8 +143,11 @@ struct object_id {
 #define GET_OID_BLOB             040
 #define GET_OID_FOLLOW_SYMLINKS 0100
 #define GET_OID_RECORD_PATH     0200
+#define GET_OID_SHA1           01000
+#define GET_OID_SHA256         02000
 #define GET_OID_ONLY_TO_DIE    04000
 #define GET_OID_REQUIRE_PATH  010000
+#define GET_OID_UNTRANSLATED  020000
 
 #define GET_OID_DISAMBIGUATORS \
 	(GET_OID_COMMIT | GET_OID_COMMITTISH | \
diff --git a/object-name.c b/object-name.c
index 0bfa29dbbfe9..ebe87f5c4fdd 100644
--- a/object-name.c
+++ b/object-name.c
@@ -25,6 +25,7 @@
 #include "midx.h"
 #include "commit-reach.h"
 #include "date.h"
+#include "object-file-convert.h"
 
 static int get_oid_oneline(struct repository *r, const char *, struct object_id *, struct commit_list *);
 
@@ -32,6 +33,7 @@ typedef int (*disambiguate_hint_fn)(struct repository *, const struct object_id
 
 struct disambiguate_state {
 	int len; /* length of prefix in hex chars */
+	int algo;
 	char hex_pfx[GIT_MAX_HEXSZ + 1];
 	struct object_id bin_pfx;
 
@@ -49,6 +51,10 @@ struct disambiguate_state {
 
 static void update_candidates(struct disambiguate_state *ds, const struct object_id *current)
 {
+	/* Is the oid encoded in the desired algo? */
+	if (ds->algo && (current->algo != ds->algo))
+		return;
+
 	if (ds->always_call_fn) {
 		ds->ambiguous = ds->fn(ds->repo, current, ds->cb_data) ? 1 : 0;
 		return;
@@ -134,6 +140,8 @@ static void unique_in_midx(struct multi_pack_index *m,
 {
 	uint32_t num, i, first = 0;
 	const struct object_id *current = NULL;
+	int len = ds->len > ds->repo->hash_algo->hexsz ?
+		ds->repo->hash_algo->hexsz : ds->len;
 	num = m->num_objects;
 
 	if (!num)
@@ -149,7 +157,7 @@ static void unique_in_midx(struct multi_pack_index *m,
 	for (i = first; i < num && !ds->ambiguous; i++) {
 		struct object_id oid;
 		current = nth_midxed_object_oid(&oid, m, i);
-		if (!match_hash(ds->len, ds->bin_pfx.hash, current->hash))
+		if (!match_hash(len, ds->bin_pfx.hash, current->hash))
 			break;
 		update_candidates(ds, current);
 	}
@@ -159,6 +167,8 @@ static void unique_in_pack(struct packed_git *p,
 			   struct disambiguate_state *ds)
 {
 	uint32_t num, i, first = 0;
+	int len = ds->len > ds->repo->hash_algo->hexsz ?
+		ds->repo->hash_algo->hexsz : ds->len;
 
 	if (p->multi_pack_index)
 		return;
@@ -177,7 +187,7 @@ static void unique_in_pack(struct packed_git *p,
 	for (i = first; i < num && !ds->ambiguous; i++) {
 		struct object_id oid;
 		nth_packed_object_id(&oid, p, i);
-		if (!match_hash(ds->len, ds->bin_pfx.hash, oid.hash))
+		if (!match_hash(len, ds->bin_pfx.hash, oid.hash))
 			break;
 		update_candidates(ds, &oid);
 	}
@@ -188,6 +198,10 @@ static void find_short_packed_object(struct disambiguate_state *ds)
 	struct multi_pack_index *m;
 	struct packed_git *p;
 
+	/* Skip, unless oids from the repository algorithm are wanted */
+	if (ds->algo && (&hash_algos[ds->algo] != ds->repo->hash_algo))
+		return;
+
 	for (m = get_multi_pack_index(ds->repo); m && !ds->ambiguous;
 	     m = m->next)
 		unique_in_midx(m, ds);
@@ -330,7 +344,7 @@ static int init_object_disambiguation(struct repository *r,
 {
 	int i;
 
-	if (len < MINIMUM_ABBREV || len > the_hash_algo->hexsz)
+	if (len < MINIMUM_ABBREV || len > GIT_MAX_HEXSZ)
 		return -1;
 
 	memset(ds, 0, sizeof(*ds));
@@ -357,6 +371,7 @@ static int init_object_disambiguation(struct repository *r,
 	ds->len = len;
 	ds->hex_pfx[len] = '\0';
 	ds->repo = r;
+	ds->algo = GIT_HASH_UNKNOWN;
 	prepare_alt_odb(r);
 	return 0;
 }
@@ -491,9 +506,10 @@ static int repo_collect_ambiguous(struct repository *r UNUSED,
 	return collect_ambiguous(oid, data);
 }
 
-static int sort_ambiguous(const void *a, const void *b, void *ctx)
+static int sort_ambiguous(const void *va, const void *vb, void *ctx)
 {
 	struct repository *sort_ambiguous_repo = ctx;
+	const struct object_id *a = va, *b = vb;
 	int a_type = oid_object_info(sort_ambiguous_repo, a, NULL);
 	int b_type = oid_object_info(sort_ambiguous_repo, b, NULL);
 	int a_type_sort;
@@ -503,8 +519,13 @@ static int sort_ambiguous(const void *a, const void *b, void *ctx)
 	 * Sorts by hash within the same object type, just as
 	 * oid_array_for_each_unique() would do.
 	 */
-	if (a_type == b_type)
-		return oidcmp(a, b);
+	if (a_type == b_type) {
+		/* Is the hash algorithm the same? */
+		if (a->algo == b->algo)
+			return oidcmp(a, b);
+		else
+			return a->algo < b->algo ? -1 : 1;
+	}
 
 	/*
 	 * Between object types show tags, then commits, and finally
@@ -553,6 +574,11 @@ static enum get_oid_result get_short_oid(struct repository *r,
 	else
 		ds.fn = default_disambiguate_hint;
 
+	if (flags & GET_OID_SHA1)
+		ds.algo = GIT_HASH_SHA1;
+	else if (flags & GET_OID_SHA256)
+		ds.algo = GIT_HASH_SHA256;
+
 	find_short_object_filename(&ds);
 	find_short_packed_object(&ds);
 	status = finish_object_disambiguation(&ds, oid);
@@ -606,6 +632,15 @@ static enum get_oid_result get_short_oid(struct repository *r,
 		strbuf_release(&out.sb);
 	}
 
+	/* Ensure oid->algo is set */
+	if (oid->algo == GIT_HASH_UNKNOWN)
+		oid->algo = hash_algo_by_ptr(r->hash_algo);
+
+	/* Return oids using the repository's hash algorithm */
+	if ((&hash_algos[oid->algo] != r->hash_algo) &&
+	    !(flags & GET_OID_UNTRANSLATED))
+		repo_oid_to_algop(r, oid, r->hash_algo, oid);
+
 	return status;
 }
 
@@ -787,10 +822,12 @@ void strbuf_add_unique_abbrev(struct strbuf *sb, const struct object_id *oid,
 int repo_find_unique_abbrev_r(struct repository *r, char *hex,
 			      const struct object_id *oid, int len)
 {
+	const struct git_hash_algo *algo =
+		oid->algo ? &hash_algos[oid->algo] : r->hash_algo;
 	struct disambiguate_state ds;
 	struct min_abbrev_data mad;
 	struct object_id oid_ret;
-	const unsigned hexsz = r->hash_algo->hexsz;
+	const unsigned hexsz = algo->hexsz;
 
 	if (len < 0) {
 		unsigned long count = repo_approximate_object_count(r);
@@ -1158,6 +1195,14 @@ static int peel_onion(struct repository *r, const char *name, int len,
 		return -1;
 
 	sp++; /* beginning of type name, or closing brace for empty */
+
+	if (starts_with(sp, "sha1}"))
+		return get_short_oid(r, name, len - 7, oid,
+				     lookup_flags | GET_OID_SHA1);
+	else if (starts_with(sp, "sha256}"))
+		return get_short_oid(r, name, len - 9, oid,
+				     lookup_flags | GET_OID_SHA256);
+
 	if (starts_with(sp, "commit}"))
 		expected_type = OBJ_COMMIT;
 	else if (starts_with(sp, "tag}"))
-- 
2.41.0


^ permalink raw reply related	[flat|nested] 59+ messages in thread

* [PATCH 05/32] repository: add a compatibility hash algorithm
  2023-09-08 23:05 [RFC][PATCH 0/32] SHA256 and SHA1 interoperability Eric W. Biederman
                   ` (3 preceding siblings ...)
  2023-09-08 23:10 ` [PATCH 04/32] object-name: Initial support for ^{sha1} and ^{sha256} Eric W. Biederman
@ 2023-09-08 23:10 ` Eric W. Biederman
  2023-09-08 23:10 ` [PATCH 06/32] repository: Implement core.compatMap Eric W. Biederman
                   ` (29 subsequent siblings)
  34 siblings, 0 replies; 59+ messages in thread
From: Eric W. Biederman @ 2023-09-08 23:10 UTC (permalink / raw)
  To: git; +Cc: Junio C Hamano, brian m. carlson, Eric W. Biederman

We currently have support for using a full stage 4 SHA-256
implementation.  However, we'd like to support interoperability with
SHA-1 repositories as well.  The transition plan anticipates a
compatibility hash algorithm configuration option that we can use to
implement support for this.  Let's add an element to the repository
structure that indicates the compatibility hash algorithm so we can use
it when we need to consider interoperability between algorithms.

For now, we always set it to NULL, but we'll initialize it differently
in the future.

Inspired-by: brian m. carlson <sandals@crustytoothpaste.net>
Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
---
 repository.h | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/repository.h b/repository.h
index 5f18486f6465..6c4130f0c36e 100644
--- a/repository.h
+++ b/repository.h
@@ -160,6 +160,9 @@ struct repository {
 	/* Repository's current hash algorithm, as serialized on disk. */
 	const struct git_hash_algo *hash_algo;
 
+	/* Repository's compatibility hash algorithm. */
+	const struct git_hash_algo *compat_hash_algo;
+
 	/* A unique-id for tracing purposes. */
 	int trace2_repo_id;
 
-- 
2.41.0


^ permalink raw reply related	[flat|nested] 59+ messages in thread

* [PATCH 06/32] repository: Implement core.compatMap
  2023-09-08 23:05 [RFC][PATCH 0/32] SHA256 and SHA1 interoperability Eric W. Biederman
                   ` (4 preceding siblings ...)
  2023-09-08 23:10 ` [PATCH 05/32] repository: add a compatibility hash algorithm Eric W. Biederman
@ 2023-09-08 23:10 ` Eric W. Biederman
  2023-09-08 23:10 ` [PATCH 07/32] loose: add a mapping between SHA-1 and SHA-256 for loose objects Eric W. Biederman
                   ` (28 subsequent siblings)
  34 siblings, 0 replies; 59+ messages in thread
From: Eric W. Biederman @ 2023-09-08 23:10 UTC (permalink / raw)
  To: git; +Cc: Junio C Hamano, brian m. carlson, Eric W. Biederman

From: "brian m. carlson" <sandals@crustytoothpaste.net>

Add a configuration option to enable updating and reading from
compatibility hash maps when git accesses the repository.

Add a helper function repo_enable_compat_map that, when passed false,
disables the compatibility hash algorithm and, when passed true,
computes the compatibility hash algorithm and sets
"repo->compat_hash_algo".

For now the option is limited to being specified in ".git/config".
Perhaps in the future we can allow specifying it in ".gitconfig" as
well.

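For reference, a sha256 repository that opts in would end up with
something like the following in .git/config (a sketch; only
core.compatMap is new here, the rest comes from the existing
objectFormat support):

	[core]
		repositoryFormatVersion = 1
		compatMap = true
	[extensions]
		objectFormat = sha256
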
Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
---
 Documentation/config/core.txt |  6 ++++++
 repository.c                  | 11 +++++++++++
 repository.h                  |  1 +
 setup.c                       |  5 +++++
 setup.h                       |  1 +
 5 files changed, 24 insertions(+)

diff --git a/Documentation/config/core.txt b/Documentation/config/core.txt
index dfbdaf00b8bc..a9eb2006cc32 100644
--- a/Documentation/config/core.txt
+++ b/Documentation/config/core.txt
@@ -736,3 +736,9 @@ core.abbrev::
 	If set to "no", no abbreviation is made and the object names
 	are shown in their full length.
 	The minimum length is 4.
+
+core.compatMap::
+	Enables the use of a compat map to record the hash in the
+	other object format.  This allows repositories in different
+	object formats to interoperate.  It allows looking up old oids
+	in a repository that has been converted from sha1 to sha256.
diff --git a/repository.c b/repository.c
index a7679ceeaa45..de620d82bfc6 100644
--- a/repository.c
+++ b/repository.c
@@ -104,6 +104,16 @@ void repo_set_hash_algo(struct repository *repo, int hash_algo)
 	repo->hash_algo = &hash_algos[hash_algo];
 }
 
+void repo_enable_compat_map(struct repository *repo, int enable_compat)
+{
+	const struct git_hash_algo *other_algo =
+		&hash_algos[(hash_algo_by_ptr(repo->hash_algo) == GIT_HASH_SHA1) ?
+			GIT_HASH_SHA256 :
+			GIT_HASH_SHA1];
+
+	repo->compat_hash_algo = enable_compat ? other_algo : NULL;
+}
+
 /*
  * Attempt to resolve and set the provided 'gitdir' for repository 'repo'.
  * Return 0 upon success and a non-zero value upon failure.
@@ -184,6 +194,7 @@ int repo_init(struct repository *repo,
 		goto error;
 
 	repo_set_hash_algo(repo, format.hash_algo);
+	repo_enable_compat_map(repo, format.use_compat_map);
 	repo->repository_format_worktree_config = format.worktree_config;
 
 	/* take ownership of format.partial_clone */
diff --git a/repository.h b/repository.h
index 6c4130f0c36e..03cadf6d9a98 100644
--- a/repository.h
+++ b/repository.h
@@ -202,6 +202,7 @@ void repo_set_gitdir(struct repository *repo, const char *root,
 		     const struct set_gitdir_args *extra_args);
 void repo_set_worktree(struct repository *repo, const char *path);
 void repo_set_hash_algo(struct repository *repo, int algo);
+void repo_enable_compat_map(struct repository *repo, int enable_compat);
 void initialize_the_repository(void);
 RESULT_MUST_BE_USED
 int repo_init(struct repository *r, const char *gitdir, const char *worktree);
diff --git a/setup.c b/setup.c
index 18927a847b86..b4d32bd820f1 100644
--- a/setup.c
+++ b/setup.c
@@ -623,6 +623,8 @@ static int check_repo_format(const char *var, const char *value,
 			return 0;
 		}
 	}
+	else if (strcmp(var, "core.compatmap") == 0)
+		data->use_compat_map = git_config_bool(var, value);
 
 	return read_worktree_config(var, value, ctx, vdata);
 }
@@ -1564,8 +1566,10 @@ const char *setup_git_directory_gently(int *nongit_ok)
 		}
 		if (startup_info->have_repository) {
 			repo_set_hash_algo(the_repository, repo_fmt.hash_algo);
+			repo_enable_compat_map(the_repository, repo_fmt.use_compat_map);
 			the_repository->repository_format_worktree_config =
 				repo_fmt.worktree_config;
+
 			/* take ownership of repo_fmt.partial_clone */
 			the_repository->repository_format_partial_clone =
 				repo_fmt.partial_clone;
@@ -1657,6 +1661,7 @@ void check_repository_format(struct repository_format *fmt)
 	check_repository_format_gently(get_git_dir(), fmt, NULL);
 	startup_info->have_repository = 1;
 	repo_set_hash_algo(the_repository, fmt->hash_algo);
+	repo_enable_compat_map(the_repository, fmt->use_compat_map);
 	the_repository->repository_format_worktree_config =
 		fmt->worktree_config;
 	the_repository->repository_format_partial_clone =
diff --git a/setup.h b/setup.h
index 58fd2605dd26..afa05b2b64f3 100644
--- a/setup.h
+++ b/setup.h
@@ -86,6 +86,7 @@ struct repository_format {
 	int worktree_config;
 	int is_bare;
 	int hash_algo;
+	int use_compat_map;
 	int sparse_index;
 	char *work_tree;
 	struct string_list unknown_extensions;
-- 
2.41.0


^ permalink raw reply related	[flat|nested] 59+ messages in thread

* [PATCH 07/32] loose: add a mapping between SHA-1 and SHA-256 for loose objects
  2023-09-08 23:05 [RFC][PATCH 0/32] SHA256 and SHA1 interoperability Eric W. Biederman
                   ` (5 preceding siblings ...)
  2023-09-08 23:10 ` [PATCH 06/32] repository: Implement core.compatMap Eric W. Biederman
@ 2023-09-08 23:10 ` Eric W. Biederman
  2023-09-08 23:10 ` [PATCH 08/32] loose: Compatibility short name support Eric W. Biederman
                   ` (27 subsequent siblings)
  34 siblings, 0 replies; 59+ messages in thread
From: Eric W. Biederman @ 2023-09-08 23:10 UTC (permalink / raw)
  To: git; +Cc: Junio C Hamano, brian m. carlson, Eric W. Biederman

From: "brian m. carlson" <sandals@crustytoothpaste.net>

As part of the transition plan, we'd like to add a file in the .git
directory that maps loose objects between SHA-1 and SHA-256.  Let's
implement the specification in the transition plan and store this data
on a per-repository basis in struct repository.

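The file the code below reads and writes, objects/loose-object-idx in
the common git directory, is a plain text map: a header line followed
by one pair per line, with the repository-algorithm oid first and the
compatibility-algorithm oid second (placeholders shown here rather
than real hashes):

	# loose-object-idx
	<storage-algorithm-oid-hex> <compat-algorithm-oid-hex>
	<storage-algorithm-oid-hex> <compat-algorithm-oid-hex>
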
****
- split repo_object_map between repo_loose_object_map_oid and
  repo_oid_to_algop.
- Verified the loose_map is set in repo_loose_object_map_oid

-- EWB

Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net>
Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
---
 Makefile              |   1 +
 loose.c               | 243 ++++++++++++++++++++++++++++++++++++++++++
 loose.h               |  20 ++++
 object-file-convert.c |  14 ++-
 object-store-ll.h     |   3 +
 object.c              |   2 +
 repository.c          |   6 ++
 7 files changed, 288 insertions(+), 1 deletion(-)
 create mode 100644 loose.c
 create mode 100644 loose.h

diff --git a/Makefile b/Makefile
index f7e824f25cda..3c18664def9a 100644
--- a/Makefile
+++ b/Makefile
@@ -1053,6 +1053,7 @@ LIB_OBJS += list-objects-filter.o
 LIB_OBJS += list-objects.o
 LIB_OBJS += lockfile.o
 LIB_OBJS += log-tree.o
+LIB_OBJS += loose.o
 LIB_OBJS += ls-refs.o
 LIB_OBJS += mailinfo.o
 LIB_OBJS += mailmap.o
diff --git a/loose.c b/loose.c
new file mode 100644
index 000000000000..8ddb7112a541
--- /dev/null
+++ b/loose.c
@@ -0,0 +1,243 @@
+#include "git-compat-util.h"
+#include "hash.h"
+#include "path.h"
+#include "object-store.h"
+#include "hex.h"
+#include "wrapper.h"
+#include "gettext.h"
+#include "loose.h"
+#include "lockfile.h"
+
+static const char *loose_object_header = "# loose-object-idx\n";
+
+static inline int should_use_loose_object_map(struct repository *repo)
+{
+	return repo->compat_hash_algo && repo->gitdir;
+}
+
+void loose_object_map_init(struct loose_object_map **map)
+{
+	struct loose_object_map *m;
+	m = xmalloc(sizeof(**map));
+	m->to_compat = kh_init_oid_map();
+	m->to_storage = kh_init_oid_map();
+	*map = m;
+}
+
+static int insert_oid_pair(kh_oid_map_t *map, const struct object_id *key, const struct object_id *value)
+{
+	khiter_t pos;
+	int ret;
+	struct object_id *stored;
+
+	pos = kh_put_oid_map(map, *key, &ret);
+
+	/* This item already exists in the map. */
+	if (ret == 0)
+		return 0;
+
+	stored = xmalloc(sizeof(*stored));
+	oidcpy(stored, value);
+	kh_value(map, pos) = stored;
+	return 1;
+}
+
+static int load_one_loose_object_map(struct repository *repo, struct object_directory *dir)
+{
+	struct strbuf buf = STRBUF_INIT, path = STRBUF_INIT;
+	FILE *fp;
+
+	if (!dir->loose_map)
+		loose_object_map_init(&dir->loose_map);
+
+	insert_oid_pair(dir->loose_map->to_compat, repo->hash_algo->empty_tree, repo->compat_hash_algo->empty_tree);
+	insert_oid_pair(dir->loose_map->to_storage, repo->compat_hash_algo->empty_tree, repo->hash_algo->empty_tree);
+
+	insert_oid_pair(dir->loose_map->to_compat, repo->hash_algo->empty_blob, repo->compat_hash_algo->empty_blob);
+	insert_oid_pair(dir->loose_map->to_storage, repo->compat_hash_algo->empty_blob, repo->hash_algo->empty_blob);
+
+	insert_oid_pair(dir->loose_map->to_compat, repo->hash_algo->null_oid, repo->compat_hash_algo->null_oid);
+	insert_oid_pair(dir->loose_map->to_storage, repo->compat_hash_algo->null_oid, repo->hash_algo->null_oid);
+
+	strbuf_git_common_path(&path, repo, "objects/loose-object-idx");
+	fp = fopen(path.buf, "rb");
+	if (!fp)
+		return 0;
+
+	errno = 0;
+	if (strbuf_getwholeline(&buf, fp, '\n') || strcmp(buf.buf, loose_object_header))
+		goto err;
+	while (!strbuf_getline_lf(&buf, fp)) {
+		const char *p;
+		struct object_id oid, compat_oid;
+		if (parse_oid_hex_algop(buf.buf, &oid, &p, repo->hash_algo) ||
+		    *p++ != ' ' ||
+		    parse_oid_hex_algop(p, &compat_oid, &p, repo->compat_hash_algo) ||
+		    p != buf.buf + buf.len)
+			goto err;
+		insert_oid_pair(dir->loose_map->to_compat, &oid, &compat_oid);
+		insert_oid_pair(dir->loose_map->to_storage, &compat_oid, &oid);
+	}
+
+	strbuf_release(&buf);
+	strbuf_release(&path);
+	return errno ? -1 : 0;
+err:
+	strbuf_release(&buf);
+	strbuf_release(&path);
+	return -1;
+}
+
+int repo_read_loose_object_map(struct repository *repo)
+{
+	struct object_directory *dir;
+
+	if (!should_use_loose_object_map(repo))
+		return 0;
+
+	prepare_alt_odb(repo);
+
+	for (dir = repo->objects->odb; dir; dir = dir->next) {
+		if (load_one_loose_object_map(repo, dir) < 0) {
+			return -1;
+		}
+	}
+	return 0;
+}
+
+int repo_write_loose_object_map(struct repository *repo)
+{
+	kh_oid_map_t *map = repo->objects->odb->loose_map->to_compat;
+	struct lock_file lock;
+	int fd;
+	khiter_t iter;
+	struct strbuf buf = STRBUF_INIT, path = STRBUF_INIT;
+
+	if (!should_use_loose_object_map(repo))
+		return 0;
+
+	strbuf_git_common_path(&path, repo, "objects/loose-object-idx");
+	fd = hold_lock_file_for_update_timeout(&lock, path.buf, LOCK_DIE_ON_ERROR, -1);
+	iter = kh_begin(map);
+	if (write_in_full(fd, loose_object_header, strlen(loose_object_header)) < 0)
+		goto errout;
+
+	for (; iter != kh_end(map); iter++) {
+		if (kh_exist(map, iter)) {
+			if (oideq(&kh_key(map, iter), the_hash_algo->empty_tree) ||
+			    oideq(&kh_key(map, iter), the_hash_algo->empty_blob))
+				continue;
+			strbuf_addf(&buf, "%s %s\n", oid_to_hex(&kh_key(map, iter)), oid_to_hex(kh_value(map, iter)));
+			if (write_in_full(fd, buf.buf, buf.len) < 0)
+				goto errout;
+			strbuf_reset(&buf);
+		}
+	}
+	strbuf_release(&buf);
+	if (commit_lock_file(&lock) < 0) {
+		error_errno(_("could not write loose object index %s"), path.buf);
+		strbuf_release(&path);
+		return -1;
+	}
+	strbuf_release(&path);
+	return 0;
+errout:
+	rollback_lock_file(&lock);
+	strbuf_release(&buf);
+	error_errno(_("failed to write loose object index %s\n"), path.buf);
+	strbuf_release(&path);
+	return -1;
+}
+
+static int write_one_object(struct repository *repo, const struct object_id *oid,
+			    const struct object_id *compat_oid)
+{
+	struct lock_file lock;
+	int fd;
+	struct stat st;
+	struct strbuf buf = STRBUF_INIT, path = STRBUF_INIT;
+
+	strbuf_git_common_path(&path, repo, "objects/loose-object-idx");
+	hold_lock_file_for_update_timeout(&lock, path.buf, LOCK_DIE_ON_ERROR, -1);
+
+	fd = open(path.buf, O_WRONLY | O_CREAT | O_APPEND, 0666);
+	if (fd < 0)
+		goto errout;
+	if (fstat(fd, &st) < 0)
+		goto errout;
+	if (!st.st_size && write_in_full(fd, loose_object_header, strlen(loose_object_header)) < 0)
+		goto errout;
+
+	strbuf_addf(&buf, "%s %s\n", oid_to_hex(oid), oid_to_hex(compat_oid));
+	if (write_in_full(fd, buf.buf, buf.len) < 0)
+		goto errout;
+	if (close(fd))
+		goto errout;
+	adjust_shared_perm(path.buf);
+	rollback_lock_file(&lock);
+	strbuf_release(&buf);
+	strbuf_release(&path);
+	return 0;
+errout:
+	error_errno(_("failed to write loose object index %s\n"), path.buf);
+	close(fd);
+	rollback_lock_file(&lock);
+	strbuf_release(&buf);
+	strbuf_release(&path);
+	return -1;
+}
+
+int repo_add_loose_object_map(struct repository *repo, const struct object_id *oid,
+			      const struct object_id *compat_oid)
+{
+	int inserted = 0;
+
+	if (!should_use_loose_object_map(repo))
+		return 0;
+
+	inserted |= insert_oid_pair(repo->objects->odb->loose_map->to_compat, oid, compat_oid);
+	inserted |= insert_oid_pair(repo->objects->odb->loose_map->to_storage, compat_oid, oid);
+	if (inserted)
+		return write_one_object(repo, oid, compat_oid);
+	return 0;
+}
+
+int repo_loose_object_map_oid(struct repository *repo, struct object_id *dest,
+			      const struct git_hash_algo *to,
+			      const struct object_id *src)
+{
+	struct object_directory *dir;
+	kh_oid_map_t *map;
+	khiter_t pos;
+
+	for (dir = repo->objects->odb; dir; dir = dir->next) {
+		struct loose_object_map *loose_map = dir->loose_map;
+		if (!loose_map)
+			continue;
+		map = (to == repo->compat_hash_algo) ?
+			loose_map->to_compat :
+			loose_map->to_storage;
+		pos = kh_get_oid_map(map, *src);
+		if (pos < kh_end(map)) {
+			oidcpy(dest, kh_value(map, pos));
+			return 0;
+		}
+	}
+	return -1;
+}
+
+void loose_object_map_clear(struct loose_object_map **map)
+{
+	struct loose_object_map *m = *map;
+	struct object_id *oid;
+
+	if (!m)
+		return;
+
+	kh_foreach_value(m->to_compat, oid, free(oid));
+	kh_foreach_value(m->to_storage, oid, free(oid));
+	kh_destroy_oid_map(m->to_compat);
+	kh_destroy_oid_map(m->to_storage);
+	free(m);
+	*map = NULL;
+}
diff --git a/loose.h b/loose.h
new file mode 100644
index 000000000000..061c6937aead
--- /dev/null
+++ b/loose.h
@@ -0,0 +1,20 @@
+#ifndef LOOSE_H
+#define LOOSE_H
+
+#include "khash.h"
+
+struct loose_object_map {
+	kh_oid_map_t *to_compat;
+	kh_oid_map_t *to_storage;
+};
+
+void loose_object_map_init(struct loose_object_map **map);
+void loose_object_map_clear(struct loose_object_map **map);
+int repo_loose_object_map_oid(struct repository *repo, struct object_id *dest,
+	const struct git_hash_algo *dest_algo, const struct object_id *src);
+int repo_add_loose_object_map(struct repository *repo, const struct object_id *oid,
+			      const struct object_id *compat_oid);
+int repo_read_loose_object_map(struct repository *repo);
+int repo_write_loose_object_map(struct repository *repo);
+
+#endif
diff --git a/object-file-convert.c b/object-file-convert.c
index 9f4d5b354f5f..e7c62434016d 100644
--- a/object-file-convert.c
+++ b/object-file-convert.c
@@ -4,6 +4,7 @@
 #include "repository.h"
 #include "hash-ll.h"
 #include "object.h"
+#include "loose.h"
 #include "object-file-convert.h"
 
 int repo_oid_to_algop(struct repository *repo, const struct object_id *src,
@@ -21,7 +22,18 @@ int repo_oid_to_algop(struct repository *repo, const struct object_id *src,
 			oidcpy(dest, src);
 		return 0;
 	}
-	return -1;
+	if (repo_loose_object_map_oid(repo, dest, to, src)) {
+		/*
+		 * We may have loaded the object map at repo initialization but
+		 * another process (perhaps upstream of a pipe from us) may have
+		 * written a new object into the map.  If the object is missing,
+		 * let's reload the map to see if the object has appeared.
+		 */
+		repo_read_loose_object_map(repo);
+		if (repo_loose_object_map_oid(repo, dest, to, src))
+			return -1;
+	}
+	return 0;
 }
 
 int convert_object_file(struct strbuf *outbuf,
diff --git a/object-store-ll.h b/object-store-ll.h
index 26a3895c821c..bc76d6bec80d 100644
--- a/object-store-ll.h
+++ b/object-store-ll.h
@@ -26,6 +26,9 @@ struct object_directory {
 	uint32_t loose_objects_subdir_seen[8]; /* 256 bits */
 	struct oidtree *loose_objects_cache;
 
+	/* Map between object IDs for loose objects. */
+	struct loose_object_map *loose_map;
+
 	/*
 	 * This is a temporary object store created by the tmp_objdir
 	 * facility. Disable ref updates since the objects in the store
diff --git a/object.c b/object.c
index 2c61e4c86217..186a0a47c0fb 100644
--- a/object.c
+++ b/object.c
@@ -13,6 +13,7 @@
 #include "alloc.h"
 #include "packfile.h"
 #include "commit-graph.h"
+#include "loose.h"
 
 unsigned int get_max_object_index(void)
 {
@@ -540,6 +541,7 @@ void free_object_directory(struct object_directory *odb)
 {
 	free(odb->path);
 	odb_clear_loose_cache(odb);
+	loose_object_map_clear(&odb->loose_map);
 	free(odb);
 }
 
diff --git a/repository.c b/repository.c
index de620d82bfc6..4ab44d3b0344 100644
--- a/repository.c
+++ b/repository.c
@@ -14,6 +14,7 @@
 #include "read-cache-ll.h"
 #include "remote.h"
 #include "setup.h"
+#include "loose.h"
 #include "submodule-config.h"
 #include "sparse-index.h"
 #include "trace2.h"
@@ -112,6 +113,8 @@ void repo_enable_compat_map(struct repository *repo, int enable_compat)
 			GIT_HASH_SHA1];
 
 	repo->compat_hash_algo = enable_compat ? other_algo : NULL;
+	if (enable_compat)
+		repo_read_loose_object_map(repo);
 }
 
 /*
@@ -204,6 +207,9 @@ int repo_init(struct repository *repo,
 	if (worktree)
 		repo_set_worktree(repo, worktree);
 
+	if (repo->compat_hash_algo)
+		repo_read_loose_object_map(repo);
+
 	clear_repository_format(&format);
 	return 0;
 
-- 
2.41.0


^ permalink raw reply related	[flat|nested] 59+ messages in thread

* [PATCH 08/32] loose: Compatibility short name support
  2023-09-08 23:05 [RFC][PATCH 0/32] SHA256 and SHA1 interoperability Eric W. Biederman
                   ` (6 preceding siblings ...)
  2023-09-08 23:10 ` [PATCH 07/32] loose: add a mapping between SHA-1 and SHA-256 for loose objects Eric W. Biederman
@ 2023-09-08 23:10 ` Eric W. Biederman
  2023-09-08 23:10 ` [PATCH 09/32] object-file: Update the loose object map when writing loose objects Eric W. Biederman
                   ` (26 subsequent siblings)
  34 siblings, 0 replies; 59+ messages in thread
From: Eric W. Biederman @ 2023-09-08 23:10 UTC (permalink / raw)
  To: git; +Cc: Junio C Hamano, brian m. carlson, Eric W. Biederman

Update loose_objects_cache when updating the loose objects map.  This
oidtree is used to discover which oids are possibilities when
resolving short names, and it can support a mixture of sha1
and sha256 oids.

With this, any oid recorded in objects/loose-object-idx is usable
for resolving an oid to an object.

To make this maintainable a helper insert_loose_map is factored
out of load_one_loose_object_map and repo_add_loose_object_map,
and then modified to also update the loose_objects_cache.

Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
---
 loose.c | 37 +++++++++++++++++++++++++------------
 1 file changed, 25 insertions(+), 12 deletions(-)

diff --git a/loose.c b/loose.c
index 8ddb7112a541..81ed0e6b0c1e 100644
--- a/loose.c
+++ b/loose.c
@@ -7,6 +7,7 @@
 #include "gettext.h"
 #include "loose.h"
 #include "lockfile.h"
+#include "oidtree.h"
 
 static const char *loose_object_header = "# loose-object-idx\n";
 
@@ -42,6 +43,21 @@ static int insert_oid_pair(kh_oid_map_t *map, const struct object_id *key, const
 	return 1;
 }
 
+static int insert_loose_map(struct object_directory *odb,
+			    const struct object_id *oid,
+			    const struct object_id *compat_oid)
+{
+	struct loose_object_map *map = odb->loose_map;
+	int inserted = 0;
+
+	inserted |= insert_oid_pair(map->to_compat, oid, compat_oid);
+	inserted |= insert_oid_pair(map->to_storage, compat_oid, oid);
+	if (inserted)
+		oidtree_insert(odb->loose_objects_cache, compat_oid);
+
+	return inserted;
+}
+
 static int load_one_loose_object_map(struct repository *repo, struct object_directory *dir)
 {
 	struct strbuf buf = STRBUF_INIT, path = STRBUF_INIT;
@@ -49,15 +65,14 @@ static int load_one_loose_object_map(struct repository *repo, struct object_dire
 
 	if (!dir->loose_map)
 		loose_object_map_init(&dir->loose_map);
+	if (!dir->loose_objects_cache) {
+		ALLOC_ARRAY(dir->loose_objects_cache, 1);
+		oidtree_init(dir->loose_objects_cache);
+	}
 
-	insert_oid_pair(dir->loose_map->to_compat, repo->hash_algo->empty_tree, repo->compat_hash_algo->empty_tree);
-	insert_oid_pair(dir->loose_map->to_storage, repo->compat_hash_algo->empty_tree, repo->hash_algo->empty_tree);
-
-	insert_oid_pair(dir->loose_map->to_compat, repo->hash_algo->empty_blob, repo->compat_hash_algo->empty_blob);
-	insert_oid_pair(dir->loose_map->to_storage, repo->compat_hash_algo->empty_blob, repo->hash_algo->empty_blob);
-
-	insert_oid_pair(dir->loose_map->to_compat, repo->hash_algo->null_oid, repo->compat_hash_algo->null_oid);
-	insert_oid_pair(dir->loose_map->to_storage, repo->compat_hash_algo->null_oid, repo->hash_algo->null_oid);
+	insert_loose_map(dir, repo->hash_algo->empty_tree, repo->compat_hash_algo->empty_tree);
+	insert_loose_map(dir, repo->hash_algo->empty_blob, repo->compat_hash_algo->empty_blob);
+	insert_loose_map(dir, repo->hash_algo->null_oid, repo->compat_hash_algo->null_oid);
 
 	strbuf_git_common_path(&path, repo, "objects/loose-object-idx");
 	fp = fopen(path.buf, "rb");
@@ -75,8 +90,7 @@ static int load_one_loose_object_map(struct repository *repo, struct object_dire
 		    parse_oid_hex_algop(p, &compat_oid, &p, repo->compat_hash_algo) ||
 		    p != buf.buf + buf.len)
 			goto err;
-		insert_oid_pair(dir->loose_map->to_compat, &oid, &compat_oid);
-		insert_oid_pair(dir->loose_map->to_storage, &compat_oid, &oid);
+		insert_loose_map(dir, &oid, &compat_oid);
 	}
 
 	strbuf_release(&buf);
@@ -195,8 +209,7 @@ int repo_add_loose_object_map(struct repository *repo, const struct object_id *o
 	if (!should_use_loose_object_map(repo))
 		return 0;
 
-	inserted |= insert_oid_pair(repo->objects->odb->loose_map->to_compat, oid, compat_oid);
-	inserted |= insert_oid_pair(repo->objects->odb->loose_map->to_storage, compat_oid, oid);
+	inserted = insert_loose_map(repo->objects->odb, oid, compat_oid);
 	if (inserted)
 		return write_one_object(repo, oid, compat_oid);
 	return 0;
-- 
2.41.0
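
In other words, the helper keeps three per-object-directory structures
in agreement for every pair it records (names as in the diff above; this
is only a summary, not new code):

	/*
	 *  odb->loose_map->to_compat   storage oid -> compat oid   (khash)
	 *  odb->loose_map->to_storage  compat oid  -> storage oid  (khash)
	 *  odb->loose_objects_cache    compat oids, so abbreviated (oidtree)
	 *                              compat names can be resolved
	 */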


^ permalink raw reply related	[flat|nested] 59+ messages in thread

* [PATCH 09/32] object-file: Update the loose object map when writing loose objects
  2023-09-08 23:05 [RFC][PATCH 0/32] SHA256 and SHA1 interoperability Eric W. Biederman
                   ` (7 preceding siblings ...)
  2023-09-08 23:10 ` [PATCH 08/32] loose: Compatibilty short name support Eric W. Biederman
@ 2023-09-08 23:10 ` Eric W. Biederman
  2023-09-08 23:10 ` [PATCH 10/32] bulk-checkin: Only accept blobs Eric W. Biederman
                   ` (25 subsequent siblings)
  34 siblings, 0 replies; 59+ messages in thread
From: Eric W. Biederman @ 2023-09-08 23:10 UTC (permalink / raw
  To: git; +Cc: Junio C Hamano, brian m. carlson, Eric W. Biederman

To implement SHA1 compatibility in SHA256 repositories the loose
object map needs to be updated whenever a loose object is written.
Updating the loose object map this way allows git to support
the old hash algorithm in constant time.

The functions write_loose_object and stream_loose_object are
the only two that write to the loose object store.

Update stream_loose_object to compute the compatibility hash, write
the loose object, and then call repo_add_loose_object_map to update
the loose object map.

Update write_object_file_flags to convert the object into
its compatibility encoding, hash the compatibility encoding,
write the object, and then update the loose object map.

Update force_object_loose to look up the hash of the compatibility
encoding, write the loose object, and then update the loose object
map.

Update write_object_file_literally to refuse to write any objects
when a compatibility encoding is enabled.  The problem is that
write_object_file_literally is frequently used to write ill-formed
objects.  Especially when the type of those objects is changed, there
is by definition no possible way to convert them, as no conversion
has been defined.

Since a compatibility encoding cannot be found and a compatibility
mapping cannot be written, the cleanest behavior is to simply
disallow write_object_file_literally from writing files.

Beyond noting that the loose objects are written before the loose
object map, I have not done any analysis of how robust this scheme is
in the event of failure.

Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
---
 object-file.c | 94 +++++++++++++++++++++++++++++++++++++++++----------
 1 file changed, 76 insertions(+), 18 deletions(-)

diff --git a/object-file.c b/object-file.c
index 7f24f19b8a68..6a14b8875343 100644
--- a/object-file.c
+++ b/object-file.c
@@ -44,6 +44,7 @@
 #include "setup.h"
 #include "submodule.h"
 #include "fsck.h"
+#include "loose.h"
 
 /* The maximum size for an object header. */
 #define MAX_HEADER_LEN 32
@@ -2035,9 +2036,12 @@ static int start_loose_object_common(struct strbuf *tmp_file,
 				     const char *filename, unsigned flags,
 				     git_zstream *stream,
 				     unsigned char *buf, size_t buflen,
-				     git_hash_ctx *c,
+				     git_hash_ctx *c, git_hash_ctx *compat_c,
 				     char *hdr, int hdrlen)
 {
+	struct repository *repo = the_repository;
+	const struct git_hash_algo *algo = repo->hash_algo;
+	const struct git_hash_algo *compat = repo->compat_hash_algo;
 	int fd;
 
 	fd = create_tmpfile(tmp_file, filename);
@@ -2057,14 +2061,18 @@ static int start_loose_object_common(struct strbuf *tmp_file,
 	git_deflate_init(stream, zlib_compression_level);
 	stream->next_out = buf;
 	stream->avail_out = buflen;
-	the_hash_algo->init_fn(c);
+	algo->init_fn(c);
+	if (compat && compat_c)
+		compat->init_fn(compat_c);
 
 	/*  Start to feed header to zlib stream */
 	stream->next_in = (unsigned char *)hdr;
 	stream->avail_in = hdrlen;
 	while (git_deflate(stream, 0) == Z_OK)
 		; /* nothing */
-	the_hash_algo->update_fn(c, hdr, hdrlen);
+	algo->update_fn(c, hdr, hdrlen);
+	if (compat && compat_c)
+		compat->update_fn(compat_c, hdr, hdrlen);
 
 	return fd;
 }
@@ -2073,16 +2081,21 @@ static int start_loose_object_common(struct strbuf *tmp_file,
  * Common steps for the inner git_deflate() loop for writing loose
  * objects. Returns what git_deflate() returns.
  */
-static int write_loose_object_common(git_hash_ctx *c,
+static int write_loose_object_common(git_hash_ctx *c, git_hash_ctx *compat_c,
 				     git_zstream *stream, const int flush,
 				     unsigned char *in0, const int fd,
 				     unsigned char *compressed,
 				     const size_t compressed_len)
 {
+	struct repository *repo = the_repository;
+	const struct git_hash_algo *algo = repo->hash_algo;
+	const struct git_hash_algo *compat = repo->compat_hash_algo;
 	int ret;
 
 	ret = git_deflate(stream, flush ? Z_FINISH : 0);
-	the_hash_algo->update_fn(c, in0, stream->next_in - in0);
+	algo->update_fn(c, in0, stream->next_in - in0);
+	if (compat && compat_c)
+		compat->update_fn(compat_c, in0, stream->next_in - in0);
 	if (write_in_full(fd, compressed, stream->next_out - compressed) < 0)
 		die_errno(_("unable to write loose object file"));
 	stream->next_out = compressed;
@@ -2097,15 +2110,21 @@ static int write_loose_object_common(git_hash_ctx *c,
  * - End the compression of zlib stream.
  * - Get the calculated oid to "oid".
  */
-static int end_loose_object_common(git_hash_ctx *c, git_zstream *stream,
-				   struct object_id *oid)
+static int end_loose_object_common(git_hash_ctx *c, git_hash_ctx *compat_c,
+				   git_zstream *stream, struct object_id *oid,
+				   struct object_id *compat_oid)
 {
+	struct repository *repo = the_repository;
+	const struct git_hash_algo *algo = repo->hash_algo;
+	const struct git_hash_algo *compat = repo->compat_hash_algo;
 	int ret;
 
 	ret = git_deflate_end_gently(stream);
 	if (ret != Z_OK)
 		return ret;
-	the_hash_algo->final_oid_fn(oid, c);
+	algo->final_oid_fn(oid, c);
+	if (compat && compat_c)
+		compat->final_oid_fn(compat_oid, compat_c);
 
 	return Z_OK;
 }
@@ -2129,7 +2148,7 @@ static int write_loose_object(const struct object_id *oid, char *hdr,
 
 	fd = start_loose_object_common(&tmp_file, filename.buf, flags,
 				       &stream, compressed, sizeof(compressed),
-				       &c, hdr, hdrlen);
+				       &c, NULL, hdr, hdrlen);
 	if (fd < 0)
 		return -1;
 
@@ -2139,14 +2158,14 @@ static int write_loose_object(const struct object_id *oid, char *hdr,
 	do {
 		unsigned char *in0 = stream.next_in;
 
-		ret = write_loose_object_common(&c, &stream, 1, in0, fd,
+		ret = write_loose_object_common(&c, NULL, &stream, 1, in0, fd,
 						compressed, sizeof(compressed));
 	} while (ret == Z_OK);
 
 	if (ret != Z_STREAM_END)
 		die(_("unable to deflate new object %s (%d)"), oid_to_hex(oid),
 		    ret);
-	ret = end_loose_object_common(&c, &stream, &parano_oid);
+	ret = end_loose_object_common(&c, NULL, &stream, &parano_oid, NULL);
 	if (ret != Z_OK)
 		die(_("deflateEnd on object %s failed (%d)"), oid_to_hex(oid),
 		    ret);
@@ -2191,10 +2210,12 @@ static int freshen_packed_object(const struct object_id *oid)
 int stream_loose_object(struct input_stream *in_stream, size_t len,
 			struct object_id *oid)
 {
+	const struct git_hash_algo *compat = the_repository->compat_hash_algo;
+	struct object_id compat_oid;
 	int fd, ret, err = 0, flush = 0;
 	unsigned char compressed[4096];
 	git_zstream stream;
-	git_hash_ctx c;
+	git_hash_ctx c, compat_c;
 	struct strbuf tmp_file = STRBUF_INIT;
 	struct strbuf filename = STRBUF_INIT;
 	int dirlen;
@@ -2218,7 +2239,7 @@ int stream_loose_object(struct input_stream *in_stream, size_t len,
 	 */
 	fd = start_loose_object_common(&tmp_file, filename.buf, 0,
 				       &stream, compressed, sizeof(compressed),
-				       &c, hdr, hdrlen);
+				       &c, &compat_c, hdr, hdrlen);
 	if (fd < 0) {
 		err = -1;
 		goto cleanup;
@@ -2236,7 +2257,7 @@ int stream_loose_object(struct input_stream *in_stream, size_t len,
 			if (in_stream->is_finished)
 				flush = 1;
 		}
-		ret = write_loose_object_common(&c, &stream, flush, in0, fd,
+		ret = write_loose_object_common(&c, &compat_c, &stream, flush, in0, fd,
 						compressed, sizeof(compressed));
 		/*
 		 * Unlike write_loose_object(), we do not have the entire
@@ -2259,7 +2280,7 @@ int stream_loose_object(struct input_stream *in_stream, size_t len,
 	 */
 	if (ret != Z_STREAM_END)
 		die(_("unable to stream deflate new object (%d)"), ret);
-	ret = end_loose_object_common(&c, &stream, oid);
+	ret = end_loose_object_common(&c, &compat_c, &stream, oid, &compat_oid);
 	if (ret != Z_OK)
 		die(_("deflateEnd on stream object failed (%d)"), ret);
 	close_loose_object(fd, tmp_file.buf);
@@ -2286,6 +2307,8 @@ int stream_loose_object(struct input_stream *in_stream, size_t len,
 	}
 
 	err = finalize_object_file(tmp_file.buf, filename.buf);
+	if (!err && compat)
+		err = repo_add_loose_object_map(the_repository, oid, &compat_oid);
 cleanup:
 	strbuf_release(&tmp_file);
 	strbuf_release(&filename);
@@ -2296,17 +2319,38 @@ int write_object_file_flags(const void *buf, unsigned long len,
 			    enum object_type type, struct object_id *oid,
 			    unsigned flags)
 {
+	struct repository *repo = the_repository;
+	const struct git_hash_algo *algo = repo->hash_algo;
+	const struct git_hash_algo *compat = repo->compat_hash_algo;
+	struct object_id compat_oid;
 	char hdr[MAX_HEADER_LEN];
 	int hdrlen = sizeof(hdr);
 
+	/* Generate compat_oid */
+	if (compat) {
+		if (type == OBJ_BLOB)
+			hash_object_file(compat, buf, len, type, &compat_oid);
+		else {
+			struct strbuf converted = STRBUF_INIT;
+			convert_object_file(&converted, algo, compat,
+					    buf, len, type, 0);
+			hash_object_file(compat, converted.buf, converted.len,
+					 type, &compat_oid);
+			strbuf_release(&converted);
+		}
+	}
+
 	/* Normally if we have it in the pack then we do not bother writing
 	 * it out into .git/objects/??/?{38} file.
 	 */
-	write_object_file_prepare(the_hash_algo, buf, len, type, oid, hdr,
-				  &hdrlen);
+	write_object_file_prepare(algo, buf, len, type, oid, hdr, &hdrlen);
 	if (freshen_packed_object(oid) || freshen_loose_object(oid))
 		return 0;
-	return write_loose_object(oid, hdr, hdrlen, buf, len, 0, flags);
+	if (write_loose_object(oid, hdr, hdrlen, buf, len, 0, flags))
+		return -1;
+	if (compat)
+		return repo_add_loose_object_map(repo, oid, &compat_oid);
+	return 0;
 }
 
 int write_object_file_literally(const void *buf, unsigned long len,
@@ -2324,6 +2368,10 @@ int write_object_file_literally(const void *buf, unsigned long len,
 
 	if (!(flags & HASH_WRITE_OBJECT))
 		goto cleanup;
+	else if (the_repository->compat_hash_algo) {
+		status = -1;
+		goto cleanup;
+	}
 	if (freshen_packed_object(oid) || freshen_loose_object(oid))
 		goto cleanup;
 	status = write_loose_object(oid, header, hdrlen, buf, len, 0, 0);
@@ -2335,9 +2383,12 @@ int write_object_file_literally(const void *buf, unsigned long len,
 
 int force_object_loose(const struct object_id *oid, time_t mtime)
 {
+	struct repository *repo = the_repository;
+	const struct git_hash_algo *compat = repo->compat_hash_algo;
 	void *buf;
 	unsigned long len;
 	struct object_info oi = OBJECT_INFO_INIT;
+	struct object_id compat_oid;
 	enum object_type type;
 	char hdr[MAX_HEADER_LEN];
 	int hdrlen;
@@ -2350,8 +2401,15 @@ int force_object_loose(const struct object_id *oid, time_t mtime)
 	oi.contentp = &buf;
 	if (oid_object_info_extended(the_repository, oid, &oi, 0))
 		return error(_("cannot read object for %s"), oid_to_hex(oid));
+	if (compat) {
+		if (repo_oid_to_algop(repo, oid, compat, &compat_oid))
+			return error(_("cannot map object %s to %s"),
+				     oid_to_hex(oid), compat->name);
+	}
 	hdrlen = format_object_header(hdr, sizeof(hdr), type, len);
 	ret = write_loose_object(oid, hdr, hdrlen, buf, len, mtime, 0);
+	if (!ret && compat)
+		ret = repo_add_loose_object_map(the_repository, oid, &compat_oid);
 	free(buf);
 
 	return ret;
-- 
2.41.0
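
Condensed, the flow write_object_file_flags() gains above is roughly the
following.  It only restates the diff with the same helpers
(hash_object_file, convert_object_file, repo_add_loose_object_map) and
the same variables, so treat it as an illustrative sketch rather than
the exact implementation:

	const struct git_hash_algo *compat = repo->compat_hash_algo;
	struct object_id compat_oid;

	if (compat) {
		if (type == OBJ_BLOB) {
			/* blob bytes need no conversion before rehashing */
			hash_object_file(compat, buf, len, type, &compat_oid);
		} else {
			/* trees, commits and tags must be rewritten first */
			struct strbuf converted = STRBUF_INIT;
			convert_object_file(&converted, repo->hash_algo, compat,
					    buf, len, type, 0);
			hash_object_file(compat, converted.buf, converted.len,
					 type, &compat_oid);
			strbuf_release(&converted);
		}
	}
	/* ... hash and write the loose object under the storage algorithm ... */
	if (compat)
		repo_add_loose_object_map(repo, oid, &compat_oid);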


^ permalink raw reply related	[flat|nested] 59+ messages in thread

* [PATCH 10/32] bulk-checkin: Only accept blobs
  2023-09-08 23:05 [RFC][PATCH 0/32] SHA256 and SHA1 interoperability Eric W. Biederman
                   ` (8 preceding siblings ...)
  2023-09-08 23:10 ` [PATCH 09/32] object-file: Update the loose object map when writing loose objects Eric W. Biederman
@ 2023-09-08 23:10 ` Eric W. Biederman
  2023-09-08 23:10 ` [PATCH 11/32] pack: Communicate the compat_oid through struct pack_idx_entry Eric W. Biederman
                   ` (24 subsequent siblings)
  34 siblings, 0 replies; 59+ messages in thread
From: Eric W. Biederman @ 2023-09-08 23:10 UTC (permalink / raw
  To: git; +Cc: Junio C Hamano, brian m. carlson, Eric W. Biederman

As the code is written today bulk_checkin only accepts blobs.  When
dealing with multiple hash algorithms it is necessary to distinguish
between blobs and object types that have embedded oids.  For objects
that embed oids a completely new object needs to be generated to
compute the compatibility hash on.  For blobs, however, all that is
needed is to compute the compatibility hash over the same content as
the default hash.

As the code will soon need the compatibility hash from
a bulk checkin, remove support for bulk checkin of
anything except blobs.

Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
---
 bulk-checkin.c | 35 +++++++++++++++++------------------
 bulk-checkin.h |  6 +++---
 object-file.c  | 12 ++++++------
 3 files changed, 26 insertions(+), 27 deletions(-)

diff --git a/bulk-checkin.c b/bulk-checkin.c
index 73bff3a23d27..223562b4e748 100644
--- a/bulk-checkin.c
+++ b/bulk-checkin.c
@@ -155,10 +155,10 @@ static int already_written(struct bulk_checkin_packfile *state, struct object_id
  * status before calling us just in case we ask it to call us again
  * with a new pack.
  */
-static int stream_to_pack(struct bulk_checkin_packfile *state,
-			  git_hash_ctx *ctx, off_t *already_hashed_to,
-			  int fd, size_t size, enum object_type type,
-			  const char *path, unsigned flags)
+static int stream_blob_to_pack(struct bulk_checkin_packfile *state,
+			       git_hash_ctx *ctx, off_t *already_hashed_to,
+			       int fd, size_t size, const char *path,
+			       unsigned flags)
 {
 	git_zstream s;
 	unsigned char ibuf[16384];
@@ -170,7 +170,7 @@ static int stream_to_pack(struct bulk_checkin_packfile *state,
 
 	git_deflate_init(&s, pack_compression_level);
 
-	hdrlen = encode_in_pack_object_header(obuf, sizeof(obuf), type, size);
+	hdrlen = encode_in_pack_object_header(obuf, sizeof(obuf), OBJ_BLOB, size);
 	s.next_out = obuf + hdrlen;
 	s.avail_out = sizeof(obuf) - hdrlen;
 
@@ -247,11 +247,10 @@ static void prepare_to_stream(struct bulk_checkin_packfile *state,
 		die_errno("unable to write pack header");
 }
 
-static int deflate_to_pack(struct bulk_checkin_packfile *state,
-			   struct object_id *result_oid,
-			   int fd, size_t size,
-			   enum object_type type, const char *path,
-			   unsigned flags)
+static int deflate_blob_to_pack(struct bulk_checkin_packfile *state,
+				struct object_id *result_oid,
+				int fd, size_t size,
+				const char *path, unsigned flags)
 {
 	off_t seekback, already_hashed_to;
 	git_hash_ctx ctx;
@@ -265,7 +264,7 @@ static int deflate_to_pack(struct bulk_checkin_packfile *state,
 		return error("cannot find the current offset");
 
 	header_len = format_object_header((char *)obuf, sizeof(obuf),
-					  type, size);
+					  OBJ_BLOB, size);
 	the_hash_algo->init_fn(&ctx);
 	the_hash_algo->update_fn(&ctx, obuf, header_len);
 
@@ -282,8 +281,8 @@ static int deflate_to_pack(struct bulk_checkin_packfile *state,
 			idx->offset = state->offset;
 			crc32_begin(state->f);
 		}
-		if (!stream_to_pack(state, &ctx, &already_hashed_to,
-				    fd, size, type, path, flags))
+		if (!stream_blob_to_pack(state, &ctx, &already_hashed_to,
+					 fd, size, path, flags))
 			break;
 		/*
 		 * Writing this object to the current pack will make
@@ -350,12 +349,12 @@ void fsync_loose_object_bulk_checkin(int fd, const char *filename)
 	}
 }
 
-int index_bulk_checkin(struct object_id *oid,
-		       int fd, size_t size, enum object_type type,
-		       const char *path, unsigned flags)
+int index_blob_bulk_checkin(struct object_id *oid,
+			    int fd, size_t size,
+			    const char *path, unsigned flags)
 {
-	int status = deflate_to_pack(&bulk_checkin_packfile, oid, fd, size, type,
-				     path, flags);
+	int status = deflate_blob_to_pack(&bulk_checkin_packfile, oid, fd, size,
+					  path, flags);
 	if (!odb_transaction_nesting)
 		flush_bulk_checkin_packfile(&bulk_checkin_packfile);
 	return status;
diff --git a/bulk-checkin.h b/bulk-checkin.h
index 48fe9a6e9171..aa7286a7b3e1 100644
--- a/bulk-checkin.h
+++ b/bulk-checkin.h
@@ -9,9 +9,9 @@
 void prepare_loose_object_bulk_checkin(void);
 void fsync_loose_object_bulk_checkin(int fd, const char *filename);
 
-int index_bulk_checkin(struct object_id *oid,
-		       int fd, size_t size, enum object_type type,
-		       const char *path, unsigned flags);
+int index_blob_bulk_checkin(struct object_id *oid,
+			    int fd, size_t size,
+			    const char *path, unsigned flags);
 
 /*
  * Tell the object database to optimize for adding
diff --git a/object-file.c b/object-file.c
index 6a14b8875343..6cc4ae1fd957 100644
--- a/object-file.c
+++ b/object-file.c
@@ -2587,11 +2587,11 @@ static int index_core(struct index_state *istate,
  * binary blobs, they generally do not want to get any conversion, and
  * callers should avoid this code path when filters are requested.
  */
-static int index_stream(struct object_id *oid, int fd, size_t size,
-			enum object_type type, const char *path,
-			unsigned flags)
+static int index_blob_stream(struct object_id *oid, int fd, size_t size,
+			     const char *path,
+			     unsigned flags)
 {
-	return index_bulk_checkin(oid, fd, size, type, path, flags);
+	return index_blob_bulk_checkin(oid, fd, size, path, flags);
 }
 
 int index_fd(struct index_state *istate, struct object_id *oid,
@@ -2613,8 +2613,8 @@ int index_fd(struct index_state *istate, struct object_id *oid,
 		ret = index_core(istate, oid, fd, xsize_t(st->st_size),
 				 type, path, flags);
 	else
-		ret = index_stream(oid, fd, xsize_t(st->st_size), type, path,
-				   flags);
+		ret = index_blob_stream(oid, fd, xsize_t(st->st_size), path,
+					flags);
 	close(fd);
 	return ret;
 }
-- 
2.41.0
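
For reference, a minimal caller of the renamed entry point could look
like the sketch below.  checkin_one_blob is a made-up name used only for
illustration; xsize_t and HASH_WRITE_OBJECT are the existing helper and
flag already used by this code:

	static int checkin_one_blob(const char *path, struct object_id *oid)
	{
		struct stat st;
		int fd = open(path, O_RDONLY);
		int ret;

		if (fd < 0)
			return -1;
		if (fstat(fd, &st) < 0) {
			close(fd);
			return -1;
		}
		/* The object type argument is gone; only blobs are accepted. */
		ret = index_blob_bulk_checkin(oid, fd, xsize_t(st.st_size),
					      path, HASH_WRITE_OBJECT);
		close(fd);
		return ret;
	}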


^ permalink raw reply related	[flat|nested] 59+ messages in thread

* [PATCH 11/32] pack: Communicate the compat_oid through struct pack_idx_entry
  2023-09-08 23:05 [RFC][PATCH 0/32] SHA256 and SHA1 interoperability Eric W. Biederman
                   ` (9 preceding siblings ...)
  2023-09-08 23:10 ` [PATCH 10/32] bulk-checkin: Only accept blobs Eric W. Biederman
@ 2023-09-08 23:10 ` Eric W. Biederman
  2023-09-08 23:10 ` [PATCH 12/32] bulk-checkin: hash object with compatibility algorithm Eric W. Biederman
                   ` (23 subsequent siblings)
  34 siblings, 0 replies; 59+ messages in thread
From: Eric W. Biederman @ 2023-09-08 23:10 UTC (permalink / raw
  To: git; +Cc: Junio C Hamano, brian m. carlson, Eric W. Biederman

Add compat_oid to struct pack_idx_entry to allow communicating the
compat hash value of the objects being indexed to the code that
builds the indexes for a pack.

Having a mechanism that communicates the compat_oid from the code
building the pack is necessary for bulk-checkin, fast-import, and
index-pack.  Only pack-objects could rely on the existing
compatibility mappings, but there is no point, since the
other creators of indexes can't.

Unfortunately this adds a 4 byte hole into struct pack_idx_entry.

Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
---
 pack.h | 1 +
 1 file changed, 1 insertion(+)

diff --git a/pack.h b/pack.h
index 3ab9e3f60c0b..321d38374f70 100644
--- a/pack.h
+++ b/pack.h
@@ -75,6 +75,7 @@ struct pack_idx_header {
  */
 struct pack_idx_entry {
 	struct object_id oid;
+	struct object_id compat_oid;
 	uint32_t crc32;
 	off_t offset;
 };
-- 
2.41.0
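
The 4-byte hole comes from aligning the trailing off_t.  On a typical
64-bit build (GIT_MAX_RAWSZ of 32, so struct object_id is 36 bytes) the
layout is roughly:

	struct pack_idx_entry {
		struct object_id oid;         /* 36 bytes */
		struct object_id compat_oid;  /* 36 bytes */
		uint32_t crc32;               /*  4 bytes */
		/* 4 bytes of padding so that offset is 8-byte aligned */
		off_t offset;                 /*  8 bytes */
	};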


^ permalink raw reply related	[flat|nested] 59+ messages in thread

* [PATCH 12/32] bulk-checkin: hash object with compatibility algorithm
  2023-09-08 23:05 [RFC][PATCH 0/32] SHA256 and SHA1 interoperability Eric W. Biederman
                   ` (10 preceding siblings ...)
  2023-09-08 23:10 ` [PATCH 11/32] pack: Communicate the compat_oid through struct pack_idx_entry Eric W. Biederman
@ 2023-09-08 23:10 ` Eric W. Biederman
  2023-09-11  6:17   ` Junio C Hamano
  2023-09-08 23:10 ` [PATCH 13/32] object-file: Add a compat_oid_in parameter to write_object_file_flags Eric W. Biederman
                   ` (22 subsequent siblings)
  34 siblings, 1 reply; 59+ messages in thread
From: Eric W. Biederman @ 2023-09-08 23:10 UTC (permalink / raw
  To: git; +Cc: Junio C Hamano, brian m. carlson, Eric W. Biederman

From: "brian m. carlson" <sandals@crustytoothpaste.net>

Any time we write an object into the repository when we're in dual hash
mode, we need to compute both algorithms.  We already do this when we
write a loose object into the repository, but we also need to do so in
the other case we write an object, which is the bulk check-in code.

****

Write the compatibility hash into idx->compat_oid so it is available
for code that generates indexes that include the compatibility
mappings.

--EWB

Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net>
Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
---
 bulk-checkin.c | 24 ++++++++++++++++++++----
 1 file changed, 20 insertions(+), 4 deletions(-)

diff --git a/bulk-checkin.c b/bulk-checkin.c
index 223562b4e748..3206412a19e0 100644
--- a/bulk-checkin.c
+++ b/bulk-checkin.c
@@ -156,7 +156,8 @@ static int already_written(struct bulk_checkin_packfile *state, struct object_id
  * with a new pack.
  */
 static int stream_blob_to_pack(struct bulk_checkin_packfile *state,
-			       git_hash_ctx *ctx, off_t *already_hashed_to,
+			       git_hash_ctx *ctx, git_hash_ctx *compat_ctx,
+			       off_t *already_hashed_to,
 			       int fd, size_t size, const char *path,
 			       unsigned flags)
 {
@@ -167,6 +168,7 @@ static int stream_blob_to_pack(struct bulk_checkin_packfile *state,
 	int status = Z_OK;
 	int write_object = (flags & HASH_WRITE_OBJECT);
 	off_t offset = 0;
+	const struct git_hash_algo *compat = the_repository->compat_hash_algo;
 
 	git_deflate_init(&s, pack_compression_level);
 
@@ -188,8 +190,11 @@ static int stream_blob_to_pack(struct bulk_checkin_packfile *state,
 				size_t hsize = offset - *already_hashed_to;
 				if (rsize < hsize)
 					hsize = rsize;
-				if (hsize)
+				if (hsize) {
 					the_hash_algo->update_fn(ctx, ibuf, hsize);
+					if (compat)
+						compat->update_fn(compat_ctx, ibuf, hsize);
+				}
 				*already_hashed_to = offset;
 			}
 			s.next_in = ibuf;
@@ -253,11 +258,13 @@ static int deflate_blob_to_pack(struct bulk_checkin_packfile *state,
 				const char *path, unsigned flags)
 {
 	off_t seekback, already_hashed_to;
-	git_hash_ctx ctx;
+	git_hash_ctx ctx, compat_ctx;
 	unsigned char obuf[16384];
 	unsigned header_len;
 	struct hashfile_checkpoint checkpoint = {0};
 	struct pack_idx_entry *idx = NULL;
+	const struct git_hash_algo *compat = the_repository->compat_hash_algo;
+	struct object_id compat_oid = {};
 
 	seekback = lseek(fd, 0, SEEK_CUR);
 	if (seekback == (off_t) -1)
@@ -267,6 +274,10 @@ static int deflate_blob_to_pack(struct bulk_checkin_packfile *state,
 					  OBJ_BLOB, size);
 	the_hash_algo->init_fn(&ctx);
 	the_hash_algo->update_fn(&ctx, obuf, header_len);
+	if (compat) {
+		compat->init_fn(&compat_ctx);
+		compat->update_fn(&compat_ctx, obuf, header_len);
+	}
 
 	/* Note: idx is non-NULL when we are writing */
 	if ((flags & HASH_WRITE_OBJECT) != 0)
@@ -281,7 +292,8 @@ static int deflate_blob_to_pack(struct bulk_checkin_packfile *state,
 			idx->offset = state->offset;
 			crc32_begin(state->f);
 		}
-		if (!stream_blob_to_pack(state, &ctx, &already_hashed_to,
+		if (!stream_blob_to_pack(state, &ctx, &compat_ctx,
+					 &already_hashed_to,
 					 fd, size, path, flags))
 			break;
 		/*
@@ -298,6 +310,8 @@ static int deflate_blob_to_pack(struct bulk_checkin_packfile *state,
 			return error("cannot seek back");
 	}
 	the_hash_algo->final_oid_fn(result_oid, &ctx);
+	if (compat)
+		compat->final_oid_fn(&compat_oid, &compat_ctx);
 	if (!idx)
 		return 0;
 
@@ -308,6 +322,8 @@ static int deflate_blob_to_pack(struct bulk_checkin_packfile *state,
 		free(idx);
 	} else {
 		oidcpy(&idx->oid, result_oid);
+		if (compat)
+			oidcpy(&idx->compat_oid, &compat_oid);
 		ALLOC_GROW(state->written,
 			   state->nr_written + 1,
 			   state->alloc_written);
-- 
2.41.0
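
The pattern matches the loose object path earlier in the series: drive a
second hash context over exactly the same bytes whenever a compatibility
algorithm is configured.  A condensed sketch using the same function
pointers as the diff (hash_both is a made-up wrapper name; in
deflate_blob_to_pack() the update step runs once per streamed chunk):

	static void hash_both(const void *buf, size_t len,
			      struct object_id *oid,
			      struct object_id *compat_oid)
	{
		const struct git_hash_algo *compat =
			the_repository->compat_hash_algo;
		git_hash_ctx ctx, compat_ctx;

		the_hash_algo->init_fn(&ctx);
		if (compat)
			compat->init_fn(&compat_ctx);

		the_hash_algo->update_fn(&ctx, buf, len);
		if (compat)
			compat->update_fn(&compat_ctx, buf, len);

		the_hash_algo->final_oid_fn(oid, &ctx);
		if (compat)
			compat->final_oid_fn(compat_oid, &compat_ctx);
	}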


^ permalink raw reply related	[flat|nested] 59+ messages in thread

* [PATCH 13/32] object-file: Add a compat_oid_in parameter to write_object_file_flags
  2023-09-08 23:05 [RFC][PATCH 0/32] SHA256 and SHA1 interoperability Eric W. Biederman
                   ` (11 preceding siblings ...)
  2023-09-08 23:10 ` [PATCH 12/32] bulk-checkin: hash object with compatibility algorithm Eric W. Biederman
@ 2023-09-08 23:10 ` Eric W. Biederman
  2023-09-08 23:10 ` [PATCH 14/32] commit: write commits for both hashes Eric W. Biederman
                   ` (21 subsequent siblings)
  34 siblings, 0 replies; 59+ messages in thread
From: Eric W. Biederman @ 2023-09-08 23:10 UTC (permalink / raw
  To: git; +Cc: Junio C Hamano, brian m. carlson, Eric W. Biederman

To create the proper signatures for commit objects both versions of
the commit object need to be generated and signed.  After that it is
a waste to throw away the work of generating the compatibility hash,
so update write_object_file_flags to take a compatibility hash input
parameter that it can use to skip the work of regenerating the
compatibility hash.

Update the places that don't generate the compatibility hash to
pass NULL, so it is easy to tell that write_object_file_flags should
not attempt to use a caller-supplied compatibility hash.

Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
---
 cache-tree.c      | 2 +-
 object-file.c     | 6 ++++--
 object-store-ll.h | 4 ++--
 3 files changed, 7 insertions(+), 5 deletions(-)

diff --git a/cache-tree.c b/cache-tree.c
index 641427ed410a..ddc7d3d86959 100644
--- a/cache-tree.c
+++ b/cache-tree.c
@@ -448,7 +448,7 @@ static int update_one(struct cache_tree *it,
 		hash_object_file(the_hash_algo, buffer.buf, buffer.len,
 				 OBJ_TREE, &it->oid);
 	} else if (write_object_file_flags(buffer.buf, buffer.len, OBJ_TREE,
-					   &it->oid, flags & WRITE_TREE_SILENT
+					   &it->oid, NULL, flags & WRITE_TREE_SILENT
 					   ? HASH_SILENT : 0)) {
 		strbuf_release(&buffer);
 		return -1;
diff --git a/object-file.c b/object-file.c
index 6cc4ae1fd957..fd420dd303df 100644
--- a/object-file.c
+++ b/object-file.c
@@ -2317,7 +2317,7 @@ int stream_loose_object(struct input_stream *in_stream, size_t len,
 
 int write_object_file_flags(const void *buf, unsigned long len,
 			    enum object_type type, struct object_id *oid,
-			    unsigned flags)
+			    struct object_id *compat_oid_in, unsigned flags)
 {
 	struct repository *repo = the_repository;
 	const struct git_hash_algo *algo = repo->hash_algo;
@@ -2328,7 +2328,9 @@ int write_object_file_flags(const void *buf, unsigned long len,
 
 	/* Generate compat_oid */
 	if (compat) {
-		if (type == OBJ_BLOB)
+		if (compat_oid_in)
+			oidcpy(&compat_oid, compat_oid_in);
+		else if (type == OBJ_BLOB)
 			hash_object_file(compat, buf, len, type, &compat_oid);
 		else {
 			struct strbuf converted = STRBUF_INIT;
diff --git a/object-store-ll.h b/object-store-ll.h
index bc76d6bec80d..c5f2bb2fc2fe 100644
--- a/object-store-ll.h
+++ b/object-store-ll.h
@@ -255,11 +255,11 @@ void hash_object_file(const struct git_hash_algo *algo, const void *buf,
 
 int write_object_file_flags(const void *buf, unsigned long len,
 			    enum object_type type, struct object_id *oid,
-			    unsigned flags);
+			    struct object_id *comapt_oid_in, unsigned flags);
 static inline int write_object_file(const void *buf, unsigned long len,
 				    enum object_type type, struct object_id *oid)
 {
-	return write_object_file_flags(buf, len, type, oid, 0);
+	return write_object_file_flags(buf, len, type, oid, NULL, 0);
 }
 
 int write_object_file_literally(const void *buf, unsigned long len,
-- 
2.41.0
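
Call sites then come in two flavors.  These are illustrative calls only
(variable names are placeholders):

	/*
	 * No precomputed compatibility hash: pass NULL and let
	 * write_object_file_flags() derive it, as cache-tree does above.
	 */
	write_object_file_flags(buf.buf, buf.len, OBJ_TREE, &oid, NULL, 0);

	/*
	 * Compatibility hash already computed (and signed), as the commit
	 * writing code will do in a later patch: hand it in so the work
	 * is not redone.
	 */
	write_object_file_flags(buf.buf, buf.len, OBJ_COMMIT, &oid,
				&compat_oid, 0);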


^ permalink raw reply related	[flat|nested] 59+ messages in thread

* [PATCH 14/32] commit: write commits for both hashes
  2023-09-08 23:05 [RFC][PATCH 0/32] SHA256 and SHA1 interoperability Eric W. Biederman
                   ` (12 preceding siblings ...)
  2023-09-08 23:10 ` [PATCH 13/32] object-file: Add a compat_oid_in parameter to write_object_file_flags Eric W. Biederman
@ 2023-09-08 23:10 ` Eric W. Biederman
  2023-09-11  6:25   ` Junio C Hamano
  2023-09-08 23:10 ` [PATCH 15/32] cache: add a function to read an OID of a specific algorithm Eric W. Biederman
                   ` (20 subsequent siblings)
  34 siblings, 1 reply; 59+ messages in thread
From: Eric W. Biederman @ 2023-09-08 23:10 UTC (permalink / raw
  To: git; +Cc: Junio C Hamano, brian m. carlson, Eric W . Biederman

From: "brian m. carlson" <sandals@crustytoothpaste.net>

When we write a commit, we include data that is specific to the hash
algorithm, such as parents and the root tree.  In order to write both a
SHA-1 commit and a SHA-256 version, we need to convert between them.

However, a straightforward conversion isn't necessarily what we want.
When we sign a commit, we sign its data, so if we create a commit for
SHA-256 and then write a SHA-1 version, we'll still have only signed the
SHA-256 data.  While this is valid, it would be better to sign both
forms of data so people using SHA-1 can verify the signatures as well.

Consequently, we don't want to use the standard mapping that occurs when
we write an object.  Instead, let's move most of the writing of the
commit into a separate function which is agnostic of the hash algorithm
and which simply writes into a buffer and specify both versions of the
object ourselves.

We can then call this function twice: once with the SHA-256 contents,
and if SHA-1 is enabled, once with the SHA-1 contents.  If we're signing
the commit, we then sign both versions and append both signatures to
both buffers.  To produce a consistent hash, we always append the
signatures in the order in which Git implemented them: first SHA-1, then
SHA-256.

In order to make this signing code work, we split the commit signing
code into two functions, one which signs the buffer, and one which
appends the signature.

*****

Updated to use write_object_file_flags and repo_oid_to_algop

-- EWB

Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net>
Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
---
 commit.c | 176 +++++++++++++++++++++++++++++++++++++++++--------------
 1 file changed, 131 insertions(+), 45 deletions(-)

diff --git a/commit.c b/commit.c
index b3223478bc2a..522ebb4b3002 100644
--- a/commit.c
+++ b/commit.c
@@ -28,6 +28,7 @@
 #include "shallow.h"
 #include "tree.h"
 #include "hook.h"
+#include "object-file-convert.h"
 
 static struct commit_extra_header *read_commit_extra_header_lines(const char *buf, size_t len, const char **);
 
@@ -1100,12 +1101,11 @@ static const char *gpg_sig_headers[] = {
 	"gpgsig-sha256",
 };
 
-int sign_with_header(struct strbuf *buf, const char *keyid)
+static int add_commit_signature(struct strbuf *buf, struct strbuf *sig, const struct git_hash_algo *algo)
 {
-	struct strbuf sig = STRBUF_INIT;
 	int inspos, copypos;
 	const char *eoh;
-	const char *gpg_sig_header = gpg_sig_headers[hash_algo_by_ptr(the_hash_algo)];
+	const char *gpg_sig_header = gpg_sig_headers[hash_algo_by_ptr(algo)];
 	int gpg_sig_header_len = strlen(gpg_sig_header);
 
 	/* find the end of the header */
@@ -1115,15 +1115,8 @@ int sign_with_header(struct strbuf *buf, const char *keyid)
 	else
 		inspos = eoh - buf->buf + 1;
 
-	if (!keyid || !*keyid)
-		keyid = get_signing_key();
-	if (sign_buffer(buf, &sig, keyid)) {
-		strbuf_release(&sig);
-		return -1;
-	}
-
-	for (copypos = 0; sig.buf[copypos]; ) {
-		const char *bol = sig.buf + copypos;
+	for (copypos = 0; sig->buf[copypos]; ) {
+		const char *bol = sig->buf + copypos;
 		const char *eol = strchrnul(bol, '\n');
 		int len = (eol - bol) + !!*eol;
 
@@ -1136,11 +1129,17 @@ int sign_with_header(struct strbuf *buf, const char *keyid)
 		inspos += len;
 		copypos += len;
 	}
-	strbuf_release(&sig);
 	return 0;
 }
 
-
+static int sign_commit_to_strbuf(struct strbuf *sig, struct strbuf *buf, const char *keyid)
+{
+	if (!keyid || !*keyid)
+		keyid = get_signing_key();
+	if (sign_buffer(buf, sig, keyid))
+		return -1;
+	return 0;
+}
 
 int parse_signed_commit(const struct commit *commit,
 			struct strbuf *payload, struct strbuf *signature,
@@ -1599,70 +1598,157 @@ N_("Warning: commit message did not conform to UTF-8.\n"
    "You may want to amend it after fixing the message, or set the config\n"
    "variable i18n.commitEncoding to the encoding your project uses.\n");
 
-int commit_tree_extended(const char *msg, size_t msg_len,
-			 const struct object_id *tree,
-			 struct commit_list *parents, struct object_id *ret,
-			 const char *author, const char *committer,
-			 const char *sign_commit,
-			 struct commit_extra_header *extra)
+static void write_commit_tree(struct strbuf *buffer, const char *msg, size_t msg_len,
+			      const struct object_id *tree,
+			      const struct object_id *parents, size_t parents_len,
+			      const char *author, const char *committer,
+			      struct commit_extra_header *extra)
 {
-	int result;
 	int encoding_is_utf8;
-	struct strbuf buffer;
-
-	assert_oid_type(tree, OBJ_TREE);
-
-	if (memchr(msg, '\0', msg_len))
-		return error("a NUL byte in commit log message not allowed.");
+	size_t i;
 
 	/* Not having i18n.commitencoding is the same as having utf-8 */
 	encoding_is_utf8 = is_encoding_utf8(git_commit_encoding);
 
-	strbuf_init(&buffer, 8192); /* should avoid reallocs for the headers */
-	strbuf_addf(&buffer, "tree %s\n", oid_to_hex(tree));
+	strbuf_init(buffer, 8192); /* should avoid reallocs for the headers */
+	strbuf_addf(buffer, "tree %s\n", oid_to_hex(tree));
 
 	/*
 	 * NOTE! This ordering means that the same exact tree merged with a
 	 * different order of parents will be a _different_ changeset even
 	 * if everything else stays the same.
 	 */
-	while (parents) {
-		struct commit *parent = pop_commit(&parents);
-		strbuf_addf(&buffer, "parent %s\n",
-			    oid_to_hex(&parent->object.oid));
-	}
+	for (i = 0; i < parents_len; i++)
+		strbuf_addf(buffer, "parent %s\n", oid_to_hex(&parents[i]));
 
 	/* Person/date information */
 	if (!author)
 		author = git_author_info(IDENT_STRICT);
-	strbuf_addf(&buffer, "author %s\n", author);
+	strbuf_addf(buffer, "author %s\n", author);
 	if (!committer)
 		committer = git_committer_info(IDENT_STRICT);
-	strbuf_addf(&buffer, "committer %s\n", committer);
+	strbuf_addf(buffer, "committer %s\n", committer);
 	if (!encoding_is_utf8)
-		strbuf_addf(&buffer, "encoding %s\n", git_commit_encoding);
+		strbuf_addf(buffer, "encoding %s\n", git_commit_encoding);
 
 	while (extra) {
-		add_extra_header(&buffer, extra);
+		add_extra_header(buffer, extra);
 		extra = extra->next;
 	}
-	strbuf_addch(&buffer, '\n');
+	strbuf_addch(buffer, '\n');
 
 	/* And add the comment */
-	strbuf_add(&buffer, msg, msg_len);
+	strbuf_add(buffer, msg, msg_len);
+}
 
-	/* And check the encoding */
-	if (encoding_is_utf8 && !verify_utf8(&buffer))
-		fprintf(stderr, _(commit_utf8_warn));
+int commit_tree_extended(const char *msg, size_t msg_len,
+			 const struct object_id *tree,
+			 struct commit_list *parents, struct object_id *ret,
+			 const char *author, const char *committer,
+			 const char *sign_commit,
+			 struct commit_extra_header *extra)
+{
+	struct repository *r = the_repository;
+	int result = 0;
+	int encoding_is_utf8;
+	struct strbuf buffer, compat_buffer;
+	struct strbuf sig = STRBUF_INIT, compat_sig = STRBUF_INIT;
+	struct object_id *parent_buf = NULL;
+	struct object_id compat_oid = {};
+	size_t i, nparents;
+
+	/* Not having i18n.commitencoding is the same as having utf-8 */
+	encoding_is_utf8 = is_encoding_utf8(git_commit_encoding);
+
+	assert_oid_type(tree, OBJ_TREE);
+
+	if (memchr(msg, '\0', msg_len))
+		return error("a NUL byte in commit log message not allowed.");
+
+	nparents = commit_list_count(parents);
+	parent_buf = xcalloc(nparents, sizeof(*parent_buf));
+	for (i = 0; i < nparents; i++) {
+		struct commit *parent = pop_commit(&parents);
+		oidcpy(&parent_buf[i], &parent->object.oid);
+	}
 
-	if (sign_commit && sign_with_header(&buffer, sign_commit)) {
+	/* should avoid reallocs for the headers */
+	strbuf_init(&buffer, 8192);
+	strbuf_init(&compat_buffer, 8192);
+
+	write_commit_tree(&buffer, msg, msg_len, tree, parent_buf, nparents, author, committer, extra);
+	if (sign_commit && sign_commit_to_strbuf(&sig, &buffer, sign_commit)) {
 		result = -1;
 		goto out;
 	}
+	if (r->compat_hash_algo) {
+		struct object_id mapped_tree;
+		struct object_id *mapped_parents = xcalloc(nparents, sizeof(*mapped_parents));
+		if (repo_oid_to_algop(r, tree, r->compat_hash_algo, &mapped_tree)) {
+			result = -1;
+			free(mapped_parents);
+			goto out;
+		}
+		for (i = 0; i < nparents; i++)
+			if (repo_oid_to_algop(r, &parent_buf[i], r->compat_hash_algo, &mapped_parents[i])) {
+				result = -1;
+				free(mapped_parents);
+				goto out;
+			}
+		write_commit_tree(&compat_buffer, msg, msg_len, &mapped_tree,
+				  mapped_parents, nparents, author, committer, extra);
+
+		hash_object_file(r->compat_hash_algo, compat_buffer.buf, compat_buffer.len,
+				 OBJ_COMMIT, &compat_oid);
 
-	result = write_object_file(buffer.buf, buffer.len, OBJ_COMMIT, ret);
+		if (sign_commit && sign_commit_to_strbuf(&compat_sig, &compat_buffer, sign_commit)) {
+			result = -1;
+			goto out;
+		}
+	}
+
+	if (sign_commit) {
+		struct sig_pairs {
+			struct strbuf *sig;
+			const struct git_hash_algo *algo;
+		} bufs [2] = {
+			{ &compat_sig, r->compat_hash_algo },
+			{ &sig, r->hash_algo },
+		};
+		int i;
+
+		/*
+		 * We write algorithms in the order they were implemented in
+		 * Git to produce a stable hash when multiple algorithms are
+		 * used.
+		 */
+		if (r->compat_hash_algo && hash_algo_by_ptr(bufs[0].algo) > hash_algo_by_ptr(bufs[1].algo))
+			SWAP(bufs[0], bufs[1]);
+
+		/*
+		 * We traverse each algorithm in order, and apply the signature
+		 * to each buffer.
+		 */
+		for (i = 0; i < ARRAY_SIZE(bufs); i++) {
+			if (!bufs[i].algo)
+				continue;
+			add_commit_signature(&buffer, bufs[i].sig, bufs[i].algo);
+			if (r->compat_hash_algo)
+				add_commit_signature(&compat_buffer, bufs[i].sig, bufs[i].algo);
+		}
+	}
+
+	/* And check the encoding. */
+	if (encoding_is_utf8 && (!verify_utf8(&buffer) || !verify_utf8(&compat_buffer)))
+		fprintf(stderr, _(commit_utf8_warn));
+
+	result = write_object_file_flags(buffer.buf, buffer.len, OBJ_COMMIT,
+					 ret, &compat_oid, 0);
 out:
 	strbuf_release(&buffer);
+	strbuf_release(&compat_buffer);
+	strbuf_release(&sig);
+	strbuf_release(&compat_sig);
 	return result;
 }
 
-- 
2.41.0
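
The result, in a SHA-256 repository with SHA-1 compatibility enabled, is
that both encodings of the commit carry both signatures.  Roughly (header
names come from gpg_sig_headers[]; ids, identities and signature bodies
are placeholders):

	tree <tree id in this buffer's algorithm>
	parent <parent id in this buffer's algorithm>
	author A U Thor <author@example.com> <timestamp>
	committer C O Mitter <committer@example.com> <timestamp>
	gpgsig -----BEGIN PGP SIGNATURE-----
	 <signature over the SHA-1 form of the commit>
	 -----END PGP SIGNATURE-----
	gpgsig-sha256 -----BEGIN PGP SIGNATURE-----
	 <signature over the SHA-256 form of the commit>
	 -----END PGP SIGNATURE-----

	<commit message>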


^ permalink raw reply related	[flat|nested] 59+ messages in thread

* [PATCH 15/32] cache: add a function to read an OID of a specific algorithm
  2023-09-08 23:05 [RFC][PATCH 0/32] SHA256 and SHA1 interoperability Eric W. Biederman
                   ` (13 preceding siblings ...)
  2023-09-08 23:10 ` [PATCH 14/32] commit: write commits for both hashes Eric W. Biederman
@ 2023-09-08 23:10 ` Eric W. Biederman
  2023-09-08 23:10 ` [PATCH 16/32] object: Factor out parse_mode out of fast-import and tree-walk into in object.h Eric W. Biederman
                   ` (19 subsequent siblings)
  34 siblings, 0 replies; 59+ messages in thread
From: Eric W. Biederman @ 2023-09-08 23:10 UTC (permalink / raw
  To: git; +Cc: Junio C Hamano, brian m. carlson, Eric W. Biederman

From: "brian m. carlson" <sandals@crustytoothpaste.net>

Currently, we always read an object ID of the current algorithm with
oidread.  However, once we start converting objects, we'll need to
consider what happens when we want to read an object ID of a specific
algorithm, such as the compatibility algorithm.  To make this easier,
let's define oidread_algop, which specifies which algorithm we should
use for our object ID, and define oidread in terms of it.

Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net>
Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
---
 hash.h | 9 +++++++--
 1 file changed, 7 insertions(+), 2 deletions(-)

diff --git a/hash.h b/hash.h
index 615ae0691d07..e064807c1733 100644
--- a/hash.h
+++ b/hash.h
@@ -73,10 +73,15 @@ static inline void oidclr(struct object_id *oid)
 	oid->algo = hash_algo_by_ptr(the_hash_algo);
 }
 
+static inline void oidread_algop(struct object_id *oid, const unsigned char *hash, const struct git_hash_algo *algop)
+{
+	memcpy(oid->hash, hash, algop->rawsz);
+	oid->algo = hash_algo_by_ptr(algop);
+}
+
 static inline void oidread(struct object_id *oid, const unsigned char *hash)
 {
-	memcpy(oid->hash, hash, the_hash_algo->rawsz);
-	oid->algo = hash_algo_by_ptr(the_hash_algo);
+	oidread_algop(oid, hash, the_hash_algo);
 }
 
 static inline int is_empty_blob_sha1(const unsigned char *sha1)
-- 
2.41.0
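
An illustrative fragment (here `raw` is assumed to point at rawsz bytes
of a compatibility-algorithm hash, for example from a mapping entry):

	struct object_id compat_oid;

	/* interpret `raw` as a hash of the compatibility algorithm */
	oidread_algop(&compat_oid, raw, the_repository->compat_hash_algo);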


^ permalink raw reply related	[flat|nested] 59+ messages in thread

* [PATCH 16/32] object: Factor out parse_mode out of fast-import and tree-walk into in object.h
  2023-09-08 23:05 [RFC][PATCH 0/32] SHA256 and SHA1 interoperability Eric W. Biederman
                   ` (14 preceding siblings ...)
  2023-09-08 23:10 ` [PATCH 15/32] cache: add a function to read an OID of a specific algorithm Eric W. Biederman
@ 2023-09-08 23:10 ` Eric W. Biederman
  2023-09-08 23:10 ` [PATCH 17/32] object-file-convert: add a function to convert trees between algorithms Eric W. Biederman
                   ` (18 subsequent siblings)
  34 siblings, 0 replies; 59+ messages in thread
From: Eric W. Biederman @ 2023-09-08 23:10 UTC (permalink / raw
  To: git; +Cc: Junio C Hamano, brian m. carlson, Eric W. Biederman

builtin/fast-import.c and tree-walk.c have almost identical versions of
get_mode.  The two functions started out the same but have diverged
slightly.  The version in fast-import changed mode to a uint16_t to
save memory.  The version in tree-walk started erroring if no mode was
present.

As far as I can tell both of these changes are valid for both of the
callers, so adopt both changes and place the common parsing helper
in object.h.

Rename the helper from get_mode to parse_mode so it does not
conflict with another helper named get_mode in diff-no-index.c

This will be used shortly in a new helper, decode_tree_entry_raw,
which is used to compute compatibility objects as part of
the sha256 transition.

Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
---
 builtin/fast-import.c | 18 ++----------------
 object.h              | 18 ++++++++++++++++++
 tree-walk.c           | 22 +++-------------------
 3 files changed, 23 insertions(+), 35 deletions(-)

diff --git a/builtin/fast-import.c b/builtin/fast-import.c
index 4dbb10aff3da..2c645fcfbe3f 100644
--- a/builtin/fast-import.c
+++ b/builtin/fast-import.c
@@ -1235,20 +1235,6 @@ static void *gfi_unpack_entry(
 	return unpack_entry(the_repository, p, oe->idx.offset, &type, sizep);
 }
 
-static const char *get_mode(const char *str, uint16_t *modep)
-{
-	unsigned char c;
-	uint16_t mode = 0;
-
-	while ((c = *str++) != ' ') {
-		if (c < '0' || c > '7')
-			return NULL;
-		mode = (mode << 3) + (c - '0');
-	}
-	*modep = mode;
-	return str;
-}
-
 static void load_tree(struct tree_entry *root)
 {
 	struct object_id *oid = &root->versions[1].oid;
@@ -1286,7 +1272,7 @@ static void load_tree(struct tree_entry *root)
 		t->entries[t->entry_count++] = e;
 
 		e->tree = NULL;
-		c = get_mode(c, &e->versions[1].mode);
+		c = parse_mode(c, &e->versions[1].mode);
 		if (!c)
 			die("Corrupt mode in %s", oid_to_hex(oid));
 		e->versions[0].mode = e->versions[1].mode;
@@ -2275,7 +2261,7 @@ static void file_change_m(const char *p, struct branch *b)
 	struct object_id oid;
 	uint16_t mode, inline_data = 0;
 
-	p = get_mode(p, &mode);
+	p = parse_mode(p, &mode);
 	if (!p)
 		die("Corrupt mode: %s", command_buf.buf);
 	switch (mode) {
diff --git a/object.h b/object.h
index 114d45954d08..70c8d4ae63dc 100644
--- a/object.h
+++ b/object.h
@@ -190,6 +190,24 @@ void *create_object(struct repository *r, const struct object_id *oid, void *obj
 
 void *object_as_type(struct object *obj, enum object_type type, int quiet);
 
+
+static inline const char *parse_mode(const char *str, uint16_t *modep)
+{
+	unsigned char c;
+	unsigned int mode = 0;
+
+	if (*str == ' ')
+		return NULL;
+
+	while ((c = *str++) != ' ') {
+		if (c < '0' || c > '7')
+			return NULL;
+		mode = (mode << 3) + (c - '0');
+	}
+	*modep = mode;
+	return str;
+}
+
 /*
  * Returns the object, having parsed it to find out what it is.
  *
diff --git a/tree-walk.c b/tree-walk.c
index 29ead71be173..3af50a01c2c7 100644
--- a/tree-walk.c
+++ b/tree-walk.c
@@ -10,27 +10,11 @@
 #include "pathspec.h"
 #include "json-writer.h"
 
-static const char *get_mode(const char *str, unsigned int *modep)
-{
-	unsigned char c;
-	unsigned int mode = 0;
-
-	if (*str == ' ')
-		return NULL;
-
-	while ((c = *str++) != ' ') {
-		if (c < '0' || c > '7')
-			return NULL;
-		mode = (mode << 3) + (c - '0');
-	}
-	*modep = mode;
-	return str;
-}
-
 static int decode_tree_entry(struct tree_desc *desc, const char *buf, unsigned long size, struct strbuf *err)
 {
 	const char *path;
-	unsigned int mode, len;
+	unsigned int len;
+	uint16_t mode;
 	const unsigned hashsz = the_hash_algo->rawsz;
 
 	if (size < hashsz + 3 || buf[size - (hashsz + 1)]) {
@@ -38,7 +22,7 @@ static int decode_tree_entry(struct tree_desc *desc, const char *buf, unsigned l
 		return -1;
 	}
 
-	path = get_mode(buf, &mode);
+	path = parse_mode(buf, &mode);
 	if (!path) {
 		strbuf_addstr(err, _("malformed mode in tree entry"));
 		return -1;
-- 
2.41.0
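
Since the helper now lives in a header it is easy to exercise on its
own.  A throwaway, standalone demonstration (the helper body is copied
verbatim just so the example compiles outside the git tree):

	#include <stdio.h>
	#include <stdint.h>

	static const char *parse_mode(const char *str, uint16_t *modep)
	{
		unsigned char c;
		unsigned int mode = 0;

		if (*str == ' ')
			return NULL;

		while ((c = *str++) != ' ') {
			if (c < '0' || c > '7')
				return NULL;
			mode = (mode << 3) + (c - '0');
		}
		*modep = mode;
		return str;
	}

	int main(void)
	{
		/* A tree entry starts "<octal mode> <path>\0<raw oid>". */
		uint16_t mode;
		const char *rest = parse_mode("100644 README", &mode);

		if (rest)
			printf("mode %o, rest \"%s\"\n",
			       (unsigned int)mode, rest);
		return 0;	/* prints: mode 100644, rest "README" */
	}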


^ permalink raw reply related	[flat|nested] 59+ messages in thread

* [PATCH 17/32] object-file-convert: add a function to convert trees between algorithms
  2023-09-08 23:05 [RFC][PATCH 0/32] SHA256 and SHA1 interoperability Eric W. Biederman
                   ` (15 preceding siblings ...)
  2023-09-08 23:10 ` [PATCH 16/32] object: Factor out parse_mode out of fast-import and tree-walk into in object.h Eric W. Biederman
@ 2023-09-08 23:10 ` Eric W. Biederman
  2023-09-08 23:10 ` [PATCH 18/32] object-file-convert: convert commit objects when writing Eric W. Biederman
                   ` (17 subsequent siblings)
  34 siblings, 0 replies; 59+ messages in thread
From: Eric W. Biederman @ 2023-09-08 23:10 UTC (permalink / raw
  To: git; +Cc: Junio C Hamano, brian m. carlson, Eric W . Biederman

From: "brian m. carlson" <sandals@crustytoothpaste.net>

In the future, we're going to want to provide SHA-256 repositories that
have compatibility support for SHA-1 as well.  In order to do so, we'll
need to be able to convert tree objects from SHA-256 to SHA-1 by writing
a tree with each SHA-256 object ID mapped to a SHA-1 object ID.

We implement a function, convert_tree_object, that takes an existing
tree buffer and writes it to a new strbuf, converting between
algorithms.  Let's make this function generic, because while we only
need it to convert from the main algorithm to the compatibility
algorithm now, we may need to do the other way around in the future,
such as for transport.

We avoid reusing the code in decode_tree_entry because that code
normalizes data, and we don't want that here.  We want to produce a
complete round trip of data, so if, for example, the old entry had a
wrongly zero-padded mode, we'd want to preserve that when converting to
ensure a stable hash value.

****
- Removed the repository parameter to convert_tree_object
- Removed setting from and to defaults in convert_tree_object
- Replaced repo_map_object with oid_to_algop
- Replaced get_mode with parse_mode
- Made convert_tree_object static.
- Called convert_tree_object from convert_object_file.

-- EWB

Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net>
Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
---
 object-file-convert.c | 51 ++++++++++++++++++++++++++++++++++++++++++-
 1 file changed, 50 insertions(+), 1 deletion(-)

diff --git a/object-file-convert.c b/object-file-convert.c
index e7c62434016d..f266c8c6cc95 100644
--- a/object-file-convert.c
+++ b/object-file-convert.c
@@ -1,8 +1,10 @@
 #include "git-compat-util.h"
 #include "gettext.h"
 #include "strbuf.h"
+#include "hex.h"
 #include "repository.h"
 #include "hash-ll.h"
+#include "hash.h"
 #include "object.h"
 #include "loose.h"
 #include "object-file-convert.h"
@@ -36,6 +38,51 @@ int repo_oid_to_algop(struct repository *repo, const struct object_id *src,
 	return 0;
 }
 
+static int decode_tree_entry_raw(struct object_id *oid, const char **path,
+				 size_t *len, const struct git_hash_algo *algo,
+				 const char *buf, unsigned long size)
+{
+	uint16_t mode;
+	const unsigned hashsz = algo->rawsz;
+
+	if (size < hashsz + 3 || buf[size - (hashsz + 1)]) {
+		return -1;
+	}
+
+	*path = parse_mode(buf, &mode);
+	if (!*path || !**path)
+		return -1;
+	*len = strlen(*path) + 1;
+
+	oidread_algop(oid, (const unsigned char *)*path + *len, algo);
+	return 0;
+}
+
+static int convert_tree_object(struct strbuf *out,
+			       const struct git_hash_algo *from,
+			       const struct git_hash_algo *to,
+			       const char *buffer, size_t size)
+{
+	const char *p = buffer, *end = buffer + size;
+
+	while (p < end) {
+		struct object_id entry_oid, mapped_oid;
+		const char *path = NULL;
+		size_t pathlen;
+
+		if (decode_tree_entry_raw(&entry_oid, &path, &pathlen, from, p,
+					  end - p))
+			return error(_("failed to decode tree entry"));
+		if (repo_oid_to_algop(the_repository, &entry_oid, to, &mapped_oid))
+			return error(_("failed to map tree entry for %s"), oid_to_hex(&entry_oid));
+		strbuf_add(out, p, path - p);
+		strbuf_add(out, path, pathlen);
+		strbuf_add(out, mapped_oid.hash, to->rawsz);
+		p = path + pathlen + from->rawsz;
+	}
+	return 0;
+}
+
 int convert_object_file(struct strbuf *outbuf,
 			const struct git_hash_algo *from,
 			const struct git_hash_algo *to,
@@ -50,8 +97,10 @@ int convert_object_file(struct strbuf *outbuf,
 		die("Refusing noop object file conversion");
 
 	switch (type) {
-	case OBJ_COMMIT:
 	case OBJ_TREE:
+		ret = convert_tree_object(outbuf, from, to, buf, len);
+		break;
+	case OBJ_COMMIT:
 	case OBJ_TAG:
 	default:
 		/* Not implemented yet, so fail. */
-- 
2.41.0
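
For reference, the raw entry layout that decode_tree_entry_raw() walks,
and that the conversion copies through byte for byte apart from the
trailing hash, is:

	<octal mode> SP <path> NUL <rawsz bytes of binary object id>

Only the trailing binary id is replaced (mapped with
repo_oid_to_algop()); the mode and path bytes are untouched, which is
what keeps oddities such as a zero-padded mode stable across the round
trip.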


^ permalink raw reply related	[flat|nested] 59+ messages in thread

* [PATCH 18/32] object-file-convert: convert commit objects when writing
  2023-09-08 23:05 [RFC][PATCH 0/32] SHA256 and SHA1 interoperability Eric W. Biederman
                   ` (16 preceding siblings ...)
  2023-09-08 23:10 ` [PATCH 17/32] object-file-convert: add a function to convert trees between algorithms Eric W. Biederman
@ 2023-09-08 23:10 ` Eric W. Biederman
  2023-09-08 23:10 ` [PATCH 19/32] object-file-convert: convert tag commits " Eric W. Biederman
                   ` (16 subsequent siblings)
  34 siblings, 0 replies; 59+ messages in thread
From: Eric W. Biederman @ 2023-09-08 23:10 UTC (permalink / raw
  To: git; +Cc: Junio C Hamano, brian m. carlson, Eric W . Biederman

From: "brian m. carlson" <sandals@crustytoothpaste.net>

When writing a commit object in a repository with both SHA-1 and
SHA-256, we'll need to convert our commit objects so that we can write
the hash values for both into the repository.  To do so, let's add a
function to convert commit objects.

Read the commit object and map the tree value and any of the parent
values, and copy the rest of the commit through unmodified.  Note that
we don't need to modify the signature headers, because they are the same
under both algorithms.

****
- made static and moved to object-file-convert.c
- Renamed the variable compat_oid to mapped_oid for clarity
- Replaced repo_map_object with oid_to_algop
-- EWB

Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net>
Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
---
 object-file-convert.c | 44 +++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 44 insertions(+)

diff --git a/object-file-convert.c b/object-file-convert.c
index f266c8c6cc95..9c715a9864d5 100644
--- a/object-file-convert.c
+++ b/object-file-convert.c
@@ -83,6 +83,48 @@ static int convert_tree_object(struct strbuf *out,
 	return 0;
 }
 
+static int convert_commit_object(struct strbuf *out,
+				 const struct git_hash_algo *from,
+				 const struct git_hash_algo *to,
+				 const char *buffer, size_t size)
+{
+	const char *tail = buffer;
+	const char *bufptr = buffer;
+	const int tree_entry_len = from->hexsz + 5;
+	const int parent_entry_len = from->hexsz + 7;
+	struct object_id oid, mapped_oid;
+	const char *p;
+
+	tail += size;
+	if (tail <= bufptr + tree_entry_len + 1 || memcmp(bufptr, "tree ", 5) ||
+			bufptr[tree_entry_len] != '\n')
+		return error("bogus commit object");
+	if (parse_oid_hex_algop(bufptr + 5, &oid, &p, from) < 0)
+		return error("bad tree pointer");
+
+	if (repo_oid_to_algop(the_repository, &oid, to, &mapped_oid))
+		return error("unable to map tree %s in commit object",
+			     oid_to_hex(&oid));
+	strbuf_addf(out, "tree %s\n", oid_to_hex(&mapped_oid));
+	bufptr = p + 1;
+
+	while (bufptr + parent_entry_len < tail && !memcmp(bufptr, "parent ", 7)) {
+		if (tail <= bufptr + parent_entry_len + 1 ||
+		    parse_oid_hex_algop(bufptr + 7, &oid, &p, from) ||
+		    *p != '\n')
+			return error("bad parents in commit");
+
+		if (repo_oid_to_algop(the_repository, &oid, to, &mapped_oid))
+			return error("unable to map parent %s in commit object",
+				     oid_to_hex(&oid));
+
+		strbuf_addf(out, "parent %s\n", oid_to_hex(&mapped_oid));
+		bufptr = p + 1;
+	}
+	strbuf_add(out, bufptr, tail - bufptr);
+	return 0;
+}
+
 int convert_object_file(struct strbuf *outbuf,
 			const struct git_hash_algo *from,
 			const struct git_hash_algo *to,
@@ -101,6 +143,8 @@ int convert_object_file(struct strbuf *outbuf,
 		ret = convert_tree_object(outbuf, from, to, buf, len);
 		break;
 	case OBJ_COMMIT:
+		ret = convert_commit_object(outbuf, from, to, buf, len);
+		break;
 	case OBJ_TAG:
 	default:
 		/* Not implemented yet, so fail. */
-- 
2.41.0
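
Schematically, converting a commit between algorithms rewrites only the
object references in the header (ids are placeholders):

	tree <storage-algorithm tree id>     ->  tree <mapped tree id>
	parent <storage-algorithm parent id> ->  parent <mapped parent id>
	author / committer / encoding lines  ->  copied unchanged
	gpgsig / gpgsig-sha256 headers       ->  copied unchanged
	commit message                       ->  copied unchanged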


^ permalink raw reply related	[flat|nested] 59+ messages in thread

* [PATCH 19/32] object-file-convert: convert tag commits when writing
  2023-09-08 23:05 [RFC][PATCH 0/32] SHA256 and SHA1 interoperability Eric W. Biederman
                   ` (17 preceding siblings ...)
  2023-09-08 23:10 ` [PATCH 18/32] object-file-convert: convert commit objects when writing Eric W. Biederman
@ 2023-09-08 23:10 ` Eric W. Biederman
  2023-09-08 23:10 ` [PATCH 20/32] builtin/cat-file: Let the oid determine the output algorithm Eric W. Biederman
                   ` (15 subsequent siblings)
  34 siblings, 0 replies; 59+ messages in thread
From: Eric W. Biederman @ 2023-09-08 23:10 UTC (permalink / raw
  To: git; +Cc: Junio C Hamano, brian m. carlson, Eric W . Biederman

From: "brian m. carlson" <sandals@crustytoothpaste.net>

When writing a tag object in a repository with both SHA-1 and SHA-256,
we'll need to convert our tag objects so that we can write the hash
values for both into the repository.  To do so, let's add a function to
convert tag objects.

Note that signatures for tag objects in the current algorithm trail the
message, and those for the alternate algorithm are in headers.
Therefore, we parse the tag object for both a trailing signature and a
header and then, when writing the other format, swap the two around.

We expose the add_commit_signature function, which we rename now that it
is useful for tags as well, and use it to add the header.

****
- Moved convert_tag_object into object-file-convert.c and
  made it static
- Adjusted how convert_object_file calls convert_tag_object
--EWB

Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net>
Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
---
 commit.c              |  6 +++---
 commit.h              |  1 +
 object-file-convert.c | 50 +++++++++++++++++++++++++++++++++++++++++++
 3 files changed, 54 insertions(+), 3 deletions(-)

diff --git a/commit.c b/commit.c
index 522ebb4b3002..54f19ed0328c 100644
--- a/commit.c
+++ b/commit.c
@@ -1101,7 +1101,7 @@ static const char *gpg_sig_headers[] = {
 	"gpgsig-sha256",
 };
 
-static int add_commit_signature(struct strbuf *buf, struct strbuf *sig, const struct git_hash_algo *algo)
+int add_header_signature(struct strbuf *buf, struct strbuf *sig, const struct git_hash_algo *algo)
 {
 	int inspos, copypos;
 	const char *eoh;
@@ -1732,9 +1732,9 @@ int commit_tree_extended(const char *msg, size_t msg_len,
 		for (i = 0; i < ARRAY_SIZE(bufs); i++) {
 			if (!bufs[i].algo)
 				continue;
-			add_commit_signature(&buffer, bufs[i].sig, bufs[i].algo);
+			add_header_signature(&buffer, bufs[i].sig, bufs[i].algo);
 			if (r->compat_hash_algo)
-				add_commit_signature(&compat_buffer, bufs[i].sig, bufs[i].algo);
+				add_header_signature(&compat_buffer, bufs[i].sig, bufs[i].algo);
 		}
 	}
 
diff --git a/commit.h b/commit.h
index 28928833c544..03edcec0129f 100644
--- a/commit.h
+++ b/commit.h
@@ -370,5 +370,6 @@ int parse_buffer_signed_by_header(const char *buffer,
 				  struct strbuf *payload,
 				  struct strbuf *signature,
 				  const struct git_hash_algo *algop);
+int add_header_signature(struct strbuf *buf, struct strbuf *sig, const struct git_hash_algo *algo);
 
 #endif /* COMMIT_H */
diff --git a/object-file-convert.c b/object-file-convert.c
index 9c715a9864d5..d381d3d2ea65 100644
--- a/object-file-convert.c
+++ b/object-file-convert.c
@@ -7,6 +7,8 @@
 #include "hash.h"
 #include "object.h"
 #include "loose.h"
+#include "commit.h"
+#include "gpg-interface.h"
 #include "object-file-convert.h"
 
 int repo_oid_to_algop(struct repository *repo, const struct object_id *src,
@@ -125,6 +127,52 @@ static int convert_commit_object(struct strbuf *out,
 	return 0;
 }
 
+static int convert_tag_object(struct strbuf *out,
+			      const struct git_hash_algo *from,
+			      const struct git_hash_algo *to,
+			      const char *buffer, size_t size)
+{
+	struct strbuf payload = STRBUF_INIT, temp = STRBUF_INIT, oursig = STRBUF_INIT, othersig = STRBUF_INIT;
+	size_t payload_size;
+	struct object_id oid, mapped_oid;
+	const char *p;
+
+	/* Add some slop for longer signature header in the new algorithm. */
+	strbuf_grow(out, size + 7);
+
+	/* Is there a signature for our algorithm? */
+	payload_size = parse_signed_buffer(buffer, size);
+	strbuf_add(&payload, buffer, payload_size);
+	if (payload_size != size) {
+		/* Yes, there is. */
+		strbuf_add(&oursig, buffer + payload_size, size - payload_size);
+	}
+	/* Now, is there a signature for the other algorithm? */
+	if (parse_buffer_signed_by_header(payload.buf, payload.len, &temp, &othersig, to)) {
+		/* Yes, there is. */
+		strbuf_swap(&payload, &temp);
+		strbuf_release(&temp);
+	}
+
+	/*
+	 * Our payload is now in payload and we may have up to two signatures
+	 * in oursig and othersig.
+	 */
+	if (strncmp(payload.buf, "object ", 7) || payload.buf[from->hexsz + 7] != '\n')
+		return error("bogus tag object");
+	if (parse_oid_hex_algop(payload.buf + 7, &oid, &p, from) < 0)
+		return error("bad tag object ID");
+	if (repo_oid_to_algop(the_repository, &oid, to, &mapped_oid))
+		return error("unable to map tree %s in tag object",
+			     oid_to_hex(&oid));
+	strbuf_addf(out, "object %s\n", oid_to_hex(&mapped_oid));
+	strbuf_add(out, p, payload.len - (p - payload.buf));
+	strbuf_addbuf(out, &othersig);
+	if (oursig.len)
+		add_header_signature(out, &oursig, from);
+	return 0;
+}
+
 int convert_object_file(struct strbuf *outbuf,
 			const struct git_hash_algo *from,
 			const struct git_hash_algo *to,
@@ -146,6 +194,8 @@ int convert_object_file(struct strbuf *outbuf,
 		ret = convert_commit_object(outbuf, from, to, buf, len);
 		break;
 	case OBJ_TAG:
+		ret = convert_tag_object(outbuf, from, to, buf, len);
+		break;
 	default:
 		/* Not implemented yet, so fail. */
 		ret = -1;
-- 
2.41.0


^ permalink raw reply related	[flat|nested] 59+ messages in thread

* [PATCH 20/32] builtin/cat-file:  Let the oid determine the output algorithm
  2023-09-08 23:05 [RFC][PATCH 0/32] SHA256 and SHA1 interoperability Eric W. Biederman
                   ` (18 preceding siblings ...)
  2023-09-08 23:10 ` [PATCH 19/32] object-file-convert: convert tag commits " Eric W. Biederman
@ 2023-09-08 23:10 ` Eric W. Biederman
  2023-09-08 23:10 ` [PATCH 21/32] tree-walk: init_tree_desc take an oid to get the hash algorithm Eric W. Biederman
                   ` (14 subsequent siblings)
  34 siblings, 0 replies; 59+ messages in thread
From: Eric W. Biederman @ 2023-09-08 23:10 UTC (permalink / raw
  To: git; +Cc: Junio C Hamano, brian m. carlson, Eric W. Biederman

Use GET_OID_UNTRANSLATED when calling get_oid_with_context.  This
implements the semi-obvious behaviour that specifying a sha1 oid shows
the output for a sha1 encoded object, and specifying a sha256 oid
shows the output for a sha256 encoded object.

This is useful for testing the conversion of an object to an
equivalent object encoded with a different hash function.
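
For example (placeholder object names; assumes a SHA-256 repository
with the SHA-1 compatibility mapping available), both of these show the
same logical object, each rendered with the object names of the hash
function implied by the oid given on the command line:

  $ git cat-file -p <sha256-oid-of-commit>    # sha256 rendering
  $ git cat-file -p <sha1-oid-of-commit>      # sha1 rendering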

Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
---
 builtin/cat-file.c | 8 ++++++--
 1 file changed, 6 insertions(+), 2 deletions(-)

diff --git a/builtin/cat-file.c b/builtin/cat-file.c
index 694c8538df2f..7c9600292376 100644
--- a/builtin/cat-file.c
+++ b/builtin/cat-file.c
@@ -107,7 +107,10 @@ static int cat_one_file(int opt, const char *exp_type, const char *obj_name,
 	struct object_info oi = OBJECT_INFO_INIT;
 	struct strbuf sb = STRBUF_INIT;
 	unsigned flags = OBJECT_INFO_LOOKUP_REPLACE;
-	unsigned get_oid_flags = GET_OID_RECORD_PATH | GET_OID_ONLY_TO_DIE;
+	unsigned get_oid_flags =
+		GET_OID_RECORD_PATH |
+		GET_OID_ONLY_TO_DIE |
+		GET_OID_UNTRANSLATED;
 	const char *path = force_path;
 	const int opt_cw = (opt == 'c' || opt == 'w');
 	if (!path && opt_cw)
@@ -223,7 +226,8 @@ static int cat_one_file(int opt, const char *exp_type, const char *obj_name,
 								     &size);
 				const char *target;
 				if (!skip_prefix(buffer, "object ", &target) ||
-				    get_oid_hex(target, &blob_oid))
+				    get_oid_hex_algop(target, &blob_oid,
+						      &hash_algos[oid.algo]))
 					die("%s not a valid tag", oid_to_hex(&oid));
 				free(buffer);
 			} else
-- 
2.41.0


^ permalink raw reply related	[flat|nested] 59+ messages in thread

* [PATCH 21/32] tree-walk: init_tree_desc take an oid to get the hash algorithm
  2023-09-08 23:05 [RFC][PATCH 0/32] SHA256 and SHA1 interoperability Eric W. Biederman
                   ` (19 preceding siblings ...)
  2023-09-08 23:10 ` [PATCH 20/32] builtin/cat-file: Let the oid determine the output algorithm Eric W. Biederman
@ 2023-09-08 23:10 ` Eric W. Biederman
  2023-09-08 23:10 ` [PATCH 22/32] object-file: Handle compat objects in check_object_signature Eric W. Biederman
                   ` (13 subsequent siblings)
  34 siblings, 0 replies; 59+ messages in thread
From: Eric W. Biederman @ 2023-09-08 23:10 UTC (permalink / raw
  To: git; +Cc: Junio C Hamano, brian m. carlson, Eric W. Biederman

To make it possible for git ls-tree to display the tree encoded
in the hash algorithm of the oid specified to git ls-tree, update
init_tree_desc to take as a parameter the oid of the tree object.

Update all callers of init_tree_desc and init_tree_desc_gently
to pass the oid of the tree object.

Use the oid of the tree object to discover the hash algorithm
of the oid and store that hash algorithm in struct tree_desc.

Use the hash algorithm in decode_tree_entry and
update_tree_entry_internal to handle reading a tree object encoded in
a hash algorithm that differs from the repository's hash algorithm.
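
A minimal sketch of the new calling convention (illustrative only, not
part of this patch; assumes the usual headers inside git's own tree and
that "oid" names a tree object, possibly in the compatibility hash):

	enum object_type type;
	unsigned long size;
	struct tree_desc desc;
	struct name_entry entry;
	/* buf may be a sha1- or a sha256-encoded tree */
	void *buf = repo_read_object_file(the_repository, oid, &type, &size);

	/* the walk now takes its hash algorithm from oid->algo */
	init_tree_desc(&desc, oid, buf, size);
	while (tree_entry(&desc, &entry))
		printf("%06o %s\t%s\n", entry.mode,
		       oid_to_hex(&entry.oid), entry.path);
	free(buf);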

Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
---
 archive.c              |  3 ++-
 builtin/am.c           |  6 +++---
 builtin/checkout.c     |  8 +++++---
 builtin/clone.c        |  2 +-
 builtin/commit.c       |  2 +-
 builtin/grep.c         |  8 ++++----
 builtin/merge.c        |  3 ++-
 builtin/pack-objects.c |  6 ++++--
 builtin/read-tree.c    |  2 +-
 builtin/stash.c        |  5 +++--
 cache-tree.c           |  2 +-
 delta-islands.c        |  2 +-
 diff-lib.c             |  2 +-
 fsck.c                 |  6 ++++--
 http-push.c            |  2 +-
 list-objects.c         |  2 +-
 match-trees.c          |  4 ++--
 merge-ort.c            | 11 ++++++-----
 merge-recursive.c      |  2 +-
 merge.c                |  3 ++-
 pack-bitmap-write.c    |  2 +-
 packfile.c             |  3 ++-
 reflog.c               |  2 +-
 revision.c             |  4 ++--
 tree-walk.c            | 36 +++++++++++++++++++++---------------
 tree-walk.h            |  7 +++++--
 tree.c                 |  2 +-
 walker.c               |  2 +-
 28 files changed, 80 insertions(+), 59 deletions(-)

diff --git a/archive.c b/archive.c
index ca11db185b15..b10269aee7be 100644
--- a/archive.c
+++ b/archive.c
@@ -339,7 +339,8 @@ int write_archive_entries(struct archiver_args *args,
 		opts.src_index = args->repo->index;
 		opts.dst_index = args->repo->index;
 		opts.fn = oneway_merge;
-		init_tree_desc(&t, args->tree->buffer, args->tree->size);
+		init_tree_desc(&t, &args->tree->object.oid,
+			       args->tree->buffer, args->tree->size);
 		if (unpack_trees(1, &t, &opts))
 			return -1;
 		git_attr_set_direction(GIT_ATTR_INDEX);
diff --git a/builtin/am.c b/builtin/am.c
index 8bde034fae68..4dfd714b910e 100644
--- a/builtin/am.c
+++ b/builtin/am.c
@@ -1991,8 +1991,8 @@ static int fast_forward_to(struct tree *head, struct tree *remote, int reset)
 	opts.reset = reset ? UNPACK_RESET_PROTECT_UNTRACKED : 0;
 	opts.preserve_ignored = 0; /* FIXME: !overwrite_ignore */
 	opts.fn = twoway_merge;
-	init_tree_desc(&t[0], head->buffer, head->size);
-	init_tree_desc(&t[1], remote->buffer, remote->size);
+	init_tree_desc(&t[0], &head->object.oid, head->buffer, head->size);
+	init_tree_desc(&t[1], &remote->object.oid, remote->buffer, remote->size);
 
 	if (unpack_trees(2, t, &opts)) {
 		rollback_lock_file(&lock_file);
@@ -2026,7 +2026,7 @@ static int merge_tree(struct tree *tree)
 	opts.dst_index = &the_index;
 	opts.merge = 1;
 	opts.fn = oneway_merge;
-	init_tree_desc(&t[0], tree->buffer, tree->size);
+	init_tree_desc(&t[0], &tree->object.oid, tree->buffer, tree->size);
 
 	if (unpack_trees(1, t, &opts)) {
 		rollback_lock_file(&lock_file);
diff --git a/builtin/checkout.c b/builtin/checkout.c
index f53612f46870..03eff73fd031 100644
--- a/builtin/checkout.c
+++ b/builtin/checkout.c
@@ -701,7 +701,7 @@ static int reset_tree(struct tree *tree, const struct checkout_opts *o,
 			       info->commit ? &info->commit->object.oid : null_oid(),
 			       NULL);
 	parse_tree(tree);
-	init_tree_desc(&tree_desc, tree->buffer, tree->size);
+	init_tree_desc(&tree_desc, &tree->object.oid, tree->buffer, tree->size);
 	switch (unpack_trees(1, &tree_desc, &opts)) {
 	case -2:
 		*writeout_error = 1;
@@ -815,10 +815,12 @@ static int merge_working_tree(const struct checkout_opts *opts,
 			die(_("unable to parse commit %s"),
 				oid_to_hex(old_commit_oid));
 
-		init_tree_desc(&trees[0], tree->buffer, tree->size);
+		init_tree_desc(&trees[0], &tree->object.oid,
+			       tree->buffer, tree->size);
 		parse_tree(new_tree);
 		tree = new_tree;
-		init_tree_desc(&trees[1], tree->buffer, tree->size);
+		init_tree_desc(&trees[1], &tree->object.oid,
+			       tree->buffer, tree->size);
 
 		ret = unpack_trees(2, trees, &topts);
 		clear_unpack_trees_porcelain(&topts);
diff --git a/builtin/clone.c b/builtin/clone.c
index c6357af94989..79ceefb93995 100644
--- a/builtin/clone.c
+++ b/builtin/clone.c
@@ -737,7 +737,7 @@ static int checkout(int submodule_progress, int filter_submodules)
 	if (!tree)
 		die(_("unable to parse commit %s"), oid_to_hex(&oid));
 	parse_tree(tree);
-	init_tree_desc(&t, tree->buffer, tree->size);
+	init_tree_desc(&t, &tree->object.oid, tree->buffer, tree->size);
 	if (unpack_trees(1, &t, &opts) < 0)
 		die(_("unable to checkout working tree"));
 
diff --git a/builtin/commit.c b/builtin/commit.c
index 7da5f924484d..537319932b65 100644
--- a/builtin/commit.c
+++ b/builtin/commit.c
@@ -340,7 +340,7 @@ static void create_base_index(const struct commit *current_head)
 	if (!tree)
 		die(_("failed to unpack HEAD tree object"));
 	parse_tree(tree);
-	init_tree_desc(&t, tree->buffer, tree->size);
+	init_tree_desc(&t, &tree->object.oid, tree->buffer, tree->size);
 	if (unpack_trees(1, &t, &opts))
 		exit(128); /* We've already reported the error, finish dying */
 }
diff --git a/builtin/grep.c b/builtin/grep.c
index 50e712a18479..0c2b8a376f8e 100644
--- a/builtin/grep.c
+++ b/builtin/grep.c
@@ -530,7 +530,7 @@ static int grep_submodule(struct grep_opt *opt,
 		strbuf_addstr(&base, filename);
 		strbuf_addch(&base, '/');
 
-		init_tree_desc(&tree, data, size);
+		init_tree_desc(&tree, oid, data, size);
 		hit = grep_tree(&subopt, pathspec, &tree, &base, base.len,
 				object_type == OBJ_COMMIT);
 		strbuf_release(&base);
@@ -574,7 +574,7 @@ static int grep_cache(struct grep_opt *opt,
 
 			data = repo_read_object_file(the_repository, &ce->oid,
 						     &type, &size);
-			init_tree_desc(&tree, data, size);
+			init_tree_desc(&tree, &ce->oid, data, size);
 
 			hit |= grep_tree(opt, pathspec, &tree, &name, 0, 0);
 			strbuf_setlen(&name, name_base_len);
@@ -670,7 +670,7 @@ static int grep_tree(struct grep_opt *opt, const struct pathspec *pathspec,
 				    oid_to_hex(&entry.oid));
 
 			strbuf_addch(base, '/');
-			init_tree_desc(&sub, data, size);
+			init_tree_desc(&sub, &entry.oid, data, size);
 			hit |= grep_tree(opt, pathspec, &sub, base, tn_len,
 					 check_attr);
 			free(data);
@@ -714,7 +714,7 @@ static int grep_object(struct grep_opt *opt, const struct pathspec *pathspec,
 			strbuf_add(&base, name, len);
 			strbuf_addch(&base, ':');
 		}
-		init_tree_desc(&tree, data, size);
+		init_tree_desc(&tree, &obj->oid, data, size);
 		hit = grep_tree(opt, pathspec, &tree, &base, base.len,
 				obj->type == OBJ_COMMIT);
 		strbuf_release(&base);
diff --git a/builtin/merge.c b/builtin/merge.c
index de68910177fb..718165d45917 100644
--- a/builtin/merge.c
+++ b/builtin/merge.c
@@ -704,7 +704,8 @@ static int read_tree_trivial(struct object_id *common, struct object_id *head,
 	cache_tree_free(&the_index.cache_tree);
 	for (i = 0; i < nr_trees; i++) {
 		parse_tree(trees[i]);
-		init_tree_desc(t+i, trees[i]->buffer, trees[i]->size);
+		init_tree_desc(t+i, &trees[i]->object.oid,
+			       trees[i]->buffer, trees[i]->size);
 	}
 	if (unpack_trees(nr_trees, t, &opts))
 		return -1;
diff --git a/builtin/pack-objects.c b/builtin/pack-objects.c
index d2a162d52804..d34902002656 100644
--- a/builtin/pack-objects.c
+++ b/builtin/pack-objects.c
@@ -1756,7 +1756,8 @@ static void add_pbase_object(struct tree_desc *tree,
 			tree = pbase_tree_get(&entry.oid);
 			if (!tree)
 				return;
-			init_tree_desc(&sub, tree->tree_data, tree->tree_size);
+			init_tree_desc(&sub, &tree->oid,
+				       tree->tree_data, tree->tree_size);
 
 			add_pbase_object(&sub, down, downlen, fullname);
 			pbase_tree_put(tree);
@@ -1816,7 +1817,8 @@ static void add_preferred_base_object(const char *name)
 		}
 		else {
 			struct tree_desc tree;
-			init_tree_desc(&tree, it->pcache.tree_data, it->pcache.tree_size);
+			init_tree_desc(&tree, &it->pcache.oid,
+				       it->pcache.tree_data, it->pcache.tree_size);
 			add_pbase_object(&tree, name, cmplen, name);
 		}
 	}
diff --git a/builtin/read-tree.c b/builtin/read-tree.c
index 1fec702a04fa..24d6d156d3a2 100644
--- a/builtin/read-tree.c
+++ b/builtin/read-tree.c
@@ -264,7 +264,7 @@ int cmd_read_tree(int argc, const char **argv, const char *cmd_prefix)
 	for (i = 0; i < nr_trees; i++) {
 		struct tree *tree = trees[i];
 		parse_tree(tree);
-		init_tree_desc(t+i, tree->buffer, tree->size);
+		init_tree_desc(t+i, &tree->object.oid, tree->buffer, tree->size);
 	}
 	if (unpack_trees(nr_trees, t, &opts))
 		return 128;
diff --git a/builtin/stash.c b/builtin/stash.c
index fe64cde9ce30..9ee52af4d28e 100644
--- a/builtin/stash.c
+++ b/builtin/stash.c
@@ -285,7 +285,7 @@ static int reset_tree(struct object_id *i_tree, int update, int reset)
 	if (parse_tree(tree))
 		return -1;
 
-	init_tree_desc(t, tree->buffer, tree->size);
+	init_tree_desc(t, &tree->object.oid, tree->buffer, tree->size);
 
 	opts.head_idx = 1;
 	opts.src_index = &the_index;
@@ -871,7 +871,8 @@ static void diff_include_untracked(const struct stash_info *info, struct diff_op
 		tree[i] = parse_tree_indirect(oid[i]);
 		if (parse_tree(tree[i]) < 0)
 			die(_("failed to parse tree"));
-		init_tree_desc(&tree_desc[i], tree[i]->buffer, tree[i]->size);
+		init_tree_desc(&tree_desc[i], &tree[i]->object.oid,
+			       tree[i]->buffer, tree[i]->size);
 	}
 
 	unpack_tree_opt.head_idx = -1;
diff --git a/cache-tree.c b/cache-tree.c
index ddc7d3d86959..334973a01cee 100644
--- a/cache-tree.c
+++ b/cache-tree.c
@@ -770,7 +770,7 @@ static void prime_cache_tree_rec(struct repository *r,
 
 	oidcpy(&it->oid, &tree->object.oid);
 
-	init_tree_desc(&desc, tree->buffer, tree->size);
+	init_tree_desc(&desc, &tree->object.oid, tree->buffer, tree->size);
 	cnt = 0;
 	while (tree_entry(&desc, &entry)) {
 		if (!S_ISDIR(entry.mode))
diff --git a/delta-islands.c b/delta-islands.c
index 5de5759f3f13..1ff3506b10f2 100644
--- a/delta-islands.c
+++ b/delta-islands.c
@@ -289,7 +289,7 @@ void resolve_tree_islands(struct repository *r,
 		if (!tree || parse_tree(tree) < 0)
 			die(_("bad tree object %s"), oid_to_hex(&ent->idx.oid));
 
-		init_tree_desc(&desc, tree->buffer, tree->size);
+		init_tree_desc(&desc, &tree->object.oid, tree->buffer, tree->size);
 		while (tree_entry(&desc, &entry)) {
 			struct object *obj;
 
diff --git a/diff-lib.c b/diff-lib.c
index 6b0c6a7180cc..add323f5628d 100644
--- a/diff-lib.c
+++ b/diff-lib.c
@@ -558,7 +558,7 @@ static int diff_cache(struct rev_info *revs,
 	opts.pathspec = &revs->diffopt.pathspec;
 	opts.pathspec->recursive = 1;
 
-	init_tree_desc(&t, tree->buffer, tree->size);
+	init_tree_desc(&t, &tree->object.oid, tree->buffer, tree->size);
 	return unpack_trees(1, &t, &opts);
 }
 
diff --git a/fsck.c b/fsck.c
index 2b1e348005b7..6b492a48da82 100644
--- a/fsck.c
+++ b/fsck.c
@@ -313,7 +313,8 @@ static int fsck_walk_tree(struct tree *tree, void *data, struct fsck_options *op
 		return -1;
 
 	name = fsck_get_object_name(options, &tree->object.oid);
-	if (init_tree_desc_gently(&desc, tree->buffer, tree->size, 0))
+	if (init_tree_desc_gently(&desc, &tree->object.oid,
+				  tree->buffer, tree->size, 0))
 		return -1;
 	while (tree_entry_gently(&desc, &entry)) {
 		struct object *obj;
@@ -583,7 +584,8 @@ static int fsck_tree(const struct object_id *tree_oid,
 	const char *o_name;
 	struct name_stack df_dup_candidates = { NULL };
 
-	if (init_tree_desc_gently(&desc, buffer, size, TREE_DESC_RAW_MODES)) {
+	if (init_tree_desc_gently(&desc, tree_oid, buffer, size,
+				  TREE_DESC_RAW_MODES)) {
 		retval += report(options, tree_oid, OBJ_TREE,
 				 FSCK_MSG_BAD_TREE,
 				 "cannot be parsed as a tree");
diff --git a/http-push.c b/http-push.c
index a704f490fdb2..81c35b5e96f7 100644
--- a/http-push.c
+++ b/http-push.c
@@ -1308,7 +1308,7 @@ static struct object_list **process_tree(struct tree *tree,
 	obj->flags |= SEEN;
 	p = add_one_object(obj, p);
 
-	init_tree_desc(&desc, tree->buffer, tree->size);
+	init_tree_desc(&desc, &tree->object.oid, tree->buffer, tree->size);
 
 	while (tree_entry(&desc, &entry))
 		switch (object_type(entry.mode)) {
diff --git a/list-objects.c b/list-objects.c
index e60a6cd5b46e..312335c8a7f2 100644
--- a/list-objects.c
+++ b/list-objects.c
@@ -97,7 +97,7 @@ static void process_tree_contents(struct traversal_context *ctx,
 	enum interesting match = ctx->revs->diffopt.pathspec.nr == 0 ?
 		all_entries_interesting : entry_not_interesting;
 
-	init_tree_desc(&desc, tree->buffer, tree->size);
+	init_tree_desc(&desc, &tree->object.oid, tree->buffer, tree->size);
 
 	while (tree_entry(&desc, &entry)) {
 		if (match != all_entries_interesting) {
diff --git a/match-trees.c b/match-trees.c
index 0885ac681cd5..3412b6a1401d 100644
--- a/match-trees.c
+++ b/match-trees.c
@@ -63,7 +63,7 @@ static void *fill_tree_desc_strict(struct tree_desc *desc,
 		die("unable to read tree (%s)", oid_to_hex(hash));
 	if (type != OBJ_TREE)
 		die("%s is not a tree", oid_to_hex(hash));
-	init_tree_desc(desc, buffer, size);
+	init_tree_desc(desc, hash, buffer, size);
 	return buffer;
 }
 
@@ -194,7 +194,7 @@ static int splice_tree(const struct object_id *oid1, const char *prefix,
 	buf = repo_read_object_file(the_repository, oid1, &type, &sz);
 	if (!buf)
 		die("cannot read tree %s", oid_to_hex(oid1));
-	init_tree_desc(&desc, buf, sz);
+	init_tree_desc(&desc, oid1, buf, sz);
 
 	rewrite_here = NULL;
 	while (desc.size) {
diff --git a/merge-ort.c b/merge-ort.c
index 8631c997002d..3a5729c91e48 100644
--- a/merge-ort.c
+++ b/merge-ort.c
@@ -1679,9 +1679,10 @@ static int collect_merge_info(struct merge_options *opt,
 	parse_tree(merge_base);
 	parse_tree(side1);
 	parse_tree(side2);
-	init_tree_desc(t + 0, merge_base->buffer, merge_base->size);
-	init_tree_desc(t + 1, side1->buffer, side1->size);
-	init_tree_desc(t + 2, side2->buffer, side2->size);
+	init_tree_desc(t + 0, &merge_base->object.oid,
+		       merge_base->buffer, merge_base->size);
+	init_tree_desc(t + 1, &side1->object.oid, side1->buffer, side1->size);
+	init_tree_desc(t + 2, &side2->object.oid, side2->buffer, side2->size);
 
 	trace2_region_enter("merge", "traverse_trees", opt->repo);
 	ret = traverse_trees(NULL, 3, t, &info);
@@ -4400,9 +4401,9 @@ static int checkout(struct merge_options *opt,
 	unpack_opts.fn = twoway_merge;
 	unpack_opts.preserve_ignored = 0; /* FIXME: !opts->overwrite_ignore */
 	parse_tree(prev);
-	init_tree_desc(&trees[0], prev->buffer, prev->size);
+	init_tree_desc(&trees[0], &prev->object.oid, prev->buffer, prev->size);
 	parse_tree(next);
-	init_tree_desc(&trees[1], next->buffer, next->size);
+	init_tree_desc(&trees[1], &next->object.oid, next->buffer, next->size);
 
 	ret = unpack_trees(2, trees, &unpack_opts);
 	clear_unpack_trees_porcelain(&unpack_opts);
diff --git a/merge-recursive.c b/merge-recursive.c
index 6a4081bb0f52..93df9eecdd95 100644
--- a/merge-recursive.c
+++ b/merge-recursive.c
@@ -411,7 +411,7 @@ static inline int merge_detect_rename(struct merge_options *opt)
 static void init_tree_desc_from_tree(struct tree_desc *desc, struct tree *tree)
 {
 	parse_tree(tree);
-	init_tree_desc(desc, tree->buffer, tree->size);
+	init_tree_desc(desc, &tree->object.oid, tree->buffer, tree->size);
 }
 
 static int unpack_trees_start(struct merge_options *opt,
diff --git a/merge.c b/merge.c
index b60925459c29..86179c34102d 100644
--- a/merge.c
+++ b/merge.c
@@ -81,7 +81,8 @@ int checkout_fast_forward(struct repository *r,
 	}
 	for (i = 0; i < nr_trees; i++) {
 		parse_tree(trees[i]);
-		init_tree_desc(t+i, trees[i]->buffer, trees[i]->size);
+		init_tree_desc(t+i, &trees[i]->object.oid,
+			       trees[i]->buffer, trees[i]->size);
 	}
 
 	memset(&opts, 0, sizeof(opts));
diff --git a/pack-bitmap-write.c b/pack-bitmap-write.c
index f6757c3cbf20..9211e08f0127 100644
--- a/pack-bitmap-write.c
+++ b/pack-bitmap-write.c
@@ -366,7 +366,7 @@ static int fill_bitmap_tree(struct bitmap *bitmap,
 	if (parse_tree(tree) < 0)
 		die("unable to load tree object %s",
 		    oid_to_hex(&tree->object.oid));
-	init_tree_desc(&desc, tree->buffer, tree->size);
+	init_tree_desc(&desc, &tree->object.oid, tree->buffer, tree->size);
 
 	while (tree_entry(&desc, &entry)) {
 		switch (object_type(entry.mode)) {
diff --git a/packfile.c b/packfile.c
index 9cc0a2e37a83..1fae0fcdd9e7 100644
--- a/packfile.c
+++ b/packfile.c
@@ -2250,7 +2250,8 @@ static int add_promisor_object(const struct object_id *oid,
 		struct tree *tree = (struct tree *)obj;
 		struct tree_desc desc;
 		struct name_entry entry;
-		if (init_tree_desc_gently(&desc, tree->buffer, tree->size, 0))
+		if (init_tree_desc_gently(&desc, &tree->object.oid,
+					  tree->buffer, tree->size, 0))
 			/*
 			 * Error messages are given when packs are
 			 * verified, so do not print any here.
diff --git a/reflog.c b/reflog.c
index 9ad50e7d93e4..c6992a19268f 100644
--- a/reflog.c
+++ b/reflog.c
@@ -40,7 +40,7 @@ static int tree_is_complete(const struct object_id *oid)
 		tree->buffer = data;
 		tree->size = size;
 	}
-	init_tree_desc(&desc, tree->buffer, tree->size);
+	init_tree_desc(&desc, &tree->object.oid, tree->buffer, tree->size);
 	complete = 1;
 	while (tree_entry(&desc, &entry)) {
 		if (!repo_has_object_file(the_repository, &entry.oid) ||
diff --git a/revision.c b/revision.c
index 2f4c53ea207b..a60dfc23a2a5 100644
--- a/revision.c
+++ b/revision.c
@@ -82,7 +82,7 @@ static void mark_tree_contents_uninteresting(struct repository *r,
 	if (parse_tree_gently(tree, 1) < 0)
 		return;
 
-	init_tree_desc(&desc, tree->buffer, tree->size);
+	init_tree_desc(&desc, &tree->object.oid, tree->buffer, tree->size);
 	while (tree_entry(&desc, &entry)) {
 		switch (object_type(entry.mode)) {
 		case OBJ_TREE:
@@ -189,7 +189,7 @@ static void add_children_by_path(struct repository *r,
 	if (parse_tree_gently(tree, 1) < 0)
 		return;
 
-	init_tree_desc(&desc, tree->buffer, tree->size);
+	init_tree_desc(&desc, &tree->object.oid, tree->buffer, tree->size);
 	while (tree_entry(&desc, &entry)) {
 		switch (object_type(entry.mode)) {
 		case OBJ_TREE:
diff --git a/tree-walk.c b/tree-walk.c
index 3af50a01c2c7..0b44ec7c75ff 100644
--- a/tree-walk.c
+++ b/tree-walk.c
@@ -15,7 +15,7 @@ static int decode_tree_entry(struct tree_desc *desc, const char *buf, unsigned l
 	const char *path;
 	unsigned int len;
 	uint16_t mode;
-	const unsigned hashsz = the_hash_algo->rawsz;
+	const unsigned hashsz = desc->algo->rawsz;
 
 	if (size < hashsz + 3 || buf[size - (hashsz + 1)]) {
 		strbuf_addstr(err, _("too-short tree object"));
@@ -37,15 +37,19 @@ static int decode_tree_entry(struct tree_desc *desc, const char *buf, unsigned l
 	desc->entry.path = path;
 	desc->entry.mode = (desc->flags & TREE_DESC_RAW_MODES) ? mode : canon_mode(mode);
 	desc->entry.pathlen = len - 1;
-	oidread(&desc->entry.oid, (const unsigned char *)path + len);
+	oidread_algop(&desc->entry.oid, (const unsigned char *)path + len,
+		      desc->algo);
 
 	return 0;
 }
 
-static int init_tree_desc_internal(struct tree_desc *desc, const void *buffer,
-				   unsigned long size, struct strbuf *err,
+static int init_tree_desc_internal(struct tree_desc *desc,
+				   const struct object_id *oid,
+				   const void *buffer, unsigned long size,
+				   struct strbuf *err,
 				   enum tree_desc_flags flags)
 {
+	desc->algo = (oid && oid->algo) ? &hash_algos[oid->algo] : the_hash_algo;
 	desc->buffer = buffer;
 	desc->size = size;
 	desc->flags = flags;
@@ -54,19 +58,21 @@ static int init_tree_desc_internal(struct tree_desc *desc, const void *buffer,
 	return 0;
 }
 
-void init_tree_desc(struct tree_desc *desc, const void *buffer, unsigned long size)
+void init_tree_desc(struct tree_desc *desc, const struct object_id *tree_oid,
+		    const void *buffer, unsigned long size)
 {
 	struct strbuf err = STRBUF_INIT;
-	if (init_tree_desc_internal(desc, buffer, size, &err, 0))
+	if (init_tree_desc_internal(desc, tree_oid, buffer, size, &err, 0))
 		die("%s", err.buf);
 	strbuf_release(&err);
 }
 
-int init_tree_desc_gently(struct tree_desc *desc, const void *buffer, unsigned long size,
+int init_tree_desc_gently(struct tree_desc *desc, const struct object_id *oid,
+			  const void *buffer, unsigned long size,
 			  enum tree_desc_flags flags)
 {
 	struct strbuf err = STRBUF_INIT;
-	int result = init_tree_desc_internal(desc, buffer, size, &err, flags);
+	int result = init_tree_desc_internal(desc, oid, buffer, size, &err, flags);
 	if (result)
 		error("%s", err.buf);
 	strbuf_release(&err);
@@ -85,7 +91,7 @@ void *fill_tree_descriptor(struct repository *r,
 		if (!buf)
 			die("unable to read tree %s", oid_to_hex(oid));
 	}
-	init_tree_desc(desc, buf, size);
+	init_tree_desc(desc, oid, buf, size);
 	return buf;
 }
 
@@ -102,7 +108,7 @@ static void entry_extract(struct tree_desc *t, struct name_entry *a)
 static int update_tree_entry_internal(struct tree_desc *desc, struct strbuf *err)
 {
 	const void *buf = desc->buffer;
-	const unsigned char *end = (const unsigned char *)desc->entry.path + desc->entry.pathlen + 1 + the_hash_algo->rawsz;
+	const unsigned char *end = (const unsigned char *)desc->entry.path + desc->entry.pathlen + 1 + desc->algo->rawsz;
 	unsigned long size = desc->size;
 	unsigned long len = end - (const unsigned char *)buf;
 
@@ -611,7 +617,7 @@ int get_tree_entry(struct repository *r,
 		retval = -1;
 	} else {
 		struct tree_desc t;
-		init_tree_desc(&t, tree, size);
+		init_tree_desc(&t, tree_oid, tree, size);
 		retval = find_tree_entry(r, &t, name, oid, mode);
 	}
 	free(tree);
@@ -654,7 +660,7 @@ enum get_oid_result get_tree_entry_follow_symlinks(struct repository *r,
 	struct tree_desc t;
 	int follows_remaining = GET_TREE_ENTRY_FOLLOW_SYMLINKS_MAX_LINKS;
 
-	init_tree_desc(&t, NULL, 0UL);
+	init_tree_desc(&t, NULL, NULL, 0UL);
 	strbuf_addstr(&namebuf, name);
 	oidcpy(&current_tree_oid, tree_oid);
 
@@ -690,7 +696,7 @@ enum get_oid_result get_tree_entry_follow_symlinks(struct repository *r,
 				goto done;
 
 			/* descend */
-			init_tree_desc(&t, tree, size);
+			init_tree_desc(&t, &current_tree_oid, tree, size);
 		}
 
 		/* Handle symlinks to e.g. a//b by removing leading slashes */
@@ -724,7 +730,7 @@ enum get_oid_result get_tree_entry_follow_symlinks(struct repository *r,
 			free(parent->tree);
 			parents_nr--;
 			parent = &parents[parents_nr - 1];
-			init_tree_desc(&t, parent->tree, parent->size);
+			init_tree_desc(&t, &parent->oid, parent->tree, parent->size);
 			strbuf_remove(&namebuf, 0, remainder ? 3 : 2);
 			continue;
 		}
@@ -804,7 +810,7 @@ enum get_oid_result get_tree_entry_follow_symlinks(struct repository *r,
 			contents_start = contents;
 
 			parent = &parents[parents_nr - 1];
-			init_tree_desc(&t, parent->tree, parent->size);
+			init_tree_desc(&t, &parent->oid, parent->tree, parent->size);
 			strbuf_splice(&namebuf, 0, len,
 				      contents_start, link_len);
 			if (remainder)
diff --git a/tree-walk.h b/tree-walk.h
index 74cdceb3fed2..cf54d01019e9 100644
--- a/tree-walk.h
+++ b/tree-walk.h
@@ -26,6 +26,7 @@ struct name_entry {
  * A semi-opaque data structure used to maintain the current state of the walk.
  */
 struct tree_desc {
+	const struct git_hash_algo *algo;
 	/*
 	 * pointer into the memory representation of the tree. It always
 	 * points at the current entry being visited.
@@ -85,9 +86,11 @@ int update_tree_entry_gently(struct tree_desc *);
  * size parameters are assumed to be the same as the buffer and size
  * members of `struct tree`.
  */
-void init_tree_desc(struct tree_desc *desc, const void *buf, unsigned long size);
+void init_tree_desc(struct tree_desc *desc, const struct object_id *tree_oid,
+		    const void *buf, unsigned long size);
 
-int init_tree_desc_gently(struct tree_desc *desc, const void *buf, unsigned long size,
+int init_tree_desc_gently(struct tree_desc *desc, const struct object_id *oid,
+			  const void *buf, unsigned long size,
 			  enum tree_desc_flags flags);
 
 /*
diff --git a/tree.c b/tree.c
index c745462f968e..44bcf728f10a 100644
--- a/tree.c
+++ b/tree.c
@@ -27,7 +27,7 @@ int read_tree_at(struct repository *r,
 	if (parse_tree(tree))
 		return -1;
 
-	init_tree_desc(&desc, tree->buffer, tree->size);
+	init_tree_desc(&desc, &tree->object.oid, tree->buffer, tree->size);
 
 	while (tree_entry(&desc, &entry)) {
 		if (retval != all_entries_interesting) {
diff --git a/walker.c b/walker.c
index 65002a7220ad..c0fd632d921c 100644
--- a/walker.c
+++ b/walker.c
@@ -45,7 +45,7 @@ static int process_tree(struct walker *walker, struct tree *tree)
 	if (parse_tree(tree))
 		return -1;
 
-	init_tree_desc(&desc, tree->buffer, tree->size);
+	init_tree_desc(&desc, &tree->object.oid, tree->buffer, tree->size);
 	while (tree_entry(&desc, &entry)) {
 		struct object *obj = NULL;
 
-- 
2.41.0


^ permalink raw reply related	[flat|nested] 59+ messages in thread

* [PATCH 22/32] object-file: Handle compat objects in check_object_signature
  2023-09-08 23:05 [RFC][PATCH 0/32] SHA256 and SHA1 interoperability Eric W. Biederman
                   ` (20 preceding siblings ...)
  2023-09-08 23:10 ` [PATCH 21/32] tree-walk: init_tree_desc take an oid to get the hash algorithm Eric W. Biederman
@ 2023-09-08 23:10 ` Eric W. Biederman
  2023-09-08 23:10 ` [PATCH 23/32] builtin/ls-tree: Let the oid determine the output algorithm Eric W. Biederman
                   ` (12 subsequent siblings)
  34 siblings, 0 replies; 59+ messages in thread
From: Eric W. Biederman @ 2023-09-08 23:10 UTC (permalink / raw
  To: git; +Cc: Junio C Hamano, brian m. carlson, Eric W. Biederman

Update check_object_signature to find the hash algorithm the existing
signature uses, and to use the same hash algorithm when recomputing it
to check that the signature is valid.

This will be useful when teaching git ls-tree to display objects
encoded with the compat hash algorithm.

Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
---
 object-file.c | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/object-file.c b/object-file.c
index fd420dd303df..d6140ebccaf1 100644
--- a/object-file.c
+++ b/object-file.c
@@ -1094,9 +1094,11 @@ int check_object_signature(struct repository *r, const struct object_id *oid,
 			   void *buf, unsigned long size,
 			   enum object_type type)
 {
+	const struct git_hash_algo *algo =
+		oid->algo ? &hash_algos[oid->algo] : r->hash_algo;
 	struct object_id real_oid;
 
-	hash_object_file(r->hash_algo, buf, size, type, &real_oid);
+	hash_object_file(algo, buf, size, type, &real_oid);
 
 	return !oideq(oid, &real_oid) ? -1 : 0;
 }
-- 
2.41.0


^ permalink raw reply related	[flat|nested] 59+ messages in thread

* [PATCH 23/32] builtin/ls-tree: Let the oid determine the output algorithm
  2023-09-08 23:05 [RFC][PATCH 0/32] SHA256 and SHA1 interoperability Eric W. Biederman
                   ` (21 preceding siblings ...)
  2023-09-08 23:10 ` [PATCH 22/32] object-file: Handle compat objects in check_object_signature Eric W. Biederman
@ 2023-09-08 23:10 ` Eric W. Biederman
  2023-09-08 23:10 ` [PATCH 24/32] builtin/pack-objects: Communicate the compatibility hash through struct pack_idx_entry Eric W. Biederman
                   ` (11 subsequent siblings)
  34 siblings, 0 replies; 59+ messages in thread
From: Eric W. Biederman @ 2023-09-08 23:10 UTC (permalink / raw
  To: git; +Cc: Junio C Hamano, brian m. carlson, Eric W. Biederman

Update cmd_ls_tree to call get_oid_with_context and pass
GET_OID_UNTRANSLATED instead of calling the simpler repo_get_oid.

This implements in ls-tree the behavior that asking to display a sha1
hash displays the corresponding sha1 encoded object and asking to
display a sha256 hash displays the corresponding sha256 encoded
object.

This is useful for testing the conversion of an object to an
equivalent object encoded with a different hash function.
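
For instance (placeholder names; both hash functions enabled):

  $ git ls-tree <sha1-oid-of-tree>      # entries shown with sha1 names
  $ git ls-tree <sha256-oid-of-tree>    # entries shown with sha256 names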

Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
---
 builtin/ls-tree.c | 5 ++++-
 1 file changed, 4 insertions(+), 1 deletion(-)

diff --git a/builtin/ls-tree.c b/builtin/ls-tree.c
index f558db5f3b80..346e3fd812eb 100644
--- a/builtin/ls-tree.c
+++ b/builtin/ls-tree.c
@@ -376,6 +376,7 @@ int cmd_ls_tree(int argc, const char **argv, const char *prefix)
 		OPT_END()
 	};
 	struct ls_tree_cmdmode_to_fmt *m2f = ls_tree_cmdmode_format;
+	struct object_context obj_context;
 	int ret;
 
 	git_config(git_default_config, NULL);
@@ -407,7 +408,9 @@ int cmd_ls_tree(int argc, const char **argv, const char *prefix)
 			ls_tree_usage, ls_tree_options);
 	if (argc < 1)
 		usage_with_options(ls_tree_usage, ls_tree_options);
-	if (repo_get_oid(the_repository, argv[0], &oid))
+	if (get_oid_with_context(the_repository, argv[0],
+				 GET_OID_UNTRANSLATED, &oid,
+				 &obj_context))
 		die("Not a valid object name %s", argv[0]);
 
 	/*
-- 
2.41.0


^ permalink raw reply related	[flat|nested] 59+ messages in thread

* [PATCH 24/32] builtin/pack-objects:  Communicate the compatibility hash through struct pack_idx_entry
  2023-09-08 23:05 [RFC][PATCH 0/32] SHA256 and SHA1 interoperability Eric W. Biederman
                   ` (22 preceding siblings ...)
  2023-09-08 23:10 ` [PATCH 23/32] builtin/ls-tree: Let the oid determine the output algorithm Eric W. Biederman
@ 2023-09-08 23:10 ` Eric W. Biederman
  2023-09-08 23:10 ` [PATCH 25/32] pack-compat-map: Add support for .compat files of a packfile Eric W. Biederman
                   ` (10 subsequent siblings)
  34 siblings, 0 replies; 59+ messages in thread
From: Eric W. Biederman @ 2023-09-08 23:10 UTC (permalink / raw
  To: git; +Cc: Junio C Hamano, brian m. carlson, Eric W. Biederman

When pack-objects is run, all objects in the repository should already
have a compatibility hash computed, so it is just necessary to read
the existing mappings and store the value in struct pack_idx_entry.

Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
---
 builtin/pack-objects.c | 7 +++++++
 1 file changed, 7 insertions(+)

diff --git a/builtin/pack-objects.c b/builtin/pack-objects.c
index d34902002656..ff04660a18fd 100644
--- a/builtin/pack-objects.c
+++ b/builtin/pack-objects.c
@@ -42,6 +42,7 @@
 #include "promisor-remote.h"
 #include "pack-mtimes.h"
 #include "parse-options.h"
+#include "object-file-convert.h"
 
 /*
  * Objects we are going to pack are collected in the `to_pack` structure.
@@ -1547,10 +1548,16 @@ static struct object_entry *create_object_entry(const struct object_id *oid,
 						struct packed_git *found_pack,
 						off_t found_offset)
 {
+	struct repository *repo = the_repository;
+	const struct git_hash_algo *compat = repo->compat_hash_algo;
 	struct object_entry *entry;
 
 	entry = packlist_alloc(&to_pack, oid);
 	entry->hash = hash;
+	if (compat &&
+	    repo_oid_to_algop(repo, &entry->idx.oid, compat,
+			      &entry->idx.compat_oid))
+		die(_("can't map object %s while writing pack"), oid_to_hex(oid));
 	oe_set_type(entry, type);
 	if (exclude)
 		entry->preferred_base = 1;
-- 
2.41.0


^ permalink raw reply related	[flat|nested] 59+ messages in thread

* [PATCH 25/32] pack-compat-map:  Add support for .compat files of a packfile
  2023-09-08 23:05 [RFC][PATCH 0/32] SHA256 and SHA1 interoperability Eric W. Biederman
                   ` (23 preceding siblings ...)
  2023-09-08 23:10 ` [PATCH 24/32] builtin/pack-objects: Communicate the compatibility hash through struct pack_idx_entry Eric W. Biederman
@ 2023-09-08 23:10 ` Eric W. Biederman
  2023-09-11  6:30   ` Junio C Hamano
  2023-09-08 23:10 ` [PATCH 26/32] object-file-convert: Implement convert_object_file_{begin,step,end} Eric W. Biederman
                   ` (9 subsequent siblings)
  34 siblings, 1 reply; 59+ messages in thread
From: Eric W. Biederman @ 2023-09-08 23:10 UTC (permalink / raw
  To: git; +Cc: Junio C Hamano, brian m. carlson, Eric W. Biederman

These .compat files hold a bidirectional mapping between the sha1 and
sha256 names of the objects stored in a packfile.

Care has been taken so that index-pack --verify can be supported,
validating that an existing compat map file is not corrupted.
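
For orientation, the on-disk layout that the reader and writer below
agree on is roughly (a paraphrase of the code, not normative format
documentation):

  header   "CMAP", version 1, the two oid versions, nr_objects, and the
           two abbreviation lengths
  table 1  nr_objects entries of (oid in the repository hash, 4-byte
           index of the matching entry in table 2), sorted by oid
  table 2  nr_objects entries of (oid in the compat hash, 4-byte index
           of the matching entry in table 1), sorted by oid
  trailer  the hash of the pack this map describes, followed by the
           hashfile checksum of the map itself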

Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
---
 Makefile                  |   2 +
 builtin.h                 |   1 +
 builtin/show-compat-map.c | 139 ++++++++++++++++
 git.c                     |   1 +
 object-file-convert.c     |   7 +
 object-name.c             |  18 ++
 object-store-ll.h         |   6 +
 pack-compat-map.c         | 334 ++++++++++++++++++++++++++++++++++++++
 pack-compat-map.h         |  27 +++
 pack-write.c              | 158 ++++++++++++++++++
 packfile.c                |  12 ++
 11 files changed, 705 insertions(+)
 create mode 100644 builtin/show-compat-map.c
 create mode 100644 pack-compat-map.c
 create mode 100644 pack-compat-map.h

diff --git a/Makefile b/Makefile
index 3c18664def9a..b3f3dbe7bfeb 100644
--- a/Makefile
+++ b/Makefile
@@ -1088,6 +1088,7 @@ LIB_OBJS += pack-check.o
 LIB_OBJS += pack-mtimes.o
 LIB_OBJS += pack-objects.o
 LIB_OBJS += pack-revindex.o
+LIB_OBJS += pack-compat-map.o
 LIB_OBJS += pack-write.o
 LIB_OBJS += packfile.o
 LIB_OBJS += pager.o
@@ -1299,6 +1300,7 @@ BUILTIN_OBJS += builtin/send-pack.o
 BUILTIN_OBJS += builtin/shortlog.o
 BUILTIN_OBJS += builtin/show-branch.o
 BUILTIN_OBJS += builtin/show-index.o
+BUILTIN_OBJS += builtin/show-compat-map.o
 BUILTIN_OBJS += builtin/show-ref.o
 BUILTIN_OBJS += builtin/sparse-checkout.o
 BUILTIN_OBJS += builtin/stash.o
diff --git a/builtin.h b/builtin.h
index d560baa6618a..25882d281dd2 100644
--- a/builtin.h
+++ b/builtin.h
@@ -223,6 +223,7 @@ int cmd_shortlog(int argc, const char **argv, const char *prefix);
 int cmd_show(int argc, const char **argv, const char *prefix);
 int cmd_show_branch(int argc, const char **argv, const char *prefix);
 int cmd_show_index(int argc, const char **argv, const char *prefix);
+int cmd_show_compat_map(int argc, const char **argv, const char *prefix);
 int cmd_sparse_checkout(int argc, const char **argv, const char *prefix);
 int cmd_status(int argc, const char **argv, const char *prefix);
 int cmd_stash(int argc, const char **argv, const char *prefix);
diff --git a/builtin/show-compat-map.c b/builtin/show-compat-map.c
new file mode 100644
index 000000000000..8cc10bdaab61
--- /dev/null
+++ b/builtin/show-compat-map.c
@@ -0,0 +1,139 @@
+#include "builtin.h"
+#include "gettext.h"
+#include "hash.h"
+#include "hex.h"
+#include "pack.h"
+#include "parse-options.h"
+#include "repository.h"
+
+static const char *const show_compat_map_usage[] = {
+	"git show-compat-map [--verbose] ",
+	NULL
+};
+
+struct pack_compat_map_header {
+	uint8_t sig[4];
+	uint8_t version;
+	uint8_t first_oid_version;
+	uint8_t second_oid_version;
+	uint8_t mbz1;
+	uint32_t nr_objects;
+	uint8_t first_abbrev_len;
+	uint8_t mbz2;
+	uint8_t second_abbrev_len;
+	uint8_t mbz3;
+};
+
+struct map_entry {
+	struct object_id oid;
+	uint32_t index;
+};
+
+static const struct git_hash_algo *from_oid_version(unsigned oid_version)
+{
+	if (oid_version == 1) {
+		return &hash_algos[GIT_HASH_SHA1];
+	} else if (oid_version == 2) {
+		return &hash_algos[GIT_HASH_SHA256];
+	}
+	die("unknown oid version %u\n", oid_version);
+}
+
+static void read_half_map(struct map_entry *map, unsigned nr,
+		     const struct git_hash_algo *algo)
+{
+	unsigned i;
+	for (i = 0; i < nr; i++) {
+		uint32_t index;
+		if (fread(map[i].oid.hash, algo->rawsz, 1, stdin) != 1)
+			die("unable to read hash of %s entry %u/%u",
+			    algo->name, i, nr);
+		if (fread(&index, 4, 1, stdin) != 1)
+			die("unable to read index of %s entry %u/%u",
+			    algo->name, i, nr);
+		map[i].oid.algo = hash_algo_by_ptr(algo);
+		map[i].index = ntohl(index);
+	}
+}
+
+static void print_half_map(const struct map_entry *map,
+			   unsigned nr)
+{
+	unsigned i;
+	for (i = 0; i < nr; i++) {
+		printf("%s %"PRIu32"\n",
+		       oid_to_hex(&map[i].oid),
+		       map[i].index);
+	}
+}
+
+static void print_map(const struct map_entry *map,
+		      const struct map_entry *compat_map,
+		      unsigned nr)
+{
+	unsigned i;
+	for (i = 0; i < nr; i++) {
+		printf("%s ",
+		       oid_to_hex(&map[i].oid));
+		printf("%s\n",
+		       oid_to_hex(&compat_map[map[i].index].oid));
+	}
+}
+
+int cmd_show_compat_map(int argc, const char **argv, const char *prefix)
+{
+	const struct git_hash_algo *algo = NULL, *compat = NULL;
+	unsigned nr;
+	struct pack_compat_map_header hdr;
+	struct map_entry *map, *compat_map;
+	int verbose = 0;
+	const struct option show_comapt_map_options[] = {
+		OPT_BOOL(0, "verbose", &verbose,
+			 N_("print implementation details of the map file")),
+		OPT_END()
+	};
+
+	argc = parse_options(argc, argv, prefix, show_comapt_map_options,
+			     show_compat_map_usage, 0);
+
+	if (fread(&hdr, sizeof(hdr), 1, stdin) != 1)
+		die("unable to read header");
+	if ((hdr.sig[0] != 'C') ||
+	    (hdr.sig[1] != 'M') ||
+	    (hdr.sig[2] != 'A') ||
+	    (hdr.sig[3] != 'P'))
+		die("Missing map signature");
+	if (hdr.version != 1)
+		die("Unknown map version");
+	if ((hdr.mbz1 != 0) ||
+	    (hdr.mbz2 != 0) ||
+	    (hdr.mbz3 != 0))
+		die("Must be zero fields non-zero");
+
+	nr = ntohl(hdr.nr_objects);
+
+	algo = from_oid_version(hdr.first_oid_version);
+	compat = from_oid_version(hdr.second_oid_version);
+
+
+	if (verbose) {
+		printf("Map v%u for %u objects from %s to %s abbrevs (%u:%u)\n",
+		       hdr.version,
+		       nr,
+		       algo->name, compat->name,
+		       hdr.first_abbrev_len,
+		       hdr.second_abbrev_len);
+	}
+	ALLOC_ARRAY(map, nr);
+	ALLOC_ARRAY(compat_map, nr);
+	read_half_map(map, nr, algo);
+	read_half_map(compat_map, nr, compat);
+	if (verbose) {
+		print_half_map(map, nr);
+		print_half_map(compat_map, nr);
+	}
+	print_map(map, compat_map, nr);
+	free(compat_map);
+	free(map);
+	return 0;
+}
diff --git a/git.c b/git.c
index c67e44dd82d2..bfaeece5ae0e 100644
--- a/git.c
+++ b/git.c
@@ -606,6 +606,7 @@ static struct cmd_struct commands[] = {
 	{ "show", cmd_show, RUN_SETUP },
 	{ "show-branch", cmd_show_branch, RUN_SETUP },
 	{ "show-index", cmd_show_index, RUN_SETUP_GENTLY },
+	{ "show-compat-map", cmd_show_compat_map, RUN_SETUP_GENTLY },
 	{ "show-ref", cmd_show_ref, RUN_SETUP },
 	{ "sparse-checkout", cmd_sparse_checkout, RUN_SETUP },
 	{ "stage", cmd_add, RUN_SETUP | NEED_WORK_TREE },
diff --git a/object-file-convert.c b/object-file-convert.c
index d381d3d2ea65..7978aa63dfa9 100644
--- a/object-file-convert.c
+++ b/object-file-convert.c
@@ -9,6 +9,7 @@
 #include "loose.h"
 #include "commit.h"
 #include "gpg-interface.h"
+#include "pack-compat-map.h"
 #include "object-file-convert.h"
 
 int repo_oid_to_algop(struct repository *repo, const struct object_id *src,
@@ -27,6 +28,12 @@ int repo_oid_to_algop(struct repository *repo, const struct object_id *src,
 		return 0;
 	}
 	if (repo_loose_object_map_oid(repo, dest, to, src)) {
+		/*
+		 * It's not in the loose object map, so let's see if it's in a
+		 * pack.
+		 */
+		if (!repo_packed_oid_to_algop(repo, src, to, dest))
+			return 0;
 		/*
 		 * We may have loaded the object map at repo initialization but
 		 * another process (perhaps upstream of a pipe from us) may have
diff --git a/object-name.c b/object-name.c
index ebe87f5c4fdd..d33c82bc96ba 100644
--- a/object-name.c
+++ b/object-name.c
@@ -26,6 +26,7 @@
 #include "commit-reach.h"
 #include "date.h"
 #include "object-file-convert.h"
+#include "pack-compat-map.h"
 
 static int get_oid_oneline(struct repository *r, const char *, struct object_id *, struct commit_list *);
 
@@ -210,6 +211,19 @@ static void find_short_packed_object(struct disambiguate_state *ds)
 		unique_in_pack(p, ds);
 }
 
+static void find_short_packed_compat_object(struct disambiguate_state *ds)
+{
+	struct packed_git *p;
+
+	/* Skip, unless compatibility oids are wanted */
+	if (!ds->algo && (&hash_algos[ds->algo] != ds->repo->compat_hash_algo))
+		return;
+
+	for (p = get_packed_git(ds->repo); p && !ds->ambiguous; p = p->next)
+		pack_compat_map_each(ds->repo, p, ds->bin_pfx.hash, ds->len,
+				     match_prefix, ds);
+}
+
 static int finish_object_disambiguation(struct disambiguate_state *ds,
 					struct object_id *oid)
 {
@@ -581,6 +595,7 @@ static enum get_oid_result get_short_oid(struct repository *r,
 
 	find_short_object_filename(&ds);
 	find_short_packed_object(&ds);
+	find_short_packed_compat_object(&ds);
 	status = finish_object_disambiguation(&ds, oid);
 
 	/*
@@ -592,6 +607,7 @@ static enum get_oid_result get_short_oid(struct repository *r,
 		reprepare_packed_git(r);
 		find_short_object_filename(&ds);
 		find_short_packed_object(&ds);
+		find_short_packed_compat_object(&ds);
 		status = finish_object_disambiguation(&ds, oid);
 	}
 
@@ -659,6 +675,7 @@ int repo_for_each_abbrev(struct repository *r, const char *prefix,
 	ds.cb_data = &collect;
 	find_short_object_filename(&ds);
 	find_short_packed_object(&ds);
+	find_short_packed_compat_object(&ds);
 
 	ret = oid_array_for_each_unique(&collect, fn, cb_data);
 	oid_array_clear(&collect);
@@ -871,6 +888,7 @@ int repo_find_unique_abbrev_r(struct repository *r, char *hex,
 	ds.cb_data = (void *)&mad;
 
 	find_short_object_filename(&ds);
+	find_short_packed_compat_object(&ds);
 	(void)finish_object_disambiguation(&ds, &oid_ret);
 
 	hex[mad.cur_len] = 0;
diff --git a/object-store-ll.h b/object-store-ll.h
index c5f2bb2fc2fe..c37c19ada0c3 100644
--- a/object-store-ll.h
+++ b/object-store-ll.h
@@ -135,6 +135,12 @@ struct packed_git {
 	 */
 	const uint32_t *mtimes_map;
 	size_t mtimes_size;
+
+	const void *compat_mapping;
+	size_t compat_mapping_size;
+	const uint8_t *hash_map;
+	const uint8_t *compat_hash_map;
+
 	/* something like ".git/objects/pack/xxxxx.pack" */
 	char pack_name[FLEX_ARRAY]; /* more */
 };
diff --git a/pack-compat-map.c b/pack-compat-map.c
new file mode 100644
index 000000000000..3a992095ebe3
--- /dev/null
+++ b/pack-compat-map.c
@@ -0,0 +1,334 @@
+#include "git-compat-util.h"
+#include "gettext.h"
+#include "hex.h"
+#include "hash-ll.h"
+#include "hash.h"
+#include "object-store.h"
+#include "object-file.h"
+#include "packfile.h"
+#include "pack-compat-map.h"
+#include "packfile.h"
+
+struct pack_compat_map_header {
+	uint8_t sig[4];
+	uint8_t version;
+	uint8_t first_oid_version;
+	uint8_t second_oid_version;
+	uint8_t mbz1;
+	uint32_t nr_objects;
+	uint8_t first_abbrev_len;
+	uint8_t mbz2;
+	uint8_t second_abbrev_len;
+	uint8_t mbz3;
+};
+
+static char *pack_compat_map_filename(struct packed_git *p)
+{
+	size_t len;
+	if (!strip_suffix(p->pack_name, ".pack", &len))
+		BUG("pack_name does not end in .pack");
+	return xstrfmt("%.*s.compat", (int)len, p->pack_name);
+}
+
+static int oid_version_match(const char *filename,
+			     unsigned oid_version,
+			     const struct git_hash_algo *algo)
+{
+	const struct git_hash_algo *found = NULL;
+	int ret = 0;
+
+	if (oid_version == 1) {
+		found = &hash_algos[GIT_HASH_SHA1];
+	} else if (oid_version == 2) {
+		found = &hash_algos[GIT_HASH_SHA256];
+	}
+	if (found == NULL) {
+		ret = error(_("compat map file %s hash version %u unknown"),
+			    filename, oid_version);
+	}
+	else if (found != algo) {
+		ret = error(_("compat map file %s found hash %s expected hash %s"),
+			    filename, found->name, algo->name);
+	}
+	return ret;
+}
+
+
+static int load_pack_compat_map_file(char *compat_map_file,
+				     struct repository *repo,
+				     struct packed_git *p)
+{
+	const struct pack_compat_map_header *hdr;
+	unsigned compat_map_objects = 0;
+	const uint8_t *data = NULL;
+	const uint8_t *packs_hash = NULL;
+	int fd, ret = 0;
+	struct stat st;
+	size_t size, map1sz, map2sz, expected_size;
+
+	fd = git_open(compat_map_file);
+
+	if (fd < 0) {
+		ret = -1;
+		goto cleanup;
+	}
+	if (fstat(fd, &st)) {
+		ret = error_errno(_("failed to read %s"), compat_map_file);
+		goto cleanup;
+	}
+
+	size = xsize_t(st.st_size);
+
+	if (size < sizeof(struct pack_compat_map_header)) {
+		ret = error(_("compat map file %s is too small"), compat_map_file);
+		goto cleanup;
+	}
+
+	data = xmmap(NULL, size, PROT_READ, MAP_PRIVATE, fd, 0);
+
+	hdr = (const struct pack_compat_map_header *)data;
+	if ((hdr->sig[0] != 'C') ||
+	    (hdr->sig[1] != 'M') ||
+	    (hdr->sig[2] != 'A') ||
+	    (hdr->sig[3] != 'P')) {
+		ret = error(_("compat map file %s has unknown signature"),
+			    compat_map_file);
+		goto cleanup;
+	}
+
+	if (hdr->version != 1) {
+		ret = error(_("compat map file %s has unsupported version %"PRIu8),
+			    compat_map_file, hdr->version);
+		goto cleanup;
+	}
+
+	ret = oid_version_match(compat_map_file, hdr->first_oid_version, repo->hash_algo);
+	if (ret)
+		goto cleanup;
+	ret = oid_version_match(compat_map_file, hdr->second_oid_version, repo->compat_hash_algo);
+	if (ret)
+		goto cleanup;
+	compat_map_objects = ntohl(hdr->nr_objects);
+	if (compat_map_objects != p->num_objects) {
+		ret = error(_("compat map file %s number of objects found %u wanted %u"),
+			    compat_map_file, compat_map_objects, p->num_objects);
+		goto cleanup;
+	}
+
+	map1sz = st_mult(repo->hash_algo->rawsz + 4, compat_map_objects);
+	map2sz = st_mult(repo->compat_hash_algo->rawsz + 4, compat_map_objects);
+
+	expected_size = sizeof(struct pack_compat_map_header);
+	expected_size = st_add(expected_size, map1sz);
+	expected_size = st_add(expected_size, map2sz);
+	expected_size = st_add(expected_size, 2 * repo->hash_algo->rawsz);
+
+	if (size != expected_size) {
+		ret = error(_("compat map file %s is corrupt size %zu expected %zu objects %u sz1 %zu sz2 %zu"),
+			    compat_map_file, size, expected_size, compat_map_objects,
+			    map1sz, map2sz
+			);
+		goto cleanup;
+	}
+
+	packs_hash = data + sizeof(struct pack_compat_map_header) + map1sz + map2sz;
+	if (hashcmp(packs_hash, p->hash)) {
+		ret = error(_("compat map file %s does not match pack %s\n"),
+			      compat_map_file, hash_to_hex(p->hash));
+	}
+
+
+	p->compat_mapping = data;
+	p->compat_mapping_size = size;
+
+	p->hash_map = data + sizeof(struct pack_compat_map_header);
+	p->compat_hash_map = p->hash_map + map1sz;
+
+cleanup:
+	if (ret) {
+		if (data) {
+			munmap((void *)data, size);
+		}
+	}
+	if (fd >= 0)
+		close(fd);
+	return ret;
+}
+
+int load_pack_compat_map(struct repository *repo, struct packed_git *p)
+{
+	char *compat_map_name = NULL;
+	int ret = 0;
+
+	if (p->compat_mapping)
+		return ret;	/* already loaded */
+
+	if (!repo->compat_hash_algo)
+		return 1;		/* Nothing to do */
+
+	ret = open_pack_index(p);
+	if (ret < 0)
+		goto cleanup;
+
+	compat_map_name = pack_compat_map_filename(p);
+	ret = load_pack_compat_map_file(compat_map_name, repo, p);
+cleanup:
+	free(compat_map_name);
+	return ret;
+}
+
+static int keycmp(const unsigned char *a, const unsigned char *b,
+		  size_t key_hex_size)
+{
+	size_t key_byte_size = key_hex_size / 2;
+	unsigned a_last, b_last, mask = (key_hex_size & 1) ? 0xf0 : 0;
+	int cmp = memcmp(a, b, key_byte_size);
+	if (cmp)
+		return cmp;
+
+	a_last = a[key_byte_size] & mask;
+	b_last = b[key_byte_size] & mask;
+
+	if (a_last == b_last)
+		cmp = 0;
+	else if (a_last < b_last)
+		cmp = -1;
+	else
+		cmp = 1;
+
+	return cmp;
+}
+
+static const uint8_t *bsearch_map(const unsigned char *hash,
+				  const uint8_t *table, unsigned nr,
+				  size_t entry_size, size_t key_hex_size)
+{
+	uint32_t hi, lo;
+
+	hi = nr - 1;
+	lo = 0;
+	while (lo < hi) {
+		unsigned mi = lo + ((hi - lo) / 2);
+		const unsigned char *entry = table + (mi * entry_size);
+		int cmp = keycmp(entry, hash, key_hex_size);
+		if (!cmp)
+			return entry;
+		if (cmp > 0)
+			hi = mi;
+		else
+			lo = mi + 1;
+	}
+	if (lo == hi) {
+		const unsigned char *entry = table + (lo * entry_size);
+		int cmp = keycmp(entry, hash, key_hex_size);
+		if (!cmp)
+			return entry;
+	}
+	return NULL;
+}
+
+static void map_each(const struct git_hash_algo *compat,
+		     const unsigned char *prefix, size_t prefix_hexsz,
+		     const uint8_t *table, unsigned nr, size_t entry_bytes,
+		     compat_map_iter_t iter, void *data)
+{
+	const uint8_t *found, *last = table + (entry_bytes * nr);
+
+	found = bsearch_map(prefix, table, nr, entry_bytes, prefix_hexsz);
+	if (!found)
+		return;
+
+	/* Visit each matching key */
+	do {
+		struct object_id oid;
+
+		if (keycmp(found, prefix, prefix_hexsz) != 0)
+			break;
+
+		oidread_algop(&oid, found, compat);
+		if (iter(&oid, data) == CB_BREAK)
+			break;
+
+		found = found + entry_bytes;
+	} while (found < last);
+}
+
+void pack_compat_map_each(struct repository *repo, struct packed_git *p,
+			 const unsigned char *prefix, size_t prefix_hexsz,
+			 compat_map_iter_t iter, void *data)
+{
+	const struct git_hash_algo *compat = repo->compat_hash_algo;
+
+	if (!p->num_objects ||
+	    (!p->compat_mapping && load_pack_compat_map(repo, p)))
+		return;
+
+	if (prefix_hexsz > compat->hexsz)
+		prefix_hexsz = compat->hexsz;
+
+	map_each(compat, prefix, prefix_hexsz,
+		 p->compat_hash_map, p->num_objects, compat->rawsz + 4,
+		 iter, data);
+}
+
+static int compat_map_to_algop(const struct object_id *src,
+			       const struct git_hash_algo *to,
+			       const struct git_hash_algo *from,
+			       const uint8_t *to_table,
+			       const uint8_t *from_table,
+			       unsigned nr,
+			       struct object_id *dest)
+{
+	const uint8_t *found;
+	uint32_t index;
+
+	if (src->algo != hash_algo_by_ptr(from))
+		return -1;
+
+	found = bsearch_map(src->hash,
+			    from_table, nr,
+			    from->rawsz + 4,
+			    from->hexsz);
+	if (!found)
+		return -1;
+
+	index = ntohl(*(uint32_t *)(found + from->rawsz));
+	oidread_algop(dest, to_table + index * (to->rawsz + 4), to);
+	return 0;
+}
+
+static int pack_to_algop(struct repository *repo, struct packed_git *p,
+			 const struct object_id *src,
+			 const struct git_hash_algo *to, struct object_id *dest)
+{
+	if (!p->compat_mapping && load_pack_compat_map(repo, p))
+		return -1;
+
+	if (to == repo->hash_algo) {
+		return compat_map_to_algop(src, to, repo->compat_hash_algo,
+					   p->hash_map,
+					   p->compat_hash_map,
+					   p->num_objects, dest);
+	}
+	else if (to == repo->compat_hash_algo) {
+		return compat_map_to_algop(src, to, repo->hash_algo,
+					   p->compat_hash_map,
+					   p->hash_map,
+					   p->num_objects, dest);
+	}
+	else
+		return -1;
+}
+
+int repo_packed_oid_to_algop(struct repository *repo,
+			     const struct object_id *src,
+			     const struct git_hash_algo *to,
+			     struct object_id *dest)
+{
+	struct packed_git *p;
+	for (p = get_packed_git(repo); p; p = p->next) {
+		if (!pack_to_algop(repo, p, src, to, dest))
+			return 0;
+	}
+	return -1;
+}
diff --git a/pack-compat-map.h b/pack-compat-map.h
new file mode 100644
index 000000000000..2a4561ffdff6
--- /dev/null
+++ b/pack-compat-map.h
@@ -0,0 +1,27 @@
+#ifndef PACK_COMPAT_MAP_H
+#define PACK_COMPAT_MAP_H
+
+#include "cbtree.h"
+struct repository;
+struct packed_git;
+struct object_id;
+struct git_hash_algo;
+struct pack_idx_entry;
+
+int load_pack_compat_map(struct repository *repo, struct packed_git *p);
+
+typedef enum cb_next (*compat_map_iter_t)(const struct object_id *, void *data);
+void pack_compat_map_each(struct repository *repo, struct packed_git *p,
+			 const unsigned char *prefix, size_t prefix_hexsz,
+			 compat_map_iter_t, void *data);
+
+int repo_packed_oid_to_algop(struct repository *repo,
+			     const struct object_id *src,
+			     const struct git_hash_algo *to,
+			     struct object_id *dest);
+
+const char *write_compat_map_file(const char *compat_map_name,
+				  struct pack_idx_entry **objects,
+				  int nr_objects, const unsigned char *hash);
+
+#endif /* PACK_COMPAT_MAP_H */
diff --git a/pack-write.c b/pack-write.c
index b19ddf15b284..f22eea964f77 100644
--- a/pack-write.c
+++ b/pack-write.c
@@ -12,6 +12,7 @@
 #include "pack-revindex.h"
 #include "path.h"
 #include "strbuf.h"
+#include "object-file-convert.h"
 
 void reset_pack_idx_option(struct pack_idx_option *opts)
 {
@@ -345,6 +346,157 @@ static char *write_mtimes_file(struct packing_data *to_pack,
 	return mtimes_name;
 }
 
+struct map_entry {
+	const struct pack_idx_entry *idx;
+	uint32_t oid_index;
+	uint32_t compat_oid_index;
+};
+
+static int map_oid_cmp(const void *_a, const void *_b)
+{
+	struct map_entry *a = *(struct map_entry **)_a;
+	struct map_entry *b = *(struct map_entry **)_b;
+	return oidcmp(&a->idx->oid, &b->idx->oid);
+}
+
+static int map_compat_oid_cmp(const void *_a, const void *_b)
+{
+	struct map_entry *a = *(struct map_entry **)_a;
+	struct map_entry *b = *(struct map_entry **)_b;
+	return oidcmp(&a->idx->compat_oid, &b->idx->compat_oid);
+}
+
+struct pack_compat_map_header {
+	uint8_t sig[4];
+	uint8_t version;
+	uint8_t first_oid_version;
+	uint8_t second_oid_version;
+	uint8_t mbz1;
+	uint32_t nr_objects;
+	uint8_t first_abbrev_len;
+	uint8_t mbz2;
+	uint8_t second_abbrev_len;
+	uint8_t mbz3;
+};
+
+static inline unsigned last_matching_offset(const struct object_id *a,
+					    const struct object_id *b,
+					    const struct git_hash_algo *algop)
+{
+	unsigned i;
+	for (i = 0; i < algop->rawsz; i++)
+		if (a->hash[i] != b->hash[i])
+			return i;
+	/* We should never hit this case. */
+	return i;
+}
+
+/*
+ * The *hash contains the pack content hash.
+ * The objects array is passed in sorted.
+ */
+const char *write_compat_map_file(const char *compat_map_name,
+				  struct pack_idx_entry **objects,
+				  int nr_objects, const unsigned char *hash)
+{
+	struct repository *repo = the_repository;
+	const struct git_hash_algo *algo = repo->hash_algo;
+	const struct git_hash_algo *compat = repo->compat_hash_algo;
+	unsigned short_name_len, compat_short_name_len;
+	struct hashfile *f;
+	struct map_entry *map_entries, **map;
+	struct pack_compat_map_header hdr;
+	unsigned i;
+	int fd;
+
+	if (!compat || !nr_objects)
+		return NULL;
+
+	ALLOC_ARRAY(map_entries, nr_objects);
+	ALLOC_ARRAY(map, nr_objects);
+	short_name_len = 1;
+	for (i = 0; i < nr_objects; ++i) {
+		unsigned offset;
+
+		map[i] = &map_entries[i];
+		map_entries[i].idx = objects[i];
+		if (!objects[i]->compat_oid.algo)
+			BUG("No mapping from %s to %s\n",
+			    oid_to_hex(&objects[i]->oid),
+			    compat->name);
+
+		map_entries[i].oid_index = i;
+		map_entries[i].compat_oid_index = 0;
+		if (i == 0)
+			continue;
+
+		offset = last_matching_offset(&map_entries[i].idx->oid,
+					      &map_entries[i - 1].idx->oid,
+					      algo);
+		if (offset > short_name_len)
+			short_name_len = offset;
+	}
+	QSORT(map, nr_objects, map_compat_oid_cmp);
+	compat_short_name_len = 1;
+	for (i = 0; i < nr_objects; ++i) {
+		unsigned offset;
+
+		map[i]->compat_oid_index = i;
+
+		if (i == 0)
+			continue;
+
+		offset = last_matching_offset(&map[i]->idx->compat_oid,
+					      &map[i - 1]->idx->compat_oid,
+					      compat);
+		if (offset > compat_short_name_len)
+			compat_short_name_len = offset;
+	}
+
+	if (compat_map_name) {
+		/* Verify an existing compat map file */
+		f = hashfd_check(compat_map_name);
+	} else {
+		struct strbuf tmp_file = STRBUF_INIT;
+		fd = odb_mkstemp(&tmp_file, "pack/tmp_compat_map_XXXXXX");
+		compat_map_name = strbuf_detach(&tmp_file, NULL);
+		f = hashfd(fd, compat_map_name);
+	}
+
+	hdr.sig[0] = 'C';
+	hdr.sig[1] = 'M';
+	hdr.sig[2] = 'A';
+	hdr.sig[3] = 'P';
+	hdr.version = 1;
+	hdr.first_oid_version = oid_version(algo);
+	hdr.second_oid_version = oid_version(compat);
+	hdr.mbz1 = 0;
+	hdr.nr_objects = htonl(nr_objects);
+	hdr.first_abbrev_len = short_name_len;
+	hdr.mbz2 = 0;
+	hdr.second_abbrev_len = compat_short_name_len;
+	hdr.mbz3 = 0;
+	hashwrite(f, &hdr, sizeof(hdr));
+
+	QSORT(map, nr_objects, map_oid_cmp);
+	for (i = 0; i < nr_objects; i++) {
+		hashwrite(f, map[i]->idx->oid.hash, algo->rawsz);
+		hashwrite_be32(f, map[i]->compat_oid_index);
+	}
+	QSORT(map, nr_objects, map_compat_oid_cmp);
+	for (i = 0; i < nr_objects; i++) {
+		hashwrite(f, map[i]->idx->compat_oid.hash, compat->rawsz);
+		hashwrite_be32(f, map[i]->oid_index);
+	}
+
+	hashwrite(f, hash, algo->rawsz);
+	finalize_hashfile(f, NULL, FSYNC_COMPONENT_PACK_METADATA,
+			  CSUM_HASH_IN_STREAM | CSUM_CLOSE | CSUM_FSYNC);
+	free(map);
+	free(map_entries);
+	return compat_map_name;
+}
+
 off_t write_pack_header(struct hashfile *f, uint32_t nr_entries)
 {
 	struct pack_header hdr;
@@ -548,6 +700,7 @@ void stage_tmp_packfiles(struct strbuf *name_buffer,
 {
 	const char *rev_tmp_name = NULL;
 	char *mtimes_tmp_name = NULL;
+	const char *compat_map_tmp_name = NULL;
 
 	if (adjust_shared_perm(pack_tmp_name))
 		die_errno("unable to make temporary pack file readable");
@@ -566,11 +719,16 @@ void stage_tmp_packfiles(struct strbuf *name_buffer,
 						    hash);
 	}
 
+	compat_map_tmp_name = write_compat_map_file(NULL, written_list,
+						    nr_written, hash);
+
 	rename_tmp_packfile(name_buffer, pack_tmp_name, "pack");
 	if (rev_tmp_name)
 		rename_tmp_packfile(name_buffer, rev_tmp_name, "rev");
 	if (mtimes_tmp_name)
 		rename_tmp_packfile(name_buffer, mtimes_tmp_name, "mtimes");
+	if (compat_map_tmp_name)
+		rename_tmp_packfile(name_buffer, compat_map_tmp_name, "compat");
 
 	free((char *)rev_tmp_name);
 	free(mtimes_tmp_name);
diff --git a/packfile.c b/packfile.c
index 1fae0fcdd9e7..c1a6bd9bc6b3 100644
--- a/packfile.c
+++ b/packfile.c
@@ -349,6 +349,17 @@ static void close_pack_mtimes(struct packed_git *p)
 	p->mtimes_map = NULL;
 }
 
+static void close_pack_compat_map(struct packed_git *p)
+{
+	if (!p->compat_mapping)
+		return;
+
+	munmap((void *)p->compat_mapping, p->compat_mapping_size);
+	p->compat_mapping = NULL;
+	p->hash_map = NULL;
+	p->compat_hash_map = NULL;
+}
+
 void close_pack(struct packed_git *p)
 {
 	close_pack_windows(p);
@@ -356,6 +367,7 @@ void close_pack(struct packed_git *p)
 	close_pack_index(p);
 	close_pack_revindex(p);
 	close_pack_mtimes(p);
+	close_pack_compat_map(p);
 	oidset_clear(&p->bad_objects);
 }
 
-- 
2.41.0


^ permalink raw reply related	[flat|nested] 59+ messages in thread

* [PATCH 26/32] object-file-convert: Implement convert_object_file_{begin,step,end}
  2023-09-08 23:05 [RFC][PATCH 0/32] SHA256 and SHA1 interoperability Eric W. Biederman
                   ` (24 preceding siblings ...)
  2023-09-08 23:10 ` [PATCH 25/32] pack-compat-map: Add support for .compat files of a packfile Eric W. Biederman
@ 2023-09-08 23:10 ` Eric W. Biederman
  2023-09-11  6:28   ` Junio C Hamano
  2023-09-08 23:10 ` [PATCH 27/32] builtin/fast-import: compute compatibility hashs for imported objects Eric W. Biederman
                   ` (8 subsequent siblings)
  34 siblings, 1 reply; 59+ messages in thread
From: Eric W. Biederman @ 2023-09-08 23:10 UTC (permalink / raw
  To: git; +Cc: Junio C Hamano, brian m. carlson, Eric W. Biederman

When converting trees, commits, and tags the objects they reference
need to be converted before the objects themselves can be converted.

Split convert_object_file into a couple of pieces that are
effectively an iterator over the oids that need to be converted.  This
allows the objects to be processed depth first when being converted,
and it allows changing the logic used to map oids.  In cases like "git
index-pack" none of the oids will be mapped in any of the existing
mapping tables, so an in-memory table needs to be built and
consulted, and this allows that.
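
As a rough sketch (not part of this patch), a caller that keeps its
own in-memory table can drive the iterator like the code below, where
lookup_local_oid() is a hypothetical stand-in for whatever table the
caller maintains (for example index-pack's oid index) and is assumed
to return 0 when it finds a mapping:

    static int convert_with_local_table(struct strbuf *out,
                                        const void *buf, size_t len,
                                        enum object_type type)
    {
        struct object_file_convert_state state;
        int ret;

        convert_object_file_begin(&state, out, the_hash_algo,
                                  the_repository->compat_hash_algo,
                                  buf, len, type);
        for (;;) {
            ret = convert_object_file_step(&state);
            if (ret != 1)
                break;  /* 0 == done, < 0 == error */
            /* The step paused, asking for the mapping of state.oid. */
            if (lookup_local_oid(&state.oid, &state.mapped_oid) &&
                repo_oid_to_algop(the_repository, &state.oid,
                                  state.to, &state.mapped_oid)) {
                ret = error(_("no mapping for %s"),
                            oid_to_hex(&state.oid));
                break;
            }
        }
        convert_object_file_end(&state, ret);
        return ret;
    }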

Not having to update the existing object id mapping mechanisms
is particularly nice, as it makes it easy to avoid introducing
new locks to synchronize the update of the internal
mapping mechanisms.

This was inspired by a similar change by "brian m. carlson"
where he modified convert_object_file to return the
unmapped oids.

Inspired-by: brian m. carlson <sandals@crustytoothpaste.net>
Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
---
 object-file-convert.c | 226 ++++++++++++++++++++++++++++++------------
 object-file-convert.h |  21 ++++
 2 files changed, 186 insertions(+), 61 deletions(-)

diff --git a/object-file-convert.c b/object-file-convert.c
index 7978aa63dfa9..3fd080ebc112 100644
--- a/object-file-convert.c
+++ b/object-file-convert.c
@@ -67,55 +67,74 @@ static int decode_tree_entry_raw(struct object_id *oid, const char **path,
 	return 0;
 }
 
-static int convert_tree_object(struct strbuf *out,
-			       const struct git_hash_algo *from,
-			       const struct git_hash_algo *to,
-			       const char *buffer, size_t size)
+static int convert_tree_object_step(struct object_file_convert_state *state)
 {
-	const char *p = buffer, *end = buffer + size;
+	const char *buf = state->buf, *p, *end = buf + state->buf_len;
+	const struct git_hash_algo *from = state->from;
+	const struct git_hash_algo *to = state->to;
+	struct strbuf *out = state->outbuf;
+
+	/* The current position */
+	p = buf + state->buf_pos;
 
 	while (p < end) {
-		struct object_id entry_oid, mapped_oid;
+		struct object_id entry_oid;
 		const char *path = NULL;
 		size_t pathlen;
 
 		if (decode_tree_entry_raw(&entry_oid, &path, &pathlen, from, p,
 					  end - p))
 			return error(_("failed to decode tree entry"));
-		if (repo_oid_to_algop(the_repository, &entry_oid, to, &mapped_oid))
-			return error(_("failed to map tree entry for %s"), oid_to_hex(&entry_oid));
+
+		if (!state->mapped_oid.algo) {
+			oidcpy(&state->oid, &entry_oid);
+			return 1;
+		}
+		else if (!oideq(&entry_oid, &state->oid))
+			return error(_("bad object_file_convert_state oid"));
+
 		strbuf_add(out, p, path - p);
 		strbuf_add(out, path, pathlen);
-		strbuf_add(out, mapped_oid.hash, to->rawsz);
+		strbuf_add(out, state->mapped_oid.hash, to->rawsz);
+		state->mapped_oid.algo = 0;
 		p = path + pathlen + from->rawsz;
+		state->buf_pos = p - buf;
 	}
 	return 0;
 }
 
-static int convert_commit_object(struct strbuf *out,
-				 const struct git_hash_algo *from,
-				 const struct git_hash_algo *to,
-				 const char *buffer, size_t size)
+static int convert_commit_object_step(struct object_file_convert_state *state)
 {
-	const char *tail = buffer;
-	const char *bufptr = buffer;
+	const struct git_hash_algo *from = state->from;
+	struct strbuf *out = state->outbuf;
+	const char *buf = state->buf;
+	const char *tail = buf + state->buf_len;
+	const char *bufptr = buf + state->buf_pos;
 	const int tree_entry_len = from->hexsz + 5;
 	const int parent_entry_len = from->hexsz + 7;
-	struct object_id oid, mapped_oid;
+	struct object_id oid;
 	const char *p;
 
-	tail += size;
-	if (tail <= bufptr + tree_entry_len + 1 || memcmp(bufptr, "tree ", 5) ||
-			bufptr[tree_entry_len] != '\n')
-		return error("bogus commit object");
-	if (parse_oid_hex_algop(bufptr + 5, &oid, &p, from) < 0)
-		return error("bad tree pointer");
+	if (state->buf_pos == 0) {
+		if (tail <= bufptr + tree_entry_len + 1 || memcmp(bufptr, "tree ", 5) ||
+		    bufptr[tree_entry_len] != '\n')
+			return error("bogus commit object");
+
+		if (parse_oid_hex_algop(bufptr + 5, &oid, &p, from) < 0)
+			return error("bad tree pointer");
 
-	if (repo_oid_to_algop(the_repository, &oid, to, &mapped_oid))
-		return error("unable to map tree %s in commit object",
-			     oid_to_hex(&oid));
-	strbuf_addf(out, "tree %s\n", oid_to_hex(&mapped_oid));
-	bufptr = p + 1;
+		if (!state->mapped_oid.algo) {
+			oidcpy(&state->oid, &oid);
+			return 1;
+		}
+		else if (!oideq(&oid, &state->oid))
+			return error(_("bad object_file_convert_state oid"));
+
+		strbuf_addf(out, "tree %s\n", oid_to_hex(&state->mapped_oid));
+		state->mapped_oid.algo = 0;
+		bufptr = p + 1;
+		state->buf_pos = bufptr - buf;
+	}
 
 	while (bufptr + parent_entry_len < tail && !memcmp(bufptr, "parent ", 7)) {
 		if (tail <= bufptr + parent_entry_len + 1 ||
@@ -123,26 +142,44 @@ static int convert_commit_object(struct strbuf *out,
 		    *p != '\n')
 			return error("bad parents in commit");
 
-		if (repo_oid_to_algop(the_repository, &oid, to, &mapped_oid))
-			return error("unable to map parent %s in commit object",
-				     oid_to_hex(&oid));
+		if (!state->mapped_oid.algo) {
+			oidcpy(&state->oid, &oid);
+			return 1;
+		}
+		else if (!oideq(&oid, &state->oid))
+			return error(_("bad object_file_convert_state oid"));
 
-		strbuf_addf(out, "parent %s\n", oid_to_hex(&mapped_oid));
+		strbuf_addf(out, "parent %s\n", oid_to_hex(&state->mapped_oid));
+		state->mapped_oid.algo = 0;
 		bufptr = p + 1;
+		state->buf_pos = bufptr - buf;
 	}
 	strbuf_add(out, bufptr, tail - bufptr);
 	return 0;
 }
 
-static int convert_tag_object(struct strbuf *out,
-			      const struct git_hash_algo *from,
-			      const struct git_hash_algo *to,
-			      const char *buffer, size_t size)
+static int convert_tag_object_step(struct object_file_convert_state *state)
 {
 	struct strbuf payload = STRBUF_INIT, temp = STRBUF_INIT, oursig = STRBUF_INIT, othersig = STRBUF_INIT;
-	size_t payload_size;
-	struct object_id oid, mapped_oid;
+	const struct git_hash_algo *from = state->from;
+	const struct git_hash_algo *to = state->to;
+	struct strbuf *out = state->outbuf;
+	const char *buffer = state->buf;
+	size_t payload_size, size = state->buf_len;
+	struct object_id oid;
 	const char *p;
+	int ret = 0;
+
+	if (!state->mapped_oid.algo) {
+		if (strncmp(buffer, "object ", 7) ||
+		    buffer[from->hexsz + 7] != '\n')
+			return error("bogus tag object");
+		if (parse_oid_hex_algop(buffer + 7, &oid, &p, from) < 0)
+			return error("bad tag object ID");
+
+		oidcpy(&state->oid, &oid);
+		return 1;
+	}
 
 	/* Add some slop for longer signature header in the new algorithm. */
 	strbuf_grow(out, size + 7);
@@ -165,52 +202,119 @@ static int convert_tag_object(struct strbuf *out,
 	 * Our payload is now in payload and we may have up to two signatrures
 	 * in oursig and othersig.
 	 */
-	if (strncmp(payload.buf, "object ", 7) || payload.buf[from->hexsz + 7] != '\n')
-		return error("bogus tag object");
-	if (parse_oid_hex_algop(payload.buf + 7, &oid, &p, from) < 0)
-		return error("bad tag object ID");
-	if (repo_oid_to_algop(the_repository, &oid, to, &mapped_oid))
-		return error("unable to map tree %s in tag object",
-			     oid_to_hex(&oid));
-	strbuf_addf(out, "object %s\n", oid_to_hex(&mapped_oid));
+	if (strncmp(payload.buf, "object ", 7) || payload.buf[from->hexsz + 7] != '\n') {
+		ret = error("bogus tag object");
+		goto out;
+	}
+	if (parse_oid_hex_algop(payload.buf + 7, &oid, &p, from) < 0) {
+		ret = error("bad tag object ID");
+		goto out;
+	}
+	if (!oideq(&oid, &state->oid)) {
+		ret = error(_("bad object_file_convert_state oid"));
+		goto out;
+	}
+
+	strbuf_addf(out, "object %s\n", oid_to_hex(&state->mapped_oid));
 	strbuf_add(out, p, payload.len - (p - payload.buf));
 	strbuf_addbuf(out, &othersig);
 	if (oursig.len)
 		add_header_signature(out, &oursig, from);
-	return 0;
+out:
+	strbuf_release(&oursig);
+	strbuf_release(&othersig);
+	strbuf_release(&payload);
+	return ret;
 }
 
-int convert_object_file(struct strbuf *outbuf,
-			const struct git_hash_algo *from,
-			const struct git_hash_algo *to,
-			const void *buf, size_t len,
-			enum object_type type,
-			int gentle)
+void convert_object_file_begin(struct object_file_convert_state *state,
+			      struct strbuf *outbuf,
+			      const struct git_hash_algo *from,
+			      const struct git_hash_algo *to,
+			      const void *buf, size_t len,
+			      enum object_type type)
 {
-	int ret;
+	memset(state, 0, sizeof(*state));
+	state->outbuf = outbuf;
+	state->from = from;
+	state->to = to;
+	state->buf = buf;
+	state->buf_len = len;
+	state->buf_pos = 0;
+	state->type = type;
+
 
 	/* Don't call this function when no conversion is necessary */
 	if ((from == to) || (type == OBJ_BLOB))
-		die("Refusing noop object file conversion");
+		BUG("Attempting noop object file conversion");
 
 	switch (type) {
 	case OBJ_TREE:
-		ret = convert_tree_object(outbuf, from, to, buf, len);
+	case OBJ_COMMIT:
+	case OBJ_TAG:
+		break;
+	default:
+		/* Not implemented yet, so fail. */
+		BUG("Unknown object file type found in conversion");
+	}
+}
+
+int convert_object_file_step(struct object_file_convert_state *state)
+{
+	int ret;
+
+	switch(state->type) {
+	case OBJ_TREE:
+		ret = convert_tree_object_step(state);
 		break;
 	case OBJ_COMMIT:
-		ret = convert_commit_object(outbuf, from, to, buf, len);
+		ret = convert_commit_object_step(state);
 		break;
 	case OBJ_TAG:
-		ret = convert_tag_object(outbuf, from, to, buf, len);
+		ret = convert_tag_object_step(state);
 		break;
 	default:
-		/* Not implemented yet, so fail. */
 		ret = -1;
 		break;
 	}
-	if (!ret)
-		return 0;
-	if (gentle)
+	return ret;
+}
+
+void convert_object_file_end(struct object_file_convert_state *state, int ret)
+{
+	if (ret != 0) {
+		strbuf_release(state->outbuf);
+	}
+	memset(state, 0, sizeof(*state));
+}
+
+int convert_object_file(struct strbuf *outbuf,
+			const struct git_hash_algo *from,
+			const struct git_hash_algo *to,
+			const void *buf, size_t len,
+			enum object_type type,
+			int gentle)
+{
+	struct object_file_convert_state state;
+	int ret;
+
+	convert_object_file_begin(&state, outbuf, from, to, buf, len, type);
+
+	for (;;) {
+		ret = convert_object_file_step(&state);
+		if (ret != 1)
+			break;
+		ret = repo_oid_to_algop(the_repository, &state.oid, state.to,
+					&state.mapped_oid);
+		if (ret) {
+			error(_("failed to map %s entry for %s"),
+			      type_name(type), oid_to_hex(&state.oid));
+			break;
+		}
+	}
+
+	convert_object_file_end(&state, ret);
+	if (!ret || gentle)
 		return ret;
 	die(_("Failed to convert object from %s to %s"),
 		from->name, to->name);
diff --git a/object-file-convert.h b/object-file-convert.h
index a4f802aa8eea..da032d7a91ef 100644
--- a/object-file-convert.h
+++ b/object-file-convert.h
@@ -10,6 +10,27 @@ struct strbuf;
 int repo_oid_to_algop(struct repository *repo, const struct object_id *src,
 		      const struct git_hash_algo *to, struct object_id *dest);
 
+struct object_file_convert_state {
+	struct strbuf *outbuf;
+	const struct git_hash_algo *from;
+	const struct git_hash_algo *to;
+	const void *buf;
+	size_t buf_len;
+	size_t buf_pos;
+	enum object_type type;
+	struct object_id oid;
+	struct object_id mapped_oid;
+};
+
+void convert_object_file_begin(struct object_file_convert_state *state,
+			       struct strbuf *outbuf,
+			       const struct git_hash_algo *from,
+			       const struct git_hash_algo *to,
+			       const void *buf, size_t len,
+			       enum object_type type);
+int convert_object_file_step(struct object_file_convert_state *state);
+void convert_object_file_end(struct object_file_convert_state *state, int ret);
+
 /*
  * Convert an object file from one hash algorithm to another algorithm.
  * Return -1 on failure, 0 on success.
-- 
2.41.0


^ permalink raw reply related	[flat|nested] 59+ messages in thread

* [PATCH 27/32] builtin/fast-import: compute compatibility hashs for imported objects
  2023-09-08 23:05 [RFC][PATCH 0/32] SHA256 and SHA1 interoperability Eric W. Biederman
                   ` (25 preceding siblings ...)
  2023-09-08 23:10 ` [PATCH 26/32] object-file-convert: Implement convert_object_file_{begin,step,end} Eric W. Biederman
@ 2023-09-08 23:10 ` Eric W. Biederman
  2023-09-08 23:10 ` [PATCH 28/32] builtin/index-pack: Add a simple oid index Eric W. Biederman
                   ` (7 subsequent siblings)
  34 siblings, 0 replies; 59+ messages in thread
From: Eric W. Biederman @ 2023-09-08 23:10 UTC (permalink / raw
  To: git; +Cc: Junio C Hamano, brian m. carlson, Eric W. Biederman

When the code is in dual hash mode, fast-import computes both the
standard oid and its compatibility mapping for every object.  The
compatibility mapping is stored in struct pack_idx_entry so that it
can be used when an index is created.

For fast-import the code needs to be careful because when a new object
only refers to other newly created objects, the compatibility mapping
for those new objects is not stored anywhere permanently.  So have the
code first look up the compatibility oid in the newly created
objects, and then look for the compatibility oid in the standard
mapping tables.

As fast-import requires objects to be specified before the
objects that reference them, nothing special needs to happen
to deal with out-of-order objects.
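
The lookup order described above boils down to something like the
sketch below, where find_object() is fast-import's existing lookup of
objects created during this run:

    /* Sketch: map one oid referenced by a newly imported object. */
    static int map_referenced_oid(struct object_file_convert_state *state)
    {
        const struct git_hash_algo *compat =
            the_repository->compat_hash_algo;
        struct object_entry *e = find_object(&state->oid);

        if (e && e->idx.compat_oid.algo) {
            /* Newly imported; its mapping exists only in memory. */
            oidcpy(&state->mapped_oid, &e->idx.compat_oid);
            return 0;
        }
        if (e)
            return -1;  /* imported but not yet mapped */
        /* Not from this run: consult the permanent mapping tables. */
        return repo_oid_to_algop(the_repository, &state->oid, compat,
                                 &state->mapped_oid);
    }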

Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
---
 builtin/fast-import.c | 89 +++++++++++++++++++++++++++++++++++++------
 1 file changed, 77 insertions(+), 12 deletions(-)

diff --git a/builtin/fast-import.c b/builtin/fast-import.c
index 2c645fcfbe3f..f1c250dd3c8f 100644
--- a/builtin/fast-import.c
+++ b/builtin/fast-import.c
@@ -26,6 +26,8 @@
 #include "commit-reach.h"
 #include "khash.h"
 #include "date.h"
+#include "object-file-convert.h"
+#include "pack-compat-map.h"
 
 #define PACK_ID_BITS 16
 #define MAX_PACK_ID ((1<<PACK_ID_BITS)-1)
@@ -775,9 +777,14 @@ static void start_packfile(void)
 	all_packs[pack_id] = p;
 }
 
-static const char *create_index(void)
+struct pack_index_names {
+	const char *index_name;
+	const char *compat_name;
+};
+
+static struct pack_index_names create_index(void)
 {
-	const char *tmpfile;
+	struct pack_index_names tmp = {};
 	struct pack_idx_entry **idx, **c, **last;
 	struct object_entry *e;
 	struct object_entry_pool *o;
@@ -793,13 +800,15 @@ static const char *create_index(void)
 	if (c != last)
 		die("internal consistency error creating the index");
 
-	tmpfile = write_idx_file(NULL, idx, object_count, &pack_idx_opts,
-				 pack_data->hash);
+	tmp.index_name = write_idx_file(NULL, idx, object_count, &pack_idx_opts,
+					pack_data->hash);
+	tmp.compat_name = write_compat_map_file(NULL, idx, object_count,
+						pack_data->hash);
 	free(idx);
-	return tmpfile;
+	return tmp;
 }
 
-static char *keep_pack(const char *curr_index_name)
+static char *keep_pack(struct pack_index_names curr)
 {
 	static const char *keep_msg = "fast-import";
 	struct strbuf name = STRBUF_INIT;
@@ -818,9 +827,17 @@ static char *keep_pack(const char *curr_index_name)
 		die("cannot store pack file");
 
 	odb_pack_name(&name, pack_data->hash, "idx");
-	if (finalize_object_file(curr_index_name, name.buf))
+	if (finalize_object_file(curr.index_name, name.buf))
 		die("cannot store index file");
-	free((void *)curr_index_name);
+
+	if (curr.compat_name) {
+		odb_pack_name(&name, pack_data->hash, "compat");
+		if (finalize_object_file(curr.compat_name, name.buf))
+			die("cannot store compatibility map file");
+	}
+
+	free((void *)curr.index_name);
+	free((void *)curr.compat_name);
 	return strbuf_detach(&name, NULL);
 }
 
@@ -943,6 +960,8 @@ static int store_object(
 	struct object_id *oidout,
 	uintmax_t mark)
 {
+	struct repository *repo = the_repository;
+	const struct git_hash_algo *compat = repo->compat_hash_algo;
 	void *out, *delta;
 	struct object_entry *e;
 	unsigned char hdr[96];
@@ -966,8 +985,7 @@ static int store_object(
 	if (e->idx.offset) {
 		duplicate_count_by_type[type]++;
 		return 1;
-	} else if (find_sha1_pack(oid.hash,
-				  get_all_packs(the_repository))) {
+	} else if (find_sha1_pack(oid.hash, get_all_packs(repo))) {
 		e->type = type;
 		e->pack_id = MAX_PACK_ID;
 		e->idx.offset = 1; /* just not zero! */
@@ -1026,6 +1044,42 @@ static int store_object(
 	e->type = type;
 	e->pack_id = pack_id;
 	e->idx.offset = pack_size;
+	if (compat && (type == OBJ_BLOB)) {
+		compat->init_fn(&c);
+		compat->update_fn(&c, hdr, hdrlen);
+		compat->update_fn(&c, dat->buf, dat->len);
+		compat->final_oid_fn(&e->idx.compat_oid, &c);
+	} else if (compat) {
+		struct object_file_convert_state state;
+		struct strbuf out = STRBUF_INIT;
+		int ret;
+
+		convert_object_file_begin(&state, &out, the_hash_algo, compat,
+					  dat->buf, dat->len, type);
+		for (;;) {
+			struct object_entry *pobj;
+
+			ret = convert_object_file_step(&state);
+			if (ret != 1)
+				break;
+
+			ret = -1;
+			pobj = find_object(&state.oid);
+			if (pobj && pobj->idx.compat_oid.algo)
+				oidcpy(&state.mapped_oid, &pobj->idx.compat_oid);
+			else if (pobj)
+				break;
+			else if (repo_oid_to_algop(repo, &state.oid, compat,
+						   &state.mapped_oid))
+				break;
+		}
+		convert_object_file_end(&state, ret);
+		if (ret)
+			die(_("No mapping for %s to %s\n"),
+			    oid_to_hex(&state.oid), compat->name);
+		hash_object_file(compat, out.buf, out.len, type, &e->idx.compat_oid);
+		strbuf_release(&out);
+	}
 	object_count++;
 	object_count_by_type[type]++;
 
@@ -1084,14 +1138,15 @@ static void truncate_pack(struct hashfile_checkpoint *checkpoint)
 
 static void stream_blob(uintmax_t len, struct object_id *oidout, uintmax_t mark)
 {
+	const struct git_hash_algo *compat = the_repository->compat_hash_algo;
 	size_t in_sz = 64 * 1024, out_sz = 64 * 1024;
 	unsigned char *in_buf = xmalloc(in_sz);
 	unsigned char *out_buf = xmalloc(out_sz);
 	struct object_entry *e;
-	struct object_id oid;
+	struct object_id oid, compat_oid;
 	unsigned long hdrlen;
 	off_t offset;
-	git_hash_ctx c;
+	git_hash_ctx c, compat_c;
 	git_zstream s;
 	struct hashfile_checkpoint checkpoint;
 	int status = Z_OK;
@@ -1109,6 +1164,10 @@ static void stream_blob(uintmax_t len, struct object_id *oidout, uintmax_t mark)
 
 	the_hash_algo->init_fn(&c);
 	the_hash_algo->update_fn(&c, out_buf, hdrlen);
+	if (compat) {
+		compat->init_fn(&compat_c);
+		compat->update_fn(&compat_c, out_buf, hdrlen);
+	}
 
 	crc32_begin(pack_file);
 
@@ -1127,6 +1186,8 @@ static void stream_blob(uintmax_t len, struct object_id *oidout, uintmax_t mark)
 				die("EOF in data (%" PRIuMAX " bytes remaining)", len);
 
 			the_hash_algo->update_fn(&c, in_buf, n);
+			if (compat)
+				compat->update_fn(&compat_c, in_buf, n);
 			s.next_in = in_buf;
 			s.avail_in = n;
 			len -= n;
@@ -1153,6 +1214,8 @@ static void stream_blob(uintmax_t len, struct object_id *oidout, uintmax_t mark)
 	}
 	git_deflate_end(&s);
 	the_hash_algo->final_oid_fn(&oid, &c);
+	if (compat)
+		compat->final_oid_fn(&compat_oid, &compat_c);
 
 	if (oidout)
 		oidcpy(oidout, &oid);
@@ -1180,6 +1243,8 @@ static void stream_blob(uintmax_t len, struct object_id *oidout, uintmax_t mark)
 		e->pack_id = pack_id;
 		e->idx.offset = offset;
 		e->idx.crc32 = crc32_end(pack_file);
+		if (compat)
+			oidcpy(&e->idx.compat_oid, &compat_oid);
 		object_count++;
 		object_count_by_type[OBJ_BLOB]++;
 	}
-- 
2.41.0


^ permalink raw reply related	[flat|nested] 59+ messages in thread

* [PATCH 28/32] builtin/index-pack:  Add a simple oid index
  2023-09-08 23:05 [RFC][PATCH 0/32] SHA256 and SHA1 interoperability Eric W. Biederman
                   ` (26 preceding siblings ...)
  2023-09-08 23:10 ` [PATCH 27/32] builtin/fast-import: compute compatibility hashs for imported objects Eric W. Biederman
@ 2023-09-08 23:10 ` Eric W. Biederman
  2023-09-08 23:10 ` [PATCH 29/32] builtin/index-pack: Compute the compatibility hash Eric W. Biederman
                   ` (6 subsequent siblings)
  34 siblings, 0 replies; 59+ messages in thread
From: Eric W. Biederman @ 2023-09-08 23:10 UTC (permalink / raw
  To: git; +Cc: Junio C Hamano, brian m. carlson, Eric W. Biederman

To support computing the compatibility hash, a way to look up objects by
their oid is needed.  This adds a simple hash table to enable looking
up objects by their oid.  The implementation is inspired by the hash
table for looking up object_entries by their oid in struct packing_data,
which is implemented in pack-objects.c.
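
In miniature the scheme is plain open addressing with linear probing
over a power-of-two sized table; slots hold 1-based indexes into
objects[] so that zero can mean "empty".  A sketch of the lookup added
below:

    static struct object_entry *sketch_lookup(const struct object_id *oid)
    {
        uint32_t mask = oid_index_size - 1;
        uint32_t i = oidhash(oid) & mask;

        while (oid_index[i] > 0) {      /* 0 marks an empty slot */
            struct object_entry *e = &objects[oid_index[i] - 1];

            if (oideq(oid, &e->idx.oid))
                return e;               /* hit */
            i = (i + 1) & mask;         /* probe the next slot */
        }
        return NULL;                    /* miss */
    }

Sizing the table to closest_pow2(nr_objects * 3) keeps the load factor
at roughly a third or less, so probe chains stay short.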

Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
---
 builtin/index-pack.c | 68 +++++++++++++++++++++++++++++++++++++++++++-
 1 file changed, 67 insertions(+), 1 deletion(-)

diff --git a/builtin/index-pack.c b/builtin/index-pack.c
index 006ffdc9c550..75c2113e455c 100644
--- a/builtin/index-pack.c
+++ b/builtin/index-pack.c
@@ -126,6 +126,9 @@ static int ref_deltas_alloc;
 static int nr_resolved_deltas;
 static int nr_threads;
 
+static int32_t *oid_index;
+static uint32_t oid_index_size;
+
 static int from_stdin;
 static int strict;
 static int do_fsck_object;
@@ -183,6 +186,62 @@ static inline void unlock_mutex(pthread_mutex_t *mutex)
 		pthread_mutex_unlock(mutex);
 }
 
+static uint32_t locate_oid_index(const struct object_id *oid, int *found)
+{
+	uint32_t i, mask = (oid_index_size - 1);
+
+	i = oidhash(oid) & mask;
+
+	while (oid_index[i] > 0) {
+		uint32_t pos = oid_index[i] - 1;
+
+		if (oideq(oid, &objects[pos].idx.oid)) {
+			*found = 1;
+			return i;
+		}
+
+		i = (i + 1) & mask;
+	}
+
+	*found = 0;
+	return i;
+}
+
+static void place_in_oid_index(struct object_entry *obj)
+{
+	int found;
+	uint32_t pos = locate_oid_index(&obj->idx.oid, &found);
+
+	/* Ignore duplicates */
+	if (found)
+		return;
+
+	oid_index[pos] = (obj - objects) + 1;
+}
+
+static struct object_entry *find_in_oid_index(struct object_id *oid)
+{
+	uint32_t i;
+	int found;
+
+	i = locate_oid_index(oid, &found);
+	if (!found)
+		return NULL;
+
+	return &objects[oid_index[i] - 1];
+}
+
+static inline uint32_t closest_pow2(uint32_t v)
+{
+	v = v - 1;
+	v |= v >> 1;
+	v |= v >> 2;
+	v |= v >> 4;
+	v |= v >> 8;
+	v |= v >> 16;
+	return v + 1;
+}
+
 /*
  * Mutex and conditional variable can't be statically-initialized on Windows.
  */
@@ -987,6 +1046,7 @@ static struct base_data *resolve_delta(struct object_entry *delta_obj,
 		bad_object(delta_obj->idx.offset, _("failed to apply delta"));
 	hash_object_file(the_hash_algo, result_data, result_size,
 			 delta_obj->real_type, &delta_obj->idx.oid);
+	place_in_oid_index(delta_obj);
 	sha1_object(result_data, NULL, result_size, delta_obj->real_type,
 		    &delta_obj->idx.oid);
 
@@ -1188,12 +1248,16 @@ static void parse_pack_objects(unsigned char *hash)
 			ref_deltas[nr_ref_deltas].obj_no = i;
 			nr_ref_deltas++;
 		} else if (!data) {
+			place_in_oid_index(obj);
+
 			/* large blobs, check later */
 			obj->real_type = OBJ_BAD;
 			nr_delays++;
-		} else
+		} else {
+			place_in_oid_index(obj);
 			sha1_object(data, NULL, obj->size, obj->type,
 				    &obj->idx.oid);
+		}
 		free(data);
 		display_progress(progress, i+1);
 	}
@@ -1918,6 +1982,8 @@ int cmd_index_pack(int argc, const char **argv, const char *prefix)
 	if (show_stat)
 		CALLOC_ARRAY(obj_stat, st_add(nr_objects, 1));
 	CALLOC_ARRAY(ofs_deltas, nr_objects);
+	oid_index_size = closest_pow2(nr_objects * 3);
+	CALLOC_ARRAY(oid_index, oid_index_size);
 	parse_pack_objects(pack_hash);
 	if (report_end_of_input)
 		write_in_full(2, "\0", 1);
-- 
2.41.0


^ permalink raw reply related	[flat|nested] 59+ messages in thread

* [PATCH 29/32] builtin/index-pack:  Compute the compatibility hash
  2023-09-08 23:05 [RFC][PATCH 0/32] SHA256 and SHA1 interoperability Eric W. Biederman
                   ` (27 preceding siblings ...)
  2023-09-08 23:10 ` [PATCH 28/32] builtin/index-pack: Add a simple oid index Eric W. Biederman
@ 2023-09-08 23:10 ` Eric W. Biederman
  2023-09-08 23:10 ` [PATCH 30/32] builtin/index-pack: Make the stack in compute_compat_oid explicit Eric W. Biederman
                   ` (5 subsequent siblings)
  34 siblings, 0 replies; 59+ messages in thread
From: Eric W. Biederman @ 2023-09-08 23:10 UTC (permalink / raw
  To: git; +Cc: Junio C Hamano, brian m. carlson, Eric W. Biederman

When a pack encoded with the same algorithm as our repository is
received, it is necessary to compute its hash values and create its
indexes.  That is the job of "git index-pack".  To compute the primary
hash values of the objects, the objects must be loaded in memory.
With the objects loaded into memory this is the perfect time to also
compute the compatibility hash values of the objects, as loading the
objects into memory is the primary cost of that operation.

This is limited by the fact that to compute the compatibility hash
for tree objects, commit objects, and tag objects, the objects need to
be encoded into their compatibility form, which requires replacing
references to objects encoded with the primary hash with references to
the same objects encoded with the compatibility hash.

This means that before the compatibility hash for a tree object,
commit object or tag object can be computed, the compatibility hash
for all objects to which it refers must be computed first.

In general this requires an extra pass so that the dependencies between
objects can be resolved.
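
Conceptually the extra pass is just the following sketch; the real
compute_compat_oid() added below converts the objects a tree, commit
or tag refers to before converting the object itself:

    /* Sketch: second pass over the pack after all deltas are resolved. */
    static void sketch_compute_all_compat_oids(void)
    {
        unsigned i;

        for (i = 0; i < nr_objects; i++) {
            struct object_entry *obj = &objects[i];

            if (obj->idx.compat_oid.algo)
                continue;   /* blobs were already hashed while unpacking */
            compute_compat_oid(obj);
        }
    }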

Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
---
 builtin/index-pack.c | 335 ++++++++++++++++++++++++++++++++++++++++++-
 1 file changed, 328 insertions(+), 7 deletions(-)

diff --git a/builtin/index-pack.c b/builtin/index-pack.c
index 75c2113e455c..f5da671ed82d 100644
--- a/builtin/index-pack.c
+++ b/builtin/index-pack.c
@@ -18,12 +18,15 @@
 #include "thread-utils.h"
 #include "packfile.h"
 #include "pack-revindex.h"
+#include "pack-compat-map.h"
 #include "object-file.h"
 #include "object-store-ll.h"
 #include "oid-array.h"
 #include "replace-object.h"
 #include "promisor-remote.h"
 #include "setup.h"
+#include "strbuf.h"
+#include "object-file-convert.h"
 
 static const char index_pack_usage[] =
 "git index-pack [-v] [-o <index-file>] [--keep | --keep=<msg>] [--[no-]rev-index] [--verify] [--strict] (<pack-file> | --stdin [--fix-thin] [<pack-file>])";
@@ -124,6 +127,8 @@ static int nr_ofs_deltas;
 static int nr_ref_deltas;
 static int ref_deltas_alloc;
 static int nr_resolved_deltas;
+static int nr_pending_mappings;
+static int nr_resolved_mappings;
 static int nr_threads;
 
 static int32_t *oid_index;
@@ -505,28 +510,76 @@ static void prune_base_data(struct base_data *retain)
 	}
 }
 
+static int compat_hash_object_file(const void *buf, size_t len, enum object_type type,
+				   struct object_id *oid)
+{
+	struct repository *repo = the_repository;
+	const struct git_hash_algo *algo = repo->hash_algo;
+	const struct git_hash_algo *compat = repo->compat_hash_algo;
+	struct object_file_convert_state state;
+	struct strbuf out = STRBUF_INIT;
+	int ret;
+
+	convert_object_file_begin(&state, &out, algo, compat,
+				  buf, len, type);
+	for (;;) {
+		struct object_entry *pobj;
+		ret = convert_object_file_step(&state);
+		if (ret != 1)
+			break;
+
+		pobj = find_in_oid_index(&state.oid);
+
+		ret = -1;
+		if (pobj && pobj->idx.compat_oid.algo)
+			oidcpy(&state.mapped_oid, &pobj->idx.compat_oid);
+		else if (pobj)
+			break;
+		else if (repo_oid_to_algop(repo, &state.oid, compat,
+					   &state.mapped_oid))
+			break;
+	}
+	convert_object_file_end(&state, ret);
+	if (ret == 0) {
+		hash_object_file(compat, out.buf, out.len, type, oid);
+		strbuf_release(&out);
+	}
+	return ret;
+}
+
 static int is_delta_type(enum object_type type)
 {
 	return (type == OBJ_REF_DELTA || type == OBJ_OFS_DELTA);
 }
 
 static void *unpack_entry_data(off_t offset, unsigned long size,
-			       enum object_type type, struct object_id *oid)
+			       enum object_type type, struct object_id *oid,
+			       struct object_id *compat_oid)
 {
+	const struct git_hash_algo *compat = the_repository->compat_hash_algo;
 	static char fixed_buf[8192];
 	int status;
 	git_zstream stream;
 	void *buf;
-	git_hash_ctx c;
+	git_hash_ctx c, compat_c;
 	char hdr[32];
 	int hdrlen;
 
+	if (!compat)
+		compat_oid = NULL;
+
 	if (!is_delta_type(type)) {
 		hdrlen = format_object_header(hdr, sizeof(hdr), type, size);
 		the_hash_algo->init_fn(&c);
 		the_hash_algo->update_fn(&c, hdr, hdrlen);
-	} else
+		if (compat_oid && (type == OBJ_BLOB)) {
+			compat->init_fn(&compat_c);
+			compat->update_fn(&compat_c, hdr, hdrlen);
+		}
+	} else {
 		oid = NULL;
+		compat_oid = NULL;
+	}
 	if (type == OBJ_BLOB && size > big_file_threshold)
 		buf = fixed_buf;
 	else
@@ -545,6 +598,8 @@ static void *unpack_entry_data(off_t offset, unsigned long size,
 		use(input_len - stream.avail_in);
 		if (oid)
 			the_hash_algo->update_fn(&c, last_out, stream.next_out - last_out);
+		if (compat_oid && (type == OBJ_BLOB))
+			compat->update_fn(&compat_c, last_out, stream.next_out - last_out);
 		if (buf == fixed_buf) {
 			stream.next_out = buf;
 			stream.avail_out = sizeof(fixed_buf);
@@ -555,13 +610,20 @@ static void *unpack_entry_data(off_t offset, unsigned long size,
 	git_inflate_end(&stream);
 	if (oid)
 		the_hash_algo->final_oid_fn(oid, &c);
+	if (compat_oid && (type == OBJ_BLOB))
+		compat->final_oid_fn(compat_oid, &compat_c);
+	else if (compat_oid &&
+		 compat_hash_object_file(buf, size, type, compat_oid)) {
+		nr_pending_mappings++;
+	}
 	return buf == fixed_buf ? NULL : buf;
 }
 
 static void *unpack_raw_entry(struct object_entry *obj,
 			      off_t *ofs_offset,
 			      struct object_id *ref_oid,
-			      struct object_id *oid)
+			      struct object_id *oid,
+			      struct object_id *compat_oid)
 {
 	unsigned char *p;
 	unsigned long size, c;
@@ -620,7 +682,8 @@ static void *unpack_raw_entry(struct object_entry *obj,
 	}
 	obj->hdr_size = consumed_bytes - obj->idx.offset;
 
-	data = unpack_entry_data(obj->idx.offset, obj->size, obj->type, oid);
+	data = unpack_entry_data(obj->idx.offset, obj->size, obj->type, oid,
+				 compat_oid);
 	obj->idx.crc32 = input_crc32;
 	return data;
 }
@@ -1023,9 +1086,11 @@ static struct base_data *make_base(struct object_entry *obj,
 static struct base_data *resolve_delta(struct object_entry *delta_obj,
 				       struct base_data *base)
 {
+	const struct git_hash_algo *compat = the_repository->compat_hash_algo;
 	void *delta_data, *result_data;
 	struct base_data *result;
 	unsigned long result_size;
+	int pending_map = 0;
 
 	if (show_stat) {
 		int i = delta_obj - objects;
@@ -1046,6 +1111,16 @@ static struct base_data *resolve_delta(struct object_entry *delta_obj,
 		bad_object(delta_obj->idx.offset, _("failed to apply delta"));
 	hash_object_file(the_hash_algo, result_data, result_size,
 			 delta_obj->real_type, &delta_obj->idx.oid);
+	if (compat && (delta_obj->real_type == OBJ_BLOB))
+		hash_object_file(compat, result_data, result_size,
+				 delta_obj->real_type, &delta_obj->idx.compat_oid);
+	else if (compat &&
+		 compat_hash_object_file(result_data, result_size,
+					 delta_obj->real_type,
+					 &delta_obj->idx.compat_oid)) {
+		pending_map = 1;
+	}
+
 	place_in_oid_index(delta_obj);
 	sha1_object(result_data, NULL, result_size, delta_obj->real_type,
 		    &delta_obj->idx.oid);
@@ -1056,6 +1131,8 @@ static struct base_data *resolve_delta(struct object_entry *delta_obj,
 
 	counter_lock();
 	nr_resolved_deltas++;
+	if (pending_map)
+		nr_pending_mappings++;
 	counter_unlock();
 
 	return result;
@@ -1236,7 +1313,8 @@ static void parse_pack_objects(unsigned char *hash)
 		struct object_entry *obj = &objects[i];
 		void *data = unpack_raw_entry(obj, &ofs_delta->offset,
 					      &ref_delta_oid,
-					      &obj->idx.oid);
+					      &obj->idx.oid,
+					      &obj->idx.compat_oid);
 		obj->real_type = obj->type;
 		if (obj->type == OBJ_OFS_DELTA) {
 			nr_ofs_deltas++;
@@ -1578,6 +1656,7 @@ static void rename_tmp_packfile(const char **final_name,
 static void final(const char *final_pack_name, const char *curr_pack_name,
 		  const char *final_index_name, const char *curr_index_name,
 		  const char *final_rev_index_name, const char *curr_rev_index_name,
+		  const char *final_compat_index_name, const char *curr_compat_index_name,
 		  const char *keep_msg, const char *promisor_msg,
 		  unsigned char *hash)
 {
@@ -1585,6 +1664,7 @@ static void final(const char *final_pack_name, const char *curr_pack_name,
 	struct strbuf pack_name = STRBUF_INIT;
 	struct strbuf index_name = STRBUF_INIT;
 	struct strbuf rev_index_name = STRBUF_INIT;
+	struct strbuf compat_index_name = STRBUF_INIT;
 	int err;
 
 	if (!from_stdin) {
@@ -1608,6 +1688,9 @@ static void final(const char *final_pack_name, const char *curr_pack_name,
 	if (curr_rev_index_name)
 		rename_tmp_packfile(&final_rev_index_name, curr_rev_index_name,
 				    &rev_index_name, hash, "rev", 1);
+	if (curr_compat_index_name)
+		rename_tmp_packfile(&final_compat_index_name, curr_compat_index_name,
+				    &compat_index_name, hash, "compat", 1);
 	rename_tmp_packfile(&final_index_name, curr_index_name, &index_name,
 			    hash, "idx", 1);
 
@@ -1640,6 +1723,7 @@ static void final(const char *final_pack_name, const char *curr_pack_name,
 		}
 	}
 
+	strbuf_release(&compat_index_name);
 	strbuf_release(&rev_index_name);
 	strbuf_release(&index_name);
 	strbuf_release(&pack_name);
@@ -1789,16 +1873,236 @@ static void show_pack_info(int stat_only)
 	free(chain_histogram);
 }
 
+static int compare_ofs_delta_entry_obj_no(const void *a, const void *b)
+{
+	const struct ofs_delta_entry *delta_a = a;
+	const struct ofs_delta_entry *delta_b = b;
+
+	return delta_a->obj_no < delta_b->obj_no ? -1 :
+	       delta_a->obj_no > delta_b->obj_no ?  1 :
+	       0;
+}
+
+static int compare_ref_delta_entry_obj_no(const void *a, const void *b)
+{
+	const struct ref_delta_entry *delta_a = a;
+	const struct ref_delta_entry *delta_b = b;
+
+	return delta_a->obj_no < delta_b->obj_no ? -1 :
+	       delta_a->obj_no > delta_b->obj_no ?  1 :
+	       0;
+}
+
+static struct ofs_delta_entry *find_ofs_delta_obj_no(int obj_no)
+{
+	int first = 0, last = nr_ofs_deltas;
+
+	while (first < last) {
+		int next = first + (last - first) / 2;
+		struct ofs_delta_entry *entry = &ofs_deltas[next];
+
+		if (obj_no == entry->obj_no)
+			return entry;
+		if (obj_no < entry->obj_no) {
+			last = next;
+			continue;
+		}
+		first = next + 1;
+	}
+	return NULL;
+}
+
+static struct ref_delta_entry *find_ref_delta_obj_no(int obj_no)
+{
+	int first = 0, last = nr_ref_deltas;
+
+	while (first < last) {
+		int next = first + (last - first) / 2;
+		struct ref_delta_entry *entry = &ref_deltas[next];
+
+		if (obj_no == entry->obj_no)
+			return entry;
+		if (obj_no < entry->obj_no) {
+			last = next;
+			continue;
+		}
+		first = next + 1;
+	}
+	return NULL;
+}
+
+static struct object_entry *find_obj_offset(off_t offset)
+{
+	int first = 0, last = nr_objects;
+
+	while (first < last) {
+		int next = first + (last - first) / 2;
+		struct object_entry *entry = &objects[next];
+
+		if (offset == entry->idx.offset)
+			return entry;
+		if (offset < entry->idx.offset) {
+			last = next;
+			continue;
+		}
+		first = next + 1;
+	}
+	return NULL;
+}
+
+static void *get_object_data(struct object_entry *obj, size_t *result_size)
+{
+	/* Allow random reading objects */
+	void *data;
+
+	if (!is_delta_type(obj->type)) {
+		data = get_data_from_pack(obj);
+		*result_size = obj->size;
+		return data;
+	}
+	if (obj->type == OBJ_OFS_DELTA) {
+		struct ofs_delta_entry *delta;
+		struct object_entry *bobj;
+		size_t base_size;
+		void *base, *raw;
+
+		delta = find_ofs_delta_obj_no(obj - objects);
+		if (!delta)
+			BUG("Delta object without ofs_delta entry");
+
+		bobj = find_obj_offset(delta->offset);
+		if (!bobj)
+			BUG("Delta object without object entry");
+
+		base = get_object_data(bobj, &base_size);
+		raw = get_data_from_pack(obj);
+		data = patch_delta(
+			base, base_size,
+			raw, obj->size,
+			result_size);
+		if (!data)
+			BUG("patch_delta failed");
+		free(raw);
+		free(base);
+		return data;
+	}
+	if (obj->type == OBJ_REF_DELTA) {
+		struct ref_delta_entry *delta;
+		enum object_type base_type;
+		size_t base_size;
+		void *base, *raw;
+
+		delta = find_ref_delta_obj_no(obj - objects);
+		if (!delta)
+			BUG("Delta object without ref_delta entry");
+
+		base = repo_read_object_file(the_repository, &delta->oid,
+					     &base_type, &base_size);
+		if (!base)
+			BUG("ref_delta oid %s not present in repository",
+			    oid_to_hex(&delta->oid));
+		raw = get_data_from_pack(obj);
+		data = patch_delta(
+			base, base_size,
+			raw, obj->size,
+			result_size);
+		if (!data)
+			BUG("patch_delta failed");
+		free(raw);
+		free(base);
+		return data;
+	}
+	return NULL; /* The code never reaches here */
+}
+
+static void compute_compat_oid(struct object_entry *obj)
+{
+	struct repository *repo = the_repository;
+	const struct git_hash_algo *algo = repo->hash_algo;
+	const struct git_hash_algo *compat = repo->compat_hash_algo;
+	struct object_file_convert_state state;
+	struct strbuf out = STRBUF_INIT;
+	size_t data_size;
+	void *data;
+	int ret;
+
+	if (obj->idx.compat_oid.algo)
+		return;
+
+	if (obj->real_type == OBJ_BLOB)
+		die("Blob object not converted");
+
+	data = get_object_data(obj, &data_size);
+
+	convert_object_file_begin(&state, &out, algo, compat,
+				  data, data_size, obj->real_type);
+
+	for (;;) {
+		struct object_entry *pobj;
+		ret = convert_object_file_step(&state);
+		if (ret != 1)
+			break;
+		/* Does it name an object in the pack? */
+		pobj = find_in_oid_index(&state.oid);
+		if (pobj) {
+			compute_compat_oid(pobj);
+			oidcpy(&state.mapped_oid, &pobj->idx.compat_oid);
+		} else if (repo_oid_to_algop(repo, &state.oid, compat,
+					     &state.mapped_oid))
+			die(_("No mapping for oid %s to %s\n"),
+			    oid_to_hex(&state.oid), compat->name);
+	}
+	convert_object_file_end(&state, ret);
+	if (ret != 0)
+		die(_("Bad object %s\n"), oid_to_hex(&obj->idx.oid));
+	hash_object_file(compat, out.buf, out.len, obj->real_type,
+			 &obj->idx.compat_oid);
+	strbuf_release(&out);
+
+	free(data);
+
+	nr_resolved_mappings++;
+	display_progress(progress, nr_resolved_mappings);
+}
+
+static void compute_compat_oids(void)
+{
+	unsigned i;
+
+	if (verbose)
+		progress = start_progress(_("Mapping objects"),
+			nr_pending_mappings);
+
+	/* Sort deltas by obj_no for fast searching */
+	QSORT(ofs_deltas, nr_ofs_deltas, compare_ofs_delta_entry_obj_no);
+	QSORT(ref_deltas, nr_ref_deltas, compare_ref_delta_entry_obj_no);
+
+	for (i = 0; i < nr_objects; i++) {
+		struct object_entry *obj = &objects[i];
+		if (obj->idx.compat_oid.algo)
+			continue;
+		if (is_delta_type(obj->real_type))
+			continue;
+		compute_compat_oid(obj);
+	}
+
+	stop_progress(&progress);
+}
+
 int cmd_index_pack(int argc, const char **argv, const char *prefix)
 {
+	const struct git_hash_algo *compat = the_repository->compat_hash_algo;
 	int i, fix_thin_pack = 0, verify = 0, stat_only = 0, rev_index;
 	const char *curr_index;
 	const char *curr_rev_index = NULL;
+	const char *curr_compat_index = NULL;
 	const char *index_name = NULL, *pack_name = NULL, *rev_index_name = NULL;
+	const char *compat_index_name = NULL;
 	const char *keep_msg = NULL;
 	const char *promisor_msg = NULL;
 	struct strbuf index_name_buf = STRBUF_INIT;
 	struct strbuf rev_index_name_buf = STRBUF_INIT;
+	struct strbuf compat_index_name_buf = STRBUF_INIT;
 	struct pack_idx_entry **idx_objects;
 	struct pack_idx_option opts;
 	unsigned char pack_hash[GIT_MAX_RAWSZ];
@@ -1946,6 +2250,12 @@ int cmd_index_pack(int argc, const char **argv, const char *prefix)
 							 "idx", "rev",
 							 &rev_index_name_buf);
 	}
+	if (compat) {
+		if (index_name)
+			compat_index_name = derive_filename(index_name,
+							    "idx", "compat",
+							    &compat_index_name_buf);
+	}
 
 	if (verify) {
 		if (!index_name)
@@ -1989,6 +2299,8 @@ int cmd_index_pack(int argc, const char **argv, const char *prefix)
 		write_in_full(2, "\0", 1);
 	resolve_deltas();
 	conclude_pack(fix_thin_pack, curr_pack, pack_hash);
+	if (compat)
+		compute_compat_oids();
 	free(ofs_deltas);
 	free(ref_deltas);
 	if (strict)
@@ -1999,18 +2311,24 @@ int cmd_index_pack(int argc, const char **argv, const char *prefix)
 
 	ALLOC_ARRAY(idx_objects, nr_objects);
 	for (i = 0; i < nr_objects; i++)
-		idx_objects[i] = &objects[i].idx;
+		idx_objects[i] = (struct pack_idx_entry *)&objects[i].idx;
 	curr_index = write_idx_file(index_name, idx_objects, nr_objects, &opts, pack_hash);
 	if (rev_index)
 		curr_rev_index = write_rev_file(rev_index_name, idx_objects,
 						nr_objects, pack_hash,
 						opts.flags);
+
+	if (compat)
+		curr_compat_index = write_compat_map_file(
+			(opts.flags & WRITE_IDX_VERIFY) ? compat_index_name : NULL,
+			idx_objects, nr_objects, pack_hash);
 	free(idx_objects);
 
 	if (!verify)
 		final(pack_name, curr_pack,
 		      index_name, curr_index,
 		      rev_index_name, curr_rev_index,
+		      compat_index_name, curr_compat_index,
 		      keep_msg, promisor_msg,
 		      pack_hash);
 	else
@@ -2023,12 +2341,15 @@ int cmd_index_pack(int argc, const char **argv, const char *prefix)
 	free(objects);
 	strbuf_release(&index_name_buf);
 	strbuf_release(&rev_index_name_buf);
+	strbuf_release(&compat_index_name_buf);
 	if (!pack_name)
 		free((void *) curr_pack);
 	if (!index_name)
 		free((void *) curr_index);
 	if (!rev_index_name)
 		free((void *) curr_rev_index);
+	if (!compat_index_name)
+		free((void *) curr_compat_index);
 
 	/*
 	 * Let the caller know this pack is not self contained
-- 
2.41.0


^ permalink raw reply related	[flat|nested] 59+ messages in thread

* [PATCH 30/32] builtin/index-pack: Make the stack in compute_compat_oid explicit
  2023-09-08 23:05 [RFC][PATCH 0/32] SHA256 and SHA1 interoperability Eric W. Biederman
                   ` (28 preceding siblings ...)
  2023-09-08 23:10 ` [PATCH 29/32] builtin/index-pack: Compute the compatibility hash Eric W. Biederman
@ 2023-09-08 23:10 ` Eric W. Biederman
  2023-09-08 23:10 ` [PATCH 31/32] unpack-objects: Update to compute and write the compatibility hashes Eric W. Biederman
                   ` (4 subsequent siblings)
  34 siblings, 0 replies; 59+ messages in thread
From: Eric W. Biederman @ 2023-09-08 23:10 UTC (permalink / raw
  To: git; +Cc: Junio C Hamano, brian m. carlson, Eric W. Biederman

Testing index-pack generating the compatibility hashes on a large
repository (in this case the linux kernel) resulted in a stack
overflow.  I confirmed this by using ulimit -s to force the stack to a
much larger size and rerunning the test, which then succeeded.
Still, it is not a good look to overflow the stack in the default
configuration.

Ideally the objects would be ordered such that no object has any
references to any object that comes after it.  With such an ordering,
convert_object_file followed by hash_object_file could just be run on
every object in order to compute the compatibility hashes for every
object.

Unfortunately the work to compute such an order is roughly equivalent
to the depth first processing compute_compat_oid is doing.  The
objects have to be loaded to see which other objects they reference,
and knowing which objects reference which others is necessary to
compute such an order.

Long story short I can see how to move the depth first traversal into
a topological sort, but that just moves the problem that caused the
deep recursion into another function, and makes everything more
expensive by requiring reading the objects yet another time.

Avoid the stack overflow by using an explicit stack of heap-allocated
objects instead of the C call stack.
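
The result is the usual manual-stack rewrite of a depth-first walk,
sketched below; the real code keeps one struct cco per in-progress
conversion, exactly as in the patch:

    /* Sketch: depth-first conversion driven by an explicit stack. */
    static void sketch_compute_compat_oid(struct object_entry *obj)
    {
        struct cco *top = cco_push(NULL, obj);

        while (top) {
            struct object_entry *pobj;
            int ret = convert_object_file_step(&top->state);

            if (ret != 1) {
                /* Finished or failed: hash the result, resume parent. */
                top = cco_pop(top, ret);
                continue;
            }
            /* The step needs a mapping for top->state.oid. */
            pobj = find_in_oid_index(&top->state.oid);
            if (pobj && pobj->idx.compat_oid.algo)
                oidcpy(&top->state.mapped_oid, &pobj->idx.compat_oid);
            else if (pobj)
                top = cco_push(top, pobj);  /* descend into the child first */
            else if (repo_oid_to_algop(the_repository, &top->state.oid,
                                       the_repository->compat_hash_algo,
                                       &top->state.mapped_oid))
                die(_("no mapping for %s"),
                    oid_to_hex(&top->state.oid));
        }
    }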

To get a feel for how much this explicit stack consumes I instrumented
the code and tested against a 2.16GiB linux kernel packfile.  This
packfile had 9,033,248 objects and 7,470,317 deltas.  There were
6,543,758 mappings that could not be computed opportunistically when
the data was first read.  In the function compute_compat_oid I
measured a maximum cco stack depth of 66,415 and a maximum
memory consumption of 103,783,520 bytes, or about 1563 bytes per level
of the stack.  In short, call it 100MiB extra to compute the mappings
in a 2GiB packfile.

Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
---
 builtin/index-pack.c | 106 +++++++++++++++++++++++++++++--------------
 1 file changed, 71 insertions(+), 35 deletions(-)

diff --git a/builtin/index-pack.c b/builtin/index-pack.c
index f5da671ed82d..6827d14b91ce 100644
--- a/builtin/index-pack.c
+++ b/builtin/index-pack.c
@@ -2015,54 +2015,90 @@ static void *get_object_data(struct object_entry *obj, size_t *result_size)
 	return NULL; /* The code never reaches here */
 }
 
-static void compute_compat_oid(struct object_entry *obj)
-{
-	struct repository *repo = the_repository;
-	const struct git_hash_algo *algo = repo->hash_algo;
-	const struct git_hash_algo *compat = repo->compat_hash_algo;
+struct cco {
+	struct cco *prev;
+	struct object_entry *obj;
 	struct object_file_convert_state state;
-	struct strbuf out = STRBUF_INIT;
+	struct strbuf out;
 	size_t data_size;
 	void *data;
-	int ret;
+};
 
-	if (obj->idx.compat_oid.algo)
-		return;
+static struct cco *cco_push(struct cco *prev, struct object_entry *obj)
+{
+	struct repository *repo = the_repository;
+	const struct git_hash_algo *algo = repo->hash_algo;
+	const struct git_hash_algo *compat = repo->compat_hash_algo;
+	struct cco *cco;
 
 	if (obj->real_type == OBJ_BLOB)
-		die("Blob object not converted");
+		BUG("Blob object not converted");
 
-	data = get_object_data(obj, &data_size);
+	cco = xmallocz(sizeof(*cco));
+	cco->prev = prev;
+	cco->obj = obj;
+	strbuf_init(&cco->out, 0);
 
-	convert_object_file_begin(&state, &out, algo, compat,
-				  data, data_size, obj->real_type);
+	cco->data = get_object_data(obj, &cco->data_size);
 
-	for (;;) {
-		struct object_entry *pobj;
-		ret = convert_object_file_step(&state);
-		if (ret != 1)
-			break;
-		/* Does it name an object in the pack? */
-		pobj = find_in_oid_index(&state.oid);
-		if (pobj) {
-			compute_compat_oid(pobj);
-			oidcpy(&state.mapped_oid, &pobj->idx.compat_oid);
-		} else if (repo_oid_to_algop(repo, &state.oid, compat,
-					     &state.mapped_oid))
-			die(_("No mapping for oid %s to %s\n"),
-			    oid_to_hex(&state.oid), compat->name);
-	}
-	convert_object_file_end(&state, ret);
-	if (ret != 0)
-		die(_("Bad object %s\n"), oid_to_hex(&obj->idx.oid));
-	hash_object_file(compat, out.buf, out.len, obj->real_type,
-			 &obj->idx.compat_oid);
-	strbuf_release(&out);
+	convert_object_file_begin(&cco->state, &cco->out, algo, compat,
+				  cco->data, cco->data_size, obj->real_type);
+	return cco;
+}
 
-	free(data);
+static struct cco *cco_pop(struct cco *cco, int ret)
+{
+	struct repository *repo = the_repository;
+	const struct git_hash_algo *compat = repo->compat_hash_algo;
+	struct cco *prev = cco->prev;
+
+	convert_object_file_end(&cco->state, ret);
+	if (ret != 0)
+		die(_("Bad object %s\n"), oid_to_hex(&cco->obj->idx.oid));
+	hash_object_file(compat, cco->out.buf, cco->out.len,
+			 cco->obj->real_type, &cco->obj->idx.compat_oid);
+	strbuf_release(&cco->out);
+	if (prev)
+		oidcpy(&prev->state.mapped_oid, &cco->obj->idx.compat_oid);
 
 	nr_resolved_mappings++;
 	display_progress(progress, nr_resolved_mappings);
+
+	free(cco->data);
+	free(cco);
+
+	return prev;
+}
+
+static void compute_compat_oid(struct object_entry *obj)
+{
+	struct repository *repo = the_repository;
+	const struct git_hash_algo *compat = repo->compat_hash_algo;
+	struct cco *cco;
+
+	cco = cco_push(NULL, obj);
+	for (;cco;) {
+		struct object_entry *pobj;
+
+		int ret = convert_object_file_step(&cco->state);
+		if (ret != 1) {
+			cco = cco_pop(cco, ret);
+			continue;
+		}
+
+		/* Does it name an object in the pack? */
+		pobj = find_in_oid_index(&cco->state.oid);
+		if (pobj && pobj->idx.compat_oid.algo)
+			oidcpy(&cco->state.mapped_oid, &pobj->idx.compat_oid);
+		else if (pobj)
+			cco = cco_push(cco, pobj);
+		else if (repo_oid_to_algop(repo, &cco->state.oid, compat,
+					   &cco->state.mapped_oid))
+			die(_("When converting %s no mapping for oid %s to %s\n"),
+			    oid_to_hex(&cco->obj->idx.oid),
+			    oid_to_hex(&cco->state.oid),
+			    compat->name);
+	}
 }
 
 static void compute_compat_oids(void)
-- 
2.41.0


^ permalink raw reply related	[flat|nested] 59+ messages in thread

* [PATCH 31/32] unpack-objects: Update to compute and write the compatibility hashes
  2023-09-08 23:05 [RFC][PATCH 0/32] SHA256 and SHA1 interoperability Eric W. Biederman
                   ` (29 preceding siblings ...)
  2023-09-08 23:10 ` [PATCH 30/32] builtin/index-pack: Make the stack in compute_compat_oid explicit Eric W. Biederman
@ 2023-09-08 23:10 ` Eric W. Biederman
  2023-09-08 23:10 ` [PATCH 32/32] object-file-convert: Implement repo_submodule_oid_to_algop Eric W. Biederman
                   ` (3 subsequent siblings)
  34 siblings, 0 replies; 59+ messages in thread
From: Eric W. Biederman @ 2023-09-08 23:10 UTC (permalink / raw
  To: git; +Cc: Junio C Hamano, brian m. carlson, Eric W. Biederman

To properly generate the compatibility hash, objects that are referred
to must be written before the objects that refer to them.  When
--strict is set, unpack-objects already writes objects in that
order.

When a compatibility hash is desired, force use of the same code
path that --strict uses.  If --strict is not wanted, don't
actually fsck the object buffers; just use fsck_walk to
walk to the parents of the objects recursively.

Unlike in index-pack, nothing special needs to be done when an object
is written.  The guarantee that referred-to objects are written to the
loose object store before their referrers ensures that the object
mappings are in the loose object map.  The object mappings being in the
loose object map guarantees that the call to convert_object_file can
find all of the mappings of the referred-to objects.

Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
---
 builtin/unpack-objects.c | 14 +++++++-------
 1 file changed, 7 insertions(+), 7 deletions(-)

diff --git a/builtin/unpack-objects.c b/builtin/unpack-objects.c
index 32505255a009..834551142cd8 100644
--- a/builtin/unpack-objects.c
+++ b/builtin/unpack-objects.c
@@ -241,7 +241,8 @@ static int check_object(struct object *obj, enum object_type type,
 	obj_buf = lookup_object_buffer(obj);
 	if (!obj_buf)
 		die("Whoops! Cannot find object '%s'", oid_to_hex(&obj->oid));
-	if (fsck_object(obj, obj_buf->buffer, obj_buf->size, &fsck_options))
+	if (strict &&
+	    fsck_object(obj, obj_buf->buffer, obj_buf->size, &fsck_options))
 		die("fsck error in packed object");
 	fsck_options.walk = check_object;
 	if (fsck_walk(obj, NULL, &fsck_options))
@@ -270,7 +271,7 @@ static void added_object(unsigned nr, enum object_type type,
 static void write_object(unsigned nr, enum object_type type,
 			 void *buf, unsigned long size)
 {
-	if (!strict) {
+	if (!strict && !the_repository->compat_hash_algo) {
 		if (write_object_file(buf, size, type,
 				      &obj_list[nr].oid) < 0)
 			die("failed to write object");
@@ -409,7 +410,7 @@ static void stream_blob(unsigned long size, unsigned nr)
 		die(_("inflate returned (%d)"), data.status);
 	git_inflate_end(&zstream);
 
-	if (strict) {
+	if (strict || the_repository->compat_hash_algo) {
 		struct blob *blob = lookup_blob(the_repository, &info->oid);
 
 		if (!blob)
@@ -670,11 +671,10 @@ int cmd_unpack_objects(int argc, const char **argv, const char *prefix UNUSED)
 	unpack_all();
 	the_hash_algo->update_fn(&ctx, buffer, offset);
 	the_hash_algo->final_oid_fn(&oid, &ctx);
-	if (strict) {
+	if (strict || the_repository->compat_hash_algo)
 		write_rest();
-		if (fsck_finish(&fsck_options))
-			die(_("fsck error in pack objects"));
-	}
+	if (strict && fsck_finish(&fsck_options))
+		die(_("fsck error in pack objects"));
 	if (!hasheq(fill(the_hash_algo->rawsz), oid.hash))
 		die("final sha1 did not match");
 	use(the_hash_algo->rawsz);
-- 
2.41.0


^ permalink raw reply related	[flat|nested] 59+ messages in thread

* [PATCH 32/32] object-file-convert: Implement repo_submodule_oid_to_algop
  2023-09-08 23:05 [RFC][PATCH 0/32] SHA256 and SHA1 interoperability Eric W. Biederman
                   ` (30 preceding siblings ...)
  2023-09-08 23:10 ` [PATCH 31/32] unpack-objects: Update to compute and write the compatibility hashes Eric W. Biederman
@ 2023-09-08 23:10 ` Eric W. Biederman
  2023-09-09 12:58 ` [RFC][PATCH 0/32] SHA256 and SHA1 interoperability Eric W. Biederman
                   ` (2 subsequent siblings)
  34 siblings, 0 replies; 59+ messages in thread
From: Eric W. Biederman @ 2023-09-08 23:10 UTC (permalink / raw
  To: git; +Cc: Junio C Hamano, brian m. carlson, Eric W. Biederman

From time to time git tree objects contain gitlinks.  These gitlinks
contain the oid of an object in another git repository.  To
successfully translate these oids it is necessary to look at the
mapping tables that live in those submodules.

Limiting myself to the submodule interfaces I can see in the code,
repo_submodule_oid_to_algop is the best gitlink-agnostic
implementation I can come up with.

The big downsides are that the code as implemented is not thread
safe, it depends upon a worktree, and it always walks through
all of the submodules.

There are interfaces in the code to look up the submodule for an
individual gitlink.  As such, iterating all of the submodules could be
avoided if care was taken to compute the path to the gitlink and to
recognize that the code is translating a gitlink.
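
For illustration only (nothing in this patch does this, and the helper
below is a hypothetical sketch), a path-aware caller could translate
through just the submodule recorded at the gitlink's path, using the
same interfaces this patch already uses:

	static int gitlink_path_oid_to_algop(struct repository *repo,
					     const char *gitlink_path,
					     const struct object_id *src,
					     const struct git_hash_algo *to,
					     struct object_id *dest)
	{
		/* Hypothetical: look up only the submodule at gitlink_path. */
		const struct submodule *sub;
		struct repository subrepo = { 0 };
		int ret;

		sub = submodule_from_path(repo, null_oid(), gitlink_path);
		if (!sub)
			return -1;
		if (repo_submodule_init(&subrepo, repo, sub->path, null_oid()))
			return -1;
		ret = repo_oid_to_algop(&subrepo, src, to, dest);
		repo_clear(&subrepo);
		return ret;
	}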

I do not see a solution to the dependency on a worktree, or to the
thread safety issues, short of reworking how git deals with
submodules.

For now repo_oid_to_algop does not call repo_submodule_oid_to_algop,
which allows avoiding the thread safety issues.

Update callers of repo_oid_to_algop that can benefit from a submodule
translation to also call repo_submodule_oid_to_algop.

Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
---
 builtin/fast-import.c |  5 ++++-
 builtin/index-pack.c  |  4 +++-
 object-file-convert.c | 45 +++++++++++++++++++++++++++++++++++++++++++
 object-file-convert.h |  5 +++++
 4 files changed, 57 insertions(+), 2 deletions(-)

diff --git a/builtin/fast-import.c b/builtin/fast-import.c
index f1c250dd3c8f..66c471bc730e 100644
--- a/builtin/fast-import.c
+++ b/builtin/fast-import.c
@@ -1070,7 +1070,10 @@ static int store_object(
 			else if (pobj)
 				break;
 			else if (repo_oid_to_algop(repo, &state.oid, compat,
-						   &state.mapped_oid))
+						   &state.mapped_oid) &&
+				 repo_submodule_oid_to_algop(repo, &state.oid,
+							     compat,
+							     &state.mapped_oid))
 				break;
 		}
 		convert_object_file_end(&state, ret);
diff --git a/builtin/index-pack.c b/builtin/index-pack.c
index 6827d14b91ce..4100fd56a845 100644
--- a/builtin/index-pack.c
+++ b/builtin/index-pack.c
@@ -2093,7 +2093,9 @@ static void compute_compat_oid(struct object_entry *obj)
 		else if (pobj)
 			cco = cco_push(cco, pobj);
 		else if (repo_oid_to_algop(repo, &cco->state.oid, compat,
-					   &cco->state.mapped_oid))
+					   &cco->state.mapped_oid) &&
+			 repo_submodule_oid_to_algop(repo, &cco->state.oid, compat,
+						     &cco->state.mapped_oid))
 			die(_("When converting %s no mapping for oid %s to %s\n"),
 			    oid_to_hex(&cco->obj->idx.oid),
 			    oid_to_hex(&cco->state.oid),
diff --git a/object-file-convert.c b/object-file-convert.c
index 3fd080ebc112..2306e17dd57e 100644
--- a/object-file-convert.c
+++ b/object-file-convert.c
@@ -11,6 +11,45 @@
 #include "gpg-interface.h"
 #include "pack-compat-map.h"
 #include "object-file-convert.h"
+#include "read-cache.h"
+#include "submodule-config.h"
+
+int repo_submodule_oid_to_algop(struct repository *repo,
+				const struct object_id *src,
+				const struct git_hash_algo *to,
+				struct object_id *dest)
+{
+	int i;
+
+	if (repo_read_index(repo) < 0)
+		die(_("index file corrupt"));
+
+	for (i = 0; i < repo->index->cache_nr; i++) {
+		const struct cache_entry *ce = repo->index->cache[i];
+		struct repository subrepo = {};
+		int ret;
+
+		if (!S_ISGITLINK(ce->ce_mode))
+			continue;
+
+		while (i + 1 < repo->index->cache_nr &&
+		       !strcmp(ce->name, repo->index->cache[i + 1]->name))
+			/*
+			 * Skip entries with the same name in different stages
+			 * to make sure an entry is returned only once.
+			 */
+			i++;
+
+		if (repo_submodule_init(&subrepo, repo, ce->name, null_oid()))
+			continue;
+
+		ret = repo_oid_to_algop(&subrepo, src, to, dest);
+		repo_clear(&subrepo);
+		if (ret == 0)
+			return 0;
+	}
+	return -1;
+}
 
 int repo_oid_to_algop(struct repository *repo, const struct object_id *src,
 		      const struct git_hash_algo *to, struct object_id *dest)
@@ -34,6 +73,7 @@ int repo_oid_to_algop(struct repository *repo, const struct object_id *src,
 		 */
 		if (!repo_packed_oid_to_algop(repo, src, to, dest))
 			return 0;
+
 		/*
 		 * We may have loaded the object map at repo initialization but
 		 * another process (perhaps upstream of a pipe from us) may have
@@ -306,6 +346,11 @@ int convert_object_file(struct strbuf *outbuf,
 			break;
 		ret = repo_oid_to_algop(the_repository, &state.oid, state.to,
 					&state.mapped_oid);
+		if (ret)
+			ret = repo_submodule_oid_to_algop(the_repository,
+							  &state.oid,
+							  state.to,
+							  &state.mapped_oid);
 		if (ret) {
 			error(_("failed to map %s entry for %s"),
 			      type_name(type), oid_to_hex(&state.oid));
diff --git a/object-file-convert.h b/object-file-convert.h
index da032d7a91ef..7a19feda5f0c 100644
--- a/object-file-convert.h
+++ b/object-file-convert.h
@@ -10,6 +10,11 @@ struct strbuf;
 int repo_oid_to_algop(struct repository *repo, const struct object_id *src,
 		      const struct git_hash_algo *to, struct object_id *dest);
 
+int repo_submodule_oid_to_algop(struct repository *repo,
+				const struct object_id *src,
+				const struct git_hash_algo *to,
+				struct object_id *dest);
+
 struct object_file_convert_state {
 	struct strbuf *outbuf;
 	const struct git_hash_algo *from;
-- 
2.41.0


^ permalink raw reply related	[flat|nested] 59+ messages in thread

* Re: [RFC][PATCH 0/32] SHA256 and SHA1 interoperability
  2023-09-08 23:05 [RFC][PATCH 0/32] SHA256 and SHA1 interoperability Eric W. Biederman
                   ` (31 preceding siblings ...)
  2023-09-08 23:10 ` [PATCH 32/32] object-file-convert: Implement repo_submodule_oid_to_algop Eric W. Biederman
@ 2023-09-09 12:58 ` Eric W. Biederman
  2023-09-10 15:38 ` brian m. carlson
  2023-09-11  6:37 ` Junio C Hamano
  34 siblings, 0 replies; 59+ messages in thread
From: Eric W. Biederman @ 2023-09-09 12:58 UTC (permalink / raw
  To: git; +Cc: Junio C Hamano, brian m. carlson


I forgot to mention the patches are against 2.42.

Eric

^ permalink raw reply	[flat|nested] 59+ messages in thread

* Re: [PATCH 01/32] doc hash-file-transition: A map file for mapping between sha1 and sha256
  2023-09-08 23:10 ` [PATCH 01/32] doc hash-file-transition: A map file for mapping between sha1 and sha256 Eric W. Biederman
@ 2023-09-10 14:24   ` brian m. carlson
  2023-09-10 18:07     ` Eric W. Biederman
  2023-09-12  0:14   ` brian m. carlson
  1 sibling, 1 reply; 59+ messages in thread
From: brian m. carlson @ 2023-09-10 14:24 UTC (permalink / raw
  To: Eric W. Biederman; +Cc: git, Junio C Hamano

[-- Attachment #1: Type: text/plain, Size: 1305 bytes --]

On 2023-09-08 at 23:10:18, Eric W. Biederman wrote:
> The v3 pack index file as documented has a lot of complexity making it
> difficult to implement correctly.  I worked with bryan's preliminary
> implementation and it took several passes to get the bugs out.
> 
> The complexity also requires multiple table look-ups to find all of
> the information that is needed to translate from one kind of oid to
> another.  Which can't be good for cache locality.
> 
> Even worse coming up with a new index file version requires making
> changes that have the potentialy to break anything that uses the index
> of a pack file.
> 
> Instead of continuing to deal with the chance of braking things
> besides the oid mapping functionality, the additional complexity in
> the file format, and worry if the performance would be reasonable I
> stripped down the problem to it's fundamental complexity and came up
> with a file format that is exactly about mapping one kind of oid to
> another, and only supports two kinds of oids.

I think this is a fine approach, and as I'm sure you noticed from my
series, it's a lot more robust than trying to implement pack v3.  I'd be
fine with going with this approach instead of pack v3.
-- 
brian m. carlson (he/him or they/them)
Toronto, Ontario, CA

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 263 bytes --]

^ permalink raw reply	[flat|nested] 59+ messages in thread

* Re: [PATCH 02/32] doc hash-function-transition: Replace compatObjectFormat with compatMap
  2023-09-08 23:10 ` [PATCH 02/32] doc hash-function-transition: Replace compatObjectFormat with compatMap Eric W. Biederman
@ 2023-09-10 14:34   ` brian m. carlson
  2023-09-10 18:00     ` Eric W. Biederman
  2023-09-11  6:11     ` Junio C Hamano
  0 siblings, 2 replies; 59+ messages in thread
From: brian m. carlson @ 2023-09-10 14:34 UTC (permalink / raw
  To: Eric W. Biederman; +Cc: git, Junio C Hamano

[-- Attachment #1: Type: text/plain, Size: 2248 bytes --]

On 2023-09-08 at 23:10:19, Eric W. Biederman wrote:
> Ir makes a lot of sense for the hash algorithm that determines how all

Minor nit: "It".

> diff --git a/Documentation/technical/hash-function-transition.txt b/Documentation/technical/hash-function-transition.txt
> index 4b937480848a..10572c5794f9 100644
> --- a/Documentation/technical/hash-function-transition.txt
> +++ b/Documentation/technical/hash-function-transition.txt
> @@ -148,14 +148,14 @@ Detailed Design
>  Repository format extension
>  ~~~~~~~~~~~~~~~~~~~~~~~~~~~
>  A SHA-256 repository uses repository format version `1` (see
> -Documentation/technical/repository-version.txt) with extensions
> -`objectFormat` and `compatObjectFormat`:
> +Documentation/technical/repository-version.txt) with the extension
> +`objectFormat`, and an optional core.compatMap configuration.
>  
>  	[core]
>  		repositoryFormatVersion = 1
> +		compatMap = on
>  	[extensions]
>  		objectFormat = sha256
> -		compatObjectFormat = sha1

While I'm in favour of an approach that uses the compat map, the
situation we've implemented here doesn't specify the extra hash
algorithm.  We want this approach to work just as well for moving from
SHA-1 to SHA-256 as it might for a future transition from SHA-256 to,
say, SHA-3-512, if that becomes necessary.

Making a future transition easier has been a goal of my SHA-256 work
(because who wants to write several hundred patches in such a case?), so
my hope is we can keep that here as well by explicitly naming the
algorithm we're using.

I also wonder if an approach that doesn't use an extension is going to
be helpful.  Say that I have a repository that is using Git 3.x, which
supports interop, but I also need to use Git 2.x, which does not.  While
it's true that Git 2.x can read my SHA-256 repository, it won't write
the appropriate objects into the map, and thus it will be practically
very difficult to actually use Git 3.x to push data to a repository of a
different hash function.  We might well prefer to have Git 2.x not work
with the repository at all rather than have incomplete data preventing
us from, well, interoperating.
-- 
brian m. carlson (he/him or they/them)
Toronto, Ontario, CA

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 263 bytes --]

^ permalink raw reply	[flat|nested] 59+ messages in thread

* Re: [RFC][PATCH 0/32] SHA256 and SHA1 interoperability
  2023-09-08 23:05 [RFC][PATCH 0/32] SHA256 and SHA1 interoperability Eric W. Biederman
                   ` (32 preceding siblings ...)
  2023-09-09 12:58 ` [RFC][PATCH 0/32] SHA256 and SHA1 interoperability Eric W. Biederman
@ 2023-09-10 15:38 ` brian m. carlson
  2023-09-10 18:20   ` Eric W. Biederman
  2023-09-11  6:37 ` Junio C Hamano
  34 siblings, 1 reply; 59+ messages in thread
From: brian m. carlson @ 2023-09-10 15:38 UTC (permalink / raw
  To: Eric W. Biederman; +Cc: git, Junio C Hamano

[-- Attachment #1: Type: text/plain, Size: 2175 bytes --]

On 2023-09-08 at 23:05:52, Eric W. Biederman wrote:
> 
> I would like to see the SHA256 transition happen so I started playing
> with the k2204-transition-interop branch of brian m. carlson's tree.
> 
> Before I go farther I need to some other folks to look at this and see
> if this is a general direction that the git project can stand.

I'm really excited to see this and I think it's a great way forward.
I've taken a brief look at each patch, and I don't see anything that
should be a dealbreaker.  I left a few comments, although I think your
mailserver is blocking mine at the moment, so you may not have received
them (hopefully you can read them on the list in the interim).

You may also feel free to simply adjust the commit message for the
patches of mine you've modified without needing to document that you've
changed them.  I expect that you will have changed them when you submit
them, if only to resolve conflicts.  After all, Junio does so all the
time.

> This patchset is not complete it does not implement converting a
> received pack of the compatibility hash into the hash function of the
> repository, nor have I written any automated tests.  Both need to happen
> before this is finalized.

Speaking of tests, one set of tests I had intended to write and think
should be written, but had not yet implemented, is tests for
round-tripping objects.  That is, the SHA-1 value we get for a revision
in a pure SHA-1 repository should obviously be the same as the SHA-1
value we get in a SHA-256 repository in interop mode, and we should be
able to use the `test_oid_cache` functionality to hard-code the desired
objects.  I think it would be also helpful to do this for fixed objects
that are doubly-signed (with both algorithms) as well, since that's a
tricky edge case that we'll want to avoid breaking.  Other edge cases
will include things like merge commits, including octopus merges.

But overall, I think this is a great improvement, and I'm very excited
to see someone picking up some of this work and moving it forward.
Thanks for doing so.
-- 
brian m. carlson (he/him or they/them)
Toronto, Ontario, CA

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 263 bytes --]

^ permalink raw reply	[flat|nested] 59+ messages in thread

* Re: [PATCH 02/32] doc hash-function-transition: Replace compatObjectFormat with compatMap
  2023-09-10 14:34   ` brian m. carlson
@ 2023-09-10 18:00     ` Eric W. Biederman
  2023-09-11  6:11     ` Junio C Hamano
  1 sibling, 0 replies; 59+ messages in thread
From: Eric W. Biederman @ 2023-09-10 18:00 UTC (permalink / raw
  To: brian m. carlson; +Cc: git, Junio C Hamano

"brian m. carlson" <sandals@crustytoothpaste.net> writes:

> On 2023-09-08 at 23:10:19, Eric W. Biederman wrote:
>> Ir makes a lot of sense for the hash algorithm that determines how all
>
> Minor nit: "It".
>
>> diff --git a/Documentation/technical/hash-function-transition.txt b/Documentation/technical/hash-function-transition.txt
>> index 4b937480848a..10572c5794f9 100644
>> --- a/Documentation/technical/hash-function-transition.txt
>> +++ b/Documentation/technical/hash-function-transition.txt
>> @@ -148,14 +148,14 @@ Detailed Design
>>  Repository format extension
>>  ~~~~~~~~~~~~~~~~~~~~~~~~~~~
>>  A SHA-256 repository uses repository format version `1` (see
>> -Documentation/technical/repository-version.txt) with extensions
>> -`objectFormat` and `compatObjectFormat`:
>> +Documentation/technical/repository-version.txt) with the extension
>> +`objectFormat`, and an optional core.compatMap configuration.
>>  
>>  	[core]
>>  		repositoryFormatVersion = 1
>> +		compatMap = on
>>  	[extensions]
>>  		objectFormat = sha256
>> -		compatObjectFormat = sha1
>
> While I'm in favour of an approach that uses the compat map, the
> situation we've implemented here doesn't specify the extra hash
> algorithm.  We want this approach to work just as well for moving from
> SHA-1 to SHA-256 as it might for a future transition from SHA-256 to,
> say, SHA-3-512, if that becomes necessary.
>
> Making a future transition easier has been a goal of my SHA-256 work
> (because who wants to write several hundred patches in such a case?), so
> my hope is we can keep that here as well by explicitly naming the
> algorithm we're using.
>
> I also wonder if an approach that doesn't use an extension is going to
> be helpful.  Say, that I have a repository that is using Git 3.x, which
> supports interop, but I also need to use Git 2.x, which does not.  While
> it's true that Git 2.x can read my SHA-256 repository, it won't write
> the appropriate objects into the map, and thus it will be practically
> very difficult to actually use Git 3.x to push data to a repository of a
> different hash function.  We might well prefer to have Git 2.x not work
> with the repository at all rather than have incomplete data preventing
> us from, well, interoperating.

First it is my hope that we can get a command such as "git gc" to scan
the repository and fill in all of the missing compatibility hashes.

Not so much for day to day work, but so that people are able to enable
compatibility hashes on an existing repository.  Enabling compatibility
hashes on a sha1 repository is going to be necessary to create a sha256
repository from it.  A depth first walk, or a topological sort of the
objects, pretty much has to happen as a separate pass.  So it makes sense
just to require that all of the objects have their compatibility hash
computed before attempting to generate a pack in the compatibility
format.

I say all of that and I feel silly.

The core and optimized path is whatever receive-pack does to deal
with a pack in the repository's compatibility format.  Once that is
built we can create a sha256 repository from a sha1 repository just
by cloning it, and letting receive-pack figure out the details.

Before we can generate a sha256 pack from a sha1 pack we still need
to compute the sha256 hash of every object, but that can be heavily
optimized and kept local to the case of receiving a non-native pack.  So a
repository that generates a compatibility hash for all of its objects
is not necessary to transition to another hash algorithm.  All we need
is another repository in the other format.


That said there is value in being able to add compatibility hashes
to an existing repository.  The upstream repository can just convert
to the new hash function and all of the downstream repositories
can compute their compatibility hashes and convert when they are ready.

Basically once a git with transition support exists any repository can
convert at any time without creating a problem for other repositories.

In my head it seems cheaper/safer to compute the compatibility hash of
every object in an existing repository than it is to convert the
repository.  Is it?

I think that if the first pull from a repository in another format can
trigger the initial computation of the compatibility hash (like the
first use of a reverse index triggers the creation of the reverse
index), then it will definitely be easier to just enable compatibility
hashes in an existing repository.

The additional hash computation step on every pull from upstream (even
when well optimized) should be an incentive for people to fully convert
their repositories after the upstream has converted.


That is where things get tricky, in a way the transition plan has not
talked about.  There are references to existing oids in email, bug
trackers, and commit comments.  Digging through the history and dealing
with those references is something that developers are going to need to
do for the rest of the life of a project.

Which means eventually we will need to support a mode where we have some
packs with a ``.compat'' index but we no longer compute or generate the
old hash for new objects.

In summary, I agree that compatMap is likely insufficient.  So far I
think generating the missing mappings is cheap/easy enough that we do
not need a mandatory requirement that all operations always generate
them.

I also agree that making the configuration resilient to foreseeable
future demands is a good idea.

So I will push this change farther out in the patch series.

Eric

^ permalink raw reply	[flat|nested] 59+ messages in thread

* Re: [PATCH 01/32] doc hash-file-transition: A map file for mapping between sha1 and sha256
  2023-09-10 14:24   ` brian m. carlson
@ 2023-09-10 18:07     ` Eric W. Biederman
  0 siblings, 0 replies; 59+ messages in thread
From: Eric W. Biederman @ 2023-09-10 18:07 UTC (permalink / raw
  To: brian m. carlson; +Cc: git, Junio C Hamano

"brian m. carlson" <sandals@crustytoothpaste.net> writes:

> On 2023-09-08 at 23:10:18, Eric W. Biederman wrote:
>> The v3 pack index file as documented has a lot of complexity making it
>> difficult to implement correctly.  I worked with bryan's preliminary
>> implementation and it took several passes to get the bugs out.
>> 
>> The complexity also requires multiple table look-ups to find all of
>> the information that is needed to translate from one kind of oid to
>> another.  Which can't be good for cache locality.
>> 
>> Even worse coming up with a new index file version requires making
>> changes that have the potentialy to break anything that uses the index
>> of a pack file.
>> 
>> Instead of continuing to deal with the chance of braking things
>> besides the oid mapping functionality, the additional complexity in
>> the file format, and worry if the performance would be reasonable I
>> stripped down the problem to it's fundamental complexity and came up
>> with a file format that is exactly about mapping one kind of oid to
>> another, and only supports two kinds of oids.
>
> I think this is a fine approach, and as I'm sure you noticed from my
> series, it's a lot more robust than trying to implement pack v3.  I'd be
> fine with going with this approach instead of pack v3.

I think I got your pack v3 working but it was at a minimum a serious
distraction.

I worry a little bit that this might leave some performance on the
table, compared with something like the 256-way jump table we have in
the index file.

Still, I figure we can start simple, and when we start optimizing and
profiling we can revisit the format if it shows up as a performance
issue.
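
For reference, the kind of lookup I have in mind is roughly the
following untested sketch (not part of the series): a 256-entry fanout
where fanout[b] holds the cumulative count of entries whose first oid
byte is <= b, stored big-endian as in the pack .idx format, followed by
a sorted table of fixed-width oids:

	#include <stdint.h>
	#include <string.h>
	#include <arpa/inet.h>	/* ntohl() */

	/* Binary search only the fanout-selected slice of the sorted table. */
	static int compat_map_lookup(const uint32_t *fanout,
				     const unsigned char *oids, size_t oid_len,
				     const unsigned char *want, uint32_t *pos)
	{
		uint32_t lo = want[0] ? ntohl(fanout[want[0] - 1]) : 0;
		uint32_t hi = ntohl(fanout[want[0]]);

		while (lo < hi) {
			uint32_t mi = lo + (hi - lo) / 2;
			int cmp = memcmp(oids + (size_t)mi * oid_len,
					 want, oid_len);

			if (!cmp) {
				*pos = mi;
				return 0;
			}
			if (cmp < 0)
				lo = mi + 1;
			else
				hi = mi;
		}
		return -1;
	}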

Eric


^ permalink raw reply	[flat|nested] 59+ messages in thread

* Re: [RFC][PATCH 0/32] SHA256 and SHA1 interoperability
  2023-09-10 15:38 ` brian m. carlson
@ 2023-09-10 18:20   ` Eric W. Biederman
  0 siblings, 0 replies; 59+ messages in thread
From: Eric W. Biederman @ 2023-09-10 18:20 UTC (permalink / raw
  To: brian m. carlson; +Cc: git, Junio C Hamano

"brian m. carlson" <sandals@crustytoothpaste.net> writes:

> On 2023-09-08 at 23:05:52, Eric W. Biederman wrote:
>> 
>> I would like to see the SHA256 transition happen so I started playing
>> with the k2204-transition-interop branch of brian m. carlson's tree.
>> 
>> Before I go farther I need to some other folks to look at this and see
>> if this is a general direction that the git project can stand.
>
> I'm really excited to see this and I think it's a great way forward.
> I've taken a brief look at each patch, and I don't see anything that
> should be a dealbreaker.  I left a few comments, although I think your
> mailserver is blocking mine at the moment, so you may not have received
> them (hopefully you can read them on the list in the interim).

I can.  I will see if I can figure out what is happening with direct
reception tomorrow.

> You may also feel free to simply adjust the commit message for the
> patches of mine you've modified without needing to document that you've
> changed them.  I expect that you will have changed them when you submit
> them, if only to resolve conflicts.  After all, Junio does so all the
> time.

Thanks.  I was doing my best at striking a balance between giving credit
where credit is due, and pointing out that the bugs are probably mine.

>> This patchset is not complete it does not implement converting a
>> received pack of the compatibility hash into the hash function of the
>> repository, nor have I written any automated tests.  Both need to happen
>> before this is finalized.
>
> Speaking of tests, one set of tests I had intended to write and think
> should be written, but had not yet implemented, is tests for
> round-tripping objects.  That is, the SHA-1 value we get for a revision
> in a pure SHA-1 repository should obviously be the same as the SHA-1
> value we get in a SHA-256 repository in interop mode, and we should be
> able to use the `test_oid_cache` functionality to hard-code the desired
> objects.  I think it would be also helpful to do this for fixed objects
> that are doubly-signed (with both algorithms) as well, since that's a
> tricky edge case that we'll want to avoid breaking.  Other edge cases
> will include things like merge commits, including octopus merges.

Yes.  I think we can use cat-file to do that.  Have two repositories,
one in each format.  Verify that when cat-file is asked for an object
by its native oid it prints out what was put in.  Similarly verify that
when cat-file is asked for an object by its compatibility oid it prints
out the expected conversion.  That logic performed in both repositories
should work.

> But overall, I think this is a great improvement, and I'm very excited
> to see someone picking up some of this work and moving it forward.
> Thanks for doing so.

Thanks.

The next goal is to get enough merged that I can test the round-trip
conversions.  More than anything else we need to know the conversion
functionality is solid.

Plus I expect that while 32 patches were important to show the scope of
the work, they are a bit much to fully review and merge all at once.

Eric

^ permalink raw reply	[flat|nested] 59+ messages in thread

* Re: [PATCH 02/32] doc hash-function-transition: Replace compatObjectFormat with compatMap
  2023-09-10 14:34   ` brian m. carlson
  2023-09-10 18:00     ` Eric W. Biederman
@ 2023-09-11  6:11     ` Junio C Hamano
  2023-09-11 16:35       ` [PATCH v2 02/32] doc hash-function-transition: Replace compatObjectFormat with mapObjectFormat Eric W. Biederman
  1 sibling, 1 reply; 59+ messages in thread
From: Junio C Hamano @ 2023-09-11  6:11 UTC (permalink / raw
  To: brian m. carlson; +Cc: Eric W. Biederman, git

"brian m. carlson" <sandals@crustytoothpaste.net> writes:

>> +Documentation/technical/repository-version.txt) with the extension
>> +`objectFormat`, and an optional core.compatMap configuration.
>>  
>>  	[core]
>>  		repositoryFormatVersion = 1
>> +		compatMap = on
>>  	[extensions]
>>  		objectFormat = sha256
>> -		compatObjectFormat = sha1
>
> While I'm in favour of an approach that uses the compat map, the
> situation we've implemented here doesn't specify the extra hash
> algorithm.  We want this approach to work just as well for moving from
> SHA-1 to SHA-256 as it might for a future transition from SHA-256 to,
> say, SHA-3-512, if that becomes necessary.
>
> Making a future transition easier has been a goal of my SHA-256 work
> (because who wants to write several hundred patches in such a case?), so
> my hope is we can keep that here as well by explicitly naming the
> algorithm we're using.
>
> I also wonder if an approach that doesn't use an extension is going to
> be helpful.  Say, that I have a repository that is using Git 3.x, which
> supports interop, but I also need to use Git 2.x, which does not.  While
> it's true that Git 2.x can read my SHA-256 repository, it won't write
> the appropriate objects into the map, and thus it will be practically
> very difficult to actually use Git 3.x to push data to a repository of a
> different hash function.  We might well prefer to have Git 2.x not work
> with the repository at all rather than have incomplete data preventing
> us from, well, interoperating.

Very sensible line of thought and suggestion to move the topic
forward.  Very much appreciated.


^ permalink raw reply	[flat|nested] 59+ messages in thread

* Re: [PATCH 12/32] bulk-checkin: hash object with compatibility algorithm
  2023-09-08 23:10 ` [PATCH 12/32] bulk-checkin: hash object with compatibility algorithm Eric W. Biederman
@ 2023-09-11  6:17   ` Junio C Hamano
  0 siblings, 0 replies; 59+ messages in thread
From: Junio C Hamano @ 2023-09-11  6:17 UTC (permalink / raw
  To: Eric W. Biederman; +Cc: git, brian m. carlson

"Eric W. Biederman" <ebiederm@xmission.com> writes:

>  	struct hashfile_checkpoint checkpoint = {0};
>  	struct pack_idx_entry *idx = NULL;
> +	const struct git_hash_algo *compat = the_repository->compat_hash_algo;
> +	struct object_id compat_oid = {};


bulk-checkin.c:267:39: error: ISO C forbids empty initializer braces [-Werror=pedantic]


^ permalink raw reply	[flat|nested] 59+ messages in thread

* Re: [PATCH 14/32] commit: write commits for both hashes
  2023-09-08 23:10 ` [PATCH 14/32] commit: write commits for both hashes Eric W. Biederman
@ 2023-09-11  6:25   ` Junio C Hamano
  0 siblings, 0 replies; 59+ messages in thread
From: Junio C Hamano @ 2023-09-11  6:25 UTC (permalink / raw
  To: Eric W. Biederman; +Cc: git, brian m. carlson

"Eric W. Biederman" <ebiederm@xmission.com> writes:

> +	struct strbuf sig = STRBUF_INIT, compat_sig = STRBUF_INIT;
> +	struct object_id *parent_buf = NULL;
> +	struct object_id compat_oid = {};

Ditto.

	struct object_id compat_oid = { 0 };

would be our zero-initialization convention.

Thanks.

^ permalink raw reply	[flat|nested] 59+ messages in thread

* Re: [PATCH 26/32] object-file-convert: Implement convert_object_file_{begin,step,end}
  2023-09-08 23:10 ` [PATCH 26/32] object-file-convert: Implement convert_object_file_{begin,step,end} Eric W. Biederman
@ 2023-09-11  6:28   ` Junio C Hamano
  0 siblings, 0 replies; 59+ messages in thread
From: Junio C Hamano @ 2023-09-11  6:28 UTC (permalink / raw
  To: Eric W. Biederman; +Cc: git, brian m. carlson

"Eric W. Biederman" <ebiederm@xmission.com> writes:

> +	const struct git_hash_algo *from = state->from;
> +	const struct git_hash_algo *to = state->to;
> +	struct strbuf *out = state->outbuf;
> +	const char *buffer = state->buf;
> +	size_t payload_size, size = state->buf_len;;

The excess ';' at the end is an empty statement, hence ...

> +	struct object_id oid;
>  	const char *p;
> +	int ret = 0;

... these three violate our "no declaration after statement" house rule.
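
That is, the quoted hunk with just the stray ';' dropped keeps the
whole block as declarations:

	size_t payload_size, size = state->buf_len;
	struct object_id oid;
	const char *p;
	int ret = 0;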

^ permalink raw reply	[flat|nested] 59+ messages in thread

* Re: [PATCH 25/32] pack-compat-map:  Add support for .compat files of a packfile
  2023-09-08 23:10 ` [PATCH 25/32] pack-compat-map: Add support for .compat files of a packfile Eric W. Biederman
@ 2023-09-11  6:30   ` Junio C Hamano
  2023-10-05 18:14     ` Taylor Blau
  0 siblings, 1 reply; 59+ messages in thread
From: Junio C Hamano @ 2023-09-11  6:30 UTC (permalink / raw
  To: Eric W. Biederman; +Cc: git, brian m. carlson

"Eric W. Biederman" <ebiederm@xmission.com> writes:

> diff --git a/pack-write.c b/pack-write.c
> index b19ddf15b284..f22eea964f77 100644
> --- a/pack-write.c
> +++ b/pack-write.c
> @@ -12,6 +12,7 @@
>  #include "pack-revindex.h"
>  #include "path.h"
>  #include "strbuf.h"
> +#include "object-file-convert.h"
> ...
> +/*
> + * The *hash contains the pack content hash.
> + * The objects array is passed in sorted.
> + */
> +const char *write_compat_map_file(const char *compat_map_name,
> +				  struct pack_idx_entry **objects,
> +				  int nr_objects, const unsigned char *hash)

Include "pack-compat-map.h"; otherwise the compiler would complain
for missing prototypes.

^ permalink raw reply	[flat|nested] 59+ messages in thread

* Re: [RFC][PATCH 0/32] SHA256 and SHA1 interoperability
  2023-09-08 23:05 [RFC][PATCH 0/32] SHA256 and SHA1 interoperability Eric W. Biederman
                   ` (33 preceding siblings ...)
  2023-09-10 15:38 ` brian m. carlson
@ 2023-09-11  6:37 ` Junio C Hamano
  2023-09-11 16:13   ` Eric W. Biederman
  34 siblings, 1 reply; 59+ messages in thread
From: Junio C Hamano @ 2023-09-11  6:37 UTC (permalink / raw
  To: Eric W. Biederman; +Cc: git, brian m. carlson

"Eric W. Biederman" <ebiederm@xmission.com> writes:

> I would like to see the SHA256 transition happen so I started playing
> with the k2204-transition-interop branch of brian m. carlson's tree.

I needed these tweaks to build the series standalone on 'master' (or
2.42).  There are semantic merge conflicts with some topics in flight
when this is merged to 'seen', so it may take me a bit more time to
push the integration result.

Thanks.

 builtin/fast-import.c | 2 +-
 bulk-checkin.c        | 2 +-
 commit.c              | 2 +-
 object-file-convert.c | 4 ++--
 pack-write.c          | 1 +
 5 files changed, 6 insertions(+), 5 deletions(-)

diff --git a/builtin/fast-import.c b/builtin/fast-import.c
index 66c471bc73..93cc4a491c 100644
--- a/builtin/fast-import.c
+++ b/builtin/fast-import.c
@@ -784,7 +784,7 @@ struct pack_index_names {
 
 static struct pack_index_names create_index(void)
 {
-	struct pack_index_names tmp = {};
+	struct pack_index_names tmp = { 0 };
 	struct pack_idx_entry **idx, **c, **last;
 	struct object_entry *e;
 	struct object_entry_pool *o;
diff --git a/bulk-checkin.c b/bulk-checkin.c
index 3206412a19..d63b3ffa01 100644
--- a/bulk-checkin.c
+++ b/bulk-checkin.c
@@ -264,7 +264,7 @@ static int deflate_blob_to_pack(struct bulk_checkin_packfile *state,
 	struct hashfile_checkpoint checkpoint = {0};
 	struct pack_idx_entry *idx = NULL;
 	const struct git_hash_algo *compat = the_repository->compat_hash_algo;
-	struct object_id compat_oid = {};
+	struct object_id compat_oid = { 0 };
 
 	seekback = lseek(fd, 0, SEEK_CUR);
 	if (seekback == (off_t) -1)
diff --git a/commit.c b/commit.c
index 54f19ed032..2e2b805d5e 100644
--- a/commit.c
+++ b/commit.c
@@ -1654,7 +1654,7 @@ int commit_tree_extended(const char *msg, size_t msg_len,
 	struct strbuf buffer, compat_buffer;
 	struct strbuf sig = STRBUF_INIT, compat_sig = STRBUF_INIT;
 	struct object_id *parent_buf = NULL;
-	struct object_id compat_oid = {};
+	struct object_id compat_oid = { 0 };
 	size_t i, nparents;
 
 	/* Not having i18n.commitencoding is the same as having utf-8 */
diff --git a/object-file-convert.c b/object-file-convert.c
index 2306e17dd5..148e61d24f 100644
--- a/object-file-convert.c
+++ b/object-file-convert.c
@@ -26,7 +26,7 @@ int repo_submodule_oid_to_algop(struct repository *repo,
 
 	for (i = 0; i < repo->index->cache_nr; i++) {
 		const struct cache_entry *ce = repo->index->cache[i];
-		struct repository subrepo = {};
+		struct repository subrepo = { 0 };
 		int ret;
 
 		if (!S_ISGITLINK(ce->ce_mode))
@@ -205,7 +205,7 @@ static int convert_tag_object_step(struct object_file_convert_state *state)
 	const struct git_hash_algo *to = state->to;
 	struct strbuf *out = state->outbuf;
 	const char *buffer = state->buf;
-	size_t payload_size, size = state->buf_len;;
+	size_t payload_size, size = state->buf_len;
 	struct object_id oid;
 	const char *p;
 	int ret = 0;
diff --git a/pack-write.c b/pack-write.c
index f22eea964f..b2ec09737e 100644
--- a/pack-write.c
+++ b/pack-write.c
@@ -7,6 +7,7 @@
 #include "remote.h"
 #include "chunk-format.h"
 #include "pack-mtimes.h"
+#include "pack-compat-map.h"
 #include "oidmap.h"
 #include "pack-objects.h"
 #include "pack-revindex.h"
-- 
2.42.0-158-g94e83dcf5b


^ permalink raw reply related	[flat|nested] 59+ messages in thread

* Re: [RFC][PATCH 0/32] SHA256 and SHA1 interoperability
  2023-09-11  6:37 ` Junio C Hamano
@ 2023-09-11 16:13   ` Eric W. Biederman
  2023-09-11 22:05     ` brian m. carlson
  2023-09-12 16:20     ` Junio C Hamano
  0 siblings, 2 replies; 59+ messages in thread
From: Eric W. Biederman @ 2023-09-11 16:13 UTC (permalink / raw
  To: Junio C Hamano; +Cc: git, brian m. carlson

Junio C Hamano <gitster@pobox.com> writes:

> "Eric W. Biederman" <ebiederm@xmission.com> writes:
>
>> I would like to see the SHA256 transition happen so I started playing
>> with the k2204-transition-interop branch of brian m. carlson's tree.
>
> I needed these tweaks to build the series standalone on 'master' (or
> 2.42).  There are semantic merge conflicts with some topics in flight
> when this is merged to 'seen', so it may take me a bit more time to
> push the integration result.

Junio, brian, thank you for the very warm reception of this; it is very
encouraging.

I am not worried that it will take time to get the changes I
posted into the integration.  I had only envisioned them as good enough
to get the technical ideas across, and had never envisioned them as
being accepted as is.

What I am envisioning as my future directions are:

- Post non-controversial cleanups, so they can be merged.
  (I can only see about 4 of them; the most significant is:
   bulk-checkin: Only accept blobs)

- Sort out the configuration options

- Post the smallest patchset I can that will allow testing the code in
  object-file-convert.c.  Unfortunately for that I need configuration
  options to enable the mapping.

  In starting to write the tests I have already found a bug in
  the conversion of tags (an extra newline is added), and I haven't
  even gotten to testing the tricky bits with signatures.

- Once the object file conversion is tested and solid, work on
  the more substantial pieces.

Does that sound like a reasonable plan?

Eric

^ permalink raw reply	[flat|nested] 59+ messages in thread

* [PATCH v2 02/32] doc hash-function-transition: Replace compatObjectFormat with mapObjectFormat
  2023-09-11  6:11     ` Junio C Hamano
@ 2023-09-11 16:35       ` Eric W. Biederman
  2023-09-11 23:46         ` [PATCH v3 02/32] doc hash-function-transition: Augment compatObjectFormat with readCompatMap Eric W. Biederman
  0 siblings, 1 reply; 59+ messages in thread
From: Eric W. Biederman @ 2023-09-11 16:35 UTC (permalink / raw
  To: Junio C Hamano; +Cc: brian m. carlson, git


Deeply and fundamentally the plan is to only operate on one hash
function for the core of git, to use only one hash function for what
is stored in the repository.

To avoid requiring a flag day to transition hash functions for naming
objects, and to support being able to access objects using legacy object
names, a mapping functionality will be provided.

We want to provide user-facing configuration that is robust enough
that it can accommodate multiple different scenarios for how git
evolves and how people use their repositories.

There are two different ways it is envisioned to use mapped object
ids.  The first is to require every object in the repository to have a
mapping, so that pushes and pulls from repositories using a different
hash algorithm can work.  The second is to have an incomplete mapping
of object ids so that old references to objects in emails, commit
messages, bug trackers and the like are usable in a read-only manner
with tools like "git show".

The first way fundamentally needs every object in the repository to
have a mapping, which requires the repository to be marked incompatible
for writes from older versions of git.  Thus the mapObjectFormat option
is placed in [extensions].

The ext2 family of filesystems has 3 ways of describing new features:
compatible, read-only-compatible, and incompatible.  The current git
configuration has compat (any feature mentioned anywhere in the
configuration outside of the [extensions] section), and incompatible (any
configuration inside of the [extensions] section).  It would be nice to
have a read-only-compatible section for the mandatory mapping
function.  Would it be worth adding it now so that we have it for
future extensions?

Having a mapping that is just used in a read-only mode for looking up
old objects with old object ids will be needed post-transition.  Such
a mode does not require computing the old hash function or even
supporting automatically writing any new mappings.  So it is completely
safe to enable in a backwards-compatible mode.  For that let's
use core.readObjectMap to make it clear the mappings are only read.

I have documented that both of the options readObjectMap and
mapObjectFormat can be specified multiple times if that is needed to
support the desired configuration of git.

Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
---

Posting this to hopefully move the conversation forward.  Unfortunately
I need something like this so I can write tests, so I guess now is the
time to resolve this detail.

 .../technical/hash-function-transition.txt    | 49 ++++++++++++++++---
 1 file changed, 43 insertions(+), 6 deletions(-)

diff --git a/Documentation/technical/hash-function-transition.txt b/Documentation/technical/hash-function-transition.txt
index 4b937480848a..9f5c672d9ad1 100644
--- a/Documentation/technical/hash-function-transition.txt
+++ b/Documentation/technical/hash-function-transition.txt
@@ -149,13 +149,13 @@ Repository format extension
 ~~~~~~~~~~~~~~~~~~~~~~~~~~~
 A SHA-256 repository uses repository format version `1` (see
 Documentation/technical/repository-version.txt) with extensions
-`objectFormat` and `compatObjectFormat`:
+`objectFormat` and `mapObjectFormat`:
 
 	[core]
 		repositoryFormatVersion = 1
 	[extensions]
 		objectFormat = sha256
-		compatObjectFormat = sha1
+		mapObjectFormat = sha1
 
 The combination of setting `core.repositoryFormatVersion=1` and
 populating `extensions.*` ensures that all versions of Git later than
@@ -171,6 +171,43 @@ repository, instead producing an error message.
 		objectformat
 		compatobjectformat
 
+Configuration for a future hash function transition would be:
+
+	[core]
+		repositoryFormatVersion = 1
+	[extensions]
+		objectFormat = futureHash
+		mapObjectFormat = sha256
+		mapObjectFormat = sha1
+
+Or possibly:
+
+	[core]
+		repositoryFormatVersion = 1
+		readObjectMap = sha1
+	[extensions]
+		objectFormat = futureHash
+		mapObjectFormat = sha256
+
+Or post transition to futureHash:
+
+	[core]
+		repositoryFormatVersion = 1
+		readObjectMap = sha1
+		readObjectMap = sha256
+	[extensions]
+		objectFormat = futureHash
+
+The difference between mapObjectFormat and readObjectMap would be that
+readObjectMap would ask git to read existing maps, but would not ask
+git to write or create them.  Which is enough to support looking up
+old oids post transition, when they are only needed to support
+references in commit logs, bug trackers, emails and the like.
+
+Meanwhile with mapObjectFormat set every object in the entire
+repository would be required to have a bi-directional mapping from
+the mapped object format to the repository's storage hash function.
+
 See the "Transition plan" section below for more details on these
 repository extensions.
 
@@ -682,7 +719,7 @@ Some initial steps can be implemented independently of one another:
 - adding support for the PSRC field and safer object pruning
 
 The first user-visible change is the introduction of the objectFormat
-extension (without compatObjectFormat). This requires:
+extension. This requires:
 
 - teaching fsck about this mode of operation
 - using the hash function API (vtable) when computing object names
@@ -690,7 +727,7 @@ extension (without compatObjectFormat). This requires:
 - rejecting attempts to fetch from or push to an incompatible
   repository
 
-Next comes introduction of compatObjectFormat:
+Next comes introduction of mapObjectFormat:
 
 - implementing the loose-object-idx
 - translating object names between object formats
@@ -724,9 +761,9 @@ Over time projects would encourage their users to adopt the "early
 transition" and then "late transition" modes to take advantage of the
 new, more futureproof SHA-256 object names.
 
-When objectFormat and compatObjectFormat are both set, commands
+When objectFormat and mapObjectFormat are both set, commands
 generating signatures would generate both SHA-1 and SHA-256 signatures
 by default to support both new and old users.
 
 In projects using SHA-256 heavily, users could be encouraged to adopt
 the "post-transition" mode to avoid accidentally making implicit use
-- 
2.41.0


^ permalink raw reply related	[flat|nested] 59+ messages in thread

* Re: [RFC][PATCH 0/32] SHA256 and SHA1 interoperability
  2023-09-11 16:13   ` Eric W. Biederman
@ 2023-09-11 22:05     ` brian m. carlson
  2023-09-12 21:19       ` Eric W. Biederman
  2023-09-12 16:20     ` Junio C Hamano
  1 sibling, 1 reply; 59+ messages in thread
From: brian m. carlson @ 2023-09-11 22:05 UTC (permalink / raw
  To: Eric W. Biederman; +Cc: Junio C Hamano, git

[-- Attachment #1: Type: text/plain, Size: 1506 bytes --]

On 2023-09-11 at 16:13:27, Eric W. Biederman wrote:
> Junio, brian for the very warm reception of this, it is very
> encouraging.
> 
> I am not worried about what it will take time to get the changes I
> posted into the integration.  I had only envisioned them as good enough
> to get the technical ideas across, and had never envisioned them as
> being accepted as is.
> 
> What I am envisioning as my future directions are:
> 
> - Post non controversial cleanups, so they can be merged.
>   (I can only see about 4 of them the most significant is:
>    bulk-checkin: Only accept blobs)
> 
> - Sort out the configuration options
> 
> - Post the smallest patchset I can that will allow testing the code in
>   object-file-convert.c.  Unfortunately for that I need configuration
>   options to enable the mapping.
> 
>   In starting to write the tests I have already found a bug in
>   the conversion of tags (an extra newline is added), and I haven't
>   even gotten to testing the tricky bits with signatures.

I wonder if unit tests are a possibility here now that we're starting to
use them.  They're not obligatory, of course, but it may be more
convenient for you if they turn out to be a suitable option.  If not, no
big deal.

> - Once the object file conversion is tested and is solid work on
>   the more substantial pieces.
> 
> Does that sound like a reasonable plan?

Yeah, that seems fine.
-- 
brian m. carlson (he/him or they/them)
Toronto, Ontario, CA

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 263 bytes --]

^ permalink raw reply	[flat|nested] 59+ messages in thread

* [PATCH v3 02/32] doc hash-function-transition: Augment compatObjectFormat with readCompatMap
  2023-09-11 16:35       ` [PATCH v2 02/32] doc hash-function-transition: Replace compatObjectFormat with mapObjectFormat Eric W. Biederman
@ 2023-09-11 23:46         ` Eric W. Biederman
  2023-09-12  7:57           ` Oswald Buddenhagen
  0 siblings, 1 reply; 59+ messages in thread
From: Eric W. Biederman @ 2023-09-11 23:46 UTC (permalink / raw
  To: Junio C Hamano; +Cc: brian m. carlson, git


Deeply and fundamentally the plan is to only operate on one hash
function for the core of git, to use only one hash function for what
is stored in the repository.

To avoid requiring a flag day to transition hash functions for naming
objects, and to support being able to access objects using legacy object
names, a mapping functionality will be provided.

We want to provide user-facing configuration that is robust enough
that it can accommodate multiple different scenarios for how git
evolves and how people use their repositories.

There are two different ways it is envisioned to use mapped object
ids.  The first is to require every object in the repository to have a
mapping, so that pushes and pulls from repositories using a different
hash algorithm can work.  The second is to have an incomplete mapping
of object ids so that old references to objects in emails, commit
messages, bug trackers and the like are usable in a read-only manner
with tools like "git show".

The first way fundamentally needs every object in the repository to
have a mapping, which requires the repository to be marked incompatible
for writes from older versions of git.  Thus the compatObjectFormat option
is placed in [extensions].

The ext2 family of filesystems has 3 ways of describing new features:
compatible, read-only-compatible, and incompatible.  The current git
configuration has compat (any feature mentioned anywhere in the
configuration outside of the [extensions] section), and incompatible (any
configuration inside of the [extensions] section).  It would be nice to
have a read-only-compatible section for the mandatory mapping
function.  Would it be worth adding it now so that we have it for
future extensions?

Having a mapping that is just used in a read-only mode for looking up
old objects with old object ids will be needed post-transition.  Such
a mode does not require computing the old hash function or even
supporting automatically writing any new mappings.  So it is completely
safe to enable in a backwards-compatible mode.  For that let's
use core.readCompatMap to make it clear the mappings are only read.

I have documented that both of the options readCompatMap and
compatObjectFormat can be specified multiple times if that is needed to
support the desired configuration of git.

Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
---

My v2 version was just silly.  Changing the name of the option in
the [extensions] section made no practical sense; it was just me being
contrary for no good reason.  I still think we should have an additional
option for reading old hashes, and to document that we expect multiple of
these.

So here is my proposal for extending the documentation along those
lines.

Additionally, just accepting the existing option name means I am not
bottlenecked on writing tests for convert_object_file, which is the
important part right now.

My apologies for all of the noise.

 .../technical/hash-function-transition.txt    | 37 +++++++++++++++++++
 1 file changed, 37 insertions(+)

diff --git a/Documentation/technical/hash-function-transition.txt b/Documentation/technical/hash-function-transition.txt
index 4b937480848a..26dfc3138b3b 100644
--- a/Documentation/technical/hash-function-transition.txt
+++ b/Documentation/technical/hash-function-transition.txt
@@ -171,6 +171,43 @@ repository, instead producing an error message.
 		objectformat
 		compatobjectformat
 
+Configuration for a future hash function transition would be:
+
+	[core]
+		repositoryFormatVersion = 1
+	[extensions]
+		objectFormat = futureHash
+		compatObjectFormat = sha256
+		compatObjectFormat = sha1
+
+Or possibly:
+
+	[core]
+		repositoryFormatVersion = 1
+		readCompatMap = sha1
+	[extensions]
+		objectFormat = futureHash
+		compatObjectFormat = sha256
+
+Or post transition to futureHash:
+
+	[core]
+		repositoryFormatVersion = 1
+		readCompatMap = sha1
+		readCompatMap = sha256
+	[extensions]
+		objectFormat = futureHash
+
+The difference between compatObjectFormat and readCompatMap would be that
+readCompatMap would ask git to read existing maps, but would not ask
+git to write or create them.  Which is enough to support looking up
+old oids post transition, when they are only needed to support
+references in commit logs, bug trackers, emails and the like.
+
+Meanwhile with compatObjectFormat set every object in the entire
+repository would be required to have a bi-directional mapping from
+the mapped object format to the repository's storage hash function.
+
 See the "Transition plan" section below for more details on these
 repository extensions.
 
-- 
2.41.0


^ permalink raw reply related	[flat|nested] 59+ messages in thread

* Re: [PATCH 01/32] doc hash-file-transition: A map file for mapping between sha1 and sha256
  2023-09-08 23:10 ` [PATCH 01/32] doc hash-file-transition: A map file for mapping between sha1 and sha256 Eric W. Biederman
  2023-09-10 14:24   ` brian m. carlson
@ 2023-09-12  0:14   ` brian m. carlson
  2023-09-12 13:36     ` Eric W. Biederman
  1 sibling, 1 reply; 59+ messages in thread
From: brian m. carlson @ 2023-09-12  0:14 UTC (permalink / raw
  To: Eric W. Biederman; +Cc: git, Junio C Hamano

[-- Attachment #1: Type: text/plain, Size: 3718 bytes --]

On 2023-09-08 at 23:10:18, Eric W. Biederman wrote:
> The v3 pack index file as documented has a lot of complexity making it
> difficult to implement correctly.  I worked with bryan's preliminary
> implementation and it took several passes to get the bugs out.
> 
> The complexity also requires multiple table look-ups to find all of
> the information that is needed to translate from one kind of oid to
> another.  Which can't be good for cache locality.
> 
> Even worse coming up with a new index file version requires making
> changes that have the potentialy to break anything that uses the index
> of a pack file.
> 
> Instead of continuing to deal with the chance of braking things
> besides the oid mapping functionality, the additional complexity in
> the file format, and worry if the performance would be reasonable I
> stripped down the problem to it's fundamental complexity and came up
> with a file format that is exactly about mapping one kind of oid to
> another, and only supports two kinds of oids.
> 
> Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
> ---
>  .../technical/hash-function-transition.txt    | 40 +++++++++++++++++++
>  1 file changed, 40 insertions(+)
> 
> diff --git a/Documentation/technical/hash-function-transition.txt b/Documentation/technical/hash-function-transition.txt
> index ed574810891c..4b937480848a 100644
> --- a/Documentation/technical/hash-function-transition.txt
> +++ b/Documentation/technical/hash-function-transition.txt
> @@ -209,6 +209,46 @@ format described in linkgit:gitformat-pack[5], just like
>  today. The content that is compressed and stored uses SHA-256 content
>  instead of SHA-1 content.
>  
> +Per Pack Mapping Table
> +~~~~~~~~~~~~~~~~~~~~~~
> +A pack compat map (.compat) file has the following format:
> +
> +HEADER:
> +	4-byte signature:
> +	    The signature is: {'C', 'M', 'A', 'P'}
> +	1-byte version number:
> +	    Git only writes or recognizes version 1.
> +	1-byte First Object Id Version
> +	    We infer the length of object IDs (OIDs) from this value:
> +		1 => SHA-1
> +		2 => SHA-256

One thing I forgot to mention here is that we have 32-bit format IDs
for these in the structure, so we should use them here and below.  These
are GIT_SHA1_FORMAT_ID and GIT_SHA256_FORMAT_ID.

Not that I would encourage distributing such software, but it makes it
much easier for people to experiment with additional hash algorithms (in
terms of performance, etc.) if we make the space a little sparser.
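
As a point of reference, this is roughly what that could look like (a
sketch only; the constant values are what I believe hash.h defines, and
the struct and field names are made up for illustration):

	#include <stdint.h>

	/*
	 * 32-bit format IDs: the ASCII bytes of the algorithm name, so
	 * the values read as "sha1" and "s256" in a hex dump.
	 */
	#define GIT_SHA1_FORMAT_ID   0x73686131
	#define GIT_SHA256_FORMAT_ID 0x73323536

	/* Hypothetical header fields using them in place of 1-byte codes. */
	struct compat_map_formats {
		uint32_t first_oid_format;   /* e.g. GIT_SHA1_FORMAT_ID */
		uint32_t second_oid_format;  /* e.g. GIT_SHA256_FORMAT_ID */
	};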

> +	1-byte Second Object Id Version
> +	    We infer the length of object IDs (OIDs) from this value:
> +		1 => SHA-1
> +		2 => SHA-256

In your new patch for the next part, you consider that there might be
multiple compatibility hash algorithms.  I had anticipated only one at
a time in my series, but I'm not opposed to multiple if you want to
support that.

However, here you're making the assumption that there are only two.  If
you want to support multiple values, we need to explicitly consider that
both here (where we need a count of object ID versions and multiple
tables, one for each algorithm), and in the follow-up series.

I had not considered more than two algorithms because it substantially
complicates the code and requires us to develop n*(n-1) tables, but I'm
not the one volunteering to do most of the work here, so I'll defer to
your preference.  (I do intend to send a patch or two, though.)

It's also possible we could be somewhat provident and define the on-disk
formats for multiple algorithms and then punt on the code until later if
you prefer that.
-- 
brian m. carlson (he/him or they/them)
Toronto, Ontario, CA

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 263 bytes --]

^ permalink raw reply	[flat|nested] 59+ messages in thread

* Re: [PATCH v3 02/32] doc hash-function-transition: Augment compatObjectFormat with readCompatMap
  2023-09-11 23:46         ` [PATCH v3 02/32] doc hash-function-transition: Augment compatObjectFormat with readCompatMap Eric W. Biederman
@ 2023-09-12  7:57           ` Oswald Buddenhagen
  2023-09-12 12:11             ` Eric W. Biederman
  0 siblings, 1 reply; 59+ messages in thread
From: Oswald Buddenhagen @ 2023-09-12  7:57 UTC (permalink / raw
  To: Eric W. Biederman; +Cc: Junio C Hamano, brian m. carlson, git

On Mon, Sep 11, 2023 at 06:46:19PM -0500, Eric W. Biederman wrote:
>+The difference between compatObjectFormat and readCompatMap would be
>+that readCompatMap would only ask git to read existing maps, not to
>+write or create them.
> 
the argument makes sense, but the asymmetry in the naming bugs me. in 
particular "[read]compatMap" seems too non-descript.

regards

^ permalink raw reply	[flat|nested] 59+ messages in thread

* Re: [PATCH v3 02/32] doc hash-function-transition: Augment compatObjectFormat with readCompatMap
  2023-09-12  7:57           ` Oswald Buddenhagen
@ 2023-09-12 12:11             ` Eric W. Biederman
  2023-09-13  8:10               ` Oswald Buddenhagen
  0 siblings, 1 reply; 59+ messages in thread
From: Eric W. Biederman @ 2023-09-12 12:11 UTC (permalink / raw
  To: Oswald Buddenhagen; +Cc: Junio C Hamano, brian m. carlson, git

Oswald Buddenhagen <oswald.buddenhagen@gmx.de> writes:

> On Mon, Sep 11, 2023 at 06:46:19PM -0500, Eric W. Biederman wrote:
>>+The difference between compatObjectFormat and readCompatMap would be
>>+that readCompatMap would only ask git to read existing maps, not to
>>+write or create them.
>> 
> the argument makes sense, but the asymmetry in the naming bugs me. in particular
> "[read]compatMap" seems too non-descript.

I am open to suggestions for better names.

From a code point of view I am intending readCompatMap to support only
the things that can be supported with just the mapping functions, aka
repo_oid_to_algop.

Meanwhile the compatObjectFormat case includes what can be done by
using the compatible hash algorithm and convert_object_file.

There is quite a large variation, so there is some fundamental
asymmetry in the implementation.  I am just not certain how to name it.

Eric


^ permalink raw reply	[flat|nested] 59+ messages in thread

* Re: [PATCH 01/32] doc hash-file-transition: A map file for mapping between sha1 and sha256
  2023-09-12  0:14   ` brian m. carlson
@ 2023-09-12 13:36     ` Eric W. Biederman
  0 siblings, 0 replies; 59+ messages in thread
From: Eric W. Biederman @ 2023-09-12 13:36 UTC (permalink / raw
  To: brian m. carlson; +Cc: git, Junio C Hamano

"brian m. carlson" <sandals@crustytoothpaste.net> writes:

> On 2023-09-08 at 23:10:18, Eric W. Biederman wrote:
>> The v3 pack index file as documented has a lot of complexity, making it
>> difficult to implement correctly.  I worked with brian's preliminary
>> implementation and it took several passes to get the bugs out.
>> 
>> The complexity also requires multiple table look-ups to find all of
>> the information that is needed to translate from one kind of oid to
>> another, which can't be good for cache locality.
>> 
>> Even worse, coming up with a new index file version requires making
>> changes that have the potential to break anything that uses the index
>> of a pack file.
>> 
>> Instead of continuing to deal with the chance of breaking things
>> besides the oid mapping functionality, the additional complexity in
>> the file format, and worrying whether the performance would be
>> reasonable, I stripped the problem down to its fundamental complexity
>> and came up with a file format that is exactly about mapping one kind
>> of oid to another, and only supports two kinds of oids.
>> 
>> Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
>> ---
>>  .../technical/hash-function-transition.txt    | 40 +++++++++++++++++++
>>  1 file changed, 40 insertions(+)
>> 
>> diff --git a/Documentation/technical/hash-function-transition.txt b/Documentation/technical/hash-function-transition.txt
>> index ed574810891c..4b937480848a 100644
>> --- a/Documentation/technical/hash-function-transition.txt
>> +++ b/Documentation/technical/hash-function-transition.txt
>> @@ -209,6 +209,46 @@ format described in linkgit:gitformat-pack[5], just like
>>  today. The content that is compressed and stored uses SHA-256 content
>>  instead of SHA-1 content.
>>  
>> +Per Pack Mapping Table
>> +~~~~~~~~~~~~~~~~~~~~~~
>> +A pack compat map (.compat) file has the following format:
>> +
>> +HEADER:
>> +	4-byte signature:
>> +	    The signature is: {'C', 'M', 'A', 'P'}
>> +	1-byte version number:
>> +	    Git only writes or recognizes version 1.
>> +	1-byte First Object Id Version
>> +	    We infer the length of object IDs (OIDs) from this value:
>> +		1 => SHA-1
>> +		2 => SHA-256
>
> One thing I forgot to mention here is that we have 32-bit format IDs
> for these in the structure, so we should use them here and below.  These
> are GIT_SHA1_FORMAT_ID and GIT_SHA256_FORMAT_ID.
>
> Not that I would encourage distributing such software, but it makes it
> much easier for people to experiment with additional hash algorithms (in
> terms of performance, etc.) if we make the space a little sparser.

Unfortunately that ship has already sailed.  If you look at pack
reverse indices, pack mtime files, and multi-pack-index files, they all
use an oid_version field.  So to experiment with a new hash function, a
new number has to be picked.

The only use I can find of your 4-byte format IDs is in the reftable
code.

Using a 4-byte magic number in this case also conflicts with basic
simplicity.  With a one-byte field I can specify it easily, read it
back with no special tools, and understand what it means at a glance.

I admit I can only understand what an oid version field means at a
glance because the variation of object ids is low, but that is
fundamental.  We require global agreement on names.  Fundamentally git
cannot support many object id transitions.  Names are just too
expensive.

When it comes to how the map file is specified, a single byte has real
advantages.  A single byte never needs byte swapping, so it won't be
misread.  Using a single byte for each format also allows me to keep
the header of the file at 16 bytes, which guarantees good alignment of
everything in the file without having to be clever.

All of this is for a file that is strictly local, and the entire
function of these bytes is a sanity check to make certain that
something weird is not going on, or to assist recovery if something bad
happens.

So in this case I don't see the additional agility provided by longer
names helping.
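
For illustration, the 16-byte header argued for above could be laid out
roughly like this (the field names and the reserved tail are my own
sketch, not the definition from the patch):

	#include <stdint.h>

	/*
	 * Sketch of a 16-byte .compat header: every field is a single
	 * byte, so nothing needs byte swapping, and padding the header
	 * to 16 bytes keeps the OID pairs that follow it aligned.
	 */
	struct compat_map_header {
		uint8_t signature[4];   /* { 'C', 'M', 'A', 'P' } */
		uint8_t version;        /* file format version, currently 1 */
		uint8_t first_oid_ver;  /* 1 => SHA-1, 2 => SHA-256 */
		uint8_t second_oid_ver; /* 1 => SHA-1, 2 => SHA-256 */
		uint8_t reserved[9];    /* pad the header out to 16 bytes */
	};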

>> +	1-byte Second Object Id Version
>> +	    We infer the length of object IDs (OIDs) from this value:
>> +		1 => SHA-1
>> +		2 => SHA-256
>
> In your new patch for the next part, you consider that there might be
> multiple compatibility hash algorithms.  I had anticipated only one at
> a time in my series, but I'm not opposed to multiple if you want to
> support that.
>
> However, here you're making the assumption that there are only two.  If
> you want to support multiple values, we need to explicitly consider that
> both here (where we need a count of object ID versions and multiple
> tables, one for each algorithm), and in the follow-up series.
>
> I had not considered more than two algorithms because it substantially
> complicates the code and requires us to develop n*(n-1) tables, but I'm
> not the one volunteering to do most of the work here, so I'll defer to
> your preference.  (I do intend to send a patch or two, though.)
>
> It's also possible we could be somewhat provident and define the on-disk
> formats for multiple algorithms and then punt on the code until later if
> you prefer that.

In the long term I anticipate people disabling compatObjectFormat and
switching to readCompatMap so that they still have access to their old
objects by their original names, but don't have the overhead of
computing a compatibility hash.

In a world where there is a transition to futureHash I anticipate
the files associated with an old pack looking something like:
pack-abcdefg.compat12
pack-abcdefg.compat32

For a repository still using hash version sha256 for storage, with a
mapping to some sha1 names, and a mapping of everything to new
names for compatibility with futureHash.

After transitioning to the futureHash those files would look like:
pack-abcdefg.compat13
pack-abcdefg.compat23

I deeply and fundamentally care about having some way to look up
old names because I do that all of the time.

In my work on the linux-kernel I have found myself frequently digging
into old issues.  I have on my hard drive tglx's git import of the
old bitkeeper tree.  I also have an import of all of the old kernel
releases into git from before the code was stored in bitkeeper.  I find
myself actually using all of those trees when digging into issues.

So I think it is unlikely that we will ever be able to get rid of the
mapping for old converted repositories.  We have entirely too many
references out there.

Which means that for every hash format conversion a repository goes
through I am going to have another collection of old names.


I don't honestly anticipate ever needing to have multiple
compatObjectFormat entries specified for a single repository.  I do
agree that if we are going to worry about forward and backward
compatibility we should be robust and have a configuration file syntax
that can handle the possibility.

I do very much anticipate needing to have multiple readCompatMap
entries, and pretty much only using them in get_short_oid in
object-name.c.  It will make the loop in find_short_packed_compat_object
a little longer but that is about all that will need to be implemented
and maintained long term.


I view this compat map format a lot like the loose objects.  It is
simple and good enough to get us started.  If it turns out we need to
optimize, its simplicity means all of the interfaces in the code to use
it have already been built, and we can just concentrate on optimizing.

Eric


^ permalink raw reply	[flat|nested] 59+ messages in thread

* Re: [RFC][PATCH 0/32] SHA256 and SHA1 interoperability
  2023-09-11 16:13   ` Eric W. Biederman
  2023-09-11 22:05     ` brian m. carlson
@ 2023-09-12 16:20     ` Junio C Hamano
  2023-09-14 19:57       ` Eric W. Biederman
  1 sibling, 1 reply; 59+ messages in thread
From: Junio C Hamano @ 2023-09-12 16:20 UTC (permalink / raw
  To: Eric W. Biederman; +Cc: git, brian m. carlson

"Eric W. Biederman" <ebiederm@xmission.com> writes:

> I am not worried about how long it will take to get the changes I
> posted into the integration.  I had only envisioned them as good enough
> to get the technical ideas across, and had never envisioned them as
> being accepted as is.

Ah, no worries.  By "integration" I did not mean "patches considered
perfect, they are accepted, and are now part of the Git codebase".

All that happens when the patches become part of the 'master'
branch, but before that, patches that prove testable and worthy of
getting tested will be merged to the 'next' branch and spend about a
week there.  What I meant to refer to is a step _before_ that, i.e.
before the patches prove to be testable.  New patches first appear
on the 'seen' branch that merges "everything else" to see the
interaction with all the topics "in flight" (i.e.  not yet in
'master').  The 'seen' branch is reassembled from the latest
iteration of the patches twice or thrice per day, and some patches
are merged to 'next' and down to 'master'; this "merging to prepare
the 'master', 'next' and 'seen' branches for publishing" was what I
meant by "integration".  In short, being queued on 'seen' does not
mean all that much.  It gives project participants an easy access to
view how topics look in the larger picture, potentially interacting
with other topics in flight, but the patches in there can be
replaced wholesale or even dropped if they do not turn out to be
desirable.

I resolved textual conflicts and also compiler-detectable semantic
conflicts (e.g. some in-flight topics may have added callsites to a
function whose signature your topic changes, or vice versa)
to the point that the result compiles while merging this topic to
'seen', but tests are broken big time, it seems, even though the
topic by itself seems to pass the tests standalone.

> What I am envisioning as my future directions are:
> ...
> Does that sound like a reasonable plan?

Nice.

^ permalink raw reply	[flat|nested] 59+ messages in thread

* Re: [RFC][PATCH 0/32] SHA256 and SHA1 interoperability
  2023-09-11 22:05     ` brian m. carlson
@ 2023-09-12 21:19       ` Eric W. Biederman
  0 siblings, 0 replies; 59+ messages in thread
From: Eric W. Biederman @ 2023-09-12 21:19 UTC (permalink / raw
  To: brian m. carlson; +Cc: Junio C Hamano, git

"brian m. carlson" <sandals@crustytoothpaste.net> writes:

> On 2023-09-11 at 16:13:27, Eric W. Biederman wrote:
>> Junio, brian, thank you for the very warm reception of this, it is very
>> encouraging.
>> 
>> I am not worried about how long it will take to get the changes I
>> posted into the integration.  I had only envisioned them as good enough
>> to get the technical ideas across, and had never envisioned them as
>> being accepted as is.
>> 
>> What I am envisioning as my future directions are:
>> 
>> - Post non-controversial cleanups, so they can be merged.
>>   (I can only see about 4 of them; the most significant is:
>>    bulk-checkin: Only accept blobs)
>> 
>> - Sort out the configuration options
>> 
>> - Post the smallest patchset I can that will allow testing the code in
>>   object-file-convert.c.  Unfortunately for that I need configuration
>>   options to enable the mapping.
>> 
>>   In starting to write the tests I have already found a bug in
>>   the conversion of tags (an extra newline is added), and I haven't
>>   even gotten to testing the tricky bits with signatures.
>
> I wonder if unit tests are a possibility here now that we're starting to
> use them.  They're not obligatory, of course, but it may be more
> convenient for you if they turn out to be a suitable option.  If not, no
> big deal.

I believe you mean using test-tool and making a very narrowly focused
test on just the functionality.

If the number of patches I have to go through before I can test anything
becomes a problem I might go there.  Unfortunately it would take some
refactoring to make object-file-convert independent of the object
mapping layer, and that is extra work that is as likely to introduce
bugs as anything.

I have managed to get a set of tests working.  I am just going through
now and plugging the holes.

My big strategy for testing convert_object_file is to build two
repositories, one sha1 and the other sha256, both with compatibility
support enabled.  I add a series of objects to those repositories and
compare them to ensure the objects are identical.

It is working well and is finding bugs not just in convert_object_file
but in code such as commit and tag that perform interesting work with
signed commits.

I discovered I had bungled the placement of hash_object_file for
the compatibility hash in commit.

I found that git tag did not yet support building tags with both
hash algorithms.

Right now I am looking at commits with mergetag lines.  It is not fun.
The mergetag headers, instead of pointing to the tag objects, include
the body of the tag objects in the commit object.  So I have to convert
the embedded tag objects from one hash function to another, which,
given the leading space on every line of the embedded tag object, makes
it doubly interesting.  Perhaps someone has already written code to
extract the embedded tag.
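
A minimal sketch of that extraction (assuming git's strbuf helpers;
illustrative only, not code from this series) could look like:

	#include "git-compat-util.h"
	#include "strbuf.h"

	/*
	 * Reassemble the tag object embedded in a commit's "mergetag"
	 * header, stripping the single leading space that marks each
	 * continuation line.  Illustrative sketch only.
	 */
	static void extract_mergetag(const char *commit_buf, struct strbuf *out)
	{
		const char *p = strstr(commit_buf, "\nmergetag ");
		if (!p)
			return;
		p += strlen("\nmergetag ");
		for (;;) {
			const char *eol = strchr(p, '\n');
			size_t len = eol ? (size_t)(eol - p) : strlen(p);

			strbuf_add(out, p, len);
			strbuf_addch(out, '\n');
			/* the header continues while the next line starts with a space */
			if (!eol || eol[1] != ' ')
				break;
			p = eol + 2; /* skip the newline and the leading space */
		}
	}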

Or in short everything is moving along steadily.

Eric

^ permalink raw reply	[flat|nested] 59+ messages in thread

* Re: [PATCH v3 02/32] doc hash-function-transition: Augment compatObjectFormat with readCompatMap
  2023-09-12 12:11             ` Eric W. Biederman
@ 2023-09-13  8:10               ` Oswald Buddenhagen
  0 siblings, 0 replies; 59+ messages in thread
From: Oswald Buddenhagen @ 2023-09-13  8:10 UTC (permalink / raw
  To: Eric W. Biederman; +Cc: Junio C Hamano, brian m. carlson, git

On Tue, Sep 12, 2023 at 07:11:26AM -0500, Eric W. Biederman wrote:
>Oswald Buddenhagen <oswald.buddenhagen@gmx.de> writes:
>> On Mon, Sep 11, 2023 at 06:46:19PM -0500, Eric W. Biederman wrote:
>>>+The difference between compatObjectFormat and readCompatMap would be
>>>+that readCompatMap would only ask git to read existing maps, not to
>>>+write or create them.
>>> 
>> the argument makes sense, but the asymmetry in the naming bugs me. in particular
>> "[read]compatMap" seems too non-descript.
>
>I am open to suggestions for better names.
>
isn't readCompatObjectFormat an obvious choice?
(and for symmetry, the other then would be writeCompatObjectFormat, i 
guess.)

regards

^ permalink raw reply	[flat|nested] 59+ messages in thread

* Re: [RFC][PATCH 0/32] SHA256 and SHA1 interoperability
  2023-09-12 16:20     ` Junio C Hamano
@ 2023-09-14 19:57       ` Eric W. Biederman
  0 siblings, 0 replies; 59+ messages in thread
From: Eric W. Biederman @ 2023-09-14 19:57 UTC (permalink / raw
  To: Junio C Hamano; +Cc: git, brian m. carlson

Junio C Hamano <gitster@pobox.com> writes:

> "Eric W. Biederman" <ebiederm@xmission.com> writes:
>
>> I am not worried about how long it will take to get the changes I
>> posted into the integration.  I had only envisioned them as good enough
>> to get the technical ideas across, and had never envisioned them as
>> being accepted as is.
>
> Ah, no worries.  By "integration" I did not mean "patches considered
> perfect, they are accepted, and are now part of the Git codebase".
>
> All that happens when the patches become part of the 'master'
> branch, but before that, patches that prove testable and worthy of
> getting tested will be merged to the 'next' branch and spend about a
> week there.  What I meant to refer to is a step _before_ that, i.e.
> before the patches prove to be testable.  New patches first appear
> on the 'seen' branch that merges "everything else" to see the
> interaction with all the topics "in flight" (i.e.  not yet in
> 'master').  The 'seen' branch is reassembled from the latest
> iteration of the patches twice or thrice per day, and some patches
> are merged to 'next' and down to 'master'; this "merging to prepare
> the 'master', 'next' and 'seen' branches for publishing" was what I
> meant by "integration".  In short, being queued on 'seen' does not
> mean all that much.  It gives project participants an easy access to
> view how topics look in the larger picture, potentially interacting
> with other topics in flight, but the patches in there can be
> replaced wholesale or even dropped if they do not turn out to be
> desirable.
>
> I resolved textual conflicts and also compiler-detectable semantic
> conflicts (e.g. some in-flight topics may have added callsites to a
> function whose signature your topic changes, or vice versa)
> to the point that the result compiles while merging this topic to
> 'seen', but tests are broken big time, it seems, even though the
> topic by itself seems to pass the tests standalone.

That the tests are broken is very unfortunate.

I took a look at "What's cooking in git.git" and I did not see my topic
mentioned.  So I presume I would have to perform the test merge myself
to have a sense of what the conflicts were.

Is there a time when the number of in-flight topics is low?  I had a
hunch that basing my work on a brand new release would achieve that,
but I saw a lot of topics in your "What's cooking" email.

I am just trying to figure out a good plan to deal with conflicts,
because the bugs need to be hunted down.

Eric




^ permalink raw reply	[flat|nested] 59+ messages in thread

* Re: [PATCH 25/32] pack-compat-map:  Add support for .compat files of a packfile
  2023-09-11  6:30   ` Junio C Hamano
@ 2023-10-05 18:14     ` Taylor Blau
  0 siblings, 0 replies; 59+ messages in thread
From: Taylor Blau @ 2023-10-05 18:14 UTC (permalink / raw
  To: Junio C Hamano; +Cc: Eric W. Biederman, git, brian m. carlson

On Sun, Sep 10, 2023 at 11:30:49PM -0700, Junio C Hamano wrote:
> "Eric W. Biederman" <ebiederm@xmission.com> writes:
>
> > diff --git a/pack-write.c b/pack-write.c
> > index b19ddf15b284..f22eea964f77 100644
> > --- a/pack-write.c
> > +++ b/pack-write.c
> > @@ -12,6 +12,7 @@
> >  #include "pack-revindex.h"
> >  #include "path.h"
> >  #include "strbuf.h"
> > +#include "object-file-convert.h"
> > ...
> > +/*
> > + * The *hash contains the pack content hash.
> > + * The objects array is passed in sorted.
> > + */
> > +const char *write_compat_map_file(const char *compat_map_name,
> > +				  struct pack_idx_entry **objects,
> > +				  int nr_objects, const unsigned char *hash)
>
> Include "pack-compat-map.h"; otherwise the compiler would complain
> for missing prototypes.

Likewise this is missing an entry in the .gitignore:

--- >8 ---
diff --git a/.gitignore b/.gitignore
index 5e56e471b3..7f5a93a6f6 100644
--- a/.gitignore
+++ b/.gitignore
@@ -152,6 +152,7 @@
 /git-shortlog
 /git-show
 /git-show-branch
+/git-show-compat-map
 /git-show-index
 /git-show-ref
 /git-sparse-checkout
--- 8< ---

Thanks,
Taylor

^ permalink raw reply related	[flat|nested] 59+ messages in thread

end of thread, other threads:[~2023-10-05 18:15 UTC | newest]

Thread overview: 59+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2023-09-08 23:05 [RFC][PATCH 0/32] SHA256 and SHA1 interoperability Eric W. Biederman
2023-09-08 23:10 ` [PATCH 01/32] doc hash-file-transition: A map file for mapping between sha1 and sha256 Eric W. Biederman
2023-09-10 14:24   ` brian m. carlson
2023-09-10 18:07     ` Eric W. Biederman
2023-09-12  0:14   ` brian m. carlson
2023-09-12 13:36     ` Eric W. Biederman
2023-09-08 23:10 ` [PATCH 02/32] doc hash-function-transition: Replace compatObjectFormat with compatMap Eric W. Biederman
2023-09-10 14:34   ` brian m. carlson
2023-09-10 18:00     ` Eric W. Biederman
2023-09-11  6:11     ` Junio C Hamano
2023-09-11 16:35       ` [PATCH v2 02/32] doc hash-function-transition: Replace compatObjectFormat with mapObjectFormat Eric W. Biederman
2023-09-11 23:46         ` [PATCH v3 02/32] doc hash-function-transition: Augment compatObjectFormat with readCompatMap Eric W. Biederman
2023-09-12  7:57           ` Oswald Buddenhagen
2023-09-12 12:11             ` Eric W. Biederman
2023-09-13  8:10               ` Oswald Buddenhagen
2023-09-08 23:10 ` [PATCH 03/32] object-file-convert: Stubs for converting from one object format to another Eric W. Biederman
2023-09-08 23:10 ` [PATCH 04/32] object-name: Initial support for ^{sha1} and ^{sha256} Eric W. Biederman
2023-09-08 23:10 ` [PATCH 05/32] repository: add a compatibility hash algorithm Eric W. Biederman
2023-09-08 23:10 ` [PATCH 06/32] repository: Implement core.compatMap Eric W. Biederman
2023-09-08 23:10 ` [PATCH 07/32] loose: add a mapping between SHA-1 and SHA-256 for loose objects Eric W. Biederman
2023-09-08 23:10 ` [PATCH 08/32] loose: Compatibilty short name support Eric W. Biederman
2023-09-08 23:10 ` [PATCH 09/32] object-file: Update the loose object map when writing loose objects Eric W. Biederman
2023-09-08 23:10 ` [PATCH 10/32] bulk-checkin: Only accept blobs Eric W. Biederman
2023-09-08 23:10 ` [PATCH 11/32] pack: Communicate the compat_oid through struct pack_idx_entry Eric W. Biederman
2023-09-08 23:10 ` [PATCH 12/32] bulk-checkin: hash object with compatibility algorithm Eric W. Biederman
2023-09-11  6:17   ` Junio C Hamano
2023-09-08 23:10 ` [PATCH 13/32] object-file: Add a compat_oid_in parameter to write_object_file_flags Eric W. Biederman
2023-09-08 23:10 ` [PATCH 14/32] commit: write commits for both hashes Eric W. Biederman
2023-09-11  6:25   ` Junio C Hamano
2023-09-08 23:10 ` [PATCH 15/32] cache: add a function to read an OID of a specific algorithm Eric W. Biederman
2023-09-08 23:10 ` [PATCH 16/32] object: Factor out parse_mode out of fast-import and tree-walk into in object.h Eric W. Biederman
2023-09-08 23:10 ` [PATCH 17/32] object-file-convert: add a function to convert trees between algorithms Eric W. Biederman
2023-09-08 23:10 ` [PATCH 18/32] object-file-convert: convert commit objects when writing Eric W. Biederman
2023-09-08 23:10 ` [PATCH 19/32] object-file-convert: convert tag commits " Eric W. Biederman
2023-09-08 23:10 ` [PATCH 20/32] builtin/cat-file: Let the oid determine the output algorithm Eric W. Biederman
2023-09-08 23:10 ` [PATCH 21/32] tree-walk: init_tree_desc take an oid to get the hash algorithm Eric W. Biederman
2023-09-08 23:10 ` [PATCH 22/32] object-file: Handle compat objects in check_object_signature Eric W. Biederman
2023-09-08 23:10 ` [PATCH 23/32] builtin/ls-tree: Let the oid determine the output algorithm Eric W. Biederman
2023-09-08 23:10 ` [PATCH 24/32] builtin/pack-objects: Communicate the compatibility hash through struct pack_idx_entry Eric W. Biederman
2023-09-08 23:10 ` [PATCH 25/32] pack-compat-map: Add support for .compat files of a packfile Eric W. Biederman
2023-09-11  6:30   ` Junio C Hamano
2023-10-05 18:14     ` Taylor Blau
2023-09-08 23:10 ` [PATCH 26/32] object-file-convert: Implement convert_object_file_{begin,step,end} Eric W. Biederman
2023-09-11  6:28   ` Junio C Hamano
2023-09-08 23:10 ` [PATCH 27/32] builtin/fast-import: compute compatibility hashs for imported objects Eric W. Biederman
2023-09-08 23:10 ` [PATCH 28/32] builtin/index-pack: Add a simple oid index Eric W. Biederman
2023-09-08 23:10 ` [PATCH 29/32] builtin/index-pack: Compute the compatibility hash Eric W. Biederman
2023-09-08 23:10 ` [PATCH 30/32] builtin/index-pack: Make the stack in compute_compat_oid explicit Eric W. Biederman
2023-09-08 23:10 ` [PATCH 31/32] unpack-objects: Update to compute and write the compatibility hashes Eric W. Biederman
2023-09-08 23:10 ` [PATCH 32/32] object-file-convert: Implement repo_submodule_oid_to_algop Eric W. Biederman
2023-09-09 12:58 ` [RFC][PATCH 0/32] SHA256 and SHA1 interoperability Eric W. Biederman
2023-09-10 15:38 ` brian m. carlson
2023-09-10 18:20   ` Eric W. Biederman
2023-09-11  6:37 ` Junio C Hamano
2023-09-11 16:13   ` Eric W. Biederman
2023-09-11 22:05     ` brian m. carlson
2023-09-12 21:19       ` Eric W. Biederman
2023-09-12 16:20     ` Junio C Hamano
2023-09-14 19:57       ` Eric W. Biederman

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).