Git Mailing List Archive mirror
 help / color / mirror / Atom feed
* [PATCH 00/11] midx: split MIDX writing routines into midx-write.c, cleanup
@ 2024-03-25 17:24 Taylor Blau
  2024-03-25 17:24 ` [PATCH 01/11] midx-write: initial commit Taylor Blau
                   ` (12 more replies)
  0 siblings, 13 replies; 30+ messages in thread
From: Taylor Blau @ 2024-03-25 17:24 UTC (permalink / raw
  To: git; +Cc: Jeff King, Junio C Hamano

This is a small-ish series that I worked on to split the functions from
midx.c which are responsible for writing MIDX files into a separate
compilation unit, midx-write.c.

This is done for a couple of reasons:

  - It reduces the size of midx.c, which is already quite large, thus
    making it easier to read.

  - It more clearly separates responsibility between the two, similar to
    the division between pack-bitmap.c, and pack-bitmap-write.c.

Most importantly, it makes midx.c easier to read in preparation for
future changes that I hope to make to midx.c in order to support
incremental MIDXs in a repository.

The series is structured as follows:

  - The first seven patches move all writing-related functions from
    midx.c to their new home in midx-write.c.

  - The remaining four patches rewrite `git multi-pack-index repack` in
    terms of `git pack-objects --stdin-packs`, yielding more
    tightly-packed output as a result of the supplemental traversal
    performed by `--stdin-packs`.

The first seven are the main goal of this series, and the remaining four
could be queued separately, depending on how folks feel about them.

Thanks in advance for your review!

Taylor Blau (11):
  midx-write: initial commit
  midx: extern a pair of shared functions
  midx: move `midx_repack` (and related functions) to midx-write.c
  midx: move `expire_midx_packs` to midx-write.c
  midx: move `write_midx_file_only` to midx-write.c
  midx: move `write_midx_file` to midx-write.c
  midx: move `write_midx_internal` (and related functions) to
    midx-write.c
  midx-write.c: avoid directly managed temporary strbuf
  midx-write.c: factor out common want_included_pack() routine
  midx-write.c: check count of packs to repack after grouping
  midx-write.c: use `--stdin-packs` when repacking

 Makefile     |    1 +
 midx-write.c | 1525 ++++++++++++++++++++++++++++++++++++++++++++++++
 midx.c       | 1558 +-------------------------------------------------
 midx.h       |   19 +
 4 files changed, 1559 insertions(+), 1544 deletions(-)
 create mode 100644 midx-write.c


base-commit: 3bd955d26919e149552f34aacf8a4e6368c26cec
-- 
2.44.0.290.g736be63234b

^ permalink raw reply	[flat|nested] 30+ messages in thread

* [PATCH 01/11] midx-write: initial commit
  2024-03-25 17:24 [PATCH 00/11] midx: split MIDX writing routines into midx-write.c, cleanup Taylor Blau
@ 2024-03-25 17:24 ` Taylor Blau
  2024-03-25 20:30   ` Junio C Hamano
  2024-03-25 17:24 ` [PATCH 02/11] midx: extern a pair of shared functions Taylor Blau
                   ` (11 subsequent siblings)
  12 siblings, 1 reply; 30+ messages in thread
From: Taylor Blau @ 2024-03-25 17:24 UTC (permalink / raw
  To: git; +Cc: Jeff King, Junio C Hamano

Introduce a new empty midx-write.c source file. Similar to the
relationship between "pack-bitmap.c" and "pack-bitmap-write.c", this
source file will hold code that is specific to writing MIDX files as
opposed to reading them (the latter will remain in midx.c).

This is a preparatory step which will reduce the overall size of midx.c
and make it easier to read as we prepare for future changes to that file
(outside the immediate scope of this series).

Signed-off-by: Taylor Blau <me@ttaylorr.com>
---
 Makefile     | 1 +
 midx-write.c | 2 ++
 2 files changed, 3 insertions(+)
 create mode 100644 midx-write.c

diff --git a/Makefile b/Makefile
index 4e255c81f2..cf44a964c0 100644
--- a/Makefile
+++ b/Makefile
@@ -1072,6 +1072,7 @@ LIB_OBJS += merge-ort-wrappers.o
 LIB_OBJS += merge-recursive.o
 LIB_OBJS += merge.o
 LIB_OBJS += midx.o
+LIB_OBJS += midx-write.o
 LIB_OBJS += name-hash.o
 LIB_OBJS += negotiator/default.o
 LIB_OBJS += negotiator/noop.o
diff --git a/midx-write.c b/midx-write.c
new file mode 100644
index 0000000000..214179d308
--- /dev/null
+++ b/midx-write.c
@@ -0,0 +1,2 @@
+#include "git-compat-util.h"
+#include "midx.h"
-- 
2.44.0.290.g736be63234b


^ permalink raw reply related	[flat|nested] 30+ messages in thread

* [PATCH 02/11] midx: extern a pair of shared functions
  2024-03-25 17:24 [PATCH 00/11] midx: split MIDX writing routines into midx-write.c, cleanup Taylor Blau
  2024-03-25 17:24 ` [PATCH 01/11] midx-write: initial commit Taylor Blau
@ 2024-03-25 17:24 ` Taylor Blau
  2024-03-25 17:24 ` [PATCH 03/11] midx: move `midx_repack` (and related functions) to midx-write.c Taylor Blau
                   ` (10 subsequent siblings)
  12 siblings, 0 replies; 30+ messages in thread
From: Taylor Blau @ 2024-03-25 17:24 UTC (permalink / raw
  To: git; +Cc: Jeff King, Junio C Hamano

The following commits will incrementally move code specific to writing
MIDXs from midx.c to their new home in midx-write.c.

Prepare for that by externing a pair of functions which:

  - will (temporarily) need to be called from both midx.c and
    midx-write.c, but

  - are implementation details that should not be exposed via the midx.h
    header.

Declare these functions as extern within midx-write.c, and introduce a
similar (non-extern) declaration within midx.c. This change will be
effectively reverted by the end of this series.

Signed-off-by: Taylor Blau <me@ttaylorr.com>
---
 midx-write.c | 10 ++++++++++
 midx.c       | 24 +++++++++++++++++-------
 2 files changed, 27 insertions(+), 7 deletions(-)

diff --git a/midx-write.c b/midx-write.c
index 214179d308..4aab273243 100644
--- a/midx-write.c
+++ b/midx-write.c
@@ -1,2 +1,12 @@
 #include "git-compat-util.h"
 #include "midx.h"
+
+extern int write_midx_internal(const char *object_dir,
+			       struct string_list *packs_to_include,
+			       struct string_list *packs_to_drop,
+			       const char *preferred_pack_name,
+			       const char *refs_snapshot,
+			       unsigned flags);
+
+extern struct multi_pack_index *lookup_multi_pack_index(struct repository *r,
+							const char *object_dir);
diff --git a/midx.c b/midx.c
index 85e1c2cd12..5f22f01716 100644
--- a/midx.c
+++ b/midx.c
@@ -23,6 +23,16 @@
 #include "list-objects.h"
 #include "pack-revindex.h"
 
+struct multi_pack_index *lookup_multi_pack_index(struct repository *r,
+							const char *object_dir);
+
+int write_midx_internal(const char *object_dir,
+			       struct string_list *packs_to_include,
+			       struct string_list *packs_to_drop,
+			       const char *preferred_pack_name,
+			       const char *refs_snapshot,
+			       unsigned flags);
+
 #define MIDX_SIGNATURE 0x4d494458 /* "MIDX" */
 #define MIDX_VERSION 1
 #define MIDX_BYTE_FILE_VERSION 4
@@ -1347,7 +1357,7 @@ static int write_midx_bitmap(const char *midx_name,
 	return ret;
 }
 
-static struct multi_pack_index *lookup_multi_pack_index(struct repository *r,
+struct multi_pack_index *lookup_multi_pack_index(struct repository *r,
 							const char *object_dir)
 {
 	struct multi_pack_index *result = NULL;
@@ -1372,12 +1382,12 @@ static struct multi_pack_index *lookup_multi_pack_index(struct repository *r,
 	return result;
 }
 
-static int write_midx_internal(const char *object_dir,
-			       struct string_list *packs_to_include,
-			       struct string_list *packs_to_drop,
-			       const char *preferred_pack_name,
-			       const char *refs_snapshot,
-			       unsigned flags)
+int write_midx_internal(const char *object_dir,
+			struct string_list *packs_to_include,
+			struct string_list *packs_to_drop,
+			const char *preferred_pack_name,
+			const char *refs_snapshot,
+			unsigned flags)
 {
 	struct strbuf midx_name = STRBUF_INIT;
 	unsigned char midx_hash[GIT_MAX_RAWSZ];
-- 
2.44.0.290.g736be63234b


^ permalink raw reply related	[flat|nested] 30+ messages in thread

* [PATCH 03/11] midx: move `midx_repack` (and related functions) to midx-write.c
  2024-03-25 17:24 [PATCH 00/11] midx: split MIDX writing routines into midx-write.c, cleanup Taylor Blau
  2024-03-25 17:24 ` [PATCH 01/11] midx-write: initial commit Taylor Blau
  2024-03-25 17:24 ` [PATCH 02/11] midx: extern a pair of shared functions Taylor Blau
@ 2024-03-25 17:24 ` Taylor Blau
  2024-03-25 17:24 ` [PATCH 04/11] midx: move `expire_midx_packs` " Taylor Blau
                   ` (9 subsequent siblings)
  12 siblings, 0 replies; 30+ messages in thread
From: Taylor Blau @ 2024-03-25 17:24 UTC (permalink / raw
  To: git; +Cc: Jeff King, Junio C Hamano

Move `midx_repack()`, the main function which implements the sub-command
'git multi-pack-index repack' into midx-write.c.

This patch does not introduce any behavioral changes and is best viewed
with `--color-moved`.

Signed-off-by: Taylor Blau <me@ttaylorr.com>
---
 midx-write.c | 202 +++++++++++++++++++++++++++++++++++++++++++++++++++
 midx.c       | 196 -------------------------------------------------
 2 files changed, 202 insertions(+), 196 deletions(-)

diff --git a/midx-write.c b/midx-write.c
index 4aab273243..6dd58be7e0 100644
--- a/midx-write.c
+++ b/midx-write.c
@@ -1,5 +1,11 @@
 #include "git-compat-util.h"
+#include "config.h"
+#include "hex.h"
+#include "packfile.h"
 #include "midx.h"
+#include "run-command.h"
+#include "pack-bitmap.h"
+#include "revision.h"
 
 extern int write_midx_internal(const char *object_dir,
 			       struct string_list *packs_to_include,
@@ -10,3 +16,199 @@ extern int write_midx_internal(const char *object_dir,
 
 extern struct multi_pack_index *lookup_multi_pack_index(struct repository *r,
 							const char *object_dir);
+
+struct repack_info {
+	timestamp_t mtime;
+	uint32_t referenced_objects;
+	uint32_t pack_int_id;
+};
+
+static int compare_by_mtime(const void *a_, const void *b_)
+{
+	const struct repack_info *a, *b;
+
+	a = (const struct repack_info *)a_;
+	b = (const struct repack_info *)b_;
+
+	if (a->mtime < b->mtime)
+		return -1;
+	if (a->mtime > b->mtime)
+		return 1;
+	return 0;
+}
+
+static int fill_included_packs_all(struct repository *r,
+				   struct multi_pack_index *m,
+				   unsigned char *include_pack)
+{
+	uint32_t i, count = 0;
+	int pack_kept_objects = 0;
+
+	repo_config_get_bool(r, "repack.packkeptobjects", &pack_kept_objects);
+
+	for (i = 0; i < m->num_packs; i++) {
+		if (prepare_midx_pack(r, m, i))
+			continue;
+		if (!pack_kept_objects && m->packs[i]->pack_keep)
+			continue;
+		if (m->packs[i]->is_cruft)
+			continue;
+
+		include_pack[i] = 1;
+		count++;
+	}
+
+	return count < 2;
+}
+
+static int fill_included_packs_batch(struct repository *r,
+				     struct multi_pack_index *m,
+				     unsigned char *include_pack,
+				     size_t batch_size)
+{
+	uint32_t i, packs_to_repack;
+	size_t total_size;
+	struct repack_info *pack_info;
+	int pack_kept_objects = 0;
+
+	CALLOC_ARRAY(pack_info, m->num_packs);
+
+	repo_config_get_bool(r, "repack.packkeptobjects", &pack_kept_objects);
+
+	for (i = 0; i < m->num_packs; i++) {
+		pack_info[i].pack_int_id = i;
+
+		if (prepare_midx_pack(r, m, i))
+			continue;
+
+		pack_info[i].mtime = m->packs[i]->mtime;
+	}
+
+	for (i = 0; i < m->num_objects; i++) {
+		uint32_t pack_int_id = nth_midxed_pack_int_id(m, i);
+		pack_info[pack_int_id].referenced_objects++;
+	}
+
+	QSORT(pack_info, m->num_packs, compare_by_mtime);
+
+	total_size = 0;
+	packs_to_repack = 0;
+	for (i = 0; total_size < batch_size && i < m->num_packs; i++) {
+		int pack_int_id = pack_info[i].pack_int_id;
+		struct packed_git *p = m->packs[pack_int_id];
+		size_t expected_size;
+
+		if (!p)
+			continue;
+		if (!pack_kept_objects && p->pack_keep)
+			continue;
+		if (p->is_cruft)
+			continue;
+		if (open_pack_index(p) || !p->num_objects)
+			continue;
+
+		expected_size = st_mult(p->pack_size,
+					pack_info[i].referenced_objects);
+		expected_size /= p->num_objects;
+
+		if (expected_size >= batch_size)
+			continue;
+
+		packs_to_repack++;
+		total_size += expected_size;
+		include_pack[pack_int_id] = 1;
+	}
+
+	free(pack_info);
+
+	if (packs_to_repack < 2)
+		return 1;
+
+	return 0;
+}
+
+int midx_repack(struct repository *r, const char *object_dir, size_t batch_size, unsigned flags)
+{
+	int result = 0;
+	uint32_t i;
+	unsigned char *include_pack;
+	struct child_process cmd = CHILD_PROCESS_INIT;
+	FILE *cmd_in;
+	struct strbuf base_name = STRBUF_INIT;
+	struct multi_pack_index *m = lookup_multi_pack_index(r, object_dir);
+
+	/*
+	 * When updating the default for these configuration
+	 * variables in builtin/repack.c, these must be adjusted
+	 * to match.
+	 */
+	int delta_base_offset = 1;
+	int use_delta_islands = 0;
+
+	if (!m)
+		return 0;
+
+	CALLOC_ARRAY(include_pack, m->num_packs);
+
+	if (batch_size) {
+		if (fill_included_packs_batch(r, m, include_pack, batch_size))
+			goto cleanup;
+	} else if (fill_included_packs_all(r, m, include_pack))
+		goto cleanup;
+
+	repo_config_get_bool(r, "repack.usedeltabaseoffset", &delta_base_offset);
+	repo_config_get_bool(r, "repack.usedeltaislands", &use_delta_islands);
+
+	strvec_push(&cmd.args, "pack-objects");
+
+	strbuf_addstr(&base_name, object_dir);
+	strbuf_addstr(&base_name, "/pack/pack");
+	strvec_push(&cmd.args, base_name.buf);
+
+	if (delta_base_offset)
+		strvec_push(&cmd.args, "--delta-base-offset");
+	if (use_delta_islands)
+		strvec_push(&cmd.args, "--delta-islands");
+
+	if (flags & MIDX_PROGRESS)
+		strvec_push(&cmd.args, "--progress");
+	else
+		strvec_push(&cmd.args, "-q");
+
+	strbuf_release(&base_name);
+
+	cmd.git_cmd = 1;
+	cmd.in = cmd.out = -1;
+
+	if (start_command(&cmd)) {
+		error(_("could not start pack-objects"));
+		result = 1;
+		goto cleanup;
+	}
+
+	cmd_in = xfdopen(cmd.in, "w");
+
+	for (i = 0; i < m->num_objects; i++) {
+		struct object_id oid;
+		uint32_t pack_int_id = nth_midxed_pack_int_id(m, i);
+
+		if (!include_pack[pack_int_id])
+			continue;
+
+		nth_midxed_object_oid(&oid, m, i);
+		fprintf(cmd_in, "%s\n", oid_to_hex(&oid));
+	}
+	fclose(cmd_in);
+
+	if (finish_command(&cmd)) {
+		error(_("could not finish pack-objects"));
+		result = 1;
+		goto cleanup;
+	}
+
+	result = write_midx_internal(object_dir, NULL, NULL, NULL, NULL, flags);
+
+cleanup:
+	free(include_pack);
+	return result;
+}
diff --git a/midx.c b/midx.c
index 5f22f01716..3bd8c58642 100644
--- a/midx.c
+++ b/midx.c
@@ -2055,199 +2055,3 @@ int expire_midx_packs(struct repository *r, const char *object_dir, unsigned fla
 
 	return result;
 }
-
-struct repack_info {
-	timestamp_t mtime;
-	uint32_t referenced_objects;
-	uint32_t pack_int_id;
-};
-
-static int compare_by_mtime(const void *a_, const void *b_)
-{
-	const struct repack_info *a, *b;
-
-	a = (const struct repack_info *)a_;
-	b = (const struct repack_info *)b_;
-
-	if (a->mtime < b->mtime)
-		return -1;
-	if (a->mtime > b->mtime)
-		return 1;
-	return 0;
-}
-
-static int fill_included_packs_all(struct repository *r,
-				   struct multi_pack_index *m,
-				   unsigned char *include_pack)
-{
-	uint32_t i, count = 0;
-	int pack_kept_objects = 0;
-
-	repo_config_get_bool(r, "repack.packkeptobjects", &pack_kept_objects);
-
-	for (i = 0; i < m->num_packs; i++) {
-		if (prepare_midx_pack(r, m, i))
-			continue;
-		if (!pack_kept_objects && m->packs[i]->pack_keep)
-			continue;
-		if (m->packs[i]->is_cruft)
-			continue;
-
-		include_pack[i] = 1;
-		count++;
-	}
-
-	return count < 2;
-}
-
-static int fill_included_packs_batch(struct repository *r,
-				     struct multi_pack_index *m,
-				     unsigned char *include_pack,
-				     size_t batch_size)
-{
-	uint32_t i, packs_to_repack;
-	size_t total_size;
-	struct repack_info *pack_info;
-	int pack_kept_objects = 0;
-
-	CALLOC_ARRAY(pack_info, m->num_packs);
-
-	repo_config_get_bool(r, "repack.packkeptobjects", &pack_kept_objects);
-
-	for (i = 0; i < m->num_packs; i++) {
-		pack_info[i].pack_int_id = i;
-
-		if (prepare_midx_pack(r, m, i))
-			continue;
-
-		pack_info[i].mtime = m->packs[i]->mtime;
-	}
-
-	for (i = 0; i < m->num_objects; i++) {
-		uint32_t pack_int_id = nth_midxed_pack_int_id(m, i);
-		pack_info[pack_int_id].referenced_objects++;
-	}
-
-	QSORT(pack_info, m->num_packs, compare_by_mtime);
-
-	total_size = 0;
-	packs_to_repack = 0;
-	for (i = 0; total_size < batch_size && i < m->num_packs; i++) {
-		int pack_int_id = pack_info[i].pack_int_id;
-		struct packed_git *p = m->packs[pack_int_id];
-		size_t expected_size;
-
-		if (!p)
-			continue;
-		if (!pack_kept_objects && p->pack_keep)
-			continue;
-		if (p->is_cruft)
-			continue;
-		if (open_pack_index(p) || !p->num_objects)
-			continue;
-
-		expected_size = st_mult(p->pack_size,
-					pack_info[i].referenced_objects);
-		expected_size /= p->num_objects;
-
-		if (expected_size >= batch_size)
-			continue;
-
-		packs_to_repack++;
-		total_size += expected_size;
-		include_pack[pack_int_id] = 1;
-	}
-
-	free(pack_info);
-
-	if (packs_to_repack < 2)
-		return 1;
-
-	return 0;
-}
-
-int midx_repack(struct repository *r, const char *object_dir, size_t batch_size, unsigned flags)
-{
-	int result = 0;
-	uint32_t i;
-	unsigned char *include_pack;
-	struct child_process cmd = CHILD_PROCESS_INIT;
-	FILE *cmd_in;
-	struct strbuf base_name = STRBUF_INIT;
-	struct multi_pack_index *m = lookup_multi_pack_index(r, object_dir);
-
-	/*
-	 * When updating the default for these configuration
-	 * variables in builtin/repack.c, these must be adjusted
-	 * to match.
-	 */
-	int delta_base_offset = 1;
-	int use_delta_islands = 0;
-
-	if (!m)
-		return 0;
-
-	CALLOC_ARRAY(include_pack, m->num_packs);
-
-	if (batch_size) {
-		if (fill_included_packs_batch(r, m, include_pack, batch_size))
-			goto cleanup;
-	} else if (fill_included_packs_all(r, m, include_pack))
-		goto cleanup;
-
-	repo_config_get_bool(r, "repack.usedeltabaseoffset", &delta_base_offset);
-	repo_config_get_bool(r, "repack.usedeltaislands", &use_delta_islands);
-
-	strvec_push(&cmd.args, "pack-objects");
-
-	strbuf_addstr(&base_name, object_dir);
-	strbuf_addstr(&base_name, "/pack/pack");
-	strvec_push(&cmd.args, base_name.buf);
-
-	if (delta_base_offset)
-		strvec_push(&cmd.args, "--delta-base-offset");
-	if (use_delta_islands)
-		strvec_push(&cmd.args, "--delta-islands");
-
-	if (flags & MIDX_PROGRESS)
-		strvec_push(&cmd.args, "--progress");
-	else
-		strvec_push(&cmd.args, "-q");
-
-	strbuf_release(&base_name);
-
-	cmd.git_cmd = 1;
-	cmd.in = cmd.out = -1;
-
-	if (start_command(&cmd)) {
-		error(_("could not start pack-objects"));
-		result = 1;
-		goto cleanup;
-	}
-
-	cmd_in = xfdopen(cmd.in, "w");
-
-	for (i = 0; i < m->num_objects; i++) {
-		struct object_id oid;
-		uint32_t pack_int_id = nth_midxed_pack_int_id(m, i);
-
-		if (!include_pack[pack_int_id])
-			continue;
-
-		nth_midxed_object_oid(&oid, m, i);
-		fprintf(cmd_in, "%s\n", oid_to_hex(&oid));
-	}
-	fclose(cmd_in);
-
-	if (finish_command(&cmd)) {
-		error(_("could not finish pack-objects"));
-		result = 1;
-		goto cleanup;
-	}
-
-	result = write_midx_internal(object_dir, NULL, NULL, NULL, NULL, flags);
-
-cleanup:
-	free(include_pack);
-	return result;
-}
-- 
2.44.0.290.g736be63234b


^ permalink raw reply related	[flat|nested] 30+ messages in thread

* [PATCH 04/11] midx: move `expire_midx_packs` to midx-write.c
  2024-03-25 17:24 [PATCH 00/11] midx: split MIDX writing routines into midx-write.c, cleanup Taylor Blau
                   ` (2 preceding siblings ...)
  2024-03-25 17:24 ` [PATCH 03/11] midx: move `midx_repack` (and related functions) to midx-write.c Taylor Blau
@ 2024-03-25 17:24 ` Taylor Blau
  2024-03-25 17:24 ` [PATCH 05/11] midx: move `write_midx_file_only` " Taylor Blau
                   ` (8 subsequent siblings)
  12 siblings, 0 replies; 30+ messages in thread
From: Taylor Blau @ 2024-03-25 17:24 UTC (permalink / raw
  To: git; +Cc: Jeff King, Junio C Hamano

Move `expire_midx_packs()`, the main function which implements the
sub-command 'git multi-pack-index expire' into midx-write.c.

Similar to the previous patch, this patch does not introduce any
behavioral changes and is best viewed with `--color-moved`.

Signed-off-by: Taylor Blau <me@ttaylorr.com>
---
 midx-write.c | 58 ++++++++++++++++++++++++++++++++++++++++++++++++++++
 midx.c       | 57 ---------------------------------------------------
 2 files changed, 58 insertions(+), 57 deletions(-)

diff --git a/midx-write.c b/midx-write.c
index 6dd58be7e0..d679e0a131 100644
--- a/midx-write.c
+++ b/midx-write.c
@@ -3,6 +3,7 @@
 #include "hex.h"
 #include "packfile.h"
 #include "midx.h"
+#include "progress.h"
 #include "run-command.h"
 #include "pack-bitmap.h"
 #include "revision.h"
@@ -17,6 +18,63 @@ extern int write_midx_internal(const char *object_dir,
 extern struct multi_pack_index *lookup_multi_pack_index(struct repository *r,
 							const char *object_dir);
 
+int expire_midx_packs(struct repository *r, const char *object_dir, unsigned flags)
+{
+	uint32_t i, *count, result = 0;
+	struct string_list packs_to_drop = STRING_LIST_INIT_DUP;
+	struct multi_pack_index *m = lookup_multi_pack_index(r, object_dir);
+	struct progress *progress = NULL;
+
+	if (!m)
+		return 0;
+
+	CALLOC_ARRAY(count, m->num_packs);
+
+	if (flags & MIDX_PROGRESS)
+		progress = start_delayed_progress(_("Counting referenced objects"),
+					  m->num_objects);
+	for (i = 0; i < m->num_objects; i++) {
+		int pack_int_id = nth_midxed_pack_int_id(m, i);
+		count[pack_int_id]++;
+		display_progress(progress, i + 1);
+	}
+	stop_progress(&progress);
+
+	if (flags & MIDX_PROGRESS)
+		progress = start_delayed_progress(_("Finding and deleting unreferenced packfiles"),
+					  m->num_packs);
+	for (i = 0; i < m->num_packs; i++) {
+		char *pack_name;
+		display_progress(progress, i + 1);
+
+		if (count[i])
+			continue;
+
+		if (prepare_midx_pack(r, m, i))
+			continue;
+
+		if (m->packs[i]->pack_keep || m->packs[i]->is_cruft)
+			continue;
+
+		pack_name = xstrdup(m->packs[i]->pack_name);
+		close_pack(m->packs[i]);
+
+		string_list_insert(&packs_to_drop, m->pack_names[i]);
+		unlink_pack_path(pack_name, 0);
+		free(pack_name);
+	}
+	stop_progress(&progress);
+
+	free(count);
+
+	if (packs_to_drop.nr)
+		result = write_midx_internal(object_dir, NULL, &packs_to_drop, NULL, NULL, flags);
+
+	string_list_clear(&packs_to_drop, 0);
+
+	return result;
+}
+
 struct repack_info {
 	timestamp_t mtime;
 	uint32_t referenced_objects;
diff --git a/midx.c b/midx.c
index 3bd8c58642..5936bc5b9e 100644
--- a/midx.c
+++ b/midx.c
@@ -1998,60 +1998,3 @@ int verify_midx_file(struct repository *r, const char *object_dir, unsigned flag
 
 	return verify_midx_error;
 }
-
-int expire_midx_packs(struct repository *r, const char *object_dir, unsigned flags)
-{
-	uint32_t i, *count, result = 0;
-	struct string_list packs_to_drop = STRING_LIST_INIT_DUP;
-	struct multi_pack_index *m = lookup_multi_pack_index(r, object_dir);
-	struct progress *progress = NULL;
-
-	if (!m)
-		return 0;
-
-	CALLOC_ARRAY(count, m->num_packs);
-
-	if (flags & MIDX_PROGRESS)
-		progress = start_delayed_progress(_("Counting referenced objects"),
-					  m->num_objects);
-	for (i = 0; i < m->num_objects; i++) {
-		int pack_int_id = nth_midxed_pack_int_id(m, i);
-		count[pack_int_id]++;
-		display_progress(progress, i + 1);
-	}
-	stop_progress(&progress);
-
-	if (flags & MIDX_PROGRESS)
-		progress = start_delayed_progress(_("Finding and deleting unreferenced packfiles"),
-					  m->num_packs);
-	for (i = 0; i < m->num_packs; i++) {
-		char *pack_name;
-		display_progress(progress, i + 1);
-
-		if (count[i])
-			continue;
-
-		if (prepare_midx_pack(r, m, i))
-			continue;
-
-		if (m->packs[i]->pack_keep || m->packs[i]->is_cruft)
-			continue;
-
-		pack_name = xstrdup(m->packs[i]->pack_name);
-		close_pack(m->packs[i]);
-
-		string_list_insert(&packs_to_drop, m->pack_names[i]);
-		unlink_pack_path(pack_name, 0);
-		free(pack_name);
-	}
-	stop_progress(&progress);
-
-	free(count);
-
-	if (packs_to_drop.nr)
-		result = write_midx_internal(object_dir, NULL, &packs_to_drop, NULL, NULL, flags);
-
-	string_list_clear(&packs_to_drop, 0);
-
-	return result;
-}
-- 
2.44.0.290.g736be63234b


^ permalink raw reply related	[flat|nested] 30+ messages in thread

* [PATCH 05/11] midx: move `write_midx_file_only` to midx-write.c
  2024-03-25 17:24 [PATCH 00/11] midx: split MIDX writing routines into midx-write.c, cleanup Taylor Blau
                   ` (3 preceding siblings ...)
  2024-03-25 17:24 ` [PATCH 04/11] midx: move `expire_midx_packs` " Taylor Blau
@ 2024-03-25 17:24 ` Taylor Blau
  2024-03-25 17:24 ` [PATCH 06/11] midx: move `write_midx_file` " Taylor Blau
                   ` (7 subsequent siblings)
  12 siblings, 0 replies; 30+ messages in thread
From: Taylor Blau @ 2024-03-25 17:24 UTC (permalink / raw
  To: git; +Cc: Jeff King, Junio C Hamano

Prepare to move the last substantial function related to writing from
midx.c to to midx-write.c by first moving a thin wrapper around it.

Like previous changes, this patch does not introduce any behavioral
changes and is best viewed with `--color-moved`.

Signed-off-by: Taylor Blau <me@ttaylorr.com>
---
 midx-write.c | 10 ++++++++++
 midx.c       | 10 ----------
 2 files changed, 10 insertions(+), 10 deletions(-)

diff --git a/midx-write.c b/midx-write.c
index d679e0a131..635e6af193 100644
--- a/midx-write.c
+++ b/midx-write.c
@@ -18,6 +18,16 @@ extern int write_midx_internal(const char *object_dir,
 extern struct multi_pack_index *lookup_multi_pack_index(struct repository *r,
 							const char *object_dir);
 
+int write_midx_file_only(const char *object_dir,
+			 struct string_list *packs_to_include,
+			 const char *preferred_pack_name,
+			 const char *refs_snapshot,
+			 unsigned flags)
+{
+	return write_midx_internal(object_dir, packs_to_include, NULL,
+				   preferred_pack_name, refs_snapshot, flags);
+}
+
 int expire_midx_packs(struct repository *r, const char *object_dir, unsigned flags)
 {
 	uint32_t i, *count, result = 0;
diff --git a/midx.c b/midx.c
index 5936bc5b9e..702eca805a 100644
--- a/midx.c
+++ b/midx.c
@@ -1764,16 +1764,6 @@ int write_midx_file(const char *object_dir,
 				   refs_snapshot, flags);
 }
 
-int write_midx_file_only(const char *object_dir,
-			 struct string_list *packs_to_include,
-			 const char *preferred_pack_name,
-			 const char *refs_snapshot,
-			 unsigned flags)
-{
-	return write_midx_internal(object_dir, packs_to_include, NULL,
-				   preferred_pack_name, refs_snapshot, flags);
-}
-
 struct clear_midx_data {
 	char *keep;
 	const char *ext;
-- 
2.44.0.290.g736be63234b


^ permalink raw reply related	[flat|nested] 30+ messages in thread

* [PATCH 06/11] midx: move `write_midx_file` to midx-write.c
  2024-03-25 17:24 [PATCH 00/11] midx: split MIDX writing routines into midx-write.c, cleanup Taylor Blau
                   ` (4 preceding siblings ...)
  2024-03-25 17:24 ` [PATCH 05/11] midx: move `write_midx_file_only` " Taylor Blau
@ 2024-03-25 17:24 ` Taylor Blau
  2024-03-25 17:24 ` [PATCH 07/11] midx: move `write_midx_internal` (and related functions) " Taylor Blau
                   ` (6 subsequent siblings)
  12 siblings, 0 replies; 30+ messages in thread
From: Taylor Blau @ 2024-03-25 17:24 UTC (permalink / raw
  To: git; +Cc: Jeff King, Junio C Hamano

Prepare to move the last substantial function related to writing from
midx.c to to midx-write.c by moving another thin wrapper around it.

Like previous changes, this patch does not introduce any behavioral
changes and is best viewed with `--color-moved`.

Signed-off-by: Taylor Blau <me@ttaylorr.com>
---
 midx-write.c | 9 +++++++++
 midx.c       | 9 ---------
 2 files changed, 9 insertions(+), 9 deletions(-)

diff --git a/midx-write.c b/midx-write.c
index 635e6af193..3d7697d8a2 100644
--- a/midx-write.c
+++ b/midx-write.c
@@ -18,6 +18,15 @@ extern int write_midx_internal(const char *object_dir,
 extern struct multi_pack_index *lookup_multi_pack_index(struct repository *r,
 							const char *object_dir);
 
+int write_midx_file(const char *object_dir,
+		    const char *preferred_pack_name,
+		    const char *refs_snapshot,
+		    unsigned flags)
+{
+	return write_midx_internal(object_dir, NULL, NULL, preferred_pack_name,
+				   refs_snapshot, flags);
+}
+
 int write_midx_file_only(const char *object_dir,
 			 struct string_list *packs_to_include,
 			 const char *preferred_pack_name,
diff --git a/midx.c b/midx.c
index 702eca805a..39b5c86736 100644
--- a/midx.c
+++ b/midx.c
@@ -1755,15 +1755,6 @@ int write_midx_internal(const char *object_dir,
 	return result;
 }
 
-int write_midx_file(const char *object_dir,
-		    const char *preferred_pack_name,
-		    const char *refs_snapshot,
-		    unsigned flags)
-{
-	return write_midx_internal(object_dir, NULL, NULL, preferred_pack_name,
-				   refs_snapshot, flags);
-}
-
 struct clear_midx_data {
 	char *keep;
 	const char *ext;
-- 
2.44.0.290.g736be63234b


^ permalink raw reply related	[flat|nested] 30+ messages in thread

* [PATCH 07/11] midx: move `write_midx_internal` (and related functions) to midx-write.c
  2024-03-25 17:24 [PATCH 00/11] midx: split MIDX writing routines into midx-write.c, cleanup Taylor Blau
                   ` (5 preceding siblings ...)
  2024-03-25 17:24 ` [PATCH 06/11] midx: move `write_midx_file` " Taylor Blau
@ 2024-03-25 17:24 ` Taylor Blau
  2024-03-25 17:24 ` [PATCH 08/11] midx-write.c: avoid directly managed temporary strbuf Taylor Blau
                   ` (5 subsequent siblings)
  12 siblings, 0 replies; 30+ messages in thread
From: Taylor Blau @ 2024-03-25 17:24 UTC (permalink / raw
  To: git; +Cc: Jeff King, Junio C Hamano

Move the last writing-related function from midx.c to midx-write.c. This
patch moves `write_midx_internal()`, along with all of the functions
used to implement it into midx-write.c.

Like previous patch, this patch too does not introduce any functional
changes, and is best moved with `--color-moved`.

Signed-off-by: Taylor Blau <me@ttaylorr.com>
---
 midx-write.c | 1243 ++++++++++++++++++++++++++++++++++++++++++++++-
 midx.c       | 1296 +-------------------------------------------------
 midx.h       |   19 +
 3 files changed, 1272 insertions(+), 1286 deletions(-)

diff --git a/midx-write.c b/midx-write.c
index 3d7697d8a2..c812156cbd 100644
--- a/midx-write.c
+++ b/midx-write.c
@@ -1,22 +1,1257 @@
 #include "git-compat-util.h"
+#include "abspath.h"
 #include "config.h"
 #include "hex.h"
+#include "lockfile.h"
 #include "packfile.h"
+#include "object-file.h"
+#include "hash-lookup.h"
 #include "midx.h"
 #include "progress.h"
+#include "trace2.h"
 #include "run-command.h"
+#include "chunk-format.h"
 #include "pack-bitmap.h"
+#include "refs.h"
 #include "revision.h"
+#include "list-objects.h"
 
-extern int write_midx_internal(const char *object_dir,
+#define PACK_EXPIRED UINT_MAX
+#define BITMAP_POS_UNKNOWN (~((uint32_t)0))
+#define MIDX_CHUNK_FANOUT_SIZE (sizeof(uint32_t) * 256)
+#define MIDX_CHUNK_LARGE_OFFSET_WIDTH (sizeof(uint64_t))
+
+extern int midx_checksum_valid(struct multi_pack_index *m);
+extern void clear_midx_files_ext(const char *object_dir, const char *ext,
+				 unsigned char *keep_hash);
+extern int cmp_idx_or_pack_name(const char *idx_or_pack_name,
+				const char *idx_name);
+
+static size_t write_midx_header(struct hashfile *f,
+				unsigned char num_chunks,
+				uint32_t num_packs)
+{
+	hashwrite_be32(f, MIDX_SIGNATURE);
+	hashwrite_u8(f, MIDX_VERSION);
+	hashwrite_u8(f, oid_version(the_hash_algo));
+	hashwrite_u8(f, num_chunks);
+	hashwrite_u8(f, 0); /* unused */
+	hashwrite_be32(f, num_packs);
+
+	return MIDX_HEADER_SIZE;
+}
+
+struct pack_info {
+	uint32_t orig_pack_int_id;
+	char *pack_name;
+	struct packed_git *p;
+
+	uint32_t bitmap_pos;
+	uint32_t bitmap_nr;
+
+	unsigned expired : 1;
+};
+
+static void fill_pack_info(struct pack_info *info,
+			   struct packed_git *p, const char *pack_name,
+			   uint32_t orig_pack_int_id)
+{
+	memset(info, 0, sizeof(struct pack_info));
+
+	info->orig_pack_int_id = orig_pack_int_id;
+	info->pack_name = xstrdup(pack_name);
+	info->p = p;
+	info->bitmap_pos = BITMAP_POS_UNKNOWN;
+}
+
+static int pack_info_compare(const void *_a, const void *_b)
+{
+	struct pack_info *a = (struct pack_info *)_a;
+	struct pack_info *b = (struct pack_info *)_b;
+	return strcmp(a->pack_name, b->pack_name);
+}
+
+static int idx_or_pack_name_cmp(const void *_va, const void *_vb)
+{
+	const char *pack_name = _va;
+	const struct pack_info *compar = _vb;
+
+	return cmp_idx_or_pack_name(pack_name, compar->pack_name);
+}
+
+struct write_midx_context {
+	struct pack_info *info;
+	size_t nr;
+	size_t alloc;
+	struct multi_pack_index *m;
+	struct progress *progress;
+	unsigned pack_paths_checked;
+
+	struct pack_midx_entry *entries;
+	size_t entries_nr;
+
+	uint32_t *pack_perm;
+	uint32_t *pack_order;
+	unsigned large_offsets_needed:1;
+	uint32_t num_large_offsets;
+
+	int preferred_pack_idx;
+
+	struct string_list *to_include;
+};
+
+static void add_pack_to_midx(const char *full_path, size_t full_path_len,
+			     const char *file_name, void *data)
+{
+	struct write_midx_context *ctx = data;
+	struct packed_git *p;
+
+	if (ends_with(file_name, ".idx")) {
+		display_progress(ctx->progress, ++ctx->pack_paths_checked);
+		/*
+		 * Note that at most one of ctx->m and ctx->to_include are set,
+		 * so we are testing midx_contains_pack() and
+		 * string_list_has_string() independently (guarded by the
+		 * appropriate NULL checks).
+		 *
+		 * We could support passing to_include while reusing an existing
+		 * MIDX, but don't currently since the reuse process drags
+		 * forward all packs from an existing MIDX (without checking
+		 * whether or not they appear in the to_include list).
+		 *
+		 * If we added support for that, these next two conditional
+		 * should be performed independently (likely checking
+		 * to_include before the existing MIDX).
+		 */
+		if (ctx->m && midx_contains_pack(ctx->m, file_name))
+			return;
+		else if (ctx->to_include &&
+			 !string_list_has_string(ctx->to_include, file_name))
+			return;
+
+		ALLOC_GROW(ctx->info, ctx->nr + 1, ctx->alloc);
+
+		p = add_packed_git(full_path, full_path_len, 0);
+		if (!p) {
+			warning(_("failed to add packfile '%s'"),
+				full_path);
+			return;
+		}
+
+		if (open_pack_index(p)) {
+			warning(_("failed to open pack-index '%s'"),
+				full_path);
+			close_pack(p);
+			free(p);
+			return;
+		}
+
+		fill_pack_info(&ctx->info[ctx->nr], p, file_name, ctx->nr);
+		ctx->nr++;
+	}
+}
+
+struct pack_midx_entry {
+	struct object_id oid;
+	uint32_t pack_int_id;
+	time_t pack_mtime;
+	uint64_t offset;
+	unsigned preferred : 1;
+};
+
+static int midx_oid_compare(const void *_a, const void *_b)
+{
+	const struct pack_midx_entry *a = (const struct pack_midx_entry *)_a;
+	const struct pack_midx_entry *b = (const struct pack_midx_entry *)_b;
+	int cmp = oidcmp(&a->oid, &b->oid);
+
+	if (cmp)
+		return cmp;
+
+	/* Sort objects in a preferred pack first when multiple copies exist. */
+	if (a->preferred > b->preferred)
+		return -1;
+	if (a->preferred < b->preferred)
+		return 1;
+
+	if (a->pack_mtime > b->pack_mtime)
+		return -1;
+	else if (a->pack_mtime < b->pack_mtime)
+		return 1;
+
+	return a->pack_int_id - b->pack_int_id;
+}
+
+static int nth_midxed_pack_midx_entry(struct multi_pack_index *m,
+				      struct pack_midx_entry *e,
+				      uint32_t pos)
+{
+	if (pos >= m->num_objects)
+		return 1;
+
+	nth_midxed_object_oid(&e->oid, m, pos);
+	e->pack_int_id = nth_midxed_pack_int_id(m, pos);
+	e->offset = nth_midxed_offset(m, pos);
+
+	/* consider objects in midx to be from "old" packs */
+	e->pack_mtime = 0;
+	return 0;
+}
+
+static void fill_pack_entry(uint32_t pack_int_id,
+			    struct packed_git *p,
+			    uint32_t cur_object,
+			    struct pack_midx_entry *entry,
+			    int preferred)
+{
+	if (nth_packed_object_id(&entry->oid, p, cur_object) < 0)
+		die(_("failed to locate object %d in packfile"), cur_object);
+
+	entry->pack_int_id = pack_int_id;
+	entry->pack_mtime = p->mtime;
+
+	entry->offset = nth_packed_object_offset(p, cur_object);
+	entry->preferred = !!preferred;
+}
+
+struct midx_fanout {
+	struct pack_midx_entry *entries;
+	size_t nr, alloc;
+};
+
+static void midx_fanout_grow(struct midx_fanout *fanout, size_t nr)
+{
+	if (nr < fanout->nr)
+		BUG("negative growth in midx_fanout_grow() (%"PRIuMAX" < %"PRIuMAX")",
+		    (uintmax_t)nr, (uintmax_t)fanout->nr);
+	ALLOC_GROW(fanout->entries, nr, fanout->alloc);
+}
+
+static void midx_fanout_sort(struct midx_fanout *fanout)
+{
+	QSORT(fanout->entries, fanout->nr, midx_oid_compare);
+}
+
+static void midx_fanout_add_midx_fanout(struct midx_fanout *fanout,
+					struct multi_pack_index *m,
+					uint32_t cur_fanout,
+					int preferred_pack)
+{
+	uint32_t start = 0, end;
+	uint32_t cur_object;
+
+	if (cur_fanout)
+		start = ntohl(m->chunk_oid_fanout[cur_fanout - 1]);
+	end = ntohl(m->chunk_oid_fanout[cur_fanout]);
+
+	for (cur_object = start; cur_object < end; cur_object++) {
+		if ((preferred_pack > -1) &&
+		    (preferred_pack == nth_midxed_pack_int_id(m, cur_object))) {
+			/*
+			 * Objects from preferred packs are added
+			 * separately.
+			 */
+			continue;
+		}
+
+		midx_fanout_grow(fanout, fanout->nr + 1);
+		nth_midxed_pack_midx_entry(m,
+					   &fanout->entries[fanout->nr],
+					   cur_object);
+		fanout->entries[fanout->nr].preferred = 0;
+		fanout->nr++;
+	}
+}
+
+static void midx_fanout_add_pack_fanout(struct midx_fanout *fanout,
+					struct pack_info *info,
+					uint32_t cur_pack,
+					int preferred,
+					uint32_t cur_fanout)
+{
+	struct packed_git *pack = info[cur_pack].p;
+	uint32_t start = 0, end;
+	uint32_t cur_object;
+
+	if (cur_fanout)
+		start = get_pack_fanout(pack, cur_fanout - 1);
+	end = get_pack_fanout(pack, cur_fanout);
+
+	for (cur_object = start; cur_object < end; cur_object++) {
+		midx_fanout_grow(fanout, fanout->nr + 1);
+		fill_pack_entry(cur_pack,
+				info[cur_pack].p,
+				cur_object,
+				&fanout->entries[fanout->nr],
+				preferred);
+		fanout->nr++;
+	}
+}
+
+/*
+ * It is possible to artificially get into a state where there are many
+ * duplicate copies of objects. That can create high memory pressure if
+ * we are to create a list of all objects before de-duplication. To reduce
+ * this memory pressure without a significant performance drop, automatically
+ * group objects by the first byte of their object id. Use the IDX fanout
+ * tables to group the data, copy to a local array, then sort.
+ *
+ * Copy only the de-duplicated entries (selected by most-recent modified time
+ * of a packfile containing the object).
+ */
+static struct pack_midx_entry *get_sorted_entries(struct multi_pack_index *m,
+						  struct pack_info *info,
+						  uint32_t nr_packs,
+						  size_t *nr_objects,
+						  int preferred_pack)
+{
+	uint32_t cur_fanout, cur_pack, cur_object;
+	size_t alloc_objects, total_objects = 0;
+	struct midx_fanout fanout = { 0 };
+	struct pack_midx_entry *deduplicated_entries = NULL;
+	uint32_t start_pack = m ? m->num_packs : 0;
+
+	for (cur_pack = start_pack; cur_pack < nr_packs; cur_pack++)
+		total_objects = st_add(total_objects,
+				       info[cur_pack].p->num_objects);
+
+	/*
+	 * As we de-duplicate by fanout value, we expect the fanout
+	 * slices to be evenly distributed, with some noise. Hence,
+	 * allocate slightly more than one 256th.
+	 */
+	alloc_objects = fanout.alloc = total_objects > 3200 ? total_objects / 200 : 16;
+
+	ALLOC_ARRAY(fanout.entries, fanout.alloc);
+	ALLOC_ARRAY(deduplicated_entries, alloc_objects);
+	*nr_objects = 0;
+
+	for (cur_fanout = 0; cur_fanout < 256; cur_fanout++) {
+		fanout.nr = 0;
+
+		if (m)
+			midx_fanout_add_midx_fanout(&fanout, m, cur_fanout,
+						    preferred_pack);
+
+		for (cur_pack = start_pack; cur_pack < nr_packs; cur_pack++) {
+			int preferred = cur_pack == preferred_pack;
+			midx_fanout_add_pack_fanout(&fanout,
+						    info, cur_pack,
+						    preferred, cur_fanout);
+		}
+
+		if (-1 < preferred_pack && preferred_pack < start_pack)
+			midx_fanout_add_pack_fanout(&fanout, info,
+						    preferred_pack, 1,
+						    cur_fanout);
+
+		midx_fanout_sort(&fanout);
+
+		/*
+		 * The batch is now sorted by OID and then mtime (descending).
+		 * Take only the first duplicate.
+		 */
+		for (cur_object = 0; cur_object < fanout.nr; cur_object++) {
+			if (cur_object && oideq(&fanout.entries[cur_object - 1].oid,
+						&fanout.entries[cur_object].oid))
+				continue;
+
+			ALLOC_GROW(deduplicated_entries, st_add(*nr_objects, 1),
+				   alloc_objects);
+			memcpy(&deduplicated_entries[*nr_objects],
+			       &fanout.entries[cur_object],
+			       sizeof(struct pack_midx_entry));
+			(*nr_objects)++;
+		}
+	}
+
+	free(fanout.entries);
+	return deduplicated_entries;
+}
+
+static int write_midx_pack_names(struct hashfile *f, void *data)
+{
+	struct write_midx_context *ctx = data;
+	uint32_t i;
+	unsigned char padding[MIDX_CHUNK_ALIGNMENT];
+	size_t written = 0;
+
+	for (i = 0; i < ctx->nr; i++) {
+		size_t writelen;
+
+		if (ctx->info[i].expired)
+			continue;
+
+		if (i && strcmp(ctx->info[i].pack_name, ctx->info[i - 1].pack_name) <= 0)
+			BUG("incorrect pack-file order: %s before %s",
+			    ctx->info[i - 1].pack_name,
+			    ctx->info[i].pack_name);
+
+		writelen = strlen(ctx->info[i].pack_name) + 1;
+		hashwrite(f, ctx->info[i].pack_name, writelen);
+		written += writelen;
+	}
+
+	/* add padding to be aligned */
+	i = MIDX_CHUNK_ALIGNMENT - (written % MIDX_CHUNK_ALIGNMENT);
+	if (i < MIDX_CHUNK_ALIGNMENT) {
+		memset(padding, 0, sizeof(padding));
+		hashwrite(f, padding, i);
+	}
+
+	return 0;
+}
+
+static int write_midx_bitmapped_packs(struct hashfile *f, void *data)
+{
+	struct write_midx_context *ctx = data;
+	size_t i;
+
+	for (i = 0; i < ctx->nr; i++) {
+		struct pack_info *pack = &ctx->info[i];
+		if (pack->expired)
+			continue;
+
+		if (pack->bitmap_pos == BITMAP_POS_UNKNOWN && pack->bitmap_nr)
+			BUG("pack '%s' has no bitmap position, but has %d bitmapped object(s)",
+			    pack->pack_name, pack->bitmap_nr);
+
+		hashwrite_be32(f, pack->bitmap_pos);
+		hashwrite_be32(f, pack->bitmap_nr);
+	}
+	return 0;
+}
+
+static int write_midx_oid_fanout(struct hashfile *f,
+				 void *data)
+{
+	struct write_midx_context *ctx = data;
+	struct pack_midx_entry *list = ctx->entries;
+	struct pack_midx_entry *last = ctx->entries + ctx->entries_nr;
+	uint32_t count = 0;
+	uint32_t i;
+
+	/*
+	* Write the first-level table (the list is sorted,
+	* but we use a 256-entry lookup to be able to avoid
+	* having to do eight extra binary search iterations).
+	*/
+	for (i = 0; i < 256; i++) {
+		struct pack_midx_entry *next = list;
+
+		while (next < last && next->oid.hash[0] == i) {
+			count++;
+			next++;
+		}
+
+		hashwrite_be32(f, count);
+		list = next;
+	}
+
+	return 0;
+}
+
+static int write_midx_oid_lookup(struct hashfile *f,
+				 void *data)
+{
+	struct write_midx_context *ctx = data;
+	unsigned char hash_len = the_hash_algo->rawsz;
+	struct pack_midx_entry *list = ctx->entries;
+	uint32_t i;
+
+	for (i = 0; i < ctx->entries_nr; i++) {
+		struct pack_midx_entry *obj = list++;
+
+		if (i < ctx->entries_nr - 1) {
+			struct pack_midx_entry *next = list;
+			if (oidcmp(&obj->oid, &next->oid) >= 0)
+				BUG("OIDs not in order: %s >= %s",
+				    oid_to_hex(&obj->oid),
+				    oid_to_hex(&next->oid));
+		}
+
+		hashwrite(f, obj->oid.hash, (int)hash_len);
+	}
+
+	return 0;
+}
+
+static int write_midx_object_offsets(struct hashfile *f,
+				     void *data)
+{
+	struct write_midx_context *ctx = data;
+	struct pack_midx_entry *list = ctx->entries;
+	uint32_t i, nr_large_offset = 0;
+
+	for (i = 0; i < ctx->entries_nr; i++) {
+		struct pack_midx_entry *obj = list++;
+
+		if (ctx->pack_perm[obj->pack_int_id] == PACK_EXPIRED)
+			BUG("object %s is in an expired pack with int-id %d",
+			    oid_to_hex(&obj->oid),
+			    obj->pack_int_id);
+
+		hashwrite_be32(f, ctx->pack_perm[obj->pack_int_id]);
+
+		if (ctx->large_offsets_needed && obj->offset >> 31)
+			hashwrite_be32(f, MIDX_LARGE_OFFSET_NEEDED | nr_large_offset++);
+		else if (!ctx->large_offsets_needed && obj->offset >> 32)
+			BUG("object %s requires a large offset (%"PRIx64") but the MIDX is not writing large offsets!",
+			    oid_to_hex(&obj->oid),
+			    obj->offset);
+		else
+			hashwrite_be32(f, (uint32_t)obj->offset);
+	}
+
+	return 0;
+}
+
+static int write_midx_large_offsets(struct hashfile *f,
+				    void *data)
+{
+	struct write_midx_context *ctx = data;
+	struct pack_midx_entry *list = ctx->entries;
+	struct pack_midx_entry *end = ctx->entries + ctx->entries_nr;
+	uint32_t nr_large_offset = ctx->num_large_offsets;
+
+	while (nr_large_offset) {
+		struct pack_midx_entry *obj;
+		uint64_t offset;
+
+		if (list >= end)
+			BUG("too many large-offset objects");
+
+		obj = list++;
+		offset = obj->offset;
+
+		if (!(offset >> 31))
+			continue;
+
+		hashwrite_be64(f, offset);
+
+		nr_large_offset--;
+	}
+
+	return 0;
+}
+
+static int write_midx_revindex(struct hashfile *f,
+			       void *data)
+{
+	struct write_midx_context *ctx = data;
+	uint32_t i;
+
+	for (i = 0; i < ctx->entries_nr; i++)
+		hashwrite_be32(f, ctx->pack_order[i]);
+
+	return 0;
+}
+
+struct midx_pack_order_data {
+	uint32_t nr;
+	uint32_t pack;
+	off_t offset;
+};
+
+static int midx_pack_order_cmp(const void *va, const void *vb)
+{
+	const struct midx_pack_order_data *a = va, *b = vb;
+	if (a->pack < b->pack)
+		return -1;
+	else if (a->pack > b->pack)
+		return 1;
+	else if (a->offset < b->offset)
+		return -1;
+	else if (a->offset > b->offset)
+		return 1;
+	else
+		return 0;
+}
+
+static uint32_t *midx_pack_order(struct write_midx_context *ctx)
+{
+	struct midx_pack_order_data *data;
+	uint32_t *pack_order;
+	uint32_t i;
+
+	trace2_region_enter("midx", "midx_pack_order", the_repository);
+
+	ALLOC_ARRAY(data, ctx->entries_nr);
+	for (i = 0; i < ctx->entries_nr; i++) {
+		struct pack_midx_entry *e = &ctx->entries[i];
+		data[i].nr = i;
+		data[i].pack = ctx->pack_perm[e->pack_int_id];
+		if (!e->preferred)
+			data[i].pack |= (1U << 31);
+		data[i].offset = e->offset;
+	}
+
+	QSORT(data, ctx->entries_nr, midx_pack_order_cmp);
+
+	ALLOC_ARRAY(pack_order, ctx->entries_nr);
+	for (i = 0; i < ctx->entries_nr; i++) {
+		struct pack_midx_entry *e = &ctx->entries[data[i].nr];
+		struct pack_info *pack = &ctx->info[ctx->pack_perm[e->pack_int_id]];
+		if (pack->bitmap_pos == BITMAP_POS_UNKNOWN)
+			pack->bitmap_pos = i;
+		pack->bitmap_nr++;
+		pack_order[i] = data[i].nr;
+	}
+	for (i = 0; i < ctx->nr; i++) {
+		struct pack_info *pack = &ctx->info[ctx->pack_perm[i]];
+		if (pack->bitmap_pos == BITMAP_POS_UNKNOWN)
+			pack->bitmap_pos = 0;
+	}
+	free(data);
+
+	trace2_region_leave("midx", "midx_pack_order", the_repository);
+
+	return pack_order;
+}
+
+static void write_midx_reverse_index(char *midx_name, unsigned char *midx_hash,
+				     struct write_midx_context *ctx)
+{
+	struct strbuf buf = STRBUF_INIT;
+	const char *tmp_file;
+
+	trace2_region_enter("midx", "write_midx_reverse_index", the_repository);
+
+	strbuf_addf(&buf, "%s-%s.rev", midx_name, hash_to_hex(midx_hash));
+
+	tmp_file = write_rev_file_order(NULL, ctx->pack_order, ctx->entries_nr,
+					midx_hash, WRITE_REV);
+
+	if (finalize_object_file(tmp_file, buf.buf))
+		die(_("cannot store reverse index file"));
+
+	strbuf_release(&buf);
+
+	trace2_region_leave("midx", "write_midx_reverse_index", the_repository);
+}
+
+static void prepare_midx_packing_data(struct packing_data *pdata,
+				      struct write_midx_context *ctx)
+{
+	uint32_t i;
+
+	trace2_region_enter("midx", "prepare_midx_packing_data", the_repository);
+
+	memset(pdata, 0, sizeof(struct packing_data));
+	prepare_packing_data(the_repository, pdata);
+
+	for (i = 0; i < ctx->entries_nr; i++) {
+		struct pack_midx_entry *from = &ctx->entries[ctx->pack_order[i]];
+		struct object_entry *to = packlist_alloc(pdata, &from->oid);
+
+		oe_set_in_pack(pdata, to,
+			       ctx->info[ctx->pack_perm[from->pack_int_id]].p);
+	}
+
+	trace2_region_leave("midx", "prepare_midx_packing_data", the_repository);
+}
+
+static int add_ref_to_pending(const char *refname,
+			      const struct object_id *oid,
+			      int flag, void *cb_data)
+{
+	struct rev_info *revs = (struct rev_info*)cb_data;
+	struct object_id peeled;
+	struct object *object;
+
+	if ((flag & REF_ISSYMREF) && (flag & REF_ISBROKEN)) {
+		warning("symbolic ref is dangling: %s", refname);
+		return 0;
+	}
+
+	if (!peel_iterated_oid(oid, &peeled))
+		oid = &peeled;
+
+	object = parse_object_or_die(oid, refname);
+	if (object->type != OBJ_COMMIT)
+		return 0;
+
+	add_pending_object(revs, object, "");
+	if (bitmap_is_preferred_refname(revs->repo, refname))
+		object->flags |= NEEDS_BITMAP;
+	return 0;
+}
+
+struct bitmap_commit_cb {
+	struct commit **commits;
+	size_t commits_nr, commits_alloc;
+
+	struct write_midx_context *ctx;
+};
+
+static const struct object_id *bitmap_oid_access(size_t index,
+						 const void *_entries)
+{
+	const struct pack_midx_entry *entries = _entries;
+	return &entries[index].oid;
+}
+
+static void bitmap_show_commit(struct commit *commit, void *_data)
+{
+	struct bitmap_commit_cb *data = _data;
+	int pos = oid_pos(&commit->object.oid, data->ctx->entries,
+			  data->ctx->entries_nr,
+			  bitmap_oid_access);
+	if (pos < 0)
+		return;
+
+	ALLOC_GROW(data->commits, data->commits_nr + 1, data->commits_alloc);
+	data->commits[data->commits_nr++] = commit;
+}
+
+static int read_refs_snapshot(const char *refs_snapshot,
+			      struct rev_info *revs)
+{
+	struct strbuf buf = STRBUF_INIT;
+	struct object_id oid;
+	FILE *f = xfopen(refs_snapshot, "r");
+
+	while (strbuf_getline(&buf, f) != EOF) {
+		struct object *object;
+		int preferred = 0;
+		char *hex = buf.buf;
+		const char *end = NULL;
+
+		if (buf.len && *buf.buf == '+') {
+			preferred = 1;
+			hex = &buf.buf[1];
+		}
+
+		if (parse_oid_hex(hex, &oid, &end) < 0)
+			die(_("could not parse line: %s"), buf.buf);
+		if (*end)
+			die(_("malformed line: %s"), buf.buf);
+
+		object = parse_object_or_die(&oid, NULL);
+		if (preferred)
+			object->flags |= NEEDS_BITMAP;
+
+		add_pending_object(revs, object, "");
+	}
+
+	fclose(f);
+	strbuf_release(&buf);
+	return 0;
+}
+static struct commit **find_commits_for_midx_bitmap(uint32_t *indexed_commits_nr_p,
+						    const char *refs_snapshot,
+						    struct write_midx_context *ctx)
+{
+	struct rev_info revs;
+	struct bitmap_commit_cb cb = {0};
+
+	trace2_region_enter("midx", "find_commits_for_midx_bitmap",
+			    the_repository);
+
+	cb.ctx = ctx;
+
+	repo_init_revisions(the_repository, &revs, NULL);
+	if (refs_snapshot) {
+		read_refs_snapshot(refs_snapshot, &revs);
+	} else {
+		setup_revisions(0, NULL, &revs, NULL);
+		for_each_ref(add_ref_to_pending, &revs);
+	}
+
+	/*
+	 * Skipping promisor objects here is intentional, since it only excludes
+	 * them from the list of reachable commits that we want to select from
+	 * when computing the selection of MIDX'd commits to receive bitmaps.
+	 *
+	 * Reachability bitmaps do require that their objects be closed under
+	 * reachability, but fetching any objects missing from promisors at this
+	 * point is too late. But, if one of those objects can be reached from
+	 * an another object that is included in the bitmap, then we will
+	 * complain later that we don't have reachability closure (and fail
+	 * appropriately).
+	 */
+	fetch_if_missing = 0;
+	revs.exclude_promisor_objects = 1;
+
+	if (prepare_revision_walk(&revs))
+		die(_("revision walk setup failed"));
+
+	traverse_commit_list(&revs, bitmap_show_commit, NULL, &cb);
+	if (indexed_commits_nr_p)
+		*indexed_commits_nr_p = cb.commits_nr;
+
+	release_revisions(&revs);
+
+	trace2_region_leave("midx", "find_commits_for_midx_bitmap",
+			    the_repository);
+
+	return cb.commits;
+}
+
+static int write_midx_bitmap(const char *midx_name,
+			     const unsigned char *midx_hash,
+			     struct packing_data *pdata,
+			     struct commit **commits,
+			     uint32_t commits_nr,
+			     uint32_t *pack_order,
+			     unsigned flags)
+{
+	int ret, i;
+	uint16_t options = 0;
+	struct pack_idx_entry **index;
+	char *bitmap_name = xstrfmt("%s-%s.bitmap", midx_name,
+					hash_to_hex(midx_hash));
+
+	trace2_region_enter("midx", "write_midx_bitmap", the_repository);
+
+	if (flags & MIDX_WRITE_BITMAP_HASH_CACHE)
+		options |= BITMAP_OPT_HASH_CACHE;
+
+	if (flags & MIDX_WRITE_BITMAP_LOOKUP_TABLE)
+		options |= BITMAP_OPT_LOOKUP_TABLE;
+
+	/*
+	 * Build the MIDX-order index based on pdata.objects (which is already
+	 * in MIDX order; c.f., 'midx_pack_order_cmp()' for the definition of
+	 * this order).
+	 */
+	ALLOC_ARRAY(index, pdata->nr_objects);
+	for (i = 0; i < pdata->nr_objects; i++)
+		index[i] = &pdata->objects[i].idx;
+
+	bitmap_writer_show_progress(flags & MIDX_PROGRESS);
+	bitmap_writer_build_type_index(pdata, index, pdata->nr_objects);
+
+	/*
+	 * bitmap_writer_finish expects objects in lex order, but pack_order
+	 * gives us exactly that. use it directly instead of re-sorting the
+	 * array.
+	 *
+	 * This changes the order of objects in 'index' between
+	 * bitmap_writer_build_type_index and bitmap_writer_finish.
+	 *
+	 * The same re-ordering takes place in the single-pack bitmap code via
+	 * write_idx_file(), which is called by finish_tmp_packfile(), which
+	 * happens between bitmap_writer_build_type_index() and
+	 * bitmap_writer_finish().
+	 */
+	for (i = 0; i < pdata->nr_objects; i++)
+		index[pack_order[i]] = &pdata->objects[i].idx;
+
+	bitmap_writer_select_commits(commits, commits_nr, -1);
+	ret = bitmap_writer_build(pdata);
+	if (ret < 0)
+		goto cleanup;
+
+	bitmap_writer_set_checksum(midx_hash);
+	bitmap_writer_finish(index, pdata->nr_objects, bitmap_name, options);
+
+cleanup:
+	free(index);
+	free(bitmap_name);
+
+	trace2_region_leave("midx", "write_midx_bitmap", the_repository);
+
+	return ret;
+}
+
+static struct multi_pack_index *lookup_multi_pack_index(struct repository *r,
+							const char *object_dir)
+{
+	struct multi_pack_index *result = NULL;
+	struct multi_pack_index *cur;
+	char *obj_dir_real = real_pathdup(object_dir, 1);
+	struct strbuf cur_path_real = STRBUF_INIT;
+
+	/* Ensure the given object_dir is local, or a known alternate. */
+	find_odb(r, obj_dir_real);
+
+	for (cur = get_multi_pack_index(r); cur; cur = cur->next) {
+		strbuf_realpath(&cur_path_real, cur->object_dir, 1);
+		if (!strcmp(obj_dir_real, cur_path_real.buf)) {
+			result = cur;
+			goto cleanup;
+		}
+	}
+
+cleanup:
+	free(obj_dir_real);
+	strbuf_release(&cur_path_real);
+	return result;
+}
+
+static int write_midx_internal(const char *object_dir,
 			       struct string_list *packs_to_include,
 			       struct string_list *packs_to_drop,
 			       const char *preferred_pack_name,
 			       const char *refs_snapshot,
-			       unsigned flags);
+			       unsigned flags)
+{
+	struct strbuf midx_name = STRBUF_INIT;
+	unsigned char midx_hash[GIT_MAX_RAWSZ];
+	uint32_t i;
+	struct hashfile *f = NULL;
+	struct lock_file lk;
+	struct write_midx_context ctx = { 0 };
+	int bitmapped_packs_concat_len = 0;
+	int pack_name_concat_len = 0;
+	int dropped_packs = 0;
+	int result = 0;
+	struct chunkfile *cf;
 
-extern struct multi_pack_index *lookup_multi_pack_index(struct repository *r,
-							const char *object_dir);
+	trace2_region_enter("midx", "write_midx_internal", the_repository);
+
+	get_midx_filename(&midx_name, object_dir);
+	if (safe_create_leading_directories(midx_name.buf))
+		die_errno(_("unable to create leading directories of %s"),
+			  midx_name.buf);
+
+	if (!packs_to_include) {
+		/*
+		 * Only reference an existing MIDX when not filtering which
+		 * packs to include, since all packs and objects are copied
+		 * blindly from an existing MIDX if one is present.
+		 */
+		ctx.m = lookup_multi_pack_index(the_repository, object_dir);
+	}
+
+	if (ctx.m && !midx_checksum_valid(ctx.m)) {
+		warning(_("ignoring existing multi-pack-index; checksum mismatch"));
+		ctx.m = NULL;
+	}
+
+	ctx.nr = 0;
+	ctx.alloc = ctx.m ? ctx.m->num_packs : 16;
+	ctx.info = NULL;
+	ALLOC_ARRAY(ctx.info, ctx.alloc);
+
+	if (ctx.m) {
+		for (i = 0; i < ctx.m->num_packs; i++) {
+			ALLOC_GROW(ctx.info, ctx.nr + 1, ctx.alloc);
+
+			if (flags & MIDX_WRITE_REV_INDEX) {
+				/*
+				 * If generating a reverse index, need to have
+				 * packed_git's loaded to compare their
+				 * mtimes and object count.
+				 */
+				if (prepare_midx_pack(the_repository, ctx.m, i)) {
+					error(_("could not load pack"));
+					result = 1;
+					goto cleanup;
+				}
+
+				if (open_pack_index(ctx.m->packs[i]))
+					die(_("could not open index for %s"),
+					    ctx.m->packs[i]->pack_name);
+			}
+
+			fill_pack_info(&ctx.info[ctx.nr++], ctx.m->packs[i],
+				       ctx.m->pack_names[i], i);
+		}
+	}
+
+	ctx.pack_paths_checked = 0;
+	if (flags & MIDX_PROGRESS)
+		ctx.progress = start_delayed_progress(_("Adding packfiles to multi-pack-index"), 0);
+	else
+		ctx.progress = NULL;
+
+	ctx.to_include = packs_to_include;
+
+	for_each_file_in_pack_dir(object_dir, add_pack_to_midx, &ctx);
+	stop_progress(&ctx.progress);
+
+	if ((ctx.m && ctx.nr == ctx.m->num_packs) &&
+	    !(packs_to_include || packs_to_drop)) {
+		struct bitmap_index *bitmap_git;
+		int bitmap_exists;
+		int want_bitmap = flags & MIDX_WRITE_BITMAP;
+
+		bitmap_git = prepare_midx_bitmap_git(ctx.m);
+		bitmap_exists = bitmap_git && bitmap_is_midx(bitmap_git);
+		free_bitmap_index(bitmap_git);
+
+		if (bitmap_exists || !want_bitmap) {
+			/*
+			 * The correct MIDX already exists, and so does a
+			 * corresponding bitmap (or one wasn't requested).
+			 */
+			if (!want_bitmap)
+				clear_midx_files_ext(object_dir, ".bitmap",
+						     NULL);
+			goto cleanup;
+		}
+	}
+
+	if (preferred_pack_name) {
+		ctx.preferred_pack_idx = -1;
+
+		for (i = 0; i < ctx.nr; i++) {
+			if (!cmp_idx_or_pack_name(preferred_pack_name,
+						  ctx.info[i].pack_name)) {
+				ctx.preferred_pack_idx = i;
+				break;
+			}
+		}
+
+		if (ctx.preferred_pack_idx == -1)
+			warning(_("unknown preferred pack: '%s'"),
+				preferred_pack_name);
+	} else if (ctx.nr &&
+		   (flags & (MIDX_WRITE_REV_INDEX | MIDX_WRITE_BITMAP))) {
+		struct packed_git *oldest = ctx.info[ctx.preferred_pack_idx].p;
+		ctx.preferred_pack_idx = 0;
+
+		if (packs_to_drop && packs_to_drop->nr)
+			BUG("cannot write a MIDX bitmap during expiration");
+
+		/*
+		 * set a preferred pack when writing a bitmap to ensure that
+		 * the pack from which the first object is selected in pseudo
+		 * pack-order has all of its objects selected from that pack
+		 * (and not another pack containing a duplicate)
+		 */
+		for (i = 1; i < ctx.nr; i++) {
+			struct packed_git *p = ctx.info[i].p;
+
+			if (!oldest->num_objects || p->mtime < oldest->mtime) {
+				oldest = p;
+				ctx.preferred_pack_idx = i;
+			}
+		}
+
+		if (!oldest->num_objects) {
+			/*
+			 * If all packs are empty; unset the preferred index.
+			 * This is acceptable since there will be no duplicate
+			 * objects to resolve, so the preferred value doesn't
+			 * matter.
+			 */
+			ctx.preferred_pack_idx = -1;
+		}
+	} else {
+		/*
+		 * otherwise don't mark any pack as preferred to avoid
+		 * interfering with expiration logic below
+		 */
+		ctx.preferred_pack_idx = -1;
+	}
+
+	if (ctx.preferred_pack_idx > -1) {
+		struct packed_git *preferred = ctx.info[ctx.preferred_pack_idx].p;
+		if (!preferred->num_objects) {
+			error(_("cannot select preferred pack %s with no objects"),
+			      preferred->pack_name);
+			result = 1;
+			goto cleanup;
+		}
+	}
+
+	ctx.entries = get_sorted_entries(ctx.m, ctx.info, ctx.nr, &ctx.entries_nr,
+					 ctx.preferred_pack_idx);
+
+	ctx.large_offsets_needed = 0;
+	for (i = 0; i < ctx.entries_nr; i++) {
+		if (ctx.entries[i].offset > 0x7fffffff)
+			ctx.num_large_offsets++;
+		if (ctx.entries[i].offset > 0xffffffff)
+			ctx.large_offsets_needed = 1;
+	}
+
+	QSORT(ctx.info, ctx.nr, pack_info_compare);
+
+	if (packs_to_drop && packs_to_drop->nr) {
+		int drop_index = 0;
+		int missing_drops = 0;
+
+		for (i = 0; i < ctx.nr && drop_index < packs_to_drop->nr; i++) {
+			int cmp = strcmp(ctx.info[i].pack_name,
+					 packs_to_drop->items[drop_index].string);
+
+			if (!cmp) {
+				drop_index++;
+				ctx.info[i].expired = 1;
+			} else if (cmp > 0) {
+				error(_("did not see pack-file %s to drop"),
+				      packs_to_drop->items[drop_index].string);
+				drop_index++;
+				missing_drops++;
+				i--;
+			} else {
+				ctx.info[i].expired = 0;
+			}
+		}
+
+		if (missing_drops) {
+			result = 1;
+			goto cleanup;
+		}
+	}
+
+	/*
+	 * pack_perm stores a permutation between pack-int-ids from the
+	 * previous multi-pack-index to the new one we are writing:
+	 *
+	 * pack_perm[old_id] = new_id
+	 */
+	ALLOC_ARRAY(ctx.pack_perm, ctx.nr);
+	for (i = 0; i < ctx.nr; i++) {
+		if (ctx.info[i].expired) {
+			dropped_packs++;
+			ctx.pack_perm[ctx.info[i].orig_pack_int_id] = PACK_EXPIRED;
+		} else {
+			ctx.pack_perm[ctx.info[i].orig_pack_int_id] = i - dropped_packs;
+		}
+	}
+
+	for (i = 0; i < ctx.nr; i++) {
+		if (ctx.info[i].expired)
+			continue;
+		pack_name_concat_len += strlen(ctx.info[i].pack_name) + 1;
+		bitmapped_packs_concat_len += 2 * sizeof(uint32_t);
+	}
+
+	/* Check that the preferred pack wasn't expired (if given). */
+	if (preferred_pack_name) {
+		struct pack_info *preferred = bsearch(preferred_pack_name,
+						      ctx.info, ctx.nr,
+						      sizeof(*ctx.info),
+						      idx_or_pack_name_cmp);
+		if (preferred) {
+			uint32_t perm = ctx.pack_perm[preferred->orig_pack_int_id];
+			if (perm == PACK_EXPIRED)
+				warning(_("preferred pack '%s' is expired"),
+					preferred_pack_name);
+		}
+	}
+
+	if (pack_name_concat_len % MIDX_CHUNK_ALIGNMENT)
+		pack_name_concat_len += MIDX_CHUNK_ALIGNMENT -
+					(pack_name_concat_len % MIDX_CHUNK_ALIGNMENT);
+
+	hold_lock_file_for_update(&lk, midx_name.buf, LOCK_DIE_ON_ERROR);
+	f = hashfd(get_lock_file_fd(&lk), get_lock_file_path(&lk));
+
+	if (ctx.nr - dropped_packs == 0) {
+		error(_("no pack files to index."));
+		result = 1;
+		goto cleanup;
+	}
+
+	if (!ctx.entries_nr) {
+		if (flags & MIDX_WRITE_BITMAP)
+			warning(_("refusing to write multi-pack .bitmap without any objects"));
+		flags &= ~(MIDX_WRITE_REV_INDEX | MIDX_WRITE_BITMAP);
+	}
+
+	cf = init_chunkfile(f);
+
+	add_chunk(cf, MIDX_CHUNKID_PACKNAMES, pack_name_concat_len,
+		  write_midx_pack_names);
+	add_chunk(cf, MIDX_CHUNKID_OIDFANOUT, MIDX_CHUNK_FANOUT_SIZE,
+		  write_midx_oid_fanout);
+	add_chunk(cf, MIDX_CHUNKID_OIDLOOKUP,
+		  st_mult(ctx.entries_nr, the_hash_algo->rawsz),
+		  write_midx_oid_lookup);
+	add_chunk(cf, MIDX_CHUNKID_OBJECTOFFSETS,
+		  st_mult(ctx.entries_nr, MIDX_CHUNK_OFFSET_WIDTH),
+		  write_midx_object_offsets);
+
+	if (ctx.large_offsets_needed)
+		add_chunk(cf, MIDX_CHUNKID_LARGEOFFSETS,
+			st_mult(ctx.num_large_offsets,
+				MIDX_CHUNK_LARGE_OFFSET_WIDTH),
+			write_midx_large_offsets);
+
+	if (flags & (MIDX_WRITE_REV_INDEX | MIDX_WRITE_BITMAP)) {
+		ctx.pack_order = midx_pack_order(&ctx);
+		add_chunk(cf, MIDX_CHUNKID_REVINDEX,
+			  st_mult(ctx.entries_nr, sizeof(uint32_t)),
+			  write_midx_revindex);
+		add_chunk(cf, MIDX_CHUNKID_BITMAPPEDPACKS,
+			  bitmapped_packs_concat_len,
+			  write_midx_bitmapped_packs);
+	}
+
+	write_midx_header(f, get_num_chunks(cf), ctx.nr - dropped_packs);
+	write_chunkfile(cf, &ctx);
+
+	finalize_hashfile(f, midx_hash, FSYNC_COMPONENT_PACK_METADATA,
+			  CSUM_FSYNC | CSUM_HASH_IN_STREAM);
+	free_chunkfile(cf);
+
+	if (flags & MIDX_WRITE_REV_INDEX &&
+	    git_env_bool("GIT_TEST_MIDX_WRITE_REV", 0))
+		write_midx_reverse_index(midx_name.buf, midx_hash, &ctx);
+
+	if (flags & MIDX_WRITE_BITMAP) {
+		struct packing_data pdata;
+		struct commit **commits;
+		uint32_t commits_nr;
+
+		if (!ctx.entries_nr)
+			BUG("cannot write a bitmap without any objects");
+
+		prepare_midx_packing_data(&pdata, &ctx);
+
+		commits = find_commits_for_midx_bitmap(&commits_nr, refs_snapshot, &ctx);
+
+		/*
+		 * The previous steps translated the information from
+		 * 'entries' into information suitable for constructing
+		 * bitmaps. We no longer need that array, so clear it to
+		 * reduce memory pressure.
+		 */
+		FREE_AND_NULL(ctx.entries);
+		ctx.entries_nr = 0;
+
+		if (write_midx_bitmap(midx_name.buf, midx_hash, &pdata,
+				      commits, commits_nr, ctx.pack_order,
+				      flags) < 0) {
+			error(_("could not write multi-pack bitmap"));
+			result = 1;
+			clear_packing_data(&pdata);
+			free(commits);
+			goto cleanup;
+		}
+
+		clear_packing_data(&pdata);
+		free(commits);
+	}
+	/*
+	 * NOTE: Do not use ctx.entries beyond this point, since it might
+	 * have been freed in the previous if block.
+	 */
+
+	if (ctx.m)
+		close_object_store(the_repository->objects);
+
+	if (commit_lock_file(&lk) < 0)
+		die_errno(_("could not write multi-pack-index"));
+
+	clear_midx_files_ext(object_dir, ".bitmap", midx_hash);
+	clear_midx_files_ext(object_dir, ".rev", midx_hash);
+
+cleanup:
+	for (i = 0; i < ctx.nr; i++) {
+		if (ctx.info[i].p) {
+			close_pack(ctx.info[i].p);
+			free(ctx.info[i].p);
+		}
+		free(ctx.info[i].pack_name);
+	}
+
+	free(ctx.info);
+	free(ctx.entries);
+	free(ctx.pack_perm);
+	free(ctx.pack_order);
+	strbuf_release(&midx_name);
+
+	trace2_region_leave("midx", "write_midx_internal", the_repository);
+
+	return result;
+}
 
 int write_midx_file(const char *object_dir,
 		    const char *preferred_pack_name,
diff --git a/midx.c b/midx.c
index 39b5c86736..ae3b49166c 100644
--- a/midx.c
+++ b/midx.c
@@ -1,62 +1,22 @@
 #include "git-compat-util.h"
-#include "abspath.h"
 #include "config.h"
-#include "csum-file.h"
 #include "dir.h"
-#include "gettext.h"
 #include "hex.h"
-#include "lockfile.h"
 #include "packfile.h"
 #include "object-file.h"
-#include "object-store-ll.h"
 #include "hash-lookup.h"
 #include "midx.h"
 #include "progress.h"
 #include "trace2.h"
-#include "run-command.h"
-#include "repository.h"
 #include "chunk-format.h"
-#include "pack.h"
 #include "pack-bitmap.h"
-#include "refs.h"
-#include "revision.h"
-#include "list-objects.h"
 #include "pack-revindex.h"
 
-struct multi_pack_index *lookup_multi_pack_index(struct repository *r,
-							const char *object_dir);
-
-int write_midx_internal(const char *object_dir,
-			       struct string_list *packs_to_include,
-			       struct string_list *packs_to_drop,
-			       const char *preferred_pack_name,
-			       const char *refs_snapshot,
-			       unsigned flags);
-
-#define MIDX_SIGNATURE 0x4d494458 /* "MIDX" */
-#define MIDX_VERSION 1
-#define MIDX_BYTE_FILE_VERSION 4
-#define MIDX_BYTE_HASH_VERSION 5
-#define MIDX_BYTE_NUM_CHUNKS 6
-#define MIDX_BYTE_NUM_PACKS 8
-#define MIDX_HEADER_SIZE 12
-#define MIDX_MIN_SIZE (MIDX_HEADER_SIZE + the_hash_algo->rawsz)
-
-#define MIDX_CHUNK_ALIGNMENT 4
-#define MIDX_CHUNKID_PACKNAMES 0x504e414d /* "PNAM" */
-#define MIDX_CHUNKID_BITMAPPEDPACKS 0x42544d50 /* "BTMP" */
-#define MIDX_CHUNKID_OIDFANOUT 0x4f494446 /* "OIDF" */
-#define MIDX_CHUNKID_OIDLOOKUP 0x4f49444c /* "OIDL" */
-#define MIDX_CHUNKID_OBJECTOFFSETS 0x4f4f4646 /* "OOFF" */
-#define MIDX_CHUNKID_LARGEOFFSETS 0x4c4f4646 /* "LOFF" */
-#define MIDX_CHUNKID_REVINDEX 0x52494458 /* "RIDX" */
-#define MIDX_CHUNK_FANOUT_SIZE (sizeof(uint32_t) * 256)
-#define MIDX_CHUNK_OFFSET_WIDTH (2 * sizeof(uint32_t))
-#define MIDX_CHUNK_LARGE_OFFSET_WIDTH (sizeof(uint64_t))
-#define MIDX_CHUNK_BITMAPPED_PACKS_WIDTH (2 * sizeof(uint32_t))
-#define MIDX_LARGE_OFFSET_NEEDED 0x80000000
-
-#define PACK_EXPIRED UINT_MAX
+int midx_checksum_valid(struct multi_pack_index *m);
+void clear_midx_files_ext(const char *object_dir, const char *ext,
+			  unsigned char *keep_hash);
+int cmp_idx_or_pack_name(const char *idx_or_pack_name,
+			 const char *idx_name);
 
 const unsigned char *get_midx_checksum(struct multi_pack_index *m)
 {
@@ -125,6 +85,8 @@ static int midx_read_object_offsets(const unsigned char *chunk_start,
 	return 0;
 }
 
+#define MIDX_MIN_SIZE (MIDX_HEADER_SIZE + the_hash_algo->rawsz)
+
 struct multi_pack_index *load_multi_pack_index(const char *object_dir, int local)
 {
 	struct multi_pack_index *m = NULL;
@@ -304,6 +266,8 @@ int prepare_midx_pack(struct repository *r, struct multi_pack_index *m, uint32_t
 	return 0;
 }
 
+#define MIDX_CHUNK_BITMAPPED_PACKS_WIDTH (2 * sizeof(uint32_t))
+
 int nth_bitmapped_pack(struct repository *r, struct multi_pack_index *m,
 		       struct bitmapped_pack *bp, uint32_t pack_int_id)
 {
@@ -410,8 +374,8 @@ int fill_midx_entry(struct repository *r,
 }
 
 /* Match "foo.idx" against either "foo.pack" _or_ "foo.idx". */
-static int cmp_idx_or_pack_name(const char *idx_or_pack_name,
-				const char *idx_name)
+int cmp_idx_or_pack_name(const char *idx_or_pack_name,
+			 const char *idx_name)
 {
 	/* Skip past any initial matching prefix. */
 	while (*idx_name && *idx_name == *idx_or_pack_name) {
@@ -518,1243 +482,11 @@ int prepare_multi_pack_index_one(struct repository *r, const char *object_dir, i
 	return 0;
 }
 
-static size_t write_midx_header(struct hashfile *f,
-				unsigned char num_chunks,
-				uint32_t num_packs)
-{
-	hashwrite_be32(f, MIDX_SIGNATURE);
-	hashwrite_u8(f, MIDX_VERSION);
-	hashwrite_u8(f, oid_version(the_hash_algo));
-	hashwrite_u8(f, num_chunks);
-	hashwrite_u8(f, 0); /* unused */
-	hashwrite_be32(f, num_packs);
-
-	return MIDX_HEADER_SIZE;
-}
-
-#define BITMAP_POS_UNKNOWN (~((uint32_t)0))
-
-struct pack_info {
-	uint32_t orig_pack_int_id;
-	char *pack_name;
-	struct packed_git *p;
-
-	uint32_t bitmap_pos;
-	uint32_t bitmap_nr;
-
-	unsigned expired : 1;
-};
-
-static void fill_pack_info(struct pack_info *info,
-			   struct packed_git *p, const char *pack_name,
-			   uint32_t orig_pack_int_id)
-{
-	memset(info, 0, sizeof(struct pack_info));
-
-	info->orig_pack_int_id = orig_pack_int_id;
-	info->pack_name = xstrdup(pack_name);
-	info->p = p;
-	info->bitmap_pos = BITMAP_POS_UNKNOWN;
-}
-
-static int pack_info_compare(const void *_a, const void *_b)
-{
-	struct pack_info *a = (struct pack_info *)_a;
-	struct pack_info *b = (struct pack_info *)_b;
-	return strcmp(a->pack_name, b->pack_name);
-}
-
-static int idx_or_pack_name_cmp(const void *_va, const void *_vb)
-{
-	const char *pack_name = _va;
-	const struct pack_info *compar = _vb;
-
-	return cmp_idx_or_pack_name(pack_name, compar->pack_name);
-}
-
-struct write_midx_context {
-	struct pack_info *info;
-	size_t nr;
-	size_t alloc;
-	struct multi_pack_index *m;
-	struct progress *progress;
-	unsigned pack_paths_checked;
-
-	struct pack_midx_entry *entries;
-	size_t entries_nr;
-
-	uint32_t *pack_perm;
-	uint32_t *pack_order;
-	unsigned large_offsets_needed:1;
-	uint32_t num_large_offsets;
-
-	int preferred_pack_idx;
-
-	struct string_list *to_include;
-};
-
-static void add_pack_to_midx(const char *full_path, size_t full_path_len,
-			     const char *file_name, void *data)
-{
-	struct write_midx_context *ctx = data;
-	struct packed_git *p;
-
-	if (ends_with(file_name, ".idx")) {
-		display_progress(ctx->progress, ++ctx->pack_paths_checked);
-		/*
-		 * Note that at most one of ctx->m and ctx->to_include are set,
-		 * so we are testing midx_contains_pack() and
-		 * string_list_has_string() independently (guarded by the
-		 * appropriate NULL checks).
-		 *
-		 * We could support passing to_include while reusing an existing
-		 * MIDX, but don't currently since the reuse process drags
-		 * forward all packs from an existing MIDX (without checking
-		 * whether or not they appear in the to_include list).
-		 *
-		 * If we added support for that, these next two conditional
-		 * should be performed independently (likely checking
-		 * to_include before the existing MIDX).
-		 */
-		if (ctx->m && midx_contains_pack(ctx->m, file_name))
-			return;
-		else if (ctx->to_include &&
-			 !string_list_has_string(ctx->to_include, file_name))
-			return;
-
-		ALLOC_GROW(ctx->info, ctx->nr + 1, ctx->alloc);
-
-		p = add_packed_git(full_path, full_path_len, 0);
-		if (!p) {
-			warning(_("failed to add packfile '%s'"),
-				full_path);
-			return;
-		}
-
-		if (open_pack_index(p)) {
-			warning(_("failed to open pack-index '%s'"),
-				full_path);
-			close_pack(p);
-			free(p);
-			return;
-		}
-
-		fill_pack_info(&ctx->info[ctx->nr], p, file_name, ctx->nr);
-		ctx->nr++;
-	}
-}
-
-struct pack_midx_entry {
-	struct object_id oid;
-	uint32_t pack_int_id;
-	time_t pack_mtime;
-	uint64_t offset;
-	unsigned preferred : 1;
-};
-
-static int midx_oid_compare(const void *_a, const void *_b)
-{
-	const struct pack_midx_entry *a = (const struct pack_midx_entry *)_a;
-	const struct pack_midx_entry *b = (const struct pack_midx_entry *)_b;
-	int cmp = oidcmp(&a->oid, &b->oid);
-
-	if (cmp)
-		return cmp;
-
-	/* Sort objects in a preferred pack first when multiple copies exist. */
-	if (a->preferred > b->preferred)
-		return -1;
-	if (a->preferred < b->preferred)
-		return 1;
-
-	if (a->pack_mtime > b->pack_mtime)
-		return -1;
-	else if (a->pack_mtime < b->pack_mtime)
-		return 1;
-
-	return a->pack_int_id - b->pack_int_id;
-}
-
-static int nth_midxed_pack_midx_entry(struct multi_pack_index *m,
-				      struct pack_midx_entry *e,
-				      uint32_t pos)
-{
-	if (pos >= m->num_objects)
-		return 1;
-
-	nth_midxed_object_oid(&e->oid, m, pos);
-	e->pack_int_id = nth_midxed_pack_int_id(m, pos);
-	e->offset = nth_midxed_offset(m, pos);
-
-	/* consider objects in midx to be from "old" packs */
-	e->pack_mtime = 0;
-	return 0;
-}
-
-static void fill_pack_entry(uint32_t pack_int_id,
-			    struct packed_git *p,
-			    uint32_t cur_object,
-			    struct pack_midx_entry *entry,
-			    int preferred)
-{
-	if (nth_packed_object_id(&entry->oid, p, cur_object) < 0)
-		die(_("failed to locate object %d in packfile"), cur_object);
-
-	entry->pack_int_id = pack_int_id;
-	entry->pack_mtime = p->mtime;
-
-	entry->offset = nth_packed_object_offset(p, cur_object);
-	entry->preferred = !!preferred;
-}
-
-struct midx_fanout {
-	struct pack_midx_entry *entries;
-	size_t nr, alloc;
-};
-
-static void midx_fanout_grow(struct midx_fanout *fanout, size_t nr)
-{
-	if (nr < fanout->nr)
-		BUG("negative growth in midx_fanout_grow() (%"PRIuMAX" < %"PRIuMAX")",
-		    (uintmax_t)nr, (uintmax_t)fanout->nr);
-	ALLOC_GROW(fanout->entries, nr, fanout->alloc);
-}
-
-static void midx_fanout_sort(struct midx_fanout *fanout)
-{
-	QSORT(fanout->entries, fanout->nr, midx_oid_compare);
-}
-
-static void midx_fanout_add_midx_fanout(struct midx_fanout *fanout,
-					struct multi_pack_index *m,
-					uint32_t cur_fanout,
-					int preferred_pack)
-{
-	uint32_t start = 0, end;
-	uint32_t cur_object;
-
-	if (cur_fanout)
-		start = ntohl(m->chunk_oid_fanout[cur_fanout - 1]);
-	end = ntohl(m->chunk_oid_fanout[cur_fanout]);
-
-	for (cur_object = start; cur_object < end; cur_object++) {
-		if ((preferred_pack > -1) &&
-		    (preferred_pack == nth_midxed_pack_int_id(m, cur_object))) {
-			/*
-			 * Objects from preferred packs are added
-			 * separately.
-			 */
-			continue;
-		}
-
-		midx_fanout_grow(fanout, fanout->nr + 1);
-		nth_midxed_pack_midx_entry(m,
-					   &fanout->entries[fanout->nr],
-					   cur_object);
-		fanout->entries[fanout->nr].preferred = 0;
-		fanout->nr++;
-	}
-}
-
-static void midx_fanout_add_pack_fanout(struct midx_fanout *fanout,
-					struct pack_info *info,
-					uint32_t cur_pack,
-					int preferred,
-					uint32_t cur_fanout)
-{
-	struct packed_git *pack = info[cur_pack].p;
-	uint32_t start = 0, end;
-	uint32_t cur_object;
-
-	if (cur_fanout)
-		start = get_pack_fanout(pack, cur_fanout - 1);
-	end = get_pack_fanout(pack, cur_fanout);
-
-	for (cur_object = start; cur_object < end; cur_object++) {
-		midx_fanout_grow(fanout, fanout->nr + 1);
-		fill_pack_entry(cur_pack,
-				info[cur_pack].p,
-				cur_object,
-				&fanout->entries[fanout->nr],
-				preferred);
-		fanout->nr++;
-	}
-}
-
-/*
- * It is possible to artificially get into a state where there are many
- * duplicate copies of objects. That can create high memory pressure if
- * we are to create a list of all objects before de-duplication. To reduce
- * this memory pressure without a significant performance drop, automatically
- * group objects by the first byte of their object id. Use the IDX fanout
- * tables to group the data, copy to a local array, then sort.
- *
- * Copy only the de-duplicated entries (selected by most-recent modified time
- * of a packfile containing the object).
- */
-static struct pack_midx_entry *get_sorted_entries(struct multi_pack_index *m,
-						  struct pack_info *info,
-						  uint32_t nr_packs,
-						  size_t *nr_objects,
-						  int preferred_pack)
-{
-	uint32_t cur_fanout, cur_pack, cur_object;
-	size_t alloc_objects, total_objects = 0;
-	struct midx_fanout fanout = { 0 };
-	struct pack_midx_entry *deduplicated_entries = NULL;
-	uint32_t start_pack = m ? m->num_packs : 0;
-
-	for (cur_pack = start_pack; cur_pack < nr_packs; cur_pack++)
-		total_objects = st_add(total_objects,
-				       info[cur_pack].p->num_objects);
-
-	/*
-	 * As we de-duplicate by fanout value, we expect the fanout
-	 * slices to be evenly distributed, with some noise. Hence,
-	 * allocate slightly more than one 256th.
-	 */
-	alloc_objects = fanout.alloc = total_objects > 3200 ? total_objects / 200 : 16;
-
-	ALLOC_ARRAY(fanout.entries, fanout.alloc);
-	ALLOC_ARRAY(deduplicated_entries, alloc_objects);
-	*nr_objects = 0;
-
-	for (cur_fanout = 0; cur_fanout < 256; cur_fanout++) {
-		fanout.nr = 0;
-
-		if (m)
-			midx_fanout_add_midx_fanout(&fanout, m, cur_fanout,
-						    preferred_pack);
-
-		for (cur_pack = start_pack; cur_pack < nr_packs; cur_pack++) {
-			int preferred = cur_pack == preferred_pack;
-			midx_fanout_add_pack_fanout(&fanout,
-						    info, cur_pack,
-						    preferred, cur_fanout);
-		}
-
-		if (-1 < preferred_pack && preferred_pack < start_pack)
-			midx_fanout_add_pack_fanout(&fanout, info,
-						    preferred_pack, 1,
-						    cur_fanout);
-
-		midx_fanout_sort(&fanout);
-
-		/*
-		 * The batch is now sorted by OID and then mtime (descending).
-		 * Take only the first duplicate.
-		 */
-		for (cur_object = 0; cur_object < fanout.nr; cur_object++) {
-			if (cur_object && oideq(&fanout.entries[cur_object - 1].oid,
-						&fanout.entries[cur_object].oid))
-				continue;
-
-			ALLOC_GROW(deduplicated_entries, st_add(*nr_objects, 1),
-				   alloc_objects);
-			memcpy(&deduplicated_entries[*nr_objects],
-			       &fanout.entries[cur_object],
-			       sizeof(struct pack_midx_entry));
-			(*nr_objects)++;
-		}
-	}
-
-	free(fanout.entries);
-	return deduplicated_entries;
-}
-
-static int write_midx_pack_names(struct hashfile *f, void *data)
-{
-	struct write_midx_context *ctx = data;
-	uint32_t i;
-	unsigned char padding[MIDX_CHUNK_ALIGNMENT];
-	size_t written = 0;
-
-	for (i = 0; i < ctx->nr; i++) {
-		size_t writelen;
-
-		if (ctx->info[i].expired)
-			continue;
-
-		if (i && strcmp(ctx->info[i].pack_name, ctx->info[i - 1].pack_name) <= 0)
-			BUG("incorrect pack-file order: %s before %s",
-			    ctx->info[i - 1].pack_name,
-			    ctx->info[i].pack_name);
-
-		writelen = strlen(ctx->info[i].pack_name) + 1;
-		hashwrite(f, ctx->info[i].pack_name, writelen);
-		written += writelen;
-	}
-
-	/* add padding to be aligned */
-	i = MIDX_CHUNK_ALIGNMENT - (written % MIDX_CHUNK_ALIGNMENT);
-	if (i < MIDX_CHUNK_ALIGNMENT) {
-		memset(padding, 0, sizeof(padding));
-		hashwrite(f, padding, i);
-	}
-
-	return 0;
-}
-
-static int write_midx_bitmapped_packs(struct hashfile *f, void *data)
-{
-	struct write_midx_context *ctx = data;
-	size_t i;
-
-	for (i = 0; i < ctx->nr; i++) {
-		struct pack_info *pack = &ctx->info[i];
-		if (pack->expired)
-			continue;
-
-		if (pack->bitmap_pos == BITMAP_POS_UNKNOWN && pack->bitmap_nr)
-			BUG("pack '%s' has no bitmap position, but has %d bitmapped object(s)",
-			    pack->pack_name, pack->bitmap_nr);
-
-		hashwrite_be32(f, pack->bitmap_pos);
-		hashwrite_be32(f, pack->bitmap_nr);
-	}
-	return 0;
-}
-
-static int write_midx_oid_fanout(struct hashfile *f,
-				 void *data)
-{
-	struct write_midx_context *ctx = data;
-	struct pack_midx_entry *list = ctx->entries;
-	struct pack_midx_entry *last = ctx->entries + ctx->entries_nr;
-	uint32_t count = 0;
-	uint32_t i;
-
-	/*
-	* Write the first-level table (the list is sorted,
-	* but we use a 256-entry lookup to be able to avoid
-	* having to do eight extra binary search iterations).
-	*/
-	for (i = 0; i < 256; i++) {
-		struct pack_midx_entry *next = list;
-
-		while (next < last && next->oid.hash[0] == i) {
-			count++;
-			next++;
-		}
-
-		hashwrite_be32(f, count);
-		list = next;
-	}
-
-	return 0;
-}
-
-static int write_midx_oid_lookup(struct hashfile *f,
-				 void *data)
-{
-	struct write_midx_context *ctx = data;
-	unsigned char hash_len = the_hash_algo->rawsz;
-	struct pack_midx_entry *list = ctx->entries;
-	uint32_t i;
-
-	for (i = 0; i < ctx->entries_nr; i++) {
-		struct pack_midx_entry *obj = list++;
-
-		if (i < ctx->entries_nr - 1) {
-			struct pack_midx_entry *next = list;
-			if (oidcmp(&obj->oid, &next->oid) >= 0)
-				BUG("OIDs not in order: %s >= %s",
-				    oid_to_hex(&obj->oid),
-				    oid_to_hex(&next->oid));
-		}
-
-		hashwrite(f, obj->oid.hash, (int)hash_len);
-	}
-
-	return 0;
-}
-
-static int write_midx_object_offsets(struct hashfile *f,
-				     void *data)
-{
-	struct write_midx_context *ctx = data;
-	struct pack_midx_entry *list = ctx->entries;
-	uint32_t i, nr_large_offset = 0;
-
-	for (i = 0; i < ctx->entries_nr; i++) {
-		struct pack_midx_entry *obj = list++;
-
-		if (ctx->pack_perm[obj->pack_int_id] == PACK_EXPIRED)
-			BUG("object %s is in an expired pack with int-id %d",
-			    oid_to_hex(&obj->oid),
-			    obj->pack_int_id);
-
-		hashwrite_be32(f, ctx->pack_perm[obj->pack_int_id]);
-
-		if (ctx->large_offsets_needed && obj->offset >> 31)
-			hashwrite_be32(f, MIDX_LARGE_OFFSET_NEEDED | nr_large_offset++);
-		else if (!ctx->large_offsets_needed && obj->offset >> 32)
-			BUG("object %s requires a large offset (%"PRIx64") but the MIDX is not writing large offsets!",
-			    oid_to_hex(&obj->oid),
-			    obj->offset);
-		else
-			hashwrite_be32(f, (uint32_t)obj->offset);
-	}
-
-	return 0;
-}
-
-static int write_midx_large_offsets(struct hashfile *f,
-				    void *data)
-{
-	struct write_midx_context *ctx = data;
-	struct pack_midx_entry *list = ctx->entries;
-	struct pack_midx_entry *end = ctx->entries + ctx->entries_nr;
-	uint32_t nr_large_offset = ctx->num_large_offsets;
-
-	while (nr_large_offset) {
-		struct pack_midx_entry *obj;
-		uint64_t offset;
-
-		if (list >= end)
-			BUG("too many large-offset objects");
-
-		obj = list++;
-		offset = obj->offset;
-
-		if (!(offset >> 31))
-			continue;
-
-		hashwrite_be64(f, offset);
-
-		nr_large_offset--;
-	}
-
-	return 0;
-}
-
-static int write_midx_revindex(struct hashfile *f,
-			       void *data)
-{
-	struct write_midx_context *ctx = data;
-	uint32_t i;
-
-	for (i = 0; i < ctx->entries_nr; i++)
-		hashwrite_be32(f, ctx->pack_order[i]);
-
-	return 0;
-}
-
-struct midx_pack_order_data {
-	uint32_t nr;
-	uint32_t pack;
-	off_t offset;
-};
-
-static int midx_pack_order_cmp(const void *va, const void *vb)
-{
-	const struct midx_pack_order_data *a = va, *b = vb;
-	if (a->pack < b->pack)
-		return -1;
-	else if (a->pack > b->pack)
-		return 1;
-	else if (a->offset < b->offset)
-		return -1;
-	else if (a->offset > b->offset)
-		return 1;
-	else
-		return 0;
-}
-
-static uint32_t *midx_pack_order(struct write_midx_context *ctx)
-{
-	struct midx_pack_order_data *data;
-	uint32_t *pack_order;
-	uint32_t i;
-
-	trace2_region_enter("midx", "midx_pack_order", the_repository);
-
-	ALLOC_ARRAY(data, ctx->entries_nr);
-	for (i = 0; i < ctx->entries_nr; i++) {
-		struct pack_midx_entry *e = &ctx->entries[i];
-		data[i].nr = i;
-		data[i].pack = ctx->pack_perm[e->pack_int_id];
-		if (!e->preferred)
-			data[i].pack |= (1U << 31);
-		data[i].offset = e->offset;
-	}
-
-	QSORT(data, ctx->entries_nr, midx_pack_order_cmp);
-
-	ALLOC_ARRAY(pack_order, ctx->entries_nr);
-	for (i = 0; i < ctx->entries_nr; i++) {
-		struct pack_midx_entry *e = &ctx->entries[data[i].nr];
-		struct pack_info *pack = &ctx->info[ctx->pack_perm[e->pack_int_id]];
-		if (pack->bitmap_pos == BITMAP_POS_UNKNOWN)
-			pack->bitmap_pos = i;
-		pack->bitmap_nr++;
-		pack_order[i] = data[i].nr;
-	}
-	for (i = 0; i < ctx->nr; i++) {
-		struct pack_info *pack = &ctx->info[ctx->pack_perm[i]];
-		if (pack->bitmap_pos == BITMAP_POS_UNKNOWN)
-			pack->bitmap_pos = 0;
-	}
-	free(data);
-
-	trace2_region_leave("midx", "midx_pack_order", the_repository);
-
-	return pack_order;
-}
-
-static void write_midx_reverse_index(char *midx_name, unsigned char *midx_hash,
-				     struct write_midx_context *ctx)
-{
-	struct strbuf buf = STRBUF_INIT;
-	const char *tmp_file;
-
-	trace2_region_enter("midx", "write_midx_reverse_index", the_repository);
-
-	strbuf_addf(&buf, "%s-%s.rev", midx_name, hash_to_hex(midx_hash));
-
-	tmp_file = write_rev_file_order(NULL, ctx->pack_order, ctx->entries_nr,
-					midx_hash, WRITE_REV);
-
-	if (finalize_object_file(tmp_file, buf.buf))
-		die(_("cannot store reverse index file"));
-
-	strbuf_release(&buf);
-
-	trace2_region_leave("midx", "write_midx_reverse_index", the_repository);
-}
-
-static void clear_midx_files_ext(const char *object_dir, const char *ext,
-				 unsigned char *keep_hash);
-
-static int midx_checksum_valid(struct multi_pack_index *m)
+int midx_checksum_valid(struct multi_pack_index *m)
 {
 	return hashfile_checksum_valid(m->data, m->data_len);
 }
 
-static void prepare_midx_packing_data(struct packing_data *pdata,
-				      struct write_midx_context *ctx)
-{
-	uint32_t i;
-
-	trace2_region_enter("midx", "prepare_midx_packing_data", the_repository);
-
-	memset(pdata, 0, sizeof(struct packing_data));
-	prepare_packing_data(the_repository, pdata);
-
-	for (i = 0; i < ctx->entries_nr; i++) {
-		struct pack_midx_entry *from = &ctx->entries[ctx->pack_order[i]];
-		struct object_entry *to = packlist_alloc(pdata, &from->oid);
-
-		oe_set_in_pack(pdata, to,
-			       ctx->info[ctx->pack_perm[from->pack_int_id]].p);
-	}
-
-	trace2_region_leave("midx", "prepare_midx_packing_data", the_repository);
-}
-
-static int add_ref_to_pending(const char *refname,
-			      const struct object_id *oid,
-			      int flag, void *cb_data)
-{
-	struct rev_info *revs = (struct rev_info*)cb_data;
-	struct object_id peeled;
-	struct object *object;
-
-	if ((flag & REF_ISSYMREF) && (flag & REF_ISBROKEN)) {
-		warning("symbolic ref is dangling: %s", refname);
-		return 0;
-	}
-
-	if (!peel_iterated_oid(oid, &peeled))
-		oid = &peeled;
-
-	object = parse_object_or_die(oid, refname);
-	if (object->type != OBJ_COMMIT)
-		return 0;
-
-	add_pending_object(revs, object, "");
-	if (bitmap_is_preferred_refname(revs->repo, refname))
-		object->flags |= NEEDS_BITMAP;
-	return 0;
-}
-
-struct bitmap_commit_cb {
-	struct commit **commits;
-	size_t commits_nr, commits_alloc;
-
-	struct write_midx_context *ctx;
-};
-
-static const struct object_id *bitmap_oid_access(size_t index,
-						 const void *_entries)
-{
-	const struct pack_midx_entry *entries = _entries;
-	return &entries[index].oid;
-}
-
-static void bitmap_show_commit(struct commit *commit, void *_data)
-{
-	struct bitmap_commit_cb *data = _data;
-	int pos = oid_pos(&commit->object.oid, data->ctx->entries,
-			  data->ctx->entries_nr,
-			  bitmap_oid_access);
-	if (pos < 0)
-		return;
-
-	ALLOC_GROW(data->commits, data->commits_nr + 1, data->commits_alloc);
-	data->commits[data->commits_nr++] = commit;
-}
-
-static int read_refs_snapshot(const char *refs_snapshot,
-			      struct rev_info *revs)
-{
-	struct strbuf buf = STRBUF_INIT;
-	struct object_id oid;
-	FILE *f = xfopen(refs_snapshot, "r");
-
-	while (strbuf_getline(&buf, f) != EOF) {
-		struct object *object;
-		int preferred = 0;
-		char *hex = buf.buf;
-		const char *end = NULL;
-
-		if (buf.len && *buf.buf == '+') {
-			preferred = 1;
-			hex = &buf.buf[1];
-		}
-
-		if (parse_oid_hex(hex, &oid, &end) < 0)
-			die(_("could not parse line: %s"), buf.buf);
-		if (*end)
-			die(_("malformed line: %s"), buf.buf);
-
-		object = parse_object_or_die(&oid, NULL);
-		if (preferred)
-			object->flags |= NEEDS_BITMAP;
-
-		add_pending_object(revs, object, "");
-	}
-
-	fclose(f);
-	strbuf_release(&buf);
-	return 0;
-}
-
-static struct commit **find_commits_for_midx_bitmap(uint32_t *indexed_commits_nr_p,
-						    const char *refs_snapshot,
-						    struct write_midx_context *ctx)
-{
-	struct rev_info revs;
-	struct bitmap_commit_cb cb = {0};
-
-	trace2_region_enter("midx", "find_commits_for_midx_bitmap",
-			    the_repository);
-
-	cb.ctx = ctx;
-
-	repo_init_revisions(the_repository, &revs, NULL);
-	if (refs_snapshot) {
-		read_refs_snapshot(refs_snapshot, &revs);
-	} else {
-		setup_revisions(0, NULL, &revs, NULL);
-		for_each_ref(add_ref_to_pending, &revs);
-	}
-
-	/*
-	 * Skipping promisor objects here is intentional, since it only excludes
-	 * them from the list of reachable commits that we want to select from
-	 * when computing the selection of MIDX'd commits to receive bitmaps.
-	 *
-	 * Reachability bitmaps do require that their objects be closed under
-	 * reachability, but fetching any objects missing from promisors at this
-	 * point is too late. But, if one of those objects can be reached from
-	 * an another object that is included in the bitmap, then we will
-	 * complain later that we don't have reachability closure (and fail
-	 * appropriately).
-	 */
-	fetch_if_missing = 0;
-	revs.exclude_promisor_objects = 1;
-
-	if (prepare_revision_walk(&revs))
-		die(_("revision walk setup failed"));
-
-	traverse_commit_list(&revs, bitmap_show_commit, NULL, &cb);
-	if (indexed_commits_nr_p)
-		*indexed_commits_nr_p = cb.commits_nr;
-
-	release_revisions(&revs);
-
-	trace2_region_leave("midx", "find_commits_for_midx_bitmap",
-			    the_repository);
-
-	return cb.commits;
-}
-
-static int write_midx_bitmap(const char *midx_name,
-			     const unsigned char *midx_hash,
-			     struct packing_data *pdata,
-			     struct commit **commits,
-			     uint32_t commits_nr,
-			     uint32_t *pack_order,
-			     unsigned flags)
-{
-	int ret, i;
-	uint16_t options = 0;
-	struct pack_idx_entry **index;
-	char *bitmap_name = xstrfmt("%s-%s.bitmap", midx_name,
-					hash_to_hex(midx_hash));
-
-	trace2_region_enter("midx", "write_midx_bitmap", the_repository);
-
-	if (flags & MIDX_WRITE_BITMAP_HASH_CACHE)
-		options |= BITMAP_OPT_HASH_CACHE;
-
-	if (flags & MIDX_WRITE_BITMAP_LOOKUP_TABLE)
-		options |= BITMAP_OPT_LOOKUP_TABLE;
-
-	/*
-	 * Build the MIDX-order index based on pdata.objects (which is already
-	 * in MIDX order; c.f., 'midx_pack_order_cmp()' for the definition of
-	 * this order).
-	 */
-	ALLOC_ARRAY(index, pdata->nr_objects);
-	for (i = 0; i < pdata->nr_objects; i++)
-		index[i] = &pdata->objects[i].idx;
-
-	bitmap_writer_show_progress(flags & MIDX_PROGRESS);
-	bitmap_writer_build_type_index(pdata, index, pdata->nr_objects);
-
-	/*
-	 * bitmap_writer_finish expects objects in lex order, but pack_order
-	 * gives us exactly that. use it directly instead of re-sorting the
-	 * array.
-	 *
-	 * This changes the order of objects in 'index' between
-	 * bitmap_writer_build_type_index and bitmap_writer_finish.
-	 *
-	 * The same re-ordering takes place in the single-pack bitmap code via
-	 * write_idx_file(), which is called by finish_tmp_packfile(), which
-	 * happens between bitmap_writer_build_type_index() and
-	 * bitmap_writer_finish().
-	 */
-	for (i = 0; i < pdata->nr_objects; i++)
-		index[pack_order[i]] = &pdata->objects[i].idx;
-
-	bitmap_writer_select_commits(commits, commits_nr, -1);
-	ret = bitmap_writer_build(pdata);
-	if (ret < 0)
-		goto cleanup;
-
-	bitmap_writer_set_checksum(midx_hash);
-	bitmap_writer_finish(index, pdata->nr_objects, bitmap_name, options);
-
-cleanup:
-	free(index);
-	free(bitmap_name);
-
-	trace2_region_leave("midx", "write_midx_bitmap", the_repository);
-
-	return ret;
-}
-
-struct multi_pack_index *lookup_multi_pack_index(struct repository *r,
-							const char *object_dir)
-{
-	struct multi_pack_index *result = NULL;
-	struct multi_pack_index *cur;
-	char *obj_dir_real = real_pathdup(object_dir, 1);
-	struct strbuf cur_path_real = STRBUF_INIT;
-
-	/* Ensure the given object_dir is local, or a known alternate. */
-	find_odb(r, obj_dir_real);
-
-	for (cur = get_multi_pack_index(r); cur; cur = cur->next) {
-		strbuf_realpath(&cur_path_real, cur->object_dir, 1);
-		if (!strcmp(obj_dir_real, cur_path_real.buf)) {
-			result = cur;
-			goto cleanup;
-		}
-	}
-
-cleanup:
-	free(obj_dir_real);
-	strbuf_release(&cur_path_real);
-	return result;
-}
-
-int write_midx_internal(const char *object_dir,
-			struct string_list *packs_to_include,
-			struct string_list *packs_to_drop,
-			const char *preferred_pack_name,
-			const char *refs_snapshot,
-			unsigned flags)
-{
-	struct strbuf midx_name = STRBUF_INIT;
-	unsigned char midx_hash[GIT_MAX_RAWSZ];
-	uint32_t i;
-	struct hashfile *f = NULL;
-	struct lock_file lk;
-	struct write_midx_context ctx = { 0 };
-	int bitmapped_packs_concat_len = 0;
-	int pack_name_concat_len = 0;
-	int dropped_packs = 0;
-	int result = 0;
-	struct chunkfile *cf;
-
-	trace2_region_enter("midx", "write_midx_internal", the_repository);
-
-	get_midx_filename(&midx_name, object_dir);
-	if (safe_create_leading_directories(midx_name.buf))
-		die_errno(_("unable to create leading directories of %s"),
-			  midx_name.buf);
-
-	if (!packs_to_include) {
-		/*
-		 * Only reference an existing MIDX when not filtering which
-		 * packs to include, since all packs and objects are copied
-		 * blindly from an existing MIDX if one is present.
-		 */
-		ctx.m = lookup_multi_pack_index(the_repository, object_dir);
-	}
-
-	if (ctx.m && !midx_checksum_valid(ctx.m)) {
-		warning(_("ignoring existing multi-pack-index; checksum mismatch"));
-		ctx.m = NULL;
-	}
-
-	ctx.nr = 0;
-	ctx.alloc = ctx.m ? ctx.m->num_packs : 16;
-	ctx.info = NULL;
-	ALLOC_ARRAY(ctx.info, ctx.alloc);
-
-	if (ctx.m) {
-		for (i = 0; i < ctx.m->num_packs; i++) {
-			ALLOC_GROW(ctx.info, ctx.nr + 1, ctx.alloc);
-
-			if (flags & MIDX_WRITE_REV_INDEX) {
-				/*
-				 * If generating a reverse index, need to have
-				 * packed_git's loaded to compare their
-				 * mtimes and object count.
-				 */
-				if (prepare_midx_pack(the_repository, ctx.m, i)) {
-					error(_("could not load pack"));
-					result = 1;
-					goto cleanup;
-				}
-
-				if (open_pack_index(ctx.m->packs[i]))
-					die(_("could not open index for %s"),
-					    ctx.m->packs[i]->pack_name);
-			}
-
-			fill_pack_info(&ctx.info[ctx.nr++], ctx.m->packs[i],
-				       ctx.m->pack_names[i], i);
-		}
-	}
-
-	ctx.pack_paths_checked = 0;
-	if (flags & MIDX_PROGRESS)
-		ctx.progress = start_delayed_progress(_("Adding packfiles to multi-pack-index"), 0);
-	else
-		ctx.progress = NULL;
-
-	ctx.to_include = packs_to_include;
-
-	for_each_file_in_pack_dir(object_dir, add_pack_to_midx, &ctx);
-	stop_progress(&ctx.progress);
-
-	if ((ctx.m && ctx.nr == ctx.m->num_packs) &&
-	    !(packs_to_include || packs_to_drop)) {
-		struct bitmap_index *bitmap_git;
-		int bitmap_exists;
-		int want_bitmap = flags & MIDX_WRITE_BITMAP;
-
-		bitmap_git = prepare_midx_bitmap_git(ctx.m);
-		bitmap_exists = bitmap_git && bitmap_is_midx(bitmap_git);
-		free_bitmap_index(bitmap_git);
-
-		if (bitmap_exists || !want_bitmap) {
-			/*
-			 * The correct MIDX already exists, and so does a
-			 * corresponding bitmap (or one wasn't requested).
-			 */
-			if (!want_bitmap)
-				clear_midx_files_ext(object_dir, ".bitmap",
-						     NULL);
-			goto cleanup;
-		}
-	}
-
-	if (preferred_pack_name) {
-		ctx.preferred_pack_idx = -1;
-
-		for (i = 0; i < ctx.nr; i++) {
-			if (!cmp_idx_or_pack_name(preferred_pack_name,
-						  ctx.info[i].pack_name)) {
-				ctx.preferred_pack_idx = i;
-				break;
-			}
-		}
-
-		if (ctx.preferred_pack_idx == -1)
-			warning(_("unknown preferred pack: '%s'"),
-				preferred_pack_name);
-	} else if (ctx.nr &&
-		   (flags & (MIDX_WRITE_REV_INDEX | MIDX_WRITE_BITMAP))) {
-		struct packed_git *oldest = ctx.info[ctx.preferred_pack_idx].p;
-		ctx.preferred_pack_idx = 0;
-
-		if (packs_to_drop && packs_to_drop->nr)
-			BUG("cannot write a MIDX bitmap during expiration");
-
-		/*
-		 * set a preferred pack when writing a bitmap to ensure that
-		 * the pack from which the first object is selected in pseudo
-		 * pack-order has all of its objects selected from that pack
-		 * (and not another pack containing a duplicate)
-		 */
-		for (i = 1; i < ctx.nr; i++) {
-			struct packed_git *p = ctx.info[i].p;
-
-			if (!oldest->num_objects || p->mtime < oldest->mtime) {
-				oldest = p;
-				ctx.preferred_pack_idx = i;
-			}
-		}
-
-		if (!oldest->num_objects) {
-			/*
-			 * If all packs are empty; unset the preferred index.
-			 * This is acceptable since there will be no duplicate
-			 * objects to resolve, so the preferred value doesn't
-			 * matter.
-			 */
-			ctx.preferred_pack_idx = -1;
-		}
-	} else {
-		/*
-		 * otherwise don't mark any pack as preferred to avoid
-		 * interfering with expiration logic below
-		 */
-		ctx.preferred_pack_idx = -1;
-	}
-
-	if (ctx.preferred_pack_idx > -1) {
-		struct packed_git *preferred = ctx.info[ctx.preferred_pack_idx].p;
-		if (!preferred->num_objects) {
-			error(_("cannot select preferred pack %s with no objects"),
-			      preferred->pack_name);
-			result = 1;
-			goto cleanup;
-		}
-	}
-
-	ctx.entries = get_sorted_entries(ctx.m, ctx.info, ctx.nr, &ctx.entries_nr,
-					 ctx.preferred_pack_idx);
-
-	ctx.large_offsets_needed = 0;
-	for (i = 0; i < ctx.entries_nr; i++) {
-		if (ctx.entries[i].offset > 0x7fffffff)
-			ctx.num_large_offsets++;
-		if (ctx.entries[i].offset > 0xffffffff)
-			ctx.large_offsets_needed = 1;
-	}
-
-	QSORT(ctx.info, ctx.nr, pack_info_compare);
-
-	if (packs_to_drop && packs_to_drop->nr) {
-		int drop_index = 0;
-		int missing_drops = 0;
-
-		for (i = 0; i < ctx.nr && drop_index < packs_to_drop->nr; i++) {
-			int cmp = strcmp(ctx.info[i].pack_name,
-					 packs_to_drop->items[drop_index].string);
-
-			if (!cmp) {
-				drop_index++;
-				ctx.info[i].expired = 1;
-			} else if (cmp > 0) {
-				error(_("did not see pack-file %s to drop"),
-				      packs_to_drop->items[drop_index].string);
-				drop_index++;
-				missing_drops++;
-				i--;
-			} else {
-				ctx.info[i].expired = 0;
-			}
-		}
-
-		if (missing_drops) {
-			result = 1;
-			goto cleanup;
-		}
-	}
-
-	/*
-	 * pack_perm stores a permutation between pack-int-ids from the
-	 * previous multi-pack-index to the new one we are writing:
-	 *
-	 * pack_perm[old_id] = new_id
-	 */
-	ALLOC_ARRAY(ctx.pack_perm, ctx.nr);
-	for (i = 0; i < ctx.nr; i++) {
-		if (ctx.info[i].expired) {
-			dropped_packs++;
-			ctx.pack_perm[ctx.info[i].orig_pack_int_id] = PACK_EXPIRED;
-		} else {
-			ctx.pack_perm[ctx.info[i].orig_pack_int_id] = i - dropped_packs;
-		}
-	}
-
-	for (i = 0; i < ctx.nr; i++) {
-		if (ctx.info[i].expired)
-			continue;
-		pack_name_concat_len += strlen(ctx.info[i].pack_name) + 1;
-		bitmapped_packs_concat_len += 2 * sizeof(uint32_t);
-	}
-
-	/* Check that the preferred pack wasn't expired (if given). */
-	if (preferred_pack_name) {
-		struct pack_info *preferred = bsearch(preferred_pack_name,
-						      ctx.info, ctx.nr,
-						      sizeof(*ctx.info),
-						      idx_or_pack_name_cmp);
-		if (preferred) {
-			uint32_t perm = ctx.pack_perm[preferred->orig_pack_int_id];
-			if (perm == PACK_EXPIRED)
-				warning(_("preferred pack '%s' is expired"),
-					preferred_pack_name);
-		}
-	}
-
-	if (pack_name_concat_len % MIDX_CHUNK_ALIGNMENT)
-		pack_name_concat_len += MIDX_CHUNK_ALIGNMENT -
-					(pack_name_concat_len % MIDX_CHUNK_ALIGNMENT);
-
-	hold_lock_file_for_update(&lk, midx_name.buf, LOCK_DIE_ON_ERROR);
-	f = hashfd(get_lock_file_fd(&lk), get_lock_file_path(&lk));
-
-	if (ctx.nr - dropped_packs == 0) {
-		error(_("no pack files to index."));
-		result = 1;
-		goto cleanup;
-	}
-
-	if (!ctx.entries_nr) {
-		if (flags & MIDX_WRITE_BITMAP)
-			warning(_("refusing to write multi-pack .bitmap without any objects"));
-		flags &= ~(MIDX_WRITE_REV_INDEX | MIDX_WRITE_BITMAP);
-	}
-
-	cf = init_chunkfile(f);
-
-	add_chunk(cf, MIDX_CHUNKID_PACKNAMES, pack_name_concat_len,
-		  write_midx_pack_names);
-	add_chunk(cf, MIDX_CHUNKID_OIDFANOUT, MIDX_CHUNK_FANOUT_SIZE,
-		  write_midx_oid_fanout);
-	add_chunk(cf, MIDX_CHUNKID_OIDLOOKUP,
-		  st_mult(ctx.entries_nr, the_hash_algo->rawsz),
-		  write_midx_oid_lookup);
-	add_chunk(cf, MIDX_CHUNKID_OBJECTOFFSETS,
-		  st_mult(ctx.entries_nr, MIDX_CHUNK_OFFSET_WIDTH),
-		  write_midx_object_offsets);
-
-	if (ctx.large_offsets_needed)
-		add_chunk(cf, MIDX_CHUNKID_LARGEOFFSETS,
-			st_mult(ctx.num_large_offsets,
-				MIDX_CHUNK_LARGE_OFFSET_WIDTH),
-			write_midx_large_offsets);
-
-	if (flags & (MIDX_WRITE_REV_INDEX | MIDX_WRITE_BITMAP)) {
-		ctx.pack_order = midx_pack_order(&ctx);
-		add_chunk(cf, MIDX_CHUNKID_REVINDEX,
-			  st_mult(ctx.entries_nr, sizeof(uint32_t)),
-			  write_midx_revindex);
-		add_chunk(cf, MIDX_CHUNKID_BITMAPPEDPACKS,
-			  bitmapped_packs_concat_len,
-			  write_midx_bitmapped_packs);
-	}
-
-	write_midx_header(f, get_num_chunks(cf), ctx.nr - dropped_packs);
-	write_chunkfile(cf, &ctx);
-
-	finalize_hashfile(f, midx_hash, FSYNC_COMPONENT_PACK_METADATA,
-			  CSUM_FSYNC | CSUM_HASH_IN_STREAM);
-	free_chunkfile(cf);
-
-	if (flags & MIDX_WRITE_REV_INDEX &&
-	    git_env_bool("GIT_TEST_MIDX_WRITE_REV", 0))
-		write_midx_reverse_index(midx_name.buf, midx_hash, &ctx);
-
-	if (flags & MIDX_WRITE_BITMAP) {
-		struct packing_data pdata;
-		struct commit **commits;
-		uint32_t commits_nr;
-
-		if (!ctx.entries_nr)
-			BUG("cannot write a bitmap without any objects");
-
-		prepare_midx_packing_data(&pdata, &ctx);
-
-		commits = find_commits_for_midx_bitmap(&commits_nr, refs_snapshot, &ctx);
-
-		/*
-		 * The previous steps translated the information from
-		 * 'entries' into information suitable for constructing
-		 * bitmaps. We no longer need that array, so clear it to
-		 * reduce memory pressure.
-		 */
-		FREE_AND_NULL(ctx.entries);
-		ctx.entries_nr = 0;
-
-		if (write_midx_bitmap(midx_name.buf, midx_hash, &pdata,
-				      commits, commits_nr, ctx.pack_order,
-				      flags) < 0) {
-			error(_("could not write multi-pack bitmap"));
-			result = 1;
-			clear_packing_data(&pdata);
-			free(commits);
-			goto cleanup;
-		}
-
-		clear_packing_data(&pdata);
-		free(commits);
-	}
-	/*
-	 * NOTE: Do not use ctx.entries beyond this point, since it might
-	 * have been freed in the previous if block.
-	 */
-
-	if (ctx.m)
-		close_object_store(the_repository->objects);
-
-	if (commit_lock_file(&lk) < 0)
-		die_errno(_("could not write multi-pack-index"));
-
-	clear_midx_files_ext(object_dir, ".bitmap", midx_hash);
-	clear_midx_files_ext(object_dir, ".rev", midx_hash);
-
-cleanup:
-	for (i = 0; i < ctx.nr; i++) {
-		if (ctx.info[i].p) {
-			close_pack(ctx.info[i].p);
-			free(ctx.info[i].p);
-		}
-		free(ctx.info[i].pack_name);
-	}
-
-	free(ctx.info);
-	free(ctx.entries);
-	free(ctx.pack_perm);
-	free(ctx.pack_order);
-	strbuf_release(&midx_name);
-
-	trace2_region_leave("midx", "write_midx_internal", the_repository);
-
-	return result;
-}
-
 struct clear_midx_data {
 	char *keep;
 	const char *ext;
@@ -1775,8 +507,8 @@ static void clear_midx_file_ext(const char *full_path, size_t full_path_len UNUS
 		die_errno(_("failed to remove %s"), full_path);
 }
 
-static void clear_midx_files_ext(const char *object_dir, const char *ext,
-				 unsigned char *keep_hash)
+void clear_midx_files_ext(const char *object_dir, const char *ext,
+			  unsigned char *keep_hash)
 {
 	struct clear_midx_data data;
 	memset(&data, 0, sizeof(struct clear_midx_data));
diff --git a/midx.h b/midx.h
index b374a7afaf..dc477dff44 100644
--- a/midx.h
+++ b/midx.h
@@ -8,6 +8,25 @@ struct pack_entry;
 struct repository;
 struct bitmapped_pack;
 
+#define MIDX_SIGNATURE 0x4d494458 /* "MIDX" */
+#define MIDX_VERSION 1
+#define MIDX_BYTE_FILE_VERSION 4
+#define MIDX_BYTE_HASH_VERSION 5
+#define MIDX_BYTE_NUM_CHUNKS 6
+#define MIDX_BYTE_NUM_PACKS 8
+#define MIDX_HEADER_SIZE 12
+
+#define MIDX_CHUNK_ALIGNMENT 4
+#define MIDX_CHUNKID_PACKNAMES 0x504e414d /* "PNAM" */
+#define MIDX_CHUNKID_BITMAPPEDPACKS 0x42544d50 /* "BTMP" */
+#define MIDX_CHUNKID_OIDFANOUT 0x4f494446 /* "OIDF" */
+#define MIDX_CHUNKID_OIDLOOKUP 0x4f49444c /* "OIDL" */
+#define MIDX_CHUNKID_OBJECTOFFSETS 0x4f4f4646 /* "OOFF" */
+#define MIDX_CHUNKID_LARGEOFFSETS 0x4c4f4646 /* "LOFF" */
+#define MIDX_CHUNKID_REVINDEX 0x52494458 /* "RIDX" */
+#define MIDX_CHUNK_OFFSET_WIDTH (2 * sizeof(uint32_t))
+#define MIDX_LARGE_OFFSET_NEEDED 0x80000000
+
 #define GIT_TEST_MULTI_PACK_INDEX "GIT_TEST_MULTI_PACK_INDEX"
 #define GIT_TEST_MULTI_PACK_INDEX_WRITE_BITMAP \
 	"GIT_TEST_MULTI_PACK_INDEX_WRITE_BITMAP"
-- 
2.44.0.290.g736be63234b


^ permalink raw reply related	[flat|nested] 30+ messages in thread

* [PATCH 08/11] midx-write.c: avoid directly managed temporary strbuf
  2024-03-25 17:24 [PATCH 00/11] midx: split MIDX writing routines into midx-write.c, cleanup Taylor Blau
                   ` (6 preceding siblings ...)
  2024-03-25 17:24 ` [PATCH 07/11] midx: move `write_midx_internal` (and related functions) " Taylor Blau
@ 2024-03-25 17:24 ` Taylor Blau
  2024-03-25 20:33   ` Junio C Hamano
  2024-03-25 17:24 ` [PATCH 09/11] midx-write.c: factor out common want_included_pack() routine Taylor Blau
                   ` (4 subsequent siblings)
  12 siblings, 1 reply; 30+ messages in thread
From: Taylor Blau @ 2024-03-25 17:24 UTC (permalink / raw
  To: git; +Cc: Jeff King, Junio C Hamano

In midx-write.c::midx_repack(), we construct the command-line arguments
for a pack-objects invocation which will combine objects from the packs
below our `--batch-size` option.

To construct the base name of the output pack, we use a temporary
strbuf, and then push the result of that onto the strvec which holds the
command-line arguments, after which point we release the strbuf.

We could replace this by doing something like:

    struct strbuf buf = STRBUF_INIT;
    strbuf_addf(&buf, "%s/pack/pack", object_dir);
    strvec_push_nodup(&cmd.args, strbuf_detach(&buf));

(combining the two separate `strbuf_addstr()` calls into a single
`strbuf_addf()`). But that is more or less an open-coded version of
strvec_pushf(), which we could use directly instead.

(Note that at the time this code was written back in ce1e4a105b4 (midx:
implement midx_repack(), 2019-06-10), strvec did not yet exist, so the
above example would have replaced the last line with:

    argv_array_push_nodup(&cmd.args, strbuf_detach(&buf));

, but the code is otherwise unchanged).

Avoid directly managing the temporary strbuf used to construct the
base name for pack-object's command-line arguments, and instead use the
purpose-built `strvec_pushf()` instead.

Signed-off-by: Taylor Blau <me@ttaylorr.com>
---
 midx-write.c | 7 +------
 1 file changed, 1 insertion(+), 6 deletions(-)

diff --git a/midx-write.c b/midx-write.c
index c812156cbd..89e325d08e 100644
--- a/midx-write.c
+++ b/midx-write.c
@@ -1446,7 +1446,6 @@ int midx_repack(struct repository *r, const char *object_dir, size_t batch_size,
 	unsigned char *include_pack;
 	struct child_process cmd = CHILD_PROCESS_INIT;
 	FILE *cmd_in;
-	struct strbuf base_name = STRBUF_INIT;
 	struct multi_pack_index *m = lookup_multi_pack_index(r, object_dir);
 
 	/*
@@ -1473,10 +1472,6 @@ int midx_repack(struct repository *r, const char *object_dir, size_t batch_size,
 
 	strvec_push(&cmd.args, "pack-objects");
 
-	strbuf_addstr(&base_name, object_dir);
-	strbuf_addstr(&base_name, "/pack/pack");
-	strvec_push(&cmd.args, base_name.buf);
-
 	if (delta_base_offset)
 		strvec_push(&cmd.args, "--delta-base-offset");
 	if (use_delta_islands)
@@ -1487,7 +1482,7 @@ int midx_repack(struct repository *r, const char *object_dir, size_t batch_size,
 	else
 		strvec_push(&cmd.args, "-q");
 
-	strbuf_release(&base_name);
+	strvec_pushf(&cmd.args, "%s/pack/pack", object_dir);
 
 	cmd.git_cmd = 1;
 	cmd.in = cmd.out = -1;
-- 
2.44.0.290.g736be63234b


^ permalink raw reply related	[flat|nested] 30+ messages in thread

* [PATCH 09/11] midx-write.c: factor out common want_included_pack() routine
  2024-03-25 17:24 [PATCH 00/11] midx: split MIDX writing routines into midx-write.c, cleanup Taylor Blau
                   ` (7 preceding siblings ...)
  2024-03-25 17:24 ` [PATCH 08/11] midx-write.c: avoid directly managed temporary strbuf Taylor Blau
@ 2024-03-25 17:24 ` Taylor Blau
  2024-03-25 20:36   ` Junio C Hamano
  2024-03-27  8:29   ` Jeff King
  2024-03-25 17:24 ` [PATCH 10/11] midx-write.c: check count of packs to repack after grouping Taylor Blau
                   ` (3 subsequent siblings)
  12 siblings, 2 replies; 30+ messages in thread
From: Taylor Blau @ 2024-03-25 17:24 UTC (permalink / raw
  To: git; +Cc: Jeff King, Junio C Hamano

When performing a 'git multi-pack-index repack', the MIDX machinery
tries to aggregate MIDX'd packs together either to (a) fill the given
`--batch-size` argument, or (b) combine all packs together.

In either case (using the `midx-write.c::fill_included_packs_batch()` or
`midx-write.c::fill_included_packs_all()` function, respectively), we
evaluate whether or not we want to repack each MIDX'd pack, according to
whether or it is loadable, kept, cruft, or non-empty.

Between the two `fill_included_packs_` callers, they both care about the
same conditions, except for `fill_included_packs_batch()` which also
cares that the pack is non-empty.

We could extract two functions (say, `want_included_pack()` and a
`_nonempty()` variant), but this is not necessary. For the case in
`fill_included_packs_all()` which does not check the pack size, we add
all of the pack's objects assuming that the pack meets all other
criteria. But if the pack is empty in the first place, we add all of its
zero objects, so whether or not we "accept" or "reject" it in the first
place is irrelevant.

This change improves the readability in both `fill_included_packs_`
functions.

Signed-off-by: Taylor Blau <me@ttaylorr.com>
---
 midx-write.c | 32 ++++++++++++++++++++------------
 1 file changed, 20 insertions(+), 12 deletions(-)

diff --git a/midx-write.c b/midx-write.c
index 89e325d08e..2f0f5d133f 100644
--- a/midx-write.c
+++ b/midx-write.c
@@ -1349,6 +1349,24 @@ static int compare_by_mtime(const void *a_, const void *b_)
 	return 0;
 }
 
+static int want_included_pack(struct repository *r,
+			      struct multi_pack_index *m,
+			      int pack_kept_objects,
+			      uint32_t pack_int_id)
+{
+	struct packed_git *p;
+	if (prepare_midx_pack(r, m, pack_int_id))
+		return 0;
+	p = m->packs[pack_int_id];
+	if (!pack_kept_objects && p->pack_keep)
+		return 0;
+	if (p->is_cruft)
+		return 0;
+	if (open_pack_index(p) || !p->num_objects)
+		return 0;
+	return 1;
+}
+
 static int fill_included_packs_all(struct repository *r,
 				   struct multi_pack_index *m,
 				   unsigned char *include_pack)
@@ -1359,11 +1377,7 @@ static int fill_included_packs_all(struct repository *r,
 	repo_config_get_bool(r, "repack.packkeptobjects", &pack_kept_objects);
 
 	for (i = 0; i < m->num_packs; i++) {
-		if (prepare_midx_pack(r, m, i))
-			continue;
-		if (!pack_kept_objects && m->packs[i]->pack_keep)
-			continue;
-		if (m->packs[i]->is_cruft)
+		if (!want_included_pack(r, m, pack_kept_objects, i))
 			continue;
 
 		include_pack[i] = 1;
@@ -1410,13 +1424,7 @@ static int fill_included_packs_batch(struct repository *r,
 		struct packed_git *p = m->packs[pack_int_id];
 		size_t expected_size;
 
-		if (!p)
-			continue;
-		if (!pack_kept_objects && p->pack_keep)
-			continue;
-		if (p->is_cruft)
-			continue;
-		if (open_pack_index(p) || !p->num_objects)
+		if (!want_included_pack(r, m, pack_kept_objects, pack_int_id))
 			continue;
 
 		expected_size = st_mult(p->pack_size,
-- 
2.44.0.290.g736be63234b


^ permalink raw reply related	[flat|nested] 30+ messages in thread

* [PATCH 10/11] midx-write.c: check count of packs to repack after grouping
  2024-03-25 17:24 [PATCH 00/11] midx: split MIDX writing routines into midx-write.c, cleanup Taylor Blau
                   ` (8 preceding siblings ...)
  2024-03-25 17:24 ` [PATCH 09/11] midx-write.c: factor out common want_included_pack() routine Taylor Blau
@ 2024-03-25 17:24 ` Taylor Blau
  2024-03-25 20:41   ` Junio C Hamano
  2024-03-25 17:24 ` [PATCH 11/11] midx-write.c: use `--stdin-packs` when repacking Taylor Blau
                   ` (2 subsequent siblings)
  12 siblings, 1 reply; 30+ messages in thread
From: Taylor Blau @ 2024-03-25 17:24 UTC (permalink / raw
  To: git; +Cc: Jeff King, Junio C Hamano

In both fill_included_packs_all() and fill_included_packs_batch(), we
accumulate a list of packs whose contents we want to repack together,
and then use that information to feed a list of objects as input to
pack-objects.

In both cases, the `fill_included_packs_` functions keep track of how
many packs they want to repack together, and only execute pack-objects
if there are at least two packs that need repacking.

Having both of these functions keep track of this information themselves
is not strictly necessary, since they also log which packs to repack via
the `include_pack` array, so we can simply count the non-zero entries in
that array after either function is done executing, reducing the overall
amount of code necessary.

Signed-off-by: Taylor Blau <me@ttaylorr.com>
---
 midx-write.c | 44 ++++++++++++++++++++------------------------
 1 file changed, 20 insertions(+), 24 deletions(-)

diff --git a/midx-write.c b/midx-write.c
index 2f0f5d133f..4f1d649aa6 100644
--- a/midx-write.c
+++ b/midx-write.c
@@ -1367,11 +1367,11 @@ static int want_included_pack(struct repository *r,
 	return 1;
 }
 
-static int fill_included_packs_all(struct repository *r,
-				   struct multi_pack_index *m,
-				   unsigned char *include_pack)
+static void fill_included_packs_all(struct repository *r,
+				    struct multi_pack_index *m,
+				    unsigned char *include_pack)
 {
-	uint32_t i, count = 0;
+	uint32_t i;
 	int pack_kept_objects = 0;
 
 	repo_config_get_bool(r, "repack.packkeptobjects", &pack_kept_objects);
@@ -1381,18 +1381,15 @@ static int fill_included_packs_all(struct repository *r,
 			continue;
 
 		include_pack[i] = 1;
-		count++;
 	}
-
-	return count < 2;
 }
 
-static int fill_included_packs_batch(struct repository *r,
-				     struct multi_pack_index *m,
-				     unsigned char *include_pack,
-				     size_t batch_size)
+static void fill_included_packs_batch(struct repository *r,
+				      struct multi_pack_index *m,
+				      unsigned char *include_pack,
+				      size_t batch_size)
 {
-	uint32_t i, packs_to_repack;
+	uint32_t i;
 	size_t total_size;
 	struct repack_info *pack_info;
 	int pack_kept_objects = 0;
@@ -1418,7 +1415,6 @@ static int fill_included_packs_batch(struct repository *r,
 	QSORT(pack_info, m->num_packs, compare_by_mtime);
 
 	total_size = 0;
-	packs_to_repack = 0;
 	for (i = 0; total_size < batch_size && i < m->num_packs; i++) {
 		int pack_int_id = pack_info[i].pack_int_id;
 		struct packed_git *p = m->packs[pack_int_id];
@@ -1434,23 +1430,17 @@ static int fill_included_packs_batch(struct repository *r,
 		if (expected_size >= batch_size)
 			continue;
 
-		packs_to_repack++;
 		total_size += expected_size;
 		include_pack[pack_int_id] = 1;
 	}
 
 	free(pack_info);
-
-	if (packs_to_repack < 2)
-		return 1;
-
-	return 0;
 }
 
 int midx_repack(struct repository *r, const char *object_dir, size_t batch_size, unsigned flags)
 {
 	int result = 0;
-	uint32_t i;
+	uint32_t i, packs_to_repack = 0;
 	unsigned char *include_pack;
 	struct child_process cmd = CHILD_PROCESS_INIT;
 	FILE *cmd_in;
@@ -1469,10 +1459,16 @@ int midx_repack(struct repository *r, const char *object_dir, size_t batch_size,
 
 	CALLOC_ARRAY(include_pack, m->num_packs);
 
-	if (batch_size) {
-		if (fill_included_packs_batch(r, m, include_pack, batch_size))
-			goto cleanup;
-	} else if (fill_included_packs_all(r, m, include_pack))
+	if (batch_size)
+		fill_included_packs_batch(r, m, include_pack, batch_size);
+	else
+		fill_included_packs_all(r, m, include_pack);
+
+	for (i = 0; i < m->num_packs; i++) {
+		if (include_pack[i])
+			packs_to_repack++;
+	}
+	if (packs_to_repack <= 1)
 		goto cleanup;
 
 	repo_config_get_bool(r, "repack.usedeltabaseoffset", &delta_base_offset);
-- 
2.44.0.290.g736be63234b


^ permalink raw reply related	[flat|nested] 30+ messages in thread

* [PATCH 11/11] midx-write.c: use `--stdin-packs` when repacking
  2024-03-25 17:24 [PATCH 00/11] midx: split MIDX writing routines into midx-write.c, cleanup Taylor Blau
                   ` (9 preceding siblings ...)
  2024-03-25 17:24 ` [PATCH 10/11] midx-write.c: check count of packs to repack after grouping Taylor Blau
@ 2024-03-25 17:24 ` Taylor Blau
  2024-03-27  8:37   ` Jeff King
  2024-03-27  8:39 ` [PATCH 00/11] midx: split MIDX writing routines into midx-write.c, cleanup Jeff King
  2024-04-01 21:16 ` [PATCH v2 0/4] " Taylor Blau
  12 siblings, 1 reply; 30+ messages in thread
From: Taylor Blau @ 2024-03-25 17:24 UTC (permalink / raw
  To: git; +Cc: Jeff King, Junio C Hamano

When constructing a new pack `git multi-pack-index repack` provides a
list of objects which is the union of objects in all MIDX'd packs which
were "included" in the repack.

Though correct, this typically yields a poorly structured pack, since
providing the objects list over stdin does not give pack-objects a
chance to discover the namehash values for each object, leading to
sub-optimal delta selection.

We can use `--stdin-packs` instead, which has a couple of benefits:

  - it does a supplemental walk over objects in the supplied list of
    packs to discover their namehash, leading to higher-quality delta
    selection

  - it requires us to list far less data over stdin; instead of listing
    each object in the resulting pack, we need only list the
    constituent packs from which those objects were selected in the MIDX

Of course, this comes at a slight cost: though we save time on listing
packs versus objects over stdin[^1] (around ~650 milliseconds), we add a
non-trivial amount of time walking over the given objects in order to
find better deltas.

In general, this is likely to more closely match the user's expectations
(i.e. that packs generated via `git multi-pack-index repack` are written
with high-quality deltas). But if not, we can always introduce a new
option in pack-objects to disable the supplemental object walk, which
would yield a pure CPU-time savings, at the cost of the on-disk size of
the resulting pack.

[^1]: In a patched version of Git that doesn't perform the supplemental
  object walk in `pack-objects --stdin-packs`, we save around ~650ms
  (from 5.968 to 5.325 seconds) when running `git multi-pack-index
  repack --batch-size=0` on git.git with all objects packed, and all
  packs in a MIDX.

Signed-off-by: Taylor Blau <me@ttaylorr.com>
---
 midx-write.c | 18 +++++++++---------
 1 file changed, 9 insertions(+), 9 deletions(-)

diff --git a/midx-write.c b/midx-write.c
index 4f1d649aa6..d341b9c628 100644
--- a/midx-write.c
+++ b/midx-write.c
@@ -1474,7 +1474,8 @@ int midx_repack(struct repository *r, const char *object_dir, size_t batch_size,
 	repo_config_get_bool(r, "repack.usedeltabaseoffset", &delta_base_offset);
 	repo_config_get_bool(r, "repack.usedeltaislands", &use_delta_islands);
 
-	strvec_push(&cmd.args, "pack-objects");
+	strvec_pushl(&cmd.args, "pack-objects", "--stdin-packs", "--non-empty",
+		     NULL);
 
 	if (delta_base_offset)
 		strvec_push(&cmd.args, "--delta-base-offset");
@@ -1498,16 +1499,15 @@ int midx_repack(struct repository *r, const char *object_dir, size_t batch_size,
 	}
 
 	cmd_in = xfdopen(cmd.in, "w");
-
-	for (i = 0; i < m->num_objects; i++) {
-		struct object_id oid;
-		uint32_t pack_int_id = nth_midxed_pack_int_id(m, i);
-
-		if (!include_pack[pack_int_id])
+	for (i = 0; i < m->num_packs; i++) {
+		struct packed_git *p = m->packs[i];
+		if (!p)
 			continue;
 
-		nth_midxed_object_oid(&oid, m, i);
-		fprintf(cmd_in, "%s\n", oid_to_hex(&oid));
+		if (include_pack[i])
+			fprintf(cmd_in, "%s\n", pack_basename(p));
+		else
+			fprintf(cmd_in, "^%s\n", pack_basename(p));
 	}
 	fclose(cmd_in);
 
-- 
2.44.0.290.g736be63234b

^ permalink raw reply related	[flat|nested] 30+ messages in thread

* Re: [PATCH 01/11] midx-write: initial commit
  2024-03-25 17:24 ` [PATCH 01/11] midx-write: initial commit Taylor Blau
@ 2024-03-25 20:30   ` Junio C Hamano
  2024-03-25 22:09     ` Taylor Blau
  0 siblings, 1 reply; 30+ messages in thread
From: Junio C Hamano @ 2024-03-25 20:30 UTC (permalink / raw
  To: Taylor Blau; +Cc: git, Jeff King

Taylor Blau <me@ttaylorr.com> writes:

> Introduce a new empty midx-write.c source file. Similar to the
> relationship between "pack-bitmap.c" and "pack-bitmap-write.c", this
> source file will hold code that is specific to writing MIDX files as
> opposed to reading them (the latter will remain in midx.c).
>
> This is a preparatory step which will reduce the overall size of midx.c
> and make it easier to read as we prepare for future changes to that file
> (outside the immediate scope of this series).
>
> Signed-off-by: Taylor Blau <me@ttaylorr.com>
> ---
>  Makefile     | 1 +
>  midx-write.c | 2 ++
>  2 files changed, 3 insertions(+)
>  create mode 100644 midx-write.c
>
> diff --git a/Makefile b/Makefile
> index 4e255c81f2..cf44a964c0 100644
> --- a/Makefile
> +++ b/Makefile
> @@ -1072,6 +1072,7 @@ LIB_OBJS += merge-ort-wrappers.o
>  LIB_OBJS += merge-recursive.o
>  LIB_OBJS += merge.o
>  LIB_OBJS += midx.o
> +LIB_OBJS += midx-write.o
>  LIB_OBJS += name-hash.o
>  LIB_OBJS += negotiator/default.o
>  LIB_OBJS += negotiator/noop.o
> diff --git a/midx-write.c b/midx-write.c
> new file mode 100644
> index 0000000000..214179d308
> --- /dev/null
> +++ b/midx-write.c
> @@ -0,0 +1,2 @@
> +#include "git-compat-util.h"
> +#include "midx.h"

I noticed that "nm midx-write.o" after this step gives us nothing.
A source that produces absolutely nothing may not successfully
compile everywhere.  It is unlikely we will stop the series at this
step and it won't break bisection, though.

I do not quite see why the first 3 patches need to be split this
way, rather than being done as a single step.  Is it making the
review any simpler?

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH 08/11] midx-write.c: avoid directly managed temporary strbuf
  2024-03-25 17:24 ` [PATCH 08/11] midx-write.c: avoid directly managed temporary strbuf Taylor Blau
@ 2024-03-25 20:33   ` Junio C Hamano
  2024-03-25 22:11     ` Taylor Blau
  0 siblings, 1 reply; 30+ messages in thread
From: Junio C Hamano @ 2024-03-25 20:33 UTC (permalink / raw
  To: Taylor Blau; +Cc: git, Jeff King

Taylor Blau <me@ttaylorr.com> writes:

> In midx-write.c::midx_repack(), we construct the command-line arguments
> for a pack-objects invocation which will combine objects from the packs
> below our `--batch-size` option.
>
> To construct the base name of the output pack, we use a temporary
> strbuf, and then push the result of that onto the strvec which holds the
> command-line arguments, after which point we release the strbuf.
>
> We could replace this by doing something like:
>
>     struct strbuf buf = STRBUF_INIT;
>     strbuf_addf(&buf, "%s/pack/pack", object_dir);
>     strvec_push_nodup(&cmd.args, strbuf_detach(&buf));

Hmph, I thought I saw another patch recently that uses
strvec_pushf() to simplify such a sequence.  Does the technique
apply here as well?

Ah, yes, exactly.  See <9483038c-9529-4243-9b9a-97254fac29c1@web.de>

> @@ -1473,10 +1472,6 @@ int midx_repack(struct repository *r, const char *object_dir, size_t batch_size,
>  
>  	strvec_push(&cmd.args, "pack-objects");
>  
> -	strbuf_addstr(&base_name, object_dir);
> -	strbuf_addstr(&base_name, "/pack/pack");
> -	strvec_push(&cmd.args, base_name.buf);
> -
>  	if (delta_base_offset)
>  		strvec_push(&cmd.args, "--delta-base-offset");
>  	if (use_delta_islands)
> @@ -1487,7 +1482,7 @@ int midx_repack(struct repository *r, const char *object_dir, size_t batch_size,
>  	else
>  		strvec_push(&cmd.args, "-q");
>  
> -	strbuf_release(&base_name);
> +	strvec_pushf(&cmd.args, "%s/pack/pack", object_dir);
>  
>  	cmd.git_cmd = 1;
>  	cmd.in = cmd.out = -1;

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH 09/11] midx-write.c: factor out common want_included_pack() routine
  2024-03-25 17:24 ` [PATCH 09/11] midx-write.c: factor out common want_included_pack() routine Taylor Blau
@ 2024-03-25 20:36   ` Junio C Hamano
  2024-03-27  8:29   ` Jeff King
  1 sibling, 0 replies; 30+ messages in thread
From: Junio C Hamano @ 2024-03-25 20:36 UTC (permalink / raw
  To: Taylor Blau; +Cc: git, Jeff King

Taylor Blau <me@ttaylorr.com> writes:

> diff --git a/midx-write.c b/midx-write.c
> index 89e325d08e..2f0f5d133f 100644
> --- a/midx-write.c
> +++ b/midx-write.c
> @@ -1349,6 +1349,24 @@ static int compare_by_mtime(const void *a_, const void *b_)
>  	return 0;
>  }
>  
> +static int want_included_pack(struct repository *r,
> +			      struct multi_pack_index *m,
> +			      int pack_kept_objects,
> +			      uint32_t pack_int_id)
> +{
> +	struct packed_git *p;
> +	if (prepare_midx_pack(r, m, pack_int_id))
> +		return 0;
> +	p = m->packs[pack_int_id];
> +	if (!pack_kept_objects && p->pack_keep)
> +		return 0;
> +	if (p->is_cruft)
> +		return 0;
> +	if (open_pack_index(p) || !p->num_objects)
> +		return 0;
> +	return 1;
> +}

OK, makes sense to return false when we do not want the caller
proceed with the pack for a function whose name is want_*_pack().


^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH 10/11] midx-write.c: check count of packs to repack after grouping
  2024-03-25 17:24 ` [PATCH 10/11] midx-write.c: check count of packs to repack after grouping Taylor Blau
@ 2024-03-25 20:41   ` Junio C Hamano
  2024-03-25 22:11     ` Taylor Blau
  0 siblings, 1 reply; 30+ messages in thread
From: Junio C Hamano @ 2024-03-25 20:41 UTC (permalink / raw
  To: Taylor Blau; +Cc: git, Jeff King

Taylor Blau <me@ttaylorr.com> writes:

> In both fill_included_packs_all() and fill_included_packs_batch(), we
> accumulate a list of packs whose contents we want to repack together,
> and then use that information to feed a list of objects as input to
> pack-objects.
>
> In both cases, the `fill_included_packs_` functions keep track of how
> many packs they want to repack together, and only execute pack-objects
> if there are at least two packs that need repacking.
>
> Having both of these functions keep track of this information themselves
> is not strictly necessary, since they also log which packs to repack via
> the `include_pack` array, so we can simply count the non-zero entries in
> that array after either function is done executing, reducing the overall
> amount of code necessary.

It does make the logic at the caller simpler to follow.

> -	if (batch_size) {
> -		if (fill_included_packs_batch(r, m, include_pack, batch_size))
> -			goto cleanup;
> -	} else if (fill_included_packs_all(r, m, include_pack))
> +	if (batch_size)
> +		fill_included_packs_batch(r, m, include_pack, batch_size);
> +	else
> +		fill_included_packs_all(r, m, include_pack);
> +
> +	for (i = 0; i < m->num_packs; i++) {
> +		if (include_pack[i])
> +			packs_to_repack++;
> +	}
> +	if (packs_to_repack <= 1)
>  		goto cleanup;
>  
>  	repo_config_get_bool(r, "repack.usedeltabaseoffset", &delta_base_offset);

Queued.  Thanks.


^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH 01/11] midx-write: initial commit
  2024-03-25 20:30   ` Junio C Hamano
@ 2024-03-25 22:09     ` Taylor Blau
  0 siblings, 0 replies; 30+ messages in thread
From: Taylor Blau @ 2024-03-25 22:09 UTC (permalink / raw
  To: Junio C Hamano; +Cc: git, Jeff King

On Mon, Mar 25, 2024 at 01:30:32PM -0700, Junio C Hamano wrote:
> Taylor Blau <me@ttaylorr.com> writes:
>
> > Introduce a new empty midx-write.c source file. Similar to the
> > relationship between "pack-bitmap.c" and "pack-bitmap-write.c", this
> > source file will hold code that is specific to writing MIDX files as
> > opposed to reading them (the latter will remain in midx.c).
> >
> > This is a preparatory step which will reduce the overall size of midx.c
> > and make it easier to read as we prepare for future changes to that file
> > (outside the immediate scope of this series).
> >
> > Signed-off-by: Taylor Blau <me@ttaylorr.com>
> > ---
> >  Makefile     | 1 +
> >  midx-write.c | 2 ++
> >  2 files changed, 3 insertions(+)
> >  create mode 100644 midx-write.c
> >
> > diff --git a/Makefile b/Makefile
> > index 4e255c81f2..cf44a964c0 100644
> > --- a/Makefile
> > +++ b/Makefile
> > @@ -1072,6 +1072,7 @@ LIB_OBJS += merge-ort-wrappers.o
> >  LIB_OBJS += merge-recursive.o
> >  LIB_OBJS += merge.o
> >  LIB_OBJS += midx.o
> > +LIB_OBJS += midx-write.o
> >  LIB_OBJS += name-hash.o
> >  LIB_OBJS += negotiator/default.o
> >  LIB_OBJS += negotiator/noop.o
> > diff --git a/midx-write.c b/midx-write.c
> > new file mode 100644
> > index 0000000000..214179d308
> > --- /dev/null
> > +++ b/midx-write.c
> > @@ -0,0 +1,2 @@
> > +#include "git-compat-util.h"
> > +#include "midx.h"
>
> I noticed that "nm midx-write.o" after this step gives us nothing.
> A source that produces absolutely nothing may not successfully
> compile everywhere.  It is unlikely we will stop the series at this
> step and it won't break bisection, though.
>
> I do not quite see why the first 3 patches need to be split this
> way, rather than being done as a single step.  Is it making the
> review any simpler?

I was hoping that it would make review simpler. If you're concerned
about an empty compilation unit, we could:

  - combine the first three patches into one, so that we start by just
    porting `midx_repack()`

  - combine the first seven patches into one, so that the move is done in
    a single step, or

  - leave it as-is

Whatever you think makes most sense is fine with me. The original
motivation behind splitting them was to make the steps easier to review,
but if that doesn't seem to be the case, I'm happy to combine them.

Thanks,
Taylor

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH 08/11] midx-write.c: avoid directly managed temporary strbuf
  2024-03-25 20:33   ` Junio C Hamano
@ 2024-03-25 22:11     ` Taylor Blau
  0 siblings, 0 replies; 30+ messages in thread
From: Taylor Blau @ 2024-03-25 22:11 UTC (permalink / raw
  To: Junio C Hamano; +Cc: git, Jeff King

On Mon, Mar 25, 2024 at 01:33:11PM -0700, Junio C Hamano wrote:
> Taylor Blau <me@ttaylorr.com> writes:
>
> > In midx-write.c::midx_repack(), we construct the command-line arguments
> > for a pack-objects invocation which will combine objects from the packs
> > below our `--batch-size` option.
> >
> > To construct the base name of the output pack, we use a temporary
> > strbuf, and then push the result of that onto the strvec which holds the
> > command-line arguments, after which point we release the strbuf.
> >
> > We could replace this by doing something like:
> >
> >     struct strbuf buf = STRBUF_INIT;
> >     strbuf_addf(&buf, "%s/pack/pack", object_dir);
> >     strvec_push_nodup(&cmd.args, strbuf_detach(&buf));
>
> Hmph, I thought I saw another patch recently that uses
> strvec_pushf() to simplify such a sequence.  Does the technique
> apply here as well?
>
> Ah, yes, exactly.  See <9483038c-9529-4243-9b9a-97254fac29c1@web.de>

Hah. I wrote this patch on Saturday but didn't read the list before
sending it on Monday. Serves me right ;-). Feel free to drop this one,
or I can reroll based on René's patch.

Either is fine, just let me know :-).

Thanks,
Taylor

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH 10/11] midx-write.c: check count of packs to repack after grouping
  2024-03-25 20:41   ` Junio C Hamano
@ 2024-03-25 22:11     ` Taylor Blau
  0 siblings, 0 replies; 30+ messages in thread
From: Taylor Blau @ 2024-03-25 22:11 UTC (permalink / raw
  To: Junio C Hamano; +Cc: git, Jeff King

On Mon, Mar 25, 2024 at 01:41:27PM -0700, Junio C Hamano wrote:
> Taylor Blau <me@ttaylorr.com> writes:
>
> [...]
>
> Queued.  Thanks.

Thanks very much!

Thanks,
Taylor

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH 09/11] midx-write.c: factor out common want_included_pack() routine
  2024-03-25 17:24 ` [PATCH 09/11] midx-write.c: factor out common want_included_pack() routine Taylor Blau
  2024-03-25 20:36   ` Junio C Hamano
@ 2024-03-27  8:29   ` Jeff King
  1 sibling, 0 replies; 30+ messages in thread
From: Jeff King @ 2024-03-27  8:29 UTC (permalink / raw
  To: Taylor Blau; +Cc: git, Junio C Hamano

On Mon, Mar 25, 2024 at 01:24:44PM -0400, Taylor Blau wrote:

> We could extract two functions (say, `want_included_pack()` and a
> `_nonempty()` variant), but this is not necessary. For the case in
> `fill_included_packs_all()` which does not check the pack size, we add
> all of the pack's objects assuming that the pack meets all other
> criteria. But if the pack is empty in the first place, we add all of its
> zero objects, so whether or not we "accept" or "reject" it in the first
> place is irrelevant.

OK, that makes sense. It does mean that we call the expensive-ish
open_pack_index() just to find out that it has 0 objects. But I guess if
we didn't reject it at this point, we'd soon open it anyway, so it's
probably not a big deal.

> +static int want_included_pack(struct repository *r,
> +			      struct multi_pack_index *m,
> +			      int pack_kept_objects,
> +			      uint32_t pack_int_id)

I wondered about this funky pack_int_id interface, rather than just
having the caller pass in the pack struct. But we need it because
one of the callers needs to load the pack struct by calling
prepare_midx_pack().

That should be a quick noop for the other caller, since the entry in
m->packs[] will already be filled in (and if it's not, then you've just
fixed a bug!).

So this all looks good to me.

-Peff

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH 11/11] midx-write.c: use `--stdin-packs` when repacking
  2024-03-25 17:24 ` [PATCH 11/11] midx-write.c: use `--stdin-packs` when repacking Taylor Blau
@ 2024-03-27  8:37   ` Jeff King
  0 siblings, 0 replies; 30+ messages in thread
From: Jeff King @ 2024-03-27  8:37 UTC (permalink / raw
  To: Taylor Blau; +Cc: git, Junio C Hamano

On Mon, Mar 25, 2024 at 01:24:50PM -0400, Taylor Blau wrote:

> Though correct, this typically yields a poorly structured pack, since
> providing the objects list over stdin does not give pack-objects a
> chance to discover the namehash values for each object, leading to
> sub-optimal delta selection.
> 
> We can use `--stdin-packs` instead, which has a couple of benefits:
> 
>   - it does a supplemental walk over objects in the supplied list of
>     packs to discover their namehash, leading to higher-quality delta
>     selection
> 
>   - it requires us to list far less data over stdin; instead of listing
>     each object in the resulting pack, we need only list the
>     constituent packs from which those objects were selected in the MIDX

Yeah, using --stdin-packs makes much more sense to me. I have to admit
that I do not really see the point "multi-pack-index repack" in the
first place. You'd presumably be better off with geometric repacking,
and I think "repack --geometric --write-midx" will do things in the
right order (new pack, then write midx, then delete redundant packs).

I guess "multi-pack-index repack" came before geometric repacking, and
is maybe redundant-ish now? If that's the case, then I guess I don't
care that much about optimizing its packs. ;) But we can't really delete
it without a deprecation period, so making it more sensible in the
meantime is reasonable to me.

-Peff

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH 00/11] midx: split MIDX writing routines into midx-write.c, cleanup
  2024-03-25 17:24 [PATCH 00/11] midx: split MIDX writing routines into midx-write.c, cleanup Taylor Blau
                   ` (10 preceding siblings ...)
  2024-03-25 17:24 ` [PATCH 11/11] midx-write.c: use `--stdin-packs` when repacking Taylor Blau
@ 2024-03-27  8:39 ` Jeff King
  2024-04-01 21:16 ` [PATCH v2 0/4] " Taylor Blau
  12 siblings, 0 replies; 30+ messages in thread
From: Jeff King @ 2024-03-27  8:39 UTC (permalink / raw
  To: Taylor Blau; +Cc: git, Junio C Hamano

On Mon, Mar 25, 2024 at 01:24:11PM -0400, Taylor Blau wrote:

> This is a small-ish series that I worked on to split the functions from
> midx.c which are responsible for writing MIDX files into a separate
> compilation unit, midx-write.c.
> 
> This is done for a couple of reasons:
> 
>   - It reduces the size of midx.c, which is already quite large, thus
>     making it easier to read.
> 
>   - It more clearly separates responsibility between the two, similar to
>     the division between pack-bitmap.c, and pack-bitmap-write.c.

Yeah, I think this is a good cleanup. Like Junio, I did wonder about the
piece-meal movement. I kind of wonder if one big patch moving everything
would have been just as easy to review. But I'm happy either way.

The other refactoring patches all looked good to me.

-Peff

^ permalink raw reply	[flat|nested] 30+ messages in thread

* [PATCH v2 0/4] midx: split MIDX writing routines into midx-write.c, cleanup
  2024-03-25 17:24 [PATCH 00/11] midx: split MIDX writing routines into midx-write.c, cleanup Taylor Blau
                   ` (11 preceding siblings ...)
  2024-03-27  8:39 ` [PATCH 00/11] midx: split MIDX writing routines into midx-write.c, cleanup Jeff King
@ 2024-04-01 21:16 ` Taylor Blau
  2024-04-01 21:16   ` [PATCH v2 1/4] midx-write: move writing-related functions from midx.c Taylor Blau
                     ` (3 more replies)
  12 siblings, 4 replies; 30+ messages in thread
From: Taylor Blau @ 2024-04-01 21:16 UTC (permalink / raw
  To: git; +Cc: Jeff King, Junio C Hamano, René Scharfe

Here is a reroll of my series 'tb/midx-write', now based on
'rs/midx-use-strvec-pushf', and reworked to incorporate feedback.

The series is mostly unchanged since last time, and the goal remains to
split out the writing-related functions from midx.c into a separate
compilation unit, midx-write.c.

Notable changes since last time include:

  - Rebasing on top of 'rs/midx-use-strvec-pushf', and dropping my patch
    which does the same thing as René's.

  - Combining the piece-meal migration patches from midx.c ->
    midx-write.c into a single patch.

Thanks in advance for your review!

Taylor Blau (4):
  midx-write: move writing-related functions from midx.c
  midx-write.c: factor out common want_included_pack() routine
  midx-write.c: check count of packs to repack after grouping
  midx-write.c: use `--stdin-packs` when repacking

 Makefile     |    1 +
 midx-write.c | 1525 +++++++++++++++++++++++++++++++++++++++++++++++++
 midx.c       | 1553 +-------------------------------------------------
 midx.h       |   19 +
 4 files changed, 1559 insertions(+), 1539 deletions(-)
 create mode 100644 midx-write.c

Range-diff against v1:
 1:  ffa8ba18de <  -:  ---------- midx-write: initial commit
 2:  b776fd528d <  -:  ---------- midx: extern a pair of shared functions
 3:  487a0ccda8 <  -:  ---------- midx: move `midx_repack` (and related functions) to midx-write.c
 4:  e2b6459aa8 <  -:  ---------- midx: move `expire_midx_packs` to midx-write.c
 5:  31d2e074fb <  -:  ---------- midx: move `write_midx_file_only` to midx-write.c
 6:  73977036d7 <  -:  ---------- midx: move `write_midx_file` to midx-write.c
 7:  e83ef66cf5 !  1:  8637ee3d0e midx: move `write_midx_internal` (and related functions) to midx-write.c
    @@ Metadata
     Author: Taylor Blau <me@ttaylorr.com>
     
      ## Commit message ##
    -    midx: move `write_midx_internal` (and related functions) to midx-write.c
    +    midx-write: move writing-related functions from midx.c
     
    -    Move the last writing-related function from midx.c to midx-write.c. This
    -    patch moves `write_midx_internal()`, along with all of the functions
    -    used to implement it into midx-write.c.
    +    Introduce a new midx-write.c source file, which holds all of the
    +    functionality from the MIDX sub-system related to writing new MIDX files.
     
    -    Like previous patch, this patch too does not introduce any functional
    -    changes, and is best moved with `--color-moved`.
    +    Similar to the relationship between "pack-bitmap.c" and
    +    "pack-bitmap-write.c", this source file will hold code that is specific
    +    to writing MIDX files as opposed to reading them (the latter will remain
    +    in midx.c).
     
         Signed-off-by: Taylor Blau <me@ttaylorr.com>
     
    - ## midx-write.c ##
    + ## Makefile ##
    +@@ Makefile: LIB_OBJS += merge-ort-wrappers.o
    + LIB_OBJS += merge-recursive.o
    + LIB_OBJS += merge.o
    + LIB_OBJS += midx.o
    ++LIB_OBJS += midx-write.o
    + LIB_OBJS += name-hash.o
    + LIB_OBJS += negotiator/default.o
    + LIB_OBJS += negotiator/noop.o
    +
    + ## midx-write.c (new) ##
     @@
    - #include "git-compat-util.h"
    ++#include "git-compat-util.h"
     +#include "abspath.h"
    - #include "config.h"
    - #include "hex.h"
    ++#include "config.h"
    ++#include "hex.h"
     +#include "lockfile.h"
    - #include "packfile.h"
    ++#include "packfile.h"
     +#include "object-file.h"
     +#include "hash-lookup.h"
    - #include "midx.h"
    - #include "progress.h"
    ++#include "midx.h"
    ++#include "progress.h"
     +#include "trace2.h"
    - #include "run-command.h"
    ++#include "run-command.h"
     +#include "chunk-format.h"
    - #include "pack-bitmap.h"
    ++#include "pack-bitmap.h"
     +#include "refs.h"
    - #include "revision.h"
    ++#include "revision.h"
     +#include "list-objects.h"
    - 
    --extern int write_midx_internal(const char *object_dir,
    ++
     +#define PACK_EXPIRED UINT_MAX
     +#define BITMAP_POS_UNKNOWN (~((uint32_t)0))
     +#define MIDX_CHUNK_FANOUT_SIZE (sizeof(uint32_t) * 256)
    @@ midx-write.c
     +}
     +
     +static int write_midx_internal(const char *object_dir,
    - 			       struct string_list *packs_to_include,
    - 			       struct string_list *packs_to_drop,
    - 			       const char *preferred_pack_name,
    - 			       const char *refs_snapshot,
    --			       unsigned flags);
    ++			       struct string_list *packs_to_include,
    ++			       struct string_list *packs_to_drop,
    ++			       const char *preferred_pack_name,
    ++			       const char *refs_snapshot,
     +			       unsigned flags)
     +{
     +	struct strbuf midx_name = STRBUF_INIT;
    @@ midx-write.c
     +	int dropped_packs = 0;
     +	int result = 0;
     +	struct chunkfile *cf;
    - 
    --extern struct multi_pack_index *lookup_multi_pack_index(struct repository *r,
    --							const char *object_dir);
    ++
     +	trace2_region_enter("midx", "write_midx_internal", the_repository);
     +
     +	get_midx_filename(&midx_name, object_dir);
    @@ midx-write.c
     +
     +	return result;
     +}
    - 
    - int write_midx_file(const char *object_dir,
    - 		    const char *preferred_pack_name,
    ++
    ++int write_midx_file(const char *object_dir,
    ++		    const char *preferred_pack_name,
    ++		    const char *refs_snapshot,
    ++		    unsigned flags)
    ++{
    ++	return write_midx_internal(object_dir, NULL, NULL, preferred_pack_name,
    ++				   refs_snapshot, flags);
    ++}
    ++
    ++int write_midx_file_only(const char *object_dir,
    ++			 struct string_list *packs_to_include,
    ++			 const char *preferred_pack_name,
    ++			 const char *refs_snapshot,
    ++			 unsigned flags)
    ++{
    ++	return write_midx_internal(object_dir, packs_to_include, NULL,
    ++				   preferred_pack_name, refs_snapshot, flags);
    ++}
    ++
    ++int expire_midx_packs(struct repository *r, const char *object_dir, unsigned flags)
    ++{
    ++	uint32_t i, *count, result = 0;
    ++	struct string_list packs_to_drop = STRING_LIST_INIT_DUP;
    ++	struct multi_pack_index *m = lookup_multi_pack_index(r, object_dir);
    ++	struct progress *progress = NULL;
    ++
    ++	if (!m)
    ++		return 0;
    ++
    ++	CALLOC_ARRAY(count, m->num_packs);
    ++
    ++	if (flags & MIDX_PROGRESS)
    ++		progress = start_delayed_progress(_("Counting referenced objects"),
    ++					  m->num_objects);
    ++	for (i = 0; i < m->num_objects; i++) {
    ++		int pack_int_id = nth_midxed_pack_int_id(m, i);
    ++		count[pack_int_id]++;
    ++		display_progress(progress, i + 1);
    ++	}
    ++	stop_progress(&progress);
    ++
    ++	if (flags & MIDX_PROGRESS)
    ++		progress = start_delayed_progress(_("Finding and deleting unreferenced packfiles"),
    ++					  m->num_packs);
    ++	for (i = 0; i < m->num_packs; i++) {
    ++		char *pack_name;
    ++		display_progress(progress, i + 1);
    ++
    ++		if (count[i])
    ++			continue;
    ++
    ++		if (prepare_midx_pack(r, m, i))
    ++			continue;
    ++
    ++		if (m->packs[i]->pack_keep || m->packs[i]->is_cruft)
    ++			continue;
    ++
    ++		pack_name = xstrdup(m->packs[i]->pack_name);
    ++		close_pack(m->packs[i]);
    ++
    ++		string_list_insert(&packs_to_drop, m->pack_names[i]);
    ++		unlink_pack_path(pack_name, 0);
    ++		free(pack_name);
    ++	}
    ++	stop_progress(&progress);
    ++
    ++	free(count);
    ++
    ++	if (packs_to_drop.nr)
    ++		result = write_midx_internal(object_dir, NULL, &packs_to_drop, NULL, NULL, flags);
    ++
    ++	string_list_clear(&packs_to_drop, 0);
    ++
    ++	return result;
    ++}
    ++
    ++struct repack_info {
    ++	timestamp_t mtime;
    ++	uint32_t referenced_objects;
    ++	uint32_t pack_int_id;
    ++};
    ++
    ++static int compare_by_mtime(const void *a_, const void *b_)
    ++{
    ++	const struct repack_info *a, *b;
    ++
    ++	a = (const struct repack_info *)a_;
    ++	b = (const struct repack_info *)b_;
    ++
    ++	if (a->mtime < b->mtime)
    ++		return -1;
    ++	if (a->mtime > b->mtime)
    ++		return 1;
    ++	return 0;
    ++}
    ++
    ++static int fill_included_packs_all(struct repository *r,
    ++				   struct multi_pack_index *m,
    ++				   unsigned char *include_pack)
    ++{
    ++	uint32_t i, count = 0;
    ++	int pack_kept_objects = 0;
    ++
    ++	repo_config_get_bool(r, "repack.packkeptobjects", &pack_kept_objects);
    ++
    ++	for (i = 0; i < m->num_packs; i++) {
    ++		if (prepare_midx_pack(r, m, i))
    ++			continue;
    ++		if (!pack_kept_objects && m->packs[i]->pack_keep)
    ++			continue;
    ++		if (m->packs[i]->is_cruft)
    ++			continue;
    ++
    ++		include_pack[i] = 1;
    ++		count++;
    ++	}
    ++
    ++	return count < 2;
    ++}
    ++
    ++static int fill_included_packs_batch(struct repository *r,
    ++				     struct multi_pack_index *m,
    ++				     unsigned char *include_pack,
    ++				     size_t batch_size)
    ++{
    ++	uint32_t i, packs_to_repack;
    ++	size_t total_size;
    ++	struct repack_info *pack_info;
    ++	int pack_kept_objects = 0;
    ++
    ++	CALLOC_ARRAY(pack_info, m->num_packs);
    ++
    ++	repo_config_get_bool(r, "repack.packkeptobjects", &pack_kept_objects);
    ++
    ++	for (i = 0; i < m->num_packs; i++) {
    ++		pack_info[i].pack_int_id = i;
    ++
    ++		if (prepare_midx_pack(r, m, i))
    ++			continue;
    ++
    ++		pack_info[i].mtime = m->packs[i]->mtime;
    ++	}
    ++
    ++	for (i = 0; i < m->num_objects; i++) {
    ++		uint32_t pack_int_id = nth_midxed_pack_int_id(m, i);
    ++		pack_info[pack_int_id].referenced_objects++;
    ++	}
    ++
    ++	QSORT(pack_info, m->num_packs, compare_by_mtime);
    ++
    ++	total_size = 0;
    ++	packs_to_repack = 0;
    ++	for (i = 0; total_size < batch_size && i < m->num_packs; i++) {
    ++		int pack_int_id = pack_info[i].pack_int_id;
    ++		struct packed_git *p = m->packs[pack_int_id];
    ++		size_t expected_size;
    ++
    ++		if (!p)
    ++			continue;
    ++		if (!pack_kept_objects && p->pack_keep)
    ++			continue;
    ++		if (p->is_cruft)
    ++			continue;
    ++		if (open_pack_index(p) || !p->num_objects)
    ++			continue;
    ++
    ++		expected_size = st_mult(p->pack_size,
    ++					pack_info[i].referenced_objects);
    ++		expected_size /= p->num_objects;
    ++
    ++		if (expected_size >= batch_size)
    ++			continue;
    ++
    ++		packs_to_repack++;
    ++		total_size += expected_size;
    ++		include_pack[pack_int_id] = 1;
    ++	}
    ++
    ++	free(pack_info);
    ++
    ++	if (packs_to_repack < 2)
    ++		return 1;
    ++
    ++	return 0;
    ++}
    ++
    ++int midx_repack(struct repository *r, const char *object_dir, size_t batch_size, unsigned flags)
    ++{
    ++	int result = 0;
    ++	uint32_t i;
    ++	unsigned char *include_pack;
    ++	struct child_process cmd = CHILD_PROCESS_INIT;
    ++	FILE *cmd_in;
    ++	struct multi_pack_index *m = lookup_multi_pack_index(r, object_dir);
    ++
    ++	/*
    ++	 * When updating the default for these configuration
    ++	 * variables in builtin/repack.c, these must be adjusted
    ++	 * to match.
    ++	 */
    ++	int delta_base_offset = 1;
    ++	int use_delta_islands = 0;
    ++
    ++	if (!m)
    ++		return 0;
    ++
    ++	CALLOC_ARRAY(include_pack, m->num_packs);
    ++
    ++	if (batch_size) {
    ++		if (fill_included_packs_batch(r, m, include_pack, batch_size))
    ++			goto cleanup;
    ++	} else if (fill_included_packs_all(r, m, include_pack))
    ++		goto cleanup;
    ++
    ++	repo_config_get_bool(r, "repack.usedeltabaseoffset", &delta_base_offset);
    ++	repo_config_get_bool(r, "repack.usedeltaislands", &use_delta_islands);
    ++
    ++	strvec_push(&cmd.args, "pack-objects");
    ++
    ++	strvec_pushf(&cmd.args, "%s/pack/pack", object_dir);
    ++
    ++	if (delta_base_offset)
    ++		strvec_push(&cmd.args, "--delta-base-offset");
    ++	if (use_delta_islands)
    ++		strvec_push(&cmd.args, "--delta-islands");
    ++
    ++	if (flags & MIDX_PROGRESS)
    ++		strvec_push(&cmd.args, "--progress");
    ++	else
    ++		strvec_push(&cmd.args, "-q");
    ++
    ++	cmd.git_cmd = 1;
    ++	cmd.in = cmd.out = -1;
    ++
    ++	if (start_command(&cmd)) {
    ++		error(_("could not start pack-objects"));
    ++		result = 1;
    ++		goto cleanup;
    ++	}
    ++
    ++	cmd_in = xfdopen(cmd.in, "w");
    ++
    ++	for (i = 0; i < m->num_objects; i++) {
    ++		struct object_id oid;
    ++		uint32_t pack_int_id = nth_midxed_pack_int_id(m, i);
    ++
    ++		if (!include_pack[pack_int_id])
    ++			continue;
    ++
    ++		nth_midxed_object_oid(&oid, m, i);
    ++		fprintf(cmd_in, "%s\n", oid_to_hex(&oid));
    ++	}
    ++	fclose(cmd_in);
    ++
    ++	if (finish_command(&cmd)) {
    ++		error(_("could not finish pack-objects"));
    ++		result = 1;
    ++		goto cleanup;
    ++	}
    ++
    ++	result = write_midx_internal(object_dir, NULL, NULL, NULL, NULL, flags);
    ++
    ++cleanup:
    ++	free(include_pack);
    ++	return result;
    ++}
     
      ## midx.c ##
     @@
    @@ midx.c
     -#include "list-objects.h"
      #include "pack-revindex.h"
      
    --struct multi_pack_index *lookup_multi_pack_index(struct repository *r,
    --							const char *object_dir);
    --
    --int write_midx_internal(const char *object_dir,
    --			       struct string_list *packs_to_include,
    --			       struct string_list *packs_to_drop,
    --			       const char *preferred_pack_name,
    --			       const char *refs_snapshot,
    --			       unsigned flags);
    --
     -#define MIDX_SIGNATURE 0x4d494458 /* "MIDX" */
     -#define MIDX_VERSION 1
     -#define MIDX_BYTE_FILE_VERSION 4
    @@ midx.c: int prepare_multi_pack_index_one(struct repository *r, const char *objec
     -	return ret;
     -}
     -
    --struct multi_pack_index *lookup_multi_pack_index(struct repository *r,
    +-static struct multi_pack_index *lookup_multi_pack_index(struct repository *r,
     -							const char *object_dir)
     -{
     -	struct multi_pack_index *result = NULL;
    @@ midx.c: int prepare_multi_pack_index_one(struct repository *r, const char *objec
     -	return result;
     -}
     -
    --int write_midx_internal(const char *object_dir,
    --			struct string_list *packs_to_include,
    --			struct string_list *packs_to_drop,
    --			const char *preferred_pack_name,
    --			const char *refs_snapshot,
    --			unsigned flags)
    +-static int write_midx_internal(const char *object_dir,
    +-			       struct string_list *packs_to_include,
    +-			       struct string_list *packs_to_drop,
    +-			       const char *preferred_pack_name,
    +-			       const char *refs_snapshot,
    +-			       unsigned flags)
     -{
     -	struct strbuf midx_name = STRBUF_INIT;
     -	unsigned char midx_hash[GIT_MAX_RAWSZ];
    @@ midx.c: int prepare_multi_pack_index_one(struct repository *r, const char *objec
     -
     -	return result;
     -}
    +-
    +-int write_midx_file(const char *object_dir,
    +-		    const char *preferred_pack_name,
    +-		    const char *refs_snapshot,
    +-		    unsigned flags)
    +-{
    +-	return write_midx_internal(object_dir, NULL, NULL, preferred_pack_name,
    +-				   refs_snapshot, flags);
    +-}
    +-
    +-int write_midx_file_only(const char *object_dir,
    +-			 struct string_list *packs_to_include,
    +-			 const char *preferred_pack_name,
    +-			 const char *refs_snapshot,
    +-			 unsigned flags)
    +-{
    +-	return write_midx_internal(object_dir, packs_to_include, NULL,
    +-				   preferred_pack_name, refs_snapshot, flags);
    +-}
     -
      struct clear_midx_data {
      	char *keep;
    @@ midx.c: static void clear_midx_file_ext(const char *full_path, size_t full_path_
      {
      	struct clear_midx_data data;
      	memset(&data, 0, sizeof(struct clear_midx_data));
    +@@ midx.c: int verify_midx_file(struct repository *r, const char *object_dir, unsigned flag
    + 
    + 	return verify_midx_error;
    + }
    +-
    +-int expire_midx_packs(struct repository *r, const char *object_dir, unsigned flags)
    +-{
    +-	uint32_t i, *count, result = 0;
    +-	struct string_list packs_to_drop = STRING_LIST_INIT_DUP;
    +-	struct multi_pack_index *m = lookup_multi_pack_index(r, object_dir);
    +-	struct progress *progress = NULL;
    +-
    +-	if (!m)
    +-		return 0;
    +-
    +-	CALLOC_ARRAY(count, m->num_packs);
    +-
    +-	if (flags & MIDX_PROGRESS)
    +-		progress = start_delayed_progress(_("Counting referenced objects"),
    +-					  m->num_objects);
    +-	for (i = 0; i < m->num_objects; i++) {
    +-		int pack_int_id = nth_midxed_pack_int_id(m, i);
    +-		count[pack_int_id]++;
    +-		display_progress(progress, i + 1);
    +-	}
    +-	stop_progress(&progress);
    +-
    +-	if (flags & MIDX_PROGRESS)
    +-		progress = start_delayed_progress(_("Finding and deleting unreferenced packfiles"),
    +-					  m->num_packs);
    +-	for (i = 0; i < m->num_packs; i++) {
    +-		char *pack_name;
    +-		display_progress(progress, i + 1);
    +-
    +-		if (count[i])
    +-			continue;
    +-
    +-		if (prepare_midx_pack(r, m, i))
    +-			continue;
    +-
    +-		if (m->packs[i]->pack_keep || m->packs[i]->is_cruft)
    +-			continue;
    +-
    +-		pack_name = xstrdup(m->packs[i]->pack_name);
    +-		close_pack(m->packs[i]);
    +-
    +-		string_list_insert(&packs_to_drop, m->pack_names[i]);
    +-		unlink_pack_path(pack_name, 0);
    +-		free(pack_name);
    +-	}
    +-	stop_progress(&progress);
    +-
    +-	free(count);
    +-
    +-	if (packs_to_drop.nr)
    +-		result = write_midx_internal(object_dir, NULL, &packs_to_drop, NULL, NULL, flags);
    +-
    +-	string_list_clear(&packs_to_drop, 0);
    +-
    +-	return result;
    +-}
    +-
    +-struct repack_info {
    +-	timestamp_t mtime;
    +-	uint32_t referenced_objects;
    +-	uint32_t pack_int_id;
    +-};
    +-
    +-static int compare_by_mtime(const void *a_, const void *b_)
    +-{
    +-	const struct repack_info *a, *b;
    +-
    +-	a = (const struct repack_info *)a_;
    +-	b = (const struct repack_info *)b_;
    +-
    +-	if (a->mtime < b->mtime)
    +-		return -1;
    +-	if (a->mtime > b->mtime)
    +-		return 1;
    +-	return 0;
    +-}
    +-
    +-static int fill_included_packs_all(struct repository *r,
    +-				   struct multi_pack_index *m,
    +-				   unsigned char *include_pack)
    +-{
    +-	uint32_t i, count = 0;
    +-	int pack_kept_objects = 0;
    +-
    +-	repo_config_get_bool(r, "repack.packkeptobjects", &pack_kept_objects);
    +-
    +-	for (i = 0; i < m->num_packs; i++) {
    +-		if (prepare_midx_pack(r, m, i))
    +-			continue;
    +-		if (!pack_kept_objects && m->packs[i]->pack_keep)
    +-			continue;
    +-		if (m->packs[i]->is_cruft)
    +-			continue;
    +-
    +-		include_pack[i] = 1;
    +-		count++;
    +-	}
    +-
    +-	return count < 2;
    +-}
    +-
    +-static int fill_included_packs_batch(struct repository *r,
    +-				     struct multi_pack_index *m,
    +-				     unsigned char *include_pack,
    +-				     size_t batch_size)
    +-{
    +-	uint32_t i, packs_to_repack;
    +-	size_t total_size;
    +-	struct repack_info *pack_info;
    +-	int pack_kept_objects = 0;
    +-
    +-	CALLOC_ARRAY(pack_info, m->num_packs);
    +-
    +-	repo_config_get_bool(r, "repack.packkeptobjects", &pack_kept_objects);
    +-
    +-	for (i = 0; i < m->num_packs; i++) {
    +-		pack_info[i].pack_int_id = i;
    +-
    +-		if (prepare_midx_pack(r, m, i))
    +-			continue;
    +-
    +-		pack_info[i].mtime = m->packs[i]->mtime;
    +-	}
    +-
    +-	for (i = 0; i < m->num_objects; i++) {
    +-		uint32_t pack_int_id = nth_midxed_pack_int_id(m, i);
    +-		pack_info[pack_int_id].referenced_objects++;
    +-	}
    +-
    +-	QSORT(pack_info, m->num_packs, compare_by_mtime);
    +-
    +-	total_size = 0;
    +-	packs_to_repack = 0;
    +-	for (i = 0; total_size < batch_size && i < m->num_packs; i++) {
    +-		int pack_int_id = pack_info[i].pack_int_id;
    +-		struct packed_git *p = m->packs[pack_int_id];
    +-		size_t expected_size;
    +-
    +-		if (!p)
    +-			continue;
    +-		if (!pack_kept_objects && p->pack_keep)
    +-			continue;
    +-		if (p->is_cruft)
    +-			continue;
    +-		if (open_pack_index(p) || !p->num_objects)
    +-			continue;
    +-
    +-		expected_size = st_mult(p->pack_size,
    +-					pack_info[i].referenced_objects);
    +-		expected_size /= p->num_objects;
    +-
    +-		if (expected_size >= batch_size)
    +-			continue;
    +-
    +-		packs_to_repack++;
    +-		total_size += expected_size;
    +-		include_pack[pack_int_id] = 1;
    +-	}
    +-
    +-	free(pack_info);
    +-
    +-	if (packs_to_repack < 2)
    +-		return 1;
    +-
    +-	return 0;
    +-}
    +-
    +-int midx_repack(struct repository *r, const char *object_dir, size_t batch_size, unsigned flags)
    +-{
    +-	int result = 0;
    +-	uint32_t i;
    +-	unsigned char *include_pack;
    +-	struct child_process cmd = CHILD_PROCESS_INIT;
    +-	FILE *cmd_in;
    +-	struct multi_pack_index *m = lookup_multi_pack_index(r, object_dir);
    +-
    +-	/*
    +-	 * When updating the default for these configuration
    +-	 * variables in builtin/repack.c, these must be adjusted
    +-	 * to match.
    +-	 */
    +-	int delta_base_offset = 1;
    +-	int use_delta_islands = 0;
    +-
    +-	if (!m)
    +-		return 0;
    +-
    +-	CALLOC_ARRAY(include_pack, m->num_packs);
    +-
    +-	if (batch_size) {
    +-		if (fill_included_packs_batch(r, m, include_pack, batch_size))
    +-			goto cleanup;
    +-	} else if (fill_included_packs_all(r, m, include_pack))
    +-		goto cleanup;
    +-
    +-	repo_config_get_bool(r, "repack.usedeltabaseoffset", &delta_base_offset);
    +-	repo_config_get_bool(r, "repack.usedeltaislands", &use_delta_islands);
    +-
    +-	strvec_push(&cmd.args, "pack-objects");
    +-
    +-	strvec_pushf(&cmd.args, "%s/pack/pack", object_dir);
    +-
    +-	if (delta_base_offset)
    +-		strvec_push(&cmd.args, "--delta-base-offset");
    +-	if (use_delta_islands)
    +-		strvec_push(&cmd.args, "--delta-islands");
    +-
    +-	if (flags & MIDX_PROGRESS)
    +-		strvec_push(&cmd.args, "--progress");
    +-	else
    +-		strvec_push(&cmd.args, "-q");
    +-
    +-	cmd.git_cmd = 1;
    +-	cmd.in = cmd.out = -1;
    +-
    +-	if (start_command(&cmd)) {
    +-		error(_("could not start pack-objects"));
    +-		result = 1;
    +-		goto cleanup;
    +-	}
    +-
    +-	cmd_in = xfdopen(cmd.in, "w");
    +-
    +-	for (i = 0; i < m->num_objects; i++) {
    +-		struct object_id oid;
    +-		uint32_t pack_int_id = nth_midxed_pack_int_id(m, i);
    +-
    +-		if (!include_pack[pack_int_id])
    +-			continue;
    +-
    +-		nth_midxed_object_oid(&oid, m, i);
    +-		fprintf(cmd_in, "%s\n", oid_to_hex(&oid));
    +-	}
    +-	fclose(cmd_in);
    +-
    +-	if (finish_command(&cmd)) {
    +-		error(_("could not finish pack-objects"));
    +-		result = 1;
    +-		goto cleanup;
    +-	}
    +-
    +-	result = write_midx_internal(object_dir, NULL, NULL, NULL, NULL, flags);
    +-
    +-cleanup:
    +-	free(include_pack);
    +-	return result;
    +-}
     
      ## midx.h ##
     @@ midx.h: struct pack_entry;
 8:  8e32755c49 <  -:  ---------- midx-write.c: avoid directly managed temporary strbuf
 9:  5475b09a7a =  2:  0064e363c0 midx-write.c: factor out common want_included_pack() routine
10:  f77e3167aa =  3:  b121f05a32 midx-write.c: check count of packs to repack after grouping
11:  736be63234 !  4:  b5d6ba5802 midx-write.c: use `--stdin-packs` when repacking
    @@ midx-write.c: int midx_repack(struct repository *r, const char *object_dir, size
     +	strvec_pushl(&cmd.args, "pack-objects", "--stdin-packs", "--non-empty",
     +		     NULL);
      
    - 	if (delta_base_offset)
    - 		strvec_push(&cmd.args, "--delta-base-offset");
    + 	strvec_pushf(&cmd.args, "%s/pack/pack", object_dir);
    + 
     @@ midx-write.c: int midx_repack(struct repository *r, const char *object_dir, size_t batch_size,
      	}
      

base-commit: 11c821f2f2a31e70fb5cc449f9a29401c333aad2
prerequisite-patch-id: 3f7553c1b52071c935dd9ad3f94f7e675247498b
-- 
2.44.0.330.g158d2a670b4

^ permalink raw reply	[flat|nested] 30+ messages in thread

* [PATCH v2 1/4] midx-write: move writing-related functions from midx.c
  2024-04-01 21:16 ` [PATCH v2 0/4] " Taylor Blau
@ 2024-04-01 21:16   ` Taylor Blau
  2024-04-01 21:16   ` [PATCH v2 2/4] midx-write.c: factor out common want_included_pack() routine Taylor Blau
                     ` (2 subsequent siblings)
  3 siblings, 0 replies; 30+ messages in thread
From: Taylor Blau @ 2024-04-01 21:16 UTC (permalink / raw
  To: git; +Cc: Jeff King, Junio C Hamano, René Scharfe

Introduce a new midx-write.c source file, which holds all of the
functionality from the MIDX sub-system related to writing new MIDX files.

Similar to the relationship between "pack-bitmap.c" and
"pack-bitmap-write.c", this source file will hold code that is specific
to writing MIDX files as opposed to reading them (the latter will remain
in midx.c).

Signed-off-by: Taylor Blau <me@ttaylorr.com>
---
 Makefile     |    1 +
 midx-write.c | 1521 ++++++++++++++++++++++++++++++++++++++++++++++++
 midx.c       | 1553 +-------------------------------------------------
 midx.h       |   19 +
 4 files changed, 1555 insertions(+), 1539 deletions(-)
 create mode 100644 midx-write.c

diff --git a/Makefile b/Makefile
index 4e255c81f2..cf44a964c0 100644
--- a/Makefile
+++ b/Makefile
@@ -1072,6 +1072,7 @@ LIB_OBJS += merge-ort-wrappers.o
 LIB_OBJS += merge-recursive.o
 LIB_OBJS += merge.o
 LIB_OBJS += midx.o
+LIB_OBJS += midx-write.o
 LIB_OBJS += name-hash.o
 LIB_OBJS += negotiator/default.o
 LIB_OBJS += negotiator/noop.o
diff --git a/midx-write.c b/midx-write.c
new file mode 100644
index 0000000000..5242d2a724
--- /dev/null
+++ b/midx-write.c
@@ -0,0 +1,1521 @@
+#include "git-compat-util.h"
+#include "abspath.h"
+#include "config.h"
+#include "hex.h"
+#include "lockfile.h"
+#include "packfile.h"
+#include "object-file.h"
+#include "hash-lookup.h"
+#include "midx.h"
+#include "progress.h"
+#include "trace2.h"
+#include "run-command.h"
+#include "chunk-format.h"
+#include "pack-bitmap.h"
+#include "refs.h"
+#include "revision.h"
+#include "list-objects.h"
+
+#define PACK_EXPIRED UINT_MAX
+#define BITMAP_POS_UNKNOWN (~((uint32_t)0))
+#define MIDX_CHUNK_FANOUT_SIZE (sizeof(uint32_t) * 256)
+#define MIDX_CHUNK_LARGE_OFFSET_WIDTH (sizeof(uint64_t))
+
+extern int midx_checksum_valid(struct multi_pack_index *m);
+extern void clear_midx_files_ext(const char *object_dir, const char *ext,
+				 unsigned char *keep_hash);
+extern int cmp_idx_or_pack_name(const char *idx_or_pack_name,
+				const char *idx_name);
+
+static size_t write_midx_header(struct hashfile *f,
+				unsigned char num_chunks,
+				uint32_t num_packs)
+{
+	hashwrite_be32(f, MIDX_SIGNATURE);
+	hashwrite_u8(f, MIDX_VERSION);
+	hashwrite_u8(f, oid_version(the_hash_algo));
+	hashwrite_u8(f, num_chunks);
+	hashwrite_u8(f, 0); /* unused */
+	hashwrite_be32(f, num_packs);
+
+	return MIDX_HEADER_SIZE;
+}
+
+struct pack_info {
+	uint32_t orig_pack_int_id;
+	char *pack_name;
+	struct packed_git *p;
+
+	uint32_t bitmap_pos;
+	uint32_t bitmap_nr;
+
+	unsigned expired : 1;
+};
+
+static void fill_pack_info(struct pack_info *info,
+			   struct packed_git *p, const char *pack_name,
+			   uint32_t orig_pack_int_id)
+{
+	memset(info, 0, sizeof(struct pack_info));
+
+	info->orig_pack_int_id = orig_pack_int_id;
+	info->pack_name = xstrdup(pack_name);
+	info->p = p;
+	info->bitmap_pos = BITMAP_POS_UNKNOWN;
+}
+
+static int pack_info_compare(const void *_a, const void *_b)
+{
+	struct pack_info *a = (struct pack_info *)_a;
+	struct pack_info *b = (struct pack_info *)_b;
+	return strcmp(a->pack_name, b->pack_name);
+}
+
+static int idx_or_pack_name_cmp(const void *_va, const void *_vb)
+{
+	const char *pack_name = _va;
+	const struct pack_info *compar = _vb;
+
+	return cmp_idx_or_pack_name(pack_name, compar->pack_name);
+}
+
+struct write_midx_context {
+	struct pack_info *info;
+	size_t nr;
+	size_t alloc;
+	struct multi_pack_index *m;
+	struct progress *progress;
+	unsigned pack_paths_checked;
+
+	struct pack_midx_entry *entries;
+	size_t entries_nr;
+
+	uint32_t *pack_perm;
+	uint32_t *pack_order;
+	unsigned large_offsets_needed:1;
+	uint32_t num_large_offsets;
+
+	int preferred_pack_idx;
+
+	struct string_list *to_include;
+};
+
+static void add_pack_to_midx(const char *full_path, size_t full_path_len,
+			     const char *file_name, void *data)
+{
+	struct write_midx_context *ctx = data;
+	struct packed_git *p;
+
+	if (ends_with(file_name, ".idx")) {
+		display_progress(ctx->progress, ++ctx->pack_paths_checked);
+		/*
+		 * Note that at most one of ctx->m and ctx->to_include are set,
+		 * so we are testing midx_contains_pack() and
+		 * string_list_has_string() independently (guarded by the
+		 * appropriate NULL checks).
+		 *
+		 * We could support passing to_include while reusing an existing
+		 * MIDX, but don't currently since the reuse process drags
+		 * forward all packs from an existing MIDX (without checking
+		 * whether or not they appear in the to_include list).
+		 *
+		 * If we added support for that, these next two conditional
+		 * should be performed independently (likely checking
+		 * to_include before the existing MIDX).
+		 */
+		if (ctx->m && midx_contains_pack(ctx->m, file_name))
+			return;
+		else if (ctx->to_include &&
+			 !string_list_has_string(ctx->to_include, file_name))
+			return;
+
+		ALLOC_GROW(ctx->info, ctx->nr + 1, ctx->alloc);
+
+		p = add_packed_git(full_path, full_path_len, 0);
+		if (!p) {
+			warning(_("failed to add packfile '%s'"),
+				full_path);
+			return;
+		}
+
+		if (open_pack_index(p)) {
+			warning(_("failed to open pack-index '%s'"),
+				full_path);
+			close_pack(p);
+			free(p);
+			return;
+		}
+
+		fill_pack_info(&ctx->info[ctx->nr], p, file_name, ctx->nr);
+		ctx->nr++;
+	}
+}
+
+struct pack_midx_entry {
+	struct object_id oid;
+	uint32_t pack_int_id;
+	time_t pack_mtime;
+	uint64_t offset;
+	unsigned preferred : 1;
+};
+
+static int midx_oid_compare(const void *_a, const void *_b)
+{
+	const struct pack_midx_entry *a = (const struct pack_midx_entry *)_a;
+	const struct pack_midx_entry *b = (const struct pack_midx_entry *)_b;
+	int cmp = oidcmp(&a->oid, &b->oid);
+
+	if (cmp)
+		return cmp;
+
+	/* Sort objects in a preferred pack first when multiple copies exist. */
+	if (a->preferred > b->preferred)
+		return -1;
+	if (a->preferred < b->preferred)
+		return 1;
+
+	if (a->pack_mtime > b->pack_mtime)
+		return -1;
+	else if (a->pack_mtime < b->pack_mtime)
+		return 1;
+
+	return a->pack_int_id - b->pack_int_id;
+}
+
+static int nth_midxed_pack_midx_entry(struct multi_pack_index *m,
+				      struct pack_midx_entry *e,
+				      uint32_t pos)
+{
+	if (pos >= m->num_objects)
+		return 1;
+
+	nth_midxed_object_oid(&e->oid, m, pos);
+	e->pack_int_id = nth_midxed_pack_int_id(m, pos);
+	e->offset = nth_midxed_offset(m, pos);
+
+	/* consider objects in midx to be from "old" packs */
+	e->pack_mtime = 0;
+	return 0;
+}
+
+static void fill_pack_entry(uint32_t pack_int_id,
+			    struct packed_git *p,
+			    uint32_t cur_object,
+			    struct pack_midx_entry *entry,
+			    int preferred)
+{
+	if (nth_packed_object_id(&entry->oid, p, cur_object) < 0)
+		die(_("failed to locate object %d in packfile"), cur_object);
+
+	entry->pack_int_id = pack_int_id;
+	entry->pack_mtime = p->mtime;
+
+	entry->offset = nth_packed_object_offset(p, cur_object);
+	entry->preferred = !!preferred;
+}
+
+struct midx_fanout {
+	struct pack_midx_entry *entries;
+	size_t nr, alloc;
+};
+
+static void midx_fanout_grow(struct midx_fanout *fanout, size_t nr)
+{
+	if (nr < fanout->nr)
+		BUG("negative growth in midx_fanout_grow() (%"PRIuMAX" < %"PRIuMAX")",
+		    (uintmax_t)nr, (uintmax_t)fanout->nr);
+	ALLOC_GROW(fanout->entries, nr, fanout->alloc);
+}
+
+static void midx_fanout_sort(struct midx_fanout *fanout)
+{
+	QSORT(fanout->entries, fanout->nr, midx_oid_compare);
+}
+
+static void midx_fanout_add_midx_fanout(struct midx_fanout *fanout,
+					struct multi_pack_index *m,
+					uint32_t cur_fanout,
+					int preferred_pack)
+{
+	uint32_t start = 0, end;
+	uint32_t cur_object;
+
+	if (cur_fanout)
+		start = ntohl(m->chunk_oid_fanout[cur_fanout - 1]);
+	end = ntohl(m->chunk_oid_fanout[cur_fanout]);
+
+	for (cur_object = start; cur_object < end; cur_object++) {
+		if ((preferred_pack > -1) &&
+		    (preferred_pack == nth_midxed_pack_int_id(m, cur_object))) {
+			/*
+			 * Objects from preferred packs are added
+			 * separately.
+			 */
+			continue;
+		}
+
+		midx_fanout_grow(fanout, fanout->nr + 1);
+		nth_midxed_pack_midx_entry(m,
+					   &fanout->entries[fanout->nr],
+					   cur_object);
+		fanout->entries[fanout->nr].preferred = 0;
+		fanout->nr++;
+	}
+}
+
+static void midx_fanout_add_pack_fanout(struct midx_fanout *fanout,
+					struct pack_info *info,
+					uint32_t cur_pack,
+					int preferred,
+					uint32_t cur_fanout)
+{
+	struct packed_git *pack = info[cur_pack].p;
+	uint32_t start = 0, end;
+	uint32_t cur_object;
+
+	if (cur_fanout)
+		start = get_pack_fanout(pack, cur_fanout - 1);
+	end = get_pack_fanout(pack, cur_fanout);
+
+	for (cur_object = start; cur_object < end; cur_object++) {
+		midx_fanout_grow(fanout, fanout->nr + 1);
+		fill_pack_entry(cur_pack,
+				info[cur_pack].p,
+				cur_object,
+				&fanout->entries[fanout->nr],
+				preferred);
+		fanout->nr++;
+	}
+}
+
+/*
+ * It is possible to artificially get into a state where there are many
+ * duplicate copies of objects. That can create high memory pressure if
+ * we are to create a list of all objects before de-duplication. To reduce
+ * this memory pressure without a significant performance drop, automatically
+ * group objects by the first byte of their object id. Use the IDX fanout
+ * tables to group the data, copy to a local array, then sort.
+ *
+ * Copy only the de-duplicated entries (selected by most-recent modified time
+ * of a packfile containing the object).
+ */
+static struct pack_midx_entry *get_sorted_entries(struct multi_pack_index *m,
+						  struct pack_info *info,
+						  uint32_t nr_packs,
+						  size_t *nr_objects,
+						  int preferred_pack)
+{
+	uint32_t cur_fanout, cur_pack, cur_object;
+	size_t alloc_objects, total_objects = 0;
+	struct midx_fanout fanout = { 0 };
+	struct pack_midx_entry *deduplicated_entries = NULL;
+	uint32_t start_pack = m ? m->num_packs : 0;
+
+	for (cur_pack = start_pack; cur_pack < nr_packs; cur_pack++)
+		total_objects = st_add(total_objects,
+				       info[cur_pack].p->num_objects);
+
+	/*
+	 * As we de-duplicate by fanout value, we expect the fanout
+	 * slices to be evenly distributed, with some noise. Hence,
+	 * allocate slightly more than one 256th.
+	 */
+	alloc_objects = fanout.alloc = total_objects > 3200 ? total_objects / 200 : 16;
+
+	ALLOC_ARRAY(fanout.entries, fanout.alloc);
+	ALLOC_ARRAY(deduplicated_entries, alloc_objects);
+	*nr_objects = 0;
+
+	for (cur_fanout = 0; cur_fanout < 256; cur_fanout++) {
+		fanout.nr = 0;
+
+		if (m)
+			midx_fanout_add_midx_fanout(&fanout, m, cur_fanout,
+						    preferred_pack);
+
+		for (cur_pack = start_pack; cur_pack < nr_packs; cur_pack++) {
+			int preferred = cur_pack == preferred_pack;
+			midx_fanout_add_pack_fanout(&fanout,
+						    info, cur_pack,
+						    preferred, cur_fanout);
+		}
+
+		if (-1 < preferred_pack && preferred_pack < start_pack)
+			midx_fanout_add_pack_fanout(&fanout, info,
+						    preferred_pack, 1,
+						    cur_fanout);
+
+		midx_fanout_sort(&fanout);
+
+		/*
+		 * The batch is now sorted by OID and then mtime (descending).
+		 * Take only the first duplicate.
+		 */
+		for (cur_object = 0; cur_object < fanout.nr; cur_object++) {
+			if (cur_object && oideq(&fanout.entries[cur_object - 1].oid,
+						&fanout.entries[cur_object].oid))
+				continue;
+
+			ALLOC_GROW(deduplicated_entries, st_add(*nr_objects, 1),
+				   alloc_objects);
+			memcpy(&deduplicated_entries[*nr_objects],
+			       &fanout.entries[cur_object],
+			       sizeof(struct pack_midx_entry));
+			(*nr_objects)++;
+		}
+	}
+
+	free(fanout.entries);
+	return deduplicated_entries;
+}
+
+static int write_midx_pack_names(struct hashfile *f, void *data)
+{
+	struct write_midx_context *ctx = data;
+	uint32_t i;
+	unsigned char padding[MIDX_CHUNK_ALIGNMENT];
+	size_t written = 0;
+
+	for (i = 0; i < ctx->nr; i++) {
+		size_t writelen;
+
+		if (ctx->info[i].expired)
+			continue;
+
+		if (i && strcmp(ctx->info[i].pack_name, ctx->info[i - 1].pack_name) <= 0)
+			BUG("incorrect pack-file order: %s before %s",
+			    ctx->info[i - 1].pack_name,
+			    ctx->info[i].pack_name);
+
+		writelen = strlen(ctx->info[i].pack_name) + 1;
+		hashwrite(f, ctx->info[i].pack_name, writelen);
+		written += writelen;
+	}
+
+	/* add padding to be aligned */
+	i = MIDX_CHUNK_ALIGNMENT - (written % MIDX_CHUNK_ALIGNMENT);
+	if (i < MIDX_CHUNK_ALIGNMENT) {
+		memset(padding, 0, sizeof(padding));
+		hashwrite(f, padding, i);
+	}
+
+	return 0;
+}
+
+static int write_midx_bitmapped_packs(struct hashfile *f, void *data)
+{
+	struct write_midx_context *ctx = data;
+	size_t i;
+
+	for (i = 0; i < ctx->nr; i++) {
+		struct pack_info *pack = &ctx->info[i];
+		if (pack->expired)
+			continue;
+
+		if (pack->bitmap_pos == BITMAP_POS_UNKNOWN && pack->bitmap_nr)
+			BUG("pack '%s' has no bitmap position, but has %d bitmapped object(s)",
+			    pack->pack_name, pack->bitmap_nr);
+
+		hashwrite_be32(f, pack->bitmap_pos);
+		hashwrite_be32(f, pack->bitmap_nr);
+	}
+	return 0;
+}
+
+static int write_midx_oid_fanout(struct hashfile *f,
+				 void *data)
+{
+	struct write_midx_context *ctx = data;
+	struct pack_midx_entry *list = ctx->entries;
+	struct pack_midx_entry *last = ctx->entries + ctx->entries_nr;
+	uint32_t count = 0;
+	uint32_t i;
+
+	/*
+	* Write the first-level table (the list is sorted,
+	* but we use a 256-entry lookup to be able to avoid
+	* having to do eight extra binary search iterations).
+	*/
+	for (i = 0; i < 256; i++) {
+		struct pack_midx_entry *next = list;
+
+		while (next < last && next->oid.hash[0] == i) {
+			count++;
+			next++;
+		}
+
+		hashwrite_be32(f, count);
+		list = next;
+	}
+
+	return 0;
+}
+
+static int write_midx_oid_lookup(struct hashfile *f,
+				 void *data)
+{
+	struct write_midx_context *ctx = data;
+	unsigned char hash_len = the_hash_algo->rawsz;
+	struct pack_midx_entry *list = ctx->entries;
+	uint32_t i;
+
+	for (i = 0; i < ctx->entries_nr; i++) {
+		struct pack_midx_entry *obj = list++;
+
+		if (i < ctx->entries_nr - 1) {
+			struct pack_midx_entry *next = list;
+			if (oidcmp(&obj->oid, &next->oid) >= 0)
+				BUG("OIDs not in order: %s >= %s",
+				    oid_to_hex(&obj->oid),
+				    oid_to_hex(&next->oid));
+		}
+
+		hashwrite(f, obj->oid.hash, (int)hash_len);
+	}
+
+	return 0;
+}
+
+static int write_midx_object_offsets(struct hashfile *f,
+				     void *data)
+{
+	struct write_midx_context *ctx = data;
+	struct pack_midx_entry *list = ctx->entries;
+	uint32_t i, nr_large_offset = 0;
+
+	for (i = 0; i < ctx->entries_nr; i++) {
+		struct pack_midx_entry *obj = list++;
+
+		if (ctx->pack_perm[obj->pack_int_id] == PACK_EXPIRED)
+			BUG("object %s is in an expired pack with int-id %d",
+			    oid_to_hex(&obj->oid),
+			    obj->pack_int_id);
+
+		hashwrite_be32(f, ctx->pack_perm[obj->pack_int_id]);
+
+		if (ctx->large_offsets_needed && obj->offset >> 31)
+			hashwrite_be32(f, MIDX_LARGE_OFFSET_NEEDED | nr_large_offset++);
+		else if (!ctx->large_offsets_needed && obj->offset >> 32)
+			BUG("object %s requires a large offset (%"PRIx64") but the MIDX is not writing large offsets!",
+			    oid_to_hex(&obj->oid),
+			    obj->offset);
+		else
+			hashwrite_be32(f, (uint32_t)obj->offset);
+	}
+
+	return 0;
+}
+
+static int write_midx_large_offsets(struct hashfile *f,
+				    void *data)
+{
+	struct write_midx_context *ctx = data;
+	struct pack_midx_entry *list = ctx->entries;
+	struct pack_midx_entry *end = ctx->entries + ctx->entries_nr;
+	uint32_t nr_large_offset = ctx->num_large_offsets;
+
+	while (nr_large_offset) {
+		struct pack_midx_entry *obj;
+		uint64_t offset;
+
+		if (list >= end)
+			BUG("too many large-offset objects");
+
+		obj = list++;
+		offset = obj->offset;
+
+		if (!(offset >> 31))
+			continue;
+
+		hashwrite_be64(f, offset);
+
+		nr_large_offset--;
+	}
+
+	return 0;
+}
+
+static int write_midx_revindex(struct hashfile *f,
+			       void *data)
+{
+	struct write_midx_context *ctx = data;
+	uint32_t i;
+
+	for (i = 0; i < ctx->entries_nr; i++)
+		hashwrite_be32(f, ctx->pack_order[i]);
+
+	return 0;
+}
+
+struct midx_pack_order_data {
+	uint32_t nr;
+	uint32_t pack;
+	off_t offset;
+};
+
+static int midx_pack_order_cmp(const void *va, const void *vb)
+{
+	const struct midx_pack_order_data *a = va, *b = vb;
+	if (a->pack < b->pack)
+		return -1;
+	else if (a->pack > b->pack)
+		return 1;
+	else if (a->offset < b->offset)
+		return -1;
+	else if (a->offset > b->offset)
+		return 1;
+	else
+		return 0;
+}
+
+static uint32_t *midx_pack_order(struct write_midx_context *ctx)
+{
+	struct midx_pack_order_data *data;
+	uint32_t *pack_order;
+	uint32_t i;
+
+	trace2_region_enter("midx", "midx_pack_order", the_repository);
+
+	ALLOC_ARRAY(data, ctx->entries_nr);
+	for (i = 0; i < ctx->entries_nr; i++) {
+		struct pack_midx_entry *e = &ctx->entries[i];
+		data[i].nr = i;
+		data[i].pack = ctx->pack_perm[e->pack_int_id];
+		if (!e->preferred)
+			data[i].pack |= (1U << 31);
+		data[i].offset = e->offset;
+	}
+
+	QSORT(data, ctx->entries_nr, midx_pack_order_cmp);
+
+	ALLOC_ARRAY(pack_order, ctx->entries_nr);
+	for (i = 0; i < ctx->entries_nr; i++) {
+		struct pack_midx_entry *e = &ctx->entries[data[i].nr];
+		struct pack_info *pack = &ctx->info[ctx->pack_perm[e->pack_int_id]];
+		if (pack->bitmap_pos == BITMAP_POS_UNKNOWN)
+			pack->bitmap_pos = i;
+		pack->bitmap_nr++;
+		pack_order[i] = data[i].nr;
+	}
+	for (i = 0; i < ctx->nr; i++) {
+		struct pack_info *pack = &ctx->info[ctx->pack_perm[i]];
+		if (pack->bitmap_pos == BITMAP_POS_UNKNOWN)
+			pack->bitmap_pos = 0;
+	}
+	free(data);
+
+	trace2_region_leave("midx", "midx_pack_order", the_repository);
+
+	return pack_order;
+}
+
+static void write_midx_reverse_index(char *midx_name, unsigned char *midx_hash,
+				     struct write_midx_context *ctx)
+{
+	struct strbuf buf = STRBUF_INIT;
+	const char *tmp_file;
+
+	trace2_region_enter("midx", "write_midx_reverse_index", the_repository);
+
+	strbuf_addf(&buf, "%s-%s.rev", midx_name, hash_to_hex(midx_hash));
+
+	tmp_file = write_rev_file_order(NULL, ctx->pack_order, ctx->entries_nr,
+					midx_hash, WRITE_REV);
+
+	if (finalize_object_file(tmp_file, buf.buf))
+		die(_("cannot store reverse index file"));
+
+	strbuf_release(&buf);
+
+	trace2_region_leave("midx", "write_midx_reverse_index", the_repository);
+}
+
+static void prepare_midx_packing_data(struct packing_data *pdata,
+				      struct write_midx_context *ctx)
+{
+	uint32_t i;
+
+	trace2_region_enter("midx", "prepare_midx_packing_data", the_repository);
+
+	memset(pdata, 0, sizeof(struct packing_data));
+	prepare_packing_data(the_repository, pdata);
+
+	for (i = 0; i < ctx->entries_nr; i++) {
+		struct pack_midx_entry *from = &ctx->entries[ctx->pack_order[i]];
+		struct object_entry *to = packlist_alloc(pdata, &from->oid);
+
+		oe_set_in_pack(pdata, to,
+			       ctx->info[ctx->pack_perm[from->pack_int_id]].p);
+	}
+
+	trace2_region_leave("midx", "prepare_midx_packing_data", the_repository);
+}
+
+static int add_ref_to_pending(const char *refname,
+			      const struct object_id *oid,
+			      int flag, void *cb_data)
+{
+	struct rev_info *revs = (struct rev_info*)cb_data;
+	struct object_id peeled;
+	struct object *object;
+
+	if ((flag & REF_ISSYMREF) && (flag & REF_ISBROKEN)) {
+		warning("symbolic ref is dangling: %s", refname);
+		return 0;
+	}
+
+	if (!peel_iterated_oid(oid, &peeled))
+		oid = &peeled;
+
+	object = parse_object_or_die(oid, refname);
+	if (object->type != OBJ_COMMIT)
+		return 0;
+
+	add_pending_object(revs, object, "");
+	if (bitmap_is_preferred_refname(revs->repo, refname))
+		object->flags |= NEEDS_BITMAP;
+	return 0;
+}
+
+struct bitmap_commit_cb {
+	struct commit **commits;
+	size_t commits_nr, commits_alloc;
+
+	struct write_midx_context *ctx;
+};
+
+static const struct object_id *bitmap_oid_access(size_t index,
+						 const void *_entries)
+{
+	const struct pack_midx_entry *entries = _entries;
+	return &entries[index].oid;
+}
+
+static void bitmap_show_commit(struct commit *commit, void *_data)
+{
+	struct bitmap_commit_cb *data = _data;
+	int pos = oid_pos(&commit->object.oid, data->ctx->entries,
+			  data->ctx->entries_nr,
+			  bitmap_oid_access);
+	if (pos < 0)
+		return;
+
+	ALLOC_GROW(data->commits, data->commits_nr + 1, data->commits_alloc);
+	data->commits[data->commits_nr++] = commit;
+}
+
+static int read_refs_snapshot(const char *refs_snapshot,
+			      struct rev_info *revs)
+{
+	struct strbuf buf = STRBUF_INIT;
+	struct object_id oid;
+	FILE *f = xfopen(refs_snapshot, "r");
+
+	while (strbuf_getline(&buf, f) != EOF) {
+		struct object *object;
+		int preferred = 0;
+		char *hex = buf.buf;
+		const char *end = NULL;
+
+		if (buf.len && *buf.buf == '+') {
+			preferred = 1;
+			hex = &buf.buf[1];
+		}
+
+		if (parse_oid_hex(hex, &oid, &end) < 0)
+			die(_("could not parse line: %s"), buf.buf);
+		if (*end)
+			die(_("malformed line: %s"), buf.buf);
+
+		object = parse_object_or_die(&oid, NULL);
+		if (preferred)
+			object->flags |= NEEDS_BITMAP;
+
+		add_pending_object(revs, object, "");
+	}
+
+	fclose(f);
+	strbuf_release(&buf);
+	return 0;
+}
+static struct commit **find_commits_for_midx_bitmap(uint32_t *indexed_commits_nr_p,
+						    const char *refs_snapshot,
+						    struct write_midx_context *ctx)
+{
+	struct rev_info revs;
+	struct bitmap_commit_cb cb = {0};
+
+	trace2_region_enter("midx", "find_commits_for_midx_bitmap",
+			    the_repository);
+
+	cb.ctx = ctx;
+
+	repo_init_revisions(the_repository, &revs, NULL);
+	if (refs_snapshot) {
+		read_refs_snapshot(refs_snapshot, &revs);
+	} else {
+		setup_revisions(0, NULL, &revs, NULL);
+		for_each_ref(add_ref_to_pending, &revs);
+	}
+
+	/*
+	 * Skipping promisor objects here is intentional, since it only excludes
+	 * them from the list of reachable commits that we want to select from
+	 * when computing the selection of MIDX'd commits to receive bitmaps.
+	 *
+	 * Reachability bitmaps do require that their objects be closed under
+	 * reachability, but fetching any objects missing from promisors at this
+	 * point is too late. But, if one of those objects can be reached from
+	 * an another object that is included in the bitmap, then we will
+	 * complain later that we don't have reachability closure (and fail
+	 * appropriately).
+	 */
+	fetch_if_missing = 0;
+	revs.exclude_promisor_objects = 1;
+
+	if (prepare_revision_walk(&revs))
+		die(_("revision walk setup failed"));
+
+	traverse_commit_list(&revs, bitmap_show_commit, NULL, &cb);
+	if (indexed_commits_nr_p)
+		*indexed_commits_nr_p = cb.commits_nr;
+
+	release_revisions(&revs);
+
+	trace2_region_leave("midx", "find_commits_for_midx_bitmap",
+			    the_repository);
+
+	return cb.commits;
+}
+
+static int write_midx_bitmap(const char *midx_name,
+			     const unsigned char *midx_hash,
+			     struct packing_data *pdata,
+			     struct commit **commits,
+			     uint32_t commits_nr,
+			     uint32_t *pack_order,
+			     unsigned flags)
+{
+	int ret, i;
+	uint16_t options = 0;
+	struct pack_idx_entry **index;
+	char *bitmap_name = xstrfmt("%s-%s.bitmap", midx_name,
+					hash_to_hex(midx_hash));
+
+	trace2_region_enter("midx", "write_midx_bitmap", the_repository);
+
+	if (flags & MIDX_WRITE_BITMAP_HASH_CACHE)
+		options |= BITMAP_OPT_HASH_CACHE;
+
+	if (flags & MIDX_WRITE_BITMAP_LOOKUP_TABLE)
+		options |= BITMAP_OPT_LOOKUP_TABLE;
+
+	/*
+	 * Build the MIDX-order index based on pdata.objects (which is already
+	 * in MIDX order; c.f., 'midx_pack_order_cmp()' for the definition of
+	 * this order).
+	 */
+	ALLOC_ARRAY(index, pdata->nr_objects);
+	for (i = 0; i < pdata->nr_objects; i++)
+		index[i] = &pdata->objects[i].idx;
+
+	bitmap_writer_show_progress(flags & MIDX_PROGRESS);
+	bitmap_writer_build_type_index(pdata, index, pdata->nr_objects);
+
+	/*
+	 * bitmap_writer_finish expects objects in lex order, but pack_order
+	 * gives us exactly that. use it directly instead of re-sorting the
+	 * array.
+	 *
+	 * This changes the order of objects in 'index' between
+	 * bitmap_writer_build_type_index and bitmap_writer_finish.
+	 *
+	 * The same re-ordering takes place in the single-pack bitmap code via
+	 * write_idx_file(), which is called by finish_tmp_packfile(), which
+	 * happens between bitmap_writer_build_type_index() and
+	 * bitmap_writer_finish().
+	 */
+	for (i = 0; i < pdata->nr_objects; i++)
+		index[pack_order[i]] = &pdata->objects[i].idx;
+
+	bitmap_writer_select_commits(commits, commits_nr, -1);
+	ret = bitmap_writer_build(pdata);
+	if (ret < 0)
+		goto cleanup;
+
+	bitmap_writer_set_checksum(midx_hash);
+	bitmap_writer_finish(index, pdata->nr_objects, bitmap_name, options);
+
+cleanup:
+	free(index);
+	free(bitmap_name);
+
+	trace2_region_leave("midx", "write_midx_bitmap", the_repository);
+
+	return ret;
+}
+
+static struct multi_pack_index *lookup_multi_pack_index(struct repository *r,
+							const char *object_dir)
+{
+	struct multi_pack_index *result = NULL;
+	struct multi_pack_index *cur;
+	char *obj_dir_real = real_pathdup(object_dir, 1);
+	struct strbuf cur_path_real = STRBUF_INIT;
+
+	/* Ensure the given object_dir is local, or a known alternate. */
+	find_odb(r, obj_dir_real);
+
+	for (cur = get_multi_pack_index(r); cur; cur = cur->next) {
+		strbuf_realpath(&cur_path_real, cur->object_dir, 1);
+		if (!strcmp(obj_dir_real, cur_path_real.buf)) {
+			result = cur;
+			goto cleanup;
+		}
+	}
+
+cleanup:
+	free(obj_dir_real);
+	strbuf_release(&cur_path_real);
+	return result;
+}
+
+static int write_midx_internal(const char *object_dir,
+			       struct string_list *packs_to_include,
+			       struct string_list *packs_to_drop,
+			       const char *preferred_pack_name,
+			       const char *refs_snapshot,
+			       unsigned flags)
+{
+	struct strbuf midx_name = STRBUF_INIT;
+	unsigned char midx_hash[GIT_MAX_RAWSZ];
+	uint32_t i;
+	struct hashfile *f = NULL;
+	struct lock_file lk;
+	struct write_midx_context ctx = { 0 };
+	int bitmapped_packs_concat_len = 0;
+	int pack_name_concat_len = 0;
+	int dropped_packs = 0;
+	int result = 0;
+	struct chunkfile *cf;
+
+	trace2_region_enter("midx", "write_midx_internal", the_repository);
+
+	get_midx_filename(&midx_name, object_dir);
+	if (safe_create_leading_directories(midx_name.buf))
+		die_errno(_("unable to create leading directories of %s"),
+			  midx_name.buf);
+
+	if (!packs_to_include) {
+		/*
+		 * Only reference an existing MIDX when not filtering which
+		 * packs to include, since all packs and objects are copied
+		 * blindly from an existing MIDX if one is present.
+		 */
+		ctx.m = lookup_multi_pack_index(the_repository, object_dir);
+	}
+
+	if (ctx.m && !midx_checksum_valid(ctx.m)) {
+		warning(_("ignoring existing multi-pack-index; checksum mismatch"));
+		ctx.m = NULL;
+	}
+
+	ctx.nr = 0;
+	ctx.alloc = ctx.m ? ctx.m->num_packs : 16;
+	ctx.info = NULL;
+	ALLOC_ARRAY(ctx.info, ctx.alloc);
+
+	if (ctx.m) {
+		for (i = 0; i < ctx.m->num_packs; i++) {
+			ALLOC_GROW(ctx.info, ctx.nr + 1, ctx.alloc);
+
+			if (flags & MIDX_WRITE_REV_INDEX) {
+				/*
+				 * If generating a reverse index, need to have
+				 * packed_git's loaded to compare their
+				 * mtimes and object count.
+				 */
+				if (prepare_midx_pack(the_repository, ctx.m, i)) {
+					error(_("could not load pack"));
+					result = 1;
+					goto cleanup;
+				}
+
+				if (open_pack_index(ctx.m->packs[i]))
+					die(_("could not open index for %s"),
+					    ctx.m->packs[i]->pack_name);
+			}
+
+			fill_pack_info(&ctx.info[ctx.nr++], ctx.m->packs[i],
+				       ctx.m->pack_names[i], i);
+		}
+	}
+
+	ctx.pack_paths_checked = 0;
+	if (flags & MIDX_PROGRESS)
+		ctx.progress = start_delayed_progress(_("Adding packfiles to multi-pack-index"), 0);
+	else
+		ctx.progress = NULL;
+
+	ctx.to_include = packs_to_include;
+
+	for_each_file_in_pack_dir(object_dir, add_pack_to_midx, &ctx);
+	stop_progress(&ctx.progress);
+
+	if ((ctx.m && ctx.nr == ctx.m->num_packs) &&
+	    !(packs_to_include || packs_to_drop)) {
+		struct bitmap_index *bitmap_git;
+		int bitmap_exists;
+		int want_bitmap = flags & MIDX_WRITE_BITMAP;
+
+		bitmap_git = prepare_midx_bitmap_git(ctx.m);
+		bitmap_exists = bitmap_git && bitmap_is_midx(bitmap_git);
+		free_bitmap_index(bitmap_git);
+
+		if (bitmap_exists || !want_bitmap) {
+			/*
+			 * The correct MIDX already exists, and so does a
+			 * corresponding bitmap (or one wasn't requested).
+			 */
+			if (!want_bitmap)
+				clear_midx_files_ext(object_dir, ".bitmap",
+						     NULL);
+			goto cleanup;
+		}
+	}
+
+	if (preferred_pack_name) {
+		ctx.preferred_pack_idx = -1;
+
+		for (i = 0; i < ctx.nr; i++) {
+			if (!cmp_idx_or_pack_name(preferred_pack_name,
+						  ctx.info[i].pack_name)) {
+				ctx.preferred_pack_idx = i;
+				break;
+			}
+		}
+
+		if (ctx.preferred_pack_idx == -1)
+			warning(_("unknown preferred pack: '%s'"),
+				preferred_pack_name);
+	} else if (ctx.nr &&
+		   (flags & (MIDX_WRITE_REV_INDEX | MIDX_WRITE_BITMAP))) {
+		struct packed_git *oldest = ctx.info[ctx.preferred_pack_idx].p;
+		ctx.preferred_pack_idx = 0;
+
+		if (packs_to_drop && packs_to_drop->nr)
+			BUG("cannot write a MIDX bitmap during expiration");
+
+		/*
+		 * set a preferred pack when writing a bitmap to ensure that
+		 * the pack from which the first object is selected in pseudo
+		 * pack-order has all of its objects selected from that pack
+		 * (and not another pack containing a duplicate)
+		 */
+		for (i = 1; i < ctx.nr; i++) {
+			struct packed_git *p = ctx.info[i].p;
+
+			if (!oldest->num_objects || p->mtime < oldest->mtime) {
+				oldest = p;
+				ctx.preferred_pack_idx = i;
+			}
+		}
+
+		if (!oldest->num_objects) {
+			/*
+			 * If all packs are empty; unset the preferred index.
+			 * This is acceptable since there will be no duplicate
+			 * objects to resolve, so the preferred value doesn't
+			 * matter.
+			 */
+			ctx.preferred_pack_idx = -1;
+		}
+	} else {
+		/*
+		 * otherwise don't mark any pack as preferred to avoid
+		 * interfering with expiration logic below
+		 */
+		ctx.preferred_pack_idx = -1;
+	}
+
+	if (ctx.preferred_pack_idx > -1) {
+		struct packed_git *preferred = ctx.info[ctx.preferred_pack_idx].p;
+		if (!preferred->num_objects) {
+			error(_("cannot select preferred pack %s with no objects"),
+			      preferred->pack_name);
+			result = 1;
+			goto cleanup;
+		}
+	}
+
+	ctx.entries = get_sorted_entries(ctx.m, ctx.info, ctx.nr, &ctx.entries_nr,
+					 ctx.preferred_pack_idx);
+
+	ctx.large_offsets_needed = 0;
+	for (i = 0; i < ctx.entries_nr; i++) {
+		if (ctx.entries[i].offset > 0x7fffffff)
+			ctx.num_large_offsets++;
+		if (ctx.entries[i].offset > 0xffffffff)
+			ctx.large_offsets_needed = 1;
+	}
+
+	QSORT(ctx.info, ctx.nr, pack_info_compare);
+
+	if (packs_to_drop && packs_to_drop->nr) {
+		int drop_index = 0;
+		int missing_drops = 0;
+
+		for (i = 0; i < ctx.nr && drop_index < packs_to_drop->nr; i++) {
+			int cmp = strcmp(ctx.info[i].pack_name,
+					 packs_to_drop->items[drop_index].string);
+
+			if (!cmp) {
+				drop_index++;
+				ctx.info[i].expired = 1;
+			} else if (cmp > 0) {
+				error(_("did not see pack-file %s to drop"),
+				      packs_to_drop->items[drop_index].string);
+				drop_index++;
+				missing_drops++;
+				i--;
+			} else {
+				ctx.info[i].expired = 0;
+			}
+		}
+
+		if (missing_drops) {
+			result = 1;
+			goto cleanup;
+		}
+	}
+
+	/*
+	 * pack_perm stores a permutation between pack-int-ids from the
+	 * previous multi-pack-index to the new one we are writing:
+	 *
+	 * pack_perm[old_id] = new_id
+	 */
+	ALLOC_ARRAY(ctx.pack_perm, ctx.nr);
+	for (i = 0; i < ctx.nr; i++) {
+		if (ctx.info[i].expired) {
+			dropped_packs++;
+			ctx.pack_perm[ctx.info[i].orig_pack_int_id] = PACK_EXPIRED;
+		} else {
+			ctx.pack_perm[ctx.info[i].orig_pack_int_id] = i - dropped_packs;
+		}
+	}
+
+	for (i = 0; i < ctx.nr; i++) {
+		if (ctx.info[i].expired)
+			continue;
+		pack_name_concat_len += strlen(ctx.info[i].pack_name) + 1;
+		bitmapped_packs_concat_len += 2 * sizeof(uint32_t);
+	}
+
+	/* Check that the preferred pack wasn't expired (if given). */
+	if (preferred_pack_name) {
+		struct pack_info *preferred = bsearch(preferred_pack_name,
+						      ctx.info, ctx.nr,
+						      sizeof(*ctx.info),
+						      idx_or_pack_name_cmp);
+		if (preferred) {
+			uint32_t perm = ctx.pack_perm[preferred->orig_pack_int_id];
+			if (perm == PACK_EXPIRED)
+				warning(_("preferred pack '%s' is expired"),
+					preferred_pack_name);
+		}
+	}
+
+	if (pack_name_concat_len % MIDX_CHUNK_ALIGNMENT)
+		pack_name_concat_len += MIDX_CHUNK_ALIGNMENT -
+					(pack_name_concat_len % MIDX_CHUNK_ALIGNMENT);
+
+	hold_lock_file_for_update(&lk, midx_name.buf, LOCK_DIE_ON_ERROR);
+	f = hashfd(get_lock_file_fd(&lk), get_lock_file_path(&lk));
+
+	if (ctx.nr - dropped_packs == 0) {
+		error(_("no pack files to index."));
+		result = 1;
+		goto cleanup;
+	}
+
+	if (!ctx.entries_nr) {
+		if (flags & MIDX_WRITE_BITMAP)
+			warning(_("refusing to write multi-pack .bitmap without any objects"));
+		flags &= ~(MIDX_WRITE_REV_INDEX | MIDX_WRITE_BITMAP);
+	}
+
+	cf = init_chunkfile(f);
+
+	add_chunk(cf, MIDX_CHUNKID_PACKNAMES, pack_name_concat_len,
+		  write_midx_pack_names);
+	add_chunk(cf, MIDX_CHUNKID_OIDFANOUT, MIDX_CHUNK_FANOUT_SIZE,
+		  write_midx_oid_fanout);
+	add_chunk(cf, MIDX_CHUNKID_OIDLOOKUP,
+		  st_mult(ctx.entries_nr, the_hash_algo->rawsz),
+		  write_midx_oid_lookup);
+	add_chunk(cf, MIDX_CHUNKID_OBJECTOFFSETS,
+		  st_mult(ctx.entries_nr, MIDX_CHUNK_OFFSET_WIDTH),
+		  write_midx_object_offsets);
+
+	if (ctx.large_offsets_needed)
+		add_chunk(cf, MIDX_CHUNKID_LARGEOFFSETS,
+			st_mult(ctx.num_large_offsets,
+				MIDX_CHUNK_LARGE_OFFSET_WIDTH),
+			write_midx_large_offsets);
+
+	if (flags & (MIDX_WRITE_REV_INDEX | MIDX_WRITE_BITMAP)) {
+		ctx.pack_order = midx_pack_order(&ctx);
+		add_chunk(cf, MIDX_CHUNKID_REVINDEX,
+			  st_mult(ctx.entries_nr, sizeof(uint32_t)),
+			  write_midx_revindex);
+		add_chunk(cf, MIDX_CHUNKID_BITMAPPEDPACKS,
+			  bitmapped_packs_concat_len,
+			  write_midx_bitmapped_packs);
+	}
+
+	write_midx_header(f, get_num_chunks(cf), ctx.nr - dropped_packs);
+	write_chunkfile(cf, &ctx);
+
+	finalize_hashfile(f, midx_hash, FSYNC_COMPONENT_PACK_METADATA,
+			  CSUM_FSYNC | CSUM_HASH_IN_STREAM);
+	free_chunkfile(cf);
+
+	if (flags & MIDX_WRITE_REV_INDEX &&
+	    git_env_bool("GIT_TEST_MIDX_WRITE_REV", 0))
+		write_midx_reverse_index(midx_name.buf, midx_hash, &ctx);
+
+	if (flags & MIDX_WRITE_BITMAP) {
+		struct packing_data pdata;
+		struct commit **commits;
+		uint32_t commits_nr;
+
+		if (!ctx.entries_nr)
+			BUG("cannot write a bitmap without any objects");
+
+		prepare_midx_packing_data(&pdata, &ctx);
+
+		commits = find_commits_for_midx_bitmap(&commits_nr, refs_snapshot, &ctx);
+
+		/*
+		 * The previous steps translated the information from
+		 * 'entries' into information suitable for constructing
+		 * bitmaps. We no longer need that array, so clear it to
+		 * reduce memory pressure.
+		 */
+		FREE_AND_NULL(ctx.entries);
+		ctx.entries_nr = 0;
+
+		if (write_midx_bitmap(midx_name.buf, midx_hash, &pdata,
+				      commits, commits_nr, ctx.pack_order,
+				      flags) < 0) {
+			error(_("could not write multi-pack bitmap"));
+			result = 1;
+			clear_packing_data(&pdata);
+			free(commits);
+			goto cleanup;
+		}
+
+		clear_packing_data(&pdata);
+		free(commits);
+	}
+	/*
+	 * NOTE: Do not use ctx.entries beyond this point, since it might
+	 * have been freed in the previous if block.
+	 */
+
+	if (ctx.m)
+		close_object_store(the_repository->objects);
+
+	if (commit_lock_file(&lk) < 0)
+		die_errno(_("could not write multi-pack-index"));
+
+	clear_midx_files_ext(object_dir, ".bitmap", midx_hash);
+	clear_midx_files_ext(object_dir, ".rev", midx_hash);
+
+cleanup:
+	for (i = 0; i < ctx.nr; i++) {
+		if (ctx.info[i].p) {
+			close_pack(ctx.info[i].p);
+			free(ctx.info[i].p);
+		}
+		free(ctx.info[i].pack_name);
+	}
+
+	free(ctx.info);
+	free(ctx.entries);
+	free(ctx.pack_perm);
+	free(ctx.pack_order);
+	strbuf_release(&midx_name);
+
+	trace2_region_leave("midx", "write_midx_internal", the_repository);
+
+	return result;
+}
+
+int write_midx_file(const char *object_dir,
+		    const char *preferred_pack_name,
+		    const char *refs_snapshot,
+		    unsigned flags)
+{
+	return write_midx_internal(object_dir, NULL, NULL, preferred_pack_name,
+				   refs_snapshot, flags);
+}
+
+int write_midx_file_only(const char *object_dir,
+			 struct string_list *packs_to_include,
+			 const char *preferred_pack_name,
+			 const char *refs_snapshot,
+			 unsigned flags)
+{
+	return write_midx_internal(object_dir, packs_to_include, NULL,
+				   preferred_pack_name, refs_snapshot, flags);
+}
+
+int expire_midx_packs(struct repository *r, const char *object_dir, unsigned flags)
+{
+	uint32_t i, *count, result = 0;
+	struct string_list packs_to_drop = STRING_LIST_INIT_DUP;
+	struct multi_pack_index *m = lookup_multi_pack_index(r, object_dir);
+	struct progress *progress = NULL;
+
+	if (!m)
+		return 0;
+
+	CALLOC_ARRAY(count, m->num_packs);
+
+	if (flags & MIDX_PROGRESS)
+		progress = start_delayed_progress(_("Counting referenced objects"),
+					  m->num_objects);
+	for (i = 0; i < m->num_objects; i++) {
+		int pack_int_id = nth_midxed_pack_int_id(m, i);
+		count[pack_int_id]++;
+		display_progress(progress, i + 1);
+	}
+	stop_progress(&progress);
+
+	if (flags & MIDX_PROGRESS)
+		progress = start_delayed_progress(_("Finding and deleting unreferenced packfiles"),
+					  m->num_packs);
+	for (i = 0; i < m->num_packs; i++) {
+		char *pack_name;
+		display_progress(progress, i + 1);
+
+		if (count[i])
+			continue;
+
+		if (prepare_midx_pack(r, m, i))
+			continue;
+
+		if (m->packs[i]->pack_keep || m->packs[i]->is_cruft)
+			continue;
+
+		pack_name = xstrdup(m->packs[i]->pack_name);
+		close_pack(m->packs[i]);
+
+		string_list_insert(&packs_to_drop, m->pack_names[i]);
+		unlink_pack_path(pack_name, 0);
+		free(pack_name);
+	}
+	stop_progress(&progress);
+
+	free(count);
+
+	if (packs_to_drop.nr)
+		result = write_midx_internal(object_dir, NULL, &packs_to_drop, NULL, NULL, flags);
+
+	string_list_clear(&packs_to_drop, 0);
+
+	return result;
+}
+
+struct repack_info {
+	timestamp_t mtime;
+	uint32_t referenced_objects;
+	uint32_t pack_int_id;
+};
+
+static int compare_by_mtime(const void *a_, const void *b_)
+{
+	const struct repack_info *a, *b;
+
+	a = (const struct repack_info *)a_;
+	b = (const struct repack_info *)b_;
+
+	if (a->mtime < b->mtime)
+		return -1;
+	if (a->mtime > b->mtime)
+		return 1;
+	return 0;
+}
+
+static int fill_included_packs_all(struct repository *r,
+				   struct multi_pack_index *m,
+				   unsigned char *include_pack)
+{
+	uint32_t i, count = 0;
+	int pack_kept_objects = 0;
+
+	repo_config_get_bool(r, "repack.packkeptobjects", &pack_kept_objects);
+
+	for (i = 0; i < m->num_packs; i++) {
+		if (prepare_midx_pack(r, m, i))
+			continue;
+		if (!pack_kept_objects && m->packs[i]->pack_keep)
+			continue;
+		if (m->packs[i]->is_cruft)
+			continue;
+
+		include_pack[i] = 1;
+		count++;
+	}
+
+	return count < 2;
+}
+
+static int fill_included_packs_batch(struct repository *r,
+				     struct multi_pack_index *m,
+				     unsigned char *include_pack,
+				     size_t batch_size)
+{
+	uint32_t i, packs_to_repack;
+	size_t total_size;
+	struct repack_info *pack_info;
+	int pack_kept_objects = 0;
+
+	CALLOC_ARRAY(pack_info, m->num_packs);
+
+	repo_config_get_bool(r, "repack.packkeptobjects", &pack_kept_objects);
+
+	for (i = 0; i < m->num_packs; i++) {
+		pack_info[i].pack_int_id = i;
+
+		if (prepare_midx_pack(r, m, i))
+			continue;
+
+		pack_info[i].mtime = m->packs[i]->mtime;
+	}
+
+	for (i = 0; i < m->num_objects; i++) {
+		uint32_t pack_int_id = nth_midxed_pack_int_id(m, i);
+		pack_info[pack_int_id].referenced_objects++;
+	}
+
+	QSORT(pack_info, m->num_packs, compare_by_mtime);
+
+	total_size = 0;
+	packs_to_repack = 0;
+	for (i = 0; total_size < batch_size && i < m->num_packs; i++) {
+		int pack_int_id = pack_info[i].pack_int_id;
+		struct packed_git *p = m->packs[pack_int_id];
+		size_t expected_size;
+
+		if (!p)
+			continue;
+		if (!pack_kept_objects && p->pack_keep)
+			continue;
+		if (p->is_cruft)
+			continue;
+		if (open_pack_index(p) || !p->num_objects)
+			continue;
+
+		expected_size = st_mult(p->pack_size,
+					pack_info[i].referenced_objects);
+		expected_size /= p->num_objects;
+
+		if (expected_size >= batch_size)
+			continue;
+
+		packs_to_repack++;
+		total_size += expected_size;
+		include_pack[pack_int_id] = 1;
+	}
+
+	free(pack_info);
+
+	if (packs_to_repack < 2)
+		return 1;
+
+	return 0;
+}
+
+int midx_repack(struct repository *r, const char *object_dir, size_t batch_size, unsigned flags)
+{
+	int result = 0;
+	uint32_t i;
+	unsigned char *include_pack;
+	struct child_process cmd = CHILD_PROCESS_INIT;
+	FILE *cmd_in;
+	struct multi_pack_index *m = lookup_multi_pack_index(r, object_dir);
+
+	/*
+	 * When updating the default for these configuration
+	 * variables in builtin/repack.c, these must be adjusted
+	 * to match.
+	 */
+	int delta_base_offset = 1;
+	int use_delta_islands = 0;
+
+	if (!m)
+		return 0;
+
+	CALLOC_ARRAY(include_pack, m->num_packs);
+
+	if (batch_size) {
+		if (fill_included_packs_batch(r, m, include_pack, batch_size))
+			goto cleanup;
+	} else if (fill_included_packs_all(r, m, include_pack))
+		goto cleanup;
+
+	repo_config_get_bool(r, "repack.usedeltabaseoffset", &delta_base_offset);
+	repo_config_get_bool(r, "repack.usedeltaislands", &use_delta_islands);
+
+	strvec_push(&cmd.args, "pack-objects");
+
+	strvec_pushf(&cmd.args, "%s/pack/pack", object_dir);
+
+	if (delta_base_offset)
+		strvec_push(&cmd.args, "--delta-base-offset");
+	if (use_delta_islands)
+		strvec_push(&cmd.args, "--delta-islands");
+
+	if (flags & MIDX_PROGRESS)
+		strvec_push(&cmd.args, "--progress");
+	else
+		strvec_push(&cmd.args, "-q");
+
+	cmd.git_cmd = 1;
+	cmd.in = cmd.out = -1;
+
+	if (start_command(&cmd)) {
+		error(_("could not start pack-objects"));
+		result = 1;
+		goto cleanup;
+	}
+
+	cmd_in = xfdopen(cmd.in, "w");
+
+	for (i = 0; i < m->num_objects; i++) {
+		struct object_id oid;
+		uint32_t pack_int_id = nth_midxed_pack_int_id(m, i);
+
+		if (!include_pack[pack_int_id])
+			continue;
+
+		nth_midxed_object_oid(&oid, m, i);
+		fprintf(cmd_in, "%s\n", oid_to_hex(&oid));
+	}
+	fclose(cmd_in);
+
+	if (finish_command(&cmd)) {
+		error(_("could not finish pack-objects"));
+		result = 1;
+		goto cleanup;
+	}
+
+	result = write_midx_internal(object_dir, NULL, NULL, NULL, NULL, flags);
+
+cleanup:
+	free(include_pack);
+	return result;
+}
diff --git a/midx.c b/midx.c
index 41521e019c..ae3b49166c 100644
--- a/midx.c
+++ b/midx.c
@@ -1,52 +1,22 @@
 #include "git-compat-util.h"
-#include "abspath.h"
 #include "config.h"
-#include "csum-file.h"
 #include "dir.h"
-#include "gettext.h"
 #include "hex.h"
-#include "lockfile.h"
 #include "packfile.h"
 #include "object-file.h"
-#include "object-store-ll.h"
 #include "hash-lookup.h"
 #include "midx.h"
 #include "progress.h"
 #include "trace2.h"
-#include "run-command.h"
-#include "repository.h"
 #include "chunk-format.h"
-#include "pack.h"
 #include "pack-bitmap.h"
-#include "refs.h"
-#include "revision.h"
-#include "list-objects.h"
 #include "pack-revindex.h"
 
-#define MIDX_SIGNATURE 0x4d494458 /* "MIDX" */
-#define MIDX_VERSION 1
-#define MIDX_BYTE_FILE_VERSION 4
-#define MIDX_BYTE_HASH_VERSION 5
-#define MIDX_BYTE_NUM_CHUNKS 6
-#define MIDX_BYTE_NUM_PACKS 8
-#define MIDX_HEADER_SIZE 12
-#define MIDX_MIN_SIZE (MIDX_HEADER_SIZE + the_hash_algo->rawsz)
-
-#define MIDX_CHUNK_ALIGNMENT 4
-#define MIDX_CHUNKID_PACKNAMES 0x504e414d /* "PNAM" */
-#define MIDX_CHUNKID_BITMAPPEDPACKS 0x42544d50 /* "BTMP" */
-#define MIDX_CHUNKID_OIDFANOUT 0x4f494446 /* "OIDF" */
-#define MIDX_CHUNKID_OIDLOOKUP 0x4f49444c /* "OIDL" */
-#define MIDX_CHUNKID_OBJECTOFFSETS 0x4f4f4646 /* "OOFF" */
-#define MIDX_CHUNKID_LARGEOFFSETS 0x4c4f4646 /* "LOFF" */
-#define MIDX_CHUNKID_REVINDEX 0x52494458 /* "RIDX" */
-#define MIDX_CHUNK_FANOUT_SIZE (sizeof(uint32_t) * 256)
-#define MIDX_CHUNK_OFFSET_WIDTH (2 * sizeof(uint32_t))
-#define MIDX_CHUNK_LARGE_OFFSET_WIDTH (sizeof(uint64_t))
-#define MIDX_CHUNK_BITMAPPED_PACKS_WIDTH (2 * sizeof(uint32_t))
-#define MIDX_LARGE_OFFSET_NEEDED 0x80000000
-
-#define PACK_EXPIRED UINT_MAX
+int midx_checksum_valid(struct multi_pack_index *m);
+void clear_midx_files_ext(const char *object_dir, const char *ext,
+			  unsigned char *keep_hash);
+int cmp_idx_or_pack_name(const char *idx_or_pack_name,
+			 const char *idx_name);
 
 const unsigned char *get_midx_checksum(struct multi_pack_index *m)
 {
@@ -115,6 +85,8 @@ static int midx_read_object_offsets(const unsigned char *chunk_start,
 	return 0;
 }
 
+#define MIDX_MIN_SIZE (MIDX_HEADER_SIZE + the_hash_algo->rawsz)
+
 struct multi_pack_index *load_multi_pack_index(const char *object_dir, int local)
 {
 	struct multi_pack_index *m = NULL;
@@ -294,6 +266,8 @@ int prepare_midx_pack(struct repository *r, struct multi_pack_index *m, uint32_t
 	return 0;
 }
 
+#define MIDX_CHUNK_BITMAPPED_PACKS_WIDTH (2 * sizeof(uint32_t))
+
 int nth_bitmapped_pack(struct repository *r, struct multi_pack_index *m,
 		       struct bitmapped_pack *bp, uint32_t pack_int_id)
 {
@@ -400,8 +374,8 @@ int fill_midx_entry(struct repository *r,
 }
 
 /* Match "foo.idx" against either "foo.pack" _or_ "foo.idx". */
-static int cmp_idx_or_pack_name(const char *idx_or_pack_name,
-				const char *idx_name)
+int cmp_idx_or_pack_name(const char *idx_or_pack_name,
+			 const char *idx_name)
 {
 	/* Skip past any initial matching prefix. */
 	while (*idx_name && *idx_name == *idx_or_pack_name) {
@@ -508,1262 +482,11 @@ int prepare_multi_pack_index_one(struct repository *r, const char *object_dir, i
 	return 0;
 }
 
-static size_t write_midx_header(struct hashfile *f,
-				unsigned char num_chunks,
-				uint32_t num_packs)
-{
-	hashwrite_be32(f, MIDX_SIGNATURE);
-	hashwrite_u8(f, MIDX_VERSION);
-	hashwrite_u8(f, oid_version(the_hash_algo));
-	hashwrite_u8(f, num_chunks);
-	hashwrite_u8(f, 0); /* unused */
-	hashwrite_be32(f, num_packs);
-
-	return MIDX_HEADER_SIZE;
-}
-
-#define BITMAP_POS_UNKNOWN (~((uint32_t)0))
-
-struct pack_info {
-	uint32_t orig_pack_int_id;
-	char *pack_name;
-	struct packed_git *p;
-
-	uint32_t bitmap_pos;
-	uint32_t bitmap_nr;
-
-	unsigned expired : 1;
-};
-
-static void fill_pack_info(struct pack_info *info,
-			   struct packed_git *p, const char *pack_name,
-			   uint32_t orig_pack_int_id)
-{
-	memset(info, 0, sizeof(struct pack_info));
-
-	info->orig_pack_int_id = orig_pack_int_id;
-	info->pack_name = xstrdup(pack_name);
-	info->p = p;
-	info->bitmap_pos = BITMAP_POS_UNKNOWN;
-}
-
-static int pack_info_compare(const void *_a, const void *_b)
-{
-	struct pack_info *a = (struct pack_info *)_a;
-	struct pack_info *b = (struct pack_info *)_b;
-	return strcmp(a->pack_name, b->pack_name);
-}
-
-static int idx_or_pack_name_cmp(const void *_va, const void *_vb)
-{
-	const char *pack_name = _va;
-	const struct pack_info *compar = _vb;
-
-	return cmp_idx_or_pack_name(pack_name, compar->pack_name);
-}
-
-struct write_midx_context {
-	struct pack_info *info;
-	size_t nr;
-	size_t alloc;
-	struct multi_pack_index *m;
-	struct progress *progress;
-	unsigned pack_paths_checked;
-
-	struct pack_midx_entry *entries;
-	size_t entries_nr;
-
-	uint32_t *pack_perm;
-	uint32_t *pack_order;
-	unsigned large_offsets_needed:1;
-	uint32_t num_large_offsets;
-
-	int preferred_pack_idx;
-
-	struct string_list *to_include;
-};
-
-static void add_pack_to_midx(const char *full_path, size_t full_path_len,
-			     const char *file_name, void *data)
-{
-	struct write_midx_context *ctx = data;
-	struct packed_git *p;
-
-	if (ends_with(file_name, ".idx")) {
-		display_progress(ctx->progress, ++ctx->pack_paths_checked);
-		/*
-		 * Note that at most one of ctx->m and ctx->to_include are set,
-		 * so we are testing midx_contains_pack() and
-		 * string_list_has_string() independently (guarded by the
-		 * appropriate NULL checks).
-		 *
-		 * We could support passing to_include while reusing an existing
-		 * MIDX, but don't currently since the reuse process drags
-		 * forward all packs from an existing MIDX (without checking
-		 * whether or not they appear in the to_include list).
-		 *
-		 * If we added support for that, these next two conditional
-		 * should be performed independently (likely checking
-		 * to_include before the existing MIDX).
-		 */
-		if (ctx->m && midx_contains_pack(ctx->m, file_name))
-			return;
-		else if (ctx->to_include &&
-			 !string_list_has_string(ctx->to_include, file_name))
-			return;
-
-		ALLOC_GROW(ctx->info, ctx->nr + 1, ctx->alloc);
-
-		p = add_packed_git(full_path, full_path_len, 0);
-		if (!p) {
-			warning(_("failed to add packfile '%s'"),
-				full_path);
-			return;
-		}
-
-		if (open_pack_index(p)) {
-			warning(_("failed to open pack-index '%s'"),
-				full_path);
-			close_pack(p);
-			free(p);
-			return;
-		}
-
-		fill_pack_info(&ctx->info[ctx->nr], p, file_name, ctx->nr);
-		ctx->nr++;
-	}
-}
-
-struct pack_midx_entry {
-	struct object_id oid;
-	uint32_t pack_int_id;
-	time_t pack_mtime;
-	uint64_t offset;
-	unsigned preferred : 1;
-};
-
-static int midx_oid_compare(const void *_a, const void *_b)
-{
-	const struct pack_midx_entry *a = (const struct pack_midx_entry *)_a;
-	const struct pack_midx_entry *b = (const struct pack_midx_entry *)_b;
-	int cmp = oidcmp(&a->oid, &b->oid);
-
-	if (cmp)
-		return cmp;
-
-	/* Sort objects in a preferred pack first when multiple copies exist. */
-	if (a->preferred > b->preferred)
-		return -1;
-	if (a->preferred < b->preferred)
-		return 1;
-
-	if (a->pack_mtime > b->pack_mtime)
-		return -1;
-	else if (a->pack_mtime < b->pack_mtime)
-		return 1;
-
-	return a->pack_int_id - b->pack_int_id;
-}
-
-static int nth_midxed_pack_midx_entry(struct multi_pack_index *m,
-				      struct pack_midx_entry *e,
-				      uint32_t pos)
-{
-	if (pos >= m->num_objects)
-		return 1;
-
-	nth_midxed_object_oid(&e->oid, m, pos);
-	e->pack_int_id = nth_midxed_pack_int_id(m, pos);
-	e->offset = nth_midxed_offset(m, pos);
-
-	/* consider objects in midx to be from "old" packs */
-	e->pack_mtime = 0;
-	return 0;
-}
-
-static void fill_pack_entry(uint32_t pack_int_id,
-			    struct packed_git *p,
-			    uint32_t cur_object,
-			    struct pack_midx_entry *entry,
-			    int preferred)
-{
-	if (nth_packed_object_id(&entry->oid, p, cur_object) < 0)
-		die(_("failed to locate object %d in packfile"), cur_object);
-
-	entry->pack_int_id = pack_int_id;
-	entry->pack_mtime = p->mtime;
-
-	entry->offset = nth_packed_object_offset(p, cur_object);
-	entry->preferred = !!preferred;
-}
-
-struct midx_fanout {
-	struct pack_midx_entry *entries;
-	size_t nr, alloc;
-};
-
-static void midx_fanout_grow(struct midx_fanout *fanout, size_t nr)
-{
-	if (nr < fanout->nr)
-		BUG("negative growth in midx_fanout_grow() (%"PRIuMAX" < %"PRIuMAX")",
-		    (uintmax_t)nr, (uintmax_t)fanout->nr);
-	ALLOC_GROW(fanout->entries, nr, fanout->alloc);
-}
-
-static void midx_fanout_sort(struct midx_fanout *fanout)
-{
-	QSORT(fanout->entries, fanout->nr, midx_oid_compare);
-}
-
-static void midx_fanout_add_midx_fanout(struct midx_fanout *fanout,
-					struct multi_pack_index *m,
-					uint32_t cur_fanout,
-					int preferred_pack)
-{
-	uint32_t start = 0, end;
-	uint32_t cur_object;
-
-	if (cur_fanout)
-		start = ntohl(m->chunk_oid_fanout[cur_fanout - 1]);
-	end = ntohl(m->chunk_oid_fanout[cur_fanout]);
-
-	for (cur_object = start; cur_object < end; cur_object++) {
-		if ((preferred_pack > -1) &&
-		    (preferred_pack == nth_midxed_pack_int_id(m, cur_object))) {
-			/*
-			 * Objects from preferred packs are added
-			 * separately.
-			 */
-			continue;
-		}
-
-		midx_fanout_grow(fanout, fanout->nr + 1);
-		nth_midxed_pack_midx_entry(m,
-					   &fanout->entries[fanout->nr],
-					   cur_object);
-		fanout->entries[fanout->nr].preferred = 0;
-		fanout->nr++;
-	}
-}
-
-static void midx_fanout_add_pack_fanout(struct midx_fanout *fanout,
-					struct pack_info *info,
-					uint32_t cur_pack,
-					int preferred,
-					uint32_t cur_fanout)
-{
-	struct packed_git *pack = info[cur_pack].p;
-	uint32_t start = 0, end;
-	uint32_t cur_object;
-
-	if (cur_fanout)
-		start = get_pack_fanout(pack, cur_fanout - 1);
-	end = get_pack_fanout(pack, cur_fanout);
-
-	for (cur_object = start; cur_object < end; cur_object++) {
-		midx_fanout_grow(fanout, fanout->nr + 1);
-		fill_pack_entry(cur_pack,
-				info[cur_pack].p,
-				cur_object,
-				&fanout->entries[fanout->nr],
-				preferred);
-		fanout->nr++;
-	}
-}
-
-/*
- * It is possible to artificially get into a state where there are many
- * duplicate copies of objects. That can create high memory pressure if
- * we are to create a list of all objects before de-duplication. To reduce
- * this memory pressure without a significant performance drop, automatically
- * group objects by the first byte of their object id. Use the IDX fanout
- * tables to group the data, copy to a local array, then sort.
- *
- * Copy only the de-duplicated entries (selected by most-recent modified time
- * of a packfile containing the object).
- */
-static struct pack_midx_entry *get_sorted_entries(struct multi_pack_index *m,
-						  struct pack_info *info,
-						  uint32_t nr_packs,
-						  size_t *nr_objects,
-						  int preferred_pack)
-{
-	uint32_t cur_fanout, cur_pack, cur_object;
-	size_t alloc_objects, total_objects = 0;
-	struct midx_fanout fanout = { 0 };
-	struct pack_midx_entry *deduplicated_entries = NULL;
-	uint32_t start_pack = m ? m->num_packs : 0;
-
-	for (cur_pack = start_pack; cur_pack < nr_packs; cur_pack++)
-		total_objects = st_add(total_objects,
-				       info[cur_pack].p->num_objects);
-
-	/*
-	 * As we de-duplicate by fanout value, we expect the fanout
-	 * slices to be evenly distributed, with some noise. Hence,
-	 * allocate slightly more than one 256th.
-	 */
-	alloc_objects = fanout.alloc = total_objects > 3200 ? total_objects / 200 : 16;
-
-	ALLOC_ARRAY(fanout.entries, fanout.alloc);
-	ALLOC_ARRAY(deduplicated_entries, alloc_objects);
-	*nr_objects = 0;
-
-	for (cur_fanout = 0; cur_fanout < 256; cur_fanout++) {
-		fanout.nr = 0;
-
-		if (m)
-			midx_fanout_add_midx_fanout(&fanout, m, cur_fanout,
-						    preferred_pack);
-
-		for (cur_pack = start_pack; cur_pack < nr_packs; cur_pack++) {
-			int preferred = cur_pack == preferred_pack;
-			midx_fanout_add_pack_fanout(&fanout,
-						    info, cur_pack,
-						    preferred, cur_fanout);
-		}
-
-		if (-1 < preferred_pack && preferred_pack < start_pack)
-			midx_fanout_add_pack_fanout(&fanout, info,
-						    preferred_pack, 1,
-						    cur_fanout);
-
-		midx_fanout_sort(&fanout);
-
-		/*
-		 * The batch is now sorted by OID and then mtime (descending).
-		 * Take only the first duplicate.
-		 */
-		for (cur_object = 0; cur_object < fanout.nr; cur_object++) {
-			if (cur_object && oideq(&fanout.entries[cur_object - 1].oid,
-						&fanout.entries[cur_object].oid))
-				continue;
-
-			ALLOC_GROW(deduplicated_entries, st_add(*nr_objects, 1),
-				   alloc_objects);
-			memcpy(&deduplicated_entries[*nr_objects],
-			       &fanout.entries[cur_object],
-			       sizeof(struct pack_midx_entry));
-			(*nr_objects)++;
-		}
-	}
-
-	free(fanout.entries);
-	return deduplicated_entries;
-}
-
-static int write_midx_pack_names(struct hashfile *f, void *data)
-{
-	struct write_midx_context *ctx = data;
-	uint32_t i;
-	unsigned char padding[MIDX_CHUNK_ALIGNMENT];
-	size_t written = 0;
-
-	for (i = 0; i < ctx->nr; i++) {
-		size_t writelen;
-
-		if (ctx->info[i].expired)
-			continue;
-
-		if (i && strcmp(ctx->info[i].pack_name, ctx->info[i - 1].pack_name) <= 0)
-			BUG("incorrect pack-file order: %s before %s",
-			    ctx->info[i - 1].pack_name,
-			    ctx->info[i].pack_name);
-
-		writelen = strlen(ctx->info[i].pack_name) + 1;
-		hashwrite(f, ctx->info[i].pack_name, writelen);
-		written += writelen;
-	}
-
-	/* add padding to be aligned */
-	i = MIDX_CHUNK_ALIGNMENT - (written % MIDX_CHUNK_ALIGNMENT);
-	if (i < MIDX_CHUNK_ALIGNMENT) {
-		memset(padding, 0, sizeof(padding));
-		hashwrite(f, padding, i);
-	}
-
-	return 0;
-}
-
-static int write_midx_bitmapped_packs(struct hashfile *f, void *data)
-{
-	struct write_midx_context *ctx = data;
-	size_t i;
-
-	for (i = 0; i < ctx->nr; i++) {
-		struct pack_info *pack = &ctx->info[i];
-		if (pack->expired)
-			continue;
-
-		if (pack->bitmap_pos == BITMAP_POS_UNKNOWN && pack->bitmap_nr)
-			BUG("pack '%s' has no bitmap position, but has %d bitmapped object(s)",
-			    pack->pack_name, pack->bitmap_nr);
-
-		hashwrite_be32(f, pack->bitmap_pos);
-		hashwrite_be32(f, pack->bitmap_nr);
-	}
-	return 0;
-}
-
-static int write_midx_oid_fanout(struct hashfile *f,
-				 void *data)
-{
-	struct write_midx_context *ctx = data;
-	struct pack_midx_entry *list = ctx->entries;
-	struct pack_midx_entry *last = ctx->entries + ctx->entries_nr;
-	uint32_t count = 0;
-	uint32_t i;
-
-	/*
-	* Write the first-level table (the list is sorted,
-	* but we use a 256-entry lookup to be able to avoid
-	* having to do eight extra binary search iterations).
-	*/
-	for (i = 0; i < 256; i++) {
-		struct pack_midx_entry *next = list;
-
-		while (next < last && next->oid.hash[0] == i) {
-			count++;
-			next++;
-		}
-
-		hashwrite_be32(f, count);
-		list = next;
-	}
-
-	return 0;
-}
-
-static int write_midx_oid_lookup(struct hashfile *f,
-				 void *data)
-{
-	struct write_midx_context *ctx = data;
-	unsigned char hash_len = the_hash_algo->rawsz;
-	struct pack_midx_entry *list = ctx->entries;
-	uint32_t i;
-
-	for (i = 0; i < ctx->entries_nr; i++) {
-		struct pack_midx_entry *obj = list++;
-
-		if (i < ctx->entries_nr - 1) {
-			struct pack_midx_entry *next = list;
-			if (oidcmp(&obj->oid, &next->oid) >= 0)
-				BUG("OIDs not in order: %s >= %s",
-				    oid_to_hex(&obj->oid),
-				    oid_to_hex(&next->oid));
-		}
-
-		hashwrite(f, obj->oid.hash, (int)hash_len);
-	}
-
-	return 0;
-}
-
-static int write_midx_object_offsets(struct hashfile *f,
-				     void *data)
-{
-	struct write_midx_context *ctx = data;
-	struct pack_midx_entry *list = ctx->entries;
-	uint32_t i, nr_large_offset = 0;
-
-	for (i = 0; i < ctx->entries_nr; i++) {
-		struct pack_midx_entry *obj = list++;
-
-		if (ctx->pack_perm[obj->pack_int_id] == PACK_EXPIRED)
-			BUG("object %s is in an expired pack with int-id %d",
-			    oid_to_hex(&obj->oid),
-			    obj->pack_int_id);
-
-		hashwrite_be32(f, ctx->pack_perm[obj->pack_int_id]);
-
-		if (ctx->large_offsets_needed && obj->offset >> 31)
-			hashwrite_be32(f, MIDX_LARGE_OFFSET_NEEDED | nr_large_offset++);
-		else if (!ctx->large_offsets_needed && obj->offset >> 32)
-			BUG("object %s requires a large offset (%"PRIx64") but the MIDX is not writing large offsets!",
-			    oid_to_hex(&obj->oid),
-			    obj->offset);
-		else
-			hashwrite_be32(f, (uint32_t)obj->offset);
-	}
-
-	return 0;
-}
-
-static int write_midx_large_offsets(struct hashfile *f,
-				    void *data)
-{
-	struct write_midx_context *ctx = data;
-	struct pack_midx_entry *list = ctx->entries;
-	struct pack_midx_entry *end = ctx->entries + ctx->entries_nr;
-	uint32_t nr_large_offset = ctx->num_large_offsets;
-
-	while (nr_large_offset) {
-		struct pack_midx_entry *obj;
-		uint64_t offset;
-
-		if (list >= end)
-			BUG("too many large-offset objects");
-
-		obj = list++;
-		offset = obj->offset;
-
-		if (!(offset >> 31))
-			continue;
-
-		hashwrite_be64(f, offset);
-
-		nr_large_offset--;
-	}
-
-	return 0;
-}
-
-static int write_midx_revindex(struct hashfile *f,
-			       void *data)
-{
-	struct write_midx_context *ctx = data;
-	uint32_t i;
-
-	for (i = 0; i < ctx->entries_nr; i++)
-		hashwrite_be32(f, ctx->pack_order[i]);
-
-	return 0;
-}
-
-struct midx_pack_order_data {
-	uint32_t nr;
-	uint32_t pack;
-	off_t offset;
-};
-
-static int midx_pack_order_cmp(const void *va, const void *vb)
-{
-	const struct midx_pack_order_data *a = va, *b = vb;
-	if (a->pack < b->pack)
-		return -1;
-	else if (a->pack > b->pack)
-		return 1;
-	else if (a->offset < b->offset)
-		return -1;
-	else if (a->offset > b->offset)
-		return 1;
-	else
-		return 0;
-}
-
-static uint32_t *midx_pack_order(struct write_midx_context *ctx)
-{
-	struct midx_pack_order_data *data;
-	uint32_t *pack_order;
-	uint32_t i;
-
-	trace2_region_enter("midx", "midx_pack_order", the_repository);
-
-	ALLOC_ARRAY(data, ctx->entries_nr);
-	for (i = 0; i < ctx->entries_nr; i++) {
-		struct pack_midx_entry *e = &ctx->entries[i];
-		data[i].nr = i;
-		data[i].pack = ctx->pack_perm[e->pack_int_id];
-		if (!e->preferred)
-			data[i].pack |= (1U << 31);
-		data[i].offset = e->offset;
-	}
-
-	QSORT(data, ctx->entries_nr, midx_pack_order_cmp);
-
-	ALLOC_ARRAY(pack_order, ctx->entries_nr);
-	for (i = 0; i < ctx->entries_nr; i++) {
-		struct pack_midx_entry *e = &ctx->entries[data[i].nr];
-		struct pack_info *pack = &ctx->info[ctx->pack_perm[e->pack_int_id]];
-		if (pack->bitmap_pos == BITMAP_POS_UNKNOWN)
-			pack->bitmap_pos = i;
-		pack->bitmap_nr++;
-		pack_order[i] = data[i].nr;
-	}
-	for (i = 0; i < ctx->nr; i++) {
-		struct pack_info *pack = &ctx->info[ctx->pack_perm[i]];
-		if (pack->bitmap_pos == BITMAP_POS_UNKNOWN)
-			pack->bitmap_pos = 0;
-	}
-	free(data);
-
-	trace2_region_leave("midx", "midx_pack_order", the_repository);
-
-	return pack_order;
-}
-
-static void write_midx_reverse_index(char *midx_name, unsigned char *midx_hash,
-				     struct write_midx_context *ctx)
-{
-	struct strbuf buf = STRBUF_INIT;
-	const char *tmp_file;
-
-	trace2_region_enter("midx", "write_midx_reverse_index", the_repository);
-
-	strbuf_addf(&buf, "%s-%s.rev", midx_name, hash_to_hex(midx_hash));
-
-	tmp_file = write_rev_file_order(NULL, ctx->pack_order, ctx->entries_nr,
-					midx_hash, WRITE_REV);
-
-	if (finalize_object_file(tmp_file, buf.buf))
-		die(_("cannot store reverse index file"));
-
-	strbuf_release(&buf);
-
-	trace2_region_leave("midx", "write_midx_reverse_index", the_repository);
-}
-
-static void clear_midx_files_ext(const char *object_dir, const char *ext,
-				 unsigned char *keep_hash);
-
-static int midx_checksum_valid(struct multi_pack_index *m)
+int midx_checksum_valid(struct multi_pack_index *m)
 {
 	return hashfile_checksum_valid(m->data, m->data_len);
 }
 
-static void prepare_midx_packing_data(struct packing_data *pdata,
-				      struct write_midx_context *ctx)
-{
-	uint32_t i;
-
-	trace2_region_enter("midx", "prepare_midx_packing_data", the_repository);
-
-	memset(pdata, 0, sizeof(struct packing_data));
-	prepare_packing_data(the_repository, pdata);
-
-	for (i = 0; i < ctx->entries_nr; i++) {
-		struct pack_midx_entry *from = &ctx->entries[ctx->pack_order[i]];
-		struct object_entry *to = packlist_alloc(pdata, &from->oid);
-
-		oe_set_in_pack(pdata, to,
-			       ctx->info[ctx->pack_perm[from->pack_int_id]].p);
-	}
-
-	trace2_region_leave("midx", "prepare_midx_packing_data", the_repository);
-}
-
-static int add_ref_to_pending(const char *refname,
-			      const struct object_id *oid,
-			      int flag, void *cb_data)
-{
-	struct rev_info *revs = (struct rev_info*)cb_data;
-	struct object_id peeled;
-	struct object *object;
-
-	if ((flag & REF_ISSYMREF) && (flag & REF_ISBROKEN)) {
-		warning("symbolic ref is dangling: %s", refname);
-		return 0;
-	}
-
-	if (!peel_iterated_oid(oid, &peeled))
-		oid = &peeled;
-
-	object = parse_object_or_die(oid, refname);
-	if (object->type != OBJ_COMMIT)
-		return 0;
-
-	add_pending_object(revs, object, "");
-	if (bitmap_is_preferred_refname(revs->repo, refname))
-		object->flags |= NEEDS_BITMAP;
-	return 0;
-}
-
-struct bitmap_commit_cb {
-	struct commit **commits;
-	size_t commits_nr, commits_alloc;
-
-	struct write_midx_context *ctx;
-};
-
-static const struct object_id *bitmap_oid_access(size_t index,
-						 const void *_entries)
-{
-	const struct pack_midx_entry *entries = _entries;
-	return &entries[index].oid;
-}
-
-static void bitmap_show_commit(struct commit *commit, void *_data)
-{
-	struct bitmap_commit_cb *data = _data;
-	int pos = oid_pos(&commit->object.oid, data->ctx->entries,
-			  data->ctx->entries_nr,
-			  bitmap_oid_access);
-	if (pos < 0)
-		return;
-
-	ALLOC_GROW(data->commits, data->commits_nr + 1, data->commits_alloc);
-	data->commits[data->commits_nr++] = commit;
-}
-
-static int read_refs_snapshot(const char *refs_snapshot,
-			      struct rev_info *revs)
-{
-	struct strbuf buf = STRBUF_INIT;
-	struct object_id oid;
-	FILE *f = xfopen(refs_snapshot, "r");
-
-	while (strbuf_getline(&buf, f) != EOF) {
-		struct object *object;
-		int preferred = 0;
-		char *hex = buf.buf;
-		const char *end = NULL;
-
-		if (buf.len && *buf.buf == '+') {
-			preferred = 1;
-			hex = &buf.buf[1];
-		}
-
-		if (parse_oid_hex(hex, &oid, &end) < 0)
-			die(_("could not parse line: %s"), buf.buf);
-		if (*end)
-			die(_("malformed line: %s"), buf.buf);
-
-		object = parse_object_or_die(&oid, NULL);
-		if (preferred)
-			object->flags |= NEEDS_BITMAP;
-
-		add_pending_object(revs, object, "");
-	}
-
-	fclose(f);
-	strbuf_release(&buf);
-	return 0;
-}
-
-static struct commit **find_commits_for_midx_bitmap(uint32_t *indexed_commits_nr_p,
-						    const char *refs_snapshot,
-						    struct write_midx_context *ctx)
-{
-	struct rev_info revs;
-	struct bitmap_commit_cb cb = {0};
-
-	trace2_region_enter("midx", "find_commits_for_midx_bitmap",
-			    the_repository);
-
-	cb.ctx = ctx;
-
-	repo_init_revisions(the_repository, &revs, NULL);
-	if (refs_snapshot) {
-		read_refs_snapshot(refs_snapshot, &revs);
-	} else {
-		setup_revisions(0, NULL, &revs, NULL);
-		for_each_ref(add_ref_to_pending, &revs);
-	}
-
-	/*
-	 * Skipping promisor objects here is intentional, since it only excludes
-	 * them from the list of reachable commits that we want to select from
-	 * when computing the selection of MIDX'd commits to receive bitmaps.
-	 *
-	 * Reachability bitmaps do require that their objects be closed under
-	 * reachability, but fetching any objects missing from promisors at this
-	 * point is too late. But, if one of those objects can be reached from
-	 * an another object that is included in the bitmap, then we will
-	 * complain later that we don't have reachability closure (and fail
-	 * appropriately).
-	 */
-	fetch_if_missing = 0;
-	revs.exclude_promisor_objects = 1;
-
-	if (prepare_revision_walk(&revs))
-		die(_("revision walk setup failed"));
-
-	traverse_commit_list(&revs, bitmap_show_commit, NULL, &cb);
-	if (indexed_commits_nr_p)
-		*indexed_commits_nr_p = cb.commits_nr;
-
-	release_revisions(&revs);
-
-	trace2_region_leave("midx", "find_commits_for_midx_bitmap",
-			    the_repository);
-
-	return cb.commits;
-}
-
-static int write_midx_bitmap(const char *midx_name,
-			     const unsigned char *midx_hash,
-			     struct packing_data *pdata,
-			     struct commit **commits,
-			     uint32_t commits_nr,
-			     uint32_t *pack_order,
-			     unsigned flags)
-{
-	int ret, i;
-	uint16_t options = 0;
-	struct pack_idx_entry **index;
-	char *bitmap_name = xstrfmt("%s-%s.bitmap", midx_name,
-					hash_to_hex(midx_hash));
-
-	trace2_region_enter("midx", "write_midx_bitmap", the_repository);
-
-	if (flags & MIDX_WRITE_BITMAP_HASH_CACHE)
-		options |= BITMAP_OPT_HASH_CACHE;
-
-	if (flags & MIDX_WRITE_BITMAP_LOOKUP_TABLE)
-		options |= BITMAP_OPT_LOOKUP_TABLE;
-
-	/*
-	 * Build the MIDX-order index based on pdata.objects (which is already
-	 * in MIDX order; c.f., 'midx_pack_order_cmp()' for the definition of
-	 * this order).
-	 */
-	ALLOC_ARRAY(index, pdata->nr_objects);
-	for (i = 0; i < pdata->nr_objects; i++)
-		index[i] = &pdata->objects[i].idx;
-
-	bitmap_writer_show_progress(flags & MIDX_PROGRESS);
-	bitmap_writer_build_type_index(pdata, index, pdata->nr_objects);
-
-	/*
-	 * bitmap_writer_finish expects objects in lex order, but pack_order
-	 * gives us exactly that. use it directly instead of re-sorting the
-	 * array.
-	 *
-	 * This changes the order of objects in 'index' between
-	 * bitmap_writer_build_type_index and bitmap_writer_finish.
-	 *
-	 * The same re-ordering takes place in the single-pack bitmap code via
-	 * write_idx_file(), which is called by finish_tmp_packfile(), which
-	 * happens between bitmap_writer_build_type_index() and
-	 * bitmap_writer_finish().
-	 */
-	for (i = 0; i < pdata->nr_objects; i++)
-		index[pack_order[i]] = &pdata->objects[i].idx;
-
-	bitmap_writer_select_commits(commits, commits_nr, -1);
-	ret = bitmap_writer_build(pdata);
-	if (ret < 0)
-		goto cleanup;
-
-	bitmap_writer_set_checksum(midx_hash);
-	bitmap_writer_finish(index, pdata->nr_objects, bitmap_name, options);
-
-cleanup:
-	free(index);
-	free(bitmap_name);
-
-	trace2_region_leave("midx", "write_midx_bitmap", the_repository);
-
-	return ret;
-}
-
-static struct multi_pack_index *lookup_multi_pack_index(struct repository *r,
-							const char *object_dir)
-{
-	struct multi_pack_index *result = NULL;
-	struct multi_pack_index *cur;
-	char *obj_dir_real = real_pathdup(object_dir, 1);
-	struct strbuf cur_path_real = STRBUF_INIT;
-
-	/* Ensure the given object_dir is local, or a known alternate. */
-	find_odb(r, obj_dir_real);
-
-	for (cur = get_multi_pack_index(r); cur; cur = cur->next) {
-		strbuf_realpath(&cur_path_real, cur->object_dir, 1);
-		if (!strcmp(obj_dir_real, cur_path_real.buf)) {
-			result = cur;
-			goto cleanup;
-		}
-	}
-
-cleanup:
-	free(obj_dir_real);
-	strbuf_release(&cur_path_real);
-	return result;
-}
-
-static int write_midx_internal(const char *object_dir,
-			       struct string_list *packs_to_include,
-			       struct string_list *packs_to_drop,
-			       const char *preferred_pack_name,
-			       const char *refs_snapshot,
-			       unsigned flags)
-{
-	struct strbuf midx_name = STRBUF_INIT;
-	unsigned char midx_hash[GIT_MAX_RAWSZ];
-	uint32_t i;
-	struct hashfile *f = NULL;
-	struct lock_file lk;
-	struct write_midx_context ctx = { 0 };
-	int bitmapped_packs_concat_len = 0;
-	int pack_name_concat_len = 0;
-	int dropped_packs = 0;
-	int result = 0;
-	struct chunkfile *cf;
-
-	trace2_region_enter("midx", "write_midx_internal", the_repository);
-
-	get_midx_filename(&midx_name, object_dir);
-	if (safe_create_leading_directories(midx_name.buf))
-		die_errno(_("unable to create leading directories of %s"),
-			  midx_name.buf);
-
-	if (!packs_to_include) {
-		/*
-		 * Only reference an existing MIDX when not filtering which
-		 * packs to include, since all packs and objects are copied
-		 * blindly from an existing MIDX if one is present.
-		 */
-		ctx.m = lookup_multi_pack_index(the_repository, object_dir);
-	}
-
-	if (ctx.m && !midx_checksum_valid(ctx.m)) {
-		warning(_("ignoring existing multi-pack-index; checksum mismatch"));
-		ctx.m = NULL;
-	}
-
-	ctx.nr = 0;
-	ctx.alloc = ctx.m ? ctx.m->num_packs : 16;
-	ctx.info = NULL;
-	ALLOC_ARRAY(ctx.info, ctx.alloc);
-
-	if (ctx.m) {
-		for (i = 0; i < ctx.m->num_packs; i++) {
-			ALLOC_GROW(ctx.info, ctx.nr + 1, ctx.alloc);
-
-			if (flags & MIDX_WRITE_REV_INDEX) {
-				/*
-				 * If generating a reverse index, need to have
-				 * packed_git's loaded to compare their
-				 * mtimes and object count.
-				 */
-				if (prepare_midx_pack(the_repository, ctx.m, i)) {
-					error(_("could not load pack"));
-					result = 1;
-					goto cleanup;
-				}
-
-				if (open_pack_index(ctx.m->packs[i]))
-					die(_("could not open index for %s"),
-					    ctx.m->packs[i]->pack_name);
-			}
-
-			fill_pack_info(&ctx.info[ctx.nr++], ctx.m->packs[i],
-				       ctx.m->pack_names[i], i);
-		}
-	}
-
-	ctx.pack_paths_checked = 0;
-	if (flags & MIDX_PROGRESS)
-		ctx.progress = start_delayed_progress(_("Adding packfiles to multi-pack-index"), 0);
-	else
-		ctx.progress = NULL;
-
-	ctx.to_include = packs_to_include;
-
-	for_each_file_in_pack_dir(object_dir, add_pack_to_midx, &ctx);
-	stop_progress(&ctx.progress);
-
-	if ((ctx.m && ctx.nr == ctx.m->num_packs) &&
-	    !(packs_to_include || packs_to_drop)) {
-		struct bitmap_index *bitmap_git;
-		int bitmap_exists;
-		int want_bitmap = flags & MIDX_WRITE_BITMAP;
-
-		bitmap_git = prepare_midx_bitmap_git(ctx.m);
-		bitmap_exists = bitmap_git && bitmap_is_midx(bitmap_git);
-		free_bitmap_index(bitmap_git);
-
-		if (bitmap_exists || !want_bitmap) {
-			/*
-			 * The correct MIDX already exists, and so does a
-			 * corresponding bitmap (or one wasn't requested).
-			 */
-			if (!want_bitmap)
-				clear_midx_files_ext(object_dir, ".bitmap",
-						     NULL);
-			goto cleanup;
-		}
-	}
-
-	if (preferred_pack_name) {
-		ctx.preferred_pack_idx = -1;
-
-		for (i = 0; i < ctx.nr; i++) {
-			if (!cmp_idx_or_pack_name(preferred_pack_name,
-						  ctx.info[i].pack_name)) {
-				ctx.preferred_pack_idx = i;
-				break;
-			}
-		}
-
-		if (ctx.preferred_pack_idx == -1)
-			warning(_("unknown preferred pack: '%s'"),
-				preferred_pack_name);
-	} else if (ctx.nr &&
-		   (flags & (MIDX_WRITE_REV_INDEX | MIDX_WRITE_BITMAP))) {
-		struct packed_git *oldest = ctx.info[ctx.preferred_pack_idx].p;
-		ctx.preferred_pack_idx = 0;
-
-		if (packs_to_drop && packs_to_drop->nr)
-			BUG("cannot write a MIDX bitmap during expiration");
-
-		/*
-		 * set a preferred pack when writing a bitmap to ensure that
-		 * the pack from which the first object is selected in pseudo
-		 * pack-order has all of its objects selected from that pack
-		 * (and not another pack containing a duplicate)
-		 */
-		for (i = 1; i < ctx.nr; i++) {
-			struct packed_git *p = ctx.info[i].p;
-
-			if (!oldest->num_objects || p->mtime < oldest->mtime) {
-				oldest = p;
-				ctx.preferred_pack_idx = i;
-			}
-		}
-
-		if (!oldest->num_objects) {
-			/*
-			 * If all packs are empty; unset the preferred index.
-			 * This is acceptable since there will be no duplicate
-			 * objects to resolve, so the preferred value doesn't
-			 * matter.
-			 */
-			ctx.preferred_pack_idx = -1;
-		}
-	} else {
-		/*
-		 * otherwise don't mark any pack as preferred to avoid
-		 * interfering with expiration logic below
-		 */
-		ctx.preferred_pack_idx = -1;
-	}
-
-	if (ctx.preferred_pack_idx > -1) {
-		struct packed_git *preferred = ctx.info[ctx.preferred_pack_idx].p;
-		if (!preferred->num_objects) {
-			error(_("cannot select preferred pack %s with no objects"),
-			      preferred->pack_name);
-			result = 1;
-			goto cleanup;
-		}
-	}
-
-	ctx.entries = get_sorted_entries(ctx.m, ctx.info, ctx.nr, &ctx.entries_nr,
-					 ctx.preferred_pack_idx);
-
-	ctx.large_offsets_needed = 0;
-	for (i = 0; i < ctx.entries_nr; i++) {
-		if (ctx.entries[i].offset > 0x7fffffff)
-			ctx.num_large_offsets++;
-		if (ctx.entries[i].offset > 0xffffffff)
-			ctx.large_offsets_needed = 1;
-	}
-
-	QSORT(ctx.info, ctx.nr, pack_info_compare);
-
-	if (packs_to_drop && packs_to_drop->nr) {
-		int drop_index = 0;
-		int missing_drops = 0;
-
-		for (i = 0; i < ctx.nr && drop_index < packs_to_drop->nr; i++) {
-			int cmp = strcmp(ctx.info[i].pack_name,
-					 packs_to_drop->items[drop_index].string);
-
-			if (!cmp) {
-				drop_index++;
-				ctx.info[i].expired = 1;
-			} else if (cmp > 0) {
-				error(_("did not see pack-file %s to drop"),
-				      packs_to_drop->items[drop_index].string);
-				drop_index++;
-				missing_drops++;
-				i--;
-			} else {
-				ctx.info[i].expired = 0;
-			}
-		}
-
-		if (missing_drops) {
-			result = 1;
-			goto cleanup;
-		}
-	}
-
-	/*
-	 * pack_perm stores a permutation between pack-int-ids from the
-	 * previous multi-pack-index to the new one we are writing:
-	 *
-	 * pack_perm[old_id] = new_id
-	 */
-	ALLOC_ARRAY(ctx.pack_perm, ctx.nr);
-	for (i = 0; i < ctx.nr; i++) {
-		if (ctx.info[i].expired) {
-			dropped_packs++;
-			ctx.pack_perm[ctx.info[i].orig_pack_int_id] = PACK_EXPIRED;
-		} else {
-			ctx.pack_perm[ctx.info[i].orig_pack_int_id] = i - dropped_packs;
-		}
-	}
-
-	for (i = 0; i < ctx.nr; i++) {
-		if (ctx.info[i].expired)
-			continue;
-		pack_name_concat_len += strlen(ctx.info[i].pack_name) + 1;
-		bitmapped_packs_concat_len += 2 * sizeof(uint32_t);
-	}
-
-	/* Check that the preferred pack wasn't expired (if given). */
-	if (preferred_pack_name) {
-		struct pack_info *preferred = bsearch(preferred_pack_name,
-						      ctx.info, ctx.nr,
-						      sizeof(*ctx.info),
-						      idx_or_pack_name_cmp);
-		if (preferred) {
-			uint32_t perm = ctx.pack_perm[preferred->orig_pack_int_id];
-			if (perm == PACK_EXPIRED)
-				warning(_("preferred pack '%s' is expired"),
-					preferred_pack_name);
-		}
-	}
-
-	if (pack_name_concat_len % MIDX_CHUNK_ALIGNMENT)
-		pack_name_concat_len += MIDX_CHUNK_ALIGNMENT -
-					(pack_name_concat_len % MIDX_CHUNK_ALIGNMENT);
-
-	hold_lock_file_for_update(&lk, midx_name.buf, LOCK_DIE_ON_ERROR);
-	f = hashfd(get_lock_file_fd(&lk), get_lock_file_path(&lk));
-
-	if (ctx.nr - dropped_packs == 0) {
-		error(_("no pack files to index."));
-		result = 1;
-		goto cleanup;
-	}
-
-	if (!ctx.entries_nr) {
-		if (flags & MIDX_WRITE_BITMAP)
-			warning(_("refusing to write multi-pack .bitmap without any objects"));
-		flags &= ~(MIDX_WRITE_REV_INDEX | MIDX_WRITE_BITMAP);
-	}
-
-	cf = init_chunkfile(f);
-
-	add_chunk(cf, MIDX_CHUNKID_PACKNAMES, pack_name_concat_len,
-		  write_midx_pack_names);
-	add_chunk(cf, MIDX_CHUNKID_OIDFANOUT, MIDX_CHUNK_FANOUT_SIZE,
-		  write_midx_oid_fanout);
-	add_chunk(cf, MIDX_CHUNKID_OIDLOOKUP,
-		  st_mult(ctx.entries_nr, the_hash_algo->rawsz),
-		  write_midx_oid_lookup);
-	add_chunk(cf, MIDX_CHUNKID_OBJECTOFFSETS,
-		  st_mult(ctx.entries_nr, MIDX_CHUNK_OFFSET_WIDTH),
-		  write_midx_object_offsets);
-
-	if (ctx.large_offsets_needed)
-		add_chunk(cf, MIDX_CHUNKID_LARGEOFFSETS,
-			st_mult(ctx.num_large_offsets,
-				MIDX_CHUNK_LARGE_OFFSET_WIDTH),
-			write_midx_large_offsets);
-
-	if (flags & (MIDX_WRITE_REV_INDEX | MIDX_WRITE_BITMAP)) {
-		ctx.pack_order = midx_pack_order(&ctx);
-		add_chunk(cf, MIDX_CHUNKID_REVINDEX,
-			  st_mult(ctx.entries_nr, sizeof(uint32_t)),
-			  write_midx_revindex);
-		add_chunk(cf, MIDX_CHUNKID_BITMAPPEDPACKS,
-			  bitmapped_packs_concat_len,
-			  write_midx_bitmapped_packs);
-	}
-
-	write_midx_header(f, get_num_chunks(cf), ctx.nr - dropped_packs);
-	write_chunkfile(cf, &ctx);
-
-	finalize_hashfile(f, midx_hash, FSYNC_COMPONENT_PACK_METADATA,
-			  CSUM_FSYNC | CSUM_HASH_IN_STREAM);
-	free_chunkfile(cf);
-
-	if (flags & MIDX_WRITE_REV_INDEX &&
-	    git_env_bool("GIT_TEST_MIDX_WRITE_REV", 0))
-		write_midx_reverse_index(midx_name.buf, midx_hash, &ctx);
-
-	if (flags & MIDX_WRITE_BITMAP) {
-		struct packing_data pdata;
-		struct commit **commits;
-		uint32_t commits_nr;
-
-		if (!ctx.entries_nr)
-			BUG("cannot write a bitmap without any objects");
-
-		prepare_midx_packing_data(&pdata, &ctx);
-
-		commits = find_commits_for_midx_bitmap(&commits_nr, refs_snapshot, &ctx);
-
-		/*
-		 * The previous steps translated the information from
-		 * 'entries' into information suitable for constructing
-		 * bitmaps. We no longer need that array, so clear it to
-		 * reduce memory pressure.
-		 */
-		FREE_AND_NULL(ctx.entries);
-		ctx.entries_nr = 0;
-
-		if (write_midx_bitmap(midx_name.buf, midx_hash, &pdata,
-				      commits, commits_nr, ctx.pack_order,
-				      flags) < 0) {
-			error(_("could not write multi-pack bitmap"));
-			result = 1;
-			clear_packing_data(&pdata);
-			free(commits);
-			goto cleanup;
-		}
-
-		clear_packing_data(&pdata);
-		free(commits);
-	}
-	/*
-	 * NOTE: Do not use ctx.entries beyond this point, since it might
-	 * have been freed in the previous if block.
-	 */
-
-	if (ctx.m)
-		close_object_store(the_repository->objects);
-
-	if (commit_lock_file(&lk) < 0)
-		die_errno(_("could not write multi-pack-index"));
-
-	clear_midx_files_ext(object_dir, ".bitmap", midx_hash);
-	clear_midx_files_ext(object_dir, ".rev", midx_hash);
-
-cleanup:
-	for (i = 0; i < ctx.nr; i++) {
-		if (ctx.info[i].p) {
-			close_pack(ctx.info[i].p);
-			free(ctx.info[i].p);
-		}
-		free(ctx.info[i].pack_name);
-	}
-
-	free(ctx.info);
-	free(ctx.entries);
-	free(ctx.pack_perm);
-	free(ctx.pack_order);
-	strbuf_release(&midx_name);
-
-	trace2_region_leave("midx", "write_midx_internal", the_repository);
-
-	return result;
-}
-
-int write_midx_file(const char *object_dir,
-		    const char *preferred_pack_name,
-		    const char *refs_snapshot,
-		    unsigned flags)
-{
-	return write_midx_internal(object_dir, NULL, NULL, preferred_pack_name,
-				   refs_snapshot, flags);
-}
-
-int write_midx_file_only(const char *object_dir,
-			 struct string_list *packs_to_include,
-			 const char *preferred_pack_name,
-			 const char *refs_snapshot,
-			 unsigned flags)
-{
-	return write_midx_internal(object_dir, packs_to_include, NULL,
-				   preferred_pack_name, refs_snapshot, flags);
-}
-
 struct clear_midx_data {
 	char *keep;
 	const char *ext;
@@ -1784,8 +507,8 @@ static void clear_midx_file_ext(const char *full_path, size_t full_path_len UNUS
 		die_errno(_("failed to remove %s"), full_path);
 }
 
-static void clear_midx_files_ext(const char *object_dir, const char *ext,
-				 unsigned char *keep_hash)
+void clear_midx_files_ext(const char *object_dir, const char *ext,
+			  unsigned char *keep_hash)
 {
 	struct clear_midx_data data;
 	memset(&data, 0, sizeof(struct clear_midx_data));
@@ -1988,251 +711,3 @@ int verify_midx_file(struct repository *r, const char *object_dir, unsigned flag
 
 	return verify_midx_error;
 }
-
-int expire_midx_packs(struct repository *r, const char *object_dir, unsigned flags)
-{
-	uint32_t i, *count, result = 0;
-	struct string_list packs_to_drop = STRING_LIST_INIT_DUP;
-	struct multi_pack_index *m = lookup_multi_pack_index(r, object_dir);
-	struct progress *progress = NULL;
-
-	if (!m)
-		return 0;
-
-	CALLOC_ARRAY(count, m->num_packs);
-
-	if (flags & MIDX_PROGRESS)
-		progress = start_delayed_progress(_("Counting referenced objects"),
-					  m->num_objects);
-	for (i = 0; i < m->num_objects; i++) {
-		int pack_int_id = nth_midxed_pack_int_id(m, i);
-		count[pack_int_id]++;
-		display_progress(progress, i + 1);
-	}
-	stop_progress(&progress);
-
-	if (flags & MIDX_PROGRESS)
-		progress = start_delayed_progress(_("Finding and deleting unreferenced packfiles"),
-					  m->num_packs);
-	for (i = 0; i < m->num_packs; i++) {
-		char *pack_name;
-		display_progress(progress, i + 1);
-
-		if (count[i])
-			continue;
-
-		if (prepare_midx_pack(r, m, i))
-			continue;
-
-		if (m->packs[i]->pack_keep || m->packs[i]->is_cruft)
-			continue;
-
-		pack_name = xstrdup(m->packs[i]->pack_name);
-		close_pack(m->packs[i]);
-
-		string_list_insert(&packs_to_drop, m->pack_names[i]);
-		unlink_pack_path(pack_name, 0);
-		free(pack_name);
-	}
-	stop_progress(&progress);
-
-	free(count);
-
-	if (packs_to_drop.nr)
-		result = write_midx_internal(object_dir, NULL, &packs_to_drop, NULL, NULL, flags);
-
-	string_list_clear(&packs_to_drop, 0);
-
-	return result;
-}
-
-struct repack_info {
-	timestamp_t mtime;
-	uint32_t referenced_objects;
-	uint32_t pack_int_id;
-};
-
-static int compare_by_mtime(const void *a_, const void *b_)
-{
-	const struct repack_info *a, *b;
-
-	a = (const struct repack_info *)a_;
-	b = (const struct repack_info *)b_;
-
-	if (a->mtime < b->mtime)
-		return -1;
-	if (a->mtime > b->mtime)
-		return 1;
-	return 0;
-}
-
-static int fill_included_packs_all(struct repository *r,
-				   struct multi_pack_index *m,
-				   unsigned char *include_pack)
-{
-	uint32_t i, count = 0;
-	int pack_kept_objects = 0;
-
-	repo_config_get_bool(r, "repack.packkeptobjects", &pack_kept_objects);
-
-	for (i = 0; i < m->num_packs; i++) {
-		if (prepare_midx_pack(r, m, i))
-			continue;
-		if (!pack_kept_objects && m->packs[i]->pack_keep)
-			continue;
-		if (m->packs[i]->is_cruft)
-			continue;
-
-		include_pack[i] = 1;
-		count++;
-	}
-
-	return count < 2;
-}
-
-static int fill_included_packs_batch(struct repository *r,
-				     struct multi_pack_index *m,
-				     unsigned char *include_pack,
-				     size_t batch_size)
-{
-	uint32_t i, packs_to_repack;
-	size_t total_size;
-	struct repack_info *pack_info;
-	int pack_kept_objects = 0;
-
-	CALLOC_ARRAY(pack_info, m->num_packs);
-
-	repo_config_get_bool(r, "repack.packkeptobjects", &pack_kept_objects);
-
-	for (i = 0; i < m->num_packs; i++) {
-		pack_info[i].pack_int_id = i;
-
-		if (prepare_midx_pack(r, m, i))
-			continue;
-
-		pack_info[i].mtime = m->packs[i]->mtime;
-	}
-
-	for (i = 0; i < m->num_objects; i++) {
-		uint32_t pack_int_id = nth_midxed_pack_int_id(m, i);
-		pack_info[pack_int_id].referenced_objects++;
-	}
-
-	QSORT(pack_info, m->num_packs, compare_by_mtime);
-
-	total_size = 0;
-	packs_to_repack = 0;
-	for (i = 0; total_size < batch_size && i < m->num_packs; i++) {
-		int pack_int_id = pack_info[i].pack_int_id;
-		struct packed_git *p = m->packs[pack_int_id];
-		size_t expected_size;
-
-		if (!p)
-			continue;
-		if (!pack_kept_objects && p->pack_keep)
-			continue;
-		if (p->is_cruft)
-			continue;
-		if (open_pack_index(p) || !p->num_objects)
-			continue;
-
-		expected_size = st_mult(p->pack_size,
-					pack_info[i].referenced_objects);
-		expected_size /= p->num_objects;
-
-		if (expected_size >= batch_size)
-			continue;
-
-		packs_to_repack++;
-		total_size += expected_size;
-		include_pack[pack_int_id] = 1;
-	}
-
-	free(pack_info);
-
-	if (packs_to_repack < 2)
-		return 1;
-
-	return 0;
-}
-
-int midx_repack(struct repository *r, const char *object_dir, size_t batch_size, unsigned flags)
-{
-	int result = 0;
-	uint32_t i;
-	unsigned char *include_pack;
-	struct child_process cmd = CHILD_PROCESS_INIT;
-	FILE *cmd_in;
-	struct multi_pack_index *m = lookup_multi_pack_index(r, object_dir);
-
-	/*
-	 * When updating the default for these configuration
-	 * variables in builtin/repack.c, these must be adjusted
-	 * to match.
-	 */
-	int delta_base_offset = 1;
-	int use_delta_islands = 0;
-
-	if (!m)
-		return 0;
-
-	CALLOC_ARRAY(include_pack, m->num_packs);
-
-	if (batch_size) {
-		if (fill_included_packs_batch(r, m, include_pack, batch_size))
-			goto cleanup;
-	} else if (fill_included_packs_all(r, m, include_pack))
-		goto cleanup;
-
-	repo_config_get_bool(r, "repack.usedeltabaseoffset", &delta_base_offset);
-	repo_config_get_bool(r, "repack.usedeltaislands", &use_delta_islands);
-
-	strvec_push(&cmd.args, "pack-objects");
-
-	strvec_pushf(&cmd.args, "%s/pack/pack", object_dir);
-
-	if (delta_base_offset)
-		strvec_push(&cmd.args, "--delta-base-offset");
-	if (use_delta_islands)
-		strvec_push(&cmd.args, "--delta-islands");
-
-	if (flags & MIDX_PROGRESS)
-		strvec_push(&cmd.args, "--progress");
-	else
-		strvec_push(&cmd.args, "-q");
-
-	cmd.git_cmd = 1;
-	cmd.in = cmd.out = -1;
-
-	if (start_command(&cmd)) {
-		error(_("could not start pack-objects"));
-		result = 1;
-		goto cleanup;
-	}
-
-	cmd_in = xfdopen(cmd.in, "w");
-
-	for (i = 0; i < m->num_objects; i++) {
-		struct object_id oid;
-		uint32_t pack_int_id = nth_midxed_pack_int_id(m, i);
-
-		if (!include_pack[pack_int_id])
-			continue;
-
-		nth_midxed_object_oid(&oid, m, i);
-		fprintf(cmd_in, "%s\n", oid_to_hex(&oid));
-	}
-	fclose(cmd_in);
-
-	if (finish_command(&cmd)) {
-		error(_("could not finish pack-objects"));
-		result = 1;
-		goto cleanup;
-	}
-
-	result = write_midx_internal(object_dir, NULL, NULL, NULL, NULL, flags);
-
-cleanup:
-	free(include_pack);
-	return result;
-}
diff --git a/midx.h b/midx.h
index b374a7afaf..dc477dff44 100644
--- a/midx.h
+++ b/midx.h
@@ -8,6 +8,25 @@ struct pack_entry;
 struct repository;
 struct bitmapped_pack;
 
+#define MIDX_SIGNATURE 0x4d494458 /* "MIDX" */
+#define MIDX_VERSION 1
+#define MIDX_BYTE_FILE_VERSION 4
+#define MIDX_BYTE_HASH_VERSION 5
+#define MIDX_BYTE_NUM_CHUNKS 6
+#define MIDX_BYTE_NUM_PACKS 8
+#define MIDX_HEADER_SIZE 12
+
+#define MIDX_CHUNK_ALIGNMENT 4
+#define MIDX_CHUNKID_PACKNAMES 0x504e414d /* "PNAM" */
+#define MIDX_CHUNKID_BITMAPPEDPACKS 0x42544d50 /* "BTMP" */
+#define MIDX_CHUNKID_OIDFANOUT 0x4f494446 /* "OIDF" */
+#define MIDX_CHUNKID_OIDLOOKUP 0x4f49444c /* "OIDL" */
+#define MIDX_CHUNKID_OBJECTOFFSETS 0x4f4f4646 /* "OOFF" */
+#define MIDX_CHUNKID_LARGEOFFSETS 0x4c4f4646 /* "LOFF" */
+#define MIDX_CHUNKID_REVINDEX 0x52494458 /* "RIDX" */
+#define MIDX_CHUNK_OFFSET_WIDTH (2 * sizeof(uint32_t))
+#define MIDX_LARGE_OFFSET_NEEDED 0x80000000
+
 #define GIT_TEST_MULTI_PACK_INDEX "GIT_TEST_MULTI_PACK_INDEX"
 #define GIT_TEST_MULTI_PACK_INDEX_WRITE_BITMAP \
 	"GIT_TEST_MULTI_PACK_INDEX_WRITE_BITMAP"
-- 
2.44.0.330.g158d2a670b4


^ permalink raw reply related	[flat|nested] 30+ messages in thread

* [PATCH v2 2/4] midx-write.c: factor out common want_included_pack() routine
  2024-04-01 21:16 ` [PATCH v2 0/4] " Taylor Blau
  2024-04-01 21:16   ` [PATCH v2 1/4] midx-write: move writing-related functions from midx.c Taylor Blau
@ 2024-04-01 21:16   ` Taylor Blau
  2024-04-02 11:47     ` Patrick Steinhardt
  2024-04-01 21:16   ` [PATCH v2 3/4] midx-write.c: check count of packs to repack after grouping Taylor Blau
  2024-04-01 21:16   ` [PATCH v2 4/4] midx-write.c: use `--stdin-packs` when repacking Taylor Blau
  3 siblings, 1 reply; 30+ messages in thread
From: Taylor Blau @ 2024-04-01 21:16 UTC (permalink / raw
  To: git; +Cc: Jeff King, Junio C Hamano, René Scharfe

When performing a 'git multi-pack-index repack', the MIDX machinery
tries to aggregate MIDX'd packs together either to (a) fill the given
`--batch-size` argument, or (b) combine all packs together.

In either case (using the `midx-write.c::fill_included_packs_batch()` or
`midx-write.c::fill_included_packs_all()` function, respectively), we
evaluate whether or not we want to repack each MIDX'd pack, according to
whether or it is loadable, kept, cruft, or non-empty.

Between the two `fill_included_packs_` callers, they both care about the
same conditions, except for `fill_included_packs_batch()` which also
cares that the pack is non-empty.

We could extract two functions (say, `want_included_pack()` and a
`_nonempty()` variant), but this is not necessary. For the case in
`fill_included_packs_all()` which does not check the pack size, we add
all of the pack's objects assuming that the pack meets all other
criteria. But if the pack is empty in the first place, we add all of its
zero objects, so whether or not we "accept" or "reject" it in the first
place is irrelevant.

This change improves the readability in both `fill_included_packs_`
functions.

Signed-off-by: Taylor Blau <me@ttaylorr.com>
---
 midx-write.c | 32 ++++++++++++++++++++------------
 1 file changed, 20 insertions(+), 12 deletions(-)

diff --git a/midx-write.c b/midx-write.c
index 5242d2a724..906efa2169 100644
--- a/midx-write.c
+++ b/midx-write.c
@@ -1349,6 +1349,24 @@ static int compare_by_mtime(const void *a_, const void *b_)
 	return 0;
 }
 
+static int want_included_pack(struct repository *r,
+			      struct multi_pack_index *m,
+			      int pack_kept_objects,
+			      uint32_t pack_int_id)
+{
+	struct packed_git *p;
+	if (prepare_midx_pack(r, m, pack_int_id))
+		return 0;
+	p = m->packs[pack_int_id];
+	if (!pack_kept_objects && p->pack_keep)
+		return 0;
+	if (p->is_cruft)
+		return 0;
+	if (open_pack_index(p) || !p->num_objects)
+		return 0;
+	return 1;
+}
+
 static int fill_included_packs_all(struct repository *r,
 				   struct multi_pack_index *m,
 				   unsigned char *include_pack)
@@ -1359,11 +1377,7 @@ static int fill_included_packs_all(struct repository *r,
 	repo_config_get_bool(r, "repack.packkeptobjects", &pack_kept_objects);
 
 	for (i = 0; i < m->num_packs; i++) {
-		if (prepare_midx_pack(r, m, i))
-			continue;
-		if (!pack_kept_objects && m->packs[i]->pack_keep)
-			continue;
-		if (m->packs[i]->is_cruft)
+		if (!want_included_pack(r, m, pack_kept_objects, i))
 			continue;
 
 		include_pack[i] = 1;
@@ -1410,13 +1424,7 @@ static int fill_included_packs_batch(struct repository *r,
 		struct packed_git *p = m->packs[pack_int_id];
 		size_t expected_size;
 
-		if (!p)
-			continue;
-		if (!pack_kept_objects && p->pack_keep)
-			continue;
-		if (p->is_cruft)
-			continue;
-		if (open_pack_index(p) || !p->num_objects)
+		if (!want_included_pack(r, m, pack_kept_objects, pack_int_id))
 			continue;
 
 		expected_size = st_mult(p->pack_size,
-- 
2.44.0.330.g158d2a670b4


^ permalink raw reply related	[flat|nested] 30+ messages in thread

* [PATCH v2 3/4] midx-write.c: check count of packs to repack after grouping
  2024-04-01 21:16 ` [PATCH v2 0/4] " Taylor Blau
  2024-04-01 21:16   ` [PATCH v2 1/4] midx-write: move writing-related functions from midx.c Taylor Blau
  2024-04-01 21:16   ` [PATCH v2 2/4] midx-write.c: factor out common want_included_pack() routine Taylor Blau
@ 2024-04-01 21:16   ` Taylor Blau
  2024-04-01 21:16   ` [PATCH v2 4/4] midx-write.c: use `--stdin-packs` when repacking Taylor Blau
  3 siblings, 0 replies; 30+ messages in thread
From: Taylor Blau @ 2024-04-01 21:16 UTC (permalink / raw
  To: git; +Cc: Jeff King, Junio C Hamano, René Scharfe

In both fill_included_packs_all() and fill_included_packs_batch(), we
accumulate a list of packs whose contents we want to repack together,
and then use that information to feed a list of objects as input to
pack-objects.

In both cases, the `fill_included_packs_` functions keep track of how
many packs they want to repack together, and only execute pack-objects
if there are at least two packs that need repacking.

Having both of these functions keep track of this information themselves
is not strictly necessary, since they also log which packs to repack via
the `include_pack` array, so we can simply count the non-zero entries in
that array after either function is done executing, reducing the overall
amount of code necessary.

Signed-off-by: Taylor Blau <me@ttaylorr.com>
---
 midx-write.c | 44 ++++++++++++++++++++------------------------
 1 file changed, 20 insertions(+), 24 deletions(-)

diff --git a/midx-write.c b/midx-write.c
index 906efa2169..960cc46250 100644
--- a/midx-write.c
+++ b/midx-write.c
@@ -1367,11 +1367,11 @@ static int want_included_pack(struct repository *r,
 	return 1;
 }
 
-static int fill_included_packs_all(struct repository *r,
-				   struct multi_pack_index *m,
-				   unsigned char *include_pack)
+static void fill_included_packs_all(struct repository *r,
+				    struct multi_pack_index *m,
+				    unsigned char *include_pack)
 {
-	uint32_t i, count = 0;
+	uint32_t i;
 	int pack_kept_objects = 0;
 
 	repo_config_get_bool(r, "repack.packkeptobjects", &pack_kept_objects);
@@ -1381,18 +1381,15 @@ static int fill_included_packs_all(struct repository *r,
 			continue;
 
 		include_pack[i] = 1;
-		count++;
 	}
-
-	return count < 2;
 }
 
-static int fill_included_packs_batch(struct repository *r,
-				     struct multi_pack_index *m,
-				     unsigned char *include_pack,
-				     size_t batch_size)
+static void fill_included_packs_batch(struct repository *r,
+				      struct multi_pack_index *m,
+				      unsigned char *include_pack,
+				      size_t batch_size)
 {
-	uint32_t i, packs_to_repack;
+	uint32_t i;
 	size_t total_size;
 	struct repack_info *pack_info;
 	int pack_kept_objects = 0;
@@ -1418,7 +1415,6 @@ static int fill_included_packs_batch(struct repository *r,
 	QSORT(pack_info, m->num_packs, compare_by_mtime);
 
 	total_size = 0;
-	packs_to_repack = 0;
 	for (i = 0; total_size < batch_size && i < m->num_packs; i++) {
 		int pack_int_id = pack_info[i].pack_int_id;
 		struct packed_git *p = m->packs[pack_int_id];
@@ -1434,23 +1430,17 @@ static int fill_included_packs_batch(struct repository *r,
 		if (expected_size >= batch_size)
 			continue;
 
-		packs_to_repack++;
 		total_size += expected_size;
 		include_pack[pack_int_id] = 1;
 	}
 
 	free(pack_info);
-
-	if (packs_to_repack < 2)
-		return 1;
-
-	return 0;
 }
 
 int midx_repack(struct repository *r, const char *object_dir, size_t batch_size, unsigned flags)
 {
 	int result = 0;
-	uint32_t i;
+	uint32_t i, packs_to_repack = 0;
 	unsigned char *include_pack;
 	struct child_process cmd = CHILD_PROCESS_INIT;
 	FILE *cmd_in;
@@ -1469,10 +1459,16 @@ int midx_repack(struct repository *r, const char *object_dir, size_t batch_size,
 
 	CALLOC_ARRAY(include_pack, m->num_packs);
 
-	if (batch_size) {
-		if (fill_included_packs_batch(r, m, include_pack, batch_size))
-			goto cleanup;
-	} else if (fill_included_packs_all(r, m, include_pack))
+	if (batch_size)
+		fill_included_packs_batch(r, m, include_pack, batch_size);
+	else
+		fill_included_packs_all(r, m, include_pack);
+
+	for (i = 0; i < m->num_packs; i++) {
+		if (include_pack[i])
+			packs_to_repack++;
+	}
+	if (packs_to_repack <= 1)
 		goto cleanup;
 
 	repo_config_get_bool(r, "repack.usedeltabaseoffset", &delta_base_offset);
-- 
2.44.0.330.g158d2a670b4


^ permalink raw reply related	[flat|nested] 30+ messages in thread

* [PATCH v2 4/4] midx-write.c: use `--stdin-packs` when repacking
  2024-04-01 21:16 ` [PATCH v2 0/4] " Taylor Blau
                     ` (2 preceding siblings ...)
  2024-04-01 21:16   ` [PATCH v2 3/4] midx-write.c: check count of packs to repack after grouping Taylor Blau
@ 2024-04-01 21:16   ` Taylor Blau
  2024-04-01 21:45     ` Junio C Hamano
  2024-04-02 11:47     ` Patrick Steinhardt
  3 siblings, 2 replies; 30+ messages in thread
From: Taylor Blau @ 2024-04-01 21:16 UTC (permalink / raw
  To: git; +Cc: Jeff King, Junio C Hamano, René Scharfe

When constructing a new pack `git multi-pack-index repack` provides a
list of objects which is the union of objects in all MIDX'd packs which
were "included" in the repack.

Though correct, this typically yields a poorly structured pack, since
providing the objects list over stdin does not give pack-objects a
chance to discover the namehash values for each object, leading to
sub-optimal delta selection.

We can use `--stdin-packs` instead, which has a couple of benefits:

  - it does a supplemental walk over objects in the supplied list of
    packs to discover their namehash, leading to higher-quality delta
    selection

  - it requires us to list far less data over stdin; instead of listing
    each object in the resulting pack, we need only list the
    constituent packs from which those objects were selected in the MIDX

Of course, this comes at a slight cost: though we save time on listing
packs versus objects over stdin[^1] (around ~650 milliseconds), we add a
non-trivial amount of time walking over the given objects in order to
find better deltas.

In general, this is likely to more closely match the user's expectations
(i.e. that packs generated via `git multi-pack-index repack` are written
with high-quality deltas). But if not, we can always introduce a new
option in pack-objects to disable the supplemental object walk, which
would yield a pure CPU-time savings, at the cost of the on-disk size of
the resulting pack.

[^1]: In a patched version of Git that doesn't perform the supplemental
  object walk in `pack-objects --stdin-packs`, we save around ~650ms
  (from 5.968 to 5.325 seconds) when running `git multi-pack-index
  repack --batch-size=0` on git.git with all objects packed, and all
  packs in a MIDX.

Signed-off-by: Taylor Blau <me@ttaylorr.com>
---
 midx-write.c | 18 +++++++++---------
 1 file changed, 9 insertions(+), 9 deletions(-)

diff --git a/midx-write.c b/midx-write.c
index 960cc46250..65e69d2de7 100644
--- a/midx-write.c
+++ b/midx-write.c
@@ -1474,7 +1474,8 @@ int midx_repack(struct repository *r, const char *object_dir, size_t batch_size,
 	repo_config_get_bool(r, "repack.usedeltabaseoffset", &delta_base_offset);
 	repo_config_get_bool(r, "repack.usedeltaislands", &use_delta_islands);
 
-	strvec_push(&cmd.args, "pack-objects");
+	strvec_pushl(&cmd.args, "pack-objects", "--stdin-packs", "--non-empty",
+		     NULL);
 
 	strvec_pushf(&cmd.args, "%s/pack/pack", object_dir);
 
@@ -1498,16 +1499,15 @@ int midx_repack(struct repository *r, const char *object_dir, size_t batch_size,
 	}
 
 	cmd_in = xfdopen(cmd.in, "w");
-
-	for (i = 0; i < m->num_objects; i++) {
-		struct object_id oid;
-		uint32_t pack_int_id = nth_midxed_pack_int_id(m, i);
-
-		if (!include_pack[pack_int_id])
+	for (i = 0; i < m->num_packs; i++) {
+		struct packed_git *p = m->packs[i];
+		if (!p)
 			continue;
 
-		nth_midxed_object_oid(&oid, m, i);
-		fprintf(cmd_in, "%s\n", oid_to_hex(&oid));
+		if (include_pack[i])
+			fprintf(cmd_in, "%s\n", pack_basename(p));
+		else
+			fprintf(cmd_in, "^%s\n", pack_basename(p));
 	}
 	fclose(cmd_in);
 
-- 
2.44.0.330.g158d2a670b4

^ permalink raw reply related	[flat|nested] 30+ messages in thread

* Re: [PATCH v2 4/4] midx-write.c: use `--stdin-packs` when repacking
  2024-04-01 21:16   ` [PATCH v2 4/4] midx-write.c: use `--stdin-packs` when repacking Taylor Blau
@ 2024-04-01 21:45     ` Junio C Hamano
  2024-04-02 11:47     ` Patrick Steinhardt
  1 sibling, 0 replies; 30+ messages in thread
From: Junio C Hamano @ 2024-04-01 21:45 UTC (permalink / raw
  To: Taylor Blau; +Cc: git, Jeff King, René Scharfe

Taylor Blau <me@ttaylorr.com> writes:

> When constructing a new pack `git multi-pack-index repack` provides a
> list of objects which is the union of objects in all MIDX'd packs which
> were "included" in the repack.
>
> Though correct, this typically yields a poorly structured pack, since
> providing the objects list over stdin does not give pack-objects a
> chance to discover the namehash values for each object, leading to
> sub-optimal delta selection.
>
> We can use `--stdin-packs` instead, which has a couple of benefits:
>
>   - it does a supplemental walk over objects in the supplied list of
>     packs to discover their namehash, leading to higher-quality delta
>     selection
>
>   - it requires us to list far less data over stdin; instead of listing
>     each object in the resulting pack, we need only list the
>     constituent packs from which those objects were selected in the MIDX
>
> Of course, this comes at a slight cost: though we save time on listing
> packs versus objects over stdin[^1] (around ~650 milliseconds), we add a
> non-trivial amount of time walking over the given objects in order to
> find better deltas.
>
> In general, this is likely to more closely match the user's expectations
> (i.e. that packs generated via `git multi-pack-index repack` are written
> with high-quality deltas). But if not, we can always introduce a new
> option in pack-objects to disable the supplemental object walk, which
> would yield a pure CPU-time savings, at the cost of the on-disk size of
> the resulting pack.
>
> [^1]: In a patched version of Git that doesn't perform the supplemental
>   object walk in `pack-objects --stdin-packs`, we save around ~650ms
>   (from 5.968 to 5.325 seconds) when running `git multi-pack-index
>   repack --batch-size=0` on git.git with all objects packed, and all
>   packs in a MIDX.

There are some measures in the mind of readers' who have read the
explanation so far.

 - So, this gives us a resulting pack with better delta selection.
   How much better would it get in a sample repository?  10%?  40%?

 - Of course, the better delta selection comes with cost.  How much
   more time do we spend?  20%?  150%?

 - As we do not enumerate all the object names, we save some time.
   Around 0.65 seconds in a sample repository.

I think among the three, the first two are more interesting numbers,
no?

I wonder if we can leverage the trick that reuses existing packdata
when we stream packs to feed the "git fetch" clients---we rely on
the fact that existing packs are tightly packed with good delta
selection, and using bitmap stream contiguous section(s) as much as
possible without disturbing the existing delta chain.  Wouldn't the
"we have many packs, let's repack them into one" workload benefit
the same way?

> -	strvec_push(&cmd.args, "pack-objects");
> +	strvec_pushl(&cmd.args, "pack-objects", "--stdin-packs", "--non-empty",
> +		     NULL);
>  
>  	strvec_pushf(&cmd.args, "%s/pack/pack", object_dir);
>  
> @@ -1498,16 +1499,15 @@ int midx_repack(struct repository *r, const char *object_dir, size_t batch_size,
>  	}
>  
>  	cmd_in = xfdopen(cmd.in, "w");
> -
> -	for (i = 0; i < m->num_objects; i++) {
> -		struct object_id oid;
> -		uint32_t pack_int_id = nth_midxed_pack_int_id(m, i);
> -
> -		if (!include_pack[pack_int_id])
> +	for (i = 0; i < m->num_packs; i++) {
> +		struct packed_git *p = m->packs[i];
> +		if (!p)
>  			continue;
>  
> -		nth_midxed_object_oid(&oid, m, i);
> -		fprintf(cmd_in, "%s\n", oid_to_hex(&oid));
> +		if (include_pack[i])
> +			fprintf(cmd_in, "%s\n", pack_basename(p));
> +		else
> +			fprintf(cmd_in, "^%s\n", pack_basename(p));
>  	}
>  	fclose(cmd_in);

Looking very straight-forward.  Will queue.

Thanks.

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH v2 2/4] midx-write.c: factor out common want_included_pack() routine
  2024-04-01 21:16   ` [PATCH v2 2/4] midx-write.c: factor out common want_included_pack() routine Taylor Blau
@ 2024-04-02 11:47     ` Patrick Steinhardt
  0 siblings, 0 replies; 30+ messages in thread
From: Patrick Steinhardt @ 2024-04-02 11:47 UTC (permalink / raw
  To: Taylor Blau; +Cc: git, Jeff King, Junio C Hamano, René Scharfe

[-- Attachment #1: Type: text/plain, Size: 3818 bytes --]

On Mon, Apr 01, 2024 at 05:16:38PM -0400, Taylor Blau wrote:
> When performing a 'git multi-pack-index repack', the MIDX machinery
> tries to aggregate MIDX'd packs together either to (a) fill the given
> `--batch-size` argument, or (b) combine all packs together.
> 
> In either case (using the `midx-write.c::fill_included_packs_batch()` or
> `midx-write.c::fill_included_packs_all()` function, respectively), we
> evaluate whether or not we want to repack each MIDX'd pack, according to
> whether or it is loadable, kept, cruft, or non-empty.
> 
> Between the two `fill_included_packs_` callers, they both care about the
> same conditions, except for `fill_included_packs_batch()` which also
> cares that the pack is non-empty.
> 
> We could extract two functions (say, `want_included_pack()` and a
> `_nonempty()` variant), but this is not necessary. For the case in
> `fill_included_packs_all()` which does not check the pack size, we add
> all of the pack's objects assuming that the pack meets all other
> criteria. But if the pack is empty in the first place, we add all of its
> zero objects, so whether or not we "accept" or "reject" it in the first
> place is irrelevant.
> 
> This change improves the readability in both `fill_included_packs_`
> functions.
> 
> Signed-off-by: Taylor Blau <me@ttaylorr.com>
> ---
>  midx-write.c | 32 ++++++++++++++++++++------------
>  1 file changed, 20 insertions(+), 12 deletions(-)
> 
> diff --git a/midx-write.c b/midx-write.c
> index 5242d2a724..906efa2169 100644
> --- a/midx-write.c
> +++ b/midx-write.c
> @@ -1349,6 +1349,24 @@ static int compare_by_mtime(const void *a_, const void *b_)
>  	return 0;
>  }
>  
> +static int want_included_pack(struct repository *r,
> +			      struct multi_pack_index *m,
> +			      int pack_kept_objects,
> +			      uint32_t pack_int_id)
> +{
> +	struct packed_git *p;
> +	if (prepare_midx_pack(r, m, pack_int_id))
> +		return 0;
> +	p = m->packs[pack_int_id];
> +	if (!pack_kept_objects && p->pack_keep)
> +		return 0;
> +	if (p->is_cruft)
> +		return 0;
> +	if (open_pack_index(p) || !p->num_objects)
> +		return 0;
> +	return 1;
> +}
> +
>  static int fill_included_packs_all(struct repository *r,
>  				   struct multi_pack_index *m,
>  				   unsigned char *include_pack)
> @@ -1359,11 +1377,7 @@ static int fill_included_packs_all(struct repository *r,
>  	repo_config_get_bool(r, "repack.packkeptobjects", &pack_kept_objects);
>  
>  	for (i = 0; i < m->num_packs; i++) {
> -		if (prepare_midx_pack(r, m, i))
> -			continue;
> -		if (!pack_kept_objects && m->packs[i]->pack_keep)
> -			continue;
> -		if (m->packs[i]->is_cruft)
> +		if (!want_included_pack(r, m, pack_kept_objects, i))
>  			continue;
>  
>  		include_pack[i] = 1;
> @@ -1410,13 +1424,7 @@ static int fill_included_packs_batch(struct repository *r,
>  		struct packed_git *p = m->packs[pack_int_id];
>  		size_t expected_size;

I was briefly wondering whether `m->packs[pack_int_id]` could change
after the above assignment. But there's a loop a bit further up here
that already calls `prepare_midx_pack()`, so this shouldn't happen.

> -		if (!p)
> -			continue;
> -		if (!pack_kept_objects && p->pack_keep)
> -			continue;
> -		if (p->is_cruft)
> -			continue;
> -		if (open_pack_index(p) || !p->num_objects)
> +		if (!want_included_pack(r, m, pack_kept_objects, pack_int_id))
>  			continue;

Another thing I wondered was whether it hurts performance that we now
call `prepare_midx_pack()` twice. But this shouldn't matter either as we
exit early from that function in case `m->packs[pack_int_id]` is already
populated.

So this patch looks good to me.

Patrick

>  		expected_size = st_mult(p->pack_size,
> -- 
> 2.44.0.330.g158d2a670b4
> 
> 

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH v2 4/4] midx-write.c: use `--stdin-packs` when repacking
  2024-04-01 21:16   ` [PATCH v2 4/4] midx-write.c: use `--stdin-packs` when repacking Taylor Blau
  2024-04-01 21:45     ` Junio C Hamano
@ 2024-04-02 11:47     ` Patrick Steinhardt
  1 sibling, 0 replies; 30+ messages in thread
From: Patrick Steinhardt @ 2024-04-02 11:47 UTC (permalink / raw
  To: Taylor Blau; +Cc: git, Jeff King, Junio C Hamano, René Scharfe

[-- Attachment #1: Type: text/plain, Size: 1310 bytes --]

On Mon, Apr 01, 2024 at 05:16:44PM -0400, Taylor Blau wrote:
[snip]
> @@ -1498,16 +1499,15 @@ int midx_repack(struct repository *r, const char *object_dir, size_t batch_size,
>  	}
>  
>  	cmd_in = xfdopen(cmd.in, "w");
> -
> -	for (i = 0; i < m->num_objects; i++) {
> -		struct object_id oid;
> -		uint32_t pack_int_id = nth_midxed_pack_int_id(m, i);
> -
> -		if (!include_pack[pack_int_id])
> +	for (i = 0; i < m->num_packs; i++) {
> +		struct packed_git *p = m->packs[i];
> +		if (!p)
>  			continue;
>  
> -		nth_midxed_object_oid(&oid, m, i);
> -		fprintf(cmd_in, "%s\n", oid_to_hex(&oid));
> +		if (include_pack[i])
> +			fprintf(cmd_in, "%s\n", pack_basename(p));
> +		else
> +			fprintf(cmd_in, "^%s\n", pack_basename(p));
>  	}

The patch looks straight-forward to me overall, but this last part
confused me a bit. Why do we explicitly exclude those packs instead of
merely skipping over them? Is this so that we don't repack objects which
are part of both included and not-included packs?

I was mostly wondering whether it can ever happen here that we decide to
not include a pack because all of its objects are already contained in
other packs. In that case, explicitly excluding it would lead to repo
corruption if we deleted such a not-included pack.

Patrick

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

^ permalink raw reply	[flat|nested] 30+ messages in thread

end of thread, other threads:[~2024-04-02 11:47 UTC | newest]

Thread overview: 30+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2024-03-25 17:24 [PATCH 00/11] midx: split MIDX writing routines into midx-write.c, cleanup Taylor Blau
2024-03-25 17:24 ` [PATCH 01/11] midx-write: initial commit Taylor Blau
2024-03-25 20:30   ` Junio C Hamano
2024-03-25 22:09     ` Taylor Blau
2024-03-25 17:24 ` [PATCH 02/11] midx: extern a pair of shared functions Taylor Blau
2024-03-25 17:24 ` [PATCH 03/11] midx: move `midx_repack` (and related functions) to midx-write.c Taylor Blau
2024-03-25 17:24 ` [PATCH 04/11] midx: move `expire_midx_packs` " Taylor Blau
2024-03-25 17:24 ` [PATCH 05/11] midx: move `write_midx_file_only` " Taylor Blau
2024-03-25 17:24 ` [PATCH 06/11] midx: move `write_midx_file` " Taylor Blau
2024-03-25 17:24 ` [PATCH 07/11] midx: move `write_midx_internal` (and related functions) " Taylor Blau
2024-03-25 17:24 ` [PATCH 08/11] midx-write.c: avoid directly managed temporary strbuf Taylor Blau
2024-03-25 20:33   ` Junio C Hamano
2024-03-25 22:11     ` Taylor Blau
2024-03-25 17:24 ` [PATCH 09/11] midx-write.c: factor out common want_included_pack() routine Taylor Blau
2024-03-25 20:36   ` Junio C Hamano
2024-03-27  8:29   ` Jeff King
2024-03-25 17:24 ` [PATCH 10/11] midx-write.c: check count of packs to repack after grouping Taylor Blau
2024-03-25 20:41   ` Junio C Hamano
2024-03-25 22:11     ` Taylor Blau
2024-03-25 17:24 ` [PATCH 11/11] midx-write.c: use `--stdin-packs` when repacking Taylor Blau
2024-03-27  8:37   ` Jeff King
2024-03-27  8:39 ` [PATCH 00/11] midx: split MIDX writing routines into midx-write.c, cleanup Jeff King
2024-04-01 21:16 ` [PATCH v2 0/4] " Taylor Blau
2024-04-01 21:16   ` [PATCH v2 1/4] midx-write: move writing-related functions from midx.c Taylor Blau
2024-04-01 21:16   ` [PATCH v2 2/4] midx-write.c: factor out common want_included_pack() routine Taylor Blau
2024-04-02 11:47     ` Patrick Steinhardt
2024-04-01 21:16   ` [PATCH v2 3/4] midx-write.c: check count of packs to repack after grouping Taylor Blau
2024-04-01 21:16   ` [PATCH v2 4/4] midx-write.c: use `--stdin-packs` when repacking Taylor Blau
2024-04-01 21:45     ` Junio C Hamano
2024-04-02 11:47     ` Patrick Steinhardt

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).