* [PATCH 0/8] repack: refactor pack snapshot-ing logic
@ 2023-09-05 20:36 Taylor Blau
2023-09-05 20:36 ` [PATCH 1/8] builtin/repack.c: extract structure to store existing packs Taylor Blau
` (9 more replies)
0 siblings, 10 replies; 39+ messages in thread
From: Taylor Blau @ 2023-09-05 20:36 UTC (permalink / raw)
To: git; +Cc: Junio C Hamano, Jeff King, Patrick Steinhardt
This series refactors some of the 'git repack' internals to keep track
of existing packs in a set of string lists stored in a single structure,
instead of passing around multiple string lists to different functions
throughout the various paths.
The result is that the interface around pack deletion, marking packs as
redundant, and handling the set of pre-existing packs (both kept and
non-kept) is significantly cleaner without introducing any functional
changes.
I didn't mean to produce so much churn when I started writing these
patches, which began as a simple effort to rename a couple of variables
for more consistency.
Thanks in advance for your review!
Taylor Blau (8):
builtin/repack.c: extract structure to store existing packs
builtin/repack.c: extract marking packs for deletion
builtin/repack.c: extract redundant pack cleanup for --geometric
builtin/repack.c: extract redundant pack cleanup for existing packs
builtin/repack.c: extract `has_existing_non_kept_packs()`
builtin/repack.c: store existing cruft packs separately
builtin/repack.c: drop `DELETE_PACK` macro
builtin/repack.c: extract common cruft pack loop
builtin/repack.c | 288 ++++++++++++++++++++++++++++-------------------
1 file changed, 174 insertions(+), 114 deletions(-)
--
2.42.0.119.gca7d13e7bf
^ permalink raw reply [flat|nested] 39+ messages in thread
* [PATCH 1/8] builtin/repack.c: extract structure to store existing packs
2023-09-05 20:36 [PATCH 0/8] repack: refactor pack snapshot-ing logic Taylor Blau
@ 2023-09-05 20:36 ` Taylor Blau
2023-09-07 7:54 ` Jeff King
2023-09-05 20:36 ` [PATCH 2/8] builtin/repack.c: extract marking packs for deletion Taylor Blau
` (8 subsequent siblings)
9 siblings, 1 reply; 39+ messages in thread
From: Taylor Blau @ 2023-09-05 20:36 UTC (permalink / raw)
To: git; +Cc: Junio C Hamano, Jeff King, Patrick Steinhardt
The repack machinery needs to keep track of which packfiles were present
in the repository at the beginning of a repack, segmented by whether or
not each pack is marked as kept.
The names of these packs are stored in two `string_list`s, corresponding
to kept- and non-kept packs, respectively. As a consequence, many
functions within the repack code need to take both `string_list`s as
arguments, leading to code like this:
ret = write_cruft_pack(&cruft_po_args, packtmp, pack_prefix,
cruft_expiration, &names,
&existing_nonkept_packs, /* <- */
&existing_kept_packs); /* <- */
Wrap up this pair of `string_list`s into a single structure that stores
both. This saves us from having to pass both string lists separately,
and prepares for adding additional fields to this structure.
Signed-off-by: Taylor Blau <me@ttaylorr.com>
---
builtin/repack.c | 90 ++++++++++++++++++++++++++----------------------
1 file changed, 49 insertions(+), 41 deletions(-)
diff --git a/builtin/repack.c b/builtin/repack.c
index 2b43a5be08..c3ab89912e 100644
--- a/builtin/repack.c
+++ b/builtin/repack.c
@@ -95,14 +95,29 @@ static int repack_config(const char *var, const char *value,
return git_default_config(var, value, ctx, cb);
}
+struct existing_packs {
+ struct string_list kept_packs;
+ struct string_list non_kept_packs;
+};
+
+#define EXISTING_PACKS_INIT { \
+ .kept_packs = STRING_LIST_INIT_DUP, \
+ .non_kept_packs = STRING_LIST_INIT_DUP, \
+}
+
+static void existing_packs_release(struct existing_packs *existing)
+{
+ string_list_clear(&existing->kept_packs, 0);
+ string_list_clear(&existing->non_kept_packs, 0);
+}
+
/*
- * Adds all packs hex strings (pack-$HASH) to either fname_nonkept_list
- * or fname_kept_list based on whether each pack has a corresponding
+ * Adds all packs hex strings (pack-$HASH) to either packs->non_kept
+ * or packs->kept based on whether each pack has a corresponding
* .keep file or not. Packs without a .keep file are not to be kept
* if we are going to pack everything into one file.
*/
-static void collect_pack_filenames(struct string_list *fname_nonkept_list,
- struct string_list *fname_kept_list,
+static void collect_pack_filenames(struct existing_packs *existing,
const struct string_list *extra_keep)
{
struct packed_git *p;
@@ -126,16 +141,16 @@ static void collect_pack_filenames(struct string_list *fname_nonkept_list,
strbuf_strip_suffix(&buf, ".pack");
if ((extra_keep->nr > 0 && i < extra_keep->nr) || p->pack_keep)
- string_list_append(fname_kept_list, buf.buf);
+ string_list_append(&existing->kept_packs, buf.buf);
else {
struct string_list_item *item;
- item = string_list_append(fname_nonkept_list, buf.buf);
+ item = string_list_append(&existing->non_kept_packs, buf.buf);
if (p->is_cruft)
item->util = (void*)(uintptr_t)CRUFT_PACK;
}
}
- string_list_sort(fname_kept_list);
+ string_list_sort(&existing->kept_packs);
strbuf_release(&buf);
}
@@ -327,7 +342,7 @@ static int geometry_cmp(const void *va, const void *vb)
}
static void init_pack_geometry(struct pack_geometry *geometry,
- struct string_list *existing_kept_packs,
+ struct existing_packs *existing,
const struct pack_objects_args *args)
{
struct packed_git *p;
@@ -344,23 +359,24 @@ static void init_pack_geometry(struct pack_geometry *geometry,
if (!pack_kept_objects) {
/*
- * Any pack that has its pack_keep bit set will appear
- * in existing_kept_packs below, but this saves us from
- * doing a more expensive check.
+ * Any pack that has its pack_keep bit set will
+ * appear in existing->kept_packs below, but
+ * this saves us from doing a more expensive
+ * check.
*/
if (p->pack_keep)
continue;
/*
- * The pack may be kept via the --keep-pack option;
- * check 'existing_kept_packs' to determine whether to
- * ignore it.
+ * The pack may be kept via the --keep-pack
+ * option; check 'existing->kept_packs' to
+ * determine whether to ignore it.
*/
strbuf_reset(&buf);
strbuf_addstr(&buf, pack_basename(p));
strbuf_strip_suffix(&buf, ".pack");
- if (string_list_has_string(existing_kept_packs, buf.buf))
+ if (string_list_has_string(&existing->kept_packs, buf.buf))
continue;
}
if (p->is_cruft)
@@ -565,14 +581,13 @@ static void midx_snapshot_refs(struct tempfile *f)
}
static void midx_included_packs(struct string_list *include,
- struct string_list *existing_nonkept_packs,
- struct string_list *existing_kept_packs,
+ struct existing_packs *existing,
struct string_list *names,
struct pack_geometry *geometry)
{
struct string_list_item *item;
- for_each_string_list_item(item, existing_kept_packs)
+ for_each_string_list_item(item, &existing->kept_packs)
string_list_insert(include, xstrfmt("%s.idx", item->string));
for_each_string_list_item(item, names)
string_list_insert(include, xstrfmt("pack-%s.idx", item->string));
@@ -600,7 +615,7 @@ static void midx_included_packs(struct string_list *include,
string_list_insert(include, strbuf_detach(&buf, NULL));
}
- for_each_string_list_item(item, existing_nonkept_packs) {
+ for_each_string_list_item(item, &existing->non_kept_packs) {
if (!((uintptr_t)item->util & CRUFT_PACK)) {
/*
* no need to check DELETE_PACK, since we're not
@@ -611,7 +626,7 @@ static void midx_included_packs(struct string_list *include,
string_list_insert(include, xstrfmt("%s.idx", item->string));
}
} else {
- for_each_string_list_item(item, existing_nonkept_packs) {
+ for_each_string_list_item(item, &existing->non_kept_packs) {
if ((uintptr_t)item->util & DELETE_PACK)
continue;
string_list_insert(include, xstrfmt("%s.idx", item->string));
@@ -700,8 +715,7 @@ static int write_cruft_pack(const struct pack_objects_args *args,
const char *pack_prefix,
const char *cruft_expiration,
struct string_list *names,
- struct string_list *existing_packs,
- struct string_list *existing_kept_packs)
+ struct existing_packs *existing)
{
struct child_process cmd = CHILD_PROCESS_INIT;
struct strbuf line = STRBUF_INIT;
@@ -744,9 +758,9 @@ static int write_cruft_pack(const struct pack_objects_args *args,
in = xfdopen(cmd.in, "w");
for_each_string_list_item(item, names)
fprintf(in, "%s-%s.pack\n", pack_prefix, item->string);
- for_each_string_list_item(item, existing_packs)
+ for_each_string_list_item(item, &existing->non_kept_packs)
fprintf(in, "-%s.pack\n", item->string);
- for_each_string_list_item(item, existing_kept_packs)
+ for_each_string_list_item(item, &existing->kept_packs)
fprintf(in, "%s.pack\n", item->string);
fclose(in);
@@ -778,8 +792,7 @@ int cmd_repack(int argc, const char **argv, const char *prefix)
struct child_process cmd = CHILD_PROCESS_INIT;
struct string_list_item *item;
struct string_list names = STRING_LIST_INIT_DUP;
- struct string_list existing_nonkept_packs = STRING_LIST_INIT_DUP;
- struct string_list existing_kept_packs = STRING_LIST_INIT_DUP;
+ struct existing_packs existing = EXISTING_PACKS_INIT;
struct pack_geometry geometry = { 0 };
struct strbuf line = STRBUF_INIT;
struct tempfile *refs_snapshot = NULL;
@@ -915,13 +928,12 @@ int cmd_repack(int argc, const char **argv, const char *prefix)
packtmp_name = xstrfmt(".tmp-%d-pack", (int)getpid());
packtmp = mkpathdup("%s/%s", packdir, packtmp_name);
- collect_pack_filenames(&existing_nonkept_packs, &existing_kept_packs,
- &keep_pack_list);
+ collect_pack_filenames(&existing, &keep_pack_list);
if (geometry.split_factor) {
if (pack_everything)
die(_("options '%s' and '%s' cannot be used together"), "--geometric", "-A/-a");
- init_pack_geometry(&geometry, &existing_kept_packs, &po_args);
+ init_pack_geometry(&geometry, &existing, &po_args);
split_pack_geometry(&geometry);
}
@@ -965,7 +977,7 @@ int cmd_repack(int argc, const char **argv, const char *prefix)
if (pack_everything & ALL_INTO_ONE) {
repack_promisor_objects(&po_args, &names);
- if (existing_nonkept_packs.nr && delete_redundant &&
+ if (existing.non_kept_packs.nr && delete_redundant &&
!(pack_everything & PACK_CRUFT)) {
for_each_string_list_item(item, &names) {
strvec_pushf(&cmd.args, "--keep-pack=%s-%s.pack",
@@ -1054,8 +1066,7 @@ int cmd_repack(int argc, const char **argv, const char *prefix)
ret = write_cruft_pack(&cruft_po_args, packtmp, pack_prefix,
cruft_expiration, &names,
- &existing_nonkept_packs,
- &existing_kept_packs);
+ &existing);
if (ret)
goto cleanup;
@@ -1086,8 +1097,7 @@ int cmd_repack(int argc, const char **argv, const char *prefix)
pack_prefix,
NULL,
&names,
- &existing_nonkept_packs,
- &existing_kept_packs);
+ &existing);
if (ret)
goto cleanup;
}
@@ -1133,7 +1143,7 @@ int cmd_repack(int argc, const char **argv, const char *prefix)
if (delete_redundant && pack_everything & ALL_INTO_ONE) {
const int hexsz = the_hash_algo->hexsz;
- for_each_string_list_item(item, &existing_nonkept_packs) {
+ for_each_string_list_item(item, &existing.non_kept_packs) {
char *sha1;
size_t len = strlen(item->string);
if (len < hexsz)
@@ -1152,8 +1162,7 @@ int cmd_repack(int argc, const char **argv, const char *prefix)
if (write_midx) {
struct string_list include = STRING_LIST_INIT_NODUP;
- midx_included_packs(&include, &existing_nonkept_packs,
- &existing_kept_packs, &names, &geometry);
+ midx_included_packs(&include, &existing, &names, &geometry);
ret = write_midx_included_packs(&include, &geometry,
refs_snapshot ? get_tempfile_path(refs_snapshot) : NULL,
@@ -1172,7 +1181,7 @@ int cmd_repack(int argc, const char **argv, const char *prefix)
if (delete_redundant) {
int opts = 0;
- for_each_string_list_item(item, &existing_nonkept_packs) {
+ for_each_string_list_item(item, &existing.non_kept_packs) {
if (!((uintptr_t)item->util & DELETE_PACK))
continue;
remove_redundant_pack(packdir, item->string);
@@ -1193,7 +1202,7 @@ int cmd_repack(int argc, const char **argv, const char *prefix)
strbuf_strip_suffix(&buf, ".pack");
if ((p->pack_keep) ||
- (string_list_has_string(&existing_kept_packs,
+ (string_list_has_string(&existing.kept_packs,
buf.buf)))
continue;
@@ -1224,8 +1233,7 @@ int cmd_repack(int argc, const char **argv, const char *prefix)
cleanup:
string_list_clear(&names, 1);
- string_list_clear(&existing_nonkept_packs, 0);
- string_list_clear(&existing_kept_packs, 0);
+ existing_packs_release(&existing);
free_pack_geometry(&geometry);
return ret;
--
2.42.0.119.gca7d13e7bf
^ permalink raw reply related [flat|nested] 39+ messages in thread
* [PATCH 2/8] builtin/repack.c: extract marking packs for deletion
2023-09-05 20:36 [PATCH 0/8] repack: refactor pack snapshot-ing logic Taylor Blau
2023-09-05 20:36 ` [PATCH 1/8] builtin/repack.c: extract structure to store existing packs Taylor Blau
@ 2023-09-05 20:36 ` Taylor Blau
2023-09-07 7:59 ` Jeff King
2023-09-05 20:36 ` [PATCH 3/8] builtin/repack.c: extract redundant pack cleanup for --geometric Taylor Blau
` (7 subsequent siblings)
9 siblings, 1 reply; 39+ messages in thread
From: Taylor Blau @ 2023-09-05 20:36 UTC (permalink / raw)
To: git; +Cc: Junio C Hamano, Jeff King, Patrick Steinhardt
At the end of a repack (when given `-d`), Git attempts to remove any
packs which have been made "redundant" as a result of the repacking
operation. For example, an all-into-one (`-A` or `-a`) repack makes
every pre-existing pack which is not marked as kept redundant. Geometric
repacks (with `--geometric=<n>`) make any packs which were rolled up
redundant, and so on.
But before deleting the set of packs we think are redundant, we first
check to see whether or not we just wrote a pack which is identical to
any one of the packs we were going to delete. When this is the case, Git
must avoid deleting that pack, since it matches a pack we just wrote
(so deleting it may cause the repository to become corrupt).
Right now we only process the list of non-kept packs in a single pass.
But a future change will split the existing non-kept packs further into
two lists: one for cruft packs, and another for non-cruft packs.
Factor out this routine to prepare for calling it twice on two separate
lists in a future patch.
Signed-off-by: Taylor Blau <me@ttaylorr.com>
---
builtin/repack.c | 50 +++++++++++++++++++++++++++++++-----------------
1 file changed, 32 insertions(+), 18 deletions(-)
diff --git a/builtin/repack.c b/builtin/repack.c
index c3ab89912e..708556836e 100644
--- a/builtin/repack.c
+++ b/builtin/repack.c
@@ -105,6 +105,36 @@ struct existing_packs {
.non_kept_packs = STRING_LIST_INIT_DUP, \
}
+static void mark_packs_for_deletion_1(struct string_list *names,
+ struct string_list *list)
+{
+ struct string_list_item *item;
+ const int hexsz = the_hash_algo->hexsz;
+
+ for_each_string_list_item(item, list) {
+ char *sha1;
+ size_t len = strlen(item->string);
+ if (len < hexsz)
+ continue;
+ sha1 = item->string + len - hexsz;
+ /*
+ * Mark this pack for deletion, which ensures that this
+ * pack won't be included in a MIDX (if `--write-midx`
+ * was given) and that we will actually delete this pack
+ * (if `-d` was given).
+ */
+ if (!string_list_has_string(names, sha1))
+ item->util = (void*)(uintptr_t)((size_t)item->util | DELETE_PACK);
+ }
+}
+
+static void mark_packs_for_deletion(struct existing_packs *existing,
+ struct string_list *names)
+
+{
+ mark_packs_for_deletion_1(names, &existing->non_kept_packs);
+}
+
static void existing_packs_release(struct existing_packs *existing)
{
string_list_clear(&existing->kept_packs, 0);
@@ -1141,24 +1171,8 @@ int cmd_repack(int argc, const char **argv, const char *prefix)
}
/* End of pack replacement. */
- if (delete_redundant && pack_everything & ALL_INTO_ONE) {
- const int hexsz = the_hash_algo->hexsz;
- for_each_string_list_item(item, &existing.non_kept_packs) {
- char *sha1;
- size_t len = strlen(item->string);
- if (len < hexsz)
- continue;
- sha1 = item->string + len - hexsz;
- /*
- * Mark this pack for deletion, which ensures that this
- * pack won't be included in a MIDX (if `--write-midx`
- * was given) and that we will actually delete this pack
- * (if `-d` was given).
- */
- if (!string_list_has_string(&names, sha1))
- item->util = (void*)(uintptr_t)((size_t)item->util | DELETE_PACK);
- }
- }
+ if (delete_redundant && pack_everything & ALL_INTO_ONE)
+ mark_packs_for_deletion(&existing, &names);
if (write_midx) {
struct string_list include = STRING_LIST_INIT_NODUP;
--
2.42.0.119.gca7d13e7bf
^ permalink raw reply related [flat|nested] 39+ messages in thread
* [PATCH 3/8] builtin/repack.c: extract redundant pack cleanup for --geometric
2023-09-05 20:36 [PATCH 0/8] repack: refactor pack snapshot-ing logic Taylor Blau
2023-09-05 20:36 ` [PATCH 1/8] builtin/repack.c: extract structure to store existing packs Taylor Blau
2023-09-05 20:36 ` [PATCH 2/8] builtin/repack.c: extract marking packs for deletion Taylor Blau
@ 2023-09-05 20:36 ` Taylor Blau
2023-09-07 8:00 ` Jeff King
2023-09-05 20:36 ` [PATCH 4/8] builtin/repack.c: extract redundant pack cleanup for existing packs Taylor Blau
` (6 subsequent siblings)
9 siblings, 1 reply; 39+ messages in thread
From: Taylor Blau @ 2023-09-05 20:36 UTC (permalink / raw)
To: git; +Cc: Junio C Hamano, Jeff King, Patrick Steinhardt
To reduce the complexity of the already quite-long `cmd_repack()`
implementation, extract out the parts responsible for deleting redundant
packs from a geometric repack out into its own sub-routine.
Signed-off-by: Taylor Blau <me@ttaylorr.com>
---
builtin/repack.c | 52 +++++++++++++++++++++++++++---------------------
1 file changed, 29 insertions(+), 23 deletions(-)
diff --git a/builtin/repack.c b/builtin/repack.c
index 708556836e..d3e6326bb9 100644
--- a/builtin/repack.c
+++ b/builtin/repack.c
@@ -540,6 +540,32 @@ static struct packed_git *get_preferred_pack(struct pack_geometry *geometry)
return NULL;
}
+static void geometry_remove_redundant_packs(struct pack_geometry *geometry,
+ struct string_list *names,
+ struct existing_packs *existing)
+{
+ struct strbuf buf = STRBUF_INIT;
+ uint32_t i;
+
+ for (i = 0; i < geometry->split; i++) {
+ struct packed_git *p = geometry->pack[i];
+ if (string_list_has_string(names, hash_to_hex(p->hash)))
+ continue;
+
+ strbuf_reset(&buf);
+ strbuf_addstr(&buf, pack_basename(p));
+ strbuf_strip_suffix(&buf, ".pack");
+
+ if ((p->pack_keep) ||
+ (string_list_has_string(&existing->kept_packs, buf.buf)))
+ continue;
+
+ remove_redundant_pack(packdir, buf.buf);
+ }
+
+ strbuf_release(&buf);
+}
+
static void free_pack_geometry(struct pack_geometry *geometry)
{
if (!geometry)
@@ -1201,29 +1227,9 @@ int cmd_repack(int argc, const char **argv, const char *prefix)
remove_redundant_pack(packdir, item->string);
}
- if (geometry.split_factor) {
- struct strbuf buf = STRBUF_INIT;
-
- uint32_t i;
- for (i = 0; i < geometry.split; i++) {
- struct packed_git *p = geometry.pack[i];
- if (string_list_has_string(&names,
- hash_to_hex(p->hash)))
- continue;
-
- strbuf_reset(&buf);
- strbuf_addstr(&buf, pack_basename(p));
- strbuf_strip_suffix(&buf, ".pack");
-
- if ((p->pack_keep) ||
- (string_list_has_string(&existing.kept_packs,
- buf.buf)))
- continue;
-
- remove_redundant_pack(packdir, buf.buf);
- }
- strbuf_release(&buf);
- }
+ if (geometry.split_factor)
+ geometry_remove_redundant_packs(&geometry, &names,
+ &existing);
if (show_progress)
opts |= PRUNE_PACKED_VERBOSE;
prune_packed_objects(opts);
--
2.42.0.119.gca7d13e7bf
^ permalink raw reply related [flat|nested] 39+ messages in thread
* [PATCH 4/8] builtin/repack.c: extract redundant pack cleanup for existing packs
2023-09-05 20:36 [PATCH 0/8] repack: refactor pack snapshot-ing logic Taylor Blau
` (2 preceding siblings ...)
2023-09-05 20:36 ` [PATCH 3/8] builtin/repack.c: extract redundant pack cleanup for --geometric Taylor Blau
@ 2023-09-05 20:36 ` Taylor Blau
2023-09-07 8:02 ` Jeff King
2023-09-05 20:36 ` [PATCH 5/8] builtin/repack.c: extract `has_existing_non_kept_packs()` Taylor Blau
` (5 subsequent siblings)
9 siblings, 1 reply; 39+ messages in thread
From: Taylor Blau @ 2023-09-05 20:36 UTC (permalink / raw)
To: git; +Cc: Junio C Hamano, Jeff King, Patrick Steinhardt
To remove redundant packs at the end of a repacking operation, Git uses
its `remove_redundant_pack()` function in a loop over the set of
pre-existing, non-kept packs.
In a later commit, we will split this list into two, one for
pre-existing cruft pack(s), and another for non-cruft pack(s). Prepare
for this by factoring out the routine to loop over and delete redundant
packs into its own function.
Instead of calling `remove_redundant_pack()` directly, we now will call
`remove_redundant_existing_packs()`, which itself dispatches a call to
`remove_redundant_packs_1()`. Note that the geometric repacking code
will still call `remove_redundant_pack()` directly, but see the previous
commit for more details.
Having `remove_redundant_packs_1()` exist as a separate function may
seem like overkill in this patch. However, a later patch will call
`remove_redundant_packs_1()` once over two separate lists, so this
refactoring sets us up for that.
Signed-off-by: Taylor Blau <me@ttaylorr.com>
---
builtin/repack.c | 45 ++++++++++++++++++++++++++++-----------------
1 file changed, 28 insertions(+), 17 deletions(-)
diff --git a/builtin/repack.c b/builtin/repack.c
index d3e6326bb9..f6717e334c 100644
--- a/builtin/repack.c
+++ b/builtin/repack.c
@@ -135,6 +135,33 @@ static void mark_packs_for_deletion(struct existing_packs *existing,
mark_packs_for_deletion_1(names, &existing->non_kept_packs);
}
+static void remove_redundant_pack(const char *dir_name, const char *base_name)
+{
+ struct strbuf buf = STRBUF_INIT;
+ struct multi_pack_index *m = get_local_multi_pack_index(the_repository);
+ strbuf_addf(&buf, "%s.pack", base_name);
+ if (m && midx_contains_pack(m, buf.buf))
+ clear_midx_file(the_repository);
+ strbuf_insertf(&buf, 0, "%s/", dir_name);
+ unlink_pack_path(buf.buf, 1);
+ strbuf_release(&buf);
+}
+
+static void remove_redundant_packs_1(struct string_list *packs)
+{
+ struct string_list_item *item;
+ for_each_string_list_item(item, packs) {
+ if (!((uintptr_t)item->util & DELETE_PACK))
+ continue;
+ remove_redundant_pack(packdir, item->string);
+ }
+}
+
+static void remove_redundant_existing_packs(struct existing_packs *existing)
+{
+ remove_redundant_packs_1(&existing->non_kept_packs);
+}
+
static void existing_packs_release(struct existing_packs *existing)
{
string_list_clear(&existing->kept_packs, 0);
@@ -184,18 +211,6 @@ static void collect_pack_filenames(struct existing_packs *existing,
strbuf_release(&buf);
}
-static void remove_redundant_pack(const char *dir_name, const char *base_name)
-{
- struct strbuf buf = STRBUF_INIT;
- struct multi_pack_index *m = get_local_multi_pack_index(the_repository);
- strbuf_addf(&buf, "%s.pack", base_name);
- if (m && midx_contains_pack(m, buf.buf))
- clear_midx_file(the_repository);
- strbuf_insertf(&buf, 0, "%s/", dir_name);
- unlink_pack_path(buf.buf, 1);
- strbuf_release(&buf);
-}
-
static void prepare_pack_objects(struct child_process *cmd,
const struct pack_objects_args *args,
const char *out)
@@ -1221,11 +1236,7 @@ int cmd_repack(int argc, const char **argv, const char *prefix)
if (delete_redundant) {
int opts = 0;
- for_each_string_list_item(item, &existing.non_kept_packs) {
- if (!((uintptr_t)item->util & DELETE_PACK))
- continue;
- remove_redundant_pack(packdir, item->string);
- }
+ remove_redundant_existing_packs(&existing);
if (geometry.split_factor)
geometry_remove_redundant_packs(&geometry, &names,
--
2.42.0.119.gca7d13e7bf
^ permalink raw reply related [flat|nested] 39+ messages in thread
* [PATCH 5/8] builtin/repack.c: extract `has_existing_non_kept_packs()`
2023-09-05 20:36 [PATCH 0/8] repack: refactor pack snapshot-ing logic Taylor Blau
` (3 preceding siblings ...)
2023-09-05 20:36 ` [PATCH 4/8] builtin/repack.c: extract redundant pack cleanup for existing packs Taylor Blau
@ 2023-09-05 20:36 ` Taylor Blau
2023-09-07 8:04 ` Jeff King
2023-09-05 20:36 ` [PATCH 6/8] builtin/repack.c: store existing cruft packs separately Taylor Blau
` (4 subsequent siblings)
9 siblings, 1 reply; 39+ messages in thread
From: Taylor Blau @ 2023-09-05 20:36 UTC (permalink / raw)
To: git; +Cc: Junio C Hamano, Jeff King, Patrick Steinhardt
When there is:
- at least one pre-existing packfile (which is not marked as kept),
- repacking with the `-d` flag, and
- not doing a cruft repack
, then we pass a handful of additional options to the inner
`pack-objects` process, like `--unpack-unreachable`,
`--keep-unreachable`, and `--pack-loose-unreachable`, in addition to
marking any packs we just wrote for promisor remotes as kept in-core
(with `--keep-pack`, as opposed to the presence of a ".keep" file on
disk).
Because we store both cruft and non-cruft packs together in the same
`existing.non_kept_packs` list, it suffices to check its `nr` member to
see if it is zero or not.
But a following change will store cruft- and non-cruft packs separately,
meaning this check would break as a result. Prepare for this by
extracting this part of the check into a new helper function called
`has_existing_non_kept_packs()`.
This patch does not introduce any functional changes, but prepares us to
make a more isolated change in a subsequent patch.
Signed-off-by: Taylor Blau <me@ttaylorr.com>
---
builtin/repack.c | 8 +++++++-
1 file changed, 7 insertions(+), 1 deletion(-)
diff --git a/builtin/repack.c b/builtin/repack.c
index f6717e334c..3f0789ff89 100644
--- a/builtin/repack.c
+++ b/builtin/repack.c
@@ -105,6 +105,11 @@ struct existing_packs {
.non_kept_packs = STRING_LIST_INIT_DUP, \
}
+static int has_existing_non_kept_packs(const struct existing_packs *existing)
+{
+ return existing->non_kept_packs.nr;
+}
+
static void mark_packs_for_deletion_1(struct string_list *names,
struct string_list *list)
{
@@ -1048,7 +1053,8 @@ int cmd_repack(int argc, const char **argv, const char *prefix)
if (pack_everything & ALL_INTO_ONE) {
repack_promisor_objects(&po_args, &names);
- if (existing.non_kept_packs.nr && delete_redundant &&
+ if (has_existing_non_kept_packs(&existing) &&
+ delete_redundant &&
!(pack_everything & PACK_CRUFT)) {
for_each_string_list_item(item, &names) {
strvec_pushf(&cmd.args, "--keep-pack=%s-%s.pack",
--
2.42.0.119.gca7d13e7bf
^ permalink raw reply related [flat|nested] 39+ messages in thread
* [PATCH 6/8] builtin/repack.c: store existing cruft packs separately
2023-09-05 20:36 [PATCH 0/8] repack: refactor pack snapshot-ing logic Taylor Blau
` (4 preceding siblings ...)
2023-09-05 20:36 ` [PATCH 5/8] builtin/repack.c: extract `has_existing_non_kept_packs()` Taylor Blau
@ 2023-09-05 20:36 ` Taylor Blau
2023-09-07 8:09 ` Jeff King
2023-09-05 20:36 ` [PATCH 7/8] builtin/repack.c: drop `DELETE_PACK` macro Taylor Blau
` (3 subsequent siblings)
9 siblings, 1 reply; 39+ messages in thread
From: Taylor Blau @ 2023-09-05 20:36 UTC (permalink / raw)
To: git; +Cc: Junio C Hamano, Jeff King, Patrick Steinhardt
When repacking with the `--write-midx` option, we invoke the function
`midx_included_packs()` in order to produce the list of packs we want to
include in the resulting MIDX.
This list is comprised of:
- existing .keep packs
- any pack(s) which were written earlier in the same process
- any unchanged packs when doing a `--geometric` repack
- any cruft packs
Prior to this patch, we stored pre-existing cruft and non-cruft packs
together (provided those packs are non-kept). This meant we needed an
additional bit to indicate which non-kept pack(s) were cruft versus
those that aren't.
But alternatively we can store cruft packs in a separate list, avoiding
the need for this extra bit, and simplifying the code below.
Signed-off-by: Taylor Blau <me@ttaylorr.com>
---
builtin/repack.c | 39 +++++++++++++++++++++++----------------
1 file changed, 23 insertions(+), 16 deletions(-)
diff --git a/builtin/repack.c b/builtin/repack.c
index 3f0789ff89..478fab96c9 100644
--- a/builtin/repack.c
+++ b/builtin/repack.c
@@ -27,7 +27,6 @@
#define PACK_CRUFT 4
#define DELETE_PACK 1
-#define CRUFT_PACK 2
static int pack_everything;
static int delta_base_offset = 1;
@@ -98,16 +97,18 @@ static int repack_config(const char *var, const char *value,
struct existing_packs {
struct string_list kept_packs;
struct string_list non_kept_packs;
+ struct string_list cruft_packs;
};
#define EXISTING_PACKS_INIT { \
.kept_packs = STRING_LIST_INIT_DUP, \
.non_kept_packs = STRING_LIST_INIT_DUP, \
+ .cruft_packs = STRING_LIST_INIT_DUP, \
}
static int has_existing_non_kept_packs(const struct existing_packs *existing)
{
- return existing->non_kept_packs.nr;
+ return existing->non_kept_packs.nr || existing->cruft_packs.nr;
}
static void mark_packs_for_deletion_1(struct string_list *names,
@@ -138,6 +139,7 @@ static void mark_packs_for_deletion(struct existing_packs *existing,
{
mark_packs_for_deletion_1(names, &existing->non_kept_packs);
+ mark_packs_for_deletion_1(names, &existing->cruft_packs);
}
static void remove_redundant_pack(const char *dir_name, const char *base_name)
@@ -165,12 +167,14 @@ static void remove_redundant_packs_1(struct string_list *packs)
static void remove_redundant_existing_packs(struct existing_packs *existing)
{
remove_redundant_packs_1(&existing->non_kept_packs);
+ remove_redundant_packs_1(&existing->cruft_packs);
}
static void existing_packs_release(struct existing_packs *existing)
{
string_list_clear(&existing->kept_packs, 0);
string_list_clear(&existing->non_kept_packs, 0);
+ string_list_clear(&existing->cruft_packs, 0);
}
/*
@@ -204,12 +208,10 @@ static void collect_pack_filenames(struct existing_packs *existing,
if ((extra_keep->nr > 0 && i < extra_keep->nr) || p->pack_keep)
string_list_append(&existing->kept_packs, buf.buf);
- else {
- struct string_list_item *item;
- item = string_list_append(&existing->non_kept_packs, buf.buf);
- if (p->is_cruft)
- item->util = (void*)(uintptr_t)CRUFT_PACK;
- }
+ else if (p->is_cruft)
+ string_list_append(&existing->cruft_packs, buf.buf);
+ else
+ string_list_append(&existing->non_kept_packs, buf.buf);
}
string_list_sort(&existing->kept_packs);
@@ -691,14 +693,11 @@ static void midx_included_packs(struct string_list *include,
string_list_insert(include, strbuf_detach(&buf, NULL));
}
- for_each_string_list_item(item, &existing->non_kept_packs) {
- if (!((uintptr_t)item->util & CRUFT_PACK)) {
- /*
- * no need to check DELETE_PACK, since we're not
- * doing an ALL_INTO_ONE repack
- */
- continue;
- }
+ for_each_string_list_item(item, &existing->cruft_packs) {
+ /*
+ * no need to check DELETE_PACK, since we're not
+ * doing an ALL_INTO_ONE repack
+ */
string_list_insert(include, xstrfmt("%s.idx", item->string));
}
} else {
@@ -707,6 +706,12 @@ static void midx_included_packs(struct string_list *include,
continue;
string_list_insert(include, xstrfmt("%s.idx", item->string));
}
+
+ for_each_string_list_item(item, &existing->cruft_packs) {
+ if ((uintptr_t)item->util & DELETE_PACK)
+ continue;
+ string_list_insert(include, xstrfmt("%s.idx", item->string));
+ }
}
}
@@ -836,6 +841,8 @@ static int write_cruft_pack(const struct pack_objects_args *args,
fprintf(in, "%s-%s.pack\n", pack_prefix, item->string);
for_each_string_list_item(item, &existing->non_kept_packs)
fprintf(in, "-%s.pack\n", item->string);
+ for_each_string_list_item(item, &existing->cruft_packs)
+ fprintf(in, "-%s.pack\n", item->string);
for_each_string_list_item(item, &existing->kept_packs)
fprintf(in, "%s.pack\n", item->string);
fclose(in);
--
2.42.0.119.gca7d13e7bf
^ permalink raw reply related [flat|nested] 39+ messages in thread
* [PATCH 7/8] builtin/repack.c: drop `DELETE_PACK` macro
2023-09-05 20:36 [PATCH 0/8] repack: refactor pack snapshot-ing logic Taylor Blau
` (5 preceding siblings ...)
2023-09-05 20:36 ` [PATCH 6/8] builtin/repack.c: store existing cruft packs separately Taylor Blau
@ 2023-09-05 20:36 ` Taylor Blau
2023-09-06 17:05 ` Junio C Hamano
2023-09-05 20:37 ` [PATCH 8/8] builtin/repack.c: extract common cruft pack loop Taylor Blau
` (2 subsequent siblings)
9 siblings, 1 reply; 39+ messages in thread
From: Taylor Blau @ 2023-09-05 20:36 UTC (permalink / raw)
To: git; +Cc: Junio C Hamano, Jeff King, Patrick Steinhardt
Signed-off-by: Taylor Blau <me@ttaylorr.com>
---
builtin/repack.c | 18 ++++++++++--------
1 file changed, 10 insertions(+), 8 deletions(-)
diff --git a/builtin/repack.c b/builtin/repack.c
index 478fab96c9..6110598a69 100644
--- a/builtin/repack.c
+++ b/builtin/repack.c
@@ -26,8 +26,6 @@
#define LOOSEN_UNREACHABLE 2
#define PACK_CRUFT 4
-#define DELETE_PACK 1
-
static int pack_everything;
static int delta_base_offset = 1;
static int pack_kept_objects = -1;
@@ -96,6 +94,10 @@ static int repack_config(const char *var, const char *value,
struct existing_packs {
struct string_list kept_packs;
+ /*
+ * for both non_kept_packs, and cruft_packs, a non-NULL
+ * 'util' field indicates the pack should be deleted.
+ */
struct string_list non_kept_packs;
struct string_list cruft_packs;
};
@@ -130,7 +132,7 @@ static void mark_packs_for_deletion_1(struct string_list *names,
* (if `-d` was given).
*/
if (!string_list_has_string(names, sha1))
- item->util = (void*)(uintptr_t)((size_t)item->util | DELETE_PACK);
+ item->util = (void*)1;
}
}
@@ -158,7 +160,7 @@ static void remove_redundant_packs_1(struct string_list *packs)
{
struct string_list_item *item;
for_each_string_list_item(item, packs) {
- if (!((uintptr_t)item->util & DELETE_PACK))
+ if (!item->util)
continue;
remove_redundant_pack(packdir, item->string);
}
@@ -695,20 +697,20 @@ static void midx_included_packs(struct string_list *include,
for_each_string_list_item(item, &existing->cruft_packs) {
/*
- * no need to check DELETE_PACK, since we're not
- * doing an ALL_INTO_ONE repack
+ * no need to check for deleted packs, since we're
+ * not doing an ALL_INTO_ONE repack
*/
string_list_insert(include, xstrfmt("%s.idx", item->string));
}
} else {
for_each_string_list_item(item, &existing->non_kept_packs) {
- if ((uintptr_t)item->util & DELETE_PACK)
+ if (item->util)
continue;
string_list_insert(include, xstrfmt("%s.idx", item->string));
}
for_each_string_list_item(item, &existing->cruft_packs) {
- if ((uintptr_t)item->util & DELETE_PACK)
+ if (item->util)
continue;
string_list_insert(include, xstrfmt("%s.idx", item->string));
}
--
2.42.0.119.gca7d13e7bf
^ permalink raw reply related [flat|nested] 39+ messages in thread
* [PATCH 8/8] builtin/repack.c: extract common cruft pack loop
2023-09-05 20:36 [PATCH 0/8] repack: refactor pack snapshot-ing logic Taylor Blau
` (6 preceding siblings ...)
2023-09-05 20:36 ` [PATCH 7/8] builtin/repack.c: drop `DELETE_PACK` macro Taylor Blau
@ 2023-09-05 20:37 ` Taylor Blau
2023-09-07 9:00 ` [PATCH 0/8] repack: refactor pack snapshot-ing logic Jeff King
2023-09-13 19:17 ` [PATCH v2 " Taylor Blau
9 siblings, 0 replies; 39+ messages in thread
From: Taylor Blau @ 2023-09-05 20:37 UTC (permalink / raw)
To: git; +Cc: Junio C Hamano, Jeff King, Patrick Steinhardt
When generating the list of packs to store in a MIDX (when given the
`--write-midx` option), we include any cruft packs both during
--geometric and non-geometric repacks.
But the rules for when we do and don't have to check whether any of
those cruft packs were queued for deletion differ slightly between the
two cases.
But the two can be unified, provided there is a little bit of extra
detail added in the comment to clarify when it is safe to avoid checking
for any pending deletions (and why it is OK to do so even when not
required).
Signed-off-by: Taylor Blau <me@ttaylorr.com>
---
builtin/repack.c | 30 ++++++++++++++++++------------
1 file changed, 18 insertions(+), 12 deletions(-)
diff --git a/builtin/repack.c b/builtin/repack.c
index 6110598a69..e6a1ef9a09 100644
--- a/builtin/repack.c
+++ b/builtin/repack.c
@@ -695,25 +695,31 @@ static void midx_included_packs(struct string_list *include,
string_list_insert(include, strbuf_detach(&buf, NULL));
}
- for_each_string_list_item(item, &existing->cruft_packs) {
- /*
- * no need to check for deleted packs, since we're
- * not doing an ALL_INTO_ONE repack
- */
- string_list_insert(include, xstrfmt("%s.idx", item->string));
- }
} else {
for_each_string_list_item(item, &existing->non_kept_packs) {
if (item->util)
continue;
string_list_insert(include, xstrfmt("%s.idx", item->string));
}
+ }
- for_each_string_list_item(item, &existing->cruft_packs) {
- if (item->util)
- continue;
- string_list_insert(include, xstrfmt("%s.idx", item->string));
- }
+ for_each_string_list_item(item, &existing->cruft_packs) {
+ /*
+ * When doing a --geometric repack, there is no need to check
+ * for deleted packs, since we're by definition not doing an
+ * ALL_INTO_ONE repack (hence no packs will be deleted).
+ * Otherwise we must check for and exclude any packs which are
+ * enqueued for deletion.
+ *
+ * So we could omit the conditional below in the --geometric
+ * case, but doing so is unnecessary since no packs are marked
+ * as pending deletion (since we only call
+ * `mark_packs_for_deletion()` when doing an all-into-one
+ * repack).
+ */
+ if (item->util)
+ continue;
+ string_list_insert(include, xstrfmt("%s.idx", item->string));
}
}
--
2.42.0.119.gca7d13e7bf
^ permalink raw reply related [flat|nested] 39+ messages in thread
* Re: [PATCH 7/8] builtin/repack.c: drop `DELETE_PACK` macro
2023-09-05 20:36 ` [PATCH 7/8] builtin/repack.c: drop `DELETE_PACK` macro Taylor Blau
@ 2023-09-06 17:05 ` Junio C Hamano
2023-09-06 17:22 ` Taylor Blau
0 siblings, 1 reply; 39+ messages in thread
From: Junio C Hamano @ 2023-09-06 17:05 UTC (permalink / raw)
To: Taylor Blau; +Cc: git, Jeff King, Patrick Steinhardt
Taylor Blau <me@ttaylorr.com> writes:
> Signed-off-by: Taylor Blau <me@ttaylorr.com>
> ---
> builtin/repack.c | 18 ++++++++++--------
> 1 file changed, 10 insertions(+), 8 deletions(-)
The reason being...?
> @@ -130,7 +132,7 @@ static void mark_packs_for_deletion_1(struct string_list *names,
> * (if `-d` was given).
> */
> if (!string_list_has_string(names, sha1))
> - item->util = (void*)(uintptr_t)((size_t)item->util | DELETE_PACK);
> + item->util = (void*)1;
Presumably the literal "1" is promoted to an appropriate type
(uintptr_t) here?
We originally planned to use the .util member to store a bitset
to represent various attributes for each pack, and defined
DELETE_PACK as one of the possible bits, but over the N years of
its existence, we never found the need for the second bit.
or something?
It is not a bad idea to demote .util from a set of bits to just a
"is it NULL?" boolean, but updating the above to something like
#define DELETE_PACK ((void*)(uintptr_t)1)
item->util = DELETE_PACK;
may still reap the same simplification benefit without introducing
the "what is that _one_ about? can it be _two_ without changing the
meaning or what?" puzzlement.
Other than that, everything else in the series looked quite
straight-forward and sensible.
Thanks.
^ permalink raw reply [flat|nested] 39+ messages in thread
* Re: [PATCH 7/8] builtin/repack.c: drop `DELETE_PACK` macro
2023-09-06 17:05 ` Junio C Hamano
@ 2023-09-06 17:22 ` Taylor Blau
2023-09-07 8:19 ` Patrick Steinhardt
2023-09-07 8:58 ` Jeff King
0 siblings, 2 replies; 39+ messages in thread
From: Taylor Blau @ 2023-09-06 17:22 UTC (permalink / raw)
To: Junio C Hamano; +Cc: git, Jeff King, Patrick Steinhardt
On Wed, Sep 06, 2023 at 10:05:37AM -0700, Junio C Hamano wrote:
> Taylor Blau <me@ttaylorr.com> writes:
>
> > Signed-off-by: Taylor Blau <me@ttaylorr.com>
> > ---
> > builtin/repack.c | 18 ++++++++++--------
> > 1 file changed, 10 insertions(+), 8 deletions(-)
>
> The reason being...?
Wow, I have no idea how this got sent out without a commit message! At
least it was signed off ;-).
Here's a replacement that I cooked up, with your Helped-by. Let me know
if you want to replace this patch manually, or if you'd prefer a reroll
of the series. Either is fine with me! :-)
--- 8< ---
Subject: [PATCH] builtin/repack.c: treat string_list_item util as booleans
The `->util` field corresponding to each string_list_item used to track
the existence of some pack at the beginning of a repack operation was
originally intended to be used as a bitfield.
This bitfield tracked:
- (1 << 0): whether or not the pack should be deleted
- (1 << 1): whether or not the pack is cruft
The previous commit removed the use of the second bit, meaning that we
can treat this field as a boolean instead of a bitset.
Helped-by: Junio C Hamano <gitster@pobox.com>
Signed-off-by: Taylor Blau <me@ttaylorr.com>
---
builtin/repack.c | 18 +++++++++++-------
1 file changed, 11 insertions(+), 7 deletions(-)
diff --git a/builtin/repack.c b/builtin/repack.c
index 478fab96c9..575e69b020 100644
--- a/builtin/repack.c
+++ b/builtin/repack.c
@@ -26,7 +26,7 @@
#define LOOSEN_UNREACHABLE 2
#define PACK_CRUFT 4
-#define DELETE_PACK 1
+#define DELETE_PACK ((void*)(uintptr_t)1)
static int pack_everything;
static int delta_base_offset = 1;
@@ -96,6 +96,10 @@ static int repack_config(const char *var, const char *value,
struct existing_packs {
struct string_list kept_packs;
+ /*
+ * for both non_kept_packs, and cruft_packs, a non-NULL
+ * 'util' field indicates the pack should be deleted.
+ */
struct string_list non_kept_packs;
struct string_list cruft_packs;
};
@@ -130,7 +134,7 @@ static void mark_packs_for_deletion_1(struct string_list *names,
* (if `-d` was given).
*/
if (!string_list_has_string(names, sha1))
- item->util = (void*)(uintptr_t)((size_t)item->util | DELETE_PACK);
+ item->util = DELETE_PACK;
}
}
@@ -158,7 +162,7 @@ static void remove_redundant_packs_1(struct string_list *packs)
{
struct string_list_item *item;
for_each_string_list_item(item, packs) {
- if (!((uintptr_t)item->util & DELETE_PACK))
+ if (!item->util)
continue;
remove_redundant_pack(packdir, item->string);
}
@@ -695,20 +699,20 @@ static void midx_included_packs(struct string_list *include,
for_each_string_list_item(item, &existing->cruft_packs) {
/*
- * no need to check DELETE_PACK, since we're not
- * doing an ALL_INTO_ONE repack
+ * no need to check for deleted packs, since we're
+ * not doing an ALL_INTO_ONE repack
*/
string_list_insert(include, xstrfmt("%s.idx", item->string));
}
} else {
for_each_string_list_item(item, &existing->non_kept_packs) {
- if ((uintptr_t)item->util & DELETE_PACK)
+ if (item->util)
continue;
string_list_insert(include, xstrfmt("%s.idx", item->string));
}
for_each_string_list_item(item, &existing->cruft_packs) {
- if ((uintptr_t)item->util & DELETE_PACK)
+ if (item->util)
continue;
string_list_insert(include, xstrfmt("%s.idx", item->string));
}
--
2.38.0.16.g393fd4c6db
--- >8 ---
Thanks,
Taylor
^ permalink raw reply related [flat|nested] 39+ messages in thread
* Re: [PATCH 1/8] builtin/repack.c: extract structure to store existing packs
2023-09-05 20:36 ` [PATCH 1/8] builtin/repack.c: extract structure to store existing packs Taylor Blau
@ 2023-09-07 7:54 ` Jeff King
0 siblings, 0 replies; 39+ messages in thread
From: Jeff King @ 2023-09-07 7:54 UTC (permalink / raw)
To: Taylor Blau; +Cc: git, Junio C Hamano, Patrick Steinhardt
On Tue, Sep 05, 2023 at 04:36:40PM -0400, Taylor Blau wrote:
> The repack machinery needs to keep track of which packfiles were present
> in the repository at the beginning of a repack, segmented by whether or
> not each pack is marked as kept.
>
> The names of these packs are stored in two `string_list`s, corresponding
> to kept- and non-kept packs, respectively. As a consequence, many
> functions within the repack code need to take both `string_list`s as
> arguments, leading to code like this:
>
> ret = write_cruft_pack(&cruft_po_args, packtmp, pack_prefix,
> cruft_expiration, &names,
> &existing_nonkept_packs, /* <- */
> &existing_kept_packs); /* <- */
>
> Wrap up this pair of `string_list`s into a single structure that stores
> both. This saves us from having to pass both string lists separately,
> and prepares for adding additional fields to this structure.
Makes sense. Even without any additional fields, the grouping makes the
code a bit easier to follow.
Patch is noisy, but looks correct. :)
-Peff
^ permalink raw reply [flat|nested] 39+ messages in thread
* Re: [PATCH 2/8] builtin/repack.c: extract marking packs for deletion
2023-09-05 20:36 ` [PATCH 2/8] builtin/repack.c: extract marking packs for deletion Taylor Blau
@ 2023-09-07 7:59 ` Jeff King
2023-09-07 22:10 ` Taylor Blau
0 siblings, 1 reply; 39+ messages in thread
From: Jeff King @ 2023-09-07 7:59 UTC (permalink / raw)
To: Taylor Blau; +Cc: git, Junio C Hamano, Patrick Steinhardt
On Tue, Sep 05, 2023 at 04:36:43PM -0400, Taylor Blau wrote:
> At the end of a repack (when given `-d`), Git attempts to remove any
> packs which have been made "redundant" as a result of the repacking
> operation. For example, an all-into-one (`-A` or `-a`) repack makes
> every pre-existing pack which is not marked as kept redundant. Geometric
> repacks (with `--geometric=<n>`) make any packs which were rolled up
> redundant, and so on.
>
> But before deleting the set of packs we think are redundant, we first
> check to see whether or not we just wrote a pack which is identical to
> any one of the packs we were going to delete. When this is the case, Git
> must avoid deleting that pack, since it matches a pack we just wrote
> (so deleting it may cause the repository to become corrupt).
>
> Right now we only process the list of non-kept packs in a single pass.
> But a future change will split the existing non-kept packs further into
> two lists: one for cruft packs, and another for non-cruft packs.
>
> Factor out this routine to prepare for calling it twice on two separate
> lists in a future patch.
There are really two "factor outs" here: we pull the code from
cmd_repack() into a helper, and then the helper is also just a thin
wrapper around its "_1" variant. That latter part isn't needed yet, but
I can guess from your description that we'll eventually have the main
function dispatch to the "_1" helper for lists.
The main caller in cmd_repack() could do that double-dispatch, but then
this code:
> + if (delete_redundant && pack_everything & ALL_INTO_ONE)
> + mark_packs_for_deletion(&existing, &names);
would know more about how "existing_packs" work than it needs to. So
this seems like a good split (and the two-liner above is making
cmd_repack() much more readable than the big loop it had in the
pre-image).
-Peff
^ permalink raw reply [flat|nested] 39+ messages in thread
* Re: [PATCH 3/8] builtin/repack.c: extract redundant pack cleanup for --geometric
2023-09-05 20:36 ` [PATCH 3/8] builtin/repack.c: extract redundant pack cleanup for --geometric Taylor Blau
@ 2023-09-07 8:00 ` Jeff King
0 siblings, 0 replies; 39+ messages in thread
From: Jeff King @ 2023-09-07 8:00 UTC (permalink / raw)
To: Taylor Blau; +Cc: git, Junio C Hamano, Patrick Steinhardt
On Tue, Sep 05, 2023 at 04:36:46PM -0400, Taylor Blau wrote:
> To reduce the complexity of the already quite-long `cmd_repack()`
> implementation, extract out the parts responsible for deleting redundant
> packs from a geometric repack out into its own sub-routine.
Makes sense. Again, I'm happy to see some of these large functional
pieces of cmd_repack() moved into sub-functions. Even if we never call
them twice, IMHO it is a readability improvement to give them meaningful
names.
-Peff
^ permalink raw reply [flat|nested] 39+ messages in thread
* Re: [PATCH 4/8] builtin/repack.c: extract redundant pack cleanup for existing packs
2023-09-05 20:36 ` [PATCH 4/8] builtin/repack.c: extract redundant pack cleanup for existing packs Taylor Blau
@ 2023-09-07 8:02 ` Jeff King
0 siblings, 0 replies; 39+ messages in thread
From: Jeff King @ 2023-09-07 8:02 UTC (permalink / raw)
To: Taylor Blau; +Cc: git, Junio C Hamano, Patrick Steinhardt
On Tue, Sep 05, 2023 at 04:36:48PM -0400, Taylor Blau wrote:
> To remove redundant packs at the end of a repacking operation, Git uses
> its `remove_redundant_pack()` function in a loop over the set of
> pre-existing, non-kept packs.
>
> In a later commit, we will split this list into two, one for
> pre-existing cruft pack(s), and another for non-cruft pack(s). Prepare
> for this by factoring out the routine to loop over and delete redundant
> packs into its own function.
>
> Instead of calling `remove_redundant_pack()` directly, we now will call
> `remove_redundant_existing_packs()`, which itself dispatches a call to
> `remove_redundant_packs_1()`. Note that the geometric repacking code
> will still call `remove_redundant_pack()` directly, but see the previous
> commit for more details.
>
> Having `remove_redundant_packs_1()` exist as a separate function may
> seem like overkill in this patch. However, a later patch will call
> `remove_redundant_packs_1()` once over two separate lists, so this
> refactoring sets us up for that.
Heh, so this is basically the same "_1" case discussed in the earlier
patch. This commit message explains the split a bit better, IMHO. :)
(Not worth re-rolling; just thinking out loud).
-Peff
^ permalink raw reply [flat|nested] 39+ messages in thread
* Re: [PATCH 5/8] builtin/repack.c: extract `has_existing_non_kept_packs()`
2023-09-05 20:36 ` [PATCH 5/8] builtin/repack.c: extract `has_existing_non_kept_packs()` Taylor Blau
@ 2023-09-07 8:04 ` Jeff King
0 siblings, 0 replies; 39+ messages in thread
From: Jeff King @ 2023-09-07 8:04 UTC (permalink / raw)
To: Taylor Blau; +Cc: git, Junio C Hamano, Patrick Steinhardt
On Tue, Sep 05, 2023 at 04:36:51PM -0400, Taylor Blau wrote:
> But a following change will store cruft- and non-cruft packs separately,
> meaning this check would break as a result. Prepare for this by
> extracting this part of the check into a new helper function called
> `has_existing_non_kept_packs()`.
>
> This patch does not introduce any functional changes, but prepares us to
> make a more isolated change in a subsequent patch.
OK, that makes sense.
-Peff
^ permalink raw reply [flat|nested] 39+ messages in thread
* Re: [PATCH 6/8] builtin/repack.c: store existing cruft packs separately
2023-09-05 20:36 ` [PATCH 6/8] builtin/repack.c: store existing cruft packs separately Taylor Blau
@ 2023-09-07 8:09 ` Jeff King
0 siblings, 0 replies; 39+ messages in thread
From: Jeff King @ 2023-09-07 8:09 UTC (permalink / raw)
To: Taylor Blau; +Cc: git, Junio C Hamano, Patrick Steinhardt
On Tue, Sep 05, 2023 at 04:36:54PM -0400, Taylor Blau wrote:
> When repacking with the `--write-midx` option, we invoke the function
> `midx_included_packs()` in order to produce the list of packs we want to
> include in the resulting MIDX.
>
> This list is comprised of:
>
> - existing .keep packs
> - any pack(s) which were written earlier in the same process
> - any unchanged packs when doing a `--geometric` repack
> - any cruft packs
>
> Prior to this patch, we stored pre-existing cruft and non-cruft packs
> together (provided those packs are non-kept). This meant we needed an
> additional bit to indicate which non-kept pack(s) were cruft versus
> those that aren't.
>
> But alternatively we can store cruft packs in a separate list, avoiding
> the need for this extra bit, and simplifying the code below.
OK. Getting rid of the extra bit is nice. We shorten code that only
cares about cruft packs like:
for each pack
if (pack is cruft)
...
to just:
for each cruft_pack
...
which is good. But the flip side is that any existing code which looks
at the combined list now has to do:
for each pack
...
for each cruft_pack
...
I think there's just one case of that, here:
> @@ -707,6 +706,12 @@ static void midx_included_packs(struct string_list *include,
> continue;
> string_list_insert(include, xstrfmt("%s.idx", item->string));
> }
> +
> + for_each_string_list_item(item, &existing->cruft_packs) {
> + if ((uintptr_t)item->util & DELETE_PACK)
> + continue;
> + string_list_insert(include, xstrfmt("%s.idx", item->string));
> + }
> }
> }
It may be an OK price to pay if this lets us keep cleaning things up
(especially if we could get rid of that util casting entirely!). Let's
read on...
-Peff
^ permalink raw reply [flat|nested] 39+ messages in thread
* Re: [PATCH 7/8] builtin/repack.c: drop `DELETE_PACK` macro
2023-09-06 17:22 ` Taylor Blau
@ 2023-09-07 8:19 ` Patrick Steinhardt
2023-09-07 18:16 ` Junio C Hamano
2023-09-07 8:58 ` Jeff King
1 sibling, 1 reply; 39+ messages in thread
From: Patrick Steinhardt @ 2023-09-07 8:19 UTC (permalink / raw)
To: Taylor Blau; +Cc: Junio C Hamano, git, Jeff King
[-- Attachment #1: Type: text/plain, Size: 4512 bytes --]
On Wed, Sep 06, 2023 at 01:22:59PM -0400, Taylor Blau wrote:
> On Wed, Sep 06, 2023 at 10:05:37AM -0700, Junio C Hamano wrote:
> > Taylor Blau <me@ttaylorr.com> writes:
> >
> > > Signed-off-by: Taylor Blau <me@ttaylorr.com>
> > > ---
> > > builtin/repack.c | 18 ++++++++++--------
> > > 1 file changed, 10 insertions(+), 8 deletions(-)
> >
> > The reason being...?
>
> Wow, I have no idea how this got sent out without a commit message! At
> least it was signed off ;-).
>
> Here's a replacement that I cooked up, with your Helped-by. Let me know
> if you want to replace this patch manually, or if you'd prefer a reroll
> of the series. Either is fine with me! :-)
Personally I think that the original version is still more readable. If
you simply see `if (item->util)` you don't really have any indicator
what this actually means, whereas previously the more verbose check for
`if ((uintptr)item->util & DELETE_PACK)` gives the reader a nice hint
what this may be about.
If the intent is to make this check a bit prettier, how about we instead
introduce a helper function like the following:
```
static inline int pack_marked_for_deletion(const struct string_list_item *item)
{
return (uintptr) item->util & DELETE_PACK;
}
```
Other than that the rest of this series looks good to me, thanks.
Patrick
> --- 8< ---
> Subject: [PATCH] builtin/repack.c: treat string_list_item util as booleans
>
> The `->util` field corresponding to each string_list_item used to track
> the existence of some pack at the beginning of a repack operation was
> originally intended to be used as a bitfield.
>
> This bitfield tracked:
>
> - (1 << 0): whether or not the pack should be deleted
> - (1 << 1): whether or not the pack is cruft
>
> The previous commit removed the use of the second bit, meaning that we
> can treat this field as a boolean instead of a bitset.
>
> Helped-by: Junio C Hamano <gitster@pobox.com>
> Signed-off-by: Taylor Blau <me@ttaylorr.com>
> ---
> builtin/repack.c | 18 +++++++++++-------
> 1 file changed, 11 insertions(+), 7 deletions(-)
>
> diff --git a/builtin/repack.c b/builtin/repack.c
> index 478fab96c9..575e69b020 100644
> --- a/builtin/repack.c
> +++ b/builtin/repack.c
> @@ -26,7 +26,7 @@
> #define LOOSEN_UNREACHABLE 2
> #define PACK_CRUFT 4
>
> -#define DELETE_PACK 1
> +#define DELETE_PACK ((void*)(uintptr_t)1)
>
> static int pack_everything;
> static int delta_base_offset = 1;
> @@ -96,6 +96,10 @@ static int repack_config(const char *var, const char *value,
>
> struct existing_packs {
> struct string_list kept_packs;
> + /*
> + * for both non_kept_packs, and cruft_packs, a non-NULL
> + * 'util' field indicates the pack should be deleted.
> + */
> struct string_list non_kept_packs;
> struct string_list cruft_packs;
> };
> @@ -130,7 +134,7 @@ static void mark_packs_for_deletion_1(struct string_list *names,
> * (if `-d` was given).
> */
> if (!string_list_has_string(names, sha1))
> - item->util = (void*)(uintptr_t)((size_t)item->util | DELETE_PACK);
> + item->util = DELETE_PACK;
> }
> }
>
> @@ -158,7 +162,7 @@ static void remove_redundant_packs_1(struct string_list *packs)
> {
> struct string_list_item *item;
> for_each_string_list_item(item, packs) {
> - if (!((uintptr_t)item->util & DELETE_PACK))
> + if (!item->util)
> continue;
> remove_redundant_pack(packdir, item->string);
> }
> @@ -695,20 +699,20 @@ static void midx_included_packs(struct string_list *include,
>
> for_each_string_list_item(item, &existing->cruft_packs) {
> /*
> - * no need to check DELETE_PACK, since we're not
> - * doing an ALL_INTO_ONE repack
> + * no need to check for deleted packs, since we're
> + * not doing an ALL_INTO_ONE repack
> */
> string_list_insert(include, xstrfmt("%s.idx", item->string));
> }
> } else {
> for_each_string_list_item(item, &existing->non_kept_packs) {
> - if ((uintptr_t)item->util & DELETE_PACK)
> + if (item->util)
> continue;
> string_list_insert(include, xstrfmt("%s.idx", item->string));
> }
>
> for_each_string_list_item(item, &existing->cruft_packs) {
> - if ((uintptr_t)item->util & DELETE_PACK)
> + if (item->util)
> continue;
> string_list_insert(include, xstrfmt("%s.idx", item->string));
> }
> --
> 2.38.0.16.g393fd4c6db
>
> --- >8 ---
>
> Thanks,
> Taylor
[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 833 bytes --]
^ permalink raw reply [flat|nested] 39+ messages in thread
* Re: [PATCH 7/8] builtin/repack.c: drop `DELETE_PACK` macro
2023-09-06 17:22 ` Taylor Blau
2023-09-07 8:19 ` Patrick Steinhardt
@ 2023-09-07 8:58 ` Jeff King
1 sibling, 0 replies; 39+ messages in thread
From: Jeff King @ 2023-09-07 8:58 UTC (permalink / raw)
To: Taylor Blau; +Cc: Junio C Hamano, git, Patrick Steinhardt
On Wed, Sep 06, 2023 at 01:22:59PM -0400, Taylor Blau wrote:
> --- 8< ---
> Subject: [PATCH] builtin/repack.c: treat string_list_item util as booleans
>
> The `->util` field corresponding to each string_list_item used to track
> the existence of some pack at the beginning of a repack operation was
> originally intended to be used as a bitfield.
>
> This bitfield tracked:
>
> - (1 << 0): whether or not the pack should be deleted
> - (1 << 1): whether or not the pack is cruft
>
> The previous commit removed the use of the second bit, meaning that we
> can treat this field as a boolean instead of a bitset.
I do think the boolean is syntactically a little nicer than the bit-set,
just because of the casting we have to with the void pointer). After
reading the earlier parts, I was hoping the culmination of this series
would be dropping the use of util entirely (presumably in favor of
separate lists). But maybe that would be too disruptive; I didn't try
it.
> -#define DELETE_PACK 1
> +#define DELETE_PACK ((void*)(uintptr_t)1)
> [...]
> @@ -130,7 +134,7 @@ static void mark_packs_for_deletion_1(struct string_list *names,
> * (if `-d` was given).
> */
> if (!string_list_has_string(names, sha1))
> - item->util = (void*)(uintptr_t)((size_t)item->util | DELETE_PACK);
> + item->util = DELETE_PACK;
> }
> }
I do like the use of the macro here to make the meaning of the boolean
more plain, since the name "util" is totally meaningless (but we are
stuck with it).
But on the other side, things get more mysterious:
> @@ -158,7 +162,7 @@ static void remove_redundant_packs_1(struct string_list *packs)
> {
> struct string_list_item *item;
> for_each_string_list_item(item, packs) {
> - if (!((uintptr_t)item->util & DELETE_PACK))
> + if (!item->util)
> continue;
This is syntactically much nicer, but the meaning of "util" as a boolean
for "we should delete this" is lost.
So I dunno. The end result of this patch is more readable syntactically,
but arguably less so semantically. Unless we have a good reason to ditch
the bit-set entirely, I wonder if we could have the best of both with
some macro helpers. Or even a set of matching helper functions like:
void pack_mark_for_deletion(struct string_list_item *item)
{
(uintptr_t)item->util = 1;
}
int pack_should_deleted(const struct string_list_item *item)
{
return item->util;
}
which could be a bool or a bit-set; the callers no longer need to care
and get to use human-readable names.
I dunno. It is a file-local convention and there aren't that many spots,
so maybe it is not worth worrying too much about. I'm pretty sure that I
got confused by the use of "util" here when looking at the code before,
as it did not have the DELETE_PACK name until your 72263ffc32
(builtin/repack.c: use named flags for existing_packs, 2022-05-20). But
maybe the comment that you added is sufficient.
If we had generic pointer-bitset macros, then perhaps other string-list
users could benefit, too. I thought maybe fast-export could use this for
its mark_to_ptr() stuff, but that is storing a whole 32-bit value, not a
bitset. So maybe this is just a weird localized thing.
A more radical idea is that we don't care very much about the data
structure as long as it is ordered. So we could just do:
struct existing_pack {
struct list_head list;
int to_delete;
char name[FLEX_ARRAY];
};
and ditch string-list entirely. That lets us use "to_delete" in a
natural way. Though I suspect it makes all the _other_ code unreadable,
as we have to allocate them, deal with list_entry(), and so on.
So I guess I'm fine with any path (including the one in this patch).
-Peff
^ permalink raw reply [flat|nested] 39+ messages in thread
* Re: [PATCH 0/8] repack: refactor pack snapshot-ing logic
2023-09-05 20:36 [PATCH 0/8] repack: refactor pack snapshot-ing logic Taylor Blau
` (7 preceding siblings ...)
2023-09-05 20:37 ` [PATCH 8/8] builtin/repack.c: extract common cruft pack loop Taylor Blau
@ 2023-09-07 9:00 ` Jeff King
2023-09-13 19:17 ` [PATCH v2 " Taylor Blau
9 siblings, 0 replies; 39+ messages in thread
From: Jeff King @ 2023-09-07 9:00 UTC (permalink / raw)
To: Taylor Blau; +Cc: git, Junio C Hamano, Patrick Steinhardt
On Tue, Sep 05, 2023 at 04:36:37PM -0400, Taylor Blau wrote:
> This series refactors some of the 'git repack' internals to keep track
> of existing packs in a set of string lists stored in a single structure,
> instead of passing around multiple string lists to different functions
> throughout the various paths.
>
> The result is that the interface around pack deletion, marking packs as
> redundant, and handling the set of pre-existing packs (both kept and
> non-kept) is significantly cleaner without introducing any functional
> changes.
>
> I didn't mean to produce so much churn when I started writing these
> patches, which began as a simple effort to rename a couple of variables
> for more consistency.
These all look pretty reasonable to me. I wasn't quite sure about patch
7 and left some comments there, but all of the options are kind of ugly.
Since it's limited to that one file, it may not be worth spending too
much time or brain-power on trying to polish it further.
-Peff
^ permalink raw reply [flat|nested] 39+ messages in thread
* Re: [PATCH 7/8] builtin/repack.c: drop `DELETE_PACK` macro
2023-09-07 8:19 ` Patrick Steinhardt
@ 2023-09-07 18:16 ` Junio C Hamano
0 siblings, 0 replies; 39+ messages in thread
From: Junio C Hamano @ 2023-09-07 18:16 UTC (permalink / raw)
To: Patrick Steinhardt; +Cc: Taylor Blau, git, Jeff King
Patrick Steinhardt <ps@pks.im> writes:
> If the intent is to make this check a bit prettier, how about we instead
> introduce a helper function like the following:
>
> ```
> static inline int pack_marked_for_deletion(const struct string_list_item *item)
> {
> return (uintptr) item->util & DELETE_PACK;
> }
> ```
Good suggestion.
Or just check if it is NULL (with the new code that only cares about
the NULL-ness). Regardless of the implementation, having such a
helper would document the intent of the code better, especially if
there are multiple places that make that check.
> Other than that the rest of this series looks good to me, thanks.
>
> Patrick
Thanks.
^ permalink raw reply [flat|nested] 39+ messages in thread
* Re: [PATCH 2/8] builtin/repack.c: extract marking packs for deletion
2023-09-07 7:59 ` Jeff King
@ 2023-09-07 22:10 ` Taylor Blau
0 siblings, 0 replies; 39+ messages in thread
From: Taylor Blau @ 2023-09-07 22:10 UTC (permalink / raw)
To: Jeff King; +Cc: git, Junio C Hamano, Patrick Steinhardt
On Thu, Sep 07, 2023 at 03:59:31AM -0400, Jeff King wrote:
> There are really two "factor outs" here: we pull the code from
> cmd_repack() into a helper, and then the helper is also just a thin
> wrapper around its "_1" variant. That latter part isn't needed yet, but
> I can guess from your description that we'll eventually have the main
> function dispatch to the "_1" helper for lists.
Yeah... the "_1" variant looks ugly in isolation in this patch, but I
think makes things cleaner in subsequent patches.
Thanks,
Taylor
^ permalink raw reply [flat|nested] 39+ messages in thread
* [PATCH v2 0/8] repack: refactor pack snapshot-ing logic
2023-09-05 20:36 [PATCH 0/8] repack: refactor pack snapshot-ing logic Taylor Blau
` (8 preceding siblings ...)
2023-09-07 9:00 ` [PATCH 0/8] repack: refactor pack snapshot-ing logic Jeff King
@ 2023-09-13 19:17 ` Taylor Blau
2023-09-13 19:17 ` [PATCH v2 1/8] builtin/repack.c: extract structure to store existing packs Taylor Blau
` (9 more replies)
9 siblings, 10 replies; 39+ messages in thread
From: Taylor Blau @ 2023-09-13 19:17 UTC (permalink / raw)
To: git; +Cc: Junio C Hamano, Jeff King, Patrick Steinhardt
Here is a small reroll of my series to clean up some of the internals of
'git repack' used to track the set of existing packs.
Much is unchanged from the last round, save for some additional clean-up
on how we handle the '->util' field for each pack's string_list_item in
response to very helpful review from those CC'd.
As usual, a range-diff is available below for convenience. Thanks in
advance for your review!
Taylor Blau (8):
builtin/repack.c: extract structure to store existing packs
builtin/repack.c: extract marking packs for deletion
builtin/repack.c: extract redundant pack cleanup for --geometric
builtin/repack.c: extract redundant pack cleanup for existing packs
builtin/repack.c: extract `has_existing_non_kept_packs()`
builtin/repack.c: store existing cruft packs separately
builtin/repack.c: avoid directly inspecting "util"
builtin/repack.c: extract common cruft pack loop
builtin/repack.c | 293 +++++++++++++++++++++++++++++------------------
1 file changed, 180 insertions(+), 113 deletions(-)
Range-diff against v1:
1: 5b48b7e3cc = 1: 2e26beff22 builtin/repack.c: extract structure to store existing packs
2: 313537ef68 = 2: 62d916169d builtin/repack.c: extract marking packs for deletion
3: 5c25ef87c1 = 3: 7ed45804ea builtin/repack.c: extract redundant pack cleanup for --geometric
4: 7bb543fef8 = 4: 82057de4cf builtin/repack.c: extract redundant pack cleanup for existing packs
5: e2cf87bb94 = 5: f4f7b4c08f builtin/repack.c: extract `has_existing_non_kept_packs()`
6: 414a558883 = 6: d68a88dbd5 builtin/repack.c: store existing cruft packs separately
7: 559b487e2a ! 7: 481a29599b builtin/repack.c: drop `DELETE_PACK` macro
@@ Metadata
Author: Taylor Blau <me@ttaylorr.com>
## Commit message ##
- builtin/repack.c: drop `DELETE_PACK` macro
+ builtin/repack.c: avoid directly inspecting "util"
+ The `->util` field corresponding to each string_list_item is used to
+ track the existence of some pack at the beginning of a repack operation
+ was originally intended to be used as a bitfield.
+
+ This bitfield tracked:
+
+ - (1 << 0): whether or not the pack should be deleted
+ - (1 << 1): whether or not the pack is cruft
+
+ The previous commit removed the use of the second bit, but a future
+ patch (from a different series than this one) will introduce a new use
+ of it.
+
+ So we could stop treating the util pointer as a bitfield and instead
+ start treating it as if it were a boolean. But this would require some
+ backtracking when that later patch is applied.
+
+ Instead, let's avoid touching the ->util field directly, and instead
+ introduce convenience functions like:
+
+ - pack_mark_for_deletion()
+ - pack_is_marked_for_deletion()
+
+ Helped-by: Junio C Hamano <gitster@pobox.com>
+ Helped-by: Jeff King <peff@peff.net>
+ Helped-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Taylor Blau <me@ttaylorr.com>
## builtin/repack.c ##
-@@
- #define LOOSEN_UNREACHABLE 2
- #define PACK_CRUFT 4
+@@ builtin/repack.c: static int has_existing_non_kept_packs(const struct existing_packs *existing)
+ return existing->non_kept_packs.nr || existing->cruft_packs.nr;
+ }
--#define DELETE_PACK 1
--
- static int pack_everything;
- static int delta_base_offset = 1;
- static int pack_kept_objects = -1;
-@@ builtin/repack.c: static int repack_config(const char *var, const char *value,
-
- struct existing_packs {
- struct string_list kept_packs;
-+ /*
-+ * for both non_kept_packs, and cruft_packs, a non-NULL
-+ * 'util' field indicates the pack should be deleted.
-+ */
- struct string_list non_kept_packs;
- struct string_list cruft_packs;
- };
++static void pack_mark_for_deletion(struct string_list_item *item)
++{
++ item->util = (void*)((uintptr_t)item->util | DELETE_PACK);
++}
++
++static int pack_is_marked_for_deletion(struct string_list_item *item)
++{
++ return (uintptr_t)item->util & DELETE_PACK;
++}
++
+ static void mark_packs_for_deletion_1(struct string_list *names,
+ struct string_list *list)
+ {
@@ builtin/repack.c: static void mark_packs_for_deletion_1(struct string_list *names,
* (if `-d` was given).
*/
if (!string_list_has_string(names, sha1))
- item->util = (void*)(uintptr_t)((size_t)item->util | DELETE_PACK);
-+ item->util = (void*)1;
++ pack_mark_for_deletion(item);
}
}
@@ builtin/repack.c: static void remove_redundant_packs_1(struct string_list *packs
struct string_list_item *item;
for_each_string_list_item(item, packs) {
- if (!((uintptr_t)item->util & DELETE_PACK))
-+ if (!item->util)
++ if (!pack_is_marked_for_deletion(item))
continue;
remove_redundant_pack(packdir, item->string);
}
@@ builtin/repack.c: static void midx_included_packs(struct string_list *include,
-
- for_each_string_list_item(item, &existing->cruft_packs) {
- /*
-- * no need to check DELETE_PACK, since we're not
-- * doing an ALL_INTO_ONE repack
-+ * no need to check for deleted packs, since we're
-+ * not doing an ALL_INTO_ONE repack
- */
- string_list_insert(include, xstrfmt("%s.idx", item->string));
}
} else {
for_each_string_list_item(item, &existing->non_kept_packs) {
- if ((uintptr_t)item->util & DELETE_PACK)
-+ if (item->util)
++ if (pack_is_marked_for_deletion(item))
continue;
string_list_insert(include, xstrfmt("%s.idx", item->string));
}
for_each_string_list_item(item, &existing->cruft_packs) {
- if ((uintptr_t)item->util & DELETE_PACK)
-+ if (item->util)
++ if (pack_is_marked_for_deletion(item))
continue;
string_list_insert(include, xstrfmt("%s.idx", item->string));
}
8: ca7d13e7bf ! 8: 68748eb9c8 builtin/repack.c: extract common cruft pack loop
@@ Commit message
## builtin/repack.c ##
@@ builtin/repack.c: static void midx_included_packs(struct string_list *include,
+
string_list_insert(include, strbuf_detach(&buf, NULL));
}
-
+-
- for_each_string_list_item(item, &existing->cruft_packs) {
- /*
-- * no need to check for deleted packs, since we're
-- * not doing an ALL_INTO_ONE repack
+- * no need to check DELETE_PACK, since we're not
+- * doing an ALL_INTO_ONE repack
- */
- string_list_insert(include, xstrfmt("%s.idx", item->string));
- }
} else {
for_each_string_list_item(item, &existing->non_kept_packs) {
- if (item->util)
+ if (pack_is_marked_for_deletion(item))
continue;
string_list_insert(include, xstrfmt("%s.idx", item->string));
}
+ }
- for_each_string_list_item(item, &existing->cruft_packs) {
-- if (item->util)
+- if (pack_is_marked_for_deletion(item))
- continue;
- string_list_insert(include, xstrfmt("%s.idx", item->string));
- }
@@ builtin/repack.c: static void midx_included_packs(struct string_list *include,
+ * `mark_packs_for_deletion()` when doing an all-into-one
+ * repack).
+ */
-+ if (item->util)
++ if (pack_is_marked_for_deletion(item))
+ continue;
+ string_list_insert(include, xstrfmt("%s.idx", item->string));
}
--
2.42.0.166.g68748eb9c8
^ permalink raw reply [flat|nested] 39+ messages in thread
* [PATCH v2 1/8] builtin/repack.c: extract structure to store existing packs
2023-09-13 19:17 ` [PATCH v2 " Taylor Blau
@ 2023-09-13 19:17 ` Taylor Blau
2023-09-13 19:17 ` [PATCH v2 2/8] builtin/repack.c: extract marking packs for deletion Taylor Blau
` (8 subsequent siblings)
9 siblings, 0 replies; 39+ messages in thread
From: Taylor Blau @ 2023-09-13 19:17 UTC (permalink / raw)
To: git; +Cc: Junio C Hamano, Jeff King, Patrick Steinhardt
The repack machinery needs to keep track of which packfiles were present
in the repository at the beginning of a repack, segmented by whether or
not each pack is marked as kept.
The names of these packs are stored in two `string_list`s, corresponding
to kept- and non-kept packs, respectively. As a consequence, many
functions within the repack code need to take both `string_list`s as
arguments, leading to code like this:
ret = write_cruft_pack(&cruft_po_args, packtmp, pack_prefix,
cruft_expiration, &names,
&existing_nonkept_packs, /* <- */
&existing_kept_packs); /* <- */
Wrap up this pair of `string_list`s into a single structure that stores
both. This saves us from having to pass both string lists separately,
and prepares for adding additional fields to this structure.
Signed-off-by: Taylor Blau <me@ttaylorr.com>
---
builtin/repack.c | 90 ++++++++++++++++++++++++++----------------------
1 file changed, 49 insertions(+), 41 deletions(-)
diff --git a/builtin/repack.c b/builtin/repack.c
index 6943c5ba11..67bf86567f 100644
--- a/builtin/repack.c
+++ b/builtin/repack.c
@@ -95,14 +95,29 @@ static int repack_config(const char *var, const char *value,
return git_default_config(var, value, ctx, cb);
}
+struct existing_packs {
+ struct string_list kept_packs;
+ struct string_list non_kept_packs;
+};
+
+#define EXISTING_PACKS_INIT { \
+ .kept_packs = STRING_LIST_INIT_DUP, \
+ .non_kept_packs = STRING_LIST_INIT_DUP, \
+}
+
+static void existing_packs_release(struct existing_packs *existing)
+{
+ string_list_clear(&existing->kept_packs, 0);
+ string_list_clear(&existing->non_kept_packs, 0);
+}
+
/*
- * Adds all packs hex strings (pack-$HASH) to either fname_nonkept_list
- * or fname_kept_list based on whether each pack has a corresponding
+ * Adds all packs hex strings (pack-$HASH) to either packs->non_kept
+ * or packs->kept based on whether each pack has a corresponding
* .keep file or not. Packs without a .keep file are not to be kept
* if we are going to pack everything into one file.
*/
-static void collect_pack_filenames(struct string_list *fname_nonkept_list,
- struct string_list *fname_kept_list,
+static void collect_pack_filenames(struct existing_packs *existing,
const struct string_list *extra_keep)
{
struct packed_git *p;
@@ -126,16 +141,16 @@ static void collect_pack_filenames(struct string_list *fname_nonkept_list,
strbuf_strip_suffix(&buf, ".pack");
if ((extra_keep->nr > 0 && i < extra_keep->nr) || p->pack_keep)
- string_list_append(fname_kept_list, buf.buf);
+ string_list_append(&existing->kept_packs, buf.buf);
else {
struct string_list_item *item;
- item = string_list_append(fname_nonkept_list, buf.buf);
+ item = string_list_append(&existing->non_kept_packs, buf.buf);
if (p->is_cruft)
item->util = (void*)(uintptr_t)CRUFT_PACK;
}
}
- string_list_sort(fname_kept_list);
+ string_list_sort(&existing->kept_packs);
strbuf_release(&buf);
}
@@ -327,7 +342,7 @@ static int geometry_cmp(const void *va, const void *vb)
}
static void init_pack_geometry(struct pack_geometry *geometry,
- struct string_list *existing_kept_packs,
+ struct existing_packs *existing,
const struct pack_objects_args *args)
{
struct packed_git *p;
@@ -344,23 +359,24 @@ static void init_pack_geometry(struct pack_geometry *geometry,
if (!pack_kept_objects) {
/*
- * Any pack that has its pack_keep bit set will appear
- * in existing_kept_packs below, but this saves us from
- * doing a more expensive check.
+ * Any pack that has its pack_keep bit set will
+ * appear in existing->kept_packs below, but
+ * this saves us from doing a more expensive
+ * check.
*/
if (p->pack_keep)
continue;
/*
- * The pack may be kept via the --keep-pack option;
- * check 'existing_kept_packs' to determine whether to
- * ignore it.
+ * The pack may be kept via the --keep-pack
+ * option; check 'existing->kept_packs' to
+ * determine whether to ignore it.
*/
strbuf_reset(&buf);
strbuf_addstr(&buf, pack_basename(p));
strbuf_strip_suffix(&buf, ".pack");
- if (string_list_has_string(existing_kept_packs, buf.buf))
+ if (string_list_has_string(&existing->kept_packs, buf.buf))
continue;
}
if (p->is_cruft)
@@ -565,14 +581,13 @@ static void midx_snapshot_refs(struct tempfile *f)
}
static void midx_included_packs(struct string_list *include,
- struct string_list *existing_nonkept_packs,
- struct string_list *existing_kept_packs,
+ struct existing_packs *existing,
struct string_list *names,
struct pack_geometry *geometry)
{
struct string_list_item *item;
- for_each_string_list_item(item, existing_kept_packs)
+ for_each_string_list_item(item, &existing->kept_packs)
string_list_insert(include, xstrfmt("%s.idx", item->string));
for_each_string_list_item(item, names)
string_list_insert(include, xstrfmt("pack-%s.idx", item->string));
@@ -600,7 +615,7 @@ static void midx_included_packs(struct string_list *include,
string_list_insert(include, strbuf_detach(&buf, NULL));
}
- for_each_string_list_item(item, existing_nonkept_packs) {
+ for_each_string_list_item(item, &existing->non_kept_packs) {
if (!((uintptr_t)item->util & CRUFT_PACK)) {
/*
* no need to check DELETE_PACK, since we're not
@@ -611,7 +626,7 @@ static void midx_included_packs(struct string_list *include,
string_list_insert(include, xstrfmt("%s.idx", item->string));
}
} else {
- for_each_string_list_item(item, existing_nonkept_packs) {
+ for_each_string_list_item(item, &existing->non_kept_packs) {
if ((uintptr_t)item->util & DELETE_PACK)
continue;
string_list_insert(include, xstrfmt("%s.idx", item->string));
@@ -700,8 +715,7 @@ static int write_cruft_pack(const struct pack_objects_args *args,
const char *pack_prefix,
const char *cruft_expiration,
struct string_list *names,
- struct string_list *existing_packs,
- struct string_list *existing_kept_packs)
+ struct existing_packs *existing)
{
struct child_process cmd = CHILD_PROCESS_INIT;
struct strbuf line = STRBUF_INIT;
@@ -743,9 +757,9 @@ static int write_cruft_pack(const struct pack_objects_args *args,
in = xfdopen(cmd.in, "w");
for_each_string_list_item(item, names)
fprintf(in, "%s-%s.pack\n", pack_prefix, item->string);
- for_each_string_list_item(item, existing_packs)
+ for_each_string_list_item(item, &existing->non_kept_packs)
fprintf(in, "-%s.pack\n", item->string);
- for_each_string_list_item(item, existing_kept_packs)
+ for_each_string_list_item(item, &existing->kept_packs)
fprintf(in, "%s.pack\n", item->string);
fclose(in);
@@ -777,8 +791,7 @@ int cmd_repack(int argc, const char **argv, const char *prefix)
struct child_process cmd = CHILD_PROCESS_INIT;
struct string_list_item *item;
struct string_list names = STRING_LIST_INIT_DUP;
- struct string_list existing_nonkept_packs = STRING_LIST_INIT_DUP;
- struct string_list existing_kept_packs = STRING_LIST_INIT_DUP;
+ struct existing_packs existing = EXISTING_PACKS_INIT;
struct pack_geometry geometry = { 0 };
struct strbuf line = STRBUF_INIT;
struct tempfile *refs_snapshot = NULL;
@@ -914,13 +927,12 @@ int cmd_repack(int argc, const char **argv, const char *prefix)
packtmp_name = xstrfmt(".tmp-%d-pack", (int)getpid());
packtmp = mkpathdup("%s/%s", packdir, packtmp_name);
- collect_pack_filenames(&existing_nonkept_packs, &existing_kept_packs,
- &keep_pack_list);
+ collect_pack_filenames(&existing, &keep_pack_list);
if (geometry.split_factor) {
if (pack_everything)
die(_("options '%s' and '%s' cannot be used together"), "--geometric", "-A/-a");
- init_pack_geometry(&geometry, &existing_kept_packs, &po_args);
+ init_pack_geometry(&geometry, &existing, &po_args);
split_pack_geometry(&geometry);
}
@@ -964,7 +976,7 @@ int cmd_repack(int argc, const char **argv, const char *prefix)
if (pack_everything & ALL_INTO_ONE) {
repack_promisor_objects(&po_args, &names);
- if (existing_nonkept_packs.nr && delete_redundant &&
+ if (existing.non_kept_packs.nr && delete_redundant &&
!(pack_everything & PACK_CRUFT)) {
for_each_string_list_item(item, &names) {
strvec_pushf(&cmd.args, "--keep-pack=%s-%s.pack",
@@ -1055,8 +1067,7 @@ int cmd_repack(int argc, const char **argv, const char *prefix)
ret = write_cruft_pack(&cruft_po_args, packtmp, pack_prefix,
cruft_expiration, &names,
- &existing_nonkept_packs,
- &existing_kept_packs);
+ &existing);
if (ret)
goto cleanup;
@@ -1087,8 +1098,7 @@ int cmd_repack(int argc, const char **argv, const char *prefix)
pack_prefix,
NULL,
&names,
- &existing_nonkept_packs,
- &existing_kept_packs);
+ &existing);
if (ret)
goto cleanup;
}
@@ -1134,7 +1144,7 @@ int cmd_repack(int argc, const char **argv, const char *prefix)
if (delete_redundant && pack_everything & ALL_INTO_ONE) {
const int hexsz = the_hash_algo->hexsz;
- for_each_string_list_item(item, &existing_nonkept_packs) {
+ for_each_string_list_item(item, &existing.non_kept_packs) {
char *sha1;
size_t len = strlen(item->string);
if (len < hexsz)
@@ -1153,8 +1163,7 @@ int cmd_repack(int argc, const char **argv, const char *prefix)
if (write_midx) {
struct string_list include = STRING_LIST_INIT_NODUP;
- midx_included_packs(&include, &existing_nonkept_packs,
- &existing_kept_packs, &names, &geometry);
+ midx_included_packs(&include, &existing, &names, &geometry);
ret = write_midx_included_packs(&include, &geometry,
refs_snapshot ? get_tempfile_path(refs_snapshot) : NULL,
@@ -1173,7 +1182,7 @@ int cmd_repack(int argc, const char **argv, const char *prefix)
if (delete_redundant) {
int opts = 0;
- for_each_string_list_item(item, &existing_nonkept_packs) {
+ for_each_string_list_item(item, &existing.non_kept_packs) {
if (!((uintptr_t)item->util & DELETE_PACK))
continue;
remove_redundant_pack(packdir, item->string);
@@ -1194,7 +1203,7 @@ int cmd_repack(int argc, const char **argv, const char *prefix)
strbuf_strip_suffix(&buf, ".pack");
if ((p->pack_keep) ||
- (string_list_has_string(&existing_kept_packs,
+ (string_list_has_string(&existing.kept_packs,
buf.buf)))
continue;
@@ -1225,8 +1234,7 @@ int cmd_repack(int argc, const char **argv, const char *prefix)
cleanup:
string_list_clear(&names, 1);
- string_list_clear(&existing_nonkept_packs, 0);
- string_list_clear(&existing_kept_packs, 0);
+ existing_packs_release(&existing);
free_pack_geometry(&geometry);
return ret;
--
2.42.0.166.g68748eb9c8
^ permalink raw reply related [flat|nested] 39+ messages in thread
* [PATCH v2 2/8] builtin/repack.c: extract marking packs for deletion
2023-09-13 19:17 ` [PATCH v2 " Taylor Blau
2023-09-13 19:17 ` [PATCH v2 1/8] builtin/repack.c: extract structure to store existing packs Taylor Blau
@ 2023-09-13 19:17 ` Taylor Blau
2023-09-13 19:17 ` [PATCH v2 3/8] builtin/repack.c: extract redundant pack cleanup for --geometric Taylor Blau
` (7 subsequent siblings)
9 siblings, 0 replies; 39+ messages in thread
From: Taylor Blau @ 2023-09-13 19:17 UTC (permalink / raw)
To: git; +Cc: Junio C Hamano, Jeff King, Patrick Steinhardt
At the end of a repack (when given `-d`), Git attempts to remove any
packs which have been made "redundant" as a result of the repacking
operation. For example, an all-into-one (`-A` or `-a`) repack makes
every pre-existing pack which is not marked as kept redundant. Geometric
repacks (with `--geometric=<n>`) make any packs which were rolled up
redundant, and so on.
But before deleting the set of packs we think are redundant, we first
check to see whether or not we just wrote a pack which is identical to
any one of the packs we were going to delete. When this is the case, Git
must avoid deleting that pack, since it matches a pack we just wrote
(so deleting it may cause the repository to become corrupt).
Right now we only process the list of non-kept packs in a single pass.
But a future change will split the existing non-kept packs further into
two lists: one for cruft packs, and another for non-cruft packs.
Factor out this routine to prepare for calling it twice on two separate
lists in a future patch.
Signed-off-by: Taylor Blau <me@ttaylorr.com>
---
builtin/repack.c | 50 +++++++++++++++++++++++++++++++-----------------
1 file changed, 32 insertions(+), 18 deletions(-)
diff --git a/builtin/repack.c b/builtin/repack.c
index 67bf86567f..0eee430951 100644
--- a/builtin/repack.c
+++ b/builtin/repack.c
@@ -105,6 +105,36 @@ struct existing_packs {
.non_kept_packs = STRING_LIST_INIT_DUP, \
}
+static void mark_packs_for_deletion_1(struct string_list *names,
+ struct string_list *list)
+{
+ struct string_list_item *item;
+ const int hexsz = the_hash_algo->hexsz;
+
+ for_each_string_list_item(item, list) {
+ char *sha1;
+ size_t len = strlen(item->string);
+ if (len < hexsz)
+ continue;
+ sha1 = item->string + len - hexsz;
+ /*
+ * Mark this pack for deletion, which ensures that this
+ * pack won't be included in a MIDX (if `--write-midx`
+ * was given) and that we will actually delete this pack
+ * (if `-d` was given).
+ */
+ if (!string_list_has_string(names, sha1))
+ item->util = (void*)(uintptr_t)((size_t)item->util | DELETE_PACK);
+ }
+}
+
+static void mark_packs_for_deletion(struct existing_packs *existing,
+ struct string_list *names)
+
+{
+ mark_packs_for_deletion_1(names, &existing->non_kept_packs);
+}
+
static void existing_packs_release(struct existing_packs *existing)
{
string_list_clear(&existing->kept_packs, 0);
@@ -1142,24 +1172,8 @@ int cmd_repack(int argc, const char **argv, const char *prefix)
}
/* End of pack replacement. */
- if (delete_redundant && pack_everything & ALL_INTO_ONE) {
- const int hexsz = the_hash_algo->hexsz;
- for_each_string_list_item(item, &existing.non_kept_packs) {
- char *sha1;
- size_t len = strlen(item->string);
- if (len < hexsz)
- continue;
- sha1 = item->string + len - hexsz;
- /*
- * Mark this pack for deletion, which ensures that this
- * pack won't be included in a MIDX (if `--write-midx`
- * was given) and that we will actually delete this pack
- * (if `-d` was given).
- */
- if (!string_list_has_string(&names, sha1))
- item->util = (void*)(uintptr_t)((size_t)item->util | DELETE_PACK);
- }
- }
+ if (delete_redundant && pack_everything & ALL_INTO_ONE)
+ mark_packs_for_deletion(&existing, &names);
if (write_midx) {
struct string_list include = STRING_LIST_INIT_NODUP;
--
2.42.0.166.g68748eb9c8
^ permalink raw reply related [flat|nested] 39+ messages in thread
* [PATCH v2 3/8] builtin/repack.c: extract redundant pack cleanup for --geometric
2023-09-13 19:17 ` [PATCH v2 " Taylor Blau
2023-09-13 19:17 ` [PATCH v2 1/8] builtin/repack.c: extract structure to store existing packs Taylor Blau
2023-09-13 19:17 ` [PATCH v2 2/8] builtin/repack.c: extract marking packs for deletion Taylor Blau
@ 2023-09-13 19:17 ` Taylor Blau
2023-09-13 19:17 ` [PATCH v2 4/8] builtin/repack.c: extract redundant pack cleanup for existing packs Taylor Blau
` (6 subsequent siblings)
9 siblings, 0 replies; 39+ messages in thread
From: Taylor Blau @ 2023-09-13 19:17 UTC (permalink / raw)
To: git; +Cc: Junio C Hamano, Jeff King, Patrick Steinhardt
To reduce the complexity of the already quite-long `cmd_repack()`
implementation, extract out the parts responsible for deleting redundant
packs from a geometric repack out into its own sub-routine.
Signed-off-by: Taylor Blau <me@ttaylorr.com>
---
builtin/repack.c | 52 +++++++++++++++++++++++++++---------------------
1 file changed, 29 insertions(+), 23 deletions(-)
diff --git a/builtin/repack.c b/builtin/repack.c
index 0eee430951..71366811e9 100644
--- a/builtin/repack.c
+++ b/builtin/repack.c
@@ -540,6 +540,32 @@ static struct packed_git *get_preferred_pack(struct pack_geometry *geometry)
return NULL;
}
+static void geometry_remove_redundant_packs(struct pack_geometry *geometry,
+ struct string_list *names,
+ struct existing_packs *existing)
+{
+ struct strbuf buf = STRBUF_INIT;
+ uint32_t i;
+
+ for (i = 0; i < geometry->split; i++) {
+ struct packed_git *p = geometry->pack[i];
+ if (string_list_has_string(names, hash_to_hex(p->hash)))
+ continue;
+
+ strbuf_reset(&buf);
+ strbuf_addstr(&buf, pack_basename(p));
+ strbuf_strip_suffix(&buf, ".pack");
+
+ if ((p->pack_keep) ||
+ (string_list_has_string(&existing->kept_packs, buf.buf)))
+ continue;
+
+ remove_redundant_pack(packdir, buf.buf);
+ }
+
+ strbuf_release(&buf);
+}
+
static void free_pack_geometry(struct pack_geometry *geometry)
{
if (!geometry)
@@ -1202,29 +1228,9 @@ int cmd_repack(int argc, const char **argv, const char *prefix)
remove_redundant_pack(packdir, item->string);
}
- if (geometry.split_factor) {
- struct strbuf buf = STRBUF_INIT;
-
- uint32_t i;
- for (i = 0; i < geometry.split; i++) {
- struct packed_git *p = geometry.pack[i];
- if (string_list_has_string(&names,
- hash_to_hex(p->hash)))
- continue;
-
- strbuf_reset(&buf);
- strbuf_addstr(&buf, pack_basename(p));
- strbuf_strip_suffix(&buf, ".pack");
-
- if ((p->pack_keep) ||
- (string_list_has_string(&existing.kept_packs,
- buf.buf)))
- continue;
-
- remove_redundant_pack(packdir, buf.buf);
- }
- strbuf_release(&buf);
- }
+ if (geometry.split_factor)
+ geometry_remove_redundant_packs(&geometry, &names,
+ &existing);
if (show_progress)
opts |= PRUNE_PACKED_VERBOSE;
prune_packed_objects(opts);
--
2.42.0.166.g68748eb9c8
^ permalink raw reply related [flat|nested] 39+ messages in thread
* [PATCH v2 4/8] builtin/repack.c: extract redundant pack cleanup for existing packs
2023-09-13 19:17 ` [PATCH v2 " Taylor Blau
` (2 preceding siblings ...)
2023-09-13 19:17 ` [PATCH v2 3/8] builtin/repack.c: extract redundant pack cleanup for --geometric Taylor Blau
@ 2023-09-13 19:17 ` Taylor Blau
2023-09-13 19:17 ` [PATCH v2 5/8] builtin/repack.c: extract `has_existing_non_kept_packs()` Taylor Blau
` (5 subsequent siblings)
9 siblings, 0 replies; 39+ messages in thread
From: Taylor Blau @ 2023-09-13 19:17 UTC (permalink / raw)
To: git; +Cc: Junio C Hamano, Jeff King, Patrick Steinhardt
To remove redundant packs at the end of a repacking operation, Git uses
its `remove_redundant_pack()` function in a loop over the set of
pre-existing, non-kept packs.
In a later commit, we will split this list into two, one for
pre-existing cruft pack(s), and another for non-cruft pack(s). Prepare
for this by factoring out the routine to loop over and delete redundant
packs into its own function.
Instead of calling `remove_redundant_pack()` directly, we now will call
`remove_redundant_existing_packs()`, which itself dispatches a call to
`remove_redundant_packs_1()`. Note that the geometric repacking code
will still call `remove_redundant_pack()` directly, but see the previous
commit for more details.
Having `remove_redundant_packs_1()` exist as a separate function may
seem like overkill in this patch. However, a later patch will call
`remove_redundant_packs_1()` once over two separate lists, so this
refactoring sets us up for that.
Signed-off-by: Taylor Blau <me@ttaylorr.com>
---
builtin/repack.c | 45 ++++++++++++++++++++++++++++-----------------
1 file changed, 28 insertions(+), 17 deletions(-)
diff --git a/builtin/repack.c b/builtin/repack.c
index 71366811e9..b5fb14c017 100644
--- a/builtin/repack.c
+++ b/builtin/repack.c
@@ -135,6 +135,33 @@ static void mark_packs_for_deletion(struct existing_packs *existing,
mark_packs_for_deletion_1(names, &existing->non_kept_packs);
}
+static void remove_redundant_pack(const char *dir_name, const char *base_name)
+{
+ struct strbuf buf = STRBUF_INIT;
+ struct multi_pack_index *m = get_local_multi_pack_index(the_repository);
+ strbuf_addf(&buf, "%s.pack", base_name);
+ if (m && midx_contains_pack(m, buf.buf))
+ clear_midx_file(the_repository);
+ strbuf_insertf(&buf, 0, "%s/", dir_name);
+ unlink_pack_path(buf.buf, 1);
+ strbuf_release(&buf);
+}
+
+static void remove_redundant_packs_1(struct string_list *packs)
+{
+ struct string_list_item *item;
+ for_each_string_list_item(item, packs) {
+ if (!((uintptr_t)item->util & DELETE_PACK))
+ continue;
+ remove_redundant_pack(packdir, item->string);
+ }
+}
+
+static void remove_redundant_existing_packs(struct existing_packs *existing)
+{
+ remove_redundant_packs_1(&existing->non_kept_packs);
+}
+
static void existing_packs_release(struct existing_packs *existing)
{
string_list_clear(&existing->kept_packs, 0);
@@ -184,18 +211,6 @@ static void collect_pack_filenames(struct existing_packs *existing,
strbuf_release(&buf);
}
-static void remove_redundant_pack(const char *dir_name, const char *base_name)
-{
- struct strbuf buf = STRBUF_INIT;
- struct multi_pack_index *m = get_local_multi_pack_index(the_repository);
- strbuf_addf(&buf, "%s.pack", base_name);
- if (m && midx_contains_pack(m, buf.buf))
- clear_midx_file(the_repository);
- strbuf_insertf(&buf, 0, "%s/", dir_name);
- unlink_pack_path(buf.buf, 1);
- strbuf_release(&buf);
-}
-
static void prepare_pack_objects(struct child_process *cmd,
const struct pack_objects_args *args,
const char *out)
@@ -1222,11 +1237,7 @@ int cmd_repack(int argc, const char **argv, const char *prefix)
if (delete_redundant) {
int opts = 0;
- for_each_string_list_item(item, &existing.non_kept_packs) {
- if (!((uintptr_t)item->util & DELETE_PACK))
- continue;
- remove_redundant_pack(packdir, item->string);
- }
+ remove_redundant_existing_packs(&existing);
if (geometry.split_factor)
geometry_remove_redundant_packs(&geometry, &names,
--
2.42.0.166.g68748eb9c8
^ permalink raw reply related [flat|nested] 39+ messages in thread
* [PATCH v2 5/8] builtin/repack.c: extract `has_existing_non_kept_packs()`
2023-09-13 19:17 ` [PATCH v2 " Taylor Blau
` (3 preceding siblings ...)
2023-09-13 19:17 ` [PATCH v2 4/8] builtin/repack.c: extract redundant pack cleanup for existing packs Taylor Blau
@ 2023-09-13 19:17 ` Taylor Blau
2023-09-15 10:02 ` Christian Couder
2023-09-13 19:17 ` [PATCH v2 6/8] builtin/repack.c: store existing cruft packs separately Taylor Blau
` (4 subsequent siblings)
9 siblings, 1 reply; 39+ messages in thread
From: Taylor Blau @ 2023-09-13 19:17 UTC (permalink / raw)
To: git; +Cc: Junio C Hamano, Jeff King, Patrick Steinhardt
When there is:
- at least one pre-existing packfile (which is not marked as kept),
- repacking with the `-d` flag, and
- not doing a cruft repack
, then we pass a handful of additional options to the inner
`pack-objects` process, like `--unpack-unreachable`,
`--keep-unreachable`, and `--pack-loose-unreachable`, in addition to
marking any packs we just wrote for promisor remotes as kept in-core
(with `--keep-pack`, as opposed to the presence of a ".keep" file on
disk).
Because we store both cruft and non-cruft packs together in the same
`existing.non_kept_packs` list, it suffices to check its `nr` member to
see if it is zero or not.
But a following change will store cruft- and non-cruft packs separately,
meaning this check would break as a result. Prepare for this by
extracting this part of the check into a new helper function called
`has_existing_non_kept_packs()`.
This patch does not introduce any functional changes, but prepares us to
make a more isolated change in a subsequent patch.
Signed-off-by: Taylor Blau <me@ttaylorr.com>
---
builtin/repack.c | 8 +++++++-
1 file changed, 7 insertions(+), 1 deletion(-)
diff --git a/builtin/repack.c b/builtin/repack.c
index b5fb14c017..9ebc2e774b 100644
--- a/builtin/repack.c
+++ b/builtin/repack.c
@@ -105,6 +105,11 @@ struct existing_packs {
.non_kept_packs = STRING_LIST_INIT_DUP, \
}
+static int has_existing_non_kept_packs(const struct existing_packs *existing)
+{
+ return existing->non_kept_packs.nr;
+}
+
static void mark_packs_for_deletion_1(struct string_list *names,
struct string_list *list)
{
@@ -1047,7 +1052,8 @@ int cmd_repack(int argc, const char **argv, const char *prefix)
if (pack_everything & ALL_INTO_ONE) {
repack_promisor_objects(&po_args, &names);
- if (existing.non_kept_packs.nr && delete_redundant &&
+ if (has_existing_non_kept_packs(&existing) &&
+ delete_redundant &&
!(pack_everything & PACK_CRUFT)) {
for_each_string_list_item(item, &names) {
strvec_pushf(&cmd.args, "--keep-pack=%s-%s.pack",
--
2.42.0.166.g68748eb9c8
^ permalink raw reply related [flat|nested] 39+ messages in thread
* [PATCH v2 6/8] builtin/repack.c: store existing cruft packs separately
2023-09-13 19:17 ` [PATCH v2 " Taylor Blau
` (4 preceding siblings ...)
2023-09-13 19:17 ` [PATCH v2 5/8] builtin/repack.c: extract `has_existing_non_kept_packs()` Taylor Blau
@ 2023-09-13 19:17 ` Taylor Blau
2023-09-13 19:18 ` [PATCH v2 7/8] builtin/repack.c: avoid directly inspecting "util" Taylor Blau
` (3 subsequent siblings)
9 siblings, 0 replies; 39+ messages in thread
From: Taylor Blau @ 2023-09-13 19:17 UTC (permalink / raw)
To: git; +Cc: Junio C Hamano, Jeff King, Patrick Steinhardt
When repacking with the `--write-midx` option, we invoke the function
`midx_included_packs()` in order to produce the list of packs we want to
include in the resulting MIDX.
This list is comprised of:
- existing .keep packs
- any pack(s) which were written earlier in the same process
- any unchanged packs when doing a `--geometric` repack
- any cruft packs
Prior to this patch, we stored pre-existing cruft and non-cruft packs
together (provided those packs are non-kept). This meant we needed an
additional bit to indicate which non-kept pack(s) were cruft versus
those that aren't.
But alternatively we can store cruft packs in a separate list, avoiding
the need for this extra bit, and simplifying the code below.
Signed-off-by: Taylor Blau <me@ttaylorr.com>
---
builtin/repack.c | 39 +++++++++++++++++++++++----------------
1 file changed, 23 insertions(+), 16 deletions(-)
diff --git a/builtin/repack.c b/builtin/repack.c
index 9ebc2e774b..8103a9d308 100644
--- a/builtin/repack.c
+++ b/builtin/repack.c
@@ -27,7 +27,6 @@
#define PACK_CRUFT 4
#define DELETE_PACK 1
-#define CRUFT_PACK 2
static int pack_everything;
static int delta_base_offset = 1;
@@ -98,16 +97,18 @@ static int repack_config(const char *var, const char *value,
struct existing_packs {
struct string_list kept_packs;
struct string_list non_kept_packs;
+ struct string_list cruft_packs;
};
#define EXISTING_PACKS_INIT { \
.kept_packs = STRING_LIST_INIT_DUP, \
.non_kept_packs = STRING_LIST_INIT_DUP, \
+ .cruft_packs = STRING_LIST_INIT_DUP, \
}
static int has_existing_non_kept_packs(const struct existing_packs *existing)
{
- return existing->non_kept_packs.nr;
+ return existing->non_kept_packs.nr || existing->cruft_packs.nr;
}
static void mark_packs_for_deletion_1(struct string_list *names,
@@ -138,6 +139,7 @@ static void mark_packs_for_deletion(struct existing_packs *existing,
{
mark_packs_for_deletion_1(names, &existing->non_kept_packs);
+ mark_packs_for_deletion_1(names, &existing->cruft_packs);
}
static void remove_redundant_pack(const char *dir_name, const char *base_name)
@@ -165,12 +167,14 @@ static void remove_redundant_packs_1(struct string_list *packs)
static void remove_redundant_existing_packs(struct existing_packs *existing)
{
remove_redundant_packs_1(&existing->non_kept_packs);
+ remove_redundant_packs_1(&existing->cruft_packs);
}
static void existing_packs_release(struct existing_packs *existing)
{
string_list_clear(&existing->kept_packs, 0);
string_list_clear(&existing->non_kept_packs, 0);
+ string_list_clear(&existing->cruft_packs, 0);
}
/*
@@ -204,12 +208,10 @@ static void collect_pack_filenames(struct existing_packs *existing,
if ((extra_keep->nr > 0 && i < extra_keep->nr) || p->pack_keep)
string_list_append(&existing->kept_packs, buf.buf);
- else {
- struct string_list_item *item;
- item = string_list_append(&existing->non_kept_packs, buf.buf);
- if (p->is_cruft)
- item->util = (void*)(uintptr_t)CRUFT_PACK;
- }
+ else if (p->is_cruft)
+ string_list_append(&existing->cruft_packs, buf.buf);
+ else
+ string_list_append(&existing->non_kept_packs, buf.buf);
}
string_list_sort(&existing->kept_packs);
@@ -691,14 +693,11 @@ static void midx_included_packs(struct string_list *include,
string_list_insert(include, strbuf_detach(&buf, NULL));
}
- for_each_string_list_item(item, &existing->non_kept_packs) {
- if (!((uintptr_t)item->util & CRUFT_PACK)) {
- /*
- * no need to check DELETE_PACK, since we're not
- * doing an ALL_INTO_ONE repack
- */
- continue;
- }
+ for_each_string_list_item(item, &existing->cruft_packs) {
+ /*
+ * no need to check DELETE_PACK, since we're not
+ * doing an ALL_INTO_ONE repack
+ */
string_list_insert(include, xstrfmt("%s.idx", item->string));
}
} else {
@@ -707,6 +706,12 @@ static void midx_included_packs(struct string_list *include,
continue;
string_list_insert(include, xstrfmt("%s.idx", item->string));
}
+
+ for_each_string_list_item(item, &existing->cruft_packs) {
+ if ((uintptr_t)item->util & DELETE_PACK)
+ continue;
+ string_list_insert(include, xstrfmt("%s.idx", item->string));
+ }
}
}
@@ -835,6 +840,8 @@ static int write_cruft_pack(const struct pack_objects_args *args,
fprintf(in, "%s-%s.pack\n", pack_prefix, item->string);
for_each_string_list_item(item, &existing->non_kept_packs)
fprintf(in, "-%s.pack\n", item->string);
+ for_each_string_list_item(item, &existing->cruft_packs)
+ fprintf(in, "-%s.pack\n", item->string);
for_each_string_list_item(item, &existing->kept_packs)
fprintf(in, "%s.pack\n", item->string);
fclose(in);
--
2.42.0.166.g68748eb9c8
^ permalink raw reply related [flat|nested] 39+ messages in thread
* [PATCH v2 7/8] builtin/repack.c: avoid directly inspecting "util"
2023-09-13 19:17 ` [PATCH v2 " Taylor Blau
` (5 preceding siblings ...)
2023-09-13 19:17 ` [PATCH v2 6/8] builtin/repack.c: store existing cruft packs separately Taylor Blau
@ 2023-09-13 19:18 ` Taylor Blau
2023-09-13 19:18 ` [PATCH v2 8/8] builtin/repack.c: extract common cruft pack loop Taylor Blau
` (2 subsequent siblings)
9 siblings, 0 replies; 39+ messages in thread
From: Taylor Blau @ 2023-09-13 19:18 UTC (permalink / raw)
To: git; +Cc: Junio C Hamano, Jeff King, Patrick Steinhardt
The `->util` field corresponding to each string_list_item is used to
track the existence of some pack at the beginning of a repack operation
was originally intended to be used as a bitfield.
This bitfield tracked:
- (1 << 0): whether or not the pack should be deleted
- (1 << 1): whether or not the pack is cruft
The previous commit removed the use of the second bit, but a future
patch (from a different series than this one) will introduce a new use
of it.
So we could stop treating the util pointer as a bitfield and instead
start treating it as if it were a boolean. But this would require some
backtracking when that later patch is applied.
Instead, let's avoid touching the ->util field directly, and instead
introduce convenience functions like:
- pack_mark_for_deletion()
- pack_is_marked_for_deletion()
Helped-by: Junio C Hamano <gitster@pobox.com>
Helped-by: Jeff King <peff@peff.net>
Helped-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Taylor Blau <me@ttaylorr.com>
---
builtin/repack.c | 18 ++++++++++++++----
1 file changed, 14 insertions(+), 4 deletions(-)
diff --git a/builtin/repack.c b/builtin/repack.c
index 8103a9d308..e9a091fbbc 100644
--- a/builtin/repack.c
+++ b/builtin/repack.c
@@ -111,6 +111,16 @@ static int has_existing_non_kept_packs(const struct existing_packs *existing)
return existing->non_kept_packs.nr || existing->cruft_packs.nr;
}
+static void pack_mark_for_deletion(struct string_list_item *item)
+{
+ item->util = (void*)((uintptr_t)item->util | DELETE_PACK);
+}
+
+static int pack_is_marked_for_deletion(struct string_list_item *item)
+{
+ return (uintptr_t)item->util & DELETE_PACK;
+}
+
static void mark_packs_for_deletion_1(struct string_list *names,
struct string_list *list)
{
@@ -130,7 +140,7 @@ static void mark_packs_for_deletion_1(struct string_list *names,
* (if `-d` was given).
*/
if (!string_list_has_string(names, sha1))
- item->util = (void*)(uintptr_t)((size_t)item->util | DELETE_PACK);
+ pack_mark_for_deletion(item);
}
}
@@ -158,7 +168,7 @@ static void remove_redundant_packs_1(struct string_list *packs)
{
struct string_list_item *item;
for_each_string_list_item(item, packs) {
- if (!((uintptr_t)item->util & DELETE_PACK))
+ if (!pack_is_marked_for_deletion(item))
continue;
remove_redundant_pack(packdir, item->string);
}
@@ -702,13 +712,13 @@ static void midx_included_packs(struct string_list *include,
}
} else {
for_each_string_list_item(item, &existing->non_kept_packs) {
- if ((uintptr_t)item->util & DELETE_PACK)
+ if (pack_is_marked_for_deletion(item))
continue;
string_list_insert(include, xstrfmt("%s.idx", item->string));
}
for_each_string_list_item(item, &existing->cruft_packs) {
- if ((uintptr_t)item->util & DELETE_PACK)
+ if (pack_is_marked_for_deletion(item))
continue;
string_list_insert(include, xstrfmt("%s.idx", item->string));
}
--
2.42.0.166.g68748eb9c8
^ permalink raw reply related [flat|nested] 39+ messages in thread
* [PATCH v2 8/8] builtin/repack.c: extract common cruft pack loop
2023-09-13 19:17 ` [PATCH v2 " Taylor Blau
` (6 preceding siblings ...)
2023-09-13 19:18 ` [PATCH v2 7/8] builtin/repack.c: avoid directly inspecting "util" Taylor Blau
@ 2023-09-13 19:18 ` Taylor Blau
2023-09-13 19:44 ` [PATCH v2 0/8] repack: refactor pack snapshot-ing logic Junio C Hamano
2023-09-15 10:09 ` Christian Couder
9 siblings, 0 replies; 39+ messages in thread
From: Taylor Blau @ 2023-09-13 19:18 UTC (permalink / raw)
To: git; +Cc: Junio C Hamano, Jeff King, Patrick Steinhardt
When generating the list of packs to store in a MIDX (when given the
`--write-midx` option), we include any cruft packs both during
--geometric and non-geometric repacks.
But the rules for when we do and don't have to check whether any of
those cruft packs were queued for deletion differ slightly between the
two cases.
But the two can be unified, provided there is a little bit of extra
detail added in the comment to clarify when it is safe to avoid checking
for any pending deletions (and why it is OK to do so even when not
required).
Signed-off-by: Taylor Blau <me@ttaylorr.com>
---
builtin/repack.c | 31 ++++++++++++++++++-------------
1 file changed, 18 insertions(+), 13 deletions(-)
diff --git a/builtin/repack.c b/builtin/repack.c
index e9a091fbbc..529e13120d 100644
--- a/builtin/repack.c
+++ b/builtin/repack.c
@@ -702,26 +702,31 @@ static void midx_included_packs(struct string_list *include,
string_list_insert(include, strbuf_detach(&buf, NULL));
}
-
- for_each_string_list_item(item, &existing->cruft_packs) {
- /*
- * no need to check DELETE_PACK, since we're not
- * doing an ALL_INTO_ONE repack
- */
- string_list_insert(include, xstrfmt("%s.idx", item->string));
- }
} else {
for_each_string_list_item(item, &existing->non_kept_packs) {
if (pack_is_marked_for_deletion(item))
continue;
string_list_insert(include, xstrfmt("%s.idx", item->string));
}
+ }
- for_each_string_list_item(item, &existing->cruft_packs) {
- if (pack_is_marked_for_deletion(item))
- continue;
- string_list_insert(include, xstrfmt("%s.idx", item->string));
- }
+ for_each_string_list_item(item, &existing->cruft_packs) {
+ /*
+ * When doing a --geometric repack, there is no need to check
+ * for deleted packs, since we're by definition not doing an
+ * ALL_INTO_ONE repack (hence no packs will be deleted).
+ * Otherwise we must check for and exclude any packs which are
+ * enqueued for deletion.
+ *
+ * So we could omit the conditional below in the --geometric
+ * case, but doing so is unnecessary since no packs are marked
+ * as pending deletion (since we only call
+ * `mark_packs_for_deletion()` when doing an all-into-one
+ * repack).
+ */
+ if (pack_is_marked_for_deletion(item))
+ continue;
+ string_list_insert(include, xstrfmt("%s.idx", item->string));
}
}
--
2.42.0.166.g68748eb9c8
^ permalink raw reply related [flat|nested] 39+ messages in thread
* Re: [PATCH v2 0/8] repack: refactor pack snapshot-ing logic
2023-09-13 19:17 ` [PATCH v2 " Taylor Blau
` (7 preceding siblings ...)
2023-09-13 19:18 ` [PATCH v2 8/8] builtin/repack.c: extract common cruft pack loop Taylor Blau
@ 2023-09-13 19:44 ` Junio C Hamano
2023-09-14 0:07 ` Jeff King
2023-09-14 11:10 ` Christian Couder
2023-09-15 10:09 ` Christian Couder
9 siblings, 2 replies; 39+ messages in thread
From: Junio C Hamano @ 2023-09-13 19:44 UTC (permalink / raw)
To: Taylor Blau; +Cc: git, Jeff King, Patrick Steinhardt, Christian Couder
Taylor Blau <me@ttaylorr.com> writes:
> Here is a small reroll of my series to clean up some of the internals of
> 'git repack' used to track the set of existing packs.
>
> Much is unchanged from the last round, save for some additional clean-up
> on how we handle the '->util' field for each pack's string_list_item in
> response to very helpful review from those CC'd.
The change to [7/8] was as expected and looking good. Let's see if
we see additional reviews from others, plan to declare victory and
merge it to 'next' by early next week at the latest, if not sooner.
Christian, your cc/repack-sift-filtered-objects-to-separate-pack
topic will have to interact with this topic when merged to 'seen',
so it would be good for you to give a review on these patches (if
only to understand the new world order) and optionally make a trial
merge between the two to see how well they work together and what
adjustment will be needed when you eventually rebase your topic on
top. Actual rebasing can wait until this topic graduates, but trial
merge is something you can immediately do in the meantime to prepare
for the future.
Thanks.
^ permalink raw reply [flat|nested] 39+ messages in thread
* Re: [PATCH v2 0/8] repack: refactor pack snapshot-ing logic
2023-09-13 19:44 ` [PATCH v2 0/8] repack: refactor pack snapshot-ing logic Junio C Hamano
@ 2023-09-14 0:07 ` Jeff King
2023-09-14 10:33 ` Patrick Steinhardt
2023-09-14 11:10 ` Christian Couder
1 sibling, 1 reply; 39+ messages in thread
From: Jeff King @ 2023-09-14 0:07 UTC (permalink / raw)
To: Junio C Hamano; +Cc: Taylor Blau, git, Patrick Steinhardt, Christian Couder
On Wed, Sep 13, 2023 at 12:44:04PM -0700, Junio C Hamano wrote:
> Taylor Blau <me@ttaylorr.com> writes:
>
> > Here is a small reroll of my series to clean up some of the internals of
> > 'git repack' used to track the set of existing packs.
> >
> > Much is unchanged from the last round, save for some additional clean-up
> > on how we handle the '->util' field for each pack's string_list_item in
> > response to very helpful review from those CC'd.
>
> The change to [7/8] was as expected and looking good. Let's see if
> we see additional reviews from others, plan to declare victory and
> merge it to 'next' by early next week at the latest, if not sooner.
This looks great to me. The motivation in the revised patch 7 is much
easier to follow, and the end result is much nicer to read. :)
-Peff
^ permalink raw reply [flat|nested] 39+ messages in thread
* Re: [PATCH v2 0/8] repack: refactor pack snapshot-ing logic
2023-09-14 0:07 ` Jeff King
@ 2023-09-14 10:33 ` Patrick Steinhardt
0 siblings, 0 replies; 39+ messages in thread
From: Patrick Steinhardt @ 2023-09-14 10:33 UTC (permalink / raw)
To: Jeff King; +Cc: Junio C Hamano, Taylor Blau, git, Christian Couder
[-- Attachment #1: Type: text/plain, Size: 992 bytes --]
On Wed, Sep 13, 2023 at 08:07:50PM -0400, Jeff King wrote:
> On Wed, Sep 13, 2023 at 12:44:04PM -0700, Junio C Hamano wrote:
>
> > Taylor Blau <me@ttaylorr.com> writes:
> >
> > > Here is a small reroll of my series to clean up some of the internals of
> > > 'git repack' used to track the set of existing packs.
> > >
> > > Much is unchanged from the last round, save for some additional clean-up
> > > on how we handle the '->util' field for each pack's string_list_item in
> > > response to very helpful review from those CC'd.
> >
> > The change to [7/8] was as expected and looking good. Let's see if
> > we see additional reviews from others, plan to declare victory and
> > merge it to 'next' by early next week at the latest, if not sooner.
>
> This looks great to me. The motivation in the revised patch 7 is much
> easier to follow, and the end result is much nicer to read. :)
>
> -Peff
Agreed, the patch series loogs good to me now. Thanks!
Patrick
[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 833 bytes --]
^ permalink raw reply [flat|nested] 39+ messages in thread
* Re: [PATCH v2 0/8] repack: refactor pack snapshot-ing logic
2023-09-13 19:44 ` [PATCH v2 0/8] repack: refactor pack snapshot-ing logic Junio C Hamano
2023-09-14 0:07 ` Jeff King
@ 2023-09-14 11:10 ` Christian Couder
2023-09-14 18:01 ` Taylor Blau
1 sibling, 1 reply; 39+ messages in thread
From: Christian Couder @ 2023-09-14 11:10 UTC (permalink / raw)
To: Junio C Hamano; +Cc: Taylor Blau, git, Jeff King, Patrick Steinhardt
On Wed, Sep 13, 2023 at 9:44 PM Junio C Hamano <gitster@pobox.com> wrote:
>
> Taylor Blau <me@ttaylorr.com> writes:
>
> > Here is a small reroll of my series to clean up some of the internals of
> > 'git repack' used to track the set of existing packs.
> >
> > Much is unchanged from the last round, save for some additional clean-up
> > on how we handle the '->util' field for each pack's string_list_item in
> > response to very helpful review from those CC'd.
>
> The change to [7/8] was as expected and looking good. Let's see if
> we see additional reviews from others, plan to declare victory and
> merge it to 'next' by early next week at the latest, if not sooner.
>
> Christian, your cc/repack-sift-filtered-objects-to-separate-pack
> topic will have to interact with this topic when merged to 'seen',
> so it would be good for you to give a review on these patches (if
> only to understand the new world order) and optionally make a trial
> merge between the two to see how well they work together and what
> adjustment will be needed when you eventually rebase your topic on
> top. Actual rebasing can wait until this topic graduates, but trial
> merge is something you can immediately do in the meantime to prepare
> for the future.
Ok, I will try to review and merge this with
cc/repack-sift-filtered-objects-to-separate-pack soon.
^ permalink raw reply [flat|nested] 39+ messages in thread
* Re: [PATCH v2 0/8] repack: refactor pack snapshot-ing logic
2023-09-14 11:10 ` Christian Couder
@ 2023-09-14 18:01 ` Taylor Blau
2023-09-15 5:56 ` Christian Couder
0 siblings, 1 reply; 39+ messages in thread
From: Taylor Blau @ 2023-09-14 18:01 UTC (permalink / raw)
To: Christian Couder; +Cc: Junio C Hamano, git, Jeff King, Patrick Steinhardt
On Thu, Sep 14, 2023 at 01:10:38PM +0200, Christian Couder wrote:
> Ok, I will try to review and merge this with
> cc/repack-sift-filtered-objects-to-separate-pack soon.
I took a look at how much/little effort was going to be required, and
luckily the changes are isolated only to a single patch. It's just your
f1ffa71e8f (repack: add `--filter=<filter-spec>` option, 2023-09-11),
and in particular the `write_filtered_pack()` function.
I started messing around with it myself and generated the following
fixup! which can be applied on top of your version of f1ffa71e8f. It's
mostly straightforward, but there is a gotcha that the loop over
non-kept packs has to change to:
for_each_string_list_item(item, &existing->non_kept_packs)
/* ... */
for_each_string_list_item(item, &existing->cruft_packs)
/* ... */
, instead of just the first loop over non_kept_packs, since cruft packs
are stored in a separate list.
In any event, here's the fixup! I generated on top of that patch:
--- 8< ---
Subject: [PATCH] fixup! repack: add `--filter=<filter-spec>` option
Signed-off-by: Taylor Blau <me@ttaylorr.com>
---
builtin/repack.c | 25 +++++++++++--------------
1 file changed, 11 insertions(+), 14 deletions(-)
diff --git a/builtin/repack.c b/builtin/repack.c
index 120f4241c0..0d23323d05 100644
--- a/builtin/repack.c
+++ b/builtin/repack.c
@@ -834,15 +834,13 @@ static int finish_pack_objects_cmd(struct child_process *cmd,
static int write_filtered_pack(const struct pack_objects_args *args,
const char *destination,
const char *pack_prefix,
- struct string_list *keep_pack_list,
- struct string_list *names,
- struct string_list *existing_packs,
- struct string_list *existing_kept_packs)
+ struct existing_packs *existing,
+ struct string_list *names)
{
struct child_process cmd = CHILD_PROCESS_INIT;
struct string_list_item *item;
FILE *in;
- int ret, i;
+ int ret;
const char *caret;
const char *scratch;
int local = skip_prefix(destination, packdir, &scratch);
@@ -853,9 +851,8 @@ static int write_filtered_pack(const struct pack_objects_args *args,
if (!pack_kept_objects)
strvec_push(&cmd.args, "--honor-pack-keep");
- for (i = 0; i < keep_pack_list->nr; i++)
- strvec_pushf(&cmd.args, "--keep-pack=%s",
- keep_pack_list->items[i].string);
+ for_each_string_list_item(item, &existing->kept_packs)
+ strvec_pushf(&cmd.args, "--keep-pack=%s", item->string);
cmd.in = -1;
@@ -872,10 +869,12 @@ static int write_filtered_pack(const struct pack_objects_args *args,
in = xfdopen(cmd.in, "w");
for_each_string_list_item(item, names)
fprintf(in, "^%s-%s.pack\n", pack_prefix, item->string);
- for_each_string_list_item(item, existing_packs)
+ for_each_string_list_item(item, &existing->non_kept_packs)
+ fprintf(in, "%s.pack\n", item->string);
+ for_each_string_list_item(item, &existing->cruft_packs)
fprintf(in, "%s.pack\n", item->string);
caret = pack_kept_objects ? "" : "^";
- for_each_string_list_item(item, existing_kept_packs)
+ for_each_string_list_item(item, &existing->kept_packs)
fprintf(in, "%s%s.pack\n", caret, item->string);
fclose(in);
@@ -1261,10 +1260,8 @@ int cmd_repack(int argc, const char **argv, const char *prefix)
ret = write_filtered_pack(&po_args,
packtmp,
find_pack_prefix(packdir, packtmp),
- &keep_pack_list,
- &names,
- &existing_nonkept_packs,
- &existing_kept_packs);
+ &existing,
+ &names);
if (ret)
goto cleanup;
}
--
2.42.0.137.g6fe1dff026
--- >8 ---
Thanks,
Taylor
^ permalink raw reply related [flat|nested] 39+ messages in thread
* Re: [PATCH v2 0/8] repack: refactor pack snapshot-ing logic
2023-09-14 18:01 ` Taylor Blau
@ 2023-09-15 5:56 ` Christian Couder
0 siblings, 0 replies; 39+ messages in thread
From: Christian Couder @ 2023-09-15 5:56 UTC (permalink / raw)
To: Taylor Blau; +Cc: Junio C Hamano, git, Jeff King, Patrick Steinhardt
On Thu, Sep 14, 2023 at 8:01 PM Taylor Blau <me@ttaylorr.com> wrote:
>
> On Thu, Sep 14, 2023 at 01:10:38PM +0200, Christian Couder wrote:
> > Ok, I will try to review and merge this with
> > cc/repack-sift-filtered-objects-to-separate-pack soon.
>
> I took a look at how much/little effort was going to be required, and
> luckily the changes are isolated only to a single patch. It's just your
> f1ffa71e8f (repack: add `--filter=<filter-spec>` option, 2023-09-11),
> and in particular the `write_filtered_pack()` function.
>
> I started messing around with it myself and generated the following
> fixup! which can be applied on top of your version of f1ffa71e8f. It's
> mostly straightforward, but there is a gotcha that the loop over
> non-kept packs has to change to:
>
> for_each_string_list_item(item, &existing->non_kept_packs)
> /* ... */
> for_each_string_list_item(item, &existing->cruft_packs)
> /* ... */
>
> , instead of just the first loop over non_kept_packs, since cruft packs
> are stored in a separate list.
>
> In any event, here's the fixup! I generated on top of that patch:
Thanks a lot! I will very likely use it in the new version I will send
after your series has graduated.
^ permalink raw reply [flat|nested] 39+ messages in thread
* Re: [PATCH v2 5/8] builtin/repack.c: extract `has_existing_non_kept_packs()`
2023-09-13 19:17 ` [PATCH v2 5/8] builtin/repack.c: extract `has_existing_non_kept_packs()` Taylor Blau
@ 2023-09-15 10:02 ` Christian Couder
0 siblings, 0 replies; 39+ messages in thread
From: Christian Couder @ 2023-09-15 10:02 UTC (permalink / raw)
To: Taylor Blau; +Cc: git, Junio C Hamano, Jeff King, Patrick Steinhardt
On Thu, Sep 14, 2023 at 12:13 AM Taylor Blau <me@ttaylorr.com> wrote:
>
> When there is:
>
> - at least one pre-existing packfile (which is not marked as kept),
> - repacking with the `-d` flag, and
> - not doing a cruft repack
>
> , then we pass a handful of additional options to the inner
Nit (not worth a reroll): I think the comma at the beginning of the
above line would be better after "not doing a cruft repack".
> `pack-objects` process, like `--unpack-unreachable`,
> `--keep-unreachable`, and `--pack-loose-unreachable`, in addition to
> marking any packs we just wrote for promisor remotes as kept in-core
> (with `--keep-pack`, as opposed to the presence of a ".keep" file on
> disk).
^ permalink raw reply [flat|nested] 39+ messages in thread
* Re: [PATCH v2 0/8] repack: refactor pack snapshot-ing logic
2023-09-13 19:17 ` [PATCH v2 " Taylor Blau
` (8 preceding siblings ...)
2023-09-13 19:44 ` [PATCH v2 0/8] repack: refactor pack snapshot-ing logic Junio C Hamano
@ 2023-09-15 10:09 ` Christian Couder
9 siblings, 0 replies; 39+ messages in thread
From: Christian Couder @ 2023-09-15 10:09 UTC (permalink / raw)
To: Taylor Blau; +Cc: git, Junio C Hamano, Jeff King, Patrick Steinhardt
On Wed, Sep 13, 2023 at 9:17 PM Taylor Blau <me@ttaylorr.com> wrote:
>
> Here is a small reroll of my series to clean up some of the internals of
> 'git repack' used to track the set of existing packs.
>
> Much is unchanged from the last round, save for some additional clean-up
> on how we handle the '->util' field for each pack's string_list_item in
> response to very helpful review from those CC'd.
>
> As usual, a range-diff is available below for convenience. Thanks in
> advance for your review!
>
> Taylor Blau (8):
> builtin/repack.c: extract structure to store existing packs
> builtin/repack.c: extract marking packs for deletion
> builtin/repack.c: extract redundant pack cleanup for --geometric
> builtin/repack.c: extract redundant pack cleanup for existing packs
> builtin/repack.c: extract `has_existing_non_kept_packs()`
> builtin/repack.c: store existing cruft packs separately
> builtin/repack.c: avoid directly inspecting "util"
> builtin/repack.c: extract common cruft pack loop
I think it would be a bit nicer with s/builtin\/repack.c/repack/ in
all the above commit subjects, but I don't think it's worth a reroll.
Except for another very small nit in a commit message also not worth a
reroll, this LGTM.
Thanks,
Christian.
^ permalink raw reply [flat|nested] 39+ messages in thread
end of thread, other threads:[~2023-09-15 10:10 UTC | newest]
Thread overview: 39+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2023-09-05 20:36 [PATCH 0/8] repack: refactor pack snapshot-ing logic Taylor Blau
2023-09-05 20:36 ` [PATCH 1/8] builtin/repack.c: extract structure to store existing packs Taylor Blau
2023-09-07 7:54 ` Jeff King
2023-09-05 20:36 ` [PATCH 2/8] builtin/repack.c: extract marking packs for deletion Taylor Blau
2023-09-07 7:59 ` Jeff King
2023-09-07 22:10 ` Taylor Blau
2023-09-05 20:36 ` [PATCH 3/8] builtin/repack.c: extract redundant pack cleanup for --geometric Taylor Blau
2023-09-07 8:00 ` Jeff King
2023-09-05 20:36 ` [PATCH 4/8] builtin/repack.c: extract redundant pack cleanup for existing packs Taylor Blau
2023-09-07 8:02 ` Jeff King
2023-09-05 20:36 ` [PATCH 5/8] builtin/repack.c: extract `has_existing_non_kept_packs()` Taylor Blau
2023-09-07 8:04 ` Jeff King
2023-09-05 20:36 ` [PATCH 6/8] builtin/repack.c: store existing cruft packs separately Taylor Blau
2023-09-07 8:09 ` Jeff King
2023-09-05 20:36 ` [PATCH 7/8] builtin/repack.c: drop `DELETE_PACK` macro Taylor Blau
2023-09-06 17:05 ` Junio C Hamano
2023-09-06 17:22 ` Taylor Blau
2023-09-07 8:19 ` Patrick Steinhardt
2023-09-07 18:16 ` Junio C Hamano
2023-09-07 8:58 ` Jeff King
2023-09-05 20:37 ` [PATCH 8/8] builtin/repack.c: extract common cruft pack loop Taylor Blau
2023-09-07 9:00 ` [PATCH 0/8] repack: refactor pack snapshot-ing logic Jeff King
2023-09-13 19:17 ` [PATCH v2 " Taylor Blau
2023-09-13 19:17 ` [PATCH v2 1/8] builtin/repack.c: extract structure to store existing packs Taylor Blau
2023-09-13 19:17 ` [PATCH v2 2/8] builtin/repack.c: extract marking packs for deletion Taylor Blau
2023-09-13 19:17 ` [PATCH v2 3/8] builtin/repack.c: extract redundant pack cleanup for --geometric Taylor Blau
2023-09-13 19:17 ` [PATCH v2 4/8] builtin/repack.c: extract redundant pack cleanup for existing packs Taylor Blau
2023-09-13 19:17 ` [PATCH v2 5/8] builtin/repack.c: extract `has_existing_non_kept_packs()` Taylor Blau
2023-09-15 10:02 ` Christian Couder
2023-09-13 19:17 ` [PATCH v2 6/8] builtin/repack.c: store existing cruft packs separately Taylor Blau
2023-09-13 19:18 ` [PATCH v2 7/8] builtin/repack.c: avoid directly inspecting "util" Taylor Blau
2023-09-13 19:18 ` [PATCH v2 8/8] builtin/repack.c: extract common cruft pack loop Taylor Blau
2023-09-13 19:44 ` [PATCH v2 0/8] repack: refactor pack snapshot-ing logic Junio C Hamano
2023-09-14 0:07 ` Jeff King
2023-09-14 10:33 ` Patrick Steinhardt
2023-09-14 11:10 ` Christian Couder
2023-09-14 18:01 ` Taylor Blau
2023-09-15 5:56 ` Christian Couder
2023-09-15 10:09 ` Christian Couder
Code repositories for project(s) associated with this public inbox
https://80x24.org/pub/scm/git/git.git/
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).