Git Mailing List Archive mirror
 help / color / mirror / Atom feed
* [PATCH 0/2] Allow configuration for loose-objects maintenance task
@ 2025-03-24  0:51 Derrick Stolee via GitGitGadget
  2025-03-24  0:51 ` [PATCH 1/2] maintenance: force progress/no-quiet to children Derrick Stolee via GitGitGadget
  2025-03-24  0:51 ` [PATCH 2/2] maintenance: add loose-objects.batchSize config Derrick Stolee via GitGitGadget
  0 siblings, 2 replies; 4+ messages in thread
From: Derrick Stolee via GitGitGadget @ 2025-03-24  0:51 UTC (permalink / raw)
  To: git; +Cc: gitster, Derrick Stolee

The loose-objects task for the git maintenance run command has a hard-coded
limit. The limit exists by default for the purposes of background
maintenance, but can be misleading if users truly want to clean up all loose
objects in one command (and don't want to use git repack). This adds a new
maintenance.loose-objects.batchSize config option to help users adjust this
value up or down.

When testing, I noticed that progress indicators were not always provided
when isatty(2) is false. This is because the --[no-]quiet option was not
appropriately passing to child processes. A small change fixes this before
the config is added, so we can test the results using stderr output.

Thanks,

 * Stolee

Derrick Stolee (2):
  maintenance: force progress/no-quiet to children
  maintenance: add loose-objects.batchSize config

 Documentation/config/maintenance.adoc |  5 +++++
 Documentation/git-maintenance.adoc    | 18 ++++++++++-------
 builtin/gc.c                          | 20 +++++++++++++++++++
 t/t7900-maintenance.sh                | 28 +++++++++++++++++++++++++++
 4 files changed, 64 insertions(+), 7 deletions(-)


base-commit: 683c54c999c301c2cd6f715c411407c413b1d84e
Published-As: https://github.com/gitgitgadget/git/releases/tag/pr-1885%2Fderrickstolee%2Floose-objects-batch-size-v1
Fetch-It-Via: git fetch https://github.com/gitgitgadget/git pr-1885/derrickstolee/loose-objects-batch-size-v1
Pull-Request: https://github.com/gitgitgadget/git/pull/1885
-- 
gitgitgadget

^ permalink raw reply	[flat|nested] 4+ messages in thread

* [PATCH 1/2] maintenance: force progress/no-quiet to children
  2025-03-24  0:51 [PATCH 0/2] Allow configuration for loose-objects maintenance task Derrick Stolee via GitGitGadget
@ 2025-03-24  0:51 ` Derrick Stolee via GitGitGadget
  2025-03-24  0:51 ` [PATCH 2/2] maintenance: add loose-objects.batchSize config Derrick Stolee via GitGitGadget
  1 sibling, 0 replies; 4+ messages in thread
From: Derrick Stolee via GitGitGadget @ 2025-03-24  0:51 UTC (permalink / raw)
  To: git; +Cc: gitster, Derrick Stolee, Derrick Stolee

From: Derrick Stolee <stolee@gmail.com>

The --no-quiet option for 'git maintenance run' is supposed to indicate
that progress should happen even while ignoring the value of isatty(2).
However, Git implicitly asks child processes to check isatty(2) since
these arguments are not passed through.

The pass through of --no-quiet will be useful in a test in the next
change.

Signed-off-by: Derrick Stolee <stolee@gmail.com>
---
 builtin/gc.c | 10 ++++++++++
 1 file changed, 10 insertions(+)

diff --git a/builtin/gc.c b/builtin/gc.c
index 99431fd4674..6672f165bda 100644
--- a/builtin/gc.c
+++ b/builtin/gc.c
@@ -1029,6 +1029,8 @@ static int run_write_commit_graph(struct maintenance_run_opts *opts)
 
 	if (opts->quiet)
 		strvec_push(&child.args, "--no-progress");
+	else
+		strvec_push(&child.args, "--progress");
 
 	return !!run_command(&child);
 }
@@ -1185,6 +1187,8 @@ static int pack_loose(struct maintenance_run_opts *opts)
 	strvec_push(&pack_proc.args, "pack-objects");
 	if (opts->quiet)
 		strvec_push(&pack_proc.args, "--quiet");
+	else
+		strvec_push(&pack_proc.args, "--no-quiet");
 	strvec_pushf(&pack_proc.args, "%s/pack/loose", r->objects->odb->path);
 
 	pack_proc.in = -1;
@@ -1263,6 +1267,8 @@ static int multi_pack_index_write(struct maintenance_run_opts *opts)
 
 	if (opts->quiet)
 		strvec_push(&child.args, "--no-progress");
+	else
+		strvec_push(&child.args, "--progress");
 
 	if (run_command(&child))
 		return error(_("failed to write multi-pack-index"));
@@ -1279,6 +1285,8 @@ static int multi_pack_index_expire(struct maintenance_run_opts *opts)
 
 	if (opts->quiet)
 		strvec_push(&child.args, "--no-progress");
+	else
+		strvec_push(&child.args, "--progress");
 
 	if (run_command(&child))
 		return error(_("'git multi-pack-index expire' failed"));
@@ -1335,6 +1343,8 @@ static int multi_pack_index_repack(struct maintenance_run_opts *opts)
 
 	if (opts->quiet)
 		strvec_push(&child.args, "--no-progress");
+	else
+		strvec_push(&child.args, "--progress");
 
 	strvec_pushf(&child.args, "--batch-size=%"PRIuMAX,
 				  (uintmax_t)get_auto_pack_size());
-- 
gitgitgadget


^ permalink raw reply related	[flat|nested] 4+ messages in thread

* [PATCH 2/2] maintenance: add loose-objects.batchSize config
  2025-03-24  0:51 [PATCH 0/2] Allow configuration for loose-objects maintenance task Derrick Stolee via GitGitGadget
  2025-03-24  0:51 ` [PATCH 1/2] maintenance: force progress/no-quiet to children Derrick Stolee via GitGitGadget
@ 2025-03-24  0:51 ` Derrick Stolee via GitGitGadget
  2025-03-24  6:11   ` Junio C Hamano
  1 sibling, 1 reply; 4+ messages in thread
From: Derrick Stolee via GitGitGadget @ 2025-03-24  0:51 UTC (permalink / raw)
  To: git; +Cc: gitster, Derrick Stolee, Derrick Stolee

From: Derrick Stolee <stolee@gmail.com>

The 'loose-objects' task of 'git maintenance run' first deletes loose
objects that exit within packfiles and then collects loose objects into
a packfile. This second step uses an implicit limit of fifty thousand
that cannot be modified by users.

Add a new config option that allows this limit to be adjusted or ignored
entirely.

While creating tests for this option, I noticed that actually there was
an off-by-one error due to the strict comparison in the limit check. I
considered making the limit check turn true on equality, but instead I
thought to use INT_MAX as a "no limit" barrier which should mean it's
never possible to hit the limit. Thus, a new decrement to the limit is
provided if the value is positive. (The restriction to positive values
is to avoid underflow if INT_MIN is configured.)

Signed-off-by: Derrick Stolee <stolee@gmail.com>
---
 Documentation/config/maintenance.adoc |  5 +++++
 Documentation/git-maintenance.adoc    | 18 ++++++++++-------
 builtin/gc.c                          | 10 ++++++++++
 t/t7900-maintenance.sh                | 28 +++++++++++++++++++++++++++
 4 files changed, 54 insertions(+), 7 deletions(-)

diff --git a/Documentation/config/maintenance.adoc b/Documentation/config/maintenance.adoc
index 72a9d6cf816..42f9545da0e 100644
--- a/Documentation/config/maintenance.adoc
+++ b/Documentation/config/maintenance.adoc
@@ -61,6 +61,11 @@ maintenance.loose-objects.auto::
 	loose objects is at least the value of `maintenance.loose-objects.auto`.
 	The default value is 100.
 
+maintenance.loose-objects.batchSize::
+	This integer config option controls the maximum number of loose objects
+	written into a packfile during the `loose-objects` task. The default is
+	fifty thousand. Use value `0` to indicate no limit.
+
 maintenance.incremental-repack.auto::
 	This integer config option controls how often the `incremental-repack`
 	task should be run as part of `git maintenance run --auto`. If zero,
diff --git a/Documentation/git-maintenance.adoc b/Documentation/git-maintenance.adoc
index 0450d74aff1..c90b370b1fc 100644
--- a/Documentation/git-maintenance.adoc
+++ b/Documentation/git-maintenance.adoc
@@ -126,13 +126,17 @@ loose-objects::
 	objects that already exist in a pack-file; concurrent Git processes
 	will examine the pack-file for the object data instead of the loose
 	object. Second, it creates a new pack-file (starting with "loose-")
-	containing a batch of loose objects. The batch size is limited to 50
-	thousand objects to prevent the job from taking too long on a
-	repository with many loose objects. The `gc` task writes unreachable
-	objects as loose objects to be cleaned up by a later step only if
-	they are not re-added to a pack-file; for this reason it is not
-	advisable to enable both the `loose-objects` and `gc` tasks at the
-	same time.
+	containing a batch of loose objects.
++
+The batch size defaults to fifty thousand objects to prevent the job from
+taking too long on a repository with many loose objects. Use the
+`maintenance.loose-objects.batchSize` config option to adjust this size,
+including a value of `0` to remove the limit.
++
+The `gc` task writes unreachable objects as loose objects to be cleaned up
+by a later step only if they are not re-added to a pack-file; for this
+reason it is not advisable to enable both the `loose-objects` and `gc`
+tasks at the same time.
 
 incremental-repack::
 	The `incremental-repack` job repacks the object directory
diff --git a/builtin/gc.c b/builtin/gc.c
index 6672f165bda..817081e1a50 100644
--- a/builtin/gc.c
+++ b/builtin/gc.c
@@ -1163,6 +1163,7 @@ static int write_loose_object_to_stdin(const struct object_id *oid,
 
 	fprintf(d->in, "%s\n", oid_to_hex(oid));
 
+	/* If batch_size is INT_MAX, then this will return 0 always. */
 	return ++(d->count) > d->batch_size;
 }
 
@@ -1208,6 +1209,15 @@ static int pack_loose(struct maintenance_run_opts *opts)
 	data.count = 0;
 	data.batch_size = 50000;
 
+	repo_config_get_int(r, "maintenance.loose-objects.batchSize",
+			    &data.batch_size);
+
+	/* If configured as 0, then remove limit. */
+	if (!data.batch_size)
+		data.batch_size = INT_MAX;
+	else if (data.batch_size > 0)
+		data.batch_size--; /* Decrease for equality on limit. */
+
 	for_each_loose_file_in_objdir(r->objects->odb->path,
 				      write_loose_object_to_stdin,
 				      NULL,
diff --git a/t/t7900-maintenance.sh b/t/t7900-maintenance.sh
index 1909aed95e0..834ddb5ad68 100755
--- a/t/t7900-maintenance.sh
+++ b/t/t7900-maintenance.sh
@@ -306,6 +306,34 @@ test_expect_success 'maintenance.loose-objects.auto' '
 	test_subcommand git prune-packed --quiet <trace-loC
 '
 
+test_expect_success 'maintenance.loose-objects.batchSize' '
+	git init loose-batch &&
+
+	# This creates three objects per commit.
+	test_commit_bulk -C loose-batch 34 &&
+	pack=$(ls loose-batch/.git/objects/pack/pack-*.pack) &&
+	index="${pack%pack}idx" &&
+	rm "$index" &&
+	git -C loose-batch unpack-objects <"$pack" &&
+	git -C loose-batch config maintenance.loose-objects.batchSize 50 &&
+
+	GIT_PROGRESS_DELAY=0 \
+	git -C loose-batch maintenance run --no-quiet --task=loose-objects 2>err &&
+	grep "Enumerating objects: 50, done." err &&
+
+	GIT_PROGRESS_DELAY=0 \
+	git -C loose-batch maintenance run --no-quiet --task=loose-objects 2>err &&
+	grep "Enumerating objects: 50, done." err &&
+
+	GIT_PROGRESS_DELAY=0 \
+	git -C loose-batch maintenance run --no-quiet --task=loose-objects 2>err &&
+	grep "Enumerating objects: 2, done." err &&
+
+	GIT_PROGRESS_DELAY=0 \
+	git -C loose-batch maintenance run --no-quiet --task=loose-objects 2>err &&
+	test_must_be_empty err
+'
+
 test_expect_success 'incremental-repack task' '
 	packDir=.git/objects/pack &&
 	for i in $(test_seq 1 5)
-- 
gitgitgadget

^ permalink raw reply related	[flat|nested] 4+ messages in thread

* Re: [PATCH 2/2] maintenance: add loose-objects.batchSize config
  2025-03-24  0:51 ` [PATCH 2/2] maintenance: add loose-objects.batchSize config Derrick Stolee via GitGitGadget
@ 2025-03-24  6:11   ` Junio C Hamano
  0 siblings, 0 replies; 4+ messages in thread
From: Junio C Hamano @ 2025-03-24  6:11 UTC (permalink / raw)
  To: Derrick Stolee via GitGitGadget; +Cc: git, Derrick Stolee

"Derrick Stolee via GitGitGadget" <gitgitgadget@gmail.com> writes:

> From: Derrick Stolee <stolee@gmail.com>
>
> The 'loose-objects' task of 'git maintenance run' first deletes loose
> objects that exit within packfiles and then collects loose objects into

"exit" -> "exist"?  It may read better to also do "collects" ->
"collects remaining".

> a packfile. This second step uses an implicit limit of fifty thousand
> that cannot be modified by users.
>
> Add a new config option that allows this limit to be adjusted or ignored
> entirely.
>
> While creating tests for this option, I noticed that actually there was
> an off-by-one error due to the strict comparison in the limit check.

Ahh, I, contrary to my usual routine, started reading from the code
change before reading the proposed log message and was wondering
about this exact point.  

> +	/* If batch_size is INT_MAX, then this will return 0 always. */

Cute ;-).

>  	return ++(d->count) > d->batch_size;
>  }

^ permalink raw reply	[flat|nested] 4+ messages in thread

end of thread, other threads:[~2025-03-24  6:11 UTC | newest]

Thread overview: 4+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2025-03-24  0:51 [PATCH 0/2] Allow configuration for loose-objects maintenance task Derrick Stolee via GitGitGadget
2025-03-24  0:51 ` [PATCH 1/2] maintenance: force progress/no-quiet to children Derrick Stolee via GitGitGadget
2025-03-24  0:51 ` [PATCH 2/2] maintenance: add loose-objects.batchSize config Derrick Stolee via GitGitGadget
2025-03-24  6:11   ` Junio C Hamano

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).