From: Jeff King <peff@peff.net>
To: Taylor Blau <me@ttaylorr.com>
Cc: git@vger.kernel.org, Junio C Hamano <gitster@pobox.com>,
Derrick Stolee <derrickstolee@github.com>
Subject: Re: [PATCH v2] builtin/pack-objects.c: introduce `pack.extraCruftTips`
Date: Wed, 3 May 2023 15:59:06 -0400 [thread overview]
Message-ID: <20230503195906.GA1796816@coredump.intra.peff.net> (raw)
In-Reply-To: <73ad7b90e1fe6c15f41ff828651f7ab06076ffd8.1683072587.git.me@ttaylorr.com>
On Tue, May 02, 2023 at 08:09:47PM -0400, Taylor Blau wrote:
> However, there is no option to be able to keep around certain objects
> that have otherwise aged out of the grace period. The only way to retain
> those objects is:
>
> - to point a reference at them, which may be undesirable or
> infeasible,
> - to track them via the reflog, which may be undesirable since the
> reflog's lifetime is limited to that of the reference it's tracking
> (and callers may want to keep those unreachable objects around for
> longer)
> - to extend the grace period, which may keep around other objects that
> the caller *does* want to discard,
> - or, to force the caller to construct the pack of objects they want
> to keep themselves, and then mark the pack as kept by adding a
> ".keep" file.
OK, I understand the use case you're trying to support, and your
approach mostly makes sense. But there are two things I was surprised by
in the implementation:
1. Does this need to be tied to cruft packs? The same logic would
apply to "repack -A" which turns objects loose (of course you
probably don't want to do that in the long term for performance
reasons, but it's conceptually the same thing; see below).
2. Why is there a separate walk distinct from the one that rescues
recent-but-unreachable objects?
Conceptually it seems to me that the simplest and most flexible way to
think of this new feature is: pretend these objects are recent enough to
be kept in the grace period, even though their mtimes do not qualify".
And then everything else would just fall out naturally. Am I missing
something?
In a pre-cruft-pack world, I'd have just expected the patch to look like
this:
diff --git a/builtin/pack-objects.c b/builtin/pack-objects.c
index bd6ad016d6..1d655dc758 100644
--- a/builtin/pack-objects.c
+++ b/builtin/pack-objects.c
@@ -4077,6 +4077,7 @@ static void get_object_list(struct rev_info *revs, int ac, const char **av)
if (add_unseen_recent_objects_to_traversal(revs,
unpack_unreachable_expiration, NULL, 0))
die(_("unable to add recent objects"));
+ add_tips_from_program(revs);
if (prepare_revision_walk(revs))
die(_("revision walk setup failed"));
traverse_commit_list(revs, record_recent_commit,
Sadly the cruft-pack feature doesn't really share this code, but I think
it does something similar, and could just consider those synthetic tips
as "recent" for this run.
> +static int enumerate_extra_cruft_tips_1(struct rev_info *revs, const char *args)
> +{
> + struct child_process cmd = CHILD_PROCESS_INIT;
> + struct strbuf buf = STRBUF_INIT;
> + FILE *out = NULL;
> + int ret = 0;
> +
> + cmd.use_shell = 1;
> + cmd.out = -1;
> +
> + strvec_push(&cmd.args, args);
> + strvec_pushv(&cmd.env, (const char **)local_repo_env);
Why does this clear the environment of local-repo variables? That seems
unlike most other hooks we have, and would cause confusion if $GIT_DIR
or various config variables are important (e.g., should "git -c foo.bar
gc" persist foo.bar when running the hook? It usually does).
I know that some hooks that try to change repositories by changing
directories have the opposite confusion ($GIT_DIR is set, but they did
not want it). But it makes more sense to me to remain consistent with
how other hooks behave.
> + if (start_command(&cmd)) {
> + ret = -1;
> + goto done;
> + }
This may be a matter of taste, but you can "return -1" directly here, as
nothing has been allocated (and a failed start_command() will call
child_process_clear() for you). This would mean "out" is always set in
the "done:" label, so it wouldn't need a NULL-check (nor an
initialization).
> +
> + out = xfdopen(cmd.out, "r");
> + while (strbuf_getline_lf(&buf, out) != EOF) {
is there any reason to be a stickler for LF versus CRLF here? I.e.,
wouldn't strbuf_getline() be more friendly, with little chance that we
misinterpret the result?
> + struct object_id oid;
> + struct object *obj;
> + const char *rest;
> +
> + if (parse_oid_hex(buf.buf, &oid, &rest) || *rest) {
> + ret = error(_("invalid extra cruft tip: '%s'"), buf.buf);
> + goto done;
> + }
> +
> + obj = parse_object(the_repository, &oid);
> + if (!obj)
> + continue;
This parse_object() can be pretty expensive, especially if you are
rescuing a lot of objects (or if your program directly references large
blobs). Can we use oid_object_info() here in combination with
lookup_object_by_type()?
-Peff
next prev parent reply other threads:[~2023-05-03 19:59 UTC|newest]
Thread overview: 31+ messages / expand[flat|nested] mbox.gz Atom feed top
2023-04-20 17:27 [PATCH] builtin/pack-objects.c: introduce `pack.extraCruftTips` Taylor Blau
2023-04-20 18:12 ` Junio C Hamano
2023-04-20 19:30 ` Taylor Blau
2023-04-20 19:52 ` Junio C Hamano
2023-04-20 20:48 ` Taylor Blau
2023-04-21 0:10 ` Chris Torek
2023-04-21 2:14 ` Taylor Blau
2023-04-25 19:42 ` Derrick Stolee
2023-04-25 21:25 ` Taylor Blau
2023-04-26 10:52 ` Derrick Stolee
2023-05-03 0:06 ` Taylor Blau
2023-05-03 0:09 ` [PATCH v2] " Taylor Blau
2023-05-03 14:01 ` Derrick Stolee
2023-05-03 19:59 ` Jeff King [this message]
2023-05-03 21:22 ` Taylor Blau
2023-05-05 21:23 ` Jeff King
2023-05-06 0:06 ` Taylor Blau
2023-05-06 0:14 ` Taylor Blau
2023-05-03 21:28 ` Taylor Blau
2023-05-05 21:26 ` Jeff King
2023-05-05 22:13 ` Jeff King
2023-05-06 0:13 ` Taylor Blau
2023-05-06 0:20 ` Taylor Blau
2023-05-06 2:12 ` Jeff King
2023-05-03 22:05 ` [PATCH v3] " Taylor Blau
2023-05-03 23:18 ` Junio C Hamano
2023-05-03 23:42 ` Junio C Hamano
2023-05-03 23:48 ` Taylor Blau
2023-05-03 23:50 ` Taylor Blau
2023-05-05 21:39 ` Jeff King
2023-05-05 22:19 ` Jeff King
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20230503195906.GA1796816@coredump.intra.peff.net \
--to=peff@peff.net \
--cc=derrickstolee@github.com \
--cc=git@vger.kernel.org \
--cc=gitster@pobox.com \
--cc=me@ttaylorr.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).