Git Mailing List Archive mirror
 help / color / mirror / Atom feed
From: Taylor Blau <me@ttaylorr.com>
To: Glen Choo <chooglen@google.com>
Cc: git@vger.kernel.org, Derrick Stolee <derrickstolee@github.com>,
	Jeff King <peff@peff.net>, Junio C Hamano <gitster@pobox.com>,
	Chris Torek <chris.torek@gmail.com>,
	me@waleedkhan.name, martinvonz@google.com
Subject: Re: [PATCH v4 2/2] gc: introduce `gc.recentObjectsHook`
Date: Wed, 7 Jun 2023 18:56:32 -0400	[thread overview]
Message-ID: <ZIELIAQmW9qWyUdM@nand.local> (raw)
In-Reply-To: <kl6l1qj5z56p.fsf@chooglen-macbookpro.roam.corp.google.com>

Hi Glen,

Apologies for my delayed response. Now that we are on the other side of
2.41 I think that it's the right time to pick this back up again.

On Wed, May 24, 2023 at 04:21:34PM -0700, Glen Choo wrote:
> Hi Taylor! It was great seeing you at Review Club today :)

It was fun :-).

> It would be useful to flesh out why keeping these extra refs are
> either "undesirable" or "infeasible". Presumably, you already have
> some idea of why this is the case for the GitHub 'audit log'.

Yes, thanks for suggesting it. I think the gap in my explanation is
"[...] if there are many such objects". Hopefully that clarifies it.

> Another potential use case I can think of is client-side third party Git
> tools that implement custom workflows on top of a Git repo, e.g.
> git-branchless (https://github.com/arxanas/git-branchless) and jj
> (https://github.com/martinvonz/jj/, btw I contribute a little to jj
> too).

I thought that this was an interesting part of the discussion. I hadn't
thought of it when writing up these patches, but I think that it could
be potentially useful for those tools if they want to keep around some
precious set of metadata objects without having to point refs at them.

It also introduces a little bit of a higher barrier between the tool and
user to destroy those objects. Without pinning them with this hook, all
a user has to do to remove them is drop the reference(s) which points at
them, and then GC. Now they'd have to modify the hook, etc.

> > +gc.recentObjectsHook::
>
> I have a small preference to use "command" instead of "hook" to avoid
> confusion with Git hooks (I've already observed some of this confusion
> in off-list conversations). There's some precedent for "hook" in
> `uploadpack.packObjectsHook`, but that's a one-off (and it used to
> confuse me a lot too :P).

Unless you feel strongly, let's leave it as-is. "gc.recentObjectsHook"
is the third iteration of this name, and I'd like to avoid spending much
more time on naming if we can help it.

> > +	When considering the recency of an object (e.g., when generating
> > +	a cruft pack or storing unreachable objects as loose), use the
> > +	shell to execute the specified command(s). Interpret their
> > +	output as object IDs which Git will consider as "recent",
> > +	regardless of their age.
>
> >From a potential user's POV, two things seem unclear:
>
> - What does "recenct" mean in this context? Does it just mean
>   "precious"?
> - What objects do I need to list? Is it every object I want to keep or
>   just the reachable tips?

To answer your questions: recency is referring to the "mtime" of an
object [^1], not whether or not it is precious. I clarified this by
removing the term "recent" from this sentence altogether, to instead
read:

    "When considering whether or not to remove an object [...]"

You only need to list the tips, since Git will treat the output of the
hook as input to a traversal which allows for missing objects. Any
object visited along that traversal will be kept in the repository and
rescued from deletion. I tried to clarify this by adding a final
sentence (emphasis mine):

    "By treating their mtimes as "now", any objects **(and their
    descendants)** mentioned in the output will be kept regardless of
    their true age."

> In the code changes, I noticed a few out-of-date references to "cruft
> tips", but everything else looked reasonable to me.

Thanks, I'll clean those up and resubmit it with the above fixes
squashed in.

Thanks,
Taylor

[^1]: an object's mtime is the most recent of (1) the st_mtime of a its
  loose object file, if it exists, (2) the st_mtime of any non-cruft
  pack(s) that contain that object, or (3) the value listed in the .mtimes
  file of the cruft pack corresponding to that object's entry.

  reply	other threads:[~2023-06-07 22:57 UTC|newest]

Thread overview: 28+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2023-05-11 23:20 [PATCH v3 0/2] pack-objects: introduce `pack.extraRecentObjectsHook` Taylor Blau
2023-05-11 23:20 ` [PATCH v3 1/2] reachable.c: extract `obj_is_recent()` Taylor Blau
2023-05-11 23:20 ` [PATCH v3 2/2] builtin/pack-objects.c: introduce `pack.recentObjectsHook` Taylor Blau
2023-05-12  4:58   ` Jeff King
2023-05-15 20:15     ` Taylor Blau
2023-05-12 21:24   ` Jeff King
2023-05-12 21:36     ` Taylor Blau
2023-05-12 21:46       ` Jeff King
2023-05-12 21:45     ` Jeff King
2023-05-12 22:01       ` Jeff King
2023-05-12 23:21         ` Junio C Hamano
2023-05-13  0:11           ` Jeff King
2023-05-13  0:11             ` Jeff King
2023-05-15 20:49       ` Taylor Blau
2023-05-15 20:38     ` Taylor Blau
2023-05-11 23:23 ` [PATCH v3 0/2] pack-objects: introduce `pack.extraRecentObjectsHook` Taylor Blau
2023-05-11 23:39 ` Junio C Hamano
2023-05-11 23:48   ` Taylor Blau
2023-05-16  0:23 ` [PATCH v4 0/2] gc: introduce `gc.recentObjectsHook` Taylor Blau
2023-05-16  0:24   ` [PATCH v4 1/2] reachable.c: extract `obj_is_recent()` Taylor Blau
2023-05-16  0:24   ` [PATCH v4 2/2] gc: introduce `gc.recentObjectsHook` Taylor Blau
2023-05-24 23:21     ` Glen Choo
2023-06-07 22:56       ` Taylor Blau [this message]
2023-06-07 22:58 ` [PATCH v5 0/2] " Taylor Blau
2023-06-07 22:58   ` [PATCH v5 1/2] reachable.c: extract `obj_is_recent()` Taylor Blau
2023-06-07 22:58   ` [PATCH v5 2/2] gc: introduce `gc.recentObjectsHook` Taylor Blau
2023-06-09 23:33     ` Glen Choo
2023-06-12 21:14       ` Junio C Hamano

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=ZIELIAQmW9qWyUdM@nand.local \
    --to=me@ttaylorr.com \
    --cc=chooglen@google.com \
    --cc=chris.torek@gmail.com \
    --cc=derrickstolee@github.com \
    --cc=git@vger.kernel.org \
    --cc=gitster@pobox.com \
    --cc=martinvonz@google.com \
    --cc=me@waleedkhan.name \
    --cc=peff@peff.net \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).