Git Mailing List Archive mirror
 help / color / mirror / Atom feed
From: Jeff King <peff@peff.net>
To: Taylor Blau <me@ttaylorr.com>
Cc: git@vger.kernel.org, Chris Torek <chris.torek@gmail.com>,
	Derrick Stolee <derrickstolee@github.com>,
	Junio C Hamano <gitster@pobox.com>
Subject: Re: [PATCH v3 2/2] builtin/pack-objects.c: introduce `pack.recentObjectsHook`
Date: Fri, 12 May 2023 17:45:42 -0400	[thread overview]
Message-ID: <20230512214542.GB2495860@coredump.intra.peff.net> (raw)
In-Reply-To: <20230512212456.GA2495860@coredump.intra.peff.net>

On Fri, May 12, 2023 at 05:24:56PM -0400, Jeff King wrote:

> > This patch introduces a new configuration, `pack.recentObjectsHook`
> > which allows the caller to specify a program (or set of programs) whose
> > output is treated as a set of objects to treat as recent, regardless of
> > their true age.
> 
> I was going to complain about putting this in the "pack" section,
> because I thought by touching reachable.c, we'd also affect git-prune.
> But I don't think we do, because it does its own direct mtime check on
> the loose objects.
> 
> But I'm not sure that's the right behavior.
> 
> It feels like even before your patch, this is a huge gap in our
> object-retention strategy.  During repacking, we try to avoid dropping
> objects which are reachable from recent-but-unreachable things we're
> keeping (since otherwise it effectively corrupts those recent objects,
> making them less valuable to keep). But git-prune will happily drop them
> anyway!
> 
> And I think the same thing would apply to your hook. If the hook says
> "object XYZ is precious even if unreachable, keep it", then git-prune
> ignoring that seems like it would be a source of errors.
> 
> I suspect both could be fixed by having git-prune trigger the same
> add_unseen_recent_objects_to_traversal() call either as part of
> the perform_reachability_traversal() walk, or maybe in its own walk (I
> think maybe it has to be its own because the second walk should avoid
> complaining about missing objects).

<phew> I am happy to say that I was wrong here, and git-prune behaves as
it should, courtesy of d3038d22f9 (prune: keep objects reachable from
recent objects, 2014-10-15). The magic happens in mark_reachable_objects(),
which handles walking the recent objects by calling...you guessed it,
add_unseen_recent_objects_to_traversal().

So it does the right thing now, and your patch should kick in
automatically for git-prune, too. But I think we'd want two things:

  1. Should the config variable name be made more generic to match?
     Maybe "core" is too broad (though certainly I'd expect it to apply
     anywhere in Git where we check recent-ness of objects), but perhaps
     "gc" would make sense (even though it is not strictly part of the
     gc command, it is within that realm of concepts).

  2. We probably want a test covering git-prune in this situation.

The thing that confused me when looking at the code earlier is that
git-prune itself checks the mtime of the objects, too, even if they were
not mentioned in the recent-but-reachable walk. That seems redundant to
me, but somehow it isn't. If I do:

diff --git a/builtin/prune.c b/builtin/prune.c
index 5dc9b20720..22b7ce4b10 100644
--- a/builtin/prune.c
+++ b/builtin/prune.c
@@ -92,7 +92,7 @@ static int prune_object(const struct object_id *oid, const char *fullpath,
 		return 0;
 	}
 	if (st.st_mtime > expire)
-		return 0;
+		BUG("object should have been saved via is_object_reachable!");
 	if (show_only || verbose) {
 		enum object_type type = oid_object_info(the_repository, oid,
 							NULL);

then t5304 shows some failures. I'm not quite sure what is going on, but
_if_ that mtime check in prune_object() is important, then it probably
should also respect the hook mechanism. So there may be a (3) to add to
your patch there.

-Peff

  parent reply	other threads:[~2023-05-12 21:45 UTC|newest]

Thread overview: 28+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2023-05-11 23:20 [PATCH v3 0/2] pack-objects: introduce `pack.extraRecentObjectsHook` Taylor Blau
2023-05-11 23:20 ` [PATCH v3 1/2] reachable.c: extract `obj_is_recent()` Taylor Blau
2023-05-11 23:20 ` [PATCH v3 2/2] builtin/pack-objects.c: introduce `pack.recentObjectsHook` Taylor Blau
2023-05-12  4:58   ` Jeff King
2023-05-15 20:15     ` Taylor Blau
2023-05-12 21:24   ` Jeff King
2023-05-12 21:36     ` Taylor Blau
2023-05-12 21:46       ` Jeff King
2023-05-12 21:45     ` Jeff King [this message]
2023-05-12 22:01       ` Jeff King
2023-05-12 23:21         ` Junio C Hamano
2023-05-13  0:11           ` Jeff King
2023-05-13  0:11             ` Jeff King
2023-05-15 20:49       ` Taylor Blau
2023-05-15 20:38     ` Taylor Blau
2023-05-11 23:23 ` [PATCH v3 0/2] pack-objects: introduce `pack.extraRecentObjectsHook` Taylor Blau
2023-05-11 23:39 ` Junio C Hamano
2023-05-11 23:48   ` Taylor Blau
2023-05-16  0:23 ` [PATCH v4 0/2] gc: introduce `gc.recentObjectsHook` Taylor Blau
2023-05-16  0:24   ` [PATCH v4 1/2] reachable.c: extract `obj_is_recent()` Taylor Blau
2023-05-16  0:24   ` [PATCH v4 2/2] gc: introduce `gc.recentObjectsHook` Taylor Blau
2023-05-24 23:21     ` Glen Choo
2023-06-07 22:56       ` Taylor Blau
2023-06-07 22:58 ` [PATCH v5 0/2] " Taylor Blau
2023-06-07 22:58   ` [PATCH v5 1/2] reachable.c: extract `obj_is_recent()` Taylor Blau
2023-06-07 22:58   ` [PATCH v5 2/2] gc: introduce `gc.recentObjectsHook` Taylor Blau
2023-06-09 23:33     ` Glen Choo
2023-06-12 21:14       ` Junio C Hamano

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20230512214542.GB2495860@coredump.intra.peff.net \
    --to=peff@peff.net \
    --cc=chris.torek@gmail.com \
    --cc=derrickstolee@github.com \
    --cc=git@vger.kernel.org \
    --cc=gitster@pobox.com \
    --cc=me@ttaylorr.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).