From: Derrick Stolee <derrickstolee@github.com>
To: Taylor Blau <me@ttaylorr.com>
Cc: git@vger.kernel.org, Jeff King <peff@peff.net>,
Junio C Hamano <gitster@pobox.com>
Subject: Re: [PATCH] builtin/pack-objects.c: introduce `pack.extraCruftTips`
Date: Wed, 26 Apr 2023 06:52:26 -0400 [thread overview]
Message-ID: <97d84abc-9b25-9d43-503e-264b33c223d4@github.com> (raw)
In-Reply-To: <ZEhFScCoAgGGM/th@nand.local>
On 4/25/2023 5:25 PM, Taylor Blau wrote:
> On Tue, Apr 25, 2023 at 03:42:51PM -0400, Derrick Stolee wrote:
>> On 4/20/2023 1:27 PM, Taylor Blau wrote:
>>> The implementation is straightforward: after determining the set of
>>> objects to pack into the cruft pack (either by calling
>>> `enumerate_cruft_objects()` or `enumerate_and_traverse_cruft_objects()`)
>>> we figure out if there are any program(s) we need to consult in order to
>>> determine if there are any objects which we need to "rescue". We then
>>> add those as tips to another reachability traversal, marking every
>>> object along the way as cruft (and thus retaining it in the cruft pack).
>>
>> I initially wondered why we would need a second walk, and I _think_
>> one reason is that we don't want to update the mtimes for these
>> objects as if they were reachable. The next point about the reachable
>> pack is also critical.
>
> Their mtimes might be updated during that second walk, but the key point
> is that we want to rescue any objects that the extra tips might reach
> (but aren't listed themselves in the output of one of the hooks).
>
> The idea there is similar to the rescuing pass we do for pruning GC's,
> where we'll save any objects which would have been pruned, but are
> reachable from another object which won't yet be pruned.
>
>>> Instead, keep these unreachable objects in the cruft pack instead, to
>>> ensure that we can continue to have a pack containing just reachable
>>> objects, which is always safe to write a bitmap over.
>>
>> Makes sense. Also: don't update their mtime so they could be removed
>> immediately if the external pointers change.
>
> I don't think I'm quite following why we would want to leave their
> mtimes alone here. I think we'd want their mtimes to be accurate as of
> the last time we updated the *.mtimes file so that if they (a) no longer
> appear in the output of one of the extra-cruft hooks, and (b) they have
> been written recently, that they would not be pruned immediately.
You corrected me that we _are_ updating mtimes, so my understanding was
incorrect and this follow-up is incorrect. Thanks.
Is it possible to demonstrate this mtime-updating in a test?
>>> + if (progress)
>>> + progress_state = start_progress(_("Enumerating extra cruft tips"), 0);
>>
>> Should we be including two progress counts? or would it be sufficient
>> to say "traversing extra cruft objects" and include both of these steps
>> in that one progress mode?
>
> I may be missing something, but there are already two progress meters
> here: one (above) for enumerating the list of extra cruft tips, which
> increments each time we read a new line that refers to an extant object,
> and another (below) for the number of objects that we visit along the
> second walk.
I meant: why two counts? Should we instead only have one?
>>> + if (prepare_revision_walk(&revs))
>>> + die(_("revision walk setup failed"));
>>> + if (progress)
>>> + progress_state = start_progress(_("Traversing extra cruft objects"), 0);
>>> + nr_seen = 0;
>>> + traverse_commit_list(&revs, show_cruft_commit, show_cruft_object, NULL);
>>
>> Hm. I'm trying to look at these helpers and figure out what they
>> are doing to prevent these objects from being pruned. This could
>> use a comment or something, as I'm unable to grok it at present.
>
> Very fair, the show_cruft_object() callback calls
> add_cruft_object_entry(), which adds the visited object to the cruft
> pack regardless of its mtime with respect to pruning.
Ah, thanks. That helps a lot.
> A comment here might look like:
>
> /* retain all objects reachable from extra tips via show_cruft_object() */
>
> or something.
Sounds good.
>> Could we store this commit to make sure it is removed from the
>> repository later?
>
> Possibly, though I think we already have good coverage of this in other
> tests. So I'd be equally happy to assert on the exact contents of the
> cruft pack (which wouldn't include the above commit), but I'm happy to
> assert on it more directly if you feel strongly.
This is a case of me thinking "can we test the test?" to make sure
the test is doing everything we think it is doing...
>>> + git checkout --orphan old &&
>>> + git rm -fr . &&
>>> + test_commit --no-tag cruft.old &&
>>> + cruft_old="$(git rev-parse HEAD)" &&
>>> +
>>> + git checkout --orphan new &&
>>> + git rm -fr . &&
>>> + test_commit --no-tag cruft.new &&
>>> + cruft_new="$(git rev-parse HEAD)" &&
>>> +
>>> + git checkout main &&
>>> + git branch -D old new &&
>>
>> Do you need to delete the 'discard' branch here?
>
> Great catch. I didn't notice it here because even though it was
> reachable (and not mentioned as an extra tip), so didn't appear in the
> cruft pack.
...so we catch things like this automatically.
>>> + mtimes="$(ls .git/objects/pack/pack-*.mtimes)" &&
>>> + git show-index <${mtimes%.mtimes}.idx >cruft &&
>>> + cut -d" " -f2 cruft | sort >cruft.actual &&
>>> + test_cmp cruft.expect cruft.actual
>>
>> One thing that would be good, as a safety check, is to remove
>> the pack.extraCruftTips config and re-run the repack and be
>> sure that we are pruning the objects previously saved by the
>> hook.
>
> Good call. I think this amounts to:
>
> --- 8< ---
> diff --git a/t/t5329-pack-objects-cruft.sh b/t/t5329-pack-objects-cruft.sh
> index 1260e11d41..44852e6925 100755
> --- a/t/t5329-pack-objects-cruft.sh
> +++ b/t/t5329-pack-objects-cruft.sh
> @@ -773,7 +773,7 @@ test_expect_success 'additional cruft tips may be specified via pack.extraCruftT
> cruft_new="$(git rev-parse HEAD)" &&
>
> git checkout main &&
> - git branch -D old new &&
> + git branch -D discard old new &&
> git reflog expire --all --expire=all &&
>
> # mark cruft.old with an mtime that is many minutes
> @@ -802,6 +802,19 @@ test_expect_success 'additional cruft tips may be specified via pack.extraCruftT
> mtimes="$(ls .git/objects/pack/pack-*.mtimes)" &&
> git show-index <${mtimes%.mtimes}.idx >cruft &&
> cut -d" " -f2 cruft | sort >cruft.actual &&
> + test_cmp cruft.expect cruft.actual &&
If we store "discard=$(git rev-parse discard)" earlier, we can also
check
test_must_fail git show $discard &&
> + # Ensure that the "old" objects are removed after
> + # dropping the pack.extraCruftTips hook.
> + git config --unset pack.extraCruftTips &&
> + git repack --cruft --cruft-expiration=now -d &&
> +
> + mtimes="$(ls .git/objects/pack/pack-*.mtimes)" &&
> + git show-index <${mtimes%.mtimes}.idx >cruft &&
> + cut -d" " -f2 cruft | sort >cruft.actual &&
> +
> + git rev-list --objects --no-object-names $cruft_new >cruft.raw &&
I like how $cruft_new is still kept because its mtime is still
"later than now".
> + sort cruft.raw >cruft.expect &&
> test_cmp cruft.expect cruft.actual
> )
> '
> --- >8 ---
>
>> It would be helpful to include a test for the multi-value case where
>> multiple scripts are executed and maybe include an "exit 1" in some
>> of them?
>
> Definitely, that is a great suggestion (especially because it would have
> caught the "if (!ret)" bug that you pointed out above).
>
> Thanks for a thorough review. I'll wait for other feedback before
> re-rolling, or do so in a couple of days if I haven't heard anything.
Thanks,
-Stolee
next prev parent reply other threads:[~2023-04-26 10:53 UTC|newest]
Thread overview: 31+ messages / expand[flat|nested] mbox.gz Atom feed top
2023-04-20 17:27 [PATCH] builtin/pack-objects.c: introduce `pack.extraCruftTips` Taylor Blau
2023-04-20 18:12 ` Junio C Hamano
2023-04-20 19:30 ` Taylor Blau
2023-04-20 19:52 ` Junio C Hamano
2023-04-20 20:48 ` Taylor Blau
2023-04-21 0:10 ` Chris Torek
2023-04-21 2:14 ` Taylor Blau
2023-04-25 19:42 ` Derrick Stolee
2023-04-25 21:25 ` Taylor Blau
2023-04-26 10:52 ` Derrick Stolee [this message]
2023-05-03 0:06 ` Taylor Blau
2023-05-03 0:09 ` [PATCH v2] " Taylor Blau
2023-05-03 14:01 ` Derrick Stolee
2023-05-03 19:59 ` Jeff King
2023-05-03 21:22 ` Taylor Blau
2023-05-05 21:23 ` Jeff King
2023-05-06 0:06 ` Taylor Blau
2023-05-06 0:14 ` Taylor Blau
2023-05-03 21:28 ` Taylor Blau
2023-05-05 21:26 ` Jeff King
2023-05-05 22:13 ` Jeff King
2023-05-06 0:13 ` Taylor Blau
2023-05-06 0:20 ` Taylor Blau
2023-05-06 2:12 ` Jeff King
2023-05-03 22:05 ` [PATCH v3] " Taylor Blau
2023-05-03 23:18 ` Junio C Hamano
2023-05-03 23:42 ` Junio C Hamano
2023-05-03 23:48 ` Taylor Blau
2023-05-03 23:50 ` Taylor Blau
2023-05-05 21:39 ` Jeff King
2023-05-05 22:19 ` Jeff King
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=97d84abc-9b25-9d43-503e-264b33c223d4@github.com \
--to=derrickstolee@github.com \
--cc=git@vger.kernel.org \
--cc=gitster@pobox.com \
--cc=me@ttaylorr.com \
--cc=peff@peff.net \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).