On Wed, Dec 21, 2022 at 05:04:46AM +0100, Christian Couder wrote:
> A previous commit has implemented `git repack --filter=` to allow
> users to remove objects that are available on a promisor remote but
> that they don't want locally.
>
> Users might want to perform such a cleanup regularly at the same time
> as they perform other repacks and cleanups, so as part of `git gc`.
>
> Let's allow them to configure a <filter-spec> for that purpose using
> a new gc.repackFilter config option.
>
> Now when `git gc` performs a repack with a non-empty <filter-spec>
> configured through this option, the repack process will be passed a
> corresponding `--filter=<filter-spec>` argument.
>
> Signed-off-by: Christian Couder
> ---
>  Documentation/config/gc.txt |  9 +++++++++
>  builtin/gc.c                |  6 ++++++
>  t/t6500-gc.sh               | 19 +++++++++++++++++++
>  3 files changed, 34 insertions(+)
>
> diff --git a/Documentation/config/gc.txt b/Documentation/config/gc.txt
> index 38fea076a2..9359136f14 100644
> --- a/Documentation/config/gc.txt
> +++ b/Documentation/config/gc.txt
> @@ -130,6 +130,15 @@ or rebase occurring. Since these changes are not part of the current
>  project most users will want to expire them sooner, which is why the
>  default is more aggressive than `gc.reflogExpire`.
>
> +gc.repackFilter::
> +	When repacking, use the specified filter to omit certain
> +	objects from the resulting packfile. WARNING: this could
> +	easily corrupt the current repo and lose data if ANY of the
> +	omitted objects hasn't been already pushed to a remote. Be
> +	very careful about objects that might have been created
> +	locally! See the `--filter=` option of
> +	linkgit:git-repack[1].

This limitation means that no user can ever configure this unless they
also set `gc.auto=0`, as otherwise Git might periodically delete their
local objects without any way to restore them.

I see though that this might be a useful thing to have for partial
clone repositories: right now they only ever grow in size, so it would
be nice if Git could gracefully cull that size every now and then.

But for this to be automatic I'd expect a few safeguards:

  - Given a remote with a partial clone filter, we only ever delete
    objects that seem to exist on the remote. This might for example
    be determined via a graph-walk of the remote's references. This
    protects against deleting objects that only exist locally.

  - Objects that are referenced by any of the currently checked-out
    HEADs should not get pruned. This protects against deleting
    objects that you'd have to re-download anyway.

  - It would be great to have a grace period after which objects get
    pruned. Given that we have no access times for packed objects,
    this doesn't seem that easy to implement though. Still, this would
    protect the user from objects getting deleted when they frequently
    switch between different commits.
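Just to make the first two points a bit more concrete, here is a very
rough sketch of how "safe to delete" could be determined. This is not
an existing git-gc(1) interface, and treating the remote-tracking refs
as a stand-in for what the remote actually has is an assumption of the
sketch:

    # Objects reachable from the remote-tracking refs; these are the
    # ones we assume can be re-fetched from the promisor remote.
    git rev-list --objects --missing=allow-promisor --remotes=origin |
        cut -d' ' -f1 | sort >on-remote

    # Objects reachable from the currently checked-out HEAD; with
    # linked worktrees each worktree's HEAD would have to be walked,
    # too.
    git rev-list --objects --missing=allow-promisor HEAD |
        cut -d' ' -f1 | sort >in-use

    # Only objects in the first list but not in the second would be
    # candidates for a filtering repack.
    comm -23 on-remote in-use >candidates

A real implementation would of course want to do this in-process rather
than by spawning rev-list, but it hopefully illustrates the direction.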
From my point of view we should not expose a high-level interface
around git-gc(1) at this point in time because of these missing
safeguards. Otherwise it is only a footgun waiting to go off. Until
then I don't see much of a problem with exposing the low-level
interface via git-repack(1) though.

Patrick

>  gc.rerereResolved::
>  	Records of conflicted merge you resolved earlier are
>  	kept for this many days when 'git rerere gc' is run.
> diff --git a/builtin/gc.c b/builtin/gc.c
> index 02455fdcd7..bf28619723 100644
> --- a/builtin/gc.c
> +++ b/builtin/gc.c
> @@ -52,6 +52,7 @@ static timestamp_t gc_log_expire_time;
>  static const char *gc_log_expire = "1.day.ago";
>  static const char *prune_expire = "2.weeks.ago";
>  static const char *prune_worktrees_expire = "3.months.ago";
> +static char *repack_filter;
>  static unsigned long big_pack_threshold;
>  static unsigned long max_delta_cache_size = DEFAULT_DELTA_CACHE_SIZE;
>
> @@ -161,6 +162,8 @@ static void gc_config(void)
>  	git_config_get_ulong("gc.bigpackthreshold", &big_pack_threshold);
>  	git_config_get_ulong("pack.deltacachesize", &max_delta_cache_size);
>
> +	git_config_get_string("gc.repackfilter", &repack_filter);
> +
>  	git_config(git_default_config, NULL);
>  }
>
> @@ -346,6 +349,9 @@ static void add_repack_all_option(struct string_list *keep_pack)
>
>  	if (keep_pack)
>  		for_each_string_list(keep_pack, keep_one_pack, NULL);
> +
> +	if (repack_filter && *repack_filter)
> +		strvec_pushf(&repack, "--filter=%s", repack_filter);
>  }
>
>  static void add_repack_incremental_option(void)
> diff --git a/t/t6500-gc.sh b/t/t6500-gc.sh
> index d9acb63951..b1492b521a 100755
> --- a/t/t6500-gc.sh
> +++ b/t/t6500-gc.sh
> @@ -56,6 +56,7 @@ test_expect_success 'gc -h with invalid configuration' '
>  '
>
>  test_expect_success 'gc is not aborted due to a stale symref' '
> +	test_when_finished "rm -rf remote client" &&
>  	git init remote &&
>  	(
>  		cd remote &&
> @@ -202,6 +203,24 @@ test_expect_success 'one of gc.reflogExpire{Unreachable,}=never does not skip "e
>  	grep -E "^trace: (built-in|exec|run_command): git reflog expire --" trace.out
>  '
>
> +test_expect_success 'gc.repackFilter launches repack with a filter' '
> +	test_when_finished "rm -rf server client" &&
> +	test_create_repo server &&
> +	git -C server config uploadpack.allowFilter true &&
> +	git -C server config uploadpack.allowAnySHA1InWant true &&
> +	test_commit -C server 1 &&
> +	git clone --bare --no-local server client &&
> +	git -C client config remote.origin.promisor true &&
> +	git -C client rev-list --objects --all --missing=print >objects &&
> +	test $(grep -c "^?" objects) = 0 &&
> +
> +	GIT_TRACE=$(pwd)/trace.out git -C client -c gc.repackFilter=blob:none -c repack.writeBitmaps=false -c gc.pruneExpire=now gc &&
> +
> +	grep -E "^trace: (built-in|exec|run_command): git repack .* --filter=blob:none ?.*" trace.out &&
> +	git -C client rev-list --objects --all --missing=print >objects &&
> +	test $(grep -c "^?" objects) = 1
> +'
> +
>  prepare_cruft_history () {
>  	test_commit base &&
>
> --
> 2.39.0.59.g395bcb85bc.dirty
>