Git Mailing List Archive mirror
 help / color / mirror / Atom feed
From: Patrick Steinhardt <ps@pks.im>
To: Christian Couder <christian.couder@gmail.com>
Cc: git@vger.kernel.org, Junio C Hamano <gitster@pobox.com>,
	John Cai <johncai86@gmail.com>,
	Jonathan Tan <jonathantanmy@google.com>,
	Jonathan Nieder <jrnieder@gmail.com>,
	Taylor Blau <me@ttaylorr.com>, Derrick Stolee <stolee@gmail.com>,
	Christian Couder <chriscool@tuxfamily.org>
Subject: Re: [PATCH v4 3/3] gc: add gc.repackFilter config option
Date: Wed, 4 Jan 2023 15:57:03 +0100	[thread overview]
Message-ID: <Y7WTv19aqiFCU8au@ncase> (raw)
In-Reply-To: <20221221040446.2860985-4-christian.couder@gmail.com>

[-- Attachment #1: Type: text/plain, Size: 6314 bytes --]

On Wed, Dec 21, 2022 at 05:04:46AM +0100, Christian Couder wrote:
> A previous commit has implemented `git repack --filter=<filter-spec>` to
> allow users to remove objects that are available on a promisor remote
> but that they don't want locally.
> 
> Users might want to perform such a cleanup regularly at the same time as
> they perform other repacks and cleanups, so as part of `git gc`.
> 
> Let's allow them to configure a <filter-spec> for that purpose using a
> new gc.repackFilter config option.
> 
> Now when `git gc` will perform a repack with a <filter-spec> configured
> through this option and not empty, the repack process will be passed a
> corresponding `--filter=<filter-spec>` argument.
> 
> Signed-off-by: Christian Couder <chriscool@tuxfamily.org>
> ---
>  Documentation/config/gc.txt |  9 +++++++++
>  builtin/gc.c                |  6 ++++++
>  t/t6500-gc.sh               | 19 +++++++++++++++++++
>  3 files changed, 34 insertions(+)
> 
> diff --git a/Documentation/config/gc.txt b/Documentation/config/gc.txt
> index 38fea076a2..9359136f14 100644
> --- a/Documentation/config/gc.txt
> +++ b/Documentation/config/gc.txt
> @@ -130,6 +130,15 @@ or rebase occurring.  Since these changes are not part of the current
>  project most users will want to expire them sooner, which is why the
>  default is more aggressive than `gc.reflogExpire`.
>  
> +gc.repackFilter::
> +	When repacking, use the specified filter to omit certain
> +	objects from the resulting packfile. WARNING: this could
> +	easily corrupt the current repo and lose data if ANY of the
> +	omitted objects hasn't been already pushed to a remote. Be
> +	very careful about objects that might have been created
> +	locally! See the `--filter=<filter-spec>` option of
> +	linkgit:git-repack[1].

This limitation means that no user can ever configure this unless they
also set `gc.auto=0`, or otherwise Git might periodically delete their
local objects without any way to restore them.

I see though that this might be a useful thing to have for partial clone
repositories: right now they only ever grow in size, so it would be nice
if Git could gracefully cull that size every now and then. But for this
to be automatic I'd expect a few safeguards:

    - Given a remote with a partial clone filter we only ever delete
      objects that seem to exist on the remote. This might for example
      be determined via a graph-walk of the remote's references. This
      protects against deleting objects that only exist locally.

    - Objects that are referenced by any of the currently checked-out
      HEADs should not get puned. This protects against deleting objects
      that you'd have to re-download anyway.

    - It would be great to have a grace period after which objects get
      pruned. Given that we have no access times for packed objects
      though this doesn't seem that easy to implement though. Still,
      this would protect the user from objects getting deleted when they
      frequently switch between different commits.

From my point of view we should not expose a high-level interface around
git-gc(1) at this point in time because of these missing safeguards.
Otherwise it is only a footgun waiting to go off.

Until then I don't see much of a problem with exposing the low-level
interface via git-repack(1) though.

Patrick

>  gc.rerereResolved::
>  	Records of conflicted merge you resolved earlier are
>  	kept for this many days when 'git rerere gc' is run.
> diff --git a/builtin/gc.c b/builtin/gc.c
> index 02455fdcd7..bf28619723 100644
> --- a/builtin/gc.c
> +++ b/builtin/gc.c
> @@ -52,6 +52,7 @@ static timestamp_t gc_log_expire_time;
>  static const char *gc_log_expire = "1.day.ago";
>  static const char *prune_expire = "2.weeks.ago";
>  static const char *prune_worktrees_expire = "3.months.ago";
> +static char *repack_filter;
>  static unsigned long big_pack_threshold;
>  static unsigned long max_delta_cache_size = DEFAULT_DELTA_CACHE_SIZE;
>  
> @@ -161,6 +162,8 @@ static void gc_config(void)
>  	git_config_get_ulong("gc.bigpackthreshold", &big_pack_threshold);
>  	git_config_get_ulong("pack.deltacachesize", &max_delta_cache_size);
>  
> +	git_config_get_string("gc.repackfilter", &repack_filter);
> +
>  	git_config(git_default_config, NULL);
>  }
>  
> @@ -346,6 +349,9 @@ static void add_repack_all_option(struct string_list *keep_pack)
>  
>  	if (keep_pack)
>  		for_each_string_list(keep_pack, keep_one_pack, NULL);
> +
> +	if (repack_filter && *repack_filter)
> +		strvec_pushf(&repack, "--filter=%s", repack_filter);
>  }
>  
>  static void add_repack_incremental_option(void)
> diff --git a/t/t6500-gc.sh b/t/t6500-gc.sh
> index d9acb63951..b1492b521a 100755
> --- a/t/t6500-gc.sh
> +++ b/t/t6500-gc.sh
> @@ -56,6 +56,7 @@ test_expect_success 'gc -h with invalid configuration' '
>  '
>  
>  test_expect_success 'gc is not aborted due to a stale symref' '
> +	test_when_finished "rm -rf remote client" &&
>  	git init remote &&
>  	(
>  		cd remote &&
> @@ -202,6 +203,24 @@ test_expect_success 'one of gc.reflogExpire{Unreachable,}=never does not skip "e
>  	grep -E "^trace: (built-in|exec|run_command): git reflog expire --" trace.out
>  '
>  
> +test_expect_success 'gc.repackFilter launches repack with a filter' '
> +	test_when_finished "rm -rf server client" &&
> +	test_create_repo server &&
> +	git -C server config uploadpack.allowFilter true &&
> +	git -C server config uploadpack.allowAnySHA1InWant true &&
> +	test_commit -C server 1 &&
> +	git clone --bare --no-local server client &&
> +	git -C client config remote.origin.promisor true &&
> +	git -C client rev-list --objects --all --missing=print >objects &&
> +	test $(grep -c "^?" objects) = 0 &&
> +
> +	GIT_TRACE=$(pwd)/trace.out git -C client -c gc.repackFilter=blob:none -c repack.writeBitmaps=false -c gc.pruneExpire=now gc &&
> +
> +	grep -E "^trace: (built-in|exec|run_command): git repack .* --filter=blob:none ?.*" trace.out &&
> +	git -C client rev-list --objects --all --missing=print >objects &&
> +	test $(grep -c "^?" objects) = 1
> +'
> +
>  prepare_cruft_history () {
>  	test_commit base &&
>  
> -- 
> 2.39.0.59.g395bcb85bc.dirty
> 

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

  reply	other threads:[~2023-01-04 14:57 UTC|newest]

Thread overview: 42+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2022-10-12 13:51 [PATCH 0/3] Implement filtering repacks Christian Couder
2022-10-12 13:51 ` [PATCH 1/3] pack-objects: allow --filter without --stdout Christian Couder
2022-10-12 13:51 ` [PATCH 2/3] repack: add --filter=<filter-spec> option Christian Couder
2022-10-12 13:51 ` [PATCH 3/3] repack: introduce --force to force filtering Christian Couder
2022-10-14 16:46 ` [PATCH 0/3] Implement filtering repacks Junio C Hamano
2022-10-20 11:23   ` Christian Couder
2022-10-28 19:49     ` Taylor Blau
2022-10-28 20:26       ` Junio C Hamano
2022-11-07  9:12         ` Christian Couder
2022-11-07  9:00       ` Christian Couder
2022-10-25 12:28 ` [PATCH v2 0/2] " Christian Couder
2022-10-25 12:28   ` [PATCH v2 1/2] pack-objects: allow --filter without --stdout Christian Couder
2022-10-25 12:28   ` [PATCH v2 2/2] repack: add --filter=<filter-spec> option Christian Couder
2022-10-28 19:54   ` [PATCH v2 0/2] Implement filtering repacks Taylor Blau
2022-11-07  9:29     ` Christian Couder
2022-11-22 17:51   ` [PATCH v3 " Christian Couder
2022-11-22 17:51     ` [PATCH v3 1/2] pack-objects: allow --filter without --stdout Christian Couder
2022-11-22 17:51     ` [PATCH v3 2/2] repack: add --filter=<filter-spec> option Christian Couder
2022-11-23  0:31     ` [PATCH v3 0/2] Implement filtering repacks Junio C Hamano
2022-12-21  3:53       ` Christian Couder
2022-11-23  0:35     ` Junio C Hamano
2022-12-21  4:04     ` [PATCH v4 0/3] " Christian Couder
2022-12-21  4:04       ` [PATCH v4 1/3] pack-objects: allow --filter without --stdout Christian Couder
2023-01-04 14:56         ` Patrick Steinhardt
2022-12-21  4:04       ` [PATCH v4 2/3] repack: add --filter=<filter-spec> option Christian Couder
2023-01-04 14:56         ` Patrick Steinhardt
2023-01-05  1:39           ` Junio C Hamano
2022-12-21  4:04       ` [PATCH v4 3/3] gc: add gc.repackFilter config option Christian Couder
2023-01-04 14:57         ` Patrick Steinhardt [this message]
2024-05-15 13:25 ` [PATCH v2 0/3] upload-pack: support a missing-action Christian Couder
2024-05-15 13:25   ` [PATCH v2 1/3] rev-list: refactor --missing=<missing-action> Christian Couder
2024-05-15 16:16     ` Junio C Hamano
2024-05-15 13:25   ` [PATCH v2 2/3] pack-objects: use the missing action API Christian Couder
2024-05-15 16:46     ` Junio C Hamano
2024-05-24 16:40       ` Christian Couder
2024-05-15 13:25   ` [PATCH v2 3/3] upload-pack: allow configuring a missing-action Christian Couder
2024-05-15 17:08     ` Junio C Hamano
2024-05-24 16:41       ` Christian Couder
2024-05-24 21:51         ` Junio C Hamano
2024-05-28 10:10           ` Christian Couder
2024-05-28 15:54             ` Junio C Hamano
2024-05-15 13:59   ` [PATCH v2 0/3] upload-pack: support " Christian Couder

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=Y7WTv19aqiFCU8au@ncase \
    --to=ps@pks.im \
    --cc=chriscool@tuxfamily.org \
    --cc=christian.couder@gmail.com \
    --cc=git@vger.kernel.org \
    --cc=gitster@pobox.com \
    --cc=johncai86@gmail.com \
    --cc=jonathantanmy@google.com \
    --cc=jrnieder@gmail.com \
    --cc=me@ttaylorr.com \
    --cc=stolee@gmail.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).