Git Mailing List Archive mirror
From: Taylor Blau <me@ttaylorr.com>
To: Christian Couder <christian.couder@gmail.com>
Cc: Junio C Hamano <gitster@pobox.com>,
	git@vger.kernel.org, John Cai <johncai86@gmail.com>,
	Jonathan Tan <jonathantanmy@google.com>,
	Jonathan Nieder <jrnieder@gmail.com>
Subject: Re: [PATCH 0/3] Implement filtering repacks
Date: Fri, 28 Oct 2022 15:49:42 -0400
Message-ID: <Y1wyVpHprWGxEDi/@nand.local>
In-Reply-To: <CAP8UFD2HX6rK4TRP6ynUzWn4eoHa1FrbiFOtxBaxX-ZkBF3FJw@mail.gmail.com>

On Thu, Oct 20, 2022 at 01:23:02PM +0200, Christian Couder wrote:
> On Fri, Oct 14, 2022 at 6:46 PM Junio C Hamano <gitster@pobox.com> wrote:
> >
> > Christian Couder <christian.couder@gmail.com> writes:
> >
> > > For example one might want to clone with a filter to avoid too much
> > > space being taken by some large blobs, and one might realize after
> > > some time that a number of the large blobs have still been downloaded
> > > because some old branches referencing them were checked out. In this
> > > case a filtering repack could remove some of those large blobs.
> > >
> > > Some of the comments on the patch series that John sent were related
> > > to the possible data loss and repo corruption that a filtering repack
> > > could cause. It's indeed true that it could be very dangerous, and we
> > > agree that improvements were needed in this area.
> >
> > The wish is understandable, but I do not think this gives a good UI.
> >
> > This feature is, from an end-user's point of view, very similar to
> > "git prune-packed", in that we prune data that is not necessary due
> > to redundancy.  Nobody runs "prune-packed" directly; most people are
> > even unaware of it being run on their behalf when they run "git gc".
>
> I am Ok with adding the --filter option to `git gc`, or a config
> option with a similar effect. I wonder how `git gc` should implement
> that option though.
>
> If we implement a new command called for example `git filter-packed`,
> similar to `git prune-packed`, then this new command will call `git
> pack-objects --filter=...`.

Conceptually, yes, the two are similar. But `filter-packed` is
necessarily going to differ in implementation from `prune-packed`, since
we will have to write new pack(s), not just delete loose objects which
already appear in packs.

So it's really not just a matter of purely deleting redundant loose
copies of objects like in the case of prune-packed. Here we really do
care about potentially writing a new set of packs to satisfy the new
filter constraint.

Presumably that tool would implement creating the new packs according to
the given --filter, and would similarly delete existing packs. That is
basically what your implementation in repack already does, so I am not
sure what the difference would be.
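
To illustrate the distinction, here is a toy model in Python. It is only
a sketch under stated assumptions, not how Git implements either
operation: the `Repo` class and the `prune_packed()` and
`filter_repack()` functions are hypothetical names, every object is
treated as a blob, and the filter is a plain size limit modeled on
`blob:limit=<n>` (which omits blobs of at least n bytes).

```python
from dataclasses import dataclass, field


@dataclass
class Repo:
    """Toy object store: each mapping is object id -> size in bytes."""
    loose: dict = field(default_factory=dict)
    packs: list = field(default_factory=list)


def prune_packed(repo):
    """Delete loose objects that already appear in a pack.

    Packs are left untouched; only redundant loose copies go away.
    """
    packed = set().union(*repo.packs)
    removed = sorted(oid for oid in repo.loose if oid in packed)
    for oid in removed:
        del repo.loose[oid]
    return removed


def filter_repack(repo, limit):
    """Write one new pack holding only objects smaller than `limit`,
    then replace the existing packs with it.

    Unlike prune_packed(), the dropped objects are not redundant:
    once the old packs are gone, no local copy of them remains.
    """
    new_pack = {}
    for pack in repo.packs:
        for oid, size in pack.items():
            if size < limit:
                new_pack[oid] = size
    dropped = sorted(set().union(*repo.packs) - set(new_pack))
    repo.packs = [new_pack]
    return dropped
```

In this model prune_packed() never shrinks the set of objects the
repository can reach, while filter_repack() does, which is why the
latter has to write new packs and why it carries the data-loss risk
discussed earlier in the thread.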

> Yeah. So to sum up, it looks like you are Ok with `git gc
> --filter=...`  which is fine for me, even if I wonder if `git repack
> --filter=...` could be a good first step as it is less likely to be
> used automatically (so safer in a way) and it might be better for
> implementation related performance reasons.

If we don't intend to make `git repack --filter` part of our backwards
compatibility guarantee, then I would prefer to see the implementation
just live in git-gc from start to finish.

Thanks,
Taylor

Thread overview: 37+ messages
2022-10-12 13:51 [PATCH 0/3] Implement filtering repacks Christian Couder
2022-10-12 13:51 ` [PATCH 1/3] pack-objects: allow --filter without --stdout Christian Couder
2022-10-12 13:51 ` [PATCH 2/3] repack: add --filter=<filter-spec> option Christian Couder
2022-10-12 13:51 ` [PATCH 3/3] repack: introduce --force to force filtering Christian Couder
2022-10-14 16:46 ` [PATCH 0/3] Implement filtering repacks Junio C Hamano
2022-10-20 11:23   ` Christian Couder
2022-10-28 19:49     ` Taylor Blau [this message]
2022-10-28 20:26       ` Junio C Hamano
2022-11-07  9:12         ` Christian Couder
2022-11-07  9:00       ` Christian Couder
2022-10-25 12:28 ` [PATCH v2 0/2] " Christian Couder
2022-10-25 12:28   ` [PATCH v2 1/2] pack-objects: allow --filter without --stdout Christian Couder
2022-10-25 12:28   ` [PATCH v2 2/2] repack: add --filter=<filter-spec> option Christian Couder
2022-10-28 19:54   ` [PATCH v2 0/2] Implement filtering repacks Taylor Blau
2022-11-07  9:29     ` Christian Couder
2022-11-22 17:51   ` [PATCH v3 " Christian Couder
2022-11-22 17:51     ` [PATCH v3 1/2] pack-objects: allow --filter without --stdout Christian Couder
2022-11-22 17:51     ` [PATCH v3 2/2] repack: add --filter=<filter-spec> option Christian Couder
2022-11-23  0:31     ` [PATCH v3 0/2] Implement filtering repacks Junio C Hamano
2022-12-21  3:53       ` Christian Couder
2022-11-23  0:35     ` Junio C Hamano
2022-12-21  4:04     ` [PATCH v4 0/3] " Christian Couder
2022-12-21  4:04       ` [PATCH v4 1/3] pack-objects: allow --filter without --stdout Christian Couder
2023-01-04 14:56         ` Patrick Steinhardt
2022-12-21  4:04       ` [PATCH v4 2/3] repack: add --filter=<filter-spec> option Christian Couder
2023-01-04 14:56         ` Patrick Steinhardt
2023-01-05  1:39           ` Junio C Hamano
2022-12-21  4:04       ` [PATCH v4 3/3] gc: add gc.repackFilter config option Christian Couder
2023-01-04 14:57         ` Patrick Steinhardt
2024-05-15 13:25 ` [PATCH v2 0/3] upload-pack: support a missing-action Christian Couder
2024-05-15 13:25   ` [PATCH v2 1/3] rev-list: refactor --missing=<missing-action> Christian Couder
2024-05-15 16:16     ` Junio C Hamano
2024-05-15 13:25   ` [PATCH v2 2/3] pack-objects: use the missing action API Christian Couder
2024-05-15 16:46     ` Junio C Hamano
2024-05-15 13:25   ` [PATCH v2 3/3] upload-pack: allow configuring a missing-action Christian Couder
2024-05-15 17:08     ` Junio C Hamano
2024-05-15 13:59   ` [PATCH v2 0/3] upload-pack: support " Christian Couder
