From: Christian Couder <christian.couder@gmail.com>
To: git@vger.kernel.org
Cc: Junio C Hamano <gitster@pobox.com>,
John Cai <johncai86@gmail.com>,
Jonathan Tan <jonathantanmy@google.com>,
Jonathan Nieder <jrnieder@gmail.com>,
Taylor Blau <me@ttaylorr.com>, Derrick Stolee <stolee@gmail.com>,
Patrick Steinhardt <ps@pks.im>,
Christian Couder <christian.couder@gmail.com>
Subject: [PATCH 0/9] Repack objects into separate packfiles based on a filter
Date: Wed, 14 Jun 2023 21:25:32 +0200
Message-ID: <20230614192541.1599256-1-christian.couder@gmail.com>
# Intro
Last year, John Cai sent 2 versions of a patch series to implement
`git repack --filter=<filter-spec>` and later I sent 4 versions of a
patch series trying to do it a bit differently:
- https://lore.kernel.org/git/pull.1206.git.git.1643248180.gitgitgadget@gmail.com/
- https://lore.kernel.org/git/20221012135114.294680-1-christian.couder@gmail.com/
In these patch series, `--filter=<filter-spec>` removed the filtered
out objects altogether, which was considered very dangerous even
though we implemented different safety checks in some of the later
series.
In some discussions, it was mentioned that such a feature, or a
similar feature in `git gc`, or in a new standalone command (perhaps
called `git prune-filtered`), should put the filtered out objects into
a new packfile instead of deleting them.
Recently there were internal discussions at GitLab about either moving
blobs from inactive repos onto cheaper storage, or moving large blobs
onto cheaper storage. This led us to rethink repacking using a
filter, but moving the filtered out objects into a separate packfile
instead of deleting them.
So here is a new patch series doing that while implementing the
`--filter=<filter-spec>` option in `git repack`.
This could be useful for the following purposes:
- As a way for servers to save storage costs by for example moving
large blobs, or blobs in inactive repos, to separate storage
(while still making them accessible using for example the
alternates mechanism).
- As a way to use partial clone on a Git server to offload large
blobs to, for example, an http server, while using multiple
promisor remotes (to be able to access everything) on the client
side. (In this case the packfile that contains the filtered out
objects can be manually removed after checking that all the objects
it contains are available through the promisor remote.)
- As a way for clients to reclaim some space when they cloned with a
filter to save disk space but then fetched a lot of unwanted
objects (for example when checking out old branches) and now want
to remove these unwanted objects. (In this case they can first
move the packfile that contains filtered out objects to a separate
directory or storage, then check that everything works well, and
then manually remove the packfile after some time.)
As the features and the code are quite different from those in the
previous series, I decided to start a new series instead of continuing
a previous one.
# Commit overview
* 1/9 pack-objects: allow `--filter` without `--stdout`
This patch is the same as the first patch in the previous series. To
be able to later repack with a filter we need `git pack-objects` to
write packfiles when it's filtering instead of just writing the pack
without the filtered out objects to stdout.
* 2/9 pack-objects: add `--print-filtered` to print omitted objects
We need a way to know the objects that are filtered out of the
packfile generated by `git pack-objects --filter=<filter-spec>`. The
simplest way is to teach pack-objects to print their oids to stdout.
* 3/9 t/helper: add 'find-pack' test-tool
For testing `git repack --filter=...` that we are going to
implement, it's useful to have a test helper that can tell which
packfiles contain a specific object.
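As an illustration of the question such a helper answers, here is a minimal sketch that uses only `git verify-pack` instead of the test-tool itself. The scratch repository, the file name and the committer identity are made up for the example.

```shell
# Scratch repository with one committed file (all names are illustrative).
repo=$(mktemp -d)
git init -q "$repo"
cd "$repo"
echo "hello" >file.txt
git add file.txt
git -c user.name=Example -c user.email=example@example.com \
    commit -q -m "add file"
git repack -a -d -q

# Resolve the blob's oid, then collect every pack whose index lists it.
oid=$(git rev-parse HEAD:file.txt)
found=""
for idx in .git/objects/pack/pack-*.idx
do
    # `verify-pack -v` prints one line per object, starting with its oid.
    if git verify-pack -v "$idx" | grep -q "^$oid "
    then
        found="$found ${idx%.idx}.pack"
    fi
done
echo "packs containing $oid:$found"
```

After a filtered repack, the same loop can be used to check that an object ended up in the expected packfile.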
* 4/9 repack: refactor piping an oid to a command
* 5/9 repack: refactor finishing pack-objects command
These are small refactorings so that `git repack --filter=...` will
be able to reuse useful existing functions.
* 6/9 repack: add `--filter=<filter-spec>` option
This actually adds the `--filter=<filter-spec>` option. It uses one
`git pack-objects` process with both the `--filter` and the
`--print-filtered` options. From this process it reads the oids of
the filtered out objects and passes them to a separate `git
pack-objects` process which will pack these objects into a separate
packfile.
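A minimal sketch of how the option might be used, assuming a Git build that includes this series. The scratch repository, the file name and the `blob:none` filter are example choices, and the guard skips the repack when the installed Git does not advertise the option.

```shell
# Scratch repository holding a single blob (names are illustrative).
repo=$(mktemp -d)
git init -q "$repo"
cd "$repo"
echo "some content" >big.bin
git add big.bin
git -c user.name=Example -c user.email=example@example.com \
    commit -q -m "add blob"

supported=no
# Only try the option if `git repack -h` mentions a filter option.
if git repack -h 2>&1 | grep -q filter
then
    supported=yes
    # Blobs go into their own packfile; commits and trees stay in the other.
    git repack -a -d -q --filter=blob:none
    set -- .git/objects/pack/*.pack
    pack_count=$#
    echo "packfiles after filtered repack: $pack_count"
else
    echo "this Git does not know 'repack --filter'; skipping"
fi
```

If the series behaves as described, two packfiles should remain: one without the filtered out blob and one that contains it.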
* 7/9 gc: add `gc.repackFilter` config option
This is a gc config option so that `git gc` can also repack using a
filter and put the filtered out objects into a separate packfile.
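For illustration, setting the option in a repository could look like the sketch below. The scratch repository and the `blob:limit=1m` filter spec are example values, not recommendations.

```shell
# Scratch repository (illustrative).
repo=$(mktemp -d)
git init -q "$repo"
cd "$repo"

# Ask later `git gc` runs to repack with this filter (example value).
git config gc.repackFilter blob:limit=1m

# Show what was stored.
git config --get gc.repackFilter
```

With a Git that includes this series, a subsequent `git gc` in that repository would be expected to move blobs matching the filter into a separate packfile.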
* 8/9 repack: implement `--filter-to` for storing filtered out objects
For some use cases, it's useful to create the packfile that
contains the filtered out objects in a separate location. This is
similar to the `--expire-to` option for cruft packfiles.
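A hedged sketch of how this might be used, assuming a Git build that includes this series. It also assumes that, as with `--expire-to`, the value is a path prefix for the new pack rather than a bare directory. The `$store` directory and the file name are made up for the example.

```shell
# Scratch repository plus a separate directory for filtered out objects.
repo=$(mktemp -d)
store=$(mktemp -d)
git init -q "$repo"
cd "$repo"
echo "some content" >big.bin
git add big.bin
git -c user.name=Example -c user.email=example@example.com \
    commit -q -m "add blob"

supported=no
filtered_pack=""
# Only try the option if `git repack -h` mentions it.
if git repack -h 2>&1 | grep -q filter-to
then
    supported=yes
    # Assumption: "$store/pack" is a prefix, giving "$store/pack-<hash>.pack".
    git repack -a -d -q --filter=blob:none --filter-to="$store/pack"
    set -- "$store"/pack-*.pack
    filtered_pack=$1
    echo "filtered out objects written to: $filtered_pack"
else
    echo "this Git does not know 'repack --filter-to'; skipping"
fi
```

After this the blob is no longer stored in the repository itself, so `$store` would have to be made reachable again (for example through the alternates mechanism) before the blob can be read.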
* 9/9 gc: add `gc.repackFilterTo` config option
This allows specifying the location of the packfile that contains
the filtered out objects when using `gc.repackFilter`.
Christian Couder (9):
pack-objects: allow `--filter` without `--stdout`
pack-objects: add `--print-filtered` to print omitted objects
t/helper: add 'find-pack' test-tool
repack: refactor piping an oid to a command
repack: refactor finishing pack-objects command
repack: add `--filter=<filter-spec>` option
gc: add `gc.repackFilter` config option
repack: implement `--filter-to` for storing filtered out objects
gc: add `gc.repackFilterTo` config option
Documentation/config/gc.txt | 11 ++
Documentation/git-pack-objects.txt | 14 ++-
Documentation/git-repack.txt | 11 ++
Makefile | 1 +
builtin/gc.c | 10 ++
builtin/pack-objects.c | 55 ++++++--
builtin/repack.c | 166 ++++++++++++++++++-------
t/helper/test-find-pack.c | 35 ++++++
t/helper/test-tool.c | 1 +
t/helper/test-tool.h | 1 +
t/t5317-pack-objects-filter-objects.sh | 27 ++++
t/t6500-gc.sh | 23 ++++
t/t7700-repack.sh | 43 +++++++
13 files changed, 345 insertions(+), 53 deletions(-)
create mode 100644 t/helper/test-find-pack.c
--
2.41.0.37.gae45d9845e
Thread overview: 161+ messages
2023-06-14 19:25 Christian Couder [this message]
2023-06-14 19:25 ` [PATCH 1/9] pack-objects: allow `--filter` without `--stdout` Christian Couder
2023-06-21 10:49 ` Taylor Blau
2023-07-05 6:16 ` Christian Couder
2023-06-14 19:25 ` [PATCH 2/9] pack-objects: add `--print-filtered` to print omitted objects Christian Couder
2023-06-15 22:50 ` Junio C Hamano
2023-06-21 10:52 ` Taylor Blau
2023-06-21 11:11 ` Christian Couder
2023-06-21 11:54 ` Taylor Blau
2023-06-14 19:25 ` [PATCH 3/9] t/helper: add 'find-pack' test-tool Christian Couder
2023-06-15 23:32 ` Junio C Hamano
2023-06-21 10:40 ` Christian Couder
2023-06-21 10:54 ` Taylor Blau
2023-06-14 19:25 ` [PATCH 4/9] repack: refactor piping an oid to a command Christian Couder
2023-06-15 23:46 ` Junio C Hamano
2023-06-21 10:55 ` Taylor Blau
2023-06-21 10:56 ` Christian Couder
2023-06-14 19:25 ` [PATCH 5/9] repack: refactor finishing pack-objects command Christian Couder
2023-06-16 0:13 ` Junio C Hamano
2023-06-21 11:06 ` Taylor Blau
2023-06-21 11:19 ` Christian Couder
2023-06-21 11:05 ` Taylor Blau
2023-06-14 19:25 ` [PATCH 6/9] repack: add `--filter=<filter-spec>` option Christian Couder
2023-06-16 0:43 ` Junio C Hamano
2023-06-21 11:20 ` Taylor Blau
2023-06-21 15:04 ` Christian Couder
2023-06-22 11:05 ` Taylor Blau
2023-06-21 14:40 ` Christian Couder
2023-06-21 16:53 ` Junio C Hamano
2023-06-22 8:39 ` Christian Couder
2023-06-22 18:32 ` Junio C Hamano
2023-06-21 11:17 ` Taylor Blau
2023-07-05 7:18 ` Christian Couder
2023-06-14 19:25 ` [PATCH 7/9] gc: add `gc.repackFilter` config option Christian Couder
2023-06-14 19:25 ` [PATCH 8/9] repack: implement `--filter-to` for storing filtered out objects Christian Couder
2023-06-16 2:21 ` Junio C Hamano
2023-06-21 11:49 ` Taylor Blau
2023-06-21 12:08 ` Christian Couder
2023-06-21 12:25 ` Taylor Blau
2023-06-21 16:44 ` Junio C Hamano
2023-07-05 6:19 ` Christian Couder
2023-06-14 19:25 ` [PATCH 9/9] gc: add `gc.repackFilterTo` config option Christian Couder
2023-06-16 2:54 ` Junio C Hamano
2023-06-14 21:36 ` [PATCH 0/9] Repack objects into separate packfiles based on a filter Junio C Hamano
2023-06-16 3:08 ` Junio C Hamano
2023-07-05 6:08 ` [PATCH v2 0/8] " Christian Couder
2023-07-05 6:08 ` [PATCH v2 1/8] pack-objects: allow `--filter` without `--stdout` Christian Couder
2023-07-05 6:08 ` [PATCH v2 2/8] t/helper: add 'find-pack' test-tool Christian Couder
2023-07-05 6:08 ` [PATCH v2 3/8] repack: refactor finishing pack-objects command Christian Couder
2023-07-05 6:08 ` [PATCH v2 4/8] repack: refactor finding pack prefix Christian Couder
2023-07-05 6:08 ` [PATCH v2 5/8] repack: add `--filter=<filter-spec>` option Christian Couder
2023-07-05 17:53 ` Junio C Hamano
2023-07-24 9:01 ` Christian Couder
2023-07-24 18:28 ` Junio C Hamano
2023-07-25 15:22 ` Christian Couder
2023-07-25 17:25 ` Junio C Hamano
2023-07-25 23:08 ` Junio C Hamano
2023-08-08 8:45 ` Christian Couder
2023-08-09 20:38 ` Taylor Blau
2023-08-09 22:50 ` Junio C Hamano
2023-08-09 23:38 ` Junio C Hamano
2023-08-10 0:10 ` Jeff King
2023-07-05 18:12 ` Junio C Hamano
2023-07-24 9:02 ` Christian Couder
2023-07-05 6:08 ` [PATCH v2 6/8] gc: add `gc.repackFilter` config option Christian Couder
2023-07-05 6:08 ` [PATCH v2 7/8] repack: implement `--filter-to` for storing filtered out objects Christian Couder
2023-07-05 18:26 ` Junio C Hamano
2023-07-24 9:00 ` Christian Couder
2023-07-24 18:18 ` Junio C Hamano
2023-07-25 13:41 ` Robert Coup
2023-07-25 16:50 ` Junio C Hamano
2023-07-25 15:45 ` Christian Couder
2023-07-05 6:08 ` [PATCH v2 8/8] gc: add `gc.repackFilterTo` config option Christian Couder
2023-07-24 8:59 ` [PATCH v3 0/8] Repack objects into separate packfiles based on a filter Christian Couder
2023-07-24 8:59 ` [PATCH v3 1/8] pack-objects: allow `--filter` without `--stdout` Christian Couder
2023-07-25 22:38 ` Taylor Blau
2023-07-25 23:51 ` Junio C Hamano
2023-07-24 8:59 ` [PATCH v3 2/8] t/helper: add 'find-pack' test-tool Christian Couder
2023-07-25 22:44 ` Taylor Blau
2023-08-08 8:28 ` Christian Couder
2023-07-24 8:59 ` [PATCH v3 3/8] repack: refactor finishing pack-objects command Christian Couder
2023-07-25 22:45 ` Taylor Blau
2023-07-24 8:59 ` [PATCH v3 4/8] repack: refactor finding pack prefix Christian Couder
2023-07-25 22:47 ` Taylor Blau
2023-08-08 8:29 ` Christian Couder
2023-07-24 8:59 ` [PATCH v3 5/8] repack: add `--filter=<filter-spec>` option Christian Couder
2023-07-25 23:04 ` Taylor Blau
2023-08-08 8:34 ` Christian Couder
2023-08-09 21:12 ` Taylor Blau
2023-07-24 8:59 ` [PATCH v3 6/8] gc: add `gc.repackFilter` config option Christian Couder
2023-07-25 23:07 ` Taylor Blau
2023-08-08 8:38 ` Christian Couder
2023-08-09 21:15 ` Taylor Blau
2023-07-24 8:59 ` [PATCH v3 7/8] repack: implement `--filter-to` for storing filtered out objects Christian Couder
2023-07-24 8:59 ` [PATCH v3 8/8] gc: add `gc.repackFilterTo` config option Christian Couder
2023-07-25 23:10 ` [PATCH v3 0/8] Repack objects into separate packfiles based on a filter Taylor Blau
2023-08-08 8:26 ` [PATCH v4 " Christian Couder
2023-08-08 8:26 ` [PATCH v4 1/8] pack-objects: allow `--filter` without `--stdout` Christian Couder
2023-08-08 8:26 ` [PATCH v4 2/8] t/helper: add 'find-pack' test-tool Christian Couder
2023-08-09 21:18 ` Taylor Blau
2023-08-08 8:26 ` [PATCH v4 3/8] repack: refactor finishing pack-objects command Christian Couder
2023-08-08 8:26 ` [PATCH v4 4/8] repack: refactor finding pack prefix Christian Couder
2023-08-09 21:20 ` Taylor Blau
2023-08-08 8:26 ` [PATCH v4 5/8] repack: add `--filter=<filter-spec>` option Christian Couder
2023-08-09 21:40 ` Taylor Blau
2023-08-08 8:26 ` [PATCH v4 6/8] gc: add `gc.repackFilter` config option Christian Couder
2023-08-08 8:26 ` [PATCH v4 7/8] repack: implement `--filter-to` for storing filtered out objects Christian Couder
2023-08-08 8:26 ` [PATCH v4 8/8] gc: add `gc.repackFilterTo` config option Christian Couder
2023-08-09 21:45 ` [PATCH v4 0/8] Repack objects into separate packfiles based on a filter Taylor Blau
2023-08-09 21:57 ` Junio C Hamano
2023-08-12 0:12 ` Christian Couder
2023-08-12 0:00 ` [PATCH v5 " Christian Couder
2023-08-12 0:00 ` [PATCH v5 1/8] pack-objects: allow `--filter` without `--stdout` Christian Couder
2023-08-12 0:00 ` [PATCH v5 2/8] t/helper: add 'find-pack' test-tool Christian Couder
2023-08-12 0:00 ` [PATCH v5 3/8] repack: refactor finishing pack-objects command Christian Couder
2023-08-12 0:00 ` [PATCH v5 4/8] repack: refactor finding pack prefix Christian Couder
2023-08-12 0:00 ` [PATCH v5 5/8] repack: add `--filter=<filter-spec>` option Christian Couder
2023-08-12 0:00 ` [PATCH v5 6/8] gc: add `gc.repackFilter` config option Christian Couder
2023-08-12 0:00 ` [PATCH v5 7/8] repack: implement `--filter-to` for storing filtered out objects Christian Couder
2023-08-12 0:00 ` [PATCH v5 8/8] gc: add `gc.repackFilterTo` config option Christian Couder
2023-08-15 0:51 ` [PATCH v5 0/8] Repack objects into separate packfiles based on a filter Junio C Hamano
2023-08-15 21:43 ` Taylor Blau
2023-08-15 22:32 ` Junio C Hamano
2023-08-15 23:09 ` Taylor Blau
2023-08-15 23:18 ` Junio C Hamano
2023-08-16 0:38 ` Taylor Blau
2023-08-16 17:16 ` Junio C Hamano
2023-09-11 15:20 ` Christian Couder
2023-09-11 15:06 ` [PATCH v6 0/9] " Christian Couder
2023-09-11 15:06 ` [PATCH v6 1/9] pack-objects: allow `--filter` without `--stdout` Christian Couder
2023-09-11 15:06 ` [PATCH v6 2/9] t/helper: add 'find-pack' test-tool Christian Couder
2023-09-11 15:06 ` [PATCH v6 3/9] repack: refactor finishing pack-objects command Christian Couder
2023-09-11 15:06 ` [PATCH v6 4/9] repack: refactor finding pack prefix Christian Couder
2023-09-11 15:06 ` [PATCH v6 5/9] pack-bitmap-write: rebuild using new bitmap when remapping Christian Couder
2023-09-11 15:06 ` [PATCH v6 6/9] repack: add `--filter=<filter-spec>` option Christian Couder
2023-09-11 15:06 ` [PATCH v6 7/9] gc: add `gc.repackFilter` config option Christian Couder
2023-09-11 15:06 ` [PATCH v6 8/9] repack: implement `--filter-to` for storing filtered out objects Christian Couder
2023-09-11 15:06 ` [PATCH v6 9/9] gc: add `gc.repackFilterTo` config option Christian Couder
2023-09-25 15:25 ` [PATCH v7 0/9] Repack objects into separate packfiles based on a filter Christian Couder
2023-09-25 15:25 ` [PATCH v7 1/9] pack-objects: allow `--filter` without `--stdout` Christian Couder
2023-09-25 15:25 ` [PATCH v7 2/9] t/helper: add 'find-pack' test-tool Christian Couder
2023-09-25 15:25 ` [PATCH v7 3/9] repack: refactor finishing pack-objects command Christian Couder
2023-09-25 15:25 ` [PATCH v7 4/9] repack: refactor finding pack prefix Christian Couder
2023-09-25 15:25 ` [PATCH v7 5/9] pack-bitmap-write: rebuild using new bitmap when remapping Christian Couder
2023-09-25 15:25 ` [PATCH v7 6/9] repack: add `--filter=<filter-spec>` option Christian Couder
2023-09-25 15:25 ` [PATCH v7 7/9] gc: add `gc.repackFilter` config option Christian Couder
2023-09-25 15:25 ` [PATCH v7 8/9] repack: implement `--filter-to` for storing filtered out objects Christian Couder
2023-09-25 15:25 ` [PATCH v7 9/9] gc: add `gc.repackFilterTo` config option Christian Couder
2023-09-25 19:14 ` [PATCH v7 0/9] Repack objects into separate packfiles based on a filter Junio C Hamano
2023-09-25 22:41 ` Taylor Blau
2023-10-02 16:54 ` [PATCH v8 " Christian Couder
2023-10-02 16:54 ` [PATCH v8 1/9] pack-objects: allow `--filter` without `--stdout` Christian Couder
2023-10-02 16:54 ` [PATCH v8 2/9] t/helper: add 'find-pack' test-tool Christian Couder
2023-10-02 16:54 ` [PATCH v8 3/9] repack: refactor finishing pack-objects command Christian Couder
2023-10-02 16:54 ` [PATCH v8 4/9] repack: refactor finding pack prefix Christian Couder
2023-10-02 16:55 ` [PATCH v8 5/9] pack-bitmap-write: rebuild using new bitmap when remapping Christian Couder
2023-10-02 16:55 ` [PATCH v8 6/9] repack: add `--filter=<filter-spec>` option Christian Couder
2023-10-02 16:55 ` [PATCH v8 7/9] gc: add `gc.repackFilter` config option Christian Couder
2023-10-02 16:55 ` [PATCH v8 8/9] repack: implement `--filter-to` for storing filtered out objects Christian Couder
2023-10-02 16:55 ` [PATCH v8 9/9] gc: add `gc.repackFilterTo` config option Christian Couder
2023-10-02 20:14 ` [PATCH v8 0/9] Repack objects into separate packfiles based on a filter Taylor Blau