Git Mailing List Archive mirror
 help / color / mirror / Atom feed
From: Taylor Blau <me@ttaylorr.com>
To: Jonathan Nieder <jrnieder@gmail.com>
Cc: git@vger.kernel.org, tytso@mit.edu, derrickstolee@github.com,
	gitster@pobox.com, larsxschneider@gmail.com
Subject: Re: [PATCH v3 01/17] Documentation/technical: add cruft-packs.txt
Date: Mon, 21 Mar 2022 21:16:24 -0400	[thread overview]
Message-ID: <YjkjaH61dMLHXr0d@nand.local> (raw)
In-Reply-To: <YiZI99yeijQe5Jaq@google.com>

On Mon, Mar 07, 2022 at 10:03:35AM -0800, Jonathan Nieder wrote:
> Sorry for the very slow review!  I've mentioned a few times that this
> overlaps in interesting ways with the gc mechanism described in
> hash-function-transition.txt, so I'd like to compare and see how they
> interact.

Sorry for my equally-slow reply ;). I was on vacation last week and
wasn't following the list closely.

> > +Unreachable objects aren't removed immediately, since doing so could race with
> > +an incoming push which may reference an object which is about to be deleted.
> > +Instead, those unreachable objects are stored as loose object and stay that way
> > +until they are older than the expiration window, at which point they are removed
> > +by linkgit:git-prune[1].
> > +
> > +Git must store these unreachable objects loose in order to keep track of their
> > +per-object mtimes.
>
> It's worth noting that this behavior is already racy.  That is because
> when an unreachable object becomes newly reachable, we do not update
> its mtime and the mtimes of every object reachable from it, so if it
> then becomes transiently unreachable again then it can be wrongly
> collected.

Just to be clear, the race here only happens if the object in question
becomes reachable _after_ a pruning GC determines its mtime. If that's
the case, then the object will indeed be wrongly collected. This is
consistent with the existing behavior (which is racy in the exact same
way).

(After re-reading what you wrote and my response, I think we are saying
the exact same thing, but it doesn't hurt to think aloud).

> > +
> > +== Cruft packs
> > +
> > +A cruft pack eliminates the need for storing unreachable objects in a loose
> > +state by including the per-object mtimes in a separate file alongside a single
> > +pack containing all loose objects.
>
> Can this doc say a little about how "git prune" handles these files?
> In particular, does a non cruft pack aware copy of Git (or JGit,
> libgit2, etc) do the right thing or does it fight with this mechanism?
> If the latter, do we have a repository extension (extensions.*) to
> prevent that?

I mentioned this in much more detail in [1], but the answer is that the
cruft pack looks like any other pack, it just happens to have another
metadata file (the .mtimes one) attached to it. So other implementations
of Git should treat it as they would any other pack. Like I mentioned in
[1], cruft packs were designed with the explicit goal of not requiring a
repository extension.

> > +  3. Write the pack out, along with a `.mtimes` file that records the per-object
> > +     timestamps.
>
> As a point of comparison, the design in hash-function-transition uses
> a single timestamp for the whole pack.  During read operations, objects
> in a cruft pack are considered present; during writes, they are
> considered _not present_ so that if we want to make a cruft object
> newly present then we put a copy of it in a new pack.
>
> Advantage of the mtimes file approach:
> - less duplication of storage: a revived object is only stored once,
>   in a cruft pack, and then the next gc can "graduate" it out of the
>   cruft pack and shrink the cruft pack
> - less affect on non-gc Git code: writes don't need to know that any
>   cruft objects referenced need to be copied into a new pack
>
> Advantages of the mtime per cruft pack approach:
> - easy expiration: once a cruft pack has reached its expiration date,
>   it can be deleted as a whole
> - less I/O churn: a cruft pack stays as-is until combined into another
>   cruft pack or deleted.  There is no frequently-modified mtimes file
>   associated to it
> - informs the storage layer about what is likely to be accessed: cruft
>   packs can get filesystem attributes to put them in less-optimized
>   storage since they are likely to be less frequently read
>
> [...]

The key advantage of cruft packs is that you can expire unreachable
objects in piecemeal while still retaining the benefit of being able to
de-duplicate cruft objects and store them packed against each other.

> > +Notable alternatives to this design include:
>
> This doesn't mention the approach described in
> hash-function-transition.txt (and that's already implemented and has
> been in use for many years in JGit's DfsRepository).  Does that mean
> you aren't aware of it?

Implementing the UNREACHABLE_GARBAGE concept from
hash-function-transition.txt in cruft pack-terms would be equivalent to
not writing the mtimes file at all. This follows from the fact that a
pre-cruft packs implementation of Git considers a packed object's mtime
to be the same as the pack it's contained in. (I'm deliberately
avoiding any details from the h-f-t document regarding re-writing
objects contained in a garbage pack here, since this is separate from
the pack structure itself (and could easily be implemented on top of
cruft packs)).

So I'm not sure what the alternative we'd list would be, since it
removes the key feature of the design of cruft packs.

Thanks,
Taylor

[1]: https://lore.kernel.org/git/YiZMhuI%2FDdpvQ%2FED@nand.local/

  reply	other threads:[~2022-03-22  1:16 UTC|newest]

Thread overview: 201+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2021-11-29 22:25 [PATCH 00/17] cruft packs Taylor Blau
2021-11-29 22:25 ` [PATCH 01/17] Documentation/technical: add cruft-packs.txt Taylor Blau
2021-12-02 14:33   ` Derrick Stolee
2021-12-03 21:53     ` Taylor Blau
2021-12-04 22:20   ` Elijah Newren
2021-12-04 23:32     ` Taylor Blau
2021-11-29 22:25 ` [PATCH 02/17] pack-mtimes: support reading .mtimes files Taylor Blau
2021-12-02 15:06   ` Derrick Stolee
2021-12-02 22:32     ` brian m. carlson
2021-12-03 22:24     ` Taylor Blau
2022-01-07 19:41       ` Taylor Blau
2021-11-29 22:25 ` [PATCH 03/17] pack-write: pass 'struct packing_data' to 'stage_tmp_packfiles' Taylor Blau
2021-11-29 22:25 ` [PATCH 04/17] chunk-format.h: extract oid_version() Taylor Blau
2021-12-02 15:22   ` Derrick Stolee
2021-12-03 22:40     ` Taylor Blau
2021-12-06 17:33       ` Derrick Stolee
2021-11-29 22:25 ` [PATCH 05/17] pack-mtimes: support writing pack .mtimes files Taylor Blau
2021-12-02 15:36   ` Derrick Stolee
2021-12-03 23:04     ` Taylor Blau
2021-11-29 22:25 ` [PATCH 06/17] t/helper: add 'pack-mtimes' test-tool Taylor Blau
2021-12-06 21:16   ` Derrick Stolee
2022-02-23 22:24     ` Taylor Blau
2021-11-29 22:25 ` [PATCH 07/17] builtin/pack-objects.c: return from create_object_entry() Taylor Blau
2021-11-29 22:25 ` [PATCH 08/17] builtin/pack-objects.c: --cruft without expiration Taylor Blau
2021-12-06 21:44   ` Derrick Stolee
2022-03-01  2:48     ` Taylor Blau
2021-12-07 15:17   ` Derrick Stolee
2022-02-23 23:34     ` Taylor Blau
2021-11-29 22:25 ` [PATCH 09/17] reachable: add options to add_unseen_recent_objects_to_traversal Taylor Blau
2021-11-29 22:25 ` [PATCH 10/17] reachable: report precise timestamps from objects in cruft packs Taylor Blau
2021-11-29 22:25 ` [PATCH 11/17] builtin/pack-objects.c: --cruft with expiration Taylor Blau
2021-12-07 15:30   ` Derrick Stolee
2022-02-23 23:35     ` Taylor Blau
2021-11-29 22:25 ` [PATCH 12/17] builtin/repack.c: support generating a cruft pack Taylor Blau
2021-12-05 20:46   ` Junio C Hamano
2022-03-01  2:00     ` Taylor Blau
2021-12-07 15:38   ` Derrick Stolee
2022-02-23 23:37     ` Taylor Blau
2021-11-29 22:25 ` [PATCH 13/17] builtin/repack.c: allow configuring cruft pack generation Taylor Blau
2021-11-29 22:25 ` [PATCH 14/17] builtin/repack.c: use named flags for existing_packs Taylor Blau
2021-11-29 22:25 ` [PATCH 15/17] builtin/repack.c: add cruft packs to MIDX during geometric repack Taylor Blau
2021-11-29 22:25 ` [PATCH 16/17] builtin/gc.c: conditionally avoid pruning objects via loose Taylor Blau
2021-11-29 22:25 ` [PATCH 17/17] sha1-file.c: don't freshen cruft packs Taylor Blau
2021-12-03 19:51 ` [PATCH 00/17] " Junio C Hamano
2021-12-03 20:08   ` Taylor Blau
2021-12-03 20:47     ` Taylor Blau
2022-03-02  0:57 ` [PATCH v2 " Taylor Blau
2022-03-02  0:58   ` [PATCH v2 01/17] Documentation/technical: add cruft-packs.txt Taylor Blau
2022-03-02  0:58   ` [PATCH v2 02/17] pack-mtimes: support reading .mtimes files Taylor Blau
2022-03-02 20:22     ` Derrick Stolee
2022-03-02 21:33       ` Taylor Blau
2022-03-02  0:58   ` [PATCH v2 03/17] pack-write: pass 'struct packing_data' to 'stage_tmp_packfiles' Taylor Blau
2022-03-02  0:58   ` [PATCH v2 04/17] chunk-format.h: extract oid_version() Taylor Blau
2022-03-02  0:58   ` [PATCH v2 05/17] pack-mtimes: support writing pack .mtimes files Taylor Blau
2022-03-02  0:58   ` [PATCH v2 06/17] t/helper: add 'pack-mtimes' test-tool Taylor Blau
2022-03-02  0:58   ` [PATCH v2 07/17] builtin/pack-objects.c: return from create_object_entry() Taylor Blau
2022-03-02  0:58   ` [PATCH v2 08/17] builtin/pack-objects.c: --cruft without expiration Taylor Blau
2022-03-02  0:58   ` [PATCH v2 09/17] reachable: add options to add_unseen_recent_objects_to_traversal Taylor Blau
2022-03-02 20:19     ` Derrick Stolee
2022-03-02 21:28       ` Taylor Blau
2022-03-02  0:58   ` [PATCH v2 10/17] reachable: report precise timestamps from objects in cruft packs Taylor Blau
2022-03-02  0:58   ` [PATCH v2 11/17] builtin/pack-objects.c: --cruft with expiration Taylor Blau
2022-03-02  7:42     ` Junio C Hamano
2022-03-02 15:54       ` Taylor Blau
2022-03-02 19:57         ` Derrick Stolee
2022-03-02  0:58   ` [PATCH v2 12/17] builtin/repack.c: support generating a cruft pack Taylor Blau
2022-03-02  0:58   ` [PATCH v2 13/17] builtin/repack.c: allow configuring cruft pack generation Taylor Blau
2022-03-02  0:58   ` [PATCH v2 14/17] builtin/repack.c: use named flags for existing_packs Taylor Blau
2022-03-02  0:58   ` [PATCH v2 15/17] builtin/repack.c: add cruft packs to MIDX during geometric repack Taylor Blau
2022-03-02  0:58   ` [PATCH v2 16/17] builtin/gc.c: conditionally avoid pruning objects via loose Taylor Blau
2022-03-02  0:58   ` [PATCH v2 17/17] sha1-file.c: don't freshen cruft packs Taylor Blau
2022-03-02 20:23   ` [PATCH v2 00/17] " Derrick Stolee
2022-03-02 21:36     ` Taylor Blau
2022-03-03  0:20 ` [PATCH v3 " Taylor Blau
2022-03-03  0:20   ` [PATCH v3 01/17] Documentation/technical: add cruft-packs.txt Taylor Blau
2022-03-07 18:03     ` Jonathan Nieder
2022-03-22  1:16       ` Taylor Blau [this message]
2022-03-22 21:45         ` Jonathan Nieder
2022-03-22 22:02           ` Taylor Blau
2022-03-22 23:04             ` Jonathan Nieder
2022-03-23  1:01               ` Taylor Blau
2022-03-28 18:46                 ` Taylor Blau
2022-03-28 20:55                   ` Junio C Hamano
2022-03-28 21:21                     ` Taylor Blau
2022-03-29 15:59                       ` Junio C Hamano
2022-03-30  2:23                         ` Taylor Blau
2022-03-30 13:37                           ` Junio C Hamano
2022-03-30 17:30                             ` Taylor Blau
2022-03-03  0:20   ` [PATCH v3 02/17] pack-mtimes: support reading .mtimes files Taylor Blau
2022-03-03  0:20   ` [PATCH v3 03/17] pack-write: pass 'struct packing_data' to 'stage_tmp_packfiles' Taylor Blau
2022-03-03  0:20   ` [PATCH v3 04/17] chunk-format.h: extract oid_version() Taylor Blau
2022-03-03 16:30     ` Ævar Arnfjörð Bjarmason
2022-03-03 23:32       ` Taylor Blau
2022-03-04  0:16         ` Junio C Hamano
2022-03-03  0:20   ` [PATCH v3 05/17] pack-mtimes: support writing pack .mtimes files Taylor Blau
2022-03-03 16:45     ` Ævar Arnfjörð Bjarmason
2022-03-03 23:35       ` Taylor Blau
2022-03-04 10:40         ` Ævar Arnfjörð Bjarmason
2022-03-03  0:20   ` [PATCH v3 06/17] t/helper: add 'pack-mtimes' test-tool Taylor Blau
2022-03-03  0:21   ` [PATCH v3 07/17] builtin/pack-objects.c: return from create_object_entry() Taylor Blau
2022-03-03  0:21   ` [PATCH v3 08/17] builtin/pack-objects.c: --cruft without expiration Taylor Blau
2022-03-03  0:21   ` [PATCH v3 09/17] reachable: add options to add_unseen_recent_objects_to_traversal Taylor Blau
2022-03-03  0:21   ` [PATCH v3 10/17] reachable: report precise timestamps from objects in cruft packs Taylor Blau
2022-03-03  0:21   ` [PATCH v3 11/17] builtin/pack-objects.c: --cruft with expiration Taylor Blau
2022-03-03  0:21   ` [PATCH v3 12/17] builtin/repack.c: support generating a cruft pack Taylor Blau
2022-03-03  0:21   ` [PATCH v3 13/17] builtin/repack.c: allow configuring cruft pack generation Taylor Blau
2022-03-03  0:21   ` [PATCH v3 14/17] builtin/repack.c: use named flags for existing_packs Taylor Blau
2022-03-03  0:21   ` [PATCH v3 15/17] builtin/repack.c: add cruft packs to MIDX during geometric repack Taylor Blau
2022-03-03  0:21   ` [PATCH v3 16/17] builtin/gc.c: conditionally avoid pruning objects via loose Taylor Blau
2022-03-03  0:21   ` [PATCH v3 17/17] sha1-file.c: don't freshen cruft packs Taylor Blau
2022-03-03  1:29   ` [PATCH v3 00/17] " Derrick Stolee
2022-05-18 23:10 ` [PATCH v4 " Taylor Blau
2022-05-18 23:10   ` [PATCH v4 01/17] Documentation/technical: add cruft-packs.txt Taylor Blau
2022-05-19 14:04     ` Junio C Hamano
2022-05-18 23:10   ` [PATCH v4 02/17] pack-mtimes: support reading .mtimes files Taylor Blau
2022-05-19 10:40     ` Ævar Arnfjörð Bjarmason
2022-05-19 15:21       ` Junio C Hamano
2022-05-20  7:32         ` Ævar Arnfjörð Bjarmason
2022-05-20 22:37           ` Taylor Blau
2022-05-18 23:10   ` [PATCH v4 03/17] pack-write: pass 'struct packing_data' to 'stage_tmp_packfiles' Taylor Blau
2022-05-18 23:11   ` [PATCH v4 04/17] chunk-format.h: extract oid_version() Taylor Blau
2022-05-19 11:44     ` Ævar Arnfjörð Bjarmason
2022-05-18 23:11   ` [PATCH v4 05/17] pack-mtimes: support writing pack .mtimes files Taylor Blau
2022-05-18 23:11   ` [PATCH v4 06/17] t/helper: add 'pack-mtimes' test-tool Taylor Blau
2022-05-18 23:11   ` [PATCH v4 07/17] builtin/pack-objects.c: return from create_object_entry() Taylor Blau
2022-05-18 23:11   ` [PATCH v4 08/17] builtin/pack-objects.c: --cruft without expiration Taylor Blau
2022-05-19 10:04     ` Junio C Hamano
2022-05-19 15:16       ` Junio C Hamano
2022-05-20 22:52         ` Taylor Blau
2022-05-18 23:11   ` [PATCH v4 09/17] reachable: add options to add_unseen_recent_objects_to_traversal Taylor Blau
2022-05-18 23:11   ` [PATCH v4 10/17] reachable: report precise timestamps from objects in cruft packs Taylor Blau
2022-05-18 23:11   ` [PATCH v4 11/17] builtin/pack-objects.c: --cruft with expiration Taylor Blau
2022-05-18 23:11   ` [PATCH v4 12/17] builtin/repack.c: support generating a cruft pack Taylor Blau
2022-05-19 11:29     ` Ævar Arnfjörð Bjarmason
2022-05-20 22:39       ` Taylor Blau
2022-05-18 23:11   ` [PATCH v4 13/17] builtin/repack.c: allow configuring cruft pack generation Taylor Blau
2022-05-18 23:11   ` [PATCH v4 14/17] builtin/repack.c: use named flags for existing_packs Taylor Blau
2022-05-18 23:11   ` [PATCH v4 15/17] builtin/repack.c: add cruft packs to MIDX during geometric repack Taylor Blau
2022-05-19 11:32     ` Ævar Arnfjörð Bjarmason
2022-05-20 22:42       ` Taylor Blau
2022-05-18 23:11   ` [PATCH v4 16/17] builtin/gc.c: conditionally avoid pruning objects via loose Taylor Blau
2022-05-18 23:11   ` [PATCH v4 17/17] sha1-file.c: don't freshen cruft packs Taylor Blau
2022-05-18 23:48   ` [PATCH v4 00/17] " Derrick Stolee
2022-05-20 23:19     ` Junio C Hamano
2022-05-20 23:30       ` Taylor Blau
2022-05-19 11:42   ` [RFC PATCH 0/2] Utility functions for duplicated pack(write) code Ævar Arnfjörð Bjarmason
2022-05-19 11:42     ` [RFC PATCH 1/2] packfile API: add and use a pack_name_to_ext() utility function Ævar Arnfjörð Bjarmason
2022-05-19 15:40       ` Junio C Hamano
2022-05-19 11:42     ` [RFC PATCH 2/2] hash API: add and use a hash_short_id_by_algo() function Ævar Arnfjörð Bjarmason
2022-05-19 15:50       ` Junio C Hamano
2022-05-19 19:07         ` Ævar Arnfjörð Bjarmason
2022-05-19 15:31     ` [RFC PATCH 0/2] Utility functions for duplicated pack(write) code Junio C Hamano
2022-05-19 11:54   ` [PATCH v4 00/17] cruft packs Ævar Arnfjörð Bjarmason
2022-05-20 23:17 ` [PATCH v5 " Taylor Blau
2022-05-20 23:17   ` [PATCH v5 01/17] Documentation/technical: add cruft-packs.txt Taylor Blau
2022-05-20 23:17   ` [PATCH v5 02/17] pack-mtimes: support reading .mtimes files Taylor Blau
2022-05-24 19:32     ` Jonathan Nieder
2022-05-24 19:44       ` rsbecker
2022-05-24 22:25         ` Taylor Blau
2022-05-24 23:24           ` rsbecker
2022-05-25  0:07             ` Taylor Blau
2022-05-25  0:20               ` rsbecker
2022-05-25  9:11               ` adding new 32-bit on-disk (unsigned) timestamp formats (was: [PATCH v5 02/17] pack-mtimes: support reading .mtimes files) Ævar Arnfjörð Bjarmason
2022-05-25 13:30                 ` Derrick Stolee
2022-05-25 21:13                   ` Taylor Blau
2022-05-26  0:02                     ` Ævar Arnfjörð Bjarmason
2022-05-26  0:12                       ` Taylor Blau
2022-05-24 22:21       ` [PATCH v5 02/17] pack-mtimes: support reading .mtimes files Taylor Blau
2022-05-25  7:48         ` Jonathan Nieder
2022-05-25 21:36           ` Taylor Blau
2022-05-25 21:58             ` rsbecker
2022-05-25 22:59               ` Taylor Blau
2022-05-25 23:02     ` Taylor Blau
2022-05-26  0:30       ` Junio C Hamano
2023-06-01 13:01     ` Andreas Schwab
2022-05-20 23:17   ` [PATCH v5 03/17] pack-write: pass 'struct packing_data' to 'stage_tmp_packfiles' Taylor Blau
2022-05-20 23:17   ` [PATCH v5 04/17] chunk-format.h: extract oid_version() Taylor Blau
2022-05-20 23:17   ` [PATCH v5 05/17] pack-mtimes: support writing pack .mtimes files Taylor Blau
2022-05-20 23:17   ` [PATCH v5 06/17] t/helper: add 'pack-mtimes' test-tool Taylor Blau
2022-05-20 23:17   ` [PATCH v5 07/17] builtin/pack-objects.c: return from create_object_entry() Taylor Blau
2022-05-20 23:17   ` [PATCH v5 08/17] builtin/pack-objects.c: --cruft without expiration Taylor Blau
2022-05-20 23:17   ` [PATCH v5 09/17] reachable: add options to add_unseen_recent_objects_to_traversal Taylor Blau
2022-05-20 23:17   ` [PATCH v5 10/17] reachable: report precise timestamps from objects in cruft packs Taylor Blau
2022-05-20 23:18   ` [PATCH v5 11/17] builtin/pack-objects.c: --cruft with expiration Taylor Blau
2022-05-20 23:18   ` [PATCH v5 12/17] builtin/repack.c: support generating a cruft pack Taylor Blau
2022-05-20 23:18   ` [PATCH v5 13/17] builtin/repack.c: allow configuring cruft pack generation Taylor Blau
2022-05-20 23:18   ` [PATCH v5 14/17] builtin/repack.c: use named flags for existing_packs Taylor Blau
2022-05-20 23:18   ` [PATCH v5 15/17] builtin/repack.c: add cruft packs to MIDX during geometric repack Taylor Blau
2022-05-20 23:18   ` [PATCH v5 16/17] builtin/gc.c: conditionally avoid pruning objects via loose Taylor Blau
2022-06-19  5:38     ` René Scharfe
2022-06-21 15:58       ` Junio C Hamano
2022-05-20 23:18   ` [PATCH v5 17/17] sha1-file.c: don't freshen cruft packs Taylor Blau
2022-05-21 11:17   ` [PATCH v5 00/17] " Ævar Arnfjörð Bjarmason
2022-05-24 19:39     ` Jonathan Nieder
2022-05-24 21:50       ` Taylor Blau
2022-05-24 21:55         ` Ævar Arnfjörð Bjarmason
2022-05-24 22:12           ` Taylor Blau
2022-05-25  7:53             ` Jonathan Nieder
2022-05-25 19:59               ` Derrick Stolee
2022-05-25 21:09                 ` Taylor Blau
2022-05-26  0:06                   ` Ævar Arnfjörð Bjarmason

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=YjkjaH61dMLHXr0d@nand.local \
    --to=me@ttaylorr.com \
    --cc=derrickstolee@github.com \
    --cc=git@vger.kernel.org \
    --cc=gitster@pobox.com \
    --cc=jrnieder@gmail.com \
    --cc=larsxschneider@gmail.com \
    --cc=tytso@mit.edu \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).