* [PATCH 00/13] midx: incremental multi-pack indexes, part two
@ 2024-08-15 21:01 Taylor Blau
2024-08-15 21:01 ` [PATCH 01/13] Documentation: describe incremental MIDX bitmaps Taylor Blau
` (16 more replies)
0 siblings, 17 replies; 136+ messages in thread
From: Taylor Blau @ 2024-08-15 21:01 UTC (permalink / raw)
To: git; +Cc: Jeff King, Elijah Newren, Junio C Hamano
This series is based on 'master', with an additional merge between
tb/incremental-midx-part-1[1] and my newer series to fix a handful of
bugs related to pseudo-merge bitmaps[2].
This is the second of three series to implement support for incremental
multi-pack indexes (MIDXs). This series brings support for bitmaps that
are tied to incremental MIDXs in addition to regular MIDX bitmaps.
The details are laid out in the commits themselves, but the high-level
approach is as follows:
- Each layer in the incremental MIDX chain has its own corresponding
*.bitmap file. Each bitmap contains commits / pseudo-merges which
are selected only from the commits in that layer. Likewise, only
that layer's objects are written in the type-level bitmaps.
- The reachability traversal is only conducted on the top-most bitmap
corresponding to the most recent layer in the incremental MIDX
chain. Earlier layers may be consulted to retrieve commit /
pseudo-merge reachability bitmaps, but only the top-most bitmap's
"result" and "haves" fields are used.
- In essence, the top-most bitmap is the only one that "matters", and
earlier bitmaps are merely used to look up commit and pseudo-merge
bitmaps from that layer.
- Whenever we need to look at the type-level bitmaps corresponding to
the whole incremental MIDX chain, a new "ewah_or_iterator" is used.
This works in concept like a typical ewah_iterator, except that it works
over many EWAH bitmaps in parallel, OR-ing their results together
before returning them to the user.
In effect, this allows us to treat the union of all type-level
bitmaps (each of which only stores information about the objects in its
corresponding layer within the incremental MIDX chain) as a single
type-level bitmap corresponding to all of the objects across every
layer of the incremental MIDX chain.
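The OR-ing idea can be illustrated with a small sketch. This is not the
`struct ewah_or_iterator` added by the series: it assumes plain,
uncompressed 64-bit words instead of EWAH-compressed data, and the names
`plain_bitmap`, `or_iterator`, `or_iterator_init()` and
`or_iterator_next()` are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified stand-in for one layer's type-level bitmap. */
struct plain_bitmap {
	const uint64_t *words;
	size_t nr;
};

/* Hypothetical iterator over several layers at once. */
struct or_iterator {
	const struct plain_bitmap *layers;
	size_t layers_nr;
	size_t pos; /* index of the next word to return */
};

static void or_iterator_init(struct or_iterator *it,
			     const struct plain_bitmap *layers,
			     size_t layers_nr)
{
	assert(layers || !layers_nr);
	it->layers = layers;
	it->layers_nr = layers_nr;
	it->pos = 0;
}

/*
 * Store the OR of the next word from every layer in *out. Returns 1 if
 * at least one layer still had a word at this position, 0 otherwise.
 */
static int or_iterator_next(struct or_iterator *it, uint64_t *out)
{
	int any = 0;
	size_t i;

	*out = 0;
	for (i = 0; i < it->layers_nr; i++) {
		if (it->pos < it->layers[i].nr) {
			*out |= it->layers[i].words[it->pos];
			any = 1;
		}
	}
	if (any)
		it->pos++;
	return any;
}
```

Because each layer only sets bits for its own objects, the OR of all
layers behaves like one type-level bitmap covering the whole chain.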
The sum total of this series is that we are able to append new commits /
pseudo-merges to a repository's reachability bitmaps without having to
rewrite existing bitmaps, making the operation much cheaper to perform
in large repositories.
The series is laid out roughly as follows:
- The first patch describes the technical details of incremental MIDX
bitmaps.
- The second patch adjusts the pack-revindex internals to prepare for
incremental MIDX bitmaps.
- The next seven patches adjust various components of the pack-bitmap
internals to do the same.
- The next three patches introduce and adjust callers to use the
ewah_or_iterator (as above).
- The final patch implements writing incremental MIDX bitmaps, and
introduces tests.
After this series, the remaining goals for this project include being
able to compact contiguous runs of incremental MIDX layers into a single
layer to support growing the chain of MIDX layers without the chain
itself becoming too long.
Thanks in advance for your review!
[1]: https://lore.kernel.org/git/cover.1722958595.git.me@ttaylorr.com/
[2]: https://lore.kernel.org/git/cover.1723743050.git.me@ttaylorr.com/
Taylor Blau (13):
Documentation: describe incremental MIDX bitmaps
pack-revindex: prepare for incremental MIDX bitmaps
pack-bitmap.c: open and store incremental bitmap layers
pack-bitmap.c: teach `bitmap_for_commit()` about incremental MIDXs
pack-bitmap.c: teach `show_objects_for_type()` about incremental MIDXs
pack-bitmap.c: support bitmap pack-reuse with incremental MIDXs
pack-bitmap.c: teach `rev-list --test-bitmap` about incremental MIDXs
pack-bitmap.c: compute disk-usage with incremental MIDXs
pack-bitmap.c: apply pseudo-merge commits with incremental MIDXs
ewah: implement `struct ewah_or_iterator`
pack-bitmap.c: keep track of each layer's type bitmaps
pack-bitmap.c: use `ewah_or_iterator` for type bitmap iterators
midx: implement writing incremental MIDX bitmaps
Documentation/technical/multi-pack-index.txt | 64 ++++
builtin/pack-objects.c | 3 +-
ewah/ewah_bitmap.c | 33 ++
ewah/ewok.h | 12 +
midx-write.c | 35 +-
pack-bitmap-write.c | 65 +++-
pack-bitmap.c | 328 ++++++++++++++-----
pack-bitmap.h | 4 +-
pack-revindex.c | 32 +-
t/t5334-incremental-multi-pack-index.sh | 84 +++++
10 files changed, 548 insertions(+), 112 deletions(-)
base-commit: f28e4ef872c247b0b35f48b1c3d2c5f77753b908
--
2.46.0.86.ge766d390f0.dirty
* [PATCH 01/13] Documentation: describe incremental MIDX bitmaps
2024-08-15 21:01 [PATCH 00/13] midx: incremental multi-pack indexes, part two Taylor Blau
@ 2024-08-15 21:01 ` Taylor Blau
2024-08-15 21:01 ` [PATCH 02/13] pack-revindex: prepare for " Taylor Blau
` (15 subsequent siblings)
16 siblings, 0 replies; 136+ messages in thread
From: Taylor Blau @ 2024-08-15 21:01 UTC (permalink / raw)
To: git; +Cc: Jeff King, Elijah Newren, Junio C Hamano
Prepare to implement support for reachability bitmaps for the new
incremental multi-pack index (MIDX) feature over the following commits.
This commit begins by first describing the relevant format and usage
details for incremental MIDX bitmaps.
Signed-off-by: Taylor Blau <me@ttaylorr.com>
---
Documentation/technical/multi-pack-index.txt | 64 ++++++++++++++++++++
1 file changed, 64 insertions(+)
diff --git a/Documentation/technical/multi-pack-index.txt b/Documentation/technical/multi-pack-index.txt
index cc063b30be..a063262c36 100644
--- a/Documentation/technical/multi-pack-index.txt
+++ b/Documentation/technical/multi-pack-index.txt
@@ -164,6 +164,70 @@ objects_nr($H2) + objects_nr($H1) + i
(in the C implementation, this is often computed as `i +
m->num_objects_in_base`).
+=== Pseudo-pack order for incremental MIDXs
+
+The original implementation of multi-pack reachability bitmaps defined
+the pseudo-pack order in linkgit:gitformat-pack[5] (see the section
+titled "multi-pack-index reverse indexes") roughly as follows:
+
+____
+In short, a MIDX's pseudo-pack is the de-duplicated concatenation of
+objects in packs stored by the MIDX, laid out in pack order, and the
+packs arranged in MIDX order (with the preferred pack coming first).
+____
+
+In the incremental MIDX design, we extend this definition to include
+objects from multiple layers of the MIDX chain. The pseudo-pack order
+for incremental MIDXs is determined by concatenating the pseudo-pack
+ordering for each layer of the MIDX chain in order. Formally, two objects
+`o1` and `o2` are compared as follows:
+
+1. If `o1` appears in an earlier layer of the MIDX chain than `o2`, then
+ `o1` is considered less than `o2`.
+2. Otherwise, if `o1` and `o2` appear in the same MIDX layer, and that
+ MIDX layer has no base, then if one of `pack(o1)` and `pack(o2)` is
+ preferred and the other is not, then the preferred one sorts first. If
+ there is a base layer (i.e. the MIDX layer is not the first layer in
+ the chain), then if `pack(o1)` appears earlier in that MIDX layer's
+ pack order, then `o1` is less than `o2`. Likewise if `pack(o2)`
+ appears earlier, then the opposite is true.
+3. Otherwise, `o1` and `o2` appear in the same pack, and thus in the
+ same MIDX layer. Sort `o1` and `o2` by their offset within their
+ containing packfile.
+
+=== Reachability bitmaps and incremental MIDXs
+
+Each layer of an incremental MIDX chain may have its objects (and the
+objects from any previous layer in the same MIDX chain) represented in
+its own `*.bitmap` file.
+
+The structure of a `*.bitmap` file belonging to an incremental MIDX
+chain is identical to that of a non-incremental MIDX bitmap, or a
+classic single-pack bitmap. Since objects are added to the end of the
+incremental MIDX's pseudo-pack order (see: above), it is possible to
+extend a bitmap when appending to the end of a MIDX chain.
+
+(Note: it is likewise possible to compress a contiguous sequence of
+incremental MIDX layers, and their `*.bitmap`(s), into a single layer and
+`*.bitmap`, but this is not yet implemented.)
+
+The object positions used are global within the pseudo-pack order, so
+subsequent layers will have, for example, `m->num_objects_in_base`
+number of `0` bits in each of their four type bitmaps. This follows from
+the fact that we only write type bitmap entries for objects present in
+the layer immediately corresponding to the bitmap.
+
+Note also that only the bitmap pertaining to the most recent layer in an
+incremental MIDX chain is used to store reachability information about
+the interesting and uninteresting objects in a reachability query.
+Earlier bitmap layers are only used to look up commit and pseudo-merge
+bitmaps from that layer, as well as the type-level bitmaps for objects
+in that layer.
+
+To simplify the implementation, type-level bitmaps are iterated
+simultaneously, and their results are OR'd together to avoid recursively
+calling internal bitmap functions.
+
Future Work
-----------
--
2.46.0.86.ge766d390f0.dirty
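
The three comparison rules in the "Pseudo-pack order for incremental
MIDXs" section of the documentation patch above can be sketched as a
comparator. This is only one reading of those rules, not code from the
series: the `obj_pos` struct and `pseudo_pack_cmp()` are hypothetical,
and the sketch assumes that when neither or both objects are in a
preferred pack, the comparison falls back to the layer's pack order.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical description of where an object lives in the chain. */
struct obj_pos {
	uint32_t layer;     /* 0 is the first layer (no base) */
	uint32_t pack;      /* pack order within that layer */
	int pack_preferred; /* nonzero if the pack is the preferred pack */
	uint64_t offset;    /* offset within the containing packfile */
};

static int cmp_u64(uint64_t a, uint64_t b)
{
	return (a > b) - (a < b);
}

static int pseudo_pack_cmp(const struct obj_pos *a, const struct obj_pos *b)
{
	assert(a && b);

	/* Rule 1: objects in earlier layers sort first. */
	if (a->layer != b->layer)
		return cmp_u64(a->layer, b->layer);

	/* Rule 2: same layer, different packs. */
	if (a->pack != b->pack) {
		int pa = !!a->pack_preferred;
		int pb = !!b->pack_preferred;

		/* Only a layer without a base honors the preferred pack. */
		if (a->layer == 0 && pa != pb)
			return pa ? -1 : 1;
		return cmp_u64(a->pack, b->pack);
	}

	/* Rule 3: same pack, so order by offset. */
	return cmp_u64(a->offset, b->offset);
}
```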
* [PATCH 02/13] pack-revindex: prepare for incremental MIDX bitmaps
2024-08-15 21:01 [PATCH 00/13] midx: incremental multi-pack indexes, part two Taylor Blau
2024-08-15 21:01 ` [PATCH 01/13] Documentation: describe incremental MIDX bitmaps Taylor Blau
@ 2024-08-15 21:01 ` Taylor Blau
2024-08-15 21:01 ` [PATCH 03/13] pack-bitmap.c: open and store incremental bitmap layers Taylor Blau
` (14 subsequent siblings)
16 siblings, 0 replies; 136+ messages in thread
From: Taylor Blau @ 2024-08-15 21:01 UTC (permalink / raw)
To: git; +Cc: Jeff King, Elijah Newren, Junio C Hamano
Prepare the reverse index machinery to handle object lookups in an
incremental MIDX bitmap. These changes are broken out across a few
functions:
- load_midx_revindex() learns to use the appropriate MIDX filename
depending on whether the given 'struct multi_pack_index *' is
incremental or not.
- pack_pos_to_midx() and midx_to_pack_pos() now both take in a global
object position in the MIDX pseudo-pack order, and find the
earliest containing MIDX (similar to midx.c::midx_for_object()).
- midx_pack_order_cmp() adjusts its call to pack_pos_to_midx() by the
number of objects in the base (since 'vb - midx->revindex_data' is
relative to the containing MIDX, and pack_pos_to_midx() expects a
global position).
Likewise, this function adjusts its output by adding
m->num_objects_in_base to return a global position out through the
`*pos` pointer.
Together, these changes are sufficient to use the multi-pack index's
reverse index format for incremental multi-pack reachability bitmaps.
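
The global-to-local translation described above can be sketched in
isolation. The `layer_sketch` struct and `layer_for_pos()` below are
hypothetical stand-ins that mirror only the `num_objects`,
`num_objects_in_base` and base-pointer fields used in this patch. Unlike
the real functions, the sketch returns NULL on an out-of-bounds position
rather than calling BUG().

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for the fields of one MIDX layer used here. */
struct layer_sketch {
	const struct layer_sketch *base;
	uint32_t num_objects;         /* objects in this layer only */
	uint32_t num_objects_in_base; /* objects in all earlier layers */
};

/*
 * Given a global pseudo-pack position, find the layer containing it and
 * store the position relative to that layer in *local.
 */
static const struct layer_sketch *layer_for_pos(const struct layer_sketch *top,
						uint32_t pos, uint32_t *local)
{
	const struct layer_sketch *m = top;

	assert(local);

	/* Walk toward the base until this layer's range can contain pos. */
	while (m && pos < m->num_objects_in_base)
		m = m->base;
	if (!m || pos - m->num_objects_in_base >= m->num_objects)
		return NULL;

	*local = pos - m->num_objects_in_base;
	return m;
}
```

Going the other way, a layer-relative position becomes global by adding
`num_objects_in_base`, which is what the adjusted `*pos` output does.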
Signed-off-by: Taylor Blau <me@ttaylorr.com>
---
pack-bitmap.c | 30 ++++++++++++++++++++----------
pack-revindex.c | 32 +++++++++++++++++++++++---------
2 files changed, 43 insertions(+), 19 deletions(-)
diff --git a/pack-bitmap.c b/pack-bitmap.c
index 2e657a2aa4..0a7039d955 100644
--- a/pack-bitmap.c
+++ b/pack-bitmap.c
@@ -170,6 +170,15 @@ static struct ewah_bitmap *read_bitmap_1(struct bitmap_index *index)
return read_bitmap(index->map, index->map_size, &index->map_pos);
}
+static uint32_t bitmap_non_extended_bits(struct bitmap_index *index)
+{
+ if (index->midx) {
+ struct multi_pack_index *m = index->midx;
+ return m->num_objects + m->num_objects_in_base;
+ }
+ return index->pack->num_objects;
+}
+
static uint32_t bitmap_num_objects(struct bitmap_index *index)
{
if (index->midx)
@@ -925,7 +934,7 @@ static inline int bitmap_position_extended(struct bitmap_index *bitmap_git,
if (pos < kh_end(positions)) {
int bitmap_pos = kh_value(positions, pos);
- return bitmap_pos + bitmap_num_objects(bitmap_git);
+ return bitmap_pos + bitmap_non_extended_bits(bitmap_git);
}
return -1;
@@ -993,7 +1002,7 @@ static int ext_index_add_object(struct bitmap_index *bitmap_git,
bitmap_pos = kh_value(eindex->positions, hash_pos);
}
- return bitmap_pos + bitmap_num_objects(bitmap_git);
+ return bitmap_pos + bitmap_non_extended_bits(bitmap_git);
}
struct bitmap_show_data {
@@ -1498,7 +1507,8 @@ static void show_extended_objects(struct bitmap_index *bitmap_git,
for (i = 0; i < eindex->count; ++i) {
struct object *obj;
- if (!bitmap_get(objects, st_add(bitmap_num_objects(bitmap_git), i)))
+ if (!bitmap_get(objects,
+ st_add(bitmap_non_extended_bits(bitmap_git), i)))
continue;
obj = eindex->objects[i];
@@ -1677,7 +1687,7 @@ static void filter_bitmap_exclude_type(struct bitmap_index *bitmap_git,
* them individually.
*/
for (i = 0; i < eindex->count; i++) {
- size_t pos = st_add(i, bitmap_num_objects(bitmap_git));
+ size_t pos = st_add(i, bitmap_non_extended_bits(bitmap_git));
if (eindex->objects[i]->type == type &&
bitmap_get(to_filter, pos) &&
!bitmap_get(tips, pos))
@@ -1703,7 +1713,7 @@ static unsigned long get_size_by_pos(struct bitmap_index *bitmap_git,
oi.sizep = &size;
- if (pos < bitmap_num_objects(bitmap_git)) {
+ if (pos < bitmap_non_extended_bits(bitmap_git)) {
struct packed_git *pack;
off_t ofs;
@@ -1726,7 +1736,7 @@ static unsigned long get_size_by_pos(struct bitmap_index *bitmap_git,
}
} else {
struct eindex *eindex = &bitmap_git->ext_index;
- struct object *obj = eindex->objects[pos - bitmap_num_objects(bitmap_git)];
+ struct object *obj = eindex->objects[pos - bitmap_non_extended_bits(bitmap_git)];
if (oid_object_info_extended(the_repository, &obj->oid, &oi, 0) < 0)
die(_("unable to get size of %s"), oid_to_hex(&obj->oid));
}
@@ -1878,7 +1888,7 @@ static void filter_packed_objects_from_bitmap(struct bitmap_index *bitmap_git,
uint32_t objects_nr;
size_t i, pos;
- objects_nr = bitmap_num_objects(bitmap_git);
+ objects_nr = bitmap_non_extended_bits(bitmap_git);
pos = objects_nr / BITS_IN_EWORD;
if (pos > result->word_alloc)
@@ -2399,7 +2409,7 @@ static uint32_t count_object_type(struct bitmap_index *bitmap_git,
for (i = 0; i < eindex->count; ++i) {
if (eindex->objects[i]->type == type &&
bitmap_get(objects,
- st_add(bitmap_num_objects(bitmap_git), i)))
+ st_add(bitmap_non_extended_bits(bitmap_git), i)))
count++;
}
@@ -2798,7 +2808,7 @@ uint32_t *create_bitmap_mapping(struct bitmap_index *bitmap_git,
BUG("rebuild_existing_bitmaps: missing required rev-cache "
"extension");
- num_objects = bitmap_num_objects(bitmap_git);
+ num_objects = bitmap_non_extended_bits(bitmap_git);
CALLOC_ARRAY(reposition, num_objects);
for (i = 0; i < num_objects; ++i) {
@@ -2941,7 +2951,7 @@ static off_t get_disk_usage_for_extended(struct bitmap_index *bitmap_git)
struct object *obj = eindex->objects[i];
if (!bitmap_get(result,
- st_add(bitmap_num_objects(bitmap_git), i)))
+ st_add(bitmap_non_extended_bits(bitmap_git), i)))
continue;
if (oid_object_info_extended(the_repository, &obj->oid, &oi, 0) < 0)
diff --git a/pack-revindex.c b/pack-revindex.c
index 22d3c23464..ce3f7ae214 100644
--- a/pack-revindex.c
+++ b/pack-revindex.c
@@ -383,8 +383,12 @@ int load_midx_revindex(struct multi_pack_index *m)
trace2_data_string("load_midx_revindex", the_repository,
"source", "rev");
- get_midx_filename_ext(&revindex_name, m->object_dir,
- get_midx_checksum(m), MIDX_EXT_REV);
+ if (m->has_chain)
+ get_split_midx_filename_ext(&revindex_name, m->object_dir,
+ get_midx_checksum(m), MIDX_EXT_REV);
+ else
+ get_midx_filename_ext(&revindex_name, m->object_dir,
+ get_midx_checksum(m), MIDX_EXT_REV);
ret = load_revindex_from_disk(revindex_name.buf,
m->num_objects,
@@ -471,11 +475,15 @@ off_t pack_pos_to_offset(struct packed_git *p, uint32_t pos)
uint32_t pack_pos_to_midx(struct multi_pack_index *m, uint32_t pos)
{
+ while (m && pos < m->num_objects_in_base)
+ m = m->base_midx;
+ if (!m)
+ BUG("NULL multi-pack-index for object position: %"PRIu32, pos);
if (!m->revindex_data)
BUG("pack_pos_to_midx: reverse index not yet loaded");
- if (m->num_objects <= pos)
+ if (m->num_objects + m->num_objects_in_base <= pos)
BUG("pack_pos_to_midx: out-of-bounds object at %"PRIu32, pos);
- return get_be32(m->revindex_data + pos);
+ return get_be32(m->revindex_data + pos - m->num_objects_in_base);
}
struct midx_pack_key {
@@ -491,7 +499,8 @@ static int midx_pack_order_cmp(const void *va, const void *vb)
const struct midx_pack_key *key = va;
struct multi_pack_index *midx = key->midx;
- uint32_t versus = pack_pos_to_midx(midx, (uint32_t*)vb - (const uint32_t *)midx->revindex_data);
+ size_t pos = (uint32_t*)vb - (const uint32_t *)midx->revindex_data;
+ uint32_t versus = pack_pos_to_midx(midx, pos + midx->num_objects_in_base);
uint32_t versus_pack = nth_midxed_pack_int_id(midx, versus);
off_t versus_offset;
@@ -529,9 +538,9 @@ static int midx_key_to_pack_pos(struct multi_pack_index *m,
{
uint32_t *found;
- if (key->pack >= m->num_packs)
+ if (key->pack >= m->num_packs + m->num_packs_in_base)
BUG("MIDX pack lookup out of bounds (%"PRIu32" >= %"PRIu32")",
- key->pack, m->num_packs);
+ key->pack, m->num_packs + m->num_packs_in_base);
/*
* The preferred pack sorts first, so determine its identifier by
* looking at the first object in pseudo-pack order.
@@ -551,7 +560,8 @@ static int midx_key_to_pack_pos(struct multi_pack_index *m,
if (!found)
return -1;
- *pos = found - m->revindex_data;
+ *pos = (found - m->revindex_data) + m->num_objects_in_base;
+
return 0;
}
@@ -559,9 +569,13 @@ int midx_to_pack_pos(struct multi_pack_index *m, uint32_t at, uint32_t *pos)
{
struct midx_pack_key key;
+ while (m && at < m->num_objects_in_base)
+ m = m->base_midx;
+ if (!m)
+ BUG("NULL multi-pack-index for object position: %"PRIu32, at);
if (!m->revindex_data)
BUG("midx_to_pack_pos: reverse index not yet loaded");
- if (m->num_objects <= at)
+ if (m->num_objects + m->num_objects_in_base <= at)
BUG("midx_to_pack_pos: out-of-bounds object at %"PRIu32, at);
key.pack = nth_midxed_pack_int_id(m, at);
--
2.46.0.86.ge766d390f0.dirty
* [PATCH 03/13] pack-bitmap.c: open and store incremental bitmap layers
2024-08-15 21:01 [PATCH 00/13] midx: incremental multi-pack indexes, part two Taylor Blau
2024-08-15 21:01 ` [PATCH 01/13] Documentation: describe incremental MIDX bitmaps Taylor Blau
2024-08-15 21:01 ` [PATCH 02/13] pack-revindex: prepare for " Taylor Blau
@ 2024-08-15 21:01 ` Taylor Blau
2024-08-15 21:01 ` [PATCH 04/13] pack-bitmap.c: teach `bitmap_for_commit()` about incremental MIDXs Taylor Blau
` (13 subsequent siblings)
16 siblings, 0 replies; 136+ messages in thread
From: Taylor Blau @ 2024-08-15 21:01 UTC (permalink / raw)
To: git; +Cc: Jeff King, Elijah Newren, Junio C Hamano
Prepare the pack-bitmap machinery to work with incremental MIDXs by
adding a new "base" field to keep track of the bitmap index associated
with the previous MIDX layer.
The changes in this commit are mostly boilerplate to open the correct
bitmap(s), add them to the chain of bitmap layers along the "base"
pointer, ensure that the correct packs and their reverse indexes are
loaded across MIDX layers, etc.
While we're at it, keep track of a base_nr field to indicate how many
bitmap layers (including the current bitmap) exist. This will be used in
a future commit to allocate an array of 'struct ewah_bitmap' pointers to
collect all of the respective type bitmaps among all layers to
initialize a multi-EWAH iterator.
Subsequent commits will teach the functions within the pack-bitmap
machinery how to interact with these new fields.
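
As a rough illustration of how the "base" and base_nr fields relate, the
sketch below links layers and then gathers one pointer per layer into an
array sized by base_nr. The `bitmap_layer_sketch` struct, `link_layer()`
and `collect_layers()` are hypothetical and stand in for the real
'struct bitmap_index' handling.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical stand-in for the chain-related fields of a bitmap index. */
struct bitmap_layer_sketch {
	struct bitmap_layer_sketch *base;
	uint32_t base_nr; /* number of layers, including this one */
};

static void link_layer(struct bitmap_layer_sketch *layer,
		       struct bitmap_layer_sketch *base)
{
	layer->base = base;
	layer->base_nr = base ? base->base_nr + 1 : 1;
}

/*
 * Return a newly allocated array of top->base_nr pointers, ordered from
 * the top-most layer down to the first one. The caller frees the array.
 */
static const struct bitmap_layer_sketch **
collect_layers(const struct bitmap_layer_sketch *top)
{
	const struct bitmap_layer_sketch **out;
	uint32_t i = 0;

	assert(top && top->base_nr > 0);

	out = calloc(top->base_nr, sizeof(*out));
	if (!out)
		return NULL;
	for (; top; top = top->base)
		out[i++] = top;
	return out;
}
```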
Signed-off-by: Taylor Blau <me@ttaylorr.com>
---
pack-bitmap.c | 63 ++++++++++++++++++++++++++++++++++++++++-----------
1 file changed, 50 insertions(+), 13 deletions(-)
diff --git a/pack-bitmap.c b/pack-bitmap.c
index 0a7039d955..c27383c027 100644
--- a/pack-bitmap.c
+++ b/pack-bitmap.c
@@ -54,6 +54,13 @@ struct bitmap_index {
struct packed_git *pack;
struct multi_pack_index *midx;
+ /*
+ * If using a multi-pack index chain, 'base' points to the
+ * bitmap index corresponding to this bitmap's midx->base_midx.
+ */
+ struct bitmap_index *base;
+ uint32_t base_nr;
+
/* mmapped buffer of the whole bitmap index */
unsigned char *map;
size_t map_size; /* size of the mmaped buffer */
@@ -377,8 +384,13 @@ static int load_bitmap_entries_v1(struct bitmap_index *index)
char *midx_bitmap_filename(struct multi_pack_index *midx)
{
struct strbuf buf = STRBUF_INIT;
- get_midx_filename_ext(&buf, midx->object_dir, get_midx_checksum(midx),
- MIDX_EXT_BITMAP);
+ if (midx->has_chain)
+ get_split_midx_filename_ext(&buf, midx->object_dir,
+ get_midx_checksum(midx),
+ MIDX_EXT_BITMAP);
+ else
+ get_midx_filename_ext(&buf, midx->object_dir,
+ get_midx_checksum(midx), MIDX_EXT_BITMAP);
return strbuf_detach(&buf, NULL);
}
@@ -397,10 +409,17 @@ static int open_midx_bitmap_1(struct bitmap_index *bitmap_git,
{
struct stat st;
char *bitmap_name = midx_bitmap_filename(midx);
- int fd = git_open(bitmap_name);
+ int fd;
uint32_t i, preferred_pack;
struct packed_git *preferred;
+ fd = git_open(bitmap_name);
+ if (fd < 0 && errno == ENOENT) {
+ FREE_AND_NULL(bitmap_name);
+ bitmap_name = midx_bitmap_filename(midx);
+ fd = git_open(bitmap_name);
+ }
+
if (fd < 0) {
if (errno != ENOENT)
warning_errno("cannot open '%s'", bitmap_name);
@@ -446,7 +465,7 @@ static int open_midx_bitmap_1(struct bitmap_index *bitmap_git,
goto cleanup;
}
- for (i = 0; i < bitmap_git->midx->num_packs; i++) {
+ for (i = 0; i < bitmap_git->midx->num_packs + bitmap_git->midx->num_packs_in_base; i++) {
if (prepare_midx_pack(the_repository, bitmap_git->midx, i)) {
warning(_("could not open pack %s"),
bitmap_git->midx->pack_names[i]);
@@ -459,13 +478,20 @@ static int open_midx_bitmap_1(struct bitmap_index *bitmap_git,
goto cleanup;
}
- preferred = bitmap_git->midx->packs[preferred_pack];
+ preferred = nth_midxed_pack(bitmap_git->midx, preferred_pack);
if (!is_pack_valid(preferred)) {
warning(_("preferred pack (%s) is invalid"),
preferred->pack_name);
goto cleanup;
}
+ if (midx->base_midx) {
+ bitmap_git->base = prepare_midx_bitmap_git(midx->base_midx);
+ bitmap_git->base_nr = bitmap_git->base->base_nr + 1;
+ } else {
+ bitmap_git->base_nr = 1;
+ }
+
return 0;
cleanup:
@@ -516,6 +542,7 @@ static int open_pack_bitmap_1(struct bitmap_index *bitmap_git, struct packed_git
bitmap_git->map_size = xsize_t(st.st_size);
bitmap_git->map = xmmap(NULL, bitmap_git->map_size, PROT_READ, MAP_PRIVATE, fd, 0);
bitmap_git->map_pos = 0;
+ bitmap_git->base_nr = 1;
close(fd);
if (load_bitmap_header(bitmap_git) < 0) {
@@ -535,8 +562,7 @@ static int open_pack_bitmap_1(struct bitmap_index *bitmap_git, struct packed_git
static int load_reverse_index(struct repository *r, struct bitmap_index *bitmap_git)
{
if (bitmap_is_midx(bitmap_git)) {
- uint32_t i;
- int ret;
+ struct multi_pack_index *m;
/*
* The multi-pack-index's .rev file is already loaded via
@@ -545,10 +571,15 @@ static int load_reverse_index(struct repository *r, struct bitmap_index *bitmap_
* But we still need to open the individual pack .rev files,
* since we will need to make use of them in pack-objects.
*/
- for (i = 0; i < bitmap_git->midx->num_packs; i++) {
- ret = load_pack_revindex(r, bitmap_git->midx->packs[i]);
- if (ret)
- return ret;
+ for (m = bitmap_git->midx; m; m = m->base_midx) {
+ uint32_t i;
+ int ret;
+
+ for (i = 0; i < m->num_packs; i++) {
+ ret = load_pack_revindex(r, m->packs[i]);
+ if (ret)
+ return ret;
+ }
}
return 0;
}
@@ -574,6 +605,13 @@ static int load_bitmap(struct repository *r, struct bitmap_index *bitmap_git)
if (!bitmap_git->table_lookup && load_bitmap_entries_v1(bitmap_git) < 0)
goto failed;
+ if (bitmap_git->base) {
+ if (!bitmap_is_midx(bitmap_git))
+ BUG("non-MIDX bitmap has non-NULL base bitmap index");
+ if (load_bitmap(r, bitmap_git->base) < 0)
+ goto failed;
+ }
+
return 0;
failed:
@@ -658,10 +696,9 @@ struct bitmap_index *prepare_bitmap_git(struct repository *r)
struct bitmap_index *prepare_midx_bitmap_git(struct multi_pack_index *midx)
{
- struct repository *r = the_repository;
struct bitmap_index *bitmap_git = xcalloc(1, sizeof(*bitmap_git));
- if (!open_midx_bitmap_1(bitmap_git, midx) && !load_bitmap(r, bitmap_git))
+ if (!open_midx_bitmap_1(bitmap_git, midx))
return bitmap_git;
free_bitmap_index(bitmap_git);
--
2.46.0.86.ge766d390f0.dirty
* [PATCH 04/13] pack-bitmap.c: teach `bitmap_for_commit()` about incremental MIDXs
2024-08-15 21:01 [PATCH 00/13] midx: incremental multi-pack indexes, part two Taylor Blau
` (2 preceding siblings ...)
2024-08-15 21:01 ` [PATCH 03/13] pack-bitmap.c: open and store incremental bitmap layers Taylor Blau
@ 2024-08-15 21:01 ` Taylor Blau
2024-08-15 21:01 ` [PATCH 05/13] pack-bitmap.c: teach `show_objects_for_type()` " Taylor Blau
` (12 subsequent siblings)
16 siblings, 0 replies; 136+ messages in thread
From: Taylor Blau @ 2024-08-15 21:01 UTC (permalink / raw)
To: git; +Cc: Jeff King, Elijah Newren, Junio C Hamano
The pack-bitmap machinery uses `bitmap_for_commit()` to locate the
EWAH-compressed bitmap corresponding to some given commit object.
Teach this function about incremental MIDX bitmaps by teaching it to
recur on earlier bitmap layers when it fails to find a given commit in
the current layer.
The changes to do so are as follows:
- Avoid initializing hash_pos at its declaration, since
bitmap_for_commit() is now a recursive function and may receive a
NULL bitmap_index pointer as its first argument.
- In cases where we would previously return NULL (to indicate that a
lookup failed and the given bitmap_index does not contain an entry
corresponding to the given commit), recursively call the function on
the previous bitmap layer.
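
The fallback pattern can be shown with a toy lookup. It assumes each
layer holds a small array of (commit name, value) entries instead of a
hash map of stored bitmaps; `entry_sketch`, `lookup_layer_sketch` and
`lookup_in_chain()` are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical per-commit entry; the value stands in for a bitmap. */
struct entry_sketch {
	const char *commit;
	int value;
};

/* Hypothetical layer: its own entries plus a pointer to the previous layer. */
struct lookup_layer_sketch {
	const struct lookup_layer_sketch *base;
	const struct entry_sketch *entries;
	size_t nr;
};

static const struct entry_sketch *
lookup_in_chain(const struct lookup_layer_sketch *layer, const char *commit)
{
	size_t i;

	assert(commit);

	/* Running off the end of the chain means the lookup failed. */
	if (!layer)
		return NULL;

	for (i = 0; i < layer->nr; i++)
		if (!strcmp(layer->entries[i].commit, commit))
			return &layer->entries[i];

	/* Not in this layer: retry on the previous layer. */
	return lookup_in_chain(layer->base, commit);
}
```

The NULL check at the top is what makes the recursion terminate once the
first layer has been searched.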
Signed-off-by: Taylor Blau <me@ttaylorr.com>
---
pack-bitmap.c | 11 +++++++----
1 file changed, 7 insertions(+), 4 deletions(-)
diff --git a/pack-bitmap.c b/pack-bitmap.c
index c27383c027..88623d9e06 100644
--- a/pack-bitmap.c
+++ b/pack-bitmap.c
@@ -946,18 +946,21 @@ static struct stored_bitmap *lazy_bitmap_for_commit(struct bitmap_index *bitmap_
struct ewah_bitmap *bitmap_for_commit(struct bitmap_index *bitmap_git,
struct commit *commit)
{
- khiter_t hash_pos = kh_get_oid_map(bitmap_git->bitmaps,
- commit->object.oid);
+ khiter_t hash_pos;
+ if (!bitmap_git)
+ return NULL;
+
+ hash_pos = kh_get_oid_map(bitmap_git->bitmaps, commit->object.oid);
if (hash_pos >= kh_end(bitmap_git->bitmaps)) {
struct stored_bitmap *bitmap = NULL;
if (!bitmap_git->table_lookup)
- return NULL;
+ return bitmap_for_commit(bitmap_git->base, commit);
/* this is a fairly hot codepath - no trace2_region please */
/* NEEDSWORK: cache misses aren't recorded */
bitmap = lazy_bitmap_for_commit(bitmap_git, commit);
if (!bitmap)
- return NULL;
+ return bitmap_for_commit(bitmap_git->base, commit);
return lookup_stored_bitmap(bitmap);
}
return lookup_stored_bitmap(kh_value(bitmap_git->bitmaps, hash_pos));
--
2.46.0.86.ge766d390f0.dirty
* [PATCH 05/13] pack-bitmap.c: teach `show_objects_for_type()` about incremental MIDXs
2024-08-15 21:01 [PATCH 00/13] midx: incremental multi-pack indexes, part two Taylor Blau
` (3 preceding siblings ...)
2024-08-15 21:01 ` [PATCH 04/13] pack-bitmap.c: teach `bitmap_for_commit()` about incremental MIDXs Taylor Blau
@ 2024-08-15 21:01 ` Taylor Blau
2024-08-15 21:01 ` [PATCH 06/13] pack-bitmap.c: support bitmap pack-reuse with " Taylor Blau
` (11 subsequent siblings)
16 siblings, 0 replies; 136+ messages in thread
From: Taylor Blau @ 2024-08-15 21:01 UTC (permalink / raw)
To: git; +Cc: Jeff King, Elijah Newren, Junio C Hamano
Signed-off-by: Taylor Blau <me@ttaylorr.com>
---
pack-bitmap.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/pack-bitmap.c b/pack-bitmap.c
index 88623d9e06..f91ab1b572 100644
--- a/pack-bitmap.c
+++ b/pack-bitmap.c
@@ -1631,7 +1631,7 @@ static void show_objects_for_type(
nth_midxed_object_oid(&oid, m, index_pos);
pack_id = nth_midxed_pack_int_id(m, index_pos);
- pack = bitmap_git->midx->packs[pack_id];
+ pack = nth_midxed_pack(bitmap_git->midx, pack_id);
} else {
index_pos = pack_pos_to_index(bitmap_git->pack, pos + offset);
ofs = pack_pos_to_offset(bitmap_git->pack, pos + offset);
--
2.46.0.86.ge766d390f0.dirty
* [PATCH 06/13] pack-bitmap.c: support bitmap pack-reuse with incremental MIDXs
2024-08-15 21:01 [PATCH 00/13] midx: incremental multi-pack indexes, part two Taylor Blau
` (4 preceding siblings ...)
2024-08-15 21:01 ` [PATCH 05/13] pack-bitmap.c: teach `show_objects_for_type()` " Taylor Blau
@ 2024-08-15 21:01 ` Taylor Blau
2024-08-15 21:01 ` [PATCH 07/13] pack-bitmap.c: teach `rev-list --test-bitmap` about " Taylor Blau
` (10 subsequent siblings)
16 siblings, 0 replies; 136+ messages in thread
From: Taylor Blau @ 2024-08-15 21:01 UTC (permalink / raw)
To: git; +Cc: Jeff King, Elijah Newren, Junio C Hamano
Signed-off-by: Taylor Blau <me@ttaylorr.com>
---
pack-bitmap.c | 11 ++++++++---
1 file changed, 8 insertions(+), 3 deletions(-)
diff --git a/pack-bitmap.c b/pack-bitmap.c
index f91ab1b572..2b3c53d882 100644
--- a/pack-bitmap.c
+++ b/pack-bitmap.c
@@ -2320,7 +2320,8 @@ void reuse_partial_packfile_from_bitmap(struct bitmap_index *bitmap_git,
multi_pack_reuse = 0;
if (multi_pack_reuse) {
- for (i = 0; i < bitmap_git->midx->num_packs; i++) {
+ struct multi_pack_index *m = bitmap_git->midx;
+ for (i = 0; i < m->num_packs + m->num_packs_in_base; i++) {
struct bitmapped_pack pack;
if (nth_bitmapped_pack(r, bitmap_git->midx, &pack, i) < 0) {
warning(_("unable to load pack: '%s', disabling pack-reuse"),
@@ -2344,14 +2345,18 @@ void reuse_partial_packfile_from_bitmap(struct bitmap_index *bitmap_git,
uint32_t pack_int_id;
if (bitmap_is_midx(bitmap_git)) {
+ struct multi_pack_index *m = bitmap_git->midx;
uint32_t preferred_pack_pos;
- if (midx_preferred_pack(bitmap_git->midx, &preferred_pack_pos) < 0) {
+ while (m->base_midx)
+ m = m->base_midx;
+
+ if (midx_preferred_pack(m, &preferred_pack_pos) < 0) {
warning(_("unable to compute preferred pack, disabling pack-reuse"));
return;
}
- pack = bitmap_git->midx->packs[preferred_pack_pos];
+ pack = nth_midxed_pack(m, preferred_pack_pos);
pack_int_id = preferred_pack_pos;
} else {
pack = bitmap_git->pack;
--
2.46.0.86.ge766d390f0.dirty
* [PATCH 07/13] pack-bitmap.c: teach `rev-list --test-bitmap` about incremental MIDXs
2024-08-15 21:01 [PATCH 00/13] midx: incremental multi-pack indexes, part two Taylor Blau
` (5 preceding siblings ...)
2024-08-15 21:01 ` [PATCH 06/13] pack-bitmap.c: support bitmap pack-reuse with " Taylor Blau
@ 2024-08-15 21:01 ` Taylor Blau
2024-08-15 21:01 ` [PATCH 08/13] pack-bitmap.c: compute disk-usage with " Taylor Blau
` (9 subsequent siblings)
16 siblings, 0 replies; 136+ messages in thread
From: Taylor Blau @ 2024-08-15 21:01 UTC (permalink / raw)
To: git; +Cc: Jeff King, Elijah Newren, Junio C Hamano
Signed-off-by: Taylor Blau <me@ttaylorr.com>
---
pack-bitmap.c | 105 ++++++++++++++++++++++++++++++++++++++++----------
1 file changed, 84 insertions(+), 21 deletions(-)
diff --git a/pack-bitmap.c b/pack-bitmap.c
index 2b3c53d882..5fea2714c1 100644
--- a/pack-bitmap.c
+++ b/pack-bitmap.c
@@ -943,8 +943,9 @@ static struct stored_bitmap *lazy_bitmap_for_commit(struct bitmap_index *bitmap_
return NULL;
}
-struct ewah_bitmap *bitmap_for_commit(struct bitmap_index *bitmap_git,
- struct commit *commit)
+static struct ewah_bitmap *find_bitmap_for_commit(struct bitmap_index *bitmap_git,
+ struct commit *commit,
+ struct bitmap_index **found)
{
khiter_t hash_pos;
if (!bitmap_git)
@@ -954,18 +955,30 @@ struct ewah_bitmap *bitmap_for_commit(struct bitmap_index *bitmap_git,
if (hash_pos >= kh_end(bitmap_git->bitmaps)) {
struct stored_bitmap *bitmap = NULL;
if (!bitmap_git->table_lookup)
- return bitmap_for_commit(bitmap_git->base, commit);
+ return find_bitmap_for_commit(bitmap_git->base, commit,
+ found);
/* this is a fairly hot codepath - no trace2_region please */
/* NEEDSWORK: cache misses aren't recorded */
bitmap = lazy_bitmap_for_commit(bitmap_git, commit);
if (!bitmap)
- return bitmap_for_commit(bitmap_git->base, commit);
+ return find_bitmap_for_commit(bitmap_git->base, commit,
+ found);
+ if (found)
+ *found = bitmap_git;
return lookup_stored_bitmap(bitmap);
}
+ if (found)
+ *found = bitmap_git;
return lookup_stored_bitmap(kh_value(bitmap_git->bitmaps, hash_pos));
}
+struct ewah_bitmap *bitmap_for_commit(struct bitmap_index *bitmap_git,
+ struct commit *commit)
+{
+ return find_bitmap_for_commit(bitmap_git, commit, NULL);
+}
+
static inline int bitmap_position_extended(struct bitmap_index *bitmap_git,
const struct object_id *oid)
{
@@ -2489,6 +2502,8 @@ struct bitmap_test_data {
struct bitmap *tags;
struct progress *prg;
size_t seen;
+
+ struct bitmap_test_data *base_tdata;
};
static void test_bitmap_type(struct bitmap_test_data *tdata,
@@ -2497,6 +2512,11 @@ static void test_bitmap_type(struct bitmap_test_data *tdata,
enum object_type bitmap_type = OBJ_NONE;
int bitmaps_nr = 0;
+ if (bitmap_is_midx(tdata->bitmap_git)) {
+ while (pos < tdata->bitmap_git->midx->num_objects_in_base)
+ tdata = tdata->base_tdata;
+ }
+
if (bitmap_get(tdata->commits, pos)) {
bitmap_type = OBJ_COMMIT;
bitmaps_nr++;
@@ -2560,13 +2580,57 @@ static void test_show_commit(struct commit *commit, void *data)
display_progress(tdata->prg, ++tdata->seen);
}
+static uint32_t bitmap_total_entry_count(struct bitmap_index *bitmap_git)
+{
+ uint32_t total = 0;
+ do {
+ total = st_add(total, bitmap_git->entry_count);
+ bitmap_git = bitmap_git->base;
+ } while (bitmap_git);
+
+ return total;
+}
+
+static void prepare_bitmap_test_data(struct bitmap_test_data *tdata,
+ struct bitmap_index *bitmap_git)
+{
+ memset(tdata, 0, sizeof(struct bitmap_test_data));
+
+ tdata->bitmap_git = bitmap_git;
+ tdata->base = bitmap_new();
+ tdata->commits = ewah_to_bitmap(bitmap_git->commits);
+ tdata->trees = ewah_to_bitmap(bitmap_git->trees);
+ tdata->blobs = ewah_to_bitmap(bitmap_git->blobs);
+ tdata->tags = ewah_to_bitmap(bitmap_git->tags);
+
+ if (bitmap_git->base) {
+ CALLOC_ARRAY(tdata->base_tdata, 1);
+ prepare_bitmap_test_data(tdata->base_tdata, bitmap_git->base);
+ }
+}
+
+static void free_bitmap_test_data(struct bitmap_test_data *tdata)
+{
+ if (!tdata)
+ return;
+
+ free_bitmap_test_data(tdata->base_tdata);
+ free(tdata->base_tdata);
+
+ bitmap_free(tdata->base);
+ bitmap_free(tdata->commits);
+ bitmap_free(tdata->trees);
+ bitmap_free(tdata->blobs);
+ bitmap_free(tdata->tags);
+}
+
void test_bitmap_walk(struct rev_info *revs)
{
struct object *root;
struct bitmap *result = NULL;
size_t result_popcnt;
struct bitmap_test_data tdata;
- struct bitmap_index *bitmap_git;
+ struct bitmap_index *bitmap_git, *found;
struct ewah_bitmap *bm;
if (!(bitmap_git = prepare_bitmap_git(revs->repo)))
@@ -2575,17 +2639,26 @@ void test_bitmap_walk(struct rev_info *revs)
if (revs->pending.nr != 1)
die(_("you must specify exactly one commit to test"));
- fprintf_ln(stderr, "Bitmap v%d test (%d entries%s)",
+ fprintf_ln(stderr, "Bitmap v%d test (%d entries%s, %d total)",
bitmap_git->version,
bitmap_git->entry_count,
- bitmap_git->table_lookup ? "" : " loaded");
+ bitmap_git->table_lookup ? "" : " loaded",
+ bitmap_total_entry_count(bitmap_git));
root = revs->pending.objects[0].item;
- bm = bitmap_for_commit(bitmap_git, (struct commit *)root);
+ bm = find_bitmap_for_commit(bitmap_git, (struct commit *)root, &found);
if (bm) {
fprintf_ln(stderr, "Found bitmap for '%s'. %d bits / %08x checksum",
- oid_to_hex(&root->oid), (int)bm->bit_size, ewah_checksum(bm));
+ oid_to_hex(&root->oid),
+ (int)bm->bit_size, ewah_checksum(bm));
+
+ if (bitmap_is_midx(found))
+ fprintf_ln(stderr, "Located via MIDX '%s'.",
+ hash_to_hex(get_midx_checksum(found->midx)));
+ else
+ fprintf_ln(stderr, "Located via pack '%s'.",
+ hash_to_hex(found->pack->hash));
result = ewah_to_bitmap(bm);
}
@@ -2602,14 +2675,8 @@ void test_bitmap_walk(struct rev_info *revs)
if (prepare_revision_walk(revs))
die(_("revision walk setup failed"));
- tdata.bitmap_git = bitmap_git;
- tdata.base = bitmap_new();
- tdata.commits = ewah_to_bitmap(bitmap_git->commits);
- tdata.trees = ewah_to_bitmap(bitmap_git->trees);
- tdata.blobs = ewah_to_bitmap(bitmap_git->blobs);
- tdata.tags = ewah_to_bitmap(bitmap_git->tags);
+ prepare_bitmap_test_data(&tdata, bitmap_git);
tdata.prg = start_progress("Verifying bitmap entries", result_popcnt);
- tdata.seen = 0;
traverse_commit_list(revs, &test_show_commit, &test_show_object, &tdata);
@@ -2621,11 +2688,7 @@ void test_bitmap_walk(struct rev_info *revs)
die(_("mismatch in bitmap results"));
bitmap_free(result);
- bitmap_free(tdata.base);
- bitmap_free(tdata.commits);
- bitmap_free(tdata.trees);
- bitmap_free(tdata.blobs);
- bitmap_free(tdata.tags);
+ free_bitmap_test_data(&tdata);
free_bitmap_index(bitmap_git);
}
--
2.46.0.86.ge766d390f0.dirty
* [PATCH 08/13] pack-bitmap.c: compute disk-usage with incremental MIDXs
2024-08-15 21:01 [PATCH 00/13] midx: incremental multi-pack indexes, part two Taylor Blau
` (6 preceding siblings ...)
2024-08-15 21:01 ` [PATCH 07/13] pack-bitmap.c: teach `rev-list --test-bitmap` about " Taylor Blau
@ 2024-08-15 21:01 ` Taylor Blau
2024-08-15 21:01 ` [PATCH 09/13] pack-bitmap.c: apply pseudo-merge commits " Taylor Blau
` (8 subsequent siblings)
16 siblings, 0 replies; 136+ messages in thread
From: Taylor Blau @ 2024-08-15 21:01 UTC (permalink / raw)
To: git; +Cc: Jeff King, Elijah Newren, Junio C Hamano
Signed-off-by: Taylor Blau <me@ttaylorr.com>
---
pack-bitmap.c | 4 ++--
1 file changed, 2 insertions(+), 2 deletions(-)
diff --git a/pack-bitmap.c b/pack-bitmap.c
index 5fea2714c1..d3bb78237e 100644
--- a/pack-bitmap.c
+++ b/pack-bitmap.c
@@ -1774,7 +1774,7 @@ static unsigned long get_size_by_pos(struct bitmap_index *bitmap_git,
uint32_t midx_pos = pack_pos_to_midx(bitmap_git->midx, pos);
uint32_t pack_id = nth_midxed_pack_int_id(bitmap_git->midx, midx_pos);
- pack = bitmap_git->midx->packs[pack_id];
+ pack = nth_midxed_pack(bitmap_git->midx, pack_id);
ofs = nth_midxed_offset(bitmap_git->midx, midx_pos);
} else {
pack = bitmap_git->pack;
@@ -3020,7 +3020,7 @@ static off_t get_disk_usage_for_type(struct bitmap_index *bitmap_git,
off_t offset = nth_midxed_offset(bitmap_git->midx, midx_pos);
uint32_t pack_id = nth_midxed_pack_int_id(bitmap_git->midx, midx_pos);
- struct packed_git *pack = bitmap_git->midx->packs[pack_id];
+ struct packed_git *pack = nth_midxed_pack(bitmap_git->midx, pack_id);
if (offset_to_pack_pos(pack, offset, &pack_pos) < 0) {
struct object_id oid;
--
2.46.0.86.ge766d390f0.dirty
* [PATCH 09/13] pack-bitmap.c: apply pseudo-merge commits with incremental MIDXs
2024-08-15 21:01 [PATCH 00/13] midx: incremental multi-pack indexes, part two Taylor Blau
` (7 preceding siblings ...)
2024-08-15 21:01 ` [PATCH 08/13] pack-bitmap.c: compute disk-usage with " Taylor Blau
@ 2024-08-15 21:01 ` Taylor Blau
2024-08-15 21:01 ` [PATCH 10/13] ewah: implement `struct ewah_or_iterator` Taylor Blau
` (7 subsequent siblings)
16 siblings, 0 replies; 136+ messages in thread
From: Taylor Blau @ 2024-08-15 21:01 UTC (permalink / raw)
To: git; +Cc: Jeff King, Elijah Newren, Junio C Hamano
Prepare for using pseudo-merges with incremental MIDX bitmaps by
attempting to apply pseudo-merges from each layer when encountering a
given commit during a walk.
Signed-off-by: Taylor Blau <me@ttaylorr.com>
---
pack-bitmap.c | 11 ++++++++---
1 file changed, 8 insertions(+), 3 deletions(-)
diff --git a/pack-bitmap.c b/pack-bitmap.c
index d3bb78237e..1fa101bb33 100644
--- a/pack-bitmap.c
+++ b/pack-bitmap.c
@@ -1087,10 +1087,15 @@ static unsigned apply_pseudo_merges_for_commit_1(struct bitmap_index *bitmap_git
struct commit *commit,
uint32_t commit_pos)
{
- int ret;
+ struct bitmap_index *curr = bitmap_git;
+ int ret = 0;
- ret = apply_pseudo_merges_for_commit(&bitmap_git->pseudo_merges,
- result, commit, commit_pos);
+ while (curr) {
+ ret += apply_pseudo_merges_for_commit(&curr->pseudo_merges,
+ result, commit,
+ commit_pos);
+ curr = curr->base;
+ }
if (ret)
pseudo_merges_satisfied_nr += ret;
--
2.46.0.86.ge766d390f0.dirty
* [PATCH 10/13] ewah: implement `struct ewah_or_iterator`
2024-08-15 21:01 [PATCH 00/13] midx: incremental multi-pack indexes, part two Taylor Blau
` (8 preceding siblings ...)
2024-08-15 21:01 ` [PATCH 09/13] pack-bitmap.c: apply pseudo-merge commits " Taylor Blau
@ 2024-08-15 21:01 ` Taylor Blau
2024-08-15 21:01 ` [PATCH 11/13] pack-bitmap.c: keep track of each layer's type bitmaps Taylor Blau
` (6 subsequent siblings)
16 siblings, 0 replies; 136+ messages in thread
From: Taylor Blau @ 2024-08-15 21:01 UTC (permalink / raw)
To: git; +Cc: Jeff King, Elijah Newren, Junio C Hamano
While individual bitmap layers store different commit, type-level, and
pseudo-merge bitmaps, only the top-most layer is used to compute
reachability traversals.
Many functions which implement the aforementioned traversal rely on
enumerating the results according to the type-level bitmaps, and so
would benefit from a conceptual type-level bitmap that spans multiple
layers.
Implement `struct ewah_or_iterator`, which is capable of enumerating
multiple EWAH bitmaps at once and OR-ing the results together. When
initialized with, for example, all of the commit type bitmaps from each
layer, callers can act as if they are enumerating a single large
type-level bitmap which contains the commits from *all* bitmap layers.
Two alternative approaches were considered:
- Decompress each EWAH bitmap and OR them together, enumerating a
single (non-EWAH) bitmap. This would work, but has the disadvantage
of decompressing a potentially large bitmap, which may not be
necessary if the caller does not wish to read all of it.
- Recursively call bitmap internal functions, reusing the "result" and
"haves" bitmaps from the top-most layer. This approach resembles the
original implementation of this feature, but has two drawbacks: it
(a) requires significant refactoring to implement, and (b)
enumerates large sections of later bitmaps which are all zeros (as
they pertain to objects in earlier layers).
(b) is not so bad in and of itself, but can cause significant
slow-downs when combined with expensive loop bodies.
The approach taken here (enumerating an OR'd-together view of the
type-level bitmaps from every layer) yields a much more straightforward
implementation and requires far less refactoring to make it work.
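The OR-ing idea can be sketched outside of the EWAH code. The following
is only an illustration, not the code in this patch: `struct word_iter`
and `struct or_iter` are hypothetical names, and each "bitmap" is a
plain uncompressed word array rather than a real EWAH bitmap. As in the
patch below, the combined iterator keeps producing words for as long as
at least one underlying iterator still has data, and an exhausted
iterator simply contributes no bits.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for an EWAH iterator: walks a plain word array. */
struct word_iter {
	const uint64_t *words;
	size_t nr, pos;
};

/* Yield the next word; return 0 once the array is exhausted. */
int word_iter_next(uint64_t *next, struct word_iter *it)
{
	if (it->pos >= it->nr)
		return 0;
	*next = it->words[it->pos++];
	return 1;
}

/* Hypothetical OR-iterator over several word iterators. */
struct or_iter {
	struct word_iter *its;
	size_t nr;
};

/*
 * Advance every underlying iterator by one word and OR the words
 * together. Return 1 if at least one iterator produced a word.
 */
int or_iter_next(uint64_t *next, struct or_iter *it)
{
	uint64_t buf, out = 0;
	size_t i;
	int ret = 0;

	assert(next && it);

	for (i = 0; i < it->nr; i++) {
		if (word_iter_next(&buf, &it->its[i])) {
			out |= buf;
			ret = 1;
		}
	}

	*next = out;
	return ret;
}
```

Because nothing is decompressed up front, a caller that stops early
only pays for the words it actually consumed.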
Signed-off-by: Taylor Blau <me@ttaylorr.com>
---
ewah/ewah_bitmap.c | 33 +++++++++++++++++++++++++++++++++
ewah/ewok.h | 12 ++++++++++++
2 files changed, 45 insertions(+)
diff --git a/ewah/ewah_bitmap.c b/ewah/ewah_bitmap.c
index 8785cbc54a..b3a7ada071 100644
--- a/ewah/ewah_bitmap.c
+++ b/ewah/ewah_bitmap.c
@@ -372,6 +372,39 @@ void ewah_iterator_init(struct ewah_iterator *it, struct ewah_bitmap *parent)
read_new_rlw(it);
}
+void ewah_or_iterator_init(struct ewah_or_iterator *it,
+ struct ewah_bitmap **parents, size_t nr)
+{
+ size_t i;
+
+ memset(it, 0, sizeof(*it));
+
+ ALLOC_ARRAY(it->its, nr);
+ for (i = 0; i < nr; i++)
+ ewah_iterator_init(&it->its[it->nr++], parents[i]);
+}
+
+int ewah_or_iterator_next(eword_t *next, struct ewah_or_iterator *it)
+{
+ eword_t buf, out = 0;
+ size_t i;
+ int ret = 0;
+
+ for (i = 0; i < it->nr; i++)
+ if (ewah_iterator_next(&buf, &it->its[i])) {
+ out |= buf;
+ ret = 1;
+ }
+
+ *next = out;
+ return ret;
+}
+
+void ewah_or_iterator_free(struct ewah_or_iterator *it)
+{
+ free(it->its);
+}
+
void ewah_xor(
struct ewah_bitmap *ewah_i,
struct ewah_bitmap *ewah_j,
diff --git a/ewah/ewok.h b/ewah/ewok.h
index 5e357e2493..4b70641045 100644
--- a/ewah/ewok.h
+++ b/ewah/ewok.h
@@ -148,6 +148,18 @@ void ewah_iterator_init(struct ewah_iterator *it, struct ewah_bitmap *parent);
*/
int ewah_iterator_next(eword_t *next, struct ewah_iterator *it);
+struct ewah_or_iterator {
+ struct ewah_iterator *its;
+ size_t nr;
+};
+
+void ewah_or_iterator_init(struct ewah_or_iterator *it,
+ struct ewah_bitmap **parents, size_t nr);
+
+int ewah_or_iterator_next(eword_t *next, struct ewah_or_iterator *it);
+
+void ewah_or_iterator_free(struct ewah_or_iterator *it);
+
void ewah_xor(
struct ewah_bitmap *ewah_i,
struct ewah_bitmap *ewah_j,
--
2.46.0.86.ge766d390f0.dirty
* [PATCH 11/13] pack-bitmap.c: keep track of each layer's type bitmaps
2024-08-15 21:01 [PATCH 00/13] midx: incremental multi-pack indexes, part two Taylor Blau
` (9 preceding siblings ...)
2024-08-15 21:01 ` [PATCH 10/13] ewah: implement `struct ewah_or_iterator` Taylor Blau
@ 2024-08-15 21:01 ` Taylor Blau
2024-08-15 21:01 ` [PATCH 12/13] pack-bitmap.c: use `ewah_or_iterator` for type bitmap iterators Taylor Blau
` (5 subsequent siblings)
16 siblings, 0 replies; 136+ messages in thread
From: Taylor Blau @ 2024-08-15 21:01 UTC (permalink / raw)
To: git; +Cc: Jeff King, Elijah Newren, Junio C Hamano
Prepare for reading the type-level bitmaps from previous bitmap layers
by maintaining an array for each type, where each element in that type's
array corresponds to one layer's bitmap for that type.
These fields will be used in a later commit to instantiate the 'struct
ewah_or_iterator' for each type.
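As a rough illustration of the array layout (oldest layer first, the
current layer last), consider the sketch below. It is not the code in
this patch: `struct layer`, `layer_count()` and `collect_commits()` are
hypothetical, and a string stands in for each layer's EWAH bitmap.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical, simplified bitmap layer. */
struct layer {
	const char *commits;	/* stand-in for this layer's commit bitmap */
	struct layer *base;	/* next-older layer, or NULL */
};

/* Count the layers reachable from 'tip', including 'tip' itself. */
size_t layer_count(const struct layer *tip)
{
	size_t nr = 0;
	for (; tip; tip = tip->base)
		nr++;
	return nr;
}

/*
 * Build an array in which all[nr-1] is the tip's 'commits' field,
 * all[nr-2] is its base's, and so on. The caller frees the result.
 */
const char **collect_commits(const struct layer *tip, size_t *nr_out)
{
	size_t nr, i;
	const char **all;

	assert(tip && nr_out);

	nr = layer_count(tip);
	all = malloc(nr * sizeof(*all));
	if (!all)
		return NULL;

	/* Walk from the tip towards the base, filling from the end. */
	for (i = nr; tip; tip = tip->base)
		all[--i] = tip->commits;

	*nr_out = nr;
	return all;
}
```

Only the array is allocated here; the elements are borrowed from the
individual layers, which matches how the patch frees the arrays without
freeing the bitmaps they point to.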
Signed-off-by: Taylor Blau <me@ttaylorr.com>
---
pack-bitmap.c | 55 +++++++++++++++++++++++++++++++++++++++++++++++----
1 file changed, 51 insertions(+), 4 deletions(-)
diff --git a/pack-bitmap.c b/pack-bitmap.c
index 1fa101bb33..e1badc7887 100644
--- a/pack-bitmap.c
+++ b/pack-bitmap.c
@@ -78,6 +78,24 @@ struct bitmap_index {
struct ewah_bitmap *blobs;
struct ewah_bitmap *tags;
+ /*
+ * Type index arrays when this bitmap is associated with an
+ * incremental multi-pack index chain.
+ *
+ * If n is the number of unique layers in the MIDX chain, then
+ * commits_all[n-1] is this struct's 'commits' field,
+ * commits_all[n-2] is the commits field of this bitmap's
+ * 'base', and so on.
+ *
+ * When associated either with a non-incremental MIDX, or
+ * a single packfile, these arrays each contain a single
+ * element.
+ */
+ struct ewah_bitmap **commits_all;
+ struct ewah_bitmap **trees_all;
+ struct ewah_bitmap **blobs_all;
+ struct ewah_bitmap **tags_all;
+
/* Map from object ID -> `stored_bitmap` for all the bitmapped commits */
kh_oid_map_t *bitmaps;
@@ -586,7 +604,29 @@ static int load_reverse_index(struct repository *r, struct bitmap_index *bitmap_
return load_pack_revindex(r, bitmap_git->pack);
}
-static int load_bitmap(struct repository *r, struct bitmap_index *bitmap_git)
+static void load_all_type_bitmaps(struct bitmap_index *bitmap_git)
+{
+ struct bitmap_index *curr = bitmap_git;
+ size_t i = bitmap_git->base_nr - 1;
+
+ ALLOC_ARRAY(bitmap_git->commits_all, bitmap_git->base_nr);
+ ALLOC_ARRAY(bitmap_git->trees_all, bitmap_git->base_nr);
+ ALLOC_ARRAY(bitmap_git->blobs_all, bitmap_git->base_nr);
+ ALLOC_ARRAY(bitmap_git->tags_all, bitmap_git->base_nr);
+
+ while (curr) {
+ bitmap_git->commits_all[i] = curr->commits;
+ bitmap_git->trees_all[i] = curr->trees;
+ bitmap_git->blobs_all[i] = curr->blobs;
+ bitmap_git->tags_all[i] = curr->tags;
+
+ curr = curr->base;
+ i -= 1;
+ }
+}
+
+static int load_bitmap(struct repository *r, struct bitmap_index *bitmap_git,
+ int recursing)
{
assert(bitmap_git->map);
@@ -608,10 +648,13 @@ static int load_bitmap(struct repository *r, struct bitmap_index *bitmap_git)
if (bitmap_git->base) {
if (!bitmap_is_midx(bitmap_git))
BUG("non-MIDX bitmap has non-NULL base bitmap index");
- if (load_bitmap(r, bitmap_git->base) < 0)
+ if (load_bitmap(r, bitmap_git->base, 1) < 0)
goto failed;
}
+ if (!recursing)
+ load_all_type_bitmaps(bitmap_git);
+
return 0;
failed:
@@ -687,7 +730,7 @@ struct bitmap_index *prepare_bitmap_git(struct repository *r)
{
struct bitmap_index *bitmap_git = xcalloc(1, sizeof(*bitmap_git));
- if (!open_bitmap(r, bitmap_git) && !load_bitmap(r, bitmap_git))
+ if (!open_bitmap(r, bitmap_git) && !load_bitmap(r, bitmap_git, 0))
return bitmap_git;
free_bitmap_index(bitmap_git);
@@ -2042,7 +2085,7 @@ struct bitmap_index *prepare_bitmap_walk(struct rev_info *revs,
* from disk. this is the point of no return; after this the rev_list
* becomes invalidated and we must perform the revwalk through bitmaps
*/
- if (load_bitmap(revs->repo, bitmap_git) < 0)
+ if (load_bitmap(revs->repo, bitmap_git, 0) < 0)
goto cleanup;
if (!use_boundary_traversal)
@@ -2957,6 +3000,10 @@ void free_bitmap_index(struct bitmap_index *b)
ewah_pool_free(b->trees);
ewah_pool_free(b->blobs);
ewah_pool_free(b->tags);
+ free(b->commits_all);
+ free(b->trees_all);
+ free(b->blobs_all);
+ free(b->tags_all);
if (b->bitmaps) {
struct stored_bitmap *sb;
kh_foreach_value(b->bitmaps, sb, {
--
2.46.0.86.ge766d390f0.dirty
* [PATCH 12/13] pack-bitmap.c: use `ewah_or_iterator` for type bitmap iterators
2024-08-15 21:01 [PATCH 00/13] midx: incremental multi-pack indexes, part two Taylor Blau
` (10 preceding siblings ...)
2024-08-15 21:01 ` [PATCH 11/13] pack-bitmap.c: keep track of each layer's type bitmaps Taylor Blau
@ 2024-08-15 21:01 ` Taylor Blau
2024-08-15 21:01 ` [PATCH 13/13] midx: implement writing incremental MIDX bitmaps Taylor Blau
` (4 subsequent siblings)
16 siblings, 0 replies; 136+ messages in thread
From: Taylor Blau @ 2024-08-15 21:01 UTC (permalink / raw)
To: git; +Cc: Jeff King, Elijah Newren, Junio C Hamano
Now that we have initialized arrays for each bitmap layer's type bitmaps
in the previous commit, adjust existing callers to use them in
preparation for multi-layered bitmaps.
Signed-off-by: Taylor Blau <me@ttaylorr.com>
---
pack-bitmap.c | 42 +++++++++++++++++++++++++++---------------
1 file changed, 27 insertions(+), 15 deletions(-)
diff --git a/pack-bitmap.c b/pack-bitmap.c
index e1badc7887..9fac43749c 100644
--- a/pack-bitmap.c
+++ b/pack-bitmap.c
@@ -1622,25 +1622,29 @@ static void show_extended_objects(struct bitmap_index *bitmap_git,
}
}
-static void init_type_iterator(struct ewah_iterator *it,
+static void init_type_iterator(struct ewah_or_iterator *it,
struct bitmap_index *bitmap_git,
enum object_type type)
{
switch (type) {
case OBJ_COMMIT:
- ewah_iterator_init(it, bitmap_git->commits);
+ ewah_or_iterator_init(it, bitmap_git->commits_all,
+ bitmap_git->base_nr);
break;
case OBJ_TREE:
- ewah_iterator_init(it, bitmap_git->trees);
+ ewah_or_iterator_init(it, bitmap_git->trees_all,
+ bitmap_git->base_nr);
break;
case OBJ_BLOB:
- ewah_iterator_init(it, bitmap_git->blobs);
+ ewah_or_iterator_init(it, bitmap_git->blobs_all,
+ bitmap_git->base_nr);
break;
case OBJ_TAG:
- ewah_iterator_init(it, bitmap_git->tags);
+ ewah_or_iterator_init(it, bitmap_git->tags_all,
+ bitmap_git->base_nr);
break;
default:
@@ -1657,7 +1661,7 @@ static void show_objects_for_type(
size_t i = 0;
uint32_t offset;
- struct ewah_iterator it;
+ struct ewah_or_iterator it;
eword_t filter;
struct bitmap *objects = bitmap_git->result;
@@ -1665,7 +1669,7 @@ static void show_objects_for_type(
init_type_iterator(&it, bitmap_git, object_type);
for (i = 0; i < objects->word_alloc &&
- ewah_iterator_next(&filter, &it); i++) {
+ ewah_or_iterator_next(&filter, &it); i++) {
eword_t word = objects->words[i] & filter;
size_t pos = (i * BITS_IN_EWORD);
@@ -1707,6 +1711,8 @@ static void show_objects_for_type(
show_reach(&oid, object_type, 0, hash, pack, ofs);
}
}
+
+ ewah_or_iterator_free(&it);
}
static int in_bitmapped_pack(struct bitmap_index *bitmap_git,
@@ -1758,7 +1764,7 @@ static void filter_bitmap_exclude_type(struct bitmap_index *bitmap_git,
{
struct eindex *eindex = &bitmap_git->ext_index;
struct bitmap *tips;
- struct ewah_iterator it;
+ struct ewah_or_iterator it;
eword_t mask;
uint32_t i;
@@ -1775,7 +1781,7 @@ static void filter_bitmap_exclude_type(struct bitmap_index *bitmap_git,
* packfile.
*/
for (i = 0, init_type_iterator(&it, bitmap_git, type);
- i < to_filter->word_alloc && ewah_iterator_next(&mask, &it);
+ i < to_filter->word_alloc && ewah_or_iterator_next(&mask, &it);
i++) {
if (i < tips->word_alloc)
mask &= ~tips->words[i];
@@ -1795,6 +1801,7 @@ static void filter_bitmap_exclude_type(struct bitmap_index *bitmap_git,
bitmap_unset(to_filter, pos);
}
+ ewah_or_iterator_free(&it);
bitmap_free(tips);
}
@@ -1852,14 +1859,14 @@ static void filter_bitmap_blob_limit(struct bitmap_index *bitmap_git,
{
struct eindex *eindex = &bitmap_git->ext_index;
struct bitmap *tips;
- struct ewah_iterator it;
+ struct ewah_or_iterator it;
eword_t mask;
uint32_t i;
tips = find_tip_objects(bitmap_git, tip_objects, OBJ_BLOB);
for (i = 0, init_type_iterator(&it, bitmap_git, OBJ_BLOB);
- i < to_filter->word_alloc && ewah_iterator_next(&mask, &it);
+ i < to_filter->word_alloc && ewah_or_iterator_next(&mask, &it);
i++) {
eword_t word = to_filter->words[i] & mask;
unsigned offset;
@@ -1887,6 +1894,7 @@ static void filter_bitmap_blob_limit(struct bitmap_index *bitmap_git,
bitmap_unset(to_filter, pos);
}
+ ewah_or_iterator_free(&it);
bitmap_free(tips);
}
@@ -2502,12 +2510,12 @@ static uint32_t count_object_type(struct bitmap_index *bitmap_git,
struct eindex *eindex = &bitmap_git->ext_index;
uint32_t i = 0, count = 0;
- struct ewah_iterator it;
+ struct ewah_or_iterator it;
eword_t filter;
init_type_iterator(&it, bitmap_git, type);
- while (i < objects->word_alloc && ewah_iterator_next(&filter, &it)) {
+ while (i < objects->word_alloc && ewah_or_iterator_next(&filter, &it)) {
eword_t word = objects->words[i++] & filter;
count += ewah_bit_popcount64(word);
}
@@ -2519,6 +2527,8 @@ static uint32_t count_object_type(struct bitmap_index *bitmap_git,
count++;
}
+ ewah_or_iterator_free(&it);
+
return count;
}
@@ -3046,13 +3056,13 @@ static off_t get_disk_usage_for_type(struct bitmap_index *bitmap_git,
{
struct bitmap *result = bitmap_git->result;
off_t total = 0;
- struct ewah_iterator it;
+ struct ewah_or_iterator it;
eword_t filter;
size_t i;
init_type_iterator(&it, bitmap_git, object_type);
for (i = 0; i < result->word_alloc &&
- ewah_iterator_next(&filter, &it); i++) {
+ ewah_or_iterator_next(&filter, &it); i++) {
eword_t word = result->words[i] & filter;
size_t base = (i * BITS_IN_EWORD);
unsigned offset;
@@ -3093,6 +3103,8 @@ static off_t get_disk_usage_for_type(struct bitmap_index *bitmap_git,
}
}
+ ewah_or_iterator_free(&it);
+
return total;
}
--
2.46.0.86.ge766d390f0.dirty
* [PATCH 13/13] midx: implement writing incremental MIDX bitmaps
2024-08-15 21:01 [PATCH 00/13] midx: incremental multi-pack indexes, part two Taylor Blau
` (11 preceding siblings ...)
2024-08-15 21:01 ` [PATCH 12/13] pack-bitmap.c: use `ewah_or_iterator` for type bitmap iterators Taylor Blau
@ 2024-08-15 21:01 ` Taylor Blau
2024-08-15 22:28 ` [PATCH v2 00/13] midx: incremental multi-pack indexes, part two Taylor Blau
` (3 subsequent siblings)
16 siblings, 0 replies; 136+ messages in thread
From: Taylor Blau @ 2024-08-15 21:01 UTC (permalink / raw)
To: git; +Cc: Jeff King, Elijah Newren, Junio C Hamano
Now that the pack-bitmap machinery has learned how to read and interact
with an incremental MIDX bitmap, teach the pack-bitmap-write.c machinery
(and relevant callers from within the MIDX machinery) to write such
bitmaps.
The details for doing so are mostly straightforward. The main changes
are as follows:
- find_object_pos() now makes use of an extra MIDX parameter which is
used to locate the bit positions of objects which are from previous
layers (and thus do not exist in the current layer's pack_order
field).
(Note also that the pack_order field is moved into struct
write_midx_context to further simplify the callers for
write_midx_bitmap()).
- bitmap_writer_build_type_index() first determines how many objects
precede the current bitmap layer and offsets the bits it sets in
each respective type-level bitmap by that amount so they can be OR'd
together.
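The offset arithmetic can be sketched on its own. This is only an
illustration under simplifying assumptions: `struct base_layer` and the
two helpers are hypothetical, and only the field names `num_objects`
and `num_objects_in_base` are taken from the diff below. The overflow
assertion is an addition for the sketch, not something the patch does.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical summary of the MIDX layer that a new layer is built on. */
struct base_layer {
	uint32_t num_objects;		/* objects in that layer itself */
	uint32_t num_objects_in_base;	/* objects in all layers below it */
};

/* Number of objects that precede the new layer (0 with no base). */
uint32_t objects_before_new_layer(const struct base_layer *base)
{
	if (!base)
		return 0;
	return base->num_objects + base->num_objects_in_base;
}

/* Map a position within the new layer to a bit position in the chain. */
uint32_t chain_bit_position(const struct base_layer *base,
			    uint32_t pos_in_layer)
{
	uint32_t offset = objects_before_new_layer(base);

	/* Guard against wrap-around; assumed for this sketch only. */
	assert(pos_in_layer <= UINT32_MAX - offset);
	return offset + pos_in_layer;
}
```

With every layer's bits shifted this way, no two layers set the same
bit position, which is what allows their type-level bitmaps to be OR'd
together without conflict.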
Signed-off-by: Taylor Blau <me@ttaylorr.com>
---
builtin/pack-objects.c | 3 +-
midx-write.c | 35 +++++++----
pack-bitmap-write.c | 65 ++++++++++++++-----
pack-bitmap.h | 4 +-
t/t5334-incremental-multi-pack-index.sh | 84 +++++++++++++++++++++++++
5 files changed, 161 insertions(+), 30 deletions(-)
diff --git a/builtin/pack-objects.c b/builtin/pack-objects.c
index e23c4950ed..9e61ff7ca3 100644
--- a/builtin/pack-objects.c
+++ b/builtin/pack-objects.c
@@ -1342,7 +1342,8 @@ static void write_pack_file(void)
if (write_bitmap_index) {
bitmap_writer_init(&bitmap_writer,
- the_repository, &to_pack);
+ the_repository, &to_pack,
+ NULL);
bitmap_writer_set_checksum(&bitmap_writer, hash);
bitmap_writer_build_type_index(&bitmap_writer,
written_list);
diff --git a/midx-write.c b/midx-write.c
index 81501efdda..bac3b0589a 100644
--- a/midx-write.c
+++ b/midx-write.c
@@ -826,20 +826,26 @@ static struct commit **find_commits_for_midx_bitmap(uint32_t *indexed_commits_nr
return cb.commits;
}
-static int write_midx_bitmap(const char *midx_name,
+static int write_midx_bitmap(struct write_midx_context *ctx,
+ const char *object_dir, const char *midx_name,
const unsigned char *midx_hash,
struct packing_data *pdata,
struct commit **commits,
uint32_t commits_nr,
- uint32_t *pack_order,
unsigned flags)
{
int ret, i;
uint16_t options = 0;
struct bitmap_writer writer;
struct pack_idx_entry **index;
- char *bitmap_name = xstrfmt("%s-%s.bitmap", midx_name,
- hash_to_hex(midx_hash));
+ struct strbuf bitmap_name = STRBUF_INIT;
+
+ if (ctx->incremental)
+ get_split_midx_filename_ext(&bitmap_name, object_dir, midx_hash,
+ MIDX_EXT_BITMAP);
+ else
+ get_midx_filename_ext(&bitmap_name, object_dir, midx_hash,
+ MIDX_EXT_BITMAP);
trace2_region_enter("midx", "write_midx_bitmap", the_repository);
@@ -858,7 +864,8 @@ static int write_midx_bitmap(const char *midx_name,
for (i = 0; i < pdata->nr_objects; i++)
index[i] = &pdata->objects[i].idx;
- bitmap_writer_init(&writer, the_repository, pdata);
+ bitmap_writer_init(&writer, the_repository, pdata,
+ ctx->incremental ? ctx->base_midx : NULL);
bitmap_writer_show_progress(&writer, flags & MIDX_PROGRESS);
bitmap_writer_build_type_index(&writer, index);
@@ -876,7 +883,7 @@ static int write_midx_bitmap(const char *midx_name,
* bitmap_writer_finish().
*/
for (i = 0; i < pdata->nr_objects; i++)
- index[pack_order[i]] = &pdata->objects[i].idx;
+ index[ctx->pack_order[i]] = &pdata->objects[i].idx;
bitmap_writer_select_commits(&writer, commits, commits_nr);
ret = bitmap_writer_build(&writer);
@@ -884,11 +891,11 @@ static int write_midx_bitmap(const char *midx_name,
goto cleanup;
bitmap_writer_set_checksum(&writer, midx_hash);
- bitmap_writer_finish(&writer, index, bitmap_name, options);
+ bitmap_writer_finish(&writer, index, bitmap_name.buf, options);
cleanup:
free(index);
- free(bitmap_name);
+ strbuf_release(&bitmap_name);
bitmap_writer_free(&writer);
trace2_region_leave("midx", "write_midx_bitmap", the_repository);
@@ -1072,8 +1079,6 @@ static int write_midx_internal(const char *object_dir,
trace2_region_enter("midx", "write_midx_internal", the_repository);
ctx.incremental = !!(flags & MIDX_WRITE_INCREMENTAL);
- if (ctx.incremental && (flags & MIDX_WRITE_BITMAP))
- die(_("cannot write incremental MIDX with bitmap"));
if (ctx.incremental)
strbuf_addf(&midx_name,
@@ -1115,6 +1120,12 @@ static int write_midx_internal(const char *object_dir,
if (ctx.incremental) {
struct multi_pack_index *m = ctx.base_midx;
while (m) {
+ if (flags & MIDX_WRITE_BITMAP && load_midx_revindex(m)) {
+ error(_("could not load reverse index for MIDX %s"),
+ hash_to_hex(get_midx_checksum(m)));
+ result = 1;
+ goto cleanup;
+ }
ctx.num_multi_pack_indexes_before++;
m = m->base_midx;
}
@@ -1404,8 +1415,8 @@ static int write_midx_internal(const char *object_dir,
FREE_AND_NULL(ctx.entries);
ctx.entries_nr = 0;
- if (write_midx_bitmap(midx_name.buf, midx_hash, &pdata,
- commits, commits_nr, ctx.pack_order,
+ if (write_midx_bitmap(&ctx, object_dir, midx_name.buf,
+ midx_hash, &pdata, commits, commits_nr,
flags) < 0) {
error(_("could not write multi-pack bitmap"));
result = 1;
diff --git a/pack-bitmap-write.c b/pack-bitmap-write.c
index 923f793cec..8fc979cbc9 100644
--- a/pack-bitmap-write.c
+++ b/pack-bitmap-write.c
@@ -25,6 +25,8 @@
#include "alloc.h"
#include "refs.h"
#include "strmap.h"
+#include "midx.h"
+#include "pack-revindex.h"
struct bitmapped_commit {
struct commit *commit;
@@ -42,7 +44,8 @@ static inline int bitmap_writer_nr_selected_commits(struct bitmap_writer *writer
}
void bitmap_writer_init(struct bitmap_writer *writer, struct repository *r,
- struct packing_data *pdata)
+ struct packing_data *pdata,
+ struct multi_pack_index *midx)
{
memset(writer, 0, sizeof(struct bitmap_writer));
if (writer->bitmaps)
@@ -50,6 +53,7 @@ void bitmap_writer_init(struct bitmap_writer *writer, struct repository *r,
writer->bitmaps = kh_init_oid_map();
writer->pseudo_merge_commits = kh_init_oid_map();
writer->to_pack = pdata;
+ writer->midx = midx;
string_list_init_dup(&writer->pseudo_merge_groups);
@@ -104,6 +108,11 @@ void bitmap_writer_build_type_index(struct bitmap_writer *writer,
struct pack_idx_entry **index)
{
uint32_t i;
+ uint32_t base_objects = 0;
+
+ if (writer->midx)
+ base_objects = writer->midx->num_objects +
+ writer->midx->num_objects_in_base;
writer->commits = ewah_new();
writer->trees = ewah_new();
@@ -133,19 +142,19 @@ void bitmap_writer_build_type_index(struct bitmap_writer *writer,
switch (real_type) {
case OBJ_COMMIT:
- ewah_set(writer->commits, i);
+ ewah_set(writer->commits, i + base_objects);
break;
case OBJ_TREE:
- ewah_set(writer->trees, i);
+ ewah_set(writer->trees, i + base_objects);
break;
case OBJ_BLOB:
- ewah_set(writer->blobs, i);
+ ewah_set(writer->blobs, i + base_objects);
break;
case OBJ_TAG:
- ewah_set(writer->tags, i);
+ ewah_set(writer->tags, i + base_objects);
break;
default:
@@ -198,19 +207,37 @@ void bitmap_writer_push_commit(struct bitmap_writer *writer,
static uint32_t find_object_pos(struct bitmap_writer *writer,
const struct object_id *oid, int *found)
{
- struct object_entry *entry = packlist_find(writer->to_pack, oid);
+ struct object_entry *entry;
+
+ entry = packlist_find(writer->to_pack, oid);
+ if (entry) {
+ uint32_t base_objects = 0;
+ if (writer->midx)
+ base_objects = writer->midx->num_objects +
+ writer->midx->num_objects_in_base;
+
+ if (found)
+ *found = 1;
+ return oe_in_pack_pos(writer->to_pack, entry) + base_objects;
+ } else if (writer->midx) {
+ uint32_t at, pos;
+
+ if (!bsearch_midx(oid, writer->midx, &at))
+ goto missing;
+ if (midx_to_pack_pos(writer->midx, at, &pos) < 0)
+ goto missing;
- if (!entry) {
if (found)
- *found = 0;
- warning("Failed to write bitmap index. Packfile doesn't have full closure "
- "(object %s is missing)", oid_to_hex(oid));
- return 0;
+ *found = 1;
+ return pos;
}
+missing:
if (found)
- *found = 1;
- return oe_in_pack_pos(writer->to_pack, entry);
+ *found = 0;
+ warning("Failed to write bitmap index. Packfile doesn't have full closure "
+ "(object %s is missing)", oid_to_hex(oid));
+ return 0;
}
static void compute_xor_offsets(struct bitmap_writer *writer)
@@ -577,7 +604,7 @@ int bitmap_writer_build(struct bitmap_writer *writer)
struct prio_queue queue = { compare_commits_by_gen_then_commit_date };
struct prio_queue tree_queue = { NULL };
struct bitmap_index *old_bitmap;
- uint32_t *mapping;
+ uint32_t *mapping = NULL;
int closed = 1; /* until proven otherwise */
if (writer->show_progress)
@@ -1010,7 +1037,7 @@ void bitmap_writer_finish(struct bitmap_writer *writer,
struct strbuf tmp_file = STRBUF_INIT;
struct hashfile *f;
off_t *offsets = NULL;
- uint32_t i;
+ uint32_t i, base_objects;
struct bitmap_disk_header header;
@@ -1036,6 +1063,12 @@ void bitmap_writer_finish(struct bitmap_writer *writer,
if (options & BITMAP_OPT_LOOKUP_TABLE)
CALLOC_ARRAY(offsets, writer->to_pack->nr_objects);
+ if (writer->midx)
+ base_objects = writer->midx->num_objects +
+ writer->midx->num_objects_in_base;
+ else
+ base_objects = 0;
+
for (i = 0; i < bitmap_writer_nr_selected_commits(writer); i++) {
struct bitmapped_commit *stored = &writer->selected[i];
int commit_pos = oid_pos(&stored->commit->object.oid, index,
@@ -1044,7 +1077,7 @@ void bitmap_writer_finish(struct bitmap_writer *writer,
if (commit_pos < 0)
BUG(_("trying to write commit not in index"));
- stored->commit_pos = commit_pos;
+ stored->commit_pos = commit_pos + base_objects;
}
write_selected_commits_v1(writer, f, offsets);
diff --git a/pack-bitmap.h b/pack-bitmap.h
index ff0fd815b8..4242458198 100644
--- a/pack-bitmap.h
+++ b/pack-bitmap.h
@@ -110,6 +110,7 @@ struct bitmap_writer {
kh_oid_map_t *bitmaps;
struct packing_data *to_pack;
+ struct multi_pack_index *midx; /* if appending to a MIDX chain */
struct bitmapped_commit *selected;
unsigned int selected_nr, selected_alloc;
@@ -124,7 +125,8 @@ struct bitmap_writer {
};
void bitmap_writer_init(struct bitmap_writer *writer, struct repository *r,
- struct packing_data *pdata);
+ struct packing_data *pdata,
+ struct multi_pack_index *midx);
void bitmap_writer_show_progress(struct bitmap_writer *writer, int show);
void bitmap_writer_set_checksum(struct bitmap_writer *writer,
const unsigned char *sha1);
diff --git a/t/t5334-incremental-multi-pack-index.sh b/t/t5334-incremental-multi-pack-index.sh
index c3b08acc73..0b6d45c8fd 100755
--- a/t/t5334-incremental-multi-pack-index.sh
+++ b/t/t5334-incremental-multi-pack-index.sh
@@ -43,4 +43,88 @@ test_expect_success 'convert incremental to non-incremental' '
compare_results_with_midx 'non-incremental MIDX conversion'
+write_midx_layer () {
+ n=1
+ if test -f $midx_chain
+ then
+ n="$(($(wc -l <$midx_chain) + 1))"
+ fi
+
+ for i in 1 2
+ do
+ test_commit $n.$i &&
+ git repack -d || return 1
+ done &&
+ git multi-pack-index write --bitmap --incremental
+}
+
+test_expect_success 'write initial MIDX layer' '
+ git repack -ad &&
+ write_midx_layer
+'
+
+test_expect_success 'read bitmap from first MIDX layer' '
+ git rev-list --test-bitmap 1.2
+'
+
+test_expect_success 'write another MIDX layer' '
+ write_midx_layer
+'
+
+test_expect_success 'midx verify with multiple layers' '
+ git multi-pack-index verify
+'
+
+test_expect_success 'read bitmap from second MIDX layer' '
+ git rev-list --test-bitmap 2.2
+'
+
+test_expect_success 'read earlier bitmap from second MIDX layer' '
+ git rev-list --test-bitmap 1.2
+'
+
+test_expect_success 'show object from first pack' '
+ git cat-file -p 1.1
+'
+
+test_expect_success 'show object from second pack' '
+ git cat-file -p 2.2
+'
+
+for reuse in false single multi
+do
+ test_expect_success "full clone (pack.allowPackReuse=$reuse)" '
+ rm -fr clone.git &&
+
+ git config pack.allowPackReuse $reuse &&
+ git clone --no-local --bare . clone.git
+ '
+done
+
+test_expect_success 'relink existing MIDX layer' '
+ rm -fr "$midxdir" &&
+
+ GIT_TEST_MIDX_WRITE_REV=1 git multi-pack-index write --bitmap &&
+
+ midx_hash="$(test-tool read-midx --checksum $objdir)" &&
+
+ test_path_is_file "$packdir/multi-pack-index" &&
+ test_path_is_file "$packdir/multi-pack-index-$midx_hash.bitmap" &&
+ test_path_is_file "$packdir/multi-pack-index-$midx_hash.rev" &&
+
+ test_commit another &&
+ git repack -d &&
+ git multi-pack-index write --bitmap --incremental &&
+
+ test_path_is_missing "$packdir/multi-pack-index" &&
+ test_path_is_missing "$packdir/multi-pack-index-$midx_hash.bitmap" &&
+ test_path_is_missing "$packdir/multi-pack-index-$midx_hash.rev" &&
+
+ test_path_is_file "$midxdir/multi-pack-index-$midx_hash.midx" &&
+ test_path_is_file "$midxdir/multi-pack-index-$midx_hash.bitmap" &&
+ test_path_is_file "$midxdir/multi-pack-index-$midx_hash.rev" &&
+ test_line_count = 2 "$midx_chain"
+
+'
+
test_done
--
2.46.0.86.ge766d390f0.dirty
^ permalink raw reply related [flat|nested] 136+ messages in thread
* [PATCH v2 00/13] midx: incremental multi-pack indexes, part two
2024-08-15 21:01 [PATCH 00/13] midx: incremental multi-pack indexes, part two Taylor Blau
` (12 preceding siblings ...)
2024-08-15 21:01 ` [PATCH 13/13] midx: implement writing incremental MIDX bitmaps Taylor Blau
@ 2024-08-15 22:28 ` Taylor Blau
2024-08-15 22:28 ` [PATCH v2 01/13] Documentation: describe incremental MIDX bitmaps Taylor Blau
` (12 more replies)
2024-11-19 22:07 ` [PATCH v3 00/13] midx: incremental multi-pack indexes, part two Taylor Blau
` (2 subsequent siblings)
16 siblings, 13 replies; 136+ messages in thread
From: Taylor Blau @ 2024-08-15 22:28 UTC (permalink / raw)
To: git; +Cc: Jeff King, Elijah Newren, Junio C Hamano
== Changes since last time
[The first round did not include commit messages for four intermediate
commits which could benefit from additional description, which has been
corrected here. The code is unchanged from the previous round, and a
range-diff is below.]
== Original cover letter
This series is based on 'master', with an additional merge between
tb/incremental-midx-part-1[1] and my newer series to fix a handful of
bugs related to pseudo-merge bitmaps[2].
This is the second of three series to implement support for incremental
multi-pack indexes (MIDXs). This series brings support for bitmaps that
are tied to incremental MIDXs in addition to regular MIDX bitmaps.
The details are laid out in the commits themselves, but the high-level
approach is as follows:
- Each layer in the incremental MIDX chain has its own corresponding
*.bitmap file. Each bitmap contains commits / pseudo-merges which
are selected only from the commits in that layer. Likewise, only
that layer's objects are written in the type-level bitmaps.
- The reachability traversal is only conducted on the top-most bitmap
corresponding to the most recent layer in the incremental MIDX
chain. Earlier layers may be consulted to retrieve commit /
pseudo-merge reachability bitmaps, but only the top-most bitmap's
"result" and "haves" fields are used.
- In essence, the top-most bitmap is the only one that "matters", and
earlier bitmaps are merely used to look up commit and pseudo-merge
bitmaps from that layer.
- Whenever we need to look at the type-level bitmaps corresponding to
the whole incremental MIDX chain, a new "ewah_or_iterator" is used.
This works in concept like a typical ewah_iterator, except that it works
over many EWAH bitmaps in parallel, OR-ing their results together
before returning them to the user.
In effect, this allows us to treat the union of all type-level
bitmaps (each of which only stores information about the objects in its
corresponding layer within the incremental MIDX chain) as a single
type-level bitmap corresponding to all of the objects across every
layer of the incremental MIDX chain.
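
As a rough illustration of the OR-ing idea described above, here is a
minimal sketch that works on plain, uncompressed 64-bit words rather
than on EWAH-compressed data. The names `layer_bitmap`, `or_iterator`,
`or_iterator_init()` and `or_iterator_next()` are hypothetical and are
not the API introduced by this series; the sketch only assumes that
every layer's bitmap is indexed by the same global bit positions.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for one layer's (uncompressed) type bitmap. */
struct layer_bitmap {
	const uint64_t *words;
	size_t nr;
};

/* Hypothetical iterator; NOT the ewah_or_iterator from this series. */
struct or_iterator {
	const struct layer_bitmap *layers;
	size_t layers_nr;
	size_t pos; /* index of the next word to return */
};

static void or_iterator_init(struct or_iterator *it,
			     const struct layer_bitmap *layers,
			     size_t layers_nr)
{
	assert(layers || !layers_nr);
	it->layers = layers;
	it->layers_nr = layers_nr;
	it->pos = 0;
}

/*
 * Store the OR of the next word from every layer that still has words
 * left and return 1, or return 0 once every layer is exhausted.
 */
static int or_iterator_next(uint64_t *out, struct or_iterator *it)
{
	uint64_t word = 0;
	int any = 0;
	size_t i;

	for (i = 0; i < it->layers_nr; i++) {
		if (it->pos < it->layers[i].nr) {
			word |= it->layers[i].words[it->pos];
			any = 1;
		}
	}
	if (!any)
		return 0;
	it->pos++;
	*out = word;
	return 1;
}
```

Because each layer only sets bits for its own objects, the OR of all
layers behaves like a single type-level bitmap covering the whole chain.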
The sum total of this series is that we are able to append new commits /
pseudo-merges to a repository's reachability bitmaps without having to
rewrite existing bitmaps, making the operation much cheaper to perform
in large repositories.
The series is laid out roughly as follows:
- The first patch describes the technical details of incremental MIDX
bitmaps.
- The second patch adjusts the pack-revindex internals to prepare for
incremental MIDX bitmaps.
- The next seven patches adjust various components of the pack-bitmap
internals to do the same.
- The next three patches introduce and adjust callers to use the
ewah_or_iterator (as above).
- The final patch implements writing incremental MIDX bitmaps, and
introduces tests.
After this series, the remaining goals for this project include being
able to compact contiguous runs of incremental MIDX layers into a single
layer to support growing the chain of MIDX layers without the chain
itself becoming too long.
Thanks in advance for your review!
[1]: https://lore.kernel.org/git/cover.1722958595.git.me@ttaylorr.com/
[2]: https://lore.kernel.org/git/cover.1723743050.git.me@ttaylorr.com/
Taylor Blau (13):
Documentation: describe incremental MIDX bitmaps
pack-revindex: prepare for incremental MIDX bitmaps
pack-bitmap.c: open and store incremental bitmap layers
pack-bitmap.c: teach `bitmap_for_commit()` about incremental MIDXs
pack-bitmap.c: teach `show_objects_for_type()` about incremental MIDXs
pack-bitmap.c: support bitmap pack-reuse with incremental MIDXs
pack-bitmap.c: teach `rev-list --test-bitmap` about incremental MIDXs
pack-bitmap.c: compute disk-usage with incremental MIDXs
pack-bitmap.c: apply pseudo-merge commits with incremental MIDXs
ewah: implement `struct ewah_or_iterator`
pack-bitmap.c: keep track of each layer's type bitmaps
pack-bitmap.c: use `ewah_or_iterator` for type bitmap iterators
midx: implement writing incremental MIDX bitmaps
Documentation/technical/multi-pack-index.txt | 64 ++++
builtin/pack-objects.c | 3 +-
ewah/ewah_bitmap.c | 33 ++
ewah/ewok.h | 12 +
midx-write.c | 35 +-
pack-bitmap-write.c | 65 +++-
pack-bitmap.c | 328 ++++++++++++++-----
pack-bitmap.h | 4 +-
pack-revindex.c | 32 +-
t/t5334-incremental-multi-pack-index.sh | 84 +++++
10 files changed, 548 insertions(+), 112 deletions(-)
Range-diff against v1:
-: ---------- > 1: d1b8d11b37 Documentation: describe incremental MIDX bitmaps
-: ---------- > 2: f5d0866e5c pack-revindex: prepare for incremental MIDX bitmaps
-: ---------- > 3: 43444efc21 pack-bitmap.c: open and store incremental bitmap layers
-: ---------- > 4: 4487130648 pack-bitmap.c: teach `bitmap_for_commit()` about incremental MIDXs
1: b7eae5dc61 ! 5: b720fe56da pack-bitmap.c: teach `show_objects_for_type()` about incremental MIDXs
@@ Metadata
## Commit message ##
pack-bitmap.c: teach `show_objects_for_type()` about incremental MIDXs
+ Since we may ask for a pack_id that is in an earlier MIDX layer relative
+ to the one corresponding to our bitmap, use nth_midxed_pack() instead of
+ accessing the ->packs array directly.
+
Signed-off-by: Taylor Blau <me@ttaylorr.com>
## pack-bitmap.c ##
2: 01b8bd22cd ! 6: 9716d022e0 pack-bitmap.c: support bitmap pack-reuse with incremental MIDXs
@@ Metadata
## Commit message ##
pack-bitmap.c: support bitmap pack-reuse with incremental MIDXs
+ In a similar fashion as previous commits in the first phase of
+ incremental MIDXs, enumerate not just the packs in the current
+ incremental MIDX layer, but previous ones as well.
+
+ Likewise, in reuse_partial_packfile_from_bitmap(), when reusing only a
+ single pack from a MIDX, use the oldest layer's preferred pack as it is
+ likely to contain the largest number of reusable sections.
+
Signed-off-by: Taylor Blau <me@ttaylorr.com>
## pack-bitmap.c ##
3: 928a4eabc8 ! 7: 6baece3175 pack-bitmap.c: teach `rev-list --test-bitmap` about incremental MIDXs
@@ Metadata
## Commit message ##
pack-bitmap.c: teach `rev-list --test-bitmap` about incremental MIDXs
+ Implement support for the special `--test-bitmap` mode of `git rev-list`
+ when using incremental MIDXs.
+
+ The bitmap_test_data structure is extended to contain a "base" pointer
+ that mirrors the structure of the bitmap chain that it is being used to
+ test.
+
+ When we find a commit to test, we first chase down the ->base pointer to
+ find the appropriate bitmap_test_data for the bitmap layer that the
+ given commit is contained within, and then perform the test on that
+ bitmap.
+
+ In order to implement this, light modifications are made to
+ bitmap_for_commit() to reimplement it in terms of a new function,
+ find_bitmap_for_commit(), which fills out a pointer which indicates the
+ bitmap layer which contains the given commit.
+
Signed-off-by: Taylor Blau <me@ttaylorr.com>
## pack-bitmap.c ##
4: 129f55ac28 ! 8: 5c909df38a pack-bitmap.c: compute disk-usage with incremental MIDXs
@@ Metadata
## Commit message ##
pack-bitmap.c: compute disk-usage with incremental MIDXs
+ In a similar fashion as previous commits, use nth_midxed_pack() instead
+ of accessing the MIDX's ->packs array directly to support incremental
+ MIDXs.
+
Signed-off-by: Taylor Blau <me@ttaylorr.com>
## pack-bitmap.c ##
5: 81839fc1d1 = 9: f9ae10fce9 pack-bitmap.c: apply pseudo-merge commits with incremental MIDXs
6: 63019e952a = 10: 04042981c1 ewah: implement `struct ewah_or_iterator`
7: 01508e4ff5 = 11: c4d543d43d pack-bitmap.c: keep track of each layer's type bitmaps
8: 59a50a2ea2 = 12: c6730b4107 pack-bitmap.c: use `ewah_or_iterator` for type bitmap iterators
9: da34cc9441 = 13: afefb45557 midx: implement writing incremental MIDX bitmaps
base-commit: f28e4ef872c247b0b35f48b1c3d2c5f77753b908
--
2.46.0.86.ge766d390f0.dirty
^ permalink raw reply [flat|nested] 136+ messages in thread
* [PATCH v2 01/13] Documentation: describe incremental MIDX bitmaps
2024-08-15 22:28 ` [PATCH v2 00/13] midx: incremental multi-pack indexes, part two Taylor Blau
@ 2024-08-15 22:28 ` Taylor Blau
2024-08-15 22:28 ` [PATCH v2 02/13] pack-revindex: prepare for " Taylor Blau
` (11 subsequent siblings)
12 siblings, 0 replies; 136+ messages in thread
From: Taylor Blau @ 2024-08-15 22:28 UTC (permalink / raw)
To: git; +Cc: Jeff King, Elijah Newren, Junio C Hamano
Prepare to implement support for reachability bitmaps for the new
incremental multi-pack index (MIDX) feature over the following commits.
This commit begins by first describing the relevant format and usage
details for incremental MIDX bitmaps.
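
The ordering rules documented by this patch can be sketched as a
three-key comparison. This is only an illustration under simplifying
assumptions: `obj_pos` and `pseudo_pack_cmp()` are hypothetical names,
and `pack_rank` is assumed to already encode a pack's place within its
layer (with a preferred pack, where one applies, given rank 0).

```c
#include <stdint.h>

/* Hypothetical description of where an object lives in the MIDX chain. */
struct obj_pos {
	uint32_t layer;     /* 0 is the oldest layer in the chain */
	uint32_t pack_rank; /* pack's rank within its layer (assumed precomputed) */
	uint64_t offset;    /* offset within the containing packfile */
};

/* Returns <0, 0 or >0, like a qsort() comparator. */
static int pseudo_pack_cmp(const struct obj_pos *a, const struct obj_pos *b)
{
	/* 1. Objects in earlier layers sort first. */
	if (a->layer != b->layer)
		return a->layer < b->layer ? -1 : 1;
	/* 2. Within a layer, earlier packs sort first. */
	if (a->pack_rank != b->pack_rank)
		return a->pack_rank < b->pack_rank ? -1 : 1;
	/* 3. Within a pack, sort by offset. */
	if (a->offset != b->offset)
		return a->offset < b->offset ? -1 : 1;
	return 0;
}
```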
Signed-off-by: Taylor Blau <me@ttaylorr.com>
---
Documentation/technical/multi-pack-index.txt | 64 ++++++++++++++++++++
1 file changed, 64 insertions(+)
diff --git a/Documentation/technical/multi-pack-index.txt b/Documentation/technical/multi-pack-index.txt
index cc063b30be..a063262c36 100644
--- a/Documentation/technical/multi-pack-index.txt
+++ b/Documentation/technical/multi-pack-index.txt
@@ -164,6 +164,70 @@ objects_nr($H2) + objects_nr($H1) + i
(in the C implementation, this is often computed as `i +
m->num_objects_in_base`).
+=== Pseudo-pack order for incremental MIDXs
+
+The original implementation of multi-pack reachability bitmaps defined
+the pseudo-pack order in linkgit:gitformat-pack[5] (see the section
+titled "multi-pack-index reverse indexes") roughly as follows:
+
+____
+In short, a MIDX's pseudo-pack is the de-duplicated concatenation of
+objects in packs stored by the MIDX, laid out in pack order, and the
+packs arranged in MIDX order (with the preferred pack coming first).
+____
+
+In the incremental MIDX design, we extend this definition to include
+objects from multiple layers of the MIDX chain. The pseudo-pack order
+for incremental MIDXs is determined by concatenating the pseudo-pack
+ordering for each layer of the MIDX chain in order. Formally two objects
+`o1` and `o2` are compared as follows:
+
+1. If `o1` appears in an earlier layer of the MIDX chain than `o2`, then
+ `o1` is considered less than `o2`.
+2. Otherwise, if `o1` and `o2` appear in the same MIDX layer, and that
+ MIDX layer has no base, then if one of `pack(o1)` and `pack(o2)` is
+ preferred and the other is not, then the preferred one sorts first. If
+ there is a base layer (i.e. the MIDX layer is not the first layer in
+ the chain), then if `pack(o1)` appears earlier in that MIDX layer's
+ pack order, then `o1` is less than `o2`. Likewise if `pack(o2)`
+ appears earlier, then the opposite is true.
+3. Otherwise, `o1` and `o2` appear in the same pack, and thus in the
+ same MIDX layer. Sort `o1` and `o2` by their offset within their
+ containing packfile.
+
+=== Reachability bitmaps and incremental MIDXs
+
+Each layer of an incremental MIDX chain may have its objects (and the
+objects from any previous layer in the same MIDX chain) represented in
+its own `*.bitmap` file.
+
+The structure of a `*.bitmap` file belonging to an incremental MIDX
+chain is identical to that of a non-incremental MIDX bitmap, or a
+classic single-pack bitmap. Since objects are added to the end of the
+incremental MIDX's pseudo-pack order (see: above), it is possible to
+extend a bitmap when appending to the end of a MIDX chain.
+
+(Note: it is possible likewise to compress a contiguous sequence of MIDX
+incremental layers, and their `*.bitmap`(s) into a single layer and
+`*.bitmap`, but this is not yet implemented.)
+
+The object positions used are global within the pseudo-pack order, so
+subsequent layers will have, for example, `m->num_objects_in_base`
+number of `0` bits in each of their four type bitmaps. This follows from
+the fact that we only write type bitmap entries for objects present in
+the layer immediately corresponding to the bitmap.
+
+Note also that only the bitmap pertaining to the most recent layer in an
+incremental MIDX chain is used to store reachability information about
+the interesting and uninteresting objects in a reachability query.
+Earlier bitmap layers are only used to look up commit and pseudo-merge
+bitmaps from that layer, as well as the type-level bitmaps for objects
+in that layer.
+
+To simplify the implementation, type-level bitmaps are iterated
+simultaneously, and their results are OR'd together to avoid recursively
+calling internal bitmap functions.
+
Future Work
-----------
--
2.46.0.86.ge766d390f0.dirty
^ permalink raw reply related [flat|nested] 136+ messages in thread
* [PATCH v2 02/13] pack-revindex: prepare for incremental MIDX bitmaps
2024-08-15 22:28 ` [PATCH v2 00/13] midx: incremental multi-pack indexes, part two Taylor Blau
2024-08-15 22:28 ` [PATCH v2 01/13] Documentation: describe incremental MIDX bitmaps Taylor Blau
@ 2024-08-15 22:28 ` Taylor Blau
2024-08-15 22:28 ` [PATCH v2 03/13] pack-bitmap.c: open and store incremental bitmap layers Taylor Blau
` (10 subsequent siblings)
12 siblings, 0 replies; 136+ messages in thread
From: Taylor Blau @ 2024-08-15 22:28 UTC (permalink / raw)
To: git; +Cc: Jeff King, Elijah Newren, Junio C Hamano
Prepare the reverse index machinery to handle object lookups in an
incremental MIDX bitmap. These changes are broken out across a few
functions:
- load_midx_revindex() learns to use the appropriate MIDX filename
depending on whether the given 'struct multi_pack_index *' is
incremental or not.
- pack_pos_to_midx() and midx_to_pack_pos() now both take in a global
object position in the MIDX pseudo-pack order, and find the
earliest containing MIDX (similar to midx.c::midx_for_object()).
- midx_pack_order_cmp() adjusts its call to pack_pos_to_midx() by the
number of objects in the base (since 'vb - midx->revindex_data' is
relative to the containing MIDX, and pack_pos_to_midx() expects a
global position).
Likewise, midx_key_to_pack_pos() adjusts its output by adding
m->num_objects_in_base to return a global position out through the
`*pos` pointer.
Together, these changes are sufficient to use the multi-pack index's
reverse index format for incremental multi-pack reachability bitmaps.
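
To illustrate the global-to-local translation described above, here is
a minimal sketch. `struct layer` is a hypothetical, simplified stand-in
for 'struct multi_pack_index', and `layer_for_pos()` is not a function
in this series; unlike the real code, it returns NULL instead of
calling BUG() on an out-of-bounds position.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified stand-in for 'struct multi_pack_index'. */
struct layer {
	struct layer *base;           /* previous layer, or NULL */
	uint32_t num_objects;         /* objects in this layer only */
	uint32_t num_objects_in_base; /* objects in all earlier layers */
};

/*
 * Walk from the tip toward the base until reaching the layer that
 * contains the global position 'pos'. On success, store the
 * layer-relative position in '*local' and return that layer; return
 * NULL if 'pos' is out of bounds for the chain.
 */
static struct layer *layer_for_pos(struct layer *tip, uint32_t pos,
				   uint32_t *local)
{
	struct layer *m = tip;

	assert(local);
	if (!m || pos >= m->num_objects + m->num_objects_in_base)
		return NULL;
	while (m && pos < m->num_objects_in_base)
		m = m->base;
	if (!m)
		return NULL;
	*local = pos - m->num_objects_in_base;
	return m;
}
```

The layer-relative result is what indexes into that layer's own reverse
index data, which is why the patch subtracts `m->num_objects_in_base`
on the way in and adds it back on the way out.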
Signed-off-by: Taylor Blau <me@ttaylorr.com>
---
pack-bitmap.c | 30 ++++++++++++++++++++----------
pack-revindex.c | 32 +++++++++++++++++++++++---------
2 files changed, 43 insertions(+), 19 deletions(-)
diff --git a/pack-bitmap.c b/pack-bitmap.c
index 2e657a2aa4..0a7039d955 100644
--- a/pack-bitmap.c
+++ b/pack-bitmap.c
@@ -170,6 +170,15 @@ static struct ewah_bitmap *read_bitmap_1(struct bitmap_index *index)
return read_bitmap(index->map, index->map_size, &index->map_pos);
}
+static uint32_t bitmap_non_extended_bits(struct bitmap_index *index)
+{
+ if (index->midx) {
+ struct multi_pack_index *m = index->midx;
+ return m->num_objects + m->num_objects_in_base;
+ }
+ return index->pack->num_objects;
+}
+
static uint32_t bitmap_num_objects(struct bitmap_index *index)
{
if (index->midx)
@@ -925,7 +934,7 @@ static inline int bitmap_position_extended(struct bitmap_index *bitmap_git,
if (pos < kh_end(positions)) {
int bitmap_pos = kh_value(positions, pos);
- return bitmap_pos + bitmap_num_objects(bitmap_git);
+ return bitmap_pos + bitmap_non_extended_bits(bitmap_git);
}
return -1;
@@ -993,7 +1002,7 @@ static int ext_index_add_object(struct bitmap_index *bitmap_git,
bitmap_pos = kh_value(eindex->positions, hash_pos);
}
- return bitmap_pos + bitmap_num_objects(bitmap_git);
+ return bitmap_pos + bitmap_non_extended_bits(bitmap_git);
}
struct bitmap_show_data {
@@ -1498,7 +1507,8 @@ static void show_extended_objects(struct bitmap_index *bitmap_git,
for (i = 0; i < eindex->count; ++i) {
struct object *obj;
- if (!bitmap_get(objects, st_add(bitmap_num_objects(bitmap_git), i)))
+ if (!bitmap_get(objects,
+ st_add(bitmap_non_extended_bits(bitmap_git), i)))
continue;
obj = eindex->objects[i];
@@ -1677,7 +1687,7 @@ static void filter_bitmap_exclude_type(struct bitmap_index *bitmap_git,
* them individually.
*/
for (i = 0; i < eindex->count; i++) {
- size_t pos = st_add(i, bitmap_num_objects(bitmap_git));
+ size_t pos = st_add(i, bitmap_non_extended_bits(bitmap_git));
if (eindex->objects[i]->type == type &&
bitmap_get(to_filter, pos) &&
!bitmap_get(tips, pos))
@@ -1703,7 +1713,7 @@ static unsigned long get_size_by_pos(struct bitmap_index *bitmap_git,
oi.sizep = &size;
- if (pos < bitmap_num_objects(bitmap_git)) {
+ if (pos < bitmap_non_extended_bits(bitmap_git)) {
struct packed_git *pack;
off_t ofs;
@@ -1726,7 +1736,7 @@ static unsigned long get_size_by_pos(struct bitmap_index *bitmap_git,
}
} else {
struct eindex *eindex = &bitmap_git->ext_index;
- struct object *obj = eindex->objects[pos - bitmap_num_objects(bitmap_git)];
+ struct object *obj = eindex->objects[pos - bitmap_non_extended_bits(bitmap_git)];
if (oid_object_info_extended(the_repository, &obj->oid, &oi, 0) < 0)
die(_("unable to get size of %s"), oid_to_hex(&obj->oid));
}
@@ -1878,7 +1888,7 @@ static void filter_packed_objects_from_bitmap(struct bitmap_index *bitmap_git,
uint32_t objects_nr;
size_t i, pos;
- objects_nr = bitmap_num_objects(bitmap_git);
+ objects_nr = bitmap_non_extended_bits(bitmap_git);
pos = objects_nr / BITS_IN_EWORD;
if (pos > result->word_alloc)
@@ -2399,7 +2409,7 @@ static uint32_t count_object_type(struct bitmap_index *bitmap_git,
for (i = 0; i < eindex->count; ++i) {
if (eindex->objects[i]->type == type &&
bitmap_get(objects,
- st_add(bitmap_num_objects(bitmap_git), i)))
+ st_add(bitmap_non_extended_bits(bitmap_git), i)))
count++;
}
@@ -2798,7 +2808,7 @@ uint32_t *create_bitmap_mapping(struct bitmap_index *bitmap_git,
BUG("rebuild_existing_bitmaps: missing required rev-cache "
"extension");
- num_objects = bitmap_num_objects(bitmap_git);
+ num_objects = bitmap_non_extended_bits(bitmap_git);
CALLOC_ARRAY(reposition, num_objects);
for (i = 0; i < num_objects; ++i) {
@@ -2941,7 +2951,7 @@ static off_t get_disk_usage_for_extended(struct bitmap_index *bitmap_git)
struct object *obj = eindex->objects[i];
if (!bitmap_get(result,
- st_add(bitmap_num_objects(bitmap_git), i)))
+ st_add(bitmap_non_extended_bits(bitmap_git), i)))
continue;
if (oid_object_info_extended(the_repository, &obj->oid, &oi, 0) < 0)
diff --git a/pack-revindex.c b/pack-revindex.c
index 22d3c23464..ce3f7ae214 100644
--- a/pack-revindex.c
+++ b/pack-revindex.c
@@ -383,8 +383,12 @@ int load_midx_revindex(struct multi_pack_index *m)
trace2_data_string("load_midx_revindex", the_repository,
"source", "rev");
- get_midx_filename_ext(&revindex_name, m->object_dir,
- get_midx_checksum(m), MIDX_EXT_REV);
+ if (m->has_chain)
+ get_split_midx_filename_ext(&revindex_name, m->object_dir,
+ get_midx_checksum(m), MIDX_EXT_REV);
+ else
+ get_midx_filename_ext(&revindex_name, m->object_dir,
+ get_midx_checksum(m), MIDX_EXT_REV);
ret = load_revindex_from_disk(revindex_name.buf,
m->num_objects,
@@ -471,11 +475,15 @@ off_t pack_pos_to_offset(struct packed_git *p, uint32_t pos)
uint32_t pack_pos_to_midx(struct multi_pack_index *m, uint32_t pos)
{
+ while (m && pos < m->num_objects_in_base)
+ m = m->base_midx;
+ if (!m)
+ BUG("NULL multi-pack-index for object position: %"PRIu32, pos);
if (!m->revindex_data)
BUG("pack_pos_to_midx: reverse index not yet loaded");
- if (m->num_objects <= pos)
+ if (m->num_objects + m->num_objects_in_base <= pos)
BUG("pack_pos_to_midx: out-of-bounds object at %"PRIu32, pos);
- return get_be32(m->revindex_data + pos);
+ return get_be32(m->revindex_data + pos - m->num_objects_in_base);
}
struct midx_pack_key {
@@ -491,7 +499,8 @@ static int midx_pack_order_cmp(const void *va, const void *vb)
const struct midx_pack_key *key = va;
struct multi_pack_index *midx = key->midx;
- uint32_t versus = pack_pos_to_midx(midx, (uint32_t*)vb - (const uint32_t *)midx->revindex_data);
+ size_t pos = (uint32_t*)vb - (const uint32_t *)midx->revindex_data;
+ uint32_t versus = pack_pos_to_midx(midx, pos + midx->num_objects_in_base);
uint32_t versus_pack = nth_midxed_pack_int_id(midx, versus);
off_t versus_offset;
@@ -529,9 +538,9 @@ static int midx_key_to_pack_pos(struct multi_pack_index *m,
{
uint32_t *found;
- if (key->pack >= m->num_packs)
+ if (key->pack >= m->num_packs + m->num_packs_in_base)
BUG("MIDX pack lookup out of bounds (%"PRIu32" >= %"PRIu32")",
- key->pack, m->num_packs);
+ key->pack, m->num_packs + m->num_packs_in_base);
/*
* The preferred pack sorts first, so determine its identifier by
* looking at the first object in pseudo-pack order.
@@ -551,7 +560,8 @@ static int midx_key_to_pack_pos(struct multi_pack_index *m,
if (!found)
return -1;
- *pos = found - m->revindex_data;
+ *pos = (found - m->revindex_data) + m->num_objects_in_base;
+
return 0;
}
@@ -559,9 +569,13 @@ int midx_to_pack_pos(struct multi_pack_index *m, uint32_t at, uint32_t *pos)
{
struct midx_pack_key key;
+ while (m && at < m->num_objects_in_base)
+ m = m->base_midx;
+ if (!m)
+ BUG("NULL multi-pack-index for object position: %"PRIu32, at);
if (!m->revindex_data)
BUG("midx_to_pack_pos: reverse index not yet loaded");
- if (m->num_objects <= at)
+ if (m->num_objects + m->num_objects_in_base <= at)
BUG("midx_to_pack_pos: out-of-bounds object at %"PRIu32, at);
key.pack = nth_midxed_pack_int_id(m, at);
--
2.46.0.86.ge766d390f0.dirty
^ permalink raw reply related [flat|nested] 136+ messages in thread
* [PATCH v2 03/13] pack-bitmap.c: open and store incremental bitmap layers
2024-08-15 22:28 ` [PATCH v2 00/13] midx: incremental multi-pack indexes, part two Taylor Blau
2024-08-15 22:28 ` [PATCH v2 01/13] Documentation: describe incremental MIDX bitmaps Taylor Blau
2024-08-15 22:28 ` [PATCH v2 02/13] pack-revindex: prepare for " Taylor Blau
@ 2024-08-15 22:28 ` Taylor Blau
2024-08-15 22:29 ` [PATCH v2 04/13] pack-bitmap.c: teach `bitmap_for_commit()` about incremental MIDXs Taylor Blau
` (9 subsequent siblings)
12 siblings, 0 replies; 136+ messages in thread
From: Taylor Blau @ 2024-08-15 22:28 UTC (permalink / raw)
To: git; +Cc: Jeff King, Elijah Newren, Junio C Hamano
Prepare the pack-bitmap machinery to work with incremental MIDXs by
adding a new "base" field to keep track of the bitmap index associated
with the previous MIDX layer.
The changes in this commit are mostly boilerplate to open the correct
bitmap(s), add them to the chain of bitmap layers along the "base" pointer,
ensure that the correct packs and their reverse indexes are loaded
across MIDX layers, etc.
While we're at it, keep track of a base_nr field to indicate how many
bitmap layers (including the current bitmap) exist. This will be used in
a future commit to allocate an array of 'struct ewah_bitmap' pointers to
collect all of the respective type bitmaps among all layers to
initialize a multi-EWAH iterator.
Subsequent commits will teach the functions within the pack-bitmap
machinery how to interact with these new fields.
Signed-off-by: Taylor Blau <me@ttaylorr.com>
---
pack-bitmap.c | 63 ++++++++++++++++++++++++++++++++++++++++-----------
1 file changed, 50 insertions(+), 13 deletions(-)
diff --git a/pack-bitmap.c b/pack-bitmap.c
index 0a7039d955..c27383c027 100644
--- a/pack-bitmap.c
+++ b/pack-bitmap.c
@@ -54,6 +54,13 @@ struct bitmap_index {
struct packed_git *pack;
struct multi_pack_index *midx;
+ /*
+ * If using a multi-pack index chain, 'base' points to the
+ * bitmap index corresponding to this bitmap's midx->base_midx.
+ */
+ struct bitmap_index *base;
+ uint32_t base_nr;
+
/* mmapped buffer of the whole bitmap index */
unsigned char *map;
size_t map_size; /* size of the mmaped buffer */
@@ -377,8 +384,13 @@ static int load_bitmap_entries_v1(struct bitmap_index *index)
char *midx_bitmap_filename(struct multi_pack_index *midx)
{
struct strbuf buf = STRBUF_INIT;
- get_midx_filename_ext(&buf, midx->object_dir, get_midx_checksum(midx),
- MIDX_EXT_BITMAP);
+ if (midx->has_chain)
+ get_split_midx_filename_ext(&buf, midx->object_dir,
+ get_midx_checksum(midx),
+ MIDX_EXT_BITMAP);
+ else
+ get_midx_filename_ext(&buf, midx->object_dir,
+ get_midx_checksum(midx), MIDX_EXT_BITMAP);
return strbuf_detach(&buf, NULL);
}
@@ -397,10 +409,17 @@ static int open_midx_bitmap_1(struct bitmap_index *bitmap_git,
{
struct stat st;
char *bitmap_name = midx_bitmap_filename(midx);
- int fd = git_open(bitmap_name);
+ int fd;
uint32_t i, preferred_pack;
struct packed_git *preferred;
+ fd = git_open(bitmap_name);
+ if (fd < 0 && errno == ENOENT) {
+ FREE_AND_NULL(bitmap_name);
+ bitmap_name = midx_bitmap_filename(midx);
+ fd = git_open(bitmap_name);
+ }
+
if (fd < 0) {
if (errno != ENOENT)
warning_errno("cannot open '%s'", bitmap_name);
@@ -446,7 +465,7 @@ static int open_midx_bitmap_1(struct bitmap_index *bitmap_git,
goto cleanup;
}
- for (i = 0; i < bitmap_git->midx->num_packs; i++) {
+ for (i = 0; i < bitmap_git->midx->num_packs + bitmap_git->midx->num_packs_in_base; i++) {
if (prepare_midx_pack(the_repository, bitmap_git->midx, i)) {
warning(_("could not open pack %s"),
bitmap_git->midx->pack_names[i]);
@@ -459,13 +478,20 @@ static int open_midx_bitmap_1(struct bitmap_index *bitmap_git,
goto cleanup;
}
- preferred = bitmap_git->midx->packs[preferred_pack];
+ preferred = nth_midxed_pack(bitmap_git->midx, preferred_pack);
if (!is_pack_valid(preferred)) {
warning(_("preferred pack (%s) is invalid"),
preferred->pack_name);
goto cleanup;
}
+ if (midx->base_midx) {
+ bitmap_git->base = prepare_midx_bitmap_git(midx->base_midx);
+ bitmap_git->base_nr = bitmap_git->base->base_nr + 1;
+ } else {
+ bitmap_git->base_nr = 1;
+ }
+
return 0;
cleanup:
@@ -516,6 +542,7 @@ static int open_pack_bitmap_1(struct bitmap_index *bitmap_git, struct packed_git
bitmap_git->map_size = xsize_t(st.st_size);
bitmap_git->map = xmmap(NULL, bitmap_git->map_size, PROT_READ, MAP_PRIVATE, fd, 0);
bitmap_git->map_pos = 0;
+ bitmap_git->base_nr = 1;
close(fd);
if (load_bitmap_header(bitmap_git) < 0) {
@@ -535,8 +562,7 @@ static int open_pack_bitmap_1(struct bitmap_index *bitmap_git, struct packed_git
static int load_reverse_index(struct repository *r, struct bitmap_index *bitmap_git)
{
if (bitmap_is_midx(bitmap_git)) {
- uint32_t i;
- int ret;
+ struct multi_pack_index *m;
/*
* The multi-pack-index's .rev file is already loaded via
@@ -545,10 +571,15 @@ static int load_reverse_index(struct repository *r, struct bitmap_index *bitmap_
* But we still need to open the individual pack .rev files,
* since we will need to make use of them in pack-objects.
*/
- for (i = 0; i < bitmap_git->midx->num_packs; i++) {
- ret = load_pack_revindex(r, bitmap_git->midx->packs[i]);
- if (ret)
- return ret;
+ for (m = bitmap_git->midx; m; m = m->base_midx) {
+ uint32_t i;
+ int ret;
+
+ for (i = 0; i < m->num_packs; i++) {
+ ret = load_pack_revindex(r, m->packs[i]);
+ if (ret)
+ return ret;
+ }
}
return 0;
}
@@ -574,6 +605,13 @@ static int load_bitmap(struct repository *r, struct bitmap_index *bitmap_git)
if (!bitmap_git->table_lookup && load_bitmap_entries_v1(bitmap_git) < 0)
goto failed;
+ if (bitmap_git->base) {
+ if (!bitmap_is_midx(bitmap_git))
+ BUG("non-MIDX bitmap has non-NULL base bitmap index");
+ if (load_bitmap(r, bitmap_git->base) < 0)
+ goto failed;
+ }
+
return 0;
failed:
@@ -658,10 +696,9 @@ struct bitmap_index *prepare_bitmap_git(struct repository *r)
struct bitmap_index *prepare_midx_bitmap_git(struct multi_pack_index *midx)
{
- struct repository *r = the_repository;
struct bitmap_index *bitmap_git = xcalloc(1, sizeof(*bitmap_git));
- if (!open_midx_bitmap_1(bitmap_git, midx) && !load_bitmap(r, bitmap_git))
+ if (!open_midx_bitmap_1(bitmap_git, midx))
return bitmap_git;
free_bitmap_index(bitmap_git);
--
2.46.0.86.ge766d390f0.dirty
^ permalink raw reply related [flat|nested] 136+ messages in thread
* [PATCH v2 04/13] pack-bitmap.c: teach `bitmap_for_commit()` about incremental MIDXs
2024-08-15 22:28 ` [PATCH v2 00/13] midx: incremental multi-pack indexes, part two Taylor Blau
` (2 preceding siblings ...)
2024-08-15 22:28 ` [PATCH v2 03/13] pack-bitmap.c: open and store incremental bitmap layers Taylor Blau
@ 2024-08-15 22:29 ` Taylor Blau
2024-08-15 22:29 ` [PATCH v2 05/13] pack-bitmap.c: teach `show_objects_for_type()` " Taylor Blau
` (8 subsequent siblings)
12 siblings, 0 replies; 136+ messages in thread
From: Taylor Blau @ 2024-08-15 22:29 UTC (permalink / raw)
To: git; +Cc: Jeff King, Elijah Newren, Junio C Hamano
The pack-bitmap machinery uses `bitmap_for_commit()` to locate the
EWAH-compressed bitmap corresponding to some given commit object.
Teach this function about incremental MIDX bitmaps by having it recurse
into earlier bitmap layers when it fails to find a given commit in the
current layer.
The changes to do so are as follows:
- Avoid initializing hash_pos at its declaration, since
bitmap_for_commit() is now a recursive function and may receive a
NULL bitmap_index pointer as its first argument.
- In cases where we would previously return NULL (to indicate that a
lookup failed and the given bitmap_index does not contain an entry
corresponding to the given commit), recursively call the function on
the previous bitmap layer.
Signed-off-by: Taylor Blau <me@ttaylorr.com>
---
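Not part of the commit: the fallback described above can be pictured
with a minimal standalone sketch. It is not the pack-bitmap code;
`toy_layer`, `toy_lookup()`, and the integer keys are hypothetical
stand-ins for `struct bitmap_index`, `bitmap_for_commit()`, and commit
OIDs. It only shows the shape of the recursion, including the NULL
check that terminates it.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for one bitmap layer; not Git's struct bitmap_index. */
struct toy_layer {
	const int *keys;        /* commits that have a bitmap in this layer */
	const char **values;    /* the "bitmap" stored for each key */
	size_t nr;              /* number of entries in keys[] and values[] */
	struct toy_layer *base; /* previous (older) layer, or NULL */
};

/*
 * Look up `key` in `layer`, falling back to older layers on a miss.
 * Returns NULL once the chain is exhausted.
 */
const char *toy_lookup(const struct toy_layer *layer, int key)
{
	size_t i;

	if (!layer)
		return NULL; /* walked past the oldest layer */

	/* A non-empty layer must carry both arrays. */
	assert(layer->nr == 0 || (layer->keys && layer->values));

	for (i = 0; i < layer->nr; i++)
		if (layer->keys[i] == key)
			return layer->values[i];

	/* Miss in this layer: retry on the previous one. */
	return toy_lookup(layer->base, key);
}
```

Because the NULL check comes before any member access, the recursive
call can pass `layer->base` unconditionally, which mirrors why
hash_pos can no longer be initialized at its declaration.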
pack-bitmap.c | 11 +++++++----
1 file changed, 7 insertions(+), 4 deletions(-)
diff --git a/pack-bitmap.c b/pack-bitmap.c
index c27383c027..88623d9e06 100644
--- a/pack-bitmap.c
+++ b/pack-bitmap.c
@@ -946,18 +946,21 @@ static struct stored_bitmap *lazy_bitmap_for_commit(struct bitmap_index *bitmap_
struct ewah_bitmap *bitmap_for_commit(struct bitmap_index *bitmap_git,
struct commit *commit)
{
- khiter_t hash_pos = kh_get_oid_map(bitmap_git->bitmaps,
- commit->object.oid);
+ khiter_t hash_pos;
+ if (!bitmap_git)
+ return NULL;
+
+ hash_pos = kh_get_oid_map(bitmap_git->bitmaps, commit->object.oid);
if (hash_pos >= kh_end(bitmap_git->bitmaps)) {
struct stored_bitmap *bitmap = NULL;
if (!bitmap_git->table_lookup)
- return NULL;
+ return bitmap_for_commit(bitmap_git->base, commit);
/* this is a fairly hot codepath - no trace2_region please */
/* NEEDSWORK: cache misses aren't recorded */
bitmap = lazy_bitmap_for_commit(bitmap_git, commit);
if (!bitmap)
- return NULL;
+ return bitmap_for_commit(bitmap_git->base, commit);
return lookup_stored_bitmap(bitmap);
}
return lookup_stored_bitmap(kh_value(bitmap_git->bitmaps, hash_pos));
--
2.46.0.86.ge766d390f0.dirty
* [PATCH v2 05/13] pack-bitmap.c: teach `show_objects_for_type()` about incremental MIDXs
2024-08-15 22:28 ` [PATCH v2 00/13] midx: incremental multi-pack indexes, part two Taylor Blau
` (3 preceding siblings ...)
2024-08-15 22:29 ` [PATCH v2 04/13] pack-bitmap.c: teach `bitmap_for_commit()` about incremental MIDXs Taylor Blau
@ 2024-08-15 22:29 ` Taylor Blau
2024-08-15 22:29 ` [PATCH v2 06/13] pack-bitmap.c: support bitmap pack-reuse with " Taylor Blau
` (7 subsequent siblings)
12 siblings, 0 replies; 136+ messages in thread
From: Taylor Blau @ 2024-08-15 22:29 UTC (permalink / raw)
To: git; +Cc: Jeff King, Elijah Newren, Junio C Hamano
Since we may ask for a pack_id that is in an earlier MIDX layer relative
to the one corresponding to our bitmap, use nth_midxed_pack() instead of
accessing the ->packs array directly.
Signed-off-by: Taylor Blau <me@ttaylorr.com>
---
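Not part of the commit: a minimal sketch of why a chain-aware accessor
is needed. `toy_midx` and `toy_nth_pack()` are hypothetical; the layout
(a per-layer `packs` array plus a count of packs held by older layers)
is an assumption modeled on the `num_packs_in_base` and `base_midx`
fields that appear elsewhere in this series, and is not claimed to be
how nth_midxed_pack() is actually written.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for one MIDX layer; not Git's struct multi_pack_index. */
struct toy_midx {
	const char **packs;         /* packs belonging to this layer only */
	uint32_t num_packs;         /* number of entries in packs[] */
	uint32_t num_packs_in_base; /* packs in all older layers combined */
	struct toy_midx *base_midx; /* previous (older) layer, or NULL */
};

/*
 * Resolve a chain-wide pack ID to a pack. IDs below num_packs_in_base
 * belong to an older layer, so indexing m->packs[] directly with a
 * chain-wide ID would pick the wrong pack or read out of bounds.
 */
const char *toy_nth_pack(const struct toy_midx *m, uint32_t pack_id)
{
	while (m && pack_id < m->num_packs_in_base)
		m = m->base_midx;

	/* Caller bug if the ID does not fall inside any layer. */
	assert(m && pack_id - m->num_packs_in_base < m->num_packs);

	return m->packs[pack_id - m->num_packs_in_base];
}
```

With a two-pack base layer and a one-pack top layer, IDs 0 and 1
resolve into the base and ID 2 resolves to the top layer's only pack.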
pack-bitmap.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/pack-bitmap.c b/pack-bitmap.c
index 88623d9e06..f91ab1b572 100644
--- a/pack-bitmap.c
+++ b/pack-bitmap.c
@@ -1631,7 +1631,7 @@ static void show_objects_for_type(
nth_midxed_object_oid(&oid, m, index_pos);
pack_id = nth_midxed_pack_int_id(m, index_pos);
- pack = bitmap_git->midx->packs[pack_id];
+ pack = nth_midxed_pack(bitmap_git->midx, pack_id);
} else {
index_pos = pack_pos_to_index(bitmap_git->pack, pos + offset);
ofs = pack_pos_to_offset(bitmap_git->pack, pos + offset);
--
2.46.0.86.ge766d390f0.dirty
* [PATCH v2 06/13] pack-bitmap.c: support bitmap pack-reuse with incremental MIDXs
2024-08-15 22:28 ` [PATCH v2 00/13] midx: incremental multi-pack indexes, part two Taylor Blau
` (4 preceding siblings ...)
2024-08-15 22:29 ` [PATCH v2 05/13] pack-bitmap.c: teach `show_objects_for_type()` " Taylor Blau
@ 2024-08-15 22:29 ` Taylor Blau
2024-08-15 22:29 ` [PATCH v2 07/13] pack-bitmap.c: teach `rev-list --test-bitmap` about " Taylor Blau
` (6 subsequent siblings)
12 siblings, 0 replies; 136+ messages in thread
From: Taylor Blau @ 2024-08-15 22:29 UTC (permalink / raw)
To: git; +Cc: Jeff King, Elijah Newren, Junio C Hamano
As in earlier commits from the first phase of incremental MIDXs,
enumerate not just the packs in the current incremental MIDX layer, but
those in previous layers as well.
Likewise, in reuse_partial_packfile_from_bitmap(), when reusing only a
single pack from a MIDX, use the oldest layer's preferred pack, as it is
likely to contain the largest number of reusable sections.
Signed-off-by: Taylor Blau <me@ttaylorr.com>
---
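Not part of the commit: the two adjustments described above (a loop
bound that covers the whole chain, and a walk down to the oldest layer)
can be sketched in isolation. `toy_midx`, `toy_total_packs()`, and
`toy_oldest_layer()` are hypothetical names; only the field names are
borrowed from the diff below.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for one MIDX layer; not Git's struct multi_pack_index. */
struct toy_midx {
	uint32_t num_packs;         /* packs in this layer */
	uint32_t num_packs_in_base; /* packs in all older layers combined */
	struct toy_midx *base_midx; /* previous (older) layer, or NULL */
};

/* Number of packs visible from `m`: its own plus those in older layers. */
uint32_t toy_total_packs(const struct toy_midx *m)
{
	assert(m);
	return m->num_packs + m->num_packs_in_base;
}

/* Follow base_midx until reaching the layer that has no base. */
const struct toy_midx *toy_oldest_layer(const struct toy_midx *m)
{
	assert(m);
	while (m->base_midx)
		m = m->base_midx;
	return m;
}
```

A caller enumerating packs for multi-pack reuse would loop up to
toy_total_packs(), while the single-pack path would ask the layer
returned by toy_oldest_layer() for its preferred pack.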
pack-bitmap.c | 11 ++++++++---
1 file changed, 8 insertions(+), 3 deletions(-)
diff --git a/pack-bitmap.c b/pack-bitmap.c
index f91ab1b572..2b3c53d882 100644
--- a/pack-bitmap.c
+++ b/pack-bitmap.c
@@ -2320,7 +2320,8 @@ void reuse_partial_packfile_from_bitmap(struct bitmap_index *bitmap_git,
multi_pack_reuse = 0;
if (multi_pack_reuse) {
- for (i = 0; i < bitmap_git->midx->num_packs; i++) {
+ struct multi_pack_index *m = bitmap_git->midx;
+ for (i = 0; i < m->num_packs + m->num_packs_in_base; i++) {
struct bitmapped_pack pack;
if (nth_bitmapped_pack(r, bitmap_git->midx, &pack, i) < 0) {
warning(_("unable to load pack: '%s', disabling pack-reuse"),
@@ -2344,14 +2345,18 @@ void reuse_partial_packfile_from_bitmap(struct bitmap_index *bitmap_git,
uint32_t pack_int_id;
if (bitmap_is_midx(bitmap_git)) {
+ struct multi_pack_index *m = bitmap_git->midx;
uint32_t preferred_pack_pos;
- if (midx_preferred_pack(bitmap_git->midx, &preferred_pack_pos) < 0) {
+ while (m->base_midx)
+ m = m->base_midx;
+
+ if (midx_preferred_pack(m, &preferred_pack_pos) < 0) {
warning(_("unable to compute preferred pack, disabling pack-reuse"));
return;
}
- pack = bitmap_git->midx->packs[preferred_pack_pos];
+ pack = nth_midxed_pack(m, preferred_pack_pos);
pack_int_id = preferred_pack_pos;
} else {
pack = bitmap_git->pack;
--
2.46.0.86.ge766d390f0.dirty
* [PATCH v2 07/13] pack-bitmap.c: teach `rev-list --test-bitmap` about incremental MIDXs
2024-08-15 22:28 ` [PATCH v2 00/13] midx: incremental multi-pack indexes, part two Taylor Blau
` (5 preceding siblings ...)
2024-08-15 22:29 ` [PATCH v2 06/13] pack-bitmap.c: support bitmap pack-reuse with " Taylor Blau
@ 2024-08-15 22:29 ` Taylor Blau
2024-08-15 22:29 ` [PATCH v2 08/13] pack-bitmap.c: compute disk-usage with " Taylor Blau
` (5 subsequent siblings)
12 siblings, 0 replies; 136+ messages in thread
From: Taylor Blau @ 2024-08-15 22:29 UTC (permalink / raw)
To: git; +Cc: Jeff King, Elijah Newren, Junio C Hamano
Implement support for the special `--test-bitmap` mode of `git rev-list`
when using incremental MIDXs.
The bitmap_test_data structure is extended to contain a "base" pointer
that mirrors the structure of the bitmap chain that it is being used to
test.
When we find a commit to test, we first chase down the ->base pointer to
find the appropriate bitmap_test_data for the bitmap layer that the
given commit is contained within, and then perform the test on that
bitmap.
To implement this, bitmap_for_commit() is reimplemented in terms of a
new function, find_bitmap_for_commit(), which additionally fills out a
pointer indicating the bitmap layer that contains the given commit.
Signed-off-by: Taylor Blau <me@ttaylorr.com>
---
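Not part of the commit: a minimal sketch of how a bit position selects
the per-layer test data, as test_bitmap_type() does in the diff below.
`toy_tdata` and `toy_tdata_for_pos()` are hypothetical; the sketch
assumes that each layer knows how many objects precede it in the chain
and that positions below that count belong to an older layer.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical per-layer test state; not Git's struct bitmap_test_data. */
struct toy_tdata {
	uint32_t num_objects_in_base; /* objects in all older layers */
	struct toy_tdata *base_tdata; /* state for the previous layer, or NULL */
};

/*
 * Return the layer whose bit range covers `pos`. Positions below
 * num_objects_in_base belong to an older layer, so follow base_tdata
 * until the position falls inside the current layer.
 */
struct toy_tdata *toy_tdata_for_pos(struct toy_tdata *tdata, uint32_t pos)
{
	assert(tdata);
	while (pos < tdata->num_objects_in_base) {
		/* A layer that claims older objects must have a base. */
		assert(tdata->base_tdata);
		tdata = tdata->base_tdata;
	}
	return tdata;
}
```

In a chain whose layers start at positions 0, 10, and 25, position 3
lands in the oldest layer, 10 through 24 in the middle one, and 25
onward in the top layer.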
pack-bitmap.c | 105 ++++++++++++++++++++++++++++++++++++++++----------
1 file changed, 84 insertions(+), 21 deletions(-)
diff --git a/pack-bitmap.c b/pack-bitmap.c
index 2b3c53d882..5fea2714c1 100644
--- a/pack-bitmap.c
+++ b/pack-bitmap.c
@@ -943,8 +943,9 @@ static struct stored_bitmap *lazy_bitmap_for_commit(struct bitmap_index *bitmap_
return NULL;
}
-struct ewah_bitmap *bitmap_for_commit(struct bitmap_index *bitmap_git,
- struct commit *commit)
+static struct ewah_bitmap *find_bitmap_for_commit(struct bitmap_index *bitmap_git,
+ struct commit *commit,
+ struct bitmap_index **found)
{
khiter_t hash_pos;
if (!bitmap_git)
@@ -954,18 +955,30 @@ struct ewah_bitmap *bitmap_for_commit(struct bitmap_index *bitmap_git,
if (hash_pos >= kh_end(bitmap_git->bitmaps)) {
struct stored_bitmap *bitmap = NULL;
if (!bitmap_git->table_lookup)
- return bitmap_for_commit(bitmap_git->base, commit);
+ return find_bitmap_for_commit(bitmap_git->base, commit,
+ found);
/* this is a fairly hot codepath - no trace2_region please */
/* NEEDSWORK: cache misses aren't recorded */
bitmap = lazy_bitmap_for_commit(bitmap_git, commit);
if (!bitmap)
- return bitmap_for_commit(bitmap_git->base, commit);
+ return find_bitmap_for_commit(bitmap_git->base, commit,
+ found);
+ if (found)
+ *found = bitmap_git;
return lookup_stored_bitmap(bitmap);
}
+ if (found)
+ *found = bitmap_git;
return lookup_stored_bitmap(kh_value(bitmap_git->bitmaps, hash_pos));
}
+struct ewah_bitmap *bitmap_for_commit(struct bitmap_index *bitmap_git,
+ struct commit *commit)
+{
+ return find_bitmap_for_commit(bitmap_git, commit, NULL);
+}
+
static inline int bitmap_position_extended(struct bitmap_index *bitmap_git,
const struct object_id *oid)
{
@@ -2489,6 +2502,8 @@ struct bitmap_test_data {
struct bitmap *tags;
struct progress *prg;
size_t seen;
+
+ struct bitmap_test_data *base_tdata;
};
static void test_bitmap_type(struct bitmap_test_data *tdata,
@@ -2497,6 +2512,11 @@ static void test_bitmap_type(struct bitmap_test_data *tdata,
enum object_type bitmap_type = OBJ_NONE;
int bitmaps_nr = 0;
+ if (bitmap_is_midx(tdata->bitmap_git)) {
+ while (pos < tdata->bitmap_git->midx->num_objects_in_base)
+ tdata = tdata->base_tdata;
+ }
+
if (bitmap_get(tdata->commits, pos)) {
bitmap_type = OBJ_COMMIT;
bitmaps_nr++;
@@ -2560,13 +2580,57 @@ static void test_show_commit(struct commit *commit, void *data)
display_progress(tdata->prg, ++tdata->seen);
}
+static uint32_t bitmap_total_entry_count(struct bitmap_index *bitmap_git)
+{
+ uint32_t total = 0;
+ do {
+ total = st_add(total, bitmap_git->entry_count);
+ bitmap_git = bitmap_git->base;
+ } while (bitmap_git);
+
+ return total;
+}
+
+static void prepare_bitmap_test_data(struct bitmap_test_data *tdata,
+ struct bitmap_index *bitmap_git)
+{
+ memset(tdata, 0, sizeof(struct bitmap_test_data));
+
+ tdata->bitmap_git = bitmap_git;
+ tdata->base = bitmap_new();
+ tdata->commits = ewah_to_bitmap(bitmap_git->commits);
+ tdata->trees = ewah_to_bitmap(bitmap_git->trees);
+ tdata->blobs = ewah_to_bitmap(bitmap_git->blobs);
+ tdata->tags = ewah_to_bitmap(bitmap_git->tags);
+
+ if (bitmap_git->base) {
+ CALLOC_ARRAY(tdata->base_tdata, 1);
+ prepare_bitmap_test_data(tdata->base_tdata, bitmap_git->base);
+ }
+}
+
+static void free_bitmap_test_data(struct bitmap_test_data *tdata)
+{
+ if (!tdata)
+ return;
+
+ free_bitmap_test_data(tdata->base_tdata);
+ free(tdata->base_tdata);
+
+ bitmap_free(tdata->base);
+ bitmap_free(tdata->commits);
+ bitmap_free(tdata->trees);
+ bitmap_free(tdata->blobs);
+ bitmap_free(tdata->tags);
+}
+
void test_bitmap_walk(struct rev_info *revs)
{
struct object *root;
struct bitmap *result = NULL;
size_t result_popcnt;
struct bitmap_test_data tdata;
- struct bitmap_index *bitmap_git;
+ struct bitmap_index *bitmap_git, *found;
struct ewah_bitmap *bm;
if (!(bitmap_git = prepare_bitmap_git(revs->repo)))
@@ -2575,17 +2639,26 @@ void test_bitmap_walk(struct rev_info *revs)
if (revs->pending.nr != 1)
die(_("you must specify exactly one commit to test"));
- fprintf_ln(stderr, "Bitmap v%d test (%d entries%s)",
+ fprintf_ln(stderr, "Bitmap v%d test (%d entries%s, %d total)",
bitmap_git->version,
bitmap_git->entry_count,
- bitmap_git->table_lookup ? "" : " loaded");
+ bitmap_git->table_lookup ? "" : " loaded",
+ bitmap_total_entry_count(bitmap_git));
root = revs->pending.objects[0].item;
- bm = bitmap_for_commit(bitmap_git, (struct commit *)root);
+ bm = find_bitmap_for_commit(bitmap_git, (struct commit *)root, &found);
if (bm) {
fprintf_ln(stderr, "Found bitmap for '%s'. %d bits / %08x checksum",
- oid_to_hex(&root->oid), (int)bm->bit_size, ewah_checksum(bm));
+ oid_to_hex(&root->oid),
+ (int)bm->bit_size, ewah_checksum(bm));
+
+ if (bitmap_is_midx(found))
+ fprintf_ln(stderr, "Located via MIDX '%s'.",
+ hash_to_hex(get_midx_checksum(found->midx)));
+ else
+ fprintf_ln(stderr, "Located via pack '%s'.",
+ hash_to_hex(found->pack->hash));
result = ewah_to_bitmap(bm);
}
@@ -2602,14 +2675,8 @@ void test_bitmap_walk(struct rev_info *revs)
if (prepare_revision_walk(revs))
die(_("revision walk setup failed"));
- tdata.bitmap_git = bitmap_git;
- tdata.base = bitmap_new();
- tdata.commits = ewah_to_bitmap(bitmap_git->commits);
- tdata.trees = ewah_to_bitmap(bitmap_git->trees);
- tdata.blobs = ewah_to_bitmap(bitmap_git->blobs);
- tdata.tags = ewah_to_bitmap(bitmap_git->tags);
+ prepare_bitmap_test_data(&tdata, bitmap_git);
tdata.prg = start_progress("Verifying bitmap entries", result_popcnt);
- tdata.seen = 0;
traverse_commit_list(revs, &test_show_commit, &test_show_object, &tdata);
@@ -2621,11 +2688,7 @@ void test_bitmap_walk(struct rev_info *revs)
die(_("mismatch in bitmap results"));
bitmap_free(result);
- bitmap_free(tdata.base);
- bitmap_free(tdata.commits);
- bitmap_free(tdata.trees);
- bitmap_free(tdata.blobs);
- bitmap_free(tdata.tags);
+ free_bitmap_test_data(&tdata);
free_bitmap_index(bitmap_git);
}
--
2.46.0.86.ge766d390f0.dirty
* [PATCH v2 08/13] pack-bitmap.c: compute disk-usage with incremental MIDXs
2024-08-15 22:28 ` [PATCH v2 00/13] midx: incremental multi-pack indexes, part two Taylor Blau
` (6 preceding siblings ...)
2024-08-15 22:29 ` [PATCH v2 07/13] pack-bitmap.c: teach `rev-list --test-bitmap` about " Taylor Blau
@ 2024-08-15 22:29 ` Taylor Blau
2024-08-15 22:29 ` [PATCH v2 09/13] pack-bitmap.c: apply pseudo-merge commits " Taylor Blau
` (4 subsequent siblings)
12 siblings, 0 replies; 136+ messages in thread
From: Taylor Blau @ 2024-08-15 22:29 UTC (permalink / raw)
To: git; +Cc: Jeff King, Elijah Newren, Junio C Hamano
As in previous commits, use nth_midxed_pack() instead of accessing the
MIDX's ->packs array directly, so that pack IDs from earlier layers of
an incremental MIDX resolve correctly.
Signed-off-by: Taylor Blau <me@ttaylorr.com>
---
pack-bitmap.c | 4 ++--
1 file changed, 2 insertions(+), 2 deletions(-)
diff --git a/pack-bitmap.c b/pack-bitmap.c
index 5fea2714c1..d3bb78237e 100644
--- a/pack-bitmap.c
+++ b/pack-bitmap.c
@@ -1774,7 +1774,7 @@ static unsigned long get_size_by_pos(struct bitmap_index *bitmap_git,
uint32_t midx_pos = pack_pos_to_midx(bitmap_git->midx, pos);
uint32_t pack_id = nth_midxed_pack_int_id(bitmap_git->midx, midx_pos);
- pack = bitmap_git->midx->packs[pack_id];
+ pack = nth_midxed_pack(bitmap_git->midx, pack_id);
ofs = nth_midxed_offset(bitmap_git->midx, midx_pos);
} else {
pack = bitmap_git->pack;
@@ -3020,7 +3020,7 @@ static off_t get_disk_usage_for_type(struct bitmap_index *bitmap_git,
off_t offset = nth_midxed_offset(bitmap_git->midx, midx_pos);
uint32_t pack_id = nth_midxed_pack_int_id(bitmap_git->midx, midx_pos);
- struct packed_git *pack = bitmap_git->midx->packs[pack_id];
+ struct packed_git *pack = nth_midxed_pack(bitmap_git->midx, pack_id);
if (offset_to_pack_pos(pack, offset, &pack_pos) < 0) {
struct object_id oid;
--
2.46.0.86.ge766d390f0.dirty
* [PATCH v2 09/13] pack-bitmap.c: apply pseudo-merge commits with incremental MIDXs
2024-08-15 22:28 ` [PATCH v2 00/13] midx: incremental multi-pack indexes, part two Taylor Blau
` (7 preceding siblings ...)
2024-08-15 22:29 ` [PATCH v2 08/13] pack-bitmap.c: compute disk-usage with " Taylor Blau
@ 2024-08-15 22:29 ` Taylor Blau
2024-08-15 22:29 ` [PATCH v2 10/13] ewah: implement `struct ewah_or_iterator` Taylor Blau
` (3 subsequent siblings)
12 siblings, 0 replies; 136+ messages in thread
From: Taylor Blau @ 2024-08-15 22:29 UTC (permalink / raw)
To: git; +Cc: Jeff King, Elijah Newren, Junio C Hamano
Prepare for using pseudo-merges with incremental MIDX bitmaps by
attempting to apply pseudo-merges from each layer when encountering a
given commit during a walk.
Signed-off-by: Taylor Blau <me@ttaylorr.com>
---
pack-bitmap.c | 11 ++++++++---
1 file changed, 8 insertions(+), 3 deletions(-)
diff --git a/pack-bitmap.c b/pack-bitmap.c
index d3bb78237e..1fa101bb33 100644
--- a/pack-bitmap.c
+++ b/pack-bitmap.c
@@ -1087,10 +1087,15 @@ static unsigned apply_pseudo_merges_for_commit_1(struct bitmap_index *bitmap_git
struct commit *commit,
uint32_t commit_pos)
{
- int ret;
+ struct bitmap_index *curr = bitmap_git;
+ int ret = 0;
- ret = apply_pseudo_merges_for_commit(&bitmap_git->pseudo_merges,
- result, commit, commit_pos);
+ while (curr) {
+ ret += apply_pseudo_merges_for_commit(&curr->pseudo_merges,
+ result, commit,
+ commit_pos);
+ curr = curr->base;
+ }
if (ret)
pseudo_merges_satisfied_nr += ret;
--
2.46.0.86.ge766d390f0.dirty
* [PATCH v2 10/13] ewah: implement `struct ewah_or_iterator`
2024-08-15 22:28 ` [PATCH v2 00/13] midx: incremental multi-pack indexes, part two Taylor Blau
` (8 preceding siblings ...)
2024-08-15 22:29 ` [PATCH v2 09/13] pack-bitmap.c: apply pseudo-merge commits " Taylor Blau
@ 2024-08-15 22:29 ` Taylor Blau
2024-08-15 22:29 ` [PATCH v2 11/13] pack-bitmap.c: keep track of each layer's type bitmaps Taylor Blau
` (2 subsequent siblings)
12 siblings, 0 replies; 136+ messages in thread
From: Taylor Blau @ 2024-08-15 22:29 UTC (permalink / raw)
To: git; +Cc: Jeff King, Elijah Newren, Junio C Hamano
While individual bitmap layers store different commit, type-level, and
pseudo-merge bitmaps, only the top-most layer is used to compute
reachability traversals.
Many functions which implement the aforementioned traversal rely on
enumerating the results according to the type-level bitmaps, and so
would benefit from a conceptual type-level bitmap that spans multiple
layers.
Implement `struct ewah_or_iterator` which is capable of enumerating
multiple EWAH bitmaps at once, and OR-ing the results together. When
initialized with, for example, all of the commit type bitmaps from each
layer, callers can pretend as if they are enumerating a large type-level
bitmap which contains the commits from *all* bitmap layers.
There are a couple of alternative approaches which were considered:
- Decompress each EWAH bitmap and OR them together, enumerating a
single (non-EWAH) bitmap. This would work, but has the disadvantage
of decompressing a potentially large bitmap, which may not be
necessary if the caller does not wish to read all of it.
- Recursively call bitmap internal functions, reusing the "result" and
"haves" bitmap from the top-most layer. This approach resembles the
original implementation of this feature, but is inefficient in that
it both (a) requires significant refactoring to implement, and (b)
enumerates large sections of later bitmaps which are all zeros (as
they pertain to objects in earlier layers).
(b) is not so bad in and of itself, but can cause significant
slow-downs when combined with expensive loop bodies.
This approach (enumerating an OR'd together version of all of the
type-level bitmaps from each layer) produces a much more
straightforward implementation and requires far less refactoring to
make it work.
Signed-off-by: Taylor Blau <me@ttaylorr.com>
---
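Not part of the commit: the OR-ing iterator described above can be
tried out on plain word arrays. The sketch below swaps EWAH-compressed
bitmaps for uncompressed ones, so `toy_bitmap`, `toy_iter`, and the
`toy_or_iter_*()` functions are hypothetical stand-ins rather than the
ewah API. Only the control flow of the "next" function is meant to
match the diff below.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical uncompressed bitmap; a stand-in for struct ewah_bitmap. */
struct toy_bitmap {
	const uint64_t *words;
	size_t nr;
};

/* Hypothetical word-at-a-time iterator; a stand-in for struct ewah_iterator. */
struct toy_iter {
	const struct toy_bitmap *parent;
	size_t pos;
};

/* Hypothetical analogue of struct ewah_or_iterator. */
struct toy_or_iter {
	struct toy_iter *its;
	size_t nr;
};

/* Store the next word in *next and return 1, or return 0 when exhausted. */
int toy_iter_next(uint64_t *next, struct toy_iter *it)
{
	if (it->pos >= it->parent->nr)
		return 0;
	*next = it->parent->words[it->pos++];
	return 1;
}

/* Set up one sub-iterator per parent. Returns 0 on success, -1 on OOM. */
int toy_or_iter_init(struct toy_or_iter *it,
		     const struct toy_bitmap **parents, size_t nr)
{
	size_t i;

	assert(parents && nr > 0);
	it->nr = 0;
	it->its = malloc(nr * sizeof(*it->its));
	if (!it->its)
		return -1;
	for (i = 0; i < nr; i++) {
		it->its[i].parent = parents[i];
		it->its[i].pos = 0;
	}
	it->nr = nr;
	return 0;
}

/*
 * Advance every sub-iterator by one word and OR the words together.
 * Returns 1 while at least one bitmap still had a word to give, so a
 * shorter bitmap simply contributes zeros past its end.
 */
int toy_or_iter_next(uint64_t *next, struct toy_or_iter *it)
{
	uint64_t buf, out = 0;
	size_t i;
	int ret = 0;

	for (i = 0; i < it->nr; i++)
		if (toy_iter_next(&buf, &it->its[i])) {
			out |= buf;
			ret = 1;
		}

	*next = out;
	return ret;
}

void toy_or_iter_free(struct toy_or_iter *it)
{
	free(it->its);
	it->its = NULL;
	it->nr = 0;
}
```

Callers follow an init/next/free pattern. Nothing is materialized up
front, so a caller that stops early never touches the remaining words
of any layer, which is the advantage claimed over the "decompress and
OR" alternative.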
ewah/ewah_bitmap.c | 33 +++++++++++++++++++++++++++++++++
ewah/ewok.h | 12 ++++++++++++
2 files changed, 45 insertions(+)
diff --git a/ewah/ewah_bitmap.c b/ewah/ewah_bitmap.c
index 8785cbc54a..b3a7ada071 100644
--- a/ewah/ewah_bitmap.c
+++ b/ewah/ewah_bitmap.c
@@ -372,6 +372,39 @@ void ewah_iterator_init(struct ewah_iterator *it, struct ewah_bitmap *parent)
read_new_rlw(it);
}
+void ewah_or_iterator_init(struct ewah_or_iterator *it,
+ struct ewah_bitmap **parents, size_t nr)
+{
+ size_t i;
+
+ memset(it, 0, sizeof(*it));
+
+ ALLOC_ARRAY(it->its, nr);
+ for (i = 0; i < nr; i++)
+ ewah_iterator_init(&it->its[it->nr++], parents[i]);
+}
+
+int ewah_or_iterator_next(eword_t *next, struct ewah_or_iterator *it)
+{
+ eword_t buf, out = 0;
+ size_t i;
+ int ret = 0;
+
+ for (i = 0; i < it->nr; i++)
+ if (ewah_iterator_next(&buf, &it->its[i])) {
+ out |= buf;
+ ret = 1;
+ }
+
+ *next = out;
+ return ret;
+}
+
+void ewah_or_iterator_free(struct ewah_or_iterator *it)
+{
+ free(it->its);
+}
+
void ewah_xor(
struct ewah_bitmap *ewah_i,
struct ewah_bitmap *ewah_j,
diff --git a/ewah/ewok.h b/ewah/ewok.h
index 5e357e2493..4b70641045 100644
--- a/ewah/ewok.h
+++ b/ewah/ewok.h
@@ -148,6 +148,18 @@ void ewah_iterator_init(struct ewah_iterator *it, struct ewah_bitmap *parent);
*/
int ewah_iterator_next(eword_t *next, struct ewah_iterator *it);
+struct ewah_or_iterator {
+ struct ewah_iterator *its;
+ size_t nr;
+};
+
+void ewah_or_iterator_init(struct ewah_or_iterator *it,
+ struct ewah_bitmap **parents, size_t nr);
+
+int ewah_or_iterator_next(eword_t *next, struct ewah_or_iterator *it);
+
+void ewah_or_iterator_free(struct ewah_or_iterator *it);
+
void ewah_xor(
struct ewah_bitmap *ewah_i,
struct ewah_bitmap *ewah_j,
--
2.46.0.86.ge766d390f0.dirty
* [PATCH v2 11/13] pack-bitmap.c: keep track of each layer's type bitmaps
2024-08-15 22:28 ` [PATCH v2 00/13] midx: incremental multi-pack indexes, part two Taylor Blau
` (9 preceding siblings ...)
2024-08-15 22:29 ` [PATCH v2 10/13] ewah: implement `struct ewah_or_iterator` Taylor Blau
@ 2024-08-15 22:29 ` Taylor Blau
2024-08-15 22:29 ` [PATCH v2 12/13] pack-bitmap.c: use `ewah_or_iterator` for type bitmap iterators Taylor Blau
2024-08-15 22:29 ` [PATCH v2 13/13] midx: implement writing incremental MIDX bitmaps Taylor Blau
12 siblings, 0 replies; 136+ messages in thread
From: Taylor Blau @ 2024-08-15 22:29 UTC (permalink / raw)
To: git; +Cc: Jeff King, Elijah Newren, Junio C Hamano
Prepare for reading the type-level bitmaps from previous bitmap layers
by maintaining an array for each type, where each element in that type's
array corresponds to one layer's bitmap for that type.
These fields will be used in a later commit to instantiate the 'struct
ewah_or_iterator' for each type.
Signed-off-by: Taylor Blau <me@ttaylorr.com>
---
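Not part of the commit: a minimal sketch of the array layout described
above, for one type only. `toy_layer` and `toy_load_all_commits()` are
hypothetical, string pointers stand in for `struct ewah_bitmap *`, and
the sketch assumes that `base_nr` counts the layers in the chain
including the current one (as the `base_nr = 1` assignment for
single-pack bitmaps suggests).

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-in for one bitmap layer; not Git's struct bitmap_index. */
struct toy_layer {
	const char *commits;      /* this layer's commit type bitmap */
	struct toy_layer *base;   /* previous (older) layer, or NULL */
	size_t base_nr;           /* layers in the chain, this one included */
	const char **commits_all; /* filled in for the top layer only */
};

/*
 * Fill top->commits_all so that index base_nr-1 holds the top layer's
 * bitmap and index 0 holds the oldest layer's. Returns 0 on success,
 * -1 if the allocation fails.
 */
int toy_load_all_commits(struct toy_layer *top)
{
	struct toy_layer *curr;
	size_t i;

	assert(top && top->base_nr > 0);
	i = top->base_nr;

	top->commits_all = malloc(top->base_nr * sizeof(*top->commits_all));
	if (!top->commits_all)
		return -1;

	for (curr = top; curr; curr = curr->base) {
		assert(i > 0); /* base_nr must match the chain length */
		top->commits_all[--i] = curr->commits;
	}
	assert(i == 0);

	return 0;
}
```

The array borrows the per-layer pointers rather than copying the
bitmaps, so freeing it means freeing only the array itself, which
matches the plain free() calls added to free_bitmap_index() below.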
pack-bitmap.c | 55 +++++++++++++++++++++++++++++++++++++++++++++++----
1 file changed, 51 insertions(+), 4 deletions(-)
diff --git a/pack-bitmap.c b/pack-bitmap.c
index 1fa101bb33..e1badc7887 100644
--- a/pack-bitmap.c
+++ b/pack-bitmap.c
@@ -78,6 +78,24 @@ struct bitmap_index {
struct ewah_bitmap *blobs;
struct ewah_bitmap *tags;
+ /*
+ * Type index arrays when this bitmap is associated with an
+ * incremental multi-pack index chain.
+ *
+ * If n is the number of unique layers in the MIDX chain, then
+ * commits_all[n-1] is this struct's 'commits' field,
+ * commits_all[n-2] is the commits field of this bitmap's
+ * 'base', and so on.
+ *
+ * When associated with either a non-incremental MIDX or a
+ * single packfile, these arrays each contain a single
+ * element.
+ */
+ struct ewah_bitmap **commits_all;
+ struct ewah_bitmap **trees_all;
+ struct ewah_bitmap **blobs_all;
+ struct ewah_bitmap **tags_all;
+
/* Map from object ID -> `stored_bitmap` for all the bitmapped commits */
kh_oid_map_t *bitmaps;
@@ -586,7 +604,29 @@ static int load_reverse_index(struct repository *r, struct bitmap_index *bitmap_
return load_pack_revindex(r, bitmap_git->pack);
}
-static int load_bitmap(struct repository *r, struct bitmap_index *bitmap_git)
+static void load_all_type_bitmaps(struct bitmap_index *bitmap_git)
+{
+ struct bitmap_index *curr = bitmap_git;
+ size_t i = bitmap_git->base_nr - 1;
+
+ ALLOC_ARRAY(bitmap_git->commits_all, bitmap_git->base_nr);
+ ALLOC_ARRAY(bitmap_git->trees_all, bitmap_git->base_nr);
+ ALLOC_ARRAY(bitmap_git->blobs_all, bitmap_git->base_nr);
+ ALLOC_ARRAY(bitmap_git->tags_all, bitmap_git->base_nr);
+
+ while (curr) {
+ bitmap_git->commits_all[i] = curr->commits;
+ bitmap_git->trees_all[i] = curr->trees;
+ bitmap_git->blobs_all[i] = curr->blobs;
+ bitmap_git->tags_all[i] = curr->tags;
+
+ curr = curr->base;
+ i -= 1;
+ }
+}
+
+static int load_bitmap(struct repository *r, struct bitmap_index *bitmap_git,
+ int recursing)
{
assert(bitmap_git->map);
@@ -608,10 +648,13 @@ static int load_bitmap(struct repository *r, struct bitmap_index *bitmap_git)
if (bitmap_git->base) {
if (!bitmap_is_midx(bitmap_git))
BUG("non-MIDX bitmap has non-NULL base bitmap index");
- if (load_bitmap(r, bitmap_git->base) < 0)
+ if (load_bitmap(r, bitmap_git->base, 1) < 0)
goto failed;
}
+ if (!recursing)
+ load_all_type_bitmaps(bitmap_git);
+
return 0;
failed:
@@ -687,7 +730,7 @@ struct bitmap_index *prepare_bitmap_git(struct repository *r)
{
struct bitmap_index *bitmap_git = xcalloc(1, sizeof(*bitmap_git));
- if (!open_bitmap(r, bitmap_git) && !load_bitmap(r, bitmap_git))
+ if (!open_bitmap(r, bitmap_git) && !load_bitmap(r, bitmap_git, 0))
return bitmap_git;
free_bitmap_index(bitmap_git);
@@ -2042,7 +2085,7 @@ struct bitmap_index *prepare_bitmap_walk(struct rev_info *revs,
* from disk. this is the point of no return; after this the rev_list
* becomes invalidated and we must perform the revwalk through bitmaps
*/
- if (load_bitmap(revs->repo, bitmap_git) < 0)
+ if (load_bitmap(revs->repo, bitmap_git, 0) < 0)
goto cleanup;
if (!use_boundary_traversal)
@@ -2957,6 +3000,10 @@ void free_bitmap_index(struct bitmap_index *b)
ewah_pool_free(b->trees);
ewah_pool_free(b->blobs);
ewah_pool_free(b->tags);
+ free(b->commits_all);
+ free(b->trees_all);
+ free(b->blobs_all);
+ free(b->tags_all);
if (b->bitmaps) {
struct stored_bitmap *sb;
kh_foreach_value(b->bitmaps, sb, {
--
2.46.0.86.ge766d390f0.dirty
* [PATCH v2 12/13] pack-bitmap.c: use `ewah_or_iterator` for type bitmap iterators
2024-08-15 22:28 ` [PATCH v2 00/13] midx: incremental multi-pack indexes, part two Taylor Blau
` (10 preceding siblings ...)
2024-08-15 22:29 ` [PATCH v2 11/13] pack-bitmap.c: keep track of each layer's type bitmaps Taylor Blau
@ 2024-08-15 22:29 ` Taylor Blau
2024-08-15 22:29 ` [PATCH v2 13/13] midx: implement writing incremental MIDX bitmaps Taylor Blau
12 siblings, 0 replies; 136+ messages in thread
From: Taylor Blau @ 2024-08-15 22:29 UTC (permalink / raw)
To: git; +Cc: Jeff King, Elijah Newren, Junio C Hamano
Now that we have initialized arrays for each bitmap layer's type bitmaps
in the previous commit, adjust existing callers to use them in
preparation for multi-layered bitmaps.
Signed-off-by: Taylor Blau <me@ttaylorr.com>
---
pack-bitmap.c | 42 +++++++++++++++++++++++++++---------------
1 file changed, 27 insertions(+), 15 deletions(-)
diff --git a/pack-bitmap.c b/pack-bitmap.c
index e1badc7887..9fac43749c 100644
--- a/pack-bitmap.c
+++ b/pack-bitmap.c
@@ -1622,25 +1622,29 @@ static void show_extended_objects(struct bitmap_index *bitmap_git,
}
}
-static void init_type_iterator(struct ewah_iterator *it,
+static void init_type_iterator(struct ewah_or_iterator *it,
struct bitmap_index *bitmap_git,
enum object_type type)
{
switch (type) {
case OBJ_COMMIT:
- ewah_iterator_init(it, bitmap_git->commits);
+ ewah_or_iterator_init(it, bitmap_git->commits_all,
+ bitmap_git->base_nr);
break;
case OBJ_TREE:
- ewah_iterator_init(it, bitmap_git->trees);
+ ewah_or_iterator_init(it, bitmap_git->trees_all,
+ bitmap_git->base_nr);
break;
case OBJ_BLOB:
- ewah_iterator_init(it, bitmap_git->blobs);
+ ewah_or_iterator_init(it, bitmap_git->blobs_all,
+ bitmap_git->base_nr);
break;
case OBJ_TAG:
- ewah_iterator_init(it, bitmap_git->tags);
+ ewah_or_iterator_init(it, bitmap_git->tags_all,
+ bitmap_git->base_nr);
break;
default:
@@ -1657,7 +1661,7 @@ static void show_objects_for_type(
size_t i = 0;
uint32_t offset;
- struct ewah_iterator it;
+ struct ewah_or_iterator it;
eword_t filter;
struct bitmap *objects = bitmap_git->result;
@@ -1665,7 +1669,7 @@ static void show_objects_for_type(
init_type_iterator(&it, bitmap_git, object_type);
for (i = 0; i < objects->word_alloc &&
- ewah_iterator_next(&filter, &it); i++) {
+ ewah_or_iterator_next(&filter, &it); i++) {
eword_t word = objects->words[i] & filter;
size_t pos = (i * BITS_IN_EWORD);
@@ -1707,6 +1711,8 @@ static void show_objects_for_type(
show_reach(&oid, object_type, 0, hash, pack, ofs);
}
}
+
+ ewah_or_iterator_free(&it);
}
static int in_bitmapped_pack(struct bitmap_index *bitmap_git,
@@ -1758,7 +1764,7 @@ static void filter_bitmap_exclude_type(struct bitmap_index *bitmap_git,
{
struct eindex *eindex = &bitmap_git->ext_index;
struct bitmap *tips;
- struct ewah_iterator it;
+ struct ewah_or_iterator it;
eword_t mask;
uint32_t i;
@@ -1775,7 +1781,7 @@ static void filter_bitmap_exclude_type(struct bitmap_index *bitmap_git,
* packfile.
*/
for (i = 0, init_type_iterator(&it, bitmap_git, type);
- i < to_filter->word_alloc && ewah_iterator_next(&mask, &it);
+ i < to_filter->word_alloc && ewah_or_iterator_next(&mask, &it);
i++) {
if (i < tips->word_alloc)
mask &= ~tips->words[i];
@@ -1795,6 +1801,7 @@ static void filter_bitmap_exclude_type(struct bitmap_index *bitmap_git,
bitmap_unset(to_filter, pos);
}
+ ewah_or_iterator_free(&it);
bitmap_free(tips);
}
@@ -1852,14 +1859,14 @@ static void filter_bitmap_blob_limit(struct bitmap_index *bitmap_git,
{
struct eindex *eindex = &bitmap_git->ext_index;
struct bitmap *tips;
- struct ewah_iterator it;
+ struct ewah_or_iterator it;
eword_t mask;
uint32_t i;
tips = find_tip_objects(bitmap_git, tip_objects, OBJ_BLOB);
for (i = 0, init_type_iterator(&it, bitmap_git, OBJ_BLOB);
- i < to_filter->word_alloc && ewah_iterator_next(&mask, &it);
+ i < to_filter->word_alloc && ewah_or_iterator_next(&mask, &it);
i++) {
eword_t word = to_filter->words[i] & mask;
unsigned offset;
@@ -1887,6 +1894,7 @@ static void filter_bitmap_blob_limit(struct bitmap_index *bitmap_git,
bitmap_unset(to_filter, pos);
}
+ ewah_or_iterator_free(&it);
bitmap_free(tips);
}
@@ -2502,12 +2510,12 @@ static uint32_t count_object_type(struct bitmap_index *bitmap_git,
struct eindex *eindex = &bitmap_git->ext_index;
uint32_t i = 0, count = 0;
- struct ewah_iterator it;
+ struct ewah_or_iterator it;
eword_t filter;
init_type_iterator(&it, bitmap_git, type);
- while (i < objects->word_alloc && ewah_iterator_next(&filter, &it)) {
+ while (i < objects->word_alloc && ewah_or_iterator_next(&filter, &it)) {
eword_t word = objects->words[i++] & filter;
count += ewah_bit_popcount64(word);
}
@@ -2519,6 +2527,8 @@ static uint32_t count_object_type(struct bitmap_index *bitmap_git,
count++;
}
+ ewah_or_iterator_free(&it);
+
return count;
}
@@ -3046,13 +3056,13 @@ static off_t get_disk_usage_for_type(struct bitmap_index *bitmap_git,
{
struct bitmap *result = bitmap_git->result;
off_t total = 0;
- struct ewah_iterator it;
+ struct ewah_or_iterator it;
eword_t filter;
size_t i;
init_type_iterator(&it, bitmap_git, object_type);
for (i = 0; i < result->word_alloc &&
- ewah_iterator_next(&filter, &it); i++) {
+ ewah_or_iterator_next(&filter, &it); i++) {
eword_t word = result->words[i] & filter;
size_t base = (i * BITS_IN_EWORD);
unsigned offset;
@@ -3093,6 +3103,8 @@ static off_t get_disk_usage_for_type(struct bitmap_index *bitmap_git,
}
}
+ ewah_or_iterator_free(&it);
+
return total;
}
--
2.46.0.86.ge766d390f0.dirty
* [PATCH v2 13/13] midx: implement writing incremental MIDX bitmaps
2024-08-15 22:28 ` [PATCH v2 00/13] midx: incremental multi-pack indexes, part two Taylor Blau
` (11 preceding siblings ...)
2024-08-15 22:29 ` [PATCH v2 12/13] pack-bitmap.c: use `ewah_or_iterator` for type bitmap iterators Taylor Blau
@ 2024-08-15 22:29 ` Taylor Blau
2024-08-28 17:55 ` [PATCH] fixup! " Junio C Hamano
12 siblings, 1 reply; 136+ messages in thread
From: Taylor Blau @ 2024-08-15 22:29 UTC (permalink / raw)
To: git; +Cc: Jeff King, Elijah Newren, Junio C Hamano
Now that the pack-bitmap machinery has learned how to read and interact
with an incremental MIDX bitmap, teach the pack-bitmap-write.c machinery
(and relevant callers from within the MIDX machinery) to write such
bitmaps.
The details for doing so are mostly straightforward. The main changes
are as follows:
- find_object_pos() now makes use of an extra MIDX parameter which is
used to locate the bit positions of objects which are from previous
layers (and thus do not exist in the current layer's pack_order
field).
(Note also that the pack_order field is moved into struct
write_midx_context to further simplify the callers for
write_midx_bitmap()).
- bitmap_writer_build_type_index() first determines how many objects
precede the current bitmap layer and offsets the bits it sets in
each respective type-level bitmap by that amount so they can be OR'd
together.
Signed-off-by: Taylor Blau <me@ttaylorr.com>
---
builtin/pack-objects.c | 3 +-
midx-write.c | 35 +++++++----
pack-bitmap-write.c | 65 ++++++++++++++-----
pack-bitmap.h | 4 +-
t/t5334-incremental-multi-pack-index.sh | 84 +++++++++++++++++++++++++
5 files changed, 161 insertions(+), 30 deletions(-)
diff --git a/builtin/pack-objects.c b/builtin/pack-objects.c
index e23c4950ed..9e61ff7ca3 100644
--- a/builtin/pack-objects.c
+++ b/builtin/pack-objects.c
@@ -1342,7 +1342,8 @@ static void write_pack_file(void)
if (write_bitmap_index) {
bitmap_writer_init(&bitmap_writer,
- the_repository, &to_pack);
+ the_repository, &to_pack,
+ NULL);
bitmap_writer_set_checksum(&bitmap_writer, hash);
bitmap_writer_build_type_index(&bitmap_writer,
written_list);
diff --git a/midx-write.c b/midx-write.c
index 81501efdda..bac3b0589a 100644
--- a/midx-write.c
+++ b/midx-write.c
@@ -826,20 +826,26 @@ static struct commit **find_commits_for_midx_bitmap(uint32_t *indexed_commits_nr
return cb.commits;
}
-static int write_midx_bitmap(const char *midx_name,
+static int write_midx_bitmap(struct write_midx_context *ctx,
+ const char *object_dir, const char *midx_name,
const unsigned char *midx_hash,
struct packing_data *pdata,
struct commit **commits,
uint32_t commits_nr,
- uint32_t *pack_order,
unsigned flags)
{
int ret, i;
uint16_t options = 0;
struct bitmap_writer writer;
struct pack_idx_entry **index;
- char *bitmap_name = xstrfmt("%s-%s.bitmap", midx_name,
- hash_to_hex(midx_hash));
+ struct strbuf bitmap_name = STRBUF_INIT;
+
+ if (ctx->incremental)
+ get_split_midx_filename_ext(&bitmap_name, object_dir, midx_hash,
+ MIDX_EXT_BITMAP);
+ else
+ get_midx_filename_ext(&bitmap_name, object_dir, midx_hash,
+ MIDX_EXT_BITMAP);
trace2_region_enter("midx", "write_midx_bitmap", the_repository);
@@ -858,7 +864,8 @@ static int write_midx_bitmap(const char *midx_name,
for (i = 0; i < pdata->nr_objects; i++)
index[i] = &pdata->objects[i].idx;
- bitmap_writer_init(&writer, the_repository, pdata);
+ bitmap_writer_init(&writer, the_repository, pdata,
+ ctx->incremental ? ctx->base_midx : NULL);
bitmap_writer_show_progress(&writer, flags & MIDX_PROGRESS);
bitmap_writer_build_type_index(&writer, index);
@@ -876,7 +883,7 @@ static int write_midx_bitmap(const char *midx_name,
* bitmap_writer_finish().
*/
for (i = 0; i < pdata->nr_objects; i++)
- index[pack_order[i]] = &pdata->objects[i].idx;
+ index[ctx->pack_order[i]] = &pdata->objects[i].idx;
bitmap_writer_select_commits(&writer, commits, commits_nr);
ret = bitmap_writer_build(&writer);
@@ -884,11 +891,11 @@ static int write_midx_bitmap(const char *midx_name,
goto cleanup;
bitmap_writer_set_checksum(&writer, midx_hash);
- bitmap_writer_finish(&writer, index, bitmap_name, options);
+ bitmap_writer_finish(&writer, index, bitmap_name.buf, options);
cleanup:
free(index);
- free(bitmap_name);
+ strbuf_release(&bitmap_name);
bitmap_writer_free(&writer);
trace2_region_leave("midx", "write_midx_bitmap", the_repository);
@@ -1072,8 +1079,6 @@ static int write_midx_internal(const char *object_dir,
trace2_region_enter("midx", "write_midx_internal", the_repository);
ctx.incremental = !!(flags & MIDX_WRITE_INCREMENTAL);
- if (ctx.incremental && (flags & MIDX_WRITE_BITMAP))
- die(_("cannot write incremental MIDX with bitmap"));
if (ctx.incremental)
strbuf_addf(&midx_name,
@@ -1115,6 +1120,12 @@ static int write_midx_internal(const char *object_dir,
if (ctx.incremental) {
struct multi_pack_index *m = ctx.base_midx;
while (m) {
+ if (flags & MIDX_WRITE_BITMAP && load_midx_revindex(m)) {
+ error(_("could not load reverse index for MIDX %s"),
+ hash_to_hex(get_midx_checksum(m)));
+ result = 1;
+ goto cleanup;
+ }
ctx.num_multi_pack_indexes_before++;
m = m->base_midx;
}
@@ -1404,8 +1415,8 @@ static int write_midx_internal(const char *object_dir,
FREE_AND_NULL(ctx.entries);
ctx.entries_nr = 0;
- if (write_midx_bitmap(midx_name.buf, midx_hash, &pdata,
- commits, commits_nr, ctx.pack_order,
+ if (write_midx_bitmap(&ctx, object_dir, midx_name.buf,
+ midx_hash, &pdata, commits, commits_nr,
flags) < 0) {
error(_("could not write multi-pack bitmap"));
result = 1;
diff --git a/pack-bitmap-write.c b/pack-bitmap-write.c
index 923f793cec..8fc979cbc9 100644
--- a/pack-bitmap-write.c
+++ b/pack-bitmap-write.c
@@ -25,6 +25,8 @@
#include "alloc.h"
#include "refs.h"
#include "strmap.h"
+#include "midx.h"
+#include "pack-revindex.h"
struct bitmapped_commit {
struct commit *commit;
@@ -42,7 +44,8 @@ static inline int bitmap_writer_nr_selected_commits(struct bitmap_writer *writer
}
void bitmap_writer_init(struct bitmap_writer *writer, struct repository *r,
- struct packing_data *pdata)
+ struct packing_data *pdata,
+ struct multi_pack_index *midx)
{
memset(writer, 0, sizeof(struct bitmap_writer));
if (writer->bitmaps)
@@ -50,6 +53,7 @@ void bitmap_writer_init(struct bitmap_writer *writer, struct repository *r,
writer->bitmaps = kh_init_oid_map();
writer->pseudo_merge_commits = kh_init_oid_map();
writer->to_pack = pdata;
+ writer->midx = midx;
string_list_init_dup(&writer->pseudo_merge_groups);
@@ -104,6 +108,11 @@ void bitmap_writer_build_type_index(struct bitmap_writer *writer,
struct pack_idx_entry **index)
{
uint32_t i;
+ uint32_t base_objects = 0;
+
+ if (writer->midx)
+ base_objects = writer->midx->num_objects +
+ writer->midx->num_objects_in_base;
writer->commits = ewah_new();
writer->trees = ewah_new();
@@ -133,19 +142,19 @@ void bitmap_writer_build_type_index(struct bitmap_writer *writer,
switch (real_type) {
case OBJ_COMMIT:
- ewah_set(writer->commits, i);
+ ewah_set(writer->commits, i + base_objects);
break;
case OBJ_TREE:
- ewah_set(writer->trees, i);
+ ewah_set(writer->trees, i + base_objects);
break;
case OBJ_BLOB:
- ewah_set(writer->blobs, i);
+ ewah_set(writer->blobs, i + base_objects);
break;
case OBJ_TAG:
- ewah_set(writer->tags, i);
+ ewah_set(writer->tags, i + base_objects);
break;
default:
@@ -198,19 +207,37 @@ void bitmap_writer_push_commit(struct bitmap_writer *writer,
static uint32_t find_object_pos(struct bitmap_writer *writer,
const struct object_id *oid, int *found)
{
- struct object_entry *entry = packlist_find(writer->to_pack, oid);
+ struct object_entry *entry;
+
+ entry = packlist_find(writer->to_pack, oid);
+ if (entry) {
+ uint32_t base_objects = 0;
+ if (writer->midx)
+ base_objects = writer->midx->num_objects +
+ writer->midx->num_objects_in_base;
+
+ if (found)
+ *found = 1;
+ return oe_in_pack_pos(writer->to_pack, entry) + base_objects;
+ } else if (writer->midx) {
+ uint32_t at, pos;
+
+ if (!bsearch_midx(oid, writer->midx, &at))
+ goto missing;
+ if (midx_to_pack_pos(writer->midx, at, &pos) < 0)
+ goto missing;
- if (!entry) {
if (found)
- *found = 0;
- warning("Failed to write bitmap index. Packfile doesn't have full closure "
- "(object %s is missing)", oid_to_hex(oid));
- return 0;
+ *found = 1;
+ return pos;
}
+missing:
if (found)
- *found = 1;
- return oe_in_pack_pos(writer->to_pack, entry);
+ *found = 0;
+ warning("Failed to write bitmap index. Packfile doesn't have full closure "
+ "(object %s is missing)", oid_to_hex(oid));
+ return 0;
}
static void compute_xor_offsets(struct bitmap_writer *writer)
@@ -577,7 +604,7 @@ int bitmap_writer_build(struct bitmap_writer *writer)
struct prio_queue queue = { compare_commits_by_gen_then_commit_date };
struct prio_queue tree_queue = { NULL };
struct bitmap_index *old_bitmap;
- uint32_t *mapping;
+ uint32_t *mapping = NULL;
int closed = 1; /* until proven otherwise */
if (writer->show_progress)
@@ -1010,7 +1037,7 @@ void bitmap_writer_finish(struct bitmap_writer *writer,
struct strbuf tmp_file = STRBUF_INIT;
struct hashfile *f;
off_t *offsets = NULL;
- uint32_t i;
+ uint32_t i, base_objects;
struct bitmap_disk_header header;
@@ -1036,6 +1063,12 @@ void bitmap_writer_finish(struct bitmap_writer *writer,
if (options & BITMAP_OPT_LOOKUP_TABLE)
CALLOC_ARRAY(offsets, writer->to_pack->nr_objects);
+ if (writer->midx)
+ base_objects = writer->midx->num_objects +
+ writer->midx->num_objects_in_base;
+ else
+ base_objects = 0;
+
for (i = 0; i < bitmap_writer_nr_selected_commits(writer); i++) {
struct bitmapped_commit *stored = &writer->selected[i];
int commit_pos = oid_pos(&stored->commit->object.oid, index,
@@ -1044,7 +1077,7 @@ void bitmap_writer_finish(struct bitmap_writer *writer,
if (commit_pos < 0)
BUG(_("trying to write commit not in index"));
- stored->commit_pos = commit_pos;
+ stored->commit_pos = commit_pos + base_objects;
}
write_selected_commits_v1(writer, f, offsets);
diff --git a/pack-bitmap.h b/pack-bitmap.h
index ff0fd815b8..4242458198 100644
--- a/pack-bitmap.h
+++ b/pack-bitmap.h
@@ -110,6 +110,7 @@ struct bitmap_writer {
kh_oid_map_t *bitmaps;
struct packing_data *to_pack;
+ struct multi_pack_index *midx; /* if appending to a MIDX chain */
struct bitmapped_commit *selected;
unsigned int selected_nr, selected_alloc;
@@ -124,7 +125,8 @@ struct bitmap_writer {
};
void bitmap_writer_init(struct bitmap_writer *writer, struct repository *r,
- struct packing_data *pdata);
+ struct packing_data *pdata,
+ struct multi_pack_index *midx);
void bitmap_writer_show_progress(struct bitmap_writer *writer, int show);
void bitmap_writer_set_checksum(struct bitmap_writer *writer,
const unsigned char *sha1);
diff --git a/t/t5334-incremental-multi-pack-index.sh b/t/t5334-incremental-multi-pack-index.sh
index c3b08acc73..0b6d45c8fd 100755
--- a/t/t5334-incremental-multi-pack-index.sh
+++ b/t/t5334-incremental-multi-pack-index.sh
@@ -43,4 +43,88 @@ test_expect_success 'convert incremental to non-incremental' '
compare_results_with_midx 'non-incremental MIDX conversion'
+write_midx_layer () {
+ n=1
+ if test -f $midx_chain
+ then
+ n="$(($(wc -l <$midx_chain) + 1))"
+ fi
+
+ for i in 1 2
+ do
+ test_commit $n.$i &&
+ git repack -d || return 1
+ done &&
+ git multi-pack-index write --bitmap --incremental
+}
+
+test_expect_success 'write initial MIDX layer' '
+ git repack -ad &&
+ write_midx_layer
+'
+
+test_expect_success 'read bitmap from first MIDX layer' '
+ git rev-list --test-bitmap 1.2
+'
+
+test_expect_success 'write another MIDX layer' '
+ write_midx_layer
+'
+
+test_expect_success 'midx verify with multiple layers' '
+ git multi-pack-index verify
+'
+
+test_expect_success 'read bitmap from second MIDX layer' '
+ git rev-list --test-bitmap 2.2
+'
+
+test_expect_success 'read earlier bitmap from second MIDX layer' '
+ git rev-list --test-bitmap 1.2
+'
+
+test_expect_success 'show object from first pack' '
+ git cat-file -p 1.1
+'
+
+test_expect_success 'show object from second pack' '
+ git cat-file -p 2.2
+'
+
+for reuse in false single multi
+do
+ test_expect_success "full clone (pack.allowPackReuse=$reuse)" '
+ rm -fr clone.git &&
+
+ git config pack.allowPackReuse $reuse &&
+ git clone --no-local --bare . clone.git
+ '
+done
+
+test_expect_success 'relink existing MIDX layer' '
+ rm -fr "$midxdir" &&
+
+ GIT_TEST_MIDX_WRITE_REV=1 git multi-pack-index write --bitmap &&
+
+ midx_hash="$(test-tool read-midx --checksum $objdir)" &&
+
+ test_path_is_file "$packdir/multi-pack-index" &&
+ test_path_is_file "$packdir/multi-pack-index-$midx_hash.bitmap" &&
+ test_path_is_file "$packdir/multi-pack-index-$midx_hash.rev" &&
+
+ test_commit another &&
+ git repack -d &&
+ git multi-pack-index write --bitmap --incremental &&
+
+ test_path_is_missing "$packdir/multi-pack-index" &&
+ test_path_is_missing "$packdir/multi-pack-index-$midx_hash.bitmap" &&
+ test_path_is_missing "$packdir/multi-pack-index-$midx_hash.rev" &&
+
+ test_path_is_file "$midxdir/multi-pack-index-$midx_hash.midx" &&
+ test_path_is_file "$midxdir/multi-pack-index-$midx_hash.bitmap" &&
+ test_path_is_file "$midxdir/multi-pack-index-$midx_hash.rev" &&
+ test_line_count = 2 "$midx_chain"
+
+'
+
test_done
--
2.46.0.86.ge766d390f0.dirty
* [PATCH] fixup! midx: implement writing incremental MIDX bitmaps
2024-08-15 22:29 ` [PATCH v2 13/13] midx: implement writing incremental MIDX bitmaps Taylor Blau
@ 2024-08-28 17:55 ` Junio C Hamano
2024-08-28 18:33 ` Jeff King
0 siblings, 1 reply; 136+ messages in thread
From: Junio C Hamano @ 2024-08-28 17:55 UTC (permalink / raw)
To: Taylor Blau; +Cc: git, Jeff King
With -Wunused, the compiler notices that the midx_name parameter is
unused. In this case, it is truly unused and the function signature is
not constrained externally, so we can simply drop the parameter from
the definition of the function and its sole caller.
This comes from 01a2cbab (midx: implement writing incremental MIDX
bitmaps, 2024-08-15), so I'll squash the following to that commit.
midx-write.c | 4 ++--
1 file changed, 2 insertions(+), 2 deletions(-)
diff --git c/midx-write.c w/midx-write.c
index bac3b0589a..0ad9139fdb 100644
--- c/midx-write.c
+++ w/midx-write.c
@@ -827,7 +827,7 @@ static struct commit **find_commits_for_midx_bitmap(uint32_t *indexed_commits_nr
}
static int write_midx_bitmap(struct write_midx_context *ctx,
- const char *object_dir, const char *midx_name,
+ const char *object_dir,
const unsigned char *midx_hash,
struct packing_data *pdata,
struct commit **commits,
@@ -1415,7 +1415,7 @@ static int write_midx_internal(const char *object_dir,
FREE_AND_NULL(ctx.entries);
ctx.entries_nr = 0;
- if (write_midx_bitmap(&ctx, object_dir, midx_name.buf,
+ if (write_midx_bitmap(&ctx, object_dir,
midx_hash, &pdata, commits, commits_nr,
flags) < 0) {
error(_("could not write multi-pack bitmap"));
* Re: [PATCH] fixup! midx: implement writing incremental MIDX bitmaps
2024-08-28 17:55 ` [PATCH] fixup! " Junio C Hamano
@ 2024-08-28 18:33 ` Jeff King
2024-08-29 18:57 ` Taylor Blau
0 siblings, 1 reply; 136+ messages in thread
From: Jeff King @ 2024-08-28 18:33 UTC (permalink / raw)
To: Junio C Hamano; +Cc: Taylor Blau, git
On Wed, Aug 28, 2024 at 10:55:20AM -0700, Junio C Hamano wrote:
> With -Wunused, the compiler notices that the midx_name parameter is
> unused. In this case, it is truly unused, the function signature is
> not constrained externally, so we can simply drop the parameter from
> the definition of the function and its sole caller.
>
> This comes from 01a2cbab (midx: implement writing incremental MIDX
> bitmaps, 2024-08-15), so I'll squash the following to that commit.
Well, it didn't take long for this to come up again. :) I've been
fixing them progressively as they hit 'next' (since I don't usually
build 'seen'), but this one isn't there yet.
I'm always curious in these cases why we have the parameter at all if
it's unnecessary (i.e., is it a bug or leftover cruft). In this case, it
was present before that commit, but refactoring meant that we no longer
write to $name-$hash.bitmap, but instead use get_midx_filename_ext(), or
get_split_midx_filename_ext() in incremental mode.
Is that right, though? It looks like the caller might pass in a
tempfile name like .../pack/multi-pack-index.d/tmp_midx_XXXXXX,
if we're in incremental mode. But we'll write directly to
"multi-pack-index-$hash.bitmap" in the same directory. I'm not sure to
what degree it matters, since that's the name we want in the long run.
But would we possibly overwrite an active-in-use file rather than doing
the atomic rename-into-place if we happened to generate the same midx?
It feels like we should still respect the name the caller is using for
tempfiles, and then rename it into the correct spot at the end.
-Peff
* Re: [PATCH] fixup! midx: implement writing incremental MIDX bitmaps
2024-08-28 18:33 ` Jeff King
@ 2024-08-29 18:57 ` Taylor Blau
2024-08-29 19:27 ` Jeff King
0 siblings, 1 reply; 136+ messages in thread
From: Taylor Blau @ 2024-08-29 18:57 UTC (permalink / raw)
To: Jeff King; +Cc: Junio C Hamano, git
On Wed, Aug 28, 2024 at 02:33:56PM -0400, Jeff King wrote:
> Is that right, though? It looks like the caller might pass in a
> tempfile name like .../pack/multi-pack-index.d/tmp_midx_XXXXXX,
> if we're in incremental mode. But we'll write directly to
> "multi-pack-index-$hash.bitmap" in the same directory. I'm not sure to
> what degree it matters, since that's the name we want in the long run.
> But would we possibly overwrite an active-in-use file rather than doing
> the atomic rename-into-place if we happened to generate the same midx?
>
> It feels like we should still respect the name the caller is using for
> tempfiles, and then rename it into the correct spot at the end.
In either case, we're going to write to a temporary file initialized by
the pack-bitmap machinery and then rename() it into place at the end of
bitmap_writer_finish().
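To illustrate the write-to-tempfile-then-rename() pattern in isolation,
here is a minimal POSIX sketch. It is not the pack-bitmap code:
write_file_atomically() and the ".tmp_XXXXXX" suffix are hypothetical
names made up for this example, and Git's own tempfile handling does
more than this.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>

/*
 * Hypothetical helper: write `len` bytes to a temporary file created
 * next to `final_path`, then rename() it into place. A reader opening
 * `final_path` sees either the old file or the complete new one.
 * Returns 0 on success, -1 on failure.
 */
static int write_file_atomically(const char *final_path,
				 const void *buf, size_t len)
{
	char tmp_path[4096];
	const char *p = buf;
	int fd, n;

	assert(final_path && buf);

	/* mkstemp() requires a template ending in "XXXXXX". */
	n = snprintf(tmp_path, sizeof(tmp_path), "%s.tmp_XXXXXX", final_path);
	if (n < 0 || (size_t)n >= sizeof(tmp_path))
		return -1;

	fd = mkstemp(tmp_path);
	if (fd < 0)
		return -1;

	while (len > 0) {
		ssize_t written = write(fd, p, len);
		if (written < 0) {
			if (errno == EINTR)
				continue; /* interrupted, retry */
			close(fd);
			unlink(tmp_path);
			return -1;
		}
		p += written;
		len -= (size_t)written;
	}

	/* Only a fully written file is moved to its final name. */
	if (close(fd) < 0 || rename(tmp_path, final_path) < 0) {
		unlink(tmp_path);
		return -1;
	}
	return 0;
}
```

A caller passes only the final name (for example, the
multi-pack-index-$hash.bitmap path). The temporary name never leaves
the helper, so there is nothing left for the caller to move into place.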
On the caller side, in the non-incremental mode, we'll pass
$GIT_DIR/objects/pack/multi-pack-index-$hash.bitmap as the location,
write its contents into a temporary file, and then rename() it there.
But in the incremental mode this series introduces, I think it would be
a bug to pass a tmp_midx_XXXXXX file path there, since nobody would move
it from tmp_midx_XXXXXX-$HASH.bitmap into its final location.
So I think what's written here with the fixup! patch is right (and
should be squashed into 13/13 in the next round), but let me know if I'm
missing something.
Thanks,
Taylor
* Re: [PATCH] fixup! midx: implement writing incremental MIDX bitmaps
2024-08-29 18:57 ` Taylor Blau
@ 2024-08-29 19:27 ` Jeff King
2024-11-19 20:56 ` Taylor Blau
0 siblings, 1 reply; 136+ messages in thread
From: Jeff King @ 2024-08-29 19:27 UTC (permalink / raw)
To: Taylor Blau; +Cc: Junio C Hamano, git
On Thu, Aug 29, 2024 at 02:57:08PM -0400, Taylor Blau wrote:
> On Wed, Aug 28, 2024 at 02:33:56PM -0400, Jeff King wrote:
> > Is that right, though? It looks like the caller might pass in a
> > tempfile name like .../pack/multi-pack-index.d/tmp_midx_XXXXXX,
> > if we're in incremental mode. But we'll write directly to
> > "multi-pack-index-$hash.bitmap" in the same directory. I'm not sure to
> > what degree it matters, since that's the name we want in the long run.
> > But would we possibly overwrite an active-in-use file rather than doing
> > the atomic rename-into-place if we happened to generate the same midx?
> >
> > It feels like we should still respect the name the caller is using for
> > tempfiles, and then rename it into the correct spot at the end.
>
> In either case, we're going to write to a temporary file initialized by
> the pack-bitmap machinery and then rename() it into place at the end of
> bitmap_writer_finish().
OK, that addresses my worry, if we're always writing to a tempfile (and
I verified with some recent stracing that this is the case). So renaming
that into tmp_midx_XXXXXX.bitmap would just be a pointless extra layer
of renames.
I do wonder if it's possible for us to generate a new different revindex
and bitmap pair for the same midx hash, and for a reader to see a
mismatched set for a moment. But that's an atomicity problem, and an
extra layer of renames is not going to solve that.
> On the caller side, in the non-incremental mode, we'll pass
> $GIT_DIR/objects/pack/multi-pack-index-$hash.bitmap as the location,
> write its contents into a temporary file, and then rename() it there.
>
> But in the incremental mode this series introduces, I think it would be
> a bug to pass a tmp_midx_XXXXXX file path there, since nobody would move
> it from tmp_midx_XXXXXX-$HASH.bitmap into its final location.
>
> So I think what's written here with the fixup! patch is right (and
> should be squashed into 13/13 in the next round), but let me know if I'm
> missing something.
What confused me is that write_midx_reverse_index() _does_ still take
midx_name, and respects it. But I think that is a bug!
We do not usually even call that function, since modern midx's have a
RIDX chunk inside them instead of a separate file. But if you do this:
# generate an extra pack
git commit --allow-empty -m foo
git repack -d
# make an incremental midx with a .rev file; usually this ends up
# as a RIDX chunk, so we have to force it.
GIT_TEST_MIDX_WRITE_REV=1 git multi-pack-index write --incremental --bitmap
then you'll end up with a tmp_midx_XXXXXX-*.rev file leftover in
multi-pack-index.d (since, as you note, nobody is moving those into
place).
So probably write_midx_reverse_index() needs the same treatment to
derive its own filenames for the incremental case, and to drop the
midx_name parameter.
Or I wonder if we could simply drop the code to write a separate .rev
file entirely? I don't think there's a reason anybody would want it.
-Peff
* Re: [PATCH] fixup! midx: implement writing incremental MIDX bitmaps
2024-08-29 19:27 ` Jeff King
@ 2024-11-19 20:56 ` Taylor Blau
0 siblings, 0 replies; 136+ messages in thread
From: Taylor Blau @ 2024-11-19 20:56 UTC (permalink / raw)
To: Jeff King; +Cc: Junio C Hamano, git
On Thu, Aug 29, 2024 at 03:27:13PM -0400, Jeff King wrote:
> On Thu, Aug 29, 2024 at 02:57:08PM -0400, Taylor Blau wrote:
>
> > On Wed, Aug 28, 2024 at 02:33:56PM -0400, Jeff King wrote:
> > > Is that right, though? It looks like the caller might pass in a
> > > tempfile name like .../pack/multi-pack-index.d/tmp_midx_XXXXXX,
> > > if we're in incremental mode. But we'll write directly to
> > > "multi-pack-index-$hash.bitmap" in the same directory. I'm not sure to
> > > what degree it matters, since that's the name we want in the long run.
> > > But would we possibly overwrite an active-in-use file rather than doing
> > > the atomic rename-into-place if we happened to generate the same midx?
> > >
> > > It feels like we should still respect the name the caller is using for
> > > tempfiles, and then rename it into the correct spot at the end.
> >
> > In either case, we're going to write to a temporary file initialized by
> > the pack-bitmap machinery and then rename() it into place at the end of
> > bitmap_writer_finish().
>
> OK, that addresses my worry, if we're always writing to a tempfile (and
> I verified with some recent stracing that this is the case). So renaming
> that into tmp_midx_XXXXXX.bitmap would just be a pointless extra layer
> of renames.
Yeah, I think we are OK here.
> I do wonder if it's possible for us to generate a new different revindex
> and bitmap pair for the same midx hash, and for a reader to see a
> mismatched set for a moment. But that's an atomicity problem, and an
> extra layer of renames is not going to solve that.
What you're describing is basically the bug that we fixed in 95e8383bac
(midx.c: make changing the preferred pack safe, 2022-01-25). That commit
sought to ensure that there was no way to have a different reverse index
(IOW, pseudo-pack order) for a given MIDX hash.
It's possible that there is some case that we're not yet covering, but I
can't think of it. The things that we care about are (a) the set of
objects, (b) the set of packs those objects came from, and (c) the
preferred pack. We don't store (c) directly, but we can infer it from
the reverse index, which we do write within the RIDX chunk as you note
below.
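As a rough illustration of inferring the preferred pack from the
reverse index, consider the sketch below. It assumes that the
pseudo-pack order places the preferred pack's objects first, so the
object at pseudo-pack position 0 identifies that pack. struct toy_midx
and toy_preferred_pack() are hypothetical stand-ins, not Git's data
structures.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Simplified stand-in for a MIDX (hypothetical layout):
 *  - pack_of[i] is the pack holding the object at lexical
 *    (OID-sorted) position i;
 *  - revindex[p] is the lexical position of the object at
 *    pseudo-pack position p.
 */
struct toy_midx {
	const uint32_t *pack_of;
	const uint32_t *revindex;
	size_t num_objects;
};

/*
 * Under the stated assumption, the first object in pseudo-pack order
 * belongs to the preferred pack. Returns 0 and fills *pack_id on
 * success, or -1 if the MIDX has no objects.
 */
static int toy_preferred_pack(const struct toy_midx *m, uint32_t *pack_id)
{
	assert(m && pack_id);

	if (!m->num_objects)
		return -1;
	*pack_id = m->pack_of[m->revindex[0]];
	return 0;
}
```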
> > On the caller side, in the non-incremental mode, we'll pass
> > $GIT_DIR/objects/pack/multi-pack-index-$hash.bitmap as the location,
> > write its contents into a temporary file, and then rename() it there.
> >
> > But in the incremental mode this series introduces, I think it would be
> > a bug to pass a tmp_midx_XXXXXX file path there, since nobody would move
> > it from tmp_midx_XXXXXX-$HASH.bitmap into its final location.
> >
> > So I think what's written here with the fixup! patch is right (and
> > should be squashed into 13/13 in the next round), but let me know if I'm
> > missing something.
>
> What confused me is that write_midx_reverse_index() _does_ still take
> midx_name, and respects it. But I think that is a bug!
>
> We do not usually even call that function, since modern midx's have a
> RIDX chunk inside them instead of a separate file. But if you do this:
>
> # generate an extra pack
> git commit --allow-empty -m foo
> git repack -d
>
> # make an incremental midx with a .rev file; usually this ends up
> # as a RIDX chunk, so we have to force it.
> GIT_TEST_MIDX_WRITE_REV=1 git multi-pack-index write --incremental --bitmap
>
> then you'll end up with a tmp_midx_XXXXXX-*.rev file leftover in
> multi-pack-index.d (since, as you note, nobody is moving those into
> place).
>
> So probably write_midx_reverse_index() needs the same treatment to
> derive its own filenames for the incremental case, and to drop the
> midx_name parameter.
>
> Or I wonder if we could simply drop the code to write a separate .rev
> file entirely? I don't think there's a reason anybody would want it.
I would kind of like to get rid of it, but we use it in a couple of
places in the test suite:
$ git grep GIT_TEST_MIDX_WRITE_REV=
t/t5327-multi-pack-bitmaps-rev.sh:GIT_TEST_MIDX_WRITE_REV=1
t/t5334-incremental-multi-pack-index.sh: GIT_TEST_MIDX_WRITE_REV=1 git multi-pack-index write --bitmap &&
And both of those tests exercise the old behavior, which requires an
out-of-MIDX .rev file. Alternatively, we could store a
test fixture in the repository that contains these files so we don't
have to build them from scratch.
But after the xz incident earlier this year, I am *very* wary of adding
binary test fixtures into the tree, since they seem like an easy vector
for attack.
So I'm content to fix the bug here and keep the old code around for a
while longer. The fix is indeed as simple as you described, which is
nice ;-).
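The naming rule for the incremental and non-incremental cases can be
sketched as below. This is an illustration, not the actual fix:
toy_midx_filename_ext() is a hypothetical helper, and the layout (side
files under pack/ for a regular MIDX, under pack/multi-pack-index.d/
for an incremental layer) is assumed from the paths checked in t5334.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical helper (not Git's get_midx_filename_ext() or
 * get_split_midx_filename_ext()): derive the path of a MIDX side file
 * such as "rev" or "bitmap" from the object directory and the MIDX
 * hash, instead of from whatever name the caller is writing to.
 * Returns 0 on success, -1 if `out` is too small.
 */
static int toy_midx_filename_ext(char *out, size_t outsz,
				 const char *object_dir,
				 const char *hash_hex,
				 const char *ext, int incremental)
{
	int n;

	assert(out && object_dir && hash_hex && ext);

	n = snprintf(out, outsz, "%s/pack/%smulti-pack-index-%s.%s",
		     object_dir,
		     incremental ? "multi-pack-index.d/" : "",
		     hash_hex, ext);
	return (n < 0 || (size_t)n >= outsz) ? -1 : 0;
}
```

Because the name depends only on the object directory, the hash, and
the mode, a caller's tmp_midx_XXXXXX name cannot leak into it.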
Thanks,
Taylor
* [PATCH v3 00/13] midx: incremental multi-pack indexes, part two
2024-08-15 21:01 [PATCH 00/13] midx: incremental multi-pack indexes, part two Taylor Blau
` (13 preceding siblings ...)
2024-08-15 22:28 ` [PATCH v2 00/13] midx: incremental multi-pack indexes, part two Taylor Blau
@ 2024-11-19 22:07 ` Taylor Blau
2024-11-19 22:07 ` [PATCH v3 01/13] Documentation: describe incremental MIDX bitmaps Taylor Blau
` (13 more replies)
2025-03-14 20:18 ` [PATCH v4 " Taylor Blau
2025-03-20 17:56 ` [PATCH v5 00/14] " Taylor Blau
16 siblings, 14 replies; 136+ messages in thread
From: Taylor Blau @ 2024-11-19 22:07 UTC (permalink / raw)
To: git; +Cc: Elijah Newren, Jeff King, Junio C Hamano
== Changes since last time
This round fixes a small issue when writing legacy ".rev" files outside
of the MIDX in '--incremental' mode.
The rest of the series is unchanged, and re-submitted to solicit review
now that I have more time to focus on this series.
== Original cover letter
This series is based on 'master', with an additional merge between
tb/incremental-midx-part-1[1] and my newer series to fix a handful of
bugs related to pseudo-merge bitmaps[2].
This is the second of three series to implement support for incremental
multi-pack indexes (MIDXs). This series brings support for bitmaps that
are tied to incremental MIDXs in addition to regular MIDX bitmaps.
The details are laid out in the commits themselves, but the high-level
approach is as follows:
- Each layer in the incremental MIDX chain has its own corresponding
*.bitmap file. Each bitmap contains commits / pseudo-merges which
are selected only from the commits in that layer. Likewise, only
that layer's objects are written in the type-level bitmaps.
- The reachability traversal is only conducted on the top-most bitmap
corresponding to the most recent layer in the incremental MIDX
chain. Earlier layers may be consulted to retrieve commit /
pseudo-merge reachability bitmaps, but only the top-most bitmap's
"result" and "haves" fields are used.
- In essence, the top-most bitmap is the only one that "matters", and
earlier bitmaps are merely used to look up commit and pseudo-merge
bitmaps from that layer.
- Whenever we need to look at the type-level bitmaps corresponding to
the whole incremental MIDX chain, a new "ewah_or_iterator" is used.
This works in concept like a typical ewah_iterator, except that it works
over many EWAH bitmaps in parallel, OR-ing their results together
before returning them to the user.
In effect, this allows us to treat the union of all type-level
bitmaps (each of which only stores information about the objects in its
corresponding layer within the incremental MIDX chain) as a single
type-level bitmap corresponding to all of the objects across every
layer of the incremental MIDX chain.
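The OR-ing iterator described above can be sketched with plain,
uncompressed word arrays. Real EWAH bitmaps are compressed, and the
toy_* names below are hypothetical, so treat this as a model of the
idea rather than the ewah_or_iterator implementation. It also assumes
that each layer sets bits only at positions past the objects of the
preceding layers, so the per-layer bitmaps never overlap.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define TOY_MAX_LAYERS 8

/* One layer's type-level bitmap, stored uncompressed for simplicity. */
struct toy_bitmap {
	const uint64_t *words;
	size_t nr_words;
};

/* Walks several bitmaps in lockstep, one 64-bit word at a time. */
struct toy_or_iterator {
	const struct toy_bitmap *layers[TOY_MAX_LAYERS];
	size_t nr_layers;
	size_t pos; /* index of the next word to return */
};

static void toy_or_iterator_init(struct toy_or_iterator *it)
{
	assert(it);
	it->nr_layers = 0;
	it->pos = 0;
}

/* Register one layer; returns -1 if the iterator is full. */
static int toy_or_iterator_add(struct toy_or_iterator *it,
			       const struct toy_bitmap *b)
{
	assert(it && b);
	if (it->nr_layers == TOY_MAX_LAYERS)
		return -1;
	it->layers[it->nr_layers++] = b;
	return 0;
}

/*
 * Store in *next the OR of the word at the current position from every
 * layer that still has one. Returns 1 if any layer contributed a word,
 * or 0 once all layers are exhausted (*next is then left untouched).
 */
static int toy_or_iterator_next(uint64_t *next, struct toy_or_iterator *it)
{
	uint64_t word = 0;
	int any = 0;
	size_t i;

	for (i = 0; i < it->nr_layers; i++) {
		const struct toy_bitmap *b = it->layers[i];
		if (it->pos < b->nr_words) {
			word |= b->words[it->pos];
			any = 1;
		}
	}
	if (!any)
		return 0;
	it->pos++;
	*next = word;
	return 1;
}
```

With a first layer whose only word is 0x1 and a second layer whose
words are 0x10 and 0x2, the iterator yields 0x11, then 0x2, and then
reports exhaustion.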
The sum total of this series is that we are able to append new commits /
pseudo-merges to a repository's reachability bitmaps without having to
rewrite existing bitmaps, making the operation much cheaper to perform
in large repositories.
The series is laid out roughly as follows:
- The first patch describes the technical details of incremental MIDX
bitmaps.
- The second patch adjusts the pack-revindex internals to prepare for
incremental MIDX bitmaps.
- The next seven patches adjust various components of the pack-bitmap
internals to do the same.
- The next three patches introduce and adjust callers to use the
ewah_or_iterator (as above).
- The final patch implements writing incremental MIDX bitmaps, and
introduces tests.
After this series, the remaining goals for this project include being
able to compact contiguous runs of incremental MIDX layers into a single
layer to support growing the chain of MIDX layers without the chain
itself becoming too long.
Thanks in advance for your review!
[1]: https://lore.kernel.org/git/cover.1722958595.git.me@ttaylorr.com/
[2]: https://lore.kernel.org/git/cover.1723743050.git.me@ttaylorr.com/
Taylor Blau (13):
Documentation: describe incremental MIDX bitmaps
pack-revindex: prepare for incremental MIDX bitmaps
pack-bitmap.c: open and store incremental bitmap layers
pack-bitmap.c: teach `bitmap_for_commit()` about incremental MIDXs
pack-bitmap.c: teach `show_objects_for_type()` about incremental MIDXs
pack-bitmap.c: support bitmap pack-reuse with incremental MIDXs
pack-bitmap.c: teach `rev-list --test-bitmap` about incremental MIDXs
pack-bitmap.c: compute disk-usage with incremental MIDXs
pack-bitmap.c: apply pseudo-merge commits with incremental MIDXs
ewah: implement `struct ewah_or_iterator`
pack-bitmap.c: keep track of each layer's type bitmaps
pack-bitmap.c: use `ewah_or_iterator` for type bitmap iterators
midx: implement writing incremental MIDX bitmaps
Documentation/technical/multi-pack-index.txt | 64 ++++
builtin/pack-objects.c | 3 +-
ewah/ewah_bitmap.c | 33 ++
ewah/ewok.h | 12 +
midx-write.c | 49 ++-
pack-bitmap-write.c | 65 +++-
pack-bitmap.c | 329 +++++++++++++++----
pack-bitmap.h | 4 +-
pack-revindex.c | 32 +-
t/t5334-incremental-multi-pack-index.sh | 84 +++++
10 files changed, 559 insertions(+), 116 deletions(-)
Range-diff against v2:
1: 90b21b11ed7 < -: ----------- Documentation: describe incremental MIDX format
2: 0d3b19c59ff < -: ----------- midx: add new fields for incremental MIDX chains
3: 5cd742b6776 < -: ----------- midx: teach `nth_midxed_pack_int_id()` about incremental MIDXs
4: 372104c73de < -: ----------- midx: teach `prepare_midx_pack()` about incremental MIDXs
5: e68a3ceff9a < -: ----------- midx: teach `nth_midxed_object_oid()` about incremental MIDXs
6: ff2d7bc5ca0 < -: ----------- midx: teach `nth_bitmapped_pack()` about incremental MIDXs
7: 32c3fceada7 < -: ----------- midx: introduce `bsearch_one_midx()`
8: 16db6c98cec < -: ----------- midx: teach `bsearch_midx()` about incremental MIDXs
9: 761c7c59ba1 < -: ----------- midx: teach `nth_midxed_offset()` about incremental MIDXs
10: 8366456d29b < -: ----------- midx: teach `fill_midx_entry()` about incremental MIDXs
11: 909d927c470 < -: ----------- midx: remove unused `midx_locate_pack()`
12: 71127601b5d < -: ----------- midx: teach `midx_contains_pack()` about incremental MIDXs
13: 2f98ebb141e < -: ----------- midx: teach `midx_preferred_pack()` about incremental MIDXs
14: 550ae2dc933 < -: ----------- midx: teach `midx_fanout_add_midx_fanout()` about incremental MIDXs
15: 9ae1bc415e9 < -: ----------- midx: support reading incremental MIDX chains
16: 3d4181df518 < -: ----------- midx: implement verification support for incremental MIDXs
17: 3b268f91bf3 < -: ----------- t: retire 'GIT_TEST_MULTI_PACK_INDEX_WRITE_BITMAP'
18: 09d74f89423 < -: ----------- t/t5313-pack-bounds-checks.sh: prepare for sub-directories
19: 5d467d38a8d < -: ----------- midx: implement support for writing incremental MIDX chains
20: 9d322fc5399 < -: ----------- pack-bitmap: initialize `bitmap_writer_init()` with packing_data
21: 238ca46998e < -: ----------- pack-bitmap: drop redundant args from `bitmap_writer_build_type_index()`
22: 5e198489fa8 < -: ----------- pack-bitmap: drop redundant args from `bitmap_writer_build()`
23: 819a0765f38 < -: ----------- pack-bitmap: drop redundant args from `bitmap_writer_finish()`
24: 0fea7803d86 < -: ----------- pack-bitmap-write.c: select pseudo-merges even for small bitmaps
25: 228553e412f < -: ----------- t/t5333-pseudo-merge-bitmaps.sh: demonstrate empty pseudo-merge groups
26: c7e0ee07120 < -: ----------- pseudo-merge.c: do not generate empty pseudo-merge commits
27: c9a64b1d2a9 < -: ----------- pseudo-merge.c: ensure pseudo-merge groups are closed
28: d1b8d11b37f = 1: caed2c6ec34 Documentation: describe incremental MIDX bitmaps
29: f5d0866e5cb = 2: b902513f436 pack-revindex: prepare for incremental MIDX bitmaps
30: 43444efc214 ! 3: 5b5d625cbe0 pack-bitmap.c: open and store incremental bitmap layers
@@ pack-bitmap.c: struct bitmap_index *prepare_bitmap_git(struct repository *r)
return bitmap_git;
free_bitmap_index(bitmap_git);
+@@ pack-bitmap.c: void free_bitmap_index(struct bitmap_index *b)
+ close_midx_revindex(b->midx);
+ }
+ free_pseudo_merge_map(&b->pseudo_merges);
++ free_bitmap_index(b->base);
+ free(b);
+ }
+
31: 44871306487 = 4: 16259667fb4 pack-bitmap.c: teach `bitmap_for_commit()` about incremental MIDXs
32: b720fe56da8 = 5: b7a45d7eff8 pack-bitmap.c: teach `show_objects_for_type()` about incremental MIDXs
33: 9716d022e0b = 6: c8401fa0fbd pack-bitmap.c: support bitmap pack-reuse with incremental MIDXs
34: 6baece31750 = 7: 17ab23dd76d pack-bitmap.c: teach `rev-list --test-bitmap` about incremental MIDXs
35: 5c909df38ad = 8: 75d170ce078 pack-bitmap.c: compute disk-usage with incremental MIDXs
36: f9ae10fce90 = 9: 0b4fcfcecb6 pack-bitmap.c: apply pseudo-merge commits with incremental MIDXs
37: 04042981c1a = 10: e1b5f6181e3 ewah: implement `struct ewah_or_iterator`
38: c4d543d43dc = 11: 9ab8fb472f4 pack-bitmap.c: keep track of each layer's type bitmaps
39: c6730b4107e = 12: 87cb011e7fc pack-bitmap.c: use `ewah_or_iterator` for type bitmap iterators
40: afefb455575 ! 13: 77ddd1170f9 midx: implement writing incremental MIDX bitmaps
@@ builtin/pack-objects.c: static void write_pack_file(void)
written_list);
## midx-write.c ##
+@@ midx-write.c: static uint32_t *midx_pack_order(struct write_midx_context *ctx)
+ return pack_order;
+ }
+
+-static void write_midx_reverse_index(char *midx_name, unsigned char *midx_hash,
+- struct write_midx_context *ctx)
++static void write_midx_reverse_index(struct write_midx_context *ctx,
++ const char *object_dir,
++ unsigned char *midx_hash)
+ {
+ struct strbuf buf = STRBUF_INIT;
+ char *tmp_file;
+
+ trace2_region_enter("midx", "write_midx_reverse_index", the_repository);
+
+- strbuf_addf(&buf, "%s-%s.rev", midx_name, hash_to_hex(midx_hash));
++ if (ctx->incremental)
++ get_split_midx_filename_ext(&buf, object_dir, midx_hash,
++ MIDX_EXT_REV);
++ else
++ get_midx_filename_ext(&buf, object_dir, midx_hash,
++ MIDX_EXT_REV);
+
+ tmp_file = write_rev_file_order(NULL, ctx->pack_order, ctx->entries_nr,
+ midx_hash, WRITE_REV);
@@ midx-write.c: static struct commit **find_commits_for_midx_bitmap(uint32_t *indexed_commits_nr
return cb.commits;
}
-static int write_midx_bitmap(const char *midx_name,
+static int write_midx_bitmap(struct write_midx_context *ctx,
-+ const char *object_dir, const char *midx_name,
++ const char *object_dir,
const unsigned char *midx_hash,
struct packing_data *pdata,
struct commit **commits,
@@ midx-write.c: static int write_midx_internal(const char *object_dir,
m = m->base_midx;
}
@@ midx-write.c: static int write_midx_internal(const char *object_dir,
+
+ if (flags & MIDX_WRITE_REV_INDEX &&
+ git_env_bool("GIT_TEST_MIDX_WRITE_REV", 0))
+- write_midx_reverse_index(midx_name.buf, midx_hash, &ctx);
++ write_midx_reverse_index(&ctx, object_dir, midx_hash);
+
+ if (flags & MIDX_WRITE_BITMAP) {
+ struct packing_data pdata;
+@@ midx-write.c: static int write_midx_internal(const char *object_dir,
FREE_AND_NULL(ctx.entries);
ctx.entries_nr = 0;
- if (write_midx_bitmap(midx_name.buf, midx_hash, &pdata,
- commits, commits_nr, ctx.pack_order,
-+ if (write_midx_bitmap(&ctx, object_dir, midx_name.buf,
++ if (write_midx_bitmap(&ctx, object_dir,
+ midx_hash, &pdata, commits, commits_nr,
flags) < 0) {
error(_("could not write multi-pack bitmap"));
base-commit: 090d24e9af6e9f59c3f7bee97c42bb1ae3c7f559
--
2.47.0.301.g77ddd1170f9
* [PATCH v3 01/13] Documentation: describe incremental MIDX bitmaps
2024-11-19 22:07 ` [PATCH v3 00/13] midx: incremental multi-pack indexes, part two Taylor Blau
@ 2024-11-19 22:07 ` Taylor Blau
2025-02-28 10:01 ` Patrick Steinhardt
2024-11-19 22:07 ` [PATCH v3 02/13] pack-revindex: prepare for " Taylor Blau
` (12 subsequent siblings)
13 siblings, 1 reply; 136+ messages in thread
From: Taylor Blau @ 2024-11-19 22:07 UTC (permalink / raw)
To: git; +Cc: Elijah Newren, Jeff King, Junio C Hamano
Prepare to implement support for reachability bitmaps for the new
incremental multi-pack index (MIDX) feature over the following commits.
This commit begins by first describing the relevant format and usage
details for incremental MIDX bitmaps.
Signed-off-by: Taylor Blau <me@ttaylorr.com>
---
Documentation/technical/multi-pack-index.txt | 64 ++++++++++++++++++++
1 file changed, 64 insertions(+)
diff --git a/Documentation/technical/multi-pack-index.txt b/Documentation/technical/multi-pack-index.txt
index cc063b30bea..a063262c360 100644
--- a/Documentation/technical/multi-pack-index.txt
+++ b/Documentation/technical/multi-pack-index.txt
@@ -164,6 +164,70 @@ objects_nr($H2) + objects_nr($H1) + i
(in the C implementation, this is often computed as `i +
m->num_objects_in_base`).
+=== Pseudo-pack order for incremental MIDXs
+
+The original implementation of multi-pack reachability bitmaps defined
+the pseudo-pack order in linkgit:gitformat-pack[5] (see the section
+titled "multi-pack-index reverse indexes") roughly as follows:
+
+____
+In short, a MIDX's pseudo-pack is the de-duplicated concatenation of
+objects in packs stored by the MIDX, laid out in pack order, and the
+packs arranged in MIDX order (with the preferred pack coming first).
+____
+
+In the incremental MIDX design, we extend this definition to include
+objects from multiple layers of the MIDX chain. The pseudo-pack order
+for incremental MIDXs is determined by concatenating the pseudo-pack
+ordering for each layer of the MIDX chain in order. Formally, two objects
+`o1` and `o2` are compared as follows:
+
+1. If `o1` appears in an earlier layer of the MIDX chain than `o2`, then
+ `o1` is considered less than `o2`.
+2. Otherwise, if `o1` and `o2` appear in the same MIDX layer, and that
+ MIDX layer has no base, then if one of `pack(o1)` and `pack(o2)` is
+ preferred and the other is not, then the preferred one sorts first. If
+ there is a base layer (i.e. the MIDX layer is not the first layer in
+ the chain), then if `pack(o1)` appears earlier in that MIDX layer's
+ pack order, then `o1` is less than `o2`. Likewise if `pack(o2)`
+ appears earlier, then the opposite is true.
+3. Otherwise, `o1` and `o2` appear in the same pack, and thus in the
+ same MIDX layer. Sort `o1` and `o2` by their offset within their
+ containing packfile.
+
+=== Reachability bitmaps and incremental MIDXs
+
+Each layer of an incremental MIDX chain may have its objects (and the
+objects from any previous layer in the same MIDX chain) represented in
+its own `*.bitmap` file.
+
+The structure of a `*.bitmap` file belonging to an incremental MIDX
+chain is identical to that of a non-incremental MIDX bitmap, or a
+classic single-pack bitmap. Since objects are added to the end of the
+incremental MIDX's pseudo-pack order (see: above), it is possible to
+extend a bitmap when appending to the end of a MIDX chain.
+
+(Note: it is possible likewise to compress a contiguous sequence of MIDX
+incremental layers, and their `*.bitmap`(s) into a single layer and
+`*.bitmap`, but this is not yet implemented.)
+
+The object positions used are global within the pseudo-pack order, so
+subsequent layers will have, for example, `m->num_objects_in_base`
+number of `0` bits in each of their four type bitmaps. This follows from
+the fact that we only write type bitmap entries for objects present in
+the layer immediately corresponding to the bitmap.
+
+Note also that only the bitmap pertaining to the most recent layer in an
+incremental MIDX chain is used to store reachability information about
+the interesting and uninteresting objects in a reachability query.
+Earlier bitmap layers are only used to look up commit and pseudo-merge
+bitmaps from that layer, as well as the type-level bitmaps for objects
+in that layer.
+
+To simplify the implementation, type-level bitmaps are iterated
+simultaneously, and their results are OR'd together to avoid recursively
+calling internal bitmap functions.
+
Future Work
-----------
--
2.47.0.301.g77ddd1170f9
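The three comparison rules in the documentation added by this patch can
be sketched as a comparator. This is only an illustration, not code from
the series: the struct toy_object and its fields are hypothetical, and
the sketch assumes that pack order is the tie-breaker in the first layer
when neither pack is preferred.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical description of where an object lives in the MIDX chain. */
struct toy_object {
	uint32_t layer;		/* 0 is the oldest layer (the one with no base) */
	uint32_t pack_order;	/* position of the pack within its layer */
	int pack_preferred;	/* non-zero if the pack is the preferred pack */
	uint64_t offset;	/* byte offset within the containing pack */
};

/* Returns <0, 0 or >0 if 'a' sorts before, equal to, or after 'b'. */
static int toy_pseudo_pack_cmp(const struct toy_object *a,
			       const struct toy_object *b)
{
	assert(a != NULL && b != NULL);

	/* Rule 1: objects in earlier layers sort first. */
	if (a->layer != b->layer)
		return a->layer < b->layer ? -1 : 1;

	/*
	 * Rule 2: within the layer that has no base, the preferred pack
	 * sorts first. Otherwise (assumed here to include the case where
	 * neither pack is preferred), compare by pack order.
	 */
	if (a->layer == 0 && !a->pack_preferred != !b->pack_preferred)
		return a->pack_preferred ? -1 : 1;
	if (a->pack_order != b->pack_order)
		return a->pack_order < b->pack_order ? -1 : 1;

	/* Rule 3: same pack, so compare by offset within the pack. */
	if (a->offset != b->offset)
		return a->offset < b->offset ? -1 : 1;
	return 0;
}
```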
* [PATCH v3 02/13] pack-revindex: prepare for incremental MIDX bitmaps
2024-11-19 22:07 ` [PATCH v3 00/13] midx: incremental multi-pack indexes, part two Taylor Blau
2024-11-19 22:07 ` [PATCH v3 01/13] Documentation: describe incremental MIDX bitmaps Taylor Blau
@ 2024-11-19 22:07 ` Taylor Blau
2025-02-28 10:01 ` Patrick Steinhardt
2024-11-19 22:07 ` [PATCH v3 03/13] pack-bitmap.c: open and store incremental bitmap layers Taylor Blau
` (11 subsequent siblings)
13 siblings, 1 reply; 136+ messages in thread
From: Taylor Blau @ 2024-11-19 22:07 UTC (permalink / raw)
To: git; +Cc: Elijah Newren, Jeff King, Junio C Hamano
Prepare the reverse index machinery to handle object lookups in an
incremental MIDX bitmap. These changes are broken out across a few
functions:
- load_midx_revindex() learns to use the appropriate MIDX filename
depending on whether the given 'struct multi_pack_index *' is
incremental or not.
- pack_pos_to_midx() and midx_to_pack_pos() now both take in a global
object position in the MIDX pseudo-pack order, and find the
earliest containing MIDX (similar to midx.c::midx_for_object()).
- midx_pack_order_cmp() adjusts its call to pack_pos_to_midx() by the
number of objects in the base (since 'vb - midx->revindex_data' is
relative to the containing MIDX, and pack_pos_to_midx() expects a
global position).
Likewise, midx_key_to_pack_pos() adjusts its output by adding
m->num_objects_in_base to return a global position out through the
`*pos` pointer.
Together, these changes are sufficient to use the multi-pack index's
reverse index format for incremental multi-pack reachability bitmaps.
Signed-off-by: Taylor Blau <me@ttaylorr.com>
---
pack-bitmap.c | 30 ++++++++++++++++++++----------
pack-revindex.c | 32 +++++++++++++++++++++++---------
2 files changed, 43 insertions(+), 19 deletions(-)
diff --git a/pack-bitmap.c b/pack-bitmap.c
index 4fa9dfc771a..bba9c6a905a 100644
--- a/pack-bitmap.c
+++ b/pack-bitmap.c
@@ -170,6 +170,15 @@ static struct ewah_bitmap *read_bitmap_1(struct bitmap_index *index)
return read_bitmap(index->map, index->map_size, &index->map_pos);
}
+static uint32_t bitmap_non_extended_bits(struct bitmap_index *index)
+{
+ if (index->midx) {
+ struct multi_pack_index *m = index->midx;
+ return m->num_objects + m->num_objects_in_base;
+ }
+ return index->pack->num_objects;
+}
+
static uint32_t bitmap_num_objects(struct bitmap_index *index)
{
if (index->midx)
@@ -925,7 +934,7 @@ static inline int bitmap_position_extended(struct bitmap_index *bitmap_git,
if (pos < kh_end(positions)) {
int bitmap_pos = kh_value(positions, pos);
- return bitmap_pos + bitmap_num_objects(bitmap_git);
+ return bitmap_pos + bitmap_non_extended_bits(bitmap_git);
}
return -1;
@@ -993,7 +1002,7 @@ static int ext_index_add_object(struct bitmap_index *bitmap_git,
bitmap_pos = kh_value(eindex->positions, hash_pos);
}
- return bitmap_pos + bitmap_num_objects(bitmap_git);
+ return bitmap_pos + bitmap_non_extended_bits(bitmap_git);
}
struct bitmap_show_data {
@@ -1498,7 +1507,8 @@ static void show_extended_objects(struct bitmap_index *bitmap_git,
for (i = 0; i < eindex->count; ++i) {
struct object *obj;
- if (!bitmap_get(objects, st_add(bitmap_num_objects(bitmap_git), i)))
+ if (!bitmap_get(objects,
+ st_add(bitmap_non_extended_bits(bitmap_git), i)))
continue;
obj = eindex->objects[i];
@@ -1677,7 +1687,7 @@ static void filter_bitmap_exclude_type(struct bitmap_index *bitmap_git,
* them individually.
*/
for (i = 0; i < eindex->count; i++) {
- size_t pos = st_add(i, bitmap_num_objects(bitmap_git));
+ size_t pos = st_add(i, bitmap_non_extended_bits(bitmap_git));
if (eindex->objects[i]->type == type &&
bitmap_get(to_filter, pos) &&
!bitmap_get(tips, pos))
@@ -1703,7 +1713,7 @@ static unsigned long get_size_by_pos(struct bitmap_index *bitmap_git,
oi.sizep = &size;
- if (pos < bitmap_num_objects(bitmap_git)) {
+ if (pos < bitmap_non_extended_bits(bitmap_git)) {
struct packed_git *pack;
off_t ofs;
@@ -1726,7 +1736,7 @@ static unsigned long get_size_by_pos(struct bitmap_index *bitmap_git,
}
} else {
struct eindex *eindex = &bitmap_git->ext_index;
- struct object *obj = eindex->objects[pos - bitmap_num_objects(bitmap_git)];
+ struct object *obj = eindex->objects[pos - bitmap_non_extended_bits(bitmap_git)];
if (oid_object_info_extended(the_repository, &obj->oid, &oi, 0) < 0)
die(_("unable to get size of %s"), oid_to_hex(&obj->oid));
}
@@ -1878,7 +1888,7 @@ static void filter_packed_objects_from_bitmap(struct bitmap_index *bitmap_git,
uint32_t objects_nr;
size_t i, pos;
- objects_nr = bitmap_num_objects(bitmap_git);
+ objects_nr = bitmap_non_extended_bits(bitmap_git);
pos = objects_nr / BITS_IN_EWORD;
if (pos > result->word_alloc)
@@ -2403,7 +2413,7 @@ static uint32_t count_object_type(struct bitmap_index *bitmap_git,
for (i = 0; i < eindex->count; ++i) {
if (eindex->objects[i]->type == type &&
bitmap_get(objects,
- st_add(bitmap_num_objects(bitmap_git), i)))
+ st_add(bitmap_non_extended_bits(bitmap_git), i)))
count++;
}
@@ -2802,7 +2812,7 @@ uint32_t *create_bitmap_mapping(struct bitmap_index *bitmap_git,
BUG("rebuild_existing_bitmaps: missing required rev-cache "
"extension");
- num_objects = bitmap_num_objects(bitmap_git);
+ num_objects = bitmap_non_extended_bits(bitmap_git);
CALLOC_ARRAY(reposition, num_objects);
for (i = 0; i < num_objects; ++i) {
@@ -2945,7 +2955,7 @@ static off_t get_disk_usage_for_extended(struct bitmap_index *bitmap_git)
struct object *obj = eindex->objects[i];
if (!bitmap_get(result,
- st_add(bitmap_num_objects(bitmap_git), i)))
+ st_add(bitmap_non_extended_bits(bitmap_git), i)))
continue;
if (oid_object_info_extended(the_repository, &obj->oid, &oi, 0) < 0)
diff --git a/pack-revindex.c b/pack-revindex.c
index 22d3c234648..ce3f7ae2149 100644
--- a/pack-revindex.c
+++ b/pack-revindex.c
@@ -383,8 +383,12 @@ int load_midx_revindex(struct multi_pack_index *m)
trace2_data_string("load_midx_revindex", the_repository,
"source", "rev");
- get_midx_filename_ext(&revindex_name, m->object_dir,
- get_midx_checksum(m), MIDX_EXT_REV);
+ if (m->has_chain)
+ get_split_midx_filename_ext(&revindex_name, m->object_dir,
+ get_midx_checksum(m), MIDX_EXT_REV);
+ else
+ get_midx_filename_ext(&revindex_name, m->object_dir,
+ get_midx_checksum(m), MIDX_EXT_REV);
ret = load_revindex_from_disk(revindex_name.buf,
m->num_objects,
@@ -471,11 +475,15 @@ off_t pack_pos_to_offset(struct packed_git *p, uint32_t pos)
uint32_t pack_pos_to_midx(struct multi_pack_index *m, uint32_t pos)
{
+ while (m && pos < m->num_objects_in_base)
+ m = m->base_midx;
+ if (!m)
+ BUG("NULL multi-pack-index for object position: %"PRIu32, pos);
if (!m->revindex_data)
BUG("pack_pos_to_midx: reverse index not yet loaded");
- if (m->num_objects <= pos)
+ if (m->num_objects + m->num_objects_in_base <= pos)
BUG("pack_pos_to_midx: out-of-bounds object at %"PRIu32, pos);
- return get_be32(m->revindex_data + pos);
+ return get_be32(m->revindex_data + pos - m->num_objects_in_base);
}
struct midx_pack_key {
@@ -491,7 +499,8 @@ static int midx_pack_order_cmp(const void *va, const void *vb)
const struct midx_pack_key *key = va;
struct multi_pack_index *midx = key->midx;
- uint32_t versus = pack_pos_to_midx(midx, (uint32_t*)vb - (const uint32_t *)midx->revindex_data);
+ size_t pos = (uint32_t*)vb - (const uint32_t *)midx->revindex_data;
+ uint32_t versus = pack_pos_to_midx(midx, pos + midx->num_objects_in_base);
uint32_t versus_pack = nth_midxed_pack_int_id(midx, versus);
off_t versus_offset;
@@ -529,9 +538,9 @@ static int midx_key_to_pack_pos(struct multi_pack_index *m,
{
uint32_t *found;
- if (key->pack >= m->num_packs)
+ if (key->pack >= m->num_packs + m->num_packs_in_base)
BUG("MIDX pack lookup out of bounds (%"PRIu32" >= %"PRIu32")",
- key->pack, m->num_packs);
+ key->pack, m->num_packs + m->num_packs_in_base);
/*
* The preferred pack sorts first, so determine its identifier by
* looking at the first object in pseudo-pack order.
@@ -551,7 +560,8 @@ static int midx_key_to_pack_pos(struct multi_pack_index *m,
if (!found)
return -1;
- *pos = found - m->revindex_data;
+ *pos = (found - m->revindex_data) + m->num_objects_in_base;
+
return 0;
}
@@ -559,9 +569,13 @@ int midx_to_pack_pos(struct multi_pack_index *m, uint32_t at, uint32_t *pos)
{
struct midx_pack_key key;
+ while (m && at < m->num_objects_in_base)
+ m = m->base_midx;
+ if (!m)
+ BUG("NULL multi-pack-index for object position: %"PRIu32, at);
if (!m->revindex_data)
BUG("midx_to_pack_pos: reverse index not yet loaded");
- if (m->num_objects <= at)
+ if (m->num_objects + m->num_objects_in_base <= at)
BUG("midx_to_pack_pos: out-of-bounds object at %"PRIu32, at);
key.pack = nth_midxed_pack_int_id(m, at);
--
2.47.0.301.g77ddd1170f9
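The global-to-local position handling in pack_pos_to_midx() and
midx_to_pack_pos() above boils down to walking the chain until the
position falls inside a layer. A minimal sketch of that walk follows. It
is a simplified stand-in and not the patch itself: struct toy_midx and
toy_layer_for_pos() are hypothetical, and the sketch returns NULL where
the real code calls BUG().

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, stripped-down view of one MIDX layer. */
struct toy_midx {
	struct toy_midx *base;		/* previous (older) layer, or NULL */
	uint32_t num_objects;		/* objects in this layer only */
	uint32_t num_objects_in_base;	/* objects in all older layers */
};

/*
 * Given a global position in the chain-wide pseudo-pack order, return
 * the layer containing it and store the layer-relative position in
 * '*local'. Returns NULL if 'pos' is out of bounds.
 */
static struct toy_midx *toy_layer_for_pos(struct toy_midx *m, uint32_t pos,
					  uint32_t *local)
{
	assert(local != NULL);

	/* Walk toward older layers until 'pos' is not below this layer. */
	while (m && pos < m->num_objects_in_base)
		m = m->base;
	if (!m)
		return NULL;

	/* Only the top-most layer can fail this check. */
	if (pos - m->num_objects_in_base >= m->num_objects)
		return NULL;

	*local = pos - m->num_objects_in_base;
	return m;
}
```

The layer-relative value is what indexes into that layer's own reverse
index data, which is why the patch subtracts m->num_objects_in_base
before the get_be32() read.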
* [PATCH v3 03/13] pack-bitmap.c: open and store incremental bitmap layers
2024-11-19 22:07 ` [PATCH v3 00/13] midx: incremental multi-pack indexes, part two Taylor Blau
2024-11-19 22:07 ` [PATCH v3 01/13] Documentation: describe incremental MIDX bitmaps Taylor Blau
2024-11-19 22:07 ` [PATCH v3 02/13] pack-revindex: prepare for " Taylor Blau
@ 2024-11-19 22:07 ` Taylor Blau
2025-02-28 10:01 ` Patrick Steinhardt
2024-11-19 22:07 ` [PATCH v3 04/13] pack-bitmap.c: teach `bitmap_for_commit()` about incremental MIDXs Taylor Blau
` (10 subsequent siblings)
13 siblings, 1 reply; 136+ messages in thread
From: Taylor Blau @ 2024-11-19 22:07 UTC (permalink / raw)
To: git; +Cc: Elijah Newren, Jeff King, Junio C Hamano
Prepare the pack-bitmap machinery to work with incremental MIDXs by
adding a new "base" field to keep track of the bitmap index associated
with the previous MIDX layer.
The changes in this commit are mostly boilerplate to open the correct
bitmap(s), link them into a chain of bitmap layers along the "base"
pointer, ensure that the correct packs and their reverse indexes are
loaded across MIDX layers, etc.
While we're at it, keep track of a base_nr field to indicate how many
bitmap layers (including the current bitmap) exist. This will be used in
a future commit to allocate an array of 'struct ewah_bitmap' pointers to
collect all of the respective type bitmaps among all layers to
initialize a multi-EWAH iterator.
Subsequent commits will teach the functions within the pack-bitmap
machinery how to interact with these new fields.
Signed-off-by: Taylor Blau <me@ttaylorr.com>
---
pack-bitmap.c | 64 ++++++++++++++++++++++++++++++++++++++++-----------
1 file changed, 51 insertions(+), 13 deletions(-)
diff --git a/pack-bitmap.c b/pack-bitmap.c
index bba9c6a905a..41675a69f68 100644
--- a/pack-bitmap.c
+++ b/pack-bitmap.c
@@ -54,6 +54,13 @@ struct bitmap_index {
struct packed_git *pack;
struct multi_pack_index *midx;
+ /*
+ * If using a multi-pack index chain, 'base' points to the
+ * bitmap index corresponding to this bitmap's midx->base_midx.
+ */
+ struct bitmap_index *base;
+ uint32_t base_nr;
+
/* mmapped buffer of the whole bitmap index */
unsigned char *map;
size_t map_size; /* size of the mmaped buffer */
@@ -377,8 +384,13 @@ static int load_bitmap_entries_v1(struct bitmap_index *index)
char *midx_bitmap_filename(struct multi_pack_index *midx)
{
struct strbuf buf = STRBUF_INIT;
- get_midx_filename_ext(&buf, midx->object_dir, get_midx_checksum(midx),
- MIDX_EXT_BITMAP);
+ if (midx->has_chain)
+ get_split_midx_filename_ext(&buf, midx->object_dir,
+ get_midx_checksum(midx),
+ MIDX_EXT_BITMAP);
+ else
+ get_midx_filename_ext(&buf, midx->object_dir,
+ get_midx_checksum(midx), MIDX_EXT_BITMAP);
return strbuf_detach(&buf, NULL);
}
@@ -397,10 +409,17 @@ static int open_midx_bitmap_1(struct bitmap_index *bitmap_git,
{
struct stat st;
char *bitmap_name = midx_bitmap_filename(midx);
- int fd = git_open(bitmap_name);
+ int fd;
uint32_t i, preferred_pack;
struct packed_git *preferred;
+ fd = git_open(bitmap_name);
+ if (fd < 0 && errno == ENOENT) {
+ FREE_AND_NULL(bitmap_name);
+ bitmap_name = midx_bitmap_filename(midx);
+ fd = git_open(bitmap_name);
+ }
+
if (fd < 0) {
if (errno != ENOENT)
warning_errno("cannot open '%s'", bitmap_name);
@@ -446,7 +465,7 @@ static int open_midx_bitmap_1(struct bitmap_index *bitmap_git,
goto cleanup;
}
- for (i = 0; i < bitmap_git->midx->num_packs; i++) {
+ for (i = 0; i < bitmap_git->midx->num_packs + bitmap_git->midx->num_packs_in_base; i++) {
if (prepare_midx_pack(the_repository, bitmap_git->midx, i)) {
warning(_("could not open pack %s"),
bitmap_git->midx->pack_names[i]);
@@ -459,13 +478,20 @@ static int open_midx_bitmap_1(struct bitmap_index *bitmap_git,
goto cleanup;
}
- preferred = bitmap_git->midx->packs[preferred_pack];
+ preferred = nth_midxed_pack(bitmap_git->midx, preferred_pack);
if (!is_pack_valid(preferred)) {
warning(_("preferred pack (%s) is invalid"),
preferred->pack_name);
goto cleanup;
}
+ if (midx->base_midx) {
+ bitmap_git->base = prepare_midx_bitmap_git(midx->base_midx);
+ bitmap_git->base_nr = bitmap_git->base->base_nr + 1;
+ } else {
+ bitmap_git->base_nr = 1;
+ }
+
return 0;
cleanup:
@@ -516,6 +542,7 @@ static int open_pack_bitmap_1(struct bitmap_index *bitmap_git, struct packed_git
bitmap_git->map_size = xsize_t(st.st_size);
bitmap_git->map = xmmap(NULL, bitmap_git->map_size, PROT_READ, MAP_PRIVATE, fd, 0);
bitmap_git->map_pos = 0;
+ bitmap_git->base_nr = 1;
close(fd);
if (load_bitmap_header(bitmap_git) < 0) {
@@ -535,8 +562,7 @@ static int open_pack_bitmap_1(struct bitmap_index *bitmap_git, struct packed_git
static int load_reverse_index(struct repository *r, struct bitmap_index *bitmap_git)
{
if (bitmap_is_midx(bitmap_git)) {
- uint32_t i;
- int ret;
+ struct multi_pack_index *m;
/*
* The multi-pack-index's .rev file is already loaded via
@@ -545,10 +571,15 @@ static int load_reverse_index(struct repository *r, struct bitmap_index *bitmap_
* But we still need to open the individual pack .rev files,
* since we will need to make use of them in pack-objects.
*/
- for (i = 0; i < bitmap_git->midx->num_packs; i++) {
- ret = load_pack_revindex(r, bitmap_git->midx->packs[i]);
- if (ret)
- return ret;
+ for (m = bitmap_git->midx; m; m = m->base_midx) {
+ uint32_t i;
+ int ret;
+
+ for (i = 0; i < m->num_packs; i++) {
+ ret = load_pack_revindex(r, m->packs[i]);
+ if (ret)
+ return ret;
+ }
}
return 0;
}
@@ -574,6 +605,13 @@ static int load_bitmap(struct repository *r, struct bitmap_index *bitmap_git)
if (!bitmap_git->table_lookup && load_bitmap_entries_v1(bitmap_git) < 0)
goto failed;
+ if (bitmap_git->base) {
+ if (!bitmap_is_midx(bitmap_git))
+ BUG("non-MIDX bitmap has non-NULL base bitmap index");
+ if (load_bitmap(r, bitmap_git->base) < 0)
+ goto failed;
+ }
+
return 0;
failed:
@@ -658,10 +696,9 @@ struct bitmap_index *prepare_bitmap_git(struct repository *r)
struct bitmap_index *prepare_midx_bitmap_git(struct multi_pack_index *midx)
{
- struct repository *r = the_repository;
struct bitmap_index *bitmap_git = xcalloc(1, sizeof(*bitmap_git));
- if (!open_midx_bitmap_1(bitmap_git, midx) && !load_bitmap(r, bitmap_git))
+ if (!open_midx_bitmap_1(bitmap_git, midx))
return bitmap_git;
free_bitmap_index(bitmap_git);
@@ -2875,6 +2912,7 @@ void free_bitmap_index(struct bitmap_index *b)
close_midx_revindex(b->midx);
}
free_pseudo_merge_map(&b->pseudo_merges);
+ free_bitmap_index(b->base);
free(b);
}
--
2.47.0.301.g77ddd1170f9
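The "base" / "base_nr" bookkeeping described in this patch can be
sketched in isolation as follows. This is an illustration only, with
hypothetical names (toy_bitmap_index, toy_push_layer(),
toy_free_chain()); the real struct bitmap_index carries much more state.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical, stripped-down bitmap index with only the chain fields. */
struct toy_bitmap_index {
	struct toy_bitmap_index *base;	/* bitmap of the previous layer */
	uint32_t base_nr;		/* layers in the chain, including this one */
};

/* Allocate a new layer on top of 'base' (which may be NULL). */
static struct toy_bitmap_index *toy_push_layer(struct toy_bitmap_index *base)
{
	struct toy_bitmap_index *b = calloc(1, sizeof(*b));
	if (!b)
		return NULL;
	b->base = base;
	b->base_nr = base ? base->base_nr + 1 : 1;
	return b;
}

/* Free a layer and, recursively, every layer below it. */
static void toy_free_chain(struct toy_bitmap_index *b)
{
	if (!b)
		return;
	toy_free_chain(b->base);
	free(b);
}
```

With this layout, the top-most layer's base_nr is the number of
per-layer type bitmaps a caller would need room for when collecting them
into an array.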
* [PATCH v3 04/13] pack-bitmap.c: teach `bitmap_for_commit()` about incremental MIDXs
2024-11-19 22:07 ` [PATCH v3 00/13] midx: incremental multi-pack indexes, part two Taylor Blau
` (2 preceding siblings ...)
2024-11-19 22:07 ` [PATCH v3 03/13] pack-bitmap.c: open and store incremental bitmap layers Taylor Blau
@ 2024-11-19 22:07 ` Taylor Blau
2025-02-28 10:01 ` Patrick Steinhardt
2024-11-19 22:07 ` [PATCH v3 05/13] pack-bitmap.c: teach `show_objects_for_type()` " Taylor Blau
` (9 subsequent siblings)
13 siblings, 1 reply; 136+ messages in thread
From: Taylor Blau @ 2024-11-19 22:07 UTC (permalink / raw)
To: git; +Cc: Elijah Newren, Jeff King, Junio C Hamano
The pack-bitmap machinery uses `bitmap_for_commit()` to locate the
EWAH-compressed bitmap corresponding to some given commit object.
Teach this function about incremental MIDX bitmaps by having it recurse
on earlier bitmap layers when it fails to find a given commit in
the current layer.
The changes to do so are as follows:
- Avoid initializing hash_pos at its declaration, since
bitmap_for_commit() is now a recursive function and may receive a
NULL bitmap_index pointer as its first argument.
- In cases where we would previously return NULL (to indicate that a
lookup failed and the given bitmap_index does not contain an entry
corresponding to the given commit), recursively call the function on
the previous bitmap layer.
Signed-off-by: Taylor Blau <me@ttaylorr.com>
---
pack-bitmap.c | 11 +++++++----
1 file changed, 7 insertions(+), 4 deletions(-)
diff --git a/pack-bitmap.c b/pack-bitmap.c
index 41675a69f68..e3fdcf8a01a 100644
--- a/pack-bitmap.c
+++ b/pack-bitmap.c
@@ -946,18 +946,21 @@ static struct stored_bitmap *lazy_bitmap_for_commit(struct bitmap_index *bitmap_
struct ewah_bitmap *bitmap_for_commit(struct bitmap_index *bitmap_git,
struct commit *commit)
{
- khiter_t hash_pos = kh_get_oid_map(bitmap_git->bitmaps,
- commit->object.oid);
+ khiter_t hash_pos;
+ if (!bitmap_git)
+ return NULL;
+
+ hash_pos = kh_get_oid_map(bitmap_git->bitmaps, commit->object.oid);
if (hash_pos >= kh_end(bitmap_git->bitmaps)) {
struct stored_bitmap *bitmap = NULL;
if (!bitmap_git->table_lookup)
- return NULL;
+ return bitmap_for_commit(bitmap_git->base, commit);
/* this is a fairly hot codepath - no trace2_region please */
/* NEEDSWORK: cache misses aren't recorded */
bitmap = lazy_bitmap_for_commit(bitmap_git, commit);
if (!bitmap)
- return NULL;
+ return bitmap_for_commit(bitmap_git->base, commit);
return lookup_stored_bitmap(bitmap);
}
return lookup_stored_bitmap(kh_value(bitmap_git->bitmaps, hash_pos));
--
2.47.0.301.g77ddd1170f9
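The fallback behavior that this patch gives bitmap_for_commit() can be
sketched on its own. The sketch below is not the patch's code: it
replaces the hash map and lazy table lookup with a linear scan, commits
are identified by a plain integer, and toy_layer, toy_entry and
toy_entry_for_commit() are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stored bitmap entry, keyed by a made-up commit id. */
struct toy_entry {
	uint32_t commit_id;
	uint64_t bits;
};

/* Hypothetical bitmap layer holding the entries selected for it. */
struct toy_layer {
	const struct toy_layer *base;	/* previous layer, or NULL */
	const struct toy_entry *entries;
	size_t entries_nr;
};

/*
 * Look for 'commit_id' in 'layer'. On a miss, retry on the previous
 * layer. Returns NULL once the chain is exhausted.
 */
static const struct toy_entry *
toy_entry_for_commit(const struct toy_layer *layer, uint32_t commit_id)
{
	size_t i;

	/* Walking past the oldest layer ends the recursion. */
	if (!layer)
		return NULL;

	for (i = 0; i < layer->entries_nr; i++) {
		if (layer->entries[i].commit_id == commit_id)
			return &layer->entries[i];
	}
	return toy_entry_for_commit(layer->base, commit_id);
}
```

The NULL check at the top plays the same role as the new
"if (!bitmap_git) return NULL;" in the patch: it lets every miss simply
recurse on the base pointer without checking whether one exists.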
* [PATCH v3 05/13] pack-bitmap.c: teach `show_objects_for_type()` about incremental MIDXs
2024-11-19 22:07 ` [PATCH v3 00/13] midx: incremental multi-pack indexes, part two Taylor Blau
` (3 preceding siblings ...)
2024-11-19 22:07 ` [PATCH v3 04/13] pack-bitmap.c: teach `bitmap_for_commit()` about incremental MIDXs Taylor Blau
@ 2024-11-19 22:07 ` Taylor Blau
2024-11-19 22:07 ` [PATCH v3 06/13] pack-bitmap.c: support bitmap pack-reuse with " Taylor Blau
` (8 subsequent siblings)
13 siblings, 0 replies; 136+ messages in thread
From: Taylor Blau @ 2024-11-19 22:07 UTC (permalink / raw)
To: git; +Cc: Elijah Newren, Jeff King, Junio C Hamano
Since we may ask for a pack_id that is in an earlier MIDX layer relative
to the one corresponding to our bitmap, use nth_midxed_pack() instead of
accessing the ->packs array directly.
Signed-off-by: Taylor Blau <me@ttaylorr.com>
---
pack-bitmap.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/pack-bitmap.c b/pack-bitmap.c
index e3fdcf8a01a..c2c824347a6 100644
--- a/pack-bitmap.c
+++ b/pack-bitmap.c
@@ -1631,7 +1631,7 @@ static void show_objects_for_type(
nth_midxed_object_oid(&oid, m, index_pos);
pack_id = nth_midxed_pack_int_id(m, index_pos);
- pack = bitmap_git->midx->packs[pack_id];
+ pack = nth_midxed_pack(bitmap_git->midx, pack_id);
} else {
index_pos = pack_pos_to_index(bitmap_git->pack, pos + offset);
ofs = pack_pos_to_offset(bitmap_git->pack, pos + offset);
--
2.47.0.301.g77ddd1170f9
* [PATCH v3 06/13] pack-bitmap.c: support bitmap pack-reuse with incremental MIDXs
2024-11-19 22:07 ` [PATCH v3 00/13] midx: incremental multi-pack indexes, part two Taylor Blau
` (4 preceding siblings ...)
2024-11-19 22:07 ` [PATCH v3 05/13] pack-bitmap.c: teach `show_objects_for_type()` " Taylor Blau
@ 2024-11-19 22:07 ` Taylor Blau
2025-02-28 10:01 ` Patrick Steinhardt
2024-11-19 22:07 ` [PATCH v3 07/13] pack-bitmap.c: teach `rev-list --test-bitmap` about " Taylor Blau
` (7 subsequent siblings)
13 siblings, 1 reply; 136+ messages in thread
From: Taylor Blau @ 2024-11-19 22:07 UTC (permalink / raw)
To: git; +Cc: Elijah Newren, Jeff King, Junio C Hamano
In a similar fashion to previous commits in the first phase of
incremental MIDXs, enumerate not just the packs in the current
incremental MIDX layer, but previous ones as well.
Likewise, in reuse_partial_packfile_from_bitmap(), when reusing only a
single pack from a MIDX, use the oldest layer's preferred pack as it is
likely to contain the largest number of reusable sections.
Signed-off-by: Taylor Blau <me@ttaylorr.com>
---
pack-bitmap.c | 11 ++++++++---
1 file changed, 8 insertions(+), 3 deletions(-)
diff --git a/pack-bitmap.c b/pack-bitmap.c
index c2c824347a6..1dddb242434 100644
--- a/pack-bitmap.c
+++ b/pack-bitmap.c
@@ -2323,7 +2323,8 @@ void reuse_partial_packfile_from_bitmap(struct bitmap_index *bitmap_git,
multi_pack_reuse = 0;
if (multi_pack_reuse) {
- for (i = 0; i < bitmap_git->midx->num_packs; i++) {
+ struct multi_pack_index *m = bitmap_git->midx;
+ for (i = 0; i < m->num_packs + m->num_packs_in_base; i++) {
struct bitmapped_pack pack;
if (nth_bitmapped_pack(r, bitmap_git->midx, &pack, i) < 0) {
warning(_("unable to load pack: '%s', disabling pack-reuse"),
@@ -2347,14 +2348,18 @@ void reuse_partial_packfile_from_bitmap(struct bitmap_index *bitmap_git,
uint32_t pack_int_id;
if (bitmap_is_midx(bitmap_git)) {
+ struct multi_pack_index *m = bitmap_git->midx;
uint32_t preferred_pack_pos;
- if (midx_preferred_pack(bitmap_git->midx, &preferred_pack_pos) < 0) {
+ while (m->base_midx)
+ m = m->base_midx;
+
+ if (midx_preferred_pack(m, &preferred_pack_pos) < 0) {
warning(_("unable to compute preferred pack, disabling pack-reuse"));
return;
}
- pack = bitmap_git->midx->packs[preferred_pack_pos];
+ pack = nth_midxed_pack(m, preferred_pack_pos);
pack_int_id = preferred_pack_pos;
} else {
pack = bitmap_git->pack;
--
2.47.0.301.g77ddd1170f9
* [PATCH v3 07/13] pack-bitmap.c: teach `rev-list --test-bitmap` about incremental MIDXs
2024-11-19 22:07 ` [PATCH v3 00/13] midx: incremental multi-pack indexes, part two Taylor Blau
` (5 preceding siblings ...)
2024-11-19 22:07 ` [PATCH v3 06/13] pack-bitmap.c: support bitmap pack-reuse with " Taylor Blau
@ 2024-11-19 22:07 ` Taylor Blau
2025-02-28 10:01 ` Patrick Steinhardt
2024-11-19 22:07 ` [PATCH v3 08/13] pack-bitmap.c: compute disk-usage with " Taylor Blau
` (6 subsequent siblings)
13 siblings, 1 reply; 136+ messages in thread
From: Taylor Blau @ 2024-11-19 22:07 UTC (permalink / raw)
To: git; +Cc: Elijah Newren, Jeff King, Junio C Hamano
Implement support for the special `--test-bitmap` mode of `git rev-list`
when using incremental MIDXs.
The bitmap_test_data structure is extended to contain a "base" pointer
that mirrors the structure of the bitmap chain that it is being used to
test.
When we find a commit to test, we first chase down the ->base pointer to
find the appropriate bitmap_test_data for the bitmap layer that the
given commit is contained within, and then perform the test on that
bitmap.
In order to implement this, bitmap_for_commit() is lightly modified and
reimplemented in terms of a new function, find_bitmap_for_commit(), which
fills out a pointer indicating the bitmap layer that contains the given
commit.
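The shape of that lookup can be sketched in isolation. This is a minimal
model, assuming each layer knows about exactly one commit; struct toy_layer,
toy_find_bitmap() and toy_bitmap_for_commit() are hypothetical names, and the
string stands in for a stored EWAH bitmap. The optional "found" out-parameter
is written only on success, as in the patch.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, simplified bitmap layer: one bitmapped commit per layer. */
struct toy_layer {
	struct toy_layer *base;  /* older layer, or NULL */
	int commit_id;           /* the only commit this layer has a bitmap for */
	const char *bitmap;      /* stand-in for the stored bitmap */
};

/*
 * Search the chain from the tip towards the base. On success, return the
 * bitmap and, if 'found' is non-NULL, record which layer supplied it.
 * On failure, return NULL and leave '*found' untouched.
 */
static const char *toy_find_bitmap(struct toy_layer *layer, int commit_id,
				   struct toy_layer **found)
{
	for (; layer; layer = layer->base) {
		if (layer->commit_id != commit_id)
			continue;
		assert(layer->bitmap); /* a layer that lists a commit carries its bitmap */
		if (found)
			*found = layer;
		return layer->bitmap;
	}
	return NULL;
}

/* Callers that do not care about the layer pass NULL for 'found'. */
static const char *toy_bitmap_for_commit(struct toy_layer *layer, int commit_id)
{
	return toy_find_bitmap(layer, commit_id, NULL);
}
```

Because '*found' is left alone when nothing matches, a caller that wants to
inspect it afterwards should initialize it to NULL first.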
Signed-off-by: Taylor Blau <me@ttaylorr.com>
---
pack-bitmap.c | 105 ++++++++++++++++++++++++++++++++++++++++----------
1 file changed, 84 insertions(+), 21 deletions(-)
diff --git a/pack-bitmap.c b/pack-bitmap.c
index 1dddb242434..02864a0e1f7 100644
--- a/pack-bitmap.c
+++ b/pack-bitmap.c
@@ -943,8 +943,9 @@ static struct stored_bitmap *lazy_bitmap_for_commit(struct bitmap_index *bitmap_
return NULL;
}
-struct ewah_bitmap *bitmap_for_commit(struct bitmap_index *bitmap_git,
- struct commit *commit)
+static struct ewah_bitmap *find_bitmap_for_commit(struct bitmap_index *bitmap_git,
+ struct commit *commit,
+ struct bitmap_index **found)
{
khiter_t hash_pos;
if (!bitmap_git)
@@ -954,18 +955,30 @@ struct ewah_bitmap *bitmap_for_commit(struct bitmap_index *bitmap_git,
if (hash_pos >= kh_end(bitmap_git->bitmaps)) {
struct stored_bitmap *bitmap = NULL;
if (!bitmap_git->table_lookup)
- return bitmap_for_commit(bitmap_git->base, commit);
+ return find_bitmap_for_commit(bitmap_git->base, commit,
+ found);
/* this is a fairly hot codepath - no trace2_region please */
/* NEEDSWORK: cache misses aren't recorded */
bitmap = lazy_bitmap_for_commit(bitmap_git, commit);
if (!bitmap)
- return bitmap_for_commit(bitmap_git->base, commit);
+ return find_bitmap_for_commit(bitmap_git->base, commit,
+ found);
+ if (found)
+ *found = bitmap_git;
return lookup_stored_bitmap(bitmap);
}
+ if (found)
+ *found = bitmap_git;
return lookup_stored_bitmap(kh_value(bitmap_git->bitmaps, hash_pos));
}
+struct ewah_bitmap *bitmap_for_commit(struct bitmap_index *bitmap_git,
+ struct commit *commit)
+{
+ return find_bitmap_for_commit(bitmap_git, commit, NULL);
+}
+
static inline int bitmap_position_extended(struct bitmap_index *bitmap_git,
const struct object_id *oid)
{
@@ -2493,6 +2506,8 @@ struct bitmap_test_data {
struct bitmap *tags;
struct progress *prg;
size_t seen;
+
+ struct bitmap_test_data *base_tdata;
};
static void test_bitmap_type(struct bitmap_test_data *tdata,
@@ -2501,6 +2516,11 @@ static void test_bitmap_type(struct bitmap_test_data *tdata,
enum object_type bitmap_type = OBJ_NONE;
int bitmaps_nr = 0;
+ if (bitmap_is_midx(tdata->bitmap_git)) {
+ while (pos < tdata->bitmap_git->midx->num_objects_in_base)
+ tdata = tdata->base_tdata;
+ }
+
if (bitmap_get(tdata->commits, pos)) {
bitmap_type = OBJ_COMMIT;
bitmaps_nr++;
@@ -2564,13 +2584,57 @@ static void test_show_commit(struct commit *commit, void *data)
display_progress(tdata->prg, ++tdata->seen);
}
+static uint32_t bitmap_total_entry_count(struct bitmap_index *bitmap_git)
+{
+ uint32_t total = 0;
+ do {
+ total = st_add(total, bitmap_git->entry_count);
+ bitmap_git = bitmap_git->base;
+ } while (bitmap_git);
+
+ return total;
+}
+
+static void prepare_bitmap_test_data(struct bitmap_test_data *tdata,
+ struct bitmap_index *bitmap_git)
+{
+ memset(tdata, 0, sizeof(struct bitmap_test_data));
+
+ tdata->bitmap_git = bitmap_git;
+ tdata->base = bitmap_new();
+ tdata->commits = ewah_to_bitmap(bitmap_git->commits);
+ tdata->trees = ewah_to_bitmap(bitmap_git->trees);
+ tdata->blobs = ewah_to_bitmap(bitmap_git->blobs);
+ tdata->tags = ewah_to_bitmap(bitmap_git->tags);
+
+ if (bitmap_git->base) {
+ CALLOC_ARRAY(tdata->base_tdata, 1);
+ prepare_bitmap_test_data(tdata->base_tdata, bitmap_git->base);
+ }
+}
+
+static void free_bitmap_test_data(struct bitmap_test_data *tdata)
+{
+ if (!tdata)
+ return;
+
+ free_bitmap_test_data(tdata->base_tdata);
+ free(tdata->base_tdata);
+
+ bitmap_free(tdata->base);
+ bitmap_free(tdata->commits);
+ bitmap_free(tdata->trees);
+ bitmap_free(tdata->blobs);
+ bitmap_free(tdata->tags);
+}
+
void test_bitmap_walk(struct rev_info *revs)
{
struct object *root;
struct bitmap *result = NULL;
size_t result_popcnt;
struct bitmap_test_data tdata;
- struct bitmap_index *bitmap_git;
+ struct bitmap_index *bitmap_git, *found;
struct ewah_bitmap *bm;
if (!(bitmap_git = prepare_bitmap_git(revs->repo)))
@@ -2579,17 +2643,26 @@ void test_bitmap_walk(struct rev_info *revs)
if (revs->pending.nr != 1)
die(_("you must specify exactly one commit to test"));
- fprintf_ln(stderr, "Bitmap v%d test (%d entries%s)",
+ fprintf_ln(stderr, "Bitmap v%d test (%d entries%s, %d total)",
bitmap_git->version,
bitmap_git->entry_count,
- bitmap_git->table_lookup ? "" : " loaded");
+ bitmap_git->table_lookup ? "" : " loaded",
+ bitmap_total_entry_count(bitmap_git));
root = revs->pending.objects[0].item;
- bm = bitmap_for_commit(bitmap_git, (struct commit *)root);
+ bm = find_bitmap_for_commit(bitmap_git, (struct commit *)root, &found);
if (bm) {
fprintf_ln(stderr, "Found bitmap for '%s'. %d bits / %08x checksum",
- oid_to_hex(&root->oid), (int)bm->bit_size, ewah_checksum(bm));
+ oid_to_hex(&root->oid),
+ (int)bm->bit_size, ewah_checksum(bm));
+
+ if (bitmap_is_midx(found))
+ fprintf_ln(stderr, "Located via MIDX '%s'.",
+ hash_to_hex(get_midx_checksum(found->midx)));
+ else
+ fprintf_ln(stderr, "Located via pack '%s'.",
+ hash_to_hex(found->pack->hash));
result = ewah_to_bitmap(bm);
}
@@ -2606,14 +2679,8 @@ void test_bitmap_walk(struct rev_info *revs)
if (prepare_revision_walk(revs))
die(_("revision walk setup failed"));
- tdata.bitmap_git = bitmap_git;
- tdata.base = bitmap_new();
- tdata.commits = ewah_to_bitmap(bitmap_git->commits);
- tdata.trees = ewah_to_bitmap(bitmap_git->trees);
- tdata.blobs = ewah_to_bitmap(bitmap_git->blobs);
- tdata.tags = ewah_to_bitmap(bitmap_git->tags);
+ prepare_bitmap_test_data(&tdata, bitmap_git);
tdata.prg = start_progress("Verifying bitmap entries", result_popcnt);
- tdata.seen = 0;
traverse_commit_list(revs, &test_show_commit, &test_show_object, &tdata);
@@ -2625,11 +2692,7 @@ void test_bitmap_walk(struct rev_info *revs)
die(_("mismatch in bitmap results"));
bitmap_free(result);
- bitmap_free(tdata.base);
- bitmap_free(tdata.commits);
- bitmap_free(tdata.trees);
- bitmap_free(tdata.blobs);
- bitmap_free(tdata.tags);
+ free_bitmap_test_data(&tdata);
free_bitmap_index(bitmap_git);
}
--
2.47.0.301.g77ddd1170f9
* [PATCH v3 08/13] pack-bitmap.c: compute disk-usage with incremental MIDXs
2024-11-19 22:07 ` [PATCH v3 00/13] midx: incremental multi-pack indexes, part two Taylor Blau
` (6 preceding siblings ...)
2024-11-19 22:07 ` [PATCH v3 07/13] pack-bitmap.c: teach `rev-list --test-bitmap` about " Taylor Blau
@ 2024-11-19 22:07 ` Taylor Blau
2024-11-19 22:07 ` [PATCH v3 09/13] pack-bitmap.c: apply pseudo-merge commits " Taylor Blau
` (5 subsequent siblings)
13 siblings, 0 replies; 136+ messages in thread
From: Taylor Blau @ 2024-11-19 22:07 UTC (permalink / raw)
To: git; +Cc: Elijah Newren, Jeff King, Junio C Hamano
In a similar fashion as previous commits, use nth_midxed_pack() instead
of accessing the MIDX's ->packs array directly to support incremental
MIDXs.
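To see why direct indexing no longer works, consider the sketch below. It
assumes (this is not shown in this patch) that pack IDs are global across the
chain and that each layer's ->packs array holds only that layer's packs, so a
lookup has to descend to the owning layer and rebase the ID. struct
toy_midx_layer and toy_nth_pack() are hypothetical names, not git's
implementation of nth_midxed_pack().

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for a pack. */
struct toy_pack {
	const char *name;
};

/* Hypothetical, simplified MIDX layer. */
struct toy_midx_layer {
	struct toy_midx_layer *base_midx;  /* older layer, or NULL */
	struct toy_pack **packs;           /* packs of this layer only */
	uint32_t num_packs;                /* length of 'packs' */
	uint32_t num_packs_in_base;        /* packs in all older layers */
};

/* Resolve a chain-wide pack ID to the pack it names. */
static struct toy_pack *toy_nth_pack(struct toy_midx_layer *m,
				     uint32_t pack_int_id)
{
	/* Descend until we reach the layer that owns this global ID. */
	while (m && pack_int_id < m->num_packs_in_base)
		m = m->base_midx;
	assert(m);

	/* Convert the global ID into an index local to that layer. */
	pack_int_id -= m->num_packs_in_base;
	assert(pack_int_id < m->num_packs);
	return m->packs[pack_int_id];
}
```

With a two-pack base layer and a one-pack tip, global ID 2 names the tip's
only pack; indexing the tip's own array with 2 would read past its end.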
Signed-off-by: Taylor Blau <me@ttaylorr.com>
---
pack-bitmap.c | 4 ++--
1 file changed, 2 insertions(+), 2 deletions(-)
diff --git a/pack-bitmap.c b/pack-bitmap.c
index 02864a0e1f7..b48d6b144d8 100644
--- a/pack-bitmap.c
+++ b/pack-bitmap.c
@@ -1774,7 +1774,7 @@ static unsigned long get_size_by_pos(struct bitmap_index *bitmap_git,
uint32_t midx_pos = pack_pos_to_midx(bitmap_git->midx, pos);
uint32_t pack_id = nth_midxed_pack_int_id(bitmap_git->midx, midx_pos);
- pack = bitmap_git->midx->packs[pack_id];
+ pack = nth_midxed_pack(bitmap_git->midx, pack_id);
ofs = nth_midxed_offset(bitmap_git->midx, midx_pos);
} else {
pack = bitmap_git->pack;
@@ -3025,7 +3025,7 @@ static off_t get_disk_usage_for_type(struct bitmap_index *bitmap_git,
off_t offset = nth_midxed_offset(bitmap_git->midx, midx_pos);
uint32_t pack_id = nth_midxed_pack_int_id(bitmap_git->midx, midx_pos);
- struct packed_git *pack = bitmap_git->midx->packs[pack_id];
+ struct packed_git *pack = nth_midxed_pack(bitmap_git->midx, pack_id);
if (offset_to_pack_pos(pack, offset, &pack_pos) < 0) {
struct object_id oid;
--
2.47.0.301.g77ddd1170f9
* [PATCH v3 09/13] pack-bitmap.c: apply pseudo-merge commits with incremental MIDXs
2024-11-19 22:07 ` [PATCH v3 00/13] midx: incremental multi-pack indexes, part two Taylor Blau
` (7 preceding siblings ...)
2024-11-19 22:07 ` [PATCH v3 08/13] pack-bitmap.c: compute disk-usage with " Taylor Blau
@ 2024-11-19 22:07 ` Taylor Blau
2024-11-19 22:07 ` [PATCH v3 10/13] ewah: implement `struct ewah_or_iterator` Taylor Blau
` (4 subsequent siblings)
13 siblings, 0 replies; 136+ messages in thread
From: Taylor Blau @ 2024-11-19 22:07 UTC (permalink / raw)
To: git; +Cc: Elijah Newren, Jeff King, Junio C Hamano
Prepare for using pseudo-merges with incremental MIDX bitmaps by
attempting to apply pseudo-merges from each layer when encountering a
given commit during a walk.
Signed-off-by: Taylor Blau <me@ttaylorr.com>
---
pack-bitmap.c | 11 ++++++++---
1 file changed, 8 insertions(+), 3 deletions(-)
diff --git a/pack-bitmap.c b/pack-bitmap.c
index b48d6b144d8..570f6dbdad6 100644
--- a/pack-bitmap.c
+++ b/pack-bitmap.c
@@ -1087,10 +1087,15 @@ static unsigned apply_pseudo_merges_for_commit_1(struct bitmap_index *bitmap_git
struct commit *commit,
uint32_t commit_pos)
{
- int ret;
+ struct bitmap_index *curr = bitmap_git;
+ int ret = 0;
- ret = apply_pseudo_merges_for_commit(&bitmap_git->pseudo_merges,
- result, commit, commit_pos);
+ while (curr) {
+ ret += apply_pseudo_merges_for_commit(&curr->pseudo_merges,
+ result, commit,
+ commit_pos);
+ curr = curr->base;
+ }
if (ret)
pseudo_merges_satisfied_nr += ret;
--
2.47.0.301.g77ddd1170f9
* [PATCH v3 10/13] ewah: implement `struct ewah_or_iterator`
2024-11-19 22:07 ` [PATCH v3 00/13] midx: incremental multi-pack indexes, part two Taylor Blau
` (8 preceding siblings ...)
2024-11-19 22:07 ` [PATCH v3 09/13] pack-bitmap.c: apply pseudo-merge commits " Taylor Blau
@ 2024-11-19 22:07 ` Taylor Blau
2025-02-28 10:01 ` Patrick Steinhardt
2024-11-19 22:07 ` [PATCH v3 11/13] pack-bitmap.c: keep track of each layer's type bitmaps Taylor Blau
` (3 subsequent siblings)
13 siblings, 1 reply; 136+ messages in thread
From: Taylor Blau @ 2024-11-19 22:07 UTC (permalink / raw)
To: git; +Cc: Elijah Newren, Jeff King, Junio C Hamano
While individual bitmap layers store different commit, type-level, and
pseudo-merge bitmaps, only the top-most layer is used to compute
reachability traversals.
Many functions which implement the aforementioned traversal rely on
enumerating the results according to the type-level bitmaps, and so
would benefit from a conceptual type-level bitmap that spans multiple
layers.
Implement `struct ewah_or_iterator` which is capable of enumerating
multiple EWAH bitmaps at once, and OR-ing the results together. When
initialized with, for example, all of the commit type bitmaps from each
layer, callers can pretend as if they are enumerating a large type-level
bitmap which contains the commits from *all* bitmap layers.
There are a couple of alternative approaches which were considered:
- Decompress each EWAH bitmap and OR them together, enumerating a
single (non-EWAH) bitmap. This would work, but has the disadvantage
of decompressing a potentially large bitmap, which may not be
necessary if the caller does not wish to read all of it.
- Recursively call bitmap internal functions, reusing the "result" and
"haves" bitmap from the top-most layer. This approach resembles the
original implementation of this feature, but is inefficient in that
it both (a) requires significant refactoring to implement, and (b)
enumerates large sections of later bitmaps which are all zeros (as
they pertain to objects in earlier layers).
(b) is not so bad in and of itself, but can cause significant
slow-downs when combined with expensive loop bodies.
This approach (enumerating an OR'd together version of all of the
type-level bitmaps from each layer) produces a much more
straightforward implementation and requires far less refactoring in
order to make it work.
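The idea can be modelled without EWAH compression at all. The sketch below is
an illustration under that simplifying assumption: struct toy_iter walks a
plain array of 64-bit words instead of a compressed bitmap, and all toy_*
names are hypothetical. Only the OR-ing logic is meant to mirror the
iterator added by this patch.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for an EWAH iterator: walks a plain word array. */
struct toy_iter {
	const uint64_t *words;
	size_t nr;   /* number of words available */
	size_t pos;  /* next word to hand out */
};

/* Store the next word in '*next'; return 0 once the array is exhausted. */
static int toy_iter_next(uint64_t *next, struct toy_iter *it)
{
	if (it->pos >= it->nr)
		return 0;
	*next = it->words[it->pos++];
	return 1;
}

struct toy_or_iter {
	struct toy_iter *its;  /* one sub-iterator per layer */
	size_t nr;
};

/*
 * Advance every sub-iterator by one word and OR together the words that
 * were produced. Return 0 only once all sub-iterators are exhausted, so
 * shorter (older) layers simply stop contributing bits.
 */
static int toy_or_iter_next(uint64_t *next, struct toy_or_iter *it)
{
	uint64_t buf, out = 0;
	size_t i;
	int ret = 0;

	assert(next && it);
	for (i = 0; i < it->nr; i++) {
		if (toy_iter_next(&buf, &it->its[i])) {
			out |= buf;
			ret = 1;
		}
	}

	*next = out;
	return ret;
}
```

A caller loops on toy_or_iter_next() exactly as it would on a single
iterator, which is why existing loop bodies need little change.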
Signed-off-by: Taylor Blau <me@ttaylorr.com>
---
ewah/ewah_bitmap.c | 33 +++++++++++++++++++++++++++++++++
ewah/ewok.h | 12 ++++++++++++
2 files changed, 45 insertions(+)
diff --git a/ewah/ewah_bitmap.c b/ewah/ewah_bitmap.c
index 8785cbc54a8..b3a7ada0714 100644
--- a/ewah/ewah_bitmap.c
+++ b/ewah/ewah_bitmap.c
@@ -372,6 +372,39 @@ void ewah_iterator_init(struct ewah_iterator *it, struct ewah_bitmap *parent)
read_new_rlw(it);
}
+void ewah_or_iterator_init(struct ewah_or_iterator *it,
+ struct ewah_bitmap **parents, size_t nr)
+{
+ size_t i;
+
+ memset(it, 0, sizeof(*it));
+
+ ALLOC_ARRAY(it->its, nr);
+ for (i = 0; i < nr; i++)
+ ewah_iterator_init(&it->its[it->nr++], parents[i]);
+}
+
+int ewah_or_iterator_next(eword_t *next, struct ewah_or_iterator *it)
+{
+ eword_t buf, out = 0;
+ size_t i;
+ int ret = 0;
+
+ for (i = 0; i < it->nr; i++)
+ if (ewah_iterator_next(&buf, &it->its[i])) {
+ out |= buf;
+ ret = 1;
+ }
+
+ *next = out;
+ return ret;
+}
+
+void ewah_or_iterator_free(struct ewah_or_iterator *it)
+{
+ free(it->its);
+}
+
void ewah_xor(
struct ewah_bitmap *ewah_i,
struct ewah_bitmap *ewah_j,
diff --git a/ewah/ewok.h b/ewah/ewok.h
index 5e357e24933..4b70641045e 100644
--- a/ewah/ewok.h
+++ b/ewah/ewok.h
@@ -148,6 +148,18 @@ void ewah_iterator_init(struct ewah_iterator *it, struct ewah_bitmap *parent);
*/
int ewah_iterator_next(eword_t *next, struct ewah_iterator *it);
+struct ewah_or_iterator {
+ struct ewah_iterator *its;
+ size_t nr;
+};
+
+void ewah_or_iterator_init(struct ewah_or_iterator *it,
+ struct ewah_bitmap **parents, size_t nr);
+
+int ewah_or_iterator_next(eword_t *next, struct ewah_or_iterator *it);
+
+void ewah_or_iterator_free(struct ewah_or_iterator *it);
+
void ewah_xor(
struct ewah_bitmap *ewah_i,
struct ewah_bitmap *ewah_j,
--
2.47.0.301.g77ddd1170f9
* [PATCH v3 11/13] pack-bitmap.c: keep track of each layer's type bitmaps
2024-11-19 22:07 ` [PATCH v3 00/13] midx: incremental multi-pack indexes, part two Taylor Blau
` (9 preceding siblings ...)
2024-11-19 22:07 ` [PATCH v3 10/13] ewah: implement `struct ewah_or_iterator` Taylor Blau
@ 2024-11-19 22:07 ` Taylor Blau
2025-02-28 10:01 ` Patrick Steinhardt
2024-11-19 22:07 ` [PATCH v3 12/13] pack-bitmap.c: use `ewah_or_iterator` for type bitmap iterators Taylor Blau
` (2 subsequent siblings)
13 siblings, 1 reply; 136+ messages in thread
From: Taylor Blau @ 2024-11-19 22:07 UTC (permalink / raw)
To: git; +Cc: Elijah Newren, Jeff King, Junio C Hamano
Prepare for reading the type-level bitmaps from previous bitmap layers
by maintaining an array for each type, where each element in that type's
array corresponds to one layer's bitmap for that type.
These fields will be used in a later commit to instantiate the 'struct
ewah_or_iterator' for each type.
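A rough sketch of how such an array might be filled, oldest layer first, is
shown here for one type only. It assumes 'nr' is the total number of layers
in the chain including the tip; struct toy_bitmap_layer and
toy_collect_commits() are hypothetical names, and the strings stand in for
each layer's commit bitmap.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, simplified bitmap layer with a single type bitmap. */
struct toy_bitmap_layer {
	struct toy_bitmap_layer *base;  /* older layer, or NULL */
	const char *commits;            /* stand-in for the commits bitmap */
};

/*
 * Fill all[0..nr-1] so that all[nr-1] is the tip's bitmap, all[nr-2] is
 * its base's bitmap, and all[0] belongs to the oldest layer.
 */
static void toy_collect_commits(const struct toy_bitmap_layer *tip,
				size_t nr, const char **all)
{
	const struct toy_bitmap_layer *curr = tip;
	size_t i = nr;

	while (curr) {
		assert(i > 0); /* 'nr' must cover every layer in the chain */
		all[--i] = curr->commits;
		curr = curr->base;
	}
	assert(i == 0); /* and must not exceed the chain length */
}
```

Ordering the array from oldest to newest keeps index n-1 pointing at the
layer that owns the array, matching the description above.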
Signed-off-by: Taylor Blau <me@ttaylorr.com>
---
pack-bitmap.c | 55 +++++++++++++++++++++++++++++++++++++++++++++++----
1 file changed, 51 insertions(+), 4 deletions(-)
diff --git a/pack-bitmap.c b/pack-bitmap.c
index 570f6dbdad6..348488e2d9e 100644
--- a/pack-bitmap.c
+++ b/pack-bitmap.c
@@ -78,6 +78,24 @@ struct bitmap_index {
struct ewah_bitmap *blobs;
struct ewah_bitmap *tags;
+ /*
+ * Type index arrays when this bitmap is associated with an
+ * incremental multi-pack index chain.
+ *
+ * If n is the number of unique layers in the MIDX chain, then
+ * commits_all[n-1] is this struct's 'commits' field,
+ * commits_all[n-2] is the commits field of this bitmap's
+ * 'base', and so on.
+ *
+ * When associated with either a non-incremental MIDX or
+ * a single packfile, these arrays each contain a single
+ * element.
+ */
+ struct ewah_bitmap **commits_all;
+ struct ewah_bitmap **trees_all;
+ struct ewah_bitmap **blobs_all;
+ struct ewah_bitmap **tags_all;
+
/* Map from object ID -> `stored_bitmap` for all the bitmapped commits */
kh_oid_map_t *bitmaps;
@@ -586,7 +604,29 @@ static int load_reverse_index(struct repository *r, struct bitmap_index *bitmap_
return load_pack_revindex(r, bitmap_git->pack);
}
-static int load_bitmap(struct repository *r, struct bitmap_index *bitmap_git)
+static void load_all_type_bitmaps(struct bitmap_index *bitmap_git)
+{
+ struct bitmap_index *curr = bitmap_git;
+ size_t i = bitmap_git->base_nr - 1;
+
+ ALLOC_ARRAY(bitmap_git->commits_all, bitmap_git->base_nr);
+ ALLOC_ARRAY(bitmap_git->trees_all, bitmap_git->base_nr);
+ ALLOC_ARRAY(bitmap_git->blobs_all, bitmap_git->base_nr);
+ ALLOC_ARRAY(bitmap_git->tags_all, bitmap_git->base_nr);
+
+ while (curr) {
+ bitmap_git->commits_all[i] = curr->commits;
+ bitmap_git->trees_all[i] = curr->trees;
+ bitmap_git->blobs_all[i] = curr->blobs;
+ bitmap_git->tags_all[i] = curr->tags;
+
+ curr = curr->base;
+ i -= 1;
+ }
+}
+
+static int load_bitmap(struct repository *r, struct bitmap_index *bitmap_git,
+ int recursing)
{
assert(bitmap_git->map);
@@ -608,10 +648,13 @@ static int load_bitmap(struct repository *r, struct bitmap_index *bitmap_git)
if (bitmap_git->base) {
if (!bitmap_is_midx(bitmap_git))
BUG("non-MIDX bitmap has non-NULL base bitmap index");
- if (load_bitmap(r, bitmap_git->base) < 0)
+ if (load_bitmap(r, bitmap_git->base, 1) < 0)
goto failed;
}
+ if (!recursing)
+ load_all_type_bitmaps(bitmap_git);
+
return 0;
failed:
@@ -687,7 +730,7 @@ struct bitmap_index *prepare_bitmap_git(struct repository *r)
{
struct bitmap_index *bitmap_git = xcalloc(1, sizeof(*bitmap_git));
- if (!open_bitmap(r, bitmap_git) && !load_bitmap(r, bitmap_git))
+ if (!open_bitmap(r, bitmap_git) && !load_bitmap(r, bitmap_git, 0))
return bitmap_git;
free_bitmap_index(bitmap_git);
@@ -2042,7 +2085,7 @@ struct bitmap_index *prepare_bitmap_walk(struct rev_info *revs,
* from disk. this is the point of no return; after this the rev_list
* becomes invalidated and we must perform the revwalk through bitmaps
*/
- if (load_bitmap(revs->repo, bitmap_git) < 0)
+ if (load_bitmap(revs->repo, bitmap_git, 0) < 0)
goto cleanup;
if (!use_boundary_traversal)
@@ -2961,6 +3004,10 @@ void free_bitmap_index(struct bitmap_index *b)
ewah_pool_free(b->trees);
ewah_pool_free(b->blobs);
ewah_pool_free(b->tags);
+ free(b->commits_all);
+ free(b->trees_all);
+ free(b->blobs_all);
+ free(b->tags_all);
if (b->bitmaps) {
struct stored_bitmap *sb;
kh_foreach_value(b->bitmaps, sb, {
--
2.47.0.301.g77ddd1170f9
* [PATCH v3 12/13] pack-bitmap.c: use `ewah_or_iterator` for type bitmap iterators
2024-11-19 22:07 ` [PATCH v3 00/13] midx: incremental multi-pack indexes, part two Taylor Blau
` (10 preceding siblings ...)
2024-11-19 22:07 ` [PATCH v3 11/13] pack-bitmap.c: keep track of each layer's type bitmaps Taylor Blau
@ 2024-11-19 22:07 ` Taylor Blau
2025-02-28 10:01 ` Patrick Steinhardt
2024-11-19 22:07 ` [PATCH v3 13/13] midx: implement writing incremental MIDX bitmaps Taylor Blau
2024-11-20 8:49 ` [PATCH v3 00/13] midx: incremental multi-pack indexes, part two Junio C Hamano
13 siblings, 1 reply; 136+ messages in thread
From: Taylor Blau @ 2024-11-19 22:07 UTC (permalink / raw)
To: git; +Cc: Elijah Newren, Jeff King, Junio C Hamano
Now that we have initialized arrays for each bitmap layer's type bitmaps
in the previous commit, adjust existing callers to use them in
preparation for multi-layered bitmaps.
Signed-off-by: Taylor Blau <me@ttaylorr.com>
---
pack-bitmap.c | 42 +++++++++++++++++++++++++++---------------
1 file changed, 27 insertions(+), 15 deletions(-)
diff --git a/pack-bitmap.c b/pack-bitmap.c
index 348488e2d9e..83696d834f6 100644
--- a/pack-bitmap.c
+++ b/pack-bitmap.c
@@ -1622,25 +1622,29 @@ static void show_extended_objects(struct bitmap_index *bitmap_git,
}
}
-static void init_type_iterator(struct ewah_iterator *it,
+static void init_type_iterator(struct ewah_or_iterator *it,
struct bitmap_index *bitmap_git,
enum object_type type)
{
switch (type) {
case OBJ_COMMIT:
- ewah_iterator_init(it, bitmap_git->commits);
+ ewah_or_iterator_init(it, bitmap_git->commits_all,
+ bitmap_git->base_nr);
break;
case OBJ_TREE:
- ewah_iterator_init(it, bitmap_git->trees);
+ ewah_or_iterator_init(it, bitmap_git->trees_all,
+ bitmap_git->base_nr);
break;
case OBJ_BLOB:
- ewah_iterator_init(it, bitmap_git->blobs);
+ ewah_or_iterator_init(it, bitmap_git->blobs_all,
+ bitmap_git->base_nr);
break;
case OBJ_TAG:
- ewah_iterator_init(it, bitmap_git->tags);
+ ewah_or_iterator_init(it, bitmap_git->tags_all,
+ bitmap_git->base_nr);
break;
default:
@@ -1657,7 +1661,7 @@ static void show_objects_for_type(
size_t i = 0;
uint32_t offset;
- struct ewah_iterator it;
+ struct ewah_or_iterator it;
eword_t filter;
struct bitmap *objects = bitmap_git->result;
@@ -1665,7 +1669,7 @@ static void show_objects_for_type(
init_type_iterator(&it, bitmap_git, object_type);
for (i = 0; i < objects->word_alloc &&
- ewah_iterator_next(&filter, &it); i++) {
+ ewah_or_iterator_next(&filter, &it); i++) {
eword_t word = objects->words[i] & filter;
size_t pos = (i * BITS_IN_EWORD);
@@ -1707,6 +1711,8 @@ static void show_objects_for_type(
show_reach(&oid, object_type, 0, hash, pack, ofs);
}
}
+
+ ewah_or_iterator_free(&it);
}
static int in_bitmapped_pack(struct bitmap_index *bitmap_git,
@@ -1758,7 +1764,7 @@ static void filter_bitmap_exclude_type(struct bitmap_index *bitmap_git,
{
struct eindex *eindex = &bitmap_git->ext_index;
struct bitmap *tips;
- struct ewah_iterator it;
+ struct ewah_or_iterator it;
eword_t mask;
uint32_t i;
@@ -1775,7 +1781,7 @@ static void filter_bitmap_exclude_type(struct bitmap_index *bitmap_git,
* packfile.
*/
for (i = 0, init_type_iterator(&it, bitmap_git, type);
- i < to_filter->word_alloc && ewah_iterator_next(&mask, &it);
+ i < to_filter->word_alloc && ewah_or_iterator_next(&mask, &it);
i++) {
if (i < tips->word_alloc)
mask &= ~tips->words[i];
@@ -1795,6 +1801,7 @@ static void filter_bitmap_exclude_type(struct bitmap_index *bitmap_git,
bitmap_unset(to_filter, pos);
}
+ ewah_or_iterator_free(&it);
bitmap_free(tips);
}
@@ -1852,14 +1859,14 @@ static void filter_bitmap_blob_limit(struct bitmap_index *bitmap_git,
{
struct eindex *eindex = &bitmap_git->ext_index;
struct bitmap *tips;
- struct ewah_iterator it;
+ struct ewah_or_iterator it;
eword_t mask;
uint32_t i;
tips = find_tip_objects(bitmap_git, tip_objects, OBJ_BLOB);
for (i = 0, init_type_iterator(&it, bitmap_git, OBJ_BLOB);
- i < to_filter->word_alloc && ewah_iterator_next(&mask, &it);
+ i < to_filter->word_alloc && ewah_or_iterator_next(&mask, &it);
i++) {
eword_t word = to_filter->words[i] & mask;
unsigned offset;
@@ -1887,6 +1894,7 @@ static void filter_bitmap_blob_limit(struct bitmap_index *bitmap_git,
bitmap_unset(to_filter, pos);
}
+ ewah_or_iterator_free(&it);
bitmap_free(tips);
}
@@ -2506,12 +2514,12 @@ static uint32_t count_object_type(struct bitmap_index *bitmap_git,
struct eindex *eindex = &bitmap_git->ext_index;
uint32_t i = 0, count = 0;
- struct ewah_iterator it;
+ struct ewah_or_iterator it;
eword_t filter;
init_type_iterator(&it, bitmap_git, type);
- while (i < objects->word_alloc && ewah_iterator_next(&filter, &it)) {
+ while (i < objects->word_alloc && ewah_or_iterator_next(&filter, &it)) {
eword_t word = objects->words[i++] & filter;
count += ewah_bit_popcount64(word);
}
@@ -2523,6 +2531,8 @@ static uint32_t count_object_type(struct bitmap_index *bitmap_git,
count++;
}
+ ewah_or_iterator_free(&it);
+
return count;
}
@@ -3051,13 +3061,13 @@ static off_t get_disk_usage_for_type(struct bitmap_index *bitmap_git,
{
struct bitmap *result = bitmap_git->result;
off_t total = 0;
- struct ewah_iterator it;
+ struct ewah_or_iterator it;
eword_t filter;
size_t i;
init_type_iterator(&it, bitmap_git, object_type);
for (i = 0; i < result->word_alloc &&
- ewah_iterator_next(&filter, &it); i++) {
+ ewah_or_iterator_next(&filter, &it); i++) {
eword_t word = result->words[i] & filter;
size_t base = (i * BITS_IN_EWORD);
unsigned offset;
@@ -3098,6 +3108,8 @@ static off_t get_disk_usage_for_type(struct bitmap_index *bitmap_git,
}
}
+ ewah_or_iterator_free(&it);
+
return total;
}
--
2.47.0.301.g77ddd1170f9
* [PATCH v3 13/13] midx: implement writing incremental MIDX bitmaps
2024-11-19 22:07 ` [PATCH v3 00/13] midx: incremental multi-pack indexes, part two Taylor Blau
` (11 preceding siblings ...)
2024-11-19 22:07 ` [PATCH v3 12/13] pack-bitmap.c: use `ewah_or_iterator` for type bitmap iterators Taylor Blau
@ 2024-11-19 22:07 ` Taylor Blau
2025-02-28 10:01 ` Patrick Steinhardt
2024-11-20 8:49 ` [PATCH v3 00/13] midx: incremental multi-pack indexes, part two Junio C Hamano
13 siblings, 1 reply; 136+ messages in thread
From: Taylor Blau @ 2024-11-19 22:07 UTC (permalink / raw)
To: git; +Cc: Elijah Newren, Jeff King, Junio C Hamano
Now that the pack-bitmap machinery has learned how to read and interact
with an incremental MIDX bitmap, teach the pack-bitmap-write.c machinery
(and relevant callers from within the MIDX machinery) to write such
bitmaps.
The details for doing so are mostly straightforward. The main changes
are as follows:
- find_object_pos() now makes use of an extra MIDX parameter which is
used to locate the bit positions of objects which are from previous
layers (and thus do not exist in the current layer's pack_order
field).
(Note also that the pack_order field is moved into struct
write_midx_context to further simplify the callers for
write_midx_bitmap()).
- bitmap_writer_build_type_index() first determines how many objects
precede the current bitmap layer and offsets the bits it sets in
each respective type-level bitmap by that amount so they can be OR'd
together.
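The effect of that offset can be shown with plain word arrays. In this
sketch, which is an assumption-laden illustration rather than the writer's
real code, each layer marks its own objects at positions shifted by the
number of objects in earlier layers, so bitmaps from different layers never
overlap and can be OR'd into one chain-wide bitmap. All toy_* names are
hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Set bit 'pos' in a plain (uncompressed) word-array bitmap. */
static void toy_bitmap_set(uint64_t *words, size_t nr_words, uint32_t pos)
{
	assert(pos / 64 < nr_words);
	words[pos / 64] |= (uint64_t)1 << (pos % 64);
}

/*
 * Mark a layer's 'nr_local' objects. Local object i is recorded at the
 * chain-wide position base_objects + i, where 'base_objects' is the
 * number of objects in all earlier layers (0 for the first layer).
 */
static void toy_build_type_index(uint64_t *words, size_t nr_words,
				 uint32_t base_objects, uint32_t nr_local)
{
	uint32_t i;

	for (i = 0; i < nr_local; i++)
		toy_bitmap_set(words, nr_words, base_objects + i);
}
```

For a base layer with 3 objects and a second layer with 2, the first bitmap
covers bits 0-2 and the second covers bits 3-4, so their OR has bits 0-4 set
and their AND is empty.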
Signed-off-by: Taylor Blau <me@ttaylorr.com>
---
builtin/pack-objects.c | 3 +-
midx-write.c | 49 ++++++++++-----
pack-bitmap-write.c | 65 ++++++++++++++-----
pack-bitmap.h | 4 +-
t/t5334-incremental-multi-pack-index.sh | 84 +++++++++++++++++++++++++
5 files changed, 171 insertions(+), 34 deletions(-)
diff --git a/builtin/pack-objects.c b/builtin/pack-objects.c
index 08007142671..09d9ef62055 100644
--- a/builtin/pack-objects.c
+++ b/builtin/pack-objects.c
@@ -1370,7 +1370,8 @@ static void write_pack_file(void)
if (write_bitmap_index) {
bitmap_writer_init(&bitmap_writer,
- the_repository, &to_pack);
+ the_repository, &to_pack,
+ NULL);
bitmap_writer_set_checksum(&bitmap_writer, hash);
bitmap_writer_build_type_index(&bitmap_writer,
written_list);
diff --git a/midx-write.c b/midx-write.c
index b3a5f6c5166..6f7a8e045fd 100644
--- a/midx-write.c
+++ b/midx-write.c
@@ -645,15 +645,21 @@ static uint32_t *midx_pack_order(struct write_midx_context *ctx)
return pack_order;
}
-static void write_midx_reverse_index(char *midx_name, unsigned char *midx_hash,
- struct write_midx_context *ctx)
+static void write_midx_reverse_index(struct write_midx_context *ctx,
+ const char *object_dir,
+ unsigned char *midx_hash)
{
struct strbuf buf = STRBUF_INIT;
char *tmp_file;
trace2_region_enter("midx", "write_midx_reverse_index", the_repository);
- strbuf_addf(&buf, "%s-%s.rev", midx_name, hash_to_hex(midx_hash));
+ if (ctx->incremental)
+ get_split_midx_filename_ext(&buf, object_dir, midx_hash,
+ MIDX_EXT_REV);
+ else
+ get_midx_filename_ext(&buf, object_dir, midx_hash,
+ MIDX_EXT_REV);
tmp_file = write_rev_file_order(NULL, ctx->pack_order, ctx->entries_nr,
midx_hash, WRITE_REV);
@@ -827,20 +833,26 @@ static struct commit **find_commits_for_midx_bitmap(uint32_t *indexed_commits_nr
return cb.commits;
}
-static int write_midx_bitmap(const char *midx_name,
+static int write_midx_bitmap(struct write_midx_context *ctx,
+ const char *object_dir,
const unsigned char *midx_hash,
struct packing_data *pdata,
struct commit **commits,
uint32_t commits_nr,
- uint32_t *pack_order,
unsigned flags)
{
int ret, i;
uint16_t options = 0;
struct bitmap_writer writer;
struct pack_idx_entry **index;
- char *bitmap_name = xstrfmt("%s-%s.bitmap", midx_name,
- hash_to_hex(midx_hash));
+ struct strbuf bitmap_name = STRBUF_INIT;
+
+ if (ctx->incremental)
+ get_split_midx_filename_ext(&bitmap_name, object_dir, midx_hash,
+ MIDX_EXT_BITMAP);
+ else
+ get_midx_filename_ext(&bitmap_name, object_dir, midx_hash,
+ MIDX_EXT_BITMAP);
trace2_region_enter("midx", "write_midx_bitmap", the_repository);
@@ -859,7 +871,8 @@ static int write_midx_bitmap(const char *midx_name,
for (i = 0; i < pdata->nr_objects; i++)
index[i] = &pdata->objects[i].idx;
- bitmap_writer_init(&writer, the_repository, pdata);
+ bitmap_writer_init(&writer, the_repository, pdata,
+ ctx->incremental ? ctx->base_midx : NULL);
bitmap_writer_show_progress(&writer, flags & MIDX_PROGRESS);
bitmap_writer_build_type_index(&writer, index);
@@ -877,7 +890,7 @@ static int write_midx_bitmap(const char *midx_name,
* bitmap_writer_finish().
*/
for (i = 0; i < pdata->nr_objects; i++)
- index[pack_order[i]] = &pdata->objects[i].idx;
+ index[ctx->pack_order[i]] = &pdata->objects[i].idx;
bitmap_writer_select_commits(&writer, commits, commits_nr);
ret = bitmap_writer_build(&writer);
@@ -885,11 +898,11 @@ static int write_midx_bitmap(const char *midx_name,
goto cleanup;
bitmap_writer_set_checksum(&writer, midx_hash);
- bitmap_writer_finish(&writer, index, bitmap_name, options);
+ bitmap_writer_finish(&writer, index, bitmap_name.buf, options);
cleanup:
free(index);
- free(bitmap_name);
+ strbuf_release(&bitmap_name);
bitmap_writer_free(&writer);
trace2_region_leave("midx", "write_midx_bitmap", the_repository);
@@ -1073,8 +1086,6 @@ static int write_midx_internal(const char *object_dir,
trace2_region_enter("midx", "write_midx_internal", the_repository);
ctx.incremental = !!(flags & MIDX_WRITE_INCREMENTAL);
- if (ctx.incremental && (flags & MIDX_WRITE_BITMAP))
- die(_("cannot write incremental MIDX with bitmap"));
if (ctx.incremental)
strbuf_addf(&midx_name,
@@ -1116,6 +1127,12 @@ static int write_midx_internal(const char *object_dir,
if (ctx.incremental) {
struct multi_pack_index *m = ctx.base_midx;
while (m) {
+ if (flags & MIDX_WRITE_BITMAP && load_midx_revindex(m)) {
+ error(_("could not load reverse index for MIDX %s"),
+ hash_to_hex(get_midx_checksum(m)));
+ result = 1;
+ goto cleanup;
+ }
ctx.num_multi_pack_indexes_before++;
m = m->base_midx;
}
@@ -1382,7 +1399,7 @@ static int write_midx_internal(const char *object_dir,
if (flags & MIDX_WRITE_REV_INDEX &&
git_env_bool("GIT_TEST_MIDX_WRITE_REV", 0))
- write_midx_reverse_index(midx_name.buf, midx_hash, &ctx);
+ write_midx_reverse_index(&ctx, object_dir, midx_hash);
if (flags & MIDX_WRITE_BITMAP) {
struct packing_data pdata;
@@ -1405,8 +1422,8 @@ static int write_midx_internal(const char *object_dir,
FREE_AND_NULL(ctx.entries);
ctx.entries_nr = 0;
- if (write_midx_bitmap(midx_name.buf, midx_hash, &pdata,
- commits, commits_nr, ctx.pack_order,
+ if (write_midx_bitmap(&ctx, object_dir,
+ midx_hash, &pdata, commits, commits_nr,
flags) < 0) {
error(_("could not write multi-pack bitmap"));
result = 1;
diff --git a/pack-bitmap-write.c b/pack-bitmap-write.c
index 49758e2525f..1fbebe84479 100644
--- a/pack-bitmap-write.c
+++ b/pack-bitmap-write.c
@@ -25,6 +25,8 @@
#include "alloc.h"
#include "refs.h"
#include "strmap.h"
+#include "midx.h"
+#include "pack-revindex.h"
struct bitmapped_commit {
struct commit *commit;
@@ -42,7 +44,8 @@ static inline int bitmap_writer_nr_selected_commits(struct bitmap_writer *writer
}
void bitmap_writer_init(struct bitmap_writer *writer, struct repository *r,
- struct packing_data *pdata)
+ struct packing_data *pdata,
+ struct multi_pack_index *midx)
{
memset(writer, 0, sizeof(struct bitmap_writer));
if (writer->bitmaps)
@@ -50,6 +53,7 @@ void bitmap_writer_init(struct bitmap_writer *writer, struct repository *r,
writer->bitmaps = kh_init_oid_map();
writer->pseudo_merge_commits = kh_init_oid_map();
writer->to_pack = pdata;
+ writer->midx = midx;
string_list_init_dup(&writer->pseudo_merge_groups);
@@ -112,6 +116,11 @@ void bitmap_writer_build_type_index(struct bitmap_writer *writer,
struct pack_idx_entry **index)
{
uint32_t i;
+ uint32_t base_objects = 0;
+
+ if (writer->midx)
+ base_objects = writer->midx->num_objects +
+ writer->midx->num_objects_in_base;
writer->commits = ewah_new();
writer->trees = ewah_new();
@@ -141,19 +150,19 @@ void bitmap_writer_build_type_index(struct bitmap_writer *writer,
switch (real_type) {
case OBJ_COMMIT:
- ewah_set(writer->commits, i);
+ ewah_set(writer->commits, i + base_objects);
break;
case OBJ_TREE:
- ewah_set(writer->trees, i);
+ ewah_set(writer->trees, i + base_objects);
break;
case OBJ_BLOB:
- ewah_set(writer->blobs, i);
+ ewah_set(writer->blobs, i + base_objects);
break;
case OBJ_TAG:
- ewah_set(writer->tags, i);
+ ewah_set(writer->tags, i + base_objects);
break;
default:
@@ -206,19 +215,37 @@ void bitmap_writer_push_commit(struct bitmap_writer *writer,
static uint32_t find_object_pos(struct bitmap_writer *writer,
const struct object_id *oid, int *found)
{
- struct object_entry *entry = packlist_find(writer->to_pack, oid);
+ struct object_entry *entry;
+
+ entry = packlist_find(writer->to_pack, oid);
+ if (entry) {
+ uint32_t base_objects = 0;
+ if (writer->midx)
+ base_objects = writer->midx->num_objects +
+ writer->midx->num_objects_in_base;
+
+ if (found)
+ *found = 1;
+ return oe_in_pack_pos(writer->to_pack, entry) + base_objects;
+ } else if (writer->midx) {
+ uint32_t at, pos;
+
+ if (!bsearch_midx(oid, writer->midx, &at))
+ goto missing;
+ if (midx_to_pack_pos(writer->midx, at, &pos) < 0)
+ goto missing;
- if (!entry) {
if (found)
- *found = 0;
- warning("Failed to write bitmap index. Packfile doesn't have full closure "
- "(object %s is missing)", oid_to_hex(oid));
- return 0;
+ *found = 1;
+ return pos;
}
+missing:
if (found)
- *found = 1;
- return oe_in_pack_pos(writer->to_pack, entry);
+ *found = 0;
+ warning("Failed to write bitmap index. Packfile doesn't have full closure "
+ "(object %s is missing)", oid_to_hex(oid));
+ return 0;
}
static void compute_xor_offsets(struct bitmap_writer *writer)
@@ -585,7 +612,7 @@ int bitmap_writer_build(struct bitmap_writer *writer)
struct prio_queue queue = { compare_commits_by_gen_then_commit_date };
struct prio_queue tree_queue = { NULL };
struct bitmap_index *old_bitmap;
- uint32_t *mapping;
+ uint32_t *mapping = NULL;
int closed = 1; /* until proven otherwise */
if (writer->show_progress)
@@ -1018,7 +1045,7 @@ void bitmap_writer_finish(struct bitmap_writer *writer,
struct strbuf tmp_file = STRBUF_INIT;
struct hashfile *f;
off_t *offsets = NULL;
- uint32_t i;
+ uint32_t i, base_objects;
struct bitmap_disk_header header;
@@ -1044,6 +1071,12 @@ void bitmap_writer_finish(struct bitmap_writer *writer,
if (options & BITMAP_OPT_LOOKUP_TABLE)
CALLOC_ARRAY(offsets, writer->to_pack->nr_objects);
+ if (writer->midx)
+ base_objects = writer->midx->num_objects +
+ writer->midx->num_objects_in_base;
+ else
+ base_objects = 0;
+
for (i = 0; i < bitmap_writer_nr_selected_commits(writer); i++) {
struct bitmapped_commit *stored = &writer->selected[i];
int commit_pos = oid_pos(&stored->commit->object.oid, index,
@@ -1052,7 +1085,7 @@ void bitmap_writer_finish(struct bitmap_writer *writer,
if (commit_pos < 0)
BUG(_("trying to write commit not in index"));
- stored->commit_pos = commit_pos;
+ stored->commit_pos = commit_pos + base_objects;
}
write_selected_commits_v1(writer, f, offsets);
diff --git a/pack-bitmap.h b/pack-bitmap.h
index d7f4b8b8e95..dd0951088f6 100644
--- a/pack-bitmap.h
+++ b/pack-bitmap.h
@@ -111,6 +111,7 @@ struct bitmap_writer {
kh_oid_map_t *bitmaps;
struct packing_data *to_pack;
+ struct multi_pack_index *midx; /* if appending to a MIDX chain */
struct bitmapped_commit *selected;
unsigned int selected_nr, selected_alloc;
@@ -125,7 +126,8 @@ struct bitmap_writer {
};
void bitmap_writer_init(struct bitmap_writer *writer, struct repository *r,
- struct packing_data *pdata);
+ struct packing_data *pdata,
+ struct multi_pack_index *midx);
void bitmap_writer_show_progress(struct bitmap_writer *writer, int show);
void bitmap_writer_set_checksum(struct bitmap_writer *writer,
const unsigned char *sha1);
diff --git a/t/t5334-incremental-multi-pack-index.sh b/t/t5334-incremental-multi-pack-index.sh
index 471994c4bc8..3aac7ccdfe2 100755
--- a/t/t5334-incremental-multi-pack-index.sh
+++ b/t/t5334-incremental-multi-pack-index.sh
@@ -45,4 +45,88 @@ test_expect_success 'convert incremental to non-incremental' '
compare_results_with_midx 'non-incremental MIDX conversion'
+write_midx_layer () {
+ n=1
+ if test -f $midx_chain
+ then
+ n="$(($(wc -l <$midx_chain) + 1))"
+ fi
+
+ for i in 1 2
+ do
+ test_commit $n.$i &&
+ git repack -d || return 1
+ done &&
+ git multi-pack-index write --bitmap --incremental
+}
+
+test_expect_success 'write initial MIDX layer' '
+ git repack -ad &&
+ write_midx_layer
+'
+
+test_expect_success 'read bitmap from first MIDX layer' '
+ git rev-list --test-bitmap 1.2
+'
+
+test_expect_success 'write another MIDX layer' '
+ write_midx_layer
+'
+
+test_expect_success 'midx verify with multiple layers' '
+ git multi-pack-index verify
+'
+
+test_expect_success 'read bitmap from second MIDX layer' '
+ git rev-list --test-bitmap 2.2
+'
+
+test_expect_success 'read earlier bitmap from second MIDX layer' '
+ git rev-list --test-bitmap 1.2
+'
+
+test_expect_success 'show object from first pack' '
+ git cat-file -p 1.1
+'
+
+test_expect_success 'show object from second pack' '
+ git cat-file -p 2.2
+'
+
+for reuse in false single multi
+do
+ test_expect_success "full clone (pack.allowPackReuse=$reuse)" '
+ rm -fr clone.git &&
+
+ git config pack.allowPackReuse $reuse &&
+ git clone --no-local --bare . clone.git
+ '
+done
+
+test_expect_success 'relink existing MIDX layer' '
+ rm -fr "$midxdir" &&
+
+ GIT_TEST_MIDX_WRITE_REV=1 git multi-pack-index write --bitmap &&
+
+ midx_hash="$(test-tool read-midx --checksum $objdir)" &&
+
+ test_path_is_file "$packdir/multi-pack-index" &&
+ test_path_is_file "$packdir/multi-pack-index-$midx_hash.bitmap" &&
+ test_path_is_file "$packdir/multi-pack-index-$midx_hash.rev" &&
+
+ test_commit another &&
+ git repack -d &&
+ git multi-pack-index write --bitmap --incremental &&
+
+ test_path_is_missing "$packdir/multi-pack-index" &&
+ test_path_is_missing "$packdir/multi-pack-index-$midx_hash.bitmap" &&
+ test_path_is_missing "$packdir/multi-pack-index-$midx_hash.rev" &&
+
+ test_path_is_file "$midxdir/multi-pack-index-$midx_hash.midx" &&
+ test_path_is_file "$midxdir/multi-pack-index-$midx_hash.bitmap" &&
+ test_path_is_file "$midxdir/multi-pack-index-$midx_hash.rev" &&
+ test_line_count = 2 "$midx_chain"
+
+'
+
test_done
--
2.47.0.301.g77ddd1170f9
* Re: [PATCH v3 00/13] midx: incremental multi-pack indexes, part two
2024-11-19 22:07 ` [PATCH v3 00/13] midx: incremental multi-pack indexes, part two Taylor Blau
` (12 preceding siblings ...)
2024-11-19 22:07 ` [PATCH v3 13/13] midx: implement writing incremental MIDX bitmaps Taylor Blau
@ 2024-11-20 8:49 ` Junio C Hamano
13 siblings, 0 replies; 136+ messages in thread
From: Junio C Hamano @ 2024-11-20 8:49 UTC (permalink / raw)
To: Taylor Blau; +Cc: git, Elijah Newren, Jeff King
Taylor Blau <me@ttaylorr.com> writes:
> == Changes since last time
>
> This round fixes a small issue when writing legacy ".rev" files outside
> of the MIDX in '--incremental' mode.
>
> The rest of the series is unchanged, and re-submitted to solicit review
> now that I have more time to focus on this series.
>
> == Original cover letter
>
> This series is based on 'master', with an additional merge between
> tb/incremental-midx-part-1[1] and my newer series to fix a handful of
> bugs related to pseudo-merge bitmaps[2].
Both of these prerequisite topics were from August, so we do not
have to worry about reconstructing the base anymore ;-) As I do not
have any trace of this topic in my tree anymore (except that I know
an earlier round that ended with "fixup! midx: implement writing"
existed in the past), we could queue this on 'maint' afresh, I
guess?
When merged to 'seen', pack-bitmap.c has conflicts with other topics
in flight and what is annoying is the lines involved in the
conflicts are rather on the overly long side.
I think I resolved them correctly, but we may want to correct these
overly long lines if a new iteration is needed in the future.
Thanks.
* Re: [PATCH v3 01/13] Documentation: describe incremental MIDX bitmaps
2024-11-19 22:07 ` [PATCH v3 01/13] Documentation: describe incremental MIDX bitmaps Taylor Blau
@ 2025-02-28 10:01 ` Patrick Steinhardt
2025-02-28 23:26 ` Taylor Blau
0 siblings, 1 reply; 136+ messages in thread
From: Patrick Steinhardt @ 2025-02-28 10:01 UTC (permalink / raw)
To: Taylor Blau; +Cc: git, Elijah Newren, Jeff King, Junio C Hamano
On Tue, Nov 19, 2024 at 05:07:19PM -0500, Taylor Blau wrote:
> diff --git a/Documentation/technical/multi-pack-index.txt b/Documentation/technical/multi-pack-index.txt
> index cc063b30bea..a063262c360 100644
> --- a/Documentation/technical/multi-pack-index.txt
> +++ b/Documentation/technical/multi-pack-index.txt
> @@ -164,6 +164,70 @@ objects_nr($H2) + objects_nr($H1) + i
> (in the C implementation, this is often computed as `i +
> m->num_objects_in_base`).
>
> +=== Pseudo-pack order for incremental MIDXs
> +
> +The original implementation of multi-pack reachability bitmaps defined
> +the pseudo-pack order in linkgit:gitformat-pack[5] (see the section
> +titled "multi-pack-index reverse indexes") roughly as follows:
> +
> +____
> +In short, a MIDX's pseudo-pack is the de-duplicated concatenation of
> +objects in packs stored by the MIDX, laid out in pack order, and the
> +packs arranged in MIDX order (with the preferred pack coming first).
> +____
> +
> +In the incremental MIDX design, we extend this definition to include
> +objects from multiple layers of the MIDX chain. The pseudo-pack order
> +for incremental MIDXs is determined by concatenating the pseudo-pack
> +ordering for each layer of the MIDX chain in order. Formally two objects
> +`o1` and `o2` are compared as follows:
> +
> +1. If `o1` appears in an earlier layer of the MIDX chain than `o2`, then
> + `o1` is considered less than `o2`.
Just as a refresher for myself: what is the consequence of an object
`o1` sorting earlier than `o2`? In the case where those refer to
different objects it is only used to establish the pseudo-pack order so
that we know how to interpret the bitmaps. But in the case where those
two objects refer to the same underlying object, e.g. because the object
is contained in two packs, it also impacts which of both objects would
be preferred e.g. during a clone, right?
> +2. Otherwise, if `o1` and `o2` appear in the same MIDX layer, and that
> + MIDX layer has no base, then If one of `pack(o1)` and `pack(o2)` is
s/If/if
> + preferred and the other is not, then the preferred one sorts first. If
> + there is a base layer (i.e. the MIDX layer is not the first layer in
> + the chain), then if `pack(o1)` appears earlier in that MIDX layer's
> + pack order, than `o1` is less than `o2`. Likewise if `pack(o2)`
> + appears earlier, than the opposite is true.
Another question for my own understanding: why is it relevant whether we
have a base or not? I would have expected that the case where the
objects appear in two different layers is already covered by (1), so
from thereon we only need to care about two objects existing in the same
layer.
> +3. Otherwise, `o1` and `o2` appear in the same pack, and thus in the
> + same MIDX layer. Sort `o1` and `o2` by their offset within their
> + containing packfile.
> +
> +=== Reachability bitmaps and incremental MIDXs
> +
> +Each layer of an incremental MIDX chain may have its objects (and the
> +objects from any previous layer in the same MIDX chain) represented in
> +its own `*.bitmap` file.
> +
> +The structure of a `*.bitmap` file belonging to an incremental MIDX
> +chain is identical to that of a non-incremental MIDX bitmap, or a
> +classic single-pack bitmap. Since objects are added to the end of the
> +incremental MIDX's pseudo-pack order (see: above), it is possible to
> +extend a bitmap when appending to the end of a MIDX chain.
> +
> +(Note: it is possible likewise to compress a contiguous sequence of MIDX
> +incremental layers, and their `*.bitmap`(s) into a single layer and
> +`*.bitmap`, but this is not yet implemented.)
Fair. What do we currently do in this context? Do we just keep on
appending layer after layer?
Patrick
* Re: [PATCH v3 02/13] pack-revindex: prepare for incremental MIDX bitmaps
2024-11-19 22:07 ` [PATCH v3 02/13] pack-revindex: prepare for " Taylor Blau
@ 2025-02-28 10:01 ` Patrick Steinhardt
2025-02-28 23:39 ` Taylor Blau
0 siblings, 1 reply; 136+ messages in thread
From: Patrick Steinhardt @ 2025-02-28 10:01 UTC (permalink / raw)
To: Taylor Blau; +Cc: git, Elijah Newren, Jeff King, Junio C Hamano
On Tue, Nov 19, 2024 at 05:07:22PM -0500, Taylor Blau wrote:
> diff --git a/pack-bitmap.c b/pack-bitmap.c
> index 4fa9dfc771a..bba9c6a905a 100644
> --- a/pack-bitmap.c
> +++ b/pack-bitmap.c
> @@ -170,6 +170,15 @@ static struct ewah_bitmap *read_bitmap_1(struct bitmap_index *index)
> return read_bitmap(index->map, index->map_size, &index->map_pos);
> }
>
> +static uint32_t bitmap_non_extended_bits(struct bitmap_index *index)
> +{
> + if (index->midx) {
> + struct multi_pack_index *m = index->midx;
> + return m->num_objects + m->num_objects_in_base;
> + }
> + return index->pack->num_objects;
> +}
>
> static uint32_t bitmap_num_objects(struct bitmap_index *index)
> {
> if (index->midx)
Okay, besides counting our own objects, we also need to account for any
objects that the MIDX layer that we depend on may refer to. I assume
that this is basically recursive, and that the base itself would also
account for its next layer, if any.
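To spell out that arithmetic, here is a minimal sketch. It assumes that
`num_objects_in_base` is simply the sum of `num_objects` over all older
layers; `toy_midx` and both helpers are hypothetical names used only for
illustration, not Git's own structures.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for one layer of a MIDX chain. */
struct toy_midx {
	struct toy_midx *base;   /* older layer, or NULL for the first layer */
	uint32_t num_objects;    /* objects introduced by this layer only */
};

/* Sum of num_objects over every layer older than m. */
static uint32_t toy_objects_in_base(const struct toy_midx *m)
{
	uint32_t total = 0;

	assert(m != NULL);
	for (m = m->base; m; m = m->base)
		total += m->num_objects;
	return total;
}

/* Bit positions covered by layer m: its own objects plus all older ones. */
static uint32_t toy_non_extended_bits(const struct toy_midx *m)
{
	return m->num_objects + toy_objects_in_base(m);
}
```

In this toy model the base layer has no older layers, so its count is
just its own `num_objects`, and each later layer adds on top of that.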
What is interesting to see after this commit is what callsites remain
for `bitmap_num_objects()`. Most of them are converted, but some still
exist:
- `load_bitmap_header()`, where we use it to determine the size of the
hash cache. Makes sense.
- `pseudo_merge_bitmap_for_commit()`, where we use it to compute the
merged bitmap of a specific commit. This one feels weird to me, I
would have expected to use `bitmap_non_extended_bits()` here.
- `filter_bitmap_blob_limit()`, where we seem to filter through the
bitmap of the current layer. I _think_ it makes sense to retain.
- `create_bitmap_mapping()`, which feels like it should be converted?
It would be nice to document in the commit message why those functions
don't need to be converted to help guide the reader a bit.
> diff --git a/pack-revindex.c b/pack-revindex.c
> index 22d3c234648..ce3f7ae2149 100644
> --- a/pack-revindex.c
> +++ b/pack-revindex.c
> @@ -383,8 +383,12 @@ int load_midx_revindex(struct multi_pack_index *m)
> trace2_data_string("load_midx_revindex", the_repository,
> "source", "rev");
>
> - get_midx_filename_ext(&revindex_name, m->object_dir,
> - get_midx_checksum(m), MIDX_EXT_REV);
> + if (m->has_chain)
> + get_split_midx_filename_ext(&revindex_name, m->object_dir,
> + get_midx_checksum(m), MIDX_EXT_REV);
> + else
> + get_midx_filename_ext(&revindex_name, m->object_dir,
> + get_midx_checksum(m), MIDX_EXT_REV);
>
> ret = load_revindex_from_disk(revindex_name.buf,
> m->num_objects,
Here we teach the reverse index to read indices from MIDX chains.
> @@ -471,11 +475,15 @@ off_t pack_pos_to_offset(struct packed_git *p, uint32_t pos)
>
> uint32_t pack_pos_to_midx(struct multi_pack_index *m, uint32_t pos)
> {
> + while (m && pos < m->num_objects_in_base)
> + m = m->base_midx;
> + if (!m)
> + BUG("NULL multi-pack-index for object position: %"PRIu32, pos);
> if (!m->revindex_data)
> BUG("pack_pos_to_midx: reverse index not yet loaded");
> - if (m->num_objects <= pos)
> + if (m->num_objects + m->num_objects_in_base <= pos)
> BUG("pack_pos_to_midx: out-of-bounds object at %"PRIu32, pos);
> - return get_be32(m->revindex_data + pos);
> + return get_be32(m->revindex_data + pos - m->num_objects_in_base);
> }
>
> struct midx_pack_key {
Here we teach the reverse index logic to walk the MIDX layers so that we
find the one that is supposed to contain a given position.
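The walk can be sketched in isolation as follows. This is only a toy
model of the quoted hunk: `toy_layer` and `toy_pos_to_midx()` are
hypothetical names, the table is a plain host-order array rather than
big-endian on-disk data, and out-of-range positions return -1 instead of
calling `BUG()`.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for struct multi_pack_index. */
struct toy_layer {
	struct toy_layer *base;        /* older layer, or NULL */
	uint32_t num_objects;          /* objects introduced by this layer */
	uint32_t num_objects_in_base;  /* objects in all older layers */
	const uint32_t *revindex;      /* per-layer table, num_objects entries */
};

/*
 * Translate a chain-wide pseudo-pack position into the value stored in
 * the table of the layer that owns it. Returns 0 on success, -1 when
 * the position does not belong to any layer.
 */
static int toy_pos_to_midx(const struct toy_layer *m, uint32_t pos,
			   uint32_t *out)
{
	/* Positions below num_objects_in_base belong to an older layer. */
	while (m && pos < m->num_objects_in_base)
		m = m->base;
	if (!m)
		return -1;
	assert(m->revindex != NULL);

	/* Rebase the position so that it indexes this layer's own table. */
	pos -= m->num_objects_in_base;
	if (pos >= m->num_objects)
		return -1;
	*out = m->revindex[pos];
	return 0;
}
```

With a three-object base layer and a two-object layer on top, positions
0 to 2 resolve in the base layer and positions 3 and 4 in the top one.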
> @@ -491,7 +499,8 @@ static int midx_pack_order_cmp(const void *va, const void *vb)
> const struct midx_pack_key *key = va;
> struct multi_pack_index *midx = key->midx;
>
> - uint32_t versus = pack_pos_to_midx(midx, (uint32_t*)vb - (const uint32_t *)midx->revindex_data);
> + size_t pos = (uint32_t*)vb - (const uint32_t *)midx->revindex_data;
Micronit: missing space between `uint32_t` and `*`.
> + uint32_t versus = pack_pos_to_midx(midx, pos + midx->num_objects_in_base);
> uint32_t versus_pack = nth_midxed_pack_int_id(midx, versus);
> off_t versus_offset;
>
Okay, the computation of the position is basically the same,
but we now also offset the position by the number of objects in
preceding layers. Makes sense.
Patrick
* Re: [PATCH v3 03/13] pack-bitmap.c: open and store incremental bitmap layers
2024-11-19 22:07 ` [PATCH v3 03/13] pack-bitmap.c: open and store incremental bitmap layers Taylor Blau
@ 2025-02-28 10:01 ` Patrick Steinhardt
2025-02-28 23:49 ` Taylor Blau
0 siblings, 1 reply; 136+ messages in thread
From: Patrick Steinhardt @ 2025-02-28 10:01 UTC (permalink / raw)
To: Taylor Blau; +Cc: git, Elijah Newren, Jeff King, Junio C Hamano
On Tue, Nov 19, 2024 at 05:07:26PM -0500, Taylor Blau wrote:
> Prepare the pack-bitmap machinery to work with incremental MIDXs by
> adding a new "base" field to keep track of the bitmap index associated
> with the previous MIDX layer.
>
> The changes in this commit are mostly boilerplate to open the correct
> bitmap(s), add them to the chain bitmap layers along the "base" pointer,
s/bitmap layers/of &/
> diff --git a/pack-bitmap.c b/pack-bitmap.c
> index bba9c6a905a..41675a69f68 100644
> --- a/pack-bitmap.c
> +++ b/pack-bitmap.c
> @@ -54,6 +54,13 @@ struct bitmap_index {
> struct packed_git *pack;
> struct multi_pack_index *midx;
>
> + /*
> + * If using a multi-pack index chain, 'base' points to the
> + * bitmap index corresponding to this bitmap's midx->base_midx.
> + */
> + struct bitmap_index *base;
> + uint32_t base_nr;
> +
It would be nice to point out that `base_nr` is not 0-indexed, but
1-indexed, which is rather uncommon. Is there any particular reason why
you made it 1-indexed?
> @@ -377,8 +384,13 @@ static int load_bitmap_entries_v1(struct bitmap_index *index)
> char *midx_bitmap_filename(struct multi_pack_index *midx)
> {
> struct strbuf buf = STRBUF_INIT;
> - get_midx_filename_ext(&buf, midx->object_dir, get_midx_checksum(midx),
> - MIDX_EXT_BITMAP);
> + if (midx->has_chain)
> + get_split_midx_filename_ext(&buf, midx->object_dir,
> + get_midx_checksum(midx),
> + MIDX_EXT_BITMAP);
> + else
> + get_midx_filename_ext(&buf, midx->object_dir,
> + get_midx_checksum(midx), MIDX_EXT_BITMAP);
>
> return strbuf_detach(&buf, NULL);
> }
Okay, this is mostly the same change as in the preceding commit, but for
bitmaps instead of reverse indices.
> @@ -397,10 +409,17 @@ static int open_midx_bitmap_1(struct bitmap_index *bitmap_git,
> {
> struct stat st;
> char *bitmap_name = midx_bitmap_filename(midx);
> - int fd = git_open(bitmap_name);
> + int fd;
> uint32_t i, preferred_pack;
> struct packed_git *preferred;
>
> + fd = git_open(bitmap_name);
> + if (fd < 0 && errno == ENOENT) {
> + FREE_AND_NULL(bitmap_name);
> + bitmap_name = midx_bitmap_filename(midx);
> + fd = git_open(bitmap_name);
> + }
> +
Wait, this looks weird to me. `bitmap_name` already contains the result
of `midx_bitmap_filename()`, so you're essentially retrying the exact
same operation as before?
Patrick
* Re: [PATCH v3 04/13] pack-bitmap.c: teach `bitmap_for_commit()` about incremental MIDXs
2024-11-19 22:07 ` [PATCH v3 04/13] pack-bitmap.c: teach `bitmap_for_commit()` about incremental MIDXs Taylor Blau
@ 2025-02-28 10:01 ` Patrick Steinhardt
2025-03-01 0:12 ` Taylor Blau
0 siblings, 1 reply; 136+ messages in thread
From: Patrick Steinhardt @ 2025-02-28 10:01 UTC (permalink / raw)
To: Taylor Blau; +Cc: git, Elijah Newren, Jeff King, Junio C Hamano
On Tue, Nov 19, 2024 at 05:07:29PM -0500, Taylor Blau wrote:
> diff --git a/pack-bitmap.c b/pack-bitmap.c
> index 41675a69f68..e3fdcf8a01a 100644
> --- a/pack-bitmap.c
> +++ b/pack-bitmap.c
> @@ -946,18 +946,21 @@ static struct stored_bitmap *lazy_bitmap_for_commit(struct bitmap_index *bitmap_
> struct ewah_bitmap *bitmap_for_commit(struct bitmap_index *bitmap_git,
> struct commit *commit)
> {
> - khiter_t hash_pos = kh_get_oid_map(bitmap_git->bitmaps,
> - commit->object.oid);
> + khiter_t hash_pos;
> + if (!bitmap_git)
> + return NULL;
> +
> + hash_pos = kh_get_oid_map(bitmap_git->bitmaps, commit->object.oid);
> if (hash_pos >= kh_end(bitmap_git->bitmaps)) {
> struct stored_bitmap *bitmap = NULL;
> if (!bitmap_git->table_lookup)
> - return NULL;
> + return bitmap_for_commit(bitmap_git->base, commit);
>
> /* this is a fairly hot codepath - no trace2_region please */
> /* NEEDSWORK: cache misses aren't recorded */
> bitmap = lazy_bitmap_for_commit(bitmap_git, commit);
> if (!bitmap)
> - return NULL;
> + return bitmap_for_commit(bitmap_git->base, commit);
> return lookup_stored_bitmap(bitmap);
> }
> return lookup_stored_bitmap(kh_value(bitmap_git->bitmaps, hash_pos));
One of the things that worries me a bit is that by recursing, the depth
of MIDX layers we can handle is essentially bounded, as we may otherwise
blow the stack. Not that I expect us to typically have thousands of
layers, but if there ever was a bug this may fail in bad ways.
I already asked this for a previous commit, but what is the current
state regarding compaction of the layers? Do we need to be worried about
this or do we already know to keep things limited in general?
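For illustration, the same fallback can be written as a loop, which
keeps the stack depth constant no matter how long the chain is. This is
a sketch over a toy structure, not a proposed replacement for the hunk
above: `toy_bitmap_layer`, `toy_layer_has()` and `toy_find_layer()` are
hypothetical, and commits are represented by plain integers.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical layer holding the ids of commits that have a bitmap. */
struct toy_bitmap_layer {
	struct toy_bitmap_layer *base;  /* older layer, or NULL */
	const uint32_t *commit_ids;
	size_t nr;
};

static int toy_layer_has(const struct toy_bitmap_layer *l, uint32_t id)
{
	size_t i;

	assert(l->nr == 0 || l->commit_ids != NULL);
	for (i = 0; i < l->nr; i++)
		if (l->commit_ids[i] == id)
			return 1;
	return 0;
}

/* Try the newest layer first, then each older one; no recursion. */
static const struct toy_bitmap_layer *
toy_find_layer(const struct toy_bitmap_layer *top, uint32_t id)
{
	const struct toy_bitmap_layer *l;

	for (l = top; l; l = l->base)
		if (toy_layer_has(l, id))
			return l;
	return NULL;
}
```

A lookup that misses in every layer simply falls off the end of the
chain and returns NULL, mirroring the `!bitmap_git` check in the patch.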
Patrick
* Re: [PATCH v3 06/13] pack-bitmap.c: support bitmap pack-reuse with incremental MIDXs
2024-11-19 22:07 ` [PATCH v3 06/13] pack-bitmap.c: support bitmap pack-reuse with " Taylor Blau
@ 2025-02-28 10:01 ` Patrick Steinhardt
2025-03-01 0:16 ` Taylor Blau
0 siblings, 1 reply; 136+ messages in thread
From: Patrick Steinhardt @ 2025-02-28 10:01 UTC (permalink / raw)
To: Taylor Blau; +Cc: git, Elijah Newren, Jeff King, Junio C Hamano
On Tue, Nov 19, 2024 at 05:07:35PM -0500, Taylor Blau wrote:
> diff --git a/pack-bitmap.c b/pack-bitmap.c
> index c2c824347a6..1dddb242434 100644
> --- a/pack-bitmap.c
> +++ b/pack-bitmap.c
> @@ -2347,14 +2348,18 @@ void reuse_partial_packfile_from_bitmap(struct bitmap_index *bitmap_git,
> uint32_t pack_int_id;
>
> if (bitmap_is_midx(bitmap_git)) {
> + struct multi_pack_index *m = bitmap_git->midx;
> uint32_t preferred_pack_pos;
>
> - if (midx_preferred_pack(bitmap_git->midx, &preferred_pack_pos) < 0) {
> + while (m->base_midx)
> + m = m->base_midx;
> +
> + if (midx_preferred_pack(m, &preferred_pack_pos) < 0) {
> warning(_("unable to compute preferred pack, disabling pack-reuse"));
> return;
> }
Instead of completely disabling preferred packs, should we maybe fall
back to the preferred pack of the next-higher layer?
Patrick
* Re: [PATCH v3 07/13] pack-bitmap.c: teach `rev-list --test-bitmap` about incremental MIDXs
2024-11-19 22:07 ` [PATCH v3 07/13] pack-bitmap.c: teach `rev-list --test-bitmap` about " Taylor Blau
@ 2025-02-28 10:01 ` Patrick Steinhardt
2025-03-01 0:19 ` Taylor Blau
0 siblings, 1 reply; 136+ messages in thread
From: Patrick Steinhardt @ 2025-02-28 10:01 UTC (permalink / raw)
To: Taylor Blau; +Cc: git, Elijah Newren, Jeff King, Junio C Hamano
On Tue, Nov 19, 2024 at 05:07:38PM -0500, Taylor Blau wrote:
> diff --git a/pack-bitmap.c b/pack-bitmap.c
> index 1dddb242434..02864a0e1f7 100644
> --- a/pack-bitmap.c
> +++ b/pack-bitmap.c
> @@ -2564,13 +2584,57 @@ static void test_show_commit(struct commit *commit, void *data)
> display_progress(tdata->prg, ++tdata->seen);
> }
>
> +static uint32_t bitmap_total_entry_count(struct bitmap_index *bitmap_git)
> +{
> + uint32_t total = 0;
> + do {
> + total = st_add(total, bitmap_git->entry_count);
> + bitmap_git = bitmap_git->base;
> + } while (bitmap_git);
> +
> + return total;
> +}
> +
> +static void prepare_bitmap_test_data(struct bitmap_test_data *tdata,
> + struct bitmap_index *bitmap_git)
Nit: according to our style guide this should be called
`bitmap_test_data_prepare()`.
> +{
> + memset(tdata, 0, sizeof(struct bitmap_test_data));
> +
> + tdata->bitmap_git = bitmap_git;
> + tdata->base = bitmap_new();
> + tdata->commits = ewah_to_bitmap(bitmap_git->commits);
> + tdata->trees = ewah_to_bitmap(bitmap_git->trees);
> + tdata->blobs = ewah_to_bitmap(bitmap_git->blobs);
> + tdata->tags = ewah_to_bitmap(bitmap_git->tags);
> +
> + if (bitmap_git->base) {
> + CALLOC_ARRAY(tdata->base_tdata, 1);
> + prepare_bitmap_test_data(tdata->base_tdata, bitmap_git->base);
> + }
> +}
> +
> +static void free_bitmap_test_data(struct bitmap_test_data *tdata)
Same nit here, this should be called `bitmap_test_data_free()`. In fact,
it should be called `bitmap_test_data_release()`, because we don't free
`tdata` itself.
> @@ -2579,17 +2643,26 @@ void test_bitmap_walk(struct rev_info *revs)
> if (revs->pending.nr != 1)
> die(_("you must specify exactly one commit to test"));
>
> - fprintf_ln(stderr, "Bitmap v%d test (%d entries%s)",
> + fprintf_ln(stderr, "Bitmap v%d test (%d entries%s, %d total)",
> bitmap_git->version,
> bitmap_git->entry_count,
> - bitmap_git->table_lookup ? "" : " loaded");
> + bitmap_git->table_lookup ? "" : " loaded",
> + bitmap_total_entry_count(bitmap_git));
>
> root = revs->pending.objects[0].item;
> - bm = bitmap_for_commit(bitmap_git, (struct commit *)root);
> + bm = find_bitmap_for_commit(bitmap_git, (struct commit *)root, &found);
>
> if (bm) {
> fprintf_ln(stderr, "Found bitmap for '%s'. %d bits / %08x checksum",
> - oid_to_hex(&root->oid), (int)bm->bit_size, ewah_checksum(bm));
> + oid_to_hex(&root->oid),
> + (int)bm->bit_size, ewah_checksum(bm));
> +
> + if (bitmap_is_midx(found))
> + fprintf_ln(stderr, "Located via MIDX '%s'.",
> + hash_to_hex(get_midx_checksum(found->midx)));
> + else
> + fprintf_ln(stderr, "Located via pack '%s'.",
> + hash_to_hex(found->pack->hash));
>
> result = ewah_to_bitmap(bm);
> }
I'm a bit surprised that this doesn't result in any changes required in
our tests.
Patrick
* Re: [PATCH v3 10/13] ewah: implement `struct ewah_or_iterator`
2024-11-19 22:07 ` [PATCH v3 10/13] ewah: implement `struct ewah_or_iterator` Taylor Blau
@ 2025-02-28 10:01 ` Patrick Steinhardt
2025-03-01 0:22 ` Taylor Blau
0 siblings, 1 reply; 136+ messages in thread
From: Patrick Steinhardt @ 2025-02-28 10:01 UTC (permalink / raw)
To: Taylor Blau; +Cc: git, Elijah Newren, Jeff King, Junio C Hamano
On Tue, Nov 19, 2024 at 05:07:47PM -0500, Taylor Blau wrote:
> diff --git a/ewah/ewah_bitmap.c b/ewah/ewah_bitmap.c
> index 8785cbc54a8..b3a7ada0714 100644
> --- a/ewah/ewah_bitmap.c
> +++ b/ewah/ewah_bitmap.c
> @@ -372,6 +372,39 @@ void ewah_iterator_init(struct ewah_iterator *it, struct ewah_bitmap *parent)
> read_new_rlw(it);
> }
>
> +void ewah_or_iterator_init(struct ewah_or_iterator *it,
> + struct ewah_bitmap **parents, size_t nr)
> +{
> + size_t i;
> +
> + memset(it, 0, sizeof(*it));
> +
> + ALLOC_ARRAY(it->its, nr);
> + for (i = 0; i < nr; i++)
> + ewah_iterator_init(&it->its[it->nr++], parents[i]);
> +}
> +
> +int ewah_or_iterator_next(eword_t *next, struct ewah_or_iterator *it)
> +{
> + eword_t buf, out = 0;
> + size_t i;
> + int ret = 0;
> +
> + for (i = 0; i < it->nr; i++)
> + if (ewah_iterator_next(&buf, &it->its[i])) {
> + out |= buf;
> + ret = 1;
> + }
> +
> + *next = out;
> + return ret;
> +}
Yup, this looks rather straightforward: we advance each of our
subiterators and OR their respective results into the accumulated result
that we end up returning to the user.
One thing that surprised me though is that we don't seem to be able to
tell whether all of the iterators could be moved on to the next result
or not. But I guess that makes sense: iterators of lower levels will
cover fewer objects and will thus eventually be exhausted before the
iterators on the higher levels.
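A small model of that behaviour, assuming plain arrays of 64-bit words
in place of compressed EWAH data; `toy_iter`, `toy_iter_next()` and
`toy_or_next()` are hypothetical names that only mirror the shape of the
quoted functions.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical word cursor standing in for struct ewah_iterator. */
struct toy_iter {
	const uint64_t *words;
	size_t nr, pos;
};

/* Yield the next word, or return 0 once the cursor is exhausted. */
static int toy_iter_next(uint64_t *next, struct toy_iter *it)
{
	if (it->pos >= it->nr)
		return 0;
	*next = it->words[it->pos++];
	return 1;
}

/*
 * OR together the next word of every cursor that still has one.
 * Returns 1 while at least one cursor produced a word, 0 otherwise.
 */
static int toy_or_next(uint64_t *next, struct toy_iter *its, size_t nr)
{
	uint64_t buf, out = 0;
	size_t i;
	int ret = 0;

	assert(next != NULL);
	for (i = 0; i < nr; i++)
		if (toy_iter_next(&buf, &its[i])) {
			out |= buf;
			ret = 1;
		}

	*next = out;
	return ret;
}
```

Exhausted cursors contribute nothing to later words, so a shorter
bitmap from an earlier layer just stops participating while the longer
ones keep going.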
> +void ewah_or_iterator_free(struct ewah_or_iterator *it)
> +{
> + free(it->its);
> +}
> +
This should be called `ewah_or_iterator_release()` because we don't
free `it` itself, as documented by our style guide.
Patrick
* Re: [PATCH v3 11/13] pack-bitmap.c: keep track of each layer's type bitmaps
2024-11-19 22:07 ` [PATCH v3 11/13] pack-bitmap.c: keep track of each layer's type bitmaps Taylor Blau
@ 2025-02-28 10:01 ` Patrick Steinhardt
2025-03-01 0:26 ` Taylor Blau
0 siblings, 1 reply; 136+ messages in thread
From: Patrick Steinhardt @ 2025-02-28 10:01 UTC (permalink / raw)
To: Taylor Blau; +Cc: git, Elijah Newren, Jeff King, Junio C Hamano
On Tue, Nov 19, 2024 at 05:07:50PM -0500, Taylor Blau wrote:
> @@ -586,7 +604,29 @@ static int load_reverse_index(struct repository *r, struct bitmap_index *bitmap_
> return load_pack_revindex(r, bitmap_git->pack);
> }
>
> -static int load_bitmap(struct repository *r, struct bitmap_index *bitmap_git)
> +static void load_all_type_bitmaps(struct bitmap_index *bitmap_git)
> +{
> + struct bitmap_index *curr = bitmap_git;
> + size_t i = bitmap_git->base_nr - 1;
> +
> + ALLOC_ARRAY(bitmap_git->commits_all, bitmap_git->base_nr);
> + ALLOC_ARRAY(bitmap_git->trees_all, bitmap_git->base_nr);
> + ALLOC_ARRAY(bitmap_git->blobs_all, bitmap_git->base_nr);
> + ALLOC_ARRAY(bitmap_git->tags_all, bitmap_git->base_nr);
> +
> + while (curr) {
> + bitmap_git->commits_all[i] = curr->commits;
> + bitmap_git->trees_all[i] = curr->trees;
> + bitmap_git->blobs_all[i] = curr->blobs;
> + bitmap_git->tags_all[i] = curr->tags;
> +
> + curr = curr->base;
> + i -= 1;
Do we want to `BUG()` in case `i == 0` before this statement?
Patrick
* Re: [PATCH v3 13/13] midx: implement writing incremental MIDX bitmaps
2024-11-19 22:07 ` [PATCH v3 13/13] midx: implement writing incremental MIDX bitmaps Taylor Blau
@ 2025-02-28 10:01 ` Patrick Steinhardt
2025-03-01 0:31 ` Taylor Blau
0 siblings, 1 reply; 136+ messages in thread
From: Patrick Steinhardt @ 2025-02-28 10:01 UTC (permalink / raw)
To: Taylor Blau; +Cc: git, Elijah Newren, Jeff King, Junio C Hamano
On Tue, Nov 19, 2024 at 05:07:56PM -0500, Taylor Blau wrote:
> diff --git a/pack-bitmap-write.c b/pack-bitmap-write.c
> index 49758e2525f..1fbebe84479 100644
> --- a/pack-bitmap-write.c
> +++ b/pack-bitmap-write.c
> @@ -25,6 +25,8 @@
> #include "alloc.h"
> #include "refs.h"
> #include "strmap.h"
> +#include "midx.h"
> +#include "pack-revindex.h"
Nit: let's keep the headers sorted alphabetically.
> @@ -206,19 +215,37 @@ void bitmap_writer_push_commit(struct bitmap_writer *writer,
> static uint32_t find_object_pos(struct bitmap_writer *writer,
> const struct object_id *oid, int *found)
> {
> - struct object_entry *entry = packlist_find(writer->to_pack, oid);
> + struct object_entry *entry;
> +
> + entry = packlist_find(writer->to_pack, oid);
> + if (entry) {
> + uint32_t base_objects = 0;
> + if (writer->midx)
> + base_objects = writer->midx->num_objects +
> + writer->midx->num_objects_in_base;
> +
> + if (found)
> + *found = 1;
> + return oe_in_pack_pos(writer->to_pack, entry) + base_objects;
> + } else if (writer->midx) {
> + uint32_t at, pos;
> +
> + if (!bsearch_midx(oid, writer->midx, &at))
> + goto missing;
> + if (midx_to_pack_pos(writer->midx, at, &pos) < 0)
> + goto missing;
>
> - if (!entry) {
> if (found)
> - *found = 0;
> - warning("Failed to write bitmap index. Packfile doesn't have full closure "
> - "(object %s is missing)", oid_to_hex(oid));
> - return 0;
> + *found = 1;
> + return pos;
> }
>
> +missing:
> if (found)
> - *found = 1;
> - return oe_in_pack_pos(writer->to_pack, entry);
> + *found = 0;
> + warning("Failed to write bitmap index. Packfile doesn't have full closure "
> + "(object %s is missing)", oid_to_hex(oid));
Is this warning still accurate? I assume that in the MIDX case we don't
have to have full closure in a single packfile, as that would otherwise
make the whole thing rather moot.
Patrick
* Re: [PATCH v3 12/13] pack-bitmap.c: use `ewah_or_iterator` for type bitmap iterators
2024-11-19 22:07 ` [PATCH v3 12/13] pack-bitmap.c: use `ewah_or_iterator` for type bitmap iterators Taylor Blau
@ 2025-02-28 10:01 ` Patrick Steinhardt
2025-03-01 0:28 ` Taylor Blau
0 siblings, 1 reply; 136+ messages in thread
From: Patrick Steinhardt @ 2025-02-28 10:01 UTC (permalink / raw)
To: Taylor Blau; +Cc: git, Elijah Newren, Jeff King, Junio C Hamano
On Tue, Nov 19, 2024 at 05:07:53PM -0500, Taylor Blau wrote:
> diff --git a/pack-bitmap.c b/pack-bitmap.c
> index 348488e2d9e..83696d834f6 100644
> --- a/pack-bitmap.c
> +++ b/pack-bitmap.c
> @@ -1622,25 +1622,29 @@ static void show_extended_objects(struct bitmap_index *bitmap_git,
> }
> }
>
> -static void init_type_iterator(struct ewah_iterator *it,
> +static void init_type_iterator(struct ewah_or_iterator *it,
> struct bitmap_index *bitmap_git,
> enum object_type type)
> {
> switch (type) {
> case OBJ_COMMIT:
> - ewah_iterator_init(it, bitmap_git->commits);
> + ewah_or_iterator_init(it, bitmap_git->commits_all,
> + bitmap_git->base_nr);
> break;
>
> case OBJ_TREE:
> - ewah_iterator_init(it, bitmap_git->trees);
> + ewah_or_iterator_init(it, bitmap_git->trees_all,
> + bitmap_git->base_nr);
> break;
>
> case OBJ_BLOB:
> - ewah_iterator_init(it, bitmap_git->blobs);
> + ewah_or_iterator_init(it, bitmap_git->blobs_all,
> + bitmap_git->base_nr);
> break;
>
> case OBJ_TAG:
> - ewah_iterator_init(it, bitmap_git->tags);
> + ewah_or_iterator_init(it, bitmap_git->tags_all,
> + bitmap_git->base_nr);
> break;
>
> default:
One thing I wonder here is whether we want to continue using the
non-layered iterator in case we don't have an MIDX. But I guess it makes
it easier to not do it like that because we have fewer conditionals, and
overall the or'd logic shouldn't perform significantly worse anyway. So
as long as we always initialize `base_nr` and the relevant `*_all`
fields even in the non-MIDX case we're fine.
Patrick
* Re: [PATCH v3 01/13] Documentation: describe incremental MIDX bitmaps
2025-02-28 10:01 ` Patrick Steinhardt
@ 2025-02-28 23:26 ` Taylor Blau
2025-03-03 10:54 ` Patrick Steinhardt
0 siblings, 1 reply; 136+ messages in thread
From: Taylor Blau @ 2025-02-28 23:26 UTC (permalink / raw)
To: Patrick Steinhardt; +Cc: git, Elijah Newren, Jeff King, Junio C Hamano
On Fri, Feb 28, 2025 at 11:01:04AM +0100, Patrick Steinhardt wrote:
> On Tue, Nov 19, 2024 at 05:07:19PM -0500, Taylor Blau wrote:
> > diff --git a/Documentation/technical/multi-pack-index.txt b/Documentation/technical/multi-pack-index.txt
> > index cc063b30bea..a063262c360 100644
> > --- a/Documentation/technical/multi-pack-index.txt
> > +++ b/Documentation/technical/multi-pack-index.txt
> > @@ -164,6 +164,70 @@ objects_nr($H2) + objects_nr($H1) + i
> > (in the C implementation, this is often computed as `i +
> > m->num_objects_in_base`).
> >
> > +=== Pseudo-pack order for incremental MIDXs
> > +
> > +The original implementation of multi-pack reachability bitmaps defined
> > +the pseudo-pack order in linkgit:gitformat-pack[5] (see the section
> > +titled "multi-pack-index reverse indexes") roughly as follows:
> > +
> > +____
> > +In short, a MIDX's pseudo-pack is the de-duplicated concatenation of
> > +objects in packs stored by the MIDX, laid out in pack order, and the
> > +packs arranged in MIDX order (with the preferred pack coming first).
> > +____
> > +
> > +In the incremental MIDX design, we extend this definition to include
> > +objects from multiple layers of the MIDX chain. The pseudo-pack order
> > +for incremental MIDXs is determined by concatenating the pseudo-pack
> > +ordering for each layer of the MIDX chain in order. Formally two objects
> > +`o1` and `o2` are compared as follows:
> > +
> > +1. If `o1` appears in an earlier layer of the MIDX chain than `o2`, then
> > + `o1` is considered less than `o2`.
>
> Just as a refresher for myself: what is the consequence of an object
> `o1` sorting earlier than `o2`? In the case where those refer to
> different objects it is only used to establish the pseudo-pack order so
> that we know how to interpret the bitmaps. But in the case where those
> two objects refer to the same underlying object, e.g. because the object
> is contained in two packs, it also impacts which of both objects would
> be preferred e.g. during a clone, right?
Great question -- the pseudo-pack order here is how we translate the set
of objects in a MIDX into their corresponding bit positions in the
bitmap.
So if "o1" sorts ahead of "o2", that means that "o1" will appear in an
earlier bit position than "o2". But note that we're talking about
objects in a MIDX chain here, comprised of objects from each MIDX'd layer of
that chain. So by that point the duplicates have already been filtered
out, since:
- The MIDX only stores one copy of an object in any given MIDX, and
- The incremental MIDX design avoids putting objects from earlier
layers in later ones.
I tried to get at this a few lines up with "[...] a MIDX's pseudo-pack
is the de-duplicated concatenation of [...]" to make clear that o1 != o2
here. But let me know if you think I should clarify or emphasize that
point further.
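The mapping from pseudo-pack order to bit positions can be sketched as follows. This is only an illustration under simplifying assumptions: `struct midx_layer` is a hypothetical, pared-down stand-in for `struct multi_pack_index`, and the two helpers are made-up names. Only the `num_objects` and `num_objects_in_base` fields mirror what the quoted patches use.

```c
#include <stdint.h>

/* Hypothetical, pared-down MIDX layer: only the fields this sketch needs. */
struct midx_layer {
	uint32_t num_objects;         /* objects stored in this layer */
	uint32_t num_objects_in_base; /* objects in all earlier layers */
};

/*
 * Bit position of the object at pseudo-pack position 'pos' within
 * layer 'm'. Objects from earlier layers occupy the lower bits.
 */
static uint32_t chain_bit_position(const struct midx_layer *m, uint32_t pos)
{
	return m->num_objects_in_base + pos;
}

/* Number of bits needed to cover every object up to and including 'm'. */
static uint32_t chain_total_bits(const struct midx_layer *m)
{
	return m->num_objects + m->num_objects_in_base;
}
```

With a 100-object base layer and a 25-object layer on top, the first object of the top layer lands at bit 100 and the chain needs 125 bits in total.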
> > +2. Otherwise, if `o1` and `o2` appear in the same MIDX layer, and that
> > + MIDX layer has no base, then If one of `pack(o1)` and `pack(o2)` is
>
> s/If/if
Great catch, thanks!
> > + preferred and the other is not, then the preferred one sorts first. If
> > + there is a base layer (i.e. the MIDX layer is not the first layer in
> > + the chain), then if `pack(o1)` appears earlier in that MIDX layer's
> > + pack order, then `o1` is less than `o2`. Likewise if `pack(o2)`
> > + appears earlier, then the opposite is true.
>
> Another question for my own understanding: why is it relevant whether we
> have a base or not? I would have expected that the case where the
> objects appear in two different layers is already covered by (1), so
> from thereon we only need to care about two objects existing in the same
> layer.
Good question. Throughout this design I'm trying to get rid of the
concept of a "preferred" pack with respect to the MIDX. Before
multi-pack reuse existed, the idea behind having a preferred pack was
that it was a way to indicate which pack we should prioritize for
pack-reuse.
But now that we can reuse objects from any pack stored in a MIDX, the
concept of a preferred pack doesn't need to exist. It still makes some
sense (if you want to use single-pack reuse for some reason, etc.) but
I'm trying to push us away from it.
So to answer your question of "why does it matter if there is a base
or not?" the reason is that this series treats the preferred pack as a
property of the chain instead of the individual layers. And the
mention of it here is to differentiate between how we compare packs in
the base (favoring the preferred pack) versus subsequent layers
(reflecting the pack order within that layer).
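Rules 1 to 3 of the quoted documentation can be read as a single comparator. The sketch below is one possible reading, not code from the series: `struct obj_pos` and `pseudo_pack_cmp()` are hypothetical names, and falling back to pack order between two non-preferred packs in the base layer is an assumption taken from the original pseudo-pack definition.

```c
#include <stdint.h>

/* Hypothetical description of where one object lives in the chain. */
struct obj_pos {
	uint32_t layer;     /* 0 is the base layer of the chain */
	uint32_t pack;      /* index of the pack in its layer's pack order */
	int pack_preferred; /* nonzero for the base layer's preferred pack */
	uint64_t offset;    /* offset of the object within its pack */
};

static int pseudo_pack_cmp(const struct obj_pos *a, const struct obj_pos *b)
{
	/* Rule 1: objects in earlier layers sort first. */
	if (a->layer != b->layer)
		return a->layer < b->layer ? -1 : 1;

	/* Rule 2, base layer only: the preferred pack sorts first. */
	if (a->layer == 0 && a->pack_preferred != b->pack_preferred)
		return a->pack_preferred ? -1 : 1;

	/* Rule 2, otherwise: follow the layer's pack order. */
	if (a->pack != b->pack)
		return a->pack < b->pack ? -1 : 1;

	/* Rule 3: same pack, so sort by offset. */
	if (a->offset != b->offset)
		return a->offset < b->offset ? -1 : 1;
	return 0;
}
```

Note that the preferred-pack check is only consulted when both objects are in layer 0, which matches treating the preferred pack as a property of the chain rather than of each layer.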
> > +3. Otherwise, `o1` and `o2` appear in the same pack, and thus in the
> > + same MIDX layer. Sort `o1` and `o2` by their offset within their
> > + containing packfile.
> > +
> > +=== Reachability bitmaps and incremental MIDXs
> > +
> > +Each layer of an incremental MIDX chain may have its objects (and the
> > +objects from any previous layer in the same MIDX chain) represented in
> > +its own `*.bitmap` file.
> > +
> > +The structure of a `*.bitmap` file belonging to an incremental MIDX
> > +chain is identical to that of a non-incremental MIDX bitmap, or a
> > +classic single-pack bitmap. Since objects are added to the end of the
> > +incremental MIDX's pseudo-pack order (see: above), it is possible to
> > +extend a bitmap when appending to the end of a MIDX chain.
> > +
> > +(Note: it is possible likewise to compress a contiguous sequence of MIDX
> > +incremental layers, and their `*.bitmap`(s) into a single layer and
> > +`*.bitmap`, but this is not yet implemented.)
>
> Fair. What do we currently do in this context? Do we just keep on
> appending layer after layer?
That's right. At this point in the project we only know how to append
layers and compress the whole chain into a single layer. But
fundamentally it is possible to compress any contiguous sub-sequence of
the chain into a single layer, and part three of this project will do
just that.
Thanks,
Taylor
* Re: [PATCH v3 02/13] pack-revindex: prepare for incremental MIDX bitmaps
2025-02-28 10:01 ` Patrick Steinhardt
@ 2025-02-28 23:39 ` Taylor Blau
0 siblings, 0 replies; 136+ messages in thread
From: Taylor Blau @ 2025-02-28 23:39 UTC (permalink / raw)
To: Patrick Steinhardt; +Cc: git, Elijah Newren, Jeff King, Junio C Hamano
On Fri, Feb 28, 2025 at 11:01:12AM +0100, Patrick Steinhardt wrote:
> On Tue, Nov 19, 2024 at 05:07:22PM -0500, Taylor Blau wrote:
> > diff --git a/pack-bitmap.c b/pack-bitmap.c
> > index 4fa9dfc771a..bba9c6a905a 100644
> > --- a/pack-bitmap.c
> > +++ b/pack-bitmap.c
> > @@ -170,6 +170,15 @@ static struct ewah_bitmap *read_bitmap_1(struct bitmap_index *index)
> > return read_bitmap(index->map, index->map_size, &index->map_pos);
> > }
> >
> > +static uint32_t bitmap_non_extended_bits(struct bitmap_index *index)
> > +{
> > + if (index->midx) {
> > + struct multi_pack_index *m = index->midx;
> > + return m->num_objects + m->num_objects_in_base;
> > + }
> > + return index->pack->num_objects;
> > +}
> >
> > static uint32_t bitmap_num_objects(struct bitmap_index *index)
> > {
> > if (index->midx)
>
> Okay, despite counting our own objects, we also need to account for any
> objects that the MIDX layer that we depend on may refer to. I assume
> that this is basically recursive, and that the base itself would also
> account for its next layer, if any.
>
> What is interesting to see after this commit is what callsites remain
> for `bitmap_num_objects()`. Most of them are converted, but some still
> exist:
>
> - `load_bitmap_header()`, where we use it to determine the size of the
> hash cache. Makes sense.
>
> - `pseudo_merge_bitmap_for_commit()`, where we use it to compute the
> merged bitmap of a specific commit. This one feels weird to me, I
> would have expected to use `bitmap_non_extended_bits()` here.
Great question. The reason is that this function determines a bitmap
which identifies the parents of a given commit, and that bitmap is
compared against the set of pseudo-merge commits we know about in a
given layer to determine whether or not we have a matching pseudo-merge.
I left a comment to that effect nearby since this is far from obvious
(including to me -- I had to take a few minutes to remember how all of
this works!).
> - `filter_bitmap_blob_limit()`, where we seem to filter through the
> bitmap of the current layer. I _think_ it makes sense to retain.
>
> - `create_bitmap_mapping()`, which feels like it should be converted?
I *think* that this is OK because we are only remapping one layer at a
time, but I'll have to double check. I'd do so now, but I'm trying to
respond as much as I can before my week is over ;-).
> It would be nice to document in the commit message why those functions
> don't need to be converted to help guide the reader a bit.
The conversions here are case-by-case, so I lean towards documenting any
non-obvious ones inline with a brief comment (like the adjustment I made
above for pseudo_merge_bitmap_for_commit()).
> > @@ -491,7 +499,8 @@ static int midx_pack_order_cmp(const void *va, const void *vb)
> > const struct midx_pack_key *key = va;
> > struct multi_pack_index *midx = key->midx;
> >
> > - uint32_t versus = pack_pos_to_midx(midx, (uint32_t*)vb - (const uint32_t *)midx->revindex_data);
> > + size_t pos = (uint32_t*)vb - (const uint32_t *)midx->revindex_data;
>
> Micronit: missing space between `uint32_t` and `*`.
Great catch, thanks!
> > + uint32_t versus = pack_pos_to_midx(midx, pos + midx->num_objects_in_base);
> > uint32_t versus_pack = nth_midxed_pack_int_id(midx, versus);
> > off_t versus_offset;
> >
>
> Okay, the calculation to calculate the position is basically the same,
> but we now also offset the position by the number of objects in
> preceding layers. Makes sense.
Exactly.
Thanks,
Taylor
* Re: [PATCH v3 03/13] pack-bitmap.c: open and store incremental bitmap layers
2025-02-28 10:01 ` Patrick Steinhardt
@ 2025-02-28 23:49 ` Taylor Blau
2025-03-03 10:55 ` Patrick Steinhardt
0 siblings, 1 reply; 136+ messages in thread
From: Taylor Blau @ 2025-02-28 23:49 UTC (permalink / raw)
To: Patrick Steinhardt; +Cc: git, Elijah Newren, Jeff King, Junio C Hamano
On Fri, Feb 28, 2025 at 11:01:16AM +0100, Patrick Steinhardt wrote:
> On Tue, Nov 19, 2024 at 05:07:26PM -0500, Taylor Blau wrote:
> > Prepare the pack-bitmap machinery to work with incremental MIDXs by
> > adding a new "base" field to keep track of the bitmap index associated
> > with the previous MIDX layer.
> >
> > The changes in this commit are mostly boilerplate to open the correct
> > bitmap(s), add them to the chain bitmap layers along the "base" pointer,
>
> s/bitmap layers/of &/
>
> > diff --git a/pack-bitmap.c b/pack-bitmap.c
> > index bba9c6a905a..41675a69f68 100644
> > --- a/pack-bitmap.c
> > +++ b/pack-bitmap.c
> > @@ -54,6 +54,13 @@ struct bitmap_index {
> > struct packed_git *pack;
> > struct multi_pack_index *midx;
> >
> > + /*
> > + * If using a multi-pack index chain, 'base' points to the
> > + * bitmap index corresponding to this bitmap's midx->base_midx.
> > + */
> > + struct bitmap_index *base;
> > + uint32_t base_nr;
> > +
>
> It would be nice to point out that `base_nr` is not 0-indexed, but
> 1-indexed, which is rather uncommon. Is there any particular reason why
> you made it 1-indexed?
Hah, I have no idea! If I remember correctly, it's because it makes it
(slightly) more convenient to do:
ewah_or_iterator_init(it, bitmap_git->commits_all,
bitmap_git->base_nr);
, instead of incrementing 'base_nr' by 1 to determine the number of
sub-iterators to allocate.
So I think there are a couple of options here. Short of doing nothing,
we could:
1. Rename 'base_nr' to 'layers_nr' which would make it clearer that the
count includes the current layer, thus making it 1-indexed.
2. Leave 'base_nr' named as-is, but make it 0-indexed, and have callers add
1 when they need to know the number of layers.
I prefer the explicitness of (2), which is how I adjusted things
locally. But if you prefer (1) or some yet-unknown (3), I'm happy to
adjust it further!
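The difference between the two options comes down to whether the counter includes the current layer. A small sketch, using a hypothetical `struct bitmap_layer` that keeps only the `base` pointer, and made-up helper names:

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for struct bitmap_index; only 'base' matters here. */
struct bitmap_layer {
	struct bitmap_layer *base; /* NULL for the bottom-most layer */
};

/* Option (2): number of layers strictly below 'top'. */
static uint32_t count_bases(const struct bitmap_layer *top)
{
	uint32_t n = 0;

	for (top = top->base; top; top = top->base)
		n++;
	return n;
}

/* Option (1): number of layers including 'top' itself. */
static uint32_t count_layers(const struct bitmap_layer *top)
{
	return count_bases(top) + 1;
}
```

Under option (2), a caller that allocates one slot per layer would size its array with the equivalent of `count_bases(top) + 1`.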
> > @@ -397,10 +409,17 @@ static int open_midx_bitmap_1(struct bitmap_index *bitmap_git,
> > {
> > struct stat st;
> > char *bitmap_name = midx_bitmap_filename(midx);
> > - int fd = git_open(bitmap_name);
> > + int fd;
> > uint32_t i, preferred_pack;
> > struct packed_git *preferred;
> >
> > + fd = git_open(bitmap_name);
> > + if (fd < 0 && errno == ENOENT) {
> > + FREE_AND_NULL(bitmap_name);
> > + bitmap_name = midx_bitmap_filename(midx);
> > + fd = git_open(bitmap_name);
> > + }
> > +
>
> Wait, this looks weird to me. `bitmap_name` already contains the result
> of `midx_bitmap_filename()`, so you're essentially retrying the exact
> same operation as before?
Hmm. I have no idea, but you're exactly right. I dropped it from my
local copy.
Thanks,
Taylor
* Re: [PATCH v3 04/13] pack-bitmap.c: teach `bitmap_for_commit()` about incremental MIDXs
2025-02-28 10:01 ` Patrick Steinhardt
@ 2025-03-01 0:12 ` Taylor Blau
0 siblings, 0 replies; 136+ messages in thread
From: Taylor Blau @ 2025-03-01 0:12 UTC (permalink / raw)
To: Patrick Steinhardt; +Cc: git, Elijah Newren, Jeff King, Junio C Hamano
On Fri, Feb 28, 2025 at 11:01:19AM +0100, Patrick Steinhardt wrote:
> On Tue, Nov 19, 2024 at 05:07:29PM -0500, Taylor Blau wrote:
> > diff --git a/pack-bitmap.c b/pack-bitmap.c
> > index 41675a69f68..e3fdcf8a01a 100644
> > --- a/pack-bitmap.c
> > +++ b/pack-bitmap.c
> > @@ -946,18 +946,21 @@ static struct stored_bitmap *lazy_bitmap_for_commit(struct bitmap_index *bitmap_
> > struct ewah_bitmap *bitmap_for_commit(struct bitmap_index *bitmap_git,
> > struct commit *commit)
> > {
> > - khiter_t hash_pos = kh_get_oid_map(bitmap_git->bitmaps,
> > - commit->object.oid);
> > + khiter_t hash_pos;
> > + if (!bitmap_git)
> > + return NULL;
> > +
> > + hash_pos = kh_get_oid_map(bitmap_git->bitmaps, commit->object.oid);
> > if (hash_pos >= kh_end(bitmap_git->bitmaps)) {
> > struct stored_bitmap *bitmap = NULL;
> > if (!bitmap_git->table_lookup)
> > - return NULL;
> > + return bitmap_for_commit(bitmap_git->base, commit);
> >
> > /* this is a fairly hot codepath - no trace2_region please */
> > /* NEEDSWORK: cache misses aren't recorded */
> > bitmap = lazy_bitmap_for_commit(bitmap_git, commit);
> > if (!bitmap)
> > - return NULL;
> > + return bitmap_for_commit(bitmap_git->base, commit);
> > return lookup_stored_bitmap(bitmap);
> > }
> > return lookup_stored_bitmap(kh_value(bitmap_git->bitmaps, hash_pos));
>
> One of the things that worries me a bit is that by recursing, we
> essentially are bound in the depth of MIDX layers as we may otherwise
> bust the stack. Not that I expect us to typically have thousands of
> layers, but if there ever was a bug this may fail in bad ways.
I think it's a valid concern, but in practice I suspect we are unlikely
to run into it. If we have enough MIDX layers to blow the stack, we
probably have much bigger problems to worry about ;-).
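For what it's worth, the recursion on `->base` happens in tail position, so it could be flattened into a loop if stack depth ever became a concern. A minimal sketch of that shape, assuming a hypothetical `struct layer` whose "stored bitmaps" are just an array of integer keys (none of these names exist in Git):

```c
#include <stddef.h>

/* Hypothetical layer: 'keys' stands in for commits with a stored bitmap. */
struct layer {
	const int *keys;
	size_t nr;
	struct layer *base; /* NULL for the bottom-most layer */
};

/* Returns 1 if 'key' is stored in this layer, 0 otherwise. */
static int layer_has(const struct layer *l, int key)
{
	size_t i;

	for (i = 0; i < l->nr; i++)
		if (l->keys[i] == key)
			return 1;
	return 0;
}

/*
 * Search 'top' first, then each earlier layer in turn. Returns the
 * layer that holds 'key', or NULL if no layer in the chain has it.
 */
static const struct layer *find_layer(const struct layer *top, int key)
{
	const struct layer *curr;

	for (curr = top; curr; curr = curr->base)
		if (layer_has(curr, key))
			return curr;
	return NULL;
}
```

The loop visits layers in the same order as the recursive version, so the result is the same; only the stack usage differs.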
> I already asked this for a previous commit, but what is the current
> state regarding compaction of the layers? Do we need to be worried about
> this or do we already know to keep things limited in general?
I mentioned upthread, but briefly: we don't compact MIDX layers today,
but the design is such that it is possible (and planned) to do in the
future (part three of this multi-series thing).
Thanks,
Taylor
* Re: [PATCH v3 06/13] pack-bitmap.c: support bitmap pack-reuse with incremental MIDXs
2025-02-28 10:01 ` Patrick Steinhardt
@ 2025-03-01 0:16 ` Taylor Blau
0 siblings, 0 replies; 136+ messages in thread
From: Taylor Blau @ 2025-03-01 0:16 UTC (permalink / raw)
To: Patrick Steinhardt; +Cc: git, Elijah Newren, Jeff King, Junio C Hamano
On Fri, Feb 28, 2025 at 11:01:23AM +0100, Patrick Steinhardt wrote:
> On Tue, Nov 19, 2024 at 05:07:35PM -0500, Taylor Blau wrote:
> > diff --git a/pack-bitmap.c b/pack-bitmap.c
> > index c2c824347a6..1dddb242434 100644
> > --- a/pack-bitmap.c
> > +++ b/pack-bitmap.c
> > @@ -2347,14 +2348,18 @@ void reuse_partial_packfile_from_bitmap(struct bitmap_index *bitmap_git,
> > uint32_t pack_int_id;
> >
> > if (bitmap_is_midx(bitmap_git)) {
> > + struct multi_pack_index *m = bitmap_git->midx;
> > uint32_t preferred_pack_pos;
> >
> > - if (midx_preferred_pack(bitmap_git->midx, &preferred_pack_pos) < 0) {
> > + while (m->base_midx)
> > + m = m->base_midx;
> > +
> > + if (midx_preferred_pack(m, &preferred_pack_pos) < 0) {
> > warning(_("unable to compute preferred pack, disabling pack-reuse"));
> > return;
> > }
>
> Instead of completely disabling preferred packs, should we maybe fall
> back to the preferred pack of the next-higher layer?
The upper layers aren't really supposed to have a notion of a preferred
pack, just whatever pack happens to be first in that layer's order
(which by definition of the pseudo-pack order makes it preferred, so to
speak).
But the base layer definitely can have a preferred pack, and failing to
find it usually means that there is other corruption or something else
wrong.
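In other words, the lookup always resolves against the bottom of the chain. A rough sketch of that behaviour, where `struct midx_layer`, its `preferred_pack` field, and both helpers are hypothetical simplifications (Git itself asks `midx_preferred_pack()` for the answer):

```c
#include <stddef.h>

/* Hypothetical MIDX layer; 'preferred_pack' is -1 when none is recorded. */
struct midx_layer {
	int preferred_pack;
	struct midx_layer *base_midx; /* NULL for the base layer */
};

/* Follow base_midx pointers down to the first layer of the chain. */
static const struct midx_layer *chain_base(const struct midx_layer *m)
{
	while (m->base_midx)
		m = m->base_midx;
	return m;
}

/*
 * Store the chain's preferred pack in '*pack' and return 0, or return
 * -1 if the base layer has none (the "disable pack-reuse" case).
 */
static int chain_preferred_pack(const struct midx_layer *m, int *pack)
{
	const struct midx_layer *base = chain_base(m);

	if (base->preferred_pack < 0)
		return -1;
	*pack = base->preferred_pack;
	return 0;
}
```

Upper layers are never consulted for a preferred pack here, which is why a failure is reported rather than falling back to a later layer.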
Thanks,
Taylor
* Re: [PATCH v3 07/13] pack-bitmap.c: teach `rev-list --test-bitmap` about incremental MIDXs
2025-02-28 10:01 ` Patrick Steinhardt
@ 2025-03-01 0:19 ` Taylor Blau
0 siblings, 0 replies; 136+ messages in thread
From: Taylor Blau @ 2025-03-01 0:19 UTC (permalink / raw)
To: Patrick Steinhardt; +Cc: git, Elijah Newren, Jeff King, Junio C Hamano
On Fri, Feb 28, 2025 at 11:01:26AM +0100, Patrick Steinhardt wrote:
> On Tue, Nov 19, 2024 at 05:07:38PM -0500, Taylor Blau wrote:
> > diff --git a/pack-bitmap.c b/pack-bitmap.c
> > index 1dddb242434..02864a0e1f7 100644
> > --- a/pack-bitmap.c
> > +++ b/pack-bitmap.c
> > @@ -2564,13 +2584,57 @@ static void test_show_commit(struct commit *commit, void *data)
> > display_progress(tdata->prg, ++tdata->seen);
> > }
> >
> > +static uint32_t bitmap_total_entry_count(struct bitmap_index *bitmap_git)
> > +{
> > + uint32_t total = 0;
> > + do {
> > + total = st_add(total, bitmap_git->entry_count);
> > + bitmap_git = bitmap_git->base;
> > + } while (bitmap_git);
> > +
> > + return total;
> > +}
> > +
> > +static void prepare_bitmap_test_data(struct bitmap_test_data *tdata,
> > + struct bitmap_index *bitmap_git)
>
> Nit: according to our style guide this should be called
> `bitmap_test_data_prepare()`.
Ah, I didn't know about that part of the style guide. It looks like it
comes from your 10f0723c8d (Documentation: document idiomatic function
names, 2024-07-30), which is semi-recent ;-).
> > +{
> > + memset(tdata, 0, sizeof(struct bitmap_test_data));
> > +
> > + tdata->bitmap_git = bitmap_git;
> > + tdata->base = bitmap_new();
> > + tdata->commits = ewah_to_bitmap(bitmap_git->commits);
> > + tdata->trees = ewah_to_bitmap(bitmap_git->trees);
> > + tdata->blobs = ewah_to_bitmap(bitmap_git->blobs);
> > + tdata->tags = ewah_to_bitmap(bitmap_git->tags);
> > +
> > + if (bitmap_git->base) {
> > + CALLOC_ARRAY(tdata->base_tdata, 1);
> > + prepare_bitmap_test_data(tdata->base_tdata, bitmap_git->base);
> > + }
> > +}
> > +
> > +static void free_bitmap_test_data(struct bitmap_test_data *tdata)
>
> Same nit here, this should be called `bitmap_test_data_free()`. In fact,
> it should be called `bitmap_test_data_release()`, because we don't free
> `tdata` itself.
Noted, and adjusted!
> > @@ -2579,17 +2643,26 @@ void test_bitmap_walk(struct rev_info *revs)
> > if (revs->pending.nr != 1)
> > die(_("you must specify exactly one commit to test"));
> >
> > - fprintf_ln(stderr, "Bitmap v%d test (%d entries%s)",
> > + fprintf_ln(stderr, "Bitmap v%d test (%d entries%s, %d total)",
> > bitmap_git->version,
> > bitmap_git->entry_count,
> > - bitmap_git->table_lookup ? "" : " loaded");
> > + bitmap_git->table_lookup ? "" : " loaded",
> > + bitmap_total_entry_count(bitmap_git));
> >
> > root = revs->pending.objects[0].item;
> > - bm = bitmap_for_commit(bitmap_git, (struct commit *)root);
> > + bm = find_bitmap_for_commit(bitmap_git, (struct commit *)root, &found);
> >
> > if (bm) {
> > fprintf_ln(stderr, "Found bitmap for '%s'. %d bits / %08x checksum",
> > - oid_to_hex(&root->oid), (int)bm->bit_size, ewah_checksum(bm));
> > + oid_to_hex(&root->oid),
> > + (int)bm->bit_size, ewah_checksum(bm));
> > +
> > + if (bitmap_is_midx(found))
> > + fprintf_ln(stderr, "Located via MIDX '%s'.",
> > + hash_to_hex(get_midx_checksum(found->midx)));
> > + else
> > + fprintf_ln(stderr, "Located via pack '%s'.",
> > + hash_to_hex(found->pack->hash));
> >
> > result = ewah_to_bitmap(bm);
> > }
>
> I'm a bit surprised that this doesn't result in any changes required in
> our tests.
That's intentional, the aim of this commit is really just to get us
ready for the eventuality of having incremental MIDXs. In the meantime,
we don't disrupt the behavior of --test-bitmap for single-layer MIDX
bitmaps.
Thanks,
Taylor
* Re: [PATCH v3 10/13] ewah: implement `struct ewah_or_iterator`
2025-02-28 10:01 ` Patrick Steinhardt
@ 2025-03-01 0:22 ` Taylor Blau
0 siblings, 0 replies; 136+ messages in thread
From: Taylor Blau @ 2025-03-01 0:22 UTC (permalink / raw)
To: Patrick Steinhardt; +Cc: git, Elijah Newren, Jeff King, Junio C Hamano
On Fri, Feb 28, 2025 at 11:01:30AM +0100, Patrick Steinhardt wrote:
> On Tue, Nov 19, 2024 at 05:07:47PM -0500, Taylor Blau wrote:
> > diff --git a/ewah/ewah_bitmap.c b/ewah/ewah_bitmap.c
> > index 8785cbc54a8..b3a7ada0714 100644
> > --- a/ewah/ewah_bitmap.c
> > +++ b/ewah/ewah_bitmap.c
> > @@ -372,6 +372,39 @@ void ewah_iterator_init(struct ewah_iterator *it, struct ewah_bitmap *parent)
> > read_new_rlw(it);
> > }
> >
> > +void ewah_or_iterator_init(struct ewah_or_iterator *it,
> > + struct ewah_bitmap **parents, size_t nr)
> > +{
> > + size_t i;
> > +
> > + memset(it, 0, sizeof(*it));
> > +
> > + ALLOC_ARRAY(it->its, nr);
> > + for (i = 0; i < nr; i++)
> > + ewah_iterator_init(&it->its[it->nr++], parents[i]);
> > +}
> > +
> > +int ewah_or_iterator_next(eword_t *next, struct ewah_or_iterator *it)
> > +{
> > + eword_t buf, out = 0;
> > + size_t i;
> > + int ret = 0;
> > +
> > + for (i = 0; i < it->nr; i++)
> > + if (ewah_iterator_next(&buf, &it->its[i])) {
> > + out |= buf;
> > + ret = 1;
> > + }
> > +
> > + *next = out;
> > + return ret;
> > +}
>
> Yup, this looks rather straight-forward: we advance each of our
> subiterators and or their respective results into the accumulated result
> that we end up returning to the user.
>
> One thing that surprised me though is that we don't seem to be able to
> tell whether all of the iterators could be moved on to the next result
> or not. But I guess that makes sense: iterators of lower levels will
> cover fewer objects and will thus eventually be exhausted before the
> iterators on the higher levels.
Exactly. Conceptually each iterator ends with an infinite number of
zero'd bits, and indeed they are expected to be different lengths for
the reason you mention.
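The "implicit trailing zeros" behaviour can be illustrated with a simplified sketch. It uses plain `uint64_t` word arrays rather than compressed EWAH bitmaps, and `word_iter`, `or_iter`, and their functions are hypothetical names that only mimic the shape of the quoted `ewah_or_iterator_next()`.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for ewah_iterator: walks an uncompressed array. */
struct word_iter {
	const uint64_t *words;
	size_t nr, pos;
};

/* Hypothetical stand-in for ewah_or_iterator: a set of sub-iterators. */
struct or_iter {
	struct word_iter *its;
	size_t nr;
};

/* Returns 1 and stores the next word, or returns 0 once exhausted. */
static int word_iter_next(uint64_t *next, struct word_iter *it)
{
	if (it->pos >= it->nr)
		return 0;
	*next = it->words[it->pos++];
	return 1;
}

/*
 * OR together the next word of every sub-iterator. An exhausted
 * sub-iterator contributes nothing, which is equivalent to it
 * contributing all-zero bits. Returns 0 only once every sub-iterator
 * is exhausted.
 */
static int or_iter_next(uint64_t *next, struct or_iter *it)
{
	uint64_t buf, out = 0;
	size_t i;
	int ret = 0;

	for (i = 0; i < it->nr; i++) {
		if (word_iter_next(&buf, &it->its[i])) {
			out |= buf;
			ret = 1;
		}
	}

	*next = out;
	return ret;
}
```

When the sub-iterators have different lengths, the shorter ones simply drop out and the combined iterator keeps producing words until the longest one is done.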
> > +void ewah_or_iterator_free(struct ewah_or_iterator *it)
> > +{
> > + free(it->its);
> > +}
> > +
>
> This should be called `ewah_or_iterator_release()` because we don't
> free `it` itself, as documented by our style guide.
Great call, thanks.
Thanks,
Taylor
* Re: [PATCH v3 11/13] pack-bitmap.c: keep track of each layer's type bitmaps
2025-02-28 10:01 ` Patrick Steinhardt
@ 2025-03-01 0:26 ` Taylor Blau
0 siblings, 0 replies; 136+ messages in thread
From: Taylor Blau @ 2025-03-01 0:26 UTC (permalink / raw)
To: Patrick Steinhardt; +Cc: git, Elijah Newren, Jeff King, Junio C Hamano
On Fri, Feb 28, 2025 at 11:01:34AM +0100, Patrick Steinhardt wrote:
> On Tue, Nov 19, 2024 at 05:07:50PM -0500, Taylor Blau wrote:
> > @@ -586,7 +604,29 @@ static int load_reverse_index(struct repository *r, struct bitmap_index *bitmap_
> > return load_pack_revindex(r, bitmap_git->pack);
> > }
> >
> > -static int load_bitmap(struct repository *r, struct bitmap_index *bitmap_git)
> > +static void load_all_type_bitmaps(struct bitmap_index *bitmap_git)
> > +{
> > + struct bitmap_index *curr = bitmap_git;
> > + size_t i = bitmap_git->base_nr - 1;
> > +
> > + ALLOC_ARRAY(bitmap_git->commits_all, bitmap_git->base_nr);
> > + ALLOC_ARRAY(bitmap_git->trees_all, bitmap_git->base_nr);
> > + ALLOC_ARRAY(bitmap_git->blobs_all, bitmap_git->base_nr);
> > + ALLOC_ARRAY(bitmap_git->tags_all, bitmap_git->base_nr);
> > +
> > + while (curr) {
> > + bitmap_git->commits_all[i] = curr->commits;
> > + bitmap_git->trees_all[i] = curr->trees;
> > + bitmap_git->blobs_all[i] = curr->blobs;
> > + bitmap_git->tags_all[i] = curr->tags;
> > +
> > + curr = curr->base;
> > + i -= 1;
>
> Do we want to `BUG()` in case `i == 0` before this statement?
Good idea, but I think we only want to do this when curr is non-NULL
after assigning it to curr->base, so something like:
--- 8< ---
diff --git a/pack-bitmap.c b/pack-bitmap.c
index 48da3f3b5b..7cea838f58 100644
--- a/pack-bitmap.c
+++ b/pack-bitmap.c
@@ -616,6 +616,9 @@ static void load_all_type_bitmaps(struct bitmap_index *bitmap_git)
bitmap_git->tags_all[i] = curr->tags;
curr = curr->base;
+ if (curr && !i)
+ BUG("unexpected number of bitmap layers, expected %"PRIu32,
+ bitmap_git->base_nr);
i -= 1;
}
}
--- >8 ---
(though note that this is on top of an adjustment I made earlier to make
base_nr 0-indexed as it always should have been, though I don't think it
is relevant to the portion of the change shown here).
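The back-to-front fill and its guard can be tried in isolation with a sketch like the one below. It is not the code from the series: `struct layer` and `collect_layers()` are hypothetical, `nr` is assumed to count every layer including the top one, and the function returns -1 where the real code would call `BUG()`.

```c
#include <stddef.h>

/* Hypothetical layer: 'value' stands in for one of the type bitmaps. */
struct layer {
	int value;
	struct layer *base; /* NULL for the bottom-most layer */
};

/*
 * Fill out[0..nr-1] so that out[0] is the bottom-most layer and
 * out[nr-1] is 'top'. Returns 0 on success, or -1 if the chain holds
 * more layers than 'nr' claims.
 */
static int collect_layers(const struct layer *top, int *out, size_t nr)
{
	const struct layer *curr = top;
	size_t i;

	if (!nr)
		return -1;
	i = nr - 1;

	while (curr) {
		out[i] = curr->value;

		curr = curr->base;
		if (curr && !i)
			return -1; /* next store would index below out[0] */
		i -= 1;
	}
	return 0;
}
```

The check sits after advancing `curr` so that a chain with exactly `nr` layers still succeeds: on the last iteration `i` is 0 but `curr` becomes NULL, and the wrapped-around `i` is never used.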
Thanks,
Taylor
* Re: [PATCH v3 12/13] pack-bitmap.c: use `ewah_or_iterator` for type bitmap iterators
2025-02-28 10:01 ` Patrick Steinhardt
@ 2025-03-01 0:28 ` Taylor Blau
0 siblings, 0 replies; 136+ messages in thread
From: Taylor Blau @ 2025-03-01 0:28 UTC (permalink / raw)
To: Patrick Steinhardt; +Cc: git, Elijah Newren, Jeff King, Junio C Hamano
On Fri, Feb 28, 2025 at 11:01:43AM +0100, Patrick Steinhardt wrote:
> On Tue, Nov 19, 2024 at 05:07:53PM -0500, Taylor Blau wrote:
> > diff --git a/pack-bitmap.c b/pack-bitmap.c
> > index 348488e2d9e..83696d834f6 100644
> > --- a/pack-bitmap.c
> > +++ b/pack-bitmap.c
> > @@ -1622,25 +1622,29 @@ static void show_extended_objects(struct bitmap_index *bitmap_git,
> > }
> > }
> >
> > -static void init_type_iterator(struct ewah_iterator *it,
> > +static void init_type_iterator(struct ewah_or_iterator *it,
> > struct bitmap_index *bitmap_git,
> > enum object_type type)
> > {
> > switch (type) {
> > case OBJ_COMMIT:
> > - ewah_iterator_init(it, bitmap_git->commits);
> > + ewah_or_iterator_init(it, bitmap_git->commits_all,
> > + bitmap_git->base_nr);
> > break;
> >
> > case OBJ_TREE:
> > - ewah_iterator_init(it, bitmap_git->trees);
> > + ewah_or_iterator_init(it, bitmap_git->trees_all,
> > + bitmap_git->base_nr);
> > break;
> >
> > case OBJ_BLOB:
> > - ewah_iterator_init(it, bitmap_git->blobs);
> > + ewah_or_iterator_init(it, bitmap_git->blobs_all,
> > + bitmap_git->base_nr);
> > break;
> >
> > case OBJ_TAG:
> > - ewah_iterator_init(it, bitmap_git->tags);
> > + ewah_or_iterator_init(it, bitmap_git->tags_all,
> > + bitmap_git->base_nr);
> > break;
> >
> > default:
>
> One thing I wonder here is whether we want to continue using the
> non-layered iterator in case we don't have an MIDX. But I guess it makes
> it easier to not do it like that because we have less conditionals, and
> overall the or'd logic shouldn't perform significantly worse anyway. So
> as long as we always initialize `base_nr` and the relevant `*_all`
> fields even in the non-MIDX case we're fine.
I had a similar thought when writing this and agree with you. We do set
up the _all iterators properly at the top of a chain (see the "if
(!recursing)" conditional inside of load_bitmap()).
Since we call that function for single-pack bitmaps as well (naturally
we are not recursing in that case), we'll call load_all_type_bitmaps()
in that case too.
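As a rough illustration of why the single-pack case needs no special
handling, here is a sketch of an OR-ing iterator over plain,
uncompressed word arrays. The real ewah_or_iterator works over
EWAH-compressed bitmaps; `struct word_iter`, `struct or_iter`, and
`or_iter_next()` below are hypothetical names used only for this
example.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, uncompressed stand-in for one layer's type bitmap. */
struct word_iter {
	const uint64_t *words;
	size_t nr;  /* number of words in this layer */
	size_t pos; /* index of the next word to return */
};

/* Hypothetical analogue of an OR-ing iterator over 'nr' layers. */
struct or_iter {
	struct word_iter *its;
	size_t nr;
};

/*
 * Store the OR of the next word from every layer in *next. Returns 1
 * if at least one layer still had a word, 0 once all are exhausted.
 */
int or_iter_next(uint64_t *next, struct or_iter *it)
{
	uint64_t word = 0;
	int ret = 0;
	size_t i;

	assert(next != NULL && it != NULL);

	for (i = 0; i < it->nr; i++) {
		struct word_iter *sub = &it->its[i];
		if (sub->pos < sub->nr) {
			word |= sub->words[sub->pos++];
			ret = 1;
		}
	}
	*next = word;
	return ret;
}
```

With `nr == 1` the loop body runs once per call and the result is the
same as iterating the lone bitmap directly, which is the situation for
a single-pack bitmap.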
Thanks,
Taylor
* Re: [PATCH v3 13/13] midx: implement writing incremental MIDX bitmaps
2025-02-28 10:01 ` Patrick Steinhardt
@ 2025-03-01 0:31 ` Taylor Blau
0 siblings, 0 replies; 136+ messages in thread
From: Taylor Blau @ 2025-03-01 0:31 UTC (permalink / raw)
To: Patrick Steinhardt; +Cc: git, Elijah Newren, Jeff King, Junio C Hamano
On Fri, Feb 28, 2025 at 11:01:39AM +0100, Patrick Steinhardt wrote:
> On Tue, Nov 19, 2024 at 05:07:56PM -0500, Taylor Blau wrote:
> > diff --git a/pack-bitmap-write.c b/pack-bitmap-write.c
> > index 49758e2525f..1fbebe84479 100644
> > --- a/pack-bitmap-write.c
> > +++ b/pack-bitmap-write.c
> > @@ -25,6 +25,8 @@
> > #include "alloc.h"
> > #include "refs.h"
> > #include "strmap.h"
> > +#include "midx.h"
> > +#include "pack-revindex.h"
>
> Nit: let's keep the headers sorted alphabetically.
>
> > @@ -206,19 +215,37 @@ void bitmap_writer_push_commit(struct bitmap_writer *writer,
> > static uint32_t find_object_pos(struct bitmap_writer *writer,
> > const struct object_id *oid, int *found)
> > {
> > - struct object_entry *entry = packlist_find(writer->to_pack, oid);
> > + struct object_entry *entry;
> > +
> > + entry = packlist_find(writer->to_pack, oid);
> > + if (entry) {
> > + uint32_t base_objects = 0;
> > + if (writer->midx)
> > + base_objects = writer->midx->num_objects +
> > + writer->midx->num_objects_in_base;
> > +
> > + if (found)
> > + *found = 1;
> > + return oe_in_pack_pos(writer->to_pack, entry) + base_objects;
> > + } else if (writer->midx) {
> > + uint32_t at, pos;
> > +
> > + if (!bsearch_midx(oid, writer->midx, &at))
> > + goto missing;
> > + if (midx_to_pack_pos(writer->midx, at, &pos) < 0)
> > + goto missing;
> >
> > - if (!entry) {
> > if (found)
> > - *found = 0;
> > - warning("Failed to write bitmap index. Packfile doesn't have full closure "
> > - "(object %s is missing)", oid_to_hex(oid));
> > - return 0;
> > + *found = 1;
> > + return pos;
> > }
> >
> > +missing:
> > if (found)
> > - *found = 1;
> > - return oe_in_pack_pos(writer->to_pack, entry);
> > + *found = 0;
> > + warning("Failed to write bitmap index. Packfile doesn't have full closure "
> > + "(object %s is missing)", oid_to_hex(oid));
>
> Is this warning still accurate? I assume that in the MIDX case we don't
> have to have full closure in a single packfile, as that would otherwise
> make the whole thing rather moot.
Good question -- the warning is still "accurate" in the sense that the
thing we're generating the bitmap against has to have full object
closure under reachability.
It's important to note that even though in any individual layer we are
only selecting bitmaps from the set of commits in the union of packs in
the corresponding MIDX layer, they may well reference objects from
earlier layers. And there it isn't important that we find some reachable
object in the same MIDX layer, but rather that we find it in an earlier
layer or the current layer.
It's less than accurate in the sense that it says "Packfile doesn't have
full closure" and isn't marked for translation, but I punted on it here.
(This is definitely not the first time I've had the thought to change
it, so maybe I just should have done it here.)
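For illustration, the lookup order in find_object_pos() can be modeled
with a small sketch. This is a simplified model, not the
pack-bitmap-write.c code: object IDs are plain ints, linear search
replaces packlist_find() and bsearch_midx(), and `struct writer_model`,
`lookup()`, and `find_pos()` are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical model of the writer's view: 'earlier' lists the objects
 * already covered by existing MIDX layers (in bit order), 'pending'
 * lists the objects being written into the new layer.
 */
struct writer_model {
	const int *earlier;
	size_t earlier_nr;
	const int *pending;
	size_t pending_nr;
};

static int lookup(const int *ids, size_t nr, int id, uint32_t *pos)
{
	size_t i;

	for (i = 0; i < nr; i++) {
		if (ids[i] == id) {
			*pos = (uint32_t)i;
			return 1;
		}
	}
	return 0;
}

/*
 * Return the bit position of 'id' and set *found. Objects in the new
 * layer are offset by the number of objects in earlier layers; objects
 * found only in earlier layers keep their existing position.
 */
uint32_t find_pos(const struct writer_model *w, int id, int *found)
{
	uint32_t pos;

	assert(w != NULL && found != NULL);

	if (lookup(w->pending, w->pending_nr, id, &pos)) {
		*found = 1;
		return pos + (uint32_t)w->earlier_nr;
	}
	if (lookup(w->earlier, w->earlier_nr, id, &pos)) {
		*found = 1;
		return pos;
	}
	*found = 0; /* closure is broken: not in any layer */
	return 0;
}
```

Only the last branch corresponds to the warning: the object is missing
from the current layer and from every earlier one.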
Thanks,
Taylor
* Re: [PATCH v3 01/13] Documentation: describe incremental MIDX bitmaps
2025-02-28 23:26 ` Taylor Blau
@ 2025-03-03 10:54 ` Patrick Steinhardt
0 siblings, 0 replies; 136+ messages in thread
From: Patrick Steinhardt @ 2025-03-03 10:54 UTC (permalink / raw)
To: Taylor Blau; +Cc: git, Elijah Newren, Jeff King, Junio C Hamano
On Fri, Feb 28, 2025 at 06:26:45PM -0500, Taylor Blau wrote:
> On Fri, Feb 28, 2025 at 11:01:04AM +0100, Patrick Steinhardt wrote:
> > On Tue, Nov 19, 2024 at 05:07:19PM -0500, Taylor Blau wrote:
> > > diff --git a/Documentation/technical/multi-pack-index.txt b/Documentation/technical/multi-pack-index.txt
> > > index cc063b30bea..a063262c360 100644
> > > --- a/Documentation/technical/multi-pack-index.txt
> > > +++ b/Documentation/technical/multi-pack-index.txt
> > > @@ -164,6 +164,70 @@ objects_nr($H2) + objects_nr($H1) + i
> > > (in the C implementation, this is often computed as `i +
> > > m->num_objects_in_base`).
> > >
> > > +=== Pseudo-pack order for incremental MIDXs
> > > +
> > > +The original implementation of multi-pack reachability bitmaps defined
> > > +the pseudo-pack order in linkgit:gitformat-pack[5] (see the section
> > > +titled "multi-pack-index reverse indexes") roughly as follows:
> > > +
> > > +____
> > > +In short, a MIDX's pseudo-pack is the de-duplicated concatenation of
> > > +objects in packs stored by the MIDX, laid out in pack order, and the
> > > +packs arranged in MIDX order (with the preferred pack coming first).
> > > +____
> > > +
> > > +In the incremental MIDX design, we extend this definition to include
> > > +objects from multiple layers of the MIDX chain. The pseudo-pack order
> > > +for incremental MIDXs is determined by concatenating the pseudo-pack
> > > +ordering for each layer of the MIDX chain in order. Formally two objects
> > > +`o1` and `o2` are compared as follows:
> > > +
> > > +1. If `o1` appears in an earlier layer of the MIDX chain than `o2`, then
> > > + `o1` is considered less than `o2`.
> >
> > Just as a refresher for myself: what is the consequence of an object
> > `o1` sorting earlier than `o2`? In the case where those refer to
> > different objects it is only used to establish the pseudo-pack order so
> > that we know how to interpret the bitmaps. But in the case where those
> > two objects refer to the same underlying object, e.g. because the object
> > is contained in two packs, it also impacts which of both objects would
> > be preferred e.g. during a clone, right?
>
> Great question -- the pseudo-pack order here is how we translate the set
> of objects in a MIDX into their corresponding bit positions in the
> bitmap.
>
> So if "o1" sorts ahead of "o2", that means that "o1" will appear in an
> earlier bit position than "o2". But note that we're talking about
> objects in a MIDX chain here, comprised of objects from each MIDX'd layer of
> that chain. So by that point the duplicates have already been filtered
> out, since:
>
> - The MIDX only stores one copy of an object in any given MIDX, and
>
> - The incremental MIDX design avoids putting objects from earlier
> layers in later ones.
>
> I tried to get at this a few lines up with "[...] a MIDX's pseudo-pack
> is the de-duplicated concatenation of [...]" to make clear that o1 != o2
> here. But let me know if you think I should clarify or emphasize that
> point further.
Okay, the deduplication bit was a bit subtle, so I missed that part. And
once one has learned about it my question makes less sense, as I was
expecting that an object may appear in the same MIDX chain multiple
times.
Patrick
* Re: [PATCH v3 03/13] pack-bitmap.c: open and store incremental bitmap layers
2025-02-28 23:49 ` Taylor Blau
@ 2025-03-03 10:55 ` Patrick Steinhardt
0 siblings, 0 replies; 136+ messages in thread
From: Patrick Steinhardt @ 2025-03-03 10:55 UTC (permalink / raw)
To: Taylor Blau; +Cc: git, Elijah Newren, Jeff King, Junio C Hamano
On Fri, Feb 28, 2025 at 06:49:03PM -0500, Taylor Blau wrote:
> On Fri, Feb 28, 2025 at 11:01:16AM +0100, Patrick Steinhardt wrote:
> > On Tue, Nov 19, 2024 at 05:07:26PM -0500, Taylor Blau wrote:
> > > Prepare the pack-bitmap machinery to work with incremental MIDXs by
> > > adding a new "base" field to keep track of the bitmap index associated
> > > with the previous MIDX layer.
> > >
> > > The changes in this commit are mostly boilerplate to open the correct
> > > bitmap(s), add them to the chain bitmap layers along the "base" pointer,
> >
> > s/bitmap layers/of &/
> >
> > > diff --git a/pack-bitmap.c b/pack-bitmap.c
> > > index bba9c6a905a..41675a69f68 100644
> > > --- a/pack-bitmap.c
> > > +++ b/pack-bitmap.c
> > > @@ -54,6 +54,13 @@ struct bitmap_index {
> > > struct packed_git *pack;
> > > struct multi_pack_index *midx;
> > >
> > > + /*
> > > + * If using a multi-pack index chain, 'base' points to the
> > > + * bitmap index corresponding to this bitmap's midx->base_midx.
> > > + */
> > > + struct bitmap_index *base;
> > > + uint32_t base_nr;
> > > +
> >
> > It would be nice to point out that `base_nr` is not 0-indexed, but
> > 1-indexed, which is rather uncommon. Is there any particular reason why
> > you made it 1-indexed?
>
> Hah, I have no idea! If I remember correctly, it's because it makes it
> (slightly) more convenient to do:
>
> ewah_or_iterator_init(it, bitmap_git->commits_all,
> bitmap_git->base_nr);
>
> , instead of incrementing 'base_nr' by 1 to determine the number of
> sub-iterators to allocate.
>
> So I think there are a couple of options here. Short of doing nothing,
> we could:
>
> 1. Rename 'base_nr' to 'layers_nr' which would make it clearer that the
> count includes the current layer, thus making it 1-indexed.
>
> 2. Leave 'base_nr' named as-is, but make it 0-indexed, and have callers add
> 1 when they need to know the number of layers.
>
> I prefer the explicitness of (2), which is how I adjusted things
> locally. But if you prefer (1) or some yet-unknown (3), I'm happy to
> adjust it further!
Yup, I also favor (2) here as it is the least surprising option to me.
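A small sketch of what (2) looks like in practice, assuming base_nr
counts only the layers that precede the current one. `struct
chain_layer`, `layer_link()`, and `layer_count()` are hypothetical names
for illustration, not functions from the series.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for the chain fields of struct bitmap_index. */
struct chain_layer {
	struct chain_layer *base; /* previous layer, or NULL */
	uint32_t base_nr;         /* layers preceding this one */
};

/* Attach 'l' on top of 'base' (which may be NULL for the first layer). */
void layer_link(struct chain_layer *l, struct chain_layer *base)
{
	assert(l != NULL);
	l->base = base;
	l->base_nr = base ? base->base_nr + 1 : 0;
}

/* Callers that need the total number of layers add one explicitly. */
uint32_t layer_count(const struct chain_layer *l)
{
	assert(l != NULL);
	return l->base_nr + 1;
}
```

The explicit "+ 1" at the call site is the cost of this option, but it
keeps base_nr equal to zero exactly when base is NULL.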
Patrick
* [PATCH v4 00/13] midx: incremental multi-pack indexes, part two
2024-08-15 21:01 [PATCH 00/13] midx: incremental multi-pack indexes, part two Taylor Blau
` (14 preceding siblings ...)
2024-11-19 22:07 ` [PATCH v3 00/13] midx: incremental multi-pack indexes, part two Taylor Blau
@ 2025-03-14 20:18 ` Taylor Blau
2025-03-14 20:18 ` [PATCH v4 01/13] Documentation: describe incremental MIDX bitmaps Taylor Blau
` (13 more replies)
2025-03-20 17:56 ` [PATCH v5 00/14] " Taylor Blau
16 siblings, 14 replies; 136+ messages in thread
From: Taylor Blau @ 2025-03-14 20:18 UTC (permalink / raw)
To: git; +Cc: Elijah Newren, Jeff King, Junio C Hamano, Patrick Steinhardt
This is a new round of my series to implement bitmap support for
incremental multi-pack indexes (MIDXs). It has been rebased on current
'master', which is 683c54c999 (Git 2.49, 2025-03-14) at the time of
writing.
== Changes since last time
This round addresses helpful review from Patrick Steinhardt. It
clarifies a number of concepts which were previously less than clear,
and makes a number of cosmetic changes to align more closely with our
CodingGuidelines in certain areas.
There are a few technical changes which you can see in the range-diff
below, but they are relatively minor. The series is functionally
unchanged since last time.
== Original cover letter
This series is based on 'master', with an additional merge between
tb/incremental-midx-part-1[1] and my newer series to fix a handful of
bugs related to pseudo-merge bitmaps[2].
This is the second of three series to implement support for incremental
multi-pack indexes (MIDXs). This series brings support for bitmaps that
are tied to incremental MIDXs in addition to regular MIDX bitmaps.
The details are laid out in the commits themselves, but the high-level
approach is as follows:
- Each layer in the incremental MIDX chain has its own corresponding
*.bitmap file. Each bitmap contains commits / pseudo-merges which
are selected only from the commits in that layer. Likewise, only
that layer's objects are written in the type-level bitmaps.
- The reachability traversal is only conducted on the top-most bitmap
corresponding to the most recent layer in the incremental MIDX
chain. Earlier layers may be consulted to retrieve commit /
pseudo-merge reachability bitmaps, but only the top-most bitmap's
"result" and "haves" fields are used.
- In essence, the top-most bitmap is the only one that "matters", and
earlier bitmaps are merely used to look up commit and pseudo-merge
bitmaps from that layer.
- Whenever we need to look at the type-level bitmaps corresponding to
the whole incremental MIDX chain, a new "ewah_or_iterator" is used.
This works in concept like a typical ewah_iterator, except that it works
over many EWAH bitmaps in parallel, OR-ing their results together
before returning them to the user.
In effect, this allows us to treat the union of all type-level
bitmaps (each of which only stores information about the objects in its
corresponding layer within the incremental MIDX chain) as a single
type-level bitmap corresponding to all of the objects across every
layer of the incremental MIDX chain.
The sum total of this series is that we are able to append new commits /
pseudo-merges to a repository's reachability bitmaps without having to
rewrite existing bitmaps, making the operation much cheaper to perform
in large repositories.
The series is laid out roughly as follows:
- The first patch describes the technical details of incremental MIDX
bitmaps.
- The second patch adjusts the pack-revindex internals to prepare for
incremental MIDX bitmaps.
- The next seven patches adjust various components of the pack-bitmap
internals to do the same.
- The next three patches introduce and adjust callers to use the
ewah_or_iterator (as above).
- The final patch implements writing incremental MIDX bitmaps, and
introduces tests.
After this series, the remaining goals for this project include being
able to compact contiguous runs of incremental MIDX layers into a single
layer to support growing the chain of MIDX layers without the chain
itself becoming too long.
Thanks in advance for your review!
[1]: https://lore.kernel.org/git/cover.1722958595.git.me@ttaylorr.com/
[2]: https://lore.kernel.org/git/cover.1723743050.git.me@ttaylorr.com/
Taylor Blau (13):
Documentation: describe incremental MIDX bitmaps
pack-revindex: prepare for incremental MIDX bitmaps
pack-bitmap.c: open and store incremental bitmap layers
pack-bitmap.c: teach `bitmap_for_commit()` about incremental MIDXs
pack-bitmap.c: teach `show_objects_for_type()` about incremental MIDXs
pack-bitmap.c: support bitmap pack-reuse with incremental MIDXs
pack-bitmap.c: teach `rev-list --test-bitmap` about incremental MIDXs
pack-bitmap.c: compute disk-usage with incremental MIDXs
pack-bitmap.c: apply pseudo-merge commits with incremental MIDXs
ewah: implement `struct ewah_or_iterator`
pack-bitmap.c: keep track of each layer's type bitmaps
pack-bitmap.c: use `ewah_or_iterator` for type bitmap iterators
midx: implement writing incremental MIDX bitmaps
Documentation/technical/multi-pack-index.adoc | 71 ++++
builtin/pack-objects.c | 3 +-
ewah/ewah_bitmap.c | 33 ++
ewah/ewok.h | 12 +
midx-write.c | 57 ++-
pack-bitmap-write.c | 65 +++-
pack-bitmap.c | 342 ++++++++++++++----
pack-bitmap.h | 4 +-
pack-revindex.c | 34 +-
t/t5334-incremental-multi-pack-index.sh | 84 +++++
10 files changed, 583 insertions(+), 122 deletions(-)
Range-diff against v3:
1: caed2c6ec3 ! 1: f565f2fff1 Documentation: describe incremental MIDX bitmaps
@@ Commit message
Signed-off-by: Taylor Blau <me@ttaylorr.com>
- ## Documentation/technical/multi-pack-index.txt ##
-@@ Documentation/technical/multi-pack-index.txt: objects_nr($H2) + objects_nr($H1) + i
+ ## Documentation/technical/multi-pack-index.adoc ##
+@@ Documentation/technical/multi-pack-index.adoc: objects_nr($H2) + objects_nr($H1) + i
(in the C implementation, this is often computed as `i +
m->num_objects_in_base`).
@@ Documentation/technical/multi-pack-index.txt: objects_nr($H2) + objects_nr($H1)
+
+1. If `o1` appears in an earlier layer of the MIDX chain than `o2`, then
+ `o1` is considered less than `o2`.
++
+2. Otherwise, if `o1` and `o2` appear in the same MIDX layer, and that
-+ MIDX layer has no base, then If one of `pack(o1)` and `pack(o2)` is
-+ preferred and the other is not, then the preferred one sorts first. If
-+ there is a base layer (i.e. the MIDX layer is not the first layer in
-+ the chain), then if `pack(o1)` appears earlier in that MIDX layer's
-+ pack order, than `o1` is less than `o2`. Likewise if `pack(o2)`
-+ appears earlier, than the opposite is true.
++ MIDX layer has no base, then if one of `pack(o1)` and `pack(o2)` is
++ preferred and the other is not, then the preferred one sorts first. If
++ there is a base layer (i.e. the MIDX layer is not the first layer in
++ the chain), then if `pack(o1)` appears earlier in that MIDX layer's
++ pack order, than `o1` is less than `o2`. Likewise if `pack(o2)`
++ appears earlier, than the opposite is true.
++
+3. Otherwise, `o1` and `o2` appear in the same pack, and thus in the
-+ same MIDX layer. Sort `o1` and `o2` by their offset within their
-+ containing packfile.
++ same MIDX layer. Sort `o1` and `o2` by their offset within their
++ containing packfile.
++
++Note that the preferred pack is a property of the MIDX chain, not the
++individual layers themselves. Fundamentally we could introduce a
++per-layer preferred pack, but this is less relevant now that we can
++perform multi-pack reuse across the set of packs in a MIDX.
+
+=== Reachability bitmaps and incremental MIDXs
+
2: b902513f43 ! 2: f2a232e556 pack-revindex: prepare for incremental MIDX bitmaps
@@ pack-bitmap.c: static int ext_index_add_object(struct bitmap_index *bitmap_git,
}
struct bitmap_show_data {
+@@ pack-bitmap.c: struct ewah_bitmap *pseudo_merge_bitmap_for_commit(struct bitmap_index *bitmap_g
+ if (pos < 0 || pos >= bitmap_num_objects(bitmap_git))
+ goto done;
+
++ /*
++ * Use bitmap-relative positions instead of offsetting
++ * by bitmap_git->num_objects_in_base because we use
++ * this to find a match in pseudo_merge_for_parents(),
++ * and pseudo-merge groups cannot span multiple bitmap
++ * layers.
++ */
+ bitmap_set(parents, pos);
+ }
+
+- match = pseudo_merge_for_parents(&bitmap_git->pseudo_merges,
+- parents);
++ match = pseudo_merge_for_parents(&bitmap_git->pseudo_merges, parents);
+
+ done:
+ bitmap_free(parents);
@@ pack-bitmap.c: static void show_extended_objects(struct bitmap_index *bitmap_git,
for (i = 0; i < eindex->count; ++i) {
struct object *obj;
@@ pack-bitmap.c: static unsigned long get_size_by_pos(struct bitmap_index *bitmap_
struct eindex *eindex = &bitmap_git->ext_index;
- struct object *obj = eindex->objects[pos - bitmap_num_objects(bitmap_git)];
+ struct object *obj = eindex->objects[pos - bitmap_non_extended_bits(bitmap_git)];
- if (oid_object_info_extended(the_repository, &obj->oid, &oi, 0) < 0)
+ if (oid_object_info_extended(bitmap_repo(bitmap_git), &obj->oid,
+ &oi, 0) < 0)
die(_("unable to get size of %s"), oid_to_hex(&obj->oid));
- }
@@ pack-bitmap.c: static void filter_packed_objects_from_bitmap(struct bitmap_index *bitmap_git,
uint32_t objects_nr;
size_t i, pos;
@@ pack-bitmap.c: static off_t get_disk_usage_for_extended(struct bitmap_index *bit
+ st_add(bitmap_non_extended_bits(bitmap_git), i)))
continue;
- if (oid_object_info_extended(the_repository, &obj->oid, &oi, 0) < 0)
+ if (oid_object_info_extended(bitmap_repo(bitmap_git), &obj->oid,
## pack-revindex.c ##
@@ pack-revindex.c: int load_midx_revindex(struct multi_pack_index *m)
trace2_data_string("load_midx_revindex", the_repository,
"source", "rev");
-- get_midx_filename_ext(&revindex_name, m->object_dir,
+- get_midx_filename_ext(m->repo->hash_algo, &revindex_name, m->object_dir,
- get_midx_checksum(m), MIDX_EXT_REV);
+ if (m->has_chain)
-+ get_split_midx_filename_ext(&revindex_name, m->object_dir,
-+ get_midx_checksum(m), MIDX_EXT_REV);
++ get_split_midx_filename_ext(m->repo->hash_algo, &revindex_name,
++ m->object_dir, get_midx_checksum(m),
++ MIDX_EXT_REV);
+ else
-+ get_midx_filename_ext(&revindex_name, m->object_dir,
-+ get_midx_checksum(m), MIDX_EXT_REV);
++ get_midx_filename_ext(m->repo->hash_algo, &revindex_name,
++ m->object_dir, get_midx_checksum(m),
++ MIDX_EXT_REV);
ret = load_revindex_from_disk(revindex_name.buf,
m->num_objects,
@@ pack-revindex.c: static int midx_pack_order_cmp(const void *va, const void *vb)
struct multi_pack_index *midx = key->midx;
- uint32_t versus = pack_pos_to_midx(midx, (uint32_t*)vb - (const uint32_t *)midx->revindex_data);
-+ size_t pos = (uint32_t*)vb - (const uint32_t *)midx->revindex_data;
++ size_t pos = (uint32_t *)vb - (const uint32_t *)midx->revindex_data;
+ uint32_t versus = pack_pos_to_midx(midx, pos + midx->num_objects_in_base);
uint32_t versus_pack = nth_midxed_pack_int_id(midx, versus);
off_t versus_offset;
3: 5b5d625cbe ! 3: aca0318fb1 pack-bitmap.c: open and store incremental bitmap layers
@@ pack-bitmap.c: struct bitmap_index {
+ /*
+ * If using a multi-pack index chain, 'base' points to the
+ * bitmap index corresponding to this bitmap's midx->base_midx.
++ *
++ * base_nr indicates how many layers precede this one, and is
++ * zero when base is NULL.
+ */
+ struct bitmap_index *base;
+ uint32_t base_nr;
@@ pack-bitmap.c: static int load_bitmap_entries_v1(struct bitmap_index *index)
char *midx_bitmap_filename(struct multi_pack_index *midx)
{
struct strbuf buf = STRBUF_INIT;
-- get_midx_filename_ext(&buf, midx->object_dir, get_midx_checksum(midx),
-- MIDX_EXT_BITMAP);
+- get_midx_filename_ext(midx->repo->hash_algo, &buf, midx->object_dir,
+- get_midx_checksum(midx), MIDX_EXT_BITMAP);
+ if (midx->has_chain)
-+ get_split_midx_filename_ext(&buf, midx->object_dir,
++ get_split_midx_filename_ext(midx->repo->hash_algo, &buf,
++ midx->object_dir,
+ get_midx_checksum(midx),
+ MIDX_EXT_BITMAP);
+ else
-+ get_midx_filename_ext(&buf, midx->object_dir,
-+ get_midx_checksum(midx), MIDX_EXT_BITMAP);
++ get_midx_filename_ext(midx->repo->hash_algo, &buf,
++ midx->object_dir, get_midx_checksum(midx),
++ MIDX_EXT_BITMAP);
return strbuf_detach(&buf, NULL);
}
-@@ pack-bitmap.c: static int open_midx_bitmap_1(struct bitmap_index *bitmap_git,
- {
- struct stat st;
- char *bitmap_name = midx_bitmap_filename(midx);
-- int fd = git_open(bitmap_name);
-+ int fd;
- uint32_t i, preferred_pack;
- struct packed_git *preferred;
-
-+ fd = git_open(bitmap_name);
-+ if (fd < 0 && errno == ENOENT) {
-+ FREE_AND_NULL(bitmap_name);
-+ bitmap_name = midx_bitmap_filename(midx);
-+ fd = git_open(bitmap_name);
-+ }
-+
- if (fd < 0) {
- if (errno != ENOENT)
- warning_errno("cannot open '%s'", bitmap_name);
@@ pack-bitmap.c: static int open_midx_bitmap_1(struct bitmap_index *bitmap_git,
goto cleanup;
}
- for (i = 0; i < bitmap_git->midx->num_packs; i++) {
+- if (prepare_midx_pack(bitmap_repo(bitmap_git),
+- bitmap_git->midx,
+- i)) {
+ for (i = 0; i < bitmap_git->midx->num_packs + bitmap_git->midx->num_packs_in_base; i++) {
- if (prepare_midx_pack(the_repository, bitmap_git->midx, i)) {
++ if (prepare_midx_pack(bitmap_repo(bitmap_git), bitmap_git->midx, i)) {
warning(_("could not open pack %s"),
bitmap_git->midx->pack_names[i]);
-@@ pack-bitmap.c: static int open_midx_bitmap_1(struct bitmap_index *bitmap_git,
- goto cleanup;
- }
-
-- preferred = bitmap_git->midx->packs[preferred_pack];
-+ preferred = nth_midxed_pack(bitmap_git->midx, preferred_pack);
- if (!is_pack_valid(preferred)) {
- warning(_("preferred pack (%s) is invalid"),
- preferred->pack_name);
- goto cleanup;
+ goto cleanup;
+ }
}
+ if (midx->base_midx) {
+ bitmap_git->base = prepare_midx_bitmap_git(midx->base_midx);
+ bitmap_git->base_nr = bitmap_git->base->base_nr + 1;
+ } else {
-+ bitmap_git->base_nr = 1;
++ bitmap_git->base_nr = 0;
+ }
+
return 0;
@@ pack-bitmap.c: static int open_pack_bitmap_1(struct bitmap_index *bitmap_git, st
bitmap_git->map_size = xsize_t(st.st_size);
bitmap_git->map = xmmap(NULL, bitmap_git->map_size, PROT_READ, MAP_PRIVATE, fd, 0);
bitmap_git->map_pos = 0;
-+ bitmap_git->base_nr = 1;
++ bitmap_git->base_nr = 0;
close(fd);
if (load_bitmap_header(bitmap_git) < 0) {
@@ pack-bitmap.c: struct bitmap_index *prepare_bitmap_git(struct repository *r)
struct bitmap_index *prepare_midx_bitmap_git(struct multi_pack_index *midx)
{
-- struct repository *r = the_repository;
+- struct repository *r = midx->repo;
struct bitmap_index *bitmap_git = xcalloc(1, sizeof(*bitmap_git));
- if (!open_midx_bitmap_1(bitmap_git, midx) && !load_bitmap(r, bitmap_git))
4: 16259667fb = 4: 832fd0e8dc pack-bitmap.c: teach `bitmap_for_commit()` about incremental MIDXs
5: b7a45d7eff = 5: c7c9f89956 pack-bitmap.c: teach `show_objects_for_type()` about incremental MIDXs
6: c8401fa0fb = 6: 14d3d80c3d pack-bitmap.c: support bitmap pack-reuse with incremental MIDXs
7: 17ab23dd76 ! 7: b45a9ccbc2 pack-bitmap.c: teach `rev-list --test-bitmap` about incremental MIDXs
@@ pack-bitmap.c: static void test_show_commit(struct commit *commit, void *data)
+ return total;
+}
+
-+static void prepare_bitmap_test_data(struct bitmap_test_data *tdata,
++static void bitmap_test_data_prepare(struct bitmap_test_data *tdata,
+ struct bitmap_index *bitmap_git)
+{
+ memset(tdata, 0, sizeof(struct bitmap_test_data));
@@ pack-bitmap.c: static void test_show_commit(struct commit *commit, void *data)
+
+ if (bitmap_git->base) {
+ CALLOC_ARRAY(tdata->base_tdata, 1);
-+ prepare_bitmap_test_data(tdata->base_tdata, bitmap_git->base);
++ bitmap_test_data_prepare(tdata->base_tdata, bitmap_git->base);
+ }
+}
+
-+static void free_bitmap_test_data(struct bitmap_test_data *tdata)
++static void bitmap_test_data_release(struct bitmap_test_data *tdata)
+{
+ if (!tdata)
+ return;
+
-+ free_bitmap_test_data(tdata->base_tdata);
++ bitmap_test_data_release(tdata->base_tdata);
+ free(tdata->base_tdata);
+
+ bitmap_free(tdata->base);
@@ pack-bitmap.c: void test_bitmap_walk(struct rev_info *revs)
+
+ if (bitmap_is_midx(found))
+ fprintf_ln(stderr, "Located via MIDX '%s'.",
-+ hash_to_hex(get_midx_checksum(found->midx)));
++ hash_to_hex_algop(get_midx_checksum(found->midx),
++ revs->repo->hash_algo));
+ else
+ fprintf_ln(stderr, "Located via pack '%s'.",
-+ hash_to_hex(found->pack->hash));
++ hash_to_hex_algop(found->pack->hash,
++ revs->repo->hash_algo));
result = ewah_to_bitmap(bm);
}
@@ pack-bitmap.c: void test_bitmap_walk(struct rev_info *revs)
- tdata.trees = ewah_to_bitmap(bitmap_git->trees);
- tdata.blobs = ewah_to_bitmap(bitmap_git->blobs);
- tdata.tags = ewah_to_bitmap(bitmap_git->tags);
-+ prepare_bitmap_test_data(&tdata, bitmap_git);
- tdata.prg = start_progress("Verifying bitmap entries", result_popcnt);
++ bitmap_test_data_prepare(&tdata, bitmap_git);
+ tdata.prg = start_progress(revs->repo,
+ "Verifying bitmap entries",
+ result_popcnt);
- tdata.seen = 0;
traverse_commit_list(revs, &test_show_commit, &test_show_object, &tdata);
@@ pack-bitmap.c: void test_bitmap_walk(struct rev_info *revs)
- bitmap_free(tdata.trees);
- bitmap_free(tdata.blobs);
- bitmap_free(tdata.tags);
-+ free_bitmap_test_data(&tdata);
++ bitmap_test_data_release(&tdata);
free_bitmap_index(bitmap_git);
}
8: 75d170ce07 = 8: c1eefeae99 pack-bitmap.c: compute disk-usage with incremental MIDXs
9: 0b4fcfcecb = 9: 11c4b7b949 pack-bitmap.c: apply pseudo-merge commits with incremental MIDXs
10: e1b5f6181e = 10: cb08ad6a62 ewah: implement `struct ewah_or_iterator`
11: 9ab8fb472f ! 11: a29f4ee60d pack-bitmap.c: keep track of each layer's type bitmaps
@@ pack-bitmap.c: static int load_reverse_index(struct repository *r, struct bitmap
+static void load_all_type_bitmaps(struct bitmap_index *bitmap_git)
+{
+ struct bitmap_index *curr = bitmap_git;
-+ size_t i = bitmap_git->base_nr - 1;
++ size_t i = bitmap_git->base_nr;
+
-+ ALLOC_ARRAY(bitmap_git->commits_all, bitmap_git->base_nr);
-+ ALLOC_ARRAY(bitmap_git->trees_all, bitmap_git->base_nr);
-+ ALLOC_ARRAY(bitmap_git->blobs_all, bitmap_git->base_nr);
-+ ALLOC_ARRAY(bitmap_git->tags_all, bitmap_git->base_nr);
++ ALLOC_ARRAY(bitmap_git->commits_all, bitmap_git->base_nr + 1);
++ ALLOC_ARRAY(bitmap_git->trees_all, bitmap_git->base_nr + 1);
++ ALLOC_ARRAY(bitmap_git->blobs_all, bitmap_git->base_nr + 1);
++ ALLOC_ARRAY(bitmap_git->tags_all, bitmap_git->base_nr + 1);
+
+ while (curr) {
+ bitmap_git->commits_all[i] = curr->commits;
@@ pack-bitmap.c: static int load_reverse_index(struct repository *r, struct bitmap
+ bitmap_git->tags_all[i] = curr->tags;
+
+ curr = curr->base;
++ if (curr && !i)
++ BUG("unexpected number of bitmap layers, expected %"PRIu32,
++ bitmap_git->base_nr + 1);
+ i -= 1;
+ }
+}
12: 87cb011e7f ! 12: a1cf65bedc pack-bitmap.c: use `ewah_or_iterator` for type bitmap iterators
@@ pack-bitmap.c: static void show_extended_objects(struct bitmap_index *bitmap_git
case OBJ_COMMIT:
- ewah_iterator_init(it, bitmap_git->commits);
+ ewah_or_iterator_init(it, bitmap_git->commits_all,
-+ bitmap_git->base_nr);
++ bitmap_git->base_nr + 1);
break;
case OBJ_TREE:
- ewah_iterator_init(it, bitmap_git->trees);
+ ewah_or_iterator_init(it, bitmap_git->trees_all,
-+ bitmap_git->base_nr);
++ bitmap_git->base_nr + 1);
break;
case OBJ_BLOB:
- ewah_iterator_init(it, bitmap_git->blobs);
+ ewah_or_iterator_init(it, bitmap_git->blobs_all,
-+ bitmap_git->base_nr);
++ bitmap_git->base_nr + 1);
break;
case OBJ_TAG:
- ewah_iterator_init(it, bitmap_git->tags);
+ ewah_or_iterator_init(it, bitmap_git->tags_all,
-+ bitmap_git->base_nr);
++ bitmap_git->base_nr + 1);
break;
default:
13: 77ddd1170f ! 13: d0d564685b midx: implement writing incremental MIDX bitmaps
@@ builtin/pack-objects.c: static void write_pack_file(void)
bitmap_writer_build_type_index(&bitmap_writer,
written_list);
+ ## ewah/ewah_bitmap.c ##
+@@ ewah/ewah_bitmap.c: int ewah_or_iterator_next(eword_t *next, struct ewah_or_iterator *it)
+ return ret;
+ }
+
+-void ewah_or_iterator_free(struct ewah_or_iterator *it)
++void ewah_or_iterator_release(struct ewah_or_iterator *it)
+ {
+ free(it->its);
+ }
+
+ ## ewah/ewok.h ##
+@@ ewah/ewok.h: void ewah_or_iterator_init(struct ewah_or_iterator *it,
+
+ int ewah_or_iterator_next(eword_t *next, struct ewah_or_iterator *it);
+
+-void ewah_or_iterator_free(struct ewah_or_iterator *it);
++void ewah_or_iterator_release(struct ewah_or_iterator *it);
+
+ void ewah_xor(
+ struct ewah_bitmap *ewah_i,
+
## midx-write.c ##
@@ midx-write.c: static uint32_t *midx_pack_order(struct write_midx_context *ctx)
return pack_order;
@@ midx-write.c: static uint32_t *midx_pack_order(struct write_midx_context *ctx)
struct strbuf buf = STRBUF_INIT;
char *tmp_file;
- trace2_region_enter("midx", "write_midx_reverse_index", the_repository);
+ trace2_region_enter("midx", "write_midx_reverse_index", ctx->repo);
-- strbuf_addf(&buf, "%s-%s.rev", midx_name, hash_to_hex(midx_hash));
+- strbuf_addf(&buf, "%s-%s.rev", midx_name, hash_to_hex_algop(midx_hash,
+- ctx->repo->hash_algo));
+ if (ctx->incremental)
-+ get_split_midx_filename_ext(&buf, object_dir, midx_hash,
++ get_split_midx_filename_ext(ctx->repo->hash_algo, &buf,
++ object_dir, midx_hash,
+ MIDX_EXT_REV);
+ else
-+ get_midx_filename_ext(&buf, object_dir, midx_hash,
-+ MIDX_EXT_REV);
++ get_midx_filename_ext(ctx->repo->hash_algo, &buf, object_dir,
++ midx_hash, MIDX_EXT_REV);
- tmp_file = write_rev_file_order(NULL, ctx->pack_order, ctx->entries_nr,
- midx_hash, WRITE_REV);
+ tmp_file = write_rev_file_order(ctx->repo->hash_algo, NULL, ctx->pack_order,
+ ctx->entries_nr, midx_hash, WRITE_REV);
@@ midx-write.c: static struct commit **find_commits_for_midx_bitmap(uint32_t *indexed_commits_nr
return cb.commits;
}
--static int write_midx_bitmap(const char *midx_name,
+-static int write_midx_bitmap(struct repository *r, const char *midx_name,
+static int write_midx_bitmap(struct write_midx_context *ctx,
+ const char *object_dir,
const unsigned char *midx_hash,
@@ midx-write.c: static struct commit **find_commits_for_midx_bitmap(uint32_t *inde
struct bitmap_writer writer;
struct pack_idx_entry **index;
- char *bitmap_name = xstrfmt("%s-%s.bitmap", midx_name,
-- hash_to_hex(midx_hash));
+- hash_to_hex_algop(midx_hash, r->hash_algo));
+ struct strbuf bitmap_name = STRBUF_INIT;
+
+- trace2_region_enter("midx", "write_midx_bitmap", r);
++ trace2_region_enter("midx", "write_midx_bitmap", ctx->repo);
+
+ if (ctx->incremental)
-+ get_split_midx_filename_ext(&bitmap_name, object_dir, midx_hash,
++ get_split_midx_filename_ext(ctx->repo->hash_algo, &bitmap_name,
++ object_dir, midx_hash,
+ MIDX_EXT_BITMAP);
+ else
-+ get_midx_filename_ext(&bitmap_name, object_dir, midx_hash,
-+ MIDX_EXT_BITMAP);
++ get_midx_filename_ext(ctx->repo->hash_algo, &bitmap_name,
++ object_dir, midx_hash, MIDX_EXT_BITMAP);
- trace2_region_enter("midx", "write_midx_bitmap", the_repository);
-
-@@ midx-write.c: static int write_midx_bitmap(const char *midx_name,
+ if (flags & MIDX_WRITE_BITMAP_HASH_CACHE)
+ options |= BITMAP_OPT_HASH_CACHE;
+@@ midx-write.c: static int write_midx_bitmap(struct repository *r, const char *midx_name,
for (i = 0; i < pdata->nr_objects; i++)
index[i] = &pdata->objects[i].idx;
-- bitmap_writer_init(&writer, the_repository, pdata);
-+ bitmap_writer_init(&writer, the_repository, pdata,
+- bitmap_writer_init(&writer, r, pdata);
++ bitmap_writer_init(&writer, ctx->repo, pdata,
+ ctx->incremental ? ctx->base_midx : NULL);
bitmap_writer_show_progress(&writer, flags & MIDX_PROGRESS);
bitmap_writer_build_type_index(&writer, index);
-@@ midx-write.c: static int write_midx_bitmap(const char *midx_name,
+@@ midx-write.c: static int write_midx_bitmap(struct repository *r, const char *midx_name,
* bitmap_writer_finish().
*/
for (i = 0; i < pdata->nr_objects; i++)
@@ midx-write.c: static int write_midx_bitmap(const char *midx_name,
bitmap_writer_select_commits(&writer, commits, commits_nr);
ret = bitmap_writer_build(&writer);
-@@ midx-write.c: static int write_midx_bitmap(const char *midx_name,
+@@ midx-write.c: static int write_midx_bitmap(struct repository *r, const char *midx_name,
goto cleanup;
bitmap_writer_set_checksum(&writer, midx_hash);
@@ midx-write.c: static int write_midx_bitmap(const char *midx_name,
+ strbuf_release(&bitmap_name);
bitmap_writer_free(&writer);
- trace2_region_leave("midx", "write_midx_bitmap", the_repository);
-@@ midx-write.c: static int write_midx_internal(const char *object_dir,
- trace2_region_enter("midx", "write_midx_internal", the_repository);
+- trace2_region_leave("midx", "write_midx_bitmap", r);
++ trace2_region_leave("midx", "write_midx_bitmap", ctx->repo);
+
+ return ret;
+ }
+@@ midx-write.c: static int write_midx_internal(struct repository *r, const char *object_dir,
+ ctx.repo = r;
ctx.incremental = !!(flags & MIDX_WRITE_INCREMENTAL);
- if (ctx.incremental && (flags & MIDX_WRITE_BITMAP))
@@ midx-write.c: static int write_midx_internal(const char *object_dir,
if (ctx.incremental)
strbuf_addf(&midx_name,
-@@ midx-write.c: static int write_midx_internal(const char *object_dir,
+@@ midx-write.c: static int write_midx_internal(struct repository *r, const char *object_dir,
if (ctx.incremental) {
struct multi_pack_index *m = ctx.base_midx;
while (m) {
+ if (flags & MIDX_WRITE_BITMAP && load_midx_revindex(m)) {
+ error(_("could not load reverse index for MIDX %s"),
-+ hash_to_hex(get_midx_checksum(m)));
++ hash_to_hex_algop(get_midx_checksum(m),
++ m->repo->hash_algo));
+ result = 1;
+ goto cleanup;
+ }
ctx.num_multi_pack_indexes_before++;
m = m->base_midx;
}
-@@ midx-write.c: static int write_midx_internal(const char *object_dir,
+@@ midx-write.c: static int write_midx_internal(struct repository *r, const char *object_dir,
if (flags & MIDX_WRITE_REV_INDEX &&
git_env_bool("GIT_TEST_MIDX_WRITE_REV", 0))
@@ midx-write.c: static int write_midx_internal(const char *object_dir,
if (flags & MIDX_WRITE_BITMAP) {
struct packing_data pdata;
-@@ midx-write.c: static int write_midx_internal(const char *object_dir,
+@@ midx-write.c: static int write_midx_internal(struct repository *r, const char *object_dir,
FREE_AND_NULL(ctx.entries);
ctx.entries_nr = 0;
-- if (write_midx_bitmap(midx_name.buf, midx_hash, &pdata,
+- if (write_midx_bitmap(r, midx_name.buf, midx_hash, &pdata,
- commits, commits_nr, ctx.pack_order,
+ if (write_midx_bitmap(&ctx, object_dir,
+ midx_hash, &pdata, commits, commits_nr,
@@ pack-bitmap-write.c: void bitmap_writer_finish(struct bitmap_writer *writer,
write_selected_commits_v1(writer, f, offsets);
+ ## pack-bitmap.c ##
+@@ pack-bitmap.c: static void show_objects_for_type(
+ }
+ }
+
+- ewah_or_iterator_free(&it);
++ ewah_or_iterator_release(&it);
+ }
+
+ static int in_bitmapped_pack(struct bitmap_index *bitmap_git,
+@@ pack-bitmap.c: static void filter_bitmap_exclude_type(struct bitmap_index *bitmap_git,
+ bitmap_unset(to_filter, pos);
+ }
+
+- ewah_or_iterator_free(&it);
++ ewah_or_iterator_release(&it);
+ bitmap_free(tips);
+ }
+
+@@ pack-bitmap.c: static void filter_bitmap_blob_limit(struct bitmap_index *bitmap_git,
+ bitmap_unset(to_filter, pos);
+ }
+
+- ewah_or_iterator_free(&it);
++ ewah_or_iterator_release(&it);
+ bitmap_free(tips);
+ }
+
+@@ pack-bitmap.c: static uint32_t count_object_type(struct bitmap_index *bitmap_git,
+ count++;
+ }
+
+- ewah_or_iterator_free(&it);
++ ewah_or_iterator_release(&it);
+
+ return count;
+ }
+@@ pack-bitmap.c: static off_t get_disk_usage_for_type(struct bitmap_index *bitmap_git,
+ }
+ }
+
+- ewah_or_iterator_free(&it);
++ ewah_or_iterator_release(&it);
+
+ return total;
+ }
+
## pack-bitmap.h ##
@@ pack-bitmap.h: struct bitmap_writer {
base-commit: 683c54c999c301c2cd6f715c411407c413b1d84e
--
2.49.0.13.gd0d564685b
* [PATCH v4 01/13] Documentation: describe incremental MIDX bitmaps
2025-03-14 20:18 ` [PATCH v4 " Taylor Blau
@ 2025-03-14 20:18 ` Taylor Blau
2025-03-18 1:16 ` Jeff King
2025-03-18 2:42 ` Elijah Newren
2025-03-14 20:18 ` [PATCH v4 02/13] pack-revindex: prepare for " Taylor Blau
` (12 subsequent siblings)
13 siblings, 2 replies; 136+ messages in thread
From: Taylor Blau @ 2025-03-14 20:18 UTC (permalink / raw)
To: git; +Cc: Elijah Newren, Jeff King, Junio C Hamano, Patrick Steinhardt
Prepare to implement support for reachability bitmaps for the new
incremental multi-pack index (MIDX) feature over the following commits.
This commit begins by first describing the relevant format and usage
details for incremental MIDX bitmaps.
Signed-off-by: Taylor Blau <me@ttaylorr.com>
---
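For readers following along, the three ordering rules documented in this patch can be sketched as a comparator. This is only an illustration under stated assumptions, not code from git: `struct obj_pos`, its fields, and `pseudo_pack_cmp()` are hypothetical names, layers are numbered from 0 (the oldest), and I assume that non-preferred packs in the first layer fall back to plain pack order, as in the non-incremental definition quoted in the documentation.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical description of one object's place in a MIDX chain. */
struct obj_pos {
	uint32_t layer;      /* 0 is the oldest layer in the chain */
	uint32_t pack_order; /* position of the containing pack within its layer */
	int preferred;       /* non-zero if the containing pack is the preferred pack */
	uint64_t offset;     /* byte offset within the containing pack */
};

/* Returns <0, 0 or >0 when `a` sorts before, equal to, or after `b`. */
int pseudo_pack_cmp(const struct obj_pos *a, const struct obj_pos *b)
{
	assert(a && b);

	/* Rule 1: objects in earlier layers sort first. */
	if (a->layer != b->layer)
		return a->layer < b->layer ? -1 : 1;

	/*
	 * Rule 2: only the first layer (the one without a base) has a
	 * preferred pack, and that pack sorts ahead of the others.
	 */
	if (a->layer == 0 && !a->preferred != !b->preferred)
		return a->preferred ? -1 : 1;
	/* Otherwise (assumption for layer 0), compare by pack order. */
	if (a->pack_order != b->pack_order)
		return a->pack_order < b->pack_order ? -1 : 1;

	/* Rule 3: same pack, so compare offsets within it. */
	if (a->offset != b->offset)
		return a->offset < b->offset ? -1 : 1;
	return 0;
}
```

With this sketch, an object in the preferred pack of layer 0 sorts ahead of every other object in layer 0 regardless of offsets, and every object in layer 0 sorts ahead of every object in layer 1.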
Documentation/technical/multi-pack-index.adoc | 71 +++++++++++++++++++
1 file changed, 71 insertions(+)
diff --git a/Documentation/technical/multi-pack-index.adoc b/Documentation/technical/multi-pack-index.adoc
index cc063b30be..ab98ecfeb9 100644
--- a/Documentation/technical/multi-pack-index.adoc
+++ b/Documentation/technical/multi-pack-index.adoc
@@ -164,6 +164,77 @@ objects_nr($H2) + objects_nr($H1) + i
(in the C implementation, this is often computed as `i +
m->num_objects_in_base`).
+=== Pseudo-pack order for incremental MIDXs
+
+The original implementation of multi-pack reachability bitmaps defined
+the pseudo-pack order in linkgit:gitformat-pack[5] (see the section
+titled "multi-pack-index reverse indexes") roughly as follows:
+
+____
+In short, a MIDX's pseudo-pack is the de-duplicated concatenation of
+objects in packs stored by the MIDX, laid out in pack order, and the
+packs arranged in MIDX order (with the preferred pack coming first).
+____
+
+In the incremental MIDX design, we extend this definition to include
+objects from multiple layers of the MIDX chain. The pseudo-pack order
+for incremental MIDXs is determined by concatenating the pseudo-pack
+ordering for each layer of the MIDX chain in order. Formally two objects
+`o1` and `o2` are compared as follows:
+
+1. If `o1` appears in an earlier layer of the MIDX chain than `o2`, then
+ `o1` is considered less than `o2`.
+
+2. Otherwise, if `o1` and `o2` appear in the same MIDX layer, and that
+ MIDX layer has no base, then if one of `pack(o1)` and `pack(o2)` is
+ preferred and the other is not, then the preferred one sorts first. If
+ there is a base layer (i.e. the MIDX layer is not the first layer in
+ the chain), then if `pack(o1)` appears earlier in that MIDX layer's
+ pack order, then `o1` is less than `o2`. Likewise if `pack(o2)`
+ appears earlier, then the opposite is true.
+
+3. Otherwise, `o1` and `o2` appear in the same pack, and thus in the
+ same MIDX layer. Sort `o1` and `o2` by their offset within their
+ containing packfile.
+
+Note that the preferred pack is a property of the MIDX chain, not the
+individual layers themselves. Fundamentally we could introduce a
+per-layer preferred pack, but this is less relevant now that we can
+perform multi-pack reuse across the set of packs in a MIDX.
+
+=== Reachability bitmaps and incremental MIDXs
+
+Each layer of an incremental MIDX chain may have its objects (and the
+objects from any previous layer in the same MIDX chain) represented in
+its own `*.bitmap` file.
+
+The structure of a `*.bitmap` file belonging to an incremental MIDX
+chain is identical to that of a non-incremental MIDX bitmap, or a
+classic single-pack bitmap. Since objects are added to the end of the
+incremental MIDX's pseudo-pack order (see: above), it is possible to
+extend a bitmap when appending to the end of a MIDX chain.
+
+(Note: it is possible likewise to compress a contiguous sequence of MIDX
+incremental layers, and their `*.bitmap`(s) into a single layer and
+`*.bitmap`, but this is not yet implemented.)
+
+The object positions used are global within the pseudo-pack order, so
+subsequent layers will have, for example, `m->num_objects_in_base`
+number of `0` bits in each of their four type bitmaps. This follows from
+the fact that we only write type bitmap entries for objects present in
+the layer immediately corresponding to the bitmap.
+
+Note also that only the bitmap pertaining to the most recent layer in an
+incremental MIDX chain is used to store reachability information about
+the interesting and uninteresting objects in a reachability query.
+Earlier bitmap layers are only used to look up commit and pseudo-merge
+bitmaps from that layer, as well as the type-level bitmaps for objects
+in that layer.
+
+To simplify the implementation, type-level bitmaps are iterated
+simultaneously, and their results are OR'd together to avoid recursively
+calling internal bitmap functions.
+
Future Work
-----------
--
2.49.0.13.gd0d564685b
* [PATCH v4 02/13] pack-revindex: prepare for incremental MIDX bitmaps
2025-03-14 20:18 ` [PATCH v4 " Taylor Blau
2025-03-14 20:18 ` [PATCH v4 01/13] Documentation: describe incremental MIDX bitmaps Taylor Blau
@ 2025-03-14 20:18 ` Taylor Blau
2025-03-18 1:27 ` Jeff King
2025-03-18 2:43 ` Elijah Newren
2025-03-14 20:18 ` [PATCH v4 03/13] pack-bitmap.c: open and store incremental bitmap layers Taylor Blau
` (11 subsequent siblings)
13 siblings, 2 replies; 136+ messages in thread
From: Taylor Blau @ 2025-03-14 20:18 UTC (permalink / raw)
To: git; +Cc: Elijah Newren, Jeff King, Junio C Hamano, Patrick Steinhardt
Prepare the reverse index machinery to handle object lookups in an
incremental MIDX bitmap. These changes are broken out across a few
functions:
- load_midx_revindex() learns to use the appropriate MIDX filename
depending on whether the given 'struct multi_pack_index *' is
incremental or not.
- pack_pos_to_midx() and midx_to_pack_pos() now both take in a global
object position in the MIDX pseudo-pack order, and find the
earliest containing MIDX (similar to midx.c::midx_for_object()).
- midx_pack_order_cmp() adjusts its call to pack_pos_to_midx() by the
number of objects in the base (since 'vb - midx->revindex_data' is
relative to the containing MIDX, and pack_pos_to_midx() expects a
global position).
Likewise, midx_key_to_pack_pos() adjusts its output by adding
m->num_objects_in_base to return a global position out through the
`*pos` pointer.
Together, these changes are sufficient to use the multi-pack index's
reverse index format for incremental multi-pack reachability bitmaps.
Signed-off-by: Taylor Blau <me@ttaylorr.com>
---
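As a rough illustration of the position handling described above, here is a minimal sketch of mapping a global pseudo-pack position onto the layer that contains it. The `struct layer` type and both function names are hypothetical stand-ins rather than git's own definitions, and the reverse index is modeled as a host-order array (the real on-disk data is read with get_be32()).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, pared-down stand-in for one layer of a MIDX chain. */
struct layer {
	struct layer *base;           /* previous (older) layer, or NULL */
	uint32_t num_objects;         /* objects in this layer only */
	uint32_t num_objects_in_base; /* objects in all older layers combined */
	const uint32_t *revindex;     /* this layer's reverse index, num_objects entries */
};

/*
 * Walk towards older layers until `pos` no longer falls below the
 * current layer's first global position.
 */
struct layer *layer_for_pos(struct layer *m, uint32_t pos)
{
	while (m && pos < m->num_objects_in_base)
		m = m->base;
	return m;
}

/* Look up a global position in the reverse index of its containing layer. */
uint32_t revindex_lookup(struct layer *m, uint32_t pos)
{
	m = layer_for_pos(m, pos);
	assert(m && m->revindex);
	assert(pos < m->num_objects + m->num_objects_in_base);
	/* Convert the global position to a layer-relative index. */
	return m->revindex[pos - m->num_objects_in_base];
}
```

For a two-layer chain with three objects in the base and two in the tip, global positions 0 through 2 resolve to the base layer and positions 3 and 4 resolve to the tip at local indexes 0 and 1.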
pack-bitmap.c | 40 ++++++++++++++++++++++++++++------------
pack-revindex.c | 34 +++++++++++++++++++++++++---------
2 files changed, 53 insertions(+), 21 deletions(-)
diff --git a/pack-bitmap.c b/pack-bitmap.c
index 6406953d32..c26d85b5db 100644
--- a/pack-bitmap.c
+++ b/pack-bitmap.c
@@ -170,6 +170,15 @@ static struct ewah_bitmap *read_bitmap_1(struct bitmap_index *index)
return read_bitmap(index->map, index->map_size, &index->map_pos);
}
+static uint32_t bitmap_non_extended_bits(struct bitmap_index *index)
+{
+ if (index->midx) {
+ struct multi_pack_index *m = index->midx;
+ return m->num_objects + m->num_objects_in_base;
+ }
+ return index->pack->num_objects;
+}
+
static uint32_t bitmap_num_objects(struct bitmap_index *index)
{
if (index->midx)
@@ -924,7 +933,7 @@ static inline int bitmap_position_extended(struct bitmap_index *bitmap_git,
if (pos < kh_end(positions)) {
int bitmap_pos = kh_value(positions, pos);
- return bitmap_pos + bitmap_num_objects(bitmap_git);
+ return bitmap_pos + bitmap_non_extended_bits(bitmap_git);
}
return -1;
@@ -992,7 +1001,7 @@ static int ext_index_add_object(struct bitmap_index *bitmap_git,
bitmap_pos = kh_value(eindex->positions, hash_pos);
}
- return bitmap_pos + bitmap_num_objects(bitmap_git);
+ return bitmap_pos + bitmap_non_extended_bits(bitmap_git);
}
struct bitmap_show_data {
@@ -1342,11 +1351,17 @@ struct ewah_bitmap *pseudo_merge_bitmap_for_commit(struct bitmap_index *bitmap_g
if (pos < 0 || pos >= bitmap_num_objects(bitmap_git))
goto done;
+ /*
+ * Use bitmap-relative positions instead of offsetting
+ * by bitmap_git->num_objects_in_base because we use
+ * this to find a match in pseudo_merge_for_parents(),
+ * and pseudo-merge groups cannot span multiple bitmap
+ * layers.
+ */
bitmap_set(parents, pos);
}
- match = pseudo_merge_for_parents(&bitmap_git->pseudo_merges,
- parents);
+ match = pseudo_merge_for_parents(&bitmap_git->pseudo_merges, parents);
done:
bitmap_free(parents);
@@ -1500,7 +1515,8 @@ static void show_extended_objects(struct bitmap_index *bitmap_git,
for (i = 0; i < eindex->count; ++i) {
struct object *obj;
- if (!bitmap_get(objects, st_add(bitmap_num_objects(bitmap_git), i)))
+ if (!bitmap_get(objects,
+ st_add(bitmap_non_extended_bits(bitmap_git), i)))
continue;
obj = eindex->objects[i];
@@ -1679,7 +1695,7 @@ static void filter_bitmap_exclude_type(struct bitmap_index *bitmap_git,
* them individually.
*/
for (i = 0; i < eindex->count; i++) {
- size_t pos = st_add(i, bitmap_num_objects(bitmap_git));
+ size_t pos = st_add(i, bitmap_non_extended_bits(bitmap_git));
if (eindex->objects[i]->type == type &&
bitmap_get(to_filter, pos) &&
!bitmap_get(tips, pos))
@@ -1705,7 +1721,7 @@ static unsigned long get_size_by_pos(struct bitmap_index *bitmap_git,
oi.sizep = &size;
- if (pos < bitmap_num_objects(bitmap_git)) {
+ if (pos < bitmap_non_extended_bits(bitmap_git)) {
struct packed_git *pack;
off_t ofs;
@@ -1729,7 +1745,7 @@ static unsigned long get_size_by_pos(struct bitmap_index *bitmap_git,
}
} else {
struct eindex *eindex = &bitmap_git->ext_index;
- struct object *obj = eindex->objects[pos - bitmap_num_objects(bitmap_git)];
+ struct object *obj = eindex->objects[pos - bitmap_non_extended_bits(bitmap_git)];
if (oid_object_info_extended(bitmap_repo(bitmap_git), &obj->oid,
&oi, 0) < 0)
die(_("unable to get size of %s"), oid_to_hex(&obj->oid));
@@ -1882,7 +1898,7 @@ static void filter_packed_objects_from_bitmap(struct bitmap_index *bitmap_git,
uint32_t objects_nr;
size_t i, pos;
- objects_nr = bitmap_num_objects(bitmap_git);
+ objects_nr = bitmap_non_extended_bits(bitmap_git);
pos = objects_nr / BITS_IN_EWORD;
if (pos > result->word_alloc)
@@ -2419,7 +2435,7 @@ static uint32_t count_object_type(struct bitmap_index *bitmap_git,
for (i = 0; i < eindex->count; ++i) {
if (eindex->objects[i]->type == type &&
bitmap_get(objects,
- st_add(bitmap_num_objects(bitmap_git), i)))
+ st_add(bitmap_non_extended_bits(bitmap_git), i)))
count++;
}
@@ -2820,7 +2836,7 @@ uint32_t *create_bitmap_mapping(struct bitmap_index *bitmap_git,
BUG("rebuild_existing_bitmaps: missing required rev-cache "
"extension");
- num_objects = bitmap_num_objects(bitmap_git);
+ num_objects = bitmap_non_extended_bits(bitmap_git);
CALLOC_ARRAY(reposition, num_objects);
for (i = 0; i < num_objects; ++i) {
@@ -2963,7 +2979,7 @@ static off_t get_disk_usage_for_extended(struct bitmap_index *bitmap_git)
struct object *obj = eindex->objects[i];
if (!bitmap_get(result,
- st_add(bitmap_num_objects(bitmap_git), i)))
+ st_add(bitmap_non_extended_bits(bitmap_git), i)))
continue;
if (oid_object_info_extended(bitmap_repo(bitmap_git), &obj->oid,
diff --git a/pack-revindex.c b/pack-revindex.c
index d3832478d9..d3faab6a37 100644
--- a/pack-revindex.c
+++ b/pack-revindex.c
@@ -383,8 +383,14 @@ int load_midx_revindex(struct multi_pack_index *m)
trace2_data_string("load_midx_revindex", the_repository,
"source", "rev");
- get_midx_filename_ext(m->repo->hash_algo, &revindex_name, m->object_dir,
- get_midx_checksum(m), MIDX_EXT_REV);
+ if (m->has_chain)
+ get_split_midx_filename_ext(m->repo->hash_algo, &revindex_name,
+ m->object_dir, get_midx_checksum(m),
+ MIDX_EXT_REV);
+ else
+ get_midx_filename_ext(m->repo->hash_algo, &revindex_name,
+ m->object_dir, get_midx_checksum(m),
+ MIDX_EXT_REV);
ret = load_revindex_from_disk(revindex_name.buf,
m->num_objects,
@@ -471,11 +477,15 @@ off_t pack_pos_to_offset(struct packed_git *p, uint32_t pos)
uint32_t pack_pos_to_midx(struct multi_pack_index *m, uint32_t pos)
{
+ while (m && pos < m->num_objects_in_base)
+ m = m->base_midx;
+ if (!m)
+ BUG("NULL multi-pack-index for object position: %"PRIu32, pos);
if (!m->revindex_data)
BUG("pack_pos_to_midx: reverse index not yet loaded");
- if (m->num_objects <= pos)
+ if (m->num_objects + m->num_objects_in_base <= pos)
BUG("pack_pos_to_midx: out-of-bounds object at %"PRIu32, pos);
- return get_be32(m->revindex_data + pos);
+ return get_be32(m->revindex_data + pos - m->num_objects_in_base);
}
struct midx_pack_key {
@@ -491,7 +501,8 @@ static int midx_pack_order_cmp(const void *va, const void *vb)
const struct midx_pack_key *key = va;
struct multi_pack_index *midx = key->midx;
- uint32_t versus = pack_pos_to_midx(midx, (uint32_t*)vb - (const uint32_t *)midx->revindex_data);
+ size_t pos = (uint32_t *)vb - (const uint32_t *)midx->revindex_data;
+ uint32_t versus = pack_pos_to_midx(midx, pos + midx->num_objects_in_base);
uint32_t versus_pack = nth_midxed_pack_int_id(midx, versus);
off_t versus_offset;
@@ -529,9 +540,9 @@ static int midx_key_to_pack_pos(struct multi_pack_index *m,
{
uint32_t *found;
- if (key->pack >= m->num_packs)
+ if (key->pack >= m->num_packs + m->num_packs_in_base)
BUG("MIDX pack lookup out of bounds (%"PRIu32" >= %"PRIu32")",
- key->pack, m->num_packs);
+ key->pack, m->num_packs + m->num_packs_in_base);
/*
* The preferred pack sorts first, so determine its identifier by
* looking at the first object in pseudo-pack order.
@@ -551,7 +562,8 @@ static int midx_key_to_pack_pos(struct multi_pack_index *m,
if (!found)
return -1;
- *pos = found - m->revindex_data;
+ *pos = (found - m->revindex_data) + m->num_objects_in_base;
+
return 0;
}
@@ -559,9 +571,13 @@ int midx_to_pack_pos(struct multi_pack_index *m, uint32_t at, uint32_t *pos)
{
struct midx_pack_key key;
+ while (m && at < m->num_objects_in_base)
+ m = m->base_midx;
+ if (!m)
+ BUG("NULL multi-pack-index for object position: %"PRIu32, at);
if (!m->revindex_data)
BUG("midx_to_pack_pos: reverse index not yet loaded");
- if (m->num_objects <= at)
+ if (m->num_objects + m->num_objects_in_base <= at)
BUG("midx_to_pack_pos: out-of-bounds object at %"PRIu32, at);
key.pack = nth_midxed_pack_int_id(m, at);
--
2.49.0.13.gd0d564685b
* [PATCH v4 03/13] pack-bitmap.c: open and store incremental bitmap layers
2025-03-14 20:18 ` [PATCH v4 " Taylor Blau
2025-03-14 20:18 ` [PATCH v4 01/13] Documentation: describe incremental MIDX bitmaps Taylor Blau
2025-03-14 20:18 ` [PATCH v4 02/13] pack-revindex: prepare for " Taylor Blau
@ 2025-03-14 20:18 ` Taylor Blau
2025-03-18 4:13 ` Elijah Newren
2025-03-14 20:18 ` [PATCH v4 04/13] pack-bitmap.c: teach `bitmap_for_commit()` about incremental MIDXs Taylor Blau
` (10 subsequent siblings)
13 siblings, 1 reply; 136+ messages in thread
From: Taylor Blau @ 2025-03-14 20:18 UTC (permalink / raw)
To: git; +Cc: Elijah Newren, Jeff King, Junio C Hamano, Patrick Steinhardt
Prepare the pack-bitmap machinery to work with incremental MIDXs by
adding a new "base" field to keep track of the bitmap index associated
with the previous MIDX layer.
The changes in this commit are mostly boilerplate to open the correct
bitmap(s), add them to the chain of bitmap layers along the "base"
pointer, ensure that the correct packs and their reverse indexes are
loaded across MIDX layers, etc.
While we're at it, keep track of a base_nr field to indicate how many
bitmap layers precede the current bitmap (it is zero when there is no
base). This will be used in a future commit to allocate an array of
'struct ewah_bitmap' pointers, one per layer, to collect all of the
respective type bitmaps among all layers to initialize a multi-EWAH
iterator.
Subsequent commits will teach the functions within the pack-bitmap
machinery how to interact with these new fields.
Signed-off-by: Taylor Blau <me@ttaylorr.com>
---
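To make the role of base_nr more concrete, the following sketch links layers along a "base" pointer and then collects one type bitmap per layer (base_nr + 1 of them) before OR-ing them together. It is a simplification under assumptions: `struct bitmap_layer`, `link_layer()` and `or_type_bitmaps()` are hypothetical names, and the bitmaps are plain uncompressed 64-bit word arrays rather than EWAH-compressed bitmaps.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical stand-in for one bitmap layer in the chain. */
struct bitmap_layer {
	struct bitmap_layer *base;  /* older layer, or NULL */
	uint32_t base_nr;           /* number of layers preceding this one */
	const uint64_t *type_words; /* uncompressed type bitmap, global positions */
	size_t word_nr;             /* number of words in type_words */
};

/* Attach `layer` on top of `base` and record how many layers precede it. */
void link_layer(struct bitmap_layer *layer, struct bitmap_layer *base)
{
	layer->base = base;
	layer->base_nr = base ? base->base_nr + 1 : 0;
}

/*
 * OR together the type bitmaps of every layer reachable from `top`.
 * Returns a malloc'd array the caller must free, or NULL on
 * allocation failure; the word count is stored in *out_nr.
 */
uint64_t *or_type_bitmaps(const struct bitmap_layer *top, size_t *out_nr)
{
	size_t layers_nr = (size_t)top->base_nr + 1;
	const struct bitmap_layer **layers;
	const struct bitmap_layer *l;
	uint64_t *result;
	size_t i, w, max_words = 0;

	layers = calloc(layers_nr, sizeof(*layers));
	if (!layers)
		return NULL;

	/* Collect one entry per layer, newest first. */
	for (i = 0, l = top; l; l = l->base, i++) {
		assert(i < layers_nr);
		layers[i] = l;
		if (l->word_nr > max_words)
			max_words = l->word_nr;
	}
	assert(i == layers_nr);

	result = calloc(max_words ? max_words : 1, sizeof(*result));
	if (result) {
		for (i = 0; i < layers_nr; i++)
			for (w = 0; w < layers[i]->word_nr; w++)
				result[w] |= layers[i]->type_words[w];
		*out_nr = max_words;
	}
	free(layers);
	return result;
}
```

Because each layer only sets bits for its own objects at global positions, the per-layer bitmaps do not overlap and the OR yields the type bitmap for the whole chain.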
pack-bitmap.c | 62 +++++++++++++++++++++++++++++++++++++++------------
1 file changed, 48 insertions(+), 14 deletions(-)
diff --git a/pack-bitmap.c b/pack-bitmap.c
index c26d85b5db..72fb11d014 100644
--- a/pack-bitmap.c
+++ b/pack-bitmap.c
@@ -54,6 +54,16 @@ struct bitmap_index {
struct packed_git *pack;
struct multi_pack_index *midx;
+ /*
+ * If using a multi-pack index chain, 'base' points to the
+ * bitmap index corresponding to this bitmap's midx->base_midx.
+ *
+ * base_nr indicates how many layers precede this one, and is
+ * zero when base is NULL.
+ */
+ struct bitmap_index *base;
+ uint32_t base_nr;
+
/* mmapped buffer of the whole bitmap index */
unsigned char *map;
size_t map_size; /* size of the mmaped buffer */
@@ -386,8 +396,15 @@ static int load_bitmap_entries_v1(struct bitmap_index *index)
char *midx_bitmap_filename(struct multi_pack_index *midx)
{
struct strbuf buf = STRBUF_INIT;
- get_midx_filename_ext(midx->repo->hash_algo, &buf, midx->object_dir,
- get_midx_checksum(midx), MIDX_EXT_BITMAP);
+ if (midx->has_chain)
+ get_split_midx_filename_ext(midx->repo->hash_algo, &buf,
+ midx->object_dir,
+ get_midx_checksum(midx),
+ MIDX_EXT_BITMAP);
+ else
+ get_midx_filename_ext(midx->repo->hash_algo, &buf,
+ midx->object_dir, get_midx_checksum(midx),
+ MIDX_EXT_BITMAP);
return strbuf_detach(&buf, NULL);
}
@@ -454,16 +471,21 @@ static int open_midx_bitmap_1(struct bitmap_index *bitmap_git,
goto cleanup;
}
- for (i = 0; i < bitmap_git->midx->num_packs; i++) {
- if (prepare_midx_pack(bitmap_repo(bitmap_git),
- bitmap_git->midx,
- i)) {
+ for (i = 0; i < bitmap_git->midx->num_packs + bitmap_git->midx->num_packs_in_base; i++) {
+ if (prepare_midx_pack(bitmap_repo(bitmap_git), bitmap_git->midx, i)) {
warning(_("could not open pack %s"),
bitmap_git->midx->pack_names[i]);
goto cleanup;
}
}
+ if (midx->base_midx) {
+ bitmap_git->base = prepare_midx_bitmap_git(midx->base_midx);
+ bitmap_git->base_nr = bitmap_git->base->base_nr + 1;
+ } else {
+ bitmap_git->base_nr = 0;
+ }
+
return 0;
cleanup:
@@ -515,6 +537,7 @@ static int open_pack_bitmap_1(struct bitmap_index *bitmap_git, struct packed_git
bitmap_git->map_size = xsize_t(st.st_size);
bitmap_git->map = xmmap(NULL, bitmap_git->map_size, PROT_READ, MAP_PRIVATE, fd, 0);
bitmap_git->map_pos = 0;
+ bitmap_git->base_nr = 0;
close(fd);
if (load_bitmap_header(bitmap_git) < 0) {
@@ -534,8 +557,7 @@ static int open_pack_bitmap_1(struct bitmap_index *bitmap_git, struct packed_git
static int load_reverse_index(struct repository *r, struct bitmap_index *bitmap_git)
{
if (bitmap_is_midx(bitmap_git)) {
- uint32_t i;
- int ret;
+ struct multi_pack_index *m;
/*
* The multi-pack-index's .rev file is already loaded via
@@ -544,10 +566,15 @@ static int load_reverse_index(struct repository *r, struct bitmap_index *bitmap_
* But we still need to open the individual pack .rev files,
* since we will need to make use of them in pack-objects.
*/
- for (i = 0; i < bitmap_git->midx->num_packs; i++) {
- ret = load_pack_revindex(r, bitmap_git->midx->packs[i]);
- if (ret)
- return ret;
+ for (m = bitmap_git->midx; m; m = m->base_midx) {
+ uint32_t i;
+ int ret;
+
+ for (i = 0; i < m->num_packs; i++) {
+ ret = load_pack_revindex(r, m->packs[i]);
+ if (ret)
+ return ret;
+ }
}
return 0;
}
@@ -573,6 +600,13 @@ static int load_bitmap(struct repository *r, struct bitmap_index *bitmap_git)
if (!bitmap_git->table_lookup && load_bitmap_entries_v1(bitmap_git) < 0)
goto failed;
+ if (bitmap_git->base) {
+ if (!bitmap_is_midx(bitmap_git))
+ BUG("non-MIDX bitmap has non-NULL base bitmap index");
+ if (load_bitmap(r, bitmap_git->base) < 0)
+ goto failed;
+ }
+
return 0;
failed:
@@ -657,10 +691,9 @@ struct bitmap_index *prepare_bitmap_git(struct repository *r)
struct bitmap_index *prepare_midx_bitmap_git(struct multi_pack_index *midx)
{
- struct repository *r = midx->repo;
struct bitmap_index *bitmap_git = xcalloc(1, sizeof(*bitmap_git));
- if (!open_midx_bitmap_1(bitmap_git, midx) && !load_bitmap(r, bitmap_git))
+ if (!open_midx_bitmap_1(bitmap_git, midx))
return bitmap_git;
free_bitmap_index(bitmap_git);
@@ -2899,6 +2932,7 @@ void free_bitmap_index(struct bitmap_index *b)
close_midx_revindex(b->midx);
}
free_pseudo_merge_map(&b->pseudo_merges);
+ free_bitmap_index(b->base);
free(b);
}
--
2.49.0.13.gd0d564685b
* [PATCH v4 04/13] pack-bitmap.c: teach `bitmap_for_commit()` about incremental MIDXs
2025-03-14 20:18 ` [PATCH v4 " Taylor Blau
` (2 preceding siblings ...)
2025-03-14 20:18 ` [PATCH v4 03/13] pack-bitmap.c: open and store incremental bitmap layers Taylor Blau
@ 2025-03-14 20:18 ` Taylor Blau
2025-03-18 1:38 ` Jeff King
2025-03-14 20:18 ` [PATCH v4 05/13] pack-bitmap.c: teach `show_objects_for_type()` " Taylor Blau
` (9 subsequent siblings)
13 siblings, 1 reply; 136+ messages in thread
From: Taylor Blau @ 2025-03-14 20:18 UTC (permalink / raw)
To: git; +Cc: Elijah Newren, Jeff King, Junio C Hamano, Patrick Steinhardt
The pack-bitmap machinery uses `bitmap_for_commit()` to locate the
EWAH-compressed bitmap corresponding to some given commit object.
Teach this function about incremental MIDX bitmaps by teaching it to
recur on earlier bitmap layers when it fails to find a given commit in
the current layer.
The changes to do so are as follows:
- Avoid initializing hash_pos at its declaration, since
bitmap_for_commit() is now a recursive function and may receive a
NULL bitmap_index pointer as its first argument.
- In cases where we would previously return NULL (to indicate that a
lookup failed and the given bitmap_index does not contain an entry
corresponding to the given commit), recursively call the function on
the previous bitmap layer.
Signed-off-by: Taylor Blau <me@ttaylorr.com>
---
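The fall-through behavior described above can be sketched in isolation as follows. This is a toy model and not the patch's code: `struct stored`, `struct bitmap_layer` and `bitmap_for_commit_sketch()` are hypothetical, commits are identified by small integers instead of object IDs, and a linear scan stands in for the hash-map and lookup-table paths.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stored entry: a commit and its (opaque) bitmap. */
struct stored {
	uint32_t commit_id; /* stand-in for an object ID */
	const char *bitmap; /* stand-in for a 'struct ewah_bitmap *' */
};

/* Hypothetical stand-in for one bitmap layer in the chain. */
struct bitmap_layer {
	struct bitmap_layer *base;     /* older layer, or NULL */
	const struct stored *entries;  /* commits with bitmaps in this layer */
	size_t entries_nr;
};

/*
 * Look for the commit in this layer first; on a miss, retry on the
 * previous layer. A NULL layer ends the recursion.
 */
const char *bitmap_for_commit_sketch(const struct bitmap_layer *layer,
				     uint32_t commit_id)
{
	size_t i;

	if (!layer)
		return NULL;
	assert(layer->entries || !layer->entries_nr);

	for (i = 0; i < layer->entries_nr; i++)
		if (layer->entries[i].commit_id == commit_id)
			return layer->entries[i].bitmap;

	return bitmap_for_commit_sketch(layer->base, commit_id);
}
```

The NULL check at the top is what lets every miss simply recurse on `layer->base` without first testing whether a base exists.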
pack-bitmap.c | 11 +++++++----
1 file changed, 7 insertions(+), 4 deletions(-)
diff --git a/pack-bitmap.c b/pack-bitmap.c
index 72fb11d014..615d5de85e 100644
--- a/pack-bitmap.c
+++ b/pack-bitmap.c
@@ -941,18 +941,21 @@ static struct stored_bitmap *lazy_bitmap_for_commit(struct bitmap_index *bitmap_
struct ewah_bitmap *bitmap_for_commit(struct bitmap_index *bitmap_git,
struct commit *commit)
{
- khiter_t hash_pos = kh_get_oid_map(bitmap_git->bitmaps,
- commit->object.oid);
+ khiter_t hash_pos;
+ if (!bitmap_git)
+ return NULL;
+
+ hash_pos = kh_get_oid_map(bitmap_git->bitmaps, commit->object.oid);
if (hash_pos >= kh_end(bitmap_git->bitmaps)) {
struct stored_bitmap *bitmap = NULL;
if (!bitmap_git->table_lookup)
- return NULL;
+ return bitmap_for_commit(bitmap_git->base, commit);
/* this is a fairly hot codepath - no trace2_region please */
/* NEEDSWORK: cache misses aren't recorded */
bitmap = lazy_bitmap_for_commit(bitmap_git, commit);
if (!bitmap)
- return NULL;
+ return bitmap_for_commit(bitmap_git->base, commit);
return lookup_stored_bitmap(bitmap);
}
return lookup_stored_bitmap(kh_value(bitmap_git->bitmaps, hash_pos));
--
2.49.0.13.gd0d564685b
* [PATCH v4 05/13] pack-bitmap.c: teach `show_objects_for_type()` about incremental MIDXs
2025-03-14 20:18 ` [PATCH v4 " Taylor Blau
` (3 preceding siblings ...)
2025-03-14 20:18 ` [PATCH v4 04/13] pack-bitmap.c: teach `bitmap_for_commit()` about incremental MIDXs Taylor Blau
@ 2025-03-14 20:18 ` Taylor Blau
2025-03-14 20:18 ` [PATCH v4 06/13] pack-bitmap.c: support bitmap pack-reuse with " Taylor Blau
` (8 subsequent siblings)
13 siblings, 0 replies; 136+ messages in thread
From: Taylor Blau @ 2025-03-14 20:18 UTC (permalink / raw)
To: git; +Cc: Elijah Newren, Jeff King, Junio C Hamano, Patrick Steinhardt
Since we may ask for a pack_id that is in an earlier MIDX layer relative
to the one corresponding to our bitmap, use nth_midxed_pack() instead of
accessing the ->packs array directly.
Signed-off-by: Taylor Blau <me@ttaylorr.com>
---
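To illustrate why indexing the top layer's ->packs array directly is not enough, here is a small sketch of resolving a chain-wide pack ID to the layer that owns it. It is an assumption about how such a lookup can work, modeled on the object-position lookup from earlier in the series; `struct layer` and `nth_pack_name()` are hypothetical and are not git's nth_midxed_pack().

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, pared-down stand-in for one layer of a MIDX chain. */
struct layer {
	struct layer *base;         /* previous (older) layer, or NULL */
	uint32_t num_packs;         /* packs in this layer only */
	uint32_t num_packs_in_base; /* packs in all older layers combined */
	const char **pack_names;    /* num_packs entries, this layer only */
};

/* Resolve a chain-wide pack ID to the name stored in its own layer. */
const char *nth_pack_name(struct layer *m, uint32_t pack_id)
{
	assert(m && pack_id < m->num_packs + m->num_packs_in_base);

	/* Step back through the chain until the ID falls in this layer. */
	while (m && pack_id < m->num_packs_in_base)
		m = m->base;
	assert(m);

	/* Convert the chain-wide ID to a layer-relative index. */
	return m->pack_names[pack_id - m->num_packs_in_base];
}
```

With two packs in the base layer and one in the tip, IDs 0 and 1 resolve to the base layer, while ID 2 resolves to index 0 of the tip; indexing the tip's array with ID 1 directly would read out of bounds.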
pack-bitmap.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/pack-bitmap.c b/pack-bitmap.c
index 615d5de85e..1b4fec0033 100644
--- a/pack-bitmap.c
+++ b/pack-bitmap.c
@@ -1635,7 +1635,7 @@ static void show_objects_for_type(
nth_midxed_object_oid(&oid, m, index_pos);
pack_id = nth_midxed_pack_int_id(m, index_pos);
- pack = bitmap_git->midx->packs[pack_id];
+ pack = nth_midxed_pack(bitmap_git->midx, pack_id);
} else {
index_pos = pack_pos_to_index(bitmap_git->pack, pos + offset);
ofs = pack_pos_to_offset(bitmap_git->pack, pos + offset);
--
2.49.0.13.gd0d564685b
* [PATCH v4 06/13] pack-bitmap.c: support bitmap pack-reuse with incremental MIDXs
2025-03-14 20:18 ` [PATCH v4 " Taylor Blau
` (4 preceding siblings ...)
2025-03-14 20:18 ` [PATCH v4 05/13] pack-bitmap.c: teach `show_objects_for_type()` " Taylor Blau
@ 2025-03-14 20:18 ` Taylor Blau
2025-03-18 4:13 ` Elijah Newren
2025-03-14 20:18 ` [PATCH v4 07/13] pack-bitmap.c: teach `rev-list --test-bitmap` about " Taylor Blau
` (7 subsequent siblings)
13 siblings, 1 reply; 136+ messages in thread
From: Taylor Blau @ 2025-03-14 20:18 UTC (permalink / raw)
To: git; +Cc: Elijah Newren, Jeff King, Junio C Hamano, Patrick Steinhardt
In a similar fashion to previous commits in the first phase of
incremental MIDXs, enumerate not just the packs in the current
incremental MIDX layer, but previous ones as well.
Likewise, in reuse_partial_packfile_from_bitmap(), when reusing only a
single pack from a MIDX, use the oldest layer's preferred pack as it is
likely to contain the largest number of reusable sections.
Signed-off-by: Taylor Blau <me@ttaylorr.com>
---
pack-bitmap.c | 11 ++++++++---
1 file changed, 8 insertions(+), 3 deletions(-)
diff --git a/pack-bitmap.c b/pack-bitmap.c
index 1b4fec0033..7a41535425 100644
--- a/pack-bitmap.c
+++ b/pack-bitmap.c
@@ -2333,7 +2333,8 @@ void reuse_partial_packfile_from_bitmap(struct bitmap_index *bitmap_git,
multi_pack_reuse = 0;
if (multi_pack_reuse) {
- for (i = 0; i < bitmap_git->midx->num_packs; i++) {
+ struct multi_pack_index *m = bitmap_git->midx;
+ for (i = 0; i < m->num_packs + m->num_packs_in_base; i++) {
struct bitmapped_pack pack;
if (nth_bitmapped_pack(r, bitmap_git->midx, &pack, i) < 0) {
warning(_("unable to load pack: '%s', disabling pack-reuse"),
@@ -2359,14 +2360,18 @@ void reuse_partial_packfile_from_bitmap(struct bitmap_index *bitmap_git,
uint32_t pack_int_id;
if (bitmap_is_midx(bitmap_git)) {
+ struct multi_pack_index *m = bitmap_git->midx;
uint32_t preferred_pack_pos;
- if (midx_preferred_pack(bitmap_git->midx, &preferred_pack_pos) < 0) {
+ while (m->base_midx)
+ m = m->base_midx;
+
+ if (midx_preferred_pack(m, &preferred_pack_pos) < 0) {
warning(_("unable to compute preferred pack, disabling pack-reuse"));
return;
}
- pack = bitmap_git->midx->packs[preferred_pack_pos];
+ pack = nth_midxed_pack(m, preferred_pack_pos);
pack_int_id = preferred_pack_pos;
} else {
pack = bitmap_git->pack;
--
2.49.0.13.gd0d564685b
* [PATCH v4 07/13] pack-bitmap.c: teach `rev-list --test-bitmap` about incremental MIDXs
2025-03-14 20:18 ` [PATCH v4 " Taylor Blau
` (5 preceding siblings ...)
2025-03-14 20:18 ` [PATCH v4 06/13] pack-bitmap.c: support bitmap pack-reuse with " Taylor Blau
@ 2025-03-14 20:18 ` Taylor Blau
2025-03-18 5:31 ` Elijah Newren
2025-03-14 20:18 ` [PATCH v4 08/13] pack-bitmap.c: compute disk-usage with " Taylor Blau
` (6 subsequent siblings)
13 siblings, 1 reply; 136+ messages in thread
From: Taylor Blau @ 2025-03-14 20:18 UTC (permalink / raw)
To: git; +Cc: Elijah Newren, Jeff King, Junio C Hamano, Patrick Steinhardt
Implement support for the special `--test-bitmap` mode of `git rev-list`
when using incremental MIDXs.
The bitmap_test_data structure is extended to contain a "base" pointer
that mirrors the structure of the bitmap chain that it is being used to
test.
When we find a commit to test, we first chase down the ->base pointer to
find the appropriate bitmap_test_data for the bitmap layer that the
given commit is contained within, and then perform the test on that
bitmap.
In order to implement this, light modifications are made to
bitmap_for_commit() to reimplement it in terms of a new function,
find_bitmap_for_commit(), which fills in a pointer to the bitmap layer
that contains the given commit.
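The lookup pattern described above (search each layer in turn, and optionally report which layer matched) might look roughly like this. It is a simplified sketch: `struct bm_layer` and `find_layer_for_commit()` are hypothetical, and commits are modeled as plain integer ids rather than object IDs stored in a hash map.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical bitmap layer holding ids of commits that have bitmaps. */
struct bm_layer {
	struct bm_layer *base;  /* next-older layer; NULL for the oldest */
	const int *commit_ids;  /* illustrative ids of bitmapped commits */
	size_t nr;
};

/*
 * Search the newest layer first, then each base layer in turn.
 * Returns 1 on a hit and, if 'found' is non-NULL, records the layer
 * that contains the commit. Returns 0 (leaving *found alone) on a miss.
 */
static int find_layer_for_commit(struct bm_layer *top, int id,
				 struct bm_layer **found)
{
	struct bm_layer *curr;
	size_t i;

	for (curr = top; curr; curr = curr->base) {
		for (i = 0; i < curr->nr; i++) {
			if (curr->commit_ids[i] != id)
				continue;
			if (found)
				*found = curr;
			return 1;
		}
	}
	return 0;
}
```

Making the out-parameter optional lets an existing caller that only needs the result pass NULL, which is the same shape as wrapping the new function in the old one.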
Signed-off-by: Taylor Blau <me@ttaylorr.com>
---
pack-bitmap.c | 107 ++++++++++++++++++++++++++++++++++++++++----------
1 file changed, 86 insertions(+), 21 deletions(-)
diff --git a/pack-bitmap.c b/pack-bitmap.c
index 7a41535425..bb09ce3cf5 100644
--- a/pack-bitmap.c
+++ b/pack-bitmap.c
@@ -938,8 +938,9 @@ static struct stored_bitmap *lazy_bitmap_for_commit(struct bitmap_index *bitmap_
return NULL;
}
-struct ewah_bitmap *bitmap_for_commit(struct bitmap_index *bitmap_git,
- struct commit *commit)
+static struct ewah_bitmap *find_bitmap_for_commit(struct bitmap_index *bitmap_git,
+ struct commit *commit,
+ struct bitmap_index **found)
{
khiter_t hash_pos;
if (!bitmap_git)
@@ -949,18 +950,30 @@ struct ewah_bitmap *bitmap_for_commit(struct bitmap_index *bitmap_git,
if (hash_pos >= kh_end(bitmap_git->bitmaps)) {
struct stored_bitmap *bitmap = NULL;
if (!bitmap_git->table_lookup)
- return bitmap_for_commit(bitmap_git->base, commit);
+ return find_bitmap_for_commit(bitmap_git->base, commit,
+ found);
/* this is a fairly hot codepath - no trace2_region please */
/* NEEDSWORK: cache misses aren't recorded */
bitmap = lazy_bitmap_for_commit(bitmap_git, commit);
if (!bitmap)
- return bitmap_for_commit(bitmap_git->base, commit);
+ return find_bitmap_for_commit(bitmap_git->base, commit,
+ found);
+ if (found)
+ *found = bitmap_git;
return lookup_stored_bitmap(bitmap);
}
+ if (found)
+ *found = bitmap_git;
return lookup_stored_bitmap(kh_value(bitmap_git->bitmaps, hash_pos));
}
+struct ewah_bitmap *bitmap_for_commit(struct bitmap_index *bitmap_git,
+ struct commit *commit)
+{
+ return find_bitmap_for_commit(bitmap_git, commit, NULL);
+}
+
static inline int bitmap_position_extended(struct bitmap_index *bitmap_git,
const struct object_id *oid)
{
@@ -2511,6 +2524,8 @@ struct bitmap_test_data {
struct bitmap *tags;
struct progress *prg;
size_t seen;
+
+ struct bitmap_test_data *base_tdata;
};
static void test_bitmap_type(struct bitmap_test_data *tdata,
@@ -2519,6 +2534,11 @@ static void test_bitmap_type(struct bitmap_test_data *tdata,
enum object_type bitmap_type = OBJ_NONE;
int bitmaps_nr = 0;
+ if (bitmap_is_midx(tdata->bitmap_git)) {
+ while (pos < tdata->bitmap_git->midx->num_objects_in_base)
+ tdata = tdata->base_tdata;
+ }
+
if (bitmap_get(tdata->commits, pos)) {
bitmap_type = OBJ_COMMIT;
bitmaps_nr++;
@@ -2582,13 +2602,57 @@ static void test_show_commit(struct commit *commit, void *data)
display_progress(tdata->prg, ++tdata->seen);
}
+static uint32_t bitmap_total_entry_count(struct bitmap_index *bitmap_git)
+{
+ uint32_t total = 0;
+ do {
+ total = st_add(total, bitmap_git->entry_count);
+ bitmap_git = bitmap_git->base;
+ } while (bitmap_git);
+
+ return total;
+}
+
+static void bitmap_test_data_prepare(struct bitmap_test_data *tdata,
+ struct bitmap_index *bitmap_git)
+{
+ memset(tdata, 0, sizeof(struct bitmap_test_data));
+
+ tdata->bitmap_git = bitmap_git;
+ tdata->base = bitmap_new();
+ tdata->commits = ewah_to_bitmap(bitmap_git->commits);
+ tdata->trees = ewah_to_bitmap(bitmap_git->trees);
+ tdata->blobs = ewah_to_bitmap(bitmap_git->blobs);
+ tdata->tags = ewah_to_bitmap(bitmap_git->tags);
+
+ if (bitmap_git->base) {
+ CALLOC_ARRAY(tdata->base_tdata, 1);
+ bitmap_test_data_prepare(tdata->base_tdata, bitmap_git->base);
+ }
+}
+
+static void bitmap_test_data_release(struct bitmap_test_data *tdata)
+{
+ if (!tdata)
+ return;
+
+ bitmap_test_data_release(tdata->base_tdata);
+ free(tdata->base_tdata);
+
+ bitmap_free(tdata->base);
+ bitmap_free(tdata->commits);
+ bitmap_free(tdata->trees);
+ bitmap_free(tdata->blobs);
+ bitmap_free(tdata->tags);
+}
+
void test_bitmap_walk(struct rev_info *revs)
{
struct object *root;
struct bitmap *result = NULL;
size_t result_popcnt;
struct bitmap_test_data tdata;
- struct bitmap_index *bitmap_git;
+ struct bitmap_index *bitmap_git, *found;
struct ewah_bitmap *bm;
if (!(bitmap_git = prepare_bitmap_git(revs->repo)))
@@ -2597,17 +2661,28 @@ void test_bitmap_walk(struct rev_info *revs)
if (revs->pending.nr != 1)
die(_("you must specify exactly one commit to test"));
- fprintf_ln(stderr, "Bitmap v%d test (%d entries%s)",
+ fprintf_ln(stderr, "Bitmap v%d test (%d entries%s, %d total)",
bitmap_git->version,
bitmap_git->entry_count,
- bitmap_git->table_lookup ? "" : " loaded");
+ bitmap_git->table_lookup ? "" : " loaded",
+ bitmap_total_entry_count(bitmap_git));
root = revs->pending.objects[0].item;
- bm = bitmap_for_commit(bitmap_git, (struct commit *)root);
+ bm = find_bitmap_for_commit(bitmap_git, (struct commit *)root, &found);
if (bm) {
fprintf_ln(stderr, "Found bitmap for '%s'. %d bits / %08x checksum",
- oid_to_hex(&root->oid), (int)bm->bit_size, ewah_checksum(bm));
+ oid_to_hex(&root->oid),
+ (int)bm->bit_size, ewah_checksum(bm));
+
+ if (bitmap_is_midx(found))
+ fprintf_ln(stderr, "Located via MIDX '%s'.",
+ hash_to_hex_algop(get_midx_checksum(found->midx),
+ revs->repo->hash_algo));
+ else
+ fprintf_ln(stderr, "Located via pack '%s'.",
+ hash_to_hex_algop(found->pack->hash,
+ revs->repo->hash_algo));
result = ewah_to_bitmap(bm);
}
@@ -2624,16 +2699,10 @@ void test_bitmap_walk(struct rev_info *revs)
if (prepare_revision_walk(revs))
die(_("revision walk setup failed"));
- tdata.bitmap_git = bitmap_git;
- tdata.base = bitmap_new();
- tdata.commits = ewah_to_bitmap(bitmap_git->commits);
- tdata.trees = ewah_to_bitmap(bitmap_git->trees);
- tdata.blobs = ewah_to_bitmap(bitmap_git->blobs);
- tdata.tags = ewah_to_bitmap(bitmap_git->tags);
+ bitmap_test_data_prepare(&tdata, bitmap_git);
tdata.prg = start_progress(revs->repo,
"Verifying bitmap entries",
result_popcnt);
- tdata.seen = 0;
traverse_commit_list(revs, &test_show_commit, &test_show_object, &tdata);
@@ -2645,11 +2714,7 @@ void test_bitmap_walk(struct rev_info *revs)
die(_("mismatch in bitmap results"));
bitmap_free(result);
- bitmap_free(tdata.base);
- bitmap_free(tdata.commits);
- bitmap_free(tdata.trees);
- bitmap_free(tdata.blobs);
- bitmap_free(tdata.tags);
+ bitmap_test_data_release(&tdata);
free_bitmap_index(bitmap_git);
}
--
2.49.0.13.gd0d564685b
* [PATCH v4 08/13] pack-bitmap.c: compute disk-usage with incremental MIDXs
2025-03-14 20:18 ` [PATCH v4 " Taylor Blau
` (6 preceding siblings ...)
2025-03-14 20:18 ` [PATCH v4 07/13] pack-bitmap.c: teach `rev-list --test-bitmap` about " Taylor Blau
@ 2025-03-14 20:18 ` Taylor Blau
2025-03-18 1:41 ` Jeff King
2025-03-14 20:18 ` [PATCH v4 09/13] pack-bitmap.c: apply pseudo-merge commits " Taylor Blau
` (5 subsequent siblings)
13 siblings, 1 reply; 136+ messages in thread
From: Taylor Blau @ 2025-03-14 20:18 UTC (permalink / raw)
To: git; +Cc: Elijah Newren, Jeff King, Junio C Hamano, Patrick Steinhardt
In a similar fashion as previous commits, use nth_midxed_pack() instead
of accessing the MIDX's ->packs array directly to support incremental
MIDXs.
Signed-off-by: Taylor Blau <me@ttaylorr.com>
---
pack-bitmap.c | 4 ++--
1 file changed, 2 insertions(+), 2 deletions(-)
diff --git a/pack-bitmap.c b/pack-bitmap.c
index bb09ce3cf5..8442f8e55f 100644
--- a/pack-bitmap.c
+++ b/pack-bitmap.c
@@ -1778,7 +1778,7 @@ static unsigned long get_size_by_pos(struct bitmap_index *bitmap_git,
uint32_t midx_pos = pack_pos_to_midx(bitmap_git->midx, pos);
uint32_t pack_id = nth_midxed_pack_int_id(bitmap_git->midx, midx_pos);
- pack = bitmap_git->midx->packs[pack_id];
+ pack = nth_midxed_pack(bitmap_git->midx, pack_id);
ofs = nth_midxed_offset(bitmap_git->midx, midx_pos);
} else {
pack = bitmap_git->pack;
@@ -3047,7 +3047,7 @@ static off_t get_disk_usage_for_type(struct bitmap_index *bitmap_git,
off_t offset = nth_midxed_offset(bitmap_git->midx, midx_pos);
uint32_t pack_id = nth_midxed_pack_int_id(bitmap_git->midx, midx_pos);
- struct packed_git *pack = bitmap_git->midx->packs[pack_id];
+ struct packed_git *pack = nth_midxed_pack(bitmap_git->midx, pack_id);
if (offset_to_pack_pos(pack, offset, &pack_pos) < 0) {
struct object_id oid;
--
2.49.0.13.gd0d564685b
* [PATCH v4 09/13] pack-bitmap.c: apply pseudo-merge commits with incremental MIDXs
2025-03-14 20:18 ` [PATCH v4 " Taylor Blau
` (7 preceding siblings ...)
2025-03-14 20:18 ` [PATCH v4 08/13] pack-bitmap.c: compute disk-usage with " Taylor Blau
@ 2025-03-14 20:18 ` Taylor Blau
2025-03-14 20:18 ` [PATCH v4 10/13] ewah: implement `struct ewah_or_iterator` Taylor Blau
` (4 subsequent siblings)
13 siblings, 0 replies; 136+ messages in thread
From: Taylor Blau @ 2025-03-14 20:18 UTC (permalink / raw)
To: git; +Cc: Elijah Newren, Jeff King, Junio C Hamano, Patrick Steinhardt
Prepare for using pseudo-merges with incremental MIDX bitmaps by
attempting to apply pseudo-merges from each layer when encountering a
given commit during a walk.
Signed-off-by: Taylor Blau <me@ttaylorr.com>
---
pack-bitmap.c | 11 ++++++++---
1 file changed, 8 insertions(+), 3 deletions(-)
diff --git a/pack-bitmap.c b/pack-bitmap.c
index 8442f8e55f..00acf5ec73 100644
--- a/pack-bitmap.c
+++ b/pack-bitmap.c
@@ -1082,10 +1082,15 @@ static unsigned apply_pseudo_merges_for_commit_1(struct bitmap_index *bitmap_git
struct commit *commit,
uint32_t commit_pos)
{
- int ret;
+ struct bitmap_index *curr = bitmap_git;
+ int ret = 0;
- ret = apply_pseudo_merges_for_commit(&bitmap_git->pseudo_merges,
- result, commit, commit_pos);
+ while (curr) {
+ ret += apply_pseudo_merges_for_commit(&curr->pseudo_merges,
+ result, commit,
+ commit_pos);
+ curr = curr->base;
+ }
if (ret)
pseudo_merges_satisfied_nr += ret;
--
2.49.0.13.gd0d564685b
* [PATCH v4 10/13] ewah: implement `struct ewah_or_iterator`
2025-03-14 20:18 ` [PATCH v4 " Taylor Blau
` (8 preceding siblings ...)
2025-03-14 20:18 ` [PATCH v4 09/13] pack-bitmap.c: apply pseudo-merge commits " Taylor Blau
@ 2025-03-14 20:18 ` Taylor Blau
2025-03-18 1:44 ` Jeff King
2025-03-14 20:18 ` [PATCH v4 11/13] pack-bitmap.c: keep track of each layer's type bitmaps Taylor Blau
` (3 subsequent siblings)
13 siblings, 1 reply; 136+ messages in thread
From: Taylor Blau @ 2025-03-14 20:18 UTC (permalink / raw)
To: git; +Cc: Elijah Newren, Jeff King, Junio C Hamano, Patrick Steinhardt
While individual bitmap layers store different commit, type-level, and
pseudo-merge bitmaps, only the top-most layer is used to compute
reachability traversals.
Many functions which implement the aforementioned traversal rely on
enumerating the results according to the type-level bitmaps, and so
would benefit from a conceptual type-level bitmap that spans multiple
layers.
Implement `struct ewah_or_iterator` which is capable of enumerating
multiple EWAH bitmaps at once, and OR-ing the results together. When
initialized with, for example, all of the commit type bitmaps from each
layer, callers can pretend as if they are enumerating a large type-level
bitmap which contains the commits from *all* bitmap layers.
There are a couple of alternative approaches which were considered:
- Decompress each EWAH bitmap and OR them together, enumerating a
single (non-EWAH) bitmap. This would work, but has the disadvantage
of decompressing a potentially large bitmap, which may not be
necessary if the caller does not wish to read all of it.
- Recursively call bitmap internal functions, reusing the "result" and
"haves" bitmap from the top-most layer. This approach resembles the
original implementation of this feature, but has two drawbacks: it
(a) requires significant refactoring to implement, and (b)
enumerates large sections of later bitmaps which are all zeros (as
they pertain to objects in earlier layers).
(b) is not so bad in and of itself, but can cause significant
slow-downs when combined with expensive loop bodies.
This approach (enumerating an OR'd together version of all of the
type-level bitmaps from each layer) produces a much more
straightforward implementation that requires far less refactoring.
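The OR-ing iteration can be illustrated without EWAH compression. The sketch below assumes plain arrays of 64-bit words in place of compressed bitmaps; `struct word_iter`, `struct or_iter` and their functions are hypothetical names that only mirror the shape of the iterator added in this patch.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical uncompressed stand-in for a single-bitmap iterator. */
struct word_iter {
	const uint64_t *words;
	size_t nr, pos;
};

/* Iterates several bitmaps in lockstep, OR-ing one word from each. */
struct or_iter {
	struct word_iter *its;
	size_t nr;
};

static int word_iter_next(uint64_t *next, struct word_iter *it)
{
	if (it->pos >= it->nr)
		return 0;
	*next = it->words[it->pos++];
	return 1;
}

static void or_iter_init(struct or_iter *it, const uint64_t **words,
			 const size_t *lens, size_t nr)
{
	size_t i;

	it->its = calloc(nr ? nr : 1, sizeof(*it->its));
	assert(it->its);
	it->nr = nr;
	for (i = 0; i < nr; i++) {
		it->its[i].words = words[i];
		it->its[i].nr = lens[i];
		it->its[i].pos = 0;
	}
}

/*
 * Yield the OR of the next word from every sub-iterator. Exhausted
 * sub-iterators contribute zero; return 0 only once all are exhausted.
 */
static int or_iter_next(uint64_t *next, struct or_iter *it)
{
	uint64_t buf, out = 0;
	size_t i;
	int ret = 0;

	for (i = 0; i < it->nr; i++) {
		if (word_iter_next(&buf, &it->its[i])) {
			out |= buf;
			ret = 1;
		}
	}
	*next = out;
	return ret;
}

static void or_iter_release(struct or_iter *it)
{
	free(it->its);
	it->its = NULL;
	it->nr = 0;
}
```

Because shorter inputs simply stop contributing, the caller sees one bitmap as long as the longest input, which is what lets per-layer type bitmaps of different lengths be treated as a single type-level bitmap.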
Signed-off-by: Taylor Blau <me@ttaylorr.com>
---
ewah/ewah_bitmap.c | 33 +++++++++++++++++++++++++++++++++
ewah/ewok.h | 12 ++++++++++++
2 files changed, 45 insertions(+)
diff --git a/ewah/ewah_bitmap.c b/ewah/ewah_bitmap.c
index 67f8f588e0..e92341b8fa 100644
--- a/ewah/ewah_bitmap.c
+++ b/ewah/ewah_bitmap.c
@@ -371,6 +371,39 @@ void ewah_iterator_init(struct ewah_iterator *it, struct ewah_bitmap *parent)
read_new_rlw(it);
}
+void ewah_or_iterator_init(struct ewah_or_iterator *it,
+ struct ewah_bitmap **parents, size_t nr)
+{
+ size_t i;
+
+ memset(it, 0, sizeof(*it));
+
+ ALLOC_ARRAY(it->its, nr);
+ for (i = 0; i < nr; i++)
+ ewah_iterator_init(&it->its[it->nr++], parents[i]);
+}
+
+int ewah_or_iterator_next(eword_t *next, struct ewah_or_iterator *it)
+{
+ eword_t buf, out = 0;
+ size_t i;
+ int ret = 0;
+
+ for (i = 0; i < it->nr; i++)
+ if (ewah_iterator_next(&buf, &it->its[i])) {
+ out |= buf;
+ ret = 1;
+ }
+
+ *next = out;
+ return ret;
+}
+
+void ewah_or_iterator_free(struct ewah_or_iterator *it)
+{
+ free(it->its);
+}
+
void ewah_xor(
struct ewah_bitmap *ewah_i,
struct ewah_bitmap *ewah_j,
diff --git a/ewah/ewok.h b/ewah/ewok.h
index 5e357e2493..4b70641045 100644
--- a/ewah/ewok.h
+++ b/ewah/ewok.h
@@ -148,6 +148,18 @@ void ewah_iterator_init(struct ewah_iterator *it, struct ewah_bitmap *parent);
*/
int ewah_iterator_next(eword_t *next, struct ewah_iterator *it);
+struct ewah_or_iterator {
+ struct ewah_iterator *its;
+ size_t nr;
+};
+
+void ewah_or_iterator_init(struct ewah_or_iterator *it,
+ struct ewah_bitmap **parents, size_t nr);
+
+int ewah_or_iterator_next(eword_t *next, struct ewah_or_iterator *it);
+
+void ewah_or_iterator_free(struct ewah_or_iterator *it);
+
void ewah_xor(
struct ewah_bitmap *ewah_i,
struct ewah_bitmap *ewah_j,
--
2.49.0.13.gd0d564685b
* [PATCH v4 11/13] pack-bitmap.c: keep track of each layer's type bitmaps
2025-03-14 20:18 ` [PATCH v4 " Taylor Blau
` (9 preceding siblings ...)
2025-03-14 20:18 ` [PATCH v4 10/13] ewah: implement `struct ewah_or_iterator` Taylor Blau
@ 2025-03-14 20:18 ` Taylor Blau
2025-03-18 2:01 ` Jeff King
2025-03-18 6:43 ` Elijah Newren
2025-03-14 20:18 ` [PATCH v4 12/13] pack-bitmap.c: use `ewah_or_iterator` for type bitmap iterators Taylor Blau
` (2 subsequent siblings)
13 siblings, 2 replies; 136+ messages in thread
From: Taylor Blau @ 2025-03-14 20:18 UTC (permalink / raw)
To: git; +Cc: Elijah Newren, Jeff King, Junio C Hamano, Patrick Steinhardt
Prepare for reading the type-level bitmaps from previous bitmap layers
by maintaining an array for each type, where each element in that type's
array corresponds to one layer's bitmap for that type.
These fields will be used in a later commit to instantiate the 'struct
ewah_or_iterator' for each type.
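As a rough sketch of the array layout described above: the top-most layer fills the last slot and each base fills the slot before it, so index 0 ends up holding the oldest layer. `struct type_layer` and `collect_commits()` are hypothetical, and an `int` stands in for each layer's commit bitmap.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical bitmap layer; 'commits' stands in for a type bitmap. */
struct type_layer {
	struct type_layer *base; /* next-older layer; NULL for the oldest */
	size_t base_nr;          /* number of layers beneath this one */
	int commits;             /* placeholder for this layer's bitmap */
};

/*
 * Build an array of base_nr + 1 entries ordered oldest-first, filling
 * it from the end while walking from the newest layer toward the base.
 * The caller owns (and must free) the returned array.
 */
static int *collect_commits(const struct type_layer *top)
{
	size_t i = top->base_nr + 1;
	int *all = calloc(i, sizeof(*all));
	const struct type_layer *curr;

	assert(all);
	for (curr = top; curr; curr = curr->base) {
		assert(i > 0); /* chain is longer than base_nr promised */
		all[--i] = curr->commits;
	}
	assert(i == 0); /* chain is shorter than base_nr promised */
	return all;
}
```

A single-layer chain degenerates to a one-element array, matching the non-incremental case.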
Signed-off-by: Taylor Blau <me@ttaylorr.com>
---
pack-bitmap.c | 58 +++++++++++++++++++++++++++++++++++++++++++++++----
1 file changed, 54 insertions(+), 4 deletions(-)
diff --git a/pack-bitmap.c b/pack-bitmap.c
index 00acf5ec73..3517972892 100644
--- a/pack-bitmap.c
+++ b/pack-bitmap.c
@@ -81,6 +81,24 @@ struct bitmap_index {
struct ewah_bitmap *blobs;
struct ewah_bitmap *tags;
+ /*
+ * Type index arrays when this bitmap is associated with an
+ * incremental multi-pack index chain.
+ *
+ * If n is the number of unique layers in the MIDX chain, then
+ * commits_all[n-1] is this struct's 'commits' field,
+ * commits_all[n-2] is the commits field of this bitmap's
+ * 'base', and so on.
+ *
+ * When associated with either a non-incremental MIDX or
+ * a single packfile, these arrays each contain a single
+ * element.
+ */
+ struct ewah_bitmap **commits_all;
+ struct ewah_bitmap **trees_all;
+ struct ewah_bitmap **blobs_all;
+ struct ewah_bitmap **tags_all;
+
/* Map from object ID -> `stored_bitmap` for all the bitmapped commits */
kh_oid_map_t *bitmaps;
@@ -581,7 +599,32 @@ static int load_reverse_index(struct repository *r, struct bitmap_index *bitmap_
return load_pack_revindex(r, bitmap_git->pack);
}
-static int load_bitmap(struct repository *r, struct bitmap_index *bitmap_git)
+static void load_all_type_bitmaps(struct bitmap_index *bitmap_git)
+{
+ struct bitmap_index *curr = bitmap_git;
+ size_t i = bitmap_git->base_nr;
+
+ ALLOC_ARRAY(bitmap_git->commits_all, bitmap_git->base_nr + 1);
+ ALLOC_ARRAY(bitmap_git->trees_all, bitmap_git->base_nr + 1);
+ ALLOC_ARRAY(bitmap_git->blobs_all, bitmap_git->base_nr + 1);
+ ALLOC_ARRAY(bitmap_git->tags_all, bitmap_git->base_nr + 1);
+
+ while (curr) {
+ bitmap_git->commits_all[i] = curr->commits;
+ bitmap_git->trees_all[i] = curr->trees;
+ bitmap_git->blobs_all[i] = curr->blobs;
+ bitmap_git->tags_all[i] = curr->tags;
+
+ curr = curr->base;
+ if (curr && !i)
+ BUG("unexpected number of bitmap layers, expected %"PRIu32,
+ bitmap_git->base_nr + 1);
+ i -= 1;
+ }
+}
+
+static int load_bitmap(struct repository *r, struct bitmap_index *bitmap_git,
+ int recursing)
{
assert(bitmap_git->map);
@@ -603,10 +646,13 @@ static int load_bitmap(struct repository *r, struct bitmap_index *bitmap_git)
if (bitmap_git->base) {
if (!bitmap_is_midx(bitmap_git))
BUG("non-MIDX bitmap has non-NULL base bitmap index");
- if (load_bitmap(r, bitmap_git->base) < 0)
+ if (load_bitmap(r, bitmap_git->base, 1) < 0)
goto failed;
}
+ if (!recursing)
+ load_all_type_bitmaps(bitmap_git);
+
return 0;
failed:
@@ -682,7 +728,7 @@ struct bitmap_index *prepare_bitmap_git(struct repository *r)
{
struct bitmap_index *bitmap_git = xcalloc(1, sizeof(*bitmap_git));
- if (!open_bitmap(r, bitmap_git) && !load_bitmap(r, bitmap_git))
+ if (!open_bitmap(r, bitmap_git) && !load_bitmap(r, bitmap_git, 0))
return bitmap_git;
free_bitmap_index(bitmap_git);
@@ -2050,7 +2096,7 @@ struct bitmap_index *prepare_bitmap_walk(struct rev_info *revs,
* from disk. this is the point of no return; after this the rev_list
* becomes invalidated and we must perform the revwalk through bitmaps
*/
- if (load_bitmap(revs->repo, bitmap_git) < 0)
+ if (load_bitmap(revs->repo, bitmap_git, 0) < 0)
goto cleanup;
if (!use_boundary_traversal)
@@ -2983,6 +3029,10 @@ void free_bitmap_index(struct bitmap_index *b)
ewah_pool_free(b->trees);
ewah_pool_free(b->blobs);
ewah_pool_free(b->tags);
+ free(b->commits_all);
+ free(b->trees_all);
+ free(b->blobs_all);
+ free(b->tags_all);
if (b->bitmaps) {
struct stored_bitmap *sb;
kh_foreach_value(b->bitmaps, sb, {
--
2.49.0.13.gd0d564685b
* [PATCH v4 12/13] pack-bitmap.c: use `ewah_or_iterator` for type bitmap iterators
2025-03-14 20:18 ` [PATCH v4 " Taylor Blau
` (10 preceding siblings ...)
2025-03-14 20:18 ` [PATCH v4 11/13] pack-bitmap.c: keep track of each layer's type bitmaps Taylor Blau
@ 2025-03-14 20:18 ` Taylor Blau
2025-03-18 2:05 ` Jeff King
2025-03-14 20:19 ` [PATCH v4 13/13] midx: implement writing incremental MIDX bitmaps Taylor Blau
2025-03-18 2:21 ` [PATCH v4 00/13] midx: incremental multi-pack indexes, part two Jeff King
13 siblings, 1 reply; 136+ messages in thread
From: Taylor Blau @ 2025-03-14 20:18 UTC (permalink / raw)
To: git; +Cc: Elijah Newren, Jeff King, Junio C Hamano, Patrick Steinhardt
Now that we have initialized arrays for each bitmap layer's type bitmaps
in the previous commit, adjust existing callers to use them in
preparation for multi-layered bitmaps.
Signed-off-by: Taylor Blau <me@ttaylorr.com>
---
pack-bitmap.c | 42 +++++++++++++++++++++++++++---------------
1 file changed, 27 insertions(+), 15 deletions(-)
diff --git a/pack-bitmap.c b/pack-bitmap.c
index 3517972892..5e6d4ace58 100644
--- a/pack-bitmap.c
+++ b/pack-bitmap.c
@@ -1629,25 +1629,29 @@ static void show_extended_objects(struct bitmap_index *bitmap_git,
}
}
-static void init_type_iterator(struct ewah_iterator *it,
+static void init_type_iterator(struct ewah_or_iterator *it,
struct bitmap_index *bitmap_git,
enum object_type type)
{
switch (type) {
case OBJ_COMMIT:
- ewah_iterator_init(it, bitmap_git->commits);
+ ewah_or_iterator_init(it, bitmap_git->commits_all,
+ bitmap_git->base_nr + 1);
break;
case OBJ_TREE:
- ewah_iterator_init(it, bitmap_git->trees);
+ ewah_or_iterator_init(it, bitmap_git->trees_all,
+ bitmap_git->base_nr + 1);
break;
case OBJ_BLOB:
- ewah_iterator_init(it, bitmap_git->blobs);
+ ewah_or_iterator_init(it, bitmap_git->blobs_all,
+ bitmap_git->base_nr + 1);
break;
case OBJ_TAG:
- ewah_iterator_init(it, bitmap_git->tags);
+ ewah_or_iterator_init(it, bitmap_git->tags_all,
+ bitmap_git->base_nr + 1);
break;
default:
@@ -1664,7 +1668,7 @@ static void show_objects_for_type(
size_t i = 0;
uint32_t offset;
- struct ewah_iterator it;
+ struct ewah_or_iterator it;
eword_t filter;
struct bitmap *objects = bitmap_git->result;
@@ -1672,7 +1676,7 @@ static void show_objects_for_type(
init_type_iterator(&it, bitmap_git, object_type);
for (i = 0; i < objects->word_alloc &&
- ewah_iterator_next(&filter, &it); i++) {
+ ewah_or_iterator_next(&filter, &it); i++) {
eword_t word = objects->words[i] & filter;
size_t pos = (i * BITS_IN_EWORD);
@@ -1714,6 +1718,8 @@ static void show_objects_for_type(
show_reach(&oid, object_type, 0, hash, pack, ofs);
}
}
+
+ ewah_or_iterator_free(&it);
}
static int in_bitmapped_pack(struct bitmap_index *bitmap_git,
@@ -1765,7 +1771,7 @@ static void filter_bitmap_exclude_type(struct bitmap_index *bitmap_git,
{
struct eindex *eindex = &bitmap_git->ext_index;
struct bitmap *tips;
- struct ewah_iterator it;
+ struct ewah_or_iterator it;
eword_t mask;
uint32_t i;
@@ -1782,7 +1788,7 @@ static void filter_bitmap_exclude_type(struct bitmap_index *bitmap_git,
* packfile.
*/
for (i = 0, init_type_iterator(&it, bitmap_git, type);
- i < to_filter->word_alloc && ewah_iterator_next(&mask, &it);
+ i < to_filter->word_alloc && ewah_or_iterator_next(&mask, &it);
i++) {
if (i < tips->word_alloc)
mask &= ~tips->words[i];
@@ -1802,6 +1808,7 @@ static void filter_bitmap_exclude_type(struct bitmap_index *bitmap_git,
bitmap_unset(to_filter, pos);
}
+ ewah_or_iterator_free(&it);
bitmap_free(tips);
}
@@ -1861,14 +1868,14 @@ static void filter_bitmap_blob_limit(struct bitmap_index *bitmap_git,
{
struct eindex *eindex = &bitmap_git->ext_index;
struct bitmap *tips;
- struct ewah_iterator it;
+ struct ewah_or_iterator it;
eword_t mask;
uint32_t i;
tips = find_tip_objects(bitmap_git, tip_objects, OBJ_BLOB);
for (i = 0, init_type_iterator(&it, bitmap_git, OBJ_BLOB);
- i < to_filter->word_alloc && ewah_iterator_next(&mask, &it);
+ i < to_filter->word_alloc && ewah_or_iterator_next(&mask, &it);
i++) {
eword_t word = to_filter->words[i] & mask;
unsigned offset;
@@ -1896,6 +1903,7 @@ static void filter_bitmap_blob_limit(struct bitmap_index *bitmap_git,
bitmap_unset(to_filter, pos);
}
+ ewah_or_iterator_free(&it);
bitmap_free(tips);
}
@@ -2527,12 +2535,12 @@ static uint32_t count_object_type(struct bitmap_index *bitmap_git,
struct eindex *eindex = &bitmap_git->ext_index;
uint32_t i = 0, count = 0;
- struct ewah_iterator it;
+ struct ewah_or_iterator it;
eword_t filter;
init_type_iterator(&it, bitmap_git, type);
- while (i < objects->word_alloc && ewah_iterator_next(&filter, &it)) {
+ while (i < objects->word_alloc && ewah_or_iterator_next(&filter, &it)) {
eword_t word = objects->words[i++] & filter;
count += ewah_bit_popcount64(word);
}
@@ -2544,6 +2552,8 @@ static uint32_t count_object_type(struct bitmap_index *bitmap_git,
count++;
}
+ ewah_or_iterator_free(&it);
+
return count;
}
@@ -3076,13 +3086,13 @@ static off_t get_disk_usage_for_type(struct bitmap_index *bitmap_git,
{
struct bitmap *result = bitmap_git->result;
off_t total = 0;
- struct ewah_iterator it;
+ struct ewah_or_iterator it;
eword_t filter;
size_t i;
init_type_iterator(&it, bitmap_git, object_type);
for (i = 0; i < result->word_alloc &&
- ewah_iterator_next(&filter, &it); i++) {
+ ewah_or_iterator_next(&filter, &it); i++) {
eword_t word = result->words[i] & filter;
size_t base = (i * BITS_IN_EWORD);
unsigned offset;
@@ -3123,6 +3133,8 @@ static off_t get_disk_usage_for_type(struct bitmap_index *bitmap_git,
}
}
+ ewah_or_iterator_free(&it);
+
return total;
}
--
2.49.0.13.gd0d564685b
* [PATCH v4 13/13] midx: implement writing incremental MIDX bitmaps
2025-03-14 20:18 ` [PATCH v4 " Taylor Blau
` (11 preceding siblings ...)
2025-03-14 20:18 ` [PATCH v4 12/13] pack-bitmap.c: use `ewah_or_iterator` for type bitmap iterators Taylor Blau
@ 2025-03-14 20:19 ` Taylor Blau
2025-03-18 2:16 ` Jeff King
2025-03-18 17:13 ` Elijah Newren
2025-03-18 2:21 ` [PATCH v4 00/13] midx: incremental multi-pack indexes, part two Jeff King
13 siblings, 2 replies; 136+ messages in thread
From: Taylor Blau @ 2025-03-14 20:19 UTC (permalink / raw)
To: git; +Cc: Elijah Newren, Jeff King, Junio C Hamano, Patrick Steinhardt
Now that the pack-bitmap machinery has learned how to read and interact
with an incremental MIDX bitmap, teach the pack-bitmap-write.c machinery
(and relevant callers from within the MIDX machinery) to write such
bitmaps.
The details for doing so are mostly straightforward. The main changes
are as follows:
- find_object_pos() now makes use of an extra MIDX parameter which is
used to locate the bit positions of objects which are from previous
layers (and thus do not exist in the current layer's pack_order
field).
(Note also that the pack_order field is moved into struct
write_midx_context to further simplify the callers for
write_midx_bitmap()).
- bitmap_writer_build_type_index() first determines how many objects
precede the current bitmap layer and offsets the bits it sets in
each respective type-level bitmap by that amount so they can be OR'd
together.
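The offsetting of type-level bits in the second item can be sketched as below. This assumes uncompressed 64-bit words and integer type tags; `set_type_bits()` and its parameters are hypothetical and do not reflect the signature of bitmap_writer_build_type_index().

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define BITS_IN_WORD 64

/*
 * For each object i in this layer whose type matches 'want', set bit
 * (objects_in_base + i). Shifting by the number of objects in earlier
 * layers keeps each layer's bits in a disjoint range, so the per-layer
 * bitmaps can later be OR'd together without collisions.
 */
static void set_type_bits(uint64_t *words, size_t nr_words,
			  const int *types, size_t nr,
			  int want, size_t objects_in_base)
{
	size_t i;

	for (i = 0; i < nr; i++) {
		size_t pos = objects_in_base + i;

		if (types[i] != want)
			continue;
		assert(pos / BITS_IN_WORD < nr_words);
		words[pos / BITS_IN_WORD] |= (uint64_t)1 << (pos % BITS_IN_WORD);
	}
}
```

For example, if the base layer holds three objects, the first object of the next layer lands on bit 3, and OR-ing the two layers' words gives a bitmap covering all six positions.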
Signed-off-by: Taylor Blau <me@ttaylorr.com>
---
builtin/pack-objects.c | 3 +-
ewah/ewah_bitmap.c | 2 +-
ewah/ewok.h | 2 +-
midx-write.c | 57 +++++++++++------
pack-bitmap-write.c | 65 ++++++++++++++-----
pack-bitmap.c | 10 +--
pack-bitmap.h | 4 +-
t/t5334-incremental-multi-pack-index.sh | 84 +++++++++++++++++++++++++
8 files changed, 183 insertions(+), 44 deletions(-)
diff --git a/builtin/pack-objects.c b/builtin/pack-objects.c
index 58a9b16126..a7e4bb7904 100644
--- a/builtin/pack-objects.c
+++ b/builtin/pack-objects.c
@@ -1397,7 +1397,8 @@ static void write_pack_file(void)
if (write_bitmap_index) {
bitmap_writer_init(&bitmap_writer,
- the_repository, &to_pack);
+ the_repository, &to_pack,
+ NULL);
bitmap_writer_set_checksum(&bitmap_writer, hash);
bitmap_writer_build_type_index(&bitmap_writer,
written_list);
diff --git a/ewah/ewah_bitmap.c b/ewah/ewah_bitmap.c
index e92341b8fa..056c410efb 100644
--- a/ewah/ewah_bitmap.c
+++ b/ewah/ewah_bitmap.c
@@ -399,7 +399,7 @@ int ewah_or_iterator_next(eword_t *next, struct ewah_or_iterator *it)
return ret;
}
-void ewah_or_iterator_free(struct ewah_or_iterator *it)
+void ewah_or_iterator_release(struct ewah_or_iterator *it)
{
free(it->its);
}
diff --git a/ewah/ewok.h b/ewah/ewok.h
index 4b70641045..c29d354236 100644
--- a/ewah/ewok.h
+++ b/ewah/ewok.h
@@ -158,7 +158,7 @@ void ewah_or_iterator_init(struct ewah_or_iterator *it,
int ewah_or_iterator_next(eword_t *next, struct ewah_or_iterator *it);
-void ewah_or_iterator_free(struct ewah_or_iterator *it);
+void ewah_or_iterator_release(struct ewah_or_iterator *it);
void ewah_xor(
struct ewah_bitmap *ewah_i,
diff --git a/midx-write.c b/midx-write.c
index 48d6558253..0897cbd829 100644
--- a/midx-write.c
+++ b/midx-write.c
@@ -647,16 +647,22 @@ static uint32_t *midx_pack_order(struct write_midx_context *ctx)
return pack_order;
}
-static void write_midx_reverse_index(char *midx_name, unsigned char *midx_hash,
- struct write_midx_context *ctx)
+static void write_midx_reverse_index(struct write_midx_context *ctx,
+ const char *object_dir,
+ unsigned char *midx_hash)
{
struct strbuf buf = STRBUF_INIT;
char *tmp_file;
trace2_region_enter("midx", "write_midx_reverse_index", ctx->repo);
- strbuf_addf(&buf, "%s-%s.rev", midx_name, hash_to_hex_algop(midx_hash,
- ctx->repo->hash_algo));
+ if (ctx->incremental)
+ get_split_midx_filename_ext(ctx->repo->hash_algo, &buf,
+ object_dir, midx_hash,
+ MIDX_EXT_REV);
+ else
+ get_midx_filename_ext(ctx->repo->hash_algo, &buf, object_dir,
+ midx_hash, MIDX_EXT_REV);
tmp_file = write_rev_file_order(ctx->repo->hash_algo, NULL, ctx->pack_order,
ctx->entries_nr, midx_hash, WRITE_REV);
@@ -829,22 +835,29 @@ static struct commit **find_commits_for_midx_bitmap(uint32_t *indexed_commits_nr
return cb.commits;
}
-static int write_midx_bitmap(struct repository *r, const char *midx_name,
+static int write_midx_bitmap(struct write_midx_context *ctx,
+ const char *object_dir,
const unsigned char *midx_hash,
struct packing_data *pdata,
struct commit **commits,
uint32_t commits_nr,
- uint32_t *pack_order,
unsigned flags)
{
int ret, i;
uint16_t options = 0;
struct bitmap_writer writer;
struct pack_idx_entry **index;
- char *bitmap_name = xstrfmt("%s-%s.bitmap", midx_name,
- hash_to_hex_algop(midx_hash, r->hash_algo));
+ struct strbuf bitmap_name = STRBUF_INIT;
- trace2_region_enter("midx", "write_midx_bitmap", r);
+ trace2_region_enter("midx", "write_midx_bitmap", ctx->repo);
+
+ if (ctx->incremental)
+ get_split_midx_filename_ext(ctx->repo->hash_algo, &bitmap_name,
+ object_dir, midx_hash,
+ MIDX_EXT_BITMAP);
+ else
+ get_midx_filename_ext(ctx->repo->hash_algo, &bitmap_name,
+ object_dir, midx_hash, MIDX_EXT_BITMAP);
if (flags & MIDX_WRITE_BITMAP_HASH_CACHE)
options |= BITMAP_OPT_HASH_CACHE;
@@ -861,7 +874,8 @@ static int write_midx_bitmap(struct repository *r, const char *midx_name,
for (i = 0; i < pdata->nr_objects; i++)
index[i] = &pdata->objects[i].idx;
- bitmap_writer_init(&writer, r, pdata);
+ bitmap_writer_init(&writer, ctx->repo, pdata,
+ ctx->incremental ? ctx->base_midx : NULL);
bitmap_writer_show_progress(&writer, flags & MIDX_PROGRESS);
bitmap_writer_build_type_index(&writer, index);
@@ -879,7 +893,7 @@ static int write_midx_bitmap(struct repository *r, const char *midx_name,
* bitmap_writer_finish().
*/
for (i = 0; i < pdata->nr_objects; i++)
- index[pack_order[i]] = &pdata->objects[i].idx;
+ index[ctx->pack_order[i]] = &pdata->objects[i].idx;
bitmap_writer_select_commits(&writer, commits, commits_nr);
ret = bitmap_writer_build(&writer);
@@ -887,14 +901,14 @@ static int write_midx_bitmap(struct repository *r, const char *midx_name,
goto cleanup;
bitmap_writer_set_checksum(&writer, midx_hash);
- bitmap_writer_finish(&writer, index, bitmap_name, options);
+ bitmap_writer_finish(&writer, index, bitmap_name.buf, options);
cleanup:
free(index);
- free(bitmap_name);
+ strbuf_release(&bitmap_name);
bitmap_writer_free(&writer);
- trace2_region_leave("midx", "write_midx_bitmap", r);
+ trace2_region_leave("midx", "write_midx_bitmap", ctx->repo);
return ret;
}
@@ -1077,8 +1091,6 @@ static int write_midx_internal(struct repository *r, const char *object_dir,
ctx.repo = r;
ctx.incremental = !!(flags & MIDX_WRITE_INCREMENTAL);
- if (ctx.incremental && (flags & MIDX_WRITE_BITMAP))
- die(_("cannot write incremental MIDX with bitmap"));
if (ctx.incremental)
strbuf_addf(&midx_name,
@@ -1119,6 +1131,13 @@ static int write_midx_internal(struct repository *r, const char *object_dir,
if (ctx.incremental) {
struct multi_pack_index *m = ctx.base_midx;
while (m) {
+ if (flags & MIDX_WRITE_BITMAP && load_midx_revindex(m)) {
+ error(_("could not load reverse index for MIDX %s"),
+ hash_to_hex_algop(get_midx_checksum(m),
+ m->repo->hash_algo));
+ result = 1;
+ goto cleanup;
+ }
ctx.num_multi_pack_indexes_before++;
m = m->base_midx;
}
@@ -1387,7 +1406,7 @@ static int write_midx_internal(struct repository *r, const char *object_dir,
if (flags & MIDX_WRITE_REV_INDEX &&
git_env_bool("GIT_TEST_MIDX_WRITE_REV", 0))
- write_midx_reverse_index(midx_name.buf, midx_hash, &ctx);
+ write_midx_reverse_index(&ctx, object_dir, midx_hash);
if (flags & MIDX_WRITE_BITMAP) {
struct packing_data pdata;
@@ -1410,8 +1429,8 @@ static int write_midx_internal(struct repository *r, const char *object_dir,
FREE_AND_NULL(ctx.entries);
ctx.entries_nr = 0;
- if (write_midx_bitmap(r, midx_name.buf, midx_hash, &pdata,
- commits, commits_nr, ctx.pack_order,
+ if (write_midx_bitmap(&ctx, object_dir,
+ midx_hash, &pdata, commits, commits_nr,
flags) < 0) {
error(_("could not write multi-pack bitmap"));
result = 1;
diff --git a/pack-bitmap-write.c b/pack-bitmap-write.c
index 34e86d4994..8a30853d2e 100644
--- a/pack-bitmap-write.c
+++ b/pack-bitmap-write.c
@@ -26,6 +26,8 @@
#include "alloc.h"
#include "refs.h"
#include "strmap.h"
+#include "midx.h"
+#include "pack-revindex.h"
struct bitmapped_commit {
struct commit *commit;
@@ -43,7 +45,8 @@ static inline int bitmap_writer_nr_selected_commits(struct bitmap_writer *writer
}
void bitmap_writer_init(struct bitmap_writer *writer, struct repository *r,
- struct packing_data *pdata)
+ struct packing_data *pdata,
+ struct multi_pack_index *midx)
{
memset(writer, 0, sizeof(struct bitmap_writer));
if (writer->bitmaps)
@@ -51,6 +54,7 @@ void bitmap_writer_init(struct bitmap_writer *writer, struct repository *r,
writer->bitmaps = kh_init_oid_map();
writer->pseudo_merge_commits = kh_init_oid_map();
writer->to_pack = pdata;
+ writer->midx = midx;
string_list_init_dup(&writer->pseudo_merge_groups);
@@ -113,6 +117,11 @@ void bitmap_writer_build_type_index(struct bitmap_writer *writer,
struct pack_idx_entry **index)
{
uint32_t i;
+ uint32_t base_objects = 0;
+
+ if (writer->midx)
+ base_objects = writer->midx->num_objects +
+ writer->midx->num_objects_in_base;
writer->commits = ewah_new();
writer->trees = ewah_new();
@@ -142,19 +151,19 @@ void bitmap_writer_build_type_index(struct bitmap_writer *writer,
switch (real_type) {
case OBJ_COMMIT:
- ewah_set(writer->commits, i);
+ ewah_set(writer->commits, i + base_objects);
break;
case OBJ_TREE:
- ewah_set(writer->trees, i);
+ ewah_set(writer->trees, i + base_objects);
break;
case OBJ_BLOB:
- ewah_set(writer->blobs, i);
+ ewah_set(writer->blobs, i + base_objects);
break;
case OBJ_TAG:
- ewah_set(writer->tags, i);
+ ewah_set(writer->tags, i + base_objects);
break;
default:
@@ -207,19 +216,37 @@ void bitmap_writer_push_commit(struct bitmap_writer *writer,
static uint32_t find_object_pos(struct bitmap_writer *writer,
const struct object_id *oid, int *found)
{
- struct object_entry *entry = packlist_find(writer->to_pack, oid);
+ struct object_entry *entry;
+
+ entry = packlist_find(writer->to_pack, oid);
+ if (entry) {
+ uint32_t base_objects = 0;
+ if (writer->midx)
+ base_objects = writer->midx->num_objects +
+ writer->midx->num_objects_in_base;
+
+ if (found)
+ *found = 1;
+ return oe_in_pack_pos(writer->to_pack, entry) + base_objects;
+ } else if (writer->midx) {
+ uint32_t at, pos;
+
+ if (!bsearch_midx(oid, writer->midx, &at))
+ goto missing;
+ if (midx_to_pack_pos(writer->midx, at, &pos) < 0)
+ goto missing;
- if (!entry) {
if (found)
- *found = 0;
- warning("Failed to write bitmap index. Packfile doesn't have full closure "
- "(object %s is missing)", oid_to_hex(oid));
- return 0;
+ *found = 1;
+ return pos;
}
+missing:
if (found)
- *found = 1;
- return oe_in_pack_pos(writer->to_pack, entry);
+ *found = 0;
+ warning("Failed to write bitmap index. Packfile doesn't have full closure "
+ "(object %s is missing)", oid_to_hex(oid));
+ return 0;
}
static void compute_xor_offsets(struct bitmap_writer *writer)
@@ -586,7 +613,7 @@ int bitmap_writer_build(struct bitmap_writer *writer)
struct prio_queue queue = { compare_commits_by_gen_then_commit_date };
struct prio_queue tree_queue = { NULL };
struct bitmap_index *old_bitmap;
- uint32_t *mapping;
+ uint32_t *mapping = NULL;
int closed = 1; /* until proven otherwise */
if (writer->show_progress)
@@ -1021,7 +1048,7 @@ void bitmap_writer_finish(struct bitmap_writer *writer,
struct strbuf tmp_file = STRBUF_INIT;
struct hashfile *f;
off_t *offsets = NULL;
- uint32_t i;
+ uint32_t i, base_objects;
struct bitmap_disk_header header;
@@ -1047,6 +1074,12 @@ void bitmap_writer_finish(struct bitmap_writer *writer,
if (options & BITMAP_OPT_LOOKUP_TABLE)
CALLOC_ARRAY(offsets, writer->to_pack->nr_objects);
+ if (writer->midx)
+ base_objects = writer->midx->num_objects +
+ writer->midx->num_objects_in_base;
+ else
+ base_objects = 0;
+
for (i = 0; i < bitmap_writer_nr_selected_commits(writer); i++) {
struct bitmapped_commit *stored = &writer->selected[i];
int commit_pos = oid_pos(&stored->commit->object.oid, index,
@@ -1055,7 +1088,7 @@ void bitmap_writer_finish(struct bitmap_writer *writer,
if (commit_pos < 0)
BUG(_("trying to write commit not in index"));
- stored->commit_pos = commit_pos;
+ stored->commit_pos = commit_pos + base_objects;
}
write_selected_commits_v1(writer, f, offsets);
diff --git a/pack-bitmap.c b/pack-bitmap.c
index 5e6d4ace58..94d1e8474a 100644
--- a/pack-bitmap.c
+++ b/pack-bitmap.c
@@ -1719,7 +1719,7 @@ static void show_objects_for_type(
}
}
- ewah_or_iterator_free(&it);
+ ewah_or_iterator_release(&it);
}
static int in_bitmapped_pack(struct bitmap_index *bitmap_git,
@@ -1808,7 +1808,7 @@ static void filter_bitmap_exclude_type(struct bitmap_index *bitmap_git,
bitmap_unset(to_filter, pos);
}
- ewah_or_iterator_free(&it);
+ ewah_or_iterator_release(&it);
bitmap_free(tips);
}
@@ -1903,7 +1903,7 @@ static void filter_bitmap_blob_limit(struct bitmap_index *bitmap_git,
bitmap_unset(to_filter, pos);
}
- ewah_or_iterator_free(&it);
+ ewah_or_iterator_release(&it);
bitmap_free(tips);
}
@@ -2552,7 +2552,7 @@ static uint32_t count_object_type(struct bitmap_index *bitmap_git,
count++;
}
- ewah_or_iterator_free(&it);
+ ewah_or_iterator_release(&it);
return count;
}
@@ -3133,7 +3133,7 @@ static off_t get_disk_usage_for_type(struct bitmap_index *bitmap_git,
}
}
- ewah_or_iterator_free(&it);
+ ewah_or_iterator_release(&it);
return total;
}
diff --git a/pack-bitmap.h b/pack-bitmap.h
index d7f4b8b8e9..dd0951088f 100644
--- a/pack-bitmap.h
+++ b/pack-bitmap.h
@@ -111,6 +111,7 @@ struct bitmap_writer {
kh_oid_map_t *bitmaps;
struct packing_data *to_pack;
+ struct multi_pack_index *midx; /* if appending to a MIDX chain */
struct bitmapped_commit *selected;
unsigned int selected_nr, selected_alloc;
@@ -125,7 +126,8 @@ struct bitmap_writer {
};
void bitmap_writer_init(struct bitmap_writer *writer, struct repository *r,
- struct packing_data *pdata);
+ struct packing_data *pdata,
+ struct multi_pack_index *midx);
void bitmap_writer_show_progress(struct bitmap_writer *writer, int show);
void bitmap_writer_set_checksum(struct bitmap_writer *writer,
const unsigned char *sha1);
diff --git a/t/t5334-incremental-multi-pack-index.sh b/t/t5334-incremental-multi-pack-index.sh
index 26257e5660..46d1f0b864 100755
--- a/t/t5334-incremental-multi-pack-index.sh
+++ b/t/t5334-incremental-multi-pack-index.sh
@@ -44,4 +44,88 @@ test_expect_success 'convert incremental to non-incremental' '
compare_results_with_midx 'non-incremental MIDX conversion'
+write_midx_layer () {
+ n=1
+ if test -f $midx_chain
+ then
+ n="$(($(wc -l <$midx_chain) + 1))"
+ fi
+
+ for i in 1 2
+ do
+ test_commit $n.$i &&
+ git repack -d || return 1
+ done &&
+ git multi-pack-index write --bitmap --incremental
+}
+
+test_expect_success 'write initial MIDX layer' '
+ git repack -ad &&
+ write_midx_layer
+'
+
+test_expect_success 'read bitmap from first MIDX layer' '
+ git rev-list --test-bitmap 1.2
+'
+
+test_expect_success 'write another MIDX layer' '
+ write_midx_layer
+'
+
+test_expect_success 'midx verify with multiple layers' '
+ git multi-pack-index verify
+'
+
+test_expect_success 'read bitmap from second MIDX layer' '
+ git rev-list --test-bitmap 2.2
+'
+
+test_expect_success 'read earlier bitmap from second MIDX layer' '
+ git rev-list --test-bitmap 1.2
+'
+
+test_expect_success 'show object from first pack' '
+ git cat-file -p 1.1
+'
+
+test_expect_success 'show object from second pack' '
+ git cat-file -p 2.2
+'
+
+for reuse in false single multi
+do
+ test_expect_success "full clone (pack.allowPackReuse=$reuse)" '
+ rm -fr clone.git &&
+
+ git config pack.allowPackReuse $reuse &&
+ git clone --no-local --bare . clone.git
+ '
+done
+
+test_expect_success 'relink existing MIDX layer' '
+ rm -fr "$midxdir" &&
+
+ GIT_TEST_MIDX_WRITE_REV=1 git multi-pack-index write --bitmap &&
+
+ midx_hash="$(test-tool read-midx --checksum $objdir)" &&
+
+ test_path_is_file "$packdir/multi-pack-index" &&
+ test_path_is_file "$packdir/multi-pack-index-$midx_hash.bitmap" &&
+ test_path_is_file "$packdir/multi-pack-index-$midx_hash.rev" &&
+
+ test_commit another &&
+ git repack -d &&
+ git multi-pack-index write --bitmap --incremental &&
+
+ test_path_is_missing "$packdir/multi-pack-index" &&
+ test_path_is_missing "$packdir/multi-pack-index-$midx_hash.bitmap" &&
+ test_path_is_missing "$packdir/multi-pack-index-$midx_hash.rev" &&
+
+ test_path_is_file "$midxdir/multi-pack-index-$midx_hash.midx" &&
+ test_path_is_file "$midxdir/multi-pack-index-$midx_hash.bitmap" &&
+ test_path_is_file "$midxdir/multi-pack-index-$midx_hash.rev" &&
+ test_line_count = 2 "$midx_chain"
+
+'
+
test_done
--
2.49.0.13.gd0d564685b
^ permalink raw reply related [flat|nested] 136+ messages in thread
* Re: [PATCH v4 01/13] Documentation: describe incremental MIDX bitmaps
2025-03-14 20:18 ` [PATCH v4 01/13] Documentation: describe incremental MIDX bitmaps Taylor Blau
@ 2025-03-18 1:16 ` Jeff King
2025-03-18 23:11 ` Taylor Blau
2025-03-18 2:42 ` Elijah Newren
1 sibling, 1 reply; 136+ messages in thread
From: Jeff King @ 2025-03-18 1:16 UTC (permalink / raw)
To: Taylor Blau; +Cc: git, Elijah Newren, Junio C Hamano, Patrick Steinhardt
On Fri, Mar 14, 2025 at 04:18:20PM -0400, Taylor Blau wrote:
> +In the incremental MIDX design, we extend this definition to include
> +objects from multiple layers of the MIDX chain. The pseudo-pack order
> +for incremental MIDXs is determined by concatenating the pseudo-pack
> +ordering for each layer of the MIDX chain in order. Formally two objects
> +`o1` and `o2` are compared as follows:
> +
> +1. If `o1` appears in an earlier layer of the MIDX chain than `o2`, then
> + `o1` is considered less than `o2`.
> +
> +2. Otherwise, if `o1` and `o2` appear in the same MIDX layer, and that
> + MIDX layer has no base, then if one of `pack(o1)` and `pack(o2)` is
> + preferred and the other is not, then the preferred one sorts first. If
> + there is a base layer (i.e. the MIDX layer is not the first layer in
> + the chain), then if `pack(o1)` appears earlier in that MIDX layer's
> + pack order, then `o1` is less than `o2`. Likewise if `pack(o2)`
> + appears earlier, then the opposite is true.
> +
> +3. Otherwise, `o1` and `o2` appear in the same pack, and thus in the
> + same MIDX layer. Sort `o1` and `o2` by their offset within their
> + containing packfile.
OK, I think this ordering makes sense. I had to read this description
over several times to make sure I wasn't missing something. The earlier
part that says "it's just concatenating the pack order of the layers" is
a much more intuitive way of looking at it (modulo that you might need
to remove duplicates found in earlier layers).
But I think an even more basic way of thinking about it is that it's the
same as the pseudo-pack order you would get if you had a single midx of
all of the packs in all of the layers (in their layer order). We already
have to deal with (and have documented) duplicates in that case.
Not really suggesting any wording change here, just making sure I
grokked it all.
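
To make the three-step comparison concrete, here is a minimal sketch of it
as a comparator. The `struct obj_loc` type and its fields are hypothetical
and are not Git's data structures. The sketch also assumes that each
layer's pack order has already been flattened into a rank in which the
preferred pack (if any) has rank 0, and that duplicates have already been
resolved to a single copy.

```c
#include <stdint.h>

/* Hypothetical, simplified location of an object; not Git's structs. */
struct obj_loc {
	uint32_t layer;     /* 0 is the oldest layer in the MIDX chain */
	uint32_t pack_rank; /* rank of the pack in that layer's pack order;
			     * assumed to be 0 for a preferred pack */
	uint64_t offset;    /* byte offset of the object within its pack */
};

/*
 * Returns a negative value if `a` sorts before `b` in the chain-wide
 * pseudo-pack order, a positive value if it sorts after, and 0 if both
 * describe the same location.
 */
static int pseudo_pack_cmp(const struct obj_loc *a, const struct obj_loc *b)
{
	/* Step 1: earlier layers sort first. */
	if (a->layer != b->layer)
		return a->layer < b->layer ? -1 : 1;
	/* Step 2: within a layer, earlier packs sort first. */
	if (a->pack_rank != b->pack_rank)
		return a->pack_rank < b->pack_rank ? -1 : 1;
	/* Step 3: within a pack, lower offsets sort first. */
	if (a->offset != b->offset)
		return a->offset < b->offset ? -1 : 1;
	return 0;
}
```

Sorting with this comparator gives the same result as concatenating each
layer's own pseudo-pack order, which is the more intuitive description
above.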
> +Note that the preferred pack is a property of the MIDX chain, not the
> +individual layers themselves. Fundamentally we could introduce a
> +per-layer preferred pack, but this is less relevant now that we can
> +perform multi-pack reuse across the set of packs in a MIDX.
Calling this out explicitly is good, since it's an obvious question
for somebody to have.
> +=== Reachability bitmaps and incremental MIDXs
> +
> +Each layer of an incremental MIDX chain may have its objects (and the
> +objects from any previous layer in the same MIDX chain) represented in
> +its own `*.bitmap` file.
> +
> +The structure of a `*.bitmap` file belonging to an incremental MIDX
> +chain is identical to that of a non-incremental MIDX bitmap, or a
> +classic single-pack bitmap. Since objects are added to the end of the
> +incremental MIDX's pseudo-pack order (see: above), it is possible to
> +extend a bitmap when appending to the end of a MIDX chain.
> +
> +(Note: it is possible likewise to compress a contiguous sequence of MIDX
> +incremental layers, and their `*.bitmap`(s) into a single layer and
> +`*.bitmap`, but this is not yet implemented.)
> +
> +The object positions used are global within the pseudo-pack order, so
> +subsequent layers will have, for example, `m->num_objects_in_base`
> +number of `0` bits in each of their four type bitmaps. This follows from
> +the fact that we only write type bitmap entries for objects present in
> +the layer immediately corresponding to the bitmap.
OK, so each layer's bitmap does depend on the layers above/before it.
That obviously needs to happen because each incremental midx is not
likely to be a complete reachability set anyway.
But I also wondered what would happen with a situation like this:
A -- B
\
-- C
stored like this:
base midx:
- pack 1:
- object A
- object B, which can reach A
incremental midx:
- pack 2:
- object A
- object C, which can reach A
That is, two objects B and C both depend on A, which is duplicated in
two midx layers. Even if the incremental midx is complete in the sense
that C only depends on A, its bitmap cannot just be "11". Because the
bit position for object A in the incremental midx does not exist in the
pseudo-pack order at all! It must refer to the copy of "A" in the base
midx, so its correct bitmap is "101" (A and C, but not B).
Again, just talking through it here.
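
The arithmetic behind that "101" can be sketched as below. This is only an
illustration under stated assumptions: the helper names are made up, a
plain `uint64_t` stands in for an EWAH bitmap, and the duplicate copy of A
in pack 2 is assumed to have been dropped from the incremental layer, so C
is that layer's only object.

```c
#include <stdint.h>

/*
 * Global pseudo-pack position of the i-th object of a layer, given the
 * number of objects in all earlier layers (the "i +
 * m->num_objects_in_base" computation from the documentation).
 */
static uint32_t global_pos(uint32_t num_objects_in_base, uint32_t i)
{
	return num_objects_in_base + i;
}

/* Set bit `pos` in a toy 64-bit bitmap; `pos` must be below 64. */
static uint64_t set_bit(uint64_t bitmap, uint32_t pos)
{
	return bitmap | ((uint64_t)1 << pos);
}

/* Reachability bitmap for C in the A/B/C example. */
static uint64_t example_bitmap_for_c(void)
{
	/* Base layer holds A (position 0) and B (position 1). */
	uint32_t pos_a = global_pos(0, 0);
	/* C is object 0 of the incremental layer, after 2 base objects. */
	uint32_t pos_c = global_pos(2, 0);
	uint64_t bits = 0;

	bits = set_bit(bits, pos_a);
	bits = set_bit(bits, pos_c);
	return bits; /* bits 0 and 2 set, bit 1 (B) clear */
}
```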
> +Note also that only the bitmap pertaining to the most recent layer in an
> +incremental MIDX chain is used to store reachability information about
> +the interesting and uninteresting objects in a reachability query.
> +Earlier bitmap layers are only used to look up commit and pseudo-merge
> +bitmaps from that layer, as well as the type-level bitmaps for objects
> +in that layer.
I'm not quite sure what this means, but I guess you're saying that
internally as we produce a bitmap, we'll always use the complete bitmap
over all of the layers?
> +To simplify the implementation, type-level bitmaps are iterated
> +simultaneously, and their results are OR'd together to avoid recursively
> +calling internal bitmap functions.
OK, I guess we'll see what this means in the patches. ;)
The general rules for the data structure make sense to me, though.
-Peff
^ permalink raw reply [flat|nested] 136+ messages in thread
* Re: [PATCH v4 02/13] pack-revindex: prepare for incremental MIDX bitmaps
2025-03-14 20:18 ` [PATCH v4 02/13] pack-revindex: prepare for " Taylor Blau
@ 2025-03-18 1:27 ` Jeff King
2025-03-19 0:02 ` Taylor Blau
2025-03-18 2:43 ` Elijah Newren
1 sibling, 1 reply; 136+ messages in thread
From: Jeff King @ 2025-03-18 1:27 UTC (permalink / raw)
To: Taylor Blau; +Cc: git, Elijah Newren, Junio C Hamano, Patrick Steinhardt
On Fri, Mar 14, 2025 at 04:18:24PM -0400, Taylor Blau wrote:
> diff --git a/pack-bitmap.c b/pack-bitmap.c
> index 6406953d32..c26d85b5db 100644
> --- a/pack-bitmap.c
> +++ b/pack-bitmap.c
> @@ -170,6 +170,15 @@ static struct ewah_bitmap *read_bitmap_1(struct bitmap_index *index)
> return read_bitmap(index->map, index->map_size, &index->map_pos);
> }
>
> +static uint32_t bitmap_non_extended_bits(struct bitmap_index *index)
> +{
> + if (index->midx) {
> + struct multi_pack_index *m = index->midx;
> + return m->num_objects + m->num_objects_in_base;
> + }
> + return index->pack->num_objects;
> +}
I understand why we need to account for the objects in the base to
offset our total size.
Similar to Patrick's comments on v3, I wondered about why we couldn't
just modify bitmap_num_objects() here, and why some callers would be
left with the other.
I guess sometimes we still need to consider a single layer. We can't
quite just access m->num_objects there, because we still need the midx
vs pack abstraction layer. I just thought there'd be more discussion
here, but it looks the same as v3.
I wonder if it is worth renaming bitmap_num_objects() to indicate that
it is a single layer (and make sure other callers are examined). I
dunno.
I also suspect from previous forays into bitmap indexing that it will be
easy to mix up positions in various units (local to the layer vs in the
global pseudo-pack ordering, for example). In theory we could use types
to help us with this, but they're kind of weak in C (unless we wrap all
of the ints in structs). Maybe not worth it.
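
For illustration, wrapping the ints in structs could look something like
the sketch below. All of the names here are hypothetical and nothing like
this exists in the series; the point is only that passing a layer-local
position where a chain-wide one is expected would then fail to compile.

```c
#include <stdint.h>

/* Hypothetical wrapper types; not part of Git. */
struct layer_pos { uint32_t v; }; /* position local to one MIDX layer */
struct chain_pos { uint32_t v; }; /* position in the global pseudo-pack order */

/* Convert a layer-local position into a chain-wide one. */
static struct chain_pos layer_to_chain(struct layer_pos p,
				       uint32_t num_objects_in_base)
{
	struct chain_pos c = { p.v + num_objects_in_base };
	return c;
}

/*
 * Convert a chain-wide position back into a layer-local one. Returns -1
 * if the position belongs to an earlier layer, 0 on success.
 */
static int chain_to_layer(struct chain_pos c, uint32_t num_objects_in_base,
			  struct layer_pos *out)
{
	if (c.v < num_objects_in_base)
		return -1;
	out->v = c.v - num_objects_in_base;
	return 0;
}
```

The cost is the extra `.v` at every use and a conversion helper at every
boundary, which is probably why it is not obviously worth it.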
-Peff
^ permalink raw reply [flat|nested] 136+ messages in thread
* Re: [PATCH v4 04/13] pack-bitmap.c: teach `bitmap_for_commit()` about incremental MIDXs
2025-03-14 20:18 ` [PATCH v4 04/13] pack-bitmap.c: teach `bitmap_for_commit()` about incremental MIDXs Taylor Blau
@ 2025-03-18 1:38 ` Jeff King
2025-03-19 0:13 ` Taylor Blau
0 siblings, 1 reply; 136+ messages in thread
From: Jeff King @ 2025-03-18 1:38 UTC (permalink / raw)
To: Taylor Blau; +Cc: git, Elijah Newren, Junio C Hamano, Patrick Steinhardt
On Fri, Mar 14, 2025 at 04:18:31PM -0400, Taylor Blau wrote:
> The pack-bitmap machinery uses `bitmap_for_commit()` to locate the
> EWAH-compressed bitmap corresponding to some given commit object.
>
> Teach this function about incremental MIDX bitmaps by teaching it to
> recur on earlier bitmap layers when it fails to find a given commit in
> the current layer.
>
> The changes to do so are as follows:
>
> - Avoid initializing hash_pos at its declaration, since
> bitmap_for_commit() is now a recursive function and may receive a
> NULL bitmap_index pointer as its first argument.
>
> - In cases where we would previously return NULL (to indicate that a
> lookup failed and the given bitmap_index does not contain an entry
> corresponding to the given commit), recursively call the function on
> the previous bitmap layer.
This makes sense, though it does make me wonder if we could/should store
a (midx/pack,pos) pair. I.e., a master hash table stored once for the
whole midx stack. And then you wouldn't need to recurse; it would just
be a single lookup.
Or would that work badly with the lazy nature? You'd need to load all of
the layers to fill it (rather than doing each incrementally). OTOH, if
you ask for the bitmap for commit X you're eventually going to have to
figure out what's in all of the layers as soon as you have a miss and
have to check them all. And I think the lookup table extension is what's
supposed to make that cheap-ish.
I dunno. It has been a long time since I dug into this (and the whole
khash-of-commits thing is just a hack around the lack of the lookup
table in the first place, I think?).
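
As a rough picture of the per-layer fallback being discussed, here is a
minimal sketch that walks from the top layer down through each base until
one of them knows the commit. The `struct layer` type is a stand-in
invented for this example (a flat array of integer commit ids instead of
the khash or lookup table), so it only shows the shape of the traversal,
not the actual `bitmap_for_commit()` code.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for one bitmap layer; not Git's bitmap_index. */
struct layer {
	const uint32_t *commit_ids; /* commits that have a bitmap in this layer */
	size_t nr;
	struct layer *base;         /* previous layer, or NULL for the first */
};

/*
 * Return the newest layer that stores a bitmap for `id`, or NULL if no
 * layer in the chain has one. Written as a loop; the recursive form is
 * equivalent.
 */
static const struct layer *find_layer_for_commit(const struct layer *top,
						 uint32_t id)
{
	const struct layer *l;

	for (l = top; l; l = l->base) {
		size_t i;
		for (i = 0; i < l->nr; i++)
			if (l->commit_ids[i] == id)
				return l;
	}
	return NULL;
}
```

A single table keyed by commit and storing a (layer, position) pair would
replace the outer loop with one lookup, at the price of loading every
layer up front.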
-Peff
^ permalink raw reply [flat|nested] 136+ messages in thread
* Re: [PATCH v4 08/13] pack-bitmap.c: compute disk-usage with incremental MIDXs
2025-03-14 20:18 ` [PATCH v4 08/13] pack-bitmap.c: compute disk-usage with " Taylor Blau
@ 2025-03-18 1:41 ` Jeff King
2025-03-19 0:30 ` Taylor Blau
0 siblings, 1 reply; 136+ messages in thread
From: Jeff King @ 2025-03-18 1:41 UTC (permalink / raw)
To: Taylor Blau; +Cc: git, Elijah Newren, Junio C Hamano, Patrick Steinhardt
On Fri, Mar 14, 2025 at 04:18:44PM -0400, Taylor Blau wrote:
> In a similar fashion as previous commits, use nth_midxed_pack() instead
> of accessing the MIDX's ->packs array directly to support incremental
> MIDXs.
Probably not worth it to change it in an actual patch, but is it worth
renaming midx->packs to something else to make sure we catch all of the
spots that need to be considered? Or maybe you already did that, which
is how you found all of these. :)
-Peff
^ permalink raw reply [flat|nested] 136+ messages in thread
* Re: [PATCH v4 10/13] ewah: implement `struct ewah_or_iterator`
2025-03-14 20:18 ` [PATCH v4 10/13] ewah: implement `struct ewah_or_iterator` Taylor Blau
@ 2025-03-18 1:44 ` Jeff King
2025-03-19 0:33 ` Taylor Blau
0 siblings, 1 reply; 136+ messages in thread
From: Jeff King @ 2025-03-18 1:44 UTC (permalink / raw)
To: Taylor Blau; +Cc: git, Elijah Newren, Junio C Hamano, Patrick Steinhardt
On Fri, Mar 14, 2025 at 04:18:50PM -0400, Taylor Blau wrote:
> +void ewah_or_iterator_free(struct ewah_or_iterator *it)
> +{
> + free(it->its);
> +}
Hmm, I thought this was going to become "_release()" based on the last
round?
-Peff
^ permalink raw reply [flat|nested] 136+ messages in thread
* Re: [PATCH v4 11/13] pack-bitmap.c: keep track of each layer's type bitmaps
2025-03-14 20:18 ` [PATCH v4 11/13] pack-bitmap.c: keep track of each layer's type bitmaps Taylor Blau
@ 2025-03-18 2:01 ` Jeff King
2025-03-19 0:38 ` Taylor Blau
2025-03-18 6:43 ` Elijah Newren
1 sibling, 1 reply; 136+ messages in thread
From: Jeff King @ 2025-03-18 2:01 UTC (permalink / raw)
To: Taylor Blau; +Cc: git, Elijah Newren, Junio C Hamano, Patrick Steinhardt
On Fri, Mar 14, 2025 at 04:18:53PM -0400, Taylor Blau wrote:
> @@ -81,6 +81,24 @@ struct bitmap_index {
> struct ewah_bitmap *blobs;
> struct ewah_bitmap *tags;
>
> + /*
> + * Type index arrays when this bitmap is associated with an
> + * incremental multi-pack index chain.
> + *
> + * If n is the number of unique layers in the MIDX chain, then
> + * commits_all[n-1] is this struct's 'commits' field,
> + * commits_all[n-2] is the commits field of this bitmap's
> + * 'base', and so on.
> + *
> + * When associated either with a non-incremental MIDX, or
> + * a single packfile, these arrays each contain a single
> + * element.
> + */
> + struct ewah_bitmap **commits_all;
> + struct ewah_bitmap **trees_all;
> + struct ewah_bitmap **blobs_all;
> + struct ewah_bitmap **tags_all;
OK, so these are valid only for the top-level of the chain? I guess
there would not be much point in having the lower levels know about
their incremental versions.
> -static int load_bitmap(struct repository *r, struct bitmap_index *bitmap_git)
> +static void load_all_type_bitmaps(struct bitmap_index *bitmap_git)
> +{
> + struct bitmap_index *curr = bitmap_git;
> + size_t i = bitmap_git->base_nr;
> +
> + ALLOC_ARRAY(bitmap_git->commits_all, bitmap_git->base_nr + 1);
> + ALLOC_ARRAY(bitmap_git->trees_all, bitmap_git->base_nr + 1);
> + ALLOC_ARRAY(bitmap_git->blobs_all, bitmap_git->base_nr + 1);
> + ALLOC_ARRAY(bitmap_git->tags_all, bitmap_git->base_nr + 1);
> +
> + while (curr) {
> + bitmap_git->commits_all[i] = curr->commits;
> + bitmap_git->trees_all[i] = curr->trees;
> + bitmap_git->blobs_all[i] = curr->blobs;
> + bitmap_git->tags_all[i] = curr->tags;
> +
> + curr = curr->base;
> + if (curr && !i)
> + BUG("unexpected number of bitmap layers, expected %"PRIu32,
> + bitmap_git->base_nr + 1);
> + i -= 1;
> + }
> +}
It looks like we always allocate these. For the non-incremental case, I
think you could just do:
bitmap_git->commits_all = &bitmap_git->commits;
and so forth. But I doubt that micro-optimization really matters, and it
introduces complications when you have to decide whether to free them or
not.
(And if you really cared about micro-optimizing, probably trying to
prevent the extra pointer-chase in the first place would be a more
productive path).
-Peff
^ permalink raw reply [flat|nested] 136+ messages in thread
* Re: [PATCH v4 12/13] pack-bitmap.c: use `ewah_or_iterator` for type bitmap iterators
2025-03-14 20:18 ` [PATCH v4 12/13] pack-bitmap.c: use `ewah_or_iterator` for type bitmap iterators Taylor Blau
@ 2025-03-18 2:05 ` Jeff King
2025-03-19 23:02 ` Taylor Blau
0 siblings, 1 reply; 136+ messages in thread
From: Jeff King @ 2025-03-18 2:05 UTC (permalink / raw)
To: Taylor Blau; +Cc: git, Elijah Newren, Junio C Hamano, Patrick Steinhardt
On Fri, Mar 14, 2025 at 04:18:56PM -0400, Taylor Blau wrote:
> -static void init_type_iterator(struct ewah_iterator *it,
> +static void init_type_iterator(struct ewah_or_iterator *it,
> struct bitmap_index *bitmap_git,
> enum object_type type)
> {
> switch (type) {
> case OBJ_COMMIT:
> - ewah_iterator_init(it, bitmap_git->commits);
> + ewah_or_iterator_init(it, bitmap_git->commits_all,
> + bitmap_git->base_nr + 1);
This certainly makes sense. It looks like we now use the or_iterator
unconditionally, even for non-layered queries. It's probably a little
slower in practice, just because it's an extra layer of indirection. But
I don't know if trying to micro-optimize here is worth it. In general
I'd say no, but sometimes there are surprising tight loops with bitmaps.
I dunno. I guess it would be easy enough to do a simple before/after
benchmark on a single packfile with this series. I wouldn't expect it to
find anything, but might not hurt to double check.
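
For reference, the extra layer of indirection amounts to something like
the sketch below. It is not the `ewah_or_iterator` from the series: the
types are invented for this example and it iterates over plain
uncompressed 64-bit words rather than EWAH-compressed data. It only shows
the idea of advancing several iterators in lockstep and OR-ing the words
they yield.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical iterator over one layer's uncompressed bitmap words. */
struct word_iter {
	const uint64_t *words;
	size_t nr;
	size_t pos;
};

/* Hypothetical OR-iterator over several layers at once. */
struct or_iter {
	struct word_iter *its;
	size_t nr;
};

/*
 * Store in `out` the OR of the next word from every iterator that still
 * has words left. Returns 1 if at least one iterator produced a word, 0
 * once all of them are exhausted (in which case `out` is untouched).
 */
static int or_iter_next(struct or_iter *it, uint64_t *out)
{
	uint64_t result = 0;
	int any = 0;
	size_t i;

	for (i = 0; i < it->nr; i++) {
		struct word_iter *w = &it->its[i];
		if (w->pos < w->nr) {
			result |= w->words[w->pos++];
			any = 1;
		}
	}
	if (any)
		*out = result;
	return any;
}
```

With a single layer the loop body runs once per word, so the overhead in
the non-incremental case is the loop bookkeeping and one more pointer
dereference.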
-Peff
^ permalink raw reply [flat|nested] 136+ messages in thread
* Re: [PATCH v4 13/13] midx: implement writing incremental MIDX bitmaps
2025-03-14 20:19 ` [PATCH v4 13/13] midx: implement writing incremental MIDX bitmaps Taylor Blau
@ 2025-03-18 2:16 ` Jeff King
2025-03-20 0:14 ` Taylor Blau
2025-03-18 17:13 ` Elijah Newren
1 sibling, 1 reply; 136+ messages in thread
From: Jeff King @ 2025-03-18 2:16 UTC (permalink / raw)
To: Taylor Blau; +Cc: git, Elijah Newren, Junio C Hamano, Patrick Steinhardt
On Fri, Mar 14, 2025 at 04:19:00PM -0400, Taylor Blau wrote:
> +write_midx_layer () {
> + n=1
> + if test -f $midx_chain
> + then
> + n="$(($(wc -l <$midx_chain) + 1))"
> + fi
> +
> + for i in 1 2
> + do
> + test_commit $n.$i &&
> + git repack -d || return 1
> + done &&
> + git multi-pack-index write --bitmap --incremental
> +}
> +
> +test_expect_success 'write initial MIDX layer' '
> + git repack -ad &&
> + write_midx_layer
> +'
> +
> +test_expect_success 'read bitmap from first MIDX layer' '
> + git rev-list --test-bitmap 1.2
> +'
> +
> +test_expect_success 'write another MIDX layer' '
> + write_midx_layer
> +'
> +
> +test_expect_success 'midx verify with multiple layers' '
> + git multi-pack-index verify
> +'
Perhaps a silly suggestion, but do you want to confirm in one of these
tests that there are in fact multiple layers of bitmaps? (I expect it to
be true, but just trying to cover all bases in the test).
I guess that happens somewhat here:
> +test_expect_success 'relink existing MIDX layer' '
> + rm -fr "$midxdir" &&
> +
> + GIT_TEST_MIDX_WRITE_REV=1 git multi-pack-index write --bitmap &&
> +
> + midx_hash="$(test-tool read-midx --checksum $objdir)" &&
> +
> + test_path_is_file "$packdir/multi-pack-index" &&
> + test_path_is_file "$packdir/multi-pack-index-$midx_hash.bitmap" &&
> + test_path_is_file "$packdir/multi-pack-index-$midx_hash.rev" &&
> +
> + test_commit another &&
> + git repack -d &&
> + git multi-pack-index write --bitmap --incremental &&
> +
> + test_path_is_missing "$packdir/multi-pack-index" &&
> + test_path_is_missing "$packdir/multi-pack-index-$midx_hash.bitmap" &&
> + test_path_is_missing "$packdir/multi-pack-index-$midx_hash.rev" &&
> +
> + test_path_is_file "$midxdir/multi-pack-index-$midx_hash.midx" &&
> + test_path_is_file "$midxdir/multi-pack-index-$midx_hash.bitmap" &&
> + test_path_is_file "$midxdir/multi-pack-index-$midx_hash.rev" &&
> + test_line_count = 2 "$midx_chain"
where we check that we switched to $midxdir.
-Peff
^ permalink raw reply [flat|nested] 136+ messages in thread
* Re: [PATCH v4 00/13] midx: incremental multi-pack indexes, part two
2025-03-14 20:18 ` [PATCH v4 " Taylor Blau
` (12 preceding siblings ...)
2025-03-14 20:19 ` [PATCH v4 13/13] midx: implement writing incremental MIDX bitmaps Taylor Blau
@ 2025-03-18 2:21 ` Jeff King
2025-03-20 0:18 ` Taylor Blau
13 siblings, 1 reply; 136+ messages in thread
From: Jeff King @ 2025-03-18 2:21 UTC (permalink / raw)
To: Taylor Blau; +Cc: git, Elijah Newren, Junio C Hamano, Patrick Steinhardt
On Fri, Mar 14, 2025 at 04:18:12PM -0400, Taylor Blau wrote:
> This is a new round of my series to implement bitmap support for
> incremental multi-pack indexes (MIDXs). It has been rebased on current
> 'master', which is 683c54c999 (Git 2.49, 2025-03-14) at the time of
> writing.
I read over this and didn't find anything objectionable (I left a few
comments here and there). I think I've said this before with big
bitmap/midx series: the biggest issue is that it's hard to know what you
might have missed. Especially in terms of corner cases. So it all looks
reasonable to me (including the overall design), but ultimately I think
it's more fruitful to put it through the paces on real-looking data than
it is to try to go over every inch of the midx code with a fine-tooth
comb. And I'd guess the eventual fate here is for this code to get
exercise on GitHub, which would help with that shaking out.
So mainly I tried to look for things that might hurt the non-incremental
cases, and didn't see anything (modulo one or two questions about
micro-optimizations, though I expect the answer there is "nothing big
enough to measure"). So if this can progress towards the "shaking out"
phase, and has the potential to hurt only people who turn on the new
feature, that seems like a good path to me.
-Peff
^ permalink raw reply [flat|nested] 136+ messages in thread
* Re: [PATCH v4 01/13] Documentation: describe incremental MIDX bitmaps
2025-03-14 20:18 ` [PATCH v4 01/13] Documentation: describe incremental MIDX bitmaps Taylor Blau
2025-03-18 1:16 ` Jeff King
@ 2025-03-18 2:42 ` Elijah Newren
2025-03-18 23:19 ` Taylor Blau
1 sibling, 1 reply; 136+ messages in thread
From: Elijah Newren @ 2025-03-18 2:42 UTC (permalink / raw)
To: Taylor Blau; +Cc: git, Jeff King, Junio C Hamano, Patrick Steinhardt
On Fri, Mar 14, 2025 at 1:18 PM Taylor Blau <me@ttaylorr.com> wrote:
>
> Prepare to implement support for reachability bitmaps for the new
> incremental multi-pack index (MIDX) feature over the following commits.
>
> This commit begins by first describing the relevant format and usage
> details for incremental MIDX bitmaps.
>
> Signed-off-by: Taylor Blau <me@ttaylorr.com>
> ---
> Documentation/technical/multi-pack-index.adoc | 71 +++++++++++++++++++
> 1 file changed, 71 insertions(+)
>
> diff --git a/Documentation/technical/multi-pack-index.adoc b/Documentation/technical/multi-pack-index.adoc
> index cc063b30be..ab98ecfeb9 100644
> --- a/Documentation/technical/multi-pack-index.adoc
> +++ b/Documentation/technical/multi-pack-index.adoc
> @@ -164,6 +164,77 @@ objects_nr($H2) + objects_nr($H1) + i
> (in the C implementation, this is often computed as `i +
> m->num_objects_in_base`).
>
> +=== Pseudo-pack order for incremental MIDXs
> +
> +The original implementation of multi-pack reachability bitmaps defined
> +the pseudo-pack order in linkgit:gitformat-pack[5] (see the section
> +titled "multi-pack-index reverse indexes") roughly as follows:
> +
> +____
> +In short, a MIDX's pseudo-pack is the de-duplicated concatenation of
> +objects in packs stored by the MIDX, laid out in pack order, and the
> +packs arranged in MIDX order (with the preferred pack coming first).
> +____
> +
> +In the incremental MIDX design, we extend this definition to include
> +objects from multiple layers of the MIDX chain. The pseudo-pack order
> +for incremental MIDXs is determined by concatenating the pseudo-pack
> +ordering for each layer of the MIDX chain in order. Formally two objects
> +`o1` and `o2` are compared as follows:
> +
> +1. If `o1` appears in an earlier layer of the MIDX chain than `o2`, then
> + `o1` is considered less than `o2`.
For sorting order, 'less than' doesn't tell us if you are sorting
smallest to greatest or greatest to smallest. Maybe "less than (so
its order is earlier than) `o2`"?
> +
> +2. Otherwise, if `o1` and `o2` appear in the same MIDX layer, and that
> + MIDX layer has no base, then if one of `pack(o1)` and `pack(o2)` is
> + preferred and the other is not, then the preferred one sorts first. If
> + there is a base layer (i.e. the MIDX layer is not the first layer in
> + the chain), then if `pack(o1)` appears earlier in that MIDX layer's
> + pack order, than `o1` is less than `o2`. Likewise if `pack(o2)`
s/than/then/
> + appears earlier, than the opposite is true.
s/than/then/
> +
> +3. Otherwise, `o1` and `o2` appear in the same pack, and thus in the
> + same MIDX layer. Sort `o1` and `o2` by their offset within their
> + containing packfile.
> +
> +Note that the preferred pack is a property of the MIDX chain, not the
> +individual layers themselves. Fundamentally we could introduce a
> +per-layer preferred pack, but this is less relevant now that we can
> +perform multi-pack reuse across the set of packs in a MIDX.
> +
> +=== Reachability bitmaps and incremental MIDXs
> +
> +Each layer of an incremental MIDX chain may have its objects (and the
> +objects from any previous layer in the same MIDX chain) represented in
> +its own `*.bitmap` file.
> +
> +The structure of a `*.bitmap` file belonging to an incremental MIDX
> +chain is identical to that of a non-incremental MIDX bitmap, or a
> +classic single-pack bitmap. Since objects are added to the end of the
> +incremental MIDX's pseudo-pack order (see: above), it is possible to
drop the colon?
> +extend a bitmap when appending to the end of a MIDX chain.
> +
> +(Note: it is possible likewise to compress a contiguous sequence of MIDX
> +incremental layers, and their `*.bitmap`(s) into a single layer and
> +`*.bitmap`, but this is not yet implemented.)
"`*.bitmap`(s)" feels slightly awkward and only saves 2 characters.
Maybe just "`*.bitmap` files"?
> +
> +The object positions used are global within the pseudo-pack order, so
> +subsequent layers will have, for example, `m->num_objects_in_base`
> +number of `0` bits in each of their four type bitmaps. This follows from
> +the fact that we only write type bitmap entries for objects present in
> +the layer immediately corresponding to the bitmap).
> +
> +Note also that only the bitmap pertaining to the most recent layer in an
> +incremental MIDX chain is used to store reachability information about
> +the interesting and uninteresting objects in a reachability query.
> +Earlier bitmap layers are only used to look up commit and pseudo-merge
> +bitmaps from that layer, as well as the type-level bitmaps for objects
> +in that layer.
> +
> +To simplify the implementation, type-level bitmaps are iterated
> +simultaneously, and their results are OR'd together to avoid recursively
> +calling internal bitmap functions.
> +
> Future Work
> -----------
Should the patch also remove the first item from Future Work, since
this series is implementing it?
> --
> 2.49.0.13.gd0d564685b
* Re: [PATCH v4 02/13] pack-revindex: prepare for incremental MIDX bitmaps
2025-03-14 20:18 ` [PATCH v4 02/13] pack-revindex: prepare for " Taylor Blau
2025-03-18 1:27 ` Jeff King
@ 2025-03-18 2:43 ` Elijah Newren
2025-03-19 0:03 ` Taylor Blau
1 sibling, 1 reply; 136+ messages in thread
From: Elijah Newren @ 2025-03-18 2:43 UTC (permalink / raw)
To: Taylor Blau; +Cc: git, Jeff King, Junio C Hamano, Patrick Steinhardt
On Fri, Mar 14, 2025 at 1:18 PM Taylor Blau <me@ttaylorr.com> wrote:
>
> Prepare the reverse index machinery to handle object lookups in an
> incremental MIDX bitmap. These changes are broken out across a few
> functions:
>
> - load_midx_revindex() learns to use the appropriate MIDX filename
> depending on whether the given 'struct multi_pack_index *' is
> incremental or not.
>
> - pack_pos_to_midx() and midx_to_pack_pos() now both take in a global
> object position in the MIDX pseudo-pack order, and finds the
> earliest containing MIDX (similar to midx.c::midx_for_object().
s/finds/find/ ?
>
> - midx_pack_order_cmp() adjusts its call to pack_pos_to_midx() by the
> number of objects in the base (since 'vb - midx->revindx_data' is
> relative to the containing MIDX, and pack_pos_to_midx() expects a
> global position).
>
> Likewise, this function adjusts its output by adding
> m->num_objects_in_base to return a global position out through the
> `*pos` pointer.
>
> Together, these changes are sufficient to use the multi-pack index's
> reverse index format for incremental multi-pack reachability bitmaps.
>
> Signed-off-by: Taylor Blau <me@ttaylorr.com>
> ---
> pack-bitmap.c | 40 ++++++++++++++++++++++++++++------------
> pack-revindex.c | 34 +++++++++++++++++++++++++---------
> 2 files changed, 53 insertions(+), 21 deletions(-)
>
> diff --git a/pack-bitmap.c b/pack-bitmap.c
> index 6406953d32..c26d85b5db 100644
> --- a/pack-bitmap.c
> +++ b/pack-bitmap.c
> @@ -170,6 +170,15 @@ static struct ewah_bitmap *read_bitmap_1(struct bitmap_index *index)
> return read_bitmap(index->map, index->map_size, &index->map_pos);
> }
>
> +static uint32_t bitmap_non_extended_bits(struct bitmap_index *index)
> +{
> + if (index->midx) {
> + struct multi_pack_index *m = index->midx;
> + return m->num_objects + m->num_objects_in_base;
> + }
> + return index->pack->num_objects;
> +}
> +
> static uint32_t bitmap_num_objects(struct bitmap_index *index)
> {
> if (index->midx)
> @@ -924,7 +933,7 @@ static inline int bitmap_position_extended(struct bitmap_index *bitmap_git,
>
> if (pos < kh_end(positions)) {
> int bitmap_pos = kh_value(positions, pos);
> - return bitmap_pos + bitmap_num_objects(bitmap_git);
> + return bitmap_pos + bitmap_non_extended_bits(bitmap_git);
> }
>
> return -1;
> @@ -992,7 +1001,7 @@ static int ext_index_add_object(struct bitmap_index *bitmap_git,
> bitmap_pos = kh_value(eindex->positions, hash_pos);
> }
>
> - return bitmap_pos + bitmap_num_objects(bitmap_git);
> + return bitmap_pos + bitmap_non_extended_bits(bitmap_git);
> }
>
> struct bitmap_show_data {
> @@ -1342,11 +1351,17 @@ struct ewah_bitmap *pseudo_merge_bitmap_for_commit(struct bitmap_index *bitmap_g
> if (pos < 0 || pos >= bitmap_num_objects(bitmap_git))
> goto done;
>
> + /*
> + * Use bitmap-relative positions instead of offsetting
> + * by bitmap_git->num_objects_in_base because we use
> + * this to find a match in pseudo_merge_for_parents(),
> + * and pseudo-merge groups cannot span multiple bitmap
> + * layers.
> + */
> bitmap_set(parents, pos);
> }
>
> - match = pseudo_merge_for_parents(&bitmap_git->pseudo_merges,
> - parents);
> + match = pseudo_merge_for_parents(&bitmap_git->pseudo_merges, parents);
>
> done:
> bitmap_free(parents);
> @@ -1500,7 +1515,8 @@ static void show_extended_objects(struct bitmap_index *bitmap_git,
> for (i = 0; i < eindex->count; ++i) {
> struct object *obj;
>
> - if (!bitmap_get(objects, st_add(bitmap_num_objects(bitmap_git), i)))
> + if (!bitmap_get(objects,
> + st_add(bitmap_non_extended_bits(bitmap_git), i)))
> continue;
>
> obj = eindex->objects[i];
> @@ -1679,7 +1695,7 @@ static void filter_bitmap_exclude_type(struct bitmap_index *bitmap_git,
> * them individually.
> */
> for (i = 0; i < eindex->count; i++) {
> - size_t pos = st_add(i, bitmap_num_objects(bitmap_git));
> + size_t pos = st_add(i, bitmap_non_extended_bits(bitmap_git));
> if (eindex->objects[i]->type == type &&
> bitmap_get(to_filter, pos) &&
> !bitmap_get(tips, pos))
> @@ -1705,7 +1721,7 @@ static unsigned long get_size_by_pos(struct bitmap_index *bitmap_git,
>
> oi.sizep = &size;
>
> - if (pos < bitmap_num_objects(bitmap_git)) {
> + if (pos < bitmap_non_extended_bits(bitmap_git)) {
> struct packed_git *pack;
> off_t ofs;
>
> @@ -1729,7 +1745,7 @@ static unsigned long get_size_by_pos(struct bitmap_index *bitmap_git,
> }
> } else {
> struct eindex *eindex = &bitmap_git->ext_index;
> - struct object *obj = eindex->objects[pos - bitmap_num_objects(bitmap_git)];
> + struct object *obj = eindex->objects[pos - bitmap_non_extended_bits(bitmap_git)];
> if (oid_object_info_extended(bitmap_repo(bitmap_git), &obj->oid,
> &oi, 0) < 0)
> die(_("unable to get size of %s"), oid_to_hex(&obj->oid));
> @@ -1882,7 +1898,7 @@ static void filter_packed_objects_from_bitmap(struct bitmap_index *bitmap_git,
> uint32_t objects_nr;
> size_t i, pos;
>
> - objects_nr = bitmap_num_objects(bitmap_git);
> + objects_nr = bitmap_non_extended_bits(bitmap_git);
> pos = objects_nr / BITS_IN_EWORD;
>
> if (pos > result->word_alloc)
> @@ -2419,7 +2435,7 @@ static uint32_t count_object_type(struct bitmap_index *bitmap_git,
> for (i = 0; i < eindex->count; ++i) {
> if (eindex->objects[i]->type == type &&
> bitmap_get(objects,
> - st_add(bitmap_num_objects(bitmap_git), i)))
> + st_add(bitmap_non_extended_bits(bitmap_git), i)))
> count++;
> }
>
> @@ -2820,7 +2836,7 @@ uint32_t *create_bitmap_mapping(struct bitmap_index *bitmap_git,
> BUG("rebuild_existing_bitmaps: missing required rev-cache "
> "extension");
>
> - num_objects = bitmap_num_objects(bitmap_git);
> + num_objects = bitmap_non_extended_bits(bitmap_git);
> CALLOC_ARRAY(reposition, num_objects);
>
> for (i = 0; i < num_objects; ++i) {
> @@ -2963,7 +2979,7 @@ static off_t get_disk_usage_for_extended(struct bitmap_index *bitmap_git)
> struct object *obj = eindex->objects[i];
>
> if (!bitmap_get(result,
> - st_add(bitmap_num_objects(bitmap_git), i)))
> + st_add(bitmap_non_extended_bits(bitmap_git), i)))
> continue;
>
> if (oid_object_info_extended(bitmap_repo(bitmap_git), &obj->oid,
> diff --git a/pack-revindex.c b/pack-revindex.c
> index d3832478d9..d3faab6a37 100644
> --- a/pack-revindex.c
> +++ b/pack-revindex.c
> @@ -383,8 +383,14 @@ int load_midx_revindex(struct multi_pack_index *m)
> trace2_data_string("load_midx_revindex", the_repository,
> "source", "rev");
>
> - get_midx_filename_ext(m->repo->hash_algo, &revindex_name, m->object_dir,
> - get_midx_checksum(m), MIDX_EXT_REV);
> + if (m->has_chain)
> + get_split_midx_filename_ext(m->repo->hash_algo, &revindex_name,
> + m->object_dir, get_midx_checksum(m),
> + MIDX_EXT_REV);
> + else
> + get_midx_filename_ext(m->repo->hash_algo, &revindex_name,
> + m->object_dir, get_midx_checksum(m),
> + MIDX_EXT_REV);
>
> ret = load_revindex_from_disk(revindex_name.buf,
> m->num_objects,
> @@ -471,11 +477,15 @@ off_t pack_pos_to_offset(struct packed_git *p, uint32_t pos)
>
> uint32_t pack_pos_to_midx(struct multi_pack_index *m, uint32_t pos)
> {
> + while (m && pos < m->num_objects_in_base)
> + m = m->base_midx;
> + if (!m)
> + BUG("NULL multi-pack-index for object position: %"PRIu32, pos);
> if (!m->revindex_data)
> BUG("pack_pos_to_midx: reverse index not yet loaded");
> - if (m->num_objects <= pos)
> + if (m->num_objects + m->num_objects_in_base <= pos)
> BUG("pack_pos_to_midx: out-of-bounds object at %"PRIu32, pos);
> - return get_be32(m->revindex_data + pos);
> + return get_be32(m->revindex_data + pos - m->num_objects_in_base);
> }
>
> struct midx_pack_key {
> @@ -491,7 +501,8 @@ static int midx_pack_order_cmp(const void *va, const void *vb)
> const struct midx_pack_key *key = va;
> struct multi_pack_index *midx = key->midx;
>
> - uint32_t versus = pack_pos_to_midx(midx, (uint32_t*)vb - (const uint32_t *)midx->revindex_data);
> + size_t pos = (uint32_t *)vb - (const uint32_t *)midx->revindex_data;
> + uint32_t versus = pack_pos_to_midx(midx, pos + midx->num_objects_in_base);
> uint32_t versus_pack = nth_midxed_pack_int_id(midx, versus);
> off_t versus_offset;
>
> @@ -529,9 +540,9 @@ static int midx_key_to_pack_pos(struct multi_pack_index *m,
> {
> uint32_t *found;
>
> - if (key->pack >= m->num_packs)
> + if (key->pack >= m->num_packs + m->num_packs_in_base)
> BUG("MIDX pack lookup out of bounds (%"PRIu32" >= %"PRIu32")",
> - key->pack, m->num_packs);
> + key->pack, m->num_packs + m->num_packs_in_base);
> /*
> * The preferred pack sorts first, so determine its identifier by
> * looking at the first object in pseudo-pack order.
> @@ -551,7 +562,8 @@ static int midx_key_to_pack_pos(struct multi_pack_index *m,
> if (!found)
> return -1;
>
> - *pos = found - m->revindex_data;
> + *pos = (found - m->revindex_data) + m->num_objects_in_base;
> +
> return 0;
> }
>
> @@ -559,9 +571,13 @@ int midx_to_pack_pos(struct multi_pack_index *m, uint32_t at, uint32_t *pos)
> {
> struct midx_pack_key key;
>
> + while (m && at < m->num_objects_in_base)
> + m = m->base_midx;
> + if (!m)
> + BUG("NULL multi-pack-index for object position: %"PRIu32, at);
> if (!m->revindex_data)
> BUG("midx_to_pack_pos: reverse index not yet loaded");
> - if (m->num_objects <= at)
> + if (m->num_objects + m->num_objects_in_base <= at)
> BUG("midx_to_pack_pos: out-of-bounds object at %"PRIu32, at);
>
> key.pack = nth_midxed_pack_int_id(m, at);
> --
> 2.49.0.13.gd0d564685b
>
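The position arithmetic in pack_pos_to_midx() and midx_to_pack_pos()
above can be illustrated with a minimal standalone sketch. The
`struct layer` type and the `layer_for_pos()` / `local_pos()` helpers
are hypothetical stand-ins, not Git's actual API; only the field names
mirror the ones used in the patch.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for one MIDX layer; not Git's real struct. */
struct layer {
	struct layer *base;            /* previous (older) layer, or NULL */
	uint32_t num_objects;          /* objects in this layer only */
	uint32_t num_objects_in_base;  /* objects in all earlier layers */
};

/*
 * Walk from the tip towards the base until reaching the layer whose
 * range [num_objects_in_base, num_objects_in_base + num_objects)
 * contains the global position. Returns NULL if out of bounds.
 */
struct layer *layer_for_pos(struct layer *tip, uint32_t pos)
{
	struct layer *l = tip;

	while (l && pos < l->num_objects_in_base)
		l = l->base;
	if (!l || pos >= l->num_objects_in_base + l->num_objects)
		return NULL;
	return l;
}

/* Convert a global position into an index local to its layer. */
uint32_t local_pos(const struct layer *l, uint32_t pos)
{
	assert(pos >= l->num_objects_in_base);
	return pos - l->num_objects_in_base;
}
```

With a base layer of 3 objects and a tip layer of 2, global positions
0..2 resolve to the base and 3..4 to the tip, where global position 4
becomes local index 1.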
* Re: [PATCH v4 03/13] pack-bitmap.c: open and store incremental bitmap layers
2025-03-14 20:18 ` [PATCH v4 03/13] pack-bitmap.c: open and store incremental bitmap layers Taylor Blau
@ 2025-03-18 4:13 ` Elijah Newren
2025-03-19 0:08 ` Taylor Blau
0 siblings, 1 reply; 136+ messages in thread
From: Elijah Newren @ 2025-03-18 4:13 UTC (permalink / raw)
To: Taylor Blau; +Cc: git, Jeff King, Junio C Hamano, Patrick Steinhardt
On Fri, Mar 14, 2025 at 1:18 PM Taylor Blau <me@ttaylorr.com> wrote:
>
> Prepare the pack-bitmap machinery to work with incremental MIDXs by
> adding a new "base" field to keep track of the bitmap index associated
> with the previous MIDX layer.
>
> The changes in this commit are mostly boilerplate to open the correct
> bitmap(s), add them to the chain bitmap layers along the "base" pointer,
s/chain/chain of/ ?
> ensures that the correct packs and their reverse indexes are loaded
s/ensures/ensure/ ?
> across MIDX layers, etc.
>
> While we're at it, keep track of a base_nr field to indicate how many
> bitmap layers (including the current bitmap) exist. This will be used in
> a future commit to allocate an array of 'struct ewah_bitmap' pointers to
> collect all of the respective type bitmaps among all layers to
> initialize a multi-EWAH iterator.
>
> Subsequent commits will teach the functions within the pack-bitmap
> machinery how to interact with these new fields.
>
> Signed-off-by: Taylor Blau <me@ttaylorr.com>
> ---
> pack-bitmap.c | 62 +++++++++++++++++++++++++++++++++++++++------------
> 1 file changed, 48 insertions(+), 14 deletions(-)
>
> diff --git a/pack-bitmap.c b/pack-bitmap.c
> index c26d85b5db..72fb11d014 100644
> --- a/pack-bitmap.c
> +++ b/pack-bitmap.c
> @@ -54,6 +54,16 @@ struct bitmap_index {
> struct packed_git *pack;
> struct multi_pack_index *midx;
>
> + /*
> + * If using a multi-pack index chain, 'base' points to the
> + * bitmap index corresponding to this bitmap's midx->base_midx.
> + *
> + * base_nr indicates how many layers precede this one, and is
> + * zero when base is NULL.
> + */
> + struct bitmap_index *base;
> + uint32_t base_nr;
> +
> /* mmapped buffer of the whole bitmap index */
> unsigned char *map;
> size_t map_size; /* size of the mmaped buffer */
> @@ -386,8 +396,15 @@ static int load_bitmap_entries_v1(struct bitmap_index *index)
> char *midx_bitmap_filename(struct multi_pack_index *midx)
> {
> struct strbuf buf = STRBUF_INIT;
> - get_midx_filename_ext(midx->repo->hash_algo, &buf, midx->object_dir,
> - get_midx_checksum(midx), MIDX_EXT_BITMAP);
> + if (midx->has_chain)
> + get_split_midx_filename_ext(midx->repo->hash_algo, &buf,
> + midx->object_dir,
> + get_midx_checksum(midx),
> + MIDX_EXT_BITMAP);
> + else
> + get_midx_filename_ext(midx->repo->hash_algo, &buf,
> + midx->object_dir, get_midx_checksum(midx),
> + MIDX_EXT_BITMAP);
>
> return strbuf_detach(&buf, NULL);
> }
> @@ -454,16 +471,21 @@ static int open_midx_bitmap_1(struct bitmap_index *bitmap_git,
> goto cleanup;
> }
>
> - for (i = 0; i < bitmap_git->midx->num_packs; i++) {
> - if (prepare_midx_pack(bitmap_repo(bitmap_git),
> - bitmap_git->midx,
> - i)) {
> + for (i = 0; i < bitmap_git->midx->num_packs + bitmap_git->midx->num_packs_in_base; i++) {
> + if (prepare_midx_pack(bitmap_repo(bitmap_git), bitmap_git->midx, i)) {
> warning(_("could not open pack %s"),
> bitmap_git->midx->pack_names[i]);
> goto cleanup;
> }
> }
>
> + if (midx->base_midx) {
> + bitmap_git->base = prepare_midx_bitmap_git(midx->base_midx);
> + bitmap_git->base_nr = bitmap_git->base->base_nr + 1;
> + } else {
> + bitmap_git->base_nr = 0;
> + }
> +
> return 0;
>
> cleanup:
> @@ -515,6 +537,7 @@ static int open_pack_bitmap_1(struct bitmap_index *bitmap_git, struct packed_git
> bitmap_git->map_size = xsize_t(st.st_size);
> bitmap_git->map = xmmap(NULL, bitmap_git->map_size, PROT_READ, MAP_PRIVATE, fd, 0);
> bitmap_git->map_pos = 0;
> + bitmap_git->base_nr = 0;
> close(fd);
>
> if (load_bitmap_header(bitmap_git) < 0) {
> @@ -534,8 +557,7 @@ static int open_pack_bitmap_1(struct bitmap_index *bitmap_git, struct packed_git
> static int load_reverse_index(struct repository *r, struct bitmap_index *bitmap_git)
> {
> if (bitmap_is_midx(bitmap_git)) {
> - uint32_t i;
> - int ret;
> + struct multi_pack_index *m;
>
> /*
> * The multi-pack-index's .rev file is already loaded via
> @@ -544,10 +566,15 @@ static int load_reverse_index(struct repository *r, struct bitmap_index *bitmap_
> * But we still need to open the individual pack .rev files,
> * since we will need to make use of them in pack-objects.
> */
> - for (i = 0; i < bitmap_git->midx->num_packs; i++) {
> - ret = load_pack_revindex(r, bitmap_git->midx->packs[i]);
> - if (ret)
> - return ret;
> + for (m = bitmap_git->midx; m; m = m->base_midx) {
> + uint32_t i;
> + int ret;
> +
> + for (i = 0; i < m->num_packs; i++) {
> + ret = load_pack_revindex(r, m->packs[i]);
> + if (ret)
> + return ret;
> + }
> }
> return 0;
> }
> @@ -573,6 +600,13 @@ static int load_bitmap(struct repository *r, struct bitmap_index *bitmap_git)
> if (!bitmap_git->table_lookup && load_bitmap_entries_v1(bitmap_git) < 0)
> goto failed;
>
> + if (bitmap_git->base) {
> + if (!bitmap_is_midx(bitmap_git))
> + BUG("non-MIDX bitmap has non-NULL base bitmap index");
> + if (load_bitmap(r, bitmap_git->base) < 0)
> + goto failed;
> + }
> +
> return 0;
>
> failed:
> @@ -657,10 +691,9 @@ struct bitmap_index *prepare_bitmap_git(struct repository *r)
>
> struct bitmap_index *prepare_midx_bitmap_git(struct multi_pack_index *midx)
> {
> - struct repository *r = midx->repo;
> struct bitmap_index *bitmap_git = xcalloc(1, sizeof(*bitmap_git));
>
> - if (!open_midx_bitmap_1(bitmap_git, midx) && !load_bitmap(r, bitmap_git))
> + if (!open_midx_bitmap_1(bitmap_git, midx))
> return bitmap_git;
>
> free_bitmap_index(bitmap_git);
> @@ -2899,6 +2932,7 @@ void free_bitmap_index(struct bitmap_index *b)
> close_midx_revindex(b->midx);
> }
> free_pseudo_merge_map(&b->pseudo_merges);
> + free_bitmap_index(b->base);
> free(b);
> }
>
> --
> 2.49.0.13.gd0d564685b
>
* Re: [PATCH v4 06/13] pack-bitmap.c: support bitmap pack-reuse with incremental MIDXs
2025-03-14 20:18 ` [PATCH v4 06/13] pack-bitmap.c: support bitmap pack-reuse with " Taylor Blau
@ 2025-03-18 4:13 ` Elijah Newren
2025-03-19 0:17 ` Taylor Blau
0 siblings, 1 reply; 136+ messages in thread
From: Elijah Newren @ 2025-03-18 4:13 UTC (permalink / raw)
To: Taylor Blau; +Cc: git, Jeff King, Junio C Hamano, Patrick Steinhardt
On Fri, Mar 14, 2025 at 1:18 PM Taylor Blau <me@ttaylorr.com> wrote:
>
> In a similar fashion as previous commits in the first phase of
> incremental MIDXs, enumerate not just the packs in the current
> incremental MIDX layer, but previous ones as well.
>
> Likewise, in reuse_partial_packfile_from_bitmap(), when reusing only a
> single pack from a MIDX, use the oldest layer's preferred pack as it is
> likely to contain the most amount of reusable sections.
"most amount" => "largest number" or "largest size" ?
>
> Signed-off-by: Taylor Blau <me@ttaylorr.com>
> ---
> pack-bitmap.c | 11 ++++++++---
> 1 file changed, 8 insertions(+), 3 deletions(-)
>
> diff --git a/pack-bitmap.c b/pack-bitmap.c
> index 1b4fec0033..7a41535425 100644
> --- a/pack-bitmap.c
> +++ b/pack-bitmap.c
> @@ -2333,7 +2333,8 @@ void reuse_partial_packfile_from_bitmap(struct bitmap_index *bitmap_git,
> multi_pack_reuse = 0;
>
> if (multi_pack_reuse) {
> - for (i = 0; i < bitmap_git->midx->num_packs; i++) {
> + struct multi_pack_index *m = bitmap_git->midx;
> + for (i = 0; i < m->num_packs + m->num_packs_in_base; i++) {
> struct bitmapped_pack pack;
> if (nth_bitmapped_pack(r, bitmap_git->midx, &pack, i) < 0) {
> warning(_("unable to load pack: '%s', disabling pack-reuse"),
> @@ -2359,14 +2360,18 @@ void reuse_partial_packfile_from_bitmap(struct bitmap_index *bitmap_git,
> uint32_t pack_int_id;
>
> if (bitmap_is_midx(bitmap_git)) {
> + struct multi_pack_index *m = bitmap_git->midx;
> uint32_t preferred_pack_pos;
>
> - if (midx_preferred_pack(bitmap_git->midx, &preferred_pack_pos) < 0) {
> + while (m->base_midx)
> + m = m->base_midx;
> +
> + if (midx_preferred_pack(m, &preferred_pack_pos) < 0) {
> warning(_("unable to compute preferred pack, disabling pack-reuse"));
> return;
> }
>
> - pack = bitmap_git->midx->packs[preferred_pack_pos];
> + pack = nth_midxed_pack(m, preferred_pack_pos);
> pack_int_id = preferred_pack_pos;
> } else {
> pack = bitmap_git->pack;
> --
> 2.49.0.13.gd0d564685b
>
* Re: [PATCH v4 07/13] pack-bitmap.c: teach `rev-list --test-bitmap` about incremental MIDXs
2025-03-14 20:18 ` [PATCH v4 07/13] pack-bitmap.c: teach `rev-list --test-bitmap` about " Taylor Blau
@ 2025-03-18 5:31 ` Elijah Newren
2025-03-19 0:30 ` Taylor Blau
0 siblings, 1 reply; 136+ messages in thread
From: Elijah Newren @ 2025-03-18 5:31 UTC (permalink / raw)
To: Taylor Blau; +Cc: git, Jeff King, Junio C Hamano, Patrick Steinhardt
On Fri, Mar 14, 2025 at 1:18 PM Taylor Blau <me@ttaylorr.com> wrote:
>
> Implement support for the special `--test-bitmap` mode of `git rev-list`
> when using incremental MIDXs.
>
> The bitmap_test_data structure is extended to contain a "base" pointer
> that mirrors the structure of the bitmap chain that it is being used to
> test.
>
> When we find a commit to test, we first chase down the ->base pointer to
> find the appropriate bitmap_test_data for the bitmap layer that the
> given commit is contained within, and then perform the test on that
> bitmap.
>
> In order to implement this, light modifications are made to
> bitmap_for_commit() to reimplement it in terms of a new function,
> find_bitmap_for_commit(), which fills out a pointer which indicates the
> bitmap layer which contains the given commit.
>
> Signed-off-by: Taylor Blau <me@ttaylorr.com>
> ---
> pack-bitmap.c | 107 ++++++++++++++++++++++++++++++++++++++++----------
> 1 file changed, 86 insertions(+), 21 deletions(-)
>
> diff --git a/pack-bitmap.c b/pack-bitmap.c
> index 7a41535425..bb09ce3cf5 100644
> --- a/pack-bitmap.c
> +++ b/pack-bitmap.c
[...]
> +
> +static void bitmap_test_data_prepare(struct bitmap_test_data *tdata,
> + struct bitmap_index *bitmap_git)
> +{
> + memset(tdata, 0, sizeof(struct bitmap_test_data));
So, the first thing this function does is 0 out tdata.
> +
> + tdata->bitmap_git = bitmap_git;
> + tdata->base = bitmap_new();
> + tdata->commits = ewah_to_bitmap(bitmap_git->commits);
> + tdata->trees = ewah_to_bitmap(bitmap_git->trees);
> + tdata->blobs = ewah_to_bitmap(bitmap_git->blobs);
> + tdata->tags = ewah_to_bitmap(bitmap_git->tags);
> +
> + if (bitmap_git->base) {
> + CALLOC_ARRAY(tdata->base_tdata, 1);
We use CALLOC to both allocate the array and set it all to 0...
> + bitmap_test_data_prepare(tdata->base_tdata, bitmap_git->base);
and then call bitmap_test_data_prepare() which will re-zero it all out.
Should we either ditch the zeroing at the beginning of the function,
or use xmalloc instead of CALLOC_ARRAY, to avoid duplicate zeroing?
> + }
> +}
Didn't spot anything else.
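For illustration, here is a minimal sketch of the second option (keep
the memset(), allocate without zeroing). The `struct test_data` type
and both functions are hypothetical, trimmed-down stand-ins for the
code in the patch, and plain malloc() is used only because the sketch
is standalone (Git itself would use xmalloc()).

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical, trimmed-down stand-in for 'struct bitmap_test_data'. */
struct test_data {
	int depth;                    /* number of layers below this one */
	struct test_data *base_tdata; /* data for the previous layer */
};

/* Always zeroes *tdata first, so callers need not pre-zero it. */
void test_data_prepare(struct test_data *tdata, int depth)
{
	memset(tdata, 0, sizeof(*tdata));
	tdata->depth = depth;

	if (depth > 0) {
		/* Plain malloc suffices: the recursive call zeroes it. */
		tdata->base_tdata = malloc(sizeof(*tdata->base_tdata));
		assert(tdata->base_tdata);
		test_data_prepare(tdata->base_tdata, depth - 1);
	}
}

/* Frees every layer below tdata; tdata itself belongs to the caller. */
void test_data_release(struct test_data *tdata)
{
	if (!tdata->base_tdata)
		return;
	test_data_release(tdata->base_tdata);
	free(tdata->base_tdata);
	tdata->base_tdata = NULL;
}
```

Either choice gives the same result; the difference is only whether
each layer's memory is cleared once or twice.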
* Re: [PATCH v4 11/13] pack-bitmap.c: keep track of each layer's type bitmaps
2025-03-14 20:18 ` [PATCH v4 11/13] pack-bitmap.c: keep track of each layer's type bitmaps Taylor Blau
2025-03-18 2:01 ` Jeff King
@ 2025-03-18 6:43 ` Elijah Newren
2025-03-19 0:39 ` Taylor Blau
1 sibling, 1 reply; 136+ messages in thread
From: Elijah Newren @ 2025-03-18 6:43 UTC (permalink / raw)
To: Taylor Blau; +Cc: git, Jeff King, Junio C Hamano, Patrick Steinhardt
On Fri, Mar 14, 2025 at 1:18 PM Taylor Blau <me@ttaylorr.com> wrote:
>
> Prepare for reading the type-level bitmaps from previous bitmap layers
> by maintaining an array for each type, where each element in that type's
> array corresponds to one layer's bitmap for that type.
>
> These fields will be used in a later commit to instantiate the 'struct
> ewah_or_iterator' for each type.
>
All I spotted was some possible wording fixups...
> + *
> + * When either associated either with a non-incremental MIDX, or
> + * a single packfile, these arrays each contain a single
> + * element.
> + */
Drop the first "either", and the first comma?
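As a rough picture of how one array element per layer feeds an OR-ing
iterator, here is a minimal sketch. It assumes plain uncompressed word
arrays rather than real EWAH-compressed bitmaps, and `struct
plain_bitmap`, `struct or_iter` and `or_iter_next()` are hypothetical
names, not the `ewah_or_iterator` API from the series.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical uncompressed stand-in for one layer's type bitmap. */
struct plain_bitmap {
	const uint64_t *words;
	size_t nr;
};

/* Iterates several bitmaps in lockstep, yielding the OR of each word. */
struct or_iter {
	const struct plain_bitmap *layers;
	size_t layers_nr;
	size_t pos;
};

/* Stores the next OR'd word in *next and returns 1, or returns 0 at end. */
int or_iter_next(uint64_t *next, struct or_iter *it)
{
	size_t i;
	int ret = 0;

	assert(next && it);
	*next = 0;
	for (i = 0; i < it->layers_nr; i++) {
		/* Shorter bitmaps simply stop contributing bits. */
		if (it->pos < it->layers[i].nr) {
			*next |= it->layers[i].words[it->pos];
			ret = 1;
		}
	}
	if (ret)
		it->pos++;
	return ret;
}
```

With a non-incremental MIDX or a single pack, `layers_nr` is 1 and the
iterator degenerates to walking that one bitmap.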
* Re: [PATCH v4 13/13] midx: implement writing incremental MIDX bitmaps
2025-03-14 20:19 ` [PATCH v4 13/13] midx: implement writing incremental MIDX bitmaps Taylor Blau
2025-03-18 2:16 ` Jeff King
@ 2025-03-18 17:13 ` Elijah Newren
2025-03-20 0:16 ` Taylor Blau
1 sibling, 1 reply; 136+ messages in thread
From: Elijah Newren @ 2025-03-18 17:13 UTC (permalink / raw)
To: Taylor Blau; +Cc: git, Jeff King, Junio C Hamano, Patrick Steinhardt
On Fri, Mar 14, 2025 at 1:19 PM Taylor Blau <me@ttaylorr.com> wrote:
[...]
> diff --git a/ewah/ewah_bitmap.c b/ewah/ewah_bitmap.c
> index e92341b8fa..056c410efb 100644
> --- a/ewah/ewah_bitmap.c
> +++ b/ewah/ewah_bitmap.c
> @@ -399,7 +399,7 @@ int ewah_or_iterator_next(eword_t *next, struct ewah_or_iterator *it)
> return ret;
> }
>
> -void ewah_or_iterator_free(struct ewah_or_iterator *it)
> +void ewah_or_iterator_release(struct ewah_or_iterator *it)
> {
> free(it->its);
> }
> diff --git a/ewah/ewok.h b/ewah/ewok.h
> index 4b70641045..c29d354236 100644
> --- a/ewah/ewok.h
> +++ b/ewah/ewok.h
> @@ -158,7 +158,7 @@ void ewah_or_iterator_init(struct ewah_or_iterator *it,
>
> int ewah_or_iterator_next(eword_t *next, struct ewah_or_iterator *it);
>
> -void ewah_or_iterator_free(struct ewah_or_iterator *it);
> +void ewah_or_iterator_release(struct ewah_or_iterator *it);
Was the rename in these last two hunks squashed into the wrong patch?
Since you're not changing the function's definition, I'm assuming the
new name should have been applied in the patch that introduced it.
* Re: [PATCH v4 01/13] Documentation: describe incremental MIDX bitmaps
2025-03-18 1:16 ` Jeff King
@ 2025-03-18 23:11 ` Taylor Blau
0 siblings, 0 replies; 136+ messages in thread
From: Taylor Blau @ 2025-03-18 23:11 UTC (permalink / raw)
To: Jeff King; +Cc: git, Elijah Newren, Junio C Hamano, Patrick Steinhardt
On Mon, Mar 17, 2025 at 09:16:18PM -0400, Jeff King wrote:
> On Fri, Mar 14, 2025 at 04:18:20PM -0400, Taylor Blau wrote:
>
> > +In the incremental MIDX design, we extend this definition to include
> > +objects from multiple layers of the MIDX chain. The pseudo-pack order
> > +for incremental MIDXs is determined by concatenating the pseudo-pack
> > +ordering for each layer of the MIDX chain in order. Formally two objects
> > +`o1` and `o2` are compared as follows:
> > +
> > +1. If `o1` appears in an earlier layer of the MIDX chain than `o2`, then
> > + `o1` is considered less than `o2`.
> > +
> > +2. Otherwise, if `o1` and `o2` appear in the same MIDX layer, and that
> > + MIDX layer has no base, then if one of `pack(o1)` and `pack(o2)` is
> > + preferred and the other is not, then the preferred one sorts first. If
> > + there is a base layer (i.e. the MIDX layer is not the first layer in
> > + the chain), then if `pack(o1)` appears earlier in that MIDX layer's
> > + pack order, than `o1` is less than `o2`. Likewise if `pack(o2)`
> > + appears earlier, than the opposite is true.
> > +
> > +3. Otherwise, `o1` and `o2` appear in the same pack, and thus in the
> > + same MIDX layer. Sort `o1` and `o2` by their offset within their
> > + containing packfile.
>
> OK, I think this ordering makes sense. I had to read this description
> over several times to make sure I wasn't missing something. The earlier
> part that says "it's just concatenating the pack order of the layers" is
> a much more intuitive way of looking at it (modulo that you might need
> to remove duplicates found in earlier layers).
>
> But I think an even more basic way of thinking about it is that it's the
> same as the pseudo-pack order you would get if you had a single midx of
> all of the packs in all of the layers (in their layer order). We already
> have to deal with (and have documented) duplicates in that case.
>
> Not really suggesting any wording change here, just making sure I
> grokked it all.
Yeah, those are both excellent ways to think about it. I hadn't
considered the "the new ordering is the same as the pseudo-pack order
you'd get if you had a single MIDX of all the packs in layer order"
thing before, but it's quite intuitive.
As a side note, it's somewhat hilarious to me that we could really
write:
"The new ordering is the same as the pseudo-pack order you'd get if
you had a single MIDX of all the packs in layer order, which is the
same order you'd get if you had a single pack containing all of the
objects in MIDX order."
;-)
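For reference, the three comparison rules from the documentation patch
can be sketched as a comparator. This is only an illustration under
stated assumptions: `struct obj_loc` and `pseudo_pack_cmp()` are
hypothetical, duplicates are assumed to be already removed, and the
preferred pack is honored only in the first layer, following the note
that it is a property of the chain.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical description of one object's location; not a Git type. */
struct obj_loc {
	uint32_t layer;      /* 0 is the oldest layer in the chain */
	uint32_t pack;       /* pack index within that layer's pack order */
	int pack_preferred;  /* non-zero if in the chain's preferred pack */
	uint64_t offset;     /* byte offset within the pack */
};

/* Returns <0 if a sorts before b, >0 if after, 0 if same position. */
int pseudo_pack_cmp(const struct obj_loc *a, const struct obj_loc *b)
{
	assert(a && b);

	/* 1. Objects in earlier layers sort first. */
	if (a->layer != b->layer)
		return a->layer < b->layer ? -1 : 1;

	if (a->pack != b->pack) {
		/* 2a. In the first layer only, the preferred pack sorts first. */
		if (a->layer == 0 && !a->pack_preferred != !b->pack_preferred)
			return a->pack_preferred ? -1 : 1;
		/* 2b. Otherwise fall back to the layer's pack order. */
		return a->pack < b->pack ? -1 : 1;
	}

	/* 3. Same pack (and thus same layer): order by offset. */
	if (a->offset != b->offset)
		return a->offset < b->offset ? -1 : 1;
	return 0;
}
```

Sorting all de-duplicated objects with this comparator yields the same
sequence as concatenating each layer's pseudo-pack order in chain
order.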
> > +Note that the preferred pack is a property of the MIDX chain, not the
> > +individual layers themselves. Fundamentally we could introduce a
> > +per-layer preferred pack, but this is less relevant now that we can
> > +perform multi-pack reuse across the set of packs in a MIDX.
>
> Calling this out explicitly is good, since it's an obvious question
> for somebody to have.
Thanks, I think this was an addition from Patrick's earlier review of
the series.
> OK, so each layer's bitmap does depend on the layers above/before it.
> That obviously needs to happen because each incremental midx is not
> likely to be a complete reachability set anyway.
>
> But I also wondered what would happen with a situation like this:
>
> A -- B
> \
> -- C
>
> stored like this:
>
> base midx:
> - pack 1:
> - object A
> - object B, which can reach A
> incremental midx:
> - pack 2:
> - object A
> - object C, which can reach A
>
> That is, two objects B and C both depend on A, which is duplicated in
> two midx layers. Even if the incremental midx is complete in the sense
> that C only depends on A, its bitmap cannot just be "11". Because the
> bit position for object A in the incremental midx does not exist in the
> pseudo-pack order at all! It must refer to the copy of "A" in the base
> midx, so it's correct bitmap is "101" (A and C, but not B).
>
Right. Since the base MIDX has objects A and B, B's bitmap here would be
"11". C's bit position in the subsequent layer is a function of where it
sits not just in that MIDX layer, but how many (de-duplicated) objects
exist in all prior layers. There are two, so the earliest bit position
possible to allocate towards C is the third bit. And since C reaches A,
its bitmap would indeed be "101".
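To make the arithmetic concrete, here is a minimal sketch of that example. It is not code from the series: the struct and helper names are hypothetical, and a plain 64-bit word stands in for an EWAH bitmap. Bit 0 corresponds to the leftmost character in the "101" notation used above.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical, simplified model of one MIDX layer: it only records how
 * many de-duplicated objects it contributes and how many objects exist
 * in all of the layers before it.
 */
struct layer_sketch {
	uint32_t num_objects;         /* unique objects in this layer */
	uint32_t num_objects_in_base; /* unique objects in earlier layers */
};

/* Map a position local to 'layer' to a global pseudo-pack bit position. */
static uint32_t global_bit_pos(const struct layer_sketch *layer,
			       uint32_t local_pos)
{
	assert(local_pos < layer->num_objects);
	return layer->num_objects_in_base + local_pos;
}

/* Set bit 'pos' (bit 0 is the first object in pseudo-pack order). */
static uint64_t set_bit(uint64_t bitmap, uint32_t pos)
{
	assert(pos < 64);
	return bitmap | ((uint64_t)1 << pos);
}

/*
 * Build C's reachability bitmap for the example: the base layer holds A
 * (bit 0) and B (bit 1); the incremental layer contributes only C, since
 * its copy of A is a duplicate.
 */
static uint64_t example_bitmap_for_c(void)
{
	struct layer_sketch incremental = { 1, 2 };
	uint64_t bitmap = 0;

	bitmap = set_bit(bitmap, 0); /* A, found in the base layer */
	bitmap = set_bit(bitmap, global_bit_pos(&incremental, 0)); /* C */
	return bitmap;
}
```

With A at bit 0 and C at bit 2, the word has bits 0 and 2 set and bit 1 (B) clear, which is the "101" from the example.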
> Again, just talking through it here.
Heh, thanks for saying so. It's good to know when we're just talking
through examples versus asking for changes. (Of course, the mere fact of
talking through an example is sometimes enough to suggest a change by
virtue of that example being confusing enough to need to be talked
through in the first place).
> > +Note also that only the bitmap pertaining to the most recent layer in an
> > +incremental MIDX chain is used to store reachability information about
> > +the interesting and uninteresting objects in a reachability query.
> > +Earlier bitmap layers are only used to look up commit and pseudo-merge
> > +bitmaps from that layer, as well as the type-level bitmaps for objects
> > +in that layer.
>
> I'm not quite sure what this means, but I guess you're saying that
> internally as we produce a bitmap, we'll always use the complete bitmap
> over all of the layers?
That's exactly right.
> > +To simplify the implementation, type-level bitmaps are iterated
> > +simultaneously, and their results are OR'd together to avoid recursively
> > +calling internal bitmap functions.
>
> OK, I guess we'll see what this means in the patches. ;)
>
> The general rules for the data structure make sense to me, though.
Great, and thanks in advance for the review as I work through the rest
of your emails :-).
Thanks,
Taylor
^ permalink raw reply [flat|nested] 136+ messages in thread
* Re: [PATCH v4 01/13] Documentation: describe incremental MIDX bitmaps
2025-03-18 2:42 ` Elijah Newren
@ 2025-03-18 23:19 ` Taylor Blau
0 siblings, 0 replies; 136+ messages in thread
From: Taylor Blau @ 2025-03-18 23:19 UTC (permalink / raw)
To: Elijah Newren; +Cc: git, Jeff King, Junio C Hamano, Patrick Steinhardt
On Mon, Mar 17, 2025 at 07:42:54PM -0700, Elijah Newren wrote:
> > +In the incremental MIDX design, we extend this definition to include
> > +objects from multiple layers of the MIDX chain. The pseudo-pack order
> > +for incremental MIDXs is determined by concatenating the pseudo-pack
> > +ordering for each layer of the MIDX chain in order. Formally two objects
> > +`o1` and `o2` are compared as follows:
> > +
> > +1. If `o1` appears in an earlier layer of the MIDX chain than `o2`, then
> > + `o1` is considered less than `o2`.
>
> For sorting order, 'less than' doesn't tell us if you are sorting
> smallest to greatest or greatest to smallest. Maybe "less than (so
> its order is earlier than) `o2'" ?
Oh, good suggestion. I found the alternative a little verbose, but went
with "sorts ahead of" instead of "less than".
> > +
> > +2. Otherwise, if `o1` and `o2` appear in the same MIDX layer, and that
> > + MIDX layer has no base, then if one of `pack(o1)` and `pack(o2)` is
> > + preferred and the other is not, then the preferred one sorts first. If
> > + there is a base layer (i.e. the MIDX layer is not the first layer in
> > + the chain), then if `pack(o1)` appears earlier in that MIDX layer's
> > + pack order, than `o1` is less than `o2`. Likewise if `pack(o2)`
>
> s/than/then/
>
> > + appears earlier, than the opposite is true.
>
> s/than/then/
Good catch on both accounts ;-).
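For anyone following along, the comparison rules quoted above could be sketched as a comparator roughly like the one here. This is only an illustration under stated assumptions: `struct obj_pos_sketch` and its fields are hypothetical, applying pack order after the preferred-pack check in a base-less layer is my reading of the rules, and the final tie-break on pack offset is assumed from the non-incremental pseudo-pack order rather than taken from the text quoted here.

```c
#include <stdint.h>

/* Hypothetical description of where one object lives in a MIDX chain. */
struct obj_pos_sketch {
	uint32_t layer;        /* 0 is the base (oldest) layer */
	uint32_t pack_id;      /* position of the pack within its layer */
	int pack_is_preferred; /* only meaningful in the base layer */
	uint64_t offset;       /* byte offset within the pack */
};

/*
 * Returns a negative value if 'a' sorts ahead of 'b', a positive value
 * if 'b' sorts ahead of 'a', and 0 if they compare equal.
 */
static int pseudo_pack_cmp(const struct obj_pos_sketch *a,
			   const struct obj_pos_sketch *b)
{
	/* 1. Objects in earlier layers sort ahead of later ones. */
	if (a->layer != b->layer)
		return a->layer < b->layer ? -1 : 1;

	/* 2. Only a layer with no base honors the preferred pack. */
	if (a->layer == 0 && a->pack_is_preferred != b->pack_is_preferred)
		return a->pack_is_preferred ? -1 : 1;

	/* Then, earlier packs in the layer's pack order sort first. */
	if (a->pack_id != b->pack_id)
		return a->pack_id < b->pack_id ? -1 : 1;

	/* Assumed tie-break: ascending offset within the same pack. */
	if (a->offset != b->offset)
		return a->offset < b->offset ? -1 : 1;
	return 0;
}
```

Returning a negative value for "sorts ahead of" matches the qsort() convention, which sidesteps the "less than" ambiguity raised in the review.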
> > +The structure of a `*.bitmap` file belonging to an incremental MIDX
> > +chain is identical to that of a non-incremental MIDX bitmap, or a
> > +classic single-pack bitmap. Since objects are added to the end of the
> > +incremental MIDX's pseudo-pack order (see: above), it is possible to
>
> drop the colon?
Yep, dropped.
> > +extend a bitmap when appending to the end of a MIDX chain.
> > +
> > +(Note: it is possible likewise to compress a contiguous sequence of MIDX
> > +incremental layers, and their `*.bitmap`(s) into a single layer and
> > +`*.bitmap`, but this is not yet implemented.)
>
> "`*.bitmap`(s)" feels slightly awkward and only saves 2 characters.
> Maybe just "`*.bitmap` files"?
Fair suggestion, sure!
> > Future Work
> > -----------
>
> Should the patch also remove the first item from Future Work, since
> this series is implementing it?
Hah, that was quite satisfying to do. I moved that to its own commit,
though, since this series doesn't implement incremental MIDXs, but
bitmap support for them. Incremental MIDXs were "done" as of b9497848df
(Merge branch 'tb/incremental-midx-part-1', 2024-08-19).
Thanks,
Taylor
^ permalink raw reply [flat|nested] 136+ messages in thread
* Re: [PATCH v4 02/13] pack-revindex: prepare for incremental MIDX bitmaps
2025-03-18 1:27 ` Jeff King
@ 2025-03-19 0:02 ` Taylor Blau
2025-03-19 0:07 ` Taylor Blau
0 siblings, 1 reply; 136+ messages in thread
From: Taylor Blau @ 2025-03-19 0:02 UTC (permalink / raw)
To: Jeff King; +Cc: git, Elijah Newren, Junio C Hamano, Patrick Steinhardt
On Mon, Mar 17, 2025 at 09:27:26PM -0400, Jeff King wrote:
> On Fri, Mar 14, 2025 at 04:18:24PM -0400, Taylor Blau wrote:
>
> > diff --git a/pack-bitmap.c b/pack-bitmap.c
> > index 6406953d32..c26d85b5db 100644
> > --- a/pack-bitmap.c
> > +++ b/pack-bitmap.c
> > @@ -170,6 +170,15 @@ static struct ewah_bitmap *read_bitmap_1(struct bitmap_index *index)
> > return read_bitmap(index->map, index->map_size, &index->map_pos);
> > }
> >
> > +static uint32_t bitmap_non_extended_bits(struct bitmap_index *index)
> > +{
> > + if (index->midx) {
> > + struct multi_pack_index *m = index->midx;
> > + return m->num_objects + m->num_objects_in_base;
> > + }
> > + return index->pack->num_objects;
> > +}
>
> I understand why we need to account for the objects in the base to
> offset our total size.
>
> Similar to Patrick's comments on v3, I wondered about why we couldn't
> just modify bitmap_num_objects() here, and why some callers would be
> left with the other.
>
> I guess sometimes we still need to consider a single layer. We can't
> quite just access m->num_objects there, because we still need the midx
> vs pack abstraction layer. I just thought there'd be more discussion
> here, but it looks the same as v3.
Right; some callers care about the number of objects in *their* layer,
like computing the size of some bitmap extensions, bounds-checking
pseudo-merge commit lookups, or generating positions for objects in the
extended index.
I'm happy to include that discussion somewhere in the commit message or
as a comment nearby bitmap_non_extended_bits(), but I'm not sure which
is better. If you have thoughts, LMK.
> I wonder if it is worth renaming bitmap_num_objects() to indicate that
> it is a single layer (and make sure other callers are examined). I
> dunno.
>
> I also suspect from previous forays into bitmap indexing that it will be
> easy to mix up positions in various units (local to the layer vs in the
> global pseudo-pack ordering, for example). In theory we could use types
> to help us with this, but they're kind of weak in C (unless we wrap all
> of the ints in structs). Maybe not worth it.
Perhaps. I do like the idea of using types to help with all of this, but
like you I suspect they may be more trouble than they're worth.
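For the record, the "wrap all of the ints in structs" idea might look something like the sketch here. None of these types or helpers exist in the series; they are purely hypothetical, and only meant to show what the compiler would and would not catch.

```c
#include <stdint.h>

/* Hypothetical wrapper types; neither exists in the series. */
struct local_pos { uint32_t v; };  /* position within one layer */
struct global_pos { uint32_t v; }; /* position in whole-chain order */

/* Convert a layer-local position into a global pseudo-pack position. */
static struct global_pos local_to_global(struct local_pos pos,
					 uint32_t num_objects_in_base)
{
	struct global_pos ret = { pos.v + num_objects_in_base };
	return ret;
}

/*
 * Returns 1 and fills 'out' if 'pos' belongs to the layer that has
 * 'num_objects_in_base' objects before it, or 0 if it belongs to an
 * earlier layer.
 */
static int global_to_local(struct global_pos pos,
			   uint32_t num_objects_in_base,
			   struct local_pos *out)
{
	if (pos.v < num_objects_in_base)
		return 0;
	out->v = pos.v - num_objects_in_base;
	return 1;
}
```

Passing a `struct local_pos` where a `struct global_pos` is expected becomes a compile error, at the cost of a `.v` at every arithmetic site, which is roughly the trouble alluded to above.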
Thanks,
Taylor
^ permalink raw reply [flat|nested] 136+ messages in thread
* Re: [PATCH v4 02/13] pack-revindex: prepare for incremental MIDX bitmaps
2025-03-18 2:43 ` Elijah Newren
@ 2025-03-19 0:03 ` Taylor Blau
0 siblings, 0 replies; 136+ messages in thread
From: Taylor Blau @ 2025-03-19 0:03 UTC (permalink / raw)
To: Elijah Newren; +Cc: git, Jeff King, Junio C Hamano, Patrick Steinhardt
On Mon, Mar 17, 2025 at 07:43:07PM -0700, Elijah Newren wrote:
> On Fri, Mar 14, 2025 at 1:18 PM Taylor Blau <me@ttaylorr.com> wrote:
> >
> > Prepare the reverse index machinery to handle object lookups in an
> > incremental MIDX bitmap. These changes are broken out across a few
> > functions:
> >
> > - load_midx_revindex() learns to use the appropriate MIDX filename
> > depending on whether the given 'struct multi_pack_index *' is
> > incremental or not.
> >
> > - pack_pos_to_midx() and midx_to_pack_pos() now both take in a global
> > object position in the MIDX pseudo-pack order, and finds the
> > earliest containing MIDX (similar to midx.c::midx_for_object().
>
> s/finds/find/ ?
Good eyes, fixed.
Thanks,
Taylor
^ permalink raw reply [flat|nested] 136+ messages in thread
* Re: [PATCH v4 02/13] pack-revindex: prepare for incremental MIDX bitmaps
2025-03-19 0:02 ` Taylor Blau
@ 2025-03-19 0:07 ` Taylor Blau
2025-03-26 18:08 ` Jeff King
0 siblings, 1 reply; 136+ messages in thread
From: Taylor Blau @ 2025-03-19 0:07 UTC (permalink / raw)
To: Jeff King; +Cc: git, Elijah Newren, Junio C Hamano, Patrick Steinhardt
On Tue, Mar 18, 2025 at 08:02:41PM -0400, Taylor Blau wrote:
> > I understand why we need to account for the objects in the base to
> > offset our total size.
> >
> > Similar to Patrick's comments on v3, I wondered about why we couldn't
> > just modify bitmap_num_objects() here, and why some callers would be
> > left with the other.
> >
> > I guess sometimes we still need to consider a single layer. We can't
> > quite just access m->num_objects there, because we still need the midx
> > vs pack abstraction layer. I just thought there'd be more discussion
> > here, but it looks the same as v3.
>
> Right; some callers care about the number of objects in *their* layer,
> like computing the size of some bitmap extensions, bounds-checking
> pseudo-merge commit lookups, or generating positions for objects in the
> extended index.
>
> I'm happy to include that discussion somewhere in the commit message or
> as a comment nearby bitmap_non_extended_bits(), but I'm not sure which
> is better. If you have thoughts, LMK.
I renamed this function to bitmap_num_objects_total(), which I think
more clearly distinguishes it from bitmap_num_objects(). If you have
other thoughts or things you think I should do in addition to that, LMK.
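As a rough illustration of the distinction between the two helpers (a sketch with simplified stand-in structs, not the actual definitions from pack-bitmap.c; only the num_objects and num_objects_in_base field names come from the quoted hunk):

```c
#include <stdint.h>

/*
 * Simplified stand-ins for the real structures. The field names follow
 * the quoted hunk; everything else is an assumption.
 */
struct midx_sketch {
	uint32_t num_objects;
	uint32_t num_objects_in_base;
};
struct pack_sketch {
	uint32_t num_objects;
};
struct bitmap_index_sketch {
	struct midx_sketch *midx; /* non-NULL for MIDX bitmaps */
	struct pack_sketch *pack; /* used for single-pack bitmaps */
};

/* Objects in this bitmap's own layer (or its single pack). */
static uint32_t sketch_num_objects(const struct bitmap_index_sketch *index)
{
	if (index->midx)
		return index->midx->num_objects;
	return index->pack->num_objects;
}

/* Objects in this layer plus every layer before it. */
static uint32_t sketch_num_objects_total(const struct bitmap_index_sketch *index)
{
	if (index->midx)
		return index->midx->num_objects +
		       index->midx->num_objects_in_base;
	return index->pack->num_objects;
}
```

For a single pack or a non-incremental MIDX the two agree; they only diverge once a layer has a base.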
Thanks,
Taylor
^ permalink raw reply [flat|nested] 136+ messages in thread
* Re: [PATCH v4 03/13] pack-bitmap.c: open and store incremental bitmap layers
2025-03-18 4:13 ` Elijah Newren
@ 2025-03-19 0:08 ` Taylor Blau
0 siblings, 0 replies; 136+ messages in thread
From: Taylor Blau @ 2025-03-19 0:08 UTC (permalink / raw)
To: Elijah Newren; +Cc: git, Jeff King, Junio C Hamano, Patrick Steinhardt
On Mon, Mar 17, 2025 at 09:13:45PM -0700, Elijah Newren wrote:
> On Fri, Mar 14, 2025 at 1:18 PM Taylor Blau <me@ttaylorr.com> wrote:
> >
> > Prepare the pack-bitmap machinery to work with incremental MIDXs by
> > adding a new "base" field to keep track of the bitmap index associated
> > with the previous MIDX layer.
> >
> > The changes in this commit are mostly boilerplate to open the correct
> > bitmap(s), add them to the chain bitmap layers along the "base" pointer,
>
> s/chain/chain of/ ?
Good eyes again!
> > ensures that the correct packs and their reverse indexes are loaded
>
> s/ensures/ensure/ ?
...and again!
Thanks,
Taylor
^ permalink raw reply [flat|nested] 136+ messages in thread
* Re: [PATCH v4 04/13] pack-bitmap.c: teach `bitmap_for_commit()` about incremental MIDXs
2025-03-18 1:38 ` Jeff King
@ 2025-03-19 0:13 ` Taylor Blau
0 siblings, 0 replies; 136+ messages in thread
From: Taylor Blau @ 2025-03-19 0:13 UTC (permalink / raw)
To: Jeff King; +Cc: git, Elijah Newren, Junio C Hamano, Patrick Steinhardt
On Mon, Mar 17, 2025 at 09:38:23PM -0400, Jeff King wrote:
> On Fri, Mar 14, 2025 at 04:18:31PM -0400, Taylor Blau wrote:
>
> > The pack-bitmap machinery uses `bitmap_for_commit()` to locate the
> > EWAH-compressed bitmap corresponding to some given commit object.
> >
> > Teach this function about incremental MIDX bitmaps by teaching it to
> > recur on earlier bitmap layers when it fails to find a given commit in
> > the current layer.
> >
> > The changes to do so are as follows:
> >
> > - Avoid initializing hash_pos at its declaration, since
> > bitmap_for_commit() is now a recursive function and may receive a
> > NULL bitmap_index pointer as its first argument.
> >
> > - In cases where we would previously return NULL (to indicate that a
> > lookup failed and the given bitmap_index does not contain an entry
> > corresponding to the given commit), recursively call the function on
> > the previous bitmap layer.
>
> This makes sense, though it does make me wonder if we could/should store
> a (midx/pack,pos) pair. I.e., a master hash table stored once for the
> whole midx stack. And then you wouldn't need to recurse; it would just
> be a single lookup.
>
> Or would that work badly with the lazy nature? You'd need to load all of
> the layers to fill it (rather than doing each incrementally). OTOH, if
> you ask for the bitmap for commit X you're eventually going to have to
> figure out what's in all of the layers as soon as you have a miss and
> have to check them all. And I think the lookup table extension is what's
> supposed to make that cheap-ish.
I think that it's a good idea, though TBH I think there is even more
room for improvement there, like recording cache misses. I suspect the
details are fiddly enough that I'd rather tackle them outside of this
already-fiddly series, though ;-).
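For reference, the shape of the recursive lookup as it stands is roughly the following. This is a simplified sketch, not the real bitmap_for_commit(): the types are hypothetical, and a linear scan over commit names stands in for the per-layer hash table.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical per-layer table mapping commit names to stored bitmaps. */
struct stored_sketch {
	const char *commit;
	unsigned long bits;
};

struct bitmap_layer_sketch {
	const struct stored_sketch *entries;
	size_t nr;
	const struct bitmap_layer_sketch *base; /* previous layer, or NULL */
};

static const struct stored_sketch *lookup_commit_bitmap(
	const struct bitmap_layer_sketch *layer, const char *commit)
{
	size_t i;

	if (!layer)
		return NULL; /* walked past the base layer: no bitmap */

	for (i = 0; i < layer->nr; i++)
		if (!strcmp(layer->entries[i].commit, commit))
			return &layer->entries[i];

	/* A miss in this layer is not final; try the previous one. */
	return lookup_commit_bitmap(layer->base, commit);
}
```

A commit with no bitmap in any layer costs one probe per layer, which is the case a single chain-wide table (or a record of earlier misses) would help with.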
Thanks,
Taylor
^ permalink raw reply [flat|nested] 136+ messages in thread
* Re: [PATCH v4 06/13] pack-bitmap.c: support bitmap pack-reuse with incremental MIDXs
2025-03-18 4:13 ` Elijah Newren
@ 2025-03-19 0:17 ` Taylor Blau
0 siblings, 0 replies; 136+ messages in thread
From: Taylor Blau @ 2025-03-19 0:17 UTC (permalink / raw)
To: Elijah Newren; +Cc: git, Jeff King, Junio C Hamano, Patrick Steinhardt
On Mon, Mar 17, 2025 at 09:13:52PM -0700, Elijah Newren wrote:
> On Fri, Mar 14, 2025 at 1:18 PM Taylor Blau <me@ttaylorr.com> wrote:
> >
> > In a similar fashion as previous commits in the first phase of
> > incremental MIDXs, enumerate not just the packs in the current
> > incremental MIDX layer, but previous ones as well.
> >
> > Likewise, in reuse_partial_packfile_from_bitmap(), when reusing only a
> > single pack from a MIDX, use the oldest layer's preferred pack as it is
> > likely to contain the most amount of reusable sections.
>
> "most amount" => "largest number" or "largest size" ?
Good call; between the two I prefer "largest number".
Thanks,
Taylor
^ permalink raw reply [flat|nested] 136+ messages in thread
* Re: [PATCH v4 07/13] pack-bitmap.c: teach `rev-list --test-bitmap` about incremental MIDXs
2025-03-18 5:31 ` Elijah Newren
@ 2025-03-19 0:30 ` Taylor Blau
0 siblings, 0 replies; 136+ messages in thread
From: Taylor Blau @ 2025-03-19 0:30 UTC (permalink / raw)
To: Elijah Newren; +Cc: git, Jeff King, Junio C Hamano, Patrick Steinhardt
On Mon, Mar 17, 2025 at 10:31:06PM -0700, Elijah Newren wrote:
> > +static void bitmap_test_data_prepare(struct bitmap_test_data *tdata,
> > + struct bitmap_index *bitmap_git)
> > +{
> > + memset(tdata, 0, sizeof(struct bitmap_test_data));
>
> So, the first thing this function does is 0 out tdata.
>
> > +
> > + tdata->bitmap_git = bitmap_git;
> > + tdata->base = bitmap_new();
> > + tdata->commits = ewah_to_bitmap(bitmap_git->commits);
> > + tdata->trees = ewah_to_bitmap(bitmap_git->trees);
> > + tdata->blobs = ewah_to_bitmap(bitmap_git->blobs);
> > + tdata->tags = ewah_to_bitmap(bitmap_git->tags);
> > +
> > + if (bitmap_git->base) {
> > + CALLOC_ARRAY(tdata->base_tdata, 1);
>
> We use CALLOC to both allocate the array and set it all to 0...
>
> > + bitmap_test_data_prepare(tdata->base_tdata, bitmap_git->base);
>
> and then call bitmap_test_data_prepare() which will re-zero it all out.
>
> Should we either ditch the zeroing at the beginning of the function,
> or use xmalloc instead of CALLOC_ARRAY, to avoid duplicate zeroing?
Ah... good point. I think between the two we should drop the
CALLOC_ARRAY() and just xmalloc() it.
Thanks,
Taylor
^ permalink raw reply [flat|nested] 136+ messages in thread
* Re: [PATCH v4 08/13] pack-bitmap.c: compute disk-usage with incremental MIDXs
2025-03-18 1:41 ` Jeff King
@ 2025-03-19 0:30 ` Taylor Blau
0 siblings, 0 replies; 136+ messages in thread
From: Taylor Blau @ 2025-03-19 0:30 UTC (permalink / raw)
To: Jeff King; +Cc: git, Elijah Newren, Junio C Hamano, Patrick Steinhardt
On Mon, Mar 17, 2025 at 09:41:39PM -0400, Jeff King wrote:
> On Fri, Mar 14, 2025 at 04:18:44PM -0400, Taylor Blau wrote:
>
> > In a similar fashion as previous commits, use nth_midxed_pack() instead
> > of accessing the MIDX's ->packs array directly to support incremental
> > MIDXs.
>
> Probably not worth it to change it in an actual patch, but is it worth
> renaming midx->packs to something else to make sure we catch all of the
> spots that need to be considered? Or maybe you already did that, which
> is how you found all of these. :)
That's how I found them originally ;-).
Thanks,
Taylor
^ permalink raw reply [flat|nested] 136+ messages in thread
* Re: [PATCH v4 10/13] ewah: implement `struct ewah_or_iterator`
2025-03-18 1:44 ` Jeff King
@ 2025-03-19 0:33 ` Taylor Blau
0 siblings, 0 replies; 136+ messages in thread
From: Taylor Blau @ 2025-03-19 0:33 UTC (permalink / raw)
To: Jeff King; +Cc: git, Elijah Newren, Junio C Hamano, Patrick Steinhardt
On Mon, Mar 17, 2025 at 09:44:17PM -0400, Jeff King wrote:
> On Fri, Mar 14, 2025 at 04:18:50PM -0400, Taylor Blau wrote:
>
> > +void ewah_or_iterator_free(struct ewah_or_iterator *it)
> > +{
> > + free(it->its);
> > +}
>
> Hmm, I thought this was going to be come "_release()" based on the last
> round?
Oops, yes -- it should have been. I could have sworn I made that change,
but I must be hallucinating. In either case, it's fixed up in the round
that I'll send shortly.
Thanks,
Taylor
^ permalink raw reply [flat|nested] 136+ messages in thread
* Re: [PATCH v4 11/13] pack-bitmap.c: keep track of each layer's type bitmaps
2025-03-18 2:01 ` Jeff King
@ 2025-03-19 0:38 ` Taylor Blau
0 siblings, 0 replies; 136+ messages in thread
From: Taylor Blau @ 2025-03-19 0:38 UTC (permalink / raw)
To: Jeff King; +Cc: git, Elijah Newren, Junio C Hamano, Patrick Steinhardt
On Mon, Mar 17, 2025 at 10:01:13PM -0400, Jeff King wrote:
> On Fri, Mar 14, 2025 at 04:18:53PM -0400, Taylor Blau wrote:
>
> > @@ -81,6 +81,24 @@ struct bitmap_index {
> > struct ewah_bitmap *blobs;
> > struct ewah_bitmap *tags;
> >
> > + /*
> > + * Type index arrays when this bitmap is associated with an
> > + * incremental multi-pack index chain.
> > + *
> > + * If n is the number of unique layers in the MIDX chain, then
> > + * commits_all[n-1] is this structs 'commits' field,
> > + * commits_all[n-2] is the commits field of this bitmap's
> > + * 'base', and so on.
> > + *
> > + * When either associated either with a non-incremental MIDX, or
> > + * a single packfile, these arrays each contain a single
> > + * element.
> > + */
> > + struct ewah_bitmap **commits_all;
> > + struct ewah_bitmap **trees_all;
> > + struct ewah_bitmap **blobs_all;
> > + struct ewah_bitmap **tags_all;
>
> OK, so these are valid only for the top-level of the chain? I guess
> there would not be much point in having the lower levels know about
> their incremental versions.
Right; all of the "useful" computation (counting, traversing,
filtering, etc.) is done at the top-most layer in any chain, so these
have no meaning/purpose for lower layers.
> > -static int load_bitmap(struct repository *r, struct bitmap_index *bitmap_git)
> > +static void load_all_type_bitmaps(struct bitmap_index *bitmap_git)
> > +{
> > + struct bitmap_index *curr = bitmap_git;
> > + size_t i = bitmap_git->base_nr;
> > +
> > + ALLOC_ARRAY(bitmap_git->commits_all, bitmap_git->base_nr + 1);
> > + ALLOC_ARRAY(bitmap_git->trees_all, bitmap_git->base_nr + 1);
> > + ALLOC_ARRAY(bitmap_git->blobs_all, bitmap_git->base_nr + 1);
> > + ALLOC_ARRAY(bitmap_git->tags_all, bitmap_git->base_nr + 1);
> > +
> > + while (curr) {
> > + bitmap_git->commits_all[i] = curr->commits;
> > + bitmap_git->trees_all[i] = curr->trees;
> > + bitmap_git->blobs_all[i] = curr->blobs;
> > + bitmap_git->tags_all[i] = curr->tags;
> > +
> > + curr = curr->base;
> > + if (curr && !i)
> > + BUG("unexpected number of bitmap layers, expected %"PRIu32,
> > + bitmap_git->base_nr + 1);
> > + i -= 1;
> > + }
> > +}
>
> It looks like we always allocate these. For the non-incremental case, I
> think you could just do:
>
> bitmap_git->commits_all = &bitmap_git->commits;
>
> and so forth. But I doubt that micro-optimization really matters, and it
> introduces complications when you have to decide whether to free them or
> not.
The complications aren't actually too bad... I think you just have to
avoid free()-ing them when you have a NULL 'base' pointer. I think
it would look something like:
--- 8< ---
diff --git a/pack-bitmap.c b/pack-bitmap.c
index 2270a646f6..8a530fa7d8 100644
--- a/pack-bitmap.c
+++ b/pack-bitmap.c
@@ -604,6 +604,15 @@ static void load_all_type_bitmaps(struct bitmap_index *bitmap_git)
struct bitmap_index *curr = bitmap_git;
size_t i = bitmap_git->base_nr;
+ if (!bitmap_git->base) {
+ bitmap_git->commits_all = &bitmap_git->commits;
+ bitmap_git->trees_all = &bitmap_git->trees;
+ bitmap_git->blobs_all = &bitmap_git->blobs;
+ bitmap_git->tags_all = &bitmap_git->tags;
+
+ return;
+ }
+
ALLOC_ARRAY(bitmap_git->commits_all, bitmap_git->base_nr + 1);
ALLOC_ARRAY(bitmap_git->trees_all, bitmap_git->base_nr + 1);
ALLOC_ARRAY(bitmap_git->blobs_all, bitmap_git->base_nr + 1);
@@ -3031,10 +3040,12 @@ void free_bitmap_index(struct bitmap_index *b)
ewah_pool_free(b->trees);
ewah_pool_free(b->blobs);
ewah_pool_free(b->tags);
- free(b->commits_all);
- free(b->trees_all);
- free(b->blobs_all);
- free(b->tags_all);
+ if (b->base) {
+ free(b->commits_all);
+ free(b->trees_all);
+ free(b->blobs_all);
+ free(b->tags_all);
+ }
if (b->bitmaps) {
struct stored_bitmap *sb;
kh_foreach_value(b->bitmaps, sb, {
--- >8 ---
, but it leaves a bad taste in my mouth tying the NULL-ness of
'bitmap_git->base' to the allocation/freeing of these arrays. Maybe I'm
being paranoid, but it feels like a potential landmine.
> (And if you really cared about micro-optimizing, probably trying to
> prevent the extra pointer-chase in the first place would be a more
> productive path).
Yup.
Thanks,
Taylor
^ permalink raw reply related [flat|nested] 136+ messages in thread
* Re: [PATCH v4 11/13] pack-bitmap.c: keep track of each layer's type bitmaps
2025-03-18 6:43 ` Elijah Newren
@ 2025-03-19 0:39 ` Taylor Blau
0 siblings, 0 replies; 136+ messages in thread
From: Taylor Blau @ 2025-03-19 0:39 UTC (permalink / raw)
To: Elijah Newren; +Cc: git, Jeff King, Junio C Hamano, Patrick Steinhardt
On Mon, Mar 17, 2025 at 11:43:47PM -0700, Elijah Newren wrote:
> > + *
> > + * When either associated either with a non-incremental MIDX, or
> > + * a single packfile, these arrays each contain a single
> > + * element.
> > + */
>
> Drop the first "either", and the first comma?
Good catch, thanks!
Thanks,
Taylor
^ permalink raw reply [flat|nested] 136+ messages in thread
* Re: [PATCH v4 12/13] pack-bitmap.c: use `ewah_or_iterator` for type bitmap iterators
2025-03-18 2:05 ` Jeff King
@ 2025-03-19 23:02 ` Taylor Blau
0 siblings, 0 replies; 136+ messages in thread
From: Taylor Blau @ 2025-03-19 23:02 UTC (permalink / raw)
To: Jeff King; +Cc: git, Elijah Newren, Junio C Hamano, Patrick Steinhardt
On Mon, Mar 17, 2025 at 10:05:26PM -0400, Jeff King wrote:
> On Fri, Mar 14, 2025 at 04:18:56PM -0400, Taylor Blau wrote:
>
> > -static void init_type_iterator(struct ewah_iterator *it,
> > +static void init_type_iterator(struct ewah_or_iterator *it,
> > struct bitmap_index *bitmap_git,
> > enum object_type type)
> > {
> > switch (type) {
> > case OBJ_COMMIT:
> > - ewah_iterator_init(it, bitmap_git->commits);
> > + ewah_or_iterator_init(it, bitmap_git->commits_all,
> > + bitmap_git->base_nr + 1);
>
> This certainly makes sense. It looks like we now use the or_iterator
> unconditionally, even for non-layered queries. It's probably a little
> slower in practice, just because it's an extra layer of indirection. But
> I don't know if trying to micro-optimize here is worth it. In general
> I'd say no, but sometimes there are surprising tight loops with bitmaps.
>
> I dunno. I guess it would be easy enough to do a simple before/after
> benchmark on a single packfile with this series. I wouldn't expect it to
> find anything, but might not hurt to double check.
We're adding one extra allocation, and one extra function call on each
_next() iteration. So I think we should be OK
here, and indeed...
$ git for-each-ref --format='%(objectname)' refs/heads refs/tags >in
$ hyperfine -L v ,.compile 'git{v} pack-objects --stdout --delta-base-offset --use-bitmap-index --revs <in >/dev/null'
Benchmark 1: git pack-objects --stdout --delta-base-offset --use-bitmap-index --revs <in >/dev/null
Time (mean ± σ): 1.715 s ± 0.026 s [User: 4.353 s, System: 0.206 s]
Range (min … max): 1.692 s … 1.785 s 10 runs
Benchmark 2: git.compile pack-objects --stdout --delta-base-offset --use-bitmap-index --revs <in >/dev/null
Time (mean ± σ): 1.712 s ± 0.021 s [User: 4.401 s, System: 0.223 s]
Range (min … max): 1.676 s … 1.749 s 10 runs
Summary
git.compile pack-objects --stdout --delta-base-offset --use-bitmap-index --revs <in >/dev/null ran
1.00 ± 0.02 times faster than git pack-objects --stdout --delta-base-offset --use-bitmap-index --revs <in >/dev/null
Looks like we're about the same. 'git' here is the latest release, and
'git.compile' is this commit compiled with -O2 for an apples-to-apples
comparison.
In this benchmark it looks like this build is maybe ~3ms faster than the
stock build, but we're well within the noise range here, so I don't
think there's any statistical significance to the results.
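To illustrate where that per-_next() cost comes from, here is a minimal sketch of the OR-ing idea. It is not the ewah_or_iterator from the series: it iterates over plain uncompressed word arrays rather than EWAH-compressed bitmaps, and all of the names are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical iterator over an uncompressed bitmap, for illustration. */
struct word_iter_sketch {
	const uint64_t *words;
	size_t nr, pos;
};

/* Yield the next word, or return 0 when the bitmap is exhausted. */
static int word_iter_next(uint64_t *next, struct word_iter_sketch *it)
{
	if (it->pos >= it->nr)
		return 0;
	*next = it->words[it->pos++];
	return 1;
}

struct or_iter_sketch {
	struct word_iter_sketch *its; /* one iterator per layer */
	size_t nr;
};

/*
 * Advance every underlying iterator by one word and OR the results.
 * Returns 0 once all of them are exhausted.
 */
static int or_iter_next(uint64_t *next, struct or_iter_sketch *it)
{
	uint64_t buf, out = 0;
	size_t i;
	int ret = 0;

	for (i = 0; i < it->nr; i++) {
		if (word_iter_next(&buf, &it->its[i])) {
			out |= buf;
			ret = 1;
		}
	}
	*next = out;
	return ret;
}
```

With a single layer the loop body runs exactly once per word, so the only overhead over a plain iterator is the extra call and the array indirection, which is consistent with the numbers above being within noise.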
Thanks,
Taylor
^ permalink raw reply [flat|nested] 136+ messages in thread
* Re: [PATCH v4 13/13] midx: implement writing incremental MIDX bitmaps
2025-03-18 2:16 ` Jeff King
@ 2025-03-20 0:14 ` Taylor Blau
0 siblings, 0 replies; 136+ messages in thread
From: Taylor Blau @ 2025-03-20 0:14 UTC (permalink / raw)
To: Jeff King; +Cc: git, Elijah Newren, Junio C Hamano, Patrick Steinhardt
On Mon, Mar 17, 2025 at 10:16:05PM -0400, Jeff King wrote:
> On Fri, Mar 14, 2025 at 04:19:00PM -0400, Taylor Blau wrote:
>
> > +write_midx_layer () {
> > + n=1
> > + if test -f $midx_chain
> > + then
> > + n="$(($(wc -l <$midx_chain) + 1))"
> > + fi
> > +
> > + for i in 1 2
> > + do
> > + test_commit $n.$i &&
> > + git repack -d || return 1
> > + done &&
> > + git multi-pack-index write --bitmap --incremental
> > +}
> > +
> > +test_expect_success 'write initial MIDX layer' '
> > + git repack -ad &&
> > + write_midx_layer
> > +'
> > +
> > +test_expect_success 'read bitmap from first MIDX layer' '
> > + git rev-list --test-bitmap 1.2
> > +'
> > +
> > +test_expect_success 'write another MIDX layer' '
> > + write_midx_layer
> > +'
> > +
> > +test_expect_success 'midx verify with multiple layers' '
> > + git multi-pack-index verify
> > +'
>
> Perhaps a silly suggestion, but do you want to confirm in one of these
> tests that there are in fact multiple layers of bitmaps? (I expect it to
> be true, but just trying to cover all bases in the test).
I don't think it's a silly suggestion. As you note, we do implicitly
check it further down, but doing something like the following
test_path_is_file "$midx_chain" &&
test_line_count = 2 "$midx_chain" &&
explicitly before calling 'git multi-pack-index verify' would be nice to
have.
Thanks,
Taylor
^ permalink raw reply [flat|nested] 136+ messages in thread
* Re: [PATCH v4 13/13] midx: implement writing incremental MIDX bitmaps
2025-03-18 17:13 ` Elijah Newren
@ 2025-03-20 0:16 ` Taylor Blau
0 siblings, 0 replies; 136+ messages in thread
From: Taylor Blau @ 2025-03-20 0:16 UTC (permalink / raw)
To: Elijah Newren; +Cc: git, Jeff King, Junio C Hamano, Patrick Steinhardt
On Tue, Mar 18, 2025 at 10:13:03AM -0700, Elijah Newren wrote:
> On Fri, Mar 14, 2025 at 1:19 PM Taylor Blau <me@ttaylorr.com> wrote:
>
> [...]
> > diff --git a/ewah/ewah_bitmap.c b/ewah/ewah_bitmap.c
> > index e92341b8fa..056c410efb 100644
> > --- a/ewah/ewah_bitmap.c
> > +++ b/ewah/ewah_bitmap.c
> > @@ -399,7 +399,7 @@ int ewah_or_iterator_next(eword_t *next, struct ewah_or_iterator *it)
> > return ret;
> > }
> >
> > -void ewah_or_iterator_free(struct ewah_or_iterator *it)
> > +void ewah_or_iterator_release(struct ewah_or_iterator *it)
> > {
> > free(it->its);
> > }
> > diff --git a/ewah/ewok.h b/ewah/ewok.h
> > index 4b70641045..c29d354236 100644
> > --- a/ewah/ewok.h
> > +++ b/ewah/ewok.h
> > @@ -158,7 +158,7 @@ void ewah_or_iterator_init(struct ewah_or_iterator *it,
> >
> > int ewah_or_iterator_next(eword_t *next, struct ewah_or_iterator *it);
> >
> > -void ewah_or_iterator_free(struct ewah_or_iterator *it);
> > +void ewah_or_iterator_release(struct ewah_or_iterator *it);
>
> Was the rename from these last two hunks squashed into the wrong
> patch? Since you're not changing its definition, I'm assuming the
> updated name should have been applied to when it was introduced.
Hah! I knew that I made this change, so I was confused in
https://lore.kernel.org/git/Z9oQ4moLVKh3+vul@nand.local/
when it didn't show up in that patch.
It got rebased out of my local copy of this patch automatically since I
had manually applied the rename to the earlier patch while responding to
that review comment.
Thanks,
Taylor
^ permalink raw reply [flat|nested] 136+ messages in thread
* Re: [PATCH v4 00/13] midx: incremental multi-pack indexes, part two
2025-03-18 2:21 ` [PATCH v4 00/13] midx: incremental multi-pack indexes, part two Jeff King
@ 2025-03-20 0:18 ` Taylor Blau
0 siblings, 0 replies; 136+ messages in thread
From: Taylor Blau @ 2025-03-20 0:18 UTC (permalink / raw)
To: Jeff King; +Cc: git, Elijah Newren, Junio C Hamano, Patrick Steinhardt
On Mon, Mar 17, 2025 at 10:21:34PM -0400, Jeff King wrote:
> On Fri, Mar 14, 2025 at 04:18:12PM -0400, Taylor Blau wrote:
>
> > This is a new round of my series to implement bitmap support for
> > incremental multi-pack indexes (MIDXs). It has been rebased on current
> > 'master', which is 683c54c999 (Git 2.49, 2025-03-14) at the time of
> > writing.
>
> I read over this and didn't find anything objectionable (I left a few
> comments here and there). I think I've said this before with big
> bitmap/midx series: the biggest issue is that it's hard to know what you
> might have missed. Especially in terms of corner cases. So it all looks
> reasonable to me (including the overall design), but ultimately I think
> it's more fruitful to put it through the paces on real-looking data than
> it is to try to go over every inch of the midx code with a fine-tooth
> comb. And I'd guess the eventual fate here is for this code to get
> exercise on GitHub, which would help with that shaking out.
Thanks for reviewing it, and sorry that this series is so dense to begin
with.
I agree that the proof is really in the pudding here, and the best way
to confirm that we squashed everything is by putting real usage through
the new paths and seeing what shakes out.
I think the main things I was hoping for here were:
1. That others thought the overall design and approach are sane, and
that we're not painting ourselves into a corner.
2. That we are unlikely to regress non-incremental bitmap usage.
> So mainly I tried to look for things that might hurt the non-incremental
> cases, and didn't see anything (modulo one or two questions about
> micro-optimizations, though I expect the answer there is "nothing big
> enough to measure"). So if this can progress towards the "shaking out"
> phase, and has the potential to hurt only people who turn on the new
> feature, that seems like a good path to me.
...and that's exactly what I got :-).
I'll send a small reroll shortly that addresses the comments from you
and Elijah (thanks, Elijah!) and I'll look forward to hearing what you
think of that round.
Thanks,
Taylor
^ permalink raw reply [flat|nested] 136+ messages in thread
* [PATCH v5 00/14] midx: incremental multi-pack indexes, part two
2024-08-15 21:01 [PATCH 00/13] midx: incremental multi-pack indexes, part two Taylor Blau
` (15 preceding siblings ...)
2025-03-14 20:18 ` [PATCH v4 " Taylor Blau
@ 2025-03-20 17:56 ` Taylor Blau
2025-03-20 17:56 ` [PATCH v5 01/14] Documentation: remove a "future work" item from the MIDX docs Taylor Blau
` (14 more replies)
16 siblings, 15 replies; 136+ messages in thread
From: Taylor Blau @ 2025-03-20 17:56 UTC (permalink / raw)
To: git; +Cc: Elijah Newren, Jeff King, Junio C Hamano, Patrick Steinhardt
This is another new round of my series to implement bitmap support for
incremental multi-pack indexes (MIDXs). It is still based on 683c54c999
(Git 2.49, 2025-03-14).
== Changes since last time
This round addresses thorough review from Elijah and Peff. The series
substantively is unchanged, but there are lots of little quality-of-life
and commit message readability improvements throughout. As usual, there
is a range-diff below for convenience.
== Original cover letter
This series is based on 'master', with an additional merge between
tb/incremental-midx-part-1[1] and my newer series to fix a handful of
bugs related to pseudo-merge bitmaps[2].
This is the second of three series to implement support for incremental
multi-pack indexes (MIDXs). This series brings support for bitmaps that
are tied to incremental MIDXs in addition to regular MIDX bitmaps.
The details are laid out in the commits themselves, but the high-level
approach is as follows:
- Each layer in the incremental MIDX chain has its own corresponding
*.bitmap file. Each bitmap contains commits / pseudo-merges which
are selected only from the commits in that layer. Likewise, only
that layer's objects are written in the type-level bitmaps.
- The reachability traversal is only conducted on the top-most bitmap
corresponding to the most recent layer in the incremental MIDX
chain. Earlier layers may be consulted to retrieve commit /
pseudo-merge reachability bitmaps, but only the top-most bitmap's
"result" and "haves" fields are used.
- In essence, the top-most bitmap is the only one that "matters", and
earlier bitmaps are merely used to look up commit and pseudo-merge
bitmaps from that layer.
- Whenever we need to look at the type-level bitmaps corresponding to
the whole incremental MIDX chain, a new "ewah_or_iterator" is used.
This works in concept like a typical ewah_iterator, except that it
works over many EWAH bitmaps in parallel, OR-ing their results together
before returning them to the user.
In effect, this allows us to treat the union of all type-level
bitmaps (each of which only stores information about the objects in its
corresponding layer within the incremental MIDX chain) as a single
type-level bitmap corresponding to all of the objects across every
layer of the incremental MIDX chain.
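To make the OR-ing idea above concrete, here is a minimal sketch. It is
not the code from this series: it assumes plain, uncompressed 64-bit
words instead of EWAH-compressed bitmaps, and the names `word_iter`,
`or_iter`, `word_iter_next()` and `or_iter_next()` are hypothetical
stand-ins chosen for illustration only.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for one layer's (uncompressed) type bitmap. */
struct word_iter {
	const uint64_t *words;
	size_t nr, pos;
};

/* Hypothetical analogue of an iterator over several layers at once. */
struct or_iter {
	struct word_iter *its;
	size_t nr;
};

/* Yield the next word of a single layer; returns 0 when exhausted. */
int word_iter_next(uint64_t *next, struct word_iter *it)
{
	if (it->pos >= it->nr)
		return 0;
	*next = it->words[it->pos++];
	return 1;
}

/*
 * Yield the OR of the next word from every layer. Returns 0 only once
 * all layers are exhausted; an exhausted layer contributes no bits.
 */
int or_iter_next(uint64_t *next, struct or_iter *it)
{
	uint64_t buf, out = 0;
	size_t i;
	int ret = 0;

	assert(next && it);
	for (i = 0; i < it->nr; i++) {
		if (word_iter_next(&buf, &it->its[i])) {
			out |= buf;
			ret = 1;
		}
	}
	if (ret)
		*next = out;
	return ret;
}
```

Because each layer only sets bits for its own objects, OR-ing the words
position by position behaves like reading one type-level bitmap that
spans the whole chain, without materializing that combined bitmap.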
The sum total of this series is that we are able to append new commits /
pseudo-merges to a repository's reachability bitmaps without having to
rewrite existing bitmaps, making the operation much cheaper to perform
in large repositories.
The series is laid out roughly as follows:
- The first patch describes the technical details of incremental MIDX
bitmaps.
- The second patch adjusts the pack-revindex internals to prepare for
incremental MIDX bitmaps.
- The next seven patches adjust various components of the pack-bitmap
internals to do the same.
- The next three patches introduce and adjust callers to use the
ewah_or_iterator (as above).
- The final patch implements writing incremental MIDX bitmaps, and
introduces tests.
After this series, the remaining goals for this project include being
able to compact contiguous runs of incremental MIDX layers into a single
layer to support growing the chain of MIDX layers without the chain
itself becoming too long.
Thanks in advance for your review!
[1]: https://lore.kernel.org/git/cover.1722958595.git.me@ttaylorr.com/
[2]: https://lore.kernel.org/git/cover.1723743050.git.me@ttaylorr.com/
Taylor Blau (14):
Documentation: remove a "future work" item from the MIDX docs
Documentation: describe incremental MIDX bitmaps
pack-revindex: prepare for incremental MIDX bitmaps
pack-bitmap.c: open and store incremental bitmap layers
pack-bitmap.c: teach `bitmap_for_commit()` about incremental MIDXs
pack-bitmap.c: teach `show_objects_for_type()` about incremental MIDXs
pack-bitmap.c: support bitmap pack-reuse with incremental MIDXs
pack-bitmap.c: teach `rev-list --test-bitmap` about incremental MIDXs
pack-bitmap.c: compute disk-usage with incremental MIDXs
pack-bitmap.c: apply pseudo-merge commits with incremental MIDXs
ewah: implement `struct ewah_or_iterator`
pack-bitmap.c: keep track of each layer's type bitmaps
pack-bitmap.c: use `ewah_or_iterator` for type bitmap iterators
midx: implement writing incremental MIDX bitmaps
Documentation/technical/multi-pack-index.adoc | 82 ++++-
builtin/pack-objects.c | 3 +-
ewah/ewah_bitmap.c | 33 ++
ewah/ewok.h | 12 +
midx-write.c | 57 ++-
pack-bitmap-write.c | 65 +++-
pack-bitmap.c | 344 ++++++++++++++----
pack-bitmap.h | 4 +-
pack-revindex.c | 34 +-
t/t5334-incremental-multi-pack-index.sh | 87 +++++
10 files changed, 589 insertions(+), 132 deletions(-)
Range-diff against v4:
-: ---------- > 1: 6af65fdaac Documentation: remove a "future work" item from the MIDX docs
1: f565f2fff1 ! 2: 0897359506 Documentation: describe incremental MIDX bitmaps
@@ Documentation/technical/multi-pack-index.adoc: objects_nr($H2) + objects_nr($H1)
+`o1` and `o2` are compared as follows:
+
+1. If `o1` appears in an earlier layer of the MIDX chain than `o2`, then
-+ `o1` is considered less than `o2`.
++ `o1` sorts ahead of `o2`.
+
+2. Otherwise, if `o1` and `o2` appear in the same MIDX layer, and that
+ MIDX layer has no base, then if one of `pack(o1)` and `pack(o2)` is
-+ preferred and the other is not, then the preferred one sorts first. If
-+ there is a base layer (i.e. the MIDX layer is not the first layer in
-+ the chain), then if `pack(o1)` appears earlier in that MIDX layer's
-+ pack order, than `o1` is less than `o2`. Likewise if `pack(o2)`
-+ appears earlier, than the opposite is true.
++ preferred and the other is not, then the preferred one sorts ahead of
++ the non-preferred one. If there is a base layer (i.e. the MIDX layer
++ is not the first layer in the chain), then if `pack(o1)` appears
++ earlier in that MIDX layer's pack order, then `o1` sorts ahead of
++ `o2`. Likewise if `pack(o2)` appears earlier, then the opposite is
++ true.
+
+3. Otherwise, `o1` and `o2` appear in the same pack, and thus in the
+ same MIDX layer. Sort `o1` and `o2` by their offset within their
@@ Documentation/technical/multi-pack-index.adoc: objects_nr($H2) + objects_nr($H1)
+The structure of a `*.bitmap` file belonging to an incremental MIDX
+chain is identical to that of a non-incremental MIDX bitmap, or a
+classic single-pack bitmap. Since objects are added to the end of the
-+incremental MIDX's pseudo-pack order (see: above), it is possible to
++incremental MIDX's pseudo-pack order (see above), it is possible to
+extend a bitmap when appending to the end of a MIDX chain.
+
+(Note: it is possible likewise to compress a contiguous sequence of MIDX
-+incremental layers, and their `*.bitmap`(s) into a single layer and
++incremental layers, and their `*.bitmap` files into a single layer and
+`*.bitmap`, but this is not yet implemented.)
+
+The object positions used are global within the pseudo-pack order, so
2: f2a232e556 ! 3: 5eac0d1485 pack-revindex: prepare for incremental MIDX bitmaps
@@ Commit message
incremental or not.
- pack_pos_to_midx() and midx_to_pack_pos() now both take in a global
- object position in the MIDX pseudo-pack order, and finds the
+ object position in the MIDX pseudo-pack order, and find the
earliest containing MIDX (similar to midx.c::midx_for_object().
- midx_pack_order_cmp() adjusts its call to pack_pos_to_midx() by the
@@ pack-bitmap.c: static struct ewah_bitmap *read_bitmap_1(struct bitmap_index *ind
return read_bitmap(index->map, index->map_size, &index->map_pos);
}
-+static uint32_t bitmap_non_extended_bits(struct bitmap_index *index)
++static uint32_t bitmap_num_objects_total(struct bitmap_index *index)
+{
+ if (index->midx) {
+ struct multi_pack_index *m = index->midx;
@@ pack-bitmap.c: static inline int bitmap_position_extended(struct bitmap_index *b
if (pos < kh_end(positions)) {
int bitmap_pos = kh_value(positions, pos);
- return bitmap_pos + bitmap_num_objects(bitmap_git);
-+ return bitmap_pos + bitmap_non_extended_bits(bitmap_git);
++ return bitmap_pos + bitmap_num_objects_total(bitmap_git);
}
return -1;
@@ pack-bitmap.c: static int ext_index_add_object(struct bitmap_index *bitmap_git,
}
- return bitmap_pos + bitmap_num_objects(bitmap_git);
-+ return bitmap_pos + bitmap_non_extended_bits(bitmap_git);
++ return bitmap_pos + bitmap_num_objects_total(bitmap_git);
}
struct bitmap_show_data {
@@ pack-bitmap.c: static void show_extended_objects(struct bitmap_index *bitmap_git
- if (!bitmap_get(objects, st_add(bitmap_num_objects(bitmap_git), i)))
+ if (!bitmap_get(objects,
-+ st_add(bitmap_non_extended_bits(bitmap_git), i)))
++ st_add(bitmap_num_objects_total(bitmap_git),
++ i)))
continue;
obj = eindex->objects[i];
@@ pack-bitmap.c: static void filter_bitmap_exclude_type(struct bitmap_index *bitma
*/
for (i = 0; i < eindex->count; i++) {
- size_t pos = st_add(i, bitmap_num_objects(bitmap_git));
-+ size_t pos = st_add(i, bitmap_non_extended_bits(bitmap_git));
++ size_t pos = st_add(i, bitmap_num_objects_total(bitmap_git));
if (eindex->objects[i]->type == type &&
bitmap_get(to_filter, pos) &&
!bitmap_get(tips, pos))
@@ pack-bitmap.c: static unsigned long get_size_by_pos(struct bitmap_index *bitmap_
oi.sizep = &size;
- if (pos < bitmap_num_objects(bitmap_git)) {
-+ if (pos < bitmap_non_extended_bits(bitmap_git)) {
++ if (pos < bitmap_num_objects_total(bitmap_git)) {
struct packed_git *pack;
off_t ofs;
@@ pack-bitmap.c: static unsigned long get_size_by_pos(struct bitmap_index *bitmap_git,
+ die(_("unable to get size of %s"), oid_to_hex(&oid));
}
} else {
++ size_t eindex_pos = pos - bitmap_num_objects_total(bitmap_git);
struct eindex *eindex = &bitmap_git->ext_index;
- struct object *obj = eindex->objects[pos - bitmap_num_objects(bitmap_git)];
-+ struct object *obj = eindex->objects[pos - bitmap_non_extended_bits(bitmap_git)];
++ struct object *obj = eindex->objects[eindex_pos];
if (oid_object_info_extended(bitmap_repo(bitmap_git), &obj->oid,
&oi, 0) < 0)
die(_("unable to get size of %s"), oid_to_hex(&obj->oid));
@@ pack-bitmap.c: static void filter_packed_objects_from_bitmap(struct bitmap_index
size_t i, pos;
- objects_nr = bitmap_num_objects(bitmap_git);
-+ objects_nr = bitmap_non_extended_bits(bitmap_git);
++ objects_nr = bitmap_num_objects_total(bitmap_git);
pos = objects_nr / BITS_IN_EWORD;
if (pos > result->word_alloc)
@@ pack-bitmap.c: static uint32_t count_object_type(struct bitmap_index *bitmap_git
if (eindex->objects[i]->type == type &&
bitmap_get(objects,
- st_add(bitmap_num_objects(bitmap_git), i)))
-+ st_add(bitmap_non_extended_bits(bitmap_git), i)))
++ st_add(bitmap_num_objects_total(bitmap_git), i)))
count++;
}
@@ pack-bitmap.c: uint32_t *create_bitmap_mapping(struct bitmap_index *bitmap_git,
"extension");
- num_objects = bitmap_num_objects(bitmap_git);
-+ num_objects = bitmap_non_extended_bits(bitmap_git);
++ num_objects = bitmap_num_objects_total(bitmap_git);
CALLOC_ARRAY(reposition, num_objects);
for (i = 0; i < num_objects; ++i) {
@@ pack-bitmap.c: static off_t get_disk_usage_for_extended(struct bitmap_index *bit
if (!bitmap_get(result,
- st_add(bitmap_num_objects(bitmap_git), i)))
-+ st_add(bitmap_non_extended_bits(bitmap_git), i)))
++ st_add(bitmap_num_objects_total(bitmap_git),
++ i)))
continue;
if (oid_object_info_extended(bitmap_repo(bitmap_git), &obj->oid,
3: aca0318fb1 ! 4: 922ea2f607 pack-bitmap.c: open and store incremental bitmap layers
@@ Commit message
with the previous MIDX layer.
The changes in this commit are mostly boilerplate to open the correct
- bitmap(s), add them to the chain bitmap layers along the "base" pointer,
- ensures that the correct packs and their reverse indexes are loaded
- across MIDX layers, etc.
+ bitmap(s), add them to the chain of bitmap layers along the "base"
+ pointer, ensure that the correct packs and their reverse indexes are
+ loaded across MIDX layers, etc.
While we're at it, keep track of a base_nr field to indicate how many
bitmap layers (including the current bitmap) exist. This will be used in
4: 832fd0e8dc = 5: 8fedd96614 pack-bitmap.c: teach `bitmap_for_commit()` about incremental MIDXs
5: c7c9f89956 = 6: dccc1b2d2e pack-bitmap.c: teach `show_objects_for_type()` about incremental MIDXs
6: 14d3d80c3d ! 7: e31bddd240 pack-bitmap.c: support bitmap pack-reuse with incremental MIDXs
@@ Commit message
Likewise, in reuse_partial_packfile_from_bitmap(), when reusing only a
single pack from a MIDX, use the oldest layer's preferred pack as it is
- likely to contain the most amount of reusable sections.
+ likely to contain the largest number of reusable sections.
Signed-off-by: Taylor Blau <me@ttaylorr.com>
7: b45a9ccbc2 ! 8: d9dfcb5a1b pack-bitmap.c: teach `rev-list --test-bitmap` about incremental MIDXs
@@ pack-bitmap.c: static void test_show_commit(struct commit *commit, void *data)
+ tdata->tags = ewah_to_bitmap(bitmap_git->tags);
+
+ if (bitmap_git->base) {
-+ CALLOC_ARRAY(tdata->base_tdata, 1);
++ tdata->base_tdata = xmalloc(sizeof(struct bitmap_test_data));
+ bitmap_test_data_prepare(tdata->base_tdata, bitmap_git->base);
+ }
+}
8: c1eefeae99 = 9: b1bd60d25d pack-bitmap.c: compute disk-usage with incremental MIDXs
9: 11c4b7b949 = 10: 7477a8ac03 pack-bitmap.c: apply pseudo-merge commits with incremental MIDXs
10: cb08ad6a62 ! 11: 0fbef17acc ewah: implement `struct ewah_or_iterator`
@@ ewah/ewah_bitmap.c: void ewah_iterator_init(struct ewah_iterator *it, struct ewa
+ return ret;
+}
+
-+void ewah_or_iterator_free(struct ewah_or_iterator *it)
++void ewah_or_iterator_release(struct ewah_or_iterator *it)
+{
+ free(it->its);
+}
@@ ewah/ewok.h: void ewah_iterator_init(struct ewah_iterator *it, struct ewah_bitma
+
+int ewah_or_iterator_next(eword_t *next, struct ewah_or_iterator *it);
+
-+void ewah_or_iterator_free(struct ewah_or_iterator *it);
++void ewah_or_iterator_release(struct ewah_or_iterator *it);
+
void ewah_xor(
struct ewah_bitmap *ewah_i,
11: a29f4ee60d ! 12: 439e743fd5 pack-bitmap.c: keep track of each layer's type bitmaps
@@ pack-bitmap.c: struct bitmap_index {
+ * commits_all[n-2] is the commits field of this bitmap's
+ * 'base', and so on.
+ *
-+ * When either associated either with a non-incremental MIDX, or
-+ * a single packfile, these arrays each contain a single
-+ * element.
++ * When associated either with a non-incremental MIDX or a
++ * single packfile, these arrays each contain a single element.
+ */
+ struct ewah_bitmap **commits_all;
+ struct ewah_bitmap **trees_all;
12: a1cf65bedc ! 13: dcb45e349e pack-bitmap.c: use `ewah_or_iterator` for type bitmap iterators
@@ pack-bitmap.c: static void show_objects_for_type(
}
}
+
-+ ewah_or_iterator_free(&it);
++ ewah_or_iterator_release(&it);
}
static int in_bitmapped_pack(struct bitmap_index *bitmap_git,
@@ pack-bitmap.c: static void filter_bitmap_exclude_type(struct bitmap_index *bitma
bitmap_unset(to_filter, pos);
}
-+ ewah_or_iterator_free(&it);
++ ewah_or_iterator_release(&it);
bitmap_free(tips);
}
@@ pack-bitmap.c: static void filter_bitmap_blob_limit(struct bitmap_index *bitmap_
bitmap_unset(to_filter, pos);
}
-+ ewah_or_iterator_free(&it);
++ ewah_or_iterator_release(&it);
bitmap_free(tips);
}
@@ pack-bitmap.c: static uint32_t count_object_type(struct bitmap_index *bitmap_git
count++;
}
-+ ewah_or_iterator_free(&it);
++ ewah_or_iterator_release(&it);
+
return count;
}
@@ pack-bitmap.c: static off_t get_disk_usage_for_type(struct bitmap_index *bitmap_
}
}
-+ ewah_or_iterator_free(&it);
++ ewah_or_iterator_release(&it);
+
return total;
}
13: d0d564685b ! 14: 13568cfa3b midx: implement writing incremental MIDX bitmaps
@@ builtin/pack-objects.c: static void write_pack_file(void)
bitmap_writer_build_type_index(&bitmap_writer,
written_list);
- ## ewah/ewah_bitmap.c ##
-@@ ewah/ewah_bitmap.c: int ewah_or_iterator_next(eword_t *next, struct ewah_or_iterator *it)
- return ret;
- }
-
--void ewah_or_iterator_free(struct ewah_or_iterator *it)
-+void ewah_or_iterator_release(struct ewah_or_iterator *it)
- {
- free(it->its);
- }
-
- ## ewah/ewok.h ##
-@@ ewah/ewok.h: void ewah_or_iterator_init(struct ewah_or_iterator *it,
-
- int ewah_or_iterator_next(eword_t *next, struct ewah_or_iterator *it);
-
--void ewah_or_iterator_free(struct ewah_or_iterator *it);
-+void ewah_or_iterator_release(struct ewah_or_iterator *it);
-
- void ewah_xor(
- struct ewah_bitmap *ewah_i,
-
## midx-write.c ##
@@ midx-write.c: static uint32_t *midx_pack_order(struct write_midx_context *ctx)
return pack_order;
@@ pack-bitmap-write.c: void bitmap_writer_finish(struct bitmap_writer *writer,
write_selected_commits_v1(writer, f, offsets);
- ## pack-bitmap.c ##
-@@ pack-bitmap.c: static void show_objects_for_type(
- }
- }
-
-- ewah_or_iterator_free(&it);
-+ ewah_or_iterator_release(&it);
- }
-
- static int in_bitmapped_pack(struct bitmap_index *bitmap_git,
-@@ pack-bitmap.c: static void filter_bitmap_exclude_type(struct bitmap_index *bitmap_git,
- bitmap_unset(to_filter, pos);
- }
-
-- ewah_or_iterator_free(&it);
-+ ewah_or_iterator_release(&it);
- bitmap_free(tips);
- }
-
-@@ pack-bitmap.c: static void filter_bitmap_blob_limit(struct bitmap_index *bitmap_git,
- bitmap_unset(to_filter, pos);
- }
-
-- ewah_or_iterator_free(&it);
-+ ewah_or_iterator_release(&it);
- bitmap_free(tips);
- }
-
-@@ pack-bitmap.c: static uint32_t count_object_type(struct bitmap_index *bitmap_git,
- count++;
- }
-
-- ewah_or_iterator_free(&it);
-+ ewah_or_iterator_release(&it);
-
- return count;
- }
-@@ pack-bitmap.c: static off_t get_disk_usage_for_type(struct bitmap_index *bitmap_git,
- }
- }
-
-- ewah_or_iterator_free(&it);
-+ ewah_or_iterator_release(&it);
-
- return total;
- }
-
## pack-bitmap.h ##
@@ pack-bitmap.h: struct bitmap_writer {
@@ t/t5334-incremental-multi-pack-index.sh: test_expect_success 'convert incrementa
+'
+
+test_expect_success 'midx verify with multiple layers' '
++ test_path_is_file "$midx_chain" &&
++ test_line_count = 2 "$midx_chain" &&
++
+ git multi-pack-index verify
+'
+
base-commit: 683c54c999c301c2cd6f715c411407c413b1d84e
--
2.49.0.14.g88b49c1b34
^ permalink raw reply [flat|nested] 136+ messages in thread
* [PATCH v5 01/14] Documentation: remove a "future work" item from the MIDX docs
2025-03-20 17:56 ` [PATCH v5 00/14] " Taylor Blau
@ 2025-03-20 17:56 ` Taylor Blau
2025-03-20 17:56 ` [PATCH v5 02/14] Documentation: describe incremental MIDX bitmaps Taylor Blau
` (13 subsequent siblings)
14 siblings, 0 replies; 136+ messages in thread
From: Taylor Blau @ 2025-03-20 17:56 UTC (permalink / raw)
To: git; +Cc: Elijah Newren, Jeff King, Junio C Hamano, Patrick Steinhardt
One of the items listed as "future work" in the MIDX's technical
documentation is to extend the format to allow MIDXs to be written
incrementally across multiple layers.
This was suggested all the way back in ceab693d1f (multi-pack-index: add
design document, 2018-07-12), and implemented in b9497848df (Merge
branch 'tb/incremental-midx-part-1', 2024-08-19). Let's remove it
accordingly.
Signed-off-by: Taylor Blau <me@ttaylorr.com>
---
Documentation/technical/multi-pack-index.adoc | 10 ----------
1 file changed, 10 deletions(-)
diff --git a/Documentation/technical/multi-pack-index.adoc b/Documentation/technical/multi-pack-index.adoc
index cc063b30be..dea6486f88 100644
--- a/Documentation/technical/multi-pack-index.adoc
+++ b/Documentation/technical/multi-pack-index.adoc
@@ -167,16 +167,6 @@ m->num_objects_in_base`).
Future Work
-----------
-- The multi-pack-index allows many packfiles, especially in a context
- where repacking is expensive (such as a very large repo), or
- unexpected maintenance time is unacceptable (such as a high-demand
- build machine). However, the multi-pack-index needs to be rewritten
- in full every time. We can extend the format to be incremental, so
- writes are fast. By storing a small "tip" multi-pack-index that
- points to large "base" MIDX files, we can keep writes fast while
- still reducing the number of binary searches required for object
- lookups.
-
- If the multi-pack-index is extended to store a "stable object order"
(a function Order(hash) = integer that is constant for a given hash,
even as the multi-pack-index is updated) then MIDX bitmaps could be
--
2.49.0.14.g88b49c1b34
^ permalink raw reply related [flat|nested] 136+ messages in thread
* [PATCH v5 02/14] Documentation: describe incremental MIDX bitmaps
2025-03-20 17:56 ` [PATCH v5 00/14] " Taylor Blau
2025-03-20 17:56 ` [PATCH v5 01/14] Documentation: remove a "future work" item from the MIDX docs Taylor Blau
@ 2025-03-20 17:56 ` Taylor Blau
2025-03-20 17:56 ` [PATCH v5 03/14] pack-revindex: prepare for " Taylor Blau
` (12 subsequent siblings)
14 siblings, 0 replies; 136+ messages in thread
From: Taylor Blau @ 2025-03-20 17:56 UTC (permalink / raw)
To: git; +Cc: Elijah Newren, Jeff King, Junio C Hamano, Patrick Steinhardt
Prepare to implement support for reachability bitmaps for the new
incremental multi-pack index (MIDX) feature over the following commits.
This commit begins by first describing the relevant format and usage
details for incremental MIDX bitmaps.
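As a rough illustration of the pseudo-pack ordering rules that the
documentation in this patch describes, the comparison could be sketched
as below. This is not code from the series: `struct obj_entry` and
`pseudo_pack_cmp()` are hypothetical, and the fallback to pack order
within the first layer when the preferred-pack rule does not decide is
an assumption based on the existing non-incremental definition.

```c
#include <stdint.h>

/* Hypothetical description of where one object lives. */
struct obj_entry {
	uint32_t layer;     /* 0 is the oldest layer in the MIDX chain */
	uint32_t pack;      /* position of the pack in its layer's pack order */
	int pack_preferred; /* non-zero if that pack is the preferred pack */
	uint64_t offset;    /* offset of the object within its pack */
};

/* Returns <0 if 'a' sorts ahead of 'b', >0 if after, 0 if identical. */
int pseudo_pack_cmp(const struct obj_entry *a, const struct obj_entry *b)
{
	/* Rule 1: objects in earlier layers sort first. */
	if (a->layer != b->layer)
		return a->layer < b->layer ? -1 : 1;

	/* Rule 2: only a layer with no base consults the preferred pack. */
	if (a->layer == 0 && a->pack_preferred != b->pack_preferred)
		return a->pack_preferred ? -1 : 1;

	/* Otherwise (assumed for layer 0), fall back to pack order. */
	if (a->pack != b->pack)
		return a->pack < b->pack ? -1 : 1;

	/* Rule 3: same pack, so order by offset within the pack. */
	if (a->offset != b->offset)
		return a->offset < b->offset ? -1 : 1;
	return 0;
}
```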
Signed-off-by: Taylor Blau <me@ttaylorr.com>
---
Documentation/technical/multi-pack-index.adoc | 72 +++++++++++++++++++
1 file changed, 72 insertions(+)
diff --git a/Documentation/technical/multi-pack-index.adoc b/Documentation/technical/multi-pack-index.adoc
index dea6486f88..ffda70aa13 100644
--- a/Documentation/technical/multi-pack-index.adoc
+++ b/Documentation/technical/multi-pack-index.adoc
@@ -164,6 +164,78 @@ objects_nr($H2) + objects_nr($H1) + i
(in the C implementation, this is often computed as `i +
m->num_objects_in_base`).
+=== Pseudo-pack order for incremental MIDXs
+
+The original implementation of multi-pack reachability bitmaps defined
+the pseudo-pack order in linkgit:gitformat-pack[5] (see the section
+titled "multi-pack-index reverse indexes") roughly as follows:
+
+____
+In short, a MIDX's pseudo-pack is the de-duplicated concatenation of
+objects in packs stored by the MIDX, laid out in pack order, and the
+packs arranged in MIDX order (with the preferred pack coming first).
+____
+
+In the incremental MIDX design, we extend this definition to include
+objects from multiple layers of the MIDX chain. The pseudo-pack order
+for incremental MIDXs is determined by concatenating the pseudo-pack
+ordering for each layer of the MIDX chain in order. Formally two objects
+`o1` and `o2` are compared as follows:
+
+1. If `o1` appears in an earlier layer of the MIDX chain than `o2`, then
+ `o1` sorts ahead of `o2`.
+
+2. Otherwise, if `o1` and `o2` appear in the same MIDX layer, and that
+ MIDX layer has no base, then if one of `pack(o1)` and `pack(o2)` is
+ preferred and the other is not, then the preferred one sorts ahead of
+ the non-preferred one. If there is a base layer (i.e. the MIDX layer
+ is not the first layer in the chain), then if `pack(o1)` appears
+ earlier in that MIDX layer's pack order, then `o1` sorts ahead of
+ `o2`. Likewise if `pack(o2)` appears earlier, then the opposite is
+ true.
+
+3. Otherwise, `o1` and `o2` appear in the same pack, and thus in the
+ same MIDX layer. Sort `o1` and `o2` by their offset within their
+ containing packfile.
+
+Note that the preferred pack is a property of the MIDX chain, not the
+individual layers themselves. Fundamentally we could introduce a
+per-layer preferred pack, but this is less relevant now that we can
+perform multi-pack reuse across the set of packs in a MIDX.
+
+=== Reachability bitmaps and incremental MIDXs
+
+Each layer of an incremental MIDX chain may have its objects (and the
+objects from any previous layer in the same MIDX chain) represented in
+its own `*.bitmap` file.
+
+The structure of a `*.bitmap` file belonging to an incremental MIDX
+chain is identical to that of a non-incremental MIDX bitmap, or a
+classic single-pack bitmap. Since objects are added to the end of the
+incremental MIDX's pseudo-pack order (see above), it is possible to
+extend a bitmap when appending to the end of a MIDX chain.
+
+(Note: it is possible likewise to compress a contiguous sequence of MIDX
+incremental layers, and their `*.bitmap` files into a single layer and
+`*.bitmap`, but this is not yet implemented.)
+
+The object positions used are global within the pseudo-pack order, so
+subsequent layers will have, for example, `m->num_objects_in_base`
+number of `0` bits in each of their four type bitmaps. This follows from
+the fact that we only write type bitmap entries for objects present in
+the layer immediately corresponding to the bitmap.
+
+Note also that only the bitmap pertaining to the most recent layer in an
+incremental MIDX chain is used to store reachability information about
+the interesting and uninteresting objects in a reachability query.
+Earlier bitmap layers are only used to look up commit and pseudo-merge
+bitmaps from that layer, as well as the type-level bitmaps for objects
+in that layer.
+
+To simplify the implementation, type-level bitmaps are iterated
+simultaneously, and their results are OR'd together to avoid recursively
+calling internal bitmap functions.
+
Future Work
-----------
--
2.49.0.14.g88b49c1b34
^ permalink raw reply related [flat|nested] 136+ messages in thread
* [PATCH v5 03/14] pack-revindex: prepare for incremental MIDX bitmaps
2025-03-20 17:56 ` [PATCH v5 00/14] " Taylor Blau
2025-03-20 17:56 ` [PATCH v5 01/14] Documentation: remove a "future work" item from the MIDX docs Taylor Blau
2025-03-20 17:56 ` [PATCH v5 02/14] Documentation: describe incremental MIDX bitmaps Taylor Blau
@ 2025-03-20 17:56 ` Taylor Blau
2025-03-20 17:56 ` [PATCH v5 04/14] pack-bitmap.c: open and store incremental bitmap layers Taylor Blau
` (11 subsequent siblings)
14 siblings, 0 replies; 136+ messages in thread
From: Taylor Blau @ 2025-03-20 17:56 UTC (permalink / raw)
To: git; +Cc: Elijah Newren, Jeff King, Junio C Hamano, Patrick Steinhardt
Prepare the reverse index machinery to handle object lookups in an
incremental MIDX bitmap. These changes are broken out across a few
functions:
- load_midx_revindex() learns to use the appropriate MIDX filename
depending on whether the given 'struct multi_pack_index *' is
incremental or not.
- pack_pos_to_midx() and midx_to_pack_pos() now both take in a global
object position in the MIDX pseudo-pack order, and find the
earliest containing MIDX (similar to midx.c::midx_for_object()).
- midx_pack_order_cmp() adjusts its call to pack_pos_to_midx() by the
number of objects in the base (since 'vb - midx->revindex_data' is
relative to the containing MIDX, and pack_pos_to_midx() expects a
global position).
Likewise, this function adjusts its output by adding
m->num_objects_in_base to return a global position out through the
`*pos` pointer.
Together, these changes are sufficient to use the multi-pack index's
reverse index format for incremental multi-pack reachability bitmaps.
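The translation between global and layer-relative positions described
above can be sketched roughly as follows. This is a simplified
illustration rather than the code in this patch: `struct layer`,
`layer_for_pos()` and `global_pos()` are hypothetical names, and the
struct keeps only the fields that matter for the walk.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, pared-down stand-in for 'struct multi_pack_index'. */
struct layer {
	uint32_t num_objects;         /* objects in this layer only */
	uint32_t num_objects_in_base; /* objects in all earlier layers */
	struct layer *base;           /* previous layer, or NULL */
};

/*
 * Starting from the tip, find the layer containing the global position
 * 'pos' and store the layer-relative position in '*local'. Returns
 * NULL when 'pos' is out of bounds.
 */
struct layer *layer_for_pos(struct layer *tip, uint32_t pos, uint32_t *local)
{
	struct layer *l = tip;

	assert(local);
	if (!l || pos >= l->num_objects + l->num_objects_in_base)
		return NULL;
	/* Walk towards the base until 'pos' falls inside this layer. */
	while (l && pos < l->num_objects_in_base)
		l = l->base;
	if (!l)
		return NULL;
	*local = pos - l->num_objects_in_base;
	return l;
}

/* Convert a layer-relative position back into a global one. */
uint32_t global_pos(const struct layer *l, uint32_t local)
{
	return local + l->num_objects_in_base;
}
```

With three layers holding 10, 5 and 3 objects, for example, global
position 12 lands in the middle layer at relative position 2, and
adding that layer's base count of 10 recovers the global position.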
Signed-off-by: Taylor Blau <me@ttaylorr.com>
---
pack-bitmap.c | 43 +++++++++++++++++++++++++++++++------------
pack-revindex.c | 34 +++++++++++++++++++++++++---------
2 files changed, 56 insertions(+), 21 deletions(-)
diff --git a/pack-bitmap.c b/pack-bitmap.c
index 6406953d32..87f3b5cf4d 100644
--- a/pack-bitmap.c
+++ b/pack-bitmap.c
@@ -170,6 +170,15 @@ static struct ewah_bitmap *read_bitmap_1(struct bitmap_index *index)
return read_bitmap(index->map, index->map_size, &index->map_pos);
}
+static uint32_t bitmap_num_objects_total(struct bitmap_index *index)
+{
+ if (index->midx) {
+ struct multi_pack_index *m = index->midx;
+ return m->num_objects + m->num_objects_in_base;
+ }
+ return index->pack->num_objects;
+}
+
static uint32_t bitmap_num_objects(struct bitmap_index *index)
{
if (index->midx)
@@ -924,7 +933,7 @@ static inline int bitmap_position_extended(struct bitmap_index *bitmap_git,
if (pos < kh_end(positions)) {
int bitmap_pos = kh_value(positions, pos);
- return bitmap_pos + bitmap_num_objects(bitmap_git);
+ return bitmap_pos + bitmap_num_objects_total(bitmap_git);
}
return -1;
@@ -992,7 +1001,7 @@ static int ext_index_add_object(struct bitmap_index *bitmap_git,
bitmap_pos = kh_value(eindex->positions, hash_pos);
}
- return bitmap_pos + bitmap_num_objects(bitmap_git);
+ return bitmap_pos + bitmap_num_objects_total(bitmap_git);
}
struct bitmap_show_data {
@@ -1342,11 +1351,17 @@ struct ewah_bitmap *pseudo_merge_bitmap_for_commit(struct bitmap_index *bitmap_g
if (pos < 0 || pos >= bitmap_num_objects(bitmap_git))
goto done;
+ /*
+ * Use bitmap-relative positions instead of offsetting
+ * by bitmap_git->num_objects_in_base because we use
+ * this to find a match in pseudo_merge_for_parents(),
+ * and pseudo-merge groups cannot span multiple bitmap
+ * layers.
+ */
bitmap_set(parents, pos);
}
- match = pseudo_merge_for_parents(&bitmap_git->pseudo_merges,
- parents);
+ match = pseudo_merge_for_parents(&bitmap_git->pseudo_merges, parents);
done:
bitmap_free(parents);
@@ -1500,7 +1515,9 @@ static void show_extended_objects(struct bitmap_index *bitmap_git,
for (i = 0; i < eindex->count; ++i) {
struct object *obj;
- if (!bitmap_get(objects, st_add(bitmap_num_objects(bitmap_git), i)))
+ if (!bitmap_get(objects,
+ st_add(bitmap_num_objects_total(bitmap_git),
+ i)))
continue;
obj = eindex->objects[i];
@@ -1679,7 +1696,7 @@ static void filter_bitmap_exclude_type(struct bitmap_index *bitmap_git,
* them individually.
*/
for (i = 0; i < eindex->count; i++) {
- size_t pos = st_add(i, bitmap_num_objects(bitmap_git));
+ size_t pos = st_add(i, bitmap_num_objects_total(bitmap_git));
if (eindex->objects[i]->type == type &&
bitmap_get(to_filter, pos) &&
!bitmap_get(tips, pos))
@@ -1705,7 +1722,7 @@ static unsigned long get_size_by_pos(struct bitmap_index *bitmap_git,
oi.sizep = &size;
- if (pos < bitmap_num_objects(bitmap_git)) {
+ if (pos < bitmap_num_objects_total(bitmap_git)) {
struct packed_git *pack;
off_t ofs;
@@ -1728,8 +1745,9 @@ static unsigned long get_size_by_pos(struct bitmap_index *bitmap_git,
die(_("unable to get size of %s"), oid_to_hex(&oid));
}
} else {
+ size_t eindex_pos = pos - bitmap_num_objects_total(bitmap_git);
struct eindex *eindex = &bitmap_git->ext_index;
- struct object *obj = eindex->objects[pos - bitmap_num_objects(bitmap_git)];
+ struct object *obj = eindex->objects[eindex_pos];
if (oid_object_info_extended(bitmap_repo(bitmap_git), &obj->oid,
&oi, 0) < 0)
die(_("unable to get size of %s"), oid_to_hex(&obj->oid));
@@ -1882,7 +1900,7 @@ static void filter_packed_objects_from_bitmap(struct bitmap_index *bitmap_git,
uint32_t objects_nr;
size_t i, pos;
- objects_nr = bitmap_num_objects(bitmap_git);
+ objects_nr = bitmap_num_objects_total(bitmap_git);
pos = objects_nr / BITS_IN_EWORD;
if (pos > result->word_alloc)
@@ -2419,7 +2437,7 @@ static uint32_t count_object_type(struct bitmap_index *bitmap_git,
for (i = 0; i < eindex->count; ++i) {
if (eindex->objects[i]->type == type &&
bitmap_get(objects,
- st_add(bitmap_num_objects(bitmap_git), i)))
+ st_add(bitmap_num_objects_total(bitmap_git), i)))
count++;
}
@@ -2820,7 +2838,7 @@ uint32_t *create_bitmap_mapping(struct bitmap_index *bitmap_git,
BUG("rebuild_existing_bitmaps: missing required rev-cache "
"extension");
- num_objects = bitmap_num_objects(bitmap_git);
+ num_objects = bitmap_num_objects_total(bitmap_git);
CALLOC_ARRAY(reposition, num_objects);
for (i = 0; i < num_objects; ++i) {
@@ -2963,7 +2981,8 @@ static off_t get_disk_usage_for_extended(struct bitmap_index *bitmap_git)
struct object *obj = eindex->objects[i];
if (!bitmap_get(result,
- st_add(bitmap_num_objects(bitmap_git), i)))
+ st_add(bitmap_num_objects_total(bitmap_git),
+ i)))
continue;
if (oid_object_info_extended(bitmap_repo(bitmap_git), &obj->oid,
diff --git a/pack-revindex.c b/pack-revindex.c
index d3832478d9..d3faab6a37 100644
--- a/pack-revindex.c
+++ b/pack-revindex.c
@@ -383,8 +383,14 @@ int load_midx_revindex(struct multi_pack_index *m)
trace2_data_string("load_midx_revindex", the_repository,
"source", "rev");
- get_midx_filename_ext(m->repo->hash_algo, &revindex_name, m->object_dir,
- get_midx_checksum(m), MIDX_EXT_REV);
+ if (m->has_chain)
+ get_split_midx_filename_ext(m->repo->hash_algo, &revindex_name,
+ m->object_dir, get_midx_checksum(m),
+ MIDX_EXT_REV);
+ else
+ get_midx_filename_ext(m->repo->hash_algo, &revindex_name,
+ m->object_dir, get_midx_checksum(m),
+ MIDX_EXT_REV);
ret = load_revindex_from_disk(revindex_name.buf,
m->num_objects,
@@ -471,11 +477,15 @@ off_t pack_pos_to_offset(struct packed_git *p, uint32_t pos)
uint32_t pack_pos_to_midx(struct multi_pack_index *m, uint32_t pos)
{
+ while (m && pos < m->num_objects_in_base)
+ m = m->base_midx;
+ if (!m)
+ BUG("NULL multi-pack-index for object position: %"PRIu32, pos);
if (!m->revindex_data)
BUG("pack_pos_to_midx: reverse index not yet loaded");
- if (m->num_objects <= pos)
+ if (m->num_objects + m->num_objects_in_base <= pos)
BUG("pack_pos_to_midx: out-of-bounds object at %"PRIu32, pos);
- return get_be32(m->revindex_data + pos);
+ return get_be32(m->revindex_data + pos - m->num_objects_in_base);
}
struct midx_pack_key {
@@ -491,7 +501,8 @@ static int midx_pack_order_cmp(const void *va, const void *vb)
const struct midx_pack_key *key = va;
struct multi_pack_index *midx = key->midx;
- uint32_t versus = pack_pos_to_midx(midx, (uint32_t*)vb - (const uint32_t *)midx->revindex_data);
+ size_t pos = (uint32_t *)vb - (const uint32_t *)midx->revindex_data;
+ uint32_t versus = pack_pos_to_midx(midx, pos + midx->num_objects_in_base);
uint32_t versus_pack = nth_midxed_pack_int_id(midx, versus);
off_t versus_offset;
@@ -529,9 +540,9 @@ static int midx_key_to_pack_pos(struct multi_pack_index *m,
{
uint32_t *found;
- if (key->pack >= m->num_packs)
+ if (key->pack >= m->num_packs + m->num_packs_in_base)
BUG("MIDX pack lookup out of bounds (%"PRIu32" >= %"PRIu32")",
- key->pack, m->num_packs);
+ key->pack, m->num_packs + m->num_packs_in_base);
/*
* The preferred pack sorts first, so determine its identifier by
* looking at the first object in pseudo-pack order.
@@ -551,7 +562,8 @@ static int midx_key_to_pack_pos(struct multi_pack_index *m,
if (!found)
return -1;
- *pos = found - m->revindex_data;
+ *pos = (found - m->revindex_data) + m->num_objects_in_base;
+
return 0;
}
@@ -559,9 +571,13 @@ int midx_to_pack_pos(struct multi_pack_index *m, uint32_t at, uint32_t *pos)
{
struct midx_pack_key key;
+ while (m && at < m->num_objects_in_base)
+ m = m->base_midx;
+ if (!m)
+ BUG("NULL multi-pack-index for object position: %"PRIu32, at);
if (!m->revindex_data)
BUG("midx_to_pack_pos: reverse index not yet loaded");
- if (m->num_objects <= at)
+ if (m->num_objects + m->num_objects_in_base <= at)
BUG("midx_to_pack_pos: out-of-bounds object at %"PRIu32, at);
key.pack = nth_midxed_pack_int_id(m, at);
--
2.49.0.14.g88b49c1b34
^ permalink raw reply related [flat|nested] 136+ messages in thread
* [PATCH v5 04/14] pack-bitmap.c: open and store incremental bitmap layers
2025-03-20 17:56 ` [PATCH v5 00/14] " Taylor Blau
` (2 preceding siblings ...)
2025-03-20 17:56 ` [PATCH v5 03/14] pack-revindex: prepare for " Taylor Blau
@ 2025-03-20 17:56 ` Taylor Blau
2025-03-20 17:56 ` [PATCH v5 05/14] pack-bitmap.c: teach `bitmap_for_commit()` about incremental MIDXs Taylor Blau
` (10 subsequent siblings)
14 siblings, 0 replies; 136+ messages in thread
From: Taylor Blau @ 2025-03-20 17:56 UTC (permalink / raw)
To: git; +Cc: Elijah Newren, Jeff King, Junio C Hamano, Patrick Steinhardt
Prepare the pack-bitmap machinery to work with incremental MIDXs by
adding a new "base" field to keep track of the bitmap index associated
with the previous MIDX layer.
The changes in this commit are mostly boilerplate to open the correct
bitmap(s), add them to the chain of bitmap layers along the "base"
pointer, ensure that the correct packs and their reverse indexes are
loaded across MIDX layers, etc.
While we're at it, keep track of a base_nr field to indicate how many
bitmap layers precede the current bitmap (it is zero when there is no
base layer). This will be used in a future commit to allocate an array
of 'struct ewah_bitmap' pointers to collect all of the respective type
bitmaps among all layers to initialize a multi-EWAH iterator.
Subsequent commits will teach the functions within the pack-bitmap
machinery how to interact with these new fields.
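As a rough illustration of the bookkeeping described above, here is a
minimal sketch, assuming base_nr counts the layers that precede the
current one (as the comment on the new field says). The names
'struct layer', layer_push(), layer_count(), and layer_free() are
hypothetical and are not part of pack-bitmap.c:

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical stand-in for one bitmap layer; not git's struct bitmap_index. */
struct layer {
	struct layer *base; /* previous (older) layer, or NULL */
	uint32_t base_nr;   /* number of layers that precede this one */
};

/* Stack a new layer on top of 'base', which may be NULL. */
static struct layer *layer_push(struct layer *base)
{
	struct layer *l = calloc(1, sizeof(*l));
	assert(l); /* sketch only: real code would report the failure */
	l->base = base;
	l->base_nr = base ? base->base_nr + 1 : 0;
	return l;
}

/*
 * Number of layers in the chain ending at 'top', i.e. how many slots
 * an array holding one type bitmap per layer would need.
 */
static uint32_t layer_count(const struct layer *top)
{
	return top ? top->base_nr + 1 : 0;
}

/* Free the whole chain, newest layer first. */
static void layer_free(struct layer *top)
{
	while (top) {
		struct layer *base = top->base;
		free(top);
		top = base;
	}
}
```

Under that assumption, an array sized for every layer in the chain
needs base_nr + 1 entries, which is what layer_count() returns.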
Signed-off-by: Taylor Blau <me@ttaylorr.com>
---
pack-bitmap.c | 62 +++++++++++++++++++++++++++++++++++++++------------
1 file changed, 48 insertions(+), 14 deletions(-)
diff --git a/pack-bitmap.c b/pack-bitmap.c
index 87f3b5cf4d..e84211de15 100644
--- a/pack-bitmap.c
+++ b/pack-bitmap.c
@@ -54,6 +54,16 @@ struct bitmap_index {
struct packed_git *pack;
struct multi_pack_index *midx;
+ /*
+ * If using a multi-pack index chain, 'base' points to the
+ * bitmap index corresponding to this bitmap's midx->base_midx.
+ *
+ * base_nr indicates how many layers precede this one, and is
+ * zero when base is NULL.
+ */
+ struct bitmap_index *base;
+ uint32_t base_nr;
+
/* mmapped buffer of the whole bitmap index */
unsigned char *map;
size_t map_size; /* size of the mmaped buffer */
@@ -386,8 +396,15 @@ static int load_bitmap_entries_v1(struct bitmap_index *index)
char *midx_bitmap_filename(struct multi_pack_index *midx)
{
struct strbuf buf = STRBUF_INIT;
- get_midx_filename_ext(midx->repo->hash_algo, &buf, midx->object_dir,
- get_midx_checksum(midx), MIDX_EXT_BITMAP);
+ if (midx->has_chain)
+ get_split_midx_filename_ext(midx->repo->hash_algo, &buf,
+ midx->object_dir,
+ get_midx_checksum(midx),
+ MIDX_EXT_BITMAP);
+ else
+ get_midx_filename_ext(midx->repo->hash_algo, &buf,
+ midx->object_dir, get_midx_checksum(midx),
+ MIDX_EXT_BITMAP);
return strbuf_detach(&buf, NULL);
}
@@ -454,16 +471,21 @@ static int open_midx_bitmap_1(struct bitmap_index *bitmap_git,
goto cleanup;
}
- for (i = 0; i < bitmap_git->midx->num_packs; i++) {
- if (prepare_midx_pack(bitmap_repo(bitmap_git),
- bitmap_git->midx,
- i)) {
+ for (i = 0; i < bitmap_git->midx->num_packs + bitmap_git->midx->num_packs_in_base; i++) {
+ if (prepare_midx_pack(bitmap_repo(bitmap_git), bitmap_git->midx, i)) {
warning(_("could not open pack %s"),
bitmap_git->midx->pack_names[i]);
goto cleanup;
}
}
+ if (midx->base_midx) {
+ bitmap_git->base = prepare_midx_bitmap_git(midx->base_midx);
+ bitmap_git->base_nr = bitmap_git->base->base_nr + 1;
+ } else {
+ bitmap_git->base_nr = 0;
+ }
+
return 0;
cleanup:
@@ -515,6 +537,7 @@ static int open_pack_bitmap_1(struct bitmap_index *bitmap_git, struct packed_git
bitmap_git->map_size = xsize_t(st.st_size);
bitmap_git->map = xmmap(NULL, bitmap_git->map_size, PROT_READ, MAP_PRIVATE, fd, 0);
bitmap_git->map_pos = 0;
+ bitmap_git->base_nr = 0;
close(fd);
if (load_bitmap_header(bitmap_git) < 0) {
@@ -534,8 +557,7 @@ static int open_pack_bitmap_1(struct bitmap_index *bitmap_git, struct packed_git
static int load_reverse_index(struct repository *r, struct bitmap_index *bitmap_git)
{
if (bitmap_is_midx(bitmap_git)) {
- uint32_t i;
- int ret;
+ struct multi_pack_index *m;
/*
* The multi-pack-index's .rev file is already loaded via
@@ -544,10 +566,15 @@ static int load_reverse_index(struct repository *r, struct bitmap_index *bitmap_
* But we still need to open the individual pack .rev files,
* since we will need to make use of them in pack-objects.
*/
- for (i = 0; i < bitmap_git->midx->num_packs; i++) {
- ret = load_pack_revindex(r, bitmap_git->midx->packs[i]);
- if (ret)
- return ret;
+ for (m = bitmap_git->midx; m; m = m->base_midx) {
+ uint32_t i;
+ int ret;
+
+ for (i = 0; i < m->num_packs; i++) {
+ ret = load_pack_revindex(r, m->packs[i]);
+ if (ret)
+ return ret;
+ }
}
return 0;
}
@@ -573,6 +600,13 @@ static int load_bitmap(struct repository *r, struct bitmap_index *bitmap_git)
if (!bitmap_git->table_lookup && load_bitmap_entries_v1(bitmap_git) < 0)
goto failed;
+ if (bitmap_git->base) {
+ if (!bitmap_is_midx(bitmap_git))
+ BUG("non-MIDX bitmap has non-NULL base bitmap index");
+ if (load_bitmap(r, bitmap_git->base) < 0)
+ goto failed;
+ }
+
return 0;
failed:
@@ -657,10 +691,9 @@ struct bitmap_index *prepare_bitmap_git(struct repository *r)
struct bitmap_index *prepare_midx_bitmap_git(struct multi_pack_index *midx)
{
- struct repository *r = midx->repo;
struct bitmap_index *bitmap_git = xcalloc(1, sizeof(*bitmap_git));
- if (!open_midx_bitmap_1(bitmap_git, midx) && !load_bitmap(r, bitmap_git))
+ if (!open_midx_bitmap_1(bitmap_git, midx))
return bitmap_git;
free_bitmap_index(bitmap_git);
@@ -2901,6 +2934,7 @@ void free_bitmap_index(struct bitmap_index *b)
close_midx_revindex(b->midx);
}
free_pseudo_merge_map(&b->pseudo_merges);
+ free_bitmap_index(b->base);
free(b);
}
--
2.49.0.14.g88b49c1b34
^ permalink raw reply related [flat|nested] 136+ messages in thread
* [PATCH v5 05/14] pack-bitmap.c: teach `bitmap_for_commit()` about incremental MIDXs
2025-03-20 17:56 ` [PATCH v5 00/14] " Taylor Blau
` (3 preceding siblings ...)
2025-03-20 17:56 ` [PATCH v5 04/14] pack-bitmap.c: open and store incremental bitmap layers Taylor Blau
@ 2025-03-20 17:56 ` Taylor Blau
2025-03-20 17:56 ` [PATCH v5 06/14] pack-bitmap.c: teach `show_objects_for_type()` " Taylor Blau
` (9 subsequent siblings)
14 siblings, 0 replies; 136+ messages in thread
From: Taylor Blau @ 2025-03-20 17:56 UTC (permalink / raw)
To: git; +Cc: Elijah Newren, Jeff King, Junio C Hamano, Patrick Steinhardt
The pack-bitmap machinery uses `bitmap_for_commit()` to locate the
EWAH-compressed bitmap corresponding to some given commit object.
Teach this function about incremental MIDX bitmaps by teaching it to
recur on earlier bitmap layers when it fails to find a given commit in
the current layer.
The changes to do so are as follows:
- Avoid initializing hash_pos at its declaration, since
bitmap_for_commit() is now a recursive function and may receive a
NULL bitmap_index pointer as its first argument.
- In cases where we would previously return NULL (to indicate that a
lookup failed and the given bitmap_index does not contain an entry
corresponding to the given commit), recursively call the function on
the previous bitmap layer.
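The fallback described in the two points above can be sketched in
isolation as follows. This is only an illustration of the recursion
pattern: 'struct entry', 'struct layer', and layer_lookup() are
hypothetical, and a linear scan stands in for the real hash-table and
lookup-table searches:

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical key/value pair standing in for a commit and its bitmap. */
struct entry {
	const char *key;
	int value;
};

/* Hypothetical layer: its own entries plus a pointer to the older layer. */
struct layer {
	const struct entry *entries;
	size_t nr;
	const struct layer *base;
};

/*
 * Look up 'key' in layer 'l', falling back to earlier layers on a
 * miss. A NULL layer means we ran off the end of the chain, so the
 * lookup fails.
 */
static const int *layer_lookup(const struct layer *l, const char *key)
{
	size_t i;

	assert(key);
	if (!l)
		return NULL;
	for (i = 0; i < l->nr; i++)
		if (!strcmp(l->entries[i].key, key))
			return &l->entries[i].value;
	/* Where we used to give up, try the previous layer instead. */
	return layer_lookup(l->base, key);
}
```

Checking for a NULL layer before touching any of its fields is what
allows every "give up" path to simply recurse on the base pointer.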
Signed-off-by: Taylor Blau <me@ttaylorr.com>
---
pack-bitmap.c | 11 +++++++----
1 file changed, 7 insertions(+), 4 deletions(-)
diff --git a/pack-bitmap.c b/pack-bitmap.c
index e84211de15..17f1087fba 100644
--- a/pack-bitmap.c
+++ b/pack-bitmap.c
@@ -941,18 +941,21 @@ static struct stored_bitmap *lazy_bitmap_for_commit(struct bitmap_index *bitmap_
struct ewah_bitmap *bitmap_for_commit(struct bitmap_index *bitmap_git,
struct commit *commit)
{
- khiter_t hash_pos = kh_get_oid_map(bitmap_git->bitmaps,
- commit->object.oid);
+ khiter_t hash_pos;
+ if (!bitmap_git)
+ return NULL;
+
+ hash_pos = kh_get_oid_map(bitmap_git->bitmaps, commit->object.oid);
if (hash_pos >= kh_end(bitmap_git->bitmaps)) {
struct stored_bitmap *bitmap = NULL;
if (!bitmap_git->table_lookup)
- return NULL;
+ return bitmap_for_commit(bitmap_git->base, commit);
/* this is a fairly hot codepath - no trace2_region please */
/* NEEDSWORK: cache misses aren't recorded */
bitmap = lazy_bitmap_for_commit(bitmap_git, commit);
if (!bitmap)
- return NULL;
+ return bitmap_for_commit(bitmap_git->base, commit);
return lookup_stored_bitmap(bitmap);
}
return lookup_stored_bitmap(kh_value(bitmap_git->bitmaps, hash_pos));
--
2.49.0.14.g88b49c1b34
^ permalink raw reply related [flat|nested] 136+ messages in thread
* [PATCH v5 06/14] pack-bitmap.c: teach `show_objects_for_type()` about incremental MIDXs
2025-03-20 17:56 ` [PATCH v5 00/14] " Taylor Blau
` (4 preceding siblings ...)
2025-03-20 17:56 ` [PATCH v5 05/14] pack-bitmap.c: teach `bitmap_for_commit()` about incremental MIDXs Taylor Blau
@ 2025-03-20 17:56 ` Taylor Blau
2025-03-20 17:56 ` [PATCH v5 07/14] pack-bitmap.c: support bitmap pack-reuse with " Taylor Blau
` (8 subsequent siblings)
14 siblings, 0 replies; 136+ messages in thread
From: Taylor Blau @ 2025-03-20 17:56 UTC (permalink / raw)
To: git; +Cc: Elijah Newren, Jeff King, Junio C Hamano, Patrick Steinhardt
Since we may ask for a pack_id that is in an earlier MIDX layer relative
to the one corresponding to our bitmap, use nth_midxed_pack() instead of
accessing the ->packs array directly.
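To see why direct indexing breaks down, consider a minimal sketch of
resolving a chain-wide pack ID. It assumes each layer stores only its
own packs and records how many packs live in earlier layers;
'struct midx_layer' and layer_nth_pack() are hypothetical names and
this is not the implementation of nth_midxed_pack():

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for one MIDX layer; not git's struct multi_pack_index. */
struct midx_layer {
	const char **packs;         /* packs owned by this layer only */
	uint32_t num_packs;         /* length of 'packs' */
	uint32_t num_packs_in_base; /* packs in all earlier layers combined */
	struct midx_layer *base;    /* previous layer, or NULL */
};

/*
 * Resolve a chain-wide pack ID to the pack that owns it. Indexing
 * top->packs[id] directly is only valid when 'id' belongs to the top
 * layer; IDs below num_packs_in_base live in an earlier layer.
 */
static const char *layer_nth_pack(struct midx_layer *m, uint32_t id)
{
	while (m && id < m->num_packs_in_base)
		m = m->base;
	if (!m)
		return NULL; /* no layer covers this ID */
	assert(id >= m->num_packs_in_base); /* guaranteed by the loop above */
	if (id - m->num_packs_in_base >= m->num_packs)
		return NULL; /* past the end of the chain */
	return m->packs[id - m->num_packs_in_base];
}
```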
Signed-off-by: Taylor Blau <me@ttaylorr.com>
---
pack-bitmap.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/pack-bitmap.c b/pack-bitmap.c
index 17f1087fba..f3ef9e43ef 100644
--- a/pack-bitmap.c
+++ b/pack-bitmap.c
@@ -1636,7 +1636,7 @@ static void show_objects_for_type(
nth_midxed_object_oid(&oid, m, index_pos);
pack_id = nth_midxed_pack_int_id(m, index_pos);
- pack = bitmap_git->midx->packs[pack_id];
+ pack = nth_midxed_pack(bitmap_git->midx, pack_id);
} else {
index_pos = pack_pos_to_index(bitmap_git->pack, pos + offset);
ofs = pack_pos_to_offset(bitmap_git->pack, pos + offset);
--
2.49.0.14.g88b49c1b34
^ permalink raw reply related [flat|nested] 136+ messages in thread
* [PATCH v5 07/14] pack-bitmap.c: support bitmap pack-reuse with incremental MIDXs
2025-03-20 17:56 ` [PATCH v5 00/14] " Taylor Blau
` (5 preceding siblings ...)
2025-03-20 17:56 ` [PATCH v5 06/14] pack-bitmap.c: teach `show_objects_for_type()` " Taylor Blau
@ 2025-03-20 17:56 ` Taylor Blau
2025-03-20 17:56 ` [PATCH v5 08/14] pack-bitmap.c: teach `rev-list --test-bitmap` about " Taylor Blau
` (7 subsequent siblings)
14 siblings, 0 replies; 136+ messages in thread
From: Taylor Blau @ 2025-03-20 17:56 UTC (permalink / raw)
To: git; +Cc: Elijah Newren, Jeff King, Junio C Hamano, Patrick Steinhardt
In a similar fashion to previous commits in the first phase of
incremental MIDXs, enumerate not just the packs in the current
incremental MIDX layer, but those in previous layers as well.
Likewise, in reuse_partial_packfile_from_bitmap(), when reusing only a
single pack from a MIDX, use the oldest layer's preferred pack as it is
likely to contain the largest number of reusable sections.
Signed-off-by: Taylor Blau <me@ttaylorr.com>
---
pack-bitmap.c | 11 ++++++++---
1 file changed, 8 insertions(+), 3 deletions(-)
diff --git a/pack-bitmap.c b/pack-bitmap.c
index f3ef9e43ef..5ff1bbfd54 100644
--- a/pack-bitmap.c
+++ b/pack-bitmap.c
@@ -2335,7 +2335,8 @@ void reuse_partial_packfile_from_bitmap(struct bitmap_index *bitmap_git,
multi_pack_reuse = 0;
if (multi_pack_reuse) {
- for (i = 0; i < bitmap_git->midx->num_packs; i++) {
+ struct multi_pack_index *m = bitmap_git->midx;
+ for (i = 0; i < m->num_packs + m->num_packs_in_base; i++) {
struct bitmapped_pack pack;
if (nth_bitmapped_pack(r, bitmap_git->midx, &pack, i) < 0) {
warning(_("unable to load pack: '%s', disabling pack-reuse"),
@@ -2361,14 +2362,18 @@ void reuse_partial_packfile_from_bitmap(struct bitmap_index *bitmap_git,
uint32_t pack_int_id;
if (bitmap_is_midx(bitmap_git)) {
+ struct multi_pack_index *m = bitmap_git->midx;
uint32_t preferred_pack_pos;
- if (midx_preferred_pack(bitmap_git->midx, &preferred_pack_pos) < 0) {
+ while (m->base_midx)
+ m = m->base_midx;
+
+ if (midx_preferred_pack(m, &preferred_pack_pos) < 0) {
warning(_("unable to compute preferred pack, disabling pack-reuse"));
return;
}
- pack = bitmap_git->midx->packs[preferred_pack_pos];
+ pack = nth_midxed_pack(m, preferred_pack_pos);
pack_int_id = preferred_pack_pos;
} else {
pack = bitmap_git->pack;
--
2.49.0.14.g88b49c1b34
^ permalink raw reply related [flat|nested] 136+ messages in thread
* [PATCH v5 08/14] pack-bitmap.c: teach `rev-list --test-bitmap` about incremental MIDXs
2025-03-20 17:56 ` [PATCH v5 00/14] " Taylor Blau
` (6 preceding siblings ...)
2025-03-20 17:56 ` [PATCH v5 07/14] pack-bitmap.c: support bitmap pack-reuse with " Taylor Blau
@ 2025-03-20 17:56 ` Taylor Blau
2025-03-20 17:56 ` Taylor Blau
2025-03-20 17:56 ` [PATCH v5 09/14] pack-bitmap.c: compute disk-usage with " Taylor Blau
` (6 subsequent siblings)
14 siblings, 1 reply; 136+ messages in thread
From: Taylor Blau @ 2025-03-20 17:56 UTC (permalink / raw)
To: git; +Cc: Elijah Newren, Jeff King, Junio C Hamano, Patrick Steinhardt
Implement support for the special `--test-bitmap` mode of `git rev-list`
when using incremental MIDXs.
The bitmap_test_data structure is extended to contain a "base_tdata"
pointer that mirrors the structure of the bitmap chain that it is being
used to test.
When we find a commit to test, we first chase down the ->base_tdata
pointer to find the appropriate bitmap_test_data for the bitmap layer
that the given commit is contained within, and then perform the test on
that bitmap.
In order to implement this, light modifications are made to
bitmap_for_commit() to reimplement it in terms of a new function,
find_bitmap_for_commit(), which fills out a pointer which indicates the
bitmap layer which contains the given commit.
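The mirrored chain of per-layer test data described above can be
sketched as follows. This is a simplified illustration: the names
'struct src_layer', 'struct layer_tdata', tdata_prepare(),
tdata_release(), and tdata_depth() are hypothetical, and a single
heap-allocated integer stands in for the per-layer bitmaps. The head of
the chain is owned by the caller (for example, on the stack), so the
release function frees everything except the head itself:

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for the chain being mirrored. */
struct src_layer {
	struct src_layer *base;
};

/* Hypothetical per-layer test state, one node per source layer. */
struct layer_tdata {
	int *scratch;                   /* stands in for per-layer bitmaps */
	struct layer_tdata *base_tdata; /* state for the previous layer */
};

/* Fill in 't' for layer 'l', then allocate and fill in its bases. */
static void tdata_prepare(struct layer_tdata *t, const struct src_layer *l)
{
	memset(t, 0, sizeof(*t));
	t->scratch = calloc(1, sizeof(*t->scratch));
	assert(t->scratch); /* sketch only: real code would report the failure */
	if (l->base) {
		t->base_tdata = malloc(sizeof(*t->base_tdata));
		assert(t->base_tdata);
		tdata_prepare(t->base_tdata, l->base);
	}
}

/* Release everything hanging off 't', but not 't' itself. */
static void tdata_release(struct layer_tdata *t)
{
	if (!t)
		return;
	tdata_release(t->base_tdata);
	free(t->base_tdata);
	free(t->scratch);
}

/* Count the nodes in the mirrored chain. */
static size_t tdata_depth(const struct layer_tdata *t)
{
	size_t n = 0;
	for (; t; t = t->base_tdata)
		n++;
	return n;
}
```

Zeroing the structure in the prepare step means fields such as a
progress counter start at zero without a separate assignment.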
Signed-off-by: Taylor Blau <me@ttaylorr.com>
---
pack-bitmap.c | 107 ++++++++++++++++++++++++++++++++++++++++----------
1 file changed, 86 insertions(+), 21 deletions(-)
diff --git a/pack-bitmap.c b/pack-bitmap.c
index 5ff1bbfd54..65ad631ce1 100644
--- a/pack-bitmap.c
+++ b/pack-bitmap.c
@@ -938,8 +938,9 @@ static struct stored_bitmap *lazy_bitmap_for_commit(struct bitmap_index *bitmap_
return NULL;
}
-struct ewah_bitmap *bitmap_for_commit(struct bitmap_index *bitmap_git,
- struct commit *commit)
+static struct ewah_bitmap *find_bitmap_for_commit(struct bitmap_index *bitmap_git,
+ struct commit *commit,
+ struct bitmap_index **found)
{
khiter_t hash_pos;
if (!bitmap_git)
@@ -949,18 +950,30 @@ struct ewah_bitmap *bitmap_for_commit(struct bitmap_index *bitmap_git,
if (hash_pos >= kh_end(bitmap_git->bitmaps)) {
struct stored_bitmap *bitmap = NULL;
if (!bitmap_git->table_lookup)
- return bitmap_for_commit(bitmap_git->base, commit);
+ return find_bitmap_for_commit(bitmap_git->base, commit,
+ found);
/* this is a fairly hot codepath - no trace2_region please */
/* NEEDSWORK: cache misses aren't recorded */
bitmap = lazy_bitmap_for_commit(bitmap_git, commit);
if (!bitmap)
- return bitmap_for_commit(bitmap_git->base, commit);
+ return find_bitmap_for_commit(bitmap_git->base, commit,
+ found);
+ if (found)
+ *found = bitmap_git;
return lookup_stored_bitmap(bitmap);
}
+ if (found)
+ *found = bitmap_git;
return lookup_stored_bitmap(kh_value(bitmap_git->bitmaps, hash_pos));
}
+struct ewah_bitmap *bitmap_for_commit(struct bitmap_index *bitmap_git,
+ struct commit *commit)
+{
+ return find_bitmap_for_commit(bitmap_git, commit, NULL);
+}
+
static inline int bitmap_position_extended(struct bitmap_index *bitmap_git,
const struct object_id *oid)
{
@@ -2513,6 +2526,8 @@ struct bitmap_test_data {
struct bitmap *tags;
struct progress *prg;
size_t seen;
+
+ struct bitmap_test_data *base_tdata;
};
static void test_bitmap_type(struct bitmap_test_data *tdata,
@@ -2521,6 +2536,11 @@ static void test_bitmap_type(struct bitmap_test_data *tdata,
enum object_type bitmap_type = OBJ_NONE;
int bitmaps_nr = 0;
+ if (bitmap_is_midx(tdata->bitmap_git)) {
+ while (pos < tdata->bitmap_git->midx->num_objects_in_base)
+ tdata = tdata->base_tdata;
+ }
+
if (bitmap_get(tdata->commits, pos)) {
bitmap_type = OBJ_COMMIT;
bitmaps_nr++;
@@ -2584,13 +2604,57 @@ static void test_show_commit(struct commit *commit, void *data)
display_progress(tdata->prg, ++tdata->seen);
}
+static uint32_t bitmap_total_entry_count(struct bitmap_index *bitmap_git)
+{
+ uint32_t total = 0;
+ do {
+ total = st_add(total, bitmap_git->entry_count);
+ bitmap_git = bitmap_git->base;
+ } while (bitmap_git);
+
+ return total;
+}
+
+static void bitmap_test_data_prepare(struct bitmap_test_data *tdata,
+ struct bitmap_index *bitmap_git)
+{
+ memset(tdata, 0, sizeof(struct bitmap_test_data));
+
+ tdata->bitmap_git = bitmap_git;
+ tdata->base = bitmap_new();
+ tdata->commits = ewah_to_bitmap(bitmap_git->commits);
+ tdata->trees = ewah_to_bitmap(bitmap_git->trees);
+ tdata->blobs = ewah_to_bitmap(bitmap_git->blobs);
+ tdata->tags = ewah_to_bitmap(bitmap_git->tags);
+
+ if (bitmap_git->base) {
+ tdata->base_tdata = xmalloc(sizeof(struct bitmap_test_data));
+ bitmap_test_data_prepare(tdata->base_tdata, bitmap_git->base);
+ }
+}
+
+static void bitmap_test_data_release(struct bitmap_test_data *tdata)
+{
+ if (!tdata)
+ return;
+
+ bitmap_test_data_release(tdata->base_tdata);
+ free(tdata->base_tdata);
+
+ bitmap_free(tdata->base);
+ bitmap_free(tdata->commits);
+ bitmap_free(tdata->trees);
+ bitmap_free(tdata->blobs);
+ bitmap_free(tdata->tags);
+}
+
void test_bitmap_walk(struct rev_info *revs)
{
struct object *root;
struct bitmap *result = NULL;
size_t result_popcnt;
struct bitmap_test_data tdata;
- struct bitmap_index *bitmap_git;
+ struct bitmap_index *bitmap_git, *found;
struct ewah_bitmap *bm;
if (!(bitmap_git = prepare_bitmap_git(revs->repo)))
@@ -2599,17 +2663,28 @@ void test_bitmap_walk(struct rev_info *revs)
if (revs->pending.nr != 1)
die(_("you must specify exactly one commit to test"));
- fprintf_ln(stderr, "Bitmap v%d test (%d entries%s)",
+ fprintf_ln(stderr, "Bitmap v%d test (%d entries%s, %d total)",
bitmap_git->version,
bitmap_git->entry_count,
- bitmap_git->table_lookup ? "" : " loaded");
+ bitmap_git->table_lookup ? "" : " loaded",
+ bitmap_total_entry_count(bitmap_git));
root = revs->pending.objects[0].item;
- bm = bitmap_for_commit(bitmap_git, (struct commit *)root);
+ bm = find_bitmap_for_commit(bitmap_git, (struct commit *)root, &found);
if (bm) {
fprintf_ln(stderr, "Found bitmap for '%s'. %d bits / %08x checksum",
- oid_to_hex(&root->oid), (int)bm->bit_size, ewah_checksum(bm));
+ oid_to_hex(&root->oid),
+ (int)bm->bit_size, ewah_checksum(bm));
+
+ if (bitmap_is_midx(found))
+ fprintf_ln(stderr, "Located via MIDX '%s'.",
+ hash_to_hex_algop(get_midx_checksum(found->midx),
+ revs->repo->hash_algo));
+ else
+ fprintf_ln(stderr, "Located via pack '%s'.",
+ hash_to_hex_algop(found->pack->hash,
+ revs->repo->hash_algo));
result = ewah_to_bitmap(bm);
}
@@ -2626,16 +2701,10 @@ void test_bitmap_walk(struct rev_info *revs)
if (prepare_revision_walk(revs))
die(_("revision walk setup failed"));
- tdata.bitmap_git = bitmap_git;
- tdata.base = bitmap_new();
- tdata.commits = ewah_to_bitmap(bitmap_git->commits);
- tdata.trees = ewah_to_bitmap(bitmap_git->trees);
- tdata.blobs = ewah_to_bitmap(bitmap_git->blobs);
- tdata.tags = ewah_to_bitmap(bitmap_git->tags);
+ bitmap_test_data_prepare(&tdata, bitmap_git);
tdata.prg = start_progress(revs->repo,
"Verifying bitmap entries",
result_popcnt);
- tdata.seen = 0;
traverse_commit_list(revs, &test_show_commit, &test_show_object, &tdata);
@@ -2647,11 +2716,7 @@ void test_bitmap_walk(struct rev_info *revs)
die(_("mismatch in bitmap results"));
bitmap_free(result);
- bitmap_free(tdata.base);
- bitmap_free(tdata.commits);
- bitmap_free(tdata.trees);
- bitmap_free(tdata.blobs);
- bitmap_free(tdata.tags);
+ bitmap_test_data_release(&tdata);
free_bitmap_index(bitmap_git);
}
--
2.49.0.14.g88b49c1b34
^ permalink raw reply related [flat|nested] 136+ messages in thread
* [PATCH v5 09/14] pack-bitmap.c: compute disk-usage with incremental MIDXs
2025-03-20 17:56 ` [PATCH v5 00/14] " Taylor Blau
` (7 preceding siblings ...)
2025-03-20 17:56 ` [PATCH v5 08/14] pack-bitmap.c: teach `rev-list --test-bitmap` about " Taylor Blau
@ 2025-03-20 17:56 ` Taylor Blau
2025-03-20 17:56 ` [PATCH v5 10/14] pack-bitmap.c: apply pseudo-merge commits " Taylor Blau
` (5 subsequent siblings)
14 siblings, 0 replies; 136+ messages in thread
From: Taylor Blau @ 2025-03-20 17:56 UTC (permalink / raw)
To: git; +Cc: Elijah Newren, Jeff King, Junio C Hamano, Patrick Steinhardt
In a similar fashion to previous commits, use nth_midxed_pack() instead
of accessing the MIDX's ->packs array directly to support incremental
MIDXs.
Signed-off-by: Taylor Blau <me@ttaylorr.com>
---
pack-bitmap.c | 4 ++--
1 file changed, 2 insertions(+), 2 deletions(-)
diff --git a/pack-bitmap.c b/pack-bitmap.c
index 65ad631ce1..4086277de8 100644
--- a/pack-bitmap.c
+++ b/pack-bitmap.c
@@ -1779,7 +1779,7 @@ static unsigned long get_size_by_pos(struct bitmap_index *bitmap_git,
uint32_t midx_pos = pack_pos_to_midx(bitmap_git->midx, pos);
uint32_t pack_id = nth_midxed_pack_int_id(bitmap_git->midx, midx_pos);
- pack = bitmap_git->midx->packs[pack_id];
+ pack = nth_midxed_pack(bitmap_git->midx, pack_id);
ofs = nth_midxed_offset(bitmap_git->midx, midx_pos);
} else {
pack = bitmap_git->pack;
@@ -3049,7 +3049,7 @@ static off_t get_disk_usage_for_type(struct bitmap_index *bitmap_git,
off_t offset = nth_midxed_offset(bitmap_git->midx, midx_pos);
uint32_t pack_id = nth_midxed_pack_int_id(bitmap_git->midx, midx_pos);
- struct packed_git *pack = bitmap_git->midx->packs[pack_id];
+ struct packed_git *pack = nth_midxed_pack(bitmap_git->midx, pack_id);
if (offset_to_pack_pos(pack, offset, &pack_pos) < 0) {
struct object_id oid;
--
2.49.0.14.g88b49c1b34
^ permalink raw reply related [flat|nested] 136+ messages in thread
* [PATCH v5 08/14] pack-bitmap.c: teach `rev-list --test-bitmap` about incremental MIDXs
2025-03-20 17:56 ` [PATCH v5 08/14] pack-bitmap.c: teach `rev-list --test-bitmap` about " Taylor Blau
@ 2025-03-20 17:56 ` Taylor Blau
2025-03-20 17:58 ` Taylor Blau
0 siblings, 1 reply; 136+ messages in thread
From: Taylor Blau @ 2025-03-20 17:56 UTC (permalink / raw)
To: git; +Cc: Elijah Newren, Jeff King, Junio C Hamano, Patrick Steinhardt
Implement support for the special `--test-bitmap` mode of `git rev-list`
when using incremental MIDXs.
The bitmap_test_data structure is extended to contain a "base" pointer
that mirrors the structure of the bitmap chain that it is being used to
test.
When we find a commit to test, we first chase down the ->base pointer to
find the appropriate bitmap_test_data for the bitmap layer that the
given commit is contained within, and then perform the test on that
bitmap.
In order to implement this, light modifications are made to
bitmap_for_commit() to reimplement it in terms of a new function,
find_bitmap_for_commit(), which fills out a pointer which indicates the
bitmap layer which contains the given commit.
Signed-off-by: Taylor Blau <me@ttaylorr.com>
---
pack-bitmap.c | 107 ++++++++++++++++++++++++++++++++++++++++----------
1 file changed, 86 insertions(+), 21 deletions(-)
diff --git a/pack-bitmap.c b/pack-bitmap.c
index 5ff1bbfd54..65ad631ce1 100644
--- a/pack-bitmap.c
+++ b/pack-bitmap.c
@@ -938,8 +938,9 @@ static struct stored_bitmap *lazy_bitmap_for_commit(struct bitmap_index *bitmap_
return NULL;
}
-struct ewah_bitmap *bitmap_for_commit(struct bitmap_index *bitmap_git,
- struct commit *commit)
+static struct ewah_bitmap *find_bitmap_for_commit(struct bitmap_index *bitmap_git,
+ struct commit *commit,
+ struct bitmap_index **found)
{
khiter_t hash_pos;
if (!bitmap_git)
@@ -949,18 +950,30 @@ struct ewah_bitmap *bitmap_for_commit(struct bitmap_index *bitmap_git,
if (hash_pos >= kh_end(bitmap_git->bitmaps)) {
struct stored_bitmap *bitmap = NULL;
if (!bitmap_git->table_lookup)
- return bitmap_for_commit(bitmap_git->base, commit);
+ return find_bitmap_for_commit(bitmap_git->base, commit,
+ found);
/* this is a fairly hot codepath - no trace2_region please */
/* NEEDSWORK: cache misses aren't recorded */
bitmap = lazy_bitmap_for_commit(bitmap_git, commit);
if (!bitmap)
- return bitmap_for_commit(bitmap_git->base, commit);
+ return find_bitmap_for_commit(bitmap_git->base, commit,
+ found);
+ if (found)
+ *found = bitmap_git;
return lookup_stored_bitmap(bitmap);
}
+ if (found)
+ *found = bitmap_git;
return lookup_stored_bitmap(kh_value(bitmap_git->bitmaps, hash_pos));
}
+struct ewah_bitmap *bitmap_for_commit(struct bitmap_index *bitmap_git,
+ struct commit *commit)
+{
+ return find_bitmap_for_commit(bitmap_git, commit, NULL);
+}
+
static inline int bitmap_position_extended(struct bitmap_index *bitmap_git,
const struct object_id *oid)
{
@@ -2513,6 +2526,8 @@ struct bitmap_test_data {
struct bitmap *tags;
struct progress *prg;
size_t seen;
+
+ struct bitmap_test_data *base_tdata;
};
static void test_bitmap_type(struct bitmap_test_data *tdata,
@@ -2521,6 +2536,11 @@ static void test_bitmap_type(struct bitmap_test_data *tdata,
enum object_type bitmap_type = OBJ_NONE;
int bitmaps_nr = 0;
+ if (bitmap_is_midx(tdata->bitmap_git)) {
+ while (pos < tdata->bitmap_git->midx->num_objects_in_base)
+ tdata = tdata->base_tdata;
+ }
+
if (bitmap_get(tdata->commits, pos)) {
bitmap_type = OBJ_COMMIT;
bitmaps_nr++;
@@ -2584,13 +2604,57 @@ static void test_show_commit(struct commit *commit, void *data)
display_progress(tdata->prg, ++tdata->seen);
}
+static uint32_t bitmap_total_entry_count(struct bitmap_index *bitmap_git)
+{
+ uint32_t total = 0;
+ do {
+ total = st_add(total, bitmap_git->entry_count);
+ bitmap_git = bitmap_git->base;
+ } while (bitmap_git);
+
+ return total;
+}
+
+static void bitmap_test_data_prepare(struct bitmap_test_data *tdata,
+ struct bitmap_index *bitmap_git)
+{
+ memset(tdata, 0, sizeof(struct bitmap_test_data));
+
+ tdata->bitmap_git = bitmap_git;
+ tdata->base = bitmap_new();
+ tdata->commits = ewah_to_bitmap(bitmap_git->commits);
+ tdata->trees = ewah_to_bitmap(bitmap_git->trees);
+ tdata->blobs = ewah_to_bitmap(bitmap_git->blobs);
+ tdata->tags = ewah_to_bitmap(bitmap_git->tags);
+
+ if (bitmap_git->base) {
+ tdata->base_tdata = xmalloc(sizeof(struct bitmap_test_data));
+ bitmap_test_data_prepare(tdata->base_tdata, bitmap_git->base);
+ }
+}
+
+static void bitmap_test_data_release(struct bitmap_test_data *tdata)
+{
+ if (!tdata)
+ return;
+
+ bitmap_test_data_release(tdata->base_tdata);
+ free(tdata->base_tdata);
+
+ bitmap_free(tdata->base);
+ bitmap_free(tdata->commits);
+ bitmap_free(tdata->trees);
+ bitmap_free(tdata->blobs);
+ bitmap_free(tdata->tags);
+}
+
void test_bitmap_walk(struct rev_info *revs)
{
struct object *root;
struct bitmap *result = NULL;
size_t result_popcnt;
struct bitmap_test_data tdata;
- struct bitmap_index *bitmap_git;
+ struct bitmap_index *bitmap_git, *found;
struct ewah_bitmap *bm;
if (!(bitmap_git = prepare_bitmap_git(revs->repo)))
@@ -2599,17 +2663,28 @@ void test_bitmap_walk(struct rev_info *revs)
if (revs->pending.nr != 1)
die(_("you must specify exactly one commit to test"));
- fprintf_ln(stderr, "Bitmap v%d test (%d entries%s)",
+ fprintf_ln(stderr, "Bitmap v%d test (%d entries%s, %d total)",
bitmap_git->version,
bitmap_git->entry_count,
- bitmap_git->table_lookup ? "" : " loaded");
+ bitmap_git->table_lookup ? "" : " loaded",
+ bitmap_total_entry_count(bitmap_git));
root = revs->pending.objects[0].item;
- bm = bitmap_for_commit(bitmap_git, (struct commit *)root);
+ bm = find_bitmap_for_commit(bitmap_git, (struct commit *)root, &found);
if (bm) {
fprintf_ln(stderr, "Found bitmap for '%s'. %d bits / %08x checksum",
- oid_to_hex(&root->oid), (int)bm->bit_size, ewah_checksum(bm));
+ oid_to_hex(&root->oid),
+ (int)bm->bit_size, ewah_checksum(bm));
+
+ if (bitmap_is_midx(found))
+ fprintf_ln(stderr, "Located via MIDX '%s'.",
+ hash_to_hex_algop(get_midx_checksum(found->midx),
+ revs->repo->hash_algo));
+ else
+ fprintf_ln(stderr, "Located via pack '%s'.",
+ hash_to_hex_algop(found->pack->hash,
+ revs->repo->hash_algo));
result = ewah_to_bitmap(bm);
}
@@ -2626,16 +2701,10 @@ void test_bitmap_walk(struct rev_info *revs)
if (prepare_revision_walk(revs))
die(_("revision walk setup failed"));
- tdata.bitmap_git = bitmap_git;
- tdata.base = bitmap_new();
- tdata.commits = ewah_to_bitmap(bitmap_git->commits);
- tdata.trees = ewah_to_bitmap(bitmap_git->trees);
- tdata.blobs = ewah_to_bitmap(bitmap_git->blobs);
- tdata.tags = ewah_to_bitmap(bitmap_git->tags);
+ bitmap_test_data_prepare(&tdata, bitmap_git);
tdata.prg = start_progress(revs->repo,
"Verifying bitmap entries",
result_popcnt);
- tdata.seen = 0;
traverse_commit_list(revs, &test_show_commit, &test_show_object, &tdata);
@@ -2647,11 +2716,7 @@ void test_bitmap_walk(struct rev_info *revs)
die(_("mismatch in bitmap results"));
bitmap_free(result);
- bitmap_free(tdata.base);
- bitmap_free(tdata.commits);
- bitmap_free(tdata.trees);
- bitmap_free(tdata.blobs);
- bitmap_free(tdata.tags);
+ bitmap_test_data_release(&tdata);
free_bitmap_index(bitmap_git);
}
--
2.49.0.14.g88b49c1b34
* [PATCH v5 10/14] pack-bitmap.c: apply pseudo-merge commits with incremental MIDXs
2025-03-20 17:56 ` [PATCH v5 00/14] " Taylor Blau
` (8 preceding siblings ...)
2025-03-20 17:56 ` [PATCH v5 09/14] pack-bitmap.c: compute disk-usage with " Taylor Blau
@ 2025-03-20 17:56 ` Taylor Blau
2025-03-20 17:56 ` [PATCH v5 11/14] ewah: implement `struct ewah_or_iterator` Taylor Blau
` (4 subsequent siblings)
14 siblings, 0 replies; 136+ messages in thread
From: Taylor Blau @ 2025-03-20 17:56 UTC (permalink / raw)
To: git; +Cc: Elijah Newren, Jeff King, Junio C Hamano, Patrick Steinhardt
Prepare for using pseudo-merges with incremental MIDX bitmaps by
attempting to apply pseudo-merges from each layer when encountering a
given commit during a walk.
Signed-off-by: Taylor Blau <me@ttaylorr.com>
---
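A minimal sketch of the chain walk described above. The types and helpers
here (`struct layer`, `apply_one()`, `apply_all_layers()`) are hypothetical
stand-ins, not git's `struct bitmap_index` or
`apply_pseudo_merges_for_commit()`; the sketch only shows how per-layer
results are summed while following `base` pointers from the top layer down.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for one bitmap layer; not git's struct bitmap_index. */
struct layer {
	struct layer *base;  /* next-older layer, or NULL at the bottom */
	int satisfied;       /* pretend count of pseudo-merges this layer applies */
};

/* Hypothetical per-layer step: report how many pseudo-merges were applied. */
static int apply_one(const struct layer *l)
{
	assert(l->satisfied >= 0);
	return l->satisfied;
}

/* Visit the top layer first, then each base layer, summing the results. */
static int apply_all_layers(const struct layer *top)
{
	const struct layer *curr;
	int ret = 0;

	for (curr = top; curr; curr = curr->base)
		ret += apply_one(curr);
	return ret;
}
```

With a single-layer chain (a NULL `base`), this reduces to one call, which
mirrors the behavior before this patch.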
pack-bitmap.c | 11 ++++++++---
1 file changed, 8 insertions(+), 3 deletions(-)
diff --git a/pack-bitmap.c b/pack-bitmap.c
index 4086277de8..1d1e1a65ca 100644
--- a/pack-bitmap.c
+++ b/pack-bitmap.c
@@ -1082,10 +1082,15 @@ static unsigned apply_pseudo_merges_for_commit_1(struct bitmap_index *bitmap_git
struct commit *commit,
uint32_t commit_pos)
{
- int ret;
+ struct bitmap_index *curr = bitmap_git;
+ int ret = 0;
- ret = apply_pseudo_merges_for_commit(&bitmap_git->pseudo_merges,
- result, commit, commit_pos);
+ while (curr) {
+ ret += apply_pseudo_merges_for_commit(&curr->pseudo_merges,
+ result, commit,
+ commit_pos);
+ curr = curr->base;
+ }
if (ret)
pseudo_merges_satisfied_nr += ret;
--
2.49.0.14.g88b49c1b34
* [PATCH v5 11/14] ewah: implement `struct ewah_or_iterator`
2025-03-20 17:56 ` [PATCH v5 00/14] " Taylor Blau
` (9 preceding siblings ...)
2025-03-20 17:56 ` [PATCH v5 10/14] pack-bitmap.c: apply pseudo-merge commits " Taylor Blau
@ 2025-03-20 17:56 ` Taylor Blau
2025-03-20 17:57 ` [PATCH v5 12/14] pack-bitmap.c: keep track of each layer's type bitmaps Taylor Blau
` (3 subsequent siblings)
14 siblings, 0 replies; 136+ messages in thread
From: Taylor Blau @ 2025-03-20 17:56 UTC (permalink / raw)
To: git; +Cc: Elijah Newren, Jeff King, Junio C Hamano, Patrick Steinhardt
While individual bitmap layers store different commit, type-level, and
pseudo-merge bitmaps, only the top-most layer is used to compute
reachability traversals.
Many functions which implement the aforementioned traversal rely on
enumerating the results according to the type-level bitmaps, and so
would benefit from a conceptual type-level bitmap that spans multiple
layers.
Implement `struct ewah_or_iterator` which is capable of enumerating
multiple EWAH bitmaps at once, and OR-ing the results together. When
initialized with, for example, all of the commit type bitmaps from each
layer, callers can pretend as if they are enumerating a large type-level
bitmap which contains the commits from *all* bitmap layers.
There are a couple of alternative approaches which were considered:
- Decompress each EWAH bitmap and OR them together, enumerating a
single (non-EWAH) bitmap. This would work, but has the disadvantage
of decompressing a potentially large bitmap, which may not be
necessary if the caller does not wish to read all of it.
- Recursively call bitmap internal functions, reusing the "result" and
"haves" bitmap from the top-most layer. This approach resembles the
original implementation of this feature, but is inefficient in that
it both (a) requires significant refactoring to implement, and (b)
enumerates large sections of later bitmaps which are all zeros (as
they pertain to objects in earlier layers).
(b) is not so bad in and of itself, but can cause significant
slow-downs when combined with expensive loop bodies.
This approach (enumerating an OR'd together version of all of the
type-level bitmaps from each layer) produces a much more
straightforward implementation and requires far less refactoring to
make it work.
Signed-off-by: Taylor Blau <me@ttaylorr.com>
---
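To illustrate the idea outside of EWAH, here is a small sketch of an
OR-ing iterator over plain, uncompressed word arrays. `struct word_iter`
and `struct or_iter` are hypothetical stand-ins for `struct ewah_iterator`
and `struct ewah_or_iterator`; the real iterators decode run-length
words, which this sketch deliberately leaves out.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical uncompressed stand-in for an EWAH iterator. */
struct word_iter {
	const uint64_t *words;
	size_t nr, pos;
};

/* Yield the next word, or return 0 when this bitmap is exhausted. */
static int word_iter_next(uint64_t *next, struct word_iter *it)
{
	if (it->pos >= it->nr)
		return 0;
	*next = it->words[it->pos++];
	return 1;
}

struct or_iter {
	struct word_iter *its;
	size_t nr;
};

/* OR together the next word of every sub-iterator; 0 once all are done. */
static int or_iter_next(uint64_t *next, struct or_iter *it)
{
	uint64_t buf, out = 0;
	size_t i;
	int ret = 0;

	assert(next && it);
	for (i = 0; i < it->nr; i++) {
		if (word_iter_next(&buf, &it->its[i])) {
			out |= buf;
			ret = 1;
		}
	}
	*next = out;
	return ret;
}
```

Bitmaps of different lengths are handled naturally: a shorter one simply
stops contributing words, and iteration ends only once every
sub-iterator is exhausted.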
ewah/ewah_bitmap.c | 33 +++++++++++++++++++++++++++++++++
ewah/ewok.h | 12 ++++++++++++
2 files changed, 45 insertions(+)
diff --git a/ewah/ewah_bitmap.c b/ewah/ewah_bitmap.c
index 67f8f588e0..056c410efb 100644
--- a/ewah/ewah_bitmap.c
+++ b/ewah/ewah_bitmap.c
@@ -371,6 +371,39 @@ void ewah_iterator_init(struct ewah_iterator *it, struct ewah_bitmap *parent)
read_new_rlw(it);
}
+void ewah_or_iterator_init(struct ewah_or_iterator *it,
+ struct ewah_bitmap **parents, size_t nr)
+{
+ size_t i;
+
+ memset(it, 0, sizeof(*it));
+
+ ALLOC_ARRAY(it->its, nr);
+ for (i = 0; i < nr; i++)
+ ewah_iterator_init(&it->its[it->nr++], parents[i]);
+}
+
+int ewah_or_iterator_next(eword_t *next, struct ewah_or_iterator *it)
+{
+ eword_t buf, out = 0;
+ size_t i;
+ int ret = 0;
+
+ for (i = 0; i < it->nr; i++)
+ if (ewah_iterator_next(&buf, &it->its[i])) {
+ out |= buf;
+ ret = 1;
+ }
+
+ *next = out;
+ return ret;
+}
+
+void ewah_or_iterator_release(struct ewah_or_iterator *it)
+{
+ free(it->its);
+}
+
void ewah_xor(
struct ewah_bitmap *ewah_i,
struct ewah_bitmap *ewah_j,
diff --git a/ewah/ewok.h b/ewah/ewok.h
index 5e357e2493..c29d354236 100644
--- a/ewah/ewok.h
+++ b/ewah/ewok.h
@@ -148,6 +148,18 @@ void ewah_iterator_init(struct ewah_iterator *it, struct ewah_bitmap *parent);
*/
int ewah_iterator_next(eword_t *next, struct ewah_iterator *it);
+struct ewah_or_iterator {
+ struct ewah_iterator *its;
+ size_t nr;
+};
+
+void ewah_or_iterator_init(struct ewah_or_iterator *it,
+ struct ewah_bitmap **parents, size_t nr);
+
+int ewah_or_iterator_next(eword_t *next, struct ewah_or_iterator *it);
+
+void ewah_or_iterator_release(struct ewah_or_iterator *it);
+
void ewah_xor(
struct ewah_bitmap *ewah_i,
struct ewah_bitmap *ewah_j,
--
2.49.0.14.g88b49c1b34
* [PATCH v5 12/14] pack-bitmap.c: keep track of each layer's type bitmaps
2025-03-20 17:56 ` [PATCH v5 00/14] " Taylor Blau
` (10 preceding siblings ...)
2025-03-20 17:56 ` [PATCH v5 11/14] ewah: implement `struct ewah_or_iterator` Taylor Blau
@ 2025-03-20 17:57 ` Taylor Blau
2025-03-20 17:57 ` [PATCH v5 13/14] pack-bitmap.c: use `ewah_or_iterator` for type bitmap iterators Taylor Blau
` (2 subsequent siblings)
14 siblings, 0 replies; 136+ messages in thread
From: Taylor Blau @ 2025-03-20 17:57 UTC (permalink / raw)
To: git; +Cc: Elijah Newren, Jeff King, Junio C Hamano, Patrick Steinhardt
Prepare for reading the type-level bitmaps from previous bitmap layers
by maintaining an array for each type, where each element in that type's
array corresponds to one layer's bitmap for that type.
These fields will be used in a later commit to instantiate the 'struct
ewah_or_iterator' for each type.
Signed-off-by: Taylor Blau <me@ttaylorr.com>
---
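The array layout can be sketched as follows, assuming a hypothetical
`struct layer` in place of `struct bitmap_index` and a string in place of
each layer's `struct ewah_bitmap`. The helper `collect_commits()` is
illustrative only; it fills the array from the last slot backwards so
that the top layer ends up at index n-1 and the oldest layer at index 0.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical layer: 'commits' stands in for a per-layer type bitmap. */
struct layer {
	struct layer *base;
	const char *commits;
	size_t base_nr;  /* number of layers below this one */
};

/*
 * Return an array of base_nr + 1 entries in which the last slot is the
 * top layer's bitmap and slot 0 is the oldest layer's. Caller frees it.
 */
static const char **collect_commits(const struct layer *top)
{
	size_t n = top->base_nr + 1, i = n;
	const char **all = malloc(n * sizeof(*all));
	const struct layer *curr;

	if (!all)
		return NULL;
	for (curr = top; curr; curr = curr->base) {
		assert(i > 0); /* chain is longer than base_nr claims */
		all[--i] = curr->commits;
	}
	assert(i == 0); /* chain is shorter than base_nr claims */
	return all;
}
```

For a non-incremental MIDX or a single pack, `base_nr` is zero and the
array holds exactly one element.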
pack-bitmap.c | 57 +++++++++++++++++++++++++++++++++++++++++++++++----
1 file changed, 53 insertions(+), 4 deletions(-)
diff --git a/pack-bitmap.c b/pack-bitmap.c
index 1d1e1a65ca..5721fa7a0f 100644
--- a/pack-bitmap.c
+++ b/pack-bitmap.c
@@ -81,6 +81,23 @@ struct bitmap_index {
struct ewah_bitmap *blobs;
struct ewah_bitmap *tags;
+ /*
+ * Type index arrays when this bitmap is associated with an
+ * incremental multi-pack index chain.
+ *
+ * If n is the number of unique layers in the MIDX chain, then
+ * commits_all[n-1] is this struct's 'commits' field,
+ * commits_all[n-2] is the commits field of this bitmap's
+ * 'base', and so on.
+ *
+ * When associated either with a non-incremental MIDX or a
+ * single packfile, these arrays each contain a single element.
+ */
+ struct ewah_bitmap **commits_all;
+ struct ewah_bitmap **trees_all;
+ struct ewah_bitmap **blobs_all;
+ struct ewah_bitmap **tags_all;
+
/* Map from object ID -> `stored_bitmap` for all the bitmapped commits */
kh_oid_map_t *bitmaps;
@@ -581,7 +598,32 @@ static int load_reverse_index(struct repository *r, struct bitmap_index *bitmap_
return load_pack_revindex(r, bitmap_git->pack);
}
-static int load_bitmap(struct repository *r, struct bitmap_index *bitmap_git)
+static void load_all_type_bitmaps(struct bitmap_index *bitmap_git)
+{
+ struct bitmap_index *curr = bitmap_git;
+ size_t i = bitmap_git->base_nr;
+
+ ALLOC_ARRAY(bitmap_git->commits_all, bitmap_git->base_nr + 1);
+ ALLOC_ARRAY(bitmap_git->trees_all, bitmap_git->base_nr + 1);
+ ALLOC_ARRAY(bitmap_git->blobs_all, bitmap_git->base_nr + 1);
+ ALLOC_ARRAY(bitmap_git->tags_all, bitmap_git->base_nr + 1);
+
+ while (curr) {
+ bitmap_git->commits_all[i] = curr->commits;
+ bitmap_git->trees_all[i] = curr->trees;
+ bitmap_git->blobs_all[i] = curr->blobs;
+ bitmap_git->tags_all[i] = curr->tags;
+
+ curr = curr->base;
+ if (curr && !i)
+ BUG("unexpected number of bitmap layers, expected %"PRIu32,
+ bitmap_git->base_nr + 1);
+ i -= 1;
+ }
+}
+
+static int load_bitmap(struct repository *r, struct bitmap_index *bitmap_git,
+ int recursing)
{
assert(bitmap_git->map);
@@ -603,10 +645,13 @@ static int load_bitmap(struct repository *r, struct bitmap_index *bitmap_git)
if (bitmap_git->base) {
if (!bitmap_is_midx(bitmap_git))
BUG("non-MIDX bitmap has non-NULL base bitmap index");
- if (load_bitmap(r, bitmap_git->base) < 0)
+ if (load_bitmap(r, bitmap_git->base, 1) < 0)
goto failed;
}
+ if (!recursing)
+ load_all_type_bitmaps(bitmap_git);
+
return 0;
failed:
@@ -682,7 +727,7 @@ struct bitmap_index *prepare_bitmap_git(struct repository *r)
{
struct bitmap_index *bitmap_git = xcalloc(1, sizeof(*bitmap_git));
- if (!open_bitmap(r, bitmap_git) && !load_bitmap(r, bitmap_git))
+ if (!open_bitmap(r, bitmap_git) && !load_bitmap(r, bitmap_git, 0))
return bitmap_git;
free_bitmap_index(bitmap_git);
@@ -2052,7 +2097,7 @@ struct bitmap_index *prepare_bitmap_walk(struct rev_info *revs,
* from disk. this is the point of no return; after this the rev_list
* becomes invalidated and we must perform the revwalk through bitmaps
*/
- if (load_bitmap(revs->repo, bitmap_git) < 0)
+ if (load_bitmap(revs->repo, bitmap_git, 0) < 0)
goto cleanup;
if (!use_boundary_traversal)
@@ -2985,6 +3030,10 @@ void free_bitmap_index(struct bitmap_index *b)
ewah_pool_free(b->trees);
ewah_pool_free(b->blobs);
ewah_pool_free(b->tags);
+ free(b->commits_all);
+ free(b->trees_all);
+ free(b->blobs_all);
+ free(b->tags_all);
if (b->bitmaps) {
struct stored_bitmap *sb;
kh_foreach_value(b->bitmaps, sb, {
--
2.49.0.14.g88b49c1b34
* [PATCH v5 13/14] pack-bitmap.c: use `ewah_or_iterator` for type bitmap iterators
2025-03-20 17:56 ` [PATCH v5 00/14] " Taylor Blau
` (11 preceding siblings ...)
2025-03-20 17:57 ` [PATCH v5 12/14] pack-bitmap.c: keep track of each layer's type bitmaps Taylor Blau
@ 2025-03-20 17:57 ` Taylor Blau
2025-03-20 17:57 ` [PATCH v5 14/14] midx: implement writing incremental MIDX bitmaps Taylor Blau
2025-03-20 20:00 ` [PATCH v5 00/14] midx: incremental multi-pack indexes, part two Elijah Newren
14 siblings, 0 replies; 136+ messages in thread
From: Taylor Blau @ 2025-03-20 17:57 UTC (permalink / raw)
To: git; +Cc: Elijah Newren, Jeff King, Junio C Hamano, Patrick Steinhardt
Now that we have initialized arrays for each bitmap layer's type bitmaps
in the previous commit, adjust existing callers to use them in
preparation for multi-layered bitmaps.
Signed-off-by: Taylor Blau <me@ttaylorr.com>
---
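The pattern shared by the adjusted callers is "mask each word of the
result with the combined type filter". A rough sketch of the counting
case, using plain word arrays instead of EWAH bitmaps; `count_type()` and
`popcount64()` are hypothetical names and do not exist in pack-bitmap.c:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Portable popcount for a 64-bit word. */
static unsigned popcount64(uint64_t x)
{
	unsigned n = 0;

	for (; x; x &= x - 1)
		n++;
	return n;
}

/*
 * Count bits set in 'result' that are also set in any layer's type mask.
 * 'layers' holds nr_layers arrays of nr_words words each (hypothetical,
 * uncompressed stand-ins for the per-layer EWAH type bitmaps).
 */
static unsigned count_type(const uint64_t *result, size_t nr_words,
			   const uint64_t *const *layers, size_t nr_layers)
{
	unsigned count = 0;
	size_t i, j;

	assert(result && layers);
	for (i = 0; i < nr_words; i++) {
		uint64_t filter = 0;

		/* OR the i-th word of every layer's type bitmap together. */
		for (j = 0; j < nr_layers; j++)
			filter |= layers[j][i];
		count += popcount64(result[i] & filter);
	}
	return count;
}
```

Because each layer only sets bits for its own objects, the OR of all
layers behaves like one type bitmap covering the whole chain.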
pack-bitmap.c | 42 +++++++++++++++++++++++++++---------------
1 file changed, 27 insertions(+), 15 deletions(-)
diff --git a/pack-bitmap.c b/pack-bitmap.c
index 5721fa7a0f..6f7fd94c36 100644
--- a/pack-bitmap.c
+++ b/pack-bitmap.c
@@ -1629,25 +1629,29 @@ static void show_extended_objects(struct bitmap_index *bitmap_git,
}
}
-static void init_type_iterator(struct ewah_iterator *it,
+static void init_type_iterator(struct ewah_or_iterator *it,
struct bitmap_index *bitmap_git,
enum object_type type)
{
switch (type) {
case OBJ_COMMIT:
- ewah_iterator_init(it, bitmap_git->commits);
+ ewah_or_iterator_init(it, bitmap_git->commits_all,
+ bitmap_git->base_nr + 1);
break;
case OBJ_TREE:
- ewah_iterator_init(it, bitmap_git->trees);
+ ewah_or_iterator_init(it, bitmap_git->trees_all,
+ bitmap_git->base_nr + 1);
break;
case OBJ_BLOB:
- ewah_iterator_init(it, bitmap_git->blobs);
+ ewah_or_iterator_init(it, bitmap_git->blobs_all,
+ bitmap_git->base_nr + 1);
break;
case OBJ_TAG:
- ewah_iterator_init(it, bitmap_git->tags);
+ ewah_or_iterator_init(it, bitmap_git->tags_all,
+ bitmap_git->base_nr + 1);
break;
default:
@@ -1664,7 +1668,7 @@ static void show_objects_for_type(
size_t i = 0;
uint32_t offset;
- struct ewah_iterator it;
+ struct ewah_or_iterator it;
eword_t filter;
struct bitmap *objects = bitmap_git->result;
@@ -1672,7 +1676,7 @@ static void show_objects_for_type(
init_type_iterator(&it, bitmap_git, object_type);
for (i = 0; i < objects->word_alloc &&
- ewah_iterator_next(&filter, &it); i++) {
+ ewah_or_iterator_next(&filter, &it); i++) {
eword_t word = objects->words[i] & filter;
size_t pos = (i * BITS_IN_EWORD);
@@ -1714,6 +1718,8 @@ static void show_objects_for_type(
show_reach(&oid, object_type, 0, hash, pack, ofs);
}
}
+
+ ewah_or_iterator_release(&it);
}
static int in_bitmapped_pack(struct bitmap_index *bitmap_git,
@@ -1765,7 +1771,7 @@ static void filter_bitmap_exclude_type(struct bitmap_index *bitmap_git,
{
struct eindex *eindex = &bitmap_git->ext_index;
struct bitmap *tips;
- struct ewah_iterator it;
+ struct ewah_or_iterator it;
eword_t mask;
uint32_t i;
@@ -1782,7 +1788,7 @@ static void filter_bitmap_exclude_type(struct bitmap_index *bitmap_git,
* packfile.
*/
for (i = 0, init_type_iterator(&it, bitmap_git, type);
- i < to_filter->word_alloc && ewah_iterator_next(&mask, &it);
+ i < to_filter->word_alloc && ewah_or_iterator_next(&mask, &it);
i++) {
if (i < tips->word_alloc)
mask &= ~tips->words[i];
@@ -1802,6 +1808,7 @@ static void filter_bitmap_exclude_type(struct bitmap_index *bitmap_git,
bitmap_unset(to_filter, pos);
}
+ ewah_or_iterator_release(&it);
bitmap_free(tips);
}
@@ -1862,14 +1869,14 @@ static void filter_bitmap_blob_limit(struct bitmap_index *bitmap_git,
{
struct eindex *eindex = &bitmap_git->ext_index;
struct bitmap *tips;
- struct ewah_iterator it;
+ struct ewah_or_iterator it;
eword_t mask;
uint32_t i;
tips = find_tip_objects(bitmap_git, tip_objects, OBJ_BLOB);
for (i = 0, init_type_iterator(&it, bitmap_git, OBJ_BLOB);
- i < to_filter->word_alloc && ewah_iterator_next(&mask, &it);
+ i < to_filter->word_alloc && ewah_or_iterator_next(&mask, &it);
i++) {
eword_t word = to_filter->words[i] & mask;
unsigned offset;
@@ -1897,6 +1904,7 @@ static void filter_bitmap_blob_limit(struct bitmap_index *bitmap_git,
bitmap_unset(to_filter, pos);
}
+ ewah_or_iterator_release(&it);
bitmap_free(tips);
}
@@ -2528,12 +2536,12 @@ static uint32_t count_object_type(struct bitmap_index *bitmap_git,
struct eindex *eindex = &bitmap_git->ext_index;
uint32_t i = 0, count = 0;
- struct ewah_iterator it;
+ struct ewah_or_iterator it;
eword_t filter;
init_type_iterator(&it, bitmap_git, type);
- while (i < objects->word_alloc && ewah_iterator_next(&filter, &it)) {
+ while (i < objects->word_alloc && ewah_or_iterator_next(&filter, &it)) {
eword_t word = objects->words[i++] & filter;
count += ewah_bit_popcount64(word);
}
@@ -2545,6 +2553,8 @@ static uint32_t count_object_type(struct bitmap_index *bitmap_git,
count++;
}
+ ewah_or_iterator_release(&it);
+
return count;
}
@@ -3077,13 +3087,13 @@ static off_t get_disk_usage_for_type(struct bitmap_index *bitmap_git,
{
struct bitmap *result = bitmap_git->result;
off_t total = 0;
- struct ewah_iterator it;
+ struct ewah_or_iterator it;
eword_t filter;
size_t i;
init_type_iterator(&it, bitmap_git, object_type);
for (i = 0; i < result->word_alloc &&
- ewah_iterator_next(&filter, &it); i++) {
+ ewah_or_iterator_next(&filter, &it); i++) {
eword_t word = result->words[i] & filter;
size_t base = (i * BITS_IN_EWORD);
unsigned offset;
@@ -3124,6 +3134,8 @@ static off_t get_disk_usage_for_type(struct bitmap_index *bitmap_git,
}
}
+ ewah_or_iterator_release(&it);
+
return total;
}
--
2.49.0.14.g88b49c1b34
* [PATCH v5 14/14] midx: implement writing incremental MIDX bitmaps
2025-03-20 17:56 ` [PATCH v5 00/14] " Taylor Blau
` (12 preceding siblings ...)
2025-03-20 17:57 ` [PATCH v5 13/14] pack-bitmap.c: use `ewah_or_iterator` for type bitmap iterators Taylor Blau
@ 2025-03-20 17:57 ` Taylor Blau
2025-03-20 20:00 ` [PATCH v5 00/14] midx: incremental multi-pack indexes, part two Elijah Newren
14 siblings, 0 replies; 136+ messages in thread
From: Taylor Blau @ 2025-03-20 17:57 UTC (permalink / raw)
To: git; +Cc: Elijah Newren, Jeff King, Junio C Hamano, Patrick Steinhardt
Now that the pack-bitmap machinery has learned how to read and interact
with an incremental MIDX bitmap, teach the pack-bitmap-write.c machinery
(and relevant callers from within the MIDX machinery) to write such
bitmaps.
The details for doing so are mostly straightforward. The main changes
are as follows:
- find_object_pos() now makes use of an extra MIDX parameter which is
used to locate the bit positions of objects which are from previous
layers (and thus do not exist in the current layer's pack_order
field).
(Note also that the pack_order field is moved into struct
write_midx_context to further simplify the callers for
write_midx_bitmap()).
- bitmap_writer_build_type_index() first determines how many objects
precede the current bitmap layer and offsets the bits it sets in
each respective type-level bitmap by that amount so they can be OR'd
together.
Signed-off-by: Taylor Blau <me@ttaylorr.com>
---
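The bit-position arithmetic described above can be sketched in
isolation. `struct chain_info`, `base_objects()` and `layer_bit_pos()`
are hypothetical; the two counters mirror the `num_objects` and
`num_objects_in_base` fields that the patch reads from the base MIDX.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical summary of the existing MIDX chain a new layer sits on. */
struct chain_info {
	uint32_t num_objects;          /* objects in the immediate base layer */
	uint32_t num_objects_in_base;  /* objects in all layers below that */
};

/* Number of bit positions already claimed by earlier layers. */
static uint32_t base_objects(const struct chain_info *base)
{
	return base ? base->num_objects + base->num_objects_in_base : 0;
}

/* Bit position for the object at 'pos' within the layer being written. */
static uint32_t layer_bit_pos(const struct chain_info *base, uint32_t pos)
{
	uint32_t off = base_objects(base);

	assert(pos <= UINT32_MAX - off); /* guard against wrap-around */
	return pos + off;
}
```

When there is no base (a non-incremental write), the offset is zero and
positions are unchanged, which is why existing callers can pass NULL.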
builtin/pack-objects.c | 3 +-
midx-write.c | 57 ++++++++++------
pack-bitmap-write.c | 65 +++++++++++++-----
pack-bitmap.h | 4 +-
t/t5334-incremental-multi-pack-index.sh | 87 +++++++++++++++++++++++++
5 files changed, 179 insertions(+), 37 deletions(-)
diff --git a/builtin/pack-objects.c b/builtin/pack-objects.c
index 58a9b16126..a7e4bb7904 100644
--- a/builtin/pack-objects.c
+++ b/builtin/pack-objects.c
@@ -1397,7 +1397,8 @@ static void write_pack_file(void)
if (write_bitmap_index) {
bitmap_writer_init(&bitmap_writer,
- the_repository, &to_pack);
+ the_repository, &to_pack,
+ NULL);
bitmap_writer_set_checksum(&bitmap_writer, hash);
bitmap_writer_build_type_index(&bitmap_writer,
written_list);
diff --git a/midx-write.c b/midx-write.c
index 48d6558253..0897cbd829 100644
--- a/midx-write.c
+++ b/midx-write.c
@@ -647,16 +647,22 @@ static uint32_t *midx_pack_order(struct write_midx_context *ctx)
return pack_order;
}
-static void write_midx_reverse_index(char *midx_name, unsigned char *midx_hash,
- struct write_midx_context *ctx)
+static void write_midx_reverse_index(struct write_midx_context *ctx,
+ const char *object_dir,
+ unsigned char *midx_hash)
{
struct strbuf buf = STRBUF_INIT;
char *tmp_file;
trace2_region_enter("midx", "write_midx_reverse_index", ctx->repo);
- strbuf_addf(&buf, "%s-%s.rev", midx_name, hash_to_hex_algop(midx_hash,
- ctx->repo->hash_algo));
+ if (ctx->incremental)
+ get_split_midx_filename_ext(ctx->repo->hash_algo, &buf,
+ object_dir, midx_hash,
+ MIDX_EXT_REV);
+ else
+ get_midx_filename_ext(ctx->repo->hash_algo, &buf, object_dir,
+ midx_hash, MIDX_EXT_REV);
tmp_file = write_rev_file_order(ctx->repo->hash_algo, NULL, ctx->pack_order,
ctx->entries_nr, midx_hash, WRITE_REV);
@@ -829,22 +835,29 @@ static struct commit **find_commits_for_midx_bitmap(uint32_t *indexed_commits_nr
return cb.commits;
}
-static int write_midx_bitmap(struct repository *r, const char *midx_name,
+static int write_midx_bitmap(struct write_midx_context *ctx,
+ const char *object_dir,
const unsigned char *midx_hash,
struct packing_data *pdata,
struct commit **commits,
uint32_t commits_nr,
- uint32_t *pack_order,
unsigned flags)
{
int ret, i;
uint16_t options = 0;
struct bitmap_writer writer;
struct pack_idx_entry **index;
- char *bitmap_name = xstrfmt("%s-%s.bitmap", midx_name,
- hash_to_hex_algop(midx_hash, r->hash_algo));
+ struct strbuf bitmap_name = STRBUF_INIT;
- trace2_region_enter("midx", "write_midx_bitmap", r);
+ trace2_region_enter("midx", "write_midx_bitmap", ctx->repo);
+
+ if (ctx->incremental)
+ get_split_midx_filename_ext(ctx->repo->hash_algo, &bitmap_name,
+ object_dir, midx_hash,
+ MIDX_EXT_BITMAP);
+ else
+ get_midx_filename_ext(ctx->repo->hash_algo, &bitmap_name,
+ object_dir, midx_hash, MIDX_EXT_BITMAP);
if (flags & MIDX_WRITE_BITMAP_HASH_CACHE)
options |= BITMAP_OPT_HASH_CACHE;
@@ -861,7 +874,8 @@ static int write_midx_bitmap(struct repository *r, const char *midx_name,
for (i = 0; i < pdata->nr_objects; i++)
index[i] = &pdata->objects[i].idx;
- bitmap_writer_init(&writer, r, pdata);
+ bitmap_writer_init(&writer, ctx->repo, pdata,
+ ctx->incremental ? ctx->base_midx : NULL);
bitmap_writer_show_progress(&writer, flags & MIDX_PROGRESS);
bitmap_writer_build_type_index(&writer, index);
@@ -879,7 +893,7 @@ static int write_midx_bitmap(struct repository *r, const char *midx_name,
* bitmap_writer_finish().
*/
for (i = 0; i < pdata->nr_objects; i++)
- index[pack_order[i]] = &pdata->objects[i].idx;
+ index[ctx->pack_order[i]] = &pdata->objects[i].idx;
bitmap_writer_select_commits(&writer, commits, commits_nr);
ret = bitmap_writer_build(&writer);
@@ -887,14 +901,14 @@ static int write_midx_bitmap(struct repository *r, const char *midx_name,
goto cleanup;
bitmap_writer_set_checksum(&writer, midx_hash);
- bitmap_writer_finish(&writer, index, bitmap_name, options);
+ bitmap_writer_finish(&writer, index, bitmap_name.buf, options);
cleanup:
free(index);
- free(bitmap_name);
+ strbuf_release(&bitmap_name);
bitmap_writer_free(&writer);
- trace2_region_leave("midx", "write_midx_bitmap", r);
+ trace2_region_leave("midx", "write_midx_bitmap", ctx->repo);
return ret;
}
@@ -1077,8 +1091,6 @@ static int write_midx_internal(struct repository *r, const char *object_dir,
ctx.repo = r;
ctx.incremental = !!(flags & MIDX_WRITE_INCREMENTAL);
- if (ctx.incremental && (flags & MIDX_WRITE_BITMAP))
- die(_("cannot write incremental MIDX with bitmap"));
if (ctx.incremental)
strbuf_addf(&midx_name,
@@ -1119,6 +1131,13 @@ static int write_midx_internal(struct repository *r, const char *object_dir,
if (ctx.incremental) {
struct multi_pack_index *m = ctx.base_midx;
while (m) {
+ if (flags & MIDX_WRITE_BITMAP && load_midx_revindex(m)) {
+ error(_("could not load reverse index for MIDX %s"),
+ hash_to_hex_algop(get_midx_checksum(m),
+ m->repo->hash_algo));
+ result = 1;
+ goto cleanup;
+ }
ctx.num_multi_pack_indexes_before++;
m = m->base_midx;
}
@@ -1387,7 +1406,7 @@ static int write_midx_internal(struct repository *r, const char *object_dir,
if (flags & MIDX_WRITE_REV_INDEX &&
git_env_bool("GIT_TEST_MIDX_WRITE_REV", 0))
- write_midx_reverse_index(midx_name.buf, midx_hash, &ctx);
+ write_midx_reverse_index(&ctx, object_dir, midx_hash);
if (flags & MIDX_WRITE_BITMAP) {
struct packing_data pdata;
@@ -1410,8 +1429,8 @@ static int write_midx_internal(struct repository *r, const char *object_dir,
FREE_AND_NULL(ctx.entries);
ctx.entries_nr = 0;
- if (write_midx_bitmap(r, midx_name.buf, midx_hash, &pdata,
- commits, commits_nr, ctx.pack_order,
+ if (write_midx_bitmap(&ctx, object_dir,
+ midx_hash, &pdata, commits, commits_nr,
flags) < 0) {
error(_("could not write multi-pack bitmap"));
result = 1;
diff --git a/pack-bitmap-write.c b/pack-bitmap-write.c
index 34e86d4994..8a30853d2e 100644
--- a/pack-bitmap-write.c
+++ b/pack-bitmap-write.c
@@ -26,6 +26,8 @@
#include "alloc.h"
#include "refs.h"
#include "strmap.h"
+#include "midx.h"
+#include "pack-revindex.h"
struct bitmapped_commit {
struct commit *commit;
@@ -43,7 +45,8 @@ static inline int bitmap_writer_nr_selected_commits(struct bitmap_writer *writer
}
void bitmap_writer_init(struct bitmap_writer *writer, struct repository *r,
- struct packing_data *pdata)
+ struct packing_data *pdata,
+ struct multi_pack_index *midx)
{
memset(writer, 0, sizeof(struct bitmap_writer));
if (writer->bitmaps)
@@ -51,6 +54,7 @@ void bitmap_writer_init(struct bitmap_writer *writer, struct repository *r,
writer->bitmaps = kh_init_oid_map();
writer->pseudo_merge_commits = kh_init_oid_map();
writer->to_pack = pdata;
+ writer->midx = midx;
string_list_init_dup(&writer->pseudo_merge_groups);
@@ -113,6 +117,11 @@ void bitmap_writer_build_type_index(struct bitmap_writer *writer,
struct pack_idx_entry **index)
{
uint32_t i;
+ uint32_t base_objects = 0;
+
+ if (writer->midx)
+ base_objects = writer->midx->num_objects +
+ writer->midx->num_objects_in_base;
writer->commits = ewah_new();
writer->trees = ewah_new();
@@ -142,19 +151,19 @@ void bitmap_writer_build_type_index(struct bitmap_writer *writer,
switch (real_type) {
case OBJ_COMMIT:
- ewah_set(writer->commits, i);
+ ewah_set(writer->commits, i + base_objects);
break;
case OBJ_TREE:
- ewah_set(writer->trees, i);
+ ewah_set(writer->trees, i + base_objects);
break;
case OBJ_BLOB:
- ewah_set(writer->blobs, i);
+ ewah_set(writer->blobs, i + base_objects);
break;
case OBJ_TAG:
- ewah_set(writer->tags, i);
+ ewah_set(writer->tags, i + base_objects);
break;
default:
@@ -207,19 +216,37 @@ void bitmap_writer_push_commit(struct bitmap_writer *writer,
static uint32_t find_object_pos(struct bitmap_writer *writer,
const struct object_id *oid, int *found)
{
- struct object_entry *entry = packlist_find(writer->to_pack, oid);
+ struct object_entry *entry;
+
+ entry = packlist_find(writer->to_pack, oid);
+ if (entry) {
+ uint32_t base_objects = 0;
+ if (writer->midx)
+ base_objects = writer->midx->num_objects +
+ writer->midx->num_objects_in_base;
+
+ if (found)
+ *found = 1;
+ return oe_in_pack_pos(writer->to_pack, entry) + base_objects;
+ } else if (writer->midx) {
+ uint32_t at, pos;
+
+ if (!bsearch_midx(oid, writer->midx, &at))
+ goto missing;
+ if (midx_to_pack_pos(writer->midx, at, &pos) < 0)
+ goto missing;
- if (!entry) {
if (found)
- *found = 0;
- warning("Failed to write bitmap index. Packfile doesn't have full closure "
- "(object %s is missing)", oid_to_hex(oid));
- return 0;
+ *found = 1;
+ return pos;
}
+missing:
if (found)
- *found = 1;
- return oe_in_pack_pos(writer->to_pack, entry);
+ *found = 0;
+ warning("Failed to write bitmap index. Packfile doesn't have full closure "
+ "(object %s is missing)", oid_to_hex(oid));
+ return 0;
}
static void compute_xor_offsets(struct bitmap_writer *writer)
@@ -586,7 +613,7 @@ int bitmap_writer_build(struct bitmap_writer *writer)
struct prio_queue queue = { compare_commits_by_gen_then_commit_date };
struct prio_queue tree_queue = { NULL };
struct bitmap_index *old_bitmap;
- uint32_t *mapping;
+ uint32_t *mapping = NULL;
int closed = 1; /* until proven otherwise */
if (writer->show_progress)
@@ -1021,7 +1048,7 @@ void bitmap_writer_finish(struct bitmap_writer *writer,
struct strbuf tmp_file = STRBUF_INIT;
struct hashfile *f;
off_t *offsets = NULL;
- uint32_t i;
+ uint32_t i, base_objects;
struct bitmap_disk_header header;
@@ -1047,6 +1074,12 @@ void bitmap_writer_finish(struct bitmap_writer *writer,
if (options & BITMAP_OPT_LOOKUP_TABLE)
CALLOC_ARRAY(offsets, writer->to_pack->nr_objects);
+ if (writer->midx)
+ base_objects = writer->midx->num_objects +
+ writer->midx->num_objects_in_base;
+ else
+ base_objects = 0;
+
for (i = 0; i < bitmap_writer_nr_selected_commits(writer); i++) {
struct bitmapped_commit *stored = &writer->selected[i];
int commit_pos = oid_pos(&stored->commit->object.oid, index,
@@ -1055,7 +1088,7 @@ void bitmap_writer_finish(struct bitmap_writer *writer,
if (commit_pos < 0)
BUG(_("trying to write commit not in index"));
- stored->commit_pos = commit_pos;
+ stored->commit_pos = commit_pos + base_objects;
}
write_selected_commits_v1(writer, f, offsets);
diff --git a/pack-bitmap.h b/pack-bitmap.h
index d7f4b8b8e9..dd0951088f 100644
--- a/pack-bitmap.h
+++ b/pack-bitmap.h
@@ -111,6 +111,7 @@ struct bitmap_writer {
kh_oid_map_t *bitmaps;
struct packing_data *to_pack;
+ struct multi_pack_index *midx; /* if appending to a MIDX chain */
struct bitmapped_commit *selected;
unsigned int selected_nr, selected_alloc;
@@ -125,7 +126,8 @@ struct bitmap_writer {
};
void bitmap_writer_init(struct bitmap_writer *writer, struct repository *r,
- struct packing_data *pdata);
+ struct packing_data *pdata,
+ struct multi_pack_index *midx);
void bitmap_writer_show_progress(struct bitmap_writer *writer, int show);
void bitmap_writer_set_checksum(struct bitmap_writer *writer,
const unsigned char *sha1);
diff --git a/t/t5334-incremental-multi-pack-index.sh b/t/t5334-incremental-multi-pack-index.sh
index 26257e5660..d30d7253d6 100755
--- a/t/t5334-incremental-multi-pack-index.sh
+++ b/t/t5334-incremental-multi-pack-index.sh
@@ -44,4 +44,91 @@ test_expect_success 'convert incremental to non-incremental' '
compare_results_with_midx 'non-incremental MIDX conversion'
+write_midx_layer () {
+ n=1
+ if test -f $midx_chain
+ then
+ n="$(($(wc -l <$midx_chain) + 1))"
+ fi
+
+ for i in 1 2
+ do
+ test_commit $n.$i &&
+ git repack -d || return 1
+ done &&
+ git multi-pack-index write --bitmap --incremental
+}
+
+test_expect_success 'write initial MIDX layer' '
+ git repack -ad &&
+ write_midx_layer
+'
+
+test_expect_success 'read bitmap from first MIDX layer' '
+ git rev-list --test-bitmap 1.2
+'
+
+test_expect_success 'write another MIDX layer' '
+ write_midx_layer
+'
+
+test_expect_success 'midx verify with multiple layers' '
+ test_path_is_file "$midx_chain" &&
+ test_line_count = 2 "$midx_chain" &&
+
+ git multi-pack-index verify
+'
+
+test_expect_success 'read bitmap from second MIDX layer' '
+ git rev-list --test-bitmap 2.2
+'
+
+test_expect_success 'read earlier bitmap from second MIDX layer' '
+ git rev-list --test-bitmap 1.2
+'
+
+test_expect_success 'show object from first pack' '
+ git cat-file -p 1.1
+'
+
+test_expect_success 'show object from second pack' '
+ git cat-file -p 2.2
+'
+
+for reuse in false single multi
+do
+ test_expect_success "full clone (pack.allowPackReuse=$reuse)" '
+ rm -fr clone.git &&
+
+ git config pack.allowPackReuse $reuse &&
+ git clone --no-local --bare . clone.git
+ '
+done
+
+test_expect_success 'relink existing MIDX layer' '
+ rm -fr "$midxdir" &&
+
+ GIT_TEST_MIDX_WRITE_REV=1 git multi-pack-index write --bitmap &&
+
+ midx_hash="$(test-tool read-midx --checksum $objdir)" &&
+
+ test_path_is_file "$packdir/multi-pack-index" &&
+ test_path_is_file "$packdir/multi-pack-index-$midx_hash.bitmap" &&
+ test_path_is_file "$packdir/multi-pack-index-$midx_hash.rev" &&
+
+ test_commit another &&
+ git repack -d &&
+ git multi-pack-index write --bitmap --incremental &&
+
+ test_path_is_missing "$packdir/multi-pack-index" &&
+ test_path_is_missing "$packdir/multi-pack-index-$midx_hash.bitmap" &&
+ test_path_is_missing "$packdir/multi-pack-index-$midx_hash.rev" &&
+
+ test_path_is_file "$midxdir/multi-pack-index-$midx_hash.midx" &&
+ test_path_is_file "$midxdir/multi-pack-index-$midx_hash.bitmap" &&
+ test_path_is_file "$midxdir/multi-pack-index-$midx_hash.rev" &&
+ test_line_count = 2 "$midx_chain"
+
+'
+
test_done
--
2.49.0.14.g88b49c1b34
* Re: [PATCH v5 08/14] pack-bitmap.c: teach `rev-list --test-bitmap` about incremental MIDXs
2025-03-20 17:56 ` Taylor Blau
@ 2025-03-20 17:58 ` Taylor Blau
0 siblings, 0 replies; 136+ messages in thread
From: Taylor Blau @ 2025-03-20 17:58 UTC (permalink / raw)
To: git; +Cc: Elijah Newren, Jeff King, Junio C Hamano, Patrick Steinhardt
On Thu, Mar 20, 2025 at 01:56:52PM -0400, Taylor Blau wrote:
> ---
> pack-bitmap.c | 107 ++++++++++++++++++++++++++++++++++++++++----------
> 1 file changed, 86 insertions(+), 21 deletions(-)
Oops. I moved in the wrong direction (9->8 instead of 9->10) when
submitting, hence the duplicate patch here.
Please disregard this one, though the rest of the round is fine.
Thanks,
Taylor
* Re: [PATCH v5 00/14] midx: incremental multi-pack indexes, part two
2025-03-20 17:56 ` [PATCH v5 00/14] " Taylor Blau
` (13 preceding siblings ...)
2025-03-20 17:57 ` [PATCH v5 14/14] midx: implement writing incremental MIDX bitmaps Taylor Blau
@ 2025-03-20 20:00 ` Elijah Newren
14 siblings, 0 replies; 136+ messages in thread
From: Elijah Newren @ 2025-03-20 20:00 UTC (permalink / raw)
To: Taylor Blau; +Cc: git, Jeff King, Junio C Hamano, Patrick Steinhardt
On Thu, Mar 20, 2025 at 10:56 AM Taylor Blau <me@ttaylorr.com> wrote:
>
> This is another new round of my series to implement bitmap support for
> incremental multi-pack indexes (MIDXs). It is still based on 683c54c999
> (Git 2.49, 2025-03-14).
>
> == Changes since last time
>
> This round addresses thorough review from Elijah and Peff. The series
> substantively is unchanged, but there are lots of little quality-of-life
> and commit message readability improvements throughout. As usual, there
> is a range-diff below for convenience.
>
[...]
> Range-diff against v4:
> -: ---------- > 1: 6af65fdaac Documentation: remove a "future work" item from the MIDX docs
> 1: f565f2fff1 ! 2: 0897359506 Documentation: describe incremental MIDX bitmaps
> @@ Documentation/technical/multi-pack-index.adoc: objects_nr($H2) + objects_nr($H1)
> +`o1` and `o2` are compared as follows:
> +
> +1. If `o1` appears in an earlier layer of the MIDX chain than `o2`, then
> -+ `o1` is considered less than `o2`.
> ++ `o1` sorts ahead of `o2`.
> +
> +2. Otherwise, if `o1` and `o2` appear in the same MIDX layer, and that
> + MIDX layer has no base, then if one of `pack(o1)` and `pack(o2)` is
> -+ preferred and the other is not, then the preferred one sorts first. If
> -+ there is a base layer (i.e. the MIDX layer is not the first layer in
> -+ the chain), then if `pack(o1)` appears earlier in that MIDX layer's
> -+ pack order, than `o1` is less than `o2`. Likewise if `pack(o2)`
> -+ appears earlier, than the opposite is true.
> ++ preferred and the other is not, then the preferred one sorts ahead of
> ++ the non-preferred one. If there is a base layer (i.e. the MIDX layer
> ++ is not the first layer in the chain), then if `pack(o1)` appears
> ++ earlier in that MIDX layer's pack order, then `o1` sorts ahead of
> ++ `o2`. Likewise if `pack(o2)` appears earlier, then the opposite is
> ++ true.
> +
> +3. Otherwise, `o1` and `o2` appear in the same pack, and thus in the
> + same MIDX layer. Sort `o1` and `o2` by their offset within their
> @@ Documentation/technical/multi-pack-index.adoc: objects_nr($H2) + objects_nr($H1)
> +The structure of a `*.bitmap` file belonging to an incremental MIDX
> +chain is identical to that of a non-incremental MIDX bitmap, or a
> +classic single-pack bitmap. Since objects are added to the end of the
> -+incremental MIDX's pseudo-pack order (see: above), it is possible to
> ++incremental MIDX's pseudo-pack order (see above), it is possible to
> +extend a bitmap when appending to the end of a MIDX chain.
> +
> +(Note: it is possible likewise to compress a contiguous sequence of MIDX
> -+incremental layers, and their `*.bitmap`(s) into a single layer and
> ++incremental layers, and their `*.bitmap` files into a single layer and
> +`*.bitmap`, but this is not yet implemented.)
> +
> +The object positions used are global within the pseudo-pack order, so
> 2: f2a232e556 ! 3: 5eac0d1485 pack-revindex: prepare for incremental MIDX bitmaps
> @@ Commit message
> incremental or not.
>
> - pack_pos_to_midx() and midx_to_pack_pos() now both take in a global
> - object position in the MIDX pseudo-pack order, and finds the
> + object position in the MIDX pseudo-pack order, and find the
> earliest containing MIDX (similar to midx.c::midx_for_object()).
>
> - midx_pack_order_cmp() adjusts its call to pack_pos_to_midx() by the
> @@ pack-bitmap.c: static struct ewah_bitmap *read_bitmap_1(struct bitmap_index *ind
> return read_bitmap(index->map, index->map_size, &index->map_pos);
> }
>
> -+static uint32_t bitmap_non_extended_bits(struct bitmap_index *index)
> ++static uint32_t bitmap_num_objects_total(struct bitmap_index *index)
> +{
> + if (index->midx) {
> + struct multi_pack_index *m = index->midx;
> @@ pack-bitmap.c: static inline int bitmap_position_extended(struct bitmap_index *b
> if (pos < kh_end(positions)) {
> int bitmap_pos = kh_value(positions, pos);
> - return bitmap_pos + bitmap_num_objects(bitmap_git);
> -+ return bitmap_pos + bitmap_non_extended_bits(bitmap_git);
> ++ return bitmap_pos + bitmap_num_objects_total(bitmap_git);
> }
>
> return -1;
> @@ pack-bitmap.c: static int ext_index_add_object(struct bitmap_index *bitmap_git,
> }
>
> - return bitmap_pos + bitmap_num_objects(bitmap_git);
> -+ return bitmap_pos + bitmap_non_extended_bits(bitmap_git);
> ++ return bitmap_pos + bitmap_num_objects_total(bitmap_git);
> }
>
> struct bitmap_show_data {
> @@ pack-bitmap.c: static void show_extended_objects(struct bitmap_index *bitmap_git
>
> - if (!bitmap_get(objects, st_add(bitmap_num_objects(bitmap_git), i)))
> + if (!bitmap_get(objects,
> -+ st_add(bitmap_non_extended_bits(bitmap_git), i)))
> ++ st_add(bitmap_num_objects_total(bitmap_git),
> ++ i)))
> continue;
>
> obj = eindex->objects[i];
> @@ pack-bitmap.c: static void filter_bitmap_exclude_type(struct bitmap_index *bitma
> */
> for (i = 0; i < eindex->count; i++) {
> - size_t pos = st_add(i, bitmap_num_objects(bitmap_git));
> -+ size_t pos = st_add(i, bitmap_non_extended_bits(bitmap_git));
> ++ size_t pos = st_add(i, bitmap_num_objects_total(bitmap_git));
> if (eindex->objects[i]->type == type &&
> bitmap_get(to_filter, pos) &&
> !bitmap_get(tips, pos))
> @@ pack-bitmap.c: static unsigned long get_size_by_pos(struct bitmap_index *bitmap_
> oi.sizep = &size;
>
> - if (pos < bitmap_num_objects(bitmap_git)) {
> -+ if (pos < bitmap_non_extended_bits(bitmap_git)) {
> ++ if (pos < bitmap_num_objects_total(bitmap_git)) {
> struct packed_git *pack;
> off_t ofs;
>
> @@ pack-bitmap.c: static unsigned long get_size_by_pos(struct bitmap_index *bitmap_git,
> + die(_("unable to get size of %s"), oid_to_hex(&oid));
> }
> } else {
> ++ size_t eindex_pos = pos - bitmap_num_objects_total(bitmap_git);
> struct eindex *eindex = &bitmap_git->ext_index;
> - struct object *obj = eindex->objects[pos - bitmap_num_objects(bitmap_git)];
> -+ struct object *obj = eindex->objects[pos - bitmap_non_extended_bits(bitmap_git)];
> ++ struct object *obj = eindex->objects[eindex_pos];
> if (oid_object_info_extended(bitmap_repo(bitmap_git), &obj->oid,
> &oi, 0) < 0)
> die(_("unable to get size of %s"), oid_to_hex(&obj->oid));
> @@ pack-bitmap.c: static void filter_packed_objects_from_bitmap(struct bitmap_index
> size_t i, pos;
>
> - objects_nr = bitmap_num_objects(bitmap_git);
> -+ objects_nr = bitmap_non_extended_bits(bitmap_git);
> ++ objects_nr = bitmap_num_objects_total(bitmap_git);
> pos = objects_nr / BITS_IN_EWORD;
>
> if (pos > result->word_alloc)
> @@ pack-bitmap.c: static uint32_t count_object_type(struct bitmap_index *bitmap_git
> if (eindex->objects[i]->type == type &&
> bitmap_get(objects,
> - st_add(bitmap_num_objects(bitmap_git), i)))
> -+ st_add(bitmap_non_extended_bits(bitmap_git), i)))
> ++ st_add(bitmap_num_objects_total(bitmap_git), i)))
> count++;
> }
>
> @@ pack-bitmap.c: uint32_t *create_bitmap_mapping(struct bitmap_index *bitmap_git,
> "extension");
>
> - num_objects = bitmap_num_objects(bitmap_git);
> -+ num_objects = bitmap_non_extended_bits(bitmap_git);
> ++ num_objects = bitmap_num_objects_total(bitmap_git);
> CALLOC_ARRAY(reposition, num_objects);
>
> for (i = 0; i < num_objects; ++i) {
> @@ pack-bitmap.c: static off_t get_disk_usage_for_extended(struct bitmap_index *bit
>
> if (!bitmap_get(result,
> - st_add(bitmap_num_objects(bitmap_git), i)))
> -+ st_add(bitmap_non_extended_bits(bitmap_git), i)))
> ++ st_add(bitmap_num_objects_total(bitmap_git),
> ++ i)))
> continue;
>
> if (oid_object_info_extended(bitmap_repo(bitmap_git), &obj->oid,
> 3: aca0318fb1 ! 4: 922ea2f607 pack-bitmap.c: open and store incremental bitmap layers
> @@ Commit message
> with the previous MIDX layer.
>
> The changes in this commit are mostly boilerplate to open the correct
> - bitmap(s), add them to the chain bitmap layers along the "base" pointer,
> - ensures that the correct packs and their reverse indexes are loaded
> - across MIDX layers, etc.
> + bitmap(s), add them to the chain of bitmap layers along the "base"
> + pointer, ensure that the correct packs and their reverse indexes are
> + loaded across MIDX layers, etc.
>
> While we're at it, keep track of a base_nr field to indicate how many
> bitmap layers (including the current bitmap) exist. This will be used in
> 4: 832fd0e8dc = 5: 8fedd96614 pack-bitmap.c: teach `bitmap_for_commit()` about incremental MIDXs
> 5: c7c9f89956 = 6: dccc1b2d2e pack-bitmap.c: teach `show_objects_for_type()` about incremental MIDXs
> 6: 14d3d80c3d ! 7: e31bddd240 pack-bitmap.c: support bitmap pack-reuse with incremental MIDXs
> @@ Commit message
>
> Likewise, in reuse_partial_packfile_from_bitmap(), when reusing only a
> single pack from a MIDX, use the oldest layer's preferred pack as it is
> - likely to contain the most amount of reusable sections.
> + likely to contain the largest number of reusable sections.
>
> Signed-off-by: Taylor Blau <me@ttaylorr.com>
>
> 7: b45a9ccbc2 ! 8: d9dfcb5a1b pack-bitmap.c: teach `rev-list --test-bitmap` about incremental MIDXs
> @@ pack-bitmap.c: static void test_show_commit(struct commit *commit, void *data)
> + tdata->tags = ewah_to_bitmap(bitmap_git->tags);
> +
> + if (bitmap_git->base) {
> -+ CALLOC_ARRAY(tdata->base_tdata, 1);
> ++ tdata->base_tdata = xmalloc(sizeof(struct bitmap_test_data));
> + bitmap_test_data_prepare(tdata->base_tdata, bitmap_git->base);
> + }
> +}
> 8: c1eefeae99 = 9: b1bd60d25d pack-bitmap.c: compute disk-usage with incremental MIDXs
> 9: 11c4b7b949 = 10: 7477a8ac03 pack-bitmap.c: apply pseudo-merge commits with incremental MIDXs
> 10: cb08ad6a62 ! 11: 0fbef17acc ewah: implement `struct ewah_or_iterator`
> @@ ewah/ewah_bitmap.c: void ewah_iterator_init(struct ewah_iterator *it, struct ewa
> + return ret;
> +}
> +
> -+void ewah_or_iterator_free(struct ewah_or_iterator *it)
> ++void ewah_or_iterator_release(struct ewah_or_iterator *it)
> +{
> + free(it->its);
> +}
> @@ ewah/ewok.h: void ewah_iterator_init(struct ewah_iterator *it, struct ewah_bitma
> +
> +int ewah_or_iterator_next(eword_t *next, struct ewah_or_iterator *it);
> +
> -+void ewah_or_iterator_free(struct ewah_or_iterator *it);
> ++void ewah_or_iterator_release(struct ewah_or_iterator *it);
> +
> void ewah_xor(
> struct ewah_bitmap *ewah_i,
> 11: a29f4ee60d ! 12: 439e743fd5 pack-bitmap.c: keep track of each layer's type bitmaps
> @@ pack-bitmap.c: struct bitmap_index {
> + * commits_all[n-2] is the commits field of this bitmap's
> + * 'base', and so on.
> + *
> -+ * When either associated either with a non-incremental MIDX, or
> -+ * a single packfile, these arrays each contain a single
> -+ * element.
> ++ * When associated either with a non-incremental MIDX or a
> ++ * single packfile, these arrays each contain a single element.
> + */
> + struct ewah_bitmap **commits_all;
> + struct ewah_bitmap **trees_all;
> 12: a1cf65bedc ! 13: dcb45e349e pack-bitmap.c: use `ewah_or_iterator` for type bitmap iterators
> @@ pack-bitmap.c: static void show_objects_for_type(
> }
> }
> +
> -+ ewah_or_iterator_free(&it);
> ++ ewah_or_iterator_release(&it);
> }
>
> static int in_bitmapped_pack(struct bitmap_index *bitmap_git,
> @@ pack-bitmap.c: static void filter_bitmap_exclude_type(struct bitmap_index *bitma
> bitmap_unset(to_filter, pos);
> }
>
> -+ ewah_or_iterator_free(&it);
> ++ ewah_or_iterator_release(&it);
> bitmap_free(tips);
> }
>
> @@ pack-bitmap.c: static void filter_bitmap_blob_limit(struct bitmap_index *bitmap_
> bitmap_unset(to_filter, pos);
> }
>
> -+ ewah_or_iterator_free(&it);
> ++ ewah_or_iterator_release(&it);
> bitmap_free(tips);
> }
>
> @@ pack-bitmap.c: static uint32_t count_object_type(struct bitmap_index *bitmap_git
> count++;
> }
>
> -+ ewah_or_iterator_free(&it);
> ++ ewah_or_iterator_release(&it);
> +
> return count;
> }
> @@ pack-bitmap.c: static off_t get_disk_usage_for_type(struct bitmap_index *bitmap_
> }
> }
>
> -+ ewah_or_iterator_free(&it);
> ++ ewah_or_iterator_release(&it);
> +
> return total;
> }
> 13: d0d564685b ! 14: 13568cfa3b midx: implement writing incremental MIDX bitmaps
> @@ builtin/pack-objects.c: static void write_pack_file(void)
> bitmap_writer_build_type_index(&bitmap_writer,
> written_list);
>
> - ## ewah/ewah_bitmap.c ##
> -@@ ewah/ewah_bitmap.c: int ewah_or_iterator_next(eword_t *next, struct ewah_or_iterator *it)
> - return ret;
> - }
> -
> --void ewah_or_iterator_free(struct ewah_or_iterator *it)
> -+void ewah_or_iterator_release(struct ewah_or_iterator *it)
> - {
> - free(it->its);
> - }
> -
> - ## ewah/ewok.h ##
> -@@ ewah/ewok.h: void ewah_or_iterator_init(struct ewah_or_iterator *it,
> -
> - int ewah_or_iterator_next(eword_t *next, struct ewah_or_iterator *it);
> -
> --void ewah_or_iterator_free(struct ewah_or_iterator *it);
> -+void ewah_or_iterator_release(struct ewah_or_iterator *it);
> -
> - void ewah_xor(
> - struct ewah_bitmap *ewah_i,
> -
> ## midx-write.c ##
> @@ midx-write.c: static uint32_t *midx_pack_order(struct write_midx_context *ctx)
> return pack_order;
> @@ pack-bitmap-write.c: void bitmap_writer_finish(struct bitmap_writer *writer,
>
> write_selected_commits_v1(writer, f, offsets);
>
> - ## pack-bitmap.c ##
> -@@ pack-bitmap.c: static void show_objects_for_type(
> - }
> - }
> -
> -- ewah_or_iterator_free(&it);
> -+ ewah_or_iterator_release(&it);
> - }
> -
> - static int in_bitmapped_pack(struct bitmap_index *bitmap_git,
> -@@ pack-bitmap.c: static void filter_bitmap_exclude_type(struct bitmap_index *bitmap_git,
> - bitmap_unset(to_filter, pos);
> - }
> -
> -- ewah_or_iterator_free(&it);
> -+ ewah_or_iterator_release(&it);
> - bitmap_free(tips);
> - }
> -
> -@@ pack-bitmap.c: static void filter_bitmap_blob_limit(struct bitmap_index *bitmap_git,
> - bitmap_unset(to_filter, pos);
> - }
> -
> -- ewah_or_iterator_free(&it);
> -+ ewah_or_iterator_release(&it);
> - bitmap_free(tips);
> - }
> -
> -@@ pack-bitmap.c: static uint32_t count_object_type(struct bitmap_index *bitmap_git,
> - count++;
> - }
> -
> -- ewah_or_iterator_free(&it);
> -+ ewah_or_iterator_release(&it);
> -
> - return count;
> - }
> -@@ pack-bitmap.c: static off_t get_disk_usage_for_type(struct bitmap_index *bitmap_git,
> - }
> - }
> -
> -- ewah_or_iterator_free(&it);
> -+ ewah_or_iterator_release(&it);
> -
> - return total;
> - }
> -
> ## pack-bitmap.h ##
> @@ pack-bitmap.h: struct bitmap_writer {
>
> @@ t/t5334-incremental-multi-pack-index.sh: test_expect_success 'convert incrementa
> +'
> +
> +test_expect_success 'midx verify with multiple layers' '
> ++ test_path_is_file "$midx_chain" &&
> ++ test_line_count = 2 "$midx_chain" &&
> ++
> + git multi-pack-index verify
> +'
> +
>
> base-commit: 683c54c999c301c2cd6f715c411407c413b1d84e
> --
> 2.49.0.14.g88b49c1b34
Reading the range-diff plus the new first patch, all my feedback has
been addressed with this round.
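
The pseudo-pack ordering rules quoted in the range-diff above (layer
first, then pack, then offset) can be sketched as a comparison function.
This is only an illustration, not the code from the series: `struct
obj_pos`, its fields, and `pseudo_pack_order_cmp()` are hypothetical
names, and the sketch assumes that packs in a base-less layer fall back
to pack order when neither or both are preferred.

```c
#include <stdint.h>

/* Hypothetical description of where an object lives in a MIDX chain. */
struct obj_pos {
	uint32_t layer;      /* 0 is the oldest layer (the one with no base) */
	uint32_t pack;       /* position of the pack in that layer's pack order */
	int pack_preferred;  /* non-zero if the pack is the layer's preferred pack */
	uint64_t offset;     /* offset of the object within its pack */
};

static int cmp_u64(uint64_t a, uint64_t b)
{
	return (a > b) - (a < b);
}

/* Returns <0 if o1 sorts ahead of o2, >0 if it sorts after, 0 if equal. */
int pseudo_pack_order_cmp(const struct obj_pos *o1, const struct obj_pos *o2)
{
	/* Rule 1: objects in earlier layers sort first. */
	if (o1->layer != o2->layer)
		return cmp_u64(o1->layer, o2->layer);

	/* Rule 2: same layer, different packs. */
	if (o1->pack != o2->pack) {
		/* Only a layer without a base considers the preferred pack. */
		if (o1->layer == 0 && o1->pack_preferred != o2->pack_preferred)
			return o1->pack_preferred ? -1 : 1;
		/* Assumed fallback: earlier pack in the layer's pack order wins. */
		return cmp_u64(o1->pack, o2->pack);
	}

	/* Rule 3: same pack, so order by offset within the pack. */
	return cmp_u64(o1->offset, o2->offset);
}
```

Because the layer is compared first, every object in a newly appended
layer sorts after all existing objects, which is what allows a bitmap to
be extended when a layer is added to the end of the chain.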
* Re: [PATCH v4 02/13] pack-revindex: prepare for incremental MIDX bitmaps
2025-03-19 0:07 ` Taylor Blau
@ 2025-03-26 18:08 ` Jeff King
0 siblings, 0 replies; 136+ messages in thread
From: Jeff King @ 2025-03-26 18:08 UTC (permalink / raw)
To: Taylor Blau; +Cc: git, Elijah Newren, Junio C Hamano, Patrick Steinhardt
On Tue, Mar 18, 2025 at 08:07:53PM -0400, Taylor Blau wrote:
> > Right; some callers care about the number of objects in *their* layer,
> > like computing the size of some bitmap extensions, bounds-checking
> > pseudo-merge commit lookups, or generating positions for objects in the
> > extended index.
> >
> > I'm happy to include that discussion somewhere in the commit message or
> > as a comment nearby bitmap_non_extended_bits(), but I'm not sure which
> > is better. If you have thoughts, LMK.
>
> I renamed this function to bitmap_num_objects_total(), which I think
> more clearly distinguishes it from bitmap_num_objects(). If you have
> other thoughts or things you think I should do in addition to that, LMK.
Yeah, that name is much more clear to me.
-Peff
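
The distinction discussed above, between the number of objects in one
layer and the number across the whole chain, can be sketched as follows.
This is a simplified model under stated assumptions, not the code in
pack-bitmap.c: `struct layer` and every function name here are
hypothetical. The only behavior taken from the range-diff is that
extended-index positions are offset by the chain-wide total.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for one bitmap layer in an incremental chain. */
struct layer {
	uint32_t num_objects;      /* objects indexed by this layer only */
	const struct layer *base;  /* previous layer, or NULL for the first */
};

/* Per-layer count: what callers sizing this layer's own data want. */
uint32_t layer_num_objects(const struct layer *l)
{
	return l->num_objects;
}

/* Chain-wide count: this layer plus every layer reachable via "base". */
uint32_t chain_num_objects_total(const struct layer *l)
{
	uint32_t total = 0;

	for (; l; l = l->base)
		total += l->num_objects;
	return total;
}

/* Extended-index entries are numbered after all objects in the chain. */
size_t ext_index_to_bitmap_pos(const struct layer *top, size_t eindex_pos)
{
	return (size_t)chain_num_objects_total(top) + eindex_pos;
}

/*
 * Inverse mapping: returns 1 and sets *eindex_pos when "pos" refers to
 * the extended index, or 0 when it refers to an object in the chain.
 */
int bitmap_pos_to_ext_index(const struct layer *top, size_t pos,
			    size_t *eindex_pos)
{
	size_t total = chain_num_objects_total(top);

	if (pos < total)
		return 0;
	*eindex_pos = pos - total;
	return 1;
}
```

With a 100-object base layer and a 20-object top layer, the per-layer
count of the top layer is 20, the chain-wide total is 120, and the first
extended-index entry lands at bit position 120.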
Thread overview: 136+ messages
2024-08-15 21:01 [PATCH 00/13] midx: incremental multi-pack indexes, part two Taylor Blau
2024-08-15 21:01 ` [PATCH 01/13] Documentation: describe incremental MIDX bitmaps Taylor Blau
2024-08-15 21:01 ` [PATCH 02/13] pack-revindex: prepare for " Taylor Blau
2024-08-15 21:01 ` [PATCH 03/13] pack-bitmap.c: open and store incremental bitmap layers Taylor Blau
2024-08-15 21:01 ` [PATCH 04/13] pack-bitmap.c: teach `bitmap_for_commit()` about incremental MIDXs Taylor Blau
2024-08-15 21:01 ` [PATCH 05/13] pack-bitmap.c: teach `show_objects_for_type()` " Taylor Blau
2024-08-15 21:01 ` [PATCH 06/13] pack-bitmap.c: support bitmap pack-reuse with " Taylor Blau
2024-08-15 21:01 ` [PATCH 07/13] pack-bitmap.c: teach `rev-list --test-bitmap` about " Taylor Blau
2024-08-15 21:01 ` [PATCH 08/13] pack-bitmap.c: compute disk-usage with " Taylor Blau
2024-08-15 21:01 ` [PATCH 09/13] pack-bitmap.c: apply pseudo-merge commits " Taylor Blau
2024-08-15 21:01 ` [PATCH 10/13] ewah: implement `struct ewah_or_iterator` Taylor Blau
2024-08-15 21:01 ` [PATCH 11/13] pack-bitmap.c: keep track of each layer's type bitmaps Taylor Blau
2024-08-15 21:01 ` [PATCH 12/13] pack-bitmap.c: use `ewah_or_iterator` for type bitmap iterators Taylor Blau
2024-08-15 21:01 ` [PATCH 13/13] midx: implement writing incremental MIDX bitmaps Taylor Blau
2024-08-15 22:28 ` [PATCH v2 00/13] midx: incremental multi-pack indexes, part two Taylor Blau
2024-08-15 22:28 ` [PATCH v2 01/13] Documentation: describe incremental MIDX bitmaps Taylor Blau
2024-08-15 22:28 ` [PATCH v2 02/13] pack-revindex: prepare for " Taylor Blau
2024-08-15 22:28 ` [PATCH v2 03/13] pack-bitmap.c: open and store incremental bitmap layers Taylor Blau
2024-08-15 22:29 ` [PATCH v2 04/13] pack-bitmap.c: teach `bitmap_for_commit()` about incremental MIDXs Taylor Blau
2024-08-15 22:29 ` [PATCH v2 05/13] pack-bitmap.c: teach `show_objects_for_type()` " Taylor Blau
2024-08-15 22:29 ` [PATCH v2 06/13] pack-bitmap.c: support bitmap pack-reuse with " Taylor Blau
2024-08-15 22:29 ` [PATCH v2 07/13] pack-bitmap.c: teach `rev-list --test-bitmap` about " Taylor Blau
2024-08-15 22:29 ` [PATCH v2 08/13] pack-bitmap.c: compute disk-usage with " Taylor Blau
2024-08-15 22:29 ` [PATCH v2 09/13] pack-bitmap.c: apply pseudo-merge commits " Taylor Blau
2024-08-15 22:29 ` [PATCH v2 10/13] ewah: implement `struct ewah_or_iterator` Taylor Blau
2024-08-15 22:29 ` [PATCH v2 11/13] pack-bitmap.c: keep track of each layer's type bitmaps Taylor Blau
2024-08-15 22:29 ` [PATCH v2 12/13] pack-bitmap.c: use `ewah_or_iterator` for type bitmap iterators Taylor Blau
2024-08-15 22:29 ` [PATCH v2 13/13] midx: implement writing incremental MIDX bitmaps Taylor Blau
2024-08-28 17:55 ` [PATCH] fixup! " Junio C Hamano
2024-08-28 18:33 ` Jeff King
2024-08-29 18:57 ` Taylor Blau
2024-08-29 19:27 ` Jeff King
2024-11-19 20:56 ` Taylor Blau
2024-11-19 22:07 ` [PATCH v3 00/13] midx: incremental multi-pack indexes, part two Taylor Blau
2024-11-19 22:07 ` [PATCH v3 01/13] Documentation: describe incremental MIDX bitmaps Taylor Blau
2025-02-28 10:01 ` Patrick Steinhardt
2025-02-28 23:26 ` Taylor Blau
2025-03-03 10:54 ` Patrick Steinhardt
2024-11-19 22:07 ` [PATCH v3 02/13] pack-revindex: prepare for " Taylor Blau
2025-02-28 10:01 ` Patrick Steinhardt
2025-02-28 23:39 ` Taylor Blau
2024-11-19 22:07 ` [PATCH v3 03/13] pack-bitmap.c: open and store incremental bitmap layers Taylor Blau
2025-02-28 10:01 ` Patrick Steinhardt
2025-02-28 23:49 ` Taylor Blau
2025-03-03 10:55 ` Patrick Steinhardt
2024-11-19 22:07 ` [PATCH v3 04/13] pack-bitmap.c: teach `bitmap_for_commit()` about incremental MIDXs Taylor Blau
2025-02-28 10:01 ` Patrick Steinhardt
2025-03-01 0:12 ` Taylor Blau
2024-11-19 22:07 ` [PATCH v3 05/13] pack-bitmap.c: teach `show_objects_for_type()` " Taylor Blau
2024-11-19 22:07 ` [PATCH v3 06/13] pack-bitmap.c: support bitmap pack-reuse with " Taylor Blau
2025-02-28 10:01 ` Patrick Steinhardt
2025-03-01 0:16 ` Taylor Blau
2024-11-19 22:07 ` [PATCH v3 07/13] pack-bitmap.c: teach `rev-list --test-bitmap` about " Taylor Blau
2025-02-28 10:01 ` Patrick Steinhardt
2025-03-01 0:19 ` Taylor Blau
2024-11-19 22:07 ` [PATCH v3 08/13] pack-bitmap.c: compute disk-usage with " Taylor Blau
2024-11-19 22:07 ` [PATCH v3 09/13] pack-bitmap.c: apply pseudo-merge commits " Taylor Blau
2024-11-19 22:07 ` [PATCH v3 10/13] ewah: implement `struct ewah_or_iterator` Taylor Blau
2025-02-28 10:01 ` Patrick Steinhardt
2025-03-01 0:22 ` Taylor Blau
2024-11-19 22:07 ` [PATCH v3 11/13] pack-bitmap.c: keep track of each layer's type bitmaps Taylor Blau
2025-02-28 10:01 ` Patrick Steinhardt
2025-03-01 0:26 ` Taylor Blau
2024-11-19 22:07 ` [PATCH v3 12/13] pack-bitmap.c: use `ewah_or_iterator` for type bitmap iterators Taylor Blau
2025-02-28 10:01 ` Patrick Steinhardt
2025-03-01 0:28 ` Taylor Blau
2024-11-19 22:07 ` [PATCH v3 13/13] midx: implement writing incremental MIDX bitmaps Taylor Blau
2025-02-28 10:01 ` Patrick Steinhardt
2025-03-01 0:31 ` Taylor Blau
2024-11-20 8:49 ` [PATCH v3 00/13] midx: incremental multi-pack indexes, part two Junio C Hamano
2025-03-14 20:18 ` [PATCH v4 " Taylor Blau
2025-03-14 20:18 ` [PATCH v4 01/13] Documentation: describe incremental MIDX bitmaps Taylor Blau
2025-03-18 1:16 ` Jeff King
2025-03-18 23:11 ` Taylor Blau
2025-03-18 2:42 ` Elijah Newren
2025-03-18 23:19 ` Taylor Blau
2025-03-14 20:18 ` [PATCH v4 02/13] pack-revindex: prepare for " Taylor Blau
2025-03-18 1:27 ` Jeff King
2025-03-19 0:02 ` Taylor Blau
2025-03-19 0:07 ` Taylor Blau
2025-03-26 18:08 ` Jeff King
2025-03-18 2:43 ` Elijah Newren
2025-03-19 0:03 ` Taylor Blau
2025-03-14 20:18 ` [PATCH v4 03/13] pack-bitmap.c: open and store incremental bitmap layers Taylor Blau
2025-03-18 4:13 ` Elijah Newren
2025-03-19 0:08 ` Taylor Blau
2025-03-14 20:18 ` [PATCH v4 04/13] pack-bitmap.c: teach `bitmap_for_commit()` about incremental MIDXs Taylor Blau
2025-03-18 1:38 ` Jeff King
2025-03-19 0:13 ` Taylor Blau
2025-03-14 20:18 ` [PATCH v4 05/13] pack-bitmap.c: teach `show_objects_for_type()` " Taylor Blau
2025-03-14 20:18 ` [PATCH v4 06/13] pack-bitmap.c: support bitmap pack-reuse with " Taylor Blau
2025-03-18 4:13 ` Elijah Newren
2025-03-19 0:17 ` Taylor Blau
2025-03-14 20:18 ` [PATCH v4 07/13] pack-bitmap.c: teach `rev-list --test-bitmap` about " Taylor Blau
2025-03-18 5:31 ` Elijah Newren
2025-03-19 0:30 ` Taylor Blau
2025-03-14 20:18 ` [PATCH v4 08/13] pack-bitmap.c: compute disk-usage with " Taylor Blau
2025-03-18 1:41 ` Jeff King
2025-03-19 0:30 ` Taylor Blau
2025-03-14 20:18 ` [PATCH v4 09/13] pack-bitmap.c: apply pseudo-merge commits " Taylor Blau
2025-03-14 20:18 ` [PATCH v4 10/13] ewah: implement `struct ewah_or_iterator` Taylor Blau
2025-03-18 1:44 ` Jeff King
2025-03-19 0:33 ` Taylor Blau
2025-03-14 20:18 ` [PATCH v4 11/13] pack-bitmap.c: keep track of each layer's type bitmaps Taylor Blau
2025-03-18 2:01 ` Jeff King
2025-03-19 0:38 ` Taylor Blau
2025-03-18 6:43 ` Elijah Newren
2025-03-19 0:39 ` Taylor Blau
2025-03-14 20:18 ` [PATCH v4 12/13] pack-bitmap.c: use `ewah_or_iterator` for type bitmap iterators Taylor Blau
2025-03-18 2:05 ` Jeff King
2025-03-19 23:02 ` Taylor Blau
2025-03-14 20:19 ` [PATCH v4 13/13] midx: implement writing incremental MIDX bitmaps Taylor Blau
2025-03-18 2:16 ` Jeff King
2025-03-20 0:14 ` Taylor Blau
2025-03-18 17:13 ` Elijah Newren
2025-03-20 0:16 ` Taylor Blau
2025-03-18 2:21 ` [PATCH v4 00/13] midx: incremental multi-pack indexes, part two Jeff King
2025-03-20 0:18 ` Taylor Blau
2025-03-20 17:56 ` [PATCH v5 00/14] " Taylor Blau
2025-03-20 17:56 ` [PATCH v5 01/14] Documentation: remove a "future work" item from the MIDX docs Taylor Blau
2025-03-20 17:56 ` [PATCH v5 02/14] Documentation: describe incremental MIDX bitmaps Taylor Blau
2025-03-20 17:56 ` [PATCH v5 03/14] pack-revindex: prepare for " Taylor Blau
2025-03-20 17:56 ` [PATCH v5 04/14] pack-bitmap.c: open and store incremental bitmap layers Taylor Blau
2025-03-20 17:56 ` [PATCH v5 05/14] pack-bitmap.c: teach `bitmap_for_commit()` about incremental MIDXs Taylor Blau
2025-03-20 17:56 ` [PATCH v5 06/14] pack-bitmap.c: teach `show_objects_for_type()` " Taylor Blau
2025-03-20 17:56 ` [PATCH v5 07/14] pack-bitmap.c: support bitmap pack-reuse with " Taylor Blau
2025-03-20 17:56 ` [PATCH v5 08/14] pack-bitmap.c: teach `rev-list --test-bitmap` about " Taylor Blau
2025-03-20 17:56 ` Taylor Blau
2025-03-20 17:58 ` Taylor Blau
2025-03-20 17:56 ` [PATCH v5 09/14] pack-bitmap.c: compute disk-usage with " Taylor Blau
2025-03-20 17:56 ` [PATCH v5 10/14] pack-bitmap.c: apply pseudo-merge commits " Taylor Blau
2025-03-20 17:56 ` [PATCH v5 11/14] ewah: implement `struct ewah_or_iterator` Taylor Blau
2025-03-20 17:57 ` [PATCH v5 12/14] pack-bitmap.c: keep track of each layer's type bitmaps Taylor Blau
2025-03-20 17:57 ` [PATCH v5 13/14] pack-bitmap.c: use `ewah_or_iterator` for type bitmap iterators Taylor Blau
2025-03-20 17:57 ` [PATCH v5 14/14] midx: implement writing incremental MIDX bitmaps Taylor Blau
2025-03-20 20:00 ` [PATCH v5 00/14] midx: incremental multi-pack indexes, part two Elijah Newren