Git Mailing List Archive mirror
 help / color / mirror / Atom feed
From: Derrick Stolee <derrickstolee@github.com>
To: Taylor Blau <me@ttaylorr.com>, git@vger.kernel.org
Cc: Jeff King <peff@peff.net>, Junio C Hamano <gitster@pobox.com>
Subject: Re: [PATCH v2 2/2] pack-bitmap.c: use commit boundary during bitmap traversal
Date: Fri, 5 May 2023 14:13:45 -0400	[thread overview]
Message-ID: <e6e2401a-519a-4859-d20b-6b947e94e1ec@github.com> (raw)
In-Reply-To: <1993af00cba3af83755da557816c1c7a8b52c844.1683307620.git.me@ttaylorr.com>

On 5/5/2023 1:27 PM, Taylor Blau wrote:

> +static struct bitmap *find_boundary_objects(struct bitmap_index *bitmap_git,
> +					    struct rev_info *revs,
> +					    struct object_list *roots)
> +{
> +	struct bitmap_boundary_cb cb = { .bitmap_git = bitmap_git };
> +	unsigned int i;
> +	unsigned int tmp_blobs, tmp_trees, tmp_tags;
> +	int any_missing = 0;
> +
> +	revs->ignore_missing_links = 1;
> +
> +	/*
> +	 * OR in any existing reachability bitmaps among `roots` into `base`.
> +	 */
> +	while (roots) {
> +		struct object *object = roots->item;
> +		roots = roots->next;
> +
> +		if (object->type == OBJ_COMMIT &&
> +		    !obj_in_bitmap(bitmap_git, object, cb.base) &&
> +		    add_commit_to_bitmap(bitmap_git, &cb.base,
> +					 (struct commit *)object)) {
> +			object->flags |= SEEN;
> +			continue;
> +		}
> +
> +		any_missing = 1;
> +	}
> +
> +	if (!any_missing)
> +		goto cleanup;

Here, we check if the list of roots are completely covered by existing
bitmaps. This prevents doing the commit-only walk as well as the boundary
checks.

There's another confusing bit here: if we have a bitmap for the commit,
then we mark it as SEEN. Does that mean that the later commit walk will
skip walking its history? Would we then get a boundary that is actually
further in history than necessary? ("A --not B C" would walk all of
B..A if C has a bitmap, even if a lot of that region is reachable from C.)

My initial thought here was that this is an unlikely case, so the
optimization isn't worth the code complexity. But now, I'm a little
concerned that it actually hurts the later walk in the case of multiple
roots.

> +	tmp_blobs = revs->blob_objects;
> +	tmp_trees = revs->tree_objects;
> +	tmp_tags = revs->blob_objects;
> +	revs->blob_objects = 0;
> +	revs->tree_objects = 0;
> +	revs->tag_objects = 0;
> +
> +	/*
> +	 * We didn't have complete coverage of the roots. First setup a
> +	 * revision walk to (a) OR in any bitmaps that are UNINTERESTING
> +	 * between the tips and boundary, and (b) record the boundary.
> +	 */
> +	trace2_region_enter("pack-bitmap", "boundary-prepare", the_repository);
> +	if (prepare_revision_walk(revs))
> +		die("revision walk setup failed");
> +	trace2_region_leave("pack-bitmap", "boundary-prepare", the_repository);
> +
> +	trace2_region_enter("pack-bitmap", "boundary-traverse", the_repository);
> +	revs->boundary = 1;
> +	traverse_commit_list_filtered(revs,
> +				      show_boundary_commit,
> +				      show_boundary_object,
> +				      &cb, NULL);

Looks good. Callbacks were clear when I read them.

> +	revs->boundary = 0;
> +	revs->blob_objects = tmp_blobs;
> +	revs->tree_objects = tmp_trees;
> +	revs->tag_objects = tmp_tags;

These would seem more natural if they were after the trace2_region_leave().

> +	reset_revision_walk();
> +	clear_object_flags(UNINTERESTING);
> +	trace2_region_leave("pack-bitmap", "boundary-traverse", the_repository);
> +
> +	/*
> +	 * Then add the boundary commit(s) as fill-in traversal tips.
> +	 */
> +	trace2_region_enter("pack-bitmap", "boundary-fill-in", the_repository);
> +	if (cb.boundary.nr) {

Making this an if block does help keep its variables more local. It does
make me think about whether the trace2 regions should be within the block,
but it's easier to analyze things when the regions are expected to be present.

> +		struct object *obj;
> +		int needs_walk = 0;
> +
> +		for (i = 0; i < cb.boundary.nr; i++) {
> +			obj = cb.boundary.objects[i].item;
> +
> +			if (obj_in_bitmap(bitmap_git, obj, cb.base)) {
> +				obj->flags |= SEEN;

This SEEN makes sense: we want to terminate the walk here as everything
reachable is already in the bitmap.

> +			} else {
> +				add_pending_object(revs, obj, "");
> +				needs_walk = 1;
> +			}
> +		}
> +
> +		if (needs_walk)
> +			cb.base = fill_in_bitmap(bitmap_git, revs, cb.base, NULL);

I wonder if fill_in_bitmap() already does "the right thing" when there
are no pending objects or when all pending objects are already in the
bitmap. Do we need to do these checks, or should we just put all boundary
commits in the pending set?

> +	}
> +	trace2_region_leave("pack-bitmap", "boundary-fill-in", the_repository);
> +
> +

nit: extra empty line

> +cleanup:
> +	revs->ignore_missing_links = 0;
> +
> +	return cb.base;

Do we need to clean up the cb.boundary object array?

> +}
> +

...

> @@ -1605,18 +1728,15 @@ struct bitmap_index *prepare_bitmap_walk(struct rev_info *revs,
>  	if (load_bitmap(revs->repo, bitmap_git) < 0)
>  		goto cleanup;
>  
> -	object_array_clear(&revs->pending);
> -
>  	if (haves) {
> -		revs->ignore_missing_links = 1;
> -		haves_bitmap = find_objects(bitmap_git, revs, haves, NULL);
> -		reset_revision_walk();
> -		revs->ignore_missing_links = 0;
> -
> +		haves_bitmap = find_boundary_objects(bitmap_git, revs, haves);
>  		if (!haves_bitmap)
>  			BUG("failed to perform bitmap walk");
>  	}

I mentioned in reply to the cover letter that deleting this older algorithm
is likely premature at this point, and we might want to keep it in general
as an option.

> +	object_array_clear(&revs->pending);
> +	reset_revision_walk();
> +

Thanks,
-Stolee

  reply	other threads:[~2023-05-05 18:13 UTC|newest]

Thread overview: 51+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2023-04-25  0:00 [PATCH 0/3] pack-bitmap: boundary-based bitmap traversal Taylor Blau
2023-04-25  0:00 ` [PATCH 1/3] revision: support tracking uninteresting commits Taylor Blau
2023-04-25 18:15   ` Derrick Stolee
2023-05-03 21:48     ` Taylor Blau
2023-05-04 13:46       ` Derrick Stolee
2023-05-03 22:08     ` Taylor Blau
2023-05-04 13:59       ` Derrick Stolee
2023-05-05 17:30         ` Taylor Blau
2023-04-25 18:22   ` Junio C Hamano
2023-04-25 18:48     ` Taylor Blau
2023-04-25  0:00 ` [PATCH 2/3] pack-bitmap.c: extract `fill_in_bitmap()` Taylor Blau
2023-04-25 18:32   ` Junio C Hamano
2023-04-25 18:51     ` [PATCH 2/3] pack-bitmap.c: extract `fill_in_bitmap()`t Taylor Blau
2023-04-25  0:00 ` [PATCH 3/3] pack-bitmap.c: use commit boundary during bitmap traversal Taylor Blau
2023-04-25 18:52   ` Junio C Hamano
2023-05-02 21:31     ` Felipe Contreras
2023-05-03 21:42     ` Taylor Blau
2023-05-03 21:58       ` Junio C Hamano
2023-04-25 18:53   ` Derrick Stolee
2023-04-25 18:02 ` [PATCH 0/3] pack-bitmap: boundary-based " Junio C Hamano
2023-04-25 18:30   ` Derrick Stolee
2023-04-25 18:57     ` Taylor Blau
2023-04-25 19:52       ` Derrick Stolee
2023-05-03 21:43         ` Taylor Blau
2023-04-25 18:06 ` Derrick Stolee
2023-04-25 19:01   ` Taylor Blau
2023-04-25 20:27     ` Derrick Stolee
2023-05-01 22:08 ` Junio C Hamano
2023-05-02 23:52   ` Taylor Blau
2023-05-05 17:27 ` [PATCH v2 0/2] " Taylor Blau
2023-05-05 17:27   ` [PATCH v2 1/2] pack-bitmap.c: extract `fill_in_bitmap()` Taylor Blau
2023-05-05 17:27   ` [PATCH v2 2/2] pack-bitmap.c: use commit boundary during bitmap traversal Taylor Blau
2023-05-05 18:13     ` Derrick Stolee [this message]
2023-05-05 18:43       ` Taylor Blau
2023-05-05 17:33   ` [PATCH v2 0/2] pack-bitmap: boundary-based " Junio C Hamano
2023-05-05 17:59   ` Derrick Stolee
2023-05-05 18:46     ` [PATCH v2 0/2] pack-bitmap: boundary-based bitmap traversalt Taylor Blau
2023-05-05 20:45       ` Derrick Stolee
2023-05-08 17:38 ` [PATCH v3 0/3] pack-bitmap: boundary-based bitmap traversal Taylor Blau
2023-05-08 17:38   ` [PATCH v3 1/3] object: add object_array initializer helper function Taylor Blau
2023-05-08 17:38   ` [PATCH v3 2/3] pack-bitmap.c: extract `fill_in_bitmap()` Taylor Blau
2023-05-08 17:38   ` [PATCH v3 3/3] pack-bitmap.c: use commit boundary during bitmap traversal Taylor Blau
2023-05-08 20:53     ` Derrick Stolee
2023-05-08 22:12       ` Taylor Blau
2023-05-10 22:55   ` [PATCH v3 0/3] pack-bitmap: boundary-based " Junio C Hamano
2023-05-10 23:10     ` Taylor Blau
2023-05-11 15:23       ` Derrick Stolee
2023-06-08 16:25 ` [PATCH v4 " Taylor Blau
2023-06-08 16:25   ` [PATCH v4 1/3] object: add object_array initializer helper function Taylor Blau
2023-06-08 16:25   ` [PATCH v4 2/3] pack-bitmap.c: extract `fill_in_bitmap()` Taylor Blau
2023-06-08 16:25   ` [PATCH v4 3/3] pack-bitmap.c: use commit boundary during bitmap traversal Taylor Blau

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=e6e2401a-519a-4859-d20b-6b947e94e1ec@github.com \
    --to=derrickstolee@github.com \
    --cc=git@vger.kernel.org \
    --cc=gitster@pobox.com \
    --cc=me@ttaylorr.com \
    --cc=peff@peff.net \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).