All the mail mirrored from lore.kernel.org
 help / color / mirror / Atom feed
From: Taylor Blau <me@ttaylorr.com>
To: "Ævar Arnfjörð Bjarmason" <avarab@gmail.com>
Cc: Taylor Blau <me@ttaylorr.com>,
	git@vger.kernel.org, dstolee@microsoft.com, gitster@pobox.com,
	peff@peff.net, szeder.dev@gmail.com
Subject: Re: Making split commit graphs pick up new options (namely --changed-paths)
Date: Thu, 10 Jun 2021 13:22:56 -0400	[thread overview]
Message-ID: <YMJKcHpN/gL5U6KK@nand.local> (raw)
In-Reply-To: <871r9a2dol.fsf@evledraar.gmail.com>

On Thu, Jun 10, 2021 at 12:40:33PM +0200, Ævar Arnfjörð Bjarmason wrote:
> Is there any way with an existing --split setup that introduces a
> --changed-paths to make the "add bloom filters to the graph" eventually
> consistent, or is some one-off --split=replace the only way to
> grandfather in such a feature?

I'm assuming what you mean is "can I introduce changed-path Bloom
filters into an existing split commit-graph with many layers without
having to recompute the whole thing at once?" If so, then the answer is
yes.

Passing --changed-paths causes the commit-graph machinery to compute
missing Bloom filters for every commit in the graph layer it is writing.
So, if you do something like:

  git commit-graph write --split --reachable --size-multiple=2 \
    --changed-paths

(--size-multiple=2 is the default, but I'm including it for clarity),
then you'll get changed-path Bloom filters for all commits in the new
layer, including any layers which may have been merged to create that
layer.

That all still respects `--max-new-filters`, with preference going to
commits with lower generation numbers before higher ones (unless
including commits from packs explicitly with --stdin-packs, in which
case preference is given in pack order; see
commit-graph.c:commit_pos_cmp() for details).

t4216 shows this for --split=replace, but you could just as easily
imagine a test like this:

    #!/bin/sh

    rm -fr repo
    git init repo
    cd repo

    commit () {
      >$1
      git add $1
      git commit -m "$1"
    }

    # no changed-path Bloom filter
    commit missing
    git commit-graph write --split --reachable --no-changed-paths

    missing="$(git rev-parse HEAD)"
    ~/src/git/t/helper/test-tool bloom get_filter_for_commit "$missing"

    # >= 2x the size of the previous layer, so they will be merged
    commit bloom1
    commit bloom2
    git commit-graph write --split --reachable --changed-paths

    # and the $missing commit has a Bloom filter
    ~/src/git/t/helper/test-tool bloom get_filter_for_commit "$missing"

(One caveat is that if you run that script unmodified, you'll get a
filter on both invcations of the test-tool: that's because it computes
filters on the fly if they are missing. You can change that by s/1/0 in
the call to get_or_compute_bloom_filter()).

> Reading the code there seems to be no way to do that, and we have the
> "chunk_bloom_data" in the graph, as well as "bloom_filter_settings".
>
> I'd expect some way to combine the "max_new_filters" and --split with
> some eventual-consistency logic so that graphs not matching our current
> settings are replaced, or replaced some <limit> at a time.

This is asking about something slightly different, Bloom filter
settings rather than the existence of chagned-path Bloom filters
themselves. The Bloom settings aren't written to the commit-graph
although there has been some discussion about doing this in the past.

If we ever did encode the Bloom settings, I imagine that accomplishing a
sort of "eventually replace all changed-path Bloom filters with these
new settings" would be as simple as considering all filters computed
under different settings to be "uncomputed".

> Also, am I reading the expire_commit_graphs() logic correctly that we
> first write the split graph, and then unlink() things that are too old?
> I.e. if you rely on the commit-graph to optimize things this will make
> things slower until the next run of writing the graph?

Before expire_commit_graphs() is called, we call mark_commit_graphs()
which freshens the mtimes of all surviving commit-graph layers, and then
expire_commit_graphs() removes the stale layers. I'm not sure what
things getting slower is referring to since the resulting commit-graph
has at least as many commits as the commit-graph that existed prior to
the write.

> I expected to find something more gentle there [...]

FWIW, I also find this "expire based on mtimes" thing a little odd for
writing split commit-graphs because we know exactly which layers we want
to get rid of. I suspect that the reuse comes from wanting to unify the
logic for handling '--expire-time' with the expiration that happens
after writing a split commit-graph that merged two or more previous
layers.

I would probably change mark_commit_graphs() to remove those merged
layers explicitly (but still run expire_commit_graphs() to handle
--expire-time). But, come to think of it... if merging >2 layers already
causes the merged layers to be removed, then why would you ever set an
--expire-time yourself?

Thanks,
Taylor

  reply	other threads:[~2021-06-10 17:24 UTC|newest]

Thread overview: 10+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2021-06-10 10:40 Making split commit graphs pick up new options (namely --changed-paths) Ævar Arnfjörð Bjarmason
2021-06-10 17:22 ` Taylor Blau [this message]
2021-06-10 18:21   ` Derrick Stolee
2021-06-10 23:56   ` Ævar Arnfjörð Bjarmason
2021-06-11  0:50     ` Taylor Blau
2021-06-11 17:47       ` Derrick Stolee
2021-06-11 19:01         ` Taylor Blau
2021-06-15 14:21           ` Derrick Stolee
2021-06-15 14:35             ` Ævar Arnfjörð Bjarmason
2021-06-16  1:45               ` Junio C Hamano

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=YMJKcHpN/gL5U6KK@nand.local \
    --to=me@ttaylorr.com \
    --cc=avarab@gmail.com \
    --cc=dstolee@microsoft.com \
    --cc=git@vger.kernel.org \
    --cc=gitster@pobox.com \
    --cc=peff@peff.net \
    --cc=szeder.dev@gmail.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.