From: Patrick Steinhardt <ps@pks.im>
To: Jeff King <peff@peff.net>
Cc: Taylor Blau <me@ttaylorr.com>, Junio C Hamano <gitster@pobox.com>,
git@vger.kernel.org
Subject: Re: [PATCH 0/2] receive-pack: use advertised reference tips to inform connectivity check
Date: Tue, 1 Nov 2022 12:49:51 +0100
Message-ID: <Y2EH3+l62S55EJOZ@ncase>
In-Reply-To: <Y2DgJi6foPyBhycU@coredump.intra.peff.net>
On Tue, Nov 01, 2022 at 05:00:22AM -0400, Jeff King wrote:
> On Fri, Oct 28, 2022 at 04:42:19PM +0200, Patrick Steinhardt wrote:
>
> > - A client shouldn't assume objects to exist that have not been part
> > of the reference advertisement. But if it excluded an object from
> > the packfile that is reachable via any ref that is excluded from
> > the reference advertisement due to `transfer.hideRefs` we'd have
> > accepted the push anyway. I'd argue that this is a bug in the
> > current implementation.
>
> Like others, I don't think this is a bug exactly. We'd never introduce a
> corruption. We're just more lenient with clients than we need to be.
>
> But I don't think your scheme changes that. In a sense, the tips used by
> "rev-list --not --all" are really an optimization. We will walk the
> history from the to-be-updated ref tips all the way down to the roots if
> we have to. So imagine that I have object X which is not referenced at
> all (neither hidden nor visible ref). We obviously do not advertise it
> to the client, but let's further imagine that a client sends us a pack
> with X..Y, and a request to update some ref to Y.
>
> Both before and after your code, if rev-list is able to walk down from Y
> until either we hit all roots or all UNINTERESTING commits, it will be
> satisfied. So as long as the receiving repo actually has all of the
> history leading up to X, it will allow the push, regardless of your
> patch.

Oh, right! Now I see where my thinko was, which means both you and
Taylor are correct. I somehow assumed that we'd fail the connectivity
check in that case, but all it means is that we now potentially walk
more objects than we'd have done if we used `--not --all`.

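To make the "tips are only an optimization" point concrete, here is a
minimal Python sketch. It is a toy model and not how git-rev-list(1) is
implemented: commits are plain strings, the `parents` dict stands in for
the object database, and `connectivity_check` is a hypothetical helper.

```python
# Toy model of the connectivity check; not git's actual implementation.
# Commits are plain strings; `parents` maps each commit we have on disk
# to its parent commits.

def connectivity_check(parents, new_tips, uninteresting):
    """Walk from new_tips until we hit roots or UNINTERESTING commits.

    Returns (ok, walked). ok is False as soon as a commit we need to
    traverse is missing from `parents`. Real rev-list also propagates
    UNINTERESTING to ancestors; this model only stops at the tips.
    """
    boundary = set(uninteresting)
    seen = set()
    stack = list(new_tips)
    while stack:
        commit = stack.pop()
        if commit in boundary or commit in seen:
            continue
        if commit not in parents:
            return False, len(seen)
        seen.add(commit)
        stack.extend(parents[commit])
    return True, len(seen)


# History: root <- A <- X <- Y. The client pushes X..Y (only Y) and asks
# to update a ref to Y; the server already has everything up to X.
parents = {"root": [], "A": ["root"], "X": ["A"], "Y": ["X"]}

# X is handed to the walk as a negative tip: we stop right away.
with_tip = connectivity_check(parents, ["Y"], uninteresting=["X"])

# X is not among the negative tips: we walk down to the root, but the
# check is still satisfied because the history below X exists.
without_tip = connectivity_check(parents, ["Y"], uninteresting=[])
```

In this model both calls succeed; the only difference is that the second
one visits four commits instead of one.
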
> If we wanted to stop being lenient, we'd have to actually check that
> every object we traverse is either reachable, or came from the
> just-pushed pack.

Yes, indeed.

> There's also a subtle timing issue here. Our connectivity check happens
> after we've finished receiving the pack. So not only are we including
> hidden refs, but we are using the ref state at the end of the push
> (after receiving and processing the incoming pack), rather than the
> beginning.
>
> From the same "leniency" lens this seems like the wrong thing. But as
> above, it doesn't matter in practice, because these tips are really an
> optimization to tell rev-list that it can stop traversing.
>
> If you think of the connectivity check less as "did the client try to
> cheat" and more as "is it OK to update these refs without introducing a
> corruption", then it makes sense that you'd want to read the inputs
> to the check as close to the ref update as possible, because it shrinks
> the race window which could introduce corruption.

Agreed.

> Imagine a situation like this:
>
> 0. We advertise to client that we have commit X.
>
> 1. Client starts pushing up a pack with X..Y and asks to update some
> branch to Y.
>
> 2. Meanwhile, the branch with X is deleted, and X is pruned.
>
> 3. Server finishes receiving the pack. All looks good, and then we
> start a connectivity check.
>
> In the current code, that check starts with the current ref state (with
> X deleted) as a given, and makes sure that we have the objects we need
> to update the refs. After your patches, it would take X as a given, and
> stop traversing when we see it.
>
> That same race exists before your patch, but it's between the time of
> "rev-list --not --all" running and the ref update. After your patch,
> it's between the advertisement and the ref update, which can be a long
> time (hours or even days, if the client is very slow).
>
> In practice I'm not sure how big a deal this is. If we feed the
> now-pruned X to rev-list, it may notice that X went away, though we've
> been reducing the number of checks there in the name of efficiency
> (e.g., if it's still in the commit graph, we'd say "OK, good enough"
> these days, even if we don't have it on disk anymore).
>
> But it feels like a wrong direction to make that race longer if there's
> no need to.

Good point.

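The race can be sketched with a small toy model in Python. This is an
illustration under assumptions, not git's code: `is_connected` is a
hypothetical helper, `store` stands in for the object database, and the
model assumes that negative tips are trusted without checking that they
still exist (as noted above, the real rev-list may or may not notice).

```python
# Toy model of the race; not git's actual implementation.

def is_connected(store, new_tips, boundary_tips):
    """Return True if every commit reachable from new_tips, stopping at
    boundary_tips, is present in `store`. Boundary tips are assumed to
    be trusted and are not checked for existence."""
    boundary = set(boundary_tips)
    seen = set()
    stack = list(new_tips)
    while stack:
        commit = stack.pop()
        if commit in boundary or commit in seen:
            continue
        if commit not in store:
            return False
        seen.add(commit)
        stack.extend(store[commit])
    return True


# Step 0: we advertise X to the client.
advertised_tips = ["X"]

# Steps 1-3: the client pushes Y (whose parent is X). Meanwhile the
# branch pointing at X is deleted and X is pruned, so X is missing from
# the store by the time the connectivity check runs.
store = {"root": [], "A": ["root"], "Y": ["X"]}
current_tips = ["A"]  # assumed: some other branch still points at A

# Using the stale advertisement as the boundary hides the missing X.
stale_result = is_connected(store, ["Y"], advertised_tips)

# Using the ref state at check time forces the walk through X.
fresh_result = is_connected(store, ["Y"], current_tips)
```

With the stale advertisement the check passes even though X is gone,
while the check against the current ref state fails as it should.
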
> So all that said...
>
> > - Second, by using advertised refs as inputs instead of `git
> > rev-list --not --all` we avoid looking up all refs that are
> > irrelevant to the current push. This can be a huge performance
> > improvement in repos that have a huge amount of internal, hidden
> > refs. In one of our repos with 7m refs, of which 6.8m are hidden,
> > this speeds up pushes from ~30s to ~4.5s.
>
> I like the general direction here of avoiding the hidden refs. The
> client _shouldn't_ have been using them, so we can optimistically assume
> they're useless (and in the case of races or other weirdness, rev-list
> just ends up traversing a bit further).
>
> But we can split the two ideas in your series:
>
> 1. Feed the advertised tips from receive-pack to rev-list.
>
> 2. Avoid traversing from the hidden tips.
>
> Doing (1) gets you (2) for free. But if we don't want to do (1), and I
> don't think we do, we can get (2) by just teaching rev-list to narrow
> the check.
>
> I see some discussion in the other part of the thread, and we may need a
> new rev-list option to do this, as mentioned there. However, you _might_
> be able to do it with the existing --exclude mechanism. I.e., something like:
>
>   rev-list --stdin --not --exclude 'refs/hidden/*' --all

Yeah, Taylor proposed to add a new `--visible-refs=receive` option that
lets git-rev-list(1) automatically add all references that are visible
when paying attention to `receive.hideRefs`. I actually like this idea
and will likely have a look at how easy or hard it is to implement.

> The gotchas are:
>
> - I'm not 100% sure that --exclude globbing and transfer.hideRefs
> syntax are compatible. You'd want to check.
>
> - these would have to come on the command line (at least with the
> current code). Probably nobody has enough hiderefs patterns for that
> to be a problem (and remember we are passing the glob pattern here,
> not the 6.8M refs themselves). But it could bite somebody in a
> pathological case.
>
> -Peff

Well, we can avoid these gotchas if we use `--visible-refs`.

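As a rough illustration of why the two syntaxes may not line up, here is
a Python sketch. Both helpers are hypothetical and simplified:
`hidden_by_hiderefs` models hideRefs as hierarchy-prefix matching with
`!` negation and later entries overriding earlier ones (it ignores `^`
and namespaces), and `excluded_by_glob` uses Python's `fnmatchcase` as a
stand-in for `--exclude` globbing, which is an assumption rather than
git's actual matcher.

```python
from fnmatch import fnmatchcase


def hidden_by_hiderefs(refname, patterns):
    """Simplified hideRefs model: a pattern matches the ref itself or
    anything below it in the hierarchy; a leading '!' exposes the ref
    again; later entries override earlier ones."""
    hidden = False
    for pattern in patterns:
        negated = pattern.startswith("!")
        prefix = pattern[1:] if negated else pattern
        prefix = prefix.rstrip("/")
        if refname == prefix or refname.startswith(prefix + "/"):
            hidden = not negated
    return hidden


def excluded_by_glob(refname, glob):
    """Stand-in for --exclude: plain shell-style glob on the full name."""
    return fnmatchcase(refname, glob)


patterns = ["refs/hidden", "!refs/hidden/public"]
refs = [
    "refs/hidden",
    "refs/hidden/a",
    "refs/hidden/public/b",
    "refs/heads/main",
]

# Compare both models for each ref: (refname, hideRefs says hidden,
# glob says excluded).
comparison = [
    (ref, hidden_by_hiderefs(ref, patterns),
     excluded_by_glob(ref, "refs/hidden/*"))
    for ref in refs
]
```

In this model the two disagree on `refs/hidden` itself and on the
negated `refs/hidden/public/b`, which is the kind of mismatch a single
glob cannot express.
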
Patrick