Git Mailing List Archive mirror
 help / color / mirror / Atom feed
From: Taylor Blau <me@ttaylorr.com>
To: Patrick Steinhardt <ps@pks.im>
Cc: git@vger.kernel.org
Subject: Re: [PATCH 2/2] receive-pack: use advertised reference tips to inform connectivity check
Date: Sun, 30 Oct 2022 15:09:04 -0400	[thread overview]
Message-ID: <Y17L0IjELU5QlOPL@nand.local> (raw)
In-Reply-To: <006e89f384be1227b922fb6fdc8755ae84cac587.1666967670.git.ps@pks.im>

On Fri, Oct 28, 2022 at 04:42:27PM +0200, Patrick Steinhardt wrote:
> This strategy has the major downside that it will not require any object
> to be sent by the client that is reachable by any of the repositories'
> references. While that sounds like it would be indeed what we are after
> with the connectivity check, it is arguably not. The administrator that
> manages the server-side Git repository may have configured certain refs
> to be hidden during the reference advertisement via `transfer.hideRefs`
> or `receivepack.hideRefs`. Whatever the reason, the result is that the
> client shouldn't expect that any of those hidden references exists on
> the remote side, and neither should they assume any of the pointed-to
> objects to exist except if referenced by any visible reference. But
> because we treat _all_ local refs as uninteresting in the connectivity
> check, a client is free to send a packfile that references objects that
> are only reachable via a hidden reference on the server-side, and we
> will gladly accept it.

You mention below that this is a correctness issue, but I am not sure
that I agree.

The existing behavior is a little strange, I agree, but your argument
relies on an assumption that the history on hidden refs is not part of
the reachable set, which is not the case. Any part of the repository
that is reachable from _any_ reference, hidden or not, is reachable by
definition.

So it's perfectly fine to consider objects on hidden refs to be in the
uninteresting set, because they are reachable. It's odd from the
client's perspective, but I do not see a path to repository corruption
with thee existing behavior.

> Besides the stated fix to correctness this also provides a huge boost to
> performance in the repository mentioned above. Pushing a new commit into
> this repo with `transfer.hideRefs` set up to hide 6.8 million of 7 refs
> as it is configured in Gitaly leads to an almost 7.5-fold speedup:

Nice, here we expect a pretty good speed-up, and indeed...

>     Summary
>       'pks-connectivity-check-hide-refs' ran
>         7.42 ± 0.16 times faster than 'main'

...that's exactly what we get. Good.

> @@ -1928,6 +1933,12 @@ static void execute_commands(struct command *commands,
>  	opt.err_fd = err_fd;
>  	opt.progress = err_fd && !quiet;
>  	opt.env = tmp_objdir_env(tmp_objdir);
> +	if (oidset_size(announced_oids) != 0) {

I'm nitpicking, but this would be preferable as "if (oidset_size(announced_oids))"
without the "!= 0".

> +		oidset_iter_init(announced_oids, &announced_oids_iter);
> +		opt.reachable_oids_fn = iterate_announced_oids;
> +		opt.reachable_oids_data = &announced_oids_iter;
> +	}

Why do we see a slowdown when there there aren't any hidden references?
Or am I misunderstanding your patch message which instead means "we see
a slow-down when there are no hidden references [since we still must
store and enumerate all advertised references]"?

If the latter, could we avoid invoking the new machinery altogether? In
other words, shouldn't receive-pack only set the reachable_oids_fn() to
enumerate advertised references only when the set of advertised
references differs from the behavior of `--not --all`?

>  	if (check_connected(iterate_receive_command_list, &data, &opt))
>  		set_connectivity_errors(commands, si);
>
> @@ -2462,6 +2473,7 @@ int cmd_receive_pack(int argc, const char **argv, const char *prefix)
>  {
>  	int advertise_refs = 0;
>  	struct command *commands;
> +	struct oidset announced_oids = OIDSET_INIT;

This looks like trading one problem for another. In your above example,
we now need to store 20 bytes of OIDs 6.8M times, or ~130 MiB. Not the
end of the world, but it feels like an avoidable problem.

Could we enumerate the references in a callback to for_each_ref() and
only emit ones which aren't hidden? Storing these and then recalling
them after the fact is worth avoiding.

Thanks,
Taylor

  parent reply	other threads:[~2022-10-30 19:11 UTC|newest]

Thread overview: 88+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2022-10-28 14:42 [PATCH 0/2] receive-pack: use advertised reference tips to inform connectivity check Patrick Steinhardt
2022-10-28 14:42 ` [PATCH 1/2] connected: allow supplying different view of reachable objects Patrick Steinhardt
2022-10-28 14:54   ` Ævar Arnfjörð Bjarmason
2022-10-28 18:12   ` Junio C Hamano
2022-10-30 18:49     ` Taylor Blau
2022-10-31 13:10     ` Patrick Steinhardt
2022-11-01  1:16       ` Taylor Blau
2022-10-28 14:42 ` [PATCH 2/2] receive-pack: use advertised reference tips to inform connectivity check Patrick Steinhardt
2022-10-28 15:01   ` Ævar Arnfjörð Bjarmason
2022-10-31 14:21     ` Patrick Steinhardt
2022-10-31 15:36       ` Ævar Arnfjörð Bjarmason
2022-10-30 19:09   ` Taylor Blau [this message]
2022-10-31 14:45     ` Patrick Steinhardt
2022-11-01  1:28       ` Taylor Blau
2022-11-01  7:20         ` Patrick Steinhardt
2022-11-01 11:53           ` Patrick Steinhardt
2022-11-02  1:05             ` Taylor Blau
2022-11-01  8:28       ` Jeff King
2022-10-28 16:40 ` [PATCH 0/2] " Junio C Hamano
2022-11-01  1:30 ` Taylor Blau
2022-11-01  9:00 ` Jeff King
2022-11-01 11:49   ` Patrick Steinhardt
2022-11-03 14:37 ` [PATCH v2 0/3] receive-pack: only use visible refs for " Patrick Steinhardt
2022-11-03 14:37   ` [PATCH v2 1/3] refs: get rid of global list of hidden refs Patrick Steinhardt
2022-11-03 14:37   ` [PATCH v2 2/3] revision: add new parameter to specify all visible refs Patrick Steinhardt
2022-11-05 12:46     ` Jeff King
2022-11-07  8:20       ` Patrick Steinhardt
2022-11-08 14:32         ` Jeff King
2022-11-05 12:55     ` Jeff King
2022-11-03 14:37   ` [PATCH v2 3/3] receive-pack: only use visible refs for connectivity check Patrick Steinhardt
2022-11-05  0:40   ` [PATCH v2 0/3] " Taylor Blau
2022-11-05 12:55     ` Jeff King
2022-11-05 12:52   ` Jeff King
2022-11-07 12:16 ` [PATCH v3 0/6] " Patrick Steinhardt
2022-11-07 12:16   ` [PATCH v3 1/6] refs: get rid of global list of hidden refs Patrick Steinhardt
2022-11-07 12:16   ` [PATCH v3 2/6] revision: move together exclusion-related functions Patrick Steinhardt
2022-11-07 12:16   ` [PATCH v3 3/6] revision: introduce struct to handle exclusions Patrick Steinhardt
2022-11-07 12:51     ` Ævar Arnfjörð Bjarmason
2022-11-08  9:11       ` Patrick Steinhardt
2022-11-07 12:16   ` [PATCH v3 4/6] revision: add new parameter to exclude hidden refs Patrick Steinhardt
2022-11-07 13:34     ` Ævar Arnfjörð Bjarmason
2022-11-07 17:07       ` Ævar Arnfjörð Bjarmason
2022-11-08  9:48         ` Patrick Steinhardt
2022-11-08  9:22       ` Patrick Steinhardt
2022-11-08  0:57     ` Taylor Blau
2022-11-08  8:16       ` Patrick Steinhardt
2022-11-08 14:42         ` Jeff King
2022-11-07 12:16   ` [PATCH v3 5/6] revparse: add `--exclude-hidden=` option Patrick Steinhardt
2022-11-08 14:44     ` Jeff King
2022-11-07 12:16   ` [PATCH v3 6/6] receive-pack: only use visible refs for connectivity check Patrick Steinhardt
2022-11-08  0:59   ` [PATCH v3 0/6] " Taylor Blau
2022-11-08 10:03 ` [PATCH v4 " Patrick Steinhardt
2022-11-08 10:03   ` [PATCH v4 1/6] refs: get rid of global list of hidden refs Patrick Steinhardt
2022-11-08 13:36     ` Ævar Arnfjörð Bjarmason
2022-11-08 14:49       ` Patrick Steinhardt
2022-11-08 14:51     ` Jeff King
2022-11-08 10:03   ` [PATCH v4 2/6] revision: move together exclusion-related functions Patrick Steinhardt
2022-11-08 10:03   ` [PATCH v4 3/6] revision: introduce struct to handle exclusions Patrick Steinhardt
2022-11-08 10:03   ` [PATCH v4 4/6] revision: add new parameter to exclude hidden refs Patrick Steinhardt
2022-11-08 15:07     ` Jeff King
2022-11-08 21:13       ` Taylor Blau
2022-11-11  5:48       ` Patrick Steinhardt
2022-11-08 10:03   ` [PATCH v4 5/6] rev-parse: add `--exclude-hidden=` option Patrick Steinhardt
2022-11-08 10:04   ` [PATCH v4 6/6] receive-pack: only use visible refs for connectivity check Patrick Steinhardt
2022-11-11  6:49 ` [PATCH v5 0/7] " Patrick Steinhardt
2022-11-11  6:49   ` [PATCH v5 1/7] refs: fix memory leak when parsing hideRefs config Patrick Steinhardt
2022-11-11  6:49   ` [PATCH v5 2/7] refs: get rid of global list of hidden refs Patrick Steinhardt
2022-11-11  6:50   ` [PATCH v5 3/7] revision: move together exclusion-related functions Patrick Steinhardt
2022-11-11  6:50   ` [PATCH v5 4/7] revision: introduce struct to handle exclusions Patrick Steinhardt
2022-11-11  6:50   ` [PATCH v5 5/7] revision: add new parameter to exclude hidden refs Patrick Steinhardt
2022-11-11  6:50   ` [PATCH v5 6/7] rev-parse: add `--exclude-hidden=` option Patrick Steinhardt
2022-11-11  6:50   ` [PATCH v5 7/7] receive-pack: only use visible refs for connectivity check Patrick Steinhardt
2022-11-11 22:18   ` [PATCH v5 0/7] " Taylor Blau
2022-11-15 17:26     ` Jeff King
2022-11-16 21:22       ` Taylor Blau
2022-11-16 22:04         ` Jeff King
2022-11-16 22:33           ` Taylor Blau
2022-11-17  5:45             ` Patrick Steinhardt
2022-11-17  5:46 ` [PATCH v6 " Patrick Steinhardt
2022-11-17  5:46   ` [PATCH v6 1/7] refs: fix memory leak when parsing hideRefs config Patrick Steinhardt
2022-11-17  5:46   ` [PATCH v6 2/7] refs: get rid of global list of hidden refs Patrick Steinhardt
2022-11-17  5:46   ` [PATCH v6 3/7] revision: move together exclusion-related functions Patrick Steinhardt
2022-11-17  5:46   ` [PATCH v6 4/7] revision: introduce struct to handle exclusions Patrick Steinhardt
2022-11-17  5:46   ` [PATCH v6 5/7] revision: add new parameter to exclude hidden refs Patrick Steinhardt
2022-11-17  5:47   ` [PATCH v6 6/7] rev-parse: add `--exclude-hidden=` option Patrick Steinhardt
2022-11-17  5:47   ` [PATCH v6 7/7] receive-pack: only use visible refs for connectivity check Patrick Steinhardt
2022-11-17 15:03   ` [PATCH v6 0/7] " Jeff King
2022-11-17 21:24     ` Taylor Blau

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=Y17L0IjELU5QlOPL@nand.local \
    --to=me@ttaylorr.com \
    --cc=git@vger.kernel.org \
    --cc=ps@pks.im \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).