Git Mailing List Archive mirror
 help / color / mirror / Atom feed
From: Derrick Stolee <derrickstolee@github.com>
To: Han Xin <hanxin.hx@bytedance.com>, git@vger.kernel.org
Cc: xingxin.xx@bytedance.com, jonathantanmy@google.com,
	Junio C Hamano <gitster@pobox.com>
Subject: Re: [PATCH v1] negotiator/default.c: avoid stack overflow
Date: Mon, 24 Apr 2023 10:44:38 -0400	[thread overview]
Message-ID: <2bcaeba9-20bc-1ca8-849b-ac54342c71e3@github.com> (raw)
In-Reply-To: <20230424022318.80469-1-hanxin.hx@bytedance.com>

On 4/23/2023 10:23 PM, Han Xin wrote:
> mark_common() in negotiator/default.c may overflow the stack due to
> recursive function calls. Avoid this by instead recursing using a
> heap-allocated data structure.

I'm really happy to see that since you could replace the if statement
with a while statement, most of the existing logic could stay without
a bunch of whitespace changes.
 
> This is the same case as [1].
> 
> 1. https://lore.kernel.org/git/20221025232934.1504445-1-jonathantanmy@google.com/

Thanks for the link, though this could be replaced with

  4654134976f (negotiator/skipping: avoid stack overflow, 2022-10-25)

now that the change exists in the commit history.

One thing that is missing from that change is a test, and such a test
could be generalized to apply to all negotiators. This could maybe
help any potential future negotiator avoid this bug. Did you think
about what such a test could look like? Perhaps test_commit_bulk
could help, but we'd probably need to create so many commits that the
test would need to be marked as expensive. That's probably a major
reason to not include a test and rely on avoiding recursion when
possible.

> -	if (commit != NULL && !(commit->object.flags & COMMON)) {
> +	struct prio_queue queue = { NULL };
> +
> +	prio_queue_put(&queue, commit);

Should we check the conditions what were removed? The COMMON flag
is likely only useful for the recursion, but prio_queue_put() is
not careful about NULL values. However, no callers should be
providing NULL commits here.

Couldn't hurt to add
	
	if (!commit)
		return;

before the prio_queue_put().

> +	while ((commit = prio_queue_get(&queue))) {
>  		struct object *o = (struct object *)commit;
>  
> +		if (commit == NULL || (commit->object.flags & COMMON))
> +			continue;

The NULL condition is definitely unnecessary here as it is checked
by the while condition. The "& COMMON" is helpful if the commit
gained the COMMON flag after being inserted into the queue.

>  		if (!ancestors_only)
>  			o->flags |= COMMON;
>  


> @@ -70,15 +76,17 @@ static void mark_common(struct negotiation_state *ns, struct commit *commit,
>  				ns->non_common_revs--;
>  			if (!o->parsed && !dont_parse)
>  				if (repo_parse_commit(the_repository, commit))
> -					return;
> +					continue;
>  
> +			ancestors_only = 0;

This caught me off guard, but this flag essentially says "should
I mark the first commit as common or not?". It would probably be
clearer if this was done before the loop, and then was ignored
within the loop, setting the flag on each parent in this loop:

>  			for (parents = commit->parents;
>  					parents;
>  					parents = parents->next)
> -				mark_common(ns, parents->item, 0,
> -					    dont_parse);
> +				prio_queue_put(&queue, parents->item);

It would have an extra benefit: your walk may duplicate objects in the
priority queue (there is no duplicate protection in prio_queue_put).
But, we could use

	if (!(parents->item->object.flags & COMMON)) {
		parents->item->object.flags |= COMMON;
		prio_queue_put(&queue, parents->item);
	}

as duplicate protection _and_ a clearer way to demonstrate what
ancestors_only is doing. Without this protection, it is possible
to have exponential growth in the priority queue using simple
merge commits.

You'd need this at the beginning:

	if (!commit)
		return;

	prio_queue_put(&queue, commit);
	if (!ancestors_only)
		commit->object.flags |= COMMON;
> diff --git a/negotiator/skipping.c b/negotiator/skipping.c
> index c7d6ab39bc..3d262b3533 100644
> --- a/negotiator/skipping.c
> +++ b/negotiator/skipping.c
> @@ -108,6 +108,8 @@ static void mark_common(struct data *data, struct commit *seen_commit)
>  				prio_queue_put(&queue, p->item);
>  		}
>  	}
> +
> +	clear_prio_queue(&queue);

This memory leak cleanup in the skipping negotiator is good to
do, but should be split into its own change.

In addition, the mark_common() method there seems to have a few
problems:

 1. It does not do duplicate protection before prio_queue_put().
    (The COMMON bit would work here, too.)
 2. When it translated from recursive to iterative it kept "return"
    statements that should probably be "continue" statements.
 3. It does not attempt to parse commits, and instead returns
    immediately when finding an unparsed commit. This is something
    that it did in its original version, so maybe it is by design,
    but it doesn't match the doc comment for the method.

Consider fixing these issues while you are here.

Thanks,
-Stolee

  reply	other threads:[~2023-04-24 14:44 UTC|newest]

Thread overview: 20+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2023-04-24  2:23 [PATCH v1] negotiator/default.c: avoid stack overflow Han Xin
2023-04-24 14:44 ` Derrick Stolee [this message]
2023-04-25  3:02   ` [External] " Han Xin
2023-04-25 13:34     ` Derrick Stolee
2023-04-26  4:05 ` [PATCH v2 0/2] negotiator/default: " Han Xin
2023-04-26  4:05   ` [PATCH v2 1/2] " Han Xin
2023-04-26 11:13     ` Derrick Stolee
2023-04-26 11:40       ` [External] " Han Xin
2023-04-26  4:05   ` [PATCH v2 2/2] negotiator/skipping: fix some problems in mark_common() Han Xin
2023-04-26 11:08     ` Derrick Stolee
2023-04-26 11:55       ` [External] " Han Xin
2023-04-26 13:15   ` [PATCH v2 0/2] negotiator/default: avoid stack overflow Han Xin
2023-04-26 13:15     ` [PATCH v3 1/2] " Han Xin
2023-04-26 17:14       ` Junio C Hamano
2023-04-26 17:30         ` Derrick Stolee
2023-04-26 17:38           ` Junio C Hamano
2023-04-26 13:15     ` [PATCH v3 2/2] negotiator/skipping: fix some problems in mark_common() Han Xin
2023-05-01 22:11     ` [PATCH v2 0/2] negotiator/default: avoid stack overflow Junio C Hamano
2023-05-02  1:49       ` Derrick Stolee
2023-05-02 15:51         ` Junio C Hamano

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=2bcaeba9-20bc-1ca8-849b-ac54342c71e3@github.com \
    --to=derrickstolee@github.com \
    --cc=git@vger.kernel.org \
    --cc=gitster@pobox.com \
    --cc=hanxin.hx@bytedance.com \
    --cc=jonathantanmy@google.com \
    --cc=xingxin.xx@bytedance.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).