Git Mailing List Archive mirror
 help / color / mirror / Atom feed
* [PATCH v1] negotiator/default.c: avoid stack overflow
@ 2023-04-24  2:23 Han Xin
  2023-04-24 14:44 ` Derrick Stolee
  2023-04-26  4:05 ` [PATCH v2 0/2] negotiator/default: " Han Xin
  0 siblings, 2 replies; 20+ messages in thread
From: Han Xin @ 2023-04-24  2:23 UTC (permalink / raw)
  To: git; +Cc: Han Xin, xingxin.xx, jonathantanmy, Junio C Hamano

mark_common() in negotiator/default.c may overflow the stack due to
recursive function calls. Avoid this by instead recursing using a
heap-allocated data structure.

This is the same case as [1].

1. https://lore.kernel.org/git/20221025232934.1504445-1-jonathantanmy@google.com/

Reported-by: Xin Xing <xingxin.xx@bytedance.com>
Signed-off-by: Han Xin <hanxin.hx@bytedance.com>
---
 negotiator/default.c  | 16 ++++++++++++----
 negotiator/skipping.c |  2 ++
 2 files changed, 14 insertions(+), 4 deletions(-)

diff --git a/negotiator/default.c b/negotiator/default.c
index f4b78eb47d..6ab7f11409 100644
--- a/negotiator/default.c
+++ b/negotiator/default.c
@@ -55,9 +55,15 @@ static int clear_marks(const char *refname, const struct object_id *oid,
 static void mark_common(struct negotiation_state *ns, struct commit *commit,
 		int ancestors_only, int dont_parse)
 {
-	if (commit != NULL && !(commit->object.flags & COMMON)) {
+	struct prio_queue queue = { NULL };
+
+	prio_queue_put(&queue, commit);
+	while ((commit = prio_queue_get(&queue))) {
 		struct object *o = (struct object *)commit;
 
+		if (commit == NULL || (commit->object.flags & COMMON))
+			continue;
+
 		if (!ancestors_only)
 			o->flags |= COMMON;
 
@@ -70,15 +76,17 @@ static void mark_common(struct negotiation_state *ns, struct commit *commit,
 				ns->non_common_revs--;
 			if (!o->parsed && !dont_parse)
 				if (repo_parse_commit(the_repository, commit))
-					return;
+					continue;
 
+			ancestors_only = 0;
 			for (parents = commit->parents;
 					parents;
 					parents = parents->next)
-				mark_common(ns, parents->item, 0,
-					    dont_parse);
+				prio_queue_put(&queue, parents->item);
 		}
 	}
+
+	clear_prio_queue(&queue);
 }
 
 /*
diff --git a/negotiator/skipping.c b/negotiator/skipping.c
index c7d6ab39bc..3d262b3533 100644
--- a/negotiator/skipping.c
+++ b/negotiator/skipping.c
@@ -108,6 +108,8 @@ static void mark_common(struct data *data, struct commit *seen_commit)
 				prio_queue_put(&queue, p->item);
 		}
 	}
+
+	clear_prio_queue(&queue);
 }
 
 /*
-- 
2.40.0


^ permalink raw reply related	[flat|nested] 20+ messages in thread

* Re: [PATCH v1] negotiator/default.c: avoid stack overflow
  2023-04-24  2:23 [PATCH v1] negotiator/default.c: avoid stack overflow Han Xin
@ 2023-04-24 14:44 ` Derrick Stolee
  2023-04-25  3:02   ` [External] " Han Xin
  2023-04-26  4:05 ` [PATCH v2 0/2] negotiator/default: " Han Xin
  1 sibling, 1 reply; 20+ messages in thread
From: Derrick Stolee @ 2023-04-24 14:44 UTC (permalink / raw)
  To: Han Xin, git; +Cc: xingxin.xx, jonathantanmy, Junio C Hamano

On 4/23/2023 10:23 PM, Han Xin wrote:
> mark_common() in negotiator/default.c may overflow the stack due to
> recursive function calls. Avoid this by instead recursing using a
> heap-allocated data structure.

I'm really happy to see that since you could replace the if statement
with a while statement, most of the existing logic could stay without
a bunch of whitespace changes.
 
> This is the same case as [1].
> 
> 1. https://lore.kernel.org/git/20221025232934.1504445-1-jonathantanmy@google.com/

Thanks for the link, though this could be replaced with

  4654134976f (negotiator/skipping: avoid stack overflow, 2022-10-25)

now that the change exists in the commit history.

One thing that is missing from that change is a test, and such a test
could be generalized to apply to all negotiators. This could maybe
help any potential future negotiator avoid this bug. Did you think
about what such a test could look like? Perhaps test_commit_bulk
could help, but we'd probably need to create so many commits that the
test would need to be marked as expensive. That's probably a major
reason to not include a test and rely on avoiding recursion when
possible.

> -	if (commit != NULL && !(commit->object.flags & COMMON)) {
> +	struct prio_queue queue = { NULL };
> +
> +	prio_queue_put(&queue, commit);

Should we check the conditions what were removed? The COMMON flag
is likely only useful for the recursion, but prio_queue_put() is
not careful about NULL values. However, no callers should be
providing NULL commits here.

Couldn't hurt to add
	
	if (!commit)
		return;

before the prio_queue_put().

> +	while ((commit = prio_queue_get(&queue))) {
>  		struct object *o = (struct object *)commit;
>  
> +		if (commit == NULL || (commit->object.flags & COMMON))
> +			continue;

The NULL condition is definitely unnecessary here as it is checked
by the while condition. The "& COMMON" is helpful if the commit
gained the COMMON flag after being inserted into the queue.

>  		if (!ancestors_only)
>  			o->flags |= COMMON;
>  


> @@ -70,15 +76,17 @@ static void mark_common(struct negotiation_state *ns, struct commit *commit,
>  				ns->non_common_revs--;
>  			if (!o->parsed && !dont_parse)
>  				if (repo_parse_commit(the_repository, commit))
> -					return;
> +					continue;
>  
> +			ancestors_only = 0;

This caught me off guard, but this flag essentially says "should
I mark the first commit as common or not?". It would probably be
clearer if this was done before the loop, and then was ignored
within the loop, setting the flag on each parent in this loop:

>  			for (parents = commit->parents;
>  					parents;
>  					parents = parents->next)
> -				mark_common(ns, parents->item, 0,
> -					    dont_parse);
> +				prio_queue_put(&queue, parents->item);

It would have an extra benefit: your walk may duplicate objects in the
priority queue (there is no duplicate protection in prio_queue_put).
But, we could use

	if (!(parents->item->object.flags & COMMON)) {
		parents->item->object.flags |= COMMON;
		prio_queue_put(&queue, parents->item);
	}

as duplicate protection _and_ a clearer way to demonstrate what
ancestors_only is doing. Without this protection, it is possible
to have exponential growth in the priority queue using simple
merge commits.

You'd need this at the beginning:

	if (!commit)
		return;

	prio_queue_put(&queue, commit);
	if (!ancestors_only)
		commit->object.flags |= COMMON;
> diff --git a/negotiator/skipping.c b/negotiator/skipping.c
> index c7d6ab39bc..3d262b3533 100644
> --- a/negotiator/skipping.c
> +++ b/negotiator/skipping.c
> @@ -108,6 +108,8 @@ static void mark_common(struct data *data, struct commit *seen_commit)
>  				prio_queue_put(&queue, p->item);
>  		}
>  	}
> +
> +	clear_prio_queue(&queue);

This memory leak cleanup in the skipping negotiator is good to
do, but should be split into its own change.

In addition, the mark_common() method there seems to have a few
problems:

 1. It does not do duplicate protection before prio_queue_put().
    (The COMMON bit would work here, too.)
 2. When it translated from recursive to iterative it kept "return"
    statements that should probably be "continue" statements.
 3. It does not attempt to parse commits, and instead returns
    immediately when finding an unparsed commit. This is something
    that it did in its original version, so maybe it is by design,
    but it doesn't match the doc comment for the method.

Consider fixing these issues while you are here.

Thanks,
-Stolee

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [External] Re: [PATCH v1] negotiator/default.c: avoid stack overflow
  2023-04-24 14:44 ` Derrick Stolee
@ 2023-04-25  3:02   ` Han Xin
  2023-04-25 13:34     ` Derrick Stolee
  0 siblings, 1 reply; 20+ messages in thread
From: Han Xin @ 2023-04-25  3:02 UTC (permalink / raw)
  To: Derrick Stolee; +Cc: git, xingxin.xx, jonathantanmy, Junio C Hamano

On Mon, Apr 24, 2023 at 10:44 PM Derrick Stolee
<derrickstolee@github.com> wrote:
>
> > This is the same case as [1].
> >
> > 1. https://lore.kernel.org/git/20221025232934.1504445-1-jonathantanmy@google.com/
>
> Thanks for the link, though this could be replaced with
>
>   4654134976f (negotiator/skipping: avoid stack overflow, 2022-10-25)
>
> now that the change exists in the commit history.

make sense.

>
> One thing that is missing from that change is a test, and such a test
> could be generalized to apply to all negotiators. This could maybe
> help any potential future negotiator avoid this bug. Did you think
> about what such a test could look like? Perhaps test_commit_bulk
> could help, but we'd probably need to create so many commits that the
> test would need to be marked as expensive. That's probably a major
> reason to not include a test and rely on avoiding recursion when
> possible.

I first found this issue in a large repository with numerous merge commits.
To address it, I added a test case which fast-imports 10,000 commits and
runs them through run_with_limited_stack(). Although expensive, this
approach was successful in executing the test case without any issues.

>
> > -     if (commit != NULL && !(commit->object.flags & COMMON)) {
> > +     struct prio_queue queue = { NULL };
> > +
> > +     prio_queue_put(&queue, commit);
>
> Should we check the conditions what were removed? The COMMON flag
> is likely only useful for the recursion, but prio_queue_put() is
> not careful about NULL values. However, no callers should be
> providing NULL commits here.
>
> Couldn't hurt to add
>
>         if (!commit)
>                 return;

make sense.

>
> before the prio_queue_put().
>
> > +     while ((commit = prio_queue_get(&queue))) {
> >               struct object *o = (struct object *)commit;
> >
> > +             if (commit == NULL || (commit->object.flags & COMMON))
> > +                     continue;
>
> The NULL condition is definitely unnecessary here as it is checked
> by the while condition. The "& COMMON" is helpful if the commit
> gained the COMMON flag after being inserted into the queue.
>
> >               if (!ancestors_only)
> >                       o->flags |= COMMON;
> >
>
>
> > @@ -70,15 +76,17 @@ static void mark_common(struct negotiation_state *ns, struct commit *commit,
> >                               ns->non_common_revs--;
> >                       if (!o->parsed && !dont_parse)
> >                               if (repo_parse_commit(the_repository, commit))
> > -                                     return;
> > +                                     continue;
> >
> > +                     ancestors_only = 0;
>
> This caught me off guard, but this flag essentially says "should
> I mark the first commit as common or not?". It would probably be
> clearer if this was done before the loop, and then was ignored
> within the loop, setting the flag on each parent in this loop:
>
> >                       for (parents = commit->parents;
> >                                       parents;
> >                                       parents = parents->next)
> > -                             mark_common(ns, parents->item, 0,
> > -                                         dont_parse);
> > +                             prio_queue_put(&queue, parents->item);
>

I'll think about how to optimize this again.

ancestors_only is used multiple times in the original logic:
1.
              if (!ancestors_only)
                     o->flags |= COMMON;
2.
             if (!(o->flags & SEEN))
                     rev_list_push(ns, commit, SEEN);
             else {
                     struct commit_list *parents;

                     if (!ancestors_only && !(o->flags & POPPED))
                             ns->non_common_revs--;

Should we use this ?

             if (!ancestors_only) {
                    commit->object.flags |= COMMON;

                    if ((commit->object.flags & SEEN) &&
!(commit->object.flags & POPPED))
                             ns->non_common_revs--;
             }

and

                   for (parents = commit->parents;
                             parents;
                             parents = parents->next) {
                             if (parents->item->object.flags & COMMON)
                                      continue;

                            parents->item->object.flags |= COMMON;

                            if ((parents->item->object.flags & SEEN)
                                     && !(parents->item->object.flags & POPPED))
                                      ns->non_common_revs--;

                            prio_queue_put(&queue, parents->item);
                   }

> It would have an extra benefit: your walk may duplicate objects in the
> priority queue (there is no duplicate protection in prio_queue_put).
> But, we could use
>
>         if (!(parents->item->object.flags & COMMON)) {
>                 parents->item->object.flags |= COMMON;
>                 prio_queue_put(&queue, parents->item);
>         }
>
> as duplicate protection _and_ a clearer way to demonstrate what
> ancestors_only is doing. Without this protection, it is possible
> to have exponential growth in the priority queue using simple
> merge commits.
>
> You'd need this at the beginning:
>
>         if (!commit)
>                 return;
>
>         prio_queue_put(&queue, commit);
>         if (!ancestors_only)
>                 commit->object.flags |= COMMON;

Make sense.

> > diff --git a/negotiator/skipping.c b/negotiator/skipping.c
> > index c7d6ab39bc..3d262b3533 100644
> > --- a/negotiator/skipping.c
> > +++ b/negotiator/skipping.c
> > @@ -108,6 +108,8 @@ static void mark_common(struct data *data, struct commit *seen_commit)
> >                               prio_queue_put(&queue, p->item);
> >               }
> >       }
> > +
> > +     clear_prio_queue(&queue);
>
> This memory leak cleanup in the skipping negotiator is good to
> do, but should be split into its own change.
>
> In addition, the mark_common() method there seems to have a few
> problems:
>
>  1. It does not do duplicate protection before prio_queue_put().
>     (The COMMON bit would work here, too.)
>  2. When it translated from recursive to iterative it kept "return"
>     statements that should probably be "continue" statements.
>  3. It does not attempt to parse commits, and instead returns
>     immediately when finding an unparsed commit. This is something
>     that it did in its original version, so maybe it is by design,
>     but it doesn't match the doc comment for the method.
>
> Consider fixing these issues while you are here.
>

Make sense.

Thanks.
-Han Xin

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [External] Re: [PATCH v1] negotiator/default.c: avoid stack overflow
  2023-04-25  3:02   ` [External] " Han Xin
@ 2023-04-25 13:34     ` Derrick Stolee
  0 siblings, 0 replies; 20+ messages in thread
From: Derrick Stolee @ 2023-04-25 13:34 UTC (permalink / raw)
  To: Han Xin; +Cc: git, xingxin.xx, jonathantanmy, Junio C Hamano

On 4/24/2023 11:02 PM, Han Xin wrote:
> On Mon, Apr 24, 2023 at 10:44 PM Derrick Stolee
> <derrickstolee@github.com> wrote:

>>> @@ -70,15 +76,17 @@ static void mark_common(struct negotiation_state *ns, struct commit *commit,
>>>                               ns->non_common_revs--;
>>>                       if (!o->parsed && !dont_parse)
>>>                               if (repo_parse_commit(the_repository, commit))
>>> -                                     return;
>>> +                                     continue;
>>>
>>> +                     ancestors_only = 0;
>>
>> This caught me off guard, but this flag essentially says "should
>> I mark the first commit as common or not?". It would probably be
>> clearer if this was done before the loop, and then was ignored
>> within the loop, setting the flag on each parent in this loop:
>>
>>>                       for (parents = commit->parents;
>>>                                       parents;
>>>                                       parents = parents->next)
>>> -                             mark_common(ns, parents->item, 0,
>>> -                                         dont_parse);
>>> +                             prio_queue_put(&queue, parents->item);
>>
> 
> I'll think about how to optimize this again.
> 
> ancestors_only is used multiple times in the original logic:
> 1.
>               if (!ancestors_only)
>                      o->flags |= COMMON;
> 2.
>              if (!(o->flags & SEEN))
>                      rev_list_push(ns, commit, SEEN);
>              else {
>                      struct commit_list *parents;
> 
>                      if (!ancestors_only && !(o->flags & POPPED))
>                              ns->non_common_revs--;

Good point. Thanks for checking.
 
> Should we use this ?
> 
>              if (!ancestors_only) {
>                     commit->object.flags |= COMMON;
> 
>                     if ((commit->object.flags & SEEN) &&
> !(commit->object.flags & POPPED))
>                              ns->non_common_revs--;
>              }

This seems correct, although your email seems to have done a strange
line wrap that I'm sure you'll fix in the actual patch.

> and
> 
>                    for (parents = commit->parents;
>                              parents;
>                              parents = parents->next) {
>                              if (parents->item->object.flags & COMMON)
>                                       continue;
> 
>                             parents->item->object.flags |= COMMON;

Thanks, this part avoids duplicate additions to the queue.

>                             if ((parents->item->object.flags & SEEN)
>                                      && !(parents->item->object.flags & POPPED))
>                                       ns->non_common_revs--;

And this matches the non_common_revs part.

If you want this code to be a little cleaner, you could add

	struct commit *p = parents->item;

at the start of the loop and then s/parents->item/p/ for the rest
of the uses in the loop.

>                             prio_queue_put(&queue, parents->item);
>                    }

>> In addition, the mark_common() method there seems to have a few
>> problems:
...
>> Consider fixing these issues while you are here.
>>
> 
> Make sense.

I'm looking forward to your v2!

Thanks,
-Stolee

^ permalink raw reply	[flat|nested] 20+ messages in thread

* [PATCH v2 0/2] negotiator/default: avoid stack overflow
  2023-04-24  2:23 [PATCH v1] negotiator/default.c: avoid stack overflow Han Xin
  2023-04-24 14:44 ` Derrick Stolee
@ 2023-04-26  4:05 ` Han Xin
  2023-04-26  4:05   ` [PATCH v2 1/2] " Han Xin
                     ` (2 more replies)
  1 sibling, 3 replies; 20+ messages in thread
From: Han Xin @ 2023-04-26  4:05 UTC (permalink / raw)
  To: git; +Cc: Han Xin, xingxin.xx, jonathantanmy, derrickstolee, Junio C Hamano

This series avoid stack overflow in negotiator/default.c and memory leak
in negotiator/skipping.c.

Changes since v1:
* Add duplicate protection in negotiator/default.c and
  negotiator/skipping.c.
* Split the memory leak cleanup in negotiator/skipping.c into its own
  change and fix some other problems sugguested by Derrick Stolee.
* Minor grammar/comment etc. fixes throughout.

Han Xin (2):
  negotiator/default: avoid stack overflow
  negotiator/skipping: fix some problems in mark_common()

 negotiator/default.c  | 39 +++++++++++++++++++++++++++++----------
 negotiator/skipping.c | 10 ++++++----
 2 files changed, 35 insertions(+), 14 deletions(-)

Range-diff against v1:
1:  a0a1473f5e < -:  ---------- negotiator/default.c: avoid stack overflow
-:  ---------- > 1:  935be72eb9 negotiator/default: avoid stack overflow
-:  ---------- > 2:  abbb1bc0b3 negotiator/skipping: fix some problems in mark_common()
-- 
2.40.0


^ permalink raw reply	[flat|nested] 20+ messages in thread

* [PATCH v2 1/2] negotiator/default: avoid stack overflow
  2023-04-26  4:05 ` [PATCH v2 0/2] negotiator/default: " Han Xin
@ 2023-04-26  4:05   ` Han Xin
  2023-04-26 11:13     ` Derrick Stolee
  2023-04-26  4:05   ` [PATCH v2 2/2] negotiator/skipping: fix some problems in mark_common() Han Xin
  2023-04-26 13:15   ` [PATCH v2 0/2] negotiator/default: avoid stack overflow Han Xin
  2 siblings, 1 reply; 20+ messages in thread
From: Han Xin @ 2023-04-26  4:05 UTC (permalink / raw)
  To: git; +Cc: Han Xin, xingxin.xx, jonathantanmy, derrickstolee, Junio C Hamano

mark_common() in negotiator/default.c may overflow the stack due to
recursive function calls. Avoid this by instead recursing using a
heap-allocated data structure.

This is the same case as [1].

1. 4654134976f (negotiator/skipping: avoid stack overflow, 2022-10-25)

Reported-by: Xin Xing <xingxin.xx@bytedance.com>
Signed-off-by: Han Xin <hanxin.hx@bytedance.com>
---
 negotiator/default.c | 39 +++++++++++++++++++++++++++++----------
 1 file changed, 29 insertions(+), 10 deletions(-)

diff --git a/negotiator/default.c b/negotiator/default.c
index f4b78eb47d..635cdd6483 100644
--- a/negotiator/default.c
+++ b/negotiator/default.c
@@ -55,30 +55,49 @@ static int clear_marks(const char *refname, const struct object_id *oid,
 static void mark_common(struct negotiation_state *ns, struct commit *commit,
 		int ancestors_only, int dont_parse)
 {
-	if (commit != NULL && !(commit->object.flags & COMMON)) {
-		struct object *o = (struct object *)commit;
+	struct prio_queue queue = { NULL };
+
+	if (!commit || (commit->object.flags & COMMON))
+		return;
+
+	prio_queue_put(&queue, commit);
+	if (!ancestors_only) {
+		commit->object.flags |= COMMON;
 
-		if (!ancestors_only)
-			o->flags |= COMMON;
+		if ((commit->object.flags & SEEN) && !(commit->object.flags & POPPED))
+			ns->non_common_revs--;
+	}
+	while ((commit = prio_queue_get(&queue))) {
+		struct object *o = (struct object *)commit;
 
 		if (!(o->flags & SEEN))
 			rev_list_push(ns, commit, SEEN);
 		else {
 			struct commit_list *parents;
 
-			if (!ancestors_only && !(o->flags & POPPED))
-				ns->non_common_revs--;
 			if (!o->parsed && !dont_parse)
 				if (repo_parse_commit(the_repository, commit))
-					return;
+					continue;
 
 			for (parents = commit->parents;
 					parents;
-					parents = parents->next)
-				mark_common(ns, parents->item, 0,
-					    dont_parse);
+					parents = parents->next) {
+				struct commit *p = parents->item;
+
+				if (p->object.flags & COMMON)
+					continue;
+
+				p->object.flags |= COMMON;
+
+				if ((p->object.flags & SEEN) && !(p->object.flags & POPPED))
+					ns->non_common_revs--;
+
+				prio_queue_put(&queue, parents->item);
+			}
 		}
 	}
+
+	clear_prio_queue(&queue);
 }
 
 /*
-- 
2.40.0


^ permalink raw reply related	[flat|nested] 20+ messages in thread

* [PATCH v2 2/2] negotiator/skipping: fix some problems in mark_common()
  2023-04-26  4:05 ` [PATCH v2 0/2] negotiator/default: " Han Xin
  2023-04-26  4:05   ` [PATCH v2 1/2] " Han Xin
@ 2023-04-26  4:05   ` Han Xin
  2023-04-26 11:08     ` Derrick Stolee
  2023-04-26 13:15   ` [PATCH v2 0/2] negotiator/default: avoid stack overflow Han Xin
  2 siblings, 1 reply; 20+ messages in thread
From: Han Xin @ 2023-04-26  4:05 UTC (permalink / raw)
  To: git; +Cc: Han Xin, xingxin.xx, jonathantanmy, derrickstolee, Junio C Hamano

Fixed the following problems:

1. prio_queue() should be used with clear_prio_queue(), otherwise there
   will be a memory leak.
2. It does not do duplicate protection before prio_queue_put().
   (The COMMON bit would work here, too.)
3. When it translated from recursive to iterative it kept "return"
   statements that should probably be "continue" statements.
4. It does not attempt to parse commits, and instead returns
   immediately when finding an unparsed commit. This is something
   that it did in its original version, so maybe it is by design,
   but it doesn't match the doc comment for the method.

Helped-by: Derrick Stolee <derrickstolee@github.com>
Signed-off-by: Han Xin <hanxin.hx@bytedance.com>
---
 negotiator/skipping.c | 10 ++++++----
 1 file changed, 6 insertions(+), 4 deletions(-)

diff --git a/negotiator/skipping.c b/negotiator/skipping.c
index c7d6ab39bc..b06dcb197b 100644
--- a/negotiator/skipping.c
+++ b/negotiator/skipping.c
@@ -85,7 +85,7 @@ static int clear_marks(const char *refname, const struct object_id *oid,
 }
 
 /*
- * Mark this SEEN commit and all its SEEN ancestors as COMMON.
+ * Mark this SEEN commit and all its parsed SEEN ancestors as COMMON.
  */
 static void mark_common(struct data *data, struct commit *seen_commit)
 {
@@ -96,18 +96,20 @@ static void mark_common(struct data *data, struct commit *seen_commit)
 	while ((c = prio_queue_get(&queue))) {
 		struct commit_list *p;
 		if (c->object.flags & COMMON)
-			return;
+			continue;
 		c->object.flags |= COMMON;
 		if (!(c->object.flags & POPPED))
 			data->non_common_revs--;
 
 		if (!c->object.parsed)
-			return;
+			continue;
 		for (p = c->parents; p; p = p->next) {
-			if (p->item->object.flags & SEEN)
+			if (p->item->object.flags & SEEN || p->item->object.flags & COMMON)
 				prio_queue_put(&queue, p->item);
 		}
 	}
+
+	clear_prio_queue(&queue);
 }
 
 /*
-- 
2.40.0


^ permalink raw reply related	[flat|nested] 20+ messages in thread

* Re: [PATCH v2 2/2] negotiator/skipping: fix some problems in mark_common()
  2023-04-26  4:05   ` [PATCH v2 2/2] negotiator/skipping: fix some problems in mark_common() Han Xin
@ 2023-04-26 11:08     ` Derrick Stolee
  2023-04-26 11:55       ` [External] " Han Xin
  0 siblings, 1 reply; 20+ messages in thread
From: Derrick Stolee @ 2023-04-26 11:08 UTC (permalink / raw)
  To: Han Xin, git; +Cc: xingxin.xx, jonathantanmy, Junio C Hamano

On 4/26/2023 12:05 AM, Han Xin wrote:
> Fixed the following problems:

This might be a good time to reference the change from recursive to
iterative:

  The mark_common() method in negotiator/skipping.c was converted
  from recursive to iterative in 4654134976f (negotiator/skipping:
  avoid stack overflow, 2022-10-25), but there is some more work
  to do:
 
> 1. prio_queue() should be used with clear_prio_queue(), otherwise there
>    will be a memory leak.
> 2. It does not do duplicate protection before prio_queue_put().
>    (The COMMON bit would work here, too.)
> 3. When it translated from recursive to iterative it kept "return"
>    statements that should probably be "continue" statements.
> 4. It does not attempt to parse commits, and instead returns
>    immediately when finding an unparsed commit. This is something
>    that it did in its original version, so maybe it is by design,
>    but it doesn't match the doc comment for the method.
> 
> Helped-by: Derrick Stolee <derrickstolee@github.com>
> Signed-off-by: Han Xin <hanxin.hx@bytedance.com>
> ---
>  negotiator/skipping.c | 10 ++++++----
>  1 file changed, 6 insertions(+), 4 deletions(-)
> 
> diff --git a/negotiator/skipping.c b/negotiator/skipping.c
> index c7d6ab39bc..b06dcb197b 100644
> --- a/negotiator/skipping.c
> +++ b/negotiator/skipping.c
> @@ -85,7 +85,7 @@ static int clear_marks(const char *refname, const struct object_id *oid,
>  }
>  
>  /*
> - * Mark this SEEN commit and all its SEEN ancestors as COMMON.
> + * Mark this SEEN commit and all its parsed SEEN ancestors as COMMON.

Ok, the doc comment is updated here instead of starting to parse
commits. Since this is the behavior since it was first introduced
in 42cc7485a2e (negotiator/skipping: skip commits during fetch,
2018-07-16), this is the best thing to do.

I notice from a second glance that we are only walking the commits
that are marked SEEN, so we are only visiting commits with respect
to a previous walk of some kind, which provides some explanation.

>  	while ((c = prio_queue_get(&queue))) {
>  		struct commit_list *p;
>  		if (c->object.flags & COMMON)
> -			return;
> +			continue;
>  		c->object.flags |= COMMON;
>  		if (!(c->object.flags & POPPED))
>  			data->non_common_revs--;
>  
>  		if (!c->object.parsed)
> -			return;
> +			continue;
>  		for (p = c->parents; p; p = p->next) {
> -			if (p->item->object.flags & SEEN)
> +			if (p->item->object.flags & SEEN || p->item->object.flags & COMMON)
>  				prio_queue_put(&queue, p->item);

This is the incorrect check for the COMMON bit, because it is
a positive check (we add the common bit after we pop a commit
from the queue) _and_ because we could add a commit multiple
times before it is first popped and that bit is added.

Instead, we need

			if ((p->item->object.flags & SEEN) &&
			    !(p->item->object.flags & COMMON)) {
				p->item->object.flags |= COMMON;
				prio_queue_put(&queue, p->item);
			}

and at the start of the loop we need to add the COMMON bit to
the starting commit. We also need to remove this bit from the
main section of the loop:

  		if (c->object.flags & COMMON)
			continue;
 		c->object.flags |= COMMON;

because it does nothing if the COMMON bit is added before
being added to the queue.

I'm very suspicious that this change did not trigger a test
failure, since the behavior is quite different from the previous
version. Of course, the recursive-to-iterative change was first
to change the behavior, so I'm not surprised that it isn't caught
by tests. What kind of tests can we introduce to harden our
coverage here?

>  		}
>  	}
> +
> +	clear_prio_queue(&queue);

Good to clean up this queue.

Thanks,
-Stolee


^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH v2 1/2] negotiator/default: avoid stack overflow
  2023-04-26  4:05   ` [PATCH v2 1/2] " Han Xin
@ 2023-04-26 11:13     ` Derrick Stolee
  2023-04-26 11:40       ` [External] " Han Xin
  0 siblings, 1 reply; 20+ messages in thread
From: Derrick Stolee @ 2023-04-26 11:13 UTC (permalink / raw)
  To: Han Xin, git; +Cc: xingxin.xx, jonathantanmy, Junio C Hamano

On 4/26/2023 12:05 AM, Han Xin wrote:
> mark_common() in negotiator/default.c may overflow the stack due to
> recursive function calls. Avoid this by instead recursing using a
> heap-allocated data structure.
> 
> This is the same case as [1].
> 
> 1. 4654134976f (negotiator/skipping: avoid stack overflow, 2022-10-25)

We would typically write this inline, such as:

  This is the same case as 4654134976f (negotiator/skipping: avoid
  stack overflow, 2022-10-25)

> -	if (commit != NULL && !(commit->object.flags & COMMON)) {
> -		struct object *o = (struct object *)commit;
> +	struct prio_queue queue = { NULL };
> +
> +	if (!commit || (commit->object.flags & COMMON))
> +		return;
> +
> +	prio_queue_put(&queue, commit);
> +	if (!ancestors_only) {
> +		commit->object.flags |= COMMON;
>  
> -		if (!ancestors_only)
> -			o->flags |= COMMON;
> +		if ((commit->object.flags & SEEN) && !(commit->object.flags & POPPED))
> +			ns->non_common_revs--;
> +	}
> +	while ((commit = prio_queue_get(&queue))) {
> +		struct object *o = (struct object *)commit;
>  
>  		if (!(o->flags & SEEN))
>  			rev_list_push(ns, commit, SEEN);
>  		else {
>  			struct commit_list *parents;
>  
> -			if (!ancestors_only && !(o->flags & POPPED))
> -				ns->non_common_revs--;
>  			if (!o->parsed && !dont_parse)
>  				if (repo_parse_commit(the_repository, commit))
> -					return;
> +					continue;
>  
>  			for (parents = commit->parents;
>  					parents;
> -					parents = parents->next)
> -				mark_common(ns, parents->item, 0,
> -					    dont_parse);
> +					parents = parents->next) {
> +				struct commit *p = parents->item;
> +
> +				if (p->object.flags & COMMON)
> +					continue;
> +
> +				p->object.flags |= COMMON;
> +
> +				if ((p->object.flags & SEEN) && !(p->object.flags & POPPED))
> +					ns->non_common_revs--;
> +
> +				prio_queue_put(&queue, parents->item);
> +			}
>  		}
>  	}
> +
> +	clear_prio_queue(&queue);

Thanks for this version. It looks like an identical set of actions
in the commit walk, but the change from DFS to priority queue is
a welcome change.

-Stolee

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [External] Re: [PATCH v2 1/2] negotiator/default: avoid stack overflow
  2023-04-26 11:13     ` Derrick Stolee
@ 2023-04-26 11:40       ` Han Xin
  0 siblings, 0 replies; 20+ messages in thread
From: Han Xin @ 2023-04-26 11:40 UTC (permalink / raw)
  To: Derrick Stolee; +Cc: git, xingxin.xx, jonathantanmy, Junio C Hamano

On Wed, Apr 26, 2023 at 7:13 PM Derrick Stolee <derrickstolee@github.com> wrote:
>
> On 4/26/2023 12:05 AM, Han Xin wrote:
> > mark_common() in negotiator/default.c may overflow the stack due to
> > recursive function calls. Avoid this by instead recursing using a
> > heap-allocated data structure.
> >
> > This is the same case as [1].
> >
> > 1. 4654134976f (negotiator/skipping: avoid stack overflow, 2022-10-25)
>
> We would typically write this inline, such as:
>
>   This is the same case as 4654134976f (negotiator/skipping: avoid
>   stack overflow, 2022-10-25)
>

Make sense.

Thanks
-Han Xin

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [External] Re: [PATCH v2 2/2] negotiator/skipping: fix some problems in mark_common()
  2023-04-26 11:08     ` Derrick Stolee
@ 2023-04-26 11:55       ` Han Xin
  0 siblings, 0 replies; 20+ messages in thread
From: Han Xin @ 2023-04-26 11:55 UTC (permalink / raw)
  To: Derrick Stolee; +Cc: git, xingxin.xx, jonathantanmy, Junio C Hamano

On Wed, Apr 26, 2023 at 7:09 PM Derrick Stolee <derrickstolee@github.com> wrote:
>
> On 4/26/2023 12:05 AM, Han Xin wrote:
> > Fixed the following problems:
>
> This might be a good time to reference the change from recursive to
> iterative:
>
>   The mark_common() method in negotiator/skipping.c was converted
>   from recursive to iterative in 4654134976f (negotiator/skipping:
>   avoid stack overflow, 2022-10-25), but there is some more work
>   to do:
>

Make sense.

> >       while ((c = prio_queue_get(&queue))) {
> >               struct commit_list *p;
> >               if (c->object.flags & COMMON)
> > -                     return;
> > +                     continue;
> >               c->object.flags |= COMMON;
> >               if (!(c->object.flags & POPPED))
> >                       data->non_common_revs--;
> >
> >               if (!c->object.parsed)
> > -                     return;
> > +                     continue;
> >               for (p = c->parents; p; p = p->next) {
> > -                     if (p->item->object.flags & SEEN)
> > +                     if (p->item->object.flags & SEEN || p->item->object.flags & COMMON)
> >                               prio_queue_put(&queue, p->item);
>
> This is the incorrect check for the COMMON bit, because it is
> a positive check (we add the common bit after we pop a commit
> from the queue) _and_ because we could add a commit multiple
> times before it is first popped and that bit is added.
>

Yes, I introduced a silly thing.

> Instead, we need
>
>                         if ((p->item->object.flags & SEEN) &&
>                             !(p->item->object.flags & COMMON)) {
>                                 p->item->object.flags |= COMMON;
>                                 prio_queue_put(&queue, p->item);
>                         }
>
> and at the start of the loop we need to add the COMMON bit to
> the starting commit. We also need to remove this bit from the
> main section of the loop:
>
>                 if (c->object.flags & COMMON)
>                         continue;
>                 c->object.flags |= COMMON;
>
> because it does nothing if the COMMON bit is added before
> being added to the queue.
>

Make sense.
And with this, we should do return before loop:

                if (seen_commit->object.flags & COMMON)
                        return;

                prio_queue_put(&queue, seen_commit);
                while ((c = prio_queue_get(&queue))) {

> I'm very suspicious that this change did not trigger a test
> failure, since the behavior is quite different from the previous
> version. Of course, the recursive-to-iterative change was first
> to change the behavior, so I'm not surprised that it isn't caught
> by tests. What kind of tests can we introduce to harden our
> coverage here?
>

With "p->item->object.flags & COMMON", it takes more meaningless
walking, but doesn't seem to introduce any errors. I haven't found any
good way to avoid similar problems.

Thanks
-Han Xin

^ permalink raw reply	[flat|nested] 20+ messages in thread

* [PATCH v2 0/2] negotiator/default: avoid stack overflow
  2023-04-26  4:05 ` [PATCH v2 0/2] negotiator/default: " Han Xin
  2023-04-26  4:05   ` [PATCH v2 1/2] " Han Xin
  2023-04-26  4:05   ` [PATCH v2 2/2] negotiator/skipping: fix some problems in mark_common() Han Xin
@ 2023-04-26 13:15   ` Han Xin
  2023-04-26 13:15     ` [PATCH v3 1/2] " Han Xin
                       ` (2 more replies)
  2 siblings, 3 replies; 20+ messages in thread
From: Han Xin @ 2023-04-26 13:15 UTC (permalink / raw)
  To: git; +Cc: Han Xin, xingxin.xx, jonathantanmy, derrickstolee, Junio C Hamano

This series avoid stack overflow in negotiator/default.c and memory leak
in negotiator/skipping.c.

Changes since v2:
* Rewrite the commit link in the typical format.
* Fix the incorrect check for the COMMON bit introduced in v2.

Han Xin (2):
  negotiator/default: avoid stack overflow
  negotiator/skipping: fix some problems in mark_common()

 negotiator/default.c  | 39 +++++++++++++++++++++++++++++----------
 negotiator/skipping.c | 22 +++++++++++++++-------
 2 files changed, 44 insertions(+), 17 deletions(-)

Range-diff against v2:
1:  935be72eb9 ! 1:  0e69d70805 negotiator/default: avoid stack overflow
    @@ Commit message
         recursive function calls. Avoid this by instead recursing using a
         heap-allocated data structure.
     
    -    This is the same case as [1].
    -
    -    1. 4654134976f (negotiator/skipping: avoid stack overflow, 2022-10-25)
    +    This is the same case as 4654134976f (negotiator/skipping: avoid
    +    stack overflow, 2022-10-25)
     
         Reported-by: Xin Xing <xingxin.xx@bytedance.com>
         Signed-off-by: Han Xin <hanxin.hx@bytedance.com>
2:  abbb1bc0b3 ! 2:  8b5c92a4d5 negotiator/skipping: fix some problems in mark_common()
    @@ Metadata
      ## Commit message ##
         negotiator/skipping: fix some problems in mark_common()
     
    -    Fixed the following problems:
    +    The mark_common() method in negotiator/skipping.c was converted
    +    from recursive to iterative in 4654134976f (negotiator/skipping:
    +    avoid stack overflow, 2022-10-25), but there is some more work
    +    to do:
     
         1. prio_queue() should be used with clear_prio_queue(), otherwise there
            will be a memory leak.
    @@ negotiator/skipping.c: static int clear_marks(const char *refname, const struct
       */
      static void mark_common(struct data *data, struct commit *seen_commit)
      {
    -@@ negotiator/skipping.c: static void mark_common(struct data *data, struct commit *seen_commit)
    + 	struct prio_queue queue = { NULL };
    + 	struct commit *c;
    + 
    ++	if (seen_commit->object.flags & COMMON)
    ++		return;
    ++
    + 	prio_queue_put(&queue, seen_commit);
    ++	seen_commit->object.flags |= COMMON;
      	while ((c = prio_queue_get(&queue))) {
      		struct commit_list *p;
    - 		if (c->object.flags & COMMON)
    +-		if (c->object.flags & COMMON)
     -			return;
    -+			continue;
    - 		c->object.flags |= COMMON;
    +-		c->object.flags |= COMMON;
    ++
      		if (!(c->object.flags & POPPED))
      			data->non_common_revs--;
      
    @@ negotiator/skipping.c: static void mark_common(struct data *data, struct commit
     +			continue;
      		for (p = c->parents; p; p = p->next) {
     -			if (p->item->object.flags & SEEN)
    -+			if (p->item->object.flags & SEEN || p->item->object.flags & COMMON)
    - 				prio_queue_put(&queue, p->item);
    +-				prio_queue_put(&queue, p->item);
    ++			if (!(p->item->object.flags & SEEN) ||
    ++			    (p->item->object.flags & COMMON))
    ++				continue;
    ++
    ++			p->item->object.flags |= COMMON;
    ++			prio_queue_put(&queue, p->item);
      		}
      	}
     +
-- 
2.40.0


^ permalink raw reply	[flat|nested] 20+ messages in thread

* [PATCH v3 1/2] negotiator/default: avoid stack overflow
  2023-04-26 13:15   ` [PATCH v2 0/2] negotiator/default: avoid stack overflow Han Xin
@ 2023-04-26 13:15     ` Han Xin
  2023-04-26 17:14       ` Junio C Hamano
  2023-04-26 13:15     ` [PATCH v3 2/2] negotiator/skipping: fix some problems in mark_common() Han Xin
  2023-05-01 22:11     ` [PATCH v2 0/2] negotiator/default: avoid stack overflow Junio C Hamano
  2 siblings, 1 reply; 20+ messages in thread
From: Han Xin @ 2023-04-26 13:15 UTC (permalink / raw)
  To: git; +Cc: Han Xin, xingxin.xx, jonathantanmy, derrickstolee, Junio C Hamano

mark_common() in negotiator/default.c may overflow the stack due to
recursive function calls. Avoid this by instead recursing using a
heap-allocated data structure.

This is the same case as 4654134976f (negotiator/skipping: avoid
stack overflow, 2022-10-25)

Reported-by: Xin Xing <xingxin.xx@bytedance.com>
Signed-off-by: Han Xin <hanxin.hx@bytedance.com>
---
 negotiator/default.c | 39 +++++++++++++++++++++++++++++----------
 1 file changed, 29 insertions(+), 10 deletions(-)

diff --git a/negotiator/default.c b/negotiator/default.c
index f4b78eb47d..635cdd6483 100644
--- a/negotiator/default.c
+++ b/negotiator/default.c
@@ -55,30 +55,49 @@ static int clear_marks(const char *refname, const struct object_id *oid,
 static void mark_common(struct negotiation_state *ns, struct commit *commit,
 		int ancestors_only, int dont_parse)
 {
-	if (commit != NULL && !(commit->object.flags & COMMON)) {
-		struct object *o = (struct object *)commit;
+	struct prio_queue queue = { NULL };
+
+	if (!commit || (commit->object.flags & COMMON))
+		return;
+
+	prio_queue_put(&queue, commit);
+	if (!ancestors_only) {
+		commit->object.flags |= COMMON;
 
-		if (!ancestors_only)
-			o->flags |= COMMON;
+		if ((commit->object.flags & SEEN) && !(commit->object.flags & POPPED))
+			ns->non_common_revs--;
+	}
+	while ((commit = prio_queue_get(&queue))) {
+		struct object *o = (struct object *)commit;
 
 		if (!(o->flags & SEEN))
 			rev_list_push(ns, commit, SEEN);
 		else {
 			struct commit_list *parents;
 
-			if (!ancestors_only && !(o->flags & POPPED))
-				ns->non_common_revs--;
 			if (!o->parsed && !dont_parse)
 				if (repo_parse_commit(the_repository, commit))
-					return;
+					continue;
 
 			for (parents = commit->parents;
 					parents;
-					parents = parents->next)
-				mark_common(ns, parents->item, 0,
-					    dont_parse);
+					parents = parents->next) {
+				struct commit *p = parents->item;
+
+				if (p->object.flags & COMMON)
+					continue;
+
+				p->object.flags |= COMMON;
+
+				if ((p->object.flags & SEEN) && !(p->object.flags & POPPED))
+					ns->non_common_revs--;
+
+				prio_queue_put(&queue, parents->item);
+			}
 		}
 	}
+
+	clear_prio_queue(&queue);
 }
 
 /*
-- 
2.40.0


^ permalink raw reply related	[flat|nested] 20+ messages in thread

* [PATCH v3 2/2] negotiator/skipping: fix some problems in mark_common()
  2023-04-26 13:15   ` [PATCH v2 0/2] negotiator/default: avoid stack overflow Han Xin
  2023-04-26 13:15     ` [PATCH v3 1/2] " Han Xin
@ 2023-04-26 13:15     ` Han Xin
  2023-05-01 22:11     ` [PATCH v2 0/2] negotiator/default: avoid stack overflow Junio C Hamano
  2 siblings, 0 replies; 20+ messages in thread
From: Han Xin @ 2023-04-26 13:15 UTC (permalink / raw)
  To: git; +Cc: Han Xin, xingxin.xx, jonathantanmy, derrickstolee, Junio C Hamano

The mark_common() method in negotiator/skipping.c was converted
from recursive to iterative in 4654134976f (negotiator/skipping:
avoid stack overflow, 2022-10-25), but there is some more work
to do:

1. prio_queue() should be used with clear_prio_queue(), otherwise there
   will be a memory leak.
2. It does not do duplicate protection before prio_queue_put().
   (The COMMON bit would work here, too.)
3. When it translated from recursive to iterative it kept "return"
   statements that should probably be "continue" statements.
4. It does not attempt to parse commits, and instead returns
   immediately when finding an unparsed commit. This is something
   that it did in its original version, so maybe it is by design,
   but it doesn't match the doc comment for the method.

Helped-by: Derrick Stolee <derrickstolee@github.com>
Signed-off-by: Han Xin <hanxin.hx@bytedance.com>
---
 negotiator/skipping.c | 22 +++++++++++++++-------
 1 file changed, 15 insertions(+), 7 deletions(-)

diff --git a/negotiator/skipping.c b/negotiator/skipping.c
index c7d6ab39bc..6a5450b460 100644
--- a/negotiator/skipping.c
+++ b/negotiator/skipping.c
@@ -85,29 +85,37 @@ static int clear_marks(const char *refname, const struct object_id *oid,
 }
 
 /*
- * Mark this SEEN commit and all its SEEN ancestors as COMMON.
+ * Mark this SEEN commit and all its parsed SEEN ancestors as COMMON.
  */
 static void mark_common(struct data *data, struct commit *seen_commit)
 {
 	struct prio_queue queue = { NULL };
 	struct commit *c;
 
+	if (seen_commit->object.flags & COMMON)
+		return;
+
 	prio_queue_put(&queue, seen_commit);
+	seen_commit->object.flags |= COMMON;
 	while ((c = prio_queue_get(&queue))) {
 		struct commit_list *p;
-		if (c->object.flags & COMMON)
-			return;
-		c->object.flags |= COMMON;
+
 		if (!(c->object.flags & POPPED))
 			data->non_common_revs--;
 
 		if (!c->object.parsed)
-			return;
+			continue;
 		for (p = c->parents; p; p = p->next) {
-			if (p->item->object.flags & SEEN)
-				prio_queue_put(&queue, p->item);
+			if (!(p->item->object.flags & SEEN) ||
+			    (p->item->object.flags & COMMON))
+				continue;
+
+			p->item->object.flags |= COMMON;
+			prio_queue_put(&queue, p->item);
 		}
 	}
+
+	clear_prio_queue(&queue);
 }
 
 /*
-- 
2.40.0


^ permalink raw reply related	[flat|nested] 20+ messages in thread

* Re: [PATCH v3 1/2] negotiator/default: avoid stack overflow
  2023-04-26 13:15     ` [PATCH v3 1/2] " Han Xin
@ 2023-04-26 17:14       ` Junio C Hamano
  2023-04-26 17:30         ` Derrick Stolee
  0 siblings, 1 reply; 20+ messages in thread
From: Junio C Hamano @ 2023-04-26 17:14 UTC (permalink / raw)
  To: Han Xin; +Cc: git, xingxin.xx, jonathantanmy, derrickstolee

Han Xin <hanxin.hx@bytedance.com> writes:

> mark_common() in negotiator/default.c may overflow the stack due to
> recursive function calls. Avoid this by instead recursing using a
> heap-allocated data structure.
>
> This is the same case as 4654134976f (negotiator/skipping: avoid
> stack overflow, 2022-10-25)
>
> Reported-by: Xin Xing <xingxin.xx@bytedance.com>
> Signed-off-by: Han Xin <hanxin.hx@bytedance.com>
> ---
>  negotiator/default.c | 39 +++++++++++++++++++++++++++++----------
>  1 file changed, 29 insertions(+), 10 deletions(-)
>
> diff --git a/negotiator/default.c b/negotiator/default.c
> index f4b78eb47d..635cdd6483 100644
> --- a/negotiator/default.c
> +++ b/negotiator/default.c
> @@ -55,30 +55,49 @@ static int clear_marks(const char *refname, const struct object_id *oid,
>  static void mark_common(struct negotiation_state *ns, struct commit *commit,
>  		int ancestors_only, int dont_parse)
>  {
> -	if (commit != NULL && !(commit->object.flags & COMMON)) {
> -		struct object *o = (struct object *)commit;
> +	struct prio_queue queue = { NULL };
> +
> +	if (!commit || (commit->object.flags & COMMON))
> +		return;

The original naive recursive marker had a large if block guarded by
the opposite condition around the whole thing, which amounts to the
same as this early return.  Good.

> +	prio_queue_put(&queue, commit);

And the code now uses on-stack priority queue here, and bootstraps
the machinery by placing the first element here.  OK.

> +	if (!ancestors_only) {
> +		commit->object.flags |= COMMON;
>  
> -		if (!ancestors_only)
> -			o->flags |= COMMON;

These two are equivalent, which is good.

> +		if ((commit->object.flags & SEEN) && !(commit->object.flags & POPPED))
> +			ns->non_common_revs--;

Hmph, this is a bit unexpected to duplicate the non_common_revs
counting logic here.  In the original, this piece of code was there
just after we decided to continue digging into the parents, and even
if this patch changes the mechanism with which "digging into the
parents" from recursion to priority queue, it is not obvious why we
can keep doing the decrementing for the current commit we are
looking at, instead of doing that for parents of the commit like
this patch does.  In other words, it is not clear why it needs to be
changed while going from recursive to iterative.

Is it because ancestors_only is not usable inside the loop in the
iterative version?  That is, if ancestors_only is not set, we do
paint the initial commit as COMMON just as the parents we discover
in the loop, but when ancestors_only is set, we need to skip painting
the initial commit as COMMON, so the patch moves that logic?

It may solve the issue of special casing the initial commit, but it
feels backwards in that the resulting loop becomes harder to
understand by making it necessary to process the initial commit
outside the loop only halfway.

It may make it easier to understand if we had another local
variable, "struct commit *skip_commit", that is NULL by default but
is set to the initial commit when ancestors_only is set, and do the
painting with COMMON and counting of non_common_revs all inside the
loop for the current commit that is being processed (instead of the
parents the loop discovered).  I dunno.  It would avoid duplicating
the logic and implements the "ancestors_only, do not paint or count
the initial commit" in a more readable and straight-forward way, no?

Thanks.

> +	}
> +	while ((commit = prio_queue_get(&queue))) {
> +		struct object *o = (struct object *)commit;
>  
>  		if (!(o->flags & SEEN))
>  			rev_list_push(ns, commit, SEEN);
>  		else {
>  			struct commit_list *parents;
>  
> -			if (!ancestors_only && !(o->flags & POPPED))
> -				ns->non_common_revs--;
>  			if (!o->parsed && !dont_parse)
>  				if (repo_parse_commit(the_repository, commit))
> -					return;
> +					continue;
>  
>  			for (parents = commit->parents;
>  					parents;
> -					parents = parents->next)
> -				mark_common(ns, parents->item, 0,
> -					    dont_parse);
> +					parents = parents->next) {
> +				struct commit *p = parents->item;
> +
> +				if (p->object.flags & COMMON)
> +					continue;
> +
> +				p->object.flags |= COMMON;
> +
> +				if ((p->object.flags & SEEN) && !(p->object.flags & POPPED))
> +					ns->non_common_revs--;
> +
> +				prio_queue_put(&queue, parents->item);
> +			}
>  		}
>  	}
> +
> +	clear_prio_queue(&queue);
>  }
>  
>  /*

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH v3 1/2] negotiator/default: avoid stack overflow
  2023-04-26 17:14       ` Junio C Hamano
@ 2023-04-26 17:30         ` Derrick Stolee
  2023-04-26 17:38           ` Junio C Hamano
  0 siblings, 1 reply; 20+ messages in thread
From: Derrick Stolee @ 2023-04-26 17:30 UTC (permalink / raw)
  To: Junio C Hamano, Han Xin; +Cc: git, xingxin.xx, jonathantanmy

On 4/26/2023 1:14 PM, Junio C Hamano wrote:
> Han Xin <hanxin.hx@bytedance.com> writes:

>> +		if ((commit->object.flags & SEEN) && !(commit->object.flags & POPPED))
>> +			ns->non_common_revs--;
> 
> Hmph, this is a bit unexpected to duplicate the non_common_revs
> counting logic here.  In the original, this piece of code was there
> just after we decided to continue digging into the parents, and even
> if this patch changes the mechanism with which "digging into the
> parents" from recursion to priority queue, it is not obvious why we
> can keep doing the decrementing for the current commit we are
> looking at, instead of doing that for parents of the commit like
> this patch does.  In other words, it is not clear why it needs to be
> changed while going from recursive to iterative.
> 
> Is it because ancestors_only is not usable inside the loop in the
> iterative version?  That is, if ancestors_only is not set, we do
> paint the initial commit as COMMON just as the parents we discover
> in the loop, but when ancestors_only is set, we need to skip painting
> the initial commit as COMMON, so the patch moves that logic?
> 
> It may solve the issue of special casing the initial commit, but it
> feels backwards in that the resulting loop becomes harder to
> understand by making it necessary to process the initial commit
> outside the loop only halfway.

The "ancestors_only" parameter is about treating the initial commit
differently than the ancestors. Since we add the initial commit to
the priority queue, the only way to know we are dealing with an
ancestor and not the initial commit is to do the processing when
visiting a parent for the first time.

> It may make it easier to understand if we had another local
> variable, "struct commit *skip_commit", that is NULL by default but
> is set to the initial commit when ancestors_only is set,

This is an interesting idea and could reduce the duplicated logic
for nw->common_revs.

> and do the
> painting with COMMON and counting of non_common_revs all inside the
> loop for the current commit that is being processed (instead of the
> parents the loop discovered).  I dunno.  It would avoid duplicating
> the logic and implements the "ancestors_only, do not paint or count
> the initial commit" in a more readable and straight-forward way, no?

However, we need to do the painting with COMMON when visiting a
parent in order to avoid adding duplicate entries to the priority
queue (and potentially growing exponentially).

Since we need to examine and modify a parent before adding it to
the queue, it is natural to do other "we are visiting a commit"
logic in that same place.

It is unfortunate that the logic for nw->common_revs is duplicated,
but I think this is a cleaner approach than other considered
approaches.

Thanks,
-Stolee

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH v3 1/2] negotiator/default: avoid stack overflow
  2023-04-26 17:30         ` Derrick Stolee
@ 2023-04-26 17:38           ` Junio C Hamano
  0 siblings, 0 replies; 20+ messages in thread
From: Junio C Hamano @ 2023-04-26 17:38 UTC (permalink / raw)
  To: Derrick Stolee; +Cc: Han Xin, git, xingxin.xx, jonathantanmy

Derrick Stolee <derrickstolee@github.com> writes:

> It is unfortunate that the logic for nw->common_revs is duplicated,
> but I think this is a cleaner approach than other considered
> approaches.

It is subjective and as long as the result works correctly, I am
fine with it ;-).

Thanks.

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH v2 0/2] negotiator/default: avoid stack overflow
  2023-04-26 13:15   ` [PATCH v2 0/2] negotiator/default: avoid stack overflow Han Xin
  2023-04-26 13:15     ` [PATCH v3 1/2] " Han Xin
  2023-04-26 13:15     ` [PATCH v3 2/2] negotiator/skipping: fix some problems in mark_common() Han Xin
@ 2023-05-01 22:11     ` Junio C Hamano
  2023-05-02  1:49       ` Derrick Stolee
  2 siblings, 1 reply; 20+ messages in thread
From: Junio C Hamano @ 2023-05-01 22:11 UTC (permalink / raw)
  To: Han Xin; +Cc: git, xingxin.xx, jonathantanmy, derrickstolee

Han Xin <hanxin.hx@bytedance.com> writes:

> This series avoid stack overflow in negotiator/default.c and memory leak
> in negotiator/skipping.c.
>
> Changes since v2:
> * Rewrite the commit link in the typical format.
> * Fix the incorrect check for the COMMON bit introduced in v2.

I see Derrick pointed out a logic error during the review of v2 and
this round corrects it.  Is everybody happy with this iteration and
considers it safe to merge it to 'next'?

Thanks.


^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH v2 0/2] negotiator/default: avoid stack overflow
  2023-05-01 22:11     ` [PATCH v2 0/2] negotiator/default: avoid stack overflow Junio C Hamano
@ 2023-05-02  1:49       ` Derrick Stolee
  2023-05-02 15:51         ` Junio C Hamano
  0 siblings, 1 reply; 20+ messages in thread
From: Derrick Stolee @ 2023-05-02  1:49 UTC (permalink / raw)
  To: Junio C Hamano, Han Xin; +Cc: git, xingxin.xx, jonathantanmy

On 5/1/2023 6:11 PM, Junio C Hamano wrote:
> Han Xin <hanxin.hx@bytedance.com> writes:
> 
>> This series avoid stack overflow in negotiator/default.c and memory leak
>> in negotiator/skipping.c.
>>
>> Changes since v2:
>> * Rewrite the commit link in the typical format.
>> * Fix the incorrect check for the COMMON bit introduced in v2.
> 
> I see Derrick pointed out a logic error during the review of v2 and
> this round corrects it.  Is everybody happy with this iteration and
> considers it safe to merge it to 'next'?

Sorry for the lack of confirmation on that. I do think the v3
patches are good (the cover letter says v2).

Thanks,
-Stolee

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH v2 0/2] negotiator/default: avoid stack overflow
  2023-05-02  1:49       ` Derrick Stolee
@ 2023-05-02 15:51         ` Junio C Hamano
  0 siblings, 0 replies; 20+ messages in thread
From: Junio C Hamano @ 2023-05-02 15:51 UTC (permalink / raw)
  To: Derrick Stolee; +Cc: Han Xin, git, xingxin.xx, jonathantanmy

Derrick Stolee <derrickstolee@github.com> writes:

>>> Changes since v2:
>>> * Rewrite the commit link in the typical format.
>>> * Fix the incorrect check for the COMMON bit introduced in v2.
>> 
>> I see Derrick pointed out a logic error during the review of v2 and
>> this round corrects it.  Is everybody happy with this iteration and
>> considers it safe to merge it to 'next'?
>
> Sorry for the lack of confirmation on that. I do think the v3
> patches are good (the cover letter says v2).

Thanks.  Will mark the topic for 'next', then.

^ permalink raw reply	[flat|nested] 20+ messages in thread

end of thread, other threads:[~2023-05-02 15:51 UTC | newest]

Thread overview: 20+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2023-04-24  2:23 [PATCH v1] negotiator/default.c: avoid stack overflow Han Xin
2023-04-24 14:44 ` Derrick Stolee
2023-04-25  3:02   ` [External] " Han Xin
2023-04-25 13:34     ` Derrick Stolee
2023-04-26  4:05 ` [PATCH v2 0/2] negotiator/default: " Han Xin
2023-04-26  4:05   ` [PATCH v2 1/2] " Han Xin
2023-04-26 11:13     ` Derrick Stolee
2023-04-26 11:40       ` [External] " Han Xin
2023-04-26  4:05   ` [PATCH v2 2/2] negotiator/skipping: fix some problems in mark_common() Han Xin
2023-04-26 11:08     ` Derrick Stolee
2023-04-26 11:55       ` [External] " Han Xin
2023-04-26 13:15   ` [PATCH v2 0/2] negotiator/default: avoid stack overflow Han Xin
2023-04-26 13:15     ` [PATCH v3 1/2] " Han Xin
2023-04-26 17:14       ` Junio C Hamano
2023-04-26 17:30         ` Derrick Stolee
2023-04-26 17:38           ` Junio C Hamano
2023-04-26 13:15     ` [PATCH v3 2/2] negotiator/skipping: fix some problems in mark_common() Han Xin
2023-05-01 22:11     ` [PATCH v2 0/2] negotiator/default: avoid stack overflow Junio C Hamano
2023-05-02  1:49       ` Derrick Stolee
2023-05-02 15:51         ` Junio C Hamano

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).