Git Mailing List Archive mirror
From: Han Xin <hanxin.hx@bytedance.com>
To: Derrick Stolee <derrickstolee@github.com>
Cc: git@vger.kernel.org, xingxin.xx@bytedance.com,
	jonathantanmy@google.com, Junio C Hamano <gitster@pobox.com>
Subject: Re: [External] Re: [PATCH v1] negotiator/default.c: avoid stack overflow
Date: Tue, 25 Apr 2023 11:02:54 +0800
Message-ID: <CAKgqsWUEnbmhLL3p9+_P4yH_=A+hz+bBPqmfb8FyRUeW-u7_gw@mail.gmail.com>
In-Reply-To: <2bcaeba9-20bc-1ca8-849b-ac54342c71e3@github.com>

On Mon, Apr 24, 2023 at 10:44 PM Derrick Stolee
<derrickstolee@github.com> wrote:
>
> > This is the same case as [1].
> >
> > 1. https://lore.kernel.org/git/20221025232934.1504445-1-jonathantanmy@google.com/
>
> Thanks for the link, though this could be replaced with
>
>   4654134976f (negotiator/skipping: avoid stack overflow, 2022-10-25)
>
> now that the change exists in the commit history.

Makes sense.

>
> One thing that is missing from that change is a test, and such a test
> could be generalized to apply to all negotiators. This could maybe
> help any potential future negotiator avoid this bug. Did you think
> about what such a test could look like? Perhaps test_commit_bulk
> could help, but we'd probably need to create so many commits that the
> test would need to be marked as expensive. That's probably a major
> reason to not include a test and rely on avoiding recursion when
> possible.

I first hit this issue in a large repository with numerous merge commits.
To cover it, I added a test case that fast-imports 10,000 commits and
runs them through run_with_limited_stack(). The test is expensive, but
it runs to completion without any issues.

>
> > -     if (commit != NULL && !(commit->object.flags & COMMON)) {
> > +     struct prio_queue queue = { NULL };
> > +
> > +     prio_queue_put(&queue, commit);
>
> Should we check the conditions that were removed? The COMMON flag
> is likely only useful for the recursion, but prio_queue_put() is
> not careful about NULL values. However, no callers should be
> providing NULL commits here.
>
> Couldn't hurt to add
>
>         if (!commit)
>                 return;

Makes sense.

>
> before the prio_queue_put().
>
> > +     while ((commit = prio_queue_get(&queue))) {
> >               struct object *o = (struct object *)commit;
> >
> > +             if (commit == NULL || (commit->object.flags & COMMON))
> > +                     continue;
>
> The NULL condition is definitely unnecessary here as it is checked
> by the while condition. The "& COMMON" is helpful if the commit
> gained the COMMON flag after being inserted into the queue.
>
> >               if (!ancestors_only)
> >                       o->flags |= COMMON;
> >
>
>
> > @@ -70,15 +76,17 @@ static void mark_common(struct negotiation_state *ns, struct commit *commit,
> >                               ns->non_common_revs--;
> >                       if (!o->parsed && !dont_parse)
> >                               if (repo_parse_commit(the_repository, commit))
> > -                                     return;
> > +                                     continue;
> >
> > +                     ancestors_only = 0;
>
> This caught me off guard, but this flag essentially says "should
> I mark the first commit as common or not?". It would probably be
> clearer if this was done before the loop, and then was ignored
> within the loop, setting the flag on each parent in this loop:
>
> >                       for (parents = commit->parents;
> >                                       parents;
> >                                       parents = parents->next)
> > -                             mark_common(ns, parents->item, 0,
> > -                                         dont_parse);
> > +                             prio_queue_put(&queue, parents->item);
>

I'll think about how to optimize this again.

ancestors_only is used multiple times in the original logic:
1.
        if (!ancestors_only)
                o->flags |= COMMON;
2.
        if (!(o->flags & SEEN))
                rev_list_push(ns, commit, SEEN);
        else {
                struct commit_list *parents;

                if (!ancestors_only && !(o->flags & POPPED))
                        ns->non_common_revs--;

Should we use this?

        if (!ancestors_only) {
                commit->object.flags |= COMMON;

                if ((commit->object.flags & SEEN) &&
                    !(commit->object.flags & POPPED))
                        ns->non_common_revs--;
        }

and

        for (parents = commit->parents;
             parents;
             parents = parents->next) {
                if (parents->item->object.flags & COMMON)
                        continue;

                parents->item->object.flags |= COMMON;

                if ((parents->item->object.flags & SEEN) &&
                    !(parents->item->object.flags & POPPED))
                        ns->non_common_revs--;

                prio_queue_put(&queue, parents->item);
        }
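
To see the overall shape outside of git, here is a minimal standalone
sketch of the same idea: an iterative walk that marks each parent COMMON
before queueing it. Everything named toy_* is hypothetical and only for
illustration. It uses a plain LIFO stack where the patch uses struct
prio_queue, and it leaves out parsing and the SEEN/POPPED bookkeeping.

```c
#include <stddef.h>
#include <stdlib.h>

#define COMMON (1u << 0)

/* Toy stand-in for struct commit: at most two parents, NULL when absent. */
struct toy_commit {
	unsigned flags;
	struct toy_commit *parents[2];
};

/* Growable LIFO stack; an assumption made here in place of prio_queue. */
struct toy_stack {
	struct toy_commit **items;
	size_t nr, alloc;
};

static void toy_push(struct toy_stack *s, struct toy_commit *c)
{
	if (s->nr == s->alloc) {
		s->alloc = s->alloc ? 2 * s->alloc : 16;
		s->items = realloc(s->items, s->alloc * sizeof(*s->items));
		if (!s->items)
			abort();
	}
	s->items[s->nr++] = c;
}

static struct toy_commit *toy_pop(struct toy_stack *s)
{
	return s->nr ? s->items[--s->nr] : NULL;
}

/* Marks commit (unless ancestors_only) and all of its ancestors COMMON.
 * Returns how many commits were newly marked. Uses no recursion, so the
 * depth of history does not affect the call stack. */
static size_t toy_mark_common(struct toy_commit *commit, int ancestors_only)
{
	struct toy_stack stack = { NULL, 0, 0 };
	size_t marked = 0;
	int i;

	if (!commit || (commit->flags & COMMON))
		return 0;

	if (!ancestors_only) {
		commit->flags |= COMMON;
		marked++;
	}
	toy_push(&stack, commit);

	while ((commit = toy_pop(&stack))) {
		for (i = 0; i < 2; i++) {
			struct toy_commit *p = commit->parents[i];

			/* The flag doubles as duplicate protection. */
			if (!p || (p->flags & COMMON))
				continue;
			p->flags |= COMMON;
			marked++;
			toy_push(&stack, p);
		}
	}

	free(stack.items);
	return marked;
}

/* Builds a linear history of n commits (n >= 1); element n - 1 is the tip. */
static struct toy_commit *toy_make_chain(size_t n)
{
	struct toy_commit *c = calloc(n, sizeof(*c));
	size_t i;

	if (!c)
		abort();
	for (i = 1; i < n; i++)
		c[i].parents[0] = &c[i - 1];
	return c;
}
```

Calling toy_mark_common() on the tip of a chain built with
toy_make_chain(100000) marks every commit without deepening the call
stack, and a second call on the same tip returns 0 because the tip is
already COMMON.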

> It would have an extra benefit: your walk may duplicate objects in the
> priority queue (there is no duplicate protection in prio_queue_put).
> But, we could use
>
>         if (!(parents->item->object.flags & COMMON)) {
>                 parents->item->object.flags |= COMMON;
>                 prio_queue_put(&queue, parents->item);
>         }
>
> as duplicate protection _and_ a clearer way to demonstrate what
> ancestors_only is doing. Without this protection, it is possible
> to have exponential growth in the priority queue using simple
> merge commits.
>
> You'd need this at the beginning:
>
>         if (!commit)
>                 return;
>
>         prio_queue_put(&queue, commit);
>         if (!ancestors_only)
>                 commit->object.flags |= COMMON;

Makes sense.
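
To get a feel for the growth, here is a small standalone sketch that
counts queue insertions on a history made of stacked merges. It is a toy
model, not the code under review: the unprotected variant here performs
no flag check at all, which is a simplification (the v1 patch still
rechecks COMMON after popping). The names node and count_pushes are
hypothetical, and a plain stack stands in for the priority queue.

```c
#include <stddef.h>
#include <stdlib.h>

#define COMMON (1u << 0)

struct node {
	unsigned flags;
	int parent[2];	/* index into the node array, -1 when absent */
};

/*
 * Builds `levels` stacked merges: merge m_k has parents x_k and y_k,
 * and both x_k and y_k have the single parent m_(k-1).
 * Layout: m_k lives at index 3k, x_k at 3k - 2, y_k at 3k - 1.
 * Walks from the newest merge and returns the number of insertions.
 * With dedupe != 0, a parent is only inserted if it is not yet COMMON.
 */
static size_t count_pushes(int levels, int dedupe)
{
	size_t nr_nodes = 1 + 3 * (size_t)levels;
	struct node *g = calloc(nr_nodes, sizeof(*g));
	size_t cap = 16, top = 0, pushes = 0;
	int *stack = malloc(cap * sizeof(*stack));
	int k, i, start = 3 * levels;

	if (!g || !stack)
		abort();

	g[0].parent[0] = g[0].parent[1] = -1;
	for (k = 1; k <= levels; k++) {
		g[3 * k - 2].parent[0] = 3 * (k - 1);
		g[3 * k - 2].parent[1] = -1;
		g[3 * k - 1].parent[0] = 3 * (k - 1);
		g[3 * k - 1].parent[1] = -1;
		g[3 * k].parent[0] = 3 * k - 2;
		g[3 * k].parent[1] = 3 * k - 1;
	}

	g[start].flags |= COMMON;
	stack[top++] = start;
	pushes++;

	while (top) {
		int cur = stack[--top];

		for (i = 0; i < 2; i++) {
			int p = g[cur].parent[i];

			if (p < 0)
				continue;
			if (dedupe) {
				if (g[p].flags & COMMON)
					continue;
				g[p].flags |= COMMON;
			}
			if (top == cap) {
				int *bigger;

				cap *= 2;
				bigger = realloc(stack, cap * sizeof(*stack));
				if (!bigger)
					abort();
				stack = bigger;
			}
			stack[top++] = p;
			pushes++;
		}
	}

	free(stack);
	free(g);
	return pushes;
}
```

In this model, 10 levels hold 31 commits. count_pushes(10, 1) inserts
each commit once, 31 insertions in total, while count_pushes(10, 0)
performs 4093 insertions, and that number roughly doubles with every
extra level.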

> > diff --git a/negotiator/skipping.c b/negotiator/skipping.c
> > index c7d6ab39bc..3d262b3533 100644
> > --- a/negotiator/skipping.c
> > +++ b/negotiator/skipping.c
> > @@ -108,6 +108,8 @@ static void mark_common(struct data *data, struct commit *seen_commit)
> >                               prio_queue_put(&queue, p->item);
> >               }
> >       }
> > +
> > +     clear_prio_queue(&queue);
>
> This memory leak cleanup in the skipping negotiator is good to
> do, but should be split into its own change.
>
> In addition, the mark_common() method there seems to have a few
> problems:
>
>  1. It does not do duplicate protection before prio_queue_put().
>     (The COMMON bit would work here, too.)
>  2. When it translated from recursive to iterative it kept "return"
>     statements that should probably be "continue" statements.
>  3. It does not attempt to parse commits, and instead returns
>     immediately when finding an unparsed commit. This is something
>     that it did in its original version, so maybe it is by design,
>     but it doesn't match the doc comment for the method.
>
> Consider fixing these issues while you are here.
>

Makes sense.
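
For point 2, a small standalone sketch of why a leftover "return" matters
once the walk is iterative. This is a toy model rather than the code in
negotiator/skipping.c: toy_commit, toy_walk and toy_demo are
hypothetical names, a fixed-size FIFO array stands in for the priority
queue, and parsed == 0 models a commit whose parents cannot be read.

```c
#include <stddef.h>

#define COMMON (1u << 0)
#define MAX_QUEUE 8

struct toy_commit {
	unsigned flags;
	int parsed;	/* 0 models a commit whose parents are unavailable */
	struct toy_commit *parents[2];
};

/* Marks commits reachable from tip and returns how many were marked.
 * stop_with_return selects the behavior on an unparsed commit. */
static size_t toy_walk(struct toy_commit *tip, int stop_with_return)
{
	struct toy_commit *queue[MAX_QUEUE];
	size_t head = 0, tail = 0, marked = 0;
	int i;

	queue[tail++] = tip;
	while (head < tail) {
		struct toy_commit *c = queue[head++];

		if (c->flags & COMMON)
			continue;
		c->flags |= COMMON;
		marked++;

		if (!c->parsed) {
			if (stop_with_return)
				return marked;	/* drops everything still queued */
			continue;		/* skips only this commit's parents */
		}

		for (i = 0; i < 2; i++)
			if (c->parents[i] && tail < MAX_QUEUE)
				queue[tail++] = c->parents[i];
	}
	return marked;
}

/* History: tip has parents a and b; a is unparsed; b has parent c. */
static size_t toy_demo(int stop_with_return)
{
	struct toy_commit c = { 0, 1, { NULL, NULL } };
	struct toy_commit b = { 0, 1, { &c, NULL } };
	struct toy_commit a = { 0, 0, { NULL, NULL } };
	struct toy_commit tip = { 0, 1, { &a, &b } };

	return toy_walk(&tip, stop_with_return);
}
```

With "return", toy_demo(1) stops at the unparsed commit and marks only 2
of the 4 commits, leaving b and c untouched even though they were
reachable through another parent. With "continue", toy_demo(0) marks all
4. In a recursive version the same "return" only ended one branch, which
is why the translation changes behavior. An early return would also skip
any cleanup placed after the loop.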

Thanks.
-Han Xin
