From: zhuangel570 <zhuangel570@gmail.com>
To: Neeraj upadhyay <neeraj.iitr10@gmail.com>
Cc: Joel Fernandes <joel@joelfernandes.org>,
	paulmck@kernel.org, rcu@vger.kernel.org, josh@joshtriplett.org,
	rostedt@goodmis.org, mathieu.desnoyers@efficios.com,
	jiangshanlai@gmail.com, like.xu.linux@gmail.com,
	linussli@tencent.com, foxywang@tencent.com
Subject: Re: SRCU: kworker hung in synchronize_srcu
Date: Sat, 7 Oct 2023 17:13:47 +0800
Message-ID: <CANZk6aSHU++f6GN7bPhxfbXEySP8TkHSmrvcmzeL4AhP+Ra4sQ@mail.gmail.com>
In-Reply-To: <CAFwiDX8FVzo3M9LORPbMK2XST8LqyoGO_YAo+kJekBO80m3PJg@mail.gmail.com>

On Sun, Oct 1, 2023 at 10:27 AM Neeraj upadhyay <neeraj.iitr10@gmail.com> wrote:
>
> On Sun, Oct 1, 2023 at 5:49 AM Joel Fernandes <joel@joelfernandes.org> wrote:
> >
> > On Sat, Sep 30, 2023 at 6:01 AM Neeraj upadhyay <neeraj.iitr10@gmail.com> wrote:
> > >
> > > On Fri, Sep 29, 2023 at 3:35 AM Joel Fernandes <joel@joelfernandes.org> wrote:
> > > >
> > > > Hello,
> > > > Firstly, kudos for the detailed report and analysis. Rare failures are
> > > > hard to debug, and your use of crash/kdump to dig deeper into the
> > > > issue is awesome.
> > > >
> > > > On Thu, Sep 28, 2023 at 3:59 AM zhuangel570 <zhuangel570@gmail.com> wrote:
> > > > >
> > > > > Hi,
> > > > >
> > > > > We encountered an SRCU hang in stable tree 5.4.203 while running a
> > > > > concurrent VM create/destroy test; the issue appeared after several weeks.
> > > > > We do not yet have a way to reproduce it: it happens randomly, and this
> > > > > is the second time we have hit it this year.
> > > > >
> > > > >
> > > > >
> > > > > CASE1: the entry of CPU 136 belonging to GP 288 was accelerated to GP 292
> > > > > - [CPU 136] [GP 280-284] finished, but srcu_invoke_callbacks has not yet run.
> > > > > - [CPU 136] [GP 284-288] starting: a new synchronize_srcu request queues an
> > > > >   entry to the SDP.
> > > > > - [CPU 041] [GP 284-288] starting: a new synchronize_srcu request; this one
> > > > >   runs faster than CPU 136, starts the GP, and sets rcu_seq_start.
> > > > > - [CPU 136] [GP 284-288] starting: calls srcu_funnel_gp_start and finds no
> > > > >   need to start a GP.
> > > > > - [CPU 136] [GP 280-284] finished: srcu_invoke_callbacks finally runs and
> > > > >   "accelerates" the seq of the newly added entry to 292 (it should be 288).
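
For reference, a minimal userspace sketch of the GP sequence arithmetic,
using the rcu_seq helpers as I read them in kernel/rcu/rcu.h around v5.4
(an illustration under those assumptions, not the exact kernel code). It
reproduces how an entry snapped while GP 284-288 is idle gets 288, but
gets 292 once CPU 041 has called rcu_seq_start:

#include <stdio.h>

/* Assumed v5.4-era constants: the low 2 bits of the seq hold GP state. */
#define RCU_SEQ_CTR_SHIFT  2
#define RCU_SEQ_STATE_MASK ((1UL << RCU_SEQ_CTR_SHIFT) - 1)

/* Return the seq at which a full GP starting now would be complete. */
static unsigned long rcu_seq_snap(unsigned long seq)
{
	return (seq + 2 * RCU_SEQ_STATE_MASK + 1) & ~RCU_SEQ_STATE_MASK;
}

int main(void)
{
	unsigned long seq = 284;	/* GP 280-284 has ended, GP idle */

	printf("snap(%lu) = %lu\n", seq, rcu_seq_snap(seq));	/* 288 */
	seq++;				/* CPU 041: rcu_seq_start, GP 284-288 runs */
	printf("snap(%lu) = %lu\n", seq, rcu_seq_snap(seq));	/* 292 */
	return 0;
}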
> > > >
> > > > But srcu_gp_seq is at 304 right now. How does it matter that the CB is
> > > > marked for 292? It should be ready to execute anyway even at 292. Note
> > > > the meaning of "acceleration": the idea is to start conservatively and
> > > > move the callbacks forward as more accurate information becomes available.
> > > > Considering this, 292 initially should be OK IMHO (that's just more
> > > > conservative than 288).
> > > >
> > >
> > > Maybe it matters: for a CPU, the callbacks will only be scheduled
> > > in srcu_gp_end() for the GPs for which it has updated ->srcu_data_have_cbs[idx]
> > > and ->srcu_have_cbs[idx].
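
For context, this is how I read the relevant part of srcu_gp_end() in
kernel/rcu/srcutree.c (v5.4), condensed and paraphrased from memory, so
treat it as a sketch rather than verbatim kernel code. Callbacks are only
scheduled for the sdp's whose bits are set in srcu_data_have_cbs[idx],
and only when srcu_have_cbs[idx] matches the GP that just ended:

/* Paraphrased excerpt of srcu_gp_end(), v5.4; unrelated logic dropped. */
rcu_seq_end(&ssp->srcu_gp_seq);
gpseq = rcu_seq_current(&ssp->srcu_gp_seq);
rcu_for_each_node_breadth_first(ssp, snp) {
	...
	cbs = false;
	if (last_lvl)
		cbs = snp->srcu_have_cbs[idx] == gpseq;	/* only this GP */
	snp->srcu_have_cbs[idx] = gpseq;
	mask = snp->srcu_data_have_cbs[idx];
	snp->srcu_data_have_cbs[idx] = 0;
	...
	if (cbs)
		srcu_schedule_cbs_snp(ssp, snp, mask, cbs_delay);
}

So a callback tagged for a GP that never recorded itself in these fields
has no srcu_gp_end() invocation that will schedule it.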
> >
> > Right, but if I am looking at the code correctly, nothing guarantees
> > that srcu_invoke_callbacks() is called before srcu_gp_seq can advance.
> > So all callbacks that were previously queued for older grace periods
> > should be run whenever srcu_invoke_callbacks() eventually runs. That's
> > why I was thinking that part was normal (segments having older GP
> > numbers).
> >
> > > > >
> > > > > /*
> > > > >  * CASE2
> > > > >  * - entry of CPU 136 belongs to GP 288 was accelerated to GP 296.
> > > > >  * - GP0: 280-284, GP1: 284-288, GP2: 288-292.
> > > > >  */
> > > > >
> > > > > /* [GP0][CPU-136] */
> > > > > process_srcu {
> > > > >  srcu_gp_end
> > > > > }
> > > > >
> > > > >                   /* [GP1][CPU-136] */
> > > > >                   synchronize_srcu {
> > > > >                    __call_srcu {
> > > > >                     rcu_segcblist_enqueue
> > > > >                                            /* [GP1][CPU-041] */
> > > > >                                            synchronize_srcu {
> > > > >                                             __call_srcu {
> > > > >                                              srcu_funnel_gp_start
> > > > >                                               srcu_gp_start
> > > > >                                             }
> > > > >                                            }
> > > > >                                            process_srcu {
> > > > >                                             srcu_gp_end
> > > > >                                              rcu_seq_end
> > > > >                                            }
> > > > >                   /* [GP1][CPU-136] */
> > > > >                     srcu_funnel_gp_start
> > > > >                    }
> > > > >                   }
> > > > > /* [GP0][CPU-136] */
> > > > > srcu_invoke_callbacks {
> > > >
> > > > If srcu_invoke_callbacks() was really called for the rdp, I would have
> > > > expected rcu_segcblist_advance() to advance all those pending
> > > > callbacks to 304.
> > >
> > > If the callback is in the NEXT tail and has not been assigned a GP num,
> > > rcu_segcblist_advance() won't move it, and the next accelerate in
> > > srcu_invoke_callbacks() will assign it the next GP sequence num.
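
That matches my reading of rcu_segcblist_advance() in
kernel/rcu/rcu_segcblist.c (v5.4): the loop below (paraphrased, not
verbatim) only walks the segments that already have a GP num assigned
(WAIT and NEXT_READY); the NEXT tail is never touched:

/* Paraphrased excerpt of rcu_segcblist_advance(), v5.4. */
for (i = RCU_WAIT_TAIL; i < RCU_NEXT_TAIL; i++) {
	if (ULONG_CMP_LT(seq, rsclp->gp_seq[i]))
		break;			/* this segment's GP not done yet */
	rsclp->tails[RCU_DONE_TAIL] = rsclp->tails[i];
}

So a callback sitting in the NEXT tail can only get a GP num from a
later rcu_segcblist_accelerate() call.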
> >
> > Sure, and after that it will again call srcu_schedule_cbs_sdp(), so
> > that should be fine; the next workqueue invocation of
> > srcu_invoke_callbacks() can advance it at that time. Right?
> >
>
> But "more" only checks for CBs in DONE tail. The callbacks which have been just
> accelerated are not advanced to DONE tail.
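
Right; for reference, this is the tail of srcu_invoke_callbacks() as I
read it in v5.4 (condensed and paraphrased): "more" comes from
rcu_segcblist_ready_cbs(), which looks only at the DONE segment, so
freshly accelerated callbacks do not cause a reschedule here:

/* Paraphrased tail of srcu_invoke_callbacks(), v5.4. */
(void)rcu_segcblist_accelerate(&sdp->srcu_cblist,
			       rcu_seq_snap(&ssp->srcu_gp_seq));
sdp->srcu_cblist_invoking = false;
more = rcu_segcblist_ready_cbs(&sdp->srcu_cblist); /* DONE tail only */
spin_unlock_irq_rcu_node(sdp);
if (more)
	srcu_schedule_cbs_sdp(sdp, 0);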
>
> Having said the above, I am still trying to figure out which callbacks
> are actually being pointed to by the NEXT tail. Given that __call_srcu()
> already does an advance and accelerate, all enqueued callbacks would be
> in either the WAIT tail (the callbacks for the current active GP) or the
> NEXT_READY tail (the callbacks for the next GP after the current one
> completes). So they should already have a GP num assigned, and
> srcu_invoke_callbacks() won't accelerate them. The only case I can think
> of is: if the current GP completes after we sample
> rcu_seq_current(&ssp->srcu_gp_seq) for rcu_segcblist_advance() (so the
> WAIT tail cbs are not moved to the DONE tail), and a new GP is started
> before we take the snapshot ('s') of the next GP for
> rcu_segcblist_accelerate(), then the GP num 's' is greater than the GP
> num of the NEXT_READY tail, and the callbacks will be put in the NEXT
> tail. Not sure if my understanding is correct here. Thoughts?
>
> __call_srcu()
>         rcu_segcblist_advance(&sdp->srcu_cblist,
>                               rcu_seq_current(&ssp->srcu_gp_seq));
>         s = rcu_seq_snap(&ssp->srcu_gp_seq);
>         (void)rcu_segcblist_accelerate(&sdp->srcu_cblist, s);
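
To convince myself, a small userspace sketch of that window, reusing the
v5.4-era rcu_seq arithmetic (an illustration under those assumptions, not
kernel code). The advance sees the old GP still running, while the
accelerate snapshot already reflects the next GP:

#include <stdio.h>

#define RCU_SEQ_CTR_SHIFT  2
#define RCU_SEQ_STATE_MASK ((1UL << RCU_SEQ_CTR_SHIFT) - 1)

static unsigned long rcu_seq_snap(unsigned long seq)
{
	return (seq + 2 * RCU_SEQ_STATE_MASK + 1) & ~RCU_SEQ_STATE_MASK;
}

int main(void)
{
	unsigned long cur, s;

	cur = 285;		/* rcu_seq_current(): GP 284-288 in progress */
	/* ... GP 284-288 ends (seq 288), GP 288-292 starts (seq 289) ... */
	s = rcu_seq_snap(289);	/* snapshot for rcu_segcblist_accelerate() */

	/* advance saw 285, so WAIT cbs tagged 288 are not moved to DONE;
	 * accelerate gets s = 296, past NEXT_READY's 292, so the cbs are
	 * left in the NEXT tail without a scheduled GP. */
	printf("advance saw %lu, accelerate snapshot s = %lu\n", cur, s);
	return 0;
}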

Thanks for helping to confirm this!

What I want to explain with my CASE (which may not be totally correct; I
"created" it based on the context of the hang) is that rcu_segcblist_advance
and rcu_segcblist_accelerate in srcu_invoke_callbacks may be using mismatched
GP seqs. This makes the callbacks "accelerate" to the wrong GP, one which has
no CBS event recorded on that SNP (CPU group), so the callbacks are leaked in
the segcblist and __synchronize_srcu hangs.
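
That is why the hang is permanent: as I read __synchronize_srcu() in v5.4
(condensed and paraphrased below), the caller blocks on a completion that
is only ever fired by the queued callback, so a leaked callback means the
wakeup never comes:

/* Paraphrased excerpt of __synchronize_srcu(), v5.4. */
init_completion(&rcu.completion);
init_rcu_head_on_stack(&rcu.head);
__call_srcu(ssp, &rcu.head, wakeme_after_rcu, do_norm);
wait_for_completion(&rcu.completion);	/* never returns if the cb leaks */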

>
> Thanks
> Neeraj
>
>
> > if (more) {
> >   srcu_schedule_cbs_sdp(sdp, 0);
> > }
> >
> > thanks,
> >
> >  - Joel
