From: Joel Fernandes <joel@joelfernandes.org>
To: paulmck@kernel.org
Cc: Frederic Weisbecker <frederic@kernel.org>, rcu@vger.kernel.org
Subject: Re: [BUG] Random intermittent boost failures (Was Re: [BUG] TREE04..)
Date: Mon, 11 Sep 2023 12:18:16 -0400
Message-ID: <CAEXW_YSoy38WOxySWn=n_bx=T9MdBsRTx2myuRohm2h70ac9Gg@mail.gmail.com>
In-Reply-To: <8abef7d3-db8f-4a18-a72d-d23c1adb310d@paulmck-laptop>

On Mon, Sep 11, 2023 at 9:49 AM Paul E. McKenney <paulmck@kernel.org> wrote:
>
> On Mon, Sep 11, 2023 at 01:17:30PM +0000, Joel Fernandes wrote:
> > On Mon, Sep 11, 2023 at 01:16:21AM -0700, Paul E. McKenney wrote:
> > > On Mon, Sep 11, 2023 at 02:27:25AM +0000, Joel Fernandes wrote:
> > > > On Sun, Sep 10, 2023 at 07:37:13PM -0400, Joel Fernandes wrote:
> > > > > On Sun, Sep 10, 2023 at 5:16 PM Paul E. McKenney <paulmck@kernel.org> wrote:
> > > > > >
> > > > > > On Sun, Sep 10, 2023 at 08:14:45PM +0000, Joel Fernandes wrote:
> > > > > [...]
> > > > > > > >  I have been running into another intermittent one as well which
> > > > > > > > is the boost failure and that happens once in 10-15 runs or so.
> > > > > > > >
> > > > > > > > I was thinking of running the following configuration on an automated
> > > > > > > > regular basis to at least provide a better clue on the lucky run that
> > > > > > > > catches an issue. But then the issue is it would change timing enough
> > > > > > > > to maybe hide bugs. I could also make it submit logs automatically to
> > > > > > > > the list on such occurrences, but one step at a time and all that.  I
> > > > > > > > do need to add (hopefully less noisy) tick/timer related trace events.
> > > > > > > >
> > > > > > > > # Define the bootargs array
> > > > > > > > bootargs=(
[...]
> > > > > > > > So some insight on this boost failure. Just before the boost failures are
> > > > > > > > reported, I see the migration thread interfering with the rcu_preempt thread
> > > > > > > > (aka GP kthread). See trace below. Of note is that the rcu_preempt thread is
> > > > > > > > runnable while context switching, which means its execution is being interfered with.
> > > > > > > > The rcu_preempt thread is at RT prio 2 as can be seen.
> > > > > > >
> > > > > > > > So some open-ended questions: what exactly does the migration thread want?
> > > > > > > > Is this something related to CPU hotplug? And if the migration thread had to
> > > > > > > run, why did the rcu_preempt thread not get pushed to another CPU by the
> > > > > > > scheduler? We have 16 vCPUs for this test.
> > > > > >
> > > > > > Maybe we need a cpus_read_lock() before doing a given boost-test interval
> > > > > > and a cpus_read_unlock() after finishing one?  But much depends on
> > > > > > exactly what is starting those migration threads.
> > > > >
> > > > > But in the field, a real RT task can preempt a reader without doing
> > > > > cpus_read_lock() and may run into a similar boost issue?
> > >
> > > The sysctl_sched_rt_runtime should prevent a livelock in most
> > > configurations.  Here, rcutorture explicitly disables this.
> >
> > I see. Though RT throttling will actually stall the rcu_preempt thread as
> > well in the real world. RT throttling is a bit broken and we're trying to fix
> > it in scheduler land. Even if there are idle CPUs, RT throttling will starve
> > not just the offending RT task, but all of them essentially causing a
> > priority inversion between running RT and CFS tasks.
>
> Fair point.  But that requires that the offending runaway RT task hit both
> a reader and the grace-period kthread.  Keeping in mind that rcutorture
> is provisioning one runaway RT task per CPU, which in the real world is
> hopefully quite rare.  Hopefully.  ;-)

You are right, I exaggerated a bit. Indeed, in the real world, RT
throttling can cause a priority inversion with CFS only if all other
CPUs are also RT throttled. Otherwise the scheduler tries to migrate
the RT task to another CPU. That's a great point.
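
For reference, a rough sketch (not kernel code) of how the RT throttling
budget discussed above follows from the two sysctls. The helper name
rt_bandwidth_fraction is made up for this example. Its inputs are assumed
to be the values of /proc/sys/kernel/sched_rt_runtime_us and
/proc/sys/kernel/sched_rt_period_us, where a runtime of -1 means throttling
is disabled, which is the state rcutorture puts the system in.

```python
from typing import Optional


def rt_bandwidth_fraction(runtime_us: int, period_us: int) -> Optional[float]:
    """Return the share of each period that RT tasks may consume.

    Hypothetical helper for illustration only. Returns None when
    runtime_us is negative (-1 disables RT throttling entirely).
    """
    if period_us <= 0:
        raise ValueError("period_us must be positive")
    if runtime_us < 0:
        # Throttling disabled: a runaway RT task is never forced off the CPU.
        return None
    return runtime_us / period_us


if __name__ == "__main__":
    # Common defaults: 950000 us of RT runtime per 1000000 us period.
    print(rt_bandwidth_fraction(950_000, 1_000_000))
    # Throttling disabled, as in the rcutorture boost test.
    print(rt_bandwidth_fraction(-1, 1_000_000))
```

With throttling enabled, the remainder of each period is left for non-RT
tasks, which is the window that normally prevents the livelock. With
throttling disabled there is no such window, so the boost test depends
entirely on priorities and migration.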

> Sounds like good progress!  Please let me know how it goes!!!

Thanks! Will do,

 - Joel
