From: Qais Yousef <qyousef@layalina.io>
To: Vincent Guittot <vincent.guittot@linaro.org>
Cc: Wyes Karny <wkarny@gmail.com>,
	Linus Torvalds <torvalds@linux-foundation.org>,
	Ingo Molnar <mingo@kernel.org>,
	linux-kernel@vger.kernel.org,
	Peter Zijlstra <peterz@infradead.org>,
	Thomas Gleixner <tglx@linutronix.de>,
	Juri Lelli <juri.lelli@redhat.com>,
	Dietmar Eggemann <dietmar.eggemann@arm.com>,
	Steven Rostedt <rostedt@goodmis.org>,
	Ben Segall <bsegall@google.com>, Mel Gorman <mgorman@suse.de>,
	Daniel Bristot de Oliveira <bristot@redhat.com>,
	Valentin Schneider <vschneid@redhat.com>
Subject: Re: [GIT PULL] Scheduler changes for v6.8
Date: Sun, 14 Jan 2024 19:58:15 +0000
Message-ID: <20240114195815.nes4bn53tc25djbh@airbuntu>
In-Reply-To: <CAKfTPtAMxiTbvAYav1JQw+MhjzDPCZDrMLL2JOfsc0GWp+FnOA@mail.gmail.com>

[-- Attachment #1: Type: text/plain, Size: 4877 bytes --]

On 01/14/24 16:20, Vincent Guittot wrote:
> On Sun, 14 Jan 2024 at 16:12, Qais Yousef <qyousef@layalina.io> wrote:
> >
> > On 01/14/24 14:03, Vincent Guittot wrote:
> >
> > > Thanks for the trace. It was really helpful and I think that I got the
> > > root cause.
> > >
> > > The problem comes from get_capacity_ref_freq() which returns current
> > > freq when arch_scale_freq_invariant() is not enabled, and the fact that
> > > we apply map_util_perf() earlier in the path now which is then capped
> > > by max capacity.
> > >
> > > Could you try the below ?
> > >
> > > diff --git a/kernel/sched/cpufreq_schedutil.c b/kernel/sched/cpufreq_schedutil.c
> > > index e420e2ee1a10..611c621543f4 100644
> > > --- a/kernel/sched/cpufreq_schedutil.c
> > > +++ b/kernel/sched/cpufreq_schedutil.c
> > > @@ -133,7 +133,7 @@ unsigned long get_capacity_ref_freq(struct cpufreq_policy *policy)
> > >         if (arch_scale_freq_invariant())
> > >                 return policy->cpuinfo.max_freq;
> > >
> > > -       return policy->cur;
> > > +       return policy->cur + (policy->cur >> 2);
> > >  }
> > >
> > >  /**
> >
> > Is this a test patch or a proper fix? I can't see it being the latter. It seems
> 
> It's a proper fix. It's the same mechanism that is used already:
>  - Either you add margin on the utilization to go above current freq
> before it is fully used. This is what was done previously
>  - or you add margin on the freq range to select a higher freq than
> current one before it becomes fully used

Aren't we applying the 25% headroom twice then?

> 
> > the current logic fails when util is already 1024, and I think we're trying to
> > fix the invariance issue too late.
> >
> > Is the problem that we can't read policy->cur in the scheduler to fix the util
> > while it's being updated, and that's why it's done here instead?
> >
> > If this is the problem, shouldn't the logic be: if util is max, always go to
> > max frequency? I don't think we have enough info to correct the invariance here
> > IIUC. All we can see is that the system is saturated at this frequency;
> > whether a small jump or a big jump is required is hard to tell.
> >
> > Something like this
> >
> > diff --git a/kernel/sched/cpufreq_schedutil.c b/kernel/sched/cpufreq_schedutil.c
> > index 95c3c097083e..473d0352030b 100644
> > --- a/kernel/sched/cpufreq_schedutil.c
> > +++ b/kernel/sched/cpufreq_schedutil.c
> > @@ -164,8 +164,12 @@ static unsigned int get_next_freq(struct sugov_policy *sg_policy,
> >         struct cpufreq_policy *policy = sg_policy->policy;
> >         unsigned int freq;
> >
> > -       freq = get_capacity_ref_freq(policy);
> > -       freq = map_util_freq(util, freq, max);
> > +       if (util != max) {
> > +               freq = get_capacity_ref_freq(policy);
> > +               freq = map_util_freq(util, freq, max);
> > +       } else {
> > +               freq = policy->cpuinfo.max_freq;
> > +       }
> 
> This is not correct because you will have to wait to reach full
> utilization at the current OPP, possibly the lowest OPP, before moving
> directly to max OPP.

Isn't this already the case? The ratio (util + headroom)/max will be less than
1 until util reaches 80% of max (with 25% headroom). And for all values
<= 80% * max, we will request a frequency smaller than or equal to
policy->cur, no?

ie:

	util = 600
	max = 1024

	freq = 1.25 * 600 * policy->cur / 1024 = 0.73 * policy->cur

(util + headroom)/max must be greater than 1 for us to start going above
policy->cur, which seems to have been working by accident IIUC.

So yes, my proposal is incorrect, but the current conversion doesn't seem right
to me either.

I could reproduce the problem now (thanks, Wyes!). I have 3 frequencies on my system:

2.2GHz, 2.8GHz and 3.8GHz

which (I believe) translates into capacities

~592, ~754, 1024

which means we should pick 2.8GHz as soon as util * 1.25 > 592, which
translates into util > ~473.

But what I see is that we only go to 2.8GHz when util jumps from 650 to 680 (see
attached picture). That is what you'd expect since we now apply the headroom
twice, which means util * 1.25 * 1.25 / max only becomes greater than 1 once
util goes above this value

	1024 * 0.8 * 0.8 = ~655

So I think the math makes sense logically, but we're missing some other
correction factor.

When I re-enable CPPC, the same test goes to 3.8GHz straight away. My test is
a simple busy loop via

	cat /dev/zero > /dev/null

I see the CPU util_avg at 523 at fork. To be honest, I expected us to run at
2.8GHz here, but I am not sure whether util_cfs_boost() and util_est() are
pushing us slightly above 523, which could be why we start with max freq.

Or I've done the math wrong :-) But the two don't behave the same for the same
kernel with and without CPPC.

> 
> >
> >         if (freq == sg_policy->cached_raw_freq && !sg_policy->need_freq_update)
> >                 return sg_policy->next_freq;

[-- Attachment #2: cppc_freq_fix.png --]
[-- Type: image/png, Size: 27789 bytes --]


Thread overview: 54+ messages
2023-10-28 12:23 [GIT PULL] Scheduler changes for v6.7 Ingo Molnar
2023-10-30 23:50 ` pr-tracker-bot
2024-01-08 14:07 ` [GIT PULL] Scheduler changes for v6.8 Ingo Molnar
2024-01-09  4:04   ` pr-tracker-bot
2024-01-10 22:19   ` Linus Torvalds
2024-01-10 22:41     ` Linus Torvalds
2024-01-10 22:57       ` Linus Torvalds
2024-01-11  8:11         ` Vincent Guittot
2024-01-11 17:45           ` Linus Torvalds
2024-01-11 17:53             ` Linus Torvalds
2024-01-11 18:16               ` Vincent Guittot
2024-01-12 14:23                 ` Dietmar Eggemann
2024-01-12 16:58                   ` Vincent Guittot
2024-01-12 18:18                   ` Qais Yousef
2024-01-12 19:03                     ` Vincent Guittot
2024-01-12 20:30                       ` Linus Torvalds
2024-01-12 20:49                         ` Linus Torvalds
2024-01-12 21:04                           ` Linus Torvalds
2024-01-13  1:04                             ` Qais Yousef
2024-01-13  1:24                               ` Linus Torvalds
2024-01-13  1:31                                 ` Linus Torvalds
2024-01-13 10:47                                   ` Vincent Guittot
2024-01-13 18:33                                     ` Qais Yousef
2024-01-13 18:37                                 ` Qais Yousef
2024-01-11 11:09         ` [GIT PULL] scheduler fixes Ingo Molnar
2024-01-11 13:04           ` Vincent Guittot
2024-01-11 20:48             ` [PATCH] Revert "sched/cpufreq: Rework schedutil governor performance estimation" and dependent commit Ingo Molnar
2024-01-11 22:22               ` Vincent Guittot
2024-01-12 18:24               ` Ingo Molnar
2024-01-12 18:26         ` [GIT PULL] Scheduler changes for v6.8 Ingo Molnar
2024-01-14  9:12         ` Wyes Karny
2024-01-14 11:18           ` Vincent Guittot
2024-01-14 12:37             ` Wyes Karny
2024-01-14 13:02               ` Dietmar Eggemann
2024-01-14 13:05                 ` Vincent Guittot
2024-01-14 13:03               ` Vincent Guittot
2024-01-14 15:12                 ` Qais Yousef
2024-01-14 15:20                   ` Vincent Guittot
2024-01-14 19:58                     ` Qais Yousef [this message]
2024-01-14 23:37                       ` Qais Yousef
2024-01-15  6:25                         ` Wyes Karny
2024-01-15 11:59                           ` Qais Yousef
2024-01-15  8:21                       ` Vincent Guittot
2024-01-15 12:09                         ` Qais Yousef
2024-01-15 13:26                           ` Vincent Guittot
2024-01-15 14:03                             ` Dietmar Eggemann
2024-01-15 15:26                               ` Vincent Guittot
2024-01-15 20:05                                 ` Dietmar Eggemann
2024-01-15  8:42                       ` David Laight
2024-01-14 18:11                 ` Wyes Karny
2024-01-14 18:18                   ` Vincent Guittot
2024-01-11  9:33     ` Ingo Molnar
2024-01-11 11:14     ` [tip: sched/urgent] Revert "sched/cpufreq: Rework schedutil governor performance estimation" and dependent commits tip-bot2 for Ingo Molnar
2024-01-11 20:55     ` [tip: sched/urgent] Revert "sched/cpufreq: Rework schedutil governor performance estimation" and dependent commit tip-bot2 for Ingo Molnar
