From: Vincent Guittot <vincent.guittot@linaro.org>
To: Wyes Karny <wkarny@gmail.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>,
	Qais Yousef <qyousef@layalina.io>, Ingo Molnar <mingo@kernel.org>,
	linux-kernel@vger.kernel.org,
	Peter Zijlstra <peterz@infradead.org>,
	Thomas Gleixner <tglx@linutronix.de>,
	Juri Lelli <juri.lelli@redhat.com>,
	Dietmar Eggemann <dietmar.eggemann@arm.com>,
	Steven Rostedt <rostedt@goodmis.org>,
	Ben Segall <bsegall@google.com>, Mel Gorman <mgorman@suse.de>,
	Daniel Bristot de Oliveira <bristot@redhat.com>,
	Valentin Schneider <vschneid@redhat.com>
Subject: Re: [GIT PULL] Scheduler changes for v6.8
Date: Sun, 14 Jan 2024 12:18:06 +0100
Message-ID: <ZaPC7o44lEswxOXp@vingu-book>
In-Reply-To: <20240114091240.xzdvqk75ifgfj5yx@wyes-pc>

Hi Wyes,

On Sunday, 14 Jan 2024 at 14:42:40 (+0530), Wyes Karny wrote:
> On Wed, Jan 10, 2024 at 02:57:14PM -0800, Linus Torvalds wrote:
> > On Wed, 10 Jan 2024 at 14:41, Linus Torvalds
> > <torvalds@linux-foundation.org> wrote:
> > >
> > > It's one of these two:
> > >
> > >   f12560779f9d sched/cpufreq: Rework iowait boost
> > >   9c0b4bb7f630 sched/cpufreq: Rework schedutil governor performance estimation
> > >
> > > one more boot to go, then I'll try to revert whichever causes my
> > > machine to perform horribly much worse.
> > 
> > I guess it should come as no surprise that the result is
> > 
> >    9c0b4bb7f6303c9c4e2e34984c46f5a86478f84d is the first bad commit
> > 
> > but to revert cleanly I will have to revert all of
> > 
> >       b3edde44e5d4 ("cpufreq/schedutil: Use a fixed reference frequency")
> >       f12560779f9d ("sched/cpufreq: Rework iowait boost")
> >       9c0b4bb7f630 ("sched/cpufreq: Rework schedutil governor
> > performance estimation")
> > 
> > This is on a 32-core (64-thread) AMD Ryzen Threadripper 3970X, fwiw.
> > 
> > I'll keep that revert in my private test-tree for now (so that I have
> > a working machine again), but I'll move it to my main branch soon
> > unless somebody has a quick fix for this problem.
> 
> Hi Linus,
> 
> I'm able to reproduce this issue with my AMD Ryzen 5600G system.  But
> only if I disable CPPC in BIOS and boot with acpi-cpufreq + schedutil.
> (I believe CPPC is also disabled in your case, since the log "_CPC object
> is not present" appeared.) With CPPC enabled in the BIOS, the issue is not
> seen on my system. On AMD, acpi-cpufreq also uses the _CPC object to
> determine the boost ratio. When CPPC is disabled in the BIOS, something
> goes wrong and max capacity becomes zero.
> 
> Hi Vincent, Qais,
> 
> I have collected some data with bpftrace:

Thanks for your test results.

> 
> sudo bpftrace -e 'kretprobe:effective_cpu_util /cpu == 1/ { @eff_util = lhist(retval, 0, 1200, 50);} kprobe:get_next_freq /cpu == 1/ { @sugov_eff_util = lhist(arg1, 0, 1200, 50); @sugov_max_cap = lhist(arg2, 0, 1000, 2);} kretprobe:get_next_freq /cpu == 1/ { @sugov_freq = lhist(retval, 1000000, 5000000, 100000);}'
> 
> with running: taskset -c 1 make
> 
> issue case:
> 
> Attaching 3 probes...
> @eff_util:
> [0, 50)             1263 |@                                                   |
> [50, 100)            517 |                                                    |
> [100, 150)           233 |                                                    |
> [150, 200)           297 |                                                    |
> [200, 250)           162 |                                                    |
> [250, 300)            98 |                                                    |
> [300, 350)            75 |                                                    |
> [350, 400)           205 |                                                    |
> [400, 450)           210 |                                                    |
> [450, 500)            16 |                                                    |
> [500, 550)          1532 |@                                                   |
> [550, 600)          1026 |                                                    |
> [600, 650)           761 |                                                    |
> [650, 700)           876 |                                                    |
> [700, 750)          1085 |                                                    |
> [750, 800)           891 |                                                    |
> [800, 850)           816 |                                                    |
> [850, 900)           983 |                                                    |
> [900, 950)           661 |                                                    |
> [950, 1000)          759 |                                                    |
> [1000, 1050)       57433 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@|
> 

OK, so the output of effective_cpu_util() seems correct, or at least it reaches
the max utilization value. For that to be the case, arch_scale_cpu_capacity(cpu)
cannot be zero, because of:

unsigned long effective_cpu_util(int cpu, unsigned long util_cfs,
				 unsigned long *min,
				 unsigned long *max)
{
	unsigned long util, irq, scale;
	struct rq *rq = cpu_rq(cpu);

	scale = arch_scale_cpu_capacity(cpu);

	/*
	 * Early check to see if IRQ/steal time saturates the CPU, can be
	 * because of inaccuracies in how we track these -- see
	 * update_irq_load_avg().
	 */
	irq = cpu_util_irq(rq);
	if (unlikely(irq >= scale)) {
		if (min)
			*min = scale;
		if (max)
			*max = scale;
		return scale;
	}
...
}

If arch_scale_cpu_capacity(cpu) returned 0, then effective_cpu_util() would
return 0 too.
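
As a minimal user-space sketch of that reasoning (not the kernel code itself):
model_early_check() and its parameters are hypothetical names, with 'scale'
standing in for arch_scale_cpu_capacity(cpu) and 'irq' for cpu_util_irq(rq).
Only the early return is modeled. Since both values are unsigned, irq >= 0 is
always true, so a zero capacity would always take the early return and yield 0.

```c
/*
 * Hypothetical model of the early IRQ check in effective_cpu_util().
 * Returns 1 and stores 'scale' in *out when the early return would be
 * taken, 0 otherwise (the remainder of the function is not modeled and
 * *out is left untouched).
 */
static int model_early_check(unsigned long scale, unsigned long irq,
			     unsigned long *out)
{
	if (irq >= scale) {
		*out = scale;
		return 1;
	}
	return 0;
}
```

Calling model_early_check(0, irq, &out) takes the early return for any irq and
sets out to 0, which would not match the histogram above, where most samples sit
in the [1000, 1050) bucket.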

Now see below

> @sugov_eff_util:
> [0, 50)             1074 |                                                    |
> [50, 100)            571 |                                                    |
> [100, 150)           259 |                                                    |
> [150, 200)           169 |                                                    |
> [200, 250)           237 |                                                    |
> [250, 300)           156 |                                                    |
> [300, 350)            91 |                                                    |
> [350, 400)            46 |                                                    |
> [400, 450)            52 |                                                    |
> [450, 500)           195 |                                                    |
> [500, 550)           175 |                                                    |
> [550, 600)            46 |                                                    |
> [600, 650)           493 |                                                    |
> [650, 700)          1424 |@                                                   |
> [700, 750)           646 |                                                    |
> [750, 800)           628 |                                                    |
> [800, 850)           612 |                                                    |
> [850, 900)           840 |                                                    |
> [900, 950)           893 |                                                    |
> [950, 1000)          640 |                                                    |
> [1000, 1050)       60679 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@|
> 
> @sugov_freq:
> [1400000, 1500000)   69911 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@|
> 
> @sugov_max_cap:
> [0, 2)             69926 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@|

In get_next_freq(struct sugov_policy *sg_policy, unsigned long util, unsigned long max),
max is 0, and we come from this path:

static void sugov_update_single_freq(struct update_util_data *hook, u64 time,
				     unsigned int flags)
{

...
	max_cap = arch_scale_cpu_capacity(sg_cpu->cpu);

	if (!sugov_update_single_common(sg_cpu, time, max_cap, flags))
		return;

	next_f = get_next_freq(sg_policy, sg_cpu->util, max_cap);
...

so here arch_scale_cpu_capacity(sg_cpu->cpu) returns 0 ...

AFAICT, the AMD platform uses the default:
static __always_inline
unsigned long arch_scale_cpu_capacity(int cpu)
{
	return SCHED_CAPACITY_SCALE;
}

I'm missing something here
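
For reference, a small sketch of where a value would land in a linear histogram
such as lhist(arg2, 0, 1000, 2). This is an assumption-laden model, not
bpftrace's implementation: model_lhist_bucket() is a hypothetical name, and the
sketch assumes that values below min go to one underflow bucket and values at or
above max go to one overflow bucket. Under that assumption, only the values 0
and 1 fall in [0, 2), while SCHED_CAPACITY_SCALE (1024) would land in the
overflow bucket.

```c
/*
 * Hypothetical model of linear-histogram bucketing for
 * lhist(v, min, max, step). Assumed layout:
 *   index 0      : v < min            (underflow)
 *   index 1..n   : [min + (i-1)*step, min + i*step)
 *   index n + 1  : v >= max           (overflow)
 * where n = ceil((max - min) / step).
 */
static long model_lhist_bucket(long v, long min, long max, long step)
{
	long n = (max - min + step - 1) / step;

	if (v < min)
		return 0;
	if (v >= max)
		return n + 1;
	return 1 + (v - min) / step;
}
```

With min = 0, max = 1000 and step = 2, the values 0 and 1 map to index 1 (the
[0, 2) bucket) and 1024 maps to index 501 (the overflow bucket).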

> 
> 
> good case:
> 
> Attaching 3 probes...
> @eff_util:
> [0, 50)              246 |@                                                   |
> [50, 100)            150 |@                                                   |
> [100, 150)           191 |@                                                   |
> [150, 200)           239 |@                                                   |
> [200, 250)           117 |                                                    |
> [250, 300)          2101 |@@@@@@@@@@@@@@@                                     |
> [300, 350)          2284 |@@@@@@@@@@@@@@@@                                    |
> [350, 400)           713 |@@@@@                                               |
> [400, 450)           151 |@                                                   |
> [450, 500)           154 |@                                                   |
> [500, 550)          1121 |@@@@@@@@                                            |
> [550, 600)          1901 |@@@@@@@@@@@@@                                       |
> [600, 650)          1208 |@@@@@@@@                                            |
> [650, 700)           606 |@@@@                                                |
> [700, 750)           557 |@@@                                                 |
> [750, 800)           872 |@@@@@@                                              |
> [800, 850)          1092 |@@@@@@@                                             |
> [850, 900)          1416 |@@@@@@@@@@                                          |
> [900, 950)          1107 |@@@@@@@                                             |
> [950, 1000)         1051 |@@@@@@@                                             |
> [1000, 1050)        7260 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@|
> 
> @sugov_eff_util:
> [0, 50)              241 |                                                    |
> [50, 100)            149 |                                                    |
> [100, 150)            72 |                                                    |
> [150, 200)            95 |                                                    |
> [200, 250)            43 |                                                    |
> [250, 300)            49 |                                                    |
> [300, 350)            19 |                                                    |
> [350, 400)            56 |                                                    |
> [400, 450)            22 |                                                    |
> [450, 500)            29 |                                                    |
> [500, 550)          1840 |@@@@@@                                              |
> [550, 600)          1476 |@@@@@                                               |
> [600, 650)          1027 |@@@                                                 |
> [650, 700)           473 |@                                                   |
> [700, 750)           366 |@                                                   |
> [750, 800)           627 |@@                                                  |
> [800, 850)           930 |@@@                                                 |
> [850, 900)          1285 |@@@@                                                |
> [900, 950)           971 |@@@                                                 |
> [950, 1000)          946 |@@@                                                 |
> [1000, 1050)       13839 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@|
> 
> @sugov_freq:
> [1400000, 1500000)     648 |@                                                   |
> [1500000, 1600000)       0 |                                                    |
> [1600000, 1700000)       0 |                                                    |
> [1700000, 1800000)      25 |                                                    |
> [1800000, 1900000)       0 |                                                    |
> [1900000, 2000000)       0 |                                                    |
> [2000000, 2100000)       0 |                                                    |
> [2100000, 2200000)       0 |                                                    |
> [2200000, 2300000)       0 |                                                    |
> [2300000, 2400000)       0 |                                                    |
> [2400000, 2500000)       0 |                                                    |
> [2500000, 2600000)       0 |                                                    |
> [2600000, 2700000)       0 |                                                    |
> [2700000, 2800000)       0 |                                                    |
> [2800000, 2900000)       0 |                                                    |
> [2900000, 3000000)       0 |                                                    |
> [3000000, 3100000)       0 |                                                    |
> [3100000, 3125K)       0 |                                                    |
> [3125K, 3300000)       0 |                                                    |
> [3300000, 3400000)       0 |                                                    |
> [3400000, 3500000)       0 |                                                    |
> [3500000, 3600000)       0 |                                                    |
> [3600000, 3700000)       0 |                                                    |
> [3700000, 3800000)       0 |                                                    |
> [3800000, 3900000)       0 |                                                    |
> [3900000, 4000000)   23879 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@|
> 
> @sugov_max_cap:
> [0, 2)             24555 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@|
> 
> In both cases max_cap is zero, but the selected freq is incorrect in the bad case.

Also we have in get_next_freq():
	freq = map_util_freq(util, freq, max);
	       --> util * freq /max

If max were 0, shouldn't we have hit an error (a division by zero)?
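
A minimal user-space sketch of that mapping, to show why a zero max is
suspicious. model_map_util_freq() is a hypothetical name and the explicit guard
is an addition for illustration only; the expression above has no such guard,
so a max of 0 would be an integer division by zero.

```c
#include <stdbool.h>

/*
 * Hypothetical model of the util * freq / max mapping.
 * Returns false instead of dividing when max is 0; otherwise stores the
 * mapped frequency in *out and returns true. 64-bit arithmetic is used so
 * that util * freq cannot overflow for the value ranges seen in the traces.
 */
static bool model_map_util_freq(unsigned long long util,
				unsigned long long freq,
				unsigned long long max,
				unsigned long long *out)
{
	if (max == 0)
		return false;	/* the unguarded form would divide by zero */

	*out = util * freq / max;
	return true;
}
```

For example, util = 512, freq = 4000000 and max = 1024 gives 2000000, while any
call with max = 0 is rejected by the guard.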

There is something strange that I don't understand

Could you trace the value of sg_cpu->util on return from sugov_get_util()?

Thanks for your help,
Vincent

> 
> Thanks,
> Wyes
> 

