From: Randy MacLeod <randy.macleod@windriver.com>
To: Richard Purdie <richard.purdie@linuxfoundation.org>,
	Martin Jansa <martin.jansa@gmail.com>
Cc: bitbake-devel@lists.openembedded.org, steve@sakoman.com,
	Chen Qi <Qi.Chen@windriver.com>
Subject: Re: [bitbake-devel] [2.0][PATCH 1/5] runqueue: fix PSI check calculation
Date: Thu, 15 Feb 2024 20:48:20 -0500
Message-ID: <38b07d03-a044-4cf5-b9a2-b993ef7ec320@windriver.com>
In-Reply-To: <47f538c24ddb16cd2301ee95de59459d32c05cbe.camel@linuxfoundation.org>


On 2024-02-15 6:14 p.m., Richard Purdie wrote:
> On Thu, 2024-02-15 at 18:06 -0500, Randy MacLeod wrote:
>> On 2023-11-10 9:06 a.m., Martin Jansa wrote:
>>
>>> I forgot to include [RFC] tag in the subject, because these changes 
>>> might be a bit controversial.
>>>
>>> From my testing:
>>> https://github.com/shr-project/test-oe-build-time/commit/d5111f4472ac397c0f1197eb6366ac7d2e56453f 
>>> the reasonable pressure values for 2.0 (in kirkstone) and 2.6 (in 
>>> nanbield) are significantly different due to these changes in PSI 
>>> calculation and logic so while 1000 was reasonable value for my 
>>> system in 2.0, I need 100000 with 2.6.
>>>
>>> I've backported these changes long time ago for my local builds to 
>>> use the same PSI values for all builds.
>>>
>>> And the controversy is that backporting these will change the 
>>> expected values (so kind of change in behavior which might not be 
>>> suitable for stable release). But the commits make it sound like a bug
>>> in the logic, so we can also consider them as bug fixes.
>>>
>>> The change in expected pressure values was reported in:
>>> https://lists.openembedded.org/g/bitbake-devel/message/14942 
>>>
>>> If it's not a bug to be fixed in 2.0 then maybe we should mention 
>>> different PSI values in release notes for nanbield (I know I'm too 
>>> late for that).
>>>
>>> Regards,
>>
>> So people were suggesting that the slow YP AB times might be due to 
>> PSI regulation.
>>
>> On a 24-core system, I built core-image-minimal with the original code
>> from before Chen's commit in August 2023:
>>
>> poky.git
>> $ git log --oneline -2 653ff4d85cbaf53627f7978b06c1f025ac4694e2
>> 653ff4d85c bitbake: runqueue.py: fix PSI check logic
>> ac5512b0ac bitbake: fetch2: add Google Cloud Platform (GCP) fetcher    <---- Old
>>
>>
>> and with current master (~Feb 2024):
>>
>> $ git log --oneline -2
>> 9382d731bd bash: nativesdk-bash does not provide /bin/bash so don't claim to <----- New
>> 0fe85ce0c6 busybox: Explicitly specify tty device for serial consoles
>>
>> Note that the source being built is different, and it would have been
>> better to change only the pressure regulation code, but this was a
>> weekend 'fun' project. Also, yes, I fetched the code before building
>> with the set of BB_PRESSURE_MAX_CPU values.
>>
>> There wasn't much of a difference as shown in the graph and data below.
>>
>> I could try the same thing for world builds, but I'm not sure that's
>> worthwhile.
>> I still plan to construct a test of PSI regulation using some -native
>> recipes that invoke stress-ng enough to cause sustained (CPU or IO)
>> pressure, and then test if a new task will ALWAYS be prevented from
>> running until the pressure subsides.
>>
>
> Thanks for checking that, it is helpful to rule things out. I think 
> the bigger question was whether we needed to adjust the pressure 
> values and whether the autobuilder is under-utilising.

Right. Did you want me to send a patch to disable pressure regulation
completely, or to increase the limit in the y-ab-helper config.json for
master-next?
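For readers following along: the limit being discussed is compared against a figure derived from the kernel's PSI file /proc/pressure/cpu (Linux 4.20+ with PSI enabled), whose `some` line ends in `total=`, the cumulative stall time in microseconds. The sketch below assumes the regulator works on the growth rate of that counter (stall microseconds per second of wall time). The helper names and the sample readings are hypothetical; this is not bitbake's runqueue code. Under that assumption, 1,000,000 would mean some task was stalled for the whole interval, so a limit of 1000 is roughly 0.1% and 100000 roughly 10%.

```python
# Sketch only: hypothetical helpers, not bitbake's runqueue implementation.
# Assumes "pressure" = growth of the PSI "total" counter (microseconds of
# stall) per second of wall-clock time.


def parse_psi_total(text: str, kind: str = "some") -> int:
    """Return the cumulative stall time in microseconds ('total=' field)
    from the 'some' or 'full' line of /proc/pressure/* content."""
    for line in text.splitlines():
        fields = line.split()
        if fields and fields[0] == kind:
            for field in fields[1:]:
                key, _, value = field.partition("=")
                if key == "total":
                    return int(value)
    raise ValueError(f"no '{kind} ... total=' field found")


def pressure_rate(prev_total: int, curr_total: int, elapsed_s: float) -> float:
    """Stall microseconds accumulated per second between two readings."""
    if elapsed_s <= 0:
        raise ValueError("elapsed time must be positive")
    return (curr_total - prev_total) / elapsed_s


def exceeds_limit(rate: float, max_pressure) -> bool:
    """Hypothetical gate: hold back new tasks while rate > max_pressure.
    A limit of 0 or None disables regulation in this sketch."""
    return bool(max_pressure) and rate > max_pressure


# Illustrative readings taken 1.0 s apart (made-up numbers, not build data).
SAMPLE_T0 = ("some avg10=0.50 avg60=0.40 avg300=0.30 total=1000000\n"
             "full avg10=0.00 avg60=0.00 avg300=0.00 total=0\n")
SAMPLE_T1 = ("some avg10=2.10 avg60=0.80 avg300=0.40 total=1025000\n"
             "full avg10=0.00 avg60=0.00 avg300=0.00 total=0\n")

if __name__ == "__main__":
    rate = pressure_rate(parse_psi_total(SAMPLE_T0),
                         parse_psi_total(SAMPLE_T1), 1.0)
    # 1,000,000 us/s would mean stalled 100% of the time, so % = rate / 1e4.
    print(f"{rate:.0f} us/s, i.e. {rate / 1e4:.2f}% of the interval")
    for limit in (1000, 10000, 100000):
        print(limit, "blocked" if exceeds_limit(rate, limit) else "allowed")
```

With the sample readings the rate is 25000 us/s (2.5%), so limits of 1000 and 10000 would hold back new tasks while 100000 would not. On a live system you would read /proc/pressure/cpu twice, some interval apart, and pass the two `total` values and the elapsed time to `pressure_rate()`.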


> I'm surprised and slightly suspicious when the graphs have correlation 
> that good btw :)

Yeah, it's particularly odd when even the avg10 (10-second average) data
is so variable.
For fun, after dinner today, I took the CPU pressure data from
BB_PRESSURE_MAX_CPU = 1K and 10K and played with it.

The 10K-limited run finishes in 4945 seconds and the more regulated 1K
run takes 7193 seconds.
I munged the logged pressure data using emacs macros, because I know how
to do that and didn't bother to figure out how to merge 3 lines using awk
or python, and then filtered it with awk (1) to stretch the 10K time
values out to the 1K run's duration, giving the attached graph.


You can see that the avg10 pressure is generally much higher for the 10K
run, which allows the full build to complete in less time. The 1K run has
lower avg10 pressure, but occasionally jobs escape regulation and the
system is overloaded for a while. Once or twice in the middle of the
build, a large job escapes regulation.

At least that's what I see! ;-)
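To put numbers on the "generally much higher" and "overloaded for a while" impressions, one could summarize each run's `elapsed, avg10` rows (the format written by the awk filter in (1)). This is a hypothetical sketch: the function names are made up, and the overload cut-off is an arbitrary parameter you would have to choose, not anything bitbake defines.

```python
from statistics import mean
from typing import Iterable, List, Tuple

Sample = Tuple[float, float]  # (elapsed seconds, avg10 percent)


def load_series(lines: Iterable[str]) -> List[Sample]:
    """Parse 'elapsed, avg10' rows, skipping blank lines."""
    samples = []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        t, v = line.split(",")
        samples.append((float(t), float(v)))
    return samples


def summarize(samples: List[Sample], overload_level: float) -> Tuple[float, float]:
    """Return (mean avg10, longest time spent continuously above
    overload_level). Each span is measured from the first to the last
    sample of a run of consecutive above-level samples."""
    if not samples:
        raise ValueError("empty series")
    longest = 0.0
    start = None
    for t, v in samples:
        if v > overload_level:
            if start is None:
                start = t  # a new above-level stretch begins here
            longest = max(longest, t - start)
        else:
            start = None  # stretch ended
    return mean(v for _, v in samples), longest
```

For example, on the synthetic series `["0, 10", "10, 80", "20, 85", "30, 20"]` with a cut-off of 50, `summarize(load_series(...), 50.0)` returns `(48.75, 10.0)`. Comparing the pair for the 1K and 10K files would give a mean-pressure figure and a worst-overload duration for each run.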

With a job server pool and improvements to our regulation algorithm, the
CPU pressure should smooth out quite a bit... some day...

../Randy


1)

Run from yocto/psi/feb-2024. The filter makes each timestamp relative to
the first row, stretches it by 7193/4945 and prints "elapsed, avg10":

awk 'NR == 1 { t0 = $1 }   # first timestamp is elapsed time zero
     { print (($1 - t0) * 7193.0 / 4945.0) ", " $2 }' \
    yp-ab-build-cpu-feb-9-2024/feb11/reduced-psi-10000/cpu.dat \
    > yp-ab-build-cpu-feb-9-2024/feb11/reduced-psi-10000/cpu-time-avg10-scaled-1000-elapsed.dat
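For anyone who would rather not use awk, here is a rough Python equivalent of the filter in (1), a sketch with a hypothetical function name. Timestamps are made relative to the first row and multiplied by 7193/4945 (about 1.4546), so the 10K run's 4945 s span is stretched onto the 1K run's 7193 s.

```python
from typing import Iterable, List


def rescale_lines(lines: Iterable[str],
                  target_s: float = 7193.0,
                  actual_s: float = 4945.0) -> List[str]:
    """Roughly mirror the awk filter in (1): for each 'time avg10 ...' row,
    emit '<scaled elapsed>, <avg10>', where elapsed time is measured from
    the first data row and stretched by target_s / actual_s."""
    factor = target_s / actual_s  # 7193 / 4945 ~= 1.4546
    t0 = None
    out = []
    for line in lines:
        fields = line.split()
        if not fields:
            continue  # unlike awk, silently skip blank lines
        t = float(fields[0])
        if t0 is None:
            t0 = t  # first data row defines elapsed time zero
        # "%.6g" roughly matches awk's default CONVFMT for these values.
        out.append(f"{(t - t0) * factor:.6g}, {fields[1]}")
    return out
```

Pass it an open cpu.dat file object and write the returned rows out one per line to get the same two-column file the awk command produces.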


>
> Cheers,
>
> Richard
>
>

-- 
# Randy MacLeod
# Wind River Linux


[-- Attachment #2.2: veAFub9nsCKTt0Oa.png --]
[-- Type: image/png, Size: 30485 bytes --]

Thread overview: 4+ messages
2023-11-10 14:06 ` [bitbake-devel] [2.0][PATCH 1/5] runqueue: fix PSI check calculation Martin Jansa
2024-02-15 23:06   ` Randy MacLeod
2024-02-15 23:14     ` Richard Purdie
2024-02-16  1:48       ` Randy MacLeod [this message]
