Re: [tip:timers/core] [timers] 7ee9887703: netperf.Throughput_Mbps -1.2% regression

From: Frederic Weisbecker <frederic@kernel.org>
To: Thomas Gleixner <tglx@linutronix.de>
Cc: kernel test robot <oliver.sang@intel.com>,
	Anna-Maria Behnsen <anna-maria@linutronix.de>,
	oe-lkp@lists.linux.dev, lkp@intel.com,
	linux-kernel@vger.kernel.org, x86@kernel.org,
	ying.huang@intel.com, feng.tang@intel.com, fengwei.yin@intel.com
Subject: Re: [tip:timers/core] [timers]  7ee9887703: netperf.Throughput_Mbps -1.2% regression
Date: Wed, 13 Mar 2024 15:51:08 +0100
Message-ID: <ZfG9XFwXp_d5E0qc@localhost.localdomain>
In-Reply-To: <87y1amo7w0.ffs@tglx>

[-- Attachment #1: Type: text/plain, Size: 3090 bytes --]

On Wed, Mar 13, 2024 at 09:25:51AM +0100, Thomas Gleixner wrote:
> On Wed, Mar 13 2024 at 00:57, Frederic Weisbecker wrote:
> > So I can reproduce. And after hours of staring at traces I haven't really found
> > the real cause of this. A 1% difference is not always easy to track down.
> > But here are some conclusions so far:
> >
> > _ There is an increase of ksoftirqd use (+13%) but if I boot with threadirqs
> >   before and after the patch (which means that ksoftirqd is used all the time
> >   for softirq handling) I still see the performance regression. So this
> >   shouldn't play a role here.
> >
> > _ I suspected that timer migrators handling big queues of timers on behalf of
> >   idle CPUs would delay NET_RX softirqs but it doesn't seem to be the case. I
> >   don't see TIMER vector delaying NET_RX vector after the hierarchical pull
> >   model, quite the opposite actually, they are less delayed overall.
> >
> > _ I suspected that timer migrators handling big queues would add scheduling
> >   latency. But it doesn't seem to be the case. Quite the opposite again,
> >   surprisingly.
> >
> > _ I have observed that, on average, timers execute later with the hierarchical
> >   pull model. The following delta:
> >        time of callback execution - bucket_expiry
> >   is 3 times higher with the hierarchical pull model. Whether that plays a role
> >   is unclear. It might still be interesting to investigate.
> >
> > _ The initial perf profile seems to suggest a big increase in task migration. Is
> >   it the result of ping-pong wakeups? Does that play a role?
> 
> Migration is not cheap. The interesting question is whether this is
> caused by remote timer expiry.
> 
> Looking at the perf data there are significant changes vs. idle too:
> 
>     perf-profile.calltrace.cycles-pp.poll_idle.cpuidle_enter_state.cpuidle_enter.cpuidle_idle_call.do_idle
> 	 36.91 ±  2%     -12.6       24.32 ± 10%     -12.3       24.63 ±  5% 
> 
> That indicates that cpuidle is spending less time in idle polling, which
> means that wakeup latency increases. That obviously might be a result of
> the timer migration properties.

Hmm, looking at the report, I'm reading the reverse.

More idle polling:

      0.00           +13.2       13.15 ± 49%     +11.3       11.25 ± 55%    perf-profile.calltrace.cycles-pp.poll_idle.cpuidle_enter_state.cpuidle_enter.cpuidle_idle_call.do_idle

And fewer C3:

     31.82 ±  3%     -13.0       18.83 ± 12%     -13.2       18.65 ±  6% perf-profile.calltrace.cycles-pp.asm_sysvec_apic_timer_interrupt.acpi_safe_halt.acpi_idle_enter.cpuidle_enter_state.cpuidle_enter

And indeed I would have expected the reverse...
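
For anyone cross-checking which way a row points, here is a minimal Python
sketch that splits one comparison row into its numeric columns. It assumes the
layout is: base value, then a (delta, value) pair for each of the two compared
kernels, with the "± N%" annotations being noise estimates. That reading is an
assumption, though it matches the arithmetic of the rows above (0.00 + 13.2 is
roughly 13.15). parse_compare_row is a hypothetical helper, not part of any
lkp tool.

```python
import re

def parse_compare_row(row):
    """Split a comparison row into base, deltas and values (assumed layout)."""
    # Drop the "± N%" noise annotations so only the numeric columns remain.
    cleaned = re.sub(r"±\s*\d+%", " ", row)
    values = []
    for token in cleaned.split():
        try:
            values.append(float(token))
        except ValueError:
            break  # the first non-numeric token is the metric name
    if len(values) != 5:
        raise ValueError("expected base plus two (delta, value) pairs")
    base, delta_a, value_a, delta_b, value_b = values
    return {
        "base": base,
        "delta_a": delta_a, "value_a": value_a,
        "delta_b": delta_b, "value_b": value_b,
    }

row = ("0.00 +13.2 13.15 ± 49% +11.3 11.25 ± 55% "
       "perf-profile.calltrace.cycles-pp.poll_idle.cpuidle_enter_state")
parsed = parse_compare_row(row)
# A positive delta means more cycles attributed to this call trace.
print(parsed["delta_a"] > 0)
```

Note that the same symbol can appear in several rows with different call
chains, so the sign of a single row does not settle the question on its own.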

> Do you have traces (before and after) handy to share?

Sure. Here are two snapshots. trace.good is before the pull model and trace.bad
is after. The traces contain:

* sched_switch / sched_wakeup
* timer start and expire_entry
* softirq raise / entry / exit
* tmigr:*
* cpuidle

The cpuidle events are disappointing, though, because only C1 is ever entered
in my traces, likely due to using KVM...
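
The delta quoted earlier in this thread (time of callback execution -
bucket_expiry) could be averaged per trace along the following lines. This is
only a sketch: it assumes the (bucket_expiry, callback time) pairs have already
been extracted from the timer start and expire_entry events by some other
means and share one unit (for example jiffies). mean_expiry_delay and the
sample values are made up for illustration and are not measurements.

```python
from statistics import mean

def mean_expiry_delay(samples):
    """Average of (callback_time - bucket_expiry) over all samples.

    samples: iterable of (bucket_expiry, callback_time) pairs, same unit.
    """
    deltas = [callback_time - bucket_expiry
              for bucket_expiry, callback_time in samples]
    if not deltas:
        raise ValueError("no samples")
    return mean(deltas)

# Made-up pairs, purely to show the call shape.
before = [(100, 100), (200, 202)]
after = [(100, 103), (200, 205)]
print(mean_expiry_delay(before), mean_expiry_delay(after))
```

Comparing the two averages per trace gives the kind of ratio mentioned above,
though a mean alone hides the distribution; a histogram of the deltas would
show whether a few late timers dominate.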

Thanks.

[-- Attachment #2: trace.good.xz --]
[-- Type: application/x-xz, Size: 4622048 bytes --]

[-- Attachment #3: trace.bad.xz --]
[-- Type: application/x-xz, Size: 6233148 bytes --]

Thread overview: 11+ messages
2024-03-01  8:09 [tip:timers/core] [timers] 7ee9887703: netperf.Throughput_Mbps -1.2% regression kernel test robot
2024-03-04  0:32 ` Frederic Weisbecker
2024-03-04  2:13   ` Oliver Sang
2024-03-04 11:28     ` Frederic Weisbecker
2024-03-05  2:17       ` Oliver Sang
2024-03-05 10:46         ` Frederic Weisbecker
2024-03-05 11:21         ` Frederic Weisbecker
2024-03-05 11:35         ` Frederic Weisbecker
2024-03-12 23:57 ` Frederic Weisbecker
2024-03-13  8:25   ` Thomas Gleixner
2024-03-13 14:51     ` Frederic Weisbecker [this message]