All the mail mirrored from lore.kernel.org
 help / color / mirror / Atom feed
From: Daniel Bristot de Oliveira <bristot@redhat.com>
To: Steven Rostedt <rostedt@goodmis.org>
Cc: linux-kernel@vger.kernel.org, Phil Auld <pauld@redhat.com>,
	Sebastian Andrzej Siewior <bigeasy@linutronix.de>,
	Kate Carcia <kcarcia@redhat.com>,
	Jonathan Corbet <corbet@lwn.net>, Ingo Molnar <mingo@redhat.com>,
	Peter Zijlstra <peterz@infradead.org>,
	Thomas Gleixner <tglx@linutronix.de>,
	Alexandre Chartre <alexandre.chartre@oracle.com>,
	Clark Willaims <williams@redhat.com>,
	John Kacur <jkacur@redhat.com>,
	Juri Lelli <juri.lelli@redhat.com>,
	linux-doc@vger.kernel.org
Subject: Re: [PATCH V3 9/9] tracing: Add timerlat tracer
Date: Sat, 12 Jun 2021 11:41:41 +0200	[thread overview]
Message-ID: <6808245d-208c-c6d2-1c6e-7410df158992@redhat.com> (raw)
In-Reply-To: <20210611160340.6970e10c@gandalf.local.home>

On 6/11/21 10:03 PM, Steven Rostedt wrote:
> On Fri, 11 Jun 2021 14:59:13 +0200
> Daniel Bristot de Oliveira <bristot@redhat.com> wrote:
> 
>> ------------------ %< -----------------------------
>> It is worth mentioning that the *duration* values reported
>> by the osnoise: events are *net* values. For example, the
>> thread_noise does not include the duration of the overhead caused
>> by the IRQ execution (which indeed accounted for 12736 ns). But
>> the values reported by the timerlat tracer (timerlat_latency)
>> are *gross* values.
>>
>> The art below illustrates a CPU timeline and how the timerlat tracer
>> observes it at the top and the osnoise: events at the bottom. Each "-"
>> in the timelines means 1 us, and the time moves ==>:
>>
>>      External          context irq                  context thread
>>       clock           timer_latency                 timer_latency
>>       event              18 us                          48 us 
>>         |                  ^                             ^
>>         v                  |                             |
>>         |------------------|                             |       <-- timerlat irq timeline
>>         |------------------+-----------------------------|       <-- timerlat thread timeline
>>                            ^                             ^
>>  ===================== CPU timeline ======================================
>>                    [timerlat/ irq]  [ dev irq ]                          
>>  [another thread...^             v..^         v........][timerlat/ thread]  
>>  ===================== CPU timeline ======================================
>>                    |-------------|  |---------|                  <-- irq_noise timeline
>>                                  |--^         v--------|         <-- thread_noise timeline
>>                                  |            |        |
>>                                  |            |        + thread_noise: 10 us
>>                                  |            +-> irq_noise: 9 us
>>                                  +-> irq_noise: 13 us
>>
>>  --------------- >% --------------------------------  
> 
> That's really busy, and honestly, I can't tell what is what.
> 
> The "context irq timer_latency" is a confusing name. Could we just have
> that be "timer irq latency"? And "context thread timer_latency" just be
> "thread latency". Adding too much text to the name actually makes it harder
> to understand. We want to simplify it, not make people have to think harder
> to see it.
> 
> I think we can get rid of the "<-- .* timeline" to the right.  I don't
> think they are necessary. Again, the more you add to the diagram, the
> busier it looks, and the harder it is to read.
> 
> Could we switch "[timerlat/ irq]" to just "[timer irq]" and explain how
> that "context irq timer_latency"/"timer irq latency" is related?
> 
> Should probably state that the "dev irq" is an unrelated device interrupt
> that happened.
> 
> What's with the two CPU timeline lines? Now there I think it would be
> better to have the arrow text by itself.
> 
> And finally, not sure if you plan on doing this, but have a output of the
> trace that would show the above.
> 
> Thus, here's what I would expect to see:
> 
>       External         
>        clock         timer irq latency             e    thread latency
>        event              18 us                          48 us 
>          |                  ^                             ^
>          v                  |                             |
>          |------------------|                             |
>          |------------------+-----------------------------|       
>                             ^                             ^
>   =========================================================================
>                     [timerlat/ irq]  [ dev irq ]                             
>   [another thread...^             v..^         v........][timerlat/ thread]  <-- CPU task timeline
>   =========================================================================
>                     |-------------|  |---------|
>                                   |--^         v--------|
>                                   |            |        |
>                                   |            |        + thread_noise: 10 us
>                                   |            +-> irq_noise: 9 us
>                                   +-> irq_noise: 13 us

It looks good to me!

>  The "[ dev irq ]" above is an interrupt from some device on the system that
>  causes extra noise to the timerlat task.
> 
> I think the above may be easier to understand, especially if the trace
> output that represents it is below.

ok, I can try to capture a trace sample and represent it into the ASCII art
format above.

> Also, I have to ask, shouldn't the "thread noise" really start at the
> "External clock event"?

To go in that direction, we need to track things that delayed the IRQ execution.
We are already tracking other IRQs' execution, but we will have to keep a
history of past executions and "playback" them. This will add some overhead
linear to the past event... and/or some pessimism.

We will also have to track IRQ disabled sections. The problem of tracking IRQ
disabled is that it depends on tracing infra-structure that is not enabled by
default of distros... And there are IRQ delay causes that are not related to the
thread... like idle states... (and all these things create more and more states
to track these things)...

So, I added the timer irq latency to figure out when the problem is related to
things that delay the IRQ, and the stack trace will help us figure out where the
problem is in the thread context. After the IRQ execution, the thread noise is
helpful - even without all the thread noise before the IRQ.

Furthermore, if we start trying to abstract the causes of delay, we will find
the rtsl :-). The rtls events and abstractions give us the worst-case scheduling
latency without adding unneeded pessimism (sound analysis). It covers all the
possible cases, for all any schedulers, even without the need of a measuring
thread like here (or with cyclictest) - and this is a good thing because it does
not change the target system's workload.

The problem is that... rtsl depends on tracing infra-structure that are not
enabled by default on distros, like preempt_ and irq_ disabled events.

So, I see timerlat as a tool for on-the-fly usage, like debugging on customers
(as we do at red hat). It can be enabled by default on distros because it only
depends on existing and already enabled events and causes no overhead when
disabled. rtsl targets more specific cases, like safety-critical systems, where
the overhead is acceptable because of the sound analysis of the scheduling bound
(which is rooted in a formal specification & analysis of the system).

-- Daniel

> -- Steve
> 


  reply	other threads:[~2021-06-12  9:41 UTC|newest]

Thread overview: 71+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2021-05-14 20:51 [PATCH V3 0/9] hwlat improvements and osnoise/timerlat tracers Daniel Bristot de Oliveira
2021-05-14 20:51 ` [PATCH V3 1/9] tracing/hwlat: Fix Clark's email Daniel Bristot de Oliveira
2021-05-14 20:51 ` [PATCH V3 2/9] tracing/hwlat: Implement the mode config option Daniel Bristot de Oliveira
2021-06-03 20:11   ` Steven Rostedt
2021-05-14 20:51 ` [PATCH V3 3/9] tracing/hwlat: Switch disable_migrate to mode none Daniel Bristot de Oliveira
2021-05-14 20:51 ` [PATCH V3 4/9] tracing/hwlat: Implement the per-cpu mode Daniel Bristot de Oliveira
2021-05-27 11:58   ` Juri Lelli
2021-05-27 12:29     ` Daniel Bristot de Oliveira
2021-06-03 21:17   ` Steven Rostedt
2021-06-04 15:31     ` Daniel Bristot de Oliveira
2021-05-14 20:51 ` [PATCH V3 5/9] tracing/trace: Add a generic function to read/write u64 values from tracefs Daniel Bristot de Oliveira
2021-06-03 21:22   ` Steven Rostedt
2021-06-04 16:05     ` Daniel Bristot de Oliveira
2021-06-04 16:18       ` Steven Rostedt
2021-06-04 16:34         ` Daniel Bristot de Oliveira
2021-05-14 20:51 ` [PATCH V3 6/9] trace/hwlat: Use the generic function to read/write width and window Daniel Bristot de Oliveira
2021-06-03 21:27   ` Steven Rostedt
2021-06-04 16:36     ` Daniel Bristot de Oliveira
2021-06-04 20:50       ` Steven Rostedt
2021-05-14 20:51 ` [PATCH V3 7/9] tracing: Add __print_ns_to_secs() and __print_ns_without_secs() helpers Daniel Bristot de Oliveira
2021-06-03 21:29   ` Steven Rostedt
2021-06-04  4:19     ` Joe Perches
2021-06-04 16:21       ` Steven Rostedt
2021-06-04 19:09         ` [PATCH] treewide: Add missing semicolons to __assign_str uses Joe Perches
2021-06-04 19:09           ` Joe Perches
2021-06-04 19:09           ` Joe Perches
2021-06-04 19:38         ` Joe Perches
2021-06-04 19:38           ` Joe Perches
2021-06-04 19:38           ` Joe Perches
2021-06-07 23:18           ` Jason Gunthorpe
2021-06-07 23:18             ` Jason Gunthorpe
2021-06-07 23:18             ` Jason Gunthorpe
2021-06-12 15:42         ` [PATCH V2] " Joe Perches
2021-06-12 15:42           ` Joe Perches
2021-06-12 15:42           ` Joe Perches
2021-06-12 23:11           ` Steven Rostedt
2021-06-12 23:11             ` Steven Rostedt
2021-06-12 23:11             ` Steven Rostedt
2021-06-30 11:28           ` Joe Perches
2021-06-30 11:28             ` Joe Perches
2021-06-30 11:28             ` Joe Perches
2021-06-30 12:22             ` Steven Rostedt
2021-06-30 12:22               ` Steven Rostedt
2021-06-30 12:22               ` Steven Rostedt
2021-06-04 16:07     ` [PATCH V3 7/9] tracing: Add __print_ns_to_secs() and __print_ns_without_secs() helpers Daniel Bristot de Oliveira
2021-05-14 20:51 ` [PATCH V3 8/9] tracing: Add osnoise tracer Daniel Bristot de Oliveira
2021-06-03 21:31   ` Steven Rostedt
2021-06-04 21:28   ` Steven Rostedt
2021-06-07 12:00     ` Daniel Bristot de Oliveira
2021-06-07 15:47       ` Steven Rostedt
2021-06-08 15:24         ` Daniel Bristot de Oliveira
2021-06-08 17:17     ` Daniel Bristot de Oliveira
2021-06-08 17:39       ` Steven Rostedt
2021-06-08 19:33         ` Daniel Bristot de Oliveira
2021-06-08 19:42           ` Steven Rostedt
2021-06-09 12:14     ` Daniel Bristot de Oliveira
2021-06-09 13:03       ` Steven Rostedt
2021-06-09 13:44         ` Daniel Bristot de Oliveira
2021-05-14 20:51 ` [PATCH V3 9/9] tracing: Add timerlat tracer Daniel Bristot de Oliveira
2021-06-08  1:36   ` Steven Rostedt
2021-06-11 12:59     ` Daniel Bristot de Oliveira
2021-06-11 20:03       ` Steven Rostedt
2021-06-12  9:41         ` Daniel Bristot de Oliveira [this message]
2021-06-12 23:06           ` Steven Rostedt
2021-06-11 14:13     ` Daniel Bristot de Oliveira
2021-06-11 20:48       ` Steven Rostedt
2021-06-12  8:47         ` Daniel Bristot de Oliveira
2021-06-12 23:09           ` Steven Rostedt
2021-06-15  8:18             ` Daniel Bristot de Oliveira
2021-05-27 12:07 ` [PATCH V3 0/9] hwlat improvements and osnoise/timerlat tracers Juri Lelli
2021-05-29  2:16   ` Steven Rostedt

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=6808245d-208c-c6d2-1c6e-7410df158992@redhat.com \
    --to=bristot@redhat.com \
    --cc=alexandre.chartre@oracle.com \
    --cc=bigeasy@linutronix.de \
    --cc=corbet@lwn.net \
    --cc=jkacur@redhat.com \
    --cc=juri.lelli@redhat.com \
    --cc=kcarcia@redhat.com \
    --cc=linux-doc@vger.kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=mingo@redhat.com \
    --cc=pauld@redhat.com \
    --cc=peterz@infradead.org \
    --cc=rostedt@goodmis.org \
    --cc=tglx@linutronix.de \
    --cc=williams@redhat.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.