Re: [PATCH] net/af_packet: cache align Rx/Tx structs

All the mail mirrored from lore.kernel.org
 help / color / mirror / Atom feed

From: "Mattias Rönnblom" <hofors@lysator.liu.se>
To: Stephen Hemminger <stephen@networkplumber.org>,
	Ferruh Yigit <ferruh.yigit@amd.com>
Cc: "Mattias Rönnblom" <mattias.ronnblom@ericsson.com>,
	"John W . Linville" <linville@tuxdriver.com>,
	dev@dpdk.org, "Tyler Retzlaff" <roretzla@linux.microsoft.com>,
	"Honnappa Nagarahalli" <Honnappa.Nagarahalli@arm.com>
Subject: Re: [PATCH] net/af_packet: cache align Rx/Tx structs
Date: Thu, 25 Apr 2024 00:27:36 +0200	[thread overview]
Message-ID: <2371b1a8-bdc5-4184-8491-54e2e3a64211@lysator.liu.se> (raw)
In-Reply-To: <20240424121330.7547e290@hermes.local>

On 2024-04-24 21:13, Stephen Hemminger wrote:
> On Wed, 24 Apr 2024 18:50:50 +0100
> Ferruh Yigit <ferruh.yigit@amd.com> wrote:
> 
>>> I don't know how slow af_packet is, but if you care about performance,
>>> you don't want to use atomic add for statistics.
>>>    
>>
>> There are a few soft drivers already using atomics adds for updating stats.
>> If we document expectations from 'rte_eth_stats_reset()', we can update
>> those usages.
> 
> Using atomic add is lots of extra overhead. The statistics are not guaranteed
> to be perfect.  If nothing else, the bytes and packets can be skewed.
> 

The sad thing here is that in case the counters are reset within the 
load-modify-store cycle of the lcore counter update, the reset may end 
up being a nop. So, it's not like you missed a packet or two, or suffer 
some transient inconsistency, but you completed and permanently ignored 
the reset request.

> The soft drivers af_xdp, af_packet, and tun performance is dominated by the
> overhead of the kernel system call and copies. Yes, alignment is good
> but won't be noticeable.

There aren't any syscalls in the RX path in the af_packet PMD.

I added the same statistics updates as the af_packet PMD uses into an 
benchmark app which consumes ~1000 cc in-between stats updates.

If the equivalent of the RX queue struct was cache aligned, the 
statistics overhead was so small it was difficult to measure. Less than 
3-4 cc per update. This was with volatile, but without atomics.

If the RX queue struct wasn't cache aligned, and sized so a cache line 
generally was used by two (neighboring) cores, the stats incurred a cost 
of ~55 cc per update.

Shaving off 55 cc should translate to a couple of hundred percent 
increased performance for an empty af_packet poll. If your lcore has 
some other primary source of work than the af_packet RX queue, and the 
RX queue is polled often, then this may well be a noticeable gain.

The benchmark was run on 16 Gracemont cores, which in my experience 
seems to have a little shorter core-to-core latency than many other 
systems, provided the remote core/cache line owner is located in the 
same cluster.

next prev parent reply	other threads:[~2024-04-24 22:27 UTC|newest]

Thread overview: 32+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2024-04-23  9:08 [PATCH] net/af_packet: cache align Rx/Tx structs Mattias Rönnblom
2024-04-23 11:15 ` Ferruh Yigit
2024-04-23 20:56   ` Mattias Rönnblom
2024-04-24  0:27     ` Honnappa Nagarahalli
2024-04-24  6:28       ` Mattias Rönnblom
2024-04-24 10:21     ` Ferruh Yigit
2024-04-24 10:28       ` Bruce Richardson
2024-04-24 18:02         ` Ferruh Yigit
2024-04-24 11:57       ` Mattias Rönnblom
2024-04-24 17:50         ` Ferruh Yigit
2024-04-24 19:13           ` Stephen Hemminger
2024-04-24 22:27             ` Mattias Rönnblom [this message]
2024-04-24 23:55               ` Stephen Hemminger
2024-04-25  9:26                 ` Mattias Rönnblom
2024-04-25  9:49                   ` Morten Brørup
2024-04-25 14:04                   ` Ferruh Yigit
2024-04-25 15:06                     ` Mattias Rönnblom
2024-04-25 16:21                       ` Ferruh Yigit
2024-04-25 15:07                     ` Stephen Hemminger
2024-04-25 14:08   ` Ferruh Yigit
2024-04-25 15:08     ` Mattias Rönnblom
2024-04-25 15:35       ` Ferruh Yigit
2024-04-26  7:25         ` Mattias Rönnblom
2024-04-26  7:38 ` Mattias Rönnblom
2024-04-26  8:27   ` Ferruh Yigit
2024-04-26 10:20     ` Mattias Rönnblom
2024-04-26  9:05   ` [PATCH v3] " Mattias Rönnblom
2024-04-26  9:22     ` Morten Brørup
2024-04-26 15:10     ` Stephen Hemminger
2024-04-26 15:41     ` Tyler Retzlaff
2024-04-29  8:46       ` Ferruh Yigit
2024-04-26 21:27 ` [PATCH] " Patrick Robb

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=2371b1a8-bdc5-4184-8491-54e2e3a64211@lysator.liu.se \
    --to=hofors@lysator.liu.se \
    --cc=Honnappa.Nagarahalli@arm.com \
    --cc=dev@dpdk.org \
    --cc=ferruh.yigit@amd.com \
    --cc=linville@tuxdriver.com \
    --cc=mattias.ronnblom@ericsson.com \
    --cc=roretzla@linux.microsoft.com \
    --cc=stephen@networkplumber.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.