Re: [PATCH 3/4 net-next] net: mana: add a function to spread IRQs per CPUs

From: Souradeep Chakrabarti <schakrabarti@linux.microsoft.com>
To: Yury Norov <yury.norov@gmail.com>
Cc: Michael Kelley <mhklinux@outlook.com>,
	Haiyang Zhang <haiyangz@microsoft.com>,
	KY Srinivasan <kys@microsoft.com>,
	"wei.liu@kernel.org" <wei.liu@kernel.org>,
	Dexuan Cui <decui@microsoft.com>,
	"davem@davemloft.net" <davem@davemloft.net>,
	"edumazet@google.com" <edumazet@google.com>,
	"kuba@kernel.org" <kuba@kernel.org>,
	"pabeni@redhat.com" <pabeni@redhat.com>,
	Long Li <longli@microsoft.com>,
	"leon@kernel.org" <leon@kernel.org>,
	"cai.huoqing@linux.dev" <cai.huoqing@linux.dev>,
	"ssengar@linux.microsoft.com" <ssengar@linux.microsoft.com>,
	"vkuznets@redhat.com" <vkuznets@redhat.com>,
	"tglx@linutronix.de" <tglx@linutronix.de>,
	"linux-hyperv@vger.kernel.org" <linux-hyperv@vger.kernel.org>,
	"netdev@vger.kernel.org" <netdev@vger.kernel.org>,
	"linux-kernel@vger.kernel.org" <linux-kernel@vger.kernel.org>,
	"linux-rdma@vger.kernel.org" <linux-rdma@vger.kernel.org>,
	Souradeep Chakrabarti <schakrabarti@microsoft.com>,
	Paul Rosswurm <paulros@microsoft.com>
Subject: Re: [PATCH 3/4 net-next] net: mana: add a function to spread IRQs per CPUs
Date: Mon, 15 Jan 2024 22:13:43 -0800
Message-ID: <20240116061343.GA24925@linuxonhyperv3.guj3yctzbm1etfxqx2vob5hsef.xx.internal.cloudapp.net>
In-Reply-To: <ZaLgdn53bBoYyT/h@yury-ThinkPad>

On Sat, Jan 13, 2024 at 11:11:50AM -0800, Yury Norov wrote:
> On Sat, Jan 13, 2024 at 04:20:31PM +0000, Michael Kelley wrote:
> > From: Souradeep Chakrabarti <schakrabarti@linux.microsoft.com> Sent: Friday, January 12, 2024 10:31 PM
> > 
> > > On Fri, Jan 12, 2024 at 06:30:44PM +0000, Haiyang Zhang wrote:
> > > >
> > > > > -----Original Message-----
> > > > From: Michael Kelley <mhklinux@outlook.com> Sent: Friday, January 12, 2024 11:37 AM
> > > > >
> > > > > From: Souradeep Chakrabarti <schakrabarti@linux.microsoft.com> Sent:
> > > > > Wednesday, January 10, 2024 10:13 PM
> > > > > >
> > > > > > > The test topology used to compare the performance of
> > > > > > > cpu_local_spread() and the new approach is:
> > > > > > Case 1
> > > > > > IRQ     Nodes  Cores CPUs
> > > > > > 0       1      0     0-1
> > > > > > 1       1      1     2-3
> > > > > > 2       1      2     4-5
> > > > > > 3       1      3     6-7
> > > > > >
> > > > > > and with existing cpu_local_spread()
> > > > > > Case 2
> > > > > > IRQ    Nodes  Cores CPUs
> > > > > > 0      1      0     0
> > > > > > 1      1      0     1
> > > > > > 2      1      1     2
> > > > > > 3      1      1     3
> > > > > >
> > > > > > > A total of 4 channels were used, set up with ethtool.
> > > > > > > With ntttcp, case 1 gave 15 percent better performance than
> > > > > > > case 2. irqbalance was disabled during the test.
> > > > > > >
> > > > > > > Also, you are right: on a 64-CPU system this approach will spread
> > > > > > > the IRQs like cpu_local_spread() does, but in the future we will
> > > > > > > offer MANA nodes with more than 64 CPUs. There, this new design
> > > > > > > will give better performance.
> > > > > > >
> > > > > > > I will add these performance details to the commit message of
> > > > > > > the next version.
> > > > >
> > > > > Here are my concerns:
> > > > >
> > > > > 1.  The most commonly used VMs these days have 64 or fewer
> > > > > vCPUs and won't see any performance benefit.
> > > > >
> > > > > 2.  Larger VMs probably won't see the full 15% benefit because
> > > > > all vCPUs in the local NUMA node will be assigned IRQs.  For
> > > > > example, in a VM with 96 vCPUs and 2 NUMA nodes, all 48
> > > > > > vCPUs in NUMA node 0 will be assigned IRQs.  The remaining
> > > > > > 16 IRQs will be spread out on the 48 CPUs in NUMA node 1
> > > > > > in a way that avoids sharing a core.  But overall that means
> > > > > > 75% of the IRQs will still be sharing a core and
> > > > > > presumably not see any perf benefit.
> > > > >
> > > > > 3.  Your experiment was on a relatively small scale:   4 IRQs
> > > > > spread across 2 cores vs. across 4 cores.  Have you run any
> > > > > experiments on VMs with 128 vCPUs (for example) where
> > > > > most of the IRQs are not sharing a core?  I'm wondering if
> > > > > the results with 4 IRQs really scale up to 64 IRQs.  A lot can
> > > > > be different in a VM with 64 cores and 2 NUMA nodes vs.
> > > > > 4 cores in a single node.
> > > > >
> > > > > 4.  The new algorithm prefers assigning to all vCPUs in
> > > > > each NUMA hop over assigning to separate cores.  Are there
> > > > > experiments showing that is the right tradeoff?  What
> > > > > are the results if assigning to separate cores is preferred?
> > > >
> > > > I remember in a customer case, putting the IRQs on the same
> > > > NUMA node has better perf. But I agree, this should be re-tested
> > > > on MANA nic.
> > >
> > > 1) and 2) The change will not reduce existing performance, and
> > > systems with a high number of CPUs will benefit from it.
> > >
> > > 3) The result showed around a 6 percent improvement.
> > >
> > > 4) The test result showed around a 10 percent difference when IRQs are
> > > spread across multiple NUMA nodes.
> > 
> > OK, this looks pretty good.  Make clear in the commit messages what
> > the tradeoffs are, and what the real-world benefits are expected to be.
> > Some future developer who wants to understand why IRQs are assigned
> > this way will thank you. :-)
> 
> I agree with Michael, this needs to be spoken aloud.
> 
> From the above, is it correct that the best performance is achieved
> when the # of IRQs is half the number of CPUs in the 1st node, because
> this configuration allows IRQs to be spread across cores in the most
> optimal way?  And if we have more or fewer than that, it hurts
> performance, at least for MANA networking?
It does not reduce performance relative to the current cpu_local_spread(),
but the optimum comes when the node has twice as many CPUs as there are
IRQs (assuming SMT==2).

Only when the number of CPUs equals the number of IRQs (that is, the
number of CPUs is <= 64) do we see the same performance as the existing
design with cpu_local_spread().

If the node has more than 64 CPUs, we get better performance than with
cpu_local_spread().
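
To illustrate the three cases above, here is a minimal userspace sketch in
Python (not the driver code). It assumes SMT==2 with sibling CPUs numbered
adjacently, as in the Case 1 table, and it assumes the new approach fills each
NUMA node by giving one IRQ to every core first, wrapping around only after
all cores in the node have one. The function names are hypothetical. The
baseline stands in for the existing cpu_local_spread() placement
(cpumask_local_spread() in the kernel) by handing out CPUs one after another,
as in the Case 2 table.

```python
from collections import Counter


def sequential_spread(num_irqs, cpus_per_node, num_nodes, smt=2):
    """Baseline (assumed): IRQ i lands on CPU i; returns the core id per IRQ."""
    total_cpus = cpus_per_node * num_nodes
    # Assumption: sibling CPUs are adjacent, so core id = cpu // smt.
    return [(irq % total_cpus) // smt for irq in range(num_irqs)]


def core_first_spread(num_irqs, cpus_per_node, num_nodes, smt=2):
    """Sketch of the new approach: within each node, one IRQ per core first,
    then wrap around until the node holds as many IRQs as it has CPUs.
    IRQs beyond the total CPU count are not placed in this sketch."""
    cores_per_node = cpus_per_node // smt
    placement = []
    for node in range(num_nodes):
        for slot in range(cpus_per_node):
            if len(placement) == num_irqs:
                return placement
            placement.append(node * cores_per_node + slot % cores_per_node)
    return placement


def shared_fraction(placement):
    """Fraction of IRQs whose core also serves at least one other IRQ."""
    per_core = Counter(placement)
    return sum(1 for core in placement if per_core[core] > 1) / len(placement)


# 4 IRQs on one node with 8 CPUs (the Case 1 / Case 2 topology).
print(core_first_spread(4, 8, 1))   # [0, 1, 2, 3]
print(sequential_spread(4, 8, 1))   # [0, 0, 1, 1]

# 64 IRQs on 96 vCPUs in 2 NUMA nodes (Michael's example).
print(shared_fraction(core_first_spread(64, 48, 2)))  # 0.75
print(shared_fraction(sequential_spread(64, 48, 2)))  # 1.0
```

With 64 IRQs on a single 64-CPU node both functions put two IRQs on every
core, which is the "same performance" case. The sketch only counts core
sharing; it says nothing about the measured throughput numbers.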
> 
> So, the B|A performance chart may look like this, right?
> 
>   irq     nodes     cores     cpus      perf
>   0       1 | 1     0 | 0     0 | 0-1      0%
>   1       1 | 1     0 | 1     1 | 2-3     +5%
>   2       1 | 1     1 | 2     2 | 4-5    +10%
>   3       1 | 1     1 | 3     3 | 6-7    +15%
>   4       1 | 1     0 | 4     3 | 0-1    +12%
>   ...       |         |         |
>   7       1 | 1     1 | 7     3 | 6-7      0%
>   ...
>  15       2 | 2     3 | 3    15 | 14-15    0%
> 
> Souradeep, can you please confirm that my understanding is correct?
> 
> In v5, can you add a table like the above with real performance
> numbers for your driver? I think that it would help people to
> configure their VMs better when networking is a bottleneck.
> 
I will share a chart in the next version of patch 3.
Thanks for the suggestion.
> Thanks,
> Yury

Thread overview: 19+ messages
2024-01-09 10:51 [PATCH 0/4 net-next] net: mana: Assigning IRQ affinity on HT cores Souradeep Chakrabarti
2024-01-09 10:51 ` [PATCH 1/4 net-next] cpumask: add cpumask_weight_andnot() Souradeep Chakrabarti
2024-01-09 10:51 ` [PATCH 2/4 net-next] cpumask: define cleanup function for cpumasks Souradeep Chakrabarti
2024-01-09 10:51 ` [PATCH 3/4 net-next] net: mana: add a function to spread IRQs per CPUs Souradeep Chakrabarti
2024-01-09 19:22   ` Michael Kelley
2024-01-09 20:20     ` Haiyang Zhang
2024-01-10  9:08       ` Souradeep Chakrabarti
2024-01-09 23:28     ` Yury Norov
2024-01-10  0:27       ` Michael Kelley
2024-01-11  6:13         ` Souradeep Chakrabarti
2024-01-12 16:36           ` Michael Kelley
2024-01-12 18:30             ` Haiyang Zhang
2024-01-13  6:30               ` Souradeep Chakrabarti
2024-01-13 16:20                 ` Michael Kelley
2024-01-13 19:11                   ` Yury Norov
2024-01-16  6:13                     ` Souradeep Chakrabarti [this message]
2024-01-10  9:09       ` Souradeep Chakrabarti
2024-01-09 10:51 ` [PATCH 4/4 net-next] net: mana: Assigning IRQ affinity on HT cores Souradeep Chakrabarti
2024-01-09 11:57 ` [PATCH 0/4 " Paolo Abeni
