[PATCH 0/2] L3FWD sample optimisation

* [PATCH 0/2] L3FWD sample optimisation
From: Konstantin Ananyev @ 2014-05-22 16:55 UTC
  To: dev-VfR2kkLFssw, dev-VfR2kkLFssw

With the latest HW and optimised RX/TX path there is a huge gap between
testpmd iofwd and l3fwd performance results.
So this is an attempt to optimise the l3fwd LPM code path and reduce the gap:
 - Instead of processing each input packet up to completion,
   divide packet processing into several stages and perform them
   stage by stage for the whole burst.
 - Unroll things by a factor of 4 whenever possible.
 - Use SSE intrinsics for some operations (bswap, replace MAC addresses, etc).
 - Avoid TX packet buffering whenever possible.
 - Move some checks from RX/TX into the setup phase.
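
To make the first three points concrete, here is a minimal sketch in plain C.
It is not the code from this series: `struct pkt`, `toy_lookup`, `MAX_BURST`
and `process_burst` are hypothetical stand-ins (the real sample works on
rte_mbuf and rte_lpm), and the byte-swap is just one possible SSE2
formulation with a scalar fallback. It only shows the shape of the idea:
each stage runs over the whole burst, four packets per iteration, before
the next stage starts.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#if defined(__SSE2__)
#include <emmintrin.h>
#endif

#define MAX_BURST 32 /* hypothetical burst size for this sketch */

/* Hypothetical stand-in for a packet: only the fields the sketch needs. */
struct pkt {
	uint32_t dst_ip_be; /* destination IPv4 address, byte-swapped vs. host order */
	uint8_t out_port;   /* filled in by the lookup stage */
};

/* Scalar byte swap of one 32-bit value. */
static inline uint32_t bswap32(uint32_t v)
{
	return (v >> 24) | ((v >> 8) & 0x0000ff00u) |
	       ((v << 8) & 0x00ff0000u) | (v << 24);
}

/* Byte-swap four 32-bit values at once (SSE2 if available, else scalar). */
static inline void bswap32_x4(const uint32_t in[4], uint32_t out[4])
{
#if defined(__SSE2__)
	__m128i v = _mm_loadu_si128((const __m128i *)in);
	/* swap the two bytes inside every 16-bit word */
	v = _mm_or_si128(_mm_slli_epi16(v, 8), _mm_srli_epi16(v, 8));
	/* swap the two 16-bit halves of every 32-bit lane */
	v = _mm_or_si128(_mm_slli_epi32(v, 16), _mm_srli_epi32(v, 16));
	_mm_storeu_si128((__m128i *)out, v);
#else
	for (int i = 0; i < 4; i++)
		out[i] = bswap32(in[i]);
#endif
}

/* Toy route lookup (NOT rte_lpm): the top address byte selects the port. */
static inline uint8_t toy_lookup(uint32_t ip_host)
{
	return (uint8_t)((ip_host >> 24) & 3);
}

/* Process a burst stage by stage instead of packet by packet. */
static void process_burst(struct pkt *pkts, size_t n)
{
	uint32_t ip[MAX_BURST];
	size_t i;

	assert(n <= MAX_BURST);

	/* Stage 1: byte-swap all destination addresses, 4 at a time. */
	for (i = 0; i + 4 <= n; i += 4) {
		uint32_t be[4] = {
			pkts[i].dst_ip_be, pkts[i + 1].dst_ip_be,
			pkts[i + 2].dst_ip_be, pkts[i + 3].dst_ip_be
		};
		bswap32_x4(be, &ip[i]);
	}
	for (; i < n; i++) /* leftover packets */
		ip[i] = bswap32(pkts[i].dst_ip_be);

	/* Stage 2: route lookup for the whole burst, unrolled by 4. */
	for (i = 0; i + 4 <= n; i += 4) {
		pkts[i].out_port = toy_lookup(ip[i]);
		pkts[i + 1].out_port = toy_lookup(ip[i + 1]);
		pkts[i + 2].out_port = toy_lookup(ip[i + 2]);
		pkts[i + 3].out_port = toy_lookup(ip[i + 3]);
	}
	for (; i < n; i++) /* leftover packets */
		pkts[i].out_port = toy_lookup(ip[i]);
}
```

A caller would fill an array of `struct pkt` from an RX burst and call
`process_burst(pkts, n)` once per burst; the leftover loops handle bursts
whose size is not a multiple of 4.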

 app/test/test_lpm.c                             |   70 ++++
 examples/l3fwd/main.c                           |  467 +++++++++++++++++++++-
 lib/librte_eal/common/Makefile                  |    1 +
 lib/librte_eal/common/include/rte_common_vect.h |   93 +++++
 lib/librte_lpm/rte_lpm.h                        |  117 ++++++
 5 files changed, 726 insertions(+), 22 deletions(-)
 create mode 100644 lib/librte_eal/common/include/rte_common_vect.h

-- 
1.7.7.6

* Re: [PATCH 0/2] L3FWD sample optimisation
From: Thomas Monjalon @ 2014-05-23  8:05 UTC
  To: Konstantin Ananyev; +Cc: dev-VfR2kkLFssw

Hi Konstantin,

2014-05-22 17:55, Konstantin Ananyev:
> With the latest HW and optimised RX/TX path there is a huge gap between
> testpmd iofwd and l3fwd performance results.
> So this is an attempt to optimise the l3fwd LPM code path and reduce the gap:
>  - Instead of processing each input packet up to completion,
>    divide packet processing into several stages and perform them
>    stage by stage for the whole burst.
>  - Unroll things by a factor of 4 whenever possible.
>  - Use SSE intrinsics for some operations (bswap, replace MAC addresses, etc).
>  - Avoid TX packet buffering whenever possible.
>  - Move some checks from RX/TX into the setup phase.

As you are doing optimizations, it's important to know the performance gain.
It could help to mitigate future reworks.
So please, could you provide some benchmarking numbers in the commit log?

Thanks
-- 
Thomas

* Re: [PATCH 0/2] L3FWD sample optimisation
From: Ananyev, Konstantin @ 2014-05-28  9:17 UTC
  To: Thomas Monjalon; +Cc: dev-VfR2kkLFssw@public.gmane.org

Hi Thomas,

>As you are doing optimizations, it's important to know the performance gain.
>It could help to mitigate future reworks.
>So please, could you provide some benchmarking numbers in the commit log?

Some performance data below.
Also, I forgot to mention that the new code path can be switched on/off by setting
the ENABLE_MULTI_BUFFER_OPTIMIZE macro to 1/0.
Do I need to resubmit the whole patch series, or just a cover letter, or ...?

Konstantin

SUT: dual-socket IVB 2.8 GHz board with 4 ports on 4 NICs (all on socket 0) connected to the traffic generator.
2x1GB pages, kernel: 3.11.3-201.fc19.x86_64, gcc 4.8.2.
64B packets, using the packet flooding method.
All 4 ports are managed by one logical core.
Optimised scalar PMD RX/TX was used.

Test case                 DIFF % (NEW-OLD)
IPV4-CONT-BURST           +23%
IPV6-CONT-BURST           +13%
IPV4/IPV6-CONT-BURST      +8%
IPV4-4STREAMSX8           +7%
IPV4-4STREAMSX1           -2%

Test cases description:
IPV4-CONT-BURST - IPV4 packets; all packets from one input port are destined for the same output port.
IPV6-CONT-BURST - IPV6 packets; all packets from one input port are destined for the same output port.
IPV4/IPV6-CONT-BURST - mix of the first 2 with interleave=1 (e.g: IPV4, IPV6, IPV4, IPV6, ...)
IPV4-4STREAMSX1 - 4 streams of IPV4 packets, where all packets from the same stream are destined for the same output port
(e.g: IPV4_DST_P0, IPV4_DST_P1, IPV4_DST_P2, IPV4_DST_P3, IPV4_DST_P0, ...)
IPV4-4STREAMSX8 - same as above, but packets for each stream arrive in groups of 8
(e.g: IPV4_DST_P0 X 8, IPV4_DST_P1 X 8, IPV4_DST_P2 X 8, IPV4_DST_P3 X 8, IPV4_DST_P0 X 8, ...)
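
The two 4-stream orderings above can be written down as a tiny generator.
This is only a sketch of the packet ordering given in the descriptions;
`stream_dst_port` and `NUM_STREAMS` are made-up names, and it says nothing
about how the traffic generator was actually configured.

```c
#include <stddef.h>

#define NUM_STREAMS 4 /* streams rotate over output ports P0..P3 */

/*
 * Output port targeted by the i-th packet (0-based) when each stream
 * sends `group` consecutive packets before the next stream takes over:
 * group = 1 models IPV4-4STREAMSX1, group = 8 models IPV4-4STREAMSX8.
 */
static unsigned stream_dst_port(size_t i, size_t group)
{
	return (unsigned)((i / group) % NUM_STREAMS);
}
```

With group = 1 every consecutive packet goes to a different port, while
with group = 8 runs of eight packets share a port.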

* Re: [PATCH 0/2] L3FWD sample optimisation
From: Cao, Waterman @ 2014-06-04 13:47 UTC
  To: Ananyev, Konstantin, dev-VfR2kkLFssw@public.gmane.org,
	Thomas Monjalon

Tested-by: Waterman Cao <waterman.cao@intel.com>

This patch has been tested by Intel. We performed an l3fwd performance test.
The test result shows that l3fwd performance with this ‘lpm optimization’ patch is much higher than without it.
Test environment: Fedora 20, Linux Kernel 3.11.10, GCC 4.8.2, Intel Xeon processor E5-2680 v2, with 2 ports on 2 Niantic NICs (all on socket 0)
Please refer to the performance data in the separate email:
http://dpdk.org/ml/archives/dev/2014-May/002703.html

* Re: [PATCH 0/2] L3FWD sample optimisation
From: De Lara Guarch, Pablo @ 2014-06-06  8:26 UTC
  To: Ananyev, Konstantin, dev-VfR2kkLFssw@public.gmane.org

Acked-by: Pablo de Lara Guarch <pablo.de.lara.guarch-ral2JQCrhuEAvxtiuMwx3w@public.gmane.org>

> -----Original Message-----
> From: dev [mailto:dev-bounces-VfR2kkLFssw@public.gmane.org] On Behalf Of Konstantin Ananyev
> Sent: Thursday, May 22, 2014 5:56 PM
> To: dev-VfR2kkLFssw@public.gmane.org; dev-VfR2kkLFssw@public.gmane.org
> Subject: [dpdk-dev] [PATCH 0/2] L3FWD sample optimisation
> 
> With the latest HW and optimised RX/TX path there is a huge gap between
> testpmd iofwd and l3fwd performance results.
> So this is an attempt to optimise the l3fwd LPM code path and reduce the gap:
>  - Instead of processing each input packet up to completion,
>    divide packet processing into several stages and perform them
>    stage by stage for the whole burst.
>  - Unroll things by a factor of 4 whenever possible.
>  - Use SSE intrinsics for some operations (bswap, replace MAC addresses, etc).
>  - Avoid TX packet buffering whenever possible.
>  - Move some checks from RX/TX into the setup phase.
> 
>  app/test/test_lpm.c                             |   70 ++++
>  examples/l3fwd/main.c                           |  467 +++++++++++++++++++++-
>  lib/librte_eal/common/Makefile                  |    1 +
>  lib/librte_eal/common/include/rte_common_vect.h |   93 +++++
>  lib/librte_lpm/rte_lpm.h                        |  117 ++++++
>  5 files changed, 726 insertions(+), 22 deletions(-)
>  create mode 100644 lib/librte_eal/common/include/rte_common_vect.h
> 
> --
> 1.7.7.6

* Re: [PATCH 0/2] L3FWD sample optimisation
From: Thomas Monjalon @ 2014-06-10 22:44 UTC
  To: Ananyev, Konstantin; +Cc: dev-VfR2kkLFssw

Hi Konstantin,

2014-05-28 09:17, Ananyev, Konstantin:
> Hi Thomas,
> 
> >As you are doing optimizations, it's important to know the performance gain.
> >It could help to mitigate future reworks.
> >So please, could you provide some benchmarking numbers in the commit log?
> 
> Some performance data below.
> Also, I forgot to mention that the new code path can be switched on/off by setting
> the ENABLE_MULTI_BUFFER_OPTIMIZE macro to 1/0.
> Do I need to resubmit the whole patch series, or just a cover letter, or ...?

I think you should resubmit the whole series after having checked it with checkpatch.pl.
Please keep Acked-by and Tested-by lines from previous mails.

Thanks
-- 
Thomas
