* tc tp creation performance degradation since kernel 5.1
@ 2019-06-12 12:03 Jiri Pirko
2019-06-12 12:30 ` Paolo Abeni
` (2 more replies)
0 siblings, 3 replies; 10+ messages in thread
From: Jiri Pirko @ 2019-06-12 12:03 UTC (permalink / raw)
To: netdev; +Cc: vladbu, pablo, xiyou.wangcong, jhs, mlxsw, alexanderk
Hi.
I came across a serious performance degradation when adding many tps
(tc filter instances). I'm using the following script:
------------------------------------------------------------------------
#!/bin/bash
dev=testdummy
ip link add name $dev type dummy
ip link set dev $dev up
tc qdisc add dev $dev ingress
tmp_file_name=$(date +"/tmp/tc_batch.%s.%N.tmp")
pref_id=1
while [ $pref_id -lt 20000 ]
do
echo "filter add dev $dev ingress proto ip pref $pref_id matchall action drop" >> $tmp_file_name
((pref_id++))
done
start=$(date +"%s")
tc -b $tmp_file_name
stop=$(date +"%s")
echo "Insertion duration: $(($stop - $start)) sec"
rm -f $tmp_file_name
ip link del dev $dev
------------------------------------------------------------------------
On my testing vm, result on 5.1 kernel is:
Insertion duration: 3 sec
On net-next this is:
Insertion duration: 54 sec
I did some simple profiling using perf. Output on 5.1 kernel:
77.85% tc [kernel.kallsyms] [k] tcf_chain_tp_find
3.30% tc [kernel.kallsyms] [k] _raw_spin_unlock_irqrestore
1.33% tc_pref_scale.s [kernel.kallsyms] [k] do_syscall_64
0.60% tc_pref_scale.s libc-2.28.so [.] malloc
0.55% tc [kernel.kallsyms] [k] mutex_spin_on_owner
0.51% tc libc-2.28.so [.] __memset_sse2_unaligned_erms
0.40% tc_pref_scale.s libc-2.28.so [.] __gconv_transform_utf8_internal
0.38% tc_pref_scale.s libc-2.28.so [.] _int_free
0.37% tc_pref_scale.s libc-2.28.so [.] __GI___strlen_sse2
0.37% tc [kernel.kallsyms] [k] idr_get_free
Output on net-next:
39.26% tc [kernel.vmlinux] [k] lock_is_held_type
33.99% tc [kernel.vmlinux] [k] tcf_chain_tp_find
12.77% tc [kernel.vmlinux] [k] __asan_load4_noabort
1.90% tc [kernel.vmlinux] [k] __asan_load8_noabort
1.08% tc [kernel.vmlinux] [k] lock_acquire
0.94% tc [kernel.vmlinux] [k] debug_lockdep_rcu_enabled
0.61% tc [kernel.vmlinux] [k] debug_lockdep_rcu_enabled.part.5
0.51% tc [kernel.vmlinux] [k] unwind_next_frame
0.50% tc [kernel.vmlinux] [k] _raw_spin_unlock_irqrestore
0.47% tc_pref_scale.s [kernel.vmlinux] [k] lock_acquire
0.47% tc [kernel.vmlinux] [k] lock_release
I haven't investigated this any further yet. I fear that this might be
related to Vlad's changes in this area. Any ideas?
Thanks!
Jiri
^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: tc tp creation performance degradation since kernel 5.1
2019-06-12 12:03 tc tp creation performance degradation since kernel 5.1 Jiri Pirko
@ 2019-06-12 12:30 ` Paolo Abeni
2019-06-13 4:50 ` Jiri Pirko
2019-06-12 12:34 ` Vlad Buslov
2019-06-13 8:11 ` Jiri Pirko
2 siblings, 1 reply; 10+ messages in thread
From: Paolo Abeni @ 2019-06-12 12:30 UTC (permalink / raw)
To: Jiri Pirko, netdev; +Cc: vladbu, pablo, xiyou.wangcong, jhs, mlxsw, alexanderk
Hi,
On Wed, 2019-06-12 at 14:03 +0200, Jiri Pirko wrote:
> I did some simple profiling using perf. Output on 5.1 kernel:
> 77.85% tc [kernel.kallsyms] [k] tcf_chain_tp_find
> 3.30% tc [kernel.kallsyms] [k] _raw_spin_unlock_irqrestore
> 1.33% tc_pref_scale.s [kernel.kallsyms] [k] do_syscall_64
> 0.60% tc_pref_scale.s libc-2.28.so [.] malloc
> 0.55% tc [kernel.kallsyms] [k] mutex_spin_on_owner
> 0.51% tc libc-2.28.so [.] __memset_sse2_unaligned_erms
> 0.40% tc_pref_scale.s libc-2.28.so [.] __gconv_transform_utf8_internal
> 0.38% tc_pref_scale.s libc-2.28.so [.] _int_free
> 0.37% tc_pref_scale.s libc-2.28.so [.] __GI___strlen_sse2
> 0.37% tc [kernel.kallsyms] [k] idr_get_free
>
> Output on net-next:
> 39.26% tc [kernel.vmlinux] [k] lock_is_held_type
It looks like you have lockdep enabled here, but not on the 5.1 build.
That would explain such a large perf difference.
Can you please double-check?
thanks,
Paolo
^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: tc tp creation performance degradation since kernel 5.1
2019-06-12 12:03 tc tp creation performance degradation since kernel 5.1 Jiri Pirko
2019-06-12 12:30 ` Paolo Abeni
@ 2019-06-12 12:34 ` Vlad Buslov
2019-06-13 5:49 ` Jiri Pirko
2019-06-13 8:11 ` Jiri Pirko
2 siblings, 1 reply; 10+ messages in thread
From: Vlad Buslov @ 2019-06-12 12:34 UTC (permalink / raw)
To: Jiri Pirko
Cc: netdev@vger.kernel.org, Vlad Buslov, pablo@netfilter.org,
xiyou.wangcong@gmail.com, jhs@mojatatu.com, mlxsw, Alex Kushnarov
On Wed 12 Jun 2019 at 15:03, Jiri Pirko <jiri@resnulli.us> wrote:
> Hi.
>
> I came across a serious performance degradation when adding many tps. I'm
> using the following script:
>
> ------------------------------------------------------------------------
> #!/bin/bash
>
> dev=testdummy
> ip link add name $dev type dummy
> ip link set dev $dev up
> tc qdisc add dev $dev ingress
>
> tmp_file_name=$(date +"/tmp/tc_batch.%s.%N.tmp")
> pref_id=1
>
> while [ $pref_id -lt 20000 ]
> do
> echo "filter add dev $dev ingress proto ip pref $pref_id matchall action drop" >> $tmp_file_name
> ((pref_id++))
> done
>
> start=$(date +"%s")
> tc -b $tmp_file_name
> stop=$(date +"%s")
> echo "Insertion duration: $(($stop - $start)) sec"
> rm -f $tmp_file_name
>
> ip link del dev $dev
> ------------------------------------------------------------------------
>
> On my testing vm, result on 5.1 kernel is:
> Insertion duration: 3 sec
> On net-next this is:
> Insertion duration: 54 sec
>
> I did some simple profiling using perf. Output on 5.1 kernel:
> 77.85% tc [kernel.kallsyms] [k] tcf_chain_tp_find
> 3.30% tc [kernel.kallsyms] [k] _raw_spin_unlock_irqrestore
> 1.33% tc_pref_scale.s [kernel.kallsyms] [k] do_syscall_64
> 0.60% tc_pref_scale.s libc-2.28.so [.] malloc
> 0.55% tc [kernel.kallsyms] [k] mutex_spin_on_owner
> 0.51% tc libc-2.28.so [.] __memset_sse2_unaligned_erms
> 0.40% tc_pref_scale.s libc-2.28.so [.] __gconv_transform_utf8_internal
> 0.38% tc_pref_scale.s libc-2.28.so [.] _int_free
> 0.37% tc_pref_scale.s libc-2.28.so [.] __GI___strlen_sse2
> 0.37% tc [kernel.kallsyms] [k] idr_get_free
Are these results for the same config? Here I don't see any lockdep or
KASAN. However, in the next trace...
>
> Output on net-next:
> 39.26% tc [kernel.vmlinux] [k] lock_is_held_type
> 33.99% tc [kernel.vmlinux] [k] tcf_chain_tp_find
> 12.77% tc [kernel.vmlinux] [k] __asan_load4_noabort
> 1.90% tc [kernel.vmlinux] [k] __asan_load8_noabort
> 1.08% tc [kernel.vmlinux] [k] lock_acquire
> 0.94% tc [kernel.vmlinux] [k] debug_lockdep_rcu_enabled
> 0.61% tc [kernel.vmlinux] [k] debug_lockdep_rcu_enabled.part.5
> 0.51% tc [kernel.vmlinux] [k] unwind_next_frame
> 0.50% tc [kernel.vmlinux] [k] _raw_spin_unlock_irqrestore
> 0.47% tc_pref_scale.s [kernel.vmlinux] [k] lock_acquire
> 0.47% tc [kernel.vmlinux] [k] lock_release
... both lockdep and KASAN consume most of the CPU time.
BTW it takes 5 sec to execute your script on my system with net-next
(debug options disabled).
>
> I didn't investigate this any further now. I fear that this might be
> related to Vlad's changes in the area. Any ideas?
>
> Thanks!
>
> Jiri
^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: tc tp creation performance degradation since kernel 5.1
2019-06-12 12:30 ` Paolo Abeni
@ 2019-06-13 4:50 ` Jiri Pirko
0 siblings, 0 replies; 10+ messages in thread
From: Jiri Pirko @ 2019-06-13 4:50 UTC (permalink / raw)
To: Paolo Abeni; +Cc: netdev, vladbu, pablo, xiyou.wangcong, jhs, mlxsw, alexanderk
Wed, Jun 12, 2019 at 02:30:37PM CEST, pabeni@redhat.com wrote:
>Hi,
>
>On Wed, 2019-06-12 at 14:03 +0200, Jiri Pirko wrote:
>> I did some simple profiling using perf. Output on 5.1 kernel:
>> 77.85% tc [kernel.kallsyms] [k] tcf_chain_tp_find
>> 3.30% tc [kernel.kallsyms] [k] _raw_spin_unlock_irqrestore
>> 1.33% tc_pref_scale.s [kernel.kallsyms] [k] do_syscall_64
>> 0.60% tc_pref_scale.s libc-2.28.so [.] malloc
>> 0.55% tc [kernel.kallsyms] [k] mutex_spin_on_owner
>> 0.51% tc libc-2.28.so [.] __memset_sse2_unaligned_erms
>> 0.40% tc_pref_scale.s libc-2.28.so [.] __gconv_transform_utf8_internal
>> 0.38% tc_pref_scale.s libc-2.28.so [.] _int_free
>> 0.37% tc_pref_scale.s libc-2.28.so [.] __GI___strlen_sse2
>> 0.37% tc [kernel.kallsyms] [k] idr_get_free
>>
>> Output on net-next:
>> 39.26% tc [kernel.vmlinux] [k] lock_is_held_type
>
>It looks like you have lockdep enabled here, but not on the 5.1 build.
>
>That would explain such a large perf difference.
>
>Can you please double check?
Will do.
>
>thanks,
>
>Paolo
>
^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: tc tp creation performance degradation since kernel 5.1
2019-06-12 12:34 ` Vlad Buslov
@ 2019-06-13 5:49 ` Jiri Pirko
0 siblings, 0 replies; 10+ messages in thread
From: Jiri Pirko @ 2019-06-13 5:49 UTC (permalink / raw)
To: Vlad Buslov
Cc: netdev@vger.kernel.org, pablo@netfilter.org,
xiyou.wangcong@gmail.com, jhs@mojatatu.com, mlxsw, Alex Kushnarov
Wed, Jun 12, 2019 at 02:34:02PM CEST, vladbu@mellanox.com wrote:
>
>On Wed 12 Jun 2019 at 15:03, Jiri Pirko <jiri@resnulli.us> wrote:
>> Hi.
>>
>> I came across a serious performance degradation when adding many tps. I'm
>> using the following script:
>>
>> ------------------------------------------------------------------------
>> #!/bin/bash
>>
>> dev=testdummy
>> ip link add name $dev type dummy
>> ip link set dev $dev up
>> tc qdisc add dev $dev ingress
>>
>> tmp_file_name=$(date +"/tmp/tc_batch.%s.%N.tmp")
>> pref_id=1
>>
>> while [ $pref_id -lt 20000 ]
>> do
>> echo "filter add dev $dev ingress proto ip pref $pref_id matchall action drop" >> $tmp_file_name
>> ((pref_id++))
>> done
>>
>> start=$(date +"%s")
>> tc -b $tmp_file_name
>> stop=$(date +"%s")
>> echo "Insertion duration: $(($stop - $start)) sec"
>> rm -f $tmp_file_name
>>
>> ip link del dev $dev
>> ------------------------------------------------------------------------
>>
>> On my testing vm, result on 5.1 kernel is:
>> Insertion duration: 3 sec
>> On net-next this is:
>> Insertion duration: 54 sec
>>
>> I did some simple profiling using perf. Output on 5.1 kernel:
>> 77.85% tc [kernel.kallsyms] [k] tcf_chain_tp_find
>> 3.30% tc [kernel.kallsyms] [k] _raw_spin_unlock_irqrestore
>> 1.33% tc_pref_scale.s [kernel.kallsyms] [k] do_syscall_64
>> 0.60% tc_pref_scale.s libc-2.28.so [.] malloc
>> 0.55% tc [kernel.kallsyms] [k] mutex_spin_on_owner
>> 0.51% tc libc-2.28.so [.] __memset_sse2_unaligned_erms
>> 0.40% tc_pref_scale.s libc-2.28.so [.] __gconv_transform_utf8_internal
>> 0.38% tc_pref_scale.s libc-2.28.so [.] _int_free
>> 0.37% tc_pref_scale.s libc-2.28.so [.] __GI___strlen_sse2
>> 0.37% tc [kernel.kallsyms] [k] idr_get_free
>
>Are these results for the same config? Here I don't see any lockdep or
>KASAN. However, in the next trace...
>
>>
>> Output on net-next:
>> 39.26% tc [kernel.vmlinux] [k] lock_is_held_type
>> 33.99% tc [kernel.vmlinux] [k] tcf_chain_tp_find
>> 12.77% tc [kernel.vmlinux] [k] __asan_load4_noabort
>> 1.90% tc [kernel.vmlinux] [k] __asan_load8_noabort
>> 1.08% tc [kernel.vmlinux] [k] lock_acquire
>> 0.94% tc [kernel.vmlinux] [k] debug_lockdep_rcu_enabled
>> 0.61% tc [kernel.vmlinux] [k] debug_lockdep_rcu_enabled.part.5
>> 0.51% tc [kernel.vmlinux] [k] unwind_next_frame
>> 0.50% tc [kernel.vmlinux] [k] _raw_spin_unlock_irqrestore
>> 0.47% tc_pref_scale.s [kernel.vmlinux] [k] lock_acquire
>> 0.47% tc [kernel.vmlinux] [k] lock_release
>
>... both lockdep and KASAN consume most of the CPU time.
>
>BTW it takes 5 sec to execute your script on my system with net-next
>(debug options disabled).
You are right, my bad. Sorry for the noise.
>
>>
>> I didn't investigate this any further now. I fear that this might be
>> related to Vlad's changes in the area. Any ideas?
>>
>> Thanks!
>>
>> Jiri
^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: tc tp creation performance degradation since kernel 5.1
2019-06-12 12:03 tc tp creation performance degradation since kernel 5.1 Jiri Pirko
2019-06-12 12:30 ` Paolo Abeni
2019-06-12 12:34 ` Vlad Buslov
@ 2019-06-13 8:11 ` Jiri Pirko
2019-06-13 10:09 ` Vlad Buslov
2 siblings, 1 reply; 10+ messages in thread
From: Jiri Pirko @ 2019-06-13 8:11 UTC (permalink / raw)
To: netdev; +Cc: vladbu, pablo, xiyou.wangcong, jhs, mlxsw, alexanderk, pabeni
I made a mistake during measurements, sorry about that.
This is the correct script:
-----------------------------------------------------------------------
#!/bin/bash
dev=testdummy
ip link add name $dev type dummy
ip link set dev $dev up
tc qdisc add dev $dev ingress
tmp_file_name=$(date +"/tmp/tc_batch.%s.%N.tmp")
pref_id=1
while [ $pref_id -lt 20000 ]
do
echo "filter add dev $dev ingress proto ip pref $pref_id flower action drop" >> $tmp_file_name
#echo "filter add dev $dev ingress proto ip pref $pref_id matchall action drop" >> $tmp_file_name
((pref_id++))
done
start=$(date +"%s")
tc -b $tmp_file_name
stop=$(date +"%s")
echo "Insertion duration: $(($stop - $start)) sec"
rm -f $tmp_file_name
ip link del dev $dev
-----------------------------------------------------------------------
Note the commented-out matchall line: I don't see the regression with
matchall. However, I do see it with flower:
kernel 5.1
Insertion duration: 4 sec
kernel 5.2
Insertion duration: 163 sec
I don't see any significant difference in perf:
kernel 5.1
77.24% tc [kernel.vmlinux] [k] tcf_chain_tp_find
1.67% tc [kernel.vmlinux] [k] mutex_spin_on_owner
1.44% tc [kernel.vmlinux] [k] _raw_spin_unlock_irqrestore
0.93% tc [kernel.vmlinux] [k] idr_get_free
0.79% tc_pref_scale_o [kernel.vmlinux] [k] do_syscall_64
0.69% tc [kernel.vmlinux] [k] finish_task_switch
0.53% tc libc-2.28.so [.] __memset_sse2_unaligned_erms
0.49% tc [kernel.vmlinux] [k] __memset
0.36% tc_pref_scale_o libc-2.28.so [.] malloc
0.30% tc_pref_scale_o libc-2.28.so [.] _int_free
0.24% tc [kernel.vmlinux] [k] __memcpy
0.23% tc [cls_flower] [k] fl_change
0.23% tc [kernel.vmlinux] [k] __nla_validate_parse
0.22% tc [kernel.vmlinux] [k] __slab_alloc
kernel 5.2
75.57% tc [kernel.kallsyms] [k] tcf_chain_tp_find
2.70% tc [kernel.kallsyms] [k] _raw_spin_unlock_irqrestore
1.13% tc_pref_scale_o [kernel.kallsyms] [k] do_syscall_64
0.87% tc libc-2.28.so [.] __memset_sse2_unaligned_erms
0.86% ip [kernel.kallsyms] [k] finish_task_switch
0.67% tc [kernel.kallsyms] [k] memset
0.63% tc [kernel.kallsyms] [k] mutex_spin_on_owner
0.52% tc_pref_scale_o libc-2.28.so [.] malloc
0.48% tc [kernel.kallsyms] [k] idr_get_free
0.46% tc [kernel.kallsyms] [k] fl_change
0.42% tc_pref_scale_o libc-2.28.so [.] _int_free
0.35% tc_pref_scale_o libc-2.28.so [.] __GI___strlen_sse2
0.35% tc_pref_scale_o libc-2.28.so [.] __mbrtowc
0.34% tc_pref_scale_o libc-2.28.so [.] __fcntl64_nocancel_adjusted
Any ideas?
^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: tc tp creation performance degradation since kernel 5.1
2019-06-13 8:11 ` Jiri Pirko
@ 2019-06-13 10:09 ` Vlad Buslov
2019-06-13 11:11 ` Jiri Pirko
0 siblings, 1 reply; 10+ messages in thread
From: Vlad Buslov @ 2019-06-13 10:09 UTC (permalink / raw)
To: Jiri Pirko
Cc: netdev@vger.kernel.org, Vlad Buslov, pablo@netfilter.org,
xiyou.wangcong@gmail.com, jhs@mojatatu.com, mlxsw, Alex Kushnarov,
pabeni@redhat.com
On Thu 13 Jun 2019 at 11:11, Jiri Pirko <jiri@resnulli.us> wrote:
> I made a mistake during measurements, sorry about that.
>
> This is the correct script:
> -----------------------------------------------------------------------
> #!/bin/bash
>
> dev=testdummy
> ip link add name $dev type dummy
> ip link set dev $dev up
> tc qdisc add dev $dev ingress
>
> tmp_file_name=$(date +"/tmp/tc_batch.%s.%N.tmp")
> pref_id=1
>
> while [ $pref_id -lt 20000 ]
> do
> echo "filter add dev $dev ingress proto ip pref $pref_id flower action drop" >> $tmp_file_name
> #echo "filter add dev $dev ingress proto ip pref $pref_id matchall action drop" >> $tmp_file_name
> ((pref_id++))
> done
>
> start=$(date +"%s")
> tc -b $tmp_file_name
> stop=$(date +"%s")
> echo "Insertion duration: $(($stop - $start)) sec"
> rm -f $tmp_file_name
>
> ip link del dev $dev
> -----------------------------------------------------------------------
>
> Note the commented out matchall. I don't see the regression with
> matchall. However, I see that with flower:
> kernel 5.1
> Insertion duration: 4 sec
> kernel 5.2
> Insertion duration: 163 sec
>
> I don't see any significant difference in perf:
> kernel 5.1
> 77.24% tc [kernel.vmlinux] [k] tcf_chain_tp_find
> 1.67% tc [kernel.vmlinux] [k] mutex_spin_on_owner
> 1.44% tc [kernel.vmlinux] [k] _raw_spin_unlock_irqrestore
> 0.93% tc [kernel.vmlinux] [k] idr_get_free
> 0.79% tc_pref_scale_o [kernel.vmlinux] [k] do_syscall_64
> 0.69% tc [kernel.vmlinux] [k] finish_task_switch
> 0.53% tc libc-2.28.so [.] __memset_sse2_unaligned_erms
> 0.49% tc [kernel.vmlinux] [k] __memset
> 0.36% tc_pref_scale_o libc-2.28.so [.] malloc
> 0.30% tc_pref_scale_o libc-2.28.so [.] _int_free
> 0.24% tc [kernel.vmlinux] [k] __memcpy
> 0.23% tc [cls_flower] [k] fl_change
> 0.23% tc [kernel.vmlinux] [k] __nla_validate_parse
> 0.22% tc [kernel.vmlinux] [k] __slab_alloc
>
>
> 75.57% tc [kernel.kallsyms] [k] tcf_chain_tp_find
> 2.70% tc [kernel.kallsyms] [k] _raw_spin_unlock_irqrestore
> 1.13% tc_pref_scale_o [kernel.kallsyms] [k] do_syscall_64
> 0.87% tc libc-2.28.so [.] __memset_sse2_unaligned_erms
> 0.86% ip [kernel.kallsyms] [k] finish_task_switch
> 0.67% tc [kernel.kallsyms] [k] memset
> 0.63% tc [kernel.kallsyms] [k] mutex_spin_on_owner
> 0.52% tc_pref_scale_o libc-2.28.so [.] malloc
> 0.48% tc [kernel.kallsyms] [k] idr_get_free
> 0.46% tc [kernel.kallsyms] [k] fl_change
> 0.42% tc_pref_scale_o libc-2.28.so [.] _int_free
> 0.35% tc_pref_scale_o libc-2.28.so [.] __GI___strlen_sse2
> 0.35% tc_pref_scale_o libc-2.28.so [.] __mbrtowc
> 0.34% tc_pref_scale_o libc-2.28.so [.] __fcntl64_nocancel_adjusted
>
> Any ideas?
Thanks for providing the reproduction script!
I've investigated the problem and found the root cause. First of all, I
noticed that CPU utilization during the problematic tc run is quite low
(<10%), so I decided to investigate why tc sleeps so much. I used bcc
and obtained the following off-CPU trace (uninteresting traces are omitted
for brevity):
~$ sudo /usr/share/bcc/tools/offcputime -K -p `pgrep -nx tc`
Tracing off-CPU time (us) of PID 2069 by kernel stack... Hit Ctrl-C to end.
...
finish_task_switch
__sched_text_start
schedule
schedule_timeout
wait_for_completion
__wait_rcu_gp
synchronize_rcu
fl_change
tc_new_tfilter
rtnetlink_rcv_msg
netlink_rcv_skb
netlink_unicast
netlink_sendmsg
sock_sendmsg
___sys_sendmsg
__sys_sendmsg
do_syscall_64
entry_SYSCALL_64_after_hwframe
- tc (2069)
142284953
As you can see, 142 seconds are spent sleeping in synchronize_rcu(). The
code is in the fl_create_new_mask() function:
err = rhashtable_replace_fast(&head->ht, &mask->ht_node,
&newmask->ht_node, mask_ht_params);
if (err)
goto errout_destroy;
/* Wait until any potential concurrent users of mask are finished */
synchronize_rcu();
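That total is consistent with paying one RCU grace period (typically a few
milliseconds) for each of the ~20k inserted filters. A quick
back-of-the-envelope check, using the numbers from the trace and the script:

```shell
# Total off-CPU time spent in synchronize_rcu() (us, from the trace above),
# divided by the number of filters the script inserts (pref_id 1..19999).
total_us=142284953
n_filters=19999
echo "$(( total_us / n_filters )) us per insertion"   # prints: 7114 us per insertion
```

That is roughly 7 ms per filter, in line with a single synchronize_rcu()
grace period on an otherwise idle system.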
The justification for this is described in a comment in
fl_check_assign_mask() (a user of fl_create_new_mask()):
/* Insert mask as temporary node to prevent concurrent creation of mask
* with same key. Any concurrent lookups with same key will return
* -EAGAIN because mask's refcnt is zero. It is safe to insert
* stack-allocated 'mask' to masks hash table because we call
* synchronize_rcu() before returning from this function (either in case
* of error or after replacing it with heap-allocated mask in
* fl_create_new_mask()).
*/
fnew->mask = rhashtable_lookup_get_insert_fast(&head->ht,
&mask->ht_node,
mask_ht_params);
The offending commit is part of my series that implements unlocked
flower: 195c234d15c9 ("net: sched: flower: handle concurrent mask
insertion")
The justification presented in it is no longer relevant since Ivan
Vecera changed the mask to be dynamically allocated in commit 2cddd2014782
("net/sched: cls_flower: allocate mask dynamically in fl_change()").
With this we can just change fl_change() to deallocate the temporary mask
after an RCU grace period and remove the offending synchronize_rcu() call.
Any other suggestions?
^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: tc tp creation performance degradation since kernel 5.1
2019-06-13 10:09 ` Vlad Buslov
@ 2019-06-13 11:11 ` Jiri Pirko
2019-06-13 11:26 ` Vlad Buslov
0 siblings, 1 reply; 10+ messages in thread
From: Jiri Pirko @ 2019-06-13 11:11 UTC (permalink / raw)
To: Vlad Buslov
Cc: netdev@vger.kernel.org, pablo@netfilter.org,
xiyou.wangcong@gmail.com, jhs@mojatatu.com, mlxsw, Alex Kushnarov,
pabeni@redhat.com
Thu, Jun 13, 2019 at 12:09:32PM CEST, vladbu@mellanox.com wrote:
>On Thu 13 Jun 2019 at 11:11, Jiri Pirko <jiri@resnulli.us> wrote:
>> I made a mistake during measurements, sorry about that.
>>
>> This is the correct script:
>> -----------------------------------------------------------------------
>> #!/bin/bash
>>
>> dev=testdummy
>> ip link add name $dev type dummy
>> ip link set dev $dev up
>> tc qdisc add dev $dev ingress
>>
>> tmp_file_name=$(date +"/tmp/tc_batch.%s.%N.tmp")
>> pref_id=1
>>
>> while [ $pref_id -lt 20000 ]
>> do
>> echo "filter add dev $dev ingress proto ip pref $pref_id flower action drop" >> $tmp_file_name
>> #echo "filter add dev $dev ingress proto ip pref $pref_id matchall action drop" >> $tmp_file_name
>> ((pref_id++))
>> done
>>
>> start=$(date +"%s")
>> tc -b $tmp_file_name
>> stop=$(date +"%s")
>> echo "Insertion duration: $(($stop - $start)) sec"
>> rm -f $tmp_file_name
>>
>> ip link del dev $dev
>> -----------------------------------------------------------------------
>>
>> Note the commented out matchall. I don't see the regression with
>> matchall. However, I see that with flower:
>> kernel 5.1
>> Insertion duration: 4 sec
>> kernel 5.2
>> Insertion duration: 163 sec
>>
>> I don't see any significant difference in perf:
>> kernel 5.1
>> 77.24% tc [kernel.vmlinux] [k] tcf_chain_tp_find
>> 1.67% tc [kernel.vmlinux] [k] mutex_spin_on_owner
>> 1.44% tc [kernel.vmlinux] [k] _raw_spin_unlock_irqrestore
>> 0.93% tc [kernel.vmlinux] [k] idr_get_free
>> 0.79% tc_pref_scale_o [kernel.vmlinux] [k] do_syscall_64
>> 0.69% tc [kernel.vmlinux] [k] finish_task_switch
>> 0.53% tc libc-2.28.so [.] __memset_sse2_unaligned_erms
>> 0.49% tc [kernel.vmlinux] [k] __memset
>> 0.36% tc_pref_scale_o libc-2.28.so [.] malloc
>> 0.30% tc_pref_scale_o libc-2.28.so [.] _int_free
>> 0.24% tc [kernel.vmlinux] [k] __memcpy
>> 0.23% tc [cls_flower] [k] fl_change
>> 0.23% tc [kernel.vmlinux] [k] __nla_validate_parse
>> 0.22% tc [kernel.vmlinux] [k] __slab_alloc
>>
>>
>> 75.57% tc [kernel.kallsyms] [k] tcf_chain_tp_find
>> 2.70% tc [kernel.kallsyms] [k] _raw_spin_unlock_irqrestore
>> 1.13% tc_pref_scale_o [kernel.kallsyms] [k] do_syscall_64
>> 0.87% tc libc-2.28.so [.] __memset_sse2_unaligned_erms
>> 0.86% ip [kernel.kallsyms] [k] finish_task_switch
>> 0.67% tc [kernel.kallsyms] [k] memset
>> 0.63% tc [kernel.kallsyms] [k] mutex_spin_on_owner
>> 0.52% tc_pref_scale_o libc-2.28.so [.] malloc
>> 0.48% tc [kernel.kallsyms] [k] idr_get_free
>> 0.46% tc [kernel.kallsyms] [k] fl_change
>> 0.42% tc_pref_scale_o libc-2.28.so [.] _int_free
>> 0.35% tc_pref_scale_o libc-2.28.so [.] __GI___strlen_sse2
>> 0.35% tc_pref_scale_o libc-2.28.so [.] __mbrtowc
>> 0.34% tc_pref_scale_o libc-2.28.so [.] __fcntl64_nocancel_adjusted
>>
>> Any ideas?
>
>Thanks for providing reproduction script!
>
>I've investigated the problem and found the root cause. First of all I
>noticed that CPU utilization during problematic tc run is quite low
>(<10%), so I decided to investigate why tc sleeps so much. I've used bcc
>and obtained following off-CPU trace (uninteresting traces are omitted
>for brevity):
>
>~$ sudo /usr/share/bcc/tools/offcputime -K -p `pgrep -nx tc`
>Tracing off-CPU time (us) of PID 2069 by kernel stack... Hit Ctrl-C to end.
>...
> finish_task_switch
> __sched_text_start
> schedule
> schedule_timeout
> wait_for_completion
> __wait_rcu_gp
> synchronize_rcu
> fl_change
> tc_new_tfilter
> rtnetlink_rcv_msg
> netlink_rcv_skb
> netlink_unicast
> netlink_sendmsg
> sock_sendmsg
> ___sys_sendmsg
> __sys_sendmsg
> do_syscall_64
> entry_SYSCALL_64_after_hwframe
> - tc (2069)
> 142284953
>
>As you can see 142 seconds are spent sleeping in synchronize_rcu(). The
>code is in fl_create_new_mask() function:
>
> err = rhashtable_replace_fast(&head->ht, &mask->ht_node,
> &newmask->ht_node, mask_ht_params);
> if (err)
> goto errout_destroy;
>
> /* Wait until any potential concurrent users of mask are finished */
> synchronize_rcu();
>
>The justification for this is described in comment in
>fl_check_assign_mask() (user of fl_create_new_mask()):
>
> /* Insert mask as temporary node to prevent concurrent creation of mask
> * with same key. Any concurrent lookups with same key will return
> * -EAGAIN because mask's refcnt is zero. It is safe to insert
> * stack-allocated 'mask' to masks hash table because we call
> * synchronize_rcu() before returning from this function (either in case
> * of error or after replacing it with heap-allocated mask in
> * fl_create_new_mask()).
> */
> fnew->mask = rhashtable_lookup_get_insert_fast(&head->ht,
> &mask->ht_node,
> mask_ht_params);
>
>The offending commit is part of my series that implements unlocked
>flower: 195c234d15c9 ("net: sched: flower: handle concurrent mask
>insertion")
>
>The justification presented in it is no longer relevant since Ivan
>Vecera changed mask to be dynamically allocated in commit 2cddd2014782
>("net/sched: cls_flower: allocate mask dynamically in fl_change()").
>With this we can just change fl_change() to deallocate temporary mask
>with rcu grace period and remove the offending synchronize_rcu() call.
>
>Any other suggestions?
So basically you just change synchronize_rcu() to kfree_rcu(mask),
correct?
^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: tc tp creation performance degradation since kernel 5.1
2019-06-13 11:11 ` Jiri Pirko
@ 2019-06-13 11:26 ` Vlad Buslov
2019-06-13 14:18 ` Jiri Pirko
0 siblings, 1 reply; 10+ messages in thread
From: Vlad Buslov @ 2019-06-13 11:26 UTC (permalink / raw)
To: Jiri Pirko
Cc: Vlad Buslov, netdev@vger.kernel.org, pablo@netfilter.org,
xiyou.wangcong@gmail.com, jhs@mojatatu.com, mlxsw, Alex Kushnarov,
pabeni@redhat.com
On Thu 13 Jun 2019 at 14:11, Jiri Pirko <jiri@resnulli.us> wrote:
> Thu, Jun 13, 2019 at 12:09:32PM CEST, vladbu@mellanox.com wrote:
>>On Thu 13 Jun 2019 at 11:11, Jiri Pirko <jiri@resnulli.us> wrote:
>>> I made a mistake during measurements, sorry about that.
>>>
>>> This is the correct script:
>>> -----------------------------------------------------------------------
>>> #!/bin/bash
>>>
>>> dev=testdummy
>>> ip link add name $dev type dummy
>>> ip link set dev $dev up
>>> tc qdisc add dev $dev ingress
>>>
>>> tmp_file_name=$(date +"/tmp/tc_batch.%s.%N.tmp")
>>> pref_id=1
>>>
>>> while [ $pref_id -lt 20000 ]
>>> do
>>> echo "filter add dev $dev ingress proto ip pref $pref_id flower action drop" >> $tmp_file_name
>>> #echo "filter add dev $dev ingress proto ip pref $pref_id matchall action drop" >> $tmp_file_name
>>> ((pref_id++))
>>> done
>>>
>>> start=$(date +"%s")
>>> tc -b $tmp_file_name
>>> stop=$(date +"%s")
>>> echo "Insertion duration: $(($stop - $start)) sec"
>>> rm -f $tmp_file_name
>>>
>>> ip link del dev $dev
>>> -----------------------------------------------------------------------
>>>
>>> Note the commented out matchall. I don't see the regression with
>>> matchall. However, I see that with flower:
>>> kernel 5.1
>>> Insertion duration: 4 sec
>>> kernel 5.2
>>> Insertion duration: 163 sec
>>>
>>> I don't see any significant difference in perf:
>>> kernel 5.1
>>> 77.24% tc [kernel.vmlinux] [k] tcf_chain_tp_find
>>> 1.67% tc [kernel.vmlinux] [k] mutex_spin_on_owner
>>> 1.44% tc [kernel.vmlinux] [k] _raw_spin_unlock_irqrestore
>>> 0.93% tc [kernel.vmlinux] [k] idr_get_free
>>> 0.79% tc_pref_scale_o [kernel.vmlinux] [k] do_syscall_64
>>> 0.69% tc [kernel.vmlinux] [k] finish_task_switch
>>> 0.53% tc libc-2.28.so [.] __memset_sse2_unaligned_erms
>>> 0.49% tc [kernel.vmlinux] [k] __memset
>>> 0.36% tc_pref_scale_o libc-2.28.so [.] malloc
>>> 0.30% tc_pref_scale_o libc-2.28.so [.] _int_free
>>> 0.24% tc [kernel.vmlinux] [k] __memcpy
>>> 0.23% tc [cls_flower] [k] fl_change
>>> 0.23% tc [kernel.vmlinux] [k] __nla_validate_parse
>>> 0.22% tc [kernel.vmlinux] [k] __slab_alloc
>>>
>>>
>>> 75.57% tc [kernel.kallsyms] [k] tcf_chain_tp_find
>>> 2.70% tc [kernel.kallsyms] [k] _raw_spin_unlock_irqrestore
>>> 1.13% tc_pref_scale_o [kernel.kallsyms] [k] do_syscall_64
>>> 0.87% tc libc-2.28.so [.] __memset_sse2_unaligned_erms
>>> 0.86% ip [kernel.kallsyms] [k] finish_task_switch
>>> 0.67% tc [kernel.kallsyms] [k] memset
>>> 0.63% tc [kernel.kallsyms] [k] mutex_spin_on_owner
>>> 0.52% tc_pref_scale_o libc-2.28.so [.] malloc
>>> 0.48% tc [kernel.kallsyms] [k] idr_get_free
>>> 0.46% tc [kernel.kallsyms] [k] fl_change
>>> 0.42% tc_pref_scale_o libc-2.28.so [.] _int_free
>>> 0.35% tc_pref_scale_o libc-2.28.so [.] __GI___strlen_sse2
>>> 0.35% tc_pref_scale_o libc-2.28.so [.] __mbrtowc
>>> 0.34% tc_pref_scale_o libc-2.28.so [.] __fcntl64_nocancel_adjusted
>>>
>>> Any ideas?
>>
>>Thanks for providing reproduction script!
>>
>>I've investigated the problem and found the root cause. First of all I
>>noticed that CPU utilization during problematic tc run is quite low
>>(<10%), so I decided to investigate why tc sleeps so much. I've used bcc
>>and obtained following off-CPU trace (uninteresting traces are omitted
>>for brevity):
>>
>>~$ sudo /usr/share/bcc/tools/offcputime -K -p `pgrep -nx tc`
>>Tracing off-CPU time (us) of PID 2069 by kernel stack... Hit Ctrl-C to end.
>>...
>> finish_task_switch
>> __sched_text_start
>> schedule
>> schedule_timeout
>> wait_for_completion
>> __wait_rcu_gp
>> synchronize_rcu
>> fl_change
>> tc_new_tfilter
>> rtnetlink_rcv_msg
>> netlink_rcv_skb
>> netlink_unicast
>> netlink_sendmsg
>> sock_sendmsg
>> ___sys_sendmsg
>> __sys_sendmsg
>> do_syscall_64
>> entry_SYSCALL_64_after_hwframe
>> - tc (2069)
>> 142284953
>>
>>As you can see 142 seconds are spent sleeping in synchronize_rcu(). The
>>code is in fl_create_new_mask() function:
>>
>> err = rhashtable_replace_fast(&head->ht, &mask->ht_node,
>> &newmask->ht_node, mask_ht_params);
>> if (err)
>> goto errout_destroy;
>>
>> /* Wait until any potential concurrent users of mask are finished */
>> synchronize_rcu();
>>
>>The justification for this is described in comment in
>>fl_check_assign_mask() (user of fl_create_new_mask()):
>>
>> /* Insert mask as temporary node to prevent concurrent creation of mask
>> * with same key. Any concurrent lookups with same key will return
>> * -EAGAIN because mask's refcnt is zero. It is safe to insert
>> * stack-allocated 'mask' to masks hash table because we call
>> * synchronize_rcu() before returning from this function (either in case
>> * of error or after replacing it with heap-allocated mask in
>> * fl_create_new_mask()).
>> */
>> fnew->mask = rhashtable_lookup_get_insert_fast(&head->ht,
>> &mask->ht_node,
>> mask_ht_params);
>>
>>The offending commit is part of my series that implements unlocked
>>flower: 195c234d15c9 ("net: sched: flower: handle concurrent mask
>>insertion")
>>
>>The justification presented in it is no longer relevant since Ivan
>>Vecera changed mask to be dynamically allocated in commit 2cddd2014782
>>("net/sched: cls_flower: allocate mask dynamically in fl_change()").
>>With this we can just change fl_change() to deallocate temporary mask
>>with rcu grace period and remove the offending synchronize_rcu() call.
>>
>>Any other suggestions?
>
> So basically you just change synchronize_rcu() to kfree_rcu(mask),
> correct?
Not really. I remove synchronize_rcu() and change all kfree(mask) calls in
fl_change() to tcf_queue_work(&mask->rwork, fl_mask_free_work), which
uses queue_rcu_work() internally. This allows us to deallocate
fl_flow_mask in the same manner on all code paths and doesn't require any
extensions to the fl_flow_mask struct (kfree_rcu() would require extending
it with an rcu_head).
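Roughly, the pattern would look like this (an untested sketch, not a patch;
it assumes fl_flow_mask has, or gains, a struct rcu_work rwork member, and
fl_mask_free_work is the new deferred worker):

```c
/* Sketch only: free the temporary mask after an RCU grace period via
 * tcf_queue_work(), the same way flower already defers freeing filters.
 */
static void fl_mask_free_work(struct work_struct *work)
{
	struct fl_flow_mask *mask = container_of(to_rcu_work(work),
						 struct fl_flow_mask, rwork);

	kfree(mask);
}

/* In fl_change()/fl_check_assign_mask() cleanup paths, instead of
 * synchronize_rcu(); ... kfree(mask); do:
 */
tcf_queue_work(&mask->rwork, fl_mask_free_work);
```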
^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: tc tp creation performance degradation since kernel 5.1
2019-06-13 11:26 ` Vlad Buslov
@ 2019-06-13 14:18 ` Jiri Pirko
0 siblings, 0 replies; 10+ messages in thread
From: Jiri Pirko @ 2019-06-13 14:18 UTC
To: Vlad Buslov
Cc: netdev@vger.kernel.org, pablo@netfilter.org,
xiyou.wangcong@gmail.com, jhs@mojatatu.com, mlxsw, Alex Kushnarov,
pabeni@redhat.com
Thu, Jun 13, 2019 at 01:26:17PM CEST, vladbu@mellanox.com wrote:
>
>On Thu 13 Jun 2019 at 14:11, Jiri Pirko <jiri@resnulli.us> wrote:
>> Thu, Jun 13, 2019 at 12:09:32PM CEST, vladbu@mellanox.com wrote:
>>>On Thu 13 Jun 2019 at 11:11, Jiri Pirko <jiri@resnulli.us> wrote:
>>>> I made a mistake during measurements, sorry about that.
>>>>
>>>> This is the correct script:
>>>> -----------------------------------------------------------------------
>>>> #!/bin/bash
>>>>
>>>> dev=testdummy
>>>> ip link add name $dev type dummy
>>>> ip link set dev $dev up
>>>> tc qdisc add dev $dev ingress
>>>>
>>>> tmp_file_name=$(date +"/tmp/tc_batch.%s.%N.tmp")
>>>> pref_id=1
>>>>
>>>> while [ $pref_id -lt 20000 ]
>>>> do
>>>> echo "filter add dev $dev ingress proto ip pref $pref_id flower action drop" >> $tmp_file_name
>>>> #echo "filter add dev $dev ingress proto ip pref $pref_id matchall action drop" >> $tmp_file_name
>>>> ((pref_id++))
>>>> done
>>>>
>>>> start=$(date +"%s")
>>>> tc -b $tmp_file_name
>>>> stop=$(date +"%s")
>>>> echo "Insertion duration: $(($stop - $start)) sec"
>>>> rm -f $tmp_file_name
>>>>
>>>> ip link del dev $dev
>>>> -----------------------------------------------------------------------
>>>>
>>>> Note the commented out matchall. I don't see the regression with
>>>> matchall. However, I see that with flower:
>>>> kernel 5.1
>>>> Insertion duration: 4 sec
>>>> kernel 5.2
>>>> Insertion duration: 163 sec
>>>>
>>>> I don't see any significant difference in perf:
>>>> kernel 5.1
>>>> 77.24% tc [kernel.vmlinux] [k] tcf_chain_tp_find
>>>> 1.67% tc [kernel.vmlinux] [k] mutex_spin_on_owner
>>>> 1.44% tc [kernel.vmlinux] [k] _raw_spin_unlock_irqrestore
>>>> 0.93% tc [kernel.vmlinux] [k] idr_get_free
>>>> 0.79% tc_pref_scale_o [kernel.vmlinux] [k] do_syscall_64
>>>> 0.69% tc [kernel.vmlinux] [k] finish_task_switch
>>>> 0.53% tc libc-2.28.so [.] __memset_sse2_unaligned_erms
>>>> 0.49% tc [kernel.vmlinux] [k] __memset
>>>> 0.36% tc_pref_scale_o libc-2.28.so [.] malloc
>>>> 0.30% tc_pref_scale_o libc-2.28.so [.] _int_free
>>>> 0.24% tc [kernel.vmlinux] [k] __memcpy
>>>> 0.23% tc [cls_flower] [k] fl_change
>>>> 0.23% tc [kernel.vmlinux] [k] __nla_validate_parse
>>>> 0.22% tc [kernel.vmlinux] [k] __slab_alloc
>>>>
>>>>
>>>> 75.57% tc [kernel.kallsyms] [k] tcf_chain_tp_find
>>>> 2.70% tc [kernel.kallsyms] [k] _raw_spin_unlock_irqrestore
>>>> 1.13% tc_pref_scale_o [kernel.kallsyms] [k] do_syscall_64
>>>> 0.87% tc libc-2.28.so [.] __memset_sse2_unaligned_erms
>>>> 0.86% ip [kernel.kallsyms] [k] finish_task_switch
>>>> 0.67% tc [kernel.kallsyms] [k] memset
>>>> 0.63% tc [kernel.kallsyms] [k] mutex_spin_on_owner
>>>> 0.52% tc_pref_scale_o libc-2.28.so [.] malloc
>>>> 0.48% tc [kernel.kallsyms] [k] idr_get_free
>>>> 0.46% tc [kernel.kallsyms] [k] fl_change
>>>> 0.42% tc_pref_scale_o libc-2.28.so [.] _int_free
>>>> 0.35% tc_pref_scale_o libc-2.28.so [.] __GI___strlen_sse2
>>>> 0.35% tc_pref_scale_o libc-2.28.so [.] __mbrtowc
>>>> 0.34% tc_pref_scale_o libc-2.28.so [.] __fcntl64_nocancel_adjusted
>>>>
>>>> Any ideas?
>>>
>>>Thanks for providing the reproduction script!
>>>
>>>I've investigated the problem and found the root cause. First of all, I
>>>noticed that CPU utilization during the problematic tc run is quite low
>>>(<10%), so I decided to investigate why tc sleeps so much. I used bcc
>>>and obtained the following off-CPU trace (uninteresting traces are
>>>omitted for brevity):
>>>
>>>~$ sudo /usr/share/bcc/tools/offcputime -K -p `pgrep -nx tc`
>>>Tracing off-CPU time (us) of PID 2069 by kernel stack... Hit Ctrl-C to end.
>>>...
>>> finish_task_switch
>>> __sched_text_start
>>> schedule
>>> schedule_timeout
>>> wait_for_completion
>>> __wait_rcu_gp
>>> synchronize_rcu
>>> fl_change
>>> tc_new_tfilter
>>> rtnetlink_rcv_msg
>>> netlink_rcv_skb
>>> netlink_unicast
>>> netlink_sendmsg
>>> sock_sendmsg
>>> ___sys_sendmsg
>>> __sys_sendmsg
>>> do_syscall_64
>>> entry_SYSCALL_64_after_hwframe
>>> - tc (2069)
>>> 142284953
>>>
>>>As you can see, 142 seconds are spent sleeping in synchronize_rcu(). The
>>>code is in the fl_create_new_mask() function:
>>>
>>> err = rhashtable_replace_fast(&head->ht, &mask->ht_node,
>>> &newmask->ht_node, mask_ht_params);
>>> if (err)
>>> goto errout_destroy;
>>>
>>> /* Wait until any potential concurrent users of mask are finished */
>>> synchronize_rcu();
>>>
>>>The justification for this is described in a comment in
>>>fl_check_assign_mask() (a user of fl_create_new_mask()):
>>>
>>> /* Insert mask as temporary node to prevent concurrent creation of mask
>>> * with same key. Any concurrent lookups with same key will return
>>> * -EAGAIN because mask's refcnt is zero. It is safe to insert
>>> * stack-allocated 'mask' to masks hash table because we call
>>> * synchronize_rcu() before returning from this function (either in case
>>> * of error or after replacing it with heap-allocated mask in
>>> * fl_create_new_mask()).
>>> */
>>> fnew->mask = rhashtable_lookup_get_insert_fast(&head->ht,
>>> &mask->ht_node,
>>> mask_ht_params);
>>>
>>>The offending commit is part of my series that implements unlocked
>>>flower: 195c234d15c9 ("net: sched: flower: handle concurrent mask
>>>insertion")
>>>
>>>The justification presented in it is no longer relevant since Ivan
>>>Vecera changed the mask to be dynamically allocated in commit 2cddd2014782
>>>("net/sched: cls_flower: allocate mask dynamically in fl_change()").
>>>With this we can simply change fl_change() to deallocate the temporary
>>>mask after an RCU grace period and remove the offending synchronize_rcu() call.
>>>
>>>Any other suggestions?
>>
>> So basically you just change synchronize_rcu() to kfree_rcu(mask),
>> correct?
>
>Not really. I remove synchronize_rcu() and change all kfree(mask) calls in
>fl_change() to tcf_queue_work(&mask->rwork, fl_mask_free_work), which
>uses queue_rcu_work() internally. This allows us to deallocate
>fl_flow_mask in the same manner on all code paths and doesn't require
>extending the fl_flow_mask struct (kfree_rcu() would require adding an
>rcu_head to it).
Got it. Makes sense to me. Thanks!
Thread overview: 10+ messages (newest: 2019-06-13 16:48 UTC)
2019-06-12 12:03 tc tp creation performance degradation since kernel 5.1 Jiri Pirko
2019-06-12 12:30 ` Paolo Abeni
2019-06-13 4:50 ` Jiri Pirko
2019-06-12 12:34 ` Vlad Buslov
2019-06-13 5:49 ` Jiri Pirko
2019-06-13 8:11 ` Jiri Pirko
2019-06-13 10:09 ` Vlad Buslov
2019-06-13 11:11 ` Jiri Pirko
2019-06-13 11:26 ` Vlad Buslov
2019-06-13 14:18 ` Jiri Pirko