From: Jesper Dangaard Brouer <brouer@redhat.com>
To: "Ethy H. Brito" <ethy.brito@inexo.com.br>
Cc: "xdp-newbies@vger.kernel.org" <xdp-newbies@vger.kernel.org>,
brouer@redhat.com,
Robert Chacon <robert.chacon@jackrabbitwireless.com>,
Yoel Caspersen <yoel@kviknet.dk>
Subject: Re: Newbie questions
Date: Tue, 22 Jun 2021 11:18:09 +0200
Message-ID: <20210622111809.16a1431e@carbon>
In-Reply-To: <20210621222809.2d7633cc@babalu>
On Mon, 21 Jun 2021 22:28:09 -0300
"Ethy H. Brito" <ethy.brito@inexo.com.br> wrote:
> On Fri, 18 Jun 2021 17:37:17 -0300
> "Ethy H. Brito" <ethy.brito@inexo.com.br> wrote:
>
> > On Fri, 18 Jun 2021 19:40:17 +0200
> > Jesper Dangaard Brouer <brouer@redhat.com> wrote:
> >
> > > On Fri, 18 Jun 2021 13:31:06 -0300
> > > "Ethy H. Brito" <ethy.brito@inexo.com.br> wrote:
> > >
> > > > Hi All.
> > > >
> > > > I've been doing some homework reading the docs, and some doubts have arisen.
> > > > For reference, my environment is
> > > > Ubuntu 20.04
> > > > kernel 5.4.0-66
> > > > tc utility, iproute2-ss200127.
> > > >
> > > > 1) https://xdp-project.net/areas/cpumap.html#cpumap--Create-script-MQ-HTB-silo-setup says that:
> > > > "XPS (Transmit Packet Steering) will take precedence over any changes to
> > > > skb->queue_mapping. You need to disable *XDP* via mask=00 in files
> > > > /sys/class/net/DEV/queues/tx-*/xps_cpus"
> > > >
> > > > Shouldn't it say I need to disable *XPS* (not XDP) using mask=00??
> > >
> > > You are absolutely right, it is a typo. Can I ask you to fix that and
> > > send a GitHub PR?
> > >
> > > The file you need to change is:
> > > https://github.com/xdp-project/xdp-project/blob/master/areas/cpumap.org
>
> File edited. PR sent.
Thanks, I merged it.
> > >
> > > > How to set that CPU-0 will deal with mq queue 7FFF:1, CPU-1 will deal
> > > > with 7FFF:2, and so on?
> > >
> > > That is the role of the XDP program that redirects into a cpumap, and
> > > the key in the cpumap is the CPU number.
>
> OK. I see that in source code.
Yes, see the explanation in the source code.
Also read: https://github.com/xdp-project/xdp-cpumap-tc/blob/master/src/howto_debug.org
The "tc_queue_mapping_kern.c" program[1] is the simplest solution. It
only sets skb->queue_mapping, so you have to configure Linux to set the
correct TC major:minor handle on a per-packet basis (e.g. via iptables,
see the comment in the code).
The "tc_classify_kern.c" program[2] is more advanced and implements an
IP-lookup map that has this[3] config per entry:
 struct ip_hash_info {
	/* lookup key: __u32 IPv4-address */
	__u32 cpu;
	__u32 tc_handle; /* TC handle MAJOR:MINOR combined in __u32 */
 };
[1] https://github.com/xdp-project/xdp-cpumap-tc/blob/master/src/tc_queue_mapping_kern.c#L40-L76
[2] https://github.com/xdp-project/xdp-cpumap-tc/blob/master/src/tc_classify_kern.c#L277
[3] https://github.com/xdp-project/xdp-cpumap-tc/blob/master/src/common_kern_user.h#L29-L33
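To make the "MAJOR:MINOR combined in __u32" comment concrete, here is a small user-space sketch of how such a handle can be packed and unpacked. The helper names are made up for this example and are not taken from the repository; the kernel UAPI header linux/pkt_sched.h provides the TC_H_MAJ, TC_H_MIN and TC_H_MAKE macros for the same purpose (note that TC_H_MAJ keeps the major in the upper 16 bits instead of shifting it down).

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical helpers for illustration, not part of xdp-cpumap-tc. */

/* Pack a TC handle: major in the upper 16 bits, minor in the lower 16. */
static inline uint32_t tc_handle_make(uint16_t major, uint16_t minor)
{
	return ((uint32_t)major << 16) | minor;
}

/* Extract the major number (shifted down to a plain 16-bit value). */
static inline uint16_t tc_handle_major(uint32_t handle)
{
	return (uint16_t)(handle >> 16);
}

/* Extract the minor number. */
static inline uint16_t tc_handle_minor(uint32_t handle)
{
	return (uint16_t)(handle & 0xFFFFu);
}

/* Format the handle as hex "major:minor", the notation tc uses.
 * Returns the value returned by snprintf(). */
static inline int tc_handle_format(uint32_t handle, char *buf, size_t len)
{
	assert(buf != NULL && len > 0);
	return snprintf(buf, len, "%x:%x",
			(unsigned int)tc_handle_major(handle),
			(unsigned int)tc_handle_minor(handle));
}
```

With these helpers, tc_handle_make(0x7FFF, 1) gives 0x7FFF0001, i.e. the MQ class 7FFF:1 from the question above.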
> But I am still pretty in the dark here to start using XDP.
Okay, then let me explain some basic concepts for xdp-cpumap-tc.
1. XDP needs to run on a physical NIC with a driver that supports (native) XDP.
2. XDP is a layer before the network stack, before the "SKB" is created.
3. XDP redirects the raw frame to another CPU via XDP_REDIRECT'ing into a cpumap.
4. The cpumap (kthread) running on the remote CPU will create the SKB and
call the normal network stack on this CPU.
5. The TC-BPF program running on the remote CPU updates skb->queue_mapping
(and possibly skb->priority) to map the packet into the TC-queue of
your choosing.
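The five steps above can be modelled in plain user-space C, to show how one per-IP config entry drives both stages. This is only a sketch and not BPF code: the lookup table, the function names, the sample addresses, the rule "queue_mapping = CPU + 1" and the use of the priority field for the TC handle are all assumptions made for illustration. The real programs read their config from the shared BPF maps.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Same fields as the ip_hash_info struct quoted earlier in this mail. */
struct ip_hash_info {
	uint32_t cpu;
	uint32_t tc_handle; /* MAJOR:MINOR combined */
};

/* Hypothetical stand-in for the IP-lookup map (key: IPv4 address). */
struct ip_entry {
	uint32_t ipv4;
	struct ip_hash_info info;
};

static const struct ip_entry ip_table[] = {
	{ 0xC6336401u /* 198.51.100.1 */, { 0, 0x00010002u /* 1:2 */ } },
	{ 0xC6336402u /* 198.51.100.2 */, { 1, 0x00020005u /* 2:5 */ } },
};

static const struct ip_hash_info *ip_lookup(uint32_t ipv4)
{
	for (size_t i = 0; i < sizeof(ip_table) / sizeof(ip_table[0]); i++)
		if (ip_table[i].ipv4 == ipv4)
			return &ip_table[i].info;
	return NULL;
}

/* Steps 1-3: the XDP stage only picks the destination CPU, which is
 * the key used for the redirect into the cpumap. Returns -1 for an
 * unknown IP (what the real program does then is not modelled). */
static int xdp_stage_pick_cpu(uint32_t ipv4)
{
	const struct ip_hash_info *info = ip_lookup(ipv4);

	return info ? (int)info->cpu : -1;
}

/* Simplified stand-in for the SKB fields touched in step 5. */
struct skb_model {
	uint16_t queue_mapping;
	uint32_t priority;
};

/* Steps 4-5: on the remote CPU the SKB exists, and the TC stage maps it
 * to a TX queue. Assumption: TX queue number is CPU + 1. */
static int tc_stage_classify(uint32_t ipv4, uint32_t running_cpu,
			     struct skb_model *skb)
{
	const struct ip_hash_info *info = ip_lookup(ipv4);

	assert(skb != NULL);
	if (!info || info->cpu != running_cpu)
		return -1; /* unknown IP, or packet landed on the wrong CPU */

	skb->queue_mapping = (uint16_t)(running_cpu + 1);
	skb->priority = info->tc_handle;
	return 0;
}
```

The point of the model is that both stages consult the same entry: the XDP side uses only the cpu field, and the TC side uses the CPU it runs on plus the tc_handle.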
Notice that for your scenario there are 4 BPF programs running: two XDP
and two TC-BPF. See what is running via the command line: "bpftool net"
# bpftool net
xdp:
eno49(4) driver id 22
eno50(5) driver id 26
tc:
eno49(4) clsact/egress tc_classify_kern.o:[tc_classify] id 42
eno50(5) clsact/egress tc_classify_kern.o:[tc_classify] id 43
All the BPF programs share BPF maps, so they all use the same config.
Pinned maps:
# ls -1 /sys/fs/bpf/tc/globals/
map_ifindex_type
map_ip_hash
map_txq_config
> More newbie questions are necessary.
>
> My goal is simple: to control the bandwidth of a few (or a lot of)
> thousand users using an off-the-shelf (almost) box with two 10Gbps
> Ethernet interfaces: one internal, one external.
I have access to a production system that has 2x 25Gbit/s NICs (plus
VLANs for each apartment building). Let me check how many customers
they have added. They are using[2] "tc_classify_kern.c" and their
IP-map contains 6086 entries (more than I expected, actually).
> What comes in through eth0 goes out via eth0 or eth1, and what comes
> in through eth1 goes out via eth0.
>
> Is there a road map about what to execute and in what order to
> achieve this task using xdp-cpumap-tc?
This is already available today, and running in production at an ISP.
Sorry for the lack of documentation on how to use it, but it is done.
> I have cloned xdp-cpumap-tc to try to figure it out by reading the source code.
> But things did not come together.
>
> For instance, tc_classify_kern.c (as tc_queue_mapping_kern.c) "talks" about a "manuel" (sic)
> setup:
>
> tc qdisc add dev ixgbe2 clsact
> tc filter add dev ixgbe2 egress bpf da obj tc_classify_kern.o sec tc_classify
>
> At what point are these commands to be executed?
> They are not mentioned anywhere else. (tc_mq_htb_setup_example.sh forgot these perhaps?)
This is handled by: tc_classify_user
https://github.com/xdp-project/xdp-cpumap-tc/blob/master/src/tc_classify_user.c
The TC commands are called from C-code in this file:
https://github.com/xdp-project/xdp-cpumap-tc/blob/master/src/common_user.c
The roadmap is to convert this to use the new libbpf TC API instead, as
it is a mess to have a dependency on the right iproute2 version.
> Which one is to be loaded: tc_classify_kern or tc_queue_mapping_kern?
> Or both? None? After and before what?
Actually, due to a limitation in the iproute2 loader, you should load
the XDP programs first (as that will create the maps with BTF info).
You cannot load tc_classify_kern and tc_queue_mapping_kern simultaneously.
>
> In the file tc_classify_kern.c, map_ifindex_type is defined
> differently from xdp_iphash_to_cpu_kern.c.
>
> ".size_value = sizeof(struct txq_config)" in the former
> and
> ".size_value = sizeof(__u32)" in the latter.
>
> Is this a "Cut and paste" typo? Are they really meant to be two
> different maps?
Hmm... this looks like a copy-paste error. The tc_classify_kern.c
map_ifindex_type should have size_value = sizeof(__u32). It happens to
work because sizeof(struct txq_config) is also 4 bytes.
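A compile-time check can make this "happens to work" coincidence explicit. The sketch below assumes that struct txq_config consists of two 16-bit fields named queue_mapping and htb_major; that layout is my assumption here, so verify it against common_kern_user.h before relying on it. The helper function is hypothetical and only exists to make the comparison visible at run time.

```c
#include <stddef.h>
#include <stdint.h>

/* ASSUMED layout, for illustration only; check common_kern_user.h. */
struct txq_config {
	uint16_t queue_mapping;
	uint16_t htb_major;
};

/* Breaks the build if the two definitions of map_ifindex_type would
 * ever disagree on the value size (C11 _Static_assert). */
_Static_assert(sizeof(struct txq_config) == sizeof(uint32_t),
	       "map_ifindex_type value size differs between programs");

/* Hypothetical helper: returns 1 when both value sizes are equal. */
static inline int ifindex_type_value_sizes_match(void)
{
	return sizeof(struct txq_config) == sizeof(uint32_t);
}
```

If someone later adds a field to txq_config, the build fails at this point instead of the two programs silently disagreeing about the map definition.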
> Anyway, a step by step guide would be appreciated.
I'm hoping you will create/document that once you learn how to use these
programs ;-)
> Maybe it is time to start populating that BNG-router repo I was told about.
> How can I start helping with that? Worth doing it?
I think we need to convince other ISPs to join in...
... let me CC those guys again.
--
Best regards,
Jesper Dangaard Brouer
MSc.CS, Principal Kernel Engineer at Red Hat
LinkedIn: http://www.linkedin.com/in/brouer