XDP-Newbies Archive mirror
From: Magnus Karlsson <magnus.karlsson@gmail.com>
To: "Gaul, Maximilian" <maximilian.gaul@hm.edu>
Cc: Xdp <xdp-newbies@vger.kernel.org>
Subject: Re: How does the Kernel decide which Umem frame to choose for the next packet?
Date: Mon, 18 May 2020 15:14:18 +0200	[thread overview]
Message-ID: <CAJ8uoz33iGMze_Au6RQDqzsM8Po_E20ZxSxT21TFCwJwkKdW1g@mail.gmail.com> (raw)
In-Reply-To: <0f2212ea98c74001b5c0282bfb6718d7@hm.edu>

On Mon, May 18, 2020 at 11:17 AM Gaul, Maximilian
<maximilian.gaul@hm.edu> wrote:
>
> > User-space decides this by what frames it enters into the fill ring.
> > Kernel-space uses the frames in order from that ring.
> >
> > /Magnus
>
> Thank you for your reply Magnus,
>
> I am sorry to ask again but I am not so sure when this happens.
> So I first check my socket RX-ring for new packets:
>
>                 xsk_ring_cons__peek(&xsk_socket->rx, 1024, &idx_rx)
>
> which looks like this:
>
>                 static inline size_t xsk_ring_cons__peek(struct xsk_ring_cons *cons,
>                                                          size_t nb, __u32 *idx)
>                 {
>                         size_t entries = xsk_cons_nb_avail(cons, nb);
>
>                         if (entries > 0) {
>                                 /* Make sure we do not speculatively read the data before
>                                  * we have received the packet buffers from the ring.
>                                  */
>                                 libbpf_smp_rmb();
>
>                                 *idx = cons->cached_cons;
>                                 cons->cached_cons += entries;
>                         }
>
>                         return entries;
>                 }
>
> where `idx_rx` is the starting position of descriptors for the new packets in the RX-ring.
>
> My first question here is: how can there already be packet descriptors in my RX-ring if I haven't entered any frames into the fill ring of the umem yet?
> So I assume libbpf did this for me already?

Yes, that is correct.
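To make the ordering contract concrete, here is a toy model in plain C. It is not the real kernel or libbpf data structures (the names are mine); it only illustrates the rule from my first reply: user space pushes umem frame addresses into the fill ring, and the kernel consumes them strictly in that order when packets arrive. libbpf pre-fills the real ring for you during socket setup.

```c
/* Toy model of the fill-ring contract, NOT the real kernel/libbpf
 * structs: user space pushes umem frame addresses into a ring, and
 * the "kernel" consumes them in exactly that order for incoming
 * packets. */
#include <assert.h>

#define FRAME_SIZE 2048UL
#define RING_SIZE  8U   /* power of two, so index & (RING_SIZE - 1) wraps */

static unsigned long fill_ring[RING_SIZE];
static unsigned int fq_prod, fq_cons;  /* free-running, masked on access */

/* User space: hand a frame to the kernel for a future RX packet. */
static void fill_frame(unsigned long addr)
{
    fill_ring[fq_prod++ & (RING_SIZE - 1)] = addr;
}

/* "Kernel": choose the umem frame for the next incoming packet. */
static unsigned long next_rx_frame(void)
{
    return fill_ring[fq_cons++ & (RING_SIZE - 1)];
}
```

So if you enter frames 0, 1 and 2 of your umem, the next three packets land in exactly those frames, in that order.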

> After this call I know how many packets are waiting. So I reserve exactly as many Umem frames:
>
>                 xsk_ring_prod__reserve(&umem_info->fq, rx_rcvd_amnt, &idx_fq);
>
> which looks like this:
>
>                 static inline size_t xsk_ring_prod__reserve(struct xsk_ring_prod *prod,
>                                                                 size_t nb, __u32 *idx)
>                 {
>                         if (xsk_prod_nb_free(prod, nb) < nb)
>                                 return 0;
>
>                         *idx = prod->cached_prod;
>                         prod->cached_prod += nb;
>
>                         return nb;
>                 }
>
> But what am I exactly reserving here? How can I reserve anything from the Umem without telling it the RX-ring of my socket?

You are reserving descriptor slots in a producer ring.
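A simplified model of what "reserving" means, using my own struct rather than the real struct xsk_ring_prod: reserving only advances a cached producer index, handing you a window of descriptor slots you may now write into. The umem is not involved at this point; you attach frame addresses to the reserved slots afterwards and publish them with xsk_ring_prod__submit(). (In real libbpf, xsk_prod_nb_free() also refreshes cached_cons from the shared ring; that refresh is omitted here.)

```c
/* Simplified producer-ring bookkeeping, NOT the real libbpf structs. */
#include <assert.h>

struct prod_ring {
    unsigned int size;        /* slot count, power of two */
    unsigned int cached_prod; /* next free slot, user-space view */
    unsigned int cached_cons; /* kernel consumer position, as last read */
};

/* Analogous to xsk_ring_prod__reserve(): claim nb slots or none at all. */
static unsigned int ring_reserve(struct prod_ring *r, unsigned int nb,
                                 unsigned int *idx)
{
    /* Free slots = ring size minus outstanding (produced, not consumed). */
    unsigned int free_slots = r->size - (r->cached_prod - r->cached_cons);

    if (free_slots < nb)
        return 0;

    *idx = r->cached_prod;
    r->cached_prod += nb;
    return nb;
}
```

Note the all-or-nothing behavior: if fewer than nb slots are free, you get 0 back, just like in the code you quoted.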

> After this, I extract the RX-ring packet descriptors, starting at `idx_rx`:
>
>                 const struct xdp_desc *desc = xsk_ring_cons__rx_desc(&xsk_socket->rx, idx_rx + i);
>
> I am also not entirely certain about the zero-copy aspect of AF-XDP. As far as I know, the NIC writes incoming packets via DMA directly into system memory. But this time system memory means the Umem area - right? Whereas with non-zero-copy this could be any position in memory, and the Kernel first has to copy the packets into the Umem area?

In zero-copy mode, the NIC DMAs the packets straight into the umem, so
they are immediately visible to the user-space process.
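Whether you get zero-copy is decided at bind time via the bind flags. The struct below is only a sketch of the relevant fields of libbpf's struct xsk_socket_config, and the flag values are copied from linux/if_xdp.h for illustration:

```c
/* Sketch of the relevant xsk_socket_config fields; flag values taken
 * from linux/if_xdp.h for illustration. */
#include <assert.h>

#define XDP_COPY     (1 << 1) /* force the copy path */
#define XDP_ZEROCOPY (1 << 2) /* bind fails if the driver lacks ZC support */

struct xsk_config_sketch {
    unsigned int rx_size;
    unsigned int tx_size;
    unsigned short bind_flags;
};

/* Request zero-copy explicitly rather than letting libbpf pick a mode. */
static struct xsk_config_sketch make_zc_config(void)
{
    struct xsk_config_sketch cfg = {
        .rx_size = 2048,
        .tx_size = 2048,
        .bind_flags = XDP_ZEROCOPY,
    };
    return cfg;
}
```

Requesting XDP_ZEROCOPY explicitly is useful because the bind then fails loudly on drivers without zero-copy support, instead of silently falling back to copy mode.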

> I am also a bit confused what the size of a RX-queue means in this context. Assuming the output of ethtool:
>
>                 $ ethtool -g eth20
>                 Ring parameters for eth20:
>                 Pre-set maximums:
>                 RX:             8192
>                 RX Mini:        0
>                 RX Jumbo:       0
>                 TX:             8192
>                 Current hardware settings:
>                 RX:             1024
>                 RX Mini:        0
>                 RX Jumbo:       0
>                 TX:             1024
>
> Does this mean that at the moment my NIC can store 1024 incoming packets inside its own memory?

The NIC does not have its own memory. This just means that there can
be 1024 packets that will be processed by the NIC or have been
processed by the NIC but not yet handled by the driver. Nothing you
need to care about unless you are performance optimizing, or writing
a driver of course :-).

> So there is no connection between the RX-queue size of the NIC and the Umem area?

Correct.

/Magnus

> Sorry for this wall of text. Maybe you can answer a few of my questions; I hope they are not too confusing.
>
> Thank you so much
>
> Max

      reply	other threads:[~2020-05-18 13:14 UTC|newest]

Thread overview: 4+ messages
2020-05-18  8:37 How does the Kernel decide which Umem frame to choose for the next packet? Gaul, Maximilian
2020-05-18  8:51 ` Magnus Karlsson
2020-05-18  9:17   ` AW: " Gaul, Maximilian
2020-05-18 13:14     ` Magnus Karlsson [this message]
