Netfilter-Devel Archive mirror
 help / color / mirror / Atom feed
From: Neels Hofmeyr <nhofmeyr@sysmocom.de>
To: netfilter-devel@vger.kernel.org
Cc: Pablo Neira Ayuso <pablo@netfilter.org>
Subject: nftables with thousands of chains is unreasonably slow
Date: Mon, 6 May 2024 13:59:55 +0200	[thread overview]
Message-ID: <ZjjGOyXkmeudzzc5@my.box> (raw)

Hi!

I am trying to figure out why adding/deleting chains and retrieving counters
via nftables takes long for high nr of entries. I'd like to share some
findings, to ask whether this is known, whether a solution exists, etc...

I'm writing two programs that interact with nftables via
nft_run_cmd_from_buffer(). (osmo-upf and osmo-hnbgw)

One use case is setting up / tearing down thousands of GTP tunnels via nftables
rule sets one by one, for reference:
https://gitea.osmocom.org/cellular-infrastructure/osmo-upf/src/commit/a21bcec358a5147deb15d156700279f52386a7d7/tests/nft-rule.vty

The other use case is retrieving all counters that are currently present in the
inet table that my client program owns.

Adding the first few hundred chains / reading a few hundred counters finishes
in an ok amount of time. But the more chains are already submitted in the
table, the longer it takes to add another one, etc.

By the time that there are ~4000 chains, adding another chain becomes
prohibitively slow. Some numbers: in total, when setting up 1000 GTP tunnels in
osmo-upf, which means adding 4000 chains into a table owned by the program
(osmo-upf), in total takes 49 seconds. This includes the overhead for talking
PFCP and logging, etc., but by far the most time is taken waiting for
nft_run_cmd_from_buffer().

We'd like to scale up to like 100 times the above -- completely beyond all
reason currently, since the wait time seems to increase exponentially.

I have two angles:

1) workaround: structure the chains and rules differently?
2) analysis: bpftracing tells me that most time is spent in chain_free().

(1) Currently I have one flat table with all the chains in that "top level".
Would it make sense to try breaking that long list up into smaller groups,
maybe batches of 100? As in, pseudocode:

  table osmo-upf {
      group-0 {
         chain-0 {}
	 chain-1 {}
	 ...
      }
      group-1 {
         chain-100 {}
         chain-101 {}
	 ...
      }
      ...
      group-1000 {
         chain-100000 {}
         chain-100001 {}
	 ...
      }
  }

Then I can also retrieve the counters in batches of 100, which might be more
efficient and better to coordinate with concurrent tasks.

(2) Using bpftrace, I drilled a bit into where the time is spent. Two
interesting findings for me so far:

It seems most time is spent in

  nft_run_cmd_from_buffer
    nft_evaluate
      nft_cache_update
        nft_cache_release
          nft_cache_flush
            table_free
              chain_free

The bpftrace shows:

nft_cache_update() was called 6000 times.
(I have 1000 GTP tunnels and expect four chains per GTP tunnel, which would be
4000, not sure why I see 6k, but that's not really that significant.)

chain_free() was called more than 5 million times, that's 1000 times as often
as I would naively expect to see.

Most calls are fast (<16us), but there are some break-outs of up to 8ms, and in
total the 5 million calls amount to 80 seconds. (One bpftrace dump below, FYI)

I'm continuing to think:

- Is this stuff known already, or does it make sense to continue on this path,
  share a reproduction recipe here, and so on?

- can we fix that? Is there some memory leak / unnecessary blow up happening
  that causes this apparent factor 1000 in effort?

- can we defer the free, so my client program gets the nftables results without
  having to wait for the freeing? Can the client program decide when to do the
  nftables freeing, i.e. not within nft_run_cmd_from_buffer()?

Thanks!

~N

foo_bar__count: function foo_bar() was called this many times
...__ms: milliseconds
...__us: microseconds

==========================================================
@handle_free__count: 5190516
@handle_free__total_ms: 16919          <--- in total 17 seconds spent in handle_free()
@handle_free__us: 
[2, 4)           4884915 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@|
[4, 8)            280031 |@@                                                  |
[8, 16)            15323 |                                                    |
[16, 32)            9910 |                                                    |
[32, 64)             267 |                                                    |
[64, 128)             24 |                                                    |
[128, 256)             3 |                                                    |
[256, 512)             3 |                                                    |
[512, 1K)              0 |                                                    |
[1K, 2K)               3 |                                                    |
[2K, 4K)              48 |                                                    |

@cache_free__count: 11100
@cache_free__total_ms: 230
@cache_free__us: 
[2, 4)              7327 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@|
[4, 8)              3461 |@@@@@@@@@@@@@@@@@@@@@@@@                            |
[8, 16)              177 |@                                                   |
[16, 32)              73 |                                                    |
[32, 64)               5 |                                                    |
[64, 128)              0 |                                                    |
[128, 256)             0 |                                                    |
[256, 512)             0 |                                                    |
[512, 1K)              0 |                                                    |
[1K, 2K)               9 |                                                    |
[2K, 4K)              39 |                                                    |
[4K, 8K)               9 |                                                    |

@scope_release__count: 5140608
@scope_release__total_ms: 12679          <--- in total 13 seconds spent in scope_release()
@scope_release__us: 
[1]              1580615 |@@@@@@@@@@@@@@@@@@@@@@@                             |
[2, 4)           3527447 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@|
[4, 8)             21717 |                                                    |
[8, 16)             4521 |                                                    |
[16, 32)            6165 |                                                    |
[32, 64)             140 |                                                    |
[64, 128)             10 |                                                    |
[128, 256)             0 |                                                    |
[256, 512)             5 |                                                    |




@set_free__count: 16650
@set_free__total_ms: 118
@set_free__us: 
[1]                 2527 |@@@@@@@@@@@@@@@@@@@                                 |
[2, 4)              6643 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@|
[4, 8)              1843 |@@@@@@@@@@@@@@                                      |
[8, 16)             3475 |@@@@@@@@@@@@@@@@@@@@@@@@@@@                         |
[16, 32)            2119 |@@@@@@@@@@@@@@@@                                    |
[32, 64)              37 |                                                    |
[64, 128)              3 |                                                    |
[128, 256)             0 |                                                    |
[256, 512)             0 |                                                    |
[512, 1K)              0 |                                                    |
[1K, 2K)               3 |                                                    |

@chain_free__count: 5137887          <--- called 5G times
@chain_free__total_ms: 80114         <--- in total 80! seconds spent in scope_release()
@chain_free__us:
[8, 16)          4556941 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@|
[16, 32)          550786 |@@@@@@                                              |
[32, 64)           29604 |                                                    |
[64, 128)            397 |                                                    |
[128, 256)            12 |                                                    |
[256, 512)            21 |                                                    |
[512, 1K)              0 |                                                    |
[1K, 2K)               3 |                                                    |
[2K, 4K)             117 |                                                    |
[4K, 8K)              15 |                                                    |

==========================================================


-- 
- Neels Hofmeyr <nhofmeyr@sysmocom.de>         https://www.sysmocom.de/
=======================================================================
* sysmocom - systems for mobile communications GmbH
* Siemensstr. 26a
* 10551 Berlin, Germany
* Sitz / Registered office: Berlin, HRB 134158 B
* Geschaeftsfuehrer / Managing Director: Harald Welte

             reply	other threads:[~2024-05-06 12:09 UTC|newest]

Thread overview: 6+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2024-05-06 11:59 Neels Hofmeyr [this message]
2024-05-07 22:32 ` nftables with thousands of chains is unreasonably slow Pablo Neira Ayuso
2024-05-08 13:58   ` Neels Hofmeyr
2024-05-08 16:48     ` Pablo Neira Ayuso
2024-05-10  5:46   ` Neels Hofmeyr
2024-05-27 22:31     ` Pablo Neira Ayuso

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=ZjjGOyXkmeudzzc5@my.box \
    --to=nhofmeyr@sysmocom.de \
    --cc=netfilter-devel@vger.kernel.org \
    --cc=pablo@netfilter.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).