From: Julian Anastasov <ja@ssi.bg>
To: Simon Horman <horms@verge.net.au>
Cc: lvs-devel@vger.kernel.org, netfilter-devel@vger.kernel.org,
Dust Li <dust.li@linux.alibaba.com>,
Jiejian Wu <jiejian@linux.alibaba.com>,
rcu@vger.kernel.org
Subject: [PATCHv3 net-next 14/14] ipvs: add conn_lfactor and svc_lfactor sysctl vars
Date: Sun, 31 Mar 2024 17:04:01 +0300
Message-ID: <20240331140401.77657-15-ja@ssi.bg>
In-Reply-To: <20240331140401.77657-1-ja@ssi.bg>
Allow the default load factor for the connection and service tables
to be configured.
Signed-off-by: Julian Anastasov <ja@ssi.bg>
---
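Note: the sizing rule described in the conn_lfactor documentation of
this patch (2^lfactor = nodes / buckets) can be sketched in userspace C
as below. This is only an illustration, not code from the series: the
helper names lfactor_buckets() and lfactor_table_bits() are
hypothetical, and rounding up to a power of two while clamping to a
[2^min_bits, 2^max_bits] range is an assumption about how the resize
code picks the final size.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper (not part of the patch): desired number of buckets
 * for `nodes` hash nodes, following the documented shift-count rule.
 */
static uint64_t lfactor_buckets(uint64_t nodes, int lfactor)
{
	/* Same range the sysctl handlers accept */
	assert(lfactor >= -8 && lfactor <= 8);

	if (lfactor < 0)
		return nodes << -lfactor;	/* more buckets than nodes */
	return nodes >> lfactor;		/* fewer buckets than nodes */
}

/* Assumption: the table size is a power of two, so return the smallest
 * number of bits in [min_bits, max_bits] that covers the desired buckets.
 */
static unsigned int lfactor_table_bits(uint64_t nodes, int lfactor,
				       unsigned int min_bits,
				       unsigned int max_bits)
{
	uint64_t want = lfactor_buckets(nodes, lfactor);
	unsigned int bits = min_bits;

	while (bits < max_bits && (UINT64_C(1) << bits) < want)
		bits++;
	return bits;
}
```

With 1000 nodes, lfactor -4 asks for 16000 buckets and lfactor 2 asks
for 250, matching the "buckets = nodes * 16" and "buckets = nodes / 4"
examples in the documentation.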
Documentation/networking/ipvs-sysctl.rst | 33 +++++++++++
net/netfilter/ipvs/ip_vs_ctl.c | 72 ++++++++++++++++++++++++
2 files changed, 105 insertions(+)
diff --git a/Documentation/networking/ipvs-sysctl.rst b/Documentation/networking/ipvs-sysctl.rst
index 3fb5fa142eef..ee9f70f446b4 100644
--- a/Documentation/networking/ipvs-sysctl.rst
+++ b/Documentation/networking/ipvs-sysctl.rst
@@ -29,6 +29,30 @@ backup_only - BOOLEAN
If set, disable the director function while the server is
in backup mode to avoid packet loops for DR/TUN methods.
+conn_lfactor - INTEGER
+ -4 - default
+ Valid range: -8 (larger table) .. 8 (smaller table)
+
+ Controls the sizing of the connection hash table based on the
+ load factor (number of connections per table bucket):
+ 2^conn_lfactor = nodes / buckets
+ As a result, the table grows if load increases and shrinks when
+ load decreases in the range of 2^8 - 2^conn_tab_bits (module
+ parameter).
+ The value is a shift count where negative values select
+ buckets = (connection hash nodes << -value) while positive
+ values select buckets = (connection hash nodes >> value).
+ Negative values reduce collisions and the time for lookups
+ but increase the table size. Positive values will
+ tolerate load above 100% when a smaller table is
+ preferred, at the cost of more collisions. If using NAT
+ connections, consider decreasing the value by one because
+ they add two nodes to the hash table.
+
+ Example:
+ -4: grow if load goes above 6% (buckets = nodes * 16)
+ 2: grow if load goes above 400% (buckets = nodes / 4)
+
conn_reuse_mode - INTEGER
1 - default
@@ -219,6 +243,15 @@ secure_tcp - INTEGER
The value definition is the same as that of drop_entry and
drop_packet.
+svc_lfactor - INTEGER
+ -3 - default
+ Valid range: -8 (larger table) .. 8 (smaller table)
+
+ Controls the sizing of the service hash table based on the
+ load factor (number of services per table bucket). The table
+ will grow and shrink in the range of 2^4 - 2^20.
+ See conn_lfactor for explanation.
+
sync_threshold - vector of 2 INTEGERs: sync_threshold, sync_period
default 3 50
diff --git a/net/netfilter/ipvs/ip_vs_ctl.c b/net/netfilter/ipvs/ip_vs_ctl.c
index b1c638f83559..a0666dc998fb 100644
--- a/net/netfilter/ipvs/ip_vs_ctl.c
+++ b/net/netfilter/ipvs/ip_vs_ctl.c
@@ -2431,6 +2431,60 @@ static int ipvs_proc_run_estimation(struct ctl_table *table, int write,
return ret;
}
+static int ipvs_proc_conn_lfactor(struct ctl_table *table, int write,
+ void *buffer, size_t *lenp, loff_t *ppos)
+{
+ struct netns_ipvs *ipvs = table->extra2;
+ int *valp = table->data;
+ int val = *valp;
+ int ret;
+
+ struct ctl_table tmp_table = {
+ .data = &val,
+ .maxlen = sizeof(int),
+ };
+
+ ret = proc_dointvec(&tmp_table, write, buffer, lenp, ppos);
+ if (write && ret >= 0) {
+ if (val < -8 || val > 8) {
+ ret = -EINVAL;
+ } else {
+ *valp = val;
+ if (rcu_dereference_protected(ipvs->conn_tab, 1))
+ mod_delayed_work(system_unbound_wq,
+ &ipvs->conn_resize_work, 0);
+ }
+ }
+ return ret;
+}
+
+static int ipvs_proc_svc_lfactor(struct ctl_table *table, int write,
+ void *buffer, size_t *lenp, loff_t *ppos)
+{
+ struct netns_ipvs *ipvs = table->extra2;
+ int *valp = table->data;
+ int val = *valp;
+ int ret;
+
+ struct ctl_table tmp_table = {
+ .data = &val,
+ .maxlen = sizeof(int),
+ };
+
+ ret = proc_dointvec(&tmp_table, write, buffer, lenp, ppos);
+ if (write && ret >= 0) {
+ if (val < -8 || val > 8) {
+ ret = -EINVAL;
+ } else {
+ *valp = val;
+ if (rcu_dereference_protected(ipvs->svc_table, 1))
+ mod_delayed_work(system_unbound_wq,
+ &ipvs->svc_resize_work, 0);
+ }
+ }
+ return ret;
+}
+
/*
* IPVS sysctl table (under the /proc/sys/net/ipv4/vs/)
* Do not change order or insert new entries without
@@ -2619,6 +2673,18 @@ static struct ctl_table vs_vars[] = {
.mode = 0644,
.proc_handler = ipvs_proc_est_nice,
},
+ {
+ .procname = "conn_lfactor",
+ .maxlen = sizeof(int),
+ .mode = 0644,
+ .proc_handler = ipvs_proc_conn_lfactor,
+ },
+ {
+ .procname = "svc_lfactor",
+ .maxlen = sizeof(int),
+ .mode = 0644,
+ .proc_handler = ipvs_proc_svc_lfactor,
+ },
#ifdef CONFIG_IP_VS_DEBUG
{
.procname = "debug_level",
@@ -4856,6 +4922,12 @@ static int __net_init ip_vs_control_net_init_sysctl(struct netns_ipvs *ipvs)
tbl[idx].extra2 = ipvs;
tbl[idx++].data = &ipvs->sysctl_est_nice;
+ tbl[idx].extra2 = ipvs;
+ tbl[idx++].data = &ipvs->sysctl_conn_lfactor;
+
+ tbl[idx].extra2 = ipvs;
+ tbl[idx++].data = &ipvs->sysctl_svc_lfactor;
+
#ifdef CONFIG_IP_VS_DEBUG
/* Global sysctls must be ro in non-init netns */
if (!net_eq(net, &init_net))
--
2.44.0